@NOTimothyLottes Genius. Is there a need for a way to target specific cache lines too? The goal would be to increase the likelihood they get flushed on time.
@ajohnston97629 @Boost_Libraries I never said avoiding allocations is not important. I’m just saying boost::small_vector, and your implementation as well, are slow to access because they force an indirect load. It’s not a theory, it’s a real bottleneck I’ve fixed many times in games that are now on store shelves.
@ajohnston97629 @Boost_Libraries There’s no branch in the fixed version either. The CPU unconditionally loads from the small storage, where the dynamic memory pointer would be, then uses a conditional select to pick either that loaded value or the computed address. The discarded load doesn’t block csel retirement.
@ajohnston97629 @Boost_Libraries Yes, new/delete cost is important, but boost::small_vector is slow to access because it requires an indirect load even to access the small storage. Your implementation has the same defect.
@Boost_Libraries Skipping allocs is nice, but access isn’t faster than std::vector because the memory locality win gets completely canceled by the indirect memory access. It’s much faster if you roll your own small vector that uses arithmetic to get the small storage pointer rather than a load.
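A rough sketch of what "arithmetic rather than a load" could look like, not the code from those games and not how boost::small_vector is laid out. The name `SmallVec`, its members, and the restriction to trivially copyable `T` are all assumptions made to keep it short. The inline buffer shares a union with the heap pointer, so `data()` is either `this` plus a constant offset or the stored pointer, and the choice depends only on `cap_`. Whether the compiler turns that ternary into a conditional select depends on the target and optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical sketch: small vector whose inline-storage address is computed
// from `this` instead of being read back from a stored pointer.
template <typename T, std::size_t N>
class SmallVec {
    static_assert(N > 0, "need at least one inline slot");
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable T");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "sketch relies on malloc alignment");

    union {
        T* heap_;                                         // used when cap_ > N
        alignas(T) unsigned char inline_[N * sizeof(T)];  // used when cap_ == N
    };
    std::size_t size_ = 0;
    std::size_t cap_ = N;

    void grow() {
        std::size_t new_cap = cap_ * 2;
        T* p = static_cast<T*>(std::malloc(new_cap * sizeof(T)));
        if (!p) std::abort();  // assumption: no recovery path in this sketch
        std::memcpy(p, data(), size_ * sizeof(T));
        if (!is_inline()) std::free(heap_);
        heap_ = p;  // overwrites the inline bytes, which were just copied out
        cap_ = new_cap;
    }

public:
    SmallVec() {}
    SmallVec(const SmallVec&) = delete;             // copying omitted for brevity
    SmallVec& operator=(const SmallVec&) = delete;
    ~SmallVec() { if (!is_inline()) std::free(heap_); }

    bool is_inline() const { return cap_ == N; }
    std::size_t size() const { return size_; }

    // Inline case: address is `this` + constant offset, no load needed.
    // Heap case: one load of heap_. The select depends only on cap_.
    T* data() {
        T* inline_ptr = reinterpret_cast<T*>(inline_);
        return is_inline() ? inline_ptr : heap_;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }

    void push_back(const T& v) {
        T tmp = v;  // copy first in case v refers to an element of this vector
        if (size_ == cap_) grow();
        ::new (static_cast<void*>(data() + size_)) T(tmp);
        ++size_;
    }
};
```

With `SmallVec<int, 4>`, the first four `push_back` calls stay in the inline buffer and the fifth moves the elements to the heap. The contrast is with a layout that always keeps a `T* begin_` member and dereferences it on every access, which costs a dependent load even when the elements sit right next to that pointer.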
@debasishg yeah looking at the blog post numbers it's not clear how to explain what I've seen. Somehow my experiments in practice showed over and over again that false sharing is surprisingly not the problem, but there may be more at play here.