On Sun, Feb 08, 2015 at 12:02:37PM +0000, Daniel Thompson wrote:
> Currently sched_clock(), a very hot code path, is not optimized to minimise its cache profile. In particular:
>
>   1. cd is not ____cacheline_aligned;
>   2. struct clock_data does not distinguish between hotpath and coldpath data, reducing locality of reference in the hotpath;
>   3. some hotpath data is missing from struct clock_data and is marked __read_mostly (which more or less guarantees it will not share a cache line with cd).
>
> This patch corrects these problems by extracting all hotpath data into a separate structure and using ____cacheline_aligned to ensure the hotpath uses a single 64-byte cache line.

Have you got any performance figures for this change, or is this just a theoretical optimisation? It would be interesting to see what effect this has on systems with 32-byte cachelines and also scenarios where there's contention on the sequence counter.
Will