Hi Tony,
On 3/18/2024 12:34 PM, Luck, Tony wrote:
While that is in some ways a more accurate view, it breaks a lot of legacy monitoring applications that expect the "L3" names.
True - but the behaviour is different from a non-SNC system. If this software can read the file but goes wrong because the contents of the file represent something different, it's still broken.
This is a good point. There is also /sys/fs/resctrl/info/L3_MON to consider, and trying to think what to do about that makes me go in circles about when user space may expect resctrl to indicate the resource and when it may expect resctrl to indicate the scope. For example, /sys/fs/resctrl/mon_data/mon_L3_00 contains files with data that monitor the "L3" _resource_, no? If we change that to /sys/fs/resctrl/mon_data/mon_NODE_00 then the meaning of the middle term switches to "scope" while the directory still contains the monitoring data of the "L3" resource. So does that mean user space would need to rely on /sys/fs/resctrl/info/L3_MON to learn which monitoring files (/sys/fs/resctrl/info/L3_MON/mon_features) are related to the particular resource, then match those filenames with the filenames in /sys/fs/resctrl/mon_data/mon_NODE_00 to know which resource they apply to, and learn from the directory name at what scope the measurement is taken?
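To make the question concrete, here is a rough user-space sketch of that two-step lookup. It is only an illustration under assumptions: the mon_NODE_00 directory name is the hypothetical naming discussed above, not an existing interface, and the helper names events_by_resource() and describe_mon_dir() are made up for this example.

```python
from pathlib import Path


def events_by_resource(root):
    """Map each event name to the resource whose info/<RES>_MON/mon_features lists it."""
    owner = {}
    for info_dir in sorted((Path(root) / "info").glob("*_MON")):
        resource = info_dir.name[: -len("_MON")]  # "L3_MON" -> "L3"
        features = info_dir / "mon_features"
        if features.is_file():
            for event in features.read_text().split():
                owner[event] = resource
    return owner


def describe_mon_dir(root, mon_dir_name):
    """Return (scope, domain_id, {event: resource}) for one mon_data directory.

    Assumes directory names of the form mon_<SCOPE>_<ID>, for example
    mon_L3_00 today or the hypothetical mon_NODE_00.
    """
    _, scope, domain_id = mon_dir_name.split("_", 2)
    owner = events_by_resource(root)
    mon_dir = Path(root) / "mon_data" / mon_dir_name
    events = {
        p.name: owner.get(p.name, "unknown")
        for p in sorted(mon_dir.iterdir())
        if p.is_file()
    }
    return scope, int(domain_id), events
```

With root set to /sys/fs/resctrl, describe_mon_dir(root, "mon_NODE_00") would report the scope from the directory name and the resource from the info directory, which is exactly the indirection user space would have to learn.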
Reinette,
It's both a wave and a particle, depending on the observer.
In SNC systems, resources on each socket are divided into 2, 3, or 4 nodes. But the division is complicated. Memory and CPU cores are easy: each is assigned to an SNC node. The cache is more complicated. The hash function for memory address to cache index is the part that is SNC aware, so memory on SNC node1 will allocate in the cache indices assigned to SNC node1. But that function has to be independent of which CPU is doing the access. That's why I keep mentioning "well behaved" NUMA applications when talking about SNC.
So the resctrl monitoring operations still work on the L3 cache, but in SNC mode they work on a portion of the L3 cache. As long as all accesses are NUMA local you can think of the cache as partitioned between the SNC nodes.
But not everything is well behaved from a NUMA perspective. It would be misleading to describe the occupancy and bandwidth as belonging to an SNC node.
It's also a bit misleading to describe them in terms of an L3 cache instance. But doing so doesn't require application changes.
What is the use case for needing to expose the individual cluster counts? What if resctrl just summed the cluster counts and presented the data as before - per L3 cache instance? I doubt that resctrl would be what applications would use to verify whether they are "well behaved" wrt NUMA.
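As a rough illustration of the summing idea (plain Python, not kernel code; the function name, the node-to-L3 mapping, and the counter values are all made up for the example):

```python
from collections import defaultdict


def sum_per_l3(node_counts, node_to_l3):
    """Fold per-SNC-node counter values into one total per L3 cache instance."""
    totals = defaultdict(int)
    for node, count in node_counts.items():
        totals[node_to_l3[node]] += count
    return dict(totals)


# Hypothetical two-socket system in SNC-2 mode: nodes 0 and 1 share L3
# instance 0, nodes 2 and 3 share L3 instance 1.
node_to_l3 = {0: 0, 1: 0, 2: 1, 3: 1}
node_counts = {0: 100, 1: 50, 2: 7, 3: 3}
print(sum_per_l3(node_counts, node_to_l3))  # {0: 150, 1: 10}
```

User space would then keep seeing one value per L3 cache instance, as on a non-SNC system, at the cost of hiding the per-cluster breakdown.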
Reinette