Hi guys,
On 07/03/2024 23:16, Tony Luck wrote:
On Thu, Mar 07, 2024 at 02:39:08PM -0800, Reinette Chatre wrote:
Thank you for the example. I find that significantly easier to understand than a single number in a generic "nodes_per_l3_cache". Especially with potential confusion surrounding inconsistent "nodes" between allocation and monitoring.
How about domain_cpu_list and domain_cpu_map ?
Like this (my test system doesn't have SNC, so all domains are the same):
$ cd /sys/fs/resctrl/info/
$ grep . */domain*
L3/domain_cpu_list:0: 0-35,72-107
L3/domain_cpu_list:1: 36-71,108-143
L3/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
L3/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
L3_MON/domain_cpu_list:0: 0-35,72-107
L3_MON/domain_cpu_list:1: 36-71,108-143
L3_MON/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
L3_MON/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
MB/domain_cpu_list:0: 0-35,72-107
MB/domain_cpu_list:1: 36-71,108-143
MB/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
MB/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
This duplicates the information in /sys/devices/system/cpu/cpuX/cache/indexY ... is this really because that information is, er, wrong on SNC systems? Is it possible to fix that?
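For anyone comparing the two views by hand: the domain_cpu_list and domain_cpu_map lines in the quoted output are two encodings of the same CPU set, in the same list/mask style as shared_cpu_list and shared_cpu_map under cache/indexY. Below is a rough Python sketch of the conversion, not anything taken from the proposed patch. The helper names are made up for illustration, and nr_cpus=144 is my assumption, inferred from the highest CPU number (143) and the 4-digit top word in the example.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-35,72-107" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare number like "7" is treated as the one-element range 7-7.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def format_cpu_map(cpus, nr_cpus):
    """Render a CPU set as comma-separated 32-bit hex words, highest word first.

    nr_cpus fixes the total width, so the leftmost word may be shorter
    than 8 hex digits (assumption: nr_cpus matches the system that
    produced the map being compared against).
    """
    mask = sum(1 << cpu for cpu in cpus)
    digits = (nr_cpus + 3) // 4
    hex_str = format(mask, "0{}x".format(digits))
    # Split into groups of 8 hex digits, starting from the low-order end.
    groups = []
    while hex_str:
        groups.append(hex_str[-8:])
        hex_str = hex_str[:-8]
    return ",".join(reversed(groups))


if __name__ == "__main__":
    # Values copied from the example output quoted above.
    for cpu_list in ("0-35,72-107", "36-71,108-143"):
        print(cpu_list, "->", format_cpu_map(parse_cpu_list(cpu_list), 144))
```

Feeding it the two lists from the example reproduces the two maps shown, so a script along these lines could diff the resctrl files against the cache/indexY ones on an SNC machine.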
From Tony's earlier description of how SNC changes things, the MB controls remain per-socket. To me it feels less invasive to fix the definition of L3 on these platforms to describe how it behaves (assuming that is possible), and define a new 'MB' that is NUMA scoped. This direction of redefining L3 means /sys/fs/resctrl and /sys/devices have different views of 'the' cache hierarchy.
(I also think that this may be over the threshold on 'funny machines look funny' - but I bet someone builds an arm machine that looks like this too!)
Thanks,
James