Re: [PATCH 16/24] selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test

31 Oct 2023

On 2023-10-27 at 15:32:58 +0300, Ilpo Järvinen wrote:
...
On Fri, 27 Oct 2023, Maciej Wieczór-Retman wrote:
...
On 2023-10-24 at 12:26:26 +0300, Ilpo Järvinen wrote:
...

ksft_print_msg("%s Check cache miss rate within %lu%%\n",
       ret ? "Fail:" : "Pass:", max_diff_percent);

ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
	       ret ? "Fail:" : "Pass:", (float)min_diff_percent);

Shouldn't "Fail" and "Pass" be flipped in the ternary operator? Or should the
"<" in the condition above be ">"?
I must not touch the ret ? "Fail:" : "Pass:" logic; it is the correct way
around. If I touched it, it would break what the calling code assumes about
the return value.
(More explanation below.)
...
Now it looks like if (avg_diff * 100) is smaller than the min_diff_percent the
test is supposed to fail but the text suggests it's the other way around.
I also ran this selftest and that's the output:
# Pass: Check cache miss rate changed more than 3.0%
# Percent diff=45.8
# Number of bits: 4
# Average LLC val: 322489
# Cache span (lines): 294912
# Pass: Check cache miss rate changed more than 2.0%
# Percent diff=38.0
# Number of bits: 3
# Average LLC val: 445005
# Cache span (lines): 221184
# Pass: Check cache miss rate changed more than 1.0%
# Percent diff=27.2
# Number of bits: 2
# Average LLC val: 566145
# Cache span (lines): 147456
# Pass: Check cache miss rate changed more than 0.0%
# Percent diff=18.3
# Number of bits: 1
# Average LLC val: 669657
# Cache span (lines): 73728
ok 1 CAT: test
The diff percentages are much larger than the thresholds they're supposed to
be within and the test is passed.
No, this patch changes the whole test logic dramatically, and the
failure logic is now reversed because of it. Note how I also altered these
things:

MAX_DIFF_PERCENT -> MIN_DIFF_PERCENT_PER_BIT
max_diff_percent -> min_diff_percent
"cache miss rate within" -> "cache miss rate changed more than"

The new CAT test measures the # of cache misses (or, in the case of the L2
CAT test, LLC accesses, which are used as a proxy for L2 misses). Then it
takes one bit away from the allocation mask and repeats the measurement.
If the # of LLC misses changes by more than min_diff_percent when the
number of bits in the allocation is changed, it is a strong indicator that
CAT is working as it should. Based on your numbers above, I'm extremely
confident CAT works as expected!
I know for a fact that when the selftest is bound to a wrong resource id
(which actually occurs on Broadwell systems with CoD enabled without one of
the later patches in this series), this test is guaranteed to fail, because
no noticeable difference in LLC misses is measured in that case.
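To illustrate the check described above, here is a minimal sketch. It is not the selftest's actual code: the helper name cat_step_fails() and its signature are hypothetical, and the 2.0% threshold for 3 bits is simply read off the log output quoted earlier. It returns true on failure, which matches the ret ? "Fail:" : "Pass:" convention.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the resctrl selftests).
 * Returns true when the step should be reported as "Fail:", i.e. when the
 * average LLC value changed by LESS than min_diff_percent after one bit was
 * removed from the allocation mask.
 */
static bool cat_step_fails(long prev_avg_llc, long avg_llc,
			   float min_diff_percent)
{
	float avg_diff;

	/* Nothing to compare against on the first measurement. */
	if (!prev_avg_llc)
		return false;

	/* Relative change versus the previous (one bit larger) mask. */
	avg_diff = (float)(avg_llc - prev_avg_llc) / prev_avg_llc;

	/* Too small a change suggests the mask had no effect. */
	return (avg_diff * 100) < min_diff_percent;
}
```

With the averages from the log above (322489 at 4 bits, 445005 at 3 bits) the change is roughly 38%, so the helper reports a pass. If the two averages are equal, as one would expect when the test is bound to the wrong resource id, it reports a failure for any threshold above zero.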
Thanks for explaining. Looking at it again the patch makes sense and seems very
coherent.
-- 
Kind regards
Maciej Wieczór-Retman
