On 2023-10-27 at 15:32:58 +0300, Ilpo Järvinen wrote:
On Fri, 27 Oct 2023, Maciej Wieczór-Retman wrote:
On 2023-10-24 at 12:26:26 +0300, Ilpo Järvinen wrote:
-	ksft_print_msg("%s Check cache miss rate within %lu%%\n",
-		       ret ? "Fail:" : "Pass:", max_diff_percent);
+	ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
+		       ret ? "Fail:" : "Pass:", (float)min_diff_percent);
Shouldn't "Fail" and "Pass" be flipped in the ternary operator? Or should the condition sign above, "<", be ">"?
I must not touch the ret ? "Fail:" : "Pass:" logic; it's the correct way around. If I touched it, it would break what the calling code assumes about the return value.
(More explanation below).
Now it looks like the test is supposed to fail if (avg_diff * 100) is smaller than min_diff_percent, but the message text suggests it's the other way around.
I also ran this selftest, and this is the output:
# Pass: Check cache miss rate changed more than 3.0%
# Percent diff=45.8
# Number of bits: 4
# Average LLC val: 322489
# Cache span (lines): 294912
# Pass: Check cache miss rate changed more than 2.0%
# Percent diff=38.0
# Number of bits: 3
# Average LLC val: 445005
# Cache span (lines): 221184
# Pass: Check cache miss rate changed more than 1.0%
# Percent diff=27.2
# Number of bits: 2
# Average LLC val: 566145
# Cache span (lines): 147456
# Pass: Check cache miss rate changed more than 0.0%
# Percent diff=18.3
# Number of bits: 1
# Average LLC val: 669657
# Cache span (lines): 73728
ok 1 CAT: test
The diff percentages are much larger than the thresholds they're supposed to stay within, yet the test passes.
No, this patch changes the whole test logic dramatically, and the failure logic is now reversed because of it. Note how I also altered these things:
- MAX_DIFF_PERCENT -> MIN_DIFF_PERCENT_PER_BIT
- max_diff_percent -> min_diff_percent
- "cache miss rate within" -> "cache miss rate changed more than"
The new CAT test measures the number of cache misses (or, in the case of the L2 CAT test, LLC accesses, which are used as a proxy for L2 misses). Then it takes one bit away from the allocation mask and repeats the measurement.
If the number of LLC misses changes by more than min_diff_percent when the number of bits in the allocation is changed, that is a strong indicator that CAT is working as it should. Based on your numbers above, I'm extremely confident CAT works as expected!
I know for a fact that when the selftest is bound to the wrong resource ID (which actually happens on Broadwell systems with CoD enabled, without one of the later patches in this series), this test is guaranteed to fail 100% of the time; no noticeable difference in LLC misses is measured in that case.
Thanks for explaining. Looking at it again the patch makes sense and seems very coherent.