Hi Shuah,
On 9/5/24 11:06 AM, Shuah Khan wrote:
On 9/4/24 06:54, Ilpo Järvinen wrote:
On Wed, 4 Sep 2024, Shuah Khan wrote:
On 9/4/24 06:18, Ilpo Järvinen wrote:
On Tue, 3 Sep 2024, Shuah Khan wrote:
On 9/3/24 08:45, Ilpo Järvinen wrote:
This series first generalizes resctrl selftest non-contiguous CAT check to not assume non-AMD vendor implies Intel. Second, it improves selftests such that the use of __cpuid_count() does not lead into a build failure (happens at least on ARM).
While ARM does not currently support resctrl features, there's an ongoing work to enable resctrl support also for it on the kernel side. In any case, a common header such as kselftest.h should have a proper fallback in place for what it provides, thus it seems justified to fix this common level problem on the common level rather than e.g. disabling build for resctrl selftest for archs lacking resctrl support.
I've dropped reviewed and tested by tags from the last patch in v3 due to major changes into the makefile logic. So it would be helpful if Muhammad could retest with this version.
Acquiring ARCH in lib.mk will likely allow some cleanup into some subdirectory makefiles but that is left as future work because this series focuses in fixing cpuid/build.
v4:
- New patch to reorder x86 selftest makefile to avoid clobbering CFLAGS
(would cause __cpuid_count() related build fail otherwise)
I don't like the way this patch series is mushrooming. I am not convinced that changes to lib.mk and x86 Makefile are necessary.
I didn't like it either what I found from the various makefiles. I think there are many things done which conflict with what lib.mk seems to try to do.
Some of it by desig. lib.mk offers framework for common things. There are provisions to override like in the case of x86, powerpc. lib.mk tries to be flexible as well.
I tried to ask in the first submission what test I should use in the header file as I'm not very familiar with how arch specific is done in userspace in the first place nor how it should be done within kselftest framework.
Thoughts on cpuid:
- It is x86 specific. Moving this to kselftest.h was done to avoid
duplicate. However now we are running into arm64/arm compile errors due to this which need addressing one way or the other.
I have some ideas on how to solve this - but I need answers to the following questions.
This is a question for you and Usama.
- Does resctrl run on arm64/arm and what's the output?
- Can all other tests in resctrl other tests except
noncont_cat_run_test?
- If so send me the output.
Hi Shuah,
As mentioned in my coverletter above, resctrl does not currently support arm but there's an ongoing work to add arm support. On kernel side it requires major refactoring to move non-arch specific stuff out from arch/x86 so has (predictably) taken long time.
The resctrl selftests are mostly written in arch independent way (*) but there's also a way to limit a test only to CPUs from a particular vendor. And now this noncont_cat_run_test needs to use cpuid only on Intel CPUs (to read the supported flag), it's not needed even on AMD CPUs as they always support non-contiguous CAT bitmask.
So to summarize, it would be possible to disable resctrl test for non-x86 but it does not address the underlying problem with cpuid which will just come back later I think.
Alternatively, if there's some a good way in C code to do ifdeffery around that cpuid call, I could make that too, but I need to know which symbol to use for that ifdef.
(*) The cache topology may make some selftest unusable on new archs but not the selftest code itself.
I agree that suppressing resctrl build is not a solution. The real problem is is in defining __cpuid_count() in common code path.
I fixed it and send patch in. As I was testing I noticed the following on AMD platform:
- it ran the L3_NONCONT_CAT test which is expected.
# # Starting L3_NONCONT_CAT test ... # # Mounting resctrl to "/sys/fs/resctrl" # ARCH_AMD - supports non-contiguous CBM # # Write schema "L3:0=ff" to resctrl FS # # Write schema "L3:0=fc3f" to resctrl FS # ok 5 L3_NONCONT_CAT: test
- It went on to run L2_NONCONT_CAT - failed
It is not intended to appear as a failure but instead just skipping of a test since the platform does not support the feature being tested.
# ok 6 # SKIP Hardware does not support L2_NONCONT_CAT or L2_NONCONT_CAT is disabled
The output looks as intended. When I run the test on an Intel system without L2 CAT the output looks the same.
Does it make sense to run both L3_NONCONT_CAT and L2_NONCONT_CAT on AMD? Maybe it is? resctrl checks L3 or L2 support on Intel.
The selftests test the features as exposed by the generic resctrl kernel subsystem instead of relying on its own inventory of what features need to be checked for which vendor. selftests will thus only test L3 or L2 if resctrl kernel subsystem indicates it is supported on underlying platform. Only afterwards may it use platform specific knowledge to help validate the feature. In this scenario resctrl indicated that L2 CAT is not supported by underlying platform and the test was skipped. It looks good to me.
Anyway - the problem is fixed now. Please review and test.
Thank you very much. Will do.
Reinette