[Please ignore this email if it is already reported ] Following build warnings/ errors noticed while building linux mainline master branch with gcc-11 for arm architecture with provided config file.
make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc' olddefconfig make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc' /builds/linux/arch/arm/boot/dts/bcm2711-rpi-4-b.dts:220.10-231.4: Warning (pci_device_reg): /scb/pcie@7d500000/pci@1,0: PCI unit address format error, expected 0,0 /builds/linux/arch/arm/boot/dts/bcm2711-rpi-4-b.dts:220.10-231.4: Warning (pci_device_reg): /scb/pcie@7d500000/pci@1,0: PCI unit address format error, expected 0,0 /builds/linux/net/ipv4/tcp.c: In function 'do_tcp_getsockopt.constprop': /builds/linux/net/ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 4234 | } | ^ cc1: all warnings being treated as errors make[3]: *** [/builds/linux/scripts/Makefile.build:277: net/ipv4/tcp.o] Error 1 /builds/linux/fs/select.c: In function 'do_sys_poll': /builds/linux/fs/select.c:1041:1: error: the frame size of 1264 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 1041 | } | ^ cc1: all warnings being treated as errors make[2]: *** [/builds/linux/scripts/Makefile.build:277: fs/select.o] Error 1 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: net/ipv4] Error 2 make[2]: Target '__build' not remade because of errors. make[1]: *** [/builds/linux/Makefile:1872: fs] Error 2 /builds/linux/net/mac80211/mlme.c: In function 'ieee80211_sta_rx_queued_mgmt': /builds/linux/net/mac80211/mlme.c:4345:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 4345 | } | ^ cc1: all warnings being treated as errors make[3]: *** [/builds/linux/scripts/Makefile.build:277: net/mac80211/mlme.o] Error 1 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: net/mac80211] Error 2 make[2]: Target '__build' not remade because of errors. make[1]: *** [/builds/linux/Makefile:1872: net] Error 2 /builds/linux/drivers/base/test/property-entry-test.c: In function 'pe_test_uint_arrays': /builds/linux/drivers/base/test/property-entry-test.c:250:1: error: the frame size of 1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 250 | } | ^ cc1: all warnings being treated as errors make[4]: *** [/builds/linux/scripts/Makefile.build:277: drivers/base/test/property-entry-test.o] Error 1 make[4]: Target '__build' not remade because of errors. make[3]: *** [/builds/linux/scripts/Makefile.build:540: drivers/base/test] Error 2 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/base] Error 2 /builds/linux/drivers/usb/host/xhci.c: In function 'xhci_reserve_bandwidth': /builds/linux/drivers/usb/host/xhci.c:2867:1: error: the frame size of 1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 2867 | } | ^ cc1: all warnings being treated as errors make[4]: *** [/builds/linux/scripts/Makefile.build:277: drivers/usb/host/xhci.o] Error 1 make[4]: Target '__build' not remade because of errors. make[3]: *** [/builds/linux/scripts/Makefile.build:540: drivers/usb/host] Error 2 /builds/linux/drivers/net/ethernet/qlogic/qede/qede_filter.c: In function 'qede_config_rx_mode': /builds/linux/drivers/net/ethernet/qlogic/qede/qede_filter.c:1278:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 1278 | } | ^ cc1: all warnings being treated as errors make[6]: *** [/builds/linux/scripts/Makefile.build:277: drivers/net/ethernet/qlogic/qede/qede_filter.o] Error 1 make[6]: Target '__build' not remade because of errors. make[5]: *** [/builds/linux/scripts/Makefile.build:540: drivers/net/ethernet/qlogic/qede] Error 2 make[5]: Target '__build' not remade because of errors. make[4]: *** [/builds/linux/scripts/Makefile.build:540: drivers/net/ethernet/qlogic] Error 2 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/usb] Error 2 make[4]: Target '__build' not remade because of errors. make[3]: *** [/builds/linux/scripts/Makefile.build:540: drivers/net/ethernet] Error 2 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/net] Error 2 /builds/linux/drivers/media/dvb-frontends/mxl5xx.c: In function 'config_ts': /builds/linux/drivers/media/dvb-frontends/mxl5xx.c:1575:1: error: the frame size of 1232 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 1575 | } | ^ cc1: all warnings being treated as errors make[4]: *** [/builds/linux/scripts/Makefile.build:277: drivers/media/dvb-frontends/mxl5xx.o] Error 1 make[4]: Target '__build' not remade because of errors. make[3]: *** [/builds/linux/scripts/Makefile.build:540: drivers/media/dvb-frontends] Error 2 make[3]: Target '__build' not remade because of errors. make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/media] Error 2 make[2]: Target '__build' not remade because of errors. make[1]: *** [/builds/linux/Makefile:1872: drivers] Error 2 make[1]: Target '__all' not remade because of errors. make: *** [Makefile:219: __sub-make] Error 2 make: Target '__all' not remade because of errors.
Build config: https://builds.tuxbuild.com/1xjZpYj47LrrGs1OE1xApznPplW/config
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
meta data: ----------- git_describe: v5.14-9687-g27151f177827 git_ref: git_repo: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline git_sha: 27151f177827d478508e756c7657273261aaf8a9 git_short_log: 27151f177827 (\Merge tag 'perf-tools-for-v5.15-2021-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux) kconfig: [ defconfig https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux... https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux... https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux... https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux... https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux... CONFIG_ARM_TI_CPUFREQ=y CONFIG_SERIAL_8250_OMAP=y CONFIG_POSIX_MQUEUE=y CONFIG_OF=y CONFIG_KASAN=y CONFIG_KUNIT=y CONFIG_KUNIT_ALL_TESTS=y ], kernel_version: 5.14.0 target_arch: arm toolchain: gcc-11
steps to reproduce: https://builds.tuxbuild.com/1xjZpYj47LrrGs1OE1xApznPplW/tuxmake_reproducer.s...
-- Linaro LKFT https://lkft.linaro.org
On Mon, Sep 6, 2021 at 5:27 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
/builds/linux/net/ipv4/tcp.c: In function 'do_tcp_getsockopt.constprop': /builds/linux/net/ipv4/tcp.c:4234:1: error: the frame size of 1152
None of these seem to be new.
It looks like this is (yet another) example of some build testing that just never cared about warnings.
Which was the exact problem that the new -Werror flag is all about. All these build test servers that only go "ok, it built, never mind all the warnings".
Everybody tells me that the build servers care about warnings. Everything I actually see tells a very different story indeed.
Linus
[ Added maintainers for various bits and pieces, since I spent the time trying to look at why those bits and pieces wasted stack-space and caused problems ]
On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds torvalds@linux-foundation.org wrote:
None of these seem to be new.
That said, all but one of them seem to be (a) very much worth fixing and (b) easy to fix.
The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of different case statements, many of them with their own struct allocations on stack, and all of them disjoint".
So the fix for that would be the traditional one of just making helper functions for the cases that aren't entirely trivial. We've done that before, and it not only fixes stack usage problems, it results in code that is easier to read, and generally actually performs better too (exactly because it avoids sparse stacks and extra D$ use)
The pe_test_uints() one is the same horrendous nasty Kunit pattern that I fixed in commit 4b93c544e90e ("thunderbolt: test: split up test cases in tb_test_credit_alloc_all") that had an even worse case.
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I detest code that exists for debugging or for testing, and that violates fundamental rules and causes more problems in the process.
The mac802.11 one seems to be due to 'struct ieee802_11_elems' being big, and allocated on the stack. I think it's probably made worse there with inlining, ie
- ieee80211_sta_rx_queued_mgmt() has one copy
- ieee80211_rx_mgmt_beacon() is possibly inlined, and has its own copy
but even if it isn't due to that kind of duplication due to inlining, that code is dangerous. Exactly because it has two nested stack frames with that big structure, and they are active at the same time in the callchain whether inlined or not.
And it's *pointlessly* dangerous, because the 'struct ieee802_11_elems elems' in ieee80211_sta_rx_queued_mgmt() is only used for the IEEE80211_STYPE_ACTION case, so it is entirely disjoint from the IEEE80211_STYPE_BEACON case, and those stack allocations simply should not nest like that in the first place.
Making the IEEE80211_STYPE_ACTION case be its own function - like the other cases - and moving the struct there should fix it. Possibly a "noinline" or two necessary to make sure that the compiler doesn't then undo the "these two cases are disjoint" thing.
The qede_config_rx_mode() has a fairly big 'struct qed_filter_params' structure on stack (it's mainly just a union of two other structures). That one is a bit silly, because the very same function *alsu* does a temporary allocation for the 'uc_macs[]' array, and I think it could have literally made that allocation just do both the params and the uc_macs[] array together.
But that "a bit silly" is actually *doubly* silly, because that big structure allocated for the stack that is actually a union, uses the QED_FILTER_TYPE_RX_MODE type of the union. Which in turn is literally *one*single*enum*field*.
So the qede_config_rx_mode() case uses that chunk of kernel stack for a big union for no good reason. It really only wants two words, but the way the code is written, it uses a lot, because the union also has a 'struct qed_filter_mcast_params' member that has an array of [64][ETH_ALEN] bytes in it.
So that's about 400 bytes of stack space entirely wasted if I read the code correctly.
The xhci_reserve_bandwidth() one is because it has an array of 31 'struct xhci_bw_info' on the stack. It's not a huge structure (six 32-bit words), but when you have 31 of those in an array, it's about 750 bytes right there. It should likely just be dynamically allocated - it doesn't seem to be some super-important critical thing where an allocation cannot be done.
The do_sys_poll() thing is a bit sad. The code has been tweaked to basically use 1kB of stack space in one configuration. It overflows it in a lot of other configs. Using stack space for those kinds of top-level functions that are guaranteed to have an empty stack is pretty much the best possible situation, but it's one where we don't really have a good way to try to have some kind of dynamic feedback from the compiler for how much other stack space it is using.
So that do_sys_poll() case is the only one I see where the stack usage is actually fine and explicitly expected. We *aim* for 1kB of stack, and then in some - probably quite a few - situations we go over.
There are many more of these cases. I've seen Hyper-V allocate 'struct cpumask' on the stack, which is once again an absolute no-no that people have apparently just ignored the warning for. When you have NR_CPUS being the maximum of 8k, those bits add up, and a single cpumask is 1kB in size. Which is why you should never do that on stack, and instead use '
cpumask_var_t mask; alloc_cpumask_var(&mask,..)
which will do a much more reasonable job. But the reason I call out hyperv is that as far as I know, hyperv itself doesn't actually support 8192 CPU's. So all that apic noise with 'struct cpumask' that uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm wrong. Adding hyperv people to the cc too.
A lot of the stack frame size warnings are hidden by the fact that our default value for warning about stack usage is 2kB for 64-bit builds.
Probably exactly because people did things like that cpumask thing, and have these arrays of structures that are often even bigger in the 64-bit world.
Linus
On 9/7/2021 4:14 PM, Linus Torvalds wrote:
There are many more of these cases. I've seen Hyper-V allocate 'struct cpumask' on the stack, which is once again an absolute no-no that people have apparently just ignored the warning for. When you have NR_CPUS being the maximum of 8k, those bits add up, and a single cpumask is 1kB in size. Which is why you should never do that on stack, and instead use '
cpumask_var_t mask; alloc_cpumask_var(&mask,..)
which will do a much more reasonable job. But the reason I call out hyperv is that as far as I know, hyperv itself doesn't actually support 8192 CPU's. So all that apic noise with 'struct cpumask' that uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm wrong. Adding hyperv people to the cc too.
I am only commenting on this because I was looking into an instance of this warning yesterday with Fedora's arm64 config, which has CONFIG_NR_CPUS=4096:
https://lore.kernel.org/r/YTZyMx91zV9kfDkQ@Ryzen-9-3900X.localdomain/
Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y or am I misreading the gigantic comment in include/linux/cpumask.h? As far as I can tell, only x86 selects it and it is not user configurable unless CONFIG_DEBUG_PER_CPU_MAPS is set, which is buried in the debug options so most people won't bother trying to enable it. If I understand correctly, how should these be dealt with in the case of CONFIG_CPUMASK_OFFSTACK=n?
Cheers, Nathan
On Tue, Sep 7, 2021 at 4:35 PM Nathan Chancellor nathan@kernel.org wrote:
Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y
Yes, but..
or am I misreading the gigantic comment in include/linux/cpumask.h?
you're not misreading the comment, but you are missing this important fact:
config NR_CPUS_RANGE_END int depends on X86_64 default 8192 if SMP && CPUMASK_OFFSTACK default 512 if SMP && !CPUMASK_OFFSTACK default 1 if !SMP
so basically you can't choose more than 512 CPU's unless CPUMASK_OFFSTACK is set.
Of course, we may have some bug in the Kconfig elsewhere, and I didn't check other architectures. So maybe there's some way to work around it.
But basically the rule is that CPUMASK_OFFSTACK and NR_CPUS are linked.
That linkage is admittedly a bit hidden and much too subtle. I think the only real reason why it's done that way is because people wanted to do test builds with CPUMASK_OFFSTACK even without having to have some ludicrous number of NR_CPUS.
You'll notice that the question "CPUMASK_OFFSTACK" is only enabled if DEBUG_PER_CPU_MAPS is true.
That whole "for debugging" reason made more sense a decade ago when this was all new and fancy.
It might make more sense to do that very explicitly, and make CPUMASK_OFFSTACK be just something like
config NR_CPUS_RANGE_END def_bool NR_CPUS <= 512
and get rid of the subtlety and choice in the matter.
Linus
On 9/7/2021 4:49 PM, Linus Torvalds wrote:
On Tue, Sep 7, 2021 at 4:35 PM Nathan Chancellor nathan@kernel.org wrote:
Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y
Yes, but..
or am I misreading the gigantic comment in include/linux/cpumask.h?
you're not misreading the comment, but you are missing this important fact:
config NR_CPUS_RANGE_END int depends on X86_64 default 8192 if SMP && CPUMASK_OFFSTACK default 512 if SMP && !CPUMASK_OFFSTACK default 1 if !SMP
so basically you can't choose more than 512 CPU's unless CPUMASK_OFFSTACK is set.
Of course, we may have some bug in the Kconfig elsewhere, and I didn't check other architectures. So maybe there's some way to work around it.
Ah, okay, that is an x86-only limitation so I missed it. I do not think there is any bug with that Kconfig logic but it is only used on x86.
But basically the rule is that CPUMASK_OFFSTACK and NR_CPUS are linked.
That linkage is admittedly a bit hidden and much too subtle. I think the only real reason why it's done that way is because people wanted to do test builds with CPUMASK_OFFSTACK even without having to have some ludicrous number of NR_CPUS.
You'll notice that the question "CPUMASK_OFFSTACK" is only enabled if DEBUG_PER_CPU_MAPS is true.
That whole "for debugging" reason made more sense a decade ago when this was all new and fancy.
It might make more sense to do that very explicitly, and make CPUMASK_OFFSTACK be just something like
config NR_CPUS_RANGE_END def_bool NR_CPUS <= 512
and get rid of the subtlety and choice in the matter.
Indeed. Grepping around the tree, I see that arc, arm64, ia64, powerpc, and sparc64 all support NR_CPUS up to 4096 (8192 for PPC) but none of them select CPUMASK_OFFSTACK so it seems like they should test support for CPUMASK_OFFSTACK and adopt similar logic to x86 to limit how much stack space cpumask variables can use. Like you mentioned, it probably has not come up before because most of those are 64-bit platforms that have a higher default FRAME_WARN value (and the default NR_CPUS values on all of them is small). I only noticed because Fedora sets NR_CPUS to 4096 for arm64 and has a FRAME_WARN value of 1024, meaning two cpumask variables in the same frame puts that frame right at the 1024 limit.
Cheers, Nathan
On Tue, Sep 7, 2021 at 5:15 PM Nathan Chancellor nathan@kernel.org wrote:
Ah, okay, that is an x86-only limitation so I missed it. I do not think there is any bug with that Kconfig logic but it is only used on x86.
Yeah. I think other architectures are basically just buggy, but nobody has ever cared or noticed, because the hardware basically doesn't exist.
People hopefully don't actually configure for the kernel theoretical maximum outside of x86. Even there it's a bit ridiculous, but the hardware with lots and lots of cores at least _has_ existed.
Indeed. Grepping around the tree, I see that arc, arm64, ia64, powerpc, and sparc64 all support NR_CPUS up to 4096 (8192 for PPC) but none of them select CPUMASK_OFFSTACK
I think this one says it all:
arch/arc/Kconfig: range 2 4096
yeah. ARC allows you to configure 4k CPU's.
I think a lot of them have just copied the x86 code (it was 4k long ago), without actually understanding all the details.
That is indeed a strong argument for getting rid of the current much-too-subtle CPUMASK_OFFSTACK situation.
But as the hyperv code shows, even on x86 the "we never allocate a full cpumask on the stack" has gotten lost a bit since the days that Rusty and others actually implemented this all.
Linus
On Tue, Sep 7, 2021 at 6:35 PM Linus Torvalds torvalds@linux-foundation.org wrote:
I think a lot of them have just copied the x86 code (it was 4k long ago), without actually understanding all the details.
Just to put the x86 number in perspective: it was raised to 8192 back in 2013, with the comment
x86/cpu: Increase max CPU count to 8192
The MAXSMP option is intended to enable silly large numbers of CPUs for testing purposes. The current value of 4096 isn't very silly any longer as there are actual SGI machines that approach 6096 CPUs when taking HT into account.
Increase the value to a nice round 8192 to account for this and allow for short term future increases.
so on the x86 side, people have actually done these things.
Other architectures? I think some IBM power9 machines can hit 192 cores (with SMT4 - so NR_CPUS of 768), but I don't think there's been an equivalent of an SGI for anything but x86.
But admittedly I haven't checked or followed those things. I could easily imagine some boutique super-beefy setup.
Linus
On Wed, Sep 8, 2021 at 3:43 AM Linus Torvalds torvalds@linux-foundation.org wrote:
On Tue, Sep 7, 2021 at 6:35 PM Linus Torvalds torvalds@linux-foundation.org wrote:
I think a lot of them have just copied the x86 code (it was 4k long ago), without actually understanding all the details.
Just to put the x86 number in perspective: it was raised to 8192 back in 2013, with the comment
x86/cpu: Increase max CPU count to 8192 The MAXSMP option is intended to enable silly large numbers of CPUs for testing purposes. The current value of 4096 isn't very silly any longer as there are actual SGI machines that approach 6096 CPUs when taking HT into account. Increase the value to a nice round 8192 to account for this and allow for short term future increases.
so on the x86 side, people have actually done these things.
Other architectures? I think some IBM power9 machines can hit 192 cores (with SMT4 - so NR_CPUS of 768), but I don't think there's been an equivalent of an SGI for anything but x86.
But admittedly I haven't checked or followed those things. I could easily imagine some boutique super-beefy setup.
POWER10 was just announced with threads 1920 using SMT8, I think the latest s390 and sparc64 (from 2017) are in the same ballpark when using SMT. The largest arm64 I know of was ThunderX3 with 768 threads on dual-socket machines. This got cancelled before it was shipped to customers, but it's likely that others will exceed that in the future.
Arnd
On Tue, 2021-09-07 at 16:14 -0700, Linus Torvalds wrote:
The mac802.11 one seems to be due to 'struct ieee802_11_elems' being big, and allocated on the stack. I think it's probably made worse there with inlining, ie
- ieee80211_sta_rx_queued_mgmt() has one copy
- ieee80211_rx_mgmt_beacon() is possibly inlined, and has its own copy
but even if it isn't due to that kind of duplication due to inlining, that code is dangerous. Exactly because it has two nested stack frames with that big structure, and they are active at the same time in the callchain whether inlined or not.
And it's *pointlessly* dangerous, because the 'struct ieee802_11_elems elems' in ieee80211_sta_rx_queued_mgmt() is only used for the IEEE80211_STYPE_ACTION case, so it is entirely disjoint from the IEEE80211_STYPE_BEACON case, and those stack allocations simply should not nest like that in the first place.
Making the IEEE80211_STYPE_ACTION case be its own function - like the other cases - and moving the struct there should fix it. Possibly a "noinline" or two necessary to make sure that the compiler doesn't then undo the "these two cases are disjoint" thing.
Yeah, I'm aware, and I agree. We've been looking at it every now and then. This got made worse by us actually adding a fair amount of pointers to the struct recently (in this merge window).
Ultimately, every new spec addition ends up needing to add something there, so I think ultimately we'll probably want to either dynamically allocate it somewhere (perhaps in a data structure used here already), or possibly not have this at all and just find a way to return only the bits that are interesting. Even parsing a ~1k frame (typical, max ~2k) a handful of times is probably not even worse than having this large a structure that gets filled data that's probably useless in many cases (I think the different cases all just need a subset). But not sure, I'll take a look.
johannes
On Tue, Sep 07, 2021 at 04:14:24PM -0700, Linus Torvalds wrote:
[ Added maintainers for various bits and pieces, since I spent the time trying to look at why those bits and pieces wasted stack-space and caused problems ]
On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds torvalds@linux-foundation.org wrote:
[...]
There are many more of these cases. I've seen Hyper-V allocate 'struct cpumask' on the stack, which is once again an absolute no-no that people have apparently just ignored the warning for. When you have NR_CPUS being the maximum of 8k, those bits add up, and a single cpumask is 1kB in size. Which is why you should never do that on stack, and instead use '
cpumask_var_t mask; alloc_cpumask_var(&mask,..)
which will do a much more reasonable job. But the reason I call out hyperv is that as far as I know, hyperv itself doesn't actually support 8192 CPU's. So all that apic noise with 'struct cpumask' that uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm wrong. Adding hyperv people to the cc too.
A lot of the stack frame size warnings are hidden by the fact that our default value for warning about stack usage is 2kB for 64-bit builds.
Probably exactly because people did things like that cpumask thing, and have these arrays of structures that are often even bigger in the 64-bit world.
Thanks for the heads-up. I found one instance of this bad practice in hv_apic.c. Presumably that's the one you were referring to.
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
Wei.
Linus
From: Wei Liu
Sent: 08 September 2021 11:03
...
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
Isn't the IPI code something that is likely to get called when a lot of stack has already been used?
So you really shouldn't be using much stack at all??
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Sep 08, 2021 at 02:51:21PM +0000, David Laight wrote:
From: Wei Liu
Sent: 08 September 2021 11:03
...
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
Isn't the IPI code something that is likely to get called when a lot of stack has already been used?
So you really shouldn't be using much stack at all??
I don't follow your questions. I don't dispute there is a problem. I just think calling into the allocator is not a good idea in that particular piece of code we need to fix.
Hopefully we can come up with a solution to remove need for a cpumask in that code -- discussion is on-going.
Wei.
David
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
From: Wei Liu
Sent: 08 September 2021 16:24
On Wed, Sep 08, 2021 at 02:51:21PM +0000, David Laight wrote:
From: Wei Liu
Sent: 08 September 2021 11:03
...
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
Isn't the IPI code something that is likely to get called when a lot of stack has already been used?
So you really shouldn't be using much stack at all??
I don't follow your questions. I don't dispute there is a problem. I just think calling into the allocator is not a good idea in that particular piece of code we need to fix.
Hopefully we can come up with a solution to remove need for a cpumask in that code -- discussion is on-going.
I'm pretty sure the IPI interrupt is high priority so can nest with another interrupt. (You certainly want it to be that way.)
So the kernel may already be running on the interrupt stack. If the interrupted ISR code has used a lot of stack then there may not be as much left as you might expect.
Many years ago (nearly 40!) I wrote something that did static stack depth analysis for an embedded system. Since there were no (interesting) indirect calls an no recursion it wasn't completely impossible. What it showed was that the deepest stack use was in error trace paths that probably never happened. I suspect the same is true for Linux - the deepest points will be inside printk() in obscure error paths. Get an IPI while in a printk() from deep inside an ISR and you may not have the amount of stack you expect.
It might be possible to use the clang 'control flow integrity' information to determine the actual maximum stack use even for indirectly called functions. I suspect that would be an eye-opener....
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Sep 8, 2021 at 3:03 AM Wei Liu wei.liu@kernel.org wrote:
Thanks for the heads-up. I found one instance of this bad practice in hv_apic.c. Presumably that's the one you were referring to.
Yeah, that must have been the one I saw.
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
In other places, the options have been fairly straightforward:
(a) avoid the allocation entirely.
I think the main reason hyperv does it is because it wants to clear the "current cpu" from the cpumask for the ALL_BUT_SELF case, and if you can just instead keep track of that "all but self" bit separately and pass it down the call chain, you may not need that allocation at all.
(b) use a static percpu allocation
The IPI code generally wants interrupts disabled anyway, or it certainly can just do the required preemption disable to make sure that it has exclusive access to a percpu allocation.
(c) take advantage of any hypervisor limitations
If hyperv is limited to some smaller number of CPU's - google seems to imply 240 - maybe you can keep such a smaller allocation on the stack, but just verify that the incoming cpumask is less than whatever the hyperv limit is.
That (c) is obviously the worst choice in some sense, but in some cases limiting yourself to simplify things isn't wrong.
I suspect the percpu allocation model is the easiest one. It's what other IPI code does. See for example
arch/x86/kernel/apic/x2apic_cluster.c: __x2apic_send_IPI_mask()
and that percpu 'ipi_mask' thing.
That said, if you are already just iterating over the mask, doing (a) may be trivial. No allocation at all is even better than a percpu one.
I'm sure there are other options. The above is just the obvious ones that come to mind.
Linus
On Wed, Sep 08, 2021 at 08:59:36AM -0700, Linus Torvalds wrote:
On Wed, Sep 8, 2021 at 3:03 AM Wei Liu wei.liu@kernel.org wrote:
Thanks for the heads-up. I found one instance of this bad practice in hv_apic.c. Presumably that's the one you were referring to.
Yeah, that must have been the one I saw.
However calling into the allocator from that IPI path seems very heavy weight. I will discuss with fellow engineers on how to fix it properly.
In other places, the options have been fairly straightforward:
(a) avoid the allocation entirely.
I think the main reason hyperv does it is because it wants to clear the "current cpu" from the cpumask for the ALL_BUT_SELF case, and if you can just instead keep track of that "all but self" bit separately and pass it down the call chain, you may not need that allocation at all.
[..]
That said, if you are already just iterating over the mask, doing (a) may be trivial. No allocation at all is even better than a percpu one.
Yep. I just wrote two patches for this approach.
Wei.
On Tue, 7 Sep 2021, Linus Torvalds wrote:
The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of different case statements, many of them with their own struct allocations on stack, and all of them disjoint".
Any compiler developers here? AFAIK the compiler knows the lifetime of function-local variables, so why not alias the actual memory locations and ranges to minimise stack usage?
bye, //mirabilos
On Wed, Sep 8, 2021 at 6:48 AM Thorsten Glaser t.glaser@tarent.de wrote:
On Tue, 7 Sep 2021, Linus Torvalds wrote:
The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of different case statements, many of them with their own struct allocations on stack, and all of them disjoint".
Any compiler developers here? AFAIK the compiler knows the lifetime of function-local variables, so why not alias the actual memory locations and ranges to minimise stack usage?
At least on my builds, do_tcp_getsockopt() uses less than 512 bytes of stack.
Probably because tcp_zerocopy_receive() is _not_ inlined, by pure luck I suppose.
Perhaps we should use noinline_for_stack here.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e8b48df73c852a48e51754ea98b1e08bf024bb9e..437910c096b202420518c9e5e5cd26b2194d8aa2 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2054,9 +2054,10 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk, }
#define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32 -static int tcp_zerocopy_receive(struct sock *sk, - struct tcp_zerocopy_receive *zc, - struct scm_timestamping_internal *tss) +static noinline_for_stack int +tcp_zerocopy_receive(struct sock *sk, + struct tcp_zerocopy_receive *zc, + struct scm_timestamping_internal *tss) { u32 length = 0, offset, vma_len, avail_len, copylen = 0; unsigned long address = (unsigned long)zc->address;
On Wed, Sep 8, 2021 at 7:50 AM Eric Dumazet edumazet@google.com wrote:
At least on my builds, do_tcp_getsockopt() uses less than 512 bytes of stack.
Probably because tcp_zerocopy_receive() is _not_ inlined, by pure luck I suppose.
Perhaps we should use noinline_for_stack here.
I agree that that is likely a good idea, but I also suspect that the stack growth may be related to other issues. So it being less than 512 bytes for you may be related to other random noise than inlining.
In the past I've seen at least two patterns
(a) not merging stack slots at all
(b) some odd "pattern allocator" problems, where I think gcc ended up re-using previous stack slots if they were the right size, but failing when previous allocations were fragmented
that (a) thing is what -fconserve-stack is all about, and we also used to have (iirc) -fno-defer-pop to avoid having function call argument stacks stick around.
And (b) is one of those "random allocation pattern" things, which depends on the phase of the moon, where gcc ends up treating the stack frame as a series of fixed-size allocations, but isn't very smart about it. Even if some allocations got free'd, they might be surrounded by oithers that didn't, and then gcc wouldn't re-use them if there's a bigger allocation afterwards. And similarly, I don't think gcc ever even joins together two free'd stack frame allocations.
I also wouldn't be surprised at all if some of our hardening flags ended up causing the stack frame reuse to entirely fail. IOW, I could easily see things like INIT_STACK_ALL_ZERO might cause the compiler to initialize all the stack frame allocations "early", so that their lifetimes all overlap.
So it could easily be about very subtle and random code generation choices that just change the order of allocation. A spill in the wrong place, things like that.
Or it could be about not-so-subtle big config option things.
Linus
On Wed, Sep 8, 2021 at 5:49 PM Linus Torvalds torvalds@linux-foundation.org wrote:
On Wed, Sep 8, 2021 at 7:50 AM Eric Dumazet edumazet@google.com wrote: In the past I've seen at least two patterns
(a) not merging stack slots at all
(b) some odd "pattern allocator" problems, where I think gcc ended up re-using previous stack slots if they were the right size, but failing when previous allocations were fragmented
that (a) thing is what -fconserve-stack is all about, and we also used to have (iirc) -fno-defer-pop to avoid having function call argument stacks stick around.
CONFIG_KASAN_STACK leads to (a), and this has been the source of long discussions about whether to turn it off altogether, the current state being that it's disabled for CONFIG_COMPILE_TEST on clang because this can explode the stack usage even further when it starts spilling registers.
gcc also runs into this problem, but at least newer versions at not nearly as bad as they used to be in the past.
Arnd
On 9/7/21 5:14 PM, Linus Torvalds wrote:
[ Added maintainers for various bits and pieces, since I spent the time trying to look at why those bits and pieces wasted stack-space and caused problems ]
On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds torvalds@linux-foundation.org wrote:
None of these seem to be new.
That said, all but one of them seem to be (a) very much worth fixing and (b) easy to fix.
The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of different case statements, many of them with their own struct allocations on stack, and all of them disjoint".
So the fix for that would be the traditional one of just making helper functions for the cases that aren't entirely trivial. We've done that before, and it not only fixes stack usage problems, it results in code that is easier to read, and generally actually performs better too (exactly because it avoids sparse stacks and extra D$ use)
The pe_test_uints() one is the same horrendous nasty Kunit pattern that I fixed in commit 4b93c544e90e ("thunderbolt: test: split up test cases in tb_test_credit_alloc_all") that had an even worse case.
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I detest code that exists for debugging or for testing, and that violates fundamental rules and causes more problems in the process.
Having recently debugged a problem spinlock hold in an invalid context related to debug code, I agree with you on this that test and debug code shouldn't cause problems.
thanks, -- Shuah
On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/7/21 5:14 PM, Linus Torvalds wrote:
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I've previously sent patches to turn off the structleak plugin for any kunit test file to work around this, but only a few of those patches got merged and new files have been added since. It would definitely help to come up with a proper fix, but my structleak-disable hack should be sufficient as a quick fix.
Arnd
On 9/8/21 11:05 AM, Arnd Bergmann wrote:
On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/7/21 5:14 PM, Linus Torvalds wrote:
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I've previously sent patches to turn off the structleak plugin for any kunit test file to work around this, but only a few of those patches got merged and new files have been added since. It would definitely help to come up with a proper fix, but my structleak-disable hack should be sufficient as a quick fix.
Looks like these are RFC patches and the discussion went cold. Let's pick this back up and we can make progress.
https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUg...
thanks, -- Shuah
On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 11:05 AM, Arnd Bergmann wrote:
On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/7/21 5:14 PM, Linus Torvalds wrote:
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I've previously sent patches to turn off the structleak plugin for any kunit test file to work around this, but only a few of those patches got merged and new files have been added since. It would definitely help to come up with a proper fix, but my structleak-disable hack should be sufficient as a quick fix.
Looks like these are RFC patches and the discussion went cold. Let's pick this back up and we can make progress.
https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUg...
I can try to get the patch reapplying and send it out (I just figured that Arnd or Kees would want to send it out :-) since it was your idea).
I definitely agree that in the cases where KUnit is not actually contributing to blowing the stack - struct leak just thinks it is, this is fine; however, it sounds like Linus' concerns with KUnit's macros go deeper than this. Arnd, I think you sketched out a way to make the KUNIT_* macros take up less space, but after some investigation we found that it was pretty inflexible.
Ideally test cases should never get big enough for KUNIT_* macros to be a problem (when they do it is usually an indication that your test case is trying to do too many things); nevertheless, we are still in this situation.
I think I will need to dust off some cobwebs out of my brain to remember why I didn't like the idea of making the KUNIT_* macros take up less stack space.
On Wed, Sep 8, 2021 at 2:25 PM Brendan Higgins brendanhiggins@google.com wrote:
I definitely agree that in the cases where KUnit is not actually contributing to blowing the stack - struct leak just thinks it is, this is fine; however, it sounds like Linus' concerns with KUnit's macros go deeper than this.
I don't mind Kunit tests when they don't cause problems, but one very natural way to use the Kunit test infrastructure does seem to be to just put a lot of them into one function.
And then the individually fairly small structures just add up. Probably mainly in some special configurations (ie together with CONFIG_KASAN_STACK as pointed out by Arnd, but there might be other cases that cause that issue too) where the compiler then doesn't merge stack slots.
I wonder if those 'kunit_assert' structures could be split into two: one part that could be 'static const', and at least shrink the dynamic stack use that way. Because at a minimun, things like type/file/line/format-msg seem to be things that really are just static and const.
Hmm?
Linus
On 9/8/21 3:24 PM, Brendan Higgins wrote:
On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 11:05 AM, Arnd Bergmann wrote:
On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/7/21 5:14 PM, Linus Torvalds wrote:
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I've previously sent patches to turn off the structleak plugin for any kunit test file to work around this, but only a few of those patches got merged and new files have been added since. It would definitely help to come up with a proper fix, but my structleak-disable hack should be sufficient as a quick fix.
Looks like these are RFC patches and the discussion went cold. Let's pick this back up and we can make progress.
https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUg...
I can try to get the patch reapplying and send it out (I just figured that Arnd or Kees would want to send it out :-) since it was your idea).
Brendan,
Would you like to send me the fix with Suggested-by for Arnd or Kees?
thanks, -- Shuah
On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 3:24 PM, Brendan Higgins wrote:
On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 11:05 AM, Arnd Bergmann wrote:
On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/7/21 5:14 PM, Linus Torvalds wrote:
The KUNIT macros create all these individually reasonably small initialized structures on stack, and when you have more than a small handful of them the KUNIT infrastructure just makes the stack space explode. Sometimes the compiler will be able to re-use the stack slots, but it seems to be an iffy proposition to depend on it - it seems to be a combination of luck and various config options.
I have been concerned about these macros creeping in for a while. I will take a closer look and work with Brendan to come with a plan to address it.
I've previously sent patches to turn off the structleak plugin for any kunit test file to work around this, but only a few of those patches got merged and new files have been added since. It would definitely help to come up with a proper fix, but my structleak-disable hack should be sufficient as a quick fix.
Looks like these are RFC patches and the discussion went cold. Let's pick this back up and we can make progress.
https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUg...
I can try to get the patch reapplying and send it out (I just figured that Arnd or Kees would want to send it out :-) since it was your idea).
Brendan,
Would you like to send me the fix with Suggested-by for Arnd or Kees?
So it looks like Arnd's fix was accepted (whether by him or someone else) for property-entry-test and Linus already fixed thunderbolt, so the only remaining of Arnd's patches is for the bitfield test, so I'll resend that one in a bit.
Also, I haven't actually tried Linus' suggestion yet, but the logic is sound and the change *should* be fairly unintrusive - I am going to give that a try and report back (but I will get the bitfield structleak disable patch out first since I already got that applying).
On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins brendanhiggins@google.com wrote:
On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 3:24 PM, Brendan Higgins wrote: Brendan,
Would you like to send me the fix with Suggested-by for Arnd or Kees?
So it looks like Arnd's fix was accepted (whether by him or someone else) for property-entry-test and Linus already fixed thunderbolt, so the only remaining of Arnd's patches is for the bitfield test, so I'll resend that one in a bit.
Also, I haven't actually tried Linus' suggestion yet, but the logic is sound and the change *should* be fairly unintrusive - I am going to give that a try and report back (but I will get the bitfield structleak disable patch out first since I already got that applying).
Looking at my randconfig tree, I find these six instances:
$ git grep -w DISABLE_STRUCTLEAK_PLUGIN drivers/base/test/Makefile:CFLAGS_property-entry-test.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/iio/test/Makefile:CFLAGS_iio-test-format.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/mmc/host/Makefile:CFLAGS_sdhci-of-aspeed.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/thunderbolt/Makefile:CFLAGS_test.o += $(DISABLE_STRUCTLEAK_PLUGIN) lib/Makefile:CFLAGS_test_scanf.o += $(DISABLE_STRUCTLEAK_PLUGIN) lib/Makefile:CFLAGS_bitfield_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN) scripts/Makefile.gcc-plugins: DISABLE_STRUCTLEAK_PLUGIN += -fplugin-arg-structleak_plugin-disable scripts/Makefile.gcc-plugins:export DISABLE_STRUCTLEAK_PLUGIN
Sorry for failing to submit these as a proper patch. If you send a new version, I think you need to make sure you cover all of the above, using whichever change you like best.
Arnd
On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann arnd@arndb.de wrote:
On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins brendanhiggins@google.com wrote:
On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 3:24 PM, Brendan Higgins wrote: Brendan,
Would you like to send me the fix with Suggested-by for Arnd or Kees?
So it looks like Arnd's fix was accepted (whether by him or someone else) for property-entry-test and Linus already fixed thunderbolt, so the only remaining of Arnd's patches is for the bitfield test, so I'll resend that one in a bit.
Also, I haven't actually tried Linus' suggestion yet, but the logic is sound and the change *should* be fairly unintrusive - I am going to give that a try and report back (but I will get the bitfield structleak disable patch out first since I already got that applying).
Looking at my randconfig tree, I find these six instances:
$ git grep -w DISABLE_STRUCTLEAK_PLUGIN drivers/base/test/Makefile:CFLAGS_property-entry-test.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/iio/test/Makefile:CFLAGS_iio-test-format.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/mmc/host/Makefile:CFLAGS_sdhci-of-aspeed.o += $(DISABLE_STRUCTLEAK_PLUGIN) drivers/thunderbolt/Makefile:CFLAGS_test.o += $(DISABLE_STRUCTLEAK_PLUGIN) lib/Makefile:CFLAGS_test_scanf.o += $(DISABLE_STRUCTLEAK_PLUGIN) lib/Makefile:CFLAGS_bitfield_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN) scripts/Makefile.gcc-plugins: DISABLE_STRUCTLEAK_PLUGIN += -fplugin-arg-structleak_plugin-disable scripts/Makefile.gcc-plugins:export DISABLE_STRUCTLEAK_PLUGIN
Alright, I incorporated all the above into a patchset that I think is ready to send out, but I had a couple of issues with the above suggestions:
- I could not find a config which causes a stacksize warning for sdhci-of-aspeed. - test_scanf is not a KUnit test. - Linus already fixed the thunderbolt test by breaking up the test cases.
I am going to send out patches for the thunderbolt test and for the sdhci-of-aspeed test for the sake of completeness, but I am not sure if we should merge those two. I'll let y'all decide on the patch review.
I only based the thunderbolt and bitfield test fixes on actual patches from Arnd, but I think Arnd pretty much did all the work here so I am crediting him with a Co-developed-by on all the other patches, so Arnd: please follow up on the other patches with a signed-off-by, unless you would rather me credit you in some other way.
Sorry for failing to submit these as a proper patch. If you send a new version, I think you need to make sure you cover all of the above, using whichever change you like best.
I am still going to try to get Linus' suggestion working since it actually solves the problem, but I would rather get the above suggested fix out there since it is quick and I know it works.
Cheers
On Thu, Sep 16, 2021 at 10:39 PM Brendan Higgins brendanhiggins@google.com wrote:
On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann arnd@arndb.de wrote:
On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins brendanhiggins@google.com wrote:
On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 3:24 PM, Brendan Higgins wrote:
[...]
Alright, I incorporated all the above into a patchset that I think is ready to send out, but I had a couple of issues with the above suggestions:
- I could not find a config which causes a stacksize warning for
sdhci-of-aspeed.
- test_scanf is not a KUnit test.
- Linus already fixed the thunderbolt test by breaking up the test cases.
I am going to send out patches for the thunderbolt test and for the sdhci-of-aspeed test for the sake of completeness, but I am not sure if we should merge those two. I'll let y'all decide on the patch review.
Just in case I missed any interested parties on this thread, I posted my patches here:
https://lore.kernel.org/linux-kselftest/20210917061104.2680133-1-brendanhigg...
I only based the thunderbolt and bitfield test fixes on actual patches from Arnd, but I think Arnd pretty much did all the work here so I am crediting him with a Co-developed-by on all the other patches, so Arnd: please follow up on the other patches with a signed-off-by, unless you would rather me credit you in some other way.
Sorry for failing to submit these as a proper patch. If you send a new version, I think you need to make sure you cover all of the above, using whichever change you like best.
I am still going to try to get Linus' suggestion working since it actually solves the problem, but I would rather get the above suggested fix out there since it is quick and I know it works.
On Fri, Sep 17, 2021 at 8:16 AM Brendan Higgins brendanhiggins@google.com wrote:
On Thu, Sep 16, 2021 at 10:39 PM Brendan Higgins brendanhiggins@google.com wrote:
On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann arnd@arndb.de wrote:
On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins brendanhiggins@google.com wrote:
On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan skhan@linuxfoundation.org wrote:
On 9/8/21 3:24 PM, Brendan Higgins wrote:
[...]
Alright, I incorporated all the above into a patchset that I think is ready to send out, but I had a couple of issues with the above suggestions:
Thanks a lot for those suggestions.
- I could not find a config which causes a stacksize warning for
sdhci-of-aspeed.
I keep a history of my randconfig builds. This one only happened once before I fixed it, it may depend on some other combination of options. See my original defconfig file at https://pastebin.com/raw/XJxjVGYa rand/0xAB2DD5A0-failure:/git/arm-soc/drivers/mmc/host/sdhci-of-aspeed-test.c:47:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
- test_scanf is not a KUnit test.
I have three defconfigs for this one, all on x86-64:
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function 'numbers_list_field_width_val_width': rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:530:1: error: the frame size of 2488 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function 'numbers_list_field_width_typemax': rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:488:1: error: the frame size of 2968 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function 'numbers_list': rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:437:1: error: the frame size of 2488 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
https://pastebin.com/raw/jUdY6d3G is the worst one of those
- Linus already fixed the thunderbolt test by breaking up the test cases.
Ok.
I am going to send out patches for the thunderbolt test and for the sdhci-of-aspeed test for the sake of completeness, but I am not sure if we should merge those two. I'll let y'all decide on the patch review.
Just in case I missed any interested parties on this thread, I posted my patches here:
https://lore.kernel.org/linux-kselftest/20210917061104.2680133-1-brendanhigg...
Thanks! I'll reply to the particular patch as well, but I don't think that this is sufficient here:
+CFLAGS_bitfield_kunit.o := $(call cc-option,-Wframe-larger-than=10240) $(DISABLE_STRUCTLEAK_PLUGIN)
If 10KB is actually needed, this definitely overflows the 8KB stack on 32-bit architectures.
Arnd