On 06/28/12 11:27, the mail apparently from Tom Gall included:
Hi All,
I'm stressing a system with apachebench. As one scales up work on a system, obviously there's always a point where the wheels fall off, the engine explodes or something else exciting happens. But as Han Solo would say ... "hold together baby....", I'd like to eke out as much as I can. (If you're really interested, here's what I'm up to: http://fullshovel.wordpress.com/ start with part 1)
In this case with apachebench, I'm getting the following allocation errors in the kernel and need a little help deciphering. It sure looks like there's plenty of space to swap out; however, if I have this right, we're getting so much network traffic that the kernel gets inundated and it OOMs in the network stack.
I did later try setting sysctl -w vm.min_free_kbytes=32768 but that didn't really seem to help.
The much more complete dmesg dump is located at http://people.linaro.org/~tgall/dmesg-dump.txt
[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20
[127100.245666] [<80100f14>] (__alloc_pages_nodemask+0x678/0x7a4) from [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0)
[127100.245666] [<80695270>] (kmem_getpages.isra.35+0x3c/0xc0) from [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc)
[127100.245666] [<80695380>] (cache_grow.constprop.37+0x8c/0x1fc) from [<8069570c>] (cache_alloc_refill+0x21c/0x274)
[127100.245819] [<8069570c>] (cache_alloc_refill+0x21c/0x274) from [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0)
[127100.245910] [<80132dac>] (__kmalloc_track_caller+0xac/0x1b0) from [<8057a37c>] (__alloc_skb+0x60/0xfc)
[127100.245971] [<8057a37c>] (__alloc_skb+0x60/0xfc) from [<8057a874>] (__netdev_alloc_skb+0x2c/0x54)
[127100.245971] [<8057a874>] (__netdev_alloc_skb+0x2c/0x54) from [<8049dbb8>] (rx_submit+0x2c/0x1d4)
[127100.245971] [<8049dbb8>] (rx_submit+0x2c/0x1d4) from [<8049e1c0>] (rx_complete+0x1a4/0x1b8)
[127100.245971] [<8049e1c0>] (rx_complete+0x1a4/0x1b8) from [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc)
[127100.246246] [<804a5f38>] (usb_hcd_giveback_urb+0xb0/0xfc) from [<804b887c>] (ehci_urb_done+0xb8/0xc4)
[127100.246246] [<804b887c>] (ehci_urb_done+0xb8/0xc4) from [<804bb240>] (qh_completions+0xc8/0x49c)
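For reference, the "page allocation failure" header line can be decoded mechanically. The sketch below is illustrative only: it assumes 4 KiB pages and the GFP bit values from 3.x-era include/linux/gfp.h (later kernels renumbered these), and the helper name decode_failure is made up for this example.

```python
import re

# Assumed GFP bit values, as in 3.x-era include/linux/gfp.h; later kernels differ.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
PAGE_SIZE = 4096  # assumption: 4 KiB pages

FAILURE_RE = re.compile(
    r"page allocation failure: order:(\d+), mode:0x([0-9a-fA-F]+)"
)


def decode_failure(line):
    """Decode order and mode from a failure line; return None if it does not match."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    order = int(m.group(1))
    mode = int(m.group(2), 16)
    flags = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    return {
        "order": order,
        "pages": 1 << order,                   # contiguous pages requested
        "bytes": (1 << order) * PAGE_SIZE,
        "flags": flags,
        "can_sleep": bool(mode & 0x10),        # __GFP_WAIT set?
    }


if __name__ == "__main__":
    line = "[127100.245117] swapper/0: page allocation failure: order:3, mode:0x20"
    print(decode_failure(line))
```

Under those assumptions, the line describes a request for 8 physically contiguous pages (32 KiB) with only __GFP_HIGH set and __GFP_WAIT clear, i.e. an atomic allocation that cannot sleep to reclaim or swap. That would be consistent with the failure occurring even though swap space is available.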
Just some not directly useful extra info...
I noticed these yesterday in dmesg as well while adding the 32K min_free_kbytes in tilt-3.4 as a hack. It seems to be part of some syndrome with the smsc driver and network memory allocation that's in mainline and not Panda-specific. Yesterday I saw on Google the same problems plaguing Raspberry Pi folks.
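One way to check whether fragmentation plays a part is to count how many free blocks of order 3 or higher the buddy allocator still has. A rough sketch, assuming the usual /proc/buddyinfo layout (one line per zone, with free-block counts listed from order 0 upward); the function name and the fallback sample line are invented for illustration:

```python
def free_blocks_at_or_above(buddyinfo_text, min_order):
    """Return {(node, zone): number of free blocks with order >= min_order}."""
    result = {}
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        # Expected shape: "Node", "0,", "zone", "Normal", count0, count1, ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        counts = [int(c) for c in parts[4:]]
        result[(node, zone)] = sum(counts[min_order:])
    return result


if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            text = f.read()
    except OSError:
        # Made-up sample so the sketch also runs off-target.
        text = "Node 0, zone   Normal    512    130     20      0      0      0      0      0      0      0      0\n"
    for key, count in free_blocks_at_or_above(text, 3).items():
        print(key, count)
```

If the count for order 3 and above sits at or near zero while plenty of order-0 pages remain, raising min_free_kbytes alone may not help much, since it raises the free-page watermarks without guaranteeing contiguous blocks.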
When I tried to stress the Panda a week or so ago by cloning gcc with a plan to compile it, it in fact lost sanity during the download with a storm of these kevent lost messages, hence the 32K hack being added.
I also remember the same problems about kevents being dropped getting looked at about a year ago without any solid result. It'll be interesting to see whether anyone understands and can explain what the underlying issue is.
-Andy