On Sun, 2021-01-10 at 17:39 +0200, Mike Rapoport wrote:
On Wed, Jan 06, 2021 at 04:04:21PM -0500, Qian Cai wrote:
On Wed, 2021-01-06 at 10:05 +0200, Mike Rapoport wrote:
I think we trigger PF_POISONED_CHECK() in PageSlab(), then fffffffffffffffe is "accessed" from VM_BUG_ON_PAGE().
It seems to me that we are not initializing struct pages for holes at the node boundaries because zones are already clamped to exclude those holes.
Can you please try to see if the patch below will produce any useful info:
[ 0.000000] init_unavailable_range: spfn: 8c, epfn: 9b, zone: DMA, node: 0
[ 0.000000] init_unavailable_range: spfn: 1f7be, epfn: 1f9fe, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 28784, epfn: 288e4, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 298b9, epfn: 298bd, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 29923, epfn: 29931, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 29933, epfn: 29941, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 29945, epfn: 29946, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 29ff9, epfn: 2a823, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 33a23, epfn: 33a53, zone: DMA32, node: 0
[ 0.000000] init_unavailable_range: spfn: 78000, epfn: 100000, zone: DMA32, node: 0
...
[ 572.222563][ T2302] kpagecount_read: pfn 47f380 is poisoned
...
[ 590.570032][ T2302] kpagecount_read: pfn 47ffff is poisoned
[ 604.268653][ T2302] kpagecount_read: pfn 87ff80 is poisoned
...
[ 604.611698][ T2302] kpagecount_read: pfn 87ffbc is poisoned
[ 617.484205][ T2302] kpagecount_read: pfn c7ff80 is poisoned
...
[ 618.212344][ T2302] kpagecount_read: pfn c7ffff is poisoned
[ 633.134228][ T2302] kpagecount_read: pfn 107ff80 is poisoned
...
[ 633.874087][ T2302] kpagecount_read: pfn 107ffff is poisoned
[ 647.686412][ T2302] kpagecount_read: pfn 147ff80 is poisoned
...
[ 648.425548][ T2302] kpagecount_read: pfn 147ffff is poisoned
[ 663.692630][ T2302] kpagecount_read: pfn 187ff80 is poisoned
...
[ 664.432671][ T2302] kpagecount_read: pfn 187ffff is poisoned
[ 675.462757][ T2302] kpagecount_read: pfn 1c7ff80 is poisoned
...
[ 676.202548][ T2302] kpagecount_read: pfn 1c7ffff is poisoned
[ 687.121605][ T2302] kpagecount_read: pfn 207ff80 is poisoned
...
[ 687.860981][ T2302] kpagecount_read: pfn 207ffff is poisoned
The e820 map has a hole near the end of each node. Since init_unavailable_range() was interleaved with memmap initialization, these holes are no longer initialized, because they are not accounted for in zone->spanned_pages.
Yet I still cannot really understand how this never triggered
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
before v5.7, as all the struct pages for these holes would have zone=0 and node=0 ...
@Qian, can you please boot your system with memblock=debug and share the logs?