On Tue, 15 Sep 2020 at 13:56, Hugh Dickins hughd@google.com wrote:
On Thu, 10 Sep 2020, Matthew Wilcox (Oracle) wrote:
This patch series started out as part of the THP patch set, but it has some nice effects along the way and it seems worth splitting it out and submitting separately.
Currently find_get_entry() and find_lock_entry() return the page corresponding to the requested index, but the first thing most callers do is find the head page, which we just threw away. As part of auditing all the callers, I found some misuses of the APIs and some plain inefficiencies that I've fixed.
The diffstat is unflattering, but I added more kernel-doc and a new wrapper.
v2:
- Rework how shmem_getpage_gfp() handles getting a head page back from find_lock_entry()
- Renamed find_get_swap_page() to find_get_incore_page()
- Make sure find_get_incore_page() doesn't return a head page
- Fix the missing include of linux/shmem_fs.h
- Move find_get_entry and find_lock_entry prototypes to mm/internal.h
- Rename thp_valid_index() to thp_contains()
- Fix thp_contains() for hugetlbfs and swapcache
- Add find_lock_head() wrapper around pagecache_get_page()
Matthew Wilcox (Oracle) (8): mm: Factor find_get_incore_page out of mincore_page mm: Use find_get_incore_page in memcontrol mm: Optimise madvise WILLNEED proc: Optimise smaps for shmem entries i915: Use find_lock_page instead of find_lock_entry mm: Convert find_get_entry to return the head page mm/shmem: Return head page from find_lock_entry mm: Add find_lock_head
While running kselftest mincore tests the following kernel BUG reported on the linux next-20200915 tag on x86_64, i386 and arm64.
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git git commit: 6b02addb1d1748d21dd1261e46029b264be4e5a0 git describe: next-20200915 make_kernelversion: 5.9.0-rc5 kernel-config: http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft...
Test case: ---------------
* Tests the user interface. This test triggers most of the documented * error conditions in mincore(). https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/plain/to...
kernel BUG: ----------------- [ 710.472860] kselftest: Running tests in mincore [ 710.554790] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 710.561765] #PF: supervisor read access in kernel mode [ 710.566920] #PF: error_code(0x0000) - not-present page [ 710.572065] PGD 80000003fd5b9067 P4D 80000003fd5b9067 PUD 45903f067 PMD 0 [ 710.578957] Oops: 0000 [#1] SMP PTI [ 710.582450] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G W K 5.9.0-rc5-next-20200915 #1 [ 710.592094] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 710.599574] RIP: 0010:PageHuge+0x6/0x40 [ 710.603411] Code: c3 0f 1f 00 0f 1f 44 00 00 55 48 89 d6 48 89 e5 e8 ef fe ff ff 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 8b 07 48 89 e5 a9 00 00 01 00 75 09 48 8b 47 08 83 e0 01 74 17 [ 710.622149] RSP: 0018:ffffb0e2002bfcc0 EFLAGS: 00010246 [ 710.627373] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 710.634498] RDX: ffff9f231b2518c0 RSI: ffffffffa272b340 RDI: 0000000000000000 [ 710.641620] RBP: ffffb0e2002bfce8 R08: 0000000000000002 R09: 0000000000000000 [ 710.648747] R10: ffffb0e2002bfb20 R11: ffffffffa272b340 R12: ffff9f23193c5e68 [ 710.655876] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 [ 710.663003] FS: 00007fa4c9ea24c0(0000) GS:ffff9f231fc00000(0000) knlGS:0000000000000000 [ 710.671088] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 710.676824] CR2: 0000000000000000 CR3: 00000004044d0004 CR4: 00000000003706f0 [ 710.683949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 710.691073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 710.698196] Call Trace: [ 710.700644] ? find_get_incore_page+0xc6/0x120 [ 710.705089] mincore_page+0x12/0x60 [ 710.708580] __mincore_unmapped_range+0x78/0xc0 [ 710.713105] mincore_pte_range+0x269/0x300 [ 710.717206] __walk_page_range+0x5ab/0xb60 [ 710.721308] walk_page_range+0xab/0x150 [ 710.725152] __x64_sys_mincore+0x13c/0x330 [ 710.729251] do_syscall_64+0x37/0x50 [ 710.732831] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 710.737882] RIP: 0033:0x7fa4c99be2d7 [ 710.741462] Code: 73 01 c3 48 8b 0d c1 fb 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 1b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 fb 2b 00 f7 d8 64 89 01 48 [ 710.760206] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX: 000000000000001b [ 710.767770] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7 [ 710.774897] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000 [ 710.782020] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000 [ 710.789144] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00 [ 710.796268] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000 [ 710.803395] Modules linked in: sch_fq 8021q iptable_filter xt_mark ip_tables cls_bpf sch_ingress veth algif_hash x86_pkg_temp_thermal fuse [last unloaded: memory_notifier_error_inject] [ 710.819814] CR2: 0000000000000000 [ 710.823128] ---[ end trace 67d1a6d0ea1b24e3 ]--- [ 710.827746] RIP: 0010:PageHuge+0x6/0x40 [ 710.831584] Code: c3 0f 1f 00 0f 1f 44 00 00 55 48 89 d6 48 89 e5 e8 ef fe ff ff 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 <48> 8b 07 48 89 e5 a9 00 00 01 00 75 09 48 8b 47 08 83 e0 01 74 17 [ 710.850322] RSP: 0018:ffffb0e2002bfcc0 EFLAGS: 00010246 [ 710.855546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 710.862672] RDX: ffff9f231b2518c0 RSI: ffffffffa272b340 RDI: 0000000000000000 [ 710.869803] RBP: ffffb0e2002bfce8 R08: 0000000000000002 R09: 0000000000000000 [ 710.876928] R10: ffffb0e2002bfb20 R11: ffffffffa272b340 R12: ffff9f23193c5e68 [ 710.884050] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 [ 710.891175] FS: 00007fa4c9ea24c0(0000) GS:ffff9f231fc00000(0000) knlGS:0000000000000000 [ 710.899253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 710.904990] CR2: 0000000000000000 CR3: 00000004044d0004 CR4: 00000000003706f0 [ 710.912113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 710.919236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 710.926360] note: mincore_selftes[19025] exited with preempt_count 1 [ 710.932704] BUG: sleeping function called from invalid context at /usr/src/kernel/include/linux/percpu-rwsem.h:49 [ 710.942950] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 19025, name: mincore_selftes [ 710.951631] INFO: lockdep is turned off. [ 710.955551] irq event stamp: 190 [ 710.958785] hardirqs last enabled at (189): [<ffffffffa0bda53c>] get_page_from_freelist+0x24c/0x14b0 [ 710.967995] hardirqs last disabled at (190): [<ffffffffa18c7921>] irqentry_enter+0x21/0x50 [ 710.976247] softirqs last enabled at (36): [<ffffffffa1c00308>] __do_softirq+0x308/0x42a [ 710.984419] softirqs last disabled at (11): [<ffffffffa1a00f82>] asm_call_on_stack+0x12/0x20 [ 710.992852] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G D W K 5.9.0-rc5-next-20200915 #1 [ 711.002496] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 711.009967] Call Trace: [ 711.012412] dump_stack+0x7d/0x9f [ 711.015724] ___might_sleep+0x163/0x250 [ 711.019562] __might_sleep+0x4a/0x80 [ 711.023141] exit_signals+0x33/0x2f0 [ 711.026713] do_exit+0xa9/0xcb0 [ 711.029858] ? __x64_sys_mincore+0x13c/0x330 [ 711.034125] rewind_stack_do_exit+0x17/0x20 [ 711.038308] RIP: 0033:0x7fa4c99be2d7 [ 711.041888] Code: 73 01 c3 48 8b 0d c1 fb 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 1b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 fb 2b 00 f7 d8 64 89 01 48 [ 711.060633] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX: 000000000000001b [ 711.068197] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7 [ 711.075321] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000 [ 711.082445] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000 [ 711.089570] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00 [ 711.096693] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000 [ 737.104310] rcu: INFO: rcu_sched self-detected stall on CPU [ 737.109887] rcu: 0-....: (26000 ticks this GP) idle=1a2/1/0x4000000000000000 softirq=102446/102446 fqs=6463 [ 737.119792] (t=26016 jiffies g=173197 q=1522) [ 737.124238] NMI backtrace for cpu 0 [ 737.127731] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G D W K 5.9.0-rc5-next-20200915 #1 [ 737.137376] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 737.144856] Call Trace: [ 737.147309] <IRQ> [ 737.149330] dump_stack+0x7d/0x9f [ 737.152649] nmi_cpu_backtrace+0xa4/0xc0 [ 737.156574] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 737.160931] nmi_trigger_cpumask_backtrace+0x97/0xd0 [ 737.165889] arch_trigger_cpumask_backtrace+0x19/0x20 [ 737.170933] rcu_dump_cpu_stacks+0xbc/0xec [ 737.175025] rcu_sched_clock_irq+0x729/0x9c0 [ 737.179296] ? account_system_index_time+0x112/0x1f0 [ 737.184256] ? tick_sched_do_timer+0x60/0x60 [ 737.188528] update_process_times+0x28/0x60 [ 737.192711] tick_sched_handle.isra.21+0x34/0x50 [ 737.197322] tick_sched_timer+0x6d/0x80 [ 737.201154] __hrtimer_run_queues+0x1d0/0x450 [ 737.205514] hrtimer_interrupt+0xe7/0x240 [ 737.209527] __sysvec_apic_timer_interrupt+0x79/0x1f0 [ 737.214578] asm_call_on_stack+0x12/0x20 [ 737.218504] </IRQ> [ 737.220611] sysvec_apic_timer_interrupt+0x75/0xa0 [ 737.225402] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 737.230541] RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0 [ 737.236017] Code: f6 85 f6 75 3e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 18 85 c0 75 04 eb 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f6 c4 01 75 04 c6 [ 737.254755] RSP: 0018:ffffb0e2002bfc48 EFLAGS: 00000202 [ 737.259980] RAX: 0000000000000101 RBX: ffff9f23167158c0 RCX: 0000000000000000 [ 737.267103] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f23167158c0 [ 737.274230] RBP: ffffb0e2002bfc48 R08: 0000000000000001 R09: 0000000000000000 [ 737.281353] R10: 0000000000000000 R11: 0000000000000000 R12: 00007fa4c9ea1000 [ 737.288478] R13: 00007fa4c9ea1000 R14: ffffb0e2002bfe20 R15: 00007fa4c9ea1000 [ 737.295605] do_raw_spin_lock+0xb6/0xc0 [ 737.299440] _raw_spin_lock+0x37/0x40 [ 737.303098] ? unmap_page_range+0x4a6/0xd00 [ 737.307275] unmap_page_range+0x4a6/0xd00 [ 737.311294] unmap_single_vma+0x7d/0xf0 [ 737.315138] unmap_vmas+0xd4/0x160 [ 737.318544] exit_mmap+0xb1/0x1c0 [ 737.321866] mmput+0x6a/0x130 [ 737.324843] do_exit+0x359/0xcb0 [ 737.328076] rewind_stack_do_exit+0x17/0x20 [ 737.332260] RIP: 0033:0x7fa4c99be2d7 [ 737.335831] Code: Bad RIP value. [ 737.339054] RSP: 002b:00007ffcb103baf8 EFLAGS: 00000203 ORIG_RAX: 000000000000001b [ 737.346613] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa4c99be2d7 [ 737.353743] RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007fa4c9ea6000 [ 737.360867] RBP: 00007ffcb103bc40 R08: 00000000ffffffff R09: 0000000000000000 [ 737.367992] R10: 0000000000000021 R11: 0000000000000203 R12: 0000000000400d00 [ 737.375116] R13: 00007ffcb103be10 R14: 0000000000000000 R15: 0000000000000000 [ 815.107312] rcu: INFO: rcu_sched self-detected stall on CPU [ 815.112890] rcu: 0-....: (103727 ticks this GP) idle=1a2/1/0x4000000000000000 softirq=102446/102446 fqs=25897 [ 815.122966] (t=104019 jiffies g=173197 q=1545) [ 815.127492] NMI backtrace for cpu 0 [ 815.130985] CPU: 0 PID: 19025 Comm: mincore_selftes Tainted: G D W K 5.9.0-rc5-next-20200915 #1 [ 815.140628] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 815.148099] Call Trace: [ 815.150542] <IRQ> [ 815.152556] dump_stack+0x7d/0x9f [ 815.155876] nmi_cpu_backtrace+0xa4/0xc0 [ 815.159799] ? lapic_can_unplug_cpu+0xa0/0xa0 [ 815.164150] nmi_trigger_cpumask_backtrace+0x97/0xd0 [ 815.169108] arch_trigger_cpumask_backtrace+0x19/0x20 [ 815.174151] rcu_dump_cpu_stacks+0xbc/0xec [ 815.178245] rcu_sched_clock_irq+0x729/0x9c0 [ 815.182515] ? account_system_index_time+0x112/0x1f0 [ 815.187473] ? tick_sched_do_timer+0x60/0x60 [ 815.191744] update_process_times+0x28/0x60 [ 815.195922] tick_sched_handle.isra.21+0x34/0x50 [ 815.200533] tick_sched_timer+0x6d/0x80 [ 815.204365] __hrtimer_run_queues+0x1d0/0x450 [ 815.208725] hrtimer_interrupt+0xe7/0x240 [ 815.212738] __sysvec_apic_timer_interrupt+0x79/0x1f0 [ 815.217789] asm_call_on_stack+0x12/0x20 [ 815.221713] </IRQ> [ 815.223811] sysvec_apic_timer_interrupt+0x75/0xa0 [ 815.228597] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 815.233734] RIP: 0010:queued_spin_lock_slowpath+0x41/0x1a0
full test log link, https://lkft.validation.linaro.org/scheduler/job/1765602#L12129 https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20200915/tes... https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20200915/tes...
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org