On Tue, Jan 09, 2024 at 10:17:22AM -0800, John Sperbeck wrote:
With 5.10LTS (e.g., 5.10.206), on a machine using an NVME device, the following tracing commands will trigger a crash due to a NULL pointer dereference:
    KDIR=/sys/kernel/debug/tracing
    echo 1 > $KDIR/tracing_on
    echo 1 > $KDIR/events/nvme/enable
    echo "Waiting for trace events..."
    cat $KDIR/trace_pipe
The backtrace looks something like this:
Call Trace:
 <IRQ>
 ? __die_body+0x6b/0xb0
 ? __die+0x9e/0xb0
 ? no_context+0x3eb/0x460
 ? ttwu_do_activate+0xf0/0x120
 ? __bad_area_nosemaphore+0x157/0x200
 ? select_idle_sibling+0x2f/0x410
 ? bad_area_nosemaphore+0x13/0x20
 ? do_user_addr_fault+0x2ab/0x360
 ? exc_page_fault+0x69/0x180
 ? asm_exc_page_fault+0x1e/0x30
 ? trace_event_raw_event_nvme_complete_rq+0xba/0x170
 ? trace_event_raw_event_nvme_complete_rq+0xa3/0x170
 nvme_complete_rq+0x168/0x170
 nvme_pci_complete_rq+0x16c/0x1f0
 nvme_handle_cqe+0xde/0x190
 nvme_irq+0x78/0x100
 __handle_irq_event_percpu+0x77/0x1e0
 handle_irq_event+0x54/0xb0
 handle_edge_irq+0xdf/0x230
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 common_interrupt+0x9e/0x150
 asm_common_interrupt+0x1e/0x40
It looks to me like these two upstream commits were backported to 5.10:
    679c54f2de67 ("nvme: use command_id instead of req->tag in trace_nvme_complete_rq()")
    e7006de6c238 ("nvme: code command_id with a genctr for use-after-free validation")
But they depend on this upstream commit to initialize the 'cmd' field in some cases:
f4b9e6c90c57 ("nvme: use driver pdu command for passthrough")
Does it sound like I'm on the right track? 5.15 LTS and later seem to be okay.
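One way to check the suspected missing dependency is to search the stable branch's commit messages for a reference to the upstream commit ID, since stable backports normally quote it (for example as "commit <sha> upstream."). The following is only a rough sketch: the helper name mentions_upstream is hypothetical, and the usage line assumes a linux-stable checkout that has the v5.10.206 tag.

```shell
# Hypothetical helper: reads commit messages on stdin and succeeds if any
# of them refers to the given upstream commit ID (abbreviated or full).
# It matches "commit <sha>" case-insensitively, which covers both
# "commit <sha> upstream." and "[ Upstream commit <sha> ]".
mentions_upstream() {
    grep -qiF "commit $1"
}

# Assumed usage, run inside a linux-stable checkout with the v5.10.206 tag:
#   git log v5.10.206 -- drivers/nvme | mentions_upstream f4b9e6c90c57 \
#       && echo "backported" || echo "not found"
```

If the helper reports "not found" for f4b9e6c90c57 while the other two commit IDs are found, that would be consistent with the theory above, though a backport whose message omits the upstream ID would not be detected this way.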
If you apply that commit, does it solve the issue for you?
thanks,
greg k-h