Re: [PATCH] xenbus: Use kref to track req lifetime

7 May 2025

On 06.05.25 23:09, Jason Andryuk wrote:
...
Marek reported seeing a NULL pointer fault in the xenbus_thread
call stack:
BUG: kernel NULL pointer dereference, address: 0000000000000000
RIP: e030:__wake_up_common+0x4c/0x180
Call Trace:
  <TASK>
  __wake_up_common_lock+0x82/0xd0
  process_msg+0x18e/0x2f0
  xenbus_thread+0x165/0x1c0
process_msg+0x18e is req->cb(req).  req->cb is set to xs_wake_up(), a
thin wrapper around wake_up(), or xenbus_dev_queue_reply().  It seems
like it was xs_wake_up() in this case.
It seems that waking req->wq may have let xs_wait_for_reply() run, and
it kfree()d the req.  When xenbus_thread resumes, it faults on the
zeroed data.
Linux Device Drivers, 2nd edition, states:
"Normally, a wake_up call can cause an immediate reschedule to happen,
meaning that other processes might run before wake_up returns."
... which would match the behaviour observed.
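The suspected ordering can be modelled deterministically in userspace. This sketch is not xenbus code. `struct fake_req`, `waiter_run()`, `fake_wake_up()` and `single_owner_completion()` are hypothetical stand-ins. The "immediate reschedule" that LDD describes is simulated by calling the waiter directly from inside the wake-up routine. The waiter does not really free anything, because that would make the demo undefined behaviour. It only marks the request as freed, so the wake-up path's later access to req->wq can be detected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a xenbus request; NOT the real struct. */
struct fake_req {
    bool reply_ready;           /* set by the reader before it wakes the waiter */
    bool freed;                 /* models kfree(req) without really freeing it */
    int wq_touches_after_free;  /* accesses to req->wq after the "free" */
};

/* Models xs_wait_for_reply(): once woken with a reply ready, the waiter
 * assumes it is the sole owner and frees the request. */
static void waiter_run(struct fake_req *req)
{
    assert(!req->freed);        /* catch a simulated double free */
    if (req->reply_ready)
        req->freed = true;      /* stands in for kfree(req) */
}

/* Models wake_up(&req->wq). If the woken task is rescheduled
 * immediately, it runs before this function returns; the wake-up code
 * then still touches the wait queue embedded in req. */
static void fake_wake_up(struct fake_req *req, bool immediate_resched)
{
    if (immediate_resched)
        waiter_run(req);        /* waiter runs "inside" wake_up() */
    if (req->freed)             /* the late access to req->wq */
        req->wq_touches_after_free++;
}

/* Models the reader completing one request that has a single owner.
 * Returns true if the wake-up path touched req after it was "freed". */
static bool single_owner_completion(bool immediate_resched)
{
    struct fake_req req = { .reply_ready = false, .freed = false,
                            .wq_touches_after_free = 0 };

    req.reply_ready = true;
    fake_wake_up(&req, immediate_resched);
    if (!immediate_resched)
        waiter_run(&req);       /* waiter only runs after wake_up() returns */
    return req.wq_touches_after_free > 0;
}
```

Only the run where the waiter is scheduled inside wake_up() reports a late access. The same single-owner code is harmless when the waiter runs afterwards.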
Change to keeping two kref references on each request: one for the
caller and one for xenbus_thread.  Each side calls kref_put() when it is
finished, and the last put frees the request.
This use of kref matches the description in
Documentation/core-api/kref.rst.
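The two-reference rule can be sketched in userspace. Everything here is hypothetical. `struct model_kref` and the `model_kref_*()` helpers only imitate the shape of the kernel's kref_init()/kref_get()/kref_put() using C11 atomics. `struct model_req` is not the xenbus request. The release callback counts its calls in the `releases` variable before calling free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of the kernel's struct kref. */
struct model_kref {
    atomic_int refcount;
};

static int releases; /* counts how many times a release callback ran */

static void model_kref_init(struct model_kref *k) /* like kref_init() */
{
    atomic_init(&k->refcount, 1);
}

static void model_kref_get(struct model_kref *k)  /* like kref_get() */
{
    atomic_fetch_add(&k->refcount, 1);
}

/* Like kref_put(): drop one reference and call release() if it was the
 * last one. Returns true if this call released the object. */
static bool model_kref_put(struct model_kref *k,
                           void (*release)(struct model_kref *))
{
    int old = atomic_fetch_sub(&k->refcount, 1);

    assert(old > 0);            /* a put without a matching reference */
    if (old == 1) {
        release(k);
        return true;
    }
    return false;
}

/* Hypothetical request; the kref is the first member, so a pointer to it
 * can be converted back to the request (a stand-in for container_of()). */
struct model_req {
    struct model_kref kref;
    int reply;
};

static void model_req_release(struct model_kref *k)
{
    struct model_req *req = (struct model_req *)k;

    releases++;
    free(req);
}

/* One reference for the caller (init) plus one taken for the reader
 * thread before the request is handed over (get). */
static struct model_req *model_req_alloc(void)
{
    struct model_req *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    model_kref_init(&req->kref);
    model_kref_get(&req->kref);
    req->reply = 0;
    return req;
}
```

Each side drops only its own reference, so the caller's put can no longer free the request while the reader still holds its reference. In the real code this would presumably mean xenbus_thread puts its reference only after req->cb(req) returns. That placement is an assumption here, not something shown above.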
Link: https://lore.kernel.org/xen-devel/ZO0WrR5J0xuwDIxW@mail-itl/
Reported-by: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Fixes: fd8aa9095a95 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
...

Kinda RFC-ish as I don't know if it fixes Marek's issue.  This does seem
like the correct approach if we are seeing req freed out from under
xenbus_thread.
I think your analysis is correct. When writing this code I didn't think
of wake_up() needing to access req->wq _after_ having woken up the waiter.
Juergen
