On 05.03.24 17:33, Ryan Roberts wrote:
On 05/03/2024 15:50, David Hildenbrand wrote:
On 05.03.24 16:13, Ryan Roberts wrote:
There was previously a theoretical window where swapoff() could run and teardown a swap_info_struct while a call to free_swap_and_cache() was running in another thread. This could cause, amongst other bad possibilities, swap_page_trans_huge_swapped() (called by free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a test case. But there has been agreement based on code review that this is possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall swapoff(). There was an extra check in _swap_info_get() to confirm that the swap entry was valid. This wasn't present in get_swap_device() so I've added it. I couldn't find any existing get_swap_device() call sites where this extra check would cause any false alarms.
Details of how to provoke one possible issue (thanks to David Hilenbrand for deriving this):
Almost
"s/Hilenbrand/Hildenbrand/" :)
Ahh sorry... I even explicitly checked it against your name on emails... fat fingers...
No need to be sorry. Even your average German person would get it wrong, because there are other (more common) variants :)
[...]
LGTM
Are you planning on sending a doc extension for get_swap_device()?
I saw your comment:
We should likely update the documentation of get_swap_device(), that after decrementing the refcount, the SI might become stale and should not be touched without a prior get_swap_device().
But when I went to make the changes, I saw the documentation already said:
...we need to enclose all swap related functions with get_swap_device() and put_swap_device()... Notice that swapoff ... can still happen before the percpu_ref_tryget_live() in get_swap_device() or after the percpu_ref_put() in put_swap_device()... The caller must be prepared for that.
I thought that already covered it? I'm sure as usual, I've misunderstood your point. Happy to respin if you have something in mind?
No need to respin, we could clarify on top, if we decide it makes sense.
I was thinking about something like this, making it clearer that the PTL discussion above does not express the corner case we discovered:
diff --git a/mm/swapfile.c b/mm/swapfile.c index 2b3a2d85e350b..646a436581eee 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1232,6 +1232,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, * with get_swap_device() and put_swap_device(), unless the swap * functions call get/put_swap_device() by themselves. * + * Note that when only holding the PTL, swapoff might succeed immediately + * after freeing a swap entry. Therefore, immediately after + * __swap_entry_free(), the swap info might become stale and should not + * be touched without a prior get_swap_device(). + * * Check whether swap entry is valid in the swap device. If so, * return pointer to swap_info_struct, and keep the swap entry valid * via preventing the swap device from being swapoff, until