Re: [PATCH 1/4] mm, swap: do not perform synchronous discard during allocation

24 Oct 2025

      On Tue, Oct 21, 2025 at 3:34 PM YoungJun Park youngjun.park@lge.com wrote:
...
...
...
Thanks, I was composing a reply on this and just saw your new comment.
I agree with this.
Hmm, it turns out modifying V1 to handle non-order 0 allocation
failure also has some minor issues. Every mTHP SWAP allocation failure
will have a slight higher overhead due to the discard check. V1 is
fine since it only checks discard for order 0, and order 0 alloc
failure is uncommon and usually means OOM already.
Looking at the original proposed patch.

 spin_lock(&swap_avail_lock);

 plist_for_each_entry_safe(si, next, &swap_avail_heads[nid], avail_lists[nid]) {

         spin_unlock(&swap_avail_lock);

         if (get_swap_device_info(si)) {

                 if (si->flags & SWP_PAGE_DISCARD)

                         ret = swap_do_scheduled_discard(si);

                 put_swap_device(si);

         }

         if (ret)

                 break;

if ret is true and we break,
wouldn’t that cause spin_unlock to run without the lock being held?
Thanks for catching this! Right, I need to return directly instead of
break. I've fixed that.
...

         spin_lock(&swap_avail_lock);

 }

 spin_unlock(&swap_avail_lock); <- unlocked without lock grab.

 return ret;

+}
...
I'm not saying V1 is the final solution, but I think maybe we can just
keep V1 as it is? That's easier for a stable backport too, and this is
doing far better than what it was like. The sync discard was added in
2013 and the later added percpu cluster at the same year never treated
it carefully. And the discard during allocation after recent swap
allocator rework has been kind of broken for a while.
To optimize it further in a clean way, we have to reverse the
allocator's handling order of the plist and fast / slow path. Current
order is local_lock -> fast -> slow (plist).
We can walk the plist first, then do the fast / slow path: plist (or
maybe something faster than plist but handles the priority) ->
local_lock -> fast -> slow (bonus: this is more friendly to RT kernels
I think the idea is good, but when approaching it that way,
I am curious about rotation handling.
In the current code, rotation is always done when traversing the plist in the slow path.
If we traverse the plist first, how should rotation be handled?
That's a very good question, things always get tricky when it comes to
the details...
...

Do a naive rotation at plist traversal time.

(But then fast path might allocate from an si we didn’t select.)
2. Rotate when allocating in the slow path.
(But between releasing swap_avail_lock, we might access an si that wasn’t rotated.)
Both cases could break rotation behavior — what do you think?
I think cluster level rotating is better, it prevents things from
going too fragmented and spreads the workload between devices in a
helpful way, but just my guess.
We can change the rotation behavior if the test shows some other
strategy is better.
Maybe we'll need something with a better design, like a alloc counter
for rotation. And if we look at the plist before the fast path we may
need to do some optimization for the plist lock too...

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/4] mm, swap: do not perform synchronous discard during allocation