Re: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails

2 Oct 2024


      On Wed, Oct 2, 2024 at 10:02 AM Barry Song 21cnbao@gmail.com wrote:
...
On Wed, Oct 2, 2024 at 8:43 AM Huang, Ying ying.huang@intel.com wrote:
...
Barry Song 21cnbao@gmail.com writes:
...
On Tue, Oct 1, 2024 at 7:43 AM Huang, Ying ying.huang@intel.com wrote:
...
Barry Song 21cnbao@gmail.com writes:
...
On Sun, Sep 29, 2024 at 3:43 PM Huang, Ying ying.huang@intel.com wrote:
...
Hi, Barry,
Barry Song 21cnbao@gmail.com writes:
> From: Barry Song v-songbaohua@oppo.com
>
> Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
> introduced an unconditional one-tick sleep when `swapcache_prepare()`
> fails, which has led to reports of UI stuttering on latency-sensitive
> Android devices. To address this, we can use a waitqueue to wake up
> tasks that fail `swapcache_prepare()` sooner, instead of always
> sleeping for a full tick. While tasks may occasionally be woken by an
> unrelated `do_swap_page()`, this method is preferable to two scenarios:
> rapid re-entry into page faults, which can cause livelocks, and
> multiple millisecond sleeps, which visibly degrade user experience.
In general, I think that this works.  Why not extend the solution to
cover schedule_timeout_uninterruptible() in __read_swap_cache_async()
too?  We can call wake_up() when we clear SWAP_HAS_CACHE.  To avoid
Hi Ying,
Thanks for your comments.
I feel extending the solution to __read_swap_cache_async() should be done
in a separate patch. On phones, I've never encountered any issues reported
on that path, so it might be better suited for an optimization rather than a
hotfix?
Hi Barry and Ying,
For the __read_swap_cache_async case, I'm not really against adding a
similar waitqueue, but if no one is really suffering from it, and if
the waitqueue does cause extra overhead, maybe we can ignore the
__read_swap_cache_async case for now. I plan to resend the following
patch:
https://lore.kernel.org/linux-mm/20240326185032.72159-9-ryncsn@gmail.com/#r
It removes all the schedule_timeout_uninterruptible workarounds and other
similar things, and the performance will go even higher.
...
...
...
...
Yes.  It's fine to do that in another patch as optimization.
Ok. I'll prepare a separate patch for optimizing that path.
Thanks!
...
...
...
...
overhead to call wake_up() when there's no task waiting, we can use an
atomic to count waiting tasks.
I'm not sure it's worth adding the complexity, as wake_up() on an empty
waitqueue should have a very low cost on its own?
wake_up() needs to call spin_lock_irqsave() unconditionally on a global
shared lock.  On systems with many CPUs (such as servers), this may cause
severe lock contention.  Even the cache ping-pong alone may hurt
performance a lot.
I understand that cache synchronization was a significant issue before
qspinlock, but it seems to be less of a concern after its implementation.
Unfortunately, qspinlock cannot eliminate the cache ping-pong issue, as
discussed in the following thread.
https://lore.kernel.org/lkml/20220510192708.GQ76023@worktop.programming.kick...
...
However, using a global atomic variable would still trigger cache broadcasts,
correct?
We can change the atomic variable to non-zero only when
swapcache_prepare() returns non-zero, and call wake_up() only when the
atomic variable is non-zero.  Because swapcache_prepare() returns 0 most
of the time, the atomic variable is 0 most of the time.  If we don't
change the value of the atomic variable, cache ping-pong will not be
triggered.
Yes, this can be implemented by adding another atomic variable.
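To make the waiter-count idea concrete, here is a minimal userspace sketch of it. It is not the kernel patch: pthreads stand in for the kernel waitqueue, and every name in it (struct swap_waitq, waitq_wait(), waitq_release(), cache_busy, nr_waiters) is hypothetical. The point it illustrates is that the release path only reads the counter in the common case and takes the shared lock only when a waiter has announced itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the waitqueue; not kernel code. */
struct swap_waitq {
	pthread_mutex_t lock;       /* plays the role of the shared waitqueue lock */
	pthread_cond_t  cond;
	atomic_int      nr_waiters; /* stays 0 unless a task actually has to wait */
	atomic_bool     cache_busy; /* stands in for SWAP_HAS_CACHE being set */
};

void waitq_init(struct swap_waitq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
	atomic_init(&q->nr_waiters, 0);
	atomic_init(&q->cache_busy, false);
}

/* Slow path: meant for a task that found the entry busy. */
void waitq_wait(struct swap_waitq *q)
{
	int prev;

	/* Announce ourselves before re-checking the condition under the lock. */
	atomic_fetch_add(&q->nr_waiters, 1);
	pthread_mutex_lock(&q->lock);
	while (atomic_load(&q->cache_busy))
		pthread_cond_wait(&q->cond, &q->lock);
	pthread_mutex_unlock(&q->lock);
	prev = atomic_fetch_sub(&q->nr_waiters, 1);
	assert(prev > 0);
	(void)prev;
}

/* Release path; returns true only if it had to take the lock and wake. */
bool waitq_release(struct swap_waitq *q)
{
	atomic_store(&q->cache_busy, false);
	/* Common case: a plain read of a counter that nobody has modified. */
	if (atomic_load(&q->nr_waiters) == 0)
		return false;
	pthread_mutex_lock(&q->lock);
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
	return true;
}
```

Because the waiter increments the counter before checking cache_busy, and the releaser clears cache_busy before reading the counter (both with sequentially consistent atomics), either the releaser sees a non-zero count or the waiter sees the entry already free, so no wakeup is lost in this sketch. Whether the same ordering is cheap enough in the kernel is exactly what the test requested below would have to show.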
...
Hi, Kairui,
Do you have some test cases to test parallel zram swap-in?  If so, that
can be used to verify whether cache ping-pong is an issue and whether it
can be fixed via a global atomic variable.
Yes, Kairui please run a test on your machine with lots of cores before
and after adding a global atomic variable as suggested by Ying. I am
sorry I don't have a server machine.
I just had a try with the build kernel test which I used for the
allocator patch series, with -j64, 1G memcg on my local branch:
Without the patch:
2677.63user 9100.43system 3:33.15elapsed 5452%CPU (0avgtext+0avgdata
863284maxresident)k
2671.40user 8969.07system 3:33.67elapsed 5447%CPU (0avgtext+0avgdata
863316maxresident)k
2673.66user 8973.90system 3:33.18elapsed 5463%CPU (0avgtext+0avgdata
863284maxresident)k
With the patch:
2655.05user 9134.21system 3:35.63elapsed 5467%CPU (0avgtext+0avgdata
863288maxresident)k
2652.57user 9104.87system 3:35.07elapsed 5466%CPU (0avgtext+0avgdata
863272maxresident)k
2665.44user 9155.97system 3:35.92elapsed 5474%CPU (0avgtext+0avgdata
863316maxresident)k
These are only three test runs each, and the main bottleneck for the
test is still some other locks (list_lru lock, swap cgroup lock, etc.),
but it does show that performance seems a bit lower with the patch. This
could be considered a trivial amount of overhead, so I think it's
acceptable for the SYNC_IO path.
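For a rough sense of how much "a bit lower" is, the elapsed times above can be averaged. The small helper below is only an illustration: the function names are made up, and the inputs are the six elapsed values quoted above converted to seconds. With only three runs per configuration, the result is indicative at best.

```c
#include <stddef.h>

/* Elapsed wall-clock times from the runs quoted above, in seconds. */
static const double without_patch[] = { 213.15, 213.67, 213.18 };
static const double with_patch[]    = { 215.63, 215.07, 215.92 };

/* Arithmetic mean of n samples. */
double mean_seconds(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative increase in mean elapsed time with the patch, in percent. */
double slowdown_percent(void)
{
	double base = mean_seconds(without_patch, 3);
	double test = mean_seconds(with_patch, 3);

	return (test - base) / base * 100.0;
}
```

On these numbers the mean elapsed time goes from about 213.3 s to about 215.5 s, an increase of roughly 1%.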
