On Fri, 9 Aug 2024, David Hildenbrand wrote:
> This really seems to be the latest point where we can "easily" back off and unlock the source folio -- in this function :)
>
> I wonder if we should be smarter in the migrate_pages_batch() loop when we start the actual migrations via migrate_folio_move(): if we detect that a folio has unexpected references *and* it has waiters (PG_waiters), back off then and retry the folio later. If it only has unexpected references, just keep retrying: no waiters -> nobody is waiting for the lock to make progress.
Well, just back off ASAP whenever waiters are detected. A waiter would have increased the refcount, and it will likely modify the page state soon. So push the folio to the end of the list of pages to be migrated, giving the waiter as much time as we can, and check it again later.
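A minimal userspace sketch in C contrasting the two back-off policies discussed above: deferring only when a folio has unexpected references *and* waiters, versus deferring whenever waiters are seen. Everything here is hypothetical and for illustration only: struct sim_folio, migrate_batch() and enum backoff_policy are not kernel APIs. The model also assumes that a waiter makes progress (and finally drops its reference) each time its folio is deferred. A deferred folio is pushed to the tail of a circular queue, mirroring "push it to the end of the pages to be migrated". A folio with only transient extra references is retried in place.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FOLIOS 16

/* Hypothetical model of a folio in a migration batch (not a kernel type). */
struct sim_folio {
	int id;
	bool has_waiters;  /* stands in for PG_waiters being set */
	int extra_refs;    /* references beyond what migration expects */
	int waiter_rounds; /* deferrals until the waiter finishes (assumed) */
};

enum backoff_policy {
	BACKOFF_REFS_AND_WAITERS, /* defer only if extra refs AND waiters */
	BACKOFF_ANY_WAITERS,      /* defer as soon as waiters are seen */
};

static bool should_defer(const struct sim_folio *f, enum backoff_policy p)
{
	if (p == BACKOFF_ANY_WAITERS)
		return f->has_waiters;
	return f->has_waiters && f->extra_refs > 0;
}

/*
 * Migrate every folio in the batch, writing their ids to order[] in the
 * sequence they were migrated. Deferred folios go to the tail of a
 * circular queue so they are checked again after the rest of the batch.
 */
int migrate_batch(struct sim_folio *folios, int n, enum backoff_policy p,
		  int *order)
{
	int queue[MAX_FOLIOS];
	int head = 0, count = n, done = 0;

	assert(n >= 0 && n <= MAX_FOLIOS);
	for (int i = 0; i < n; i++)
		queue[i] = i;

	while (count > 0) {
		int idx = queue[head];
		struct sim_folio *f = &folios[idx];

		head = (head + 1) % MAX_FOLIOS;
		count--;

		if (should_defer(f, p)) {
			/* Assumed: the waiter makes progress while we defer. */
			if (--f->waiter_rounds <= 0) {
				f->has_waiters = false;
				f->extra_refs = 0;
			}
			/* Push to the end of the batch and check again later. */
			queue[(head + count) % MAX_FOLIOS] = idx;
			count++;
			continue;
		}

		/* No waiters: nobody needs the lock, so keep retrying in place. */
		while (f->extra_refs > 0)
			f->extra_refs--; /* model: one transient ref drains per retry */
		order[done++] = f->id;
	}
	return done;
}
```

For example, take a batch of folio 1 (idle), folio 2 (one waiter holding an extra reference, finishing after two deferrals) and folio 3 (two transient references). Under BACKOFF_ANY_WAITERS they migrate in the order 1, 3, 2: folio 2 is only checked again after the rest of the batch. The difference between the policies shows up for a folio that has waiters but no extra references. BACKOFF_REFS_AND_WAITERS migrates it immediately, while BACKOFF_ANY_WAITERS still pushes it to the end.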