On Tue, Jun 01, 2021 at 03:45:08PM -0400, Josef Bacik wrote:
We have been hitting some early ENOSPC issues in production with more recent kernels, and I tracked it down to us simply not flushing delalloc as aggressively as we should be. With tracing I saw us failing all tickets with all of the block rsvs at or around 0, with very little pinned space, but still around 120MiB of outstanding bytes_may_use. Upon further investigation I saw that we were flushing around 14 pages per shrink call for delalloc, despite having around 2GiB of delalloc outstanding.
Consider the example of an 8-way machine, all CPUs trying to create a file in parallel, which at the time of this commit requires 5 items to do. Assuming a 16k leaf size, we have 10MiB of total metadata reclaim size waiting on reservations. Now assume we have 128MiB of delalloc outstanding. With our current math we would set items to 20, and then set to_reclaim to 20 * 256k, or 5MiB.
Assuming that we went through this loop all 3 times, for both FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop twice, we'd only flush 60MiB of the 128MiB of delalloc space. This could leave a fair bit of delalloc reservations still hanging around by the time we go to ENOSPC out all the remaining tickets.
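To make the arithmetic above concrete, here is a small userspace sketch (not kernel code). The 256k per-item figure is reconstructed from the numbers in the text (nodesize * 2 * max tree level = 16k * 2 * 8), and the halving of the item count is likewise inferred from "items to 20", so treat both as illustrative assumptions rather than the kernel's exact formula:

```python
# Userspace model of the old shrink_delalloc math described above.
# per_item_rsv and the halving are reconstructed from the text, not
# lifted from the kernel source.
KIB = 1024
MIB = 1024 * KIB

def per_item_rsv(nodesize=16 * KIB, max_level=8):
    # ~256KiB reserved per metadata item at a 16k leaf size
    return nodesize * 2 * max_level

ticket_items = 8 * 5                             # 8 CPUs x 5 items per file create
reclaim_size = ticket_items * per_item_rsv()     # 10MiB waiting on reservations
items = reclaim_size // per_item_rsv() // 2      # halved -> 20
to_reclaim = items * per_item_rsv()              # 20 * 256k = 5MiB per flush call

# 3 passes each for FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and the
# full loop twice: 5MiB * 3 * 2 * 2 = 60MiB of the 128MiB of delalloc.
flushed = to_reclaim * 3 * 2 * 2
```

This is how the 60MiB figure falls out: the flush target is sized from the ticket reclaim size, so it never scales with the 128MiB actually outstanding.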
Fix this in two ways. First, change the calculations to be a fraction of the total delalloc bytes on the system. Prior to my change we were calculating based on dirty inodes, so the math made more sense; now it's just completely unrelated to what we're actually doing.
Second, add a FLUSH_DELALLOC_FULL state, which we hold off on until we've gone through the flush states at least once. This will empty the system of all delalloc, so we're sure to be truly out of space when we start failing tickets.
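A rough model of the two changes, again as a userspace sketch: the specific fraction (half of outstanding delalloc) is purely illustrative and should be checked against the actual patch, as is the function name:

```python
# Illustrative model of the fixed reclaim sizing, not the patch itself.
MIB = 1024 * 1024

def to_reclaim_for_delalloc(delalloc_bytes, full_flush=False):
    if full_flush:
        # FLUSH_DELALLOC_FULL: empty the system of all delalloc so that
        # failing tickets afterwards means we are truly out of space.
        return delalloc_bytes
    # Scale the flush target with what is actually outstanding on the
    # system, rather than with the ticket reclaim size (fraction is a
    # placeholder).
    return delalloc_bytes // 2
```

With 128MiB of delalloc outstanding, the earlier-stage flushes now target a meaningful fraction of it, and the FULL state drains the remainder before tickets are failed.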
I'm tagging stable 5.10 and forward, because this is where we started using the page stuff heavily again. This affects earlier kernel versions as well, but it would be a pain to backport to them as the flushing mechanisms aren't the same.
For 5.10 it depends on f00c42dd4cc8b856e6 ("btrfs: introduce a FORCE_COMMIT_TRANS flush operation") and is followed by the preemptive flushing series. Prior to the commit introducing COMMIT_TRANS there are 3 patches that seem lightweight enough for a stable backport to 5.10, but that should be evaluated first.
The 5.11.x stable series is EOL, so it's fine to pick this for 5.12, but in case there's interest in backporting it to 5.10, more work is needed than just tagging.