FUSE has a bug where it fails to clear congestion states if a connection gets aborted while congested, which can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously.
While the bdi owner, FUSE, is primarily responsible for clearing congestion states before destroying bdi_writebacks, bdi layer can ensure that congestion states are not leaked beyond bdi_writeback lifecycle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Joshua Miller <joshmiller@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
---
 include/linux/backing-dev.h | 14 +++++++++++++-
 mm/backing-dev.c            |  2 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -220,6 +220,18 @@ static inline int bdi_sched_wait(void *w
 	return 0;
 }
 
+static inline void __wb_congested_free(struct bdi_writeback_congested *congested)
+{
+	/*
+	 * Make sure congestion states are cleared before freeing to avoid
+	 * nr_wb_congested() corruption which can lead to misbehaving
+	 * wait_iff_congested().
+	 */
+	clear_wb_congested(congested, BLK_RW_SYNC);
+	clear_wb_congested(congested, BLK_RW_ASYNC);
+	kfree(congested);
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 struct bdi_writeback_congested *
@@ -409,7 +421,7 @@ wb_congested_get_create(struct backing_d
 static inline void wb_congested_put(struct bdi_writeback_congested *congested)
 {
 	if (atomic_dec_and_test(&congested->refcnt))
-		kfree(congested);
+		__wb_congested_free(congested);
 }
 
 static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi)
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -509,7 +509,7 @@ void wb_congested_put(struct bdi_writeba
 	}
 
 	spin_unlock_irqrestore(&cgwb_lock, flags);
-	kfree(congested);
+	__wb_congested_free(congested);
 }
 
 static void cgwb_release_workfn(struct work_struct *work)
If a connection gets aborted while congested, FUSE can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously which can lead to severe performance degradation.
The leak is caused by gating congestion state clearing with fc->connected test in request_end(). This was added way back in 2009 by 26c3679101db ("fuse: destroy bdi on umount"). While the commit description doesn't explain why the test was added, it most likely was to avoid dereferencing bdi after it got destroyed.
Since then, bdi lifetime rules have changed many times and now we're always guaranteed to have access to the bdi while the superblock is alive (fc->sb).
Drop fc->connected conditional to avoid leaking congestion states.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Joshua Miller <joshmiller@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
---
 fs/fuse/dev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -381,8 +381,7 @@ static void request_end(struct fuse_conn
 		if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
 			wake_up(&fc->blocked_waitq);
 
-		if (fc->num_background == fc->congestion_threshold &&
-		    fc->connected && fc->sb) {
+		if (fc->num_background == fc->congestion_threshold && fc->sb) {
 			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
 			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
 		}
On Fri 02-02-18 09:54:14, Tejun Heo wrote:
> If a connection gets aborted while congested, FUSE can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously which can lead to severe performance degradation.
>
> The leak is caused by gating congestion state clearing with fc->connected test in request_end(). This was added way back in 2009 by 26c3679101db ("fuse: destroy bdi on umount"). While the commit description doesn't explain why the test was added, it most likely was to avoid dereferencing bdi after it got destroyed.
>
> Since then, bdi lifetime rules have changed many times and now we're always guaranteed to have access to the bdi while the superblock is alive (fc->sb).
>
> Drop fc->connected conditional to avoid leaking congestion states.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Joshua Miller <joshmiller@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Miklos Szeredi <miklos@szeredi.hu>
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org
Yeah, this should be fine AFAICT but my knowledge of FUSE is very cursory. Anyway:
Acked-by: Jan Kara <jack@suse.cz>
Honza
> ---
>  fs/fuse/dev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -381,8 +381,7 @@ static void request_end(struct fuse_conn
>  		if (!fc->blocked && waitqueue_active(&fc->blocked_waitq))
>  			wake_up(&fc->blocked_waitq);
>
> -		if (fc->num_background == fc->congestion_threshold &&
> -		    fc->connected && fc->sb) {
> +		if (fc->num_background == fc->congestion_threshold && fc->sb) {
>  			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_SYNC);
>  			clear_bdi_congested(fc->sb->s_bdi, BLK_RW_ASYNC);
>  		}
On Tue, Feb 6, 2018 at 5:25 PM, Jan Kara jack@suse.cz wrote:
> On Fri 02-02-18 09:54:14, Tejun Heo wrote:
>> If a connection gets aborted while congested, FUSE can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously which can lead to severe performance degradation.
>>
>> The leak is caused by gating congestion state clearing with fc->connected test in request_end(). This was added way back in 2009 by 26c3679101db ("fuse: destroy bdi on umount"). While the commit description doesn't explain why the test was added, it most likely was to avoid dereferencing bdi after it got destroyed.
>>
>> Since then, bdi lifetime rules have changed many times and now we're always guaranteed to have access to the bdi while the superblock is alive (fc->sb).
>>
>> Drop fc->connected conditional to avoid leaking congestion states.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Reported-by: Joshua Miller <joshmiller@fb.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Miklos Szeredi <miklos@szeredi.hu>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: stable@vger.kernel.org
>
> Yeah, this should be fine AFAICT but my knowledge of FUSE is very cursory. Anyway:
>
> Acked-by: Jan Kara <jack@suse.cz>
Can't say I fully understand how the global "is any bdi congested" state is used in direct reclaim, but the patch is an obvious improvement, so applied.
Thanks, Miklos
On Fri, Feb 02, 2018 at 09:53:28AM -0800, Tejun Heo wrote:
> FUSE has a bug where it fails to clear congestion states if a connection gets aborted while congested, which can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously.
>
> While the bdi owner, FUSE, is primarily responsible for clearing congestion states before destroying bdi_writebacks, bdi layer can ensure that congestion states are not leaked beyond bdi_writeback lifecycle.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Joshua Miller <joshmiller@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Fri 02-02-18 09:53:28, Tejun Heo wrote:
> FUSE has a bug where it fails to clear congestion states if a connection gets aborted while congested, which can leave nr_wb_congested[] stuck until reboot causing wait_iff_congested() to wait spuriously.
>
> While the bdi owner, FUSE, is primarily responsible for clearing congestion states before destroying bdi_writebacks, bdi layer can ensure that congestion states are not leaked beyond bdi_writeback lifecycle.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Joshua Miller <joshmiller@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@vger.kernel.org
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
>  include/linux/backing-dev.h | 14 +++++++++++++-
>  mm/backing-dev.c            |  2 +-
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -220,6 +220,18 @@ static inline int bdi_sched_wait(void *w
>  	return 0;
>  }
>
> +static inline void __wb_congested_free(struct bdi_writeback_congested *congested)
> +{
> +	/*
> +	 * Make sure congestion states are cleared before freeing to avoid
> +	 * nr_wb_congested() corruption which can lead to misbehaving
> +	 * wait_iff_congested().
> +	 */
> +	clear_wb_congested(congested, BLK_RW_SYNC);
> +	clear_wb_congested(congested, BLK_RW_ASYNC);
> +	kfree(congested);
> +}
> +
>  #ifdef CONFIG_CGROUP_WRITEBACK
>
>  struct bdi_writeback_congested *
> @@ -409,7 +421,7 @@ wb_congested_get_create(struct backing_d
>  static inline void wb_congested_put(struct bdi_writeback_congested *congested)
>  {
>  	if (atomic_dec_and_test(&congested->refcnt))
> -		kfree(congested);
> +		__wb_congested_free(congested);
>  }
>
>  static inline struct bdi_writeback *wb_find_current(struct backing_dev_info *bdi)
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -509,7 +509,7 @@ void wb_congested_put(struct bdi_writeba
>  	}
>
>  	spin_unlock_irqrestore(&cgwb_lock, flags);
> -	kfree(congested);
> +	__wb_congested_free(congested);
>  }
>
>  static void cgwb_release_workfn(struct work_struct *work)
linux-stable-mirror@lists.linaro.org