With the current implementation the following race can happen:
* blk_pre_runtime_suspend() calls blk_freeze_queue_start() and
  blk_mq_unfreeze_queue().
* blk_queue_enter() calls blk_queue_pm_only() and that function returns
  true.
* blk_queue_enter() calls blk_pm_request_resume() and that function does
  not call pm_request_resume() because the queue runtime status is
  RPM_ACTIVE.
* blk_pre_runtime_suspend() changes the queue status into RPM_SUSPENDING.

Fix this race by changing the queue runtime status into RPM_SUSPENDING
before switching q_usage_counter to atomic mode.
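
For reference, here is a condensed sketch of the blk_queue_enter() logic
involved (illustrative only; simplified from the real function, with
NOWAIT and error handling omitted):

/*
 * Sketch of blk_queue_enter() to show the window. Before this patch,
 * rpm_status was still RPM_ACTIVE at the moment pm_only became
 * visible, so blk_pm_request_resume() below was a no-op and a non-PM
 * caller could go to sleep with no resume in flight.
 */
static int blk_queue_enter_sketch(struct request_queue *q, bool pm)
{
        while (true) {
                if (percpu_ref_tryget_live(&q->q_usage_counter)) {
                        if (pm || !blk_queue_pm_only(q))
                                return 0;       /* entered the queue */
                        percpu_ref_put(&q->q_usage_counter);
                }
                /* Only resumes if rpm_status is RPM_SUSPENDING/SUSPENDED: */
                blk_pm_request_resume(q);
                wait_event(q->mq_freeze_wq,
                           !blk_queue_pm_only(q) || blk_queue_dying(q));
                if (blk_queue_dying(q))
                        return -ENODEV;
        }
}

With the status set to RPM_SUSPENDING up front, any caller that observes
pm_only also observes a status that makes blk_pm_request_resume() kick
off a resume, which closes the window.
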
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Acked-by: Stanley Chu <stanley.chu(a)mediatek.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Fixes: 986d413b7c15 ("blk-mq: Enable support for runtime power management")
Signed-off-by: Can Guo <cang(a)codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
block/blk-pm.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-pm.c b/block/blk-pm.c
index b85234d758f7..17bd020268d4 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -67,6 +67,10 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 
         WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
 
+        spin_lock_irq(&q->queue_lock);
+        q->rpm_status = RPM_SUSPENDING;
+        spin_unlock_irq(&q->queue_lock);
+
         /*
          * Increase the pm_only counter before checking whether any
          * non-PM blk_queue_enter() calls are in progress to avoid that any
@@ -89,15 +93,14 @@ int blk_pre_runtime_suspend(struct request_queue *q)
         /* Switch q_usage_counter back to per-cpu mode. */
         blk_mq_unfreeze_queue(q);
 
-        spin_lock_irq(&q->queue_lock);
-        if (ret < 0)
+        if (ret < 0) {
+                spin_lock_irq(&q->queue_lock);
+                q->rpm_status = RPM_ACTIVE;
                 pm_runtime_mark_last_busy(q->dev);
-        else
-                q->rpm_status = RPM_SUSPENDING;
-        spin_unlock_irq(&q->queue_lock);
+                spin_unlock_irq(&q->queue_lock);
 
-        if (ret)
                 blk_clear_pm_only(q);
+        }
 
         return ret;
 }
This deadlock is hitting Android users (Pixel 3/3a/4) running Magisk:
its frequent umount/mount operations trigger quota_sync, which hits the
race. See https://github.com/topjohnwu/Magisk/issues/3171 for
additional discussion of the impact.
Commit db6ec53b7e03 added a rw_sem (quota_sem) to protect quota flags.
As part of that commit, f2fs_quota_sync was changed to call
f2fs_lock_op, in an attempt to prevent an AB/BA type deadlock against
the quota_sem locking in block_operations. However, rwsems in Linux are
not recursive. Therefore, the following deadlock can occur:
f2fs_quota_sync
  down_read(cp_rwsem) // f2fs_lock_op
  filemap_fdatawrite
    f2fs_write_data_pages
      ...
                          block_operations
                          down_write(cp_rwsem) - marks rwsem as
                                                 "writer pending"
      down_read_trylock(cp_rwsem) - fails as there is a writer
                                    pending. Code keeps on trying,
                                    live-locking the filesystem.
We solve this by creating a new rwsem, used specifically to
synchronize this case, instead of attempting to reuse an existing
lock.
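
The non-recursive behaviour that forces this can be shown in isolation.
Below is a minimal kernel-style illustration (hypothetical thread_a /
thread_b, not f2fs code):

#include <linux/rwsem.h>

static DECLARE_RWSEM(sem);

static void thread_a(void)
{
        down_read(&sem);        /* 1st read lock: granted */

        /*
         * ... thread_b calls down_write(&sem) here; it is queued and
         * the rwsem is marked "writer pending" ...
         */

        down_read(&sem);        /* 2nd read lock: queued behind the
                                 * pending writer -> A waits for B,
                                 * while B waits for A's up_read():
                                 * deadlock */
        up_read(&sem);
        up_read(&sem);
}

This is exactly the cp_rwsem pattern in the trace above, which is why a
dedicated rwsem that the quota_sync path takes exactly once avoids the
problem.
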
Signed-off-by: Shachar Raindel <shacharr(a)gmail.com>
Fixes: db6ec53b7e03 ("f2fs: add a rw_sem to cover quota flag changes")
---
fs/f2fs/f2fs.h | 3 +++
fs/f2fs/super.c | 13 +++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cb700d797296..b3e55137be7f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1448,6 +1448,7 @@ struct f2fs_sb_info {
         struct inode *meta_inode;              /* cache meta blocks */
         struct mutex cp_mutex;                 /* checkpoint procedure lock */
         struct rw_semaphore cp_rwsem;          /* blocking FS operations */
+        struct rw_semaphore cp_quota_rwsem;    /* blocking quota sync operations */
         struct rw_semaphore node_write;        /* locking node writes */
         struct rw_semaphore node_change;       /* locking node change */
         wait_queue_head_t cp_wait;
@@ -1961,12 +1962,14 @@ static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi)
 
 static inline void f2fs_lock_all(struct f2fs_sb_info *sbi)
 {
+        down_write(&sbi->cp_quota_rwsem);
         down_write(&sbi->cp_rwsem);
 }
 
 static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi)
 {
         up_write(&sbi->cp_rwsem);
+        up_write(&sbi->cp_quota_rwsem);
 }
 
 static inline int __get_cp_reason(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 00eff2f51807..5ce61147d7e5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2209,8 +2209,16 @@ int f2fs_quota_sync(struct super_block *sb, int type)
          * f2fs_dquot_commit
          *    block_operation
          *      down_read(quota_sem)
+         *
+         * However, we cannot use the cp_rwsem to prevent this
+         * deadlock, as the cp_rwsem is taken for read inside the
+         * f2fs_dquot_commit code, and rwsem is not recursive.
+         *
+         * We therefore use a special lock to synchronize
+         * f2fs_quota_sync with block_operations, as this is the only
+         * place where such recursion occurs.
          */
-        f2fs_lock_op(sbi);
+        down_read(&sbi->cp_quota_rwsem);
         down_read(&sbi->quota_sem);
 
         ret = dquot_writeback_dquots(sb, type);
@@ -2251,7 +2259,7 @@ int f2fs_quota_sync(struct super_block *sb, int type)
         if (ret)
                 set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
         up_read(&sbi->quota_sem);
-        f2fs_unlock_op(sbi);
+        up_read(&sbi->cp_quota_rwsem);
 
         return ret;
 }
@@ -3599,6 +3607,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 
         init_rwsem(&sbi->cp_rwsem);
         init_rwsem(&sbi->quota_sem);
+        init_rwsem(&sbi->cp_quota_rwsem);
         init_waitqueue_head(&sbi->cp_wait);
 
         init_sb_info(sbi);
--
2.29.2
On Sat, Nov 28, 2020 at 11:28:53PM +0800, Wen Yang wrote:
>
>
> On 2020/11/28 10:05 PM, Greg Kroah-Hartman wrote:
> > On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
> > >
> > >
> > > On 2020/11/28 4:06 PM, Greg Kroah-Hartman wrote:
> > > > On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> > > > > [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
> > > >
> > > > No, that is not this commit at all.
> > > >
> > > > What are you wanting to have happen here?
> > > >
> > > > confused,
> > > >
> > > > greg k-h
> > > >
> > >
> > > Thanks.
> > > Let's explain it briefly:
> > >
> > > The dentries such as /proc/<pid>/ns/ipc have the DCACHE_OP_DELETE flag, they
> > > should be deleted when the process exits.
> > > Suppose the following race appears:
> > >
> > > release_task                      dput
> > >  -> proc_flush_task
> > >                                    -> dentry->d_op->d_delete(dentry)
> > >  -> __exit_signal
> > >                                    -> dentry->d_lockref.count-- and return.
> > >
> > >
> > > In the proc_flush_task function, the dentry cannot be deleted because
> > > another process is still using it; in the dput function, d_delete may
> > > be executed before __exit_signal (the pid has not yet been unhashed),
> > > so d_delete returns false and the dentry is not deleted either.
> > >
> > > So this dentry stays cached (count is 0), its parent dentries stay
> > > cached as well, and those dentries can only be freed when drop_caches
> > > is manually triggered.
> > >
> > >
> > > In the release_task function, we should move proc_flush_task to after
> > > the point where tasklist_lock is released (just like commit
> > > 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
> >
> > I do not understand, is this a patch being submitted for the main kernel
> > tree, or for a stable kernel release?
> >
> > If stable, please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> >
> > If main kernel tree, you can't have the "Upstream commit" line in the
> > changelog text as that makes no sense at all.
>
>
> Hi,
> This patch is submitted to the stable branches (from 4.9.y
> to 5.6.y).
>
> This problem can also be solved if the following patches were ported to
> the stable branch:
> 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
> 26dbc60f385f ("proc: Generalize proc_sys_prune_dcache into
> proc_prune_siblings_dcache")
> f90f3cafe8d5 ("proc: Use d_invalidate in proc_prune_siblings_dcache")
>
> However, the above-mentioned patches modify too much code (more than 100
> lines), and there may also be some undiscovered bugs.
>
> So the safer method may be to apply this small patch (also ported from
> the equivalent fix that already exists in Linus' tree).
>
> We will reformat the patch later.
We always prefer to take the original, upstream patches instead of
one-off changes, as almost always those one-off changes end up being
wrong and hard to work with over time.

So if we need more than one patch to solve this reported problem, that's
fine. Can you test the above series of patches and provide a backported
set of them that we can use for this?
thanks,
greg k-h
Hi Christian,
On Sat, Nov 28, 2020 at 3:11 PM Christian Eggers <ceggers(a)arri.de> wrote:
>
> Hi Greg, hi Sudip, hi Wolfram,
>
> this patch is part of my series:
>
<snip>
>
> Although I am happy seeing my patch in the stable series, I think it should be
> applied to mainline first.
Stable trees can only take fixes which have been applied to the
mainline. It has been in mainline since v5.9.
--
Regards
Sudip
From: Chris Wilson <chris(a)chris-wilson.co.uk>
commit be33805c65297611971003d72e7f9235e23ec84d upstream.
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached
caused display glitches. Though it is documented as deprecated (and so
likely behaves as uncached), use the follow-pte bit and force it out of
the L3 cache.
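
For orientation, each slot in tgl_mocs_table pairs an LLC control word
with an L3 cacheability word. A condensed view of the entry structure
and the MOCS_ENTRY() macro from intel_mocs.c (paraphrased here; exact
field widths may differ between kernel versions):

/* Condensed from drivers/gpu/drm/i915/gt/intel_mocs.c (illustrative). */
struct drm_i915_mocs_entry {
        u32 control_value;      /* LLC/eLLC control: LE_* encoding */
        u16 l3cc_value;         /* L3 cacheability: L3_* encoding  */
        u16 used;
};

#define MOCS_ENTRY(__idx, __control_value, __l3cc_value) \
        [__idx] = { \
                .control_value = __control_value, \
                .l3cc_value = __l3cc_value, \
                .used = 1, \
        }

The diff below therefore switches entry 1 from write-back in both LLC
and L3 (LE_3_WB / L3_3_WB) to follow-the-PTE with L3 uncached
(LE_0_PAGETABLE | LE_TC_0_PAGETABLE / L3_1_UC).
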
Testcase: igt/kms_frontbuffer_tracking
Testcase: igt/kms_big_fb
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Ayaz A Siddiqui <ayaz.siddiqui(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: Matt Roper <matthew.d.roper(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@…
(cherry picked from commit a04ac827366594c7244f60e9be79fcb404af69f0)
Fixes: 849c0fe9e831 ("drm/i915/gt: Initialize reserved and unspecified MOCS indices")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
[Rodrigo: Updated Fixes tag]
Cc: <stable(a)vger.kernel.org> # 5.9.x
---
drivers/gpu/drm/i915/gt/intel_mocs.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index b8f56e62158e..313e51e7d4f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -243,8 +243,9 @@ static const struct drm_i915_mocs_entry tgl_mocs_table[] = {
          * only, __init_mocs_table() take care to program unused index with
          * this entry.
          */
-        MOCS_ENTRY(1, LE_3_WB | LE_TC_1_LLC | LE_LRUM(3),
-                   L3_3_WB),
+        MOCS_ENTRY(I915_MOCS_PTE,
+                   LE_0_PAGETABLE | LE_TC_0_PAGETABLE,
+                   L3_1_UC),
         GEN11_MOCS_ENTRIES,
 
         /* Implicitly enable L1 - HDC:L1 + L3 + LLC */
--
2.28.0