Sorry, I may not have sent the email correctly.
I will resend it.
On Thu, 27 Oct 2022 20:26:04 +0000 Andrew Morton <akpm(a)linux-foundation.org> wrote:
> On Wed, 26 Oct 2022 20:24:38 +0900 NARIBAYASHI Akira <a.naribayashi(a)fujitsu.com> wrote:
>
> > Depending on the memory configuration, isolate_freepages_block() may
> > scan pages outside the target range and cause a panic.
> >
> > The problem is that the pfn argument of fast_isolate_around() can be
> > outside the target range. Therefore we should consider both the case
> > where pfn < start_pfn and the case where end_pfn < pfn.
> >
> > This problem should have been addressed by commit 6e2b7044c199
> > ("mm, compaction: make fast_isolate_freepages() stay within zone"),
> > but there was an oversight.
> >
> > Case1: pfn < start_pfn
> >
> >                  <at memory compaction for node Y>
> >   |  node X's zone  | node Y's zone
> >   +-----------------+------------------------------...
> >    pageblock  ^           ^           ^
> >   +-----------+-----------+-----------+-----------+...
> >                  ^  ^     ^
> >                  ^  ^     end_pfn
> >                  ^  start_pfn = cc->zone->zone_start_pfn
> >                  pfn
> >                  <--------> scanned range by "Scan After"
> >
> > Case2: end_pfn < pfn
> >
> >                  <at memory compaction for node X>
> >   |  node X's zone  | node Y's zone
> >   +-----------------+------------------------------...
> >    pageblock  ^           ^           ^
> >   +-----------+-----------+-----------+-----------+...
> >               ^     ^ ^
> >               ^     ^ pfn
> >               ^     end_pfn
> >               start_pfn
> >               <-------> scanned range by "Scan Before"
> >
> > It seems that there is no good reason to skip nr_isolated pages
> > just after the given pfn. So let's perform a simple scan from start
> > to end instead of dividing the scan into "Before" and "After".
>
> Under what circumstances will this panic occur? I assume those
> circumstances are pretty rare, given that 6e2b7044c1992 was nearly two
> years ago.
>
> Did you consider the desirability of backporting this fix into earlier
> kernels?
A panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special configurations.
Depending on how many similar systems there are, it may be a good idea to
fix this problem in older kernels as well.
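For illustration, a minimal sketch of the intended shape of
fast_isolate_around() after the change (assuming mainline helper names
such as pageblock_start_pfn(); a sketch, not the exact patch):

/*
 * Sketch only -- assumes mainline helper names. The scan window is
 * clamped to both the pageblock and the zone, and a single pass
 * replaces the old "Scan before"/"Scan after" split.
 */
static void fast_isolate_around(struct compact_control *cc, unsigned long pfn)
{
        unsigned long start_pfn, end_pfn;
        struct page *page;

        /* Pageblock boundaries, clamped to stay within cc->zone */
        start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
        end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));

        page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
        if (!page)
                return;

        /* One simple scan from start to end; no out-of-zone pfns remain */
        isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
}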
From: Xiubo Li <xiubli(a)redhat.com>
POSIX locks on a file all use the same owner, which is the thread id,
and multiple POSIX locks can be merged into a single one, so checking
whether the 'file' has locks may fail.
A file where some openers use locking and others don't is a really
odd usage pattern, though. Locks are like stoplights -- they
only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
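For reference, a sketch of what vfs_inode_has_locks() checks (written
from memory of the VFS helper merged alongside this series; details
approximate):

/* Sketch: true if any POSIX or flock locks are currently held on the
 * inode, regardless of which fd was used to set them. */
bool vfs_inode_has_locks(struct inode *inode)
{
        struct file_lock_context *ctx;
        bool ret;

        ctx = smp_load_acquire(&inode->i_flctx);
        if (!ctx)
                return false;

        spin_lock(&ctx->flc_lock);
        ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
        spin_unlock(&ctx->flc_lock);
        return ret;
}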
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 2 +-
fs/ceph/locks.c | 4 ----
fs/ceph/super.h | 1 -
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 065e9311b607..948136f81fc8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2964,7 +2964,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (vfs_inode_has_locks(inode))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..b191426bf880 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
- atomic_inc(&fi->num_locks);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct ceph_file_info *fi = fl->fl_file->private_data;
struct inode *inode = file_inode(fl->fl_file);
struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 14454f464029..e7662ff6f149 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -804,7 +804,6 @@ struct ceph_file_info {
struct list_head rw_contexts;
u32 filp_gen;
- atomic_t num_locks;
};
struct ceph_dir_file_info {
--
2.31.1
The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, because we already hold the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object with the full lock,
before then retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.
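The new pass boils down to the usual back-off-and-retry locking pattern;
roughly (simplified from the execbuffer hunk below):

/* Simplified shape of the new pass 3 */
retry:
        err = mutex_lock_interruptible(&vm->mutex);
        if (!err) {
                struct drm_i915_gem_object *busy_bo = NULL;

                err = i915_gem_evict_vm(vm, &ww, &busy_bo);
                mutex_unlock(&vm->mutex);
                if (err && busy_bo) {
                        /* vm->mutex dropped: safe to take the full lock */
                        err = i915_gem_object_lock(busy_bo, &ww);
                        i915_gem_object_put(busy_bo); /* ref from evict_vm */
                        if (!err)
                                goto retry;
                }
        }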
Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda(a)intel.com>
Cc: Mani Milani <mani(a)chromium.org>
Cc: <stable(a)vger.kernel.org> # v5.18+
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 25 +++++++++++--
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
drivers/gpu/drm/i915/i915_gem_evict.c | 37 ++++++++++++++-----
drivers/gpu/drm/i915/i915_gem_evict.h | 4 +-
drivers/gpu/drm/i915/i915_vma.c | 2 +-
.../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-
6 files changed, 56 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 86956b902c97..e2ce1e4e9723 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -745,25 +745,44 @@ static int eb_reserve(struct i915_execbuffer *eb)
*
* Defragmenting is skipped if all objects are pinned at a fixed location.
*/
- for (pass = 0; pass <= 2; pass++) {
+ for (pass = 0; pass <= 3; pass++) {
int pin_flags = PIN_USER | PIN_VALIDATE;
if (pass == 0)
pin_flags |= PIN_NONBLOCK;
if (pass >= 1)
- unpinned = eb_unbind(eb, pass == 2);
+ unpinned = eb_unbind(eb, pass >= 2);
if (pass == 2) {
err = mutex_lock_interruptible(&eb->context->vm->mutex);
if (!err) {
- err = i915_gem_evict_vm(eb->context->vm, &eb->ww);
+ err = i915_gem_evict_vm(eb->context->vm, &eb->ww, NULL);
mutex_unlock(&eb->context->vm->mutex);
}
if (err)
return err;
}
+ if (pass == 3) {
+retry:
+ err = mutex_lock_interruptible(&eb->context->vm->mutex);
+ if (!err) {
+ struct drm_i915_gem_object *busy_bo = NULL;
+
+ err = i915_gem_evict_vm(eb->context->vm, &eb->ww, &busy_bo);
+ mutex_unlock(&eb->context->vm->mutex);
+ if (err && busy_bo) {
+ err = i915_gem_object_lock(busy_bo, &eb->ww);
+ i915_gem_object_put(busy_bo);
+ if (!err)
+ goto retry;
+ }
+ }
+ if (err)
+ return err;
+ }
+
list_for_each_entry(ev, &eb->unbound, bind_link) {
err = eb_reserve_vma(eb, ev, pin_flags);
if (err)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index d73ba0f5c4c5..4f69bff63068 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -369,7 +369,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
if (vma == ERR_PTR(-ENOSPC)) {
ret = mutex_lock_interruptible(&ggtt->vm.mutex);
if (!ret) {
- ret = i915_gem_evict_vm(&ggtt->vm, &ww);
+ ret = i915_gem_evict_vm(&ggtt->vm, &ww, NULL);
mutex_unlock(&ggtt->vm.mutex);
}
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4cfe36b0366b..c02ebd6900ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -441,6 +441,11 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
* @vm: Address space to cleanse
* @ww: An optional struct i915_gem_ww_ctx. If not NULL, i915_gem_evict_vm
* will be able to evict vma's locked by the ww as well.
+ * @busy_bo: Optional pointer to struct drm_i915_gem_object. If not NULL, then
+ * in the event i915_gem_evict_vm() is unable to trylock an object for eviction,
+ * then @busy_bo will point to it. -EBUSY is also returned. The caller must drop
+ * the vm->mutex, before trying again to acquire the contended lock. The caller
+ * also owns a reference to the object.
*
* This function evicts all vmas from a vm.
*
@@ -450,7 +455,8 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
* To clarify: This is for freeing up virtual address space, not for freeing
* memory in e.g. the shrinker.
*/
-int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
+int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww,
+ struct drm_i915_gem_object **busy_bo)
{
int ret = 0;
@@ -482,15 +488,22 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
* the resv is shared among multiple objects, we still
* need the object ref.
*/
- if (dying_vma(vma) ||
+ if (!i915_gem_object_get_rcu(vma->obj) ||
(ww && (dma_resv_locking_ctx(vma->obj->base.resv) == &ww->ctx))) {
__i915_vma_pin(vma);
list_add(&vma->evict_link, &locked_eviction_list);
continue;
}
- if (!i915_gem_object_trylock(vma->obj, ww))
+ if (!i915_gem_object_trylock(vma->obj, ww)) {
+ if (busy_bo) {
+ *busy_bo = vma->obj; /* holds ref */
+ ret = -EBUSY;
+ break;
+ }
+ i915_gem_object_put(vma->obj);
continue;
+ }
__i915_vma_pin(vma);
list_add(&vma->evict_link, &eviction_list);
@@ -498,25 +511,29 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
if (list_empty(&eviction_list) && list_empty(&locked_eviction_list))
break;
- ret = 0;
/* Unbind locked objects first, before unlocking the eviction_list */
list_for_each_entry_safe(vma, vn, &locked_eviction_list, evict_link) {
__i915_vma_unpin(vma);
- if (ret == 0)
+ if (ret == 0) {
ret = __i915_vma_unbind(vma);
- if (ret != -EINTR) /* "Get me out of here!" */
- ret = 0;
+ if (ret != -EINTR) /* "Get me out of here!" */
+ ret = 0;
+ }
+ if (!dying_vma(vma))
+ i915_gem_object_put(vma->obj);
}
list_for_each_entry_safe(vma, vn, &eviction_list, evict_link) {
__i915_vma_unpin(vma);
- if (ret == 0)
+ if (ret == 0) {
ret = __i915_vma_unbind(vma);
- if (ret != -EINTR) /* "Get me out of here!" */
- ret = 0;
+ if (ret != -EINTR) /* "Get me out of here!" */
+ ret = 0;
+ }
i915_gem_object_unlock(vma->obj);
+ i915_gem_object_put(vma->obj);
}
} while (ret == 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.h b/drivers/gpu/drm/i915/i915_gem_evict.h
index e593c530f9bd..bf0ee0e4fe60 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.h
+++ b/drivers/gpu/drm/i915/i915_gem_evict.h
@@ -11,6 +11,7 @@
struct drm_mm_node;
struct i915_address_space;
struct i915_gem_ww_ctx;
+struct drm_i915_gem_object;
int __must_check i915_gem_evict_something(struct i915_address_space *vm,
struct i915_gem_ww_ctx *ww,
@@ -23,6 +24,7 @@ int __must_check i915_gem_evict_for_node(struct i915_address_space *vm,
struct drm_mm_node *node,
unsigned int flags);
int i915_gem_evict_vm(struct i915_address_space *vm,
- struct i915_gem_ww_ctx *ww);
+ struct i915_gem_ww_ctx *ww,
+ struct drm_i915_gem_object **busy_bo);
#endif /* __I915_GEM_EVICT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 34f0e6c923c2..7d044888ac33 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1599,7 +1599,7 @@ static int __i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
* locked objects when called from execbuf when pinning
* is removed. This would probably regress badly.
*/
- i915_gem_evict_vm(vm, NULL);
+ i915_gem_evict_vm(vm, NULL, NULL);
mutex_unlock(&vm->mutex);
}
} while (1);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 8c6517d29b8e..37068542aafe 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -344,7 +344,7 @@ static int igt_evict_vm(void *arg)
/* Everything is pinned, nothing should happen */
mutex_lock(&ggtt->vm.mutex);
- err = i915_gem_evict_vm(&ggtt->vm, NULL);
+ err = i915_gem_evict_vm(&ggtt->vm, NULL, NULL);
mutex_unlock(&ggtt->vm.mutex);
if (err) {
pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n",
@@ -356,7 +356,7 @@ static int igt_evict_vm(void *arg)
for_i915_gem_ww(&ww, err, false) {
mutex_lock(&ggtt->vm.mutex);
- err = i915_gem_evict_vm(&ggtt->vm, &ww);
+ err = i915_gem_evict_vm(&ggtt->vm, &ww, NULL);
mutex_unlock(&ggtt->vm.mutex);
}
--
2.38.1
The patch titled
Subject: hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
has been added to the -mm mm-nonmm-unstable branch. Its filename is
hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
Date: Mon, 12 Dec 2022 10:16:27 +0800
Syzbot found a kernel BUG in hfs_bnode_put():
kernel BUG at fs/hfs/bnode.c:466!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3634 Comm: kworker/u4:5 Not tainted 6.1.0-rc7-syzkaller-00190-g97ee9d1c1696 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:hfs_bnode_put+0x46f/0x480 fs/hfs/bnode.c:466
Code: 8a 80 ff e9 73 fe ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a0 fe ff ff 48 89 df e8 db 8a 80 ff e9 93 fe ff ff e8 a1 68 2c ff <0f> 0b e8 9a 68 2c ff 0f 0b 0f 1f 84 00 00 00 00 00 55 41 57 41 56
RSP: 0018:ffffc90003b4f258 EFLAGS: 00010293
RAX: ffffffff825e318f RBX: 0000000000000000 RCX: ffff8880739dd7c0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003b4f430 R08: ffffffff825e2d9b R09: ffffed10045157d1
R10: ffffed10045157d1 R11: 1ffff110045157d0 R12: ffff8880228abe80
R13: ffff88807016c000 R14: dffffc0000000000 R15: ffff8880228abe00
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa6ebe88718 CR3: 000000001e93d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hfs_write_inode+0x1bc/0xb40
write_inode fs/fs-writeback.c:1440 [inline]
__writeback_single_inode+0x4d6/0x670 fs/fs-writeback.c:1652
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1878
__writeback_inodes_wb+0x125/0x420 fs/fs-writeback.c:1949
wb_writeback+0x440/0x7b0 fs/fs-writeback.c:2054
wb_check_start_all fs/fs-writeback.c:2176 [inline]
wb_do_writeback fs/fs-writeback.c:2202 [inline]
wb_workfn+0x827/0xef0 fs/fs-writeback.c:2235
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
The BUG_ON() is triggered here:
/* Dispose of resources used by a node */
void hfs_bnode_put(struct hfs_bnode *node)
{
if (node) {
<skipped>
BUG_ON(!atomic_read(&node->refcnt)); <- we have issue here!!!!
<skipped>
}
}
By tracing the refcnt, I found that the node is created by hfs_bmap_alloc()
with refcnt 1. Then the node is used by hfs_btree_write(). A call to
hfs_bnode_get() is missing after finding the node. The issue happens in the
following path:
<alloc>
hfs_bmap_alloc
  hfs_bnode_find
    __hfs_bnode_create       <- allocates a new node with refcnt 1
  hfs_bnode_put              <- decreases the refcnt
<write>
hfs_btree_write
  hfs_bnode_find
    __hfs_bnode_create
      hfs_bnode_findhash     <- finds the node without increasing the refcnt
  hfs_bnode_put              <- triggers the BUG_ON() since the refcnt is 0
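For reference, hfs_bnode_get() is essentially a refcount increment
(sketch, from memory of fs/hfs/bnode.c):

void hfs_bnode_get(struct hfs_bnode *node)
{
        if (node)
                atomic_inc(&node->refcnt);
}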
Link: https://lkml.kernel.org/r/20221212021627.3766829-1-liushixin2@huawei.com
Reported-by: syzbot+5b04b49a7ec7226c7426(a)syzkaller.appspotmail.com
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Cc: Fabio M. De Francesco <fmdefrancesco(a)gmail.com>
Cc: Viacheslav Dubeyko <slava(a)dubeyko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hfs/bnode.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/hfs/bnode.c~hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create
+++ a/fs/hfs/bnode.c
@@ -274,6 +274,7 @@ static struct hfs_bnode *__hfs_bnode_cre
tree->node_hash[hash] = node;
tree->node_hash_cnt++;
} else {
+ hfs_bnode_get(node2);
spin_unlock(&tree->hash_lock);
kfree(node);
wait_event(node2->lock_wq, !test_bit(HFS_BNODE_NEW, &node2->flags));
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create.patch
The patch titled
Subject: hugetlb: really allocate vma lock for all sharable vmas
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-really-allocate-vma-lock-for-all-sharable-vmas.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: really allocate vma lock for all sharable vmas
Date: Mon, 12 Dec 2022 15:50:41 -0800
Commit bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
removed the pmd sharable checks in the vma lock helper routines. However,
it left the functional versions of the helper routines behind #ifdef
CONFIG_ARCH_WANT_HUGE_PMD_SHARE. Therefore, the vma lock is not being
used for sharable vmas on architectures that do not support pmd sharing.
On these architectures, a potential fault/truncation race is exposed that
could leave pages in a hugetlb file past i_size until the file is removed.
Move the functional vma lock helpers outside the ifdef, and remove the
non-functional stubs. Since the vma lock is not just for pmd sharing,
rename the routine __vma_shareable_flags_pmd.
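In other words, the pre-patch structure was roughly the following
(illustrative sketch distilled from the diff below, not the actual code):

#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
/* functional helpers lived here ... */
void hugetlb_vma_lock_read(struct vm_area_struct *vma)
{
        if (__vma_shareable_flags_pmd(vma)) {
                struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

                down_read(&vma_lock->rw_sema);
        }
}
#else
/* ... while architectures without pmd sharing silently got no-op stubs */
void hugetlb_vma_lock_read(struct vm_area_struct *vma)
{
}
#endif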
Link: https://lkml.kernel.org/r/20221212235042.178355-1-mike.kravetz@oracle.com
Fixes: bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 333 +++++++++++++++++++++----------------------------
1 file changed, 148 insertions(+), 185 deletions(-)
--- a/mm/hugetlb.c~hugetlb-really-allocate-vma-lock-for-all-sharable-vmas
+++ a/mm/hugetlb.c
@@ -255,6 +255,152 @@ static inline struct hugepage_subpool *s
return subpool_inode(file_inode(vma->vm_file));
}
+/*
+ * hugetlb vma_lock helper routines
+ */
+static bool __vma_shareable_lock(struct vm_area_struct *vma)
+{
+ return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
+ vma->vm_private_data;
+}
+
+void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_read(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ up_read(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_write(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ up_write(&vma_lock->rw_sema);
+ }
+}
+
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ if (!__vma_shareable_lock(vma))
+ return 1;
+
+ return down_write_trylock(&vma_lock->rw_sema);
+}
+
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ lockdep_assert_held(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_lock_release(struct kref *kref)
+{
+ struct hugetlb_vma_lock *vma_lock = container_of(kref,
+ struct hugetlb_vma_lock, refs);
+
+ kfree(vma_lock);
+}
+
+static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock)
+{
+ struct vm_area_struct *vma = vma_lock->vma;
+
+ /*
+ * vma_lock structure may or not be released as a result of put,
+ * it certainly will no longer be attached to vma so clear pointer.
+ * Semaphore synchronizes access to vma_lock->vma field.
+ */
+ vma_lock->vma = NULL;
+ vma->vm_private_data = NULL;
+ up_write(&vma_lock->rw_sema);
+ kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
+}
+
+static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ __hugetlb_vma_unlock_write_put(vma_lock);
+ }
+}
+
+static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+{
+ /*
+ * Only present in sharable vmas.
+ */
+ if (!vma || !__vma_shareable_lock(vma))
+ return;
+
+ if (vma->vm_private_data) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_write(&vma_lock->rw_sema);
+ __hugetlb_vma_unlock_write_put(vma_lock);
+ }
+}
+
+static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+ struct hugetlb_vma_lock *vma_lock;
+
+ /* Only establish in (flags) sharable vmas */
+ if (!vma || !(vma->vm_flags & VM_MAYSHARE))
+ return;
+
+ /* Should never get here with non-NULL vm_private_data */
+ if (vma->vm_private_data)
+ return;
+
+ vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
+ if (!vma_lock) {
+ /*
+ * If we can not allocate structure, then vma can not
+ * participate in pmd sharing. This is only a possible
+ * performance enhancement and memory saving issue.
+ * However, the lock is also used to synchronize page
+ * faults with truncation. If the lock is not present,
+ * unlikely races could leave pages in a file past i_size
+ * until the file is removed. Warn in the unlikely case of
+ * allocation failure.
+ */
+ pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
+ return;
+ }
+
+ kref_init(&vma_lock->refs);
+ init_rwsem(&vma_lock->rw_sema);
+ vma_lock->vma = vma;
+ vma->vm_private_data = vma_lock;
+}
+
/* Helper that removes a struct file_region from the resv_map cache and returns
* it for use.
*/
@@ -6616,7 +6762,8 @@ bool hugetlb_reserve_pages(struct inode
}
/*
- * vma specific semaphore used for pmd sharing synchronization
+ * vma specific semaphore used for pmd sharing and fault/truncation
+ * synchronization
*/
hugetlb_vma_lock_alloc(vma);
@@ -6872,149 +7019,6 @@ void adjust_range_if_pmd_sharing_possibl
*end = ALIGN(*end, PUD_SIZE);
}
-static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
-{
- return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
- vma->vm_private_data;
-}
-
-void hugetlb_vma_lock_read(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_read(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- up_read(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_lock_write(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_write(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- up_write(&vma_lock->rw_sema);
- }
-}
-
-int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
-{
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- if (!__vma_shareable_flags_pmd(vma))
- return 1;
-
- return down_write_trylock(&vma_lock->rw_sema);
-}
-
-void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- lockdep_assert_held(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_lock_release(struct kref *kref)
-{
- struct hugetlb_vma_lock *vma_lock = container_of(kref,
- struct hugetlb_vma_lock, refs);
-
- kfree(vma_lock);
-}
-
-static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock)
-{
- struct vm_area_struct *vma = vma_lock->vma;
-
- /*
- * vma_lock structure may or not be released as a result of put,
- * it certainly will no longer be attached to vma so clear pointer.
- * Semaphore synchronizes access to vma_lock->vma field.
- */
- vma_lock->vma = NULL;
- vma->vm_private_data = NULL;
- up_write(&vma_lock->rw_sema);
- kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
-}
-
-static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- __hugetlb_vma_unlock_write_put(vma_lock);
- }
-}
-
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
-{
- /*
- * Only present in sharable vmas.
- */
- if (!vma || !__vma_shareable_flags_pmd(vma))
- return;
-
- if (vma->vm_private_data) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_write(&vma_lock->rw_sema);
- __hugetlb_vma_unlock_write_put(vma_lock);
- }
-}
-
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
-{
- struct hugetlb_vma_lock *vma_lock;
-
- /* Only establish in (flags) sharable vmas */
- if (!vma || !(vma->vm_flags & VM_MAYSHARE))
- return;
-
- /* Should never get here with non-NULL vm_private_data */
- if (vma->vm_private_data)
- return;
-
- vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
- if (!vma_lock) {
- /*
- * If we can not allocate structure, then vma can not
- * participate in pmd sharing. This is only a possible
- * performance enhancement and memory saving issue.
- * However, the lock is also used to synchronize page
- * faults with truncation. If the lock is not present,
- * unlikely races could leave pages in a file past i_size
- * until the file is removed. Warn in the unlikely case of
- * allocation failure.
- */
- pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
- return;
- }
-
- kref_init(&vma_lock->refs);
- init_rwsem(&vma_lock->rw_sema);
- vma_lock->vma = vma;
- vma->vm_private_data = vma_lock;
-}
-
/*
* Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
* and returns the corresponding pte. While this is not necessary for the
@@ -7103,47 +7107,6 @@ int huge_pmd_unshare(struct mm_struct *m
#else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
-void hugetlb_vma_lock_read(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_lock_write(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
-{
-}
-
-int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
-{
- return 1;
-}
-
-void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_lock_release(struct kref *kref)
-{
-}
-
-static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
-{
-}
-
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
-{
-}
-
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
-{
-}
-
pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-really-allocate-vma-lock-for-all-sharable-vmas.patch
The patch titled
Subject: mm/uffd: fix pte marker when fork() without fork event
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-uffd-fix-pte-marker-when-fork-without-fork-event.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/uffd: fix pte marker when fork() without fork event
Date: Wed, 14 Dec 2022 15:04:52 -0500
Patch series "mm: Fixes on pte markers".
Patch 1 resolves the syzkaller report from Pengfei.
Patch 2 further hardens pte markers when used with the recent swapin error
markers. The major case is that we should persist a swapin error marker
after fork(), so the child won't read a corrupted page.
This patch (of 2):
When fork()ing, dst_vma is not guaranteed to have VM_UFFD_WP even if src
has it and has a pte marker installed. The warning, along with the comment,
is improper. The right thing is to inherit the pte marker when needed, or
keep the dst pte empty.
A vague guess is that this happened by accident when a prior patch
introduced src/dst vma into this helper while the uffd-wp feature was being
developed, and I probably messed up the rebase, since if we replace dst_vma
with src_vma both the warning and the comment make sense.
Hugetlb does exactly the right thing here (copy_hugetlb_page_range()). Fix
the general path.
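For comparison, the hugetlb side handles this branch roughly as follows
(sketch from memory; details approximate):

/* copy_hugetlb_page_range(), sketch from memory */
} else if (unlikely(is_pte_marker(entry))) {
        /* Copy the marker only if the dst vma has uffd-wp enabled */
        if (userfaultfd_wp(dst_vma))
                set_huge_pte_at(dst, addr, dst_pte, entry);
}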
Reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_pag…
Bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=216808
Link: https://lkml.kernel.org/r/20221214200453.1772655-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221214200453.1772655-2-peterx@redhat.com
Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Pengfei Xu <pengfei.xu(a)intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 5.19+
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
--- a/mm/memory.c~mm-uffd-fix-pte-marker-when-fork-without-fork-event
+++ a/mm/memory.c
@@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *ds
return -EBUSY;
return -ENOENT;
} else if (is_pte_marker_entry(entry)) {
- /*
- * We're copying the pgtable should only because dst_vma has
- * uffd-wp enabled, do sanity check.
- */
- WARN_ON_ONCE(!userfaultfd_wp(dst_vma));
- set_pte_at(dst_mm, addr, dst_pte, pte);
+ if (userfaultfd_wp(dst_vma))
+ set_pte_at(dst_mm, addr, dst_pte, pte);
return 0;
}
if (!userfaultfd_wp(dst_vma))
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-uffd-fix-pte-marker-when-fork-without-fork-event.patch
mm-fix-a-few-rare-cases-of-using-swapin-error-pte-marker.patch
The patch titled
Subject: kmsan: export kmsan_handle_urb
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-export-kmsan_handle_urb.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kmsan: export kmsan_handle_urb
Date: Thu, 15 Dec 2022 17:26:57 +0100
USB support can be in a loadable module, and this causes a link failure
with KMSAN:
ERROR: modpost: "kmsan_handle_urb" [drivers/usb/core/usbcore.ko] undefined!
Export the symbol so it can be used by this module.
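For context, usbcore calls this hook at URB submission time, roughly as
follows (sketch, from memory; the exact call site lives in
drivers/usb/core/urb.c):

/* in usb_submit_urb() -- sketch, from memory */
kmsan_handle_urb(urb, is_out);  /* checks/unpoisons urb->transfer_buffer */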
Link: https://lkml.kernel.org/r/20221215162710.3802378-1-arnd@kernel.org
Fixes: 553a80188a5d ("kmsan: handle memory sent to/from USB")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/hooks.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kmsan/hooks.c~kmsan-export-kmsan_handle_urb
+++ a/mm/kmsan/hooks.c
@@ -260,6 +260,7 @@ void kmsan_handle_urb(const struct urb *
urb->transfer_buffer_length,
/*checked*/ false);
}
+EXPORT_SYMBOL_GPL(kmsan_handle_urb);
static void kmsan_handle_dma_page(const void *addr, size_t size,
enum dma_data_direction dir)
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kmsan-include-linux-vmalloch.patch
kmsan-export-kmsan_handle_urb.patch
The patch titled
Subject: kmsan: include linux/vmalloc.h
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-include-linux-vmalloch.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kmsan: include linux/vmalloc.h
Date: Thu, 15 Dec 2022 17:30:17 +0100
This is needed for the vmap/vunmap declarations:
mm/kmsan/kmsan_test.c:316:9: error: implicit declaration of function 'vmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:316:29: error: use of undeclared identifier 'VM_MAP'
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:322:3: error: implicit declaration of function 'vunmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vunmap(vbuf);
^
Link: https://lkml.kernel.org/r/20221215163046.4079767-1-arnd@kernel.org
Fixes: 8ed691b02ade ("kmsan: add tests for KMSAN")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/kmsan_test.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kmsan/kmsan_test.c~kmsan-include-linux-vmalloch
+++ a/mm/kmsan/kmsan_test.c
@@ -22,6 +22,7 @@
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/tracepoint.h>
+#include <linux/vmalloc.h>
#include <trace/events/printk.h>
static DEFINE_PER_CPU(int, per_cpu_var);
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kmsan-include-linux-vmalloch.patch
The patch titled
Subject: mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
Date: Thu, 15 Dec 2022 14:46:21 -0500
When encountering any vma in the range with a policy other than MPOL_BIND or
MPOL_PREFERRED_MANY, an error is returned without issuing an mpol_put() on
the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
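The shape of the leak, simplified from the syscall's vma loop (see the
hunk below for the real context; this sketch elides some error handling):

/* SYSCALL_DEFINE4(set_mempolicy_home_node, ...), simplified */
for (; vma && vma->vm_start < end; vma = vma->vm_next) {
        new = mpol_dup(vma_policy(vma));
        if (IS_ERR(new)) {
                err = PTR_ERR(new);
                break;
        }
        if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
                err = -EOPNOTSUPP;
                break;          /* before the fix: 'new' leaked here */
        }
        new->home_node = home_node;
        err = mbind_range(mm, vma->vm_start, vma->vm_end, new);
        mpol_put(new);
        if (err)
                break;
}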
Link: https://lkml.kernel.org/r/20221215194621.202816-1-mathieu.desnoyers@efficio…
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mempolicy.c~mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call
+++ a/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call.patch
When encountering any vma in the range with a policy other than MPOL_BIND
or MPOL_PREFERRED_MANY, an error is returned without issuing an mpol_put()
on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <linux-api(a)vger.kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org # 5.17+
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..02c8a712282f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
--
2.25.1
When encountering any vma in the range with a policy other than MPOL_BIND
or MPOL_PREFERRED_MANY, an error is returned without issuing an mpol_put()
on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <linux-api(a)vger.kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org # 5.17+
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..02c8a712282f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
--
2.25.1
The HP EliteBook 865 supports both the AMD GUID with _REV 2 and the
Microsoft GUID with _REV 0. Both have very similar code, but the AMD
GUID path has a special workaround specific to a problem with
spurious wakeups on systems with Qualcomm WLAN.
This is believed to be a bug in the Qualcomm WLAN F/W (it doesn't
affect any other WLAN H/W). If that firmware is fixed, this
quirk can be dropped.
Cc: stable(a)vger.kernel.org # 6.1
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/acpi/x86/s2idle.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index 5350c73564b6..422415cb14f4 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -401,6 +401,13 @@ static const struct acpi_device_id amd_hid_ids[] = {
{}
};
+static int lps0_prefer_amd(const struct dmi_system_id *id)
+{
+ pr_debug("Using AMD GUID w/ _REV 2.\n");
+ rev_id = 2;
+ return 0;
+}
+
static int lps0_prefer_microsoft(const struct dmi_system_id *id)
{
pr_debug("Preferring Microsoft GUID.\n");
@@ -462,6 +469,19 @@ static const struct dmi_system_id s2idle_dmi_table[] __initconst = {
DMI_MATCH(DMI_PRODUCT_NAME, "ROG Flow X16 GV601"),
},
},
+ {
+ /*
+ * AMD Rembrandt based HP EliteBook 835/845/865 G9
+ * Contains specialized AML in AMD/_REV 2 path to avoid
+ * triggering a bug in Qualcomm WLAN firmware. This may be
+ * removed in the future if that firmware is fixed.
+ */
+ .callback = lps0_prefer_amd,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "HP"),
+ DMI_MATCH(DMI_BOARD_NAME, "8990"),
+ },
+ },
{}
};
--
2.34.1
Hello:
This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni(a)redhat.com>:
On Wed, 14 Dec 2022 10:51:18 +0000 you wrote:
> This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch
> device to config mode" during unbind.
>
> We are doing register access after pm_runtime_put_sync().
>
> We usually do cleanup in the reverse order of init. Currently in
> remove(), the pm_runtime_put_sync() call is not in reverse order.
>
> [...]
Here is the summary with links:
- [net,v2] ravb: Fix "failed to switch device to config mode" message during unbind
https://git.kernel.org/netdev/net/c/c72a7e42592b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
When updating the operating mode as part of regulator enable, the caller
has already locked the regulator tree and drms_uA_update() must not try
to do the same in order not to trigger a deadlock.
The lock inversion is reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-20221215 #142 Not tainted
------------------------------------------------------
udevd/154 is trying to acquire lock:
ffffc11f123d7e50 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x280
but task is already holding lock:
ffff80000e4c36e8 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable+0x34/0x80
which lock already depends on the new lock.
...
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_acquire);
                               lock(regulator_list_mutex);
                               lock(regulator_ww_class_acquire);
  lock(regulator_list_mutex);
*** DEADLOCK ***
just before probe of a Qualcomm UFS controller (occasionally) deadlocks
when enabling one of its regulators.
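A minimal sketch of the distinction (assuming mainline naming; not the
exact implementations): regulator_get_voltage() takes the dependent locks
itself, while the _rdev variant expects the caller to already hold them.

/* Sketch: the locked vs. caller-locked variants */
int regulator_get_voltage(struct regulator *regulator)
{
        struct ww_acquire_ctx ww_ctx;
        int ret;

        /* takes regulator_list_mutex via the dependent-lock helpers */
        regulator_lock_dependent(regulator->rdev, &ww_ctx);
        ret = regulator_get_voltage_rdev(regulator->rdev);
        regulator_unlock_dependent(regulator->rdev, &ww_ctx);
        return ret;
}

Since drms_uA_update() runs with those locks already held, it must call
regulator_get_voltage_rdev() directly, as the hunk below does.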
Fixes: 9243a195be7a ("regulator: core: Change voltage setting path")
Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/regulator/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 729c45393803..ae69e493913d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1002,7 +1002,7 @@ static int drms_uA_update(struct regulator_dev *rdev)
/* get input voltage */
input_uV = 0;
if (rdev->supply)
- input_uV = regulator_get_voltage(rdev->supply);
+ input_uV = regulator_get_voltage_rdev(rdev->supply->rdev);
if (input_uV <= 0)
input_uV = rdev->constraints->input_uV;
--
2.37.4
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
46307fd6e27a ("cgroup: Reorganize css_set_lock and kernfs path processing")
4534dee94105 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
e210a89f5b07 ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes")
be288169712f ("cgroup: reduce dependency on cgroup_mutex")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46307fd6e27a3f678a1678b02e667678c22aa8cc Mon Sep 17 00:00:00 2001
From: Michal Koutný <mkoutny(a)suse.com>
Date: Mon, 10 Oct 2022 10:29:18 +0200
Subject: [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.
We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).
Note this reasoning won't hold for root cgroups in v1 hierarchies,
therefore create a special-cased helper function just for the default
hierarchy.
Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 764bdd5fd8d1..ecf409e3c3a7 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,6 +1392,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
cgroup_free_root(root);
}
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
@@ -1403,6 +1406,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
res_cgroup = cset->dfl_cgrp;
} else {
struct cgrp_cset_link *link;
+ lockdep_assert_held(&css_set_lock);
list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
@@ -1414,6 +1418,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
}
}
+ BUG_ON(!res_cgroup);
return res_cgroup;
}
@@ -1436,23 +1441,36 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
rcu_read_unlock();
- BUG_ON(!res);
return res;
}
+/*
+ * Look up cgroup associated with current task's cgroup namespace on the default
+ * hierarchy.
+ *
+ * Unlike current_cgns_cgroup_from_root(), this doesn't need locks:
+ * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu
+ * pointers.
+ * - css_set_lock is not needed because we just read cset->dfl_cgrp.
+ * - As a bonus returned cgrp is pinned with the current because it cannot
+ * switch cgroup_ns asynchronously.
+ */
+static struct cgroup *current_cgns_cgroup_dfl(void)
+{
+ struct css_set *cset;
+
+ cset = current->nsproxy->cgroup_ns->root_cset;
+ return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
+}
+
/* look up cgroup associated with given css_set on the specified hierarchy */
static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
- struct cgroup *res = NULL;
-
lockdep_assert_held(&cgroup_mutex);
lockdep_assert_held(&css_set_lock);
- res = __cset_cgroup_from_root(cset, root);
-
- BUG_ON(!res);
- return res;
+ return __cset_cgroup_from_root(cset, root);
}
/*
@@ -6105,9 +6123,7 @@ struct cgroup *cgroup_get_from_id(u64 id)
if (!cgrp)
return ERR_PTR(-ENOENT);
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
- spin_unlock_irq(&css_set_lock);
+ root_cgrp = current_cgns_cgroup_dfl();
if (!cgroup_is_descendant(cgrp, root_cgrp)) {
cgroup_put(cgrp);
return ERR_PTR(-ENOENT);
@@ -6686,10 +6702,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
struct cgroup *cgrp = ERR_PTR(-ENOENT);
struct cgroup *root_cgrp;
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
+ root_cgrp = current_cgns_cgroup_dfl();
kn = kernfs_walk_and_get(root_cgrp->kn, path);
- spin_unlock_irq(&css_set_lock);
if (!kn)
goto out;
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
46307fd6e27a ("cgroup: Reorganize css_set_lock and kernfs path processing")
4534dee94105 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46307fd6e27a3f678a1678b02e667678c22aa8cc Mon Sep 17 00:00:00 2001
From: Michal Koutný <mkoutny(a)suse.com>
Date: Mon, 10 Oct 2022 10:29:18 +0200
Subject: [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.
We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).
Note this reasoning won't hold for root cgroups in v1 hierarchies,
therefore create a special-cased helper function just for the default
hierarchy.
Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 764bdd5fd8d1..ecf409e3c3a7 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,6 +1392,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
cgroup_free_root(root);
}
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
@@ -1403,6 +1406,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
res_cgroup = cset->dfl_cgrp;
} else {
struct cgrp_cset_link *link;
+ lockdep_assert_held(&css_set_lock);
list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
@@ -1414,6 +1418,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
}
}
+ BUG_ON(!res_cgroup);
return res_cgroup;
}
@@ -1436,23 +1441,36 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
rcu_read_unlock();
- BUG_ON(!res);
return res;
}
+/*
+ * Look up cgroup associated with current task's cgroup namespace on the default
+ * hierarchy.
+ *
+ * Unlike current_cgns_cgroup_from_root(), this doesn't need locks:
+ * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu
+ * pointers.
+ * - css_set_lock is not needed because we just read cset->dfl_cgrp.
+ * - As a bonus returned cgrp is pinned with the current because it cannot
+ * switch cgroup_ns asynchronously.
+ */
+static struct cgroup *current_cgns_cgroup_dfl(void)
+{
+ struct css_set *cset;
+
+ cset = current->nsproxy->cgroup_ns->root_cset;
+ return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
+}
+
/* look up cgroup associated with given css_set on the specified hierarchy */
static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
- struct cgroup *res = NULL;
-
lockdep_assert_held(&cgroup_mutex);
lockdep_assert_held(&css_set_lock);
- res = __cset_cgroup_from_root(cset, root);
-
- BUG_ON(!res);
- return res;
+ return __cset_cgroup_from_root(cset, root);
}
/*
@@ -6105,9 +6123,7 @@ struct cgroup *cgroup_get_from_id(u64 id)
if (!cgrp)
return ERR_PTR(-ENOENT);
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
- spin_unlock_irq(&css_set_lock);
+ root_cgrp = current_cgns_cgroup_dfl();
if (!cgroup_is_descendant(cgrp, root_cgrp)) {
cgroup_put(cgrp);
return ERR_PTR(-ENOENT);
@@ -6686,10 +6702,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
struct cgroup *cgrp = ERR_PTR(-ENOENT);
struct cgroup *root_cgrp;
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
+ root_cgrp = current_cgns_cgroup_dfl();
kn = kernfs_walk_and_get(root_cgrp->kn, path);
- spin_unlock_irq(&css_set_lock);
if (!kn)
goto out;
Dear Stable,
[NB: Re-poking Stable with the correct contact address this time! :)]
> > > area_cache_get() is used to distribute cache->area and set cache->id,
> > > and if cache->id is not 0 and cache->area->kref refcount is 0, it will
> > > release the cache->area by nfp_cpp_area_release(). area_cache_get()
> > > set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().
> > >
> > > But if area_init() or nfp_cpp_area_acquire() fails, the cache->id
> > > is already set but the refcount is not increased as expected. At this
> > > time, calling the nfp_cpp_area_release() will cause use-after-free.
> > >
> > > To avoid the use-after-free, set cache->id after area_init() and
> > > nfp_cpp_area_acquire() complete successfully.
> > >
> > > Note: This vulnerability is triggerable by providing an emulated
> > > device with a specific configuration.
> > >
> > > BUG: KASAN: use-after-free in nfp6000_area_init (/home/user/Kernel/v5.19
> > > /x86_64/src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
> > > Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1
> > >
> > > Call Trace:
> > > <TASK>
> > > nfp6000_area_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
> > > area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884)
> > >
> > > Allocated by task 1:
> > > nfp_cpp_area_alloc_with_name (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303)
> > > nfp_cpp_area_cache_add (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802)
> > > nfp6000_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp6000_pcie.c:1230)
> > > nfp_cpp_from_operations (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215)
> > > nfp_pci_probe (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfp_main.c:744)
> > >
> > > Freed by task 1:
> > > kfree (/home/user/Kernel/v5.19/x86_64/src/mm/slub.c:4562)
> > > area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873)
> > > nfp_cpp_read (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp_cppcore.c:924 /home/user/Kernel/v5.19/x86_64
> > > /src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973)
> > > nfp_cpp_readl (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp_cpplib.c:48)
> > >
> > > Signed-off-by: Jialiang Wang <wangjialiang0806(a)163.com>
> >
> > Any reason why this doesn't have a Fixes: tag applied and/or didn't
> > get sent to Stable?
> >
> > Looks as if this needs to go back as far as v4.19.
> >
> > Fixes: 4cb584e0ee7df ("nfp: add CPP access core")
> >
> > commit 02e1a114fdb71e59ee6770294166c30d437bf86a upstream.
Would you be able to take this with the information provided please?
--
Lee Jones [李琼斯]
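As a hedged illustration of the reordering described in the quoted report, here is a simplified C sketch. The cache/area field and helper names come from the report itself; the argument list and error label are placeholders, not the actual driver code:

/*
 * Sketch only, not the upstream diff. The point is the ordering:
 * cache->id must not be published until both area_init() and
 * nfp_cpp_area_acquire() have succeeded; otherwise a later
 * nfp_cpp_area_release() can act on a freed area (use-after-free).
 */
err = cpp->op->area_init(cache->area /* , placeholder args */);
if (err < 0)
	goto err_release;

err = nfp_cpp_area_acquire(cache->area);
if (err < 0)
	goto err_release;

/* Only now mark the cache entry as owning this id. */
cache->id = id;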
From: Miklos Szeredi <mszeredi(a)redhat.com>
commit df8629af293493757beccac2d3168fe5a315636e upstream.
Always revalidate the dentry when an exclusive create (O_EXCL) is
requested. Failure to do so may result in EEXIST even if the file only
exists in the cache and not in the filesystem.
The atomic nature of O_EXCL mandates that the cached state should be
ignored and existence verified anew.
Reported-by: Ken Schalk <kschalk(a)nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Wu Bo <bo.wu(a)vivo.com>
---
fs/fuse/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 8e95a75a4559..80a9e50392a0 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -205,7 +205,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
if (inode && fuse_is_bad(inode))
goto invalid;
else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
- (flags & LOOKUP_REVAL)) {
+ (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) {
struct fuse_entry_out outarg;
FUSE_ARGS(args);
struct fuse_forget_link *forget;
--
2.35.3
Greg,
The recent history of the copy_file_range() API is somewhat convoluted.
The API changes are documented in copy_file_range(2) man page.
I've just posted a man page update patch [1] to fix some wrong kernel
version references in the man page.
The problem is that it took many kernel releases to get reports on the
regression from v5.3 and yet more releases (v5.12..v5.19) to work on
the solution and get it merged.
This situation leads to confusion among users as can be seen in [2].
You've already picked the patch [1/2] to 5.15.y and I sent you a
request to pick patch [2/2] (from v6.1) as well.
Following are backports of the two patches to 5.10.y, which I verified
with the relevant test in fstests.
Backport to 5.4.y is more challenging because Darrick did a lot of
cleanup in that area between v5.4..v5.10, so I did not do it.
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/20221213120834.948163-1-amir73il@gmai…
[2] https://bugzilla.kernel.org/show_bug.cgi?id=216800
Amir Goldstein (2):
vfs: fix copy_file_range() regression in cross-fs copies
vfs: fix copy_file_range() averts filesystem freeze protection
fs/nfsd/vfs.c | 8 ++++-
fs/read_write.c | 90 +++++++++++++++++++++++++++++++++---------------------
include/linux/fs.h | 8 +++++
3 files changed, 70 insertions(+), 36 deletions(-)
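For context, a minimal userspace sketch of the API whose semantics these backports fix; it uses only the standard glibc (>= 2.27) copy_file_range() wrapper, nothing here is invented. The cross-filesystem behavior of exactly this call is what regressed in v5.3:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd_in, fd_out;
	off_t len;
	ssize_t ret;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd_in = open(argv[1], O_RDONLY);
	if (fd_in < 0) {
		perror("open src");
		return EXIT_FAILURE;
	}

	/* Determine source length, then rewind. */
	len = lseek(fd_in, 0, SEEK_END);
	lseek(fd_in, 0, SEEK_SET);

	fd_out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd_out < 0) {
		perror("open dst");
		return EXIT_FAILURE;
	}

	/* The kernel may copy less than requested, so loop. */
	while (len > 0) {
		ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
		if (ret < 0) {
			perror("copy_file_range");
			return EXIT_FAILURE;
		}
		if (ret == 0)	/* unexpected EOF */
			break;
		len -= ret;
	}

	close(fd_in);
	close(fd_out);
	return EXIT_SUCCESS;
}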
--
2.16.5
v5.11 changed the blkdev lookup mechanism completely with commit
22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"),
and a small part of that change is to unhash the part's bdev inode when
deleting the partition. It turns out this kind of change does fix one
nasty issue in case of BLOCK_EXT_MAJOR:
1) when one partition is deleted & closed, disk_put_part() is always
called before bdput(bdev), see blkdev_put(); so the part's devt can
be freed & re-used before the inode is dropped
2) then a new partition with the same devt can be created just before the
inode in 1) is dropped, and the old inode/bdev structure from 1) is
re-used for this new partition, causing use-after-free and
kernel panic.
It isn't possible to backport the whole big patchset of "merge struct
block_device and struct hd_struct v4" to address this issue.
https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/
So fix it by unhashing the part's bdev inode in delete_partition(); this
is aligned with v5.11+'s behavior.
Backported from the following 5.10.y commit:
5f2f77560591 ("block: unhash blkdev part inode when the part is deleted")
Reported-by: Shiwei Cui <cuishw(a)inspur.com>
Tested-by: Shiwei Cui <cuishw(a)inspur.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jan Kara <jack(a)suse.cz>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/partition-generic.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 298c05f8b5e3..434c122cb958 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -254,6 +254,7 @@ void delete_partition(struct gendisk *disk, int partno)
{
struct disk_part_tbl *ptbl = disk->part_tbl;
struct hd_struct *part;
+ struct block_device *bdev;
if (partno >= ptbl->len)
return;
@@ -267,6 +268,11 @@ void delete_partition(struct gendisk *disk, int partno)
kobject_put(part->holder_dir);
device_del(part_to_dev(part));
+ bdev = bdget(part_devt(part));
+ if (bdev) {
+ remove_inode_hash(bdev->bd_inode);
+ bdput(bdev);
+ }
hd_struct_kill(part);
}
--
2.38.1
A bad bug in clang's implementation of -fzero-call-used-regs can result
in NULL pointer dereferences (see the links above the check for more
information). Restrict CONFIG_CC_HAS_ZERO_CALL_USED_REGS to either a
supported GCC version or a clang newer than 15.0.6; this catches both a
theoretical 15.0.7 and the upcoming 16.0.0, both of which will have the
bug fixed.
Cc: stable(a)vger.kernel.org # v5.15+
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
security/Kconfig.hardening | 3 +++
1 file changed, 3 insertions(+)
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index d766b7d0ffd1..53baa95cb644 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -257,6 +257,9 @@ config INIT_ON_FREE_DEFAULT_ON
config CC_HAS_ZERO_CALL_USED_REGS
def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
+ # https://github.com/ClangBuiltLinux/linux/issues/1766
+ # https://github.com/llvm/llvm-project/issues/59242
+ depends on !CC_IS_CLANG || CLANG_VERSION > 150006
config ZERO_CALL_USED_REGS
bool "Enable register zeroing on function exit"
base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
--
2.39.0
Commit c9bfcb315104 ("spi_mpc83xx: much improved driver") modified
the driver to not perform speed changes while chipselect is active.
But those changes were lost with the conversion to transfer_one.
The previous implementation allowed speed changes during message
transfer when the cs_change flag was set.
For the time being, core SPI does not provide any feature to change
speed while chipselect is off, so do not allow any speed change during
message transfer, and perform the transfer setup in prepare_message
in order to set the correct speed while chipselect is still off.
Reported-by: Herve Codina <herve.codina(a)bootlin.com>
Fixes: 64ca1a034f00 ("spi: fsl_spi: Convert to transfer_one")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/spi/spi-fsl-spi.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 731624f157fc..93152144fd2e 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -333,13 +333,26 @@ static int fsl_spi_prepare_message(struct spi_controller *ctlr,
{
struct mpc8xxx_spi *mpc8xxx_spi = spi_controller_get_devdata(ctlr);
struct spi_transfer *t;
+ struct spi_transfer *first;
+
+ first = list_first_entry(&m->transfers, struct spi_transfer,
+ transfer_list);
/*
* In CPU mode, optimize large byte transfers to use larger
* bits_per_word values to reduce number of interrupts taken.
+ *
+ * Some glitches can appear on the SPI clock when the mode changes.
+ * Check that there is no speed change during the transfer and set it up
+ * now to change the mode without having a chip-select asserted.
*/
- if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
- list_for_each_entry(t, &m->transfers, transfer_list) {
+ list_for_each_entry(t, &m->transfers, transfer_list) {
+ if (t->speed_hz != first->speed_hz) {
+ dev_err(&m->spi->dev,
+ "speed_hz cannot change during message.\n");
+ return -EINVAL;
+ }
+ if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
if (t->len < 256 || t->bits_per_word != 8)
continue;
if ((t->len & 3) == 0)
@@ -348,7 +361,7 @@ static int fsl_spi_prepare_message(struct spi_controller *ctlr,
t->bits_per_word = 16;
}
}
- return 0;
+ return fsl_spi_setup_transfer(m->spi, first);
}
static int fsl_spi_transfer_one(struct spi_controller *controller,
--
2.38.1
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
[ Upstream commit fac35ba763ed07ba93154c95ffc0c4a55023707f ]
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by the move_pages() syscall under the lock; however, another thread B
can migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page. If thread A also wants to do page
migration, then a data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: tag.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.16620175…
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[5.4: Fixup contextual diffs before pin_user_pages()]
Signed-off-by: Samuel Mendoza-Jonas <samjonas(a)amazon.com>
---
v2: Fix up stray out label that is now unused.
include/linux/hugetlb.h | 6 +++---
mm/gup.c | 13 ++++++++++++-
mm/hugetlb.c | 30 +++++++++++++++---------------
3 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cef70d6e1657..c7507135c2a0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -127,8 +127,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags);
+struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address,
+ int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
@@ -175,7 +175,7 @@ static inline void hugetlb_show_meminfo(void)
{
}
#define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL
-#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pmd_pte(vma, addr, flags) NULL
#define follow_huge_pud(mm, addr, pud, flags) NULL
#define follow_huge_pgd(mm, addr, pgd, flags) NULL
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
diff --git a/mm/gup.c b/mm/gup.c
index 3ef769529548..98bd5e0a51c8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -188,6 +188,17 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
spinlock_t *ptl;
pte_t *ptep, pte;
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ page = follow_huge_pmd_pte(vma, address, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
+ }
+
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
@@ -333,7 +344,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd_pte(vma, address, flags);
if (page)
return page;
return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2126d78f053d..e4478b62d0f7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5157,30 +5157,30 @@ follow_huge_pd(struct vm_area_struct *vma,
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags)
+follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags)
{
+ struct hstate *h = hstate_vma(vma);
+ struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
spinlock_t *ptl;
- pte_t pte;
+ pte_t *ptep, pte;
+
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- /*
- * make sure that the address range covered by this pmd is not
- * unmapped from other threads.
- */
- if (!pmd_huge(*pmd))
- goto out;
- pte = huge_ptep_get((pte_t *)pmd);
+ ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ if (!ptep)
+ return NULL;
+
+ ptl = huge_pte_lock(h, mm, ptep);
+ pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pte_page(pte) +
+ ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
if (flags & FOLL_GET)
get_page(page);
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ __migration_entry_wait(mm, ptep, ptl);
goto retry;
}
/*
@@ -5188,7 +5188,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
* follow_page_mask().
*/
}
-out:
+
spin_unlock(ptl);
return page;
}
--
2.25.1
Currently, the max_loop commandline argument can be used to specify how
many loop block devices are created at init time. If it is not
specified on the commandline, CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block
devices will be created.
The max_loop commandline argument can be used to override the value of
CONFIG_BLK_DEV_LOOP_MIN_COUNT. However, when max_loop is set to 0
through the commandline, the current logic treats it as if it had not
been set, and creates CONFIG_BLK_DEV_LOOP_MIN_COUNT devices anyway.
Fix this by starting max_loop off as set to CONFIG_BLK_DEV_LOOP_MIN_COUNT.
This preserves the intended behavior of creating
CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block devices if the max_loop
commandline parameter is not specified, and allowing max_loop to
be respected for all values, including 0.
This allows environments that can create all of their required loop
block devices on demand to not have to unnecessarily preallocate loop
block devices.
Fixes: 732850827450 ("remove artificial software max_loop limit")
Cc: stable(a)vger.kernel.org
Cc: Ken Chen <kenchen(a)google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
---
drivers/block/loop.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
This is a resend because I misspelled the address for
stable(a)vger.kernel.org the first time.
--Isaac
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ad92192c7d61..d12d3d171ec4 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1773,7 +1773,16 @@ static const struct block_device_operations lo_fops = {
/*
* And now the modules code and kernel interface.
*/
-static int max_loop;
+
+/*
+ * If max_loop is specified, create that many devices upfront.
+ * This also becomes a hard limit. If max_loop is not specified,
+ * create CONFIG_BLK_DEV_LOOP_MIN_COUNT loop devices at module
+ * init time. Loop devices can be requested on-demand with the
+ * /dev/loop-control interface, or be instantiated by accessing
+ * a 'dead' device node.
+ */
+static int max_loop = CONFIG_BLK_DEV_LOOP_MIN_COUNT;
module_param(max_loop, int, 0444);
MODULE_PARM_DESC(max_loop, "Maximum number of loop devices");
module_param(max_part, int, 0444);
@@ -2181,7 +2190,7 @@ MODULE_ALIAS("devname:loop-control");
static int __init loop_init(void)
{
- int i, nr;
+ int i;
int err;
part_shift = 0;
@@ -2209,19 +2218,6 @@ static int __init loop_init(void)
goto err_out;
}
- /*
- * If max_loop is specified, create that many devices upfront.
- * This also becomes a hard limit. If max_loop is not specified,
- * create CONFIG_BLK_DEV_LOOP_MIN_COUNT loop devices at module
- * init time. Loop devices can be requested on-demand with the
- * /dev/loop-control interface, or be instantiated by accessing
- * a 'dead' device node.
- */
- if (max_loop)
- nr = max_loop;
- else
- nr = CONFIG_BLK_DEV_LOOP_MIN_COUNT;
-
err = misc_register(&loop_misc);
if (err < 0)
goto err_out;
@@ -2233,7 +2229,7 @@ static int __init loop_init(void)
}
/* pre-create number of devices given by config or max_loop */
- for (i = 0; i < nr; i++)
+ for (i = 0; i < max_loop; i++)
loop_add(i);
printk(KERN_INFO "loop: module loaded\n");
--
2.39.0.rc1.256.g54fd8350bd-goog
Commit 74e7e1efdad4 ("xen/netback: don't call kfree_skb() with
interrupts disabled") fixes a deadlock in the Linux netback driver. It
seems to be a good candidate for the stable trees. The patch didn't
apply cleanly to the 5.15 kernel due to a difference in function
prototypes in drivers/net/xen-netback/common.h.
Juergen Gross (1):
xen/netback: don't call kfree_skb() with interrupts disabled
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/interface.c | 6 ++++--
drivers/net/xen-netback/rx.c | 8 +++++---
3 files changed, 10 insertions(+), 6 deletions(-)
--
2.39.0.rc1.256.g54fd8350bd-goog
usb_kill_urb() guarantees that all the handlers have finished when it
returns, but does not protect against threads that might still be
handling the urb asynchronously.
For UVC, the function uvc_ctrl_status_event_async() takes care of
control changes asynchronously.
If the code is executed in the following order:
CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_ctrl_status_event_work()
uvc_status_start() -> FAIL
Then uvc_status_start will keep failing and this error will be shown:
<4>[ 5.540139] URB 0000000000000000 submitted while active
drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
Let's improve the current situation by not resubmitting the urb if
we are stopping the status event, and by processing the queued work
(if any) during stop.
CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_status_start()
uvc_ctrl_status_event_work() -> FAIL
Hopefully, with the usb layer protection this should be enough to cover
all the cases.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Reviewed-by: Yunke Cao <yunkec(a)chromium.org>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
uvc: Fix race condition on uvc
Make sure that all the async work is finished when we stop the status urb.
To: Yunke Cao <yunkec(a)chromium.org>
To: Sergey Senozhatsky <senozhatsky(a)chromium.org>
To: Max Staudt <mstaudt(a)google.com>
To: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
Changes in v3:
- Remove the patch for dev->status; it makes more sense in another series,
  and makes the zero-day bot less nervous.
- Update reviewed-by (thanks Yunke!).
- Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
Changes in v2:
- Add a patch to not alloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
---
drivers/media/usb/uvc/uvc_ctrl.c | 3 +++
drivers/media/usb/uvc/uvc_status.c | 6 ++++++
drivers/media/usb/uvc/uvcvideo.h | 1 +
3 files changed, 10 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index c95a2229f4fa..5160facc8e20 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+ if (dev->flush_status)
+ return;
+
/* Resubmit the URB. */
w->urb->interval = dev->int_ep->desc.bInterval;
ret = usb_submit_urb(w->urb, GFP_KERNEL);
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index 7518ffce22ed..09a5802dc974 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
if (dev->int_urb == NULL)
return 0;
+ dev->flush_status = false;
return usb_submit_urb(dev->int_urb, flags);
}
void uvc_status_stop(struct uvc_device *dev)
{
+ struct uvc_ctrl_work *w = &dev->async_ctrl;
+
+ dev->flush_status = true;
usb_kill_urb(dev->int_urb);
+ if (cancel_work_sync(&w->work))
+ uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
}
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index df93db259312..6a9b72d6789e 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -560,6 +560,7 @@ struct uvc_device {
struct usb_host_endpoint *int_ep;
struct urb *int_urb;
u8 *status;
+ bool flush_status;
struct input_dev *input;
char input_phys[64];
---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
Hi,
I would like to request cherry-picking commit 6f3f65d80dac (Linux v5.8) into
the linux-5.4.y branch.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
This commit is trivial and the regression risk is low. The cherry-pick is
straightforward.
Kernel 5.4 is used by a lot of vendors; having this patch will help
support standard eBPF programs.
Regards,
Nicolas
This is the logical place to put the backlight device, and it also
fixes a kernel crash if the MIPI host is removed. Previously the
backlight device would be unregistered twice when this happened - once
as a child of the MIPI host through `mipi_dsi_host_unregister`, and
once when the panel device is destroyed.
Fixes: 12a6cbd4f3f1 ("drm/panel: otm8009a: Use new backlight API")
Signed-off-by: James Cowgill <james.cowgill(a)blaize.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/panel/panel-orisetech-otm8009a.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-orisetech-otm8009a.c b/drivers/gpu/drm/panel/panel-orisetech-otm8009a.c
index b4729a94c34a..898b892f1143 100644
--- a/drivers/gpu/drm/panel/panel-orisetech-otm8009a.c
+++ b/drivers/gpu/drm/panel/panel-orisetech-otm8009a.c
@@ -471,7 +471,7 @@ static int otm8009a_probe(struct mipi_dsi_device *dsi)
DRM_MODE_CONNECTOR_DSI);
ctx->bl_dev = devm_backlight_device_register(dev, dev_name(dev),
- dsi->host->dev, ctx,
+ dev, ctx,
&otm8009a_backlight_ops,
NULL);
if (IS_ERR(ctx->bl_dev)) {
--
2.38.1
On Tue, Dec 13, 2022 at 09:59:38AM +0000, Biju Das wrote:
> This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch
> device to config mode" during unbind.
>
> We are doing register access after pm_runtime_put_sync().
>
> We usually do cleanup in reverse order of init. Currently in
> remove(), the "pm_runtime_put_sync" is not in reverse order.
>
> Probe
> reset_control_deassert(rstc);
> pm_runtime_enable(&pdev->dev);
> pm_runtime_get_sync(&pdev->dev);
>
> remove
> pm_runtime_put_sync(&pdev->dev);
> unregister_netdev(ndev);
> ..
> ravb_mdio_release(priv);
> pm_runtime_disable(&pdev->dev);
>
> Consider the call to unregister_netdev()
> unregister_netdev->unregister_netdevice_queue->rollback_registered_many
> that calls the below functions which access the registers after
> pm_runtime_put_sync()
> 1) ravb_get_stats
> 2) ravb_close
>
> Fixes: a0d2f20650e8 ("Renesas Ethernet AVB PTP clock driver")
I don't know how you came to this fixes line, but the more correct one
is c156633f1353 ("Renesas Ethernet AVB driver proper")
And the title needs to be "PATCH net".
When you resend the patch, feel free to add my tag.
Thanks,
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
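A hedged sketch of the remove() ordering the review asks for, reconstructed only from the probe/remove sequences quoted above; ndev, priv and rstc are the names used in the quote, and the reset_control_assert() counterpart is an assumption, not taken from the driver source:

static int ravb_remove_sketch(struct platform_device *pdev)
{
	/*
	 * Tear down users of the hardware first; these paths
	 * (ravb_close, ravb_get_stats) may still touch registers.
	 */
	unregister_netdev(ndev);
	ravb_mdio_release(priv);

	/* Only then release runtime PM, in reverse order of probe. */
	pm_runtime_put_sync(&pdev->dev);
	pm_runtime_disable(&pdev->dev);
	reset_control_assert(rstc);	/* assumed counterpart of probe's deassert */
	return 0;
}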
Make sure that all the async work is finished when we stop the status urb.
To: Yunke Cao <yunkec(a)chromium.org>
To: Sergey Senozhatsky <senozhatsky(a)chromium.org>
To: Max Staudt <mstaudt(a)google.com>
To: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v2:
- Add a patch to not alloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
---
Ricardo Ribalda (2):
media: uvcvideo: Fix race condition with usb_kill_urb
media: uvcvideo: Do not alloc dev->status
drivers/media/usb/uvc/uvc_ctrl.c | 3 +++
drivers/media/usb/uvc/uvc_status.c | 15 +++++++--------
drivers/media/usb/uvc/uvcvideo.h | 3 ++-
3 files changed, 12 insertions(+), 9 deletions(-)
---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Xiubo Li <xiubli(a)redhat.com>
When ceph releases the file_lock it will try to get the inode pointer
from fl->fl_file, whose memory could already have been released by
another thread in filp_close(), because in the VFS layer fl->fl_file
doesn't increase the file's reference counter.
Switch to using ceph's dedicated lock info to track the inode.
In ceph_fl_release_lock() we should skip all the operations if
fl->fl_u.ceph_fl.fl_inode is not set, which means the lock came from
a request file_lock. We set fl->fl_u.ceph_fl.fl_inode when inserting
the lock into the inode lock list, which is when copying the lock.
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
URL: https://tracker.ceph.com/issues/57986
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/locks.c | 20 ++++++++++++++++++--
include/linux/fs.h | 3 +++
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index b191426bf880..d31955f710f2 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -34,18 +34,34 @@ static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+ dst->fl_u.ceph.inode = igrab(inode);
}
+/*
+ * Do not use the 'fl->fl_file' in release function, which
+ * is possibly already released by another thread.
+ */
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct inode *inode = file_inode(fl->fl_file);
- struct ceph_inode_info *ci = ceph_inode(inode);
+ struct inode *inode = fl->fl_u.ceph.inode;
+ struct ceph_inode_info *ci;
+
+ /*
+ * If inode is NULL it should be a request file_lock,
+ * nothing we can do.
+ */
+ if (!inode)
+ return;
+
+ ci = ceph_inode(inode);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
ci->i_ceph_flags &= ~CEPH_I_ERROR_FILELOCK;
spin_unlock(&ci->i_ceph_lock);
}
+ fl->fl_u.ceph.inode = NULL;
+ iput(inode);
}
static const struct file_lock_operations ceph_fl_lock_ops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b52fdfb6da0..c92c62f1e964 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1119,6 +1119,9 @@ struct file_lock {
int state; /* state of grant or error if -ve */
unsigned int debug_id;
} afs;
+ struct {
+ struct inode *inode;
+ } ceph;
} fl_u;
} __randomize_layout;
--
2.31.1
From: Xiubo Li <xiubli(a)redhat.com>
When ceph releases the file_lock it will try to get the inode pointer
from fl->fl_file, whose memory could already have been released by
another thread in filp_close(), because in the VFS layer fl->fl_file
doesn't increase the file's reference counter.
Switch to using ceph's dedicated lock info to track the inode.
In ceph_fl_release_lock() we should skip all the operations if
fl->fl_u.ceph_fl.fl_inode is not set, which means the lock came from
a request file_lock. We set fl->fl_u.ceph_fl.fl_inode when inserting
the lock into the inode lock list, which is when copying the lock.
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/locks.c | 20 ++++++++++++++++++--
include/linux/fs.h | 3 +++
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index b191426bf880..cf78608a3f9a 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -34,18 +34,34 @@ static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+ dst->fl_u.ceph.fl_inode = igrab(inode);
}
+/*
+ * Do not use the 'fl->fl_file' in release function, which
+ * is possibly already released by another thread.
+ */
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct inode *inode = file_inode(fl->fl_file);
- struct ceph_inode_info *ci = ceph_inode(inode);
+ struct inode *inode = fl->fl_u.ceph.fl_inode;
+ struct ceph_inode_info *ci;
+
+ /*
+ * If inode is NULL it should be a request file_lock,
+ * nothing we can do.
+ */
+ if (!inode)
+ return;
+
+ ci = ceph_inode(inode);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
ci->i_ceph_flags &= ~CEPH_I_ERROR_FILELOCK;
spin_unlock(&ci->i_ceph_lock);
}
+ fl->fl_u.ceph.fl_inode = NULL;
+ iput(inode);
}
static const struct file_lock_operations ceph_fl_lock_ops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b52fdfb6da0..6106374f5257 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1119,6 +1119,9 @@ struct file_lock {
int state; /* state of grant or error if -ve */
unsigned int debug_id;
} afs;
+ struct {
+ struct inode *fl_inode;
+ } ceph;
} fl_u;
} __randomize_layout;
--
2.31.1
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for
accessing the Control and Status Registers (CSRs) of each LLCC bank.
This stride only works for some SoCs like SDM845, for which driver
support was initially added.
But later SoCs use register strides that vary between the banks, with
holes in between. So it is not possible to use a single register stride
for accessing the CSRs of each bank; doing so could result in a crash.
To fix this issue, let's obtain the base address of each LLCC bank from
the devicetree and get rid of the fixed stride. This also means we no
longer need to rely on the reg-names property and can get the base
addresses by index. The first index is LLCC bank 0 and the last index is
LLCC broadcast. If the SoC supports more than one bank, those need to be
defined in the devicetree at indices 1..N-1.
Cc: <stable(a)vger.kernel.org> # 4.20
Fixes: a3134fb09e0b ("drivers: soc: Add LLCC driver")
Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/edac/qcom_edac.c | 14 +++---
drivers/soc/qcom/llcc-qcom.c | 72 +++++++++++++++++-------------
include/linux/soc/qcom/llcc-qcom.h | 6 +--
3 files changed, 48 insertions(+), 44 deletions(-)
diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
index 97a27e42dd61..5be93577fc03 100644
--- a/drivers/edac/qcom_edac.c
+++ b/drivers/edac/qcom_edac.c
@@ -213,7 +213,7 @@ dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
for (i = 0; i < reg_data.reg_cnt; i++) {
synd_reg = reg_data.synd_reg + (i * 4);
- ret = regmap_read(drv->regmap, drv->offsets[bank] + synd_reg,
+ ret = regmap_read(drv->regmaps[bank], synd_reg,
&synd_val);
if (ret)
goto clear;
@@ -222,8 +222,7 @@ dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
reg_data.name, i, synd_val);
}
- ret = regmap_read(drv->regmap,
- drv->offsets[bank] + reg_data.count_status_reg,
+ ret = regmap_read(drv->regmaps[bank], reg_data.count_status_reg,
&err_cnt);
if (ret)
goto clear;
@@ -233,8 +232,7 @@ dump_syn_reg_values(struct llcc_drv_data *drv, u32 bank, int err_type)
edac_printk(KERN_CRIT, EDAC_LLCC, "%s: Error count: 0x%4x\n",
reg_data.name, err_cnt);
- ret = regmap_read(drv->regmap,
- drv->offsets[bank] + reg_data.ways_status_reg,
+ ret = regmap_read(drv->regmaps[bank], reg_data.ways_status_reg,
&err_ways);
if (ret)
goto clear;
@@ -296,8 +294,7 @@ llcc_ecc_irq_handler(int irq, void *edev_ctl)
/* Iterate over the banks and look for Tag RAM or Data RAM errors */
for (i = 0; i < drv->num_banks; i++) {
- ret = regmap_read(drv->regmap,
- drv->offsets[i] + DRP_INTERRUPT_STATUS,
+ ret = regmap_read(drv->regmaps[i], DRP_INTERRUPT_STATUS,
&drp_error);
if (!ret && (drp_error & SB_ECC_ERROR)) {
@@ -312,8 +309,7 @@ llcc_ecc_irq_handler(int irq, void *edev_ctl)
if (!ret)
irq_rc = IRQ_HANDLED;
- ret = regmap_read(drv->regmap,
- drv->offsets[i] + TRP_INTERRUPT_0_STATUS,
+ ret = regmap_read(drv->regmaps[i], TRP_INTERRUPT_0_STATUS,
&trp_error);
if (!ret && (trp_error & SB_ECC_ERROR)) {
diff --git a/drivers/soc/qcom/llcc-qcom.c b/drivers/soc/qcom/llcc-qcom.c
index 23ce2f78c4ed..a29f22dad7fa 100644
--- a/drivers/soc/qcom/llcc-qcom.c
+++ b/drivers/soc/qcom/llcc-qcom.c
@@ -62,8 +62,6 @@
#define LLCC_TRP_WRSC_CACHEABLE_EN 0x21f2c
#define LLCC_TRP_ALGO_CFG8 0x21f30
-#define BANK_OFFSET_STRIDE 0x80000
-
#define LLCC_VERSION_2_0_0_0 0x02000000
#define LLCC_VERSION_2_1_0_0 0x02010000
#define LLCC_VERSION_4_1_0_0 0x04010000
@@ -898,8 +896,8 @@ static int qcom_llcc_remove(struct platform_device *pdev)
return 0;
}
-static struct regmap *qcom_llcc_init_mmio(struct platform_device *pdev,
- const char *name)
+static struct regmap *qcom_llcc_init_mmio(struct platform_device *pdev, u8 index,
+ const char *name)
{
void __iomem *base;
struct regmap_config llcc_regmap_config = {
@@ -909,7 +907,7 @@ static struct regmap *qcom_llcc_init_mmio(struct platform_device *pdev,
.fast_io = true,
};
- base = devm_platform_ioremap_resource_byname(pdev, name);
+ base = devm_platform_ioremap_resource(pdev, index);
if (IS_ERR(base))
return ERR_CAST(base);
@@ -927,6 +925,7 @@ static int qcom_llcc_probe(struct platform_device *pdev)
const struct llcc_slice_config *llcc_cfg;
u32 sz;
u32 version;
+ struct regmap *regmap;
drv_data = devm_kzalloc(dev, sizeof(*drv_data), GFP_KERNEL);
if (!drv_data) {
@@ -934,21 +933,51 @@ static int qcom_llcc_probe(struct platform_device *pdev)
goto err;
}
- drv_data->regmap = qcom_llcc_init_mmio(pdev, "llcc_base");
- if (IS_ERR(drv_data->regmap)) {
- ret = PTR_ERR(drv_data->regmap);
+ /* Initialize the first LLCC bank regmap */
+ regmap = qcom_llcc_init_mmio(pdev, i, "llcc0_base");
+ if (IS_ERR(regmap)) {
+ ret = PTR_ERR(regmap);
goto err;
}
- drv_data->bcast_regmap =
- qcom_llcc_init_mmio(pdev, "llcc_broadcast_base");
+ cfg = of_device_get_match_data(&pdev->dev);
+
+ ret = regmap_read(regmap, cfg->reg_offset[LLCC_COMMON_STATUS0], &num_banks);
+ if (ret)
+ goto err;
+
+ num_banks &= LLCC_LB_CNT_MASK;
+ num_banks >>= LLCC_LB_CNT_SHIFT;
+ drv_data->num_banks = num_banks;
+
+ drv_data->regmaps = devm_kcalloc(dev, num_banks, sizeof(*drv_data->regmaps), GFP_KERNEL);
+ if (!drv_data->regmaps) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ drv_data->regmaps[0] = regmap;
+
+ /* Initialize rest of LLCC bank regmaps */
+ for (i = 1; i < num_banks; i++) {
+ char *base = kasprintf(GFP_KERNEL, "llcc%d_base", i);
+
+ drv_data->regmaps[i] = qcom_llcc_init_mmio(pdev, i, base);
+ if (IS_ERR(drv_data->regmaps[i])) {
+ ret = PTR_ERR(drv_data->regmaps[i]);
+ kfree(base);
+ goto err;
+ }
+
+ kfree(base);
+ }
+
+ drv_data->bcast_regmap = qcom_llcc_init_mmio(pdev, i, "llcc_broadcast_base");
if (IS_ERR(drv_data->bcast_regmap)) {
ret = PTR_ERR(drv_data->bcast_regmap);
goto err;
}
- cfg = of_device_get_match_data(&pdev->dev);
-
/* Extract version of the IP */
ret = regmap_read(drv_data->bcast_regmap, cfg->reg_offset[LLCC_COMMON_HW_INFO],
&version);
@@ -957,15 +986,6 @@ static int qcom_llcc_probe(struct platform_device *pdev)
drv_data->version = version;
- ret = regmap_read(drv_data->regmap, cfg->reg_offset[LLCC_COMMON_STATUS0],
- &num_banks);
- if (ret)
- goto err;
-
- num_banks &= LLCC_LB_CNT_MASK;
- num_banks >>= LLCC_LB_CNT_SHIFT;
- drv_data->num_banks = num_banks;
-
llcc_cfg = cfg->sct_data;
sz = cfg->size;
@@ -973,16 +993,6 @@ static int qcom_llcc_probe(struct platform_device *pdev)
if (llcc_cfg[i].slice_id > drv_data->max_slices)
drv_data->max_slices = llcc_cfg[i].slice_id;
- drv_data->offsets = devm_kcalloc(dev, num_banks, sizeof(u32),
- GFP_KERNEL);
- if (!drv_data->offsets) {
- ret = -ENOMEM;
- goto err;
- }
-
- for (i = 0; i < num_banks; i++)
- drv_data->offsets[i] = i * BANK_OFFSET_STRIDE;
-
drv_data->bitmap = devm_bitmap_zalloc(dev, drv_data->max_slices,
GFP_KERNEL);
if (!drv_data->bitmap) {
diff --git a/include/linux/soc/qcom/llcc-qcom.h b/include/linux/soc/qcom/llcc-qcom.h
index ad1fd718169d..423220e66026 100644
--- a/include/linux/soc/qcom/llcc-qcom.h
+++ b/include/linux/soc/qcom/llcc-qcom.h
@@ -120,7 +120,7 @@ struct llcc_edac_reg_offset {
/**
* struct llcc_drv_data - Data associated with the llcc driver
- * @regmap: regmap associated with the llcc device
+ * @regmaps: regmaps associated with the llcc device
* @bcast_regmap: regmap associated with llcc broadcast offset
* @cfg: pointer to the data structure for slice configuration
* @edac_reg_offset: Offset of the LLCC EDAC registers
@@ -129,12 +129,11 @@ struct llcc_edac_reg_offset {
* @max_slices: max slices as read from device tree
* @num_banks: Number of llcc banks
* @bitmap: Bit map to track the active slice ids
- * @offsets: Pointer to the bank offsets array
* @ecc_irq: interrupt for llcc cache error detection and reporting
* @version: Indicates the LLCC version
*/
struct llcc_drv_data {
- struct regmap *regmap;
+ struct regmap **regmaps;
struct regmap *bcast_regmap;
const struct llcc_slice_config *cfg;
const struct llcc_edac_reg_offset *edac_reg_offset;
@@ -143,7 +142,6 @@ struct llcc_drv_data {
u32 max_slices;
u32 num_banks;
unsigned long *bitmap;
- u32 *offsets;
int ecc_irq;
u32 version;
};
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between. So it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is not needed
anymore. The driver is expected to parse the reg field by index to get
the addresses of the LLCC banks.
Cc: <stable(a)vger.kernel.org> # 5.4
Fixes: ba0411ddd133 ("arm64: dts: sdm845: Add device node for Last level cache controller")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sdm845.dtsi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 65032b94b46d..683b861e060d 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2132,8 +2132,9 @@ uart15: serial@a9c000 {
llcc: system-cache-controller@1100000 {
compatible = "qcom,sdm845-llcc";
- reg = <0 0x01100000 0 0x31000>, <0 0x01300000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x01100000 0 0x50000>, <0 0x01180000 0 0x50000>,
+ <0 0x01200000 0 0x50000>, <0 0x01280000 0 0x50000>,
+ <0 0x01300000 0 0x50000>;
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between. So it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
On SM6350, there is only one LLCC bank available, so the only change
needed is to remove the reg-names property from the LLCC node to
conform to the binding. The driver is expected to parse the reg field
by index to get the addresses of the LLCC banks.
Cc: <stable(a)vger.kernel.org> # 5.16
Fixes: ced2f0d75e13 ("arm64: dts: qcom: sm6350: Add LLCC node")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sm6350.dtsi | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/sm6350.dtsi b/arch/arm64/boot/dts/qcom/sm6350.dtsi
index 43324bf291c3..1f39627cd7c6 100644
--- a/arch/arm64/boot/dts/qcom/sm6350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6350.dtsi
@@ -1174,7 +1174,6 @@ dc_noc: interconnect@9160000 {
system-cache-controller@9200000 {
compatible = "qcom,sm6350-llcc";
reg = <0 0x09200000 0 0x50000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
};
gem_noc: interconnect@9680000 {
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between. So it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
On SC7180, there is only one LLCC bank available, so the only change
needed is to remove the reg-names property from the LLCC node to
conform to the binding. The driver is expected to parse the reg field
by index to get the addresses of the LLCC banks.
Cc: <stable(a)vger.kernel.org> # 5.6
Fixes: c831fa299996 ("arm64: dts: qcom: sc7180: Add Last level cache controller node")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index f71cf21a8dd8..b0d524bbf051 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -2759,7 +2759,6 @@ dc_noc: interconnect@9160000 {
system-cache-controller@9200000 {
compatible = "qcom,sc7180-llcc";
reg = <0 0x09200000 0 0x50000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between. So it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
While at it, let's also fix the size of llcc_broadcast_base to cover
the whole region.
Also, let's get rid of the reg-names property as it is not needed
anymore. The driver is expected to parse the reg field by index to get
the addresses of the LLCC banks.
Cc: <stable(a)vger.kernel.org> # 5.13
Fixes: 0392968dbe09 ("arm64: dts: qcom: sc7280: Add device tree node for LLCC")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sc7280.dtsi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sc7280.dtsi b/arch/arm64/boot/dts/qcom/sc7280.dtsi
index 0adf13399e64..90e11cbbaf88 100644
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi
@@ -3579,8 +3579,8 @@ gem_noc: interconnect@9100000 {
system-cache-controller@9200000 {
compatible = "qcom,sc7280-llcc";
- reg = <0 0x09200000 0 0xd0000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x09200000 0 0x58000>, <0 0x09280000 0 0x58000>,
+ <0 0x09600000 0 0x58000>;
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
restricted on asymmetric systems"), the setting and clearing of
user_cpus_ptr are done under pi_lock for the arm64 architecture. However,
dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
protection. When racing with the clearing of user_cpus_ptr in
__set_cpus_allowed_ptr_locked(), this can lead to use-after-free and
double-free in the arm64 kernel.
Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask") fixes this problem as user_cpus_ptr, once set, will never
be cleared in a task's lifetime. However, this bug was re-introduced
in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
do_set_cpus_allowed(). This time, it will affect all arches.
Fix this bug by always clearing the user_cpus_ptr of the newly
cloned/forked task before the copying process starts, and by checking
the user_cpus_ptr state of the source task under pi_lock.
Note to stable: this patch won't be directly applicable to stable
releases. Just copy the new dup_user_cpus_ptr() function over.
Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
CC: stable(a)vger.kernel.org
Reported-by: David Wang 王标 <wangbiao3(a)xiaomi.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8df51b08bb38..f2b75faaf71a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
int node)
{
+ cpumask_t *user_mask;
unsigned long flags;
+ /*
+ * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
+ * may differ by now due to racing.
+ */
+ dst->user_cpus_ptr = NULL;
+
+ /*
+ * This check is racy and losing the race is a valid situation.
+ * It is not worth the extra overhead of taking the pi_lock on
+ * every fork/clone.
+ */
if (!src->user_cpus_ptr)
return 0;
- dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
- if (!dst->user_cpus_ptr)
+ user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+ if (!user_mask)
return -ENOMEM;
- /* Use pi_lock to protect content of user_cpus_ptr */
+ /*
+ * Use pi_lock to protect content of user_cpus_ptr
+ *
+ * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
+ * do_set_cpus_allowed().
+ */
raw_spin_lock_irqsave(&src->pi_lock, flags);
- cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ if (src->user_cpus_ptr) {
+ swap(dst->user_cpus_ptr, user_mask);
+ cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ }
raw_spin_unlock_irqrestore(&src->pi_lock, flags);
+
+ if (unlikely(user_mask))
+ kfree(user_mask);
+
return 0;
}
--
2.31.1
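For convenience, here is the resulting function consolidated from the hunks above, per the "just copy the new dup_user_cpus_ptr() function over" note in the patch description. This is a reconstruction from the diff, so verify it against the tree it is applied to:

int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
		      int node)
{
	cpumask_t *user_mask;
	unsigned long flags;

	/*
	 * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
	 * may differ by now due to racing.
	 */
	dst->user_cpus_ptr = NULL;

	/*
	 * This check is racy and losing the race is a valid situation.
	 * It is not worth the extra overhead of taking the pi_lock on
	 * every fork/clone.
	 */
	if (!src->user_cpus_ptr)
		return 0;

	user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
	if (!user_mask)
		return -ENOMEM;

	/*
	 * Use pi_lock to protect content of user_cpus_ptr. Though unlikely,
	 * user_cpus_ptr can be reset to NULL by a concurrent
	 * do_set_cpus_allowed(); in that case the spare buffer in
	 * user_mask is simply freed below.
	 */
	raw_spin_lock_irqsave(&src->pi_lock, flags);
	if (src->user_cpus_ptr) {
		swap(dst->user_cpus_ptr, user_mask);
		cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
	}
	raw_spin_unlock_irqrestore(&src->pi_lock, flags);

	if (unlikely(user_mask))
		kfree(user_mask);

	return 0;
}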
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
[ Upstream commit fac35ba763ed07ba93154c95ffc0c4a55023707f ]
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by the move_pages() syscall under the lock; however, another thread B
can migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page. If thread A also wants to do page
migration, then a data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: tag.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.16620175…
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[5.4: Fixup contextual diffs before pin_user_pages()]
Signed-off-by: Samuel Mendoza-Jonas <samjonas(a)amazon.com>
---
include/linux/hugetlb.h | 6 +++---
mm/gup.c | 13 ++++++++++++-
mm/hugetlb.c | 28 ++++++++++++++--------------
3 files changed, 29 insertions(+), 18 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cef70d6e1657..c7507135c2a0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -127,8 +127,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags);
+struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address,
+ int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
@@ -175,7 +175,7 @@ static inline void hugetlb_show_meminfo(void)
{
}
#define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL
-#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pmd_pte(vma, addr, flags) NULL
#define follow_huge_pud(mm, addr, pud, flags) NULL
#define follow_huge_pgd(mm, addr, pgd, flags) NULL
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
diff --git a/mm/gup.c b/mm/gup.c
index 3ef769529548..98bd5e0a51c8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -188,6 +188,17 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
spinlock_t *ptl;
pte_t *ptep, pte;
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ page = follow_huge_pmd_pte(vma, address, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
+ }
+
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
@@ -333,7 +344,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd_pte(vma, address, flags);
if (page)
return page;
return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2126d78f053d..177dfd03af4e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5157,30 +5157,30 @@ follow_huge_pd(struct vm_area_struct *vma,
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags)
+follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags)
{
+ struct hstate *h = hstate_vma(vma);
+ struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
spinlock_t *ptl;
- pte_t pte;
+ pte_t *ptep, pte;
+
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- /*
- * make sure that the address range covered by this pmd is not
- * unmapped from other threads.
- */
- if (!pmd_huge(*pmd))
- goto out;
- pte = huge_ptep_get((pte_t *)pmd);
+ ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ if (!ptep)
+ return NULL;
+
+ ptl = huge_pte_lock(h, mm, ptep);
+ pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pte_page(pte) +
+ ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
if (flags & FOLL_GET)
get_page(page);
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ __migration_entry_wait(mm, ptep, ptl);
goto retry;
}
/*
--
2.25.1
From: Xiubo Li <xiubli(a)redhat.com>
POSIX locks taken by the same thread share the same owner (the thread
id), and multiple POSIX locks can be merged into a single one, so
checking whether the 'file' has locks may fail.
A file where some openers use locking and others don't is a really odd
usage pattern, though. Locks are like stoplights -- they only work if
everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
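For reference, the VFS helper checks the inode's lock lists directly; a
rough sketch of what it does (paraphrased from fs/locks.c, details may
vary by kernel version):
/* Report whether any POSIX or flock locks are held on the inode. */
bool vfs_inode_has_locks(struct inode *inode)
{
	struct file_lock_context *ctx;
	bool ret;

	ctx = smp_load_acquire(&inode->i_flctx);
	if (!ctx)
		return false;

	spin_lock(&ctx->flc_lock);
	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
	spin_unlock(&ctx->flc_lock);
	return ret;
}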
Cc: stable(a)vger.kernel.org
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
Cc: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 2 +-
fs/ceph/locks.c | 4 ----
fs/ceph/super.h | 1 -
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 065e9311b607..948136f81fc8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2964,7 +2964,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (vfs_inode_has_locks(inode))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..b191426bf880 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
- atomic_inc(&fi->num_locks);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct ceph_file_info *fi = fl->fl_file->private_data;
struct inode *inode = file_inode(fl->fl_file);
struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 14454f464029..e7662ff6f149 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -804,7 +804,6 @@ struct ceph_file_info {
struct list_head rw_contexts;
u32 filp_gen;
- atomic_t num_locks;
};
struct ceph_dir_file_info {
--
2.31.1
pipe_write() cannot be called on notification pipes, so
post_one_notification() cannot race with it.
The locking and the second pipe_full() check are thus redundant.
This fixes an issue where pipe write could unexpectedly block:
// Assume there is no reader or reader polls and uses FIONREAD ioctl
// to read all the available bytes.
for (int i = 0; i < PIPE_DEF_BUFFERS+1; ++i) {
write(pipe_fd, buf_that_efaults, PAGE_SIZE);
}
// Never reached
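A compilable rendering of the snippet above might look like this
(assumptions, not part of the original report: PAGE_SIZE is 4096,
PIPE_DEF_BUFFERS is 16, and a PROT_NONE mapping stands in for
buf_that_efaults):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
int main(void)
{
	int fd[2];
	char *bad;

	if (pipe(fd))
		return 1;
	/* A PROT_NONE mapping makes the copy from userspace fault. */
	bad = mmap(NULL, 4096, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	for (int i = 0; i < 16 + 1; i++) {	/* PIPE_DEF_BUFFERS + 1 */
		ssize_t n = write(fd[1], bad, 4096);
		printf("write %d -> %zd\n", i, n);
	}
	return 0;	/* on affected kernels the last write blocks */
}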
Fixes: a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wiktor Garbacz <wiktorg(a)google.com>
---
fs/pipe.c | 35 +++++++++--------------------------
1 file changed, 9 insertions(+), 26 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index 42c7ff41c2db..87356a2823cf 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -501,43 +501,26 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
pipe->tmp_page = page;
}
- /* Allocate a slot in the ring in advance and attach an
- * empty buffer. If we fault or otherwise fail to use
- * it, either the reader will consume it or it'll still
- * be there for the next write.
- */
- spin_lock_irq(&pipe->rd_wait.lock);
-
- head = pipe->head;
- if (pipe_full(head, pipe->tail, pipe->max_usage)) {
- spin_unlock_irq(&pipe->rd_wait.lock);
- continue;
+ copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
+ if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) {
+ if (!ret)
+ ret = -EFAULT;
+ break;
}
-
- pipe->head = head + 1;
- spin_unlock_irq(&pipe->rd_wait.lock);
+ ret += copied;
/* Insert it into the buffer array */
- buf = &pipe->bufs[head & mask];
buf->page = page;
buf->ops = &anon_pipe_buf_ops;
buf->offset = 0;
- buf->len = 0;
+ buf->len = copied;
if (is_packetized(filp))
buf->flags = PIPE_BUF_FLAG_PACKET;
else
buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
pipe->tmp_page = NULL;
-
- copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
- if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) {
- if (!ret)
- ret = -EFAULT;
- break;
- }
- ret += copied;
- buf->offset = 0;
- buf->len = copied;
+ head++;
+ pipe->head = head;
if (!iov_iter_count(from))
break;
--
2.39.0.rc1.256.g54fd8350bd-goog
v5.11 changed the blkdev lookup mechanism completely in commit
22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"),
and a small part of the change is to unhash the partition's bdev
inode when deleting the partition. It turns out this change also
fixes one nasty issue in case of BLOCK_EXT_MAJOR:
1) when one partition is deleted & closed, disk_put_part() is always
called before bdput(bdev), see blkdev_put(); so the partition's devt
can be freed & re-used before the inode is dropped
2) then a new partition with the same devt can be created just before
the inode in 1) is dropped, and the old inode/bdev structure from 1) is
re-used for this new partition, which causes use-after-free and
kernel panic.
It isn't possible to backport the whole big patchset of "merge struct
block_device and struct hd_struct v4" to address this issue.
https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/
So fix it by unhashing the partition's bdev inode in delete_partition();
this is aligned with the v5.11+ behavior.
Backported from the following 5.10.y commit:
5f2f77560591 ("block: unhash blkdev part inode when the part is deleted")
Reported-by: Shiwei Cui <cuishw(a)inspur.com>
Tested-by: Shiwei Cui <cuishw(a)inspur.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jan Kara <jack(a)suse.cz>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/partition-generic.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index db57cced9b98..68bd04d044b9 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -270,6 +270,7 @@ void delete_partition(struct gendisk *disk, int partno)
struct disk_part_tbl *ptbl =
rcu_dereference_protected(disk->part_tbl, 1);
struct hd_struct *part;
+ struct block_device *bdev;
if (partno >= ptbl->len)
return;
@@ -283,6 +284,11 @@ void delete_partition(struct gendisk *disk, int partno)
kobject_put(part->holder_dir);
device_del(part_to_dev(part));
+ bdev = bdget(part_devt(part));
+ if (bdev) {
+ remove_inode_hash(bdev->bd_inode);
+ bdput(bdev);
+ }
hd_struct_kill(part);
}
--
2.38.1
v5.11 changed the blkdev lookup mechanism completely in commit
22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"),
and a small part of the change is to unhash the partition's bdev
inode when deleting the partition. It turns out this change also
fixes one nasty issue in case of BLOCK_EXT_MAJOR:
1) when one partition is deleted & closed, disk_put_part() is always
called before bdput(bdev), see blkdev_put(); so the partition's devt
can be freed & re-used before the inode is dropped
2) then a new partition with the same devt can be created just before
the inode in 1) is dropped, and the old inode/bdev structure from 1) is
re-used for this new partition, which causes use-after-free and
kernel panic.
It isn't possible to backport the whole big patchset of "merge struct
block_device and struct hd_struct v4" to address this issue.
https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/
So fix it by unhashing the partition's bdev inode in delete_partition();
this is aligned with the v5.11+ behavior.
Backported from the following 5.10.y commit:
5f2f77560591 ("block: unhash blkdev part inode when the part is deleted")
Reported-by: Shiwei Cui <cuishw(a)inspur.com>
Tested-by: Shiwei Cui <cuishw(a)inspur.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jan Kara <jack(a)suse.cz>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/partition-generic.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index aee643ce13d1..e69452c1f5ad 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -272,6 +272,7 @@ void delete_partition(struct gendisk *disk, int partno)
struct disk_part_tbl *ptbl =
rcu_dereference_protected(disk->part_tbl, 1);
struct hd_struct *part;
+ struct block_device *bdev;
if (partno >= ptbl->len)
return;
@@ -292,6 +293,12 @@ void delete_partition(struct gendisk *disk, int partno)
* "in-use" until we really free the gendisk.
*/
blk_invalidate_devt(part_devt(part));
+
+ bdev = bdget(part_devt(part));
+ if (bdev) {
+ remove_inode_hash(bdev->bd_inode);
+ bdput(bdev);
+ }
hd_struct_kill(part);
}
--
2.38.1
The LLCC block has several banks, each with a different base address
and holes in between, so it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is no longer needed.
The driver is expected to parse the reg field by index to get the
address of each LLCC bank.
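For illustration, a driver could then map each bank by reg index, e.g.
(a hypothetical sketch, not the actual qcom llcc driver; num_banks and
drv_data->base are made-up names):
/* Map each LLCC bank by its position in the reg property. */
for (i = 0; i < num_banks; i++) {
	drv_data->base[i] = devm_platform_ioremap_resource(pdev, i);
	if (IS_ERR(drv_data->base[i]))
		return PTR_ERR(drv_data->base[i]);
}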
Cc: <stable(a)vger.kernel.org> # 5.18
Fixes: 1dc3e50eb680 ("arm64: dts: qcom: sm8450: Add LLCC/system-cache-controller node")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sm8450.dtsi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi
index 570475040d95..30685857021a 100644
--- a/arch/arm64/boot/dts/qcom/sm8450.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi
@@ -3640,8 +3640,9 @@ gem_noc: interconnect@19100000 {
system-cache-controller@19200000 {
compatible = "qcom,sm8450-llcc";
- reg = <0 0x19200000 0 0x580000>, <0 0x19a00000 0 0x80000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x19200000 0 0x80000>, <0 0x19600000 0 0x80000>,
+ <0 0x19300000 0 0x80000>, <0 0x19700000 0 0x80000>,
+ <0 0x19a00000 0 0x80000>;
interrupts = <GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between, so it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is no longer needed.
The driver is expected to parse the reg field by index to get the
address of each LLCC bank.
Cc: <stable(a)vger.kernel.org> # 5.17
Fixes: 9ac8999e8d6c ("arm64: dts: qcom: sm8350: Add LLCC node")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sm8350.dtsi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8350.dtsi b/arch/arm64/boot/dts/qcom/sm8350.dtsi
index 245dce24ec59..7f76778e1bc8 100644
--- a/arch/arm64/boot/dts/qcom/sm8350.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8350.dtsi
@@ -2513,8 +2513,9 @@ gem_noc: interconnect@9100000 {
system-cache-controller@9200000 {
compatible = "qcom,sm8350-llcc";
- reg = <0 0x09200000 0 0x1d0000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x09200000 0 0x58000>, <0 0x09280000 0 0x58000>,
+ <0 0x09300000 0 0x58000>, <0 0x09380000 0 0x58000>,
+ <0 0x09600000 0 0x58000>;
};
usb_1: usb@a6f8800 {
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between, so it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is no longer needed.
The driver is expected to parse the reg field by index to get the
address of each LLCC bank.
Cc: <stable(a)vger.kernel.org> # 5.12
Fixes: 0085a33a25cc ("arm64: dts: qcom: sm8250: Add support for LLCC block")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sm8250.dtsi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8250.dtsi b/arch/arm64/boot/dts/qcom/sm8250.dtsi
index dab5579946f3..439ca5f6000e 100644
--- a/arch/arm64/boot/dts/qcom/sm8250.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8250.dtsi
@@ -3545,8 +3545,9 @@ usb_1_dwc3: usb@a600000 {
system-cache-controller@9200000 {
compatible = "qcom,sm8250-llcc";
- reg = <0 0x09200000 0 0x1d0000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x09200000 0 0x50000>, <0 0x09280000 0 0x50000>,
+ <0 0x09300000 0 0x50000>, <0 0x09380000 0 0x50000>,
+ <0 0x09600000 0 0x50000>;
};
usb_2: usb@a8f8800 {
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between, so it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is no longer needed.
The driver is expected to parse the reg field by index to get the
address of each LLCC bank.
Cc: <stable(a)vger.kernel.org> # 5.11
Fixes: bb1f7cf68a2d ("arm64: dts: qcom: sm8150: Add LLC support for sm8150")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sm8150.dtsi | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sm8150.dtsi b/arch/arm64/boot/dts/qcom/sm8150.dtsi
index a0c57fb798d3..958655bf5b1a 100644
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi
@@ -1762,8 +1762,9 @@ mmss_noc: interconnect@1740000 {
system-cache-controller@9200000 {
compatible = "qcom,sm8150-llcc";
- reg = <0 0x09200000 0 0x200000>, <0 0x09600000 0 0x50000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x09200000 0 0x50000>, <0 0x09280000 0 0x50000>,
+ <0 0x09300000 0 0x50000>, <0 0x09380000 0 0x50000>,
+ <0 0x09600000 0 0x50000>;
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
The LLCC block has several banks, each with a different base address
and holes in between, so it is not correct to cover these banks with a
single offset/size. Instead, each bank's base address needs to be
specified in the devicetree with the exact size.
Also, let's get rid of the reg-names property as it is no longer needed.
The driver is expected to parse the reg field by index to get the
address of each LLCC bank.
Cc: <stable(a)vger.kernel.org> # 6.0
Fixes: 152d1faf1e2f ("arm64: dts: qcom: add SC8280XP platform")
Reported-by: Parikshit Pareek <quic_ppareek(a)quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
arch/arm64/boot/dts/qcom/sc8280xp.dtsi | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
index 109c9d2b684d..f8731cbccce9 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
@@ -1856,8 +1856,11 @@ opp-6 {
system-cache-controller@9200000 {
compatible = "qcom,sc8280xp-llcc";
- reg = <0 0x09200000 0 0x58000>, <0 0x09600000 0 0x58000>;
- reg-names = "llcc_base", "llcc_broadcast_base";
+ reg = <0 0x09200000 0 0x58000>, <0 0x09280000 0 0x58000>,
+ <0 0x09300000 0 0x58000>, <0 0x09380000 0 0x58000>,
+ <0 0x09400000 0 0x58000>, <0 0x09480000 0 0x58000>,
+ <0 0x09500000 0 0x58000>, <0 0x09580000 0 0x58000>,
+ <0 0x09600000 0 0x58000>;
interrupts = <GIC_SPI 582 IRQ_TYPE_LEVEL_HIGH>;
};
--
2.25.1
From: Xiubo Li <xiubli(a)redhat.com>
When ceph releases the file_lock it will try to get the inode pointer
from fl->fl_file, whose memory could already have been released by
another thread in filp_close(), because at the VFS layer fl->fl_file
does not hold a reference on the file.
Switch to ceph's dedicated lock info to track the inode instead.
In ceph_fl_release_lock() we should skip all the operations if
fl->fl_u.ceph_fl.fl_inode is not set, which is the case for a request
file_lock. We set fl->fl_u.ceph_fl.fl_inode when inserting the lock
into the inode's lock list, which is when copying the lock.
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/locks.c | 20 ++++++++++++++++++--
include/linux/ceph/ceph_fs_fl.h | 17 +++++++++++++++++
include/linux/fs.h | 2 ++
3 files changed, 37 insertions(+), 2 deletions(-)
create mode 100644 include/linux/ceph/ceph_fs_fl.h
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index b191426bf880..621f38f10a88 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -34,18 +34,34 @@ static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+ dst->fl_u.ceph_fl.fl_inode = igrab(inode);
}
+/*
+ * Do not use the 'fl->fl_file' in release function, which
+ * is possibly already released by another thread.
+ */
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct inode *inode = file_inode(fl->fl_file);
- struct ceph_inode_info *ci = ceph_inode(inode);
+ struct inode *inode = fl->fl_u.ceph_fl.fl_inode;
+ struct ceph_inode_info *ci;
+
+ /*
+ * If inode is NULL it should be a request file_lock,
+ * nothing we can do.
+ */
+ if (!inode)
+ return;
+
+ ci = ceph_inode(inode);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
ci->i_ceph_flags &= ~CEPH_I_ERROR_FILELOCK;
spin_unlock(&ci->i_ceph_lock);
}
+ fl->fl_u.ceph_fl.fl_inode = NULL;
+ iput(inode);
}
static const struct file_lock_operations ceph_fl_lock_ops = {
diff --git a/include/linux/ceph/ceph_fs_fl.h b/include/linux/ceph/ceph_fs_fl.h
new file mode 100644
index 000000000000..ad1cf96329f9
--- /dev/null
+++ b/include/linux/ceph/ceph_fs_fl.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ceph_fs_fl.h - Ceph lock info
+ *
+ * LGPL2
+ */
+
+#ifndef CEPH_FS_FL_H
+#define CEPH_FS_FL_H
+
+#include <linux/fs.h>
+
+struct ceph_lock_info {
+ struct inode *fl_inode;
+};
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d6cb42b7e91c..2b03d5e375d7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1066,6 +1066,7 @@ bool opens_in_grace(struct net *);
/* that will die - we need it for nfs_lock_info */
#include <linux/nfs_fs_i.h>
+#include <linux/ceph/ceph_fs_fl.h>
/*
* struct file_lock represents a generic "file lock". It's used to represent
@@ -1119,6 +1120,7 @@ struct file_lock {
int state; /* state of grant or error if -ve */
unsigned int debug_id;
} afs;
+ struct ceph_lock_info ceph_fl;
} fl_u;
} __randomize_layout;
--
2.31.1
Patch 1 adds some missing error status values for MPTCP path management
netlink commands with invalid attributes.
Patches 2-4 make IPv6 subflows use the correct request_sock_ops
structure and IPv6-specific destructor. The first patch in this group is
a prerequisite change that simplifies the last two patches.
Matthieu Baerts (3):
mptcp: remove MPTCP 'ifdef' in TCP SYN cookies
mptcp: dedicated request sock for subflow in v6
mptcp: use proper req destructor for IPv6
Wei Yongjun (1):
mptcp: netlink: fix some error return code
include/net/mptcp.h | 12 ++++++--
net/ipv4/syncookies.c | 7 ++---
net/mptcp/pm_userspace.c | 4 +++
net/mptcp/subflow.c | 61 +++++++++++++++++++++++++++++++++-------
4 files changed, 68 insertions(+), 16 deletions(-)
base-commit: 01de1123322e4fe1bbd0fcdf0982511b55519c03
--
2.38.1
usb_kill_urb() guarantees that all its handlers have finished when it
returns, but it does not protect against threads that might still be
handling the urb asynchronously.
For UVC, the function uvc_ctrl_status_event_async() takes care of
control changes. If the code is executed in the following order:
CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_ctrl_status_event_work()
uvc_status_start() -> FAIL
Then uvc_status_start will keep failing and this error will be shown:
<4>[ 5.540139] URB 0000000000000000 submitted while active
drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
Let's improve the current situation by not re-submitting the urb if
there are no users in the system.
Also add a flag that is cleared during stop, which captures this
situation:
CPU 0 CPU 1
===== =====
uvc_status_complete()
uvc_status_stop()
uvc_status_start()
uvc_ctrl_status_event_work() -> FAIL
Hopefully, together with the usb layer protection, this should be
enough to cover all the cases.
Cc: stable(a)vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
To: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
drivers/media/usb/uvc/uvc_ctrl.c | 9 +++++++++
drivers/media/usb/uvc/uvc_status.c | 1 +
drivers/media/usb/uvc/uvcvideo.h | 2 ++
3 files changed, 12 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index c95a2229f4fa..0634a4baa2e9 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1442,12 +1442,20 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
+ mutex_lock(&dev->lock);
+ if (!dev->users || !dev->resubmit_urb) {
+ mutex_unlock(&dev->lock);
+ return;
+ }
+
/* Resubmit the URB. */
w->urb->interval = dev->int_ep->desc.bInterval;
ret = usb_submit_urb(w->urb, GFP_KERNEL);
if (ret < 0)
dev_err(&dev->udev->dev,
"Failed to resubmit status URB (%d).\n", ret);
+ dev->resubmit_urb = false;
+ mutex_unlock(&dev->lock);
}
bool uvc_ctrl_status_event_async(struct urb *urb, struct uvc_video_chain *chain,
@@ -1466,6 +1474,7 @@ bool uvc_ctrl_status_event_async(struct urb *urb, struct uvc_video_chain *chain,
w->chain = chain;
w->ctrl = ctrl;
+ dev->resubmit_urb = true;
schedule_work(&w->work);
return true;
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c
index 7518ffce22ed..3cc6e1dfaf01 100644
--- a/drivers/media/usb/uvc/uvc_status.c
+++ b/drivers/media/usb/uvc/uvc_status.c
@@ -310,4 +310,5 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags)
void uvc_status_stop(struct uvc_device *dev)
{
usb_kill_urb(dev->int_urb);
+ dev->resubmit_urb = false;
}
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index df93db259312..9e6a52008ce5 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -539,6 +539,8 @@ struct uvc_device {
struct mutex lock; /* Protects users */
unsigned int users;
+ bool resubmit_urb;
+
atomic_t nmappings;
/* Video control interface */
---
base-commit: 0ec5a38bf8499f403f81cb81a0e3a60887d1993c
change-id: 20221212-uvc-race-09276ea68bf8
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: 1c0908d8e441631f5b8ba433523cf39339ee2ba0
Gitweb: https://git.kernel.org/tip/1c0908d8e441631f5b8ba433523cf39339ee2ba0
Author: Mel Gorman <mgorman(a)techsingularity.net>
AuthorDate: Fri, 02 Dec 2022 10:02:23
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 12 Dec 2022 19:55:56 +01:00
rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.
kernel BUG at fs/inode.c:625!
Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
pc : clear_inode+0xa0/0xc0
lr : clear_inode+0x38/0xc0
Call trace:
clear_inode+0xa0/0xc0
evict+0x160/0x180
iput+0x154/0x240
do_unlinkat+0x184/0x300
__arm64_sys_unlinkat+0x48/0xc0
el0_svc_common.constprop.4+0xe4/0x2c0
do_el0_svc+0xac/0x100
el0_svc+0x78/0x200
el0t_64_sync_handler+0x9c/0xc0
el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to
make sure that we have acquire semantics on all paths where the lock can
be taken. Looking at the rtmutex code, this really isn't obvious to me
-- for example, try_to_take_rt_mutex() appears to be able to return via
the 'takeit' label without acquire semantics and it looks like we might
be relying on the caller's subsequent _unlock_ of the wait_lock for
ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.
Add the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.
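For intuition, a litmus-style sketch of the required pairing
(illustrative only, not kernel code; 'data' stands for any memory the
previous owner wrote inside the critical section):
CPU A (owner, unlock path)             CPU B (slowpath acquisition)
==========================             ============================
data = 1;                              xchg_acquire(&lock->owner, B);
smp_store_release(&lock->owner, 0);    r = data; /* must observe 1 */
With a plain WRITE_ONCE() on CPU B instead of xchg_acquire(), a weakly
ordered CPU may reorder the read of 'data' before the owner update and
observe a stale value.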
It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.
[ bigeasy(a)linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.n…
---
kernel/locking/rtmutex.c | 55 +++++++++++++++++++++++++++++------
kernel/locking/rtmutex_api.c | 6 ++--
2 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8..010cf4e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -89,15 +89,31 @@ static inline int __ww_mutex_check_kill(struct rt_mutex *lock,
* set this bit before looking at the lock.
*/
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock))
val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val);
+ return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+ /*
+ * lock->wait_lock is held but explicit acquire semantics are needed
+ * for a new lock owner so WRITE_ONCE is insufficient.
+ */
+ xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+ /* lock->wait_lock is held so the unlock provides release semantics. */
+ WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
}
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -106,7 +122,8 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -172,8 +189,21 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
* still set.
*/
owner = READ_ONCE(*p);
- if (owner & RT_MUTEX_HAS_WAITERS)
- WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ if (owner & RT_MUTEX_HAS_WAITERS) {
+ /*
+ * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+ * why xchg_acquire() is used for updating owner for
+ * locking and WRITE_ONCE() for unlocking.
+ *
+ * WRITE_ONCE() would work for the acquire case too, but
+ * in case that the lock acquisition failed it might
+ * force other lockers into the slow path unnecessarily.
+ */
+ if (acquire_lock)
+ xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+ else
+ WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ }
}
/*
@@ -208,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
owner = *p;
} while (cmpxchg_relaxed(p, owner,
owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+ /*
+ * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+ * operations in the event of contention. Ensure the successful
+ * cmpxchg is visible.
+ */
+ smp_mb__after_atomic();
}
/*
@@ -1243,7 +1280,7 @@ static int __sched __rt_mutex_slowtrylock(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the lock waiters bit
* unconditionally. Clean this up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1604,7 +1641,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit
* unconditionally. We might have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
trace_contention_end(lock, ret);
@@ -1719,7 +1756,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the waiter bit unconditionally.
* We might have to fix that up:
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
debug_rt_mutex_free_waiter(&waiter);
trace_contention_end(lock, 0);
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 9002209..cb9fdff 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
{
debug_rt_mutex_proxy_unlock(lock);
- rt_mutex_set_owner(lock, NULL);
+ rt_mutex_clear_owner(lock);
}
/**
@@ -382,7 +382,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -438,7 +438,7 @@ bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
Hello Rafael,
On Tue, Oct 11, 2022 at 10:20:50AM +0100, Mel Gorman wrote:
> On Mon, Oct 10, 2022 at 08:29:05PM +0200, Rafael J. Wysocki wrote:
> > > That's less than the previous 5/10 failures but I
> > > cannot be certain it helped without running a lot more boot tests. The
> > > failure happens in the same function as before.
> >
> > I've overlooked the fact that acpi_install_fixed_event_handler()
> > enables the event on success, so it is a bug to call it when the
> > handler is not ready.
> >
> > It should help to only enable the event after running cmos_do_probe()
> > where the driver data pointer is set, so please try the attached
> > patch.
I'm hitting this issue on the 6.0 stable releases (aka 6.0.y) and
looking at the stable tree I see this hasn't been merged... I just got
bitten by this on 6.0.12.
Greg, if Rafael agrees, I think you should apply 4919d3eb2ec0 and
0782b66ed2fb to the 6.0.y tree.
Thank you in advance.
Cheers,
--
Mathieu Chouquet-Stringer me(a)mathieu.digital
The sun itself sees not till heaven clears.
-- William Shakespeare --
It seems we can have one or more framebuffers that are still pinned when
suspending lmem. In such a case we end up creating a shmem backup
object instead of evicting the object directly, but this skips
copying the CCS aux state, since we don't allocate the extra storage for
the CCS pages as part of the ttm_tt construction. Since we can already
deal with pinned objects just fine, it doesn't seem too nasty to
extend this to also deal with the CCS aux state, if the object is a
pinned framebuffer. This fixes display corruption (like in gnome-shell)
seen on DG2 when returning from suspend.
Fixes: da0595ae91da ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Cc: Andrzej Hajda <andrzej.hajda(a)intel.com>
Cc: Shuicheng Lin <shuicheng.lin(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.19+
---
drivers/gpu/drm/i915/gem/i915_gem_object.c | 3 +++
.../gpu/drm/i915/gem/i915_gem_object_types.h | 10 ++++++----
drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 18 +++++++++++++++++-
3 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 733696057761..1a0886b8aaa1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -785,6 +785,9 @@ bool i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
if (!HAS_FLAT_CCS(to_i915(obj->base.dev)))
return false;
+ if (obj->flags & I915_BO_ALLOC_CCS_AUX)
+ return true;
+
for (i = 0; i < obj->mm.n_placements; i++) {
/* Compression is not allowed for the objects with smem placement */
if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index a7b70701617a..19c9bdd8f905 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -327,16 +327,18 @@ struct drm_i915_gem_object {
* dealing with userspace objects the CPU fault handler is free to ignore this.
*/
#define I915_BO_ALLOC_GPU_ONLY BIT(6)
+#define I915_BO_ALLOC_CCS_AUX BIT(7)
#define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
I915_BO_ALLOC_VOLATILE | \
I915_BO_ALLOC_CPU_CLEAR | \
I915_BO_ALLOC_USER | \
I915_BO_ALLOC_PM_VOLATILE | \
I915_BO_ALLOC_PM_EARLY | \
- I915_BO_ALLOC_GPU_ONLY)
-#define I915_BO_READONLY BIT(7)
-#define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
-#define I915_BO_PROTECTED BIT(9)
+ I915_BO_ALLOC_GPU_ONLY | \
+ I915_BO_ALLOC_CCS_AUX)
+#define I915_BO_READONLY BIT(8)
+#define I915_TILING_QUIRK_BIT 9 /* unknown swizzling; do not release! */
+#define I915_BO_PROTECTED BIT(10)
/**
* @mem_flags - Mutable placement-related flags
*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 07e49f22f2de..7e67742bc65e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -50,6 +50,7 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
container_of(bo->bdev, typeof(*i915), bdev);
struct drm_i915_gem_object *backup;
struct ttm_operation_ctx ctx = {};
+ unsigned int flags;
int err = 0;
if (bo->resource->mem_type == I915_PL_SYSTEM || obj->ttm.backup)
@@ -65,7 +66,22 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
if (obj->flags & I915_BO_ALLOC_PM_VOLATILE)
return 0;
- backup = i915_gem_object_create_shmem(i915, obj->base.size);
+ /*
+ * It seems that we might have some framebuffers still pinned at this
+ * stage, but for such objects we might also need to deal with the CCS
+ * aux state. Make sure we force the save/restore of the CCS state,
+ * otherwise we might observe display corruption, when returning from
+ * suspend.
+ */
+ flags = 0;
+ if (i915_gem_object_needs_ccs_pages(obj)) {
+ WARN_ON_ONCE(!i915_gem_object_is_framebuffer(obj));
+ WARN_ON_ONCE(!pm_apply->allow_gpu);
+
+ flags = I915_BO_ALLOC_CCS_AUX;
+ }
+ backup = i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_SMEM],
+ obj->base.size, 0, flags);
if (IS_ERR(backup))
return PTR_ERR(backup);
--
2.38.1
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: 2d77ed7afafef79dbb9db77f646ba429d907ed2b
Gitweb: https://git.kernel.org/tip/2d77ed7afafef79dbb9db77f646ba429d907ed2b
Author: Mel Gorman <mgorman(a)techsingularity.net>
AuthorDate: Fri, 02 Dec 2022 10:02:23
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 12 Dec 2022 13:13:38 +01:00
rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.
kernel BUG at fs/inode.c:625!
Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
pc : clear_inode+0xa0/0xc0
lr : clear_inode+0x38/0xc0
Call trace:
clear_inode+0xa0/0xc0
evict+0x160/0x180
iput+0x154/0x240
do_unlinkat+0x184/0x300
__arm64_sys_unlinkat+0x48/0xc0
el0_svc_common.constprop.4+0xe4/0x2c0
do_el0_svc+0xac/0x100
el0_svc+0x78/0x200
el0t_64_sync_handler+0x9c/0xc0
el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to
make sure that we have acquire semantics on all paths where the lock can
be taken. Looking at the rtmutex code, this really isn't obvious to me
-- for example, try_to_take_rt_mutex() appears to be able to return via
the 'takeit' label without acquire semantics and it looks like we might
be relying on the caller's subsequent _unlock_ of the wait_lock for
ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.
Add the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.
It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.
[ bigeasy(a)linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.n…
---
kernel/locking/rtmutex.c | 55 +++++++++++++++++++++++++++++------
kernel/locking/rtmutex_api.c | 6 ++--
2 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8..010cf4e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -89,15 +89,31 @@ static inline int __ww_mutex_check_kill(struct rt_mutex *lock,
* set this bit before looking at the lock.
*/
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock))
val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val);
+ return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+ /*
+ * lock->wait_lock is held but explicit acquire semantics are needed
+ * for a new lock owner so WRITE_ONCE is insufficient.
+ */
+ xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+ /* lock->wait_lock is held so the unlock provides release semantics. */
+ WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
}
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -106,7 +122,8 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -172,8 +189,21 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
* still set.
*/
owner = READ_ONCE(*p);
- if (owner & RT_MUTEX_HAS_WAITERS)
- WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ if (owner & RT_MUTEX_HAS_WAITERS) {
+ /*
+ * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+ * why xchg_acquire() is used for updating owner for
+ * locking and WRITE_ONCE() for unlocking.
+ *
+ * WRITE_ONCE() would work for the acquire case too, but
+ * in case that the lock acquisition failed it might
+ * force other lockers into the slow path unnecessarily.
+ */
+ if (acquire_lock)
+ xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+ else
+ WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ }
}
/*
@@ -208,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
owner = *p;
} while (cmpxchg_relaxed(p, owner,
owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+ /*
+ * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+ * operations in the event of contention. Ensure the successful
+ * cmpxchg is visible.
+ */
+ smp_mb__after_atomic();
}
/*
@@ -1243,7 +1280,7 @@ static int __sched __rt_mutex_slowtrylock(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the lock waiters bit
* unconditionally. Clean this up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1604,7 +1641,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit
* unconditionally. We might have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
trace_contention_end(lock, ret);
@@ -1719,7 +1756,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the waiter bit unconditionally.
* We might have to fix that up:
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
debug_rt_mutex_free_waiter(&waiter);
trace_contention_end(lock, 0);
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 9002209..cb9fdff 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
{
debug_rt_mutex_proxy_unlock(lock);
- rt_mutex_set_owner(lock, NULL);
+ rt_mutex_clear_owner(lock);
}
/**
@@ -382,7 +382,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -438,7 +438,7 @@ bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
Hello. There is an issue in the kernel's kvm module which may be a vulnerability, and I need your advice.
Affected versions:
Linux kernel version 3.10. Virtualization platforms based on Linux versions before 4.15 may also be affected.
The issue is fixed in v4.15-rc1, commit dedf9c5e216902c6d34b5a0d0c40f4acbb3706d8. Link: https://github.com/torvalds/linux/commit/dedf9c5e216902c6d34b5a0d0c40f4acbb….
Prerequisites for exploiting the issue:
1. The attacker has root privileges in the virtual machine (GuestOS).
2. The Linux kernel version of the host (HostOS) is below 3.10 (theoretical analysis shows that versions below 4.15 are affected).
3. The CPU APIC timer mode when the GuestOS starts is TSC-deadline.
4. The attacker resets the CPU APIC timer mode to periodic inside the GuestOS.
Steps to reproduce the issue:
1. Create and start the GuestOS (the CPU APIC timer mode is TSC-deadline when the GuestOS starts).
2. Reset the CPU APIC timer mode to periodic inside the GuestOS. A HostOS panic will then be triggered with some probability.
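For illustration, step 2 could be done from a guest kernel module roughly
as follows (a hypothetical sketch; it assumes x2APIC is enabled, where
the LVT Timer register is MSR 0x832 with the timer mode in bits 18:17
and the initial count in MSR 0x838):
#include <linux/types.h>
#include <asm/msr.h>
#define X2APIC_MSR_LVTT		0x832	/* LVT Timer register */
#define X2APIC_MSR_TMICT	0x838	/* Timer Initial Count */
static void switch_apic_timer_to_periodic(void)
{
	u64 lvtt;

	rdmsrl(X2APIC_MSR_LVTT, lvtt);
	lvtt &= ~(3ULL << 17);		/* clear mode (was 10b = TSC-deadline) */
	lvtt |= (1ULL << 17);		/* set periodic mode (01b) */
	wrmsrl(X2APIC_MSR_LVTT, lvtt);
	wrmsrl(X2APIC_MSR_TMICT, 100000);	/* arm the periodic timer */
}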
Description of the cause of the issue:
This is an issue in the kvm module of the open-source Linux kernel. There is a logic error in the interrupt handling of the kvm lapic emulation; when the corresponding timing window is hit, a hard lockup resets the HostOS. The problem logic is:
1. After the virtual machine (GuestOS) starts, the CPU APIC timer mode is initialized to TSC-deadline and the timer is started; the GuestOS's CPU APIC timer mode is recorded in the HostOS kvm module.
2. The GuestOS changes the CPU APIC timer to periodic mode.
3. The mode change is processed in the kvm module on the HostOS.
4. While the CPU APIC timer mode is being changed, the code can be interrupted by an interrupt from a timer that has already expired.
5. If the timer's interrupt handler sees the CPU APIC timer mode as periodic, it adds the CPU APIC timer back to the timer task list; however, the periodic period value has not been initialized, so the timer's expiry stays unchanged. The timer task is therefore repeatedly added back to the task list, creating an endless loop that eventually triggers a hard lockup and resets the host.
Impact of the issue:
The HostOS panics, and all virtual machines on the host become abnormal and unusable.
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0acc442309a0 ("can: af_can: fix NULL pointer dereference in can_rcv_filter")
fb08cba12b52 ("can: canxl: update CAN infrastructure for CAN XL frames")
467ef4c7b9d1 ("can: skb: add skb CAN frame data length helpers")
96a7457a14d9 ("can: skb: unify skb CAN frame identification helpers")
a6d190f8c767 ("can: skb: drop tx skb if in listen only mode")
ccd8a9351f7b ("can: skb: move can_dropped_invalid_skb() and can_skb_headroom_valid() to skb.c")
6a5286442fb6 ("can: Kconfig: turn menu "CAN Device Drivers" into a menuconfig using CAN_DEV")
df6ad5dd838e ("can: Kconfig: rename config symbol CAN_DEV into CAN_NETLINK")
6c1e423a3c84 ("can: can-dev: remove obsolete CAN LED support")
2dcb8e8782d8 ("can: ctucanfd: add support for CTU CAN FD open-source IP core - bus independent part.")
136bed0bfd3b ("can: mcba_usb: properly check endpoint type")
00f4a0afb7ea ("can: Use netif_rx().")
c5048a7b2c23 ("can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready")
1c45f5778a3b ("can: flexcan: add ethtool support to change rx-rtr setting during runtime")
c5c88591040e ("can: flexcan: add more quirks to describe RX path capabilities")
34ea4e1c99f1 ("can: flexcan: rename RX modes")
01bb4dccd92b ("can: flexcan: allow to change quirks at runtime")
bfd00e021cf1 ("can: flexcan: move driver into separate sub directory")
5fe1be81efd2 ("can: dev: reorder struct can_priv members for better packing")
cc4b08c31b5c ("can: do not increase tx_bytes statistics for RTR frames")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0acc442309a0a1b01bcdaa135e56e6398a49439c Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Tue, 6 Dec 2022 21:12:59 +0100
Subject: [PATCH] can: af_can: fix NULL pointer dereference in can_rcv_filter
Analogous to commit 8aa59e355949 ("can: af_can: fix NULL pointer
dereference in can_rx_register()") we need to check for a missing
initialization of ml_priv in the receive path of CAN frames.
Since commit 4e096a18867a ("net: introduce CAN specific pointer in the
struct net_device") the check for dev->type to be ARPHRD_CAN is not
sufficient anymore since bonding or tun netdevices claim to be CAN
devices but do not initialize ml_priv accordingly.
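For reference, the helper used in the new checks looks roughly like this
(paraphrased from include/linux/can/can-ml.h; it returns NULL for
netdevices that merely claim ARPHRD_CAN without initializing ml_priv):
static inline struct can_ml_priv *can_get_ml_priv(struct net_device *dev)
{
	return netdev_get_ml_priv(dev, ML_PRIV_CAN);
}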
Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
Reported-by: syzbot+2d7f58292cb5b29eb5ad(a)syzkaller.appspotmail.com
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/all/20221206201259.3028-1-socketcan@hartkopp.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 27dcdcc0b808..c69168f11e44 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -677,7 +677,7 @@ static void can_receive(struct sk_buff *skb, struct net_device *dev)
static int can_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
- if (unlikely(dev->type != ARPHRD_CAN || (!can_is_can_skb(skb)))) {
+ if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_can_skb(skb))) {
pr_warn_once("PF_CAN: dropped non conform CAN skbuff: dev type %d, len %d\n",
dev->type, skb->len);
@@ -692,7 +692,7 @@ static int can_rcv(struct sk_buff *skb, struct net_device *dev,
static int canfd_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
- if (unlikely(dev->type != ARPHRD_CAN || (!can_is_canfd_skb(skb)))) {
+ if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_canfd_skb(skb))) {
pr_warn_once("PF_CAN: dropped non conform CAN FD skbuff: dev type %d, len %d\n",
dev->type, skb->len);
@@ -707,7 +707,7 @@ static int canfd_rcv(struct sk_buff *skb, struct net_device *dev,
static int canxl_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev)
{
- if (unlikely(dev->type != ARPHRD_CAN || (!can_is_canxl_skb(skb)))) {
+ if (unlikely(dev->type != ARPHRD_CAN || !can_get_ml_priv(dev) || !can_is_canxl_skb(skb))) {
pr_warn_once("PF_CAN: dropped non conform CAN XL skbuff: dev type %d, len %d\n",
dev->type, skb->len);
INTERNATIONAL MONETARY FUND (HQ1)
700 19th Street, N.W., Washington, D.C. 20431.
Unclaimed Mo lottery fund of USD 755,000.00
detected for the fund's email owner.
REF:-XVGN 82022.
Winning number; 5-6-14-29-35
Dear fund email owner,
I sent you this letter a month ago but have not heard from you; I am not sure whether you received it, which is why I am sending it again.
This is to inform you of very important information that will be of great help in redeeming you from all the difficulties you have had in obtaining your outstanding payment, owing to the excessive demands for money made on you both by corrupt bank officials and by courier companies, after which your fund remains unpaid.
I am Mrs. Kristalina Georgieva, a senior official at the International Monetary Fund (IMF). It may interest you to know that, through so many correspondences, reports have reached our office about the disturbing way people like you are treated by various banks. After our investigation, we found that your email address was one of the lucky winners of the Mo lottery selection in the year 2020, but because of certain corrupt bankers, they are trying to divert your funds into their private account.
All governmental and non-governmental parastatals, NGOs, finance companies, banks, security companies and courier companies that have been in contact with you of late have been instructed to withdraw from your transaction, and you have been advised NOT to respond to them from now on. The International Monetary Fund (IMF) is now directly responsible for your payment, which you can receive into your bank account with the help of the European Investment Bank.
Your name appeared on our payment schedule list of beneficiaries who will receive their Mo Lottery funds in this first-quarter payment of 2022, because we transfer funds only twice a year in accordance with our banking regulations. We apologize for the delayed payment and ask you to stop communicating with any other office now and to pay attention to our office accordingly.
Now your new payment: United Nations approval No.; UN 5685 P, White House approved No.: WH44CV, Reference No. 35460021, Allocation No.: 674632, Password No.: 339331, PIN code No.: 55674 and your certificate of merit payment No.: 103, Issued code No.: 0763; Immediate (IMF) Telex confirmation No.: -1114433; Secret code No.: XXTN013. Your part-payment inheritance fund is USD 755,000 once you have received these vital payment numbers. Winning numbers; 5-6-14-29-35, winning date; 05 October 2020. You are now qualified to receive and confirm your payment with the African Region of the International Monetary Fund (IMF) immediately, within the next 168 hours. We assure you that your payment will reach you as long as you follow my directives and instructions. We have decided to give you a CODE; THE CODE IS: 601. Please, every time you receive a mail in the name of Mrs. Kristalina Georgieva, check whether the CODE (601) is present; if the code is not written, please delete the message from your inbox!
You are hereby advised NOT to send any further payment to any institution regarding your transaction, as your fund will be transferred to you directly from our source. The only payment required at the bank is solely the fee for the IMF clearance certificate, so that the bank will be able to release the entire sum into your bank account.
I hope this is clear. Any action contrary to this instruction is at your own risk. NOTE THAT YOU MUST PAY THE IMF CLEARANCE CERTIFICATE FEE TO THE BANK AND NO ADDITIONAL CHARGE.
I have already deposited your total fund with the corresponding paying bank, the European Investment Bank; I want you to contact them for approval to receive your total fund in good condition, without further delays or additional fees.
Reply to this bank email:
(europeaninvestmentbankeib6(a)gmail.com)
And your fund will be released into your bank account once you contact the bank.
Contact bank manager name; Dr. Wilson Taylor
Bank email:
(europeaninvestmentbankeib6(a)gmail.com)
Respectfully,
Regards,
Mrs. Kristalina Georgieva, (I.M.F)(601)
The Département fédéral de l'intérieur *DFI* (Federal Department of Home Affairs) offers you scholarship opportunities and work contracts in:
-Canada -United States -France -Belgium.
To submit your application, write a cover letter and attach one (01) ID photo, requesting the registration form at the email address: dfibourse(a)gmail.com
PS: For your information and for wide distribution.
Good luck
Registrar's Office D F I