The patch titled
Subject: mm: convert partially_mapped set/clear operations to be atomic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-convert-partially_mapped-set-clear-operations-to-be-atomic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Usama Arif <usamaarif642(a)gmail.com>
Subject: mm: convert partially_mapped set/clear operations to be atomic
Date: Thu, 12 Dec 2024 18:33:51 +0000
Other page flags in the 2nd page, like PG_hwpoison and PG_anon_exclusive
can get modified concurrently. Changes to other page flags might be lost
if they are happening at the same time as non-atomic partially_mapped
operations. Hence, make partially_mapped operations atomic.
Link: https://lkml.kernel.org/r/20241212183351.1345389-1-usamaarif642@gmail.com
Fixes: 8422acdc97ed ("mm: introduce a pageflag for partially mapped folios")
Reported-by: David Hildenbrand <david(a)redhat.com>
Link: https://lore.kernel.org/all/e53b04ad-1827-43a2-a1ab-864c7efecf6e@redhat.com/
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Domenico Cerasuolo <cerasuolodomenico(a)gmail.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Nico Pache <npache(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/page-flags.h | 12 ++----------
mm/huge_memory.c | 8 ++++----
2 files changed, 6 insertions(+), 14 deletions(-)
--- a/include/linux/page-flags.h~mm-convert-partially_mapped-set-clear-operations-to-be-atomic
+++ a/include/linux/page-flags.h
@@ -862,18 +862,10 @@ static inline void ClearPageCompound(str
ClearPageHead(page);
}
FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE)
-FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
-/*
- * PG_partially_mapped is protected by deferred_split split_queue_lock,
- * so its safe to use non-atomic set/clear.
- */
-__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
-__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
+FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE)
#else
FOLIO_FLAG_FALSE(large_rmappable)
-FOLIO_TEST_FLAG_FALSE(partially_mapped)
-__FOLIO_SET_FLAG_NOOP(partially_mapped)
-__FOLIO_CLEAR_FLAG_NOOP(partially_mapped)
+FOLIO_FLAG_FALSE(partially_mapped)
#endif
#define PG_head_mask ((1UL << PG_head))
--- a/mm/huge_memory.c~mm-convert-partially_mapped-set-clear-operations-to-be-atomic
+++ a/mm/huge_memory.c
@@ -3577,7 +3577,7 @@ int split_huge_page_to_list_to_order(str
!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
if (folio_test_partially_mapped(folio)) {
- __folio_clear_partially_mapped(folio);
+ folio_clear_partially_mapped(folio);
mod_mthp_stat(folio_order(folio),
MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
}
@@ -3689,7 +3689,7 @@ bool __folio_unqueue_deferred_split(stru
if (!list_empty(&folio->_deferred_list)) {
ds_queue->split_queue_len--;
if (folio_test_partially_mapped(folio)) {
- __folio_clear_partially_mapped(folio);
+ folio_clear_partially_mapped(folio);
mod_mthp_stat(folio_order(folio),
MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
}
@@ -3733,7 +3733,7 @@ void deferred_split_folio(struct folio *
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (partially_mapped) {
if (!folio_test_partially_mapped(folio)) {
- __folio_set_partially_mapped(folio);
+ folio_set_partially_mapped(folio);
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_DEFERRED_SPLIT_PAGE);
count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
@@ -3826,7 +3826,7 @@ static unsigned long deferred_split_scan
} else {
/* We lost race with folio_put() */
if (folio_test_partially_mapped(folio)) {
- __folio_clear_partially_mapped(folio);
+ folio_clear_partially_mapped(folio);
mod_mthp_stat(folio_order(folio),
MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
}
_
Patches currently in -mm which might be from usamaarif642(a)gmail.com are
mm-convert-partially_mapped-set-clear-operations-to-be-atomic.patch
The patch titled
Subject: nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
Date: Fri, 13 Dec 2024 01:43:28 +0900
When block_invalidatepage was converted to block_invalidate_folio, the
fallback to block_invalidatepage in folio_invalidate() if the
address_space_operations method invalidatepage (currently
invalidate_folio) was not set, was removed.
Unfortunately, some pseudo-inodes in nilfs2 use empty_aops set by
inode_init_always_gfp() as is, or explicitly set it to
address_space_operations. Therefore, with this change,
block_invalidatepage() is no longer called from folio_invalidate(), and as
a result, the buffer_head structures attached to these pages/folios are no
longer freed via try_to_free_buffers().
Thus, these buffer heads are now leaked by truncate_inode_pages(), which
cleans up the page cache from inode evict(), etc.
Three types of caches use empty_aops: gc inode caches and the DAT shadow
inode used by GC, and b-tree node caches. Of these, b-tree node caches
explicitly call invalidate_mapping_pages() during cleanup, which involves
calling try_to_free_buffers(), so the leak was not visible during normal
operation but worsened when GC was performed.
Fix this issue by using address_space_operations with invalidate_folio set
to block_invalidate_folio instead of empty_aops, which will ensure the
same behavior as before.
Link: https://lkml.kernel.org/r/20241212164556.21338-1-konishi.ryusuke@gmail.com
Fixes: 7ba13abbd31e ("fs: Turn block_invalidatepage into block_invalidate_folio")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/btnode.c | 1 +
fs/nilfs2/gcinode.c | 2 +-
fs/nilfs2/inode.c | 5 +++++
fs/nilfs2/nilfs.h | 1 +
4 files changed, 8 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/btnode.c~nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages
+++ a/fs/nilfs2/btnode.c
@@ -35,6 +35,7 @@ void nilfs_init_btnc_inode(struct inode
ii->i_flags = 0;
memset(&ii->i_bmap_data, 0, sizeof(struct nilfs_bmap));
mapping_set_gfp_mask(btnc_inode->i_mapping, GFP_NOFS);
+ btnc_inode->i_mapping->a_ops = &nilfs_buffer_cache_aops;
}
void nilfs_btnode_cache_clear(struct address_space *btnc)
--- a/fs/nilfs2/gcinode.c~nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages
+++ a/fs/nilfs2/gcinode.c
@@ -163,7 +163,7 @@ int nilfs_init_gcinode(struct inode *ino
inode->i_mode = S_IFREG;
mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
- inode->i_mapping->a_ops = &empty_aops;
+ inode->i_mapping->a_ops = &nilfs_buffer_cache_aops;
ii->i_flags = 0;
nilfs_bmap_init_gc(ii->i_bmap);
--- a/fs/nilfs2/inode.c~nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages
+++ a/fs/nilfs2/inode.c
@@ -276,6 +276,10 @@ const struct address_space_operations ni
.is_partially_uptodate = block_is_partially_uptodate,
};
+const struct address_space_operations nilfs_buffer_cache_aops = {
+ .invalidate_folio = block_invalidate_folio,
+};
+
static int nilfs_insert_inode_locked(struct inode *inode,
struct nilfs_root *root,
unsigned long ino)
@@ -681,6 +685,7 @@ struct inode *nilfs_iget_for_shadow(stru
NILFS_I(s_inode)->i_flags = 0;
memset(NILFS_I(s_inode)->i_bmap, 0, sizeof(struct nilfs_bmap));
mapping_set_gfp_mask(s_inode->i_mapping, GFP_NOFS);
+ s_inode->i_mapping->a_ops = &nilfs_buffer_cache_aops;
err = nilfs_attach_btree_node_cache(s_inode);
if (unlikely(err)) {
--- a/fs/nilfs2/nilfs.h~nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages
+++ a/fs/nilfs2/nilfs.h
@@ -401,6 +401,7 @@ extern const struct file_operations nilf
extern const struct inode_operations nilfs_file_inode_operations;
extern const struct file_operations nilfs_file_operations;
extern const struct address_space_operations nilfs_aops;
+extern const struct address_space_operations nilfs_buffer_cache_aops;
extern const struct inode_operations nilfs_dir_inode_operations;
extern const struct inode_operations nilfs_special_inode_operations;
extern const struct inode_operations nilfs_symlink_inode_operations;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-buffer-head-leaks-in-calls-to-truncate_inode_pages.patch
The patch titled
Subject: zram: fix panic when using ext4 over zram
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zram-panic-when-use-ext4-over-zram.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: caiqingfu <caiqingfu(a)ruijie.com.cn>
Subject: zram: fix panic when using ext4 over zram
Date: Fri, 29 Nov 2024 19:57:35 +0800
[ 52.073080 ] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 52.073511 ] Modules linked in:
[ 52.074094 ] CPU: 0 UID: 0 PID: 3825 Comm: a.out Not tainted 6.12.0-07749-g28eb75e178d3-dirty #3
[ 52.074672 ] Hardware name: linux,dummy-virt (DT)
[ 52.075128 ] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 52.075619 ] pc : obj_malloc+0x5c/0x160
[ 52.076402 ] lr : zs_malloc+0x200/0x570
[ 52.076630 ] sp : ffff80008dd335f0
[ 52.076797 ] x29: ffff80008dd335f0 x28: ffff000004104a00 x27: ffff000004dfc400
[ 52.077319 ] x26: 000000000000ca18 x25: ffff00003fcaf0e0 x24: ffff000006925cf0
[ 52.077785 ] x23: 0000000000000c0a x22: ffff0000032ee780 x21: ffff000006925cf0
[ 52.078257 ] x20: 0000000000088000 x19: 0000000000000000 x18: 0000000000fffc18
[ 52.078701 ] x17: 00000000fffffffd x16: 0000000000000803 x15: 00000000fffffffe
[ 52.079203 ] x14: 000000001824429d x13: ffff000006e84000 x12: ffff000006e83fec
[ 52.079711 ] x11: ffff000006e83000 x10: 00000000000002a5 x9 : ffff000006e83ff3
[ 52.080269 ] x8 : 0000000000000001 x7 : 0000000017e80000 x6 : 0000000000017e80
[ 52.080724 ] x5 : 0000000000000003 x4 : ffff00000402a5e8 x3 : 0000000000000066
[ 52.081081 ] x2 : ffff000006925cf0 x1 : ffff00000402a5e8 x0 : ffff000004104a00
[ 52.081595 ] Call trace:
[ 52.081925 ] obj_malloc+0x5c/0x160 (P)
[ 52.082220 ] zs_malloc+0x200/0x570 (L)
[ 52.082504 ] zs_malloc+0x200/0x570
[ 52.082716 ] zram_submit_bio+0x788/0x9e8
[ 52.083017 ] __submit_bio+0x1c4/0x338
[ 52.083343 ] submit_bio_noacct_nocheck+0x128/0x2c0
[ 52.083518 ] submit_bio_noacct+0x1c8/0x308
[ 52.083722 ] submit_bio+0xa8/0x14c
[ 52.083942 ] submit_bh_wbc+0x140/0x1bc
[ 52.084088 ] __block_write_full_folio+0x23c/0x5f0
[ 52.084232 ] block_write_full_folio+0x134/0x21c
[ 52.084524 ] write_cache_pages+0x64/0xd4
[ 52.084778 ] blkdev_writepages+0x50/0x8c
[ 52.085040 ] do_writepages+0x80/0x2b0
[ 52.085292 ] filemap_fdatawrite_wbc+0x6c/0x90
[ 52.085597 ] __filemap_fdatawrite_range+0x64/0x94
[ 52.085900 ] filemap_fdatawrite+0x1c/0x28
[ 52.086158 ] sync_bdevs+0x170/0x17c
[ 52.086374 ] ksys_sync+0x6c/0xb8
[ 52.086597 ] __arm64_sys_sync+0x10/0x20
[ 52.086847 ] invoke_syscall+0x44/0x100
[ 52.087230 ] el0_svc_common.constprop.0+0x40/0xe0
[ 52.087550 ] do_el0_svc+0x1c/0x28
[ 52.087690 ] el0_svc+0x30/0xd0
[ 52.087818 ] el0t_64_sync_handler+0xc8/0xcc
[ 52.088046 ] el0t_64_sync+0x198/0x19c
[ 52.088500 ] Code: 110004a5 6b0500df f9401273 54000160 (f9401664)
[ 52.089097 ] ---[ end trace 0000000000000000 ]---
When using ext4 on zram, the following panic occasionally occurs under
high memory usage
The reason is that when the handle is obtained using the slow path, it
will be re-compressed. If the data in the page changes, the compressed
length may exceed the previous one. Overflow occurred when writing to
zs_object, which then caused the panic.
Comment the fast path and force the slow path. Adding a large number of
read and write file systems can quickly reproduce it.
The solution is to re-obtain the handle after re-compression if the length
is different from the previous one.
Link: https://lkml.kernel.org/r/20241129115735.136033-1-baicaiaichibaicai@gmail.c…
Signed-off-by: caiqingfu <caiqingfu(a)ruijie.com.cn>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/block/zram/zram_drv.c~zram-panic-when-use-ext4-over-zram
+++ a/drivers/block/zram/zram_drv.c
@@ -1633,6 +1633,7 @@ static int zram_write_page(struct zram *
unsigned long alloced_pages;
unsigned long handle = -ENOMEM;
unsigned int comp_len = 0;
+ unsigned int last_comp_len = 0;
void *src, *dst, *mem;
struct zcomp_strm *zstrm;
unsigned long element = 0;
@@ -1664,6 +1665,11 @@ compress_again:
if (comp_len >= huge_class_size)
comp_len = PAGE_SIZE;
+
+ if (last_comp_len && (last_comp_len != comp_len)) {
+ zs_free(zram->mem_pool, handle);
+ handle = (unsigned long)ERR_PTR(-ENOMEM);
+ }
/*
* handle allocation has 2 paths:
* a) fast path is executed with preemption disabled (for
@@ -1692,8 +1698,10 @@ compress_again:
if (IS_ERR_VALUE(handle))
return PTR_ERR((void *)handle);
- if (comp_len != PAGE_SIZE)
+ if (comp_len != PAGE_SIZE) {
+ last_comp_len = comp_len;
goto compress_again;
+ }
/*
* If the page is not compressible, you need to acquire the
* lock and execute the code below. The zcomp_stream_get()
_
Patches currently in -mm which might be from caiqingfu(a)ruijie.com.cn are
zram-panic-when-use-ext4-over-zram.patch
Since 5.16 and prior to 6.13 KVM can't be used with FSDAX
guest memory (PMD pages). To reproduce the issue you need to reserve
guest memory with `memmap=` cmdline, create and mount FS in DAX mode
(tested both XFS and ext4), see doc link below. ndctl command for test:
ndctl create-namespace -v -e namespace1.0 --map=dev --mode=fsdax -a 2M
Then pass memory object to qemu like:
-m 8G -object memory-backend-file,id=ram0,size=8G,\
mem-path=/mnt/pmem/guestmem,share=on,prealloc=on,dump=off,align=2097152 \
-numa node,memdev=ram0,cpus=0-1
QEMU fails to run guest with error: kvm run failed Bad address
and there are two warnings in dmesg:
WARN_ON_ONCE(!page_count(page)) in kvm_is_zone_device_page() and
WARN_ON_ONCE(folio_ref_count(folio) <= 0) in try_grab_folio() (v6.6.63)
It looks like in the past assumption was made that pfn won't change from
faultin_pfn() to release_pfn_clean(), e.g. see
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a level")
But kvm_page_fault structure made pfn part of mutable state, so
now release_pfn_clean() can take hugepage-adjusted pfn.
And it works for all cases (/dev/shm, hugetlb, devdax) except fsdax.
Apparently in fsdax mode faultin-pfn and adjusted-pfn may refer to
different folios, so we're getting get_page/put_page imbalance.
To solve this preserve faultin pfn in separate local variable
and pass it in kvm_release_pfn_clean().
Patch tested for all mentioned guest memory backends with tdp_mmu={0,1}.
No bug in upstream as it was solved fundamentally by
commit 8dd861cc07e2 ("KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns")
and related patch series.
Link: https://nvdimm.docs.kernel.org/2mib_fs_dax.html
Fixes: 2f6305dd5676 ("KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault")
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v1 -> v2:
* Instead of new struct field prefer local variable to snapshot faultin pfn
as suggested by Sean Christopherson.
* Tested patch for 6.1 and 6.12
arch/x86/kvm/mmu/mmu.c | 5 ++++-
arch/x86/kvm/mmu/paging_tmpl.h | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 13134954e24d..d392022dcb89 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4245,6 +4245,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
unsigned long mmu_seq;
+ kvm_pfn_t orig_pfn;
int r;
fault->gfn = fault->addr >> PAGE_SHIFT;
@@ -4272,6 +4273,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r != RET_PF_CONTINUE)
return r;
+ orig_pfn = fault->pfn;
+
r = RET_PF_RETRY;
if (is_tdp_mmu_fault)
@@ -4296,7 +4299,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
read_unlock(&vcpu->kvm->mmu_lock);
else
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_pfn_clean(orig_pfn);
return r;
}
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 1f4f5e703f13..685560a45bf6 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -790,6 +790,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct guest_walker walker;
+ kvm_pfn_t orig_pfn;
int r;
unsigned long mmu_seq;
bool is_self_change_mapping;
@@ -868,6 +869,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
walker.pte_access &= ~ACC_EXEC_MASK;
}
+ orig_pfn = fault->pfn;
+
r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
@@ -881,7 +884,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
out_unlock:
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_pfn_clean(orig_pfn);
return r;
}
--
2.34.1
This patch series is to fix bug for APIs
- devm_pci_epc_destroy().
- pci_epf_remove_vepf().
and simplify APIs below:
- pci_epc_get().
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Changes in v3:
- Remove stable tag of patch 1/3
- Add one more patch 3/3
- Link to v2: https://lore.kernel.org/all/20241102-pci-epc-core_fix-v2-0-0785f8435be5@qui…
Changes in v2:
- Correct tile and commit message for patch 1/2.
- Add one more patch 2/2 to simplify API pci_epc_get().
- Link to v1: https://lore.kernel.org/r/20241020-pci-epc-core_fix-v1-1-3899705e3537@quici…
---
Zijun Hu (3):
PCI: endpoint: Fix that API devm_pci_epc_destroy() fails to destroy the EPC device
PCI: endpoint: Simplify API pci_epc_get() implementation
PCI: endpoint: Fix API pci_epf_add_vepf() returning -EBUSY error
drivers/pci/endpoint/pci-epc-core.c | 23 +++++++----------------
drivers/pci/endpoint/pci-epf-core.c | 1 +
2 files changed, 8 insertions(+), 16 deletions(-)
---
base-commit: 11066801dd4b7c4d75fce65c812723a80c1481ae
change-id: 20241020-pci-epc-core_fix-a92512fa9d19
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
While testing the encoded read feature the following crash was observed
and it can be reliably reproduced:
[ 2916.441731] Oops: general protection fault, probably for non-canonical address 0xa3f64e06d5eee2c7: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 2916.441736] CPU: 5 UID: 0 PID: 592 Comm: kworker/u38:4 Kdump: loaded Not tainted 6.13.0-rc1+ #4
[ 2916.441739] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 2916.441740] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[ 2916.441777] RIP: 0010:__wake_up_common+0x29/0xa0
[ 2916.441808] RSP: 0018:ffffaaec0128fd80 EFLAGS: 00010216
[ 2916.441810] RAX: 0000000000000001 RBX: ffff95a6429cf020 RCX: 0000000000000000
[ 2916.441811] RDX: a3f64e06d5eee2c7 RSI: 0000000000000003 RDI: ffff95a6429cf000
^^^^^^^^^^^^^^^^
This comes from `priv->wait.head.next`
[ 2916.441823] Call Trace:
[ 2916.441833] <TASK>
[ 2916.441881] ? __wake_up_common+0x29/0xa0
[ 2916.441883] __wake_up_common_lock+0x37/0x60
[ 2916.441887] btrfs_encoded_read_endio+0x73/0x90 [btrfs] <<< UAF of `priv` object,
[ 2916.441921] btrfs_check_read_bio+0x321/0x500 [btrfs] details below.
[ 2916.441947] process_scheduled_works+0xc1/0x410
[ 2916.441960] worker_thread+0x105/0x240
crash> btrfs_encoded_read_private.wait.head ffff95a6429cf000 # `priv` from RDI ^^
wait.head = {
next = 0xa3f64e06d5eee2c7, # Corrupted as the object was already freed/reused.
prev = 0xffff95a6429cf020 # Stale data still point to itself (`&priv->wait.head`
} also in RBX ^^) ie. the list was free.
Possibly, this is easier (or even only?) reproducible on preemptible kernel.
It just happened to build an RT kernel for additional testing coverage.
Enabling slab debug gives us further related details, mostly confirming
what's expected:
[11:23:07] =============================================================================
[11:23:07] BUG kmalloc-64 (Not tainted): Poison overwritten
[11:23:07] -----------------------------------------------------------------------------
[11:23:07] 0xffff8fc7c5b6b542-0xffff8fc7c5b6b543 @offset=5442. First byte 0x4 instead of 0x6b
^
That makes two bytes into the `priv->wait.lock`
[11:23:07] FIX kmalloc-64: Restoring Poison 0xffff8fc7c5b6b542-0xffff8fc7c5b6b543=0x6b
[11:23:07] Allocated in btrfs_encoded_read_regular_fill_pages+0x5e/0x260 [btrfs] age=4 cpu=0 pid=18295
[11:23:07] __kmalloc_cache_noprof+0x81/0x2a0
[11:23:07] btrfs_encoded_read_regular_fill_pages+0x5e/0x260 [btrfs]
[11:23:07] btrfs_encoded_read_regular+0xee/0x200 [btrfs]
[11:23:07] btrfs_ioctl_encoded_read+0x477/0x600 [btrfs]
[11:23:07] btrfs_ioctl+0xefe/0x2a00 [btrfs]
[11:23:07] __x64_sys_ioctl+0xa3/0xc0
[11:23:07] do_syscall_64+0x74/0x180
[11:23:07] entry_SYSCALL_64_after_hwframe+0x76/0x7e
9121 unsigned long i = 0;
9122 struct btrfs_bio *bbio;
9123 int ret;
9124
* 9125 priv = kmalloc(sizeof(struct btrfs_encoded_read_private), GFP_NOFS);
9126 if (!priv)
9127 return -ENOMEM;
9128
9129 init_waitqueue_head(&priv->wait);
[11:23:07] Freed in btrfs_encoded_read_regular_fill_pages+0x1f9/0x260 [btrfs] age=4 cpu=0 pid=18295
[11:23:07] btrfs_encoded_read_regular_fill_pages+0x1f9/0x260 [btrfs]
[11:23:07] btrfs_encoded_read_regular+0xee/0x200 [btrfs]
[11:23:07] btrfs_ioctl_encoded_read+0x477/0x600 [btrfs]
[11:23:07] btrfs_ioctl+0xefe/0x2a00 [btrfs]
[11:23:07] __x64_sys_ioctl+0xa3/0xc0
[11:23:07] do_syscall_64+0x74/0x180
[11:23:07] entry_SYSCALL_64_after_hwframe+0x76/0x7e
9171 if (atomic_dec_return(&priv->pending) != 0)
9172 io_wait_event(priv->wait, !atomic_read(&priv->pending));
9173 /* See btrfs_encoded_read_endio() for ordering. */
9174 ret = blk_status_to_errno(READ_ONCE(priv->status));
* 9175 kfree(priv);
9176 return ret;
9177 }
9178 }
`priv` was freed here but then after that it was further used. The report
is comming soon after, see below. Note that the report is a few seconds
delayed by the RCU stall timeout. (It is the same example as with the
GPF crash above, just that one was reported right away without any delay).
Due to the poison this time instead of the GPF exception as observed above
the UAF caused a CPU hard lockup (reported by the RCU stall check as this
was a VM):
[11:23:28] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[11:23:28] rcu: 0-...!: (1 GPs behind) idle=48b4/1/0x4000000000000000 softirq=0/0 fqs=5 rcuc=5254 jiffies(starved)
[11:23:28] rcu: (detected by 1, t=5252 jiffies, g=1631241, q=250054 ncpus=8)
[11:23:28] Sending NMI from CPU 1 to CPUs 0:
[11:23:28] NMI backtrace for cpu 0
[11:23:28] CPU: 0 UID: 0 PID: 21445 Comm: kworker/u33:3 Kdump: loaded Tainted: G B 6.13.0-rc1+ #4
[11:23:28] Tainted: [B]=BAD_PAGE
[11:23:28] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[11:23:28] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[11:23:28] RIP: 0010:native_halt+0xa/0x10
[11:23:28] RSP: 0018:ffffb42ec277bc48 EFLAGS: 00000046
[11:23:28] Call Trace:
[11:23:28] <TASK>
[11:23:28] kvm_wait+0x53/0x60
[11:23:28] __pv_queued_spin_lock_slowpath+0x2ea/0x350
[11:23:28] _raw_spin_lock_irq+0x2b/0x40
[11:23:28] rtlock_slowlock_locked+0x1f3/0xce0
[11:23:28] rt_spin_lock+0x7b/0xb0
[11:23:28] __wake_up_common_lock+0x23/0x60
[11:23:28] btrfs_encoded_read_endio+0x73/0x90 [btrfs] <<< UAF of `priv` object.
[11:23:28] btrfs_check_read_bio+0x321/0x500 [btrfs]
[11:23:28] process_scheduled_works+0xc1/0x410
[11:23:28] worker_thread+0x105/0x240
9105 if (priv->uring_ctx) {
9106 int err = blk_status_to_errno(READ_ONCE(priv->status));
9107 btrfs_uring_read_extent_endio(priv->uring_ctx, err);
9108 kfree(priv);
9109 } else {
* 9110 wake_up(&priv->wait); <<< So we know UAF/GPF happens here.
9111 }
9112 }
9113 bio_put(&bbio->bio);
Now, the wait queue here does not really guarantee a proper
synchronization between `btrfs_encoded_read_regular_fill_pages()`
and `btrfs_encoded_read_endio()` which eventually results in various
use-afer-free effects like general protection fault or CPU hard lockup.
Using plain wait queue without additional instrumentation on top of the
`pending` counter is simply insufficient in this context. The reason wait
queue fails here is because the lifespan of that structure is only within
the `btrfs_encoded_read_regular_fill_pages()` function. In such a case
plain wait queue cannot be used to synchronize for it's own destruction.
Fix this by correctly using completion instead.
Also, while the lifespan of the structures in sync case is strictly
limited within the `..._fill_pages()` function, there is no need to
allocate from slab. Stack can be safely used instead.
Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl")
CC: stable(a)vger.kernel.org # 5.18+
Signed-off-by: Daniel Vacek <neelx(a)suse.com>
---
fs/btrfs/inode.c | 62 ++++++++++++++++++++++++++----------------------
1 file changed, 33 insertions(+), 29 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fa648ab6fe806..61e0fd5c6a15f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9078,7 +9078,7 @@ static ssize_t btrfs_encoded_read_inline(
}
struct btrfs_encoded_read_private {
- wait_queue_head_t wait;
+ struct completion *sync_read;
void *uring_ctx;
atomic_t pending;
blk_status_t status;
@@ -9090,23 +9090,22 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
if (bbio->bio.bi_status) {
/*
- * The memory barrier implied by the atomic_dec_return() here
- * pairs with the memory barrier implied by the
- * atomic_dec_return() or io_wait_event() in
- * btrfs_encoded_read_regular_fill_pages() to ensure that this
- * write is observed before the load of status in
- * btrfs_encoded_read_regular_fill_pages().
+ * The memory barrier implied by the
+ * atomic_dec_and_test() here pairs with the memory
+ * barrier implied by the atomic_dec_and_test() in
+ * btrfs_encoded_read_regular_fill_pages() to ensure
+ * that this write is observed before the load of
+ * status in btrfs_encoded_read_regular_fill_pages().
*/
WRITE_ONCE(priv->status, bbio->bio.bi_status);
}
if (atomic_dec_and_test(&priv->pending)) {
- int err = blk_status_to_errno(READ_ONCE(priv->status));
-
if (priv->uring_ctx) {
+ int err = blk_status_to_errno(READ_ONCE(priv->status));
btrfs_uring_read_extent_endio(priv->uring_ctx, err);
kfree(priv);
} else {
- wake_up(&priv->wait);
+ complete(priv->sync_read);
}
}
bio_put(&bbio->bio);
@@ -9117,16 +9116,21 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
struct page **pages, void *uring_ctx)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- struct btrfs_encoded_read_private *priv;
+ struct completion sync_read;
+ struct btrfs_encoded_read_private sync_priv, *priv;
unsigned long i = 0;
struct btrfs_bio *bbio;
- int ret;
- priv = kmalloc(sizeof(struct btrfs_encoded_read_private), GFP_NOFS);
- if (!priv)
- return -ENOMEM;
+ if (uring_ctx) {
+ priv = kmalloc(sizeof(struct btrfs_encoded_read_private), GFP_NOFS);
+ if (!priv)
+ return -ENOMEM;
+ } else {
+ priv = &sync_priv;
+ init_completion(&sync_read);
+ priv->sync_read = &sync_read;
+ }
- init_waitqueue_head(&priv->wait);
atomic_set(&priv->pending, 1);
priv->status = 0;
priv->uring_ctx = uring_ctx;
@@ -9158,23 +9162,23 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
atomic_inc(&priv->pending);
btrfs_submit_bbio(bbio, 0);
- if (uring_ctx) {
- if (atomic_dec_return(&priv->pending) == 0) {
- ret = blk_status_to_errno(READ_ONCE(priv->status));
- btrfs_uring_read_extent_endio(uring_ctx, ret);
+ if (atomic_dec_and_test(&priv->pending)) {
+ if (uring_ctx) {
+ int err = blk_status_to_errno(READ_ONCE(priv->status));
+ btrfs_uring_read_extent_endio(uring_ctx, err);
kfree(priv);
- return ret;
+ return err;
+ } else {
+ complete(&sync_read);
}
+ }
+ if (uring_ctx)
return -EIOCBQUEUED;
- } else {
- if (atomic_dec_return(&priv->pending) != 0)
- io_wait_event(priv->wait, !atomic_read(&priv->pending));
- /* See btrfs_encoded_read_endio() for ordering. */
- ret = blk_status_to_errno(READ_ONCE(priv->status));
- kfree(priv);
- return ret;
- }
+
+ wait_for_completion_io(&sync_read);
+ /* See btrfs_encoded_read_endio() for ordering. */
+ return blk_status_to_errno(READ_ONCE(priv->status));
}
ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, struct iov_iter *iter,
--
2.45.2
From: Shu Han <ebpqwerty472123(a)gmail.com>
[ Upstream commit ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 ]
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
[ Resolve merge conflict in mm/mmap.c. ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
mm/mmap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 9a9933ede542..ebc3583fa612 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3021,8 +3021,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
--
2.43.0
[ Sasha's backport helper bot ]
Hi,
The upstream commit SHA1 provided is correct: fcf6a49d79923a234844b8efe830a61f3f0584e4
WARNING: Author mismatch between patch and upstream commit:
Backport author: <gregkh(a)linuxfoundation.org>
Commit author: Wayne Lin <wayne.lin(a)amd.com>
Status in newer kernel trees:
6.12.y | Present (exact SHA1)
6.6.y | Present (different SHA1: c7e65cab54a8)
6.1.y | Not found
Note: The patch differs from the upstream commit:
---
1: fcf6a49d79923 ! 1: 79f06b6c107fd drm/amd/display: Don't refer to dc_sink in is_dsc_need_re_compute
@@
## Metadata ##
-Author: Wayne Lin <wayne.lin(a)amd.com>
+Author: gregkh(a)linuxfoundation.org <gregkh(a)linuxfoundation.org>
## Commit message ##
- drm/amd/display: Don't refer to dc_sink in is_dsc_need_re_compute
+ Patch "[PATCH 6.1.y] drm/amd/display: Don't refer to dc_sink in is_dsc_need_re_compute" has been added to the 5.4-stable tree
+
+ This is a note to let you know that I've just added the patch titled
+
+ [PATCH 6.1.y] drm/amd/display: Don't refer to dc_sink in is_dsc_need_re_compute
+
+ to the 5.4-stable tree which can be found at:
+ http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
+
+ The filename of the patch is:
+ drm-amd-display-don-t-refer-to-dc_sink-in-is_dsc_need_re_compute.patch
+ and it can be found in the queue-5.4 subdirectory.
+
+ If you, or anyone else, feels it should not be added to the stable tree,
+ please let <stable(a)vger.kernel.org> know about it.
+
+ From jianqi.ren.cn(a)windriver.com Thu Dec 12 13:11:21 2024
+ From: <jianqi.ren.cn(a)windriver.com>
+ Date: Wed, 11 Dec 2024 18:15:44 +0800
+ Subject: [PATCH 6.1.y] drm/amd/display: Don't refer to dc_sink in is_dsc_need_re_compute
+ To: <wayne.lin(a)amd.com>, <gregkh(a)linuxfoundation.org>
+ Cc: <patches(a)lists.linux.dev>, <jerry.zuo(a)amd.com>, <zaeem.mohamed(a)amd.com>, <daniel.wheeler(a)amd.com>, <alexander.deucher(a)amd.com>, <stable(a)vger.kernel.org>, <harry.wentland(a)amd.com>, <sunpeng.li(a)amd.com>, <Rodrigo.Siqueira(a)amd.com>, <christian.koenig(a)amd.com>, <airlied(a)gmail.com>, <daniel(a)ffwll.ch>, <Jerry.Zuo(a)amd.com>, <amd-gfx(a)lists.freedesktop.org>, <dri-devel(a)lists.freedesktop.org>, <linux-kernel(a)vger.kernel.org>
+ Message-ID: <20241211101544.2121147-1-jianqi.ren.cn(a)windriver.com>
+
+ From: Wayne Lin <wayne.lin(a)amd.com>
+
+ [ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplug one of monitors connected after mst hub, encounter null pointer dereference.
@@ Commit message
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
+ Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
## drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c ##
@@ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c: amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
@@ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c: dm_dp_mst_detect(st
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
-@@ drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c: static bool is_dsc_need_re_compute(
- if (!aconnector || !aconnector->dsc_aux)
- continue;
-
-- /*
-- * check if cached virtual MST DSC caps are available and DSC is supported
-- * as per specifications in their Virtual DPCD registers.
-- */
-- if (!(aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported ||
-- aconnector->dc_link->dpcd_caps.dsc_caps.dsc_basic_caps.fields.dsc_support.DSC_PASSTHROUGH_SUPPORT))
-- continue;
--
- stream_on_link[new_stream_on_link_num] = aconnector;
- new_stream_on_link_num++;
-
---
Results of testing on various branches:
| Branch | Patch Apply | Build Test |
|---------------------------|-------------|------------|
| stable/linux-6.1.y | Success | Success |
| stable/linux-5.4.y | Failed | N/A |
út 10. 12. 2024 v 22:04 odesílatel Sasha Levin <sashal(a)kernel.org> napsal:
>
> This is a note to let you know that I've just added the patch titled
>
> rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> rtla-timerlat-make-timerlat_top_cpu-_count-unsigned-.patch
> and it can be found in the queue-6.6 subdirectory.
>
Could you also add "rtla/timerlat: Make timerlat_hist_cpu->*_count
unsigned long long", too (76b3102148135945b013797fac9b20), just like
we already have in-queue for 6.12? It makes no sense to do one fix but
not the other (clearly autosel AI won't take over the world yet).
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 0b8030ad5be8c39c4ad0f27fa740b3140a31023b
> Author: Tomas Glozar <tglozar(a)redhat.com>
> Date: Fri Oct 11 14:10:14 2024 +0200
>
> rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long
>
> [ Upstream commit 4eba4723c5254ba8251ecb7094a5078d5c300646 ]
>
> Most fields of struct timerlat_top_cpu are unsigned long long, but the
> fields {irq,thread,user}_count are int (32-bit signed).
>
> This leads to overflow when tracing on a large number of CPUs for a long
> enough time:
> $ rtla timerlat top -a20 -c 1-127 -d 12h
> ...
> 0 12:00:00 | IRQ Timer Latency (us) | Thread Timer Latency (us)
> CPU COUNT | cur min avg max | cur min avg max
> 1 #43200096 | 0 0 1 2 | 3 2 6 12
> ...
> 127 #43200096 | 0 0 1 2 | 3 2 5 11
> ALL #119144 e4 | 0 5 4 | 2 28 16
>
> The average latency should be 0-1 for IRQ and 5-6 for thread, but is
> reported as 5 and 28, about 4 to 5 times more, due to the count
> overflowing when summed over all CPUs: 43200096 * 127 = 5486412192,
> however, 1191444898 (= 5486412192 mod MAX_INT) is reported instead, as
> seen on the last line of the output, and the averages are thus ~4.6
> times higher than they should be (5486412192 / 1191444898 = ~4.6).
>
> Fix the issue by changing {irq,thread,user}_count fields to unsigned
> long long, similarly to other fields in struct timerlat_top_cpu and to
> the count variable in timerlat_top_print_sum.
>
> Link: https://lore.kernel.org/20241011121015.2868751-1-tglozar@redhat.com
> Reported-by: Attila Fazekas <afazekas(a)redhat.com>
> Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
> index a84f43857de14..0915092057f85 100644
> --- a/tools/tracing/rtla/src/timerlat_top.c
> +++ b/tools/tracing/rtla/src/timerlat_top.c
> @@ -49,9 +49,9 @@ struct timerlat_top_params {
> };
>
> struct timerlat_top_cpu {
> - int irq_count;
> - int thread_count;
> - int user_count;
> + unsigned long long irq_count;
> + unsigned long long thread_count;
> + unsigned long long user_count;
>
> unsigned long long cur_irq;
> unsigned long long min_irq;
> @@ -237,7 +237,7 @@ static void timerlat_top_print(struct osnoise_tool *top, int cpu)
> /*
> * Unless trace is being lost, IRQ counter is always the max.
> */
> - trace_seq_printf(s, "%3d #%-9d |", cpu, cpu_data->irq_count);
> + trace_seq_printf(s, "%3d #%-9llu |", cpu, cpu_data->irq_count);
>
> if (!cpu_data->irq_count) {
> trace_seq_printf(s, "%s %s %s %s |", no_value, no_value, no_value, no_value);
>
Thanks,
Tomas
The hid-sensor-hub creates the individual device structs and transfers them
to the created mfd platform-devices via the platform_data in the mfd_cell.
Before e651a1da442a ("HID: hid-sensor-hub: Allow parallel synchronous reads")
the sensor-hub was managing access centrally, with one "completion" in the
hub's data structure, which needed to be finished on removal at the latest.
The mentioned commit then moved this central management to each hid sensor
device, resulting on a completion in each struct hid_sensor_hub_device.
The remove procedure was adapted to go through all sensor devices and
finish any pending "completion".
What this didn't take into account was, platform_device_add_data() that is
used by mfd_add{_hotplug}_devices() does a kmemdup on the submitted
platform-data. So the data the platform-device gets is a copy of the
original data, meaning that the device worked on a different completion
than what sensor_hub_remove() currently wants to access.
To fix that, use device_for_each_child() to go through each child-device
similar to how mfd_remove_devices() unregisters the devices later and
with that get the live platform_data to finalize the correct completion.
Fixes: e651a1da442a ("HID: hid-sensor-hub: Allow parallel synchronous reads")
Cc: stable(a)vger.kernel.org
Acked-by: Benjamin Tissoires <bentiss(a)kernel.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
---
drivers/hid/hid-sensor-hub.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/hid/hid-sensor-hub.c b/drivers/hid/hid-sensor-hub.c
index 7bd86eef6ec7..4c94c03cb573 100644
--- a/drivers/hid/hid-sensor-hub.c
+++ b/drivers/hid/hid-sensor-hub.c
@@ -730,23 +730,30 @@ static int sensor_hub_probe(struct hid_device *hdev,
return ret;
}
+static int sensor_hub_finalize_pending_fn(struct device *dev, void *data)
+{
+ struct hid_sensor_hub_device *hsdev = dev->platform_data;
+
+ if (hsdev->pending.status)
+ complete(&hsdev->pending.ready);
+
+ return 0;
+}
+
static void sensor_hub_remove(struct hid_device *hdev)
{
struct sensor_hub_data *data = hid_get_drvdata(hdev);
unsigned long flags;
- int i;
hid_dbg(hdev, " hardware removed\n");
hid_hw_close(hdev);
hid_hw_stop(hdev);
+
spin_lock_irqsave(&data->lock, flags);
- for (i = 0; i < data->hid_sensor_client_cnt; ++i) {
- struct hid_sensor_hub_device *hsdev =
- data->hid_sensor_hub_client_devs[i].platform_data;
- if (hsdev->pending.status)
- complete(&hsdev->pending.ready);
- }
+ device_for_each_child(&hdev->dev, NULL,
+ sensor_hub_finalize_pending_fn);
spin_unlock_irqrestore(&data->lock, flags);
+
mfd_remove_devices(&hdev->dev);
mutex_destroy(&data->mutex);
}
--
2.45.2
The blamed commit changed the dsa_8021q_rcv() calling convention to
accept pre-populated source_port and switch_id arguments. If those are
not available, as in the case of tag_ocelot_8021q, the arguments must be
pre-initialized with -1.
Due to the bug of passing uninitialized arguments in tag_ocelot_8021q,
dsa_8021q_rcv() does not detect that it needs to populate the
source_port and switch_id, and this makes dsa_conduit_find_user() fail,
which leads to packet loss on reception.
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
---
Cc: stable(a)vger.kernel.org
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
From: Joe Damato <jdamato(a)fastly.com>
[ Upstream commit 08062af0a52107a243f7608fd972edb54ca5b7f8 ]
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
include/linux/netdevice.h | 4 ++--
net/core/net-sysfs.c | 6 +++++-
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fbbd0df1106b..8379e938cd89 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,7 +352,7 @@ struct napi_struct {
unsigned long state;
int weight;
- int defer_hard_irqs_count;
+ u32 defer_hard_irqs_count;
unsigned long gro_bitmask;
int (*poll)(struct napi_struct *, int);
#ifdef CONFIG_NETPOLL
@@ -2193,7 +2193,7 @@ struct net_device {
struct bpf_prog __rcu *xdp_prog;
unsigned long gro_flush_timeout;
- int napi_defer_hard_irqs;
+ u32 napi_defer_hard_irqs;
#define GRO_LEGACY_MAX_SIZE 65536u
/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
* and shinfo->gso_segs is a 16bit field.
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 8a06f97320e0..4ce57e75d139 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -30,6 +30,7 @@
#ifdef CONFIG_SYSFS
static const char fmt_hex[] = "%#x\n";
static const char fmt_dec[] = "%d\n";
+static const char fmt_uint[] = "%u\n";
static const char fmt_ulong[] = "%lu\n";
static const char fmt_u64[] = "%llu\n";
@@ -405,6 +406,9 @@ NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
static int change_napi_defer_hard_irqs(struct net_device *dev, unsigned long val)
{
+ if (val > S32_MAX)
+ return -ERANGE;
+
WRITE_ONCE(dev->napi_defer_hard_irqs, val);
return 0;
}
@@ -418,7 +422,7 @@ static ssize_t napi_defer_hard_irqs_store(struct device *dev,
return netdev_store(dev, attr, buf, len, change_napi_defer_hard_irqs);
}
-NETDEVICE_SHOW_RW(napi_defer_hard_irqs, fmt_dec);
+NETDEVICE_SHOW_RW(napi_defer_hard_irqs, fmt_uint);
static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
--
2.25.1
Changes in v8:
- Picks up change I agreed with Vlad but failed to cherry-pick into my b4
tree - Vlad/Bod
- Rewords the commit log for patch #3. As I read it I decided I might
translate bits of it from thought-stream into English - Bod
- Link to v7: https://lore.kernel.org/r/20241211-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v7:
- Expand commit log in patch #3
I've discussed with Bjorn on IRC and video what to put into the log here
and captured most of what we discussed.
Mostly the point here is voting for voltages in the power-domain list
is up to the drivers to do with performance states/opp-tables not for the
GDSC code. - Bjorn/Bryan
- Link to v6: https://lore.kernel.org/r/20241129-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v6:
- Passes NULL to second parameter of devm_pm_domain_attach_list - Vlad
- Link to v5: https://lore.kernel.org/r/20241128-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v5:
- In-lines devm_pm_domain_attach_list() in probe() directly - Vlad
- Link to v4: https://lore.kernel.org/r/20241127-b4-linux-next-24-11-18-clock-multiple-po…
v4:
- Adds Bjorn's RB to first patch - Bjorn
- Drops the 'd' in "and int" - Bjorn
- Amends commit log of patch 3 to capture a number of open questions -
Bjorn
- Link to v3: https://lore.kernel.org/r/20241126-b4-linux-next-24-11-18-clock-multiple-po…
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point I also verified my test branch with lockdep on which
was a concern with the pm_domain version of this solution but I wanted to
cover it anyway with the new approach for completeness sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses demv_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and it's SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds the support to attach a power-domain list to the
clock-controllers and the GDSCs those controllers provide so that in the
case of the above example gdsc_toggle_logic() will trigger the power-domain
list with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (3):
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
clk: qcom: Support attaching GDSCs to multiple parents
drivers/clk/qcom/common.c | 6 ++++++
drivers/clk/qcom/gdsc.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 46 insertions(+), 2 deletions(-)
---
base-commit: 744cf71b8bdfcdd77aaf58395e068b7457634b2c
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 64506b3d23a337e98a74b18dcb10c8619365f2bd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121240-props-brittle-f872@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64506b3d23a337e98a74b18dcb10c8619365f2bd Mon Sep 17 00:00:00 2001
From: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Date: Mon, 11 Nov 2024 23:18:31 +0530
Subject: [PATCH] scsi: ufs: qcom: Only free platform MSIs when ESI is enabled
Otherwise, it will result in a NULL pointer dereference as below:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Call trace:
mutex_lock+0xc/0x54
platform_device_msi_free_irqs_all+0x14/0x20
ufs_qcom_remove+0x34/0x48 [ufs_qcom]
platform_remove+0x28/0x44
device_remove+0x4c/0x80
device_release_driver_internal+0xd8/0x178
driver_detach+0x50/0x9c
bus_remove_driver+0x6c/0xbc
driver_unregister+0x30/0x60
platform_driver_unregister+0x14/0x20
ufs_qcom_pltform_exit+0x18/0xb94 [ufs_qcom]
__arm64_sys_delete_module+0x180/0x260
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xdc
el0t_64_sync_handler+0xc0/0xc4
el0t_64_sync+0x190/0x194
Cc: stable(a)vger.kernel.org # 6.3
Fixes: 519b6274a777 ("scsi: ufs: qcom: Add MCQ ESI config vendor specific ops")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20241111-ufs_bug_fix-v1-2-45ad8b62f02e@linaro.org
Reviewed-by: Bean Huo <beanhuo(a)micron.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/ufs/host/ufs-qcom.c b/drivers/ufs/host/ufs-qcom.c
index 3b592492e152..5220ec78021d 100644
--- a/drivers/ufs/host/ufs-qcom.c
+++ b/drivers/ufs/host/ufs-qcom.c
@@ -1861,10 +1861,12 @@ static int ufs_qcom_probe(struct platform_device *pdev)
static void ufs_qcom_remove(struct platform_device *pdev)
{
struct ufs_hba *hba = platform_get_drvdata(pdev);
+ struct ufs_qcom_host *host = ufshcd_get_variant(hba);
pm_runtime_get_sync(&(pdev)->dev);
ufshcd_remove(hba);
- platform_device_msi_free_irqs_all(hba->dev);
+ if (host->esi_enabled)
+ platform_device_msi_free_irqs_all(hba->dev);
}
static const struct of_device_id ufs_qcom_of_match[] __maybe_unused = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 76031d9536a076bf023bedbdb1b4317fc801dd67
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121206-varnish-jackpot-7d74@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76031d9536a076bf023bedbdb1b4317fc801dd67 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 3 Dec 2024 11:16:30 +0100
Subject: [PATCH] clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.n…
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index ef1b16da6ad5..65b7c41471c3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
+ * @max_raw_delta: Maximum safe delta value for negative motion detection
* @name: Pointer to clocksource name
* @list: List head for registration (internal)
* @freq_khz: Clocksource frequency in khz.
@@ -109,6 +110,7 @@ struct clocksource {
struct arch_clocksource_data archdata;
#endif
u64 max_cycles;
+ u64 max_raw_delta;
const char *name;
struct list_head list;
u32 freq_khz;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index aab6472853fa..7304d7cf47f2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -24,7 +24,7 @@ static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{
- u64 delta = clocksource_delta(end, start, cs->mask);
+ u64 delta = clocksource_delta(end, start, cs->mask, cs->max_raw_delta);
if (likely(delta < cs->max_cycles))
return clocksource_cyc2ns(delta, cs->mult, cs->shift);
@@ -993,6 +993,15 @@ static inline void clocksource_update_max_deferment(struct clocksource *cs)
cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
cs->maxadj, cs->mask,
&cs->max_cycles);
+
+ /*
+ * Threshold for detecting negative motion in clocksource_delta().
+ *
+ * Allow for 0.875 of the counter width so that overly long idle
+ * sleeps, which go slightly over mask/2, do not trigger the
+ * negative motion detection.
+ */
+ cs->max_raw_delta = (cs->mask >> 1) + (cs->mask >> 2) + (cs->mask >> 3);
}
static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0ca85ff4fbb4..3d128825d343 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -755,7 +755,8 @@ static void timekeeping_forward_now(struct timekeeper *tk)
u64 cycle_now, delta;
cycle_now = tk_clock_read(&tk->tkr_mono);
- delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
tk->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now;
@@ -2230,7 +2231,8 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
- tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 63e600e943a7..8c9079108ffb 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -30,15 +30,15 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
+static inline u64 clocksource_delta(u64 now, u64 last, u64 mask, u64 max_delta)
{
u64 ret = (now - last) & mask;
/*
- * Prevent time going backwards by checking the MSB of mask in
- * the result. If set, return 0.
+ * Prevent time going backwards by checking the result against
+ * @max_delta. If greater, return 0.
*/
- return ret & ~(mask >> 1) ? 0 : ret;
+ return ret > max_delta ? 0 : ret;
}
/* Semi public for serialization of non timekeeper VDSO updates. */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 76031d9536a076bf023bedbdb1b4317fc801dd67
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121205-could-retype-331c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76031d9536a076bf023bedbdb1b4317fc801dd67 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 3 Dec 2024 11:16:30 +0100
Subject: [PATCH] clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.n…
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index ef1b16da6ad5..65b7c41471c3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
+ * @max_raw_delta: Maximum safe delta value for negative motion detection
* @name: Pointer to clocksource name
* @list: List head for registration (internal)
* @freq_khz: Clocksource frequency in khz.
@@ -109,6 +110,7 @@ struct clocksource {
struct arch_clocksource_data archdata;
#endif
u64 max_cycles;
+ u64 max_raw_delta;
const char *name;
struct list_head list;
u32 freq_khz;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index aab6472853fa..7304d7cf47f2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -24,7 +24,7 @@ static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{
- u64 delta = clocksource_delta(end, start, cs->mask);
+ u64 delta = clocksource_delta(end, start, cs->mask, cs->max_raw_delta);
if (likely(delta < cs->max_cycles))
return clocksource_cyc2ns(delta, cs->mult, cs->shift);
@@ -993,6 +993,15 @@ static inline void clocksource_update_max_deferment(struct clocksource *cs)
cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
cs->maxadj, cs->mask,
&cs->max_cycles);
+
+ /*
+ * Threshold for detecting negative motion in clocksource_delta().
+ *
+ * Allow for 0.875 of the counter width so that overly long idle
+ * sleeps, which go slightly over mask/2, do not trigger the
+ * negative motion detection.
+ */
+ cs->max_raw_delta = (cs->mask >> 1) + (cs->mask >> 2) + (cs->mask >> 3);
}
static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0ca85ff4fbb4..3d128825d343 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -755,7 +755,8 @@ static void timekeeping_forward_now(struct timekeeper *tk)
u64 cycle_now, delta;
cycle_now = tk_clock_read(&tk->tkr_mono);
- delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
tk->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now;
@@ -2230,7 +2231,8 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
- tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 63e600e943a7..8c9079108ffb 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -30,15 +30,15 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
+static inline u64 clocksource_delta(u64 now, u64 last, u64 mask, u64 max_delta)
{
u64 ret = (now - last) & mask;
/*
- * Prevent time going backwards by checking the MSB of mask in
- * the result. If set, return 0.
+ * Prevent time going backwards by checking the result against
+ * @max_delta. If greater, return 0.
*/
- return ret & ~(mask >> 1) ? 0 : ret;
+ return ret > max_delta ? 0 : ret;
}
/* Semi public for serialization of non timekeeper VDSO updates. */
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 76031d9536a076bf023bedbdb1b4317fc801dd67
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121204-among-product-771b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76031d9536a076bf023bedbdb1b4317fc801dd67 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 3 Dec 2024 11:16:30 +0100
Subject: [PATCH] clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.n…
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index ef1b16da6ad5..65b7c41471c3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
+ * @max_raw_delta: Maximum safe delta value for negative motion detection
* @name: Pointer to clocksource name
* @list: List head for registration (internal)
* @freq_khz: Clocksource frequency in khz.
@@ -109,6 +110,7 @@ struct clocksource {
struct arch_clocksource_data archdata;
#endif
u64 max_cycles;
+ u64 max_raw_delta;
const char *name;
struct list_head list;
u32 freq_khz;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index aab6472853fa..7304d7cf47f2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -24,7 +24,7 @@ static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{
- u64 delta = clocksource_delta(end, start, cs->mask);
+ u64 delta = clocksource_delta(end, start, cs->mask, cs->max_raw_delta);
if (likely(delta < cs->max_cycles))
return clocksource_cyc2ns(delta, cs->mult, cs->shift);
@@ -993,6 +993,15 @@ static inline void clocksource_update_max_deferment(struct clocksource *cs)
cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
cs->maxadj, cs->mask,
&cs->max_cycles);
+
+ /*
+ * Threshold for detecting negative motion in clocksource_delta().
+ *
+ * Allow for 0.875 of the counter width so that overly long idle
+ * sleeps, which go slightly over mask/2, do not trigger the
+ * negative motion detection.
+ */
+ cs->max_raw_delta = (cs->mask >> 1) + (cs->mask >> 2) + (cs->mask >> 3);
}
static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0ca85ff4fbb4..3d128825d343 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -755,7 +755,8 @@ static void timekeeping_forward_now(struct timekeeper *tk)
u64 cycle_now, delta;
cycle_now = tk_clock_read(&tk->tkr_mono);
- delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
tk->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now;
@@ -2230,7 +2231,8 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
- tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 63e600e943a7..8c9079108ffb 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -30,15 +30,15 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
+static inline u64 clocksource_delta(u64 now, u64 last, u64 mask, u64 max_delta)
{
u64 ret = (now - last) & mask;
/*
- * Prevent time going backwards by checking the MSB of mask in
- * the result. If set, return 0.
+ * Prevent time going backwards by checking the result against
+ * @max_delta. If greater, return 0.
*/
- return ret & ~(mask >> 1) ? 0 : ret;
+ return ret > max_delta ? 0 : ret;
}
/* Semi public for serialization of non timekeeper VDSO updates. */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 76031d9536a076bf023bedbdb1b4317fc801dd67
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121227-undermine-effort-1e7c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76031d9536a076bf023bedbdb1b4317fc801dd67 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 3 Dec 2024 11:16:30 +0100
Subject: [PATCH] clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.n…
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index ef1b16da6ad5..65b7c41471c3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
+ * @max_raw_delta: Maximum safe delta value for negative motion detection
* @name: Pointer to clocksource name
* @list: List head for registration (internal)
* @freq_khz: Clocksource frequency in khz.
@@ -109,6 +110,7 @@ struct clocksource {
struct arch_clocksource_data archdata;
#endif
u64 max_cycles;
+ u64 max_raw_delta;
const char *name;
struct list_head list;
u32 freq_khz;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index aab6472853fa..7304d7cf47f2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -24,7 +24,7 @@ static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{
- u64 delta = clocksource_delta(end, start, cs->mask);
+ u64 delta = clocksource_delta(end, start, cs->mask, cs->max_raw_delta);
if (likely(delta < cs->max_cycles))
return clocksource_cyc2ns(delta, cs->mult, cs->shift);
@@ -993,6 +993,15 @@ static inline void clocksource_update_max_deferment(struct clocksource *cs)
cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
cs->maxadj, cs->mask,
&cs->max_cycles);
+
+ /*
+ * Threshold for detecting negative motion in clocksource_delta().
+ *
+ * Allow for 0.875 of the counter width so that overly long idle
+ * sleeps, which go slightly over mask/2, do not trigger the
+ * negative motion detection.
+ */
+ cs->max_raw_delta = (cs->mask >> 1) + (cs->mask >> 2) + (cs->mask >> 3);
}
static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0ca85ff4fbb4..3d128825d343 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -755,7 +755,8 @@ static void timekeeping_forward_now(struct timekeeper *tk)
u64 cycle_now, delta;
cycle_now = tk_clock_read(&tk->tkr_mono);
- delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
tk->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now;
@@ -2230,7 +2231,8 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
- tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 63e600e943a7..8c9079108ffb 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -30,15 +30,15 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
+static inline u64 clocksource_delta(u64 now, u64 last, u64 mask, u64 max_delta)
{
u64 ret = (now - last) & mask;
/*
- * Prevent time going backwards by checking the MSB of mask in
- * the result. If set, return 0.
+ * Prevent time going backwards by checking the result against
+ * @max_delta. If greater, return 0.
*/
- return ret & ~(mask >> 1) ? 0 : ret;
+ return ret > max_delta ? 0 : ret;
}
/* Semi public for serialization of non timekeeper VDSO updates. */
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 76031d9536a076bf023bedbdb1b4317fc801dd67
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121226-cathedral-decimeter-88cb@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76031d9536a076bf023bedbdb1b4317fc801dd67 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 3 Dec 2024 11:16:30 +0100
Subject: [PATCH] clocksource: Make negative motion detection more robust
Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.
It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.
max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns
If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.
Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.
For the case at hand this results in a tripping point of 1174405120ns.
Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.
Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.
Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/387b120b-d68a-45e8-b6ab-768cd95d11c2@roeck-us.n…
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index ef1b16da6ad5..65b7c41471c3 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@ struct module;
* @archdata: Optional arch-specific data
* @max_cycles: Maximum safe cycle value which won't overflow on
* multiplication
+ * @max_raw_delta: Maximum safe delta value for negative motion detection
* @name: Pointer to clocksource name
* @list: List head for registration (internal)
* @freq_khz: Clocksource frequency in khz.
@@ -109,6 +110,7 @@ struct clocksource {
struct arch_clocksource_data archdata;
#endif
u64 max_cycles;
+ u64 max_raw_delta;
const char *name;
struct list_head list;
u32 freq_khz;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index aab6472853fa..7304d7cf47f2 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -24,7 +24,7 @@ static void clocksource_enqueue(struct clocksource *cs);
static noinline u64 cycles_to_nsec_safe(struct clocksource *cs, u64 start, u64 end)
{
- u64 delta = clocksource_delta(end, start, cs->mask);
+ u64 delta = clocksource_delta(end, start, cs->mask, cs->max_raw_delta);
if (likely(delta < cs->max_cycles))
return clocksource_cyc2ns(delta, cs->mult, cs->shift);
@@ -993,6 +993,15 @@ static inline void clocksource_update_max_deferment(struct clocksource *cs)
cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
cs->maxadj, cs->mask,
&cs->max_cycles);
+
+ /*
+ * Threshold for detecting negative motion in clocksource_delta().
+ *
+ * Allow for 0.875 of the counter width so that overly long idle
+ * sleeps, which go slightly over mask/2, do not trigger the
+ * negative motion detection.
+ */
+ cs->max_raw_delta = (cs->mask >> 1) + (cs->mask >> 2) + (cs->mask >> 3);
}
static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0ca85ff4fbb4..3d128825d343 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -755,7 +755,8 @@ static void timekeeping_forward_now(struct timekeeper *tk)
u64 cycle_now, delta;
cycle_now = tk_clock_read(&tk->tkr_mono);
- delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
tk->tkr_mono.cycle_last = cycle_now;
tk->tkr_raw.cycle_last = cycle_now;
@@ -2230,7 +2231,8 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
return false;
offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
- tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
+ tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+ tk->tkr_mono.clock->max_raw_delta);
/* Check if there's really nothing to do */
if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 63e600e943a7..8c9079108ffb 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -30,15 +30,15 @@ static inline void timekeeping_inc_mg_floor_swaps(void)
#endif
-static inline u64 clocksource_delta(u64 now, u64 last, u64 mask)
+static inline u64 clocksource_delta(u64 now, u64 last, u64 mask, u64 max_delta)
{
u64 ret = (now - last) & mask;
/*
- * Prevent time going backwards by checking the MSB of mask in
- * the result. If set, return 0.
+ * Prevent time going backwards by checking the result against
+ * @max_delta. If greater, return 0.
*/
- return ret & ~(mask >> 1) ? 0 : ret;
+ return ret > max_delta ? 0 : ret;
}
/* Semi public for serialization of non timekeeper VDSO updates. */
This reverts commit dfe6c5692fb5 ("ocfs2: fix the la space leak when
unmounting an ocfs2 volume").
In commit dfe6c5692fb5, the commit log "This bug has existed since the
initial OCFS2 code." is wrong. The correct introduction commit is
30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()").
The influence of commit dfe6c5692fb5 is that it provides a correct
fix for the latest kernel. however, it shouldn't be pushed to stable
branches. Let's use this commit to revert all branches that include
dfe6c5692fb5 and use a new fix method to fix commit 30dd3478c3cd.
Fixes: dfe6c5692fb5 ("ocfs2: fix the la space leak when unmounting an ocfs2 volume")
Signed-off-by: Heming Zhao <heming.zhao(a)suse.com>
Cc: <stable(a)vger.kernel.org>
---
fs/ocfs2/localalloc.c | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c
index 8ac42ea81a17..5df34561c551 100644
--- a/fs/ocfs2/localalloc.c
+++ b/fs/ocfs2/localalloc.c
@@ -1002,25 +1002,6 @@ static int ocfs2_sync_local_to_main(struct ocfs2_super *osb,
start = bit_off + 1;
}
- /* clear the contiguous bits until the end boundary */
- if (count) {
- blkno = la_start_blk +
- ocfs2_clusters_to_blocks(osb->sb,
- start - count);
-
- trace_ocfs2_sync_local_to_main_free(
- count, start - count,
- (unsigned long long)la_start_blk,
- (unsigned long long)blkno);
-
- status = ocfs2_release_clusters(handle,
- main_bm_inode,
- main_bm_bh, blkno,
- count);
- if (status < 0)
- mlog_errno(status);
- }
-
bail:
if (status)
mlog_errno(status);
--
2.43.0
If the caller of vmap() specifies VM_MAP_PUT_PAGES (currently only the
i915 driver), we will decrement nr_vmalloc_pages and MEMCG_VMALLOC in
vfree(). These counters are incremented by vmalloc() but not by vmap()
so this will cause an underflow. Check the VM_MAP_PUT_PAGES flag before
decrementing either counter.
Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
---
mm/vmalloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f009b21705c1..5c88d0e90c20 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3374,7 +3374,8 @@ void vfree(const void *addr)
struct page *page = vm->pages[i];
BUG_ON(!page);
- mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
/*
* High-order allocs for huge vmallocs are split, so
* can be freed as an array of order-0 allocations
@@ -3382,7 +3383,8 @@ void vfree(const void *addr)
__free_page(page);
cond_resched();
}
- atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
kvfree(vm->pages);
kfree(vm);
}
--
2.45.2
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplug one of monitors connected after mst hub, encounter null pointer dereference.
It's due to dc_sink get released immediately in early_unregister() or detect_ctx(). When
commit new state which directly referring to info stored in dc_sink will cause null pointer
dereference.
[how]
Remove redundant checking condition. Relevant condition should already be covered by checking
if dsc_aux is null or not. Also reset dsc_aux to NULL when the connector is disconnected.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 1acef5f3838f..a1619f4569cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -183,6 +183,8 @@ amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
}
aconnector->mst_status = MST_STATUS_DEFAULT;
@@ -487,6 +489,8 @@ dm_dp_mst_detect(struct drm_connector *connector,
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
--
2.25.1
Netpoll will explicitly pass the polling call with a budget of 0 to
indicate it's clearing the Tx path only. For the gve_rx_poll and
gve_xdp_poll, they were mistakenly taking the 0 budget as the indication
to do all the work. Add check to avoid the rx path and xdp path being
called when budget is 0. And also avoid napi_complete_done being called
when budget is 0 for netpoll.
The original fix was merged here:
https://lore.kernel.org/r/20231114004144.2022268-1-ziweixiao@google.com
Resend it since the original one was not cleanly applied to 5.15 kernel.
commit 278a370c1766 ("gve: Fixes for napi_poll when budget is 0")
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Ziwei Xiao <ziweixiao(a)google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
---
Changes in v2:
* Add the original git commit id
---
drivers/net/ethernet/google/gve/gve_main.c | 7 +++++++
drivers/net/ethernet/google/gve/gve_rx.c | 4 ----
drivers/net/ethernet/google/gve/gve_tx.c | 4 ----
3 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index bf8a4a7c43f7..c3f1959533a8 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -198,6 +198,10 @@ static int gve_napi_poll(struct napi_struct *napi, int budget)
if (block->tx)
reschedule |= gve_tx_poll(block, budget);
+
+ if (!budget)
+ return 0;
+
if (block->rx)
reschedule |= gve_rx_poll(block, budget);
@@ -246,6 +250,9 @@ static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
if (block->tx)
reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);
+ if (!budget)
+ return 0;
+
if (block->rx) {
work_done = gve_rx_poll_dqo(block, budget);
reschedule |= work_done == budget;
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index 94941d4e4744..368e0e770178 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -599,10 +599,6 @@ bool gve_rx_poll(struct gve_notify_block *block, int budget)
feat = block->napi.dev->features;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
if (budget > 0)
repoll |= gve_clean_rx_done(rx, budget, feat);
else
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 665ac795a1ad..d56b8356f1f3 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -691,10 +691,6 @@ bool gve_tx_poll(struct gve_notify_block *block, int budget)
u32 nic_done;
u32 to_do;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
/* Find out how much work there is to be done */
tx->last_nic_done = gve_tx_load_event_counter(priv, tx);
nic_done = be32_to_cpu(tx->last_nic_done);
--
2.47.0.338.g60cca15819-goog
From: Juntong Deng <juntong.deng(a)outlook.com>
commit 7ad4e0a4f61c57c3ca291ee010a9d677d0199fba upstream.
In gfs2_put_super(), whether withdrawn or not, the quota should
be cleaned up by gfs2_quota_cleanup().
Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu
callback) has run for all gfs2_quota_data objects, resulting in
use-after-free.
Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called
by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling
gfs2_make_fs_ro(), there is no need to call them again.
Reported-by: syzbot+29c47e9e51895928698c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c
Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Clayton Casciato <majortomtosourcecontrol(a)gmail.com>
Signed-off-by: Guocai He <guocai.he.cn(a)windriver.com>
---
This commit is backporting 7ad4e0a4f61c7ad4e0a4f61c57c3ca291ee010a9d677d0199fba to the branch linux-5.15.y to
solve the CVE-2024-52760. Please merge this commit to linux-5.15.y.
fs/gfs2/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 268651ac9fc8..98158559893f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -590,6 +590,8 @@ static void gfs2_put_super(struct super_block *sb)
if (!sb_rdonly(sb)) {
gfs2_make_fs_ro(sdp);
+ } else {
+ gfs2_quota_cleanup(sdp);
}
WARN_ON(gfs2_withdrawing(sdp));
--
2.34.1
From: Juntong Deng <juntong.deng(a)outlook.com>
commit 7ad4e0a4f61c57c3ca291ee010a9d677d0199fba upstream.
In gfs2_put_super(), whether withdrawn or not, the quota should
be cleaned up by gfs2_quota_cleanup().
Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu
callback) has run for all gfs2_quota_data objects, resulting in
use-after-free.
Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called
by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling
gfs2_make_fs_ro(), there is no need to call them again.
Reported-by: syzbot+29c47e9e51895928698c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c
Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Clayton Casciato <majortomtosourcecontrol(a)gmail.com>
Signed-off-by: Guocai He <guocai.he.cn(a)windriver.com>
---
This commit is backporting 7ad4e0a4f61c7ad4e0a4f61c57c3ca291ee010a9d677d0199fba to the branch linux-5.15.y to
solve the CVE-2024-52760. Please merge this commit to linux-5.15.y.
fs/gfs2/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 268651ac9fc8..98158559893f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -590,6 +590,8 @@ static void gfs2_put_super(struct super_block *sb)
if (!sb_rdonly(sb)) {
gfs2_make_fs_ro(sdp);
+ } else {
+ gfs2_quota_cleanup(sdp);
}
WARN_ON(gfs2_withdrawing(sdp));
--
2.34.1
NULL-dereference is possible in amd_pstate_adjust_perf in 6.6 stable
release.
The problem has been fixed by the following upstream patch that was adapted
to 6.6. The patch couldn't be applied clearly but the changes made are
minor.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Hi,
ocfs2 on a drbd device, writing something to it, then unmount ends up in:
[ 1135.766639] OCFS2: ERROR (device drbd0): ocfs2_block_group_clear_bits:
Group descriptor # 4128768 has bit count 32256 but claims 33222 are freed.
num_bits 996
[ 1135.766645] On-disk corruption discovered. Please run fsck.ocfs2 once
the filesystem is unmounted.
[ 1135.766647] (umount,10751,3):_ocfs2_free_suballoc_bits:2490 ERROR:
status = -30
[ 1135.766650] (umount,10751,3):_ocfs2_free_clusters:2573 ERROR: status =
-30
[ 1135.766652] (umount,10751,3):ocfs2_sync_local_to_main:1027 ERROR:
status = -30
[ 1135.766654] (umount,10751,3):ocfs2_sync_local_to_main:1032 ERROR:
status = -30
[ 1135.766656] (umount,10751,3):ocfs2_shutdown_local_alloc:449 ERROR:
status = -30
[ 1135.965908] ocfs2: Unmounting device (147,0) on (node 2)
This is since 6.6.55, reverting this patch helps:
commit e7a801014726a691d4aa6e3839b3f0940ea41591
Author: Heming Zhao <heming.zhao(a)suse.com>
Date: Fri Jul 19 19:43:10 2024 +0800
ocfs2: fix the la space leak when unmounting an ocfs2 volume
commit dfe6c5692fb525e5e90cefe306ee0dffae13d35f upstream.
Linux 6.1.119 is also broken, but 6.12.4 is fine.
I guess there is something missing?
Thomas
út 10. 12. 2024 v 21:44 odesílatel Sasha Levin <sashal(a)kernel.org> napsal:
>
> This is a note to let you know that I've just added the patch titled
>
> rtla/utils: Add idle state disabling via libcpupower
>
> to the 6.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> rtla-utils-add-idle-state-disabling-via-libcpupower.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This is a part of a patchset implementing a new feature, rtla idle
state disabling, see [1]. It seems it was included in this stable
queue and also the one for 6.6 by mistake.
Also, the patch by itself does not do anything, because it depends on
preceding commits from the patchset to enable HAVE_LIBCPUPOWER_SUPPORT
and on following commits to actually implement the functionality in
the rtla command line interface.
Perhaps AUTOSEL picked it due to some merge conflicts with other
patches? I don't know of any though.
[1] - https://lore.kernel.org/linux-trace-kernel/20241017140914.3200454-1-tglozar…
>
>
>
> commit 354bcd3b3efd600f4d23cee6898a6968659bb3a9
> Author: Tomas Glozar <tglozar(a)redhat.com>
> Date: Thu Oct 17 16:09:11 2024 +0200
>
> rtla/utils: Add idle state disabling via libcpupower
>
> [ Upstream commit 083d29d3784319e9e9fab3ac02683a7b26ae3480 ]
>
> Add functions to utils.c to disable idle states through functions of
> libcpupower. This will serve as the basis for disabling idle states
> per cpu when running timerlat.
>
> Link: https://lore.kernel.org/20241017140914.3200454-4-tglozar@redhat.com
> Signed-off-by: Tomas Glozar <tglozar(a)redhat.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
> index 0735fcb827ed7..230f9fc7502dd 100644
> --- a/tools/tracing/rtla/src/utils.c
> +++ b/tools/tracing/rtla/src/utils.c
> @@ -4,6 +4,9 @@
> */
>
> #define _GNU_SOURCE
> +#ifdef HAVE_LIBCPUPOWER_SUPPORT
> +#include <cpuidle.h>
> +#endif /* HAVE_LIBCPUPOWER_SUPPORT */
> #include <dirent.h>
> #include <stdarg.h>
> #include <stdlib.h>
> @@ -519,6 +522,153 @@ int set_cpu_dma_latency(int32_t latency)
> return fd;
> }
>
> +#ifdef HAVE_LIBCPUPOWER_SUPPORT
> +static unsigned int **saved_cpu_idle_disable_state;
> +static size_t saved_cpu_idle_disable_state_alloc_ctr;
> +
> +/*
> + * save_cpu_idle_state_disable - save disable for all idle states of a cpu
> + *
> + * Saves the current disable of all idle states of a cpu, to be subsequently
> + * restored via restore_cpu_idle_disable_state.
> + *
> + * Return: idle state count on success, negative on error
> + */
> +int save_cpu_idle_disable_state(unsigned int cpu)
> +{
> + unsigned int nr_states;
> + unsigned int state;
> + int disabled;
> + int nr_cpus;
> +
> + nr_states = cpuidle_state_count(cpu);
> +
> + if (nr_states == 0)
> + return 0;
> +
> + if (saved_cpu_idle_disable_state == NULL) {
> + nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
> + saved_cpu_idle_disable_state = calloc(nr_cpus, sizeof(unsigned int *));
> + if (!saved_cpu_idle_disable_state)
> + return -1;
> + }
> +
> + saved_cpu_idle_disable_state[cpu] = calloc(nr_states, sizeof(unsigned int));
> + if (!saved_cpu_idle_disable_state[cpu])
> + return -1;
> + saved_cpu_idle_disable_state_alloc_ctr++;
> +
> + for (state = 0; state < nr_states; state++) {
> + disabled = cpuidle_is_state_disabled(cpu, state);
> + if (disabled < 0)
> + return disabled;
> + saved_cpu_idle_disable_state[cpu][state] = disabled;
> + }
> +
> + return nr_states;
> +}
> +
> +/*
> + * restore_cpu_idle_disable_state - restore disable for all idle states of a cpu
> + *
> + * Restores the current disable state of all idle states of a cpu that was
> + * previously saved by save_cpu_idle_disable_state.
> + *
> + * Return: idle state count on success, negative on error
> + */
> +int restore_cpu_idle_disable_state(unsigned int cpu)
> +{
> + unsigned int nr_states;
> + unsigned int state;
> + int disabled;
> + int result;
> +
> + nr_states = cpuidle_state_count(cpu);
> +
> + if (nr_states == 0)
> + return 0;
> +
> + if (!saved_cpu_idle_disable_state)
> + return -1;
> +
> + for (state = 0; state < nr_states; state++) {
> + if (!saved_cpu_idle_disable_state[cpu])
> + return -1;
> + disabled = saved_cpu_idle_disable_state[cpu][state];
> + result = cpuidle_state_disable(cpu, state, disabled);
> + if (result < 0)
> + return result;
> + }
> +
> + free(saved_cpu_idle_disable_state[cpu]);
> + saved_cpu_idle_disable_state[cpu] = NULL;
> + saved_cpu_idle_disable_state_alloc_ctr--;
> + if (saved_cpu_idle_disable_state_alloc_ctr == 0) {
> + free(saved_cpu_idle_disable_state);
> + saved_cpu_idle_disable_state = NULL;
> + }
> +
> + return nr_states;
> +}
> +
> +/*
> + * free_cpu_idle_disable_states - free saved idle state disable for all cpus
> + *
> + * Frees the memory used for storing cpu idle state disable for all cpus
> + * and states.
> + *
> + * Normally, the memory is freed automatically in
> + * restore_cpu_idle_disable_state; this is mostly for cleaning up after an
> + * error.
> + */
> +void free_cpu_idle_disable_states(void)
> +{
> + int cpu;
> + int nr_cpus;
> +
> + if (!saved_cpu_idle_disable_state)
> + return;
> +
> + nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
> +
> + for (cpu = 0; cpu < nr_cpus; cpu++) {
> + free(saved_cpu_idle_disable_state[cpu]);
> + saved_cpu_idle_disable_state[cpu] = NULL;
> + }
> +
> + free(saved_cpu_idle_disable_state);
> + saved_cpu_idle_disable_state = NULL;
> +}
> +
> +/*
> + * set_deepest_cpu_idle_state - limit idle state of cpu
> + *
> + * Disables all idle states deeper than the one given in
> + * deepest_state (assuming states with higher number are deeper).
> + *
> + * This is used to reduce the exit from idle latency. Unlike
> + * set_cpu_dma_latency, it can disable idle states per cpu.
> + *
> + * Return: idle state count on success, negative on error
> + */
> +int set_deepest_cpu_idle_state(unsigned int cpu, unsigned int deepest_state)
> +{
> + unsigned int nr_states;
> + unsigned int state;
> + int result;
> +
> + nr_states = cpuidle_state_count(cpu);
> +
> + for (state = deepest_state + 1; state < nr_states; state++) {
> + result = cpuidle_state_disable(cpu, state, 1);
> + if (result < 0)
> + return result;
> + }
> +
> + return nr_states;
> +}
> +#endif /* HAVE_LIBCPUPOWER_SUPPORT */
> +
> #define _STR(x) #x
> #define STR(x) _STR(x)
>
> diff --git a/tools/tracing/rtla/src/utils.h b/tools/tracing/rtla/src/utils.h
> index 99c9cf81bcd02..101d4799a0090 100644
> --- a/tools/tracing/rtla/src/utils.h
> +++ b/tools/tracing/rtla/src/utils.h
> @@ -66,6 +66,19 @@ int set_comm_sched_attr(const char *comm_prefix, struct sched_attr *attr);
> int set_comm_cgroup(const char *comm_prefix, const char *cgroup);
> int set_pid_cgroup(pid_t pid, const char *cgroup);
> int set_cpu_dma_latency(int32_t latency);
> +#ifdef HAVE_LIBCPUPOWER_SUPPORT
> +int save_cpu_idle_disable_state(unsigned int cpu);
> +int restore_cpu_idle_disable_state(unsigned int cpu);
> +void free_cpu_idle_disable_states(void);
> +int set_deepest_cpu_idle_state(unsigned int cpu, unsigned int state);
> +static inline int have_libcpupower_support(void) { return 1; }
> +#else
> +static inline int save_cpu_idle_disable_state(unsigned int cpu) { return -1; }
> +static inline int restore_cpu_idle_disable_state(unsigned int cpu) { return -1; }
> +static inline void free_cpu_idle_disable_states(void) { }
> +static inline int set_deepest_cpu_idle_state(unsigned int cpu, unsigned int state) { return -1; }
> +static inline int have_libcpupower_support(void) { return 0; }
> +#endif /* HAVE_LIBCPUPOWER_SUPPORT */
> int auto_house_keeping(cpu_set_t *monitored_cpus);
>
> #define ns_to_usf(x) (((double)x/1000))
>
Tomas
After upgrading from 6.6.52 to 6.6.58, tapping on the touchpad stopped
working. The problem is still present in 6.6.59.
I see the following in dmesg output; the first line was not there
previously:
[ 2.129282] hid-multitouch 0018:27C6:01E0.0001: The byte is not expected for fixing the report descriptor. It's possible that the touchpad firmware is not suitable for applying the fix. got: 9
[ 2.137479] input: GXTP5140:00 27C6:01E0 as /devices/platform/AMDI0010:00/i2c-0/i2c-GXTP5140:00/0018:27C6:01E0.0001/input/input10
[ 2.137680] input: GXTP5140:00 27C6:01E0 as /devices/platform/AMDI0010:00/i2c-0/i2c-GXTP5140:00/0018:27C6:01E0.0001/input/input11
[ 2.137921] hid-multitouch 0018:27C6:01E0.0001: input,hidraw0: I2C HID v1.00 Mouse [GXTP5140:00 27C6:01E0] on i2c-GXTP5140:00
Hardware is a Lenovo ThinkPad L15 Gen 4.
The problem goes away when reverting this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d…
See also Gentoo bug report: https://bugs.gentoo.org/942797
Hi,
Can you add the below to 6.1-stable? Thanks!
commit 3181e22fb79910c7071e84a43af93ac89e8a7106
Author: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Mon Jan 9 14:46:10 2023 +0000
io_uring: wake up optimisations
Commit 3181e22fb79910c7071e84a43af93ac89e8a7106 upstream.
Flush completions is done either from the submit syscall or by the
task_work, both are in the context of the submitter task, and when it
goes for a single threaded rings like implied by ->task_complete, there
won't be any waiters on ->cq_wait but the master task. That means that
there can be no tasks sleeping on cq_wait while we run
__io_submit_flush_completions() and so waking up can be skipped.
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.16732742…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4f0ae938b146..0b1361663267 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -582,6 +582,16 @@ static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
io_cqring_ev_posted(ctx);
}
+static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
+ __releases(ctx->completion_lock)
+{
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_commit_cqring_flush(ctx);
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+ __io_cqring_wake(ctx);
+}
+
void io_cq_unlock_post(struct io_ring_ctx *ctx)
{
__io_cq_unlock_post(ctx);
@@ -1339,7 +1349,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
if (!(req->flags & REQ_F_CQE_SKIP))
__io_fill_cqe_req(ctx, req);
}
- __io_cq_unlock_post(ctx);
+ __io_cq_unlock_post_flush(ctx);
io_free_batch_list(ctx, state->compl_reqs.first);
INIT_WQ_LIST(&state->compl_reqs);
--
Jens Axboe
From: Ard Biesheuvel <ardb(a)kernel.org>
When the host stage1 is configured for LPA2, the value currently being
programmed into TCR_EL2.T0SZ may be invalid unless LPA2 is configured
at HYP as well. This means kvm_lpa2_is_enabled() is not the right
condition to test when setting TCR_EL2.DS, as it will return false if
LPA2 is only available for stage 1 but not for stage 2.
Similary, programming TCR_EL2.PS based on a limited IPA range due to
lack of stage2 LPA2 support could potentially result in problems.
So use lpa2_is_enabled() instead, and set the PS field according to the
host's IPS, which is capped at 48 bits if LPA2 support is absent or
disabled. Whether or not we can make meaningful use of such a
configuration is a different question.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/kvm/arm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a102c3aebdbc..7b2735ad32e9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1990,8 +1990,7 @@ static int kvm_init_vector_slots(void)
static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
{
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
- u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
- unsigned long tcr;
+ unsigned long tcr, ips;
/*
* Calculate the raw per-cpu offset without a translation from the
@@ -2005,6 +2004,7 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
params->mair_el2 = read_sysreg(mair_el1);
tcr = read_sysreg(tcr_el1);
+ ips = FIELD_GET(TCR_IPS_MASK, tcr);
if (cpus_have_final_cap(ARM64_KVM_HVHE)) {
tcr |= TCR_EPD1_MASK;
} else {
@@ -2014,8 +2014,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
tcr &= ~TCR_T0SZ_MASK;
tcr |= TCR_T0SZ(hyp_va_bits);
tcr &= ~TCR_EL2_PS_MASK;
- tcr |= FIELD_PREP(TCR_EL2_PS_MASK, kvm_get_parange(mmfr0));
- if (kvm_lpa2_is_enabled())
+ tcr |= FIELD_PREP(TCR_EL2_PS_MASK, ips);
+ if (lpa2_is_enabled())
tcr |= TCR_EL2_DS;
params->tcr_el2 = tcr;
--
2.47.1.613.gc27f4b7a9f-goog
From: Ard Biesheuvel <ardb(a)kernel.org>
Currently, LPA2 kernel support implies support for up to 52 bits of
physical addressing, and this is reflected in global definitions such as
PHYS_MASK_SHIFT and MAX_PHYSMEM_BITS.
This is potentially problematic, given that LPA2 hardware support is
modeled as a CPU feature which can be overridden, and with LPA2 hardware
support turned off, attempting to map physical regions with address bits
[51:48] set (which may exist on LPA2 capable systems booting with
arm64.nolva) will result in corrupted mappings with a truncated output
address and bogus shareability attributes.
This means that the accepted physical address range in the mapping
routines should be at most 48 bits wide when LPA2 support is configured
but not enabled at runtime.
Fixes: 352b0395b505 ("arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/include/asm/pgtable-hwdef.h | 6 ------
arch/arm64/include/asm/pgtable-prot.h | 7 +++++++
arch/arm64/include/asm/sparsemem.h | 5 ++++-
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index c78a988cca93..a9136cc551cc 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -222,12 +222,6 @@
*/
#define S1_TABLE_AP (_AT(pmdval_t, 3) << 61)
-/*
- * Highest possible physical address supported.
- */
-#define PHYS_MASK_SHIFT (CONFIG_ARM64_PA_BITS)
-#define PHYS_MASK ((UL(1) << PHYS_MASK_SHIFT) - 1)
-
#define TTBR_CNP_BIT (UL(1) << 0)
/*
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9f9cf13bbd95..a95f1f77bb39 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -81,6 +81,7 @@ extern unsigned long prot_ns_shared;
#define lpa2_is_enabled() false
#define PTE_MAYBE_SHARED PTE_SHARED
#define PMD_MAYBE_SHARED PMD_SECT_S
+#define PHYS_MASK_SHIFT (CONFIG_ARM64_PA_BITS)
#else
static inline bool __pure lpa2_is_enabled(void)
{
@@ -89,8 +90,14 @@ static inline bool __pure lpa2_is_enabled(void)
#define PTE_MAYBE_SHARED (lpa2_is_enabled() ? 0 : PTE_SHARED)
#define PMD_MAYBE_SHARED (lpa2_is_enabled() ? 0 : PMD_SECT_S)
+#define PHYS_MASK_SHIFT (lpa2_is_enabled() ? CONFIG_ARM64_PA_BITS : 48)
#endif
+/*
+ * Highest possible physical address supported.
+ */
+#define PHYS_MASK ((UL(1) << PHYS_MASK_SHIFT) - 1)
+
/*
* If we have userspace only BTI we don't want to mark kernel pages
* guarded even if the system does support BTI.
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 8a8acc220371..84783efdc9d1 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -5,7 +5,10 @@
#ifndef __ASM_SPARSEMEM_H
#define __ASM_SPARSEMEM_H
-#define MAX_PHYSMEM_BITS CONFIG_ARM64_PA_BITS
+#include <asm/pgtable-prot.h>
+
+#define MAX_PHYSMEM_BITS PHYS_MASK_SHIFT
+#define MAX_POSSIBLE_PHYSMEM_BITS (52)
/*
* Section size must be at least 512MB for 64K base
--
2.47.1.613.gc27f4b7a9f-goog
The PE Reset State "0" obtained from RTAS calls
ibm_read_slot_reset_[state|state2] indicates that
the Reset is deactivated and the PE is not in the MMIO
Stopped or DMA Stopped state.
With PE Reset State "0", the MMIO and DMA is allowed for
the PE. The function pseries_eeh_get_state() is currently
not indicating that to the caller because of which the
drivers are unable to resume the MMIO and DMA activity.
The patch fixes that by reflecting what is actually allowed.
Fixes: 00ba05a12b3c ("powerpc/pseries: Cleanup on pseries_eeh_get_state()")
Signed-off-by: Narayana Murty N <nnmlinux(a)linux.ibm.com>
---
Changelog:
V1:https://lore.kernel.org/all/20241107042027.338065-1-nnmlinux@linux.ibm.c…
--added Fixes tag for "powerpc/pseries: Cleanup on
pseries_eeh_get_state()".
---
arch/powerpc/platforms/pseries/eeh_pseries.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1893f66371fa..b12ef382fec7 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -580,8 +580,10 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *delay)
switch(rets[0]) {
case 0:
- result = EEH_STATE_MMIO_ACTIVE |
- EEH_STATE_DMA_ACTIVE;
+ result = EEH_STATE_MMIO_ACTIVE |
+ EEH_STATE_DMA_ACTIVE |
+ EEH_STATE_MMIO_ENABLED |
+ EEH_STATE_DMA_ENABLED;
break;
case 1:
result = EEH_STATE_RESET_ACTIVE |
--
2.47.1
When `skb_splice_from_iter` was introduced, it inadvertently added
checksumming for AF_UNIX sockets. This resulted in significant
slowdowns, for example when using sendfile over unix sockets.
Using the test code in [1] in my test setup (2G single core qemu),
the client receives a 1000M file in:
- without the patch: 1482ms (+/- 36ms)
- with the patch: 652.5ms (+/- 22.9ms)
This commit addresses the issue by marking checksumming as unnecessary in
`unix_stream_sendmsg`
Cc: stable(a)vger.kernel.org
Signed-off-by: Frederik Deweerdt <deweerdt.lkml(a)gmail.com>
Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
---
net/unix/af_unix.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 001ccc55ef0f..6b1762300443 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2313,6 +2313,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
fds_sent = true;
if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) {
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
err = skb_splice_from_iter(skb, &msg->msg_iter, size,
sk->sk_allocation);
if (err < 0) {
--
2.44.1
I am Tomasz Chmielewski, a Portfolio Manager and Chartered
Financial Analyst affiliated with Iwoca Poland Sp. Z OO in
Poland. I have the privilege of working with distinguished
investors who are eager to support your company's current
initiatives, thereby broadening their investment portfolios. If
this proposal aligns with your interests, I invite you to
respond, and I will gladly share more information to assist you.
Yours sincerely,
Tomasz Chmielewski Warsaw, Mazowieckie,
Poland.
If the caller of vmap() specifies VM_MAP_PUT_PAGES (currently only the
i915 driver), we will decrement nr_vmalloc_pages in vfree() without ever
incrementing it. Check the flag before decrementing the counter.
Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
---
mm/vmalloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f009b21705c1..bc9c91f3b373 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3382,7 +3382,8 @@ void vfree(const void *addr)
__free_page(page);
cond_resched();
}
- atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
kvfree(vm->pages);
kfree(vm);
}
--
2.45.2
Recent platforms require more slack slots than the current value of
EFI_MMAP_NR_SLACK_SLOTS, otherwise they fail to boot. So, introduce
EFI_MIN_NR_MMAP_SLACK_SLOTS and EFI_MAX_NR_MMAP_SLACK_SLOTS
and use them to determine a number of slots that the platform
is willing to accept.
Cc: stable(a)vger.kernel.org
Cc: Tyler Hicks <code(a)tyhicks.com>
Tested-by: Brian Nguyen <nguyenbrian(a)microsoft.com>
Tested-by: Jacob Pan <panj(a)microsoft.com>
Reviewed-by: Allen Pais <apais(a)microsoft.com>
Signed-off-by: Hamza Mahfooz <hamzamahfooz(a)linux.microsoft.com>
---
drivers/firmware/efi/Kconfig | 23 +++++++++++++++++
.../firmware/efi/libstub/efi-stub-helper.c | 2 +-
drivers/firmware/efi/libstub/efistub.h | 15 +----------
drivers/firmware/efi/libstub/kaslr.c | 2 +-
drivers/firmware/efi/libstub/mem.c | 25 +++++++++++++++----
drivers/firmware/efi/libstub/randomalloc.c | 2 +-
drivers/firmware/efi/libstub/relocate.c | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 8 +++---
8 files changed, 52 insertions(+), 27 deletions(-)
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index e312d731f4a3..7fedc271d543 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -155,6 +155,29 @@ config EFI_TEST
Say Y here to enable the runtime services support via /dev/efi_test.
If unsure, say N.
+#
+# An efi_boot_memmap is used by efi_get_memory_map() to return the
+# EFI memory map in a dynamically allocated buffer.
+#
+# The buffer allocated for the EFI memory map includes extra room for
+# a range of [EFI_MIN_NR_MMAP_SLACK_SLOTS, EFI_MAX_NR_MMAP_SLACK_SLOTS]
+# additional EFI memory descriptors. This facilitates the reuse of the
+# EFI memory map buffer when a second call to ExitBootServices() is
+# needed because of intervening changes to the EFI memory map. Other
+# related structures, e.g. x86 e820ext, need to factor in this headroom
+# requirement as well.
+#
+
+config EFI_MIN_NR_MMAP_SLACK_SLOTS
+ int
+ depends on EFI
+ default 8
+
+config EFI_MAX_NR_MMAP_SLACK_SLOTS
+ int
+ depends on EFI
+ default 64
+
config EFI_DEV_PATH_PARSER
bool
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index c0c81ca4237e..adf2b0c0dd34 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -432,7 +432,7 @@ efi_status_t efi_exit_boot_services(void *handle, void *priv,
if (efi_disable_pci_dma)
efi_pci_disable_bridge_busmaster();
- status = efi_get_memory_map(&map, true);
+ status = efi_get_memory_map(&map, true, NULL);
if (status != EFI_SUCCESS)
return status;
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 76e44c185f29..d86c6e13de5f 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -160,19 +160,6 @@ void efi_set_u64_split(u64 data, u32 *lo, u32 *hi)
*/
#define EFI_100NSEC_PER_USEC ((u64)10)
-/*
- * An efi_boot_memmap is used by efi_get_memory_map() to return the
- * EFI memory map in a dynamically allocated buffer.
- *
- * The buffer allocated for the EFI memory map includes extra room for
- * a minimum of EFI_MMAP_NR_SLACK_SLOTS additional EFI memory descriptors.
- * This facilitates the reuse of the EFI memory map buffer when a second
- * call to ExitBootServices() is needed because of intervening changes to
- * the EFI memory map. Other related structures, e.g. x86 e820ext, need
- * to factor in this headroom requirement as well.
- */
-#define EFI_MMAP_NR_SLACK_SLOTS 8
-
typedef struct efi_generic_dev_path efi_device_path_protocol_t;
union efi_device_path_to_text_protocol {
@@ -1059,7 +1046,7 @@ void efi_apply_loadoptions_quirk(const void **load_options, u32 *load_options_si
char *efi_convert_cmdline(efi_loaded_image_t *image);
efi_status_t efi_get_memory_map(struct efi_boot_memmap **map,
- bool install_cfg_tbl);
+ bool install_cfg_tbl, unsigned int *n);
efi_status_t efi_allocate_pages(unsigned long size, unsigned long *addr,
unsigned long max);
diff --git a/drivers/firmware/efi/libstub/kaslr.c b/drivers/firmware/efi/libstub/kaslr.c
index 6318c40bda38..06e7a1ef34ab 100644
--- a/drivers/firmware/efi/libstub/kaslr.c
+++ b/drivers/firmware/efi/libstub/kaslr.c
@@ -62,7 +62,7 @@ static bool check_image_region(u64 base, u64 size)
bool ret = false;
int map_offset;
- status = efi_get_memory_map(&map, false);
+ status = efi_get_memory_map(&map, false, NULL);
if (status != EFI_SUCCESS)
return false;
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 4f1fa302234d..cab25183b790 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -13,32 +13,47 @@
* configuration table
*
* Retrieve the UEFI memory map. The allocated memory leaves room for
- * up to EFI_MMAP_NR_SLACK_SLOTS additional memory map entries.
+ * up to CONFIG_EFI_MAX_NR_MMAP_SLACK_SLOTS additional memory map entries.
*
* Return: status code
*/
efi_status_t efi_get_memory_map(struct efi_boot_memmap **map,
- bool install_cfg_tbl)
+ bool install_cfg_tbl,
+ unsigned int *n)
{
int memtype = install_cfg_tbl ? EFI_ACPI_RECLAIM_MEMORY
: EFI_LOADER_DATA;
efi_guid_t tbl_guid = LINUX_EFI_BOOT_MEMMAP_GUID;
+ unsigned int nr = CONFIG_EFI_MIN_NR_MMAP_SLACK_SLOTS;
struct efi_boot_memmap *m, tmp;
efi_status_t status;
unsigned long size;
+ BUILD_BUG_ON(!is_power_of_2(CONFIG_EFI_MIN_NR_MMAP_SLACK_SLOTS) ||
+ !is_power_of_2(CONFIG_EFI_MAX_NR_MMAP_SLACK_SLOTS) ||
+ CONFIG_EFI_MIN_NR_MMAP_SLACK_SLOTS >=
+ CONFIG_EFI_MAX_NR_MMAP_SLACK_SLOTS);
+
tmp.map_size = 0;
status = efi_bs_call(get_memory_map, &tmp.map_size, NULL, &tmp.map_key,
&tmp.desc_size, &tmp.desc_ver);
if (status != EFI_BUFFER_TOO_SMALL)
return EFI_LOAD_ERROR;
- size = tmp.map_size + tmp.desc_size * EFI_MMAP_NR_SLACK_SLOTS;
- status = efi_bs_call(allocate_pool, memtype, sizeof(*m) + size,
- (void **)&m);
+ do {
+ size = tmp.map_size + tmp.desc_size * nr;
+ status = efi_bs_call(allocate_pool, memtype, sizeof(*m) + size,
+ (void **)&m);
+ nr <<= 1;
+ } while (status == EFI_BUFFER_TOO_SMALL &&
+ nr <= CONFIG_EFI_MAX_NR_MMAP_SLACK_SLOTS);
+
if (status != EFI_SUCCESS)
return status;
+ if (n)
+ *n = nr;
+
if (install_cfg_tbl) {
/*
* Installing a configuration table might allocate memory, and
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index c41e7b2091cd..e80a65e7b87a 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -65,7 +65,7 @@ efi_status_t efi_random_alloc(unsigned long size,
efi_status_t status;
int map_offset;
- status = efi_get_memory_map(&map, false);
+ status = efi_get_memory_map(&map, false, NULL);
if (status != EFI_SUCCESS)
return status;
diff --git a/drivers/firmware/efi/libstub/relocate.c b/drivers/firmware/efi/libstub/relocate.c
index d694bcfa1074..b7b0aad95ba4 100644
--- a/drivers/firmware/efi/libstub/relocate.c
+++ b/drivers/firmware/efi/libstub/relocate.c
@@ -28,7 +28,7 @@ efi_status_t efi_low_alloc_above(unsigned long size, unsigned long align,
unsigned long nr_pages;
int i;
- status = efi_get_memory_map(&map, false);
+ status = efi_get_memory_map(&map, false, NULL);
if (status != EFI_SUCCESS)
goto fail;
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 188c8000d245..cb14f0d2a3d9 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -740,15 +740,15 @@ static efi_status_t allocate_e820(struct boot_params *params,
struct efi_boot_memmap *map;
efi_status_t status;
__u32 nr_desc;
+ __u32 nr;
- status = efi_get_memory_map(&map, false);
+ status = efi_get_memory_map(&map, false, &nr);
if (status != EFI_SUCCESS)
return status;
nr_desc = map->map_size / map->desc_size;
- if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) {
- u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) +
- EFI_MMAP_NR_SLACK_SLOTS;
+ if (nr_desc > ARRAY_SIZE(params->e820_table) - nr) {
+ u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + nr;
status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
}
--
2.47.1
The patch titled
Subject: vmalloc: fix accounting with i915
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
vmalloc-fix-accounting-with-i915.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: vmalloc: fix accounting with i915
Date: Wed, 11 Dec 2024 20:25:37 +0000
If the caller of vmap() specifies VM_MAP_PUT_PAGES (currently only the
i915 driver), we will decrement nr_vmalloc_pages and MEMCG_VMALLOC in
vfree(). These counters are incremented by vmalloc() but not by vmap() so
this will cause an underflow. Check the VM_MAP_PUT_PAGES flag before
decrementing either counter.
Link: https://lkml.kernel.org/r/20241211202538.168311-1-willy@infradead.org
Fixes: b944afc9d64d (mm: add a VM_MAP_PUT_PAGES flag for vmap)
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/vmalloc.c~vmalloc-fix-accounting-with-i915
+++ a/mm/vmalloc.c
@@ -3374,7 +3374,8 @@ void vfree(const void *addr)
struct page *page = vm->pages[i];
BUG_ON(!page);
- mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ mod_memcg_page_state(page, MEMCG_VMALLOC, -1);
/*
* High-order allocs for huge vmallocs are split, so
* can be freed as an array of order-0 allocations
@@ -3382,7 +3383,8 @@ void vfree(const void *addr)
__free_page(page);
cond_resched();
}
- atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
+ if (!(vm->flags & VM_MAP_PUT_PAGES))
+ atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages);
kvfree(vm->pages);
kfree(vm);
}
_
Patches currently in -mm which might be from willy(a)infradead.org are
vmalloc-fix-accounting-with-i915.patch
mm-page_alloc-cache-page_zone-result-in-free_unref_page.patch
mm-make-alloc_pages_mpol-static.patch
mm-page_alloc-export-free_frozen_pages-instead-of-free_unref_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-post_alloc_hook.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-prep_new_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-get_page_from_freelist.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_cpuset_fallback.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_may_oom.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_compact.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_reclaim.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_slowpath.patch
mm-page_alloc-move-set_page_refcounted-to-end-of-__alloc_pages.patch
mm-page_alloc-add-__alloc_frozen_pages.patch
mm-mempolicy-add-alloc_frozen_pages.patch
slab-allocate-frozen-pages.patch
Hi Carlos,
Please pull this branch with changes for xfs for 6.13-rc1.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
--D
The following changes since commit 13f3376dd82be60001200cddecea0b317add0c28:
xfs: fix !quota build (2024-12-11 10:55:08 +0100)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/fixes-6.13_2024-12-11
for you to fetch changes up to bb77ec78c1ae78e414863032b71d62356a113942:
xfs: port xfs_ioc_start_commit to multigrain timestamps (2024-12-11 11:56:05 -0800)
----------------------------------------------------------------
xfs: bug fixes for 6.13 [v4]
This is hopefully the last set of bugfixes that I'll have for 6.13.
With a bit of luck, this should all go splendidly.
Signed-off-by: "Darrick J. Wong" <djwong(a)kernel.org>
----------------------------------------------------------------
Darrick J. Wong (6):
xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
xfs: don't crash on corrupt /quotas dirent
xfs: check pre-metadir fields correctly
xfs: fix zero byte checking in the superblock scrubber
xfs: return from xfs_symlink_verify early on V4 filesystems
xfs: port xfs_ioc_start_commit to multigrain timestamps
fs/xfs/libxfs/xfs_symlink_remote.c | 4 ++-
fs/xfs/scrub/agheader.c | 71 ++++++++++++++++++++++++++++++--------
fs/xfs/scrub/tempfile.c | 12 ++++++-
fs/xfs/xfs_exchrange.c | 14 ++++----
fs/xfs/xfs_qm.c | 7 ++++
5 files changed, 84 insertions(+), 24 deletions(-)
Hi all,
This is hopefully the last set of bugfixes that I'll have for 6.13.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fi…
---
Commits in this patchset:
* xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
* xfs: don't crash on corrupt /quotas dirent
* xfs: check pre-metadir fields correctly
* xfs: fix zero byte checking in the superblock scrubber
* xfs: return from xfs_symlink_verify early on V4 filesystems
* xfs: port xfs_ioc_start_commit to multigrain timestamps
---
fs/xfs/libxfs/xfs_symlink_remote.c | 4 ++
fs/xfs/scrub/agheader.c | 71 ++++++++++++++++++++++++++++--------
fs/xfs/scrub/tempfile.c | 12 ++++++
fs/xfs/xfs_exchrange.c | 14 ++++---
fs/xfs/xfs_qm.c | 7 ++++
5 files changed, 84 insertions(+), 24 deletions(-)
Hi all,
Bug fixes for 6.13.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
This has been running on the djcloud for months with no problems. Enjoy!
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xf…
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h…
---
Commits in this patchset:
* xfs: return a 64-bit block count from xfs_btree_count_blocks
* xfs: fix error bailout in xfs_rtginode_create
* xfs: update btree keys correctly when _insrec splits an inode root block
* xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
* xfs: fix sb_spino_align checks for large fsblock sizes
* xfs: return from xfs_symlink_verify early on V4 filesystems
---
libxfs/xfs_bmap.c | 6 ++----
libxfs/xfs_btree.c | 33 +++++++++++++++++++++++++--------
libxfs/xfs_btree.h | 2 +-
libxfs/xfs_ialloc_btree.c | 4 +++-
libxfs/xfs_rtgroup.c | 2 +-
libxfs/xfs_sb.c | 11 ++++++-----
libxfs/xfs_symlink_remote.c | 4 +++-
7 files changed, 41 insertions(+), 21 deletions(-)
On Wed, Dec 11 2024 at 13:39, Sasha Levin wrote:
> platform-msi: Prepare for real per device domains
>
> [ Upstream commit c88f9110bfbca5975a8dee4c9792ba12684c7bca ]
>
> Provide functions to create and remove per device MSI domains which replace
> the platform-MSI domains. The new model is that each of the devices which
> utilize platform-MSI gets now its private MSI domain which is "customized"
> in size and with a device specific function to write the MSI message into
> the device.
>
> This is the same functionality as platform-MSI but it avoids all the down
> sides of platform MSI, i.e. the extra ID book keeping, the special data
> structure in the msi descriptor. Further the domains are only created when
> the devices are really in use, so the burden is on the usage and not on the
> infrastructure.
>
> Fill in the domain template and provide two functions to init/allocate and
> remove a per device MSI domain.
>
> Until all users and parent domain providers are converted, the init/alloc
> function invokes the original platform-MSI code when the irqdomain which is
> associated to the device does not provide MSI parent functionality yet.
>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Signed-off-by: Anup Patel <apatel(a)ventanamicro.com>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Link: https://lore.kernel.org/r/20240127161753.114685-6-apatel@ventanamicro.com
> Stable-dep-of: 64506b3d23a3 ("scsi: ufs: qcom: Only free platform MSIs when ESI is enabled")
See my other reply. Please don't backport the world if it's not really
required. I'll send a backport of 64506b3d23a3 in a minute.
Thanks,
tglx
On Wed, Dec 11 2024 at 13:39, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> irqchip: Convert all platform MSI users to the new API
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> irqchip-convert-all-platform-msi-users-to-the-new-ap.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 5df23ec861a0208ef524a27e44c694ca2decb7ea
> Author: Thomas Gleixner <tglx(a)linutronix.de>
> Date: Sat Jan 27 21:47:34 2024 +0530
>
> irqchip: Convert all platform MSI users to the new API
>
> [ Upstream commit 14fd06c776b5289a43c91cdc64bac3bdbc7b397e ]
>
> Switch all the users of the platform MSI domain over to invoke the new
> interfaces which branch to the original platform MSI functions when the
> irqdomain associated to the caller device does not yet provide MSI parent
> functionality.
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Signed-off-by: Anup Patel <apatel(a)ventanamicro.com>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Link: https://lore.kernel.org/r/20240127161753.114685-7-apatel@ventanamicro.com
> Stable-dep-of: 64506b3d23a3 ("scsi: ufs: qcom: Only free platform MSIs when ESI is enabled")
This commit makes the invocation of
platform_msi_domain_free_irqs_all(hba->dev);
conditional on
if (host->esi_enabled)
The original code before 5df23ec861a0208ef524a27e44c694ca2decb7ea was:
platform_msi_domain_free_irqs(hba->dev);
> @@ -1926,7 +1926,7 @@ static void ufs_qcom_remove(struct platform_device *pdev)
>
> pm_runtime_get_sync(&(pdev)->dev);
> ufshcd_remove(hba);
> - platform_msi_domain_free_irqs(hba->dev);
> + platform_device_msi_free_irqs_all(hba->dev);
> }
which means the whole backport is not required and just commit
64506b3d23a3 needs to be adjusted for pre 5df23ec861:
- platform_msi_domain_free_irqs(hba->dev);
+ if (host->esi_enabled)
+ platform_msi_domain_free_irqs(hba->dev);
Thanks,
tglx
Hi,
Could you please apply the following performance regression fix that is now in mainline to 6.12.y stable branch?
Commit Data:
commit-id : 58a0c875ce028678c9594c7bdf3fe33462392808
summary : nvme: don't apply NVME_QUIRK_DEALLOCATE_ZEROES when DSM is not supported
author : hch(a)hera.kernel.org
author date : 2024-11-27 06:42:18
committer : kbusch(a)kernel.org
committer date : 2024-12-02 18:03:19
stable patch-id : 7975710aeefd128836b498f0ac4dedbe6b4068d8
In Branches:
kernel_dot_org/torvalds_linux.git master - 58a0c875ce02
kernel_dot_org/linux-stable.git master - 58a0c875ce02
Thanks,
Saeed
The comparison function cmp_profile_data() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:
* Symmetry: If x < y, then y > x.
* Transitivity: If x < y and y < z, then x < z.
When v1 and v2 are equal, the function incorrectly returns 1, breaking
symmetry and transitivity. This causes undefined behavior, which can
lead to memory corruption in certain versions of glibc [1].
Fix the issue by returning 0 when v1 and v2 are equal, ensuring
compliance with the C standard and preventing undefined behavior.
Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 0f223813edd0 ("perf ftrace: Add 'profile' command")
Fixes: 74ae366c37b7 ("perf ftrace profile: Add -s/--sort option")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
---
tools/perf/builtin-ftrace.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 272d3c70810e..a56cf8b0a7d4 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1151,8 +1151,9 @@ static int cmp_profile_data(const void *a, const void *b)
if (v1 > v2)
return -1;
- else
+ if (v1 < v2)
return 1;
+ return 0;
}
static void print_profile_result(struct perf_ftrace *ftrace)
--
2.34.1
Netpoll will explicitly pass the polling call with a budget of 0 to
indicate it's clearing the Tx path only. For the gve_rx_poll and
gve_xdp_poll, they were mistakenly taking the 0 budget as the indication
to do all the work. Add check to avoid the rx path and xdp path being
called when budget is 0. And also avoid napi_complete_done being called
when budget is 0 for netpoll.
The original fix was merged here:
https://lore.kernel.org/r/20231114004144.2022268-1-ziweixiao@google.com
Resend it since the original one was not cleanly applied to 6.1 kernel.
commit 278a370c1766 ("gve: Fixes for napi_poll when budget is 0")
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Ziwei Xiao <ziweixiao(a)google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
---
Changes in v2:
* Add the original git commit id
---
drivers/net/ethernet/google/gve/gve_main.c | 7 +++++++
drivers/net/ethernet/google/gve/gve_rx.c | 4 ----
drivers/net/ethernet/google/gve/gve_tx.c | 4 ----
3 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index d3f6ad586ba1..8771ccfc69b4 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -202,6 +202,10 @@ static int gve_napi_poll(struct napi_struct *napi, int budget)
if (block->tx)
reschedule |= gve_tx_poll(block, budget);
+
+ if (!budget)
+ return 0;
+
if (block->rx) {
work_done = gve_rx_poll(block, budget);
reschedule |= work_done == budget;
@@ -242,6 +246,9 @@ static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
if (block->tx)
reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);
+ if (!budget)
+ return 0;
+
if (block->rx) {
work_done = gve_rx_poll_dqo(block, budget);
reschedule |= work_done == budget;
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index 021bbf308d68..639eb6848c7d 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -778,10 +778,6 @@ int gve_rx_poll(struct gve_notify_block *block, int budget)
feat = block->napi.dev->features;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
if (budget > 0)
work_done = gve_clean_rx_done(rx, budget, feat);
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 5e11b8236754..bf1ac0d1dc6f 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -725,10 +725,6 @@ bool gve_tx_poll(struct gve_notify_block *block, int budget)
u32 nic_done;
u32 to_do;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
/* In TX path, it may try to clean completed pkts in order to xmit,
* to avoid cleaning conflict, use spin_lock(), it yields better
* concurrency between xmit/clean than netif's lock.
--
2.47.0.338.g60cca15819-goog
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8dc0f70df24f..7b4d44dcb343 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1882,14 +1882,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8dc0f70df24f..7b4d44dcb343 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1882,14 +1882,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 987d49e18dbe..65e6e532cfb9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5276,6 +5276,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5526,7 +5528,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
--
2.25.1
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
[ Upstream commit bfa54a793ba77ef696755b66f3ac4ed00c7d1248 ]
For bus_register(), any error which happens after kset_register() will
cause that @priv are freed twice, fixed by setting @priv with NULL after
the first free.
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/r/20240727-bus_register_fix-v1-1-fed8dd0dba7a@quici…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Using bus->p instead of priv which will be consistent with the context ]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/base/bus.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 339a9edcde5f..028e45d2f7fe 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -853,6 +853,8 @@ int bus_register(struct bus_type *bus)
bus_remove_file(bus, &bus_attr_uevent);
bus_uevent_fail:
kset_unregister(&bus->p->subsys);
+ /* Above kset_unregister() will kfree @priv */
+ bus->p = NULL;
out:
kfree(bus->p);
bus->p = NULL;
--
2.25.1
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit dd89a81d850fa9a65f67b4527c0e420d15bf836c ]
Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is
not known or does not have a GRO handler.
Such a packet is easily constructed. Syzbot generates them and sets
off this warning.
Remove the warning as it is expected and not actionable.
The warning was previously reduced from WARN_ON to WARN_ON_ONCE in
commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad
proto callbacks").
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gm…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Libo Chen <libo.chen.cn(a)windriver.com>
---
net/ipv4/fou.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1d67df4d8ed6..b1a8e4eec3f6 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -453,7 +453,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
- if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive))
+ if (!ops || !ops->callbacks.gro_receive)
goto out;
pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
--
2.25.1
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index d52e91258e98..aae50a5dfb57 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(struct device *dev, struct kobj_uevent_env *env)
{
- struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.25.1
From: Kaixin Wang <kxwang23(a)m.fudan.edu.cn>
[ Upstream commit 609366e7a06d035990df78f1562291c3bf0d4a12 ]
In the cdns_i3c_master_probe function, &master->hj_work is bound with
cdns_i3c_master_hj. And cdns_i3c_master_interrupt can call
cnds_i3c_master_demux_ibis function to start the work.
If we remove the module which will call cdns_i3c_master_remove to
make cleanup, it will free master->base through i3c_master_unregister
while the work mentioned above will be used. The sequence of operations
that may lead to a UAF bug is as follows:
CPU0 CPU1
| cdns_i3c_master_hj
cdns_i3c_master_remove |
i3c_master_unregister(&master->base) |
device_unregister(&master->dev) |
device_release |
//free master->base |
| i3c_master_do_daa(&master->base)
| //use master->base
Fix it by ensuring that the work is canceled before proceeding with
the cleanup in cdns_i3c_master_remove.
Signed-off-by: Kaixin Wang <kxwang23(a)m.fudan.edu.cn>
Link: https://lore.kernel.org/r/20240911153544.848398-1-kxwang23@m.fudan.edu.cn
Signed-off-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/i3c/master/i3c-master-cdns.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/i3c/master/i3c-master-cdns.c b/drivers/i3c/master/i3c-master-cdns.c
index b9cfda6ae9ae..4473c0b1ae2e 100644
--- a/drivers/i3c/master/i3c-master-cdns.c
+++ b/drivers/i3c/master/i3c-master-cdns.c
@@ -1668,6 +1668,7 @@ static int cdns_i3c_master_remove(struct platform_device *pdev)
struct cdns_i3c_master *master = platform_get_drvdata(pdev);
int ret;
+ cancel_work_sync(&master->hj_work);
ret = i3c_master_unregister(&master->base);
if (ret)
return ret;
--
2.25.1
From: Joe Damato <jdamato(a)fastly.com>
[ Upstream commit 08062af0a52107a243f7608fd972edb54ca5b7f8 ]
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_irqs was added to net_device and napi_defer_irqs_count was
added to napi_struct, both as type int.
This value never goes below zero, so there is not reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.
The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.
Before this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647
After this patch:
$ sudo bash -c 'echo 2147483649 > /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range
Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:
include/linux/netdevice.h: unsigned int tx_queue_len;
And has an overflow check:
dev_change_tx_queue_len(..., unsigned long new_len):
if (new_len != (unsigned int)new_len)
return -ERANGE;
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
include/linux/netdevice.h | 4 ++--
net/core/net-sysfs.c | 6 +++++-
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fbbd0df1106b..8379e938cd89 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,7 +352,7 @@ struct napi_struct {
unsigned long state;
int weight;
- int defer_hard_irqs_count;
+ u32 defer_hard_irqs_count;
unsigned long gro_bitmask;
int (*poll)(struct napi_struct *, int);
#ifdef CONFIG_NETPOLL
@@ -2193,7 +2193,7 @@ struct net_device {
struct bpf_prog __rcu *xdp_prog;
unsigned long gro_flush_timeout;
- int napi_defer_hard_irqs;
+ u32 napi_defer_hard_irqs;
#define GRO_LEGACY_MAX_SIZE 65536u
/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
* and shinfo->gso_segs is a 16bit field.
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 8a06f97320e0..4ce57e75d139 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -30,6 +30,7 @@
#ifdef CONFIG_SYSFS
static const char fmt_hex[] = "%#x\n";
static const char fmt_dec[] = "%d\n";
+static const char fmt_uint[] = "%u\n";
static const char fmt_ulong[] = "%lu\n";
static const char fmt_u64[] = "%llu\n";
@@ -405,6 +406,9 @@ NETDEVICE_SHOW_RW(gro_flush_timeout, fmt_ulong);
static int change_napi_defer_hard_irqs(struct net_device *dev, unsigned long val)
{
+ if (val > S32_MAX)
+ return -ERANGE;
+
WRITE_ONCE(dev->napi_defer_hard_irqs, val);
return 0;
}
@@ -418,7 +422,7 @@ static ssize_t napi_defer_hard_irqs_store(struct device *dev,
return netdev_store(dev, attr, buf, len, change_napi_defer_hard_irqs);
}
-NETDEVICE_SHOW_RW(napi_defer_hard_irqs, fmt_dec);
+NETDEVICE_SHOW_RW(napi_defer_hard_irqs, fmt_uint);
static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
--
2.25.1
From: Kory Maincent <kory.maincent(a)bootlin.com>
[ Upstream commit bbcc1c83f343e580c3aa1f2a8593343bf7b55bba ]
The Linked list element and pointer are not stored in the same memory as
the eDMA controller register. If the doorbell register is toggled before
the full write of the linked list a race condition error will occur.
In remote setup we can only use a readl to the memory to assure the full
write has occurred.
Fixes: 7e4b8a4fbe2c ("dmaengine: Add Synopsys eDMA IP version 0 support")
Reviewed-by: Serge Semin <fancer.lancer(a)gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Kory Maincent <kory.maincent(a)bootlin.com>
Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-6-8e8c1acb7a…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/dma/dw-edma/dw-edma-v0-core.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index a3816ba63285..aeaae28fab85 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -357,6 +357,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
#endif /* CONFIG_64BIT */
}
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+ /*
+ * In case of remote eDMA engine setup, the DW PCIe RP/EP internal
+ * configuration registers and application memory are normally accessed
+ * over different buses. Ensure LL-data reaches the memory before the
+ * doorbell register is toggled by issuing the dummy-read from the remote
+ * LL memory in a hope that the MRd TLP will return only after the
+ * last MWr TLP is completed
+ */
+ if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ readl(chunk->ll_region.vaddr);
+}
+
void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
@@ -423,6 +437,9 @@ void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
SET_CH_32(dw, chan->dir, chan->id, llp.msb,
upper_32_bits(chunk->ll_region.paddr));
}
+
+ dw_edma_v0_sync_ll_data(chunk);
+
/* Doorbell */
SET_RW_32(dw, chan->dir, doorbell,
FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
--
2.39.4
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index d52e91258e98..aae50a5dfb57 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(struct device *dev, struct kobj_uevent_env *env)
{
- struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.25.1
From: Juntong Deng <juntong.deng(a)outlook.com>
commit 7ad4e0a4f61c57c3ca291ee010a9d677d0199fba upstream.
In gfs2_put_super(), whether withdrawn or not, the quota should
be cleaned up by gfs2_quota_cleanup().
Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu
callback) has run for all gfs2_quota_data objects, resulting in
use-after-free.
Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called
by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling
gfs2_make_fs_ro(), there is no need to call them again.
Reported-by: syzbot+29c47e9e51895928698c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=29c47e9e51895928698c
Signed-off-by: Juntong Deng <juntong.deng(a)outlook.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Clayton Casciato <majortomtosourcecontrol(a)gmail.com>
Signed-off-by: Guocai He <guocai.he.cn(a)windriver.com>
---
This commit is backporting 7ad4e0a4f61c7ad4e0a4f61c57c3ca291ee010a9d677d0199fba to the branch linux-5.15.y to
solve the CVE-2024-52760. Please merge this commit to linux-5.15.y.
fs/gfs2/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 268651ac9fc8..98158559893f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -590,6 +590,8 @@ static void gfs2_put_super(struct super_block *sb)
if (!sb_rdonly(sb)) {
gfs2_make_fs_ro(sdp);
+ } else {
+ gfs2_quota_cleanup(sdp);
}
WARN_ON(gfs2_withdrawing(sdp));
--
2.34.1
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplug one of monitors connected after mst hub, encounter null pointer dereference.
It's due to dc_sink get released immediately in early_unregister() or detect_ctx(). When
commit new state which directly referring to info stored in dc_sink will cause null pointer
dereference.
[how]
Remove redundant checking condition. Relevant condition should already be covered by checking
if dsc_aux is null or not. Also reset dsc_aux to NULL when the connector is disconnected.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 1acef5f3838f..a1619f4569cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -183,6 +183,8 @@ amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
}
aconnector->mst_status = MST_STATUS_DEFAULT;
@@ -487,6 +489,8 @@ dm_dp_mst_detect(struct drm_connector *connector,
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 987d49e18dbe..65e6e532cfb9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5276,6 +5276,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5526,7 +5528,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
--
2.25.1
Patch fixing the broken audio issue for avs apparently didn't make it
into v6.12 stable tree and playing audio results in NULL pointer
dereference.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219577
Backport a fix.
Amadeusz Sławiński (1):
ASoC: Intel: avs: Fix return status of avs_pcm_hw_constraints_init()
sound/soc/intel/avs/pcm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.34.1
Commit dcfe7673787b4bfea2c213df443d312aa754757b ("net: dsa: tag_sja1105:
absorb logic for not overwriting precise info into dsa_8021q_rcv()")
added support to let the DSA switch driver set source_port and
switch_id. tag_8021q's logic overrides the previously set source_port
and switch_id only if they are marked as "invalid" (-1). sja1105 and
vsc73xx drivers are doing that properly, but ocelot_8021q driver doesn't
initialize those variables. That causes dsa_8021q_rcv() doesn't set
them, and they remain unassigned.
Initialize them as invalid to so dsa_8021q_rcv() can return with the
proper values.
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
---
Cc: stable(a)vger.kernel.org
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
I'm announcing the release of the 6.6.65 kernel.
This release only fixes a build regression for openrisc, and a runtime
regression for domU guests. If you don't have problems with them, no
need to upgrade.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/openrisc/include/asm/fixmap.h | 31 +------------------------------
arch/x86/platform/pvh/head.S | 22 +---------------------
3 files changed, 3 insertions(+), 52 deletions(-)
Dawei Li (1):
openrisc: Use asm-generic's version of fix_to_virt() & virt_to_fix()
Greg Kroah-Hartman (3):
Revert "x86/pvh: Call C code via the kernel virtual mapping"
Revert "x86/pvh: Set phys_base when calling xen_prepare_pvh()"
Linux 6.6.65
Commit dcfe7673787b4bfea2c213df443d312aa754757b ("net: dsa: tag_sja1105:
absorb logic for not overwriting precise info into dsa_8021q_rcv()")
added support to let the DSA switch driver set source_port and
switch_id. tag_8021q's logic overrides the previously set source_port
and switch_id only if they are marked as "invalid" (-1). sja1105 and
vsc73xx drivers are doing that properly, but ocelot_8021q driver doesn't
initialize those variables. That causes dsa_8021q_rcv() doesn't set
them, and they remain unassigned.
Initialize them as invalid to so dsa_8021q_rcv() can return with the
proper values.
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
Cc: stable(a)vger.kernel.org
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
Commit dcfe7673787b4bfea2c213df443d312aa754757b ("net: dsa: tag_sja1105:
absorb logic for not overwriting precise info into dsa_8021q_rcv()")
added support to let the DSA switch driver set source_port and
switch_id. tag_8021q's logic overrides the previously set source_port
and switch_id only if they are marked as "invalid" (-1). sja1105 and
vsc73xx drivers are doing that properly, but ocelot_8021q driver doesn't
initialize those variables. That causes dsa_8021q_rcv() doesn't set
them, and they remain unassigned.
Initialize them as invalid to so dsa_8021q_rcv() can return with the
proper values.
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
---
Cc: stable(a)vger.kernel.org
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
Hi,
As you have been an exhibitor at I/ITSEC: The Interservice/Industry Training, Simulation & Education Conference 2024. We have received an updated post attendees list of people including last-minute registers and walk-ins to the show.
Show details:
Date: 02 - 05 Dec 2024
Place: Orlando, USA.
Updated counts: 18,278+ visitors
Please let me know if you’d be interested in acquiring the visitors list so that I’ll share pricing and additional details for the same.
Kind Regards,
Jennifer Martin
Sr. Marketing Manager
If you do not wish to receive this newsletter reply “Not interested”
Commit dcfe7673787b4bfea2c213df443d312aa754757b ("net: dsa: tag_sja1105:
absorb logic for not overwriting precise info into dsa_8021q_rcv()")
added support to let the DSA switch driver set source_port and
switch_id. tag_8021q's logic overrides the previously set source_port
and switch_id only if they are marked as "invalid" (-1). sja1105 and
vsc73xx drivers are doing that properly, but ocelot_8021q driver doesn't
initialize those variables. That causes dsa_8021q_rcv() doesn't set
them, and they remain unassigned.
Initialize them as invalid to so dsa_8021q_rcv() can return with the
proper values.
Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
---
Cc: stable(a)vger.kernel.org
---
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
Commit dcfe7673787b4bfea2c213df443d312aa754757b ("net: dsa: tag_sja1105:
absorb logic for not overwriting precise info into dsa_8021q_rcv()")
added support to let the DSA switch driver set source_port and
switch_id. tag_8021q's logic overrides the previously set source_port
and switch_id only if they are marked as "invalid" (-1). sja1105 and
vsc73xx drivers are doing that properly, but ocelot_8021q driver doesn't
initialize those variables. That causes dsa_8021q_rcv() doesn't set
them, and they remain unassigned.
Initialize them as invalid to so dsa_8021q_rcv() can return with the
proper values.
Signed-off-by: Robert Hodaszi <robert.hodaszi(a)digi.com>
Cc: stable(a)vger.kernel.org
---
net/dsa/tag_ocelot_8021q.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 8e8b1bef6af6..11ea8cfd6266 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -79,7 +79,7 @@ static struct sk_buff *ocelot_xmit(struct sk_buff *skb,
static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
struct net_device *netdev)
{
- int src_port, switch_id;
+ int src_port = -1, switch_id = -1;
dsa_8021q_rcv(skb, &src_port, &switch_id, NULL, NULL);
--
2.43.0
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
The stop trigger invokes rz_ssi_stop() and rz_ssi_stream_quit().
- The purpose of rz_ssi_stop() is to disable TX/RX, terminate DMA
transactions, and set the controller to idle.
- The purpose of rz_ssi_stream_quit() is to reset the substream-specific
software data by setting strm->running and strm->substream appropriately.
The function rz_ssi_is_stream_running() checks if both strm->substream and
strm->running are valid and returns true if so. Its implementation is as
follows:
static inline bool rz_ssi_is_stream_running(struct rz_ssi_stream *strm)
{
return strm->substream && strm->running;
}
When the controller is configured in full-duplex mode (with both playback
and capture active), the rz_ssi_stop() function does not modify the
controller settings when called for the first substream in the full-duplex
setup. Instead, it simply sets strm->running = 0 and returns if the
companion substream is still running. The following code illustrates this:
static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm)
{
strm->running = 0;
if (rz_ssi_is_stream_running(&ssi->playback) ||
rz_ssi_is_stream_running(&ssi->capture))
return 0;
// ...
}
The controller settings, along with the DMA termination (for the last
stopped substream), are only applied when the last substream in the
full-duplex setup is stopped.
While applying the controller settings only when the last substream stops
is not problematic, terminating the DMA operations for only one substream
causes failures when starting and stopping full-duplex operations multiple
times in a loop.
To address this issue, call dmaengine_terminate_async() for both substreams
involved in the full-duplex setup when the last substream in the setup is
stopped.
Fixes: 4f8cd05a4305 ("ASoC: sh: rz-ssi: Add full duplex support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Biju Das <biju.das.jz(a)bp.renesas.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Changes in v4:
- updated patch description
Changes in v3:
- collected tags
- use proper fixes commit SHA1 and description
- s/sh/renesas in patch title
Changes in v2:
- none
sound/soc/renesas/rz-ssi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sound/soc/renesas/rz-ssi.c b/sound/soc/renesas/rz-ssi.c
index 6efd017aaa7f..2d8721156099 100644
--- a/sound/soc/renesas/rz-ssi.c
+++ b/sound/soc/renesas/rz-ssi.c
@@ -415,8 +415,12 @@ static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm)
rz_ssi_reg_mask_setl(ssi, SSICR, SSICR_TEN | SSICR_REN, 0);
/* Cancel all remaining DMA transactions */
- if (rz_ssi_is_dma_enabled(ssi))
- dmaengine_terminate_async(strm->dma_ch);
+ if (rz_ssi_is_dma_enabled(ssi)) {
+ if (ssi->playback.dma_ch)
+ dmaengine_terminate_async(ssi->playback.dma_ch);
+ if (ssi->capture.dma_ch)
+ dmaengine_terminate_async(ssi->capture.dma_ch);
+ }
rz_ssi_set_idle(ssi);
--
2.39.2
From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
I struggle to explain dividing an ARRAY_SIZE() by the size of an element
once again. As the latter equals to 2, only the half of EEPROM was ever
written. Drop the unexplainable division and write full ARRAY_SIZE().
Cc: stable(a)vger.kernel.org
Fixes: 7a8685accb95 ("leds: lp8860: Introduce TI lp8860 4 channel LED driver")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
This is based on code review only, I don't have LP8860 to test.
drivers/leds/leds-lp8860.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/leds/leds-lp8860.c b/drivers/leds/leds-lp8860.c
index 7a136fd817206..06196d851ade7 100644
--- a/drivers/leds/leds-lp8860.c
+++ b/drivers/leds/leds-lp8860.c
@@ -265,7 +265,7 @@ static int lp8860_init(struct lp8860_led *led)
goto out;
}
- reg_count = ARRAY_SIZE(lp8860_eeprom_disp_regs) / sizeof(lp8860_eeprom_disp_regs[0]);
+ reg_count = ARRAY_SIZE(lp8860_eeprom_disp_regs);
for (i = 0; i < reg_count; i++) {
ret = regmap_write(led->eeprom_regmap,
lp8860_eeprom_disp_regs[i].reg,
--
2.47.0
Currently afunc_bind sets std_ac_if_desc.bNumEndpoints to 1 if
controls (mute/volume) are enabled. During next afunc_bind call,
bNumEndpoints would be unchanged and incorrectly set to 1 even
if the controls aren't enabled.
Fix this by resetting the value of bNumEndpoints to 0 on every
afunc_bind call.
Fixes: eaf6cbe09920 ("usb: gadget: f_uac2: add volume and mute support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/gadget/function/f_uac2.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c
index ce5b77f89190..9b324821c93b 100644
--- a/drivers/usb/gadget/function/f_uac2.c
+++ b/drivers/usb/gadget/function/f_uac2.c
@@ -1185,6 +1185,7 @@ afunc_bind(struct usb_configuration *cfg, struct usb_function *fn)
uac2->as_in_alt = 0;
}
+ std_ac_if_desc.bNumEndpoints = 0;
if (FUOUT_EN(uac2_opts) || FUIN_EN(uac2_opts)) {
uac2->int_ep = usb_ep_autoconfig(gadget, &fs_ep_int_desc);
if (!uac2->int_ep) {
--
2.25.1
On Tue, 2024-12-10 at 15:41 -0500, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> drm/sched: memset() 'job' in drm_sched_job_init()
>
> to the 6.12-stable tree which can be found at:
>
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> drm-sched-memset-job-in-drm_sched_job_init.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable
> tree,
> please let <stable(a)vger.kernel.org> know about it.
Hi,
you can add it, it does improve things a bit.
But I'd like to use this opportunity to understand by what criteria you
found and selected this patch? Stable was not on CC, neither does the
patch contain a Fixes tag.
Regards,
P.
>
>
>
> commit d0a6c893de0172427064e39be400a23b0ba5ffec
> Author: Philipp Stanner <pstanner(a)redhat.com>
> Date: Mon Oct 21 12:50:28 2024 +0200
>
> drm/sched: memset() 'job' in drm_sched_job_init()
>
> [ Upstream commit 2320c9e6a768d135c7b0039995182bb1a4e4fd22 ]
>
> drm_sched_job_init() has no control over how users allocate
> struct
> drm_sched_job. Unfortunately, the function can also not set some
> struct
> members such as job->sched.
>
> This could theoretically lead to UB by users dereferencing the
> struct's
> pointer members too early.
>
> It is easier to debug such issues if these pointers are
> initialized to
> NULL, so dereferencing them causes a NULL pointer exception.
> Accordingly, drm_sched_entity_init() does precisely that and
> initializes
> its struct with memset().
>
> Initialize parameter "job" to 0 in drm_sched_job_init().
>
> Signed-off-by: Philipp Stanner <pstanner(a)redhat.com>
> Link:
> https://patchwork.freedesktop.org/patch/msgid/20241021105028.19794-2-pstann…
> Reviewed-by: Christian König <christian.koenig(a)amd.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index e97c6c60bc96e..416590ea0dc3d 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -803,6 +803,14 @@ int drm_sched_job_init(struct drm_sched_job
> *job,
> return -EINVAL;
> }
>
> + /*
> + * We don't know for sure how the user has allocated. Thus,
> zero the
> + * struct so that unallowed (i.e., too early) usage of
> pointers that
> + * this function does not set is guaranteed to lead to a
> NULL pointer
> + * exception instead of UB.
> + */
> + memset(job, 0, sizeof(*job));
> +
> job->entity = entity;
> job->credits = credits;
> job->s_fence = drm_sched_fence_alloc(entity, owner);
>
zram writeback is a costly operation, because every target slot
(unless ZRAM_HUGE) is decompressed before it gets written to a
backing device. The writeback to a backing device uses
submit_bio_wait() which may look like a rescheduling point.
However, if the backing device has BD_HAS_SUBMIT_BIO bit set
__submit_bio() calls directly disk->fops->submit_bio(bio) on
the backing device and so when submit_bio_wait() calls
blk_wait_io() the I/O is already done. On such systems we
effective end up in a loop
for_each (target slot) {
decompress(slot)
__submit_bio()
disk->fops->submit_bio(bio)
}
Which on PREEMPT_NONE systems triggers watchdogs (since there
are no explicit rescheduling points). Add cond_resched() to
the zram writeback loop.
Fixes: a939888ec38b ("zram: support idle/huge page writeback")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
---
drivers/block/zram/zram_drv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0a924fae02a4..f5fa3db6b4f8 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -884,6 +884,8 @@ static ssize_t writeback_store(struct device *dev,
next:
zram_slot_unlock(zram, index);
release_pp_slot(zram, pps);
+
+ cond_resched();
}
if (blk_idx)
--
2.47.1.613.gc27f4b7a9f-goog
From: Kory Maincent <kory.maincent(a)bootlin.com>
[ Upstream commit bbcc1c83f343e580c3aa1f2a8593343bf7b55bba ]
The Linked list element and pointer are not stored in the same memory as
the eDMA controller register. If the doorbell register is toggled before
the full write of the linked list a race condition error will occur.
In remote setup we can only use a readl to the memory to assure the full
write has occurred.
Fixes: 7e4b8a4fbe2c ("dmaengine: Add Synopsys eDMA IP version 0 support")
Reviewed-by: Serge Semin <fancer.lancer(a)gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Kory Maincent <kory.maincent(a)bootlin.com>
Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-6-8e8c1acb7a…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/dma/dw-edma/dw-edma-v0-core.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index a3816ba63285..4fe83d5a25b6 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -357,6 +357,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
#endif /* CONFIG_64BIT */
}
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+ /*
+ * In case of remote eDMA engine setup, the DW PCIe RP/EP internal
+ * configuration registers and application memory are normally accessed
+ * over different buses. Ensure LL-data reaches the memory before the
+ * doorbell register is toggled by issuing the dummy-read from the remote
+ * LL memory in a hope that the MRd TLP will return only after the
+ * last MWr TLP is completed
+ */
+ if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ readl(chunk->ll_region.vaddr.io);
+}
+
void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
@@ -423,6 +437,9 @@ void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
SET_CH_32(dw, chan->dir, chan->id, llp.msb,
upper_32_bits(chunk->ll_region.paddr));
}
+
+ dw_edma_v0_sync_ll_data(chunk);
+
/* Doorbell */
SET_RW_32(dw, chan->dir, doorbell,
FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
--
2.39.4
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplug one of monitors connected after mst hub, encounter null pointer dereference.
It's due to dc_sink get released immediately in early_unregister() or detect_ctx(). When
commit new state which directly referring to info stored in dc_sink will cause null pointer
dereference.
[how]
Remove redundant checking condition. Relevant condition should already be covered by checking
if dsc_aux is null or not. Also reset dsc_aux to NULL when the connector is disconnected.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 1acef5f3838f..a1619f4569cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -183,6 +183,8 @@ amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
}
aconnector->mst_status = MST_STATUS_DEFAULT;
@@ -487,6 +489,8 @@ dm_dp_mst_detect(struct drm_connector *connector,
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
--
2.25.1
From: Paulo Alcantara <pc(a)manguebit.com>
[ Upstream commit 58acd1f497162e7d282077f816faa519487be045 ]
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
fs/smb/client/ioctl.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index ae9905e2b9d4..173c8c76d31f 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -246,17 +246,23 @@ static int cifs_dump_full_key(struct cifs_tcon *tcon, struct smb3_full_key_debug
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry(server_it, &cifs_tcp_ses_list, tcp_ses_list) {
list_for_each_entry(ses_it, &server_it->smb_ses_list, smb_ses_list) {
- if (ses_it->Suid == out.session_id) {
+ spin_lock(&ses_it->ses_lock);
+ if (ses_it->ses_status != SES_EXITING &&
+ ses_it->Suid == out.session_id) {
ses = ses_it;
/*
* since we are using the session outside the crit
* section, we need to make sure it won't be released
* so increment its refcount
*/
+
+ lockdep_assert_held(&cifs_tcp_ses_lock);
ses->ses_count++;
+ spin_unlock(&ses_it->ses_lock);
found = true;
goto search_end;
}
+ spin_unlock(&ses_it->ses_lock);
}
}
search_end:
--
2.25.1
From: Kory Maincent <kory.maincent(a)bootlin.com>
[ Upstream commit bbcc1c83f343e580c3aa1f2a8593343bf7b55bba ]
The Linked list element and pointer are not stored in the same memory as
the eDMA controller register. If the doorbell register is toggled before
the full write of the linked list a race condition error will occur.
In remote setup we can only use a readl to the memory to assure the full
write has occurred.
Fixes: 7e4b8a4fbe2c ("dmaengine: Add Synopsys eDMA IP version 0 support")
Reviewed-by: Serge Semin <fancer.lancer(a)gmail.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Kory Maincent <kory.maincent(a)bootlin.com>
Link: https://lore.kernel.org/r/20240129-b4-feature_hdma_mainline-v7-6-8e8c1acb7a…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/dma/dw-edma/dw-edma-v0-core.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index a3816ba63285..aeaae28fab85 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -357,6 +357,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
#endif /* CONFIG_64BIT */
}
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+ /*
+ * In case of remote eDMA engine setup, the DW PCIe RP/EP internal
+ * configuration registers and application memory are normally accessed
+ * over different buses. Ensure LL-data reaches the memory before the
+ * doorbell register is toggled by issuing the dummy-read from the remote
+ * LL memory in a hope that the MRd TLP will return only after the
+ * last MWr TLP is completed
+ */
+ if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ readl(chunk->ll_region.vaddr);
+}
+
void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
@@ -423,6 +437,9 @@ void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
SET_CH_32(dw, chan->dir, chan->id, llp.msb,
upper_32_bits(chunk->ll_region.paddr));
}
+
+ dw_edma_v0_sync_ll_data(chunk);
+
/* Doorbell */
SET_RW_32(dw, chan->dir, doorbell,
FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
--
2.39.4
From: Rand Deeb <rand.sec96(a)gmail.com>
[ Upstream commit 789c17185fb0f39560496c2beab9b57ce1d0cbe7 ]
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96(a)gmail.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/ssb/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ssb/main.c b/drivers/ssb/main.c
index d52e91258e98..aae50a5dfb57 100644
--- a/drivers/ssb/main.c
+++ b/drivers/ssb/main.c
@@ -341,11 +341,13 @@ static int ssb_bus_match(struct device *dev, struct device_driver *drv)
static int ssb_device_uevent(struct device *dev, struct kobj_uevent_env *env)
{
- struct ssb_device *ssb_dev = dev_to_ssb_dev(dev);
+ struct ssb_device *ssb_dev;
if (!dev)
return -ENODEV;
+ ssb_dev = dev_to_ssb_dev(dev);
+
return add_uevent_var(env,
"MODALIAS=ssb:v%04Xid%04Xrev%02X",
ssb_dev->id.vendor, ssb_dev->id.coreid,
--
2.25.1
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8dc0f70df24f..7b4d44dcb343 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1882,14 +1882,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit dd89a81d850fa9a65f67b4527c0e420d15bf836c ]
Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is
not known or does not have a GRO handler.
Such a packet is easily constructed. Syzbot generates them and sets
off this warning.
Remove the warning as it is expected and not actionable.
The warning was previously reduced from WARN_ON to WARN_ON_ONCE in
commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad
proto callbacks").
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gm…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Libo Chen <libo.chen.cn(a)windriver.com>
---
net/ipv4/fou.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1d67df4d8ed6..b1a8e4eec3f6 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -453,7 +453,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
- if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive))
+ if (!ops || !ops->callbacks.gro_receive)
goto out;
pp = call_gro_receive(ops->callbacks.gro_receive, head, skb);
--
2.25.1
It's possible that dev_set_name() returns -ENOMEM. We could catch and
handle it by adding dev_set_name() return value check.
Cc: stable(a)vger.kernel.org
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Ma Ke <make_ruc2021(a)163.com>
---
drivers/ptp/ptp_clock.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 77a36e7bddd5..82405c07be3e 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -348,7 +348,9 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
ptp->dev.groups = ptp->pin_attr_groups;
ptp->dev.release = ptp_clock_release;
dev_set_drvdata(&ptp->dev, ptp);
- dev_set_name(&ptp->dev, "ptp%d", ptp->index);
+ err = dev_set_name(&ptp->dev, "ptp%d", ptp->index);
+ if (err)
+ goto no_pps;
/* Create a posix clock and link it to the device. */
err = posix_clock_register(&ptp->clock, &ptp->dev);
--
2.25.1
The patch titled
Subject: mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-dont-call-pfn_to_page-on-possibly-non-existent-pfn-in-split_large_buddy.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
Date: Tue, 10 Dec 2024 10:34:37 +0100
In split_large_buddy(), we might call pfn_to_page() on a PFN that might
not exist. In corner cases, such as when freeing the highest pageblock in
the last memory section, this could result with CONFIG_SPARSEMEM &&
!CONFIG_SPARSEMEM_EXTREME in __pfn_to_section() returning NULL and and
__section_mem_map_addr() dereferencing that NULL pointer.
Let's fix it, and avoid doing a pfn_to_page() call for the first
iteration, where we already have the page.
So far this was found by code inspection, but let's just CC stable as the
fix is easy.
Link: https://lkml.kernel.org/r/20241210093437.174413-1-david@redhat.com
Fixes: fd919a85cd55 ("mm: page_isolation: prepare for hygienic freelists")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Closes: https://lkml.kernel.org/r/e1a898ba-a717-4d20-9144-29df1a6c8813@suse.cz
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-dont-call-pfn_to_page-on-possibly-non-existent-pfn-in-split_large_buddy
+++ a/mm/page_alloc.c
@@ -1238,13 +1238,15 @@ static void split_large_buddy(struct zon
if (order > pageblock_order)
order = pageblock_order;
- while (pfn != end) {
+ do {
int mt = get_pfnblock_migratetype(page, pfn);
__free_one_page(page, pfn, zone, order, mt, fpi);
pfn += 1 << order;
+ if (pfn == end)
+ break;
page = pfn_to_page(pfn);
- }
+ } while (1);
}
static void free_one_page(struct zone *zone, struct page *page,
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-page_alloc-dont-call-pfn_to_page-on-possibly-non-existent-pfn-in-split_large_buddy.patch
docs-tmpfs-update-the-large-folios-policy-for-tmpfs-and-shmem.patch
mm-memory_hotplug-move-debug_pagealloc_map_pages-into-online_pages_range.patch
mm-page_isolation-dont-pass-gfp-flags-to-isolate_single_pageblock.patch
mm-page_isolation-dont-pass-gfp-flags-to-start_isolate_page_range.patch
mm-page_alloc-make-__alloc_contig_migrate_range-static.patch
mm-page_alloc-sort-out-the-alloc_contig_range-gfp-flags-mess.patch
mm-page_alloc-forward-the-gfp-flags-from-alloc_contig_range-to-post_alloc_hook.patch
powernv-memtrace-use-__gfp_zero-with-alloc_contig_pages.patch
mm-hugetlb-dont-map-folios-writable-without-vm_write-when-copying-during-fork.patch
fs-proc-vmcore-convert-vmcore_cb_lock-into-vmcore_mutex.patch
fs-proc-vmcore-replace-vmcoredd_mutex-by-vmcore_mutex.patch
fs-proc-vmcore-disallow-vmcore-modifications-while-the-vmcore-is-open.patch
fs-proc-vmcore-prefix-all-pr_-with-vmcore.patch
fs-proc-vmcore-move-vmcore-definitions-out-of-kcoreh.patch
fs-proc-vmcore-factor-out-allocating-a-vmcore-range-and-adding-it-to-a-list.patch
fs-proc-vmcore-factor-out-freeing-a-list-of-vmcore-ranges.patch
fs-proc-vmcore-introduce-proc_vmcore_device_ram-to-detect-device-ram-ranges-in-2nd-kernel.patch
virtio-mem-mark-device-ready-before-registering-callbacks-in-kdump-mode.patch
virtio-mem-remember-usable-region-size.patch
virtio-mem-support-config_proc_vmcore_device_ram.patch
s390-kdump-virtio-mem-kdump-support-config_proc_vmcore_device_ram.patch
mm-page_alloc-conditionally-split-pageblock_order-pages-in-free_one_page-and-move_freepages_block_isolate.patch
mm-page_isolation-fixup-isolate_single_pageblock-comment-regarding-splitting-free-pages.patch
mm-page_alloc-dont-use-__gfp_hardwall-when-migrating-pages-via-alloc_contig.patch
mm-memory_hotplug-dont-use-__gfp_hardwall-when-migrating-pages-via-memory-offlining.patch
Changes in v7:
- Expand commit log in patch #3
I've discussed with Bjorn on IRC and video what to put into the log here
and captured most of what we discussed.
Mostly the point here is voting for voltages in the power-domain list
is up to the drivers to do with performance states/opp-tables not for the
GDSC code. - Bjorn/Bryan
- Link to v6: https://lore.kernel.org/r/20241129-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v6:
- Passes NULL to second parameter of devm_pm_domain_attach_list - Vlad
- Link to v5: https://lore.kernel.org/r/20241128-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v5:
- In-lines devm_pm_domain_attach_list() in probe() directly - Vlad
- Link to v4: https://lore.kernel.org/r/20241127-b4-linux-next-24-11-18-clock-multiple-po…
v4:
- Adds Bjorn's RB to first patch - Bjorn
- Drops the 'd' in "and int" - Bjorn
- Amends commit log of patch 3 to capture a number of open questions -
Bjorn
- Link to v3: https://lore.kernel.org/r/20241126-b4-linux-next-24-11-18-clock-multiple-po…
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point I also verified my test branch with lockdep on which
was a concern with the pm_domain version of this solution but I wanted to
cover it anyway with the new approach for completeness sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses demv_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and it's SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds the support to attach a power-domain list to the
clock-controllers and the GDSCs those controllers provide so that in the
case of the above example gdsc_toggle_logic() will trigger the power-domain
list with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (3):
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
clk: qcom: Support attaching GDSCs to multiple parents
drivers/clk/qcom/common.c | 10 ++++++++++
drivers/clk/qcom/gdsc.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 50 insertions(+), 2 deletions(-)
---
base-commit: 744cf71b8bdfcdd77aaf58395e068b7457634b2c
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Before writing a new value to the register, the old value needs to be
masked out for the new value to be programmed as intended, because at
least in some cases the reset value of that field is 0xf (max value).
At the moment, the dwc3 core initialises the threshold to the maximum
value (0xf), with the option to override it via a DT. No upstream DTs
seem to override it, therefore this commit doesn't change behaviour for
any upstream platform. Nevertheless, the code should be fixed to have
the desired outcome.
Do so.
Fixes: 80caf7d21adc ("usb: dwc3: add lpm erratum support")
Cc: stable(a)vger.kernel.org # 5.10+ (needs adjustment for 5.4)
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
Changes in v2:
- change mask definitions to be consistent with other masks (Thinh)
- udpate commit message to clarify that in some cases the reset value
is != 0
- Link to v1: https://lore.kernel.org/r/20241206-dwc3-nyet-fix-v1-1-293bc74f644f@linaro.o…
---
For stable-5.4, the if() test is slightly different, so a separate
patch will be sent for it for the patch to apply.
---
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index ee73789326bc..f11570c8ffd0 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -464,6 +464,7 @@
#define DWC3_DCTL_TRGTULST_SS_INACT (DWC3_DCTL_TRGTULST(6))
/* These apply for core versions 1.94a and later */
+#define DWC3_DCTL_NYET_THRES_MASK (0xf << 20)
#define DWC3_DCTL_NYET_THRES(n) (((n) & 0xf) << 20)
#define DWC3_DCTL_KEEP_CONNECT BIT(19)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 83dc7304d701..31a654c6f15b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4195,8 +4195,10 @@ static void dwc3_gadget_conndone_interrupt(struct dwc3 *dwc)
WARN_ONCE(DWC3_VER_IS_PRIOR(DWC3, 240A) && dwc->has_lpm_erratum,
"LPM Erratum not available on dwc3 revisions < 2.40a\n");
- if (dwc->has_lpm_erratum && !DWC3_VER_IS_PRIOR(DWC3, 240A))
+ if (dwc->has_lpm_erratum && !DWC3_VER_IS_PRIOR(DWC3, 240A)) {
+ reg &= ~DWC3_DCTL_NYET_THRES_MASK;
reg |= DWC3_DCTL_NYET_THRES(dwc->lpm_nyet_threshold);
+ }
dwc3_gadget_dctl_write_safe(dwc, reg);
} else {
---
base-commit: c245a7a79602ccbee780c004c1e4abcda66aec32
change-id: 20241206-dwc3-nyet-fix-7085f6d71d04
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
Starting Zen4, AMD SOCs have 12 Unified Memory Controllers (UMCs) per
socket.
When the amd64_edac module is being loaded, these UMCs are traversed to
determine if they have SdpInit (SdpCtrl[31]) and EccEnabled (UmcCapHi[30])
bits set and create masks in umc_en_mask and ecc_en_mask respectively.
However, the current data type of these variables is u8. As a result, if
only the last 4 UMCs (UMC8 - UMC11) of the system have been utilized,
umc_ecc_enabled() will return false. Consequently, the module may fail to
load on these systems.
Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh")
Signed-off-by: Avadhut Naik <avadhut.naik(a)amd.com>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
1. Change data type of variables from u16 to int. (Boris)
2. Modify commit message per feedback. (Boris)
3. Add Fixes: and CC:stable tags. (Boris)
---
drivers/edac/amd64_edac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ddfbdb66b794..b1c034214a8d 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3362,7 +3362,7 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt)
static bool umc_ecc_enabled(struct amd64_pvt *pvt)
{
- u8 umc_en_mask = 0, ecc_en_mask = 0;
+ int umc_en_mask = 0, ecc_en_mask = 0;
u16 nid = pvt->mc_node_id;
struct amd64_umc *umc;
u8 ecc_en = 0, i;
base-commit: f84722cbed6c2b2094ad8bbe48be2c5900752935
--
2.43.0
In split_large_buddy(), we might call pfn_to_page() on a PFN that might
not exist. In corner cases, such as when freeing the highest pageblock in
the last memory section, this could result with CONFIG_SPARSEMEM &&
!CONFIG_SPARSEMEM_EXTREME in __pfn_to_section() returning NULL and
and __section_mem_map_addr() dereferencing that NULL pointer.
Let's fix it, and avoid doing a pfn_to_page() call for the first
iteration, where we already have the page.
So far this was found by code inspection, but let's just CC stable as
the fix is easy.
Fixes: fd919a85cd55 ("mm: page_isolation: prepare for hygienic freelists")
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Closes: https://lkml.kernel.org/r/e1a898ba-a717-4d20-9144-29df1a6c8813@suse.cz
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/page_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48a291c485df4..a52c6022c65cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1238,13 +1238,15 @@ static void split_large_buddy(struct zone *zone, struct page *page,
if (order > pageblock_order)
order = pageblock_order;
- while (pfn != end) {
+ do {
int mt = get_pfnblock_migratetype(page, pfn);
__free_one_page(page, pfn, zone, order, mt, fpi);
pfn += 1 << order;
+ if (pfn == end)
+ break;
page = pfn_to_page(pfn);
- }
+ } while (1);
}
static void free_one_page(struct zone *zone, struct page *page,
--
2.47.1
Recently we found the fifo_read() and fifo_write() are broken in our
5.15 and 5.4 kernels after cherry-pick the commit e635f652696e
("serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions
for FIFO"), that is because the reg needs to shift if we don't
cherry-pick a prerequisite commit 3837a0379533 ("serial: sc16is7xx:
improve regmap debugfs by using one regmap per port").
It is hard to backport the prerequisite commit to 5.15.y and 5.10.y
due to the significant conflict. To be safe, here fix it by shifting
the reg as regmap_volatile() does.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
in the V2:
add Cc:
fix a typo on prerequisite
add explanation for not backporting the prerequisite patch to 5.15.y and 5.10.y
drivers/tty/serial/sc16is7xx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index d274a847c6ab..87e34099f369 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -487,7 +487,14 @@ static bool sc16is7xx_regmap_precious(struct device *dev, unsigned int reg)
static bool sc16is7xx_regmap_noinc(struct device *dev, unsigned int reg)
{
- return reg == SC16IS7XX_RHR_REG;
+ switch (reg >> SC16IS7XX_REG_SHIFT) {
+ case SC16IS7XX_RHR_REG:
+ return true;
+ default:
+ break;
+ }
+
+ return false;
}
/*
--
2.34.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7912405643a14b527cd4a4f33c1d4392da900888
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121043-moneyless-stucco-c8f1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7912405643a14b527cd4a4f33c1d4392da900888 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 1 Dec 2024 12:17:30 +0100
Subject: [PATCH] modpost: Add .irqentry.text to OTHER_SECTIONS
The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:
WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
The relocation at __ex_table+0x447c references section ".irqentry.text"
which is not in the list of authorized sections.
Add .irqentry.text to OTHER_SECTIONS to cure the issue.
Reported-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 0584cbcdbd2d..fb787a5715f5 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -772,7 +772,7 @@ static void check_section(const char *modname, struct elf_info *elf,
".ltext", ".ltext.*"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", \
- ".coldtext", ".softirqentry.text"
+ ".coldtext", ".softirqentry.text", ".irqentry.text"
#define ALL_TEXT_SECTIONS ".init.text", ".exit.text", \
TEXT_SECTIONS, OTHER_TEXT_SECTIONS
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
[ Upstream commit bfa54a793ba77ef696755b66f3ac4ed00c7d1248 ]
For bus_register(), any error which happens after kset_register() will
cause that @priv are freed twice, fixed by setting @priv with NULL after
the first free.
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/r/20240727-bus_register_fix-v1-1-fed8dd0dba7a@quici…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Using bus->p instead of priv which will be consistent with the context ]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/base/bus.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 339a9edcde5f..028e45d2f7fe 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -853,6 +853,8 @@ int bus_register(struct bus_type *bus)
bus_remove_file(bus, &bus_attr_uevent);
bus_uevent_fail:
kset_unregister(&bus->p->subsys);
+ /* Above kset_unregister() will kfree @priv */
+ bus->p = NULL;
out:
kfree(bus->p);
bus->p = NULL;
--
2.25.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7912405643a14b527cd4a4f33c1d4392da900888
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121042-refined-ducky-6392@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7912405643a14b527cd4a4f33c1d4392da900888 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 1 Dec 2024 12:17:30 +0100
Subject: [PATCH] modpost: Add .irqentry.text to OTHER_SECTIONS
The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:
WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
The relocation at __ex_table+0x447c references section ".irqentry.text"
which is not in the list of authorized sections.
Add .irqentry.text to OTHER_SECTIONS to cure the issue.
Reported-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 0584cbcdbd2d..fb787a5715f5 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -772,7 +772,7 @@ static void check_section(const char *modname, struct elf_info *elf,
".ltext", ".ltext.*"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", \
- ".coldtext", ".softirqentry.text"
+ ".coldtext", ".softirqentry.text", ".irqentry.text"
#define ALL_TEXT_SECTIONS ".init.text", ".exit.text", \
TEXT_SECTIONS, OTHER_TEXT_SECTIONS
From: Gary Guo <gary(a)garyguo.net>
[ Upstream commit c95bbb59a9b22f9b838b15d28319185c1c884329 ]
The term "receiver" means that a type can be used as the type of `self`,
and thus enables method call syntax `foo.bar()` instead of
`Foo::bar(foo)`. Stable Rust as of today (1.81) enables a limited
selection of types (primitives and types in std, e.g. `Box` and `Arc`)
to be used as receivers, while custom types cannot.
We want the kernel `Arc` type to have the same functionality as the Rust
std `Arc`, so we use the `Receiver` trait (gated behind `receiver_trait`
unstable feature) to gain the functionality.
The `arbitrary_self_types` RFC [1] (tracking issue [2]) is accepted and
it will allow all types that implement a new `Receiver` trait (different
from today's unstable trait) to be used as receivers. This trait will be
automatically implemented for all `Deref` types, which include our `Arc`
type, so we no longer have to opt-in to be used as receiver. To prepare
us for the change, remove the `Receiver` implementation and the
associated feature. To still allow `Arc` and others to be used as method
receivers, turn on `arbitrary_self_types` feature instead.
This feature gate is introduced in 1.23.0. It used to enable both
`Deref` types and raw pointer types to be used as receivers, but the
latter is now split into a different feature gate in Rust 1.83 nightly.
We do not need receivers on raw pointers so this change would not affect
us and usage of `arbitrary_self_types` feature would work for all Rust
versions that we support (>=1.78).
Cc: Adrian Taylor <ade(a)hohum.me.uk>
Link: https://github.com/rust-lang/rfcs/pull/3519 [1]
Link: https://github.com/rust-lang/rust/issues/44874 [2]
Signed-off-by: Gary Guo <gary(a)garyguo.net>
Reviewed-by: Benno Lossin <benno.lossin(a)proton.me>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Link: https://lore.kernel.org/r/20240915132734.1653004-1-gary@garyguo.net
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
Please consider applying to 6.12.y.
It is meant to support the upcoming Rust 1.84.0 compiler (to be
released in a month), since 6.12 LTS is the first stable kernel that
supports a minimum Rust version, thus users may use newer compilers.
Older LTSs do not need it, for that reason.
rust/kernel/lib.rs | 2 +-
rust/kernel/list/arc.rs | 3 ---
rust/kernel/sync/arc.rs | 6 ------
scripts/Makefile.build | 2 +-
4 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 032c9089e686..e936254531fd 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -12,10 +12,10 @@
//! do so first instead of bypassing this crate.
#![no_std]
+#![feature(arbitrary_self_types)]
#![feature(coerce_unsized)]
#![feature(dispatch_from_dyn)]
#![feature(new_uninit)]
-#![feature(receiver_trait)]
#![feature(unsize)]
// Ensure conditional compilation based on the kernel configuration works;
diff --git a/rust/kernel/list/arc.rs b/rust/kernel/list/arc.rs
index d801b9dc6291..3483d8c232c4 100644
--- a/rust/kernel/list/arc.rs
+++ b/rust/kernel/list/arc.rs
@@ -441,9 +441,6 @@ fn as_ref(&self) -> &Arc<T> {
}
}
-// This is to allow [`ListArc`] (and variants) to be used as the type of `self`.
-impl<T, const ID: u64> core::ops::Receiver for ListArc<T, ID> where T: ListArcSafe<ID> + ?Sized {}
-
// This is to allow coercion from `ListArc<T>` to `ListArc<U>` if `T` can be converted to the
// dynamically-sized type (DST) `U`.
impl<T, U, const ID: u64> core::ops::CoerceUnsized<ListArc<U, ID>> for ListArc<T, ID>
diff --git a/rust/kernel/sync/arc.rs b/rust/kernel/sync/arc.rs
index 3021f30fd822..28743a7c74a8 100644
--- a/rust/kernel/sync/arc.rs
+++ b/rust/kernel/sync/arc.rs
@@ -171,9 +171,6 @@ unsafe fn container_of(ptr: *const T) -> NonNull<ArcInner<T>> {
}
}
-// This is to allow [`Arc`] (and variants) to be used as the type of `self`.
-impl<T: ?Sized> core::ops::Receiver for Arc<T> {}
-
// This is to allow coercion from `Arc<T>` to `Arc<U>` if `T` can be converted to the
// dynamically-sized type (DST) `U`.
impl<T: ?Sized + Unsize<U>, U: ?Sized> core::ops::CoerceUnsized<Arc<U>> for Arc<T> {}
@@ -480,9 +477,6 @@ pub struct ArcBorrow<'a, T: ?Sized + 'a> {
_p: PhantomData<&'a ()>,
}
-// This is to allow [`ArcBorrow`] (and variants) to be used as the type of `self`.
-impl<T: ?Sized> core::ops::Receiver for ArcBorrow<'_, T> {}
-
// This is to allow `ArcBorrow<U>` to be dispatched on when `ArcBorrow<T>` can be coerced into
// `ArcBorrow<U>`.
impl<T: ?Sized + Unsize<U>, U: ?Sized> core::ops::DispatchFromDyn<ArcBorrow<'_, U>>
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 8f423a1faf50..880785b52c04 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -248,7 +248,7 @@ $(obj)/%.lst: $(obj)/%.c FORCE
# Compile Rust sources (.rs)
# ---------------------------------------------------------------------------
-rust_allowed_features := new_uninit
+rust_allowed_features := arbitrary_self_types,new_uninit
# `--out-dir` is required to avoid temporaries being created by `rustc` in the
# current working directory, which may be not accessible in the out-of-tree
base-commit: 61baee2dc5341c936e7fa7b1ca33c5607868de69
--
2.47.1
From: Pratyush Brahma <quic_pbrahma(a)quicinc.com>
[ Upstream commit 229e6ee43d2a160a1592b83aad620d6027084aad ]
Null pointer dereference occurs due to a race between smmu
driver probe and client driver probe, when of_dma_configure()
for client is called after the iommu_device_register() for smmu driver
probe has executed but before the driver_bound() for smmu driver
has been called.
Following is how the race occurs:
T1:Smmu device probe T2: Client device probe
really_probe()
arm_smmu_device_probe()
iommu_device_register()
really_probe()
platform_dma_configure()
of_dma_configure()
of_dma_configure_id()
of_iommu_configure()
iommu_probe_device()
iommu_init_device()
arm_smmu_probe_device()
arm_smmu_get_by_fwnode()
driver_find_device_by_fwnode()
driver_find_device()
next_device()
klist_next()
/* null ptr
assigned to smmu */
/* null ptr dereference
while smmu->streamid_mask */
driver_bound()
klist_add_tail()
When this null smmu pointer is dereferenced later in
arm_smmu_probe_device, the device crashes.
Fix this by deferring the probe of the client device
until the smmu device has bound to the arm smmu driver.
Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
Cc: stable(a)vger.kernel.org # 6.6
Co-developed-by: Prakash Gupta <quic_guptap(a)quicinc.com>
Signed-off-by: Prakash Gupta <quic_guptap(a)quicinc.com>
Signed-off-by: Pratyush Brahma <quic_pbrahma(a)quicinc.com>
Link: https://lore.kernel.org/r/20241004090428.2035-1-quic_pbrahma@quicinc.com
[will: Add comment]
Signed-off-by: Will Deacon <will(a)kernel.org>
[rm: backport for context conflict prior to 6.8]
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
---
drivers/iommu/arm/arm-smmu/arm-smmu.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d6d1a2a55cc0..42c5012ba8aa 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1359,6 +1359,17 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
goto out_free;
} else if (fwspec && fwspec->ops == &arm_smmu_ops) {
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+
+ /*
+ * Defer probe if the relevant SMMU instance hasn't finished
+ * probing yet. This is a fragile hack and we'd ideally
+ * avoid this race in the core code. Until that's ironed
+ * out, however, this is the most pragmatic option on the
+ * table.
+ */
+ if (!smmu)
+ return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
+ "smmu dev has not bound yet\n"));
} else {
return ERR_PTR(-ENODEV);
}
--
2.39.2.101.g768bb238c484.dirty
Split resume into a 3rd step to handle displays when DCC is
enabled on DCN 4.0.1. Move display after the buffer funcs
have been re-enabled so that the GPU will do the move and
properly set the DCC metadata for DCN.
v2: fix fence irq resume ordering
Backport to 6.12 and older kernels
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.11.x
(cherry picked from commit 73dae652dcac776296890da215ee7dec357a1032)
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 +++++++++++++++++++++-
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1f08cb88d51b..66890f8a7670 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3666,7 +3666,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
*
* @adev: amdgpu_device pointer
*
- * First resume function for hardware IPs. The list of all the hardware
+ * Second resume function for hardware IPs. The list of all the hardware
* IPs that make up the asic is walked and the resume callbacks are run for
* all blocks except COMMON, GMC, and IH. resume puts the hardware into a
* functional state after a suspend and updates the software state as
@@ -3684,6 +3684,7 @@ static int amdgpu_device_ip_resume_phase2(struct amdgpu_device *adev)
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH ||
+ adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP)
continue;
r = adev->ip_blocks[i].version->funcs->resume(adev);
@@ -3698,6 +3699,36 @@ static int amdgpu_device_ip_resume_phase2(struct amdgpu_device *adev)
return 0;
}
+/**
+ * amdgpu_device_ip_resume_phase3 - run resume for hardware IPs
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Third resume function for hardware IPs. The list of all the hardware
+ * IPs that make up the asic is walked and the resume callbacks are run for
+ * all DCE. resume puts the hardware into a functional state after a suspend
+ * and updates the software state as necessary. This function is also used
+ * for restoring the GPU after a GPU reset.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int amdgpu_device_ip_resume_phase3(struct amdgpu_device *adev)
+{
+ int i, r;
+
+ for (i = 0; i < adev->num_ip_blocks; i++) {
+ if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
+ continue;
+ if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
+ r = adev->ip_blocks[i].version->funcs->resume(adev);
+ if (r)
+ return r;
+ }
+ }
+
+ return 0;
+}
+
/**
* amdgpu_device_ip_resume - run resume for hardware IPs
*
@@ -3727,6 +3758,13 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
if (adev->mman.buffer_funcs_ring->sched.ready)
amdgpu_ttm_set_buffer_funcs_status(adev, true);
+ if (r)
+ return r;
+
+ amdgpu_fence_driver_hw_init(adev);
+
+ r = amdgpu_device_ip_resume_phase3(adev);
+
return r;
}
@@ -4809,7 +4847,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
dev_err(adev->dev, "amdgpu_device_ip_resume failed (%d).\n", r);
goto exit;
}
- amdgpu_fence_driver_hw_init(adev);
if (!adev->in_s0ix) {
r = amdgpu_amdkfd_resume(adev, adev->in_runpm);
@@ -5431,6 +5468,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
+ r = amdgpu_device_ip_resume_phase3(tmp_adev);
+ if (r)
+ goto out;
+
if (vram_lost)
amdgpu_device_fill_reset_magic(tmp_adev);
--
2.47.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 7912405643a14b527cd4a4f33c1d4392da900888
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121044-elbow-varmint-5f14@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7912405643a14b527cd4a4f33c1d4392da900888 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Sun, 1 Dec 2024 12:17:30 +0100
Subject: [PATCH] modpost: Add .irqentry.text to OTHER_SECTIONS
The compiler can fully inline the actual handler function of an interrupt
entry into the .irqentry.text entry point. If such a function contains an
access which has an exception table entry, modpost complains about a
section mismatch:
WARNING: vmlinux.o(__ex_table+0x447c): Section mismatch in reference ...
The relocation at __ex_table+0x447c references section ".irqentry.text"
which is not in the list of authorized sections.
Add .irqentry.text to OTHER_SECTIONS to cure the issue.
Reported-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org # needed for linux-5.4-y
Link: https://lore.kernel.org/all/20241128111844.GE10431@google.com/
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 0584cbcdbd2d..fb787a5715f5 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -772,7 +772,7 @@ static void check_section(const char *modname, struct elf_info *elf,
".ltext", ".ltext.*"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", \
- ".coldtext", ".softirqentry.text"
+ ".coldtext", ".softirqentry.text", ".irqentry.text"
#define ALL_TEXT_SECTIONS ".init.text", ".exit.text", \
TEXT_SECTIONS, OTHER_TEXT_SECTIONS
From: Kaixin Wang <kxwang23(a)m.fudan.edu.cn>
[ Upstream commit 609366e7a06d035990df78f1562291c3bf0d4a12 ]
In the cdns_i3c_master_probe function, &master->hj_work is bound with
cdns_i3c_master_hj. And cdns_i3c_master_interrupt can call
cnds_i3c_master_demux_ibis function to start the work.
If we remove the module which will call cdns_i3c_master_remove to
make cleanup, it will free master->base through i3c_master_unregister
while the work mentioned above will be used. The sequence of operations
that may lead to a UAF bug is as follows:
CPU0 CPU1
| cdns_i3c_master_hj
cdns_i3c_master_remove |
i3c_master_unregister(&master->base) |
device_unregister(&master->dev) |
device_release |
//free master->base |
| i3c_master_do_daa(&master->base)
| //use master->base
Fix it by ensuring that the work is canceled before proceeding with
the cleanup in cdns_i3c_master_remove.
Signed-off-by: Kaixin Wang <kxwang23(a)m.fudan.edu.cn>
Link: https://lore.kernel.org/r/20240911153544.848398-1-kxwang23@m.fudan.edu.cn
Signed-off-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
---
drivers/i3c/master/i3c-master-cdns.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/i3c/master/i3c-master-cdns.c b/drivers/i3c/master/i3c-master-cdns.c
index b9cfda6ae9ae..4473c0b1ae2e 100644
--- a/drivers/i3c/master/i3c-master-cdns.c
+++ b/drivers/i3c/master/i3c-master-cdns.c
@@ -1668,6 +1668,7 @@ static int cdns_i3c_master_remove(struct platform_device *pdev)
struct cdns_i3c_master *master = platform_get_drvdata(pdev);
int ret;
+ cancel_work_sync(&master->hj_work);
ret = i3c_master_unregister(&master->base);
if (ret)
return ret;
--
2.25.1
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit ad28d7c3d989fc5689581664653879d664da76f0 ]
[Why & How]
It actually exposes '6' types in enum dmub_notification_type. Not 5. Using smaller
number to create array dmub_callback & dmub_thread_offload has potential to access
item out of array bound. Fix it.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Cherry-pick from 9f404b0bc2df3880758fb3c3bc7496f596f347d7
CVE-2024-46871
Signed-off-by: Guocai He <guocai.he.cn(a)windriver.com>
---
This commit is backporting 9f404b0bc2df3880758fb3c3bc7496f596f347d7 to the branch linux-5.15.y to
solve the CVE-2024-46871. Please merge this commit to linux-5.15.y.
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index f9c3e5a41713..3102ade85b55 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -48,7 +48,7 @@
#define AMDGPU_DM_MAX_NUM_EDP 2
-#define AMDGPU_DMUB_NOTIFICATION_MAX 5
+#define AMDGPU_DMUB_NOTIFICATION_MAX 6
/*
#include "include/amdgpu_dal_power_if.h"
#include "amdgpu_dm_irq.h"
--
2.34.1
Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU
protection. But it is possible to attach a non-sleepable BPF program to a
uprobe, and non-sleepable BPF programs are freed via normal RCU (see
__bpf_prog_put_noref()). This leads to UAF of the bpf_prog because a normal
RCU grace period does not imply a tasks-trace-RCU grace period.
Fix it by explicitly waiting for a tasks-trace-RCU grace period after
removing the attachment of a bpf_prog to a perf_event.
Cc: stable(a)vger.kernel.org
Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Suggested-by: Andrii Nakryiko <andrii(a)kernel.org>
Suggested-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Jann Horn <jannh(a)google.com>
---
kernel/trace/bpf_trace.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 949a3870946c381820e8fa7194851b84593d17d9..a403b05a7091384fb08e8c47ed02fad79c1a4874 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2258,6 +2258,13 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
bpf_prog_array_free_sleepable(old_array);
}
+ /*
+ * It could be that the bpf_prog is not sleepable (and will be freed
+ * via normal RCU), but is called from a point that supports sleepable
+ * programs and uses tasks-trace-RCU.
+ */
+ synchronize_rcu_tasks_trace();
+
bpf_prog_put(event->prog);
event->prog = NULL;
---
base-commit: 509df676c2d79c985ec2eaa3e3a3bbe557645861
change-id: 20241210-bpf-fix-actual-uprobe-uaf-0aa234c0e005
--
Jann Horn <jannh(a)google.com>
The commit f9b11229b79c ("serial: 8250: Fix PM usage_count for console
handover") fixed one runtime PM usage counter balance problem that
occurs because .dev is not set during univ8250 setup preventing call to
pm_runtime_get_sync(). Later, univ8250_console_exit() will trigger the
runtime PM usage counter underflow as .dev is already set at that time.
Call pm_runtime_get_sync() to balance the RPM usage counter also in
serial8250_register_8250_port() before trying to add the port.
Reported-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Fixes: bedb404e91bb ("serial: 8250_port: Don't use power management for kernel console")
Cc: <stable(a)vger.kernel.org>
Tested-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
v2:
- Added tags
drivers/tty/serial/8250/8250_core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 5f9f06911795..68baf75bdadc 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -812,6 +812,9 @@ int serial8250_register_8250_port(const struct uart_8250_port *up)
uart->dl_write = up->dl_write;
if (uart->port.type != PORT_8250_CIR) {
+ if (uart_console_registered(&uart->port))
+ pm_runtime_get_sync(uart->port.dev);
+
if (serial8250_isa_config != NULL)
serial8250_isa_config(0, &uart->port,
&uart->capabilities);
--
2.39.5
Ensure a non-interruptible wait is used when moving a bo to
XE_PL_SYSTEM. This prevents dma_mappings from being removed prematurely
while a GPU job is still in progress, even if the CPU receives a
signal during the operation.
Fixes: 75521e8b56e8 ("drm/xe: Perform dma_map when moving system buffer objects to TT")
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.11+
Suggested-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 73689dd7d672..b2aa368a23f8 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -733,7 +733,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
new_mem->mem_type == XE_PL_SYSTEM) {
long timeout = dma_resv_wait_timeout(ttm_bo->base.resv,
DMA_RESV_USAGE_BOOKKEEP,
- true,
+ false,
MAX_SCHEDULE_TIMEOUT);
if (timeout < 0) {
ret = timeout;
--
2.46.0
Ensure a non-interruptible wait is used when moving a bo to
XE_PL_SYSTEM. This prevents dma_mappings from being removed prematurely
while a GPU job is still in progress, even if the CPU receives a
signal during the operation.
Fixes: 75521e8b56e8 ("drm/xe: Perform dma_map when moving system buffer objects to TT")
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.11+
Suggested-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 283cd0294570..06931df876ab 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -733,7 +733,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
new_mem->mem_type == XE_PL_SYSTEM) {
long timeout = dma_resv_wait_timeout(ttm_bo->base.resv,
DMA_RESV_USAGE_BOOKKEEP,
- true,
+ false,
MAX_SCHEDULE_TIMEOUT);
if (timeout < 0) {
ret = timeout;
--
2.46.0
From: Joaquín Ignacio Aramendía <samsagax(a)gmail.com>
[ Upstream commit 361ebf5ef843b0aa1704c72eb26b91cf76c3c5b7 ]
Add quirk orientation for AYA NEO 2. The name appears without spaces in
DMI strings. That made it difficult to reuse the 2021 match. Also the
display is larger in resolution.
Tested by the JELOS team that has been patching their own kernel for a
while now and confirmed by users in the AYA NEO and ChimeraOS discord
servers.
Signed-off-by: Joaquín Ignacio Aramendía <samsagax(a)gmail.com>
Signed-off-by: Tobias Jakobi <tjakobi(a)math.uni-bielefeld.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2b35545b77a9fd8c9699b751ca282…
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c
index f1091cb87de0c..bf90a5be956fe 100644
--- a/drivers/gpu/drm/drm_panel_orientation_quirks.c
+++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c
@@ -166,6 +166,12 @@ static const struct dmi_system_id orientation_data[] = {
DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T103HAF"),
},
.driver_data = (void *)&lcd800x1280_rightside_up,
+ }, { /* AYA NEO AYANEO 2 */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "AYANEO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "AYANEO 2"),
+ },
+ .driver_data = (void *)&lcd1200x1920_rightside_up,
}, { /* AYA NEO 2021 */
.matches = {
DMI_EXACT_MATCH(DMI_SYS_VENDOR, "AYADEVICE"),
--
2.43.0
From: Xu Yang <xu.yang_2(a)nxp.com>
[ Upstream commit ec841b8d73cff37f8960e209017efe1eb2fb21f2 ]
Currently, the imx deivice controller has below limitations:
1. can't generate short packet interrupt if IOC not set in dTD. So if one
request span more than one dTDs and only the last dTD set IOC, the usb
request will pending there if no more data comes.
2. the controller can't accurately deliver data to differtent usb requests
in some cases due to short packet. For example: one usb request span 3
dTDs, then if the controller received a short packet the next packet
will go to 2nd dTD of current request rather than the first dTD of next
request.
3. can't build a bus packet use multiple dTDs. For example: controller
needs to send one packet of 512 bytes use dTD1 (200 bytes) + dTD2
(312 bytes), actually the host side will see 200 bytes short packet.
Based on these limits, add CI_HDRC_HAS_SHORT_PKT_LIMIT flag and use it on
imx platforms.
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20240923081203.2851768-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/usb/chipidea/ci.h | 1 +
drivers/usb/chipidea/ci_hdrc_imx.c | 1 +
drivers/usb/chipidea/core.c | 2 ++
include/linux/usb/chipidea.h | 1 +
4 files changed, 5 insertions(+)
diff --git a/drivers/usb/chipidea/ci.h b/drivers/usb/chipidea/ci.h
index 2a38e1eb65466..e4b003d060c26 100644
--- a/drivers/usb/chipidea/ci.h
+++ b/drivers/usb/chipidea/ci.h
@@ -260,6 +260,7 @@ struct ci_hdrc {
bool b_sess_valid_event;
bool imx28_write_fix;
bool has_portsc_pec_bug;
+ bool has_short_pkt_limit;
bool supports_runtime_pm;
bool in_lpm;
bool wakeup_int;
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index bdc04ce919f7a..b76e7c3fa2c6e 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -342,6 +342,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
struct ci_hdrc_platform_data pdata = {
.name = dev_name(&pdev->dev),
.capoffset = DEF_CAPOFFSET,
+ .flags = CI_HDRC_HAS_SHORT_PKT_LIMIT,
.notify_event = ci_hdrc_imx_notify_event,
};
int ret;
diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
index 835bf2428dc6e..5aa16dbfc289c 100644
--- a/drivers/usb/chipidea/core.c
+++ b/drivers/usb/chipidea/core.c
@@ -1076,6 +1076,8 @@ static int ci_hdrc_probe(struct platform_device *pdev)
CI_HDRC_SUPPORTS_RUNTIME_PM);
ci->has_portsc_pec_bug = !!(ci->platdata->flags &
CI_HDRC_HAS_PORTSC_PEC_MISSED);
+ ci->has_short_pkt_limit = !!(ci->platdata->flags &
+ CI_HDRC_HAS_SHORT_PKT_LIMIT);
platform_set_drvdata(pdev, ci);
ret = hw_device_init(ci, base);
diff --git a/include/linux/usb/chipidea.h b/include/linux/usb/chipidea.h
index 5a7f96684ea22..ebdfef124b2bc 100644
--- a/include/linux/usb/chipidea.h
+++ b/include/linux/usb/chipidea.h
@@ -65,6 +65,7 @@ struct ci_hdrc_platform_data {
#define CI_HDRC_PHY_VBUS_CONTROL BIT(16)
#define CI_HDRC_HAS_PORTSC_PEC_MISSED BIT(17)
#define CI_HDRC_FORCE_VBUS_ACTIVE_ALWAYS BIT(18)
+#define CI_HDRC_HAS_SHORT_PKT_LIMIT BIT(19)
enum usb_dr_mode dr_mode;
#define CI_HDRC_CONTROLLER_RESET_EVENT 0
#define CI_HDRC_CONTROLLER_STOPPED_EVENT 1
--
2.43.0
From: Arnd Bergmann <arnd(a)arndb.de>
An older cleanup of mine inadvertently removed geode-gx1 and geode-lx
from the list of CPUs that are known to support a working cmpxchg8b.
Fixes: 88a2b4edda3d ("x86/Kconfig: Rework CONFIG_X86_PAE dependency")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
arch/x86/Kconfig.cpu | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2a7279d80460..42e6a40876ea 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -368,7 +368,7 @@ config X86_HAVE_PAE
config X86_CMPXCHG64
def_bool y
- depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7
+ depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
# this should be set for all -march=.. options where the compiler
# generates cmov.
--
2.39.5
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x f5807b0606da7ac7c1b74a386b22134ec7702d05
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024120622-enamel-avenge-5621@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5807b0606da7ac7c1b74a386b22134ec7702d05 Mon Sep 17 00:00:00 2001
From: Marcelo Dalmas <marcelo.dalmas(a)ge.com>
Date: Mon, 25 Nov 2024 12:16:09 +0000
Subject: [PATCH] ntp: Remove invalid cast in time offset math
Due to an unsigned cast, adjtimex() returns the wrong offest when using
ADJ_MICRO and the offset is negative. In this case a small negative offset
returns approximately 4.29 seconds (~ 2^32/1000 milliseconds) due to the
unsigned cast of the negative offset.
This cast was added when the kernel internal struct timex was changed to
use type long long for the time offset value to address the problem of a
64bit/32bit division on 32bit systems.
The correct cast would have been (s32), which is correct as time_offset can
only be in the range of [INT_MIN..INT_MAX] because the shift constant used
for calculating it is 32. But that's non-obvious.
Remove the cast and use div_s64() to cure the issue.
[ tglx: Fix white space damage, use div_s64() and amend the change log ]
Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Marcelo Dalmas <marcelo.dalmas(a)ge.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/SJ0P101MB03687BF7D5A10FD3C49C51E5F42E2@SJ0P101M…
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index b550ebe0f03b..163e7a2033b6 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -798,7 +798,7 @@ int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts,
txc->offset = shift_right(ntpdata->time_offset * NTP_INTERVAL_FREQ, NTP_SCALE_SHIFT);
if (!(ntpdata->time_status & STA_NANO))
- txc->offset = (u32)txc->offset / NSEC_PER_USEC;
+ txc->offset = div_s64(txc->offset, NSEC_PER_USEC);
}
result = ntpdata->time_state;
From: Arnd Bergmann <arnd(a)arndb.de>
An older cleanup of mine inadvertently removed geode-gx1 and geode-lx
from the list of CPUs that are known to support a working cmpxchg8b.
Fixes: 88a2b4edda3d ("x86/Kconfig: Rework CONFIG_X86_PAE dependency")
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
arch/x86/Kconfig.cpu | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 2a7279d80460..42e6a40876ea 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -368,7 +368,7 @@ config X86_HAVE_PAE
config X86_CMPXCHG64
def_bool y
- depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7
+ depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
# this should be set for all -march=.. options where the compiler
# generates cmov.
--
2.39.5
Hi Carlos,
Please pull this branch with changes for xfs against 6.13-rc1.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
--D
The following changes since commit feffde684ac29a3b7aec82d2df850fbdbdee55e4:
Merge tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux (2024-12-03 11:02:17 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/fixes-6.13_2024-12-03
for you to fetch changes up to 641a4a63154885060b7900f86bb45582f07e964b:
xfs: fix sb_spino_align checks for large fsblock sizes (2024-12-03 18:54:49 -0800)
----------------------------------------------------------------
xfs: bug fixes for 6.13 [v2 1/2]
Here are some bugfixes for 6.13 that have been accumulating since 6.12
was released.
With a bit of luck, this should all go splendidly.
Signed-off-by: "Darrick J. Wong" <djwong(a)kernel.org>
----------------------------------------------------------------
Darrick J. Wong (22):
xfs: fix off-by-one error in fsmap's end_daddr usage
xfs: metapath scrubber should use the already loaded inodes
xfs: keep quota directory inode loaded
xfs: return a 64-bit block count from xfs_btree_count_blocks
xfs: don't drop errno values when we fail to ficlone the entire range
xfs: separate healthy clearing mask during repair
xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
xfs: mark metadir repair tempfiles with IRECOVERY
xfs: fix null bno_hint handling in xfs_rtallocate_rtg
xfs: fix error bailout in xfs_rtginode_create
xfs: update btree keys correctly when _insrec splits an inode root block
xfs: fix scrub tracepoints when inode-rooted btrees are involved
xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
xfs: only run precommits once per transaction object
xfs: avoid nested calls to __xfs_trans_commit
xfs: don't lose solo superblock counter update transactions
xfs: don't lose solo dquot update transactions
xfs: separate dquot buffer reads from xfs_dqflush
xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
xfs: attach dquot buffer to dquot log item buffer
xfs: convert quotacheck to attach dquot buffers
xfs: fix sb_spino_align checks for large fsblock sizes
fs/xfs/libxfs/xfs_btree.c | 33 +++++--
fs/xfs/libxfs/xfs_btree.h | 2 +-
fs/xfs/libxfs/xfs_ialloc_btree.c | 4 +-
fs/xfs/libxfs/xfs_rtgroup.c | 2 +-
fs/xfs/libxfs/xfs_sb.c | 11 ++-
fs/xfs/scrub/agheader.c | 6 +-
fs/xfs/scrub/agheader_repair.c | 6 +-
fs/xfs/scrub/fscounters.c | 2 +-
fs/xfs/scrub/health.c | 57 +++++++-----
fs/xfs/scrub/ialloc.c | 4 +-
fs/xfs/scrub/metapath.c | 68 ++++++--------
fs/xfs/scrub/refcount.c | 2 +-
fs/xfs/scrub/scrub.h | 6 ++
fs/xfs/scrub/symlink_repair.c | 3 +-
fs/xfs/scrub/tempfile.c | 10 +-
fs/xfs/scrub/trace.h | 2 +-
fs/xfs/xfs_bmap_util.c | 2 +-
fs/xfs/xfs_dquot.c | 195 +++++++++++++++++++++++++++++++++------
fs/xfs/xfs_dquot.h | 6 +-
fs/xfs/xfs_dquot_item.c | 51 +++++++---
fs/xfs/xfs_dquot_item.h | 7 ++
fs/xfs/xfs_file.c | 8 ++
fs/xfs/xfs_fsmap.c | 38 ++++----
fs/xfs/xfs_inode.h | 2 +-
fs/xfs/xfs_qm.c | 95 +++++++++++++------
fs/xfs/xfs_qm.h | 1 +
fs/xfs/xfs_quota.h | 7 +-
fs/xfs/xfs_rtalloc.c | 2 +-
fs/xfs/xfs_trans.c | 58 ++++++------
fs/xfs/xfs_trans_ail.c | 2 +-
fs/xfs/xfs_trans_dquot.c | 31 ++++++-
31 files changed, 495 insertions(+), 228 deletions(-)
Dear Linux community,
I am leading a project at Samsung where we utilized your kernel 5.15 in our system. Recently we faced an important kernel panic issue whose fix must be following patch:
https://lore.kernel.org/all/1946ef9f774851732eed78760a78ec40dbc6d178.166759…
The problem for us is that this patch must be available from kernel 6.1 onwards.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drive…
Clear to us is that it is not integrated into K5.15 yet: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drive…
On the other side we can see that this patch was already applied to Android Common Kernel android13-5.15 https://android-review.googlesource.com/q/I51dbcdc5dc536b0e69c6187b2d7ac6a2…. In particular, we can see the same implementation in the recent ACK release https://android.googlesource.com/kernel/common/+/refs/tags/android13-5.15-2… given on https://source.android.com/docs/core/architecture/kernel/gki-android13-5_15…
Do you have any plan to adjust your implementation in drivers/iommu/iommu.c in K5.15 as present in K6.1? If not, since the issue in our project is very critical for us, may I ask you kindly to trigger the sync of K5.15 and K6.1 soon?
I would appreciate highly your response and any support.
Kind regards / Mit freundlichen Grüßen
-Confidential-
Bilge Aydin
Project Manager
Phone:: +49-89-45578-1055 · Mobile:: +49-160-8855875 · Email: b.aydin(a)samsung.com
Samsung Semiconductor Europe GmbH
Einsteinstrasse 174, 81677 München, Deutschland
Jurisdiction and registered Munich / Germany, HRB Nr. 253778
Managing Director: Dermot Ryan
www.samsung.com/semiconductor/solutions/automotive
This communication (including any attachments) contains information which may be confidential and/or privileged. It is for the exclusive use of the intended recipient(s) so, if you have received this communication in error, please do not distribute, copy or use this communication or its contents but notify the sender immediately and then destroy any copies of it. Due to the nature of the Internet, the sender is unable to ensure the integrity of this message. Any views expressed in this communication are those of the individual sender, except where the sender specifically states them to be the views of the company the sender works for.
On 2024-12-09 11:27 am, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> iommu/arm-smmu: Defer probe of clients after smmu device bound
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> iommu-arm-smmu-defer-probe-of-clients-after-smmu-dev.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
FWIW the correct resolution for cherry-picking this directly is the
logically-straightforward one, as below (git is mostly just confused by
the context)
Cheers,
Robin.
----->8-----
diff --cc drivers/iommu/arm/arm-smmu/arm-smmu.c
index d6d1a2a55cc0,14618772a3d6..000000000000
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@@ -1357,10 -1435,19 +1357,21 @@@ static struct iommu_device *arm_smmu_pr
fwspec = dev_iommu_fwspec_get(dev);
if (ret)
goto out_free;
- } else {
+ } else if (fwspec && fwspec->ops == &arm_smmu_ops) {
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+
+ /*
+ * Defer probe if the relevant SMMU instance hasn't finished
+ * probing yet. This is a fragile hack and we'd ideally
+ * avoid this race in the core code. Until that's ironed
+ * out, however, this is the most pragmatic option on the
+ * table.
+ */
+ if (!smmu)
+ return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
+ "smmu dev has not bound yet\n"));
+ } else {
+ return ERR_PTR(-ENODEV);
}
ret = -EINVAL;
>
>
>
> commit 62dc845a353efab2254480df8ae7d06175627313
> Author: Pratyush Brahma <quic_pbrahma(a)quicinc.com>
> Date: Fri Oct 4 14:34:28 2024 +0530
>
> iommu/arm-smmu: Defer probe of clients after smmu device bound
>
> [ Upstream commit 229e6ee43d2a160a1592b83aad620d6027084aad ]
>
> Null pointer dereference occurs due to a race between smmu
> driver probe and client driver probe, when of_dma_configure()
> for client is called after the iommu_device_register() for smmu driver
> probe has executed but before the driver_bound() for smmu driver
> has been called.
>
> Following is how the race occurs:
>
> T1:Smmu device probe T2: Client device probe
>
> really_probe()
> arm_smmu_device_probe()
> iommu_device_register()
> really_probe()
> platform_dma_configure()
> of_dma_configure()
> of_dma_configure_id()
> of_iommu_configure()
> iommu_probe_device()
> iommu_init_device()
> arm_smmu_probe_device()
> arm_smmu_get_by_fwnode()
> driver_find_device_by_fwnode()
> driver_find_device()
> next_device()
> klist_next()
> /* null ptr
> assigned to smmu */
> /* null ptr dereference
> while smmu->streamid_mask */
> driver_bound()
> klist_add_tail()
>
> When this null smmu pointer is dereferenced later in
> arm_smmu_probe_device, the device crashes.
>
> Fix this by deferring the probe of the client device
> until the smmu device has bound to the arm smmu driver.
>
> Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
> Cc: stable(a)vger.kernel.org
> Co-developed-by: Prakash Gupta <quic_guptap(a)quicinc.com>
> Signed-off-by: Prakash Gupta <quic_guptap(a)quicinc.com>
> Signed-off-by: Pratyush Brahma <quic_pbrahma(a)quicinc.com>
> Link: https://lore.kernel.org/r/20241004090428.2035-1-quic_pbrahma@quicinc.com
> [will: Add comment]
> Signed-off-by: Will Deacon <will(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 8203a06014d71..b40ffa1ec2db6 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1354,6 +1354,17 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> goto out_free;
> } else {
> smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
> +
> + /*
> + * Defer probe if the relevant SMMU instance hasn't finished
> + * probing yet. This is a fragile hack and we'd ideally
> + * avoid this race in the core code. Until that's ironed
> + * out, however, this is the most pragmatic option on the
> + * table.
> + */
> + if (!smmu)
> + return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
> + "smmu dev has not bound yet\n"));
> }
>
> ret = -EINVAL;
From: Jason Gerecke <jason.gerecke(a)wacom.com>
If an LED has a default_trigger set prior to being registered with
the subsystem, that trigger will be executed with a brightness value
defined by `trigger->brightness`. Our driver was not setting this
value, which was causing problems. It would cause the selected LED
to be turned off, as well as corrupt the hlv/llv values assigned to
other LEDs (since calling `wacom_led_brightness_set` will overite
these values).
This patch sets the value of `trigger->brightness` to an appropriate
value. We use `wacom_leds_brightness_get` to transform the llv/hlv
values into a brightness that is understood by the rest of the LED
subsystem.
Fixes: 822c91e72eac ("leds: trigger: Store brightness set by led_trigger_event()")
Cc: stable(a)vger.kernel.org # v6.10+
Signed-off-by: Jason Gerecke <jason.gerecke(a)wacom.com>
---
drivers/hid/wacom_sys.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 2bc45b24075c3..e60ca39ac7adc 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -1370,17 +1370,6 @@ static int wacom_led_register_one(struct device *dev, struct wacom *wacom,
if (!name)
return -ENOMEM;
- if (!read_only) {
- led->trigger.name = name;
- error = devm_led_trigger_register(dev, &led->trigger);
- if (error) {
- hid_err(wacom->hdev,
- "failed to register LED trigger %s: %d\n",
- led->cdev.name, error);
- return error;
- }
- }
-
led->group = group;
led->id = id;
led->wacom = wacom;
@@ -1397,6 +1386,19 @@ static int wacom_led_register_one(struct device *dev, struct wacom *wacom,
led->cdev.brightness_set = wacom_led_readonly_brightness_set;
}
+ if (!read_only) {
+ led->trigger.name = name;
+ if (id == wacom->led.groups[group].select)
+ led->trigger.brightness = wacom_leds_brightness_get(led);
+ error = devm_led_trigger_register(dev, &led->trigger);
+ if (error) {
+ hid_err(wacom->hdev,
+ "failed to register LED trigger %s: %d\n",
+ led->cdev.name, error);
+ return error;
+ }
+ }
+
error = devm_led_classdev_register(dev, &led->cdev);
if (error) {
hid_err(wacom->hdev,
--
2.47.1
From: Haibo Chen <haibo.chen(a)nxp.com>
Dummy usually used for rx, so combine the dummy buswidth check only
for rx mode.
Find issue after commit 98d1fb94ce75 ("mtd: spi-nor: core: replace
dummy buswidth from addr to data") on imx8mn-ddr4-evk board. On this
board, rx=4, tx=1, without the upper commit, chose SNOR_CMD_READ_1_1_4,
after this commit, chose SNOR_CMD_READ(1_1_1). This is because when
dummy buswidth = 4, spi_check_buswidth_req(mem, op->dummy.buswidth, true)
do not return the expected result.
Combine the cmd.buswidth and addr.buswidth check to tx is reasonable,
but for dummy buswidth, need to check the rx mode.
Fixes: c36ff266dc82 ("spi: Extend the core to ease integration of SPI memory controllers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
---
drivers/spi/spi-mem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index abc6792e738c..bc9f09a9e5a0 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -150,7 +150,7 @@ static bool spi_mem_check_buswidth(struct spi_mem *mem,
return false;
if (op->dummy.nbytes &&
- spi_check_buswidth_req(mem, op->dummy.buswidth, true))
+ spi_check_buswidth_req(mem, op->dummy.buswidth, false))
return false;
if (op->data.dir != SPI_MEM_NO_DATA &&
--
2.34.1
There is a recent ML report that mounting a large fs backed by hardware
RAID56 controller (with one device missing) took too much time, and
systemd seems to kill the mount attempt.
In that case, the only error message is:
BTRFS error (device sdj): open_ctree failed
There is no reason on why the failure happened, making it very hard to
understand the reason.
At least output the error number (in the particular case it should be
-EINTR) to provide some clue.
Link: https://lore.kernel.org/linux-btrfs/9b9c4d2810abcca2f9f76e32220ed9a90febb23…
Reported-by: Christoph Anton Mitterer <calestyo(a)scientia.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7dfe5005129a..f6eaaf20229d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -971,7 +971,7 @@ static int btrfs_fill_super(struct super_block *sb,
err = open_ctree(sb, fs_devices);
if (err) {
- btrfs_err(fs_info, "open_ctree failed");
+ btrfs_err(fs_info, "open_ctree failed: %d", err);
return err;
}
--
2.47.1
This patch series is to fix bugs in drivers/of/irq.c
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Zijun Hu (8):
of/irq: Fix wrong value of variable @len in of_irq_parse_imap_parent()
of/irq: Correct element count for array @dummy_imask in API of_irq_parse_raw()
of/irq: Fix device node refcount leakage in API of_irq_parse_raw()
of/irq: Fix using uninitialized variable @addr_len in API of_irq_parse_one()
of/irq: Fix device node refcount leakage in API of_irq_parse_one()
of/irq: Fix device node refcount leakages in of_irq_count()
of/irq: Fix device node refcount leakages in of_irq_init()
of/irq: Fix device node refcount leakage in API irq_of_parse_and_map()
drivers/of/irq.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
---
base-commit: 16ef9c9de0c48b836c5996c6e9792cb4f658c8f1
change-id: 20241208-of_irq_fix-659514bc9aa3
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
Clippy in the upcoming Rust 1.83.0 spots a spurious empty line since the
`clippy::empty_line_after_doc_comments` warning is now enabled by default
given it is part of the `suspicious` group [1]:
error: empty line after doc comment
--> drivers/gpu/drm/drm_panic_qr.rs:931:1
|
931 | / /// They must remain valid for the duration of the function call.
932 | |
| |_
933 | #[no_mangle]
934 | / pub unsafe extern "C" fn drm_panic_qr_generate(
935 | | url: *const i8,
936 | | data: *mut u8,
937 | | data_len: usize,
... |
940 | | tmp_size: usize,
941 | | ) -> u8 {
| |_______- the comment documents this function
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_…
= note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
= help: if the empty line is unintentional remove it
Thus remove the empty line.
Cc: stable(a)vger.kernel.org
Fixes: cb5164ac43d0 ("drm/panic: Add a QR code panic screen")
Link: https://github.com/rust-lang/rust-clippy/pull/13091 [1]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
I added the Fixes and stable tags since it would be nice to keep the 6.12 LTS
Clippy-clean (since that one is the first that supports several Rust compilers).
drivers/gpu/drm/drm_panic_qr.rs | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
index 09500cddc009..ef2d490965ba 100644
--- a/drivers/gpu/drm/drm_panic_qr.rs
+++ b/drivers/gpu/drm/drm_panic_qr.rs
@@ -929,7 +929,6 @@ fn draw_all(&mut self, data: impl Iterator<Item = u8>) {
/// * `tmp` must be valid for reading and writing for `tmp_size` bytes.
///
/// They must remain valid for the duration of the function call.
-
#[no_mangle]
pub unsafe extern "C" fn drm_panic_qr_generate(
url: *const i8,
base-commit: b7ed2b6f4e8d7f64649795e76ee9db67300de8eb
--
2.47.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 07fa619f2a40c221ea27747a3323cabc59ab25eb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121049-attire-ramble-432b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 07fa619f2a40c221ea27747a3323cabc59ab25eb Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 5 Dec 2024 15:05:07 +0000
Subject: [PATCH] x86/kexec: Restore GDT on return from ::preserve_context
kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
int main (void)
{
struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;
segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
}
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index e9e88c342f75..1236f25fc8d1 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -242,6 +242,13 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
movq CR0(%r8), %r8
movq %rax, %cr3
movq %r8, %cr0
+
+#ifdef CONFIG_KEXEC_JUMP
+ /* Saved in save_processor_state. */
+ movq $saved_context, %rax
+ lgdt saved_context_gdt_desc(%rax)
+#endif
+
movq %rbp, %rax
popf
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 07fa619f2a40c221ea27747a3323cabc59ab25eb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121048-dreamlike-devourer-7135@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 07fa619f2a40c221ea27747a3323cabc59ab25eb Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 5 Dec 2024 15:05:07 +0000
Subject: [PATCH] x86/kexec: Restore GDT on return from ::preserve_context
kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
int main (void)
{
struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;
segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
}
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index e9e88c342f75..1236f25fc8d1 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -242,6 +242,13 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
movq CR0(%r8), %r8
movq %rax, %cr3
movq %r8, %cr0
+
+#ifdef CONFIG_KEXEC_JUMP
+ /* Saved in save_processor_state. */
+ movq $saved_context, %rax
+ lgdt saved_context_gdt_desc(%rax)
+#endif
+
movq %rbp, %rax
popf
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 07fa619f2a40c221ea27747a3323cabc59ab25eb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121047-pushing-trouble-cc13@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 07fa619f2a40c221ea27747a3323cabc59ab25eb Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 5 Dec 2024 15:05:07 +0000
Subject: [PATCH] x86/kexec: Restore GDT on return from ::preserve_context
kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
int main (void)
{
struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;
segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
}
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index e9e88c342f75..1236f25fc8d1 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -242,6 +242,13 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
movq CR0(%r8), %r8
movq %rax, %cr3
movq %r8, %cr0
+
+#ifdef CONFIG_KEXEC_JUMP
+ /* Saved in save_processor_state. */
+ movq $saved_context, %rax
+ lgdt saved_context_gdt_desc(%rax)
+#endif
+
movq %rbp, %rax
popf
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 07fa619f2a40c221ea27747a3323cabc59ab25eb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121045-limelight-gainfully-7197@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 07fa619f2a40c221ea27747a3323cabc59ab25eb Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw(a)amazon.co.uk>
Date: Thu, 5 Dec 2024 15:05:07 +0000
Subject: [PATCH] x86/kexec: Restore GDT on return from ::preserve_context
kexec
The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.
Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.
Test program:
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <linux/kexec.h>
#include <linux/reboot.h>
#include <sys/reboot.h>
#include <sys/syscall.h>
int main (void)
{
struct kexec_segment segment = {};
unsigned char purgatory[] = {
0x66, 0xba, 0xf8, 0x03, // mov $0x3f8, %dx
0xb0, 0x42, // mov $0x42, %al
0xee, // outb %al, (%dx)
0xc3, // ret
};
int ret;
segment.buf = &purgatory;
segment.bufsz = sizeof(purgatory);
segment.mem = (void *)0x400000;
segment.memsz = 0x1000;
ret = syscall(__NR_kexec_load, 0x400000, 1, &segment, KEXEC_PRESERVE_CONTEXT);
if (ret) {
perror("kexec_load");
exit(1);
}
ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
if (ret) {
perror("kexec reboot");
exit(1);
}
printf("Success\n");
return 0;
}
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index e9e88c342f75..1236f25fc8d1 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -242,6 +242,13 @@ SYM_CODE_START_LOCAL_NOALIGN(virtual_mapped)
movq CR0(%r8), %r8
movq %rax, %cr3
movq %r8, %cr0
+
+#ifdef CONFIG_KEXEC_JUMP
+ /* Saved in save_processor_state. */
+ movq $saved_context, %rax
+ lgdt saved_context_gdt_desc(%rax)
+#endif
+
movq %rbp, %rax
popf
Hi Greg, Sasha,
Please consider applying commit c95bbb59a9b2 ("rust: enable
arbitrary_self_types and remove `Receiver`") to 6.12.y.
It is meant to support the upcoming Rust 1.84.0 compiler (to be
released in a month), since 6.12 LTS is the first stable kernel that
supports a minimum Rust version, thus users may use newer compilers.
Older LTSs do not need it, for that reason.
It applies almost cleanly (there is a simple conflict).
Thanks!
Cheers,
Miguel
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 73dae652dcac776296890da215ee7dec357a1032
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121032-evasive-monotone-82e7@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73dae652dcac776296890da215ee7dec357a1032 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Mon, 25 Nov 2024 13:59:09 -0500
Subject: [PATCH] drm/amdgpu: rework resume handling for display (v2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Split resume into a 3rd step to handle displays when DCC is
enabled on DCN 4.0.1. Move display after the buffer funcs
have been re-enabled so that the GPU will do the move and
properly set the DCC metadata for DCN.
v2: fix fence irq resume ordering
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.11.x
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 913eb4f08ee6..96316111300a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3762,7 +3762,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
*
* @adev: amdgpu_device pointer
*
- * First resume function for hardware IPs. The list of all the hardware
+ * Second resume function for hardware IPs. The list of all the hardware
* IPs that make up the asic is walked and the resume callbacks are run for
* all blocks except COMMON, GMC, and IH. resume puts the hardware into a
* functional state after a suspend and updates the software state as
@@ -3780,6 +3780,7 @@ static int amdgpu_device_ip_resume_phase2(struct amdgpu_device *adev)
if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH ||
+ adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP)
continue;
r = amdgpu_ip_block_resume(&adev->ip_blocks[i]);
@@ -3790,6 +3791,36 @@ static int amdgpu_device_ip_resume_phase2(struct amdgpu_device *adev)
return 0;
}
+/**
+ * amdgpu_device_ip_resume_phase3 - run resume for hardware IPs
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Third resume function for hardware IPs. The list of all the hardware
+ * IPs that make up the asic is walked and the resume callbacks are run for
+ * all DCE. resume puts the hardware into a functional state after a suspend
+ * and updates the software state as necessary. This function is also used
+ * for restoring the GPU after a GPU reset.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int amdgpu_device_ip_resume_phase3(struct amdgpu_device *adev)
+{
+ int i, r;
+
+ for (i = 0; i < adev->num_ip_blocks; i++) {
+ if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
+ continue;
+ if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_DCE) {
+ r = amdgpu_ip_block_resume(&adev->ip_blocks[i]);
+ if (r)
+ return r;
+ }
+ }
+
+ return 0;
+}
+
/**
* amdgpu_device_ip_resume - run resume for hardware IPs
*
@@ -3819,6 +3850,13 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
if (adev->mman.buffer_funcs_ring->sched.ready)
amdgpu_ttm_set_buffer_funcs_status(adev, true);
+ if (r)
+ return r;
+
+ amdgpu_fence_driver_hw_init(adev);
+
+ r = amdgpu_device_ip_resume_phase3(adev);
+
return r;
}
@@ -4899,7 +4937,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool notify_clients)
dev_err(adev->dev, "amdgpu_device_ip_resume failed (%d).\n", r);
goto exit;
}
- amdgpu_fence_driver_hw_init(adev);
if (!adev->in_s0ix) {
r = amdgpu_amdkfd_resume(adev, adev->in_runpm);
@@ -5484,6 +5521,10 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context)
if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
+ r = amdgpu_device_ip_resume_phase3(tmp_adev);
+ if (r)
+ goto out;
+
if (vram_lost)
amdgpu_device_fill_reset_magic(tmp_adev);
This patchset implements UVC v1.5 region of interest using V4L2
control API.
ROI control is consisted two uvc specific controls.
1. A rectangle control with a newly added type V4L2_CTRL_TYPE_RECT.
2. An auto control with type bitmask.
V4L2_CTRL_WHICH_MIN/MAX_VAL is added to support the rectangle control.
The corresponding v4l-utils series can be found at
https://patchwork.linuxtv.org/project/linux-media/list/?series=11069 .
Tested with v4l2-compliance, v4l2-ctl, calling ioctls on usb cameras and
VIVID with a newly added V4L2_CTRL_TYPE_RECT control.
This set includes also the patch:
media: uvcvideo: Fix event flags in uvc_ctrl_send_events
It is not technically part of this change, but we conflict with it.
I am continuing the work that Yunke did.
Changes in v16:
- add documentation
- discard re-style
- refactor -ENOMEM
- remove "Use the camera to clamp compound controls"
- move uvc_rect
- data_out = 0
- s/max/min in uvc_set_rect()
- Return -EINVAL in uvc_ioctl_xu_ctrl_map instead of -ENOTTY.
- Use switch inside uvc_set_le_value.
- Link to v15: https://lore.kernel.org/r/20241114-uvc-roi-v15-0-64cfeb56b6f8@chromium.org
Changes in v15:
- Modify mapping set/get to support any size
- Remove v4l2_size field. It is not needed, we can use the v4l2_type to
infer it.
- Improve documentation.
- Lots of refactoring, now adding compound and roi are very small
patches.
- Remove rectangle clamping, not supported by some firmware.
- Remove init, we can add it later.
- Move uvc_cid to USER_BASE
- Link to v14: https://lore.kernel.org/linux-media/20231201071907.3080126-1-yunkec@google.…
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Hans Verkuil (1):
media: v4l2-ctrls: add support for V4L2_CTRL_WHICH_MIN/MAX_VAL
Ricardo Ribalda (11):
media: uvcvideo: Fix event flags in uvc_ctrl_send_events
media: uvcvideo: Handle uvc menu translation inside uvc_get_le_value
media: uvcvideo: Handle uvc menu translation inside uvc_set_le_value
media: uvcvideo: refactor uvc_ioctl_g_ext_ctrls
media: uvcvideo: uvc_ioctl_(g|s)_ext_ctrls: handle NoP case
media: uvcvideo: Support any size for mapping get/set
media: uvcvideo: Factor out clamping from uvc_ctrl_set
media: uvcvideo: Factor out query_boundaries from query_ctrl
media: uvcvideo: let v4l2_query_v4l2_ctrl() work with v4l2_query_ext_ctrl
media: uvcvideo: Introduce uvc_mapping_v4l2_size
media: uvcvideo: Add sanity check to uvc_ioctl_xu_ctrl_map
Yunke Cao (6):
media: v4l2_ctrl: Add V4L2_CTRL_TYPE_RECT
media: vivid: Add a rectangle control
media: uvcvideo: add support for compound controls
media: uvcvideo: support V4L2_CTRL_WHICH_MIN/MAX_VAL
media: uvcvideo: implement UVC v1.5 ROI
media: uvcvideo: document UVC v1.5 ROI
.../userspace-api/media/drivers/uvcvideo.rst | 64 ++
.../userspace-api/media/v4l/vidioc-g-ext-ctrls.rst | 26 +-
.../userspace-api/media/v4l/vidioc-queryctrl.rst | 14 +
.../userspace-api/media/videodev2.h.rst.exceptions | 4 +
drivers/media/i2c/imx214.c | 4 +-
drivers/media/platform/qcom/venus/venc_ctrls.c | 9 +-
drivers/media/test-drivers/vivid/vivid-ctrls.c | 34 +
drivers/media/usb/uvc/uvc_ctrl.c | 799 ++++++++++++++++-----
drivers/media/usb/uvc/uvc_v4l2.c | 77 +-
drivers/media/usb/uvc/uvcvideo.h | 25 +-
drivers/media/v4l2-core/v4l2-ctrls-api.c | 54 +-
drivers/media/v4l2-core/v4l2-ctrls-core.c | 167 ++++-
drivers/media/v4l2-core/v4l2-ioctl.c | 4 +-
include/media/v4l2-ctrls.h | 38 +-
include/uapi/linux/usb/video.h | 1 +
include/uapi/linux/uvcvideo.h | 13 +
include/uapi/linux/v4l2-controls.h | 7 +
include/uapi/linux/videodev2.h | 5 +
18 files changed, 1058 insertions(+), 287 deletions(-)
---
base-commit: 5516200c466f92954551406ea641376963c43a92
change-id: 20241113-uvc-roi-66bd6cfa1e64
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 091c1dd2d4df6edd1beebe0e5863d4034ade9572
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121036-agility-showman-7ed5@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 091c1dd2d4df6edd1beebe0e5863d4034ade9572 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 20 Nov 2024 21:11:51 +0100
Subject: [PATCH] mm/mempolicy: fix migrate_to_node() assuming there is at
least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..04f35659717a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (unlikely(!vma)) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 091c1dd2d4df6edd1beebe0e5863d4034ade9572
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121034-utmost-parsnip-0faf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 091c1dd2d4df6edd1beebe0e5863d4034ade9572 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 20 Nov 2024 21:11:51 +0100
Subject: [PATCH] mm/mempolicy: fix migrate_to_node() assuming there is at
least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..04f35659717a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (unlikely(!vma)) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 091c1dd2d4df6edd1beebe0e5863d4034ade9572
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121033-xerox-resilient-8b1b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 091c1dd2d4df6edd1beebe0e5863d4034ade9572 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 20 Nov 2024 21:11:51 +0100
Subject: [PATCH] mm/mempolicy: fix migrate_to_node() assuming there is at
least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..04f35659717a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (unlikely(!vma)) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 091c1dd2d4df6edd1beebe0e5863d4034ade9572
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121032-handbook-flyover-440f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 091c1dd2d4df6edd1beebe0e5863d4034ade9572 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 20 Nov 2024 21:11:51 +0100
Subject: [PATCH] mm/mempolicy: fix migrate_to_node() assuming there is at
least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..04f35659717a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (unlikely(!vma)) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 091c1dd2d4df6edd1beebe0e5863d4034ade9572
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121030-dingy-chant-45f8@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 091c1dd2d4df6edd1beebe0e5863d4034ade9572 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 20 Nov 2024 21:11:51 +0100
Subject: [PATCH] mm/mempolicy: fix migrate_to_node() assuming there is at
least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..04f35659717a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (unlikely(!vma)) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e30a0361b8515d424c73c67de1a43e45a13b8ba2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121010-gladiator-streak-993a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e30a0361b8515d424c73c67de1a43e45a13b8ba2 Mon Sep 17 00:00:00 2001
From: Jared Kangas <jkangas(a)redhat.com>
Date: Tue, 19 Nov 2024 13:02:34 -0800
Subject: [PATCH] kasan: make report_lock a raw spinlock
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled. However, KASAN reports may be triggered
in such contexts. For example:
char *s = kzalloc(1, GFP_KERNEL);
kfree(s);
local_irq_disable();
char c = *s; /* KASAN report here leads to spin_lock() */
local_irq_enable();
Make report_spinlock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
Signed-off-by: Jared Kangas <jkangas(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 50fb19ad4388..3fe77a360f1c 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -201,7 +201,7 @@ static inline void fail_non_kasan_kunit_test(void) { }
#endif /* CONFIG_KUNIT */
-static DEFINE_SPINLOCK(report_lock);
+static DEFINE_RAW_SPINLOCK(report_lock);
static void start_report(unsigned long *flags, bool sync)
{
@@ -212,7 +212,7 @@ static void start_report(unsigned long *flags, bool sync)
lockdep_off();
/* Make sure we don't end up in loop. */
report_suppress_start();
- spin_lock_irqsave(&report_lock, *flags);
+ raw_spin_lock_irqsave(&report_lock, *flags);
pr_err("==================================================================\n");
}
@@ -222,7 +222,7 @@ static void end_report(unsigned long *flags, const void *addr, bool is_write)
trace_error_report_end(ERROR_DETECTOR_KASAN,
(unsigned long)addr);
pr_err("==================================================================\n");
- spin_unlock_irqrestore(&report_lock, *flags);
+ raw_spin_unlock_irqrestore(&report_lock, *flags);
if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
check_panic_on_warn("KASAN");
switch (kasan_arg_fault) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e30a0361b8515d424c73c67de1a43e45a13b8ba2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121008-yelling-swear-a95b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e30a0361b8515d424c73c67de1a43e45a13b8ba2 Mon Sep 17 00:00:00 2001
From: Jared Kangas <jkangas(a)redhat.com>
Date: Tue, 19 Nov 2024 13:02:34 -0800
Subject: [PATCH] kasan: make report_lock a raw spinlock
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled. However, KASAN reports may be triggered
in such contexts. For example:
char *s = kzalloc(1, GFP_KERNEL);
kfree(s);
local_irq_disable();
char c = *s; /* KASAN report here leads to spin_lock() */
local_irq_enable();
Make report_spinlock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Fixes: 342a93247e08 ("locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>")
Signed-off-by: Jared Kangas <jkangas(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 50fb19ad4388..3fe77a360f1c 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -201,7 +201,7 @@ static inline void fail_non_kasan_kunit_test(void) { }
#endif /* CONFIG_KUNIT */
-static DEFINE_SPINLOCK(report_lock);
+static DEFINE_RAW_SPINLOCK(report_lock);
static void start_report(unsigned long *flags, bool sync)
{
@@ -212,7 +212,7 @@ static void start_report(unsigned long *flags, bool sync)
lockdep_off();
/* Make sure we don't end up in loop. */
report_suppress_start();
- spin_lock_irqsave(&report_lock, *flags);
+ raw_spin_lock_irqsave(&report_lock, *flags);
pr_err("==================================================================\n");
}
@@ -222,7 +222,7 @@ static void end_report(unsigned long *flags, const void *addr, bool is_write)
trace_error_report_end(ERROR_DETECTOR_KASAN,
(unsigned long)addr);
pr_err("==================================================================\n");
- spin_unlock_irqrestore(&report_lock, *flags);
+ raw_spin_unlock_irqrestore(&report_lock, *flags);
if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
check_panic_on_warn("KASAN");
switch (kasan_arg_fault) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5f1b64e9a9b7ee9cfd32c6b2fab796e29bfed075
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024121049-manhandle-nintendo-724e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f1b64e9a9b7ee9cfd32c6b2fab796e29bfed075 Mon Sep 17 00:00:00 2001
From: Adrian Huang <ahuang12(a)lenovo.com>
Date: Wed, 13 Nov 2024 18:21:46 +0800
Subject: [PATCH] sched/numa: fix memory leak due to the overwritten
vma->numab_state
[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.
# /opt/ltp/testcases/bin/hackbench 20 thread 1000
Running with 20*40 (== 800) tasks.
# dmesg | grep kmemleak
...
kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888cd8ca2c40 (size 64):
comm "hackbench", pid 17142, jiffies 4299780315
hex dump (first 32 bytes):
ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00 .tI.....L.I.....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc bff18fd4):
[<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
[<ffffffff8113f715>] task_numa_work+0x725/0xa00
[<ffffffff8110f878>] task_work_run+0x58/0x90
[<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
[<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
[<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
This issue can be consistently reproduced on three different servers:
* a 448-core server
* a 256-core server
* a 192-core server
[Root Cause]
Since multiple threads are created by the hackbench program (along with
the command argument 'thread'), a shared vma might be accessed by two or
more cores simultaneously. When two or more cores observe that
vma->numab_state is NULL at the same time, vma->numab_state will be
overwritten.
Although current code ensures that only one thread scans the VMAs in a
single 'numa_scan_period', there might be a chance for another thread
to enter in the next 'numa_scan_period' while we have not gotten till
numab_state allocation [1].
Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot the reproduce the issue. It is verified with 200+ test runs.
[Solution]
Use the cmpxchg atomic operation to ensure that only one thread executes
the vma->numab_state assignment.
[1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/
Link: https://lkml.kernel.org/r/20241113102146.2384-1-ahuang12@lenovo.com
Fixes: ef6a22b70f6d ("sched/numa: apply the scan delay to every new vma")
Signed-off-by: Adrian Huang <ahuang12(a)lenovo.com>
Reported-by: Jiwei Sun <sunjw10(a)lenovo.com>
Reviewed-by: Raghavendra K T <raghavendra.kt(a)amd.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fbdca89c677f..a59ae2e23daf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3399,11 +3399,17 @@ static void task_numa_work(struct callback_head *work)
/* Initialise new per-VMA NUMAB state. */
if (!vma->numab_state) {
- vma->numab_state = kzalloc(sizeof(struct vma_numab_state),
- GFP_KERNEL);
- if (!vma->numab_state)
+ struct vma_numab_state *ptr;
+
+ ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
+ if (!ptr)
continue;
+ if (cmpxchg(&vma->numab_state, NULL, ptr)) {
+ kfree(ptr);
+ continue;
+ }
+
vma->numab_state->start_scan_seq = mm->numa_scan_seq;
vma->numab_state->next_scan = now +