Modify the framework to support more map modes, add benchmark
support for dma_map_sg, and add support for the sg map mode in the
ioctl interface.
The result:
[root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
dma mapping mode: DMA_MAP_SG_MODE
dma mapping benchmark: threads:8 seconds:30 node:-1 dir:FROM_DEVICE granule/sg_nents: 8
average map latency(us):1.4 standard deviation:0.3
average unmap latency(us):1.3 standard deviation:0.3
[root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
dma mapping mode: DMA_MAP_SINGLE_MODE
dma mapping benchmark: threads:8 seconds:30 node:-1 dir:FROM_DEVICE granule/sg_nents: 8
average map latency(us):1.0 standard deviation:0.3
average unmap latency(us):1.3 standard deviation:0.5
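For reference, the selftest drives the in-kernel benchmark through an
ioctl on its debugfs node. A minimal sketch of selecting the new sg mode
from user space follows; the map-mode field name is an assumption based
on this series, while the remaining fields match the existing uAPI in
include/linux/map_benchmark.h:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/map_benchmark.h>

int main(void)
{
	struct map_benchmark map = { 0 };
	int fd = open("/sys/kernel/debug/dma_map_benchmark", O_RDWR);

	if (fd < 0)
		return 1;

	map.threads = 8;
	map.seconds = 30;
	map.dma_dir = 2;	/* DMA_FROM_DEVICE */
	map.granule = 8;	/* granule / sg_nents per map */
	/* map.map_mode = 1;	hypothetical field: DMA_MAP_SG_MODE */

	if (ioctl(fd, DMA_MAP_BENCHMARK, &map))
		perror("ioctl");
	else
		printf("avg map latency: %llu x100ns\n",
		       (unsigned long long)map.avg_map_100ns);

	close(fd);
	return 0;
}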
---
Changes since V2:
- Address the comments from Barry and Alok: some commit information and
function parameter names were modified to make them more accurate.
- Link: https://lore.kernel.org/all/20250506030100.394376-1-xiaqinxin@huawei.com/
Changes since V1:
- Address the comments from Barry, added some comments and changed the unmap type to void.
- Link: https://lore.kernel.org/lkml/20250212022718.1995504-1-xiaqinxin@huawei.com/
Qinxin Xia (4):
dma-mapping: benchmark: Add padding to ensure uABI remains consistent
dma-mapping: benchmark: modify the framework to adapt to more map modes
dma-mapping: benchmark: add support for dma_map_sg
selftests/dma: Add dma_map_sg support
include/linux/map_benchmark.h | 46 +++-
kernel/dma/map_benchmark.c | 225 ++++++++++++++++--
.../testing/selftests/dma/dma_map_benchmark.c | 16 +-
3 files changed, 252 insertions(+), 35 deletions(-)
--
2.33.0
This series adds the camera clock controller base driver, bindings,
and DT support for the sc8180x platform.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
---
Changes in v3:
- Drop Fixes tag in patch [1/4]. Drop unused gpu_iref and
aggre_ufs_card_2 clk bindings.
- Move the allOf block below the required block in the bindings patch.
- Remove the unused cam_cc_parent_data_7 and cam_cc_parent_map_7
in the driver patch, as reported by the kernel test robot.
- Link to v2: https://lore.kernel.org/r/20250430-sc8180x-camcc-support-v2-0-6bbb514f467c@…
Changes in v2:
- New patch [1/4] to add all the missing gcc bindings along with
the required GCC_CAMERA_AHB_CLOCK
- As per Konrad's comments, add the camera AHB clock dependency in the
DT and yaml bindings.
- As per Vladimir's comments, update the Kconfig to add the SC8180X config
in correct alphanumerical order.
- Link to v1: https://lore.kernel.org/r/20250422-sc8180x-camcc-support-v1-0-691614d13f06@…
---
Satya Priya Kakitapalli (4):
dt-bindings: clock: qcom: Add missing bindings on gcc-sc8180x
dt-bindings: clock: Add Qualcomm SC8180X Camera clock controller
clk: qcom: camcc-sc8180x: Add SC8180X camera clock controller driver
arm64: dts: qcom: Add camera clock controller for sc8180x
.../bindings/clock/qcom,sc8180x-camcc.yaml | 67 +
arch/arm64/boot/dts/qcom/sc8180x.dtsi | 14 +
drivers/clk/qcom/Kconfig | 10 +
drivers/clk/qcom/Makefile | 1 +
drivers/clk/qcom/camcc-sc8180x.c | 2889 ++++++++++++++++++++
include/dt-bindings/clock/qcom,gcc-sc8180x.h | 10 +
include/dt-bindings/clock/qcom,sc8180x-camcc.h | 181 ++
7 files changed, 3172 insertions(+)
---
base-commit: bc8aa6cdadcc00862f2b5720e5de2e17f696a081
change-id: 20250422-sc8180x-camcc-support-9a82507d2a39
Best regards,
--
Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x be8250786ca94952a19ce87f98ad9906448bc9ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050521-crispy-study-e836@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be8250786ca94952a19ce87f98ad9906448bc9ef Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Mon, 21 Apr 2025 15:52:32 +0800
Subject: [PATCH] mm, slab: clean up slab->obj_exts always
When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called, but a previously allocated
slab->obj_exts remains. It won't be cleared by unaccount_slab() because
mem_alloc_profiling_enabled() is no longer true. This is incorrect:
slab->obj_exts should always be cleaned up in unaccount_slab() to avoid
the following error:
[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup
[andriy.shevchenko(a)linux.intel.com: fold need_slab_obj_ext() into its only user]
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Link: https://patch.msgid.link/20250421075232.2165527-1-quic_zhenhuah@quicinc.com
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
diff --git a/mm/slub.c b/mm/slub.c
index dc9e729e1d26..be8b09e09d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,8 +2028,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
return 0;
}
-/* Should be called only if mem_alloc_profiling_enabled() */
-static noinline void free_slab_obj_exts(struct slab *slab)
+static inline void free_slab_obj_exts(struct slab *slab)
{
struct slabobj_ext *obj_exts;
@@ -2049,18 +2048,6 @@ static noinline void free_slab_obj_exts(struct slab *slab)
slab->obj_exts = 0;
}
-static inline bool need_slab_obj_ext(void)
-{
- if (mem_alloc_profiling_enabled())
- return true;
-
- /*
- * CONFIG_MEMCG creates vector of obj_cgroup objects conditionally
- * inside memcg_slab_post_alloc_hook. No other users for now.
- */
- return false;
-}
-
#else /* CONFIG_SLAB_OBJ_EXT */
static inline void init_slab_obj_exts(struct slab *slab)
@@ -2077,11 +2064,6 @@ static inline void free_slab_obj_exts(struct slab *slab)
{
}
-static inline bool need_slab_obj_ext(void)
-{
- return false;
-}
-
#endif /* CONFIG_SLAB_OBJ_EXT */
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -2129,7 +2111,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
static inline void
alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
- if (need_slab_obj_ext())
+ if (mem_alloc_profiling_enabled())
__alloc_tagging_slab_alloc_hook(s, object, flags);
}
@@ -2601,8 +2583,12 @@ static __always_inline void account_slab(struct slab *slab, int order,
static __always_inline void unaccount_slab(struct slab *slab, int order,
struct kmem_cache *s)
{
- if (memcg_kmem_online() || need_slab_obj_ext())
- free_slab_obj_exts(slab);
+ /*
+ * The slab object extensions should now be freed regardless of
+ * whether mem_alloc_profiling_enabled() or not because profiling
+ * might have been disabled after slab->obj_exts got allocated.
+ */
+ free_slab_obj_exts(slab);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
-(PAGE_SIZE << order));
After a recent change [1] in clang's randstruct implementation to
randomize structures that only contain function pointers, there is an
error because qede_ll_ops gets randomized but does not use a designated
initializer for the first member:
drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer
206 | {
| ^
Explicitly initialize the common member using a designated initializer
to fix the build.
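A minimal standalone illustration of the rule, with generic structs
rather than the actual qede/qed types:

static void an_op(void) { }

struct cb_ops {			/* only function pointers: randomized */
	void (*up)(void);
	void (*down)(void);
};

struct eth_cb_ops {
	struct cb_ops common;
};

/*
 * Rejected under randstruct, because the randomized member is
 * initialized positionally:
 *
 *	static struct eth_cb_ops bad = { { .up = an_op } };
 *
 * Accepted, because the designated initializer names the member:
 */
static struct eth_cb_ops good = {
	.common = { .up = an_op },
};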
Cc: stable(a)vger.kernel.org
Fixes: 035f7f87b729 ("randstruct: Enable Clang support")
Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 99df00c30b8c..b5d744d2586f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -203,7 +203,7 @@ static struct pci_driver qede_pci_driver = {
};
static struct qed_eth_cb_ops qede_ll_ops = {
- {
+ .common = {
#ifdef CONFIG_RFS_ACCEL
.arfs_filter_op = qede_arfs_filter_op,
#endif
---
base-commit: 9540984da649d46f699c47f28c68bbd3c9d99e4c
change-id: 20250507-qede-fix-clang-randstruct-13d8c593cb58
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
When CONFIG_PREEMPT_COUNT is not configured (i.e. CONFIG_PREEMPT_NONE/
CONFIG_PREEMPT_VOLUNTARY), preempt_disable() / preempt_enable() merely
act as a barrier(). However, in these cases cond_resched() can still
trigger a context switch and modify CSR.EUEN, resulting in the do_fpu()
exception being raised within kernel-fpu critical sections, as
demonstrated in the following path:
dcn32_calculate_wm_and_dlg()
DC_FP_START()
dcn32_calculate_wm_and_dlg_fpu()
dcn32_find_dummy_latency_index_for_fw_based_mclk_switch()
dcn32_internal_validate_bw()
dcn32_enable_phantom_stream()
dc_create_stream_for_sink()
kzalloc(GFP_KERNEL)
__kmem_cache_alloc_node()
__cond_resched()
DC_FP_END()
This patch is similar to commit d021985 (x86/fpu: Improve crypto
performance by making kernel-mode FPU reliably usable in softirqs). It
uses local_bh_disable() instead of preempt_disable() for non-RT kernels,
which avoids the cond_resched() issue and also extends kernel-fpu usage
to softirq context.
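For context, kernel-mode FPU use is only legal inside such a critical
section (the DC_FP_START()/DC_FP_END() pair in the trace above
ultimately wraps this); a minimal sketch:

	kernel_fpu_begin();	/* enable FPU for kernel use */
	/* ... SIMD/FPU computation; must not sleep or reschedule ... */
	kernel_fpu_end();	/* restore previous FPU state */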
Cc: stable(a)vger.kernel.org
Signed-off-by: Tianyang Zhang <zhangtianyang(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
arch/loongarch/kernel/kfpu.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/kernel/kfpu.c b/arch/loongarch/kernel/kfpu.c
index ec5b28e570c9..4e469b021cf4 100644
--- a/arch/loongarch/kernel/kfpu.c
+++ b/arch/loongarch/kernel/kfpu.c
@@ -18,11 +18,28 @@ static unsigned int euen_mask = CSR_EUEN_FPEN;
static DEFINE_PER_CPU(bool, in_kernel_fpu);
static DEFINE_PER_CPU(unsigned int, euen_current);
+static inline void fpregs_lock(void)
+{
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_disable();
+ else
+ preempt_disable();
+}
+
+static inline void fpregs_unlock(void)
+{
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_enable();
+ else
+ preempt_enable();
+}
+
void kernel_fpu_begin(void)
{
unsigned int *euen_curr;
- preempt_disable();
+ if (!irqs_disabled())
+ fpregs_lock();
WARN_ON(this_cpu_read(in_kernel_fpu));
@@ -73,7 +90,8 @@ void kernel_fpu_end(void)
this_cpu_write(in_kernel_fpu, false);
- preempt_enable();
+ if (!irqs_disabled())
+ fpregs_unlock();
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
--
2.20.1
commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
[BUG]
It is a long known bug that VM image on btrfs can lead to data csum
mismatch, if the qemu is using direct-io for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve the concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTFS assume they can modify the folio under writeback, while
btrfs requires the folio to stay unmodified; this mismatch leads to the
false csum errors.
The VM image is only a controlled example; there are also cases where a
multi-threaded program submits a direct IO write while another thread
modifies the direct IO buffer for whatever reason.
Btrfs has no sane way to detect such cases, and they lead to false data
csum mismatches.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO always skip data checksum
This requires a new incompatible flag, since it breaks the current
per-inode NODATASUM flag, and it requires extra handling for
no-csum-found cases.
It also reduces our checksum protection.
- Let hardware handle all the checksum
AKA, just nodatasum mount option.
That requires trust for hardware (which is not that trustful in a lot
of cases), and it's not generic at all.
- Always fallback to buffered write if the inode requires checksum
This was suggested by Christoph, and is the solution utilized by this
patch.
The cost is obvious: the extra buffer copy into the page cache reduces
performance.
But at least it's still user configurable; if the end user still wants
the zero-copy performance, just set the NODATASUM flag for the inode
(which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but to always fall back
to buffered IO. At least this way we avoid the more deadly false data
checksum mismatch error.
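As an illustration of the NODATASUM escape hatch (not part of this
patch): creating the image file with chattr +C before writing any data
makes it NOCOW, which on btrfs also disables data checksums for that
file and thus restores the zero-copy direct IO path.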
CC: stable(a)vger.kernel.org # 6.6
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[ Fix a conflict due to the movement of the function. ]
---
fs/btrfs/file.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e794606e7c78..f1456c745c6d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1515,6 +1515,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto buffered;
}
+ /*
+ * We can't control the folios being passed in, applications can write
+ * to them while a direct IO write is in progress. This means the
+ * content might change after we calculated the data checksum.
+ * Therefore we can end up storing a checksum that doesn't match the
+ * persisted data.
+ *
+ * To be extra safe and avoid false data checksum mismatch, if the
+ * inode requires data checksum, just fallback to buffered IO.
+ * For buffered IO we have full control of page cache and can ensure
+ * no one is modifying the content during writeback.
+ */
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+ btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+ goto buffered;
+ }
+
/*
* The iov_iter can be mapped to the same file range we are writing to.
* If that's the case, then we will deadlock in the iomap code, because
--
2.49.0
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 8cbc3001a3264d998d6b6db3e23f935c158abd4d ]
The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of use-after-free
and NULL pointer dereference bugs because we race with the endio
function that is cleaning up the bio. Instead just return BLK_STS_OK as
the repair function has to continue to process the rest of the pages,
and the endio for the repair bio will do the appropriate cleanup for the
page that it was given.
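In other words, once a bio is handed to a submit path that guarantees
bio_endio() on failure, the caller must not touch it again. A generic
sketch of that ownership rule (helper names are illustrative, not the
actual btrfs code):

static void submit_or_endio(struct bio *bio)
{
	if (submission_failed(bio)) {		/* illustrative helper */
		bio->bi_status = BLK_STS_IOERR;
		bio_endio(bio);			/* endio now owns cleanup */
		return;
	}
	submit_bio(bio);
}

/*
 * After submit_or_endio() returns, the caller must not bio_put() or
 * inspect the bio: its completion handler may already have run.
 */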
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
fs/btrfs/extent_io.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 346fc46d019b..a1946d62911c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2624,7 +2624,6 @@ int btrfs_repair_one_sector(struct inode *inode,
const int icsum = bio_offset >> fs_info->sectorsize_bits;
struct bio *repair_bio;
struct btrfs_io_bio *repair_io_bio;
- blk_status_t status;
btrfs_debug(fs_info,
"repair read error: read error at %llu", start);
@@ -2664,13 +2663,13 @@ int btrfs_repair_one_sector(struct inode *inode,
"repair read error: submitting new read to mirror %d",
failrec->this_mirror);
- status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
- failrec->bio_flags);
- if (status) {
- free_io_failure(failure_tree, tree, failrec);
- bio_put(repair_bio);
- }
- return blk_status_to_errno(status);
+ /*
+ * At this point we have a bio, so any errors from submit_bio_hook()
+ * will be handled by the endio on the repair_bio, so we can't return an
+ * error here.
+ */
+ submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->bio_flags);
+ return BLK_STS_OK;
}
static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
--
2.34.1
The patch titled
Subject: mm: userfaultfd: correct dirty flags set for both present and swap pte
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: userfaultfd: correct dirty flags set for both present and swap pte
Date: Fri, 9 May 2025 10:09:12 +1200
As David pointed out, what truly matters for mremap and userfaultfd move
operations is the soft dirty bit. The current comment and
implementation, which always set the dirty bit for present PTEs and
fail to set the soft dirty bit for swap PTEs, are incorrect. This could
break features like Checkpoint-Restore in Userspace (CRIU), which relies
on the soft-dirty bits exposed via /proc/pid/pagemap to track changed
pages.
This patch updates the behavior to correctly set the soft dirty bit for
both present and swap PTEs, in accordance with mremap.
Link: https://lkml.kernel.org/r/20250508220912.7275-1-21cnbao@gmail.com
Fixes: adef440691bab ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redha…
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c~mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte
+++ a/mm/userfaultfd.c
@@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_st
src_folio->index = linear_page_index(dst_vma, dst_addr);
orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
- /* Follow mremap() behavior and treat the entry dirty after the move */
- orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
+ /* Set soft dirty bit so userspace can notice the pte was moved */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
+#endif
+ if (pte_dirty(orig_src_pte))
+ orig_dst_pte = pte_mkdirty(orig_dst_pte);
+ orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
out:
@@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struc
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+#endif
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte.patch
The report zones buffer size is currently limited by the HBA's
maximum segment count to ensure the buffer can be mapped. However,
the block layer further limits the number of iovec entries to
1024 when allocating a bio.
To avoid allocation of buffers too large to be mapped, further
restrict the maximum buffer size to BIO_MAX_INLINE_VECS.
Replace the UIO_MAXIOV symbolic name with the more contextually
appropriate BIO_MAX_INLINE_VECS.
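For example, with 4 KiB pages this caps the report buffer at
BIO_MAX_INLINE_VECS (i.e. UIO_MAXIOV, 1024) * 4 KiB = 4 MiB, regardless
of how many segments the HBA itself supports.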
Fixes: b091ac616846 ("sd_zbc: Fix report zones buffer allocation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Steve Siwinski <ssiwinski(a)atto.com>
---
block/bio.c | 2 +-
drivers/scsi/sd_zbc.c | 6 +++++-
include/linux/bio.h | 1 +
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 4e6c85a33d74..4be592d37fb6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -611,7 +611,7 @@ struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask)
{
struct bio *bio;
- if (nr_vecs > UIO_MAXIOV)
+ if (nr_vecs > BIO_MAX_INLINE_VECS)
return NULL;
return kmalloc(struct_size(bio, bi_inline_vecs, nr_vecs), gfp_mask);
}
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 7a447ff600d2..a8db66428f80 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -169,6 +169,7 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
unsigned int nr_zones, size_t *buflen)
{
struct request_queue *q = sdkp->disk->queue;
+ unsigned int max_segments;
size_t bufsize;
void *buf;
@@ -180,12 +181,15 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
* Furthermore, since the report zone command cannot be split, make
* sure that the allocated buffer can always be mapped by limiting the
* number of pages allocated to the HBA max segments limit.
+ * Since max segments can be larger than the max inline bio vectors,
+ * further limit the allocated buffer to BIO_MAX_INLINE_VECS.
*/
nr_zones = min(nr_zones, sdkp->zone_info.nr_zones);
bufsize = roundup((nr_zones + 1) * 64, SECTOR_SIZE);
bufsize = min_t(size_t, bufsize,
queue_max_hw_sectors(q) << SECTOR_SHIFT);
- bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+ max_segments = min(BIO_MAX_INLINE_VECS, queue_max_segments(q));
+ bufsize = min_t(size_t, bufsize, max_segments << PAGE_SHIFT);
while (bufsize >= SECTOR_SIZE) {
buf = kvzalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index cafc7c215de8..b786ec5bcc81 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -11,6 +11,7 @@
#include <linux/uio.h>
#define BIO_MAX_VECS 256U
+#define BIO_MAX_INLINE_VECS UIO_MAXIOV
struct queue_limits;
--
2.43.5
From: Barry Song <v-songbaohua(a)oppo.com>
As David pointed out, what truly matters for mremap and userfaultfd
move operations is the soft dirty bit. The current comment and
implementation, which always set the dirty bit for present PTEs
and fail to set the soft dirty bit for swap PTEs, are incorrect.
This could break features like Checkpoint-Restore in Userspace
(CRIU).
This patch updates the behavior to correctly set the soft dirty bit
for both present and swap PTEs, in accordance with mremap.
Reported-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redha…
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Fixes: adef440691bab ("userfaultfd: UFFDIO_MOVE uABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
---
mm/userfaultfd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e8ce92dc105f..bc473ad21202 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_struct *mm,
src_folio->index = linear_page_index(dst_vma, dst_addr);
orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
- /* Follow mremap() behavior and treat the entry dirty after the move */
- orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
+ /* Set soft dirty bit so userspace can notice the pte was moved */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
+#endif
+ if (pte_dirty(orig_src_pte))
+ orig_dst_pte = pte_mkdirty(orig_dst_pte);
+ orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
out:
@@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+#endif
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
--
2.39.3 (Apple Git-146)
With UBSAN enabled, we're getting the following trace:
UBSAN: array-index-out-of-bounds in .../drivers/clk/clk-s2mps11.c:186:3
index 0 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]')
This is because commit f316cdff8d67 ("clk: Annotate struct
clk_hw_onecell_data with __counted_by") annotated the hws member of
that struct with __counted_by, which informs the bounds sanitizer about
the number of elements in hws, so that it can warn when hws is accessed
out of bounds.
As noted in that change, the __counted_by member must be initialised
with the number of elements before the first array access happens,
otherwise there will be a warning from each access prior to the
initialisation because the number of elements is zero. This occurs in
s2mps11_clk_probe() due to ::num being assigned after ::hws access.
Move the assignment to satisfy the requirement of assign-before-access.
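As a minimal illustration of the assign-before-access contract (a
generic sketch, not the actual driver code):

	struct onecell {
		unsigned int num;
		struct clk_hw *hws[] __counted_by(num);
	};

	struct onecell *d = kzalloc(struct_size(d, hws, n), GFP_KERNEL);

	if (!d)
		return -ENOMEM;
	d->num = n;	/* set the element count first ... */
	d->hws[0] = hw;	/* ... so this access passes the bounds check */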
Cc: stable(a)vger.kernel.org
Fixes: f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
drivers/clk/clk-s2mps11.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/clk-s2mps11.c b/drivers/clk/clk-s2mps11.c
index 014db6386624071e173b5b940466301d2596400a..8ddf3a9a53dfd5bb52a05a3e02788a357ea77ad3 100644
--- a/drivers/clk/clk-s2mps11.c
+++ b/drivers/clk/clk-s2mps11.c
@@ -137,6 +137,8 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
if (!clk_data)
return -ENOMEM;
+ clk_data->num = S2MPS11_CLKS_NUM;
+
switch (hwid) {
case S2MPS11X:
s2mps11_reg = S2MPS11_REG_RTC_CTRL;
@@ -186,7 +188,6 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
clk_data->hws[i] = &s2mps11_clks[i].hw;
}
- clk_data->num = S2MPS11_CLKS_NUM;
of_clk_add_hw_provider(s2mps11_clks->clk_np, of_clk_hw_onecell_get,
clk_data);
---
base-commit: 9388ec571cb1adba59d1cded2300eeb11827679c
change-id: 20250326-s2mps11-ubsan-c90978e7bc04
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
Hi,
Pasi Kallinen reported in Debian a regression with the perf r5101c4
counter. It was initially found in
https://github.com/rr-debugger/rr/issues/3949 but is said to be a kernel
problem.
On Tue, May 06, 2025 at 07:18:39PM +0300, Pasi Kallinen wrote:
> Package: src:linux
> Version: 6.12.25-1
> Severity: normal
> X-Debbugs-Cc: debian-amd64(a)lists.debian.org, paxed(a)alt.org
> User: debian-amd64(a)lists.debian.org
> Usertags: amd64
>
> Dear Maintainer,
>
> perf stat -e r5101c4 true
>
> reports "not supported".
>
> The counters worked in kernel 6.11.10.
>
> I first noticed this not working when updating to 6.12.22.
> Booting back to 6.11.10, the counters work correctly.
Does this ring a bell?
Would you be able to bisect the changes to identify where the
behaviour changed?
Regards,
Salvatore
The quilt patch titled
Subject: x86/kexec: fix potential cmem->ranges out of bounds
has been removed from the -mm tree. Its filename was
x86-kexec-fix-potential-cmem-ranges-out-of-bounds.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: fuqiang wang <fuqiang.wang(a)easystack.cn>
Subject: x86/kexec: fix potential cmem->ranges out of bounds
Date: Mon, 8 Jan 2024 21:06:47 +0800
In memmap_exclude_ranges(), the elfheader will be excluded from
crashk_res. In the current x86 architecture code, the elfheader is
always allocated at crashk_res.start, so it seems that there won't be a
new split range. But that depends on the allocation position of the
elfheader within crashk_res. To avoid a potential out of bounds access
in the future, add an extra slot.
A similar issue also exists in fill_up_crash_elf_data(). The range to
be excluded is [0, 1M]; its start (0) is special and will not appear in
the middle of existing cmem->ranges[]. But in case the low 1M could be
changed in the future, add an extra slot there too.
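(To see why an exclusion can split a range: excluding [a, b] from an
existing range [s, e] with s < a and b < e leaves two ranges, [s, a - 1]
and [b + 1, e], i.e. one more entry than before.)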
Without this patch, the kdump kernel will fail to be loaded by
kexec_file_load:
[ 139.736948] UBSAN: array-index-out-of-bounds in arch/x86/kernel/crash.c:350:25
[ 139.742360] index 0 is out of range for type 'range [*]'
[ 139.745695] CPU: 0 UID: 0 PID: 5778 Comm: kexec Not tainted 6.15.0-0.rc3.20250425git02ddfb981de8.32.fc43.x86_64 #1 PREEMPT(lazy)
[ 139.745698] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[ 139.745699] Call Trace:
[ 139.745700] <TASK>
[ 139.745701] dump_stack_lvl+0x5d/0x80
[ 139.745706] ubsan_epilogue+0x5/0x2b
[ 139.745709] __ubsan_handle_out_of_bounds.cold+0x54/0x59
[ 139.745711] crash_setup_memmap_entries+0x2d9/0x330
[ 139.745716] setup_boot_parameters+0xf8/0x6a0
[ 139.745720] bzImage64_load+0x41b/0x4e0
[ 139.745722] ? find_next_iomem_res+0x109/0x140
[ 139.745727] ? locate_mem_hole_callback+0x109/0x170
[ 139.745737] kimage_file_alloc_init+0x1ef/0x3e0
[ 139.745740] __do_sys_kexec_file_load+0x180/0x2f0
[ 139.745742] do_syscall_64+0x7b/0x160
[ 139.745745] ? do_user_addr_fault+0x21a/0x690
[ 139.745747] ? exc_page_fault+0x7e/0x1a0
[ 139.745749] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 139.745751] RIP: 0033:0x7f7712c84e4d
Previously discussed links:
[1] https://lore.kernel.org/kexec/ZXk2oBf%2FT1Ul6o0c@MiWiFi-R3L-srv/
[2] https://lore.kernel.org/kexec/273284e8-7680-4f5f-8065-c5d780987e59@easystac…
[3] https://lore.kernel.org/kexec/ZYQ6O%2F57sHAPxTHm@MiWiFi-R3L-srv/
Link: https://lkml.kernel.org/r/20240108130720.228478-1-fuqiang.wang@easystack.cn
Signed-off-by: fuqiang wang <fuqiang.wang(a)easystack.cn>
Acked-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Coiby Xu <coxu(a)redhat.com>
Closes: https://lkml.kernel.org/r/4de3c2onosr7negqnfhekm4cpbklzmsimgdfv33c52dktqpza…
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: <x86(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/crash.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/crash.c~x86-kexec-fix-potential-cmem-ranges-out-of-bounds
+++ a/arch/x86/kernel/crash.c
@@ -165,8 +165,18 @@ static struct crash_mem *fill_up_crash_e
/*
* Exclusion of crash region and/or crashk_low_res may cause
* another range split. So add extra two slots here.
+ *
+ * Exclusion of low 1M may not cause another range split, because the
+ * range of exclude is [0, 1M] and the condition for splitting a new
+ * region is that the start, end parameters are both in a certain
+ * existing region in cmem and cannot be equal to existing region's
+ * start or end. Obviously, the start of [0, 1M] cannot meet this
+ * condition.
+ *
+ * But in case the low 1M could be changed in the future
+ * (e.g. [start, 1M]), add an extra slot.
*/
- nr_ranges += 2;
+ nr_ranges += 3;
cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
if (!cmem)
return NULL;
@@ -298,9 +308,16 @@ int crash_setup_memmap_entries(struct ki
struct crash_memmap_data cmd;
struct crash_mem *cmem;
- cmem = vzalloc(struct_size(cmem, ranges, 1));
+ /*
+ * In the current x86 architecture code, the elfheader is always
+ * allocated at crashk_res.start. But it depends on the allocation
+ * position of elfheader in crashk_res. To avoid potential out of
+ * bounds in the future, add an extra slot.
+ */
+ cmem = vzalloc(struct_size(cmem, ranges, 2));
if (!cmem)
return -ENOMEM;
+ cmem->max_nr_ranges = 2;
memset(&cmd, 0, sizeof(struct crash_memmap_data));
cmd.params = params;
_
Patches currently in -mm which might be from fuqiang.wang(a)easystack.cn are
From: Fabio Estevam <festevam(a)denx.de>
Since commit 2718f15403fb ("iio: sanity check available_scan_masks array"),
verbose and misleading warnings are printed for devices like the MAX11601:
max1363 1-0064: available_scan_mask 8 subset of 0. Never used
max1363 1-0064: available_scan_mask 9 subset of 0. Never used
max1363 1-0064: available_scan_mask 10 subset of 0. Never used
max1363 1-0064: available_scan_mask 11 subset of 0. Never used
max1363 1-0064: available_scan_mask 12 subset of 0. Never used
max1363 1-0064: available_scan_mask 13 subset of 0. Never used
...
[warnings continue]
The warnings incorrectly report that later scan masks are subsets of
the first one, even when they are not; the issue lies in the order in
which the subset check compares the masks. Fix the subset detection to
compare each mask only against the masks before it, and print the
warning only when an element of available_scan_masks really is a subset
of a previous one.
With this fix, the warning output becomes both correct and more
informative:
max1363 1-0064: Mask 7 (0xc) is a subset of mask 6 (0xf) and will be ignored
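(As a quick check of the subset test itself: (0xc & ~0xf) == 0, so 0xc
is correctly flagged as a subset of 0xf, while (0xc & ~0x3) == 0xc != 0,
so 0xc is not a subset of 0x3.)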
Cc: stable(a)vger.kernel.org
Fixes: 2718f15403fb ("iio: sanity check available_scan_masks array")
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
drivers/iio/industrialio-core.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 6a6568d4a2cb..855d5fd3e6b2 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1904,6 +1904,11 @@ static int iio_check_extended_name(const struct iio_dev *indio_dev)
static const struct iio_buffer_setup_ops noop_ring_setup_ops;
+static int is_subset(unsigned long a, unsigned long b)
+{
+ return (a & ~b) == 0;
+}
+
static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
{
unsigned int num_masks, masklength, longs_per_mask;
@@ -1947,21 +1952,13 @@ static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
* available masks in the order of preference (presumably the least
* costy to access masks first).
*/
- for (i = 0; i < num_masks - 1; i++) {
- const unsigned long *mask1;
- int j;
- mask1 = av_masks + i * longs_per_mask;
- for (j = i + 1; j < num_masks; j++) {
- const unsigned long *mask2;
-
- mask2 = av_masks + j * longs_per_mask;
- if (bitmap_subset(mask2, mask1, masklength))
+ for (i = 1; i < num_masks; ++i)
+ for (int j = 0; j < i; ++j)
+ if (is_subset(av_masks[i], av_masks[j]))
dev_warn(indio_dev->dev.parent,
- "available_scan_mask %d subset of %d. Never used\n",
- j, i);
- }
- }
+ "Mask %d (0x%lx) is a subset of mask %d (0x%lx) and will be ignored\n",
+ i, av_masks[i], j, av_masks[j]);
}
/**
--
2.34.1
Hi,
On 3/14/25 2:08 PM, Benjamin Berg wrote:
> From: Benjamin Berg <benjamin.berg(a)intel.com>
> um: work around sched_yield not yielding in time-travel mode
>
> sched_yield by a userspace may not actually cause scheduling in
> time-travel mode as no time has passed. In the case seen it appears to
> be a badly implemented userspace spinlock in ASAN. Unfortunately, with
> time-travel it causes an extreme slowdown or even deadlock depending on
> the kernel configuration (CONFIG_UML_MAX_USERSPACE_ITERATIONS).
>
> Work around it by accounting time to the process whenever it executes a
> sched_yield syscall.
>
> Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
From what I can tell the patch mentioned above was backported to 6.12.27 by:
<https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arc…>
but without the upstream
|Commit 0b8b2668f9981c1fefc2ef892bd915288ef01f33
|Author: Benjamin Berg <benjamin.berg(a)intel.com>
|Date: Thu Oct 10 16:25:37 2024 +0200
| um: insert scheduler ticks when userspace does not yield
|
| In time-travel mode userspace can do a lot of work without any time
| passing. Unfortunately, this can result in OOM situations as the RCU
| core code will never be run. [...]
the kernel build for 6.12.27 for the UM-Target will fail:
| /usr/bin/ld: arch/um/kernel/skas/syscall.o: in function `handle_syscall': linux-6.12.27/arch/um/kernel/skas/syscall.c:43:(.text+0xa2): undefined reference to `tt_extra_sched_jiffies'
| collect2: error: ld returned 1 exit status
Is it possible to backport 0b8b2668f9981c1fefc2ef892bd915288ef01f33 too?
Or is it better to revert 887c5c12e80c8424bd471122d2e8b6b462e12874 again
in the stable releases?
Best Regards,
Christian Lamparter
>
> ---
>
> I suspect it is this code in ASAN that uses sched_yield
> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_co…
> though there are also some other places that use sched_yield.
>
> I doubt that code is reasonable. At the same time, I'm not sure that
> sched_yield is behaving as advertised either, as it obviously is not
> necessarily relinquishing the CPU.
> ---
> arch/um/include/linux/time-internal.h | 2 ++
> arch/um/kernel/skas/syscall.c | 11 +++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/arch/um/include/linux/time-internal.h b/arch/um/include/linux/time-internal.h
> index b22226634ff6..138908b999d7 100644
> --- a/arch/um/include/linux/time-internal.h
> +++ b/arch/um/include/linux/time-internal.h
> @@ -83,6 +83,8 @@ extern void time_travel_not_configured(void);
> #define time_travel_del_event(...) time_travel_not_configured()
> #endif /* CONFIG_UML_TIME_TRAVEL_SUPPORT */
>
> +extern unsigned long tt_extra_sched_jiffies;
> +
> /*
> * Without CONFIG_UML_TIME_TRAVEL_SUPPORT this is a linker error if used,
> * which is intentional since we really shouldn't link it in that case.
> diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
> index b09e85279d2b..a5beaea2967e 100644
> --- a/arch/um/kernel/skas/syscall.c
> +++ b/arch/um/kernel/skas/syscall.c
> @@ -31,6 +31,17 @@ void handle_syscall(struct uml_pt_regs *r)
> goto out;
>
> syscall = UPT_SYSCALL_NR(r);
> +
> + /*
> + * If no time passes, then sched_yield may not actually yield, causing
> + * broken spinlock implementations in userspace (ASAN) to hang for long
> + * periods of time.
> + */
> + if ((time_travel_mode == TT_MODE_INFCPU ||
> + time_travel_mode == TT_MODE_EXTERNAL) &&
> + syscall == __NR_sched_yield)
> + tt_extra_sched_jiffies += 1;
> +
> if (syscall >= 0 && syscall < __NR_syscalls) {
> unsigned long ret = EXECUTE_SYSCALL(syscall, regs);
>