From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
When requesting pins whose intr_detection_width setting is not 1 or 2
for interrupts (for example by running `gpiomon -c 0 113` on RB2), we'll
hit a BUG() in msm_gpio_irq_set_type(). Potentially crashing the kernel
due to an invalid request from user-space is not optimal, so let's go
through the pins and mark those that would fail the check as invalid for
the irq chip as we should not even register them as available irqs.
This function can be extended if we determine that there are more
corner-cases like this.
Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/pinctrl/qcom/pinctrl-msm.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index f012ea88aa22c..77e0c2f023455 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -1038,6 +1038,24 @@ static bool msm_gpio_needs_dual_edge_parent_workaround(struct irq_data *d,
test_bit(d->hwirq, pctrl->skip_wake_irqs);
}
+static void msm_gpio_irq_init_valid_mask(struct gpio_chip *gc,
+ unsigned long *valid_mask,
+ unsigned int ngpios)
+{
+ struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+ const struct msm_pingroup *g;
+ int i;
+
+ bitmap_fill(valid_mask, ngpios);
+
+ for (i = 0; i < ngpios; i++) {
+ g = &pctrl->soc->groups[i];
+ if (g->intr_detection_width != 1 &&
+ g->intr_detection_width != 2)
+ clear_bit(i, valid_mask);
+ }
+}
+
static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -1441,6 +1459,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
girq->default_type = IRQ_TYPE_NONE;
girq->handler = handle_bad_irq;
girq->parents[0] = pctrl->irq;
+ girq->init_valid_mask = msm_gpio_irq_init_valid_mask;
ret = gpiochip_add_data(&pctrl->chip, pctrl);
if (ret) {
--
2.48.1
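The valid-mask idea from the commit message above can be sketched in plain userspace C. This is only an illustration: `struct pin_group`, the `mask_*` helpers and `init_valid_mask()` are hypothetical stand-ins, not the kernel's `struct msm_pingroup`, bitmap API or the patched function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a pin group: only the field the check looks at. */
struct pin_group {
	unsigned int intr_detection_width;
};

static void mask_set(unsigned long *mask, size_t bit)
{
	mask[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static void mask_clear(unsigned long *mask, size_t bit)
{
	mask[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

static int mask_test(const unsigned long *mask, size_t bit)
{
	return (int)((mask[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}

/*
 * Start with every pin marked valid, then clear the pins whose detection
 * width is neither 1 nor 2, so they are never offered as interrupts.
 */
static void init_valid_mask(unsigned long *mask,
			    const struct pin_group *groups, size_t ngpios)
{
	size_t i;

	assert(mask && groups);

	for (i = 0; i < ngpios; i++)
		mask_set(mask, i);

	for (i = 0; i < ngpios; i++) {
		unsigned int w = groups[i].intr_detection_width;

		if (w != 1 && w != 2)
			mask_clear(mask, i);
	}
}
```

With groups whose widths are 1, 2, 0 and 3, only the first two bits stay set, so a request for the last two pins as interrupts would be rejected up front rather than reaching the type-setting path.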
From: Dan Aloni <dan.aloni(a)vastdata.com>
[ Upstream commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33 ]
If there are failures then we must not leave the non-NULL pointers with
the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries
to free them, resulting in an Oops.
Signed-off-by: Dan Aloni <dan.aloni(a)vastdata.com>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
(cherry picked from commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33)
[Larry: backport to 5.4.y. Minor conflict resolved due to missing commit 93aa8e0a9de80
xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep]
Signed-off-by: Larry Bassel <larry.bassel(a)oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cfae1a871578..4fd3f632a2af 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -525,6 +525,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
+ sendcq = NULL;
goto out1;
}
@@ -533,6 +534,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(recvcq)) {
rc = PTR_ERR(recvcq);
+ recvcq = NULL;
goto out2;
}
--
2.46.0
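The failure mode described in the commit message above (an error-encoded pointer left behind for a teardown path that frees anything non-NULL) can be reproduced in a small userspace sketch. Everything here is hypothetical: `err_ptr()`, `is_err()` and `ptr_err()` are simplified imitations of the kernel's ERR_PTR helpers, and `struct endpoint`, `cq_alloc()`, `endpoint_create()` and `endpoint_destroy()` are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical endpoint with two resources, each created separately. */
struct endpoint {
	int *sendcq;
	int *recvcq;
};

/* Hypothetical allocator that can be told to fail with an error pointer. */
static int *cq_alloc(int fail)
{
	if (fail)
		return err_ptr(-ENOMEM);
	return calloc(1, sizeof(int));
}

/* Teardown frees whatever is non-NULL, so it must never see an error pointer. */
static void endpoint_destroy(struct endpoint *ep)
{
	assert(ep);
	free(ep->sendcq);
	ep->sendcq = NULL;
	free(ep->recvcq);
	ep->recvcq = NULL;
}

static int endpoint_create(struct endpoint *ep, int fail_send, int fail_recv)
{
	int rc;

	assert(ep);
	ep->sendcq = NULL;
	ep->recvcq = NULL;

	ep->sendcq = cq_alloc(fail_send);
	if (is_err(ep->sendcq)) {
		rc = (int)ptr_err(ep->sendcq);
		ep->sendcq = NULL;	/* do not leave the error value behind */
		goto out;
	}

	ep->recvcq = cq_alloc(fail_recv);
	if (is_err(ep->recvcq)) {
		rc = (int)ptr_err(ep->recvcq);
		ep->recvcq = NULL;	/* same for the second resource */
		goto out;
	}
	return 0;

out:
	endpoint_destroy(ep);
	return rc;
}
```

Without the two `= NULL` assignments, `endpoint_destroy()` would pass an error-encoded value to `free()`, which is the userspace analogue of the Oops described above.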
Hi Kees,
Bill's PR to disable __counted_by for "whole struct" __bdos cases has now
been merged into 19.1.3 [1], so here's the patch to disable __counted_by
for clang versions < 19.1.3 in the kernel.
Hopefully in the near future __counted_by for whole struct __bdos can be
enabled once again in coordination between the kernel, gcc, and clang.
There has been recent progress on this in [2] thanks to Tavian.
Also see previous discussion on the mailing list [3]
Thanks to everyone for moving this issue along. In particular, Bill for
his PR to clang/llvm, Kees and Thorsten for reproducers of the two issues,
Nathan for Kconfig-ifying this patch, and Miguel for reviewing.
Info for the stable team:
This patch should be backported to kernels >= 6.6 to make sure that those
build correctly with the affected clang versions. This patch cherry-picks
cleanly onto linux-6.11.y. For linux-6.6.y three prerequisite commits are
needed:
16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version
2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL
231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support
There are still two merge conflicts even with those prerequisites.
Here's the correct resolution:
1. include/linux/compiler_types.h:
use the incoming change until before (but not including) the
"Apply __counted_by() when the Endianness matches to increase test coverage."
comment
2. lib/overflow_kunit.c:
HEAD is correct
[1] https://github.com/llvm/llvm-project/pull/112786
[2] https://github.com/llvm/llvm-project/pull/112636
[3] https://lore.kernel.org/lkml/3E304FB2-799D-478F-889A-CDFC1A52DCD8@toblux.co…
Best Regards
Jan
Jan Hendrik Farr (1):
Compiler Attributes: disable __counted_by for clang < 19.1.3
drivers/misc/lkdtm/bugs.c | 2 +-
include/linux/compiler_attributes.h | 13 -------------
include/linux/compiler_types.h | 19 +++++++++++++++++++
init/Kconfig | 9 +++++++++
lib/overflow_kunit.c | 2 +-
5 files changed, 30 insertions(+), 15 deletions(-)
--
2.47.0
Hi Greg,
Please cherry-pick this patch series into 5.10.y stable. It
includes a feature that fixes CVE-2022-0500 which allows a user with
cap_bpf privileges to get root privileges. The patch that fixes
the bug is
patch 6/8: bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM
The rest are the dependencies required by the fix patch.
This patchset has been merged in mainline v5.17 and backported to v5.16[1]
and v5.15[2]
Tested by compiling, building and running the bpf selftest test_progs.
Before:
./test_progs -t ksyms_btf/write_check
test_ksyms_btf:PASS:btf_exists 0 nsec
test_write_check:FAIL:skel_open unexpected load of a prog writing to ksym memory
#44/3 write_check:FAIL
#44 ksyms_btf:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 2 FAILED
After:
./test_progs -t ksyms_btf/write_check
#44/3 write_check:OK
#44 ksyms_btf:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
[1] https://lore.kernel.org/all/Yg6cixLJFoxDmp+I@kroah.com/
[2] https://lore.kernel.org/all/Ymupcl2JshcWjmMD@kroah.com/
Hao Luo (8):
bpf: Introduce composable reg, ret and arg types.
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
bpf: Introduce MEM_RDONLY flag
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
bpf/selftests: Test PTR_TO_RDONLY_MEM
include/linux/bpf.h | 98 +++-
include/linux/bpf_verifier.h | 18 +
kernel/bpf/btf.c | 8 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/helpers.c | 10 +-
kernel/bpf/map_iter.c | 4 +-
kernel/bpf/ringbuf.c | 2 +-
kernel/bpf/verifier.c | 477 +++++++++---------
kernel/trace/bpf_trace.c | 22 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 62 +--
net/core/sock_map.c | 2 +-
.../selftests/bpf/prog_tests/ksyms_btf.c | 14 +
.../bpf/progs/test_ksyms_btf_write_check.c | 29 ++
14 files changed, 441 insertions(+), 309 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf_write_check.c
--
2.47.1
Hello,
this is a followup to
https://lore.kernel.org/stable/cover.1749223334.git.u.kleine-koenig@baylibr…
that handled backporting the two patches by Alexandre to the active
stable kernels between 6.15 and 5.15. Here comes a backport to 5.10.y; git
am handles application to 5.4.y just fine.
Compared to the backport for later kernels I included a major rework of
rtc_time64_to_tm() by Cassio Neri. (FTR: I checked, that commit by
Cassio Neri isn't the reason we need to fix rtc_time64_to_tm(), the
actual problem is older.)
Now that I completed the backport and did some final checks on it, I
noticed that the problem fixed here is (TTBOMK) a theoretical one because
only drivers with .start_secs < 0 are known to have issues and in 5.10
and before there is no such driver. I'm uncertain whether this should
result in not backporting the changes. I would tend to pick them anyhow,
but I won't argue against a veto.
Best regards
Uwe
Alexandre Mergnat (2):
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: Fix offset calculation for .start_secs < 0
Cassio Neri (1):
rtc: Improve performance of rtc_time64_to_tm(). Add tests.
drivers/rtc/Kconfig | 10 ++++
drivers/rtc/Makefile | 1 +
drivers/rtc/class.c | 2 +-
drivers/rtc/lib.c | 121 ++++++++++++++++++++++++++++++++---------
drivers/rtc/lib_test.c | 79 +++++++++++++++++++++++++++
5 files changed, 185 insertions(+), 28 deletions(-)
create mode 100644 drivers/rtc/lib_test.c
base-commit: 01e7e36b8606e5d4fddf795938010f7bfa3aa277
--
2.49.0
From: Jakub Kicinski <kuba(a)kernel.org>
commit f22b4b55edb507a2b30981e133b66b642be4d13f upstream.
I find the behavior of xa_for_each_start() slightly counter-intuitive.
It doesn't end the iteration by making the index point after the last
element. IOW calling xa_for_each_start() again after it "finished"
will run the body of the loop for the last valid element, instead
of doing nothing.
This works fine for netlink dumps if they terminate correctly
(i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
reminded legacy dumps are unlikely to go away.
Fixing this generically at the xa_for_each_start() level seems hard -
there is no index reserved for "end of iteration".
ifindexes are 31b wide, tho, and iterator is ulong so for
for_each_netdev_dump() it's safe to go to the next element.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Jeremy Kerr <jk(a)codeconstruct.com.au>
---
The mctp RTM_GETADDR rework backport of acab78ae12c7 ("net: mctp: Don't
access ifa_index when missing") pulled 2d45eeb7d5d7 ("mctp: no longer
rely on net->dev_index_head[]") as a dependency. However, that change
relies on this backport for correct behaviour of for_each_netdev_dump().
Jakub mentions[1] that nothing should be relying on the old behaviour of
for_each_netdev_dump(), hence the backport.
[1]: https://lore.kernel.org/netdev/20250609083749.741c27f5@kernel.org/
This backport is only applicable to 6.6.y; the change hit upstream in
6.10.
---
include/linux/netdevice.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0b0a172337dbac5716e5e5556befd95b4c201f5b..030d9de2ba2d23aa80b4b02182883f022f553964 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3036,7 +3036,8 @@ extern rwlock_t dev_base_lock; /* Device list lock */
#define net_device_entry(lh) list_entry(lh, struct net_device, dev_list)
#define for_each_netdev_dump(net, d, ifindex) \
- xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex))
+ for (; (d = xa_find(&(net)->dev_by_index, &ifindex, \
+ ULONG_MAX, XA_PRESENT)); ifindex++)
static inline struct net_device *next_net_device(struct net_device *dev)
{
---
base-commit: c2603c511feb427b2b09f74b57816a81272932a1
change-id: 20250610-nl-dump-618700905d4f
Best regards,
--
Jeremy Kerr <jk(a)codeconstruct.com.au>
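The resume behaviour described in the commit message above can be modelled in userspace. The sketch below is a simplified model and not the xarray implementation: a sorted array stands in for the index space, and `find_from()`, `find_after()`, `dump_old()` and `dump_new()` are hypothetical names. It assumes, as the commit message states, that the old iteration leaves the cursor on the last valid element when it finishes.

```c
#include <assert.h>
#include <stddef.h>

/* Sorted, ascending list of indices that are present. */
struct table {
	const unsigned long *keys;
	size_t n;
};

/* Smallest key >= *idx: store it in *idx and return 1, else return 0. */
static int find_from(const struct table *t, unsigned long *idx)
{
	size_t i;

	for (i = 0; i < t->n; i++) {
		if (t->keys[i] >= *idx) {
			*idx = t->keys[i];
			return 1;
		}
	}
	return 0;
}

/* Smallest key > *idx: store it in *idx and return 1, else leave *idx alone. */
static int find_after(const struct table *t, unsigned long *idx)
{
	size_t i;

	for (i = 0; i < t->n; i++) {
		if (t->keys[i] > *idx) {
			*idx = t->keys[i];
			return 1;
		}
	}
	return 0;
}

/* Old style: when the loop ends, the cursor still names the last element. */
static size_t dump_old(const struct table *t, unsigned long *cursor)
{
	size_t visited = 0;
	int found;

	assert(t && cursor);
	for (found = find_from(t, cursor); found; found = find_after(t, cursor))
		visited++;
	return visited;
}

/* New style: the cursor is bumped past each visited element. */
static size_t dump_new(const struct table *t, unsigned long *cursor)
{
	size_t visited = 0;

	assert(t && cursor);
	for (; find_from(t, cursor); (*cursor)++)
		visited++;
	return visited;
}
```

Calling `dump_old()` a second time with the same cursor visits the last element again, while a second `dump_new()` call visits nothing. The increment is only safe when the cursor type is wider than the largest valid index, which is the 31-bit ifindex versus ulong argument made in the commit message.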
Fix compilation warning:
In file included from ./include/linux/kernel.h:15,
from ./include/linux/list.h:9,
from ./include/linux/module.h:12,
from net/ipv4/inet_hashtables.c:12:
net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
./include/linux/minmax.h:20:35: warning: comparison of distinct pointer types lacks a cast
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^~
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
26 | (__typecheck(x, y) && __no_side_effects(x, y))
| ^~~~~~~~~~~
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
./include/linux/minmax.h:52:25: note: in expansion of macro ‘__careful_cmp’
52 | #define max(x, y) __careful_cmp(x, y, >)
| ^~~~~~~~~~~~~
net/ipv4/inet_hashtables.c:946:19: note: in expansion of macro ‘max’
946 | nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
| ^~~
CC block/badblocks.o
When warnings are treated as errors, this causes the build to fail.
The issue is a type mismatch between the operands passed to the max()
macro. Here, nblocks is an unsigned int, while the expression
num_online_nodes() * PAGE_SIZE / locksz is promoted to unsigned long.
This happens because:
- num_online_nodes() returns int
- PAGE_SIZE is typically defined as an unsigned long (depending on the
architecture)
- locksz is unsigned int
The resulting arithmetic expression is promoted to unsigned long.
Thus, the max() macro compares values of different types: unsigned int
vs unsigned long.
This issue was introduced in commit f8ece40786c9 ("tcp: bring back NUMA
dispersion in inet_ehash_locks_alloc()") during the update from kernel
v5.10.237 to v5.10.238.
It does not exist in newer kernel branches (e.g., v5.15.185 and all 6.x
branches), because they include commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness.")
Fix the issue by using max_t(unsigned int, ...) to explicitly cast both
operands to the same type, avoiding the type mismatch and ensuring
correctness.
Fixes: f8ece40786c9 ("tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()")
Signed-off-by: Eliav Farber <farbere(a)amazon.com>
---
V1 -> V2: Use upstream commit SHA1 in reference
net/ipv4/inet_hashtables.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index fea74ab2a4be..ac2d185c04ef 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -943,7 +943,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U) * num_possible_cpus();
/* At least one page per NUMA node. */
- nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
+ nblocks = max_t(unsigned int, nblocks, num_online_nodes() * PAGE_SIZE / locksz);
nblocks = roundup_pow_of_two(nblocks);
--
2.47.1
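The promotion argument in the commit message above can be checked in a standalone C11 file. `MAX_T` below is a simplified userspace stand-in for the kernel's `max_t()` (it evaluates its arguments more than once, which the kernel macro avoids), and `ehash_nblocks()` is a hypothetical helper that only mirrors the shape of the fixed expression.

```c
#include <assert.h>

/* Simplified stand-in for max_t(): cast both operands to one type first. */
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/*
 * int * unsigned long / unsigned int is evaluated as unsigned long, which is
 * why a strict type-checking max() sees two different operand types.
 */
static_assert(_Generic((int)1 * 4096UL / 4U, unsigned long: 1, default: 0),
	      "mixed expression is promoted to unsigned long");

static unsigned int ehash_nblocks(unsigned int nblocks, int online_nodes,
				  unsigned long page_size, unsigned int locksz)
{
	/* At least one page worth of locks per node, compared as unsigned int. */
	return MAX_T(unsigned int, nblocks, online_nodes * page_size / locksz);
}
```

The cast narrows the `unsigned long` operand to `unsigned int`, so this form is only appropriate when the value is known to fit in that type.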
Use common wrappers operating directly on the struct sg_table objects to
fix incorrect use of scatterlists sync calls. dma_sync_sg_for_*()
functions have to be called with the number of elements originally passed
to dma_map_sg_*() function, not the one returned in sgtable's nents.
Fixes: 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
CC: stable(a)vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
---
drivers/dma-buf/udmabuf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 7eee3eb47a8e..c9d0c68d2fcb 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -264,8 +264,7 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
ubuf->sg = NULL;
}
} else {
- dma_sync_sg_for_cpu(dev, ubuf->sg->sgl, ubuf->sg->nents,
- direction);
+ dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
}
return ret;
@@ -280,7 +279,7 @@ static int end_cpu_udmabuf(struct dma_buf *buf,
if (!ubuf->sg)
return -EINVAL;
- dma_sync_sg_for_device(dev, ubuf->sg->sgl, ubuf->sg->nents, direction);
+ dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
return 0;
}
--
2.34.1
Running as a Xen PV guest uncovered some bugs when ITS mitigation is
active.
Juergen Gross (3):
x86/execmem: don't use PAGE_KERNEL protection for code pages
x86/mm/pat: don't collapse pages without PSE set
x86/alternative: make kernel ITS thunks read-only
arch/x86/kernel/alternative.c | 16 ++++++++++++++++
arch/x86/mm/init.c | 2 +-
arch/x86/mm/pat/set_memory.c | 3 +++
3 files changed, 20 insertions(+), 1 deletion(-)
--
2.43.0
Hello Cassio,
thanks for your input.
On Tue, Jun 10, 2025 at 09:31:48PM +0100, Cassio Neri wrote:
> Although untested, I'm pretty sure that with very small changes, the
> previous revision (1d1bb12) can handle dates prior to 1970-01-01 with no
> need to add extra branches or arithmetic operations. Indeed, 1d1bb12
> contains:
>
> <code>
> /* time must be positive */
> days = div_s64_rem(time, 86400, &secs);
>
> /* day of the week, 1970-01-01 was a Thursday */
> tm->tm_wday = (days + 4) % 7;
>
> /* long comments */
>
> udays = ((u32) days) + 719468;
> </code>
>
> This could have been changed to:
>
> <code>
> /* time must be >= -719468 * 86400 which corresponds to 0000-03-01 */
> udays = div_u64_rem(time + 719468 * 86400, 86400, &secs);
>
> /* day of the week, 0000-03-01 was a Wednesday (in the proleptic Gregorian
> calendar) */
> tm->tm_wday = (days + 3) % 7;
>
> /* long comments */
> </code>
>
> Indeed, the addition of 719468 * 86400 to `time` makes `days` to be 719468
> more than it should be. Therefore, in the calculation of `udays`, the
> addition of 719468 becomes unnecessary and thus, `udays == days`. Moreover,
> this means that `days` can be removed altogether and replaced by `udays`.
> (Not the other way around because in the remaining code `udays` must be
> u32.)
>
> Now, 719468 % 7 = 1 and thus tm->wday is 1 day after what it should be and
> we correct that by adding 3 instead of 4.
>
> Therefore, I suggest these changes on top of 1d1bb12 instead of those made
> in 7df4cfe. Since you're working on this, can I please kindly suggest two
> other changes?
It's too late for "instead", and we're discussing a backport to stable
for a commit that is already in v6.16-rc1.
While your concerns are correct (though I didn't check the details yet),
I claim that 7df4cfef8b35 is correct and it's the right thing to
backport that today. Incremental changes can then go in the development
version (and backported if deemed necessary).
> 1) Change the reference provided in the long comment. It should say, "The
> following algorithm is, basically, Figure 12 of Neri and Schneider [1]" and
> [1] should refer to the published article:
>
> Neri C, Schneider L. Euclidean affine functions and their application to
> calendar algorithms. Softw Pract Exper. 2023;53(4):937-970. doi:
> 10.1002/spe.3172
> https://doi.org/10.1002/spe.3172
>
> The article is much better written and clearer than the pre-print currently
> referred to.
I'll add that to my todo list. (that = improving rtc_time64_to_tm() and
reading your paper :-)
> 2) Function rtc_time64_to_tm_test_date_range in drivers/rtc/lib_test.c, is
> a kunit test that checks the result for everyday in a 160000 years range
> starting at 1970-01-01. It'd be nice if this test is adapted to the new
> code and starts at 1900-01-01 (technically, it could start at 0000-03-01
> but since tm->year counts from 1900, it would be weird to see tm->year ==
> -1900 to mean that the calendar year is 0.) Also 160000 is definitely an
> overkill (my bad!) and a couple of thousands of years, say 3000, should be
> more than safe for anyone. :-)
I already did 2), see https://git.kernel.org/linus/ccb2dba3c19f.
Best regards
Uwe
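The epoch-shift suggestion quoted in this thread can be checked with a small userspace sketch. It is an illustration under stated assumptions, not the kernel's `rtc_time64_to_tm()`: `struct day_split` and `split_time()` are hypothetical names, the weekday is derived from `udays`, and the constants are spelled as 64-bit values because `719468 * 86400` does not fit in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY     INT64_C(86400)
#define EPOCH_SHIFT_DAYS INT64_C(719468) /* days from 0000-03-01 to 1970-01-01 */

struct day_split {
	uint32_t udays;	/* days since 0000-03-01 */
	uint32_t secs;	/* seconds into that day */
	int wday;	/* 0 = Sunday ... 6 = Saturday */
};

/* time must be >= -719468 * 86400, which corresponds to 0000-03-01. */
static struct day_split split_time(int64_t time)
{
	struct day_split r;
	uint64_t shifted;

	assert(time >= -EPOCH_SHIFT_DAYS * SECS_PER_DAY);

	/* Shifting first makes the dividend non-negative, so plain unsigned
	 * division and remainder are enough. */
	shifted = (uint64_t)(time + EPOCH_SHIFT_DAYS * SECS_PER_DAY);
	r.udays = (uint32_t)(shifted / (uint64_t)SECS_PER_DAY);
	r.secs = (uint32_t)(shifted % (uint64_t)SECS_PER_DAY);

	/* 719468 % 7 == 1, so "+ 3" on udays equals "+ 4" on days since 1970. */
	r.wday = (int)((r.udays + 3) % 7);
	return r;
}
```

For `time == 0` this gives a Thursday, and for `time == -1` it gives 86399 seconds into the preceding Wednesday, without any extra branch for negative input.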
From: Dave Hansen <dave.hansen(a)linux.intel.com>
PTI uses separate ASIDs (aka. PCIDs) for kernel and user address
spaces. When the kernel needs to flush the user address space, it
just sets a bit in a bitmap and then flushes the entire PCID on
the next switch to userspace.
This bitmap is a single 'unsigned long', which is plenty for
all 6 dynamic ASIDs. Unfortunately, the INVLPGB support
brings along a bunch more user ASIDs, as many as ~2k more. The
bitmap can't address that many.
Fortunately, the bitmap is only needed for PTI and all the CPUs
with INVLPGB are AMD CPUs that aren't vulnerable to Meltdown and
don't need PTI. The only way someone can run into an issue in
practice is by booting with pti=on on a newer AMD CPU.
Disable INVLPGB if PTI is enabled. Avoid overrunning the small
bitmap.
Note: this will be fixed up properly by making the bitmap bigger.
For now, just avoid the mostly theoretical bug.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes")
Cc: stable(a)vger.kernel.org
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
---
b/arch/x86/mm/pti.c | 5 +++++
1 file changed, 5 insertions(+)
diff -puN arch/x86/mm/pti.c~no-INVLPGB-plus-KPTI arch/x86/mm/pti.c
--- a/arch/x86/mm/pti.c~no-INVLPGB-plus-KPTI 2025-06-10 15:02:14.439554339 -0700
+++ b/arch/x86/mm/pti.c 2025-06-10 15:09:47.713198206 -0700
@@ -98,6 +98,11 @@ void __init pti_check_boottime_disable(v
return;
setup_force_cpu_cap(X86_FEATURE_PTI);
+
+ if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+ pr_debug("PTI enabled, disabling INVLPGB\n");
+ setup_clear_cpu_cap(X86_FEATURE_INVLPGB);
+ }
}
static int __init pti_parse_cmdline(char *arg)
_
In mt_perf_to_adistance(), the calculation of abstract distance (adist)
involves multiplying several int values including MEMTIER_ADISTANCE_DRAM.
```
*adist = MEMTIER_ADISTANCE_DRAM *
(perf->read_latency + perf->write_latency) /
(default_dram_perf.read_latency + default_dram_perf.write_latency) *
(default_dram_perf.read_bandwidth + default_dram_perf.write_bandwidth) /
(perf->read_bandwidth + perf->write_bandwidth);
```
Since these values can be large, the multiplication may exceed the maximum
value of an int (INT_MAX) and overflow (as it did on our platform), leading
to an incorrect adist.
User-visible impact:
The memory tiering subsystem will misinterpret slow memory (like CXL)
as faster than DRAM, causing inappropriate demotion of pages from
CXL (slow memory) to DRAM (fast memory).
For example, we will see the following demotion chains from the dmesg, where
Node0,1 are DRAM, and Node2,3 are CXL node:
Demotion targets for Node 0: null
Demotion targets for Node 1: null
Demotion targets for Node 2: preferred: 0-1, fallback: 0-1
Demotion targets for Node 3: preferred: 0-1, fallback: 0-1
Change MEMTIER_ADISTANCE_DRAM to be a long constant by writing it with the
'L' suffix. This prevents the overflow because the multiplication will then
be done in the long type which has a larger range.
Fixes: 3718c02dbd4c ("acpi, hmat: calculate abstract distance with HMAT")
Cc: stable(a)vger.kernel.org
Reviewed-by: Huang Ying <ying.huang(a)linux.alibaba.com>
Acked-by: Balbir Singh <balbirs(a)nvidia.com>
Reviewed-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
---
V2:
Document the 'User-visible impact' # Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memory-tiers.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 0dc0cf2863e2..7a805796fcfd 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -18,7 +18,7 @@
* adistance value (slightly faster) than default DRAM adistance to be part of
* the same memory tier.
*/
-#define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
+#define MEMTIER_ADISTANCE_DRAM ((4L * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))
struct memory_tier;
struct memory_dev_type {
--
2.41.0
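The effect of the 'L' suffix can be illustrated with a userspace sketch. All values are assumptions made for the example: `CHUNK_BITS` is taken to be 10, `struct perf` is a hypothetical stand-in with plain int fields, the latency and bandwidth numbers are invented, and long is assumed to be 64 bits wide (LP64).

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for illustration only. */
#define CHUNK_BITS 10
#define CHUNK_SIZE (1 << CHUNK_BITS)
#define ADISTANCE_DRAM_INT  ((4 * CHUNK_SIZE) + (CHUNK_SIZE >> 1))
#define ADISTANCE_DRAM_LONG ((4L * CHUNK_SIZE) + (CHUNK_SIZE >> 1))

/* Hypothetical performance record with int-typed fields. */
struct perf {
	int read_latency, write_latency;
	int read_bandwidth, write_bandwidth;
};

/*
 * Reports whether the first multiplication would exceed INT_MAX if done in
 * int. The product is computed in long long here, so the check itself never
 * overflows.
 */
static int first_product_overflows_int(const struct perf *p)
{
	long long prod = (long long)ADISTANCE_DRAM_INT *
			 ((long long)p->read_latency + p->write_latency);

	return prod > INT_MAX;
}

/* Same expression shape as the commit message, evaluated in long. */
static long adistance(const struct perf *p, const struct perf *dram)
{
	assert(dram->read_latency + dram->write_latency > 0);
	assert(p->read_bandwidth + p->write_bandwidth > 0);

	return ADISTANCE_DRAM_LONG *
		(p->read_latency + p->write_latency) /
		(dram->read_latency + dram->write_latency) *
		(dram->read_bandwidth + dram->write_bandwidth) /
		(p->read_bandwidth + p->write_bandwidth);
}
```

Because the leftmost operand is a long, every later multiplication and division in the chain is also carried out in long. With the invented numbers used in the test, the slow device ends up with a larger distance than DRAM, as intended, even though the first product alone no longer fits in an int.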
The patch titled
Subject: maple_tree: restart walk on correct status
has been added to the -mm mm-new branch. Its filename is
maple_tree-restart-walk-on-correct-status.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: maple_tree: restart walk on correct status
Date: Wed, 11 Jun 2025 01:12:52 +0000
Commit a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW
states") adds more statuses to the maple tree walk, but it introduced a
typo in the status check during the walk.
The intent is to restart the walk only when the state is neither active
nor start, but the current code always restarts the walk.
Link: https://lkml.kernel.org/r/20250611011253.19515-3-richard.weiyang@gmail.com
Fixes: a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/maple_tree.c~maple_tree-restart-walk-on-correct-status
+++ a/lib/maple_tree.c
@@ -4930,7 +4930,7 @@ void *mas_walk(struct ma_state *mas)
{
void *entry;
- if (!mas_is_active(mas) || !mas_is_start(mas))
+ if (!mas_is_active(mas) && !mas_is_start(mas))
mas->status = ma_start;
retry:
entry = mas_state_walk(mas);
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch
maple_tree-restart-walk-on-correct-status.patch
maple_tree-assert-retrieving-new-value-on-a-tree-containing-just-a-leaf-node.patch
The patch titled
Subject: maple_tree: fix mt_destroy_walk() on root leaf node
has been added to the -mm mm-new branch. Its filename is
maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: maple_tree: fix mt_destroy_walk() on root leaf node
Date: Wed, 11 Jun 2025 01:12:51 +0000
Patch series "maple_tree: Fix the replacement of a root leaf node", v3.
On destroy we should set each node dead. But the current code misses this
when the maple tree has only the root node.
The reason is that mt_destroy_walk() leverages mte_destroy_descend() to
set the node dead, but this is skipped since the only root node is a leaf.
Patch 1 fixes this.
When adding a test case, I found we always get the new value even when we
leave the old root node not dead. It turns out we always re-walk the tree
in mas_walk(). It looks like a typo in the status check of mas_walk().
Patch 2 fixes this.
Patch 3 adds a test case to assert that the new value is retrieved when
overwriting the whole range of a tree with only a root node.
This patch (of 3):
On destroy, we should set each node dead. But the current code misses this
when the maple tree has only the root node.
The reason is that mt_destroy_walk() leverages mte_destroy_descend() to
set the node dead, but this is skipped since the only root node is a leaf.
Fix this by setting the node dead if it is a leaf.
Link: https://lkml.kernel.org/r/20250611011253.19515-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20250611011253.19515-2-richard.weiyang@gmail.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 1 +
1 file changed, 1 insertion(+)
--- a/lib/maple_tree.c~maple_tree-fix-mt_destroy_walk-on-root-leaf-node
+++ a/lib/maple_tree.c
@@ -5319,6 +5319,7 @@ static void mt_destroy_walk(struct maple
struct maple_enode *start;
if (mte_is_leaf(enode)) {
+ mte_set_node_dead(enode);
node->type = mte_node_type(enode);
goto free_leaf;
}
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
maple_tree-fix-mt_destroy_walk-on-root-leaf-node.patch
maple_tree-restart-walk-on-correct-status.patch
maple_tree-assert-retrieving-new-value-on-a-tree-containing-just-a-leaf-node.patch
The num_cpu and feature properties are read-only once the eiointc device
is created; they are set through the KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL
attr group before device creation.
Attr group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS updates register and
software state for migration and reset. The num_cpu and feature
properties cannot be updated again once the device has been created.
Discard write operations on the num_cpu and feature properties in attr
group KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS.
Cc: stable(a)vger.kernel.org
Fixes: 1ad7efa552fd ("LoongArch: KVM: Add EIOINTC user mode read and write functions")
Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
arch/loongarch/kvm/intc/eiointc.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/loongarch/kvm/intc/eiointc.c b/arch/loongarch/kvm/intc/eiointc.c
index 0b648c56b0c3..b48511f903b5 100644
--- a/arch/loongarch/kvm/intc/eiointc.c
+++ b/arch/loongarch/kvm/intc/eiointc.c
@@ -910,9 +910,22 @@ static int kvm_eiointc_sw_status_access(struct kvm_device *dev,
data = (void __user *)attr->addr;
switch (addr) {
case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_NUM_CPU:
+ /*
+ * Property num_cpu and feature is read-only once eiointc is
+ * created with KVM_DEV_LOONGARCH_EXTIOI_GRP_CTRL group API
+ *
+ * Disable writing with KVM_DEV_LOONGARCH_EXTIOI_GRP_SW_STATUS
+ * group API
+ */
+ if (is_write)
+ return ret;
+
p = &s->num_cpu;
break;
case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_FEATURE:
+ if (is_write)
+ return ret;
+
p = &s->features;
break;
case KVM_DEV_LOONGARCH_EXTIOI_SW_STATUS_STATE:
--
2.39.3
Commit a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW
states") adds more statuses to the maple tree walk, but it introduced a
typo in the status check during the walk.
The intent is to restart the walk only when the state is neither active
nor start, but the current code always restarts the walk.
Fixes: a8091f039c1e ("maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
---
lib/maple_tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index b0c345b6e646..7144dbbc3481 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4930,7 +4930,7 @@ void *mas_walk(struct ma_state *mas)
{
void *entry;
- if (!mas_is_active(mas) || !mas_is_start(mas))
+ if (!mas_is_active(mas) && !mas_is_start(mas))
mas->status = ma_start;
retry:
entry = mas_state_walk(mas);
--
2.34.1
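Why the old condition always restarted the walk follows from the two predicates being mutually exclusive: a state cannot be both active and start, so at least one negation is always true. A minimal userspace sketch, with a hypothetical `enum status` and helper names that only imitate the maple tree state checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of walk states, for illustration only. */
enum status { ST_ACTIVE, ST_START, ST_NONE, ST_OVERFLOW, ST_COUNT };

static bool is_active(enum status s) { return s == ST_ACTIVE; }
static bool is_start(enum status s)  { return s == ST_START; }

/* Old check: true for every state, because no state is both at once. */
static bool restart_buggy(enum status s)
{
	return !is_active(s) || !is_start(s);
}

/* Fixed check: true only when the state is neither active nor start. */
static bool restart_fixed(enum status s)
{
	return !is_active(s) && !is_start(s);
}

/* Confirms the fixed check matches the stated intent for every state. */
static void check_all_states(void)
{
	int s;

	for (s = 0; s < ST_COUNT; s++) {
		bool want = (s != ST_ACTIVE && s != ST_START);

		assert(restart_fixed((enum status)s) == want);
		assert(restart_buggy((enum status)s));
	}
}
```

By De Morgan's law the buggy form is `!(active && start)`, which can never be false, while the fixed form is `!(active || start)`.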
On destroy, we should set each node dead. But the current code misses
this when the maple tree has only the root node.
The reason is that mt_destroy_walk() leverages mte_destroy_descend() to
set the node dead, but this is skipped since the only root node is a leaf.
Fix this by setting the node dead if it is a leaf.
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
---
v2:
* move the operation into mt_destroy_walk()
* adjust the title accordingly
---
lib/maple_tree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index affe979bd14d..b0c345b6e646 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5319,6 +5319,7 @@ static void mt_destroy_walk(struct maple_enode *enode, struct maple_tree *mt,
struct maple_enode *start;
if (mte_is_leaf(enode)) {
+ mte_set_node_dead(enode);
node->type = mte_node_type(enode);
goto free_leaf;
}
--
2.34.1