From: Rob Clark <robdclark(a)chromium.org>
[ Upstream commit 5d319f75ccf7f0927425a7545aa1a22b3eedc189 ]
In error paths, we could unref the submit without calling
drm_sched_entity_push_job(), so msm_job_free() will never get
called. Since drm_sched_job_cleanup() will NULL out the
s_fence, we can use that to detect this case.
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/653584/
Signed-off-by: Rob Clark <robin.clark(a)oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Let me analyze based on what I know from the commit:
**YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Analysis of the Fence Leak Fix
### 1. **Bug Description**
The commit fixes a resource leak in the MSM GPU driver's submit error
path. When errors occur after `drm_sched_job_arm()` but before
`drm_sched_entity_push_job()`, the scheduler fence (`s_fence`) is leaked
because `drm_sched_job_cleanup()` is never called.
### 2. **Code Changes Analysis**
The fix adds this critical code to `__msm_gem_submit_destroy()`:
```c
/*
- In error paths, we could unref the submit without calling
- drm_sched_entity_push_job(), so msm_job_free() will never
- get called. Since drm_sched_job_cleanup() will NULL out
- s_fence, we can use that to detect this case.
*/
if (submit->base.s_fence)
drm_sched_job_cleanup(&submit->base);
```
This is a defensive check that ensures proper cleanup of scheduler
resources when the submit is destroyed without going through the normal
scheduler path.
### 3. **Why This is a Good Backport Candidate**
**a) Fixes a Real Bug:** This addresses a concrete resource leak that
affects system stability. Each leaked fence consumes memory and other
kernel resources that are never freed.
**b) Small and Contained:** The fix is minimal - just 9 lines of code
including comments. It doesn't change any APIs or data structures.
**c) Clear Error Path Fix:** The fix targets a specific error handling
path without affecting the normal execution flow. When
`drm_sched_entity_push_job()` is called successfully, `msm_job_free()`
handles the cleanup as before.
**d) Follows Established Patterns:** The fix uses the same pattern as
the hw_fence cleanup (checking refcount) that's already in the code,
making it consistent with existing error handling.
**e) No Architectural Changes:** This is purely a bug fix that plugs a
resource leak. It doesn't introduce new features or change behavior.
**f) Critical Subsystem:** GPU drivers are important for system
stability and user experience. Memory leaks in GPU submission paths can
accumulate quickly under heavy graphics workloads.
### 4. **Comparison with Similar Commits**
Looking at the similar commits provided:
- The first commit (drm/msm: Fix wait_fence submitqueue leak) with
backport status YES also fixes a resource leak in error paths
- It's similarly small, contained, and fixes a clear bug without
architectural changes
### 5. **Risk Assessment**
**Low Risk:**
- The added check only executes in the destroy path when `s_fence` is
non-NULL
- If `drm_sched_entity_push_job()` was called, `msm_job_free()` will
have already cleaned up `s_fence`, making it NULL
- The fix is defensive and won't cause issues even if called
unnecessarily
### 6. **Stable Criteria Met**
According to stable tree rules:
- ✓ Fixes a bug (fence resource leak)
- ✓ Minimal change (9 lines)
- ✓ Obviously correct (clear logic with explanatory comment)
- ✓ Already tested (has Patchwork link indicating review)
- ✓ No new features
- ✓ Fixes a real issue users could encounter
The fence leak could accumulate over time, especially in scenarios with
GPU submission failures, leading to memory exhaustion and system
instability. This makes it an important fix for stable kernels.
drivers/gpu/drm/msm/msm_gem_submit.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index f775638d239a5..937c1f5d88cbb 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -85,6 +85,15 @@ void __msm_gem_submit_destroy(struct kref *kref)
container_of(kref, struct msm_gem_submit, ref);
unsigned i;
+ /*
+ * In error paths, we could unref the submit without calling
+ * drm_sched_entity_push_job(), so msm_job_free() will never
+ * get called. Since drm_sched_job_cleanup() will NULL out
+ * s_fence, we can use that to detect this case.
+ */
+ if (submit->base.s_fence)
+ drm_sched_job_cleanup(&submit->base);
+
if (submit->fence_id) {
spin_lock(&submit->queue->idr_lock);
idr_remove(&submit->queue->fence_idr, submit->fence_id);
--
2.39.5
From: Jake Hillion <jake(a)hillion.co.uk>
[ Upstream commit f8afb12a2d7503de6558c23cacd7acbf6e9fe678 ]
__hsmp_send_message sleeps between result read attempts and has a
timeout of 100ms. Under extreme load it's possible for these sleeps to
take a long time, exceeding the 100ms. In this case the current code
does not check the register and fails with ETIMEDOUT.
Refactor the loop to ensure there is at least one read of the register
after a sleep of any duration. This removes instances of ETIMEDOUT with
a single caller, even with a misbehaving scheduler. Tested on AMD
Bergamo machines.
Suggested-by: Blaise Sanouillet <linux(a)blaise.sanouillet.com>
Reviewed-by: Suma Hegde <suma.hegde(a)amd.com>
Tested-by: Suma Hegde <suma.hegde(a)amd.com>
Signed-off-by: Jake Hillion <jake(a)hillion.co.uk>
Link: https://lore.kernel.org/r/20250605-amd-hsmp-v2-1-a811bc3dd74a@hillion.co.uk
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Bug Analysis
The commit fixes a critical race condition in the AMD HSMP driver's
timeout handling. The issue occurs in the `__hsmp_send_message()`
function where:
1. **Original code flaw**: The loop checks `time_before(jiffies,
timeout)` at the start of each iteration, then sleeps, but doesn't
check the register one final time after the last sleep completes.
2. **Race condition scenario**: Under heavy system load (as mentioned in
the commit message - "Under extreme load"), the sleep operations
(`usleep_range()`) can take significantly longer than intended. If
the final sleep extends beyond the 100ms timeout window, the code
exits with `-ETIMEDOUT` without checking if the SMU actually
responded during that extended sleep.
3. **The fix**: Changes the loop from `while (time_before(jiffies,
timeout))` to `while (true)` and moves the timeout check to after the
register read, ensuring at least one register check occurs after any
sleep duration.
## Why This Qualifies for Stable Backport
1. **Fixes a real bug affecting users**: The commit message explicitly
states this was "Tested on AMD Bergamo machines" and fixes actual
instances of ETIMEDOUT errors. AMD Bergamo is a server processor used
in production data centers.
2. **Small, contained fix**: The change is minimal - just restructuring
the loop logic without changing functionality. This meets the stable
kernel criteria of being a small, obvious fix.
3. **No new features**: This purely fixes existing behavior without
adding capabilities.
4. **Critical subsystem**: The HSMP driver is essential for AMD EPYC
server management, controlling:
- Power management
- Thermal monitoring
- Performance states
- System telemetry
5. **Production impact**: Spurious `-ETIMEDOUT` errors would cause
failures in:
- Data center management tools
- Power capping operations
- Performance monitoring
- Thermal management
6. **Clear problem and solution**: The race condition is well-
understood, and the fix ensures the code behaves as intended -
checking the register after sleeping rather than potentially timing
out without a final check.
## Risk Assessment
The risk is minimal because:
- The logic change is straightforward and correct
- It's been tested on production AMD Bergamo systems
- It only affects the timeout path behavior
- The worst case is the same as before (timeout after 100ms)
- The best case fixes false timeouts under load
This is exactly the type of bug fix that stable kernels exist to deliver
- fixing real issues users encounter in production without introducing
new risks.
drivers/platform/x86/amd/hsmp/hsmp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/x86/amd/hsmp/hsmp.c b/drivers/platform/x86/amd/hsmp/hsmp.c
index a3ac09a90de45..ab877112f4c80 100644
--- a/drivers/platform/x86/amd/hsmp/hsmp.c
+++ b/drivers/platform/x86/amd/hsmp/hsmp.c
@@ -99,7 +99,7 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
short_sleep = jiffies + msecs_to_jiffies(HSMP_SHORT_SLEEP);
timeout = jiffies + msecs_to_jiffies(HSMP_MSG_TIMEOUT);
- while (time_before(jiffies, timeout)) {
+ while (true) {
ret = sock->amd_hsmp_rdwr(sock, mbinfo->msg_resp_off, &mbox_status, HSMP_RD);
if (ret) {
dev_err(sock->dev, "Error %d reading mailbox status\n", ret);
@@ -108,6 +108,10 @@ static int __hsmp_send_message(struct hsmp_socket *sock, struct hsmp_message *ms
if (mbox_status != HSMP_STATUS_NOT_READY)
break;
+
+ if (!time_before(jiffies, timeout))
+ break;
+
if (time_before(jiffies, short_sleep))
usleep_range(50, 100);
else
--
2.39.5
The MIPS32r2 ChaCha code has never been buildable with the clang
assembler. First, clang doesn't support the 'rotl' pseudo-instruction:
error: unknown instruction, did you mean: rol, rotr?
Second, clang requires that both operands of the 'wsbh' instruction be
explicitly given:
error: too few operands for instruction
To fix this, align the code with the real instruction set by (1) using
the real instruction 'rotr' instead of the nonstandard pseudo-
instruction 'rotl', and (2) explicitly giving both operands to 'wsbh'.
To make removing the use of 'rotl' a bit easier, also remove the
unnecessary special-casing for big endian CPUs at
.Lchacha_mips_xor_bytes. The tail handling is actually
endian-independent since it processes one byte at a time. On big endian
CPUs the old code byte-swapped SAVED_X, then iterated through it in
reverse order. But the byteswap and reverse iteration canceled out.
Tested with chacha20poly1305-selftest in QEMU using "-M malta" with both
little endian and big endian mips32r2 kernels.
Fixes: 49aa7c00eddf ("crypto: mips/chacha - import 32r2 ChaCha code from Zinc")
Cc: stable(a)vger.kernel.org
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505080409.EujEBwA0-lkp@intel.com/
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
This applies on top of other pending lib/crypto patches and can be
retrieved from git at:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git mips-chacha-fix
lib/crypto/mips/chacha-core.S | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/lib/crypto/mips/chacha-core.S b/lib/crypto/mips/chacha-core.S
index 5755f69cfe007..706aeb850fb0d 100644
--- a/lib/crypto/mips/chacha-core.S
+++ b/lib/crypto/mips/chacha-core.S
@@ -53,21 +53,17 @@
#define IS_UNALIGNED $s7
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MSB 0
#define LSB 3
-#define ROTx rotl
-#define ROTR(n) rotr n, 24
#define CPU_TO_LE32(n) \
- wsbh n; \
+ wsbh n, n; \
rotr n, 16;
#else
#define MSB 3
#define LSB 0
-#define ROTx rotr
#define CPU_TO_LE32(n)
-#define ROTR(n)
#endif
#define FOR_EACH_WORD(x) \
x( 0); \
x( 1); \
@@ -190,14 +186,14 @@ CONCAT3(.Lchacha_mips_xor_aligned_, PLUS_ONE(x), _b: ;) \
addu X(D), X(N); \
xor X(V), X(A); \
xor X(W), X(B); \
xor X(Y), X(C); \
xor X(Z), X(D); \
- rotl X(V), S; \
- rotl X(W), S; \
- rotl X(Y), S; \
- rotl X(Z), S;
+ rotr X(V), 32 - S; \
+ rotr X(W), 32 - S; \
+ rotr X(Y), 32 - S; \
+ rotr X(Z), 32 - S;
.text
.set reorder
.set noat
.globl chacha_crypt_arch
@@ -370,25 +366,23 @@ chacha_crypt_arch:
addu IN, $at
addu OUT, $at
/* First byte */
lbu T1, 0(IN)
addiu $at, BYTES, 1
- CPU_TO_LE32(SAVED_X)
- ROTR(SAVED_X)
xor T1, SAVED_X
sb T1, 0(OUT)
beqz $at, .Lchacha_mips_xor_done
/* Second byte */
lbu T1, 1(IN)
addiu $at, BYTES, 2
- ROTx SAVED_X, 8
+ rotr SAVED_X, 8
xor T1, SAVED_X
sb T1, 1(OUT)
beqz $at, .Lchacha_mips_xor_done
/* Third byte */
lbu T1, 2(IN)
- ROTx SAVED_X, 8
+ rotr SAVED_X, 8
xor T1, SAVED_X
sb T1, 2(OUT)
b .Lchacha_mips_xor_done
.Lchacha_mips_no_full_block_unaligned:
--
2.50.0
Sohil reported seeing a split lock warning when running a test that
generates userspace #DB:
x86/split lock detection: #DB: sigtrap_loop_64/4614 took a bus_lock trap at address: 0x4011ae
We investigated the issue and figured out:
1) The warning is a false positive.
2) It is not caused by the test itself.
3) It occurs even when Bus Lock Detection (BLD) is disabled.
4) It only happens on the first #DB on a CPU.
And the root cause is, at boot time, Linux zeros DR6. This leads to
different DR6 values depending on whether the CPU supports BLD:
1) On CPUs with BLD support, DR6 becomes 0xFFFF07F0 (bit 11, DR6.BLD,
is cleared).
2) On CPUs without BLD, DR6 becomes 0xFFFF0FF0.
Since only BLD-induced #DB exceptions clear DR6.BLD and other debug
exceptions leave it unchanged, even if the first #DB is unrelated to
BLD, DR6.BLD is still cleared. As a result, such a first #DB is
misinterpreted as a BLD #DB, and a false warning is triggerred.
Fix the bug by initializing DR6 by writing its architectural reset
value at boot time.
DR7 suffers from a similar issue, apply the same fix.
This patch set is based on tip/x86/urgent branch as of today.
Link to the previous patch set v3:
https://lore.kernel.org/all/20250618172723.1651465-1-xin@zytor.com/
Change in v4:
*) Cc stable in the DR7 initialization patch for backporting, just in
case bit 10 of DR7 has become unreserved on new hardware, even
though clearing it doesn't currently cause any real issues (Dave
Hansen).
Xin Li (Intel) (2):
x86/traps: Initialize DR6 by writing its architectural reset value
x86/traps: Initialize DR7 by writing its architectural reset value
arch/x86/include/asm/debugreg.h | 19 ++++++++++++----
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/include/uapi/asm/debugreg.h | 21 ++++++++++++++++-
arch/x86/kernel/cpu/common.c | 24 ++++++++------------
arch/x86/kernel/kgdb.c | 2 +-
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/traps.c | 34 +++++++++++++++++-----------
arch/x86/kvm/x86.c | 4 ++--
9 files changed, 72 insertions(+), 38 deletions(-)
base-commit: 2aebf5ee43bf0ed225a09a30cf515d9f2813b759
--
2.49.0
This reverts commit ad5643cf2f69 ("riscv: Define TASK_SIZE_MAX for
__access_ok()").
This commit changes TASK_SIZE_MAX to be LONG_MAX to optimize access_ok(),
because the previous TASK_SIZE_MAX (default to TASK_SIZE) requires some
computation.
The reasoning was that all user addresses are less than LONG_MAX, and all
kernel addresses are greater than LONG_MAX. Therefore access_ok() can
filter kernel addresses.
Addresses between TASK_SIZE and LONG_MAX are not valid user addresses, but
access_ok() let them pass. That was thought to be okay, because they are
not valid addresses at hardware level.
Unfortunately, one case is missed: get_user_pages_fast() happily accepts
addresses between TASK_SIZE and LONG_MAX. futex(), for instance, uses
get_user_pages_fast(). This causes the problem reported by Robert [1].
Therefore, revert this commit. TASK_SIZE_MAX is changed to the default:
TASK_SIZE.
This unfortunately reduces performance, because TASK_SIZE is more expensive
to compute compared to LONG_MAX. But correctness first, we can think about
optimization later, if required.
Reported-by: <rtm(a)csail.mit.edu>
Closes: https://lore.kernel.org/linux-riscv/77605.1750245028@localhost/
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
arch/riscv/include/asm/pgtable.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 438ce7df24c39..5bd5aae60d536 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -1075,7 +1075,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
*/
#ifdef CONFIG_64BIT
#define TASK_SIZE_64 (PGDIR_SIZE * PTRS_PER_PGD / 2)
-#define TASK_SIZE_MAX LONG_MAX
#ifdef CONFIG_COMPAT
#define TASK_SIZE_32 (_AC(0x80000000, UL) - PAGE_SIZE)
--
2.39.5
This reverts commit 61a74ad25462 ("riscv: misaligned: fix sleeping function
called during misaligned access handling"). The commit addresses a sleeping
in atomic context problem, but it is not the correct fix as explained by
Clément:
"Using nofault would lead to failure to read from user memory that is paged
out for instance. This is not really acceptable, we should handle user
misaligned access even at an address that would generate a page fault."
This bug has been properly fixed by commit 453805f0a28f ("riscv:
misaligned: enable IRQs while handling misaligned accesses").
Revert this improper fix.
Link: https://lore.kernel.org/linux-riscv/b779beed-e44e-4a5e-9551-4647682b0d21@ri…
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
arch/riscv/kernel/traps_misaligned.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index dd8e4af6583f4..93043924fe6c6 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -454,7 +454,7 @@ static int handle_scalar_misaligned_load(struct pt_regs *regs)
val.data_u64 = 0;
if (user_mode(regs)) {
- if (copy_from_user_nofault(&val, (u8 __user *)addr, len))
+ if (copy_from_user(&val, (u8 __user *)addr, len))
return -1;
} else {
memcpy(&val, (u8 *)addr, len);
@@ -555,7 +555,7 @@ static int handle_scalar_misaligned_store(struct pt_regs *regs)
return -EOPNOTSUPP;
if (user_mode(regs)) {
- if (copy_to_user_nofault((u8 __user *)addr, &val, len))
+ if (copy_to_user((u8 __user *)addr, &val, len))
return -1;
} else {
memcpy((u8 *)addr, &val, len);
--
2.39.5
The mailbox controller driver for the Microchip Inter-processor
Communication can be built as a module. It uses cpuid_to_hartid_map and
commit 4783ce32b080 ("riscv: export __cpuid_to_hartid_map") enables that
to work for SMP. However, cpuid_to_hartid_map uses boot_cpu_hartid on
non-SMP kernels and this driver can be useful in such configurations[1].
Export boot_cpu_hartid so the driver can be built as a module on non-SMP
kernels as well.
Link: https://lore.kernel.org/lkml/20250617-confess-reimburse-876101e099cb@spud/ [1]
Cc: stable(a)vger.kernel.org
Fixes: e4b1d67e7141 ("mailbox: add Microchip IPC support")
Signed-off-by: Klara Modin <klarasmodin(a)gmail.com>
---
arch/riscv/kernel/setup.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f7c9a1caa83e..14888e5ea19a 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -50,6 +50,7 @@ atomic_t hart_lottery __section(".sdata")
#endif
;
unsigned long boot_cpu_hartid;
+EXPORT_SYMBOL_GPL(boot_cpu_hartid);
/*
* Place kernel memory regions on the resource tree so that
--
2.49.0
The patch titled
Subject: mm/vmalloc: leave lazy MMU mode on PTE mapping error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmalloc-leave-lazy-mmu-mode-on-pte-mapping-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
Subject: mm/vmalloc: leave lazy MMU mode on PTE mapping error
Date: Mon, 23 Jun 2025 09:57:21 +0200
vmap_pages_pte_range() enters the lazy MMU mode, but fails to leave it in
case an error is encountered.
Link: https://lkml.kernel.org/r/20250623075721.2817094-1-agordeev@linux.ibm.com
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/r/202506132017.T1l1l6ME-lkp@intel.com/
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-leave-lazy-mmu-mode-on-pte-mapping-error
+++ a/mm/vmalloc.c
@@ -514,6 +514,7 @@ static int vmap_pages_pte_range(pmd_t *p
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
{
+ int err = 0;
pte_t *pte;
/*
@@ -530,12 +531,18 @@ static int vmap_pages_pte_range(pmd_t *p
do {
struct page *page = pages[*nr];
- if (WARN_ON(!pte_none(ptep_get(pte))))
- return -EBUSY;
- if (WARN_ON(!page))
- return -ENOMEM;
- if (WARN_ON(!pfn_valid(page_to_pfn(page))))
- return -EINVAL;
+ if (WARN_ON(!pte_none(ptep_get(pte)))) {
+ err = -EBUSY;
+ break;
+ }
+ if (WARN_ON(!page)) {
+ err = -ENOMEM;
+ break;
+ }
+ if (WARN_ON(!pfn_valid(page_to_pfn(page)))) {
+ err = -EINVAL;
+ break;
+ }
set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
(*nr)++;
@@ -543,7 +550,8 @@ static int vmap_pages_pte_range(pmd_t *p
arch_leave_lazy_mmu_mode();
*mask |= PGTBL_PTE_MODIFIED;
- return 0;
+
+ return err;
}
static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
_
Patches currently in -mm which might be from agordeev(a)linux.ibm.com are
mm-vmalloc-leave-lazy-mmu-mode-on-pte-mapping-error.patch
Hello,
This is to inform all that constant firmware crashes have been seen in
the "Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter",
which was shipped with the Dell Inspiron 5567 laptops. This affects
every kernel release, including the stable and the longterm ones.
All the logs have been taken after livebooting an Arch Linux ISO.
Every distro has been tried, and it has been confirmed that some error
of this kind is shown in every distro.
## Steps to reproduce the issue
1. Boot/liveboot any Linux ISO through this card (and possibly, this laptop).
2. Wi-Fi network interface appears.
3. Connect the Wi-Fi router to the computer.
4. A few moments/minutes after that, the touchpad stops working, and
the network interface cannot even access the Internet anymore (BUT,
the network interface might disappear, might not disappear).
## Affected distros and the necessary workarounds
This has been the pattern on every distro and their corresponding
kernels (LMDE, Linux Mint, Pop!_OS, Zorin, Kubuntu, KDE Neon,
elementaryOS, Fedora, and even Arch). The fix which made these distros
usable is to add two things:
- Adding "options ath10k_core skip_otp=y" to a new conf file in /etc/modprobe.d.
- Adding "pci=noaer" in GRUB kernel parameters so that the logs are
not flooded with Multiple Correctable Errors.
To defend my case (that it occurs in the other models of Inspiron 5567
too), I have recently contacted someone running Linux Mint on the same
model. The answer was the same: the touchpad and the Wi-Fi stop
simultaneously.
## Some of the limitations
The kernel was tainted, but the other things have been properly noted
in case they might provide some useful details. As stated,
investigating why IRQ #16 is disabled will probably give us the
answer.
## Logs provided
All the logs in a combined manner can be found here:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
- Full dmesg: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Hostnamectl: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- lspci: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Modinfo of the driver:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Ping command:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- /proc/interrupts:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- IP addr command (Heavily Redacted):
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
Lastly, this issue on the GitHub repository of Pop!_OS 'might' be
relevant: https://github.com/pop-os/pop/issues/1470
It would be highly appreciated if the matter were looked into.
Thanks,
Bandhan Pramanik
cmos_interrupt() can be called in a non-interrupt context, such as in
an ACPI event handler (which runs in an interrupt thread). Therefore,
usage of spin_lock(&rtc_lock) is insecure. Use spin_lock_irqsave() /
spin_unlock_irqrestore() instead.
Before a misguided
commit 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
the cmos_interrupt() function used spin_lock_irqsave(). That commit
changed it to spin_lock() and broke locking, which was partially fixed in
commit 13be2efc390a ("rtc: cmos: Disable irq around direct invocation of cmos_interrupt()")
That second commit did not take account of the ACPI fixed event handler
pathway, however. It introduced local_irq_disable() workarounds in
cmos_check_wkalrm(), which can cause problems on PREEMPT_RT kernels
and are now unnecessary.
Add an explicit comment so that this change will not be reverted by
mistake.
Cc: <stable(a)vger.kernel.org>
Fixes: 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Tested-by: Chris Bainbridge <chris.bainbridge(a)gmail.com>
Reported-by: Chris Bainbridge <chris.bainbridge(a)gmail.com>
Closes: https://lore.kernel.org/all/aDtJ92foPUYmGheF@debian.local/
---
Changes after DRAFT version of the patch:
- rewrite commit message,
- test this locally (also on top of 5.10.238 for the stable backport),
- fix a grammar mistake in the comment.
---
drivers/rtc/rtc-cmos.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 8172869bd3d7..0743c6acd6e2 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -692,8 +692,12 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
{
u8 irqstat;
u8 rtc_control;
+ unsigned long flags;
- spin_lock(&rtc_lock);
+ /* We cannot use spin_lock() here, as cmos_interrupt() is also called
+ * in a non-irq context.
+ */
+ spin_lock_irqsave(&rtc_lock, flags);
/* When the HPET interrupt handler calls us, the interrupt
* status is passed as arg1 instead of the irq number. But
@@ -727,7 +731,7 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
hpet_mask_rtc_irq_bit(RTC_AIE);
CMOS_READ(RTC_INTR_FLAGS);
}
- spin_unlock(&rtc_lock);
+ spin_unlock_irqrestore(&rtc_lock, flags);
if (is_intr(irqstat)) {
rtc_update_irq(p, 1, irqstat);
@@ -1295,9 +1299,7 @@ static void cmos_check_wkalrm(struct device *dev)
* ACK the rtc irq here
*/
if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) {
- local_irq_disable();
cmos_interrupt(0, (void *)cmos->rtc);
- local_irq_enable();
return;
}
--
2.25.1