We use Kconfig to select the kernel stack size, doubling the default
size if KASAN is enabled.
But that actually only works if KASAN is selected from the beginning,
meaning that if KASAN config is added later (for example using
menuconfig), CONFIG_THREAD_SIZE_ORDER won't be updated, keeping the
default size, which is not enough for KASAN as reported in [1].
So fix this by moving the logic to compute the right kernel stack into a
header.
Fixes: a7555f6b62e7 ("riscv: stack: Add config of thread stack size")
Reported-by: syzbot+ba9eac24453387a9d502(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000eb301906222aadc2@google.com/ [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
---
arch/riscv/Kconfig | 3 +--
arch/riscv/include/asm/thread_info.h | 7 ++++++-
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ccbfd28f4982..b65846d02622 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -759,8 +759,7 @@ config IRQ_STACKS
config THREAD_SIZE_ORDER
int "Kernel stack size (in power-of-two numbers of page size)" if VMAP_STACK && EXPERT
range 0 4
- default 1 if 32BIT && !KASAN
- default 3 if 64BIT && KASAN
+ default 1 if 32BIT
default 2
help
Specify the Pages of thread stack size (from 4KB to 64KB), which also
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index fca5c6be2b81..385b43211a71 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -13,7 +13,12 @@
#include <linux/sizes.h>
/* thread information allocation */
-#define THREAD_SIZE_ORDER CONFIG_THREAD_SIZE_ORDER
+#ifdef CONFIG_KASAN
+#define KASAN_STACK_ORDER 1
+#else
+#define KASAN_STACK_ORDER 0
+#endif
+#define THREAD_SIZE_ORDER (CONFIG_THREAD_SIZE_ORDER + KASAN_STACK_ORDER)
#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
/*
--
2.39.2
img_info->mhi_buf should be freed on error path in mhi_alloc_bhie_table().
This error case is rare but still needs to be fixed.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
v2: add missing Cc: stable, as Greg Kroah-Hartman's bot reported
drivers/bus/mhi/host/boot.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index edc0ec5a0933..738dcd11b66f 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -357,6 +357,7 @@ int mhi_alloc_bhie_table(struct mhi_controller *mhi_cntrl,
for (--i, --mhi_buf; i >= 0; i--, mhi_buf--)
dma_free_coherent(mhi_cntrl->cntrl_dev, mhi_buf->len,
mhi_buf->buf, mhi_buf->dma_addr);
+ kfree(img_info->mhi_buf);
error_alloc_mhi_buf:
kfree(img_info);
--
2.39.2
Hi
In downstream Debian we got a report from Eric Degenetais, in
https://bugs.debian.org/1081833 that after the update to the 6.1.106
based version, there were regular cracks in HDMI sound during
playback.
Eric was able to bisec the issue down to
92afcc310038ebe5d66c689bb0bf418f5451201c in the v6.1.y series which
got applied in 6.1.104.
Cf. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1081833#47
#regzbot introduced: 92afcc310038ebe5d66c689bb0bf418f5451201c
#regzbot link: https://bugs.debian.org/1081833
It should be noted that Eric as well tried more recent stable series
as well, in particular did test as well 6.10.6 based version back on
20th september, and the issue was reproducible there as well.
Is there anything else we can try to provide?
Regards,
Salvatore
Hi,
some of our customers (Proxmox VE) are seeing issues with file
corruptions when accessing contents located on CephFS via the in-kernel
Ceph client [0,1], we managed to reproduce this regression on kernels up
to the latest 6.11-rc6.
Accessing the same content on the CephFS using the FUSE client or the
in-kernel ceph client with older kernels (Ubuntu kernel on v6.5) does
not show file corruptions.
Unfortunately the corruption is hard to reproduce, seemingly only a
small subset of files is affected. However, once a file is affected, the
issue is persistent and can easily be reproduced.
Bisection with the reproducer points to this commit:
"92b6cc5d: netfs: Add iov_iters to (sub)requests to describe various
buffers"
Description of the issue:
A file was copied from local filesystem to cephfs via:
```
cp /tmp/proxmox-backup-server_3.2-1.iso
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```
* sha256sum on local
filesystem:`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/tmp/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel up to above commit:
`1d19698e8f7e769cf0a0dcc7ba0018ef5416c5ec495d5e61313f9c84a4237607
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* sha256sum on cephfs with kernel after above commit:
`89ad3620bf7b1e0913b534516cfbe48580efbaec944b79951e2c14e5e551f736
/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso`
* removing and/or recopying the file does not change the issue, the
corrupt checksum remains the same.
* accessing the same file from different clients results in the same
output: the one with above patch applied do show the incorrect checksum,
ones without the patch show the correct checksum.
* the issue persists even across reboot of the ceph cluster and/or clients.
* the file is indeed corrupt after reading, as verified by a `cmp -b`.
Interestingly, the first 4M contain the correct data, the following 4M
are read as all zeros, which differs from the original data.
* the issue is related to the readahead size: mounting the cephfs with a
`rasize=0` makes the issue disappear, same is true for sizes up to 128k
(please note that the ranges as initially reported on the mailing list
[3] are not correct for rasize [0..128k] the file is not corrupted).
In the bugtracker issue [4] I attached a ftrace with "*ceph*" as filter
while performing a read on the latest kernel 6.11-rc6 while performing
```
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso of=/tmp/test.out
bs=8M count=1
```
the relevant part shown by task `dd-26192`.
Please let me know if I can provide further information or debug outputs
in order to narrow down the issue.
[0] https://forum.proxmox.com/threads/78340/post-676129
[1] https://forum.proxmox.com/threads/149249/
[2] https://forum.proxmox.com/threads/151291/
[3]
https://lore.kernel.org/lkml/db686d0c-2f27-47c8-8c14-26969433b13b@proxmox.c…
[4] https://bugzilla.kernel.org/show_bug.cgi?id=219237
#regzbot introduced: 92b6cc5d
Regards,
Christian Ebner
It is possible that an interrupt is disabled and masked at the same time.
When the interrupt is enabled again by enable_irq(), only plic_irq_enable()
is called, not plic_irq_unmask(). The interrupt remains masked and never
raises.
An example where interrupt is both disabled and masked is when
handle_fasteoi_irq() is the handler, and IRQS_ONESHOT is set. The interrupt
handler:
1. Mask the interrupt
2. Handle the interrupt
3. Check if interrupt is still enabled, and unmask it (see
cond_unmask_eoi_irq())
If another task disables the interrupt in the middle of the above steps,
the interrupt will not get unmasked, and will remain masked when it is
enabled in the future.
The problem is occasionally observed when PREEMPT_RT is enabled, because
PREEMPT_RT add the IRQS_ONESHOT flag. But PREEMPT_RT only makes the
problem more likely to appear, the bug has been around since
commit a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask
operations").
Fix it by unmasking interrupt in plic_irq_enable().
Fixes: a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations").
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-sifive-plic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 2f6ef5c495bd..0efbf14ec9fa 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -128,6 +128,9 @@ static inline void plic_irq_toggle(const struct cpumask *mask,
static void plic_irq_enable(struct irq_data *d)
{
+ struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+
+ writel(1, priv->regs + PRIORITY_BASE + d->hwirq * PRIORITY_PER_ID);
plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 1);
}
--
2.39.5
The only user (the mesa gallium driver) is already assuming explicit
synchronization and doing the export/import dance on shared BOs. The
only reason we were registering ourselves as writers on external BOs
is because Xe, which was the reference back when we developed Panthor,
was doing so. Turns out Xe was wrong, and we really want bookkeep on
all registered fences, so userspace can explicitly upgrade those to
read/write when needed.
Fixes: 4bdca1150792 ("drm/panthor: Add the driver frontend block")
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 9a0ff48f7061..41260cf4beb8 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -3423,13 +3423,8 @@ void panthor_job_update_resvs(struct drm_exec *exec, struct drm_sched_job *sched
{
struct panthor_job *job = container_of(sched_job, struct panthor_job, base);
- /* Still not sure why we want USAGE_WRITE for external objects, since I
- * was assuming this would be handled through explicit syncs being imported
- * to external BOs with DMA_BUF_IOCTL_IMPORT_SYNC_FILE, but other drivers
- * seem to pass DMA_RESV_USAGE_WRITE, so there must be a good reason.
- */
panthor_vm_update_resvs(job->group->vm, exec, &sched_job->s_fence->finished,
- DMA_RESV_USAGE_BOOKKEEP, DMA_RESV_USAGE_WRITE);
+ DMA_RESV_USAGE_BOOKKEEP, DMA_RESV_USAGE_BOOKKEEP);
}
void panthor_sched_unplug(struct panthor_device *ptdev)
--
2.46.0
If deferred operations are pending, we want to wait for those to
land before declaring the queue blocked on a SYNC_WAIT. We need
this to deal with the case where the sync object is signalled through
a deferred SYNC_{ADD,SET} from the same queue. If we don't do that
and the group gets scheduled out before the deferred SYNC_{SET,ADD}
is executed, we'll end up with a timeout, because no external
SYNC_{SET,ADD} will make the scheduler reconsider the group for
execution.
Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 41260cf4beb8..201d5e7a921e 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1103,7 +1103,13 @@ cs_slot_sync_queue_state_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs
list_move_tail(&group->wait_node,
&group->ptdev->scheduler->groups.waiting);
}
- group->blocked_queues |= BIT(cs_id);
+
+ /* The queue is only blocked if there's no deferred operation
+ * pending, which can be checked through the scoreboard status.
+ */
+ if (!cs_iface->output->status_scoreboards)
+ group->blocked_queues |= BIT(cs_id);
+
queue->syncwait.gpu_va = cs_iface->output->status_wait_sync_ptr;
queue->syncwait.ref = cs_iface->output->status_wait_sync_value;
status_wait_cond = cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_COND_MASK;
--
2.46.0
The group variable can't be used to retrieve ptdev in our second loop,
because it points to the previously iterated list_head, not a valid
group. Get the ptdev object from the scheduler instead.
Cc: <stable(a)vger.kernel.org>
Fixes: d72f049087d4 ("drm/panthor: Allow driver compilation")
Reported-by: kernel test robot <lkp(a)intel.com>
Reported-by: Julia Lawall <julia.lawall(a)inria.fr>
Closes: https://lore.kernel.org/r/202409302306.UDikqa03-lkp@intel.com/
Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 201d5e7a921e..24ff91c084e4 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -2052,6 +2052,7 @@ static void
tick_ctx_cleanup(struct panthor_scheduler *sched,
struct panthor_sched_tick_ctx *ctx)
{
+ struct panthor_device *ptdev = sched->ptdev;
struct panthor_group *group, *tmp;
u32 i;
@@ -2060,7 +2061,7 @@ tick_ctx_cleanup(struct panthor_scheduler *sched,
/* If everything went fine, we should only have groups
* to be terminated in the old_groups lists.
*/
- drm_WARN_ON(&group->ptdev->base, !ctx->csg_upd_failed_mask &&
+ drm_WARN_ON(&ptdev->base, !ctx->csg_upd_failed_mask &&
group_can_run(group));
if (!group_can_run(group)) {
@@ -2083,7 +2084,7 @@ tick_ctx_cleanup(struct panthor_scheduler *sched,
/* If everything went fine, the groups to schedule lists should
* be empty.
*/
- drm_WARN_ON(&group->ptdev->base,
+ drm_WARN_ON(&ptdev->base,
!ctx->csg_upd_failed_mask && !list_empty(&ctx->groups[i]));
list_for_each_entry_safe(group, tmp, &ctx->groups[i], run_node) {
--
2.46.0