There is a long standing bug which causes I2C communication not to
work on the Armada 3700 based boards. This small series restores
that functionality.
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Signed-off-by: Imre Kaloz <kaloz(a)openwrt.org>
---
Gabor Juhos (3):
i2c: add init_recovery() callback
i2c: pxa: prevent calling of the generic recovery init code
i2c: pxa: handle 'Early Bus Busy' condition on Armada 3700
drivers/i2c/busses/i2c-pxa.c | 25 +++++++++++++++++++------
drivers/i2c/i2c-core-base.c | 8 +++++++-
include/linux/i2c.h | 4 ++++
3 files changed, 30 insertions(+), 7 deletions(-)
---
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
change-id: 20250510-i2c-pxa-fix-i2c-communication-3e6de1e3d0c6
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f31fe8165d365379d858c53bef43254c7d6d1cfd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051209-overturn-supreme-26f1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f31fe8165d365379d858c53bef43254c7d6d1cfd Mon Sep 17 00:00:00 2001
From: Naman Jain <namjain(a)linux.microsoft.com>
Date: Fri, 2 May 2025 13:18:10 +0530
Subject: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
On regular bootup, devices get registered to VMBus first, so when
uio_hv_generic driver for a particular device type is probed,
the device is already initialized and added, so sysfs creation in
hv_uio_probe() works fine. However, when the device is removed
and brought back, the channel gets rescinded and the device again gets
registered to VMBus. However this time, the uio_hv_generic driver is
already registered to probe for that device and in this case sysfs
creation is tried before the device's kobject gets initialized
completely.
Fix this by moving the core logic of sysfs creation of ring buffer,
from uio_hv_generic to HyperV's VMBus driver, where the rest of the
sysfs attributes for the channels are defined. While doing that, make
use of attribute groups and macros, instead of creating sysfs
directly, to ensure better error handling and code flow.
Problematic path:
vmbus_process_offer (A new offer comes for the VMBus device)
vmbus_add_channel_work
vmbus_device_register
|-> device_register
| |...
| |-> hv_uio_probe
| |...
| |-> sysfs_create_bin_file (leads to a warning as
| the primary channel's kobject, which is used to
| create the sysfs file, is not yet initialized)
|-> kset_create_and_add
|-> vmbus_add_channel_kobj (initialization of the primary
channel's kobject happens later)
Above code flow is sequential and the warning is always reproducible in
this path.
Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
Cc: stable(a)kernel.org
Suggested-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Suggested-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Michael Kelley <mhklinux(a)outlook.com>
Tested-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..0b450e53161e 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -477,4 +477,10 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
#endif /* CONFIG_HYPERV_TESTING */
+/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma));
+int hv_remove_ring_sysfs(struct vmbus_channel *channel);
+
#endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 8d3cff42bdbb..0f16a83cc2d6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1802,6 +1802,27 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
}
static VMBUS_CHAN_ATTR_RO(subchannel_id);
+static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
+ const struct bin_attribute *attr,
+ struct vm_area_struct *vma)
+{
+ struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
+
+ /*
+ * hv_(create|remove)_ring_sysfs implementation ensures that mmap_ring_buffer
+ * is not NULL.
+ */
+ return channel->mmap_ring_buffer(channel, vma);
+}
+
+static struct bin_attribute chan_attr_ring_buffer = {
+ .attr = {
+ .name = "ring",
+ .mode = 0600,
+ },
+ .size = 2 * SZ_2M,
+ .mmap = hv_mmap_ring_buffer_wrapper,
+};
static struct attribute *vmbus_chan_attrs[] = {
&chan_attr_out_mask.attr,
&chan_attr_in_mask.attr,
@@ -1821,6 +1842,11 @@ static struct attribute *vmbus_chan_attrs[] = {
NULL
};
+static struct bin_attribute *vmbus_chan_bin_attrs[] = {
+ &chan_attr_ring_buffer,
+ NULL
+};
+
/*
* Channel-level attribute_group callback function. Returns the permission for
* each attribute, and returns 0 if an attribute is not visible.
@@ -1841,9 +1867,24 @@ static umode_t vmbus_chan_attr_is_visible(struct kobject *kobj,
return attr->mode;
}
+static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
+ const struct bin_attribute *attr, int idx)
+{
+ const struct vmbus_channel *channel =
+ container_of(kobj, struct vmbus_channel, kobj);
+
+ /* Hide ring attribute if channel's ring_sysfs_visible is set to false */
+ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
+ return 0;
+
+ return attr->attr.mode;
+}
+
static const struct attribute_group vmbus_chan_group = {
.attrs = vmbus_chan_attrs,
- .is_visible = vmbus_chan_attr_is_visible
+ .bin_attrs = vmbus_chan_bin_attrs,
+ .is_visible = vmbus_chan_attr_is_visible,
+ .is_bin_visible = vmbus_chan_bin_attr_is_visible,
};
static const struct kobj_type vmbus_chan_ktype = {
@@ -1851,6 +1892,63 @@ static const struct kobj_type vmbus_chan_ktype = {
.release = vmbus_chan_release,
};
+/**
+ * hv_create_ring_sysfs() - create "ring" sysfs entry corresponding to ring buffers for a channel.
+ * @channel: Pointer to vmbus_channel structure
+ * @hv_mmap_ring_buffer: function pointer for initializing the function to be called on mmap of
+ * channel's "ring" sysfs node, which is for the ring buffer of that channel.
+ * Function pointer is of below type:
+ * int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ * struct vm_area_struct *vma))
+ * This has a pointer to the channel and a pointer to vm_area_struct,
+ * used for mmap, as arguments.
+ *
+ * Sysfs node for ring buffer of a channel is created along with other fields, however its
+ * visibility is disabled by default. Sysfs creation needs to be controlled when the use-case
+ * is running.
+ * For example, HV_NIC device is used either by uio_hv_generic or hv_netvsc at any given point of
+ * time, and "ring" sysfs is needed only when uio_hv_generic is bound to that device. To avoid
+ * exposing the ring buffer by default, this function is reponsible to enable visibility of
+ * ring for userspace to use.
+ * Note: Race conditions can happen with userspace and it is not encouraged to create new
+ * use-cases for this. This was added to maintain backward compatibility, while solving
+ * one of the race conditions in uio_hv_generic while creating sysfs.
+ *
+ * Returns 0 on success or error code on failure.
+ */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+ int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+ struct vm_area_struct *vma))
+{
+ struct kobject *kobj = &channel->kobj;
+
+ channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+ channel->ring_sysfs_visible = true;
+
+ return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_create_ring_sysfs);
+
+/**
+ * hv_remove_ring_sysfs() - remove ring sysfs entry corresponding to ring buffers for a channel.
+ * @channel: Pointer to vmbus_channel structure
+ *
+ * Hide "ring" sysfs for a channel by changing its is_visible attribute and updating sysfs group.
+ *
+ * Returns 0 on success or error code on failure.
+ */
+int hv_remove_ring_sysfs(struct vmbus_channel *channel)
+{
+ struct kobject *kobj = &channel->kobj;
+ int ret;
+
+ channel->ring_sysfs_visible = false;
+ ret = sysfs_update_group(kobj, &vmbus_chan_group);
+ channel->mmap_ring_buffer = NULL;
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
+
/*
* vmbus_add_channel_kobj - setup a sub-directory under device/channels
*/
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..69c1df0f4ca5 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -131,15 +131,12 @@ static void hv_uio_rescind(struct vmbus_channel *channel)
vmbus_device_unregister(channel->device_obj);
}
-/* Sysfs API to allow mmap of the ring buffers
+/* Function used for mmap of ring buffer sysfs interface.
* The ring buffer is allocated as contiguous memory by vmbus_open
*/
-static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
- const struct bin_attribute *attr,
- struct vm_area_struct *vma)
+static int
+hv_uio_ring_mmap(struct vmbus_channel *channel, struct vm_area_struct *vma)
{
- struct vmbus_channel *channel
- = container_of(kobj, struct vmbus_channel, kobj);
void *ring_buffer = page_address(channel->ringbuffer_page);
if (channel->state != CHANNEL_OPENED_STATE)
@@ -149,15 +146,6 @@ static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
channel->ringbuffer_pagecount << PAGE_SHIFT);
}
-static const struct bin_attribute ring_buffer_bin_attr = {
- .attr = {
- .name = "ring",
- .mode = 0600,
- },
- .size = 2 * SZ_2M,
- .mmap = hv_uio_ring_mmap,
-};
-
/* Callback from VMBUS subsystem when new channel created. */
static void
hv_uio_new_channel(struct vmbus_channel *new_sc)
@@ -178,8 +166,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
/* Disable interrupts on sub channel */
new_sc->inbound.ring_buffer->interrupt_mask = 1;
set_channel_read_mode(new_sc, HV_CALL_ISR);
-
- ret = sysfs_create_bin_file(&new_sc->kobj, &ring_buffer_bin_attr);
+ ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap);
if (ret) {
dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
vmbus_close(new_sc);
@@ -350,10 +337,18 @@ hv_uio_probe(struct hv_device *dev,
goto fail_close;
}
- ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
- if (ret)
- dev_notice(&dev->device,
- "sysfs create ring bin file failed; %d\n", ret);
+ /*
+ * This internally calls sysfs_update_group, which returns a non-zero value if it executes
+ * before sysfs_create_group. This is expected as the 'ring' will be created later in
+ * vmbus_device_register() -> vmbus_add_channel_kobj(). Thus, no need to check the return
+ * value and print warning.
+ *
+ * Creating/exposing sysfs in driver probe is not encouraged as it can lead to race
+ * conditions with userspace. For backward compatibility, "ring" sysfs could not be removed
+ * or decoupled from uio_hv_generic probe. Userspace programs can make use of inotify
+ * APIs to make sure that ring is created.
+ */
+ hv_create_ring_sysfs(channel, hv_uio_ring_mmap);
hv_set_drvdata(dev, pdata);
@@ -375,7 +370,7 @@ hv_uio_remove(struct hv_device *dev)
if (!pdata)
return;
- sysfs_remove_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+ hv_remove_ring_sysfs(dev->channel);
uio_unregister_device(&pdata->info);
hv_uio_cleanup(dev, pdata);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 675959fb97ba..d6ffe01962c2 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1002,6 +1002,12 @@ struct vmbus_channel {
/* The max size of a packet on this channel */
u32 max_pkt_size;
+
+ /* function to mmap ring buffer memory to the channel's sysfs ring attribute */
+ int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct vm_area_struct *vma);
+
+ /* boolean to control visibility of sysfs for ring buffer */
+ bool ring_sysfs_visible;
};
#define lock_requestor(channel, flags) \
With an unset CONFIG_CPU_MITIGATIONS, or more precisely with
CONFIG_MITIGATION_ITS,
I get a compilation error:
ld: arch/x86/net/bpf_jit_comp.o: in function `emit_indirect_jump':
bpf_jit_comp.c:(.text+0x8cb): undefined reference to
`__x86_indirect_its_thunk_array'
make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
make[1]: *** [/usr/src/linux/Makefile:1182: vmlinux] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Keyboard and touchpad stopped working on several Apple Macbooks from the
year 2017 using kernel 6.12.xx . Until now I could only find this
discussion affirming the bug on Debian and Fedora:
https://github.com/Dunedan/mbp-2016-linux/issues/202
On siduction I also tried the more recent kernels 6.14.5 and mainline
6.15-rc4 (from Ubuntu) and the issue persisted with my testdevice
MacBookPro14,1 -- see the relevant output:
kernel: platform pxa2xx-spi.3: Adding to iommu group 20
kernel: input: Apple SPI Keyboard as
/devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr
0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
Many thanks,
Jörg Berkel
From: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com>
If an input changes state during wake-up and is used as an interrupt
source, the IRQ handler reads the volatile input register to clear the
interrupt mask and deassert the IRQ line. However, the IRQ handler is
triggered before access to the register is granted, causing the read
operation to fail.
As a result, the IRQ handler enters a loop, repeatedly printing the
"failed reading register" message, until `pca953x_resume()` is eventually
called, which restores the driver context and enables access to
registers.
Fix by disabling the IRQ line before entering suspend mode, and
re-enabling it after the driver context is restored in `pca953x_resume()`.
An IRQ can be disabled with disable_irq() and still wake the system as
long as the IRQ has wake enabled, so the wake-up functionality is
preserved.
Fixes: b76574300504 ("gpio: pca953x: Restore registers after suspend/resume cycle")
Cc: stable(a)vger.kernel.org
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Tested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
v2 -> v3
- add r-b Andy, t-b Geert
- fixed commit message
v1 -> v2
- Instead of calling PM ops with disabled interrupts, just disable the
irq while going in suspend and re-enable it after restoring the
context in resume function.
---
drivers/gpio/gpio-pca953x.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index ab2c0fd428fb..b852e4997629 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -1226,6 +1226,8 @@ static int pca953x_restore_context(struct pca953x_chip *chip)
guard(mutex)(&chip->i2c_lock);
+ if (chip->client->irq > 0)
+ enable_irq(chip->client->irq);
regcache_cache_only(chip->regmap, false);
regcache_mark_dirty(chip->regmap);
ret = pca953x_regcache_sync(chip);
@@ -1238,6 +1240,10 @@ static int pca953x_restore_context(struct pca953x_chip *chip)
static void pca953x_save_context(struct pca953x_chip *chip)
{
guard(mutex)(&chip->i2c_lock);
+
+ /* Disable IRQ to prevent early triggering while regmap "cache only" is on */
+ if (chip->client->irq > 0)
+ disable_irq(chip->client->irq);
regcache_cache_only(chip->regmap, true);
}
--
2.39.5
The patch fixes a deadlock which can be triggered by an internal
syzkaller [1] reproducer and captured by bpftrace script [2] and its log
[3] in this scenario:
Process 1 Process 2
--- ---
hugetlb_fault
mutex_lock(B) // take B
filemap_lock_hugetlb_folio
filemap_lock_folio
__filemap_get_folio
folio_lock(A) // take A
hugetlb_wp
mutex_unlock(B) // release B
... hugetlb_fault
... mutex_lock(B) // take B
filemap_lock_hugetlb_folio
filemap_lock_folio
__filemap_get_folio
folio_lock(A) // blocked
unmap_ref_private
...
mutex_lock(B) // retake and blocked
This is a ABBA deadlock involving two locks:
- Lock A: pagecache_folio lock
- Lock B: hugetlb_fault_mutex_table lock
The deadlock occurs between two processes as follows:
1. The first process (let’s call it Process 1) is handling a
copy-on-write (COW) operation on a hugepage via hugetlb_wp. Due to
insufficient reserved hugetlb pages, Process 1, owner of the reserved
hugetlb page, attempts to unmap a hugepage owned by another process
(non-owner) to satisfy the reservation. Before unmapping, Process 1
acquires lock B (hugetlb_fault_mutex_table lock) and then lock A
(pagecache_folio lock). To proceed with the unmap, it releases Lock B
but retains Lock A. After the unmap, Process 1 tries to reacquire Lock
B. However, at this point, Lock B has already been acquired by another
process.
2. The second process (Process 2) enters the hugetlb_fault handler
during the unmap operation. It successfully acquires Lock B
(hugetlb_fault_mutex_table lock) that was just released by Process 1,
but then attempts to acquire Lock A (pagecache_folio lock), which is
still held by Process 1.
As a result, Process 1 (holding Lock A) is blocked waiting for Lock B
(held by Process 2), while Process 2 (holding Lock B) is blocked waiting
for Lock A (held by Process 1), constructing a ABBA deadlock scenario.
The solution here is to unlock the pagecache_folio and provide the
pagecache_folio_unlocked variable to the caller to have the visibility
over the pagecache_folio status for subsequent handling.
The error message:
INFO: task repro_20250402_:13229 blocked for more than 64 seconds.
Not tainted 6.15.0-rc3+ #24
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro_20250402_ state:D stack:25856 pid:13229 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00004006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
schedule_preempt_disabled+0x15/0x30
__mutex_lock+0x75f/0xeb0
hugetlb_wp+0xf88/0x3440
hugetlb_fault+0x14c8/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0x61d/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0010:__put_user_4+0xd/0x20
copy_process+0x1f4a/0x3d60
kernel_clone+0x210/0x8f0
__x64_sys_clone+0x18d/0x1f0
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x41b26d
</TASK>
INFO: task repro_20250402_:13229 is blocked on a mutex likely owned by task repro_20250402_:13250.
task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
io_schedule+0x92/0x110
folio_wait_bit_common+0x69a/0xba0
__filemap_get_folio+0x154/0xb70
hugetlb_fault+0xa50/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0xace/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x402619
</TASK>
INFO: task repro_20250402_:13250 blocked for more than 65 seconds.
Not tainted 6.15.0-rc3+ #24
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro_20250402_ state:D stack:28288 pid:13250 tgid:13228 ppid:3513 task_flags:0x400040 flags:0x00000006
Call Trace:
<TASK>
__schedule+0x1755/0x4f50
schedule+0x158/0x330
io_schedule+0x92/0x110
folio_wait_bit_common+0x69a/0xba0
__filemap_get_folio+0x154/0xb70
hugetlb_fault+0xa50/0x2c30
trace_clock_x86_tsc+0x20/0x20
do_user_addr_fault+0xace/0x1490
exc_page_fault+0x64/0x100
asm_exc_page_fault+0x26/0x30
RIP: 0033:0x402619
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/35:
#0: ffffffff879a7440 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x30/0x180
2 locks held by repro_20250402_/13229:
#0: ffff888017d801e0 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x37/0x300
#1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_wp+0xf88/0x3440
3 locks held by repro_20250402_/13250:
#0: ffff8880177f3d08 (vm_lock){++++}-{0:0}, at: do_user_addr_fault+0x41b/0x1490
#1: ffff888000fec848 (&hugetlb_fault_mutex_table[i]){+.+.}-{4:4}, at: hugetlb_fault+0x3b8/0x2c30
#2: ffff8880129500e8 (&resv_map->rw_sema){++++}-{4:4}, at: hugetlb_fault+0x494/0x2c30
Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=… [1]
Link: https://github.com/bboymimi/bpftracer/blob/master/scripts/hugetlb_lock_debu… [2]
Link: https://drive.google.com/file/d/1bWq2-8o-BJAuhoHWX7zAhI6ggfhVzQUI/view?usp=… [3]
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Cc: <stable(a)vger.kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Florent Revest <revest(a)google.com>
Cc: Gavin Shan <gshan(a)redhat.com>
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
---
mm/hugetlb.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e3e6ac991b9c..ad54a74aa563 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6115,7 +6115,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
* Keep the pte_same checks anyway to make transition from the mutex easier.
*/
static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
- struct vm_fault *vmf)
+ struct vm_fault *vmf,
+ bool *pagecache_folio_unlocked)
{
struct vm_area_struct *vma = vmf->vma;
struct mm_struct *mm = vma->vm_mm;
@@ -6212,6 +6213,22 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
u32 hash;
folio_put(old_folio);
+ /*
+ * The pagecache_folio needs to be unlocked to avoid
+ * deadlock and we won't re-lock it in hugetlb_wp(). The
+ * pagecache_folio could be truncated after being
+ * unlocked. So its state should not be relied
+ * subsequently.
+ *
+ * Setting *pagecache_folio_unlocked to true allows the
+ * caller to handle any necessary logic related to the
+ * folio's unlocked state.
+ */
+ if (pagecache_folio) {
+ folio_unlock(pagecache_folio);
+ if (pagecache_folio_unlocked)
+ *pagecache_folio_unlocked = true;
+ }
/*
* Drop hugetlb_fault_mutex and vma_lock before
* unmapping. unmapping needs to hold vma_lock
@@ -6566,7 +6583,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
- ret = hugetlb_wp(folio, vmf);
+ ret = hugetlb_wp(folio, vmf, NULL);
}
spin_unlock(vmf->ptl);
@@ -6638,6 +6655,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
struct hstate *h = hstate_vma(vma);
struct address_space *mapping;
int need_wait_lock = 0;
+ bool pagecache_folio_unlocked = false;
struct vm_fault vmf = {
.vma = vma,
.address = address & huge_page_mask(h),
@@ -6792,7 +6810,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
if (!huge_pte_write(vmf.orig_pte)) {
- ret = hugetlb_wp(pagecache_folio, &vmf);
+ ret = hugetlb_wp(pagecache_folio, &vmf,
+ &pagecache_folio_unlocked);
goto out_put_page;
} else if (likely(flags & FAULT_FLAG_WRITE)) {
vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
@@ -6809,10 +6828,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
out_ptl:
spin_unlock(vmf.ptl);
- if (pagecache_folio) {
+ /*
+ * If the pagecache_folio is unlocked in hugetlb_wp(), we skip
+ * folio_unlock() here.
+ */
+ if (pagecache_folio && !pagecache_folio_unlocked)
folio_unlock(pagecache_folio);
+ if (pagecache_folio)
folio_put(pagecache_folio);
- }
out_mutex:
hugetlb_vma_unlock_read(vma);
base-commit: d76bb1ebb5587f66b0f8b8099bfbb44722bc08b3
--
2.43.0
A user reported on the Arch Linux Forums that their device is emitting
the following message in the kernel journal, which is fixed by adding
the quirk as submitted in this patch:
> kernel: usb 1-2: current rate 8436480 is different from the runtime rate 48000
There also is an entry for this product line added long time ago.
Their specific device has the following ID:
$ lsusb | grep Audio
Bus 001 Device 002: ID 1101:0003 EasyPass Industrial Co., Ltd Audioengine D1
Link: https://bbs.archlinux.org/viewtopic.php?id=305494
Fixes: 93f9d1a4ac593 ("ALSA: usb-audio: Apply sample rate quirk for Audioengine D1")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Heusel <christian(a)heusel.eu>
---
sound/usb/quirks.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
index 9112313a9dbc005d8ab6076a6f2f4b0c0cecc64f..eb192834db68ca599770055cb563201e648f20ba 100644
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -2250,6 +2250,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
QUIRK_FLAG_FIXED_RATE),
DEVICE_FLG(0x0fd9, 0x0008, /* Hauppauge HVR-950Q */
QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER),
+ DEVICE_FLG(0x1101, 0x0003, /* Audioengine D1 */
+ QUIRK_FLAG_GET_SAMPLE_RATE),
DEVICE_FLG(0x1224, 0x2a25, /* Jieli Technology USB PHY 2.0 */
QUIRK_FLAG_GET_SAMPLE_RATE | QUIRK_FLAG_MIC_RES_16),
DEVICE_FLG(0x1395, 0x740a, /* Sennheiser DECT */
---
base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
change-id: 20250512-audioengine-quirk-addition-718e6c86a2cc
Best regards,
--
Christian Heusel <christian(a)heusel.eu>
This patch series addresses a regression in Energy Efficient Ethernet
(EEE) handling for KSZ switches with integrated PHYs, introduced in
kernel v6.9 by commit fe0d4fd9285e ("net: phy: Keep track of EEE
configuration").
The first patch updates the DSA driver to allow phylink to properly
manage PHY EEE configuration. Since integrated PHYs handle LPI
internally and ports without integrated PHYs do not document MAC-level
LPI support, dummy MAC LPI callbacks are provided.
The second patch removes outdated EEE workarounds from the micrel PHY
driver, as they are no longer needed with correct phylink handling.
This series addresses the regression for mainline and kernels starting
from v6.14. It is not easily possible to fully fix older kernels due
to missing infrastructure changes.
Tested on KSZ9893 hardware.
Oleksij Rempel (2):
net: dsa: microchip: let phylink manage PHY EEE configuration on KSZ
switches
net: phy: micrel: remove KSZ9477 EEE quirks now handled by phylink
drivers/net/dsa/microchip/ksz_common.c | 135 ++++++++++++++++++++-----
drivers/net/phy/micrel.c | 7 --
include/linux/micrel_phy.h | 1 -
3 files changed, 107 insertions(+), 36 deletions(-)
--
2.39.5
While the set_msix() callback function in pcie-cadence-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
Cc: stable(a)vger.kernel.org
Fixes: 3ef5d16f50f8 ("PCI: cadence: Add MSI-X support to Endpoint driver")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/cadence/pcie-cadence-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 599ec4b1223e..112ae200b393 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -292,13 +292,14 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
struct cdns_pcie *pcie = &ep->pcie;
u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
fn = cdns_pcie_get_fn_from_vfn(pcie, fn, vfn);
reg = cap + PCI_MSIX_FLAGS;
val = cdns_pcie_ep_fn_readw(pcie, fn, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
cdns_pcie_ep_fn_writew(pcie, fn, reg, val);
/* Set MSI-X BAR and offset */
@@ -308,7 +309,7 @@ static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u8 vfn,
/* Set PBA BAR and offset. BAR must match MSI-X BAR */
reg = cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
cdns_pcie_ep_fn_writel(pcie, fn, reg, val);
return 0;
--
2.49.0
While the set_msix() callback function in pcie-designware-ep writes the
Table Size field correctly (N-1), the calculation of the PBA offset
is wrong because it calculates space for (N-1) entries instead of N.
This results in e.g. the following error when using QEMU with PCI
passthrough on a device which relies on the PCI endpoint subsystem:
failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
Fix the calculation of PBA offset in the MSI-X capability.
Cc: stable(a)vger.kernel.org
Fixes: 83153d9f36e2 ("PCI: endpoint: Fix ->set_msix() to take BIR and offset as arguments")
Reviewed-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/pci/controller/dwc/pcie-designware-ep.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 1a0bf9341542..24026f3f3413 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -585,6 +585,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
struct dw_pcie_ep_func *ep_func;
u32 val, reg;
+ u16 actual_interrupts = interrupts + 1;
ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
if (!ep_func || !ep_func->msix_cap)
@@ -595,7 +596,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
reg = ep_func->msix_cap + PCI_MSIX_FLAGS;
val = dw_pcie_ep_readw_dbi(ep, func_no, reg);
val &= ~PCI_MSIX_FLAGS_QSIZE;
- val |= interrupts;
+ val |= interrupts; /* 0's based value */
dw_pcie_writew_dbi(pci, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_TABLE;
@@ -603,7 +604,7 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
reg = ep_func->msix_cap + PCI_MSIX_PBA;
- val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+ val = (offset + (actual_interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
dw_pcie_ep_writel_dbi(ep, func_no, reg, val);
dw_pcie_dbi_ro_wr_dis(pci);
--
2.49.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 94cff94634e506a4a44684bee1875d2dbf782722
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051256-encrust-scribe-9996@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94cff94634e506a4a44684bee1875d2dbf782722 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Date: Fri, 4 Apr 2025 15:31:16 +0200
Subject: [PATCH] clocksource/i8253: Use raw_spinlock_irqsave() in
clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init -> hpet_time_init() -> pit_timer_init() which happens
with enabled interrupts.
If some of the old i8253 hardware is actually used then lockdep will notice
that i8253_lock is used in hard interrupt context. This causes lockdep to
complain because it observed the lock being acquired with interrupts
enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d736d1..b603c25f3dfa 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6beb6835c1fbb3f676aebb51a5fee6b77fed9308
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050913-rubble-confirm-99ee@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6beb6835c1fbb3f676aebb51a5fee6b77fed9308 Mon Sep 17 00:00:00 2001
From: Eelco Chaudron <echaudro(a)redhat.com>
Date: Tue, 6 May 2025 16:28:54 +0200
Subject: [PATCH] openvswitch: Fix unsafe attribute parsing in
output_userspace()
This patch replaces the manual Netlink attribute iteration in
output_userspace() with nla_for_each_nested(), which ensures that only
well-formed attributes are processed.
Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Eelco Chaudron <echaudro(a)redhat.com>
Acked-by: Ilya Maximets <i.maximets(a)ovn.org>
Acked-by: Aaron Conole <aconole(a)redhat.com>
Link: https://patch.msgid.link/0bd65949df61591d9171c0dc13e42cea8941da10.174654173…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 61fea7baae5d..2f22ca59586f 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -975,8 +975,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
upcall.cmd = OVS_PACKET_CMD_ACTION;
upcall.mru = OVS_CB(skb)->mru;
- for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
- a = nla_next(a, &rem)) {
+ nla_for_each_nested(a, attr, rem) {
switch (nla_type(a)) {
case OVS_USERSPACE_ATTR_USERDATA:
upcall.userdata = a;
In some cases, there is a small-time gap in which CMD_RING_BUSY
can be cleared by controller but adding command completion event
to event ring will be delayed. As the result driver will return
error code.
This behavior has been detected on usbtest driver (test 9) with
configuration including ep1in/ep1out bulk and ep2in/ep2out isoc
endpoint.
Probably this gap occurred because controller was busy with adding
some other events to event ring.
The CMD_RING_BUSY is cleared to '0' when the Command Descriptor
has been executed and not when command completion event has been
added to event ring.
To fix this issue for this test the small delay is sufficient
less than 10us) but to make sure the problem doesn't happen again
in the future the patch introduces 10 retries to check with delay
about 20us before returning error code.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: stable(a)vger.kernel.org
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
Changelog:
v2:
- replaced usleep_range with udelay
- increased retry counter and decreased the udelay value
drivers/usb/cdns3/cdnsp-gadget.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index 4824a10df07e..58650b7f4173 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -547,6 +547,7 @@ int cdnsp_wait_for_cmd_compl(struct cdnsp_device *pdev)
dma_addr_t cmd_deq_dma;
union cdnsp_trb *event;
u32 cycle_state;
+ u32 retry = 10;
int ret, val;
u64 cmd_dma;
u32 flags;
@@ -578,8 +579,23 @@ int cdnsp_wait_for_cmd_compl(struct cdnsp_device *pdev)
flags = le32_to_cpu(event->event_cmd.flags);
/* Check the owner of the TRB. */
- if ((flags & TRB_CYCLE) != cycle_state)
+ if ((flags & TRB_CYCLE) != cycle_state) {
+ /*
+ * Give some extra time to get chance controller
+ * to finish command before returning error code.
+ * Checking CMD_RING_BUSY is not sufficient because
+ * this bit is cleared to '0' when the Command
+ * Descriptor has been executed by controller
+ * and not when command completion event has
+ * be added to event ring.
+ */
+ if (retry--) {
+ udelay(20);
+ continue;
+ }
+
return -EINVAL;
+ }
cmd_dma = le64_to_cpu(event->event_cmd.cmd_trb);
--
2.43.0
From: Mario Limonciello <mario.limonciello(a)amd.com>
On VCN 4.0.5 using a doorbell to notify VCN hardware for WPTR changes
while dynamic power gating is enabled introduces a timing dependency
that can sometimes cause WPTR to not be properly updated. This manifests
as a job timeout which will trigger a VCN reset and cause the application
that submitted the job to crash.
Writing directly to the WPTR register instead of using the doorbell changes
the timing enough that the issue doesn't happen. Turn off doorbell use for
now while the issue continues to be debugged.
Cc: stable(a)vger.kernel.org
Cc: David.Wu3(a)amd.com
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3909
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
index ba603b2246e2e..ea9513f65d7e4 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
@@ -181,7 +181,7 @@ static int vcn_v4_0_5_sw_init(struct amdgpu_ip_block *ip_block)
return r;
ring = &adev->vcn.inst[i].ring_enc[0];
- ring->use_doorbell = true;
+ ring->use_doorbell = false;
if (amdgpu_sriov_vf(adev))
ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) +
i * (adev->vcn.inst[i].num_enc_rings + 1) + 1;
--
2.43.0
The patch titled
Subject: taskstats: fix struct taskstats breaks backward compatibility since version 15
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Wang Yaxin <wang.yaxin(a)zte.com.cn>
Subject: taskstats: fix struct taskstats breaks backward compatibility since version 15
Date: Sat, 10 May 2025 15:54:13 +0800 (CST)
Problem
========
commit 658eb5ab916d ("delayacct: add delay max to record delay peak")
- adding more fields
commit f65c64f311ee ("delayacct: add delay min to record delay peak")
- adding more fields
commit b016d0873777 ("taskstats: modify taskstats version")
- version bump to 15
Since version 15 (TASKSTATS_VERSION=15) the new layout of the structure
adds fields in the middle of the structure, rendering all old software
incompatible with newer kernels and software compiled against the new
kernel headers incompatible with older kernels.
Solution
=========
move delay max and delay min to the end of taskstat, and bump
the version to 16 after the change
Link: https://lkml.kernel.org/r/20250510155413259V4JNRXxukdDgzsaL0Fo6a@zte.com.cn
Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak")
Signed-off-by: Wang Yaxin <wang.yaxin(a)zte.com.cn>
Signed-off-by: xu xin <xu.xin16(a)zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2(a)zte.com.cn>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/uapi/linux/taskstats.h | 43 +++++++++++++++++++------------
1 file changed, 27 insertions(+), 16 deletions(-)
--- a/include/uapi/linux/taskstats.h~taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15
+++ a/include/uapi/linux/taskstats.h
@@ -34,7 +34,7 @@
*/
-#define TASKSTATS_VERSION 15
+#define TASKSTATS_VERSION 16
#define TS_COMM_LEN 32 /* should be >= TASK_COMM_LEN
* in linux/sched.h */
@@ -72,8 +72,6 @@ struct taskstats {
*/
__u64 cpu_count __attribute__((aligned(8)));
__u64 cpu_delay_total;
- __u64 cpu_delay_max;
- __u64 cpu_delay_min;
/* Following four fields atomically updated using task->delays->lock */
@@ -82,14 +80,10 @@ struct taskstats {
*/
__u64 blkio_count;
__u64 blkio_delay_total;
- __u64 blkio_delay_max;
- __u64 blkio_delay_min;
/* Delay waiting for page fault I/O (swap in only) */
__u64 swapin_count;
__u64 swapin_delay_total;
- __u64 swapin_delay_max;
- __u64 swapin_delay_min;
/* cpu "wall-clock" running time
* On some architectures, value will adjust for cpu time stolen
@@ -172,14 +166,11 @@ struct taskstats {
/* Delay waiting for memory reclaim */
__u64 freepages_count;
__u64 freepages_delay_total;
- __u64 freepages_delay_max;
- __u64 freepages_delay_min;
+
/* Delay waiting for thrashing page */
__u64 thrashing_count;
__u64 thrashing_delay_total;
- __u64 thrashing_delay_max;
- __u64 thrashing_delay_min;
/* v10: 64-bit btime to avoid overflow */
__u64 ac_btime64; /* 64-bit begin time */
@@ -187,8 +178,6 @@ struct taskstats {
/* v11: Delay waiting for memory compact */
__u64 compact_count;
__u64 compact_delay_total;
- __u64 compact_delay_max;
- __u64 compact_delay_min;
/* v12 begin */
__u32 ac_tgid; /* thread group ID */
@@ -210,15 +199,37 @@ struct taskstats {
/* v13: Delay waiting for write-protect copy */
__u64 wpcopy_count;
__u64 wpcopy_delay_total;
- __u64 wpcopy_delay_max;
- __u64 wpcopy_delay_min;
/* v14: Delay waiting for IRQ/SOFTIRQ */
__u64 irq_count;
__u64 irq_delay_total;
+
+ /* v15: add Delay max and Delay min */
+
+ /* v16: move Delay max and Delay min to the end of taskstat */
+ __u64 cpu_delay_max;
+ __u64 cpu_delay_min;
+
+ __u64 blkio_delay_max;
+ __u64 blkio_delay_min;
+
+ __u64 swapin_delay_max;
+ __u64 swapin_delay_min;
+
+ __u64 freepages_delay_max;
+ __u64 freepages_delay_min;
+
+ __u64 thrashing_delay_max;
+ __u64 thrashing_delay_min;
+
+ __u64 compact_delay_max;
+ __u64 compact_delay_min;
+
+ __u64 wpcopy_delay_max;
+ __u64 wpcopy_delay_min;
+
__u64 irq_delay_max;
__u64 irq_delay_min;
- /* v15: add Delay max */
};
_
Patches currently in -mm which might be from wang.yaxin(a)zte.com.cn are
taskstats-fix-struct-taskstats-breaks-backward-compatibility-since-version-15.patch
In defining VEND1_GLOBAL_LED_PROV_ACT_STRETCH there was a typo where the
GENMASK definition was swapped.
Fix it to prevent any kind of misconfiguration if ever this define will
be used in the future.
Cc: <stable(a)vger.kernel.org>
Fixes: 61578f679378 ("net: phy: aquantia: add support for PHY LEDs")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
drivers/net/phy/aquantia/aquantia.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/aquantia/aquantia.h b/drivers/net/phy/aquantia/aquantia.h
index 0c78bfabace5..e2bb66a21589 100644
--- a/drivers/net/phy/aquantia/aquantia.h
+++ b/drivers/net/phy/aquantia/aquantia.h
@@ -76,7 +76,7 @@
#define VEND1_GLOBAL_LED_PROV_LINK100 BIT(5)
#define VEND1_GLOBAL_LED_PROV_RX_ACT BIT(3)
#define VEND1_GLOBAL_LED_PROV_TX_ACT BIT(2)
-#define VEND1_GLOBAL_LED_PROV_ACT_STRETCH GENMASK(0, 1)
+#define VEND1_GLOBAL_LED_PROV_ACT_STRETCH GENMASK(1, 0)
#define VEND1_GLOBAL_LED_PROV_LINK_MASK (VEND1_GLOBAL_LED_PROV_LINK100 | \
VEND1_GLOBAL_LED_PROV_LINK1000 | \
--
2.48.1
Hi All,
Chages since v6:
- do not unnecessary free pages across iterations
Chages since v5:
- full error message included into commit description
Chages since v4:
- unused pages leak is avoided
Chages since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Chages since v2:
- page allocation moved out of the atomic context
Chages since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 76 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 62 insertions(+), 14 deletions(-)
--
2.45.2
From: Michael Kelley <mhklinux(a)outlook.com>
Starting with commit dca5161f9bd0 in the 6.3 kernel, the Linux driver
for Hyper-V synthetic networking (netvsc) occasionally reports
"nvsp_rndis_pkt_complete error status: 2".[1] This error indicates
that Hyper-V has rejected a network packet transmit request from the
guest, and the outgoing network packet is dropped. Higher level
network protocols presumably recover and resend the packet so there is
no functional error, but performance is slightly impacted. Commit
dca5161f9bd0 is not the cause of the error -- it only added reporting
of an error that was already happening without any notice. The error
has presumably been present since the netvsc driver was originally
introduced into Linux.
This patch set fixes the root cause of the problem, which is that the
netvsc driver in Linux may send an incorrectly formatted VMBus message
to Hyper-V when transmitting the network packet. The incorrect
formatting occurs when the rndis header of the VMBus message crosses a
page boundary due to how the Linux skb head memory is aligned. In such
a case, two PFNs are required to describe the location of the rndis
header, even though they are contiguous in guest physical address
(GPA) space. Hyper-V requires that two PFNs be in a single "GPA range"
data struture, but current netvsc code puts each PFN in its own GPA
range, which Hyper-V rejects as an error in the case of the rndis
header.
The incorrect formatting occurs only for larger packets that netvsc
must transmit via a VMBus "GPA Direct" message. There's no problem
when netvsc transmits a smaller packet by copying it into a pre-
allocated send buffer slot because the pre-allocated slots don't have
page crossing issues.
After commit 14ad6ed30a10 in the 6.14 kernel, the error occurs much
more frequently in VMs with 16 or more vCPUs. It may occur every few
seconds, or even more frequently, in a ssh session that outputs a lot
of text. Commit 14ad6ed30a10 subtly changes how skb head memory is
allocated, making it much more likely that the rndis header will cross
a page boundary when the vCPU count is 16 or more. The changes in
commit 14ad6ed30a10 are perfectly valid -- they just had the side
effect of making the netvsc bug more prominent.
One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer()
and just combine them into a single GPA range. Such a fix is very
contained. But conceptually it is fixing the problem at the wrong
level. So this patch set takes the broader approach of maintaining
the already known grouping of contiguous PFNs at a higher level in
the netvsc driver code, and propagating that grouping down to the
creation of the VMBus message to send to Hyper-V. Maintaining the
grouping fixes this problem, and has the added benefit of allowing
netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce
buffering in CoCo VMs.
Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc()
to specify multiple GPA ranges. In current code
vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI
driver, and it always creates a single GPA range.
Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc()
instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of
netvsc still don't group contiguous PFNs, this patch is functionally
neutral. The VMBus message to Hyper-V still has many GPA ranges, each
with a single PFN. But it lays the groundwork for the next patch.
Patch 3 changes the higher levels of netvsc to preserve the already
known grouping of contiguous PFNs. When the contiguous groupings are
passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple
PFNs are produced, as expected by Hyper-V. This is point at which the
core problem is fixed.
Patches 4 and 5 remove code that is no longer necessary after the
previous patches.
These changes provide a net reduction of about 65 lines of code, which
is an added benefit.
These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo
VMs, and in Dv6-series VMs where the netvsp implementation is in the
OpenHCL paravisor instead of the Hyper-V host.
These changes are built against kernel version 6.15-rc6.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
Michael Kelley (5):
Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple
ranges
hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages
hv_netvsc: Preserve contiguous PFN grouping in the page buffer array
hv_netvsc: Remove rmsg_pgcnt
Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()
drivers/hv/channel.c | 65 ++-----------------------------
drivers/net/hyperv/hyperv_net.h | 13 ++++++-
drivers/net/hyperv/netvsc.c | 57 ++++++++++++++++++++++-----
drivers/net/hyperv/netvsc_drv.c | 62 +++++++----------------------
drivers/net/hyperv/rndis_filter.c | 24 +++---------
drivers/scsi/storvsc_drv.c | 1 +
include/linux/hyperv.h | 7 ----
7 files changed, 83 insertions(+), 146 deletions(-)
--
2.25.1