Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a report about a linux-6.6.y regression in bugzilla.kernel.org that appears to be caused by this commit from Dan applied by Greg:
15fffc6a5624b1 ("driver core: Fix uevent_show() vs driver detach race") [v6.11-rc3, v6.10.5, v6.6.46, v6.1.105, v5.15.165, v5.10.224, v5.4.282, v4.19.320]
The reporter did not check yet if mainline is affected; decided to forward the report by mail nevertheless, as the maintainer for the subsystem is also the maintainer for the stable tree. ;-)
To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219244 :
The symptoms of this bug are as follows:
- After booting (to the graphical login screen) the mouse pointer
would frozen and only after unplugging and plugging-in again the usb plug of the mouse would the mouse be working as expected.
- If one would log in without fixing the mouse issue, the mouse
pointer would still be frozen after login.
- The usb keyboard usually is not affected even though plugged into
the same usb-hub - thus logging in is possible.
- The mouse pointer is also frozen if the usb connector is plugged
into a different usb-port (different from the usb-hub)
- The pointer is moveable via the inbuilt synaptics trackpad
The kernel log shows almost the same messages (not sure if the differences mean anything in regards to this bug) for the initial recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse (and subsequently moveable mouse pointer):
[kernel] [ 8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd [kernel] [ 8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04 [kernel] [ 8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [kernel] [ 8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 [kernel] [ 8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft [kernel] [ 8.963554] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18 [kernel] [ 8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
[kernel] [ 31.258381] usb 1-2.2.1.2: USB disconnect, device number 10 [kernel] [ 31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd [kernel] [ 31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04 [kernel] [ 31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [kernel] [ 31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 [kernel] [ 31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft [kernel] [ 31.812933] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20 [kernel] [ 31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
Differences:
../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20
and
hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004
The connector / usb-port was not changed in this case!
The symptoms of this bug have been present at one point in the recent past, but with kernel v6.6.45 (or maybe even some version before that) it was fine. But with 6.6.45 it seems to be definitely fine.
But with v6.6.46 the symptoms returned. That's the reason I suspected the kernel to be the cause of this issue. So I did some bisecting - which wasn't easy because that bug would often times not appear if the system was simply rebooted into the test kernel. As the bug would definitely appear on the affected kernels (v6.6.46 ff) after shutting down the system for the night and booting the next day, I resorted to simulating the over-night powering-off by shutting the system down, unplugging the power and pressing the power button to get rid of residual voltage. But even then a few times the bug would only appear if I repeated this procedure before booting the system again with the respective kernel.
This is on a Thinkpad with Kaby Lake and integrated Intel graphics. Even though it is a laptop, it is used as a desktop device, and the internal battery is disconnected, the removable battery is removed as the system is plugged-in via the power cord at all times (when in use)! Also, the system has no power (except for the bios battery, of course) over night as the power outlet is switched off if the device is not in use.
Not sure if this affects the issue - or how it does. But for successful bisecting I had to resort to the above procedure.
Bisecting the issue (between the release commits of v6.6.45 and v6.6.46) resulted in this commit [1] being the probable culprit.
I then tested kernel v6.6.49. It still produced the bug for me. So I reverted the changes of the assumed "bad commit" and re-compiled kernel v6.6.49. With this modified kernel the bug seems to be gone.
Now, I assume the commit has a reason for being introduced, but maybe needs some adjusting in order to avoid this bug I'm experiencing on my system. Also, I can't say why the issue appeared in the past even without this commit being present, as I haven't bisected any kernel version before v6.6.45.
[1]:
4d035c743c3e391728a6f81cbf0f7f9ca700cf62 is the first bad commit commit 4d035c743c3e391728a6f81cbf0f7f9ca700cf62 Author: Dan Williams dan.j.williams@intel.com Date: Fri Jul 12 12:42:09 2024 -0700
driver core: Fix uevent_show() vs driver detach race
commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c upstream. uevent_show() wants to de-reference dev->driver->name. There is no clean
See the ticket for more details. Note, you have to use bugzilla to reach the reporter, as I sadly[1] can not CCed them in mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their "email address will never be displayed to logged out users"
P.S.: let me use this mail to also add the report to the list of tracked regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: 4d035c743c3e391728a6f81cbf0f7f9ca700cf62 #regzbot from: brmails+k #regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219244 #regzbot title: driver core: frozen usb mouse pointer at boot #regzbot ignore-activity
On Wed, Sep 11, 2024 at 09:55:03AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a report about a linux-6.6.y regression in bugzilla.kernel.org that appears to be caused by this commit from Dan applied by Greg:
15fffc6a5624b1 ("driver core: Fix uevent_show() vs driver detach race") [v6.11-rc3, v6.10.5, v6.6.46, v6.1.105, v5.15.165, v5.10.224, v5.4.282, v4.19.320]
The reporter did not check yet if mainline is affected; decided to forward the report by mail nevertheless, as the maintainer for the subsystem is also the maintainer for the stable tree. ;-)
To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219244 :
This is very odd, because:
The symptoms of this bug are as follows:
- After booting (to the graphical login screen) the mouse pointer
would frozen and only after unplugging and plugging-in again the usb plug of the mouse would the mouse be working as expected.
- If one would log in without fixing the mouse issue, the mouse
pointer would still be frozen after login.
- The usb keyboard usually is not affected even though plugged into
the same usb-hub - thus logging in is possible.
- The mouse pointer is also frozen if the usb connector is plugged
into a different usb-port (different from the usb-hub)
- The pointer is moveable via the inbuilt synaptics trackpad
The patch from Dan should only affect when the module is unloaded, not when the device is removed.
And it should not diferenciate between device types (i.e. mouse, keyboard, etc.) as it affects ALL devices in the system.
The kernel log shows almost the same messages (not sure if the differences mean anything in regards to this bug) for the initial recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse (and subsequently moveable mouse pointer):
[kernel] [ 8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd [kernel] [ 8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04 [kernel] [ 8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [kernel] [ 8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 [kernel] [ 8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft [kernel] [ 8.963554] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18 [kernel] [ 8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
[kernel] [ 31.258381] usb 1-2.2.1.2: USB disconnect, device number 10 [kernel] [ 31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd [kernel] [ 31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04 [kernel] [ 31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [kernel] [ 31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 [kernel] [ 31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft [kernel] [ 31.812933] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20 [kernel] [ 31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
Differences:
../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20
That's normal, just a different name for the device, they are always dynamic.
and
hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004
Again, different device names, all should be fine.
The connector / usb-port was not changed in this case!
The symptoms of this bug have been present at one point in the recent past, but with kernel v6.6.45 (or maybe even some version before that) it was fine. But with 6.6.45 it seems to be definitely fine.
But with v6.6.46 the symptoms returned. That's the reason I suspected the kernel to be the cause of this issue. So I did some bisecting - which wasn't easy because that bug would often times not appear if the system was simply rebooted into the test kernel. As the bug would definitely appear on the affected kernels (v6.6.46 ff) after shutting down the system for the night and booting the next day, I resorted to simulating the over-night powering-off by shutting the system down, unplugging the power and pressing the power button to get rid of residual voltage. But even then a few times the bug would only appear if I repeated this procedure before booting the system again with the respective kernel.
This is on a Thinkpad with Kaby Lake and integrated Intel graphics. Even though it is a laptop, it is used as a desktop device, and the internal battery is disconnected, the removable battery is removed as the system is plugged-in via the power cord at all times (when in use)! Also, the system has no power (except for the bios battery, of course) over night as the power outlet is switched off if the device is not in use.
Not sure if this affects the issue - or how it does. But for successful bisecting I had to resort to the above procedure.
Bisecting the issue (between the release commits of v6.6.45 and v6.6.46) resulted in this commit [1] being the probable culprit.
I then tested kernel v6.6.49. It still produced the bug for me. So I reverted the changes of the assumed "bad commit" and re-compiled kernel v6.6.49. With this modified kernel the bug seems to be gone.
This is odd.
Does the latest 6.10.y release also show this problem?
I can't duplicate this here, and it's the first I've heard of it (given that USB mice are pretty popular, I would suspect others would have hit it as well...)
thanks,
greg k-h
Greg Kroah-Hartman wrote: [..]
This is odd.
Does the latest 6.10.y release also show this problem?
I can't duplicate this here, and it's the first I've heard of it (given that USB mice are pretty popular, I would suspect others would have hit it as well...)
Sorry for missing this earlier. One thought is that userspace has a dependency on uevent_show() flushing device probing. In other words the side effect of taking the device_lock() in uevent_show() is that udev might enjoy some occasions where the reading the uevent flushes probing before the udev rule runs. With this change, uevent_show() no longer waits for any inflight probes to complete.
One idea to fix this problem is to create a special case sysfs attribute type that takes the device_lock() before kernfs_get_active() to avoid the deadlock on attribute teardown.
I'll take a look. Thanks for forwarding the report Thorsten!
Dan Williams wrote:
Greg Kroah-Hartman wrote: [..]
This is odd.
Does the latest 6.10.y release also show this problem?
I can't duplicate this here, and it's the first I've heard of it (given that USB mice are pretty popular, I would suspect others would have hit it as well...)
Sorry for missing this earlier. One thought is that userspace has a dependency on uevent_show() flushing device probing. In other words the side effect of taking the device_lock() in uevent_show() is that udev might enjoy some occasions where the reading the uevent flushes probing before the udev rule runs. With this change, uevent_show() no longer waits for any inflight probes to complete.
One idea to fix this problem is to create a special case sysfs attribute type that takes the device_lock() before kernfs_get_active() to avoid the deadlock on attribute teardown.
I'll take a look. Thanks for forwarding the report Thorsten!
Ok, the following boots and passes the CXL unit tests, would appreciate if the reporter can give this a try:
-- >8 -- Subject: driver core: Fix userspace expectations of uevent_show() as a probe barrier
From: Dan Williams dan.j.williams@intel.com
Commit "driver core: Fix uevent_show() vs driver detach race" [1] attempted to fix a lockdep report in uevent_show() by making the lookup of driver information for a given device lockless. It turns out that userspace likely depends on the side-effect of uevent_show() flushing device probing.
Introduce a new "locked" type for sysfs attributes that arranges for the attribute to be called under the device-lock, but without setting up a reverse locking order dependency with the kernfs_get_active() lock.
This new facility holds a reference on a device while any locked-sysfs attribute of that device is open. It then takes the lock around sysfs read/write operations in the following order:
of->mutex of->op_mutex <= device_lock() kernfs_get_active() <operation>
Compare that to problematic locking order of:
of->mutex kernfs_get_active() <operation> device_lock()
...which causes potential deadlocks with kernfs_drain() that may be called while the device_lock() is held.
Fixes: 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") [1] Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219244 Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/base/core.c | 2 +- fs/kernfs/file.c | 24 +++++++++++++++++++----- fs/sysfs/file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ fs/sysfs/group.c | 4 ++-- include/linux/device.h | 3 +++ include/linux/kernfs.h | 1 + include/linux/sysfs.h | 10 ++++++++++ 7 files changed, 83 insertions(+), 8 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c index 8c0733d3aad8..1fd5a18cbb62 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2772,7 +2772,7 @@ static ssize_t uevent_store(struct device *dev, struct device_attribute *attr,
return count; } -static DEVICE_ATTR_RW(uevent); +static DEVICE_ATTR_LOCKED_RW(uevent);
static ssize_t online_show(struct device *dev, struct device_attribute *attr, char *buf) diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index 8502ef68459b..eb5c2167beb9 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -142,6 +142,20 @@ static void kernfs_seq_stop_active(struct seq_file *sf, void *v) kernfs_put_active(of->kn); }
+static void kernfs_open_file_lock(struct kernfs_open_file *of) +{ + mutex_lock(&of->mutex); + if (of->op_mutex) + mutex_lock(of->op_mutex); +} + +static void kernfs_open_file_unlock(struct kernfs_open_file *of) +{ + if (of->op_mutex) + mutex_unlock(of->op_mutex); + mutex_unlock(&of->mutex); +} + static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos) { struct kernfs_open_file *of = sf->private; @@ -151,7 +165,7 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos) * @of->mutex nests outside active ref and is primarily to ensure that * the ops aren't called concurrently for the same open file. */ - mutex_lock(&of->mutex); + kernfs_open_file_lock(of); if (!kernfs_get_active(of->kn)) return ERR_PTR(-ENODEV);
@@ -193,7 +207,7 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
if (v != ERR_PTR(-ENODEV)) kernfs_seq_stop_active(sf, v); - mutex_unlock(&of->mutex); + kernfs_open_file_unlock(of); }
static int kernfs_seq_show(struct seq_file *sf, void *v) @@ -322,9 +336,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) * @of->mutex nests outside active ref and is used both to ensure that * the ops aren't called concurrently for the same open file. */ - mutex_lock(&of->mutex); + kernfs_open_file_lock(of); if (!kernfs_get_active(of->kn)) { - mutex_unlock(&of->mutex); + kernfs_open_file_unlock(of); len = -ENODEV; goto out_free; } @@ -336,7 +350,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) len = -EINVAL;
kernfs_put_active(of->kn); - mutex_unlock(&of->mutex); + kernfs_open_file_unlock(of);
if (len > 0) iocb->ki_pos += len; diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index d1995e2d6c94..1bb878efcf00 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -15,6 +15,7 @@ #include <linux/list.h> #include <linux/mutex.h> #include <linux/seq_file.h> +#include <linux/device.h> #include <linux/mm.h>
#include "sysfs.h" @@ -189,6 +190,26 @@ static int sysfs_kf_bin_open(struct kernfs_open_file *of) return 0; }
+/* locked attributes are always device attributes */ +static int sysfs_kf_setup_lock(struct kernfs_open_file *of) +{ + struct kobject *kobj = of->kn->parent->priv; + struct device *dev = kobj_to_dev(kobj); + + get_device(dev); + of->op_mutex = &dev->mutex; + + return 0; +} + +static void sysfs_kf_free_lock(struct kernfs_open_file *of) +{ + struct kobject *kobj = of->kn->parent->priv; + struct device *dev = kobj_to_dev(kobj); + + put_device(dev); +} + void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr) { struct kernfs_node *kn = kobj->sd, *tmp; @@ -227,6 +248,25 @@ static const struct kernfs_ops sysfs_file_kfops_rw = { .write = sysfs_kf_write, };
+static const struct kernfs_ops sysfs_locked_kfops_ro = { + .seq_show = sysfs_kf_seq_show, + .open = sysfs_kf_setup_lock, + .release = sysfs_kf_free_lock, +}; + +static const struct kernfs_ops sysfs_locked_kfops_wo = { + .write = sysfs_kf_write, + .open = sysfs_kf_setup_lock, + .release = sysfs_kf_free_lock, +}; + +static const struct kernfs_ops sysfs_locked_kfops_rw = { + .seq_show = sysfs_kf_seq_show, + .write = sysfs_kf_write, + .open = sysfs_kf_setup_lock, + .release = sysfs_kf_free_lock, +}; + static const struct kernfs_ops sysfs_prealloc_kfops_ro = { .read = sysfs_kf_read, .prealloc = true, @@ -287,6 +327,13 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent, ops = &sysfs_prealloc_kfops_ro; else if (sysfs_ops->store) ops = &sysfs_prealloc_kfops_wo; + } else if (mode & SYSFS_LOCKED) { + if (sysfs_ops->show && sysfs_ops->store) + ops = &sysfs_locked_kfops_rw; + else if (sysfs_ops->show) + ops = &sysfs_locked_kfops_ro; + else if (sysfs_ops->store) + ops = &sysfs_locked_kfops_wo; } else { if (sysfs_ops->show && sysfs_ops->store) ops = &sysfs_file_kfops_rw; diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c index d22ad67a0f32..0158367866be 100644 --- a/fs/sysfs/group.c +++ b/fs/sysfs/group.c @@ -68,11 +68,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj, continue; }
- WARN(mode & ~(SYSFS_PREALLOC | 0664), + WARN(mode & ~(SYSFS_PREALLOC | SYSFS_LOCKED | 0664), "Attribute %s: Invalid permissions 0%o\n", (*attr)->name, mode);
- mode &= SYSFS_PREALLOC | 0664; + mode &= SYSFS_PREALLOC | SYSFS_LOCKED | 0664; error = sysfs_add_file_mode_ns(parent, *attr, mode, uid, gid, NULL); if (unlikely(error)) diff --git a/include/linux/device.h b/include/linux/device.h index 34eb20f5966f..c38c33bed333 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -180,6 +180,9 @@ ssize_t device_show_string(struct device *dev, struct device_attribute *attr, #define DEVICE_ATTR_RW(_name) \ struct device_attribute dev_attr_##_name = __ATTR_RW(_name)
+#define DEVICE_ATTR_LOCKED_RW(_name) \ + struct device_attribute dev_attr_##_name = __ATTR_LOCKED_RW(_name) + /** * DEVICE_ATTR_ADMIN_RW - Define an admin-only read-write device attribute. * @_name: Attribute name. diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 87c79d076d6d..df6828a7cd3e 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -257,6 +257,7 @@ struct kernfs_open_file {
/* private fields, do not use outside kernfs proper */ struct mutex mutex; + struct mutex *op_mutex; struct mutex prealloc_mutex; int event; struct list_head list; diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index c4e64dc11206..981588c9c673 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -103,6 +103,7 @@ struct attribute_group {
#define SYSFS_PREALLOC 010000 #define SYSFS_GROUP_INVISIBLE 020000 +#define SYSFS_LOCKED 040000
/* * DEFINE_SYSFS_GROUP_VISIBLE(name): @@ -230,6 +231,13 @@ struct attribute_group { .store = _store, \ }
+#define __ATTR_LOCKED(_name, _mode, _show, _store) { \ + .attr = {.name = __stringify(_name), \ + .mode = SYSFS_LOCKED | VERIFY_OCTAL_PERMISSIONS(_mode) },\ + .show = _show, \ + .store = _store, \ +} + #define __ATTR_RO(_name) { \ .attr = { .name = __stringify(_name), .mode = 0444 }, \ .show = _name##_show, \ @@ -255,6 +263,8 @@ struct attribute_group {
#define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)
+#define __ATTR_LOCKED_RW(_name) __ATTR_LOCKED(_name, 0644, _name##_show, _name##_store) + #define __ATTR_NULL { .attr = { .name = NULL } }
#ifdef CONFIG_DEBUG_LOCK_ALLOC
On 24.09.24 04:54, Dan Williams wrote:
Dan Williams wrote:
Greg Kroah-Hartman wrote: [..]
This is odd.
Does the latest 6.10.y release also show this problem?
I can't duplicate this here, and it's the first I've heard of it (given that USB mice are pretty popular, I would suspect others would have hit it as well...)
Sorry for missing this earlier. One thought is that userspace has a dependency on uevent_show() flushing device probing. In other words the side effect of taking the device_lock() in uevent_show() is that udev might enjoy some occasions where the reading the uevent flushes probing before the udev rule runs. With this change, uevent_show() no longer waits for any inflight probes to complete.
One idea to fix this problem is to create a special case sysfs attribute type that takes the device_lock() before kernfs_get_active() to avoid the deadlock on attribute teardown.
I'll take a look. Thanks for forwarding the report Thorsten!
Ok, the following boots and passes the CXL unit tests, would appreciate if the reporter can give this a try:
Somehow I apparently became a "bugzilla-man-in-the-middle interface" yet again... But whatever! ¯_(ツ)_/¯
To forward the latest comment from the ticket:
""" --- Comment #11 from brmails+k@disroot.org --- Good news!
I think the proposed patch by Dan Williams fixes the issue.
I have tested it with v6.6.52 and v6.10.11. I haven't been able to recreate the issue with those modified kernels even once.
The patch can be applied to v6.11.0 and v6.10.11 out of the box. For v6.6.52 I had to slightly modify it as the line
#define SYSFS_GROUP_INVISIBLE 020000
doesn't exist in /include/linux/sysfs.h in v6.6.52 hence the patch looking for that line fails on that file. But after adjusting the patch accordingly, the patch works fine on v6.6.52 and the issue is gone with the patched version of v6.6.52, just like 6.10.11.
So, I assume that the fix / patch proposed by Dan Williams works as intended resolving the issue I had.
Thanks again for forwarding the bug report and for the quick fix! """
Ciao, Thorsten
[ add tools@kernel.org for bugzilla bridge mention ]
Linux regression tracking (Thorsten Leemhuis) wrote:
On 24.09.24 04:54, Dan Williams wrote:
Dan Williams wrote:
Greg Kroah-Hartman wrote: [..]
This is odd.
Does the latest 6.10.y release also show this problem?
I can't duplicate this here, and it's the first I've heard of it (given that USB mice are pretty popular, I would suspect others would have hit it as well...)
Sorry for missing this earlier. One thought is that userspace has a dependency on uevent_show() flushing device probing. In other words the side effect of taking the device_lock() in uevent_show() is that udev might enjoy some occasions where the reading the uevent flushes probing before the udev rule runs. With this change, uevent_show() no longer waits for any inflight probes to complete.
One idea to fix this problem is to create a special case sysfs attribute type that takes the device_lock() before kernfs_get_active() to avoid the deadlock on attribute teardown.
I'll take a look. Thanks for forwarding the report Thorsten!
Ok, the following boots and passes the CXL unit tests, would appreciate if the reporter can give this a try:
Somehow I apparently became a "bugzilla-man-in-the-middle interface" yet again... But whatever! ¯_(ツ)_/¯
Oh, sorry about that Thorsten, for some reason I thought we were living in this new world that Konstantin mentions where mailing-list to bugzilla communication is automated. Hopefully that arrives soon.
I very much appreciate the help here and thank you for the hard work you put into taking care of regressions!
To forward the latest comment from the ticket:
""" --- Comment #11 from brmails+k@disroot.org --- Good news!
I think the proposed patch by Dan Williams fixes the issue.
I have tested it with v6.6.52 and v6.10.11. I haven't been able to recreate the issue with those modified kernels even once.
The patch can be applied to v6.11.0 and v6.10.11 out of the box. For v6.6.52 I had to slightly modify it as the line
#define SYSFS_GROUP_INVISIBLE 020000
doesn't exist in /include/linux/sysfs.h in v6.6.52 hence the patch looking for that line fails on that file. But after adjusting the patch accordingly, the patch works fine on v6.6.52 and the issue is gone with the patched version of v6.6.52, just like 6.10.11.
So, I assume that the fix / patch proposed by Dan Williams works as intended resolving the issue I had.
Thanks again for forwarding the bug report and for the quick fix! """
Great news, I'll revise the patch a bit to add some clarifying documentation. Specifically, that the new DEVICE_ATTR_RW_LOCKED() is only for exceptional corner cases like uevent_show(), and that most usages of "dev->driver" in a sysfs attribute are best served by being declared as part of "drv_groups".
linux-stable-mirror@lists.linaro.org