The onboard_hub 'driver' consists of two drivers, a platform
driver and a USB driver. Currently when the onboard hub driver
is initialized it first registers the platform driver, then the
USB driver. This results in a race condition when the 'attach'
work is executed, which is scheduled when the platform device
is probed. The purpose of fhe 'attach' work is to bind elegible
USB hub devices to the onboard_hub USB driver. This fails if
the work runs before the USB driver has been registered.
Register the USB driver first, then the platform driver. This
increases the chances that the onboard_hub USB devices are probed
before their corresponding platform device, which the USB driver
tries to locate in _probe(). The driver already handles this
situation and defers probing if the onboard hub platform device
doesn't exist yet.
Cc: stable(a)vger.kernel.org
Fixes: 8bc063641ceb ("usb: misc: Add onboard_usb_hub driver")
Link: https://lore.kernel.org/lkml/Y6W00vQm3jfLflUJ@hovoldconsulting.com/T/#m0d64…
Reported-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
Signed-off-by: Matthias Kaehlcke <mka(a)chromium.org>
---
(no changes since v1)
drivers/usb/misc/onboard_usb_hub.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/misc/onboard_usb_hub.c b/drivers/usb/misc/onboard_usb_hub.c
index 94e7966e199d..db0844b30bbd 100644
--- a/drivers/usb/misc/onboard_usb_hub.c
+++ b/drivers/usb/misc/onboard_usb_hub.c
@@ -433,13 +433,13 @@ static int __init onboard_hub_init(void)
{
int ret;
- ret = platform_driver_register(&onboard_hub_driver);
+ ret = usb_register_device_driver(&onboard_hub_usbdev_driver, THIS_MODULE);
if (ret)
return ret;
- ret = usb_register_device_driver(&onboard_hub_usbdev_driver, THIS_MODULE);
+ ret = platform_driver_register(&onboard_hub_driver);
if (ret)
- platform_driver_unregister(&onboard_hub_driver);
+ usb_deregister_device_driver(&onboard_hub_usbdev_driver);
return ret;
}
--
2.39.0.314.g84b9a713c41-goog
The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.
However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled here.
Add a new exception, such that we allow CRTCs to be "disabled" (with
self-refresh active) with vblank interrupts still enabled.
Cc: <stable(a)vger.kernel.org> # dependency for subsequent patch
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..7b5eddadebd5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
if (!drm_dev_has_vblank(dev))
continue;
+ /*
+ * Self-refresh is not a true "disable"; let vblank remain
+ * enabled.
+ */
+ if (new_crtc_state->self_refresh_active)
+ continue;
ret = drm_crtc_vblank_get(crtc);
WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n");
--
2.39.0.314.g84b9a713c41-goog
On Wed, Jan 11, 2023, at 07:16, Naresh Kamboju wrote:
> On Tue, 10 Jan 2023 at 23:36, Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote:
>>
>
> Results from Linaro’s test farm.
> Regressions on arm64 Raspberry Pi 4 Model B.
>
> Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
>
> While running LTP controllers cgroup_fj_stress_blkio test cases
> the Insufficient stack space to handle exception! occurred and
> followed by kernel panic on arm64 Raspberry Pi 4 Model B with
> clang-15 built kernel Image.
>
> The full boot and test log attached to this email and build and
> Kconfig links provided in the bottom of this email.
>
> I will try to reproduce this reported issue and get back to you.
I looked at the log between 6.0.18 and 6.0.19-rc1, but don't see
any arm64 or memory management patches that could result in this.
Do you know if 6.0.18 ran successfull
> [ 2893.044339] Insufficient stack space to handle exception!
> [ 2893.044351] ESR: 0x0000000096000047 -- DABT (current EL)
> [ 2893.044360] FAR: 0xffff8000128180d0
> [ 2893.044364] Task stack: [0xffff800012a18000..0xffff800012a1c000]
> [ 2893.044370] IRQ stack: [0xffff80000a798000..0xffff80000a79c000]
> [ 2893.044375] Overflow stack: [0xffff0000f77c4310..0xffff0000f77c5310]
...
> [ 2893.044413] pc : el1h_64_sync+0x0/0x68
> [ 2893.044430] lr : wp_page_copy+0xf8/0x90c
> [ 2893.044445] sp : ffff8000128180d0
...
> [ 2893.044692] el1h_64_sync+0x0/0x68
> [ 2893.044700] do_wp_page+0x4a0/0x5c8
> [ 2893.044708] handle_mm_fault+0x7fc/0x14dc
> [ 2893.044718] do_page_fault+0x29c/0x450
> [ 2893.044727] do_mem_abort+0x4c/0xf8
> [ 2893.044741] el0_da+0x48/0xa8
> [ 2893.044750] el0t_64_sync_handler+0xcc/0xf0
> [ 2893.044759] el0t_64_sync+0x18c/0x190
It claims that the stack overflow happened in do_wp_page(),
but that has a really short call chain. It would be good
to have the source line for do_wp_page+0x4a0/0x5c8 and
wp_page_copy+0xf8/0x90c to see where exactly it was.
> [ 2893.285975] WARNING: CPU: 2 PID: 315758 at kernel/sched/core.c:3119
> set_task_cpu+0x14c/0x208
....
> [ 2893.286117] CPU: 2 PID: 315758 Comm: cgroup_fj_stres Not tainted
> [ 2893.286416] arch_timer_handler_phys+0x44/0x54
> [ 2893.286427] handle_percpu_devid_irq+0x90/0x220
> [ 2893.286439] generic_handle_domain_irq+0x38/0x50
> [ 2893.286447] gic_handle_irq+0x68/0xe8
> [ 2893.286455] el1_interrupt+0x88/0xc8
> [ 2893.286464] el1h_64_irq_handler+0x18/0x24
> [ 2893.286474] el1h_64_irq+0x64/0x68
> [ 2893.286482] panic+0x2d8/0x374
This is apparently a second unrelated bug -- it still processes timer
interrupts after calling panic() and this apparently fails because
the system is already unusable.
> artifact-location:
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2K9JDtix2mHMoYRjNkBe…
file not found. I tried to get the vmlinux file to look at the disassembly
but the artifacts appear to be gone already.
Arnd
While experimenting with applying noqueue to a classful queue discipline,
we discovered a NULL pointer dereference in the __dev_queue_xmit()
path that generates a kernel OOPS:
# dev=enp0s5
# tc qdisc replace dev $dev root handle 1: htb default 1
# tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit
# tc qdisc add dev $dev parent 1:1 handle 10: noqueue
# ping -I $dev -w 1 -c 1 1.1.1.1
[ 2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2.173217] #PF: supervisor instruction fetch in kernel mode
...
[ 2.178451] Call Trace:
[ 2.178577] <TASK>
[ 2.178686] htb_enqueue+0x1c8/0x370
[ 2.178880] dev_qdisc_enqueue+0x15/0x90
[ 2.179093] __dev_queue_xmit+0x798/0xd00
[ 2.179305] ? _raw_write_lock_bh+0xe/0x30
[ 2.179522] ? __local_bh_enable_ip+0x32/0x70
[ 2.179759] ? ___neigh_create+0x610/0x840
[ 2.179968] ? eth_header+0x21/0xc0
[ 2.180144] ip_finish_output2+0x15e/0x4f0
[ 2.180348] ? dst_output+0x30/0x30
[ 2.180525] ip_push_pending_frames+0x9d/0xb0
[ 2.180739] raw_sendmsg+0x601/0xcb0
[ 2.180916] ? _raw_spin_trylock+0xe/0x50
[ 2.181112] ? _raw_spin_unlock_irqrestore+0x16/0x30
[ 2.181354] ? get_page_from_freelist+0xcd6/0xdf0
[ 2.181594] ? sock_sendmsg+0x56/0x60
[ 2.181781] sock_sendmsg+0x56/0x60
[ 2.181958] __sys_sendto+0xf7/0x160
[ 2.182139] ? handle_mm_fault+0x6e/0x1d0
[ 2.182366] ? do_user_addr_fault+0x1e1/0x660
[ 2.182627] __x64_sys_sendto+0x1b/0x30
[ 2.182881] do_syscall_64+0x38/0x90
[ 2.183085] entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
[ 2.187402] </TASK>
Previously in commit d66d6c3152e8 ("net: sched: register noqueue
qdisc"), NULL was set for the noqueue discipline on noqueue init
so that __dev_queue_xmit() falls through for the noqueue case. This
also sets a bypass of the enqueue NULL check in the
register_qdisc() function for the struct noqueue_disc_ops.
Classful queue disciplines make it past the NULL check in
__dev_queue_xmit() because the discipline is set to htb (in this case),
and then in the call to __dev_xmit_skb(), it calls into htb_enqueue()
which grabs a leaf node for a class and then calls qdisc_enqueue() by
passing in a queue discipline which assumes ->enqueue() is not set to NULL.
Fix this by not allowing classes to be assigned to the noqueue
discipline. Linux TC Notes states that classes cannot be set to
the noqueue discipline. [1] Let's enforce that here.
Links:
1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt
Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc")
Cc: stable(a)vger.kernel.org
Signed-off-by: Frederick Lawler <fred(a)cloudflare.com>
Reviewed-by: Jakub Sitnicki <jakub(a)cloudflare.com>
---
net/sched/sch_api.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2317db02c764..72d2c204d5f3 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1133,6 +1133,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
return -ENOENT;
}
+ if (new && new->ops == &noqueue_qdisc_ops) {
+ NL_SET_ERR_MSG(extack, "Cannot assign noqueue to a class");
+ return -EINVAL;
+ }
+
err = cops->graft(parent, cl, new, &old, extack);
if (err)
return err;
--
2.34.1