From: Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
[ Upstream commit 833cd3e9ad8360785b6c23c82dd3856df00732d9 ]
Sometimes the system [1] hangs on x86 I/O machine checks. However, the
expected behavior is to reboot the system, as the machine check handler
ultimately triggers a panic(), initiating a reboot in the last step.
The root cause is that sometimes the panic() is blocked when
drm_fb_helper_damage() invoking schedule_work() to flush the frame buffer.
This occurs during the process of flushing all messages to the frame
buffer driver as shown in the following call trace:
Machine check occurs [2]:
panic()
console_flush_on_panic()
console_flush_all()
console_emit_next_record()
con->write()
vt_console_print()
hide_cursor()
vc->vc_sw->con_cursor()
fbcon_cursor()
ops->cursor()
bit_cursor()
soft_cursor()
info->fbops->fb_imageblit()
drm_fbdev_generic_defio_imageblit()
drm_fb_helper_damage_area()
drm_fb_helper_damage()
schedule_work() // <--- blocked here
...
emergency_restart() // wasn't invoked, so no reboot.
During panic(), except the panic CPU, all the other CPUs are stopped.
In schedule_work(), the panic CPU requires the lock of worker_pool to
queue the work on that pool, while the lock may have been token by some
other stopped CPU. So schedule_work() is blocked.
Additionally, during a panic(), since there is no opportunity to execute
any scheduled work, it's safe to fix this issue by skipping schedule_work()
on 'oops_in_progress' in drm_fb_helper_damage().
[1] Enable the kernel option CONFIG_FRAMEBUFFER_CONSOLE,
CONFIG_DRM_FBDEV_EMULATION, and boot with the 'console=tty0'
kernel command line parameter.
[2] Set 'panic_timeout' to a non-zero value before calling panic().
Acked-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reported-by: Yudong Wang <yudong.wang(a)intel.com>
Tested-by: Yudong Wang <yudong.wang(a)intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703141737.75378-1-qiuxu.…
Signed-off-by: Maarten Lankhorst,,, <maarten.lankhorst(a)linux.intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_fb_helper.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 117237d3528bd..618b045230336 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -631,6 +631,17 @@ static void drm_fb_helper_add_damage_clip(struct drm_fb_helper *helper, u32 x, u
static void drm_fb_helper_damage(struct drm_fb_helper *helper, u32 x, u32 y,
u32 width, u32 height)
{
+ /*
+ * This function may be invoked by panic() to flush the frame
+ * buffer, where all CPUs except the panic CPU are stopped.
+ * During the following schedule_work(), the panic CPU needs
+ * the worker_pool lock, which might be held by a stopped CPU,
+ * causing schedule_work() and panic() to block. Return early on
+ * oops_in_progress to prevent this blocking.
+ */
+ if (oops_in_progress)
+ return;
+
drm_fb_helper_add_damage_clip(helper, x, y, width, height);
schedule_work(&helper->damage_work);
--
2.43.0
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 06c03c8edce333b9ad9c6b207d93d3a5ae7c10c0
Gitweb: https://git.kernel.org/tip/06c03c8edce333b9ad9c6b207d93d3a5ae7c10c0
Author: Justin Stitt <justinstitt(a)google.com>
AuthorDate: Fri, 17 May 2024 00:47:10
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 05 Aug 2024 16:14:14 +02:00
ntp: Safeguard against time_constant overflow
Using syzkaller with the recently reintroduced signed integer overflow
sanitizer produces this UBSAN report:
UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:738:18
9223372036854775806 + 4 cannot be represented in type 'long'
Call Trace:
handle_overflow+0x171/0x1b0
__do_adjtimex+0x1236/0x1440
do_adjtimex+0x2be/0x740
The user supplied time_constant value is incremented by four and then
clamped to the operating range.
Before commit eea83d896e31 ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping after incrementing which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 4' operation.
The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.
Similar to the fixups for time_maxerror and time_esterror, clamp the user
space supplied value to the operating range.
[ tglx: Switch to clamping ]
Fixes: eea83d896e31 ("ntp: NTP4 user space bits update")
Signed-off-by: Justin Stitt <justinstitt(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Miroslav Lichvar <mlichvar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20240517-b4-sio-ntp-c-v2-1-f3a80096f36f@google.…
Closes: https://github.com/KSPP/linux/issues/352
---
kernel/time/ntp.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 502e1e5..8d2dd21 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -733,11 +733,10 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc,
time_esterror = clamp(txc->esterror, 0, NTP_PHASE_LIMIT);
if (txc->modes & ADJ_TIMECONST) {
- time_constant = txc->constant;
+ time_constant = clamp(txc->constant, 0, MAXTC);
if (!(time_status & STA_NANO))
time_constant += 4;
- time_constant = min(time_constant, (long)MAXTC);
- time_constant = max(time_constant, 0l);
+ time_constant = clamp(time_constant, 0, MAXTC);
}
if (txc->modes & ADJ_TAI &&
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 5916be8a53de6401871bdd953f6c60237b47d6d3
Gitweb: https://git.kernel.org/tip/5916be8a53de6401871bdd953f6c60237b47d6d3
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Sat, 03 Aug 2024 17:07:51 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 05 Aug 2024 16:14:14 +02:00
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
The addition of the bases argument to clock_was_set() fixed up all call
sites correctly except for do_adjtimex(). This uses CLOCK_REALTIME
instead of CLOCK_SET_WALL as argument. CLOCK_REALTIME is 0.
As a result the effect of that clock_was_set() notification is incomplete
and might result in timers expiring late because the hrtimer code does
not re-evaluate the affected clock bases.
Use CLOCK_SET_WALL instead of CLOCK_REALTIME to tell the hrtimers code
which clock bases need to be re-evaluated.
Fixes: 17a1b8826b45 ("hrtimer: Add bases argument to clock_was_set()")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/877ccx7igo.ffs@tglx
---
kernel/time/timekeeping.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2fa87dc..5391e41 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2606,7 +2606,7 @@ int do_adjtimex(struct __kernel_timex *txc)
clock_set |= timekeeping_advance(TK_ADV_FREQ);
if (clock_set)
- clock_was_set(CLOCK_REALTIME);
+ clock_was_set(CLOCK_SET_WALL);
ntp_notify_cmos_timer();
RT tasks do not have any timerslack, as this induces jitter. By
that, the timer slack is already ignored in the nanosleep family and
schedule_hrtimeout_range() (fixed in 0c52310f2600).
The hrtimer_start_range_ns function is indirectly used by glibc-2.33+
for timed waits on condition variables. These are sometimes used in
RT applications for realtime queue processing. At least on the
combination of kernel 5.10 and glibc-2.31, the timed wait on condition
variables in rt tasks was precise (no slack), however glibc-2.33
changed the internal wait implementation, exposing this oversight.
Make the timer slack consistent across all hrtimer programming code,
by ignoring the timerslack for tasks with rt policies also in the last
remaining location in hrtimer_start_range_ns().
Cc: stable(a)vger.kernel.org
Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
---
kernel/time/hrtimer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index b8ee320208d4..e8b44e7c281f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1274,7 +1274,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* hrtimer_start_range_ns - (re)start an hrtimer
* @timer: the timer to be added
* @tim: expiry time
- * @delta_ns: "slack" range for the timer
+ * @delta_ns: "slack" range for the timer for SCHED_OTHER tasks
* @mode: timer mode: absolute (HRTIMER_MODE_ABS) or
* relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED);
* softirq based mode is considered for debug purpose only!
@@ -1299,6 +1299,10 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
base = lock_hrtimer_base(timer, &flags);
+ /* rt-tasks do not have a timer slack for obvious reasons */
+ if (task_is_realtime(current))
+ delta_ns = 0;
+
if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
hrtimer_reprogram(timer, true);
--
2.39.2
RT tasks do not have any timerslack, as this induces jitter. By
that, the timer slack is already ignored in the nanosleep family and
schedule_hrtimeout_range() (fixed in 0c52310f2600).
The hrtimer_start_range_ns function is indirectly used by glibc-2.33+
for timed waits on condition variables. These are sometimes used in
RT applications for realtime queue processing. At least on the
combination of kernel 5.10 and glibc-2.31, the timed wait on condition
variables in rt tasks was precise (no slack), however glibc-2.33
changed the internal wait implementation, exposing the kernel bug.
This patch makes the timer slack consistent across all hrtimer
programming code, by ignoring the timerslack for rt tasks also in the
last remaining location in hrtimer_start_range_ns().
Similar to 0c52310f2600, this fix should be backported as well.
Cc: stable(a)vger.kernel.org
Signed-off-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
---
kernel/time/hrtimer.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 2b1469f61d9c..1b26e095114d 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1274,7 +1274,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* hrtimer_start_range_ns - (re)start an hrtimer
* @timer: the timer to be added
* @tim: expiry time
- * @delta_ns: "slack" range for the timer
+ * @delta_ns: "slack" range for the timer for SCHED_OTHER tasks
* @mode: timer mode: absolute (HRTIMER_MODE_ABS) or
* relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED);
* softirq based mode is considered for debug purpose only!
@@ -1299,6 +1299,10 @@ void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
base = lock_hrtimer_base(timer, &flags);
+ /* rt-tasks do not have a timer slack for obvious reasons */
+ if (rt_task(current))
+ delta_ns = 0;
+
if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
hrtimer_reprogram(timer, true);
--
2.39.2
Greg,
Kernel CI is reporting a build failure on v6.6-rt:
https://grafana.kernelci.org/d/build/build?orgId=1&var-datasource=default&v…
It's in arch/riscv/kernel/cpufeature.c where a return statement in
check_unaligned_access() doesn't have a value (and
check_unaligned_access returns int).
Is 6.6 stable supporting RiscV? If so then we either have to fix that
return, or backport the refactor of arch/riscv/kernel/cpufeature.c
(f413aae96cda0). If it's not then who should I talk to about turning
off riscv CI builds for v6.6-rt?
Thanks,
Clark
On Mon, Aug 5, 2024 at 2:10 PM Blake Sperling <breakingspell(a)gmail.com> wrote:
>
> Hello, I noticed a regression from v.6.6.43 to v6.6.44 caused by this commit.
>
> When using virtio NIC with a QEMU/KVM Windows guest, network traffic from the VM stalls in the outbound (upload) direction.This affects remote access and file shares most noticeably, and the inbound (download) direction does not have the issue.
>
> iperf3 will show consistent results, 0 bytes/sec when initiating a test within the guest to a server on LAN, and reverse will be full speed. Nothing out of the ordinary in host dmesg or guest Event Viewer while the behavior is being displayed.
>
> Crucially, this only seems to affect Windows guests, Ubuntu guest with the same NIC configuration tests fine both directions.
> I wonder if NetKVM guest drivers may be related, the current latest version of the drivers (v248) did not make a difference, but it is several months old.
>
> Let me know if there are any further tests or info I can provide, thanks!
Does Willem's patch fix the issue?
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=8…
Thanks
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: ipu-bridge: fix ipu6 Kconfig dependencies
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Fri Jul 19 11:53:50 2024 +0200
Commit 4670c8c3fb04 ("media: ipu-bridge: Fix Kconfig dependencies") changed
how IPU_BRIDGE dependencies are handled for all drivers, but the IPU6
variant was added the old way, which causes build time warnings when I2C is
turned off:
WARNING: unmet direct dependencies detected for IPU_BRIDGE
Depends on [n]: MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && I2C [=n]
Selected by [m]:
- VIDEO_INTEL_IPU6 [=m] && MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && VIDEO_DEV [=m] && X86 [=y] && X86_64 [=y] && HAS_DMA [=y]
To make it consistent with the other IPU drivers as well as avoid this
warning, change the 'select' into 'depends on'.
Fixes: c70281cc83d6 ("media: intel/ipu6: add Kconfig and Makefile")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
[Sakari Ailus: Alternatively depend on !IPU_BRIDGE.]
Cc: stable(a)vger.kernel.org # for v6.10
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/pci/intel/ipu6/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/pci/intel/ipu6/Kconfig b/drivers/media/pci/intel/ipu6/Kconfig
index 154343080c82..b7ab24b89836 100644
--- a/drivers/media/pci/intel/ipu6/Kconfig
+++ b/drivers/media/pci/intel/ipu6/Kconfig
@@ -3,13 +3,13 @@ config VIDEO_INTEL_IPU6
depends on ACPI || COMPILE_TEST
depends on VIDEO_DEV
depends on X86 && X86_64 && HAS_DMA
+ depends on IPU_BRIDGE || !IPU_BRIDGE
select DMA_OPS
select IOMMU_IOVA
select VIDEO_V4L2_SUBDEV_API
select MEDIA_CONTROLLER
select VIDEOBUF2_DMA_CONTIG
select V4L2_FWNODE
- select IPU_BRIDGE
help
This is the 6th Gen Intel Image Processing Unit, found in Intel SoCs
and used for capturing images and video from camera sensors.
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: uvcvideo: Fix custom control mapping probing
Author: Ricardo Ribalda <ribalda(a)chromium.org>
Date: Mon Jul 22 11:52:26 2024 +0000
Custom control mapping introduced a bug, where the filter function was
applied to every single control.
Fix it so it is only applied to the matching controls.
The following dmesg errors during probe are now fixed:
usb 1-5: Found UVC 1.00 device Integrated_Webcam_HD (0c45:670c)
usb 1-5: Failed to query (GET_CUR) UVC control 2 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 3 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 6 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 7 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 8 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 9 on unit 2: -75 (exp. 1).
usb 1-5: Failed to query (GET_CUR) UVC control 10 on unit 2: -75 (exp. 1).
Reported-by: Paul Menzel <pmenzel(a)molgen.mpg.de>
Closes: https://lore.kernel.org/linux-media/518cd6b4-68a8-4895-b8fc-97d4dae1ddc4@mo…
Cc: stable(a)vger.kernel.org
Fixes: 8f4362a8d42b ("media: uvcvideo: Allow custom control mapping")
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Link: https://lore.kernel.org/r/20240722-fix-filter-mapping-v2-1-7ed5bb6c1185@chr…
Tested-by: Paul Menzel <pmenzel(a)molgen.mpg.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
drivers/media/usb/uvc/uvc_ctrl.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
---
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index 0136df5732ba..4fe26e82e3d1 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -2680,6 +2680,10 @@ static void uvc_ctrl_init_ctrl(struct uvc_video_chain *chain,
for (i = 0; i < ARRAY_SIZE(uvc_ctrl_mappings); ++i) {
const struct uvc_control_mapping *mapping = &uvc_ctrl_mappings[i];
+ if (!uvc_entity_match_guid(ctrl->entity, mapping->entity) ||
+ ctrl->info.selector != mapping->selector)
+ continue;
+
/* Let the device provide a custom mapping. */
if (mapping->filter_mapping) {
mapping = mapping->filter_mapping(chain, ctrl);
@@ -2687,9 +2691,7 @@ static void uvc_ctrl_init_ctrl(struct uvc_video_chain *chain,
continue;
}
- if (uvc_entity_match_guid(ctrl->entity, mapping->entity) &&
- ctrl->info.selector == mapping->selector)
- __uvc_ctrl_add_mapping(chain, ctrl, mapping);
+ __uvc_ctrl_add_mapping(chain, ctrl, mapping);
}
}