The Processor _PDC buffer bits notify ACPI of the OS capabilities, and
so ACPI can adjust the return of other Processor methods taking the OS
capabilities into account.
When Linux is running as a Xen dom0, it's the hypervisor the entity
in charge of processor power management, and hence Xen needs to make
sure the capabilities reported in the _PDC buffer match the
capabilities of the driver in Xen.
Introduce a small helper to sanitize the buffer when running as Xen
dom0.
Signed-off-by: Roger Pau Monné <roger.pau(a)citrix.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/include/asm/xen/hypervisor.h | 2 ++
arch/x86/xen/enlighten.c | 17 +++++++++++++++++
drivers/acpi/processor_pdc.c | 8 ++++++++
3 files changed, 27 insertions(+)
diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index b9f512138043..b4ed90ef5e68 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -63,12 +63,14 @@ void __init mem_map_via_hcall(struct boot_params *boot_params_p);
#ifdef CONFIG_XEN_DOM0
bool __init xen_processor_present(uint32_t acpi_id);
+void xen_sanitize_pdc(uint32_t *buf);
#else
static inline bool xen_processor_present(uint32_t acpi_id)
{
BUG();
return false;
}
+static inline void xen_sanitize_pdc(uint32_t *buf) { BUG(); }
#endif
#endif /* _ASM_X86_XEN_HYPERVISOR_H */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d4c44361a26c..394dd6675113 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -372,4 +372,21 @@ bool __init xen_processor_present(uint32_t acpi_id)
return false;
}
+
+void xen_sanitize_pdc(uint32_t *buf)
+{
+ struct xen_platform_op op = {
+ .cmd = XENPF_set_processor_pminfo,
+ .interface_version = XENPF_INTERFACE_VERSION,
+ .u.set_pminfo.id = -1,
+ .u.set_pminfo.type = XEN_PM_PDC,
+ };
+ int ret;
+
+ set_xen_guest_handle(op.u.set_pminfo.pdc, buf);
+ ret = HYPERVISOR_platform_op(&op);
+ if (ret)
+ pr_info("sanitize of _PDC buffer bits from Xen failed: %d\n",
+ ret);
+}
#endif
diff --git a/drivers/acpi/processor_pdc.c b/drivers/acpi/processor_pdc.c
index 18fb04523f93..58f4c208517a 100644
--- a/drivers/acpi/processor_pdc.c
+++ b/drivers/acpi/processor_pdc.c
@@ -137,6 +137,14 @@ acpi_processor_eval_pdc(acpi_handle handle, struct acpi_object_list *pdc_in)
buffer[2] &= ~(ACPI_PDC_C_C2C3_FFH | ACPI_PDC_C_C1_FFH);
}
+ if (xen_initial_domain())
+ /*
+ * When Linux is running as Xen dom0 it's the hypervisor the
+ * entity in charge of the processor power management, and so
+ * Xen needs to check the OS capabilities reported in the _PDC
+ * buffer matches what the hypervisor driver supports.
+ */
+ xen_sanitize_pdc((uint32_t *)pdc_in->pointer->buffer.pointer);
status = acpi_evaluate_object(handle, "_PDC", pdc_in, NULL);
if (ACPI_FAILURE(status))
--
2.37.3
From: Hui Li <caelli(a)tencent.com>
We have met a hang on pty device, the reader was blocking
at epoll on master side, the writer was sleeping at wait_woken
inside n_tty_write on slave side, and the write buffer on
tty_port was full, we found that the reader and writer would
never be woken again and blocked forever.
The problem was caused by a race between reader and kworker:
n_tty_read(reader): n_tty_receive_buf_common(kworker):
copy_from_read_buf()|
|room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
|room <= 0
n_tty_kick_worker() |
|ldata->no_room = true
After writing to slave device, writer wakes up kworker to flush
data on tty_port to reader, and the kworker finds that reader
has no room to store data so room <= 0 is met. At this moment,
reader consumes all the data on reader buffer and calls
n_tty_kick_worker to check ldata->no_room which is false and
reader quits reading. Then kworker sets ldata->no_room=true
and quits too.
If write buffer is not full, writer will wake kworker to flush data
again after following writes, but if write buffer is full and writer
goes to sleep, kworker will never be woken again and tty device is
blocked.
This problem can be solved with a check for read buffer size inside
n_tty_receive_buf_common, if read buffer is empty and ldata->no_room
is true, a call to n_tty_kick_worker is necessary to keep flushing
data to reader.
Cc: <stable(a)vger.kernel.org>
Fixes: 42458f41d08f ("n_tty: Ensure reader restarts worker for next reader")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Signed-off-by: Hui Li <caelli(a)tencent.com>
---
Patch changelogs between v1 and v2:
-add barrier inside n_tty_read and n_tty_receive_buf_common;
-comment why barrier is needed;
-access to ldata->no_room is changed with READ_ONCE and WRITE_ONCE;
Patch changelogs between v2 and v3:
-in function n_tty_receive_buf_common, add unlikely to check
ldata->no_room, eg: if (unlikely(ldata->no_room)), and READ_ONCE
is removed here to get locality;
-change comment for barrier to show the race condition to make
comment easier to understand;
Patch changelogs between v3 and v4:
-change subject from 'tty: fix a possible hang on tty device' to
'tty: fix hang on tty device with no_room set' to make subject
more obvious;
Patch changelogs between v4 and v5:
-name is changed from cael to caelli, li is added as the family
name and caelli is the fullname.
Patch changelogs between v5 and v6:
-change from and Signed-off-by, from 'caelli <juanfengpy(a)gmail.com>'
to 'caelli <caelli(a)tencent.com>', later one is my corporate address.
Patch changelogs between v6 and v7:
-change name from caelli to 'Hui Li', which is my name in chinese.
-the comment for barrier is improved, and a Fixes and Reviewed-by
tags is added.
drivers/tty/n_tty.c | 41 +++++++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index c8f56c9b1a1c..8c17304fffcf 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -204,8 +204,8 @@ static void n_tty_kick_worker(struct tty_struct *tty)
struct n_tty_data *ldata = tty->disc_data;
/* Did the input worker stop? Restart it */
- if (unlikely(ldata->no_room)) {
- ldata->no_room = 0;
+ if (unlikely(READ_ONCE(ldata->no_room))) {
+ WRITE_ONCE(ldata->no_room, 0);
WARN_RATELIMIT(tty->port->itty == NULL,
"scheduling with invalid itty\n");
@@ -1698,7 +1698,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
if (overflow && room < 0)
ldata->read_head--;
room = overflow;
- ldata->no_room = flow && !room;
+ WRITE_ONCE(ldata->no_room, flow && !room);
} else
overflow = 0;
@@ -1729,6 +1729,27 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
} else
n_tty_check_throttle(tty);
+ if (unlikely(ldata->no_room)) {
+ /*
+ * Barrier here is to ensure to read the latest read_tail in
+ * chars_in_buffer() and to make sure that read_tail is not loaded
+ * before ldata->no_room is set, otherwise, following race may occur:
+ * n_tty_receive_buf_common()
+ * n_tty_read()
+ * if (!chars_in_buffer(tty))->false
+ * copy_from_read_buf()
+ * read_tail=commit_head
+ * n_tty_kick_worker()
+ * if (ldata->no_room)->false
+ * ldata->no_room = 1
+ * Then both kworker and reader will fail to kick n_tty_kick_worker(),
+ * smp_mb is paired with smp_mb() in n_tty_read().
+ */
+ smp_mb();
+ if (!chars_in_buffer(tty))
+ n_tty_kick_worker(tty);
+ }
+
up_read(&tty->termios_rwsem);
return rcvd;
@@ -2282,8 +2303,25 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
if (time)
timeout = time;
}
- if (old_tail != ldata->read_tail)
+ if (old_tail != ldata->read_tail) {
+ /*
+ * Make sure no_room is not read in n_tty_kick_worker()
+ * before setting ldata->read_tail in copy_from_read_buf(),
+ * otherwise, following race may occur:
+ * n_tty_read()
+ * n_tty_receive_buf_common()
+ * n_tty_kick_worker()
+ * if(ldata->no_room)->false
+ * ldata->no_room = 1
+ * if (!chars_in_buffer(tty))->false
+ * copy_from_read_buf()
+ * read_tail=commit_head
+ * Both reader and kworker will fail to kick tty_buffer_restart_work(),
+ * smp_mb is paired with smp_mb() in n_tty_receive_buf_common().
+ */
+ smp_mb();
n_tty_kick_worker(tty);
+ }
up_read(&tty->termios_rwsem);
remove_wait_queue(&tty->read_wait, &wait);
--
2.27.0
This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195.
The statement "the hv_pci_bus_exit() call releases structures of all its
child devices" in commit d6af2ed29c7c is not true: in the path
hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
child "struct hv_pci_dev *hpdev" that is created earlier in
pci_devices_present_work() -> new_pcichild_device().
The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7,
where the old version of hv_pci_bus_exit() was used; when the commit was
rebased and merged into the upstream, people didn't notice that it's
not really necessary. The commit itself doesn't cause any issue, but it
makes hv_pci_probe() more complicated. Revert it to facilitate some
upcoming changes to hv_pci_probe().
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Acked-by: Wei Hu <weh(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
No change to the patch body.
Added Wei Hu's Acked-by.
Added Cc:stable
v3:
Added Michael's Reviewed-by.
drivers/pci/controller/pci-hyperv.c | 71 ++++++++++++++---------------
1 file changed, 34 insertions(+), 37 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 46df6d093d683..48feab095a144 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3225,8 +3225,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
struct pci_bus_d0_entry *d0_entry;
struct hv_pci_compl comp_pkt;
struct pci_packet *pkt;
+ bool retry = true;
int ret;
+enter_d0_retry:
/*
* Tell the host that the bus is ready to use, and moved into the
* powered-on state. This includes telling the host which region
@@ -3253,6 +3255,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
if (ret)
goto exit;
+ /*
+ * In certain case (Kdump) the pci device of interest was
+ * not cleanly shut down and resource is still held on host
+ * side, the host could return invalid device status.
+ * We need to explicitly request host to release the resource
+ * and try to enter D0 again.
+ */
+ if (comp_pkt.completion_status < 0 && retry) {
+ retry = false;
+
+ dev_err(&hdev->device, "Retrying D0 Entry\n");
+
+ /*
+ * Hv_pci_bus_exit() calls hv_send_resource_released()
+ * to free up resources of its child devices.
+ * In the kdump kernel we need to set the
+ * wslot_res_allocated to 255 so it scans all child
+ * devices to release resources allocated in the
+ * normal kernel before panic happened.
+ */
+ hbus->wslot_res_allocated = 255;
+
+ ret = hv_pci_bus_exit(hdev, true);
+
+ if (ret == 0) {
+ kfree(pkt);
+ goto enter_d0_retry;
+ }
+ dev_err(&hdev->device,
+ "Retrying D0 failed with ret %d\n", ret);
+ }
+
if (comp_pkt.completion_status < 0) {
dev_err(&hdev->device,
"PCI Pass-through VSP failed D0 Entry with status %x\n",
@@ -3493,7 +3527,6 @@ static int hv_pci_probe(struct hv_device *hdev,
struct hv_pcibus_device *hbus;
u16 dom_req, dom;
char *name;
- bool enter_d0_retry = true;
int ret;
/*
@@ -3633,47 +3666,11 @@ static int hv_pci_probe(struct hv_device *hdev,
if (ret)
goto free_fwnode;
-retry:
ret = hv_pci_query_relations(hdev);
if (ret)
goto free_irq_domain;
ret = hv_pci_enter_d0(hdev);
- /*
- * In certain case (Kdump) the pci device of interest was
- * not cleanly shut down and resource is still held on host
- * side, the host could return invalid device status.
- * We need to explicitly request host to release the resource
- * and try to enter D0 again.
- * Since the hv_pci_bus_exit() call releases structures
- * of all its child devices, we need to start the retry from
- * hv_pci_query_relations() call, requesting host to send
- * the synchronous child device relations message before this
- * information is needed in hv_send_resources_allocated()
- * call later.
- */
- if (ret == -EPROTO && enter_d0_retry) {
- enter_d0_retry = false;
-
- dev_err(&hdev->device, "Retrying D0 Entry\n");
-
- /*
- * Hv_pci_bus_exit() calls hv_send_resources_released()
- * to free up resources of its child devices.
- * In the kdump kernel we need to set the
- * wslot_res_allocated to 255 so it scans all child
- * devices to release resources allocated in the
- * normal kernel before panic happened.
- */
- hbus->wslot_res_allocated = 255;
- ret = hv_pci_bus_exit(hdev, true);
-
- if (ret == 0)
- goto retry;
-
- dev_err(&hdev->device,
- "Retrying D0 failed with ret %d\n", ret);
- }
if (ret)
goto free_irq_domain;
--
2.25.1
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
Removed the "debug code".
No change to the patch body.
Added Cc:stable
v3:
Added Michael's Reviewed-by.
drivers/pci/controller/pci-hyperv.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index b82c7cde19e66..1b11cf7391933 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -643,6 +643,11 @@ static void hv_arch_irq_unmask(struct irq_data *data)
pbus = pdev->bus;
hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata);
int_desc = data->chip_data;
+ if (!int_desc) {
+ dev_warn(&hbus->hdev->device, "%s() can not unmask irq %u\n",
+ __func__, data->irq);
+ return;
+ }
spin_lock_irqsave(&hbus->retarget_msi_interrupt_lock, flags);
@@ -1911,12 +1916,6 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
hv_pci_onchannelcallback(hbus);
spin_unlock_irqrestore(&channel->sched_lock, flags);
- if (hpdev->state == hv_pcichild_ejecting) {
- dev_err_once(&hbus->hdev->device,
- "the device is being ejected\n");
- goto enable_tasklet;
- }
-
udelay(100);
}
--
2.25.1
Fix the longstanding race between hv_pci_query_relations() and
survey_child_resources() by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Cc: stable(a)vger.kernel.org
---
v2:
Removed the "debug code".
No change to the patch body.
Added Cc:stable
v3:
Added Michael's Reviewed-by.
drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index f33370b756283..b82c7cde19e66 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device *hdev)
if (!ret)
ret = wait_for_response(hdev, &comp);
+ /*
+ * In the case of fast device addition/removal, it's possible that
+ * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we
+ * already got a PCI_BUS_RELATIONS* message from the host and the
+ * channel callback already scheduled a work to hbus->wq, which can be
+ * running survey_child_resources() -> complete(&hbus->survey_event),
+ * even after hv_pci_query_relations() exits and the stack variable
+ * 'comp' is no longer valid. This can cause a strange hang issue
+ * or sometimes a page fault. Flush hbus->wq before we exit from
+ * hv_pci_query_relations() to avoid the issues.
+ */
+ flush_workqueue(hbus->wq);
+
return ret;
}
--
2.25.1
Hi there, hello,
Sometimes when I suspend (by closing the lid, less often - by pressing
Fn+F1 (sleep key combo)) or poweroff my laptop (both by pressing powerit
button and running "loginctl poweroff"), it goes in such a state when it
doesn't respond to opening/closing the lid, power button nor
Ctrl+Alt+Del, but, unlike in sleep mode, the fan is rotating and the
"awake status" LED is on. I checked /var/log/kern.log, but it didn't
report suspend at that moment at all: went straight from [UFW BLOCK] to
"Microcode updated" on force reboot (marked with an arrow):
Apr 13 10:40:32 bong kernel: asus_wmi: Unknown key code 0xcf
Apr 13 10:44:05 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:47:45 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:47:46 bong kernel: ICMPv6: NA: /*router*/ advertised our address /*ipv6*/ on wlan0!
Apr 13 10:47:48 bong last message buffered 2 times
-> Apr 13 10:49:11 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:52:34 bong kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-12
Apr 13 10:52:34 bong kernel: Linux version 6.1.23-bong+ (acid@bong) (gcc (Gentoo Hardened 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.39 p5) 2.39.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 11 15:21:57 EEST 2023
Apr 13 10:52:34 bong kernel: Command line: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\x5cinitramfs-6.1.23-bong+.img
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Normally it starts like this (taken from dmesg to sync with elogind messages)
[ 7835.869228] elogind-daemon[2033]: Lid closed.
[ 7835.872875] elogind-daemon[2033]: Suspending...
[ 7835.873955] elogind-daemon[2033]: Suspending system...
[ 7835.873970] PM: suspend entry (deep)
[ 7835.902814] Filesystems sync: 0.028 seconds
[ 7835.920362] Freezing user space processes
[ 7835.923030] Freezing user space processes completed (elapsed 0.002 seconds)
[ 7835.923046] OOM killer disabled.
[ 7835.923049] Freezing remaining freezable tasks
[ 7835.924445] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 7835.924624] printk: Suspending console(s) (use no_console_suspend to debug)
The issue appeared when I was using pf-kernel with genpatches and
updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
-> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
moved to vanilla sources and started following the 6.1.y branch when it
was declared LTS. And the issue was present on all of them.
The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.
I'd like to bisect it, but it could take ages for a couple of reasons:
1) I don't know exact patterns it follows. One of the scenarios I've
noticed was this one (sorry if too ridiculous):
- put the laptop on the nearby couch and simultaneously close
the lid; the loose charger jack might disconnect;
- lay the mouse upside down (so it doesn't wake up when I
reconnect the charger),
but it's not a 100% guarantee of the bug and, as I said earlier, the
laptop also misbehaves on shutdown.
2) The issue happens rarely, once in a few days (sometimes up to a week;
I haven't measured it precisely back then).
Hardware: https://tilde.cafe/u/acidbong/kernel/lspci (`lspci -vvnn`)
Config (latest vanilla): https://git.sr.ht/~acid-bong/kernel/tree/806e6639da610952798e1b5d8c0d700062…
Built with KCFLAGS="-march=native"
Isolated cmdline: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\initramfs-6.1.23-bong+.img
# regzbot introduced v6.1.3..v6.1.6
---
Regards,
~acidbong