While refactoring the way the ITSs are probed, the handling of
quirks applicable to ACPI-based platforms was lost. As a result,
systems such as HIP07 lose their GICv4 functionnality, and some
other may even fail to boot, unless they are configured to boot
with DT.
Move the enabling of quirks into its_probe_one(), making it
common to all firmware implementations.
Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reviewed-by: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
Reviewed-by: Zenghui Yu <yuzenghui(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-gic-v3-its.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fec1b58470df..250b4562f308 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -5091,6 +5091,8 @@ static int __init its_probe_one(struct its_node *its)
u32 ctlr;
int err;
+ its_enable_quirks(its);
+
if (is_v4(its)) {
if (!(its->typer & GITS_TYPER_VMOVP)) {
err = its_compute_its_list_map(its);
@@ -5442,7 +5444,6 @@ static int __init its_of_probe(struct device_node *node)
if (!its)
return -ENOMEM;
- its_enable_quirks(its);
err = its_probe_one(its);
if (err) {
its_node_destroy(its);
--
2.39.2
When updating the affinity of a VPE, we currently skip the VMOVP
command if the two CPUs are part of the same VPE affinity.
But this is wrong, as the doorbell corresponding to this VPE
is still delivered on the 'old' CPU, which screws up the balancing.
Furthermore, offlining that 'old' CPU results in doorbell interrupts
generated for this VPE being discarded.
The harsh reality is that we cannot easily elide VMOVP when
a set_affinity request occurs. It needs to be obeyed, and if
an optimisation is to be made, it is at the point where the affinity
change request is made (such as in KVM).
Drop the VMOVP elision altogether, and only use the vpe_table_mask
to try and stay within the same ITS affinity group if at all possible.
Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
Reported-by: Kunkun Jiang <jiangkunkun(a)huawei.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-gic-v3-its.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 250b4562f308..53abd4779914 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
bool force)
{
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
- int from, cpu = cpumask_first(mask_val);
+ struct cpumask common, *table_mask;
unsigned long flags;
+ int from, cpu;
/*
* Changing affinity is mega expensive, so let's be as lazy as
@@ -3843,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
* taken on any vLPI handling path that evaluates vpe->col_idx.
*/
from = vpe_to_cpuid_lock(vpe, &flags);
- if (from == cpu)
- goto out;
-
- vpe->col_idx = cpu;
+ table_mask = gic_data_rdist_cpu(from)->vpe_table_mask;
/*
- * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
- * is sharing its VPE table with the current one.
+ * If we are offered another CPU in the same GICv4.1 ITS
+ * affinity, pick this one. Otherwise, any CPU will do.
*/
- if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
- cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
+ if (table_mask && cpumask_and(&common, mask_val, table_mask))
+ cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
+ else
+ cpu = cpumask_first(mask_val);
+
+ if (from == cpu)
goto out;
+ vpe->col_idx = cpu;
+
its_send_vmovp(vpe);
its_vpe_db_proxy_move(vpe, from, cpu);
--
2.39.2
Syzkaller reports warning in ext4_set_page_dirty() in 5.10 and 5.15
stable releases. It happens because invalidate_inode_page() frees pages
that are needed for the system. To fix this we need to add additional
checks to the function. page_mapped() checks if a page exists in the
page tables, but this is not enough. The page can be used in other places:
https://elixir.bootlin.com/linux/v6.8-rc1/source/include/linux/page_ref.h#L…
Kernel outputs an error line related to direct I/O:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ab52dac80000
The problem can be fixed in 5.10 and 5.15 stable releases by the
following patch.
The patch replaces page_mapped() call with check that finds additional
references to the page excluding page cache and filesystem private data.
If additional references exist, the page cannot be freed.
This version does not include the first patch from the first version.
The problem can be fixed without it.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Link: https://syzkaller.appspot.com/bug?extid=02f21431b65c214aa1d6
Matthew Wilcox (Oracle) (1):
mm/truncate: Replace page_mapped() call in invalidate_inode_page()
mm/truncate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--
2.34.1
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.
This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
channel:
CPU0 CPU1
shutdown_pirq {
xen_evtchn_close(e)
__startup_pirq {
EVTCHNOP_bind_pirq
-> returns just freed evtchn e
set_evtchn_to_irq(e, irq)
}
xen_irq_info_cleanup() {
set_evtchn_to_irq(e, -1)
}
}
Assume here event channel e refers here to the same event channel
number.
After this race the evtchn_to_irq mapping for e is invalid (-1).
- __startup_pirq races with __unbind_from_irq in a similar way. Because
__startup_pirq doesn't take irq_mapping_update_lock it can grab the
evtchn that __unbind_from_irq is currently freeing and cleaning up. In
this case even though the event channel is allocated, its mapping can
be unset in evtchn_to_irq.
The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated it's
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.
On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq are running potentially in parallel.
------------[ cut here ]------------
blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
kernel BUG at drivers/xen/events/events_base.c:499!
invalid opcode: 0000 [#1] SMP PTI
CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 #1
Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
Workqueue: nvme-reset-wq nvme_reset_work
RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? show_trace_log_lvl+0x1c1/0x2d9
? show_trace_log_lvl+0x1c1/0x2d9
? set_affinity_irq+0xdc/0x1c0
? __die_body.cold+0x8/0xd
? die+0x2b/0x50
? do_trap+0x90/0x110
? bind_evtchn_to_cpu+0xdf/0xf0
? do_error_trap+0x65/0x80
? bind_evtchn_to_cpu+0xdf/0xf0
? exc_invalid_op+0x4e/0x70
? bind_evtchn_to_cpu+0xdf/0xf0
? asm_exc_invalid_op+0x12/0x20
? bind_evtchn_to_cpu+0xdf/0xf0
? bind_evtchn_to_cpu+0xc5/0xf0
set_affinity_irq+0xdc/0x1c0
irq_do_set_affinity+0x1d7/0x1f0
irq_setup_affinity+0xd6/0x1a0
irq_startup+0x8a/0xf0
__setup_irq+0x639/0x6d0
? nvme_suspend+0x150/0x150
request_threaded_irq+0x10c/0x180
? nvme_suspend+0x150/0x150
pci_request_irq+0xa8/0xf0
? __blk_mq_free_request+0x74/0xa0
queue_request_irq+0x6f/0x80
nvme_create_queue+0x1af/0x200
nvme_create_io_queues+0xbd/0xf0
nvme_setup_io_queues+0x246/0x320
? nvme_irq_check+0x30/0x30
nvme_reset_work+0x1c8/0x400
process_one_work+0x1b0/0x350
worker_thread+0x49/0x310
? process_one_work+0x350/0x350
kthread+0x11b/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
Modules linked in:
---[ end trace a11715de1eee1873 ]---
Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
Cc: stable(a)vger.kernel.org
Co-debugged-by: Andrew Panyakin <apanyaki(a)amazon.com>
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
---
drivers/xen/events/events_base.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b8cfea7812d6..3b9f080109d7 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_data *data)
return;
do_mask(info, EVT_MASK_REASON_EXPLICIT);
- xen_evtchn_close(evtchn);
xen_irq_info_cleanup(info);
+ xen_evtchn_close(evtchn);
}
static void enable_pirq(struct irq_data *data)
@@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
{
evtchn_port_t evtchn;
+ bool close_evtchn = false;
if (!info) {
xen_irq_free_desc(irq);
@@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
struct xenbus_device *dev;
if (!info->is_static)
- xen_evtchn_close(evtchn);
+ close_evtchn = true;
switch (info->type) {
case IRQT_VIRQ:
@@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
}
xen_irq_info_cleanup(info);
+
+ if (close_evtchn)
+ xen_evtchn_close(evtchn);
}
xen_free_irq(info);
--
2.40.1
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: b0344d6854d25a8b3b901c778b1728885dd99007
Gitweb: https://git.kernel.org/tip/b0344d6854d25a8b3b901c778b1728885dd99007
Author: Doug Berger <opendmb(a)gmail.com>
AuthorDate: Fri, 09 Feb 2024 17:24:49 -08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 13 Feb 2024 09:33:31 +01:00
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
It was observed on Broadcom devices that use GIC v3 architecture L1
interrupt controllers as the parent of brcmstb-l2 interrupt controllers
that the deactivation of the parent interrupt could happen before the
brcmstb-l2 deasserted its output. This would lead the GIC to reactivate the
interrupt only to find that no L2 interrupt was pending. The result was a
spurious interrupt invoking handle_bad_irq() with its associated
messaging. While this did not create a functional problem it is a waste of
cycles.
The hazard exists because the memory mapped bus writes to the brcmstb-l2
registers are buffered and the GIC v3 architecture uses a very efficient
system register write to deactivate the interrupt.
Add a write memory barrier prior to invoking chained_irq_exit() to
introduce a dsb(st) on those systems to ensure the system register write
cannot be executed until the memory mapped writes are visible to the
system.
[ florian: Added Fixes tag ]
Fixes: 7f646e92766e ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller")
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Signed-off-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Acked-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240210012449.3009125-1-florian.fainelli@broadco…
---
drivers/irqchip/irq-brcmstb-l2.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-brcmstb-l2.c b/drivers/irqchip/irq-brcmstb-l2.c
index 5559c94..2b0b317 100644
--- a/drivers/irqchip/irq-brcmstb-l2.c
+++ b/drivers/irqchip/irq-brcmstb-l2.c
@@ -2,7 +2,7 @@
/*
* Generic Broadcom Set Top Box Level 2 Interrupt controller driver
*
- * Copyright (C) 2014-2017 Broadcom
+ * Copyright (C) 2014-2024 Broadcom
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -112,6 +112,9 @@ static void brcmstb_l2_intc_irq_handle(struct irq_desc *desc)
generic_handle_domain_irq(b->domain, irq);
} while (status);
out:
+ /* Don't ack parent before all device writes are done */
+ wmb();
+
chained_irq_exit(chip, desc);
}
[Cc: stable]
On Mon, 12 Feb 2024, Ulrich Hecht wrote:
>
> > On 02/08/2024 11:51 PM CET Finn Thain <fthain(a)linux-m68k.org> wrote:
> > Ulrich, I imagine that you would normally receive fixes via the
> > corresponding -stable trees. If Michael's series went into
> > stable/linux-4.19.y you could cherry-pick from there for your v4.4.y tree
> > and maybe avoid some merge conflicts that way.
>
> That would work for me.
>
OK. Here's the relevant commit. It fixes bd6f56a75bb2 which first appeared
in v2.6.38-rc1. I believe this can be cherry-picked without any conflicts.
50e43a573344 m68k: Update ->thread.esp0 before calling syscall_trace() in ret_from_signal
From: Zhang Rui <rui.zhang(a)intel.com>
[ Upstream commit 34cf8c657cf0365791cdc658ddbca9cc907726ce ]
Currently, coretemp driver supports only 128 cores per package.
This loses some core temperature information on systems that have more
than 128 cores per package.
[ 58.685033] coretemp coretemp.0: Adding Core 128 failed
[ 58.692009] coretemp coretemp.0: Adding Core 129 failed
...
Enlarge the limitation to 512 because there are platforms with more than
256 cores per package.
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hwmon/coretemp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 33371f7a4c0f..e08bb28ec427 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -53,7 +53,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define PKG_SYSFS_ATTR_NO 1 /* Sysfs attribute for package temp */
#define BASE_SYSFS_ATTR_NO 2 /* Sysfs Base attr no for coretemp */
-#define NUM_REAL_CORES 128 /* Number of Real cores per cpu */
+#define NUM_REAL_CORES 512 /* Number of Real cores per cpu */
#define CORETEMP_NAME_LENGTH 28 /* String Length of attrs */
#define MAX_CORE_ATTRS 4 /* Maximum no of basic attrs */
#define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
--
2.43.0
From: Zhang Rui <rui.zhang(a)intel.com>
[ Upstream commit 34cf8c657cf0365791cdc658ddbca9cc907726ce ]
Currently, coretemp driver supports only 128 cores per package.
This loses some core temperature information on systems that have more
than 128 cores per package.
[ 58.685033] coretemp coretemp.0: Adding Core 128 failed
[ 58.692009] coretemp coretemp.0: Adding Core 129 failed
...
Enlarge the limitation to 512 because there are platforms with more than
256 cores per package.
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hwmon/coretemp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 0eabad344961..b8d5087da65b 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -40,7 +40,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define PKG_SYSFS_ATTR_NO 1 /* Sysfs attribute for package temp */
#define BASE_SYSFS_ATTR_NO 2 /* Sysfs Base attr no for coretemp */
-#define NUM_REAL_CORES 128 /* Number of Real cores per cpu */
+#define NUM_REAL_CORES 512 /* Number of Real cores per cpu */
#define CORETEMP_NAME_LENGTH 28 /* String Length of attrs */
#define MAX_CORE_ATTRS 4 /* Maximum no of basic attrs */
#define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
--
2.43.0