From: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
---
Changes in v2:
erase->opcode -> erase->size
drivers/mtd/spi-nor/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 88dd090..183ea9d 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1400,6 +1400,8 @@ spi_nor_find_best_erase_type(const struct spi_nor_erase_map *map,
continue;
erase = &map->erase_type[i];
+ if (!erase->size)
+ continue;
/* Alignment is not mandatory for overlaid regions */
if (region->offset & SNOR_OVERLAID_REGION &&
--
2.10.2
This reverts commit 2dc016599cfa9672a147528ca26d70c3654a5423.
Users are reporting regressions in regulatory domain detection and
channel availability.
The problem this was trying to resolve was fixed in firmware anyway:
QCA6174 hw3.0: sdio-4.4.1: add firmware.bin_WLAN.RMH.4.4.1-00042
https://github.com/kvalo/ath10k-firmware/commit/4d382787f0efa77dba40394e0bc…
Link: https://bbs.archlinux.org/viewtopic.php?id=254535
Link: http://lists.infradead.org/pipermail/ath10k/2020-April/014871.html
Link: http://lists.infradead.org/pipermail/ath10k/2020-May/015152.html
Fixes: 2dc016599cfa ("ath: add support for special 0x0 regulatory domain")
Cc: <stable(a)vger.kernel.org>
Cc: Wen Gong <wgong(a)codeaurora.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/net/wireless/ath/regd.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/ath/regd.c b/drivers/net/wireless/ath/regd.c
index bee9110b91f3..20f4f8ea9f89 100644
--- a/drivers/net/wireless/ath/regd.c
+++ b/drivers/net/wireless/ath/regd.c
@@ -666,14 +666,14 @@ ath_regd_init_wiphy(struct ath_regulatory *reg,
/*
* Some users have reported their EEPROM programmed with
- * 0x8000 or 0x0 set, this is not a supported regulatory
- * domain but since we have more than one user with it we
- * need a solution for them. We default to 0x64, which is
- * the default Atheros world regulatory domain.
+ * 0x8000 set, this is not a supported regulatory domain
+ * but since we have more than one user with it we need
+ * a solution for them. We default to 0x64, which is the
+ * default Atheros world regulatory domain.
*/
static void ath_regd_sanitize(struct ath_regulatory *reg)
{
- if (reg->current_rd != COUNTRY_ERD_FLAG && reg->current_rd != 0)
+ if (reg->current_rd != COUNTRY_ERD_FLAG)
return;
printk(KERN_DEBUG "ath: EEPROM regdomain sanitized\n");
reg->current_rd = 0x64;
--
2.27.0.rc0.183.gde8f92d652-goog
The i801 controller provides a locking mechanism that the OS is supposed
to use to safely share the SMBus with ACPI AML or other firmware.
Previously, Linux attempted to get out of the way of ACPI AML entirely,
but left the bus locked if it used it before the first AML access. This
causes AML implementations that *do* attempt to safely share the bus
to time out if Linux uses it first; notably, this regressed ACPI video
backlight controls on 2015 iMacs after 01590f361e started instantiating
SPD EEPROMs on boot.
Commit 065b6211a8 fixed the immediate problem of leaving the bus locked,
but we can do better. The controller does have a proper locking mechanism,
so let's use it as intended. Since we can't rely on the BIOS doing this
properly, we implement the following logic:
- If ACPI AML uses the bus at all, we make a note and disable power
management. The latter matches already existing behavior.
- When we want to use the bus, we attempt to lock it first. If the
locking attempt times out, *and* ACPI hasn't tried to use the bus at
all yet, we cautiously go ahead and assume the BIOS forgot to unlock
the bus after boot. This preserves existing behavior.
- We always unlock the bus after a transfer.
- If ACPI AML tries to use the bus (except trying to lock it) while
we're in the middle of a transfer, or after we've determined
locking is broken, we know we cannot safely share the bus and give up.
Upon first usage of SMBus by ACPI AML, if nothing has gone horribly
wrong so far, users will see:
i801_smbus 0000:00:1f.4: SMBus controller is shared with ACPI AML. This seems safe so far.
If locking the SMBus times out, users will see:
i801_smbus 0000:00:1f.4: BIOS left SMBus locked
And if ACPI AML tries to use the bus concurrently with Linux, or it
previously used the bus and we failed to subsequently lock it as
above, the driver will give up and users will get:
i801_smbus 0000:00:1f.4: BIOS uses SMBus unsafely
i801_smbus 0000:00:1f.4: Driver SMBus register access inhibited
This fixes the regression introduced by 01590f361e, and further allows
safely sharing the SMBus on 2015 iMacs. Tested by running `i2cdump` in a
loop while changing backlight levels via the ACPI video device.
Fixes: 01590f361e ("i2c: i801: Instantiate SPD EEPROMs automatically")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/i2c/busses/i2c-i801.c | 96 ++++++++++++++++++++++++++++-------
1 file changed, 79 insertions(+), 17 deletions(-)
diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index 04a1e38f2a6f..03be6310d6d7 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -287,11 +287,18 @@ struct i801_priv {
#endif
struct platform_device *tco_pdev;
+ /* BIOS left the controller marked busy. */
+ bool inuse_stuck;
/*
- * If set to true the host controller registers are reserved for
- * ACPI AML use. Protected by acpi_lock.
+ * If set to true, ACPI AML uses the host controller registers.
+ * Protected by acpi_lock.
*/
- bool acpi_reserved;
+ bool acpi_usage;
+ /*
+ * If set to true, ACPI AML uses the host controller registers in an
+ * unsafe way. Protected by acpi_lock.
+ */
+ bool acpi_unsafe;
struct mutex acpi_lock;
};
@@ -854,10 +861,37 @@ static s32 i801_access(struct i2c_adapter *adap, u16 addr,
int hwpec;
int block = 0;
int ret = 0, xact = 0;
+ int timeout = 0;
struct i801_priv *priv = i2c_get_adapdata(adap);
+ /*
+ * The controller provides a bit that implements a mutex mechanism
+ * between users of the bus. First, try to lock the hardware mutex.
+ * If this doesn't work, we give up trying to do this, but then
+ * bail if ACPI uses SMBus at all.
+ */
+ if (!priv->inuse_stuck) {
+ while (inb_p(SMBHSTSTS(priv)) & SMBHSTSTS_INUSE_STS) {
+ if (++timeout >= MAX_RETRIES) {
+ dev_warn(&priv->pci_dev->dev,
+ "BIOS left SMBus locked\n");
+ priv->inuse_stuck = true;
+ break;
+ }
+ usleep_range(250, 500);
+ }
+ }
+
mutex_lock(&priv->acpi_lock);
- if (priv->acpi_reserved) {
+ if (priv->acpi_usage && priv->inuse_stuck && !priv->acpi_unsafe) {
+ priv->acpi_unsafe = true;
+
+ dev_warn(&priv->pci_dev->dev, "BIOS uses SMBus unsafely\n");
+ dev_warn(&priv->pci_dev->dev,
+ "Driver SMBus register access inhibited\n");
+ }
+
+ if (priv->acpi_unsafe) {
mutex_unlock(&priv->acpi_lock);
return -EBUSY;
}
@@ -1639,6 +1673,16 @@ static bool i801_acpi_is_smbus_ioport(const struct i801_priv *priv,
address <= pci_resource_end(priv->pci_dev, SMBBAR);
}
+static acpi_status
+i801_acpi_do_access(u32 function, acpi_physical_address address,
+ u32 bits, u64 *value)
+{
+ if ((function & ACPI_IO_MASK) == ACPI_READ)
+ return acpi_os_read_port(address, (u32 *)value, bits);
+ else
+ return acpi_os_write_port(address, (u32)*value, bits);
+}
+
static acpi_status
i801_acpi_io_handler(u32 function, acpi_physical_address address, u32 bits,
u64 *value, void *handler_context, void *region_context)
@@ -1648,17 +1692,38 @@ i801_acpi_io_handler(u32 function, acpi_physical_address address, u32 bits,
acpi_status status;
/*
- * Once BIOS AML code touches the OpRegion we warn and inhibit any
- * further access from the driver itself. This device is now owned
- * by the system firmware.
+ * Non-i801 accesses pass through.
*/
- mutex_lock(&priv->acpi_lock);
+ if (!i801_acpi_is_smbus_ioport(priv, address))
+ return i801_acpi_do_access(function, address, bits, value);
- if (!priv->acpi_reserved && i801_acpi_is_smbus_ioport(priv, address)) {
- priv->acpi_reserved = true;
+ if (!mutex_trylock(&priv->acpi_lock)) {
+ mutex_lock(&priv->acpi_lock);
+ /*
+ * This better be a read of the status register to acquire
+ * the lock...
+ */
+ if (!priv->acpi_unsafe &&
+ !(address == SMBHSTSTS(priv) &&
+ (function & ACPI_IO_MASK) == ACPI_READ)) {
+ /*
+ * Uh-oh, ACPI AML is trying to do something with the
+ * controller without locking it properly.
+ */
+ priv->acpi_unsafe = true;
+
+ dev_warn(&pdev->dev, "BIOS uses SMBus unsafely\n");
+ dev_warn(&pdev->dev,
+ "Driver SMBus register access inhibited\n");
+ }
+ }
- dev_warn(&pdev->dev, "BIOS is accessing SMBus registers\n");
- dev_warn(&pdev->dev, "Driver SMBus register access inhibited\n");
+ if (!priv->acpi_usage) {
+ priv->acpi_usage = true;
+
+ if (!priv->acpi_unsafe)
+ dev_info(&pdev->dev,
+ "SMBus controller is shared with ACPI AML. This seems safe so far.\n");
/*
* BIOS is accessing the host controller so prevent it from
@@ -1667,10 +1732,7 @@ i801_acpi_io_handler(u32 function, acpi_physical_address address, u32 bits,
pm_runtime_get_sync(&pdev->dev);
}
- if ((function & ACPI_IO_MASK) == ACPI_READ)
- status = acpi_os_read_port(address, (u32 *)value, bits);
- else
- status = acpi_os_write_port(address, (u32)*value, bits);
+ status = i801_acpi_do_access(function, address, bits, value);
mutex_unlock(&priv->acpi_lock);
@@ -1706,7 +1768,7 @@ static void i801_acpi_remove(struct i801_priv *priv)
ACPI_ADR_SPACE_SYSTEM_IO, i801_acpi_io_handler);
mutex_lock(&priv->acpi_lock);
- if (priv->acpi_reserved)
+ if (priv->acpi_usage)
pm_runtime_put(&priv->pci_dev->dev);
mutex_unlock(&priv->acpi_lock);
}
--
2.32.0
When booting with ACPI unavailable or disabled, get_smp_config() ends up
calling MP_processor_info() for each CPU found in the MPS
table. Previously, this resulted in boot_cpu_physical_apicid getting
unconditionally overwritten by the apicid of whatever processor had the
CPU_BOOTPROCESSOR flag. This occurred even if boot_cpu_physical_apicid
had already been more reliably determined in register_lapic_address() by
calling read_apic_id() from the actual boot processor.
Ordinariliy, this is not a problem because the boot processor really is
the one with the CPU_BOOTPROCESSOR flag. However, kexec is an exception
in which the kernel may be booted from any processor regardless of the
MPS table contents. In this case, boot_cpu_physical_apicid may not
indicate the actual boot processor.
This was particularly problematic when the second kernel was booted with
NR_CPUS fewer than the number of physical processors. It's the job of
generic_processor_info() to decide which CPUs to bring up in this case.
That obviously must include the real boot processor which it takes care
to save a slot for. It relies upon the contents of
boot_cpu_physical_apicid to do this, which if incorrect, may result in
the boot processor getting left out.
This condition can be discovered by smp_sanity_check() and rectified by
adding the boot processor to the phys_cpu_present_map with the warning
"weird, boot CPU (#%d) not listed by the BIOS". However, commit
3e730dad3b6da ("x86/apic: Unify interrupt mode setup for SMP-capable
system") caused setup_local_APIC() to be called before this could happen
resulting in a BUG_ON(!apic->apic_id_registered()):
[ 0.655452] ------------[ cut here ]------------
[ 0.660610] Kernel BUG at setup_local_APIC+0x74/0x280 [verbose debug info unavailable]
[ 0.669466] invalid opcode: 0000 [#1] SMP
[ 0.673948] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.109.Ar-16509018.eostrunkkernel419 #1
[ 0.683670] Hardware name: Quanta Quanta LY6 (1LY6UZZ0FBC), BIOS 1.0.6.0-e7d6a55 11/26/2015
[ 0.693007] RIP: 0010:setup_local_APIC+0x74/0x280
[ 0.698264] Code: 80 e4 fe bf f0 00 00 00 89 c6 48 8b 05 0f 1a 8e 00 ff 50 10 e8 12 53 fd ff 48 8b 05 00 1a 8e 00 ff 90 a0 00 00 00 85 c0 75 02 <0f> 0b 48 8b 05 ed 19 8e 00 41 be 00 02 00 00 ff 90 b0 00 00 00 48
[ 0.719251] RSP: 0000:ffffffff81a03e20 EFLAGS: 00010246
[ 0.725091] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
[ 0.733066] RDX: 0000000000000000 RSI: 000000000000000f RDI: 0000000000000020
[ 0.741041] RBP: ffffffff81a03e98 R08: 0000000000000002 R09: 0000000000000000
[ 0.749014] R10: ffffffff81a204e0 R11: ffffffff81b50ea7 R12: 0000000000000000
[ 0.756989] R13: ffffffff81aef920 R14: ffffffff81af60a0 R15: 0000000000000000
[ 0.764965] FS: 0000000000000000(0000) GS:ffff888036800000(0000) knlGS:0000000000000000
[ 0.774007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.780427] CR2: ffff888035c01000 CR3: 0000000035a08000 CR4: 00000000000006b0
[ 0.788401] Call Trace:
[ 0.791137] ? amd_iommu_prepare+0x15/0x2a
[ 0.795717] apic_bsp_setup+0x55/0x75
[ 0.799808] apic_intr_mode_init+0x169/0x16e
[ 0.804579] x86_late_time_init+0x10/0x17
[ 0.809062] start_kernel+0x37e/0x3fe
[ 0.813154] x86_64_start_reservations+0x2a/0x2c
[ 0.818316] x86_64_start_kernel+0x72/0x75
[ 0.822886] secondary_startup_64+0xa4/0xb0
[ 0.827564] ---[ end trace 237b64da0fd9b22e ]---
This change avoids these issues by only setting boot_cpu_physical_apicid
from the MPS table if it is not already set, which can occur in the
construct_default_ISA_mptable() path. Otherwise,
boot_cpu_physical_apicid will already have been set in
register_lapic_address() and should therefore remain untouched.
Looking through all the places where boot_cpu_physical_apicid is
accessed, nearly all of them assume that boot_cpu_physical_apicid should
match read_apic_id() on the booting processor. The only place that might
intend to use the BSP apicid listed in the MPS table is amd_numa_init(),
which explicitly requires boot_cpu_physical_apicid to be the lowest
apicid of all processors. Ironically, due to the early exit short
circuit in early_get_smp_config(), it instead gets
boot_cpu_physical_apicid = read_apic_id() rather than the MPS table
BSP. The behaviour of amd_numa_init() is therefore unaffected by this
change.
Fixes: 3e730dad3b6da ("x86/apic: Unify interrupt mode setup for SMP-capable system")
Signed-off-by: Kevin Mitchell <kevmitch(a)arista.com>
Cc: <stable(a)vger.kernel.org>
---
arch/x86/kernel/mpparse.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index afac7ccce72f..6f22f09bfe11 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -64,7 +64,8 @@ static void __init MP_processor_info(struct mpc_cpu *m)
if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
- boot_cpu_physical_apicid = m->apicid;
+ if (boot_cpu_physical_apicid == -1U)
+ boot_cpu_physical_apicid = m->apicid;
}
pr_info("Processor #%d%s\n", m->apicid, bootup_cpu);
--
2.26.2
From: Joerg Roedel <jroedel(a)suse.de>
Allow a runtime opt-out of kexec support for architecture code in case
the kernel is running in an environment where kexec is not properly
supported yet.
This will be used on x86 when the kernel is running as an SEV-ES
guest. SEV-ES guests need special handling for kexec to hand over all
CPUs to the new kernel. This requires special hypervisor support and
handling code in the guest which is not yet implemented.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
---
include/linux/kexec.h | 1 +
kernel/kexec.c | 14 ++++++++++++++
kernel/kexec_file.c | 9 +++++++++
3 files changed, 24 insertions(+)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729..85c30dcd0bdc 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -201,6 +201,7 @@ int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
unsigned long buf_len);
#endif
int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
+bool arch_kexec_supported(void);
extern int kexec_add_buffer(struct kexec_buf *kbuf);
int kexec_locate_mem_hole(struct kexec_buf *kbuf);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..d03134160458 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -195,11 +195,25 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
* that to happen you need to do that yourself.
*/
+bool __weak arch_kexec_supported(void)
+{
+ return true;
+}
+
static inline int kexec_load_check(unsigned long nr_segments,
unsigned long flags)
{
int result;
+ /*
+ * The architecture may support kexec in general, but the kernel could
+ * run in an environment where it is not (yet) possible to execute a new
+ * kernel. Allow the architecture code to opt-out of kexec support when
+ * it is running in such an environment.
+ */
+ if (!arch_kexec_supported())
+ return -ENOSYS;
+
/* We only trust the superuser with rebooting the system. */
if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
return -EPERM;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 33400ff051a8..96d08a512e9c 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -358,6 +358,15 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
int ret = 0, i;
struct kimage **dest_image, *image;
+ /*
+ * The architecture may support kexec in general, but the kernel could
+ * run in an environment where it is not (yet) possible to execute a new
+ * kernel. Allow the architecture code to opt-out of kexec support when
+ * it is running in such an environment.
+ */
+ if (!arch_kexec_supported())
+ return -ENOSYS;
+
/* We only trust the superuser with rebooting the system. */
if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
return -EPERM;
--
2.31.1
From: Alex Leibovich <alexl(a)marvell.com>
Automatic Clock Gating is a feature used for the power
consumption optimisation. It turned out that
during early init phase it may prevent the stable voltage
switch to 1.8V - due to that on some platfroms an endless
printout in dmesg can be observed:
"mmc1: 1.8V regulator output did not became stable"
Fix the problem by disabling the ACG at very beginning
of the sdhci_init and let that be enabled later.
Fixes: 3a3748dba881 ("mmc: sdhci-xenon: Add Marvell Xenon SDHC core functionality")
Signed-off-by: Alex Leibovich <alexl(a)marvell.com>
Signed-off-by: Marcin Wojtas <mw(a)semihalf.com>
Cc: stable(a)vger.kernel.org
---
drivers/mmc/host/sdhci-xenon.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/host/sdhci-xenon.c b/drivers/mmc/host/sdhci-xenon.c
index c67611fdaa8a..4b05f6fdefb4 100644
--- a/drivers/mmc/host/sdhci-xenon.c
+++ b/drivers/mmc/host/sdhci-xenon.c
@@ -168,7 +168,12 @@ static void xenon_reset_exit(struct sdhci_host *host,
/* Disable tuning request and auto-retuning again */
xenon_retune_setup(host);
- xenon_set_acg(host, true);
+ /*
+ * The ACG should be turned off at the early init time, in order
+ * to solve a possile issues with the 1.8V regulator stabilization.
+ * The feature is enabled in later stage.
+ */
+ xenon_set_acg(host, false);
xenon_set_sdclk_off_idle(host, sdhc_id, false);
--
2.29.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 353050be4c19e102178ccc05988101887c25ae53 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 9 Nov 2021 18:48:08 +0000
Subject: [PATCH] bpf: Fix toctou on read-only map's constant scalar tracking
Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is
checking whether maps are read-only both from BPF program side and user space
side, and then, given their content is constant, reading out their data via
map->ops->map_direct_value_addr() which is then subsequently used as known
scalar value for the register, that is, it is marked as __mark_reg_known()
with the read value at verification time. Before a23740ec43ba, the register
content was marked as an unknown scalar so the verifier could not make any
assumptions about the map content.
The current implementation however is prone to a TOCTOU race, meaning, the
value read as known scalar for the register is not guaranteed to be exactly
the same at a later point when the program is executed, and as such, the
prior made assumptions of the verifier with regards to the program will be
invalid which can cause issues such as OOB access, etc.
While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
specified at map creation time, the map->frozen property is initially set to
false for the map given the map value needs to be populated, e.g. for global
data sections. Once complete, the loader "freezes" the map from user space
such that no subsequent updates/deletes are possible anymore. For the rest
of the lifetime of the map, this freeze one-time trigger cannot be undone
anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
cmd calls which would update/delete map entries will be rejected with -EPERM
since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
means that pending update/delete map entries must still complete before this
guarantee is given. This corner case is not an issue for loaders since they
create and prepare such program private map in successive steps.
However, a malicious user is able to trigger this TOCTOU race in two different
ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
used to expand the competition interval, so that map_update_elem() can modify
the contents of the map after map_freeze() and bpf_prog_load() were executed.
This works, because userfaultfd halts the parallel thread which triggered a
map_update_elem() at the time where we copy key/value from the user buffer and
this already passed the FMODE_CAN_WRITE capability test given at that time the
map was not "frozen". Then, the main thread performs the map_freeze() and
bpf_prog_load(), and once that had completed successfully, the other thread
is woken up to complete the pending map_update_elem() which then changes the
map content. For ii) the idea of the batched update is similar, meaning, when
there are a large number of updates to be processed, it can increase the
competition interval between the two. It is therefore possible in practice to
modify the contents of the map after executing map_freeze() and bpf_prog_load().
One way to fix both i) and ii) at the same time is to expand the use of the
map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap()
support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf:
Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
the rationale to make a writable mmap()'ing of a map mutually exclusive with
read-only freezing. The counter indicates writable mmap() mappings and then
prevents/fails the freeze operation. Its semantics can be expanded beyond
just mmap() by generally indicating ongoing write phases. This would essentially
span any parallel regular and batched flavor of update/delete operation and
then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
last pending writes have completed via bpf_map_write_active() test. Once the
map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
only then we are really guaranteed to use the map's data as known constants.
For map->frozen being set and pending writes in process of still being completed
we fall back to marking that register as unknown scalar so we don't end up
making assumptions about it. With this, both TOCTOU reproducers from i) and
ii) are fixed.
Note that the map->writecnt has been converted into a atomic64 in the fix in
order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
the freeze_mutex over entire map update/delete operations in syscall side
would not be possible due to then causing everything to be serialized.
Similarly, something like synchronize_rcu() after setting map->frozen to wait
for update/deletes to complete is not possible either since it would also
have to span the user copy which can sleep. On the libbpf side, this won't
break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the
anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
mmap()-ed memory where for .rodata it's non-writable.
Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars")
Reported-by: w1tcher.bupt(a)gmail.com
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f715e8863f4d..e7a163a3146b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -193,7 +193,7 @@ struct bpf_map {
atomic64_t usercnt;
struct work_struct work;
struct mutex freeze_mutex;
- u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+ atomic64_t writecnt;
};
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -1419,6 +1419,7 @@ void bpf_map_put(struct bpf_map *map);
void *bpf_map_area_alloc(u64 size, int numa_node);
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
void bpf_map_area_free(void *base);
+bool bpf_map_write_active(const struct bpf_map *map);
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
int generic_map_lookup_batch(struct bpf_map *map,
const union bpf_attr *attr,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 50f96ea4452a..1033ee8c0caf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -132,6 +132,21 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
return map;
}
+static void bpf_map_write_active_inc(struct bpf_map *map)
+{
+ atomic64_inc(&map->writecnt);
+}
+
+static void bpf_map_write_active_dec(struct bpf_map *map)
+{
+ atomic64_dec(&map->writecnt);
+}
+
+bool bpf_map_write_active(const struct bpf_map *map)
+{
+ return atomic64_read(&map->writecnt) != 0;
+}
+
static u32 bpf_map_value_size(const struct bpf_map *map)
{
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -601,11 +616,8 @@ static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt++;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_inc(map);
}
/* called for all unmapped memory region (including initial) */
@@ -613,11 +625,8 @@ static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt--;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_dec(map);
}
static const struct vm_operations_struct bpf_map_default_vmops = {
@@ -668,7 +677,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
goto out;
if (vma->vm_flags & VM_MAYWRITE)
- map->writecnt++;
+ bpf_map_write_active_inc(map);
out:
mutex_unlock(&map->freeze_mutex);
return err;
@@ -1139,6 +1148,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1174,6 +1184,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1196,6 +1207,7 @@ static int map_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1226,6 +1238,7 @@ static int map_delete_elem(union bpf_attr *attr)
out:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1533,6 +1546,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
@@ -1597,6 +1611,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1624,8 +1639,7 @@ static int map_freeze(const union bpf_attr *attr)
}
mutex_lock(&map->freeze_mutex);
-
- if (map->writecnt) {
+ if (bpf_map_write_active(map)) {
err = -EBUSY;
goto err_put;
}
@@ -4171,6 +4185,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
union bpf_attr __user *uattr,
int cmd)
{
+ bool has_read = cmd == BPF_MAP_LOOKUP_BATCH ||
+ cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH;
+ bool has_write = cmd != BPF_MAP_LOOKUP_BATCH;
struct bpf_map *map;
int err, ufd;
struct fd f;
@@ -4183,16 +4200,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if ((cmd == BPF_MAP_LOOKUP_BATCH ||
- cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
+ if (has_write)
+ bpf_map_write_active_inc(map);
+ if (has_read && !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
err = -EPERM;
goto err_put;
}
-
- if (cmd != BPF_MAP_LOOKUP_BATCH &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+ if (has_write && !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
@@ -4205,8 +4219,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
BPF_DO_BATCH(map->ops->map_update_batch);
else
BPF_DO_BATCH(map->ops->map_delete_batch);
-
err_put:
+ if (has_write)
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 65d2f93b7030..50efda51515b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4056,7 +4056,22 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
static bool bpf_map_is_rdonly(const struct bpf_map *map)
{
- return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen;
+ /* A map is considered read-only if the following condition are true:
+ *
+ * 1) BPF program side cannot change any of the map content. The
+ * BPF_F_RDONLY_PROG flag is throughout the lifetime of a map
+ * and was set at map creation time.
+ * 2) The map value(s) have been initialized from user space by a
+ * loader and then "frozen", such that no new map update/delete
+ * operations from syscall side are possible for the rest of
+ * the map's lifetime from that point onwards.
+ * 3) Any parallel/pending map update/delete operations from syscall
+ * side have been completed. Only after that point, it's safe to
+ * assume that map value(s) are immutable.
+ */
+ return (map->map_flags & BPF_F_RDONLY_PROG) &&
+ READ_ONCE(map->frozen) &&
+ !bpf_map_write_active(map);
}
static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
Commit 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
broke PCI support on XGene. The cause is the IB resources are now sorted
in address order instead of being in DT dma-ranges order. The result is
which inbound registers are used for each region are swapped. I don't
know the details about this h/w, but it appears that IB region 0
registers can't handle a size greater than 4GB. In any case, limiting
the size for region 0 is enough to get back to the original assignment
of dma-ranges to regions.
Reported-by: Stéphane Graber <stgraber(a)ubuntu.com>
Fixes: 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
Link: https://lore.kernel.org/all/CA+enf=v9rY_xnZML01oEgKLmvY1NGBUUhnSJaETmXtDtXf…
Cc: stable(a)vger.kernel.org # v5.5+
Signed-off-by: Rob Herring <robh(a)kernel.org>
---
drivers/pci/controller/pci-xgene.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index 56d0d50338c8..d83dbd977418 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -465,7 +465,7 @@ static int xgene_pcie_select_ib_reg(u8 *ib_reg_mask, u64 size)
return 1;
}
- if ((size > SZ_1K) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 0))) {
+ if ((size > SZ_1K) && (size < SZ_4G) && !(*ib_reg_mask & (1 << 0))) {
*ib_reg_mask |= (1 << 0);
return 0;
}
--
2.32.0
On 2021-07-30 12:35, Anders Roxell wrote:
> From: Robin Murphy <robin.murphy(a)arm.com>
>
>> Now that PCI inbound window restrictions are handled generically between
>> the of_pci resource parsing and the IOMMU layer, and described in the
>> Juno DT, we can finally enable the PCIe SMMU without the risk of DMA
>> mappings inadvertently allocating unusable addresses.
>>
>> Similarly, the relevant support for IOMMU mappings for peripheral
>> transfers has been hooked up in the pl330 driver for ages, so we can
>> happily enable the DMA SMMU without that breaking anything either.
>>
>> Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
>
> When we build a kernel with 64k page size and run the ltp syscalls we
> sporadically see a kernel crash while doing a mkfs on a connected SATA
> drive. This is happening every third test run on any juno-r2 device in
> the lab with the same kernel image (stable-rc 5.13.y, mainline and next)
> with gcc-11.
Hmm, I guess 64K pages might make a difference in that we'll chew
through IOVA space a lot faster with small mappings...
I'll have to try to reproduce this locally, since the interesting thing
would be knowing what DMA address it was trying to use that went wrong,
but IOMMU tracepoints and/or dma-debug are going to generate an crazy
amount of data to sift through and try to correlate - having done it
before it's not something I'd readily ask someone else to do for me :)
On a hunch, though, does it make any difference if you remove the first
entry from the PCIe "dma-ranges" (the 0x2c1c0000 one)?
Robin.
> Here is a snippet of the boot log [1]:
>
> + mkfs -t ext4 /dev/disk/by-id/ata-SanDisk_SDSSDA120G_165192443611
> mke2fs 1.43.8 (1-Jan-2018)
> Discarding device blocks: 4096/29305200
> [ 55.344291] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [ 55.351423] ata1.00: irq_stat 0x00020002, failed to transmit command
> FIS
> [ 55.358205] ata1.00: failed command: DATA SET MANAGEMENT
> [ 55.363561] ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 12
> dma 512 out
> [ 55.363561] res ec/ff:00:00:00:00/00:00:00:00:ec/00 Emask
> 0x12 (ATA bus error)
> [ 55.378955] ata1.00: status: { Busy }
> [ 55.382658] ata1.00: error: { ICRC UNC AMNF IDNF ABRT }
> [ 55.387947] ata1: hard resetting link
> [ 55.391653] ata1: controller in dubious state, performing PORT_RST
> [ 57.588447] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [ 57.613471] ata1.00: configured for UDMA/100
> [ 57.617866] ata1.00: device reported invalid CHS sector 0
> [ 57.623397] ata1: EH complete
>
>
> When we revert this patch we don't see any issue.
>
> Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
>
> Cheers,
> Anders
> [1]
> https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.13.y/build/v5.13…
>
Some boot partitions on the Samsung 4GB KLM4G1YE4C "4YMD1R" and "M4G1YC"
cards appear broken when accessed randomly. CMD6 to switch back to the main
partition randomly stalls after CMD18 access to the boot partition 1, and
the card never comes back online. The accesses to the boot partitions work
several times before this happens, but eventually the card access hangs
while initializing the card.
Some problematic eMMC cards are found in the Samsung GT-S7710 (Skomer)
and SGH-I407 (Kyle) mobile phones.
I tried using only single blocks with CMD17 on the boot partitions with the
result that it crashed even faster.
After a bit of root cause analysis it turns out that these old eMMC cards
probably cannot do hardware busy detection (monitoring DAT0) properly.
The card survives on older kernels, but this is because recent kernels have
added busy detection handling for the SoC used in these phones, exposing
the issue.
Construct a quirk that makes the MMC cord avoid using the ->card_busy()
callback if the card is listed with MMC_QUIRK_BROKEN_HW_BUSY_DETECT and
register the known problematic cards. The core changes are pretty
straight-forward with a helper inline to check of we can use hardware
busy detection.
On the MMCI host we have to counter the fact that if the host was able to
use busy detect, it would be used unsolicited in the command IRQ callback.
Rewrite this so that MMCI will not attempt to use hardware busy detection
in the command IRQ until:
- A card is attached to the host and
- We know that the card can handle this busy detection
I have glanced over the ->card_busy() callbacks on some other hosts and
they seem to mostly read a register reflecting the value of DAT0 for this
which works fine with the quirk in this patch. However if the error appear
on other hosts they might need additional fixes.
After applying this patch, the main partition can be accessed and mounted
without problems on Samsung GT-S7710 and SGH-I407.
Fixes: cb0335b778c7 ("mmc: mmci: add busy_complete callback")
Cc: stable(a)vger.kernel.org
Cc: phone-devel(a)vger.kernel.org
Cc: Ludovic Barre <ludovic.barre(a)st.com>
Cc: Stephan Gerhold <stephan(a)gerhold.net>
Reported-by: newbyte(a)disroot.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
ChangeLog v2->v3:
- Rebase on v5.14-rc1
- Reword the commit message slightly.
ChangeLog v1->v2:
- Rewrite to reflect the actual problem of broken busy detection.
---
drivers/mmc/core/core.c | 8 ++++----
drivers/mmc/core/core.h | 17 +++++++++++++++++
drivers/mmc/core/mmc_ops.c | 4 ++--
drivers/mmc/core/quirks.h | 21 +++++++++++++++++++++
drivers/mmc/host/mmci.c | 22 ++++++++++++++++++++--
include/linux/mmc/card.h | 1 +
6 files changed, 65 insertions(+), 8 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 95fedcf56e4a..e08dd9ea3d46 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -232,7 +232,7 @@ static void __mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
* And bypass I/O abort, reset and bus suspend operations.
*/
if (sdio_is_io_busy(mrq->cmd->opcode, mrq->cmd->arg) &&
- host->ops->card_busy) {
+ mmc_hw_busy_detect(host)) {
int tries = 500; /* Wait aprox 500ms at maximum */
while (host->ops->card_busy(host) && --tries)
@@ -1200,7 +1200,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
*/
if (!host->ops->start_signal_voltage_switch)
return -EPERM;
- if (!host->ops->card_busy)
+ if (!mmc_hw_busy_detect(host))
pr_warn("%s: cannot verify signal voltage switch\n",
mmc_hostname(host));
@@ -1220,7 +1220,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
* after the response of cmd11, but wait 1 ms to be sure
*/
mmc_delay(1);
- if (host->ops->card_busy && !host->ops->card_busy(host)) {
+ if (mmc_hw_busy_detect(host) && !host->ops->card_busy(host)) {
err = -EAGAIN;
goto power_cycle;
}
@@ -1241,7 +1241,7 @@ int mmc_set_uhs_voltage(struct mmc_host *host, u32 ocr)
* Failure to switch is indicated by the card holding
* dat[0:3] low
*/
- if (host->ops->card_busy && host->ops->card_busy(host))
+ if (mmc_hw_busy_detect(host) && host->ops->card_busy(host))
err = -EAGAIN;
power_cycle:
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 0c4de2030b3f..6a5619eed4a6 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -181,4 +181,21 @@ static inline int mmc_flush_cache(struct mmc_host *host)
return 0;
}
+/**
+ * mmc_hw_busy_detect() - Can we use hw busy detection?
+ * @host: the host in question
+ */
+static inline bool mmc_hw_busy_detect(struct mmc_host *host)
+{
+ struct mmc_card *card = host->card;
+ bool has_ops;
+ bool able = true;
+
+ has_ops = (host->ops->card_busy != NULL);
+ if (card)
+ able = !(card->quirks & MMC_QUIRK_BROKEN_HW_BUSY_DETECT);
+
+ return (has_ops && able);
+}
+
#endif
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 973756ed4016..546fc799a8e5 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -435,7 +435,7 @@ static int mmc_busy_cb(void *cb_data, bool *busy)
u32 status = 0;
int err;
- if (host->ops->card_busy) {
+ if (mmc_hw_busy_detect(host)) {
*busy = host->ops->card_busy(host);
return 0;
}
@@ -597,7 +597,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
* when it's not allowed to poll by using CMD13, then we need to rely on
* waiting the stated timeout to be sufficient.
*/
- if (!send_status && !host->ops->card_busy) {
+ if (!send_status && !mmc_hw_busy_detect(host)) {
mmc_delay(timeout_ms);
goto out_tim;
}
diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
index d68e6e513a4f..8da6526f0eb0 100644
--- a/drivers/mmc/core/quirks.h
+++ b/drivers/mmc/core/quirks.h
@@ -99,6 +99,27 @@ static const struct mmc_fixup __maybe_unused mmc_blk_fixups[] = {
MMC_FIXUP("V10016", CID_MANFID_KINGSTON, CID_OEMID_ANY, add_quirk_mmc,
MMC_QUIRK_TRIM_BROKEN),
+ /*
+ * Some older Samsung eMMCs have broken hardware busy detection.
+ * Enabling this feature in the host controller can make the card
+ * accesses lock up completely.
+ */
+ MMC_FIXUP("4YMD1R", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ /* Samsung KLMxGxxE4x eMMCs from 2012: 4, 8, 16, 32 and 64 GB */
+ MMC_FIXUP("M4G1YC", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ MMC_FIXUP("M8G1WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ MMC_FIXUP("MAG2WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ MMC_FIXUP("MBG4WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ MMC_FIXUP("MAG2WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+ MMC_FIXUP("MCG8WA", CID_MANFID_SAMSUNG, CID_OEMID_ANY, add_quirk_mmc,
+ MMC_QUIRK_BROKEN_HW_BUSY_DETECT),
+
END_FIXUP
};
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 984d35055156..3046917b2b67 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -347,6 +347,24 @@ static int mmci_card_busy(struct mmc_host *mmc)
return busy;
}
+/* Use this if the MMCI variant AND the card supports it */
+static bool mmci_use_busy_detect(struct mmci_host *host)
+{
+ struct mmc_card *card = host->mmc->card;
+
+ if (!host->variant->busy_detect)
+ return false;
+
+ /* We don't allow this until we know that the card can handle it */
+ if (!card)
+ return false;
+
+ if (card->quirks & MMC_QUIRK_BROKEN_HW_BUSY_DETECT)
+ return false;
+
+ return true;
+}
+
static void mmci_reg_delay(struct mmci_host *host)
{
/*
@@ -1381,7 +1399,7 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
return;
/* Handle busy detection on DAT0 if the variant supports it. */
- if (busy_resp && host->variant->busy_detect)
+ if (busy_resp && mmci_use_busy_detect(host))
if (!host->ops->busy_complete(host, status, err_msk))
return;
@@ -1725,7 +1743,7 @@ static void mmci_set_max_busy_timeout(struct mmc_host *mmc)
struct mmci_host *host = mmc_priv(mmc);
u32 max_busy_timeout = 0;
- if (!host->variant->busy_detect)
+ if (!mmci_use_busy_detect(host))
return;
if (host->variant->busy_timeout && mmc->actual_clock)
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index 74e6c0624d27..525a39951c6d 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -280,6 +280,7 @@ struct mmc_card {
/* for byte mode */
#define MMC_QUIRK_NONSTD_SDIO (1<<2) /* non-standard SDIO card attached */
/* (missing CIA registers) */
+#define MMC_QUIRK_BROKEN_HW_BUSY_DETECT (1<<3) /* Disable hardware busy detection on DAT0 */
#define MMC_QUIRK_NONSTD_FUNC_IF (1<<4) /* SDIO card has nonstd function interfaces */
#define MMC_QUIRK_DISABLE_CD (1<<5) /* disconnect CD/DAT[3] resistor */
#define MMC_QUIRK_INAND_CMD38 (1<<6) /* iNAND devices have broken CMD38 */
--
2.31.1
On 11/24/21 8:28 AM, Jens Axboe wrote:
> On 11/23/21 8:27 PM, Daniel Black wrote:
>> On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <axboe(a)kernel.dk> wrote:
>>>
>>> On 11/14/21 1:33 PM, Daniel Black wrote:
>>>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <axboe(a)kernel.dk> wrote:
>>>>>
>>>>> Alright, give this one a go if you can. Against -git, but will apply to
>>>>> 5.15 as well.
>>>>
>>>>
>>>> Works. Thank you very much.
>>>>
>>>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.s…
>>>>
>>>> Tested-by: Marko Mäkelä <marko.makela(a)mariadb.com>
>>>
>>> The patch is already upstream (and in the 5.15 stable queue), and I
>>> provided 5.14 patches too.
>>
>> Jens,
>>
>> I'm getting the same reproducer on 5.14.20
>> (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the
>> backport change logs indicate 5.14.19 has the patch.
>>
>> Anything missing?
>
> We might also need another patch that isn't in stable, I'm attaching
> it here. Any chance you can run 5.14.20/21 with this applied? If not,
> I'll do some sanity checking here and push it to -stable.
Looks good to me - Greg, would you mind queueing this up for
5.14-stable?
--
Jens Axboe
Note: this applies to 5.10 stable only. It doesn't trigger on anything
above 5.10 as the code there has been substantially reworked. This also
doesn't apply to any stable kernel below 5.10 afaict.
Syzbot found a bug: KASAN: invalid-free in io_dismantle_req
https://syzkaller.appspot.com/bug?id=123d9a852fc88ba573ffcb2dbcf4f9576c3b05…
The test submits bunch of io_uring writes and exits, which then triggers
uring_task_cancel() and io_put_identity(), which in some corner cases,
tries to free a static identity. This causes a panic as shown in the
trace below:
BUG: KASAN: double-free or invalid-free in kfree+0xd5/0x310
CPU: 0 PID: 4618 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #17
Call Trace:
dump_stack_lvl+0x1b2/0x21b
print_address_description+0x8d/0x3b0
kasan_report_invalid_free+0x58/0x130
____kasan_slab_free+0x14b/0x170
__kasan_slab_free+0x11/0x20
slab_free_freelist_hook+0xcc/0x1a0
kfree+0xd5/0x310
io_dismantle_req+0x9b0/0xd90
io_do_iopoll+0x13a4/0x23e0
io_iopoll_try_reap_events+0x116/0x290
io_uring_cancel_task_requests+0x197d/0x1ee0
io_uring_flush+0x170/0x6d0
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
exit_files+0x80/0xa0
do_exit+0x6d9/0x2390
do_group_exit+0x16a/0x2d0
get_signal+0x133e/0x1f80
arch_do_signal+0x7b/0x610
exit_to_user_mode_prepare+0xaa/0xe0
syscall_exit_to_user_mode+0x24/0x40
do_syscall_64+0x3d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 4611:
____kasan_kmalloc+0xcd/0x100
__kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x208/0x390
io_uring_alloc_task_context+0x57/0x550
io_uring_add_task_file+0x1f7/0x290
io_uring_create+0x2195/0x3490
__x64_sys_io_uring_setup+0x1bf/0x280
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88810732b500
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 88 bytes inside of
192-byte region [ffff88810732b500, ffff88810732b5c0)
Kernel panic - not syncing: panic_on_warn set ...
This issue bisected to this commit:
commit 186725a80c4e ("io_uring: fix skipping disabling sqo on exec")
Simple reverting the offending commit doesn't work as it hits some
other, related issues like:
/* sqo_dead check is for when this happens after cancellation */
WARN_ON_ONCE(ctx->sqo_task == current && !ctx->sqo_dead &&
!xa_load(&tctx->xa, (unsigned long)file));
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5622 at fs/io_uring.c:8960 io_uring_flush+0x5bc/0x6d0
Modules linked in:
CPU: 1 PID: 5622 Comm: repro Not tainted 5.10.76-05281-g4944ec82ebb9-dirty #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
RIP: 0010:io_uring_flush+0x5bc/0x6d0
Call Trace:
filp_close+0xb0/0x150
put_files_struct+0x1d4/0x350
reset_files_struct+0x88/0xa0
bprm_execve+0x7f2/0x9f0
do_execveat_common+0x46f/0x5d0
__x64_sys_execve+0x92/0xb0
do_syscall_64+0x31/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Changing __io_uring_task_cancel() to call io_disable_sqo_submit() directly,
as the comment suggests, only if __io_uring_files_cancel() is not executed
seems to fix the issue.
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <io-uring(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+6055980d041c8ac23307(a)syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
fs/io_uring.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0736487165da..fcf9ffe9b209 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8882,20 +8882,18 @@ void __io_uring_task_cancel(void)
struct io_uring_task *tctx = current->io_uring;
DEFINE_WAIT(wait);
s64 inflight;
+ int canceled = 0;
/* make sure overflow events are dropped */
atomic_inc(&tctx->in_idle);
- /* trigger io_disable_sqo_submit() */
- if (tctx->sqpoll)
- __io_uring_files_cancel(NULL);
-
do {
/* read completions before cancelations */
inflight = tctx_inflight(tctx);
if (!inflight)
break;
__io_uring_files_cancel(NULL);
+ canceled = 1;
prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -8909,6 +8907,21 @@ void __io_uring_task_cancel(void)
finish_wait(&tctx->wait, &wait);
} while (1);
+ /*
+ * trigger io_disable_sqo_submit()
+ * if not already done by __io_uring_files_cancel()
+ */
+ if (tctx->sqpoll && !canceled) {
+ struct file *file;
+ unsigned long index;
+
+ xa_for_each(&tctx->xa, index, file) {
+ struct io_ring_ctx *ctx = file->private_data;
+
+ io_disable_sqo_submit(ctx);
+ }
+ }
+
atomic_dec(&tctx->in_idle);
io_uring_remove_task_files(tctx);
--
2.33.1
While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true,
it may not always be, and we must not rely on this.
Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies
that code like
for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
vcpu = kvm_get_vcpu(kvm, vcpu_id);
do_stuff(vcpu);
}
is not legit. The trouble is, we do actually use kvm->arch.idle_mask
like this. To fix this problem we have two options. Either use
kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id,
or switch to indexing via vcpu_idx. The latter is preferable for obvious
reasons.
Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to
indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the
same index as idle_mask lets make the same change for it as well.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array")
Cc: <stable(a)vger.kernel.org> # 3.15+
---
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/interrupt.c | 12 ++++++------
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/kvm-s390.h | 2 +-
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 161a9e12bfb8..630eab0fa176 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -957,6 +957,7 @@ struct kvm_arch{
atomic64_t cmma_dirty_pages;
/* subset of available cpu features enabled by user space */
DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
+ /* indexed by vcpu_idx */
DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
struct kvm_s390_gisa_interrupt gisa_int;
struct kvm_s390_pv pv;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index d548d60caed2..16256e17a544 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
static void __set_cpu_idle(struct kvm_vcpu *vcpu)
{
kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT);
- set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static void __unset_cpu_idle(struct kvm_vcpu *vcpu)
{
kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT);
- clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static void __reset_intercept_indicators(struct kvm_vcpu *vcpu)
@@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len)
static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask)
{
- int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus);
+ int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus);
struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int;
struct kvm_vcpu *vcpu;
- for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
- vcpu = kvm_get_vcpu(kvm, vcpu_id);
+ for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) {
+ vcpu = kvm_get_vcpu(kvm, vcpu_idx);
if (psw_ioint_disabled(vcpu))
continue;
deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24);
if (deliverable_mask) {
/* lately kicked but not yet running */
- if (test_and_set_bit(vcpu_id, gi->kicked_mask))
+ if (test_and_set_bit(vcpu_idx, gi->kicked_mask))
return;
kvm_s390_vcpu_wakeup(vcpu);
return;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4527ac7b5961..8580543c5bc3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4044,7 +4044,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu)
kvm_s390_patch_guest_per_regs(vcpu);
}
- clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask);
+ clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask);
vcpu->arch.sie_block->icptcode = 0;
cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 9fad25109b0d..ecd741ee3276 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu)
static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
{
- return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static inline int kvm_is_ucontrol(struct kvm *kvm)
base-commit: 77dd11439b86e3f7990e4c0c9e0b67dca82750ba
--
2.25.1
From: Palmer Dabbelt <palmer(a)rivosinc.com>
For non-relocatable kernels we need to be able to link the kernel at
approximately PAGE_OFFSET, thus requiring medany (as medlow requires the
code to be linked within 2GiB of 0). The inverse doesn't apply, though:
since medany code can be linked anywhere it's fine to link it close to
0, so we can support the smaller memory config.
Fixes: de5f4b8f634b ("RISC-V: Define MAXPHYSMEM_1GB only for RV32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
---
I found this when going through the savedefconfig diffs for the K210
defconfigs. I'm not entirely sure they're doing the right thing here
(they should probably be setting CMODEL_LOW to take advantage of the
better code generation), but I don't have any way to test those
platforms so I don't want to change too much.
---
arch/riscv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 821252b65f89..61f64512dcde 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -280,7 +280,7 @@ choice
depends on 32BIT
bool "1GiB"
config MAXPHYSMEM_2GB
- depends on 64BIT && CMODEL_MEDLOW
+ depends on 64BIT
bool "2GiB"
config MAXPHYSMEM_128GB
depends on 64BIT && CMODEL_MEDANY
--
2.32.0
On kdump instead of using an intermediate step to relocate the kernel,
that lives in a "control buffer" outside the current kernel's mapping,
we jump to the crash kernel directly by calling riscv_kexec_norelocate().
The current implementation uses va_pa_offset while switching to physical
addressing, however since we moved the kernel outside the linear mapping
this won't work anymore since riscv_kexec_norelocate() is part of the
kernel mapping and we should use kernel_map.va_kernel_pa_offset, and also
take XIP kernel into account.
We don't really need to use va_pa_offset on riscv_kexec_norelocate, we
can just set STVEC to the physical address of the new kernel instead and
let the hart jump to the new kernel on the next instruction after setting
SATP to zero. This fixes kdump and is also simpler/cleaner.
I tested this on the latest qemu and HiFive Unmatched and works as
expected.
v2: I removed the direct jump after setting satp as suggested.
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Nick Kossifidis <mick(a)ics.forth.gr>
Reviewed-by: Alexandre Ghiti <alex(a)ghiti.fr>
Cc: <stable(a)vger.kernel.org> # 5.13
Cc: <stable(a)vger.kernel.org> # 5.14
---
arch/riscv/kernel/kexec_relocate.S | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index a80b52a74..059c5e216 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -159,25 +159,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
- * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
*/
mv s0, a1
mv s1, a2
mv s2, a3
- mv s3, a4
/* Disable / cleanup interrupts */
csrw CSR_SIE, zero
csrw CSR_SIP, zero
- /* Switch to physical addressing */
- la s4, 1f
- sub s4, s4, s3
- csrw CSR_STVEC, s4
- csrw CSR_SATP, zero
-
-.align 2
-1:
/* Pass the arguments to the next kernel / Cleanup*/
mv a0, s2
mv a1, s1
@@ -214,7 +204,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero
- jalr zero, a2, 0
+ /*
+ * Switch to physical addressing
+ * This will also trigger a jump to CSR_STVEC
+ * which in this case is the address of the new
+ * kernel.
+ */
+ csrw CSR_STVEC, a2
+ csrw CSR_SATP, zero
+
SYM_CODE_END(riscv_kexec_norelocate)
.section ".rodata"
--
2.32.0
From: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
The user recently report a perf issue in the ICX platform, when test by
perf event “uncore_imc_x/cas_count_write”,the write bandwidth is always
very small (only 0.38MB/s), it is caused by the wrong "umask" for the
"cas_count_write" event. When double-checking, find "cas_count_read"
also is wrong.
The public document for ICX uncore:
https://www.intel.com/content/www/us/en/develop/download/3rd-gen-intel-xeon…
On page 142, Table 2-143, defines Unit Masks for CAS_COUNT:
RD b00001111
WR b00110000
So Corrected both "cas_count_read" and "cas_count_write" for ICX.
Old settings:
hswep_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x03")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x0c")
New settings:
snr_uncore_imc_events
INTEL_UNCORE_EVENT_DESC(cas_count_read, "event=0x04,umask=0x0f")
INTEL_UNCORE_EVENT_DESC(cas_count_write, "event=0x04,umask=0x30"),
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
Change log:
v3:
* Add change log
v2:
* Add stable tag
arch/x86/events/intel/uncore_snbep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 5ddc0f30db6f..a6fd8eb410a9 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5468,7 +5468,7 @@ static struct intel_uncore_type icx_uncore_imc = {
.fixed_ctr_bits = 48,
.fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR,
.fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL,
- .event_descs = hswep_uncore_imc_events,
+ .event_descs = snr_uncore_imc_events,
.perf_ctr = SNR_IMC_MMIO_PMON_CTR0,
.event_ctl = SNR_IMC_MMIO_PMON_CTL0,
.event_mask = SNBEP_PMON_RAW_EVENT_MASK,
--
2.25.1
From: Aditya Garg <redecorating(a)protonmail.com>
Some devices have a bug causing them to not work if they query LE tx power on startup. Thus we add a quirk in order to not query it and default min/max tx power values to HCI_TX_POWER_INVALID.
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
Tested-by: Aditya Garg <gargaditya08(a)live.com>
---
include/net/bluetooth/hci.h | 9 +++++++++
net/bluetooth/hci_core.c | 3 ++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 63065bc01b766c..383342efcdc464 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -246,6 +246,15 @@ enum {
* HCI after resume.
*/
HCI_QUIRK_NO_SUSPEND_NOTIFIER,
+
+ /*
+ * When this quirk is set, LE tx power is not queried on startup
+ * and the min/max tx power values default to HCI_TX_POWER_INVALID.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_BROKEN_READ_TRANSMIT_POWER,
};
/* HCI device flags */
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 8d33aa64846b1c..434c6878fe9640 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -619,7 +619,8 @@ static int hci_init3_req(struct hci_request *req, unsigned long opt)
hci_req_add(req, HCI_OP_LE_READ_ADV_TX_POWER, 0, NULL);
}
- if (hdev->commands[38] & 0x80) {
+ if (hdev->commands[38] & 0x80 &&
+ !test_bit(HCI_QUIRK_BROKEN_READ_TRANSMIT_POWER, &hdev->quirks)) {
/* Read LE Min/Max Tx Power*/
hci_req_add(req, HCI_OP_LE_READ_TRANSMIT_POWER,
0, NULL);
When replugging the device the following message shows up:
gpio gpiochip2: (dln2): detected irqchip that is shared with multiple gpiochips: please fix the driver.
This also has the effect that interrupts won't work.
The same problem would also show up if multiple devices where plugged in.
Fix this by allocating the irq_chip data structure per instance like other
drivers do.
I don't know when this problem appeared, but it is present in 5.10.
Cc: <stable(a)vger.kernel.org> # 5.10+
Cc: Daniel Baluta <daniel.baluta(a)gmail.com>
Signed-off-by: Noralf Trønnes <noralf(a)tronnes.org>
---
drivers/gpio/gpio-dln2.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/gpio/gpio-dln2.c b/drivers/gpio/gpio-dln2.c
index 4c5f6d0c8d74..22f11dd5210d 100644
--- a/drivers/gpio/gpio-dln2.c
+++ b/drivers/gpio/gpio-dln2.c
@@ -46,6 +46,7 @@
struct dln2_gpio {
struct platform_device *pdev;
struct gpio_chip gpio;
+ struct irq_chip irqchip;
/*
* Cache pin direction to save us one transfer, since the hardware has
@@ -383,15 +384,6 @@ static void dln2_irq_bus_unlock(struct irq_data *irqd)
mutex_unlock(&dln2->irq_lock);
}
-static struct irq_chip dln2_gpio_irqchip = {
- .name = "dln2-irq",
- .irq_mask = dln2_irq_mask,
- .irq_unmask = dln2_irq_unmask,
- .irq_set_type = dln2_irq_set_type,
- .irq_bus_lock = dln2_irq_bus_lock,
- .irq_bus_sync_unlock = dln2_irq_bus_unlock,
-};
-
static void dln2_gpio_event(struct platform_device *pdev, u16 echo,
const void *data, int len)
{
@@ -477,8 +469,15 @@ static int dln2_gpio_probe(struct platform_device *pdev)
dln2->gpio.direction_output = dln2_gpio_direction_output;
dln2->gpio.set_config = dln2_gpio_set_config;
+ dln2->irqchip.name = "dln2-irq",
+ dln2->irqchip.irq_mask = dln2_irq_mask,
+ dln2->irqchip.irq_unmask = dln2_irq_unmask,
+ dln2->irqchip.irq_set_type = dln2_irq_set_type,
+ dln2->irqchip.irq_bus_lock = dln2_irq_bus_lock,
+ dln2->irqchip.irq_bus_sync_unlock = dln2_irq_bus_unlock,
+
girq = &dln2->gpio.irq;
- girq->chip = &dln2_gpio_irqchip;
+ girq->chip = &dln2->irqchip;
/* The event comes from the outside so no parent handler */
girq->parent_handler = NULL;
girq->num_parents = 0;
--
2.33.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cecbc0c7eba7983965cac94f88d2db00b913253b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Fri, 29 Oct 2021 22:18:02 +0300
Subject: [PATCH] drm/i915/hdmi: Turn DP++ TMDS output buffers back on in
encoder->shutdown()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Looks like our VBIOS/GOP generally fail to turn the DP dual mode adater
TMDS output buffers back on after a reboot. This leads to a black screen
after reboot if we turned the TMDS output buffers off prior to reboot.
And if i915 decides to do a fastboot the black screen will persist even
after i915 takes over.
Apparently this has been a problem ever since commit b2ccb822d376 ("drm/i915:
Enable/disable TMDS output buffers in DP++ adaptor as needed") if one
rebooted while the display was turned off. And things became worse with
commit fe0f1e3bfdfe ("drm/i915: Shut down displays gracefully on reboot")
since now we always turn the display off before a reboot.
This was reported on a RKL, but I confirmed the same behaviour on my
SNB as well. So looks pretty universal.
Let's fix this by explicitly turning the TMDS output buffers back on
in the encoder->shutdown() hook. Note that this gets called after irqs
have been disabled, so the i2c communication with the DP dual mode
adapter has to be performed via polling (which the gmbus code is
perfectly happy to do for us).
We also need a bit of care in handling DDI encoders which may or may
not be set up for HDMI output. Specifically ddc_pin will not be
populated for a DP only DDI encoder, in which case we don't want to
call intel_gmbus_get_adapter(). We can handle that by simply doing
the dual mode adapter type check before calling
intel_gmbus_get_adapter().
Cc: <stable(a)vger.kernel.org> # v5.11+
Fixes: fe0f1e3bfdfe ("drm/i915: Shut down displays gracefully on reboot")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4371
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211029191802.18448-2-ville.…
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
(cherry picked from commit 49c55f7b035b87371a6d3c53d9af9f92ddc962db)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/g4x_hdmi.c b/drivers/gpu/drm/i915/display/g4x_hdmi.c
index 88c427f3c346..f5b4dd5b4275 100644
--- a/drivers/gpu/drm/i915/display/g4x_hdmi.c
+++ b/drivers/gpu/drm/i915/display/g4x_hdmi.c
@@ -584,6 +584,7 @@ void g4x_hdmi_init(struct drm_i915_private *dev_priv,
else
intel_encoder->enable = g4x_enable_hdmi;
}
+ intel_encoder->shutdown = intel_hdmi_encoder_shutdown;
intel_encoder->type = INTEL_OUTPUT_HDMI;
intel_encoder->power_domain = intel_port_to_power_domain(port);
diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index 1dcfe31e6c6f..cfb567df71b3 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -4361,6 +4361,7 @@ static void intel_ddi_encoder_shutdown(struct intel_encoder *encoder)
enum phy phy = intel_port_to_phy(i915, encoder->port);
intel_dp_encoder_shutdown(encoder);
+ intel_hdmi_encoder_shutdown(encoder);
if (!intel_phy_is_tc(i915, phy))
return;
diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c b/drivers/gpu/drm/i915/display/intel_hdmi.c
index d2e61f6c6e08..371736bdc01f 100644
--- a/drivers/gpu/drm/i915/display/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
@@ -1246,12 +1246,13 @@ static void hsw_set_infoframes(struct intel_encoder *encoder,
void intel_dp_dual_mode_set_tmds_output(struct intel_hdmi *hdmi, bool enable)
{
struct drm_i915_private *dev_priv = intel_hdmi_to_i915(hdmi);
- struct i2c_adapter *adapter =
- intel_gmbus_get_adapter(dev_priv, hdmi->ddc_bus);
+ struct i2c_adapter *adapter;
if (hdmi->dp_dual_mode.type < DRM_DP_DUAL_MODE_TYPE2_DVI)
return;
+ adapter = intel_gmbus_get_adapter(dev_priv, hdmi->ddc_bus);
+
drm_dbg_kms(&dev_priv->drm, "%s DP dual mode adaptor TMDS output\n",
enable ? "Enabling" : "Disabling");
@@ -2258,6 +2259,17 @@ int intel_hdmi_compute_config(struct intel_encoder *encoder,
return 0;
}
+void intel_hdmi_encoder_shutdown(struct intel_encoder *encoder)
+{
+ struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(encoder);
+
+ /*
+ * Give a hand to buggy BIOSen which forget to turn
+ * the TMDS output buffers back on after a reboot.
+ */
+ intel_dp_dual_mode_set_tmds_output(intel_hdmi, true);
+}
+
static void
intel_hdmi_unset_edid(struct drm_connector *connector)
{
diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.h b/drivers/gpu/drm/i915/display/intel_hdmi.h
index b43a180d007e..2bf440eb400a 100644
--- a/drivers/gpu/drm/i915/display/intel_hdmi.h
+++ b/drivers/gpu/drm/i915/display/intel_hdmi.h
@@ -28,6 +28,7 @@ void intel_hdmi_init_connector(struct intel_digital_port *dig_port,
int intel_hdmi_compute_config(struct intel_encoder *encoder,
struct intel_crtc_state *pipe_config,
struct drm_connector_state *conn_state);
+void intel_hdmi_encoder_shutdown(struct intel_encoder *encoder);
bool intel_hdmi_handle_sink_scrambling(struct intel_encoder *encoder,
struct drm_connector *connector,
bool high_tmds_clock_ratio,
Add missing power-domain "mxc" required by CDSP PAS remoteproc on SM8350
SoC.
Fixes: e8b4e9a21af7 ("remoteproc: qcom: pas: Add SM8350 PAS remoteprocs")
Signed-off-by: Sibi Sankar <sibis(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
The device tree and pas documentation lists mcx as a required pd for cdsp.
Looks like it was missed while adding the proxy pds in the pas driver.
Bjorn/Vinod you'll need to test this patch before picking it up.
drivers/remoteproc/qcom_q6v5_pas.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index b921fc26cd04..ad20065dbdea 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -661,6 +661,7 @@ static const struct adsp_data sm8350_cdsp_resource = {
},
.proxy_pd_names = (char*[]){
"cx",
+ "mxc",
NULL
},
.ssr_name = "cdsp",
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On Wed, Oct 20, 2021 at 12:23:26PM +0200, Hans de Goede wrote:
> On 10/19/21 23:52, Bjorn Helgaas wrote:
> > On Thu, Oct 14, 2021 at 08:39:42PM +0200, Hans de Goede wrote:
> >> Some BIOS-es contain a bug where they add addresses which map to system
> >> RAM in the PCI host bridge window returned by the ACPI _CRS method, see
> >> commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
> >> space").
> >>
> >> To work around this bug Linux excludes E820 reserved addresses when
> >> allocating addresses from the PCI host bridge window since 2010.
> >> ...
> > I haven't seen anybody else eager to merge this, so I guess I'll stick
> > my neck out here.
> >
> > I applied this to my for-linus branch for v5.15.
>
> Thank you, and sorry about the build-errors which the lkp
> kernel-test-robot found.
>
> I've just send out a patch which fixes these build-errors
> (verified with both .config-s from the lkp reports).
> Feel free to squash this into the original patch (or keep
> them separate, whatever works for you).
Thanks, I squashed the fix in.
HOWEVER, I think it would be fairly risky to push this into v5.15.
We would be relying on the assumption that current machines have all
fixed the BIOS defect that 4dc2287c1805 addressed, and we have little
evidence for that.
I'm not sure there's significant benefit to having this in v5.15.
Yes, the mainline v5.15 kernel would work on the affected machines,
but I suspect most people with those machines are running distro
kernels, not mainline kernels.
This issue has been around a long time, so it's not like a regression
that we just introduced. If we fixed these machines and regressed
*other* machines, we'd be worse off than we are now.
Convince me otherwise if you see this differently :)
In the meantime, here's another possibility for working around this.
What if we discarded remove_e820_regions() completely, but aligned the
problem _CRS windows a little more? The 4dc2287c1805 case was this:
BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
where the _CRS window was of size 0x20100000, i.e., 512M + 1M. At
least in this particular case, we could avoid the problem by throwing
away that first 1M and aligning the window to a nice 3G boundary.
Maybe it would be worth giving up a small fraction (less than 0.2% in
this case) of questionable windows like this?
Bjorn
The patch titled
Subject: mm: fix panic in __alloc_pages
has been added to the -mm tree. Its filename is
mm-fix-panic-in-__alloc_pages.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-panic-in-__alloc_pages.pat…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-panic-in-__alloc_pages.pat…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexey Makhalov <amakhalov(a)vmware.com>
Subject: mm: fix panic in __alloc_pages
There is a kernel panic caused by pcpu_alloc_pages() passing offlined and
uninitialized node to alloc_pages_node() leading to panic by NULL
dereferencing uninitialized NODE_DATA(nid).
CPU2 has been hot-added
BUG: unable to handle page fault for address: 0000000000001608
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: systemd Tainted: G E 5.15.0-rc7+ #11
Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW
RIP: 0010:__alloc_pages+0x127/0x290
Code: 4c 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 e0 48 8b 55 b8 c1 e8 0c 83 e0 01 88 45 d0 4c 89 c8 48 85 d2 0f 85 1a 01 00 00 <45> 3b 41 08 0f 82 10 01 00 00 48 89 45 c0 48 8b 00 44 89 e2 81 e2
RSP: 0018:ffffc900006f3bc8 EFLAGS: 00010246
RAX: 0000000000001600 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000cc2
RBP: ffffc900006f3c18 R08: 0000000000000001 R09: 0000000000001600
R10: ffffc900006f3a40 R11: ffff88813c9fffe8 R12: 0000000000000cc2
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000cc2
FS: 00007f27ead70500(0000) GS:ffff88807ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000001608 CR3: 000000000582c003 CR4: 00000000001706b0
Call Trace:
pcpu_alloc_pages.constprop.0+0xe4/0x1c0
pcpu_populate_chunk+0x33/0xb0
pcpu_alloc+0x4d3/0x6f0
__alloc_percpu_gfp+0xd/0x10
alloc_mem_cgroup_per_node_info+0x54/0xb0
mem_cgroup_alloc+0xed/0x2f0
mem_cgroup_css_alloc+0x33/0x2f0
css_create+0x3a/0x1f0
cgroup_apply_control_enable+0x12b/0x150
cgroup_mkdir+0xdd/0x110
kernfs_iop_mkdir+0x4f/0x80
vfs_mkdir+0x178/0x230
do_mkdirat+0xfd/0x120
__x64_sys_mkdir+0x47/0x70
? syscall_exit_to_user_mode+0x21/0x50
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Panic can be easily reproduced by disabling udev rule for automatic
onlining hot added CPU followed by CPU with memoryless node (NUMA node
with CPU only) hot add.
Hot adding CPU and memoryless node does not bring the node to online
state. Memoryless node will be onlined only during the onlining its CPU.
Node can be in one of the following states:
1. not present.(nid == NUMA_NO_NODE)
2. present, but offline (nid > NUMA_NO_NODE, node_online(nid) == 0,
NODE_DATA(nid) == NULL)
3. present and online (nid > NUMA_NO_NODE, node_online(nid) > 0,
NODE_DATA(nid) != NULL)
Percpu code is doing allocations for all possible CPUs. The issue happens
when it serves hot added but not yet onlined CPU when its node is in 2nd
state. This node is not ready to use, fallback to numa_mem_id().
Link: https://lkml.kernel.org/r/20211108202325.20304-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov(a)vmware.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/percpu-vm.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/percpu-vm.c~mm-fix-panic-in-__alloc_pages
+++ a/mm/percpu-vm.c
@@ -84,15 +84,19 @@ static int pcpu_alloc_pages(struct pcpu_
gfp_t gfp)
{
unsigned int cpu, tcpu;
- int i;
+ int i, nid;
gfp |= __GFP_HIGHMEM;
for_each_possible_cpu(cpu) {
+ nid = cpu_to_node(cpu);
+ if (nid == NUMA_NO_NODE || !node_online(nid))
+ nid = numa_mem_id();
+
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
- *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+ *pagep = alloc_pages_node(nid, gfp, 0);
if (!*pagep)
goto err;
}
_
Patches currently in -mm which might be from amakhalov(a)vmware.com are
mm-fix-panic-in-__alloc_pages.patch
Commit a799c2bd29d19c565 ("x86/setup: Consolidate early memory
reservations") introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately the
call of early_reserve_memory() is done too late for Xen dom0, as in
some cases a Xen hook called by e820__memory_setup() will need those
memory reservations to have happened already.
Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.
Cc: stable(a)vger.kernel.org
Fixes: a799c2bd29d19c565 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
V2:
- update comment (Jan Beulich, Boris Petkov)
- move call down in setup_arch() (Mike Galbraith)
---
arch/x86/kernel/setup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 79f164141116..40ed44ead063 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -830,6 +830,20 @@ void __init setup_arch(char **cmdline_p)
x86_init.oem.arch_setup();
+ /*
+ * Do some memory reservations *before* memory is added to memblock, so
+ * memblock allocations won't overwrite it.
+ *
+ * After this point, everything still needed from the boot loader or
+ * firmware or kernel text should be early reserved or marked not RAM in
+ * e820. All other memory is free game.
+ *
+ * This call needs to happen before e820__memory_setup() which calls the
+ * xen_memory_setup() on Xen dom0 which relies on the fact that those
+ * early reservations have happened already.
+ */
+ early_reserve_memory();
+
iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
e820__memory_setup();
parse_setup_data();
@@ -876,18 +890,6 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
- /*
- * Do some memory reservations *before* memory is added to
- * memblock, so memblock allocations won't overwrite it.
- * Do it after early param, so we could get (unlikely) panic from
- * serial.
- *
- * After this point everything still needed from the boot loader or
- * firmware or kernel text should be early reserved or marked not
- * RAM in e820. All other memory is free game.
- */
- early_reserve_memory();
-
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Memory used by the kernel cannot be hot-removed because Linux
--
2.26.2
All of entries are freed in a loop, however, the freed entry is accessed
by list_del() after the loop.
When epf driver that includes pci-epf-test unload, the pci_epf_remove_cfs()
is called, and occurred the use after free. Therefore, kernel panics
randomly after or while the module unloading.
I tested this patch with r8a77951-Salvator-xs boards.
Fixes: ef1433f ("PCI: endpoint: Create configfs entry for each pci_epf_device_id table entry")
Signed-off-by: Shunsuke Mie <mie(a)igel.co.jp>
---
drivers/pci/endpoint/pci-epf-core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/pci-epf-core.c b/drivers/pci/endpoint/pci-epf-core.c
index e9289d10f822..538e902b0ba6 100644
--- a/drivers/pci/endpoint/pci-epf-core.c
+++ b/drivers/pci/endpoint/pci-epf-core.c
@@ -202,8 +202,10 @@ static void pci_epf_remove_cfs(struct pci_epf_driver *driver)
return;
mutex_lock(&pci_epf_mutex);
- list_for_each_entry_safe(group, tmp, &driver->epf_group, group_entry)
+ list_for_each_entry_safe(group, tmp, &driver->epf_group, group_entry) {
+ list_del(&group->group_entry);
pci_ep_cfs_remove_epf_group(group);
+ }
list_del(&driver->epf_group);
mutex_unlock(&pci_epf_mutex);
}
--
2.17.1
Patch 1/2 fixes the issue reported by Nikita where a MAC address
wouldn't match if given as first field of a set, and patch 2/2 adds
the corresponding test.
Stefano Brivio (2):
nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit
groups
selftests: netfilter: Add correctness test for mac,net set type
net/netfilter/nft_set_pipapo_avx2.c | 2 +-
.../selftests/netfilter/nft_concat_range.sh | 24 ++++++++++++++++---
2 files changed, 22 insertions(+), 4 deletions(-)
--
2.30.2
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing memory with shmem_truncate_range(), as soon as
a page is faulted back to the EPC. In addition, free the memory for
PCMD pages as soon as all PCMD's in a page have been marked as unused
by zeroing its contents.
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
* Rewrite commit message as proposed by Dave.
* Truncate PCMD pages (Dave).
---
arch/x86/kernel/cpu/sgx/encl.c | 48 +++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ea43c10e5458 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,27 @@
#include "encls.h"
#include "sgx.h"
+
+/*
+ * Get the page number of the page in the backing storage, which stores the PCMD
+ * of the enclave page in the given page index. PCMD pages are located after
+ * the backing storage for the visible enclave pages and SECS.
+ */
+static inline pgoff_t sgx_encl_get_backing_pcmd_nr(struct sgx_encl *encl, pgoff_t index)
+{
+ return PFN_DOWN(encl->size) + 1 + (index / sizeof(struct sgx_pcmd));
+}
+
+/*
+ * Free a page from the backing storage in the given page index.
+ */
+static inline void sgx_encl_truncate_backing_page(struct sgx_encl *encl, pgoff_t index)
+{
+ struct inode *inode = file_inode(encl->backing);
+
+ shmem_truncate_range(inode, PFN_PHYS(index), PFN_PHYS(index) + PAGE_SIZE - 1);
+}
+
/*
* ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC
* Pages" in the SDM.
@@ -24,7 +45,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
struct sgx_encl *encl = encl_page->encl;
struct sgx_pageinfo pginfo;
struct sgx_backing b;
+ bool pcmd_page_empty;
pgoff_t page_index;
+ pgoff_t pcmd_index;
+ u8 *pcmd_page;
int ret;
if (secs_page)
@@ -38,8 +62,8 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
pginfo.addr = encl_page->desc & PAGE_MASK;
pginfo.contents = (unsigned long)kmap_atomic(b.contents);
- pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
- b.pcmd_offset;
+ pcmd_page = kmap_atomic(b.pcmd);
+ pginfo.metadata = (unsigned long)pcmd_page + b.pcmd_offset;
if (secs_page)
pginfo.secs = (u64)sgx_get_epc_virt_addr(secs_page);
@@ -55,11 +79,27 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
ret = -EFAULT;
}
- kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+ memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
+
+ /*
+ * The area for the PCMD in the page was zeroed above. Check if the
+ * whole page is now empty meaning that all PCMD's have been zeroed:
+ */
+ pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
+
+ kunmap_atomic(pcmd_page);
kunmap_atomic((void *)(unsigned long)pginfo.contents);
sgx_encl_put_backing(&b, false);
+ /* Free the backing memory. */
+ sgx_encl_truncate_backing_page(encl, page_index);
+
+ if (pcmd_page_empty) {
+ pcmd_index = sgx_encl_get_backing_pcmd_nr(encl, page_index);
+ sgx_encl_truncate_backing_page(encl, pcmd_index);
+ }
+
return ret;
}
@@ -577,7 +617,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
struct sgx_backing *backing)
{
- pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
+ pgoff_t pcmd_index = sgx_encl_get_backing_pcmd_nr(encl, page_index);
struct page *contents;
struct page *pcmd;
--
2.32.0
From: Juergen Gross <jgross(a)suse.com>
[ Upstream commit 897919ad8b42eb8222553838ab82414a924694aa ]
This configuration option provides a misc device as an API to userspace.
Make this API usable without having to select the module as a transitive
dependency.
This also fixes an issue where localyesconfig would select
CONFIG_XEN_PRIVCMD=m because it was not visible and defaulted to
building as module.
[boris: clarified help message per Jan's suggestion]
Based-on-patch-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
Link: https://lore.kernel.org/r/20211116143323.18866-1-jgross@suse.com
Reviewed-by: Jan Beulich <jbeulich(a)suse.com>
Reviewed-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/xen/Kconfig | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 3a14948269b1e..59f862350a6ec 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -199,9 +199,15 @@ config XEN_SCSI_BACKEND
if guests need generic access to SCSI devices.
config XEN_PRIVCMD
- tristate
+ tristate "Xen hypercall passthrough driver"
depends on XEN
default m
+ help
+ The hypercall passthrough driver allows privileged user programs to
+ perform Xen hypercalls. This driver is normally required for systems
+ running as Dom0 to perform privileged operations, but in some
+ disaggregated Xen setups this driver might be needed for other
+ domains, too.
config XEN_STUB
bool "Xen stub drivers"
--
2.33.0
From: Bob Peterson <rpeterso(a)redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso(a)redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6a355e1347d7f..d2b7ecbd1b150 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1438,13 +1438,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1457,6 +1450,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
From: Bob Peterson <rpeterso(a)redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso(a)redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6e00d15ef0a82..cc51b5f5f52d8 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1402,13 +1402,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1421,6 +1414,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory.
As part of adding this new entry, a new EFI memory map is allocated and
mapped. The mapping is where a problem can occur. This new EFI memory map
is mapped using early_memremap(). If the allocated memory comes from an
area that is marked as EFI_BOOT_SERVICES_DATA memory in the current EFI
memory map, then it will be mapped unencrypted (see memremap_is_efi_data()
and the call to efi_mem_type()).
However, during replacement of the old EFI memory map with the new EFI
memory map, efi_mem_type() is disabled, resulting in the new EFI memory
map always being mapped encrypted in efi.memmap. This will cause a kernel
crash later in the boot.
Since it is known that the new EFI memory map will always be mapped
encrypted when efi_memmap_install() is called, explicitly map the new EFI
memory map as encrypted (using early_memremap_prot()) when inserting the
new memory map entry.
Cc: <stable(a)vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
---
Changes for v2:
- Update/expand commit message to (hopefully) make it easier to read and
understand
- Add a comment around the use of the early_memremap_prot() call
---
arch/x86/platform/efi/quirks.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index b15ebfe40a73..14f8f20d727a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -277,7 +277,19 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
return;
}
- new = early_memremap(data.phys_map, data.size);
+ /*
+ * When SME is active, early_memremap() can map the memory unencrypted
+ * if the allocation came from EFI_BOOT_SERVICES_DATA (see
+ * memremap_is_efi_data() and the call to efi_mem_type()). However,
+ * when efi_memmap_install() is called to replace the memory map,
+ * efi_mem_type() is "disabled" and so the memory will always be mapped
+ * encrypted. To avoid this possible mismatch between the mappings,
+ * always map the newly allocated memmap memory as encrypted.
+ *
+ * When SME is not active, this behaves just like early_memremap().
+ */
+ new = early_memremap_prot(data.phys_map, data.size,
+ pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
--
2.33.1
Add the missing bulk-endpoint max-packet sanity check to probe() to
avoid division by zero in uvc_alloc_urb_buffers() in case a malicious
device has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver")
Cc: stable(a)vger.kernel.org # 2.6.26
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/media/usb/uvc/uvc_video.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index e16464606b14..85ac5c1081b6 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1958,6 +1958,10 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
if (ep == NULL)
return -EIO;
+ /* Reject broken descriptors. */
+ if (usb_endpoint_maxp(&ep->desc) == 0)
+ return -EIO;
+
ret = uvc_init_video_bulk(stream, ep, gfp_flags);
}
--
2.32.0
Summary: When attempting to rise or shut down a NIC manually or via
network-manager under 5.15, the machine reboots or freezes.
Occurs with: 5.15.4-051504-generic and earlier 5.15 mainline (
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.4/) as well as
liquorix flavours.
Does not occur with: 5.14 and 5.13 (both with various flavours)
Hi all,
I'm experiencing a severe bug that causes the machine to reboot or
freeze when trying to login and/or rise/shutdown a NIC. Here's a brief
description of scenarios I've tested:
Scenario 1: enp6s0 managed manually using /etc/networking/interfaces,
DHCP
a. Issuing ifdown enp6s0 in terminal will throw
"/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is
not a symbolic link to /run/resolvconf/resolv.conf"
and cause the machine to reboot after ~10s of showing a blinking cursor
b. Issuing shutdown -h now or trying to shutdown/reboot machine via
GUI:
shutdown will stop on "stop job is running for ifdown enp6s0" and after
approx. 10..15s the countdown freezes. Repeated ALT-SysReq-REISUB does
not reboot the machine, a hard reset is required.
--
Scenario 2: enp6s0 managed manually using /etc/networking/interfaces,
STATIC
a. Issuing ifdown enp6s0 in terminal will throw
"send_packet: Operation not permitted
dhclient.c:3010: Failed to send 300 byte long packet over
fallback interface."
and cause the machine to reboot after ~10s of blinking cursor.
b. Issuing shutdown -h now or trying to shutdown or reboot machine via
GUI: shutdown will stop on "stop job is running for ifdown enp6s0" and
after approx. 10..15s the countdown freezes. Repeated ALT-SysReq-REISUB
does not reboot the machine, a hard reset is required.
--
Scenario 3: enp6s0 managed by network manager
a. After booting and logging in either via GUI or TTY, the display will
stay blank and only show a blinking cursor and then freeze after
5..10s. ALT-SysReq-REISUB does not reboot the machine, a hard reset is
required.
--
Here's a snippet from the journal for Scenario 1a:
Nov 21 10:39:25 computer sudo[5606]: user : TTY=pts/0 ;
PWD=/home/user ; USER=root ; COMMAND=/usr/sbin/ifdown enp6s0
Nov 21 10:39:25 computer sudo[5606]: pam_unix(sudo:session): session
opened for user root by (uid=0)
-- Reboot --
Nov 21 10:40:14 computer systemd-journald[478]: Journal started
--
I'm running Alder Lake i9 12900K but I have E-cores disabled in BIOS.
Here are some more specs with working kernel:
$ inxi -bxz
System: Kernel: 5.14.0-19.2-liquorix-amd64 x86_64 bits: 64 compiler:
N/A Desktop: Xfce 4.16.3
Distro: Ubuntu 20.04.3 LTS (Focal Fossa)
Machine: Type: Desktop System: ASUS product: N/A v: N/A serial: N/A
Mobo: ASUSTeK model: ROG STRIX Z690-A GAMING WIFI D4 v: Rev
1.xx serial: <filter>
UEFI [Legacy]: American Megatrends v: 0707 date: 11/10/2021
CPU: 8-Core: 12th Gen Intel Core i9-12900K type: MT MCP arch: N/A
speed: 5381 MHz max: 3201 MHz
Graphics: Device-1: NVIDIA vendor: Gigabyte driver: nvidia v: 470.86
bus ID: 01:00.0
Display: server: X.Org 1.20.11 driver: nvidia tty: N/A
Message: Unable to show advanced data. Required tool glxinfo
missing.
Network: Device-1: Intel vendor: ASUSTeK driver: igc v: kernel port:
4000 bus ID: 06:00.0
Please advice how I may assist in debugging!
Thanks.
From: Guangming <Guangming.Cao(a)mediatek.com>
For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
free flow.
However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
created entries in the DMA adderess space.
So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
Here we should use sg_table.orig_nents to free pages memory, but use the
sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
helper 'for_each_sg' which maybe cause memory leak) is much better.
Fixes: d963ab0f15fb0 ("dma-buf: system_heap: Allocate higher order pages if available")
Signed-off-by: Guangming <Guangming.Cao(a)mediatek.com>
---
drivers/dma-buf/heaps/system_heap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 23a7e74ef966..8660508f3684 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -289,7 +289,7 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
int i;
table = &buffer->sg_table;
- for_each_sg(table->sgl, sg, table->nents, i) {
+ for_each_sgtable_sg(table, sg, i) {
struct page *page = sg_page(sg);
__free_pages(page, compound_order(page));
--
2.17.1
The five patches referencing this cover letter are backports of upstream
fix "3f015d89a47c NFSv42: Fix pagecache invalidation after COPY/CLONE",
which did not apply cleanly to longterm trees.
Note: while the upstream fix received extensive testing, these backports
have been build-tested only.
Benjamin Coddington (1):
NFSv42: Fix pagecache invalidation after COPY/CLONE
fs/nfs/nfs42proc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--
2.31.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 126e8bee943e9926238c891e2df5b5573aee76bc Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:18 -0800
Subject: [PATCH] ipc: WARN if trying to remove ipc object which is absent
Patch series "shm: shm_rmid_forced feature fixes".
Some time ago I met kernel crash after CRIU restore procedure,
fortunately, it was CRIU restore, so, I had dump files and could do
restore many times and crash reproduced easily. After some
investigation I've constructed the minimal reproducer. It was found
that it's use-after-free and it happens only if sysctl
kernel.shm_rmid_forced = 1.
The key of the problem is that the exit_shm() function not handles shp's
object destroy when task->sysvshm.shm_clist contains items from
different IPC namespaces. In most cases this list will contain only
items from one IPC namespace.
How can this list contain object from different namespaces? The
exit_shm() function is designed to clean up this list always when
process leaves IPC namespace. But we made a mistake a long time ago and
did not add a exit_shm() call into the setns() syscall procedures.
The first idea was just to add this call to setns() syscall but it
obviously changes semantics of setns() syscall and that's
userspace-visible change. So, I gave up on this idea.
The first real attempt to address the issue was just to omit forced
destroy if we meet shp object not from current task IPC namespace [1].
But that was not the best idea because task->sysvshm.shm_clist was
protected by rwsem which belongs to current task IPC namespace. It
means that list corruption may occur.
Second approach is just extend exit_shm() to properly handle shp's from
different IPC namespaces [2]. This is really non-trivial thing, I've
put a lot of effort into that but not believed that it's possible to
make it fully safe, clean and clear.
Thanks to the efforts of Manfred Spraul working an elegant solution was
designed. Thanks a lot, Manfred!
Eric also suggested the way to address the issue in ("[RFC][PATCH] shm:
In shm_exit destroy all created and never attached segments") Eric's
idea was to maintain a list of shm_clists one per IPC namespace, use
lock-less lists. But there is some extra memory consumption-related
concerns.
An alternative solution which was suggested by me was implemented in
("shm: reset shm_clist on setns but omit forced shm destroy"). The idea
is pretty simple, we add exit_shm() syscall to setns() but DO NOT
destroy shm segments even if sysctl kernel.shm_rmid_forced = 1, we just
clean up the task->sysvshm.shm_clist list.
This chages semantics of setns() syscall a little bit but in comparision
to the "naive" solution when we just add exit_shm() without any special
exclusions this looks like a safer option.
[1] https://lkml.org/lkml/2021/7/6/1108
[2] https://lkml.org/lkml/2021/7/14/736
This patch (of 2):
Let's produce a warning if we trying to remove non-existing IPC object
from IPC namespace kht/idr structures.
This allows us to catch possible bugs when the ipc_rmid() function was
called with inconsistent struct ipc_ids*, struct kern_ipc_perm*
arguments.
Link: https://lkml.kernel.org/r/20211027224348.611025-1-alexander.mikhalitsyn@vir…
Link: https://lkml.kernel.org/r/20211027224348.611025-2-alexander.mikhalitsyn@vir…
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/ipc/util.c b/ipc/util.c
index d48d8cfa1f3f..fa2d86ef3fb8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -447,8 +447,8 @@ static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
static void ipc_kht_remove(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
if (ipcp->key != IPC_PRIVATE)
- rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
- ipc_kht_params);
+ WARN_ON_ONCE(rhashtable_remove_fast(&ids->key_ht, &ipcp->khtnode,
+ ipc_kht_params));
}
/**
@@ -498,7 +498,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
{
int idx = ipcid_to_idx(ipcp->id);
- idr_remove(&ids->ipcs_idr, idx);
+ WARN_ON_ONCE(idr_remove(&ids->ipcs_idr, idx) != ipcp);
ipc_kht_remove(ids, ipcp);
ids->in_use--;
ipcp->deleted = true;
Hi Greg,
it seems that many reported issues for 5.15 and 5.14 kernels are fixed
by the recent development on 5.16-rc, and it's worth to backport those
to 5.15.x stable.
Below is the list of commits from the mainline. Could you cherry-pick
them to 5.15.x stable? Applying from top to bottom.
I confirmed that they can be applied and built cleanly on 5.15.5.
4e7cf1fbb34ecb472c073980458cbe413afd4d64
9c9a3b9da891cc70405a544da6855700eddcbb71
e581f1cec4f899f788f6c9477f805b1d5fef25e2
bceee75387554f682638e719d1ea60125ea78cea
d215f63d49da9a8803af3e81acd6cad743686573
0ef74366bc150dda4f53c546dfa6e8f7c707e087
d5f871f89e21bb71827ea57bd484eedea85839a0
813a17cab9b708bbb1e0db8902e19857b57196ec
23939115be181bc5dbc33aa8471adcdbffa28910
53451b6da8271905941eb1eb369db152c4bd92f2
eee5d6f1356a016105a974fb176b491288439efa
83de8f83816e8e15227dac985163e3d433a2bf9d
The diffstat of those are like below:
sound/usb/card.h | 10 ++-
sound/usb/endpoint.c | 223 +++++++++++++++++++++++++++++++++++++--------------
sound/usb/endpoint.h | 13 ++-
sound/usb/pcm.c | 165 +++++++++++++++++++++++++++++--------
4 files changed, 306 insertions(+), 105 deletions(-)
If you prefer, I can submit the patches, too.
Thanks!
Takashi
While working on supporting the Intel HDR backlight interface, I noticed
that there's a couple of laptops that will very rarely manage to boot up
without detecting Intel HDR backlight support - even though it's supported
on the system. One example of such a laptop is the Lenovo P17 1st
generation.
Following some investigation Ville Syrjälä did through the docs they have
available to them, they discovered that there's actually supposed to be a
30ms wait after writing the source OUI before we begin setting up the rest
of the backlight interface.
This seems to be correct, as adding this 30ms delay seems to have
completely fixed the probing issues I was previously seeing. So - let's
start performing a 30ms wait after writing the OUI, which we do in a manner
similar to how we keep track of PPS delays (e.g. record the timestamp of
the OUI write, and then wait for however many ms are left since that
timestamp right before we interact with the backlight) in order to avoid
waiting any longer then we need to. As well, this also avoids us performing
this delay on systems where we don't end up using the HDR backlight
interface.
V2:
* Move panel delays into intel_pps
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 4a8d79901d5b ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)")
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.12+
---
drivers/gpu/drm/i915/display/intel_display_types.h | 4 ++++
drivers/gpu/drm/i915/display/intel_dp.c | 11 +++++++++++
drivers/gpu/drm/i915/display/intel_dp.h | 2 ++
drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c | 5 +++++
4 files changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index ea1e8a6e10b0..ad64f9caa7ff 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -1485,6 +1485,7 @@ struct intel_pps {
bool want_panel_vdd;
unsigned long last_power_on;
unsigned long last_backlight_off;
+ unsigned long last_oui_write;
ktime_t panel_power_off_time;
intel_wakeref_t vdd_wakeref;
@@ -1653,6 +1654,9 @@ struct intel_dp {
struct intel_dp_pcon_frl frl;
struct intel_psr psr;
+
+ /* When we last wrote the OUI for eDP */
+ unsigned long last_oui_write;
};
enum lspcon_vendor {
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 0a424bf69396..45318891ba07 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -29,6 +29,7 @@
#include <linux/i2c.h>
#include <linux/notifier.h>
#include <linux/slab.h>
+#include <linux/timekeeping.h>
#include <linux/types.h>
#include <asm/byteorder.h>
@@ -2010,6 +2011,16 @@ intel_edp_init_source_oui(struct intel_dp *intel_dp, bool careful)
if (drm_dp_dpcd_write(&intel_dp->aux, DP_SOURCE_OUI, oui, sizeof(oui)) < 0)
drm_err(&i915->drm, "Failed to write source OUI\n");
+
+ intel_dp->pps.last_oui_write = jiffies;
+}
+
+void intel_dp_wait_source_oui(struct intel_dp *intel_dp)
+{
+ struct drm_i915_private *i915 = dp_to_i915(intel_dp);
+
+ drm_dbg_kms(&i915->drm, "Performing OUI wait\n");
+ wait_remaining_ms_from_jiffies(intel_dp->last_oui_write, 30);
}
/* If the device supports it, try to set the power state appropriately */
diff --git a/drivers/gpu/drm/i915/display/intel_dp.h b/drivers/gpu/drm/i915/display/intel_dp.h
index ce229026dc91..b64145a3869a 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.h
+++ b/drivers/gpu/drm/i915/display/intel_dp.h
@@ -119,4 +119,6 @@ void intel_dp_pcon_dsc_configure(struct intel_dp *intel_dp,
const struct intel_crtc_state *crtc_state);
void intel_dp_phy_test(struct intel_encoder *encoder);
+void intel_dp_wait_source_oui(struct intel_dp *intel_dp);
+
#endif /* __INTEL_DP_H__ */
diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
index 8b9c925c4c16..62c112daacf2 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
@@ -36,6 +36,7 @@
#include "intel_backlight.h"
#include "intel_display_types.h"
+#include "intel_dp.h"
#include "intel_dp_aux_backlight.h"
/* TODO:
@@ -106,6 +107,8 @@ intel_dp_aux_supports_hdr_backlight(struct intel_connector *connector)
int ret;
u8 tcon_cap[4];
+ intel_dp_wait_source_oui(intel_dp);
+
ret = drm_dp_dpcd_read(aux, INTEL_EDP_HDR_TCON_CAP0, tcon_cap, sizeof(tcon_cap));
if (ret != sizeof(tcon_cap))
return false;
@@ -204,6 +207,8 @@ intel_dp_aux_hdr_enable_backlight(const struct intel_crtc_state *crtc_state,
int ret;
u8 old_ctrl, ctrl;
+ intel_dp_wait_source_oui(intel_dp);
+
ret = drm_dp_dpcd_readb(&intel_dp->aux, INTEL_EDP_HDR_GETSET_CTRL_PARAMS, &old_ctrl);
if (ret != 1) {
drm_err(&i915->drm, "Failed to read current backlight control mode: %d\n", ret);
--
2.33.1
Building sh:rts7751r2dplus_defconfig:net,rtl8139:initrd ... failed
------------
Error log:
In file included from arch/sh/mm/init.c:25:
./arch/sh/include/asm/tlb.h:118:15: error: unknown type name 'voide'
118 | static inline voide
| ^~~~~
./arch/sh/include/asm/tlb.h: In function 'tlb_flush_pmd_range':
./arch/sh/include/asm/tlb.h:126:1: error: no return statement in function returning non-void [-Werror=return-type]
Guenter
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25737
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 862cfde7c0e2 - drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ Networking bridge: sanity - mlx5
⚡⚡⚡ Ethernet drivers sanity - mlx5
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ Networking bridge: sanity - mlx5
⚡⚡⚡ Ethernet drivers sanity - mlx5
Host 7:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25734
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 27a717ac84ce - drm/amdgpu/gfx9: switch to golden tsc registers for renoir+
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 5:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 5:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 3:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
On Wed, Nov 24, 2021 at 4:54 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote:
>
> On Tue, Nov 16, 2021 at 9:22 PM Evan Green <evgreen(a)chromium.org> wrote:
> >
> > On Tue, Nov 16, 2021 at 9:54 AM Rafael J. Wysocki <rafael(a)kernel.org> wrote:
> > >
> > > On Mon, Nov 15, 2021 at 6:13 PM Evan Green <evgreen(a)chromium.org> wrote:
> > > >
> > > > Gentle bump.
> > > >
> > > >
> > > > On Fri, Oct 29, 2021 at 12:24 PM Evan Green <evgreen(a)chromium.org> wrote:
> > > > >
> > > > > snapshot_write() is inappropriately limiting the amount of data that can
> > > > > be written in cases where a partial page has already been written. For
> > > > > example, one would expect to be able to write 1 byte, then 4095 bytes to
> > > > > the snapshot device, and have both of those complete fully (since now
> > > > > we're aligned to a page again). But what ends up happening is we write 1
> > > > > byte, then 4094/4095 bytes complete successfully.
> > > > >
> > > > > The reason is that simple_write_to_buffer()'s second argument is the
> > > > > total size of the buffer, not the size of the buffer minus the offset.
> > > > > Since simple_write_to_buffer() accounts for the offset in its
> > > > > implementation, snapshot_write() can just pass the full page size
> > > > > directly down.
> > > > >
> > > > > Signed-off-by: Evan Green <evgreen(a)chromium.org>
> > > > > ---
> > > > >
> > > > > kernel/power/user.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/power/user.c b/kernel/power/user.c
> > > > > index 740723bb388524..ad241b4ff64c58 100644
> > > > > --- a/kernel/power/user.c
> > > > > +++ b/kernel/power/user.c
> > > > > @@ -177,7 +177,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
> > > > > if (res <= 0)
> > > > > goto unlock;
> > > > > } else {
> > > > > - res = PAGE_SIZE - pg_offp;
> > > > > + res = PAGE_SIZE;
> > > > > }
> > > > >
> > > > > if (!data_of(data->handle)) {
> > > > > --
> > >
> > > Do you actually see this problem in practice?
> >
> > Yes. I may fire up another thread to explain why I'm stuck doing a
> > partial page write, and how I might be able to stop doing that in the
> > future with some kernel help. But either way, this is a bug.
>
> OK, patch applied as 5.16-rc material.
>
> I guess it should go into -stable kernels too?
Yes, putting it into -stable would make sense also. I should have CCed
them originally, doing that now.
-Evan
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Test-by: Hangbin Liu <liuhangbin(a)gmail.com>
Reported-by: Xiumei Mu <xmu(a)redhat.com>
Cc: Hangbin Liu <liuhangbin(a)gmail.com>
Cc: Toke Høiland-Jørgensen <toke(a)redhat.com>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/net/wireguard/device.c | 3 +++
drivers/net/wireguard/socket.c | 2 +-
include/net/dst_cache.h | 11 ++++++++++
net/core/dst_cache.c | 19 +++++++++++++++++
tools/testing/selftests/wireguard/netns.sh | 24 +++++++++++++++++++++-
5 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 551ddaaaf540..77e64ea6be67 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -398,6 +398,7 @@ static struct rtnl_link_ops link_ops __read_mostly = {
static void wg_netns_pre_exit(struct net *net)
{
struct wg_device *wg;
+ struct wg_peer *peer;
rtnl_lock();
list_for_each_entry(wg, &device_list, device_list) {
@@ -407,6 +408,8 @@ static void wg_netns_pre_exit(struct net *net)
mutex_lock(&wg->device_update_lock);
rcu_assign_pointer(wg->creating_net, NULL);
wg_socket_reinit(wg, NULL, NULL);
+ list_for_each_entry(peer, &wg->peer_list, peer_list)
+ wg_socket_clear_peer_endpoint_src(peer);
mutex_unlock(&wg->device_update_lock);
}
}
diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
index 8c496b747108..6f07b949cb81 100644
--- a/drivers/net/wireguard/socket.c
+++ b/drivers/net/wireguard/socket.c
@@ -308,7 +308,7 @@ void wg_socket_clear_peer_endpoint_src(struct wg_peer *peer)
{
write_lock_bh(&peer->endpoint_lock);
memset(&peer->endpoint.src6, 0, sizeof(peer->endpoint.src6));
- dst_cache_reset(&peer->endpoint_cache);
+ dst_cache_reset_now(&peer->endpoint_cache);
write_unlock_bh(&peer->endpoint_lock);
}
diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 67634675e919..df6622a5fe98 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -79,6 +79,17 @@ static inline void dst_cache_reset(struct dst_cache *dst_cache)
dst_cache->reset_ts = jiffies;
}
+/**
+ * dst_cache_reset_now - invalidate the cache contents immediately
+ * @dst_cache: the cache
+ *
+ * The caller must be sure there are no concurrent users, as this frees
+ * all dst_cache users immediately, rather than waiting for the next
+ * per-cpu usage like dst_cache_reset does. Most callers should use the
+ * higher speed lazily-freed dst_cache_reset function instead.
+ */
+void dst_cache_reset_now(struct dst_cache *dst_cache);
+
/**
* dst_cache_init - initialize the cache, allocating the required storage
* @dst_cache: the cache
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index be74ab4551c2..0ccfd5fa5cb9 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -162,3 +162,22 @@ void dst_cache_destroy(struct dst_cache *dst_cache)
free_percpu(dst_cache->cache);
}
EXPORT_SYMBOL_GPL(dst_cache_destroy);
+
+void dst_cache_reset_now(struct dst_cache *dst_cache)
+{
+ int i;
+
+ if (!dst_cache->cache)
+ return;
+
+ dst_cache->reset_ts = jiffies;
+ for_each_possible_cpu(i) {
+ struct dst_cache_pcpu *idst = per_cpu_ptr(dst_cache->cache, i);
+ struct dst_entry *dst = idst->dst;
+
+ idst->cookie = 0;
+ idst->dst = NULL;
+ dst_release(dst);
+ }
+}
+EXPORT_SYMBOL_GPL(dst_cache_reset_now);
diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index 2e5c1630885e..8a9461aa0878 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -613,6 +613,28 @@ ip0 link set wg0 up
kill $ncat_pid
ip0 link del wg0
+# Ensure that dst_cache references don't outlive netns lifetime
+ip1 link add dev wg0 type wireguard
+ip2 link add dev wg0 type wireguard
+configure_peers
+ip1 link add veth1 type veth peer name veth2
+ip1 link set veth2 netns $netns2
+ip1 addr add fd00:aa::1/64 dev veth1
+ip2 addr add fd00:aa::2/64 dev veth2
+ip1 link set veth1 up
+ip2 link set veth2 up
+waitiface $netns1 veth1
+waitiface $netns2 veth2
+ip1 -6 route add default dev veth1 via fd00:aa::2
+ip2 -6 route add default dev veth2 via fd00:aa::1
+n1 wg set wg0 peer "$pub2" endpoint [fd00:aa::2]:2
+n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::1]:1
+n1 ping6 -c 1 fd00::2
+pp ip netns delete $netns1
+pp ip netns delete $netns2
+pp ip netns add $netns1
+pp ip netns add $netns2
+
# Ensure there aren't circular reference loops
ip1 link add wg1 type wireguard
ip2 link add wg2 type wireguard
@@ -631,7 +653,7 @@ while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
done < /dev/kmsg
alldeleted=1
for object in "${!objects[@]}"; do
- if [[ ${objects["$object"]} != *createddestroyed ]]; then
+ if [[ ${objects["$object"]} != *createddestroyed && ${objects["$object"]} != *createdcreateddestroyeddestroyed ]]; then
echo "Error: $object: merely ${objects["$object"]}" >&3
alldeleted=0
fi
--
2.34.1
Before commit 86585c61972f ("hwmon: (pwm-fan) stop using legacy
PWM functions and some cleanups") pwm_apply_state() was called
unconditionally in pwm_fan_probe(). In this commit this direct
call was replaced by a call to __set_pwm(ct, MAX_PWM) which
however is a noop if ctx->pwm_value already matches the value to
set.
After probe the fan is supposed to run at full speed, and the
internal driver state suggests it does, but this isn't asserted
and depending on bootloader and pwm low-level driver, the fan
might just be off.
So drop setting pwm_value to MAX_PWM to ensure the check in
__set_pwm doesn't make it exit early and the fan goes on as
intended.
Cc: stable(a)vger.kernel.org
Fixes: 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups")
Signed-off-by: Billy Tsai <billy_tsai(a)aspeedtech.com>
---
drivers/hwmon/pwm-fan.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/hwmon/pwm-fan.c b/drivers/hwmon/pwm-fan.c
index 17518b4cab1b..f12b9a28a232 100644
--- a/drivers/hwmon/pwm-fan.c
+++ b/drivers/hwmon/pwm-fan.c
@@ -336,8 +336,6 @@ static int pwm_fan_probe(struct platform_device *pdev)
return ret;
}
- ctx->pwm_value = MAX_PWM;
-
pwm_init_state(ctx->pwm, &ctx->pwm_state);
/*
--
2.25.1
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25602
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: b6f81f5c9c8e - cifs: nosharesock should be set on new server
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 4:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 5:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 7:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 3:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
s390x:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 85b6d24646e4125c591639841169baa98a2da503 Mon Sep 17 00:00:00 2001
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Date: Fri, 19 Nov 2021 16:43:21 -0800
Subject: [PATCH] shm: extend forced shm destroy to support objects from
several IPC nses
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife…
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Manfred Spraul <manfred(a)colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Vasily Averin <vvs(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 05e22770af51..b75395ec8d52 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -131,6 +131,16 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ if (ns) {
+ if (refcount_inc_not_zero(&ns->ns.count))
+ return ns;
+ }
+
+ return NULL;
+}
+
extern void put_ipc_ns(struct ipc_namespace *ns);
#else
static inline struct ipc_namespace *copy_ipcs(unsigned long flags,
@@ -147,6 +157,11 @@ static inline struct ipc_namespace *get_ipc_ns(struct ipc_namespace *ns)
return ns;
}
+static inline struct ipc_namespace *get_ipc_ns_not_zero(struct ipc_namespace *ns)
+{
+ return ns;
+}
+
static inline void put_ipc_ns(struct ipc_namespace *ns)
{
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ba88a6987400..058d7f371e25 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -158,7 +158,7 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
* subscriptions and synchronises with wait4(). Also used in procfs. Also
* pins the final release of task.io_context. Also protects ->cpuset and
- * ->cgroup.subsys[]. And ->vfork_done.
+ * ->cgroup.subsys[]. And ->vfork_done. And ->sysvshm.shm_clist.
*
* Nests both inside and outside of read_lock(&tasklist_lock).
* It must not be nested with write_lock_irq(&tasklist_lock),
diff --git a/ipc/shm.c b/ipc/shm.c
index 4942bdd65748..b3048ebd5c31 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -62,9 +62,18 @@ struct shmid_kernel /* private to the kernel */
struct pid *shm_lprid;
struct ucounts *mlock_ucounts;
- /* The task created the shm object. NULL if the task is dead. */
+ /*
+ * The task created the shm object, for
+ * task_lock(shp->shm_creator)
+ */
struct task_struct *shm_creator;
- struct list_head shm_clist; /* list by creator */
+
+ /*
+ * List by creator. task_lock(->shm_creator) required for read/write.
+ * If list_empty(), then the creator is dead already.
+ */
+ struct list_head shm_clist;
+ struct ipc_namespace *ns;
} __randomize_layout;
/* shm_mode upper byte flags */
@@ -115,6 +124,7 @@ static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
struct shmid_kernel *shp;
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ WARN_ON(ns != shp->ns);
if (shp->shm_nattch) {
shp->shm_perm.mode |= SHM_DEST;
@@ -225,10 +235,43 @@ static void shm_rcu_free(struct rcu_head *head)
kfree(shp);
}
-static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
+/*
+ * It has to be called with shp locked.
+ * It must be called before ipc_rmid()
+ */
+static inline void shm_clist_rm(struct shmid_kernel *shp)
{
- list_del(&s->shm_clist);
- ipc_rmid(&shm_ids(ns), &s->shm_perm);
+ struct task_struct *creator;
+
+ /* ensure that shm_creator does not disappear */
+ rcu_read_lock();
+
+ /*
+ * A concurrent exit_shm may do a list_del_init() as well.
+ * Just do nothing if exit_shm already did the work
+ */
+ if (!list_empty(&shp->shm_clist)) {
+ /*
+ * shp->shm_creator is guaranteed to be valid *only*
+ * if shp->shm_clist is not empty.
+ */
+ creator = shp->shm_creator;
+
+ task_lock(creator);
+ /*
+ * list_del_init() is a nop if the entry was already removed
+ * from the list.
+ */
+ list_del_init(&shp->shm_clist);
+ task_unlock(creator);
+ }
+ rcu_read_unlock();
+}
+
+static inline void shm_rmid(struct shmid_kernel *s)
+{
+ shm_clist_rm(s);
+ ipc_rmid(&shm_ids(s->ns), &s->shm_perm);
}
@@ -283,7 +326,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
shm_file = shp->shm_file;
shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
- shm_rmid(ns, shp);
+ shm_rmid(shp);
shm_unlock(shp);
if (!is_file_hugepages(shm_file))
shmem_lock(shm_file, 0, shp->mlock_ucounts);
@@ -303,10 +346,10 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
*
* 2) sysctl kernel.shm_rmid_forced is set to 1.
*/
-static bool shm_may_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static bool shm_may_destroy(struct shmid_kernel *shp)
{
return (shp->shm_nattch == 0) &&
- (ns->shm_rmid_forced ||
+ (shp->ns->shm_rmid_forced ||
(shp->shm_perm.mode & SHM_DEST));
}
@@ -337,7 +380,7 @@ static void shm_close(struct vm_area_struct *vma)
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_dtim = ktime_get_real_seconds();
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
@@ -358,10 +401,10 @@ static int shm_try_destroy_orphaned(int id, void *p, void *data)
*
* As shp->* are changed under rwsem, it's safe to skip shp locking.
*/
- if (shp->shm_creator != NULL)
+ if (!list_empty(&shp->shm_clist))
return 0;
- if (shm_may_destroy(ns, shp)) {
+ if (shm_may_destroy(shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
@@ -379,48 +422,97 @@ void shm_destroy_orphaned(struct ipc_namespace *ns)
/* Locking assumes this will only be called with task == current */
void exit_shm(struct task_struct *task)
{
- struct ipc_namespace *ns = task->nsproxy->ipc_ns;
- struct shmid_kernel *shp, *n;
+ for (;;) {
+ struct shmid_kernel *shp;
+ struct ipc_namespace *ns;
- if (list_empty(&task->sysvshm.shm_clist))
- return;
+ task_lock(task);
+
+ if (list_empty(&task->sysvshm.shm_clist)) {
+ task_unlock(task);
+ break;
+ }
+
+ shp = list_first_entry(&task->sysvshm.shm_clist, struct shmid_kernel,
+ shm_clist);
- /*
- * If kernel.shm_rmid_forced is not set then only keep track of
- * which shmids are orphaned, so that a later set of the sysctl
- * can clean them up.
- */
- if (!ns->shm_rmid_forced) {
- down_read(&shm_ids(ns).rwsem);
- list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
- shp->shm_creator = NULL;
/*
- * Only under read lock but we are only called on current
- * so no entry on the list will be shared.
+ * 1) Get pointer to the ipc namespace. It is worth to say
+ * that this pointer is guaranteed to be valid because
+ * shp lifetime is always shorter than namespace lifetime
+ * in which shp lives.
+ * We taken task_lock it means that shp won't be freed.
*/
- list_del(&task->sysvshm.shm_clist);
- up_read(&shm_ids(ns).rwsem);
- return;
- }
+ ns = shp->ns;
- /*
- * Destroy all already created segments, that were not yet mapped,
- * and mark any mapped as orphan to cover the sysctl toggling.
- * Destroy is skipped if shm_may_destroy() returns false.
- */
- down_write(&shm_ids(ns).rwsem);
- list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
- shp->shm_creator = NULL;
+ /*
+ * 2) If kernel.shm_rmid_forced is not set then only keep track of
+ * which shmids are orphaned, so that a later set of the sysctl
+ * can clean them up.
+ */
+ if (!ns->shm_rmid_forced)
+ goto unlink_continue;
- if (shm_may_destroy(ns, shp)) {
- shm_lock_by_ptr(shp);
- shm_destroy(ns, shp);
+ /*
+ * 3) get a reference to the namespace.
+ * The refcount could be already 0. If it is 0, then
+ * the shm objects will be free by free_ipc_work().
+ */
+ ns = get_ipc_ns_not_zero(ns);
+ if (!ns) {
+unlink_continue:
+ list_del_init(&shp->shm_clist);
+ task_unlock(task);
+ continue;
}
- }
- /* Remove the list head from any segments still attached. */
- list_del(&task->sysvshm.shm_clist);
- up_write(&shm_ids(ns).rwsem);
+ /*
+ * 4) get a reference to shp.
+ * This cannot fail: shm_clist_rm() is called before
+ * ipc_rmid(), thus the refcount cannot be 0.
+ */
+ WARN_ON(!ipc_rcu_getref(&shp->shm_perm));
+
+ /*
+ * 5) unlink the shm segment from the list of segments
+ * created by current.
+ * This must be done last. After unlinking,
+ * only the refcounts obtained above prevent IPC_RMID
+ * from destroying the segment or the namespace.
+ */
+ list_del_init(&shp->shm_clist);
+
+ task_unlock(task);
+
+ /*
+ * 6) we have all references
+ * Thus lock & if needed destroy shp.
+ */
+ down_write(&shm_ids(ns).rwsem);
+ shm_lock_by_ptr(shp);
+ /*
+ * rcu_read_lock was implicitly taken in shm_lock_by_ptr, it's
+ * safe to call ipc_rcu_putref here
+ */
+ ipc_rcu_putref(&shp->shm_perm, shm_rcu_free);
+
+ if (ipc_valid_object(&shp->shm_perm)) {
+ if (shm_may_destroy(shp))
+ shm_destroy(ns, shp);
+ else
+ shm_unlock(shp);
+ } else {
+ /*
+ * Someone else deleted the shp from namespace
+ * idr/kht while we have waited.
+ * Just unlock and continue.
+ */
+ shm_unlock(shp);
+ }
+
+ up_write(&shm_ids(ns).rwsem);
+ put_ipc_ns(ns); /* paired with get_ipc_ns_not_zero */
+ }
}
static vm_fault_t shm_fault(struct vm_fault *vmf)
@@ -676,7 +768,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
if (error < 0)
goto no_id;
+ shp->ns = ns;
+
+ task_lock(current);
list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist);
+ task_unlock(current);
/*
* shmid gets reported as "inode#" in /proc/pid/maps.
@@ -1567,7 +1663,8 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
down_write(&shm_ids(ns).rwsem);
shp = shm_lock(ns, shmid);
shp->shm_nattch--;
- if (shm_may_destroy(ns, shp))
+
+ if (shm_may_destroy(shp))
shm_destroy(ns, shp);
else
shm_unlock(shp);
The patch titled
Subject: timers: implement usleep_idle_range()
has been added to the -mm tree. Its filename is
timers-implement-usleep_idle_range.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/timers-implement-usleep_idle_rang…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/timers-implement-usleep_idle_rang…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: timers: implement usleep_idle_range()
Patch series "mm/damon: Fix fake /proc/loadavg reports", v3.
This patchset fixes DAMON's fake load report issue. The first patch makes
yet another variant of usleep_range() for this fix, and the second patch
fixes the issue of DAMON by making it using the newly introduced function.
This patch (of 2):
Some kernel threads such as DAMON could need to repeatedly sleep in micro
seconds level. Because usleep_range() sleeps in uninterruptible state,
however, such threads would make /proc/loadavg reports fake load.
To help such cases, this commit implements a variant of usleep_range()
called usleep_idle_range(). It is same to usleep_range() but sets the
state of the current task as TASK_IDLE while sleeping.
Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Suggested-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/delay.h | 14 +++++++++++++-
kernel/time/timer.c | 16 +++++++++-------
2 files changed, 22 insertions(+), 8 deletions(-)
--- a/include/linux/delay.h~timers-implement-usleep_idle_range
+++ a/include/linux/delay.h
@@ -20,6 +20,7 @@
*/
#include <linux/math.h>
+#include <linux/sched.h>
extern unsigned long loops_per_jiffy;
@@ -58,7 +59,18 @@ void calibrate_delay(void);
void __attribute__((weak)) calibration_delay_done(void);
void msleep(unsigned int msecs);
unsigned long msleep_interruptible(unsigned int msecs);
-void usleep_range(unsigned long min, unsigned long max);
+void usleep_range_state(unsigned long min, unsigned long max,
+ unsigned int state);
+
+static inline void usleep_range(unsigned long min, unsigned long max)
+{
+ usleep_range_state(min, max, TASK_UNINTERRUPTIBLE);
+}
+
+static inline void usleep_idle_range(unsigned long min, unsigned long max)
+{
+ usleep_range_state(min, max, TASK_IDLE);
+}
static inline void ssleep(unsigned int seconds)
{
--- a/kernel/time/timer.c~timers-implement-usleep_idle_range
+++ a/kernel/time/timer.c
@@ -2054,26 +2054,28 @@ unsigned long msleep_interruptible(unsig
EXPORT_SYMBOL(msleep_interruptible);
/**
- * usleep_range - Sleep for an approximate time
- * @min: Minimum time in usecs to sleep
- * @max: Maximum time in usecs to sleep
+ * usleep_range_state - Sleep for an approximate time in a given state
+ * @min: Minimum time in usecs to sleep
+ * @max: Maximum time in usecs to sleep
+ * @state: State of the current task that will be while sleeping
*
* In non-atomic context where the exact wakeup time is flexible, use
- * usleep_range() instead of udelay(). The sleep improves responsiveness
+ * usleep_range_state() instead of udelay(). The sleep improves responsiveness
* by avoiding the CPU-hogging busy-wait of udelay(), and the range reduces
* power usage by allowing hrtimers to take advantage of an already-
* scheduled interrupt instead of scheduling a new one just for this sleep.
*/
-void __sched usleep_range(unsigned long min, unsigned long max)
+void __sched usleep_range_state(unsigned long min, unsigned long max,
+ unsigned int state)
{
ktime_t exp = ktime_add_us(ktime_get(), min);
u64 delta = (u64)(max - min) * NSEC_PER_USEC;
for (;;) {
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(state);
/* Do not return before the requested sleep time has elapsed */
if (!schedule_hrtimeout_range(&exp, delta, HRTIMER_MODE_ABS))
break;
}
}
-EXPORT_SYMBOL(usleep_range);
+EXPORT_SYMBOL(usleep_range_state);
_
Patches currently in -mm which might be from sj(a)kernel.org are
timers-implement-usleep_idle_range.patch
mm-damon-core-fix-fake-load-reports-due-to-uninterruptible-sleeps.patch
mm-damon-remove-some-no-need-func-definitions-in-damonh-file-fix.patch
If APICv is disabled for this vCPU, assigned devices may
still attempt to post interrupts. In that case, we need
to cancel the vmentry and deliver the interrupt with
KVM_REQ_EVENT. Extend the existing code that handles
injection of L1 interrupts into L2 to cover this case
as well.
vmx_hwapic_irr_update is only called when APICv is active
so it would be confusing to add a check for
vcpu->arch.apicv_active in there. Instead, just use
vmx_set_rvi directly in vmx_sync_pir_to_irr.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ba66c171d951..cccf1eab58ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6264,7 +6264,7 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
int max_irr;
bool max_irr_updated;
- if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
+ if (KVM_BUG_ON(!enable_apicv, vcpu->kvm))
return -EIO;
if (pi_test_on(&vmx->pi_desc)) {
@@ -6276,20 +6276,31 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
smp_mb__after_atomic();
max_irr_updated =
kvm_apic_update_irr(vcpu, vmx->pi_desc.pir, &max_irr);
-
- /*
- * If we are running L2 and L1 has a new pending interrupt
- * which can be injected, this may cause a vmexit or it may
- * be injected into L2. Either way, this interrupt will be
- * processed via KVM_REQ_EVENT, not RVI, because we do not use
- * virtual interrupt delivery to inject L1 interrupts into L2.
- */
- if (is_guest_mode(vcpu) && max_irr_updated)
- kvm_make_request(KVM_REQ_EVENT, vcpu);
} else {
max_irr = kvm_lapic_find_highest_irr(vcpu);
+ max_irr_updated = false;
}
- vmx_hwapic_irr_update(vcpu, max_irr);
+
+ /*
+ * If virtual interrupt delivery is not in use, the interrupt
+ * will be processed via KVM_REQ_EVENT, not RVI. This can happen
+ * in two cases:
+ *
+ * 1) If we are running L2 and L1 has a new pending interrupt
+ * which can be injected, this may cause a vmexit or it may
+ * be injected into L2. We do not use virtual interrupt
+ * delivery to inject L1 interrupts into L2.
+ *
+ * 2) If APICv is disabled for this vCPU, assigned devices may
+ * still attempt to post interrupts. The posted interrupt
+ * vector will cause a vmexit and the subsequent entry will
+ * call sync_pir_to_irr.
+ */
+ if (!is_guest_mode(vcpu) && vcpu->arch.apicv_active)
+ vmx_set_rvi(max_irr);
+ else if (max_irr_updated)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return max_irr;
}
--
2.27.0
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25652
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 83e74b927dce - block: avoid to quiesce queue in elevator_init_mq
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ❌ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 7:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ❌ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP - cve
✅ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.
Currently, checks for whether VT-d PI can be used refer to the current
status of the feature in the current vCPU; or they more or less pick
vCPU 0 in case a specific vCPU is not available.
However, these checks do not attempt to synchronize with changes to
the IRTE. In particular, there is no path that updates the IRTE when
APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU
that has APICv disabled, if the wakeup occurs because of an IRTE
that points to a posted interrupt.
To fix this, always go through the VT-d PI path as long as there are
assigned devices and APICv is available on both the host and the VM side.
Since the relevant condition was copied over three times, take the hint
and factor it into a separate function.
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/posted_intr.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..1c94783b5a54 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -5,6 +5,7 @@
#include <asm/cpu.h>
#include "lapic.h"
+#include "irq.h"
#include "posted_intr.h"
#include "trace.h"
#include "vmx.h"
@@ -77,13 +78,18 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
pi_set_on(pi_desc);
}
+static bool vmx_can_use_vtd_pi(struct kvm *kvm)
+{
+ return irqchip_in_kernel(kvm) && enable_apicv &&
+ kvm_arch_has_assigned_device(kvm) &&
+ irq_remapping_cap(IRQ_POSTING_CAP);
+}
+
void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
{
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return;
/* Set SN when the vCPU is preempted */
@@ -141,9 +147,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
struct pi_desc old, new;
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return 0;
WARN_ON(irqs_disabled());
@@ -270,9 +274,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
struct vcpu_data vcpu_info;
int idx, ret = 0;
- if (!kvm_arch_has_assigned_device(kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(kvm->vcpus[0]))
+ if (!vmx_can_use_vtd_pi(kvm))
return 0;
idx = srcu_read_lock(&kvm->irq_srcu);
--
2.27.0
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even
if APICv is disabled on the vCPU that receives it. In that case, the
interrupt will just cause a vmexit and leave the ON bit set together
with the PIR bit corresponding to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled.
However, fixing this is just a matter of always doing the PIR->IRR
synchronization, even if the vCPU has temporarily disabled APICv.
This is not a problem for performance, or if anything it is an
improvement. First, in the common case where vcpu->arch.apicv_active is
true, one fewer check has to be performed. Second, static_call_cond will
elide the function call if APICv is not present or disabled. Finally,
in the case for AMD hardware we can remove the sync_pir_to_irr callback:
it is only needed for apic_has_interrupt_for_ppr, and that function
already has a fallback for !APICv.
Cc: stable(a)vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/svm/svm.c | 1 -
arch/x86/kvm/x86.c | 18 +++++++++---------
3 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 759952dd1222..f206fc35deff 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -707,7 +707,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
{
int highest_irr;
- if (apic->vcpu->arch.apicv_active)
+ if (kvm_x86_ops.sync_pir_to_irr)
highest_irr = static_call(kvm_x86_sync_pir_to_irr)(apic->vcpu);
else
highest_irr = apic_find_highest_irr(apic);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5630c241d5f6..d0f68d11ec70 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4651,7 +4651,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_irr_update = svm_hwapic_irr_update,
.hwapic_isr_update = svm_hwapic_isr_update,
- .sync_pir_to_irr = kvm_lapic_find_highest_irr,
.apicv_post_state_restore = avic_post_state_restore,
.set_tss_addr = svm_set_tss_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 441f4769173e..a8f12c83db4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4448,8 +4448,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
{
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
return kvm_apic_get_state(vcpu, s);
}
@@ -9528,8 +9527,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
else {
- if (vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
}
@@ -9802,10 +9800,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
/*
* This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick.
+ * notified with kvm_vcpu_kick. Assigned devices can
+ * use the POSTED_INTR_VECTOR even if APICv is disabled,
+ * so do it even if APICv is disabled on this vCPU.
*/
- if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (kvm_vcpu_exit_request(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -9849,8 +9849,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
- if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (unlikely(kvm_vcpu_exit_request(vcpu))) {
exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
--
2.27.0
Synchronize the condition for the two calls to kvm_x86_sync_pir_to_irr.
The one in the reenter-guest fast path invoked the callback
unconditionally even if LAPIC is disabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a403d92833f..441f4769173e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9849,7 +9849,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
- if (vcpu->arch.apicv_active)
+ if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
static_call(kvm_x86_sync_pir_to_irr)(vcpu);
if (unlikely(kvm_vcpu_exit_request(vcpu))) {
--
2.27.0
Check out this report and any autotriaged failures in our web dashboard:
https://datawarehouse.cki-project.org/kcidb/checkouts/25645
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 401eae66e1aa - block: avoid to quiesce queue in elevator_init_mq
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
Targeted tests: NO
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking: igmp conformance test
🚧 ✅ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 3:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 4:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 5:
✅ Boot test
✅ Reboot test
✅ Networking bridge: sanity - mlx5
✅ Ethernet drivers sanity - mlx5
Host 6:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 💥 stress: stress-ng - os
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - srp
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ IPMI driver test
✅ IPMItool loop stress test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage block - filesystem fio test
✅ Storage block - queue scheduler test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ lvm cache test
Host 3:
✅ Boot test
✅ Reboot test
🚧 ❌ Storage blktests - nvmeof-mp
Host 4:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
s390x:
Host 1:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - nvmeof-mp
Host 2:
✅ Boot test
✅ Reboot test
🚧 ✅ Storage blktests - srp
Host 3:
✅ Boot test
✅ Reboot test
✅ LTP - cve
✅ LTP - sched
✅ LTP - syscalls
✅ LTP - can
✅ LTP - commands
✅ LTP - containers
✅ LTP - dio
✅ LTP - fs
✅ LTP - fsx
✅ LTP - math
✅ LTP - hugetlb
✅ LTP - mm
✅ LTP - nptl
✅ LTP - pty
✅ LTP - ipc
✅ LTP - tracing
✅ LTP: openposix test suite
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ NFS Connectathon
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ storage: dm/common
✅ lvm snapper test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 4:
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage blktests - blk
✅ Storage: swraid mdadm raid_module test
✅ stress: stress-ng - interrupt
✅ stress: stress-ng - cpu
✅ stress: stress-ng - cpu-cache
✅ stress: stress-ng - memory
🚧 ✅ Podman system test - as root
🚧 ✅ Podman system test - as user
🚧 ✅ Storage blktests - nvme-tcp
🚧 ✅ lvm cache test
🚧 ✅ stress: stress-ng - os
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ xfstests - cifsv3.11
⚡⚡⚡ IPMI driver test
⚡⚡⚡ IPMItool loop stress test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ Storage blktests - blk
⚡⚡⚡ Storage block - filesystem fio test
⚡⚡⚡ Storage block - queue scheduler test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
⚡⚡⚡ stress: stress-ng - interrupt
⚡⚡⚡ stress: stress-ng - cpu
⚡⚡⚡ stress: stress-ng - cpu-cache
⚡⚡⚡ stress: stress-ng - memory
🚧 ⚡⚡⚡ Podman system test - as root
🚧 ⚡⚡⚡ Podman system test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ Storage blktests - nvme-tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ lvm cache test
🚧 ⚡⚡⚡ stress: stress-ng - os
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
⚡⚡⚡ LTP - cve
⚡⚡⚡ LTP - sched
⚡⚡⚡ LTP - syscalls
⚡⚡⚡ LTP - can
⚡⚡⚡ LTP - commands
⚡⚡⚡ LTP - containers
⚡⚡⚡ LTP - dio
⚡⚡⚡ LTP - fs
⚡⚡⚡ LTP - fsx
⚡⚡⚡ LTP - math
⚡⚡⚡ LTP - hugetlb
⚡⚡⚡ LTP - mm
⚡⚡⚡ LTP - nptl
⚡⚡⚡ LTP - pty
⚡⚡⚡ LTP - ipc
⚡⚡⚡ LTP - tracing
⚡⚡⚡ LTP: openposix test suite
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ NFS Connectathon
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: dm/common
⚡⚡⚡ lvm snapper test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking: igmp conformance test
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
🚧 ⚡⚡⚡ Storage blktests - nvmeof-mp
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
🚧 ⚡⚡⚡ Storage blktests - srp
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Targeted tests
--------------
Test runs for patches always include a set of base tests, plus some
tests chosen based on the file paths modified by the patch. The latter
are called "targeted tests". If no targeted tests are run, that means
no patch-specific tests are available. Please, consider contributing a
targeted test for related patches to increase test coverage. See
https://docs.engineering.redhat.com/x/_wEZB for more details.