The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From abf0e8e4ef25478a4390115e6a953d589d1f9ffd Mon Sep 17 00:00:00 2001
From: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Date: Thu, 9 Dec 2021 08:38:17 +0100
Subject: [PATCH] s390/kexec: handle R_390_PLT32DBL rela in
arch_kexec_apply_relocations_add()
Starting with gcc 11.3, the C compiler will generate PLT-relative function
calls even if they are local and do not require it. Later on during linking,
the linker will replace all PLT-relative calls to local functions with
PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
not being linked as a regular executable or shared library would have been,
and therefore, all PLT-relative addresses remain in the generated purgatory
object code unresolved. This leads to the situation where the purgatory
code is being executed during kdump with all PLT-relative addresses
unresolved. And this results in endless loops within the purgatory code.
Furthermore, the clang C compiler has always behaved as described above,
and this commit should fix kdump for kernels built with clang as well.
Because the purgatory code is not a regular executable or shared library,
contains only calls to local functions and has no PLT, all R_390_PLT32DBL
relocation entries can be resolved just like an R_390_PC32DBL one.
* https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html…
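A minimal userspace sketch of the arithmetic behind this, assuming the
usual zSeries ABI definition of R_390_PC32DBL as (S + A - P) >> 1 stored
in a 32-bit field. The helper names reloc_pc32dbl() and resolve() and any
addresses passed to them are hypothetical; this is not the kernel's
arch_kexec_do_relocs(), only an illustration of why a PLT32DBL entry
against a local symbol can be resolved the same way:

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type numbers, matching the low Info bits (0x13, 0x14) above. */
#define R_390_PC32DBL  19
#define R_390_PLT32DBL 20

/*
 * Hypothetical helper: compute the 32-bit field of a PC-relative,
 * halfword-scaled relocation. "val" is symbol value plus addend,
 * "addr" is the address of the field being patched.
 */
static int32_t reloc_pc32dbl(uint64_t val, uint64_t addr)
{
	int64_t delta = (int64_t)(val - addr);

	/* s390 instructions are halfword aligned, so the delta is even. */
	assert((delta & 1) == 0);
	return (int32_t)(delta / 2);
}

/* Without a PLT, a PLT32DBL entry is resolved exactly like PC32DBL. */
static int resolve(unsigned int r_type, uint64_t val, uint64_t addr,
		   int32_t *out)
{
	if (r_type == R_390_PLT32DBL)
		r_type = R_390_PC32DBL;
	if (r_type != R_390_PC32DBL)
		return -1;	/* type not handled by this sketch */

	*out = reloc_pc32dbl(val, addr);
	return 0;
}
```

Calling resolve() with either type and the same val/addr pair yields the
same field value, which is the property the one-line remapping in the
patch relies on.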
Relocation entries of purgatory code generated with gcc 11.3
------------------------------------------------------------
$ readelf -r linux/arch/s390/purgatory/purgatory.o
Relocation section '.rela.text' at offset 0x370 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000005c 000c00000013 R_390_PC32DBL 0000000000000000 purgatory_sha_regions + 2
00000000007a 000d00000014 R_390_PLT32DBL 0000000000000000 sha256_update + 2
00000000008c 000e00000014 R_390_PLT32DBL 0000000000000000 sha256_final + 2
000000000092 000800000013 R_390_PC32DBL 0000000000000000 .LC0 + 2
0000000000a0 000f00000014 R_390_PLT32DBL 0000000000000000 memcmp + 2
Relocation entries of purgatory code generated with gcc 11.2
------------------------------------------------------------
$ readelf -r linux/arch/s390/purgatory/purgatory.o
Relocation section '.rela.text' at offset 0x368 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000005c 000c00000013 R_390_PC32DBL 0000000000000000 purgatory_sha_regions + 2
00000000007a 000d00000013 R_390_PC32DBL 0000000000000000 sha256_update + 2
00000000008c 000e00000013 R_390_PC32DBL 0000000000000000 sha256_final + 2
000000000092 000800000013 R_390_PC32DBL 0000000000000000 .LC0 + 2
0000000000a0 000f00000013 R_390_PC32DBL 0000000000000000 memcmp + 2
Signed-off-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Reported-by: Tao Liu <ltao(a)redhat.com>
Suggested-by: Philipp Rudo <prudo(a)redhat.com>
Reviewed-by: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20211209073817.82196-1-egorenar@linux.ibm.com
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
diff --git a/arch/s390/kernel/machine_kexec_file.c b/arch/s390/kernel/machine_kexec_file.c
index 876cdd3c994e..8f43575a4dd3 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -348,6 +348,10 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
addr = section->sh_addr + relas[i].r_offset;
r_type = ELF64_R_TYPE(relas[i].r_info);
+
+ if (r_type == R_390_PLT32DBL)
+ r_type = R_390_PC32DBL;
+
ret = arch_kexec_do_relocs(r_type, loc, val, addr);
if (ret) {
pr_err("Unknown rela relocation: %d\n", r_type);
In nvkm_ioctl_map(), the variable "type" may be left uninitialized if
nvkm_object_map() returns an error code. However, the function does not
check the return value and uses "type" directly in the if statement,
which is potentially unsafe.
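The pattern can be sketched in isolation as follows. All names here
(object_map(), ioctl_map(), enum map_type) are hypothetical stand-ins for
the nouveau functions, assuming only that the callee writes its output
parameter on success and leaves it untouched on failure:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum map_type { MAP_IO, MAP_VA };	/* hypothetical stand-ins */

/* Hypothetical callee: writes *type only when it succeeds. */
static int object_map(int fail, enum map_type *type)
{
	if (fail)
		return -ENODEV;
	*type = MAP_IO;
	return 0;
}

static int ioctl_map(int fail, int *reported)
{
	enum map_type type;	/* deliberately not initialized */
	int ret;

	assert(reported != NULL);

	ret = object_map(fail, &type);
	if (ret)
		return ret;	/* bail out before "type" is ever read */

	*reported = (type == MAP_IO) ? 1 : 2;
	return 0;
}
```

Without the early return, the comparison would read an indeterminate
value whenever object_map() fails.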
Cc: stable(a)vger.kernel.org
Fixes: 01326050391c ("drm/nouveau/core/object: allow arguments to be passed to map function")
Signed-off-by: Yizhuo Zhai <yzhai003(a)ucr.edu>
---
drivers/gpu/drm/nouveau/nvkm/core/ioctl.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c b/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c
index 735cb6816f10..4264d9d79783 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/ioctl.c
@@ -266,6 +266,8 @@ nvkm_ioctl_map(struct nvkm_client *client,
ret = nvkm_object_map(object, data, size, &type,
&args->v0.handle,
&args->v0.length);
+ if (ret)
+ return ret;
if (type == NVKM_OBJECT_MAP_IO)
args->v0.type = NVIF_IOCTL_MAP_V0_IO;
else
--
2.25.1
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter. This fixes the
following false-positive lockdep report when unplugging a Lenovo X1C8 from
a Lenovo 2nd gen TB3 dock:
[ 28.583853] pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
[ 28.583891] pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
[ 28.584849] ============================================
[ 28.584854] WARNING: possible recursive locking detected
[ 28.584858] 5.16.0-rc2+ #621 Not tainted
[ 28.584864] --------------------------------------------
[ 28.584867] irq/124-pciehp/86 is trying to acquire lock:
[ 28.584873] ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
[ 28.584904]
but task is already holding lock:
[ 28.584908] ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[ 28.584929]
other info that might help us debug this:
[ 28.584933] Possible unsafe locking scenario:
[ 28.584936] CPU0
[ 28.584939] ----
[ 28.584942] lock(&ctrl->reset_lock);
[ 28.584949] lock(&ctrl->reset_lock);
[ 28.584955]
*** DEADLOCK ***
[ 28.584959] May be due to missing lock nesting notation
[ 28.584963] 3 locks held by irq/124-pciehp/86:
[ 28.584970] #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
[ 28.584991] #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
[ 28.585012] #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
[ 28.585037]
stack backtrace:
[ 28.585042] CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
[ 28.585052] Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
[ 28.585059] Call Trace:
[ 28.585064] <TASK>
[ 28.585073] dump_stack_lvl+0x59/0x73
[ 28.585087] __lock_acquire.cold+0xc5/0x2c6
[ 28.585106] ? find_held_lock+0x2b/0x80
[ 28.585124] lock_acquire+0xb5/0x2b0
[ 28.585132] ? pciehp_check_presence+0x23/0x80
[ 28.585144] ? lock_is_held_type+0xa8/0x120
[ 28.585161] down_read+0x3e/0x50
[ 28.585172] ? pciehp_check_presence+0x23/0x80
[ 28.585183] pciehp_check_presence+0x23/0x80
[ 28.585194] pciehp_runtime_resume+0x5c/0xa0
[ 28.585206] ? pci_msix_init+0x60/0x60
[ 28.585214] device_for_each_child+0x45/0x70
[ 28.585227] pcie_port_device_runtime_resume+0x20/0x30
[ 28.585236] pci_pm_runtime_resume+0xa7/0xc0
[ 28.585246] ? pci_pm_freeze_noirq+0x100/0x100
[ 28.585257] __rpm_callback+0x41/0x110
[ 28.585271] ? pci_pm_freeze_noirq+0x100/0x100
[ 28.585281] rpm_callback+0x59/0x70
[ 28.585293] rpm_resume+0x512/0x7b0
[ 28.585309] __pm_runtime_resume+0x4a/0x90
[ 28.585322] __device_release_driver+0x28/0x240
[ 28.585338] device_release_driver+0x26/0x40
[ 28.585351] pci_stop_bus_device+0x68/0x90
[ 28.585363] pci_stop_bus_device+0x2c/0x90
[ 28.585373] pci_stop_and_remove_bus_device+0xe/0x20
[ 28.585384] pciehp_unconfigure_device+0x6c/0x110
[ 28.585396] ? __pm_runtime_resume+0x58/0x90
[ 28.585409] pciehp_disable_slot+0x5b/0xe0
[ 28.585421] pciehp_handle_presence_or_link_change+0xc3/0x2f0
[ 28.585436] pciehp_ist+0x179/0x180
[ 28.585449] ? disable_irq_nosync+0x10/0x10
[ 28.585460] irq_thread_fn+0x1d/0x60
[ 28.585470] ? irq_thread+0x81/0x1a0
[ 28.585480] irq_thread+0xcb/0x1a0
[ 28.585491] ? irq_thread_fn+0x60/0x60
[ 28.585502] ? irq_thread_check_affinity+0xb0/0xb0
[ 28.585514] kthread+0x165/0x190
[ 28.585522] ? set_kthread_struct+0x40/0x40
[ 28.585531] ret_from_fork+0x1f/0x30
[ 28.585554] </TASK>
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock,
so the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively,
a per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in
the path to the root-bus was made to conserve class keys because their
number is limited and the complexity grows quadratically with number of
keys according to Documentation/locking/lockdep-design.rst.
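As a rough illustration of how such a per-controller subclass can be
derived, here is a userspace sketch. struct bus is a hypothetical,
simplified stand-in for the kernel's struct pci_bus, and
bridge_is_hotplug stands in for bus->self->is_hotplug_bridge; the walk
order mirrors the helper added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_bus. */
struct bus {
	struct bus *parent;	/* NULL for the root bus */
	bool bridge_is_hotplug;	/* bridge leading to this bus is a hotplug port */
};

/* Count hotplug-capable bridges found while walking up to the root bus. */
static unsigned int hotplug_depth(const struct bus *bus)
{
	unsigned int depth = 0;

	assert(bus != NULL);
	while (bus->parent) {
		bus = bus->parent;
		if (bus->bridge_is_hotplug)
			depth++;
	}
	return depth;
}
```

Two controllers at different nesting levels then get different subclass
values, so lockdep no longer treats taking both reset_lock instances as
recursion on a single class.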
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redh…
Cc: stable(a)vger.kernel.org
Reported-by: "Theodore Ts'o" <tytso(a)mit.edu>
Reviewed-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v2:
- Only use a subclass for each hotplug capable port/parent in the path to
the PCI root bus, instead of one for each level in the PCI hierarchy,
to avoid hitting MAX_LOCKDEP_SUBCLASSES
- Drop the "PCI: Add a pci_dev_depth() helper function" since we now need
a special version of this to only count hotplug ports
- Various commit message improvements
---
drivers/pci/hotplug/pciehp.h | 3 +++
drivers/pci/hotplug/pciehp_core.c | 2 +-
drivers/pci/hotplug/pciehp_hpc.c | 21 ++++++++++++++++++---
3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 918dccbc74b6..e0a614acee05 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -75,6 +75,8 @@ extern int pciehp_poll_time;
* @reset_lock: prevents access to the Data Link Layer Link Active bit in the
* Link Status register and to the Presence Detect State bit in the Slot
* Status register during a slot reset which may cause them to flap
+ * @depth: Number of additional hotplug ports in the path to the root bus,
+ * used as lock subclass for @reset_lock
* @ist_running: flag to keep user request waiting while IRQ thread is running
* @request_result: result of last user request submitted to the IRQ thread
* @requester: wait queue to wake up on completion of user request,
@@ -106,6 +108,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
+ unsigned int depth;
unsigned int ist_running;
int request_result;
wait_queue_head_t requester;
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index f34114d45259..4042d87d539d 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -166,7 +166,7 @@ static void pciehp_check_presence(struct controller *ctrl)
{
int occupied;
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
mutex_lock(&ctrl->state_lock);
occupied = pciehp_card_present_or_link_active(ctrl);
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 83a0fa119cae..963fb50528da 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -583,7 +583,7 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
* the corresponding link change may have been ignored above.
* Synthesize it to ensure that it is acted on.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (!pciehp_check_link_active(ctrl))
pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
up_read(&ctrl->reset_lock);
@@ -746,7 +746,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
* Disable requests have higher priority than Presence Detect Changed
* or Data Link Layer State Changed events.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (events & DISABLE_SLOT)
pciehp_handle_disable_request(ctrl);
else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
@@ -906,7 +906,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, bool probe)
if (probe)
return 0;
- down_write(&ctrl->reset_lock);
+ down_write_nested(&ctrl->reset_lock, ctrl->depth);
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
@@ -962,6 +962,20 @@ static inline void dbg_ctrl(struct controller *ctrl)
#define FLAG(x, y) (((x) & (y)) ? '+' : '-')
+static inline int pcie_hotplug_depth(struct pci_dev *dev)
+{
+ struct pci_bus *bus = dev->bus;
+ int depth = 0;
+
+ while (bus->parent) {
+ bus = bus->parent;
+ if (bus->self && bus->self->is_hotplug_bridge)
+ depth++;
+ }
+
+ return depth;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -975,6 +989,7 @@ struct controller *pcie_init(struct pcie_device *dev)
return NULL;
ctrl->pcie = dev;
+ ctrl->depth = pcie_hotplug_depth(dev->port);
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
if (pdev->hotplug_user_indicators)
--
2.33.1
Some BIOS-es contain a bug where they add addresses which map to system
RAM in the PCI host bridge window returned by the ACPI _CRS method, see
commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address
space").
To work around this bug, Linux has excluded E820 reserved addresses when
allocating addresses from the PCI host bridge window since 2010.
Recently (2019) some systems have shown up with E820 reservations which
cover the entire _CRS returned PCI bridge memory window, causing all
attempts to assign memory to PCI BARs which have not been set up by the
BIOS to fail. For example here are the relevant dmesg bits from a
Lenovo IdeaPad 3 15IIL 81WE:
[mem 0x000000004bc50000-0x00000000cfffffff] reserved
pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
The ACPI specifications appear to allow this new behavior:
The relationship between E820 and ACPI _CRS is not really very clear.
ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
This range of addresses is in use or reserved by the system and is
not to be included in the allocatable memory pool of the operating
system's memory manager.
and it may be used when:
The address range is in use by a memory-mapped system device.
Furthermore, sec 15.2 says:
Address ranges defined for baseboard memory-mapped I/O devices, such
as APICs, are returned as reserved.
A PCI host bridge qualifies as a baseboard memory-mapped I/O device,
and its apertures are in use and certainly should not be included in
the general allocatable pool, so the fact that some BIOS-es report
the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
So it seems that the excluding of E820 reserved addresses is a mistake.
Ideally Linux would fully stop excluding E820 reserved addresses,
but then the old systems this was added for will regress.
Instead keep the old behavior for old systems, while ignoring
the E820 reservations for any systems from now on.
Old systems are defined here as BIOS year < 2018, this was chosen to make
sure that E820 reservations will not be used on the currently affected
systems, while at the same time also taking into account that the systems
for which the E820 checking was originally added may have received BIOS
updates for quite a while (esp. CVE related ones), giving them a more
recent BIOS year than 2010.
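The cutoff can be sketched as a small predicate. This is a userspace
illustration only: bios_year_from_date() is a hypothetical helper that
assumes a "MM/DD/YYYY" DMI date string and is not the kernel's
dmi_get_bios_year(); the 2018 threshold is the one described above, and
an unknown year (0) falls on the "old system" side, as in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define E820_CUTOFF_YEAR 2018	/* threshold described in the commit message */

/* Hypothetical helper: extract the year from "MM/DD/YYYY", 0 if unknown. */
static int bios_year_from_date(const char *date)
{
	const char *slash;

	assert(date != NULL);
	slash = strrchr(date, '/');
	if (!slash)
		return 0;
	return atoi(slash + 1);
}

/* Old systems keep the E820 exclusion, newer ones ignore it. */
static bool honor_e820_reservations(int bios_year)
{
	return bios_year < E820_CUTOFF_YEAR;
}
```

For a BIOS dated 06/27/2013 the predicate keeps the old behavior; for one
dated 08/26/2021 the E820 reservations are ignored.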
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
BugLink: https://bugs.launchpad.net/bugs/1878279
BugLink: https://bugs.launchpad.net/bugs/1931715
BugLink: https://bugs.launchpad.net/bugs/1932069
BugLink: https://bugs.launchpad.net/bugs/1921649
Cc: Benoit Grégoire <benoitg(a)coeus.ca>
Cc: Hui Wang <hui.wang(a)canonical.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v6:
- Remove the possibility to change the behavior from the commandline
because of worries that users may use this to paper over other problems
Changes in v5:
- Drop mention of Windows behavior from the commit msg, replace with a
reference to the specs
- Improve documentation in Documentation/admin-guide/kernel-parameters.txt
- Reword the big comment added, use "PCI host bridge window" in it and drop
all references to Windows
Changes in v4:
- Rewrap the big comment block to fit in 80 columns
- Add Rafael's Acked-by
- Add Cc: stable(a)vger.kernel.org
Changes in v3:
- Commit msg tweaks (drop dmesg timestamps, typo fix)
- Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
- Add Mika's Reviewed-by
Changes in v2:
- Replace the per model DMI quirk approach with disabling E820 reservations
checking for all systems with a BIOS year >= 2018
- Add documentation for the new kernel-parameters to
Documentation/admin-guide/kernel-parameters.txt
---
Other patches trying to address the same issue:
https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.inte…
V1 patch:
https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
---
arch/x86/kernel/resource.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 9b9fb7882c20..9ae64f9af956 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/dmi.h>
#include <linux/ioport.h>
#include <asm/e820/api.h>
@@ -23,11 +24,31 @@ static void resource_clip(struct resource *res, resource_size_t start,
res->start = end + 1;
}
+/*
+ * Some BIOS-es contain a bug where they add addresses which map to
+ * system RAM in the PCI host bridge window returned by the ACPI _CRS
+ * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
+ * allocating address space"). To avoid this Linux by default excludes
+ * E820 reservations when allocating addresses since 2010.
+ * In 2019 some systems have shown-up with E820 reservations which cover
+ * the entire _CRS returned PCI host bridge window, causing all attempts
+ * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
+ *
+ * Ideally Linux would fully stop using E820 reservations, but then
+ * the old systems this was added for will regress.
+ * Instead keep the old behavior for old systems, while ignoring the
+ * E820 reservations for any systems from now on.
+ */
static void remove_e820_regions(struct resource *avail)
{
- int i;
+ int i, year = dmi_get_bios_year();
struct e820_entry *entry;
+ if (year >= 2018)
+ return;
+
+ pr_info_once("PCI: Removing E820 reservations from host bridge windows\n");
+
for (i = 0; i < e820_table->nr_entries; i++) {
entry = &e820_table->entries[i];
--
2.33.1
On kdump instead of using an intermediate step to relocate the kernel,
that lives in a "control buffer" outside the current kernel's mapping,
we jump to the crash kernel directly by calling riscv_kexec_norelocate().
The current implementation uses va_pa_offset while switching to physical
addressing. However, since we moved the kernel outside the linear mapping,
this no longer works: riscv_kexec_norelocate() is part of the kernel
mapping, so we would have to use kernel_map.va_kernel_pa_offset and also
take XIP kernels into account.
We don't really need to use va_pa_offset on riscv_kexec_norelocate, we
can just set STVEC to the physical address of the new kernel instead and
let the hart jump to the new kernel on the next instruction after setting
SATP to zero. This fixes kdump and is also simpler/cleaner.
I tested this on the latest qemu and HiFive Unmatched and it works as
expected.
v2: I removed the direct jump after setting satp as suggested.
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Nick Kossifidis <mick(a)ics.forth.gr>
Reviewed-by: Alexandre Ghiti <alex(a)ghiti.fr>
Cc: <stable(a)vger.kernel.org> # 5.13
Cc: <stable(a)vger.kernel.org> # 5.14
---
arch/riscv/kernel/kexec_relocate.S | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index a80b52a74..059c5e216 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -159,25 +159,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
- * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
*/
mv s0, a1
mv s1, a2
mv s2, a3
- mv s3, a4
/* Disable / cleanup interrupts */
csrw CSR_SIE, zero
csrw CSR_SIP, zero
- /* Switch to physical addressing */
- la s4, 1f
- sub s4, s4, s3
- csrw CSR_STVEC, s4
- csrw CSR_SATP, zero
-
-.align 2
-1:
/* Pass the arguments to the next kernel / Cleanup*/
mv a0, s2
mv a1, s1
@@ -214,7 +204,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero
- jalr zero, a2, 0
+ /*
+ * Switch to physical addressing
+ * This will also trigger a jump to CSR_STVEC
+ * which in this case is the address of the new
+ * kernel.
+ */
+ csrw CSR_STVEC, a2
+ csrw CSR_SATP, zero
+
SYM_CODE_END(riscv_kexec_norelocate)
.section ".rodata"
--
2.32.0
The dma-kmalloc caches are created whenever CONFIG_ZONE_DMA is enabled.
However, allocating from them fails if the DMA zone has no managed pages.
The failure can be seen in the kdump kernel on x86_64, as shown below:
CPU: 0 PID: 65 Comm: kworker/u2:1 Not tainted 5.14.0-rc2+ #9
Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS RMLSDP.86I.R2.28.D690.1306271008 06/27/2013
Workqueue: events_unbound async_run_entry_fn
Call Trace:
dump_stack_lvl+0x57/0x72
warn_alloc.cold+0x72/0xd6
__alloc_pages_slowpath.constprop.0+0xf56/0xf70
__alloc_pages+0x23b/0x2b0
allocate_slab+0x406/0x630
___slab_alloc+0x4b1/0x7e0
? sr_probe+0x200/0x600
? lock_acquire+0xc4/0x2e0
? fs_reclaim_acquire+0x4d/0xe0
? lock_is_held_type+0xa7/0x120
? sr_probe+0x200/0x600
? __slab_alloc+0x67/0x90
__slab_alloc+0x67/0x90
? sr_probe+0x200/0x600
? sr_probe+0x200/0x600
kmem_cache_alloc_trace+0x259/0x270
sr_probe+0x200/0x600
......
bus_probe_device+0x9f/0xb0
device_add+0x3d2/0x970
......
__scsi_add_device+0xea/0x100
ata_scsi_scan_host+0x97/0x1d0
async_run_entry_fn+0x30/0x130
process_one_work+0x2b0/0x5c0
worker_thread+0x55/0x3c0
? process_one_work+0x5c0/0x5c0
kthread+0x149/0x170
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
Mem-Info:
......
The above failure happened when calling kmalloc() to allocate a buffer
with GFP_DMA. It requests a slab page from the DMA zone, which has no
managed pages.
sr_probe()
--> get_capabilities()
--> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
Check whether the DMA zone has managed pages before creating the
dma-kmalloc caches; if it has none, reuse the normal kmalloc caches instead.
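The fallback can be sketched in userspace as below. struct cache,
setup_dma_caches() and the dma_storage array are hypothetical
simplifications of kmalloc_caches[] and create_kmalloc_cache(); the
boolean parameter plays the role of the has_managed_dma() result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kmem_cache. */
struct cache {
	int is_dma;
};

/*
 * Fill dma[] for every size that has a normal cache. With no managed
 * pages in the DMA zone, alias the normal cache instead of creating a
 * dedicated DMA cache that could never be backed by memory.
 */
static void setup_dma_caches(struct cache *normal[], struct cache *dma[],
			     struct cache dma_storage[], size_t n,
			     bool zone_has_managed_pages)
{
	size_t i;

	assert(normal && dma && dma_storage);
	for (i = 0; i < n; i++) {
		if (!normal[i])
			continue;
		if (!zone_has_managed_pages) {
			dma[i] = normal[i];	/* fall back to the normal cache */
			continue;
		}
		dma_storage[i].is_dma = 1;	/* "create" a dedicated DMA cache */
		dma[i] = &dma_storage[i];
	}
}
```

With this arrangement a GFP_DMA request still gets a usable cache in a
kernel whose DMA zone is empty, at the cost of not guaranteeing low
memory in that case.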
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Cc: stable(a)vger.kernel.org
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
---
mm/slab_common.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e5d080a93009..ae4ef0f8903a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -878,6 +878,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
{
int i;
enum kmalloc_cache_type type;
+#ifdef CONFIG_ZONE_DMA
+ bool managed_dma;
+#endif
/*
* Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
@@ -905,10 +908,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
slab_state = UP;
#ifdef CONFIG_ZONE_DMA
+ managed_dma = has_managed_dma();
+
for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
if (s) {
+ if (!managed_dma) {
+ kmalloc_caches[KMALLOC_DMA][i] = kmalloc_caches[KMALLOC_NORMAL][i];
+ continue;
+ }
kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
kmalloc_info[i].name[KMALLOC_DMA],
kmalloc_info[i].size,
--
2.17.2
This is the start of the stable review cycle for the 5.15.9 release.
There are 42 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 17 Dec 2021 17:20:14 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.9-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.9-rc1
Adrian Hunter <adrian.hunter(a)intel.com>
perf inject: Fix itrace space allowed for new attributes
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: make sure reclaim doesn't write the inode
Nikita Yushchenko <nikita.yoush(a)cogentembedded.com>
staging: most: dim2: use device release method
Chen Jun <chenjun102(a)huawei.com>
tracing: Fix a kmemleak false positive in tracing_map
Philip Yang <Philip.Yang(a)amd.com>
drm/amdkfd: process_info lock not needed for svm
Perry Yuan <Perry.Yuan(a)amd.com>
drm/amd/display: add connector type check for CRC source set
Philip Yang <Philip.Yang(a)amd.com>
drm/amdkfd: fix double free mem structure
Mustapha Ghaddar <mghaddar(a)amd.com>
drm/amd/display: Fix for the no Audio bug with Tiled Displays
Flora Cui <flora.cui(a)amd.com>
drm/amdgpu: check atomic flag to differeniate with legacy path
Flora Cui <flora.cui(a)amd.com>
drm/amdgpu: cancel the correct hrtimer on exit
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
net: netlink: af_netlink: Prevent empty skb by adding a check on len.
Ondrej Jirman <megous(a)megous.com>
i2c: rk3x: Handle a spurious start completion interrupt flag
Helge Deller <deller(a)gmx.de>
parisc/agp: Annotate parisc agp init functions with __init
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid
Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
loop: Use pr_warn_once() for loop_control_remove() warning
Erik Ekman <erik(a)kryo.se>
net/mlx4_en: Update reported link modes for 1/10G
Alexander Stein <alexander.stein(a)ew.tq-group.com>
Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
Ilie Halip <ilie.halip(a)gmail.com>
s390/test_unwind: use raw opcode instead of invalid instruction
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: Save PSTATE early on exit
Douglas Anderson <dianders(a)chromium.org>
drm/msm/dp: Avoid unpowered AUX xfers that caused crashes
Philip Chen <philipchen(a)chromium.org>
drm/msm/dsi: set default num_data_lanes
Akhil P Oommen <akhilpo(a)codeaurora.org>
drm/msm/a6xx: Fix uinitialized use of gpu_scid
Akhil P Oommen <akhilpo(a)codeaurora.org>
drm/msm: Fix null ptr access msm_ioctl_gem_submit()
Vincent Whitchurch <vincent.whitchurch(a)axis.com>
i2c: virtio: fix completion handling
Ronak Doshi <doshir(a)vmware.com>
vmxnet3: fix minimum vectors alloc issue
Yahui Cao <yahui.cao(a)intel.com>
ice: fix FDIR init missing when reset VF
Tatyana Nikolova <tatyana.e.nikolova(a)intel.com>
RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
Shiraz Saleem <shiraz.saleem(a)intel.com>
RDMA/irdma: Report correct WC errors
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
Shiraz Saleem <shiraz.saleem(a)intel.com>
RDMA/irdma: Fix a user-after-free in add_pble_prm
David Howells <dhowells(a)redhat.com>
netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
Song Liu <songliubraving(a)fb.com>
perf bpf_skel: Do not use typedef to avoid error on old clang
Martin Botka <martin.botka(a)somainline.org>
clk: qcom: sm6125-gcc: Swap ops of ice and apps on sdcc1
Rob Herring <robh(a)kernel.org>
dt-bindings: media: nxp,imx7-mipi-csi2: Drop bad if/then schema
Eric Dumazet <edumazet(a)google.com>
inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently
Herve Codina <herve.codina(a)bootlin.com>
mtd: rawnand: Fix nand_choose_best_timings() on unsupported interface
Herve Codina <herve.codina(a)bootlin.com>
mtd: rawnand: Fix nand_erase_op delay
Alaa Hleihel <alaa(a)nvidia.com>
RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow
Pavel Skripkin <paskripkin(a)gmail.com>
RDMA: Fix use-after-free in rxe_queue_cleanup
Wilken Gottwalt <wilken.gottwalt(a)posteo.net>
hwmon: (corsair-psu) fix plain integer used as NULL pointer
Tadeusz Struk <tadeusz.struk(a)linaro.org>
nfc: fix segfault in nfc_genl_dump_devices_done
-------------
Diffstat:
.../bindings/media/nxp,imx7-mipi-csi2.yaml | 14 +-----
Makefile | 4 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 6 +++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 7 ++-
arch/s390/lib/test_unwind.c | 5 +-
drivers/block/loop.c | 2 +-
drivers/char/agp/parisc-agp.c | 6 +--
drivers/clk/qcom/gcc-sm6125.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 9 ----
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c | 8 ++++
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 18 +++----
drivers/gpu/drm/msm/dp/dp_aux.c | 17 +++++++
drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +
drivers/gpu/drm/msm/msm_gem_submit.c | 1 +
drivers/hwmon/corsair-psu.c | 2 +-
drivers/i2c/busses/i2c-rk3x.c | 4 +-
drivers/i2c/busses/i2c-virtio.c | 32 +++++--------
drivers/infiniband/hw/irdma/hw.c | 7 ++-
drivers/infiniband/hw/irdma/main.h | 1 +
drivers/infiniband/hw/irdma/pble.c | 8 ++--
drivers/infiniband/hw/irdma/pble.h | 1 -
drivers/infiniband/hw/irdma/utils.c | 24 +++++++---
drivers/infiniband/hw/irdma/verbs.c | 23 +++++++--
drivers/infiniband/hw/irdma/verbs.h | 2 +
drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 +--
drivers/infiniband/hw/mlx5/mr.c | 26 +++++-----
drivers/infiniband/sw/rxe/rxe_qp.c | 1 +
drivers/mtd/nand/raw/nand_base.c | 6 +--
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 2 +
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 6 +--
drivers/net/vmxnet3/vmxnet3_drv.c | 13 ++---
drivers/staging/most/dim2/dim2.c | 55 ++++++++++++----------
drivers/tty/serial/fsl_lpuart.c | 1 +
fs/fuse/dir.c | 8 ++++
fs/fuse/file.c | 15 ++++++
fs/fuse/fuse_i.h | 1 +
fs/fuse/inode.c | 3 ++
fs/netfs/read_helper.c | 15 ++----
kernel/trace/tracing_map.c | 3 ++
net/ipv4/inet_connection_sock.c | 2 +-
net/netlink/af_netlink.c | 5 ++
net/nfc/netlink.c | 6 ++-
sound/pci/hda/hda_intel.c | 12 ++++-
sound/pci/hda/patch_hdmi.c | 3 +-
tools/perf/builtin-inject.c | 2 +-
tools/perf/util/bpf_skel/bperf.h | 14 ------
tools/perf/util/bpf_skel/bperf_follower.bpf.c | 16 +++++--
tools/perf/util/bpf_skel/bperf_leader.bpf.c | 16 +++++--
52 files changed, 284 insertions(+), 180 deletions(-)
After commit 9786b65bc61ac ("drm/ttm: fix mmap refcounting"),
drm_gem_mmap_obj() takes a reference of the passed drm_gem_object at the
beginning of the function to safely dereference the mmap offset pointer,
and releases it at the end if an error occurred. However, the cma and
shmem helpers also release that reference in case of an error, which
unbalances the reference count and leads to the panic reported by
syzbot.
Don't release the reference in drm_gem_mmap_obj() if the mmap method was
called and it returned an error, and uniformly apply the same behavior of
the cma and shmem helpers to the ttm helper (release the reference in the
helper, not in the caller, when an error occurs).
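The resulting ownership rule can be sketched with a toy reference
counter. struct obj, helper_mmap() and mmap_obj() are hypothetical
stand-ins, and the sketch ignores the ttm-specific detail that the ttm
helper also drops the gem reference on success:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified refcounted object. */
struct obj {
	int refcount;
};

static void obj_get(struct obj *o)
{
	o->refcount++;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* Hypothetical helper: on failure it drops the caller's reference itself. */
static int helper_mmap(struct obj *o, int fail)
{
	if (fail) {
		obj_put(o);
		return -EINVAL;
	}
	return 0;
}

static int mmap_obj(struct obj *o, int fail)
{
	int ret;

	obj_get(o);		/* reference held on behalf of the mapping */
	ret = helper_mmap(o, fail);
	if (ret)
		return ret;	/* helper already dropped it; no second put */

	return 0;		/* reference stays with the mapping */
}
```

If mmap_obj() also called obj_put() on the error path, the count would
drop below its starting value, which is the imbalance described above.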
Cc: stable(a)vger.kernel.org
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reported-by: syzbot+c8ae65286134dd1b800d(a)syzkaller.appspotmail.com
Fixes: 9786b65bc61ac ("drm/ttm: fix mmap refcounting")
---
drivers/gpu/drm/drm_gem.c | 3 ++-
drivers/gpu/drm/drm_gem_ttm_helper.c | 4 +---
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 4dcdec6487bb..7264a1a7a8d2 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1049,8 +1049,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
if (obj->funcs->mmap) {
ret = obj->funcs->mmap(obj, vma);
+ /* All helpers call drm_gem_object_put() */
if (ret)
- goto err_drm_gem_object_put;
+ return ret;
WARN_ON(!(vma->vm_flags & VM_DONTEXPAND));
} else {
if (!vma->vm_ops) {
diff --git a/drivers/gpu/drm/drm_gem_ttm_helper.c b/drivers/gpu/drm/drm_gem_ttm_helper.c
index ecf3d2a54a98..c44bfdbb722d 100644
--- a/drivers/gpu/drm/drm_gem_ttm_helper.c
+++ b/drivers/gpu/drm/drm_gem_ttm_helper.c
@@ -101,8 +101,6 @@ int drm_gem_ttm_mmap(struct drm_gem_object *gem,
int ret;
ret = ttm_bo_mmap_obj(vma, bo);
- if (ret < 0)
- return ret;
/*
* ttm has its own object refcounting, so drop gem reference
@@ -110,7 +108,7 @@ int drm_gem_ttm_mmap(struct drm_gem_object *gem,
*/
drm_gem_object_put(gem);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(drm_gem_ttm_mmap);
--
2.32.0