From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
BugLink: https://bugs.launchpad.net/bugs/1950644
commit 032146cda85566abcd1c4884d9d23e4e30a07e9a upstream.
If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.
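As an illustration (not part of the patch), here is a minimal userspace sketch of the trigger. It assumes a file named ./dummy.ko exists and that the libc defines SYS_finit_module. On an unpatched kernel the write-only fd reaches __kernel_read() and trips the warning; with this fix the syscall fails cleanly with -EBADF instead.

/* Illustrative reproducer sketch, not from the patch. Assumes ./dummy.ko
 * exists; the only point is that the fd lacks FMODE_READ. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
        int fd = open("./dummy.ko", O_WRONLY);  /* no read access on purpose */

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Before the fix: WARN_ON_ONCE() fires in __kernel_read().
         * After the fix: kernel_read_file_from_fd() returns -EBADF early. */
        if (syscall(SYS_finit_module, fd, "", 0) < 0)
                perror("finit_module");

        close(fd);
        return 0;
}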
Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 0f218ba4c8aac7041cd8b81a5a893b0d121e6316 linux-5.4.y)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
fs/exec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index eeba096e8a38..006f7fb40b96 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1000,7 +1000,7 @@ int kernel_read_file_from_fd(int fd, void **buf, loff_t *size, loff_t max_size,
struct fd f = fdget(fd);
int ret = -EBADF;
- if (!f.file)
+ if (!f.file || !(f.file->f_mode & FMODE_READ))
goto out;
ret = kernel_read_file(f.file, buf, size, max_size, id);
--
2.32.0
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
BugLink: https://bugs.launchpad.net/bugs/1950644
If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.
Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(cherry picked from commit 032146cda85566abcd1c4884d9d23e4e30a07e9a)
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
fs/kernel_read_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 90d255fbdd9b..c84d87f558cb 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -178,7 +178,7 @@ int kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
struct fd f = fdget(fd);
int ret = -EBADF;
- if (!f.file)
+ if (!f.file || !(f.file->f_mode & FMODE_READ))
goto out;
ret = kernel_read_file(f.file, offset, buf, buf_size, file_size, id);
--
2.32.0
Currently, the checks for whether VT-d PI can be used look at the current
status of the feature on the current vCPU, or they more or less pick
vCPU 0 when a specific vCPU is not available.
However, these checks do not attempt to synchronize with changes to
the IRTE. In particular, there is no path that updates the IRTE when
APICv is re-activated on vCPU 0, and there is no path to wake up a CPU
that has APICv disabled if the wakeup occurs because of an IRTE
that points to a posted interrupt.
To fix this, always go through the VT-d PI path as long as there are
assigned devices and APICv is available on both the host and the VM side.
Since the relevant condition was copied over three times, take the hint
and factor it into a separate function.
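For reference, here is the factored-out helper exactly as added by the hunk below, with one comment per condition (the comments are mine, not part of the patch):

static bool vmx_can_use_vtd_pi(struct kvm *kvm)
{
        return kvm_arch_has_assigned_device(kvm) &&    /* a device is assigned to the VM  */
               irq_remapping_cap(IRQ_POSTING_CAP) &&   /* the IOMMU can post interrupts   */
               irqchip_in_kernel(kvm) &&               /* the VM uses the in-kernel APIC  */
               enable_apicv;                           /* APICv is enabled on the host    */
}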
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/vmx/posted_intr.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..b64dd1374ed9 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -5,6 +5,7 @@
#include <asm/cpu.h>
#include "lapic.h"
+#include "irq.h"
#include "posted_intr.h"
#include "trace.h"
#include "vmx.h"
@@ -77,13 +78,18 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
pi_set_on(pi_desc);
}
+static bool vmx_can_use_vtd_pi(struct kvm *kvm)
+{
+ return kvm_arch_has_assigned_device(kvm) &&
+ irq_remapping_cap(IRQ_POSTING_CAP) &&
+ irqchip_in_kernel(kvm) && enable_apicv;
+}
+
void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
{
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return;
/* Set SN when the vCPU is preempted */
@@ -141,9 +147,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
struct pi_desc old, new;
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
- if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(vcpu))
+ if (!vmx_can_use_vtd_pi(vcpu->kvm))
return 0;
WARN_ON(irqs_disabled());
@@ -270,9 +274,7 @@ int pi_update_irte(struct kvm *kvm, unsigned int host_irq, uint32_t guest_irq,
struct vcpu_data vcpu_info;
int idx, ret = 0;
- if (!kvm_arch_has_assigned_device(kvm) ||
- !irq_remapping_cap(IRQ_POSTING_CAP) ||
- !kvm_vcpu_apicv_active(kvm->vcpus[0]))
+ if (!vmx_can_use_vtd_pi(kvm))
return 0;
idx = srcu_read_lock(&kvm->irq_srcu);
--
2.27.0
Commit 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains
exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the
'nomap' domains still in use under the powerpc arch. With this new
flag, the revmap_tree of the IRQ domain is not used anymore. This
change broke support for shared LSIs [1] in the XIVE driver, which
relied on a lookup in the revmap_tree to query previously mapped
interrupts. Linux now creates two distinct IRQ mappings for the same HW
IRQ, which can lead to unexpected behavior in drivers.
The XIVE IRQ domain is not a direct-mapping domain and its HW IRQ
number space is rather large (1M per socket on POWER9 and POWER10), so
change the XIVE driver to use a 'tree' domain type instead.
[1] For instance, a linux KVM guest with virtio-rng and virtio-balloon
devices.
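A minimal sketch of why a 'tree' domain restores this behaviour (generic irqdomain API, not code from this patch): with a revmap-backed domain, a driver can look up an existing Linux IRQ for a HW IRQ before creating a second mapping, which is exactly what shared LSIs depend on.

#include <linux/irqdomain.h>

/* Sketch only: generic irqdomain calls, not actual XIVE driver code. */
static unsigned int map_shared_lsi(struct irq_domain *domain,
                                   irq_hw_number_t hwirq)
{
        unsigned int virq = irq_find_mapping(domain, hwirq);

        if (!virq)                      /* first user: create the mapping */
                virq = irq_create_mapping(domain, hwirq);

        return virq;                    /* later users share the same virq */
}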
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org # v5.14+
Fixes: 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive")
Signed-off-by: Cédric Le Goater <clg(a)kaod.org>
---
Marc,
The Fixes tag is there because the patch in question revealed that
something was broken in XIVE; genirq is not at fault. However, I
don't know about PS3 and Cell. It may be less critical for them for now.
arch/powerpc/sysdev/xive/common.c | 3 +--
arch/powerpc/sysdev/xive/Kconfig | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index fed6fd16c8f4..9d0f0fe25598 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1536,8 +1536,7 @@ static const struct irq_domain_ops xive_irq_domain_ops = {
static void __init xive_init_host(struct device_node *np)
{
- xive_irq_domain = irq_domain_add_nomap(np, XIVE_MAX_IRQ,
- &xive_irq_domain_ops, NULL);
+ xive_irq_domain = irq_domain_add_tree(np, &xive_irq_domain_ops, NULL);
if (WARN_ON(xive_irq_domain == NULL))
return;
irq_set_default_host(xive_irq_domain);
diff --git a/arch/powerpc/sysdev/xive/Kconfig b/arch/powerpc/sysdev/xive/Kconfig
index 97796c6b63f0..785c292d104b 100644
--- a/arch/powerpc/sysdev/xive/Kconfig
+++ b/arch/powerpc/sysdev/xive/Kconfig
@@ -3,7 +3,6 @@ config PPC_XIVE
bool
select PPC_SMP_MUXED_IPI
select HARDIRQS_SW_RESEND
- select IRQ_DOMAIN_NOMAP
config PPC_XIVE_NATIVE
bool
--
2.31.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
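As a side note, here is a simplified sketch of how the optional stack size flows into the storage struct. This is not the real cpu_entry_area.h (the actual ESTACKS_MEMBERS() also declares guard members and the DB/MCE stacks, and EXCEPTION_STKSZ is defined elsewhere); it only shows why VC_EXCEPTION_STKSZ being 0 removes the #VC stacks from a CONFIG_AMD_MEM_ENCRYPT=n build:

/* Simplified illustration with assumed values, not the real header. */
#define EXCEPTION_STKSZ         (2 * 4096)      /* two pages after the commit in Fixes */

#ifdef CONFIG_AMD_MEM_ENCRYPT
#define VC_EXCEPTION_STKSZ      EXCEPTION_STKSZ
#else
#define VC_EXCEPTION_STKSZ      0
#endif

#define ESTACKS_MEMBERS(guardsize, optional_stack_size)         \
        char DF_stack[EXCEPTION_STKSZ];                         \
        char NMI_stack[EXCEPTION_STKSZ];                        \
        char VC_stack[optional_stack_size];                     \
        char VC2_stack[optional_stack_size];

/* With CONFIG_AMD_MEM_ENCRYPT=n the two #VC stacks occupy zero bytes. */
struct exception_stacks {
        ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};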
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 541ac97186d9ea88491961a46284de3603c914fd Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 1 Oct 2021 21:41:20 +0200
Subject: [PATCH] x86/sev: Make the #VC exception stacks part of the default
stacks storage
The size of the exception stacks was increased by the commit in Fixes,
resulting in stack sizes greater than a page in size. The #VC exception
handling was only mapping the first (bottom) page, resulting in an
SEV-ES guest failing to boot.
Make the #VC exception stacks part of the default exception stacks
storage and allocate them with a CONFIG_AMD_MEM_ENCRYPT=y .config. Map
them only when a SEV-ES guest has been detected.
Rip out the custom VC stacks mapping and storage code.
[ bp: Steal and adapt Tom's commit message. ]
Fixes: 7fae4c24a2b8 ("x86: Increase exception stack sizes")
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Tested-by: Brijesh Singh <brijesh.singh(a)amd.com>
Link: https://lkml.kernel.org/r/YVt1IMjIs7pIZTRR@zn.tnic
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 3d52b094850a..dd5ea1bdf04c 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -10,6 +10,12 @@
#ifdef CONFIG_X86_64
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VC_EXCEPTION_STKSZ EXCEPTION_STKSZ
+#else
+#define VC_EXCEPTION_STKSZ 0
+#endif
+
/* Macro to enforce the same ordering and stack sizes */
#define ESTACKS_MEMBERS(guardsize, optional_stack_size) \
char DF_stack_guard[guardsize]; \
@@ -28,7 +34,7 @@
/* The exception stacks' physical storage. No guard pages required */
struct exception_stacks {
- ESTACKS_MEMBERS(0, 0)
+ ESTACKS_MEMBERS(0, VC_EXCEPTION_STKSZ)
};
/* The effective cpu entry area mapping with guard pages. */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 53a6837d354b..4d0d1c2b65e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -46,16 +46,6 @@ static struct ghcb __initdata *boot_ghcb;
struct sev_es_runtime_data {
struct ghcb ghcb_page;
- /* Physical storage for the per-CPU IST stack of the #VC handler */
- char ist_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
- /*
- * Physical storage for the per-CPU fall-back stack of the #VC handler.
- * The fall-back stack is used when it is not safe to switch back to the
- * interrupted stack in the #VC entry code.
- */
- char fallback_stack[EXCEPTION_STKSZ] __aligned(PAGE_SIZE);
-
/*
* Reserve one page per CPU as backup storage for the unencrypted GHCB.
* It is needed when an NMI happens while the #VC handler uses the real
@@ -99,27 +89,6 @@ DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
/* Needed in vc_early_forward_exception */
void do_early_exception(struct pt_regs *regs, int trapnr);
-static void __init setup_vc_stacks(int cpu)
-{
- struct sev_es_runtime_data *data;
- struct cpu_entry_area *cea;
- unsigned long vaddr;
- phys_addr_t pa;
-
- data = per_cpu(runtime_data, cpu);
- cea = get_cpu_entry_area(cpu);
-
- /* Map #VC IST stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC);
- pa = __pa(data->ist_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-
- /* Map VC fall-back stack */
- vaddr = CEA_ESTACK_BOT(&cea->estacks, VC2);
- pa = __pa(data->fallback_stack);
- cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
-}
-
static __always_inline bool on_vc_stack(struct pt_regs *regs)
{
unsigned long sp = regs->sp;
@@ -787,7 +756,6 @@ void __init sev_es_init_vc_handling(void)
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
init_ghcb(cpu);
- setup_vc_stacks(cpu);
}
sev_es_setup_play_dead();
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index f5e1e60c9095..6c2f1b76a0b6 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -110,6 +110,13 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
cea_map_stack(NMI);
cea_map_stack(DB);
cea_map_stack(MCE);
+
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+ if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
+ cea_map_stack(VC);
+ cea_map_stack(VC2);
+ }
+ }
}
#else
static inline void percpu_setup_exception_stacks(unsigned int cpu)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
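In C terms, the intent of the assembly change below is roughly the following (a sketch only; the _TIF_* names follow arch/parisc/include/asm/thread_info.h, and the exact contents of _TIF_SYSCALL_TRACE_MASK may vary between kernel versions):

/* Sketch only; the real test is done in assembly in syscall_restore. */
static bool need_rfi_exit(unsigned long ti_flags)
{
        /*
         * Old (too broad): any bit in _TIF_SYSCALL_TRACE_MASK, including the
         * syscall tracepoint flag, sent us down syscall_restore_rfi:
         *
         *      return ti_flags & _TIF_SYSCALL_TRACE_MASK;
         */

        /* New: only single/block stepping needs the rfi exit path. */
        return ti_flags & (_TIF_SINGLESTEP | _TIF_BLOCKSTEP);
}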
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. However, the test used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using _TIF_SINGLESTEP and _TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in qemu and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ec18fc7831e7d79e2d536dd1f3bc0d3ba425e8a Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)stackframe.org>
Date: Sat, 13 Nov 2021 20:41:17 +0100
Subject: [PATCH] parisc/entry: fix trace test in syscall exit path
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return")
fixed testing of TI_FLAGS. This uncovered a bug in the test mask.
syscall_restore_rfi is only used when the kernel needs to exit to
userspace with single or block stepping and the recovery counter
enabled. The test however used _TIF_SYSCALL_TRACE_MASK, which
includes a lot of bits that shouldn't be tested here.
Fix this by using TIF_SINGLESTEP and TIF_BLOCKSTEP directly.
I encountered this bug by enabling syscall tracepoints, both in QEMU and
on real hardware. As soon as I enabled the tracepoint (sys_exit_read,
but I guess it doesn't really matter which one), I got random page
faults in userspace almost immediately.
Signed-off-by: Sven Schnelle <svens(a)stackframe.org>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 57944d6f9ebb..88c188a965d8 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -1805,7 +1805,7 @@ syscall_restore:
/* Are we being ptraced? */
LDREG TASK_TI_FLAGS(%r1),%r19
- ldi _TIF_SYSCALL_TRACE_MASK,%r2
+ ldi _TIF_SINGLESTEP|_TIF_BLOCKSTEP,%r2
and,COND(=) %r19,%r2,%r0
b,n syscall_restore_rfi
In vcpu_load_eoi_exitmap(), the eoi_exit_bitmap[4] array is currently
initialized only when a Hyper-V context is available; in the other path it
is passed to kvm_x86_ops.load_eoi_exitmap() directly from the stack,
uninitialized. This can cause unexpected interrupt delivery/handling
issues, e.g. an *old* Linux kernel that relies on the PIT to do clock
calibration on KVM might randomly fail to boot.
Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V
context is not available.
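For illustration only (this is not part of the patch), a minimal userspace C
sketch of the hazard described above: handing a consumer an on-stack array
that nothing has initialized. load_bitmap() is a hypothetical stand-in for
kvm_x86_ops.load_eoi_exitmap().

    #include <stdio.h>
    #include <string.h>

    #define WORDS 4

    /* Hypothetical stand-in for kvm_x86_ops.load_eoi_exitmap(): just dumps the words. */
    static void load_bitmap(const unsigned long *bm)
    {
            int i;

            for (i = 0; i < WORDS; i++)
                    printf("word %d = %#lx\n", i, bm[i]);
    }

    int main(void)
    {
            unsigned long ioapic_handled[WORDS] = { 0x1, 0x0, 0x0, 0x0 };
            unsigned long eoi_exit_bitmap[WORDS];

            /* Buggy shape: calling load_bitmap(eoi_exit_bitmap) right here would hand
             * over whatever happens to be on the stack, since nothing set the array. */

            /* Fixed shape: feed the consumer from known-good data instead. */
            memcpy(eoi_exit_bitmap, ioapic_handled, sizeof(eoi_exit_bitmap));
            load_bitmap(eoi_exit_bitmap);
            return 0;
    }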
Fixes: f2bc14b69c38 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context")
Cc: stable(a)vger.kernel.org
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Signed-off-by: Huang Le <huangle1(a)jd.com>
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc7eb5fddfd3..26466f94e31a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9547,12 +9547,16 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
if (!kvm_apic_hw_enabled(vcpu->arch.apic))
return;
- if (to_hv_vcpu(vcpu))
+ if (to_hv_vcpu(vcpu)) {
bitmap_or((ulong *)eoi_exit_bitmap,
vcpu->arch.ioapic_handled_vectors,
to_hv_synic(vcpu)->vec_bitmap, 256);
+ static_call(kvm_x86_load_eoi_exitmap)(vcpu, eoi_exit_bitmap);
+ return;
+ }
- static_call(kvm_x86_load_eoi_exitmap)(vcpu, eoi_exit_bitmap);
+ static_call(kvm_x86_load_eoi_exitmap)(
+ vcpu, (u64 *)vcpu->arch.ioapic_handled_vectors);
}
void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
Accessing netdev after free_netdev() causes a use-after-free bug.
Move the debug log before the free_netdev() call to avoid it.
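As an aside, a minimal userspace sketch (not taken from the driver) of this
bug class and of the fix's ordering: log while the object is still alive,
then free it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct fake_netdev { char name[16]; };   /* stand-in for struct net_device */

    int main(void)
    {
            struct fake_netdev *dev = malloc(sizeof(*dev));

            if (!dev)
                    return 1;
            strcpy(dev->name, "eth0");

            /* Correct order, mirroring the patch: log first, then free. */
            printf("Removed interface %s\n", dev->name);
            free(dev);

            /* The buggy order would be free(dev) followed by printf("%s", dev->name),
             * i.e. a use-after-free of the name field. */
            return 0;
    }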
Cc: stable(a)vger.kernel.org # v4.19+
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
---
Note about the Fixes: tag: the commit that introduced this bug predates
the driver's move out of staging, so I guess a Fixes: tag cannot be used
here. Please let me know if I am wrong.
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
@Dan, is there a smatch checker for straightforward use-after-free bugs,
like accessing a pointer after free was called? I think adding
free_netdev() to the check list might be a good idea.
I've skimmed through the smatch source and didn't find one, so could you
please point it out if it exists?
With regards,
Pavel Skripkin
---
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index e0c3c58e2ac7..abd833d94eb3 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -4540,10 +4540,10 @@ static int dpaa2_eth_remove(struct fsl_mc_device *ls_dev)
fsl_mc_portal_free(priv->mc_io);
- free_netdev(net_dev);
-
dev_dbg(net_dev->dev.parent, "Removed interface %s\n", net_dev->name);
+ free_netdev(net_dev);
+
return 0;
}
--
2.33.1
The patch titled
Subject: mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
has been added to the -mm tree. Its filename is
mm-damon-dbgfs-fix-missed-use-of-damon_dbgfs_lock.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-damon-dbgfs-fix-missed-use-of-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-damon-dbgfs-fix-missed-use-of-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
DAMON debugfs is supposed to protect dbgfs_ctxs, dbgfs_nr_ctxs, and
dbgfs_dirs using damon_dbgfs_lock. However, some of the code is accessing
the variables without the protection. This commit fixes it by protecting
all such accesses.
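For readers unfamiliar with the pattern, a minimal sketch (not the actual
dbgfs code) of what "protecting all such accesses" means here: take
damon_dbgfs_lock around every read or write of the shared context array,
and make sure early-error exits unlock as well.

    #include <linux/mutex.h>
    #include <linux/errno.h>

    struct damon_ctx;                        /* opaque here; defined elsewhere */

    static DEFINE_MUTEX(damon_dbgfs_lock);
    static struct damon_ctx **dbgfs_ctxs;
    static int dbgfs_nr_ctxs;

    static int dbgfs_touch_ctxs(void)
    {
            int i, ret = 0;

            mutex_lock(&damon_dbgfs_lock);
            for (i = 0; i < dbgfs_nr_ctxs; i++) {
                    if (!dbgfs_ctxs[i]) {
                            ret = -EINVAL;  /* early-error paths must unlock too */
                            goto out;
                    }
            }
    out:
            mutex_unlock(&damon_dbgfs_lock);
            return ret;
    }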
Link: https://lkml.kernel.org/r/20211110145758.16558-3-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/dbgfs.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-fix-missed-use-of-damon_dbgfs_lock
+++ a/mm/damon/dbgfs.c
@@ -877,12 +877,14 @@ static ssize_t dbgfs_monitor_on_write(st
return -EINVAL;
}
+ mutex_lock(&damon_dbgfs_lock);
if (!strncmp(kbuf, "on", count)) {
int i;
for (i = 0; i < dbgfs_nr_ctxs; i++) {
if (damon_targets_empty(dbgfs_ctxs[i])) {
kfree(kbuf);
+ mutex_unlock(&damon_dbgfs_lock);
return -EINVAL;
}
}
@@ -892,6 +894,7 @@ static ssize_t dbgfs_monitor_on_write(st
} else {
ret = -EINVAL;
}
+ mutex_unlock(&damon_dbgfs_lock);
if (!ret)
ret = count;
@@ -944,15 +947,16 @@ static int __init __damon_dbgfs_init(voi
static int __init damon_dbgfs_init(void)
{
- int rc;
+ int rc = -ENOMEM;
+ mutex_lock(&damon_dbgfs_lock);
dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL);
if (!dbgfs_ctxs)
- return -ENOMEM;
+ goto out;
dbgfs_ctxs[0] = dbgfs_new_ctx();
if (!dbgfs_ctxs[0]) {
kfree(dbgfs_ctxs);
- return -ENOMEM;
+ goto out;
}
dbgfs_nr_ctxs = 1;
@@ -963,6 +967,8 @@ static int __init damon_dbgfs_init(void)
pr_err("%s: dbgfs init failed\n", __func__);
}
+out:
+ mutex_unlock(&damon_dbgfs_lock);
return rc;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-dbgfs-use-__gfp_nowarn-for-user-specified-size-buffer-allocation.patch
mm-damon-dbgfs-fix-missed-use-of-damon_dbgfs_lock.patch
The patch titled
Subject: mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
has been added to the -mm tree. Its filename is
mm-damon-dbgfs-use-__gfp_nowarn-for-user-specified-size-buffer-allocation.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-damon-dbgfs-use-__gfp_nowarn-f…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-damon-dbgfs-use-__gfp_nowarn-f…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
Patch series "DAMON fixes".
This patch (of 2):
DAMON users can trigger the warning below in '__alloc_pages()' by invoking
write() on some DAMON debugfs files with an arbitrarily high count
argument, because the DAMON debugfs interface allocates some buffers based
on the user-specified 'count'.
if (unlikely(order >= MAX_ORDER)) {
WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
return NULL;
}
Because the DAMON debugfs interface code already checks for failure of the
'kmalloc()', this commit simply suppresses the warning by adding the
'__GFP_NOWARN' flag.
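For context, a hedged userspace sketch of the kind of trigger described
above: a write() with an arbitrarily large count to one of the DAMON
debugfs files. The file path is an assumption (adjust to where debugfs is
mounted); the point is that the oversized count reaches kmalloc() before
any data is copied.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            static char buf[4096];
            /* Assumed path; DAMON's debugfs files live under <debugfs>/damon/. */
            int fd = open("/sys/kernel/debug/damon/init_regions", O_WRONLY);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            /* Oversized count: before the fix, the kernel's kmalloc(count) can exceed
             * MAX_ORDER and emit the page-allocation warning; after the fix, the
             * write simply fails with -ENOMEM, silently. */
            if (write(fd, buf, 1UL << 30) < 0)
                    perror("write");
            close(fd);
            return 0;
    }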
Link: https://lkml.kernel.org/r/20211110145758.16558-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20211110145758.16558-2-sj@kernel.org
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/dbgfs.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/damon/dbgfs.c~mm-damon-dbgfs-use-__gfp_nowarn-for-user-specified-size-buffer-allocation
+++ a/mm/damon/dbgfs.c
@@ -32,7 +32,7 @@ static char *user_input_str(const char _
if (*ppos)
return ERR_PTR(-EINVAL);
- kbuf = kmalloc(count + 1, GFP_KERNEL);
+ kbuf = kmalloc(count + 1, GFP_KERNEL | __GFP_NOWARN);
if (!kbuf)
return ERR_PTR(-ENOMEM);
@@ -133,7 +133,7 @@ static ssize_t dbgfs_schemes_read(struct
char *kbuf;
ssize_t len;
- kbuf = kmalloc(count, GFP_KERNEL);
+ kbuf = kmalloc(count, GFP_KERNEL | __GFP_NOWARN);
if (!kbuf)
return -ENOMEM;
@@ -452,7 +452,7 @@ static ssize_t dbgfs_init_regions_read(s
char *kbuf;
ssize_t len;
- kbuf = kmalloc(count, GFP_KERNEL);
+ kbuf = kmalloc(count, GFP_KERNEL | __GFP_NOWARN);
if (!kbuf)
return -ENOMEM;
@@ -578,7 +578,7 @@ static ssize_t dbgfs_kdamond_pid_read(st
char *kbuf;
ssize_t len;
- kbuf = kmalloc(count, GFP_KERNEL);
+ kbuf = kmalloc(count, GFP_KERNEL | __GFP_NOWARN);
if (!kbuf)
return -ENOMEM;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-dbgfs-use-__gfp_nowarn-for-user-specified-size-buffer-allocation.patch
mm-damon-dbgfs-fix-missed-use-of-damon_dbgfs_lock.patch
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 86432a6dca9bed79111990851df5756d3eb5f57c Mon Sep 17 00:00:00 2001
From: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Date: Thu, 4 Nov 2021 02:20:06 +0800
Subject: [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters
There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL
before actual I/O submission. Thus, the decompression chain can be
extended if the following pcluster chain hooks such tail pcluster.
As the related comment mentioned, if some page is made of a hooked
pcluster and another followed pcluster, it can be reused for in-place
I/O (since I/O should be submitted anyway):
_______________________________________________________________
| tail (partial) page | head (partial) page |
|_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________|
However, it's by no means safe to reuse such pages as pagevec, since such
PRIMARY_HOOKED pclusters can finally move into the bypass chain without
I/O submission. It's somewhat hard to reproduce with LZ4; I only found it
(as a general protection fault) by ro_fsstressing an LZMA image for a long
time.
I'm going to actively clean up the related code together with the
multi-page folio adaptation in the next few months. For now, let's address
it directly for easier backporting.
Call trace for reference:
z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs]
z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs]
z_erofs_runqueue+0x5f3/0x840 [erofs]
z_erofs_readahead+0x1e8/0x320 [erofs]
read_pages+0x91/0x270
page_cache_ra_unbounded+0x18b/0x240
filemap_get_pages+0x10a/0x5f0
filemap_read+0xa9/0x330
new_sync_read+0x11b/0x1a0
vfs_read+0xf1/0x190
Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..eb51df4a9f77 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -373,8 +373,8 @@ static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
/* callers must be with collection lock held */
static int z_erofs_attach_page(struct z_erofs_collector *clt,
- struct page *page,
- enum z_erofs_page_type type)
+ struct page *page, enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
int ret;
@@ -384,9 +384,9 @@ static int z_erofs_attach_page(struct z_erofs_collector *clt,
z_erofs_try_inplace_io(clt, page))
return 0;
- ret = z_erofs_pagevec_enqueue(&clt->vector, page, type);
+ ret = z_erofs_pagevec_enqueue(&clt->vector, page, type,
+ pvec_safereuse);
clt->cl->vcnt += (unsigned int)ret;
-
return ret ? 0 : -EAGAIN;
}
@@ -729,7 +729,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
tight &= (clt->mode >= COLLECT_PRIMARY_FOLLOWED);
retry:
- err = z_erofs_attach_page(clt, page, page_type);
+ err = z_erofs_attach_page(clt, page, page_type,
+ clt->mode >= COLLECT_PRIMARY_FOLLOWED);
/* should allocate an additional short-lived page for pagevec */
if (err == -EAGAIN) {
struct page *const newpage =
@@ -737,7 +738,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE);
err = z_erofs_attach_page(clt, newpage,
- Z_EROFS_PAGE_TYPE_EXCLUSIVE);
+ Z_EROFS_PAGE_TYPE_EXCLUSIVE, true);
if (!err)
goto retry;
}
diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
index dfd7fe0503bb..b05464f4a808 100644
--- a/fs/erofs/zpvec.h
+++ b/fs/erofs/zpvec.h
@@ -106,11 +106,18 @@ static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
struct page *page,
- enum z_erofs_page_type type)
+ enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
- if (!ctor->next && type)
- if (ctor->index + 1 == ctor->nr)
+ if (!ctor->next) {
+ /* some pages cannot be reused as pvec safely without I/O */
+ if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse)
+ type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED;
+
+ if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
+ ctor->index + 1 == ctor->nr)
return false;
+ }
if (ctor->index >= ctor->nr)
z_erofs_pagevec_ctor_pagedown(ctor, false);
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0706f74f719e6e72c3a862ab2990796578fa73cc Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <masahiroy(a)kernel.org>
Date: Wed, 10 Nov 2021 00:01:45 +0900
Subject: [PATCH] MIPS: fix *-pkg builds for loongson2ef platform
Since commit 805b2e1d427a ("kbuild: include Makefile.compiler only when
compiler is needed"), package builds for the loongson2f platform fail.
$ make ARCH=mips CROSS_COMPILE=mips64-linux- lemote2f_defconfig bindeb-pkg
[ snip ]
sh ./scripts/package/builddeb
arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop.
cp: cannot stat '': No such file or directory
make[5]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[4]: *** [Makefile:1558: intdeb-pkg] Error 2
make[3]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[2]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make[1]: *** [Makefile:1558: bindeb-pkg] Error 2
make: *** [Makefile:350: __build_one_by_one] Error 2
The reason is that "make image_name" fails.
$ make ARCH=mips CROSS_COMPILE=mips64-linux- image_name
arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop.
In general, adding $(error ...) in the parse stage is troublesome,
and it is pointless to check toolchains even if we are not building
anything. Do not include Kbuild.platforms in such cases.
Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed")
Reported-by: Jason Self <jason(a)bluehome.net>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index ea3cd080a1c7..f7b58da2f388 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -254,7 +254,9 @@ endif
#
# Board-dependent options and extra files
#
+ifdef need-compiler
include $(srctree)/arch/mips/Kbuild.platforms
+endif
ifdef CONFIG_PHYSICAL_START
load-y = $(CONFIG_PHYSICAL_START)
commit 0706f74f719e6e72c3a862ab2990796578fa73cc upstream.
Since commit 805b2e1d427a ("kbuild: include Makefile.compiler only when
compiler is needed"), package builds for the loongson2f platform fail.
$ make ARCH=mips CROSS_COMPILE=mips64-linux- lemote2f_defconfig bindeb-pkg
[ snip ]
sh ./scripts/package/builddeb
arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop.
cp: cannot stat '': No such file or directory
make[5]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[4]: *** [Makefile:1558: intdeb-pkg] Error 2
make[3]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[2]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make[1]: *** [Makefile:1558: bindeb-pkg] Error 2
make: *** [Makefile:350: __build_one_by_one] Error 2
The reason is that "make image_name" fails.
$ make ARCH=mips CROSS_COMPILE=mips64-linux- image_name
arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop.
In general, adding $(error ...) in the parse stage is troublesome,
and it is pointless to check toolchains even if we are not building
anything. Do not include Kbuild.platforms in such cases.
Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed")
Reported-by: Jason Self <jason(a)bluehome.net>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
---
arch/mips/Makefile | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 653befc1b176..0dfef0beaaaa 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -254,7 +254,9 @@ endif
#
# Board-dependent options and extra files
#
+ifdef need-compiler
include arch/mips/Kbuild.platforms
+endif
ifdef CONFIG_PHYSICAL_START
load-y = $(CONFIG_PHYSICAL_START)
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 86432a6dca9bed79111990851df5756d3eb5f57c Mon Sep 17 00:00:00 2001
From: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Date: Thu, 4 Nov 2021 02:20:06 +0800
Subject: [PATCH] erofs: fix unsafe pagevec reuse of hooked pclusters
There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL
before actual I/O submission. Thus, the decompression chain can be
extended if the following pcluster chain hooks such tail pcluster.
As the related comment mentioned, if some page is made of a hooked
pcluster and another followed pcluster, it can be reused for in-place
I/O (since I/O should be submitted anyway):
_______________________________________________________________
| tail (partial) page | head (partial) page |
|_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________|
However, it's by no means safe to reuse such pages as pagevec, since such
PRIMARY_HOOKED pclusters can finally move into the bypass chain without
I/O submission. It's somewhat hard to reproduce with LZ4; I only found it
(as a general protection fault) by ro_fsstressing an LZMA image for a long
time.
I'm going to actively clean up the related code together with the
multi-page folio adaptation in the next few months. For now, let's address
it directly for easier backporting.
Call trace for reference:
z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs]
z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs]
z_erofs_runqueue+0x5f3/0x840 [erofs]
z_erofs_readahead+0x1e8/0x320 [erofs]
read_pages+0x91/0x270
page_cache_ra_unbounded+0x18b/0x240
filemap_get_pages+0x10a/0x5f0
filemap_read+0xa9/0x330
new_sync_read+0x11b/0x1a0
vfs_read+0xf1/0x190
Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 11c7a1aaebad..eb51df4a9f77 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -373,8 +373,8 @@ static bool z_erofs_try_inplace_io(struct z_erofs_collector *clt,
/* callers must be with collection lock held */
static int z_erofs_attach_page(struct z_erofs_collector *clt,
- struct page *page,
- enum z_erofs_page_type type)
+ struct page *page, enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
int ret;
@@ -384,9 +384,9 @@ static int z_erofs_attach_page(struct z_erofs_collector *clt,
z_erofs_try_inplace_io(clt, page))
return 0;
- ret = z_erofs_pagevec_enqueue(&clt->vector, page, type);
+ ret = z_erofs_pagevec_enqueue(&clt->vector, page, type,
+ pvec_safereuse);
clt->cl->vcnt += (unsigned int)ret;
-
return ret ? 0 : -EAGAIN;
}
@@ -729,7 +729,8 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
tight &= (clt->mode >= COLLECT_PRIMARY_FOLLOWED);
retry:
- err = z_erofs_attach_page(clt, page, page_type);
+ err = z_erofs_attach_page(clt, page, page_type,
+ clt->mode >= COLLECT_PRIMARY_FOLLOWED);
/* should allocate an additional short-lived page for pagevec */
if (err == -EAGAIN) {
struct page *const newpage =
@@ -737,7 +738,7 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
set_page_private(newpage, Z_EROFS_SHORTLIVED_PAGE);
err = z_erofs_attach_page(clt, newpage,
- Z_EROFS_PAGE_TYPE_EXCLUSIVE);
+ Z_EROFS_PAGE_TYPE_EXCLUSIVE, true);
if (!err)
goto retry;
}
diff --git a/fs/erofs/zpvec.h b/fs/erofs/zpvec.h
index dfd7fe0503bb..b05464f4a808 100644
--- a/fs/erofs/zpvec.h
+++ b/fs/erofs/zpvec.h
@@ -106,11 +106,18 @@ static inline void z_erofs_pagevec_ctor_init(struct z_erofs_pagevec_ctor *ctor,
static inline bool z_erofs_pagevec_enqueue(struct z_erofs_pagevec_ctor *ctor,
struct page *page,
- enum z_erofs_page_type type)
+ enum z_erofs_page_type type,
+ bool pvec_safereuse)
{
- if (!ctor->next && type)
- if (ctor->index + 1 == ctor->nr)
+ if (!ctor->next) {
+ /* some pages cannot be reused as pvec safely without I/O */
+ if (type == Z_EROFS_PAGE_TYPE_EXCLUSIVE && !pvec_safereuse)
+ type = Z_EROFS_VLE_PAGE_TYPE_TAIL_SHARED;
+
+ if (type != Z_EROFS_PAGE_TYPE_EXCLUSIVE &&
+ ctor->index + 1 == ctor->nr)
return false;
+ }
if (ctor->index >= ctor->nr)
z_erofs_pagevec_ctor_pagedown(ctor, false);
The patch titled
Subject: proc/vmcore: fix clearing user buffer by properly using clear_user()
has been added to the -mm tree. Its filename is
proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/proc-vmcore-fix-clearing-user-buf…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/proc-vmcore-fix-clearing-user-buf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: proc/vmcore: fix clearing user buffer by properly using clear_user()
To clear a user buffer we cannot simply use memset(); we have to use
clear_user(). With a virtio-mem device that registers a vmcore_cb and has
some logically unplugged memory inside an added Linux memory block, I can
easily trigger a BUG by copying the vmcore via "cp":
[ 11.327580] systemd[1]: Starting Kdump Vmcore Save Service...
[ 11.339697] kdump[420]: Kdump is using the default log level(3).
[ 11.370964] kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[ 11.373997] kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
[ 11.385357] kdump[465]: saving vmcore-dmesg.txt complete
[ 11.386722] kdump[467]: saving vmcore
[ 16.531275] BUG: unable to handle page fault for address: 00007f2374e01000
[ 16.531705] #PF: supervisor write access in kernel mode
[ 16.532037] #PF: error_code(0x0003) - permissions violation
[ 16.532396] PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
[ 16.532872] Oops: 0003 [#1] PREEMPT SMP NOPTI
[ 16.533154] CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
[ 16.533513] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[ 16.534198] RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
[ 16.534552] Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
[ 16.535670] RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
[ 16.535998] RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
[ 16.536441] RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
[ 16.536878] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
[ 16.537315] R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
[ 16.537755] R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
[ 16.538200] FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
[ 16.538696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.539055] CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
[ 16.539510] Call Trace:
[ 16.539679] <TASK>
[ 16.539828] read_vmcore+0x236/0x2c0
[ 16.540063] ? enqueue_hrtimer+0x2f/0x80
[ 16.540323] ? inode_security+0x22/0x60
[ 16.540572] proc_reg_read+0x55/0xa0
[ 16.540807] vfs_read+0x95/0x190
[ 16.541022] ksys_read+0x4f/0xc0
[ 16.541238] do_syscall_64+0x3b/0x90
[ 16.541475] entry_SYSCALL_64_after_hwframe+0x44/0xae
Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention" (SMAP), which is used to detect wrong accesses from the kernel
to user buffers like this one: SMAP triggers a permissions violation on
such a wrong access. In the x86-64 variant of clear_user(), SMAP is
properly handled via clac()+stac().
To fix, properly use clear_user() when we're dealing with a user buffer.
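A minimal kernel-style sketch of the pattern the fix applies (zero-fill
path only, not the full read_from_oldmem()): memset() for kernel buffers,
clear_user() for user buffers, and -EFAULT propagation when the user
mapping cannot be written.

    #include <linux/types.h>
    #include <linux/string.h>
    #include <linux/uaccess.h>

    /* buf/userbuf mirror the read_from_oldmem() calling convention. */
    static int zero_fill(char *buf, size_t nr_bytes, bool userbuf)
    {
            if (!userbuf) {
                    memset(buf, 0, nr_bytes);       /* kernel buffer: plain memset is fine */
                    return 0;
            }
            /* User buffer: must go through the uaccess helpers so SMAP is handled. */
            if (clear_user((char __user *)buf, nr_bytes))
                    return -EFAULT;
            return 0;
    }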
Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Philipp Rudo <prudo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
--- a/fs/proc/vmcore.c~proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user
+++ a/fs/proc/vmcore.c
@@ -154,9 +154,13 @@ ssize_t read_from_oldmem(char *buf, size
nr_bytes = count;
/* If pfn is not ram, return zeros for sparse dump files */
- if (!pfn_is_ram(pfn))
- memset(buf, 0, nr_bytes);
- else {
+ if (!pfn_is_ram(pfn)) {
+ tmp = 0;
+ if (!userbuf)
+ memset(buf, 0, nr_bytes);
+ else if (clear_user(buf, nr_bytes))
+ tmp = -EFAULT;
+ } else {
if (encrypted)
tmp = copy_oldmem_page_encrypted(pfn, buf,
nr_bytes,
@@ -165,12 +169,12 @@ ssize_t read_from_oldmem(char *buf, size
else
tmp = copy_oldmem_page(pfn, buf, nr_bytes,
offset, userbuf);
-
- if (tmp < 0) {
- up_read(&vmcore_cb_rwsem);
- return tmp;
- }
}
+ if (tmp < 0) {
+ up_read(&vmcore_cb_rwsem);
+ return tmp;
+ }
+
*ppos += nr_bytes;
count -= nr_bytes;
buf += nr_bytes;
_
Patches currently in -mm which might be from david(a)redhat.com are
proc-vmcore-fix-clearing-user-buffer-by-properly-using-clear_user.patch
Booting to Android userspace on 5.14 or newer triggers the following
SELinux denial:
avc: denied { sys_nice } for comm="init" capability=23
scontext=u:r:init:s0 tcontext=u:r:init:s0 tclass=capability
permissive=0
Init is PID 1 running as root, so it already has CAP_SYS_ADMIN. For
better compatibility with older SEPolicy, check CAP_SYS_ADMIN before
CAP_SYS_NICE.
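For reference, a small userspace sketch of the call path that hits this
check: requesting IOPRIO_CLASS_RT through the ioprio_set() syscall. The
constants below follow include/uapi/linux/ioprio.h; the raw syscall is
used because glibc does not wrap ioprio_set().

    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define IOPRIO_CLASS_SHIFT      13
    #define IOPRIO_CLASS_RT         1
    #define IOPRIO_WHO_PROCESS      1
    #define IOPRIO_PRIO_VALUE(class, data)  (((class) << IOPRIO_CLASS_SHIFT) | (data))

    int main(void)
    {
            /* ioprio_check_cap() decides whether this request is allowed. */
            if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                        IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 4)) < 0) {
                    perror("ioprio_set");   /* EPERM without CAP_SYS_ADMIN or CAP_SYS_NICE */
                    return 1;
            }
            puts("RT I/O priority granted");
            return 0;
    }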
Fixes: 9d3a39a5f1e4 ("block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE")
Signed-off-by: Alistair Delva <adelva(a)google.com>
Cc: Khazhismel Kumykov <khazhy(a)google.com>
Cc: Bart Van Assche <bvanassche(a)acm.org>
Cc: Serge Hallyn <serge(a)hallyn.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: selinux(a)vger.kernel.org
Cc: linux-security-module(a)vger.kernel.org
Cc: kernel-team(a)android.com
Cc: stable(a)vger.kernel.org # v5.14+
---
v2: added comment requested by Jens
block/ioprio.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/block/ioprio.c b/block/ioprio.c
index 0e4ff245f2bf..313c14a70bbd 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -69,7 +69,14 @@ int ioprio_check_cap(int ioprio)
switch (class) {
case IOPRIO_CLASS_RT:
- if (!capable(CAP_SYS_NICE) && !capable(CAP_SYS_ADMIN))
+ /*
+ * Originally this only checked for CAP_SYS_ADMIN,
+ * which was implicitly allowed for pid 0 by security
+ * modules such as SELinux. Make sure we check
+ * CAP_SYS_ADMIN first to avoid a denial/avc for
+ * possibly missing CAP_SYS_NICE permission.
+ */
+ if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_NICE))
return -EPERM;
fallthrough;
/* rt has prio field too */
--
2.34.0.rc1.387.gb447b232ab-goog
The patch titled
Subject: hugetlb: fix hugetlb cgroup refcounting during mremap
has been added to the -mm tree. Its filename is
hugetlb-fix-hugetlb-cgroup-refcounting-during-mremap.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hugetlb-fix-hugetlb-cgroup-refcou…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlb-fix-hugetlb-cgroup-refcou…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Bui Quang Minh <minhquangbui99(a)gmail.com>
Subject: hugetlb: fix hugetlb cgroup refcounting during mremap
When hugetlb_vm_op_open() is called during copy_vma(), we may take a
reference to resv_map->css. Later, when clearing the reservation pointer
of old_vma after transferring it to new_vma, we forget to drop that
reference, which leads to a reference leak of the css.
Fix this by dropping the reservation css reference in
clear_vma_resv_huge_pages().
Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com
Fixes: 550a7d60bd5e35 ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: Bui Quang Minh <minhquangbui99(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++
mm/hugetlb.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-mremap
+++ a/include/linux/hugetlb_cgroup.h
@@ -128,6 +128,13 @@ static inline void resv_map_dup_hugetlb_
css_get(resv_map->css);
}
+static inline void resv_map_put_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+ if (resv_map->css)
+ css_put(resv_map->css);
+}
+
extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr);
extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -210,6 +217,11 @@ static inline void resv_map_dup_hugetlb_
struct resv_map *resv_map)
{
}
+
+static inline void resv_map_put_hugetlb_cgroup_uncharge_info(
+ struct resv_map *resv_map)
+{
+}
static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
struct hugetlb_cgroup **ptr)
--- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-mremap
+++ a/mm/hugetlb.c
@@ -1037,8 +1037,10 @@ void clear_vma_resv_huge_pages(struct vm
*/
struct resv_map *reservations = vma_resv_map(vma);
- if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+ if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+ resv_map_put_hugetlb_cgroup_uncharge_info(reservations);
kref_put(&reservations->refs, resv_map_release);
+ }
reset_vma_resv_huge_pages(vma);
}
_
Patches currently in -mm which might be from minhquangbui99(a)gmail.com are
hugetlb-fix-hugetlb-cgroup-refcounting-during-mremap.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
While working on supporting the Intel HDR backlight interface, I noticed
that there's a couple of laptops that will very rarely manage to boot up
without detecting Intel HDR backlight support - even though it's supported
on the system. One example of such a laptop is the Lenovo P17 1st
generation.
Following some investigation Ville Syrjälä did through the docs they have
available to them, they discovered that there's actually supposed to be a
30ms wait after writing the source OUI before we begin setting up the rest
of the backlight interface.
This seems to be correct, as adding this 30ms delay seems to have
completely fixed the probing issues I was previously seeing. So - let's
start performing a 30ms wait after writing the OUI, which we do in a manner
similar to how we keep track of PPS delays (e.g. record the timestamp of
the OUI write, and then wait for however many ms are left since that
timestamp right before we interact with the backlight) in order to avoid
waiting any longer than we need to. As well, this also avoids us performing
this delay on systems where we don't end up using the HDR backlight
interface.
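A hedged sketch of the "remember the timestamp, sleep only for what is
left of the window" idea described above; the driver uses its
wait_remaining_ms_from_jiffies() helper, so this shows the pattern rather
than the actual i915 code.

    #include <linux/jiffies.h>
    #include <linux/delay.h>

    static unsigned long last_oui_write;    /* set to jiffies right after the OUI write */

    static void wait_for_oui_settle(void)
    {
            unsigned long deadline = last_oui_write + msecs_to_jiffies(30);

            /* Sleep only for the part of the 30 ms window that has not elapsed yet;
             * if the window already passed, do not wait at all. */
            if (time_before(jiffies, deadline))
                    msleep(jiffies_to_msecs(deadline - jiffies));
    }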
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 4a8d79901d5b ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)")
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.12+
---
drivers/gpu/drm/i915/display/intel_display_types.h | 3 +++
drivers/gpu/drm/i915/display/intel_dp.c | 3 +++
drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c | 11 +++++++++++
3 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index ea1e8a6e10b0..b9c967837872 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -1653,6 +1653,9 @@ struct intel_dp {
struct intel_dp_pcon_frl frl;
struct intel_psr psr;
+
+ /* When we last wrote the OUI for eDP */
+ unsigned long last_oui_write;
};
enum lspcon_vendor {
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 0a424bf69396..77d9a9390c1e 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -29,6 +29,7 @@
#include <linux/i2c.h>
#include <linux/notifier.h>
#include <linux/slab.h>
+#include <linux/timekeeping.h>
#include <linux/types.h>
#include <asm/byteorder.h>
@@ -2010,6 +2011,8 @@ intel_edp_init_source_oui(struct intel_dp *intel_dp, bool careful)
if (drm_dp_dpcd_write(&intel_dp->aux, DP_SOURCE_OUI, oui, sizeof(oui)) < 0)
drm_err(&i915->drm, "Failed to write source OUI\n");
+
+ intel_dp->last_oui_write = jiffies;
}
/* If the device supports it, try to set the power state appropriately */
diff --git a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
index 569d17b4d00f..2c35b999ec2c 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c
@@ -96,6 +96,13 @@
#define INTEL_EDP_BRIGHTNESS_OPTIMIZATION_1 0x359
/* Intel EDP backlight callbacks */
+static void
+wait_for_oui(struct drm_i915_private *i915, struct intel_dp *intel_dp)
+{
+ drm_dbg_kms(&i915->drm, "Performing OUI wait\n");
+ wait_remaining_ms_from_jiffies(intel_dp->last_oui_write, 30);
+}
+
static bool
intel_dp_aux_supports_hdr_backlight(struct intel_connector *connector)
{
@@ -106,6 +113,8 @@ intel_dp_aux_supports_hdr_backlight(struct intel_connector *connector)
int ret;
u8 tcon_cap[4];
+ wait_for_oui(i915, intel_dp);
+
ret = drm_dp_dpcd_read(aux, INTEL_EDP_HDR_TCON_CAP0, tcon_cap, sizeof(tcon_cap));
if (ret != sizeof(tcon_cap))
return false;
@@ -204,6 +213,8 @@ intel_dp_aux_hdr_enable_backlight(const struct intel_crtc_state *crtc_state,
int ret;
u8 old_ctrl, ctrl;
+ wait_for_oui(i915, intel_dp);
+
ret = drm_dp_dpcd_readb(&intel_dp->aux, INTEL_EDP_HDR_GETSET_CTRL_PARAMS, &old_ctrl);
if (ret != 1) {
drm_err(&i915->drm, "Failed to read current backlight control mode: %d\n", ret);
--
2.31.1
The patch titled
Subject: mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
has been added to the -mm tree. Its filename is
mm-kmemleak-slob-respect-slab_noleaktrace-flag.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-kmemleak-slob-respect-slab_nol…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-kmemleak-slob-respect-slab_nol…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rustam Kovhaev <rkovhaev(a)gmail.com>
Subject: mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
When kmemleak is enabled for SLOB, the system does not boot and does not
print anything to the console. At a very early stage in the boot process we
hit infinite recursion from kmemleak_init() and eventually the kernel
crashes.
kmemleak_init() specifies SLAB_NOLEAKTRACE for KMEM_CACHE(), but
kmem_cache_create_usercopy() removes it because CACHE_CREATE_MASK is not
valid for SLOB.
Let's fix CACHE_CREATE_MASK and make kmemleak work with SLOB.
Link: https://lkml.kernel.org/r/20211115020850.3154366-1-rkovhaev@gmail.com
Fixes: d8843922fba4 ("slab: Ignore internal flags in cache creation")
Signed-off-by: Rustam Kovhaev <rkovhaev(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Glauber Costa <glommer(a)parallels.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slab.h~mm-kmemleak-slob-respect-slab_noleaktrace-flag
+++ a/mm/slab.h
@@ -147,7 +147,7 @@ static inline slab_flags_t kmem_cache_fl
#define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
SLAB_TEMPORARY | SLAB_ACCOUNT)
#else
-#define SLAB_CACHE_FLAGS (0)
+#define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
#endif
/* Common flags available with current configuration */
_
Patches currently in -mm which might be from rkovhaev(a)gmail.com are
mm-kmemleak-slob-respect-slab_noleaktrace-flag.patch
The patch titled
Subject: hexagon: clean up timer-regs.h
has been added to the -mm tree. Its filename is
hexagon-clean-up-timer-regsh.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hexagon-clean-up-timer-regsh.patch
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hexagon-clean-up-timer-regsh.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Nathan Chancellor <nathan(a)kernel.org>
Subject: hexagon: clean up timer-regs.h
When building allmodconfig, there is a warning about TIMER_ENABLE being
redefined:
drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined]
#define TIMER_ENABLE BIT(7)
^
./arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here
#define TIMER_ENABLE 0
^
1 error generated.
The values in this header are only used in one file each, if they are
used at all. Remove the header and sink all of the constants into their
respective files.
TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h
TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in
arch/hexagon/kernel/time.c.
SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the
file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer
functions").
TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the
definition, rather than its use.
Link: https://lkml.kernel.org/r/20211115174250.1994179-3-nathan@kernel.org
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Brian Cain <bcain(a)codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/hexagon/include/asm/timer-regs.h | 26 ------------------------
arch/hexagon/include/asm/timex.h | 3 --
arch/hexagon/kernel/time.c | 12 +++++++++--
3 files changed, 11 insertions(+), 30 deletions(-)
--- a/arch/hexagon/include/asm/timer-regs.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Timer support for Hexagon
- *
- * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved.
- */
-
-#ifndef _ASM_TIMER_REGS_H
-#define _ASM_TIMER_REGS_H
-
-/* This stuff should go into a platform specific file */
-#define TCX0_CLK_RATE 19200
-#define TIMER_ENABLE 0
-#define TIMER_CLR_ON_MATCH 1
-
-/*
- * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
- * release 1.1, and then it's "adjustable" and probably not defaulted.
- */
-#define RTOS_TIMER_INT 3
-#ifdef CONFIG_HEXAGON_COMET
-#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
-#endif
-#define SLEEP_CLK_RATE 32000
-
-#endif
--- a/arch/hexagon/include/asm/timex.h~hexagon-clean-up-timer-regsh
+++ a/arch/hexagon/include/asm/timex.h
@@ -7,11 +7,10 @@
#define _ASM_TIMEX_H
#include <asm-generic/timex.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
/* Using TCX0 as our clock. CLOCK_TICK_RATE scheduled to be removed. */
-#define CLOCK_TICK_RATE TCX0_CLK_RATE
+#define CLOCK_TICK_RATE 19200
#define ARCH_HAS_READ_CURRENT_TIMER
--- a/arch/hexagon/kernel/time.c~hexagon-clean-up-timer-regsh
+++ a/arch/hexagon/kernel/time.c
@@ -17,9 +17,10 @@
#include <linux/of_irq.h>
#include <linux/module.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
+#define TIMER_ENABLE BIT(0)
+
/*
* For the clocksource we need:
* pcycle frequency (600MHz)
@@ -33,6 +34,13 @@ cycles_t pcycle_freq_mhz;
cycles_t thread_freq_mhz;
cycles_t sleep_clk_freq;
+/*
+ * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
+ * release 1.1, and then it's "adjustable" and probably not defaulted.
+ */
+#define RTOS_TIMER_INT 3
+#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
+
static struct resource rtos_timer_resources[] = {
{
.start = RTOS_TIMER_REGS_ADDR,
@@ -80,7 +88,7 @@ static int set_next_event(unsigned long
iowrite32(0, &rtos_timer->clear);
iowrite32(delta, &rtos_timer->match);
- iowrite32(1 << TIMER_ENABLE, &rtos_timer->enable);
+ iowrite32(TIMER_ENABLE, &rtos_timer->enable);
return 0;
}
_
Patches currently in -mm which might be from nathan(a)kernel.org are
hexagon-export-raw-i-o-routines-for-modules.patch
hexagon-clean-up-timer-regsh.patch
hexagon-ignore-vmlinuxlds.patch
The patch titled
Subject: hexagon: export raw I/O routines for modules
has been added to the -mm tree. Its filename is
hexagon-export-raw-i-o-routines-for-modules.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hexagon-export-raw-i-o-routines-f…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hexagon-export-raw-i-o-routines-f…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Nathan Chancellor <nathan(a)kernel.org>
Subject: hexagon: export raw I/O routines for modules
Patch series "Fixes for ARCH=hexagon allmodconfig", v2.
This series fixes some issues noticed with ARCH=hexagon allmodconfig.
This patch (of 3):
When building ARCH=hexagon allmodconfig, the following errors occur:
ERROR: modpost: "__raw_readsl" [drivers/i3c/master/svc-i3c-master.ko] undefined!
ERROR: modpost: "__raw_writesl" [drivers/i3c/master/dw-i3c-master.ko] undefined!
ERROR: modpost: "__raw_readsl" [drivers/i3c/master/dw-i3c-master.ko] undefined!
ERROR: modpost: "__raw_writesl" [drivers/i3c/master/i3c-master-cdns.ko] undefined!
ERROR: modpost: "__raw_readsl" [drivers/i3c/master/i3c-master-cdns.ko] undefined!
Export these symbols so that modules can use them without any errors.
Link: https://lkml.kernel.org/r/20211115174250.1994179-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20211115174250.1994179-2-nathan@kernel.org
Fixes: 013bf24c3829 ("Hexagon: Provide basic implementation and/or stubs for I/O routines.")
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Brian Cain <bcain(a)codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/hexagon/lib/io.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/arch/hexagon/lib/io.c~hexagon-export-raw-i-o-routines-for-modules
+++ a/arch/hexagon/lib/io.c
@@ -27,6 +27,7 @@ void __raw_readsw(const void __iomem *ad
*dst++ = *src;
}
+EXPORT_SYMBOL(__raw_readsw);
/*
* __raw_writesw - read words a short at a time
@@ -47,6 +48,7 @@ void __raw_writesw(void __iomem *addr, c
}
+EXPORT_SYMBOL(__raw_writesw);
/* Pretty sure len is pre-adjusted for the length of the access already */
void __raw_readsl(const void __iomem *addr, void *data, int len)
@@ -62,6 +64,7 @@ void __raw_readsl(const void __iomem *ad
}
+EXPORT_SYMBOL(__raw_readsl);
void __raw_writesl(void __iomem *addr, const void *data, int len)
{
@@ -76,3 +79,4 @@ void __raw_writesl(void __iomem *addr, c
}
+EXPORT_SYMBOL(__raw_writesl);
_
Patches currently in -mm which might be from nathan(a)kernel.org are
hexagon-export-raw-i-o-routines-for-modules.patch
hexagon-clean-up-timer-regsh.patch
hexagon-ignore-vmlinuxlds.patch
When building allmodconfig, there is a warning about TIMER_ENABLE being
redefined:
drivers/clocksource/timer-oxnas-rps.c:39:9: error: 'TIMER_ENABLE' macro redefined [-Werror,-Wmacro-redefined]
#define TIMER_ENABLE BIT(7)
^
./arch/hexagon/include/asm/timer-regs.h:13:9: note: previous definition is here
#define TIMER_ENABLE 0
^
1 error generated.
The values in this header are only used in one file each, if they are
used at all. Remove the header and sink all of the constants into their
respective files.
TCX0_CLK_RATE is only used in arch/hexagon/include/asm/timex.h
TIMER_ENABLE, RTOS_TIMER_INT, RTOS_TIMER_REGS_ADDR are only used in
arch/hexagon/kernel/time.c.
SLEEP_CLK_RATE and TIMER_CLR_ON_MATCH have both been unused since the
file's introduction in commit 71e4a47f32f4 ("Hexagon: Add time and timer
functions").
TIMER_ENABLE is redefined as BIT(0) so the shift is moved into the
definition, rather than its use.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Brian Cain <bcain(a)codeaurora.org>
---
arch/hexagon/include/asm/timer-regs.h | 26 --------------------------
arch/hexagon/include/asm/timex.h | 3 +--
arch/hexagon/kernel/time.c | 12 ++++++++++--
3 files changed, 11 insertions(+), 30 deletions(-)
delete mode 100644 arch/hexagon/include/asm/timer-regs.h
diff --git a/arch/hexagon/include/asm/timer-regs.h b/arch/hexagon/include/asm/timer-regs.h
deleted file mode 100644
index ee6c61423a05..000000000000
--- a/arch/hexagon/include/asm/timer-regs.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Timer support for Hexagon
- *
- * Copyright (c) 2010-2011, The Linux Foundation. All rights reserved.
- */
-
-#ifndef _ASM_TIMER_REGS_H
-#define _ASM_TIMER_REGS_H
-
-/* This stuff should go into a platform specific file */
-#define TCX0_CLK_RATE 19200
-#define TIMER_ENABLE 0
-#define TIMER_CLR_ON_MATCH 1
-
-/*
- * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
- * release 1.1, and then it's "adjustable" and probably not defaulted.
- */
-#define RTOS_TIMER_INT 3
-#ifdef CONFIG_HEXAGON_COMET
-#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
-#endif
-#define SLEEP_CLK_RATE 32000
-
-#endif
diff --git a/arch/hexagon/include/asm/timex.h b/arch/hexagon/include/asm/timex.h
index 8d4ec76fceb4..dfe69e118b2b 100644
--- a/arch/hexagon/include/asm/timex.h
+++ b/arch/hexagon/include/asm/timex.h
@@ -7,11 +7,10 @@
#define _ASM_TIMEX_H
#include <asm-generic/timex.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
/* Using TCX0 as our clock. CLOCK_TICK_RATE scheduled to be removed. */
-#define CLOCK_TICK_RATE TCX0_CLK_RATE
+#define CLOCK_TICK_RATE 19200
#define ARCH_HAS_READ_CURRENT_TIMER
diff --git a/arch/hexagon/kernel/time.c b/arch/hexagon/kernel/time.c
index feffe527ac92..febc95714d75 100644
--- a/arch/hexagon/kernel/time.c
+++ b/arch/hexagon/kernel/time.c
@@ -17,9 +17,10 @@
#include <linux/of_irq.h>
#include <linux/module.h>
-#include <asm/timer-regs.h>
#include <asm/hexagon_vm.h>
+#define TIMER_ENABLE BIT(0)
+
/*
* For the clocksource we need:
* pcycle frequency (600MHz)
@@ -33,6 +34,13 @@ cycles_t pcycle_freq_mhz;
cycles_t thread_freq_mhz;
cycles_t sleep_clk_freq;
+/*
+ * 8x50 HDD Specs 5-8. Simulator co-sim not fixed until
+ * release 1.1, and then it's "adjustable" and probably not defaulted.
+ */
+#define RTOS_TIMER_INT 3
+#define RTOS_TIMER_REGS_ADDR 0xAB000000UL
+
static struct resource rtos_timer_resources[] = {
{
.start = RTOS_TIMER_REGS_ADDR,
@@ -80,7 +88,7 @@ static int set_next_event(unsigned long delta, struct clock_event_device *evt)
iowrite32(0, &rtos_timer->clear);
iowrite32(delta, &rtos_timer->match);
- iowrite32(1 << TIMER_ENABLE, &rtos_timer->enable);
+ iowrite32(TIMER_ENABLE, &rtos_timer->enable);
return 0;
}
--
2.34.0.rc0
commit 415de44076640483648d6c0f6d645a9ee61328ad upstream.
Currently, Linux probes for X86_BUG_NULL_SEL unconditionally which
makes it unsafe to migrate in a virtualised environment as the
properties across the migration pool might differ.
To be specific, the case which goes wrong is:
1. Zen1 (or earlier) and Zen2 (or later) in a migration pool
2. Linux boots on Zen2, probes and finds the absence of X86_BUG_NULL_SEL
3. Linux is then migrated to Zen1
Linux is now running on a X86_BUG_NULL_SEL-impacted CPU while believing
that the bug is fixed.
The only way to address the problem is to fully trust the "no longer
affected" CPUID bit when virtualised, because in the above case it would
be cleared deliberately to indicate the fact that "you might migrate to
somewhere which has this behaviour".
Zen3 adds the NullSelectorClearsBase CPUID bit to indicate that loading
a NULL segment selector zeroes the base and limit fields, as well as
just attributes. Zen2 also has this behaviour but doesn't have the NSCB
bit.
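For illustration, a hedged userspace sketch that reads the same CPUID bit
the kernel consults (leaf 0x80000021, EAX bit 6, NullSelectorClearsBase).
On Zen2, or when a hypervisor hides the leaf, the bit reads as clear even
though the behaviour may be present, which is exactly the ambiguity
discussed above.

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

            /* __get_cpuid() returns 0 if leaf 0x80000021 is not exposed at all. */
            if (!__get_cpuid(0x80000021, &eax, &ebx, &ecx, &edx)) {
                    puts("CPUID leaf 0x80000021 not available");
                    return 0;
            }
            printf("NullSelectorClearsBase (NSCB): %s\n",
                   (eax & (1u << 6)) ? "set" : "clear");
            return 0;
    }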
[ bp: Minor touchups. ]
Signed-off-by: Jane Malalane <jane.malalane(a)citrix.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
CC: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20211021104744.24126-1-jane.malalane@citrix.com
---
Backport to 4.19. Drop Hygon modifications.
---
arch/x86/kernel/cpu/amd.c | 2 ++
arch/x86/kernel/cpu/common.c | 44 +++++++++++++++++++++++++++++++++++++-------
arch/x86/kernel/cpu/cpu.h | 1 +
3 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index de69090ca142..98c23126f751 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -993,6 +993,8 @@ static void init_amd(struct cpuinfo_x86 *c)
if (cpu_has(c, X86_FEATURE_IRPERF) &&
!cpu_has_amd_erratum(c, amd_erratum_1054))
msr_set_bit(MSR_K7_HWCR, MSR_K7_HWCR_IRPERF_EN_BIT);
+
+ check_null_seg_clears_base(c);
}
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2058e8c0e61d..7e7400af7797 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1254,9 +1254,8 @@ void __init early_cpu_init(void)
early_identify_cpu(&boot_cpu_data);
}
-static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
+static bool detect_null_seg_behavior(void)
{
-#ifdef CONFIG_X86_64
/*
* Empirically, writing zero to a segment selector on AMD does
* not clear the base, whereas writing zero to a segment
@@ -1277,10 +1276,43 @@ static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
wrmsrl(MSR_FS_BASE, 1);
loadsegment(fs, 0);
rdmsrl(MSR_FS_BASE, tmp);
- if (tmp != 0)
- set_cpu_bug(c, X86_BUG_NULL_SEG);
wrmsrl(MSR_FS_BASE, old_base);
-#endif
+ return tmp == 0;
+}
+
+void check_null_seg_clears_base(struct cpuinfo_x86 *c)
+{
+ /* BUG_NULL_SEG is only relevant with 64bit userspace */
+ if (!IS_ENABLED(CONFIG_X86_64))
+ return;
+
+ /* Zen3 CPUs advertise Null Selector Clears Base in CPUID. */
+ if (c->extended_cpuid_level >= 0x80000021 &&
+ cpuid_eax(0x80000021) & BIT(6))
+ return;
+
+ /*
+ * CPUID bit above wasn't set. If this kernel is still running
+ * as a HV guest, then the HV has decided not to advertize
+ * that CPUID bit for whatever reason. For example, one
+ * member of the migration pool might be vulnerable. Which
+ * means, the bug is present: set the BUG flag and return.
+ */
+ if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {
+ set_cpu_bug(c, X86_BUG_NULL_SEG);
+ return;
+ }
+
+ /*
+ * Zen2 CPUs also have this behaviour, but no CPUID bit.
+ * 0x18 is the respective family for Hygon.
+ */
+ if ((c->x86 == 0x17 || c->x86 == 0x18) &&
+ detect_null_seg_behavior())
+ return;
+
+ /* All the remaining ones are affected */
+ set_cpu_bug(c, X86_BUG_NULL_SEG);
}
static void generic_identify(struct cpuinfo_x86 *c)
@@ -1316,8 +1348,6 @@ static void generic_identify(struct cpuinfo_x86 *c)
get_model_name(c); /* Default name */
- detect_null_seg_behavior(c);
-
/*
* ESPFIX is a strange bug. All real CPUs have it. Paravirt
* systems that run Linux at CPL > 0 may or may not have the
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index e89602d2aff5..4eb9bf68b122 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -76,6 +76,7 @@ extern int detect_extended_topology_early(struct cpuinfo_x86 *c);
extern int detect_extended_topology(struct cpuinfo_x86 *c);
extern int detect_ht_early(struct cpuinfo_x86 *c);
extern void detect_ht(struct cpuinfo_x86 *c);
+extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
unsigned int aperfmperf_get_khz(int cpu);
--
2.11.0
commit 91adec9e0709 ("drm/amd/display: Look at firmware version to
determine using dmub on dcn21")
Newer DMUB firmware on Renoir and Green Sardine does not need DMCU
disabled, and disabling it actually causes problems with DP-C alt mode
on a number of machines.
Backport the fix via a hand-modified backport, because mainline switched
to IP version checking which doesn't exist in linux-stable.
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1772
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1735
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
---
Changes from v1->v2:
* Update commit message syntax for hand modified commit
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1ea31dcc7a8b..084491afe540 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1141,8 +1141,15 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
case CHIP_RAVEN:
case CHIP_RENOIR:
init_data.flags.gpu_vm_support = true;
- if (ASICREV_IS_GREEN_SARDINE(adev->external_rev_id))
+ switch (adev->dm.dmcub_fw_version) {
+ case 0: /* development */
+ case 0x1: /* linux-firmware.git hash 6d9f399 */
+ case 0x01000000: /* linux-firmware.git hash 9a0b0f4 */
+ init_data.flags.disable_dmcu = false;
+ break;
+ default:
init_data.flags.disable_dmcu = true;
+ }
break;
case CHIP_VANGOGH:
case CHIP_YELLOW_CARP:
--
2.25.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 76224355db7570cbe6b6f75c8929a1558828dd55 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 17 Aug 2021 21:05:16 +0200
Subject: [PATCH] fuse: truncate pagecache on atomic_o_trunc
fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in
fuse_launder_page() triggered by invalidate_inode_pages2().
Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE
or fc->writeback cache, so do it unconditionally.
Reported-by: Xie Yongji <xieyongji(a)bytedance.com>
Reported-and-tested-by: syzbot+bea44a5189836d956894(a)syzkaller.appspotmail.com
Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97f860cfc195..5e5efb66c7d7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -198,12 +198,11 @@ void fuse_finish_open(struct inode *inode, struct file *file)
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = get_fuse_conn(inode);
- if (!(ff->open_flags & FOPEN_KEEP_CACHE))
- invalidate_inode_pages2(inode->i_mapping);
if (ff->open_flags & FOPEN_STREAM)
stream_open(inode, file);
else if (ff->open_flags & FOPEN_NONSEEKABLE)
nonseekable_open(inode, file);
+
if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
struct fuse_inode *fi = get_fuse_inode(inode);
@@ -211,10 +210,14 @@ void fuse_finish_open(struct inode *inode, struct file *file)
fi->attr_version = atomic64_inc_return(&fc->attr_version);
i_size_write(inode, 0);
spin_unlock(&fi->lock);
+ truncate_pagecache(inode, 0);
fuse_invalidate_attr(inode);
if (fc->writeback_cache)
file_update_time(file);
+ } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) {
+ invalidate_inode_pages2(inode->i_mapping);
}
+
if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
fuse_link_write_file(file);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4e17d65dafdd3513042d8f00404c9b6068a825c Mon Sep 17 00:00:00 2001
From: Pali Rohár <pali(a)kernel.org>
Date: Tue, 5 Oct 2021 20:09:41 +0200
Subject: [PATCH] PCI: aardvark: Fix PCIe Max Payload Size setting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Change PCIe Max Payload Size setting in PCIe Device Control register to 512
bytes to align with PCIe Link Initialization sequence as defined in Marvell
Armada 3700 Functional Specification. According to the specification,
maximal Max Payload Size supported by this device is 512 bytes.
Without this change, the kernel prints a suspicious line:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 16384, max 512)
With this change, it becomes:
pci 0000:01:00.0: Upstream bridge's Max Payload Size set to 256 (was 512, max 512)
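For reference, the Max Payload Size field lives in bits 7:5 of the PCIe
Device Control register and encodes the size as 128 << value, which is why
a fully-set field decodes to the bogus 16384 above. A standalone sketch
(the mask mirrors include/uapi/linux/pci_regs.h; the shift is written out
here for clarity):
#include <stdio.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size, DevCtl bits 7:5 */

/* Decode the encoded Max Payload Size field into bytes. */
static unsigned int mps_bytes(unsigned int devctl)
{
	return 128u << ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5);
}

int main(void)
{
	printf("%u\n", mps_bytes(0x00e0));	/* all bits set: 16384 (reserved encoding) */
	printf("%u\n", mps_bytes(0x0040));	/* PCI_EXP_DEVCTL_PAYLOAD_512B: 512 */
	return 0;
}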
Link: https://lore.kernel.org/r/20211005180952.6812-3-kabel@kernel.org
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver")
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Marek Behún <kabel(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/pci/controller/pci-aardvark.c b/drivers/pci/controller/pci-aardvark.c
index 596ebcfcc82d..884510630bae 100644
--- a/drivers/pci/controller/pci-aardvark.c
+++ b/drivers/pci/controller/pci-aardvark.c
@@ -488,8 +488,9 @@ static void advk_pcie_setup_hw(struct advk_pcie *pcie)
reg = advk_readl(pcie, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
reg &= ~PCI_EXP_DEVCTL_RELAX_EN;
reg &= ~PCI_EXP_DEVCTL_NOSNOOP_EN;
+ reg &= ~PCI_EXP_DEVCTL_PAYLOAD;
reg &= ~PCI_EXP_DEVCTL_READRQ;
- reg |= PCI_EXP_DEVCTL_PAYLOAD; /* Set max payload size */
+ reg |= PCI_EXP_DEVCTL_PAYLOAD_512B;
reg |= PCI_EXP_DEVCTL_READRQ_512B;
advk_writel(pcie, reg, PCIE_CORE_PCIEXP_CAP + PCI_EXP_DEVCTL);
On designs where the GPIO controller has a dedicated IRQ, wakeups aren't working properly from s0i3.
S0i3 is first supported on 5.14 on AMD, so it's a good idea to fix this there.
This is fixed by the following commits:
commit 7e6f8d6f4a42 ("pinctrl: amd: Add irq field data") # 5.14
commit acd47b9f28e5 ("pinctrl: amd: Handle wake-up interrupt") #5.14
Other designs, where the GPIO controller shares the IRQ with the ACPI SCI, have other fixes that will be sent to stable once they're in 5.16.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On VMs with NX encryption, compression, and/or RNG offload, these
capabilities are described by nodes in the ibm,platform-facilities device
tree hierarchy:
$ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
/sys/firmware/devicetree/base/ibm,platform-facilities/
├── ibm,compression-v1
├── ibm,random-v1
└── ibm,sym-encryption-v1
3 directories
The acceleration functions that these nodes describe are not disrupted by
live migration, not even temporarily.
But the post-migration ibm,update-nodes sequence firmware always sends
"delete" messages for this hierarchy, followed by an "add" directive to
reconstruct it via ibm,configure-connector (log with debugging statements
enabled in mobility.c):
mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
mobility: removing node /ibm,platform-facilities:4294967286
...
mobility: added node /ibm,platform-facilities:4294967286
Note we receive a single "add" message for the entire hierarchy, and what
we receive from the ibm,configure-connector sequence is the top-level
platform-facilities node along with its three children. The debug message
simply reports the parent node and not the whole subtree.
Also, significantly, the nodes added are almost completely equivalent to
the ones removed; even phandles are unchanged. ibm,shared-interrupt-pool in
the leaf nodes is the only property I've observed to differ, and Linux does
not use that. So in practice, the sum of update messages Linux receives for
this hierarchy is equivalent to minor property updates.
We succeed in removing the original hierarchy from the device tree. But the
vio bus code is ignorant of this, and does not unbind or relinquish its
references. The leaf nodes, still reachable through sysfs, of course still
refer to the now-freed ibm,platform-facilities parent node, which makes
use-after-free possible:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 1706 at lib/refcount.c:25 refcount_warn_saturate+0x164/0x1f0
refcount_warn_saturate+0x160/0x1f0 (unreliable)
kobject_get+0xf0/0x100
of_node_get+0x30/0x50
of_get_parent+0x50/0xb0
of_fwnode_get_parent+0x54/0x90
fwnode_count_parents+0x50/0x150
fwnode_full_name_string+0x30/0x110
device_node_string+0x49c/0x790
vsnprintf+0x1c0/0x4c0
sprintf+0x44/0x60
devspec_show+0x34/0x50
dev_attr_show+0x40/0xa0
sysfs_kf_seq_show+0xbc/0x200
kernfs_seq_show+0x44/0x60
seq_read_iter+0x2a4/0x740
kernfs_fop_read_iter+0x254/0x2e0
new_sync_read+0x120/0x190
vfs_read+0x1d0/0x240
Moreover, the "new" replacement subtree is not correctly added to the
device tree, resulting in ibm,platform-facilities parent node without the
appropriate leaf nodes, and broken symlinks in the sysfs device hierarchy:
$ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
/sys/firmware/devicetree/base/ibm,platform-facilities/
0 directories
$ cd /sys/devices/vio ; find . -xtype l -exec file {} +
./ibm,sym-encryption-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,sym-encryption-v1
./ibm,random-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,random-v1
./ibm,compression-v1/of_node: broken symbolic link to
../../../firmware/devicetree/base/ibm,platform-facilities/ibm,compression-v1
This is because add_dt_node() -> dlpar_attach_node() attaches only the
parent node returned from configure-connector, ignoring any children. This
should be corrected for the general case, but fixing that won't help with
the stale OF node references, which is the more urgent problem.
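As a rough illustration of what the general-case fix would involve
(hypothetical helper, not part of this patch; dlpar_attach_node() keeps the
node/parent signature used in the hunk below):
/* Hypothetical sketch only: attach a configure-connector subtree rather
 * than just its root node, so the children are not leaked. */
static int dlpar_attach_subtree(struct device_node *dn,
				struct device_node *parent)
{
	struct device_node *child;
	int rc;

	rc = dlpar_attach_node(dn, parent);
	if (rc)
		return rc;

	for_each_child_of_node(dn, child) {
		rc = dlpar_attach_subtree(child, dn);
		if (rc) {
			of_node_put(child);
			return rc;
		}
	}

	return 0;
}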
One way to address that would be to make the drivers respond to node
removal notifications, so that node references can be dropped
appropriately. But this would likely force the drivers to disrupt active
clients for no useful purpose: equivalent nodes are immediately re-added.
And recall that the acceleration capabilities described by the nodes remain
available throughout the whole process.
The solution I believe to be robust for this situation is to convert
remove+add of a node with an unchanged phandle to an update of the node's
properties in the Linux device tree structure. That would involve changing
and adding a fair amount of code, and may take several iterations to land.
Until that can be realized we have a confirmed use-after-free and the
possibility of memory corruption. So add a limited workaround that
discriminates on the node type, ignoring adds and removes. This should be
amenable to backporting in the meantime.
Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nathan Lynch <nathanl(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20211020194703.2613093-1-nathanl@linux.ibm.com
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e83e0891272d..210a37a065fb 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -63,6 +63,27 @@ static int mobility_rtas_call(int token, char *buf, s32 scope)
static int delete_dt_node(struct device_node *dn)
{
+ struct device_node *pdn;
+ bool is_platfac;
+
+ pdn = of_get_parent(dn);
+ is_platfac = of_node_is_type(dn, "ibm,platform-facilities") ||
+ of_node_is_type(pdn, "ibm,platform-facilities");
+ of_node_put(pdn);
+
+ /*
+ * The drivers that bind to nodes in the platform-facilities
+ * hierarchy don't support node removal, and the removal directive
+ * from firmware is always followed by an add of an equivalent
+ * node. The capability (e.g. RNG, encryption, compression)
+ * represented by the node is never interrupted by the migration.
+ * So ignore changes to this part of the tree.
+ */
+ if (is_platfac) {
+ pr_notice("ignoring remove operation for %pOFfp\n", dn);
+ return 0;
+ }
+
pr_debug("removing node %pOFfp\n", dn);
dlpar_detach_node(dn);
return 0;
@@ -222,6 +243,19 @@ static int add_dt_node(struct device_node *parent_dn, __be32 drc_index)
if (!dn)
return -ENOENT;
+ /*
+ * Since delete_dt_node() ignores this node type, this is the
+ * necessary counterpart. We also know that a platform-facilities
+ * node returned from dlpar_configure_connector() has children
+ * attached, and dlpar_attach_node() only adds the parent, leaking
+ * the children. So ignore these on the add side for now.
+ */
+ if (of_node_is_type(dn, "ibm,platform-facilities")) {
+ pr_notice("ignoring add operation for %pOF\n", dn);
+ dlpar_free_cc_nodes(dn);
+ return 0;
+ }
+
rc = dlpar_attach_node(dn, parent_dn);
if (rc)
dlpar_free_cc_nodes(dn);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 319fa1a52e438a6e028329187783a25ad498c4e6 Mon Sep 17 00:00:00 2001
From: Nathan Lynch <nathanl(a)linux.ibm.com>
Date: Wed, 20 Oct 2021 14:47:03 -0500
Subject: [PATCH] powerpc/pseries/mobility: ignore ibm,platform-facilities updates
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8468e937df1f31411d1e127fa38db064af051fe5 Mon Sep 17 00:00:00 2001
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Date: Fri, 5 Nov 2021 13:43:44 -0700
Subject: [PATCH] mm, thp: fix incorrect unmap behavior for private pages
When truncating pagecache on file THP, the private pages of a process
should not be unmapped. This incorrect behaviour, when it hits a dynamic
shared library, causes the processes mapping that library to core dump.
A simple test for a DSO (prerequisite: the DSO is mapped as file THP):
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;

	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
	}
	close(fd);
	return 0;
}
The test only opens the target DSO and does nothing else, yet this
operation leads one or more processes to core dump. This patch fixes
that bug.
Link: https://lkml.kernel.org/r/20211025092134.18562-3-rongwei.wang@linux.alibaba…
Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Tested-by: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Song Liu <song(a)kernel.org>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Collin Fijalkovich <cfijalkovich(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/open.c b/fs/open.c
index 9ec3cfca3b1a..e0df1536eb69 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -857,8 +857,17 @@ static int do_dentry_open(struct file *f,
*/
smp_mb();
if (filemap_nr_thps(inode->i_mapping)) {
+ struct address_space *mapping = inode->i_mapping;
+
filemap_invalidate_lock(inode->i_mapping);
- truncate_pagecache(inode, 0);
+ /*
+ * unmap_mapping_range just need to be called once
+ * here, because the private pages is not need to be
+ * unmapped mapping (e.g. data segment of dynamic
+ * shared libraries here).
+ */
+ unmap_mapping_range(mapping, 0, 0, 0);
+ truncate_inode_pages(mapping, 0);
filemap_invalidate_unlock(inode->i_mapping);
}
}
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 55fc0d91746759c71bc165bba62a2db64ac98e35 Mon Sep 17 00:00:00 2001
From: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Date: Fri, 5 Nov 2021 13:43:41 -0700
Subject: [PATCH] mm, thp: lock filemap when truncating page cache
Patch series "fix two bugs for file THP".
This patch (of 2):
Transparent huge pages now support read-only non-shmem files. The
file-backed THP is collapsed by khugepaged and truncated when written
(for shared libraries).
However, there is a race when multiple writers truncate the same page
cache concurrently.
In that case, subpage(s) of file THP can be revealed by find_get_entry
in truncate_inode_pages_range, which will trigger PageTail BUG_ON in
truncate_inode_page, as follows:
page:000000009e420ff2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff pfn:0x50c3ff
head:0000000075ff816d order:9 compound_mapcount:0 compound_pincount:0
flags: 0x37fffe0000010815(locked|uptodate|lru|arch_1|head)
raw: 37fffe0000000000 fffffe0013108001 dead000000000122 dead000000000400
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
head: 37fffe0000010815 fffffe001066bd48 ffff000404183c20 0000000000000000
head: 0000000000000600 0000000000000000 00000001ffffffff ffff000c0345a000
page dumped because: VM_BUG_ON_PAGE(PageTail(page))
------------[ cut here ]------------
kernel BUG at mm/truncate.c:213!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: xfs(E) libcrc32c(E) rfkill(E) ...
CPU: 14 PID: 11394 Comm: check_madvise_d Kdump: ...
Hardware name: ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
Call trace:
truncate_inode_page+0x64/0x70
truncate_inode_pages_range+0x550/0x7e4
truncate_pagecache+0x58/0x80
do_dentry_open+0x1e4/0x3c0
vfs_open+0x38/0x44
do_open+0x1f0/0x310
path_openat+0x114/0x1dc
do_filp_open+0x84/0x134
do_sys_openat2+0xbc/0x164
__arm64_sys_openat+0x74/0xc0
el0_svc_common.constprop.0+0x88/0x220
do_el0_svc+0x30/0xa0
el0_svc+0x20/0x30
el0_sync_handler+0x1a4/0x1b0
el0_sync+0x180/0x1c0
Code: aa0103e0 900061e1 910ec021 9400d300 (d4210000)
This patch takes the filemap invalidate lock around truncate_pagecache(),
so that the same page cache cannot be truncated concurrently.
Link: https://lkml.kernel.org/r/20211025092134.18562-1-rongwei.wang@linux.alibaba…
Link: https://lkml.kernel.org/r/20211025092134.18562-2-rongwei.wang@linux.alibaba…
Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs")
Signed-off-by: Xu Yu <xuyu(a)linux.alibaba.com>
Signed-off-by: Rongwei Wang <rongwei.wang(a)linux.alibaba.com>
Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Tested-by: Song Liu <song(a)kernel.org>
Cc: Collin Fijalkovich <cfijalkovich(a)google.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/open.c b/fs/open.c
index daa324606a41..9ec3cfca3b1a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -856,8 +856,11 @@ static int do_dentry_open(struct file *f,
* of THPs into the page cache will fail.
*/
smp_mb();
- if (filemap_nr_thps(inode->i_mapping))
+ if (filemap_nr_thps(inode->i_mapping)) {
+ filemap_invalidate_lock(inode->i_mapping);
truncate_pagecache(inode, 0);
+ filemap_invalidate_unlock(inode->i_mapping);
+ }
}
return 0;
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8468f45091d2866affed6f6a7aecc20779139173 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Wed, 3 Nov 2021 14:49:17 +0800
Subject: [PATCH] bcache: fix use-after-free problem in bcache_device_free()
In bcache_device_free(), the disk pointer is still dereferenced by
ida_simple_remove() after blk_cleanup_disk() has been called on it. This
can cause a panic due to a use-after-free on the disk pointer.
This patch fixes the problem by calling blk_cleanup_disk() after
ida_simple_remove().
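For illustration only, a tiny userspace C sketch of the ordering rule the
fix restores (this is not the bcache code; the struct and values are made
up): take whatever you still need from an object before releasing it, never
afterwards.

#include <stdio.h>
#include <stdlib.h>

struct disk {
        int first_minor;
};

int main(void)
{
        struct disk *disk = malloc(sizeof(*disk));

        if (!disk)
                return 1;
        disk->first_minor = 8;

        /* Read what we still need from the object first ... */
        int minor = disk->first_minor;
        /* ... and only then release it; freeing first would make the
         * read above a use-after-free, which is what the patch avoids. */
        free(disk);

        printf("released device with first_minor %d\n", minor);
        return 0;
}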
Fixes: bc70852fd104 ("bcache: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.de>
Cc: Ulf Hansson <ulf.hansson(a)linaro.org>
Cc: stable(a)vger.kernel.org # v5.14+
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20211103064917.67383-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 84a48eed8e24..a7bb3355b776 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -885,9 +885,9 @@ static void bcache_device_free(struct bcache_device *d)
bcache_device_detach(d);
if (disk) {
- blk_cleanup_disk(disk);
ida_simple_remove(&bcache_device_idx,
first_minor_to_idx(disk->first_minor));
+ blk_cleanup_disk(disk);
}
bioset_exit(&d->bio_split);
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v5.10
Thanks,
Naveen
Naveen N. Rao (4):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 ++
arch/powerpc/kernel/security.c | 5 ++
arch/powerpc/lib/code-patching.c | 7 ++-
arch/powerpc/net/bpf_jit.h | 33 ++++++----
arch/powerpc/net/bpf_jit64.h | 8 +--
arch/powerpc/net/bpf_jit_comp64.c | 64 ++++++++++++++++++--
7 files changed, 100 insertions(+), 23 deletions(-)
base-commit: 5040520482a594e92d4f69141229a6dd26173511
--
2.33.1
The patch below does not apply to the 5.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d3e3c102d107bb84251455a298cf475f24bab995 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 11 Nov 2021 17:32:53 -0700
Subject: [PATCH] io-wq: serialize hash clear with wakeup
We need to ensure that we serialize the stalled and hash bits with the
wait_queue wait handler, or we could be racing with someone modifying
the hashed state after we find it busy, but before we then give up and
wait for it to be cleared. This can cause random delays or stalls when
handling buffered writes for many files, where some of these files cause
hash collisions between the worker threads.
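As an illustration of the pattern the patch enforces, here is a minimal
userspace sketch using pthreads instead of the kernel wait-queue API (an
analogy only, not the io-wq code): the waiter re-checks the busy state
under the same lock the waker holds while clearing it, so the wakeup cannot
be lost between the check and the sleep.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool hash_busy = true;           /* stands in for the hash bit */

static void *worker(void *arg)
{
        pthread_mutex_lock(&lock);      /* serialize the clear with waiters */
        hash_busy = false;              /* like clear_bit(hash, ...) */
        pthread_cond_broadcast(&cond);  /* like wake_up(&wq->hash->wait) */
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, worker, NULL);

        pthread_mutex_lock(&lock);      /* like taking wq->hash->wait.lock */
        while (hash_busy)               /* re-check the state under the lock */
                pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);

        pthread_join(t, NULL);
        printf("hash cleared, stalled work can run\n");
        return 0;
}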
Cc: stable(a)vger.kernel.org
Reported-by: Daniel Black <daniel(a)mariadb.org>
Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io-wq.c b/fs/io-wq.c
index afd955d53db9..88202de519f6 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -423,9 +423,10 @@ static inline unsigned int io_get_work_hash(struct io_wq_work *work)
return work->flags >> IO_WQ_HASH_SHIFT;
}
-static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
+static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
{
struct io_wq *wq = wqe->wq;
+ bool ret = false;
spin_lock_irq(&wq->hash->wait.lock);
if (list_empty(&wqe->wait.entry)) {
@@ -433,9 +434,11 @@ static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
if (!test_bit(hash, &wq->hash->map)) {
__set_current_state(TASK_RUNNING);
list_del_init(&wqe->wait.entry);
+ ret = true;
}
}
spin_unlock_irq(&wq->hash->wait.lock);
+ return ret;
}
static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
@@ -475,14 +478,21 @@ static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
}
if (stall_hash != -1U) {
+ bool unstalled;
+
/*
* Set this before dropping the lock to avoid racing with new
* work being added and clearing the stalled bit.
*/
set_bit(IO_ACCT_STALLED_BIT, &acct->flags);
raw_spin_unlock(&wqe->lock);
- io_wait_on_hash(wqe, stall_hash);
+ unstalled = io_wait_on_hash(wqe, stall_hash);
raw_spin_lock(&wqe->lock);
+ if (unstalled) {
+ clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ if (wq_has_sleeper(&wqe->wq->hash->wait))
+ wake_up(&wqe->wq->hash->wait);
+ }
}
return NULL;
@@ -564,8 +574,11 @@ static void io_worker_handle_work(struct io_worker *worker)
io_wqe_enqueue(wqe, linked);
if (hash != -1U && !next_hashed) {
+ /* serialize hash clear with wake_up() */
+ spin_lock_irq(&wq->hash->wait.lock);
clear_bit(hash, &wq->hash->map);
clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+ spin_unlock_irq(&wq->hash->wait.lock);
if (wq_has_sleeper(&wq->hash->wait))
wake_up(&wq->hash->wait);
raw_spin_lock(&wqe->lock);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e4c6c1a2fc2eb11b9306e95cda1fa06a511948 Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Date: Tue, 9 Nov 2021 19:47:23 +0900
Subject: [PATCH] block: Hold invalidate_lock in BLKZEROOUT ioctl
When BLKZEROOUT ioctl and data read race, the data read leaves stale
page cache. To avoid the stale page cache, hold invalidate_lock of the
block device file mapping. The stale page cache is observed when
blktests test case block/009 is modified to call "blkdiscard -z" command
and repeated hundreds of times.
This patch can be applied back to the stable kernel version v5.15.y.
Rework is required for older stable kernels.
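For context, a minimal userspace sketch of the ioctl side of the race
(assumption: /dev/loop0 is a scratch device that is safe to zero; this is a
reproduction aid, not part of the patch). A concurrent read() of the same
device is what exposes the stale page cache without the invalidate_lock.

#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/loop0", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* BLKZEROOUT takes { start offset, length } in bytes */
        uint64_t range[2] = { 0, 1 << 20 };

        if (ioctl(fd, BLKZEROOUT, range) < 0)
                perror("BLKZEROOUT");

        close(fd);
        return 0;
}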
Fixes: 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Cc: stable(a)vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.c…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index 9fa87f64f703..0a1d10ac2e1a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -154,6 +154,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
{
uint64_t range[2];
uint64_t start, end, len;
+ struct inode *inode = bdev->bd_inode;
int err;
if (!(mode & FMODE_WRITE))
@@ -176,12 +177,17 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
return -EINVAL;
/* Invalidate the page cache, including dirty pages */
+ filemap_invalidate_lock(inode->i_mapping);
err = truncate_bdev_range(bdev, mode, start, end);
if (err)
- return err;
+ goto fail;
+
+ err = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ BLKDEV_ZERO_NOUNMAP);
- return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
- BLKDEV_ZERO_NOUNMAP);
+fail:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
static int put_ushort(unsigned short __user *argp, unsigned short val)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e4c6c1a2fc2eb11b9306e95cda1fa06a511948 Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Date: Tue, 9 Nov 2021 19:47:23 +0900
Subject: [PATCH] block: Hold invalidate_lock in BLKZEROOUT ioctl
When BLKZEROOUT ioctl and data read race, the data read leaves stale
page cache. To avoid the stale page cache, hold invalidate_lock of the
block device file mapping. The stale page cache is observed when
blktests test case block/009 is modified to call "blkdiscard -z" command
and repeated hundreds of times.
This patch can be applied back to the stable kernel version v5.15.y.
Rework is required for older stable kernels.
Fixes: 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Cc: stable(a)vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.c…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index 9fa87f64f703..0a1d10ac2e1a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -154,6 +154,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
{
uint64_t range[2];
uint64_t start, end, len;
+ struct inode *inode = bdev->bd_inode;
int err;
if (!(mode & FMODE_WRITE))
@@ -176,12 +177,17 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
return -EINVAL;
/* Invalidate the page cache, including dirty pages */
+ filemap_invalidate_lock(inode->i_mapping);
err = truncate_bdev_range(bdev, mode, start, end);
if (err)
- return err;
+ goto fail;
+
+ err = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ BLKDEV_ZERO_NOUNMAP);
- return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
- BLKDEV_ZERO_NOUNMAP);
+fail:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
static int put_ushort(unsigned short __user *argp, unsigned short val)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e4c6c1a2fc2eb11b9306e95cda1fa06a511948 Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Date: Tue, 9 Nov 2021 19:47:23 +0900
Subject: [PATCH] block: Hold invalidate_lock in BLKZEROOUT ioctl
When BLKZEROOUT ioctl and data read race, the data read leaves stale
page cache. To avoid the stale page cache, hold invalidate_lock of the
block device file mapping. The stale page cache is observed when
blktests test case block/009 is modified to call "blkdiscard -z" command
and repeated hundreds of times.
This patch can be applied back to the stable kernel version v5.15.y.
Rework is required for older stable kernels.
Fixes: 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Cc: stable(a)vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.c…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index 9fa87f64f703..0a1d10ac2e1a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -154,6 +154,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
{
uint64_t range[2];
uint64_t start, end, len;
+ struct inode *inode = bdev->bd_inode;
int err;
if (!(mode & FMODE_WRITE))
@@ -176,12 +177,17 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
return -EINVAL;
/* Invalidate the page cache, including dirty pages */
+ filemap_invalidate_lock(inode->i_mapping);
err = truncate_bdev_range(bdev, mode, start, end);
if (err)
- return err;
+ goto fail;
+
+ err = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
+ BLKDEV_ZERO_NOUNMAP);
- return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
- BLKDEV_ZERO_NOUNMAP);
+fail:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
static int put_ushort(unsigned short __user *argp, unsigned short val)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7607c44c157d343223510c8ffdf7206fdd2a6213 Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Date: Tue, 9 Nov 2021 19:47:22 +0900
Subject: [PATCH] block: Hold invalidate_lock in BLKDISCARD ioctl
When BLKDISCARD ioctl and data read race, the data read leaves stale
page cache. To avoid the stale page cache, hold invalidate_lock of the
block device file mapping. The stale page cache is observed when
blktests test case block/009 is repeated hundreds of times.
This patch can be applied back to the stable kernel version v5.15.y
with slight patch edit. Rework is required for older stable kernels.
Fixes: 351499a172c0 ("block: Invalidate cache on discard v2")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Cc: stable(a)vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.c…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index d6af0ac97e57..9fa87f64f703 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -113,6 +113,7 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
uint64_t range[2];
uint64_t start, len;
struct request_queue *q = bdev_get_queue(bdev);
+ struct inode *inode = bdev->bd_inode;
int err;
if (!(mode & FMODE_WRITE))
@@ -135,12 +136,17 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
if (start + len > bdev_nr_bytes(bdev))
return -EINVAL;
+ filemap_invalidate_lock(inode->i_mapping);
err = truncate_bdev_range(bdev, mode, start, start + len - 1);
if (err)
- return err;
+ goto fail;
+
+ err = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, flags);
- return blkdev_issue_discard(bdev, start >> 9, len >> 9,
- GFP_KERNEL, flags);
+fail:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7607c44c157d343223510c8ffdf7206fdd2a6213 Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Date: Tue, 9 Nov 2021 19:47:22 +0900
Subject: [PATCH] block: Hold invalidate_lock in BLKDISCARD ioctl
When BLKDISCARD ioctl and data read race, the data read leaves stale
page cache. To avoid the stale page cache, hold invalidate_lock of the
block device file mapping. The stale page cache is observed when
blktests test case block/009 is repeated hundreds of times.
This patch can be applied back to the stable kernel version v5.15.y
with slight patch edit. Rework is required for older stable kernels.
Fixes: 351499a172c0 ("block: Invalidate cache on discard v2")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Cc: stable(a)vger.kernel.org # v5.15
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.c…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/ioctl.c b/block/ioctl.c
index d6af0ac97e57..9fa87f64f703 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -113,6 +113,7 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
uint64_t range[2];
uint64_t start, len;
struct request_queue *q = bdev_get_queue(bdev);
+ struct inode *inode = bdev->bd_inode;
int err;
if (!(mode & FMODE_WRITE))
@@ -135,12 +136,17 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
if (start + len > bdev_nr_bytes(bdev))
return -EINVAL;
+ filemap_invalidate_lock(inode->i_mapping);
err = truncate_bdev_range(bdev, mode, start, start + len - 1);
if (err)
- return err;
+ goto fail;
+
+ err = blkdev_issue_discard(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, flags);
- return blkdev_issue_discard(bdev, start >> 9, len >> 9,
- GFP_KERNEL, flags);
+fail:
+ filemap_invalidate_unlock(inode->i_mapping);
+ return err;
}
static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
The reset GPIO was marked active-high, which is against what's specified
in the documentation. Mark the reset GPIO as active-low. With this
change, Bluetooth can now be used on the i9100.
Fixes: 8620cc2f99b7 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
arch/arm/boot/dts/exynos4210-i9100.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/exynos4210-i9100.dts b/arch/arm/boot/dts/exynos4210-i9100.dts
index 55922176807e..5f5d9b135736 100644
--- a/arch/arm/boot/dts/exynos4210-i9100.dts
+++ b/arch/arm/boot/dts/exynos4210-i9100.dts
@@ -827,7 +827,7 @@ bluetooth {
compatible = "brcm,bcm4330-bt";
shutdown-gpios = <&gpl0 4 GPIO_ACTIVE_HIGH>;
- reset-gpios = <&gpl1 0 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&gpl1 0 GPIO_ACTIVE_LOW>;
device-wakeup-gpios = <&gpx3 1 GPIO_ACTIVE_HIGH>;
host-wakeup-gpios = <&gpx2 6 GPIO_ACTIVE_HIGH>;
};
--
2.33.0
When using the performance policy, the EPP value is restored to the
non-"performance" mode EPP after a CPU goes offline and comes back online.
For example:
cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
balance_performance
Commit 4adcf2e5829f ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
optimized the save/restore path of the HWP request MSR when there is no
change in the policy, and added special processing for the performance-mode
EPP: if EPP has been set to "performance" by the active-mode "performance"
scaling algorithm, that value is replaced with the cached EPP. As a result,
the cached EPP is written out during offline and then restored again during
online.
So set cpu_data->epp_policy to zero when the performance policy is in use
and the EPP is non-zero, which forces EPP back to zero.
Fixes: 4adcf2e5829f ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.9+
---
drivers/cpufreq/intel_pstate.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 815df3daae9d..49ff24d2b0ea 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -930,17 +930,23 @@ static void intel_pstate_hwp_set(unsigned int cpu)
{
struct cpudata *cpu_data = all_cpu_data[cpu];
int max, min;
+ s16 epp = 0;
u64 value;
- s16 epp;
max = cpu_data->max_perf_ratio;
min = cpu_data->min_perf_ratio;
- if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE)
- min = max;
-
rdmsrl_on_cpu(cpu, MSR_HWP_REQUEST, &value);
+ if (boot_cpu_has(X86_FEATURE_HWP_EPP))
+ epp = (value >> 24) & 0xff;
+
+ if (cpu_data->policy == CPUFREQ_POLICY_PERFORMANCE) {
+ min = max;
+ if (epp)
+ cpu_data->epp_policy = 0;
+ }
+
value &= ~HWP_MIN_PERF(~0L);
value |= HWP_MIN_PERF(min);
--
2.17.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
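To see why the order matters, a toy userspace sketch (pthreads and C11
atomics standing in for sysfs and the MSI entries; not the PCI code):
unpublish the object from readers, wait for in-flight readers to finish,
and only then free it.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct entry {
        int irq;
};

static _Atomic(struct entry *) published;      /* what "sysfs" readers see */

static void *reader(void *arg)
{
        struct entry *e = atomic_load(&published);

        if (e)          /* safe: the writer frees only after joining us */
                printf("irq %d\n", e->irq);
        return NULL;
}

int main(void)
{
        struct entry *e = malloc(sizeof(*e));
        pthread_t t;

        if (!e)
                return 1;
        e->irq = 42;
        atomic_store(&published, e);

        pthread_create(&t, NULL, reader, NULL);

        /* Correct order: unpublish first, wait for readers, then free. */
        atomic_store(&published, NULL);
        pthread_join(t, NULL);
        free(e);
        return 0;
}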
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3735459037114d31e5acd9894fad9aed104231a0 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 9 Nov 2021 14:53:57 +0100
Subject: [PATCH] PCI/MSI: Destroy sysfs before freeing entries
free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 70433013897b..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -368,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
+ if (dev->msi_irq_groups) {
+ msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+ dev->msi_irq_groups = NULL;
+ }
+
pci_msi_teardown_msi_irqs(dev);
list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -379,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
list_del(&entry->list);
free_msi_entry(entry);
}
-
- if (dev->msi_irq_groups) {
- msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
- dev->msi_irq_groups = NULL;
- }
}
static void pci_intx_for_msi(struct pci_dev *dev, int enable)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a923a2676e60683aee46aa4b93c30aff240ac20d Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Fri, 22 Oct 2021 00:58:23 +0200
Subject: [PATCH] MIPS: Fix assembly error from MIPSr2 code used within
MIPS_ISA_ARCH_LEVEL
Fix assembly errors like:
{standard input}: Assembler messages:
{standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
{standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32'
{standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32'
make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1
with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations
using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT.
This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture
to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a
block of code containing a DINS MIPS64r2 instruction conditionalized on
MIPS_ISA_REV >= 2 within the scope of the downgrade.
The assembly architecture override code pattern has been put there for
LL/SC instructions, so that code compiles for configurations that select
a processor to build for that does not support these instructions while
still providing run-time support for processors that do, dynamically
switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org
commit aac8aa7717a2 ("Enable a suitable ISA for the assembler around
ll/sc so that code builds even for processors that don't support the
instructions. Plus minor formatting fixes.") back in 2005.
Fix the problem by wrapping these instructions along with the adjacent
SYNC instructions only, following the practice established with commit
cfd54de3b0e4 ("MIPS: Avoid move psuedo-instruction whilst using
MIPS_ISA_LEVEL") and commit 378ed6f0e3c5 ("MIPS: Avoid using .set mips0
to restore ISA"). Strictly speaking the SYNC instructions do not have
to be wrapped as they are only used as a Loongson3 erratum workaround,
so they will be enabled in the assembler by default, but do this so as
to keep code consistent with other places.
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: c7e2d71dda7a ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()")
Cc: stable(a)vger.kernel.org # v5.1+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 0b983800f48b..66a8b293fd80 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -249,6 +249,7 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
/* Load 64 bits from ptr */
" " __SYNC(full, loongson3_war) " \n"
"1: lld %L0, %3 # __cmpxchg64 \n"
+ " .set pop \n"
/*
* Split the 64 bit value we loaded into the 2 registers that hold the
* ret variable.
@@ -276,12 +277,14 @@ static inline unsigned long __cmpxchg64(volatile void *ptr,
" or %L1, %L1, $at \n"
" .set at \n"
# endif
+ " .set push \n"
+ " .set " MIPS_ISA_ARCH_LEVEL " \n"
/* Attempt to store new at ptr */
" scd %L1, %2 \n"
/* If we failed, loop! */
"\t" __SC_BEQZ "%L1, 1b \n"
- " .set pop \n"
"2: " __SYNC(full, loongson3_war) " \n"
+ " .set pop \n"
: "=&r"(ret),
"=&r"(tmp),
"=" GCC_OFF_SMALL_ASM() (*(unsigned long long *)ptr)
Please include "smb3: do not error on fsync when readonly"
Commit id:
71e6864eacbef0b2645ca043cdfbac272cb6cea3
It fixes an fsync problem over SMB3 mounts, reported by Julian, when
the file is not opened for write.
--
Thanks,
Steve
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 712a951025c0667ff00b25afc360f74e639dfabe Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 2 Nov 2021 11:10:37 +0100
Subject: [PATCH] fuse: fix page stealing
It is possible to trigger a crash by splicing anon pipe bufs to the fuse
device.
The reason for this is that anon_pipe_buf_release() will reuse buf->page if
the refcount is 1, but that page might have already been stolen and its
flags modified (e.g. PG_lru added).
This happens in the unlikely case of fuse_dev_splice_write() getting around
to calling pipe_buf_release() after a page has been stolen, added to the
page cache and removed from the page cache.
Fix by calling pipe_buf_release() right after the page was inserted into
the page cache. In this case the page has an elevated refcount so any
release function will know that the page isn't reusable.
Reported-by: Frank Dinoff <fdinoff(a)google.com>
Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1…
Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
Cc: <stable(a)vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 337fa6f5a7c6..79f7eda49e06 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -847,6 +847,12 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
replace_page_cache_page(oldpage, newpage);
+ /*
+ * Release while we have extra ref on stolen page. Otherwise
+ * anon_pipe_buf_release() might think the page can be reused.
+ */
+ pipe_buf_release(cs->pipe, buf);
+
get_page(newpage);
if (!(buf->flags & PIPE_BUF_FLAG_LRU))
@@ -2031,8 +2037,12 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
pipe_lock(pipe);
out_free:
- for (idx = 0; idx < nbuf; idx++)
- pipe_buf_release(pipe, &bufs[idx]);
+ for (idx = 0; idx < nbuf; idx++) {
+ struct pipe_buffer *buf = &bufs[idx];
+
+ if (buf->ops)
+ pipe_buf_release(pipe, buf);
+ }
pipe_unlock(pipe);
kvfree(bufs);
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v5.4
Thanks,
Naveen
Naveen N. Rao (5):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 ++
arch/powerpc/kernel/security.c | 5 ++
arch/powerpc/lib/code-patching.c | 7 +-
arch/powerpc/net/bpf_jit.h | 33 ++++---
arch/powerpc/net/bpf_jit64.h | 8 +-
arch/powerpc/net/bpf_jit_comp64.c | 91 ++++++++++++++++----
7 files changed, 117 insertions(+), 33 deletions(-)
base-commit: c65356f0f7268b1260dd64415c2145e73640872e
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.19
Thanks,
Naveen
Naveen N. Rao (5):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/security_features.h | 5 ++
arch/powerpc/kernel/security.c | 5 ++
arch/powerpc/lib/code-patching.c | 7 +-
arch/powerpc/net/bpf_jit.h | 33 ++++---
arch/powerpc/net/bpf_jit64.h | 8 +-
arch/powerpc/net/bpf_jit_comp64.c | 93 ++++++++++++++++----
7 files changed, 118 insertions(+), 34 deletions(-)
base-commit: 3033e5726834e4c9c8c48cdb2273f33bd105f938
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.14
Thanks,
Naveen
Naveen N. Rao (3):
powerpc/lib: Add helper to check if offset is within conditional
branch range
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/lib/code-patching.c | 7 ++++-
arch/powerpc/net/bpf_jit.h | 33 +++++++++++++--------
arch/powerpc/net/bpf_jit_comp64.c | 37 +++++++++++++++---------
4 files changed, 52 insertions(+), 26 deletions(-)
base-commit: 0447aa205abe1c0c016b4f7fa9d7c08d920b5c8e
--
2.33.1
This is a backport of the remaining patches from the below series:
https://lore.kernel.org/all/cover.1633464148.git.naveen.n.rao@linux.vnet.ib…
Kindly apply to the longterm tree for v4.9
Thanks,
Naveen
Naveen N. Rao (2):
powerpc/bpf: Validate branch ranges
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
arch/powerpc/net/bpf_jit.h | 25 ++++++++++++++++-----
arch/powerpc/net/bpf_jit_comp64.c | 37 ++++++++++++++++++++-----------
2 files changed, 43 insertions(+), 19 deletions(-)
base-commit: 59d4178b517472a1f023886baded2191458a76b5
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: use GFP_ATOMIC under spinlock
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 4a293eaf92a510ff688dc7b3f0815221f99c9d1b Mon Sep 17 00:00:00 2001
From: Michael Straube <straube.linux(a)gmail.com>
Date: Mon, 8 Nov 2021 11:55:37 +0100
Subject: staging: r8188eu: use GFP_ATOMIC under spinlock
In rtw_report_sec_ie(), kzalloc() is called under a spinlock, so the
allocation has to be atomic.
Call tree:
-> rtw_select_and_join_from_scanned_queue() <- takes a spinlock
-> rtw_joinbss_cmd()
-> rtw_restruct_sec_ie()
-> rtw_report_sec_ie()
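As a generic illustration of the rule (a minimal module-style sketch, not
the r8188eu code): memory allocated while a spinlock is held must use
GFP_ATOMIC, because GFP_KERNEL may sleep and sleeping inside a spinlocked
section is a bug.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static int __init demo_init(void)
{
        unsigned long flags;
        void *buf;

        spin_lock_irqsave(&demo_lock, flags);
        buf = kzalloc(64, GFP_ATOMIC);  /* GFP_KERNEL could sleep here */
        spin_unlock_irqrestore(&demo_lock, flags);

        kfree(buf);                     /* kfree(NULL) is a no-op */
        return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");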
Fixes: 2b42bd58b321 ("staging: r8188eu: introduce new os_dep dir for RTL8188eu driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Michael Straube <straube.linux(a)gmail.com>
Link: https://lore.kernel.org/r/20211108105537.31655-1-straube.linux@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/os_dep/mlme_linux.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/r8188eu/os_dep/mlme_linux.c b/drivers/staging/r8188eu/os_dep/mlme_linux.c
index a9b6ffdbf31a..f7ce724ebf87 100644
--- a/drivers/staging/r8188eu/os_dep/mlme_linux.c
+++ b/drivers/staging/r8188eu/os_dep/mlme_linux.c
@@ -112,7 +112,7 @@ void rtw_report_sec_ie(struct adapter *adapter, u8 authmode, u8 *sec_ie)
buff = NULL;
if (authmode == _WPA_IE_ID_) {
- buff = kzalloc(IW_CUSTOM_MAX, GFP_KERNEL);
+ buff = kzalloc(IW_CUSTOM_MAX, GFP_ATOMIC);
if (!buff)
return;
p = buff;
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: fix a memory leak in rtw_wx_read32()
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From be4ea8f383551b9dae11b8dfff1f38b3b5436e9a Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Tue, 9 Nov 2021 14:49:36 +0300
Subject: staging: r8188eu: fix a memory leak in rtw_wx_read32()
Free "ptmp" before returning -EINVAL.
Fixes: 2b42bd58b321 ("staging: r8188eu: introduce new os_dep dir for RTL8188eu driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Link: https://lore.kernel.org/r/20211109114935.GC16587@kili
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/os_dep/ioctl_linux.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/r8188eu/os_dep/ioctl_linux.c b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
index 52d42e576443..9404355726d0 100644
--- a/drivers/staging/r8188eu/os_dep/ioctl_linux.c
+++ b/drivers/staging/r8188eu/os_dep/ioctl_linux.c
@@ -1980,6 +1980,7 @@ static int rtw_wx_read32(struct net_device *dev,
u32 data32;
u32 bytes;
u8 *ptmp;
+ int ret;
padapter = (struct adapter *)rtw_netdev_priv(dev);
p = &wrqu->data;
@@ -2007,12 +2008,17 @@ static int rtw_wx_read32(struct net_device *dev,
break;
default:
DBG_88E(KERN_INFO "%s: usage> read [bytes],[address(hex)]\n", __func__);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err_free_ptmp;
}
DBG_88E(KERN_INFO "%s: addr = 0x%08X data =%s\n", __func__, addr, extra);
kfree(ptmp);
return 0;
+
+err_free_ptmp:
+ kfree(ptmp);
+ return ret;
}
static int rtw_wx_write32(struct net_device *dev,
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From c15a059f85de49c542e6ec2464967dd2b2aa18f6 Mon Sep 17 00:00:00 2001
From: "Fabio M. De Francesco" <fmdefrancesco(a)gmail.com>
Date: Mon, 1 Nov 2021 20:18:47 +0100
Subject: staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
Use the GFP_ATOMIC flag for the two kzalloc() memory allocations in
report_del_sta_event(). This function is called while holding spinlocks,
therefore it is not allowed to sleep. With the GFP_ATOMIC type flag, the
allocation is high priority and must not sleep.
This issue is detected by Smatch which emits the following warning:
"drivers/staging/r8188eu/core/rtw_mlme_ext.c:6848 report_del_sta_event()
warn: sleeping in atomic context".
After the change, the post-commit hook output the following message:
"CHECK: Prefer kzalloc(sizeof(*pcmd_obj)...) over
kzalloc(sizeof(struct cmd_obj)...)".
According to the above "CHECK", use the preferred style in the first
kzalloc().
Fixes: 79f712ea994d ("staging: r8188eu: Remove wrappers for kalloc() and kzalloc()")
Fixes: 15865124feed ("staging: r8188eu: introduce new core dir for RTL8188eu driver")
Signed-off-by: Fabio M. De Francesco <fmdefrancesco(a)gmail.com>
Link: https://lore.kernel.org/r/20211101191847.6749-1-fmdefrancesco@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable <stable(a)vger.kernel.org>
---
drivers/staging/r8188eu/core/rtw_mlme_ext.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/r8188eu/core/rtw_mlme_ext.c b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
index 5b60e6df5f87..b4820ad2cee7 100644
--- a/drivers/staging/r8188eu/core/rtw_mlme_ext.c
+++ b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
@@ -6847,12 +6847,12 @@ void report_del_sta_event(struct adapter *padapter, unsigned char *MacAddr, unsi
struct mlme_ext_priv *pmlmeext = &padapter->mlmeextpriv;
struct cmd_priv *pcmdpriv = &padapter->cmdpriv;
- pcmd_obj = kzalloc(sizeof(struct cmd_obj), GFP_KERNEL);
+ pcmd_obj = kzalloc(sizeof(*pcmd_obj), GFP_ATOMIC);
if (!pcmd_obj)
return;
cmdsz = (sizeof(struct stadel_event) + sizeof(struct C2HEvent_Header));
- pevtcmd = kzalloc(cmdsz, GFP_KERNEL);
+ pevtcmd = kzalloc(cmdsz, GFP_ATOMIC);
if (!pevtcmd) {
kfree(pcmd_obj);
return;
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging: r8188eu: Fix breakage introduced when 5G code was removed
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From d5f0b804368951b6b4a77d2f14b5bb6a04b0e011 Mon Sep 17 00:00:00 2001
From: Larry Finger <Larry.Finger(a)lwfinger.net>
Date: Sun, 7 Nov 2021 11:35:43 -0600
Subject: staging: r8188eu: Fix breakage introduced when 5G code was removed
In commit 221abd4d478a ("staging: r8188eu: Remove no more necessary definitions
and code"), two entries were removed from RTW_ChannelPlanMap[], but not replaced
with zeros. The position within this table is important, thus the patch broke
systems operating in regulatory domains listed later than entry 0x13 in the table.
Unfortunately, the FCC entry comes before that point and most testers did not see
this problem.
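The underlying pitfall can be shown with a small standalone C example (a
made-up table, not the driver's): when an array is indexed by a fixed code,
deleting an initializer shifts every later entry, so unused slots need an
explicit placeholder.

#include <stdio.h>

/* The index doubles as the domain code, so every slot must stay in place. */
static const int plan_map[] = {
        0x02,   /* 0x00, domain A */
        0x01,   /* 0x01, domain B */
        0x00,   /* 0x02, reserved: placeholder keeps later indices stable */
        0x02,   /* 0x03, domain C */
};

int main(void)
{
        printf("domain 0x03 -> plan 0x%02x\n", plan_map[0x03]);
        return 0;
}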
Fixes: 221abd4d478a ("staging: r8188eu: Remove no more necessary definitions and code")
Cc: Stable <stable(a)vger.kernel.org> # v5.5+
Reported-and-tested-by: Zameer Manji <zmanji(a)gmail.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Phillip Potter <phil(a)philpotter.co.uk>
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
Link: https://lore.kernel.org/r/20211107173543.7486-1-Larry.Finger@lwfinger.net
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/r8188eu/core/rtw_mlme_ext.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/r8188eu/core/rtw_mlme_ext.c b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
index 55c3d4a6faeb..5b60e6df5f87 100644
--- a/drivers/staging/r8188eu/core/rtw_mlme_ext.c
+++ b/drivers/staging/r8188eu/core/rtw_mlme_ext.c
@@ -107,6 +107,7 @@ static struct rt_channel_plan_map RTW_ChannelPlanMap[RT_CHANNEL_DOMAIN_MAX] = {
{0x01}, /* 0x10, RT_CHANNEL_DOMAIN_JAPAN */
{0x02}, /* 0x11, RT_CHANNEL_DOMAIN_FCC_NO_DFS */
{0x01}, /* 0x12, RT_CHANNEL_DOMAIN_JAPAN_NO_DFS */
+ {0x00}, /* 0x13 */
{0x02}, /* 0x14, RT_CHANNEL_DOMAIN_TAIWAN_NO_DFS */
{0x00}, /* 0x15, RT_CHANNEL_DOMAIN_ETSI_NO_DFS */
{0x00}, /* 0x16, RT_CHANNEL_DOMAIN_KOREA_NO_DFS */
@@ -118,6 +119,7 @@ static struct rt_channel_plan_map RTW_ChannelPlanMap[RT_CHANNEL_DOMAIN_MAX] = {
{0x00}, /* 0x1C, */
{0x00}, /* 0x1D, */
{0x00}, /* 0x1E, */
+ {0x00}, /* 0x1F, */
/* 0x20 ~ 0x7F , New Define ===== */
{0x00}, /* 0x20, RT_CHANNEL_DOMAIN_WORLD_NULL */
{0x01}, /* 0x21, RT_CHANNEL_DOMAIN_ETSI1_NULL */
--
2.33.1
This is a note to let you know that I've just added the patch titled
staging/fbtft: Fix backlight
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 7865dd24934ad580d1bcde8f63c39f324211a23b Mon Sep 17 00:00:00 2001
From: Noralf Trønnes <noralf(a)tronnes.org>
Date: Fri, 5 Nov 2021 21:43:58 +0100
Subject: staging/fbtft: Fix backlight
Commit b4a1ed0cd18b ("fbdev: make FB_BACKLIGHT a tristate") forgot to
update fbtft breaking its backlight support when FB_BACKLIGHT is a module.
Since FB_TFT selects FB_BACKLIGHT there's no need for this conditional
so just remove it and we're good.
Fixes: b4a1ed0cd18b ("fbdev: make FB_BACKLIGHT a tristate")
Cc: <stable(a)vger.kernel.org>
Acked-by: Sam Ravnborg <sam(a)ravnborg.org>
Signed-off-by: Noralf Trønnes <noralf(a)tronnes.org>
Link: https://lore.kernel.org/r/20211105204358.2991-1-noralf@tronnes.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/fbtft/fb_ssd1351.c | 4 ----
drivers/staging/fbtft/fbtft-core.c | 9 +--------
2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/staging/fbtft/fb_ssd1351.c b/drivers/staging/fbtft/fb_ssd1351.c
index cf263a58a148..6fd549a424d5 100644
--- a/drivers/staging/fbtft/fb_ssd1351.c
+++ b/drivers/staging/fbtft/fb_ssd1351.c
@@ -187,7 +187,6 @@ static struct fbtft_display display = {
},
};
-#ifdef CONFIG_FB_BACKLIGHT
static int update_onboard_backlight(struct backlight_device *bd)
{
struct fbtft_par *par = bl_get_data(bd);
@@ -231,9 +230,6 @@ static void register_onboard_backlight(struct fbtft_par *par)
if (!par->fbtftops.unregister_backlight)
par->fbtftops.unregister_backlight = fbtft_unregister_backlight;
}
-#else
-static void register_onboard_backlight(struct fbtft_par *par) { };
-#endif
FBTFT_REGISTER_DRIVER(DRVNAME, "solomon,ssd1351", &display);
diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index ecb5f75f6dd5..f2684d2d6851 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -128,7 +128,6 @@ static int fbtft_request_gpios(struct fbtft_par *par)
return 0;
}
-#ifdef CONFIG_FB_BACKLIGHT
static int fbtft_backlight_update_status(struct backlight_device *bd)
{
struct fbtft_par *par = bl_get_data(bd);
@@ -161,6 +160,7 @@ void fbtft_unregister_backlight(struct fbtft_par *par)
par->info->bl_dev = NULL;
}
}
+EXPORT_SYMBOL(fbtft_unregister_backlight);
static const struct backlight_ops fbtft_bl_ops = {
.get_brightness = fbtft_backlight_get_brightness,
@@ -198,12 +198,7 @@ void fbtft_register_backlight(struct fbtft_par *par)
if (!par->fbtftops.unregister_backlight)
par->fbtftops.unregister_backlight = fbtft_unregister_backlight;
}
-#else
-void fbtft_register_backlight(struct fbtft_par *par) { };
-void fbtft_unregister_backlight(struct fbtft_par *par) { };
-#endif
EXPORT_SYMBOL(fbtft_register_backlight);
-EXPORT_SYMBOL(fbtft_unregister_backlight);
static void fbtft_set_addr_win(struct fbtft_par *par, int xs, int ys, int xe,
int ye)
@@ -853,13 +848,11 @@ int fbtft_register_framebuffer(struct fb_info *fb_info)
fb_info->fix.smem_len >> 10, text1,
HZ / fb_info->fbdefio->delay, text2);
-#ifdef CONFIG_FB_BACKLIGHT
/* Turn on backlight if available */
if (fb_info->bl_dev) {
fb_info->bl_dev->props.power = FB_BLANK_UNBLANK;
fb_info->bl_dev->ops->update_status(fb_info->bl_dev);
}
-#endif
return 0;
--
2.33.1
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: v4l2-ioctl.c: readbuffers depends on V4L2_CAP_READWRITE
Author: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed Nov 3 12:28:31 2021 +0000
If V4L2_CAP_READWRITE is not set, then readbuffers must be set to 0,
otherwise v4l2-compliance will complain.
A note on the Fixes tag below: this patch does not really fix that commit,
but it can be applied from that commit onwards. For older code there is no
guarantee that device_caps is set, so even though this patch would apply,
it will not work reliably.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 049e684f2de9 ("media: v4l2-dev: fix WARN_ON(!vdev->device_caps)")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/v4l2-core/v4l2-ioctl.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
---
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d0109ce5a8..69b74d0e8a90 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2090,6 +2090,7 @@ static int v4l_prepare_buf(const struct v4l2_ioctl_ops *ops,
static int v4l_g_parm(const struct v4l2_ioctl_ops *ops,
struct file *file, void *fh, void *arg)
{
+ struct video_device *vfd = video_devdata(file);
struct v4l2_streamparm *p = arg;
v4l2_std_id std;
int ret = check_fmt(file, p->type);
@@ -2101,7 +2102,8 @@ static int v4l_g_parm(const struct v4l2_ioctl_ops *ops,
if (p->type != V4L2_BUF_TYPE_VIDEO_CAPTURE &&
p->type != V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
return -EINVAL;
- p->parm.capture.readbuffers = 2;
+ if (vfd->device_caps & V4L2_CAP_READWRITE)
+ p->parm.capture.readbuffers = 2;
ret = ops->vidioc_g_std(file, fh, &std);
if (ret == 0)
v4l2_video_std_frame_period(std, &p->parm.capture.timeperframe);