The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b6b4fbd90b155a0025223df2c137af8a701d53b3
Gitweb: https://git.kernel.org/tip/b6b4fbd90b155a0025223df2c137af8a701d53b3
Author: Sean Christopherson <seanjc(a)google.com>
AuthorDate: Tue, 04 May 2021 15:56:31 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 05 May 2021 21:50:14 +02:00
x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negligible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
---
arch/x86/kernel/cpu/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6bdb69a..490bed0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1851,7 +1851,7 @@ static inline void setup_getcpu(int cpu)
unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
struct desc_struct d = { };
- if (boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
write_rdtscp_aux(cpudata);
/* Store CPU and node number in limit. */
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: a217a6593cec8b315d4c2f344bae33660b39b703
Gitweb: https://git.kernel.org/tip/a217a6593cec8b315d4c2f344bae33660b39b703
Author: Lai Jiangshan <laijs(a)linux.alibaba.com>
AuthorDate: Tue, 04 May 2021 21:50:14 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 05 May 2021 22:54:10 +02:00
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.
But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.
Provide a non-IST entry point for VMX which shares the C function with
the regular NMI entry and invoke the new asm entry point instead.
On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.
[ tglx: Made it independent for backporting, massaged changelog ]
Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
---
arch/x86/include/asm/idtentry.h | 15 +++++++++++++++
arch/x86/kernel/nmi.c | 10 ++++++++++
arch/x86/kvm/vmx/vmx.c | 16 +++++++++-------
3 files changed, 34 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e35e342..73d45b0 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -588,6 +588,21 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);
#endif
/* NMI */
+
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+/*
+ * Special NOIST entry point for VMX which invokes this on the kernel
+ * stack. asm_exc_nmi() requires an IST to work correctly vs. the NMI
+ * 'executing' marker.
+ *
+ * On 32bit this just uses the regular NMI entry point because 32-bit does
+ * not have ISTs.
+ */
+DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_noist);
+#else
+#define asm_exc_nmi_noist asm_exc_nmi
+#endif
+
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
#ifdef CONFIG_XEN_PV
DECLARE_IDTENTRY_RAW(X86_TRAP_NMI, xenpv_exc_nmi);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index bf250a3..2ef961c 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -524,6 +524,16 @@ nmi_restart:
mds_user_clear_cpu_buffers();
}
+#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_IDTENTRY_RAW(exc_nmi_noist)
+{
+ exc_nmi(regs);
+}
+#endif
+#if IS_MODULE(CONFIG_KVM_INTEL)
+EXPORT_SYMBOL_GPL(asm_exc_nmi_noist);
+#endif
+
void stop_nmi(void)
{
ignore_nmis++;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cbe0cda..b21d751 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -36,6 +36,7 @@
#include <asm/debugreg.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
+#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
#include <asm/kexec.h>
@@ -6415,18 +6416,17 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu, u32 intr_info)
+static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
+ unsigned long entry)
{
- unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
-
kvm_before_interrupt(vcpu);
- vmx_do_interrupt_nmi_irqoff(gate_offset(desc));
+ vmx_do_interrupt_nmi_irqoff(entry);
kvm_after_interrupt(vcpu);
}
static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
{
+ const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
/* if exit due to PF check for async PF */
@@ -6437,18 +6437,20 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}
static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
+ unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
+ gate_desc *desc = (gate_desc *)host_idt_base + vector;
if (WARN_ONCE(!is_external_intr(intr_info),
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;
- handle_interrupt_nmi_irqoff(vcpu, intr_info);
+ handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
}
static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 160457140187c5fb127b844e5a85f87f00a01b14
Gitweb: https://git.kernel.org/tip/160457140187c5fb127b844e5a85f87f00a01b14
Author: Wanpeng Li <wanpengli(a)tencent.com>
AuthorDate: Tue, 04 May 2021 17:27:30 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 05 May 2021 22:54:11 +02:00
KVM: x86: Defer vtime accounting 'til after IRQ handling
Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit. Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.
This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:
https://bugzilla.kernel.org/show_bug.cgi?id=209831
Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
---
arch/x86/kvm/svm/svm.c | 6 +++---
arch/x86/kvm/vmx/vmx.c | 6 +++---
arch/x86/kvm/x86.c | 9 +++++++++
3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9790c73..c400def 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3753,15 +3753,15 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu)
* have them in state 'on' as recorded before entering guest mode.
* Same as enter_from_user_mode().
*
- * guest_exit_irqoff() restores host context and reinstates RCU if
- * enabled and required.
+ * context_tracking_guest_exit() restores host context and reinstates
+ * RCU if enabled and required.
*
* This needs to be done before the below as native_read_msr()
* contains a tracepoint and x86_spec_ctrl_restore_host() calls
* into world and some more.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- guest_exit_irqoff();
+ context_tracking_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b21d751..e108fb4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6703,15 +6703,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* have them in state 'on' as recorded before entering guest mode.
* Same as enter_from_user_mode().
*
- * guest_exit_irqoff() restores host context and reinstates RCU if
- * enabled and required.
+ * context_tracking_guest_exit() restores host context and reinstates
+ * RCU if enabled and required.
*
* This needs to be done before the below as native_read_msr()
* contains a tracepoint and x86_spec_ctrl_restore_host() calls
* into world and some more.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- guest_exit_irqoff();
+ context_tracking_guest_exit();
instrumentation_begin();
trace_hardirqs_off_finish();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cebdaa1..6eda283 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9315,6 +9315,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
local_irq_disable();
kvm_after_interrupt(vcpu);
+ /*
+ * Wait until after servicing IRQs to account guest time so that any
+ * ticks that occurred while running the guest are properly accounted
+ * to the guest. Waiting until IRQs are enabled degrades the accuracy
+ * of accounting via context tracking, but the loss of accuracy is
+ * acceptable for all known use cases.
+ */
+ vtime_account_guest_exit();
+
if (lapic_in_kernel(vcpu)) {
s64 delta = vcpu->arch.apic->lapic_timer.advance_expire_delta;
if (delta != S64_MIN) {
This is the start of the stable review cycle for the 5.4.117 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 May 2021 11:23:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.117-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.117-rc1
Ondrej Mosnacek <omosnace(a)redhat.com>
perf/core: Fix unconditional security_locked_down() call
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: allow upperdir inside lowerdir
Dan Carpenter <dan.carpenter(a)oracle.com>
scsi: ufs: Unlock on a couple error paths
Mark Pearson <markpearson(a)lenovo.com>
platform/x86: thinkpad_acpi: Correct thermal sensor allocation
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak5558: Add MODULE_DEVICE_TABLE
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak4458: Add MODULE_DEVICE_TABLE
Chris Chiu <chris.chiu(a)canonical.com>
USB: Add reset-resume quirk for WD19's Realtek Hub
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
USB: Add LPM quirk for Lenovo ThinkPad USB-C Dock Gen2 Ethernet
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
Thomas Richter <tmricht(a)linux.ibm.com>
perf ftrace: Fix access to pid in array when setting a pid filter
Zhen Lei <thunder.leizhen(a)huawei.com>
perf data: Fix error return code in perf_data__create_dir()
Jiri Kosina <jkosina(a)suse.cz>
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
Arnd Bergmann <arnd(a)arndb.de>
avoid __memcat_p link failure
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix leakage of uninitialized bpf stack under speculation
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix masking negation logic upon negative dst register
Jiri Kosina <jkosina(a)suse.cz>
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
Nick Lowe <nick.lowe(a)gmail.com>
igb: Enable RSS for Intel I211 Ethernet Controller
Phillip Potter <phil(a)philpotter.co.uk>
net: usb: ax88179_178a: initialize local variables before use
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: tables: x86: Reserve memory occupied by ACPI tables
Romain Naour <romain.naour(a)gmail.com>
mips: Do not include hi and lo in clobber list for R6
-------------
Diffstat:
Makefile | 4 +--
arch/mips/include/asm/vdso/gettimeofday.h | 26 +++++++++++---
arch/x86/kernel/acpi/boot.c | 25 +++++++-------
arch/x86/kernel/setup.c | 7 ++--
drivers/acpi/tables.c | 42 +++++++++++++++++++++--
drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
drivers/net/usb/ax88179_178a.c | 4 +--
drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 7 ++--
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 7 ++--
drivers/platform/x86/thinkpad_acpi.c | 31 ++++++++++++-----
drivers/scsi/ufs/ufshcd.c | 14 +++++---
drivers/usb/core/quirks.c | 4 +++
fs/overlayfs/super.c | 12 ++++---
include/linux/acpi.h | 9 ++++-
include/linux/bpf_verifier.h | 5 +--
kernel/bpf/verifier.c | 33 ++++++++++--------
kernel/events/core.c | 12 +++----
lib/Makefile | 4 +--
sound/soc/codecs/ak4458.c | 1 +
sound/soc/codecs/ak5558.c | 1 +
sound/usb/quirks-table.h | 10 ++++++
tools/perf/builtin-ftrace.c | 2 +-
tools/perf/util/data.c | 5 +--
23 files changed, 183 insertions(+), 85 deletions(-)
This is the start of the stable review cycle for the 5.10.35 release.
There are 29 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 May 2021 11:23:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.35-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.35-rc1
Ondrej Mosnacek <omosnace(a)redhat.com>
perf/core: Fix unconditional security_locked_down() call
Mark Pearson <markpearson(a)lenovo.com>
platform/x86: thinkpad_acpi: Correct thermal sensor allocation
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak5558: Add MODULE_DEVICE_TABLE
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak4458: Add MODULE_DEVICE_TABLE
Chris Chiu <chris.chiu(a)canonical.com>
USB: Add reset-resume quirk for WD19's Realtek Hub
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
USB: Add LPM quirk for Lenovo ThinkPad USB-C Dock Gen2 Ethernet
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: allow upperdir inside lowerdir
Mickaël Salaün <mic(a)linux.microsoft.com>
ovl: fix leaked dentry
Jianxiong Gao <jxgao(a)google.com>
nvme-pci: set min_align_mask
Jianxiong Gao <jxgao(a)google.com>
swiotlb: respect min_align_mask
Jianxiong Gao <jxgao(a)google.com>
swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: refactor swiotlb_tbl_map_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: clean up swiotlb_tbl_unmap_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: factor out a nr_slots helper
Jianxiong Gao <jxgao(a)google.com>
swiotlb: factor out an io_tlb_offset helper
Jianxiong Gao <jxgao(a)google.com>
swiotlb: add a IO_TLB_SIZE define
Jianxiong Gao <jxgao(a)google.com>
driver core: add a min_align_mask field to struct device_dma_parameters
Vasily Averin <vvs(a)virtuozzo.com>
tools/cgroup/slabinfo.py: updated to work on current kernel
Thomas Richter <tmricht(a)linux.ibm.com>
perf ftrace: Fix access to pid in array when setting a pid filter
Serge E. Hallyn <serge(a)hallyn.com>
capabilities: require CAP_SETFCAP to map uid 0
Zhen Lei <thunder.leizhen(a)huawei.com>
perf data: Fix error return code in perf_data__create_dir()
Bjorn Andersson <bjorn.andersson(a)linaro.org>
net: qrtr: Avoid potential use after free in MHI send
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix leakage of uninitialized bpf stack under speculation
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix masking negation logic upon negative dst register
Nick Lowe <nick.lowe(a)gmail.com>
igb: Enable RSS for Intel I211 Ethernet Controller
Phillip Potter <phil(a)philpotter.co.uk>
net: usb: ax88179_178a: initialize local variables before use
Jonathon Reinhart <jonathon.reinhart(a)gmail.com>
netfilter: conntrack: Make global sysctls readonly in non-init netns
Romain Naour <romain.naour(a)gmail.com>
mips: Do not include hi and lo in clobber list for R6
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/vdso/gettimeofday.h | 26 ++-
drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
drivers/net/usb/ax88179_178a.c | 6 +-
drivers/nvme/host/pci.c | 1 +
drivers/platform/x86/thinkpad_acpi.c | 31 ++--
drivers/usb/core/quirks.c | 4 +
fs/overlayfs/namei.c | 1 +
fs/overlayfs/super.c | 12 +-
include/linux/bpf_verifier.h | 5 +-
include/linux/device.h | 1 +
include/linux/dma-mapping.h | 16 ++
include/linux/swiotlb.h | 1 +
include/linux/user_namespace.h | 3 +
include/uapi/linux/capability.h | 3 +-
kernel/bpf/verifier.c | 33 ++--
kernel/dma/swiotlb.c | 259 +++++++++++++++++-------------
kernel/events/core.c | 12 +-
kernel/user_namespace.c | 65 +++++++-
net/netfilter/nf_conntrack_standalone.c | 10 +-
net/qrtr/mhi.c | 8 +-
sound/soc/codecs/ak4458.c | 1 +
sound/soc/codecs/ak5558.c | 1 +
sound/usb/quirks-table.h | 10 ++
tools/cgroup/memcg_slabinfo.py | 8 +-
tools/perf/builtin-ftrace.c | 2 +-
tools/perf/util/data.c | 5 +-
27 files changed, 347 insertions(+), 184 deletions(-)
Hi,
I ran Smatch on 5.4.116 and I found that we were missing commit
bb14dd1564c9 ("scsi: ufs: Unlock on a couple error paths").
The problem was that my Fixes tag somehow did not match the upstream
commit that stable used. I have both hashes in my git tree and
the patches are identical except for the hash. I don't know git well
enough to say what went wrong. I don't think the SCSI tree rebases?
My fixes tag:
Fixes: a276c19e3e98 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts")
^^^^^^^^^^^^
Stable hash:
commit a8d2d45c70c7391386baf7863674f156da56a3d5
Author: Bart Van Assche <bvanassche(a)acm.org>
Date: Mon Dec 9 10:13:08 2019 -0800
scsi: ufs: Avoid busy-waiting by eliminating tag conflicts
[ Upstream commit 7252a3603015f1fd04363956f4b72a537c9f9c42 ]
^^^^^^^^^^^^
regards,
dan carpenter
This is the start of the stable review cycle for the 5.11.19 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 May 2021 11:23:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.19-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.11.19-rc1
Ondrej Mosnacek <omosnace(a)redhat.com>
perf/core: Fix unconditional security_locked_down() call
Mark Pearson <markpearson(a)lenovo.com>
platform/x86: thinkpad_acpi: Correct thermal sensor allocation
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak5558: Add MODULE_DEVICE_TABLE
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak4458: Add MODULE_DEVICE_TABLE
Chris Chiu <chris.chiu(a)canonical.com>
USB: Add reset-resume quirk for WD19's Realtek Hub
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
USB: Add LPM quirk for Lenovo ThinkPad USB-C Dock Gen2 Ethernet
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix implicit sync clearance at stopping stream
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: allow upperdir inside lowerdir
Mickaël Salaün <mic(a)linux.microsoft.com>
ovl: fix leaked dentry
Jianxiong Gao <jxgao(a)google.com>
nvme-pci: set min_align_mask
Jianxiong Gao <jxgao(a)google.com>
swiotlb: respect min_align_mask
Jianxiong Gao <jxgao(a)google.com>
swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: refactor swiotlb_tbl_map_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: clean up swiotlb_tbl_unmap_single
Jianxiong Gao <jxgao(a)google.com>
swiotlb: factor out a nr_slots helper
Jianxiong Gao <jxgao(a)google.com>
swiotlb: factor out an io_tlb_offset helper
Jianxiong Gao <jxgao(a)google.com>
swiotlb: add a IO_TLB_SIZE define
Jianxiong Gao <jxgao(a)google.com>
driver core: add a min_align_mask field to struct device_dma_parameters
Vasily Averin <vvs(a)virtuozzo.com>
tools/cgroup/slabinfo.py: updated to work on current kernel
Thomas Richter <tmricht(a)linux.ibm.com>
perf ftrace: Fix access to pid in array when setting a pid filter
Serge E. Hallyn <serge(a)hallyn.com>
capabilities: require CAP_SETFCAP to map uid 0
Zhen Lei <thunder.leizhen(a)huawei.com>
perf data: Fix error return code in perf_data__create_dir()
Bjorn Andersson <bjorn.andersson(a)linaro.org>
net: qrtr: Avoid potential use after free in MHI send
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix leakage of uninitialized bpf stack under speculation
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix masking negation logic upon negative dst register
Nick Lowe <nick.lowe(a)gmail.com>
igb: Enable RSS for Intel I211 Ethernet Controller
Imre Deak <imre.deak(a)intel.com>
drm/i915: Disable runtime power management during shutdown
Phillip Potter <phil(a)philpotter.co.uk>
net: usb: ax88179_178a: initialize local variables before use
Jonathon Reinhart <jonathon.reinhart(a)gmail.com>
netfilter: conntrack: Make global sysctls readonly in non-init netns
Romain Naour <romain.naour(a)gmail.com>
mips: Do not include hi and lo in clobber list for R6
-------------
Diffstat:
Makefile | 4 +-
arch/mips/include/asm/vdso/gettimeofday.h | 26 ++-
drivers/gpu/drm/i915/i915_drv.c | 10 ++
drivers/net/ethernet/intel/igb/igb_main.c | 3 +-
drivers/net/usb/ax88179_178a.c | 6 +-
drivers/nvme/host/pci.c | 1 +
drivers/platform/x86/thinkpad_acpi.c | 31 ++--
drivers/usb/core/quirks.c | 4 +
fs/overlayfs/namei.c | 1 +
fs/overlayfs/super.c | 12 +-
include/linux/bpf_verifier.h | 5 +-
include/linux/device.h | 1 +
include/linux/dma-mapping.h | 16 ++
include/linux/swiotlb.h | 1 +
include/linux/user_namespace.h | 3 +
include/uapi/linux/capability.h | 3 +-
kernel/bpf/verifier.c | 33 ++--
kernel/dma/swiotlb.c | 259 +++++++++++++++++-------------
kernel/events/core.c | 12 +-
kernel/user_namespace.c | 65 +++++++-
net/netfilter/nf_conntrack_standalone.c | 10 +-
net/qrtr/mhi.c | 8 +-
sound/soc/codecs/ak4458.c | 1 +
sound/soc/codecs/ak5558.c | 1 +
sound/usb/endpoint.c | 8 +-
sound/usb/quirks-table.h | 10 ++
tools/cgroup/memcg_slabinfo.py | 8 +-
tools/perf/builtin-ftrace.c | 2 +-
tools/perf/util/data.c | 5 +-
29 files changed, 361 insertions(+), 188 deletions(-)
An NVMe device plugged into certain AMD PCIe root ports times out when
resuming from s2idle, because the NVMe power configuration is lost during
the SMU firmware restore. This issue can be worked around by using the
PCIe power setting with a simple suspend/resume path instead of APST.
On later ASICs the BIOS will perform the NVMe shutdown save and restore,
but the PCIe power setting is still needed to resume from RTD3 for s2idle.
This preparatory patch adds a PCIe quirk for AMD.
Cc: <stable(a)vger.kernel.org> # 5.11+
Signed-off-by: Prike Liang <Prike.Liang(a)amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
[ck: split patches for nvme and pcie]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
---
Changes in v2:
Fix the patch format, and check the chip's root complex DID instead of the
PCIe RP to avoid matching storage devices plugged into the internal PCIe
RP via a USB adapter.
Changes in v3:
As suggested by Christoph Hellwig, do the NVMe/PCIe-related identification
in the PCI quirk code rather than in the NVMe module.
Changes in v4:
Split the fix into PCIe and NVMe parts, call pci_dev_put() to drop the
device reference count, and refine the commit message.
---
drivers/pci/quirks.c | 10 ++++++++++
include/linux/pci.h | 2 ++
2 files changed, 12 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3..f95c8b2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -312,6 +312,16 @@ static void quirk_nopciamd(struct pci_dev *dev)
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8151_0, quirk_nopciamd);
+static void quirk_amd_nvme_fixup(struct pci_dev *dev)
+{
+ struct pci_dev *rdev;
+
+ dev->dev_flags |= PCI_DEV_FLAGS_AMD_NVME_SIMPLE_SUSPEND;
+ pci_info(dev, "AMD simple suspend opt enabled\n");
+
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1630, quirk_amd_nvme_fixup);
+
/* Triton requires workarounds to be used by the drivers */
static void quirk_triton(struct pci_dev *dev)
{
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 53f4904..a6e1b1b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
/* Don't use Relaxed Ordering for TLPs directed at this device */
PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
+ /* AMD simple suspend opt quirk */
+ PCI_DEV_FLAGS_AMD_NVME_SIMPLE_SUSPEND = (__force pci_dev_flags_t) (1 << 12),
};
enum pci_irq_reroute_variant {
--
2.7.4
This is the start of the stable review cycle for the 5.12.2 release.
There are 17 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 May 2021 11:23:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.12.2-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.12.2-rc1
Ondrej Mosnacek <omosnace(a)redhat.com>
perf/core: Fix unconditional security_locked_down() call
Mark Pearson <markpearson(a)lenovo.com>
platform/x86: thinkpad_acpi: Correct thermal sensor allocation
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak5558: Add MODULE_DEVICE_TABLE
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: ak4458: Add MODULE_DEVICE_TABLE
Chris Chiu <chris.chiu(a)canonical.com>
USB: Add reset-resume quirk for WD19's Realtek Hub
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
USB: Add LPM quirk for Lenovo ThinkPad USB-C Dock Gen2 Ethernet
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix implicit sync clearance at stopping stream
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: allow upperdir inside lowerdir
Mickaël Salaün <mic(a)linux.microsoft.com>
ovl: fix leaked dentry
Bjorn Andersson <bjorn.andersson(a)linaro.org>
net: qrtr: Avoid potential use after free in MHI send
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix leakage of uninitialized bpf stack under speculation
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix masking negation logic upon negative dst register
Imre Deak <imre.deak(a)intel.com>
drm/i915: Disable runtime power management during shutdown
Phillip Potter <phil(a)philpotter.co.uk>
net: usb: ax88179_178a: initialize local variables before use
Jonathon Reinhart <jonathon.reinhart(a)gmail.com>
netfilter: conntrack: Make global sysctls readonly in non-init netns
Romain Naour <romain.naour(a)gmail.com>
mips: Do not include hi and lo in clobber list for R6
-------------
Diffstat:
Makefile | 4 ++--
arch/mips/include/asm/vdso/gettimeofday.h | 26 +++++++++++++++++++-----
drivers/gpu/drm/i915/i915_drv.c | 10 ++++++++++
drivers/net/usb/ax88179_178a.c | 6 ++++--
drivers/platform/x86/thinkpad_acpi.c | 31 ++++++++++++++++++++---------
drivers/usb/core/quirks.c | 4 ++++
fs/overlayfs/namei.c | 1 +
fs/overlayfs/super.c | 12 ++++++-----
include/linux/bpf_verifier.h | 5 +++--
kernel/bpf/verifier.c | 33 +++++++++++++++++--------------
kernel/events/core.c | 12 +++++------
net/netfilter/nf_conntrack_standalone.c | 10 ++--------
net/qrtr/mhi.c | 8 +++++---
sound/soc/codecs/ak4458.c | 1 +
sound/soc/codecs/ak5558.c | 1 +
sound/usb/endpoint.c | 8 ++++----
sound/usb/quirks-table.h | 10 ++++++++++
17 files changed, 121 insertions(+), 61 deletions(-)
Currently, -Wunused-but-set-variable is only supported by GCC, so it is
disabled unconditionally in a GCC-only block (it is enabled with W=1).
clang's implementation of this warning is currently in review, so
preemptively move this statement out of the GCC-only block and wrap it
with cc-disable-warning so that both compilers behave the same.
Cc: stable(a)vger.kernel.org
Link: https://reviews.llvm.org/D100581
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
Makefile | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Makefile b/Makefile
index f03888cdba4e..911d839cfea8 100644
--- a/Makefile
+++ b/Makefile
@@ -775,16 +775,16 @@ KBUILD_CFLAGS += -Wno-gnu
KBUILD_CFLAGS += -mno-global-merge
else
-# These warnings generated too much noise in a regular build.
-# Use make W=1 to enable them (see scripts/Makefile.extrawarn)
-KBUILD_CFLAGS += -Wno-unused-but-set-variable
-
# Warn about unmarked fall-throughs in switch statement.
# Disabled for clang while comment to attribute conversion happens and
# https://github.com/ClangBuiltLinux/linux/issues/636 is discussed.
KBUILD_CFLAGS += $(call cc-option,-Wimplicit-fallthrough,)
endif
+# These warnings generated too much noise in a regular build.
+# Use make W=1 to enable them (see scripts/Makefile.extrawarn)
+KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable)
+
KBUILD_CFLAGS += $(call cc-disable-warning, unused-const-variable)
ifdef CONFIG_FRAME_POINTER
KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
base-commit: d8201efe75e13146ebde433745c7920e15593baf
--
2.31.1.362.g311531c9de
From: Alexander Aring <aahringo(a)redhat.com>
[ Upstream commit 92c48950b43f4a767388cf87709d8687151a641f ]
This patch fixes the following message, which randomly pops up during a
glocktop call:
seq_file: buggy .next function table_seq_next did not update position index
The issue is that seq_read_iter() in fs/seq_file.c expects the index to be
incremented in the no-next-record case as well. This patch adds that
increment; without it, seq_read_iter() prints the above message.
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/dlm/debug_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index eea64912c9c0..3b79c0284a30 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -545,6 +545,7 @@ static void *table_seq_next(struct seq_file *seq, void *iter_ptr, loff_t *pos)
if (bucket >= ls->ls_rsbtbl_size) {
kfree(ri);
+ ++*pos;
return NULL;
}
tree = toss ? &ls->ls_rsbtbl[bucket].toss : &ls->ls_rsbtbl[bucket].keep;
--
2.30.2
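The contract behind this fix can be sketched in userspace: every call to a seq_file ->next callback must advance the position index, including the final call that returns NULL. The model below is only an illustration under stated assumptions: `model_next()` and `model_seq_read()` are hypothetical stand-ins for table_seq_next() and the index check in seq_read_iter(), and the three-entry table replaces the real rsb hash buckets.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical three-record table standing in for the rsb hash buckets. */
#define MODEL_NRECORDS 3
static int model_table[MODEL_NRECORDS] = { 10, 20, 30 };

/* Model of a ->next callback: returns the next record, or NULL at the end.
 * With `fixed` set it also advances *pos on the NULL path, like the patch. */
static void *model_next(long long *pos, int fixed)
{
	if (*pos + 1 >= MODEL_NRECORDS) {
		if (fixed)
			++*pos; /* the one-line fix: bump the index before returning NULL */
		return NULL;
	}
	++*pos;
	return &model_table[*pos];
}

/* Model of the check in seq_read_iter(): each ->next call must move the
 * index. Returns how many "did not update position index" warnings fired. */
static int model_seq_read(int fixed)
{
	long long pos = 0;
	void *p = &model_table[pos];
	int warnings = 0;

	while (p != NULL) {
		long long before = pos;

		p = model_next(&pos, fixed);
		if (pos == before) {
			fprintf(stderr, "buggy .next function did not update position index\n");
			warnings++;
		}
	}
	return warnings;
}
```

Without the increment, only the last (NULL-returning) call trips the check, which matches the report that the message appears at the end of a table walk.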
From: Alexander Aring <aahringo(a)redhat.com>
[ Upstream commit 92c48950b43f4a767388cf87709d8687151a641f ]
This patch fixes the following message, which randomly pops up during a
glocktop call:
seq_file: buggy .next function table_seq_next did not update position index
The issue is that seq_read_iter() in fs/seq_file.c expects the index to be
incremented in the no-next-record case as well. This patch adds that
increment; without it, seq_read_iter() prints the above message.
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/dlm/debug_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index 466f7d60edc2..fabce23fdbac 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -545,6 +545,7 @@ static void *table_seq_next(struct seq_file *seq, void *iter_ptr, loff_t *pos)
if (bucket >= ls->ls_rsbtbl_size) {
kfree(ri);
+ ++*pos;
return NULL;
}
tree = toss ? &ls->ls_rsbtbl[bucket].toss : &ls->ls_rsbtbl[bucket].keep;
--
2.30.2
From: Alexander Aring <aahringo(a)redhat.com>
[ Upstream commit 92c48950b43f4a767388cf87709d8687151a641f ]
This patch fixes the following message, which randomly pops up during a
glocktop call:
seq_file: buggy .next function table_seq_next did not update position index
The issue is that seq_read_iter() in fs/seq_file.c expects the index to be
incremented in the no-next-record case as well. This patch adds that
increment; without it, seq_read_iter() prints the above message.
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/dlm/debug_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index fa08448e35dd..bb87dad03cd4 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -544,6 +544,7 @@ static void *table_seq_next(struct seq_file *seq, void *iter_ptr, loff_t *pos)
if (bucket >= ls->ls_rsbtbl_size) {
kfree(ri);
+ ++*pos;
return NULL;
}
tree = toss ? &ls->ls_rsbtbl[bucket].toss : &ls->ls_rsbtbl[bucket].keep;
--
2.30.2
From: Alexander Aring <aahringo(a)redhat.com>
[ Upstream commit 92c48950b43f4a767388cf87709d8687151a641f ]
This patch fixes the following message, which randomly pops up during a
glocktop call:
seq_file: buggy .next function table_seq_next did not update position index
The issue is that seq_read_iter() in fs/seq_file.c expects the index to be
incremented in the no-next-record case as well. This patch adds that
increment; without it, seq_read_iter() prints the above message.
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/dlm/debug_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c
index d6bbccb0ed15..d5bd990bcab8 100644
--- a/fs/dlm/debug_fs.c
+++ b/fs/dlm/debug_fs.c
@@ -542,6 +542,7 @@ static void *table_seq_next(struct seq_file *seq, void *iter_ptr, loff_t *pos)
if (bucket >= ls->ls_rsbtbl_size) {
kfree(ri);
+ ++*pos;
return NULL;
}
tree = toss ? &ls->ls_rsbtbl[bucket].toss : &ls->ls_rsbtbl[bucket].keep;
--
2.30.2
Hello,
please consider backporting commit 08ef1af4de5f ("perf/core: Fix
unconditional security_locked_down() call") to stable kernels, as
without it SELinux requires an extraneous permission for
perf_event_open(2) calls with PERF_SAMPLE_REGS_INTR unset.
The range of target kernel versions is implied by the Fixes: tag.
Thanks,
--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
evm_inode_init_security() requires an HMAC key to calculate the HMAC on
initial xattrs provided by LSMs. However, it checks generically whether a
key has been loaded, including public keys, which is not correct because
public keys are not suitable for calculating the HMAC.
Originally, support for signature verification was introduced to verify a
possibly immutable initial ram disk, when no new files are created, and to
switch to HMAC for the root filesystem. By that time, an HMAC key should
have been loaded and usable to calculate HMACs for new files.
More recently, support for requiring an HMAC key was removed from the
kernel, so that signature verification can be used alone. Since this is a
legitimate use case, evm_inode_init_security() should not return an error
when no HMAC key has been loaded.
This patch fixes this problem by replacing the evm_key_loaded() check with
a check of the EVM_INIT_HMAC flag in evm_initialized.
Cc: stable(a)vger.kernel.org # 4.5.x
Fixes: 26ddabfe96b ("evm: enable EVM when X509 certificate is loaded")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
---
security/integrity/evm/evm_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 0de367aaa2d3..7ac5204c8d1f 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -521,7 +521,7 @@ void evm_inode_post_setattr(struct dentry *dentry, int ia_valid)
}
/*
- * evm_inode_init_security - initializes security.evm
+ * evm_inode_init_security - initializes security.evm HMAC value
*/
int evm_inode_init_security(struct inode *inode,
const struct xattr *lsm_xattr,
@@ -530,7 +530,8 @@ int evm_inode_init_security(struct inode *inode,
struct evm_xattr *xattr_data;
int rc;
- if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
+ if (!(evm_initialized & EVM_INIT_HMAC) ||
+ !evm_protected_xattr(lsm_xattr->name))
return 0;
xattr_data = kzalloc(sizeof(*xattr_data), GFP_NOFS);
--
2.25.1
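The difference between the old and new checks comes down to which bits of evm_initialized are tested. The sketch below models that with stand-in flag values (the `MODEL_` constants are assumptions for illustration, not the kernel's definitions), and models the old evm_key_loaded() as "an HMAC key or an X.509 key is present", which is our reading of its behavior.

```c
#include <stdbool.h>

/* Stand-in flag values for this sketch only; the kernel defines its own
 * EVM_INIT_* constants. */
#define MODEL_EVM_INIT_HMAC 0x0001
#define MODEL_EVM_INIT_X509 0x0002

/* Old behavior (as modeled): any loaded key, HMAC or public, lets
 * evm_inode_init_security() try to compute an HMAC. */
static bool model_old_would_init_hmac(int evm_initialized)
{
	return (evm_initialized & (MODEL_EVM_INIT_HMAC | MODEL_EVM_INIT_X509)) != 0;
}

/* New behavior: only an HMAC key enables computing security.evm for new inodes. */
static bool model_new_would_init_hmac(int evm_initialized)
{
	return (evm_initialized & MODEL_EVM_INIT_HMAC) != 0;
}
```

With only a public key loaded (signature-only setups), the new check returns false, so the hook returns 0 instead of attempting an HMAC it cannot compute.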
Hi Greg,
This is a backport of commit 708fa01597fa ("ovl: allow upperdir inside
lowerdir").
Thanks,
Miklos
---
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 12 Apr 2021 12:00:37 +0200
Subject: ovl: allow upperdir inside lowerdir
commit 708fa01597fa002599756bf56a96d0de1677375c upstream.
Commit 146d62e5a586 ("ovl: detect overlapping layers") made sure we don't
have overlapping layers, but it also broke the arguably valid use case of
mount -olowerdir=/,upperdir=/subdir,..
where upperdir overlaps lowerdir on the same filesystem. This has been
causing regressions.
Revert the check, but only for the specific case where upperdir and/or
workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir
is subdirectory of upperdir, etc) case is crazy, so leave the check in
place for those.
Overlaps are detected at lookup time too, so reverting the mount time check
should be safe.
Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
Cc: <stable(a)vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
---
fs/overlayfs/super.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 7621ff176d15..1f0503aaf18c 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1501,7 +1501,8 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
* - upper/work dir of any overlayfs instance
*/
static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
- struct dentry *dentry, const char *name)
+ struct dentry *dentry, const char *name,
+ bool is_lower)
{
struct dentry *next = dentry, *parent;
int err = 0;
@@ -1513,7 +1514,7 @@ static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
/* Walk back ancestors to root (inclusive) looking for traps */
while (!err && parent != next) {
- if (ovl_lookup_trap_inode(sb, parent)) {
+ if (is_lower && ovl_lookup_trap_inode(sb, parent)) {
err = -ELOOP;
pr_err("overlayfs: overlapping %s path\n", name);
} else if (ovl_is_inuse(parent)) {
@@ -1539,7 +1540,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
if (ofs->upper_mnt) {
err = ovl_check_layer(sb, ofs, ofs->upper_mnt->mnt_root,
- "upperdir");
+ "upperdir", false);
if (err)
return err;
@@ -1550,7 +1551,8 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
* workbasedir. In that case, we already have their traps in
* inode cache and we will catch that case on lookup.
*/
- err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir");
+ err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir",
+ false);
if (err)
return err;
}
@@ -1558,7 +1560,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
for (i = 0; i < ofs->numlower; i++) {
err = ovl_check_layer(sb, ofs,
ofs->lower_layers[i].mnt->mnt_root,
- "lowerdir");
+ "lowerdir", true);
if (err)
return err;
}
--
2.30.2
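The effect of the new `is_lower` argument can be illustrated with a path-string model of ovl_check_layer(): walk the ancestors of a layer root up to "/" (inclusive) and treat a trap as -ELOOP only for lower layers. Everything here is hypothetical: real overlayfs walks dentries and looks up trap inodes, whereas this sketch compares absolute path strings against a list of "trapped" paths, and it omits the ovl_is_inuse() check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PATH_MAX 256

/* Parent of an absolute path: "/a/b" -> "/a", "/a" -> "/", "/" -> "/". */
static void model_parent(const char *path, char *out)
{
	const char *slash = strrchr(path, '/');
	size_t len = (slash == NULL || slash == path) ? 1 : (size_t)(slash - path);

	memcpy(out, slash == NULL ? "/" : path, len);
	out[len] = '\0';
}

static bool model_is_trap(const char *path, const char *const traps[], size_t ntraps)
{
	for (size_t i = 0; i < ntraps; i++)
		if (strcmp(path, traps[i]) == 0)
			return true;
	return false;
}

/* Same loop shape as ovl_check_layer() after the fix: the layer root itself
 * is not checked, only its ancestors, and only lower layers fail on a trap. */
static int model_check_layer(const char *root, bool is_lower,
			     const char *const traps[], size_t ntraps)
{
	char next[MODEL_PATH_MAX], parent[MODEL_PATH_MAX];

	if (strlen(root) >= MODEL_PATH_MAX)
		return -ENAMETOOLONG;
	strcpy(next, root);
	model_parent(next, parent);
	while (strcmp(parent, next) != 0) {
		if (is_lower && model_is_trap(parent, traps, ntraps))
			return -ELOOP;
		strcpy(next, parent);
		model_parent(next, parent);
	}
	return 0;
}
```

For `lowerdir=/,upperdir=/subdir`, "/" is trapped; checking "/subdir" as upper now passes, while a lower layer nested under a trapped path (for example "/upper/lower" with "/upper" trapped) is still rejected. Before the fix, every layer behaved as if `is_lower` were true.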
From: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Since 8ce8c0abcba3, the driver queues work via priv->restart_work when
resuming after suspend, even when the interface was not previously
enabled. This causes a NULL pointer dereference, as the workqueue is
only allocated and initialized in mcp251x_open().
To fix this, move the workqueue init to mcp251x_can_probe(), as
there is no reason to do it later and repeat it whenever
mcp251x_open() is called.
Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required")
Cc: stable(a)vger.kernel.org
Signed-off-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
Changes in v2:
* Remove the out_clean label in mcp251x_open()
* Add Andy's R-b tag
* Add 'From' tag
Hi Marc, I'm sending a v2 mainly because I noticed that v1 is missing
the 'From' tag, and since my company's mailserver always sends my name
reversed, this causes incorrect author information in git. So, if possible,
could you fix this up? If this is too much work, just leave it as is.
Thanks!
---
drivers/net/can/spi/mcp251x.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index a57da43680d8..6f888b771589 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -956,8 +956,6 @@ static int mcp251x_stop(struct net_device *net)
priv->force_quit = 1;
free_irq(spi->irq, priv);
- destroy_workqueue(priv->wq);
- priv->wq = NULL;
mutex_lock(&priv->mcp_lock);
@@ -1224,15 +1222,6 @@ static int mcp251x_open(struct net_device *net)
goto out_close;
}
- priv->wq = alloc_workqueue("mcp251x_wq", WQ_FREEZABLE | WQ_MEM_RECLAIM,
- 0);
- if (!priv->wq) {
- ret = -ENOMEM;
- goto out_clean;
- }
- INIT_WORK(&priv->tx_work, mcp251x_tx_work_handler);
- INIT_WORK(&priv->restart_work, mcp251x_restart_work_handler);
-
ret = mcp251x_hw_wake(spi);
if (ret)
goto out_free_wq;
@@ -1252,7 +1241,6 @@ static int mcp251x_open(struct net_device *net)
out_free_wq:
destroy_workqueue(priv->wq);
-out_clean:
free_irq(spi->irq, priv);
mcp251x_hw_sleep(spi);
out_close:
@@ -1373,6 +1361,15 @@ static int mcp251x_can_probe(struct spi_device *spi)
if (ret)
goto out_clk;
+ priv->wq = alloc_workqueue("mcp251x_wq", WQ_FREEZABLE | WQ_MEM_RECLAIM,
+ 0);
+ if (!priv->wq) {
+ ret = -ENOMEM;
+ goto out_clk;
+ }
+ INIT_WORK(&priv->tx_work, mcp251x_tx_work_handler);
+ INIT_WORK(&priv->restart_work, mcp251x_restart_work_handler);
+
priv->spi = spi;
mutex_init(&priv->mcp_lock);
@@ -1417,6 +1414,8 @@ static int mcp251x_can_probe(struct spi_device *spi)
return 0;
error_probe:
+ destroy_workqueue(priv->wq);
+ priv->wq = NULL;
mcp251x_power_enable(priv->power, 0);
out_clk:
@@ -1438,6 +1437,9 @@ static int mcp251x_can_remove(struct spi_device *spi)
mcp251x_power_enable(priv->power, 0);
+ destroy_workqueue(priv->wq);
+ priv->wq = NULL;
+
clk_disable_unprepare(priv->clk);
free_candev(net);
--
2.25.1
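The lifecycle change can be sketched as a small model: a resource that a path reachable without ndo_open (here, resume) depends on must be created at probe time and torn down at remove time. The `model_*` names and the counting "workqueue" are hypothetical; they stand in for alloc_workqueue()/destroy_workqueue() and queue_work() and are not the driver's code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue: just counts queued items. */
struct model_wq {
	int queued;
};

struct model_priv {
	struct model_wq *wq;
	bool opened;
};

/* After the fix: allocate the workqueue once, at probe time. */
static int model_probe(struct model_priv *priv)
{
	priv->opened = false;
	priv->wq = calloc(1, sizeof(*priv->wq));
	return priv->wq ? 0 : -ENOMEM;
}

static void model_open(struct model_priv *priv) { priv->opened = true; }

/* stop no longer destroys the workqueue. */
static void model_stop(struct model_priv *priv) { priv->opened = false; }

/* Resume queues restart work even if the interface was never opened,
 * so the workqueue must already exist here. */
static int model_resume(struct model_priv *priv)
{
	if (priv->wq == NULL)
		return -EFAULT; /* in the kernel this would be a NULL dereference */
	priv->wq->queued++;
	return 0;
}

/* remove is now responsible for releasing the workqueue. */
static void model_remove(struct model_priv *priv)
{
	free(priv->wq);
	priv->wq = NULL;
}
```

A `struct model_priv` that never went through model_probe() (wq still NULL) reproduces the pre-fix situation where resume runs before any open.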
Hi maintainer,
I'm fairly new to contributing to the kernel and didn't know about the
stable tree procedure, so I missed adding CC:stable@vger.kernel.org to
my patch submission; I'm following option 2 of the stable-kernel-rules
guide.
Subject: [PATCH] platform/x86: thinkpad_acpi: Correct thermal sensor
allocation
Upstream Commit ID: 6759e18e5cd8745a5dfc5726e4a3db5281ec1639
Reason: Some EC registers on Thinkpad machines were being incorrectly
used as temperature sensors. One in particular was fooling thermald into
thinking the system was hot when it wasn't, and keeping fans ramped up
unnecessarily.
I've been asked by some distros to get this fix into the stable
tree to make it easier for them to pull it into their releases.
If it's possible to add this to 5.11 and maybe 5.10, that would be
appreciated.
Please let me know if you need anything or have any questions.
Many thanks
Mark Pearson
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 708fa01597fa002599756bf56a96d0de1677375c Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 12 Apr 2021 12:00:37 +0200
Subject: [PATCH] ovl: allow upperdir inside lowerdir
Commit 146d62e5a586 ("ovl: detect overlapping layers") made sure we don't
have overlapping layers, but it also broke the arguably valid use case of
mount -olowerdir=/,upperdir=/subdir,..
where upperdir overlaps lowerdir on the same filesystem. This has been
causing regressions.
Revert the check, but only for the specific case where upperdir and/or
workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir
is subdirectory of upperdir, etc) case is crazy, so leave the check in
place for those.
Overlaps are detected at lookup time too, so reverting the mount time check
should be safe.
Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
Cc: <stable(a)vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a33b31bf7e05..b01d4147520d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1854,7 +1854,8 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
* - upper/work dir of any overlayfs instance
*/
static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
- struct dentry *dentry, const char *name)
+ struct dentry *dentry, const char *name,
+ bool is_lower)
{
struct dentry *next = dentry, *parent;
int err = 0;
@@ -1866,7 +1867,7 @@ static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
/* Walk back ancestors to root (inclusive) looking for traps */
while (!err && parent != next) {
- if (ovl_lookup_trap_inode(sb, parent)) {
+ if (is_lower && ovl_lookup_trap_inode(sb, parent)) {
err = -ELOOP;
pr_err("overlapping %s path\n", name);
} else if (ovl_is_inuse(parent)) {
@@ -1892,7 +1893,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
if (ovl_upper_mnt(ofs)) {
err = ovl_check_layer(sb, ofs, ovl_upper_mnt(ofs)->mnt_root,
- "upperdir");
+ "upperdir", false);
if (err)
return err;
@@ -1903,7 +1904,8 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
* workbasedir. In that case, we already have their traps in
* inode cache and we will catch that case on lookup.
*/
- err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir");
+ err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir",
+ false);
if (err)
return err;
}
@@ -1911,7 +1913,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
for (i = 1; i < ofs->numlayer; i++) {
err = ovl_check_layer(sb, ofs,
ofs->layers[i].mnt->mnt_root,
- "lowerdir");
+ "lowerdir", true);
if (err)
return err;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 708fa01597fa002599756bf56a96d0de1677375c Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 12 Apr 2021 12:00:37 +0200
Subject: [PATCH] ovl: allow upperdir inside lowerdir
Commit 146d62e5a586 ("ovl: detect overlapping layers") made sure we don't
have overlapping layers, but it also broke the arguably valid use case of
mount -olowerdir=/,upperdir=/subdir,..
where upperdir overlaps lowerdir on the same filesystem. This has been
causing regressions.
Revert the check, but only for the specific case where upperdir and/or
workdir are subdirectories of lowerdir. Any other overlap (e.g. lowerdir
is subdirectory of upperdir, etc) case is crazy, so leave the check in
place for those.
Overlaps are detected at lookup time too, so reverting the mount time check
should be safe.
Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
Cc: <stable(a)vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a33b31bf7e05..b01d4147520d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1854,7 +1854,8 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
* - upper/work dir of any overlayfs instance
*/
static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
- struct dentry *dentry, const char *name)
+ struct dentry *dentry, const char *name,
+ bool is_lower)
{
struct dentry *next = dentry, *parent;
int err = 0;
@@ -1866,7 +1867,7 @@ static int ovl_check_layer(struct super_block *sb, struct ovl_fs *ofs,
/* Walk back ancestors to root (inclusive) looking for traps */
while (!err && parent != next) {
- if (ovl_lookup_trap_inode(sb, parent)) {
+ if (is_lower && ovl_lookup_trap_inode(sb, parent)) {
err = -ELOOP;
pr_err("overlapping %s path\n", name);
} else if (ovl_is_inuse(parent)) {
@@ -1892,7 +1893,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
if (ovl_upper_mnt(ofs)) {
err = ovl_check_layer(sb, ofs, ovl_upper_mnt(ofs)->mnt_root,
- "upperdir");
+ "upperdir", false);
if (err)
return err;
@@ -1903,7 +1904,8 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
* workbasedir. In that case, we already have their traps in
* inode cache and we will catch that case on lookup.
*/
- err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir");
+ err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir",
+ false);
if (err)
return err;
}
@@ -1911,7 +1913,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
for (i = 1; i < ofs->numlayer; i++) {
err = ovl_check_layer(sb, ofs,
ofs->layers[i].mnt->mnt_root,
- "lowerdir");
+ "lowerdir", true);
if (err)
return err;
}
We observed several NVMe failures when running with SWIOTLB. The root
cause of the issue is that when data is mapped via SWIOTLB, the address
offset is not preserved. Several device drivers, including the NVMe
driver, rely on this offset to function correctly.
Even though we discovered the error when running with AMD SEV, we have
reproduced the same error on RHEL 8 without SEV. When the swiotlb=force
option is added to the kernel command line, NVMe functionality is
impacted. For example, formatting a disk as XFS returns an
error.
----
Changes in v2:
Rebased patches to 5.10.33
Updated patch description to correct format.
Jianxiong Gao (9):
driver core: add a min_align_mask field to struct
device_dma_parameters
swiotlb: add a IO_TLB_SIZE define
swiotlb: factor out an io_tlb_offset helper
swiotlb: factor out a nr_slots helper
swiotlb: clean up swiotlb_tbl_unmap_single
swiotlb: refactor swiotlb_tbl_map_single
swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
swiotlb: respect min_align_mask
nvme-pci: set min_align_mask
drivers/nvme/host/pci.c | 1 +
include/linux/device.h | 1 +
include/linux/dma-mapping.h | 16 +++
include/linux/swiotlb.h | 1 +
kernel/dma/swiotlb.c | 259 ++++++++++++++++++++----------------
5 files changed, 164 insertions(+), 114 deletions(-)
--
2.31.1.498.g6c1eba8ee3d-goog
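The core idea of "swiotlb: respect min_align_mask" can be shown with address arithmetic: the bounce-buffer address should keep the bits of the original address selected by the device's min_align_mask. The sketch below is a simplification under explicit assumptions: it ignores slot search, slot size and multi-slot mappings, assumes `slot_base` is already aligned to `min_align_mask + 1`, and models the pre-series behavior as "data starts at the beginning of the slot". The function names are hypothetical.

```c
#include <stdint.h>

/* Pre-series behavior (simplified): the bounce address is the slot start,
 * so the low-order offset bits of the original address are lost. */
static uint64_t model_bounce_addr_old(uint64_t slot_base, uint64_t orig_addr)
{
	(void)orig_addr;
	return slot_base;
}

/* min_align_mask behavior (simplified): keep the bits of orig_addr covered
 * by the mask. Assumes slot_base is aligned to (min_align_mask + 1). */
static uint64_t model_bounce_addr_new(uint64_t slot_base, uint64_t orig_addr,
				      uint64_t min_align_mask)
{
	return slot_base + (orig_addr & min_align_mask);
}
```

With an assumed 4 KiB mask (0xfff), an original address of 0x12345678 bounced into a slot at 0x80000000 lands at 0x80000678, so a driver that derives offsets from the low 12 bits, as NVMe does for its PRP entries, still sees offset 0x678.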
The patch titled
Subject: mm/hugetlb: fix cow where page writtable in child
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-cow-where-page-writtable-in-child.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-cow-where-page-wri…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-cow-where-page-wri…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix cow where page writtable in child
When reworking the early COW of pinned hugetlb pages, we moved huge_ptep_get()
upwards but overlooked that, in its old position, huge_ptep_get() fetched the
pte after wr-protection. After moving it upwards, we need an explicit
wr-protect of the child pte, or we will keep the write bit set in the child
process, which could cause data corruption where the child can write to the
original page directly.
This issue can also be exposed by "memfd_test hugetlbfs" kselftest.
Link: https://lkml.kernel.org/r/20210503234356.9097-3-peterx@redhat.com
Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/hugetlb.c~mm-hugetlb-fix-cow-where-page-writtable-in-child
+++ a/mm/hugetlb.c
@@ -3898,6 +3898,7 @@ again:
* See Documentation/vm/mmu_notifier.rst
*/
huge_ptep_set_wrprotect(src, addr, src_pte);
+ entry = huge_pte_wrprotect(entry);
}
page_dup_rmap(ptepage, true);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-hugetlb-fix-f_seal_future_write.patch
mm-hugetlb-fix-cow-where-page-writtable-in-child.patch
hugetlb-pass-vma-into-huge_pte_alloc-and-huge_pmd_share.patch
hugetlb-pass-vma-into-huge_pte_alloc-and-huge_pmd_share-fix.patch
hugetlb-userfaultfd-forbid-huge-pmd-sharing-when-uffd-enabled.patch
hugetlb-userfaultfd-forbid-huge-pmd-sharing-when-uffd-enabled-fix.patch
mm-hugetlb-move-flush_hugetlb_tlb_range-into-hugetlbh.patch
hugetlb-userfaultfd-unshare-all-pmds-for-hugetlbfs-when-register-wp.patch
userfaultfd-add-minor-fault-registration-mode-fix.patch
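The bug fixed by the one-line huge_pte_wrprotect() addition is a stale snapshot: `entry` is read before the source pte is write-protected, so the value installed in the child still carries the write bit. The sketch below models that with a hypothetical two-bit pte (present and writable bits are assumptions for illustration, not an architecture's real layout).

```c
#include <stdint.h>

/* Hypothetical pte model: bit 0 = present, bit 1 = writable. */
#define MODEL_PTE_PRESENT 0x1u
#define MODEL_PTE_WRITE   0x2u

typedef uint64_t model_pte_t;

static model_pte_t model_wrprotect(model_pte_t e)
{
	return e & ~(model_pte_t)MODEL_PTE_WRITE;
}

/* Copy a COW mapping at fork time. `entry` is fetched before the source
 * pte is write-protected, so it still has the write bit unless cleared. */
static model_pte_t model_copy_cow(model_pte_t *src_pte, int fixed)
{
	model_pte_t entry = *src_pte;         /* fetched early, as after the rework */

	*src_pte = model_wrprotect(*src_pte); /* huge_ptep_set_wrprotect() analogue */
	if (fixed)
		entry = model_wrprotect(entry);   /* huge_pte_wrprotect() analogue */
	return entry;                         /* value installed in the child */
}
```

In the unfixed path the parent's pte is correctly read-only, but the child's copy is writable, which is exactly how the child can write to the shared original page.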
The patch titled
Subject: mm/hugetlb: fix F_SEAL_FUTURE_WRITE
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-f_seal_future_write.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-f_seal_future_writ…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-f_seal_future_writ…
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix F_SEAL_FUTURE_WRITE
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.
Hugh reported an issue with F_SEAL_FUTURE_WRITE not being applied correctly to
hugetlbfs, which I could easily verify using the memfd_test program; it
seems the program is rarely run with hugetlbfs pages (it uses shmem by
default).
Meanwhile, I found another, probably even more severe, issue: hugetlb
fork won't wr-protect the child's COW pages, so the child can potentially write to
the parent's private pages. Patch 2 addresses that.
After this series is applied, "memfd_test hugetlbfs" should start to pass.
This patch (of 2):
F_SEAL_FUTURE_WRITE has been missing for hugetlb since day one.
There is a test program for it, and it fails consistently.
$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)
This has probably gone unnoticed because nobody really runs the hugetlbfs test.
Fix it by also checking FUTURE_WRITE in hugetlbfs_file_mmap(), as we already
do in shmem_mmap(), and factor the check out into a common helper.
Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 5 +++++
include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
mm/shmem.c | 22 ++++------------------
3 files changed, 41 insertions(+), 18 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlb-fix-f_seal_future_write
+++ a/fs/hugetlbfs/inode.c
@@ -131,6 +131,7 @@ static void huge_pagevec_release(struct
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file_inode(file);
+ struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
loff_t len, vma_len;
int ret;
struct hstate *h = hstate_file(file);
@@ -146,6 +147,10 @@ static int hugetlbfs_file_mmap(struct fi
vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
vma->vm_ops = &hugetlb_vm_ops;
+ ret = seal_check_future_write(info->seals, vma);
+ if (ret)
+ return ret;
+
/*
* page based offset in vm_pgoff could be sufficiently large to
* overflow a loff_t when converted to byte offset. This can
--- a/include/linux/mm.h~mm-hugetlb-fix-f_seal_future_write
+++ a/include/linux/mm.h
@@ -3190,5 +3190,37 @@ void mem_dump_obj(void *object);
static inline void mem_dump_obj(void *object) {}
#endif
+/**
+ * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * @seals: the seals to check
+ * @vma: the vma to operate on
+ *
+ * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
+ * the vma flags. Return 0 if check pass, or <0 for errors.
+ */
+static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+{
+ if (seals & F_SEAL_FUTURE_WRITE) {
+ /*
+ * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+ * "future write" seal active.
+ */
+ if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+ return -EPERM;
+
+ /*
+ * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+ * MAP_SHARED and read-only, take care to not allow mprotect to
+ * revert protections on such mappings. Do this only for shared
+ * mappings. For private mappings, don't need to mask
+ * VM_MAYWRITE as we still want them to be COW-writable.
+ */
+ if (vma->vm_flags & VM_SHARED)
+ vma->vm_flags &= ~(VM_MAYWRITE);
+ }
+
+ return 0;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
--- a/mm/shmem.c~mm-hugetlb-fix-f_seal_future_write
+++ a/mm/shmem.c
@@ -2258,25 +2258,11 @@ out_nomem:
static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
{
struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+ int ret;
- if (info->seals & F_SEAL_FUTURE_WRITE) {
- /*
- * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
- * "future write" seal active.
- */
- if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
- return -EPERM;
-
- /*
- * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
- * MAP_SHARED and read-only, take care to not allow mprotect to
- * revert protections on such mappings. Do this only for shared
- * mappings. For private mappings, don't need to mask
- * VM_MAYWRITE as we still want them to be COW-writable.
- */
- if (vma->vm_flags & VM_SHARED)
- vma->vm_flags &= ~(VM_MAYWRITE);
- }
+ ret = seal_check_future_write(info->seals, vma);
+ if (ret)
+ return ret;
/* arm64 - allow memory tagging on RAM-based files */
vma->vm_flags |= VM_MTE_ALLOWED;
_
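The decision logic of seal_check_future_write() can be exercised outside the kernel by applying it to a plain flags word. In the sketch below the `MODEL_` constants are stand-ins chosen for illustration (the kernel has its own F_SEAL_FUTURE_WRITE and VM_* definitions), and a bare `unsigned int` replaces `struct vm_area_struct`.

```c
#include <errno.h>

/* Stand-in flag values for this sketch only. */
#define MODEL_F_SEAL_FUTURE_WRITE 0x0010u
#define MODEL_VM_WRITE            0x0002u
#define MODEL_VM_SHARED           0x0008u
#define MODEL_VM_MAYWRITE         0x0020u

/* Same decisions as seal_check_future_write(), applied to a flags word. */
static int model_seal_check_future_write(unsigned int seals, unsigned int *vm_flags)
{
	if (seals & MODEL_F_SEAL_FUTURE_WRITE) {
		/* A new writable shared mapping is refused outright. */
		if ((*vm_flags & MODEL_VM_SHARED) && (*vm_flags & MODEL_VM_WRITE))
			return -EPERM;
		/* A read-only shared mapping may not be upgraded by mprotect(). */
		if (*vm_flags & MODEL_VM_SHARED)
			*vm_flags &= ~MODEL_VM_MAYWRITE;
	}
	/* Private mappings keep VM_MAYWRITE so they stay COW-writable. */
	return 0;
}
```

The three cases that matter for "memfd_test hugetlbfs" are a writable shared mapping (rejected), a read-only shared mapping (allowed but locked read-only), and a private mapping (unchanged).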
From: Yazen Ghannam <yazen.ghannam(a)amd.com>
Always call kill_me_maybe() in order to attempt memory recovery. This
ensures that any memory associated with the error is properly marked as
poison.
This is needed for errors that occur on memory, but that do not have
MCG_STATUS[RIPV] set. One example is data poison consumption through the
instruction fetch units on AMD Zen-based systems.
The MF_MUST_KILL flag is passed to memory_failure() when
MCG_STATUS[RIPV] is not set. So the associated process will still be
killed.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
---
arch/x86/kernel/cpu/mce/core.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 308fb644b94a..9040d45ed997 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1285,10 +1285,7 @@ static void queue_task_work(struct mce *m, int kill_current_task)
current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
current->mce_whole_page = whole_page(m);
- if (kill_current_task)
- current->mce_kill_me.func = kill_me_now;
- else
- current->mce_kill_me.func = kill_me_maybe;
+ current->mce_kill_me.func = kill_me_maybe;
task_work_add(current, &current->mce_kill_me, TWA_RESUME);
}
--
2.25.1
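The commit message says recovery is now always attempted, and that MF_MUST_KILL is still passed to memory_failure() when MCG_STATUS[RIPV] is clear. The user-space sketch below restates that flag selection so the two cases are easy to compare. The helper `recovery_flags()` and the `DEMO_MF_*` values are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the real MF_* flags are
 * defined in the kernel headers and may differ. */
#define DEMO_MF_ACTION_REQUIRED (1 << 1)
#define DEMO_MF_MUST_KILL       (1 << 2)

/*
 * Hypothetical helper: compute the flags handed to memory_failure()
 * from the saved MCG_STATUS[RIPV] bit. Recovery is always requested,
 * so the page gets marked as poison; without a valid restart IP the
 * task must still be killed.
 */
static int recovery_flags(bool mce_ripv)
{
	int flags = DEMO_MF_ACTION_REQUIRED;

	if (!mce_ripv)
		flags |= DEMO_MF_MUST_KILL;
	return flags;
}
```

With this shape, one callback (kill_me_maybe()) covers both cases, which is why the patch can drop the kill_me_now() branch.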
From: Yazen Ghannam <yazen.ghannam(a)amd.com>
The Instruction Fetch (IF) units on AMD Zen-based systems do not
guarantee that a synchronous #MC is delivered. Therefore, MCG_STATUS[EIPV|RIPV]
will not be set. However, the microarchitecture does guarantee that the
exception is delivered within the same context. In other words, the
exact rIP is not known, but the context is known to not have changed.
There is no architecturally-defined method to determine this behavior.
The Code Segment (CS) register is always valid on AMD Zen-based IF units
regardless of the value of MCG_STATUS[EIPV|RIPV].
Add a quirk for all current Zen-based systems to save the CS register
for the IF banks.
This is needed to properly determine the context of the error.
Otherwise, the severity grading function will assume the context is
IN_KERNEL due to the m->cs value being 0 (the initialized value). This
leads to unnecessary kernel panics on data poison errors due to the
kernel believing the poison consumption occurred in kernel context.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
---
arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mce/core.c | 7 +++++++
arch/x86/kernel/cpu/mce/internal.h | 2 ++
3 files changed, 26 insertions(+)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e486f96b3cb3..141dcdd857b5 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -180,6 +180,23 @@ static struct smca_hwid smca_hwid_mcatypes[] = {
struct smca_bank smca_banks[MAX_NR_BANKS];
EXPORT_SYMBOL_GPL(smca_banks);
+/*
+ * Zen-based Instruction Fetch Units set EIPV=RIPV=0 on poison consumption
+ * errors (XEC = 12). However, the context is still valid, so save the CS
+ * register for later use.
+ */
+void quirk_zen_ifu(int bank, struct mce *m, struct pt_regs *regs)
+{
+ if (smca_get_bank_type(bank) != SMCA_IF)
+ return;
+ if ((m->mcgstatus & (MCG_STATUS_EIPV|MCG_STATUS_RIPV)) != 0)
+ return;
+ if (((m->status >> 16) & 0x1F) != 12)
+ return;
+
+ m->cs = regs->cs;
+}
+
/*
* In SMCA enabled processors, we can have multiple banks for a given IP type.
* So to define a unique name for each bank, we use a temp c-string to append
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bf7fe87a7e88..308fb644b94a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1754,6 +1754,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
if (c->x86 == 0x15 && c->x86_model <= 0xf)
mce_flags.overflow_recov = 1;
+ if (c->x86 == 0x17 || c->x86 == 0x19)
+ quirk_no_way_out = quirk_zen_ifu;
+ }
+
+ if (c->x86_vendor == X86_VENDOR_HYGON) {
+ if (c->x86 == 0x18)
+ quirk_no_way_out = quirk_zen_ifu;
}
if (c->x86_vendor == X86_VENDOR_INTEL) {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 88dcc79cfb07..656d5d6c9783 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -181,8 +181,10 @@ extern struct mca_msr_regs msr_ops;
extern bool filter_mce(struct mce *m);
#ifdef CONFIG_X86_MCE_AMD
+extern void quirk_zen_ifu(int bank, struct mce *m, struct pt_regs *regs);
extern bool amd_filter_mce(struct mce *m);
#else
+#define quirk_zen_ifu NULL
static inline bool amd_filter_mce(struct mce *m) { return false; };
#endif
--
2.25.1
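The three early returns in quirk_zen_ifu() decide whether CS gets saved. The sketch below restates them in user space so each condition can be tested on its own. RIPV (bit 0) and EIPV (bit 1) are the architectural MCG_STATUS bits. `struct demo_mce`, `demo_quirk()` and the `is_if_bank` flag are hypothetical. The flag stands in for the `smca_get_bank_type(bank) == SMCA_IF` lookup. The XEC extraction copies the patch's shift and mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCG_STATUS bits: RIPV is bit 0, EIPV is bit 1. */
#define DEMO_MCG_STATUS_RIPV (1ULL << 0)
#define DEMO_MCG_STATUS_EIPV (1ULL << 1)

/* Hypothetical, trimmed-down stand-in for struct mce. */
struct demo_mce {
	uint64_t mcgstatus;
	uint64_t status;	/* MCA_STATUS of the bank */
	uint8_t  cs;		/* 0 until something fills it in */
};

/* Same checks as quirk_zen_ifu(): IF bank, no valid IPs, XEC == 12. */
static bool demo_should_save_cs(bool is_if_bank, const struct demo_mce *m)
{
	if (!is_if_bank)
		return false;
	if (m->mcgstatus & (DEMO_MCG_STATUS_EIPV | DEMO_MCG_STATUS_RIPV))
		return false;
	return ((m->status >> 16) & 0x1F) == 12;
}

/* Save the interrupted context's CS only when the quirk applies. */
static void demo_quirk(bool is_if_bank, struct demo_mce *m, uint8_t regs_cs)
{
	if (demo_should_save_cs(is_if_bank, m))
		m->cs = regs_cs;
}
```

Take an error consumed in user mode, with CS 0x33 (the usual 64-bit user code selector on Linux). The saved value then carries RPL 3, instead of the initialized 0 that makes severity grading assume IN_KERNEL.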
This is the start of the stable review cycle for the 5.10.34 release.
There are 2 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 02 May 2021 14:19:04 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.34-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.34-rc1
Tomas Winkler <tomas.winkler(a)intel.com>
mei: me: add Alder Lake P device id.
Jiri Kosina <jkosina(a)suse.cz>
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
-------------
Diffstat:
Makefile | 4 ++--
drivers/misc/mei/hw-me-regs.h | 1 +
drivers/misc/mei/pci-me.c | 1 +
drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 7 ++++---
4 files changed, 8 insertions(+), 5 deletions(-)
On Mon, May 3, 2021 at 7:00 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux(a)googlegroups.com> wrote:
> > > >> ERROR: "__memcat_p" [drivers/hwtracing/stm/stm_core.ko] undefined!
> >
> > I'm fairly sure this is unrelated to my patch, but I don't see what
> > happened here.
>
> It's unrelated to your patch. It was fixed in 5.7 by
> 7273ad2b08f8ac9563579d16a3cf528857b26f49 and a few other dependencies
> according to https://github.com/ClangBuiltLinux/linux/issues/515.
>
Ah right, the big hammer.
Greg, not sure what we want to do here. Backporting
7273ad2b08f8 ("kbuild: link lib-y objects to vmlinux forcibly when
CONFIG_MODULES=y")
to v5.4 and earlier would be an easy workaround, but it has the potential
of adding extra bloat to the kernel image since it links in all other
library objects as well.
Arnd
From: Martin Wilck <mwilck(a)suse.com>
We have observed a few crashes in run_timer_softirq(), where a broken
timer_list struct belonging to an anatt_timer was encountered. The broken
structures look like this, and we actually see multiple ones attached to
the same timer base:
crash> struct timer_list 0xffff92471bcfdc90
struct timer_list {
  entry = {
    next = 0xdead000000000122,  // LIST_POISON2
    pprev = 0x0
  },
  expires = 4296022933,
  function = 0xffffffffc06de5e0 <nvme_anatt_timeout>,
  flags = 20
}
If such a timer is encountered in run_timer_softirq(), the kernel
crashes. The test scenario was an I/O load test with lots of NVMe
controllers, some of which were removed and re-added on the storage side.
I think this may happen if the rdma recovery_work starts, in this call
chain:
nvme_rdma_error_recovery_work()
  /* this stops all sorts of activity for the controller, but not
     the multipath-related work queue and timer */
  nvme_rdma_reconnect_or_remove(ctrl)
    => kicks reconnect_work

work queue: reconnect_work

nvme_rdma_reconnect_ctrl_work()
  nvme_rdma_setup_ctrl()
    nvme_rdma_configure_admin_queue()
      nvme_init_identify()
        nvme_mpath_init()
          # this sets some fields of the timer_list without taking a lock
          timer_setup()
          nvme_read_ana_log()
            mod_timer() or del_timer_sync()
Similar for TCP. The idea for the patch is based on the observation that
nvme_rdma_reset_ctrl_work() calls nvme_stop_ctrl()->nvme_mpath_stop(),
whereas nvme_rdma_error_recovery_work() stops only the keepalive timer, but
not the anatt timer. Also, nvme_mpath_init() is the only place where
the anatt_timer structure is accessed without locking.
[The following paragraph was contributed by Chao Leng <lengchao(a)huawei.com>]
The process may be:
1. ana_work adds the timer;
2. error recovery occurs; while reconnecting, the timer is reinitialized
   and nvme_read_ana_log() is called;
3. nvme_read_ana_log() may add the timer again.
The same timer is then added twice, and a crash will happen later.
This situation has actually been observed in a crash dump, where we
found a pending anatt timer that had been started ~80s earlier, despite a
log message stating that the anatt timer for the same controller had
timed out a few seconds before. This can only be explained by the same
timer having been attached multiple times.
Signed-off-by: Martin Wilck <mwilck(a)suse.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chao Leng <lengchao(a)huawei.com>
Cc: stable(a)vger.kernel.org
----
Changes in v4: Updated commit message with Chao Leng's analysis, as
suggested by Daniel Wagner.
Changes in v3: Changed the subject line, as suggested by Sagi Grimberg
Changes in v2: Moved call to nvme_mpath_stop() further down, directly before
the call of nvme_rdma_reconnect_or_remove() (Chao Leng)
---
drivers/nvme/host/multipath.c | 1 +
drivers/nvme/host/rdma.c | 1 +
drivers/nvme/host/tcp.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a1d476e1ac02..c63dd5dfa7ff 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -586,6 +586,7 @@ void nvme_mpath_stop(struct nvme_ctrl *ctrl)
del_timer_sync(&ctrl->anatt_timer);
cancel_work_sync(&ctrl->ana_work);
}
+EXPORT_SYMBOL_GPL(nvme_mpath_stop);
#define SUBSYS_ATTR_RW(_name, _mode, _show, _store) \
struct device_attribute subsys_attr_##_name = \
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index be905d4fdb47..fc07a7b0dc1d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1202,6 +1202,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
return;
}
+ nvme_mpath_stop(&ctrl->ctrl);
nvme_rdma_reconnect_or_remove(ctrl);
}
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a0f00cb8f9f3..46287b4f4d10 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2068,6 +2068,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
return;
}
+ nvme_mpath_stop(ctrl);
nvme_tcp_reconnect_or_remove(ctrl);
}
--
2.31.1
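The double add that Chao Leng describes can be shown with a deliberately simplified, single-threaded model. The timer base is a singly linked list and each timer carries a "pending" flag. This is not the kernel's timer implementation, which uses hlists and per-CPU bases. All `demo_*` names are hypothetical. The model reinitializes a queued timer, in the style of timer_setup(), and then re-arms it. Without a prior stop, the timer ends up linked to itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a timer and a timer base. */
struct demo_timer {
	struct demo_timer *next;
	int pending;
};

struct demo_base {
	struct demo_timer *head;
};

/* Like timer_setup(): reinitializes without checking if still queued. */
static void demo_timer_setup(struct demo_timer *t)
{
	t->next = NULL;
	t->pending = 0;
}

/* Like mod_timer(): queue the timer unless it is marked pending. */
static void demo_mod_timer(struct demo_base *b, struct demo_timer *t)
{
	if (t->pending)
		return;
	t->next = b->head;
	b->head = t;
	t->pending = 1;
}

/* Like del_timer_sync(): unlink the timer if it is on the list. */
static void demo_del_timer(struct demo_base *b, struct demo_timer *t)
{
	struct demo_timer **pp = &b->head;

	while (*pp) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
		pp = &(*pp)->next;
	}
	t->next = NULL;
	t->pending = 0;
}

/* Returns 1 if the reconnect sequence left the timer linked to itself. */
static int demo_reconnect(int stop_first)
{
	struct demo_base base = { NULL };
	struct demo_timer anatt = { NULL, 0 };

	demo_mod_timer(&base, &anatt);		/* ana_work arms the timer */
	if (stop_first)
		demo_del_timer(&base, &anatt);	/* nvme_mpath_stop() in the fix */
	demo_timer_setup(&anatt);		/* nvme_mpath_init() on reconnect */
	demo_mod_timer(&base, &anatt);		/* nvme_read_ana_log() re-arms */
	return anatt.next == &anatt;
}
```

`demo_reconnect(0)` corrupts the list. `demo_reconnect(1)` mirrors the patch's call to nvme_mpath_stop() before reconnecting, and leaves a single clean entry.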
The problem was in the calculate_skip() function.
int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
i_size_read(inode) and msblk->block_log are unsigned integers,
but calculate_skip() took a signed int as its argument. This cast led
to a wrong skip value and then to a divide-by-zero bug.
Fixes: 1701aecb6849 ("Squashfs: regular file operations")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+e8f781243ce16ac2f962(a)syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
---
fs/squashfs/file.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 7b1128398976..2ebcbd4f84cc 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -44,8 +44,8 @@
* Locate cache slot in range [offset, index] for specified inode. If
* there's more than one return the slot closest to index.
*/
-static struct meta_index *locate_meta_index(struct inode *inode, int offset,
- int index)
+static struct meta_index *locate_meta_index(struct inode *inode, unsigned int offset,
+ unsigned int index)
{
struct meta_index *meta = NULL;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
@@ -83,8 +83,8 @@ static struct meta_index *locate_meta_index(struct inode *inode, int offset,
/*
* Find and initialise an empty cache slot for index offset.
*/
-static struct meta_index *empty_meta_index(struct inode *inode, int offset,
- int skip)
+static struct meta_index *empty_meta_index(struct inode *inode, unsigned int offset,
+ unsigned int skip)
{
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
struct meta_index *meta = NULL;
@@ -211,11 +211,11 @@ static long long read_indexes(struct super_block *sb, int n,
* If the skip factor is limited in this way then the file will use multiple
* slots.
*/
-static inline int calculate_skip(int blocks)
+static inline unsigned int calculate_skip(unsigned int blocks)
{
- int skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
+ unsigned int skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
* SQUASHFS_META_INDEXES);
- return min(SQUASHFS_CACHED_BLKS - 1, skip + 1);
+ return min((unsigned int) SQUASHFS_CACHED_BLKS - 1, skip + 1);
}
@@ -224,12 +224,12 @@ static inline int calculate_skip(int blocks)
* on-disk locations of the datablock and block list metadata block
* <index_block, index_offset> for index (scaled to nearest cache index).
*/
-static int fill_meta_index(struct inode *inode, int index,
+static int fill_meta_index(struct inode *inode, unsigned int index,
u64 *index_block, int *index_offset, u64 *data_block)
{
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
- int offset = 0;
+ unsigned int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
+ unsigned int offset = 0;
struct meta_index *meta;
struct meta_entry *meta_entry;
u64 cur_index_block = squashfs_i(inode)->block_list_start;
@@ -323,7 +323,7 @@ static int fill_meta_index(struct inode *inode, int index,
* Get the on-disk location and compressed size of the datablock
* specified by index. Fill_meta_index() does most of the work.
*/
-static int read_blocklist(struct inode *inode, int index, u64 *block)
+static int read_blocklist(struct inode *inode, unsigned int index, u64 *block)
{
u64 start;
long long blks;
@@ -448,7 +448,7 @@ static int squashfs_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int index = page->index >> (msblk->block_log - PAGE_SHIFT);
+ unsigned int index = page->index >> (msblk->block_log - PAGE_SHIFT);
int file_end = i_size_read(inode) >> msblk->block_log;
int expected = index == file_end ?
(i_size_read(inode) & (msblk->block_size - 1)) :
--
2.31.1
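The effect of the signedness change can be seen by evaluating both shapes of calculate_skip() on the same bit pattern. The constants below are assumed values, simplified to plain int. The real definitions are in fs/squashfs/squashfs_fs.h. The `_signed`/`_unsigned` helper names are hypothetical. The sketch passes a negative int directly and does not model the truncation from loff_t.

```c
#include <assert.h>

/* Assumed values, simplified to plain int for this sketch. */
#define DEMO_META_ENTRIES 127
#define DEMO_META_INDEXES 2048
#define DEMO_CACHED_BLKS  8

/* Pre-fix shape: signed argument and result. */
static int calculate_skip_signed(int blocks)
{
	int skip = blocks / ((DEMO_META_ENTRIES + 1) * DEMO_META_INDEXES);
	int limit = DEMO_CACHED_BLKS - 1;

	return skip + 1 < limit ? skip + 1 : limit;
}

/* Post-fix shape: unsigned argument and result. */
static unsigned int calculate_skip_unsigned(unsigned int blocks)
{
	unsigned int skip = blocks / ((DEMO_META_ENTRIES + 1) * DEMO_META_INDEXES);
	unsigned int limit = DEMO_CACHED_BLKS - 1;

	return skip + 1 < limit ? skip + 1 : limit;
}
```

A block count that looks negative as an int (here -262144) gives a skip of 0 in the signed version. A zero skip factor is the kind of value that leads to the divide by zero described above. The unsigned version clamps the same bit pattern to the cache limit instead.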
Hi,
Please may we request that commit 6e6026f2dd20 ("igb: Enable RSS for
Intel I211 Ethernet Controller") be backported to the 5.4 and 5.10 LTS
kernels?
The Intel i211 Ethernet Controller supports 2 Receive Side Scaling
(RSS) queues; the patch corrects the issue whereby the i211 was
excluded from having this feature enabled.
Best regards,
Nick
---------- Forwarded message ---------
From: Jakub Kicinski <kuba(a)kernel.org>
Date: Mon, 3 May 2021 at 19:30
Subject: Re: [PATCH net] igb: Enable RSS for Intel I211 Ethernet Controller
To: Nick Lowe <nick.lowe(a)gmail.com>
Cc: Matt Corallo <linux-wired-list(a)bluematt.me>, Nguyen, Anthony L
<anthony.l.nguyen(a)intel.com>, netdev(a)vger.kernel.org
<netdev(a)vger.kernel.org>, davem(a)davemloft.net <davem(a)davemloft.net>,
Brandeburg, Jesse <jesse.brandeburg(a)intel.com>,
intel-wired-lan(a)lists.osuosl.org <intel-wired-lan(a)lists.osuosl.org>
On Mon, 3 May 2021 13:32:24 +0100 Nick Lowe wrote:
> Hi all,
>
> Now that the 5.12 kernel has released, please may we consider
> backporting commit 6e6026f2dd2005844fb35c3911e8083c09952c6c to both
> the 5.4 and 5.10 LTS kernels so that RSS starts to function with the
> i211?
No objections here. Please submit the backport request to stable@.
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opt…
I'm announcing the release of the 5.10.34 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/misc/mei/hw-me-regs.h | 1 +
drivers/misc/mei/pci-me.c | 1 +
drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 7 ++++---
4 files changed, 7 insertions(+), 4 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.10.34
Jiri Kosina (1):
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
Tomas Winkler (1):
mei: me: add Alder Lake P device id.
I'm announcing the release of the 5.4.116 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
kernel/bpf/verifier.c | 233 ++++++----
tools/testing/selftests/bpf/verifier/bounds_deduction.c | 21
tools/testing/selftests/bpf/verifier/bounds_mix_sign_unsign.c | 13
tools/testing/selftests/bpf/verifier/unpriv.c | 2
tools/testing/selftests/bpf/verifier/value_ptr_arith.c | 6
6 files changed, 174 insertions(+), 103 deletions(-)
Daniel Borkmann (8):
bpf: Move off_reg into sanitize_ptr_alu
bpf: Ensure off_reg has no mixed signed bounds for all types
bpf: Rework ptr_limit into alu_limit and add common error path
bpf: Improve verifier error messages for users
bpf: Refactor and streamline bounds check into helper
bpf: Move sanitize_val_alu out of op switch
bpf: Tighten speculative pointer arithmetic mask
bpf: Update selftests to reflect new error states
Greg Kroah-Hartman (1):
Linux 5.4.116
When reworking the early COW of pinned hugetlb pages, we moved huge_ptep_get()
upwards but overlooked a side effect of its old position: huge_ptep_get() used
to fetch the pte after wr-protection. Now that it has moved upwards, we need to
explicitly wr-protect the child pte; otherwise the write bit stays set in the
child process, which could cause data corruption because the child can write to
the original page directly.
This issue can also be exposed by "memfd_test hugetlbfs" kselftest.
Cc: stable(a)vger.kernel.org
Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm")
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aab3a33214d10..72544ebb24f0e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4076,6 +4076,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* See Documentation/vm/mmu_notifier.rst
*/
huge_ptep_set_wrprotect(src, addr, src_pte);
+ entry = huge_pte_wrprotect(entry);
}
page_dup_rmap(ptepage, true);
--
2.31.1
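The ordering problem fits in a few lines. In the model below, a PTE has only a present bit and a write bit. The entry is snapshotted before the parent PTE is write-protected, so the snapshot keeps the write bit unless it is write-protected as well. The `DEMO_*` bits and `demo_*` helpers are hypothetical. The model also ignores the `cow` condition and the rest of copy_hugetlb_page_range().

```c
#include <assert.h>

/* Hypothetical PTE model: only a present bit and a write bit. */
#define DEMO_PTE_PRESENT 0x1u
#define DEMO_PTE_WRITE   0x2u

static unsigned int demo_pte_wrprotect(unsigned int pte)
{
	return pte & ~DEMO_PTE_WRITE;
}

/*
 * Models the fork-time copy after the rework: the entry is read
 * first, then the parent PTE is write-protected. Returns the value
 * that would be installed in the child.
 */
static unsigned int demo_copy_pte(unsigned int *src_pte, int apply_fix)
{
	unsigned int entry = *src_pte;			/* huge_ptep_get(), moved up */

	*src_pte = demo_pte_wrprotect(*src_pte);	/* huge_ptep_set_wrprotect() */
	if (apply_fix)
		entry = demo_pte_wrprotect(entry);	/* huge_pte_wrprotect() */
	return entry;
}
```

Without the fix, the parent is read-only but the child's entry is still writable. That is the data corruption window the patch closes.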
F_SEAL_FUTURE_WRITE has been missing for hugetlb since the first day.
There is a test program for it, and it fails consistently.
$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)
I think that's because no one is really running the hugetlbfs test.
Fix it by also checking FUTURE_WRITE in hugetlbfs_file_mmap(), as we do in
shmem_mmap(). Generalize a helper for that.
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: stable(a)vger.kernel.org
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Reported-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
fs/hugetlbfs/inode.c | 5 +++++
include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
mm/shmem.c | 22 ++++------------------
3 files changed, 41 insertions(+), 18 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9b383c39756a5..6557cf2cb1879 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -131,6 +131,7 @@ static void huge_pagevec_release(struct pagevec *pvec)
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file_inode(file);
+ struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
loff_t len, vma_len;
int ret;
struct hstate *h = hstate_file(file);
@@ -146,6 +147,10 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
vma->vm_ops = &hugetlb_vm_ops;
+ ret = seal_check_future_write(info->seals, vma);
+ if (ret)
+ return ret;
+
/*
* page based offset in vm_pgoff could be sufficiently large to
* overflow a loff_t when converted to byte offset. This can
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d6790ab0cf575..b9b2caf9302bc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3238,5 +3238,37 @@ extern int sysctl_nr_trim_pages;
void mem_dump_obj(void *object);
+/**
+ * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * @seals: the seals to check
+ * @vma: the vma to operate on
+ *
+ * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
+ * the vma flags. Return 0 if check pass, or <0 for errors.
+ */
+static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+{
+ if (seals & F_SEAL_FUTURE_WRITE) {
+ /*
+ * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+ * "future write" seal active.
+ */
+ if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+ return -EPERM;
+
+ /*
+ * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+ * MAP_SHARED and read-only, take care to not allow mprotect to
+ * revert protections on such mappings. Do this only for shared
+ * mappings. For private mappings, don't need to mask
+ * VM_MAYWRITE as we still want them to be COW-writable.
+ */
+ if (vma->vm_flags & VM_SHARED)
+ vma->vm_flags &= ~(VM_MAYWRITE);
+ }
+
+ return 0;
+}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index a1f21736ad68e..250b52e682590 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,25 +2258,11 @@ int shmem_lock(struct file *file, int lock, struct user_struct *user)
static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
{
struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+ int ret;
- if (info->seals & F_SEAL_FUTURE_WRITE) {
- /*
- * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
- * "future write" seal active.
- */
- if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
- return -EPERM;
-
- /*
- * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
- * MAP_SHARED and read-only, take care to not allow mprotect to
- * revert protections on such mappings. Do this only for shared
- * mappings. For private mappings, don't need to mask
- * VM_MAYWRITE as we still want them to be COW-writable.
- */
- if (vma->vm_flags & VM_SHARED)
- vma->vm_flags &= ~(VM_MAYWRITE);
- }
+ ret = seal_check_future_write(info->seals, vma);
+ if (ret)
+ return ret;
/* arm64 - allow memory tagging on RAM-based files */
vma->vm_flags |= VM_MTE_ALLOWED;
--
2.31.1
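To see the three outcomes of seal_check_future_write() side by side, here is a user-space restatement that operates on a bare flags word. The constants are illustrative values. `demo_seal_check()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values for this sketch only. */
#define DEMO_SEAL_FUTURE_WRITE 0x10
#define DEMO_VM_WRITE    0x2UL
#define DEMO_VM_SHARED   0x8UL
#define DEMO_VM_MAYWRITE 0x20UL

/* Same decisions as seal_check_future_write(), on a plain flags word. */
static int demo_seal_check(int seals, unsigned long *vm_flags)
{
	if (seals & DEMO_SEAL_FUTURE_WRITE) {
		/* Writable shared mappings are refused outright. */
		if ((*vm_flags & DEMO_VM_SHARED) && (*vm_flags & DEMO_VM_WRITE))
			return -EPERM;
		/* Read-only shared mappings must not be made writable later
		 * by mprotect(), so drop the "may write" permission. */
		if (*vm_flags & DEMO_VM_SHARED)
			*vm_flags &= ~DEMO_VM_MAYWRITE;
	}
	/* Private mappings keep VM_MAYWRITE so they stay COW-writable. */
	return 0;
}
```

From user space, memfd_test exercises the same paths. It creates a memfd, adds F_SEAL_FUTURE_WRITE with fcntl(F_ADD_SEALS) and expects mmap(PROT_WRITE, MAP_SHARED) to fail. Before this patch, that last step did not fail for hugetlbfs.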
Stable team, please backport the upstream commit
7962893ecb85 ("drm/i915: Disable runtime power management during shutdown")
to the v5.11 stable kernel; it fixes a system shutdown failure.
References: https://lore.kernel.org/intel-gfx/042237f49ed1fd719126a3407d7c909e49addbea.…
Reported-and-tested-by: Mario Hüttel <mario.huettel(a)gmx.net>
Thanks,
Imre
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 9302154c07bff4e7f7f43c506a1ac84540303d06 ]
The wqe_dbde field indicates whether a Data BDE is present in Words 0:2 and
should therefore be clear in the abts request wqe. Setting the bit can
mislead the fw into error cases.
Clear the wqe_dbde field.
Link: https://lore.kernel.org/r/20210301171821.3427-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_nvmet.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index eacdcb931bda..fa0d0d15e82c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -2554,7 +2554,6 @@ lpfc_nvmet_unsol_issue_abort(struct lpfc_hba *phba,
bf_set(wqe_rcvoxid, &wqe_abts->xmit_sequence.wqe_com, xri);
/* Word 10 */
- bf_set(wqe_dbde, &wqe_abts->xmit_sequence.wqe_com, 1);
bf_set(wqe_iod, &wqe_abts->xmit_sequence.wqe_com, LPFC_WQE_IOD_WRITE);
bf_set(wqe_lenloc, &wqe_abts->xmit_sequence.wqe_com,
LPFC_WQE_LENLOC_WORD12);
--
2.30.2
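bf_set() updates a single field inside a 32-bit WQE word with a read-modify-write. The sketch below illustrates that idea with plain functions. The shift and mask are hypothetical placeholders, not the real wqe_dbde layout from the lpfc headers. Dropping the call leaves the bit at its prior value. For a word that starts at zero, the bit stays clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field position for illustration only. */
#define DEMO_DBDE_SHIFT 14
#define DEMO_DBDE_MASK  0x1u

/* Read-modify-write of one bitfield in a 32-bit word. */
static uint32_t demo_bf_set(uint32_t word, unsigned int shift,
			    uint32_t mask, uint32_t value)
{
	word &= ~(mask << shift);		/* clear the old field */
	word |= (value & mask) << shift;	/* insert the new value */
	return word;
}

/* Extract one bitfield from a 32-bit word. */
static uint32_t demo_bf_get(uint32_t word, unsigned int shift, uint32_t mask)
{
	return (word >> shift) & mask;
}
```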
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 9302154c07bff4e7f7f43c506a1ac84540303d06 ]
The wqe_dbde field indicates whether a Data BDE is present in Words 0:2 and
should therefore be clear in the abts request wqe. Setting the bit can
mislead the fw into error cases.
Clear the wqe_dbde field.
Link: https://lore.kernel.org/r/20210301171821.3427-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_nvmet.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 5bc33817568e..23ead17e60fe 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -2912,7 +2912,6 @@ lpfc_nvmet_unsol_issue_abort(struct lpfc_hba *phba,
bf_set(wqe_rcvoxid, &wqe_abts->xmit_sequence.wqe_com, xri);
/* Word 10 */
- bf_set(wqe_dbde, &wqe_abts->xmit_sequence.wqe_com, 1);
bf_set(wqe_iod, &wqe_abts->xmit_sequence.wqe_com, LPFC_WQE_IOD_WRITE);
bf_set(wqe_lenloc, &wqe_abts->xmit_sequence.wqe_com,
LPFC_WQE_LENLOC_WORD12);
--
2.30.2