The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
50bcceb7724e ("x86/pm: Add enumeration check before spec MSRs save/restore setup")
2632daebafd0 ("x86/cpu: Restore AMD's DE_CFG MSR after resume")
e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 50bcceb7724e471d9b591803889df45dcbb584bc Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Date: Tue, 15 Nov 2022 11:17:06 -0800
Subject: [PATCH] x86/pm: Add enumeration check before spec MSRs save/restore
setup
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: <stable(a)kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.16685397…
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 4cd39f304e20..93ae33248f42 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -513,16 +513,23 @@ static int pm_cpu_check(const struct x86_cpu_id *c)
static void pm_save_spec_msr(void)
{
- u32 spec_msr_id[] = {
- MSR_IA32_SPEC_CTRL,
- MSR_IA32_TSX_CTRL,
- MSR_TSX_FORCE_ABORT,
- MSR_IA32_MCU_OPT_CTRL,
- MSR_AMD64_LS_CFG,
- MSR_AMD64_DE_CFG,
+ struct msr_enumeration {
+ u32 msr_no;
+ u32 feature;
+ } msr_enum[] = {
+ { MSR_IA32_SPEC_CTRL, X86_FEATURE_MSR_SPEC_CTRL },
+ { MSR_IA32_TSX_CTRL, X86_FEATURE_MSR_TSX_CTRL },
+ { MSR_TSX_FORCE_ABORT, X86_FEATURE_TSX_FORCE_ABORT },
+ { MSR_IA32_MCU_OPT_CTRL, X86_FEATURE_SRBDS_CTRL },
+ { MSR_AMD64_LS_CFG, X86_FEATURE_LS_CFG_SSBD },
+ { MSR_AMD64_DE_CFG, X86_FEATURE_LFENCE_RDTSC },
};
+ int i;
- msr_build_context(spec_msr_id, ARRAY_SIZE(spec_msr_id));
+ for (i = 0; i < ARRAY_SIZE(msr_enum); i++) {
+ if (boot_cpu_has(msr_enum[i].feature))
+ msr_build_context(&msr_enum[i].msr_no, 1);
+ }
}
static int pm_check_save_msr(void)
On Thu, Nov 24, 2022 at 01:48:08PM -0500, John Aron wrote:
> Hello -
>
>
>
> I have an idea of where to begin: our kernel code compiles and works on Red
> Hat, CentOS, and Fedora. In Ubuntu 20.04, I have an error.
>
>
>
> root@form:/home/john/thor-linux/Kernel/ubuntu20.04# make
>
> rmmod: ERROR: Module thor is not currently loaded
>
> make: [Makefile:7: all] Error 1 (ignored)
>
> make[1]: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CC [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.o
>
> /home/john/thor-linux/Kernel/ubuntu22.04/thor.o: warning: objtool:
> _Controller_process_response_map()+0x1b3: unreachable instruction
>
> Building modules, stage 2.
>
> MODPOST 1 modules
>
> CC [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.mod.o
>
> LD [M] /home/john/thor-linux/Kernel/ubuntu22.04/thor.ko
>
> make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> make[1]: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CLEAN /home/john/thor-linux/Kernel/ubuntu22.04/Module.symvers
>
> make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> #@sudo dmesg -C
>
> #@sudo insmod /usr/local/etc/thor.ko
>
> filename: /usr/local/etc/thor.ko
>
> version: 0.1
>
> description: THOR KMOD
>
> author: Aronetics
>
> license: GPL
>
> srcversion: BC856FA85DB2FEFD38A1B2A
>
> depends:
>
> retpoline: Y
>
> name: thor
>
> vermagic: 5.4.0-131-generic SMP mod_unload modversions
>
> #@sudo dmesg
>
> root@form:/home/john/thor-linux/Kernel/ubuntu20.04#
> <mailto:root@form:/home/john/thor-linux/Kernel/ubuntu20.04#>
>
>
>
> Every 2.0s: tail -n30 /var/lib/dkms/thor/1.0.1/build/make.log
>
>
>
> DKMS make.log for thor-1.0.1 for kernel 5.4.0-131-generic (x86_64)
>
> Thu 24 Nov 2022 01:10:33 PM EST
>
> make: Entering directory '/usr/src/linux-headers-5.4.0-131-generic'
>
> CC [M] /var/lib/dkms/thor/1.0.1/build/thor.o
>
> /var/lib/dkms/thor/1.0.1/build/thor.o: warning: objtool:
> _Controller_process_response_map()+0x1b3: unreachable instruction
>
> Building modules, stage 2.
>
> MODPOST 1 modules
>
> CC [M] /var/lib/dkms/thor/1.0.1/build/thor.mod.o
>
> LD [M] /var/lib/dkms/thor/1.0.1/build/thor.ko
>
> make: Leaving directory '/usr/src/linux-headers-5.4.0-131-generic'
>
>
>
> Is this an error in objtool on Ubuntu within
> /usr/src/linux-headers-5.4.0-${26-130}/tools/objtool ?
Do you have a pointer to your code anywhere? Do you have .S files in
it, or is it all C files?
And did you ask the Canonical developers about this? You should have a
support contract you are paying for with them, so why not use that?
thanks,
greg k-h
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
04aa64375f48 ("drm/i915: fix TLB invalidation for Gen12 video and compute engines")
33da97894758 ("drm/i915/gt: Serialize TLB invalidates with GT resets")
7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
1176d15f0f6e ("Merge tag 'drm-intel-gt-next-2021-10-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04aa64375f48a5d430b5550d9271f8428883e550 Mon Sep 17 00:00:00 2001
From: Andrzej Hajda <andrzej.hajda(a)intel.com>
Date: Mon, 14 Nov 2022 11:38:24 +0100
Subject: [PATCH] drm/i915: fix TLB invalidation for Gen12 video and compute
engines
In case of Gen12 video and compute engines, TLB_INV registers are masked -
to modify one bit, corresponding bit in upper half of the register must
be enabled, otherwise nothing happens.
CVE: CVE-2022-4139
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d0b03a928b9a..5c931b6696c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1017,6 +1017,11 @@ static void mmio_invalidate_full(struct intel_gt *gt)
if (!i915_mmio_reg_offset(rb.reg))
continue;
+ if (GRAPHICS_VER(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS ||
+ engine->class == VIDEO_ENHANCEMENT_CLASS ||
+ engine->class == COMPUTE_CLASS))
+ rb.bit = _MASKED_BIT_ENABLE(rb.bit);
+
intel_uncore_write_fw(uncore, rb.reg, rb.bit);
awake |= engine->mask;
}
I'm looking to use a sendfile(2) with a Xilinx XDMA kernel driver in order to move data from a PCIe board with Xilinx FPGA to the network card with "zero-copy".
Currently I'm getting EINVAL return status from sendfile(2) when providing opened XDMA device file descriptor as input fd.
The device driver provides a character device that can be mmap'ed.
There seem to be other restrictions. Can anyone provide insight on what would be needed to make this work?
Thanks! //hinko
Hi Greg
After upgrading to 5.4.211 we were started seeing some nodes getting
stuck in our Kubernetes cluster. All nodes are running this kernel
version. After taking a closer look it seems that runc was command getting
stuck. Looking at the stack it appears the thread is stuck in epoll wait for
sometime.
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] ep_poll+0x48d/0x4e0
[<0>] do_epoll_wait+0xab/0xc0
[<0>] __x64_sys_epoll_pwait+0x4d/0xa0
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] futex_wait_queue_me+0xb6/0x110
[<0>] futex_wait+0xe2/0x260
[<0>] do_futex+0x372/0x4f0
[<0>] __x64_sys_futex+0x134/0x180
[<0>] do_syscall_64+0x48/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
I noticed there are other discussions going on as well
regarding this.
https://lore.kernel.org/all/Y1pY2n6E1Xa58MXv@kroah.com/
Reverting the below patch does fix the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
We don't see this issue in latest upstream kernel or even latest 5.10
stable tree. Looking at the patches that went in for 5.10 stable there's
one that stands out that seems to be missing in 5.4.
289caf5d8f6c61c6d2b7fd752a7f483cd153f182 (epoll: check for events when removing
a timed out thread from the wait queue)
Backporting this patch to 5.4 we don't see the hangups anymore. Looks like
this patch fixes time out scenarios which might cause missed wake ups.
The other patch in the patch series also fixes a race and is needed for
the second patch to apply.
Roman Penyaev (1):
epoll: call final ep_events_available() check under the lock
Soheil Hassas Yeganeh (1):
epoll: check for events when removing a timed out thread from the wait
queue
fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++--------------------
1 file changed, 41 insertions(+), 27 deletions(-)
--
2.37.1
Commit be36f9e7517e ("efi: READ_ONCE rng seed size before munmap")
added a READ_ONCE() and also changed the call to
add_bootloader_randomness() to use the local size variable. Neither
of these changes was actually needed and this was not backported to
the 4.19 stable branch.
Commit 161a438d730d ("efi: random: reduce seed size to 32 bytes")
reverted the addition of READ_ONCE() and added a limit to the value of
size. This depends on the earlier commit, because size can now differ
from seed->size, but it was wrongly backported to the 4.19 stable
branch by itself.
Apply the missing change to the add_bootloader_randomness() parameter
(except that here we are still using add_device_randomness()).
Fixes: 0513592520ae ("efi: random: reduce seed size to 32 bytes")
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
---
drivers/firmware/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index f0ef2643b70e..2bbc2289fe09 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -566,7 +566,7 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz,
sizeof(*seed) + size);
if (seed != NULL) {
pr_notice("seeding entropy pool\n");
- add_device_randomness(seed->bits, seed->size);
+ add_device_randomness(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
} else {
pr_err("Could not map UEFI random seed!\n");
The quilt patch titled
Subject: Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
has been removed from the -mm tree. Its filename was
kconfigdebug-provide-a-little-extra-frame_warn-leeway-when-kasan-is-enabled.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lee Jones <lee(a)kernel.org>
Subject: Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
Date: Fri, 25 Nov 2022 12:07:50 +0000
When enabled, KASAN enlarges function's stack-frames. Pushing quite a few
over the current threshold. This can mainly be seen on 32-bit
architectures where the present limit (when !GCC) is a lowly 1024-Bytes.
Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Christian K��nig" <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Leo Li <sunpeng.li(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 1 +
1 file changed, 1 insertion(+)
--- a/lib/Kconfig.debug~kconfigdebug-provide-a-little-extra-frame_warn-leeway-when-kasan-is-enabled
+++ a/lib/Kconfig.debug
@@ -399,6 +399,7 @@ config FRAME_WARN
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 2048 if PARISC
default 1536 if (!64BIT && XTENSA)
+ default 1280 if KASAN && !64BIT
default 1024 if !64BIT
default 2048 if 64BIT
help
_
Patches currently in -mm which might be from lee(a)kernel.org are
The quilt patch titled
Subject: drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
has been removed from the -mm tree. Its filename was
drm-amdgpu-temporarily-disable-broken-clang-builds-due-to-blown-stack-frame.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lee Jones <lee(a)kernel.org>
Subject: drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
Date: Fri, 25 Nov 2022 12:07:49 +0000
Patch series "Fix a bunch of allmodconfig errors", v2.
Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST")
WERROR now defaults to COMPILE_TEST meaning that it's enabled for
allmodconfig builds. This leads to some interesting build failures when
using Clang, each resolved in this set.
With this set applied, I am able to obtain a successful allmodconfig Arm
build.
This patch (of 2):
calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
ARM64) architectures built with Clang (all released versions), whereby the
stack frame gets blown up to well over 5k. This would cause an immediate
kernel panic on most architectures. We'll revert this when the following
bug report has been resolved:
https://github.com/llvm/llvm-project/issues/41896.
Link: https://lkml.kernel.org/r/20221125120750.3537134-1-lee@kernel.org
Link: https://lkml.kernel.org/r/20221125120750.3537134-2-lee@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Suggested-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Christian K��nig" <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Lee Jones <lee(a)kernel.org>
Cc: Leo Li <sunpeng.li(a)amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/gpu/drm/amd/display/Kconfig | 7 +++++++
1 file changed, 7 insertions(+)
--- a/drivers/gpu/drm/amd/display/Kconfig~drm-amdgpu-temporarily-disable-broken-clang-builds-due-to-blown-stack-frame
+++ a/drivers/gpu/drm/amd/display/Kconfig
@@ -5,6 +5,7 @@ menu "Display Engine Configuration"
config DRM_AMD_DC
bool "AMD DC - Enable new display engine"
default y
+ depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
select SND_HDA_COMPONENT if SND_HDA_CORE
select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128)
help
@@ -12,6 +13,12 @@ config DRM_AMD_DC
support for AMDGPU. This adds required support for Vega and
Raven ASICs.
+ calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 || ARM64)
+ architectures built with Clang (all released versions), whereby the stack
+ frame gets blown up to well over 5k. This would cause an immediate kernel
+ panic on most architectures. We'll revert this when the following bug report
+ has been resolved: https://github.com/llvm/llvm-project/issues/41896.
+
config DRM_AMD_DC_DCN
def_bool n
help
_
Patches currently in -mm which might be from lee(a)kernel.org are