On 04/11/25 4:47 pm, Samir M wrote:
> Hello,
>
>
> I am observing below error while running the make modules_install
> command on latest mainline kernel on IBM Power11 server.
>
>
> Error:
> DEPMOD /lib/modules/6.18.0-rc4 depmod: ERROR: kmod_builtin_iter_next:
> unexpected string without modname prefix
>
IBM CI has also reported this error.
Error:
depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname
prefix
INSTALL /boot
depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname
prefix
depmod: ERROR: kmod_builtin_iter_next: unexpected string without modname
prefix
Git bisect is pointing to below commit as first bad commit.
d50f21091358b2b29dc06c2061106cdb0f030d03 is the first bad commit
commit d50f21091358b2b29dc06c2061106cdb0f030d03
Author: Dimitri John Ledkov <dimitri.ledkov(a)surgut.co.uk>
Date: Sun Oct 26 20:21:00 2025 +0000
kbuild: align modinfo section for Secureboot Authenticode EDK2 compat
Previously linker scripts would always generate vmlinuz that has
sections
aligned. And thus padded (correct Authenticode calculation) and
unpadded
calculation would be same. As in https://github.com/rhboot/pesign
userspace
tool would produce the same authenticode digest for both of the
following
commands:
pesign --padding --hash --in ./arch/x86_64/boot/bzImage
pesign --nopadding --hash --in ./arch/x86_64/boot/bzImage
The commit 3e86e4d74c04 ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo section of variable length.
Depending
on kernel configuration it may or may not be aligned.
All userspace signing tooling correctly pads such section to
calculation
spec compliant authenticode digest.
However, if bzImage is not further processed and is attempted to be
loaded
directly by EDK2 firmware, it calculates unpadded Authenticode
digest and
fails to correct accept/reject such kernel builds even when propoer
Authenticode values are enrolled in db/dbx. One can say EDK2 requires
aligned/padded kernels in Secureboot.
Thus add ALIGN(8) to the .modinfo section, to esure kernels
irrespective of
modinfo contents can be loaded by all existing EDK2 firmware builds.
Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in
vmlinux.unstripped")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov(a)surgut.co.uk>
Link:
https://patch.msgid.link/20251026202100.679989-1-dimitri.ledkov@surgut.co.uk
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
include/asm-generic/vmlinux.lds.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Git Bisect log:
git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [c9cfc122f03711a5124b4aafab3211cf4d35a2ac] Merge tag
'for-6.18-rc4-tag' of
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect bad c9cfc122f03711a5124b4aafab3211cf4d35a2ac
# status: waiting for good commit(s), bad commit known
# good: [dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa] Linux 6.18-rc3
git bisect good dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
# good: [3ad81aa52085a7e67edfa4bc8f518e5962196bb3] Merge tag 'v6.18-p4'
of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 3ad81aa52085a7e67edfa4bc8f518e5962196bb3
# good: [f414f9fd68797182f8de4e1cd9855b6b28abde99] Merge tag
'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
git bisect good f414f9fd68797182f8de4e1cd9855b6b28abde99
# good: [41dacb39fe79cd2fce42d31fa6658d926489a548] Merge tag
'drm-xe-fixes-2025-10-30' of
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
git bisect good 41dacb39fe79cd2fce42d31fa6658d926489a548
# bad: [f9bc8e0912b8f6b1d60608a715a1da575670e038] Merge tag
'perf-urgent-2025-11-01' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad f9bc8e0912b8f6b1d60608a715a1da575670e038
# good: [c44b4b9eeb71f5b0b617abf6fd66d1ef0aab6200] objtool: Fix
skip_alt_group() for non-alternative STAC/CLAC
git bisect good c44b4b9eeb71f5b0b617abf6fd66d1ef0aab6200
# bad: [cb7f9fc3725a11447a4af69dfe8d648e4320acdc] Merge tag
'kbuild-fixes-6.18-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
git bisect bad cb7f9fc3725a11447a4af69dfe8d648e4320acdc
# bad: [d50f21091358b2b29dc06c2061106cdb0f030d03] kbuild: align modinfo
section for Secureboot Authenticode EDK2 compat
git bisect bad d50f21091358b2b29dc06c2061106cdb0f030d03
# good: [5ff90d427ef841fa48608d0c19a81c48d6126d46] kbuild:
install-extmod-build: Fix when given dir outside the build dir
git bisect good 5ff90d427ef841fa48608d0c19a81c48d6126d46
# first bad commit: [d50f21091358b2b29dc06c2061106cdb0f030d03] kbuild:
align modinfo section for Secureboot Authenticode EDK2 compat
Please add below tag as well, if you happen to fix this.
Reported-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com>
Regards,
Venkat.
>
> If you happen to fix the above issue, then please add below tag.
> Reported-by: Samir M <samir(a)linux.ibm.com>
>
>
> Regards,
> Samir.
>
>
The device bus LAN ID was obtained from PCI_FUNC(), but when a PF
port is passthrough to a virtual machine, the function number may not
match the actual port index on the device. This could cause the driver
to perform operations such as LAN reset on the wrong port.
Fix this by reading the LAN ID from port status register.
Fixes: a34b3e6ed8fb ("net: txgbe: Store PCI info")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu(a)trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/wx_hw.c | 3 ++-
drivers/net/ethernet/wangxun/libwx/wx_type.h | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_hw.c b/drivers/net/ethernet/wangxun/libwx/wx_hw.c
index 814164459707..58b8300e3d2c 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_hw.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_hw.c
@@ -2480,7 +2480,8 @@ int wx_sw_init(struct wx *wx)
wx->oem_svid = pdev->subsystem_vendor;
wx->oem_ssid = pdev->subsystem_device;
wx->bus.device = PCI_SLOT(pdev->devfn);
- wx->bus.func = PCI_FUNC(pdev->devfn);
+ wx->bus.func = FIELD_GET(WX_CFG_PORT_ST_LANID,
+ rd32(wx, WX_CFG_PORT_ST));
if (wx->oem_svid == PCI_VENDOR_ID_WANGXUN ||
pdev->is_virtfn) {
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index d0cbcded1dd4..b1a6ef5709a9 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -102,6 +102,8 @@
#define WX_CFG_PORT_CTL_DRV_LOAD BIT(3)
#define WX_CFG_PORT_CTL_QINQ BIT(2)
#define WX_CFG_PORT_CTL_D_VLAN BIT(0) /* double vlan*/
+#define WX_CFG_PORT_ST 0x14404
+#define WX_CFG_PORT_ST_LANID GENMASK(9, 8)
#define WX_CFG_TAG_TPID(_i) (0x14430 + ((_i) * 4))
#define WX_CFG_PORT_CTL_NUM_VT_MASK GENMASK(13, 12) /* number of TVs */
@@ -564,8 +566,6 @@ enum WX_MSCA_CMD_value {
#define TXD_USE_COUNT(S) DIV_ROUND_UP((S), WX_MAX_DATA_PER_TXD)
#define DESC_NEEDED (MAX_SKB_FRAGS + 4)
-#define WX_CFG_PORT_ST 0x14404
-
/******************* Receive Descriptor bit definitions **********************/
#define WX_RXD_STAT_DD BIT(0) /* Done */
#define WX_RXD_STAT_EOP BIT(1) /* End of Packet */
--
2.48.1
The ucsi_psy_get_current_max function defaults to 0.1A when it is not
clear how much current the partner device can support. But this does
not check the port is connected, and will report 0.1A max current when
nothing is connected. Update ucsi_psy_get_current_max to report 0A when
there is no connection.
Fixes: af833e7f7db3 ("usb: typec: ucsi: psy: Set current max to 100mA for BC 1.2 and Default")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jameson Thies <jthies(a)google.com>
Reviewed-by: Benson Leung <bleung(a)chromium.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
Tested-by: Kenneth R. Crudup <kenny(a)panix.com>
---
v3 changes:
- change log moved under "--"
v2 changes:
- added cc stable tag to commit message
drivers/usb/typec/ucsi/psy.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
index 2b0225821502..3abe9370ffaa 100644
--- a/drivers/usb/typec/ucsi/psy.c
+++ b/drivers/usb/typec/ucsi/psy.c
@@ -169,6 +169,11 @@ static int ucsi_psy_get_current_max(struct ucsi_connector *con,
{
u32 pdo;
+ if (!UCSI_CONSTAT(con, CONNECTED)) {
+ val->intval = 0;
+ return 0;
+ }
+
switch (UCSI_CONSTAT(con, PWR_OPMODE)) {
case UCSI_CONSTAT_PWR_OPMODE_PD:
if (con->num_pdos > 0) {
base-commit: 18514fd70ea4ca9de137bb3bceeac1bac4bcad75
--
2.51.2.1041.gc1ab5b90ca-goog
The patch titled
Subject: kernel/kexec: fix IMA when allocation happens in CMA area
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kernel-kexec-fix-ima-when-allocation-happens-in-cma-area.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pingfan Liu <piliu(a)redhat.com>
Subject: kernel/kexec: fix IMA when allocation happens in CMA area
Date: Wed, 5 Nov 2025 21:09:22 +0800
When I tested kexec with the latest kernel, I ran into the following
warning:
[ 40.712410] ------------[ cut here ]------------
[ 40.712576] WARNING: CPU: 2 PID: 1562 at kernel/kexec_core.c:1001 kimage_map_segment+0x144/0x198
[...]
[ 40.816047] Call trace:
[ 40.818498] kimage_map_segment+0x144/0x198 (P)
[ 40.823221] ima_kexec_post_load+0x58/0xc0
[ 40.827246] __do_sys_kexec_file_load+0x29c/0x368
[...]
[ 40.855423] ---[ end trace 0000000000000000 ]---
This is caused by the fact that kexec allocates the destination directly
in the CMA area. In that case, the CMA kernel address should be exported
directly to the IMA component, instead of using the vmalloc'd address.
Link: https://lkml.kernel.org/r/20251105130922.13321-2-piliu@redhat.com
Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments")
Signed-off-by: Pingfan Liu <piliu(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Steven Chen <chenste(a)linux.microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/kexec_core.c~kernel-kexec-fix-ima-when-allocation-happens-in-cma-area
+++ a/kernel/kexec_core.c
@@ -967,6 +967,7 @@ void *kimage_map_segment(struct kimage *
kimage_entry_t *ptr, entry;
struct page **src_pages;
unsigned int npages;
+ struct page *cma;
void *vaddr = NULL;
int i;
@@ -974,6 +975,9 @@ void *kimage_map_segment(struct kimage *
size = image->segment[idx].memsz;
eaddr = addr + size;
+ cma = image->segment_cma[idx];
+ if (cma)
+ return cma;
/*
* Collect the source pages and map them in a contiguous VA range.
*/
@@ -1014,7 +1018,8 @@ void *kimage_map_segment(struct kimage *
void kimage_unmap_segment(void *segment_buffer)
{
- vunmap(segment_buffer);
+ if (is_vmalloc_addr(segment_buffer))
+ vunmap(segment_buffer);
}
struct kexec_load_limit {
_
Patches currently in -mm which might be from piliu(a)redhat.com are
kernel-kexec-change-the-prototype-of-kimage_map_segment.patch
kernel-kexec-fix-ima-when-allocation-happens-in-cma-area.patch
The patch titled
Subject: kernel/kexec: change the prototype of kimage_map_segment()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kernel-kexec-change-the-prototype-of-kimage_map_segment.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pingfan Liu <piliu(a)redhat.com>
Subject: kernel/kexec: change the prototype of kimage_map_segment()
Date: Wed, 5 Nov 2025 21:09:21 +0800
The kexec segment index will be required to extract the corresponding
information for that segment in kimage_map_segment(). Additionally,
kexec_segment already holds the kexec relocation destination address and
size. Therefore, the prototype of kimage_map_segment() can be changed.
Link: https://lkml.kernel.org/r/20251105130922.13321-1-piliu@redhat.com
Fixes: 0091d9241ea2 ("kexec: define functions to map and unmap segments")
Signed-off-by: Pingfan Liu <piliu(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Steven Chen <chenste(a)linux.microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/kexec.h | 4 ++--
kernel/kexec_core.c | 9 ++++++---
security/integrity/ima/ima_kexec.c | 4 +---
3 files changed, 9 insertions(+), 8 deletions(-)
--- a/include/linux/kexec.h~kernel-kexec-change-the-prototype-of-kimage_map_segment
+++ a/include/linux/kexec.h
@@ -530,7 +530,7 @@ extern bool kexec_file_dbg_print;
#define kexec_dprintk(fmt, arg...) \
do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0)
-extern void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size);
+extern void *kimage_map_segment(struct kimage *image, int idx);
extern void kimage_unmap_segment(void *buffer);
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
@@ -540,7 +540,7 @@ static inline void __crash_kexec(struct
static inline void crash_kexec(struct pt_regs *regs) { }
static inline int kexec_should_crash(struct task_struct *p) { return 0; }
static inline int kexec_crash_loaded(void) { return 0; }
-static inline void *kimage_map_segment(struct kimage *image, unsigned long addr, unsigned long size)
+static inline void *kimage_map_segment(struct kimage *image, int idx)
{ return NULL; }
static inline void kimage_unmap_segment(void *buffer) { }
#define kexec_in_progress false
--- a/kernel/kexec_core.c~kernel-kexec-change-the-prototype-of-kimage_map_segment
+++ a/kernel/kexec_core.c
@@ -960,17 +960,20 @@ int kimage_load_segment(struct kimage *i
return result;
}
-void *kimage_map_segment(struct kimage *image,
- unsigned long addr, unsigned long size)
+void *kimage_map_segment(struct kimage *image, int idx)
{
+ unsigned long addr, size, eaddr;
unsigned long src_page_addr, dest_page_addr = 0;
- unsigned long eaddr = addr + size;
kimage_entry_t *ptr, entry;
struct page **src_pages;
unsigned int npages;
void *vaddr = NULL;
int i;
+ addr = image->segment[idx].mem;
+ size = image->segment[idx].memsz;
+ eaddr = addr + size;
+
/*
* Collect the source pages and map them in a contiguous VA range.
*/
--- a/security/integrity/ima/ima_kexec.c~kernel-kexec-change-the-prototype-of-kimage_map_segment
+++ a/security/integrity/ima/ima_kexec.c
@@ -250,9 +250,7 @@ void ima_kexec_post_load(struct kimage *
if (!image->ima_buffer_addr)
return;
- ima_kexec_buffer = kimage_map_segment(image,
- image->ima_buffer_addr,
- image->ima_buffer_size);
+ ima_kexec_buffer = kimage_map_segment(image, image->ima_segment_index);
if (!ima_kexec_buffer) {
pr_err("Could not map measurements buffer.\n");
return;
_
Patches currently in -mm which might be from piliu(a)redhat.com are
kernel-kexec-change-the-prototype-of-kimage_map_segment.patch
kernel-kexec-fix-ima-when-allocation-happens-in-cma-area.patch
The patch titled
Subject: mm/huge_memory: fix folio split check for anon folios in swapcache.
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-fix-folio-split-check-for-anon-folios-in-swapcache.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/huge_memory: fix folio split check for anon folios in swapcache.
Date: Wed, 5 Nov 2025 11:29:10 -0500
Both uniform and non uniform split check missed the check to prevent
splitting anon folios in swapcache to non-zero order. Fix the check.
Link: https://lkml.kernel.org/r/20251105162910.752266-1-ziy@nvidia.com
Fixes: 58729c04cf10 ("mm/huge_memory: add buddy allocator like (non-uniform) folio_split()")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "David Hildenbrand (Red Hat)" <david(a)kernel.org>
Closes: https://lore.kernel.org/all/dc0ecc2c-4089-484f-917f-920fdca4c898@kernel.org/
Acked-by: David Hildenbrand (Red Hat) <david(a)kernel.org>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Lance Yang <lance.yang(a)linux.dev>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Nico Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-fix-folio-split-check-for-anon-folios-in-swapcache
+++ a/mm/huge_memory.c
@@ -3522,7 +3522,8 @@ bool non_uniform_split_supported(struct
/* order-1 is not supported for anonymous THP. */
VM_WARN_ONCE(warns && new_order == 1,
"Cannot split to order-1 folio");
- return new_order != 1;
+ if (new_order == 1)
+ return false;
} else if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
!mapping_large_folio_support(folio->mapping)) {
/*
@@ -3553,7 +3554,8 @@ bool uniform_split_supported(struct foli
if (folio_test_anon(folio)) {
VM_WARN_ONCE(warns && new_order == 1,
"Cannot split to order-1 folio");
- return new_order != 1;
+ if (new_order == 1)
+ return false;
} else if (new_order) {
if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
!mapping_large_folio_support(folio->mapping)) {
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
mm-huge_memory-do-not-change-split_huge_page-target-order-silently.patch
mm-huge_memory-preserve-pg_has_hwpoisoned-if-a-folio-is-split-to-0-order.patch
mm-huge_memory-fix-folio-split-check-for-anon-folios-in-swapcache.patch
mm-huge_memory-add-split_huge_page_to_order.patch
mm-memory-failure-improve-large-block-size-folio-handling.patch
mm-huge_memory-fix-kernel-doc-comments-for-folio_split-and-related.patch
mm-huge_memory-fix-kernel-doc-comments-for-folio_split-and-related-fix.patch
KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
On first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.
However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1. Apart from the consistency
check, this is currently the case because NP_ENABLE is actually copied
from VMCB01 to VMCB02, not from VMCB12.
On the other hand, the APM does mention two other consistency checks for
NP_ENABLE, both of which are missing (paraphrased):
In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
following conditions are considered illegal state combinations, in
addition to those mentioned in “Canonicalization and Consistency
Checks”:
• Any MBZ bit of nCR3 is set.
• Any G_PAT.PA field has an unsupported type encoding or any
reserved field in G_PAT has a nonzero value.
Replace the existing consistency check with consistency checks on
hCR0.PG and nCR3. The G_PAT consistency check will be addressed
separately.
Pass L1's CR0 to __nested_vmcb_check_controls(). In
nested_vmcb_check_controls(), L1's CR0 is available through
kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
through nested_vmcb02_prepare_save() -> svm_set_cr0().
In svm_set_nested_state(), L1's CR0 is available in the captured save
area, as svm_get_nested_state() captures L1's save area when running L2,
and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
nested_svm_vmrun()).
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 3 ++-
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 83de3456df708..9a534f04bdc83 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
}
static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control)
+ struct vmcb_ctrl_area_cached *control,
+ unsigned long l1_cr0)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
- return false;
+ if (control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
+ if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+ return false;
+ if (CC(!(l1_cr0 & X86_CR0_PG)))
+ return false;
+ }
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
@@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
- return __nested_vmcb_check_controls(vcpu, ctl);
+ /*
+ * Make sure we did not enter guest mode yet, in which case
+ * kvm_read_cr0() could return L2's CR0.
+ */
+ WARN_ON_ONCE(is_guest_mode(vcpu));
+ return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
}
static
@@ -1832,7 +1842,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
- if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+ /* 'save' contains L1 state saved from before VMRUN */
+ if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
goto out_free;
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6765a5e433cea..0a2908e22d746 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
{
- return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+ return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
+ svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
}
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
--
2.51.2.1026.g39e6a42477-goog