From: Wanpeng Li <wanpengli(a)tencent.com>
Commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's
CPUID") avoids accessing the PV TLB shootdown host-side logic when this PV
feature is not exposed to the guest. However, kvm_steal_time.preempted is not
only leveraged by the PV TLB shootdown logic but also mitigates the lock
holder preemption issue. From the guest's point of view, the vCPU is always
preempted, since we lose the reset of kvm_steal_time.preempted before vmentry
if the PV TLB shootdown feature is not exposed. This patch fixes it by
clearing kvm_steal_time.preempted before vmentry.
Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
arch/x86/kvm/x86.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfb7c32..bed7b53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,6 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
st->preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
+ } else {
+ st->preempted = 0;
}
vcpu->arch.st.preempted = 0;
--
2.7.4
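For illustration, a minimal C model of the guest-side check that the lost
reset breaks (a self-contained sketch, not the in-tree code; the flag values
mirror the KVM steal-time ABI, the struct layout and per-CPU plumbing are
simplified):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define KVM_VCPU_PREEMPTED	(1 << 0)
#define KVM_VCPU_FLUSH_TLB	(1 << 1)

struct kvm_steal_time {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;	/* host sets KVM_VCPU_PREEMPTED on vcpu_put() */
	uint8_t  pad[3];
};

/* One shared record per vCPU: written by the host, read by the guest. */
static struct kvm_steal_time steal_time[1];

/*
 * Model of the guest's vcpu_is_preempted() check: if the host never clears
 * st->preempted before vmentry (the bug fixed above), this keeps returning
 * true and the guest's lock holder preemption heuristics misbehave.
 */
static bool vcpu_is_preempted_model(int cpu)
{
	return steal_time[cpu].preempted & KVM_VCPU_PREEMPTED;
}

int main(void)
{
	steal_time[0].preempted = KVM_VCPU_PREEMPTED;	/* host marked vCPU 0 */
	printf("vCPU 0 preempted: %d\n", vcpu_is_preempted_model(0));
	return 0;
}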
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3c0315424f5e3d2a4113c7272367bee1e8e6a174 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Thu, 4 Mar 2021 21:43:10 -0800
Subject: [PATCH] f2fs: fix error handling in f2fs_end_enable_verity()
f2fs didn't properly clean up if verity failed to be enabled on a file:
- It left verity metadata (pages past EOF) in the page cache, which
would be exposed to userspace if the file was later extended.
- It didn't truncate the verity metadata at all (either from cache or
from disk) if an error occurred while setting the verity bit.
Fix these bugs by adding a call to truncate_inode_pages() and ensuring
that we truncate the verity metadata (both from cache and from disk) in
all error paths. Also rework the code to cleanly separate the success
path from the error paths, which makes it much easier to understand.
Finally, log a message if f2fs_truncate() fails, since it might
otherwise fail silently.
Reported-by: Yunlei He <heyunlei(a)hihonor.com>
Fixes: 95ae251fe828 ("f2fs: add fs-verity support")
Cc: <stable(a)vger.kernel.org> # v5.4+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 054ec852b5ea..15ba36926fad 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -152,40 +152,73 @@ static int f2fs_end_enable_verity(struct file *filp, const void *desc,
size_t desc_size, u64 merkle_tree_size)
{
struct inode *inode = file_inode(filp);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
u64 desc_pos = f2fs_verity_metadata_pos(inode) + merkle_tree_size;
struct fsverity_descriptor_location dloc = {
.version = cpu_to_le32(F2FS_VERIFY_VER),
.size = cpu_to_le32(desc_size),
.pos = cpu_to_le64(desc_pos),
};
- int err = 0;
+ int err = 0, err2 = 0;
- if (desc != NULL) {
- /* Succeeded; write the verity descriptor. */
- err = pagecache_write(inode, desc, desc_size, desc_pos);
+ /*
+ * If an error already occurred (which fs/verity/ signals by passing
+ * desc == NULL), then only clean-up is needed.
+ */
+ if (desc == NULL)
+ goto cleanup;
- /* Write all pages before clearing FI_VERITY_IN_PROGRESS. */
- if (!err)
- err = filemap_write_and_wait(inode->i_mapping);
- }
+ /* Append the verity descriptor. */
+ err = pagecache_write(inode, desc, desc_size, desc_pos);
+ if (err)
+ goto cleanup;
+
+ /*
+ * Write all pages (both data and verity metadata). Note that this must
+ * happen before clearing FI_VERITY_IN_PROGRESS; otherwise pages beyond
+ * i_size won't be written properly. For crash consistency, this also
+ * must happen before the verity inode flag gets persisted.
+ */
+ err = filemap_write_and_wait(inode->i_mapping);
+ if (err)
+ goto cleanup;
+
+ /* Set the verity xattr. */
+ err = f2fs_setxattr(inode, F2FS_XATTR_INDEX_VERITY,
+ F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc),
+ NULL, XATTR_CREATE);
+ if (err)
+ goto cleanup;
- /* If we failed, truncate anything we wrote past i_size. */
- if (desc == NULL || err)
- f2fs_truncate(inode);
+ /* Finally, set the verity inode flag. */
+ file_set_verity(inode);
+ f2fs_set_inode_flags(inode);
+ f2fs_mark_inode_dirty_sync(inode, true);
clear_inode_flag(inode, FI_VERITY_IN_PROGRESS);
+ return 0;
- if (desc != NULL && !err) {
- err = f2fs_setxattr(inode, F2FS_XATTR_INDEX_VERITY,
- F2FS_XATTR_NAME_VERITY, &dloc, sizeof(dloc),
- NULL, XATTR_CREATE);
- if (!err) {
- file_set_verity(inode);
- f2fs_set_inode_flags(inode);
- f2fs_mark_inode_dirty_sync(inode, true);
- }
+cleanup:
+ /*
+ * Verity failed to be enabled, so clean up by truncating any verity
+ * metadata that was written beyond i_size (both from cache and from
+ * disk) and clearing FI_VERITY_IN_PROGRESS.
+ *
+ * Taking i_gc_rwsem[WRITE] is needed to stop f2fs garbage collection
+ * from re-instantiating cached pages we are truncating (since unlike
+ * normal file accesses, garbage collection isn't limited by i_size).
+ */
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ truncate_inode_pages(inode->i_mapping, inode->i_size);
+ err2 = f2fs_truncate(inode);
+ if (err2) {
+ f2fs_err(sbi, "Truncating verity metadata failed (errno=%d)",
+ err2);
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
}
- return err;
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+ clear_inode_flag(inode, FI_VERITY_IN_PROGRESS);
+ return err ?: err2;
}
static int f2fs_get_verity_descriptor(struct inode *inode, void *buf,
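The reworked flow boils down to one success path and one shared cleanup path,
with the first error taking precedence over a cleanup error. A minimal,
self-contained sketch of that shape (illustrative names, not f2fs code;
"err ?: err2" is the GNU C conditional the kernel uses):

#include <stdio.h>

static int step(const char *what, int rc)
{
	if (rc)
		fprintf(stderr, "%s failed: %d\n", what, rc);
	return rc;
}

static int enable_feature(int rc_write, int rc_sync, int rc_cleanup)
{
	int err, err2 = 0;

	err = step("write metadata", rc_write);
	if (err)
		goto cleanup;

	err = step("sync pages", rc_sync);
	if (err)
		goto cleanup;

	return 0;			/* success never runs the cleanup */

cleanup:
	err2 = step("truncate metadata", rc_cleanup);
	return err ?: err2;		/* first failure wins, else cleanup rc */
}

int main(void)
{
	printf("ok path:           %d\n", enable_feature(0, 0, 0));
	printf("sync fails:        %d\n", enable_feature(0, -5, 0));
	printf("write+cleanup fail: %d\n", enable_feature(-12, 0, -5));
	return 0;
}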
The patch below does not apply to the 5.12, 5.11, 5.10, 5.4, 4.19, 4.14,
4.9, and 4.4 stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c49f71f60754acbff37505e1d16ca796bf8a8140 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Tue, 20 Apr 2021 04:50:40 +0200
Subject: [PATCH] MIPS: Reinstate platform `__div64_32' handler
Our current MIPS platform `__div64_32' handler is inactive, because it
is incorrectly only enabled for 64-bit configurations, for which generic
`do_div' code does not call it anyway.
The handler is not suitable for being called from there though, as it
only calculates 32 bits of the quotient under the assumption that the
64-bit dividend has been suitably reduced. Code for such reduction used
to be there; however, it was incorrectly removed with commit c21004cd5b4c
("MIPS: Rewrite <asm/div64.h> to work with gcc 4.4.0."), which should
have only updated an obsoleted constraint for an inline asm involving
$hi and $lo register outputs, while possibly wiring the original MIPS
variant of the `do_div' macro as the `__div64_32' handler for the
generic `do_div' implementation.
Correct the handler as follows:
- Revert most of the commit referred, however retaining the current
formatting, except for the final two instructions of the inline asm
sequence, which the original commit missed. Omit the original 64-bit
parts though.
- Rename the original `do_div' macro to `__div64_32'. Use the combined
`x' constraint referring to the MD accumulator as a whole, replacing
the original individual `h' and `l' constraints used for $hi and $lo
registers respectively, of which `h' has been obsoleted with GCC 4.4.
Update surrounding code accordingly.
We have since removed support for GCC versions before 4.9, so no need
for a special arrangement here; GCC has supported the `x' constraint
since forever anyway, or at least going back to 1991.
- Rename the `__base' local variable in `__div64_32' to `__radix' to
avoid a conflict with a local variable in `do_div'.
- Actually enable this code for 32-bit rather than 64-bit configurations
by qualifying it with BITS_PER_LONG being 32 instead of 64. Include
<asm/bitsperlong.h> for this macro rather than <linux/types.h> as we
don't need anything else.
- Finally include <asm-generic/div64.h> last rather than first.
This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0668s and 0.2629s from 2.1529s
and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.
For reference, 64-bit `do_div' code, where the DDIVU instruction is
available to do the whole calculation right away, averages 0.0660s for
the latter CPU.
Fixes: c21004cd5b4c ("MIPS: Rewrite <asm/div64.h> to work with gcc 4.4.0.")
Reported-by: Huacai Chen <chenhuacai(a)kernel.org>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Cc: stable(a)vger.kernel.org # v2.6.30+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/div64.h b/arch/mips/include/asm/div64.h
index dc5ea5736440..b252300e299d 100644
--- a/arch/mips/include/asm/div64.h
+++ b/arch/mips/include/asm/div64.h
@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2000, 2004 Maciej W. Rozycki
+ * Copyright (C) 2000, 2004, 2021 Maciej W. Rozycki
* Copyright (C) 2003, 07 Ralf Baechle (ralf(a)linux-mips.org)
*
* This file is subject to the terms and conditions of the GNU General Public
@@ -9,25 +9,18 @@
#ifndef __ASM_DIV64_H
#define __ASM_DIV64_H
-#include <asm-generic/div64.h>
-
-#if BITS_PER_LONG == 64
+#include <asm/bitsperlong.h>
-#include <linux/types.h>
+#if BITS_PER_LONG == 32
/*
* No traps on overflows for any of these...
*/
-#define __div64_32(n, base) \
-({ \
+#define do_div64_32(res, high, low, base) ({ \
unsigned long __cf, __tmp, __tmp2, __i; \
unsigned long __quot32, __mod32; \
- unsigned long __high, __low; \
- unsigned long long __n; \
\
- __high = *__n >> 32; \
- __low = __n; \
__asm__( \
" .set push \n" \
" .set noat \n" \
@@ -51,18 +44,50 @@
" subu %0, %0, %z6 \n" \
" addiu %2, %2, 1 \n" \
"3: \n" \
- " bnez %4, 0b\n\t" \
- " srl %5, %1, 0x1f\n\t" \
+ " bnez %4, 0b \n" \
+ " srl %5, %1, 0x1f \n" \
" .set pop" \
: "=&r" (__mod32), "=&r" (__tmp), \
"=&r" (__quot32), "=&r" (__cf), \
"=&r" (__i), "=&r" (__tmp2) \
- : "Jr" (base), "0" (__high), "1" (__low)); \
+ : "Jr" (base), "0" (high), "1" (low)); \
\
- (__n) = __quot32; \
+ (res) = __quot32; \
__mod32; \
})
-#endif /* BITS_PER_LONG == 64 */
+#define __div64_32(n, base) ({ \
+ unsigned long __upper, __low, __high, __radix; \
+ unsigned long long __modquot; \
+ unsigned long long __quot; \
+ unsigned long long __div; \
+ unsigned long __mod; \
+ \
+ __div = (*n); \
+ __radix = (base); \
+ \
+ __high = __div >> 32; \
+ __low = __div; \
+ __upper = __high; \
+ \
+ if (__high) { \
+ __asm__("divu $0, %z1, %z2" \
+ : "=x" (__modquot) \
+ : "Jr" (__high), "Jr" (__radix)); \
+ __upper = __modquot >> 32; \
+ __high = __modquot; \
+ } \
+ \
+ __mod = do_div64_32(__low, __upper, __low, __radix); \
+ \
+ __quot = __high; \
+ __quot = __quot << 32 | __low; \
+ (*n) = __quot; \
+ __mod; \
+})
+
+#endif /* BITS_PER_LONG == 32 */
+
+#include <asm-generic/div64.h>
#endif /* __ASM_DIV64_H */
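The arithmetic behind the restored handler is easier to see in portable C.
A hedged sketch (illustrative only; the in-tree version performs the second
division with the bit-serial inline asm loop rather than a 64-bit C
division): to divide n = high * 2^32 + low by a 32-bit base, first divide
the high word; the remainder is then small enough that the remaining 64/32
division has a quotient that fits in 32 bits.

#include <stdint.h>
#include <stdio.h>

/* Model of __div64_32: divide *n in place by base, return the remainder. */
static uint32_t div64_32_model(uint64_t *n, uint32_t base)
{
	uint32_t high = *n >> 32;
	uint32_t low  = *n;
	uint32_t upper = high, qhigh = 0;

	if (high) {
		qhigh = high / base;	/* high 32 bits of the quotient */
		upper = high % base;	/* reduction step: upper < base */
	}

	/*
	 * (upper:low) < base * 2^32, so this quotient fits in 32 bits;
	 * the asm computes it with a shift-and-subtract loop.
	 */
	uint64_t rest = ((uint64_t)upper << 32) | low;
	uint32_t qlow = (uint32_t)(rest / base);
	uint32_t mod  = (uint32_t)(rest % base);

	*n = ((uint64_t)qhigh << 32) | qlow;
	return mod;
}

int main(void)
{
	uint64_t n = 0x123456789abcdef0ULL;
	uint32_t mod = div64_32_model(&n, 1000000007u);
	printf("quot=%#llx mod=%u\n", (unsigned long long)n, mod);
	return 0;
}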
From: Ard Biesheuvel <ardb(a)kernel.org>
[ Upstream commit f9e7a99fb6b86aa6a00e53b34ee6973840e005aa ]
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and to keep the way identifier
component in a single register that is assigned in the outer loop. This
way, we need two fewer registers.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the smaller
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
Acked-by: Nicolas Pitre <nico(a)fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/cache-v7.S | 51 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 11d699af30ed..db568be45946 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -27,41 +27,40 @@
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
mcr p15, 2, r0, c0, c0, 0
mrc p15, 1, r0, c0, c0, 0
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.30.2
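For reference, the set/way walk the rearranged routine performs, modelled in
plain C (a sketch under stated assumptions: read_ccsidr() and dcisw() are
stand-ins for the mrc/mcr CP15 accesses, faked here so the sketch is
self-contained, and the CCSIDR field positions are as the comments in the
asm describe):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for mrc p15, 1, <r>, c0, c0, 0 (CCSIDR read). */
static uint32_t read_ccsidr(void)
{
	return (255u << 13) | (3u << 3) | 2u;	/* 256 sets, 4 ways, 64B lines */
}

/* Stand-in for mcr p15, 0, <r>, c7, c6, 2 (DCISW: invalidate by set/way). */
static void dcisw(uint32_t setway)
{
	(void)setway;
}

static void invalidate_l1_model(void)
{
	uint32_t ccsidr = read_ccsidr();
	uint32_t ways = (ccsidr >> 3) & 0x3ff;	/* 'Associativity' = NumWays - 1 */
	uint32_t sets = (ccsidr >> 13) & 0x7fff;	/* 'NumSets' - 1 */
	int way_shift = ways ? __builtin_clz(ways) : 0;	/* way field in the MSBs */
	int set_shift = (ccsidr & 0x7) + 4;	/* log2(line size in bytes) */

	for (uint32_t way = 0; way <= ways; way++)	/* outer: few iterations */
		for (uint32_t set = 0; set <= sets; set++)	/* inner: many */
			dcisw((way << way_shift) | (set << set_shift));
}

int main(void)
{
	invalidate_l1_model();
	puts("walked all set/way combinations");
	return 0;
}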
From: Ard Biesheuvel <ardb(a)kernel.org>
[ Upstream commit f9e7a99fb6b86aa6a00e53b34ee6973840e005aa ]
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and keep the way identifier
component in a single register that is assigned in the outer loop. This
way, we need 2 registers less.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the fewer
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
Acked-by: Nicolas Pitre <nico(a)fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/cache-v7.S | 51 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 50a70edbc863..08986397e5c7 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -27,41 +27,40 @@
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
mcr p15, 2, r0, c0, c0, 0
mrc p15, 1, r0, c0, c0, 0
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.30.2
From: Ard Biesheuvel <ardb(a)kernel.org>
[ Upstream commit f9e7a99fb6b86aa6a00e53b34ee6973840e005aa ]
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and keep the way identifier
component in a single register that is assigned in the outer loop. This
way, we need two fewer registers.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the smaller
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
Acked-by: Nicolas Pitre <nico(a)fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/cache-v7.S | 51 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 2149b47a0c5a..463965dc7922 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -28,41 +28,40 @@
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
mcr p15, 2, r0, c0, c0, 0
mrc p15, 1, r0, c0, c0, 0
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.30.2
From: Ard Biesheuvel <ardb(a)kernel.org>
[ Upstream commit f9e7a99fb6b86aa6a00e53b34ee6973840e005aa ]
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and keep the way identifier
component in a single register that is assigned in the outer loop. This
way, we need two fewer registers.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the smaller
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
Acked-by: Nicolas Pitre <nico(a)fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/cache-v7.S | 51 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index 0ee8fc4b4672..8e69bf36a3ec 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,41 +33,40 @@ icache_size:
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
mcr p15, 2, r0, c0, c0, 0
mrc p15, 1, r0, c0, c0, 0
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.30.2
From: Ard Biesheuvel <ardb(a)kernel.org>
[ Upstream commit f9e7a99fb6b86aa6a00e53b34ee6973840e005aa ]
The cache invalidation code in v7_invalidate_l1 can be tweaked to
re-read the associativity from CCSIDR, and keep the way identifier
component in a single register that is assigned in the outer loop. This
way, we need two fewer registers.
Given that the number of sets is typically much larger than the
associativity, rearrange the code so that the outer loop has the smaller
number of iterations, ensuring that the re-read of CCSIDR only occurs a
handful of times in practice.
Fix the whitespace while at it, and update the comment to indicate that
this code is no longer a clone of anything else.
Acked-by: Nicolas Pitre <nico(a)fluxnic.net>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/cache-v7.S | 51 +++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 26 deletions(-)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index dc8f152f3556..e3bc1d6e13d0 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -33,41 +33,40 @@ icache_size:
* processor. We fix this by performing an invalidate, rather than a
* clean + invalidate, before jumping into the kernel.
*
- * This function is cloned from arch/arm/mach-tegra/headsmp.S, and needs
- * to be called for both secondary cores startup and primary core resume
- * procedures.
+ * This function needs to be called for both secondary cores startup and
+ * primary core resume procedures.
*/
ENTRY(v7_invalidate_l1)
mov r0, #0
mcr p15, 2, r0, c0, c0, 0
mrc p15, 1, r0, c0, c0, 0
- movw r1, #0x7fff
- and r2, r1, r0, lsr #13
+ movw r3, #0x3ff
+ and r3, r3, r0, lsr #3 @ 'Associativity' in CCSIDR[12:3]
+ clz r1, r3 @ WayShift
+ mov r2, #1
+ mov r3, r3, lsl r1 @ NumWays-1 shifted into bits [31:...]
+ movs r1, r2, lsl r1 @ #1 shifted left by same amount
+ moveq r1, #1 @ r1 needs value > 0 even if only 1 way
- movw r1, #0x3ff
+ and r2, r0, #0x7
+ add r2, r2, #4 @ SetShift
- and r3, r1, r0, lsr #3 @ NumWays - 1
- add r2, r2, #1 @ NumSets
+1: movw r4, #0x7fff
+ and r0, r4, r0, lsr #13 @ 'NumSets' in CCSIDR[27:13]
- and r0, r0, #0x7
- add r0, r0, #4 @ SetShift
-
- clz r1, r3 @ WayShift
- add r4, r3, #1 @ NumWays
-1: sub r2, r2, #1 @ NumSets--
- mov r3, r4 @ Temp = NumWays
-2: subs r3, r3, #1 @ Temp--
- mov r5, r3, lsl r1
- mov r6, r2, lsl r0
- orr r5, r5, r6 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
- mcr p15, 0, r5, c7, c6, 2
- bgt 2b
- cmp r2, #0
- bgt 1b
- dsb st
- isb
- ret lr
+2: mov r4, r0, lsl r2 @ NumSet << SetShift
+ orr r4, r4, r3 @ Reg = (Temp<<WayShift)|(NumSets<<SetShift)
+ mcr p15, 0, r4, c7, c6, 2
+ subs r0, r0, #1 @ Set--
+ bpl 2b
+ subs r3, r3, r1 @ Way--
+ bcc 3f
+ mrc p15, 1, r0, c0, c0, 0 @ re-read cache geometry from CCSIDR
+ b 1b
+3: dsb st
+ isb
+ ret lr
ENDPROC(v7_invalidate_l1)
/*
--
2.30.2
After commit 81ad4276b505 ("Thermal: Ignore invalid trip points") all
user_space governor notifications via RW trip points are broken in the
Intel thermal drivers. That commit marks trip points whose value is 0 at
thermal_zone_device_register() time as invalid, but RW trip points can
legitimately be 0, as user space will set the correct trip temperature
later.
During driver init, x86_package_temp and all the int340x drivers set the
RW trip temperature to 0. This results in all of these trips being
marked invalid by the thermal core.
To fix this, initialize RW trips to THERMAL_TEMP_INVALID instead of 0.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
---
drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c | 4 ++++
drivers/thermal/intel/x86_pkg_temp_thermal.c | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
index d1248ba943a4..62c0aa5d0783 100644
--- a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
+++ b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c
@@ -237,6 +237,8 @@ struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *adev,
if (ACPI_FAILURE(status))
trip_cnt = 0;
else {
+ int i;
+
int34x_thermal_zone->aux_trips =
kcalloc(trip_cnt,
sizeof(*int34x_thermal_zone->aux_trips),
@@ -247,6 +249,8 @@ struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *adev,
}
trip_mask = BIT(trip_cnt) - 1;
int34x_thermal_zone->aux_trip_nr = trip_cnt;
+ for (i = 0; i < trip_cnt; ++i)
+ int34x_thermal_zone->aux_trips[i] = THERMAL_TEMP_INVALID;
}
trip_cnt = int340x_thermal_read_trips(int34x_thermal_zone);
diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
index 295742e83960..4d8edc61a78b 100644
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -166,7 +166,7 @@ static int sys_get_trip_temp(struct thermal_zone_device *tzd,
if (thres_reg_value)
*temp = zonedev->tj_max - thres_reg_value * 1000;
else
- *temp = 0;
+ *temp = THERMAL_TEMP_INVALID;
pr_debug("sys_get_trip_temp %d\n", *temp);
return 0;
--
2.25.1
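In miniature, the interaction being fixed looks like this (a sketch;
for_each_trip_sketch(), trip_writable(), trip_temp() and mark_invalid() are
hypothetical stand-ins for what the thermal core does after commit
81ad4276b505, not real kernel APIs):

/* thermal core, at thermal_zone_device_register() time (sketched) */
for_each_trip_sketch(tz, trip) {
	if (trip_writable(trip) && trip_temp(trip) == 0)
		mark_invalid(trip);  /* RW trips left at 0 were caught here */
}

/* driver-side fix, as in the x86_pkg_temp_thermal hunk above: report
 * "no threshold programmed yet" as THERMAL_TEMP_INVALID, not 0 */
if (thres_reg_value)
	*temp = zonedev->tj_max - thres_reg_value * 1000;
else
	*temp = THERMAL_TEMP_INVALID;

Reporting THERMAL_TEMP_INVALID sidesteps the ==0 check, while user space
can still program a real trip temperature later.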
commit a3ba26ecfb569f4aa3f867e80c02aa65f20aadad upstream.
Access to the GHCB happens mainly in the VMGEXIT path, where it is known
that the GHCB will be mapped. But there are two paths where the GHCB
might not be mapped.
The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.
The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard in this
path to be certain a NULL pointer dereference is not encountered.
Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/sev.c | 3 +++
arch/x86/kvm/svm/svm.c | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 874ea309279f..d6cf74d3b3b2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2104,5 +2104,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
* the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
* non-zero value.
*/
+ if (!svm->ghcb)
+ return;
+
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb139f8..2b6e7874c491 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2809,7 +2809,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (!sev_es_guest(svm->vcpu.kvm) || !err)
+ if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
return kvm_complete_insn_gp(&svm->vcpu, err);
ghcb_set_sw_exit_info_1(svm->ghcb, 1);
--
2.31.0
Hi all,
A breakage in 5.4.102 has been reported [1] due to the backport of the
two following upstream commits:
8a5a75e5e9e5 ("of/fdt: Make sure no-map does not remove already reserved regions")
86588296acbf ("fdt: Properly handle "no-map" field in the memory region")
As Alexandre noted in the original thread, the backport missed
dependencies. But since these patches were not really fixes in the first
place, it seems preferable to simply revert them from 5.4 and earlier.
[1] https://lore.kernel.org/linux-arm-kernel/CAL_Jsq+LUPZFhXd+j-xM67rZB=pvEvZM+…
Quentin Perret (2):
Revert "of/fdt: Make sure no-map does not remove already reserved
regions"
Revert "fdt: Properly handle "no-map" field in the memory region"
drivers/of/fdt.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
--
2.31.1.607.g51e8a6a459-goog
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
The FEC does not have a PHY, so it should not have a phy-handle. It is
connected to the switch at the RGMII level, so we need a fixed-link
sub-node on both ends.
This was not a problem until the qca8k.c driver was converted to PHYLINK
by commit b3591c2a3661 ("net: dsa: qca8k: Switch to PHYLINK instead of
PHYLIB"). That commit revealed the FEC configuration was not correct.
Fixes: 87489ec3a77f ("ARM: dts: imx: Add Y Soft IOTA Draco, Hydra and Ursa boards")
Cc: stable(a)vger.kernel.org
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: Michal Vokáč <michal.vokac(a)ysoft.com>
---
arch/arm/boot/dts/imx6dl-yapp4-common.dtsi | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi
index 111d4d331f98..686dab57a1e4 100644
--- a/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi
+++ b/arch/arm/boot/dts/imx6dl-yapp4-common.dtsi
@@ -121,9 +121,13 @@
phy-reset-gpios = <&gpio1 25 GPIO_ACTIVE_LOW>;
phy-reset-duration = <20>;
phy-supply = <&sw2_reg>;
- phy-handle = <ðphy0>;
status = "okay";
+ fixed-link {
+ speed = <1000>;
+ full-duplex;
+ };
+
mdio {
#address-cells = <1>;
#size-cells = <0>;
--
2.1.4
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 94ac0835391efc1a30feda6fc908913ec012951e Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger(a)redhat.com>
Date: Mon, 12 Apr 2021 17:00:34 +0200
Subject: [PATCH] KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
When reading the base address of a REDIST region through
KVM_VGIC_V3_ADDR_TYPE_REDIST, we expect the redistributor region
list to be populated with a single element.
However, list_first_entry() expects the list to be non-empty.
Instead we should use list_first_entry_or_null(), which effectively
returns NULL if the list is empty.
Fixes: dbd9733ab674 ("KVM: arm/arm64: Replace the single rdist region by a list")
Cc: <Stable(a)vger.kernel.org> # v4.18+
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Reported-by: Gavin Shan <gshan(a)redhat.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 2f66cf247282..7740995de982 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -87,8 +87,8 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
r = vgic_v3_set_redist_base(kvm, 0, *addr, 0);
goto out;
}
- rdreg = list_first_entry(&vgic->rd_regions,
- struct vgic_redist_region, list);
+ rdreg = list_first_entry_or_null(&vgic->rd_regions,
+ struct vgic_redist_region, list);
if (!rdreg)
addr_ptr = &undef_value;
else
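The difference between the two list helpers can be demonstrated in a
self-contained way; in the sketch below the kernel macros are
re-implemented in miniature (in the tree they live in <linux/list.h>):

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

struct vgic_redist_region { unsigned long base; struct list_head list; };

int main(void)
{
	struct list_head rd_regions = { &rd_regions, &rd_regions }; /* empty */

	/* list_first_entry() on an empty list "finds" the head itself,
	 * yielding a garbage entry pointer -- the bug being fixed. */
	struct vgic_redist_region *bogus =
		list_first_entry(&rd_regions, struct vgic_redist_region, list);

	/* list_first_entry_or_null() detects the empty list instead. */
	struct vgic_redist_region *safe =
		list_first_entry_or_null(&rd_regions, struct vgic_redist_region, list);

	printf("bogus=%p safe=%p\n", (void *)bogus, (void *)safe); /* safe is NULL */
	return 0;
}

The NULL return is what the added !rdreg check in the fix relies on before
falling back to the undefined-value placeholder.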
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: beb6df0ce94f - thermal/core/fair share: Lock the thermal zone while looping over instances
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
One or more kernel tests failed:
s390x:
❌ Networking tunnel: geneve basic test
ppc64le:
❌ LTP
❌ Networking tunnel: geneve basic test
aarch64:
❌ Networking tunnel: geneve basic test
x86_64:
❌ Networking tunnel: geneve basic test
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
❌ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test
🚧 ⚡⚡⚡ stress: stress-ng
ppc64le:
Host 1:
✅ Boot test
❌ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
❌ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ selinux-policy: serge-testsuite
✅ Storage: swraid mdadm raid_module test
🚧 ✅ Storage blktests
🚧 ✅ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 2:
✅ Boot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
❌ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ trace: ftrace/tracer
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
x86_64:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ xfstests - nfsv4.2
✅ selinux-policy: serge-testsuite
✅ power-management: cpupower/sanity test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ CPU: Idle Test
🚧 ❌ xfstests - btrfs
🚧 ✅ xfstests - cifsv3.11
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
🚧 ✅ stress: stress-ng
Host 2:
✅ Boot test
✅ ACPI table test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
❌ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within a reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Hi,
Since v5.4.102 I observe a regression on the stm32mp1 platform: "no-map"
reserved-memory regions are no longer reserved and become part of the
kernel's System RAM. This causes allocation failures for devices that
try to claim a reserved-memory region.
It was introduced by the following patch:
"fdt: Properly handle "no-map" field in the memory region
[ Upstream commit 86588296acbfb1591e92ba60221e95677ecadb43 ]"
which replaces memblock_remove() with memblock_mark_nomap() in the
no-map case.
Reverting this patch, everything is fine.
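For reference, the behavioral difference the report describes is roughly
the following (a comment-form sketch; both helpers exist in mm/memblock.c
and take a base/size pair):

/*
 *   memblock_remove(base, size);
 *       pre-v5.4.102 behavior for "no-map": the region is deleted from
 *       memblock.memory outright, so it never shows up as System RAM.
 *
 *   memblock_mark_nomap(base, size);
 *       backported behavior: the region stays in memblock.memory and is
 *       only flagged MEMBLOCK_NOMAP; without the rest of the upstream
 *       series, a 5.4 kernel still accounts it as System RAM, which is
 *       the regression seen here.
 */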
I include part of my DT below (something may be wrong in it):
memory@c0000000 {
reg = <0xc0000000 0x20000000>;
};
reserved-memory {
#address-cells = <1>;
#size-cells = <1>;
ranges;
gpu_reserved: gpu@d4000000 {
reg = <0xd4000000 0x4000000>;
no-map;
};
};
Sorry if this issue has already been raised and discussed.
Thanks
alex
From: Christian Brauner <christian.brauner(a)ubuntu.com>
We currently don't have any filesystems that support idmapped mounts
which are mountable inside a user namespace. That was a deliberate
decision for now, as a userns root can just mount the filesystem
themselves. So enforce this restriction explicitly until there is a real
use case for it. This way we will notice when such a filesystem appears
and have a chance to adapt and audit our translation helpers and fstests
accordingly.
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: stable(a)vger.kernel.org
CC: linux-fsdevel(a)vger.kernel.org
Suggested-by: Seth Forshee <seth.forshee(a)canonical.com>
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
fs/namespace.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index f63337828e1c..c3f1a78ba369 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3855,8 +3855,12 @@ static int can_idmap_mount(const struct mount_kattr *kattr, struct mount *mnt)
if (!(m->mnt_sb->s_type->fs_flags & FS_ALLOW_IDMAP))
return -EINVAL;
+ /* Don't yet support filesystem mountable in user namespaces. */
+ if (m->mnt_sb->s_user_ns != &init_user_ns)
+ return -EINVAL;
+
/* We're not controlling the superblock. */
- if (!ns_capable(m->mnt_sb->s_user_ns, CAP_SYS_ADMIN))
+ if (!capable(CAP_SYS_ADMIN))
return -EPERM;
/* Mount has already been visible in the filesystem hierarchy. */
base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.27.0
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3db1d52466dc11dca4e47ef12a6e6e97f846af62 Mon Sep 17 00:00:00 2001
From: Erwan Le Ray <erwan.leray(a)foss.st.com>
Date: Thu, 4 Mar 2021 17:23:07 +0100
Subject: [PATCH] serial: stm32: fix tx_empty condition
In "tx_empty", we should poll TC bit in both DMA and PIO modes (instead of
TXE) to check transmission data register has been transmitted independently
of the FIFO mode. TC indicates that both transmit register and shift
register are empty. When shift register is empty, tx_empty should return
TIOCSER_TEMT instead of TC value.
Cleans the USART_CR_TC TCCF register define (transmission complete clear
flag) as it is duplicate of USART_ICR_TCCF.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Signed-off-by: Erwan Le Ray <erwan.leray(a)foss.st.com>
Link: https://lore.kernel.org/r/20210304162308.8984-13-erwan.leray@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index d205fce1950a..99dfa884cbef 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -515,7 +515,10 @@ static unsigned int stm32_usart_tx_empty(struct uart_port *port)
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
- return readl_relaxed(port->membase + ofs->isr) & USART_SR_TXE;
+ if (readl_relaxed(port->membase + ofs->isr) & USART_SR_TC)
+ return TIOCSER_TEMT;
+
+ return 0;
}
static void stm32_usart_set_mctrl(struct uart_port *port, unsigned int mctrl)
diff --git a/drivers/tty/serial/stm32-usart.h b/drivers/tty/serial/stm32-usart.h
index cb4f327c46db..94b568aa46bb 100644
--- a/drivers/tty/serial/stm32-usart.h
+++ b/drivers/tty/serial/stm32-usart.h
@@ -127,9 +127,6 @@ struct stm32_usart_info stm32h7_info = {
/* Dummy bits */
#define USART_SR_DUMMY_RX BIT(16)
-/* USART_ICR (F7) */
-#define USART_CR_TC BIT(6)
-
/* USART_DR */
#define USART_DR_MASK GENMASK(8, 0)
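Put differently, the two status bits the patch distinguishes have these
semantics (paraphrased from the STM32 USART documentation):

/*
 *   TXE - transmit data register empty: the data byte has been handed
 *         off to the shift register, but bits may still be going out
 *         on the wire.
 *   TC  - transmission complete: the transmit data register AND the
 *         shift register are both empty.
 *
 * Hence tx_empty() must report TIOCSER_TEMT only once TC is set.
 */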
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34e5b01186858b36c4d7c87e1a025071e8e2401f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 3 May 2021 05:11:42 +0800
Subject: [PATCH] sctp: delay auto_asconf init until binding the first addr
As Or Cohen described:
If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
held and sp->do_auto_asconf is true, then an element is removed
from the auto_asconf_splist without any proper locking.
This can happen in the following functions:
1. In sctp_accept, if sctp_sock_migrate fails.
2. In inet_create or inet6_create, if there is a bpf program
attached to BPF_CGROUP_INET_SOCK_CREATE which denies
creation of the sctp socket.
Fix this by moving the auto_asconf init out of sctp_init_sock(), so
that sctp_destroy_sock() no longer has to touch it when
inet_create()/inet6_create() call sk_common_release().
It also makes more sense to do the auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind; see
sctp_addr_wq_timeout_handler().
This addresses CVE-2021-23133.
Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen(a)paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 76a388b5021c..40f9f6c4a0a1 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -357,6 +357,18 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
return af;
}
+static void sctp_auto_asconf_init(struct sctp_sock *sp)
+{
+ struct net *net = sock_net(&sp->inet.sk);
+
+ if (net->sctp.default_auto_asconf) {
+ spin_lock(&net->sctp.addr_wq_lock);
+ list_add_tail(&sp->auto_asconf_list, &net->sctp.auto_asconf_splist);
+ spin_unlock(&net->sctp.addr_wq_lock);
+ sp->do_auto_asconf = 1;
+ }
+}
+
/* Bind a local address either to an endpoint or to an association. */
static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
{
@@ -418,8 +430,10 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
return -EADDRINUSE;
/* Refresh ephemeral port. */
- if (!bp->port)
+ if (!bp->port) {
bp->port = inet_sk(sk)->inet_num;
+ sctp_auto_asconf_init(sp);
+ }
/* Add the address to the bind address list.
* Use GFP_ATOMIC since BHs will be disabled.
@@ -4993,19 +5007,6 @@ static int sctp_init_sock(struct sock *sk)
sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
- /* Nothing can fail after this block, otherwise
- * sctp_destroy_sock() will be called without addr_wq_lock held
- */
- if (net->sctp.default_auto_asconf) {
- spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
- list_add_tail(&sp->auto_asconf_list,
- &net->sctp.auto_asconf_splist);
- sp->do_auto_asconf = 1;
- spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
- } else {
- sp->do_auto_asconf = 0;
- }
-
local_bh_enable();
return 0;
@@ -9401,6 +9402,8 @@ static int sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
return err;
}
+ sctp_auto_asconf_init(newsp);
+
/* Move any messages in the old socket's receive queue that are for the
* peeled off association to the new socket's receive queue.
*/
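To make the locking invariant behind this fix concrete, below is a minimal
userspace sketch (plain pthreads, hypothetical names, not the kernel code):
an entry joins the shared list only under the list lock and only once the
socket has actually bound, so teardown of a never-bound socket has nothing
to unlink and never touches the list without the lock.

#include <pthread.h>
#include <stdbool.h>

struct node { struct node *next, *prev; };

static struct node splist = { &splist, &splist };  /* plays auto_asconf_splist */
static pthread_mutex_t splist_lock = PTHREAD_MUTEX_INITIALIZER; /* plays addr_wq_lock */

struct sock_state {
	struct node link;
	bool do_auto_asconf;
};

/* Called at the first successful bind, never at socket creation. */
static void auto_asconf_init(struct sock_state *s)
{
	pthread_mutex_lock(&splist_lock);
	s->link.next = &splist;			/* list_add_tail() by hand */
	s->link.prev = splist.prev;
	splist.prev->next = &s->link;
	splist.prev = &s->link;
	pthread_mutex_unlock(&splist_lock);
	s->do_auto_asconf = true;
}

/* Teardown is now safe even if bind never happened. */
static void sock_destroy(struct sock_state *s)
{
	if (!s->do_auto_asconf)
		return;				/* nothing to unlink, no lock needed */
	pthread_mutex_lock(&splist_lock);
	s->link.prev->next = s->link.next;	/* list_del() by hand */
	s->link.next->prev = s->link.prev;
	pthread_mutex_unlock(&splist_lock);
	s->do_auto_asconf = false;
}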
Hi Greg and Sasha,
Please find attached backports of commit 1139aeb1c521 ("smp: Fix
smp_call_function_single_async prototype") upstream, which resolves a
serious-looking clang warning seen across several different builds. It
has been build tested with gcc and clang for x86_64 defconfig and
allmodconfig.
Please let me know if there are any problems.
Cheers,
Nathan
Hello,
Please apply upstream git commit 8d432592f30f ("net: Only allow init
netns to set default tcp cong to a restricted algo") to the stable
trees.
Thanks,
Jonathon Reinhart
When hotplugging CPUs on Tegra186 and Tegra194 errors such as the
following are seen ...
IRQ63: set affinity failed(-22).
IRQ65: set affinity failed(-22).
IRQ66: set affinity failed(-22).
IRQ67: set affinity failed(-22).
Looking at /proc/interrupts, the above are all interrupts associated
with GPIOs. These error messages occur because there is no
'parent_data' associated with any of the GPIO interrupts, so
tegra186_irq_set_affinity() simply returns -EINVAL.
To understand why there is no 'parent_data' it is first necessary to
understand that in addition to the GPIO interrupts being routed to the
interrupt controller (GIC), the interrupts for some GPIOs are also
routed to the Tegra Power Management Controller (PMC) to wake up the
system from low power states. In order to configure GPIO events as
wake events in the PMC, the PMC is configured as IRQ parent domain
for the GPIO IRQ domain. Originally the GIC was the IRQ parent domain
of the PMC and although this was working, this started causing issues
once commit 64a267e9a41c ("irqchip/gic: Configure SGIs as standard
interrupts") was added, because technically, the GIC is not a parent
of the PMC. Commit c351ab7bf2a5 ("soc/tegra: pmc: Don't create fake
interrupt hierarchy levels") fixed this by severing the IRQ domain
hierarchy for the Tegra GPIOs and hence, there may be no IRQ parent
domain for the GPIOs.
The GPIO controllers on Tegra186 and Tegra194 have either one or six
interrupt lines to the interrupt controller. For GPIO controllers with
six interrupts, the mapping of the GPIO interrupt to the controller
interrupt is configurable within the GPIO controller. Currently a
default mapping is used; however, the set-affinity callback for the
Tegra186 GPIO driver could eventually be used to do something a bit
more interesting. Because the interrupts for all GPIOs currently have
the same mapping, any attempt to configure the affinity for a given
GPIO can conflict with another GPIO that shares the same IRQ, so for
now it is simpler to just remove set-affinity support, which also
avoids the above warnings.
Cc: <stable(a)vger.kernel.org>
Fixes: c4e1f7d92cd6 ("gpio: tegra186: Set affinity callback to parent")
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
---
drivers/gpio/gpio-tegra186.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/drivers/gpio/gpio-tegra186.c b/drivers/gpio/gpio-tegra186.c
index 1bd9e44df718..05974b760796 100644
--- a/drivers/gpio/gpio-tegra186.c
+++ b/drivers/gpio/gpio-tegra186.c
@@ -444,16 +444,6 @@ static int tegra186_irq_set_wake(struct irq_data *data, unsigned int on)
return 0;
}
-static int tegra186_irq_set_affinity(struct irq_data *data,
- const struct cpumask *dest,
- bool force)
-{
- if (data->parent_data)
- return irq_chip_set_affinity_parent(data, dest, force);
-
- return -EINVAL;
-}
-
static void tegra186_gpio_irq(struct irq_desc *desc)
{
struct tegra_gpio *gpio = irq_desc_get_handler_data(desc);
@@ -700,7 +690,6 @@ static int tegra186_gpio_probe(struct platform_device *pdev)
gpio->intc.irq_unmask = tegra186_irq_unmask;
gpio->intc.irq_set_type = tegra186_irq_set_type;
gpio->intc.irq_set_wake = tegra186_irq_set_wake;
- gpio->intc.irq_set_affinity = tegra186_irq_set_affinity;
irq = &gpio->gpio.irq;
irq->chip = &gpio->intc;
--
2.17.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f85f1baaa18932a041fd2b1c2ca6cfd9898c7d2b Mon Sep 17 00:00:00 2001
From: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Date: Tue, 2 Mar 2021 13:36:44 +0100
Subject: [PATCH] KVM: s390: split kvm_s390_logical_to_effective
Split kvm_s390_logical_to_effective into a generic function called
_kvm_s390_logical_to_effective. The new function takes a PSW and an address
and returns the address with the appropriate bits masked off. The old
function now calls the new function with the appropriate PSW from the vCPU.
This is needed to avoid code duplication for vSIE.
Signed-off-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: stable(a)vger.kernel.org # for VSIE: correctly handle MVPG when in VSIE
Link: https://lore.kernel.org/r/20210302174443.514363-2-imbrenda@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index f4c51756c462..2d8631a1f23e 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -36,6 +36,29 @@ static inline unsigned long kvm_s390_real_to_abs(struct kvm_vcpu *vcpu,
return gra;
}
+/**
+ * _kvm_s390_logical_to_effective - convert guest logical to effective address
+ * @psw: psw of the guest
+ * @ga: guest logical address
+ *
+ * Convert a guest logical address to an effective address by applying the
+ * rules of the addressing mode defined by bits 31 and 32 of the given PSW
+ * (extended/basic addressing mode).
+ *
+ * Depending on the addressing mode, the upper 40 bits (24 bit addressing
+ * mode), 33 bits (31 bit addressing mode) or no bits (64 bit addressing
+ * mode) of @ga will be zeroed and the remaining bits will be returned.
+ */
+static inline unsigned long _kvm_s390_logical_to_effective(psw_t *psw,
+ unsigned long ga)
+{
+ if (psw_bits(*psw).eaba == PSW_BITS_AMODE_64BIT)
+ return ga;
+ if (psw_bits(*psw).eaba == PSW_BITS_AMODE_31BIT)
+ return ga & ((1UL << 31) - 1);
+ return ga & ((1UL << 24) - 1);
+}
+
/**
* kvm_s390_logical_to_effective - convert guest logical to effective address
* @vcpu: guest virtual cpu
@@ -52,13 +75,7 @@ static inline unsigned long kvm_s390_real_to_abs(struct kvm_vcpu *vcpu,
static inline unsigned long kvm_s390_logical_to_effective(struct kvm_vcpu *vcpu,
unsigned long ga)
{
- psw_t *psw = &vcpu->arch.sie_block->gpsw;
-
- if (psw_bits(*psw).eaba == PSW_BITS_AMODE_64BIT)
- return ga;
- if (psw_bits(*psw).eaba == PSW_BITS_AMODE_31BIT)
- return ga & ((1UL << 31) - 1);
- return ga & ((1UL << 24) - 1);
+ return _kvm_s390_logical_to_effective(&vcpu->arch.sie_block->gpsw, ga);
}
/*
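As a worked example of the masking rules documented above, here is a
standalone sketch (plain C with assumed names, not the kernel helper)
showing what each addressing mode keeps of a logical address:

#include <stdint.h>
#include <stdio.h>

enum amode { AMODE_24, AMODE_31, AMODE_64 };

static uint64_t logical_to_effective(enum amode m, uint64_t ga)
{
	if (m == AMODE_64)
		return ga;			/* all 64 bits kept */
	if (m == AMODE_31)
		return ga & ((1UL << 31) - 1);	/* keep the low 31 bits */
	return ga & ((1UL << 24) - 1);		/* keep the low 24 bits */
}

int main(void)
{
	uint64_t ga = 0xdeadbeefcafeULL;

	/* prints 0xdeadbeefcafe, 0x3eefcafe and 0xefcafe respectively */
	printf("%#llx\n", (unsigned long long)logical_to_effective(AMODE_64, ga));
	printf("%#llx\n", (unsigned long long)logical_to_effective(AMODE_31, ga));
	printf("%#llx\n", (unsigned long long)logical_to_effective(AMODE_24, ga));
	return 0;
}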
On Tue, May 11, 2021 at 12:31:34PM -0400, George Kennedy wrote:
> Hello Greg,
>
> During Syzkaller reproducer testing on 5.4.y (5.4.118-rc1) the following
> crash occurred:
>
> WARNING: in hsr_addr_subst_dest
> https://syzkaller.appspot.com/bug?id=924b5574f42ebeddc94fad06f2fa329b199d58…
That is not a "crash", just syzbot complaining about a warning.
But I'll go queue it up, thanks.
greg k-h
On Wed, May 12, 2021 at 12:55 PM Jianwen Ji <jiji(a)redhat.com> wrote:
>
>
>
> On Wed, May 12, 2021 at 6:17 PM Veronika Kabatova <vkabatov(a)redhat.com> wrote:
>>
>> On Wed, May 12, 2021 at 3:22 AM Sasha Levin <sashal(a)kernel.org> wrote:
>> >
>> > On Tue, May 11, 2021 at 09:43:50PM -0000, CKI Project wrote:
>> > >
>> > >Hello,
>> > >
>> > >We ran automated tests on a recent commit from this kernel tree:
>> > >
>> > > Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>> > > Commit: beb6df0ce94f - thermal/core/fair share: Lock the thermal zone while looping over instances
>> > >
>> > >The results of these automated tests are provided below.
>> > >
>> > > Overall result: FAILED (see details below)
>> > > Merge: OK
>> > > Compile: OK
>> > > Tests: FAILED
>> > >
>> > >All kernel binaries, config files, and logs are available for download here:
>> > >
>> > > https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
>> > >
>> > >One or more kernel tests failed:
>> > >
>> > > s390x:
>> > > ❌ Networking tunnel: geneve basic test
>> > >
>> > > ppc64le:
>> > > ❌ LTP
>> > > ❌ Networking tunnel: geneve basic test
>> > >
>> > > aarch64:
>> > > ❌ Networking tunnel: geneve basic test
>> > >
>> > > x86_64:
>> > > ❌ Networking tunnel: geneve basic test
>> >
>> > CKI folks, looks like there was a gap between 5.11.16 and now, and idea
>> > if the reported issue here is new in the 5.11.19 -rc, or something that
>> > regressed earlier?
>> >
>>
>> Hi Sasha,
>>
>> this is a bug we've previously reported here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=210569
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=210569 should have been fixed upstream.
>
> I think the current bug we have is https://bugzilla.kernel.org/show_bug.cgi?id=212749.
>
Thanks for the correction, Jianwen! The logs look very similar, so our
result database treated the two issues as the same. According to the
comments on this BZ, it was not yet fixed in mainline; is that correct?
Veronika
>>
>>
>>
>> This is an older bug we've been seeing for a while and thus the
>> report should not have been sent. Apologies for that - we fixed
>> some permission issues with reporting yesterday and it must
>> have uncovered an underlying bug. This is the only list that hits
>> the problem. We'll look into that reason right away.
>>
>> Veronika
>>
>> > --
>> > Thanks,
>> > Sasha
>> >
>>
>
>
> --
>
> Jianwen Ji
>
> He/Him
>
> Quality Engineering
>
> Red Hat Beijing
>
> T: +86-10-62608073
The patch below does not apply to the 5.10- or 5.4-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bf1e15a82e3b74ee86bb119d6038b41e1ed2b319 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Tue, 20 Apr 2021 04:13:03 -0400
Subject: [PATCH] KVM: selftests: Always run vCPU thread with blocked SIG_IPI
The main thread could start to send SIG_IPI at any time, even before the
signal is blocked on the vcpu thread. Therefore, start the vcpu thread with
the signal already blocked.
Without this patch, on very busy cores the dirty_log_test could fail outright
on receiving a SIGUSR1 without a handler (when the vcpu runs far slower than
the main thread).
Reported-by: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ffa4e2791926..81edbd23d371 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -527,9 +527,8 @@ static void *vcpu_worker(void *data)
*/
sigmask->len = 8;
pthread_sigmask(0, NULL, sigset);
+ sigdelset(sigset, SIG_IPI);
vcpu_ioctl(vm, VCPU_ID, KVM_SET_SIGNAL_MASK, sigmask);
- sigaddset(sigset, SIG_IPI);
- pthread_sigmask(SIG_BLOCK, sigset, NULL);
sigemptyset(sigset);
sigaddset(sigset, SIG_IPI);
@@ -858,6 +857,7 @@ int main(int argc, char *argv[])
.interval = TEST_HOST_LOOP_INTERVAL,
};
int opt, i;
+ sigset_t sigset;
sem_init(&sem_vcpu_stop, 0, 0);
sem_init(&sem_vcpu_cont, 0, 0);
@@ -916,6 +916,11 @@ int main(int argc, char *argv[])
srandom(time(0));
+ /* Ensure that vCPU threads start with SIG_IPI blocked. */
+ sigemptyset(&sigset);
+ sigaddset(&sigset, SIG_IPI);
+ pthread_sigmask(SIG_BLOCK, &sigset, NULL);
+
if (host_log_mode_option == LOG_MODE_ALL) {
/* Run each log mode */
for (i = 0; i < LOG_MODE_NUM; i++) {
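The reason this works is that POSIX threads inherit the signal mask of the
thread that creates them. A minimal standalone sketch of the pattern
(hypothetical names; SIG_IPI standing in for SIGUSR1 as in the test):

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

#define SIG_IPI SIGUSR1

static void *vcpu_worker(void *arg)
{
	/* SIG_IPI is already blocked here: an early signal stays pending
	 * instead of terminating the process for lack of a handler. */
	sleep(1);
	puts("worker survived an early SIG_IPI");
	return NULL;
}

int main(void)
{
	pthread_t tid;
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIG_IPI);
	pthread_sigmask(SIG_BLOCK, &set, NULL);	/* block before the thread exists */

	pthread_create(&tid, NULL, vcpu_worker, NULL);
	pthread_kill(tid, SIG_IPI);		/* delivered while blocked: stays pending */
	pthread_join(tid, NULL);
	return 0;
}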
The patch below does not apply to the 5.10- or 5.4-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 016ff1a442d9a8f36dcb3beca0bcdfc35e281e18 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Sat, 17 Apr 2021 10:36:01 -0400
Subject: [PATCH] KVM: selftests: Sync data verify of dirty logging with guest
sync
This fixes a bug that can trigger with e.g. "taskset -c 0 ./dirty_log_test" or
when the testing host is very busy.
A similar attempt was made previously [1], but it is not enough; the
reason is stated in the reply [2].
As a summary (partly quoting from [2]):
The problem is, I think, that one guest memory write operation (in this
specific test) consists of a few micro-steps while the page is under kvm
dirty tracking (here I'm only considering write-protect rather than PML,
but PML should be similar, at least when the log buffer is full):
(1) Guest read 'iteration' number into register, prepare to write, page fault
(2) Set dirty bit in either dirty bitmap or dirty ring
(3) Return to guest, data written
When we verify the data, we assumed that all these steps are "atomic":
when (1) has happened for a page, we assume (2) & (3) must have happened
as well. We had some tricks to work around the "un-atomicity" of the
above three steps; a previous version of this patch tried to fix the
atomicity of steps (2)+(3) by explicitly letting the main thread wait for
at least one vmenter of the vcpu thread, which should work. However, what
I overlooked is that we still have a race because (1) and (2) can be
interrupted.
One example calltrace where we read an old iteration and got interrupted
before even setting the dirty bit and flushing the data:
__schedule+1742
__cond_resched+52
__get_user_pages+530
get_user_pages_unlocked+197
hva_to_pfn+206
try_async_pf+132
direct_page_fault+320
kvm_mmu_page_fault+103
vmx_handle_exit+288
vcpu_enter_guest+2460
kvm_arch_vcpu_ioctl_run+325
kvm_vcpu_ioctl+526
__x64_sys_ioctl+131
do_syscall_64+51
entry_SYSCALL_64_after_hwframe+68
This means the iteration number cached in a vcpu register can be very old
by the time the dirty bit is set and the data is flushed.
So far I don't see an easy way to guarantee the atomicity of all steps 1-3
other than syncing at the GUEST_SYNC() point of the guest code when we
verify the dirty bits, which is what this patch does.
[1] https://lore.kernel.org/lkml/20210413213641.23742-1-peterx@redhat.com/
[2] https://lore.kernel.org/lkml/20210417140956.GV4440@xz-x1/
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Andrew Jones <drjones(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Message-Id: <20210417143602.215059-2-peterx(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index bb2752d78fe3..ffa4e2791926 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -17,6 +17,7 @@
#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <asm/barrier.h>
+#include <linux/atomic.h>
#include "kvm_util.h"
#include "test_util.h"
@@ -137,12 +138,20 @@ static uint64_t host_clear_count;
static uint64_t host_track_next_count;
/* Whether dirty ring reset is requested, or finished */
-static sem_t dirty_ring_vcpu_stop;
-static sem_t dirty_ring_vcpu_cont;
+static sem_t sem_vcpu_stop;
+static sem_t sem_vcpu_cont;
+/*
+ * This is only set by main thread, and only cleared by vcpu thread. It is
+ * used to request vcpu thread to stop at the next GUEST_SYNC, since GUEST_SYNC
+ * is the only place that we'll guarantee both "dirty bit" and "dirty data"
+ * will match. E.g., SIG_IPI won't guarantee that if the vcpu is interrupted
+ * after setting dirty bit but before the data is written.
+ */
+static atomic_t vcpu_sync_stop_requested;
/*
* This is updated by the vcpu thread to tell the host whether it's a
* ring-full event. It should only be read until a sem_wait() of
- * dirty_ring_vcpu_stop and before vcpu continues to run.
+ * sem_vcpu_stop and before vcpu continues to run.
*/
static bool dirty_ring_vcpu_ring_full;
/*
@@ -234,6 +243,17 @@ static void clear_log_collect_dirty_pages(struct kvm_vm *vm, int slot,
kvm_vm_clear_dirty_log(vm, slot, bitmap, 0, num_pages);
}
+/* Should only be called after a GUEST_SYNC */
+static void vcpu_handle_sync_stop(void)
+{
+ if (atomic_read(&vcpu_sync_stop_requested)) {
+ /* It means main thread is sleeping waiting */
+ atomic_set(&vcpu_sync_stop_requested, false);
+ sem_post(&sem_vcpu_stop);
+ sem_wait_until(&sem_vcpu_cont);
+ }
+}
+
static void default_after_vcpu_run(struct kvm_vm *vm, int ret, int err)
{
struct kvm_run *run = vcpu_state(vm, VCPU_ID);
@@ -244,6 +264,8 @@ static void default_after_vcpu_run(struct kvm_vm *vm, int ret, int err)
TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
"Invalid guest sync status: exit_reason=%s\n",
exit_reason_str(run->exit_reason));
+
+ vcpu_handle_sync_stop();
}
static bool dirty_ring_supported(void)
@@ -301,13 +323,13 @@ static void dirty_ring_wait_vcpu(void)
{
/* This makes sure that hardware PML cache flushed */
vcpu_kick();
- sem_wait_until(&dirty_ring_vcpu_stop);
+ sem_wait_until(&sem_vcpu_stop);
}
static void dirty_ring_continue_vcpu(void)
{
pr_info("Notifying vcpu to continue\n");
- sem_post(&dirty_ring_vcpu_cont);
+ sem_post(&sem_vcpu_cont);
}
static void dirty_ring_collect_dirty_pages(struct kvm_vm *vm, int slot,
@@ -361,11 +383,11 @@ static void dirty_ring_after_vcpu_run(struct kvm_vm *vm, int ret, int err)
/* Update the flag first before pause */
WRITE_ONCE(dirty_ring_vcpu_ring_full,
run->exit_reason == KVM_EXIT_DIRTY_RING_FULL);
- sem_post(&dirty_ring_vcpu_stop);
+ sem_post(&sem_vcpu_stop);
pr_info("vcpu stops because %s...\n",
dirty_ring_vcpu_ring_full ?
"dirty ring is full" : "vcpu is kicked out");
- sem_wait_until(&dirty_ring_vcpu_cont);
+ sem_wait_until(&sem_vcpu_cont);
pr_info("vcpu continues now.\n");
} else {
TEST_ASSERT(false, "Invalid guest sync status: "
@@ -377,7 +399,7 @@ static void dirty_ring_after_vcpu_run(struct kvm_vm *vm, int ret, int err)
static void dirty_ring_before_vcpu_join(void)
{
/* Kick another round of vcpu just to make sure it will quit */
- sem_post(&dirty_ring_vcpu_cont);
+ sem_post(&sem_vcpu_cont);
}
struct log_mode {
@@ -768,7 +790,25 @@ static void run_test(enum vm_guest_mode mode, void *arg)
usleep(p->interval * 1000);
log_mode_collect_dirty_pages(vm, TEST_MEM_SLOT_INDEX,
bmap, host_num_pages);
+
+ /*
+ * See vcpu_sync_stop_requested definition for details on why
+ * we need to stop vcpu when verify data.
+ */
+ atomic_set(&vcpu_sync_stop_requested, true);
+ sem_wait_until(&sem_vcpu_stop);
+ /*
+ * NOTE: for dirty ring, it's possible that we didn't stop at
+ * GUEST_SYNC but instead we stopped because ring is full;
+ * that's okay too because ring full means we're only missing
+ * the flush of the last page, and since we handle the last
+ * page specially verification will succeed anyway.
+ */
+ assert(host_log_mode == LOG_MODE_DIRTY_RING ||
+ atomic_read(&vcpu_sync_stop_requested) == false);
vm_dirty_log_verify(mode, bmap);
+ sem_post(&sem_vcpu_cont);
+
iteration++;
sync_global_to_guest(vm, iteration);
}
@@ -819,8 +859,8 @@ int main(int argc, char *argv[])
};
int opt, i;
- sem_init(&dirty_ring_vcpu_stop, 0, 0);
- sem_init(&dirty_ring_vcpu_cont, 0, 0);
+ sem_init(&sem_vcpu_stop, 0, 0);
+ sem_init(&sem_vcpu_cont, 0, 0);
guest_modes_append_default();
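Stripped of the test harness, the handshake this patch introduces looks
roughly like the sketch below (assumed names, not the selftest itself):
the verifier raises an atomic stop request, and the worker honours it only
at its own sync point, so both sides agree on where the worker is parked
while the data is checked.

#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

static sem_t sem_stop, sem_cont;
static atomic_bool stop_requested;

static void handshake_init(void)
{
	sem_init(&sem_stop, 0, 0);
	sem_init(&sem_cont, 0, 0);
}

/* Worker side: called only at a point where its state is consistent. */
static void worker_sync_point(void)
{
	if (atomic_load(&stop_requested)) {
		atomic_store(&stop_requested, false);
		sem_post(&sem_stop);	/* "parked at a safe point" */
		sem_wait(&sem_cont);	/* wait until verification is done */
	}
}

/* Verifier side: worker is guaranteed parked between the two sem ops. */
static void verifier_checkpoint(void)
{
	atomic_store(&stop_requested, true);
	sem_wait(&sem_stop);
	/* ... verify data here; the worker cannot race with us ... */
	sem_post(&sem_cont);
}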
The patch below does not apply to the 5.4- or 4.19-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 53b16dd6ba5cf64ed147ac3523ec34651d553cb0 Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger(a)redhat.com>
Date: Mon, 5 Apr 2021 18:39:34 +0200
Subject: [PATCH] KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read
The doc says:
"The characteristics of a specific redistributor region can
be read by presetting the index field in the attr data.
Only valid for KVM_DEV_TYPE_ARM_VGIC_V3"
Unfortunately the existing code fails to read the input attr data.
Fixes: 04c110932225 ("KVM: arm/arm64: Implement KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION")
Cc: stable(a)vger.kernel.org#v4.17+
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-3-eric.auger@redhat.com
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 44419679f91a..2f66cf247282 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -226,6 +226,9 @@ static int vgic_get_common_attr(struct kvm_device *dev,
u64 addr;
unsigned long type = (unsigned long)attr->attr;
+ if (copy_from_user(&addr, uaddr, sizeof(addr)))
+ return -EFAULT;
+
r = kvm_vgic_addr(dev->kvm, type, &addr, false);
if (r)
return (r == -ENODEV) ? -ENXIO : r;
The patch below does not apply to the 4.14-, 4.19-, 4.9- or 4.4-stable trees.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5d3c4c79384af06e3c8e25b7770b6247496b4417 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:49 -0700
Subject: [PATCH] KVM: Stop looking for coalesced MMIO zones if the bus is
destroyed
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus. If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device. But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus. In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.
Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d17e3ff1138d..82b066db37cb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,8 +192,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
int len, void *val);
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev);
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev);
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
gpa_t addr);
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 62bd908ecd58..f08f5e82460b 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -174,21 +174,36 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone)
{
struct kvm_coalesced_mmio_dev *dev, *tmp;
+ int r;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
mutex_lock(&kvm->slots_lock);
- list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list)
+ list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
if (zone->pio == dev->zone.pio &&
coalesced_mmio_in_range(dev, zone->addr, zone->size)) {
- kvm_io_bus_unregister_dev(kvm,
+ r = kvm_io_bus_unregister_dev(kvm,
zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
kvm_iodevice_destructor(&dev->dev);
+
+ /*
+ * On failure, unregister destroys all devices on the
+ * bus _except_ the target device, i.e. coalesced_zones
+ * has been modified. No need to restart the walk as
+ * there aren't any zones left.
+ */
+ if (r)
+ break;
}
+ }
mutex_unlock(&kvm->slots_lock);
+ /*
+ * Ignore the result of kvm_io_bus_unregister_dev(), from userspace's
+ * perspective, the coalesced MMIO is most definitely unregistered.
+ */
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c771d40737c9..f84b126c32d6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4621,15 +4621,15 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
}
/* Caller must hold slots_lock. */
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev)
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev)
{
int i, j;
struct kvm_io_bus *new_bus, *bus;
bus = kvm_get_bus(kvm, bus_idx);
if (!bus)
- return;
+ return 0;
for (i = 0; i < bus->dev_count; i++)
if (bus->range[i].dev == dev) {
@@ -4637,7 +4637,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
if (i == bus->dev_count)
- return;
+ return 0;
new_bus = kmalloc(struct_size(bus, range, bus->dev_count - 1),
GFP_KERNEL_ACCOUNT);
@@ -4662,7 +4662,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
kfree(bus);
- return;
+ return new_bus ? 0 : -ENOMEM;
}
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
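The subtlety here is that a _safe list walk only caches the next entry: it
protects against the callee freeing the current entry, not against the
callee destroying everything that follows. A standalone sketch of the
fixed pattern (plain C, hypothetical names):

#include <stdlib.h>

struct zone { int id; struct zone *next; };

/*
 * On failure this helper tears down every node from 'z' onwards,
 * mimicking kvm_io_bus_unregister_dev() destroying the whole bus when it
 * cannot allocate a replacement, and returns -1.
 */
static int unregister_zone(struct zone *z, int fail)
{
	if (!fail) {
		free(z);
		return 0;
	}
	while (z) {
		struct zone *next = z->next;
		free(z);
		z = next;
	}
	return -1;
}

static void unregister_all(struct zone **head, int fail_at)
{
	struct zone *z = *head, *next;

	while (z) {
		next = z->next;		/* safe against freeing z itself... */
		if (unregister_zone(z, z->id == fail_at) < 0)
			break;		/* ...but 'next' is gone too: stop */
		z = next;
	}
	*head = NULL;
}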
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5d3c4c79384af06e3c8e25b7770b6247496b4417 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:49 -0700
Subject: [PATCH] KVM: Stop looking for coalesced MMIO zones if the bus is
destroyed
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus. If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device. But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus. In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.
Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d17e3ff1138d..82b066db37cb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,8 +192,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
int len, void *val);
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev);
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev);
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
gpa_t addr);
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 62bd908ecd58..f08f5e82460b 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -174,21 +174,36 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone)
{
struct kvm_coalesced_mmio_dev *dev, *tmp;
+ int r;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
mutex_lock(&kvm->slots_lock);
- list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list)
+ list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
if (zone->pio == dev->zone.pio &&
coalesced_mmio_in_range(dev, zone->addr, zone->size)) {
- kvm_io_bus_unregister_dev(kvm,
+ r = kvm_io_bus_unregister_dev(kvm,
zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
kvm_iodevice_destructor(&dev->dev);
+
+ /*
+ * On failure, unregister destroys all devices on the
+ * bus _except_ the target device, i.e. coalesced_zones
+ * has been modified. No need to restart the walk as
+ * there aren't any zones left.
+ */
+ if (r)
+ break;
}
+ }
mutex_unlock(&kvm->slots_lock);
+ /*
+ * Ignore the result of kvm_io_bus_unregister_dev(), from userspace's
+ * perspective, the coalesced MMIO is most definitely unregistered.
+ */
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c771d40737c9..f84b126c32d6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4621,15 +4621,15 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
}
/* Caller must hold slots_lock. */
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev)
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev)
{
int i, j;
struct kvm_io_bus *new_bus, *bus;
bus = kvm_get_bus(kvm, bus_idx);
if (!bus)
- return;
+ return 0;
for (i = 0; i < bus->dev_count; i++)
if (bus->range[i].dev == dev) {
@@ -4637,7 +4637,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
if (i == bus->dev_count)
- return;
+ return 0;
new_bus = kmalloc(struct_size(bus, range, bus->dev_count - 1),
GFP_KERNEL_ACCOUNT);
@@ -4662,7 +4662,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
kfree(bus);
- return;
+ return new_bus ? 0 : -ENOMEM;
}
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5d3c4c79384af06e3c8e25b7770b6247496b4417 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:49 -0700
Subject: [PATCH] KVM: Stop looking for coalesced MMIO zones if the bus is
destroyed
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus. If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device. But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus. In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.
Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d17e3ff1138d..82b066db37cb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,8 +192,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
int len, void *val);
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev);
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev);
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
gpa_t addr);
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 62bd908ecd58..f08f5e82460b 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -174,21 +174,36 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone)
{
struct kvm_coalesced_mmio_dev *dev, *tmp;
+ int r;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
mutex_lock(&kvm->slots_lock);
- list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list)
+ list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
if (zone->pio == dev->zone.pio &&
coalesced_mmio_in_range(dev, zone->addr, zone->size)) {
- kvm_io_bus_unregister_dev(kvm,
+ r = kvm_io_bus_unregister_dev(kvm,
zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
kvm_iodevice_destructor(&dev->dev);
+
+ /*
+ * On failure, unregister destroys all devices on the
+ * bus _except_ the target device, i.e. coalesced_zones
+ * has been modified. No need to restart the walk as
+ * there aren't any zones left.
+ */
+ if (r)
+ break;
}
+ }
mutex_unlock(&kvm->slots_lock);
+ /*
+ * Ignore the result of kvm_io_bus_unregister_dev(), from userspace's
+ * perspective, the coalesced MMIO is most definitely unregistered.
+ */
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c771d40737c9..f84b126c32d6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4621,15 +4621,15 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
}
/* Caller must hold slots_lock. */
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev)
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev)
{
int i, j;
struct kvm_io_bus *new_bus, *bus;
bus = kvm_get_bus(kvm, bus_idx);
if (!bus)
- return;
+ return 0;
for (i = 0; i < bus->dev_count; i++)
if (bus->range[i].dev == dev) {
@@ -4637,7 +4637,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
if (i == bus->dev_count)
- return;
+ return 0;
new_bus = kmalloc(struct_size(bus, range, bus->dev_count - 1),
GFP_KERNEL_ACCOUNT);
@@ -4662,7 +4662,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
kfree(bus);
- return;
+ return new_bus ? 0 : -ENOMEM;
}
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5d3c4c79384af06e3c8e25b7770b6247496b4417 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:49 -0700
Subject: [PATCH] KVM: Stop looking for coalesced MMIO zones if the bus is
destroyed
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus. If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device. But, it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus. In
the coalesced MMIO case, this can result in a deleted list entry
dereference due to attempting to continue iterating on coalesced_zones
after future entries (in the walk) have been deleted.
Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Reported-by: Hao Sun <sunhao.th(a)gmail.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d17e3ff1138d..82b066db37cb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -192,8 +192,8 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
int len, void *val);
int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
int len, struct kvm_io_device *dev);
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev);
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev);
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
gpa_t addr);
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 62bd908ecd58..f08f5e82460b 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -174,21 +174,36 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone)
{
struct kvm_coalesced_mmio_dev *dev, *tmp;
+ int r;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
mutex_lock(&kvm->slots_lock);
- list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list)
+ list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
if (zone->pio == dev->zone.pio &&
coalesced_mmio_in_range(dev, zone->addr, zone->size)) {
- kvm_io_bus_unregister_dev(kvm,
+ r = kvm_io_bus_unregister_dev(kvm,
zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
kvm_iodevice_destructor(&dev->dev);
+
+ /*
+ * On failure, unregister destroys all devices on the
+ * bus _except_ the target device, i.e. coalesced_zones
+ * has been modified. No need to restart the walk as
+ * there aren't any zones left.
+ */
+ if (r)
+ break;
}
+ }
mutex_unlock(&kvm->slots_lock);
+ /*
+ * Ignore the result of kvm_io_bus_unregister_dev(), from userspace's
+ * perspective, the coalesced MMIO is most definitely unregistered.
+ */
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c771d40737c9..f84b126c32d6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4621,15 +4621,15 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
}
/* Caller must hold slots_lock. */
-void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
- struct kvm_io_device *dev)
+int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+ struct kvm_io_device *dev)
{
int i, j;
struct kvm_io_bus *new_bus, *bus;
bus = kvm_get_bus(kvm, bus_idx);
if (!bus)
- return;
+ return 0;
for (i = 0; i < bus->dev_count; i++)
if (bus->range[i].dev == dev) {
@@ -4637,7 +4637,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
if (i == bus->dev_count)
- return;
+ return 0;
new_bus = kmalloc(struct_size(bus, range, bus->dev_count - 1),
GFP_KERNEL_ACCOUNT);
@@ -4662,7 +4662,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
kfree(bus);
- return;
+ return new_bus ? 0 : -ENOMEM;
}
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
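The failure mode is easiest to see in miniature. Below is a minimal
userspace sketch of the pattern the fix adopts, with hypothetical names
(unregister_zone(), struct zone) rather than the kernel API: the "safe"
iterator caches the next pointer, so when the callee tears down the rest
of the list on failure, the only correct move is to stop walking.

#include <stdlib.h>

struct zone {
	int id;
	struct zone *next;
};

/* Models kvm_io_bus_unregister_dev(): the victim is always removed, but on
 * (simulated) allocation failure every remaining node is destroyed too. */
static int unregister_zone(struct zone **head, struct zone *victim, int fail)
{
	struct zone **pp = head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
	free(victim);

	if (!fail)
		return 0;

	while (*head) {			/* failure: obliterate the rest */
		struct zone *z = *head;

		*head = z->next;
		free(z);
	}
	return -1;			/* stands in for -ENOMEM */
}

int main(void)
{
	struct zone *head = NULL, *z, *next;

	for (int i = 3; i >= 1; i--) {
		z = malloc(sizeof(*z));
		z->id = i;
		z->next = head;
		head = z;
	}

	for (z = head; z; z = next) {
		next = z->next;		/* like list_for_each_entry_safe() */
		if (unregister_zone(&head, z, z->id == 1))
			break;		/* 'next' now dangles; stop the walk */
	}
	return 0;
}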
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ee3757424be7c1cd1d0bbfa6db29a7edd82a250 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:48 -0700
Subject: [PATCH] KVM: Destroy I/O bus devices on unregister failure _after_
sync'ing SRCU
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 529cff1050d7..c771d40737c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4646,7 +4646,13 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus->dev_count--;
memcpy(new_bus->range + i, bus->range + i + 1,
flex_array_size(new_bus, range, new_bus->dev_count - i));
- } else {
+ }
+
+ rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ /* Destroy the old bus _after_ installing the (null) bus. */
+ if (!new_bus) {
pr_err("kvm: failed to shrink bus, removing it completely\n");
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
@@ -4655,8 +4661,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
}
- rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
return;
}
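The ordering matters more than the allocation failure itself. A minimal
sketch of the publish-then-destroy discipline the fix enforces, with
synchronize_readers() standing in for synchronize_srcu_expedited() and the
struct names invented for illustration:

#include <stdlib.h>

struct bus {
	int dev_count;
	/* ... devices ... */
};

static struct bus *live_bus;	/* the pointer readers dereference */

/* Placeholder: in the kernel this is synchronize_srcu_expedited(), which
 * blocks until every reader that could hold the old pointer has finished. */
static void synchronize_readers(void)
{
}

static void destroy_devices(struct bus *bus)
{
	/* invoke each device's destructor */
}

static void replace_bus(struct bus *new_bus)
{
	struct bus *old_bus = live_bus;

	live_bus = new_bus;		/* rcu_assign_pointer() analogue */
	synchronize_readers();		/* now no reader can see old_bus */

	destroy_devices(old_bus);	/* safe only after the wait */
	free(old_bus);
}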
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ee3757424be7c1cd1d0bbfa6db29a7edd82a250 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:48 -0700
Subject: [PATCH] KVM: Destroy I/O bus devices on unregister failure _after_
sync'ing SRCU
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 529cff1050d7..c771d40737c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4646,7 +4646,13 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus->dev_count--;
memcpy(new_bus->range + i, bus->range + i + 1,
flex_array_size(new_bus, range, new_bus->dev_count - i));
- } else {
+ }
+
+ rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ /* Destroy the old bus _after_ installing the (null) bus. */
+ if (!new_bus) {
pr_err("kvm: failed to shrink bus, removing it completely\n");
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
@@ -4655,8 +4661,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
}
- rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
return;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ee3757424be7c1cd1d0bbfa6db29a7edd82a250 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:48 -0700
Subject: [PATCH] KVM: Destroy I/O bus devices on unregister failure _after_
sync'ing SRCU
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 529cff1050d7..c771d40737c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4646,7 +4646,13 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus->dev_count--;
memcpy(new_bus->range + i, bus->range + i + 1,
flex_array_size(new_bus, range, new_bus->dev_count - i));
- } else {
+ }
+
+ rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ /* Destroy the old bus _after_ installing the (null) bus. */
+ if (!new_bus) {
pr_err("kvm: failed to shrink bus, removing it completely\n");
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
@@ -4655,8 +4661,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
}
- rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
return;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ee3757424be7c1cd1d0bbfa6db29a7edd82a250 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:48 -0700
Subject: [PATCH] KVM: Destroy I/O bus devices on unregister failure _after_
sync'ing SRCU
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 529cff1050d7..c771d40737c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4646,7 +4646,13 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus->dev_count--;
memcpy(new_bus->range + i, bus->range + i + 1,
flex_array_size(new_bus, range, new_bus->dev_count - i));
- } else {
+ }
+
+ rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ /* Destroy the old bus _after_ installing the (null) bus. */
+ if (!new_bus) {
pr_err("kvm: failed to shrink bus, removing it completely\n");
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
@@ -4655,8 +4661,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
}
- rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
return;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ee3757424be7c1cd1d0bbfa6db29a7edd82a250 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 12 Apr 2021 15:20:48 -0700
Subject: [PATCH] KVM: Destroy I/O bus devices on unregister failure _after_
sync'ing SRCU
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus. Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210412222050.876100-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 529cff1050d7..c771d40737c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4646,7 +4646,13 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
new_bus->dev_count--;
memcpy(new_bus->range + i, bus->range + i + 1,
flex_array_size(new_bus, range, new_bus->dev_count - i));
- } else {
+ }
+
+ rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ /* Destroy the old bus _after_ installing the (null) bus. */
+ if (!new_bus) {
pr_err("kvm: failed to shrink bus, removing it completely\n");
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
@@ -4655,8 +4661,6 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
}
}
- rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
- synchronize_srcu_expedited(&kvm->srcu);
kfree(bus);
return;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 94ac0835391efc1a30feda6fc908913ec012951e Mon Sep 17 00:00:00 2001
From: Eric Auger <eric.auger(a)redhat.com>
Date: Mon, 12 Apr 2021 17:00:34 +0200
Subject: [PATCH] KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
When reading the base address of a REDIST region
through KVM_VGIC_V3_ADDR_TYPE_REDIST, we expect the
redistributor region list to be populated with a single
element.
However, list_first_entry() expects the list to be non-empty.
Instead we should use list_first_entry_or_null(), which
returns NULL if the list is empty.
Fixes: dbd9733ab674 ("KVM: arm/arm64: Replace the single rdist region by a list")
Cc: <Stable(a)vger.kernel.org> # v4.18+
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Reported-by: Gavin Shan <gshan(a)redhat.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 2f66cf247282..7740995de982 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -87,8 +87,8 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
r = vgic_v3_set_redist_base(kvm, 0, *addr, 0);
goto out;
}
- rdreg = list_first_entry(&vgic->rd_regions,
- struct vgic_redist_region, list);
+ rdreg = list_first_entry_or_null(&vgic->rd_regions,
+ struct vgic_redist_region, list);
if (!rdreg)
addr_ptr = &undef_value;
else
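For reference, a compilable userspace sketch of the difference, with
simplified list types (not the kernel's <linux/list.h>): on an empty list,
"first entry" is just the head reinterpreted as an element, so the _or_null
variant is the only safe way to probe.

#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

struct region {
	unsigned long base;
	struct list_head list;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct region *first_region_or_null(struct list_head *head)
{
	if (head->next == head)		/* list_empty() */
		return NULL;		/* list_first_entry_or_null() behavior */
	return container_of(head->next, struct region, list);
}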
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82277eeed65eed6c6ee5b8f97bd978763eab148f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:25 -0700
Subject: [PATCH] KVM: nVMX: Truncate base/index GPR value on address calc in
!64-bit
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f93ff2302b4c..ac838fb2aa4e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4619,9 +4619,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
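The arithmetic being fixed is, in essence, the following. A hedged sketch
(illustrative function, not KVM's get_vmx_mem_address()) of the
effective-address computation with the mode-dependent truncation:

#include <stdint.h>

static uint64_t memop_effective_addr(uint64_t base, uint64_t index,
				     int scaling, int64_t off,
				     int is_64bit_mode)
{
	if (!is_64bit_mode) {
		base = (uint32_t)base;	/* E*X and below: drop bits 63:32 */
		index = (uint32_t)index;
	}
	return off + base + (index << scaling);
}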
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82277eeed65eed6c6ee5b8f97bd978763eab148f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:25 -0700
Subject: [PATCH] KVM: nVMX: Truncate base/index GPR value on address calc in
!64-bit
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f93ff2302b4c..ac838fb2aa4e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4619,9 +4619,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82277eeed65eed6c6ee5b8f97bd978763eab148f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:25 -0700
Subject: [PATCH] KVM: nVMX: Truncate base/index GPR value on address calc in
!64-bit
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f93ff2302b4c..ac838fb2aa4e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4619,9 +4619,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82277eeed65eed6c6ee5b8f97bd978763eab148f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:25 -0700
Subject: [PATCH] KVM: nVMX: Truncate base/index GPR value on address calc in
!64-bit
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f93ff2302b4c..ac838fb2aa4e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4619,9 +4619,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82277eeed65eed6c6ee5b8f97bd978763eab148f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:25 -0700
Subject: [PATCH] KVM: nVMX: Truncate base/index GPR value on address calc in
!64-bit
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand. Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f93ff2302b4c..ac838fb2aa4e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4619,9 +4619,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
- off += kvm_register_read(vcpu, base_reg);
+ off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
- off += kvm_register_read(vcpu, index_reg) << scaling;
+ off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee050a577523dfd5fac95e6cc182ebe0293ead59 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:24 -0700
Subject: [PATCH] KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check
in !64-bit
Drop bits 63:32 of the VMCS field encoding when checking for a nested
VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always
use 32-bit operands outside of 64-bit mode.
The actual emulation of VMREAD/VMWRITE does the right thing; this bug is
purely limited to incorrectly causing a nested VM-Exit if a GPR happens
to have bits 63:32 set outside of 64-bit mode.
Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-6-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0f8c118ebc35..f93ff2302b4c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5745,7 +5745,7 @@ static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
/* Decode instruction info and find the field to access */
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
- field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+ field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
/* Out-of-range fields always cause a VM exit from L2 to L1 */
if (field >> 15)
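A small sketch of the check being fixed (hypothetical helper, not KVM's
code): with a stale upper half in the GPR, the field >> 15 range test fires
even though the architectural operand is only 32 bits.

#include <stdint.h>

static int vmcs_field_exits_to_l1(uint64_t gpr, int is_64bit_mode)
{
	/* kvm_register_readl() analogue: 32-bit operand outside long mode */
	uint64_t field = is_64bit_mode ? gpr : (uint32_t)gpr;

	if (field >> 15)	/* out-of-range fields always cause a VM exit */
		return 1;
	/* ... otherwise consult the vmread/vmwrite bitmaps ... */
	return 0;
}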
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
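The _writel variant differs from _write only in the mode-dependent
truncation. A sketch of the semantics the fix relies on (invented struct,
not KVM's internals):

#include <stdint.h>

struct vcpu_regs {
	uint64_t gpr[16];
};

/* kvm_register_writel() analogue: outside 64-bit mode the operand size is
 * always 32 bits, so bits 63:32 of the stored DR/CR value are dropped. */
static void register_writel(struct vcpu_regs *regs, int reg, uint64_t val,
			    int is_64bit_mode)
{
	regs->gpr[reg] = is_64bit_mode ? val : (uint32_t)val;
}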
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8971344f5739a9cc53f91f1f593ddd82265b93b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:23 -0700
Subject: [PATCH] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit
mode
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode. Per the SDM:
The operand size for these instructions is always 32 bits in non-64-bit
modes, regardless of the operand-size attribute.
CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index edc23c77be32..64354b009fe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5121,12 +5121,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5199,7 +5199,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:57 -0800
Subject: [PATCH] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP
switch
Defer reloading the MMU after a successful EPTP switch. The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context. Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB. No other
post-switch side effects exist.
Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-14-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bd1343a0896e..d74abd778429 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5481,16 +5481,11 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
if (!nested_vmx_check_eptp(vcpu, new_eptp))
return 1;
- kvm_mmu_unload(vcpu);
mmu->ept_ad = accessed_dirty;
mmu->mmu_role.base.ad_disabled = !accessed_dirty;
vmcs12->ept_pointer = new_eptp;
- /*
- * TODO: Check what's the correct approach in case
- * mmu reload fails. Currently, we just let the next
- * reload potentially fail
- */
- kvm_mmu_reload(vcpu);
+
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
}
return 0;
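The fix is an instance of KVM's request-bit pattern: record that work is
needed and let the normal entry path do it, rather than doing it inline in
the old context. A minimal sketch with illustrative names:

#include <stdint.h>

#define REQ_MMU_RELOAD	(1u << 0)

struct vcpu {
	uint32_t requests;
};

static void eptp_switch(struct vcpu *vcpu)
{
	/* ... validate and install the new EPTP ... */
	vcpu->requests |= REQ_MMU_RELOAD;	/* kvm_make_request() analogue */
}

/* Serviced on the normal entry path, after RIP update, #DB queueing, and
 * any other side effects of the emulated VMFUNC have completed. */
static void vcpu_enter(struct vcpu *vcpu)
{
	if (vcpu->requests & REQ_MMU_RELOAD) {
		vcpu->requests &= ~REQ_MMU_RELOAD;
		/* reload_mmu(vcpu); */
	}
}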
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:57 -0800
Subject: [PATCH] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP
switch
Defer reloading the MMU after a successful EPTP switch. The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context. Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB. No other
post-switch side effects exist.
Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-14-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bd1343a0896e..d74abd778429 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5481,16 +5481,11 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
if (!nested_vmx_check_eptp(vcpu, new_eptp))
return 1;
- kvm_mmu_unload(vcpu);
mmu->ept_ad = accessed_dirty;
mmu->mmu_role.base.ad_disabled = !accessed_dirty;
vmcs12->ept_pointer = new_eptp;
- /*
- * TODO: Check what's the correct approach in case
- * mmu reload fails. Currently, we just let the next
- * reload potentially fail
- */
- kvm_mmu_reload(vcpu);
+
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
}
return 0;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c805f5d5585ab5e0cdac6b1ccf7086eb120fb7db Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:57 -0800
Subject: [PATCH] KVM: nVMX: Defer the MMU reload to the normal path on an EPTP
switch
Defer reloading the MMU after a successful EPTP switch. The VMFUNC
instruction itself is executed in the previous EPTP context; any side
effects, e.g. updating RIP, should occur in the old context. Practically
speaking, this bug is benign as VMX doesn't touch the MMU when skipping
an emulated instruction, nor does queuing a single-step #DB. No other
post-switch side effects exist.
Fixes: 41ab93727467 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-14-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bd1343a0896e..d74abd778429 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5481,16 +5481,11 @@ static int nested_vmx_eptp_switching(struct kvm_vcpu *vcpu,
if (!nested_vmx_check_eptp(vcpu, new_eptp))
return 1;
- kvm_mmu_unload(vcpu);
mmu->ept_ad = accessed_dirty;
mmu->mmu_role.base.ad_disabled = !accessed_dirty;
vmcs12->ept_pointer = new_eptp;
- /*
- * TODO: Check what's the correct approach in case
- * mmu reload fails. Currently, we just let the next
- * reload potentially fail
- */
- kvm_mmu_reload(vcpu);
+
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
}
return 0;
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
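Both operands can be summarized in one helper. A sketch under the same
assumptions as the patch (no address-size prefix handling, per the FIXME),
with an invented register struct:

#include <stdint.h>

struct regs {
	uint64_t rax, rcx;
};

static void invlpga_operands(const struct regs *regs, int long_mode,
			     uint64_t *gva, uint32_t *asid)
{
	/* Address: rAX, subject to the effective address size. */
	*gva = long_mode ? regs->rax : (uint32_t)regs->rax;

	/* ASID: always ECX, never the full rCX. */
	*asid = (uint32_t)regs->rcx;
}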
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc9eff67fc35d733e2de0e0017dc3f5a86e8daf8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:27 -0700
Subject: [PATCH] KVM: SVM: Use default rAX size for INVLPGA emulation
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong. The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode. Add a FIXME to call
out that the emulation is wrong.
Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.
Per the APM:
The portion of rAX used to form the address is determined by the
effective address size (current execution mode and optional address
size prefix). The ASID is taken from ECX.
Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-9-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857bcf3a4cda..1f5a8e7872c1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2290,11 +2290,17 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
static int invlpga_interception(struct kvm_vcpu *vcpu)
{
- trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, kvm_rcx_read(vcpu),
- kvm_rax_read(vcpu));
+ gva_t gva = kvm_rax_read(vcpu);
+ u32 asid = kvm_rcx_read(vcpu);
+
+ /* FIXME: Handle an address size prefix. */
+ if (!is_long_mode(vcpu))
+ gva = (u32)gva;
+
+ trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
- kvm_mmu_invlpg(vcpu, kvm_rax_read(vcpu));
+ kvm_mmu_invlpg(vcpu, gva);
return kvm_skip_emulated_instruction(vcpu);
}
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0884335a2e653b8a045083aa1d57ce74269ac81d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:22 -0700
Subject: [PATCH] KVM: SVM: Truncate GPR value for DR and CR accesses in
!64-bit mode
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and
CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need
for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 301792542937..857bcf3a4cda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2451,7 +2451,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2497,7 +2497,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(vcpu, err);
@@ -2563,11 +2563,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn */
dr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
err = kvm_set_dr(vcpu, dr, val);
} else {
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
}
return kvm_complete_insn_gp(vcpu, err);
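The behavioral difference between the two accessors can be sketched in a
few lines of standalone C; register_read() and register_readl() below are
hypothetical stand-ins for kvm_register_read()/kvm_register_readl(), not
the kernel implementations.

#include <stdint.h>
#include <stdio.h>

static uint64_t register_read(uint64_t gpr)
{
        return gpr;                     /* full 64-bit value */
}

static uint64_t register_readl(uint64_t gpr, int is_64bit_mode)
{
        /* The "l" variant drops bits 63:32 outside 64-bit mode, per
         * the APM's fixed 32-bit operand size for DR/CR moves. */
        return is_64bit_mode ? gpr : (uint32_t)gpr;
}

int main(void)
{
        uint64_t gpr = 0xffffffff80000011ULL;   /* e.g. a MOV-to-CR0 source */

        printf("read : %#llx\n", (unsigned long long)register_read(gpr));
        printf("readl: %#llx (32-bit mode)\n",
               (unsigned long long)register_readl(gpr, 0));
        return 0;
}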
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0884335a2e653b8a045083aa1d57ce74269ac81d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:22 -0700
Subject: [PATCH] KVM: SVM: Truncate GPR value for DR and CR accesses in
!64-bit mode
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and
CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need
for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 301792542937..857bcf3a4cda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2451,7 +2451,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2497,7 +2497,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(vcpu, err);
@@ -2563,11 +2563,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn */
dr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
err = kvm_set_dr(vcpu, dr, val);
} else {
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
}
return kvm_complete_insn_gp(vcpu, err);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0884335a2e653b8a045083aa1d57ce74269ac81d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:22 -0700
Subject: [PATCH] KVM: SVM: Truncate GPR value for DR and CR accesses in
!64-bit mode
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and
CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need
for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 301792542937..857bcf3a4cda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2451,7 +2451,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2497,7 +2497,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(vcpu, err);
@@ -2563,11 +2563,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn */
dr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
err = kvm_set_dr(vcpu, dr, val);
} else {
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
}
return kvm_complete_insn_gp(vcpu, err);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0884335a2e653b8a045083aa1d57ce74269ac81d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Apr 2021 19:21:22 -0700
Subject: [PATCH] KVM: SVM: Truncate GPR value for DR and CR accesses in
!64-bit mode
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and
CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need
for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210422022128.3464144-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 301792542937..857bcf3a4cda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2451,7 +2451,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2497,7 +2497,7 @@ static int cr_interception(struct kvm_vcpu *vcpu)
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(vcpu, err);
@@ -2563,11 +2563,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn */
dr -= 16;
- val = kvm_register_read(vcpu, reg);
+ val = kvm_register_readl(vcpu, reg);
err = kvm_set_dr(vcpu, dr, val);
} else {
kvm_get_dr(vcpu, dr, &val);
- kvm_register_write(vcpu, reg, val);
+ kvm_register_writel(vcpu, reg, val);
}
return kvm_complete_insn_gp(vcpu, err);
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a3ba26ecfb569f4aa3f867e80c02aa65f20aadad Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Fri, 9 Apr 2021 09:38:42 -0500
Subject: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating
Access to the GHCB happens mainly in the VMGEXIT path, where it is known
that the GHCB will be mapped. But there are two paths where the GHCB might
not be mapped.
The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.
The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard in this
path to ensure a NULL pointer dereference is not encountered.
Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d7a923712ca9..bb4bf5ffb104 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2104,5 +2104,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
* the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
* non-zero value.
*/
+ if (!svm->ghcb)
+ return;
+
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6c39b0cd6ec6..4fc1f16505db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2757,7 +2757,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (!sev_es_guest(vcpu->kvm) || !err)
+ if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
return kvm_complete_insn_gp(vcpu, err);
ghcb_set_sw_exit_info_1(svm->ghcb, 1);
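The shape of the fix is a plain NULL guard. A self-contained sketch (the
struct layouts are illustrative, not the kernel's) shows how the SIPI
path now degrades to a no-op when the GHCB was never mapped.

#include <stddef.h>
#include <stdio.h>

struct ghcb_sketch { unsigned long sw_exit_info_2; };
struct vcpu_svm_sketch { struct ghcb_sketch *ghcb; };

static void deliver_sipi_sketch(struct vcpu_svm_sketch *svm)
{
        /* A SIPI without a preceding AP Reset Hold can leave the GHCB
         * unmapped; bail out instead of dereferencing NULL. */
        if (!svm->ghcb)
                return;

        svm->ghcb->sw_exit_info_2 = 1;  /* tell the caller a SIPI arrived */
}

int main(void)
{
        struct vcpu_svm_sketch svm = { .ghcb = NULL };

        deliver_sipi_sketch(&svm);      /* safe: the guard short-circuits */
        printf("no NULL dereference\n");
        return 0;
}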
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a3ba26ecfb569f4aa3f867e80c02aa65f20aadad Mon Sep 17 00:00:00 2001
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Fri, 9 Apr 2021 09:38:42 -0500
Subject: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating
Access to the GHCB happens mainly in the VMGEXIT path, where it is known
that the GHCB will be mapped. But there are two paths where the GHCB might
not be mapped.
The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.
The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard in this
path to ensure a NULL pointer dereference is not encountered.
Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: stable(a)vger.kernel.org
Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky(a)amd.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d7a923712ca9..bb4bf5ffb104 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2104,5 +2104,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
* the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
* non-zero value.
*/
+ if (!svm->ghcb)
+ return;
+
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6c39b0cd6ec6..4fc1f16505db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2757,7 +2757,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (!sev_es_guest(vcpu->kvm) || !err)
+ if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
return kvm_complete_insn_gp(vcpu, err);
ghcb_set_sw_exit_info_1(svm->ghcb, 1);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6f2b296aa6432d8274e258cc3220047ca04f5de0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 23 Apr 2021 15:34:01 -0700
Subject: [PATCH] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
unsupported
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210423223404.3860547-2-seanjc(a)google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd8c333ed2dc..9ed9c7bd7cfd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_TSC_AUX:
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
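The ordering of the checks can be condensed into a small standalone
predicate. tsc_aux_access_allowed() below is an illustrative model, not
kernel code; returning false corresponds to the MSR handler returning 1,
which makes KVM inject #GP.

#include <stdbool.h>
#include <stdio.h>

static bool tsc_aux_access_allowed(bool host_has_rdtscp,
                                   bool host_initiated,
                                   bool guest_cpuid_has_rdtscp)
{
        /* Host support is a hard requirement. */
        if (!host_has_rdtscp)
                return false;
        /* Guest-initiated accesses additionally require RDTSCP in the
         * guest's CPUID model; host-initiated ones are always allowed. */
        if (!host_initiated && !guest_cpuid_has_rdtscp)
                return false;
        return true;
}

int main(void)
{
        /* Guest RDMSR with RDTSCP hidden from its CPUID model: #GP. */
        printf("%s\n", tsc_aux_access_allowed(true, false, false) ?
               "allowed" : "#GP");
        return 0;
}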
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6f2b296aa6432d8274e258cc3220047ca04f5de0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 23 Apr 2021 15:34:01 -0700
Subject: [PATCH] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
unsupported
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210423223404.3860547-2-seanjc(a)google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd8c333ed2dc..9ed9c7bd7cfd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_TSC_AUX:
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6f2b296aa6432d8274e258cc3220047ca04f5de0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 23 Apr 2021 15:34:01 -0700
Subject: [PATCH] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
unsupported
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210423223404.3860547-2-seanjc(a)google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd8c333ed2dc..9ed9c7bd7cfd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_TSC_AUX:
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6f2b296aa6432d8274e258cc3220047ca04f5de0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 23 Apr 2021 15:34:01 -0700
Subject: [PATCH] KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP
unsupported
Inject #GP on guest accesses to MSR_TSC_AUX if RDTSCP is unsupported in
the guest's CPUID model.
Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210423223404.3860547-2-seanjc(a)google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd8c333ed2dc..9ed9c7bd7cfd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2674,6 +2674,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_TSC_AUX:
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr_info->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
msr_info->data = svm->tsc_aux;
break;
/*
@@ -2892,6 +2895,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
if (!boot_cpu_has(X86_FEATURE_RDTSCP))
return 1;
+ if (!msr->host_initiated &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+ return 1;
+
/*
* This is rare, so we update the MSR here instead of using
* direct_access_msrs. Doing that would require a rdmsr in
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8727906fde6ea665b52e68ddc58833772537f40a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Mar 2021 20:19:36 -0700
Subject: [PATCH] KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs
are created
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA. At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210331031936.2495277-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97d42a007b86..824bc7d22e77 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -182,6 +182,9 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
bool es_active = argp->id == KVM_SEV_ES_INIT;
int asid, ret;
+ if (kvm->created_vcpus)
+ return -EINVAL;
+
ret = -EBUSY;
if (unlikely(sev->active))
return ret;
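A compact sketch of the ordering rule, with hypothetical struct and
function names, captures why the check sits at the very top of the init
path, before any other state is touched.

#include <errno.h>
#include <stdio.h>

struct kvm_sketch { int created_vcpus; int sev_active; };

static int sev_guest_init_sketch(struct kvm_sketch *kvm)
{
        /* Per-vCPU state (the VMCB SEV flag, the VMSA) depends on the
         * VM's SEV tag, so init is rejected once vCPUs exist. */
        if (kvm->created_vcpus)
                return -EINVAL;
        if (kvm->sev_active)
                return -EBUSY;

        kvm->sev_active = 1;
        return 0;
}

int main(void)
{
        struct kvm_sketch kvm = { .created_vcpus = 1 };

        printf("init after vCPU creation -> %d\n",
               sev_guest_init_sketch(&kvm));
        return 0;
}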
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8727906fde6ea665b52e68ddc58833772537f40a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Mar 2021 20:19:36 -0700
Subject: [PATCH] KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs
are created
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES
prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
enabled, and svm_create_vcpu() needs to allocate the VMSA. At best,
creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
and/or behavior, and at worst it will crash the host, e.g.
sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command")
Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210331031936.2495277-4-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97d42a007b86..824bc7d22e77 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -182,6 +182,9 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
bool es_active = argp->id == KVM_SEV_ES_INIT;
int asid, ret;
+ if (kvm->created_vcpus)
+ return -EINVAL;
+
ret = -EBUSY;
if (unlikely(sev->active))
return ret;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6d1b867d045699d6ce0dfa0ef35d1b87dd36db56 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:56 -0800
Subject: [PATCH] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
Don't strip the C-bit from the faulting address on an intercepted #PF;
the address is a virtual address, not a physical address.
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-13-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb139f8..cc45ea16c0d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1898,7 +1898,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
static int pf_interception(struct vcpu_svm *svm)
{
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6d1b867d045699d6ce0dfa0ef35d1b87dd36db56 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:56 -0800
Subject: [PATCH] KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
Don't strip the C-bit from the faulting address on an intercepted #PF;
the address is a virtual address, not a physical address.
Fixes: 0ede79e13224 ("KVM: SVM: Clear C-bit from the page fault address")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-13-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58a45bb139f8..cc45ea16c0d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1898,7 +1898,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
static int pf_interception(struct vcpu_svm *svm)
{
- u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
+ u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
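The resulting guard logic is easy to model in isolation; should_yield()
below is an illustrative condensation of the checks in kvm_sched_yield(),
not the kernel function itself.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct vcpu_sketch { bool ready; };

static bool should_yield(const struct vcpu_sketch *self,
                         const struct vcpu_sketch *target)
{
        if (!target || !target->ready)
                return false;
        if (self == target)
                return false;           /* ignore requests to yield to self */
        return true;
}

int main(void)
{
        struct vcpu_sketch a = { .ready = true };

        printf("yield to self: %s\n", should_yield(&a, &a) ? "yes" : "no");
        return 0;
}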
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1fa4cbd53d9bc7bb0eaa7bcf7c8a5904372a4ec Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Fri, 9 Apr 2021 12:18:31 +0800
Subject: [PATCH] KVM: X86: Do not yield to self
If the target is self there is no need to yield; skipping this case also
prevents a malicious guest from abusing the yield path.
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Message-Id: <1617941911-5338-3-git-send-email-wanpengli(a)tencent.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a4bce181d7..66d2ab074a5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8228,6 +8228,10 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
if (!target || !READ_ONCE(target->ready))
goto no_yield;
+ /* Ignore requests to yield to self */
+ if (vcpu == target)
+ goto no_yield;
+
if (kvm_vcpu_yield_to(target) <= 0)
goto no_yield;
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Mark the PAE roots as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
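The lifecycle pairing the patch enforces can be sketched with logging
stand-ins for set_memory_decrypted()/set_memory_encrypted(); everything
below is illustrative userspace code, not the kernel allocator path.

#include <stdio.h>
#include <stdlib.h>

static void set_decrypted(void *p) { printf("decrypt %p\n", p); }
static void set_encrypted(void *p) { printf("encrypt %p\n", p); }

int main(void)
{
        int tdp_enabled = 0;            /* shadow paging */
        void *pae_root = calloc(1, 4096);

        if (!pae_root)
                return 1;
        /* Decrypt right after allocation so the CPU reads the PDPTEs
         * in the clear, since PAE CR3 cannot carry the C-bit. */
        if (!tdp_enabled)
                set_decrypted(pae_root);

        /* ... the page serves as the PAE root while the MMU lives ... */

        /* Re-encrypt before freeing, mirroring free_mmu_pages(). */
        if (!tdp_enabled)
                set_encrypted(pae_root);
        free(pae_root);
        return 0;
}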
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Mark the PAE roots as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Mark the PAE roots as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
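For anyone carrying this back, the substance of the mmu.c change is the pairing of set_memory_decrypted() at allocation with set_memory_encrypted() before the page is freed; both take a virtual address and a page count. Below is a minimal sketch of that lifecycle, condensed from the hunks above; the helper names are illustrative, not the kernel's.
#include <linux/gfp.h>
#include <asm/set_memory.h>
/*
 * Illustrative pairing (not the kernel's exact functions) of the
 * alloc/free discipline the patch establishes for the PAE root page
 * under shadow paging.
 */
static u64 *pae_root_alloc(bool tdp_enabled)
{
	u64 *pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
	if (!pae_root)
		return NULL;
	/*
	 * PAE CR3 is only 32 bits, so the C-bit can never be set for the
	 * PDPTEs; share (decrypt) the page so the CPU reads what KVM wrote.
	 */
	if (!tdp_enabled)
		set_memory_decrypted((unsigned long)pae_root, 1);
	return pae_root;
}
static void pae_root_free(u64 *pae_root, bool tdp_enabled)
{
	if (!pae_root)
		return;
	/* Restore the default encrypted attribute before freeing. */
	if (!tdp_enabled)
		set_memory_encrypted((unsigned long)pae_root, 1);
	free_page((unsigned long)pae_root);
}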
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Set the PAE roots used as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Set the PAE roots used as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4a98623d5d90175c0f99d185171e60807391e487 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 9 Mar 2021 14:42:07 -0800
Subject: [PATCH] KVM: x86/mmu: Mark the PAE roots as decrypted for shadow
paging
Set the PAE roots used as decrypted to play nice with SME when KVM is
using shadow paging. Explicitly skip setting the C-bit when loading
CR3 for PAE shadow paging, even though it's completely ignored by the
CPU. The extra documentation is nice to have.
Note, there are several subtleties at play with NPT. In addition to
legacy shadow paging, the PAE roots are used for SVM's NPT when either
KVM is 32-bit (uses PAE paging) or KVM is 64-bit and shadowing 32-bit
NPT. However, 32-bit Linux, and thus KVM, doesn't support SME. And
64-bit KVM can happily set the C-bit in CR3. This also means that
keeping __sme_set(root) for 32-bit KVM when NPT is enabled is
conceptually wrong, but functionally ok since SME is 64-bit only.
Leave it as is to avoid unnecessary pollution.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210309224207.1218275-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0576ff2846..c6ed633594a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -48,6 +48,7 @@
#include <asm/memtype.h>
#include <asm/cmpxchg.h>
#include <asm/io.h>
+#include <asm/set_memory.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
#include "trace.h"
@@ -3388,7 +3389,10 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->lm_root))
return -EIO;
- /* Unlike 32-bit NPT, the PDP table doesn't need to be in low mem. */
+ /*
+ * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
+ * doesn't need to be decrypted.
+ */
pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!pae_root)
return -ENOMEM;
@@ -5274,6 +5278,8 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
static void free_mmu_pages(struct kvm_mmu *mmu)
{
+ if (!tdp_enabled && mmu->pae_root)
+ set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
free_page((unsigned long)mmu->lm_root);
}
@@ -5308,6 +5314,20 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
return -ENOMEM;
mmu->pae_root = page_address(page);
+
+ /*
+ * CR3 is only 32 bits when PAE paging is used, thus it's impossible to
+ * get the CPU to treat the PDPTEs as encrypted. Decrypt the page so
+ * that KVM's writes and the CPU's reads get along. Note, this is
+ * only necessary when using shadow paging, as 64-bit NPT can get at
+ * the C-bit even when shadowing 32-bit NPT, and SME isn't supported
+ * by 32-bit kernels (when KVM itself uses 32-bit NPT).
+ */
+ if (!tdp_enabled)
+ set_memory_decrypted((unsigned long)mmu->pae_root, 1);
+ else
+ WARN_ON_ONCE(shadow_me_mask);
+
for (i = 0; i < 4; ++i)
mmu->pae_root[i] = INVALID_PAE_ROOT;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 58f4dc0e7864..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,9 +3907,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;
- root_hpa = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
@@ -3917,7 +3916,7 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
return;
cr3 = vcpu->arch.cr3;
} else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
- cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
} else {
/* PCID in the guest should be impossible with a 32-bit MMU. */
WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
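The svm.c half of the change is about where the C-bit may legally live: __sme_set() simply ORs in sme_me_mask, which is fine for nested_cr3 and for a 64-bit CR3 but can never be expressed in a 32-bit PAE CR3. A hypothetical condensation of the svm_load_mmu_pgd() decision follows, assuming the definitions in <linux/mem_encrypt.h>; the helper is for illustration only.
#include <linux/mem_encrypt.h>
#include <linux/types.h>
/*
 * Hypothetical condensation of the post-fix CR3 assembly: stamp the
 * C-bit only where the register format can actually hold it.
 */
static unsigned long shadow_cr3(u64 root_hpa, unsigned long pcid,
				bool long_mode)
{
	if (!long_mode) {
		/* 32-bit PAE CR3: no room for the C-bit (or a PCID). */
		return root_hpa;
	}
	return __sme_set(root_hpa) | pcid;
}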
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 17e368d94af77c1533bfd4136e080a33a6330282 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:54 -0800
Subject: [PATCH] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM
pseudo-PDPTRs
Set the C-bit in SPTEs that are set outside of the normal MMU flows,
specifically the PDPTRs and the handful of special-cased "LM root"
entries, all of which are shadow paging only.
Note, the direct-mapped-root PDPTR handling is needed for the scenario
where paging is disabled in the guest, in which case KVM uses a direct
mapped MMU even though TDP is disabled.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-11-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index da94734272db..ff7e050ba1ac 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3261,7 +3261,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
i << 30, PT32_ROOT_LEVEL, true);
- mmu->pae_root[i] = root | PT_PRESENT_MASK;
+ mmu->pae_root[i] = root | PT_PRESENT_MASK |
+ shadow_me_mask;
}
mmu->root_hpa = __pa(mmu->pae_root);
} else
@@ -3314,7 +3315,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
* or a PAE 3-level page table. In either case we need to be aware that
* the shadow page table may be a PAE or a long mode page table.
*/
- pm_mask = PT_PRESENT_MASK;
+ pm_mask = PT_PRESENT_MASK | shadow_me_mask;
if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 17e368d94af77c1533bfd4136e080a33a6330282 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:54 -0800
Subject: [PATCH] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM
pseudo-PDPTRs
Set the C-bit in SPTEs that are set outside of the normal MMU flows,
specifically the PDPTRs and the handful of special-cased "LM root"
entries, all of which are shadow paging only.
Note, the direct-mapped-root PDPTR handling is needed for the scenario
where paging is disabled in the guest, in which case KVM uses a direct
mapped MMU even though TDP is disabled.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-11-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index da94734272db..ff7e050ba1ac 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3261,7 +3261,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
i << 30, PT32_ROOT_LEVEL, true);
- mmu->pae_root[i] = root | PT_PRESENT_MASK;
+ mmu->pae_root[i] = root | PT_PRESENT_MASK |
+ shadow_me_mask;
}
mmu->root_hpa = __pa(mmu->pae_root);
} else
@@ -3314,7 +3315,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
* or a PAE 3-level page table. In either case we need to be aware that
* the shadow page table may be a PAE or a long mode page table.
*/
- pm_mask = PT_PRESENT_MASK;
+ pm_mask = PT_PRESENT_MASK | shadow_me_mask;
if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 17e368d94af77c1533bfd4136e080a33a6330282 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 4 Mar 2021 17:10:54 -0800
Subject: [PATCH] KVM: x86/mmu: Set the C-bit in the PDPTRs and LM
pseudo-PDPTRs
Set the C-bit in SPTEs that are set outside of the normal MMU flows,
specifically the PDPTRs and the handful of special-cased "LM root"
entries, all of which are shadow paging only.
Note, the direct-mapped-root PDPTR handling is needed for the scenario
where paging is disabled in the guest, in which case KVM uses a direct
mapped MMU even though TDP is disabled.
Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable(a)vger.kernel.org
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210305011101.3597423-11-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index da94734272db..ff7e050ba1ac 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3261,7 +3261,8 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
root = mmu_alloc_root(vcpu, i << (30 - PAGE_SHIFT),
i << 30, PT32_ROOT_LEVEL, true);
- mmu->pae_root[i] = root | PT_PRESENT_MASK;
+ mmu->pae_root[i] = root | PT_PRESENT_MASK |
+ shadow_me_mask;
}
mmu->root_hpa = __pa(mmu->pae_root);
} else
@@ -3314,7 +3315,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
* or a PAE 3-level page table. In either case we need to be aware that
* the shadow page table may be a PAE or a long mode page table.
*/
- pm_mask = PT_PRESENT_MASK;
+ pm_mask = PT_PRESENT_MASK | shadow_me_mask;
if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
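This second fix is the mirror image of the first: the special-cased PAE/LM roots are ordinary encrypted pages, so the pseudo-PDPTR entries that point at them must carry the C-bit via shadow_me_mask. A minimal sketch of the mask construction from the hunk above; the helper is hypothetical, and PT_PRESENT_MASK, PT_ACCESSED_MASK, PT_WRITABLE_MASK, PT_USER_MASK, PT64_ROOT_4LEVEL, and shadow_me_mask are the KVM-internal symbols used in mmu.c.
/*
 * Hypothetical helper condensing the pm_mask construction in
 * mmu_alloc_shadow_roots(): present bit plus the memory-encryption
 * mask, with the extra software bits only for a 4-level shadow root.
 */
static u64 special_root_mask(int shadow_root_level)
{
	u64 pm_mask = PT_PRESENT_MASK | shadow_me_mask;
	if (shadow_root_level == PT64_ROOT_4LEVEL)
		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK |
			   PT_USER_MASK;
	return pm_mask;
}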