There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism that is similar in
concept to, but entirely separate from, the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed,
which effectively wastes it while the enclave is running. The memory is
only recovered when the enclave is destroyed and the backing storage is
freed.
Sort this out by freeing the backing shmem with shmem_truncate_range() as
soon as a page is faulted back into the EPC. In addition, free the memory
for a PCMD page as soon as all of the PCMDs in it have been marked as
unused by zeroing their contents.
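For illustration, the resulting fault path looks like this (a simplified
sketch of the flow in the diff below, not part of the patch; it assumes 4K
pages and a 128-byte struct sgx_pcmd, i.e. 32 PCMDs per backing page):

	/*
	 * __sgx_encl_eldu():
	 *   ENCLS[ELDU]                       <- load page back into EPC
	 *   memset(pcmd, 0, sizeof(*pcmd))    <- mark this PCMD unused
	 *   pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE)
	 *   truncate contents backing page    <- data now lives in EPC
	 *   if (pcmd_page_empty)
	 *       truncate PCMD backing page    <- all 32 PCMDs unused
	 */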
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v2:
* Rewrite commit message as proposed by Dave.
* Truncate PCMD pages (Dave).
---
arch/x86/kernel/cpu/sgx/encl.c | 48 +++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..ea43c10e5458 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,27 @@
#include "encls.h"
#include "sgx.h"
+
+/*
+ * Get the index of the page in the backing storage that stores the PCMD of
+ * the enclave page at the given page index. PCMD pages are located after the
+ * backing storage for the visible enclave pages and SECS.
+ */
+static inline pgoff_t sgx_encl_get_backing_pcmd_nr(struct sgx_encl *encl, pgoff_t index)
+{
+ return PFN_DOWN(encl->size) + 1 + (index / (PAGE_SIZE / sizeof(struct sgx_pcmd)));
+}
+
+/*
+ * Free the page at the given index from the backing storage.
+ */
+static inline void sgx_encl_truncate_backing_page(struct sgx_encl *encl, pgoff_t index)
+{
+ struct inode *inode = file_inode(encl->backing);
+
+ shmem_truncate_range(inode, PFN_PHYS(index), PFN_PHYS(index) + PAGE_SIZE - 1);
+}
+
/*
* ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC
* Pages" in the SDM.
@@ -24,7 +45,10 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
struct sgx_encl *encl = encl_page->encl;
struct sgx_pageinfo pginfo;
struct sgx_backing b;
+ bool pcmd_page_empty;
pgoff_t page_index;
+ pgoff_t pcmd_index;
+ u8 *pcmd_page;
int ret;
if (secs_page)
@@ -38,8 +62,8 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
pginfo.addr = encl_page->desc & PAGE_MASK;
pginfo.contents = (unsigned long)kmap_atomic(b.contents);
- pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
- b.pcmd_offset;
+ pcmd_page = kmap_atomic(b.pcmd);
+ pginfo.metadata = (unsigned long)pcmd_page + b.pcmd_offset;
if (secs_page)
pginfo.secs = (u64)sgx_get_epc_virt_addr(secs_page);
@@ -55,11 +79,27 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
ret = -EFAULT;
}
- kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+ memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
+
+ /*
+ * The area for the PCMD in the page was zeroed above. Check if the
+ * whole page is now empty, meaning that all PCMDs have been zeroed:
+ */
+ pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
+
+ kunmap_atomic(pcmd_page);
kunmap_atomic((void *)(unsigned long)pginfo.contents);
sgx_encl_put_backing(&b, false);
+ /* Free the backing memory. */
+ sgx_encl_truncate_backing_page(encl, page_index);
+
+ if (pcmd_page_empty) {
+ pcmd_index = sgx_encl_get_backing_pcmd_nr(encl, page_index);
+ sgx_encl_truncate_backing_page(encl, pcmd_index);
+ }
+
return ret;
}
@@ -577,7 +617,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
struct sgx_backing *backing)
{
- pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
+ pgoff_t pcmd_index = sgx_encl_get_backing_pcmd_nr(encl, page_index);
struct page *contents;
struct page *pcmd;
--
2.32.0
From: Juergen Gross <jgross@suse.com>
[ Upstream commit 897919ad8b42eb8222553838ab82414a924694aa ]
This configuration option provides a misc device as an API to userspace.
Make this API usable without having to select the module as a transitive
dependency.
This also fixes an issue where localyesconfig would select
CONFIG_XEN_PRIVCMD=m because it was not visible and defaulted to
building as module.
[boris: clarified help message per Jan's suggestion]
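For context, userspace reaches this driver through /dev/xen/privcmd. A
minimal sketch (hedged: it assumes the uapi header <xen/privcmd.h>; the
hypercall number 17 is __HYPERVISOR_xen_version and is only illustrative;
error handling is omitted):

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <xen/privcmd.h>

	int main(void)
	{
		/* Ask the hypervisor for its version via the passthrough ioctl. */
		struct privcmd_hypercall call = {
			.op  = 17,	/* __HYPERVISOR_xen_version */
			.arg = { 0 },	/* XENVER_version, no argument buffer */
		};
		int fd = open("/dev/xen/privcmd", O_RDWR);

		if (fd < 0)
			return 1;
		ioctl(fd, IOCTL_PRIVCMD_HYPERCALL, &call);
		close(fd);
		return 0;
	}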
Based-on-patch-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211116143323.18866-1-jgross@suse.com
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/xen/Kconfig | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 3a14948269b1e..59f862350a6ec 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -199,9 +199,15 @@ config XEN_SCSI_BACKEND
if guests need generic access to SCSI devices.
config XEN_PRIVCMD
- tristate
+ tristate "Xen hypercall passthrough driver"
depends on XEN
default m
+ help
+ The hypercall passthrough driver allows privileged user programs to
+ perform Xen hypercalls. This driver is normally required for systems
+ running as Dom0 to perform privileged operations, but in some
+ disaggregated Xen setups this driver might be needed for other
+ domains, too.
config XEN_STUB
bool "Xen stub drivers"
--
2.33.0
From: Bob Peterson <rpeterso@redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
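Spelled out, the problematic interleaving looks like this (illustrative,
not from the commit):

	evict(old inode)                    new inode, same block
	----------------                    ---------------------
	release inode glock
	                                    lock inode glock (EX)
	                                    lock iopen glock (SH) <- glock reused
	                                    set iopen gl_object to new inode
	clear iopen gl_object  <- too late: wipes the new inode's pointer

Moving the iopen teardown before the inode glock release closes the window,
because the new user cannot take the inode glock until evict is finished
with the iopen glock.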
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6a355e1347d7f..d2b7ecbd1b150 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1438,13 +1438,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1457,6 +1450,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
From: Bob Peterson <rpeterso@redhat.com>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/gfs2/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6e00d15ef0a82..cc51b5f5f52d8 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1402,13 +1402,6 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_ordered_del_inode(ip);
clear_inode(inode);
gfs2_dir_hash_inval(ip);
- if (ip->i_gl) {
- glock_clear_object(ip->i_gl, ip);
- wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
- gfs2_glock_add_to_lru(ip->i_gl);
- gfs2_glock_put_eventually(ip->i_gl);
- ip->i_gl = NULL;
- }
if (gfs2_holder_initialized(&ip->i_iopen_gh)) {
struct gfs2_glock *gl = ip->i_iopen_gh.gh_gl;
@@ -1421,6 +1414,13 @@ static void gfs2_evict_inode(struct inode *inode)
gfs2_holder_uninit(&ip->i_iopen_gh);
gfs2_glock_put_eventually(gl);
}
+ if (ip->i_gl) {
+ glock_clear_object(ip->i_gl, ip);
+ wait_on_bit_io(&ip->i_flags, GIF_GLOP_PENDING, TASK_UNINTERRUPTIBLE);
+ gfs2_glock_add_to_lru(ip->i_gl);
+ gfs2_glock_put_eventually(ip->i_gl);
+ ip->i_gl = NULL;
+ }
}
static struct inode *gfs2_alloc_inode(struct super_block *sb)
--
2.33.0
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory.
As part of adding this new entry, a new EFI memory map is allocated and
mapped. The mapping is where a problem can occur. This new EFI memory map
is mapped using early_memremap(). If the allocated memory comes from an
area that is marked as EFI_BOOT_SERVICES_DATA memory in the current EFI
memory map, then it will be mapped unencrypted (see memremap_is_efi_data()
and the call to efi_mem_type()).
However, during replacement of the old EFI memory map with the new one,
efi_mem_type() is disabled, resulting in the new EFI memory map always
being mapped encrypted in efi.memmap. This causes a kernel crash later in
the boot process.
Since it is known that the new EFI memory map will always be mapped
encrypted when efi_memmap_install() is called, explicitly map the new EFI
memory map as encrypted (using early_memremap_prot()) when inserting the
new memory map entry.
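For reference, the path that produces the mismatch, pieced together from
the functions named above (a simplified sketch; the exact call chain
varies by kernel version):

	/*
	 * Before the map is replaced:
	 *   early_memremap(phys, size)
	 *     -> memremap_is_efi_data(phys, size)
	 *          -> efi_mem_type(phys) == EFI_BOOT_SERVICES_DATA
	 *               -> mapped unencrypted under SME
	 *
	 * While efi_memmap_install() runs, efi_mem_type() is disabled, so
	 * the same memory is mapped encrypted -- mismatching the earlier
	 * unencrypted writes.
	 */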
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
Changes for v2:
- Update/expand commit message to (hopefully) make it easier to read and
understand
- Add a comment around the use of the early_memremap_prot() call
---
arch/x86/platform/efi/quirks.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index b15ebfe40a73..14f8f20d727a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -277,7 +277,19 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
return;
}
- new = early_memremap(data.phys_map, data.size);
+ /*
+ * When SME is active, early_memremap() can map the memory unencrypted
+ * if the allocation came from EFI_BOOT_SERVICES_DATA (see
+ * memremap_is_efi_data() and the call to efi_mem_type()). However,
+ * when efi_memmap_install() is called to replace the memory map,
+ * efi_mem_type() is "disabled" and so the memory will always be mapped
+ * encrypted. To avoid this possible mismatch between the mappings,
+ * always map the newly allocated memmap memory as encrypted.
+ *
+ * When SME is not active, this behaves just like early_memremap().
+ */
+ new = early_memremap_prot(data.phys_map, data.size,
+ pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
--
2.33.1
Add the missing bulk-endpoint max-packet sanity check to probe() to
avoid division by zero in uvc_alloc_urb_buffers() in case a malicious
device has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver")
Cc: stable@vger.kernel.org # 2.6.26
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/media/usb/uvc/uvc_video.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index e16464606b14..85ac5c1081b6 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1958,6 +1958,10 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
if (ep == NULL)
return -EIO;
+ /* Reject broken descriptors. */
+ if (usb_endpoint_maxp(&ep->desc) == 0)
+ return -EIO;
+
ret = uvc_init_video_bulk(stream, ep, gfp_flags);
}
--
2.32.0