Partially revert commit 5f501d555653 ("binfmt_elf: reintroduce using
MAP_FIXED_NOREPLACE").
At least ia64 has ET_EXEC PT_LOAD segments that are not virtual-address
contiguous (but _are_ file-offset contiguous). This would result in
giant mapping attempts to cover the entire span, including the virtual
address range hole. Disable total_mapping_size for ET_EXEC, which
reduces the MAP_FIXED_NOREPLACE coverage to only the first PT_LOAD:
$ readelf -lW /usr/bin/gcc
...
Program Headers:
  Type Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   ...
...
  LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 ...
  LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 ...
...
       ^^^^^^^^ ^^^^^^^^^^^^^^^^^^                    ^^^^^^^^ ^^^^^^^^

File offset range     : 0x000000-0x00bb4c
                        0x00bb4c bytes
Virtual address range : 0x4000000000000000-0x600000000000bcb0
                        0x200000000000bcb0 bytes
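For reference, the initial-mapping span is derived from the first and last
PT_LOAD entries, roughly like this (a simplified sketch of
total_mapping_size() as of v5.16, not a verbatim copy); with the program
headers above it yields the 0x200000000000bcb0-byte request:

	static unsigned long total_mapping_size(const struct elf_phdr *cmds, int nr)
	{
		int i, first_idx = -1, last_idx = -1;

		for (i = 0; i < nr; i++) {
			if (cmds[i].p_type == PT_LOAD) {
				last_idx = i;
				if (first_idx == -1)
					first_idx = i;
			}
		}
		if (first_idx == -1)
			return 0;

		/* Page-aligned start of the first LOAD to the end of the
		 * last LOAD: any virtual address hole in between is
		 * included, which is the ET_EXEC failure mode above. */
		return cmds[last_idx].p_vaddr + cmds[last_idx].p_memsz -
		       ELF_PAGESTART(cmds[first_idx].p_vaddr);
	}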
Ironically, this is the reverse of the problem that originally made
ET_EXEC and MAP_FIXED_NOREPLACE interact badly: overlaps. This problem
is holes. Future work could restore full coverage if load_elf_binary()
were to perform mappings in a separate phase from the loading (where
it could resolve both overlaps and holes).
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Reported-by: matoro <matoro_mailinglist_kernel(a)matoro.tk>
Reported-by: John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de>
Fixes: 5f501d555653 ("binfmt_elf: reintroduce using MAP_FIXED_NOREPLACE")
Link: https://lore.kernel.org/r/a3edd529-c42d-3b09-135c-7e98a15b150f@leemhuis.info
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
Here's the v5.16 backport.
---
fs/binfmt_elf.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index f8c7f26f1fbb..911a9e7044f4 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1135,14 +1135,25 @@ static int load_elf_binary(struct linux_binprm *bprm)
* is then page aligned.
*/
load_bias = ELF_PAGESTART(load_bias - vaddr);
- }
- /*
- * Calculate the entire size of the ELF mapping (total_size).
- * (Note that load_addr_set is set to true later once the
- * initial mapping is performed.)
- */
- if (!load_addr_set) {
+ /*
+ * Calculate the entire size of the ELF mapping
+ * (total_size), used for the initial mapping,
+ * due to load_addr_set which is set to true later
+ * once the initial mapping is performed.
+ *
+ * Note that this is only sensible when the LOAD
+ * segments are contiguous (or overlapping). If
+ * used for LOADs that are far apart, this would
+ * cause the holes between LOADs to be mapped,
+ * running the risk of having the mapping fail,
+ * as it would be larger than the ELF file itself.
+ *
+ * As a result, only ET_DYN does this, since
+ * some ET_EXEC (e.g. ia64) may have virtual
+ * memory holes between LOADs.
+ *
+ */
total_size = total_mapping_size(elf_phdata,
elf_ex->e_phnum);
if (!total_size) {
--
2.32.0
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing the backing memory with shmem_truncate_range()
as soon as a page is faulted back into the EPC. In addition, free the
memory for PCMD pages as soon as all PCMD entries in a page have been
marked as unused by zeroing their contents.
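For reference, the offset math below assumes this backing store layout
(a sketch; sizes assume 4 KiB pages and the architectural 128-byte PCMD,
so 32 PCMD entries share one backing page, and a PCMD page can only be
truncated once all 32 have been zeroed):

	/*
	 * Backing shmem layout (sketch, not verbatim kernel code):
	 *
	 *   [0, encl->size)                      enclave page contents
	 *   [encl->size, encl->size + PAGE_SIZE) SECS page
	 *   [encl->size + PAGE_SIZE, ...)        PCMD entries, 128 B each
	 */
	pgoff_t pcmd_off = encl->size + PAGE_SIZE /* SECS */ +
			   page_index * sizeof(struct sgx_pcmd);

	pgoff_t page   = PFN_DOWN(pcmd_off);         /* backing page to load/truncate */
	size_t  offset = pcmd_off & (PAGE_SIZE - 1); /* this entry inside that page   */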
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v4:
* Sanitized the offset calculations.
v3:
* Resend.
v2:
* Rewrite commit message as proposed by Dave.
* Truncate PCMD pages (Dave).
---
arch/x86/kernel/cpu/sgx/encl.c | 44 +++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 8be6f0592bdc..94319190efb1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,17 @@
#include "encls.h"
#include "sgx.h"
+
+/*
+ * Free a page from the backing storage at the given page index.
+ */
+static inline void sgx_encl_truncate_backing_page(struct sgx_encl *encl, pgoff_t index)
+{
+ struct inode *inode = file_inode(encl->backing);
+
+ shmem_truncate_range(inode, PFN_PHYS(index), PFN_PHYS(index) + PAGE_SIZE - 1);
+}
+
/*
* ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC
* Pages" in the SDM.
@@ -24,7 +35,9 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
struct sgx_encl *encl = encl_page->encl;
struct sgx_pageinfo pginfo;
struct sgx_backing b;
+ bool pcmd_page_empty;
pgoff_t page_index;
+ u8 *pcmd_page;
int ret;
if (secs_page)
@@ -38,8 +51,8 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
pginfo.addr = encl_page->desc & PAGE_MASK;
pginfo.contents = (unsigned long)kmap_atomic(b.contents);
- pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
- b.pcmd_offset;
+ pcmd_page = kmap_atomic(b.pcmd);
+ pginfo.metadata = (unsigned long)pcmd_page + b.pcmd_offset;
if (secs_page)
pginfo.secs = (u64)sgx_get_epc_virt_addr(secs_page);
@@ -55,11 +68,28 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
ret = -EFAULT;
}
- kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+ memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
+
+ /*
+ * The area for the PCMD in the page was zeroed above. Check if the
+ * whole page is now empty, meaning that all PCMDs have been zeroed:
+ */
+ pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
+
+ kunmap_atomic(pcmd_page);
kunmap_atomic((void *)(unsigned long)pginfo.contents);
sgx_encl_put_backing(&b, false);
+ sgx_encl_truncate_backing_page(encl, page_index);
+
+ if (pcmd_page_empty) {
+ pgoff_t pcmd_off = encl->size + PAGE_SIZE /* SECS */ +
+ page_index * sizeof(struct sgx_pcmd);
+
+ sgx_encl_truncate_backing_page(encl, PFN_DOWN(pcmd_off));
+ }
+
return ret;
}
@@ -583,7 +613,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
struct sgx_backing *backing)
{
- pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
+ pgoff_t pcmd_off = encl->size + PAGE_SIZE /* SECS */ + page_index * sizeof(struct sgx_pcmd);
struct page *contents;
struct page *pcmd;
@@ -591,7 +621,7 @@ static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
if (IS_ERR(contents))
return PTR_ERR(contents);
- pcmd = sgx_encl_get_backing_page(encl, pcmd_index);
+ pcmd = sgx_encl_get_backing_page(encl, PFN_DOWN(pcmd_off));
if (IS_ERR(pcmd)) {
put_page(contents);
return PTR_ERR(pcmd);
@@ -600,9 +630,7 @@ static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
backing->page_index = page_index;
backing->contents = contents;
backing->pcmd = pcmd;
- backing->pcmd_offset =
- (page_index & (PAGE_SIZE / sizeof(struct sgx_pcmd) - 1)) *
- sizeof(struct sgx_pcmd);
+ backing->pcmd_offset = pcmd_off & (PAGE_SIZE - 1);
return 0;
}
--
2.35.1
It's possible to change which CRTC is in use for a given
connector/encoder/bridge while we're in self-refresh without fully
disabling the connector/encoder/bridge along the way. This can confuse
the encoder/bridge, because
(a) it needs to track the SR state (trying to perform "active"
operations while the panel is still in SR can be Bad(TM)); and
(b) it tracks the SR state via the CRTC state (and after the switch, the
previous SR state is lost).
Thus, we need to either somehow carry the self-refresh state over to the
new CRTC, or else force an encoder/bridge self-refresh transition during
such a switch.
I choose the latter, so we disable the encoder (and exit PSR) before
attaching it to the new CRTC (where we can continue to assume a clean
(non-self-refresh) state).
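To illustrate why (a hypothetical bridge driver sketch; the example_*()
helpers are made up, the drm_atomic_*() accessors are real): a bridge's
disable hook can only tell "enter self-refresh" apart from "really power
off" by looking at CRTC state, which is exactly what a silent CRTC
switch invalidates:

	#include <drm/drm_atomic.h>
	#include <drm/drm_bridge.h>

	/* Hypothetical hardware helpers, assumed for illustration: */
	static void example_stop_video(struct drm_bridge *bridge) { /* ... */ }
	static void example_power_off(struct drm_bridge *bridge) { /* ... */ }

	static void example_bridge_atomic_disable(struct drm_bridge *bridge,
						  struct drm_bridge_state *old_bridge_state)
	{
		struct drm_atomic_state *state = old_bridge_state->base.state;
		struct drm_crtc_state *new_crtc_state = NULL;
		struct drm_connector_state *conn_state;
		struct drm_connector *connector;

		connector = drm_atomic_get_new_connector_for_encoder(state,
								     bridge->encoder);
		conn_state = connector ?
			drm_atomic_get_new_connector_state(state, connector) : NULL;
		if (conn_state && conn_state->crtc)
			new_crtc_state = drm_atomic_get_new_crtc_state(state,
								       conn_state->crtc);

		if (new_crtc_state && new_crtc_state->self_refresh_active) {
			/* Panel refreshes from its own buffer: pause the video
			 * stream but keep the panel and link powered. */
			example_stop_video(bridge);
		} else {
			/* Full power-off. Before this patch, a CRTC switch
			 * while in self-refresh never reached a disable at
			 * all, so the bridge's notion of SR state went stale. */
			example_stop_video(bridge);
			example_power_off(bridge);
		}
	}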
This fixes PSR issues seen on Rockchip RK3399 systems with
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c.
Change in v2:
- Drop "->enable" condition; this could possibly be "->active" to
reflect the intended hardware state, but it also is a little
over-specific. We want to make a transition through "disabled" any
time we're exiting PSR at the same time as a CRTC switch.
(Thanks Liu Ying)
Cc: Liu Ying <victor.liu(a)oss.nxp.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in drivers")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/gpu/drm/drm_atomic_helper.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 9603193d2fa1..987e4b212e9f 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1011,9 +1011,19 @@ crtc_needs_disable(struct drm_crtc_state *old_state,
return drm_atomic_crtc_effectively_active(old_state);
/*
- * We need to run through the crtc_funcs->disable() function if the CRTC
- * is currently on, if it's transitioning to self refresh mode, or if
- * it's in self refresh mode and needs to be fully disabled.
+ * We need to disable bridge(s) and CRTC if we're transitioning out of
+ * self-refresh and changing CRTCs at the same time, because the
+ * bridge tracks self-refresh status via CRTC state.
+ */
+ if (old_state->self_refresh_active &&
+ old_state->crtc != new_state->crtc)
+ return true;
+
+ /*
+ * We also need to run through the crtc_funcs->disable() function if
+ * the CRTC is currently on, if it's transitioning to self refresh
+ * mode, or if it's in self refresh mode and needs to be fully
+ * disabled.
*/
return old_state->active ||
(old_state->self_refresh_active && !new_state->active) ||
--
2.35.1.574.g5d30c73bfb-goog
If ksize() is used on an allocation, the compiler cannot make any
assumptions about its size any more (as hinted by __alloc_size), since
the caller is allowed to use every byte ksize() reports. Force it to
forget the hint.
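To illustrate (a hypothetical caller, not code from this patch):
OPTIMIZER_HIDE_VAR() launders the pointer through an empty asm so the
compiler can no longer connect it to the kmalloc() size hint:

	char *p = kmalloc(5, GFP_KERNEL);	/* __alloc_size says: 5 bytes */
	size_t sz;

	if (!p)
		return -ENOMEM;

	sz = ksize(p);		/* e.g. 8, if this lands in the kmalloc-8 bucket */
	memset(p, 0, sz);	/* using the slack is legal, but without hiding
				 * the pointer the compiler would still "know"
				 * p is 5 bytes and could warn or trap here */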
One caller was using a container_of() construction that needed to be
worked around.
Cc: Marco Elver <elver(a)google.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: linux-mm(a)kvack.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1599
Fixes: c37495d6254c ("slab: add __alloc_size attributes for better bounds checking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
This appears to work for me, but I'm waiting for more feedback on the
specific instance that got tripped over in Android.
---
drivers/base/devres.c | 4 +++-
include/linux/slab.h | 26 +++++++++++++++++++++++++-
mm/slab_common.c | 19 +++----------------
3 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index eaa9a5cd1db9..1a2645bd7234 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -855,6 +855,7 @@ void *devm_krealloc(struct device *dev, void *ptr, size_t new_size, gfp_t gfp)
size_t total_new_size, total_old_size;
struct devres *old_dr, *new_dr;
unsigned long flags;
+ void *allocation;
if (unlikely(!new_size)) {
devm_kfree(dev, ptr);
@@ -874,7 +875,8 @@ void *devm_krealloc(struct device *dev, void *ptr, size_t new_size, gfp_t gfp)
if (!check_dr_size(new_size, &total_new_size))
return NULL;
- total_old_size = ksize(container_of(ptr, struct devres, data));
+ allocation = container_of(ptr, struct devres, data);
+ total_old_size = ksize(allocation);
if (total_old_size == 0) {
WARN(1, "Pointer doesn't point to dynamically allocated memory.");
return NULL;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 37bde99b74af..a14f3bfa2f44 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -182,8 +182,32 @@ int kmem_cache_shrink(struct kmem_cache *s);
void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2);
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
+
+/**
+ * ksize - get the actual amount of memory allocated for a given object
+ * @objp: Pointer to the object
+ *
+ * kmalloc may internally round up allocations and return more memory
+ * than requested. ksize() can be used to determine the actual amount of
+ * memory allocated. The caller may use this additional memory, even though
+ * a smaller amount of memory was initially specified with the kmalloc call.
+ * The caller must guarantee that objp points to a valid object previously
+ * allocated with either kmalloc() or kmem_cache_alloc(). The object
+ * must not be freed during the duration of the call.
+ *
+ * Return: size of the actual memory used by @objp in bytes
+ */
+#define ksize(objp) ({ \
+ /* \
+ * Getting the actual allocation size means the __alloc_size \
+ * hints are no longer valid, and the compiler needs to \
+ * forget about them. \
+ */ \
+ OPTIMIZER_HIDE_VAR(objp); \
+ _ksize(objp); \
+})
size_t __ksize(const void *objp);
-size_t ksize(const void *objp);
+size_t _ksize(const void *objp);
#ifdef CONFIG_PRINTK
bool kmem_valid_obj(void *object);
void kmem_dump_obj(void *object);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23f2ab0713b7..ba5fa8481396 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1245,21 +1245,8 @@ void kfree_sensitive(const void *p)
}
EXPORT_SYMBOL(kfree_sensitive);
-/**
- * ksize - get the actual amount of memory allocated for a given object
- * @objp: Pointer to the object
- *
- * kmalloc may internally round up allocations and return more memory
- * than requested. ksize() can be used to determine the actual amount of
- * memory allocated. The caller may use this additional memory, even though
- * a smaller amount of memory was initially specified with the kmalloc call.
- * The caller must guarantee that objp points to a valid object previously
- * allocated with either kmalloc() or kmem_cache_alloc(). The object
- * must not be freed during the duration of the call.
- *
- * Return: size of the actual memory used by @objp in bytes
- */
-size_t ksize(const void *objp)
+/* Should not be called directly: use "ksize" macro wrapper. */
+size_t _ksize(const void *objp)
{
size_t size;
@@ -1289,7 +1276,7 @@ size_t ksize(const void *objp)
kasan_unpoison_range(objp, size);
return size;
}
-EXPORT_SYMBOL(ksize);
+EXPORT_SYMBOL(_ksize);
/* Tracepoints definitions. */
EXPORT_TRACEPOINT_SYMBOL(kmalloc);
--
2.30.2