Commit a799c2bd29d19c565 ("x86/setup: Consolidate early memory
reservations") introduced early_reserve_memory() to do all needed
initial memblock_reserve() calls in one function. Unfortunately the
call of early_reserve_memory() is done too late for Xen dom0, as in
some cases a Xen hook called by e820__memory_setup() will need those
memory reservations to have happened already.
Move the call of early_reserve_memory() before the call of
e820__memory_setup() in order to avoid such problems.
Cc: stable(a)vger.kernel.org
Fixes: a799c2bd29d19c565 ("x86/setup: Consolidate early memory reservations")
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
V2:
- update comment (Jan Beulich, Boris Petkov)
- move call down in setup_arch() (Mike Galbraith)
---
arch/x86/kernel/setup.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 79f164141116..40ed44ead063 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -830,6 +830,20 @@ void __init setup_arch(char **cmdline_p)
x86_init.oem.arch_setup();
+ /*
+ * Do some memory reservations *before* memory is added to memblock, so
+ * memblock allocations won't overwrite it.
+ *
+ * After this point, everything still needed from the boot loader or
+ * firmware or kernel text should be early reserved or marked not RAM in
+ * e820. All other memory is free game.
+ *
+ * This call needs to happen before e820__memory_setup() which calls the
+ * xen_memory_setup() on Xen dom0 which relies on the fact that those
+ * early reservations have happened already.
+ */
+ early_reserve_memory();
+
iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
e820__memory_setup();
parse_setup_data();
@@ -876,18 +890,6 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
- /*
- * Do some memory reservations *before* memory is added to
- * memblock, so memblock allocations won't overwrite it.
- * Do it after early param, so we could get (unlikely) panic from
- * serial.
- *
- * After this point everything still needed from the boot loader or
- * firmware or kernel text should be early reserved or marked not
- * RAM in e820. All other memory is free game.
- */
- early_reserve_memory();
-
#ifdef CONFIG_MEMORY_HOTPLUG
/*
* Memory used by the kernel cannot be hot-removed because Linux
--
2.26.2
All of entries are freed in a loop, however, the freed entry is accessed
by list_del() after the loop.
When epf driver that includes pci-epf-test unload, the pci_epf_remove_cfs()
is called, and occurred the use after free. Therefore, kernel panics
randomly after or while the module unloading.
I tested this patch with r8a77951-Salvator-xs boards.
Fixes: ef1433f ("PCI: endpoint: Create configfs entry for each pci_epf_device_id table entry")
Signed-off-by: Shunsuke Mie <mie(a)igel.co.jp>
---
drivers/pci/endpoint/pci-epf-core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/endpoint/pci-epf-core.c b/drivers/pci/endpoint/pci-epf-core.c
index e9289d10f822..538e902b0ba6 100644
--- a/drivers/pci/endpoint/pci-epf-core.c
+++ b/drivers/pci/endpoint/pci-epf-core.c
@@ -202,8 +202,10 @@ static void pci_epf_remove_cfs(struct pci_epf_driver *driver)
return;
mutex_lock(&pci_epf_mutex);
- list_for_each_entry_safe(group, tmp, &driver->epf_group, group_entry)
+ list_for_each_entry_safe(group, tmp, &driver->epf_group, group_entry) {
+ list_del(&group->group_entry);
pci_ep_cfs_remove_epf_group(group);
+ }
list_del(&driver->epf_group);
mutex_unlock(&pci_epf_mutex);
}
--
2.17.1
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory.
As part of adding this new entry, a new EFI memory map is allocated and
mapped. The mapping is where a problem can occur. This new EFI memory map
is mapped using early_memremap(). If the allocated memory comes from an
area that is marked as EFI_BOOT_SERVICES_DATA memory in the current EFI
memory map, then it will be mapped unencrypted (see memremap_is_efi_data()
and the call to efi_mem_type()).
However, during replacement of the old EFI memory map with the new EFI
memory map, efi_mem_type() is disabled, resulting in the new EFI memory
map always being mapped encrypted in efi.memmap. This will cause a kernel
crash later in the boot.
Since it is known that the new EFI memory map will always be mapped
encrypted when efi_memmap_install() is called, explicitly map the new EFI
memory map as encrypted (using early_memremap_prot()) when inserting the
new memory map entry.
Cc: <stable(a)vger.kernel.org> # 4.14.x
Fixes: 8f716c9b5feb ("x86/mm: Add support to access boot related data in the clear")
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
---
Changes for v2:
- Update/expand commit message to (hopefully) make it easier to read and
understand
- Add a comment around the use of the early_memremap_prot() call
---
arch/x86/platform/efi/quirks.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index b15ebfe40a73..14f8f20d727a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -277,7 +277,19 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
return;
}
- new = early_memremap(data.phys_map, data.size);
+ /*
+ * When SME is active, early_memremap() can map the memory unencrypted
+ * if the allocation came from EFI_BOOT_SERVICES_DATA (see
+ * memremap_is_efi_data() and the call to efi_mem_type()). However,
+ * when efi_memmap_install() is called to replace the memory map,
+ * efi_mem_type() is "disabled" and so the memory will always be mapped
+ * encrypted. To avoid this possible mismatch between the mappings,
+ * always map the newly allocated memmap memory as encrypted.
+ *
+ * When SME is not active, this behaves just like early_memremap().
+ */
+ new = early_memremap_prot(data.phys_map, data.size,
+ pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));
if (!new) {
pr_err("Failed to map new boot services memmap\n");
return;
--
2.33.1
Add the missing bulk-endpoint max-packet sanity check to probe() to
avoid division by zero in uvc_alloc_urb_buffers() in case a malicious
device has broken descriptors (or when doing descriptor fuzz testing).
Note that USB core will reject URBs submitted for endpoints with zero
wMaxPacketSize but that drivers doing packet-size calculations still
need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip
endpoint descriptors with maxpacket=0")).
Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver")
Cc: stable(a)vger.kernel.org # 2.6.26
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/media/usb/uvc/uvc_video.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index e16464606b14..85ac5c1081b6 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1958,6 +1958,10 @@ static int uvc_video_start_transfer(struct uvc_streaming *stream,
if (ep == NULL)
return -EIO;
+ /* Reject broken descriptors. */
+ if (usb_endpoint_maxp(&ep->desc) == 0)
+ return -EIO;
+
ret = uvc_init_video_bulk(stream, ep, gfp_flags);
}
--
2.32.0
Add support for the POLLFREE flag to force complete iocb inline in
aio_poll_wake(). A thread may use it to signal it's exit and/or request
to cleanup while pending poll request. In this case, aio_poll_wake()
needs to make sure it doesn't keep any reference to the queue entry
before returning from wake to avoid possible use after free via
poll_cancel() path.
UAF issue was found during binder and aio interactions in certain
sequence of events [1].
The POLLFREE flag is no more exclusive to the epoll and is being
shared with the aio. Remove comment from poll.h to avoid confusion.
[1] https://lore.kernel.org/r/CAKUd0B_TCXRY4h1hTztfwWbNSFQqsudDLn2S_28csgWZmZAG…
Fixes: af5c72b1fc7a ("Fix aio_poll() races")
Signed-off-by: Ramji Jiyani <ramjiyani(a)google.com>
Reviewed-by: Jeff Moyer <jmoyer(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: stable(a)vger.kernel.org # 4.19+
---
Changes since v1:
- Removed parenthesis around POLLFREE macro definition as per review.
- Updated description to refer UAF issue discussion this patch fixes.
- Updated description to remove reference to parenthesis change.
- Added Reviewed-by from Jeff Moyer
Changes since v2:
- Added Fixes tag.
- Added stable tag for backporting on 4.19+ LTS releases
Changes since v3:
- Updated patch description
- Updated Fixes tag to issue manifestation origin
Changes since v4:
- Added Reviewed-by from Christoph Hellwig
---
fs/aio.c | 45 ++++++++++++++++++---------------
include/uapi/asm-generic/poll.h | 2 +-
2 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 51b08ab01dff..5d539c05df42 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1674,6 +1674,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
{
struct poll_iocb *req = container_of(wait, struct poll_iocb, wait);
struct aio_kiocb *iocb = container_of(req, struct aio_kiocb, poll);
+ struct kioctx *ctx = iocb->ki_ctx;
__poll_t mask = key_to_poll(key);
unsigned long flags;
@@ -1683,29 +1684,33 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
list_del_init(&req->wait.entry);
- if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
- struct kioctx *ctx = iocb->ki_ctx;
+ /*
+ * Use irqsave/irqrestore because not all filesystems (e.g. fuse)
+ * call this function with IRQs disabled and because IRQs have to
+ * be disabled before ctx_lock is obtained.
+ */
+ if (mask & POLLFREE) {
+ /* Force complete iocb inline to remove refs to deleted entry */
+ spin_lock_irqsave(&ctx->ctx_lock, flags);
+ } else if (!(mask && spin_trylock_irqsave(&ctx->ctx_lock, flags))) {
+ /* Can't complete iocb inline; schedule for later */
+ schedule_work(&req->work);
+ return 1;
+ }
- /*
- * Try to complete the iocb inline if we can. Use
- * irqsave/irqrestore because not all filesystems (e.g. fuse)
- * call this function with IRQs disabled and because IRQs
- * have to be disabled before ctx_lock is obtained.
- */
- list_del(&iocb->ki_list);
- iocb->ki_res.res = mangle_poll(mask);
- req->done = true;
- if (iocb->ki_eventfd && eventfd_signal_allowed()) {
- iocb = NULL;
- INIT_WORK(&req->work, aio_poll_put_work);
- schedule_work(&req->work);
- }
- spin_unlock_irqrestore(&ctx->ctx_lock, flags);
- if (iocb)
- iocb_put(iocb);
- } else {
+ /* complete iocb inline */
+ list_del(&iocb->ki_list);
+ iocb->ki_res.res = mangle_poll(mask);
+ req->done = true;
+ if (iocb->ki_eventfd && eventfd_signal_allowed()) {
+ iocb = NULL;
+ INIT_WORK(&req->work, aio_poll_put_work);
schedule_work(&req->work);
}
+ spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+ if (iocb)
+ iocb_put(iocb);
+
return 1;
}
diff --git a/include/uapi/asm-generic/poll.h b/include/uapi/asm-generic/poll.h
index 41b509f410bf..f9c520ce4bf4 100644
--- a/include/uapi/asm-generic/poll.h
+++ b/include/uapi/asm-generic/poll.h
@@ -29,7 +29,7 @@
#define POLLRDHUP 0x2000
#endif
-#define POLLFREE (__force __poll_t)0x4000 /* currently only for epoll */
+#define POLLFREE (__force __poll_t)0x4000
#define POLL_BUSY_LOOP (__force __poll_t)0x8000
--
2.33.0.1079.g6e70778dc9-goog
The function nand_davinci_read_page_hwecc_oob_first() does read the ECC
data from the OOB area. Therefore it does not need to calculate the ECC
as it is already available.
Cc: <stable(a)vger.kernel.org> # v5.2
Fixes: a0ac778eb82c ("mtd: rawnand: ingenic: Add support for the JZ4740")
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
drivers/mtd/nand/raw/davinci_nand.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/davinci_nand.c b/drivers/mtd/nand/raw/davinci_nand.c
index 118da9944e3b..89de24d3bb7a 100644
--- a/drivers/mtd/nand/raw/davinci_nand.c
+++ b/drivers/mtd/nand/raw/davinci_nand.c
@@ -394,7 +394,6 @@ static int nand_davinci_read_page_hwecc_oob_first(struct nand_chip *chip,
int eccsteps = chip->ecc.steps;
uint8_t *p = buf;
uint8_t *ecc_code = chip->ecc.code_buf;
- uint8_t *ecc_calc = chip->ecc.calc_buf;
unsigned int max_bitflips = 0;
/* Read the OOB area first */
@@ -420,8 +419,6 @@ static int nand_davinci_read_page_hwecc_oob_first(struct nand_chip *chip,
if (ret)
return ret;
- chip->ecc.calculate(chip, p, &ecc_calc[i]);
-
stat = chip->ecc.correct(chip, p, &ecc_code[i], NULL);
if (stat == -EBADMSG &&
(chip->ecc.options & NAND_ECC_GENERIC_ERASED_CHECK)) {
--
2.33.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4030a6e6a6a4a42ff8c18414c9e0c93e24cc70b8 Mon Sep 17 00:00:00 2001
From: Paul Burton <paulburton(a)google.com>
Date: Thu, 1 Jul 2021 10:24:07 -0700
Subject: [PATCH] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that
on systems where pid_max is configured higher than PID_MAX_DEFAULT the
ftrace record-tgid option doesn't work so well. Any tasks with PIDs
higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and
don't show up in the saved_tgids file.
In particular since systemd v243 & above configure pid_max to its
highest possible 1<<22 value by default on 64 bit systems this renders
the record-tgids option of little use.
Increase the size of tgid_map to the configured pid_max instead,
allowing it to cover the full range of PIDs up to the maximum value of
PID_MAX_LIMIT if the system is configured that way.
On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the
size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in
memory overhead sounds significant 64 bit systems are presumably best
placed to accommodate it, and since tgid_map is only allocated when the
record-tgid option is actually used presumably the user would rather it
spends sufficient memory to actually record the tgids they expect.
The size of tgid_map could also increase for CONFIG_BASE_SMALL=y
configurations, but these seem unlikely to be systems upon which people
are both configuring a large pid_max and running ftrace with record-tgid
anyway.
Of note is that we only allocate tgid_map once, the first time that the
record-tgid option is enabled. Therefore its size is only set once, to
the value of pid_max at the time the record-tgid option is first
enabled. If a user increases pid_max after that point, the saved_tgids
file will not contain entries for any tasks with pids beyond the earlier
value of pid_max.
Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Paul Burton <paulburton(a)google.com>
[ Fixed comment coding style ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4843076d67d3..14f56e9fa001 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2191,8 +2191,15 @@ void tracing_reset_all_online_cpus(void)
}
}
+/*
+ * The tgid_map array maps from pid to tgid; i.e. the value stored at index i
+ * is the tgid last observed corresponding to pid=i.
+ */
static int *tgid_map;
+/* The maximum valid index into tgid_map. */
+static size_t tgid_map_max;
+
#define SAVED_CMDLINES_DEFAULT 128
#define NO_CMDLINE_MAP UINT_MAX
static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
@@ -2468,24 +2475,41 @@ void trace_find_cmdline(int pid, char comm[])
preempt_enable();
}
+static int *trace_find_tgid_ptr(int pid)
+{
+ /*
+ * Pairs with the smp_store_release in set_tracer_flag() to ensure that
+ * if we observe a non-NULL tgid_map then we also observe the correct
+ * tgid_map_max.
+ */
+ int *map = smp_load_acquire(&tgid_map);
+
+ if (unlikely(!map || pid > tgid_map_max))
+ return NULL;
+
+ return &map[pid];
+}
+
int trace_find_tgid(int pid)
{
- if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT))
- return 0;
+ int *ptr = trace_find_tgid_ptr(pid);
- return tgid_map[pid];
+ return ptr ? *ptr : 0;
}
static int trace_save_tgid(struct task_struct *tsk)
{
+ int *ptr;
+
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;
- if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT))
+ ptr = trace_find_tgid_ptr(tsk->pid);
+ if (!ptr)
return 0;
- tgid_map[tsk->pid] = tsk->tgid;
+ *ptr = tsk->tgid;
return 1;
}
@@ -5225,6 +5249,8 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set)
int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
{
+ int *map;
+
if ((mask == TRACE_ITER_RECORD_TGID) ||
(mask == TRACE_ITER_RECORD_CMD))
lockdep_assert_held(&event_mutex);
@@ -5247,10 +5273,19 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
trace_event_enable_cmd_record(enabled);
if (mask == TRACE_ITER_RECORD_TGID) {
- if (!tgid_map)
- tgid_map = kvcalloc(PID_MAX_DEFAULT + 1,
- sizeof(*tgid_map),
- GFP_KERNEL);
+ if (!tgid_map) {
+ tgid_map_max = pid_max;
+ map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map),
+ GFP_KERNEL);
+
+ /*
+ * Pairs with smp_load_acquire() in
+ * trace_find_tgid_ptr() to ensure that if it observes
+ * the tgid_map we just allocated then it also observes
+ * the corresponding tgid_map_max value.
+ */
+ smp_store_release(&tgid_map, map);
+ }
if (!tgid_map) {
tr->trace_flags &= ~TRACE_ITER_RECORD_TGID;
return -ENOMEM;
@@ -5664,18 +5699,14 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
int pid = ++(*pos);
- if (pid > PID_MAX_DEFAULT)
- return NULL;
-
- return &tgid_map[pid];
+ return trace_find_tgid_ptr(pid);
}
static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- if (!tgid_map || *pos > PID_MAX_DEFAULT)
- return NULL;
+ int pid = *pos;
- return &tgid_map[*pos];
+ return trace_find_tgid_ptr(pid);
}
static void saved_tgids_stop(struct seq_file *m, void *v)