Validate the stack arguments and setup the stack depening on whether or not
it is growing down or up.
Legacy clone() required userspace to know in which direction the stack is
growing and pass down the stack pointer appropriately. To make things more
confusing microblaze uses a variant of the clone() syscall selected by
CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument.
IA64 has a separate clone2() syscall which also takes an additional
stack_size argument. Finally, parisc has a stack that is growing upwards.
Userspace therefore has a lot nasty code like the following:
#define __STACK_SIZE (8 * 1024 * 1024)
pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
pid_t ret;
void *stack;
stack = malloc(__STACK_SIZE);
if (!stack)
return -ENOMEM;
#ifdef __ia64__
ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#elif defined(__parisc__) /* stack grows up */
ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
#else
ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
return ret;
}
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that
when stack_size is passed, the stack pointer is valid and the other way
around. We can also check that the memory area userspace gave us is fine to
use via access_ok(). Furthermore, we probably should not require
userspace to know in which direction the stack is growing. It is easy
for us to do this in the kernel and I couldn't find the original
reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */
clone3() was released with 5.3. Currently, it is not documented and very
unclear to userspace how the stack and stack_size argument have to be
passed. After talking to glibc folks we concluded that trying to change
clone3() to setup the stack instead of requiring userspace to do this is
the right course of action.
Note, that this is an explicit change in user visible behavior we introduce
with this patch. If it breaks someone's use-case we will revert! (And then
e.g. place the new behavior under an appropriate flag.)
Breaking someone's use-case is very unlikely though. First, neither glibc
nor musl currently expose a wrapper for clone3(). Second, there is no real
motivation for anyone to use clone3() directly since it does not provide
features that legacy clone doesn't. New features for clone3() will first
happen in v5.5 which is why v5.4 is still a good time to try and make that
change now and backport it to v5.3. Searches on [4] did not reveal any
packages calling clone3().
[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17…
[2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
[3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa…
[4]: https://codesearch.debian.net
Fixes: 7f192e3cd316 ("fork: add clone3")
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: linux-api(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: <stable(a)vger.kernel.org> # 5.3
Cc: GNU C Library <libc-alpha(a)sourceware.org>
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
include/uapi/linux/sched.h | 4 ++++
kernel/fork.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..25b4fa00bad1 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -51,6 +51,10 @@
* sent when the child exits.
* @stack: Specify the location of the stack for the
* child process.
+ * Note, @stack is expected to point to the
+ * lowest address. The stack direction will be
+ * determined by the kernel and set up
+ * appropriately based on @stack_size.
* @stack_size: The size of the stack for the child process.
* @tls: If CLONE_SETTLS is set, the tls descriptor
* is set to tls.
diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..55af6931c6ec 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
return 0;
}
-static bool clone3_args_valid(const struct kernel_clone_args *kargs)
+/**
+ * clone3_stack_valid - check and prepare stack
+ * @kargs: kernel clone args
+ *
+ * Verify that the stack arguments userspace gave us are sane.
+ * In addition, set the stack direction for userspace since it's easy for us to
+ * determine.
+ */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
+{
+ if (kargs->stack == 0) {
+ if (kargs->stack_size > 0)
+ return false;
+ } else {
+ if (kargs->stack_size == 0)
+ return false;
+
+ if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
+ return false;
+
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
+ kargs->stack += kargs->stack_size;
+#endif
+ }
+
+ return true;
+}
+
+static bool clone3_args_valid(struct kernel_clone_args *kargs)
{
/*
* All lower bits of the flag word are taken.
@@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
kargs->exit_signal)
return false;
+ if (!clone3_stack_valid(kargs))
+ return false;
+
return true;
}
--
2.23.0
This is a note to let you know that I've just added the patch titled
mei: me: add comet point V device id
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 82b29b9f72afdccb40ea5f3c13c6a3cb65a597bc Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Tue, 5 Nov 2019 17:05:14 +0200
Subject: mei: me: add comet point V device id
Comet Point (Comet Lake) V device id.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20191105150514.14010-2-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/hw-me-regs.h | 1 +
drivers/misc/mei/pci-me.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index c09f8bb49495..b359f06f05e7 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -81,6 +81,7 @@
#define MEI_DEV_ID_CMP_LP 0x02e0 /* Comet Point LP */
#define MEI_DEV_ID_CMP_LP_3 0x02e4 /* Comet Point LP 3 (iTouch) */
+#define MEI_DEV_ID_CMP_V 0xA3BA /* Comet Point Lake V */
#define MEI_DEV_ID_ICP_LP 0x34E0 /* Ice Lake Point LP */
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index 3dca63eddaa0..ce43415a536c 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -98,6 +98,7 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_CMP_LP, MEI_ME_PCH12_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_CMP_LP_3, MEI_ME_PCH8_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_CMP_V, MEI_ME_PCH12_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_ICP_LP, MEI_ME_PCH12_CFG)},
--
2.23.0
This is a note to let you know that I've just added the patch titled
mei: bus: prefix device names on bus with the bus name
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 7a2b9e6ec84588b0be65cc0ae45a65bac431496b Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Tue, 5 Nov 2019 17:05:13 +0200
Subject: mei: bus: prefix device names on bus with the bus name
Add parent device name to the name of devices on bus to avoid
device names collisions for same client UUID available
from different MEI heads. Namely this prevents sysfs collision under
/sys/bus/mei/device/
In the device part leave just UUID other parameters that are
required for device matching are not required here and are
just bloating the name.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20191105150514.14010-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 985bd4fd3328..53bb394ccba6 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -873,15 +873,16 @@ static const struct device_type mei_cl_device_type = {
/**
* mei_cl_bus_set_name - set device name for me client device
+ * <controller>-<client device>
+ * Example: 0000:00:16.0-55213584-9a29-4916-badf-0fb7ed682aeb
*
* @cldev: me client device
*/
static inline void mei_cl_bus_set_name(struct mei_cl_device *cldev)
{
- dev_set_name(&cldev->dev, "mei:%s:%pUl:%02X",
- cldev->name,
- mei_me_cl_uuid(cldev->me_cl),
- mei_me_cl_ver(cldev->me_cl));
+ dev_set_name(&cldev->dev, "%s-%pUl",
+ dev_name(cldev->bus->dev),
+ mei_me_cl_uuid(cldev->me_cl));
}
/**
--
2.23.0
This is a note to let you know that I've just added the patch titled
tty: serial: fsl_lpuart: use the sg count from dma_map_sg
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 487ee861de176090b055eba5b252b56a3b9973d6 Mon Sep 17 00:00:00 2001
From: Peng Fan <peng.fan(a)nxp.com>
Date: Tue, 5 Nov 2019 05:51:10 +0000
Subject: tty: serial: fsl_lpuart: use the sg count from dma_map_sg
The dmaengine_prep_slave_sg needs to use sg count returned
by dma_map_sg, not use sport->dma_tx_nents, because the return
value of dma_map_sg is not always same with "nents".
When enabling iommu for lpuart + edma, iommu framework may concatenate
two sgs into one.
Fixes: 6250cc30c4c4e ("tty: serial: fsl_lpuart: Use scatter/gather DMA for Tx")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peng Fan <peng.fan(a)nxp.com>
Link: https://lore.kernel.org/r/1572932977-17866-1-git-send-email-peng.fan@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/fsl_lpuart.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 22df5f8f48b6..4e128d19e0ad 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -437,8 +437,8 @@ static void lpuart_dma_tx(struct lpuart_port *sport)
}
sport->dma_tx_desc = dmaengine_prep_slave_sg(sport->dma_tx_chan, sgl,
- sport->dma_tx_nents,
- DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT);
+ ret, DMA_MEM_TO_DEV,
+ DMA_PREP_INTERRUPT);
if (!sport->dma_tx_desc) {
dma_unmap_sg(dev, sgl, sport->dma_tx_nents, DMA_TO_DEVICE);
dev_err(dev, "Cannot prepare TX slave DMA!\n");
--
2.23.0
This is a note to let you know that I've just added the patch titled
staging: rtl8192e: fix potential use after free
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From b7aa39a2ed0112d07fc277ebd24a08a7b2368ab9 Mon Sep 17 00:00:00 2001
From: Pan Bian <bianpan2016(a)163.com>
Date: Tue, 5 Nov 2019 22:49:11 +0800
Subject: staging: rtl8192e: fix potential use after free
The variable skb is released via kfree_skb() when the return value of
_rtl92e_tx is not zero. However, after that, skb is accessed again to
read its length, which may result in a use after free bug. This patch
fixes the bug by moving the release operation to where skb is never
used later.
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1572965351-6745-1-git-send-email-bianpan2016@163.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
index b08712a9c029..dace81a7d1ba 100644
--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
@@ -1616,14 +1616,15 @@ static void _rtl92e_hard_data_xmit(struct sk_buff *skb, struct net_device *dev,
memcpy((unsigned char *)(skb->cb), &dev, sizeof(dev));
skb_push(skb, priv->rtllib->tx_headroom);
ret = _rtl92e_tx(dev, skb);
- if (ret != 0)
- kfree_skb(skb);
if (queue_index != MGNT_QUEUE) {
priv->rtllib->stats.tx_bytes += (skb->len -
priv->rtllib->tx_headroom);
priv->rtllib->stats.tx_packets++;
}
+
+ if (ret != 0)
+ kfree_skb(skb);
}
static int _rtl92e_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
--
2.23.0
When ext4_mkdir(), ext4_symlink(), ext4_create(), or ext4_mknod() fail
to add entry into directory, it ends up dropping freshly created inode
under the running transaction and thus inode truncation happens under
that transaction. That breaks assumptions that evict() does not get
called from a transaction context and at least in ext4_symlink() case it
can result in inode eviction deadlocking in inode_wait_for_writeback()
when flush worker finds symlink inode, starts to write it back and
blocks on starting a transaction. So change the code in ext4_mkdir() and
ext4_add_nondir() to drop inode reference only after the transaction is
stopped. We also have to add inode to the orphan list in that case as
otherwise the inode would get leaked in case we crash before inode
deletion is committed.
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/namei.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 97cf1c8b56b2..a67cae3c8ff5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2547,21 +2547,29 @@ static void ext4_dec_count(handle_t *handle, struct inode *inode)
}
+/*
+ * Add non-directory inode to a directory. On success, the inode reference is
+ * consumed by dentry is instantiation. This is also indicated by clearing of
+ * *inodep pointer. On failure, the caller is responsible for dropping the
+ * inode reference in the safe context.
+ */
static int ext4_add_nondir(handle_t *handle,
- struct dentry *dentry, struct inode *inode)
+ struct dentry *dentry, struct inode **inodep)
{
struct inode *dir = d_inode(dentry->d_parent);
+ struct inode *inode = *inodep;
int err = ext4_add_entry(handle, dentry, inode);
if (!err) {
ext4_mark_inode_dirty(handle, inode);
if (IS_DIRSYNC(dir))
ext4_handle_sync(handle);
d_instantiate_new(dentry, inode);
+ *inodep = NULL;
return 0;
}
drop_nlink(inode);
+ ext4_orphan_add(handle, inode);
unlock_new_inode(inode);
- iput(inode);
return err;
}
@@ -2595,10 +2603,12 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
ext4_set_aops(inode);
- err = ext4_add_nondir(handle, dentry, inode);
+ err = ext4_add_nondir(handle, dentry, &inode);
}
if (handle)
ext4_journal_stop(handle);
+ if (!IS_ERR_OR_NULL(inode))
+ iput(inode);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -2625,10 +2635,12 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
if (!IS_ERR(inode)) {
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = &ext4_special_inode_operations;
- err = ext4_add_nondir(handle, dentry, inode);
+ err = ext4_add_nondir(handle, dentry, &inode);
}
if (handle)
ext4_journal_stop(handle);
+ if (!IS_ERR_OR_NULL(inode))
+ iput(inode);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -2778,10 +2790,12 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
if (err) {
out_clear_inode:
clear_nlink(inode);
+ ext4_orphan_add(handle, inode);
unlock_new_inode(inode);
ext4_mark_inode_dirty(handle, inode);
+ ext4_journal_stop(handle);
iput(inode);
- goto out_stop;
+ goto out_retry;
}
ext4_inc_count(handle, dir);
ext4_update_dx_flag(dir);
@@ -2795,6 +2809,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
out_stop:
if (handle)
ext4_journal_stop(handle);
+out_retry:
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
return err;
@@ -3327,9 +3342,11 @@ static int ext4_symlink(struct inode *dir,
inode->i_size = disk_link.len - 1;
}
EXT4_I(inode)->i_disksize = inode->i_size;
- err = ext4_add_nondir(handle, dentry, inode);
+ err = ext4_add_nondir(handle, dentry, &inode);
if (handle)
ext4_journal_stop(handle);
+ if (inode)
+ iput(inode);
goto out_free_encrypted_link;
err_drop_inode:
--
2.16.4
When number of free space in the journal is very low, the arithmetic in
jbd2_log_space_left() could underflow resulting in very high number of
free blocks and thus triggering assertion failure in transaction commit
code complaining there's not enough space in the journal:
J_ASSERT(journal->j_free > 1);
Properly check for the low number of free blocks.
CC: stable(a)vger.kernel.org
Reviewed-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
include/linux/jbd2.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 603fbc4e2f70..10e6049c0ba9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1582,7 +1582,7 @@ static inline int jbd2_space_needed(journal_t *journal)
static inline unsigned long jbd2_log_space_left(journal_t *journal)
{
/* Allow for rounding errors */
- unsigned long free = journal->j_free - 32;
+ long free = journal->j_free - 32;
if (journal->j_committing_transaction) {
unsigned long committing = atomic_read(&journal->
@@ -1591,7 +1591,7 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
/* Transaction + control blocks */
free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT);
}
- return free;
+ return max_t(long, free, 0);
}
/*
--
2.16.4