This is the start of the stable review cycle for the 4.9.119 release. There are 17 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:30 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.119-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.119-rc1
Shankara Pailoor shankarapailoor@gmail.com jfs: Fix inconsistency between memory allocation and ea_buf->max_size
Michael J. Ruhl michael.j.ruhl@intel.com IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
Kees Cook keescook@chromium.org fork: unconditionally clear stack on fork
Konstantin Khlebnikov khlebnikov@yandex-team.ru kmemleak: clear stale pointers from task stacks
Eric Dumazet edumazet@google.com tcp: add tcp_ooo_try_coalesce() helper
Filipe Manana fdmanana@suse.com Btrfs: fix file data corruption after cloning a range and fsync
Esben Haabendal eha@deif.com i2c: imx: Fix reinit_completion() use
Masami Hiramatsu mhiramat@kernel.org ring_buffer: tracing: Inherit the tracing setting to next ring buffer
Vitaly Kuznetsov vkuznets@redhat.com ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
Theodore Ts'o tytso@mit.edu ext4: fix false negatives *and* false positives in ext4_check_descriptors()
Dmitry Safonov dima@arista.com netlink: Don't shift on 64 for ngroups
Dmitry Safonov dima@arista.com netlink: Don't shift with UB on nlk->ngroups
Dmitry Safonov dima@arista.com netlink: Do not subscribe to non-existent groups
Anna-Maria Gleixner anna-maria@linutronix.de nohz: Fix local_timer_softirq_pending()
Thomas Gleixner tglx@linutronix.de genirq: Make force irq threading setup more robust
Anil Gurumurthy anil.gurumurthy@cavium.com scsi: qla2xxx: Return error when TMF returns
Quinn Tran quinn.tran@cavium.com scsi: qla2xxx: Fix ISP recovery on unload
-------------
Diffstat:
Makefile | 4 ++-- drivers/i2c/busses/i2c-imx.c | 3 +-- drivers/infiniband/hw/hfi1/rc.c | 2 +- drivers/infiniband/hw/hfi1/uc.c | 4 ++-- drivers/infiniband/hw/hfi1/ud.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 ++-- drivers/pci/pci-acpi.c | 2 +- drivers/scsi/qla2xxx/qla_init.c | 7 +++---- drivers/scsi/qla2xxx/qla_os.c | 5 +++-- fs/btrfs/extent_io.c | 3 +++ fs/ext4/super.c | 4 ++-- fs/jfs/xattr.c | 10 ++++++---- include/linux/ring_buffer.h | 1 + include/linux/thread_info.h | 7 +------ kernel/fork.c | 3 +++ kernel/irq/manage.c | 9 ++++++++- kernel/time/tick-sched.c | 2 +- kernel/trace/ring_buffer.c | 16 ++++++++++++++++ kernel/trace/trace.c | 6 ++++++ net/ipv4/tcp_input.c | 23 +++++++++++++++++++++-- net/netlink/af_netlink.c | 5 +++++ 22 files changed, 92 insertions(+), 36 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran quinn.tran@cavium.com
commit b08abbd9f5996309f021684f9ca74da30dcca36a upstream.
During unload process, the chip can encounter problem where a FW dump would be captured. For this case, the full reset sequence will be skip to bring the chip back to full operational state.
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran quinn.tran@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_os.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -5218,8 +5218,9 @@ qla2x00_do_dpc(void *data) } }
- if (test_and_clear_bit(ISP_ABORT_NEEDED, - &base_vha->dpc_flags)) { + if (test_and_clear_bit + (ISP_ABORT_NEEDED, &base_vha->dpc_flags) && + !test_bit(UNLOADING, &base_vha->dpc_flags)) {
ql_dbg(ql_dbg_dpc, base_vha, 0x4007, "ISP abort scheduled.\n");
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anil Gurumurthy anil.gurumurthy@cavium.com
commit b4146c4929ef61d5afca011474d59d0918a0cd82 upstream.
Propagate the task management completion status properly to avoid unnecessary waits for commands to complete.
Fixes: faef62d13463 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy anil.gurumurthy@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_init.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -329,11 +329,10 @@ qla2x00_async_tm_cmd(fc_port_t *fcport,
wait_for_completion(&tm_iocb->u.tmf.comp);
- rval = tm_iocb->u.tmf.comp_status == CS_COMPLETE ? - QLA_SUCCESS : QLA_FUNCTION_FAILED; + rval = tm_iocb->u.tmf.data;
- if ((rval != QLA_SUCCESS) || tm_iocb->u.tmf.data) { - ql_dbg(ql_dbg_taskm, vha, 0x8030, + if (rval != QLA_SUCCESS) { + ql_log(ql_log_warn, vha, 0x8030, "TM IOCB failed (%x).\n", rval); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit d1f0301b3333eef5efbfa1fe0f0edbea01863d5d upstream.
The support of force threading interrupts which are set up with both a primary and a threaded handler wreckaged the setup of regular requested threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same which then triggers the warnon that the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normaly caught by the sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence the interrupt request which lacks the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging it.
Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by: Kurt Kanzenbach kurt.kanzenbach@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Kurt Kanzenbach kurt.kanzenbach@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/manage.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1026,6 +1026,13 @@ static int irq_setup_forced_threading(st if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)) return 0;
+ /* + * No further action required for interrupts which are requested as + * threaded interrupts already + */ + if (new->handler == irq_default_primary_handler) + return 0; + new->flags |= IRQF_ONESHOT;
/* @@ -1033,7 +1040,7 @@ static int irq_setup_forced_threading(st * thread handler. We force thread them as well by creating a * secondary action. */ - if (new->handler != irq_default_primary_handler && new->thread_fn) { + if (new->handler && new->thread_fn) { /* Allocate the secondary action */ new->secondary = kzalloc(sizeof(struct irqaction), GFP_KERNEL); if (!new->secondary)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anna-Maria Gleixner anna-maria@linutronix.de
commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.
local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ.
This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit.
Use BIT(TIMER_SOFTIRQ) instead.
Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner anna-maria@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Paul E. McKenney paulmck@linux.vnet.ibm.com Reviewed-by: Daniel Bristot de Oliveira bristot@redhat.com Acked-by: Frederic Weisbecker frederic@kernel.org Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -665,7 +665,7 @@ static void tick_nohz_restart(struct tic
static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & TIMER_SOFTIRQ; + return local_softirq_pending() & BIT(TIMER_SOFTIRQ); }
static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
[ Upstream commit 7acf9d4237c46894e0fa0492dd96314a41742e84 ]
Make ABI more strict about subscribing to group > ngroups. Code doesn't check for that and it looks bogus. (one can subscribe to non-existing group) Still, it's possible to bind() to all possible groups with (-1)
Cc: "David S. Miller" davem@davemloft.net Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Steffen Klassert steffen.klassert@secunet.com Cc: netdev@vger.kernel.org Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/af_netlink.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -985,6 +985,7 @@ static int netlink_bind(struct socket *s if (err) return err; } + groups &= (1UL << nlk->ngroups) - 1;
bound = nlk->bound; if (bound) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
[ Upstream commit 61f4b23769f0cc72ae62c9a81cf08f0397d40da8 ]
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in hang during boot. Check for 0 ngroups and use (unsigned long long) as a type to shift.
Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups"). Reported-by: kernel test robot rong.a.chen@intel.com Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netlink/af_netlink.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -985,7 +985,11 @@ static int netlink_bind(struct socket *s if (err) return err; } - groups &= (1UL << nlk->ngroups) - 1; + + if (nlk->ngroups == 0) + groups = 0; + else + groups &= (1ULL << nlk->ngroups) - 1;
bound = nlk->bound; if (bound) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream.
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe only to first 32 groups.
The check for correctness of .bind() userspace supplied parameter is done by applying mask made from ngroups shift. Which broke Android as they have 64 groups and the shift for mask resulted in an overflow.
Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups") Cc: "David S. Miller" davem@davemloft.net Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Steffen Klassert steffen.klassert@secunet.com Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netlink/af_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -988,8 +988,8 @@ static int netlink_bind(struct socket *s
if (nlk->ngroups == 0) groups = 0; - else - groups &= (1ULL << nlk->ngroups) - 1; + else if (nlk->ngroups < 8*sizeof(groups)) + groups &= (1UL << nlk->ngroups) - 1;
bound = nlk->bound; if (bound) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream.
Ext4_check_descriptors() was getting called before s_gdb_count was initialized. So for file systems w/o the meta_bg feature, allocation bitmaps could overlap the block group descriptors and ext4 wouldn't notice.
For file systems with the meta_bg feature enabled, there was a fencepost error which would cause the ext4_check_descriptors() to incorrectly believe that the block allocation bitmap overlaps with the block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@vger.kernel.org Signed-off-by: Benjamin Gilbert bgilbert@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2231,7 +2231,7 @@ static int ext4_check_descriptors(struct struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block); ext4_fsblk_t last_block; - ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1; + ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0); ext4_fsblk_t block_bitmap; ext4_fsblk_t inode_bitmap; ext4_fsblk_t inode_table; @@ -3941,13 +3941,13 @@ static int ext4_fill_super(struct super_ goto failed_mount2; } } + sbi->s_gdb_count = db_count; if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) { ext4_msg(sb, KERN_ERR, "group descriptors corrupted!"); ret = -EFSCORRUPTED; goto failed_mount2; }
- sbi->s_gdb_count = db_count; get_random_bytes(&sbi->s_next_generation, sizeof(u32)); spin_lock_init(&sbi->s_next_gen_lock);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vitaly Kuznetsov vkuznets@redhat.com
commit a0040c0145945d3bd203df8fa97f6dfa819f3f7d upstream.
Hyper-V instances support PCI pass-through which is implemented through PV pci-hyperv driver. When a device is passed through, a new root PCI bus is created in the guest. The bus sits on top of VMBus and has no associated information in ACPI. acpi_pci_add_bus() in this case proceeds all the way to acpi_evaluate_dsm(), which reports
ACPI: : failed to evaluate _DSM (0x1001)
While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not and gives us the error. It seems the correct fix is to not do anything in acpi_pci_add_bus() in such cases.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: Sinan Kaya okaya@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci-acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -567,7 +567,7 @@ void acpi_pci_add_bus(struct pci_bus *bu union acpi_object *obj; struct pci_host_bridge *bridge;
- if (acpi_pci_disabled || !bus->bridge) + if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge)) return;
acpi_pci_slot_enumerate(bus);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream.
Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0
We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbo...
Cc: Ingo Molnar mingo@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: Tom Zanussi tom.zanussi@linux.intel.com Cc: Hiraku Toyooka hiraku.toyooka@cybertrust.co.jp Cc: stable@vger.kernel.org Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org [ Updated commit log and comment in the code ] Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/ring_buffer.h | 1 + kernel/trace/ring_buffer.c | 16 ++++++++++++++++ kernel/trace/trace.c | 6 ++++++ 3 files changed, 23 insertions(+)
--- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -162,6 +162,7 @@ void ring_buffer_record_enable(struct ri void ring_buffer_record_off(struct ring_buffer *buffer); void ring_buffer_record_on(struct ring_buffer *buffer); int ring_buffer_record_is_on(struct ring_buffer *buffer); +int ring_buffer_record_is_set_on(struct ring_buffer *buffer); void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu); void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3137,6 +3137,22 @@ int ring_buffer_record_is_on(struct ring }
/** + * ring_buffer_record_is_set_on - return true if the ring buffer is set writable + * @buffer: The ring buffer to see if write is set enabled + * + * Returns true if the ring buffer is set writable by ring_buffer_record_on(). + * Note that this does NOT mean it is in a writable state. + * + * It may return true when the ring buffer has been disabled by + * ring_buffer_record_disable(), as that is a temporary disabling of + * the ring buffer. + */ +int ring_buffer_record_is_set_on(struct ring_buffer *buffer) +{ + return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF); +} + +/** * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer * @buffer: The ring buffer to stop writes to. * @cpu: The CPU buffer to stop --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1323,6 +1323,12 @@ update_max_tr(struct trace_array *tr, st
arch_spin_lock(&tr->max_lock);
+ /* Inherit the recordable setting from trace_buffer */ + if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer)) + ring_buffer_record_on(tr->max_buffer.buffer); + else + ring_buffer_record_off(tr->max_buffer.buffer); + buf = tr->trace_buffer.buffer; tr->trace_buffer.buffer = tr->max_buffer.buffer; tr->max_buffer.buffer = buf;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream.
When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified extents in the inode's extent map tree, so that a "fast" fsync (the flag BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps and log corresponding extent items. However, at the end of range cloning operation we do truncate all the pages in the affected range (in order to ensure future reads will not get stale data). Sometimes this truncation will release the corresponding extent maps besides the pages from the page cache. If this happens, then a "fast" fsync operation will miss logging some extent items, because it relies exclusively on the extent maps being present in the inode's extent tree, leading to data loss/corruption if the fsync ends up using the same transaction used by the clone operation (that transaction was not committed in the meanwhile). An extent map is released through the callback btrfs_invalidatepage(), which gets called by truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The later ends up calling try_release_extent_mapping() which will release the extent map if some conditions are met, like the file size being greater than 16Mb, gfp flags allow blocking and the range not being locked (which is the case during the clone operation) nor being the extent map flagged as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the issue:
$ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar # reflink destination offset corresponds to the size of file bar, # 2728Kb minus 4Kb. $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt $ md5sum /mnt/bar 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the # power failure
In the above example, the destination offset of the clone operation corresponds to the size of the "bar" file minus 4Kb. So during the clone operation, the extent map covering the range from 2572Kb to 2728Kb gets trimmed so that it ends at offset 2724Kb, and a new extent map covering the range from 2724Kb to 11724Kb is created. So at the end of the clone operation when we ask to truncate the pages in the range from 2724Kb to 2724Kb + 15908Kb, the page invalidation callback ends up removing the new extent map (through try_release_extent_mapping()) when the page at offset 2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent map is removed at try_release_extent_mapping(), forcing the next fsync to search for modified extents in the fs/subvolume tree instead of relying on the presence of extent maps in memory. This way we can continue doing a "fast" fsync if the destination range of a clone operation does not overlap with an existing range or if any of the criteria necessary to remove an extent map at try_release_extent_mapping() is not met (file size not bigger then 16Mb or gfp flags do not allow blocking).
CC: stable@vger.kernel.org # 3.16+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4298,6 +4298,7 @@ int try_release_extent_mapping(struct ex struct extent_map *em; u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; + struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
if (gfpflags_allow_blocking(mask) && page->mapping->host->i_size > SZ_16M) { @@ -4320,6 +4321,8 @@ int try_release_extent_mapping(struct ex extent_map_end(em) - 1, EXTENT_LOCKED | EXTENT_WRITEBACK, 0, NULL)) { + set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, + &btrfs_inode->runtime_flags); remove_extent_mapping(map, em); /* once for the rb tree */ free_extent_map(em);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c upstream.
In case skb in out_or_order_queue is the result of multiple skbs coalescing, we would like to get a proper gso_segs counter tracking, so that future tcp_drop() can report an accurate number.
I chose to not implement this tracking for skbs in receive queue, since they are not dropped, unless socket is disconnected.
Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: David Woodhouse dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4370,6 +4370,23 @@ static bool tcp_try_coalesce(struct sock return true; }
+static bool tcp_ooo_try_coalesce(struct sock *sk, + struct sk_buff *to, + struct sk_buff *from, + bool *fragstolen) +{ + bool res = tcp_try_coalesce(sk, to, from, fragstolen); + + /* In case tcp_drop() is called later, update to->gso_segs */ + if (res) { + u32 gso_segs = max_t(u16, 1, skb_shinfo(to)->gso_segs) + + max_t(u16, 1, skb_shinfo(from)->gso_segs); + + skb_shinfo(to)->gso_segs = min_t(u32, gso_segs, 0xFFFF); + } + return res; +} + static void tcp_drop(struct sock *sk, struct sk_buff *skb) { sk_drops_add(sk, skb); @@ -4493,7 +4510,8 @@ static void tcp_data_queue_ofo(struct so /* In the typical case, we are adding an skb to the end of the list. * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup. */ - if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) { + if (tcp_ooo_try_coalesce(sk, tp->ooo_last_skb, + skb, &fragstolen)) { coalesce_done: tcp_grow_window(sk, skb); kfree_skb_partial(skb, fragstolen); @@ -4543,7 +4561,8 @@ coalesce_done: tcp_drop(sk, skb1); goto merge_right; } - } else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) { + } else if (tcp_ooo_try_coalesce(sk, skb1, + skb, &fragstolen)) { goto coalesce_done; } p = &parent->rb_right;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konstantin Khlebnikov khlebnikov@yandex-team.ru
commit ca182551857cc2c1e6a2b7f1e72090a137a15008 upstream.
Kmemleak considers any pointers on task stacks as references. This patch clears newly allocated and reused vmap stacks.
Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz Signed-off-by: Konstantin Khlebnikov khlebnikov@yandex-team.ru Acked-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat srivatsa@csail.mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/thread_info.h | 2 +- kernel/fork.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-)
--- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -59,7 +59,7 @@ extern long do_no_restart_syscall(struct
#ifdef __KERNEL__
-#ifdef CONFIG_DEBUG_STACK_USAGE +#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) # define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \ __GFP_ZERO) #else --- a/kernel/fork.c +++ b/kernel/fork.c @@ -184,6 +184,10 @@ static unsigned long *alloc_thread_stack continue; this_cpu_write(cached_stacks[i], NULL);
+#ifdef CONFIG_DEBUG_KMEMLEAK + /* Clear stale pointers from reused stack. */ + memset(s->addr, 0, THREAD_SIZE); +#endif tsk->stack_vm_area = s; local_irq_enable(); return s->addr;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream.
One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace.
Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit.
Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54
and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46
Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default.
[1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
I did some more with perf and cycle counts on running 100,000 execs of /bin/true.
before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47
after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99
It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas!
Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook keescook@chromium.org Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Andy Lutomirski luto@kernel.org Cc: Laura Abbott labbott@redhat.com Cc: Rasmus Villemoes rasmus.villemoes@prevas.dk Cc: Mel Gorman mgorman@techsingularity.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat srivatsa@csail.mit.edu Reviewed-by: Srinidhi Rao srinidhir@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/thread_info.h | 7 +------ kernel/fork.c | 3 +-- 2 files changed, 2 insertions(+), 8 deletions(-)
--- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -59,12 +59,7 @@ extern long do_no_restart_syscall(struct
#ifdef __KERNEL__
-#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) -# define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \ - __GFP_ZERO) -#else -# define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK) -#endif +#define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | __GFP_ZERO)
/* * flag set/clear/test wrappers --- a/kernel/fork.c +++ b/kernel/fork.c @@ -184,10 +184,9 @@ static unsigned long *alloc_thread_stack continue; this_cpu_write(cached_stacks[i], NULL);
-#ifdef CONFIG_DEBUG_KMEMLEAK /* Clear stale pointers from reused stack. */ memset(s->addr, 0, THREAD_SIZE); -#endif + tsk->stack_vm_area = s; local_irq_enable(); return s->addr;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael J. Ruhl michael.j.ruhl@intel.com
commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream.
The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL. All of the relevant call sites look for IS_ERR, so the NULL return would lead to a NULL pointer exception.
Do not use the ERR_PTR mechanism for this function.
Update all call sites to handle the return value correctly.
Clean up error paths to reflect return value.
Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code") Cc: stable@vger.kernel.org # 4.9.x+ Reported-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Reviewed-by: Kamenee Arumugam kamenee.arumugam@intel.com Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/rc.c | 2 +- drivers/infiniband/hw/hfi1/uc.c | 4 ++-- drivers/infiniband/hw/hfi1/ud.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-)
--- a/drivers/infiniband/hw/hfi1/rc.c +++ b/drivers/infiniband/hw/hfi1/rc.c @@ -397,7 +397,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp,
lockdep_assert_held(&qp->s_lock); ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx;
ohdr = &ps->s_txreq->phdr.hdr.u.oth; --- a/drivers/infiniband/hw/hfi1/uc.c +++ b/drivers/infiniband/hw/hfi1/uc.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2015, 2016 Intel Corporation. + * Copyright(c) 2015 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -72,7 +72,7 @@ int hfi1_make_uc_req(struct rvt_qp *qp, int middle = 0;
ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx;
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_SEND_OK)) { --- a/drivers/infiniband/hw/hfi1/ud.c +++ b/drivers/infiniband/hw/hfi1/ud.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2015, 2016 Intel Corporation. + * Copyright(c) 2015 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -285,7 +285,7 @@ int hfi1_make_ud_req(struct rvt_qp *qp, u8 sc5;
ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx;
if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_NEXT_SEND_OK)) { --- a/drivers/infiniband/hw/hfi1/verbs_txreq.c +++ b/drivers/infiniband/hw/hfi1/verbs_txreq.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2016 Intel Corporation. + * Copyright(c) 2016 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -94,7 +94,7 @@ struct verbs_txreq *__get_txreq(struct h struct rvt_qp *qp) __must_hold(&qp->s_lock) { - struct verbs_txreq *tx = ERR_PTR(-EBUSY); + struct verbs_txreq *tx = NULL;
write_seqlock(&dev->iowait_lock); if (ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK) { --- a/drivers/infiniband/hw/hfi1/verbs_txreq.h +++ b/drivers/infiniband/hw/hfi1/verbs_txreq.h @@ -1,5 +1,5 @@ /* - * Copyright(c) 2016 Intel Corporation. + * Copyright(c) 2016 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -82,7 +82,7 @@ static inline struct verbs_txreq *get_tx if (unlikely(!tx)) { /* call slow path to get the lock */ tx = __get_txreq(dev, qp); - if (IS_ERR(tx)) + if (!tx) return tx; } tx->qp = qp;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shankara Pailoor shankarapailoor@gmail.com
commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream.
The code is assuming the buffer is max_size length, but we weren't allocating enough space for it.
Signed-off-by: Shankara Pailoor shankarapailoor@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jfs/xattr.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -491,15 +491,17 @@ static int ea_get(struct inode *inode, s if (size > PSIZE) { /* * To keep the rest of the code simple. Allocate a - * contiguous buffer to work with + * contiguous buffer to work with. Make the buffer large + * enough to make use of the whole extent. */ - ea_buf->xattr = kmalloc(size, GFP_KERNEL); + ea_buf->max_size = (size + sb->s_blocksize - 1) & + ~(sb->s_blocksize - 1); + + ea_buf->xattr = kmalloc(ea_buf->max_size, GFP_KERNEL); if (ea_buf->xattr == NULL) return -ENOMEM;
ea_buf->flag = EA_MALLOC; - ea_buf->max_size = (size + sb->s_blocksize - 1) & - ~(sb->s_blocksize - 1);
if (ea_size == 0) return 0;
On Tue, Aug 07, 2018 at 08:51:36PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.119 release. There are 17 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:30 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.119-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Thanks! Nathan
On 08/07/2018 12:51 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.119 release. There are 17 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:30 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.119-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 8 August 2018 at 00:21, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.119 release. There are 17 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:30 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.119-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.9.119-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: 1e9cd04e077b33f5dbecb2ddbc53695724b0cf97 git describe: v4.9.118-18-g1e9cd04e077b Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.118-18-...
No regressions (compared to build v4.9.118-17-g575cd33d16af)
Ran 15539 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Tue, Aug 07, 2018 at 08:51:36PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.119 release. There are 17 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:30 UTC 2018. Anything received after that time might be too late.
Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 277 pass: 277 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org