This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.14-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.17.14-rc1
Shankara Pailoor shankarapailoor@gmail.com jfs: Fix inconsistency between memory allocation and ea_buf->max_size
Dave Chinner dchinner@redhat.com xfs: validate cached inodes are free when allocated
Eric Sandeen sandeen@sandeen.net xfs: don't call xfs_da_shrink_inode with NULL bp
Linus Torvalds torvalds@linux-foundation.org Partially revert "block: fail op_is_write() requests to read-only partitions"
Filipe Manana fdmanana@suse.com Btrfs: fix file data corruption after cloning a range and fsync
Esben Haabendal eha@deif.com i2c: imx: Fix reinit_completion() use
Masami Hiramatsu mhiramat@kernel.org ring_buffer: tracing: Inherit the tracing setting to next ring buffer
Dmitry Safonov dima@arista.com netlink: Don't shift on 64 for ngroups
Frederic Weisbecker frederic@kernel.org nohz: Fix missing tick reprogram when interrupting an inline softirq
Anna-Maria Gleixner anna-maria@linutronix.de nohz: Fix local_timer_softirq_pending()
Kan Liang kan.liang@linux.intel.com perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
Thomas Gleixner tglx@linutronix.de genirq: Make force irq threading setup more robust
Kees Cook keescook@chromium.org jfs: Fix usercopy whitelist for inline inode data
Anil Gurumurthy anil.gurumurthy@cavium.com scsi: qla2xxx: Return error when TMF returns
Quinn Tran quinn.tran@cavium.com scsi: qla2xxx: Fix ISP recovery on unload
Quinn Tran quinn.tran@cavium.com scsi: qla2xxx: Fix driver unload by shutting down chip
Quinn Tran quinn.tran@cavium.com scsi: qla2xxx: Fix NPIV deletion by calling wait_for_sess_deletion
Quinn Tran quinn.tran@cavium.com scsi: qla2xxx: Fix unintialized List head crash
-------------
Diffstat:
Makefile | 4 +- arch/x86/events/intel/uncore.h | 2 +- arch/x86/events/intel/uncore_snbep.c | 10 +++-- block/blk-core.c | 5 ++- drivers/i2c/busses/i2c-imx.c | 3 +- drivers/scsi/qla2xxx/qla_attr.c | 1 + drivers/scsi/qla2xxx/qla_gbl.h | 1 + drivers/scsi/qla2xxx/qla_gs.c | 4 ++ drivers/scsi/qla2xxx/qla_init.c | 7 ++-- drivers/scsi/qla2xxx/qla_inline.h | 2 + drivers/scsi/qla2xxx/qla_isr.c | 3 ++ drivers/scsi/qla2xxx/qla_mbx.c | 6 +++ drivers/scsi/qla2xxx/qla_mid.c | 11 +++++- drivers/scsi/qla2xxx/qla_os.c | 51 +++++++++++-------------- drivers/scsi/qla2xxx/qla_sup.c | 3 ++ fs/btrfs/extent_io.c | 3 ++ fs/jfs/jfs_dinode.h | 7 ++++ fs/jfs/jfs_incore.h | 1 + fs/jfs/super.c | 3 +- fs/jfs/xattr.c | 10 +++-- fs/xfs/libxfs/xfs_attr_leaf.c | 5 +-- fs/xfs/xfs_icache.c | 73 ++++++++++++++++++++++++------------ include/linux/ring_buffer.h | 1 + kernel/irq/manage.c | 9 ++++- kernel/softirq.c | 2 +- kernel/time/tick-sched.c | 2 +- kernel/trace/ring_buffer.c | 16 ++++++++ kernel/trace/trace.c | 6 +++ net/netlink/af_netlink.c | 4 +- 29 files changed, 171 insertions(+), 84 deletions(-)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran quinn.tran@cavium.com
commit e3dde080ebbdbb4bda8eee35d770714fee8c59ac upstream.
In case of IOCB Queue full or system where memory is low and driver receives large number of RSCN storm, the stale sp pointer can stay on gpnid_list resulting in page_fault.
This patch fixes this issue by initializing the sp->elem list head and removing sp->elem before memory is freed.
Following stack trace is seen
9 [ffff987b37d1bc60] page_fault at ffffffffad516768 [exception RIP: qla24xx_async_gpnid+496] 10 [ffff987b37d1bd10] qla24xx_async_gpnid at ffffffffc039866d [qla2xxx] 11 [ffff987b37d1bd80] qla2x00_do_work at ffffffffc036169c [qla2xxx] 12 [ffff987b37d1be38] qla2x00_do_dpc_all_vps at ffffffffc03adfed [qla2xxx] 13 [ffff987b37d1be78] qla2x00_do_dpc at ffffffffc036458a [qla2xxx] 14 [ffff987b37d1bec8] kthread at ffffffffacebae31
Fixes: 2d73ac6102d9 ("scsi: qla2xxx: Serialize GPNID for multiple RSCN") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Quinn Tran quinn.tran@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_gs.c | 4 ++++ drivers/scsi/qla2xxx/qla_inline.h | 2 ++ 2 files changed, 6 insertions(+)
--- a/drivers/scsi/qla2xxx/qla_gs.c +++ b/drivers/scsi/qla2xxx/qla_gs.c @@ -3712,6 +3712,10 @@ int qla24xx_async_gpnid(scsi_qla_host_t return rval;
done_free_sp: + spin_lock_irqsave(&vha->hw->vport_slock, flags); + list_del(&sp->elem); + spin_unlock_irqrestore(&vha->hw->vport_slock, flags); + if (sp->u.iocb_cmd.u.ctarg.req) { dma_free_coherent(&vha->hw->pdev->dev, sizeof(struct ct_sns_pkt), --- a/drivers/scsi/qla2xxx/qla_inline.h +++ b/drivers/scsi/qla2xxx/qla_inline.h @@ -222,6 +222,8 @@ qla2xxx_get_qpair_sp(struct qla_qpair *q sp->fcport = fcport; sp->iocbs = 1; sp->vha = qpair->vha; + INIT_LIST_HEAD(&sp->elem); + done: if (!sp) QLA_QPAIR_MARK_NOT_BUSY(qpair);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran quinn.tran@cavium.com
commit efa93f48fa9d423fda166bc3b6c0cbb09682492e upstream.
Add wait for session deletion to finish before freeing an NPIV scsi host.
Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran quinn.tran@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_attr.c | 1 + drivers/scsi/qla2xxx/qla_gbl.h | 1 + drivers/scsi/qla2xxx/qla_mid.c | 5 +++++ drivers/scsi/qla2xxx/qla_os.c | 2 +- 4 files changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/scsi/qla2xxx/qla_attr.c +++ b/drivers/scsi/qla2xxx/qla_attr.c @@ -2141,6 +2141,7 @@ qla24xx_vport_delete(struct fc_vport *fc msleep(1000);
qla24xx_disable_vp(vha); + qla2x00_wait_for_sess_deletion(vha);
vha->flags.delete_progress = 1;
--- a/drivers/scsi/qla2xxx/qla_gbl.h +++ b/drivers/scsi/qla2xxx/qla_gbl.h @@ -213,6 +213,7 @@ void qla2x00_handle_login_done_event(str int qla24xx_post_gnl_work(struct scsi_qla_host *, fc_port_t *); int qla24xx_async_abort_cmd(srb_t *); int qla24xx_post_relogin_work(struct scsi_qla_host *vha); +void qla2x00_wait_for_sess_deletion(scsi_qla_host_t *);
/* * Global Functions in qla_mid.c source file. --- a/drivers/scsi/qla2xxx/qla_mid.c +++ b/drivers/scsi/qla2xxx/qla_mid.c @@ -153,10 +153,15 @@ qla24xx_disable_vp(scsi_qla_host_t *vha) { unsigned long flags; int ret; + fc_port_t *fcport;
ret = qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL); atomic_set(&vha->loop_state, LOOP_DOWN); atomic_set(&vha->loop_down_timer, LOOP_DOWN_TIME); + list_for_each_entry(fcport, &vha->vp_fcports, list) + fcport->logout_on_delete = 0; + + qla2x00_mark_all_devices_lost(vha, 0);
/* Remove port id from vp target map */ spin_lock_irqsave(&vha->hw->hardware_lock, flags); --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -1147,7 +1147,7 @@ static inline int test_fcport_count(scsi * qla2x00_wait_for_sess_deletion can only be called from remove_one. * it has dependency on UNLOADING flag to stop device discovery */ -static void +void qla2x00_wait_for_sess_deletion(scsi_qla_host_t *vha) { qla2x00_mark_all_devices_lost(vha, 0);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran quinn.tran@cavium.com
commit 45235022da9925b2b070c0139629233173e50089 upstream.
Use chip shutdown at the start of unload to stop all DMA + traffic and bring down the laser. This prevents any link activities from triggering the driver to be re-engaged.
Fixes: 4b60c82736d0 ("scsi: qla2xxx: Add fw_started flags to qpair") Cc: stable@vger.kernel.org #4.16 Signed-off-by: Quinn Tran quinn.tran@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_isr.c | 3 ++ drivers/scsi/qla2xxx/qla_mbx.c | 6 +++++ drivers/scsi/qla2xxx/qla_mid.c | 6 +++-- drivers/scsi/qla2xxx/qla_os.c | 44 ++++++++++++++++------------------------- drivers/scsi/qla2xxx/qla_sup.c | 3 ++ 5 files changed, 34 insertions(+), 28 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -631,6 +631,9 @@ qla2x00_async_event(scsi_qla_host_t *vha unsigned long flags; fc_port_t *fcport = NULL;
+ if (!vha->hw->flags.fw_started) + return; + /* Setup to process RIO completion. */ handle_cnt = 0; if (IS_CNA_CAPABLE(ha)) --- a/drivers/scsi/qla2xxx/qla_mbx.c +++ b/drivers/scsi/qla2xxx/qla_mbx.c @@ -4212,6 +4212,9 @@ qla25xx_init_req_que(struct scsi_qla_hos mbx_cmd_t *mcp = &mc; struct qla_hw_data *ha = vha->hw;
+ if (!ha->flags.fw_started) + return QLA_SUCCESS; + ql_dbg(ql_dbg_mbx + ql_dbg_verbose, vha, 0x10d3, "Entered %s.\n", __func__);
@@ -4281,6 +4284,9 @@ qla25xx_init_rsp_que(struct scsi_qla_hos mbx_cmd_t *mcp = &mc; struct qla_hw_data *ha = vha->hw;
+ if (!ha->flags.fw_started) + return QLA_SUCCESS; + ql_dbg(ql_dbg_mbx + ql_dbg_verbose, vha, 0x10d6, "Entered %s.\n", __func__);
--- a/drivers/scsi/qla2xxx/qla_mid.c +++ b/drivers/scsi/qla2xxx/qla_mid.c @@ -152,10 +152,12 @@ int qla24xx_disable_vp(scsi_qla_host_t *vha) { unsigned long flags; - int ret; + int ret = QLA_SUCCESS; fc_port_t *fcport;
- ret = qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL); + if (vha->hw->flags.fw_started) + ret = qla24xx_control_vp(vha, VCE_COMMAND_DISABLE_VPS_LOGO_ALL); + atomic_set(&vha->loop_state, LOOP_DOWN); atomic_set(&vha->loop_down_timer, LOOP_DOWN_TIME); list_for_each_entry(fcport, &vha->vp_fcports, list) --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -303,6 +303,7 @@ static void qla2x00_free_device(scsi_qla static int qla2xxx_map_queues(struct Scsi_Host *shost); static void qla2x00_destroy_deferred_work(struct qla_hw_data *);
+ struct scsi_host_template qla2xxx_driver_template = { .module = THIS_MODULE, .name = QLA2XXX_DRIVER_NAME, @@ -3603,6 +3604,8 @@ qla2x00_remove_one(struct pci_dev *pdev)
base_vha = pci_get_drvdata(pdev); ha = base_vha->hw; + ql_log(ql_log_info, base_vha, 0xb079, + "Removing driver\n");
/* Indicate device removal to prevent future board_disable and wait * until any pending board_disable has completed. */ @@ -3625,6 +3628,21 @@ qla2x00_remove_one(struct pci_dev *pdev) } qla2x00_wait_for_hba_ready(base_vha);
+ if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha)) { + if (ha->flags.fw_started) + qla2x00_abort_isp_cleanup(base_vha); + } else if (!IS_QLAFX00(ha)) { + if (IS_QLA8031(ha)) { + ql_dbg(ql_dbg_p3p, base_vha, 0xb07e, + "Clearing fcoe driver presence.\n"); + if (qla83xx_clear_drv_presence(base_vha) != QLA_SUCCESS) + ql_dbg(ql_dbg_p3p, base_vha, 0xb079, + "Error while clearing DRV-Presence.\n"); + } + + qla2x00_try_to_stop_firmware(base_vha); + } + qla2x00_wait_for_sess_deletion(base_vha);
/* @@ -3648,14 +3666,6 @@ qla2x00_remove_one(struct pci_dev *pdev)
qla2x00_delete_all_vps(ha, base_vha);
- if (IS_QLA8031(ha)) { - ql_dbg(ql_dbg_p3p, base_vha, 0xb07e, - "Clearing fcoe driver presence.\n"); - if (qla83xx_clear_drv_presence(base_vha) != QLA_SUCCESS) - ql_dbg(ql_dbg_p3p, base_vha, 0xb079, - "Error while clearing DRV-Presence.\n"); - } - qla2x00_abort_all_cmds(base_vha, DID_NO_CONNECT << 16);
qla2x00_dfs_remove(base_vha); @@ -3715,24 +3725,6 @@ qla2x00_free_device(scsi_qla_host_t *vha qla2x00_stop_timer(vha);
qla25xx_delete_queues(vha); - - if (ha->flags.fce_enabled) - qla2x00_disable_fce_trace(vha, NULL, NULL); - - if (ha->eft) - qla2x00_disable_eft_trace(vha); - - if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha)) { - if (ha->flags.fw_started) - qla2x00_abort_isp_cleanup(vha); - } else { - if (ha->flags.fw_started) { - /* Stop currently executing firmware. */ - qla2x00_try_to_stop_firmware(vha); - ha->flags.fw_started = 0; - } - } - vha->flags.online = 0;
/* turn-off interrupts on the card */ --- a/drivers/scsi/qla2xxx/qla_sup.c +++ b/drivers/scsi/qla2xxx/qla_sup.c @@ -1880,6 +1880,9 @@ qla24xx_beacon_off(struct scsi_qla_host if (IS_P3P_TYPE(ha)) return QLA_SUCCESS;
+ if (!ha->flags.fw_started) + return QLA_SUCCESS; + ha->beacon_blink_led = 0;
if (IS_QLA2031(ha) || IS_QLA27XX(ha))
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quinn Tran quinn.tran@cavium.com
commit b08abbd9f5996309f021684f9ca74da30dcca36a upstream.
During unload process, the chip can encounter problem where a FW dump would be captured. For this case, the full reset sequence will be skip to bring the chip back to full operational state.
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran quinn.tran@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_os.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -6014,8 +6014,9 @@ qla2x00_do_dpc(void *data) set_bit(ISP_ABORT_NEEDED, &base_vha->dpc_flags); }
- if (test_and_clear_bit(ISP_ABORT_NEEDED, - &base_vha->dpc_flags)) { + if (test_and_clear_bit + (ISP_ABORT_NEEDED, &base_vha->dpc_flags) && + !test_bit(UNLOADING, &base_vha->dpc_flags)) {
ql_dbg(ql_dbg_dpc, base_vha, 0x4007, "ISP abort scheduled.\n");
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anil Gurumurthy anil.gurumurthy@cavium.com
commit b4146c4929ef61d5afca011474d59d0918a0cd82 upstream.
Propagate the task management completion status properly to avoid unnecessary waits for commands to complete.
Fixes: faef62d13463 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy anil.gurumurthy@cavium.com Signed-off-by: Himanshu Madhani himanshu.madhani@cavium.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/qla2xxx/qla_init.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -1518,11 +1518,10 @@ qla2x00_async_tm_cmd(fc_port_t *fcport,
wait_for_completion(&tm_iocb->u.tmf.comp);
- rval = tm_iocb->u.tmf.comp_status == CS_COMPLETE ? - QLA_SUCCESS : QLA_FUNCTION_FAILED; + rval = tm_iocb->u.tmf.data;
- if ((rval != QLA_SUCCESS) || tm_iocb->u.tmf.data) { - ql_dbg(ql_dbg_taskm, vha, 0x8030, + if (rval != QLA_SUCCESS) { + ql_log(ql_log_warn, vha, 0x8030, "TM IOCB failed (%x).\n", rval); }
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit 961b33c244e5ba1543ae26270a1ba29f29c2db83 upstream.
Bart Massey reported what turned out to be a usercopy whitelist false positive in JFS when symlink contents exceeded 128 bytes. The inline inode data (i_inline) is actually designed to overflow into the "extended area" following it (i_inline_ea) when needed. So the whitelist needed to be expanded to include both i_inline and i_inline_ea (the whole size of which is calculated internally using IDATASIZE, 256, instead of sizeof(i_inline), 128).
$ cd /mnt/jfs $ touch $(perl -e 'print "B" x 250') $ ln -s B* b $ ls -l >/dev/null
[ 249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)!
Reported-by: Bart Massey bart.massey@gmail.com Fixes: 8d2704d382a9 ("jfs: Define usercopy region in jfs_ip slab cache") Cc: Dave Kleikamp shaggy@kernel.org Cc: jfs-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jfs/jfs_dinode.h | 7 +++++++ fs/jfs/jfs_incore.h | 1 + fs/jfs/super.c | 3 +-- 3 files changed, 9 insertions(+), 2 deletions(-)
--- a/fs/jfs/jfs_dinode.h +++ b/fs/jfs/jfs_dinode.h @@ -115,6 +115,13 @@ struct dinode { dxd_t _dxd; /* 16: */ union { __le32 _rdev; /* 4: */ + /* + * The fast symlink area + * is expected to overflow + * into _inlineea when + * needed (which will clear + * INLINEEA). + */ u8 _fastsymlink[128]; } _u; u8 _inlineea[128]; --- a/fs/jfs/jfs_incore.h +++ b/fs/jfs/jfs_incore.h @@ -87,6 +87,7 @@ struct jfs_inode_info { struct { unchar _unused[16]; /* 16: */ dxd_t _dxd; /* 16: */ + /* _inline may overflow into _inline_ea when needed */ unchar _inline[128]; /* 128: inline symlink */ /* _inline_ea may overlay the last part of * file._xtroot if maxentry = XTROOTINITSLOT --- a/fs/jfs/super.c +++ b/fs/jfs/super.c @@ -967,8 +967,7 @@ static int __init init_jfs_fs(void) jfs_inode_cachep = kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info), 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT, - offsetof(struct jfs_inode_info, i_inline), - sizeof_field(struct jfs_inode_info, i_inline), + offsetof(struct jfs_inode_info, i_inline), IDATASIZE, init_once); if (jfs_inode_cachep == NULL) return -ENOMEM;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit d1f0301b3333eef5efbfa1fe0f0edbea01863d5d upstream.
The support of force threading interrupts which are set up with both a primary and a threaded handler wreckaged the setup of regular requested threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same which then triggers the warnon that the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normaly caught by the sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence the interrupt request which lacks the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging it.
Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by: Kurt Kanzenbach kurt.kanzenbach@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Kurt Kanzenbach kurt.kanzenbach@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/manage.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1067,6 +1067,13 @@ static int irq_setup_forced_threading(st if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)) return 0;
+ /* + * No further action required for interrupts which are requested as + * threaded interrupts already + */ + if (new->handler == irq_default_primary_handler) + return 0; + new->flags |= IRQF_ONESHOT;
/* @@ -1074,7 +1081,7 @@ static int irq_setup_forced_threading(st * thread handler. We force thread them as well by creating a * secondary action. */ - if (new->handler != irq_default_primary_handler && new->thread_fn) { + if (new->handler && new->thread_fn) { /* Allocate the secondary action */ new->secondary = kzalloc(sizeof(struct irqaction), GFP_KERNEL); if (!new->secondary)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kan Liang kan.liang@linux.intel.com
commit 156c8b58ef5cfd97245928c95669fd4cb0f9c388 upstream.
Masayoshi Mizuma reported that a warning message is shown while a CPU is hot-removed on Broadwell servers:
WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0x10b/0x150 Call Trace: pci_device_remove+0x42/0xd0 device_release_driver_internal+0x148/0x220 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x57/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x174/0x3a0 worker_thread+0x4c/0x3d0 kthread+0xf8/0x130
This bug was introduced by:
commit 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs")
The index of "QPI Port 2 filter" was hardcode to 2, but this conflicts with the index of "PCU.3" which is "HSWEP_PCI_PCU_3", which equals to 2 as well.
To fix the conflict, the hardcoded index needs to be cleaned up:
- introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2 filter" on Broadwell, - increase UNCORE_EXTRA_PCI_DEV_MAX by one, - clean up the hardcoded index.
Debugged-by: Masayoshi Mizuma m.mizuma@jp.fujitsu.com Suggested-by: Ingo Molnar mingo@kernel.org Reported-by: Masayoshi Mizuma m.mizuma@jp.fujitsu.com Tested-by: Masayoshi Mizuma m.mizuma@jp.fujitsu.com Signed-off-by: Kan Liang kan.liang@linux.intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Cc: msys.mizuma@gmail.com Cc: stable@vger.kernel.org Fixes: 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") Link: http://lkml.kernel.org/r/1532953688-15008-1-git-send-email-kan.liang@linux.i... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/intel/uncore.h | 2 +- arch/x86/events/intel/uncore_snbep.c | 10 +++++++--- 2 files changed, 8 insertions(+), 4 deletions(-)
--- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -23,7 +23,7 @@ #define UNCORE_PCI_DEV_TYPE(data) ((data >> 8) & 0xff) #define UNCORE_PCI_DEV_IDX(data) (data & 0xff) #define UNCORE_EXTRA_PCI_DEV 0xff -#define UNCORE_EXTRA_PCI_DEV_MAX 3 +#define UNCORE_EXTRA_PCI_DEV_MAX 4
#define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)
--- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -1029,6 +1029,7 @@ void snbep_uncore_cpu_init(void) enum { SNBEP_PCI_QPI_PORT0_FILTER, SNBEP_PCI_QPI_PORT1_FILTER, + BDX_PCI_QPI_PORT2_FILTER, HSWEP_PCI_PCU_3, };
@@ -3286,15 +3287,18 @@ static const struct pci_device_id bdx_un }, { /* QPI Port 0 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f86), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 0), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, + SNBEP_PCI_QPI_PORT0_FILTER), }, { /* QPI Port 1 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f96), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 1), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, + SNBEP_PCI_QPI_PORT1_FILTER), }, { /* QPI Port 2 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, + BDX_PCI_QPI_PORT2_FILTER), }, { /* PCU.3 (for Capability registers) */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fc0),
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anna-Maria Gleixner anna-maria@linutronix.de
commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream.
local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ.
This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit.
Use BIT(TIMER_SOFTIRQ) instead.
Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner anna-maria@linutronix.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Paul E. McKenney paulmck@linux.vnet.ibm.com Reviewed-by: Daniel Bristot de Oliveira bristot@redhat.com Acked-by: Frederic Weisbecker frederic@kernel.org Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -642,7 +642,7 @@ static void tick_nohz_restart(struct tic
static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & TIMER_SOFTIRQ; + return local_softirq_pending() & BIT(TIMER_SOFTIRQ); }
static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker frederic@kernel.org
commit 0a0e0829f990120cef165bbb804237f400953ec2 upstream.
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in a nesting interrupt. This stands as an optimization: whether a hardirq or a softirq is interrupted, the tick is going to be reprogrammed when necessary at the end of the inner interrupt, with even potential new updates on the timer queue.
When soft interrupts are interrupted, it's assumed that they are executing on the tail of an interrupt return. In that case tick_nohz_irq_exit() is called after softirq processing to take care of the tick reprogramming.
But the assumption is wrong: softirqs can be processed inline as well, ie: outside of an interrupt, like in a call to local_bh_enable() or from ksoftirqd.
Inline softirqs don't reprogram the tick once they are done, as opposed to interrupt tail softirq processing. So if a tick interrupts an inline softirq processing, the next timer will neither be reprogrammed from the interrupting tick's irq_exit() nor after the interrupted softirq processing. This situation may leave the tick unprogrammed while timers are armed.
To fix this, simply keep reprogramming the tick even if a softirq has been interrupted. That can be optimized further, but for now correctness is more important.
Note that new timers enqueued in nohz_full mode after a softirq gets interrupted will still be handled just fine through self-IPIs triggered by the timer code.
Reported-by: Anna-Maria Gleixner anna-maria@linutronix.de Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Anna-Maria Gleixner anna-maria@linutronix.de Cc: stable@vger.kernel.org # 4.14+ Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/softirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -387,7 +387,7 @@ static inline void tick_irq_exit(void)
/* Make sure that timer wheel updates are propagated */ if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) { - if (!in_interrupt()) + if (!in_irq()) tick_nohz_irq_exit(); } #endif
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream.
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe only to first 32 groups.
The check for correctness of .bind() userspace supplied parameter is done by applying mask made from ngroups shift. Which broke Android as they have 64 groups and the shift for mask resulted in an overflow.
Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups") Cc: "David S. Miller" davem@davemloft.net Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Steffen Klassert steffen.klassert@secunet.com Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-Tested-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netlink/af_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1013,8 +1013,8 @@ static int netlink_bind(struct socket *s
if (nlk->ngroups == 0) groups = 0; - else - groups &= (1ULL << nlk->ngroups) - 1; + else if (nlk->ngroups < 8*sizeof(groups)) + groups &= (1UL << nlk->ngroups) - 1;
bound = nlk->bound; if (bound) {
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu mhiramat@kernel.org
commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream.
Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0
We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbo...
Cc: Ingo Molnar mingo@redhat.com Cc: Shuah Khan shuah@kernel.org Cc: Tom Zanussi tom.zanussi@linux.intel.com Cc: Hiraku Toyooka hiraku.toyooka@cybertrust.co.jp Cc: stable@vger.kernel.org Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org [ Updated commit log and comment in the code ] Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/ring_buffer.h | 1 + kernel/trace/ring_buffer.c | 16 ++++++++++++++++ kernel/trace/trace.c | 6 ++++++ 3 files changed, 23 insertions(+)
--- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ri void ring_buffer_record_off(struct ring_buffer *buffer); void ring_buffer_record_on(struct ring_buffer *buffer); int ring_buffer_record_is_on(struct ring_buffer *buffer); +int ring_buffer_record_is_set_on(struct ring_buffer *buffer); void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu); void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3227,6 +3227,22 @@ int ring_buffer_record_is_on(struct ring }
/** + * ring_buffer_record_is_set_on - return true if the ring buffer is set writable + * @buffer: The ring buffer to see if write is set enabled + * + * Returns true if the ring buffer is set writable by ring_buffer_record_on(). + * Note that this does NOT mean it is in a writable state. + * + * It may return true when the ring buffer has been disabled by + * ring_buffer_record_disable(), as that is a temporary disabling of + * the ring buffer. + */ +int ring_buffer_record_is_set_on(struct ring_buffer *buffer) +{ + return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF); +} + +/** * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer * @buffer: The ring buffer to stop writes to. * @cpu: The CPU buffer to stop --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1375,6 +1375,12 @@ update_max_tr(struct trace_array *tr, st
arch_spin_lock(&tr->max_lock);
+ /* Inherit the recordable setting from trace_buffer */ + if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer)) + ring_buffer_record_on(tr->max_buffer.buffer); + else + ring_buffer_record_off(tr->max_buffer.buffer); + buf = tr->trace_buffer.buffer; tr->trace_buffer.buffer = tr->max_buffer.buffer; tr->max_buffer.buffer = buf;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream.
When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified extents in the inode's extent map tree, so that a "fast" fsync (the flag BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps and log corresponding extent items. However, at the end of range cloning operation we do truncate all the pages in the affected range (in order to ensure future reads will not get stale data). Sometimes this truncation will release the corresponding extent maps besides the pages from the page cache. If this happens, then a "fast" fsync operation will miss logging some extent items, because it relies exclusively on the extent maps being present in the inode's extent tree, leading to data loss/corruption if the fsync ends up using the same transaction used by the clone operation (that transaction was not committed in the meanwhile). An extent map is released through the callback btrfs_invalidatepage(), which gets called by truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The later ends up calling try_release_extent_mapping() which will release the extent map if some conditions are met, like the file size being greater than 16Mb, gfp flags allow blocking and the range not being locked (which is the case during the clone operation) nor being the extent map flagged as pinned (also the case for cloning).
The following example, turned into a test for fstests, reproduces the issue:
$ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt
$ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
$ xfs_io -c "fsync" /mnt/bar # reflink destination offset corresponds to the size of file bar, # 2728Kb minus 4Kb. $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar
$ md5sum /mnt/bar 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
<power fail>
$ mount /dev/sdb /mnt $ md5sum /mnt/bar 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the # power failure
In the above example, the destination offset of the clone operation corresponds to the size of the "bar" file minus 4Kb. So during the clone operation, the extent map covering the range from 2572Kb to 2728Kb gets trimmed so that it ends at offset 2724Kb, and a new extent map covering the range from 2724Kb to 11724Kb is created. So at the end of the clone operation when we ask to truncate the pages in the range from 2724Kb to 2724Kb + 15908Kb, the page invalidation callback ends up removing the new extent map (through try_release_extent_mapping()) when the page at offset 2724Kb is passed to that callback.
Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent map is removed at try_release_extent_mapping(), forcing the next fsync to search for modified extents in the fs/subvolume tree instead of relying on the presence of extent maps in memory. This way we can continue doing a "fast" fsync if the destination range of a clone operation does not overlap with an existing range or if any of the criteria necessary to remove an extent map at try_release_extent_mapping() is not met (file size not bigger then 16Mb or gfp flags do not allow blocking).
CC: stable@vger.kernel.org # 3.16+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_io.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4245,6 +4245,7 @@ int try_release_extent_mapping(struct ex struct extent_map *em; u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; + struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
if (gfpflags_allow_blocking(mask) && page->mapping->host->i_size > SZ_16M) { @@ -4267,6 +4268,8 @@ int try_release_extent_mapping(struct ex extent_map_end(em) - 1, EXTENT_LOCKED | EXTENT_WRITEBACK, 0, NULL)) { + set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, + &btrfs_inode->runtime_flags); remove_extent_mapping(map, em); /* once for the rb tree */ free_extent_map(em);
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Linus Torvalds torvalds@linux-foundation.org
commit a32e236eb93e62a0f692e79b7c3c9636689559b9 upstream.
It turns out that commit 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions"), while obviously correct, causes problems for some older lvm2 installations.
The reason is that the lvm snapshotting will continue to write to the snapshow COW volume, even after the volume has been marked read-only. End result: snapshot failure.
This has actually been fixed in newer version of the lvm2 tool, but the old tools still exist, and the breakage was reported both in the kernel bugzilla and in the Debian bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=200439 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
The lvm2 fix is here
https://sourceware.org/git/?p=lvm2.git%3Ba=commit%3Bh=a6fdb9d9d70f51c49ad11a...
but until everybody has updated to recent versions, we'll have to weaken the "never write to read-only partitions" check. It now allows the write to happen, but causes a warning, something like this:
generic_make_request: Trying to write to read-only block-device dm-3 (partno X) Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3 Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016 Workqueue: ksnaphd do_metadata RIP: 0010:generic_make_request_checks+0x4ac/0x600 ... Call Trace: generic_make_request+0x64/0x400 submit_bio+0x6c/0x140 dispatch_io+0x287/0x430 sync_io+0xc3/0x120 dm_io+0x1f8/0x220 do_metadata+0x1d/0x30 process_one_work+0x1b9/0x3e0 worker_thread+0x2b/0x3c0 kthread+0x113/0x130 ret_from_fork+0x35/0x40
Note that this is a "revert" in behavior only. I'm leaving alone the actual code cleanups in commit 721c7fc701c7, but letting the previously uncaught request go through with a warning instead of stopping it.
Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions") Reported-and-tested-by: WGH wgh@torlan.ru Acked-by: Mike Snitzer snitzer@redhat.com Cc: Sagi Grimberg sagi@grimberg.me Cc: Ilya Dryomov idryomov@gmail.com Cc: Jens Axboe axboe@kernel.dk Cc: Zdenek Kabelac zkabelac@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -2174,11 +2174,12 @@ static inline bool bio_check_ro(struct b if (part->policy && op_is_write(bio_op(bio))) { char b[BDEVNAME_SIZE];
- printk(KERN_ERR + WARN_ONCE(1, "generic_make_request: Trying to write " "to read-only block-device %s (partno %d)\n", bio_devname(bio, b), part->partno); - return true; + /* Older lvm-tools actually trigger this */ + return false; }
return false;
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Sandeen sandeen@sandeen.net
commit bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a upstream.
xfs_attr3_leaf_create may have errored out before instantiating a buffer, for example if the blkno is out of range. In that case there is no work to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops if we try.
This also seems to fix a flaw where the original error from xfs_attr3_leaf_create gets overwritten in the cleanup case, and it removes a pointless assignment to bp which isn't used after this.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969 Reported-by: Xu, Wen wen.xu@gatech.edu Tested-by: Xu, Wen wen.xu@gatech.edu Signed-off-by: Eric Sandeen sandeen@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Cc: Eduardo Valentin eduval@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/xfs/libxfs/xfs_attr_leaf.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -803,9 +803,8 @@ xfs_attr_shortform_to_leaf( ASSERT(blkno == 0); error = xfs_attr3_leaf_create(args, blkno, &bp); if (error) { - error = xfs_da_shrink_inode(args, 0, bp); - bp = NULL; - if (error) + /* xfs_attr3_leaf_create may not have instantiated a block */ + if (bp && (xfs_da_shrink_inode(args, 0, bp) != 0)) goto out; xfs_idata_realloc(dp, size, XFS_ATTR_FORK); /* try to put */ memcpy(ifp->if_u1.if_data, tmpbuffer, size); /* it back */
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
commit afca6c5b2595fc44383919fba740c194b0b76aff upstream.
A recent fuzzed filesystem image cached random dcache corruption when the reproducer was run. This often showed up as panics in lookup_slow() on a null inode->i_ops pointer when doing pathwalks.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 .... Call Trace: lookup_slow+0x44/0x60 walk_component+0x3dd/0x9f0 link_path_walk+0x4a7/0x830 path_lookupat+0xc1/0x470 filename_lookup+0x129/0x270 user_path_at_empty+0x36/0x40 path_listxattr+0x98/0x110 SyS_listxattr+0x13/0x20 do_syscall_64+0xf5/0x280 entry_SYSCALL_64_after_hwframe+0x42/0xb7
but had many different failure modes including deadlocks trying to lock the inode that was just allocated or KASAN reports of use-after-free violations.
The cause of the problem was a corrupt INOBT on a v4 fs where the root inode was marked as free in the inobt record. Hence when we allocated an inode, it chose the root inode to allocate, found it in the cache and re-initialised it.
We recently fixed a similar inode allocation issue caused by inobt record corruption problem in xfs_iget_cache_miss() in commit ee457001ed6c ("xfs: catch inode allocation state mismatch corruption"). This change adds similar checks to the cache-hit path to catch it, and turns the reproducer into a corruption shutdown situation.
Reported-by: Wen Xu wen.xu@gatech.edu Signed-Off-By: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Carlos Maiolino cmaiolino@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com [darrick: fix typos in comment] Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Cc: Eduardo Valentin eduval@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/xfs/xfs_icache.c | 73 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 48 insertions(+), 25 deletions(-)
--- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -309,6 +309,46 @@ xfs_reinit_inode( }
/* + * If we are allocating a new inode, then check what was returned is + * actually a free, empty inode. If we are not allocating an inode, + * then check we didn't find a free inode. + * + * Returns: + * 0 if the inode free state matches the lookup context + * -ENOENT if the inode is free and we are not allocating + * -EFSCORRUPTED if there is any state mismatch at all + */ +static int +xfs_iget_check_free_state( + struct xfs_inode *ip, + int flags) +{ + if (flags & XFS_IGET_CREATE) { + /* should be a free inode */ + if (VFS_I(ip)->i_mode != 0) { + xfs_warn(ip->i_mount, +"Corruption detected! Free inode 0x%llx not marked free! (mode 0x%x)", + ip->i_ino, VFS_I(ip)->i_mode); + return -EFSCORRUPTED; + } + + if (ip->i_d.di_nblocks != 0) { + xfs_warn(ip->i_mount, +"Corruption detected! Free inode 0x%llx has blocks allocated!", + ip->i_ino); + return -EFSCORRUPTED; + } + return 0; + } + + /* should be an allocated inode */ + if (VFS_I(ip)->i_mode == 0) + return -ENOENT; + + return 0; +} + +/* * Check the validity of the inode we just found it the cache */ static int @@ -357,12 +397,12 @@ xfs_iget_cache_hit( }
/* - * If lookup is racing with unlink return an error immediately. + * Check the inode free state is valid. This also detects lookup + * racing with unlinks. */ - if (VFS_I(ip)->i_mode == 0 && !(flags & XFS_IGET_CREATE)) { - error = -ENOENT; + error = xfs_iget_check_free_state(ip, flags); + if (error) goto out_error; - }
/* * If IRECLAIMABLE is set, we've torn down the VFS inode already. @@ -485,29 +525,12 @@ xfs_iget_cache_miss(
/* - * If we are allocating a new inode, then check what was returned is - * actually a free, empty inode. If we are not allocating an inode, - * the check we didn't find a free inode. + * Check the inode free state is valid. This also detects lookup + * racing with unlinks. */ - if (flags & XFS_IGET_CREATE) { - if (VFS_I(ip)->i_mode != 0) { - xfs_warn(mp, -"Corruption detected! Free inode 0x%llx not marked free on disk", - ino); - error = -EFSCORRUPTED; - goto out_destroy; - } - if (ip->i_d.di_nblocks != 0) { - xfs_warn(mp, -"Corruption detected! Free inode 0x%llx has blocks allocated!", - ino); - error = -EFSCORRUPTED; - goto out_destroy; - } - } else if (VFS_I(ip)->i_mode == 0) { - error = -ENOENT; + error = xfs_iget_check_free_state(ip, flags); + if (error) goto out_destroy; - }
/* * Preload the radix tree so we can insert safely under the
4.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shankara Pailoor shankarapailoor@gmail.com
commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream.
The code is assuming the buffer is max_size length, but we weren't allocating enough space for it.
Signed-off-by: Shankara Pailoor shankarapailoor@gmail.com Signed-off-by: Dave Kleikamp dave.kleikamp@oracle.com Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jfs/xattr.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -491,15 +491,17 @@ static int ea_get(struct inode *inode, s if (size > PSIZE) { /* * To keep the rest of the code simple. Allocate a - * contiguous buffer to work with + * contiguous buffer to work with. Make the buffer large + * enough to make use of the whole extent. */ - ea_buf->xattr = kmalloc(size, GFP_KERNEL); + ea_buf->max_size = (size + sb->s_blocksize - 1) & + ~(sb->s_blocksize - 1); + + ea_buf->xattr = kmalloc(ea_buf->max_size, GFP_KERNEL); if (ea_buf->xattr == NULL) return -ENOMEM;
ea_buf->flag = EA_MALLOC; - ea_buf->max_size = (size + sb->s_blocksize - 1) & - ~(sb->s_blocksize - 1);
if (ea_size == 0) return 0;
On 08/07/2018 12:51 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.14-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On 8 August 2018 at 00:21, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.14-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.17.14-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.17.y git commit: 53b91e019709af31eaf31b16123585f05c00a411 git describe: v4.17.13-19-g53b91e019709 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.17-oe/build/v4.17.13-19...
No regressions (compared to build v4.17.13)
Ran 13918 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Wed, Aug 08, 2018 at 10:50:20AM +0530, Naresh Kamboju wrote:
On 8 August 2018 at 00:21, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.17.14-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.17.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Great! Thanks for testing these and letting me know.
greg k-h
On Tue, Aug 07, 2018 at 08:51:03PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
Build results: total: 134 pass: 134 fail: 0 Qemu test results: total: 290 pass: 290 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
On Wed, Aug 08, 2018 at 08:48:27AM -0700, Guenter Roeck wrote:
On Tue, Aug 07, 2018 at 08:51:03PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.17.14 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Aug 9 17:23:05 UTC 2018. Anything received after that time might be too late.
Build results: total: 134 pass: 134 fail: 0 Qemu test results: total: 290 pass: 290 fail: 0
Details are available at http://kerneltests.org/builders/.
Thanks for testing all of these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org