This is the start of the stable review cycle for the 4.14.55 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Jul 12 18:24:36 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.55-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.55-rc1
Sebastian Andrzej Siewior bigeasy@linutronix.de Revert mm/vmstat.c: fix vmstat_update() preemption BUG
Sebastian Andrzej Siewior bigeasy@linutronix.de sched, tracing: Fix trace_sched_pi_setprio() for deboosting
Dan Carpenter dan.carpenter@oracle.com staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
Jann Horn jannh@google.com netfilter: nf_log: don't hold nf_log_mutex during user access
Tokunori Ikegami ikegami@allied-telesis.co.jp mtd: cfi_cmdset_0002: Change erase functions to check chip good only
Tokunori Ikegami ikegami@allied-telesis.co.jp mtd: cfi_cmdset_0002: Change erase functions to retry for error
Tokunori Ikegami ikegami@allied-telesis.co.jp mtd: cfi_cmdset_0002: Change definition naming to retry write operation
Ross Zwisler ross.zwisler@linux.intel.com dm: prevent DAX mounts if not supported
Mike Snitzer snitzer@redhat.com dm: set QUEUE_FLAG_DAX accordingly in dm_table_set_restrictions()
Ross Zwisler ross.zwisler@linux.intel.com dax: check for QUEUE_FLAG_DAX in bdev_dax_supported()
Dave Jiang dave.jiang@intel.com dax: change bdev_dax_supported() to support boolean returns
Darrick J. Wong darrick.wong@oracle.com fs: allow per-device dax status checking for filesystems
Martin Kaiser martin@kaiser.cx mtd: rawnand: mxc: set spare area size register explicitly
Brad Love brad@nextdimension.cc media: cx25840: Use subdev host data for PLL override
Rasmus Villemoes linux@rasmusvillemoes.dk Kbuild: fix # escaping in .cmd files for future Make
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "dpaa_eth: fix error in dpaa_remove()"
Jaegeuk Kim jaegeuk@kernel.org f2fs: truncate preallocated blocks in error case
Sakari Ailus sakari.ailus@linux.intel.com media: vb2: core: Finish buffers at the end of the stream
Naoya Horiguchi n-horiguchi@ah.jp.nec.com mm: hwpoison: disable memory error handling on 1GB hugepage
Rakib Mullick rakib.mullick@gmail.com irq/core: Fix boot crash when the irqaffinity= boot parameter is passed on CPUMASK_OFFSTACK=y kernels(v1)
Daniel Rosenberg drosen@google.com HID: debug: check length before copy_to_user()
Gustavo A. R. Silva gustavo@embeddedor.com HID: hiddev: fix potential Spectre v1
Jason Andryuk jandryuk@gmail.com HID: i2c-hid: Fix "incomplete report" noise
Ilya Dryomov idryomov@gmail.com block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()
Ilya Dryomov idryomov@gmail.com block: factor out __blkdev_issue_zero_pages()
Jon Derrick jonathan.derrick@intel.com ext4: check superblock mapped prior to committing
Theodore Ts'o tytso@mit.edu ext4: add more mount time checks of the superblock
Theodore Ts'o tytso@mit.edu ext4: add more inode number paranoia checks
Theodore Ts'o tytso@mit.edu ext4: avoid running out of journal credits when appending to an inline file
Theodore Ts'o tytso@mit.edu ext4: never move the system.data xattr out of the inode body
Theodore Ts'o tytso@mit.edu ext4: clear i_data in ext4_inode_info when removing inline data
Theodore Ts'o tytso@mit.edu ext4: include the illegal physical block in the bad map ext4_error msg
Theodore Ts'o tytso@mit.edu ext4: verify the depth of extent tree in ext4_find_extent()
Theodore Ts'o tytso@mit.edu ext4: only look at the bg_flags field if it is valid
Theodore Ts'o tytso@mit.edu ext4: always check block group bounds in ext4_init_block_bitmap()
Theodore Ts'o tytso@mit.edu ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
Theodore Ts'o tytso@mit.edu ext4: always verify the magic number in xattr blocks
Theodore Ts'o tytso@mit.edu ext4: add corruption check in ext4_xattr_set_entry()
Theodore Ts'o tytso@mit.edu jbd2: don't mark block as modified if the handle is out of credits
Mikulas Patocka mpatocka@redhat.com drm/udl: fix display corruption of the last line
Michel Dänzer michel.daenzer@amd.com drm: Use kvzalloc for allocating blob property memory
Stefano Brivio sbrivio@redhat.com cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting
Paulo Alcantara paulo@paulo.ac cifs: Fix infinite loop when using hard mount option
Paulo Alcantara paulo@paulo.ac cifs: Fix memory leak in smb2_set_ea()
Lars Persson lars.persson@axis.com cifs: Fix use after free of a mid_q_entry
Jason Gunthorpe jgg@mellanox.com vfio: Use get_user_pages_longterm correctly
Lars Ellenberg lars.ellenberg@linbit.com drbd: fix access after free
Christian Borntraeger borntraeger@de.ibm.com s390: Correct register corruption in critical section cleanup
David Disseldorp ddiss@suse.de scsi: target: Fix truncated PR-in ReadKeys response
Jann Horn jannh@google.com scsi: sg: mitigate read/write abuse
Changbin Du changbin.du@intel.com tracing: Fix missing return symbol in function_graph output
Cannon Matthews cannonmatthews@google.com mm: hugetlb: yield when prepping struct pages
Janosch Frank frankja@linux.ibm.com userfaultfd: hugetlbfs: fix userfaultfd_huge_must_wait() pte access
-------------
Diffstat:
 Makefile                                         |   4 +-
 arch/s390/kernel/entry.S                         |   4 +-
 block/blk-lib.c                                  | 108 +++++++++++++++--------
 drivers/block/drbd/drbd_worker.c                 |   2 +-
 drivers/dax/super.c                              |  42 +++++----
 drivers/gpu/drm/drm_property.c                   |   6 +-
 drivers/gpu/drm/udl/udl_fb.c                     |   5 +-
 drivers/gpu/drm/udl/udl_transfer.c               |  11 ++-
 drivers/hid/hid-debug.c                          |   8 +-
 drivers/hid/i2c-hid/i2c-hid.c                    |   2 +-
 drivers/hid/usbhid/hiddev.c                      |  11 +++
 drivers/md/dm-table.c                            |   9 +-
 drivers/md/dm.c                                  |   6 +-
 drivers/media/i2c/cx25840/cx25840-core.c         |  28 ++++--
 drivers/media/v4l2-core/videobuf2-core.c         |   9 ++
 drivers/mtd/chips/cfi_cmdset_0002.c              |  30 +++++--
 drivers/mtd/nand/mxc_nand.c                      |   5 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c   |   2 +-
 drivers/scsi/sg.c                                |  42 ++++++++-
 drivers/staging/comedi/drivers/quatech_daqp_cs.c |   2 +-
 drivers/target/target_core_pr.c                  |  15 ++--
 drivers/vfio/vfio_iommu_type1.c                  |  16 ++--
 fs/cifs/cifsglob.h                               |   1 +
 fs/cifs/cifsproto.h                              |   1 +
 fs/cifs/cifssmb.c                                |  10 ++-
 fs/cifs/connect.c                                |   8 +-
 fs/cifs/smb1ops.c                                |   1 +
 fs/cifs/smb2ops.c                                |   3 +
 fs/cifs/smb2pdu.c                                |  25 ++++--
 fs/cifs/smb2transport.c                          |   1 +
 fs/cifs/transport.c                              |  18 +++-
 fs/ext2/super.c                                  |   3 +-
 fs/ext4/balloc.c                                 |  21 +++--
 fs/ext4/ext4.h                                   |   8 --
 fs/ext4/ext4_extents.h                           |   1 +
 fs/ext4/extents.c                                |   6 ++
 fs/ext4/ialloc.c                                 |  14 ++-
 fs/ext4/inline.c                                 |  39 +-------
 fs/ext4/inode.c                                  |   7 +-
 fs/ext4/mballoc.c                                |   6 +-
 fs/ext4/super.c                                  |  89 ++++++++++++++++---
 fs/ext4/xattr.c                                  |  40 ++++-----
 fs/f2fs/file.c                                   |   9 ++
 fs/jbd2/transaction.c                            |   9 +-
 fs/userfaultfd.c                                 |  12 +--
 fs/xfs/xfs_ioctl.c                               |   3 +-
 fs/xfs/xfs_iops.c                                |  30 +++++--
 fs/xfs/xfs_super.c                               |  10 ++-
 include/linux/dax.h                              |  11 +--
 include/linux/mm.h                               |   1 +
 include/trace/events/sched.h                     |   4 +-
 kernel/irq/irqdesc.c                             |   6 +-
 kernel/trace/trace_functions_graph.c             |   5 +-
 mm/hugetlb.c                                     |   1 +
 mm/memory-failure.c                              |  16 ++++
 mm/vmstat.c                                      |   2 -
 net/netfilter/nf_log.c                           |   9 +-
 scripts/Kbuild.include                           |   5 +-
 tools/build/Build.include                        |   5 +-
 tools/objtool/Makefile                           |   2 +-
 tools/scripts/Makefile.include                   |   2 +
 61 files changed, 556 insertions(+), 255 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janosch Frank frankja@linux.ibm.com
commit 1e2c043628c7736dd56536d16c0ce009bc834ae7 upstream.
Use huge_ptep_get() to translate huge ptes to normal ptes so we can check them with the huge_pte_* functions. Otherwise some architectures will check the wrong values and will not wait for userspace to bring in the memory.
Link: http://lkml.kernel.org/r/20180626132421.78084-1-frankja@linux.ibm.com Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges") Signed-off-by: Janosch Frank frankja@linux.ibm.com Reviewed-by: David Hildenbrand david@redhat.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/userfaultfd.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -220,24 +220,26 @@ static inline bool userfaultfd_huge_must
 					 unsigned long reason)
 {
 	struct mm_struct *mm = ctx->mm;
-	pte_t *pte;
+	pte_t *ptep, pte;
 	bool ret = true;
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
-	if (!pte)
+	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
+
+	if (!ptep)
 		goto out;
 
 	ret = false;
+	pte = huge_ptep_get(ptep);
 
 	/*
	 * Lockless access: we're in a wait_event so it's ok if it
	 * changes under us.
	 */
-	if (huge_pte_none(*pte))
+	if (huge_pte_none(pte))
		ret = true;
-	if (!huge_pte_write(*pte) && (reason & VM_UFFD_WP))
+	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
		ret = true;
 out:
	return ret;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cannon Matthews cannonmatthews@google.com
commit 520495fe96d74e05db585fc748351e0504d8f40d upstream.
When booting with very large numbers of gigantic (i.e. 1G) pages, the operations in the loop of gather_bootmem_prealloc, and specifically prep_compound_gigantic_page, take a very long time and can cause a softlockup if enough pages are requested at boot.
For example booting with 3844 1G pages requires prepping (set_compound_head, init the count) over 1 billion 4K tail pages, which takes considerable time.
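As a rough sanity check on that figure (assuming 4K base pages, as the report above implies): one 1G gigantic page spans 1 GiB / 4 KiB = 262,144 base pages, so 3844 of them mean about 3844 * 262,144 ~= 1.0 * 10^9 tail struct pages to initialize in a single uninterrupted loop.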
Add a cond_resched() to the outer loop in gather_bootmem_prealloc() to prevent this lockup.
Tested: Booted with softlockup_panic=1 hugepagesz=1G hugepages=3844 and no softlockup is reported, and the hugepages are reported as successfully setup.
Link: http://lkml.kernel.org/r/20180627214447.260804-1-cannonmatthews@google.com Signed-off-by: Cannon Matthews cannonmatthews@google.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Michal Hocko mhocko@suse.com Cc: Andres Lagar-Cavilla andreslc@google.com Cc: Peter Feiner pfeiner@google.com Cc: Greg Thelen gthelen@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/hugetlb.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2159,6 +2159,7 @@ static void __init gather_bootmem_preall
 		 */
 		if (hstate_is_gigantic(h))
 			adjust_managed_page_count(page, 1 << h->order);
+		cond_resched();
 	}
 }
 
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Changbin Du changbin.du@intel.com
commit 1fe4293f4b8de75824935f8d8e9a99c7fc6873da upstream.
The function_graph tracer does not show the interrupt return marker for the leaf entry. On leaf entries, we see an unbalanced interrupt marker (the interrupt was entered, but never left).
Before:
 1)               |  SyS_write() {
 1)               |    __fdget_pos() {
 1)   0.061 us    |      __fget_light();
 1)   0.289 us    |    }
 1)               |    vfs_write() {
 1)   0.049 us    |      rw_verify_area();
 1) + 15.424 us   |      __vfs_write();
 1)   ==========> |
 1)   6.003 us    |      smp_apic_timer_interrupt();
 1)   0.055 us    |      __fsnotify_parent();
 1)   0.073 us    |      fsnotify();
 1) + 23.665 us   |    }
 1) + 24.501 us   |  }

After:
 0)               |  SyS_write() {
 0)               |    __fdget_pos() {
 0)   0.052 us    |      __fget_light();
 0)   0.328 us    |    }
 0)               |    vfs_write() {
 0)   0.057 us    |      rw_verify_area();
 0)               |      __vfs_write() {
 0)   ==========> |
 0)   8.548 us    |        smp_apic_timer_interrupt();
 0)   <========== |
 0) + 36.507 us   |      } /* __vfs_write */
 0)   0.049 us    |      __fsnotify_parent();
 0)   0.066 us    |      fsnotify();
 0) + 50.064 us   |    }
 0) + 50.952 us   |  }
Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel...
Cc: stable@vger.kernel.org Fixes: f8b755ac8e0cc ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return") Signed-off-by: Changbin Du changbin.du@intel.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace_functions_graph.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace_functions_graph.c +++ b/kernel/trace/trace_functions_graph.c @@ -831,6 +831,7 @@ print_graph_entry_leaf(struct trace_iter struct ftrace_graph_ret *graph_ret; struct ftrace_graph_ent *call; unsigned long long duration; + int cpu = iter->cpu; int i;
graph_ret = &ret_entry->ret; @@ -839,7 +840,6 @@ print_graph_entry_leaf(struct trace_iter
if (data) { struct fgraph_cpu_data *cpu_data; - int cpu = iter->cpu;
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
@@ -869,6 +869,9 @@ print_graph_entry_leaf(struct trace_iter
trace_seq_printf(s, "%ps();\n", (void *)call->func);
+ print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET, + cpu, iter->ent->pid, flags); + return trace_handle_return(s); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 26b5b874aff5659a7e26e5b1997e3df2c41fa7fd upstream.
As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit to be called under KERNEL_DS"), sg improperly accesses userspace memory outside the provided buffer, permitting kernel memory corruption via splice(). But it doesn't just do this on ->write(); it does it on ->read() as well.
As a band-aid, make sure that the ->read() and ->write() handlers can not be called in weird contexts (kernel context or credentials different from file opener), like for ib_safe_file_access().
If someone needs to use these interfaces from different security contexts, a new interface should be written that goes through the ->ioctl() handler.
I've mostly copypasted ib_safe_file_access() over as sg_safe_file_access() because I couldn't find a good common header - please tell me if you know a better way.
[mkp: s/_safe_/_check_/]
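For readers unfamiliar with the credential check, here is a tiny userspace analogue (illustration only, not the kernel code; the handle type and helper names are made up): it records the opener's effective uid and refuses I/O once the security context has changed, which is the same idea as comparing filp->f_cred against current_real_cred() in sg_check_file_access().

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical resource handle that remembers who opened it. */
struct handle {
	uid_t opener_euid;
};

static void handle_open(struct handle *h)
{
	h->opener_euid = geteuid();	/* pin the opener's identity */
}

static int handle_io(const struct handle *h)
{
	if (geteuid() != h->opener_euid) {
		fprintf(stderr, "security context changed since open, refusing I/O\n");
		return -1;
	}
	return 0;
}

int main(void)
{
	struct handle h;

	handle_open(&h);
	return handle_io(&h) ? 1 : 0;
}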
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn jannh@google.com Acked-by: Douglas Gilbert dgilbert@interlog.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sg.c | 42 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-)
--- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -51,6 +51,7 @@ static int sg_version_num = 30536; /* 2 #include <linux/atomic.h> #include <linux/ratelimit.h> #include <linux/uio.h> +#include <linux/cred.h> /* for sg_check_file_access() */
#include "scsi.h" #include <scsi/scsi_dbg.h> @@ -210,6 +211,33 @@ static void sg_device_destroy(struct kre sdev_prefix_printk(prefix, (sdp)->device, \ (sdp)->disk->disk_name, fmt, ##a)
+/* + * The SCSI interfaces that use read() and write() as an asynchronous variant of + * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways + * to trigger read() and write() calls from various contexts with elevated + * privileges. This can lead to kernel memory corruption (e.g. if these + * interfaces are called through splice()) and privilege escalation inside + * userspace (e.g. if a process with access to such a device passes a file + * descriptor to a SUID binary as stdin/stdout/stderr). + * + * This function provides protection for the legacy API by restricting the + * calling context. + */ +static int sg_check_file_access(struct file *filp, const char *caller) +{ + if (filp->f_cred != current_real_cred()) { + pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", + caller, task_tgid_vnr(current), current->comm); + return -EPERM; + } + if (uaccess_kernel()) { + pr_err_once("%s: process %d (%s) called from kernel context, this is not allowed.\n", + caller, task_tgid_vnr(current), current->comm); + return -EACCES; + } + return 0; +} + static int sg_allow_access(struct file *filp, unsigned char *cmd) { struct sg_fd *sfp = filp->private_data; @@ -394,6 +422,14 @@ sg_read(struct file *filp, char __user * struct sg_header *old_hdr = NULL; int retval = 0;
+ /* + * This could cause a response to be stranded. Close the associated + * file descriptor to free up any resources being held. + */ + retval = sg_check_file_access(filp, __func__); + if (retval) + return retval; + if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) return -ENXIO; SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, @@ -581,9 +617,11 @@ sg_write(struct file *filp, const char _ struct sg_header old_hdr; sg_io_hdr_t *hp; unsigned char cmnd[SG_MAX_CDB_SIZE]; + int retval;
- if (unlikely(uaccess_kernel())) - return -EINVAL; + retval = sg_check_file_access(filp, __func__); + if (retval) + return retval;
if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) return -ENXIO;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Disseldorp ddiss@suse.de
commit 63ce3c384db26494615e3c8972bcd419ed71f4c4 upstream.
SPC5r17 states that the contents of the ADDITIONAL LENGTH field are not altered based on the allocation length, so always calculate and pack the full key list length even if the list itself is truncated.
According to Maged:
Yes it fixes the "Storage Spaces Persistent Reservation" test in the Windows 2016 Server Failover Cluster validation suites when having many connections that result in more than 8 registrations. I tested your patch on 4.17 with iblock.
This behaviour can be tested using the libiscsi PrinReadKeys.Truncate test.
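To make the ADDITIONAL LENGTH rule concrete, here is a small standalone sketch (illustration only, not the target_core_pr.c code; buffer sizes and key values are made up): the packed key list is truncated to the allocation length, but the reported length keeps counting every registration.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, v >> 32);
	put_be32(p + 4, (uint32_t)v);
}

int main(void)
{
	uint64_t keys[12];		/* 12 registrations, i.e. more than 8 as in the report */
	uint8_t buf[8 + 8 * 8];		/* allocation length only has room for 8 keys */
	uint32_t data_length = sizeof(buf), off = 8, add_len = 0;

	for (unsigned i = 0; i < 12; i++)
		keys[i] = 0x1000 + i;

	memset(buf, 0, sizeof(buf));
	for (unsigned i = 0; i < 12; i++) {
		if (off + 8 <= data_length) {	/* copy only the keys that fit */
			put_be64(&buf[off], keys[i]);
			off += 8;
		}
		add_len += 8;			/* but count every key */
	}
	put_be32(&buf[4], add_len);		/* ADDITIONAL LENGTH, per SPC5r17 */
	printf("packed %u bytes of keys, ADDITIONAL LENGTH = %u\n", off - 8, add_len);
	return 0;
}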
Cc: stable@vger.kernel.org Signed-off-by: David Disseldorp ddiss@suse.de Reviewed-by: Mike Christie mchristi@redhat.com Tested-by: Maged Mokhtar mmokhtar@petasan.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/target/target_core_pr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/drivers/target/target_core_pr.c +++ b/drivers/target/target_core_pr.c @@ -3729,11 +3729,16 @@ core_scsi3_pri_read_keys(struct se_cmd * * Check for overflow of 8byte PRI READ_KEYS payload and * next reservation key list descriptor. */ - if ((add_len + 8) > (cmd->data_length - 8)) - break; - - put_unaligned_be64(pr_reg->pr_res_key, &buf[off]); - off += 8; + if (off + 8 <= cmd->data_length) { + put_unaligned_be64(pr_reg->pr_res_key, &buf[off]); + off += 8; + } + /* + * SPC5r17: 6.16.2 READ KEYS service action + * The ADDITIONAL LENGTH field indicates the number of bytes in + * the Reservation key list. The contents of the ADDITIONAL + * LENGTH field are not altered based on the allocation length + */ add_len += 8; } spin_unlock(&dev->t10_pr.registration_lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Borntraeger borntraeger@de.ibm.com
commit 891f6a726cacbb87e5b06076693ffab53bd378d7 upstream.
In the critical section cleanup we must not mess with r1. For march=z9 or older, larl + ex (instead of exrl) are used with r1 as a temporary register. This can clobber r1 in several interrupt handlers. Fix this by using r11 as a temp register. r11 is being saved by all callers of cleanup_critical.
Fixes: 6dd85fbb87 ("s390: move expoline assembler macros to a header") Cc: stable@vger.kernel.org #v4.16 Reported-by: Oliver Kurz okurz@suse.com Reported-by: Petr Tesařík ptesarik@suse.com Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Reviewed-by: Hendrik Brueckner brueckner@linux.ibm.com Signed-off-by: Martin Schwidefsky schwidefsky@de.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/s390/kernel/entry.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -1244,7 +1244,7 @@ cleanup_critical:
 	jl	0f
 	clg	%r9,BASED(.Lcleanup_table+104)	# .Lload_fpu_regs_end
 	jl	.Lcleanup_load_fpu_regs
-0:	BR_EX	%r14
+0:	BR_EX	%r14,%r11
 
 	.align	8
 .Lcleanup_table:
@@ -1280,7 +1280,7 @@ cleanup_critical:
 	ni	__SIE_PROG0C+3(%r9),0xfe	# no longer in SIE
 	lctlg	%c1,%c1,__LC_USER_ASCE		# load primary asce
 	larl	%r9,sie_exit			# skip forward to sie_exit
-	BR_EX	%r14
+	BR_EX	%r14,%r11
 #endif
 
 .Lcleanup_system_call:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lars Ellenberg lars.ellenberg@linbit.com
commit 64dafbc9530c10300acffc57fae3269d95fa8f93 upstream.
We have struct drbd_requests { ... struct bio *private_bio; ... } to hold a bio clone for local submission.
On local IO completion, we put that bio, and in case we want to use the result later, we overload that member to hold the ERR_PTR() of the completion result,
Which, before v4.3, used to be the passed in "int error", so we could first bio_put(), then assign.
v4.3-rc1~100^2~21 4246a0b63bd8 block: add a bi_error field to struct bio changed that:

 	bio_put(req->private_bio);
-	req->private_bio = ERR_PTR(error);
+	req->private_bio = ERR_PTR(bio->bi_error);
Which introduces an access after free, because it was non-obvious that req->private_bio == bio.
The impact of that was mostly unnoticeable, because we only use that value in a multiple-failure case, and even then map any "unexpected" error code to EIO, so in the worst case we could potentially mask a more specific error with EIO.
Unless the pointed to memory region was unmapped, as is the case with CONFIG_DEBUG_PAGEALLOC, in which case this results in
BUG: unable to handle kernel paging request
v4.13-rc1~70^2~75 4e4cbee93d56 block: switch bios to blk_status_t changes it further to

	bio_put(req->private_bio);
	req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));
And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected values, which catches this "sometimes", if the memory has been reused quickly enough for other things.
Should also go into stable since 4.3, with the trivial change around 4.13.
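The ordering problem is easy to see in a reduced, standalone form (illustration only, not the drbd code): because req->private_bio and bio alias the same object, the completion status has to be read while the bio is still alive, before the reference is dropped.

#include <stdio.h>
#include <stdlib.h>

struct bio {
	int status;
};

struct request {
	struct bio *private_bio;	/* aliases the bio clone */
};

int main(void)
{
	struct bio *bio = malloc(sizeof(*bio));
	struct request req = { .private_bio = bio };
	int status;

	bio->status = 5;

	/* Buggy order (what the old code effectively did):
	 *	free(req.private_bio);		-- also frees bio, they alias
	 *	status = bio->status;		-- use after free
	 *
	 * Fixed order: read first, then drop the reference.
	 */
	status = bio->status;
	free(bio);
	req.private_bio = NULL;

	printf("saved status %d before dropping the bio\n", status);
	return 0;
}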
Cc: stable@vger.kernel.org Fixes: 4246a0b63bd8 block: add a bi_error field to struct bio Reported-by: Sarah Newman srn@prgmr.com Signed-off-by: Lars Ellenberg lars.ellenberg@linbit.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/block/drbd/drbd_worker.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -282,8 +282,8 @@ void drbd_request_endio(struct bio *bio)
 		what = COMPLETED_OK;
 	}
 
-	bio_put(req->private_bio);
 	req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status));
+	bio_put(bio);
 
 	/* not req_mod(), we need irqsave here! */
 	spin_lock_irqsave(&device->resource->req_lock, flags);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@mellanox.com
commit bb94b55af3461e26b32f0e23d455abeae0cfca5d upstream.
The patch noted in the fixes below converted get_user_pages_fast() to get_user_pages_longterm(), however the two calls differ in a few ways.
First _fast() is documented to not require the mmap_sem, while _longterm() is documented to need it. Hold the mmap sem as required.
Second, _fast accepts an 'int write' while _longterm uses 'unsigned int gup_flags', so the expression '!!(prot & IOMMU_WRITE)' is only working by luck as FOLL_WRITE is currently == 0x1. Use the expected FOLL_WRITE constant instead.
Fixes: 94db151dc892 ("vfio: disable filesystem-dax page pinning") Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe jgg@mellanox.com Acked-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vfio/vfio_iommu_type1.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-)
--- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -339,18 +339,16 @@ static int vaddr_get_pfn(struct mm_struc struct page *page[1]; struct vm_area_struct *vma; struct vm_area_struct *vmas[1]; + unsigned int flags = 0; int ret;
+ if (prot & IOMMU_WRITE) + flags |= FOLL_WRITE; + + down_read(&mm->mmap_sem); if (mm == current->mm) { - ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE), - page, vmas); + ret = get_user_pages_longterm(vaddr, 1, flags, page, vmas); } else { - unsigned int flags = 0; - - if (prot & IOMMU_WRITE) - flags |= FOLL_WRITE; - - down_read(&mm->mmap_sem); ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, vmas, NULL); /* @@ -364,8 +362,8 @@ static int vaddr_get_pfn(struct mm_struc ret = -EOPNOTSUPP; put_page(page[0]); } - up_read(&mm->mmap_sem); } + up_read(&mm->mmap_sem);
if (ret == 1) { *pfn = page_to_pfn(page[0]);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lars Persson lars.persson@axis.com
commit 696e420bb2a6624478105651d5368d45b502b324 upstream.
With protocol version 2.0 mounts we have seen crashes with corrupt mid entries. Either the server->pending_mid_q list becomes corrupt with a cyclic reference in one element or a mid object fetched by the demultiplexer thread becomes overwritten during use.
Code review identified a race between the demultiplexer thread and the request issuing thread. The demultiplexer thread seems to be written with the assumption that it is the sole user of the mid object until it calls the mid callback which either wakes the issuer task or deletes the mid.
This assumption is not true because the issuer task can be woken up earlier by a signal. If the demultiplexer thread has proceeded as far as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer thread will happily end up calling cifs_delete_mid while the demultiplexer thread still is using the mid object.
Inserting a delay in the cifs demultiplexer thread widens the race window and makes reproduction of the race very easy:
	if (server->large_buf)
		buf = server->bigbuf;
+ usleep_range(500, 4000);
server->lstrp = jiffies;
To resolve this I think the proper solution involves putting a reference count on the mid object. This patch makes sure that the demultiplexer thread holds a reference until it has finished processing the transaction.
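Conceptually this is the usual reference-counting pattern; a minimal standalone sketch (not the cifs code, names are made up): each thread holds its own reference and the object is only freed when the last one is dropped, so an issuer woken early by a signal can no longer free the mid under the demultiplexer.

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct mid {
	atomic_int refcount;
	int state;
};

static struct mid *mid_alloc(void)
{
	struct mid *m = calloc(1, sizeof(*m));

	atomic_init(&m->refcount, 1);	/* issuer's reference */
	return m;
}

static void mid_get(struct mid *m)
{
	atomic_fetch_add(&m->refcount, 1);
}

static void mid_put(struct mid *m)
{
	if (atomic_fetch_sub(&m->refcount, 1) == 1) {
		printf("last reference dropped, freeing mid\n");
		free(m);
	}
}

int main(void)
{
	struct mid *m = mid_alloc();

	mid_get(m);	/* demultiplexer takes its own reference */
	mid_put(m);	/* issuer is done (possibly woken by a signal) */
	mid_put(m);	/* demultiplexer finishes; object freed only here */
	return 0;
}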
Cc: stable@vger.kernel.org Signed-off-by: Lars Persson larper@axis.com Acked-by: Paulo Alcantara palcantara@suse.de Reviewed-by: Ronnie Sahlberg lsahlber@redhat.com Reviewed-by: Pavel Shilovsky pshilov@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifsglob.h | 1 + fs/cifs/cifsproto.h | 1 + fs/cifs/connect.c | 8 +++++++- fs/cifs/smb1ops.c | 1 + fs/cifs/smb2ops.c | 1 + fs/cifs/smb2transport.c | 1 + fs/cifs/transport.c | 18 +++++++++++++++++- 7 files changed, 29 insertions(+), 2 deletions(-)
--- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1340,6 +1340,7 @@ typedef int (mid_handle_t)(struct TCP_Se /* one of these for every pending CIFS request to the server */ struct mid_q_entry { struct list_head qhead; /* mids waiting on reply from this server */ + struct kref refcount; struct TCP_Server_Info *server; /* server corresponding to this mid */ __u64 mid; /* multiplex id */ __u32 pid; /* process id */ --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -76,6 +76,7 @@ extern struct mid_q_entry *AllocMidQEntr struct TCP_Server_Info *server); extern void DeleteMidQEntry(struct mid_q_entry *midEntry); extern void cifs_delete_mid(struct mid_q_entry *mid); +extern void cifs_mid_q_entry_release(struct mid_q_entry *midEntry); extern void cifs_wake_up_task(struct mid_q_entry *mid); extern int cifs_handle_standard(struct TCP_Server_Info *server, struct mid_q_entry *mid); --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -889,6 +889,7 @@ cifs_demultiplex_thread(void *p) continue; server->total_read += length;
+ mid_entry = NULL; if (server->ops->is_transform_hdr && server->ops->receive_transform && server->ops->is_transform_hdr(buf)) { @@ -903,8 +904,11 @@ cifs_demultiplex_thread(void *p) length = mid_entry->receive(server, mid_entry); }
- if (length < 0) + if (length < 0) { + if (mid_entry) + cifs_mid_q_entry_release(mid_entry); continue; + }
if (server->large_buf) buf = server->bigbuf; @@ -920,6 +924,8 @@ cifs_demultiplex_thread(void *p)
if (!mid_entry->multiRsp || mid_entry->multiEnd) mid_entry->callback(mid_entry); + + cifs_mid_q_entry_release(mid_entry); } else if (server->ops->is_oplock_break && server->ops->is_oplock_break(buf, server)) { cifs_dbg(FYI, "Received oplock break\n"); --- a/fs/cifs/smb1ops.c +++ b/fs/cifs/smb1ops.c @@ -105,6 +105,7 @@ cifs_find_mid(struct TCP_Server_Info *se if (compare_mid(mid->mid, buf) && mid->mid_state == MID_REQUEST_SUBMITTED && le16_to_cpu(mid->command) == buf->Command) { + kref_get(&mid->refcount); spin_unlock(&GlobalMid_Lock); return mid; } --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -202,6 +202,7 @@ smb2_find_mid(struct TCP_Server_Info *se if ((mid->mid == wire_mid) && (mid->mid_state == MID_REQUEST_SUBMITTED) && (mid->command == shdr->Command)) { + kref_get(&mid->refcount); spin_unlock(&GlobalMid_Lock); return mid; } --- a/fs/cifs/smb2transport.c +++ b/fs/cifs/smb2transport.c @@ -548,6 +548,7 @@ smb2_mid_entry_alloc(const struct smb2_s
temp = mempool_alloc(cifs_mid_poolp, GFP_NOFS); memset(temp, 0, sizeof(struct mid_q_entry)); + kref_init(&temp->refcount); temp->mid = le64_to_cpu(shdr->MessageId); temp->pid = current->pid; temp->command = shdr->Command; /* Always LE */ --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -56,6 +56,7 @@ AllocMidQEntry(const struct smb_hdr *smb
temp = mempool_alloc(cifs_mid_poolp, GFP_NOFS); memset(temp, 0, sizeof(struct mid_q_entry)); + kref_init(&temp->refcount); temp->mid = get_mid(smb_buffer); temp->pid = current->pid; temp->command = cpu_to_le16(smb_buffer->Command); @@ -77,6 +78,21 @@ AllocMidQEntry(const struct smb_hdr *smb return temp; }
+static void _cifs_mid_q_entry_release(struct kref *refcount) +{ + struct mid_q_entry *mid = container_of(refcount, struct mid_q_entry, + refcount); + + mempool_free(mid, cifs_mid_poolp); +} + +void cifs_mid_q_entry_release(struct mid_q_entry *midEntry) +{ + spin_lock(&GlobalMid_Lock); + kref_put(&midEntry->refcount, _cifs_mid_q_entry_release); + spin_unlock(&GlobalMid_Lock); +} + void DeleteMidQEntry(struct mid_q_entry *midEntry) { @@ -105,7 +121,7 @@ DeleteMidQEntry(struct mid_q_entry *midE } } #endif - mempool_free(midEntry, cifs_mid_poolp); + cifs_mid_q_entry_release(midEntry); }
void
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara paulo@paulo.ac
commit 6aa0c114eceec8cc61715f74a4ce91b048d7561c upstream.
This patch fixes a memory leak when doing a setxattr(2) in SMB2+.
Signed-off-by: Paulo Alcantara palcantara@suse.de Cc: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/cifs/smb2ops.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -636,6 +636,8 @@ smb2_set_ea(const unsigned int xid, stru
 
 	rc = SMB2_set_ea(xid, tcon, fid.persistent_fid, fid.volatile_fid, ea,
 			 len);
+	kfree(ea);
+
 	SMB2_close(xid, tcon, fid.persistent_fid, fid.volatile_fid);
 
 	return rc;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara paulo@paulo.ac
commit 7ffbe65578b44fafdef577a360eb0583929f7c6e upstream.
For every request we send, whether it is SMB1 or SMB2+, we attempt to reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying out the request.
So, while server->tcpStatus != CifsNeedReconnect, we wait for the reconnection to succeed on wait_event_interruptible_timeout(). If it returns, that means that either the condition was evaluated to true, or timeout elapsed, or it was interrupted by a signal.
Since we're not handling the case where the process woke up due to a received signal (-ERESTARTSYS), the next call to wait_event_interruptible_timeout() will _always_ fail and we end up looping forever inside either cifs_reconnect_tcon() or smb2_reconnect().
Here's an example of how to trigger that:
$ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard
(break the connection to the server before executing the command below)
$ stat -f /mnt/test & sleep 140
[1] 2511

$ ps -aux -q 2511
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2511  0.0  0.0  12892  1008 pts/0    S    12:24   0:00 stat -f /mnt/test

$ kill -9 2511

(wait for a while; the process is stuck in the kernel)
$ ps -aux -q 2511
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2511 83.2  0.0  12892  1008 pts/0    R    12:24  30:01 stat -f /mnt/test

Using the 'hard' mount option means that cifs.ko will keep retrying indefinitely; however, we must allow the process to be killed, otherwise it would hang the system.
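The core of the fix is honouring the return value of wait_event_interruptible_timeout(). Below is a reduced standalone sketch, with a hypothetical wait_for_reconnect() standing in for the real wait (illustration only): a negative return means a signal arrived and the caller must bail out with -ERESTARTSYS instead of looping again.

#include <stdio.h>

#define ERESTARTSYS 512

/* hypothetical stand-in: <0 signal received, 0 timed out, >0 condition true */
static long wait_for_reconnect(void)
{
	return -1;
}

static int reconnect_tcon(void)
{
	int still_need_reconnect = 1;

	while (still_need_reconnect) {
		long rc = wait_for_reconnect();

		if (rc < 0) {
			fprintf(stderr, "aborting reconnect: signal received\n");
			return -ERESTARTSYS;	/* let the process die */
		}
		if (rc == 0)
			continue;		/* timed out, try again */
		still_need_reconnect = 0;	/* condition became true */
	}
	return 0;
}

int main(void)
{
	printf("reconnect_tcon() = %d\n", reconnect_tcon());
	return 0;
}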
Signed-off-by: Paulo Alcantara palcantara@suse.de Cc: stable@vger.kernel.org Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifssmb.c | 10 ++++++++-- fs/cifs/smb2pdu.c | 18 ++++++++++++------ 2 files changed, 20 insertions(+), 8 deletions(-)
--- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -150,8 +150,14 @@ cifs_reconnect_tcon(struct cifs_tcon *tc * greater than cifs socket timeout which is 7 seconds */ while (server->tcpStatus == CifsNeedReconnect) { - wait_event_interruptible_timeout(server->response_q, - (server->tcpStatus != CifsNeedReconnect), 10 * HZ); + rc = wait_event_interruptible_timeout(server->response_q, + (server->tcpStatus != CifsNeedReconnect), + 10 * HZ); + if (rc < 0) { + cifs_dbg(FYI, "%s: aborting reconnect due to a received" + " signal by the process\n", __func__); + return -ERESTARTSYS; + }
/* are we still trying to reconnect? */ if (server->tcpStatus != CifsNeedReconnect) --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -153,7 +153,7 @@ out: static int smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon) { - int rc = 0; + int rc; struct nls_table *nls_codepage; struct cifs_ses *ses; struct TCP_Server_Info *server; @@ -164,10 +164,10 @@ smb2_reconnect(__le16 smb2_command, stru * for those three - in the calling routine. */ if (tcon == NULL) - return rc; + return 0;
if (smb2_command == SMB2_TREE_CONNECT) - return rc; + return 0;
if (tcon->tidStatus == CifsExiting) { /* @@ -210,8 +210,14 @@ smb2_reconnect(__le16 smb2_command, stru return -EAGAIN; }
- wait_event_interruptible_timeout(server->response_q, - (server->tcpStatus != CifsNeedReconnect), 10 * HZ); + rc = wait_event_interruptible_timeout(server->response_q, + (server->tcpStatus != CifsNeedReconnect), + 10 * HZ); + if (rc < 0) { + cifs_dbg(FYI, "%s: aborting reconnect due to a received" + " signal by the process\n", __func__); + return -ERESTARTSYS; + }
/* are we still trying to reconnect? */ if (server->tcpStatus != CifsNeedReconnect) @@ -229,7 +235,7 @@ smb2_reconnect(__le16 smb2_command, stru }
if (!tcon->ses->need_reconnect && !tcon->need_reconnect) - return rc; + return 0;
nls_codepage = load_nls_default();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
commit f46ecbd97f508e68a7806291a139499794874f3d upstream.
A "small" CIFS buffer is not big enough in general to hold a setacl request for SMB2, and we end up overflowing the buffer in send_set_info(). For instance:
# mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl
# touch /mnt/test/acltest
# getcifsacl /mnt/test/acltest
REVISION:0x1
CONTROL:0x9004
OWNER:S-1-5-21-2926364953-924364008-418108241-1000
GROUP:S-1-22-2-1001
ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
ACL:S-1-22-2-1001:ALLOWED/0x0/R
ACL:S-1-22-2-1001:ALLOWED/0x0/R
ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
ACL:S-1-1-0:ALLOWED/0x0/R
# setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest
this setacl will cause the following KASAN splat:
[ 330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs] [ 330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012
[ 330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2 [ 330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 330.784395] Call Trace: [ 330.784789] dump_stack+0xc2/0x16b [ 330.786777] print_address_description+0x6a/0x270 [ 330.787520] kasan_report+0x258/0x380 [ 330.788845] memcpy+0x34/0x50 [ 330.789369] send_set_info+0x4dd/0xc20 [cifs] [ 330.799511] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.801395] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.830888] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.840367] __vfs_setxattr+0x84/0xb0 [ 330.842060] __vfs_setxattr_noperm+0xe6/0x370 [ 330.843848] vfs_setxattr+0xc2/0xd0 [ 330.845519] setxattr+0x258/0x320 [ 330.859211] path_setxattr+0x15b/0x1b0 [ 330.864392] __x64_sys_setxattr+0xc0/0x160 [ 330.866133] do_syscall_64+0x14e/0x4b0 [ 330.876631] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 330.878503] RIP: 0033:0x7ff2e507db0a [ 330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48 [ 330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc [ 330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a [ 330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818 [ 330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4 [ 330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007 [ 330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550
[ 330.901057] Allocated by task 1012: [ 330.902888] kasan_kmalloc+0xa0/0xd0 [ 330.904714] kmem_cache_alloc+0xc8/0x1d0 [ 330.906615] mempool_alloc+0x11e/0x380 [ 330.908496] cifs_small_buf_get+0x35/0x60 [cifs] [ 330.910510] smb2_plain_req_init+0x4a/0xd60 [cifs] [ 330.912551] send_set_info+0x198/0xc20 [cifs] [ 330.914535] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.916465] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.918453] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.920426] __vfs_setxattr+0x84/0xb0 [ 330.922284] __vfs_setxattr_noperm+0xe6/0x370 [ 330.924213] vfs_setxattr+0xc2/0xd0 [ 330.926008] setxattr+0x258/0x320 [ 330.927762] path_setxattr+0x15b/0x1b0 [ 330.929592] __x64_sys_setxattr+0xc0/0x160 [ 330.931459] do_syscall_64+0x14e/0x4b0 [ 330.933314] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 330.936843] Freed by task 0: [ 330.938588] (stack is not available)
[ 330.941886] The buggy address belongs to the object at ffff88010d5e2800 which belongs to the cache cifs_small_rq of size 448 [ 330.946362] The buggy address is located 96 bytes inside of 448-byte region [ffff88010d5e2800, ffff88010d5e29c0) [ 330.950722] The buggy address belongs to the page: [ 330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0 [ 330.955665] flags: 0x17ffffc0008100(slab|head) [ 330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80 [ 330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 330.963005] page dumped because: kasan: bad access detected
[ 330.967039] Memory state around the buggy address: [ 330.969255] ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.971833] ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 330.976956] ^ [ 330.979226] ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.981755] ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.984225] ==================================================================
Fix this by allocating a regular CIFS buffer in smb2_plain_req_init() if the request command is SMB2_SET_INFO.
Reported-by: Jianhong Yin jiyin@redhat.com Fixes: 366ed846df60 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function") CC: Stable stable@vger.kernel.org Signed-off-by: Stefano Brivio sbrivio@redhat.com Reviewed-and-tested-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2pdu.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -338,7 +338,10 @@ smb2_plain_req_init(__le16 smb2_command, return rc;
/* BB eventually switch this to SMB2 specific small buf size */ - *request_buf = cifs_small_buf_get(); + if (smb2_command == SMB2_SET_INFO) + *request_buf = cifs_buf_get(); + else + *request_buf = cifs_small_buf_get(); if (*request_buf == NULL) { /* BB should we add a retry in here if not a writepage? */ return -ENOMEM; @@ -3168,7 +3171,7 @@ send_set_info(const unsigned int xid, st }
rc = SendReceive2(xid, ses, iov, num, &resp_buftype, flags, &rsp_iov); - cifs_small_buf_release(req); + cifs_buf_release(req); rsp = (struct smb2_set_info_rsp *)rsp_iov.iov_base;
if (rc != 0)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michel Dänzer michel.daenzer@amd.com
commit 718b5406cd76f1aa6434311241b7febf0e8571ff upstream.
The property size may be controlled by userspace, can be large (I've seen failure with order 4, i.e. 16 pages / 64 KB) and doesn't need to be physically contiguous.
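For reference, a minimal sketch of the allocation pattern being switched to (a hypothetical demo module, not the drm code; the 64 KB size mirrors the failing order-4 case mentioned above): kvzalloc() may fall back to vmalloc for large requests, so the buffer must be released with kvfree(), which is why the kfree() calls are converted as well.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/slab.h>

static void *blob;

static int __init blob_demo_init(void)
{
	blob = kvzalloc(64 * 1024, GFP_KERNEL);	/* may be vmalloc-backed */
	return blob ? 0 : -ENOMEM;
}

static void __exit blob_demo_exit(void)
{
	kvfree(blob);				/* handles both backing types */
}

module_init(blob_demo_init);
module_exit(blob_demo_exit);
MODULE_LICENSE("GPL");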
Signed-off-by: Michel Dänzer michel.daenzer@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20180629142710.2069-1-michel@d... Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/drm_property.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/drm_property.c +++ b/drivers/gpu/drm/drm_property.c @@ -516,7 +516,7 @@ static void drm_property_free_blob(struc
drm_mode_object_unregister(blob->dev, &blob->base);
- kfree(blob); + kvfree(blob); }
/** @@ -543,7 +543,7 @@ drm_property_create_blob(struct drm_devi if (!length || length > ULONG_MAX - sizeof(struct drm_property_blob)) return ERR_PTR(-EINVAL);
- blob = kzalloc(sizeof(struct drm_property_blob)+length, GFP_KERNEL); + blob = kvzalloc(sizeof(struct drm_property_blob)+length, GFP_KERNEL); if (!blob) return ERR_PTR(-ENOMEM);
@@ -559,7 +559,7 @@ drm_property_create_blob(struct drm_devi ret = __drm_mode_object_add(dev, &blob->base, DRM_MODE_OBJECT_BLOB, true, drm_property_free_blob); if (ret) { - kfree(blob); + kvfree(blob); return ERR_PTR(-EINVAL); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 99ec9e77511dea55d81729fc80b6c63a61bfa8e0 upstream.
The displaylink hardware has a peculiarity: it doesn't render a command until the next command is received. This produces occasional corruption; for example, when setting a 22x11 font on the console, only the first line of the cursor blinks if the cursor is located at certain columns.
When we end up with a repeating pixel, the driver has a bug where it leaves one uninitialized byte after the command (and this byte is enough to flush the command and render it, thus hiding the screen corruption); however, when we end up with a non-repeating pixel, no byte is appended and this results in temporary screen corruption.
This patch fixes the screen corruption by always appending a byte 0xAF at the end of URB. It also removes the uninitialized byte.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/udl/udl_fb.c | 5 ++++- drivers/gpu/drm/udl/udl_transfer.c | 11 +++++++---- 2 files changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/udl/udl_fb.c +++ b/drivers/gpu/drm/udl/udl_fb.c @@ -137,7 +137,10 @@ int udl_handle_damage(struct udl_framebu
if (cmd > (char *) urb->transfer_buffer) { /* Send partial buffer remaining before exiting */ - int len = cmd - (char *) urb->transfer_buffer; + int len; + if (cmd < (char *) urb->transfer_buffer + urb->transfer_buffer_length) + *cmd++ = 0xAF; + len = cmd - (char *) urb->transfer_buffer; ret = udl_submit_urb(dev, urb, len); bytes_sent += len; } else --- a/drivers/gpu/drm/udl/udl_transfer.c +++ b/drivers/gpu/drm/udl/udl_transfer.c @@ -153,11 +153,11 @@ static void udl_compress_hline16( raw_pixels_count_byte = cmd++; /* we'll know this later */ raw_pixel_start = pixel;
- cmd_pixel_end = pixel + (min(MAX_CMD_PIXELS + 1, - min((int)(pixel_end - pixel) / bpp, - (int)(cmd_buffer_end - cmd) / 2))) * bpp; + cmd_pixel_end = pixel + min3(MAX_CMD_PIXELS + 1UL, + (unsigned long)(pixel_end - pixel) / bpp, + (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) * bpp;
- prefetch_range((void *) pixel, (cmd_pixel_end - pixel) * bpp); + prefetch_range((void *) pixel, cmd_pixel_end - pixel); pixel_val16 = get_pixel_val16(pixel, bpp);
while (pixel < cmd_pixel_end) { @@ -193,6 +193,9 @@ static void udl_compress_hline16( if (pixel > raw_pixel_start) { /* finalize last RAW span */ *raw_pixels_count_byte = ((pixel-raw_pixel_start) / bpp) & 0xFF; + } else { + /* undo unused byte */ + cmd--; }
*cmd_pixels_count_byte = ((pixel - cmd_pixel_start) / bpp) & 0xFF;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream.
Do not set the b_modified flag in a block's journal head until after we're sure that jbd2_journal_dirty_metadata() will not abort with an error due to there not being enough space reserved in the jbd2 handle.
Otherwise, future attempts to modify the buffer may lead to a large number of spurious errors and warnings.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
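A tiny standalone illustration of the ordering rule (not the jbd2 code; the names are simplified): a sticky "modified" flag must only be set once the credit check has succeeded, otherwise a failed call leaves state behind that later calls trip over.

#include <stdio.h>

struct handle {
	int credits;
};

struct buffer {
	int modified;
};

static int dirty_metadata(struct handle *h, struct buffer *b)
{
	if (h->credits <= 0)
		return -28;		/* -ENOSPC: fail before touching b */
	b->modified = 1;		/* only mark after the check passes */
	h->credits--;
	return 0;
}

int main(void)
{
	struct handle h = { .credits = 0 };
	struct buffer b = { 0 };
	int ret = dirty_metadata(&h, &b);

	printf("ret=%d modified=%d (flag untouched on failure)\n",
	       ret, b.modified);
	return 0;
}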
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/transaction.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1366,6 +1366,13 @@ int jbd2_journal_dirty_metadata(handle_t if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) { jbd_lock_bh_state(bh); + if (jh->b_transaction == transaction && + jh->b_jlist != BJ_Metadata) + pr_err("JBD2: assertion failure: h_type=%u " + "h_line_no=%u block_no=%llu jlist=%u\n", + handle->h_type, handle->h_line_no, + (unsigned long long) bh->b_blocknr, + jh->b_jlist); J_ASSERT_JH(jh, jh->b_transaction != transaction || jh->b_jlist == BJ_Metadata); jbd_unlock_bh_state(bh); @@ -1385,11 +1392,11 @@ int jbd2_journal_dirty_metadata(handle_t * of the transaction. This needs to be done * once a transaction -bzzz */ - jh->b_modified = 1; if (handle->h_buffer_credits <= 0) { ret = -ENOSPC; goto out_unlock_bh; } + jh->b_modified = 1; handle->h_buffer_credits--; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream.
In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add a check to make sure we don't overrun the xattr buffer.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
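The added check follows a common hardening pattern for variable-length on-disk structures. A standalone sketch (illustration only, not the ext4 code) of walking such a list with a bounds check before trusting the computed next entry:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct entry {
	uint8_t name_len;	/* bytes of name following the header; 0 ends the list */
};

int main(void)
{
	uint8_t buf[32];
	size_t off = 0;

	memset(buf, 0, sizeof(buf));
	buf[0] = 200;		/* corrupted length in the first entry */

	for (;;) {
		const struct entry *e = (const struct entry *)(buf + off);
		size_t next_off;

		if (e->name_len == 0)			/* end-of-list marker */
			break;
		next_off = off + sizeof(*e) + e->name_len;
		if (next_off >= sizeof(buf)) {		/* the added bounds check */
			fprintf(stderr, "corrupted entry list\n");
			return 1;
		}
		off = next_off;
	}
	return 0;
}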
Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Andreas Dilger adilger@dilger.ca Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/xattr.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1559,7 +1559,7 @@ static int ext4_xattr_set_entry(struct e handle_t *handle, struct inode *inode, bool is_block) { - struct ext4_xattr_entry *last; + struct ext4_xattr_entry *last, *next; struct ext4_xattr_entry *here = s->here; size_t min_offs = s->end - s->base, name_len = strlen(i->name); int in_inode = i->in_inode; @@ -1594,7 +1594,13 @@ static int ext4_xattr_set_entry(struct e
/* Compute min_offs and last. */ last = s->first; - for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { + for (; !IS_LAST_ENTRY(last); last = next) { + next = EXT4_XATTR_NEXT(last); + if ((void *)next >= s->end) { + EXT4_ERROR_INODE(inode, "corrupted xattr entries"); + ret = -EFSCORRUPTED; + goto out; + } if (!last->e_value_inum && last->e_value_size) { size_t offs = le16_to_cpu(last->e_value_offs); if (offs < min_offs)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream.
If an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
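A standalone sketch of the idea (illustration only, not the ext4 code): the cheap magic-number check runs unconditionally, and only then is the cached "verified" flag allowed to short-circuit the expensive checksum verification.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define XATTR_MAGIC 0xEA020000u

struct block {
	uint32_t magic;
	bool verified;		/* may have been set by a different validator */
};

static int check_xattr_block(const struct block *b)
{
	if (b->magic != XATTR_MAGIC)	/* always check, it is cheap */
		return -1;
	if (b->verified)		/* only now trust the cached flag */
		return 0;
	/* ... full checksum verification would go here ... */
	return 0;
}

int main(void)
{
	struct block bitmap_block = { .magic = 0, .verified = true };

	printf("check = %d (rejected despite cached verified flag)\n",
	       check_xattr_block(&bitmap_block));
	return 0;
}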
Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Andreas Dilger adilger@dilger.ca Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext4/xattr.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -229,12 +229,12 @@ __ext4_xattr_check_block(struct inode *i
 {
 	int error = -EFSCORRUPTED;
 
-	if (buffer_verified(bh))
-		return 0;
-
 	if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
 	    BHDR(bh)->h_blocks != cpu_to_le32(1))
 		goto errout;
+	if (buffer_verified(bh))
+		return 0;
+
 	error = -EFSBADCRC;
 	if (!ext4_xattr_block_csum_verify(inode, bh))
 		goto errout;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream.
It's really bad when the allocation bitmaps and the inode table overlap with the block group descriptors, since it causes random corruption of the bg descriptors. So we really want to head those off at the pass.
https://bugzilla.kernel.org/show_bug.cgi?id=199865
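The range check being added is straightforward; here is a standalone sketch (illustration only, not the ext4 code, with a made-up layout) of rejecting bitmap or inode-table locations that land inside the superblock/descriptor area:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool overlaps_bg_descriptors(uint64_t block, uint64_t sb_block,
				    uint64_t num_gdb_blocks)
{
	/* descriptors occupy the blocks right after the superblock */
	uint64_t last_bg_block = sb_block + num_gdb_blocks + 1;

	return block >= sb_block + 1 && block <= last_bg_block;
}

int main(void)
{
	/* hypothetical layout: superblock at block 1, two descriptor blocks */
	uint64_t sb_block = 1, num_gdb = 2;

	printf("block 2 overlaps: %d\n", overlaps_bg_descriptors(2, sb_block, num_gdb));
	printf("block 9 overlaps: %d\n", overlaps_bg_descriptors(9, sb_block, num_gdb));
	return 0;
}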
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2301,6 +2301,7 @@ static int ext4_check_descriptors(struct struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block); ext4_fsblk_t last_block; + ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1; ext4_fsblk_t block_bitmap; ext4_fsblk_t inode_bitmap; ext4_fsblk_t inode_table; @@ -2333,6 +2334,14 @@ static int ext4_check_descriptors(struct if (!sb_rdonly(sb)) return 0; } + if (block_bitmap >= sb_block + 1 && + block_bitmap <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Block bitmap for group %u overlaps " + "block group descriptors", i); + if (!sb_rdonly(sb)) + return 0; + } if (block_bitmap < first_block || block_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Block bitmap for group %u not in group " @@ -2347,6 +2356,14 @@ static int ext4_check_descriptors(struct if (!sb_rdonly(sb)) return 0; } + if (inode_bitmap >= sb_block + 1 && + inode_bitmap <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode bitmap for group %u overlaps " + "block group descriptors", i); + if (!sb_rdonly(sb)) + return 0; + } if (inode_bitmap < first_block || inode_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Inode bitmap for group %u not in group " @@ -2361,6 +2378,14 @@ static int ext4_check_descriptors(struct if (!sb_rdonly(sb)) return 0; } + if (inode_table >= sb_block + 1 && + inode_table <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode table for group %u overlaps " + "block group descriptors", i); + if (!sb_rdonly(sb)) + return 0; + } if (inode_table < first_block || inode_table + sbi->s_itb_per_group - 1 > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream.
Regardless of whether the flex_bg feature is set, we should always check to make sure the bits we are setting in the block bitmap are within the block group bounds.
https://bugzilla.kernel.org/show_bug.cgi?id=199865
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/balloc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -184,7 +184,6 @@ static int ext4_init_block_bitmap(struct unsigned int bit, bit_max; struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t start, tmp; - int flex_bg = 0; struct ext4_group_info *grp;
J_ASSERT_BH(bh, buffer_locked(bh)); @@ -217,22 +216,19 @@ static int ext4_init_block_bitmap(struct
start = ext4_group_first_block_no(sb, block_group);
- if (ext4_has_feature_flex_bg(sb)) - flex_bg = 1; - /* Set bits for block and inode bitmaps, and inode table */ tmp = ext4_block_bitmap(sb, gdp); - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data);
tmp = ext4_inode_bitmap(sb, gdp); - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data);
tmp = ext4_inode_table(sb, gdp); for (; tmp < ext4_inode_table(sb, gdp) + sbi->s_itb_per_group; tmp++) { - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream.
The bg_flags field in the block group descriptors is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this.
Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected.
This addresses CVE-2018-10876.
https://bugzilla.kernel.org/show_bug.cgi?id=199403
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/balloc.c | 11 ++++++++++- fs/ext4/ialloc.c | 14 ++++++++++++-- fs/ext4/mballoc.c | 6 ++++-- fs/ext4/super.c | 11 ++++++++++- 4 files changed, 36 insertions(+), 6 deletions(-)
--- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -451,7 +451,16 @@ ext4_read_block_bitmap_nowait(struct sup goto verify; } ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + if (block_group == 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Block bitmap for bg 0 marked " + "uninitialized"); + err = -EFSCORRUPTED; + goto out; + } err = ext4_init_block_bitmap(sb, bh, block_group, desc); set_bitmap_uptodate(bh); set_buffer_uptodate(bh); --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -155,7 +155,16 @@ ext4_read_inode_bitmap(struct super_bloc }
ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) { + if (block_group == 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Inode bitmap for bg 0 marked " + "uninitialized"); + err = -EFSCORRUPTED; + goto out; + } memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8); ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8, bh->b_data); @@ -1000,7 +1009,8 @@ got:
/* recheck and clear flag under lock if we still need to */ ext4_lock_group(sb, group); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, group, gdp)); --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2456,7 +2456,8 @@ int ext4_mb_add_groupinfo(struct super_b * initialize bb_free to be able to skip * empty groups without initialization */ - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { meta_group_info[i]->bb_free = ext4_free_clusters_after_init(sb, group, desc); } else { @@ -3023,7 +3024,8 @@ ext4_mb_mark_diskspace_used(struct ext4_ #endif ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3095,13 +3095,22 @@ static ext4_group_t ext4_has_uninit_itab ext4_group_t group, ngroups = EXT4_SB(sb)->s_groups_count; struct ext4_group_desc *gdp = NULL;
+ if (!ext4_has_group_desc_csum(sb)) + return ngroups; + for (group = 0; group < ngroups; group++) { gdp = ext4_get_group_desc(sb, group, NULL); if (!gdp) continue;
- if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) + continue; + if (group != 0) break; + ext4_error(sb, "Inode table for bg 0 marked as " + "needing zeroing"); + if (sb_rdonly(sb)) + return ngroups; }
return group;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream.
If there is a corrupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness.
This addresses CVE-2018-10877.
https://bugzilla.kernel.org/show_bug.cgi?id=199417
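As an aside, a small self-contained C sketch (not the ext4 code; only the constant mirrors EXT4_MAX_EXTENT_DEPTH) of why the depth must be range-checked before a fixed-size path array is walked:

#include <stdio.h>
#include <stdlib.h>

#define MAX_DEPTH 5                     /* mirrors EXT4_MAX_EXTENT_DEPTH */

struct path_entry { void *node; };

static int walk_tree(int depth_from_disk)
{
        struct path_entry path[MAX_DEPTH + 1];

        /* reject anything outside [0, MAX_DEPTH]; ext4 returns -EFSCORRUPTED */
        if (depth_from_disk < 0 || depth_from_disk > MAX_DEPTH) {
                fprintf(stderr, "invalid extent depth: %d\n", depth_from_disk);
                return -1;
        }

        for (int i = depth_from_disk; i >= 0; i--)
                path[i].node = NULL;    /* safe only after the check */
        return 0;
}

int main(void)
{
        /* a depth of -1, as in the CVE, must be rejected */
        return walk_tree(-1) == -1 ? EXIT_SUCCESS : EXIT_FAILURE;
}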
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ext4_extents.h | 1 + fs/ext4/extents.c | 6 ++++++ 2 files changed, 7 insertions(+)
--- a/fs/ext4/ext4_extents.h +++ b/fs/ext4/ext4_extents.h @@ -103,6 +103,7 @@ struct ext4_extent_header { };
#define EXT4_EXT_MAGIC cpu_to_le16(0xf30a) +#define EXT4_MAX_EXTENT_DEPTH 5
#define EXT4_EXTENT_TAIL_OFFSET(hdr) \ (sizeof(struct ext4_extent_header) + \ --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -881,6 +881,12 @@ ext4_find_extent(struct inode *inode, ex
eh = ext_inode_hdr(inode); depth = ext_depth(inode); + if (depth < 0 || depth > EXT4_MAX_EXTENT_DEPTH) { + EXT4_ERROR_INODE(inode, "inode has invalid extent depth: %d", + depth); + ret = -EFSCORRUPTED; + goto err; + }
if (path) { ext4_ext_drop_refs(path);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit bdbd6ce01a70f02e9373a584d0ae9538dcf0a121 upstream.
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -401,9 +401,9 @@ static int __check_block_validity(struct if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), map->m_pblk, map->m_len)) { ext4_error_inode(inode, func, line, map->m_pblk, - "lblock %lu mapped to illegal pblock " + "lblock %lu mapped to illegal pblock %llu " "(length %d)", (unsigned long) map->m_lblk, - map->m_len); + map->m_pblk, map->m_len); return -EFSCORRUPTED; } return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream.
When converting an inode from storing its data in-line to using a data block, ext4_destroy_inline_data_nolock() was only clearing the on-disk copy of the i_blocks[] array. It was not clearing the copy of i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually used by ext4_map_blocks().
This didn't matter much if we are using extents, since the extents header would be invalid and thus the extents code would re-initialize the extent tree. But if we are using indirect blocks, the previous contents of the i_blocks array will be treated as block numbers, with potentially catastrophic results to the file system integrity and/or user data.
This gets worse if the file system is using a 1k block size and s_first_data is zero, but even without this, the file system can get quite badly corrupted.
This addresses CVE-2018-10881.
https://bugzilla.kernel.org/show_bug.cgi?id=200015
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/inline.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -443,6 +443,7 @@ static int ext4_destroy_inline_data_nolo
memset((void *)ext4_raw_inode(&is.iloc)->i_block, 0, EXT4_MIN_INLINE_DATA_SIZE); + memset(ei->i_data, 0, EXT4_MIN_INLINE_DATA_SIZE);
if (ext4_has_feature_extents(inode->i_sb)) { if (S_ISDIR(inode->i_mode) ||
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 8cdb5240ec5928b20490a2bb34cb87e9a5f40226 upstream.
When expanding the extra isize space, we must never move the system.data xattr out of the inode body. Doing so makes no sense for performance reasons, and the inline data implementation assumes that the system.data xattr is never in the external xattr block.
This addresses CVE-2018-10880.
https://bugzilla.kernel.org/show_bug.cgi?id=200005
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/xattr.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2656,6 +2656,11 @@ static int ext4_xattr_make_inode_space(h last = IFIRST(header); /* Find the entry best suited to be pushed into EA block */ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { + /* never move system.data out of the inode */ + if ((last->e_name_len == 4) && + (last->e_name_index == EXT4_XATTR_INDEX_SYSTEM) && + !memcmp(last->e_name, "data", 4)) + continue; total_size = EXT4_XATTR_LEN(last->e_name_len); if (!last->e_value_inum) total_size += EXT4_XATTR_SIZE(
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 8bc1379b82b8e809eef77a9fedbb75c6c297be19 upstream.
Use a separate journal transaction if it turns out that we need to convert an inline file to use a data block. Otherwise we could end up failing due to not having enough journal credits.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ext4.h | 3 --- fs/ext4/inline.c | 38 +------------------------------------- fs/ext4/xattr.c | 19 ++----------------- 3 files changed, 3 insertions(+), 57 deletions(-)
--- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3049,9 +3049,6 @@ extern struct buffer_head *ext4_get_firs extern int ext4_inline_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, int *has_inline, __u64 start, __u64 len); -extern int ext4_try_to_evict_inline_data(handle_t *handle, - struct inode *inode, - int needed); extern int ext4_inline_data_truncate(struct inode *inode, int *has_inline);
extern int ext4_convert_inline_data(struct inode *inode); --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -893,11 +893,11 @@ retry_journal: flags |= AOP_FLAG_NOFS;
if (ret == -ENOSPC) { + ext4_journal_stop(handle); ret = ext4_da_convert_inline_data_to_extent(mapping, inode, flags, fsdata); - ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry_journal; @@ -1865,42 +1865,6 @@ out: return (error < 0 ? error : 0); }
-/* - * Called during xattr set, and if we can sparse space 'needed', - * just create the extent tree evict the data to the outer block. - * - * We use jbd2 instead of page cache to move data to the 1st block - * so that the whole transaction can be committed as a whole and - * the data isn't lost because of the delayed page cache write. - */ -int ext4_try_to_evict_inline_data(handle_t *handle, - struct inode *inode, - int needed) -{ - int error; - struct ext4_xattr_entry *entry; - struct ext4_inode *raw_inode; - struct ext4_iloc iloc; - - error = ext4_get_inode_loc(inode, &iloc); - if (error) - return error; - - raw_inode = ext4_raw_inode(&iloc); - entry = (struct ext4_xattr_entry *)((void *)raw_inode + - EXT4_I(inode)->i_inline_off); - if (EXT4_XATTR_LEN(entry->e_name_len) + - EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size)) < needed) { - error = -ENOSPC; - goto out; - } - - error = ext4_convert_inline_data_nolock(handle, inode, &iloc); -out: - brelse(iloc.bh); - return error; -} - int ext4_inline_data_truncate(struct inode *inode, int *has_inline) { handle_t *handle; --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2211,23 +2211,8 @@ int ext4_xattr_ibody_inline_set(handle_t if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */); - if (error) { - if (error == -ENOSPC && - ext4_has_inline_data(inode)) { - error = ext4_try_to_evict_inline_data(handle, inode, - EXT4_XATTR_LEN(strlen(i->name) + - EXT4_XATTR_SIZE(i->value_len))); - if (error) - return error; - error = ext4_xattr_ibody_find(inode, i, is); - if (error) - return error; - error = ext4_xattr_set_entry(i, s, handle, inode, - false /* is_block */); - } - if (error) - return error; - } + if (error) + return error; header = IHDR(inode, ext4_raw_inode(&is->iloc)); if (!IS_LAST_ENTRY(s->first)) { header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit c37e9e013469521d9adb932d17a1795c139b36db upstream.
If there is a directory entry pointing to a system inode (such as the journal inode), complain and declare the file system to be corrupted.
Also, if the superblock's first inode number field is too small, refuse to mount the file system.
This addresses CVE-2018-10882.
https://bugzilla.kernel.org/show_bug.cgi?id=200069
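A minimal sketch of the tightened rule, assuming only that EXT4_ROOT_INO is 2 (the function and parameter names below are illustrative, not the ext4 helpers):

#include <stdbool.h>

#define ROOT_INO 2              /* EXT4_ROOT_INO */

/*
 * Apart from the root inode, nothing below s_first_ino (the first
 * non-reserved inode) is a legal target for a directory entry; the
 * journal/quota/resize inodes are no longer whitelisted.
 */
static bool inum_is_valid_target(unsigned long ino, unsigned long first_ino,
                                 unsigned long inodes_count)
{
        return ino == ROOT_INO ||
               (ino >= first_ino && ino <= inodes_count);
}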
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ext4.h | 5 ----- fs/ext4/inode.c | 3 ++- fs/ext4/super.c | 5 +++++ 3 files changed, 7 insertions(+), 6 deletions(-)
--- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1542,11 +1542,6 @@ static inline struct ext4_inode_info *EX static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) { return ino == EXT4_ROOT_INO || - ino == EXT4_USR_QUOTA_INO || - ino == EXT4_GRP_QUOTA_INO || - ino == EXT4_BOOT_LOADER_INO || - ino == EXT4_JOURNAL_INO || - ino == EXT4_RESIZE_INO || (ino >= EXT4_FIRST_INO(sb) && ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)); } --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4455,7 +4455,8 @@ static int __ext4_get_inode_loc(struct i int inodes_per_block, inode_offset;
iloc->bh = NULL; - if (!ext4_valid_inum(sb, inode->i_ino)) + if (inode->i_ino < EXT4_ROOT_INO || + inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)) return -EFSCORRUPTED;
iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb); --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3811,6 +3811,11 @@ static int ext4_fill_super(struct super_ } else { sbi->s_inode_size = le16_to_cpu(es->s_inode_size); sbi->s_first_ino = le32_to_cpu(es->s_first_ino); + if (sbi->s_first_ino < EXT4_GOOD_OLD_FIRST_INO) { + ext4_msg(sb, KERN_ERR, "invalid first ino: %u", + sbi->s_first_ino); + goto failed_mount; + } if ((sbi->s_inode_size < EXT4_GOOD_OLD_INODE_SIZE) || (!is_power_of_2(sbi->s_inode_size)) || (sbi->s_inode_size > blocksize)) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit bfe0a5f47ada40d7984de67e59a7d3390b9b9ecc upstream.
The kernel's ext4 mount-time checks were more permissive than e2fsprogs's libext2fs checks when opening a file system. If the superblock is considered too insane for debugfs or e2fsck to operate on it, the kernel has no business trying to mount it.
This will make file system fuzzing tools work harder, but the failure cases that they find will be more useful and be easier to evaluate.
Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3749,6 +3749,13 @@ static int ext4_fill_super(struct super_ le32_to_cpu(es->s_log_block_size)); goto failed_mount; } + if (le32_to_cpu(es->s_log_cluster_size) > + (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { + ext4_msg(sb, KERN_ERR, + "Invalid log cluster size: %u", + le32_to_cpu(es->s_log_cluster_size)); + goto failed_mount; + }
if (le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) > (blocksize / 4)) { ext4_msg(sb, KERN_ERR, @@ -3892,13 +3899,6 @@ static int ext4_fill_super(struct super_ "block size (%d)", clustersize, blocksize); goto failed_mount; } - if (le32_to_cpu(es->s_log_cluster_size) > - (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { - ext4_msg(sb, KERN_ERR, - "Invalid log cluster size: %u", - le32_to_cpu(es->s_log_cluster_size)); - goto failed_mount; - } sbi->s_cluster_bits = le32_to_cpu(es->s_log_cluster_size) - le32_to_cpu(es->s_log_block_size); sbi->s_clusters_per_group = @@ -3919,10 +3919,10 @@ static int ext4_fill_super(struct super_ } } else { if (clustersize != blocksize) { - ext4_warning(sb, "fragment/cluster size (%d) != " - "block size (%d)", clustersize, - blocksize); - clustersize = blocksize; + ext4_msg(sb, KERN_ERR, + "fragment/cluster size (%d) != " + "block size (%d)", clustersize, blocksize); + goto failed_mount; } if (sbi->s_blocks_per_group > blocksize * 8) { ext4_msg(sb, KERN_ERR, @@ -3976,6 +3976,13 @@ static int ext4_fill_super(struct super_ ext4_blocks_count(es)); goto failed_mount; } + if ((es->s_first_data_block == 0) && (es->s_log_block_size == 0) && + (sbi->s_cluster_ratio == 1)) { + ext4_msg(sb, KERN_WARNING, "bad geometry: first data " + "block is 0 with a 1k block and cluster size"); + goto failed_mount; + } + blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); @@ -4011,6 +4018,14 @@ static int ext4_fill_super(struct super_ ret = -ENOMEM; goto failed_mount; } + if (((u64)sbi->s_groups_count * sbi->s_inodes_per_group) != + le32_to_cpu(es->s_inodes_count)) { + ext4_msg(sb, KERN_ERR, "inodes count not valid: %u vs %llu", + le32_to_cpu(es->s_inodes_count), + ((u64)sbi->s_groups_count * sbi->s_inodes_per_group)); + ret = -EINVAL; + goto failed_mount; + }
bgl_lock_init(sbi->s_blockgroup_lock);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jon Derrick jonathan.derrick@intel.com
commit a17712c8e4be4fa5404d20e9cd3b2b21eae7bc56 upstream.
This patch attempts to close a hole leading to a BUG seen with hot removals during writes [1].
A block device (an NVMe namespace in this test case) is formatted to EXT4 without partitions. It's mounted and write I/O is run to a file, then the device is hot removed from the slot. An attempt is then made to write the superblock to the drive, which is no longer present.
The typical chain of events leading to the BUG:
  ext4_commit_super()
    __sync_dirty_buffer()
      submit_bh()
        submit_bh_wbc()
          BUG_ON(!buffer_mapped(bh));
This fix checks for the superblock's buffer head being mapped prior to syncing.
[1] https://www.spinics.net/lists/linux-ext4/msg56527.html
Signed-off-by: Jon Derrick jonathan.derrick@intel.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4754,6 +4754,14 @@ static int ext4_commit_super(struct supe
if (!sbh || block_device_ejected(sb)) return error; + + /* + * The superblock bh should be mapped, but it might not be if the + * device was hot-removed. Not much we can do but fail the I/O. + */ + if (!buffer_mapped(sbh)) + return error; + /* * If the file system is mounted read-only, don't update the * superblock write time. This avoids updating the superblock
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit 425a4dba7953e35ffd096771973add6d2f40d2ed upstream.
blkdev_issue_zeroout() will use this in the !BLKDEV_ZERO_NOFALLBACK case.
Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Cc: Janne Huttunen janne.huttunen@nokia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-lib.c | 63 ++++++++++++++++++++++++++++++++------------------------ 1 file changed, 37 insertions(+), 26 deletions(-)
--- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -275,6 +275,40 @@ static unsigned int __blkdev_sectors_to_ return min(pages, (sector_t)BIO_MAX_PAGES); }
+static int __blkdev_issue_zero_pages(struct block_device *bdev, + sector_t sector, sector_t nr_sects, gfp_t gfp_mask, + struct bio **biop) +{ + struct request_queue *q = bdev_get_queue(bdev); + struct bio *bio = *biop; + int bi_size = 0; + unsigned int sz; + + if (!q) + return -ENXIO; + + while (nr_sects != 0) { + bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects), + gfp_mask); + bio->bi_iter.bi_sector = sector; + bio_set_dev(bio, bdev); + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); + + while (nr_sects != 0) { + sz = min((sector_t) PAGE_SIZE, nr_sects << 9); + bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); + nr_sects -= bi_size >> 9; + sector += bi_size >> 9; + if (bi_size < sz) + break; + } + cond_resched(); + } + + *biop = bio; + return 0; +} + /** * __blkdev_issue_zeroout - generate number of zero filed write bios * @bdev: blockdev to issue @@ -305,9 +339,6 @@ int __blkdev_issue_zeroout(struct block_ unsigned flags) { int ret; - int bi_size = 0; - struct bio *bio = *biop; - unsigned int sz; sector_t bs_mask;
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; @@ -317,30 +348,10 @@ int __blkdev_issue_zeroout(struct block_ ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask, biop, flags); if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK)) - goto out; - - ret = 0; - while (nr_sects != 0) { - bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects), - gfp_mask); - bio->bi_iter.bi_sector = sector; - bio_set_dev(bio, bdev); - bio_set_op_attrs(bio, REQ_OP_WRITE, 0); + return ret;
- while (nr_sects != 0) { - sz = min((sector_t) PAGE_SIZE, nr_sects << 9); - bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); - nr_sects -= bi_size >> 9; - sector += bi_size >> 9; - if (bi_size < sz) - break; - } - cond_resched(); - } - - *biop = bio; -out: - return ret; + return __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask, + biop); } EXPORT_SYMBOL(__blkdev_issue_zeroout);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit d5ce4c31d6df518dd8f63bbae20d7423c5018a6c upstream.
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to permit trying WRITE SAME on older SCSI devices, unless ->no_write_same is set. Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:
$ fallocate -zn -l 1k /dev/sdg
fallocate: fallocate failed: Remote I/O error
$ fallocate -zn -l 1k /dev/sdg  # OK
$ fallocate -zn -l 1k /dev/sdg  # OK
The following calls succeed because sd_done() sets ->no_write_same in response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing __blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.
This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is specified. In the BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if sd_done() has just set ->no_write_same, thus indicating lack of offload support.
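For reference, a hedged userspace sketch of the reproducer quoted above (equivalent to "fallocate -zn -l 1k"); the device path is a placeholder and this should only be run against a scratch device:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <scratch-block-device>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* same operation as "fallocate -zn -l 1k": zero 1 KiB, keep size */
        if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE, 0, 1024))
                perror("fallocate");    /* Remote I/O error before this fix */
        close(fd);
        return 0;
}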
Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing") Cc: Hannes Reinecke hare@suse.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Cc: Janne Huttunen janne.huttunen@nokia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- block/blk-lib.c | 45 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-)
--- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -322,12 +322,6 @@ static int __blkdev_issue_zero_pages(str * Zero-fill a block range, either using hardware offload or by explicitly * writing zeroes to the device. * - * Note that this function may fail with -EOPNOTSUPP if the driver signals - * zeroing offload support, but the device fails to process the command (for - * some devices there is no non-destructive way to verify whether this - * operation is actually supported). In this case the caller should call - * retry the call to blkdev_issue_zeroout() and the fallback path will be used. - * * If a device is using logical block provisioning, the underlying space will * not be released if %flags contains BLKDEV_ZERO_NOUNMAP. * @@ -371,18 +365,49 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout); int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned flags) { - int ret; - struct bio *bio = NULL; + int ret = 0; + sector_t bs_mask; + struct bio *bio; struct blk_plug plug; + bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
+ bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1; + if ((sector | nr_sects) & bs_mask) + return -EINVAL; + +retry: + bio = NULL; blk_start_plug(&plug); - ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, - &bio, flags); + if (try_write_zeroes) { + ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, + gfp_mask, &bio, flags); + } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) { + ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects, + gfp_mask, &bio); + } else { + /* No zeroing offload support */ + ret = -EOPNOTSUPP; + } if (ret == 0 && bio) { ret = submit_bio_wait(bio); bio_put(bio); } blk_finish_plug(&plug); + if (ret && try_write_zeroes) { + if (!(flags & BLKDEV_ZERO_NOFALLBACK)) { + try_write_zeroes = false; + goto retry; + } + if (!bdev_write_zeroes_sectors(bdev)) { + /* + * Zeroing offload support was indicated, but the + * device reported ILLEGAL REQUEST (for some devices + * there is no non-destructive way to verify whether + * WRITE ZEROES is actually supported). + */ + ret = -EOPNOTSUPP; + } + }
return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Andryuk jandryuk@gmail.com
commit ef6eaf27274c0351f7059163918f3795da13199c upstream.
Commit ac75a041048b ("HID: i2c-hid: fix size check and type usage") started writing messages when the ret_size returned by i2c_master_recv() is <= 2. However, my device i2c-DLL07D1 returns 2 for a short period of time (~0.5s) after I stop moving the pointing stick or touchpad. It varies, but you get ~50 messages each time, which spams the log hard.
[ 95.925055] i2c_hid i2c-DLL07D1:01: i2c_hid_get_input: incomplete report (83/2)
This has also been observed with a i2c-ALP0017.
[ 1781.266353] i2c_hid i2c-ALP0017:00: i2c_hid_get_input: incomplete report (30/2)
Only print the message when ret_size is totally invalid and less than 2 to cut down on the log spam.
Fixes: ac75a041048b ("HID: i2c-hid: fix size check and type usage") Reported-by: John Smith john-s-84@gmx.net Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk jandryuk@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/i2c-hid/i2c-hid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/i2c-hid/i2c-hid.c +++ b/drivers/hid/i2c-hid/i2c-hid.c @@ -476,7 +476,7 @@ static void i2c_hid_get_input(struct i2c return; }
- if ((ret_size > size) || (ret_size <= 2)) { + if ((ret_size > size) || (ret_size < 2)) { dev_err(&ihid->client->dev, "%s: incomplete report (%d/%d)\n", __func__, size, ret_size); return;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 4f65245f2d178b9cba48350620d76faa4a098841 upstream.
uref->field_index, uref->usage_index, finfo.field_index and cinfo.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/hid/usbhid/hiddev.c:473 hiddev_ioctl_usage() warn: potential spectre issue 'report->field' (local cap)
drivers/hid/usbhid/hiddev.c:477 hiddev_ioctl_usage() warn: potential spectre issue 'field->usage' (local cap)
drivers/hid/usbhid/hiddev.c:757 hiddev_ioctl() warn: potential spectre issue 'report->field' (local cap)
drivers/hid/usbhid/hiddev.c:801 hiddev_ioctl() warn: potential spectre issue 'hid->collection' (local cap)
Fix this by sanitizing such structure fields before using them to index report->field, field->usage and hid->collection.
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
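To illustrate the idea, a simplified userspace analogue of the pattern (this is not the kernel's array_index_nospec() implementation, which is architecture-tuned; it is only a sketch of the clamp-after-bounds-check approach):

#include <stddef.h>
#include <stdio.h>

#define NFIELDS 8
static int fields[NFIELDS];

/*
 * Branchless clamp: the mask is all-ones when idx < size and zero
 * otherwise, so even a mispredicted bounds check cannot feed an
 * attacker-chosen index into the array load.
 */
static size_t nospec_index(size_t idx, size_t size)
{
        size_t mask = (size_t)0 - (size_t)(idx < size);
        return idx & mask;
}

int read_field(size_t user_idx)
{
        if (user_idx >= NFIELDS)
                return -1;                              /* architectural check */
        user_idx = nospec_index(user_idx, NFIELDS);     /* sanitize before indexing */
        return fields[user_idx];
}

int main(void)
{
        printf("%d\n", read_field(3));
        return 0;
}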
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/usbhid/hiddev.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/drivers/hid/usbhid/hiddev.c +++ b/drivers/hid/usbhid/hiddev.c @@ -36,6 +36,7 @@ #include <linux/hiddev.h> #include <linux/compat.h> #include <linux/vmalloc.h> +#include <linux/nospec.h> #include "usbhid.h"
#ifdef CONFIG_USB_DYNAMIC_MINORS @@ -469,10 +470,14 @@ static noinline int hiddev_ioctl_usage(s
if (uref->field_index >= report->maxfield) goto inval; + uref->field_index = array_index_nospec(uref->field_index, + report->maxfield);
field = report->field[uref->field_index]; if (uref->usage_index >= field->maxusage) goto inval; + uref->usage_index = array_index_nospec(uref->usage_index, + field->maxusage);
uref->usage_code = field->usage[uref->usage_index].hid;
@@ -499,6 +504,8 @@ static noinline int hiddev_ioctl_usage(s
if (uref->field_index >= report->maxfield) goto inval; + uref->field_index = array_index_nospec(uref->field_index, + report->maxfield);
field = report->field[uref->field_index];
@@ -753,6 +760,8 @@ static long hiddev_ioctl(struct file *fi
if (finfo.field_index >= report->maxfield) break; + finfo.field_index = array_index_nospec(finfo.field_index, + report->maxfield);
field = report->field[finfo.field_index]; memset(&finfo, 0, sizeof(finfo)); @@ -797,6 +806,8 @@ static long hiddev_ioctl(struct file *fi
if (cinfo.index >= hid->maxcollection) break; + cinfo.index = array_index_nospec(cinfo.index, + hid->maxcollection);
cinfo.type = hid->collection[cinfo.index].type; cinfo.usage = hid->collection[cinfo.index].usage;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Rosenberg drosen@google.com
commit 717adfdaf14704fd3ec7fa2c04520c0723247eac upstream.
If our length is greater than the size of the buffer, we overflow the buffer.
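A minimal sketch of the bound being added, with memcpy() standing in for copy_to_user() and all names invented for illustration:

#include <stddef.h>
#include <string.h>

/*
 * Copy one segment of a ring buffer to the caller, never exceeding the
 * caller's remaining count (the bound this patch adds before the copy).
 */
static size_t copy_segment(char *dst, size_t count,
                           const char *ring, size_t head, size_t avail)
{
        size_t len = avail;

        if (len > count)
                len = count;
        memcpy(dst, ring + head, len);
        return len;     /* caller advances head and decrements count */
}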
Cc: stable@vger.kernel.org Signed-off-by: Daniel Rosenberg drosen@google.com Reviewed-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-debug.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/hid/hid-debug.c +++ b/drivers/hid/hid-debug.c @@ -1154,6 +1154,8 @@ copy_rest: goto out; if (list->tail > list->head) { len = list->tail - list->head; + if (len > count) + len = count;
if (copy_to_user(buffer + ret, &list->hid_debug_buf[list->head], len)) { ret = -EFAULT; @@ -1163,6 +1165,8 @@ copy_rest: list->head += len; } else { len = HID_DEBUG_BUFSIZE - list->head; + if (len > count) + len = count;
if (copy_to_user(buffer, &list->hid_debug_buf[list->head], len)) { ret = -EFAULT; @@ -1170,7 +1174,9 @@ copy_rest: } list->head = 0; ret += len; - goto copy_rest; + count -= len; + if (count > 0) + goto copy_rest; }
}
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rakib Mullick rakib.mullick@gmail.com
commit 10d94ff4d558b96bfc4f55bb0051ae4d938246fe upstream.
When the irqaffinity= kernel parameter is passed in a CPUMASK_OFFSTACK=y kernel, it fails to boot, because zalloc_cpumask_var() cannot be used before initializing the slab allocator to allocate a cpumask.
So, use alloc_bootmem_cpumask_var() instead.
Also do some cleanups while at it: in init_irq_default_affinity(), remove an #ifdef by using cpumask_available().
Signed-off-by: Rakib Mullick rakib.mullick@gmail.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20171026045800.27087-1-rakib.mullick@gmail.com Link: http://lkml.kernel.org/r/20171101041451.12581-1-rakib.mullick@gmail.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Janne Huttunen janne.huttunen@nokia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/irq/irqdesc.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -27,7 +27,7 @@ static struct lock_class_key irq_desc_lo #if defined(CONFIG_SMP) static int __init irq_affinity_setup(char *str) { - zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT); + alloc_bootmem_cpumask_var(&irq_default_affinity); cpulist_parse(str, irq_default_affinity); /* * Set at least the boot cpu. We don't want to end up with @@ -40,10 +40,8 @@ __setup("irqaffinity=", irq_affinity_set
static void __init init_irq_default_affinity(void) { -#ifdef CONFIG_CPUMASK_OFFSTACK - if (!irq_default_affinity) + if (!cpumask_available(irq_default_affinity)) zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT); -#endif if (cpumask_empty(irq_default_affinity)) cpumask_setall(irq_default_affinity); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naoya Horiguchi n-horiguchi@ah.jp.nec.com
commit 31286a8484a85e8b4e91ddb0f5415aee8a416827 upstream.
Recently the following BUG was reported:
Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000
Memory failure: 0x3c0000: recovery action for huge page: Recovered
BUG: unable to handle kernel paging request at ffff8dfcc0003000
IP: gup_pgd_range+0x1f0/0xc20
PGD 17ae72067 P4D 17ae72067 PUD 0
Oops: 0000 [#1] SMP PTI
...
CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on a 1GB hugepage. This happens because get_user_pages_fast() is not aware of a migration entry on pud that was created in the 1st madvise() event.
I think that conversion to pud-aligned migration entries is working, but other MM code walking over page tables isn't prepared for it. We need some time and effort to make all this work properly, so this patch avoids the reported bug by just disabling error handling for 1GB hugepages.
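For completeness, a hedged sketch of the reproducer described above; it assumes a pre-reserved 1 GB hugetlb page and CAP_SYS_ADMIN, is destructive to the calling process, and the fallback #defines are only there in case the libc headers lack them:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT  26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)
#endif
#ifndef MADV_HWPOISON
#define MADV_HWPOISON   100
#endif

int main(void)
{
        size_t len = 1UL << 30;         /* one 1 GB hugetlb page */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                       -1, 0);

        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        *(volatile char *)p = 1;                        /* fault the page in */
        if (madvise(p, len, MADV_HWPOISON))             /* 1st: poisons the page */
                perror("madvise #1");
        if (madvise(p, len, MADV_HWPOISON))             /* 2nd: hits the reported BUG */
                perror("madvise #2");
        return 0;
}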
[n-horiguchi@ah.jp.nec.com: v2] Link: http://lkml.kernel.org/r/1517284444-18149-1-git-send-email-n-horiguchi@ah.jp... Link: http://lkml.kernel.org/r/1517207283-15769-1-git-send-email-n-horiguchi@ah.jp... Signed-off-by: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Punit Agrawal punit.agrawal@arm.com Tested-by: Michael Ellerman mpe@ellerman.id.au Cc: Anshuman Khandual khandual@linux.vnet.ibm.com Cc: "Aneesh Kumar K.V" aneesh.kumar@linux.vnet.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mm.h | 1 + mm/memory-failure.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+)
--- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2549,6 +2549,7 @@ enum mf_action_page_type { MF_MSG_POISONED_HUGE, MF_MSG_HUGE, MF_MSG_FREE_HUGE, + MF_MSG_NON_PMD_HUGE, MF_MSG_UNMAP_FAILED, MF_MSG_DIRTY_SWAPCACHE, MF_MSG_CLEAN_SWAPCACHE, --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -508,6 +508,7 @@ static const char * const action_page_ty [MF_MSG_POISONED_HUGE] = "huge page already hardware poisoned", [MF_MSG_HUGE] = "huge page", [MF_MSG_FREE_HUGE] = "free huge page", + [MF_MSG_NON_PMD_HUGE] = "non-pmd-sized huge page", [MF_MSG_UNMAP_FAILED] = "unmapping failed page", [MF_MSG_DIRTY_SWAPCACHE] = "dirty swapcache page", [MF_MSG_CLEAN_SWAPCACHE] = "clean swapcache page", @@ -1090,6 +1091,21 @@ static int memory_failure_hugetlb(unsign return 0; }
+ /* + * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so + * simply disable it. In order to make it work properly, we need + * make sure that: + * - conversion of a pud that maps an error hugetlb into hwpoison + * entry properly works, and + * - other mm code walking over page table is aware of pud-aligned + * hwpoison entries. + */ + if (huge_page_size(page_hstate(head)) > PMD_SIZE) { + action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED); + res = -EBUSY; + goto out; + } + if (!hwpoison_user_mappings(p, pfn, trapno, flags, &head)) { action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED); res = -EBUSY;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sakari Ailus sakari.ailus@linux.intel.com
commit 03703ed1debf777ea845aa9b50ba2e80a5e7dd3c upstream.
If buffers were prepared or queued and the buffers were released without starting the queue, the finish mem op (corresponding to the prepare mem op) was never called for the buffers.
Before commit a136f59c0a1f there was no need to do this as in such a case the prepare mem op had not been called yet. Address the problem by explicitly calling finish mem op when the queue is stopped if the buffer is in either prepared or queued state.
Fixes: a136f59c0a1f ("[media] vb2: Move buffer cache synchronisation to prepare from queue")
Cc: stable@vger.kernel.org # for v4.13 and up Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Tested-by: Devin Heitmueller dheitmueller@kernellabs.com Signed-off-by: Hans Verkuil hans.verkuil@cisco.com Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/v4l2-core/videobuf2-core.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/media/v4l2-core/videobuf2-core.c +++ b/drivers/media/v4l2-core/videobuf2-core.c @@ -1689,6 +1689,15 @@ static void __vb2_queue_cancel(struct vb for (i = 0; i < q->num_buffers; ++i) { struct vb2_buffer *vb = q->bufs[i];
+ if (vb->state == VB2_BUF_STATE_PREPARED || + vb->state == VB2_BUF_STATE_QUEUED) { + unsigned int plane; + + for (plane = 0; plane < vb->num_planes; ++plane) + call_void_memop(vb, finish, + vb->planes[plane].mem_priv); + } + if (vb->state != VB2_BUF_STATE_DEQUEUED) { vb->state = VB2_BUF_STATE_PREPARED; call_void_vb_qop(vb, buf_finish, vb);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jaegeuk Kim jaegeuk@kernel.org
commit dc7a10ddee0c56c6d891dd18de5c4ee9869545e0 upstream.
If write is failed, we must deallocate the blocks that we couldn't write.
Cc: stable@vger.kernel.org Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/file.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2694,11 +2694,16 @@ static ssize_t f2fs_file_write_iter(stru inode_lock(inode); ret = generic_write_checks(iocb, from); if (ret > 0) { + bool preallocated = false; + size_t target_size = 0; int err;
if (iov_iter_fault_in_readable(from, iov_iter_count(from))) set_inode_flag(inode, FI_NO_PREALLOC);
+ preallocated = true; + target_size = iocb->ki_pos + iov_iter_count(from); + err = f2fs_preallocate_blocks(iocb, from); if (err) { clear_inode_flag(inode, FI_NO_PREALLOC); @@ -2710,6 +2715,10 @@ static ssize_t f2fs_file_write_iter(stru blk_finish_plug(&plug); clear_inode_flag(inode, FI_NO_PREALLOC);
+ /* if we couldn't write data, we should deallocate blocks. */ + if (preallocated && i_size_read(inode) < target_size) + f2fs_truncate(inode); + if (ret > 0) f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit 5bbb99d2fde047df596379be6c58e265e2ddbe1f which is commit 88075256ee817041d68c2387f29065b5cb2b342a upstream.
Jiri writes that this was an incorrect fix, and Madalin-cristian says it was fixed differently in a later patch. So just revert this from 4.14.y.
Reported-by: Jiri Slaby jslaby@suse.cz Cc: Madalin Bucur madalin.bucur@nxp.com Cc: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2863,7 +2863,7 @@ static int dpaa_remove(struct platform_d struct device *dev; int err;
- dev = pdev->dev.parent; + dev = &pdev->dev; net_dev = dev_get_drvdata(dev);
priv = netdev_priv(net_dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rasmus Villemoes linux@rasmusvillemoes.dk
commit 9564a8cf422d7b58f6e857e3546d346fa970191e upstream.
I tried building using a freshly built Make (4.2.1-69-g8a731d1), but already the objtool build broke with
orc_dump.c: In function ‘orc_dump’:
orc_dump.c:106:2: error: ‘elf_getshnum’ is deprecated [-Werror=deprecated-declarations]
  if (elf_getshdrnum(elf, &nr_sections)) {
Turns out that with that new Make, the backslash was not removed, so cpp didn't see a #include directive, grep found nothing, and -DLIBELF_USE_DEPRECATED was wrongly put in CFLAGS.
Now, that new Make behaviour is documented in their NEWS file:
* WARNING: Backward-incompatibility!
  Number signs (#) appearing inside a macro reference or function invocation
  no longer introduce comments and should not be escaped with backslashes:
  thus a call such as:
    foo := $(shell echo '#')
  is legal.  Previously the number sign needed to be escaped, for example:
    foo := $(shell echo '\#')
  Now this latter will resolve to "\#".  If you want to write makefiles
  portable to both versions, assign the number sign to a variable:
    C := \#
    foo := $(shell echo '$C')
  This was claimed to be fixed in 3.81, but wasn't, for some reason.
  To detect this change search for 'nocomment' in the .FEATURES variable.
This also fixes up the two make-cmd instances to replace # with $(pound) rather than with \#. There might very well be other places that need similar fixup in preparation for whatever future Make release contains the above change, but at least this builds an x86_64 defconfig with the new make.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847 Cc: Randy Dunlap rdunlap@infradead.org Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- scripts/Kbuild.include | 5 +++-- tools/build/Build.include | 5 +++-- tools/objtool/Makefile | 2 +- tools/scripts/Makefile.include | 2 ++ 4 files changed, 9 insertions(+), 5 deletions(-)
--- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -8,6 +8,7 @@ squote := ' empty := space := $(empty) $(empty) space_escape := _-_SPACE_-_ +pound := #
### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -251,11 +252,11 @@ endif
# Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst #,\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1)))))
# Find any prerequisites that is newer than target or that does not exist. # PHONY targets skipped in both cases. --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -12,6 +12,7 @@ # Convenient variables comma := , squote := ' +pound := #
### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -43,11 +44,11 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\ ### # Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst #,\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1)))))
### # Find any prerequisites that is newer than target or that does not exist. --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -35,7 +35,7 @@ CFLAGS += -Wall -Werror $(WARNINGS) -f LDFLAGS += -lelf $(LIBSUBCMD)
# Allow old libelf to be used: -elfshdr := $(shell echo '#include <libelf.h>' | $(CC) $(CFLAGS) -x c -E - | grep elf_getshdr) +elfshdr := $(shell echo '$(pound)include <libelf.h>' | $(CC) $(CFLAGS) -x c -E - | grep elf_getshdr) CFLAGS += $(if $(elfshdr),,-DLIBELF_USE_DEPRECATED)
AWK = awk --- a/tools/scripts/Makefile.include +++ b/tools/scripts/Makefile.include @@ -101,3 +101,5 @@ ifneq ($(silent),1) QUIET_INSTALL = @printf ' INSTALL %s\n' $1; endif endif + +pound := #
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Brad Love brad@nextdimension.cc
commit 3ee9bc12342cf546313d300808ff47d7dbb8e7db upstream.
The cx25840 driver currently configures the 885, 887, and 888 using default divisors for each chip. This patch checks whether the cx23885 driver has passed the cx25840 a non-default clock rate for a specific chip. If a cx23885 board has left clk_freq at 0, the default clock values will be used to configure the PLLs.
This patch only has effect on 888 boards who set clk_freq to 25M.
Signed-off-by: Brad Love brad@nextdimension.cc Signed-off-by: Mauro Carvalho Chehab mchehab@s-opensource.com Cc: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/i2c/cx25840/cx25840-core.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-)
--- a/drivers/media/i2c/cx25840/cx25840-core.c +++ b/drivers/media/i2c/cx25840/cx25840-core.c @@ -463,8 +463,13 @@ static void cx23885_initialize(struct i2 { DEFINE_WAIT(wait); struct cx25840_state *state = to_state(i2c_get_clientdata(client)); + u32 clk_freq = 0; struct workqueue_struct *q;
+ /* cx23885 sets hostdata to clk_freq pointer */ + if (v4l2_get_subdev_hostdata(&state->sd)) + clk_freq = *((u32 *)v4l2_get_subdev_hostdata(&state->sd)); + /* * Come out of digital power down * The CX23888, at least, needs this, otherwise registers aside from @@ -500,8 +505,13 @@ static void cx23885_initialize(struct i2 * 50.0 MHz * (0xb + 0xe8ba26/0x2000000)/4 = 5 * 28.636363 MHz * 572.73 MHz before post divide */ - /* HVR1850 or 50MHz xtal */ - cx25840_write(client, 0x2, 0x71); + if (clk_freq == 25000000) { + /* 888/ImpactVCBe or 25Mhz xtal */ + ; /* nothing to do */ + } else { + /* HVR1850 or 50MHz xtal */ + cx25840_write(client, 0x2, 0x71); + } cx25840_write4(client, 0x11c, 0x01d1744c); cx25840_write4(client, 0x118, 0x00000416); cx25840_write4(client, 0x404, 0x0010253e); @@ -544,9 +554,15 @@ static void cx23885_initialize(struct i2 /* HVR1850 */ switch (state->id) { case CX23888_AV: - /* 888/HVR1250 specific */ - cx25840_write4(client, 0x10c, 0x13333333); - cx25840_write4(client, 0x108, 0x00000515); + if (clk_freq == 25000000) { + /* 888/ImpactVCBe or 25MHz xtal */ + cx25840_write4(client, 0x10c, 0x01b6db7b); + cx25840_write4(client, 0x108, 0x00000512); + } else { + /* 888/HVR1250 or 50MHz xtal */ + cx25840_write4(client, 0x10c, 0x13333333); + cx25840_write4(client, 0x108, 0x00000515); + } break; default: cx25840_write4(client, 0x10c, 0x002be2c9); @@ -576,7 +592,7 @@ static void cx23885_initialize(struct i2 * 368.64 MHz before post divide * 122.88 MHz / 0xa = 12.288 MHz */ - /* HVR1850 or 50MHz xtal */ + /* HVR1850 or 50MHz xtal or 25MHz xtal */ cx25840_write4(client, 0x114, 0x017dbf48); cx25840_write4(client, 0x110, 0x000a030e); break;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Kaiser martin@kaiser.cx
commit 3f77f244d8ec28e3a0a81240ffac7d626390060c upstream.
The v21 version of the NAND flash controller contains a Spare Area Size Register (SPAS) at offset 0x10. Its setting defaults to the maximum spare area size of 218 bytes. The size that is set in this register is used by the controller when it calculates the ECC bytes internally in hardware.
Usually, this register is updated from settings in the IIM fuses when the system is booting from NAND flash. For other boot media, however, the SPAS register remains at the default setting, which may not work for the particular flash chip on the board. The same goes for flash chips whose configuration cannot be set in the IIM fuses (e.g. chips with 2k sector size and 128 bytes spare area size can't be configured in the IIM fuses on imx25 systems).
Set the SPAS register explicitly during the preset operation. Derive the register value from mtd->oobsize that was detected during probe by decoding the flash chip's ID bytes.
While at it, rename the define for the spare area register's offset to NFC_V21_RSLTSPARE_AREA. The register at offset 0x10 on v1 controllers is different from the register on v21 controllers.
Fixes: d484018 ("mtd: mxc_nand: set NFC registers after reset") Cc: stable@vger.kernel.org Signed-off-by: Martin Kaiser martin@kaiser.cx Reviewed-by: Sascha Hauer s.hauer@pengutronix.de Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/nand/mxc_nand.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/mxc_nand.c +++ b/drivers/mtd/nand/mxc_nand.c @@ -48,7 +48,7 @@ #define NFC_V1_V2_CONFIG (host->regs + 0x0a) #define NFC_V1_V2_ECC_STATUS_RESULT (host->regs + 0x0c) #define NFC_V1_V2_RSLTMAIN_AREA (host->regs + 0x0e) -#define NFC_V1_V2_RSLTSPARE_AREA (host->regs + 0x10) +#define NFC_V21_RSLTSPARE_AREA (host->regs + 0x10) #define NFC_V1_V2_WRPROT (host->regs + 0x12) #define NFC_V1_UNLOCKSTART_BLKADDR (host->regs + 0x14) #define NFC_V1_UNLOCKEND_BLKADDR (host->regs + 0x16) @@ -1119,6 +1119,9 @@ static void preset_v2(struct mtd_info *m writew(config1, NFC_V1_V2_CONFIG1); /* preset operation */
+ /* spare area size in 16-bit half-words */ + writew(mtd->oobsize / 2, NFC_V21_RSLTSPARE_AREA); + /* Unlock the internal RAM Buffer */ writew(0x2, NFC_V1_V2_CONFIG);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Darrick J. Wong darrick.wong@oracle.com
commit ba23cba9b3bdc967aabdc6ff1e3e9b11ce05bb4f upstream.
Change bdev_dax_supported so it takes a bdev parameter. This enables multi-device filesystems like xfs to check that a dax device can work for the particular filesystem. Once that's in place, actually fix all the parts of XFS where we need to be able to distinguish between datadev and rtdev.
This patch fixes the problem where we screw up the dax support checking in xfs if the datadev and rtdev have different dax capabilities.
Signed-off-by: Darrick J. Wong darrick.wong@oracle.com [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases] Signed-off-by: Ross Zwisler ross.zwisler@linux.intel.com Reviewed-by: Eric Sandeen sandeen@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dax/super.c | 22 +++++++++++----------- fs/ext2/super.c | 2 +- fs/ext4/super.c | 2 +- fs/xfs/xfs_ioctl.c | 3 ++- fs/xfs/xfs_iops.c | 30 +++++++++++++++++++++++++----- fs/xfs/xfs_super.c | 10 ++++++++-- include/linux/dax.h | 9 +++++---- 7 files changed, 53 insertions(+), 25 deletions(-)
--- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -73,7 +73,7 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
/** * __bdev_dax_supported() - Check if the device supports dax for filesystem - * @sb: The superblock of the device + * @bdev: block device to check * @blocksize: The block size of the device * * This is a library function for filesystems to check if the block device @@ -81,33 +81,33 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); * * Return: negative errno if unsupported, 0 if supported. */ -int __bdev_dax_supported(struct super_block *sb, int blocksize) +int __bdev_dax_supported(struct block_device *bdev, int blocksize) { - struct block_device *bdev = sb->s_bdev; struct dax_device *dax_dev; pgoff_t pgoff; int err, id; void *kaddr; pfn_t pfn; long len; + char buf[BDEVNAME_SIZE];
if (blocksize != PAGE_SIZE) { - pr_err("VFS (%s): error: unsupported blocksize for dax\n", - sb->s_id); + pr_debug("%s: error: unsupported blocksize for dax\n", + bdevname(bdev, buf)); return -EINVAL; }
err = bdev_dax_pgoff(bdev, 0, PAGE_SIZE, &pgoff); if (err) { - pr_err("VFS (%s): error: unaligned partition for dax\n", - sb->s_id); + pr_debug("%s: error: unaligned partition for dax\n", + bdevname(bdev, buf)); return err; }
dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); if (!dax_dev) { - pr_err("VFS (%s): error: device does not support dax\n", - sb->s_id); + pr_debug("%s: error: device does not support dax\n", + bdevname(bdev, buf)); return -EOPNOTSUPP; }
@@ -118,8 +118,8 @@ int __bdev_dax_supported(struct super_bl put_dax(dax_dev);
if (len < 1) { - pr_err("VFS (%s): error: dax access failed (%ld)", - sb->s_id, len); + pr_debug("%s: error: dax access failed (%ld)\n", + bdevname(bdev, buf), len); return len < 0 ? len : -EIO; }
--- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -953,7 +953,7 @@ static int ext2_fill_super(struct super_ blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
if (sbi->s_mount_opt & EXT2_MOUNT_DAX) { - err = bdev_dax_supported(sb, blocksize); + err = bdev_dax_supported(sb->s_bdev, blocksize); if (err) goto failed_mount; } --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3770,7 +3770,7 @@ static int ext4_fill_super(struct super_ " that may contain inline data"); goto failed_mount; } - err = bdev_dax_supported(sb, blocksize); + err = bdev_dax_supported(sb->s_bdev, blocksize); if (err) goto failed_mount; } --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1101,7 +1101,8 @@ xfs_ioctl_setattr_dax_invalidate( if (fa->fsx_xflags & FS_XFLAG_DAX) { if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))) return -EINVAL; - if (bdev_dax_supported(sb, sb->s_blocksize) < 0) + if (bdev_dax_supported(xfs_find_bdev_for_inode(VFS_I(ip)), + sb->s_blocksize) < 0) return -EINVAL; }
--- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -1184,6 +1184,30 @@ static const struct inode_operations xfs .update_time = xfs_vn_update_time, };
+/* Figure out if this file actually supports DAX. */ +static bool +xfs_inode_supports_dax( + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + + /* Only supported on non-reflinked files. */ + if (!S_ISREG(VFS_I(ip)->i_mode) || xfs_is_reflink_inode(ip)) + return false; + + /* DAX mount option or DAX iflag must be set. */ + if (!(mp->m_flags & XFS_MOUNT_DAX) && + !(ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) + return false; + + /* Block size must match page size */ + if (mp->m_sb.sb_blocksize != PAGE_SIZE) + return false; + + /* Device has to support DAX too. */ + return xfs_find_daxdev_for_inode(VFS_I(ip)) != NULL; +} + STATIC void xfs_diflags_to_iflags( struct inode *inode, @@ -1202,11 +1226,7 @@ xfs_diflags_to_iflags( inode->i_flags |= S_SYNC; if (flags & XFS_DIFLAG_NOATIME) inode->i_flags |= S_NOATIME; - if (S_ISREG(inode->i_mode) && - ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE && - !xfs_is_reflink_inode(ip) && - (ip->i_mount->m_flags & XFS_MOUNT_DAX || - ip->i_d.di_flags2 & XFS_DIFLAG2_DAX)) + if (xfs_inode_supports_dax(ip)) inode->i_flags |= S_DAX; }
--- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1640,11 +1640,17 @@ xfs_fs_fill_super( sb->s_flags |= SB_I_VERSION;
if (mp->m_flags & XFS_MOUNT_DAX) { + int error2 = 0; + xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
- error = bdev_dax_supported(sb, sb->s_blocksize); - if (error) { + error = bdev_dax_supported(mp->m_ddev_targp->bt_bdev, + sb->s_blocksize); + if (mp->m_rtdev_targp) + error2 = bdev_dax_supported(mp->m_rtdev_targp->bt_bdev, + sb->s_blocksize); + if (error && error2) { xfs_alert(mp, "DAX unsupported by block device. Turning off DAX."); mp->m_flags &= ~XFS_MOUNT_DAX; --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -40,10 +40,10 @@ static inline void put_dax(struct dax_de
int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff); #if IS_ENABLED(CONFIG_FS_DAX) -int __bdev_dax_supported(struct super_block *sb, int blocksize); -static inline int bdev_dax_supported(struct super_block *sb, int blocksize) +int __bdev_dax_supported(struct block_device *bdev, int blocksize); +static inline int bdev_dax_supported(struct block_device *bdev, int blocksize) { - return __bdev_dax_supported(sb, blocksize); + return __bdev_dax_supported(bdev, blocksize); }
static inline struct dax_device *fs_dax_get_by_host(const char *host) @@ -58,7 +58,8 @@ static inline void fs_put_dax(struct dax
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev); #else -static inline int bdev_dax_supported(struct super_block *sb, int blocksize) +static inline int bdev_dax_supported(struct block_device *bdev, + int blocksize) { return -EOPNOTSUPP; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
commit 80660f20252d6f76c9f203874ad7c7a4a8508cf8 upstream.
The function's return values are confusing given the way the function is named. We expect a true or false return value, but it actually returns 0/-errno. This makes the code very confusing. Change the return value to a bool: return true if DAX is supported and false if it is not.
Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Ross Zwisler ross.zwisler@linux.intel.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dax/super.c | 14 +++++++------- fs/ext2/super.c | 3 +-- fs/ext4/super.c | 3 +-- fs/xfs/xfs_ioctl.c | 4 ++-- fs/xfs/xfs_super.c | 12 ++++++------ include/linux/dax.h | 8 ++++---- 6 files changed, 21 insertions(+), 23 deletions(-)
--- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -79,9 +79,9 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); * This is a library function for filesystems to check if the block device * can be mounted with dax option. * - * Return: negative errno if unsupported, 0 if supported. + * Return: true if supported, false if unsupported */ -int __bdev_dax_supported(struct block_device *bdev, int blocksize) +bool __bdev_dax_supported(struct block_device *bdev, int blocksize) { struct dax_device *dax_dev; pgoff_t pgoff; @@ -94,21 +94,21 @@ int __bdev_dax_supported(struct block_de if (blocksize != PAGE_SIZE) { pr_debug("%s: error: unsupported blocksize for dax\n", bdevname(bdev, buf)); - return -EINVAL; + return false; }
err = bdev_dax_pgoff(bdev, 0, PAGE_SIZE, &pgoff); if (err) { pr_debug("%s: error: unaligned partition for dax\n", bdevname(bdev, buf)); - return err; + return false; }
dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); if (!dax_dev) { pr_debug("%s: error: device does not support dax\n", bdevname(bdev, buf)); - return -EOPNOTSUPP; + return false; }
id = dax_read_lock(); @@ -120,10 +120,10 @@ int __bdev_dax_supported(struct block_de if (len < 1) { pr_debug("%s: error: dax access failed (%ld)\n", bdevname(bdev, buf), len); - return len < 0 ? len : -EIO; + return false; }
- return 0; + return true; } EXPORT_SYMBOL_GPL(__bdev_dax_supported); #endif --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -953,8 +953,7 @@ static int ext2_fill_super(struct super_ blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
if (sbi->s_mount_opt & EXT2_MOUNT_DAX) { - err = bdev_dax_supported(sb->s_bdev, blocksize); - if (err) + if (!bdev_dax_supported(sb->s_bdev, blocksize)) goto failed_mount; }
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3770,8 +3770,7 @@ static int ext4_fill_super(struct super_ " that may contain inline data"); goto failed_mount; } - err = bdev_dax_supported(sb->s_bdev, blocksize); - if (err) + if (!bdev_dax_supported(sb->s_bdev, blocksize)) goto failed_mount; }
--- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1101,8 +1101,8 @@ xfs_ioctl_setattr_dax_invalidate( if (fa->fsx_xflags & FS_XFLAG_DAX) { if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))) return -EINVAL; - if (bdev_dax_supported(xfs_find_bdev_for_inode(VFS_I(ip)), - sb->s_blocksize) < 0) + if (!bdev_dax_supported(xfs_find_bdev_for_inode(VFS_I(ip)), + sb->s_blocksize)) return -EINVAL; }
--- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1640,17 +1640,17 @@ xfs_fs_fill_super( sb->s_flags |= SB_I_VERSION;
if (mp->m_flags & XFS_MOUNT_DAX) { - int error2 = 0; + bool rtdev_is_dax = false, datadev_is_dax;
xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
- error = bdev_dax_supported(mp->m_ddev_targp->bt_bdev, - sb->s_blocksize); + datadev_is_dax = bdev_dax_supported(mp->m_ddev_targp->bt_bdev, + sb->s_blocksize); if (mp->m_rtdev_targp) - error2 = bdev_dax_supported(mp->m_rtdev_targp->bt_bdev, - sb->s_blocksize); - if (error && error2) { + rtdev_is_dax = bdev_dax_supported( + mp->m_rtdev_targp->bt_bdev, sb->s_blocksize); + if (!rtdev_is_dax && !datadev_is_dax) { xfs_alert(mp, "DAX unsupported by block device. Turning off DAX."); mp->m_flags &= ~XFS_MOUNT_DAX; --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -40,8 +40,8 @@ static inline void put_dax(struct dax_de
int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff); #if IS_ENABLED(CONFIG_FS_DAX) -int __bdev_dax_supported(struct block_device *bdev, int blocksize); -static inline int bdev_dax_supported(struct block_device *bdev, int blocksize) +bool __bdev_dax_supported(struct block_device *bdev, int blocksize); +static inline bool bdev_dax_supported(struct block_device *bdev, int blocksize) { return __bdev_dax_supported(bdev, blocksize); } @@ -58,10 +58,10 @@ static inline void fs_put_dax(struct dax
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev); #else -static inline int bdev_dax_supported(struct block_device *bdev, +static inline bool bdev_dax_supported(struct block_device *bdev, int blocksize) { - return -EOPNOTSUPP; + return false; }
static inline struct dax_device *fs_dax_get_by_host(const char *host)
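For context, a caller of the new boolean API ends up with a simple mount-time check along these lines (hypothetical filesystem code, not part of this patch; wants_dax stands in for whatever mount option the filesystem uses):

	/* Sketch only: refuse a dax mount when the backing device cannot do DAX. */
	if (wants_dax && !bdev_dax_supported(sb->s_bdev, sb->s_blocksize))
		return -EINVAL;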
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Zwisler ross.zwisler@linux.intel.com
commit 15256f6cc4b44f2e70503758150267fd2a53c0d6 upstream.
Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This is needed for DM configurations where the first element in the dm-linear or dm-stripe target supports DAX, but other elements do not. Without this check __bdev_dax_supported() will pass for such devices, letting a filesystem on that device mount with the DAX option.
Signed-off-by: Ross Zwisler ross.zwisler@linux.intel.com Suggested-by: Mike Snitzer snitzer@redhat.com Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support") Cc: stable@vger.kernel.org Acked-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Toshi Kani toshi.kani@hpe.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dax/super.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -84,6 +84,7 @@ EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); bool __bdev_dax_supported(struct block_device *bdev, int blocksize) { struct dax_device *dax_dev; + struct request_queue *q; pgoff_t pgoff; int err, id; void *kaddr; @@ -96,6 +97,13 @@ bool __bdev_dax_supported(struct block_d bdevname(bdev, buf)); return false; } + + q = bdev_get_queue(bdev); + if (!q || !blk_queue_dax(q)) { + pr_debug("%s: error: request queue doesn't support dax\n", + bdevname(bdev, buf)); + return false; + }
err = bdev_dax_pgoff(bdev, 0, PAGE_SIZE, &pgoff); if (err) {
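For reference, blk_queue_dax() is just a request-queue flag test, roughly along these lines (paraphrased from include/linux/blkdev.h; the exact form may differ per tree):

	#define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)

so the new check rejects DAX on any block device whose driver has not advertised QUEUE_FLAG_DAX on its request queue.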
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Snitzer snitzer@redhat.com
commit ad3793fc3945173f64d82d05d3ecde41f6c0435c upstream.
Set QUEUE_FLAG_DAX in dm_table_set_restrictions(), like the other queue flags, rather than having DAX support be a special case that is set based on table type in dm_setup_md_queue().
Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Ross Zwisler ross.zwisler@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-table.c | 2 ++ drivers/md/dm.c | 3 --- 2 files changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1813,6 +1813,8 @@ void dm_table_set_restrictions(struct dm } blk_queue_write_cache(q, wc, fua);
+ if (dm_table_supports_dax(t)) + queue_flag_set_unlocked(QUEUE_FLAG_DAX, q); if (dm_table_supports_dax_write_cache(t)) dax_write_cache(t->md->dax_dev, true);
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2050,9 +2050,6 @@ int dm_setup_md_queue(struct mapped_devi */ bioset_free(md->queue->bio_split); md->queue->bio_split = NULL; - - if (type == DM_TYPE_DAX_BIO_BASED) - queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue); break; case DM_TYPE_NONE: WARN_ON_ONCE(true);
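For context, dm_table_supports_dax() (not visible in the diff) only reports DAX when every target in the table supports it; roughly, paraphrased from drivers/md/dm-table.c (details may differ slightly in 4.14):

	static bool dm_table_supports_dax(struct dm_table *t)
	{
		struct dm_target *ti;
		unsigned i;

		/* DAX is only usable if every target can do direct access. */
		for (i = 0; i < dm_table_get_num_targets(t); i++) {
			ti = dm_table_get_target(t, i);

			if (!ti->type->direct_access)
				return false;

			if (!ti->type->iterate_devices ||
			    !ti->type->iterate_devices(ti, device_supports_dax, NULL))
				return false;
		}

		return true;
	}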
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Zwisler ross.zwisler@linux.intel.com
commit dbc626597c39b24cefce09fbd8e9dea85869a801 upstream.
Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX flag is set on the device's request queue to decide whether or not the device supports filesystem DAX. Really we should be using bdev_dax_supported() like filesystems do at mount time. This performs other tests like checking to make sure the dax_direct_access() path works.
We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if any of the underlying devices do not support DAX. This makes the handling of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags in dm_table_set_restrictions().
Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this will ensure that filesystems built upon DM devices will only be able to mount with DAX if all underlying devices also support DAX.
Signed-off-by: Ross Zwisler ross.zwisler@linux.intel.com Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support") Cc: stable@vger.kernel.org Acked-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Toshi Kani toshi.kani@hpe.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-table.c | 7 ++++--- drivers/md/dm.c | 3 +-- 2 files changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -883,9 +883,7 @@ EXPORT_SYMBOL_GPL(dm_table_set_type); static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev, sector_t start, sector_t len, void *data) { - struct request_queue *q = bdev_get_queue(dev->bdev); - - return q && blk_queue_dax(q); + return bdev_dax_supported(dev->bdev, PAGE_SIZE); }
static bool dm_table_supports_dax(struct dm_table *t) @@ -1815,6 +1813,9 @@ void dm_table_set_restrictions(struct dm
if (dm_table_supports_dax(t)) queue_flag_set_unlocked(QUEUE_FLAG_DAX, q); + else + queue_flag_clear_unlocked(QUEUE_FLAG_DAX, q); + if (dm_table_supports_dax_write_cache(t)) dax_write_cache(t->md->dax_dev, true);
--- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -961,8 +961,7 @@ static long dm_dax_direct_access(struct if (len < 1) goto out; nr_pages = min(len, nr_pages); - if (ti->type->direct_access) - ret = ti->type->direct_access(ti, pgoff, nr_pages, kaddr, pfn); + ret = ti->type->direct_access(ti, pgoff, nr_pages, kaddr, pfn);
out: dm_put_live_table(md, srcu_idx);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tokunori Ikegami ikegami@allied-telesis.co.jp
commit 85a82e28b023de9b259a86824afbd6ba07bd6475 upstream.
The definition can also be used for other program and erase operations, so rename it from MAX_WORD_RETRIES to MAX_RETRIES.
Signed-off-by: Tokunori Ikegami ikegami@allied-telesis.co.jp Reviewed-by: Joakim Tjernlund Joakim.Tjernlund@infinera.com Cc: Chris Packham chris.packham@alliedtelesis.co.nz Cc: Brian Norris computersforpeace@gmail.com Cc: David Woodhouse dwmw2@infradead.org Cc: Boris Brezillon boris.brezillon@free-electrons.com Cc: Marek Vasut marek.vasut@gmail.com Cc: Richard Weinberger richard@nod.at Cc: Cyrille Pitchen cyrille.pitchen@wedev4u.fr Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -42,7 +42,7 @@ #define AMD_BOOTLOC_BUG #define FORCE_WORD_WRITE 0
-#define MAX_WORD_RETRIES 3 +#define MAX_RETRIES 3
#define SST49LF004B 0x0060 #define SST49LF040B 0x0050 @@ -1647,7 +1647,7 @@ static int __xipram do_write_oneword(str map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */
- if (++retry_cnt <= MAX_WORD_RETRIES) + if (++retry_cnt <= MAX_RETRIES) goto retry;
ret = -EIO; @@ -2106,7 +2106,7 @@ retry: map_write(map, CMD(0xF0), chip->start); /* FIXME - should have reset delay before continuing */
- if (++retry_cnt <= MAX_WORD_RETRIES) + if (++retry_cnt <= MAX_RETRIES) goto retry;
ret = -EIO;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tokunori Ikegami ikegami@allied-telesis.co.jp
commit 45f75b8a919a4255f52df454f1ffdee0e42443b2 upstream.
The word write functions retry on error, but no such retry is implemented for the erase functions. Change the erase functions to retry in the same way.
This is needed to recover from a flash erase error that has been seen only once, triggered by the chip_good() error path in do_erase_oneblock(). It was observed on a MACRONIX MX29GL512FHT2I-11G flash device behind the parallel flash interface integrated on a BCM53003, but the failure cannot currently be reproduced.
Signed-off-by: Tokunori Ikegami ikegami@allied-telesis.co.jp Reviewed-by: Joakim Tjernlund Joakim.Tjernlund@infinera.com Cc: Chris Packham chris.packham@alliedtelesis.co.nz Cc: Brian Norris computersforpeace@gmail.com Cc: David Woodhouse dwmw2@infradead.org Cc: Boris Brezillon boris.brezillon@free-electrons.com Cc: Marek Vasut marek.vasut@gmail.com Cc: Richard Weinberger richard@nod.at Cc: Cyrille Pitchen cyrille.pitchen@wedev4u.fr Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/chips/cfi_cmdset_0002.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2241,6 +2241,7 @@ static int __xipram do_erase_chip(struct unsigned long int adr; DECLARE_WAITQUEUE(wait, current); int ret = 0; + int retry_cnt = 0;
adr = cfi->addr_unlock1;
@@ -2258,6 +2259,7 @@ static int __xipram do_erase_chip(struct ENABLE_VPP(map); xip_disable(map, chip, adr);
+ retry: cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x80, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); @@ -2312,6 +2314,9 @@ static int __xipram do_erase_chip(struct map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */
+ if (++retry_cnt <= MAX_RETRIES) + goto retry; + ret = -EIO; }
@@ -2331,6 +2336,7 @@ static int __xipram do_erase_oneblock(st unsigned long timeo = jiffies + HZ; DECLARE_WAITQUEUE(wait, current); int ret = 0; + int retry_cnt = 0;
adr += chip->start;
@@ -2348,6 +2354,7 @@ static int __xipram do_erase_oneblock(st ENABLE_VPP(map); xip_disable(map, chip, adr);
+ retry: cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x80, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); @@ -2405,6 +2412,9 @@ static int __xipram do_erase_oneblock(st map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */
+ if (++retry_cnt <= MAX_RETRIES) + goto retry; + ret = -EIO; }
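Note on the hunks above: the retry label sits before the full unlock/erase command sequence (0xAA, 0x55, 0x80, ...), so after the 0xF0 reset each retry re-issues the complete erase command cycle rather than just polling again.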
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tokunori Ikegami ikegami@allied-telesis.co.jp
commit 79ca484b613041ca223f74b34608bb6f5221724b upstream.
Currently the erase functions check both chip_ready() and chip_good(), but chip_ready() alone is not enough to determine the operation status. Change them to check chip_good() instead, and restructure the flow so that the retry and error handling introduced earlier is preserved.
Signed-off-by: Tokunori Ikegami ikegami@allied-telesis.co.jp Reviewed-by: Joakim Tjernlund Joakim.Tjernlund@infinera.com Cc: Chris Packham chris.packham@alliedtelesis.co.nz Cc: Brian Norris computersforpeace@gmail.com Cc: David Woodhouse dwmw2@infradead.org Cc: Boris Brezillon boris.brezillon@free-electrons.com Cc: Marek Vasut marek.vasut@gmail.com Cc: Richard Weinberger richard@nod.at Cc: Cyrille Pitchen cyrille.pitchen@wedev4u.fr Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mtd/chips/cfi_cmdset_0002.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2296,12 +2296,13 @@ static int __xipram do_erase_chip(struct chip->erase_suspended = 0; }
- if (chip_ready(map, adr)) + if (chip_good(map, adr, map_word_ff(map))) break;
if (time_after(jiffies, timeo)) { printk(KERN_WARNING "MTD %s(): software timeout\n", __func__ ); + ret = -EIO; break; }
@@ -2309,15 +2310,15 @@ static int __xipram do_erase_chip(struct UDELAY(map, chip, adr, 1000000/HZ); } /* Did we succeed? */ - if (!chip_good(map, adr, map_word_ff(map))) { + if (ret) { /* reset on all failures. */ map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */
- if (++retry_cnt <= MAX_RETRIES) + if (++retry_cnt <= MAX_RETRIES) { + ret = 0; goto retry; - - ret = -EIO; + } }
chip->state = FL_READY; @@ -2391,7 +2392,7 @@ static int __xipram do_erase_oneblock(st chip->erase_suspended = 0; }
- if (chip_ready(map, adr)) { + if (chip_good(map, adr, map_word_ff(map))) { xip_enable(map, chip, adr); break; } @@ -2400,6 +2401,7 @@ static int __xipram do_erase_oneblock(st xip_enable(map, chip, adr); printk(KERN_WARNING "MTD %s(): software timeout\n", __func__ ); + ret = -EIO; break; }
@@ -2407,15 +2409,15 @@ static int __xipram do_erase_oneblock(st UDELAY(map, chip, adr, 1000000/HZ); } /* Did we succeed? */ - if (!chip_good(map, adr, map_word_ff(map))) { + if (ret) { /* reset on all failures. */ map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */
- if (++retry_cnt <= MAX_RETRIES) + if (++retry_cnt <= MAX_RETRIES) { + ret = 0; goto retry; - - ret = -EIO; + } }
chip->state = FL_READY;
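To make the resulting control flow easier to follow, here is a condensed sketch of the wait loop in do_erase_oneblock() after this change (illustrative only; suspend handling, xip_enable() and locking are elided):

	for (;;) {
		if (chip_good(map, adr, map_word_ff(map)))
			break;			/* erase finished and verified */

		if (time_after(jiffies, timeo)) {
			printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
			ret = -EIO;		/* record the failure ... */
			break;
		}

		UDELAY(map, chip, adr, 1000000/HZ);
	}

	if (ret) {				/* ... and reset + retry on it */
		map_write(map, CMD(0xF0), chip->start);
		if (++retry_cnt <= MAX_RETRIES) {
			ret = 0;
			goto retry;
		}
	}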
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit ce00bf07cc95a57cd20b208e02b3c2604e532ae8 upstream.
The old code would indefinitely block other users of nf_log_mutex if a userspace access in proc_dostring() blocked e.g. due to a userfaultfd region. Fix it by moving proc_dostring() out of the locked region.
This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix sleeping function called from invalid context"), which changed this code from using rcu_read_lock() to taking nf_log_mutex.
Fixes: 266d07cb1c9a ("netfilter: nf_log: fix sleeping function calle[...]") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nf_log.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -458,14 +458,17 @@ static int nf_log_proc_dostring(struct c rcu_assign_pointer(net->nf.nf_loggers[tindex], logger); mutex_unlock(&nf_log_mutex); } else { + struct ctl_table tmp = *table; + + tmp.data = buf; mutex_lock(&nf_log_mutex); logger = nft_log_dereference(net->nf.nf_loggers[tindex]); if (!logger) - table->data = "NONE"; + strlcpy(buf, "NONE", sizeof(buf)); else - table->data = logger->name; - r = proc_dostring(table, write, buffer, lenp, ppos); + strlcpy(buf, logger->name, sizeof(buf)); mutex_unlock(&nf_log_mutex); + r = proc_dostring(&tmp, write, buffer, lenp, ppos); }
return r;
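The fix follows the usual pattern for handlers that feed proc_dostring(): snapshot the shared state into an on-stack buffer while holding the mutex, then perform the (potentially blocking) userspace copy through a private ctl_table without the lock. Condensed sketch of the read side above:

	struct ctl_table tmp = *table;

	tmp.data = buf;				/* on-stack buffer */
	mutex_lock(&nf_log_mutex);
	logger = nft_log_dereference(net->nf.nf_loggers[tindex]);
	strlcpy(buf, logger ? logger->name : "NONE", sizeof(buf));
	mutex_unlock(&nf_log_mutex);
	r = proc_dostring(&tmp, write, buffer, lenp, ppos);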
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@oracle.com
commit 1376b0a2160319125c3a2822e8c09bd283cd8141 upstream.
There is a '>' vs '<' typo so this loop is a no-op.
Fixes: d35dcc89fc93 ("staging: comedi: quatech_daqp_cs: fix daqp_ao_insn_write()") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Ian Abbott abbotti@mev.co.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/staging/comedi/drivers/quatech_daqp_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/staging/comedi/drivers/quatech_daqp_cs.c +++ b/drivers/staging/comedi/drivers/quatech_daqp_cs.c @@ -642,7 +642,7 @@ static int daqp_ao_insn_write(struct com /* Make sure D/A update mode is direct update */ outb(0, dev->iobase + DAQP_AUX_REG);
- for (i = 0; i > insn->n; i++) { + for (i = 0; i < insn->n; i++) { unsigned int val = data[i]; int ret;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 4ff648decf4712d39f184fc2df3163f43975575a upstream.
Since the following commit:
b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()")
the sched_pi_setprio trace point shows the "newprio" during a deboost:
|futex sched_pi_setprio: comm=futex_requeue_p pid=2234 oldprio=98 newprio=98 |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2234 prev_prio=120
This patch open codes __rt_effective_prio() in the tracepoint as the 'newprio' to get the old behaviour back / the correct priority:
|futex sched_pi_setprio: comm=futex_requeue_p pid=2220 oldprio=98 newprio=120 |futex sched_switch: prev_comm=futex_requeue_p prev_pid=2220 prev_prio=120
Peter suggested to open code the new priority so people using tracehook could get the deadline data out.
Reported-by: Mansky Christian man@keba.com Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: b91473ff6e97 ("sched,tracing: Update trace_sched_pi_setprio()") Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmjjzu@linutronix.de Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/trace/events/sched.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -435,7 +435,9 @@ TRACE_EVENT(sched_pi_setprio, memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); __entry->pid = tsk->pid; __entry->oldprio = tsk->prio; - __entry->newprio = pi_task ? pi_task->prio : tsk->prio; + __entry->newprio = pi_task ? + min(tsk->normal_prio, pi_task->prio) : + tsk->normal_prio; /* XXX SCHED_DEADLINE bits missing */ ),
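A short worked example of the new expression: take a SCHED_OTHER task with normal_prio 120 that gets boosted by a pi_task of prio 98. While boosted, the tracepoint reports newprio = min(120, 98) = 98; on deboost pi_task is NULL, so newprio = normal_prio = 120, which matches the prev_prio=120 seen in the sched_switch line above.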
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 28557cc106e6d2aa8b8c5c7687ea9f8055ff3911 upstream.
Revert commit c7f26ccfb2c3 ("mm/vmstat.c: fix vmstat_update() preemption BUG"). Steven saw a "using smp_processor_id() in preemptible" message and added a preempt_disable() section around it to keep it quiet. This is not the right thing to do; it does not fix the real problem.
vmstat_update() is invoked by a kworker on a specific CPU. This worker is bound to that CPU. The name of the worker was "kworker/1:1", so it should have been a worker bound to CPU1. A worker that can run on any CPU would have a `u' before the first digit.
smp_processor_id() can be used in a preempt-enabled region as long as the task is bound to a single CPU, which is the case here. If it could run on an arbitrary CPU, then that is the real problem and the one we should seek to resolve.
And it is not only this smp_processor_id() that must not be migrated to another CPU: refresh_cpu_vm_stats() would also access the wrong per-CPU variables. Not to mention that other code relies on the fact that such a worker runs on one specific CPU only.
Therefore revert that commit; instead, we should look into what broke the affinity mask of the kworker.
Link: http://lkml.kernel.org/r/20180504104451.20278-1-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Steven J. Hill steven.hill@cavium.com Cc: Tejun Heo htejun@gmail.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/vmstat.c | 2 -- 1 file changed, 2 deletions(-)
--- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1770,11 +1770,9 @@ static void vmstat_update(struct work_st * to occur in the future. Keep on running the * update worker thread. */ - preempt_disable(); queue_delayed_work_on(smp_processor_id(), mm_percpu_wq, this_cpu_ptr(&vmstat_work), round_jiffies_relative(sysctl_stat_interval)); - preempt_enable(); } }
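For context (a sketch of the existing queuing, not new code, written in the style of vmstat_shepherd() in mm/vmstat.c): each CPU's vmstat work is queued on that specific CPU, so the handler always runs on a CPU-bound kworker and the bare smp_processor_id() is legitimate:

	/* The work item only ever runs on 'cpu', so vmstat_update() may use
	 * smp_processor_id() without disabling preemption. */
	queue_delayed_work_on(cpu, mm_percpu_wq,
			      &per_cpu(vmstat_work, cpu), 0);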
On 10 July 2018 at 23:54, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.55 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Jul 12 18:24:36 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.55-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.14.55-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 5c888f221d45895399fe36b2e0c0015a676320eb git describe: v4.14.54-54-g5c888f221d45 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.54-54...
No regressions (compared to build v4.14.54-51-g0a7132162c00)
Ran 16542 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On 07/10/2018 11:24 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.55 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Jul 12 18:24:36 UTC 2018. Anything received after that time might be too late.
Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 160 pass: 160 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
On 07/10/2018 12:24 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.55 release. There are 53 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Jul 12 18:24:36 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.55-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org