This is the start of the stable review cycle for the 4.14.198 release. There are 12 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 13 Sep 2020 12:24:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.198-rc...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 4.14.198-rc1

Jakub Kicinski <kuba@kernel.org>
    net: disable netpoll on fresh napis

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    tipc: fix shutdown() of connectionless socket

Xin Long <lucien.xin@gmail.com>
    sctp: not disable bh in the whole sctp_get_port_local()

Kamil Lorenc <kamil@re-ws.pl>
    net: usb: dm9601: Add USB ID of Keenetic Plus DSL

Paul Moore <paul@paul-moore.com>
    netlabel: fix problems with mapping removal

Jakub Kicinski <kuba@kernel.org>
    bnxt: don't enable NAPI until rings are ready

Alex Williamson <alex.williamson@redhat.com>
    vfio/pci: Fix SR-IOV VF handling with MMIO blocking

Alex Williamson <alex.williamson@redhat.com>
    vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

Alex Williamson <alex.williamson@redhat.com>
    vfio-pci: Fault mmaps to enable vma tracking

Alex Williamson <alex.williamson@redhat.com>
    vfio/type1: Support faulting PFNMAP vmas

Jens Axboe <axboe@kernel.dk>
    block: ensure bdi->io_pages is always initialized

Takashi Sakamoto <o-takashi@sakamocchi.jp>
    ALSA: firewire-tascam: exclude Tascam FE-8 from detection
-------------
Diffstat:
 Makefile                                  |   4 +-
 block/blk-core.c                          |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   9 +-
 drivers/net/usb/dm9601.c                  |   4 +
 drivers/vfio/pci/vfio_pci.c               | 352 +++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_config.c        |  51 ++++-
 drivers/vfio/pci/vfio_pci_intrs.c         |  14 ++
 drivers/vfio/pci/vfio_pci_private.h       |  16 ++
 drivers/vfio/pci/vfio_pci_rdwr.c          |  29 ++-
 drivers/vfio/vfio_iommu_type1.c           |  36 ++-
 net/core/dev.c                            |   3 +-
 net/core/netpoll.c                        |   2 +-
 net/netlabel/netlabel_domainhash.c        |  59 ++---
 net/sctp/socket.c                         |  16 +-
 net/tipc/socket.c                         |   9 +-
 sound/firewire/tascam/tascam.c            |  30 ++-
 16 files changed, 539 insertions(+), 97 deletions(-)
From: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Tascam FE-8 is known to support communication by asynchronous transactions only. That support can be implemented in a userspace application, and the snd-firewire-ctl-services project already provides it. However, the ALSA firewire-tascam driver is still bound to the model.

This commit changes the device entries so that the model is excluded. In commit 53b3ffee7885 ("ALSA: firewire-tascam: change device probing processing"), I addressed the concern that the version field in the configuration ROM might differ depending on the installed firmware. However, as far as I have checked, the version number is fixed, so it is safe to add the version number back to the modalias.
Fixes: 53b3ffee7885 ("ALSA: firewire-tascam: change device probing processing")
Cc: <stable@vger.kernel.org> # 4.4+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20200823075537.56255-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/firewire/tascam/tascam.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/sound/firewire/tascam/tascam.c b/sound/firewire/tascam/tascam.c
index d3fdc463a884e..1e61cdce28952 100644
--- a/sound/firewire/tascam/tascam.c
+++ b/sound/firewire/tascam/tascam.c
@@ -225,11 +225,39 @@ static void snd_tscm_remove(struct fw_unit *unit)
 }
 
 static const struct ieee1394_device_id snd_tscm_id_table[] = {
+	// Tascam, FW-1884.
 	{
 		.match_flags = IEEE1394_MATCH_VENDOR_ID |
-			       IEEE1394_MATCH_SPECIFIER_ID,
+			       IEEE1394_MATCH_SPECIFIER_ID |
+			       IEEE1394_MATCH_VERSION,
 		.vendor_id = 0x00022e,
 		.specifier_id = 0x00022e,
+		.version = 0x800000,
+	},
+	// Tascam, FE-8 (.version = 0x800001)
+	// This kernel module doesn't support FE-8 because the most of features
+	// can be implemented in userspace without any specific support of this
+	// module.
+	//
+	// .version = 0x800002 is unknown.
+	//
+	// Tascam, FW-1082.
+	{
+		.match_flags = IEEE1394_MATCH_VENDOR_ID |
+			       IEEE1394_MATCH_SPECIFIER_ID |
+			       IEEE1394_MATCH_VERSION,
+		.vendor_id = 0x00022e,
+		.specifier_id = 0x00022e,
+		.version = 0x800003,
+	},
+	// Tascam, FW-1804.
+	{
+		.match_flags = IEEE1394_MATCH_VENDOR_ID |
+			       IEEE1394_MATCH_SPECIFIER_ID |
+			       IEEE1394_MATCH_VERSION,
+		.vendor_id = 0x00022e,
+		.specifier_id = 0x00022e,
+		.version = 0x800004,
 	},
 	/* FE-08 requires reverse-engineering because it just has faders. */
 	{}
From: Jens Axboe <axboe@kernel.dk>
[ Upstream commit de1b0ee490eafdf65fac9eef9925391a8369f2dc ]
If a driver leaves the limit settings as the defaults, then we don't initialize bdi->io_pages. This means that file systems may need to work around bdi->io_pages == 0, which is somewhat messy.
Initialize the default value just like we do for ->ra_pages.
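As a quick sanity check on the value being assigned, here is the arithmetic spelled out (a sketch, assuming VM_MAX_READAHEAD of 128 kbytes and 4 KiB pages, as in v4.14-era kernels):

	/* Sketch: the same computation used for ->ra_pages above. */
	#define VM_MAX_READAHEAD	128	/* kbytes */
	#define PAGE_SIZE		4096

	/* (128 * 1024) / 4096 == 32 pages of read-ahead by default */
	unsigned long io_pages = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE;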
Cc: stable@vger.kernel.org
Fixes: 9491ae4aade6 ("mm: don't cap request size based on read-ahead setting")
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/blk-core.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index 4e04c79aa2c27..2407c898ba7d8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -852,6 +852,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 	q->backing_dev_info->ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
+	q->backing_dev_info->io_pages =
+			(VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->backing_dev_info->name = "block";
 	q->node = node_id;
From: Alex Williamson <alex.williamson@redhat.com>
commit 41311242221e3482b20bfed10fa4d9db98d87016 upstream.
With conversion to follow_pfn(), DMA mapping a PFNMAP range depends on the range being faulted into the vma. Add support to manually provide that, in the same way as done on KVM with hva_to_pfn_remapped().
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Ajay: Regenerated the patch for v4.14]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/vfio/vfio_iommu_type1.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -336,6 +336,32 @@ static int put_pfn(unsigned long pfn, in
 	return 0;
 }
 
+static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
+			    unsigned long vaddr, unsigned long *pfn,
+			    bool write_fault)
+{
+	int ret;
+
+	ret = follow_pfn(vma, vaddr, pfn);
+	if (ret) {
+		bool unlocked = false;
+
+		ret = fixup_user_fault(NULL, mm, vaddr,
+				       FAULT_FLAG_REMOTE |
+				       (write_fault ? FAULT_FLAG_WRITE : 0),
+				       &unlocked);
+		if (unlocked)
+			return -EAGAIN;
+
+		if (ret)
+			return ret;
+
+		ret = follow_pfn(vma, vaddr, pfn);
+	}
+
+	return ret;
+}
+
 static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 			 int prot, unsigned long *pfn)
 {
@@ -375,12 +401,16 @@ static int vaddr_get_pfn(struct mm_struc
 
 	down_read(&mm->mmap_sem);
 
+retry:
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		if (!follow_pfn(vma, vaddr, pfn) &&
-		    is_invalid_reserved_pfn(*pfn))
-			ret = 0;
+		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+		if (ret == -EAGAIN)
+			goto retry;
+
+		if (!ret && !is_invalid_reserved_pfn(*pfn))
+			ret = -EFAULT;
 	}
 
 	up_read(&mm->mmap_sem);
From: Alex Williamson <alex.williamson@redhat.com>
commit 11c4cd07ba111a09f49625f9e4c851d83daf0a22 upstream.
Rather than calling remap_pfn_range() when a region is mmap'd, setup a vm_ops handler to support dynamic faulting of the range on access. This allows us to manage a list of vmas actively mapping the area that we can later use to invalidate those mappings. The open callback invalidates the vma range so that all tracking is inserted in the fault handler and removed in the close handler.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Ajay: Regenerated the patch for v4.14]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/vfio/pci/vfio_pci.c         | 76 +++++++++++++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_private.h |  7 +++
 2 files changed, 81 insertions(+), 2 deletions(-)
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1120,6 +1120,70 @@ static ssize_t vfio_pci_write(void *devi
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
+static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
+			    struct vm_area_struct *vma)
+{
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
+	if (!mmap_vma)
+		return -ENOMEM;
+
+	mmap_vma->vma = vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_add(&mmap_vma->vma_next, &vdev->vma_list);
+	mutex_unlock(&vdev->vma_lock);
+
+	return 0;
+}
+
+/*
+ * Zap mmaps on open so that we can fault them in on access and therefore
+ * our vma_list only tracks mappings accessed since last zap.
+ */
+static void vfio_pci_mmap_open(struct vm_area_struct *vma)
+{
+	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+}
+
+static void vfio_pci_mmap_close(struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
+		if (mmap_vma->vma == vma) {
+			list_del(&mmap_vma->vma_next);
+			kfree(mmap_vma);
+			break;
+		}
+	}
+	mutex_unlock(&vdev->vma_lock);
+}
+
+static int vfio_pci_mmap_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+
+	if (vfio_pci_add_vma(vdev, vma))
+		return VM_FAULT_OOM;
+
+	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
+		return VM_FAULT_SIGBUS;
+
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct vfio_pci_mmap_ops = {
+	.open = vfio_pci_mmap_open,
+	.close = vfio_pci_mmap_close,
+	.fault = vfio_pci_mmap_fault,
+};
+
 static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -1185,8 +1249,14 @@ static int vfio_pci_mmap(void *device_da
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
-	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       req_len, vma->vm_page_prot);
+	/*
+	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
+	 * change vm_flags within the fault handler.  Set them now.
+	 */
+	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_ops = &vfio_pci_mmap_ops;
+
+	return 0;
 }
 
 static void vfio_pci_request(void *device_data, unsigned int count)
@@ -1243,6 +1313,8 @@ static int vfio_pci_probe(struct pci_dev
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	mutex_init(&vdev->igate);
 	spin_lock_init(&vdev->irqlock);
+	mutex_init(&vdev->vma_lock);
+	INIT_LIST_HEAD(&vdev->vma_list);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret) {
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -63,6 +63,11 @@ struct vfio_pci_dummy_resource {
 	struct list_head	res_next;
 };
 
+struct vfio_pci_mmap_vma {
+	struct vm_area_struct	*vma;
+	struct list_head	vma_next;
+};
+
 struct vfio_pci_device {
 	struct pci_dev		*pdev;
 	void __iomem		*barmap[PCI_STD_RESOURCE_END + 1];
@@ -95,6 +100,8 @@ struct vfio_pci_device {
 	struct eventfd_ctx	*err_trigger;
 	struct eventfd_ctx	*req_trigger;
 	struct list_head	dummy_resources_list;
+	struct mutex		vma_lock;
+	struct list_head	vma_list;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
From: Alex Williamson <alex.williamson@redhat.com>
commit abafbc551fddede3e0a08dee1dcde08fc0eb8476 upstream.
Accessing the disabled memory space of a PCI device would typically result in a master abort response on conventional PCI, or an unsupported request on PCI express. The user would generally see these as a -1 response for the read return data and the write would be silently discarded, possibly with an uncorrected, non-fatal AER error triggered on the host. Some systems however take it upon themselves to bring down the entire system when they see something that might indicate a loss of data, such as this discarded write to a disabled memory space.
To avoid this, we want to try to block the user from accessing memory spaces while they're disabled. We start with a semaphore around the memory enable bit, where writers modify the memory enable state and must be serialized, while readers make use of the memory region and can access in parallel. Writers include both direct manipulation via the command register, as well as any reset path where the internal mechanics of the reset may both explicitly and implicitly disable memory access, and manipulation of the MSI-X configuration, where the MSI-X vector table resides in MMIO space of the device. Readers include the read and write file ops to access the vfio device fd offsets as well as memory mapped access. In the latter case, we make use of our new vma list support to zap, or invalidate, those memory mappings in order to force them to be faulted back in on access.
Our semaphore usage will stall user access to MMIO spaces across internal operations like reset, but the user might experience new behavior when trying to access the MMIO space while disabled via the PCI command register. Access via read or write while disabled will return -EIO and access via memory maps will result in a SIGBUS. This is expected to be compatible with known use cases and potentially provides better error handling capabilities than present in the hardware, while avoiding the more readily accessible and severe platform error responses that might otherwise occur.
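In outline, the locking discipline described above reduces to the pattern below (a simplified sketch, not the literal driver code; the helper names match the patch that follows):

	/* Reader side: any path that touches device MMIO takes the
	 * semaphore shared and backs off if decode is disabled. */
	down_read(&vdev->memory_lock);
	if (!__vfio_pci_memory_enabled(vdev)) {
		up_read(&vdev->memory_lock);
		return -EIO;	/* mmap paths signal SIGBUS instead */
	}
	/* ... perform the BAR read/write or fault in the mapping ... */
	up_read(&vdev->memory_lock);

	/* Writer side: anything that may disable memory decode (command
	 * register write, FLR, hot reset) zaps user mappings and takes
	 * the semaphore exclusive, forcing mmap users back through the
	 * fault handler once decode is re-enabled. */
	vfio_pci_zap_and_down_write_memory_lock(vdev);
	/* ... change the enable state or perform the reset ... */
	up_write(&vdev->memory_lock);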
Fixes: CVE-2020-12888
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Ajay: Regenerated the patch for v4.14]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/vfio/pci/vfio_pci.c         | 294 +++++++++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_config.c  |  36 +++-
 drivers/vfio/pci/vfio_pci_intrs.c   |  14 +
 drivers/vfio/pci/vfio_pci_private.h |   9 +
 drivers/vfio/pci/vfio_pci_rdwr.c    |  29 ++-
 5 files changed, 334 insertions(+), 48 deletions(-)
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -29,6 +29,7 @@
 #include <linux/vfio.h>
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
+#include <linux/sched/mm.h>
 
 #include "vfio_pci_private.h"
 
@@ -181,6 +182,7 @@ no_mmap:
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 static void vfio_pci_disable(struct vfio_pci_device *vdev);
+static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -644,6 +646,12 @@ int vfio_pci_register_dev_region(struct
 	return 0;
 }
 
+struct vfio_devices {
+	struct vfio_device	**devices;
+	int			cur_index;
+	int			max_index;
+};
+
 static long vfio_pci_ioctl(void *device_data,
			   unsigned int cmd, unsigned long arg)
 {
@@ -717,7 +725,7 @@ static long vfio_pci_ioctl(void *device_
 		{
 			void __iomem *io;
 			size_t size;
-			u16 orig_cmd;
+			u16 cmd;
 
 			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
 			info.flags = 0;
@@ -737,10 +745,7 @@ static long vfio_pci_ioctl(void *device_
 			 * Is it really there? Enable memory decode for
 			 * implicit access in pci_map_rom().
 			 */
-			pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd);
-			pci_write_config_word(pdev, PCI_COMMAND,
-					      orig_cmd | PCI_COMMAND_MEMORY);
-
+			cmd = vfio_pci_memory_lock_and_enable(vdev);
 			io = pci_map_rom(pdev, &size);
 			if (io) {
 				info.flags = VFIO_REGION_INFO_FLAG_READ;
@@ -748,8 +753,8 @@ static long vfio_pci_ioctl(void *device_
 			} else {
 				info.size = 0;
 			}
+			vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-			pci_write_config_word(pdev, PCI_COMMAND, orig_cmd);
 			break;
 		}
 		case VFIO_PCI_VGA_REGION_INDEX:
@@ -885,8 +890,16 @@ static long vfio_pci_ioctl(void *device_
 		return ret;
 
 	} else if (cmd == VFIO_DEVICE_RESET) {
-		return vdev->reset_works ?
-			pci_try_reset_function(vdev->pdev) : -EINVAL;
+		int ret;
+
+		if (!vdev->reset_works)
+			return -EINVAL;
+
+		vfio_pci_zap_and_down_write_memory_lock(vdev);
+		ret = pci_try_reset_function(vdev->pdev);
+		up_write(&vdev->memory_lock);
+
+		return ret;
 
 	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
 		struct vfio_pci_hot_reset_info hdr;
@@ -966,8 +979,9 @@ reset_info_exit:
 		int32_t *group_fds;
 		struct vfio_pci_group_entry *groups;
 		struct vfio_pci_group_info info;
+		struct vfio_devices devs = { .cur_index = 0 };
 		bool slot = false;
-		int i, count = 0, ret = 0;
+		int i, group_idx, mem_idx = 0, count = 0, ret = 0;
 
 		minsz = offsetofend(struct vfio_pci_hot_reset, count);
 
@@ -1019,9 +1033,9 @@ reset_info_exit:
 		 * user interface and store the group and iommu ID.  This
 		 * ensures the group is held across the reset.
 		 */
-		for (i = 0; i < hdr.count; i++) {
+		for (group_idx = 0; group_idx < hdr.count; group_idx++) {
 			struct vfio_group *group;
-			struct fd f = fdget(group_fds[i]);
+			struct fd f = fdget(group_fds[group_idx]);
 			if (!f.file) {
 				ret = -EBADF;
 				break;
			}
@@ -1034,8 +1048,9 @@ reset_info_exit:
 				break;
 			}
 
-			groups[i].group = group;
-			groups[i].id = vfio_external_user_iommu_id(group);
+			groups[group_idx].group = group;
+			groups[group_idx].id =
+					vfio_external_user_iommu_id(group);
 		}
 
 		kfree(group_fds);
@@ -1054,14 +1069,65 @@ reset_info_exit:
 		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
 						    vfio_pci_validate_devs,
 						    &info, slot);
-		if (!ret)
-			/* User has access, do the reset */
-			ret = slot ? pci_try_reset_slot(vdev->pdev->slot) :
-				     pci_try_reset_bus(vdev->pdev->bus);
+
+		if (ret)
+			goto hot_reset_release;
+
+		devs.max_index = count;
+		devs.devices = kcalloc(count, sizeof(struct vfio_device *),
+				       GFP_KERNEL);
+		if (!devs.devices) {
+			ret = -ENOMEM;
+			goto hot_reset_release;
+		}
+
+		/*
+		 * We need to get memory_lock for each device, but devices
+		 * can share mmap_sem, therefore we need to zap and hold
+		 * the vma_lock for each device, and only then get each
+		 * memory_lock.
+		 */
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
					    vfio_pci_try_zap_and_vma_lock_cb,
					    &devs, slot);
+		if (ret)
+			goto hot_reset_release;
+
+		for (; mem_idx < devs.cur_index; mem_idx++) {
+			struct vfio_pci_device *tmp;
+
+			tmp = vfio_device_data(devs.devices[mem_idx]);
+
+			ret = down_write_trylock(&tmp->memory_lock);
+			if (!ret) {
+				ret = -EBUSY;
+				goto hot_reset_release;
+			}
+			mutex_unlock(&tmp->vma_lock);
+		}
+
+		/* User has access, do the reset */
+		ret = slot ? pci_try_reset_slot(vdev->pdev->slot) :
+			     pci_try_reset_bus(vdev->pdev->bus);
 
 hot_reset_release:
-		for (i--; i >= 0; i--)
-			vfio_group_put_external_user(groups[i].group);
+		for (i = 0; i < devs.cur_index; i++) {
+			struct vfio_device *device;
+			struct vfio_pci_device *tmp;
+
+			device = devs.devices[i];
+			tmp = vfio_device_data(device);
+
+			if (i < mem_idx)
+				up_write(&tmp->memory_lock);
+			else
+				mutex_unlock(&tmp->vma_lock);
+			vfio_device_put(device);
+		}
+		kfree(devs.devices);
+
+		for (group_idx--; group_idx >= 0; group_idx--)
+			vfio_group_put_external_user(groups[group_idx].group);
 
 		kfree(groups);
 		return ret;
@@ -1120,8 +1186,126 @@ static ssize_t vfio_pci_write(void *devi
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
-static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
-			    struct vm_area_struct *vma)
+/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
+static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+{
+	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
+
+	/*
+	 * Lock ordering:
+	 * vma_lock is nested under mmap_sem for vm_ops callback paths.
+	 * The memory_lock semaphore is used by both code paths calling
+	 * into this function to zap vmas and the vm_ops.fault callback
+	 * to protect the memory enable state of the device.
+	 *
+	 * When zapping vmas we need to maintain the mmap_sem => vma_lock
+	 * ordering, which requires using vma_lock to walk vma_list to
+	 * acquire an mm, then dropping vma_lock to get the mmap_sem and
+	 * reacquiring vma_lock.  This logic is derived from similar
+	 * requirements in uverbs_user_mmap_disassociate().
+	 *
+	 * mmap_sem must always be the top-level lock when it is taken.
+	 * Therefore we can only hold the memory_lock write lock when
+	 * vma_list is empty, as we'd need to take mmap_sem to clear
+	 * entries.  vma_list can only be guaranteed empty when holding
+	 * vma_lock, thus memory_lock is nested under vma_lock.
+	 *
+	 * This enables the vm_ops.fault callback to acquire vma_lock,
+	 * followed by memory_lock read lock, while already holding
+	 * mmap_sem without risk of deadlock.
+	 */
+	while (1) {
+		struct mm_struct *mm = NULL;
+
+		if (try) {
+			if (!mutex_trylock(&vdev->vma_lock))
+				return 0;
+		} else {
+			mutex_lock(&vdev->vma_lock);
+		}
+		while (!list_empty(&vdev->vma_list)) {
+			mmap_vma = list_first_entry(&vdev->vma_list,
+						    struct vfio_pci_mmap_vma,
+						    vma_next);
+			mm = mmap_vma->vma->vm_mm;
+			if (mmget_not_zero(mm))
+				break;
+
+			list_del(&mmap_vma->vma_next);
+			kfree(mmap_vma);
+			mm = NULL;
+		}
+		if (!mm)
+			return 1;
+		mutex_unlock(&vdev->vma_lock);
+
+		if (try) {
+			if (!down_read_trylock(&mm->mmap_sem)) {
+				mmput(mm);
+				return 0;
+			}
+		} else {
+			down_read(&mm->mmap_sem);
+		}
+		if (mmget_still_valid(mm)) {
+			if (try) {
+				if (!mutex_trylock(&vdev->vma_lock)) {
+					up_read(&mm->mmap_sem);
+					mmput(mm);
+					return 0;
+				}
+			} else {
+				mutex_lock(&vdev->vma_lock);
+			}
+			list_for_each_entry_safe(mmap_vma, tmp,
+						 &vdev->vma_list, vma_next) {
+				struct vm_area_struct *vma = mmap_vma->vma;
+
+				if (vma->vm_mm != mm)
+					continue;
+
+				list_del(&mmap_vma->vma_next);
+				kfree(mmap_vma);
+
+				zap_vma_ptes(vma, vma->vm_start,
+					     vma->vm_end - vma->vm_start);
+			}
+			mutex_unlock(&vdev->vma_lock);
+		}
+		up_read(&mm->mmap_sem);
+		mmput(mm);
+	}
+}
+
+void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
+{
+	vfio_pci_zap_and_vma_lock(vdev, false);
+	down_write(&vdev->memory_lock);
+	mutex_unlock(&vdev->vma_lock);
+}
+
+u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
+{
+	u16 cmd;
+
+	down_write(&vdev->memory_lock);
+	pci_read_config_word(vdev->pdev, PCI_COMMAND, &cmd);
+	if (!(cmd & PCI_COMMAND_MEMORY))
+		pci_write_config_word(vdev->pdev, PCI_COMMAND,
+				      cmd | PCI_COMMAND_MEMORY);
+
+	return cmd;
+}
+
+void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
+{
+	pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
+	up_write(&vdev->memory_lock);
+}
+
+/* Caller holds vma_lock */
+static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
+			      struct vm_area_struct *vma)
 {
 	struct vfio_pci_mmap_vma *mmap_vma;
 
@@ -1130,10 +1314,7 @@ static int vfio_pci_add_vma(struct vfio_
 		return -ENOMEM;
 
 	mmap_vma->vma = vma;
-
-	mutex_lock(&vdev->vma_lock);
 	list_add(&mmap_vma->vma_next, &vdev->vma_list);
-	mutex_unlock(&vdev->vma_lock);
 
 	return 0;
 }
@@ -1167,15 +1348,32 @@ static int vfio_pci_mmap_fault(struct vm
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_device *vdev = vma->vm_private_data;
+	int ret = VM_FAULT_NOPAGE;
+
+	mutex_lock(&vdev->vma_lock);
+	down_read(&vdev->memory_lock);
+
+	if (!__vfio_pci_memory_enabled(vdev)) {
+		ret = VM_FAULT_SIGBUS;
+		mutex_unlock(&vdev->vma_lock);
+		goto up_out;
+	}
+
+	if (__vfio_pci_add_vma(vdev, vma)) {
+		ret = VM_FAULT_OOM;
+		mutex_unlock(&vdev->vma_lock);
+		goto up_out;
+	}
 
-	if (vfio_pci_add_vma(vdev, vma))
-		return VM_FAULT_OOM;
+	mutex_unlock(&vdev->vma_lock);
 
 	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
-		return VM_FAULT_SIGBUS;
+		ret = VM_FAULT_SIGBUS;
 
-	return VM_FAULT_NOPAGE;
+up_out:
+	up_read(&vdev->memory_lock);
+	return ret;
 }
 
 static const struct vm_operations_struct vfio_pci_mmap_ops = {
@@ -1315,6 +1513,7 @@ static int vfio_pci_probe(struct pci_dev
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->vma_lock);
 	INIT_LIST_HEAD(&vdev->vma_list);
+	init_rwsem(&vdev->memory_lock);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret) {
@@ -1409,12 +1608,6 @@ static struct pci_driver vfio_pci_driver
 	.err_handler = &vfio_err_handlers,
 };
 
-struct vfio_devices {
-	struct vfio_device **devices;
-	int cur_index;
-	int max_index;
-};
-
 static int vfio_pci_get_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
@@ -1431,6 +1624,39 @@ static int vfio_pci_get_devs(struct pci_
 		vfio_device_put(device);
 		return -EBUSY;
 	}
+
+	devs->devices[devs->cur_index++] = device;
+	return 0;
+}
+
+static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
+{
+	struct vfio_devices *devs = data;
+	struct vfio_device *device;
+	struct vfio_pci_device *vdev;
+
+	if (devs->cur_index == devs->max_index)
+		return -ENOSPC;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		return -EINVAL;
+
+	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
+
+	vdev = vfio_device_data(device);
+
+	/*
+	 * Locking multiple devices is prone to deadlock, runaway and
+	 * unwind if we hit contention.
+	 */
+	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
 
 	devs->devices[devs->cur_index++] = device;
 	return 0;
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -398,6 +398,14 @@ static inline void p_setd(struct perm_bi
 	*(__le32 *)(&p->write[off]) = cpu_to_le32(write);
 }
 
+/* Caller should hold memory_lock semaphore */
+bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev)
+{
+	u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
+
+	return cmd & PCI_COMMAND_MEMORY;
+}
+
 /*
  * Restore the *real* BARs after we detect a FLR or backdoor reset.
  * (backdoor = some device specific technique that we didn't catch)
@@ -558,13 +566,18 @@ static int vfio_basic_config_write(struc
 
 	new_cmd = le32_to_cpu(val);
 
+	phys_io = !!(phys_cmd & PCI_COMMAND_IO);
+	virt_io = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_IO);
+	new_io = !!(new_cmd & PCI_COMMAND_IO);
+
 	phys_mem = !!(phys_cmd & PCI_COMMAND_MEMORY);
 	virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 	new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-	phys_io = !!(phys_cmd & PCI_COMMAND_IO);
-	virt_io = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_IO);
-	new_io = !!(new_cmd & PCI_COMMAND_IO);
+	if (!new_mem)
+		vfio_pci_zap_and_down_write_memory_lock(vdev);
+	else
+		down_write(&vdev->memory_lock);
 
 	/*
 	 * If the user is writing mem/io enable (new_mem/io) and we
@@ -581,8 +594,11 @@ static int vfio_basic_config_write(struc
 	}
 
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
-	if (count < 0)
+	if (count < 0) {
+		if (offset == PCI_COMMAND)
+			up_write(&vdev->memory_lock);
 		return count;
+	}
 
 	/*
	 * Save current memory/io enable bits in vconfig to allow for
@@ -593,6 +609,8 @@ static int vfio_basic_config_write(struc
 
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
+
+		up_write(&vdev->memory_lock);
 	}
 
 	/* Emulate INTx disable */
@@ -830,8 +848,11 @@ static int vfio_exp_config_write(struct
 						  pos - offset + PCI_EXP_DEVCAP,
 						  &cap);
 
-		if (!ret && (cap & PCI_EXP_DEVCAP_FLR))
+		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
+			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
+			up_write(&vdev->memory_lock);
+		}
 	}
 
 	/*
@@ -909,8 +930,11 @@ static int vfio_af_config_write(struct v
 						 pos - offset + PCI_AF_CAP,
 						 &cap);
 
-		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP))
+		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
+			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
+			up_write(&vdev->memory_lock);
+		}
 	}
 
 	return count;
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -252,6 +252,7 @@ static int vfio_msi_enable(struct vfio_p
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
 	int ret;
+	u16 cmd;
 
 	if (!is_irq_none(vdev))
 		return -EINVAL;
@@ -261,13 +262,16 @@ static int vfio_msi_enable(struct vfio_p
 		return -ENOMEM;
 
 	/* return the number of supported vectors if we can't get all: */
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
 	if (ret < nvec) {
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
+		vfio_pci_memory_unlock_and_restore(vdev, cmd);
 		kfree(vdev->ctx);
 		return ret;
 	}
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
 	vdev->num_ctx = nvec;
 	vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
@@ -290,6 +294,7 @@ static int vfio_msi_set_vector_signal(st
 	struct pci_dev *pdev = vdev->pdev;
 	struct eventfd_ctx *trigger;
 	int irq, ret;
+	u16 cmd;
 
 	if (vector < 0 || vector >= vdev->num_ctx)
 		return -EINVAL;
@@ -298,7 +303,11 @@ static int vfio_msi_set_vector_signal(st
 
 	if (vdev->ctx[vector].trigger) {
 		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
+
+		cmd = vfio_pci_memory_lock_and_enable(vdev);
 		free_irq(irq, vdev->ctx[vector].trigger);
+		vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -326,6 +335,7 @@ static int vfio_msi_set_vector_signal(st
 	 * such a reset it would be unsuccessful. To avoid this, restore the
 	 * cached value of the message prior to enabling.
	 */
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	if (msix) {
 		struct msi_msg msg;
 
@@ -335,6 +345,7 @@ static int vfio_msi_set_vector_signal(st
 
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 	if (ret) {
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(trigger);
@@ -379,6 +390,7 @@ static void vfio_msi_disable(struct vfio
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int i;
+	u16 cmd;
 
 	for (i = 0; i < vdev->num_ctx; i++) {
 		vfio_virqfd_disable(&vdev->ctx[i].unmask);
@@ -387,7 +399,9 @@ static void vfio_msi_disable(struct vfio
 
 	vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
 
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	pci_free_irq_vectors(pdev);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
 	/*
 	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -102,6 +102,7 @@ struct vfio_pci_device {
 	struct list_head	dummy_resources_list;
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
+	struct rw_semaphore	memory_lock;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
@@ -137,6 +138,14 @@ extern int vfio_pci_register_dev_region(
 					unsigned int type, unsigned int subtype,
 					const struct vfio_pci_regops *ops,
 					size_t size, u32 flags, void *data);
+
+extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
+extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
+						    *vdev);
+extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
+extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
+					       u16 cmd);
+
 #ifdef CONFIG_VFIO_PCI_IGD
 extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
 #else
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -122,6 +122,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_
 	size_t x_start = 0, x_end = 0;
 	resource_size_t end;
 	void __iomem *io;
+	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
 
 	if (pci_resource_start(pdev, bar))
@@ -137,6 +138,14 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_
 
 	count = min(count, (size_t)(end - pos));
 
+	if (res->flags & IORESOURCE_MEM) {
+		down_read(&vdev->memory_lock);
+		if (!__vfio_pci_memory_enabled(vdev)) {
+			up_read(&vdev->memory_lock);
+			return -EIO;
+		}
+	}
+
 	if (bar == PCI_ROM_RESOURCE) {
 		/*
 		 * The ROM can fill less space than the BAR, so we start the
@@ -144,20 +153,21 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_
 		 * filling large ROM BARs much faster.
 		 */
 		io = pci_map_rom(pdev, &x_start);
-		if (!io)
-			return -ENOMEM;
+		if (!io) {
+			done = -ENOMEM;
+			goto out;
+		}
 		x_end = end;
 	} else if (!vdev->barmap[bar]) {
-		int ret;
-
-		ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-		if (ret)
-			return ret;
+		done = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+		if (done)
+			goto out;
 
 		io = pci_iomap(pdev, bar, 0);
 		if (!io) {
 			pci_release_selected_regions(pdev, 1 << bar);
-			return -ENOMEM;
+			done = -ENOMEM;
+			goto out;
 		}
 
 		vdev->barmap[bar] = io;
@@ -176,6 +186,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_
 
 	if (bar == PCI_ROM_RESOURCE)
 		pci_unmap_rom(pdev, io);
+out:
+	if (res->flags & IORESOURCE_MEM)
+		up_read(&vdev->memory_lock);
 
 	return done;
 }
From: Alex Williamson <alex.williamson@redhat.com>
commit ebfa440ce38b7e2e04c3124aa89c8a9f4094cf21 upstream.
SR-IOV VFs do not implement the memory enable bit of the command register, so this bit is not set in config space after pci_enable_device(). This leads to an unintended difference between PF and VF in the hand-off state to the user. We can correct this by setting the initial value of the memory enable bit in our virtualized config space. There's really no need, however, to ever fault a user on a VF, as this would only indicate an error in the user's management of the enable bit, versus a PF where the same access could trigger hardware faults.
Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/vfio/pci/vfio_pci_config.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -401,9 +401,15 @@ static inline void p_setd(struct perm_bi
 /* Caller should hold memory_lock semaphore */
 bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev)
 {
+	struct pci_dev *pdev = vdev->pdev;
 	u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
 
-	return cmd & PCI_COMMAND_MEMORY;
+	/*
+	 * SR-IOV VF memory enable is handled by the MSE bit in the
+	 * PF SR-IOV capability, there's therefore no need to trigger
+	 * faults based on the virtual value.
+	 */
+	return pdev->is_virtfn || (cmd & PCI_COMMAND_MEMORY);
 }
 
 /*
@@ -1732,6 +1738,15 @@ int vfio_config_init(struct vfio_pci_dev
 					vconfig[PCI_INTERRUPT_PIN]);
 
 		vconfig[PCI_INTERRUPT_PIN] = 0; /* Gratuitous for good VFs */
+
+		/*
+		 * VFs do no implement the memory enable bit of the COMMAND
+		 * register therefore we'll not have it set in our initial
+		 * copy of config space after pci_enable_device().  For
+		 * consistency with PFs, set the virtual enable bit here.
+		 */
+		*(__le16 *)&vconfig[PCI_COMMAND] |=
+					cpu_to_le16(PCI_COMMAND_MEMORY);
 	}
 
 	if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
From: Jakub Kicinski <kuba@kernel.org>
commit 96ecdcc992eb7f468b2cf829b0f5408a1fad4668 upstream.
Netpoll can try to poll napi as soon as napi_enable() is called. It crashes trying to access a doorbell which is still NULL:
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S 5.9.0-rc1-00469-g5fd99b5d9950-dirty #26
RIP: 0010:bnxt_poll+0x121/0x1c0
Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02 41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a
 netpoll_poll_dev+0xbd/0x1a0
 __netpoll_send_skb+0x1b2/0x210
 netpoll_send_udp+0x2c9/0x406
 write_ext_msg+0x1d7/0x1f0
 console_unlock+0x23c/0x520
 vprintk_emit+0xe0/0x1d0
 printk+0x58/0x6f
 x86_vector_activate.cold+0xf/0x46
 __irq_domain_activate_irq+0x50/0x80
 __irq_domain_activate_irq+0x32/0x80
 __irq_domain_activate_irq+0x32/0x80
 irq_domain_activate_irq+0x25/0x40
 __setup_irq+0x2d2/0x700
 request_threaded_irq+0xfb/0x160
 __bnxt_open_nic+0x3b1/0x750
 bnxt_open_nic+0x19/0x30
 ethtool_set_channels+0x1ac/0x220
 dev_ethtool+0x11ba/0x2240
 dev_ioctl+0x1cf/0x390
 sock_do_ioctl+0x95/0x130
Reported-by: Rob Sherwood <rsher@fb.com>
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6378,14 +6378,14 @@ static int __bnxt_open_nic(struct bnxt *
 		}
 	}
 
-	bnxt_enable_napi(bp);
-
 	rc = bnxt_init_nic(bp, irq_re_init);
 	if (rc) {
 		netdev_err(bp->dev, "bnxt_init_nic err: %x\n", rc);
-		goto open_err;
+		goto open_err_irq;
 	}
 
+	bnxt_enable_napi(bp);
+
 	if (link_re_init) {
 		mutex_lock(&bp->link_lock);
 		rc = bnxt_update_phy_setting(bp);
@@ -6410,9 +6410,6 @@ static int __bnxt_open_nic(struct bnxt *
 	bnxt_vf_reps_open(bp);
 	return 0;
 
-open_err:
-	bnxt_disable_napi(bp);
-
 open_err_irq:
 	bnxt_del_napi(bp);
From: Paul Moore <paul@paul-moore.com>
[ Upstream commit d3b990b7f327e2afa98006e7666fb8ada8ed8683 ]
This patch fixes two main problems seen when removing NetLabel mappings: memory leaks and potentially extra audit noise.
The memory leaks are caused by not properly free'ing the mapping's address selector struct when free'ing the entire entry as well as not properly cleaning up a temporary mapping entry when adding new address selectors to an existing entry. This patch fixes both these problems such that kmemleak reports no NetLabel associated leaks after running the SELinux test suite.
The potentially extra audit noise was caused by the auditing code in netlbl_domhsh_remove_entry() being called regardless of the entry's validity. If another thread had already marked the entry as invalid, but not removed/free'd it from the list of mappings, then it was possible that an additional mapping removal audit record would be generated. This patch fixes this by returning early from the removal function when the entry was previously marked invalid. This change also had the side benefit of improving the code by decreasing the indentation level of a large chunk of code by one (accounting for most of the diffstat).
Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping")
Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/netlabel/netlabel_domainhash.c | 59 ++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 29 deletions(-)
--- a/net/netlabel/netlabel_domainhash.c
+++ b/net/netlabel/netlabel_domainhash.c
@@ -99,6 +99,7 @@ static void netlbl_domhsh_free_entry(str
 			kfree(netlbl_domhsh_addr6_entry(iter6));
 		}
 #endif /* IPv6 */
+		kfree(ptr->def.addrsel);
 	}
 	kfree(ptr->domain);
 	kfree(ptr);
@@ -550,6 +551,8 @@ int netlbl_domhsh_add(struct netlbl_dom_
 			goto add_return;
 		}
 #endif /* IPv6 */
+		/* cleanup the new entry since we've moved everything over */
+		netlbl_domhsh_free_entry(&entry->rcu);
 	} else
 		ret_val = -EINVAL;
 
@@ -593,6 +596,12 @@ int netlbl_domhsh_remove_entry(struct ne
 {
 	int ret_val = 0;
 	struct audit_buffer *audit_buf;
+	struct netlbl_af4list *iter4;
+	struct netlbl_domaddr4_map *map4;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct netlbl_af6list *iter6;
+	struct netlbl_domaddr6_map *map6;
+#endif /* IPv6 */
 
 	if (entry == NULL)
 		return -ENOENT;
@@ -610,6 +619,9 @@ int netlbl_domhsh_remove_entry(struct ne
 		ret_val = -ENOENT;
 	spin_unlock(&netlbl_domhsh_lock);
 
+	if (ret_val)
+		return ret_val;
+
 	audit_buf = netlbl_audit_start_common(AUDIT_MAC_MAP_DEL, audit_info);
 	if (audit_buf != NULL) {
 		audit_log_format(audit_buf,
@@ -619,40 +631,29 @@ int netlbl_domhsh_remove_entry(struct ne
 		audit_log_end(audit_buf);
 	}
 
-	if (ret_val == 0) {
-		struct netlbl_af4list *iter4;
-		struct netlbl_domaddr4_map *map4;
-#if IS_ENABLED(CONFIG_IPV6)
-		struct netlbl_af6list *iter6;
-		struct netlbl_domaddr6_map *map6;
-#endif /* IPv6 */
-
-		switch (entry->def.type) {
-		case NETLBL_NLTYPE_ADDRSELECT:
-			netlbl_af4list_foreach_rcu(iter4,
-						   &entry->def.addrsel->list4) {
-				map4 = netlbl_domhsh_addr4_entry(iter4);
-				cipso_v4_doi_putdef(map4->def.cipso);
-			}
+	switch (entry->def.type) {
+	case NETLBL_NLTYPE_ADDRSELECT:
+		netlbl_af4list_foreach_rcu(iter4, &entry->def.addrsel->list4) {
+			map4 = netlbl_domhsh_addr4_entry(iter4);
+			cipso_v4_doi_putdef(map4->def.cipso);
+		}
 #if IS_ENABLED(CONFIG_IPV6)
-			netlbl_af6list_foreach_rcu(iter6,
-						   &entry->def.addrsel->list6) {
-				map6 = netlbl_domhsh_addr6_entry(iter6);
-				calipso_doi_putdef(map6->def.calipso);
-			}
+		netlbl_af6list_foreach_rcu(iter6, &entry->def.addrsel->list6) {
+			map6 = netlbl_domhsh_addr6_entry(iter6);
+			calipso_doi_putdef(map6->def.calipso);
+		}
 #endif /* IPv6 */
-			break;
-		case NETLBL_NLTYPE_CIPSOV4:
-			cipso_v4_doi_putdef(entry->def.cipso);
-			break;
+		break;
+	case NETLBL_NLTYPE_CIPSOV4:
+		cipso_v4_doi_putdef(entry->def.cipso);
+		break;
 #if IS_ENABLED(CONFIG_IPV6)
-		case NETLBL_NLTYPE_CALIPSO:
-			calipso_doi_putdef(entry->def.calipso);
-			break;
+	case NETLBL_NLTYPE_CALIPSO:
+		calipso_doi_putdef(entry->def.calipso);
+		break;
 #endif /* IPv6 */
-		}
-		call_rcu(&entry->rcu, netlbl_domhsh_free_entry);
 	}
+	call_rcu(&entry->rcu, netlbl_domhsh_free_entry);
 
 	return ret_val;
 }
From: Kamil Lorenc <kamil@re-ws.pl>
[ Upstream commit a609d0259183a841621f252e067f40f8cc25d6f6 ]
Keenetic Plus DSL is a xDSL modem that uses dm9620 as its USB interface.
Signed-off-by: Kamil Lorenc <kamil@re-ws.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/usb/dm9601.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -625,6 +625,10 @@ static const struct usb_device_id produc
 	 USB_DEVICE(0x0a46, 0x1269),	/* DM9621A USB to Fast Ethernet Adapter */
 	 .driver_info = (unsigned long)&dm9601_info,
 	},
+	{
+	 USB_DEVICE(0x0586, 0x3427),	/* ZyXEL Keenetic Plus DSL xDSL modem */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
 	{},			// END
 };
From: Xin Long <lucien.xin@gmail.com>
[ Upstream commit 3106ecb43a05dc3e009779764b9da245a5d082de ]
With bh disabled across the whole of sctp_get_port_local(), when snum == 0 and too many ports have been used, the do-while loop will hold the cpu for a long time and cause the cpu to get stuck:
[ ] watchdog: BUG: soft lockup - CPU#11 stuck for 22s!
[ ] RIP: 0010:native_queued_spin_lock_slowpath+0x4de/0x940
[ ] Call Trace:
[ ]  _raw_spin_lock+0xc1/0xd0
[ ]  sctp_get_port_local+0x527/0x650 [sctp]
[ ]  sctp_do_bind+0x208/0x5e0 [sctp]
[ ]  sctp_autobind+0x165/0x1e0 [sctp]
[ ]  sctp_connect_new_asoc+0x355/0x480 [sctp]
[ ]  __sctp_connect+0x360/0xb10 [sctp]
There's no need to disable bh across the whole of sctp_get_port_local(). So fix this cpu stuck by removing the local_bh_disable() called at the beginning, and using spin_lock_bh() instead.

The same thing was actually done for inet_csk_get_port() in commit ea8add2b1903 ("tcp/dccp: better use of ephemeral ports in bind()").
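For context, spin_lock_bh() is simply local_bh_disable() followed by spin_lock(), so the rewrite keeps bottom halves disabled only while a hash chain lock is actually held (a minimal before/after sketch of the pattern, not the full function):

	/* Before: bh disabled across the entire port search. */
	local_bh_disable();
	...
	spin_lock(&head->lock);
	...
	spin_unlock(&head->lock);
	...
	local_bh_enable();

	/* After: bh disabled only per chain walk, so cond_resched()
	 * between iterations can actually yield the cpu. */
	spin_lock_bh(&head->lock);
	...
	spin_unlock_bh(&head->lock);
	cond_resched();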
Thanks to Marcelo for pointing the buggy code out.
v1->v2: - use cond_resched() to yield cpu to other tasks if needed, as Eric noticed.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sctp/socket.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7086,8 +7086,6 @@ static long sctp_get_port_local(struct s
 
 	pr_debug("%s: begins, snum:%d\n", __func__, snum);
 
-	local_bh_disable();
-
 	if (snum == 0) {
 		/* Search for an available port. */
 		int low, high, remaining, index;
@@ -7106,20 +7104,21 @@ static long sctp_get_port_local(struct s
 				continue;
 			index = sctp_phashfn(sock_net(sk), rover);
 			head = &sctp_port_hashtable[index];
-			spin_lock(&head->lock);
+			spin_lock_bh(&head->lock);
 			sctp_for_each_hentry(pp, &head->chain)
 				if ((pp->port == rover) &&
 				    net_eq(sock_net(sk), pp->net))
 					goto next;
 			break;
 		next:
-			spin_unlock(&head->lock);
+			spin_unlock_bh(&head->lock);
+			cond_resched();
 		} while (--remaining > 0);
 
 		/* Exhausted local port range during search? */
 		ret = 1;
 		if (remaining <= 0)
-			goto fail;
+			return ret;
 
 		/* OK, here is the one we will use.  HEAD (the port
 		 * hash table list entry) is non-NULL and we hold it's
@@ -7134,7 +7133,7 @@ static long sctp_get_port_local(struct s
 		 * port iterator, pp being NULL.
 		 */
 		head = &sctp_port_hashtable[sctp_phashfn(sock_net(sk), snum)];
-		spin_lock(&head->lock);
+		spin_lock_bh(&head->lock);
 		sctp_for_each_hentry(pp, &head->chain) {
 			if ((pp->port == snum) && net_eq(pp->net, sock_net(sk)))
 				goto pp_found;
@@ -7218,10 +7217,7 @@ success:
 	ret = 0;
 
 fail_unlock:
-	spin_unlock(&head->lock);
-
-fail:
-	local_bh_enable();
+	spin_unlock_bh(&head->lock);
 
 	return ret;
 }
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Upstream commit 2a63866c8b51a3f72cea388dfac259d0e14c4ba6 ]
syzbot is reporting hung task at nbd_ioctl() [1], for there are two problems regarding TIPC's connectionless socket's shutdown() operation.
----------
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const int fd = open("/dev/nbd0", 3);

	alarm(5);
	ioctl(fd, NBD_SET_SOCK, socket(PF_TIPC, SOCK_DGRAM, 0));
	ioctl(fd, NBD_DO_IT, 0); /* To be interrupted by SIGALRM. */
	return 0;
}
----------
One problem is that wait_for_completion() from flush_workqueue() from nbd_start_device_ioctl() from nbd_ioctl() cannot be completed when nbd_start_device_ioctl() received a signal at wait_event_interruptible(), for tipc_shutdown() from kernel_sock_shutdown(SHUT_RDWR) from nbd_mark_nsock_dead() from sock_shutdown() from nbd_start_device_ioctl() is failing to wake up a WQ thread sleeping at wait_woken() from tipc_wait_for_rcvmsg() from sock_recvmsg() from sock_xmit() from nbd_read_stat() from recv_work() scheduled by nbd_start_device() from nbd_start_device_ioctl(). Fix this problem by always invoking sk->sk_state_change() (like inet_shutdown() does) when tipc_shutdown() is called.
The other problem is that tipc_wait_for_rcvmsg() cannot return when tipc_shutdown() is called, for tipc_shutdown() sets sk->sk_shutdown to SEND_SHUTDOWN (despite "how" is SHUT_RDWR) while tipc_wait_for_rcvmsg() needs sk->sk_shutdown set to RCV_SHUTDOWN or SHUTDOWN_MASK. Fix this problem by setting sk->sk_shutdown to SHUTDOWN_MASK (like inet_shutdown() does) when the socket is connectionless.
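The flag values make the second problem concrete (RCV_SHUTDOWN is 1, SEND_SHUTDOWN is 2 and SHUTDOWN_MASK is 3 in the kernel's socket headers); an illustrative sketch, standing in for the actual wait condition:

	/* A blocked receiver only gives up when the receive side is
	 * marked shut down: */
	if (sk->sk_shutdown & RCV_SHUTDOWN)
		break;	/* illustrative; stands in for the wait condition */

	/* So "sk->sk_shutdown = SEND_SHUTDOWN" leaves the reader asleep,
	 * while SHUTDOWN_MASK (== RCV_SHUTDOWN | SEND_SHUTDOWN) wakes
	 * it out of the wait. */
	sk->sk_shutdown = SHUTDOWN_MASK;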
[1] https://syzkaller.appspot.com/bug?id=3fe51d307c1f0a845485cf1798aa059d12bf18b...
Reported-by: syzbot <syzbot+e36f41d207137b5d12f7@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/tipc/socket.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2126,18 +2126,21 @@ static int tipc_shutdown(struct socket *
 	lock_sock(sk);
 
 	__tipc_shutdown(sock, TIPC_CONN_SHUTDOWN);
-	sk->sk_shutdown = SEND_SHUTDOWN;
+	if (tipc_sk_type_connectionless(sk))
+		sk->sk_shutdown = SHUTDOWN_MASK;
+	else
+		sk->sk_shutdown = SEND_SHUTDOWN;
 
 	if (sk->sk_state == TIPC_DISCONNECTING) {
 		/* Discard any unreceived messages */
 		__skb_queue_purge(&sk->sk_receive_queue);
 
-		/* Wake up anyone sleeping in poll */
-		sk->sk_state_change(sk);
 		res = 0;
 	} else {
 		res = -ENOTCONN;
 	}
+	/* Wake up anyone sleeping in poll. */
+	sk->sk_state_change(sk);
 
 	release_sock(sk);
 	return res;
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 96e97bc07e90f175a8980a22827faf702ca4cb30 ]
napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent netpoll from accessing rings before init is complete. However, the same is not done for fresh napi instances in netif_napi_add(), even though we expect NAPI instances to be added as disabled.
This causes crashes during driver reconfiguration (enabling XDP, changing the channel count) - if there is any printk() after netif_napi_add() but before napi_enable().
To ensure memory ordering is correct we need to use RCU accessors.
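Concretely, list_add_rcu() publishes the entry with rcu_assign_pointer() semantics, which pairs with the dependency ordering of list_for_each_entry_rcu() on the netpoll side, so a poller can never observe a napi whose state bits are not yet set (a minimal sketch of the pairing, using the primitives named in the patch below):

	/* Writer (netif_napi_add): fully initialize, then publish. */
	set_bit(NAPI_STATE_SCHED, &napi->state);
	set_bit(NAPI_STATE_NPSVC, &napi->state);
	list_add_rcu(&napi->dev_list, &dev->napi_list);

	/* Reader (poll_napi): the RCU list walk guarantees the state
	 * writes above are visible before the entry is reachable. */
	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
		/* NAPI_STATE_NPSVC is already observable here. */
	}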
Reported-by: Rob Sherwood <rsher@fb.com>
Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/dev.c     | 3 ++-
 net/core/netpoll.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5532,12 +5532,13 @@ void netif_napi_add(struct net_device *d
 		pr_err_once("netif_napi_add() called with weight %d on device %s\n",
 			    weight, dev->name);
 	napi->weight = weight;
-	list_add(&napi->dev_list, &dev->napi_list);
 	napi->dev = dev;
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
 	set_bit(NAPI_STATE_SCHED, &napi->state);
+	set_bit(NAPI_STATE_NPSVC, &napi->state);
+	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
 }
 EXPORT_SYMBOL(netif_napi_add);
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -179,7 +179,7 @@ static void poll_napi(struct net_device
 	struct napi_struct *napi;
 	int cpu = smp_processor_id();
 
-	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+	list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) {
 		if (cmpxchg(&napi->poll_owner, -1, cpu) == -1) {
 			poll_one_napi(napi);
 			smp_store_release(&napi->poll_owner, -1);
On Fri, 11 Sep 2020 14:46:54 +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.198 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.198-rc...
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
All tests passing for Tegra ...
Test results for stable-v4.14:
    8 builds:	8 pass, 0 fail
    16 boots:	16 pass, 0 fail
    30 tests:	30 pass, 0 fail

Linux version:	4.14.198-rc1-g69fa365ca674
Boards tested:	tegra124-jetson-tk1, tegra20-ventana,
		tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Jon
On 9/11/20 6:46 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.198 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.198-rc...
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
thanks,
-- Shuah
On Fri, Sep 11, 2020 at 02:46:54PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.198 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +0000.
> Anything received after that time might be too late.
Build results:
	total: 171 pass: 171 fail: 0
Qemu test results:
	total: 408 pass: 408 fail: 0
Oh, and I keep forgetting the formal:
Tested-by: Guenter Roeck <linux@roeck-us.net>
Please consider that to be given for all test reports even if not explicit.
Thanks, Guenter
On Fri, 11 Sep 2020 at 18:29, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 4.14.198 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.198-rc...
> or in the git tree and branch at:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Summary
------------------------------------------------------------------------
kernel: 4.14.198-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 69fa365ca674b7df0b29cce9ffdfc43c0fa311dc
git describe: v4.14.197-13-g69fa365ca674
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.14.y/build/v4.14....
No regressions (compared to build v4.14.197)
No fixes (compared to build v4.14.197)
Ran 19581 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* kselftest/drivers
* kselftest/filesystems
* kselftest/net
* linux-log-parser
* perf
* network-basic-tests
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* v4l2-compliance
* ltp-open-posix-tests
* ssuite
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-native/drivers
* kselftest-vsyscall-mode-native/filesystems
* kselftest-vsyscall-mode-native/net
* kselftest-vsyscall-mode-none
* kselftest-vsyscall-mode-none/drivers
* kselftest-vsyscall-mode-none/filesystems
* kselftest-vsyscall-mode-none/net