This is the start of the stable review cycle for the 4.14.236 release. There are 47 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.236-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.236-rc1
Jan Beulich jbeulich@suse.com xen-pciback: redo VF placement in the virtual topology
Cheng Jian cj.chengjian@huawei.com sched/fair: Optimize select_idle_cpu
Sean Christopherson seanjc@google.com KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
Michael Chan michael.chan@broadcom.com bnxt_en: Remove the setting of dev_port.
Daniel Borkmann daniel@iogearbox.net bpf: No need to simulate speculative domain for immediates
Daniel Borkmann daniel@iogearbox.net bpf: Fix mask direction swap upon off reg sign change
Daniel Borkmann daniel@iogearbox.net bpf: Wrap aux data inside bpf_sanitize_info container
Daniel Borkmann daniel@iogearbox.net bpf: Fix leakage of uninitialized bpf stack under speculation
Alexei Starovoitov ast@kernel.org selftests/bpf: make 'dubious pointer arithmetic' test useful
Alexei Starovoitov ast@fb.com selftests/bpf: fix test_align
Alexei Starovoitov ast@kernel.org bpf/verifier: disallow pointer subtraction
Alexei Starovoitov ast@kernel.org bpf: do not allow root to mangle valid pointers
Daniel Borkmann daniel@iogearbox.net bpf: Update selftests to reflect new error states
Daniel Borkmann daniel@iogearbox.net bpf: Tighten speculative pointer arithmetic mask
Daniel Borkmann daniel@iogearbox.net bpf: Move sanitize_val_alu out of op switch
Daniel Borkmann daniel@iogearbox.net bpf: Refactor and streamline bounds check into helper
Daniel Borkmann daniel@iogearbox.net bpf: Improve verifier error messages for users
Daniel Borkmann daniel@iogearbox.net bpf: Rework ptr_limit into alu_limit and add common error path
Daniel Borkmann daniel@iogearbox.net bpf: Ensure off_reg has no mixed signed bounds for all types
Daniel Borkmann daniel@iogearbox.net bpf: Move off_reg into sanitize_ptr_alu
Piotr Krysiuk piotras@gmail.com bpf, selftests: Fix up some test_verifier cases for unprivileged
Mina Almasry almasrymina@google.com mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
Josef Bacik josef@toxicpanda.com btrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik josef@toxicpanda.com btrfs: fix error handling in btrfs_del_csums
Krzysztof Kozlowski krzysztof.kozlowski@canonical.com nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect
Junxiao Bi junxiao.bi@oracle.com ocfs2: fix data corruption by fallocate
Mark Rutland mark.rutland@arm.com pid: take a reference when initializing `cad_pid`
Ye Bin yebin10@huawei.com ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
Takashi Iwai tiwai@suse.de ALSA: timer: Fix master timer notification
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in cfusbl_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: fix memory leak in caif_device_notify
Pavel Skripkin paskripkin@gmail.com net: caif: add proper error handling
Pavel Skripkin paskripkin@gmail.com net: caif: added cfserl_release function
Lin Ma linma@zju.edu.cn Bluetooth: use correct lock to prevent UAF of hdev object
Lin Ma linma@zju.edu.cn Bluetooth: fix the erroneous flush_work() order
Wei Yongjun weiyongjun1@huawei.com ieee802154: fix error return code in ieee802154_llsec_getparams()
Zhen Lei thunder.leizhen@huawei.com ieee802154: fix error return code in ieee802154_add_iface()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
Arnd Bergmann arnd@arndb.de HID: i2c-hid: fix format string mismatch
Zhen Lei thunder.leizhen@huawei.com HID: pidff: fix error return code in hid_pidff_init()
Julian Anastasov ja@ssi.bg ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
Max Gurtovoy mgurtovoy@nvidia.com vfio/platform: fix module_put call in error flow
Randy Dunlap rdunlap@infradead.org vfio/pci: zap_vma_ptes() needs MMU
Zhen Lei thunder.leizhen@huawei.com vfio/pci: Fix error return code in vfio_ecap_init()
Rasmus Villemoes linux@rasmusvillemoes.dk efi: cper: fix snprintf() use in cper_dimm_err_location()
Heiner Kallweit hkallweit1@gmail.com efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
Grant Grundler grundler@chromium.org net: usb: cdc_ncm: don't spew notifications
-------------
Diffstat:
Makefile | 4 +- arch/x86/kvm/svm.c | 8 +- drivers/firmware/efi/cper.c | 4 +- drivers/firmware/efi/memattr.c | 5 - drivers/hid/i2c-hid/i2c-hid-core.c | 4 +- drivers/hid/usbhid/hid-pidff.c | 1 + drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 - drivers/net/usb/cdc_ncm.c | 12 +- drivers/vfio/pci/Kconfig | 1 + drivers/vfio/pci/vfio_pci_config.c | 2 +- drivers/vfio/platform/vfio_platform_common.c | 2 +- drivers/xen/xen-pciback/vpci.c | 14 +- fs/btrfs/file-item.c | 10 +- fs/btrfs/tree-log.c | 13 +- fs/ext4/extents.c | 43 ++-- fs/ocfs2/file.c | 55 +++- include/linux/bpf_verifier.h | 5 +- include/linux/usb/usbnet.h | 2 + include/net/caif/caif_dev.h | 2 +- include/net/caif/cfcnfg.h | 2 +- include/net/caif/cfserl.h | 1 + init/main.c | 2 +- kernel/bpf/verifier.c | 369 ++++++++++++++++----------- kernel/sched/fair.c | 7 +- mm/hugetlb.c | 14 +- net/bluetooth/hci_core.c | 7 +- net/bluetooth/hci_sock.c | 4 +- net/caif/caif_dev.c | 13 +- net/caif/caif_usb.c | 14 +- net/caif/cfcnfg.c | 16 +- net/caif/cfserl.c | 5 + net/ieee802154/nl-mac.c | 4 +- net/ieee802154/nl-phy.c | 4 +- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- net/netfilter/nfnetlink_cthelper.c | 8 +- net/nfc/llcp_sock.c | 2 + sound/core/timer.c | 3 +- tools/testing/selftests/bpf/test_align.c | 26 +- tools/testing/selftests/bpf/test_verifier.c | 114 +++++---- 39 files changed, 501 insertions(+), 304 deletions(-)
From: Grant Grundler grundler@chromium.org
[ Upstream commit de658a195ee23ca6aaffe197d1d2ea040beea0a2 ]
RTL8156 sends notifications about every 32ms. Only display/log notifications when something changes.
This issue has been reported by others: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472 https://lkml.org/lkml/2020/8/27/1083
... [785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd [785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00 [785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6 [785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN [785962.929954] usb 1-1: Manufacturer: Realtek [785962.929956] usb 1-1: SerialNumber: 000000001 [785962.991755] usbcore: registered new interface driver cdc_ether [785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15 [785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384 [785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384 [785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15 [785963.019211] usbcore: registered new interface driver cdc_ncm [785963.023856] usbcore: registered new interface driver cdc_wdm [785963.025461] usbcore: registered new interface driver cdc_mbim [785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0 [785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected [785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected ...
This is about 2KB per second and will overwrite all contents of a 1MB dmesg buffer in under 10 minutes rendering them useless for debugging many kernel problems.
This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering the majority of those logs useless too.
When the link is up (expected state), spew amount is >2x higher: ... [786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink [786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected [786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink ...
Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: Grant Grundler grundler@chromium.org Reviewed-by: Hayes Wang hayeswang@realtek.com Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/cdc_ncm.c | 12 +++++++++++- include/linux/usb/usbnet.h | 2 ++ 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index eef1412c058d..468db50eb5e7 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -1591,6 +1591,15 @@ cdc_ncm_speed_change(struct usbnet *dev, uint32_t rx_speed = le32_to_cpu(data->DLBitRRate); uint32_t tx_speed = le32_to_cpu(data->ULBitRate);
+ /* if the speed hasn't changed, don't report it. + * RTL8156 shipped before 2021 sends notification about every 32ms. + */ + if (dev->rx_speed == rx_speed && dev->tx_speed == tx_speed) + return; + + dev->rx_speed = rx_speed; + dev->tx_speed = tx_speed; + /* * Currently the USB-NET API does not support reporting the actual * device speed. Do print it instead. @@ -1634,7 +1643,8 @@ static void cdc_ncm_status(struct usbnet *dev, struct urb *urb) * USB_CDC_NOTIFY_NETWORK_CONNECTION notification shall be * sent by device after USB_CDC_NOTIFY_SPEED_CHANGE. */ - usbnet_link_change(dev, !!event->wValue, 0); + if (netif_carrier_ok(dev->net) != !!event->wValue) + usbnet_link_change(dev, !!event->wValue, 0); break;
case USB_CDC_NOTIFY_SPEED_CHANGE: diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index e87a805cbfef..5df465dc7af8 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -82,6 +82,8 @@ struct usbnet { # define EVENT_LINK_CHANGE 11 # define EVENT_SET_RX_MODE 12 # define EVENT_NO_IP_ALIGN 13 + u32 rx_speed; /* in bps - NOT Mbps */ + u32 tx_speed; /* in bps - NOT Mbps */ };
static inline struct usb_driver *driver_of(struct usb_interface *intf)
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 45add3cc99feaaf57d4b6f01d52d532c16a1caee ]
UEFI spec 2.9, p.108, table 4-1 lists the scenario that both attributes are cleared with the description "No memory access protection is possible for Entry". So we can have valid entries where both attributes are cleared, so remove the check.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Fixes: 10f0d2f577053 ("efi: Implement generic support for the Memory Attributes table") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/memattr.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c index aac972b056d9..e0889922cc6d 100644 --- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -69,11 +69,6 @@ static bool entry_is_valid(const efi_memory_desc_t *in, efi_memory_desc_t *out) return false; }
- if (!(in->attribute & (EFI_MEMORY_RO | EFI_MEMORY_XP))) { - pr_warn("Entry attributes invalid: RO and XP bits both cleared\n"); - return false; - } - if (PAGE_SIZE > EFI_PAGE_SIZE && (!PAGE_ALIGNED(in->phys_addr) || !PAGE_ALIGNED(in->num_pages << EFI_PAGE_SHIFT))) {
From: Rasmus Villemoes linux@rasmusvillemoes.dk
[ Upstream commit 942859d969de7f6f7f2659a79237a758b42782da ]
snprintf() should be given the full buffer size, not one less. And it guarantees nul-termination, so doing it manually afterwards is pointless.
It's even potentially harmful (though probably not in practice because CPER_REC_LEN is 256), due to the "return how much would have been written had the buffer been big enough" semantics. I.e., if the bank and/or device strings are long enough that the "DIMM location ..." output gets truncated, writing to msg[n] is a buffer overflow.
Signed-off-by: Rasmus Villemoes linux@rasmusvillemoes.dk Fixes: 3760cd20402d4 ("CPER: Adjust code flow of some functions") Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/efi/cper.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index 209dc5aefc31..efbb13c6581e 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -380,8 +380,7 @@ static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg) if (!msg || !(mem->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)) return 0;
- n = 0; - len = CPER_REC_LEN - 1; + len = CPER_REC_LEN; dmi_memdev_name(mem->mem_dev_handle, &bank, &device); if (bank && device) n = snprintf(msg, len, "DIMM location: %s %s ", bank, device); @@ -390,7 +389,6 @@ static int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg) "DIMM location: not present. DMI handle: 0x%.4x ", mem->mem_dev_handle);
- msg[n] = '\0'; return n; }
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit d1ce2c79156d3baf0830990ab06d296477b93c26 ]
The error code returned from vfio_ext_cap_len() is stored in 'len', not in 'ret'.
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Reviewed-by: Max Gurtovoy mgurtovoy@nvidia.com Message-Id: 20210515020458.6771-1-thunder.leizhen@huawei.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/vfio_pci_config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index a1a26465d224..86e917f1cc21 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -1579,7 +1579,7 @@ static int vfio_ecap_init(struct vfio_pci_device *vdev) if (len == 0xFF) { len = vfio_ext_cap_len(vdev, ecap, epos); if (len < 0) - return ret; + return len; } }
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 2a55ca37350171d9b43d561528f23d4130097255 ]
zap_vma_ptes() is only available when CONFIG_MMU is set/enabled. Without CONFIG_MMU, vfio_pci.o has build errors, so make VFIO_PCI depend on MMU.
riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open': vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes' riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ': vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes'
Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: kernel test robot lkp@intel.com Cc: Alex Williamson alex.williamson@redhat.com Cc: Cornelia Huck cohuck@redhat.com Cc: kvm@vger.kernel.org Cc: Jason Gunthorpe jgg@nvidia.com Cc: Eric Auger eric.auger@redhat.com Message-Id: 20210515190856.2130-1-rdunlap@infradead.org Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/pci/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 24ee2605b9f0..0da884bfc7a8 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -1,6 +1,7 @@ config VFIO_PCI tristate "VFIO support for PCI devices" depends on VFIO && PCI && EVENTFD + depends on MMU select VFIO_VIRQFD select IRQ_BYPASS_MANAGER help
From: Max Gurtovoy mgurtovoy@nvidia.com
[ Upstream commit dc51ff91cf2d1e9a2d941da483602f71d4a51472 ]
The ->parent_module is the one that use in try_module_get. It should also be the one the we use in module_put during vfio_platform_open().
Fixes: 32a2d71c4e80 ("vfio: platform: introduce vfio-platform-base module") Signed-off-by: Max Gurtovoy mgurtovoy@nvidia.com Message-Id: 20210518192133.59195-1-mgurtovoy@nvidia.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vfio/platform/vfio_platform_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index aa9e792110e3..f42acc830c24 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -288,7 +288,7 @@ err_irq: vfio_platform_regions_cleanup(vdev); err_reg: mutex_unlock(&driver_lock); - module_put(THIS_MODULE); + module_put(vdev->parent_module); return ret; }
From: Julian Anastasov ja@ssi.bg
[ Upstream commit 56e4ee82e850026d71223262c07df7d6af3bd872 ]
syzbot reported memory leak [1] when adding service with HASHED flag. We should ignore this flag both from sockopt and netlink provided data, otherwise the service is not hashed and not visible while releasing resources.
[1] BUG: memory leak unreferenced object 0xffff888115227800 (size 512): comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline] [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline] [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343 [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570 [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101 [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435 [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857 [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117 [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline] [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline] [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125 [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Julian Anastasov ja@ssi.bg Reviewed-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipvs/ip_vs_ctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index c1672ff00963..eea0144aada7 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -1262,7 +1262,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u, ip_vs_addr_copy(svc->af, &svc->addr, &u->addr); svc->port = u->port; svc->fwmark = u->fwmark; - svc->flags = u->flags; + svc->flags = u->flags & ~IP_VS_SVC_F_HASHED; svc->timeout = u->timeout * HZ; svc->netmask = u->netmask; svc->ipvs = ipvs;
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit 3dd653c077efda8152f4dd395359617d577a54cd ]
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function.
Fixes: 224ee88fe395 ("Input: add force feedback driver for PID devices") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/usbhid/hid-pidff.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hid/usbhid/hid-pidff.c b/drivers/hid/usbhid/hid-pidff.c index 08174d341f4a..bc75f1efa0f4 100644 --- a/drivers/hid/usbhid/hid-pidff.c +++ b/drivers/hid/usbhid/hid-pidff.c @@ -1304,6 +1304,7 @@ int hid_pidff_init(struct hid_device *hid)
if (pidff->pool[PID_DEVICE_MANAGED_POOL].value && pidff->pool[PID_DEVICE_MANAGED_POOL].value[0] == 0) { + error = -EPERM; hid_notice(hid, "device does not support device managed pool\n"); goto fail;
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit dc5f9f55502e13ba05731d5046a14620aa2ff456 ]
clang doesn't like printing a 32-bit integer using %hX format string:
drivers/hid/i2c-hid/i2c-hid-core.c:994:18: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [-Werror,-Wformat] client->name, hid->vendor, hid->product); ^~~~~~~~~~~ drivers/hid/i2c-hid/i2c-hid-core.c:994:31: error: format specifies type 'unsigned short' but the argument has type '__u32' (aka 'unsigned int') [-Werror,-Wformat] client->name, hid->vendor, hid->product); ^~~~~~~~~~~~
Use an explicit cast to truncate it to the low 16 bits instead.
Fixes: 9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/i2c-hid/i2c-hid-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c b/drivers/hid/i2c-hid/i2c-hid-core.c index 0294cac4c856..b16bf4358485 100644 --- a/drivers/hid/i2c-hid/i2c-hid-core.c +++ b/drivers/hid/i2c-hid/i2c-hid-core.c @@ -1092,8 +1092,8 @@ static int i2c_hid_probe(struct i2c_client *client, hid->vendor = le16_to_cpu(ihid->hdesc.wVendorID); hid->product = le16_to_cpu(ihid->hdesc.wProductID);
- snprintf(hid->name, sizeof(hid->name), "%s %04hX:%04hX", - client->name, hid->vendor, hid->product); + snprintf(hid->name, sizeof(hid->name), "%s %04X:%04X", + client->name, (u16)hid->vendor, (u16)hid->product); strlcpy(hid->phys, dev_name(&client->dev), sizeof(hid->phys));
ihid->quirks = i2c_hid_lookup_quirk(hid->vendor, hid->product);
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 8971ee8b087750a23f3cd4dc55bff2d0303fd267 ]
The private helper data size cannot be updated. However, updates that contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is the same.
Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink_cthelper.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nfnetlink_cthelper.c b/net/netfilter/nfnetlink_cthelper.c index dfe4e6787219..2ebeb615db15 100644 --- a/net/netfilter/nfnetlink_cthelper.c +++ b/net/netfilter/nfnetlink_cthelper.c @@ -370,10 +370,14 @@ static int nfnl_cthelper_update(const struct nlattr * const tb[], struct nf_conntrack_helper *helper) { + u32 size; int ret;
- if (tb[NFCTH_PRIV_DATA_LEN]) - return -EBUSY; + if (tb[NFCTH_PRIV_DATA_LEN]) { + size = ntohl(nla_get_be32(tb[NFCTH_PRIV_DATA_LEN])); + if (size != helper->data_len) + return -EBUSY; + }
if (tb[NFCTH_POLICY]) { ret = nfnl_cthelper_update_policy(helper, tb[NFCTH_POLICY]);
From: Zhen Lei thunder.leizhen@huawei.com
[ Upstream commit 79c6b8ed30e54b401c873dbad2511f2a1c525fd5 ]
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function.
Fixes: be51da0f3e34 ("ieee802154: Stop using NLA_PUT*().") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ieee802154/nl-phy.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ieee802154/nl-phy.c b/net/ieee802154/nl-phy.c index dc2960be51e0..6a7fe415d5f4 100644 --- a/net/ieee802154/nl-phy.c +++ b/net/ieee802154/nl-phy.c @@ -249,8 +249,10 @@ int ieee802154_add_iface(struct sk_buff *skb, struct genl_info *info) }
if (nla_put_string(msg, IEEE802154_ATTR_PHY_NAME, wpan_phy_name(phy)) || - nla_put_string(msg, IEEE802154_ATTR_DEV_NAME, dev->name)) + nla_put_string(msg, IEEE802154_ATTR_DEV_NAME, dev->name)) { + rc = -EMSGSIZE; goto nla_put_failure; + } dev_put(dev);
wpan_phy_put(phy);
From: Wei Yongjun weiyongjun1@huawei.com
[ Upstream commit 373e864cf52403b0974c2f23ca8faf9104234555 ]
Fix to return negative error code -ENOBUFS from the error handling case instead of 0, as done elsewhere in this function.
Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec") Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Wei Yongjun weiyongjun1@huawei.com Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com Signed-off-by: Stefan Schmidt stefan@datenfreihafen.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ieee802154/nl-mac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ieee802154/nl-mac.c b/net/ieee802154/nl-mac.c index c0930b9fe848..7531cb1665d2 100644 --- a/net/ieee802154/nl-mac.c +++ b/net/ieee802154/nl-mac.c @@ -688,8 +688,10 @@ int ieee802154_llsec_getparams(struct sk_buff *skb, struct genl_info *info) nla_put_u8(msg, IEEE802154_ATTR_LLSEC_SECLEVEL, params.out_level) || nla_put_u32(msg, IEEE802154_ATTR_LLSEC_FRAME_COUNTER, be32_to_cpu(params.frame_counter)) || - ieee802154_llsec_fill_key_id(msg, ¶ms.out_key)) + ieee802154_llsec_fill_key_id(msg, ¶ms.out_key)) { + rc = -ENOBUFS; goto out_free; + }
dev_put(dev);
From: Lin Ma linma@zju.edu.cn
commit 6a137caec23aeb9e036cdfd8a46dd8a366460e5d upstream.
In the cleanup routine for failed initialization of HCI device, the flush_work(&hdev->rx_work) need to be finished before the flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can possibly invoke new cmd_work and cause a bug, like double free, in late processings.
This was assigned CVE-2021-3564.
This patch reorder the flush_work() to fix this bug.
Cc: Marcel Holtmann marcel@holtmann.org Cc: Johan Hedberg johan.hedberg@gmail.com Cc: Luiz Augusto von Dentz luiz.dentz@gmail.com Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: Hao Xiong mart1n@zju.edu.cn Cc: stable stable@vger.kernel.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1458,8 +1458,13 @@ static int hci_dev_do_open(struct hci_de } else { /* Init failed, cleanup */ flush_work(&hdev->tx_work); - flush_work(&hdev->cmd_work); + + /* Since hci_rx_work() is possible to awake new cmd_work + * it should be flushed first to avoid unexpected call of + * hci_cmd_work() + */ flush_work(&hdev->rx_work); + flush_work(&hdev->cmd_work);
skb_queue_purge(&hdev->cmd_q); skb_queue_purge(&hdev->rx_q);
From: Lin Ma linma@zju.edu.cn
commit e305509e678b3a4af2b3cfd410f409f7cdaabb52 upstream.
The hci_sock_dev_event() function will cleanup the hdev object for sockets even if this object may still be in used within the hci_sock_bound_ioctl() function, result in UAF vulnerability.
This patch replace the BH context lock to serialize these affairs and prevent the race condition.
Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_sock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -750,7 +750,7 @@ void hci_sock_dev_event(struct hci_dev * /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - bh_lock_sock_nested(sk); + lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; @@ -759,7 +759,7 @@ void hci_sock_dev_event(struct hci_dev *
hci_dev_put(hdev); } - bh_unlock_sock(sk); + release_sock(sk); } read_unlock(&hci_sk_list.lock); }
From: Pavel Skripkin paskripkin@gmail.com
commit bce130e7f392ddde8cfcb09927808ebd5f9c8669 upstream.
Added cfserl_release() function.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/caif/cfserl.h | 1 + net/caif/cfserl.c | 5 +++++ 2 files changed, 6 insertions(+)
--- a/include/net/caif/cfserl.h +++ b/include/net/caif/cfserl.h @@ -9,4 +9,5 @@ #include <net/caif/caif_layer.h>
struct cflayer *cfserl_create(int instance, bool use_stx); +void cfserl_release(struct cflayer *layer); #endif --- a/net/caif/cfserl.c +++ b/net/caif/cfserl.c @@ -31,6 +31,11 @@ static int cfserl_transmit(struct cflaye static void cfserl_ctrlcmd(struct cflayer *layr, enum caif_ctrlcmd ctrl, int phyid);
+void cfserl_release(struct cflayer *layer) +{ + kfree(layer); +} + struct cflayer *cfserl_create(int instance, bool use_stx) { struct cfserl *this = kzalloc(sizeof(struct cfserl), GFP_ATOMIC);
From: Pavel Skripkin paskripkin@gmail.com
commit a2805dca5107d5603f4bbc027e81e20d93476e96 upstream.
caif_enroll_dev() can fail in some cases. Ingnoring these cases can lead to memory leak due to not assigning link_support pointer to anywhere.
Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/caif/caif_dev.h | 2 +- include/net/caif/cfcnfg.h | 2 +- net/caif/caif_dev.c | 8 +++++--- net/caif/cfcnfg.c | 16 +++++++++++----- 4 files changed, 18 insertions(+), 10 deletions(-)
--- a/include/net/caif/caif_dev.h +++ b/include/net/caif/caif_dev.h @@ -119,7 +119,7 @@ void caif_free_client(struct cflayer *ad * The link_support layer is used to add any Link Layer specific * framing. */ -void caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, +int caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, struct cflayer *link_support, int head_room, struct cflayer **layer, int (**rcv_func)( struct sk_buff *, struct net_device *, --- a/include/net/caif/cfcnfg.h +++ b/include/net/caif/cfcnfg.h @@ -62,7 +62,7 @@ void cfcnfg_remove(struct cfcnfg *cfg); * @fcs: Specify if checksum is used in CAIF Framing Layer. * @head_room: Head space needed by link specific protocol. */ -void +int cfcnfg_add_phy_layer(struct cfcnfg *cnfg, struct net_device *dev, struct cflayer *phy_layer, enum cfcnfg_phy_preference pref, --- a/net/caif/caif_dev.c +++ b/net/caif/caif_dev.c @@ -303,7 +303,7 @@ static void dev_flowctrl(struct net_devi caifd_put(caifd); }
-void caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, +int caif_enroll_dev(struct net_device *dev, struct caif_dev_common *caifdev, struct cflayer *link_support, int head_room, struct cflayer **layer, int (**rcv_func)(struct sk_buff *, struct net_device *, @@ -314,11 +314,12 @@ void caif_enroll_dev(struct net_device * enum cfcnfg_phy_preference pref; struct cfcnfg *cfg = get_cfcnfg(dev_net(dev)); struct caif_device_entry_list *caifdevs; + int res;
caifdevs = caif_device_list(dev_net(dev)); caifd = caif_device_alloc(dev); if (!caifd) - return; + return -ENOMEM; *layer = &caifd->layer; spin_lock_init(&caifd->flow_lock);
@@ -340,7 +341,7 @@ void caif_enroll_dev(struct net_device * sizeof(caifd->layer.name) - 1); caifd->layer.name[sizeof(caifd->layer.name) - 1] = 0; caifd->layer.transmit = transmit; - cfcnfg_add_phy_layer(cfg, + res = cfcnfg_add_phy_layer(cfg, dev, &caifd->layer, pref, @@ -350,6 +351,7 @@ void caif_enroll_dev(struct net_device * mutex_unlock(&caifdevs->lock); if (rcv_func) *rcv_func = receive; + return res; } EXPORT_SYMBOL(caif_enroll_dev);
--- a/net/caif/cfcnfg.c +++ b/net/caif/cfcnfg.c @@ -452,7 +452,7 @@ unlock: rcu_read_unlock(); }
-void +int cfcnfg_add_phy_layer(struct cfcnfg *cnfg, struct net_device *dev, struct cflayer *phy_layer, enum cfcnfg_phy_preference pref, @@ -461,7 +461,7 @@ cfcnfg_add_phy_layer(struct cfcnfg *cnfg { struct cflayer *frml; struct cfcnfg_phyinfo *phyinfo = NULL; - int i; + int i, res = 0; u8 phyid;
mutex_lock(&cnfg->lock); @@ -475,12 +475,15 @@ cfcnfg_add_phy_layer(struct cfcnfg *cnfg goto got_phyid; } pr_warn("Too many CAIF Link Layers (max 6)\n"); + res = -EEXIST; goto out;
got_phyid: phyinfo = kzalloc(sizeof(struct cfcnfg_phyinfo), GFP_ATOMIC); - if (!phyinfo) + if (!phyinfo) { + res = -ENOMEM; goto out_err; + }
phy_layer->id = phyid; phyinfo->pref = pref; @@ -494,8 +497,10 @@ got_phyid:
frml = cffrml_create(phyid, fcs);
- if (!frml) + if (!frml) { + res = -ENOMEM; goto out_err; + } phyinfo->frm_layer = frml; layer_set_up(frml, cnfg->mux);
@@ -513,11 +518,12 @@ got_phyid: list_add_rcu(&phyinfo->node, &cnfg->phys); out: mutex_unlock(&cnfg->lock); - return; + return res;
out_err: kfree(phyinfo); mutex_unlock(&cnfg->lock); + return res; } EXPORT_SYMBOL(cfcnfg_add_phy_layer);
From: Pavel Skripkin paskripkin@gmail.com
commit b53558a950a89824938e9811eddfc8efcd94e1bb upstream.
In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error
Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/caif/caif_dev.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/caif/caif_dev.c +++ b/net/caif/caif_dev.c @@ -366,6 +366,7 @@ static int caif_device_notify(struct not struct cflayer *layer, *link_support; int head_room = 0; struct caif_device_entry_list *caifdevs; + int res;
cfg = get_cfcnfg(dev_net(dev)); caifdevs = caif_device_list(dev_net(dev)); @@ -391,8 +392,10 @@ static int caif_device_notify(struct not break; } } - caif_enroll_dev(dev, caifdev, link_support, head_room, + res = caif_enroll_dev(dev, caifdev, link_support, head_room, &layer, NULL); + if (res) + cfserl_release(link_support); caifdev->flowctrl = dev_flowctrl; break;
From: Pavel Skripkin paskripkin@gmail.com
commit 7f5d86669fa4d485523ddb1d212e0a2d90bd62bb upstream.
In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error.
Fixes: 7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin paskripkin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/caif/caif_usb.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/net/caif/caif_usb.c +++ b/net/caif/caif_usb.c @@ -116,6 +116,11 @@ static struct cflayer *cfusbl_create(int return (struct cflayer *) this; }
+static void cfusbl_release(struct cflayer *layer) +{ + kfree(layer); +} + static struct packet_type caif_usb_type __read_mostly = { .type = cpu_to_be16(ETH_P_802_EX1), }; @@ -128,6 +133,7 @@ static int cfusbl_device_notify(struct n struct cflayer *layer, *link_support; struct usbnet *usbnet; struct usb_device *usbdev; + int res;
/* Check whether we have a NCM device, and find its VID/PID. */ if (!(dev->dev.parent && dev->dev.parent->driver && @@ -170,8 +176,11 @@ static int cfusbl_device_notify(struct n if (dev->num_tx_queues > 1) pr_warn("USB device uses more than one tx queue\n");
- caif_enroll_dev(dev, &common, link_support, CFUSB_MAX_HEADLEN, + res = caif_enroll_dev(dev, &common, link_support, CFUSB_MAX_HEADLEN, &layer, &caif_usb_type.func); + if (res) + goto err; + if (!pack_added) dev_add_pack(&caif_usb_type); pack_added = true; @@ -181,6 +190,9 @@ static int cfusbl_device_notify(struct n layer->name[sizeof(layer->name) - 1] = 0;
return 0; +err: + cfusbl_release(link_support); + return res; }
static struct notifier_block caif_device_notifier = {
From: Takashi Iwai tiwai@suse.de
commit 9c1fe96bded935369f8340c2ac2e9e189f697d5d upstream.
snd_timer_notify1() calls the notification to each slave for a master event, but it passes a wrong event number. It should be +10 offset, corresponding to SNDRV_TIMER_EVENT_MXXX, but it's incorrectly with +100 offset. Casually this was spotted by UBSAN check via syzkaller.
Reported-by: syzbot+d102fa5b35335a7e544e@syzkaller.appspotmail.com Reviewed-by: Jaroslav Kysela perex@perex.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/000000000000e5560e05c3bd1d63@google.com Link: https://lore.kernel.org/r/20210602113823.23777-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/timer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -490,9 +490,10 @@ static void snd_timer_notify1(struct snd return; if (timer->hw.flags & SNDRV_TIMER_HW_SLAVE) return; + event += 10; /* convert to SNDRV_TIMER_EVENT_MXXX */ list_for_each_entry(ts, &ti->slave_active_head, active_list) if (ts->ccallback) - ts->ccallback(ts, event + 100, &tstamp, resolution); + ts->ccallback(ts, event, &tstamp, resolution); }
/* start/continue a master timer */
From: Ye Bin yebin10@huawei.com
commit 082cd4ec240b8734a82a89ffb890216ac98fec68 upstream.
We got follow bug_on when run fsstress with injecting IO fault: [130747.323114] kernel BUG at fs/ext4/extents_status.c:762! [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP ...... [130747.334329] Call trace: [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4] [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4] [130747.335368] ext4_find_extent+0x300/0x330 [ext4] [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4] [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4] [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4] [130747.336995] ext4_readpage+0x54/0x100 [ext4] [130747.337359] generic_file_buffered_read+0x410/0xae8 [130747.337767] generic_file_read_iter+0x114/0x190 [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4] [130747.338556] __vfs_read+0x11c/0x188 [130747.338851] vfs_read+0x94/0x150 [130747.339110] ksys_read+0x74/0xf0
This patch's modification is according to Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201... "I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)."
Cc: stable@kernel.org Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents.c | 43 +++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 20 deletions(-)
--- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3275,7 +3275,10 @@ static int ext4_split_extent_at(handle_t ext4_ext_mark_unwritten(ex2);
err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags); - if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) { + if (err != -ENOSPC && err != -EDQUOT) + goto out; + + if (EXT4_EXT_MAY_ZEROOUT & split_flag) { if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) { if (split_flag & EXT4_EXT_DATA_VALID1) { err = ext4_ext_zeroout(inode, ex2); @@ -3301,30 +3304,30 @@ static int ext4_split_extent_at(handle_t ext4_ext_pblock(&orig_ex)); }
- if (err) - goto fix_extent_len; - /* update the extent length and mark as initialized */ - ex->ee_len = cpu_to_le16(ee_len); - ext4_ext_try_to_merge(handle, inode, path, ex); - err = ext4_ext_dirty(handle, inode, path + path->p_depth); - if (err) - goto fix_extent_len; - - /* update extent status tree */ - err = ext4_zeroout_es(inode, &zero_ex); - - goto out; - } else if (err) - goto fix_extent_len; - -out: - ext4_ext_show_leaf(inode, path); - return err; + if (!err) { + /* update the extent length and mark as initialized */ + ex->ee_len = cpu_to_le16(ee_len); + ext4_ext_try_to_merge(handle, inode, path, ex); + err = ext4_ext_dirty(handle, inode, path + path->p_depth); + if (!err) + /* update extent status tree */ + err = ext4_zeroout_es(inode, &zero_ex); + /* If we failed at this point, we don't know in which + * state the extent tree exactly is so don't try to fix + * length of the original extent as it may do even more + * damage. + */ + goto out; + } + }
fix_extent_len: ex->ee_len = orig_ex.ee_len; ext4_ext_dirty(handle, inode, path + path->p_depth); return err; +out: + ext4_ext_show_leaf(inode, path); + return err; }
/*
From: Mark Rutland mark.rutland@arm.com
commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream.
During boot, kernel_init_freeable() initializes `cad_pid` to the init task's struct pid. Later on, we may change `cad_pid` via a sysctl, and when this happens proc_do_cad_pid() will increment the refcount on the new pid via get_pid(), and will decrement the refcount on the old pid via put_pid(). As we never called get_pid() when we initialized `cad_pid`, we decrement a reference we never incremented, can therefore free the init task's struct pid early. As there can be dangling references to the struct pid, we can later encounter a use-after-free (e.g. when delivering signals).
This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to have been around since the conversion of `cad_pid` to struct pid in commit 9ec52099e4b8 ("[PATCH] replace cad_pid by a struct pid") from the pre-KASAN stone age of v2.6.19.
Fix this by getting a reference to the init task's struct pid when we assign it to `cad_pid`.
Full KASAN splat below.
================================================================== BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline] BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1 Hardware name: linux,dummy-virt (DT) Call trace: ns_of_pid include/linux/pid.h:153 [inline] task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509 do_notify_parent+0x308/0xe60 kernel/signal.c:1950 exit_notify kernel/exit.c:682 [inline] do_exit+0x2334/0x2bd0 kernel/exit.c:845 do_group_exit+0x108/0x2c8 kernel/exit.c:922 get_signal+0x4e4/0x2a88 kernel/signal.c:2781 do_signal arch/arm64/kernel/signal.c:882 [inline] do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936 work_pending+0xc/0x2dc
Allocated by task 0: slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516 slab_alloc_node mm/slub.c:2907 [inline] slab_alloc mm/slub.c:2915 [inline] kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920 alloc_pid+0xdc/0xc00 kernel/pid.c:180 copy_process+0x2794/0x5e18 kernel/fork.c:2129 kernel_clone+0x194/0x13c8 kernel/fork.c:2500 kernel_thread+0xd4/0x110 kernel/fork.c:2552 rest_init+0x44/0x4a0 init/main.c:687 arch_call_rest_init+0x1c/0x28 start_kernel+0x520/0x554 init/main.c:1064 0x0
Freed by task 270: slab_free_hook mm/slub.c:1562 [inline] slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600 slab_free mm/slub.c:3161 [inline] kmem_cache_free+0x224/0x8e0 mm/slub.c:3177 put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114 put_pid+0x30/0x48 kernel/pid.c:109 proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401 proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591 proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617 call_write_iter include/linux/fs.h:1977 [inline] new_sync_write+0x3ac/0x510 fs/read_write.c:518 vfs_write fs/read_write.c:605 [inline] vfs_write+0x9c4/0x1018 fs/read_write.c:585 ksys_write+0x124/0x240 fs/read_write.c:658 __do_sys_write fs/read_write.c:670 [inline] __se_sys_write fs/read_write.c:667 [inline] __arm64_sys_write+0x78/0xb0 fs/read_write.c:667 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall arch/arm64/kernel/syscall.c:49 [inline] el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129 do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168 el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416 el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432 el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
The buggy address belongs to the object at ffff23794dda0000 which belongs to the cache pid of size 224 The buggy address is located 4 bytes inside of 224-byte region [ffff23794dda0000, ffff23794dda00e0) The buggy address belongs to the page: page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0 head:(____ptrval____) order:1 compound_mapcount:0 flags: 0x3fffc0000010200(slab|head) raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080 raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ==================================================================
Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com Fixes: 9ec52099e4b8678a ("[PATCH] replace cad_pid by a struct pid") Signed-off-by: Mark Rutland mark.rutland@arm.com Acked-by: Christian Brauner christian.brauner@ubuntu.com Cc: Cedric Le Goater clg@fr.ibm.com Cc: Christian Brauner christian@brauner.io Cc: Eric W. Biederman ebiederm@xmission.com Cc: Kees Cook <keescook@chromium.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Paul Mackerras paulus@samba.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/init/main.c +++ b/init/main.c @@ -1054,7 +1054,7 @@ static noinline void __init kernel_init_ */ set_mems_allowed(node_states[N_MEMORY]);
- cad_pid = task_pid(current); + cad_pid = get_pid(task_pid(current));
smp_prepare_cpus(setup_max_cpus);
From: Junxiao Bi junxiao.bi@oracle.com
commit 6bba4471f0cc1296fe3c2089b9e52442d3074b2e upstream.
When fallocate punches holes out of inode size, if original isize is in the middle of last cluster, then the part from isize to the end of the cluster will be zeroed with buffer write, at that time isize is not yet updated to match the new size, if writeback is kicked in, it will invoke ocfs2_writepage()->block_write_full_page() where the pages out of inode size will be dropped. That will cause file corruption. Fix this by zero out eof blocks when extending the inode size.
Running the following command with qemu-image 4.2.1 can get a corrupted coverted image file easily.
qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ -O qcow2 -o compat=1.1 $qcow_image.conv
The usage of fallocate in qemu is like this, it first punches holes out of inode size, then extend the inode size.
fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 fallocate(11, 0, 2276196352, 65536) = 0
v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T...
Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi junxiao.bi@oracle.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Jan Kara jack@suse.cz Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/file.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 5 deletions(-)
--- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1862,6 +1862,45 @@ out: }
/* + * zero out partial blocks of one cluster. + * + * start: file offset where zero starts, will be made upper block aligned. + * len: it will be trimmed to the end of current cluster if "start + len" + * is bigger than it. + */ +static int ocfs2_zeroout_partial_cluster(struct inode *inode, + u64 start, u64 len) +{ + int ret; + u64 start_block, end_block, nr_blocks; + u64 p_block, offset; + u32 cluster, p_cluster, nr_clusters; + struct super_block *sb = inode->i_sb; + u64 end = ocfs2_align_bytes_to_clusters(sb, start); + + if (start + len < end) + end = start + len; + + start_block = ocfs2_blocks_for_bytes(sb, start); + end_block = ocfs2_blocks_for_bytes(sb, end); + nr_blocks = end_block - start_block; + if (!nr_blocks) + return 0; + + cluster = ocfs2_bytes_to_clusters(sb, start); + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, + &nr_clusters, NULL); + if (ret) + return ret; + if (!p_cluster) + return 0; + + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); +} + +/* * Parts of this function taken from xfs_change_file_space() */ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, @@ -1871,7 +1910,7 @@ static int __ocfs2_change_file_space(str { int ret; s64 llen; - loff_t size; + loff_t size, orig_isize; struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); struct buffer_head *di_bh = NULL; handle_t *handle; @@ -1902,6 +1941,7 @@ static int __ocfs2_change_file_space(str goto out_inode_unlock; }
+ orig_isize = i_size_read(inode); switch (sr->l_whence) { case 0: /*SEEK_SET*/ break; @@ -1909,7 +1949,7 @@ static int __ocfs2_change_file_space(str sr->l_start += f_pos; break; case 2: /*SEEK_END*/ - sr->l_start += i_size_read(inode); + sr->l_start += orig_isize; break; default: ret = -EINVAL; @@ -1963,6 +2003,14 @@ static int __ocfs2_change_file_space(str default: ret = -EINVAL; } + + /* zeroout eof blocks in the cluster. */ + if (!ret && change_size && orig_isize < size) { + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, + size - orig_isize); + if (!ret) + i_size_write(inode, size); + } up_write(&OCFS2_I(inode)->ip_alloc_sem); if (ret) { mlog_errno(ret); @@ -1979,9 +2027,6 @@ static int __ocfs2_change_file_space(str goto out_inode_unlock; }
- if (change_size && i_size_read(inode) < size) - i_size_write(inode, size); - inode->i_ctime = inode->i_mtime = current_time(inode); ret = ocfs2_mark_inode_dirty(handle, inode, di_bh); if (ret < 0)
From: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com
commit 4ac06a1e013cf5fdd963317ffd3b968560f33bba upstream.
It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP):
BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae
This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000
Cc: stable@vger.kernel.org Fixes: d646960f7986 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck butterflyhuangxx@gmail.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonica... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/nfc/llcp_sock.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/nfc/llcp_sock.c +++ b/net/nfc/llcp_sock.c @@ -122,6 +122,7 @@ static int llcp_sock_bind(struct socket if (!llcp_sock->service_name) { nfc_llcp_local_put(llcp_sock->local); llcp_sock->local = NULL; + llcp_sock->dev = NULL; ret = -ENOMEM; goto put_dev; } @@ -131,6 +132,7 @@ static int llcp_sock_bind(struct socket llcp_sock->local = NULL; kfree(llcp_sock->service_name); llcp_sock->service_name = NULL; + llcp_sock->dev = NULL; ret = -EADDRINUSE; goto put_dev; }
From: Josef Bacik josef@toxicpanda.com
commit b86652be7c83f70bf406bed18ecf55adb9bfb91b upstream.
Error injection stress would sometimes fail with checksums on disk that did not have a corresponding extent. This occurred because the pattern in btrfs_del_csums was
while (1) { ret = btrfs_search_slot(); if (ret < 0) break; } ret = 0; out: btrfs_free_path(path); return ret;
If we got an error from btrfs_search_slot we'd clear the error because we were breaking instead of goto out. Instead of using goto out, simply handle the cases where we may leave a random value in ret, and get rid of the
ret = 0; out:
pattern and simply allow break to have the proper error reporting. With this fix we properly abort the transaction and do not commit thinking we successfully deleted the csum.
Reviewed-by: Qu Wenruo wqu@suse.com CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/file-item.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -599,7 +599,7 @@ int btrfs_del_csums(struct btrfs_trans_h u64 end_byte = bytenr + len; u64 csum_end; struct extent_buffer *leaf; - int ret; + int ret = 0; u16 csum_size = btrfs_super_csum_size(fs_info->super_copy); int blocksize_bits = fs_info->sb->s_blocksize_bits;
@@ -615,6 +615,7 @@ int btrfs_del_csums(struct btrfs_trans_h path->leave_spinning = 1; ret = btrfs_search_slot(trans, root, &key, path, -1, 1); if (ret > 0) { + ret = 0; if (path->slots[0] == 0) break; path->slots[0]--; @@ -671,7 +672,7 @@ int btrfs_del_csums(struct btrfs_trans_h ret = btrfs_del_items(trans, root, path, path->slots[0], del_nr); if (ret) - goto out; + break; if (key.offset == bytenr) break; } else if (key.offset < bytenr && csum_end > end_byte) { @@ -715,8 +716,9 @@ int btrfs_del_csums(struct btrfs_trans_h ret = btrfs_split_item(trans, root, path, &key, offset); if (ret && ret != -EAGAIN) { btrfs_abort_transaction(trans, ret); - goto out; + break; } + ret = 0;
key.offset = end_byte - 1; } else { @@ -726,8 +728,6 @@ int btrfs_del_csums(struct btrfs_trans_h } btrfs_release_path(path); } - ret = 0; -out: btrfs_free_path(path); return ret; }
From: Josef Bacik josef@toxicpanda.com
commit 011b28acf940eb61c000059dd9e2cfcbf52ed96b upstream.
This function has the following pattern
while (1) { ret = whatever(); if (ret) goto out; } ret = 0 out: return ret;
However several places in this while loop we simply break; when there's a problem, thus clearing the return value, and in one case we do a return -EIO, and leak the memory for the path.
Fix this by re-arranging the loop to deal with ret == 1 coming from btrfs_search_slot, and then simply delete the
ret = 0; out:
bit so everybody can break if there is an error, which will allow for proper error handling to occur.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik josef@toxicpanda.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-log.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -1558,6 +1558,7 @@ static noinline int fixup_inode_link_cou break;
if (ret == 1) { + ret = 0; if (path->slots[0] == 0) break; path->slots[0]--; @@ -1570,17 +1571,19 @@ static noinline int fixup_inode_link_cou
ret = btrfs_del_item(trans, root, path); if (ret) - goto out; + break;
btrfs_release_path(path); inode = read_one_inode(root, key.offset); - if (!inode) - return -EIO; + if (!inode) { + ret = -EIO; + break; + }
ret = fixup_inode_link_count(trans, root, inode); iput(inode); if (ret) - goto out; + break;
/* * fixup on a directory may create new entries, @@ -1589,8 +1592,6 @@ static noinline int fixup_inode_link_cou */ key.offset = (u64)-1; } - ret = 0; -out: btrfs_release_path(path); return ret; }
From: Mina Almasry almasrymina@google.com
[ Upstream commit d84cf06e3dd8c5c5b547b5d8931015fc536678e5 ]
The userfaultfd hugetlb tests cause a resv_huge_pages underflow. This happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on an index for which we already have a page in the cache. When this happens, we allocate a second page, double consuming the reservation, and then fail to insert the page into the cache and return -EEXIST.
To fix this, we first check if there is a page in the cache which already consumed the reservation, and return -EEXIST immediately if so.
There is still a rare condition where we fail to copy the page contents AND race with a call for hugetlb_no_page() for this index and again we will underflow resv_huge_pages. That is fixed in a more complicated patch not targeted for -stable.
Test:
Hacked the code locally such that resv_huge_pages underflows produce a warning, then:
./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
Both tests succeed and produce no warnings. After the test runs number of free/resv hugepages is correct.
[mike.kravetz@oracle.com: changelog fixes]
Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Mina Almasry almasrymina@google.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Axel Rasmussen axelrasmussen@google.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e59e0f7ed562..0dc181290d1f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4099,10 +4099,20 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, struct page *page;
if (!*pagep) { - ret = -ENOMEM; + /* If a page already exists, then it's UFFDIO_COPY for + * a non-missing case. Return -EEXIST. + */ + if (vm_shared && + hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + ret = -EEXIST; + goto out; + } + page = alloc_huge_page(dst_vma, dst_addr, 0); - if (IS_ERR(page)) + if (IS_ERR(page)) { + ret = -ENOMEM; goto out; + }
ret = copy_huge_page_from_user(page, (const void __user *) src_addr,
From: Piotr Krysiuk piotras@gmail.com
commit 0a13e3537ea67452d549a6a80da3776d6b7dedb3 upstream.
Fix up test_verifier error messages for the case where the original error message changed, or for the case where pointer alu errors differ between privileged and unprivileged tests. Also, add alternative tests for keeping coverage of the original verifier rejection error message (fp alu), and newly reject map_ptr += rX where rX == 0 given we now forbid alu on these types for unprivileged. All test_verifier cases pass after the change. The test case fixups were kept separate to ease backporting of core changes.
Signed-off-by: Piotr Krysiuk piotras@gmail.com Co-developed-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: backport to 4.14, skipping non-existent tests] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_verifier.c | 44 +++++++++++++++++++++------- 1 file changed, 34 insertions(+), 10 deletions(-)
--- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -2235,7 +2235,7 @@ static struct bpf_test tests[] = { .result = ACCEPT, }, { - "unpriv: adding of fp", + "unpriv: adding of fp, reg", .insns = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_MOV64_IMM(BPF_REG_1, 0), @@ -2243,9 +2243,22 @@ static struct bpf_test tests[] = { BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, -8), BPF_EXIT_INSN(), }, - .result = ACCEPT, + .errstr_unpriv = "R1 tried to add from different maps, paths, or prohibited types", .result_unpriv = REJECT, + .result = ACCEPT, + }, + { + "unpriv: adding of fp, imm", + .insns = { + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0), + BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, -8), + BPF_EXIT_INSN(), + }, .errstr_unpriv = "R1 stack pointer arithmetic goes out of range", + .result_unpriv = REJECT, + .result = ACCEPT, }, { "unpriv: cmp of stack pointer", @@ -7766,8 +7779,9 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", .errstr = "R0 tried to subtract pointer from scalar", + .result = REJECT, }, { "check deducing bounds from const, 2", @@ -7780,6 +7794,8 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_0), BPF_EXIT_INSN(), }, + .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", + .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -7790,8 +7806,9 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", .errstr = "R0 tried to subtract pointer from scalar", + .result = REJECT, }, { "check deducing bounds from const, 4", @@ -7804,6 +7821,8 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_0), BPF_EXIT_INSN(), }, + .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", + .result_unpriv = REJECT, .result = ACCEPT, }, { @@ -7814,8 +7833,9 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", .errstr = "R0 tried to subtract pointer from scalar", + .result = REJECT, }, { "check deducing bounds from const, 6", @@ -7826,8 +7846,9 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", .errstr = "R0 tried to subtract pointer from scalar", + .result = REJECT, }, { "check deducing bounds from const, 7", @@ -7839,8 +7860,9 @@ static struct bpf_test tests[] = { offsetof(struct __sk_buff, mark)), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", .errstr = "dereference of modified ctx ptr", + .result = REJECT, }, { "check deducing bounds from const, 8", @@ -7852,8 +7874,9 @@ static struct bpf_test tests[] = { offsetof(struct __sk_buff, mark)), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R1 tried to add from different maps, paths, or prohibited types", .errstr = "dereference of modified ctx ptr", + .result = REJECT, }, { "check deducing bounds from const, 9", @@ -7863,8 +7886,9 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, + .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", .errstr = "R0 tried to subtract pointer from scalar", + .result = REJECT, }, { "check deducing bounds from const, 10", @@ -7876,8 +7900,8 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .result = REJECT, .errstr = "math between ctx pointer and register with unbounded min value is not allowed", + .result = REJECT, }, { "XDP pkt read, pkt_end <= pkt_data', bad access 2",
From: Daniel Borkmann daniel@iogearbox.net
commit 6f55b2f2a1178856c19bbce2f71449926e731914 upstream.
Small refactor to drag off_reg into sanitize_ptr_alu(), so we later on can use off_reg for generalizing some of the checks for all pointer types.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: fix minor contextual conflict for 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2094,11 +2094,12 @@ static int sanitize_val_alu(struct bpf_v static int sanitize_ptr_alu(struct bpf_verifier_env *env, struct bpf_insn *insn, const struct bpf_reg_state *ptr_reg, - struct bpf_reg_state *dst_reg, - bool off_is_neg) + const struct bpf_reg_state *off_reg, + struct bpf_reg_state *dst_reg) { struct bpf_verifier_state *vstate = env->cur_state; struct bpf_insn_aux_data *aux = cur_aux(env); + bool off_is_neg = off_reg->smin_value < 0; bool ptr_is_dst_reg = ptr_reg == dst_reg; u8 opcode = BPF_OP(insn->code); u32 alu_state, alu_limit; @@ -2224,7 +2225,7 @@ static int adjust_ptr_min_max_vals(struc
switch (opcode) { case BPF_ADD: - ret = sanitize_ptr_alu(env, insn, ptr_reg, dst_reg, smin_val < 0); + ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); if (ret < 0) { verbose("R%d tried to add from different maps, paths, or prohibited types\n", dst); return ret; @@ -2279,7 +2280,7 @@ static int adjust_ptr_min_max_vals(struc } break; case BPF_SUB: - ret = sanitize_ptr_alu(env, insn, ptr_reg, dst_reg, smin_val < 0); + ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); if (ret < 0) { verbose("R%d tried to sub from different maps, paths, or prohibited types\n", dst); return ret;
From: Daniel Borkmann daniel@iogearbox.net
commit 24c109bb1537c12c02aeed2d51a347b4d6a9b76e upstream.
The mixed signed bounds check really belongs into retrieve_ptr_limit() instead of outside of it in adjust_ptr_min_max_vals(). The reason is that this check is not tied to PTR_TO_MAP_VALUE only, but to all pointer types that we handle in retrieve_ptr_limit() and given errors from the latter propagate back to adjust_ptr_min_max_vals() and lead to rejection of the program, it's a better place to reside to avoid anything slipping through for future types. The reason why we must reject such off_reg is that we otherwise would not be able to derive a mask, see details in 9d7eceede769 ("bpf: restrict unknown scalars of mixed signed bounds for unprivileged").
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: backport to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2025,12 +2025,18 @@ static struct bpf_insn_aux_data *cur_aux }
static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, - u32 *ptr_limit, u8 opcode, bool off_is_neg) + const struct bpf_reg_state *off_reg, + u32 *ptr_limit, u8 opcode) { + bool off_is_neg = off_reg->smin_value < 0; bool mask_to_left = (opcode == BPF_ADD && off_is_neg) || (opcode == BPF_SUB && !off_is_neg); u32 off, max;
+ if (!tnum_is_const(off_reg->var_off) && + (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) + return -EACCES; + switch (ptr_reg->type) { case PTR_TO_STACK: /* Offset 0 is out-of-bounds, but acceptable start for the @@ -2121,7 +2127,7 @@ static int sanitize_ptr_alu(struct bpf_v alu_state |= ptr_is_dst_reg ? BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST;
- err = retrieve_ptr_limit(ptr_reg, &alu_limit, opcode, off_is_neg); + err = retrieve_ptr_limit(ptr_reg, off_reg, &alu_limit, opcode); if (err < 0) return err;
@@ -2164,8 +2170,8 @@ static int adjust_ptr_min_max_vals(struc smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value; u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value, umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value; - u32 dst = insn->dst_reg, src = insn->src_reg; u8 opcode = BPF_OP(insn->code); + u32 dst = insn->dst_reg; int ret;
dst_reg = ®s[dst]; @@ -2205,13 +2211,6 @@ static int adjust_ptr_min_max_vals(struc dst); return -EACCES; } - if (ptr_reg->type == PTR_TO_MAP_VALUE) { - if (!env->allow_ptr_leaks && !known && (smin_val < 0) != (smax_val < 0)) { - verbose("R%d has unknown scalar with mixed signed bounds, pointer arithmetic with it prohibited for !root\n", - off_reg == dst_reg ? dst : src); - return -EACCES; - } - }
/* In case of 'scalar += pointer', dst_reg inherits pointer type and id. * The id may be overwritten later if we create a new variable offset.
From: Daniel Borkmann daniel@iogearbox.net
commit b658bbb844e28f1862867f37e8ca11a8e2aa94a3 upstream.
Small refactor with no semantic changes in order to consolidate the max ptr_limit boundary check.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2026,12 +2026,12 @@ static struct bpf_insn_aux_data *cur_aux
static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, const struct bpf_reg_state *off_reg, - u32 *ptr_limit, u8 opcode) + u32 *alu_limit, u8 opcode) { bool off_is_neg = off_reg->smin_value < 0; bool mask_to_left = (opcode == BPF_ADD && off_is_neg) || (opcode == BPF_SUB && !off_is_neg); - u32 off, max; + u32 off, max = 0, ptr_limit = 0;
if (!tnum_is_const(off_reg->var_off) && (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) @@ -2045,22 +2045,27 @@ static int retrieve_ptr_limit(const stru max = MAX_BPF_STACK + mask_to_left; off = ptr_reg->off + ptr_reg->var_off.value; if (mask_to_left) - *ptr_limit = MAX_BPF_STACK + off; + ptr_limit = MAX_BPF_STACK + off; else - *ptr_limit = -off - 1; - return *ptr_limit >= max ? -ERANGE : 0; + ptr_limit = -off - 1; + break; case PTR_TO_MAP_VALUE: max = ptr_reg->map_ptr->value_size; if (mask_to_left) { - *ptr_limit = ptr_reg->umax_value + ptr_reg->off; + ptr_limit = ptr_reg->umax_value + ptr_reg->off; } else { off = ptr_reg->smin_value + ptr_reg->off; - *ptr_limit = ptr_reg->map_ptr->value_size - off - 1; + ptr_limit = ptr_reg->map_ptr->value_size - off - 1; } - return *ptr_limit >= max ? -ERANGE : 0; + break; default: return -EINVAL; } + + if (ptr_limit >= max) + return -ERANGE; + *alu_limit = ptr_limit; + return 0; }
static bool can_skip_alu_sanitation(const struct bpf_verifier_env *env,
From: Daniel Borkmann daniel@iogearbox.net
commit a6aaece00a57fa6f22575364b3903dfbccf5345d upstream.
Consolidate all error handling and provide more user-friendly error messages from sanitize_ptr_alu() and sanitize_val_alu().
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: backport to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 62 insertions(+), 22 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2024,6 +2024,14 @@ static struct bpf_insn_aux_data *cur_aux return &env->insn_aux_data[env->insn_idx]; }
+enum { + REASON_BOUNDS = -1, + REASON_TYPE = -2, + REASON_PATHS = -3, + REASON_LIMIT = -4, + REASON_STACK = -5, +}; + static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, const struct bpf_reg_state *off_reg, u32 *alu_limit, u8 opcode) @@ -2035,7 +2043,7 @@ static int retrieve_ptr_limit(const stru
if (!tnum_is_const(off_reg->var_off) && (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) - return -EACCES; + return REASON_BOUNDS;
switch (ptr_reg->type) { case PTR_TO_STACK: @@ -2059,11 +2067,11 @@ static int retrieve_ptr_limit(const stru } break; default: - return -EINVAL; + return REASON_TYPE; }
if (ptr_limit >= max) - return -ERANGE; + return REASON_LIMIT; *alu_limit = ptr_limit; return 0; } @@ -2083,7 +2091,7 @@ static int update_alu_sanitation_state(s if (aux->alu_state && (aux->alu_state != alu_state || aux->alu_limit != alu_limit)) - return -EACCES; + return REASON_PATHS;
/* Corresponding fixup done in fixup_bpf_calls(). */ aux->alu_state = alu_state; @@ -2156,7 +2164,46 @@ do_sim: ret = push_stack(env, env->insn_idx + 1, env->insn_idx, true); if (!ptr_is_dst_reg && ret) *dst_reg = tmp; - return !ret ? -EFAULT : 0; + return !ret ? REASON_STACK : 0; +} + +static int sanitize_err(struct bpf_verifier_env *env, + const struct bpf_insn *insn, int reason, + const struct bpf_reg_state *off_reg, + const struct bpf_reg_state *dst_reg) +{ + static const char *err = "pointer arithmetic with it prohibited for !root"; + const char *op = BPF_OP(insn->code) == BPF_ADD ? "add" : "sub"; + u32 dst = insn->dst_reg, src = insn->src_reg; + + switch (reason) { + case REASON_BOUNDS: + verbose("R%d has unknown scalar with mixed signed bounds, %s\n", + off_reg == dst_reg ? dst : src, err); + break; + case REASON_TYPE: + verbose("R%d has pointer with unsupported alu operation, %s\n", + off_reg == dst_reg ? src : dst, err); + break; + case REASON_PATHS: + verbose("R%d tried to %s from different maps, paths or scalars, %s\n", + dst, op, err); + break; + case REASON_LIMIT: + verbose("R%d tried to %s beyond pointer bounds, %s\n", + dst, op, err); + break; + case REASON_STACK: + verbose("R%d could not be pushed for speculative verification, %s\n", + dst, err); + break; + default: + verbose("verifier internal error: unknown reason (%d)\n", + reason); + break; + } + + return -EACCES; }
/* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off. @@ -2230,10 +2277,9 @@ static int adjust_ptr_min_max_vals(struc switch (opcode) { case BPF_ADD: ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); - if (ret < 0) { - verbose("R%d tried to add from different maps, paths, or prohibited types\n", dst); - return ret; - } + if (ret < 0) + return sanitize_err(env, insn, ret, off_reg, dst_reg); + /* We can take a fixed offset as long as it doesn't overflow * the s32 'off' field */ @@ -2285,10 +2331,9 @@ static int adjust_ptr_min_max_vals(struc break; case BPF_SUB: ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); - if (ret < 0) { - verbose("R%d tried to sub from different maps, paths, or prohibited types\n", dst); - return ret; - } + if (ret < 0) + return sanitize_err(env, insn, ret, off_reg, dst_reg); + if (dst_reg == off_reg) { /* scalar -= pointer. Creates an unknown scalar */ if (!env->allow_ptr_leaks) @@ -2412,7 +2457,6 @@ static int adjust_scalar_min_max_vals(st s64 smin_val, smax_val; u64 umin_val, umax_val; u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32; - u32 dst = insn->dst_reg; int ret;
if (insn_bitness == 32) { @@ -2449,10 +2493,8 @@ static int adjust_scalar_min_max_vals(st switch (opcode) { case BPF_ADD: ret = sanitize_val_alu(env, insn); - if (ret < 0) { - verbose("R%d tried to add from different pointers or scalars\n", dst); - return ret; - } + if (ret < 0) + return sanitize_err(env, insn, ret, NULL, NULL); if (signed_add_overflows(dst_reg->smin_value, smin_val) || signed_add_overflows(dst_reg->smax_value, smax_val)) { dst_reg->smin_value = S64_MIN; @@ -2473,10 +2515,8 @@ static int adjust_scalar_min_max_vals(st break; case BPF_SUB: ret = sanitize_val_alu(env, insn); - if (ret < 0) { - verbose("R%d tried to sub from different pointers or scalars\n", dst); - return ret; - } + if (ret < 0) + return sanitize_err(env, insn, ret, NULL, NULL); if (signed_sub_overflows(dst_reg->smin_value, smax_val) || signed_sub_overflows(dst_reg->smax_value, smin_val)) { /* Overflow possible, we know nothing */
From: Daniel Borkmann daniel@iogearbox.net
commit 073815b756c51ba9d8384d924c5d1c03ca3d1ae4 upstream.
Move the bounds check in adjust_ptr_min_max_vals() into a small helper named sanitize_check_bounds() in order to simplify the former a bit.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: backport to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 54 ++++++++++++++++++++++++++++++++++---------------- 1 file changed, 37 insertions(+), 17 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2206,6 +2206,41 @@ static int sanitize_err(struct bpf_verif return -EACCES; }
+static int sanitize_check_bounds(struct bpf_verifier_env *env, + const struct bpf_insn *insn, + const struct bpf_reg_state *dst_reg) +{ + u32 dst = insn->dst_reg; + + /* For unprivileged we require that resulting offset must be in bounds + * in order to be able to sanitize access later on. + */ + if (env->allow_ptr_leaks) + return 0; + + switch (dst_reg->type) { + case PTR_TO_STACK: + if (check_stack_access(env, dst_reg, dst_reg->off + + dst_reg->var_off.value, 1)) { + verbose("R%d stack pointer arithmetic goes out of range, " + "prohibited for !root\n", dst); + return -EACCES; + } + break; + case PTR_TO_MAP_VALUE: + if (check_map_access(env, dst, dst_reg->off, 1)) { + verbose("R%d pointer arithmetic of map value goes out of range, " + "prohibited for !root\n", dst); + return -EACCES; + } + break; + default: + break; + } + + return 0; +} + /* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off. * Caller should also handle BPF_MOV case separately. * If we return -EACCES, caller may want to try again treating pointer as a @@ -2421,23 +2456,8 @@ static int adjust_ptr_min_max_vals(struc __reg_deduce_bounds(dst_reg); __reg_bound_offset(dst_reg);
- /* For unprivileged we require that resulting offset must be in bounds - * in order to be able to sanitize access later on. - */ - if (!env->allow_ptr_leaks) { - if (dst_reg->type == PTR_TO_MAP_VALUE && - check_map_access(env, dst, dst_reg->off, 1)) { - verbose("R%d pointer arithmetic of map value goes out of range, " - "prohibited for !root\n", dst); - return -EACCES; - } else if (dst_reg->type == PTR_TO_STACK && - check_stack_access(env, dst_reg, dst_reg->off + - dst_reg->var_off.value, 1)) { - verbose("R%d stack pointer arithmetic goes out of range, " - "prohibited for !root\n", dst); - return -EACCES; - } - } + if (sanitize_check_bounds(env, insn, dst_reg) < 0) + return -EACCES;
return 0; }
From: Daniel Borkmann daniel@iogearbox.net
commit f528819334881fd622fdadeddb3f7edaed8b7c9b upstream.
Add a small sanitize_needed() helper function and move sanitize_val_alu() out of the main opcode switch. In upcoming work, we'll move sanitize_ptr_alu() as well out of its opcode switch so this helps to streamline both.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: backported to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2110,6 +2110,11 @@ static int sanitize_val_alu(struct bpf_v return update_alu_sanitation_state(aux, BPF_ALU_NON_POINTER, 0); }
+static bool sanitize_needed(u8 opcode) +{ + return opcode == BPF_ADD || opcode == BPF_SUB; +} + static int sanitize_ptr_alu(struct bpf_verifier_env *env, struct bpf_insn *insn, const struct bpf_reg_state *ptr_reg, @@ -2510,11 +2515,14 @@ static int adjust_scalar_min_max_vals(st return 0; }
- switch (opcode) { - case BPF_ADD: + if (sanitize_needed(opcode)) { ret = sanitize_val_alu(env, insn); if (ret < 0) return sanitize_err(env, insn, ret, NULL, NULL); + } + + switch (opcode) { + case BPF_ADD: if (signed_add_overflows(dst_reg->smin_value, smin_val) || signed_add_overflows(dst_reg->smax_value, smax_val)) { dst_reg->smin_value = S64_MIN; @@ -2534,9 +2542,6 @@ static int adjust_scalar_min_max_vals(st dst_reg->var_off = tnum_add(dst_reg->var_off, src_reg.var_off); break; case BPF_SUB: - ret = sanitize_val_alu(env, insn); - if (ret < 0) - return sanitize_err(env, insn, ret, NULL, NULL); if (signed_sub_overflows(dst_reg->smin_value, smax_val) || signed_sub_overflows(dst_reg->smax_value, smin_val)) { /* Overflow possible, we know nothing */
From: Daniel Borkmann daniel@iogearbox.net
commit 7fedb63a8307dda0ec3b8969a3b233a1dd7ea8e0 upstream.
This work tightens the offset mask we use for unprivileged pointer arithmetic in order to mitigate a corner case reported by Piotr and Benedict where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of-bounds in order to leak kernel memory via side-channel to user space.
Before this change, the computed ptr_limit for retrieve_ptr_limit() helper represents largest valid distance when moving pointer to the right or left which is then fed as aux->alu_limit to generate masking instructions against the offset register. After the change, the derived aux->alu_limit represents the largest potential value of the offset register which we mask against which is just a narrower subset of the former limit.
For minimal complexity, we call sanitize_ptr_alu() from 2 observation points in adjust_ptr_min_max_vals(), that is, before and after the simulated alu operation. In the first step, we retieve the alu_state and alu_limit before the operation as well as we branch-off a verifier path and push it to the verification stack as we did before which checks the dst_reg under truncation, in other words, when the speculative domain would attempt to move the pointer out-of-bounds.
In the second step, we retrieve the new alu_limit and calculate the absolute distance between both. Moreover, we commit the alu_state and final alu_limit via update_alu_sanitation_state() to the env's instruction aux data, and bail out from there if there is a mismatch due to coming from different verification paths with different states.
Reported-by: Piotr Krysiuk piotras@gmail.com Reported-by: Benedict Schlueter benedict.schlueter@rub.de Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Tested-by: Benedict Schlueter benedict.schlueter@rub.de [fllinden@amazon.com: backported to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 26 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2039,7 +2039,7 @@ static int retrieve_ptr_limit(const stru bool off_is_neg = off_reg->smin_value < 0; bool mask_to_left = (opcode == BPF_ADD && off_is_neg) || (opcode == BPF_SUB && !off_is_neg); - u32 off, max = 0, ptr_limit = 0; + u32 max = 0, ptr_limit = 0;
if (!tnum_is_const(off_reg->var_off) && (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) @@ -2048,23 +2048,18 @@ static int retrieve_ptr_limit(const stru switch (ptr_reg->type) { case PTR_TO_STACK: /* Offset 0 is out-of-bounds, but acceptable start for the - * left direction, see BPF_REG_FP. + * left direction, see BPF_REG_FP. Also, unknown scalar + * offset where we would need to deal with min/max bounds is + * currently prohibited for unprivileged. */ max = MAX_BPF_STACK + mask_to_left; - off = ptr_reg->off + ptr_reg->var_off.value; - if (mask_to_left) - ptr_limit = MAX_BPF_STACK + off; - else - ptr_limit = -off - 1; + ptr_limit = -(ptr_reg->var_off.value + ptr_reg->off); break; case PTR_TO_MAP_VALUE: max = ptr_reg->map_ptr->value_size; - if (mask_to_left) { - ptr_limit = ptr_reg->umax_value + ptr_reg->off; - } else { - off = ptr_reg->smin_value + ptr_reg->off; - ptr_limit = ptr_reg->map_ptr->value_size - off - 1; - } + ptr_limit = (mask_to_left ? + ptr_reg->smin_value : + ptr_reg->umax_value) + ptr_reg->off; break; default: return REASON_TYPE; @@ -2119,10 +2114,12 @@ static int sanitize_ptr_alu(struct bpf_v struct bpf_insn *insn, const struct bpf_reg_state *ptr_reg, const struct bpf_reg_state *off_reg, - struct bpf_reg_state *dst_reg) + struct bpf_reg_state *dst_reg, + struct bpf_insn_aux_data *tmp_aux, + const bool commit_window) { + struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : tmp_aux; struct bpf_verifier_state *vstate = env->cur_state; - struct bpf_insn_aux_data *aux = cur_aux(env); bool off_is_neg = off_reg->smin_value < 0; bool ptr_is_dst_reg = ptr_reg == dst_reg; u8 opcode = BPF_OP(insn->code); @@ -2141,18 +2138,33 @@ static int sanitize_ptr_alu(struct bpf_v if (vstate->speculative) goto do_sim;
- alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0; - alu_state |= ptr_is_dst_reg ? - BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST; - err = retrieve_ptr_limit(ptr_reg, off_reg, &alu_limit, opcode); if (err < 0) return err;
+ if (commit_window) { + /* In commit phase we narrow the masking window based on + * the observed pointer move after the simulated operation. + */ + alu_state = tmp_aux->alu_state; + alu_limit = abs(tmp_aux->alu_limit - alu_limit); + } else { + alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0; + alu_state |= ptr_is_dst_reg ? + BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST; + } + err = update_alu_sanitation_state(aux, alu_state, alu_limit); if (err < 0) return err; do_sim: + /* If we're in commit phase, we're done here given we already + * pushed the truncated dst_reg into the speculative verification + * stack. + */ + if (commit_window) + return 0; + /* Simulate and find potential out-of-bounds access under * speculative execution from truncation as a result of * masking when off was not within expected range. If off @@ -2262,6 +2274,7 @@ static int adjust_ptr_min_max_vals(struc smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value; u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value, umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value; + struct bpf_insn_aux_data tmp_aux = {}; u8 opcode = BPF_OP(insn->code); u32 dst = insn->dst_reg; int ret; @@ -2314,12 +2327,15 @@ static int adjust_ptr_min_max_vals(struc !check_reg_sane_offset(env, ptr_reg, ptr_reg->type)) return -EINVAL;
- switch (opcode) { - case BPF_ADD: - ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); + if (sanitize_needed(opcode)) { + ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg, + &tmp_aux, false); if (ret < 0) return sanitize_err(env, insn, ret, off_reg, dst_reg); + }
+ switch (opcode) { + case BPF_ADD: /* We can take a fixed offset as long as it doesn't overflow * the s32 'off' field */ @@ -2370,10 +2386,6 @@ static int adjust_ptr_min_max_vals(struc } break; case BPF_SUB: - ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg); - if (ret < 0) - return sanitize_err(env, insn, ret, off_reg, dst_reg); - if (dst_reg == off_reg) { /* scalar -= pointer. Creates an unknown scalar */ if (!env->allow_ptr_leaks) @@ -2463,6 +2475,12 @@ static int adjust_ptr_min_max_vals(struc
if (sanitize_check_bounds(env, insn, dst_reg) < 0) return -EACCES; + if (sanitize_needed(opcode)) { + ret = sanitize_ptr_alu(env, insn, dst_reg, off_reg, dst_reg, + &tmp_aux, true); + if (ret < 0) + return sanitize_err(env, insn, ret, off_reg, dst_reg); + }
return 0; }
From: Daniel Borkmann daniel@iogearbox.net
commit d7a5091351756d0ae8e63134313c455624e36a13 upstream.
Update various selftest error messages:
* The 'Rx tried to sub from different maps, paths, or prohibited types' is reworked into more specific/differentiated error messages for better guidance.
* The change into 'value -4294967168 makes map_value pointer be out of bounds' is due to moving the mixed bounds check into the speculation handling and thus occuring slightly later than above mentioned sanity check.
* The change into 'math between map_value pointer and register with unbounded min value' is similarly due to register sanity check coming before the mixed bounds check.
* The case of 'map access: known scalar += value_ptr from different maps' now loads fine given masks are the same from the different paths (despite max map value size being different).
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com - 4.14 backport, account for split test_verifier and different / missing tests] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_verifier.c | 34 +++++++++------------------- 1 file changed, 12 insertions(+), 22 deletions(-)
--- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -2243,7 +2243,7 @@ static struct bpf_test tests[] = { BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, -8), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 tried to add from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 stack pointer arithmetic goes out of range", .result_unpriv = REJECT, .result = ACCEPT, }, @@ -6220,7 +6220,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6245,7 +6244,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6272,7 +6270,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R8 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6298,7 +6295,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R8 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6347,7 +6343,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6419,7 +6414,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6471,7 +6465,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6499,7 +6492,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6526,7 +6518,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6556,7 +6547,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R7 has unknown scalar with mixed signed bounds", .result = REJECT, }, { @@ -6615,7 +6605,6 @@ static struct bpf_test tests[] = { }, .fixup_map1 = { 3 }, .errstr = "unbounded min value", - .errstr_unpriv = "R1 has unknown scalar with mixed signed bounds", .result = REJECT, .result_unpriv = REJECT, }, @@ -7779,7 +7768,7 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "R0 tried to subtract pointer from scalar", .result = REJECT, }, @@ -7794,7 +7783,7 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .result_unpriv = REJECT, .result = ACCEPT, }, @@ -7806,22 +7795,23 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "R0 tried to subtract pointer from scalar", .result = REJECT, }, { "check deducing bounds from const, 4", .insns = { + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_JMP_IMM(BPF_JSLE, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), - BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_0), + BPF_ALU64_REG(BPF_SUB, BPF_REG_6, BPF_REG_0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R6 has pointer with unsupported alu operation", .result_unpriv = REJECT, .result = ACCEPT, }, @@ -7833,7 +7823,7 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "R0 tried to subtract pointer from scalar", .result = REJECT, }, @@ -7846,7 +7836,7 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "R0 tried to subtract pointer from scalar", .result = REJECT, }, @@ -7860,7 +7850,7 @@ static struct bpf_test tests[] = { offsetof(struct __sk_buff, mark)), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "dereference of modified ctx ptr", .result = REJECT, }, @@ -7874,7 +7864,7 @@ static struct bpf_test tests[] = { offsetof(struct __sk_buff, mark)), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 tried to add from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "dereference of modified ctx ptr", .result = REJECT, }, @@ -7886,7 +7876,7 @@ static struct bpf_test tests[] = { BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_1), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R0 tried to sub from different maps, paths, or prohibited types", + .errstr_unpriv = "R1 has pointer with unsupported alu operation", .errstr = "R0 tried to subtract pointer from scalar", .result = REJECT, },
From: Alexei Starovoitov ast@kernel.org
commit 82abbf8d2fc46d79611ab58daa7c608df14bb3ee upstream.
Do not allow root to convert valid pointers into unknown scalars. In particular disallow: ptr &= reg ptr <<= reg ptr += ptr and explicitly allow: ptr -= ptr since pkt_end - pkt == length
1. This minimizes amount of address leaks root can do. In the future may need to further tighten the leaks with kptr_restrict.
2. If program has such pointer math it's likely a user mistake and when verifier complains about it right away instead of many instructions later on invalid memory access it's easier for users to fix their progs.
3. when register holding a pointer cannot change to scalar it allows JITs to optimize better. Like 32-bit archs could use single register for pointers instead of a pair required to hold 64-bit scalars.
4. reduces architecture dependent behavior. Since code: r1 = r10; r1 &= 0xff; if (r1 ...) will behave differently arm64 vs x64 and offloaded vs native.
A significant chunk of ptr mangling was allowed by commit f1174f77b50c ("bpf/verifier: rework value tracking") yet some of it was allowed even earlier.
Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net [fllinden@amazon.com: backport to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 100 +++++++++------------------- tools/testing/selftests/bpf/test_verifier.c | 56 ++++++++------- 2 files changed, 62 insertions(+), 94 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2292,28 +2292,24 @@ static int adjust_ptr_min_max_vals(struc
if (BPF_CLASS(insn->code) != BPF_ALU64) { /* 32-bit ALU ops on pointers produce (meaningless) scalars */ - if (!env->allow_ptr_leaks) - verbose("R%d 32-bit pointer arithmetic prohibited\n", - dst); + verbose("R%d 32-bit pointer arithmetic prohibited\n", + dst); return -EACCES; }
if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) { - if (!env->allow_ptr_leaks) - verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n", - dst); + verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n", + dst); return -EACCES; } if (ptr_reg->type == CONST_PTR_TO_MAP) { - if (!env->allow_ptr_leaks) - verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n", - dst); + verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n", + dst); return -EACCES; } if (ptr_reg->type == PTR_TO_PACKET_END) { - if (!env->allow_ptr_leaks) - verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n", - dst); + verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n", + dst); return -EACCES; }
@@ -2388,9 +2384,8 @@ static int adjust_ptr_min_max_vals(struc case BPF_SUB: if (dst_reg == off_reg) { /* scalar -= pointer. Creates an unknown scalar */ - if (!env->allow_ptr_leaks) - verbose("R%d tried to subtract pointer from scalar\n", - dst); + verbose("R%d tried to subtract pointer from scalar\n", + dst); return -EACCES; } /* We don't allow subtraction from FP, because (according to @@ -2398,9 +2393,8 @@ static int adjust_ptr_min_max_vals(struc * be able to deal with it. */ if (ptr_reg->type == PTR_TO_STACK) { - if (!env->allow_ptr_leaks) - verbose("R%d subtraction from stack pointer prohibited\n", - dst); + verbose("R%d subtraction from stack pointer prohibited\n", + dst); return -EACCES; } if (known && (ptr_reg->off - smin_val == @@ -2450,19 +2444,14 @@ static int adjust_ptr_min_max_vals(struc case BPF_AND: case BPF_OR: case BPF_XOR: - /* bitwise ops on pointers are troublesome, prohibit for now. - * (However, in principle we could allow some cases, e.g. - * ptr &= ~3 which would reduce min_value by 3.) - */ - if (!env->allow_ptr_leaks) - verbose("R%d bitwise operator %s on pointer prohibited\n", - dst, bpf_alu_string[opcode >> 4]); + /* bitwise ops on pointers are troublesome. */ + verbose("R%d bitwise operator %s on pointer prohibited\n", + dst, bpf_alu_string[opcode >> 4]); return -EACCES; default: /* other operators (e.g. MUL,LSH) produce non-pointer results */ - if (!env->allow_ptr_leaks) - verbose("R%d pointer arithmetic with %s operator prohibited\n", - dst, bpf_alu_string[opcode >> 4]); + verbose("R%d pointer arithmetic with %s operator prohibited\n", + dst, bpf_alu_string[opcode >> 4]); return -EACCES; }
@@ -2752,7 +2741,6 @@ static int adjust_reg_min_max_vals(struc struct bpf_reg_state *regs = cur_regs(env), *dst_reg, *src_reg; struct bpf_reg_state *ptr_reg = NULL, off_reg = {0}; u8 opcode = BPF_OP(insn->code); - int rc;
dst_reg = ®s[insn->dst_reg]; src_reg = NULL; @@ -2763,43 +2751,29 @@ static int adjust_reg_min_max_vals(struc if (src_reg->type != SCALAR_VALUE) { if (dst_reg->type != SCALAR_VALUE) { /* Combining two pointers by any ALU op yields - * an arbitrary scalar. + * an arbitrary scalar. Disallow all math except + * pointer subtraction */ - if (!env->allow_ptr_leaks) { - verbose("R%d pointer %s pointer prohibited\n", - insn->dst_reg, - bpf_alu_string[opcode >> 4]); - return -EACCES; + if (opcode == BPF_SUB){ + mark_reg_unknown(regs, insn->dst_reg); + return 0; } - mark_reg_unknown(regs, insn->dst_reg); - return 0; + verbose("R%d pointer %s pointer prohibited\n", + insn->dst_reg, + bpf_alu_string[opcode >> 4]); + return -EACCES; } else { /* scalar += pointer * This is legal, but we have to reverse our * src/dest handling in computing the range */ - rc = adjust_ptr_min_max_vals(env, insn, - src_reg, dst_reg); - if (rc == -EACCES && env->allow_ptr_leaks) { - /* scalar += unknown scalar */ - __mark_reg_unknown(&off_reg); - return adjust_scalar_min_max_vals( - env, insn, - dst_reg, off_reg); - } - return rc; + return adjust_ptr_min_max_vals(env, insn, + src_reg, dst_reg); } } else if (ptr_reg) { /* pointer += scalar */ - rc = adjust_ptr_min_max_vals(env, insn, - dst_reg, src_reg); - if (rc == -EACCES && env->allow_ptr_leaks) { - /* unknown scalar += scalar */ - __mark_reg_unknown(dst_reg); - return adjust_scalar_min_max_vals( - env, insn, dst_reg, *src_reg); - } - return rc; + return adjust_ptr_min_max_vals(env, insn, + dst_reg, src_reg); } } else { /* Pretend the src is a reg with a known value, since we only @@ -2808,17 +2782,9 @@ static int adjust_reg_min_max_vals(struc off_reg.type = SCALAR_VALUE; __mark_reg_known(&off_reg, insn->imm); src_reg = &off_reg; - if (ptr_reg) { /* pointer += K */ - rc = adjust_ptr_min_max_vals(env, insn, - ptr_reg, src_reg); - if (rc == -EACCES && env->allow_ptr_leaks) { - /* unknown scalar += K */ - __mark_reg_unknown(dst_reg); - return adjust_scalar_min_max_vals( - env, insn, dst_reg, off_reg); - } - return rc; - } + if (ptr_reg) /* pointer += K */ + return adjust_ptr_min_max_vals(env, insn, + ptr_reg, src_reg); }
/* Got here implies adding two SCALAR_VALUEs */ --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -462,9 +462,7 @@ static struct bpf_test tests[] = { BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr_unpriv = "R1 subtraction from stack pointer", - .result_unpriv = REJECT, - .errstr = "R1 invalid mem access", + .errstr = "R1 subtraction from stack pointer", .result = REJECT, }, { @@ -1900,9 +1898,8 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .result = ACCEPT, - .result_unpriv = REJECT, - .errstr_unpriv = "R1 pointer += pointer", + .result = REJECT, + .errstr = "R1 pointer += pointer", }, { "unpriv: neg pointer", @@ -2694,7 +2691,8 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct __sk_buff, data)), BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_4), - BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, len)), BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 49), BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 49), BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_2), @@ -3001,7 +2999,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "invalid access to packet", + .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, @@ -3988,9 +3986,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map2 = { 3, 11 }, - .errstr_unpriv = "R0 pointer += pointer", - .errstr = "R0 invalid mem access 'inv'", - .result_unpriv = REJECT, + .errstr = "R0 pointer += pointer", .result = REJECT, .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS, }, @@ -4031,7 +4027,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 invalid mem access", + .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -4052,7 +4048,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 invalid mem access", + .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -4073,7 +4069,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 4 }, - .errstr = "R4 invalid mem access", + .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", .result = REJECT, .prog_type = BPF_PROG_TYPE_SCHED_CLS }, @@ -5304,10 +5300,8 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map2 = { 3 }, - .errstr_unpriv = "R0 bitwise operator &= on pointer", - .errstr = "invalid mem access 'inv'", + .errstr = "R0 bitwise operator &= on pointer", .result = REJECT, - .result_unpriv = REJECT, }, { "map element value illegal alu op, 2", @@ -5323,10 +5317,8 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map2 = { 3 }, - .errstr_unpriv = "R0 32-bit pointer arithmetic prohibited", - .errstr = "invalid mem access 'inv'", + .errstr = "R0 32-bit pointer arithmetic prohibited", .result = REJECT, - .result_unpriv = REJECT, }, { "map element value illegal alu op, 3", @@ -5342,10 +5334,8 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map2 = { 3 }, - .errstr_unpriv = "R0 pointer arithmetic with /= operator", - .errstr = "invalid mem access 'inv'", + .errstr = "R0 pointer arithmetic with /= operator", .result = REJECT, - .result_unpriv = REJECT, }, { "map element value illegal alu op, 4", @@ -5938,8 +5928,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map_in_map = { 3 }, - .errstr = "R1 type=inv expected=map_ptr", - .errstr_unpriv = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited", + .errstr = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited", .result = REJECT, }, { @@ -7300,6 +7289,19 @@ static struct bpf_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, }, { + "pkt_end - pkt_start is allowed", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_ALU64_REG(BPF_SUB, BPF_REG_0, BPF_REG_2), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { "XDP pkt read, pkt_end mangling, bad access 1", .insns = { BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, @@ -7314,7 +7316,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "R1 offset is outside of the packet", + .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", .result = REJECT, .prog_type = BPF_PROG_TYPE_XDP, }, @@ -7333,7 +7335,7 @@ static struct bpf_test tests[] = { BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "R1 offset is outside of the packet", + .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", .result = REJECT, .prog_type = BPF_PROG_TYPE_XDP, },
From: Alexei Starovoitov ast@kernel.org
commit dd066823db2ac4e22f721ec85190817b58059a54 upstream.
Subtraction of pointers was accidentally allowed for unpriv programs by commit 82abbf8d2fc4. Revert that part of commit.
Fixes: 82abbf8d2fc4 ("bpf: do not allow root to mangle valid pointers") Reported-by: Jann Horn jannh@google.com Acked-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net [fllinden@amazon.com: backport to 4.14] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2754,7 +2754,7 @@ static int adjust_reg_min_max_vals(struc * an arbitrary scalar. Disallow all math except * pointer subtraction */ - if (opcode == BPF_SUB){ + if (opcode == BPF_SUB && env->allow_ptr_leaks) { mark_reg_unknown(regs, insn->dst_reg); return 0; }
From: Alexei Starovoitov ast@fb.com
commit 2b36047e7889b7efee22c11e17f035f721855731 upstream.
since commit 82abbf8d2fc4 the verifier rejects the bit-wise arithmetic on pointers earlier. The test 'dubious pointer arithmetic' now has less output to match on. Adjust it.
Fixes: 82abbf8d2fc4 ("bpf: do not allow root to mangle valid pointers") Reported-by: kernel test robot xiaolong.ye@intel.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_align.c | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-)
--- a/tools/testing/selftests/bpf/test_align.c +++ b/tools/testing/selftests/bpf/test_align.c @@ -474,27 +474,7 @@ static struct bpf_align_test tests[] = { .result = REJECT, .matches = { {4, "R5=pkt(id=0,off=0,r=0,imm=0)"}, - /* ptr & 0x40 == either 0 or 0x40 */ - {5, "R5=inv(id=0,umax_value=64,var_off=(0x0; 0x40))"}, - /* ptr << 2 == unknown, (4n) */ - {7, "R5=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"}, - /* (4n) + 14 == (4n+2). We blow our bounds, because - * the add could overflow. - */ - {8, "R5=inv(id=0,var_off=(0x2; 0xfffffffffffffffc))"}, - /* Checked s>=0 */ - {10, "R5=inv(id=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, - /* packet pointer + nonnegative (4n+2) */ - {12, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, - {14, "R4=pkt(id=1,off=4,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, - /* NET_IP_ALIGN + (4n+2) == (4n), alignment is fine. - * We checked the bounds, but it might have been able - * to overflow if the packet pointer started in the - * upper half of the address space. - * So we did not get a 'range' on R6, and the access - * attempt will fail. - */ - {16, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, + /* R5 bitwise operator &= on pointer prohibited */ } }, {
From: Alexei Starovoitov ast@kernel.org
commit 31e95b61e172144bb2b626a291db1bdc0769275b upstream.
mostly revert the previous workaround and make 'dubious pointer arithmetic' test useful again. Use (ptr - ptr) << const instead of ptr << const to generate large scalar. The rest stays as before commit 2b36047e7889.
Fixes: 2b36047e7889 ("selftests/bpf: fix test_align") Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net [fllinden@amazon.com: adjust for 4.14 (no liveness of regs in output)] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_align.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-)
--- a/tools/testing/selftests/bpf/test_align.c +++ b/tools/testing/selftests/bpf/test_align.c @@ -446,11 +446,9 @@ static struct bpf_align_test tests[] = { .insns = { PREP_PKT_POINTERS, BPF_MOV64_IMM(BPF_REG_0, 0), - /* ptr & const => unknown & const */ - BPF_MOV64_REG(BPF_REG_5, BPF_REG_2), - BPF_ALU64_IMM(BPF_AND, BPF_REG_5, 0x40), - /* ptr << const => unknown << const */ - BPF_MOV64_REG(BPF_REG_5, BPF_REG_2), + /* (ptr - ptr) << 2 */ + BPF_MOV64_REG(BPF_REG_5, BPF_REG_3), + BPF_ALU64_REG(BPF_SUB, BPF_REG_5, BPF_REG_2), BPF_ALU64_IMM(BPF_LSH, BPF_REG_5, 2), /* We have a (4n) value. Let's make a packet offset * out of it. First add 14, to make it a (4n+2) @@ -473,8 +471,26 @@ static struct bpf_align_test tests[] = { .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = REJECT, .matches = { - {4, "R5=pkt(id=0,off=0,r=0,imm=0)"}, - /* R5 bitwise operator &= on pointer prohibited */ + {4, "R5=pkt_end(id=0,off=0,imm=0)"}, + /* (ptr - ptr) << 2 == unknown, (4n) */ + {6, "R5=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"}, + /* (4n) + 14 == (4n+2). We blow our bounds, because + * the add could overflow. + */ + {7, "R5=inv(id=0,var_off=(0x2; 0xfffffffffffffffc))"}, + /* Checked s>=0 */ + {9, "R5=inv(id=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, + /* packet pointer + nonnegative (4n+2) */ + {11, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, + {13, "R4=pkt(id=1,off=4,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, + /* NET_IP_ALIGN + (4n+2) == (4n), alignment is fine. + * We checked the bounds, but it might have been able + * to overflow if the packet pointer started in the + * upper half of the address space. + * So we did not get a 'range' on R6, and the access + * attempt will fail. + */ + {15, "R6=pkt(id=1,off=0,r=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, } }, {
From: Daniel Borkmann daniel@iogearbox.net
commit 801c6058d14a82179a7ee17a4b532cac6fad067f upstream.
The current implemented mechanisms to mitigate data disclosure under speculation mainly address stack and map value oob access from the speculative domain. However, Piotr discovered that uninitialized BPF stack is not protected yet, and thus old data from the kernel stack, potentially including addresses of kernel structures, could still be extracted from that 512 bytes large window. The BPF stack is special compared to map values since it's not zero initialized for every program invocation, whereas map values /are/ zero initialized upon their initial allocation and thus cannot leak any prior data in either domain. In the non-speculative domain, the verifier ensures that every stack slot read must have a prior stack slot write by the BPF program to avoid such data leaking issue.
However, this is not enough: for example, when the pointer arithmetic operation moves the stack pointer from the last valid stack offset to the first valid offset, the sanitation logic allows for any intermediate offsets during speculative execution, which could then be used to extract any restricted stack content via side-channel.
Given for unprivileged stack pointer arithmetic the use of unknown but bounded scalars is generally forbidden, we can simply turn the register-based arithmetic operation into an immediate-based arithmetic operation without the need for masking. This also gives the benefit of reducing the needed instructions for the operation. Given after the work in 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask"), the aux->alu_limit already holds the final immediate value for the offset register with the known scalar. Thus, a simple mov of the immediate to AX register with using AX as the source for the original instruction is sufficient and possible now in this case.
Reported-by: Piotr Krysiuk piotras@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Tested-by: Piotr Krysiuk piotras@gmail.com Reviewed-by: Piotr Krysiuk piotras@gmail.com Reviewed-by: John Fastabend john.fastabend@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org [fllinden@amazon.com: fixed minor 4.14 conflict because of renamed function] Signed-off-by: Frank van der Linden fllinden@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_verifier.h | 5 +++-- kernel/bpf/verifier.c | 27 +++++++++++++++++---------- 2 files changed, 20 insertions(+), 12 deletions(-)
--- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -114,10 +114,11 @@ struct bpf_verifier_state_list { };
/* Possible states for alu_state member. */ -#define BPF_ALU_SANITIZE_SRC 1U -#define BPF_ALU_SANITIZE_DST 2U +#define BPF_ALU_SANITIZE_SRC (1U << 0) +#define BPF_ALU_SANITIZE_DST (1U << 1) #define BPF_ALU_NEG_VALUE (1U << 2) #define BPF_ALU_NON_POINTER (1U << 3) +#define BPF_ALU_IMMEDIATE (1U << 4) #define BPF_ALU_SANITIZE (BPF_ALU_SANITIZE_SRC | \ BPF_ALU_SANITIZE_DST)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2120,6 +2120,7 @@ static int sanitize_ptr_alu(struct bpf_v { struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : tmp_aux; struct bpf_verifier_state *vstate = env->cur_state; + bool off_is_imm = tnum_is_const(off_reg->var_off); bool off_is_neg = off_reg->smin_value < 0; bool ptr_is_dst_reg = ptr_reg == dst_reg; u8 opcode = BPF_OP(insn->code); @@ -2150,6 +2151,7 @@ static int sanitize_ptr_alu(struct bpf_v alu_limit = abs(tmp_aux->alu_limit - alu_limit); } else { alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0; + alu_state |= off_is_imm ? BPF_ALU_IMMEDIATE : 0; alu_state |= ptr_is_dst_reg ? BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST; } @@ -4850,7 +4852,7 @@ static int fixup_bpf_calls(struct bpf_ve const u8 code_sub = BPF_ALU64 | BPF_SUB | BPF_X; struct bpf_insn insn_buf[16]; struct bpf_insn *patch = &insn_buf[0]; - bool issrc, isneg; + bool issrc, isneg, isimm; u32 off_reg;
aux = &env->insn_aux_data[i + delta]; @@ -4861,16 +4863,21 @@ static int fixup_bpf_calls(struct bpf_ve isneg = aux->alu_state & BPF_ALU_NEG_VALUE; issrc = (aux->alu_state & BPF_ALU_SANITIZE) == BPF_ALU_SANITIZE_SRC; + isimm = aux->alu_state & BPF_ALU_IMMEDIATE;
off_reg = issrc ? insn->src_reg : insn->dst_reg; - if (isneg) - *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1); - *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit); - *patch++ = BPF_ALU64_REG(BPF_SUB, BPF_REG_AX, off_reg); - *patch++ = BPF_ALU64_REG(BPF_OR, BPF_REG_AX, off_reg); - *patch++ = BPF_ALU64_IMM(BPF_NEG, BPF_REG_AX, 0); - *patch++ = BPF_ALU64_IMM(BPF_ARSH, BPF_REG_AX, 63); - *patch++ = BPF_ALU64_REG(BPF_AND, BPF_REG_AX, off_reg); + if (isimm) { + *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit); + } else { + if (isneg) + *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1); + *patch++ = BPF_MOV32_IMM(BPF_REG_AX, aux->alu_limit); + *patch++ = BPF_ALU64_REG(BPF_SUB, BPF_REG_AX, off_reg); + *patch++ = BPF_ALU64_REG(BPF_OR, BPF_REG_AX, off_reg); + *patch++ = BPF_ALU64_IMM(BPF_NEG, BPF_REG_AX, 0); + *patch++ = BPF_ALU64_IMM(BPF_ARSH, BPF_REG_AX, 63); + *patch++ = BPF_ALU64_REG(BPF_AND, BPF_REG_AX, off_reg); + } if (!issrc) *patch++ = BPF_MOV64_REG(insn->dst_reg, insn->src_reg); insn->src_reg = BPF_REG_AX; @@ -4878,7 +4885,7 @@ static int fixup_bpf_calls(struct bpf_ve insn->code = insn->code == code_add ? code_sub : code_add; *patch++ = *insn; - if (issrc && isneg) + if (issrc && isneg && !isimm) *patch++ = BPF_ALU64_IMM(BPF_MUL, off_reg, -1); cnt = patch - insn_buf;
From: Daniel Borkmann daniel@iogearbox.net
commit 3d0220f6861d713213b015b582e9f21e5b28d2e0 upstream.
Add a container structure struct bpf_sanitize_info which holds the current aux info, and update call-sites to sanitize_ptr_alu() to pass it in. This is needed for passing in additional state later on.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Piotr Krysiuk piotras@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2110,15 +2110,19 @@ static bool sanitize_needed(u8 opcode) return opcode == BPF_ADD || opcode == BPF_SUB; }
+struct bpf_sanitize_info { + struct bpf_insn_aux_data aux; +}; + static int sanitize_ptr_alu(struct bpf_verifier_env *env, struct bpf_insn *insn, const struct bpf_reg_state *ptr_reg, const struct bpf_reg_state *off_reg, struct bpf_reg_state *dst_reg, - struct bpf_insn_aux_data *tmp_aux, + struct bpf_sanitize_info *info, const bool commit_window) { - struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : tmp_aux; + struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : &info->aux; struct bpf_verifier_state *vstate = env->cur_state; bool off_is_imm = tnum_is_const(off_reg->var_off); bool off_is_neg = off_reg->smin_value < 0; @@ -2147,8 +2151,8 @@ static int sanitize_ptr_alu(struct bpf_v /* In commit phase we narrow the masking window based on * the observed pointer move after the simulated operation. */ - alu_state = tmp_aux->alu_state; - alu_limit = abs(tmp_aux->alu_limit - alu_limit); + alu_state = info->aux.alu_state; + alu_limit = abs(info->aux.alu_limit - alu_limit); } else { alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0; alu_state |= off_is_imm ? BPF_ALU_IMMEDIATE : 0; @@ -2276,7 +2280,7 @@ static int adjust_ptr_min_max_vals(struc smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value; u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value, umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value; - struct bpf_insn_aux_data tmp_aux = {}; + struct bpf_sanitize_info info = {}; u8 opcode = BPF_OP(insn->code); u32 dst = insn->dst_reg; int ret; @@ -2327,7 +2331,7 @@ static int adjust_ptr_min_max_vals(struc
if (sanitize_needed(opcode)) { ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg, - &tmp_aux, false); + &info, false); if (ret < 0) return sanitize_err(env, insn, ret, off_reg, dst_reg); } @@ -2468,7 +2472,7 @@ static int adjust_ptr_min_max_vals(struc return -EACCES; if (sanitize_needed(opcode)) { ret = sanitize_ptr_alu(env, insn, dst_reg, off_reg, dst_reg, - &tmp_aux, true); + &info, true); if (ret < 0) return sanitize_err(env, insn, ret, off_reg, dst_reg); }
From: Daniel Borkmann daniel@iogearbox.net
commit bb01a1bba579b4b1c5566af24d95f1767859771e upstream.
Masking direction as indicated via mask_to_left is considered to be calculated once and then used to derive pointer limits. Thus, this needs to be placed into bpf_sanitize_info instead so we can pass it to sanitize_ptr_alu() call after the pointer move. Piotr noticed a corner case where the off reg causes masking direction change which then results in an incorrect final aux->alu_limit.
Fixes: 7fedb63a8307 ("bpf: Tighten speculative pointer arithmetic mask") Reported-by: Piotr Krysiuk piotras@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Piotr Krysiuk piotras@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2033,18 +2033,10 @@ enum { };
static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg, - const struct bpf_reg_state *off_reg, - u32 *alu_limit, u8 opcode) + u32 *alu_limit, bool mask_to_left) { - bool off_is_neg = off_reg->smin_value < 0; - bool mask_to_left = (opcode == BPF_ADD && off_is_neg) || - (opcode == BPF_SUB && !off_is_neg); u32 max = 0, ptr_limit = 0;
- if (!tnum_is_const(off_reg->var_off) && - (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) - return REASON_BOUNDS; - switch (ptr_reg->type) { case PTR_TO_STACK: /* Offset 0 is out-of-bounds, but acceptable start for the @@ -2112,6 +2104,7 @@ static bool sanitize_needed(u8 opcode)
struct bpf_sanitize_info { struct bpf_insn_aux_data aux; + bool mask_to_left; };
static int sanitize_ptr_alu(struct bpf_verifier_env *env, @@ -2143,7 +2136,16 @@ static int sanitize_ptr_alu(struct bpf_v if (vstate->speculative) goto do_sim;
- err = retrieve_ptr_limit(ptr_reg, off_reg, &alu_limit, opcode); + if (!commit_window) { + if (!tnum_is_const(off_reg->var_off) && + (off_reg->smin_value < 0) != (off_reg->smax_value < 0)) + return REASON_BOUNDS; + + info->mask_to_left = (opcode == BPF_ADD && off_is_neg) || + (opcode == BPF_SUB && !off_is_neg); + } + + err = retrieve_ptr_limit(ptr_reg, &alu_limit, info->mask_to_left); if (err < 0) return err;
From: Daniel Borkmann daniel@iogearbox.net
commit a7036191277f9fa68d92f2071ddc38c09b1e5ee5 upstream.
In 801c6058d14a ("bpf: Fix leakage of uninitialized bpf stack under speculation") we replaced masking logic with direct loads of immediates if the register is a known constant. Given in this case we do not apply any masking, there is also no reason for the operation to be truncated under the speculative domain.
Therefore, there is also zero reason for the verifier to branch-off and simulate this case, it only needs to do it for unknown but bounded scalars. As a side-effect, this also enables few test cases that were previously rejected due to simulation under zero truncation.
Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Piotr Krysiuk piotras@gmail.com Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2169,8 +2169,12 @@ do_sim: /* If we're in commit phase, we're done here given we already * pushed the truncated dst_reg into the speculative verification * stack. + * + * Also, when register is a known constant, we rewrite register-based + * operation to immediate-based, and thus do not need masking (and as + * a consequence, do not need to simulate the zero-truncation either). */ - if (commit_window) + if (commit_window || off_is_imm) return 0;
/* Simulate and find potential out-of-bounds access under
From: Michael Chan michael.chan@broadcom.com
commit 1d86859fdf31a0d50cc82b5d0d6bfb5fe98f6c00 upstream.
The dev_port is meant to distinguish the network ports belonging to the same PCI function. Our devices only have one network port associated with each PCI function and so we should not set it for correctness.
Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4791,7 +4791,6 @@ static int bnxt_hwrm_func_qcaps(struct b
pf->fw_fid = le16_to_cpu(resp->fid); pf->port_id = le16_to_cpu(resp->port_id); - bp->dev->dev_port = pf->port_id; memcpy(pf->mac_addr, resp->mac_address, ETH_ALEN); pf->max_rsscos_ctxs = le16_to_cpu(resp->max_rsscos_ctx); pf->max_cp_rings = le16_to_cpu(resp->max_cmpl_rings);
From: Sean Christopherson seanjc@google.com
commit 0884335a2e653b8a045083aa1d57ce74269ac81d upstream.
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not in 64-bit mode. The APM states bits 63:32 are dropped for both DRs and CRs:
In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0.
Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler") Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20210422022128.3464144-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com [sudip: manual backport to old file] Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3532,7 +3532,7 @@ static int cr_interception(struct vcpu_s err = 0; if (cr >= 16) { /* mov to cr */ cr -= 16; - val = kvm_register_read(&svm->vcpu, reg); + val = kvm_register_readl(&svm->vcpu, reg); switch (cr) { case 0: if (!check_selective_cr0_intercepted(svm, val)) @@ -3577,7 +3577,7 @@ static int cr_interception(struct vcpu_s kvm_queue_exception(&svm->vcpu, UD_VECTOR); return 1; } - kvm_register_write(&svm->vcpu, reg, val); + kvm_register_writel(&svm->vcpu, reg, val); } return kvm_complete_insn_gp(&svm->vcpu, err); } @@ -3607,13 +3607,13 @@ static int dr_interception(struct vcpu_s if (dr >= 16) { /* mov to DRn */ if (!kvm_require_dr(&svm->vcpu, dr - 16)) return 1; - val = kvm_register_read(&svm->vcpu, reg); + val = kvm_register_readl(&svm->vcpu, reg); kvm_set_dr(&svm->vcpu, dr - 16, val); } else { if (!kvm_require_dr(&svm->vcpu, dr)) return 1; kvm_get_dr(&svm->vcpu, dr, &val); - kvm_register_write(&svm->vcpu, reg, val); + kvm_register_writel(&svm->vcpu, reg, val); }
return kvm_skip_emulated_instruction(&svm->vcpu);
From: Cheng Jian cj.chengjian@huawei.com
commit 60588bfa223ff675b95f866249f90616613fbe31 upstream.
select_idle_cpu() will scan the LLC domain for idle CPUs, it's always expensive. so the next commit :
1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")
introduces a way to limit how many CPUs we scan.
But it consume some CPUs out of 'nr' that are not allowed for the task and thus waste our attempts. The function always return nr_cpumask_bits, and we can't find a CPU which our task is allowed to run.
Cpumask may be too big, similar to select_idle_core(), use per_cpu_ptr 'select_idle_mask' to prevent stack overflow.
Fixes: 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()") Signed-off-by: Cheng Jian cj.chengjian@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Reviewed-by: Vincent Guittot vincent.guittot@linaro.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com Signed-off-by: Yang Wei yang.wei@linux.alibaba.com Tested-by: Yang Wei yang.wei@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/fair.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5779,6 +5779,7 @@ static inline int select_idle_smt(struct */ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) { + struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); struct sched_domain *this_sd; u64 avg_cost, avg_idle; u64 time, cost; @@ -5809,11 +5810,11 @@ static int select_idle_cpu(struct task_s
time = local_clock();
- for_each_cpu_wrap(cpu, sched_domain_span(sd), target) { + cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed); + + for_each_cpu_wrap(cpu, cpus, target) { if (!--nr) return -1; - if (!cpumask_test_cpu(cpu, &p->cpus_allowed)) - continue; if (idle_cpu(cpu)) break; }
From: Jan Beulich jbeulich@suse.com
The commit referenced below was incomplete: It merely affected what would get written to the vdev-<N> xenstore node. The guest would still find the function at the original function number as long as __xen_pcibk_get_pci_dev() wouldn't be in sync. The same goes for AER wrt __xen_pcibk_get_pcifront_dev().
Undo overriding the function to zero and instead make sure that VFs at function zero remain alone in their slot. This has the added benefit of improving overall capacity, considering that there's only a total of 32 slots available right now (PCI segment and bus can both only ever be zero at present).
This is upstream commit 4ba50e7c423c29639878c00573288869aa627068.
Fixes: 8a5248fe10b1 ("xen PV passthru: assign SR-IOV virtual functions to separate virtual slots") Signed-off-by: Jan Beulich jbeulich@suse.com Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/r/8def783b-404c-3452-196d-3f3fd4d72c9e@suse.com Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/xen/xen-pciback/vpci.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
--- a/drivers/xen/xen-pciback/vpci.c +++ b/drivers/xen/xen-pciback/vpci.c @@ -69,7 +69,7 @@ static int __xen_pcibk_add_pci_dev(struc struct pci_dev *dev, int devid, publish_pci_dev_cb publish_cb) { - int err = 0, slot, func = -1; + int err = 0, slot, func = PCI_FUNC(dev->devfn); struct pci_dev_entry *t, *dev_entry; struct vpci_dev_data *vpci_dev = pdev->pci_dev_data;
@@ -94,23 +94,26 @@ static int __xen_pcibk_add_pci_dev(struc
/* * Keep multi-function devices together on the virtual PCI bus, except - * virtual functions. + * that we want to keep virtual functions at func 0 on their own. They + * aren't multi-function devices and hence their presence at func 0 + * may cause guests to not scan the other functions. */ - if (!dev->is_virtfn) { + if (!dev->is_virtfn || func) { for (slot = 0; slot < PCI_SLOT_MAX; slot++) { if (list_empty(&vpci_dev->dev_list[slot])) continue;
t = list_entry(list_first(&vpci_dev->dev_list[slot]), struct pci_dev_entry, list); + if (t->dev->is_virtfn && !PCI_FUNC(t->dev->devfn)) + continue;
if (match_slot(dev, t->dev)) { pr_info("vpci: %s: assign to virtual slot %d func %d\n", pci_name(dev), slot, - PCI_FUNC(dev->devfn)); + func); list_add_tail(&dev_entry->list, &vpci_dev->dev_list[slot]); - func = PCI_FUNC(dev->devfn); goto unlock; } } @@ -123,7 +126,6 @@ static int __xen_pcibk_add_pci_dev(struc pci_name(dev), slot); list_add_tail(&dev_entry->list, &vpci_dev->dev_list[slot]); - func = dev->is_virtfn ? 0 : PCI_FUNC(dev->devfn); goto unlock; } }
On Tue, 08 Jun 2021 20:26:43 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.236 release. There are 47 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.236-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v4.14: 8 builds: 8 pass, 0 fail 16 boots: 16 pass, 0 fail 32 tests: 32 pass, 0 fail
Linux version: 4.14.236-rc1-g872e045a48f8 Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Wed, 9 Jun 2021 at 00:01, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.236 release. There are 47 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.236-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 4.14.236-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-4.14.y * git commit: 872e045a48f87d3c4c6deaab477abd068fda46b3 * git describe: v4.14.235-48-g872e045a48f8 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.14.y/build/v4.14....
## No regressions (compared to v4.14.235)
## No fixes (compared to v4.14.235)
## Test result summary total: 61694, pass: 48597, fail: 1298, skip: 10549, xfail: 1250,
## Build Summary * arm: 97 total, 97 passed, 0 failed * arm64: 24 total, 24 passed, 0 failed * dragonboard-410c: 1 total, 1 passed, 0 failed * hi6220-hikey: 1 total, 1 passed, 0 failed * i386: 14 total, 14 passed, 0 failed * juno-r2: 1 total, 1 passed, 0 failed * mips: 36 total, 36 passed, 0 failed * sparc: 9 total, 9 passed, 0 failed * x15: 1 total, 1 passed, 0 failed * x86: 1 total, 1 passed, 0 failed * x86_64: 14 total, 14 passed, 0 failed
## Test suites summary * fwts * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest-android * kselftest-bpf * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-lkdtm * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kvm-unit-tests * libhugetlbfs * linux-log-parser * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On Tue, Jun 08, 2021 at 08:26:43PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.236 release. There are 47 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 10 Jun 2021 17:59:18 +0000. Anything received after that time might be too late.
Build results: total: 168 pass: 168 fail: 0 Qemu test results: total: 406 pass: 406 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
linux-stable-mirror@lists.linaro.org