This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.5.8-rc1
Jim Mattson jmattson@google.com kvm: nVMX: VMWRITE checks unsupported field before read-only field
Jim Mattson jmattson@google.com kvm: nVMX: VMWRITE checks VMCS-link pointer before VMCS field
David Rientjes rientjes@google.com mm, thp: fix defrag setting if newline is not used
Wei Yang richardw.yang@linux.intel.com mm/huge_memory.c: use head to check huge zero page
John Hubbard jhubbard@nvidia.com mm/gup: allow FOLL_FORCE for get_user_pages_fast()
Vlastimil Babka vbabka@suse.cz mm/debug.c: always print flags in dump_page()
Waiman Long longman@redhat.com locking/lockdep: Fix lockdep_stats indentation problem
Daniel Jordan daniel.m.jordan@oracle.com padata: always acquire cpu_hotplug_lock before pinst->lock
Christoph Hellwig hch@lst.de xfs: clear kernel only flags in XFS_IOC_ATTRMULTI_BY_HANDLE
Bjorn Andersson bjorn.andersson@linaro.org clk: qcom: rpmh: Sort OF match table
Sameer Pujar spujar@nvidia.com bus: tegra-aconnect: Remove PM_CLK dependency
Matteo Croce mcroce@redhat.com netfilter: nf_flowtable: fix documentation
Xin Long lucien.xin@gmail.com netfilter: nft_tunnel: no need to call htons() when dumping ports
Florian Fainelli f.fainelli@gmail.com thermal: brcmstb_thermal: Do not use DT coefficients
Linus Walleij linus.walleij@linaro.org thermal: db8500: Depromote debug print
Geert Uytterhoeven geert@linux-m68k.org ubifs: Fix ino_t format warnings in orphan_delete()
Neeraj Upadhyay neeraju@codeaurora.org rcu: Allow only one expedited GP to run concurrently with wakeups
Sean Christopherson sean.j.christopherson@intel.com KVM: x86: Remove spurious clearing of async #PF MSR
Sean Christopherson sean.j.christopherson@intel.com KVM: x86: Remove spurious kvm_mmu_unload() from vcpu destruction path
Peter Xu peterx@redhat.com KVM: X86: Fix kvm_bitmap_or_dest_vcpus() to use irq shorthand
Xiaochen Shen xiaochen.shen@intel.com x86/resctrl: Check monitoring static key in the MBM overflow handler
Cengiz Can cengiz@kernel.wtf perf maps: Add missing unlock to maps__insert() error case
Jiri Olsa jolsa@kernel.org perf ui gtk: Add missing zalloc object
Arnaldo Carvalho de Melo acme@redhat.com perf hists browser: Restore ESC as "Zoom out" of DSO/thread/etc
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: omap-dmtimer: put_device() after of_find_device_by_node()
Thomas Gleixner tglx@linutronix.de lib/vdso: Update coarse timekeeper unconditionally
Thomas Gleixner tglx@linutronix.de lib/vdso: Make __arch_update_vdso_data() logic understandable
Masami Hiramatsu mhiramat@kernel.org kprobes: Set unoptimized flag after unoptimizing code
Janne Karhunen janne.karhunen@gmail.com ima: ima/lsm policy rule loading logic bug fixes
Christophe JAILLET christophe.jaillet@wanadoo.fr drivers: net: xgene: Fix the order of the arguments of 'alloc_etherdev_mqs()'
Lijun Ou oulijun@huawei.com RDMA/hns: Bugfix for posting a wqe with sge
Yixian Liu liuyixian@huawei.com RDMA/hns: Simplify the calculation and usage of wqe idx for post verbs
Chao Yu chao@kernel.org f2fs: fix to add swap extent correctly
Cheng Jian cj.chengjian@huawei.com sched/fair: Optimize select_idle_cpu
Sean Christopherson sean.j.christopherson@intel.com KVM: Check for a bad hva before dropping into the ghc slow path
Tom Lendacky thomas.lendacky@amd.com KVM: SVM: Override default MMIO mask if memory encryption is enabled
Jin Yao yao.jin@linux.intel.com perf report: Fix no libunwind compiled warning break s390 issue
Brian Norris briannorris@chromium.org mwifiex: delete unused mwifiex_get_intf_num()
Brian Norris briannorris@chromium.org mwifiex: drop most magic numbers from mwifiex_process_tdls_action_frame()
Aleksa Sarai cyphar@cyphar.com namei: only return -ECHILD from follow_dotdot_rcu()
Tuong Lien tuong.t.lien@dektech.com.au tipc: fix successful connect() but timed out
Arthur Kiyanovski akiyano@amazon.com net: ena: make ena rxfh support ETH_RSS_HASH_NO_CHANGE
Ursula Braun ubraun@linux.ibm.com net/smc: no peer ID in CLC decline for SMCD
Michael Ellerman mpe@ellerman.id.au selftests: Install settings files to fix TIMEOUT failures
Dmitry Bogdanov dbogdanov@marvell.com net: atlantic: fix out of range usage of active_vlans array
Pavel Belous pbelous@marvell.com net: atlantic: possible fault in transition to hibernation
Pavel Belous pbelous@marvell.com net: atlantic: fix potential error handling
Pavel Belous pbelous@marvell.com net: atlantic: fix use after free kasan warn
Nikita Danilov ndanilov@marvell.com net: atlantic: better loopback mode handling
Dmitry Bezrukov dbezrukov@marvell.com net: atlantic: checksum compat issue
Nikolay Aleksandrov nikolay@cumulusnetworks.com net: netlink: cap max groups which will be considered in netlink_bind()
Julian Wiedmann jwi@linux.ibm.com s390/qeth: fix off-by-one in RX copybreak check
Alexandra Winter wintera@linux.ibm.com s390/qeth: vnicc Fix EOPNOTSUPP precedence
Bijan Mottahedeh bijan.mottahedeh@oracle.com nvme-pci: Hold cq_poll_lock while completing CQEs
Peter Chen peter.chen@nxp.com usb: charger: assign specific number for enum value
Haiyang Zhang haiyangz@microsoft.com hv_netvsc: Fix unwanted wakeup in netvsc_attach()
Masahiro Yamada masahiroy@kernel.org kbuild: fix DT binding schema rule to detect command line changes
Andrei Otcheretianski andrei.otcheretianski@intel.com mac80211: Remove a redundant mutex unlock
Johannes Berg johannes.berg@intel.com nl80211: fix potential leak in AP start
Tina Zhang tina.zhang@intel.com drm/i915/gvt: Separate display reset from ALL_ENGINES reset
Chris Wilson chris@chris-wilson.co.uk drm/i915: Avoid recursing onto active vma from the shrinker
Tina Zhang tina.zhang@intel.com drm/i915/gvt: Fix orphan vgpu dmabuf_objs' lifetime
Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz MIPS: cavium_octeon: Fix syncw generation.
Wolfram Sang wsa@the-dreams.de i2c: jz4780: silence log flood on txabrt
Gustavo A. R. Silva gustavo@embeddedor.com i2c: altera: Fix potential integer overflow
Oliver Upton oupton@google.com KVM: nVMX: Emulate MTF when performing instruction emulation
Christophe JAILLET christophe.jaillet@wanadoo.fr MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()'
Anup Patel anup.patel@wdc.com RISC-V: Don't enable all interrupts in trap_init()
dan.carpenter@oracle.com dan.carpenter@oracle.com HID: hiddev: Fix race in in hiddev_disconnect()
Christophe JAILLET christophe.jaillet@wanadoo.fr HID: alps: Fix an error handling path in 'alps_input_configured()'
Cong Wang xiyou.wangcong@gmail.com netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Fix forceadd evaluation path
Eugenio Pérez eperezma@redhat.com vhost: Check docket sk_family instead of call getname
Ursula Braun ubraun@linux.ibm.com net/smc: transfer fasync_list in case of fallback
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
Jens Axboe axboe@kernel.dk io_uring: fix 32-bit compatability with sendmsg/recvmsg
Rafael J. Wysocki rafael.j.wysocki@intel.com cpufreq: Fix policy initialization for internal governor drivers
Shirish S shirish.s@amd.com amdgpu/gmc_v9: save/restore sdpif regs during S3
Orson Zhai orson.unisoc@gmail.com Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
Steven Rostedt (VMware) rostedt@goodmis.org tracing: Disable trace_printk() on post poned tests
Jan Kara jack@suse.cz blktrace: Protect q->blk_trace with RCU
Wolfram Sang wsa@the-dreams.de macintosh: therm_windtunnel: fix regression when instantiating devices
Daniel Vetter daniel.vetter@ffwll.ch drm/radeon: Inline drm_get_pci_dev
Daniel Vetter daniel.vetter@ffwll.ch drm/amdgpu: Drop DRIVER_USE_AGP
Johan Korsnes jkorsnes@cisco.com HID: core: increase HID report buffer size to 8KiB
Johan Korsnes jkorsnes@cisco.com HID: core: fix off-by-one memset in hid_report_raw_event()
Hans de Goede hdegoede@redhat.com HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock
Oliver Upton oupton@google.com KVM: VMX: check descriptor table exits on instruction emulation
Mika Westerberg mika.westerberg@linux.intel.com ACPI: watchdog: Fix gas->access_width usage
Mika Westerberg mika.westerberg@linux.intel.com ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro
Paul Moore paul@paul-moore.com audit: always check the netlink payload length in audit_receive_msg()
Paul Moore paul@paul-moore.com audit: fix error handling in audit_data_to_entry()
Dan Carpenter dan.carpenter@oracle.com ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
Kees Cook keescook@chromium.org docs: Fix empty parallelism argument
Benjamin Block bblock@linux.ibm.com scsi: zfcp: fix wrong data and display format of SFP+ temperature
Damien Le Moal damien.lemoal@wdc.com scsi: sd_sbc: Fix sd_zbc_report_zones()
Keith Busch kbusch@kernel.org nvme/pci: move cqe check after device shutdown
Nigel Kirkland nigel.kirkland@broadcom.com nvme: prevent warning triggered by nvme_stop_keep_alive
Anton Eidelman anton@lightbitslabs.com nvme/tcp: fix bug on double requeue when send fails
Guangbin Huang huangguangbin2@huawei.com net: hns3: fix a copying IPv6 address error in hclge_fd_get_flow_tuples()
Yonglong Liu liuyonglong@huawei.com net: hns3: fix VF bandwidth does not take effect in some case
Yufeng Mo moyufeng@huawei.com net: hns3: add management table after IMP reset
Shay Bar shay.bar@celeno.com mac80211: fix wrong 160/80+80 MHz setting
Sergey Matyukevich sergey.matyukevich.os@quantenna.com cfg80211: add missing policy for NL80211_ATTR_STATUS_CODE
Coly Li colyli@suse.de bcache: ignore pending signals when creating gc and allocator thread
Frank Sorenson sorenson@redhat.com cifs: Fix mode output in debugging statements
Jens Axboe axboe@kernel.dk io-wq: don't call kXalloc_node() with non-online node
Ben Shelton benjamin.h.shelton@intel.com ice: Use correct netif error function
Anirudh Venkataramanan anirudh.venkataramanan@intel.com ice: Use ice_pf_to_dev
Bruce Allan bruce.w.allan@intel.com ice: update Unit Load Status bitmask to check after reset
Bruce Allan bruce.w.allan@intel.com ice: fix and consolidate logging of NVM/firmware version information
Brett Creeley brett.creeley@intel.com ice: Don't allow same value for Rx tail to be written twice
Dave Ertman david.m.ertman@intel.com ice: Fix switch between FW and SW LLDP
Arthur Kiyanovski akiyano@amazon.com net: ena: ena-com.c: prevent NULL pointer dereference
Sameeh Jubran sameehj@amazon.com net: ena: ethtool: use correct value for crc32 hash
Arthur Kiyanovski akiyano@amazon.com net: ena: fix corruption of dev_idx_to_host_tbl
Arthur Kiyanovski akiyano@amazon.com net: ena: fix incorrectly saving queue numbers when setting RSS indirection table
Arthur Kiyanovski akiyano@amazon.com net: ena: rss: store hash function as values and not bits
Sameeh Jubran sameehj@amazon.com net: ena: rss: fix failure to get indirection table
Sameeh Jubran sameehj@amazon.com net: ena: rss: do not allocate key when not supported
Arthur Kiyanovski akiyano@amazon.com net: ena: fix incorrect default RSS key
Arthur Kiyanovski akiyano@amazon.com net: ena: add missing ethtool TX timestamping indication
Arthur Kiyanovski akiyano@amazon.com net: ena: fix uses of round_jiffies()
Arthur Kiyanovski akiyano@amazon.com net: ena: fix potential crash when rxfh key is NULL
Brett Creeley brett.creeley@intel.com i40e: Fix the conditional for i40e_vc_validate_vqs_bitmaps
Thierry Reding treding@nvidia.com soc/tegra: fuse: Fix build with Tegra194 configuration
Daniel Kolesa daniel@octaforge.org amdgpu: Prevent build errors regarding soft/hard-float FP ABI tags
Isabel Zhang isabel.zhang@amd.com drm/amd/display: Add initialitions for PLL2 clock source
Yongqiang Sun yongqiang.sun@amd.com drm/amd/display: Limit minimum DPPCLK to 100MHz.
Aric Cyr aric.cyr@amd.com drm/amd/display: Check engine is not NULL before acquiring
Krishnamraju Eraparaju krishna2@chelsio.com RDMA/siw: Remove unwanted WARN_ON in siw_cm_llp_data_ready()
Sung Lee sung.lee@amd.com drm/amd/display: Do not set optimized_require to false after plane disable
Kuninori Morimoto kuninori.morimoto.gx@renesas.com ARM: dts: sti: fixup sound frame-inversion for stihxxx-b2120.dtsi
Xiubo Li xiubli@redhat.com ceph: do not execute direct write in parallel if O_APPEND is specified
Kan Liang kan.liang@linux.intel.com perf/x86/msr: Add Tremont support
Kan Liang kan.liang@linux.intel.com perf/x86/cstate: Add Tremont support
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Add Elkhart Lake support
Peter Zijlstra peterz@infradead.org arm/ftrace: Fix BE text poking
John Garry john.garry@huawei.com perf/smmuv3: Use platform_get_irq_optional() for wired interrupt
Trond Myklebust trondmy@gmail.com NFSv4: Fix races between open and dentry revalidation
Bjørn Mork bjorn@mork.no qmi_wwan: unconditionally reject 2 ep interfaces
Bjørn Mork bjorn@mork.no qmi_wwan: re-add DW5821e pre-production variant
Harald Freudenberger freude@linux.ibm.com s390/zcrypt: fix card and queue total counter wrap
Stefano Garzarella sgarzare@redhat.com io_uring: flush overflowed CQ events in the io_uring_poll()
Sergey Matyukevich sergey.matyukevich.os@quantenna.com cfg80211: check wiphy driver existence for drvinfo report
Johannes Berg johannes.berg@intel.com mac80211: consider more elements in parsing CRC
Jeff Moyer jmoyer@redhat.com dax: pass NOWAIT flag to iomap_apply
Vincent Guittot vincent.guittot@linaro.org sched/fair: Prevent unlimited runtime on throttled group
Peter Zijlstra (Intel) peterz@infradead.org timers/nohz: Update NOHZ load in remote tick
Scott Wood swood@redhat.com sched/core: Don't skip remote tick for idle CPUs
Sean Paul seanpaul@chromium.org drm/msm: Set dma maximum segment size for mdss
Corey Minyard cminyard@mvista.com ipmi:ssif: Handle a possible NULL pointer reference
Eric Dumazet edumazet@google.com net: rtnetlink: fix bugs in rtnl_alt_ifname()
Alexandre Belloni alexandre.belloni@bootlin.com net: macb: Properly handle phylink on at91rm9200
Eric Dumazet edumazet@google.com net: add strict checks in netdev_name_node_alt_destroy()
Shannon Nelson snelson@pensando.io ionic: fix fw_status read
Benjamin Poirier bpoirier@cumulusnetworks.com ipv6: Fix nlmsg_flags when splitting a multipath route
Benjamin Poirier bpoirier@cumulusnetworks.com ipv6: Fix route replacement with dev-only route
Taehee Yoo ap420073@gmail.com bonding: fix lockdep warning in bond_get_stats()
Taehee Yoo ap420073@gmail.com net: export netdev_next_lower_dev_rcu()
Taehee Yoo ap420073@gmail.com bonding: add missing netdev_update_lockdep_key()
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.
Vasundhara Volam vasundhara-v.volam@broadcom.com bnxt_en: Improve device shutdown method.
Xin Long lucien.xin@gmail.com sctp: move the format error check out of __sctp_sf_do_9_1_abort
Willem de Bruijn willemb@google.com udp: rehash on disconnect
Paolo Abeni pabeni@redhat.com Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"
Michal Kalderon michal.kalderon@marvell.com qede: Fix race between rdma destroy workqueue and link change event
Dmitry Osipenko digetx@gmail.com nfc: pn544: Fix occasional HW initialization failure
Rohit Maheshwari rohitm@chelsio.com net/tls: Fix to avoid gettig invalid tls record
Jason Baron jbaron@akamai.com net: sched: correct flower port blocking
Arun Parameswaran arun.parameswaran@broadcom.com net: phy: restore mdio regs in the iproc mdio driver
Horatiu Vultur horatiu.vultur@microchip.com net: mscc: fix in frame extraction
Alexandre Belloni alexandre.belloni@bootlin.com net: macb: ensure interface is not suspended on at91rm9200
Jethro Beekman jethro@fortanix.com net: fib_rules: Correctly set table field when table number exceeds 8 bits
Florian Fainelli f.fainelli@gmail.com net: dsa: b53: Ensure the default VID is untagged
Aristeu Rozanski aris@redhat.com EDAC: skx_common: downgrade message importance on missing PCI device
-------------
Diffstat:
Documentation/networking/nf_flowtable.txt | 2 +- Documentation/sphinx/parallel-wrapper.sh | 2 +- Makefile | 4 +- arch/arm/boot/dts/stihxxx-b2120.dtsi | 2 +- arch/arm/include/asm/vdso/vsyscall.h | 4 +- arch/arm/kernel/ftrace.c | 7 +- arch/mips/include/asm/sync.h | 4 +- arch/mips/kernel/vpe.c | 2 +- arch/riscv/kernel/traps.c | 4 +- arch/x86/events/intel/core.c | 1 + arch/x86/events/intel/cstate.c | 22 +- arch/x86/events/msr.c | 3 +- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kernel/cpu/resctrl/internal.h | 1 + arch/x86/kernel/cpu/resctrl/monitor.c | 4 +- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/svm.c | 44 ++ arch/x86/kvm/vmx/nested.c | 105 ++-- arch/x86/kvm/vmx/nested.h | 5 + arch/x86/kvm/vmx/vmx.c | 52 +- arch/x86/kvm/vmx/vmx.h | 3 + arch/x86/kvm/x86.c | 8 +- drivers/acpi/acpi_watchdog.c | 3 +- drivers/bus/Kconfig | 1 - drivers/char/ipmi/ipmi_ssif.c | 10 +- drivers/clk/qcom/clk-rpmh.c | 2 +- drivers/cpufreq/cpufreq.c | 12 +- drivers/devfreq/devfreq.c | 4 +- drivers/edac/skx_common.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 37 +- drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile | 6 + .../drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 6 + drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 2 +- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 1 - .../gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 6 + .../drm/amd/include/asic_reg/dce/dce_12_0_offset.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 4 +- drivers/gpu/drm/i915/gvt/dmabuf.c | 2 +- drivers/gpu/drm/i915/gvt/vgpu.c | 2 +- drivers/gpu/drm/msm/msm_drv.c | 8 + drivers/gpu/drm/radeon/radeon_drv.c | 43 +- drivers/gpu/drm/radeon/radeon_kms.c | 6 + drivers/hid/hid-alps.c | 2 +- drivers/hid/hid-core.c | 4 +- drivers/hid/hid-ite.c | 5 +- drivers/hid/usbhid/hiddev.c | 2 +- drivers/i2c/busses/i2c-altera.c | 2 +- drivers/i2c/busses/i2c-jz4780.c | 36 +- drivers/infiniband/hw/hns/hns_roce_device.h | 3 +- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 37 +- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 80 +-- drivers/infiniband/sw/siw/siw_cm.c | 5 +- drivers/macintosh/therm_windtunnel.c | 52 +- drivers/md/bcache/alloc.c | 18 +- drivers/md/bcache/btree.c | 13 + drivers/net/bonding/bond_main.c | 55 +- drivers/net/bonding/bond_options.c | 2 + drivers/net/dsa/b53/b53_common.c | 3 + drivers/net/ethernet/amazon/ena/ena_com.c | 96 ++-- drivers/net/ethernet/amazon/ena/ena_com.h | 9 + drivers/net/ethernet/amazon/ena/ena_ethtool.c | 46 +- drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +- drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 + drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 2 +- .../net/ethernet/aquantia/atlantic/aq_ethtool.c | 5 + .../net/ethernet/aquantia/atlantic/aq_filters.c | 2 +- drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 8 +- .../net/ethernet/aquantia/atlantic/aq_pci_func.c | 13 +- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 10 +- drivers/net/ethernet/aquantia/atlantic/aq_ring.h | 3 +- .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 18 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 12 +- drivers/net/ethernet/cadence/macb.h | 1 + drivers/net/ethernet/cadence/macb_main.c | 66 ++- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 22 +- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 2 +- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 +- drivers/net/ethernet/intel/ice/ice_base.c | 12 +- drivers/net/ethernet/intel/ice/ice_common.c | 17 +- drivers/net/ethernet/intel/ice/ice_dcb_nl.c | 12 +- drivers/net/ethernet/intel/ice/ice_ethtool.c | 17 +- drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 6 + drivers/net/ethernet/intel/ice/ice_lib.c | 33 +- drivers/net/ethernet/intel/ice/ice_lib.h | 2 - drivers/net/ethernet/intel/ice/ice_main.c | 23 +- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 8 +- drivers/net/ethernet/mscc/ocelot_board.c | 8 + drivers/net/ethernet/pensando/ionic/ionic_dev.c | 11 +- drivers/net/ethernet/pensando/ionic/ionic_if.h | 1 + drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_rdma.c | 29 +- drivers/net/hyperv/netvsc.c | 2 +- drivers/net/hyperv/netvsc_drv.c | 3 + drivers/net/phy/mdio-bcm-iproc.c | 20 + drivers/net/usb/qmi_wwan.c | 43 +- drivers/net/wireless/marvell/mwifiex/main.h | 13 - drivers/net/wireless/marvell/mwifiex/tdls.c | 75 +-- drivers/nfc/pn544/i2c.c | 1 + drivers/nvme/host/core.c | 10 +- drivers/nvme/host/pci.c | 25 +- drivers/nvme/host/rdma.c | 2 +- drivers/nvme/host/tcp.c | 9 +- drivers/perf/arm_smmuv3_pmu.c | 2 +- drivers/pwm/pwm-omap-dmtimer.c | 21 +- drivers/s390/crypto/ap_bus.h | 4 +- drivers/s390/crypto/ap_card.c | 8 +- drivers/s390/crypto/ap_queue.c | 6 +- drivers/s390/crypto/zcrypt_api.c | 16 +- drivers/s390/net/qeth_core_main.c | 2 +- drivers/s390/net/qeth_l2_main.c | 29 +- drivers/s390/scsi/zfcp_fsf.h | 2 +- drivers/s390/scsi/zfcp_sysfs.c | 2 +- drivers/scsi/sd_zbc.c | 7 +- drivers/soc/tegra/fuse/fuse-tegra30.c | 3 +- drivers/thermal/broadcom/brcmstb_thermal.c | 31 +- drivers/thermal/db8500_thermal.c | 4 +- drivers/vhost/net.c | 10 +- drivers/watchdog/wdat_wdt.c | 2 +- fs/ceph/file.c | 17 +- fs/cifs/cifsacl.c | 4 +- fs/cifs/connect.c | 2 +- fs/cifs/inode.c | 2 +- fs/dax.c | 3 + fs/ext4/super.c | 6 +- fs/f2fs/data.c | 32 +- fs/io-wq.c | 22 +- fs/io_uring.c | 12 +- fs/namei.c | 2 +- fs/nfs/nfs4file.c | 1 - fs/nfs/nfs4proc.c | 18 +- fs/ubifs/orphan.c | 4 +- fs/xfs/libxfs/xfs_attr.h | 7 +- fs/xfs/xfs_ioctl.c | 2 + fs/xfs/xfs_ioctl32.c | 2 + include/acpi/actypes.h | 3 +- include/asm-generic/vdso/vsyscall.h | 4 +- include/linux/blkdev.h | 2 +- include/linux/blktrace_api.h | 18 +- include/linux/hid.h | 2 +- include/linux/netdevice.h | 7 +- include/linux/netfilter/ipset/ip_set.h | 11 +- include/linux/sched/nohz.h | 2 + include/net/flow_dissector.h | 9 + include/uapi/linux/usb/charger.h | 16 +- kernel/audit.c | 40 +- kernel/auditfilter.c | 71 +-- kernel/kprobes.c | 4 +- kernel/locking/lockdep_proc.c | 4 +- kernel/padata.c | 4 +- kernel/rcu/tree_exp.h | 11 +- kernel/sched/core.c | 31 +- kernel/sched/fair.c | 7 +- kernel/sched/loadavg.c | 33 +- kernel/time/vsyscall.c | 37 +- kernel/trace/blktrace.c | 114 +++- kernel/trace/trace.c | 2 + mm/debug.c | 8 +- mm/gup.c | 3 +- mm/huge_memory.c | 26 +- net/core/dev.c | 34 +- net/core/fib_rules.c | 2 +- net/core/rtnetlink.c | 26 +- net/ipv4/udp.c | 6 +- net/ipv6/ip6_fib.c | 7 +- net/ipv6/route.c | 1 + net/mac80211/mlme.c | 6 +- net/mac80211/util.c | 34 +- net/netfilter/ipset/ip_set_core.c | 34 +- net/netfilter/ipset/ip_set_hash_gen.h | 635 ++++++++++++++------- net/netfilter/nft_tunnel.c | 4 +- net/netfilter/xt_hashlimit.c | 12 +- net/netlink/af_netlink.c | 5 +- net/sched/cls_flower.c | 1 + net/sctp/sm_statefuns.c | 29 +- net/smc/af_smc.c | 2 + net/smc/smc_clc.c | 4 +- net/tipc/socket.c | 2 + net/tls/tls_device.c | 20 +- net/wireless/ethtool.c | 8 +- net/wireless/nl80211.c | 5 +- scripts/Makefile.lib | 4 +- security/integrity/ima/ima_policy.c | 44 +- tools/perf/builtin-report.c | 6 +- tools/perf/ui/browsers/hists.c | 1 + tools/perf/ui/gtk/Build | 5 + tools/perf/util/map.c | 1 + tools/testing/selftests/ftrace/Makefile | 2 +- tools/testing/selftests/livepatch/Makefile | 2 + tools/testing/selftests/net/fib_tests.sh | 6 + tools/testing/selftests/rseq/Makefile | 2 + tools/testing/selftests/rtc/Makefile | 2 + virt/kvm/kvm_main.c | 12 +- 196 files changed, 2077 insertions(+), 1118 deletions(-)
From: Aristeu Rozanski aris@redhat.com
[ Upstream commit 854bb48018d5da261d438b2232fa683bdb553979 ]
Both skx_edac and i10nm_edac drivers are loaded based on the matching CPU being available which leads the module to be automatically loaded in virtual machines as well. That will fail due the missing PCI devices. In both drivers the first function to make use of the PCI devices is skx_get_hi_lo() will simply print
EDAC skx: Can't get tolm/tohm
for each CPU core, which is noisy. This patch makes it a debug message.
Signed-off-by: Aristeu Rozanski aris@redhat.com Signed-off-by: Tony Luck tony.luck@intel.com Link: https://lore.kernel.org/r/20191204212325.c4k47p5hrnn3vpb5@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/edac/skx_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c index 95662a4ff4c4f..99bbaf629b8d9 100644 --- a/drivers/edac/skx_common.c +++ b/drivers/edac/skx_common.c @@ -256,7 +256,7 @@ int skx_get_hi_lo(unsigned int did, int off[], u64 *tolm, u64 *tohm)
pdev = pci_get_device(PCI_VENDOR_ID_INTEL, did, NULL); if (!pdev) { - skx_printk(KERN_ERR, "Can't get tolm/tohm\n"); + edac_dbg(2, "Can't get tolm/tohm\n"); return -ENODEV; }
From: Florian Fainelli f.fainelli@gmail.com
[ Upstream commit d965a5432d4c3e6b9c3d2bc1d4a800013bbf76f6 ]
We need to ensure that the default VID is untagged otherwise the switch will be sending tagged frames and the results can be problematic. This is especially true with b53 switches that use VID 0 as their default VLAN since VID 0 has a special meaning.
Fixes: fea83353177a ("net: dsa: b53: Fix default VLAN ID") Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/b53/b53_common.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1353,6 +1353,9 @@ void b53_vlan_add(struct dsa_switch *ds,
b53_get_vlan_entry(dev, vid, vl);
+ if (vid == 0 && vid == b53_default_pvid(dev)) + untagged = true; + vl->members |= BIT(port); if (untagged && !dsa_is_cpu_port(ds, port)) vl->untag |= BIT(port);
From: Jethro Beekman jethro@fortanix.com
[ Upstream commit 540e585a79e9d643ede077b73bcc7aa2d7b4d919 ]
In 709772e6e06564ed94ba740de70185ac3d792773, RT_TABLE_COMPAT was added to allow legacy software to deal with routing table numbers >= 256, but the same change to FIB rule queries was overlooked.
Signed-off-by: Jethro Beekman jethro@fortanix.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/fib_rules.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -974,7 +974,7 @@ static int fib_nl_fill_rule(struct sk_bu
frh = nlmsg_data(nlh); frh->family = ops->family; - frh->table = rule->table; + frh->table = rule->table < 256 ? rule->table : RT_TABLE_COMPAT; if (nla_put_u32(skb, FRA_TABLE, rule->table)) goto nla_put_failure; if (nla_put_u32(skb, FRA_SUPPRESS_PREFIXLEN, rule->suppress_prefixlen))
From: Alexandre Belloni alexandre.belloni@bootlin.com
[ Upstream commit e6a41c23df0d5da01540d2abef41591589c0b4be ]
Because of autosuspend, at91ether_start is called with clocks disabled. Ensure that pm_runtime doesn't suspend the interface as soon as it is opened as there is no pm_runtime support is the other relevant parts of the platform support for at91rm9200.
Fixes: d54f89af6cc4 ("net: macb: Add pm runtime support") Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Reviewed-by: Claudiu Beznea claudiu.beznea@microchip.com Acked-by: Nicolas Ferre nicolas.ferre@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb_main.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3751,6 +3751,10 @@ static int at91ether_open(struct net_dev u32 ctl; int ret;
+ ret = pm_runtime_get_sync(&lp->pdev->dev); + if (ret < 0) + return ret; + /* Clear internal statistics */ ctl = macb_readl(lp, NCR); macb_writel(lp, NCR, ctl | MACB_BIT(CLRSTAT)); @@ -3815,7 +3819,7 @@ static int at91ether_close(struct net_de q->rx_buffers, q->rx_buffers_dma); q->rx_buffers = NULL;
- return 0; + return pm_runtime_put(&lp->pdev->dev); }
/* Transmit packet */
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit a81541041ceb55bcec9a8bb8ad3482263f0a205a ]
Each extracted frame on Ocelot has an IFH. The frame and IFH are extracted by reading chuncks of 4 bytes from a register.
In case the IFH and frames were read corretly it would try to read the next frame. In case there are no more frames in the queue, it checks if there were any previous errors and in that case clear the queue. But this check will always succeed also when there are no errors. Because when extracting the IFH the error is checked against 4(number of bytes read) and then the error is set only if the extraction of the frame failed. So in a happy case where there are no errors the err variable is still 4. So it could be a case where after the check that there are no more frames in the queue, a frame will arrive in the queue but because the error is not reseted, it would try to flush the queue. So the frame will be lost.
The fix consist in resetting the error after reading the IFH.
Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mscc/ocelot_board.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/net/ethernet/mscc/ocelot_board.c +++ b/drivers/net/ethernet/mscc/ocelot_board.c @@ -114,6 +114,14 @@ static irqreturn_t ocelot_xtr_irq_handle if (err != 4) break;
+ /* At this point the IFH was read correctly, so it is safe to + * presume that there is no error. The err needs to be reset + * otherwise a frame could come in CPU queue between the while + * condition and the check for error later on. And in that case + * the new frame is just removed and not processed. + */ + err = 0; + ocelot_parse_ifh(ifh, &info);
ocelot_port = ocelot->ports[info.port];
From: Arun Parameswaran arun.parameswaran@broadcom.com
commit 6f08e98d62799e53c89dbf2c9a49d77e20ca648c upstream.
The mii management register in iproc mdio block does not have a retention register so it is lost on suspend. Save and restore value of register while resuming from suspend.
Fixes: bb1a619735b4 ("net: phy: Initialize mdio clock at probe function") Signed-off-by: Arun Parameswaran arun.parameswaran@broadcom.com Signed-off-by: Scott Branden scott.branden@broadcom.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/mdio-bcm-iproc.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/drivers/net/phy/mdio-bcm-iproc.c +++ b/drivers/net/phy/mdio-bcm-iproc.c @@ -178,6 +178,23 @@ static int iproc_mdio_remove(struct plat return 0; }
+#ifdef CONFIG_PM_SLEEP +int iproc_mdio_resume(struct device *dev) +{ + struct platform_device *pdev = to_platform_device(dev); + struct iproc_mdio_priv *priv = platform_get_drvdata(pdev); + + /* restore the mii clock configuration */ + iproc_mdio_config_clk(priv->base); + + return 0; +} + +static const struct dev_pm_ops iproc_mdio_pm_ops = { + .resume = iproc_mdio_resume +}; +#endif /* CONFIG_PM_SLEEP */ + static const struct of_device_id iproc_mdio_of_match[] = { { .compatible = "brcm,iproc-mdio", }, { /* sentinel */ }, @@ -188,6 +205,9 @@ static struct platform_driver iproc_mdio .driver = { .name = "iproc-mdio", .of_match_table = iproc_mdio_of_match, +#ifdef CONFIG_PM_SLEEP + .pm = &iproc_mdio_pm_ops, +#endif }, .probe = iproc_mdio_probe, .remove = iproc_mdio_remove,
From: Jason Baron jbaron@akamai.com
[ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]
tc flower rules that are based on src or dst port blocking are sometimes ineffective due to uninitialized stack data. __skb_flow_dissect() extracts ports from the skb for tc flower to match against. However, the port dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in key_control->flags. All callers of __skb_flow_dissect(), zero-out the key_control field except for fl_classify() as used by the flower classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to __skb_flow_dissect(), since key_control is allocated on the stack and may not be initialized.
Since key_basic and key_control are present for all flow keys, let's make sure they are initialized.
Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments") Co-developed-by: Eric Dumazet edumazet@google.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: Jason Baron jbaron@akamai.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/flow_dissector.h | 9 +++++++++ net/sched/cls_flower.c | 1 + 2 files changed, 10 insertions(+)
--- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -5,6 +5,7 @@ #include <linux/types.h> #include <linux/in6.h> #include <linux/siphash.h> +#include <linux/string.h> #include <uapi/linux/if_ether.h>
struct sk_buff; @@ -349,4 +350,12 @@ struct bpf_flow_dissector { void *data_end; };
+static inline void +flow_dissector_init_keys(struct flow_dissector_key_control *key_control, + struct flow_dissector_key_basic *key_basic) +{ + memset(key_control, 0, sizeof(*key_control)); + memset(key_basic, 0, sizeof(*key_basic)); +} + #endif --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -305,6 +305,7 @@ static int fl_classify(struct sk_buff *s struct cls_fl_filter *f;
list_for_each_entry_rcu(mask, &head->masks, list) { + flow_dissector_init_keys(&skb_key.control, &skb_key.basic); fl_clear_masked_range(&skb_key, mask);
skb_flow_dissect_meta(skb, &mask->dissector, &skb_key);
From: Rohit Maheshwari rohitm@chelsio.com
[ Upstream commit 06f5201c6392f998a49ca9c9173e2930c8eb51d8 ]
Current code doesn't check if tcp sequence number is starting from (/after) 1st record's start sequnce number. It only checks if seq number is before 1st record's end sequnce number. This problem will always be a possibility in re-transmit case. If a record which belongs to a requested seq number is already deleted, tls_get_record will start looking into list and as per the check it will look if seq number is before the end seq of 1st record, which will always be true and will return 1st record always, it should in fact return NULL. As part of the fix, start looking each record only if the sequence number lies in the list else return NULL. There is one more check added, driver look for the start marker record to handle tcp packets which are before the tls offload start sequence number, hence return 1st record if the record is tls start marker and seq number is before the 1st record's starting sequence number.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Rohit Maheshwari rohitm@chelsio.com Reviewed-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/tls/tls_device.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
--- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -592,7 +592,7 @@ struct tls_record_info *tls_get_record(s u32 seq, u64 *p_record_sn) { u64 record_sn = context->hint_record_sn; - struct tls_record_info *info; + struct tls_record_info *info, *last;
info = context->retransmit_hint; if (!info || @@ -604,6 +604,24 @@ struct tls_record_info *tls_get_record(s struct tls_record_info, list); if (!info) return NULL; + /* send the start_marker record if seq number is before the + * tls offload start marker sequence number. This record is + * required to handle TCP packets which are before TLS offload + * started. + * And if it's not start marker, look if this seq number + * belongs to the list. + */ + if (likely(!tls_record_is_start_marker(info))) { + /* we have the first record, get the last record to see + * if this seq number belongs to the list. + */ + last = list_last_entry(&context->records_list, + struct tls_record_info, list); + + if (!between(seq, tls_record_start_seq(info), + last->end_seq)) + return NULL; + } record_sn = context->unacked_record_sn; }
From: Dmitry Osipenko digetx@gmail.com
[ Upstream commit c3331d2fe3fd4d5e321f2467d01f72de7edfb5d0 ]
The PN544 driver checks the "enable" polarity during of driver's probe and it's doing that by turning ON and OFF NFC with different polarities until enabling succeeds. It takes some time for the hardware to power-down, and thus, to deassert the IRQ that is raised by turning ON the hardware. Since the delay after last power-down of the polarity-checking process is missed in the code, the interrupt may trigger immediately after installing the IRQ handler (right after the checking is done), which results in IRQ handler trying to touch the disabled HW and ends with marking NFC as 'DEAD' during of the driver's probe:
pn544_hci_i2c 1-002a: NFC: nfc_en polarity : active high pn544_hci_i2c 1-002a: NFC: invalid len byte shdlc: llc_shdlc_recv_frame: NULL Frame -> link is dead
This patch fixes the occasional NFC initialization failure on Nexus 7 device.
Signed-off-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/nfc/pn544/i2c.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/nfc/pn544/i2c.c +++ b/drivers/nfc/pn544/i2c.c @@ -225,6 +225,7 @@ static void pn544_hci_i2c_platform_init(
out: gpiod_set_value_cansleep(phy->gpiod_en, !phy->en_polarity); + usleep_range(10000, 15000); }
static void pn544_hci_i2c_enable_mode(struct pn544_i2c_phy *phy, int run_mode)
From: Michal Kalderon michal.kalderon@marvell.com
[ Upstream commit af6565adb02d3129d3fae4d9d5da945abaf4417a ]
If an event is added while the rdma workqueue is being destroyed it could lead to several races, list corruption, null pointer dereference during queue_work or init_queue. This fixes the race between the two flows which can occur during shutdown.
A kref object and a completion object are added to the rdma_dev structure, these are initialized before the workqueue is created. The refcnt is used to indicate work is being added to the workqueue and ensures the cleanup flow won't start while we're in the middle of adding the event. Once the work is added, the refcnt is decreased and the cleanup flow is safe to run.
Fixes: cee9fbd8e2e ("qede: Add qedr framework") Signed-off-by: Ariel Elior ariel.elior@marvell.com Signed-off-by: Michal Kalderon michal.kalderon@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/qlogic/qede/qede.h | 2 + drivers/net/ethernet/qlogic/qede/qede_rdma.c | 29 ++++++++++++++++++++++++++- 2 files changed, 30 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -163,6 +163,8 @@ struct qede_rdma_dev { struct list_head entry; struct list_head rdma_event_list; struct workqueue_struct *rdma_wq; + struct kref refcnt; + struct completion event_comp; bool exp_recovery; };
--- a/drivers/net/ethernet/qlogic/qede/qede_rdma.c +++ b/drivers/net/ethernet/qlogic/qede/qede_rdma.c @@ -59,6 +59,9 @@ static void _qede_rdma_dev_add(struct qe static int qede_rdma_create_wq(struct qede_dev *edev) { INIT_LIST_HEAD(&edev->rdma_info.rdma_event_list); + kref_init(&edev->rdma_info.refcnt); + init_completion(&edev->rdma_info.event_comp); + edev->rdma_info.rdma_wq = create_singlethread_workqueue("rdma_wq"); if (!edev->rdma_info.rdma_wq) { DP_NOTICE(edev, "qedr: Could not create workqueue\n"); @@ -83,8 +86,23 @@ static void qede_rdma_cleanup_event(stru } }
+static void qede_rdma_complete_event(struct kref *ref) +{ + struct qede_rdma_dev *rdma_dev = + container_of(ref, struct qede_rdma_dev, refcnt); + + /* no more events will be added after this */ + complete(&rdma_dev->event_comp); +} + static void qede_rdma_destroy_wq(struct qede_dev *edev) { + /* Avoid race with add_event flow, make sure it finishes before + * we start accessing the list and cleaning up the work + */ + kref_put(&edev->rdma_info.refcnt, qede_rdma_complete_event); + wait_for_completion(&edev->rdma_info.event_comp); + qede_rdma_cleanup_event(edev); destroy_workqueue(edev->rdma_info.rdma_wq); } @@ -310,15 +328,24 @@ static void qede_rdma_add_event(struct q if (!edev->rdma_info.qedr_dev) return;
+ /* We don't want the cleanup flow to start while we're allocating and + * scheduling the work + */ + if (!kref_get_unless_zero(&edev->rdma_info.refcnt)) + return; /* already being destroyed */ + event_node = qede_rdma_get_free_event_node(edev); if (!event_node) - return; + goto out;
event_node->event = event; event_node->ptr = edev;
INIT_WORK(&event_node->work, qede_rdma_handle_event); queue_work(edev->rdma_info.rdma_wq, &event_node->work); + +out: + kref_put(&edev->rdma_info.refcnt, qede_rdma_complete_event); }
void qede_rdma_dev_event_open(struct qede_dev *edev)
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323 ]
This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe
Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc"). I can't find any working solution other than a plain revert.
This will introduce some minor performance regressions for pfifo_fast qdisc. I plan to address them in net-next with more indirect call wrapper boilerplate for qdiscs.
Reported-by: Ahmad Fatoum a.fatoum@pengutronix.de Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc") Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 22 ++-------------------- 1 file changed, 2 insertions(+), 20 deletions(-)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -3607,26 +3607,8 @@ static inline int __dev_xmit_skb(struct qdisc_calculate_pkt_len(skb, q);
if (q->flags & TCQ_F_NOLOCK) { - if ((q->flags & TCQ_F_CAN_BYPASS) && READ_ONCE(q->empty) && - qdisc_run_begin(q)) { - if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, - &q->state))) { - __qdisc_drop(skb, &to_free); - rc = NET_XMIT_DROP; - goto end_run; - } - qdisc_bstats_cpu_update(q, skb); - - rc = NET_XMIT_SUCCESS; - if (sch_direct_xmit(skb, q, dev, txq, NULL, true)) - __qdisc_run(q); - -end_run: - qdisc_run_end(q); - } else { - rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; - qdisc_run(q); - } + rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; + qdisc_run(q);
if (unlikely(to_free)) kfree_skb_list(to_free);
From: Willem de Bruijn willemb@google.com
[ Upstream commit 303d0403b8c25e994e4a6e45389e173cf8706fb5 ]
As of the below commit, udp sockets bound to a specific address can coexist with one bound to the any addr for the same port.
The commit also phased out the use of socket hashing based only on port (hslot), in favor of always hashing on {addr, port} (hslot2).
The change broke the following behavior with disconnect (AF_UNSPEC):
server binds to 0.0.0.0:1337 server connects to 127.0.0.1:80 server disconnects client connects to 127.0.0.1:1337 client sends "hello" server reads "hello" // times out, packet did not find sk
On connect the server acquires a specific source addr suitable for routing to its destination. On disconnect it reverts to the any addr.
The connect call triggers a rehash to a different hslot2. On disconnect, add the same to return to the original hslot2.
Skip this step if the socket is going to be unhashed completely.
Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address") Reported-by: Pavel Roskin plroskin@gmail.com Signed-off-by: Willem de Bruijn willemb@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/udp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1856,8 +1856,12 @@ int __udp_disconnect(struct sock *sk, in inet->inet_dport = 0; sock_rps_reset_rxhash(sk); sk->sk_bound_dev_if = 0; - if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) + if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) { inet_reset_saddr(sk); + if (sk->sk_prot->rehash && + (sk->sk_userlocks & SOCK_BINDPORT_LOCK)) + sk->sk_prot->rehash(sk); + }
if (!(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) { sk->sk_prot->unhash(sk);
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 245709ec8be89af46ea7ef0444c9c80913999d99 ]
When T2 timer is to be stopped, the asoc should also be deleted, otherwise, there will be no chance to call sctp_association_free and the asoc could last in memory forever.
However, in sctp_sf_shutdown_sent_abort(), after adding the cmd SCTP_CMD_TIMER_STOP for T2 timer, it may return error due to the format error from __sctp_sf_do_9_1_abort() and miss adding SCTP_CMD_ASSOC_FAILED where the asoc will be deleted.
This patch is to fix it by moving the format error check out of __sctp_sf_do_9_1_abort(), and do it before adding the cmd SCTP_CMD_TIMER_STOP for T2 timer.
Thanks Hangbin for reporting this issue by the fuzz testing.
v1->v2: - improve the comment in the code as Marcelo's suggestion.
Fixes: 96ca468b86b0 ("sctp: check invalid value of length parameter in error cause") Reported-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sctp/sm_statefuns.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-)
--- a/net/sctp/sm_statefuns.c +++ b/net/sctp/sm_statefuns.c @@ -170,6 +170,16 @@ static inline bool sctp_chunk_length_val return true; }
+/* Check for format error in an ABORT chunk */ +static inline bool sctp_err_chunk_valid(struct sctp_chunk *chunk) +{ + struct sctp_errhdr *err; + + sctp_walk_errors(err, chunk->chunk_hdr); + + return (void *)err == (void *)chunk->chunk_end; +} + /********************************************************** * These are the state functions for handling chunk events. **********************************************************/ @@ -2255,6 +2265,9 @@ enum sctp_disposition sctp_sf_shutdown_p sctp_bind_addr_state(&asoc->base.bind_addr, &chunk->dest)) return sctp_sf_discard_chunk(net, ep, asoc, type, arg, commands);
+ if (!sctp_err_chunk_valid(chunk)) + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); + return __sctp_sf_do_9_1_abort(net, ep, asoc, type, arg, commands); }
@@ -2298,6 +2311,9 @@ enum sctp_disposition sctp_sf_shutdown_s sctp_bind_addr_state(&asoc->base.bind_addr, &chunk->dest)) return sctp_sf_discard_chunk(net, ep, asoc, type, arg, commands);
+ if (!sctp_err_chunk_valid(chunk)) + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); + /* Stop the T2-shutdown timer. */ sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP, SCTP_TO(SCTP_EVENT_TIMEOUT_T2_SHUTDOWN)); @@ -2565,6 +2581,9 @@ enum sctp_disposition sctp_sf_do_9_1_abo sctp_bind_addr_state(&asoc->base.bind_addr, &chunk->dest)) return sctp_sf_discard_chunk(net, ep, asoc, type, arg, commands);
+ if (!sctp_err_chunk_valid(chunk)) + return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands); + return __sctp_sf_do_9_1_abort(net, ep, asoc, type, arg, commands); }
@@ -2582,16 +2601,8 @@ static enum sctp_disposition __sctp_sf_d
/* See if we have an error cause code in the chunk. */ len = ntohs(chunk->chunk_hdr->length); - if (len >= sizeof(struct sctp_chunkhdr) + sizeof(struct sctp_errhdr)) { - struct sctp_errhdr *err; - - sctp_walk_errors(err, chunk->chunk_hdr); - if ((void *)err != (void *)chunk->chunk_end) - return sctp_sf_pdiscard(net, ep, asoc, type, arg, - commands); - + if (len >= sizeof(struct sctp_chunkhdr) + sizeof(struct sctp_errhdr)) error = ((struct sctp_errhdr *)chunk->skb->data)->cause; - }
sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR, SCTP_ERROR(ECONNRESET)); /* ASSOC_FAILED will DELETE_TCB. */
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit 5567ae4a8d569d996d0d88d0eceb76205e4c7ce5 ]
Especially when bnxt_shutdown() is called during kexec, we need to disable MSIX and disable Bus Master to completely quiesce the device. Make these 2 calls unconditionally in the shutdown method.
Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.") Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11972,10 +11972,10 @@ static void bnxt_shutdown(struct pci_dev dev_close(dev);
bnxt_ulp_shutdown(bp); + bnxt_clear_int_mode(bp); + pci_disable_device(pdev);
if (system_state == SYSTEM_POWER_OFF) { - bnxt_clear_int_mode(bp); - pci_disable_device(pdev); pci_wake_from_d3(pdev, bp->wol); pci_set_power_state(pdev, PCI_D3hot); }
From: Vasundhara Volam vasundhara-v.volam@broadcom.com
[ Upstream commit 8743db4a9acfd51f805ac0c87bcaae92c42d1061 ]
If crashed kernel does not shutdown the NIC properly, PCIe FLR is required in the kdump kernel in order to initialize all the functions properly.
Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.") Signed-off-by: Vasundhara Volam vasundhara-v.volam@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11775,6 +11775,14 @@ static int bnxt_init_one(struct pci_dev if (version_printed++ == 0) pr_info("%s", version);
+ /* Clear any pending DMA transactions from crash kernel + * while loading driver in capture kernel. + */ + if (is_kdump_kernel()) { + pci_clear_master(pdev); + pcie_flr(pdev); + } + max_irqs = bnxt_get_max_irq(pdev); dev = alloc_etherdev_mq(sizeof(*bp), max_irqs); if (!dev)
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 064ff66e2bef84f1153087612032b5b9eab005bd ]
After bond_release(), netdev_update_lockdep_key() should be called. But both ioctl path and attribute path don't call netdev_update_lockdep_key(). This patch adds missing netdev_update_lockdep_key().
Test commands: ip link add bond0 type bond ip link add bond1 type bond ifenslave bond0 bond1 ifenslave -d bond0 bond1 ifenslave bond1 bond0
Splat looks like: [ 29.501182][ T1046] WARNING: possible circular locking dependency detected [ 29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700 [ 29.503442][ T1046] 5.5.0+ #322 Not tainted [ 29.503447][ T1046] ------------------------------------------------------ [ 29.504277][ T1039] softirqs last enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981 [ 29.505443][ T1046] ifenslave/1046 is trying to acquire lock: [ 29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0 [ 29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120 [ 29.511243][ T1046] [ 29.511243][ T1046] but task is already holding lock: [ 29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.514124][ T1046] [ 29.514124][ T1046] which lock already depends on the new lock. [ 29.514124][ T1046] [ 29.517297][ T1046] [ 29.517297][ T1046] the existing dependency chain (in reverse order) is: [ 29.518231][ T1046] [ 29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}: [ 29.519076][ T1046] _raw_spin_lock+0x30/0x70 [ 29.519588][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.520208][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.520862][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.521640][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.522438][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.523251][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.524082][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.524959][ T1046] kernfs_fop_write+0x276/0x410 [ 29.525620][ T1046] vfs_write+0x197/0x4a0 [ 29.526218][ T1046] ksys_write+0x141/0x1d0 [ 29.526818][ T1046] do_syscall_64+0x99/0x4f0 [ 29.527430][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.528265][ T1046] [ 29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}: [ 29.529272][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.529935][ T1046] lock_acquire+0x164/0x3b0 [ 29.530638][ T1046] _raw_spin_lock+0x30/0x70 [ 29.531187][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.531790][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.532451][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.533163][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.533789][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.534595][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.535500][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.536379][ T1046] kernfs_fop_write+0x276/0x410 [ 29.537057][ T1046] vfs_write+0x197/0x4a0 [ 29.537640][ T1046] ksys_write+0x141/0x1d0 [ 29.538251][ T1046] do_syscall_64+0x99/0x4f0 [ 29.538870][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.539659][ T1046] [ 29.539659][ T1046] other info that might help us debug this: [ 29.539659][ T1046] [ 29.540953][ T1046] Possible unsafe locking scenario: [ 29.540953][ T1046] [ 29.541883][ T1046] CPU0 CPU1 [ 29.542540][ T1046] ---- ---- [ 29.543209][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.543880][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.544873][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.545863][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.546525][ T1046] [ 29.546525][ T1046] *** DEADLOCK *** [ 29.546525][ T1046] [ 29.547542][ T1046] 5 locks held by ifenslave/1046: [ 29.548196][ T1046] #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0 [ 29.549248][ T1046] #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410 [ 29.550343][ T1046] #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410 [ 29.551575][ T1046] #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding] [ 29.552819][ T1046] #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.554175][ T1046] [ 29.554175][ T1046] stack backtrace: [ 29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322 [ 29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 29.557064][ T1046] Call Trace: [ 29.557504][ T1046] dump_stack+0x96/0xdb [ 29.558054][ T1046] check_noncircular+0x371/0x450 [ 29.558723][ T1046] ? print_circular_bug.isra.35+0x310/0x310 [ 29.559486][ T1046] ? hlock_class+0x130/0x130 [ 29.560100][ T1046] ? __lock_acquire+0x2d8d/0x3de0 [ 29.560761][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.561366][ T1046] ? register_lock_class+0x14d0/0x14d0 [ 29.562045][ T1046] ? find_held_lock+0x39/0x1d0 [ 29.562641][ T1046] lock_acquire+0x164/0x3b0 [ 29.563199][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.563872][ T1046] _raw_spin_lock+0x30/0x70 [ 29.564464][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.565146][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.565793][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.566487][ T1046] ? bond_update_slave_arr+0x940/0x940 [bonding] [ 29.567279][ T1046] ? bstr_printf+0xc20/0xc20 [ 29.567857][ T1046] ? stack_trace_consume_entry+0x160/0x160 [ 29.568614][ T1046] ? deactivate_slab.isra.77+0x2c5/0x800 [ 29.569320][ T1046] ? check_chain_key+0x236/0x5d0 [ 29.569939][ T1046] ? sscanf+0x93/0xc0 [ 29.570442][ T1046] ? vsscanf+0x1e20/0x1e20 [ 29.571003][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ ... ]
Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 2 ++ drivers/net/bonding/bond_options.c | 2 ++ 2 files changed, 4 insertions(+)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3640,6 +3640,8 @@ static int bond_do_ioctl(struct net_devi case BOND_RELEASE_OLD: case SIOCBONDRELEASE: res = bond_release(bond_dev, slave_dev); + if (!res) + netdev_update_lockdep_key(slave_dev); break; case BOND_SETHWADDR_OLD: case SIOCBONDSETHWADDR: --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -1398,6 +1398,8 @@ static int bond_option_slaves_set(struct case '-': slave_dbg(bond->dev, dev, "Releasing interface\n"); ret = bond_release(bond->dev, dev); + if (!ret) + netdev_update_lockdep_key(dev); break;
default:
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 7151affeef8d527f50b4b68a871fd28bd660023f ]
netdev_next_lower_dev_rcu() will be used to implement a function, which is to walk all lower interfaces. There are already functions that they walk their lower interface. (netdev_walk_all_lower_dev_rcu, netdev_walk_all_lower_dev()). But, there would be cases that couldn't be covered by given netdev_walk_all_lower_dev_{rcu}() function. So, some modules would want to implement own function, which is to walk all lower interfaces.
In the next patch, netdev_next_lower_dev_rcu() will be used. In addition, this patch removes two unused prototypes in netdevice.h.
Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/netdevice.h | 7 +++---- net/core/dev.c | 6 +++--- 2 files changed, 6 insertions(+), 7 deletions(-)
--- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -72,6 +72,8 @@ void netdev_set_default_ethtool_ops(stru #define NET_RX_SUCCESS 0 /* keep 'em coming, baby */ #define NET_RX_DROP 1 /* packet dropped */
+#define MAX_NEST_DEV 8 + /* * Transmit return codes: transmit return codes originate from three different * namespaces: @@ -4323,11 +4325,8 @@ void *netdev_lower_get_next(struct net_d ldev; \ ldev = netdev_lower_get_next(dev, &(iter)))
-struct net_device *netdev_all_lower_get_next(struct net_device *dev, +struct net_device *netdev_next_lower_dev_rcu(struct net_device *dev, struct list_head **iter); -struct net_device *netdev_all_lower_get_next_rcu(struct net_device *dev, - struct list_head **iter); - int netdev_walk_all_lower_dev(struct net_device *dev, int (*fn)(struct net_device *lower_dev, void *data), --- a/net/core/dev.c +++ b/net/core/dev.c @@ -146,7 +146,6 @@ #include "net-sysfs.h"
#define MAX_GRO_SKBS 8 -#define MAX_NEST_DEV 8
/* This should be increased if a protocol with a bigger head is added. */ #define GRO_MAX_HEAD (MAX_HEADER + 128) @@ -7135,8 +7134,8 @@ static int __netdev_walk_all_lower_dev(s return 0; }
-static struct net_device *netdev_next_lower_dev_rcu(struct net_device *dev, - struct list_head **iter) +struct net_device *netdev_next_lower_dev_rcu(struct net_device *dev, + struct list_head **iter) { struct netdev_adjacent *lower;
@@ -7148,6 +7147,7 @@ static struct net_device *netdev_next_lo
return lower->dev; } +EXPORT_SYMBOL(netdev_next_lower_dev_rcu);
static u8 __netdev_upper_depth(struct net_device *dev) {
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit b3e80d44f5b1b470dd9e2dbc6816e63a5c519709 ]
In the "struct bonding", there is stats_lock. This lock protects "bond_stats" in the "struct bonding". bond_stats is updated in the bond_get_stats() and this function would be executed concurrently. So, the lock is needed.
Bonding interfaces would be nested. So, either stats_lock should use dynamic lockdep class key or stats_lock should be used by spin_lock_nested(). In the current code, stats_lock is using a dynamic lockdep class key. But there is no updating stats_lock_key routine So, lockdep warning will occur.
Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0
Splat looks like: [ 38.420603][ T957] 5.5.0+ #394 Not tainted [ 38.421074][ T957] ------------------------------------------------------ [ 38.421837][ T957] ip/957 is trying to acquire lock: [ 38.422399][ T957] ffff888063262cd8 (&bond->stats_lock_key#2){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.423528][ T957] [ 38.423528][ T957] but task is already holding lock: [ 38.424526][ T957] ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.426075][ T957] [ 38.426075][ T957] which lock already depends on the new lock. [ 38.426075][ T957] [ 38.428536][ T957] [ 38.428536][ T957] the existing dependency chain (in reverse order) is: [ 38.429475][ T957] [ 38.429475][ T957] -> #1 (&bond->stats_lock_key){+.+.}: [ 38.430273][ T957] _raw_spin_lock+0x30/0x70 [ 38.430812][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.431451][ T957] dev_get_stats+0x1ec/0x270 [ 38.432088][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.432767][ T957] dev_get_stats+0x1ec/0x270 [ 38.433322][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.433866][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.434474][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.435081][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.436848][ T957] rtnetlink_event+0xcd/0x120 [ 38.437455][ T957] notifier_call_chain+0x90/0x160 [ 38.438067][ T957] netdev_change_features+0x74/0xa0 [ 38.438708][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.439522][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.440225][ T957] do_setlink+0xaab/0x2ef0 [ 38.440786][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.441463][ T957] rtnl_newlink+0x65/0x90 [ 38.442075][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.442774][ T957] netlink_rcv_skb+0x121/0x350 [ 38.443451][ T957] netlink_unicast+0x42e/0x610 [ 38.444282][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.444992][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.445679][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.446365][ T957] __sys_sendmsg+0xc6/0x150 [ 38.447007][ T957] do_syscall_64+0x99/0x4f0 [ 38.447668][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.448538][ T957] [ 38.448538][ T957] -> #0 (&bond->stats_lock_key#2){+.+.}: [ 38.449554][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.450148][ T957] lock_acquire+0x164/0x3b0 [ 38.450711][ T957] _raw_spin_lock+0x30/0x70 [ 38.451292][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.451950][ T957] dev_get_stats+0x1ec/0x270 [ 38.452425][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.453362][ T957] dev_get_stats+0x1ec/0x270 [ 38.453825][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.454390][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.456257][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.456998][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.459351][ T957] rtnetlink_event+0xcd/0x120 [ 38.460086][ T957] notifier_call_chain+0x90/0x160 [ 38.460829][ T957] netdev_change_features+0x74/0xa0 [ 38.461752][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.462705][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.463476][ T957] do_setlink+0xaab/0x2ef0 [ 38.464141][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.464897][ T957] rtnl_newlink+0x65/0x90 [ 38.465522][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.466215][ T957] netlink_rcv_skb+0x121/0x350 [ 38.466895][ T957] netlink_unicast+0x42e/0x610 [ 38.467583][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.468285][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.469202][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.469884][ T957] __sys_sendmsg+0xc6/0x150 [ 38.470587][ T957] do_syscall_64+0x99/0x4f0 [ 38.471245][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.472093][ T957] [ 38.472093][ T957] other info that might help us debug this: [ 38.472093][ T957] [ 38.473438][ T957] Possible unsafe locking scenario: [ 38.473438][ T957] [ 38.474898][ T957] CPU0 CPU1 [ 38.476234][ T957] ---- ---- [ 38.480171][ T957] lock(&bond->stats_lock_key); [ 38.480808][ T957] lock(&bond->stats_lock_key#2); [ 38.481791][ T957] lock(&bond->stats_lock_key); [ 38.482754][ T957] lock(&bond->stats_lock_key#2); [ 38.483416][ T957] [ 38.483416][ T957] *** DEADLOCK *** [ 38.483416][ T957] [ 38.484505][ T957] 3 locks held by ip/957: [ 38.485048][ T957] #0: ffffffffbccf6230 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890 [ 38.486198][ T957] #1: ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.487625][ T957] #2: ffffffffbc9254c0 (rcu_read_lock){....}, at: bond_get_stats+0x5/0x4d0 [bonding] [ 38.488897][ T957] [ 38.488897][ T957] stack backtrace: [ 38.489646][ T957] CPU: 1 PID: 957 Comm: ip Not tainted 5.5.0+ #394 [ 38.490497][ T957] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 38.492810][ T957] Call Trace: [ 38.493219][ T957] dump_stack+0x96/0xdb [ 38.493709][ T957] check_noncircular+0x371/0x450 [ 38.494344][ T957] ? lookup_address+0x60/0x60 [ 38.494923][ T957] ? print_circular_bug.isra.35+0x310/0x310 [ 38.495699][ T957] ? hlock_class+0x130/0x130 [ 38.496334][ T957] ? __lock_acquire+0x2d8d/0x3de0 [ 38.496979][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.497607][ T957] ? register_lock_class+0x14d0/0x14d0 [ 38.498333][ T957] ? check_chain_key+0x236/0x5d0 [ 38.499003][ T957] lock_acquire+0x164/0x3b0 [ 38.499800][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.500706][ T957] _raw_spin_lock+0x30/0x70 [ 38.501435][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.502311][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ ... ]
But, there is another problem. The dynamic lockdep class key is protected by RTNL, but bond_get_stats() would be called outside of RTNL. So, it would use an invalid dynamic lockdep class key.
In order to fix this issue, stats_lock uses spin_lock_nested() instead of a dynamic lockdep key. The bond_get_stats() calls bond_get_lowest_level_rcu() to get the correct nest level value, which will be used by spin_lock_nested(). The "dev->lower_level" indicates lower nest level value, but this value is invalid outside of RTNL. So, bond_get_lowest_level_rcu() returns valid lower nest level value in the RCU critical section. bond_get_lowest_level_rcu() will be work only when LOCKDEP is enabled.
Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/bonding/bond_main.c | 53 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 3 deletions(-)
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3526,6 +3526,47 @@ static void bond_fold_stats(struct rtnl_ } }
+#ifdef CONFIG_LOCKDEP +static int bond_get_lowest_level_rcu(struct net_device *dev) +{ + struct net_device *ldev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int cur = 0, max = 0; + + now = dev; + iter = &dev->adj_list.lower; + + while (1) { + next = NULL; + while (1) { + ldev = netdev_next_lower_dev_rcu(now, &iter); + if (!ldev) + break; + + next = ldev; + niter = &ldev->adj_list.lower; + dev_stack[cur] = now; + iter_stack[cur++] = iter; + if (max <= cur) + max = cur; + break; + } + + if (!next) { + if (!cur) + return max; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; + } + + return max; +} +#endif + static void bond_get_stats(struct net_device *bond_dev, struct rtnl_link_stats64 *stats) { @@ -3533,11 +3574,17 @@ static void bond_get_stats(struct net_de struct rtnl_link_stats64 temp; struct list_head *iter; struct slave *slave; + int nest_level = 0;
- spin_lock(&bond->stats_lock); - memcpy(stats, &bond->bond_stats, sizeof(*stats));
rcu_read_lock(); +#ifdef CONFIG_LOCKDEP + nest_level = bond_get_lowest_level_rcu(bond_dev); +#endif + + spin_lock_nested(&bond->stats_lock, nest_level); + memcpy(stats, &bond->bond_stats, sizeof(*stats)); + bond_for_each_slave_rcu(bond, slave, iter) { const struct rtnl_link_stats64 *new = dev_get_stats(slave->dev, &temp); @@ -3547,10 +3594,10 @@ static void bond_get_stats(struct net_de /* save off the slave stats for the next run */ memcpy(&slave->slave_stats, new, sizeof(*new)); } - rcu_read_unlock();
memcpy(&bond->bond_stats, stats, sizeof(*stats)); spin_unlock(&bond->stats_lock); + rcu_read_unlock(); }
static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd)
From: Benjamin Poirier bpoirier@cumulusnetworks.com
[ Upstream commit e404b8c7cfb31654c9024d497cec58a501501692 ]
After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no longer possible to replace an ECMP-able route by a non ECMP-able route. For example, ip route add 2001:db8::1/128 via fe80::1 dev dummy0 ip route replace 2001:db8::1/128 dev dummy0 does not work as expected.
Tweak the replacement logic so that point 3 in the log of the above commit becomes: 3. If the new route is not ECMP-able, and no matching non-ECMP-able route exists, replace matching ECMP-able route (if any) or add the new route.
We can now summarize the entire replace semantics to: When doing a replace, prefer replacing a matching route of the same "ECMP-able-ness" as the replace argument. If there is no such candidate, fallback to the first route found.
Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier bpoirier@cumulusnetworks.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/ip6_fib.c | 7 ++++--- tools/testing/selftests/net/fib_tests.sh | 6 ++++++ 2 files changed, 10 insertions(+), 3 deletions(-)
--- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1068,8 +1068,7 @@ static int fib6_add_rt2node(struct fib6_ found++; break; } - if (rt_can_ecmp) - fallback_ins = fallback_ins ?: ins; + fallback_ins = fallback_ins ?: ins; goto next_iter; }
@@ -1112,7 +1111,9 @@ next_iter: }
if (fallback_ins && !found) { - /* No ECMP-able route found, replace first non-ECMP one */ + /* No matching route with same ecmp-able-ness found, replace + * first matching route + */ ins = fallback_ins; iter = rcu_dereference_protected(*ins, lockdep_is_held(&rt->fib6_table->tb6_lock)); --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -910,6 +910,12 @@ ipv6_rt_replace_mpath() check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" log_test $? 0 "Multipath with single path via multipath attribute"
+ # multipath with dev-only + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" + log_test $? 0 "Multipath with dev-only" + # route replace fails - invalid nexthop 1 add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3"
From: Benjamin Poirier bpoirier@cumulusnetworks.com
[ Upstream commit afecdb376bd81d7e16578f0cfe82a1aec7ae18f3 ]
When splitting an RTA_MULTIPATH request into multiple routes and adding the second and later components, we must not simply remove NLM_F_REPLACE but instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink message was malformed.
For example, ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \ nexthop via fe80::30:2 dev dummy0 results in the following warnings: [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route
This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to what it would get if the multipath route had been added in multiple netlink operations: ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0
Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier bpoirier@cumulusnetworks.com Reviewed-by: Michal Kubecek mkubecek@suse.cz Reviewed-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/route.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5152,6 +5152,7 @@ static int ip6_route_multipath_add(struc */ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL | NLM_F_REPLACE); + cfg->fc_nlinfo.nlh->nlmsg_flags |= NLM_F_CREATE; nhn++; }
From: Shannon Nelson snelson@pensando.io
[ Upstream commit 68b759a75d6257759d1e37ff13f2d0659baf1112 ]
The fw_status field is only 8 bits, so fix the read. Also, we only want to look at the one status bit, to allow for future use of the other bits, and watch for a bad PCI read.
Fixes: 97ca486592c0 ("ionic: add heartbeat check") Signed-off-by: Shannon Nelson snelson@pensando.io Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/pensando/ionic/ionic_dev.c | 11 +++++++---- drivers/net/ethernet/pensando/ionic/ionic_if.h | 1 + 2 files changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/pensando/ionic/ionic_dev.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.c @@ -103,7 +103,7 @@ int ionic_heartbeat_check(struct ionic * { struct ionic_dev *idev = &ionic->idev; unsigned long hb_time; - u32 fw_status; + u8 fw_status; u32 hb;
/* wait a little more than one second before testing again */ @@ -111,9 +111,12 @@ int ionic_heartbeat_check(struct ionic * if (time_before(hb_time, (idev->last_hb_time + ionic->watchdog_period))) return 0;
- /* firmware is useful only if fw_status is non-zero */ - fw_status = ioread32(&idev->dev_info_regs->fw_status); - if (!fw_status) + /* firmware is useful only if the running bit is set and + * fw_status != 0xff (bad PCI read) + */ + fw_status = ioread8(&idev->dev_info_regs->fw_status); + if (fw_status == 0xff || + !(fw_status & IONIC_FW_STS_F_RUNNING)) return -ENXIO;
/* early FW has no heartbeat, else FW will return non-zero */ --- a/drivers/net/ethernet/pensando/ionic/ionic_if.h +++ b/drivers/net/ethernet/pensando/ionic/ionic_if.h @@ -2348,6 +2348,7 @@ union ionic_dev_info_regs { u8 version; u8 asic_type; u8 asic_rev; +#define IONIC_FW_STS_F_RUNNING 0x1 u8 fw_status; u32 fw_heartbeat; char fw_version[IONIC_DEVINFO_FWVERS_BUFLEN];
From: Eric Dumazet edumazet@google.com
[ Upstream commit e08ad80551b4b33c02f2fce1522f6c227d3976cf ]
netdev_name_node_alt_destroy() does a lookup over all device names of a namespace.
We need to make sure the name belongs to the device of interest, and that we do not destroy its primary name, since we rely on it being not deleted : dev->name_node would indeed point to freed memory.
syzbot report was the following :
BUG: KASAN: use-after-free in dev_net include/linux/netdevice.h:2206 [inline] BUG: KASAN: use-after-free in mld_force_mld_version net/ipv6/mcast.c:1172 [inline] BUG: KASAN: use-after-free in mld_in_v2_mode_only net/ipv6/mcast.c:1180 [inline] BUG: KASAN: use-after-free in mld_in_v1_mode+0x203/0x230 net/ipv6/mcast.c:1190 Read of size 8 at addr ffff88809886c588 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x32 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:641 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 dev_net include/linux/netdevice.h:2206 [inline] mld_force_mld_version net/ipv6/mcast.c:1172 [inline] mld_in_v2_mode_only net/ipv6/mcast.c:1180 [inline] mld_in_v1_mode+0x203/0x230 net/ipv6/mcast.c:1190 mld_send_initial_cr net/ipv6/mcast.c:2083 [inline] mld_dad_timer_expire+0x24/0x230 net/ipv6/mcast.c:2118 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:546 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1146 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 68 73 c5 f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 94 be 59 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 84 be 59 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 de 2a 74 f9 e8 09 RSP: 0018:ffffc90000d3fd68 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff136761a RBX: ffff8880a99fc340 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a99fcbd4 RBP: ffffc90000d3fd98 R08: ffff8880a99fc340 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8aa5a1c0 R14: 0000000000000000 R15: 0000000000000001 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 start_secondary+0x2f4/0x410 arch/x86/kernel/smpboot.c:264 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242
Allocated by task 10229: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:515 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:488 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529 __do_kmalloc_node mm/slab.c:3616 [inline] __kmalloc_node+0x4e/0x70 mm/slab.c:3623 kmalloc_node include/linux/slab.h:578 [inline] kvmalloc_node+0x68/0x100 mm/util.c:574 kvmalloc include/linux/mm.h:645 [inline] kvzalloc include/linux/mm.h:653 [inline] alloc_netdev_mqs+0x98/0xe40 net/core/dev.c:9797 rtnl_create_link+0x22d/0xaf0 net/core/rtnetlink.c:3047 __rtnl_newlink+0xf9f/0x1790 net/core/rtnetlink.c:3309 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 __sys_sendto+0x262/0x380 net/socket.c:1998 __do_compat_sys_socketcall net/compat.c:771 [inline] __se_compat_sys_socketcall net/compat.c:719 [inline] __ia32_compat_sys_socketcall+0x530/0x710 net/compat.c:719 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
Freed by task 10229: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:337 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:476 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 __netdev_name_node_alt_destroy+0x1ff/0x2a0 net/core/dev.c:322 netdev_name_node_alt_destroy+0x57/0x80 net/core/dev.c:334 rtnl_alt_ifname net/core/rtnetlink.c:3518 [inline] rtnl_linkprop.isra.0+0x575/0x6f0 net/core/rtnetlink.c:3567 rtnl_dellinkprop+0x46/0x60 net/core/rtnetlink.c:3588 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __compat_sys_sendmsg net/compat.c:642 [inline] __do_compat_sys_sendmsg net/compat.c:649 [inline] __se_compat_sys_sendmsg net/compat.c:646 [inline] __ia32_compat_sys_sendmsg+0x7a/0xb0 net/compat.c:646 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
The buggy address belongs to the object at ffff88809886c000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1416 bytes inside of 4096-byte region [ffff88809886c000, ffff88809886d000) The buggy address belongs to the page: page:ffffea0002621b00 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 flags: 0xfffe0000010200(slab|head) raw: 00fffe0000010200 ffffea0002610d08 ffffea0002607608 ffff8880aa402000 raw: 0000000000000000 ffff88809886c000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88809886c480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88809886c500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88809886c580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff88809886c600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88809886c680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Cc: Jiri Pirko jiri@mellanox.com Acked-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/dev.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/net/core/dev.c +++ b/net/core/dev.c @@ -330,6 +330,12 @@ int netdev_name_node_alt_destroy(struct name_node = netdev_name_node_lookup(net, name); if (!name_node) return -ENOENT; + /* lookup might have found our primary name or a name belonging + * to another device. + */ + if (name_node == dev->name_node || name_node->dev != dev) + return -EINVAL; + __netdev_name_node_alt_destroy(name_node);
return 0;
From: Alexandre Belloni alexandre.belloni@bootlin.com
[ Upstream commit ac2fcfa9fd26db67d7000677c05629c34cc94564 ]
at91ether_init was handling the phy mode and speed but since the switch to phylink, the NCFGR register got overwritten by macb_mac_config(). The issue is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared but never restored as they conflict with the PAE, GBE and PCSSEL bits.
Add new capability to differentiate between EMAC and the other versions of the IP and use it to set and avoid clearing the relevant bits.
Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC doesn't use any rings/bufffers/queues.
Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/cadence/macb.h | 1 drivers/net/ethernet/cadence/macb_main.c | 60 ++++++++++++++++--------------- 2 files changed, 33 insertions(+), 28 deletions(-)
--- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -645,6 +645,7 @@ #define MACB_CAPS_GEM_HAS_PTP 0x00000040 #define MACB_CAPS_BD_RD_PREFETCH 0x00000080 #define MACB_CAPS_NEEDS_RSTONUBR 0x00000100 +#define MACB_CAPS_MACB_IS_EMAC 0x08000000 #define MACB_CAPS_FIFO_MODE 0x10000000 #define MACB_CAPS_GIGABIT_MODE_AVAILABLE 0x20000000 #define MACB_CAPS_SG_DISABLED 0x40000000 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -533,8 +533,21 @@ static void macb_mac_config(struct phyli old_ctrl = ctrl = macb_or_gem_readl(bp, NCFGR);
/* Clear all the bits we might set later */ - ctrl &= ~(GEM_BIT(GBE) | MACB_BIT(SPD) | MACB_BIT(FD) | MACB_BIT(PAE) | - GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL)); + ctrl &= ~(MACB_BIT(SPD) | MACB_BIT(FD) | MACB_BIT(PAE)); + + if (bp->caps & MACB_CAPS_MACB_IS_EMAC) { + if (state->interface == PHY_INTERFACE_MODE_RMII) + ctrl |= MACB_BIT(RM9200_RMII); + } else { + ctrl &= ~(GEM_BIT(GBE) | GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL)); + + /* We do not support MLO_PAUSE_RX yet */ + if (state->pause & MLO_PAUSE_TX) + ctrl |= MACB_BIT(PAE); + + if (state->interface == PHY_INTERFACE_MODE_SGMII) + ctrl |= GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL); + }
if (state->speed == SPEED_1000) ctrl |= GEM_BIT(GBE); @@ -544,13 +557,6 @@ static void macb_mac_config(struct phyli if (state->duplex) ctrl |= MACB_BIT(FD);
- /* We do not support MLO_PAUSE_RX yet */ - if (state->pause & MLO_PAUSE_TX) - ctrl |= MACB_BIT(PAE); - - if (state->interface == PHY_INTERFACE_MODE_SGMII) - ctrl |= GEM_BIT(SGMIIEN) | GEM_BIT(PCSSEL); - /* Apply the new configuration, if any */ if (old_ctrl ^ ctrl) macb_or_gem_writel(bp, NCFGR, ctrl); @@ -569,9 +575,10 @@ static void macb_mac_link_down(struct ph unsigned int q; u32 ctrl;
- for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) - queue_writel(queue, IDR, - bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP)); + if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) + queue_writel(queue, IDR, + bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP));
/* Disable Rx and Tx */ ctrl = macb_readl(bp, NCR) & ~(MACB_BIT(RE) | MACB_BIT(TE)); @@ -588,17 +595,19 @@ static void macb_mac_link_up(struct phyl struct macb_queue *queue; unsigned int q;
- macb_set_tx_clk(bp->tx_clk, bp->speed, ndev); + if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) { + macb_set_tx_clk(bp->tx_clk, bp->speed, ndev);
- /* Initialize rings & buffers as clearing MACB_BIT(TE) in link down - * cleared the pipeline and control registers. - */ - bp->macbgem_ops.mog_init_rings(bp); - macb_init_buffers(bp); + /* Initialize rings & buffers as clearing MACB_BIT(TE) in link down + * cleared the pipeline and control registers. + */ + bp->macbgem_ops.mog_init_rings(bp); + macb_init_buffers(bp);
- for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) - queue_writel(queue, IER, - bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP)); + for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) + queue_writel(queue, IER, + bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP)); + }
/* Enable Rx and Tx */ macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE)); @@ -4002,7 +4011,6 @@ static int at91ether_init(struct platfor struct net_device *dev = platform_get_drvdata(pdev); struct macb *bp = netdev_priv(dev); int err; - u32 reg;
bp->queues[0].bp = bp;
@@ -4016,11 +4024,7 @@ static int at91ether_init(struct platfor
macb_writel(bp, NCR, 0);
- reg = MACB_BF(CLK, MACB_CLK_DIV32) | MACB_BIT(BIG); - if (bp->phy_interface == PHY_INTERFACE_MODE_RMII) - reg |= MACB_BIT(RM9200_RMII); - - macb_writel(bp, NCFGR, reg); + macb_writel(bp, NCFGR, MACB_BF(CLK, MACB_CLK_DIV32) | MACB_BIT(BIG));
return 0; } @@ -4179,7 +4183,7 @@ static const struct macb_config sama5d4_ };
static const struct macb_config emac_config = { - .caps = MACB_CAPS_NEEDS_RSTONUBR, + .caps = MACB_CAPS_NEEDS_RSTONUBR | MACB_CAPS_MACB_IS_EMAC, .clk_init = at91ether_clk_init, .init = at91ether_init, };
From: Eric Dumazet edumazet@google.com
[ Upstream commit 44bfa9c5e5f06c72540273813e4c66beb5a8c213 ]
Since IFLA_ALT_IFNAME is an NLA_STRING, we have no guarantee it is nul terminated.
We should use nla_strdup() instead of kstrdup(), since this helper will make sure not accessing out-of-bounds data.
BUG: KMSAN: uninit-value in strlen+0x5e/0xa0 lib/string.c:535 CPU: 1 PID: 19157 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 strlen+0x5e/0xa0 lib/string.c:535 kstrdup+0x7f/0x1a0 mm/util.c:59 rtnl_alt_ifname net/core/rtnetlink.c:3495 [inline] rtnl_linkprop+0x85d/0xc00 net/core/rtnetlink.c:3553 rtnl_newlinkprop+0x9d/0xb0 net/core/rtnetlink.c:3568 rtnetlink_rcv_msg+0x1153/0x1570 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b3b9 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ff1c7b1ac78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ff1c7b1b6d4 RCX: 000000000045b3b9 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000003 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000009cb R14: 00000000004cb3dd R15: 000000000075bf2c
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Jiri Pirko jiri@mellanox.com Reported-by: syzbot syzkaller@googlegroups.com Reviewed-by: Jiri Pirko jiri@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/rtnetlink.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
--- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3499,27 +3499,25 @@ static int rtnl_alt_ifname(int cmd, stru if (err) return err;
- alt_ifname = nla_data(attr); + alt_ifname = nla_strdup(attr, GFP_KERNEL); + if (!alt_ifname) + return -ENOMEM; + if (cmd == RTM_NEWLINKPROP) { - alt_ifname = kstrdup(alt_ifname, GFP_KERNEL); - if (!alt_ifname) - return -ENOMEM; err = netdev_name_node_alt_create(dev, alt_ifname); - if (err) { - kfree(alt_ifname); - return err; - } + if (!err) + alt_ifname = NULL; } else if (cmd == RTM_DELLINKPROP) { err = netdev_name_node_alt_destroy(dev, alt_ifname); - if (err) - return err; } else { - WARN_ON(1); - return 0; + WARN_ON_ONCE(1); + err = -EINVAL; }
- *changed = true; - return 0; + kfree(alt_ifname); + if (!err) + *changed = true; + return err; }
static int rtnl_linkprop(int cmd, struct sk_buff *skb, struct nlmsghdr *nlh,
From: Corey Minyard cminyard@mvista.com
[ Upstream commit 6b8526d3abc02c08a2f888e8c20b7ac9e5776dfe ]
In error cases a NULL can be passed to memcpy. The length will always be zero, so it doesn't really matter, but go ahead and check for NULL, anyway, to be more precise and avoid static analysis errors.
Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Corey Minyard cminyard@mvista.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/ipmi/ipmi_ssif.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index 22c6a2e612360..8ac390c2b5147 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -775,10 +775,14 @@ static void msg_done_handler(struct ssif_info *ssif_info, int result, flags = ipmi_ssif_lock_cond(ssif_info, &oflags); msg = ssif_info->curr_msg; if (msg) { + if (data) { + if (len > IPMI_MAX_MSG_LENGTH) + len = IPMI_MAX_MSG_LENGTH; + memcpy(msg->rsp, data, len); + } else { + len = 0; + } msg->rsp_size = len; - if (msg->rsp_size > IPMI_MAX_MSG_LENGTH) - msg->rsp_size = IPMI_MAX_MSG_LENGTH; - memcpy(msg->rsp, data, msg->rsp_size); ssif_info->curr_msg = NULL; }
From: Sean Paul seanpaul@chromium.org
[ Upstream commit db735fc4036bbe1fbe606819b5f0ff26cc76cdff ]
Turning on CONFIG_DMA_API_DEBUG_SG results in the following error:
[ 12.078665] msm ae00000.mdss: DMA-API: mapping sg segment longer than device claims to support [len=3526656] [max=65536] [ 12.089870] WARNING: CPU: 6 PID: 334 at /mnt/host/source/src/third_party/kernel/v4.19/kernel/dma/debug.c:1301 debug_dma_map_sg+0x1dc/0x318 [ 12.102655] Modules linked in: joydev [ 12.106442] CPU: 6 PID: 334 Comm: frecon Not tainted 4.19.0 #2 [ 12.112450] Hardware name: Google Cheza (rev3+) (DT) [ 12.117566] pstate: 60400009 (nZCv daif +PAN -UAO) [ 12.122506] pc : debug_dma_map_sg+0x1dc/0x318 [ 12.126995] lr : debug_dma_map_sg+0x1dc/0x318 [ 12.131487] sp : ffffff800cc3ba80 [ 12.134913] x29: ffffff800cc3ba80 x28: 0000000000000000 [ 12.140395] x27: 0000000000000004 x26: 0000000000000004 [ 12.145868] x25: ffffff8008e55b18 x24: 0000000000000000 [ 12.151337] x23: 00000000ffffffff x22: ffffff800921c000 [ 12.156809] x21: ffffffc0fa75b080 x20: ffffffc0f7195090 [ 12.162280] x19: ffffffc0f1c53280 x18: 0000000000000000 [ 12.167749] x17: 0000000000000000 x16: 0000000000000000 [ 12.173218] x15: 0000000000000000 x14: 0720072007200720 [ 12.178689] x13: 0720072007200720 x12: 0720072007200720 [ 12.184161] x11: 0720072007200720 x10: 0720072007200720 [ 12.189641] x9 : ffffffc0f1fc6b60 x8 : 0000000000000000 [ 12.195110] x7 : ffffff8008132ce0 x6 : 0000000000000000 [ 12.200585] x5 : 0000000000000000 x4 : ffffff8008134734 [ 12.206058] x3 : ffffff800cc3b830 x2 : ffffffc0f1fc6240 [ 12.211532] x1 : 25045a74f48a7400 x0 : 25045a74f48a7400 [ 12.217006] Call trace: [ 12.219535] debug_dma_map_sg+0x1dc/0x318 [ 12.223671] get_pages+0x19c/0x20c [ 12.227177] msm_gem_fault+0x64/0xfc [ 12.230874] __do_fault+0x3c/0x140 [ 12.234383] __handle_mm_fault+0x70c/0xdb8 [ 12.238603] handle_mm_fault+0xac/0xc4 [ 12.242473] do_page_fault+0x1bc/0x3d4 [ 12.246342] do_translation_fault+0x54/0x88 [ 12.250652] do_mem_abort+0x60/0xf0 [ 12.254250] el0_da+0x20/0x24 [ 12.257317] irq event stamp: 67260 [ 12.260828] hardirqs last enabled at (67259): [<ffffff8008132d0c>] console_unlock+0x214/0x608 [ 12.269693] hardirqs last disabled at (67260): [<ffffff8008080e0c>] do_debug_exception+0x5c/0x178 [ 12.278820] softirqs last enabled at (67256): [<ffffff8008081664>] __do_softirq+0x4d4/0x520 [ 12.287510] softirqs last disabled at (67249): [<ffffff80080be574>] irq_exit+0xa8/0x100 [ 12.295742] ---[ end trace e63cfc40c313ffab ]---
The root of the problem is that the default segment size for sgt is (UINT_MAX & PAGE_MASK), and the default segment size for device dma is 64K. As such, if you compare the 2, you would deduce that the sg segment will overflow the device's capacity. In reality, the hardware can accommodate the larger sg segments, it's just not initializing its max segment properly. This patch initializes the max segment size for the mdss device, which gets rid of that pesky warning.
Reported-by: Stephen Boyd swboyd@chromium.org Tested-by: Stephen Boyd swboyd@chromium.org Tested-by: Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org Reviewed-by: Rob Clark robdclark@gmail.com Signed-off-by: Sean Paul seanpaul@chromium.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20200121111813.REPOST.1.I92c66... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_drv.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index c84f0a8b3f2ce..b73fbb65e14b2 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -441,6 +441,14 @@ static int msm_drm_init(struct device *dev, struct drm_driver *drv) if (ret) goto err_msm_uninit;
+ if (!dev->dma_parms) { + dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms), + GFP_KERNEL); + if (!dev->dma_parms) + return -ENOMEM; + } + dma_set_max_seg_size(dev, DMA_BIT_MASK(32)); + msm_gem_shrinker_init(ddev);
switch (get_mdp_ver(pdev)) {
From: Scott Wood swood@redhat.com
[ Upstream commit 488603b815a7514c7009e6fc339d74ed4a30f343 ]
This will be used in the next patch to get a loadavg update from nohz cpus. The delta check is skipped because idle_sched_class doesn't update se.exec_start.
Signed-off-by: Scott Wood swood@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lkml.kernel.org/r/1578736419-14628-2-git-send-email-swood@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b2564d62a0f74..3cb879f4eb9c6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3669,22 +3669,24 @@ static void sched_tick_remote(struct work_struct *work) * statistics and checks timeslices in a time-independent way, regardless * of when exactly it is running. */ - if (idle_cpu(cpu) || !tick_nohz_tick_stopped_cpu(cpu)) + if (!tick_nohz_tick_stopped_cpu(cpu)) goto out_requeue;
rq_lock_irq(rq, &rf); curr = rq->curr; - if (is_idle_task(curr) || cpu_is_offline(cpu)) + if (cpu_is_offline(cpu)) goto out_unlock;
update_rq_clock(rq); - delta = rq_clock_task(rq) - curr->se.exec_start;
- /* - * Make sure the next tick runs within a reasonable - * amount of time. - */ - WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); + if (!is_idle_task(curr)) { + /* + * Make sure the next tick runs within a reasonable + * amount of time. + */ + delta = rq_clock_task(rq) - curr->se.exec_start; + WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3); + } curr->sched_class->task_tick(rq, curr, 0);
out_unlock:
From: Peter Zijlstra (Intel) peterz@infradead.org
[ Upstream commit ebc0f83c78a2d26384401ecf2d2fa48063c0ee27 ]
The way loadavg is tracked during nohz only pays attention to the load upon entering nohz. This can be particularly noticeable if full nohz is entered while non-idle, and then the cpu goes idle and stays that way for a long time.
Use the remote tick to ensure that full nohz cpus report their deltas within a reasonable time.
[ swood: Added changelog and removed recheck of stopped tick. ]
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Scott Wood swood@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lkml.kernel.org/r/1578736419-14628-3-git-send-email-swood@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sched/nohz.h | 2 ++ kernel/sched/core.c | 4 +++- kernel/sched/loadavg.c | 33 +++++++++++++++++++++++---------- 3 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h index 1abe91ff6e4a2..6d67e9a5af6bb 100644 --- a/include/linux/sched/nohz.h +++ b/include/linux/sched/nohz.h @@ -15,9 +15,11 @@ static inline void nohz_balance_enter_idle(int cpu) { }
#ifdef CONFIG_NO_HZ_COMMON void calc_load_nohz_start(void); +void calc_load_nohz_remote(struct rq *rq); void calc_load_nohz_stop(void); #else static inline void calc_load_nohz_start(void) { } +static inline void calc_load_nohz_remote(struct rq *rq) { } static inline void calc_load_nohz_stop(void) { } #endif /* CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3cb879f4eb9c6..65ed821335dd5 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3677,6 +3677,7 @@ static void sched_tick_remote(struct work_struct *work) if (cpu_is_offline(cpu)) goto out_unlock;
+ curr = rq->curr; update_rq_clock(rq);
if (!is_idle_task(curr)) { @@ -3689,10 +3690,11 @@ static void sched_tick_remote(struct work_struct *work) } curr->sched_class->task_tick(rq, curr, 0);
+ calc_load_nohz_remote(rq); out_unlock: rq_unlock_irq(rq, &rf); - out_requeue: + /* * Run the remote tick once per second (1Hz). This arbitrary * frequency is large enough to avoid overload but short enough diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c index 28a516575c181..de22da666ac73 100644 --- a/kernel/sched/loadavg.c +++ b/kernel/sched/loadavg.c @@ -231,16 +231,11 @@ static inline int calc_load_read_idx(void) return calc_load_idx & 1; }
-void calc_load_nohz_start(void) +static void calc_load_nohz_fold(struct rq *rq) { - struct rq *this_rq = this_rq(); long delta;
- /* - * We're going into NO_HZ mode, if there's any pending delta, fold it - * into the pending NO_HZ delta. - */ - delta = calc_load_fold_active(this_rq, 0); + delta = calc_load_fold_active(rq, 0); if (delta) { int idx = calc_load_write_idx();
@@ -248,6 +243,24 @@ void calc_load_nohz_start(void) } }
+void calc_load_nohz_start(void) +{ + /* + * We're going into NO_HZ mode, if there's any pending delta, fold it + * into the pending NO_HZ delta. + */ + calc_load_nohz_fold(this_rq()); +} + +/* + * Keep track of the load for NOHZ_FULL, must be called between + * calc_load_nohz_{start,stop}(). + */ +void calc_load_nohz_remote(struct rq *rq) +{ + calc_load_nohz_fold(rq); +} + void calc_load_nohz_stop(void) { struct rq *this_rq = this_rq(); @@ -268,7 +281,7 @@ void calc_load_nohz_stop(void) this_rq->calc_load_update += LOAD_FREQ; }
-static long calc_load_nohz_fold(void) +static long calc_load_nohz_read(void) { int idx = calc_load_read_idx(); long delta = 0; @@ -323,7 +336,7 @@ static void calc_global_nohz(void) } #else /* !CONFIG_NO_HZ_COMMON */
-static inline long calc_load_nohz_fold(void) { return 0; } +static inline long calc_load_nohz_read(void) { return 0; } static inline void calc_global_nohz(void) { }
#endif /* CONFIG_NO_HZ_COMMON */ @@ -346,7 +359,7 @@ void calc_global_load(unsigned long ticks) /* * Fold the 'old' NO_HZ-delta to include all NO_HZ CPUs. */ - delta = calc_load_nohz_fold(); + delta = calc_load_nohz_read(); if (delta) atomic_long_add(delta, &calc_load_tasks);
From: Vincent Guittot vincent.guittot@linaro.org
[ Upstream commit 2a4b03ffc69f2dedc6388e9a6438b5f4c133a40d ]
When a running task is moved on a throttled task group and there is no other task enqueued on the CPU, the task can keep running using 100% CPU whatever the allocated bandwidth for the group and although its cfs rq is throttled. Furthermore, the group entity of the cfs_rq and its parents are not enqueued but only set as curr on their respective cfs_rqs.
We have the following sequence:
sched_move_task -dequeue_task: dequeue task and group_entities. -put_prev_task: put task and group entities. -sched_change_group: move task to new group. -enqueue_task: enqueue only task but not group entities because cfs_rq is throttled. -set_next_task : set task and group_entities as current sched_entity of their cfs_rq.
Another impact is that the root cfs_rq runnable_load_avg at root rq stays null because the group_entities are not enqueued. This situation will stay the same until an "external" event triggers a reschedule. Let trigger it immediately instead.
Signed-off-by: Vincent Guittot vincent.guittot@linaro.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Ben Segall bsegall@google.com Link: https://lkml.kernel.org/r/1579011236-31256-1-git-send-email-vincent.guittot@... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 65ed821335dd5..9e7768dbd92d2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7068,8 +7068,15 @@ void sched_move_task(struct task_struct *tsk)
if (queued) enqueue_task(rq, tsk, queue_flags); - if (running) + if (running) { set_next_task(rq, tsk); + /* + * After changing group, the running task may have joined a + * throttled one but it's still the running task. Trigger a + * resched to make sure that task can still run. + */ + resched_curr(rq); + }
task_rq_unlock(rq, tsk, &rf); }
From: Jeff Moyer jmoyer@redhat.com
[ Upstream commit 96222d53842dfe54869ec4e1b9d4856daf9105a2 ]
fstests generic/471 reports a failure when run with MOUNT_OPTIONS="-o dax". The reason is that the initial pwrite to an empty file with the RWF_NOWAIT flag set does not return -EAGAIN. It turns out that dax_iomap_rw doesn't pass that flag through to iomap_apply.
With this patch applied, generic/471 passes for me.
Signed-off-by: Jeff Moyer jmoyer@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/x49r1z86e1d.fsf@segfault.boston.devel.redhat.com Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/dax.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/dax.c b/fs/dax.c index 1f1f0201cad18..0b0d8819cb1bb 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1207,6 +1207,9 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, lockdep_assert_held(&inode->i_rwsem); }
+ if (iocb->ki_flags & IOCB_NOWAIT) + flags |= IOMAP_NOWAIT; + while (iov_iter_count(iter)) { ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops, iter, dax_iomap_actor);
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit a04564c99bb4a92f805a58e56b2d22cc4978f152 ]
We only use the parsing CRC for checking if a beacon changed, and elements with an ID > 63 cannot be represented in the filter. Thus, like we did before with WMM and Cisco vendor elements, just statically add these forgotten items to the CRC: - WLAN_EID_VHT_OPERATION - WLAN_EID_OPMODE_NOTIF
I guess that in most cases when VHT/HE operation change, the HT operation also changed, and so the change was picked up, but we did notice that pure operating mode notification changes were ignored.
Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/20200131111300.891737-22-luca@coelho.fi [restrict to VHT for the mac80211 branch] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/util.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 32a7a53833c01..739e90555d8b9 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -1063,16 +1063,22 @@ _ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action, elem_parse_failed = true; break; case WLAN_EID_VHT_OPERATION: - if (elen >= sizeof(struct ieee80211_vht_operation)) + if (elen >= sizeof(struct ieee80211_vht_operation)) { elems->vht_operation = (void *)pos; - else - elem_parse_failed = true; + if (calc_crc) + crc = crc32_be(crc, pos - 2, elen + 2); + break; + } + elem_parse_failed = true; break; case WLAN_EID_OPMODE_NOTIF: - if (elen > 0) + if (elen > 0) { elems->opmode_notif = pos; - else - elem_parse_failed = true; + if (calc_crc) + crc = crc32_be(crc, pos - 2, elen + 2); + break; + } + elem_parse_failed = true; break; case WLAN_EID_MESH_ID: elems->mesh_id = pos;
From: Sergey Matyukevich sergey.matyukevich.os@quantenna.com
[ Upstream commit bfb7bac3a8f47100ebe7961bd14e924c96e21ca7 ]
When preparing ethtool drvinfo, check if wiphy driver is defined before dereferencing it. Driver may not exist, e.g. if wiphy is attached to a virtual platform device.
Signed-off-by: Sergey Matyukevich sergey.matyukevich.os@quantenna.com Link: https://lore.kernel.org/r/20200203105644.28875-1-sergey.matyukevich.os@quant... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/ethtool.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/wireless/ethtool.c b/net/wireless/ethtool.c index a9c0f368db5d2..24e18405cdb48 100644 --- a/net/wireless/ethtool.c +++ b/net/wireless/ethtool.c @@ -7,9 +7,13 @@ void cfg80211_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { struct wireless_dev *wdev = dev->ieee80211_ptr; + struct device *pdev = wiphy_dev(wdev->wiphy);
- strlcpy(info->driver, wiphy_dev(wdev->wiphy)->driver->name, - sizeof(info->driver)); + if (pdev->driver) + strlcpy(info->driver, pdev->driver->name, + sizeof(info->driver)); + else + strlcpy(info->driver, "N/A", sizeof(info->driver));
strlcpy(info->version, init_utsname()->release, sizeof(info->version));
From: Stefano Garzarella sgarzare@redhat.com
[ Upstream commit 63e5d81f72af1bf370bf8a6745b0a8d71a7bb37d ]
In io_uring_poll() we must flush overflowed CQ events before to check if there are CQ events available, to avoid missing events.
We call the io_cqring_events() that checks and flushes any overflow and returns the number of CQ events available.
Signed-off-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 678c62782ba3b..de4bd647cd1df 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4970,7 +4970,7 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait) if (READ_ONCE(ctx->rings->sq.tail) - ctx->cached_sq_head != ctx->rings->sq_ring_entries) mask |= EPOLLOUT | EPOLLWRNORM; - if (READ_ONCE(ctx->rings->cq.head) != ctx->cached_cq_tail) + if (io_cqring_events(ctx, false)) mask |= EPOLLIN | EPOLLRDNORM;
return mask;
From: Harald Freudenberger freude@linux.ibm.com
[ Upstream commit fcd98d4002539f1e381916fc1b6648938c1eac76 ]
The internal statistic counters for the total number of requests processed per card and per queue used integers. So they do wrap after a rather huge amount of crypto requests processed. This patch introduces uint64 counters which should hold much longer but still may wrap. The sysfs attributes request_count for card and queue also used only %ld and now display the counter value with %llu.
This is not a security relevant fix. The int overflow which happened is not in any way exploitable as a security breach.
Signed-off-by: Harald Freudenberger freude@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/crypto/ap_bus.h | 4 ++-- drivers/s390/crypto/ap_card.c | 8 ++++---- drivers/s390/crypto/ap_queue.c | 6 +++--- drivers/s390/crypto/zcrypt_api.c | 16 +++++++++------- 4 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h index bb35ba4a8d243..4348fdff1c61e 100644 --- a/drivers/s390/crypto/ap_bus.h +++ b/drivers/s390/crypto/ap_bus.h @@ -162,7 +162,7 @@ struct ap_card { unsigned int functions; /* AP device function bitfield. */ int queue_depth; /* AP queue depth.*/ int id; /* AP card number. */ - atomic_t total_request_count; /* # requests ever for this AP device.*/ + atomic64_t total_request_count; /* # requests ever for this AP device.*/ };
#define to_ap_card(x) container_of((x), struct ap_card, ap_dev.device) @@ -179,7 +179,7 @@ struct ap_queue { enum ap_state state; /* State of the AP device. */ int pendingq_count; /* # requests on pendingq list. */ int requestq_count; /* # requests on requestq list. */ - int total_request_count; /* # requests ever for this AP device.*/ + u64 total_request_count; /* # requests ever for this AP device.*/ int request_timeout; /* Request timeout in jiffies. */ struct timer_list timeout; /* Timer for request timeouts. */ struct list_head pendingq; /* List of message sent to AP queue. */ diff --git a/drivers/s390/crypto/ap_card.c b/drivers/s390/crypto/ap_card.c index 63b4cc6cd7e59..e85bfca1ed163 100644 --- a/drivers/s390/crypto/ap_card.c +++ b/drivers/s390/crypto/ap_card.c @@ -63,13 +63,13 @@ static ssize_t request_count_show(struct device *dev, char *buf) { struct ap_card *ac = to_ap_card(dev); - unsigned int req_cnt; + u64 req_cnt;
req_cnt = 0; spin_lock_bh(&ap_list_lock); - req_cnt = atomic_read(&ac->total_request_count); + req_cnt = atomic64_read(&ac->total_request_count); spin_unlock_bh(&ap_list_lock); - return snprintf(buf, PAGE_SIZE, "%d\n", req_cnt); + return snprintf(buf, PAGE_SIZE, "%llu\n", req_cnt); }
static ssize_t request_count_store(struct device *dev, @@ -83,7 +83,7 @@ static ssize_t request_count_store(struct device *dev, for_each_ap_queue(aq, ac) aq->total_request_count = 0; spin_unlock_bh(&ap_list_lock); - atomic_set(&ac->total_request_count, 0); + atomic64_set(&ac->total_request_count, 0);
return count; } diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c index 37c3bdc3642dc..a317ab4849320 100644 --- a/drivers/s390/crypto/ap_queue.c +++ b/drivers/s390/crypto/ap_queue.c @@ -479,12 +479,12 @@ static ssize_t request_count_show(struct device *dev, char *buf) { struct ap_queue *aq = to_ap_queue(dev); - unsigned int req_cnt; + u64 req_cnt;
spin_lock_bh(&aq->lock); req_cnt = aq->total_request_count; spin_unlock_bh(&aq->lock); - return snprintf(buf, PAGE_SIZE, "%d\n", req_cnt); + return snprintf(buf, PAGE_SIZE, "%llu\n", req_cnt); }
static ssize_t request_count_store(struct device *dev, @@ -676,7 +676,7 @@ void ap_queue_message(struct ap_queue *aq, struct ap_message *ap_msg) list_add_tail(&ap_msg->list, &aq->requestq); aq->requestq_count++; aq->total_request_count++; - atomic_inc(&aq->card->total_request_count); + atomic64_inc(&aq->card->total_request_count); /* Send/receive as many request from the queue as possible. */ ap_wait(ap_sm_event_loop(aq, AP_EVENT_POLL)); spin_unlock_bh(&aq->lock); diff --git a/drivers/s390/crypto/zcrypt_api.c b/drivers/s390/crypto/zcrypt_api.c index 9157e728a362d..7fa0262e91af0 100644 --- a/drivers/s390/crypto/zcrypt_api.c +++ b/drivers/s390/crypto/zcrypt_api.c @@ -605,8 +605,8 @@ static inline bool zcrypt_card_compare(struct zcrypt_card *zc, weight += atomic_read(&zc->load); pref_weight += atomic_read(&pref_zc->load); if (weight == pref_weight) - return atomic_read(&zc->card->total_request_count) > - atomic_read(&pref_zc->card->total_request_count); + return atomic64_read(&zc->card->total_request_count) > + atomic64_read(&pref_zc->card->total_request_count); return weight > pref_weight; }
@@ -1216,11 +1216,12 @@ static void zcrypt_qdepth_mask(char qdepth[], size_t max_adapters) spin_unlock(&zcrypt_list_lock); }
-static void zcrypt_perdev_reqcnt(int reqcnt[], size_t max_adapters) +static void zcrypt_perdev_reqcnt(u32 reqcnt[], size_t max_adapters) { struct zcrypt_card *zc; struct zcrypt_queue *zq; int card; + u64 cnt;
memset(reqcnt, 0, sizeof(int) * max_adapters); spin_lock(&zcrypt_list_lock); @@ -1232,8 +1233,9 @@ static void zcrypt_perdev_reqcnt(int reqcnt[], size_t max_adapters) || card >= max_adapters) continue; spin_lock(&zq->queue->lock); - reqcnt[card] = zq->queue->total_request_count; + cnt = zq->queue->total_request_count; spin_unlock(&zq->queue->lock); + reqcnt[card] = (cnt < UINT_MAX) ? (u32) cnt : UINT_MAX; } } local_bh_enable(); @@ -1411,9 +1413,9 @@ static long zcrypt_unlocked_ioctl(struct file *filp, unsigned int cmd, return 0; } case ZCRYPT_PERDEV_REQCNT: { - int *reqcnt; + u32 *reqcnt;
- reqcnt = kcalloc(AP_DEVICES, sizeof(int), GFP_KERNEL); + reqcnt = kcalloc(AP_DEVICES, sizeof(u32), GFP_KERNEL); if (!reqcnt) return -ENOMEM; zcrypt_perdev_reqcnt(reqcnt, AP_DEVICES); @@ -1470,7 +1472,7 @@ static long zcrypt_unlocked_ioctl(struct file *filp, unsigned int cmd, } case Z90STAT_PERDEV_REQCNT: { /* the old ioctl supports only 64 adapters */ - int reqcnt[MAX_ZDEV_CARDIDS]; + u32 reqcnt[MAX_ZDEV_CARDIDS];
zcrypt_perdev_reqcnt(reqcnt, MAX_ZDEV_CARDIDS); if (copy_to_user((int __user *) arg, reqcnt, sizeof(reqcnt)))
From: Bjørn Mork bjorn@mork.no
[ Upstream commit 88bf54603f6f2c137dfee1abf6436ceac3528d2d ]
Commit f25e1392fdb5 removed the support for the pre-production variant of the Dell DW5821e to avoid probing another USB interface unnecessarily. However, the pre-production samples are found in the wild, and this lack of support is causing problems for users of such samples. It is therefore necessary to support both variants.
Matching on both interfaces 0 and 1 is not expected to cause any problem with either variant, as only the QMI function will be probed successfully on either. Interface 1 will be rejected based on the HID class for the production variant:
T: Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 2 P: Vendor=413c ProdID=81d7 Rev=03.18 S: Manufacturer=DELL S: Product=DW5821e Snapdragon X20 LTE S: SerialNumber=0123456789ABCDEF C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
And interface 0 will be rejected based on too few endpoints for the pre-production variant:
T: Bus=01 Lev=02 Prnt=02 Port=03 Cnt=03 Dev#= 7 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 2 P: Vendor=413c ProdID=81d7 Rev= 3.18 S: Manufacturer=DELL S: Product=DW5821e Snapdragon X20 LTE S: SerialNumber=0123456789ABCDEF C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver= I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
Fixes: f25e1392fdb5 ("qmi_wwan: fix interface number for DW5821e production firmware") Link: https://whrl.pl/Rf0vNk Reported-by: Lars Melin larsm17@gmail.com Cc: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 9485c8d1de8a3..839cef720cf64 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1363,6 +1363,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x413c, 0x81b6, 8)}, /* Dell Wireless 5811e */ {QMI_FIXED_INTF(0x413c, 0x81b6, 10)}, /* Dell Wireless 5811e */ {QMI_FIXED_INTF(0x413c, 0x81d7, 0)}, /* Dell Wireless 5821e */ + {QMI_FIXED_INTF(0x413c, 0x81d7, 1)}, /* Dell Wireless 5821e preproduction config */ {QMI_FIXED_INTF(0x413c, 0x81e0, 0)}, /* Dell Wireless 5821e with eSIM support*/ {QMI_FIXED_INTF(0x03f0, 0x4e1d, 8)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */ {QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */
From: Bjørn Mork bjorn@mork.no
[ Upstream commit 00516d13d4cfa56ce39da144db2dbf08b09b9357 ]
We have been using the fact that the QMI and DIAG functions usually are the only ones with class/subclass/protocol being ff/ff/ff on Quectel modems. This has allowed us to match the QMI function without knowing the exact interface number, which can vary depending on firmware configuration.
The ability to silently reject the DIAG function, which is usually handled by the option driver, is important for this method to work. This is done based on the knowledge that it has exactly 2 bulk endpoints. QMI function control interfaces will have either 3 or 1 endpoint. This rule is universal so the quirk condition can be removed.
The fixed layouts known from the Gobi1k and Gobi2k modems have been gradually replaced by more dynamic layouts, and many vendors now use configurable layouts without changing device IDs. Renaming the class/subclass/protocol matching macro makes it more obvious that this is now not Quectel specific anymore.
Cc: Kristian Evensen kristian.evensen@gmail.com Cc: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/qmi_wwan.c | 42 ++++++++++++++------------------------ 1 file changed, 15 insertions(+), 27 deletions(-)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 839cef720cf64..3b7a3b8a5e067 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -61,7 +61,6 @@ enum qmi_wwan_flags {
enum qmi_wwan_quirks { QMI_WWAN_QUIRK_DTR = 1 << 0, /* needs "set DTR" request */ - QMI_WWAN_QUIRK_QUECTEL_DYNCFG = 1 << 1, /* check num. endpoints */ };
struct qmimux_hdr { @@ -916,16 +915,6 @@ static const struct driver_info qmi_wwan_info_quirk_dtr = { .data = QMI_WWAN_QUIRK_DTR, };
-static const struct driver_info qmi_wwan_info_quirk_quectel_dyncfg = { - .description = "WWAN/QMI device", - .flags = FLAG_WWAN | FLAG_SEND_ZLP, - .bind = qmi_wwan_bind, - .unbind = qmi_wwan_unbind, - .manage_power = qmi_wwan_manage_power, - .rx_fixup = qmi_wwan_rx_fixup, - .data = QMI_WWAN_QUIRK_DTR | QMI_WWAN_QUIRK_QUECTEL_DYNCFG, -}; - #define HUAWEI_VENDOR_ID 0x12D1
/* map QMI/wwan function by a fixed interface number */ @@ -946,14 +935,18 @@ static const struct driver_info qmi_wwan_info_quirk_quectel_dyncfg = { #define QMI_GOBI_DEVICE(vend, prod) \ QMI_FIXED_INTF(vend, prod, 0)
-/* Quectel does not use fixed interface numbers on at least some of their - * devices. We need to check the number of endpoints to ensure that we bind to - * the correct interface. +/* Many devices have QMI and DIAG functions which are distinguishable + * from other vendor specific functions by class, subclass and + * protocol all being 0xff. The DIAG function has exactly 2 endpoints + * and is silently rejected when probed. + * + * This makes it possible to match dynamically numbered QMI functions + * as seen on e.g. many Quectel modems. */ -#define QMI_QUIRK_QUECTEL_DYNCFG(vend, prod) \ +#define QMI_MATCH_FF_FF_FF(vend, prod) \ USB_DEVICE_AND_INTERFACE_INFO(vend, prod, USB_CLASS_VENDOR_SPEC, \ USB_SUBCLASS_VENDOR_SPEC, 0xff), \ - .driver_info = (unsigned long)&qmi_wwan_info_quirk_quectel_dyncfg + .driver_info = (unsigned long)&qmi_wwan_info_quirk_dtr
static const struct usb_device_id products[] = { /* 1. CDC ECM like devices match on the control interface */ @@ -1059,10 +1052,10 @@ static const struct usb_device_id products[] = { USB_DEVICE_AND_INTERFACE_INFO(0x03f0, 0x581d, USB_CLASS_VENDOR_SPEC, 1, 7), .driver_info = (unsigned long)&qmi_wwan_info, }, - {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0125)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ - {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0306)}, /* Quectel EP06/EG06/EM06 */ - {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0512)}, /* Quectel EG12/EM12 */ - {QMI_QUIRK_QUECTEL_DYNCFG(0x2c7c, 0x0800)}, /* Quectel RM500Q-GL */ + {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0125)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ + {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0306)}, /* Quectel EP06/EG06/EM06 */ + {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0512)}, /* Quectel EG12/EM12 */ + {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0800)}, /* Quectel RM500Q-GL */
/* 3. Combined interface devices matching on interface number */ {QMI_FIXED_INTF(0x0408, 0xea42, 4)}, /* Yota / Megafon M100-1 */ @@ -1455,7 +1448,6 @@ static int qmi_wwan_probe(struct usb_interface *intf, { struct usb_device_id *id = (struct usb_device_id *)prod; struct usb_interface_descriptor *desc = &intf->cur_altsetting->desc; - const struct driver_info *info;
/* Workaround to enable dynamic IDs. This disables usbnet * blacklisting functionality. Which, if required, can be @@ -1491,12 +1483,8 @@ static int qmi_wwan_probe(struct usb_interface *intf, * different. Ignore the current interface if the number of endpoints * equals the number for the diag interface (two). */ - info = (void *)id->driver_info; - - if (info->data & QMI_WWAN_QUIRK_QUECTEL_DYNCFG) { - if (desc->bNumEndpoints == 2) - return -ENODEV; - } + if (desc->bNumEndpoints == 2) + return -ENODEV;
return usbnet_probe(intf, id); }
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit cf5b4059ba7197d6cef9c0e024979d178ed8c8ec ]
We want to make sure that we revalidate the dentry if and only if we've done an OPEN by filename. In order to avoid races with remote changes to the directory on the server, we want to save the verifier before calling OPEN. The exception is if the server returned a delegation with our OPEN, as we then know that the filename can't have changed on the server.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Reviewed-by: Benjamin Coddington bcodding@gmail.com Tested-by: Benjamin Coddington bcodding@gmail.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4file.c | 1 - fs/nfs/nfs4proc.c | 18 ++++++++++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 620de905cba97..3f892035c1413 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -86,7 +86,6 @@ nfs4_file_open(struct inode *inode, struct file *filp) if (inode != d_inode(dentry)) goto out_drop;
- nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); nfs_file_set_open_context(filp, ctx); nfs_fscache_open_file(inode, filp); err = 0; diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 6ddb4f517d373..13c2de527718a 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2962,10 +2962,13 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, struct dentry *dentry; struct nfs4_state *state; fmode_t acc_mode = _nfs4_ctx_to_accessmode(ctx); + struct inode *dir = d_inode(opendata->dir); + unsigned long dir_verifier; unsigned int seq; int ret;
seq = raw_seqcount_begin(&sp->so_reclaim_seqcount); + dir_verifier = nfs_save_change_attribute(dir);
ret = _nfs4_proc_open(opendata, ctx); if (ret != 0) @@ -2993,8 +2996,19 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, dput(ctx->dentry); ctx->dentry = dentry = alias; } - nfs_set_verifier(dentry, - nfs_save_change_attribute(d_inode(opendata->dir))); + } + + switch(opendata->o_arg.claim) { + default: + break; + case NFS4_OPEN_CLAIM_NULL: + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + if (!opendata->rpc_done) + break; + if (opendata->o_res.delegation_type != 0) + dir_verifier = nfs_save_change_attribute(dir); + nfs_set_verifier(dentry, dir_verifier); }
/* Parse layoutget results before we check for access */
From: John Garry john.garry@huawei.com
[ Upstream commit 0ca2c0319a7bce0e152b51b866979d62dc261e48 ]
Even though a SMMUv3 PMCG implementation may use an MSI as the form of interrupt source, the kernel would still complain that it does not find the wired (GSIV) interrupt in this case:
root@(none)$ dmesg | grep arm-smmu-v3-pmcg | grep "not found" [ 59.237219] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.8.auto: IRQ index 0 not found [ 59.322841] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.9.auto: IRQ index 0 not found [ 59.422155] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.10.auto: IRQ index 0 not found [ 59.539014] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.11.auto: IRQ index 0 not found [ 59.640329] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.12.auto: IRQ index 0 not found [ 59.743112] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.13.auto: IRQ index 0 not found [ 59.880577] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.14.auto: IRQ index 0 not found [ 60.017528] arm-smmu-v3-pmcg arm-smmu-v3-pmcg.15.auto: IRQ index 0 not found
Use platform_get_irq_optional() to silence the warning.
If neither interrupt source is found, then the driver will still warn that IRQ setup errored and the probe will fail.
Reviewed-by: Robin Murphy robin.murphy@arm.com Signed-off-by: John Garry john.garry@huawei.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/arm_smmuv3_pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index d704eccc548f6..f01a57e5a5f35 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -771,7 +771,7 @@ static int smmu_pmu_probe(struct platform_device *pdev) smmu_pmu->reloc_base = smmu_pmu->reg_base; }
- irq = platform_get_irq(pdev, 0); + irq = platform_get_irq_optional(pdev, 0); if (irq > 0) smmu_pmu->irq = irq;
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit be993e44badc448add6a18d6f12b20615692c4c3 ]
The __patch_text() function already applies __opcode_to_mem_*(), so when __opcode_to_mem_*() is not the identity (BE*), it is applied twice, wrecking the instruction.
Fixes: 42e51f187f86 ("arm/ftrace: Use __patch_text()") Reported-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Tested-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/kernel/ftrace.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/arm/kernel/ftrace.c b/arch/arm/kernel/ftrace.c index bda949fd84e8b..93caf757f1d5d 100644 --- a/arch/arm/kernel/ftrace.c +++ b/arch/arm/kernel/ftrace.c @@ -81,13 +81,10 @@ static int ftrace_modify_code(unsigned long pc, unsigned long old, { unsigned long replaced;
- if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) { + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) old = __opcode_to_mem_thumb32(old); - new = __opcode_to_mem_thumb32(new); - } else { + else old = __opcode_to_mem_arm(old); - new = __opcode_to_mem_arm(new); - }
if (validate) { if (probe_kernel_read(&replaced, (void *)pc, MCOUNT_INSN_SIZE))
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit eda23b387f6c4bb2971ac7e874a09913f533b22c ]
Elkhart Lake also uses Tremont CPU. From the perspective of Intel PMU, there is nothing changed compared with Jacobsville. Share the perf code with Jacobsville.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Andi Kleen ak@linux.intel.com Link: https://lkml.kernel.org/r/1580236279-35492-1-git-send-email-kan.liang@linux.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/core.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 3be51aa06e67e..dff6623804c28 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4765,6 +4765,7 @@ __init int intel_pmu_init(void) break;
case INTEL_FAM6_ATOM_TREMONT_D: + case INTEL_FAM6_ATOM_TREMONT: x86_pmu.late_ack = true; memcpy(hw_cache_event_ids, glp_hw_cache_event_ids, sizeof(hw_cache_event_ids));
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit ecf71fbccb9ac5cb964eb7de59bb9da3755b7885 ]
Tremont is Intel's successor to Goldmont Plus. From the perspective of Intel cstate residency counters, there is nothing changed compared with Goldmont Plus and Goldmont.
Share glm_cstates with Goldmont Plus and Goldmont. Update the comments for Tremont.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Andi Kleen ak@linux.intel.com Link: https://lkml.kernel.org/r/1580236279-35492-2-git-send-email-kan.liang@linux.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/cstate.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index e1daf4151e116..4814c964692cb 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -40,17 +40,18 @@ * Model specific counters: * MSR_CORE_C1_RES: CORE C1 Residency Counter * perf code: 0x00 - * Available model: SLM,AMT,GLM,CNL + * Available model: SLM,AMT,GLM,CNL,TNT * Scope: Core (each processor core has a MSR) * MSR_CORE_C3_RESIDENCY: CORE C3 Residency Counter * perf code: 0x01 * Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,GLM, - * CNL,KBL,CML + * CNL,KBL,CML,TNT * Scope: Core * MSR_CORE_C6_RESIDENCY: CORE C6 Residency Counter * perf code: 0x02 * Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW, - * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL + * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL, + * TNT * Scope: Core * MSR_CORE_C7_RESIDENCY: CORE C7 Residency Counter * perf code: 0x03 @@ -60,17 +61,18 @@ * MSR_PKG_C2_RESIDENCY: Package C2 Residency Counter. * perf code: 0x00 * Available model: SNB,IVB,HSW,BDW,SKL,KNL,GLM,CNL, - * KBL,CML,ICL,TGL + * KBL,CML,ICL,TGL,TNT * Scope: Package (physical package) * MSR_PKG_C3_RESIDENCY: Package C3 Residency Counter. * perf code: 0x01 * Available model: NHM,WSM,SNB,IVB,HSW,BDW,SKL,KNL, - * GLM,CNL,KBL,CML,ICL,TGL + * GLM,CNL,KBL,CML,ICL,TGL,TNT * Scope: Package (physical package) * MSR_PKG_C6_RESIDENCY: Package C6 Residency Counter. * perf code: 0x02 - * Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW - * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL + * Available model: SLM,AMT,NHM,WSM,SNB,IVB,HSW,BDW, + * SKL,KNL,GLM,CNL,KBL,CML,ICL,TGL, + * TNT * Scope: Package (physical package) * MSR_PKG_C7_RESIDENCY: Package C7 Residency Counter. * perf code: 0x03 @@ -87,7 +89,8 @@ * Scope: Package (physical package) * MSR_PKG_C10_RESIDENCY: Package C10 Residency Counter. * perf code: 0x06 - * Available model: HSW ULT,KBL,GLM,CNL,CML,ICL,TGL + * Available model: HSW ULT,KBL,GLM,CNL,CML,ICL,TGL, + * TNT * Scope: Package (physical package) * */ @@ -640,8 +643,9 @@ static const struct x86_cpu_id intel_cstates_match[] __initconst = {
X86_CSTATES_MODEL(INTEL_FAM6_ATOM_GOLDMONT, glm_cstates), X86_CSTATES_MODEL(INTEL_FAM6_ATOM_GOLDMONT_D, glm_cstates), - X86_CSTATES_MODEL(INTEL_FAM6_ATOM_GOLDMONT_PLUS, glm_cstates), + X86_CSTATES_MODEL(INTEL_FAM6_ATOM_TREMONT_D, glm_cstates), + X86_CSTATES_MODEL(INTEL_FAM6_ATOM_TREMONT, glm_cstates),
X86_CSTATES_MODEL(INTEL_FAM6_ICELAKE_L, icl_cstates), X86_CSTATES_MODEL(INTEL_FAM6_ICELAKE, icl_cstates),
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit 0aa0e0d6b34b89649e6b5882a7e025a0eb9bd832 ]
Tremont is Intel's successor to Goldmont Plus. SMI_COUNT MSR is also supported.
Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Andi Kleen ak@linux.intel.com Link: https://lkml.kernel.org/r/1580236279-35492-3-git-send-email-kan.liang@linux.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/msr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c index 6f86650b3f77d..a949f6f55991d 100644 --- a/arch/x86/events/msr.c +++ b/arch/x86/events/msr.c @@ -75,8 +75,9 @@ static bool test_intel(int idx, void *data)
case INTEL_FAM6_ATOM_GOLDMONT: case INTEL_FAM6_ATOM_GOLDMONT_D: - case INTEL_FAM6_ATOM_GOLDMONT_PLUS: + case INTEL_FAM6_ATOM_TREMONT_D: + case INTEL_FAM6_ATOM_TREMONT:
case INTEL_FAM6_XEON_PHI_KNL: case INTEL_FAM6_XEON_PHI_KNM:
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 8e4473bb50a1796c9c32b244e5dbc5ee24ead937 ]
In O_APPEND & O_DIRECT mode, the data from different writers will be possibly overlapping each other since they take the shared lock.
For example, both Writer1 and Writer2 are in O_APPEND and O_DIRECT mode:
Writer1 Writer2
shared_lock() shared_lock() getattr(CAP_SIZE) getattr(CAP_SIZE) iocb->ki_pos = EOF iocb->ki_pos = EOF write(data1) write(data2) shared_unlock() shared_unlock()
The data2 will overlap the data1 from the same file offset, the old EOF.
Switch to exclusive lock instead when O_APPEND is specified.
Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/file.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 11929d2bb594c..cd09e63d682b7 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1418,6 +1418,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) struct ceph_cap_flush *prealloc_cf; ssize_t count, written = 0; int err, want, got; + bool direct_lock = false; loff_t pos; loff_t limit = max(i_size_read(inode), fsc->max_file_size);
@@ -1428,8 +1429,11 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) if (!prealloc_cf) return -ENOMEM;
+ if ((iocb->ki_flags & (IOCB_DIRECT | IOCB_APPEND)) == IOCB_DIRECT) + direct_lock = true; + retry_snap: - if (iocb->ki_flags & IOCB_DIRECT) + if (direct_lock) ceph_start_io_direct(inode); else ceph_start_io_write(inode); @@ -1519,14 +1523,15 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
/* we might need to revert back to that point */ data = *from; - if (iocb->ki_flags & IOCB_DIRECT) { + if (iocb->ki_flags & IOCB_DIRECT) written = ceph_direct_read_write(iocb, &data, snapc, &prealloc_cf); - ceph_end_io_direct(inode); - } else { + else written = ceph_sync_write(iocb, &data, pos, snapc); + if (direct_lock) + ceph_end_io_direct(inode); + else ceph_end_io_write(inode); - } if (written > 0) iov_iter_advance(from, written); ceph_put_snap_context(snapc); @@ -1577,7 +1582,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out_unlocked; out: - if (iocb->ki_flags & IOCB_DIRECT) + if (direct_lock) ceph_end_io_direct(inode); else ceph_end_io_write(inode);
From: Kuninori Morimoto kuninori.morimoto.gx@renesas.com
[ Upstream commit f24667779b5348279e5e4328312a141a730a1fc7 ]
frame-inversion is "flag" not "uint32". This patch fixup it.
Signed-off-by: Kuninori Morimoto kuninori.morimoto.gx@renesas.com Reviewed-by: Patrice Chotard patrice.chotard@st.com Signed-off-by: Patrice Chotard patrice.chotard@st.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/stihxxx-b2120.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/stihxxx-b2120.dtsi b/arch/arm/boot/dts/stihxxx-b2120.dtsi index 60e11045ad762..d051f080e52ec 100644 --- a/arch/arm/boot/dts/stihxxx-b2120.dtsi +++ b/arch/arm/boot/dts/stihxxx-b2120.dtsi @@ -46,7 +46,7 @@ /* DAC */ format = "i2s"; mclk-fs = <256>; - frame-inversion = <1>; + frame-inversion; cpu { sound-dai = <&sti_uni_player2>; };
From: Sung Lee sung.lee@amd.com
[ Upstream commit df36f6cf23ada812930afa8ee76681d4ad307c61 ]
[WHY] The optimized_require flag is needed to set watermarks and clocks lower in certain conditions. This flag is set to true and then set to false while programming front end in dcn20.
[HOW] Do not set the flag to false while disabling plane.
Signed-off-by: Sung Lee sung.lee@amd.com Reviewed-by: Tony Cheng Tony.Cheng@amd.com Acked-by: Bhawanpreet Lakha Bhawanpreet.Lakha@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c index ac8c18fadefce..448bc9b39942f 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c @@ -493,7 +493,6 @@ static void dcn20_plane_atomic_disable(struct dc *dc, struct pipe_ctx *pipe_ctx) dpp->funcs->dpp_dppclk_control(dpp, false, false);
hubp->power_gated = true; - dc->optimized_required = false; /* We're powering off, no need to optimize */
dc->hwss.plane_atomic_power_down(dc, pipe_ctx->plane_res.dpp,
From: Krishnamraju Eraparaju krishna2@chelsio.com
[ Upstream commit 663218a3e715fd9339d143a3e10088316b180f4f ]
Warnings like below can fill up the dmesg while disconnecting RDMA connections. Hence, remove the unwanted WARN_ON.
WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw] RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw] Call Trace: <IRQ> tcp_data_queue+0x226/0xb40 tcp_rcv_established+0x220/0x620 tcp_v4_do_rcv+0x12a/0x1e0 tcp_v4_rcv+0xb05/0xc00 ip_local_deliver_finish+0x69/0x210 ip_local_deliver+0x6b/0xe0 ip_rcv+0x273/0x362 __netif_receive_skb_core+0xb35/0xc30 netif_receive_skb_internal+0x3d/0xb0 napi_gro_frags+0x13b/0x200 t4_ethrx_handler+0x433/0x7d0 [cxgb4] process_responses+0x318/0x580 [cxgb4] napi_rx_handler+0x14/0x100 [cxgb4] net_rx_action+0x149/0x3b0 __do_softirq+0xe3/0x30a irq_exit+0x100/0x110 do_IRQ+0x7f/0xe0 common_interrupt+0xf/0xf </IRQ>
Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju krishna2@chelsio.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/siw/siw_cm.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c index 3bccfef40e7e1..ac86363ce1a24 100644 --- a/drivers/infiniband/sw/siw/siw_cm.c +++ b/drivers/infiniband/sw/siw/siw_cm.c @@ -1225,10 +1225,9 @@ static void siw_cm_llp_data_ready(struct sock *sk) read_lock(&sk->sk_callback_lock);
cep = sk_to_cep(sk); - if (!cep) { - WARN_ON(1); + if (!cep) goto out; - } + siw_dbg_cep(cep, "state: %d\n", cep->state);
switch (cep->state) {
From: Aric Cyr aric.cyr@amd.com
[ Upstream commit 2b63d0ec0daf79ba503fa8bfa25e07dc3da274f3 ]
[Why] Engine can be NULL in some cases, so we must not acquire it.
[How] Check for NULL engine before acquiring.
Signed-off-by: Aric Cyr aric.cyr@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Acked-by: Bhawanpreet Lakha Bhawanpreet.Lakha@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c index 793c0cec407f9..5fcffb29317e3 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c @@ -398,7 +398,7 @@ static bool acquire( { enum gpio_result result;
- if (!is_engine_available(engine)) + if ((engine == NULL) || !is_engine_available(engine)) return false;
result = dal_ddc_open(ddc, GPIO_MODE_HARDWARE,
From: Yongqiang Sun yongqiang.sun@amd.com
[ Upstream commit 6c81917a0485ee2a1be0dc23321ac10ecfd9578b ]
[Why] Underflow is observed when plug in a 4K@60 monitor with 1366x768 eDP due to DPPCLK is too low.
[How] Limit minimum DPPCLK to 100MHz.
Signed-off-by: Yongqiang Sun yongqiang.sun@amd.com Reviewed-by: Eric Yang eric.yang2@amd.com Acked-by: Bhawanpreet Lakha Bhawanpreet.Lakha@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c index dbf063856846e..5f683d118d2aa 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c @@ -149,6 +149,12 @@ void rn_update_clocks(struct clk_mgr *clk_mgr_base, rn_vbios_smu_set_min_deep_sleep_dcfclk(clk_mgr, clk_mgr_base->clks.dcfclk_deep_sleep_khz); }
+ // workaround: Limit dppclk to 100Mhz to avoid lower eDP panel switch to plus 4K monitor underflow. + if (!IS_DIAG_DC(dc->ctx->dce_environment)) { + if (new_clocks->dppclk_khz < 100000) + new_clocks->dppclk_khz = 100000; + } + if (should_set_clock(safe_to_lower, new_clocks->dppclk_khz, clk_mgr->base.clks.dppclk_khz)) { if (clk_mgr->base.clks.dppclk_khz > new_clocks->dppclk_khz) dpp_clock_lowered = true;
From: Isabel Zhang isabel.zhang@amd.com
[ Upstream commit c134c3cabae46a56ab2e1f5e5fa49405e1758838 ]
[Why] Starting from 14nm, the PLL is built into the PHY and the PLL is mapped to PHY on 1 to 1 basis. In the code, the DP port is mapped to a PLL that was not initialized. This causes DP to HDMI dongle to not light up the display.
[How] Initializations added for PLL2 when creating resources.
Signed-off-by: Isabel Zhang isabel.zhang@amd.com Reviewed-by: Eric Yang eric.yang2@amd.com Acked-by: Bhawanpreet Lakha Bhawanpreet.Lakha@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c index 83cda43a1b6b3..77741b18c85b0 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c @@ -57,6 +57,7 @@ #include "dcn20/dcn20_dccg.h" #include "dcn21_hubbub.h" #include "dcn10/dcn10_resource.h" +#include "dce110/dce110_resource.h"
#include "dcn20/dcn20_dwb.h" #include "dcn20/dcn20_mmhubbub.h" @@ -867,6 +868,7 @@ static const struct dc_debug_options debug_defaults_diags = { enum dcn20_clk_src_array_id { DCN20_CLK_SRC_PLL0, DCN20_CLK_SRC_PLL1, + DCN20_CLK_SRC_PLL2, DCN20_CLK_SRC_TOTAL_DCN21 };
@@ -1730,6 +1732,10 @@ static bool construct( dcn21_clock_source_create(ctx, ctx->dc_bios, CLOCK_SOURCE_COMBO_PHY_PLL1, &clk_src_regs[1], false); + pool->base.clock_sources[DCN20_CLK_SRC_PLL2] = + dcn21_clock_source_create(ctx, ctx->dc_bios, + CLOCK_SOURCE_COMBO_PHY_PLL2, + &clk_src_regs[2], false);
pool->base.clk_src_count = DCN20_CLK_SRC_TOTAL_DCN21;
From: Daniel Kolesa daniel@octaforge.org
[ Upstream commit 416611d9b6eebaeae58ed26cc7d23131c69126b1 ]
On PowerPC, the compiler will tag object files with whether they use hard or soft float FP ABI and whether they use 64 or 128-bit long double ABI. On systems with 64-bit long double ABI, a tag will get emitted whenever a double is used, as on those systems a long double is the same as a double. This will prevent linkage as other files are being compiled with hard-float.
On ppc64, this code will never actually get used for the time being, as the only currently existing hardware using it are the Renoir APUs. Therefore, until this is testable and can be fixed properly, at least make sure the build will not fail.
Signed-off-by: Daniel Kolesa daniel@octaforge.org Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile index b864869cc7e3e..6fa7422c51da5 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile @@ -91,6 +91,12 @@ ifdef CONFIG_DRM_AMD_DC_DCN2_1 ############################################################################### CLK_MGR_DCN21 = rn_clk_mgr.o rn_clk_mgr_vbios_smu.o
+# prevent build errors regarding soft-float vs hard-float FP ABI tags +# this code is currently unused on ppc64, as it applies to Renoir APUs only +ifdef CONFIG_PPC64 +CFLAGS_$(AMDDALPATH)/dc/clk_mgr/dcn21/rn_clk_mgr.o := $(call cc-option,-mno-gnu-attribute) +endif + AMD_DAL_CLK_MGR_DCN21 = $(addprefix $(AMDDALPATH)/dc/clk_mgr/dcn21/,$(CLK_MGR_DCN21))
AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN21)
From: Thierry Reding treding@nvidia.com
[ Upstream commit 6f4ecbe284df5f22e386a640d9a4b32cede62030 ]
If only Tegra194 support is enabled, the tegra30_fuse_read() and tegra30_fuse_init() function are not declared and cause a build failure. Add Tegra194 to the preprocessor guard to make sure these functions are available for Tegra194-only builds as well.
Link: https://lore.kernel.org/r/20200203143114.3967295-1-thierry.reding@gmail.com Reported-by: kbuild test robot lkp@intel.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Olof Johansson olof@lixom.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/tegra/fuse/fuse-tegra30.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/tegra/fuse/fuse-tegra30.c b/drivers/soc/tegra/fuse/fuse-tegra30.c index b8daaf5b7291b..efd158b4607cb 100644 --- a/drivers/soc/tegra/fuse/fuse-tegra30.c +++ b/drivers/soc/tegra/fuse/fuse-tegra30.c @@ -36,7 +36,8 @@ defined(CONFIG_ARCH_TEGRA_124_SOC) || \ defined(CONFIG_ARCH_TEGRA_132_SOC) || \ defined(CONFIG_ARCH_TEGRA_210_SOC) || \ - defined(CONFIG_ARCH_TEGRA_186_SOC) + defined(CONFIG_ARCH_TEGRA_186_SOC) || \ + defined(CONFIG_ARCH_TEGRA_194_SOC) static u32 tegra30_fuse_read_early(struct tegra_fuse *fuse, unsigned int offset) { if (WARN_ON(!fuse->base))
From: Brett Creeley brett.creeley@intel.com
[ Upstream commit f27f37a04a69890ac85d9155f03ee2d23b678d8f ]
Commit d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") introduced a necessary change for verifying how queue bitmaps from the iavf driver get validated. Unfortunately, the conditional was reversed. Fix this.
Fixes: d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") Signed-off-by: Brett Creeley brett.creeley@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 69523ac85639e..56b9e445732ba 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2362,7 +2362,7 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg) goto error_param; }
- if (i40e_vc_validate_vqs_bitmaps(vqs)) { + if (!i40e_vc_validate_vqs_bitmaps(vqs)) { aq_ret = I40E_ERR_PARAM; goto error_param; } @@ -2424,7 +2424,7 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg) goto error_param; }
- if (i40e_vc_validate_vqs_bitmaps(vqs)) { + if (!i40e_vc_validate_vqs_bitmaps(vqs)) { aq_ret = I40E_ERR_PARAM; goto error_param; }
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit 91a65b7d3ed8450f31ab717a65dcb5f9ceb5ab02 ]
When ethtool -X is called without an hkey, ena_com_fill_hash_function() is called with key=NULL, which is passed to memcpy causing a crash.
This commit fixes this issue by checking key is not NULL.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index ea62604fdf8ca..e54c44fdcaa73 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -2297,15 +2297,16 @@ int ena_com_fill_hash_function(struct ena_com_dev *ena_dev,
switch (func) { case ENA_ADMIN_TOEPLITZ: - if (key_len > sizeof(hash_key->key)) { - pr_err("key len (%hu) is bigger than the max supported (%zu)\n", - key_len, sizeof(hash_key->key)); - return -EINVAL; + if (key) { + if (key_len != sizeof(hash_key->key)) { + pr_err("key len (%hu) doesn't equal the supported size (%zu)\n", + key_len, sizeof(hash_key->key)); + return -EINVAL; + } + memcpy(hash_key->key, key, key_len); + rss->hash_init_val = init_val; + hash_key->keys_num = key_len >> 2; } - - memcpy(hash_key->key, key, key_len); - rss->hash_init_val = init_val; - hash_key->keys_num = key_len >> 2; break; case ENA_ADMIN_CRC32: rss->hash_init_val = init_val;
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit 2a6e5fa2f4c25b66c763428a3e65363214946931 ]
From the documentation of round_jiffies():
"Rounds a time delta in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds. By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power."
There are 2 parts to this patch: ================================ Part 1: ------- In our case we need timer_service to be called approximately every X=1 seconds, and the exact time does not matter, so using round_jiffies() is the right way to go.
Therefore we add round_jiffies() to the mod_timer() in ena_timer_service().
Part 2: ------- round_jiffies() is used in check_for_missing_keep_alive() when getting the jiffies of the expiration of the keep_alive timeout. Here it is actually a mistake to use round_jiffies() because we want the exact time when keep_alive should expire and not an approximate rounded time, which can cause early, false positive, timeouts.
Therefore we remove round_jiffies() in the calculation of keep_alive_expired() in check_for_missing_keep_alive().
Fixes: 82ef30f13be0 ("net: ena: add hardware hints capability to the driver") Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 948583fdcc286..1c1a41bd11daa 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -3049,8 +3049,8 @@ static void check_for_missing_keep_alive(struct ena_adapter *adapter) if (adapter->keep_alive_timeout == ENA_HW_HINTS_NO_TIMEOUT) return;
- keep_alive_expired = round_jiffies(adapter->last_keep_alive_jiffies + - adapter->keep_alive_timeout); + keep_alive_expired = adapter->last_keep_alive_jiffies + + adapter->keep_alive_timeout; if (unlikely(time_is_before_jiffies(keep_alive_expired))) { netif_err(adapter, drv, adapter->netdev, "Keep alive watchdog timeout.\n"); @@ -3152,7 +3152,7 @@ static void ena_timer_service(struct timer_list *t) }
/* Reset the timer */ - mod_timer(&adapter->timer_service, jiffies + HZ); + mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ)); }
static int ena_calc_max_io_queue_num(struct pci_dev *pdev,
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit cf6d17fde93bdda23c9b02dd5906a12bf8c55209 ]
Current implementation of the driver calls skb_tx_timestamp()to add a software tx timestamp to the skb, however the software-transmit capability is not reported in ethtool -T.
This commit updates the ethtool structure to report the software-transmit capability in ethtool -T using the standard ethtool_op_get_ts_info(). This function reports all software timestamping capabilities (tx and rx), as well as setting phc_index = -1. phc_index is the index of the PTP hardware clock device that will be used for hardware timestamps. Since we don't have such a device in ENA, using the default -1 value is the correct setting.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Ezequiel Lara Gomez ezegomez@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index fc96c66b44cb5..8b56383b64aea 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -812,6 +812,7 @@ static const struct ethtool_ops ena_ethtool_ops = { .set_channels = ena_set_channels, .get_tunable = ena_get_tunable, .set_tunable = ena_set_tunable, + .get_ts_info = ethtool_op_get_ts_info, };
void ena_set_ethtool_ops(struct net_device *netdev)
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit 0d1c3de7b8c78a5e44b74b62ede4a63629f5d811 ]
Bug description: When running "ethtool -x <if_name>" the key shows up as all zeros.
When we use "ethtool -X <if_name> hfunc toeplitz hkey some:random:key" to set the key and then try to retrieve it using "ethtool -x <if_name>" then we return the correct key because we return the one we saved.
Bug cause: We don't fetch the key from the device but instead return the key that we have saved internally which is by default set to zero upon allocation.
Fix: This commit fixes the issue by initializing the key to a random value using netdev_rss_key_fill().
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 15 +++++++++++++++ drivers/net/ethernet/amazon/ena/ena_com.h | 1 + 2 files changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index e54c44fdcaa73..d6b894b06fa30 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -1041,6 +1041,19 @@ static int ena_com_get_feature(struct ena_com_dev *ena_dev, feature_ver); }
+static void ena_com_hash_key_fill_default_key(struct ena_com_dev *ena_dev) +{ + struct ena_admin_feature_rss_flow_hash_control *hash_key = + (ena_dev->rss).hash_key; + + netdev_rss_key_fill(&hash_key->key, sizeof(hash_key->key)); + /* The key is stored in the device in u32 array + * as well as the API requires the key to be passed in this + * format. Thus the size of our array should be divided by 4 + */ + hash_key->keys_num = sizeof(hash_key->key) / sizeof(u32); +} + static int ena_com_hash_key_allocate(struct ena_com_dev *ena_dev) { struct ena_rss *rss = &ena_dev->rss; @@ -2631,6 +2644,8 @@ int ena_com_rss_init(struct ena_com_dev *ena_dev, u16 indr_tbl_log_size) if (unlikely(rc)) goto err_hash_key;
+ ena_com_hash_key_fill_default_key(ena_dev); + rc = ena_com_hash_ctrl_init(ena_dev); if (unlikely(rc)) goto err_hash_ctrl; diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h index 0ce37d54ed108..9b5bd28ed0ac6 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.h +++ b/drivers/net/ethernet/amazon/ena/ena_com.h @@ -44,6 +44,7 @@ #include <linux/spinlock.h> #include <linux/types.h> #include <linux/wait.h> +#include <linux/netdevice.h>
#include "ena_common_defs.h" #include "ena_admin_defs.h"
From: Sameeh Jubran sameehj@amazon.com
[ Upstream commit 6a4f7dc82d1e3abd3feb0c60b5041056fcd9880c ]
Currently we allocate the key whether the device supports setting the key or not. This commit adds a check to the allocation function and handles the error accordingly.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 24 ++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index d6b894b06fa30..6f758ece86f60 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -1057,6 +1057,20 @@ static void ena_com_hash_key_fill_default_key(struct ena_com_dev *ena_dev) static int ena_com_hash_key_allocate(struct ena_com_dev *ena_dev) { struct ena_rss *rss = &ena_dev->rss; + struct ena_admin_feature_rss_flow_hash_control *hash_key; + struct ena_admin_get_feat_resp get_resp; + int rc; + + hash_key = (ena_dev->rss).hash_key; + + rc = ena_com_get_feature_ex(ena_dev, &get_resp, + ENA_ADMIN_RSS_HASH_FUNCTION, + ena_dev->rss.hash_key_dma_addr, + sizeof(ena_dev->rss.hash_key), 0); + if (unlikely(rc)) { + hash_key = NULL; + return -EOPNOTSUPP; + }
rss->hash_key = dma_alloc_coherent(ena_dev->dmadev, sizeof(*rss->hash_key), @@ -2640,11 +2654,15 @@ int ena_com_rss_init(struct ena_com_dev *ena_dev, u16 indr_tbl_log_size) if (unlikely(rc)) goto err_indr_tbl;
+ /* The following function might return unsupported in case the + * device doesn't support setting the key / hash function. We can safely + * ignore this error and have indirection table support only. + */ rc = ena_com_hash_key_allocate(ena_dev); - if (unlikely(rc)) + if (unlikely(rc) && rc != -EOPNOTSUPP) goto err_hash_key; - - ena_com_hash_key_fill_default_key(ena_dev); + else if (rc != -EOPNOTSUPP) + ena_com_hash_key_fill_default_key(ena_dev);
rc = ena_com_hash_ctrl_init(ena_dev); if (unlikely(rc))
From: Sameeh Jubran sameehj@amazon.com
[ Upstream commit 0c8923c0a64fb5d14bebb9a9065d2dc25ac5e600 ]
On old hardware, getting / setting the hash function is not supported while gettting / setting the indirection table is.
This commit enables us to still show the indirection table on older hardwares by setting the hash function and key to NULL.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 8b56383b64aea..8be9df885bf4f 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -648,7 +648,21 @@ static int ena_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, if (rc) return rc;
+ /* We call this function in order to check if the device + * supports getting/setting the hash function. + */ rc = ena_com_get_hash_function(adapter->ena_dev, &ena_func, key); + + if (rc) { + if (rc == -EOPNOTSUPP) { + key = NULL; + hfunc = NULL; + rc = 0; + } + + return rc; + } + if (rc) return rc;
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit 4844470d472d660c26149ad764da2406adb13423 ]
The device receives, stores and retrieves the hash function value as bits and not as their enum value.
The bug: * In ena_com_set_hash_function() we set cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func. (1 << rss->hash_func) * In ena_com_get_hash_function() we retrieve the hash function and store it's bit value in rss->hash_func. (Now the bit value of rss->hash_func is stored in rss->hash_func instead of it's enum value)
The fix: This commit fixes the issue by converting the retrieved hash function values from the device to the matching enum value of the set bit using ffs(). ffs() finds the first set bit's index in a word. Since the function returns 1 for the LSB's index, we need to subtract 1 from the returned value (note that BIT(0) is 1).
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 6f758ece86f60..8ab192cb26b74 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -2370,7 +2370,11 @@ int ena_com_get_hash_function(struct ena_com_dev *ena_dev, if (unlikely(rc)) return rc;
- rss->hash_func = get_resp.u.flow_hash_func.selected_func; + /* ffs() returns 1 in case the lsb is set */ + rss->hash_func = ffs(get_resp.u.flow_hash_func.selected_func); + if (rss->hash_func) + rss->hash_func--; + if (func) *func = rss->hash_func;
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit 92569fd27f5cb0ccbdf7c7d70044b690e89a0277 ]
The indirection table has the indices of the Rx queues. When we store it during set indirection operation, we convert the indices to our internal representation of the indices.
Our internal representation of the indices is: even indices for Tx and uneven indices for Rx, where every Tx/Rx pair are in a consecutive order starting from 0. For example if the driver has 3 queues (3 for Tx and 3 for Rx) then the indices are as follows: 0 1 2 3 4 5 Tx Rx Tx Rx Tx Rx
The BUG: The issue is that when we satisfy a get request for the indirection table, we don't convert the indices back to the original representation.
The FIX: Simply apply the inverse function for the indices of the indirection table after we set it.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 24 ++++++++++++++++++- drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 ++ 2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 8be9df885bf4f..610a7c63e1742 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -636,6 +636,28 @@ static u32 ena_get_rxfh_key_size(struct net_device *netdev) return ENA_HASH_KEY_SIZE; }
+static int ena_indirection_table_get(struct ena_adapter *adapter, u32 *indir) +{ + struct ena_com_dev *ena_dev = adapter->ena_dev; + int i, rc; + + if (!indir) + return 0; + + rc = ena_com_indirect_table_get(ena_dev, indir); + if (rc) + return rc; + + /* Our internal representation of the indices is: even indices + * for Tx and uneven indices for Rx. We need to convert the Rx + * indices to be consecutive + */ + for (i = 0; i < ENA_RX_RSS_TABLE_SIZE; i++) + indir[i] = ENA_IO_RXQ_IDX_TO_COMBINED_IDX(indir[i]); + + return rc; +} + static int ena_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, u8 *hfunc) { @@ -644,7 +666,7 @@ static int ena_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, u8 func; int rc;
- rc = ena_com_indirect_table_get(adapter->ena_dev, indir); + rc = ena_indirection_table_get(adapter, indir); if (rc) return rc;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index bffd778f2ce34..2fe5eeea6b695 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -129,6 +129,8 @@
#define ENA_IO_TXQ_IDX(q) (2 * (q)) #define ENA_IO_RXQ_IDX(q) (2 * (q) + 1) +#define ENA_IO_TXQ_IDX_TO_COMBINED_IDX(q) ((q) / 2) +#define ENA_IO_RXQ_IDX_TO_COMBINED_IDX(q) (((q) - 1) / 2)
#define ENA_MGMNT_IRQ_IDX 0 #define ENA_IO_IRQ_FIRST_IDX 1
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit e3f89f91e98ce07dc0f121a3b70d21aca749ba39 ]
The function ena_com_ind_tbl_convert_from_device() has an overflow bug as explained below. Either way, this function is not needed at all since we don't retrieve the indirection table from the device at any point which means that this conversion is not needed.
The bug: The for loop iterates over all io_sq_queues, when passing the actual number of used queues the io_sq_queues[i].idx equals 0 since they are uninitialized which results in the following code to be executed till the end of the loop:
dev_idx_to_host_tbl[0] = i;
This results dev_idx_to_host_tbl[0] in being equal to ENA_TOTAL_NUM_QUEUES - 1.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 28 ----------------------- 1 file changed, 28 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 8ab192cb26b74..74743fd8a1e0a 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -1281,30 +1281,6 @@ static int ena_com_ind_tbl_convert_to_device(struct ena_com_dev *ena_dev) return 0; }
-static int ena_com_ind_tbl_convert_from_device(struct ena_com_dev *ena_dev) -{ - u16 dev_idx_to_host_tbl[ENA_TOTAL_NUM_QUEUES] = { (u16)-1 }; - struct ena_rss *rss = &ena_dev->rss; - u8 idx; - u16 i; - - for (i = 0; i < ENA_TOTAL_NUM_QUEUES; i++) - dev_idx_to_host_tbl[ena_dev->io_sq_queues[i].idx] = i; - - for (i = 0; i < 1 << rss->tbl_log_size; i++) { - if (rss->rss_ind_tbl[i].cq_idx > ENA_TOTAL_NUM_QUEUES) - return -EINVAL; - idx = (u8)rss->rss_ind_tbl[i].cq_idx; - - if (dev_idx_to_host_tbl[idx] > ENA_TOTAL_NUM_QUEUES) - return -EINVAL; - - rss->host_rss_ind_tbl[i] = dev_idx_to_host_tbl[idx]; - } - - return 0; -} - static void ena_com_update_intr_delay_resolution(struct ena_com_dev *ena_dev, u16 intr_delay_resolution) { @@ -2638,10 +2614,6 @@ int ena_com_indirect_table_get(struct ena_com_dev *ena_dev, u32 *ind_tbl) if (!ind_tbl) return 0;
- rc = ena_com_ind_tbl_convert_from_device(ena_dev); - if (unlikely(rc)) - return rc; - for (i = 0; i < (1 << rss->tbl_log_size); i++) ind_tbl[i] = rss->host_rss_ind_tbl[i];
From: Sameeh Jubran sameehj@amazon.com
[ Upstream commit 886d2089276e40d460731765083a741c5c762461 ]
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool, thus the xor enum was used for supporting crc32.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 610a7c63e1742..4ad69066e7846 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -693,7 +693,7 @@ static int ena_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, func = ETH_RSS_HASH_TOP; break; case ENA_ADMIN_CRC32: - func = ETH_RSS_HASH_XOR; + func = ETH_RSS_HASH_CRC32; break; default: netif_err(adapter, drv, netdev, @@ -739,7 +739,7 @@ static int ena_set_rxfh(struct net_device *netdev, const u32 *indir, case ETH_RSS_HASH_TOP: func = ENA_ADMIN_TOEPLITZ; break; - case ETH_RSS_HASH_XOR: + case ETH_RSS_HASH_CRC32: func = ENA_ADMIN_CRC32; break; default:
From: Arthur Kiyanovski akiyano@amazon.com
[ Upstream commit c207979f5ae10ed70aff1bb13f39f0736973de99 ]
comp_ctx can be NULL in a very rare case when an admin command is executed during the execution of ena_remove().
The bug scenario is as follows:
* ena_destroy_device() sets the comp_ctx to be NULL * An admin command is executed before executing unregister_netdev(), this can still happen because our device can still receive callbacks from the netdev infrastructure such as ethtool commands. * When attempting to access the comp_ctx, the bug occurs since it's set to NULL
Fix: Added a check that comp_ctx is not NULL
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 74743fd8a1e0a..304531332e70a 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -200,6 +200,11 @@ static void comp_ctxt_release(struct ena_com_admin_queue *queue, static struct ena_comp_ctx *get_comp_ctxt(struct ena_com_admin_queue *queue, u16 command_id, bool capture) { + if (unlikely(!queue->comp_ctx)) { + pr_err("Completion context is NULL\n"); + return NULL; + } + if (unlikely(command_id >= queue->q_depth)) { pr_err("command id is larger than the queue size. cmd_id: %u queue size %d\n", command_id, queue->q_depth);
From: Dave Ertman david.m.ertman@intel.com
[ Upstream commit 53977ee47410885e7d4eee87d2c811a48a275150 ]
When switching between FW and SW LLDP mode, the number of configured TLV apps in the driver's DCB configuration is getting out of synch with what lldpad thinks is configured. This is causing a problem when shutting down lldpad. The cleanup is trying to delete TLV apps that are not defined in the kernel.
Since the driver is keeping an accurate account of the apps defined, use the drivers number of apps to determine if there is an app to delete. If the number of apps is <= 1, then do not attempt to delete.
Signed-off-by: Dave Ertman david.m.ertman@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_dcb_nl.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c index d870c1aedc170..926c9772f0860 100644 --- a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c +++ b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c @@ -713,13 +713,13 @@ static int ice_dcbnl_delapp(struct net_device *netdev, struct dcb_app *app) return -EINVAL;
mutex_lock(&pf->tc_mutex); - ret = dcb_ieee_delapp(netdev, app); - if (ret) - goto delapp_out; - old_cfg = &pf->hw.port_info->local_dcbx_cfg;
- if (old_cfg->numapps == 1) + if (old_cfg->numapps <= 1) + goto delapp_out; + + ret = dcb_ieee_delapp(netdev, app); + if (ret) goto delapp_out;
new_cfg = &pf->hw.port_info->desired_dcbx_cfg;
From: Brett Creeley brett.creeley@intel.com
[ Upstream commit 168983a8e19b89efd175661e53faa6246be363a0 ]
Currently we compare the value we are about to write to the Rx tail register with the previous value of next_to_use. The problem with this is we only write tail on 8 descriptor boundaries, but next_to_use is updated whenever we clean Rx descriptors. Fix this by comparing the value we are about to write to tail with the previously written tail value. This will prevent duplicate Rx tail bumps.
Signed-off-by: Brett Creeley brett.creeley@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c index 35bbc4ff603cd..6da048a6ca7c1 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c @@ -10,7 +10,7 @@ */ void ice_release_rx_desc(struct ice_ring *rx_ring, u32 val) { - u16 prev_ntu = rx_ring->next_to_use; + u16 prev_ntu = rx_ring->next_to_use & ~0x7;
rx_ring->next_to_use = val;
From: Bruce Allan bruce.w.allan@intel.com
[ Upstream commit fbf1e1f6988e70287b1bfcad4f655ca96b681929 ]
Logging the firmware/NVM information during driver load is redundant since that information is also available via ethtool. Move the functionality found in ice_nvm_version_str() directly into ice_get_drvinfo() and remove calling the former and logging that info during driver probe. This also gets rid of a bug in ice_nvm_version_str() where it returns a pointer to a buffer which is free'ed when that function exits.
Signed-off-by: Bruce Allan bruce.w.allan@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_ethtool.c | 15 +++++++++++++-- drivers/net/ethernet/intel/ice/ice_lib.c | 19 ------------------- drivers/net/ethernet/intel/ice/ice_lib.h | 2 -- drivers/net/ethernet/intel/ice/ice_main.c | 5 ----- 4 files changed, 13 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index 9ebd93e79aeb6..f956f7bb4ef2d 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -165,13 +165,24 @@ static void ice_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) { struct ice_netdev_priv *np = netdev_priv(netdev); + u8 oem_ver, oem_patch, nvm_ver_hi, nvm_ver_lo; struct ice_vsi *vsi = np->vsi; struct ice_pf *pf = vsi->back; + struct ice_hw *hw = &pf->hw; + u16 oem_build;
strlcpy(drvinfo->driver, KBUILD_MODNAME, sizeof(drvinfo->driver)); strlcpy(drvinfo->version, ice_drv_ver, sizeof(drvinfo->version)); - strlcpy(drvinfo->fw_version, ice_nvm_version_str(&pf->hw), - sizeof(drvinfo->fw_version)); + + /* Display NVM version (from which the firmware version can be + * determined) which contains more pertinent information. + */ + ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch, + &nvm_ver_hi, &nvm_ver_lo); + snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), + "%x.%02x 0x%x %d.%d.%d", nvm_ver_hi, nvm_ver_lo, + hw->nvm.eetrack, oem_ver, oem_build, oem_patch); + strlcpy(drvinfo->bus_info, pci_name(pf->pdev), sizeof(drvinfo->bus_info)); drvinfo->n_priv_flags = ICE_PRIV_FLAG_ARRAY_SIZE; diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index e7449248fab4c..e0e3c6400e4b9 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2647,25 +2647,6 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc) } #endif /* CONFIG_DCB */
-/** - * ice_nvm_version_str - format the NVM version strings - * @hw: ptr to the hardware info - */ -char *ice_nvm_version_str(struct ice_hw *hw) -{ - u8 oem_ver, oem_patch, ver_hi, ver_lo; - static char buf[ICE_NVM_VER_LEN]; - u16 oem_build; - - ice_get_nvm_version(hw, &oem_ver, &oem_build, &oem_patch, &ver_hi, - &ver_lo); - - snprintf(buf, sizeof(buf), "%x.%02x 0x%x %d.%d.%d", ver_hi, ver_lo, - hw->nvm.eetrack, oem_ver, oem_build, oem_patch); - - return buf; -} - /** * ice_update_ring_stats - Update ring statistics * @ring: ring to update diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h index 6e31e30aba394..0d2b1119c0e38 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_lib.h @@ -97,8 +97,6 @@ void ice_vsi_cfg_frame_size(struct ice_vsi *vsi);
u32 ice_intrl_usec_to_reg(u8 intrl, u8 gran);
-char *ice_nvm_version_str(struct ice_hw *hw); - enum ice_status ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 69bff085acf75..b4cbeb4f3177f 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -3241,11 +3241,6 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent) goto err_exit_unroll; }
- dev_info(dev, "firmware %d.%d.%d api %d.%d.%d nvm %s build 0x%08x\n", - hw->fw_maj_ver, hw->fw_min_ver, hw->fw_patch, - hw->api_maj_ver, hw->api_min_ver, hw->api_patch, - ice_nvm_version_str(hw), hw->fw_build); - ice_request_fw(pf);
/* if ice_request_fw fails, ICE_FLAG_ADV_FEATURES bit won't be
From: Bruce Allan bruce.w.allan@intel.com
[ Upstream commit cf8fc2a0863f9ff27ebd2efcdb1f7d378b9fb8a6 ]
After a reset the Unit Load Status bits in the GLNVM_ULD register to check for completion should be 0x7FF before continuing. Update the mask to check (minus the three reserved bits that are always set).
Signed-off-by: Bruce Allan bruce.w.allan@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_common.c | 17 ++++++++++++----- drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 6 ++++++ 2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index fb1d930470c71..cb437a448305e 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -937,7 +937,7 @@ void ice_deinit_hw(struct ice_hw *hw) */ enum ice_status ice_check_reset(struct ice_hw *hw) { - u32 cnt, reg = 0, grst_delay; + u32 cnt, reg = 0, grst_delay, uld_mask;
/* Poll for Device Active state in case a recent CORER, GLOBR, * or EMPR has occurred. The grst delay value is in 100ms units. @@ -959,13 +959,20 @@ enum ice_status ice_check_reset(struct ice_hw *hw) return ICE_ERR_RESET_FAILED; }
-#define ICE_RESET_DONE_MASK (GLNVM_ULD_CORER_DONE_M | \ - GLNVM_ULD_GLOBR_DONE_M) +#define ICE_RESET_DONE_MASK (GLNVM_ULD_PCIER_DONE_M |\ + GLNVM_ULD_PCIER_DONE_1_M |\ + GLNVM_ULD_CORER_DONE_M |\ + GLNVM_ULD_GLOBR_DONE_M |\ + GLNVM_ULD_POR_DONE_M |\ + GLNVM_ULD_POR_DONE_1_M |\ + GLNVM_ULD_PCIER_DONE_2_M) + + uld_mask = ICE_RESET_DONE_MASK;
/* Device is Active; check Global Reset processes are done */ for (cnt = 0; cnt < ICE_PF_RESET_WAIT_COUNT; cnt++) { - reg = rd32(hw, GLNVM_ULD) & ICE_RESET_DONE_MASK; - if (reg == ICE_RESET_DONE_MASK) { + reg = rd32(hw, GLNVM_ULD) & uld_mask; + if (reg == uld_mask) { ice_debug(hw, ICE_DBG_INIT, "Global reset processes done. %d\n", cnt); break; diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h index e8f32350fed29..6f4a70fa39037 100644 --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h @@ -276,8 +276,14 @@ #define GLNVM_GENS_SR_SIZE_S 5 #define GLNVM_GENS_SR_SIZE_M ICE_M(0x7, 5) #define GLNVM_ULD 0x000B6008 +#define GLNVM_ULD_PCIER_DONE_M BIT(0) +#define GLNVM_ULD_PCIER_DONE_1_M BIT(1) #define GLNVM_ULD_CORER_DONE_M BIT(3) #define GLNVM_ULD_GLOBR_DONE_M BIT(4) +#define GLNVM_ULD_POR_DONE_M BIT(5) +#define GLNVM_ULD_POR_DONE_1_M BIT(8) +#define GLNVM_ULD_PCIER_DONE_2_M BIT(9) +#define GLNVM_ULD_PE_DONE_M BIT(10) #define GLPCI_CNF2 0x000BE004 #define GLPCI_CNF2_CACHELINE_SIZE_M BIT(1) #define PF_FUNC_RID 0x0009E880
From: Anirudh Venkataramanan anirudh.venkataramanan@intel.com
[ Upstream commit 9a946843ba5c173e259fef7a035feac994a65b59 ]
Use ice_pf_to_dev(pf) instead of &pf->pdev->dev Use ice_pf_to_dev(vsi->back) instead of &vsi->back->pdev->dev When a pointer to the pf instance is available, use ice_pf_to_dev instead of ice_hw_to_dev
Signed-off-by: Anirudh Venkataramanan anirudh.venkataramanan@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 12 ++++++------ drivers/net/ethernet/intel/ice/ice_dcb_nl.c | 2 +- drivers/net/ethernet/intel/ice/ice_ethtool.c | 2 +- drivers/net/ethernet/intel/ice/ice_lib.c | 14 +++++++------- drivers/net/ethernet/intel/ice/ice_main.c | 16 ++++++++-------- drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 8 ++++---- 6 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 77d6a0291e975..6939c14858b20 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -320,7 +320,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) if (err) return err;
- dev_info(&vsi->back->pdev->dev, "Registered XDP mem model MEM_TYPE_ZERO_COPY on Rx ring %d\n", + dev_info(ice_pf_to_dev(vsi->back), "Registered XDP mem model MEM_TYPE_ZERO_COPY on Rx ring %d\n", ring->q_index); } else { if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) @@ -399,7 +399,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) /* Absolute queue number out of 2K needs to be passed */ err = ice_write_rxq_ctx(hw, &rlan_ctx, pf_q); if (err) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to set LAN Rx queue context for absolute Rx queue %d error: %d\n", pf_q, err); return -EIO; @@ -422,7 +422,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) ice_alloc_rx_bufs_slow_zc(ring, ICE_DESC_UNUSED(ring)) : ice_alloc_rx_bufs(ring, ICE_DESC_UNUSED(ring)); if (err) - dev_info(&vsi->back->pdev->dev, + dev_info(ice_pf_to_dev(vsi->back), "Failed allocate some buffers on %sRx ring %d (pf_q %d)\n", ring->xsk_umem ? "UMEM enabled " : "", ring->q_index, pf_q); @@ -817,13 +817,13 @@ ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src, * queues at the hardware level anyway. */ if (status == ICE_ERR_RESET_ONGOING) { - dev_dbg(&vsi->back->pdev->dev, + dev_dbg(ice_pf_to_dev(vsi->back), "Reset in progress. LAN Tx queues already disabled\n"); } else if (status == ICE_ERR_DOES_NOT_EXIST) { - dev_dbg(&vsi->back->pdev->dev, + dev_dbg(ice_pf_to_dev(vsi->back), "LAN Tx queues do not exist, nothing to disable\n"); } else if (status) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to disable LAN Tx queues, error: %d\n", status); return -ENODEV; } diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c index 926c9772f0860..265cf69b321bf 100644 --- a/drivers/net/ethernet/intel/ice/ice_dcb_nl.c +++ b/drivers/net/ethernet/intel/ice/ice_dcb_nl.c @@ -882,7 +882,7 @@ ice_dcbnl_vsi_del_app(struct ice_vsi *vsi, sapp.protocol = app->prot_id; sapp.priority = app->priority; err = ice_dcbnl_delapp(vsi->netdev, &sapp); - dev_dbg(&vsi->back->pdev->dev, + dev_dbg(ice_pf_to_dev(vsi->back), "Deleting app for VSI idx=%d err=%d sel=%d proto=0x%x, prio=%d\n", vsi->idx, err, app->selector, app->prot_id, app->priority); } diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index f956f7bb4ef2d..9bd166e3dff3d 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -1054,7 +1054,7 @@ ice_set_fecparam(struct net_device *netdev, struct ethtool_fecparam *fecparam) fec = ICE_FEC_NONE; break; default: - dev_warn(&vsi->back->pdev->dev, "Unsupported FEC mode: %d\n", + dev_warn(ice_pf_to_dev(vsi->back), "Unsupported FEC mode: %d\n", fecparam->fec); return -EINVAL; } diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index e0e3c6400e4b9..b43bb51f6067a 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -116,7 +116,7 @@ static void ice_vsi_set_num_desc(struct ice_vsi *vsi) vsi->num_tx_desc = ICE_DFLT_NUM_TX_DESC; break; default: - dev_dbg(&vsi->back->pdev->dev, + dev_dbg(ice_pf_to_dev(vsi->back), "Not setting number of Tx/Rx descriptors for VSI type %d\n", vsi->type); break; @@ -697,7 +697,7 @@ static void ice_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt) vsi->num_txq = tx_count;
if (vsi->type == ICE_VSI_VF && vsi->num_txq != vsi->num_rxq) { - dev_dbg(&vsi->back->pdev->dev, "VF VSI should have same number of Tx and Rx queues. Hence making them equal\n"); + dev_dbg(ice_pf_to_dev(vsi->back), "VF VSI should have same number of Tx and Rx queues. Hence making them equal\n"); /* since there is a chance that num_rxq could have been changed * in the above for loop, make num_txq equal to num_rxq. */ @@ -1306,7 +1306,7 @@ int ice_vsi_cfg_rxqs(struct ice_vsi *vsi)
err = ice_setup_rx_ctx(vsi->rx_rings[i]); if (err) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "ice_setup_rx_ctx failed for RxQ %d, err %d\n", i, err); return err; @@ -1476,7 +1476,7 @@ int ice_vsi_manage_vlan_insertion(struct ice_vsi *vsi)
status = ice_update_vsi(hw, vsi->idx, ctxt, NULL); if (status) { - dev_err(&vsi->back->pdev->dev, "update VSI for VLAN insert failed, err %d aq_err %d\n", + dev_err(ice_pf_to_dev(vsi->back), "update VSI for VLAN insert failed, err %d aq_err %d\n", status, hw->adminq.sq_last_status); ret = -EIO; goto out; @@ -1522,7 +1522,7 @@ int ice_vsi_manage_vlan_stripping(struct ice_vsi *vsi, bool ena)
status = ice_update_vsi(hw, vsi->idx, ctxt, NULL); if (status) { - dev_err(&vsi->back->pdev->dev, "update VSI for VLAN strip failed, ena = %d err %d aq_err %d\n", + dev_err(ice_pf_to_dev(vsi->back), "update VSI for VLAN strip failed, ena = %d err %d aq_err %d\n", ena, status, hw->adminq.sq_last_status); ret = -EIO; goto out; @@ -1696,7 +1696,7 @@ ice_vsi_set_q_vectors_reg_idx(struct ice_vsi *vsi) struct ice_q_vector *q_vector = vsi->q_vectors[i];
if (!q_vector) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to set reg_idx on q_vector %d VSI %d\n", i, vsi->vsi_num); goto clear_reg_idx; @@ -2718,6 +2718,6 @@ ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set) status = ice_remove_mac(&vsi->back->hw, &tmp_add_list);
cfg_mac_fltr_exit: - ice_free_fltr_list(&vsi->back->pdev->dev, &tmp_add_list); + ice_free_fltr_list(ice_pf_to_dev(vsi->back), &tmp_add_list); return status; } diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index b4cbeb4f3177f..c9b35b202639d 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -269,7 +269,7 @@ static int ice_cfg_promisc(struct ice_vsi *vsi, u8 promisc_m, bool set_promisc) */ static int ice_vsi_sync_fltr(struct ice_vsi *vsi) { - struct device *dev = &vsi->back->pdev->dev; + struct device *dev = ice_pf_to_dev(vsi->back); struct net_device *netdev = vsi->netdev; bool promisc_forced_on = false; struct ice_pf *pf = vsi->back; @@ -1364,7 +1364,7 @@ static int ice_force_phys_link_state(struct ice_vsi *vsi, bool link_up) if (vsi->type != ICE_VSI_PF) return 0;
- dev = &vsi->back->pdev->dev; + dev = ice_pf_to_dev(vsi->back);
pi = vsi->port_info;
@@ -1682,7 +1682,7 @@ static int ice_vsi_req_irq_msix(struct ice_vsi *vsi, char *basename) */ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi) { - struct device *dev = &vsi->back->pdev->dev; + struct device *dev = ice_pf_to_dev(vsi->back); int i;
for (i = 0; i < vsi->num_xdp_txq; i++) { @@ -3858,14 +3858,14 @@ ice_set_features(struct net_device *netdev, netdev_features_t features)
/* Don't set any netdev advanced features with device in Safe Mode */ if (ice_is_safe_mode(vsi->back)) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Device is in Safe Mode - not enabling advanced netdev features\n"); return ret; }
/* Do not change setting during reset */ if (ice_is_reset_in_progress(pf->state)) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Device is resetting, changing advanced netdev features temporarily unavailable.\n"); return -EBUSY; } @@ -4408,7 +4408,7 @@ int ice_vsi_setup_tx_rings(struct ice_vsi *vsi) int i, err = 0;
if (!vsi->num_txq) { - dev_err(&vsi->back->pdev->dev, "VSI %d has 0 Tx queues\n", + dev_err(ice_pf_to_dev(vsi->back), "VSI %d has 0 Tx queues\n", vsi->vsi_num); return -EINVAL; } @@ -4439,7 +4439,7 @@ int ice_vsi_setup_rx_rings(struct ice_vsi *vsi) int i, err = 0;
if (!vsi->num_rxq) { - dev_err(&vsi->back->pdev->dev, "VSI %d has 0 Rx queues\n", + dev_err(ice_pf_to_dev(vsi->back), "VSI %d has 0 Rx queues\n", vsi->vsi_num); return -EINVAL; } @@ -4968,7 +4968,7 @@ static int ice_vsi_update_bridge_mode(struct ice_vsi *vsi, u16 bmode)
status = ice_update_vsi(hw, vsi->idx, ctxt, NULL); if (status) { - dev_err(&vsi->back->pdev->dev, "update VSI for bridge mode failed, bmode = %d err %d aq_err %d\n", + dev_err(ice_pf_to_dev(vsi->back), "update VSI for bridge mode failed, bmode = %d err %d aq_err %d\n", bmode, status, hw->adminq.sq_last_status); ret = -EIO; goto out; diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c index edb374296d1f3..e2114f24a19e9 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c @@ -508,7 +508,7 @@ static int ice_vsi_manage_pvid(struct ice_vsi *vsi, u16 vid, bool enable)
status = ice_update_vsi(hw, vsi->idx, ctxt, NULL); if (status) { - dev_info(&vsi->back->pdev->dev, "update VSI for port VLAN failed, err %d aq_err %d\n", + dev_info(ice_pf_to_dev(vsi->back), "update VSI for port VLAN failed, err %d aq_err %d\n", status, hw->adminq.sq_last_status); ret = -EIO; goto out; @@ -2019,7 +2019,7 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg) continue;
if (ice_vsi_ctrl_rx_ring(vsi, true, vf_q_id)) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to enable Rx ring %d on VSI %d\n", vf_q_id, vsi->vsi_num); v_ret = VIRTCHNL_STATUS_ERR_PARAM; @@ -2122,7 +2122,7 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg)
if (ice_vsi_stop_tx_ring(vsi, ICE_NO_RESET, vf->vf_id, ring, &txq_meta)) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Tx ring %d on VSI %d\n", vf_q_id, vsi->vsi_num); v_ret = VIRTCHNL_STATUS_ERR_PARAM; @@ -2149,7 +2149,7 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) continue;
if (ice_vsi_ctrl_rx_ring(vsi, false, vf_q_id)) { - dev_err(&vsi->back->pdev->dev, + dev_err(ice_pf_to_dev(vsi->back), "Failed to stop Rx ring %d on VSI %d\n", vf_q_id, vsi->vsi_num); v_ret = VIRTCHNL_STATUS_ERR_PARAM;
From: Ben Shelton benjamin.h.shelton@intel.com
[ Upstream commit 1d8bd9927234081db15a1d42a7f99505244e3703 ]
Use the correct netif_msg_[tx,rx]_error() function to determine whether to print the MDD event type.
Signed-off-by: Ben Shelton benjamin.h.shelton@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Tested-by: Andrew Bowers andrewx.bowers@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index c9b35b202639d..7f71f06fa819c 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -1235,7 +1235,7 @@ static void ice_handle_mdd_event(struct ice_pf *pf) u16 queue = ((reg & GL_MDET_TX_TCLAN_QNUM_M) >> GL_MDET_TX_TCLAN_QNUM_S);
- if (netif_msg_rx_err(pf)) + if (netif_msg_tx_err(pf)) dev_info(dev, "Malicious Driver Detection event %d on TX queue %d PF# %d VF# %d\n", event, queue, pf_num, vf_num); wr32(hw, GL_MDET_TX_TCLAN, 0xffffffff);
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 7563439adfae153b20331f1567c8b5d0e5cbd8a7 ]
Glauber reports a crash on init on a box he has:
RIP: 0010:__alloc_pages_nodemask+0x132/0x340 Code: 18 01 75 04 41 80 ce 80 89 e8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 48 8b 3c 24 83 e0 01 88 44 24 20 48 85 d2 0f 85 74 01 00 00 <3b> 77 08 0f 82 6b 01 00 00 48 89 7c 24 10 89 ea 48 8b 07 b9 00 02 RSP: 0018:ffffb8be4d0b7c28 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000e8e8 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080 RBP: 0000000000012cc0 R08: 0000000000000000 R09: 0000000000000002 R10: 0000000000000dc0 R11: ffff995c60400100 R12: 0000000000000000 R13: 0000000000012cc0 R14: 0000000000000001 R15: ffff995c60db00f0 FS: 00007f4d115ca900(0000) GS:ffff995c60d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000002088 CR3: 00000017cca66002 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: alloc_slab_page+0x46/0x320 new_slab+0x9d/0x4e0 ___slab_alloc+0x507/0x6a0 ? io_wq_create+0xb4/0x2a0 __slab_alloc+0x1c/0x30 kmem_cache_alloc_node_trace+0xa6/0x260 io_wq_create+0xb4/0x2a0 io_uring_setup+0x97f/0xaa0 ? io_remove_personalities+0x30/0x30 ? io_poll_trigger_evfd+0x30/0x30 do_syscall_64+0x5b/0x1c0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4d116cb1ed
which is due to the 'wqe' and 'worker' allocation being node affine. But it isn't valid to call the node affine allocation if the node isn't online.
Setup structures for even offline nodes, as usual, but skip them in terms of thread setup to not waste resources. If the node isn't online, just alloc memory with NUMA_NO_NODE.
Reported-by: Glauber Costa glauber@scylladb.com Tested-by: Glauber Costa glauber@scylladb.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io-wq.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c index 0dc4bb6de6566..25ffb6685baea 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -666,11 +666,16 @@ static int io_wq_manager(void *data) /* create fixed workers */ refcount_set(&wq->refs, workers_to_create); for_each_node(node) { + if (!node_online(node)) + continue; if (!create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND)) goto err; workers_to_create--; }
+ while (workers_to_create--) + refcount_dec(&wq->refs); + complete(&wq->done);
while (!kthread_should_stop()) { @@ -678,6 +683,9 @@ static int io_wq_manager(void *data) struct io_wqe *wqe = wq->wqes[node]; bool fork_worker[2] = { false, false };
+ if (!node_online(node)) + continue; + spin_lock_irq(&wqe->lock); if (io_wqe_need_worker(wqe, IO_WQ_ACCT_BOUND)) fork_worker[IO_WQ_ACCT_BOUND] = true; @@ -793,7 +801,9 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
list_for_each_entry_rcu(worker, &wqe->all_list, all_list) { if (io_worker_get(worker)) { - ret = func(worker, data); + /* no task if node is/was offline */ + if (worker->task) + ret = func(worker, data); io_worker_release(worker); if (ret) break; @@ -1006,6 +1016,8 @@ void io_wq_flush(struct io_wq *wq) for_each_node(node) { struct io_wqe *wqe = wq->wqes[node];
+ if (!node_online(node)) + continue; init_completion(&data.done); INIT_IO_WORK(&data.work, io_wq_flush_func); data.work.flags |= IO_WQ_WORK_INTERNAL; @@ -1038,12 +1050,15 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
for_each_node(node) { struct io_wqe *wqe; + int alloc_node = node;
- wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, node); + if (!node_online(alloc_node)) + alloc_node = NUMA_NO_NODE; + wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, alloc_node); if (!wqe) goto err; wq->wqes[node] = wqe; - wqe->node = node; + wqe->node = alloc_node; wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; atomic_set(&wqe->acct[IO_WQ_ACCT_BOUND].nr_running, 0); if (wq->user) { @@ -1051,7 +1066,6 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) task_rlimit(current, RLIMIT_NPROC); } atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0); - wqe->node = node; wqe->wq = wq; spin_lock_init(&wqe->lock); INIT_WQ_LIST(&wqe->work_list);
From: Frank Sorenson sorenson@redhat.com
[ Upstream commit f52aa79df43c4509146140de0241bc21a4a3b4c7 ]
A number of the debug statements output file or directory mode in hex. Change these to print using octal.
Signed-off-by: Frank Sorenson sorenson@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/cifsacl.c | 4 ++-- fs/cifs/connect.c | 2 +- fs/cifs/inode.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/cifs/cifsacl.c b/fs/cifs/cifsacl.c index fb41e51dd5743..25704beb9d4ca 100644 --- a/fs/cifs/cifsacl.c +++ b/fs/cifs/cifsacl.c @@ -601,7 +601,7 @@ static void access_flags_to_mode(__le32 ace_flags, int type, umode_t *pmode, ((flags & FILE_EXEC_RIGHTS) == FILE_EXEC_RIGHTS)) *pmode |= (S_IXUGO & (*pbits_to_set));
- cifs_dbg(NOISY, "access flags 0x%x mode now 0x%x\n", flags, *pmode); + cifs_dbg(NOISY, "access flags 0x%x mode now %04o\n", flags, *pmode); return; }
@@ -630,7 +630,7 @@ static void mode_to_access_flags(umode_t mode, umode_t bits_to_use, if (mode & S_IXUGO) *pace_flags |= SET_FILE_EXEC_RIGHTS;
- cifs_dbg(NOISY, "mode: 0x%x, access flags now 0x%x\n", + cifs_dbg(NOISY, "mode: %04o, access flags now 0x%x\n", mode, *pace_flags); return; } diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 0aa3623ae0e16..641825cfa7670 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -4151,7 +4151,7 @@ int cifs_setup_cifs_sb(struct smb_vol *pvolume_info, cifs_sb->mnt_gid = pvolume_info->linux_gid; cifs_sb->mnt_file_mode = pvolume_info->file_mode; cifs_sb->mnt_dir_mode = pvolume_info->dir_mode; - cifs_dbg(FYI, "file mode: 0x%hx dir mode: 0x%hx\n", + cifs_dbg(FYI, "file mode: %04ho dir mode: %04ho\n", cifs_sb->mnt_file_mode, cifs_sb->mnt_dir_mode);
cifs_sb->actimeo = pvolume_info->actimeo; diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index ca76a9287456f..b3f3675e18788 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1649,7 +1649,7 @@ int cifs_mkdir(struct inode *inode, struct dentry *direntry, umode_t mode) struct TCP_Server_Info *server; char *full_path;
- cifs_dbg(FYI, "In cifs_mkdir, mode = 0x%hx inode = 0x%p\n", + cifs_dbg(FYI, "In cifs_mkdir, mode = %04ho inode = 0x%p\n", mode, inode);
cifs_sb = CIFS_SB(inode->i_sb);
From: Coly Li colyli@suse.de
[ Upstream commit 0b96da639a4874311e9b5156405f69ef9fc3bef8 ]
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree.
Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set.
Signed-off-by: Coly Li colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/bcache/alloc.c | 18 ++++++++++++++++-- drivers/md/bcache/btree.c | 13 +++++++++++++ 2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c index a1df0d95151c6..8bc1faf71ff2f 100644 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c @@ -67,6 +67,7 @@ #include <linux/blkdev.h> #include <linux/kthread.h> #include <linux/random.h> +#include <linux/sched/signal.h> #include <trace/events/bcache.h>
#define MAX_OPEN_BUCKETS 128 @@ -733,8 +734,21 @@ int bch_open_buckets_alloc(struct cache_set *c)
int bch_cache_allocator_start(struct cache *ca) { - struct task_struct *k = kthread_run(bch_allocator_thread, - ca, "bcache_allocator"); + struct task_struct *k; + + /* + * In case previous btree check operation occupies too many + * system memory for bcache btree node cache, and the + * registering process is selected by OOM killer. Here just + * ignore the SIGKILL sent by OOM killer if there is, to + * avoid kthread_run() being failed by pending signals. The + * bcache registering process will exit after the registration + * done. + */ + if (signal_pending(current)) + flush_signals(current); + + k = kthread_run(bch_allocator_thread, ca, "bcache_allocator"); if (IS_ERR(k)) return PTR_ERR(k);
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 14d6c33b0957e..78f0711a25849 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -34,6 +34,7 @@ #include <linux/random.h> #include <linux/rcupdate.h> #include <linux/sched/clock.h> +#include <linux/sched/signal.h> #include <linux/rculist.h> #include <linux/delay.h> #include <trace/events/bcache.h> @@ -1917,6 +1918,18 @@ static int bch_gc_thread(void *arg)
int bch_gc_thread_start(struct cache_set *c) { + /* + * In case previous btree check operation occupies too many + * system memory for bcache btree node cache, and the + * registering process is selected by OOM killer. Here just + * ignore the SIGKILL sent by OOM killer if there is, to + * avoid kthread_run() being failed by pending signals. The + * bcache registering process will exit after the registration + * done. + */ + if (signal_pending(current)) + flush_signals(current); + c->gc_thread = kthread_run(bch_gc_thread, c, "bcache_gc"); return PTR_ERR_OR_ZERO(c->gc_thread); }
On 3/3/20 10:42 AM, Greg Kroah-Hartman wrote:
From: Coly Li colyli@suse.de
[ Upstream commit 0b96da639a4874311e9b5156405f69ef9fc3bef8 ]
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree.
Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set.
Ditto this one, of course.
Did someone send this in for stable? It's not marked stable in the original commit.
On Tue, Mar 03, 2020 at 10:58:29AM -0700, Jens Axboe wrote:
On 3/3/20 10:42 AM, Greg Kroah-Hartman wrote:
From: Coly Li colyli@suse.de
[ Upstream commit 0b96da639a4874311e9b5156405f69ef9fc3bef8 ]
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree.
Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set.
Ditto this one, of course.
Did someone send this in for stable? It's not marked stable in the original commit.
I think the autobot grabbed it.
From: Sergey Matyukevich sergey.matyukevich.os@quantenna.com
[ Upstream commit ea75080110a4c1fa011b0a73cb8f42227143ee3e ]
The nl80211_policy is missing for NL80211_ATTR_STATUS_CODE attribute. As a result, for strictly validated commands, it's assumed to not be supported.
Signed-off-by: Sergey Matyukevich sergey.matyukevich.os@quantenna.com Link: https://lore.kernel.org/r/20200213131608.10541-2-sergey.matyukevich.os@quant... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 1e97ac5435b23..118a98de516cd 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -437,6 +437,7 @@ const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = { [NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT] = { .type = NLA_FLAG }, [NL80211_ATTR_CONTROL_PORT_OVER_NL80211] = { .type = NLA_FLAG }, [NL80211_ATTR_PRIVACY] = { .type = NLA_FLAG }, + [NL80211_ATTR_STATUS_CODE] = { .type = NLA_U16 }, [NL80211_ATTR_CIPHER_SUITE_GROUP] = { .type = NLA_U32 }, [NL80211_ATTR_WPA_VERSIONS] = { .type = NLA_U32 }, [NL80211_ATTR_PID] = { .type = NLA_U32 },
From: Shay Bar shay.bar@celeno.com
[ Upstream commit 33181ea7f5a62a17fbe55f0f73428ecb5e686be8 ]
Before this patch, STA's would set new width of 160/80+80 MHz based on AP capability only. This is wrong because STA may not support > 80MHz BW. Fix is to verify STA has 160/80+80 MHz capability before increasing its width to > 80MHz.
The "support_80_80" and "support_160" setting is based on: "Table 9-272 — Setting of the Supported Channel Width Set subfield and Extended NSS BW Support subfield at a STA transmitting the VHT Capabilities Information field"
From "Draft P802.11REVmd_D3.0.pdf"
Signed-off-by: Aviad Brikman aviad.brikman@celeno.com Signed-off-by: Shay Bar shay.bar@celeno.com Link: https://lore.kernel.org/r/20200210130728.23674-1-shay.bar@celeno.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/util.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 739e90555d8b9..decd46b383938 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -2993,10 +2993,22 @@ bool ieee80211_chandef_vht_oper(struct ieee80211_hw *hw, int cf0, cf1; int ccfs0, ccfs1, ccfs2; int ccf0, ccf1; + u32 vht_cap; + bool support_80_80 = false; + bool support_160 = false;
if (!oper || !htop) return false;
+ vht_cap = hw->wiphy->bands[chandef->chan->band]->vht_cap.cap; + support_160 = (vht_cap & (IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_MASK | + IEEE80211_VHT_CAP_EXT_NSS_BW_MASK)); + support_80_80 = ((vht_cap & + IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160_80PLUS80MHZ) || + (vht_cap & IEEE80211_VHT_CAP_SUPP_CHAN_WIDTH_160MHZ && + vht_cap & IEEE80211_VHT_CAP_EXT_NSS_BW_MASK) || + ((vht_cap & IEEE80211_VHT_CAP_EXT_NSS_BW_MASK) >> + IEEE80211_VHT_CAP_EXT_NSS_BW_SHIFT > 1)); ccfs0 = oper->center_freq_seg0_idx; ccfs1 = oper->center_freq_seg1_idx; ccfs2 = (le16_to_cpu(htop->operation_mode) & @@ -3024,10 +3036,10 @@ bool ieee80211_chandef_vht_oper(struct ieee80211_hw *hw, unsigned int diff;
diff = abs(ccf1 - ccf0); - if (diff == 8) { + if ((diff == 8) && support_160) { new.width = NL80211_CHAN_WIDTH_160; new.center_freq1 = cf1; - } else if (diff > 8) { + } else if ((diff > 8) && support_80_80) { new.width = NL80211_CHAN_WIDTH_80P80; new.center_freq2 = cf1; }
From: Yufeng Mo moyufeng@huawei.com
[ Upstream commit d0db7ed397517c8b2be24a0d1abfa15df776908e ]
In the current process, the management table is missing after the IMP reset. This patch adds the management table to the reset process.
Fixes: f5aac71c0327 ("net: hns3: add manager table initialization for hardware") Signed-off-by: Yufeng Mo moyufeng@huawei.com Signed-off-by: Huazhong Tan tanhuazhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 13dbd249f35fa..bfdb08572f0cc 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -9821,6 +9821,13 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev) return ret; }
+ ret = init_mgr_tbl(hdev); + if (ret) { + dev_err(&pdev->dev, + "failed to reinit manager table, ret = %d\n", ret); + return ret; + } + ret = hclge_init_fd_config(hdev); if (ret) { dev_err(&pdev->dev, "fd table init fail, ret=%d\n", ret);
From: Yonglong Liu liuyonglong@huawei.com
[ Upstream commit 19eb1123b4e9337fe20b1763fec528f837ec6568 ]
When enabling 4 TC after setting the bandwidth of VF, the bandwidth of VF will resume to default value, because of the qset resources changed in this case.
This patch fixes it by using a fixed VF's qset resources according to HNAE3_MAX_TC macro.
Fixes: ee9e44248f52 ("net: hns3: add support for configuring bandwidth of VF on the host") Signed-off-by: Yonglong Liu liuyonglong@huawei.com Signed-off-by: Huazhong Tan tanhuazhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c index 180224eab1ca4..28db13253a5e7 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c @@ -566,7 +566,7 @@ static void hclge_tm_vport_tc_info_update(struct hclge_vport *vport) */ kinfo->num_tc = vport->vport_id ? 1 : min_t(u16, vport->alloc_tqps, hdev->tm_info.num_tc); - vport->qs_offset = (vport->vport_id ? hdev->tm_info.num_tc : 0) + + vport->qs_offset = (vport->vport_id ? HNAE3_MAX_TC : 0) + (vport->vport_id ? (vport->vport_id - 1) : 0);
max_rss_size = min_t(u16, hdev->rss_size_max,
From: Guangbin Huang huangguangbin2@huawei.com
[ Upstream commit 47327c9315b2f3ae4ab659457977a26669631f20 ]
The IPv6 address defined in struct in6_addr is specified as big endian, but there is no specified endian in struct hclge_fd_rule_tuples, so it will cause a problem if directly use memcpy() to copy ipv6 address between these two structures since this field in struct hclge_fd_rule_tuples is little endian.
This patch fixes this problem by using be32_to_cpu() to convert endian of IPv6 address of struct in6_addr before copying.
Fixes: d93ed94fbeaf ("net: hns3: add aRFS support for PF") Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: Huazhong Tan tanhuazhong@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index bfdb08572f0cc..5d74f5a60102a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -6106,6 +6106,9 @@ static int hclge_get_all_rules(struct hnae3_handle *handle, static void hclge_fd_get_flow_tuples(const struct flow_keys *fkeys, struct hclge_fd_rule_tuples *tuples) { +#define flow_ip6_src fkeys->addrs.v6addrs.src.in6_u.u6_addr32 +#define flow_ip6_dst fkeys->addrs.v6addrs.dst.in6_u.u6_addr32 + tuples->ether_proto = be16_to_cpu(fkeys->basic.n_proto); tuples->ip_proto = fkeys->basic.ip_proto; tuples->dst_port = be16_to_cpu(fkeys->ports.dst); @@ -6114,12 +6117,12 @@ static void hclge_fd_get_flow_tuples(const struct flow_keys *fkeys, tuples->src_ip[3] = be32_to_cpu(fkeys->addrs.v4addrs.src); tuples->dst_ip[3] = be32_to_cpu(fkeys->addrs.v4addrs.dst); } else { - memcpy(tuples->src_ip, - fkeys->addrs.v6addrs.src.in6_u.u6_addr32, - sizeof(tuples->src_ip)); - memcpy(tuples->dst_ip, - fkeys->addrs.v6addrs.dst.in6_u.u6_addr32, - sizeof(tuples->dst_ip)); + int i; + + for (i = 0; i < IPV6_SIZE; i++) { + tuples->src_ip[i] = be32_to_cpu(flow_ip6_src[i]); + tuples->dst_ip[i] = be32_to_cpu(flow_ip6_dst[i]); + } } }
From: Anton Eidelman anton@lightbitslabs.com
[ Upstream commit 2d570a7c0251c594489a2c16b82b14ae30345c03 ]
When nvme_tcp_io_work() fails to send to socket due to connection close/reset, error_recovery work is triggered from nvme_tcp_state_change() socket callback. This cancels all the active requests in the tagset, which requeues them.
The failed request, however, was ended and thus requeued individually as well unless send returned -EPIPE. Another return code to be treated the same way is -ECONNRESET.
Double requeue caused BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request() from either the individual requeue of the failed request or the bulk requeue from blk_mq_tagset_busy_iter(, nvme_cancel_request, );
Signed-off-by: Anton Eidelman anton@lightbitslabs.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/tcp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 6d43b23a0fc8b..f8fa5c5b79f17 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1054,7 +1054,12 @@ static void nvme_tcp_io_work(struct work_struct *w) } else if (unlikely(result < 0)) { dev_err(queue->ctrl->ctrl.device, "failed to send request %d\n", result); - if (result != -EPIPE) + + /* + * Fail the request unless peer closed the connection, + * in which case error recovery flow will complete all. + */ + if ((result != -EPIPE) && (result != -ECONNRESET)) nvme_tcp_fail_request(queue->request); nvme_tcp_done_send_req(queue); return;
From: Nigel Kirkland nigel.kirkland@broadcom.com
[ Upstream commit 97b2512ad000a409b4073dd1a71e4157d76675cb ]
Delayed keep alive work is queued on system workqueue and may be cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq.
Check_flush_dependency detects mismatched attributes between the work-queue context used to cancel the keep alive work and system-wq. Specifically system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used to cancel keep alive work have WQ_MEM_RECLAIM flag.
Example warning:
workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc] is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core]
To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq.
However this creates a secondary concern where work and a request to cancel that work may be in the same work queue - namely err_work in the rdma and tcp transports, which will want to flush/cancel the keep alive work which will now be on nvme_wq.
After reviewing the transports, it looks like err_work can be moved to nvme_reset_wq. In fact that aligns them better with transition into RESETTING and performing related reset work in nvme_reset_wq.
Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq.
Signed-off-by: Nigel Kirkland nigel.kirkland@broadcom.com Signed-off-by: James Smart jsmart2021@gmail.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 10 +++++----- drivers/nvme/host/rdma.c | 2 +- drivers/nvme/host/tcp.c | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 641c07347e8d8..ada59df642d29 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -66,8 +66,8 @@ MODULE_PARM_DESC(streams, "turn on support for Streams write directives"); * nvme_reset_wq - hosts nvme reset works * nvme_delete_wq - hosts nvme delete works * - * nvme_wq will host works such are scan, aen handling, fw activation, - * keep-alive error recovery, periodic reconnects etc. nvme_reset_wq + * nvme_wq will host works such as scan, aen handling, fw activation, + * keep-alive, periodic reconnects etc. nvme_reset_wq * runs reset works which also flush works hosted on nvme_wq for * serialization purposes. nvme_delete_wq host controller deletion * works which flush reset works for serialization. @@ -976,7 +976,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status) startka = true; spin_unlock_irqrestore(&ctrl->lock, flags); if (startka) - schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ); + queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ); }
static int nvme_keep_alive(struct nvme_ctrl *ctrl) @@ -1006,7 +1006,7 @@ static void nvme_keep_alive_work(struct work_struct *work) dev_dbg(ctrl->device, "reschedule traffic based keep-alive timer\n"); ctrl->comp_seen = false; - schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ); + queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ); return; }
@@ -1023,7 +1023,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl) if (unlikely(ctrl->kato == 0)) return;
- schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ); + queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ); }
void nvme_stop_keep_alive(struct nvme_ctrl *ctrl) diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 2a47c6c5007e1..3e85c5cacefd2 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1088,7 +1088,7 @@ static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl) if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING)) return;
- queue_work(nvme_wq, &ctrl->err_work); + queue_work(nvme_reset_wq, &ctrl->err_work); }
static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc, diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index f8fa5c5b79f17..49d4373b84eb3 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -422,7 +422,7 @@ static void nvme_tcp_error_recovery(struct nvme_ctrl *ctrl) if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING)) return;
- queue_work(nvme_wq, &to_tcp_ctrl(ctrl)->err_work); + queue_work(nvme_reset_wq, &to_tcp_ctrl(ctrl)->err_work); }
static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
From: Keith Busch kbusch@kernel.org
[ Upstream commit fa46c6fb5d61b1f17b06d7c6ef75478b576304c7 ]
Many users have reported nvme triggered irq_startup() warnings during shutdown. The driver uses the nvme queue's irq to synchronize scanning for completions, and enabling an interrupt affined to only offline CPUs triggers the alarming warning.
Move the final CQE check to after disabling the device and all registered interrupts have been torn down so that we do not have any IRQ to synchronize.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509 Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index da392b50f73e7..9c80f9f081496 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1401,6 +1401,23 @@ static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown) nvme_poll_irqdisable(nvmeq, -1); }
+/* + * Called only on a device that has been disabled and after all other threads + * that can check this device's completion queues have synced. This is the + * last chance for the driver to see a natural completion before + * nvme_cancel_request() terminates all incomplete requests. + */ +static void nvme_reap_pending_cqes(struct nvme_dev *dev) +{ + u16 start, end; + int i; + + for (i = dev->ctrl.queue_count - 1; i > 0; i--) { + nvme_process_cq(&dev->queues[i], &start, &end, -1); + nvme_complete_cqes(&dev->queues[i], start, end); + } +} + static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues, int entry_size) { @@ -2235,11 +2252,6 @@ static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode) if (timeout == 0) return false;
- /* handle any remaining CQEs */ - if (opcode == nvme_admin_delete_cq && - !test_bit(NVMEQ_DELETE_ERROR, &nvmeq->flags)) - nvme_poll_irqdisable(nvmeq, -1); - sent--; if (nr_queues) goto retry; @@ -2428,6 +2440,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown) nvme_suspend_io_queues(dev); nvme_suspend_queue(&dev->queues[0]); nvme_pci_disable(dev); + nvme_reap_pending_cqes(dev);
blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl); blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
From: Damien Le Moal damien.lemoal@wdc.com
commit 51fdaa0490241e8cd41b40cbf43a336d1a014460 upstream.
The block layer generic blk_revalidate_disk_zones() checks the validity of zone descriptors reported by a disk using the blk_revalidate_zone_cb() callback function executed for each zone descriptor. If a ZBC disk reports invalid zone descriptors, blk_revalidate_disk_zones() returns an error and sd_zbc_read_zones() changes the disk capacity to 0, which in turn results in the gendisk structure capacity to be set to 0. This all works well for the first revalidate pass on a disk and the block layer detects the capactiy change.
On the second revalidate pass, blk_revalidate_disk_zones() is called again and sd_zbc_report_zones() executed to check the zones a second time. However, for this second pass, the gendisk capacity is now 0, which results in sd_zbc_report_zones() to do nothing and to report success and no zones. blk_revalidate_disk_zones() in turn returns success and sets the disk queue chunk_sectors limit with zero as no zones were checked, causing a oops to trigger on the BUG_ON(!is_power_of_2(chunk_sectors)) in blk_queue_chunk_sectors().
Fix this by using the sdkp capacity field rather than the gendisk capacity for the report zones loop in sd_zbc_report_zones(). Also add a check to return immediately an error if the sdkp capacity is 0. With this fix, invalid/buggy ZBC disk scan does not trigger a oops and are exposed with a 0 capacity. This change also preserve the chance for the disk to be correctly revalidated on the second revalidate pass as the scsi disk structure capacity field is always set to the disk reported value when sd_zbc_report_zones() is called.
Link: https://lore.kernel.org/r/20200219063800.880834-1-damien.lemoal@wdc.com Fixes: d41003513e61 ("block: rework zone reporting") Cc: Cc: stable@vger.kernel.org # v5.5 Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sd_zbc.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -161,6 +161,7 @@ int sd_zbc_report_zones(struct gendisk * unsigned int nr_zones, report_zones_cb cb, void *data) { struct scsi_disk *sdkp = scsi_disk(disk); + sector_t capacity = logical_to_sectors(sdkp->device, sdkp->capacity); unsigned int nr, i; unsigned char *buf; size_t offset, buflen = 0; @@ -171,11 +172,15 @@ int sd_zbc_report_zones(struct gendisk * /* Not a zoned device */ return -EOPNOTSUPP;
+ if (!capacity) + /* Device gone or invalid */ + return -ENODEV; + buf = sd_zbc_alloc_report_buffer(sdkp, nr_zones, &buflen); if (!buf) return -ENOMEM;
- while (zone_idx < nr_zones && sector < get_capacity(disk)) { + while (zone_idx < nr_zones && sector < capacity) { ret = sd_zbc_do_report_zones(sdkp, buf, buflen, sectors_to_logical(sdkp->device, sector), true); if (ret)
From: Benjamin Block bblock@linux.ibm.com
commit a3fd4bfe85fbb67cf4ec1232d0af625ece3c508b upstream.
When implementing support for retrieval of local diagnostic data from the FCP channel, the wrong data format was assumed for the temperature of the local SFP+ connector. The Fibre Channel Link Services (FC-LS-3) specification is not clear on the format of the stored integer, and only after consulting the SNIA specification SFF-8472 did we realize it is stored as two's complement. Thus, the used data and display format is wrong, and highly misleading for users when the temperature should drop below 0°C (however unlikely that may be).
To fix this, change the data format in `struct fsf_qtcb_bottom_port` from unsigned to signed, and change the printf format string used to generate `zfcp_sysfs_adapter_diag_sfp_temperature_show()` from `%hu` to `%hd`.
Link: https://lore.kernel.org/r/d6e3be5428da5c9490cfff4df7cae868bc9f1a7e.158203950... Fixes: a10a61e807b0 ("scsi: zfcp: support retrieval of SFP Data via Exchange Port Data") Fixes: 6028f7c4cd87 ("scsi: zfcp: introduce sysfs interface for diagnostics of local SFP transceiver") Cc: stable@vger.kernel.org # 5.5+ Reviewed-by: Jens Remus jremus@linux.ibm.com Reviewed-by: Fedor Loshakov loshakov@linux.ibm.com Reviewed-by: Steffen Maier maier@linux.ibm.com Signed-off-by: Benjamin Block bblock@linux.ibm.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/scsi/zfcp_fsf.h | 2 +- drivers/s390/scsi/zfcp_sysfs.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/s390/scsi/zfcp_fsf.h +++ b/drivers/s390/scsi/zfcp_fsf.h @@ -410,7 +410,7 @@ struct fsf_qtcb_bottom_port { u8 cb_util; u8 a_util; u8 res2; - u16 temperature; + s16 temperature; u16 vcc; u16 tx_bias; u16 tx_power; --- a/drivers/s390/scsi/zfcp_sysfs.c +++ b/drivers/s390/scsi/zfcp_sysfs.c @@ -800,7 +800,7 @@ static ZFCP_DEV_ATTR(adapter_diag, b2b_c static ZFCP_DEV_ATTR(adapter_diag_sfp, _name, 0400, \ zfcp_sysfs_adapter_diag_sfp_##_name##_show, NULL)
-ZFCP_DEFINE_DIAG_SFP_ATTR(temperature, temperature, 5, "%hu"); +ZFCP_DEFINE_DIAG_SFP_ATTR(temperature, temperature, 6, "%hd"); ZFCP_DEFINE_DIAG_SFP_ATTR(vcc, vcc, 5, "%hu"); ZFCP_DEFINE_DIAG_SFP_ATTR(tx_bias, tx_bias, 5, "%hu"); ZFCP_DEFINE_DIAG_SFP_ATTR(tx_power, tx_power, 5, "%hu");
From: Kees Cook keescook@chromium.org
commit adc10f5b0a03606e30c704cff1f0283a696d0260 upstream.
When there was no parallelism (no top-level -j arg and a pre-1.7 sphinx-build), the argument passed would be empty ("") instead of just being missing, which would (understandably) badly confuse sphinx-build. Fix this by removing the quotes.
Reported-by: Rafael J. Wysocki rafael@kernel.org Fixes: 51e46c7a4007 ("docs, parallelism: Rearrange how jobserver reservations are made") Cc: stable@vger.kernel.org # v5.5 only Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/sphinx/parallel-wrapper.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/sphinx/parallel-wrapper.sh +++ b/Documentation/sphinx/parallel-wrapper.sh @@ -30,4 +30,4 @@ if [ -n "$parallel" ] ; then parallel="-j$parallel" fi
-exec "$sphinx" "$parallel" "$@" +exec "$sphinx" $parallel "$@"
From: Dan Carpenter dan.carpenter@oracle.com
commit 37b0b6b8b99c0e1c1f11abbe7cf49b6d03795b3f upstream.
If sbi->s_flex_groups_allocated is zero and the first allocation fails then this code will crash. The problem is that "i--" will set "i" to -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1 is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX is more than zero, the condition is true so we call kvfree(new_groups[-1]). The loop will carry on freeing invalid memory until it crashes.
Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") Reviewed-by: Suraj Jitindar Singh surajjs@amazon.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2346,7 +2346,7 @@ int ext4_alloc_flex_bg_array(struct supe { struct ext4_sb_info *sbi = EXT4_SB(sb); struct flex_groups **old_groups, **new_groups; - int size, i; + int size, i, j;
if (!sbi->s_log_groups_per_flex) return 0; @@ -2367,8 +2367,8 @@ int ext4_alloc_flex_bg_array(struct supe sizeof(struct flex_groups)), GFP_KERNEL); if (!new_groups[i]) { - for (i--; i >= sbi->s_flex_groups_allocated; i--) - kvfree(new_groups[i]); + for (j = sbi->s_flex_groups_allocated; j < i; j++) + kvfree(new_groups[j]); kvfree(new_groups); ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups", size);
From: Paul Moore paul@paul-moore.com
commit 2ad3e17ebf94b7b7f3f64c050ff168f9915345eb upstream.
Commit 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") combined a number of separate fields in the audit_field struct into a single union. Generally this worked just fine because they are generally mutually exclusive. Unfortunately in audit_data_to_entry() the overlap can be a problem when a specific error case is triggered that causes the error path code to attempt to cleanup an audit_field struct and the cleanup involves attempting to free a stored LSM string (the lsm_str field). Currently the code always has a non-NULL value in the audit_field.lsm_str field as the top of the for-loop transfers a value into audit_field.val (both .lsm_str and .val are part of the same union); if audit_data_to_entry() fails and the audit_field struct is specified to contain a LSM string, but the audit_field.lsm_str has not yet been properly set, the error handling code will attempt to free the bogus audit_field.lsm_str value that was set with audit_field.val at the top of the for-loop.
This patch corrects this by ensuring that the audit_field.val is only set when needed (it is cleared when the audit_field struct is allocated with kcalloc()). It also corrects a few other issues to ensure that in case of error the proper error code is returned.
Cc: stable@vger.kernel.org Fixes: 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") Reported-by: syzbot+1f4d90ead370d72e450b@syzkaller.appspotmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/auditfilter.c | 71 ++++++++++++++++++++++++++++----------------------- 1 file changed, 39 insertions(+), 32 deletions(-)
--- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -456,6 +456,7 @@ static struct audit_entry *audit_data_to bufp = data->buf; for (i = 0; i < data->field_count; i++) { struct audit_field *f = &entry->rule.fields[i]; + u32 f_val;
err = -EINVAL;
@@ -464,12 +465,12 @@ static struct audit_entry *audit_data_to goto exit_free;
f->type = data->fields[i]; - f->val = data->values[i]; + f_val = data->values[i];
/* Support legacy tests for a valid loginuid */ - if ((f->type == AUDIT_LOGINUID) && (f->val == AUDIT_UID_UNSET)) { + if ((f->type == AUDIT_LOGINUID) && (f_val == AUDIT_UID_UNSET)) { f->type = AUDIT_LOGINUID_SET; - f->val = 0; + f_val = 0; entry->rule.pflags |= AUDIT_LOGINUID_LEGACY; }
@@ -485,7 +486,7 @@ static struct audit_entry *audit_data_to case AUDIT_SUID: case AUDIT_FSUID: case AUDIT_OBJ_UID: - f->uid = make_kuid(current_user_ns(), f->val); + f->uid = make_kuid(current_user_ns(), f_val); if (!uid_valid(f->uid)) goto exit_free; break; @@ -494,11 +495,12 @@ static struct audit_entry *audit_data_to case AUDIT_SGID: case AUDIT_FSGID: case AUDIT_OBJ_GID: - f->gid = make_kgid(current_user_ns(), f->val); + f->gid = make_kgid(current_user_ns(), f_val); if (!gid_valid(f->gid)) goto exit_free; break; case AUDIT_ARCH: + f->val = f_val; entry->rule.arch_f = f; break; case AUDIT_SUBJ_USER: @@ -511,11 +513,13 @@ static struct audit_entry *audit_data_to case AUDIT_OBJ_TYPE: case AUDIT_OBJ_LEV_LOW: case AUDIT_OBJ_LEV_HIGH: - str = audit_unpack_string(&bufp, &remain, f->val); - if (IS_ERR(str)) + str = audit_unpack_string(&bufp, &remain, f_val); + if (IS_ERR(str)) { + err = PTR_ERR(str); goto exit_free; - entry->rule.buflen += f->val; - + } + entry->rule.buflen += f_val; + f->lsm_str = str; err = security_audit_rule_init(f->type, f->op, str, (void **)&f->lsm_rule); /* Keep currently invalid fields around in case they @@ -524,68 +528,71 @@ static struct audit_entry *audit_data_to pr_warn("audit rule for LSM '%s' is invalid\n", str); err = 0; - } - if (err) { - kfree(str); + } else if (err) goto exit_free; - } else - f->lsm_str = str; break; case AUDIT_WATCH: - str = audit_unpack_string(&bufp, &remain, f->val); - if (IS_ERR(str)) + str = audit_unpack_string(&bufp, &remain, f_val); + if (IS_ERR(str)) { + err = PTR_ERR(str); goto exit_free; - entry->rule.buflen += f->val; - - err = audit_to_watch(&entry->rule, str, f->val, f->op); + } + err = audit_to_watch(&entry->rule, str, f_val, f->op); if (err) { kfree(str); goto exit_free; } + entry->rule.buflen += f_val; break; case AUDIT_DIR: - str = audit_unpack_string(&bufp, &remain, f->val); - if (IS_ERR(str)) + str = audit_unpack_string(&bufp, &remain, f_val); + if (IS_ERR(str)) { + err = PTR_ERR(str); goto exit_free; - entry->rule.buflen += f->val; - + } err = audit_make_tree(&entry->rule, str, f->op); kfree(str); if (err) goto exit_free; + entry->rule.buflen += f_val; break; case AUDIT_INODE: + f->val = f_val; err = audit_to_inode(&entry->rule, f); if (err) goto exit_free; break; case AUDIT_FILTERKEY: - if (entry->rule.filterkey || f->val > AUDIT_MAX_KEY_LEN) + if (entry->rule.filterkey || f_val > AUDIT_MAX_KEY_LEN) goto exit_free; - str = audit_unpack_string(&bufp, &remain, f->val); - if (IS_ERR(str)) + str = audit_unpack_string(&bufp, &remain, f_val); + if (IS_ERR(str)) { + err = PTR_ERR(str); goto exit_free; - entry->rule.buflen += f->val; + } + entry->rule.buflen += f_val; entry->rule.filterkey = str; break; case AUDIT_EXE: - if (entry->rule.exe || f->val > PATH_MAX) + if (entry->rule.exe || f_val > PATH_MAX) goto exit_free; - str = audit_unpack_string(&bufp, &remain, f->val); + str = audit_unpack_string(&bufp, &remain, f_val); if (IS_ERR(str)) { err = PTR_ERR(str); goto exit_free; } - entry->rule.buflen += f->val; - - audit_mark = audit_alloc_mark(&entry->rule, str, f->val); + audit_mark = audit_alloc_mark(&entry->rule, str, f_val); if (IS_ERR(audit_mark)) { kfree(str); err = PTR_ERR(audit_mark); goto exit_free; } + entry->rule.buflen += f_val; entry->rule.exe = audit_mark; break; + default: + f->val = f_val; + break; } }
From: Paul Moore paul@paul-moore.com
commit 756125289285f6e55a03861bf4b6257aa3d19a93 upstream.
This patch ensures that we always check the netlink payload length in audit_receive_msg() before we take any action on the payload itself.
Cc: stable@vger.kernel.org Reported-by: syzbot+399c44bf1f43b8747403@syzkaller.appspotmail.com Reported-by: syzbot+e4b12d8d202701f08b6d@syzkaller.appspotmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/audit.c | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-)
--- a/kernel/audit.c +++ b/kernel/audit.c @@ -1100,13 +1100,11 @@ static void audit_log_feature_change(int audit_log_end(ab); }
-static int audit_set_feature(struct sk_buff *skb) +static int audit_set_feature(struct audit_features *uaf) { - struct audit_features *uaf; int i;
BUILD_BUG_ON(AUDIT_LAST_FEATURE + 1 > ARRAY_SIZE(audit_feature_names)); - uaf = nlmsg_data(nlmsg_hdr(skb));
/* if there is ever a version 2 we should handle that here */
@@ -1174,6 +1172,7 @@ static int audit_receive_msg(struct sk_b { u32 seq; void *data; + int data_len; int err; struct audit_buffer *ab; u16 msg_type = nlh->nlmsg_type; @@ -1187,6 +1186,7 @@ static int audit_receive_msg(struct sk_b
seq = nlh->nlmsg_seq; data = nlmsg_data(nlh); + data_len = nlmsg_len(nlh);
switch (msg_type) { case AUDIT_GET: { @@ -1210,7 +1210,7 @@ static int audit_receive_msg(struct sk_b struct audit_status s; memset(&s, 0, sizeof(s)); /* guard against past and future API changes */ - memcpy(&s, data, min_t(size_t, sizeof(s), nlmsg_len(nlh))); + memcpy(&s, data, min_t(size_t, sizeof(s), data_len)); if (s.mask & AUDIT_STATUS_ENABLED) { err = audit_set_enabled(s.enabled); if (err < 0) @@ -1314,7 +1314,9 @@ static int audit_receive_msg(struct sk_b return err; break; case AUDIT_SET_FEATURE: - err = audit_set_feature(skb); + if (data_len < sizeof(struct audit_features)) + return -EINVAL; + err = audit_set_feature(data); if (err) return err; break; @@ -1326,6 +1328,8 @@ static int audit_receive_msg(struct sk_b
err = audit_filter(msg_type, AUDIT_FILTER_USER); if (err == 1) { /* match or error */ + char *str = data; + err = 0; if (msg_type == AUDIT_USER_TTY) { err = tty_audit_push(); @@ -1333,26 +1337,24 @@ static int audit_receive_msg(struct sk_b break; } audit_log_user_recv_msg(&ab, msg_type); - if (msg_type != AUDIT_USER_TTY) + if (msg_type != AUDIT_USER_TTY) { + /* ensure NULL termination */ + str[data_len - 1] = '\0'; audit_log_format(ab, " msg='%.*s'", AUDIT_MESSAGE_TEXT_MAX, - (char *)data); - else { - int size; - + str); + } else { audit_log_format(ab, " data="); - size = nlmsg_len(nlh); - if (size > 0 && - ((unsigned char *)data)[size - 1] == '\0') - size--; - audit_log_n_untrustedstring(ab, data, size); + if (data_len > 0 && str[data_len - 1] == '\0') + data_len--; + audit_log_n_untrustedstring(ab, str, data_len); } audit_log_end(ab); } break; case AUDIT_ADD_RULE: case AUDIT_DEL_RULE: - if (nlmsg_len(nlh) < sizeof(struct audit_rule_data)) + if (data_len < sizeof(struct audit_rule_data)) return -EINVAL; if (audit_enabled == AUDIT_LOCKED) { audit_log_common_recv_msg(audit_context(), &ab, @@ -1364,7 +1366,7 @@ static int audit_receive_msg(struct sk_b audit_log_end(ab); return -EPERM; } - err = audit_rule_change(msg_type, seq, data, nlmsg_len(nlh)); + err = audit_rule_change(msg_type, seq, data, data_len); break; case AUDIT_LIST_RULES: err = audit_list_rules_send(skb, seq); @@ -1379,7 +1381,7 @@ static int audit_receive_msg(struct sk_b case AUDIT_MAKE_EQUIV: { void *bufp = data; u32 sizes[2]; - size_t msglen = nlmsg_len(nlh); + size_t msglen = data_len; char *old, *new;
err = -EINVAL; @@ -1455,7 +1457,7 @@ static int audit_receive_msg(struct sk_b
memset(&s, 0, sizeof(s)); /* guard against past and future API changes */ - memcpy(&s, data, min_t(size_t, sizeof(s), nlmsg_len(nlh))); + memcpy(&s, data, min_t(size_t, sizeof(s), data_len)); /* check if new data is valid */ if ((s.enabled != 0 && s.enabled != 1) || (s.log_passwd != 0 && s.log_passwd != 1))
From: Mika Westerberg mika.westerberg@linux.intel.com
commit 1dade3a7048ccfc675650cd2cf13d578b095e5fb upstream.
Sometimes it is useful to find the access_width field value in bytes and not in bits so add a helper that can be used for this purpose.
Suggested-by: Jean Delvare jdelvare@suse.de Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Jean Delvare jdelvare@suse.de Cc: 4.16+ stable@vger.kernel.org # 4.16+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/acpi/actypes.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/include/acpi/actypes.h +++ b/include/acpi/actypes.h @@ -532,11 +532,12 @@ typedef u64 acpi_integer; strnlen (a, ACPI_NAMESEG_SIZE) == ACPI_NAMESEG_SIZE)
/* - * Algorithm to obtain access bit width. + * Algorithm to obtain access bit or byte width. * Can be used with access_width of struct acpi_generic_address and access_size of * struct acpi_resource_generic_register. */ #define ACPI_ACCESS_BIT_WIDTH(size) (1 << ((size) + 2)) +#define ACPI_ACCESS_BYTE_WIDTH(size) (1 << ((size) - 1))
/******************************************************************************* *
From: Mika Westerberg mika.westerberg@linux.intel.com
commit 2ba33a4e9e22ac4dda928d3e9b5978a3a2ded4e0 upstream.
ACPI Generic Address Structure (GAS) access_width field is not in bytes as the driver seems to expect in few places so fix this by using the newly introduced macro ACPI_ACCESS_BYTE_WIDTH().
Fixes: b1abf6fc4982 ("ACPI / watchdog: Fix off-by-one error at resource assignment") Fixes: 058dfc767008 ("ACPI / watchdog: Add support for WDAT hardware watchdog") Reported-by: Jean Delvare jdelvare@suse.de Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Jean Delvare jdelvare@suse.de Cc: 4.16+ stable@vger.kernel.org # 4.16+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpi_watchdog.c | 3 +-- drivers/watchdog/wdat_wdt.c | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/acpi/acpi_watchdog.c +++ b/drivers/acpi/acpi_watchdog.c @@ -126,12 +126,11 @@ void __init acpi_watchdog_init(void) gas = &entries[i].register_region;
res.start = gas->address; + res.end = res.start + ACPI_ACCESS_BYTE_WIDTH(gas->access_width) - 1; if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) { res.flags = IORESOURCE_MEM; - res.end = res.start + ALIGN(gas->access_width, 4) - 1; } else if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_IO) { res.flags = IORESOURCE_IO; - res.end = res.start + gas->access_width - 1; } else { pr_warn("Unsupported address space: %u\n", gas->space_id); --- a/drivers/watchdog/wdat_wdt.c +++ b/drivers/watchdog/wdat_wdt.c @@ -389,7 +389,7 @@ static int wdat_wdt_probe(struct platfor
memset(&r, 0, sizeof(r)); r.start = gas->address; - r.end = r.start + gas->access_width - 1; + r.end = r.start + ACPI_ACCESS_BYTE_WIDTH(gas->access_width) - 1; if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) { r.flags = IORESOURCE_MEM; } else if (gas->space_id == ACPI_ADR_SPACE_SYSTEM_IO) {
From: Oliver Upton oupton@google.com
commit 86f7e90ce840aa1db407d3ea6e9b3a52b2ce923c upstream.
KVM emulates UMIP on hardware that doesn't support it by setting the 'descriptor table exiting' VM-execution control and performing instruction emulation. When running nested, this emulation is broken as KVM refuses to emulate L2 instructions by default.
Correct this regression by allowing the emulation of descriptor table instructions if L1 hasn't requested 'descriptor table exiting'.
Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Reported-by: Jan Kiszka jan.kiszka@web.de Cc: stable@vger.kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Jim Mattson jmattson@google.com Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/vmx.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7179,6 +7179,7 @@ static int vmx_check_intercept_io(struct else intercept = nested_vmx_check_io_bitmaps(vcpu, port, size);
+ /* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */ return intercept ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE; }
@@ -7208,6 +7209,20 @@ static int vmx_check_intercept(struct kv case x86_intercept_outs: return vmx_check_intercept_io(vcpu, info);
+ case x86_intercept_lgdt: + case x86_intercept_lidt: + case x86_intercept_lldt: + case x86_intercept_ltr: + case x86_intercept_sgdt: + case x86_intercept_sidt: + case x86_intercept_sldt: + case x86_intercept_str: + if (!nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC)) + return X86EMUL_CONTINUE; + + /* FIXME: produce nested vmexit and return X86EMUL_INTERCEPTED. */ + break; + /* TODO: check more intercepts... */ default: break;
From: Hans de Goede hdegoede@redhat.com
commit beae56192a2570578ae45050e73c5ff9254f63e6 upstream.
Commit 8f18eca9ebc5 ("HID: ite: Add USB id match for Acer SW5-012 keyboard dock") added the USB id for the Acer SW5-012's keyboard dock to the hid-ite driver to fix the rfkill driver not working.
Most keyboard docks with an ITE 8595 keyboard/touchpad controller have the "Wireless Radio Control" bits which need the special hid-ite driver on the second USB interface (the mouse interface) and their touchpad only supports mouse emulation, so using generic hid-input handling for anything but the "Wireless Radio Control" bits is fine. On these devices we simply bind to all USB interfaces.
But unlike other ITE8595 using keyboard docks, the Acer Aspire Switch 10 (SW5-012)'s touchpad not only does mouse emulation it also supports HID-multitouch and all the keys including the "Wireless Radio Control" bits have been moved to the first USB interface (the keyboard intf).
So we need hid-ite to handle the first (keyboard) USB interface and have it NOT bind to the second (mouse) USB interface so that that can be handled by hid-multitouch.c and we get proper multi-touch support.
This commit changes the hid_device_id for the SW5-012 keyboard dock to only match on hid devices from the HID_GROUP_GENERIC group, this way hid-ite will not bind the the mouse/multi-touch interface which has HID_GROUP_MULTITOUCH_WIN_8 as group. This fixes the regression to mouse-emulation mode introduced by adding the keyboard dock USB id.
Cc: stable@vger.kernel.org Fixes: 8f18eca9ebc5 ("HID: ite: Add USB id match for Acer SW5-012 keyboard dock") Reported-by: Zdeněk Rampas zdenda.rampas@gmail.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-ite.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/hid/hid-ite.c +++ b/drivers/hid/hid-ite.c @@ -41,8 +41,9 @@ static const struct hid_device_id ite_de { HID_USB_DEVICE(USB_VENDOR_ID_ITE, USB_DEVICE_ID_ITE8595) }, { HID_USB_DEVICE(USB_VENDOR_ID_258A, USB_DEVICE_ID_258A_6A88) }, /* ITE8595 USB kbd ctlr, with Synaptics touchpad connected to it. */ - { HID_USB_DEVICE(USB_VENDOR_ID_SYNAPTICS, - USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5_012) }, + { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, + USB_VENDOR_ID_SYNAPTICS, + USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5_012) }, { } }; MODULE_DEVICE_TABLE(hid, ite_devices);
From: Johan Korsnes jkorsnes@cisco.com
commit 5ebdffd25098898aff1249ae2f7dbfddd76d8f8f upstream.
In case a report is greater than HID_MAX_BUFFER_SIZE, it is truncated, but the report-number byte is not correctly handled. This results in a off-by-one in the following memset, causing a kernel Oops and ensuing system crash.
Note: With commit 8ec321e96e05 ("HID: Fix slab-out-of-bounds read in hid_field_extract") I no longer hit the kernel Oops as we instead fail "controlled" at probe if there is a report too long in the HID report-descriptor. hid_report_raw_event() is an exported symbol, so presumabely we cannot always rely on this being the case.
Fixes: 966922f26c7f ("HID: fix a crash in hid_report_raw_event() function.") Signed-off-by: Johan Korsnes jkorsnes@cisco.com Cc: Armando Visconti armando.visconti@st.com Cc: Jiri Kosina jkosina@suse.cz Cc: Alan Stern stern@rowland.harvard.edu Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1741,7 +1741,9 @@ int hid_report_raw_event(struct hid_devi
rsize = ((report->size - 1) >> 3) + 1;
- if (rsize > HID_MAX_BUFFER_SIZE) + if (report_enum->numbered && rsize >= HID_MAX_BUFFER_SIZE) + rsize = HID_MAX_BUFFER_SIZE - 1; + else if (rsize > HID_MAX_BUFFER_SIZE) rsize = HID_MAX_BUFFER_SIZE;
if (csize < rsize) {
From: Johan Korsnes jkorsnes@cisco.com
commit 84a4062632462c4320704fcdf8e99e89e94c0aba upstream.
We have a HID touch device that reports its opens and shorts test results in HID buffers of size 8184 bytes. The maximum size of the HID buffer is currently set to 4096 bytes, causing probe of this device to fail. With this patch we increase the maximum size of the HID buffer to 8192 bytes, making device probe and acquisition of said buffers succeed.
Signed-off-by: Johan Korsnes jkorsnes@cisco.com Cc: Alan Stern stern@rowland.harvard.edu Cc: Armando Visconti armando.visconti@st.com Cc: Jiri Kosina jkosina@suse.cz Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/hid.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -492,7 +492,7 @@ struct hid_report_enum { };
#define HID_MIN_BUFFER_SIZE 64 /* make sure there is at least a packet size of space */ -#define HID_MAX_BUFFER_SIZE 4096 /* 4kb */ +#define HID_MAX_BUFFER_SIZE 8192 /* 8kb */ #define HID_CONTROL_FIFO_SIZE 256 /* to init devices with >100 reports */ #define HID_OUTPUT_FIFO_SIZE 64
From: Daniel Vetter daniel.vetter@ffwll.ch
commit 8a3bddf67ce88b96531fb22c5a75d7f4dc41d155 upstream.
This doesn't do anything except auto-init drm_agp support when you call drm_get_pci_dev(). Which amdgpu stopped doing with
commit b58c11314a1706bf094c489ef5cb28f76478c704 Author: Alex Deucher alexander.deucher@amd.com Date: Fri Jun 2 17:16:31 2017 -0400
drm/amdgpu: drop deprecated drm_get_pci_dev and drm_put_dev
No idea whether this was intentional or accidental breakage, but I guess anyone who manages to boot a this modern gpu behind an agp bridge deserves a price. A price I never expect anyone to ever collect :-)
Cc: Alex Deucher alexander.deucher@amd.com Cc: "Christian König" christian.koenig@amd.com Cc: Hawking Zhang Hawking.Zhang@amd.com Cc: Xiaojie Yuan xiaojie.yuan@amd.com Cc: Evan Quan evan.quan@amd.com Cc: "Tianci.Yin" tianci.yin@amd.com Cc: "Marek Olšák" marek.olsak@amd.com Cc: Hans de Goede hdegoede@redhat.com Reviewed-by: Emil Velikov emil.velikov@collabora.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -1357,7 +1357,7 @@ amdgpu_get_crtc_scanout_position(struct
static struct drm_driver kms_driver = { .driver_features = - DRIVER_USE_AGP | DRIVER_ATOMIC | + DRIVER_ATOMIC | DRIVER_GEM | DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE,
From: Daniel Vetter daniel.vetter@ffwll.ch
commit eb12c957735b582607e5842a06d1f4c62e185c1d upstream.
It's the last user, and more importantly, it's the last non-legacy user of anything in drm_pci.c.
The only tricky bit is the agp initialization. But a close look shows that radeon does not use the drm_agp midlayer (the main use of that is drm_bufs for legacy drivers), and instead could use the agp subsystem directly (like nouveau does already). Hence we can just pull this in too.
A further step would be to entirely drop the use of drm_device->agp, but feels like too much churn just for this patch.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Alex Deucher alexander.deucher@amd.com Cc: "Christian König" christian.koenig@amd.com Cc: "David (ChunMing) Zhou" David1.Zhou@amd.com Cc: amd-gfx@lists.freedesktop.org Reviewed-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Emil Velikov emil.velikov@collabora.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/radeon/radeon_drv.c | 43 ++++++++++++++++++++++++++++++++++-- drivers/gpu/drm/radeon/radeon_kms.c | 6 +++++ 2 files changed, 47 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -37,6 +37,7 @@ #include <linux/vga_switcheroo.h> #include <linux/mmu_notifier.h>
+#include <drm/drm_agpsupport.h> #include <drm/drm_crtc_helper.h> #include <drm/drm_drv.h> #include <drm/drm_fb_helper.h> @@ -325,6 +326,7 @@ static int radeon_pci_probe(struct pci_d const struct pci_device_id *ent) { unsigned long flags = 0; + struct drm_device *dev; int ret;
if (!ent) @@ -365,7 +367,44 @@ static int radeon_pci_probe(struct pci_d if (ret) return ret;
- return drm_get_pci_dev(pdev, ent, &kms_driver); + dev = drm_dev_alloc(&kms_driver, &pdev->dev); + if (IS_ERR(dev)) + return PTR_ERR(dev); + + ret = pci_enable_device(pdev); + if (ret) + goto err_free; + + dev->pdev = pdev; +#ifdef __alpha__ + dev->hose = pdev->sysdata; +#endif + + pci_set_drvdata(pdev, dev); + + if (pci_find_capability(dev->pdev, PCI_CAP_ID_AGP)) + dev->agp = drm_agp_init(dev); + if (dev->agp) { + dev->agp->agp_mtrr = arch_phys_wc_add( + dev->agp->agp_info.aper_base, + dev->agp->agp_info.aper_size * + 1024 * 1024); + } + + ret = drm_dev_register(dev, ent->driver_data); + if (ret) + goto err_agp; + + return 0; + +err_agp: + if (dev->agp) + arch_phys_wc_del(dev->agp->agp_mtrr); + kfree(dev->agp); + pci_disable_device(pdev); +err_free: + drm_dev_put(dev); + return ret; }
static void @@ -575,7 +614,7 @@ radeon_get_crtc_scanout_position(struct
static struct drm_driver kms_driver = { .driver_features = - DRIVER_USE_AGP | DRIVER_GEM | DRIVER_RENDER, + DRIVER_GEM | DRIVER_RENDER, .load = radeon_driver_load_kms, .open = radeon_driver_open_kms, .postclose = radeon_driver_postclose_kms, --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -31,6 +31,7 @@ #include <linux/uaccess.h> #include <linux/vga_switcheroo.h>
+#include <drm/drm_agpsupport.h> #include <drm/drm_fb_helper.h> #include <drm/drm_file.h> #include <drm/drm_ioctl.h> @@ -77,6 +78,11 @@ void radeon_driver_unload_kms(struct drm radeon_modeset_fini(rdev); radeon_device_fini(rdev);
+ if (dev->agp) + arch_phys_wc_del(dev->agp->agp_mtrr); + kfree(dev->agp); + dev->agp = NULL; + done_free: kfree(rdev); dev->dev_private = NULL;
From: Wolfram Sang wsa@the-dreams.de
commit 38b17afb0ebb9ecd41418d3c08bcf9198af4349d upstream.
Removing attach_adapter from this driver caused a regression for at least some machines. Those machines had the sensors described in their DT, too, so they didn't need manual creation of the sensor devices. The old code worked, though, because manual creation came first. Creation of DT devices then failed later and caused error logs, but the sensors worked nonetheless because of the manually created devices.
When removing attach_adaper, manual creation now comes later and loses the race. The sensor devices were already registered via DT, yet with another binding, so the driver could not be bound to it.
This fix refactors the code to remove the race and only manually creates devices if there are no DT nodes present. Also, the DT binding is updated to match both, the DT and manually created devices. Because we don't know which device creation will be used at runtime, the code to start the kthread is moved to do_probe() which will be called by both methods.
Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723 Reported-by: Erhard Furtner erhard_f@mailbox.org Tested-by: Erhard Furtner erhard_f@mailbox.org Acked-by: Michael Ellerman mpe@ellerman.id.au (powerpc) Signed-off-by: Wolfram Sang wsa@the-dreams.de Cc: stable@kernel.org # v4.19+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/macintosh/therm_windtunnel.c | 52 ++++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 21 deletions(-)
--- a/drivers/macintosh/therm_windtunnel.c +++ b/drivers/macintosh/therm_windtunnel.c @@ -300,9 +300,11 @@ static int control_loop(void *dummy) /* i2c probing and setup */ /************************************************************************/
-static int -do_attach( struct i2c_adapter *adapter ) +static void do_attach(struct i2c_adapter *adapter) { + struct i2c_board_info info = { }; + struct device_node *np; + /* scan 0x48-0x4f (DS1775) and 0x2c-2x2f (ADM1030) */ static const unsigned short scan_ds1775[] = { 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, @@ -313,25 +315,24 @@ do_attach( struct i2c_adapter *adapter ) I2C_CLIENT_END };
- if( strncmp(adapter->name, "uni-n", 5) ) - return 0; - - if( !x.running ) { - struct i2c_board_info info; + if (x.running || strncmp(adapter->name, "uni-n", 5)) + return;
- memset(&info, 0, sizeof(struct i2c_board_info)); - strlcpy(info.type, "therm_ds1775", I2C_NAME_SIZE); + np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,ds1775"); + if (np) { + of_node_put(np); + } else { + strlcpy(info.type, "MAC,ds1775", I2C_NAME_SIZE); i2c_new_probed_device(adapter, &info, scan_ds1775, NULL); + }
- strlcpy(info.type, "therm_adm1030", I2C_NAME_SIZE); + np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,adm1030"); + if (np) { + of_node_put(np); + } else { + strlcpy(info.type, "MAC,adm1030", I2C_NAME_SIZE); i2c_new_probed_device(adapter, &info, scan_adm1030, NULL); - - if( x.thermostat && x.fan ) { - x.running = 1; - x.poll_task = kthread_run(control_loop, NULL, "g4fand"); - } } - return 0; }
static int @@ -404,8 +405,8 @@ out: enum chip { ds1775, adm1030 };
static const struct i2c_device_id therm_windtunnel_id[] = { - { "therm_ds1775", ds1775 }, - { "therm_adm1030", adm1030 }, + { "MAC,ds1775", ds1775 }, + { "MAC,adm1030", adm1030 }, { } }; MODULE_DEVICE_TABLE(i2c, therm_windtunnel_id); @@ -414,6 +415,7 @@ static int do_probe(struct i2c_client *cl, const struct i2c_device_id *id) { struct i2c_adapter *adapter = cl->adapter; + int ret = 0;
if( !i2c_check_functionality(adapter, I2C_FUNC_SMBUS_WORD_DATA | I2C_FUNC_SMBUS_WRITE_BYTE) ) @@ -421,11 +423,19 @@ do_probe(struct i2c_client *cl, const st
switch (id->driver_data) { case adm1030: - return attach_fan( cl ); + ret = attach_fan(cl); + break; case ds1775: - return attach_thermostat(cl); + ret = attach_thermostat(cl); + break; } - return 0; + + if (!x.running && x.thermostat && x.fan) { + x.running = 1; + x.poll_task = kthread_run(control_loop, NULL, "g4fand"); + } + + return ret; }
static struct i2c_driver g4fan_driver = {
From: Jan Kara jack@suse.cz
commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream.
KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni chaitanya.kulkarni@wdc.com Reviewed-by: Ming Lei ming.lei@redhat.com Tested-by: Ming Lei ming.lei@redhat.com Reviewed-by: Bart Van Assche bvanassche@acm.org Reported-by: Tristan Madani tristmd@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/blkdev.h | 2 include/linux/blktrace_api.h | 18 ++++-- kernel/trace/blktrace.c | 114 +++++++++++++++++++++++++++++++------------ 3 files changed, 97 insertions(+), 37 deletions(-)
--- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -524,7 +524,7 @@ struct request_queue { unsigned int sg_reserved_size; int node; #ifdef CONFIG_BLK_DEV_IO_TRACE - struct blk_trace *blk_trace; + struct blk_trace __rcu *blk_trace; struct mutex blk_trace_mutex; #endif /* --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -51,9 +51,13 @@ void __trace_note_message(struct blk_tra **/ #define blk_add_cgroup_trace_msg(q, cg, fmt, ...) \ do { \ - struct blk_trace *bt = (q)->blk_trace; \ + struct blk_trace *bt; \ + \ + rcu_read_lock(); \ + bt = rcu_dereference((q)->blk_trace); \ if (unlikely(bt)) \ __trace_note_message(bt, cg, fmt, ##__VA_ARGS__);\ + rcu_read_unlock(); \ } while (0) #define blk_add_trace_msg(q, fmt, ...) \ blk_add_cgroup_trace_msg(q, NULL, fmt, ##__VA_ARGS__) @@ -61,10 +65,14 @@ void __trace_note_message(struct blk_tra
static inline bool blk_trace_note_message_enabled(struct request_queue *q) { - struct blk_trace *bt = q->blk_trace; - if (likely(!bt)) - return false; - return bt->act_mask & BLK_TC_NOTIFY; + struct blk_trace *bt; + bool ret; + + rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); + ret = bt && (bt->act_mask & BLK_TC_NOTIFY); + rcu_read_unlock(); + return ret; }
extern void blk_add_driver_data(struct request_queue *q, struct request *rq, --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -335,6 +335,7 @@ static void put_probe_ref(void)
static void blk_trace_cleanup(struct blk_trace *bt) { + synchronize_rcu(); blk_trace_free(bt); put_probe_ref(); } @@ -629,8 +630,10 @@ static int compat_blk_trace_setup(struct static int __blk_trace_startstop(struct request_queue *q, int start) { int ret; - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ bt = rcu_dereference_protected(q->blk_trace, + lockdep_is_held(&q->blk_trace_mutex)); if (bt == NULL) return -EINVAL;
@@ -740,8 +743,8 @@ int blk_trace_ioctl(struct block_device void blk_trace_shutdown(struct request_queue *q) { mutex_lock(&q->blk_trace_mutex); - - if (q->blk_trace) { + if (rcu_dereference_protected(q->blk_trace, + lockdep_is_held(&q->blk_trace_mutex))) { __blk_trace_startstop(q, 0); __blk_trace_remove(q); } @@ -752,8 +755,10 @@ void blk_trace_shutdown(struct request_q #ifdef CONFIG_BLK_CGROUP static u64 blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ /* We don't use the 'bt' value here except as an optimization... */ + bt = rcu_dereference_protected(q->blk_trace, 1); if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP)) return 0;
@@ -796,10 +801,14 @@ blk_trace_request_get_cgid(struct reques static void blk_add_trace_rq(struct request *rq, int error, unsigned int nr_bytes, u32 what, u64 cgid) { - struct blk_trace *bt = rq->q->blk_trace; + struct blk_trace *bt;
- if (likely(!bt)) + rcu_read_lock(); + bt = rcu_dereference(rq->q->blk_trace); + if (likely(!bt)) { + rcu_read_unlock(); return; + }
if (blk_rq_is_passthrough(rq)) what |= BLK_TC_ACT(BLK_TC_PC); @@ -808,6 +817,7 @@ static void blk_add_trace_rq(struct requ
__blk_add_trace(bt, blk_rq_trace_sector(rq), nr_bytes, req_op(rq), rq->cmd_flags, what, error, 0, NULL, cgid); + rcu_read_unlock(); }
static void blk_add_trace_rq_insert(void *ignore, @@ -853,14 +863,19 @@ static void blk_add_trace_rq_complete(vo static void blk_add_trace_bio(struct request_queue *q, struct bio *bio, u32 what, int error) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
- if (likely(!bt)) + rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); + if (likely(!bt)) { + rcu_read_unlock(); return; + }
__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size, bio_op(bio), bio->bi_opf, what, error, 0, NULL, blk_trace_bio_get_cgid(q, bio)); + rcu_read_unlock(); }
static void blk_add_trace_bio_bounce(void *ignore, @@ -905,11 +920,14 @@ static void blk_add_trace_getrq(void *ig if (bio) blk_add_trace_bio(q, bio, BLK_TA_GETRQ, 0); else { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); if (bt) __blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_GETRQ, 0, 0, NULL, 0); + rcu_read_unlock(); } }
@@ -921,27 +939,35 @@ static void blk_add_trace_sleeprq(void * if (bio) blk_add_trace_bio(q, bio, BLK_TA_SLEEPRQ, 0); else { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); if (bt) __blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_SLEEPRQ, 0, 0, NULL, 0); + rcu_read_unlock(); } }
static void blk_add_trace_plug(void *ignore, struct request_queue *q) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); if (bt) __blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, 0); + rcu_read_unlock(); }
static void blk_add_trace_unplug(void *ignore, struct request_queue *q, unsigned int depth, bool explicit) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); if (bt) { __be64 rpdu = cpu_to_be64(depth); u32 what; @@ -953,14 +979,17 @@ static void blk_add_trace_unplug(void *i
__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, 0); } + rcu_read_unlock(); }
static void blk_add_trace_split(void *ignore, struct request_queue *q, struct bio *bio, unsigned int pdu) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
+ rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); if (bt) { __be64 rpdu = cpu_to_be64(pdu);
@@ -969,6 +998,7 @@ static void blk_add_trace_split(void *ig BLK_TA_SPLIT, bio->bi_status, sizeof(rpdu), &rpdu, blk_trace_bio_get_cgid(q, bio)); } + rcu_read_unlock(); }
/** @@ -988,11 +1018,15 @@ static void blk_add_trace_bio_remap(void struct request_queue *q, struct bio *bio, dev_t dev, sector_t from) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt; struct blk_io_trace_remap r;
- if (likely(!bt)) + rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); + if (likely(!bt)) { + rcu_read_unlock(); return; + }
r.device_from = cpu_to_be32(dev); r.device_to = cpu_to_be32(bio_dev(bio)); @@ -1001,6 +1035,7 @@ static void blk_add_trace_bio_remap(void __blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size, bio_op(bio), bio->bi_opf, BLK_TA_REMAP, bio->bi_status, sizeof(r), &r, blk_trace_bio_get_cgid(q, bio)); + rcu_read_unlock(); }
/** @@ -1021,11 +1056,15 @@ static void blk_add_trace_rq_remap(void struct request *rq, dev_t dev, sector_t from) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt; struct blk_io_trace_remap r;
- if (likely(!bt)) + rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); + if (likely(!bt)) { + rcu_read_unlock(); return; + }
r.device_from = cpu_to_be32(dev); r.device_to = cpu_to_be32(disk_devt(rq->rq_disk)); @@ -1034,6 +1073,7 @@ static void blk_add_trace_rq_remap(void __blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq), rq_data_dir(rq), 0, BLK_TA_REMAP, 0, sizeof(r), &r, blk_trace_request_get_cgid(q, rq)); + rcu_read_unlock(); }
/** @@ -1051,14 +1091,19 @@ void blk_add_driver_data(struct request_ struct request *rq, void *data, size_t len) { - struct blk_trace *bt = q->blk_trace; + struct blk_trace *bt;
- if (likely(!bt)) + rcu_read_lock(); + bt = rcu_dereference(q->blk_trace); + if (likely(!bt)) { + rcu_read_unlock(); return; + }
__blk_add_trace(bt, blk_rq_trace_sector(rq), blk_rq_bytes(rq), 0, 0, BLK_TA_DRV_DATA, 0, len, data, blk_trace_request_get_cgid(q, rq)); + rcu_read_unlock(); } EXPORT_SYMBOL_GPL(blk_add_driver_data);
@@ -1597,6 +1642,7 @@ static int blk_trace_remove_queue(struct return -EINVAL;
put_probe_ref(); + synchronize_rcu(); blk_trace_free(bt); return 0; } @@ -1758,6 +1804,7 @@ static ssize_t sysfs_blk_trace_attr_show struct hd_struct *p = dev_to_part(dev); struct request_queue *q; struct block_device *bdev; + struct blk_trace *bt; ssize_t ret = -ENXIO;
bdev = bdget(part_devt(p)); @@ -1770,21 +1817,23 @@ static ssize_t sysfs_blk_trace_attr_show
mutex_lock(&q->blk_trace_mutex);
+ bt = rcu_dereference_protected(q->blk_trace, + lockdep_is_held(&q->blk_trace_mutex)); if (attr == &dev_attr_enable) { - ret = sprintf(buf, "%u\n", !!q->blk_trace); + ret = sprintf(buf, "%u\n", !!bt); goto out_unlock_bdev; }
- if (q->blk_trace == NULL) + if (bt == NULL) ret = sprintf(buf, "disabled\n"); else if (attr == &dev_attr_act_mask) - ret = blk_trace_mask2str(buf, q->blk_trace->act_mask); + ret = blk_trace_mask2str(buf, bt->act_mask); else if (attr == &dev_attr_pid) - ret = sprintf(buf, "%u\n", q->blk_trace->pid); + ret = sprintf(buf, "%u\n", bt->pid); else if (attr == &dev_attr_start_lba) - ret = sprintf(buf, "%llu\n", q->blk_trace->start_lba); + ret = sprintf(buf, "%llu\n", bt->start_lba); else if (attr == &dev_attr_end_lba) - ret = sprintf(buf, "%llu\n", q->blk_trace->end_lba); + ret = sprintf(buf, "%llu\n", bt->end_lba);
out_unlock_bdev: mutex_unlock(&q->blk_trace_mutex); @@ -1801,6 +1850,7 @@ static ssize_t sysfs_blk_trace_attr_stor struct block_device *bdev; struct request_queue *q; struct hd_struct *p; + struct blk_trace *bt; u64 value; ssize_t ret = -EINVAL;
@@ -1831,8 +1881,10 @@ static ssize_t sysfs_blk_trace_attr_stor
mutex_lock(&q->blk_trace_mutex);
+ bt = rcu_dereference_protected(q->blk_trace, + lockdep_is_held(&q->blk_trace_mutex)); if (attr == &dev_attr_enable) { - if (!!value == !!q->blk_trace) { + if (!!value == !!bt) { ret = 0; goto out_unlock_bdev; } @@ -1844,18 +1896,18 @@ static ssize_t sysfs_blk_trace_attr_stor }
ret = 0; - if (q->blk_trace == NULL) + if (bt == NULL) ret = blk_trace_setup_queue(q, bdev);
if (ret == 0) { if (attr == &dev_attr_act_mask) - q->blk_trace->act_mask = value; + bt->act_mask = value; else if (attr == &dev_attr_pid) - q->blk_trace->pid = value; + bt->pid = value; else if (attr == &dev_attr_start_lba) - q->blk_trace->start_lba = value; + bt->start_lba = value; else if (attr == &dev_attr_end_lba) - q->blk_trace->end_lba = value; + bt->end_lba = value; }
out_unlock_bdev:
From: Steven Rostedt (VMware) rostedt@goodmis.org
commit 78041c0c9e935d9ce4086feeff6c569ed88ddfd4 upstream.
The tracing seftests checks various aspects of the tracing infrastructure, and one is filtering. If trace_printk() is active during a self test, it can cause the filtering to fail, which will disable that part of the trace.
To keep the selftests from failing because of trace_printk() calls, trace_printk() checks the variable tracing_selftest_running, and if set, it does not write to the tracing buffer.
As some tracers were registered earlier in boot, the selftest they triggered would fail because not all the infrastructure was set up for the full selftest. Thus, some of the tests were post poned to when their infrastructure was ready (namely file system code). The postpone code did not set the tracing_seftest_running variable, and could fail if a trace_printk() was added and executed during their run.
Cc: stable@vger.kernel.org Fixes: 9afecfbb95198 ("tracing: Postpone tracer start-up tests till the system is more robust") Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/trace/trace.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1827,6 +1827,7 @@ static __init int init_trace_selftests(v
pr_info("Running postponed tracer tests:\n");
+ tracing_selftest_running = true; list_for_each_entry_safe(p, n, &postponed_selftests, list) { /* This loop can take minutes when sanitizers are enabled, so * lets make sure we allow RCU processing. @@ -1849,6 +1850,7 @@ static __init int init_trace_selftests(v list_del(&p->list); kfree(p); } + tracing_selftest_running = false;
out: mutex_unlock(&trace_types_lock);
From: Orson Zhai orson.unisoc@gmail.com
commit 66d0e797bf095d407479c89952d42b1d96ef0a7f upstream.
This reverts commit 4585fbcb5331fc910b7e553ad3efd0dd7b320d14.
The name changing as devfreq(X) breaks some user space applications, such as Android HAL from Unisoc and Hikey [1]. The device name will be changed unexpectly after every boot depending on module init sequence. It will make trouble to setup some system configuration like selinux for Android.
So we'd like to revert it back to old naming rule before any better way being found.
[1] https://lkml.org/lkml/2018/5/8/1042
Cc: John Stultz john.stultz@linaro.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: stable@vger.kernel.org Signed-off-by: Orson Zhai orson.unisoc@gmail.com Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/devfreq/devfreq.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -738,7 +738,6 @@ struct devfreq *devfreq_add_device(struc { struct devfreq *devfreq; struct devfreq_governor *governor; - static atomic_t devfreq_no = ATOMIC_INIT(-1); int err = 0;
if (!dev || !profile || !governor_name) { @@ -800,8 +799,7 @@ struct devfreq *devfreq_add_device(struc devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev); atomic_set(&devfreq->suspend_count, 0);
- dev_set_name(&devfreq->dev, "devfreq%d", - atomic_inc_return(&devfreq_no)); + dev_set_name(&devfreq->dev, "%s", dev_name(dev)); err = device_register(&devfreq->dev); if (err) { mutex_unlock(&devfreq->lock);
From: Shirish S shirish.s@amd.com
commit a3ed353cf8015ba84a0407a5dc3ffee038166ab0 upstream.
fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM.
Suggested-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Shirish S shirish.s@amd.com Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 37 ++++++++++++- drivers/gpu/drm/amd/include/asic_reg/dce/dce_12_0_offset.h | 2 3 files changed, 39 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -192,6 +192,7 @@ struct amdgpu_gmc { uint32_t srbm_soft_reset; bool prt_warning; uint64_t stolen_size; + uint32_t sdpif_register; /* apertures */ u64 shared_aperture_start; u64 shared_aperture_end; --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1204,6 +1204,19 @@ static void gmc_v9_0_init_golden_registe }
/** + * gmc_v9_0_restore_registers - restores regs + * + * @adev: amdgpu_device pointer + * + * This restores register values, saved at suspend. + */ +static void gmc_v9_0_restore_registers(struct amdgpu_device *adev) +{ + if (adev->asic_type == CHIP_RAVEN) + WREG32(mmDCHUBBUB_SDPIF_MMIO_CNTRL_0, adev->gmc.sdpif_register); +} + +/** * gmc_v9_0_gart_enable - gart enable * * @adev: amdgpu_device pointer @@ -1308,6 +1321,20 @@ static int gmc_v9_0_hw_init(void *handle }
/** + * gmc_v9_0_save_registers - saves regs + * + * @adev: amdgpu_device pointer + * + * This saves potential register values that should be + * restored upon resume + */ +static void gmc_v9_0_save_registers(struct amdgpu_device *adev) +{ + if (adev->asic_type == CHIP_RAVEN) + adev->gmc.sdpif_register = RREG32(mmDCHUBBUB_SDPIF_MMIO_CNTRL_0); +} + +/** * gmc_v9_0_gart_disable - gart disable * * @adev: amdgpu_device pointer @@ -1343,9 +1370,16 @@ static int gmc_v9_0_hw_fini(void *handle
static int gmc_v9_0_suspend(void *handle) { + int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle;
- return gmc_v9_0_hw_fini(adev); + r = gmc_v9_0_hw_fini(adev); + if (r) + return r; + + gmc_v9_0_save_registers(adev); + + return 0; }
static int gmc_v9_0_resume(void *handle) @@ -1353,6 +1387,7 @@ static int gmc_v9_0_resume(void *handle) int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ gmc_v9_0_restore_registers(adev); r = gmc_v9_0_hw_init(adev); if (r) return r; --- a/drivers/gpu/drm/amd/include/asic_reg/dce/dce_12_0_offset.h +++ b/drivers/gpu/drm/amd/include/asic_reg/dce/dce_12_0_offset.h @@ -7376,6 +7376,8 @@ #define mmCRTC4_CRTC_DRR_CONTROL 0x0f3e #define mmCRTC4_CRTC_DRR_CONTROL_BASE_IDX 2
+#define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0 0x395d +#define mmDCHUBBUB_SDPIF_MMIO_CNTRL_0_BASE_IDX 2
// addressBlock: dce_dc_fmt4_dispdec // base address: 0x2000
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit f5739cb0b56590d68d8df8a44659893b6d0084c3 upstream.
Before commit 1e4f63aecb53 ("cpufreq: Avoid creating excessively large stack frames") the initial value of the policy field in struct cpufreq_policy set by the driver's ->init() callback was implicitly passed from cpufreq_init_policy() to cpufreq_set_policy() if the default governor was neither "performance" nor "powersave". After that commit, however, cpufreq_init_policy() must take that case into consideration explicitly and handle it as appropriate, so make that happen.
Fixes: 1e4f63aecb53 ("cpufreq: Avoid creating excessively large stack frames") Link: https://lore.kernel.org/linux-pm/39fb762880c27da110086741315ca8b111d781cd.ca... Reported-by: Artem Bityutskiy dedekind1@gmail.com Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/cpufreq/cpufreq.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1074,9 +1074,17 @@ static int cpufreq_init_policy(struct cp pol = policy->last_policy; } else if (def_gov) { pol = cpufreq_parse_policy(def_gov->name); - } else { - return -ENODATA; + /* + * In case the default governor is neiter "performance" + * nor "powersave", fall back to the initial policy + * value set by the driver. + */ + if (pol == CPUFREQ_POLICY_UNKNOWN) + pol = policy->policy; } + if (pol != CPUFREQ_POLICY_PERFORMANCE && + pol != CPUFREQ_POLICY_POWERSAVE) + return -ENODATA; }
return cpufreq_set_policy(policy, gov, pol);
From: Jens Axboe axboe@kernel.dk
commit d876836204897b6d7d911f942084f69a1e9d5c4d upstream.
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise the iovec import for these commands will not do the right thing and fail the command with -EINVAL.
Found by running the test suite compiled as 32-bit.
Cc: stable@vger.kernel.org Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()") Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/io_uring.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2166,6 +2166,11 @@ static int io_sendmsg_prep(struct io_kio sr->msg_flags = READ_ONCE(sqe->msg_flags); sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+#ifdef CONFIG_COMPAT + if (req->ctx->compat) + sr->msg_flags |= MSG_CMSG_COMPAT; +#endif + if (!io) return 0;
@@ -2258,6 +2263,11 @@ static int io_recvmsg_prep(struct io_kio sr->msg_flags = READ_ONCE(sqe->msg_flags); sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+#ifdef CONFIG_COMPAT + if (req->ctx->compat) + sr->msg_flags |= MSG_CMSG_COMPAT; +#endif + if (!io) return 0;
From: Jozsef Kadlecsik kadlec@netfilter.org
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long.
There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/netfilter/ipset/ip_set.h | 11 net/netfilter/ipset/ip_set_core.c | 34 + net/netfilter/ipset/ip_set_hash_gen.h | 633 ++++++++++++++++++++++----------- 3 files changed, 472 insertions(+), 206 deletions(-)
--- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -121,6 +121,7 @@ struct ip_set_ext { u32 timeout; u8 packets_op; u8 bytes_op; + bool target; };
struct ip_set; @@ -187,6 +188,14 @@ struct ip_set_type_variant { /* Return true if "b" set is the same as "a" * according to the create set parameters */ bool (*same_set)(const struct ip_set *a, const struct ip_set *b); + /* Region-locking is used */ + bool region_lock; +}; + +struct ip_set_region { + spinlock_t lock; /* Region lock */ + size_t ext_size; /* Size of the dynamic extensions */ + u32 elements; /* Number of elements vs timeout */ };
/* The core set type structure */ @@ -501,7 +510,7 @@ ip_set_init_skbinfo(struct ip_set_skbinf }
#define IP_SET_INIT_KEXT(skb, opt, set) \ - { .bytes = (skb)->len, .packets = 1, \ + { .bytes = (skb)->len, .packets = 1, .target = true,\ .timeout = ip_set_adt_opt_timeout(opt, set) }
#define IP_SET_INIT_UEXT(set) \ --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -723,6 +723,20 @@ ip_set_rcu_get(struct net *net, ip_set_i return set; }
+static inline void +ip_set_lock(struct ip_set *set) +{ + if (!set->variant->region_lock) + spin_lock_bh(&set->lock); +} + +static inline void +ip_set_unlock(struct ip_set *set) +{ + if (!set->variant->region_lock) + spin_unlock_bh(&set->lock); +} + int ip_set_test(ip_set_id_t index, const struct sk_buff *skb, const struct xt_action_param *par, struct ip_set_adt_opt *opt) @@ -744,9 +758,9 @@ ip_set_test(ip_set_id_t index, const str if (ret == -EAGAIN) { /* Type requests element to be completed */ pr_debug("element must be completed, ADD is triggered\n"); - spin_lock_bh(&set->lock); + ip_set_lock(set); set->variant->kadt(set, skb, par, IPSET_ADD, opt); - spin_unlock_bh(&set->lock); + ip_set_unlock(set); ret = 1; } else { /* --return-nomatch: invert matched element */ @@ -775,9 +789,9 @@ ip_set_add(ip_set_id_t index, const stru !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) return -IPSET_ERR_TYPE_MISMATCH;
- spin_lock_bh(&set->lock); + ip_set_lock(set); ret = set->variant->kadt(set, skb, par, IPSET_ADD, opt); - spin_unlock_bh(&set->lock); + ip_set_unlock(set);
return ret; } @@ -797,9 +811,9 @@ ip_set_del(ip_set_id_t index, const stru !(opt->family == set->family || set->family == NFPROTO_UNSPEC)) return -IPSET_ERR_TYPE_MISMATCH;
- spin_lock_bh(&set->lock); + ip_set_lock(set); ret = set->variant->kadt(set, skb, par, IPSET_DEL, opt); - spin_unlock_bh(&set->lock); + ip_set_unlock(set);
return ret; } @@ -1264,9 +1278,9 @@ ip_set_flush_set(struct ip_set *set) { pr_debug("set: %s\n", set->name);
- spin_lock_bh(&set->lock); + ip_set_lock(set); set->variant->flush(set); - spin_unlock_bh(&set->lock); + ip_set_unlock(set); }
static int ip_set_flush(struct net *net, struct sock *ctnl, struct sk_buff *skb, @@ -1713,9 +1727,9 @@ call_ad(struct sock *ctnl, struct sk_buf bool eexist = flags & IPSET_FLAG_EXIST, retried = false;
do { - spin_lock_bh(&set->lock); + ip_set_lock(set); ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried); - spin_unlock_bh(&set->lock); + ip_set_unlock(set); retried = true; } while (ret == -EAGAIN && set->variant->resize && --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -7,13 +7,21 @@ #include <linux/rcupdate.h> #include <linux/jhash.h> #include <linux/types.h> +#include <linux/netfilter/nfnetlink.h> #include <linux/netfilter/ipset/ip_set.h>
-#define __ipset_dereference_protected(p, c) rcu_dereference_protected(p, c) -#define ipset_dereference_protected(p, set) \ - __ipset_dereference_protected(p, lockdep_is_held(&(set)->lock)) - -#define rcu_dereference_bh_nfnl(p) rcu_dereference_bh_check(p, 1) +#define __ipset_dereference(p) \ + rcu_dereference_protected(p, 1) +#define ipset_dereference_nfnl(p) \ + rcu_dereference_protected(p, \ + lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)) +#define ipset_dereference_set(p, set) \ + rcu_dereference_protected(p, \ + lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \ + lockdep_is_held(&(set)->lock)) +#define ipset_dereference_bh_nfnl(p) \ + rcu_dereference_bh_check(p, \ + lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
/* Hashing which uses arrays to resolve clashing. The hash table is resized * (doubled) when searching becomes too long. @@ -72,11 +80,35 @@ struct hbucket { __aligned(__alignof__(u64)); };
+/* Region size for locking == 2^HTABLE_REGION_BITS */ +#define HTABLE_REGION_BITS 10 +#define ahash_numof_locks(htable_bits) \ + ((htable_bits) < HTABLE_REGION_BITS ? 1 \ + : jhash_size((htable_bits) - HTABLE_REGION_BITS)) +#define ahash_sizeof_regions(htable_bits) \ + (ahash_numof_locks(htable_bits) * sizeof(struct ip_set_region)) +#define ahash_region(n, htable_bits) \ + ((n) % ahash_numof_locks(htable_bits)) +#define ahash_bucket_start(h, htable_bits) \ + ((htable_bits) < HTABLE_REGION_BITS ? 0 \ + : (h) * jhash_size(HTABLE_REGION_BITS)) +#define ahash_bucket_end(h, htable_bits) \ + ((htable_bits) < HTABLE_REGION_BITS ? jhash_size(htable_bits) \ + : ((h) + 1) * jhash_size(HTABLE_REGION_BITS)) + +struct htable_gc { + struct delayed_work dwork; + struct ip_set *set; /* Set the gc belongs to */ + u32 region; /* Last gc run position */ +}; + /* The hash table: the table size stored here in order to make resizing easy */ struct htable { atomic_t ref; /* References for resizing */ - atomic_t uref; /* References for dumping */ + atomic_t uref; /* References for dumping and gc */ u8 htable_bits; /* size of hash table == 2^htable_bits */ + u32 maxelem; /* Maxelem per region */ + struct ip_set_region *hregion; /* Region locks and ext sizes */ struct hbucket __rcu *bucket[0]; /* hashtable buckets */ };
@@ -162,6 +194,10 @@ htable_bits(u32 hashsize) #define NLEN 0 #endif /* IP_SET_HASH_WITH_NETS */
+#define SET_ELEM_EXPIRED(set, d) \ + (SET_WITH_TIMEOUT(set) && \ + ip_set_timeout_expired(ext_timeout(d, set))) + #endif /* _IP_SET_HASH_GEN_H */
#ifndef MTYPE @@ -205,10 +241,12 @@ htable_bits(u32 hashsize) #undef mtype_test_cidrs #undef mtype_test #undef mtype_uref -#undef mtype_expire #undef mtype_resize +#undef mtype_ext_size +#undef mtype_resize_ad #undef mtype_head #undef mtype_list +#undef mtype_gc_do #undef mtype_gc #undef mtype_gc_init #undef mtype_variant @@ -247,10 +285,12 @@ htable_bits(u32 hashsize) #define mtype_test_cidrs IPSET_TOKEN(MTYPE, _test_cidrs) #define mtype_test IPSET_TOKEN(MTYPE, _test) #define mtype_uref IPSET_TOKEN(MTYPE, _uref) -#define mtype_expire IPSET_TOKEN(MTYPE, _expire) #define mtype_resize IPSET_TOKEN(MTYPE, _resize) +#define mtype_ext_size IPSET_TOKEN(MTYPE, _ext_size) +#define mtype_resize_ad IPSET_TOKEN(MTYPE, _resize_ad) #define mtype_head IPSET_TOKEN(MTYPE, _head) #define mtype_list IPSET_TOKEN(MTYPE, _list) +#define mtype_gc_do IPSET_TOKEN(MTYPE, _gc_do) #define mtype_gc IPSET_TOKEN(MTYPE, _gc) #define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init) #define mtype_variant IPSET_TOKEN(MTYPE, _variant) @@ -275,8 +315,7 @@ htable_bits(u32 hashsize) /* The generic hash structure */ struct htype { struct htable __rcu *table; /* the hash table */ - struct timer_list gc; /* garbage collection when timeout enabled */ - struct ip_set *set; /* attached to this ip_set */ + struct htable_gc gc; /* gc workqueue */ u32 maxelem; /* max elements in the hash */ u32 initval; /* random jhash init value */ #ifdef IP_SET_HASH_WITH_MARKMASK @@ -288,21 +327,33 @@ struct htype { #ifdef IP_SET_HASH_WITH_NETMASK u8 netmask; /* netmask value for subnets to store */ #endif + struct list_head ad; /* Resize add|del backlist */ struct mtype_elem next; /* temporary storage for uadd */ #ifdef IP_SET_HASH_WITH_NETS struct net_prefixes nets[NLEN]; /* book-keeping of prefixes */ #endif };
+/* ADD|DEL entries saved during resize */ +struct mtype_resize_ad { + struct list_head list; + enum ipset_adt ad; /* ADD|DEL element */ + struct mtype_elem d; /* Element value */ + struct ip_set_ext ext; /* Extensions for ADD */ + struct ip_set_ext mext; /* Target extensions for ADD */ + u32 flags; /* Flags for ADD */ +}; + #ifdef IP_SET_HASH_WITH_NETS /* Network cidr size book keeping when the hash stores different * sized networks. cidr == real cidr + 1 to support /0. */ static void -mtype_add_cidr(struct htype *h, u8 cidr, u8 n) +mtype_add_cidr(struct ip_set *set, struct htype *h, u8 cidr, u8 n) { int i, j;
+ spin_lock_bh(&set->lock); /* Add in increasing prefix order, so larger cidr first */ for (i = 0, j = -1; i < NLEN && h->nets[i].cidr[n]; i++) { if (j != -1) { @@ -311,7 +362,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, j = i; } else if (h->nets[i].cidr[n] == cidr) { h->nets[CIDR_POS(cidr)].nets[n]++; - return; + goto unlock; } } if (j != -1) { @@ -320,24 +371,29 @@ mtype_add_cidr(struct htype *h, u8 cidr, } h->nets[i].cidr[n] = cidr; h->nets[CIDR_POS(cidr)].nets[n] = 1; +unlock: + spin_unlock_bh(&set->lock); }
static void -mtype_del_cidr(struct htype *h, u8 cidr, u8 n) +mtype_del_cidr(struct ip_set *set, struct htype *h, u8 cidr, u8 n) { u8 i, j, net_end = NLEN - 1;
+ spin_lock_bh(&set->lock); for (i = 0; i < NLEN; i++) { if (h->nets[i].cidr[n] != cidr) continue; h->nets[CIDR_POS(cidr)].nets[n]--; if (h->nets[CIDR_POS(cidr)].nets[n] > 0) - return; + goto unlock; for (j = i; j < net_end && h->nets[j].cidr[n]; j++) h->nets[j].cidr[n] = h->nets[j + 1].cidr[n]; h->nets[j].cidr[n] = 0; - return; + goto unlock; } +unlock: + spin_unlock_bh(&set->lock); } #endif
@@ -345,7 +401,7 @@ mtype_del_cidr(struct htype *h, u8 cidr, static size_t mtype_ahash_memsize(const struct htype *h, const struct htable *t) { - return sizeof(*h) + sizeof(*t); + return sizeof(*h) + sizeof(*t) + ahash_sizeof_regions(t->htable_bits); }
/* Get the ith element from the array block n */ @@ -369,24 +425,29 @@ mtype_flush(struct ip_set *set) struct htype *h = set->data; struct htable *t; struct hbucket *n; - u32 i; + u32 r, i;
- t = ipset_dereference_protected(h->table, set); - for (i = 0; i < jhash_size(t->htable_bits); i++) { - n = __ipset_dereference_protected(hbucket(t, i), 1); - if (!n) - continue; - if (set->extensions & IPSET_EXT_DESTROY) - mtype_ext_cleanup(set, n); - /* FIXME: use slab cache */ - rcu_assign_pointer(hbucket(t, i), NULL); - kfree_rcu(n, rcu); + t = ipset_dereference_nfnl(h->table); + for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) { + spin_lock_bh(&t->hregion[r].lock); + for (i = ahash_bucket_start(r, t->htable_bits); + i < ahash_bucket_end(r, t->htable_bits); i++) { + n = __ipset_dereference(hbucket(t, i)); + if (!n) + continue; + if (set->extensions & IPSET_EXT_DESTROY) + mtype_ext_cleanup(set, n); + /* FIXME: use slab cache */ + rcu_assign_pointer(hbucket(t, i), NULL); + kfree_rcu(n, rcu); + } + t->hregion[r].ext_size = 0; + t->hregion[r].elements = 0; + spin_unlock_bh(&t->hregion[r].lock); } #ifdef IP_SET_HASH_WITH_NETS memset(h->nets, 0, sizeof(h->nets)); #endif - set->elements = 0; - set->ext_size = 0; }
/* Destroy the hashtable part of the set */ @@ -397,7 +458,7 @@ mtype_ahash_destroy(struct ip_set *set, u32 i;
for (i = 0; i < jhash_size(t->htable_bits); i++) { - n = __ipset_dereference_protected(hbucket(t, i), 1); + n = __ipset_dereference(hbucket(t, i)); if (!n) continue; if (set->extensions & IPSET_EXT_DESTROY && ext_destroy) @@ -406,6 +467,7 @@ mtype_ahash_destroy(struct ip_set *set, kfree(n); }
+ ip_set_free(t->hregion); ip_set_free(t); }
@@ -414,28 +476,21 @@ static void mtype_destroy(struct ip_set *set) { struct htype *h = set->data; + struct list_head *l, *lt;
if (SET_WITH_TIMEOUT(set)) - del_timer_sync(&h->gc); + cancel_delayed_work_sync(&h->gc.dwork);
- mtype_ahash_destroy(set, - __ipset_dereference_protected(h->table, 1), true); + mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true); + list_for_each_safe(l, lt, &h->ad) { + list_del(l); + kfree(l); + } kfree(h);
set->data = NULL; }
-static void -mtype_gc_init(struct ip_set *set, void (*gc)(struct timer_list *t)) -{ - struct htype *h = set->data; - - timer_setup(&h->gc, gc, 0); - mod_timer(&h->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ); - pr_debug("gc initialized, run in every %u\n", - IPSET_GC_PERIOD(set->timeout)); -} - static bool mtype_same_set(const struct ip_set *a, const struct ip_set *b) { @@ -454,11 +509,9 @@ mtype_same_set(const struct ip_set *a, c a->extensions == b->extensions; }
-/* Delete expired elements from the hashtable */ static void -mtype_expire(struct ip_set *set, struct htype *h) +mtype_gc_do(struct ip_set *set, struct htype *h, struct htable *t, u32 r) { - struct htable *t; struct hbucket *n, *tmp; struct mtype_elem *data; u32 i, j, d; @@ -466,10 +519,12 @@ mtype_expire(struct ip_set *set, struct #ifdef IP_SET_HASH_WITH_NETS u8 k; #endif + u8 htable_bits = t->htable_bits;
- t = ipset_dereference_protected(h->table, set); - for (i = 0; i < jhash_size(t->htable_bits); i++) { - n = __ipset_dereference_protected(hbucket(t, i), 1); + spin_lock_bh(&t->hregion[r].lock); + for (i = ahash_bucket_start(r, htable_bits); + i < ahash_bucket_end(r, htable_bits); i++) { + n = __ipset_dereference(hbucket(t, i)); if (!n) continue; for (j = 0, d = 0; j < n->pos; j++) { @@ -485,58 +540,100 @@ mtype_expire(struct ip_set *set, struct smp_mb__after_atomic(); #ifdef IP_SET_HASH_WITH_NETS for (k = 0; k < IPSET_NET_COUNT; k++) - mtype_del_cidr(h, + mtype_del_cidr(set, h, NCIDR_PUT(DCIDR_GET(data->cidr, k)), k); #endif + t->hregion[r].elements--; ip_set_ext_destroy(set, data); - set->elements--; d++; } if (d >= AHASH_INIT_SIZE) { if (d >= n->size) { + t->hregion[r].ext_size -= + ext_size(n->size, dsize); rcu_assign_pointer(hbucket(t, i), NULL); kfree_rcu(n, rcu); continue; } tmp = kzalloc(sizeof(*tmp) + - (n->size - AHASH_INIT_SIZE) * dsize, - GFP_ATOMIC); + (n->size - AHASH_INIT_SIZE) * dsize, + GFP_ATOMIC); if (!tmp) - /* Still try to delete expired elements */ + /* Still try to delete expired elements. */ continue; tmp->size = n->size - AHASH_INIT_SIZE; for (j = 0, d = 0; j < n->pos; j++) { if (!test_bit(j, n->used)) continue; data = ahash_data(n, j, dsize); - memcpy(tmp->value + d * dsize, data, dsize); + memcpy(tmp->value + d * dsize, + data, dsize); set_bit(d, tmp->used); d++; } tmp->pos = d; - set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize); + t->hregion[r].ext_size -= + ext_size(AHASH_INIT_SIZE, dsize); rcu_assign_pointer(hbucket(t, i), tmp); kfree_rcu(n, rcu); } } + spin_unlock_bh(&t->hregion[r].lock); }
static void -mtype_gc(struct timer_list *t) +mtype_gc(struct work_struct *work) { - struct htype *h = from_timer(h, t, gc); - struct ip_set *set = h->set; + struct htable_gc *gc; + struct ip_set *set; + struct htype *h; + struct htable *t; + u32 r, numof_locks; + unsigned int next_run; + + gc = container_of(work, struct htable_gc, dwork.work); + set = gc->set; + h = set->data;
- pr_debug("called\n"); spin_lock_bh(&set->lock); - mtype_expire(set, h); + t = ipset_dereference_set(h->table, set); + atomic_inc(&t->uref); + numof_locks = ahash_numof_locks(t->htable_bits); + r = gc->region++; + if (r >= numof_locks) { + r = gc->region = 0; + } + next_run = (IPSET_GC_PERIOD(set->timeout) * HZ) / numof_locks; + if (next_run < HZ/10) + next_run = HZ/10; spin_unlock_bh(&set->lock);
- h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; - add_timer(&h->gc); + mtype_gc_do(set, h, t, r); + + if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) { + pr_debug("Table destroy after resize by expire: %p\n", t); + mtype_ahash_destroy(set, t, false); + } + + queue_delayed_work(system_power_efficient_wq, &gc->dwork, next_run); + }
+static void +mtype_gc_init(struct htable_gc *gc) +{ + INIT_DEFERRABLE_WORK(&gc->dwork, mtype_gc); + queue_delayed_work(system_power_efficient_wq, &gc->dwork, HZ); +} + +static int +mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, + struct ip_set_ext *mext, u32 flags); +static int +mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext, + struct ip_set_ext *mext, u32 flags); + /* Resize a hash: create a new hash table with doubling the hashsize * and inserting the elements to it. Repeat until we succeed or * fail due to memory pressures. @@ -547,7 +644,7 @@ mtype_resize(struct ip_set *set, bool re struct htype *h = set->data; struct htable *t, *orig; u8 htable_bits; - size_t extsize, dsize = set->dsize; + size_t dsize = set->dsize; #ifdef IP_SET_HASH_WITH_NETS u8 flags; struct mtype_elem *tmp; @@ -555,7 +652,9 @@ mtype_resize(struct ip_set *set, bool re struct mtype_elem *data; struct mtype_elem *d; struct hbucket *n, *m; - u32 i, j, key; + struct list_head *l, *lt; + struct mtype_resize_ad *x; + u32 i, j, r, nr, key; int ret;
#ifdef IP_SET_HASH_WITH_NETS @@ -563,10 +662,8 @@ mtype_resize(struct ip_set *set, bool re if (!tmp) return -ENOMEM; #endif - rcu_read_lock_bh(); - orig = rcu_dereference_bh_nfnl(h->table); + orig = ipset_dereference_bh_nfnl(h->table); htable_bits = orig->htable_bits; - rcu_read_unlock_bh();
retry: ret = 0; @@ -583,88 +680,124 @@ retry: ret = -ENOMEM; goto out; } + t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits)); + if (!t->hregion) { + kfree(t); + ret = -ENOMEM; + goto out; + } t->htable_bits = htable_bits; + t->maxelem = h->maxelem / ahash_numof_locks(htable_bits); + for (i = 0; i < ahash_numof_locks(htable_bits); i++) + spin_lock_init(&t->hregion[i].lock);
- spin_lock_bh(&set->lock); - orig = __ipset_dereference_protected(h->table, 1); - /* There can't be another parallel resizing, but dumping is possible */ + /* There can't be another parallel resizing, + * but dumping, gc, kernel side add/del are possible + */ + orig = ipset_dereference_bh_nfnl(h->table); atomic_set(&orig->ref, 1); atomic_inc(&orig->uref); - extsize = 0; pr_debug("attempt to resize set %s from %u to %u, t %p\n", set->name, orig->htable_bits, htable_bits, orig); - for (i = 0; i < jhash_size(orig->htable_bits); i++) { - n = __ipset_dereference_protected(hbucket(orig, i), 1); - if (!n) - continue; - for (j = 0; j < n->pos; j++) { - if (!test_bit(j, n->used)) + for (r = 0; r < ahash_numof_locks(orig->htable_bits); r++) { + /* Expire may replace a hbucket with another one */ + rcu_read_lock_bh(); + for (i = ahash_bucket_start(r, orig->htable_bits); + i < ahash_bucket_end(r, orig->htable_bits); i++) { + n = __ipset_dereference(hbucket(orig, i)); + if (!n) continue; - data = ahash_data(n, j, dsize); + for (j = 0; j < n->pos; j++) { + if (!test_bit(j, n->used)) + continue; + data = ahash_data(n, j, dsize); + if (SET_ELEM_EXPIRED(set, data)) + continue; #ifdef IP_SET_HASH_WITH_NETS - /* We have readers running parallel with us, - * so the live data cannot be modified. - */ - flags = 0; - memcpy(tmp, data, dsize); - data = tmp; - mtype_data_reset_flags(data, &flags); -#endif - key = HKEY(data, h->initval, htable_bits); - m = __ipset_dereference_protected(hbucket(t, key), 1); - if (!m) { - m = kzalloc(sizeof(*m) + + /* We have readers running parallel with us, + * so the live data cannot be modified. + */ + flags = 0; + memcpy(tmp, data, dsize); + data = tmp; + mtype_data_reset_flags(data, &flags); +#endif + key = HKEY(data, h->initval, htable_bits); + m = __ipset_dereference(hbucket(t, key)); + nr = ahash_region(key, htable_bits); + if (!m) { + m = kzalloc(sizeof(*m) + AHASH_INIT_SIZE * dsize, GFP_ATOMIC); - if (!m) { - ret = -ENOMEM; - goto cleanup; - } - m->size = AHASH_INIT_SIZE; - extsize += ext_size(AHASH_INIT_SIZE, dsize); - RCU_INIT_POINTER(hbucket(t, key), m); - } else if (m->pos >= m->size) { - struct hbucket *ht; - - if (m->size >= AHASH_MAX(h)) { - ret = -EAGAIN; - } else { - ht = kzalloc(sizeof(*ht) + + if (!m) { + ret = -ENOMEM; + goto cleanup; + } + m->size = AHASH_INIT_SIZE; + t->hregion[nr].ext_size += + ext_size(AHASH_INIT_SIZE, + dsize); + RCU_INIT_POINTER(hbucket(t, key), m); + } else if (m->pos >= m->size) { + struct hbucket *ht; + + if (m->size >= AHASH_MAX(h)) { + ret = -EAGAIN; + } else { + ht = kzalloc(sizeof(*ht) + (m->size + AHASH_INIT_SIZE) * dsize, GFP_ATOMIC); - if (!ht) - ret = -ENOMEM; + if (!ht) + ret = -ENOMEM; + } + if (ret < 0) + goto cleanup; + memcpy(ht, m, sizeof(struct hbucket) + + m->size * dsize); + ht->size = m->size + AHASH_INIT_SIZE; + t->hregion[nr].ext_size += + ext_size(AHASH_INIT_SIZE, + dsize); + kfree(m); + m = ht; + RCU_INIT_POINTER(hbucket(t, key), ht); } - if (ret < 0) - goto cleanup; - memcpy(ht, m, sizeof(struct hbucket) + - m->size * dsize); - ht->size = m->size + AHASH_INIT_SIZE; - extsize += ext_size(AHASH_INIT_SIZE, dsize); - kfree(m); - m = ht; - RCU_INIT_POINTER(hbucket(t, key), ht); - } - d = ahash_data(m, m->pos, dsize); - memcpy(d, data, dsize); - set_bit(m->pos++, m->used); + d = ahash_data(m, m->pos, dsize); + memcpy(d, data, dsize); + set_bit(m->pos++, m->used); + t->hregion[nr].elements++; #ifdef IP_SET_HASH_WITH_NETS - mtype_data_reset_flags(d, &flags); + mtype_data_reset_flags(d, &flags); #endif + } } + rcu_read_unlock_bh(); } - rcu_assign_pointer(h->table, t); - set->ext_size = extsize;
- spin_unlock_bh(&set->lock); + /* There can't be any other writer. */ + rcu_assign_pointer(h->table, t);
/* Give time to other readers of the set */ synchronize_rcu();
pr_debug("set %s resized from %u (%p) to %u (%p)\n", set->name, orig->htable_bits, orig, t->htable_bits, t); - /* If there's nobody else dumping the table, destroy it */ + /* Add/delete elements processed by the SET target during resize. + * Kernel-side add cannot trigger a resize and userspace actions + * are serialized by the mutex. + */ + list_for_each_safe(l, lt, &h->ad) { + x = list_entry(l, struct mtype_resize_ad, list); + if (x->ad == IPSET_ADD) { + mtype_add(set, &x->d, &x->ext, &x->mext, x->flags); + } else { + mtype_del(set, &x->d, NULL, NULL, 0); + } + list_del(l); + kfree(l); + } + /* If there's nobody else using the table, destroy it */ if (atomic_dec_and_test(&orig->uref)) { pr_debug("Table destroy by resize %p\n", orig); mtype_ahash_destroy(set, orig, false); @@ -677,15 +810,44 @@ out: return ret;
cleanup: + rcu_read_unlock_bh(); atomic_set(&orig->ref, 0); atomic_dec(&orig->uref); - spin_unlock_bh(&set->lock); mtype_ahash_destroy(set, t, false); if (ret == -EAGAIN) goto retry; goto out; }
+/* Get the current number of elements and ext_size in the set */ +static void +mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size) +{ + struct htype *h = set->data; + const struct htable *t; + u32 i, j, r; + struct hbucket *n; + struct mtype_elem *data; + + t = rcu_dereference_bh(h->table); + for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) { + for (i = ahash_bucket_start(r, t->htable_bits); + i < ahash_bucket_end(r, t->htable_bits); i++) { + n = rcu_dereference_bh(hbucket(t, i)); + if (!n) + continue; + for (j = 0; j < n->pos; j++) { + if (!test_bit(j, n->used)) + continue; + data = ahash_data(n, j, set->dsize); + if (!SET_ELEM_EXPIRED(set, data)) + (*elements)++; + } + } + *ext_size += t->hregion[r].ext_size; + } +} + /* Add an element to a hash and update the internal counters when succeeded, * otherwise report the proper error code. */ @@ -698,32 +860,49 @@ mtype_add(struct ip_set *set, void *valu const struct mtype_elem *d = value; struct mtype_elem *data; struct hbucket *n, *old = ERR_PTR(-ENOENT); - int i, j = -1; + int i, j = -1, ret; bool flag_exist = flags & IPSET_FLAG_EXIST; bool deleted = false, forceadd = false, reuse = false; - u32 key, multi = 0; + u32 r, key, multi = 0, elements, maxelem;
- if (set->elements >= h->maxelem) { - if (SET_WITH_TIMEOUT(set)) - /* FIXME: when set is full, we slow down here */ - mtype_expire(set, h); - if (set->elements >= h->maxelem && SET_WITH_FORCEADD(set)) + rcu_read_lock_bh(); + t = rcu_dereference_bh(h->table); + key = HKEY(value, h->initval, t->htable_bits); + r = ahash_region(key, t->htable_bits); + atomic_inc(&t->uref); + elements = t->hregion[r].elements; + maxelem = t->maxelem; + if (elements >= maxelem) { + u32 e; + if (SET_WITH_TIMEOUT(set)) { + rcu_read_unlock_bh(); + mtype_gc_do(set, h, t, r); + rcu_read_lock_bh(); + } + maxelem = h->maxelem; + elements = 0; + for (e = 0; e < ahash_numof_locks(t->htable_bits); e++) + elements += t->hregion[e].elements; + if (elements >= maxelem && SET_WITH_FORCEADD(set)) forceadd = true; } + rcu_read_unlock_bh();
- t = ipset_dereference_protected(h->table, set); - key = HKEY(value, h->initval, t->htable_bits); - n = __ipset_dereference_protected(hbucket(t, key), 1); + spin_lock_bh(&t->hregion[r].lock); + n = rcu_dereference_bh(hbucket(t, key)); if (!n) { - if (forceadd || set->elements >= h->maxelem) + if (forceadd || elements >= maxelem) goto set_full; old = NULL; n = kzalloc(sizeof(*n) + AHASH_INIT_SIZE * set->dsize, GFP_ATOMIC); - if (!n) - return -ENOMEM; + if (!n) { + ret = -ENOMEM; + goto unlock; + } n->size = AHASH_INIT_SIZE; - set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize); + t->hregion[r].ext_size += + ext_size(AHASH_INIT_SIZE, set->dsize); goto copy_elem; } for (i = 0; i < n->pos; i++) { @@ -737,19 +916,16 @@ mtype_add(struct ip_set *set, void *valu } data = ahash_data(n, i, set->dsize); if (mtype_data_equal(data, d, &multi)) { - if (flag_exist || - (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, set)))) { + if (flag_exist || SET_ELEM_EXPIRED(set, data)) { /* Just the extensions could be overwritten */ j = i; goto overwrite_extensions; } - return -IPSET_ERR_EXIST; + ret = -IPSET_ERR_EXIST; + goto unlock; } /* Reuse first timed out entry */ - if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, set)) && - j == -1) { + if (SET_ELEM_EXPIRED(set, data) && j == -1) { j = i; reuse = true; } @@ -759,16 +935,16 @@ mtype_add(struct ip_set *set, void *valu if (!deleted) { #ifdef IP_SET_HASH_WITH_NETS for (i = 0; i < IPSET_NET_COUNT; i++) - mtype_del_cidr(h, + mtype_del_cidr(set, h, NCIDR_PUT(DCIDR_GET(data->cidr, i)), i); #endif ip_set_ext_destroy(set, data); - set->elements--; + t->hregion[r].elements--; } goto copy_data; } - if (set->elements >= h->maxelem) + if (elements >= maxelem) goto set_full; /* Create a new slot */ if (n->pos >= n->size) { @@ -776,28 +952,32 @@ mtype_add(struct ip_set *set, void *valu if (n->size >= AHASH_MAX(h)) { /* Trigger rehashing */ mtype_data_next(&h->next, d); - return -EAGAIN; + ret = -EAGAIN; + goto resize; } old = n; n = kzalloc(sizeof(*n) + (old->size + AHASH_INIT_SIZE) * set->dsize, GFP_ATOMIC); - if (!n) - return -ENOMEM; + if (!n) { + ret = -ENOMEM; + goto unlock; + } memcpy(n, old, sizeof(struct hbucket) + old->size * set->dsize); n->size = old->size + AHASH_INIT_SIZE; - set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize); + t->hregion[r].ext_size += + ext_size(AHASH_INIT_SIZE, set->dsize); }
copy_elem: j = n->pos++; data = ahash_data(n, j, set->dsize); copy_data: - set->elements++; + t->hregion[r].elements++; #ifdef IP_SET_HASH_WITH_NETS for (i = 0; i < IPSET_NET_COUNT; i++) - mtype_add_cidr(h, NCIDR_PUT(DCIDR_GET(d->cidr, i)), i); + mtype_add_cidr(set, h, NCIDR_PUT(DCIDR_GET(d->cidr, i)), i); #endif memcpy(data, d, sizeof(struct mtype_elem)); overwrite_extensions: @@ -820,13 +1000,41 @@ overwrite_extensions: if (old) kfree_rcu(old, rcu); } + ret = 0; +resize: + spin_unlock_bh(&t->hregion[r].lock); + if (atomic_read(&t->ref) && ext->target) { + /* Resize is in process and kernel side add, save values */ + struct mtype_resize_ad *x; + + x = kzalloc(sizeof(struct mtype_resize_ad), GFP_ATOMIC); + if (!x) + /* Don't bother */ + goto out; + x->ad = IPSET_ADD; + memcpy(&x->d, value, sizeof(struct mtype_elem)); + memcpy(&x->ext, ext, sizeof(struct ip_set_ext)); + memcpy(&x->mext, mext, sizeof(struct ip_set_ext)); + x->flags = flags; + spin_lock_bh(&set->lock); + list_add_tail(&x->list, &h->ad); + spin_unlock_bh(&set->lock); + } + goto out;
- return 0; set_full: if (net_ratelimit()) pr_warn("Set %s is full, maxelem %u reached\n", - set->name, h->maxelem); - return -IPSET_ERR_HASH_FULL; + set->name, maxelem); + ret = -IPSET_ERR_HASH_FULL; +unlock: + spin_unlock_bh(&t->hregion[r].lock); +out: + if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) { + pr_debug("Table destroy after resize by add: %p\n", t); + mtype_ahash_destroy(set, t, false); + } + return ret; }
/* Delete an element from the hash and free up space if possible. @@ -840,13 +1048,23 @@ mtype_del(struct ip_set *set, void *valu const struct mtype_elem *d = value; struct mtype_elem *data; struct hbucket *n; - int i, j, k, ret = -IPSET_ERR_EXIST; + struct mtype_resize_ad *x = NULL; + int i, j, k, r, ret = -IPSET_ERR_EXIST; u32 key, multi = 0; size_t dsize = set->dsize;
- t = ipset_dereference_protected(h->table, set); + /* Userspace add and resize is excluded by the mutex. + * Kernespace add does not trigger resize. + */ + rcu_read_lock_bh(); + t = rcu_dereference_bh(h->table); key = HKEY(value, h->initval, t->htable_bits); - n = __ipset_dereference_protected(hbucket(t, key), 1); + r = ahash_region(key, t->htable_bits); + atomic_inc(&t->uref); + rcu_read_unlock_bh(); + + spin_lock_bh(&t->hregion[r].lock); + n = rcu_dereference_bh(hbucket(t, key)); if (!n) goto out; for (i = 0, k = 0; i < n->pos; i++) { @@ -857,8 +1075,7 @@ mtype_del(struct ip_set *set, void *valu data = ahash_data(n, i, dsize); if (!mtype_data_equal(data, d, &multi)) continue; - if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, set))) + if (SET_ELEM_EXPIRED(set, data)) goto out;
ret = 0; @@ -866,20 +1083,33 @@ mtype_del(struct ip_set *set, void *valu smp_mb__after_atomic(); if (i + 1 == n->pos) n->pos--; - set->elements--; + t->hregion[r].elements--; #ifdef IP_SET_HASH_WITH_NETS for (j = 0; j < IPSET_NET_COUNT; j++) - mtype_del_cidr(h, NCIDR_PUT(DCIDR_GET(d->cidr, j)), - j); + mtype_del_cidr(set, h, + NCIDR_PUT(DCIDR_GET(d->cidr, j)), j); #endif ip_set_ext_destroy(set, data);
+ if (atomic_read(&t->ref) && ext->target) { + /* Resize is in process and kernel side del, + * save values + */ + x = kzalloc(sizeof(struct mtype_resize_ad), + GFP_ATOMIC); + if (x) { + x->ad = IPSET_DEL; + memcpy(&x->d, value, + sizeof(struct mtype_elem)); + x->flags = flags; + } + } for (; i < n->pos; i++) { if (!test_bit(i, n->used)) k++; } if (n->pos == 0 && k == 0) { - set->ext_size -= ext_size(n->size, dsize); + t->hregion[r].ext_size -= ext_size(n->size, dsize); rcu_assign_pointer(hbucket(t, key), NULL); kfree_rcu(n, rcu); } else if (k >= AHASH_INIT_SIZE) { @@ -898,7 +1128,8 @@ mtype_del(struct ip_set *set, void *valu k++; } tmp->pos = k; - set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize); + t->hregion[r].ext_size -= + ext_size(AHASH_INIT_SIZE, dsize); rcu_assign_pointer(hbucket(t, key), tmp); kfree_rcu(n, rcu); } @@ -906,6 +1137,16 @@ mtype_del(struct ip_set *set, void *valu }
out: + spin_unlock_bh(&t->hregion[r].lock); + if (x) { + spin_lock_bh(&set->lock); + list_add(&x->list, &h->ad); + spin_unlock_bh(&set->lock); + } + if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) { + pr_debug("Table destroy after resize by del: %p\n", t); + mtype_ahash_destroy(set, t, false); + } return ret; }
@@ -991,6 +1232,7 @@ mtype_test(struct ip_set *set, void *val int i, ret = 0; u32 key, multi = 0;
+ rcu_read_lock_bh(); t = rcu_dereference_bh(h->table); #ifdef IP_SET_HASH_WITH_NETS /* If we test an IP address and not a network address, @@ -1022,6 +1264,7 @@ mtype_test(struct ip_set *set, void *val goto out; } out: + rcu_read_unlock_bh(); return ret; }
@@ -1033,23 +1276,14 @@ mtype_head(struct ip_set *set, struct sk const struct htable *t; struct nlattr *nested; size_t memsize; + u32 elements = 0; + size_t ext_size = 0; u8 htable_bits;
- /* If any members have expired, set->elements will be wrong - * mytype_expire function will update it with the right count. - * we do not hold set->lock here, so grab it first. - * set->elements can still be incorrect in the case of a huge set, - * because elements might time out during the listing. - */ - if (SET_WITH_TIMEOUT(set)) { - spin_lock_bh(&set->lock); - mtype_expire(set, h); - spin_unlock_bh(&set->lock); - } - rcu_read_lock_bh(); - t = rcu_dereference_bh_nfnl(h->table); - memsize = mtype_ahash_memsize(h, t) + set->ext_size; + t = rcu_dereference_bh(h->table); + mtype_ext_size(set, &elements, &ext_size); + memsize = mtype_ahash_memsize(h, t) + ext_size + set->ext_size; htable_bits = t->htable_bits; rcu_read_unlock_bh();
@@ -1071,7 +1305,7 @@ mtype_head(struct ip_set *set, struct sk #endif if (nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) || nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) || - nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(set->elements))) + nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(elements))) goto nla_put_failure; if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; @@ -1091,15 +1325,15 @@ mtype_uref(struct ip_set *set, struct ne
if (start) { rcu_read_lock_bh(); - t = rcu_dereference_bh_nfnl(h->table); + t = ipset_dereference_bh_nfnl(h->table); atomic_inc(&t->uref); cb->args[IPSET_CB_PRIVATE] = (unsigned long)t; rcu_read_unlock_bh(); } else if (cb->args[IPSET_CB_PRIVATE]) { t = (struct htable *)cb->args[IPSET_CB_PRIVATE]; if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) { - /* Resizing didn't destroy the hash table */ - pr_debug("Table destroy by dump: %p\n", t); + pr_debug("Table destroy after resize " + " by dump: %p\n", t); mtype_ahash_destroy(set, t, false); } cb->args[IPSET_CB_PRIVATE] = 0; @@ -1141,8 +1375,7 @@ mtype_list(const struct ip_set *set, if (!test_bit(i, n->used)) continue; e = ahash_data(n, i, set->dsize); - if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, set))) + if (SET_ELEM_EXPIRED(set, e)) continue; pr_debug("list hash %lu hbucket %p i %u, data %p\n", cb->args[IPSET_CB_ARG0], n, i, e); @@ -1208,6 +1441,7 @@ static const struct ip_set_type_variant .uref = mtype_uref, .resize = mtype_resize, .same_set = mtype_same_set, + .region_lock = true, };
#ifdef IP_SET_EMIT_CREATE @@ -1226,6 +1460,7 @@ IPSET_TOKEN(HTYPE, _create)(struct net * size_t hsize; struct htype *h; struct htable *t; + u32 i;
pr_debug("Create set %s with family %s\n", set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6"); @@ -1294,6 +1529,15 @@ IPSET_TOKEN(HTYPE, _create)(struct net * kfree(h); return -ENOMEM; } + t->hregion = ip_set_alloc(ahash_sizeof_regions(hbits)); + if (!t->hregion) { + kfree(t); + kfree(h); + return -ENOMEM; + } + h->gc.set = set; + for (i = 0; i < ahash_numof_locks(hbits); i++) + spin_lock_init(&t->hregion[i].lock); h->maxelem = maxelem; #ifdef IP_SET_HASH_WITH_NETMASK h->netmask = netmask; @@ -1304,9 +1548,10 @@ IPSET_TOKEN(HTYPE, _create)(struct net * get_random_bytes(&h->initval, sizeof(h->initval));
t->htable_bits = hbits; + t->maxelem = h->maxelem / ahash_numof_locks(hbits); RCU_INIT_POINTER(h->table, t);
- h->set = set; + INIT_LIST_HEAD(&h->ad); set->data = h; #ifndef IP_SET_PROTO_UNDEF if (set->family == NFPROTO_IPV4) { @@ -1329,12 +1574,10 @@ IPSET_TOKEN(HTYPE, _create)(struct net * #ifndef IP_SET_PROTO_UNDEF if (set->family == NFPROTO_IPV4) #endif - IPSET_TOKEN(HTYPE, 4_gc_init)(set, - IPSET_TOKEN(HTYPE, 4_gc)); + IPSET_TOKEN(HTYPE, 4_gc_init)(&h->gc); #ifndef IP_SET_PROTO_UNDEF else - IPSET_TOKEN(HTYPE, 6_gc_init)(set, - IPSET_TOKEN(HTYPE, 6_gc)); + IPSET_TOKEN(HTYPE, 6_gc_init)(&h->gc); #endif } pr_debug("create %s hashsize %u (%u) maxelem %u: %p(%p)\n",
From: Ursula Braun ubraun@linux.ibm.com
commit 67f562e3e147750a02b2a91d21a163fc44a1d13e upstream.
SMC does not work together with FASTOPEN. If sendmsg() is called with flag MSG_FASTOPEN in SMC_INIT state, the SMC-socket switches to fallback mode. To handle the previous ioctl FIOASYNC call correctly in this case, it is necessary to transfer the socket wait queue fasync_list to the internal TCP socket.
Reported-by: syzbot+4b1fe8105f8044a26162@syzkaller.appspotmail.com Fixes: ee9dfbef02d18 ("net/smc: handle sockopts forcing fallback") Signed-off-by: Ursula Braun ubraun@linux.ibm.com Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/smc/af_smc.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -470,6 +470,8 @@ static void smc_switch_to_fallback(struc if (smc->sk.sk_socket && smc->sk.sk_socket->file) { smc->clcsock->file = smc->sk.sk_socket->file; smc->clcsock->file->private_data = smc->clcsock; + smc->clcsock->wq.fasync_list = + smc->sk.sk_socket->wq.fasync_list; } }
From: Eugenio Pérez eperezma@redhat.com
commit 42d84c8490f9f0931786f1623191fcab397c3d64 upstream.
Doing so, we save one call to get data we already have in the struct.
Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro).
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez eperezma@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vhost/net.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1414,10 +1414,6 @@ static int vhost_net_release(struct inod
static struct socket *get_raw_socket(int fd) { - struct { - struct sockaddr_ll sa; - char buf[MAX_ADDR_LEN]; - } uaddr; int r; struct socket *sock = sockfd_lookup(fd, &r);
@@ -1430,11 +1426,7 @@ static struct socket *get_raw_socket(int goto err; }
- r = sock->ops->getname(sock, (struct sockaddr *)&uaddr.sa, 0); - if (r < 0) - goto err; - - if (uaddr.sa.sll_family != AF_PACKET) { + if (sock->sk->sk_family != AF_PACKET) { r = -EPFNOSUPPORT; goto err; }
From: Jozsef Kadlecsik kadlec@netfilter.org
commit 8af1c6fbd9239877998c7f5a591cb2c88d41fb66 upstream.
When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries.
Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/ipset/ip_set_hash_gen.h | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -931,6 +931,8 @@ mtype_add(struct ip_set *set, void *valu } } if (reuse || forceadd) { + if (j == -1) + j = 0; data = ahash_data(n, j, set->dsize); if (!deleted) { #ifdef IP_SET_HASH_WITH_NETS
From: Cong Wang xiyou.wangcong@gmail.com
commit c4a3922d2d20c710f827d3a115ee338e8d0467df upstream.
It is unnecessary to hold hashlimit_mutex for htable_destroy() as it is already removed from the global hashtable and its refcount is already zero.
Also, switch hinfo->use to refcount_t so that we don't have to hold the mutex until it reaches zero in htable_put().
Reported-and-tested-by: syzbot+adf6c6c2be1c3a718121@syzkaller.appspotmail.com Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/xt_hashlimit.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/net/netfilter/xt_hashlimit.c +++ b/net/netfilter/xt_hashlimit.c @@ -36,6 +36,7 @@ #include <linux/netfilter_ipv6/ip6_tables.h> #include <linux/mutex.h> #include <linux/kernel.h> +#include <linux/refcount.h> #include <uapi/linux/netfilter/xt_hashlimit.h>
#define XT_HASHLIMIT_ALL (XT_HASHLIMIT_HASH_DIP | XT_HASHLIMIT_HASH_DPT | \ @@ -114,7 +115,7 @@ struct dsthash_ent {
struct xt_hashlimit_htable { struct hlist_node node; /* global list of all htables */ - int use; + refcount_t use; u_int8_t family; bool rnd_initialized;
@@ -315,7 +316,7 @@ static int htable_create(struct net *net for (i = 0; i < hinfo->cfg.size; i++) INIT_HLIST_HEAD(&hinfo->hash[i]);
- hinfo->use = 1; + refcount_set(&hinfo->use, 1); hinfo->count = 0; hinfo->family = family; hinfo->rnd_initialized = false; @@ -434,7 +435,7 @@ static struct xt_hashlimit_htable *htabl hlist_for_each_entry(hinfo, &hashlimit_net->htables, node) { if (!strcmp(name, hinfo->name) && hinfo->family == family) { - hinfo->use++; + refcount_inc(&hinfo->use); return hinfo; } } @@ -443,12 +444,11 @@ static struct xt_hashlimit_htable *htabl
static void htable_put(struct xt_hashlimit_htable *hinfo) { - mutex_lock(&hashlimit_mutex); - if (--hinfo->use == 0) { + if (refcount_dec_and_mutex_lock(&hinfo->use, &hashlimit_mutex)) { hlist_del(&hinfo->node); + mutex_unlock(&hashlimit_mutex); htable_destroy(hinfo); } - mutex_unlock(&hashlimit_mutex); }
/* The algorithm used is the Simple Token Bucket Filter (TBF)
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit 8d2e77b39b8fecb794e19cd006a12f90b14dd077 upstream.
They are issues: - if 'input_allocate_device()' fails and return NULL, there is no need to free anything and 'input_free_device()' call is a no-op. It can be axed. - 'ret' is known to be 0 at this point, so we must set it to a meaningful value before returning
Fixes: 2562756dde55 ("HID: add Alps I2C HID Touchpad-Stick support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/hid-alps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/hid-alps.c +++ b/drivers/hid/hid-alps.c @@ -730,7 +730,7 @@ static int alps_input_configured(struct if (data->has_sp) { input2 = input_allocate_device(); if (!input2) { - input_free_device(input2); + ret = -ENOMEM; goto exit; }
From: dan.carpenter@oracle.com dan.carpenter@oracle.com
commit 5c02c447eaeda29d3da121a2e17b97ccaf579b51 upstream.
Syzbot reports that "hiddev" is used after it's free in hiddev_disconnect(). The hiddev_disconnect() function sets "hiddev->exist = 0;" so hiddev_release() can free it as soon as we drop the "existancelock" lock. This patch moves the mutex_unlock(&hiddev->existancelock) until after we have finished using it.
Reported-by: syzbot+784ccb935f9900cc7c9e@syzkaller.appspotmail.com Fixes: 7f77897ef2b6 ("HID: hiddev: fix potential use-after-free") Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hid/usbhid/hiddev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hid/usbhid/hiddev.c +++ b/drivers/hid/usbhid/hiddev.c @@ -932,9 +932,9 @@ void hiddev_disconnect(struct hid_device hiddev->exist = 0;
if (hiddev->open) { - mutex_unlock(&hiddev->existancelock); hid_hw_close(hiddev->hid); wake_up_interruptible(&hiddev->wait); + mutex_unlock(&hiddev->existancelock); } else { mutex_unlock(&hiddev->existancelock); kfree(hiddev);
From: Anup Patel anup.patel@wdc.com
commit 6a1ce99dc4bde564e4a072936f9d41f4a439140e upstream.
Historically, we have been enabling all interrupts for each HART in trap_init(). Ideally, we should only enable M-mode interrupts for M-mode kernel and S-mode interrupts for S-mode kernel in trap_init().
Currently, we get suprious S-mode interrupts on Kendryte K210 board running M-mode NO-MMU kernel because we are enabling all interrupts in trap_init(). To fix this, we only enable software and external interrupt in trap_init(). In future, trap_init() will only enable software interrupt and PLIC driver will enable external interrupt using CPU notifiers.
Fixes: a4c3733d32a7 ("riscv: abstract out CSR names for supervisor vs machine mode") Signed-off-by: Anup Patel anup.patel@wdc.com Reviewed-by: Atish Patra atish.patra@wdc.com Tested-by: Palmer Dabbelt palmerdabbelt@google.com [QMEU virt machine with SMP] [Palmer: Move the Fixes up to a newer commit] Reviewed-by: Palmer Dabbelt palmerdabbelt@google.com Signed-off-by: Palmer Dabbelt palmerdabbelt@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/riscv/kernel/traps.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -156,6 +156,6 @@ void __init trap_init(void) csr_write(CSR_SCRATCH, 0); /* Set the exception vector address */ csr_write(CSR_TVEC, &handle_exception); - /* Enable all interrupts */ - csr_write(CSR_IE, -1); + /* Enable interrupts */ + csr_write(CSR_IE, IE_SIE | IE_EIE); }
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit bef8e2dfceed6daeb6ca3e8d33f9c9d43b926580 upstream.
Pointer on the memory allocated by 'alloc_progmem()' is stored in 'v->load_addr'. So this is this memory that should be freed by 'release_progmem()'.
'release_progmem()' is only a call to 'kfree()'.
With the current code, there is both a double free and a memory leak. Fix it by passing the correct pointer to 'release_progmem()'.
Fixes: e01402b115ccc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Paul Burton paulburton@kernel.org Cc: ralf@linux-mips.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/vpe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/kernel/vpe.c +++ b/arch/mips/kernel/vpe.c @@ -134,7 +134,7 @@ void release_vpe(struct vpe *v) { list_del(&v->list); if (v->load_addr) - release_progmem(v); + release_progmem(v->load_addr); kfree(v); }
________________________________________ Von: kernel-janitors-owner@vger.kernel.org kernel-janitors-owner@vger.kernel.org im Auftrag von Greg Kroah-Hartman gregkh@linuxfoundation.org Gesendet: Dienstag, 3. März 2020 18:42 An: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman; stable@vger.kernel.org; Christophe JAILLET; Paul Burton; ralf@linux-mips.org; linux-mips@vger.kernel.org; kernel-janitors@vger.kernel.org Betreff: [PATCH 5.5 110/176] MIPS: VPE: Fix a double free and a memory leak in release_vpe()
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit bef8e2dfceed6daeb6ca3e8d33f9c9d43b926580 upstream.
Pointer on the memory allocated by 'alloc_progmem()' is stored in 'v->load_addr'. So this is this memory that should be freed by 'release_progmem()'.
'release_progmem()' is only a call to 'kfree()'.
With the current code, there is both a double free and a memory leak. Fix it by passing the correct pointer to 'release_progmem()'.
Fixes: e01402b115ccc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Paul Burton paulburton@kernel.org Cc: ralf@linux-mips.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/vpe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/kernel/vpe.c +++ b/arch/mips/kernel/vpe.c @@ -134,7 +134,7 @@ void release_vpe(struct vpe *v) { list_del(&v->list); if (v->load_addr) - release_progmem(v); + release_progmem(v->load_addr); kfree(v); }
since release_progmem() is kfree() it is also possible to drop "if (v->load_addr)"
jm2c
re, wh
Le 04/03/2020 à 22:28, Walter Harms a écrit :
Von: kernel-janitors-owner@vger.kernel.org kernel-janitors-owner@vger.kernel.org im Auftrag von Greg Kroah-Hartman gregkh@linuxfoundation.org Gesendet: Dienstag, 3. März 2020 18:42 An: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman; stable@vger.kernel.org; Christophe JAILLET; Paul Burton; ralf@linux-mips.org; linux-mips@vger.kernel.org; kernel-janitors@vger.kernel.org Betreff: [PATCH 5.5 110/176] MIPS: VPE: Fix a double free and a memory leak in release_vpe()
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit bef8e2dfceed6daeb6ca3e8d33f9c9d43b926580 upstream.
Pointer on the memory allocated by 'alloc_progmem()' is stored in 'v->load_addr'. So this is this memory that should be freed by 'release_progmem()'.
'release_progmem()' is only a call to 'kfree()'.
With the current code, there is both a double free and a memory leak. Fix it by passing the correct pointer to 'release_progmem()'.
Fixes: e01402b115ccc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Paul Burton paulburton@kernel.org Cc: ralf@linux-mips.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: kernel-janitors@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/mips/kernel/vpe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/mips/kernel/vpe.c +++ b/arch/mips/kernel/vpe.c @@ -134,7 +134,7 @@ void release_vpe(struct vpe *v) { list_del(&v->list); if (v->load_addr)
release_progmem(v);
}release_progmem(v->load_addr); kfree(v);
since release_progmem() is kfree() it is also possible to drop "if (v->load_addr)"
jm2c
re, wh
Agreed.
My patch had the following comment after the patch description: --- The 'if (v->load_addr)' looks also redundant, but, well, the code is old and I feel lazy tonight to send another patch for only that. ---
git log shows nearly no update since end of 2015, so I kept my proposal as minimal :)
CJ
From: Oliver Upton oupton@google.com
commit 5ef8acbdd687c9d72582e2c05c0b9756efb37863 upstream.
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution.
However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2.
Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/svm.c | 1 + arch/x86/kvm/vmx/nested.c | 35 ++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/nested.h | 5 +++++ arch/x86/kvm/vmx/vmx.c | 37 ++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 3 +++ arch/x86/kvm/x86.c | 2 ++ 8 files changed, 83 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1092,6 +1092,7 @@ struct kvm_x86_ops { void (*run)(struct kvm_vcpu *vcpu); int (*handle_exit)(struct kvm_vcpu *vcpu); int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu); + void (*update_emulated_instruction)(struct kvm_vcpu *vcpu); void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu); void (*patch_hypercall)(struct kvm_vcpu *vcpu, --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -390,6 +390,7 @@ struct kvm_sync_regs { #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 #define KVM_STATE_NESTED_EVMCS 0x00000004 +#define KVM_STATE_NESTED_MTF_PENDING 0x00000008
#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7311,6 +7311,7 @@ static struct kvm_x86_ops svm_x86_ops __ .run = svm_vcpu_run, .handle_exit = handle_exit, .skip_emulated_instruction = skip_emulated_instruction, + .update_emulated_instruction = NULL, .set_interrupt_shadow = svm_set_interrupt_shadow, .get_interrupt_shadow = svm_get_interrupt_shadow, .patch_hypercall = svm_patch_hypercall, --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3616,8 +3616,15 @@ static int vmx_check_nested_events(struc unsigned long exit_qual; bool block_nested_events = vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu); + bool mtf_pending = vmx->nested.mtf_pending; struct kvm_lapic *apic = vcpu->arch.apic;
+ /* + * Clear the MTF state. If a higher priority VM-exit is delivered first, + * this state is discarded. + */ + vmx->nested.mtf_pending = false; + if (lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &apic->pending_events)) { if (block_nested_events) @@ -3628,8 +3635,28 @@ static int vmx_check_nested_events(struc return 0; }
+ /* + * Process any exceptions that are not debug traps before MTF. + */ + if (vcpu->arch.exception.pending && + !vmx_pending_dbg_trap(vcpu) && + nested_vmx_check_exception(vcpu, &exit_qual)) { + if (block_nested_events) + return -EBUSY; + nested_vmx_inject_exception_vmexit(vcpu, exit_qual); + return 0; + } + + if (mtf_pending) { + if (block_nested_events) + return -EBUSY; + nested_vmx_update_pending_dbg(vcpu); + nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0); + return 0; + } + if (vcpu->arch.exception.pending && - nested_vmx_check_exception(vcpu, &exit_qual)) { + nested_vmx_check_exception(vcpu, &exit_qual)) { if (block_nested_events) return -EBUSY; nested_vmx_inject_exception_vmexit(vcpu, exit_qual); @@ -5742,6 +5769,9 @@ static int vmx_get_nested_state(struct k
if (vmx->nested.nested_run_pending) kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING; + + if (vmx->nested.mtf_pending) + kvm_state.flags |= KVM_STATE_NESTED_MTF_PENDING; } }
@@ -5922,6 +5952,9 @@ static int vmx_set_nested_state(struct k vmx->nested.nested_run_pending = !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
+ vmx->nested.mtf_pending = + !!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING); + ret = -EINVAL; if (nested_cpu_has_shadow_vmcs(vmcs12) && vmcs12->vmcs_link_pointer != -1ull) { --- a/arch/x86/kvm/vmx/nested.h +++ b/arch/x86/kvm/vmx/nested.h @@ -176,6 +176,11 @@ static inline bool nested_cpu_has_virtua return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS; }
+static inline int nested_cpu_has_mtf(struct vmcs12 *vmcs12) +{ + return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG); +} + static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12) { return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT); --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1595,6 +1595,40 @@ static int skip_emulated_instruction(str return 1; }
+ +/* + * Recognizes a pending MTF VM-exit and records the nested state for later + * delivery. + */ +static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) +{ + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (!is_guest_mode(vcpu)) + return; + + /* + * Per the SDM, MTF takes priority over debug-trap exceptions besides + * T-bit traps. As instruction emulation is completed (i.e. at the + * instruction boundary), any #DB exception pending delivery must be a + * debug-trap. Record the pending MTF state to be delivered in + * vmx_check_nested_events(). + */ + if (nested_cpu_has_mtf(vmcs12) && + (!vcpu->arch.exception.pending || + vcpu->arch.exception.nr == DB_VECTOR)) + vmx->nested.mtf_pending = true; + else + vmx->nested.mtf_pending = false; +} + +static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu) +{ + vmx_update_emulated_instruction(vcpu); + return skip_emulated_instruction(vcpu); +} + static void vmx_clear_hlt(struct kvm_vcpu *vcpu) { /* @@ -7886,7 +7920,8 @@ static struct kvm_x86_ops vmx_x86_ops __
.run = vmx_vcpu_run, .handle_exit = vmx_handle_exit, - .skip_emulated_instruction = skip_emulated_instruction, + .skip_emulated_instruction = vmx_skip_emulated_instruction, + .update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vmx_set_interrupt_shadow, .get_interrupt_shadow = vmx_get_interrupt_shadow, .patch_hypercall = vmx_patch_hypercall, --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -150,6 +150,9 @@ struct nested_vmx { /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending;
+ /* Pending MTF VM-exit into L1. */ + bool mtf_pending; + struct loaded_vmcs vmcs02;
/* --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6838,6 +6838,8 @@ restart: kvm_rip_write(vcpu, ctxt->eip); if (r && ctxt->tf) r = kvm_vcpu_do_singlestep(vcpu); + if (kvm_x86_ops->update_emulated_instruction) + kvm_x86_ops->update_emulated_instruction(vcpu); __kvm_set_rflags(vcpu, ctxt->eflags); }
On 03/03/20 18:42, Greg Kroah-Hartman wrote:
From: Oliver Upton oupton@google.com
commit 5ef8acbdd687c9d72582e2c05c0b9756efb37863 upstream.
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution.
However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2.
Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Why is this included in a stable release? It was part of a series of four patches and the prerequisites as far as I can see are not part of 5.5.
I have already said half a dozen times that I don't want any of the autopick stuff for KVM. Is a Fixes tag sufficient to get patches into stable now?
Paolo
arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/svm.c | 1 + arch/x86/kvm/vmx/nested.c | 35 ++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/nested.h | 5 +++++ arch/x86/kvm/vmx/vmx.c | 37 ++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 3 +++ arch/x86/kvm/x86.c | 2 ++ 8 files changed, 83 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1092,6 +1092,7 @@ struct kvm_x86_ops { void (*run)(struct kvm_vcpu *vcpu); int (*handle_exit)(struct kvm_vcpu *vcpu); int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
- void (*update_emulated_instruction)(struct kvm_vcpu *vcpu); void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu); void (*patch_hypercall)(struct kvm_vcpu *vcpu,
--- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -390,6 +390,7 @@ struct kvm_sync_regs { #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 #define KVM_STATE_NESTED_EVMCS 0x00000004 +#define KVM_STATE_NESTED_MTF_PENDING 0x00000008 #define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7311,6 +7311,7 @@ static struct kvm_x86_ops svm_x86_ops __ .run = svm_vcpu_run, .handle_exit = handle_exit, .skip_emulated_instruction = skip_emulated_instruction,
- .update_emulated_instruction = NULL, .set_interrupt_shadow = svm_set_interrupt_shadow, .get_interrupt_shadow = svm_get_interrupt_shadow, .patch_hypercall = svm_patch_hypercall,
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3616,8 +3616,15 @@ static int vmx_check_nested_events(struc unsigned long exit_qual; bool block_nested_events = vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
- bool mtf_pending = vmx->nested.mtf_pending; struct kvm_lapic *apic = vcpu->arch.apic;
- /*
* Clear the MTF state. If a higher priority VM-exit is delivered first,
* this state is discarded.
*/
- vmx->nested.mtf_pending = false;
- if (lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &apic->pending_events)) { if (block_nested_events)
@@ -3628,8 +3635,28 @@ static int vmx_check_nested_events(struc return 0; }
- /*
* Process any exceptions that are not debug traps before MTF.
*/
- if (vcpu->arch.exception.pending &&
!vmx_pending_dbg_trap(vcpu) &&
nested_vmx_check_exception(vcpu, &exit_qual)) {
if (block_nested_events)
return -EBUSY;
nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
return 0;
- }
- if (mtf_pending) {
if (block_nested_events)
return -EBUSY;
nested_vmx_update_pending_dbg(vcpu);
nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
return 0;
- }
- if (vcpu->arch.exception.pending &&
nested_vmx_check_exception(vcpu, &exit_qual)) {
if (block_nested_events) return -EBUSY; nested_vmx_inject_exception_vmexit(vcpu, exit_qual);nested_vmx_check_exception(vcpu, &exit_qual)) {
@@ -5742,6 +5769,9 @@ static int vmx_get_nested_state(struct k if (vmx->nested.nested_run_pending) kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
if (vmx->nested.mtf_pending)
} }kvm_state.flags |= KVM_STATE_NESTED_MTF_PENDING;
@@ -5922,6 +5952,9 @@ static int vmx_set_nested_state(struct k vmx->nested.nested_run_pending = !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
- vmx->nested.mtf_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING);
- ret = -EINVAL; if (nested_cpu_has_shadow_vmcs(vmcs12) && vmcs12->vmcs_link_pointer != -1ull) {
--- a/arch/x86/kvm/vmx/nested.h +++ b/arch/x86/kvm/vmx/nested.h @@ -176,6 +176,11 @@ static inline bool nested_cpu_has_virtua return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS; } +static inline int nested_cpu_has_mtf(struct vmcs12 *vmcs12) +{
- return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
+}
static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12) { return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT); --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1595,6 +1595,40 @@ static int skip_emulated_instruction(str return 1; }
+/*
- Recognizes a pending MTF VM-exit and records the nested state for later
- delivery.
- */
+static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) +{
- struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
- struct vcpu_vmx *vmx = to_vmx(vcpu);
- if (!is_guest_mode(vcpu))
return;
- /*
* Per the SDM, MTF takes priority over debug-trap exceptions besides
* T-bit traps. As instruction emulation is completed (i.e. at the
* instruction boundary), any #DB exception pending delivery must be a
* debug-trap. Record the pending MTF state to be delivered in
* vmx_check_nested_events().
*/
- if (nested_cpu_has_mtf(vmcs12) &&
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.nr == DB_VECTOR))
vmx->nested.mtf_pending = true;
- else
vmx->nested.mtf_pending = false;
+}
+static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu) +{
- vmx_update_emulated_instruction(vcpu);
- return skip_emulated_instruction(vcpu);
+}
static void vmx_clear_hlt(struct kvm_vcpu *vcpu) { /* @@ -7886,7 +7920,8 @@ static struct kvm_x86_ops vmx_x86_ops __ .run = vmx_vcpu_run, .handle_exit = vmx_handle_exit,
- .skip_emulated_instruction = skip_emulated_instruction,
- .skip_emulated_instruction = vmx_skip_emulated_instruction,
- .update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vmx_set_interrupt_shadow, .get_interrupt_shadow = vmx_get_interrupt_shadow, .patch_hypercall = vmx_patch_hypercall,
--- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -150,6 +150,9 @@ struct nested_vmx { /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending;
- /* Pending MTF VM-exit into L1. */
- bool mtf_pending;
- struct loaded_vmcs vmcs02;
/* --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6838,6 +6838,8 @@ restart: kvm_rip_write(vcpu, ctxt->eip); if (r && ctxt->tf) r = kvm_vcpu_do_singlestep(vcpu);
if (kvm_x86_ops->update_emulated_instruction)
}kvm_x86_ops->update_emulated_instruction(vcpu); __kvm_set_rflags(vcpu, ctxt->eflags);
On Tue, Mar 3, 2020 at 11:23 PM Paolo Bonzini pbonzini@redhat.com wrote:
On 03/03/20 18:42, Greg Kroah-Hartman wrote:
From: Oliver Upton oupton@google.com
commit 5ef8acbdd687c9d72582e2c05c0b9756efb37863 upstream.
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution.
However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2.
Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Why is this included in a stable release? It was part of a series of four patches and the prerequisites as far as I can see are not part of 5.5.
It looks like these commits were already picked from upstream:
684c0422da71 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit") 307f1cfa2696 ("KVM: x86: Mask off reserved bit from #DB exception payload")
Which were bug fixes in their own right and were sensible for backporting (though I didn't cc stable if I'm not mistaken).
but not:
a06230b62b89 ("KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS")
which this patch absolutely depends on.
Otherwise, I'll defer discussions regarding the suitability of this patch for stable to Paolo.
-- Thanks, Oliver
I have already said half a dozen times that I don't want any of the autopick stuff for KVM. Is a Fixes tag sufficient to get patches into stable now?
Paolo
arch/x86/include/asm/kvm_host.h | 1 + arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/svm.c | 1 + arch/x86/kvm/vmx/nested.c | 35 ++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/nested.h | 5 +++++ arch/x86/kvm/vmx/vmx.c | 37 ++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 3 +++ arch/x86/kvm/x86.c | 2 ++ 8 files changed, 83 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1092,6 +1092,7 @@ struct kvm_x86_ops { void (*run)(struct kvm_vcpu *vcpu); int (*handle_exit)(struct kvm_vcpu *vcpu); int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
void (*update_emulated_instruction)(struct kvm_vcpu *vcpu); void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask); u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu); void (*patch_hypercall)(struct kvm_vcpu *vcpu,
--- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -390,6 +390,7 @@ struct kvm_sync_regs { #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 #define KVM_STATE_NESTED_EVMCS 0x00000004 +#define KVM_STATE_NESTED_MTF_PENDING 0x00000008
#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -7311,6 +7311,7 @@ static struct kvm_x86_ops svm_x86_ops __ .run = svm_vcpu_run, .handle_exit = handle_exit, .skip_emulated_instruction = skip_emulated_instruction,
.update_emulated_instruction = NULL, .set_interrupt_shadow = svm_set_interrupt_shadow, .get_interrupt_shadow = svm_get_interrupt_shadow, .patch_hypercall = svm_patch_hypercall,
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3616,8 +3616,15 @@ static int vmx_check_nested_events(struc unsigned long exit_qual; bool block_nested_events = vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
bool mtf_pending = vmx->nested.mtf_pending; struct kvm_lapic *apic = vcpu->arch.apic;
/*
* Clear the MTF state. If a higher priority VM-exit is delivered first,
* this state is discarded.
*/
vmx->nested.mtf_pending = false;
if (lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &apic->pending_events)) { if (block_nested_events)
@@ -3628,8 +3635,28 @@ static int vmx_check_nested_events(struc return 0; }
/*
* Process any exceptions that are not debug traps before MTF.
*/
if (vcpu->arch.exception.pending &&
!vmx_pending_dbg_trap(vcpu) &&
nested_vmx_check_exception(vcpu, &exit_qual)) {
if (block_nested_events)
return -EBUSY;
nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
return 0;
}
if (mtf_pending) {
if (block_nested_events)
return -EBUSY;
nested_vmx_update_pending_dbg(vcpu);
nested_vmx_vmexit(vcpu, EXIT_REASON_MONITOR_TRAP_FLAG, 0, 0);
return 0;
}
if (vcpu->arch.exception.pending &&
nested_vmx_check_exception(vcpu, &exit_qual)) {
nested_vmx_check_exception(vcpu, &exit_qual)) { if (block_nested_events) return -EBUSY; nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
@@ -5742,6 +5769,9 @@ static int vmx_get_nested_state(struct k
if (vmx->nested.nested_run_pending) kvm_state.flags |= KVM_STATE_NESTED_RUN_PENDING;
if (vmx->nested.mtf_pending)
kvm_state.flags |= KVM_STATE_NESTED_MTF_PENDING; } }
@@ -5922,6 +5952,9 @@ static int vmx_set_nested_state(struct k vmx->nested.nested_run_pending = !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
vmx->nested.mtf_pending =
!!(kvm_state->flags & KVM_STATE_NESTED_MTF_PENDING);
ret = -EINVAL; if (nested_cpu_has_shadow_vmcs(vmcs12) && vmcs12->vmcs_link_pointer != -1ull) {
--- a/arch/x86/kvm/vmx/nested.h +++ b/arch/x86/kvm/vmx/nested.h @@ -176,6 +176,11 @@ static inline bool nested_cpu_has_virtua return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS; }
+static inline int nested_cpu_has_mtf(struct vmcs12 *vmcs12) +{
return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_TRAP_FLAG);
+}
static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12) { return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT); --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1595,6 +1595,40 @@ static int skip_emulated_instruction(str return 1; }
+/*
- Recognizes a pending MTF VM-exit and records the nested state for later
- delivery.
- */
+static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu) +{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
if (!is_guest_mode(vcpu))
return;
/*
* Per the SDM, MTF takes priority over debug-trap exceptions besides
* T-bit traps. As instruction emulation is completed (i.e. at the
* instruction boundary), any #DB exception pending delivery must be a
* debug-trap. Record the pending MTF state to be delivered in
* vmx_check_nested_events().
*/
if (nested_cpu_has_mtf(vmcs12) &&
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.nr == DB_VECTOR))
vmx->nested.mtf_pending = true;
else
vmx->nested.mtf_pending = false;
+}
+static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu) +{
vmx_update_emulated_instruction(vcpu);
return skip_emulated_instruction(vcpu);
+}
static void vmx_clear_hlt(struct kvm_vcpu *vcpu) { /* @@ -7886,7 +7920,8 @@ static struct kvm_x86_ops vmx_x86_ops __
.run = vmx_vcpu_run, .handle_exit = vmx_handle_exit,
.skip_emulated_instruction = skip_emulated_instruction,
.skip_emulated_instruction = vmx_skip_emulated_instruction,
.update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vmx_set_interrupt_shadow, .get_interrupt_shadow = vmx_get_interrupt_shadow, .patch_hypercall = vmx_patch_hypercall,
--- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -150,6 +150,9 @@ struct nested_vmx { /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending;
/* Pending MTF VM-exit into L1. */
bool mtf_pending;
struct loaded_vmcs vmcs02; /*
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6838,6 +6838,8 @@ restart: kvm_rip_write(vcpu, ctxt->eip); if (r && ctxt->tf) r = kvm_vcpu_do_singlestep(vcpu);
if (kvm_x86_ops->update_emulated_instruction)
kvm_x86_ops->update_emulated_instruction(vcpu); __kvm_set_rflags(vcpu, ctxt->eflags); }
On Tue, Mar 03, 2020 at 11:39:35PM -0800, Oliver Upton wrote:
On Tue, Mar 3, 2020 at 11:23 PM Paolo Bonzini pbonzini@redhat.com wrote:
On 03/03/20 18:42, Greg Kroah-Hartman wrote:
From: Oliver Upton oupton@google.com
commit 5ef8acbdd687c9d72582e2c05c0b9756efb37863 upstream.
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution.
However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2.
Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Why is this included in a stable release? It was part of a series of four patches and the prerequisites as far as I can see are not part of 5.5.
It looks like these commits were already picked from upstream:
684c0422da71 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit") 307f1cfa2696 ("KVM: x86: Mask off reserved bit from #DB exception payload")
Which were bug fixes in their own right and were sensible for backporting (though I didn't cc stable if I'm not mistaken).
but not:
a06230b62b89 ("KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTS")
which this patch absolutely depends on.
Otherwise, I'll defer discussions regarding the suitability of this patch for stable to Paolo.
I picked this patch up solely because of the Fixes: tag showed that this ommit resolved something from a previous commit. The interdependancies were not obvious, especially as it all seemed to build just fine here.
I have already said half a dozen times that I don't want any of the autopick stuff for KVM. Is a Fixes tag sufficient to get patches into stable now?
Yes, it can happen that a Fixes tag does cause a patch to get into stable because I look out for that. I do that because a number of maintainers/developers only put that tag in there, and also to catch patches for when we backport stuff and then need to take a fix for that backport (not the case here though).
I'll be glad to just put KVM into the "never apply any patches to stable unless you explicitly mark it as such", but the sad fact is that many recent KVM fixes for reported CVEs never had any "Cc: stable@vger" markings. They only had "Fixes:" tags and so I have had to dig them out of the tree and backport them myself in order to resolve those very public issues.
So can I ask that you always properly tag things for stable? If so, I will be glad to ignore Fixes: tags for KVM patches in the future.
I'll go drop this patch as well. Note, there are other KVM patches in this release cycle also, can someone verify that I did not overreach for them as well?
thanks,
greg k-h
On 04/03/20 09:10, Greg Kroah-Hartman wrote:
I'll be glad to just put KVM into the "never apply any patches to stable unless you explicitly mark it as such", but the sad fact is that many recent KVM fixes for reported CVEs never had any "Cc: stable@vger" markings.
Hmm, I did miss it in 433f4ba1904100da65a311033f17a9bf586b287e and acff78477b9b4f26ecdf65733a4ed77fe837e9dc, but that's going back to August 2018, so I can do better but it's not too shabby a record. :)
They only had "Fixes:" tags and so I have had to dig them out of the tree and backport them myself in order to resolve those very public issues.
So can I ask that you always properly tag things for stable? If so, I will be glad to ignore Fixes: tags for KVM patches in the future.
I'll go drop this patch as well. Note, there are other KVM patches in this release cycle also, can someone verify that I did not overreach for them as well?
I checked them and they are fine.
Paolo
On Wed, Mar 04, 2020 at 09:19:09AM +0100, Paolo Bonzini wrote:
On 04/03/20 09:10, Greg Kroah-Hartman wrote:
I'll be glad to just put KVM into the "never apply any patches to stable unless you explicitly mark it as such", but the sad fact is that many recent KVM fixes for reported CVEs never had any "Cc: stable@vger" markings.
Hmm, I did miss it in 433f4ba1904100da65a311033f17a9bf586b287e and acff78477b9b4f26ecdf65733a4ed77fe837e9dc, but that's going back to August 2018, so I can do better but it's not too shabby a record. :)
35a571346a94 ("KVM: nVMX: Check IO instruction VM-exit conditions") e71237d3ff1a ("KVM: nVMX: Refactor IO bitmap checks into helper function")
Were both from a few weeks ago and needed to resolve CVE-2020-2732 :(
They only had "Fixes:" tags and so I have had to dig them out of the tree and backport them myself in order to resolve those very public issues.
So can I ask that you always properly tag things for stable? If so, I will be glad to ignore Fixes: tags for KVM patches in the future.
I'll go drop this patch as well. Note, there are other KVM patches in this release cycle also, can someone verify that I did not overreach for them as well?
I checked them and they are fine.
Thank you for that.
greg k-h
On 04/03/20 09:26, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 09:19:09AM +0100, Paolo Bonzini wrote:
On 04/03/20 09:10, Greg Kroah-Hartman wrote:
I'll be glad to just put KVM into the "never apply any patches to stable unless you explicitly mark it as such", but the sad fact is that many recent KVM fixes for reported CVEs never had any "Cc: stable@vger" markings.
Hmm, I did miss it in 433f4ba1904100da65a311033f17a9bf586b287e and acff78477b9b4f26ecdf65733a4ed77fe837e9dc, but that's going back to August 2018, so I can do better but it's not too shabby a record. :)
35a571346a94 ("KVM: nVMX: Check IO instruction VM-exit conditions") e71237d3ff1a ("KVM: nVMX: Refactor IO bitmap checks into helper function")
Were both from a few weeks ago and needed to resolve CVE-2020-2732 :(
No, they weren't, only the patch that was CCed stable was needed to resolve the CVE.
Remember that at this point a lot of bugfixes or vulnerabilities in KVM exploit corner cases of the architecture and don't show up with the usual guests (Linux, Windows, BSDs). Since we didn't have full information on the impact on guests that people do run, we started with the bare minimum (the two patches above) but only for 5.6. The idea was to collect follow-up patches for 2-4 weeks, decide which subset was stable-worthy, and only then post them as stable backport subsets.
Paolo
On Wed, Mar 04, 2020 at 09:43:18AM +0100, Paolo Bonzini wrote:
On 04/03/20 09:26, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 09:19:09AM +0100, Paolo Bonzini wrote:
On 04/03/20 09:10, Greg Kroah-Hartman wrote:
I'll be glad to just put KVM into the "never apply any patches to stable unless you explicitly mark it as such", but the sad fact is that many recent KVM fixes for reported CVEs never had any "Cc: stable@vger" markings.
Hmm, I did miss it in 433f4ba1904100da65a311033f17a9bf586b287e and acff78477b9b4f26ecdf65733a4ed77fe837e9dc, but that's going back to August 2018, so I can do better but it's not too shabby a record. :)
35a571346a94 ("KVM: nVMX: Check IO instruction VM-exit conditions") e71237d3ff1a ("KVM: nVMX: Refactor IO bitmap checks into helper function")
Were both from a few weeks ago and needed to resolve CVE-2020-2732 :(
No, they weren't, only the patch that was CCed stable was needed to resolve the CVE.
Ah, that's not what was posted to oss-security :(
Remember that at this point a lot of bugfixes or vulnerabilities in KVM exploit corner cases of the architecture and don't show up with the usual guests (Linux, Windows, BSDs). Since we didn't have full information on the impact on guests that people do run, we started with the bare minimum (the two patches above) but only for 5.6. The idea was to collect follow-up patches for 2-4 weeks, decide which subset was stable-worthy, and only then post them as stable backport subsets.
Ok, that's fine, but it would be good if someone told me about this so that I knew what was going on when people asked me about this type of thing :)
thanks,
greg k-h
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 54498e8070e19e74498a72c7331348143e7e1f8c upstream.
Factor out 100 from the equation and do 32-bit arithmetic (3 * clk_mhz / 10) instead of 64-bit.
Notice that clk_mhz is MHz, so the multiplication will never wrap 32 bits and there is no need for div_u64().
Addresses-Coverity: 1458369 ("Unintentional integer overflow") Fixes: 0560ad576268 ("i2c: altera: Add Altera I2C Controller driver") Suggested-by: David Laight David.Laight@ACULAB.COM Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Reviewed-by: Thor Thayer thor.thayer@linux.intel.com Signed-off-by: Wolfram Sang wsa@the-dreams.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-altera.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-altera.c +++ b/drivers/i2c/busses/i2c-altera.c @@ -171,7 +171,7 @@ static void altr_i2c_init(struct altr_i2 /* SCL Low Time */ writel(t_low, idev->base + ALTR_I2C_SCL_LOW); /* SDA Hold Time, 300ns */ - writel(div_u64(300 * clk_mhz, 1000), idev->base + ALTR_I2C_SDA_HOLD); + writel(3 * clk_mhz / 10, idev->base + ALTR_I2C_SDA_HOLD);
/* Mask all master interrupt bits */ altr_i2c_int_enable(idev, ALTR_I2C_ALL_IRQ, false);
From: Wolfram Sang wsa@the-dreams.de
commit 9e661cedcc0a072d91a32cb88e0515ea26e35711 upstream.
The printout for txabrt is way too talkative and is highly annoying with scanning programs like 'i2cdetect'. Reduce it to the minimum, the rest can be gained by I2C core debugging and datasheet information. Also, make it a debug printout, it won't help the regular user.
Fixes: ba92222ed63a ("i2c: jz4780: Add i2c bus controller driver for Ingenic JZ4780") Reported-by: H. Nikolaus Schaller hns@goldelico.com Tested-by: H. Nikolaus Schaller hns@goldelico.com Signed-off-by: Wolfram Sang wsa@the-dreams.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/busses/i2c-jz4780.c | 36 ++---------------------------------- 1 file changed, 2 insertions(+), 34 deletions(-)
--- a/drivers/i2c/busses/i2c-jz4780.c +++ b/drivers/i2c/busses/i2c-jz4780.c @@ -73,25 +73,6 @@ #define JZ4780_I2C_STA_TFNF BIT(1) #define JZ4780_I2C_STA_ACT BIT(0)
-static const char * const jz4780_i2c_abrt_src[] = { - "ABRT_7B_ADDR_NOACK", - "ABRT_10ADDR1_NOACK", - "ABRT_10ADDR2_NOACK", - "ABRT_XDATA_NOACK", - "ABRT_GCALL_NOACK", - "ABRT_GCALL_READ", - "ABRT_HS_ACKD", - "SBYTE_ACKDET", - "ABRT_HS_NORSTRT", - "SBYTE_NORSTRT", - "ABRT_10B_RD_NORSTRT", - "ABRT_MASTER_DIS", - "ARB_LOST", - "SLVFLUSH_TXFIFO", - "SLV_ARBLOST", - "SLVRD_INTX", -}; - #define JZ4780_I2C_INTST_IGC BIT(11) #define JZ4780_I2C_INTST_ISTT BIT(10) #define JZ4780_I2C_INTST_ISTP BIT(9) @@ -529,21 +510,8 @@ done:
static void jz4780_i2c_txabrt(struct jz4780_i2c *i2c, int src) { - int i; - - dev_err(&i2c->adap.dev, "txabrt: 0x%08x\n", src); - dev_err(&i2c->adap.dev, "device addr=%x\n", - jz4780_i2c_readw(i2c, JZ4780_I2C_TAR)); - dev_err(&i2c->adap.dev, "send cmd count:%d %d\n", - i2c->cmd, i2c->cmd_buf[i2c->cmd]); - dev_err(&i2c->adap.dev, "receive data count:%d %d\n", - i2c->cmd, i2c->data_buf[i2c->cmd]); - - for (i = 0; i < 16; i++) { - if (src & BIT(i)) - dev_dbg(&i2c->adap.dev, "I2C TXABRT[%d]=%s\n", - i, jz4780_i2c_abrt_src[i]); - } + dev_dbg(&i2c->adap.dev, "txabrt: 0x%08x, cmd: %d, send: %d, recv: %d\n", + src, i2c->cmd, i2c->cmd_buf[i2c->cmd], i2c->data_buf[i2c->cmd]); }
static inline int jz4780_i2c_xfer_read(struct jz4780_i2c *i2c,
From: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz
commit 97e914b7de3c943011779b979b8093fdc0d85722 upstream.
The Cavium Octeon CPU uses a special sync instruction for implementing wmb, and due to a CPU bug, the instruction must appear twice. A macro had been defined to hide this:
#define __SYNC_rpt(type) (1 + (type == __SYNC_wmb))
which was intended to evaluate to 2 for __SYNC_wmb, and 1 for any other type of sync. However, this expression is evaluated by the assembler, and not the compiler, and the result of '==' in the assembler is 0 or -1, not 0 or 1 as it is in C. The net result was wmb() producing no code at all. The simple fix in this patch is to change the '+' to '-'.
Fixes: bf92927251b3 ("MIPS: barrier: Add __SYNC() infrastructure") Signed-off-by: Mark Tomlinson mark.tomlinson@alliedtelesis.co.nz Tested-by: Chris Packham chris.packham@alliedtelesis.co.nz Signed-off-by: Paul Burton paulburton@kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/include/asm/sync.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/mips/include/asm/sync.h +++ b/arch/mips/include/asm/sync.h @@ -155,9 +155,11 @@ * effective barrier as noted by commit 6b07d38aaa52 ("MIPS: Octeon: Use * optimized memory barrier primitives."). Here we specify that the affected * sync instructions should be emitted twice. + * Note that this expression is evaluated by the assembler (not the compiler), + * and that the assembler evaluates '==' as 0 or -1, not 0 or 1. */ #ifdef CONFIG_CPU_CAVIUM_OCTEON -# define __SYNC_rpt(type) (1 + (type == __SYNC_wmb)) +# define __SYNC_rpt(type) (1 - (type == __SYNC_wmb)) #else # define __SYNC_rpt(type) 1 #endif
From: Tina Zhang tina.zhang@intel.com
commit b549c252b1292aea959cd9b83537fcb9384a6112 upstream.
Deleting dmabuf item's list head after releasing its container can lead to KASAN-reported issue:
BUG: KASAN: use-after-free in __list_del_entry_valid+0x15/0xf0 Read of size 8 at addr ffff88818a4598a8 by task kworker/u8:3/13119
So fix this issue by puting deleting dmabuf_objs ahead of releasing its container.
Fixes: dfb6ae4e14bd6 ("drm/i915/gvt: Handle orphan dmabuf_objs") Signed-off-by: Tina Zhang tina.zhang@intel.com Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20200225053527.8336-2-tina.zhan... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gvt/dmabuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/gvt/dmabuf.c +++ b/drivers/gpu/drm/i915/gvt/dmabuf.c @@ -151,12 +151,12 @@ static void dmabuf_gem_object_free(struc dmabuf_obj = container_of(pos, struct intel_vgpu_dmabuf_obj, list); if (dmabuf_obj == obj) { + list_del(pos); intel_gvt_hypervisor_put_vfio_device(vgpu); idr_remove(&vgpu->object_idr, dmabuf_obj->dmabuf_id); kfree(dmabuf_obj->info); kfree(dmabuf_obj); - list_del(pos); break; } }
From: Chris Wilson chris@chris-wilson.co.uk
commit 238734262142075056653b4de091458e0ca858f2 upstream.
We mark the vma as active while binding it in order to protect outselves from being shrunk under mempressure. This only works if we are strict in not attempting to shrink active objects.
<6> [472.618968] Workqueue: events_unbound fence_work [i915] <4> [472.618970] Call Trace: <4> [472.618974] ? __schedule+0x2e5/0x810 <4> [472.618978] schedule+0x37/0xe0 <4> [472.618982] schedule_preempt_disabled+0xf/0x20 <4> [472.618984] __mutex_lock+0x281/0x9c0 <4> [472.618987] ? mark_held_locks+0x49/0x70 <4> [472.618989] ? _raw_spin_unlock_irqrestore+0x47/0x60 <4> [472.619038] ? i915_vma_unbind+0xae/0x110 [i915] <4> [472.619084] ? i915_vma_unbind+0xae/0x110 [i915] <4> [472.619122] i915_vma_unbind+0xae/0x110 [i915] <4> [472.619165] i915_gem_object_unbind+0x1dc/0x400 [i915] <4> [472.619208] i915_gem_shrink+0x328/0x660 [i915] <4> [472.619250] ? i915_gem_shrink_all+0x38/0x60 [i915] <4> [472.619282] i915_gem_shrink_all+0x38/0x60 [i915] <4> [472.619325] vm_alloc_page.constprop.25+0x1aa/0x240 [i915] <4> [472.619330] ? rcu_read_lock_sched_held+0x4d/0x80 <4> [472.619363] ? __alloc_pd+0xb/0x30 [i915] <4> [472.619366] ? module_assert_mutex_or_preempt+0xf/0x30 <4> [472.619368] ? __module_address+0x23/0xe0 <4> [472.619371] ? is_module_address+0x26/0x40 <4> [472.619374] ? static_obj+0x34/0x50 <4> [472.619376] ? lockdep_init_map+0x4d/0x1e0 <4> [472.619407] setup_page_dma+0xd/0x90 [i915] <4> [472.619437] alloc_pd+0x29/0x50 [i915] <4> [472.619470] __gen8_ppgtt_alloc+0x443/0x6b0 [i915] <4> [472.619503] gen8_ppgtt_alloc+0xd7/0x300 [i915] <4> [472.619535] ppgtt_bind_vma+0x2a/0xe0 [i915] <4> [472.619577] __vma_bind+0x26/0x40 [i915] <4> [472.619611] fence_work+0x1c/0x90 [i915] <4> [472.619617] process_one_work+0x26a/0x620
Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200221221818.2861432-1-chris... (cherry picked from commit 6f24e41022f28061368776ea1514db0a6e67a9b1) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -257,8 +257,7 @@ unsigned long i915_gem_shrink_all(struct with_intel_runtime_pm(&i915->runtime_pm, wakeref) { freed = i915_gem_shrink(i915, -1UL, NULL, I915_SHRINK_BOUND | - I915_SHRINK_UNBOUND | - I915_SHRINK_ACTIVE); + I915_SHRINK_UNBOUND); }
return freed; @@ -337,7 +336,6 @@ i915_gem_shrinker_oom(struct notifier_bl freed_pages = 0; with_intel_runtime_pm(&i915->runtime_pm, wakeref) freed_pages += i915_gem_shrink(i915, -1UL, NULL, - I915_SHRINK_ACTIVE | I915_SHRINK_BOUND | I915_SHRINK_UNBOUND | I915_SHRINK_WRITEBACK);
From: Tina Zhang tina.zhang@intel.com
commit 3eb55e6f753a379e293395de8d5f3be28351a7f8 upstream.
ALL_ENGINES reset doesn't clobber display with the current gvt-g supported platforms. Thus ALL_ENGINES reset shouldn't reset the display engine registers emulated by gvt-g.
This fixes guest warning like
[ 14.622026] [drm] Initialized i915 1.6.0 20200114 for 0000:00:03.0 on minor 0 [ 14.967917] fbcon: i915drmfb (fb0) is primary device [ 25.100188] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] E RROR [CRTC:51:pipe A] flip_done timed out [ 25.100860] -----------[ cut here ]----------- [ 25.100861] pll on state mismatch (expected 0, found 1) [ 25.101024] WARNING: CPU: 1 PID: 30 at drivers/gpu/drm/i915/display/intel_dis play.c:14382 verify_single_dpll_state.isra.115+0x28f/0x320 [i915] [ 25.101025] Modules linked in: intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 aesni_intel cr ypto_simd cryptd glue_helper cec rc_core video drm_kms_helper joydev drm input_l eds i2c_algo_bit serio_raw fb_sys_fops syscopyarea sysfillrect sysimgblt mac_hid qemu_fw_cfg sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 e1000 psmouse i2c_piix4 pata_acpi floppy [ 25.101052] CPU: 1 PID: 30 Comm: kworker/u4:1 Not tainted 5.5.0+ #1 [ 25.101053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1 .12.1-0-ga5cab58 04/01/2014 [ 25.101055] Workqueue: events_unbound async_run_entry_fn [ 25.101092] RIP: 0010:verify_single_dpll_state.isra.115+0x28f/0x320 [i915] [ 25.101093] Code: e0 d9 ff e9 a3 fe ff ff 80 3d e9 c2 11 00 00 44 89 f6 48 c7 c7 c0 9d 88 c0 75 3b e8 eb df d9 ff e9 c7 fe ff ff e8 d1 e0 ae c4 <0f> 0b e9 7a fe ff ff 80 3d c0 c2 11 00 00 8d 71 41 89 c2 48 c7 c7 [ 25.101093] RSP: 0018:ffffb1de80107878 EFLAGS: 00010286 [ 25.101094] RAX: 0000000000000000 RBX: ffffb1de80107884 RCX: 0000000000000007 [ 25.101095] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff94fdfdd19740 [ 25.101095] RBP: ffffb1de80107938 R08: 0000000d6bfdc7b4 R09: 000000000000002b [ 25.101096] R10: ffff94fdf82dc000 R11: 0000000000000225 R12: 00000000000001f8 [ 25.101096] R13: ffff94fdb3ca6a90 R14: ffff94fdb3ca0000 R15: 0000000000000000 [ 25.101097] FS: 0000000000000000(0000) GS:ffff94fdfdd00000(0000) knlGS:00000 00000000000 [ 25.101098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 25.101098] CR2: 00007fbc3e2be9c8 CR3: 000000003339a003 CR4: 0000000000360ee0 [ 25.101101] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 25.101101] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 25.101102] Call Trace: [ 25.101139] intel_atomic_commit_tail+0xde4/0x1520 [i915] [ 25.101141] ? flush_workqueue_prep_pwqs+0xfa/0x130 [ 25.101142] ? flush_workqueue+0x198/0x3c0 [ 25.101174] intel_atomic_commit+0x2ad/0x320 [i915] [ 25.101209] drm_atomic_commit+0x4a/0x50 [drm] [ 25.101220] drm_client_modeset_commit_atomic+0x1c4/0x200 [drm] [ 25.101231] drm_client_modeset_commit_force+0x47/0x170 [drm] [ 25.101250] drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_hel per] [ 25.101255] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper] [ 25.101287] intel_fbdev_set_par+0x1a/0x40 [i915] [ 25.101289] ? con_is_visible+0x2e/0x60 [ 25.101290] fbcon_init+0x378/0x600 [ 25.101292] visual_init+0xd5/0x130 [ 25.101296] do_bind_con_driver+0x217/0x430 [ 25.101297] do_take_over_console+0x7d/0x1b0 [ 25.101298] do_fbcon_takeover+0x5c/0xb0 [ 25.101299] fbcon_fb_registered+0x199/0x1a0 [ 25.101301] register_framebuffer+0x22c/0x330 [ 25.101306] __drm_fb_helper_initial_config_and_unlock+0x31a/0x520 [drm_kms_h elper] [ 25.101311] drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper] [ 25.101341] intel_fbdev_initial_config+0x18/0x30 [i915] [ 25.101342] async_run_entry_fn+0x3c/0x150 [ 25.101343] process_one_work+0x1fd/0x3f0 [ 25.101344] worker_thread+0x34/0x410 [ 25.101346] kthread+0x121/0x140 [ 25.101346] ? process_one_work+0x3f0/0x3f0 [ 25.101347] ? kthread_park+0x90/0x90 [ 25.101350] ret_from_fork+0x35/0x40 [ 25.101351] --[ end trace b5b47d44cd998ba1 ]--
Fixes: 6294b61ba769 ("drm/i915/gvt: add missing display part reset for vGPU reset") Signed-off-by: Tina Zhang tina.zhang@intel.com Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20200221023234.28635-1-tina.zha... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gvt/vgpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/gvt/vgpu.c +++ b/drivers/gpu/drm/i915/gvt/vgpu.c @@ -560,9 +560,9 @@ void intel_gvt_reset_vgpu_locked(struct
intel_vgpu_reset_mmio(vgpu, dmlr); populate_pvinfo_page(vgpu); - intel_vgpu_reset_display(vgpu);
if (dmlr) { + intel_vgpu_reset_display(vgpu); intel_vgpu_reset_cfg_space(vgpu); /* only reset the failsafe mode when dmlr reset */ vgpu->failsafe = false;
From: Johannes Berg johannes.berg@intel.com
commit 9951ebfcdf2b97dbb28a5d930458424341e61aa2 upstream.
If nl80211_parse_he_obss_pd() fails, we leak the previously allocated ACL memory. Free it in this case.
Fixes: 796e90f42b7e ("cfg80211: add support for parsing OBBS_PD attributes") Signed-off-by: Johannes Berg johannes.berg@intel.com Link: https://lore.kernel.org/r/20200221104142.835aba4cdd14.I1923b55ba9989c57e1397... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/wireless/nl80211.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4800,8 +4800,7 @@ static int nl80211_start_ap(struct sk_bu err = nl80211_parse_he_obss_pd( info->attrs[NL80211_ATTR_HE_OBSS_PD], ¶ms.he_obss_pd); - if (err) - return err; + goto out; }
nl80211_calculate_ap_params(¶ms); @@ -4823,6 +4822,7 @@ static int nl80211_start_ap(struct sk_bu } wdev_unlock(wdev);
+out: kfree(params.acl);
return err;
From: Andrei Otcheretianski andrei.otcheretianski@intel.com
commit 0daa63ed4c6c4302790ce67b7a90c0997ceb7514 upstream.
The below-mentioned commit changed the code to unlock *inside* the function, but previously the unlock was *outside*. It failed to remove the outer unlock, however, leading to double unlock.
Fix this.
Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()") Signed-off-by: Andrei Otcheretianski andrei.otcheretianski@intel.com Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c240937... [rewrite commit message to better explain what happened] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/mac80211/mlme.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -2959,7 +2959,7 @@ static void ieee80211_rx_mgmt_auth(struc (auth_transaction == 2 && ifmgd->auth_data->expected_transaction == 2)) { if (!ieee80211_mark_sta_auth(sdata, bssid)) - goto out_err; + return; /* ignore frame -- wait for timeout */ } else if (ifmgd->auth_data->algorithm == WLAN_AUTH_SAE && auth_transaction == 2) { sdata_info(sdata, "SAE peer confirmed\n"); @@ -2967,10 +2967,6 @@ static void ieee80211_rx_mgmt_auth(struc }
cfg80211_rx_mlme_mgmt(sdata->dev, (u8 *)mgmt, len); - return; - out_err: - mutex_unlock(&sdata->local->sta_mtx); - /* ignore frame -- wait for timeout */ }
#define case_WLAN(type) \
From: Masahiro Yamada masahiroy@kernel.org
commit 7a04960560640ac5b0b89461f7757322b57d0c7a upstream.
This if_change_rule is not working properly; it cannot detect any command line change.
The reason is because cmd-check in scripts/Kbuild.include compares $(cmd_$@) and $(cmd_$1), but cmd_dtc_dt_yaml does not exist here.
For if_change_rule to work properly, the stem part of cmd_* and rule_* must match. Because this cmd_and_fixdep invokes cmd_dtc, this rule must be named rule_dtc.
Fixes: 4f0e3a57d6eb ("kbuild: Add support for DT binding schema checks") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Acked-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- scripts/Makefile.lib | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -291,13 +291,13 @@ DT_TMP_SCHEMA := $(objtree)/$(DT_BINDING quiet_cmd_dtb_check = CHECK $@ cmd_dtb_check = $(DT_CHECKER) -u $(srctree)/$(DT_BINDING_DIR) -p $(DT_TMP_SCHEMA) $@ ;
-define rule_dtc_dt_yaml +define rule_dtc $(call cmd_and_fixdep,dtc,yaml) $(call cmd,dtb_check) endef
$(obj)/%.dt.yaml: $(src)/%.dts $(DTC) $(DT_TMP_SCHEMA) FORCE - $(call if_changed_rule,dtc_dt_yaml) + $(call if_changed_rule,dtc)
dtc-tmp = $(subst $(comma),_,$(dot-target).dts.tmp)
From: Haiyang Zhang haiyangz@microsoft.com
commit f6f13c125e05603f68f5bf31f045b95e6d493598 upstream.
When netvsc_attach() is called by operations like changing MTU, etc., an extra wakeup may happen while netvsc_attach() calling rndis_filter_device_add() which sends rndis messages when queue is stopped in netvsc_detach(). The completion message will wake up queue 0.
We can reproduce the issue by changing MTU etc., then the wake_queue counter from "ethtool -S" will increase beyond stop_queue counter: stop_queue: 0 wake_queue: 1 The issue causes queue wake up, and counter increment, no other ill effects in current code. So we didn't see any network problem for now.
To fix this, initialize tx_disable to true, and set it to false when the NIC is ready to be attached or registered.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/hyperv/netvsc.c | 2 +- drivers/net/hyperv/netvsc_drv.c | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -99,7 +99,7 @@ static struct netvsc_device *alloc_net_d
init_waitqueue_head(&net_device->wait_drain); net_device->destroy = false; - net_device->tx_disable = false; + net_device->tx_disable = true;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT; net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT; --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -977,6 +977,7 @@ static int netvsc_attach(struct net_devi }
/* In any case device is now ready */ + nvdev->tx_disable = false; netif_device_attach(ndev);
/* Note: enable and attach happen when sub-channels setup */ @@ -2354,6 +2355,8 @@ static int netvsc_probe(struct hv_device else net->max_mtu = ETH_DATA_LEN;
+ nvdev->tx_disable = false; + ret = register_netdevice(net); if (ret != 0) { pr_err("Unable to register netdev.\n");
From: Peter Chen peter.chen@nxp.com
commit ca4b43c14cd88d28cfc6467d2fa075aad6818f1d upstream.
To work properly on every architectures and compilers, the enum value needs to be specific numbers.
Suggested-by: Greg KH gregkh@linuxfoundation.org Signed-off-by: Peter Chen peter.chen@nxp.com Link: https://lore.kernel.org/r/1580537624-10179-1-git-send-email-peter.chen@nxp.c... Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/uapi/linux/usb/charger.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/include/uapi/linux/usb/charger.h +++ b/include/uapi/linux/usb/charger.h @@ -14,18 +14,18 @@ * ACA (Accessory Charger Adapters) */ enum usb_charger_type { - UNKNOWN_TYPE, - SDP_TYPE, - DCP_TYPE, - CDP_TYPE, - ACA_TYPE, + UNKNOWN_TYPE = 0, + SDP_TYPE = 1, + DCP_TYPE = 2, + CDP_TYPE = 3, + ACA_TYPE = 4, };
/* USB charger state */ enum usb_charger_state { - USB_CHARGER_DEFAULT, - USB_CHARGER_PRESENT, - USB_CHARGER_ABSENT, + USB_CHARGER_DEFAULT = 0, + USB_CHARGER_PRESENT = 1, + USB_CHARGER_ABSENT = 2, };
#endif /* _UAPI__LINUX_USB_CHARGER_H */
From: Bijan Mottahedeh bijan.mottahedeh@oracle.com
commit 9515743bfb39c61aaf3d4f3219a645c8d1fe9a0e upstream.
Completions need to consumed in the same order the controller submitted them, otherwise future completion entries may overwrite ones we haven't handled yet. Hold the nvme queue's poll lock while completing new CQEs to prevent another thread from freeing command tags for reuse out-of-order.
Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues") Signed-off-by: Bijan Mottahedeh bijan.mottahedeh@oracle.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Jens Axboe axboe@kernel.dk Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1078,9 +1078,9 @@ static int nvme_poll(struct blk_mq_hw_ct
spin_lock(&nvmeq->cq_poll_lock); found = nvme_process_cq(nvmeq, &start, &end, -1); + nvme_complete_cqes(nvmeq, start, end); spin_unlock(&nvmeq->cq_poll_lock);
- nvme_complete_cqes(nvmeq, start, end); return found; }
From: Alexandra Winter wintera@linux.ibm.com
commit 6f3846f0955308b6d1b219419da42b8de2c08845 upstream.
When getting or setting VNICC parameters, the error code EOPNOTSUPP should have precedence over EBUSY.
EBUSY is used because vnicc feature and bridgeport feature are mutually exclusive, which is a temporary condition. Whereas EOPNOTSUPP indicates that the HW does not support all or parts of the vnicc feature. This issue causes the vnicc sysfs params to show 'blocked by bridgeport' for HW that does not support VNICC at all.
Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter wintera@linux.ibm.com Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/net/qeth_l2_main.c | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-)
--- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -1815,15 +1815,14 @@ int qeth_l2_vnicc_set_state(struct qeth_
QETH_CARD_TEXT(card, 2, "vniccsch");
- /* do not change anything if BridgePort is enabled */ - if (qeth_bridgeport_is_in_use(card)) - return -EBUSY; - /* check if characteristic and enable/disable are supported */ if (!(card->options.vnicc.sup_chars & vnicc) || !(card->options.vnicc.set_char_sup & vnicc)) return -EOPNOTSUPP;
+ if (qeth_bridgeport_is_in_use(card)) + return -EBUSY; + /* set enable/disable command and store wanted characteristic */ if (state) { cmd = IPA_VNICC_ENABLE; @@ -1869,14 +1868,13 @@ int qeth_l2_vnicc_get_state(struct qeth_
QETH_CARD_TEXT(card, 2, "vniccgch");
- /* do not get anything if BridgePort is enabled */ - if (qeth_bridgeport_is_in_use(card)) - return -EBUSY; - /* check if characteristic is supported */ if (!(card->options.vnicc.sup_chars & vnicc)) return -EOPNOTSUPP;
+ if (qeth_bridgeport_is_in_use(card)) + return -EBUSY; + /* if card is ready, query current VNICC state */ if (qeth_card_hw_is_reachable(card)) rc = qeth_l2_vnicc_query_chars(card); @@ -1894,15 +1892,14 @@ int qeth_l2_vnicc_set_timeout(struct qet
QETH_CARD_TEXT(card, 2, "vniccsto");
- /* do not change anything if BridgePort is enabled */ - if (qeth_bridgeport_is_in_use(card)) - return -EBUSY; - /* check if characteristic and set_timeout are supported */ if (!(card->options.vnicc.sup_chars & QETH_VNICC_LEARNING) || !(card->options.vnicc.getset_timeout_sup & QETH_VNICC_LEARNING)) return -EOPNOTSUPP;
+ if (qeth_bridgeport_is_in_use(card)) + return -EBUSY; + /* do we need to do anything? */ if (card->options.vnicc.learning_timeout == timeout) return rc; @@ -1931,14 +1928,14 @@ int qeth_l2_vnicc_get_timeout(struct qet
QETH_CARD_TEXT(card, 2, "vniccgto");
- /* do not get anything if BridgePort is enabled */ - if (qeth_bridgeport_is_in_use(card)) - return -EBUSY; - /* check if characteristic and get_timeout are supported */ if (!(card->options.vnicc.sup_chars & QETH_VNICC_LEARNING) || !(card->options.vnicc.getset_timeout_sup & QETH_VNICC_LEARNING)) return -EOPNOTSUPP; + + if (qeth_bridgeport_is_in_use(card)) + return -EBUSY; + /* if card is ready, get timeout. Otherwise, just return stored value */ *timeout = card->options.vnicc.learning_timeout; if (qeth_card_hw_is_reachable(card))
From: Julian Wiedmann jwi@linux.ibm.com
commit 54a61fbc020fd2e305680871c453abcf7fc0339b upstream.
The RX copybreak is intended as the _max_ value where the frame's data should be copied. So for frame_len == copybreak, don't build an SG skb.
Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann jwi@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/net/qeth_core_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5142,7 +5142,7 @@ next_packet: }
use_rx_sg = (card->options.cq == QETH_CQ_ENABLED) || - ((skb_len >= card->options.rx_sg_cb) && + (skb_len > card->options.rx_sg_cb && !atomic_read(&card->force_alloc_skb) && !IS_OSN(card));
From: Nikolay Aleksandrov nikolay@cumulusnetworks.com
commit 3a20773beeeeadec41477a5ba872175b778ff752 upstream.
Since nl_groups is a u32 we can't bind more groups via ->bind (netlink_bind) call, but netlink has supported more groups via setsockopt() for a long time and thus nlk->ngroups could be over 32. Recently I added support for per-vlan notifications and increased the groups to 33 for NETLINK_ROUTE which exposed an old bug in the netlink_bind() code causing out-of-bounds access on archs where unsigned long is 32 bits via test_bit() on a local variable. Fix this by capping the maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively capping them at 32 which is the minimum of allocated groups and the maximum groups which can be bound via netlink_bind().
CC: Christophe Leroy christophe.leroy@c-s.fr CC: Richard Guy Briggs rgb@redhat.com Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Reported-by: Erhard F. erhard_f@mailbox.org Signed-off-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netlink/af_netlink.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1014,7 +1014,8 @@ static int netlink_bind(struct socket *s if (nlk->netlink_bind && groups) { int group;
- for (group = 0; group < nlk->ngroups; group++) { + /* nl_groups is a u32, so cap the maximum groups we can bind */ + for (group = 0; group < BITS_PER_TYPE(u32); group++) { if (!test_bit(group, &groups)) continue; err = nlk->netlink_bind(net, group + 1); @@ -1033,7 +1034,7 @@ static int netlink_bind(struct socket *s netlink_insert(sk, nladdr->nl_pid) : netlink_autobind(sock); if (err) { - netlink_undo_bind(nlk->ngroups, groups, sk); + netlink_undo_bind(BITS_PER_TYPE(u32), groups, sk); goto unlock; } }
From: Dmitry Bezrukov dbezrukov@marvell.com
commit 15beab0a9d797be1b7c67458da007a62269be29a upstream.
Yet another checksum offload compatibility issue was found.
The known issue is that AQC HW marks tcp packets with 0xFFFF checksum as invalid (1). This is workarounded in driver, passing all the suspicious packets up to the stack for further csum validation.
Another HW problem (2) is that it hides invalid csum of LRO aggregated packets inside of the individual descriptors. That was workarounded by forced scan of all LRO descriptors for checksum errors.
However the scan logic was joint for both LRO and multi-descriptor packets (jumbos). And this causes the issue.
We have to drop LRO packets with the detected bad checksum because of (2), but we have to pass jumbo packets to stack because of (1).
When using windows tcp partner with jumbo frames but with LSO disabled driver discards such frames as bad checksummed. But only LRO frames should be dropped, not jumbos.
On such a configurations tcp stream have a chance of drops and stucks.
(1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly") (2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum") Signed-off-by: Dmitry Bezrukov dbezrukov@marvell.com Signed-off-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 3 ++- drivers/net/ethernet/aquantia/atlantic/aq_ring.h | 3 ++- drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 5 +++-- 3 files changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -351,7 +351,8 @@ int aq_ring_rx_clean(struct aq_ring_s *s err = 0; goto err_exit; } - if (buff->is_error || buff->is_cso_err) { + if (buff->is_error || + (buff->is_lro && buff->is_cso_err)) { buff_ = buff; do { next_ = buff_->next, --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h @@ -78,7 +78,8 @@ struct __packed aq_ring_buff_s { u32 is_cleaned:1; u32 is_error:1; u32 is_vlan:1; - u32 rsvd3:4; + u32 is_lro:1; + u32 rsvd3:3; u16 eop_index; u16 rsvd4; }; --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c @@ -823,6 +823,8 @@ static int hw_atl_b0_hw_ring_rx_receive( } }
+ buff->is_lro = !!(HW_ATL_B0_RXD_WB_STAT2_RSCCNT & + rxd_wb->status); if (HW_ATL_B0_RXD_WB_STAT2_EOP & rxd_wb->status) { buff->len = rxd_wb->pkt_len % AQ_CFG_RX_FRAME_MAX; @@ -835,8 +837,7 @@ static int hw_atl_b0_hw_ring_rx_receive( rxd_wb->pkt_len > AQ_CFG_RX_FRAME_MAX ? AQ_CFG_RX_FRAME_MAX : rxd_wb->pkt_len;
- if (HW_ATL_B0_RXD_WB_STAT2_RSCCNT & - rxd_wb->status) { + if (buff->is_lro) { /* LRO */ buff->next = rxd_wb->next_desc_ptr; ++ring->stats.rx.lro_packets;
From: Nikita Danilov ndanilov@marvell.com
commit b42726fcf76e9367e524392e0ead7e672cc0791c upstream.
Add checks to not enable multiple loopback modes simultaneously, It was also discovered that for dma loopback to function correctly promisc mode should be enabled on device.
Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags") Signed-off-by: Nikita Danilov ndanilov@marvell.com Signed-off-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c | 5 +++++ drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 13 ++++++++----- 2 files changed, 13 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c @@ -722,6 +722,11 @@ static int aq_ethtool_set_priv_flags(str if (flags & ~AQ_PRIV_FLAGS_MASK) return -EOPNOTSUPP;
+ if (hweight32((flags | priv_flags) & AQ_HW_LOOPBACK_MASK) > 1) { + netdev_info(ndev, "Can't enable more than one loopback simultaneously\n"); + return -EINVAL; + } + cfg->priv_flags = flags;
if ((priv_flags ^ flags) & BIT(AQ_HW_LOOPBACK_DMA_NET)) { --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c @@ -885,13 +885,16 @@ static int hw_atl_b0_hw_packet_filter_se { struct aq_nic_cfg_s *cfg = self->aq_nic_cfg; unsigned int i = 0U; + u32 vlan_promisc; + u32 l2_promisc;
- hw_atl_rpfl2promiscuous_mode_en_set(self, - IS_FILTER_ENABLED(IFF_PROMISC)); + l2_promisc = IS_FILTER_ENABLED(IFF_PROMISC) || + !!(cfg->priv_flags & BIT(AQ_HW_LOOPBACK_DMA_NET)); + vlan_promisc = l2_promisc || cfg->is_vlan_force_promisc;
- hw_atl_rpf_vlan_prom_mode_en_set(self, - IS_FILTER_ENABLED(IFF_PROMISC) || - cfg->is_vlan_force_promisc); + hw_atl_rpfl2promiscuous_mode_en_set(self, l2_promisc); + + hw_atl_rpf_vlan_prom_mode_en_set(self, vlan_promisc);
hw_atl_rpfl2multicast_flr_en_set(self, IS_FILTER_ENABLED(IFF_ALLMULTI) &&
From: Pavel Belous pbelous@marvell.com
commit a4980919ad6a7be548d499bc5338015e1a9191c6 upstream.
skb->len is used to calculate statistics after xmit invocation.
Under a stress load it may happen that skb will be xmited, rx interrupt will come and skb will be freed, all before xmit function is even returned.
Eventually, skb->len will access unallocated area.
Moving stats calculation into tx_clean routine.
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Christophe Vu-Brugier cvubrugier@fastmail.fm Signed-off-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: Pavel Belous pbelous@marvell.com Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 4 ---- drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 7 +++++-- 2 files changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c @@ -655,10 +655,6 @@ int aq_nic_xmit(struct aq_nic_s *self, s if (likely(frags)) { err = self->aq_hw_ops->hw_ring_tx_xmit(self->aq_hw, ring, frags); - if (err >= 0) { - ++ring->stats.tx.packets; - ring->stats.tx.bytes += skb->len; - } } else { err = NETDEV_TX_BUSY; } --- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c @@ -272,9 +272,12 @@ bool aq_ring_tx_clean(struct aq_ring_s * } }
- if (unlikely(buff->is_eop)) - dev_kfree_skb_any(buff->skb); + if (unlikely(buff->is_eop)) { + ++self->stats.rx.packets; + self->stats.tx.bytes += buff->skb->len;
+ dev_kfree_skb_any(buff->skb); + } buff->pa = 0U; buff->eop_index = 0xffffU; self->sw_head = aq_ring_next_dx(self, self->sw_head);
From: Pavel Belous pbelous@marvell.com
commit 380ec5b9af7f0d57dbf6ac067fd9f33cff2fef71 upstream.
Code inspection found that in case of mapping error we do return current 'ret' value. But beside error, it is used to count number of descriptors allocated for the packet. In that case map_skb function could return '1'.
Changing it to return zero (number of mapped descriptors for skb)
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Pavel Belous pbelous@marvell.com Signed-off-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c @@ -533,8 +533,10 @@ unsigned int aq_nic_map_skb(struct aq_ni dx_buff->len, DMA_TO_DEVICE);
- if (unlikely(dma_mapping_error(aq_nic_get_dev(self), dx_buff->pa))) + if (unlikely(dma_mapping_error(aq_nic_get_dev(self), dx_buff->pa))) { + ret = 0; goto exit; + }
first = dx_buff; dx_buff->len_pkt = skb->len;
From: Pavel Belous pbelous@marvell.com
commit 52a22f4d6ff95e8bdca557765c04893eb5dd83fd upstream.
during hibernation freeze, aq_nic_stop could be invoked on a stopped device. That may cause panic on access to not yet allocated vector/ring structures.
Add a check to stop device if it is not yet stopped.
Similiarly after freeze in hibernation thaw, aq_nic_start could be invoked on a not initialized net device. Result will be the same.
Add a check to start device if it is initialized. In our case, this is the same as started.
Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic") Signed-off-by: Pavel Belous pbelous@marvell.com Signed-off-by: Nikita Danilov ndanilov@marvell.com Signed-off-by: Igor Russkikh irusskikh@marvell.com Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c @@ -359,7 +359,8 @@ static int aq_suspend_common(struct devi netif_device_detach(nic->ndev); netif_tx_stop_all_queues(nic->ndev);
- aq_nic_stop(nic); + if (netif_running(nic->ndev)) + aq_nic_stop(nic);
if (deep) { aq_nic_deinit(nic, !nic->aq_hw->aq_nic_cfg->wol); @@ -375,7 +376,7 @@ static int atl_resume_common(struct devi { struct pci_dev *pdev = to_pci_dev(dev); struct aq_nic_s *nic; - int ret; + int ret = 0;
nic = pci_get_drvdata(pdev);
@@ -390,9 +391,11 @@ static int atl_resume_common(struct devi goto err_exit; }
- ret = aq_nic_start(nic); - if (ret) - goto err_exit; + if (netif_running(nic->ndev)) { + ret = aq_nic_start(nic); + if (ret) + goto err_exit; + }
netif_device_attach(nic->ndev); netif_tx_start_all_queues(nic->ndev);
From: Dmitry Bogdanov dbogdanov@marvell.com
commit 5a292c89a84d49b598f8978f154bdda48b1072c0 upstream.
fix static checker warning: drivers/net/ethernet/aquantia/atlantic/aq_filters.c:166 aq_check_approve_fvlan() error: passing untrusted data to 'test_bit()'
Reported-by: Dan Carpenter dan.carpenter@oracle.com Fixes: 7975d2aff5af: ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov dbogdanov@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/aquantia/atlantic/aq_filters.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/aquantia/atlantic/aq_filters.c +++ b/drivers/net/ethernet/aquantia/atlantic/aq_filters.c @@ -163,7 +163,7 @@ aq_check_approve_fvlan(struct aq_nic_s * }
if ((aq_nic->ndev->features & NETIF_F_HW_VLAN_CTAG_FILTER) && - (!test_bit(be16_to_cpu(fsp->h_ext.vlan_tci), + (!test_bit(be16_to_cpu(fsp->h_ext.vlan_tci) & VLAN_VID_MASK, aq_nic->active_vlans))) { netdev_err(aq_nic->ndev, "ethtool: unknown vlan-id specified");
From: Michael Ellerman mpe@ellerman.id.au
commit b9167c8078c3527de6da241c8a1a75a9224ed90a upstream.
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") added a 45 second timeout for tests, and also added a way for tests to customise the timeout via a settings file.
For example the ftrace tests take multiple minutes to run, so they were given longer in commit b43e78f65b1d ("tracing/selftests: Turn off timeout setting").
This works when the tests are run from the source tree. However if the tests are installed with "make -C tools/testing/selftests install", the settings files are not copied into the install directory. When the tests are then run from the install directory the longer timeouts are not applied and the tests timeout incorrectly.
So add the settings files to TEST_FILES of the appropriate Makefiles to cause the settings files to be installed using the existing install logic.
Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/ftrace/Makefile | 2 +- tools/testing/selftests/livepatch/Makefile | 2 ++ tools/testing/selftests/rseq/Makefile | 2 ++ tools/testing/selftests/rtc/Makefile | 2 ++ 4 files changed, 7 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/ftrace/Makefile +++ b/tools/testing/selftests/ftrace/Makefile @@ -2,7 +2,7 @@ all:
TEST_PROGS := ftracetest -TEST_FILES := test.d +TEST_FILES := test.d settings EXTRA_CLEAN := $(OUTPUT)/logs/*
include ../lib.mk --- a/tools/testing/selftests/livepatch/Makefile +++ b/tools/testing/selftests/livepatch/Makefile @@ -8,4 +8,6 @@ TEST_PROGS := \ test-state.sh \ test-ftrace.sh
+TEST_FILES := settings + include ../lib.mk --- a/tools/testing/selftests/rseq/Makefile +++ b/tools/testing/selftests/rseq/Makefile @@ -19,6 +19,8 @@ TEST_GEN_PROGS_EXTENDED = librseq.so
TEST_PROGS = run_param_test.sh
+TEST_FILES := settings + include ../lib.mk
$(OUTPUT)/librseq.so: rseq.c rseq.h rseq-*.h --- a/tools/testing/selftests/rtc/Makefile +++ b/tools/testing/selftests/rtc/Makefile @@ -6,4 +6,6 @@ TEST_GEN_PROGS = rtctest
TEST_GEN_PROGS_EXTENDED = setdate
+TEST_FILES := settings + include ../lib.mk
From: Ursula Braun ubraun@linux.ibm.com
commit 369537c97024dca99303a8d4d6ab38b4f54d3909 upstream.
Just SMCR requires a CLC Peer ID, but not SMCD. The field should be zero for SMCD.
Fixes: c758dfddc1b5 ("net/smc: add SMC-D support in CLC messages") Signed-off-by: Ursula Braun ubraun@linux.ibm.com Signed-off-by: Karsten Graul kgraul@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/smc/smc_clc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -372,7 +372,9 @@ int smc_clc_send_decline(struct smc_sock dclc.hdr.length = htons(sizeof(struct smc_clc_msg_decline)); dclc.hdr.version = SMC_CLC_V1; dclc.hdr.flag = (peer_diag_info == SMC_CLC_DECL_SYNCERR) ? 1 : 0; - memcpy(dclc.id_for_peer, local_systemid, sizeof(local_systemid)); + if (smc->conn.lgr && !smc->conn.lgr->is_smcd) + memcpy(dclc.id_for_peer, local_systemid, + sizeof(local_systemid)); dclc.peer_diagnosis = htonl(peer_diag_info); memcpy(dclc.trl.eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER));
From: Arthur Kiyanovski akiyano@amazon.com
commit 470793a78ce344bd53d31e0c2d537f71ba957547 upstream.
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing the key or indirection table using ethtool while keeping the same hash function.
Also add a function for retrieving the current hash function from the ena-com layer.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran sameehj@amazon.com Signed-off-by: Saeed Bshara saeedb@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/amazon/ena/ena_com.c | 5 +++++ drivers/net/ethernet/amazon/ena/ena_com.h | 8 ++++++++ drivers/net/ethernet/amazon/ena/ena_ethtool.c | 3 +++ 3 files changed, 16 insertions(+)
--- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -1046,6 +1046,11 @@ static int ena_com_get_feature(struct en feature_ver); }
+int ena_com_get_current_hash_function(struct ena_com_dev *ena_dev) +{ + return ena_dev->rss.hash_func; +} + static void ena_com_hash_key_fill_default_key(struct ena_com_dev *ena_dev) { struct ena_admin_feature_rss_flow_hash_control *hash_key = --- a/drivers/net/ethernet/amazon/ena/ena_com.h +++ b/drivers/net/ethernet/amazon/ena/ena_com.h @@ -656,6 +656,14 @@ int ena_com_rss_init(struct ena_com_dev */ void ena_com_rss_destroy(struct ena_com_dev *ena_dev);
+/* ena_com_get_current_hash_function - Get RSS hash function + * @ena_dev: ENA communication layer struct + * + * Return the current hash function. + * @return: 0 or one of the ena_admin_hash_functions values. + */ +int ena_com_get_current_hash_function(struct ena_com_dev *ena_dev); + /* ena_com_fill_hash_function - Fill RSS hash function * @ena_dev: ENA communication layer struct * @func: The hash function (Toeplitz or crc) --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -736,6 +736,9 @@ static int ena_set_rxfh(struct net_devic }
switch (hfunc) { + case ETH_RSS_HASH_NO_CHANGE: + func = ena_com_get_current_hash_function(ena_dev); + break; case ETH_RSS_HASH_TOP: func = ENA_ADMIN_TOEPLITZ; break;
From: Tuong Lien tuong.t.lien@dektech.com.au
commit 5391a87751a164b3194864126f3b016038abc9fe upstream.
In commit 9546a0b7ce00 ("tipc: fix wrong connect() return code"), we fixed the issue with the 'connect()' that returns zero even though the connecting has failed by waiting for the connection to be 'ESTABLISHED' really. However, the approach has one drawback in conjunction with our 'lightweight' connection setup mechanism that the following scenario can happen:
(server) (client)
+- accept()| | wait_for_conn() | | |connect() -------+ | |<-------[SYN]---------| > sleeping | | *CONNECTING | |--------->*ESTABLISHED | | |--------[ACK]-------->*ESTABLISHED > wakeup() send()|--------[DATA]------->|\ > wakeup() send()|--------[DATA]------->| | > wakeup() . . . . |-> recvq . . . . . | . send()|--------[DATA]------->|/ > wakeup() close()|--------[FIN]-------->*DISCONNECTING | *DISCONNECTING | | | ~~~~~~~~~~~~~~~~~~> schedule() | wait again . . | ETIMEDOUT
Upon the receipt of the server 'ACK', the client becomes 'ESTABLISHED' and the 'wait_for_conn()' process is woken up but not run. Meanwhile, the server starts to send a number of data following by a 'close()' shortly without waiting any response from the client, which then forces the client socket to be 'DISCONNECTING' immediately. When the wait process is switched to be running, it continues to wait until the timer expires because of the unexpected socket state. The client 'connect()' will finally get ‘-ETIMEDOUT’ and force to release the socket whereas there remains the messages in its receive queue.
Obviously the issue would not happen if the server had some delay prior to its 'close()' (or the number of 'DATA' messages is large enough), but any kind of delay would make the connection setup/shutdown "heavy". We solve this by simply allowing the 'connect()' returns zero in this particular case. The socket is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue.
Note: This solution doesn't break the previous one as it deals with a different situation that the socket state is 'DISCONNECTING' but has no error (i.e. sk->sk_err = 0).
Fixes: 9546a0b7ce00 ("tipc: fix wrong connect() return code") Acked-by: Ying Xue ying.xue@windriver.com Acked-by: Jon Maloy jon.maloy@ericsson.com Signed-off-by: Tuong Lien tuong.t.lien@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/tipc/socket.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2441,6 +2441,8 @@ static int tipc_wait_for_connect(struct return -ETIMEDOUT; if (signal_pending(current)) return sock_intr_errno(*timeo_p); + if (sk->sk_state == TIPC_DISCONNECTING) + break;
add_wait_queue(sk_sleep(sk), &wait); done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk),
From: Aleksa Sarai cyphar@cyphar.com
commit 2b98149c2377bff12be5dd3ce02ae0506e2dd613 upstream.
It's over-zealous to return hard errors under RCU-walk here, given that a REF-walk will be triggered for all other cases handling ".." under RCU.
The original purpose of this check was to ensure that if a rename occurs such that a directory is moved outside of the bind-mount which the resolution started in, it would be detected and blocked to avoid being able to mess with paths outside of the bind-mount. However, triggering a new REF-walk is just as effective a solution.
Cc: "Eric W. Biederman" ebiederm@xmission.com Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Aleksa Sarai cyphar@cyphar.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/namei.c +++ b/fs/namei.c @@ -1367,7 +1367,7 @@ static int follow_dotdot_rcu(struct name nd->path.dentry = parent; nd->seq = seq; if (unlikely(!path_connected(&nd->path))) - return -ENOENT; + return -ECHILD; break; } else { struct mount *mnt = real_mount(nd->path.mnt);
From: Brian Norris briannorris@chromium.org
commit 70e5b8f445fd27fde0c5583460e82539a7242424 upstream.
Before commit 1e58252e334d ("mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()"), mwifiex_process_tdls_action_frame() already had too many magic numbers. But this commit just added a ton more, in the name of checking for buffer overflows. That seems like a really bad idea.
Let's make these magic numbers a little less magic, by (a) factoring out 'pos[1]' as 'ie_len' (b) using 'sizeof' on the appropriate source or destination fields where possible, instead of bare numbers (c) dropping redundant checks, per below.
Regarding redundant checks: the beginning of the loop has this:
if (pos + 2 + pos[1] > end) break;
but then individual 'case's include stuff like this:
if (pos > end - 3) return; if (pos[1] != 1) return;
Note that the second 'return' (validating the length, pos[1]) combined with the above condition (ensuring 'pos + 2 + length' doesn't exceed 'end'), makes the first 'return' (whose 'if' can be reworded as 'pos > end - pos[1] - 2') redundant. Rather than unwind the magic numbers there, just drop those conditions.
Fixes: 1e58252e334d ("mwifiex: Fix heap overflow in mmwifiex_process_tdls_action_frame()") Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/marvell/mwifiex/tdls.c | 75 ++++++++++------------------ 1 file changed, 28 insertions(+), 47 deletions(-)
--- a/drivers/net/wireless/marvell/mwifiex/tdls.c +++ b/drivers/net/wireless/marvell/mwifiex/tdls.c @@ -894,7 +894,7 @@ void mwifiex_process_tdls_action_frame(s u8 *peer, *pos, *end; u8 i, action, basic; u16 cap = 0; - int ie_len = 0; + int ies_len = 0;
if (len < (sizeof(struct ethhdr) + 3)) return; @@ -916,7 +916,7 @@ void mwifiex_process_tdls_action_frame(s pos = buf + sizeof(struct ethhdr) + 4; /* payload 1+ category 1 + action 1 + dialog 1 */ cap = get_unaligned_le16(pos); - ie_len = len - sizeof(struct ethhdr) - TDLS_REQ_FIX_LEN; + ies_len = len - sizeof(struct ethhdr) - TDLS_REQ_FIX_LEN; pos += 2; break;
@@ -926,7 +926,7 @@ void mwifiex_process_tdls_action_frame(s /* payload 1+ category 1 + action 1 + dialog 1 + status code 2*/ pos = buf + sizeof(struct ethhdr) + 6; cap = get_unaligned_le16(pos); - ie_len = len - sizeof(struct ethhdr) - TDLS_RESP_FIX_LEN; + ies_len = len - sizeof(struct ethhdr) - TDLS_RESP_FIX_LEN; pos += 2; break;
@@ -934,7 +934,7 @@ void mwifiex_process_tdls_action_frame(s if (len < (sizeof(struct ethhdr) + TDLS_CONFIRM_FIX_LEN)) return; pos = buf + sizeof(struct ethhdr) + TDLS_CONFIRM_FIX_LEN; - ie_len = len - sizeof(struct ethhdr) - TDLS_CONFIRM_FIX_LEN; + ies_len = len - sizeof(struct ethhdr) - TDLS_CONFIRM_FIX_LEN; break; default: mwifiex_dbg(priv->adapter, ERROR, "Unknown TDLS frame type.\n"); @@ -947,33 +947,33 @@ void mwifiex_process_tdls_action_frame(s
sta_ptr->tdls_cap.capab = cpu_to_le16(cap);
- for (end = pos + ie_len; pos + 1 < end; pos += 2 + pos[1]) { - if (pos + 2 + pos[1] > end) + for (end = pos + ies_len; pos + 1 < end; pos += 2 + pos[1]) { + u8 ie_len = pos[1]; + + if (pos + 2 + ie_len > end) break;
switch (*pos) { case WLAN_EID_SUPP_RATES: - if (pos[1] > 32) + if (ie_len > sizeof(sta_ptr->tdls_cap.rates)) return; - sta_ptr->tdls_cap.rates_len = pos[1]; - for (i = 0; i < pos[1]; i++) + sta_ptr->tdls_cap.rates_len = ie_len; + for (i = 0; i < ie_len; i++) sta_ptr->tdls_cap.rates[i] = pos[i + 2]; break;
case WLAN_EID_EXT_SUPP_RATES: - if (pos[1] > 32) + if (ie_len > sizeof(sta_ptr->tdls_cap.rates)) return; basic = sta_ptr->tdls_cap.rates_len; - if (pos[1] > 32 - basic) + if (ie_len > sizeof(sta_ptr->tdls_cap.rates) - basic) return; - for (i = 0; i < pos[1]; i++) + for (i = 0; i < ie_len; i++) sta_ptr->tdls_cap.rates[basic + i] = pos[i + 2]; - sta_ptr->tdls_cap.rates_len += pos[1]; + sta_ptr->tdls_cap.rates_len += ie_len; break; case WLAN_EID_HT_CAPABILITY: - if (pos > end - sizeof(struct ieee80211_ht_cap) - 2) - return; - if (pos[1] != sizeof(struct ieee80211_ht_cap)) + if (ie_len != sizeof(struct ieee80211_ht_cap)) return; /* copy the ie's value into ht_capb*/ memcpy((u8 *)&sta_ptr->tdls_cap.ht_capb, pos + 2, @@ -981,59 +981,45 @@ void mwifiex_process_tdls_action_frame(s sta_ptr->is_11n_enabled = 1; break; case WLAN_EID_HT_OPERATION: - if (pos > end - - sizeof(struct ieee80211_ht_operation) - 2) - return; - if (pos[1] != sizeof(struct ieee80211_ht_operation)) + if (ie_len != sizeof(struct ieee80211_ht_operation)) return; /* copy the ie's value into ht_oper*/ memcpy(&sta_ptr->tdls_cap.ht_oper, pos + 2, sizeof(struct ieee80211_ht_operation)); break; case WLAN_EID_BSS_COEX_2040: - if (pos > end - 3) - return; - if (pos[1] != 1) + if (ie_len != sizeof(pos[2])) return; sta_ptr->tdls_cap.coex_2040 = pos[2]; break; case WLAN_EID_EXT_CAPABILITY: - if (pos > end - sizeof(struct ieee_types_header)) - return; - if (pos[1] < sizeof(struct ieee_types_header)) + if (ie_len < sizeof(struct ieee_types_header)) return; - if (pos[1] > 8) + if (ie_len > 8) return; memcpy((u8 *)&sta_ptr->tdls_cap.extcap, pos, sizeof(struct ieee_types_header) + - min_t(u8, pos[1], 8)); + min_t(u8, ie_len, 8)); break; case WLAN_EID_RSN: - if (pos > end - sizeof(struct ieee_types_header)) + if (ie_len < sizeof(struct ieee_types_header)) return; - if (pos[1] < sizeof(struct ieee_types_header)) - return; - if (pos[1] > IEEE_MAX_IE_SIZE - + if (ie_len > IEEE_MAX_IE_SIZE - sizeof(struct ieee_types_header)) return; memcpy((u8 *)&sta_ptr->tdls_cap.rsn_ie, pos, sizeof(struct ieee_types_header) + - min_t(u8, pos[1], IEEE_MAX_IE_SIZE - + min_t(u8, ie_len, IEEE_MAX_IE_SIZE - sizeof(struct ieee_types_header))); break; case WLAN_EID_QOS_CAPA: - if (pos > end - 3) - return; - if (pos[1] != 1) + if (ie_len != sizeof(pos[2])) return; sta_ptr->tdls_cap.qos_info = pos[2]; break; case WLAN_EID_VHT_OPERATION: if (priv->adapter->is_hw_11ac_capable) { - if (pos > end - - sizeof(struct ieee80211_vht_operation) - 2) - return; - if (pos[1] != + if (ie_len != sizeof(struct ieee80211_vht_operation)) return; /* copy the ie's value into vhtoper*/ @@ -1043,10 +1029,7 @@ void mwifiex_process_tdls_action_frame(s break; case WLAN_EID_VHT_CAPABILITY: if (priv->adapter->is_hw_11ac_capable) { - if (pos > end - - sizeof(struct ieee80211_vht_cap) - 2) - return; - if (pos[1] != sizeof(struct ieee80211_vht_cap)) + if (ie_len != sizeof(struct ieee80211_vht_cap)) return; /* copy the ie's value into vhtcap*/ memcpy((u8 *)&sta_ptr->tdls_cap.vhtcap, pos + 2, @@ -1056,9 +1039,7 @@ void mwifiex_process_tdls_action_frame(s break; case WLAN_EID_AID: if (priv->adapter->is_hw_11ac_capable) { - if (pos > end - 4) - return; - if (pos[1] != 2) + if (ie_len != sizeof(u16)) return; sta_ptr->tdls_cap.aid = get_unaligned_le16((pos + 2));
From: Brian Norris briannorris@chromium.org
commit 1c9f329b084b7b8ea6d60d91a202e884cdcf6aae upstream.
Commit 7afb94da3cd8 ("mwifiex: update set_mac_address logic") fixed the only user of this function, partly because the author seems to have noticed that, as written, it's on the borderline between highly misleading and buggy.
Anyway, no sense in keeping dead code around: let's drop it.
Fixes: 7afb94da3cd8 ("mwifiex: update set_mac_address logic") Signed-off-by: Brian Norris briannorris@chromium.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/marvell/mwifiex/main.h | 13 ------------- 1 file changed, 13 deletions(-)
--- a/drivers/net/wireless/marvell/mwifiex/main.h +++ b/drivers/net/wireless/marvell/mwifiex/main.h @@ -1295,19 +1295,6 @@ mwifiex_copy_rates(u8 *dest, u32 pos, u8 return pos; }
-/* This function return interface number with the same bss_type. - */ -static inline u8 -mwifiex_get_intf_num(struct mwifiex_adapter *adapter, u8 bss_type) -{ - u8 i, num = 0; - - for (i = 0; i < adapter->priv_num; i++) - if (adapter->priv[i] && adapter->priv[i]->bss_type == bss_type) - num++; - return num; -} - /* * This function returns the correct private structure pointer based * upon the BSS type and BSS number.
From: Jin Yao yao.jin@linux.intel.com
commit c3314a74f86dc00827e0945c8e5039fc3aebaa3c upstream.
Commit 800d3f561659 ("perf report: Add warning when libunwind not compiled in") breaks the s390 platform. S390 uses libdw-dwarf-unwind for call chain unwinding and had no support for libunwind.
So the warning "Please install libunwind development packages during the perf build." caused the confusion even if the call-graph is displayed correctly.
This patch adds checking for HAVE_DWARF_SUPPORT, which is set when libdw-dwarf-unwind is compiled in.
Fixes: 800d3f561659 ("perf report: Add warning when libunwind not compiled in") Signed-off-by: Jin Yao yao.jin@linux.intel.com Reviewed-by: Thomas Richter tmricht@linux.ibm.com Tested-by: Thomas Richter tmricht@linux.ibm.com Acked-by: Jiri Olsa jolsa@redhat.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Jin Yao yao.jin@intel.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/20200107191745.18415-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/builtin-report.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -412,10 +412,10 @@ static int report__setup_sample_type(str PERF_SAMPLE_BRANCH_ANY)) rep->nonany_branch_mode = true;
-#ifndef HAVE_LIBUNWIND_SUPPORT +#if !defined(HAVE_LIBUNWIND_SUPPORT) && !defined(HAVE_DWARF_SUPPORT) if (dwarf_callchain_users) { - ui__warning("Please install libunwind development packages " - "during the perf build.\n"); + ui__warning("Please install libunwind or libdw " + "development packages during the perf build.\n"); } #endif
From: Tom Lendacky thomas.lendacky@amd.com
commit 52918ed5fcf05d97d257f4131e19479da18f5d16 upstream.
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page faults when a guest performs MMIO. The AMD memory encryption support uses a CPUID function to define the encryption bit position. Given this, it is possible that these bits can conflict.
Use svm_hardware_setup() to override the MMIO mask if memory encryption support is enabled. Various checks are performed to ensure that the mask is properly defined and rsvd_bits() is used to generate the new mask (as was done prior to the change that necessitated this patch).
Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: Sean Christopherson sean.j.christopherson@intel.com Reviewed-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/svm.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1307,6 +1307,47 @@ static void shrink_ple_window(struct kvm } }
+/* + * The default MMIO mask is a single bit (excluding the present bit), + * which could conflict with the memory encryption bit. Check for + * memory encryption support and override the default MMIO mask if + * memory encryption is enabled. + */ +static __init void svm_adjust_mmio_mask(void) +{ + unsigned int enc_bit, mask_bit; + u64 msr, mask; + + /* If there is no memory encryption support, use existing mask */ + if (cpuid_eax(0x80000000) < 0x8000001f) + return; + + /* If memory encryption is not enabled, use existing mask */ + rdmsrl(MSR_K8_SYSCFG, msr); + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT)) + return; + + enc_bit = cpuid_ebx(0x8000001f) & 0x3f; + mask_bit = boot_cpu_data.x86_phys_bits; + + /* Increment the mask bit if it is the same as the encryption bit */ + if (enc_bit == mask_bit) + mask_bit++; + + /* + * If the mask bit location is below 52, then some bits above the + * physical addressing limit will always be reserved, so use the + * rsvd_bits() function to generate the mask. This mask, along with + * the present bit, will be used to generate a page fault with + * PFER.RSV = 1. + * + * If the mask bit location is 52 (or above), then clear the mask. + */ + mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0; + + kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK); +} + static __init int svm_hardware_setup(void) { int cpu; @@ -1361,6 +1402,8 @@ static __init int svm_hardware_setup(voi } }
+ svm_adjust_mmio_mask(); + for_each_possible_cpu(cpu) { r = svm_cpu_init(cpu); if (r)
From: Sean Christopherson sean.j.christopherson@intel.com
commit fcfbc617547fc6d9552cb6c1c563b6a90ee98085 upstream.
When reading/writing using the guest/host cache, check for a bad hva before checking for a NULL memslot, which triggers the slow path for handing cross-page accesses. Because the memslot is nullified on error by __kvm_gfn_to_hva_cache_init(), if the bad hva is encountered after crossing into a new page, then the kvm_{read,write}_guest() slow path could potentially write/access the first chunk prior to detecting the bad hva.
Arguably, performing a partial access is semantically correct from an architectural perspective, but that behavior is certainly not intended. In the original implementation, memslot was not explicitly nullified and therefore the partial access behavior varied based on whether the memslot itself was null, or if the hva was simply bad. The current behavior was introduced as a seemingly unintentional side effect in commit f1b9dd5eb86c ("kvm: Disallow wraparound in kvm_gfn_to_hva_cache_init"), which justified the change with "since some callers don't check the return code from this function, it sit seems prudent to clear ghc->memslot in the event of an error".
Regardless of intent, the partial access is dependent on _not_ checking the result of the cache initialization, which is arguably a bug in its own right, at best simply weird.
Fixes: 8f964525a121 ("KVM: Allow cross page reads and writes from cached translations.") Cc: Jim Mattson jmattson@google.com Cc: Andrew Honig ahonig@google.com Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- virt/kvm/kvm_main.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2287,12 +2287,12 @@ int kvm_write_guest_offset_cached(struct if (slots->generation != ghc->generation) __kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
- if (unlikely(!ghc->memslot)) - return kvm_write_guest(kvm, gpa, data, len); - if (kvm_is_error_hva(ghc->hva)) return -EFAULT;
+ if (unlikely(!ghc->memslot)) + return kvm_write_guest(kvm, gpa, data, len); + r = __copy_to_user((void __user *)ghc->hva + offset, data, len); if (r) return -EFAULT; @@ -2320,12 +2320,12 @@ int kvm_read_guest_cached(struct kvm *kv if (slots->generation != ghc->generation) __kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len);
- if (unlikely(!ghc->memslot)) - return kvm_read_guest(kvm, ghc->gpa, data, len); - if (kvm_is_error_hva(ghc->hva)) return -EFAULT;
+ if (unlikely(!ghc->memslot)) + return kvm_read_guest(kvm, ghc->gpa, data, len); + r = __copy_from_user(data, (void __user *)ghc->hva, len); if (r) return -EFAULT;
From: Cheng Jian cj.chengjian@huawei.com
commit 60588bfa223ff675b95f866249f90616613fbe31 upstream.
select_idle_cpu() will scan the LLC domain for idle CPUs, it's always expensive. so the next commit :
1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()")
introduces a way to limit how many CPUs we scan.
But it consume some CPUs out of 'nr' that are not allowed for the task and thus waste our attempts. The function always return nr_cpumask_bits, and we can't find a CPU which our task is allowed to run.
Cpumask may be too big, similar to select_idle_core(), use per_cpu_ptr 'select_idle_mask' to prevent stack overflow.
Fixes: 1ad3aaf3fcd2 ("sched/core: Implement new approach to scale select_idle_cpu()") Signed-off-by: Cheng Jian cj.chengjian@huawei.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Srikar Dronamraju srikar@linux.vnet.ibm.com Reviewed-by: Vincent Guittot vincent.guittot@linaro.org Reviewed-by: Valentin Schneider valentin.schneider@arm.com Link: https://lkml.kernel.org/r/20191213024530.28052-1-cj.chengjian@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/fair.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5828,6 +5828,7 @@ static inline int select_idle_smt(struct */ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target) { + struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask); struct sched_domain *this_sd; u64 avg_cost, avg_idle; u64 time, cost; @@ -5859,11 +5860,11 @@ static int select_idle_cpu(struct task_s
time = cpu_clock(this);
- for_each_cpu_wrap(cpu, sched_domain_span(sd), target) { + cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr); + + for_each_cpu_wrap(cpu, cpus, target) { if (!--nr) return si_cpu; - if (!cpumask_test_cpu(cpu, p->cpus_ptr)) - continue; if (available_idle_cpu(cpu)) break; if (si_cpu == -1 && sched_idle_cpu(cpu))
From: Chao Yu yuchao0@huawei.com
commit 3e5e479a39ce9ed60cd63f7565cc1d9da77c2a4e upstream.
As Youling reported in mailing list:
https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs...
https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/
There is a test case can corrupt f2fs image: - dd if=/dev/zero of=/swapfile bs=1M count=4096 - chmod 600 /swapfile - mkswap /swapfile - swapon --discard /swapfile
The root cause is f2fs_swap_activate() intends to return zero value to setup_swap_extents() to enable SWP_FS mode (swap file goes through fs), in this flow, setup_swap_extents() setups swap extent with wrong block address range, result in discard_swap() erasing incorrect address.
Because f2fs_swap_activate() has pinned swapfile, its data block address will not change, it's safe to let swap to handle IO through raw device, so we can get rid of SWAP_FS mode and initial swap extents inside f2fs_swap_activate(), by this way, later discard_swap() can trim in right address range.
Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/f2fs/data.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-)
--- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -3132,7 +3132,8 @@ int f2fs_migrate_page(struct address_spa
#ifdef CONFIG_SWAP /* Copied from generic_swapfile_activate() to check any holes */ -static int check_swap_activate(struct file *swap_file, unsigned int max) +static int check_swap_activate(struct swap_info_struct *sis, + struct file *swap_file, sector_t *span) { struct address_space *mapping = swap_file->f_mapping; struct inode *inode = mapping->host; @@ -3143,6 +3144,8 @@ static int check_swap_activate(struct fi sector_t last_block; sector_t lowest_block = -1; sector_t highest_block = 0; + int nr_extents = 0; + int ret;
blkbits = inode->i_blkbits; blocks_per_page = PAGE_SIZE >> blkbits; @@ -3154,7 +3157,8 @@ static int check_swap_activate(struct fi probe_block = 0; page_no = 0; last_block = i_size_read(inode) >> blkbits; - while ((probe_block + blocks_per_page) <= last_block && page_no < max) { + while ((probe_block + blocks_per_page) <= last_block && + page_no < sis->max) { unsigned block_in_page; sector_t first_block;
@@ -3194,13 +3198,27 @@ static int check_swap_activate(struct fi highest_block = first_block; }
+ /* + * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks + */ + ret = add_swap_extent(sis, page_no, 1, first_block); + if (ret < 0) + goto out; + nr_extents += ret; page_no++; probe_block += blocks_per_page; reprobe: continue; } - return 0; - + ret = nr_extents; + *span = 1 + highest_block - lowest_block; + if (page_no == 0) + page_no = 1; /* force Empty message */ + sis->max = page_no; + sis->pages = page_no - 1; + sis->highest_bit = page_no - 1; +out: + return ret; bad_bmap: pr_err("swapon: swapfile has holes\n"); return -EINVAL; @@ -3222,14 +3240,14 @@ static int f2fs_swap_activate(struct swa if (ret) return ret;
- ret = check_swap_activate(file, sis->max); - if (ret) + ret = check_swap_activate(sis, file, span); + if (ret < 0) return ret;
set_inode_flag(inode, FI_PIN_FILE); f2fs_precache_extents(inode); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); - return 0; + return ret; }
static void f2fs_swap_deactivate(struct file *file)
From: Yixian Liu liuyixian@huawei.com
commit 4768820243d71d49f1044b3f911ac3d52bdb79af upstream.
Currently, the wqe idx is calculated repeatly everywhere it is used. This patch defines wqe_idx and calculated it only once, then just use it as needed.
Fixes: 2d40788825ac ("RDMA/hns: Add support for processing send wr and receive wr") Link: https://lore.kernel.org/r/1575981902-5274-1-git-send-email-liweihang@hisilic... Signed-off-by: Yixian Liu liuyixian@huawei.com Signed-off-by: Weihang Li liweihang@hisilicon.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hns/hns_roce_device.h | 3 - drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 37 ++++++++++-------------- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 43 +++++++++++----------------- 3 files changed, 35 insertions(+), 48 deletions(-)
--- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -423,7 +423,7 @@ struct hns_roce_mr_table { struct hns_roce_wq { u64 *wrid; /* Work request ID */ spinlock_t lock; - int wqe_cnt; /* WQE num */ + u32 wqe_cnt; /* WQE num */ int max_gs; int offset; int wqe_shift; /* WQE size */ @@ -647,7 +647,6 @@ struct hns_roce_qp { u8 sdb_en; u32 doorbell_qpn; u32 sq_signal_bits; - u32 sq_next_wqe; struct hns_roce_wq sq;
struct ib_umem *umem; --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -74,8 +74,8 @@ static int hns_roce_v1_post_send(struct unsigned long flags = 0; void *wqe = NULL; __le32 doorbell[2]; + u32 wqe_idx = 0; int nreq = 0; - u32 ind = 0; int ret = 0; u8 *smac; int loopback; @@ -88,7 +88,7 @@ static int hns_roce_v1_post_send(struct }
spin_lock_irqsave(&qp->sq.lock, flags); - ind = qp->sq_next_wqe; + for (nreq = 0; wr; ++nreq, wr = wr->next) { if (hns_roce_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq)) { ret = -ENOMEM; @@ -96,6 +96,8 @@ static int hns_roce_v1_post_send(struct goto out; }
+ wqe_idx = (qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1); + if (unlikely(wr->num_sge > qp->sq.max_gs)) { dev_err(dev, "num_sge=%d > qp->sq.max_gs=%d\n", wr->num_sge, qp->sq.max_gs); @@ -104,9 +106,8 @@ static int hns_roce_v1_post_send(struct goto out; }
- wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1)); - qp->sq.wrid[(qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1)] = - wr->wr_id; + wqe = get_send_wqe(qp, wqe_idx); + qp->sq.wrid[wqe_idx] = wr->wr_id;
/* Corresponding to the RC and RD type wqe process separately */ if (ibqp->qp_type == IB_QPT_GSI) { @@ -210,7 +211,6 @@ static int hns_roce_v1_post_send(struct cpu_to_le32((wr->sg_list[1].addr) >> 32); ud_sq_wqe->l_key1 = cpu_to_le32(wr->sg_list[1].lkey); - ind++; } else if (ibqp->qp_type == IB_QPT_RC) { u32 tmp_len = 0;
@@ -308,7 +308,6 @@ static int hns_roce_v1_post_send(struct ctrl->flag |= cpu_to_le32(wr->num_sge << HNS_ROCE_WQE_SGE_NUM_BIT); } - ind++; } }
@@ -336,7 +335,6 @@ out: doorbell[1] = sq_db.u32_8;
hns_roce_write64_k(doorbell, qp->sq.db_reg_l); - qp->sq_next_wqe = ind; }
spin_unlock_irqrestore(&qp->sq.lock, flags); @@ -348,12 +346,6 @@ static int hns_roce_v1_post_recv(struct const struct ib_recv_wr *wr, const struct ib_recv_wr **bad_wr) { - int ret = 0; - int nreq = 0; - int ind = 0; - int i = 0; - u32 reg_val; - unsigned long flags = 0; struct hns_roce_rq_wqe_ctrl *ctrl = NULL; struct hns_roce_wqe_data_seg *scat = NULL; struct hns_roce_qp *hr_qp = to_hr_qp(ibqp); @@ -361,9 +353,14 @@ static int hns_roce_v1_post_recv(struct struct device *dev = &hr_dev->pdev->dev; struct hns_roce_rq_db rq_db; __le32 doorbell[2] = {0}; + unsigned long flags = 0; + unsigned int wqe_idx; + int ret = 0; + int nreq = 0; + int i = 0; + u32 reg_val;
spin_lock_irqsave(&hr_qp->rq.lock, flags); - ind = hr_qp->rq.head & (hr_qp->rq.wqe_cnt - 1);
for (nreq = 0; wr; ++nreq, wr = wr->next) { if (hns_roce_wq_overflow(&hr_qp->rq, nreq, @@ -373,6 +370,8 @@ static int hns_roce_v1_post_recv(struct goto out; }
+ wqe_idx = (hr_qp->rq.head + nreq) & (hr_qp->rq.wqe_cnt - 1); + if (unlikely(wr->num_sge > hr_qp->rq.max_gs)) { dev_err(dev, "rq:num_sge=%d > qp->sq.max_gs=%d\n", wr->num_sge, hr_qp->rq.max_gs); @@ -381,7 +380,7 @@ static int hns_roce_v1_post_recv(struct goto out; }
- ctrl = get_recv_wqe(hr_qp, ind); + ctrl = get_recv_wqe(hr_qp, wqe_idx);
roce_set_field(ctrl->rwqe_byte_12, RQ_WQE_CTRL_RWQE_BYTE_12_RWQE_SGE_NUM_M, @@ -393,9 +392,7 @@ static int hns_roce_v1_post_recv(struct for (i = 0; i < wr->num_sge; i++) set_data_seg(scat + i, wr->sg_list + i);
- hr_qp->rq.wrid[ind] = wr->wr_id; - - ind = (ind + 1) & (hr_qp->rq.wqe_cnt - 1); + hr_qp->rq.wrid[wqe_idx] = wr->wr_id; }
out: @@ -2701,7 +2698,6 @@ static int hns_roce_v1_m_sqp(struct ib_q hr_qp->rq.tail = 0; hr_qp->sq.head = 0; hr_qp->sq.tail = 0; - hr_qp->sq_next_wqe = 0; }
kfree(context); @@ -3315,7 +3311,6 @@ static int hns_roce_v1_m_qp(struct ib_qp hr_qp->rq.tail = 0; hr_qp->sq.head = 0; hr_qp->sq.tail = 0; - hr_qp->sq_next_wqe = 0; } out: kfree(context); --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -239,10 +239,10 @@ static int hns_roce_v2_post_send(struct struct device *dev = hr_dev->dev; struct hns_roce_v2_db sq_db; struct ib_qp_attr attr; - unsigned int sge_ind; unsigned int owner_bit; + unsigned int sge_idx; + unsigned int wqe_idx; unsigned long flags; - unsigned int ind; void *wqe = NULL; bool loopback; int attr_mask; @@ -269,8 +269,7 @@ static int hns_roce_v2_post_send(struct }
spin_lock_irqsave(&qp->sq.lock, flags); - ind = qp->sq_next_wqe; - sge_ind = qp->next_sge; + sge_idx = qp->next_sge;
for (nreq = 0; wr; ++nreq, wr = wr->next) { if (hns_roce_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq)) { @@ -279,6 +278,8 @@ static int hns_roce_v2_post_send(struct goto out; }
+ wqe_idx = (qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1); + if (unlikely(wr->num_sge > qp->sq.max_gs)) { dev_err(dev, "num_sge=%d > qp->sq.max_gs=%d\n", wr->num_sge, qp->sq.max_gs); @@ -287,10 +288,8 @@ static int hns_roce_v2_post_send(struct goto out; }
- wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1)); - qp->sq.wrid[(qp->sq.head + nreq) & (qp->sq.wqe_cnt - 1)] = - wr->wr_id; - + wqe = get_send_wqe(qp, wqe_idx); + qp->sq.wrid[wqe_idx] = wr->wr_id; owner_bit = ~(((qp->sq.head + nreq) >> ilog2(qp->sq.wqe_cnt)) & 0x1); tmp_len = 0; @@ -373,7 +372,7 @@ static int hns_roce_v2_post_send(struct roce_set_field(ud_sq_wqe->byte_20, V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M, V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S, - sge_ind & (qp->sge.sge_cnt - 1)); + sge_idx & (qp->sge.sge_cnt - 1));
roce_set_field(ud_sq_wqe->byte_24, V2_UD_SEND_WQE_BYTE_24_UDPSPN_M, @@ -423,8 +422,7 @@ static int hns_roce_v2_post_send(struct memcpy(&ud_sq_wqe->dgid[0], &ah->av.dgid[0], GID_LEN_V2);
- set_extend_sge(qp, wr, &sge_ind); - ind++; + set_extend_sge(qp, wr, &sge_idx); } else if (ibqp->qp_type == IB_QPT_RC) { rc_sq_wqe = wqe; memset(rc_sq_wqe, 0, sizeof(*rc_sq_wqe)); @@ -553,12 +551,10 @@ static int hns_roce_v2_post_send(struct wr->num_sge); } else if (wr->opcode != IB_WR_REG_MR) { ret = set_rwqe_data_seg(ibqp, wr, rc_sq_wqe, - wqe, &sge_ind, bad_wr); + wqe, &sge_idx, bad_wr); if (ret) goto out; } - - ind++; } else { dev_err(dev, "Illegal qp_type(0x%x)\n", ibqp->qp_type); spin_unlock_irqrestore(&qp->sq.lock, flags); @@ -588,8 +584,7 @@ out:
hns_roce_write64(hr_dev, (__le32 *)&sq_db, qp->sq.db_reg_l);
- qp->sq_next_wqe = ind; - qp->next_sge = sge_ind; + qp->next_sge = sge_idx;
if (qp->state == IB_QPS_ERR) { attr_mask = IB_QP_STATE; @@ -623,13 +618,12 @@ static int hns_roce_v2_post_recv(struct unsigned long flags; void *wqe = NULL; int attr_mask; + u32 wqe_idx; int ret = 0; int nreq; - int ind; int i;
spin_lock_irqsave(&hr_qp->rq.lock, flags); - ind = hr_qp->rq.head & (hr_qp->rq.wqe_cnt - 1);
if (hr_qp->state == IB_QPS_RESET) { spin_unlock_irqrestore(&hr_qp->rq.lock, flags); @@ -645,6 +639,8 @@ static int hns_roce_v2_post_recv(struct goto out; }
+ wqe_idx = (hr_qp->rq.head + nreq) & (hr_qp->rq.wqe_cnt - 1); + if (unlikely(wr->num_sge > hr_qp->rq.max_gs)) { dev_err(dev, "rq:num_sge=%d > qp->sq.max_gs=%d\n", wr->num_sge, hr_qp->rq.max_gs); @@ -653,7 +649,7 @@ static int hns_roce_v2_post_recv(struct goto out; }
- wqe = get_recv_wqe(hr_qp, ind); + wqe = get_recv_wqe(hr_qp, wqe_idx); dseg = (struct hns_roce_v2_wqe_data_seg *)wqe; for (i = 0; i < wr->num_sge; i++) { if (!wr->sg_list[i].length) @@ -669,8 +665,8 @@ static int hns_roce_v2_post_recv(struct
/* rq support inline data */ if (hr_dev->caps.flags & HNS_ROCE_CAP_FLAG_RQ_INLINE) { - sge_list = hr_qp->rq_inl_buf.wqe_list[ind].sg_list; - hr_qp->rq_inl_buf.wqe_list[ind].sge_cnt = + sge_list = hr_qp->rq_inl_buf.wqe_list[wqe_idx].sg_list; + hr_qp->rq_inl_buf.wqe_list[wqe_idx].sge_cnt = (u32)wr->num_sge; for (i = 0; i < wr->num_sge; i++) { sge_list[i].addr = @@ -679,9 +675,7 @@ static int hns_roce_v2_post_recv(struct } }
- hr_qp->rq.wrid[ind] = wr->wr_id; - - ind = (ind + 1) & (hr_qp->rq.wqe_cnt - 1); + hr_qp->rq.wrid[wqe_idx] = wr->wr_id; }
out: @@ -4464,7 +4458,6 @@ static int hns_roce_v2_modify_qp(struct hr_qp->rq.tail = 0; hr_qp->sq.head = 0; hr_qp->sq.tail = 0; - hr_qp->sq_next_wqe = 0; hr_qp->next_sge = 0; if (hr_qp->rq.wqe_cnt) *hr_qp->rdb.db_record = 0;
From: Lijun Ou oulijun@huawei.com
commit 468d020e2f02867b8ec561461a1689cd4365e493 upstream.
Driver should first check whether the sge is valid, then fill the valid sge and the caculated total into hardware, otherwise invalid sges will cause an error.
Fixes: 52e3b42a2f58 ("RDMA/hns: Filter for zero length of sge in hip08 kernel mode") Fixes: 7bdee4158b37 ("RDMA/hns: Fill sq wqe context of ud type in hip08") Link: https://lore.kernel.org/r/1578571852-13704-1-git-send-email-liweihang@huawei... Signed-off-by: Lijun Ou oulijun@huawei.com Signed-off-by: Weihang Li liweihang@huawei.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 41 +++++++++++++++++------------ 1 file changed, 25 insertions(+), 16 deletions(-)
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -110,7 +110,7 @@ static void set_atomic_seg(struct hns_ro }
static void set_extend_sge(struct hns_roce_qp *qp, const struct ib_send_wr *wr, - unsigned int *sge_ind) + unsigned int *sge_ind, int valid_num_sge) { struct hns_roce_v2_wqe_data_seg *dseg; struct ib_sge *sg; @@ -123,7 +123,7 @@ static void set_extend_sge(struct hns_ro
if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC) num_in_wqe = HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE; - extend_sge_num = wr->num_sge - num_in_wqe; + extend_sge_num = valid_num_sge - num_in_wqe; sg = wr->sg_list + num_in_wqe; shift = qp->hr_buf.page_shift;
@@ -159,14 +159,16 @@ static void set_extend_sge(struct hns_ro static int set_rwqe_data_seg(struct ib_qp *ibqp, const struct ib_send_wr *wr, struct hns_roce_v2_rc_send_wqe *rc_sq_wqe, void *wqe, unsigned int *sge_ind, + int valid_num_sge, const struct ib_send_wr **bad_wr) { struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device); struct hns_roce_v2_wqe_data_seg *dseg = wqe; struct hns_roce_qp *qp = to_hr_qp(ibqp); + int j = 0; int i;
- if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { + if (wr->send_flags & IB_SEND_INLINE && valid_num_sge) { if (le32_to_cpu(rc_sq_wqe->msg_len) > hr_dev->caps.max_sq_inline) { *bad_wr = wr; @@ -190,7 +192,7 @@ static int set_rwqe_data_seg(struct ib_q roce_set_bit(rc_sq_wqe->byte_4, V2_RC_SEND_WQE_BYTE_4_INLINE_S, 1); } else { - if (wr->num_sge <= HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE) { + if (valid_num_sge <= HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE) { for (i = 0; i < wr->num_sge; i++) { if (likely(wr->sg_list[i].length)) { set_data_seg_v2(dseg, wr->sg_list + i); @@ -203,19 +205,21 @@ static int set_rwqe_data_seg(struct ib_q V2_RC_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_S, (*sge_ind) & (qp->sge.sge_cnt - 1));
- for (i = 0; i < HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE; i++) { + for (i = 0; i < wr->num_sge && + j < HNS_ROCE_V2_UC_RC_SGE_NUM_IN_WQE; i++) { if (likely(wr->sg_list[i].length)) { set_data_seg_v2(dseg, wr->sg_list + i); dseg++; + j++; } }
- set_extend_sge(qp, wr, sge_ind); + set_extend_sge(qp, wr, sge_ind, valid_num_sge); }
roce_set_field(rc_sq_wqe->byte_16, V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M, - V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, wr->num_sge); + V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, valid_num_sge); }
return 0; @@ -243,6 +247,7 @@ static int hns_roce_v2_post_send(struct unsigned int sge_idx; unsigned int wqe_idx; unsigned long flags; + int valid_num_sge; void *wqe = NULL; bool loopback; int attr_mask; @@ -292,8 +297,16 @@ static int hns_roce_v2_post_send(struct qp->sq.wrid[wqe_idx] = wr->wr_id; owner_bit = ~(((qp->sq.head + nreq) >> ilog2(qp->sq.wqe_cnt)) & 0x1); + valid_num_sge = 0; tmp_len = 0;
+ for (i = 0; i < wr->num_sge; i++) { + if (likely(wr->sg_list[i].length)) { + tmp_len += wr->sg_list[i].length; + valid_num_sge++; + } + } + /* Corresponding to the QP type, wqe process separately */ if (ibqp->qp_type == IB_QPT_GSI) { ud_sq_wqe = wqe; @@ -329,9 +342,6 @@ static int hns_roce_v2_post_send(struct V2_UD_SEND_WQE_BYTE_4_OPCODE_S, HNS_ROCE_V2_WQE_OP_SEND);
- for (i = 0; i < wr->num_sge; i++) - tmp_len += wr->sg_list[i].length; - ud_sq_wqe->msg_len = cpu_to_le32(le32_to_cpu(ud_sq_wqe->msg_len) + tmp_len);
@@ -367,7 +377,7 @@ static int hns_roce_v2_post_send(struct roce_set_field(ud_sq_wqe->byte_16, V2_UD_SEND_WQE_BYTE_16_SGE_NUM_M, V2_UD_SEND_WQE_BYTE_16_SGE_NUM_S, - wr->num_sge); + valid_num_sge);
roce_set_field(ud_sq_wqe->byte_20, V2_UD_SEND_WQE_BYTE_20_MSG_START_SGE_IDX_M, @@ -422,12 +432,10 @@ static int hns_roce_v2_post_send(struct memcpy(&ud_sq_wqe->dgid[0], &ah->av.dgid[0], GID_LEN_V2);
- set_extend_sge(qp, wr, &sge_idx); + set_extend_sge(qp, wr, &sge_idx, valid_num_sge); } else if (ibqp->qp_type == IB_QPT_RC) { rc_sq_wqe = wqe; memset(rc_sq_wqe, 0, sizeof(*rc_sq_wqe)); - for (i = 0; i < wr->num_sge; i++) - tmp_len += wr->sg_list[i].length;
rc_sq_wqe->msg_len = cpu_to_le32(le32_to_cpu(rc_sq_wqe->msg_len) + tmp_len); @@ -548,10 +556,11 @@ static int hns_roce_v2_post_send(struct roce_set_field(rc_sq_wqe->byte_16, V2_RC_SEND_WQE_BYTE_16_SGE_NUM_M, V2_RC_SEND_WQE_BYTE_16_SGE_NUM_S, - wr->num_sge); + valid_num_sge); } else if (wr->opcode != IB_WR_REG_MR) { ret = set_rwqe_data_seg(ibqp, wr, rc_sq_wqe, - wqe, &sge_idx, bad_wr); + wqe, &sge_idx, + valid_num_sge, bad_wr); if (ret) goto out; }
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
commit 5a44c71ccda60a50073c5d7fe3f694cdfa3ab0c2 upstream.
'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The semantic here looks reversed.
Reorder the arguments passed to 'alloc_etherdev_mqs()' in order to keep the correct semantic.
In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8.
Fixes: 107dec2749fe ("drivers: net: xgene: Add support for multiple queues") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -2020,7 +2020,7 @@ static int xgene_enet_probe(struct platf int ret;
ndev = alloc_etherdev_mqs(sizeof(struct xgene_enet_pdata), - XGENE_NUM_RX_RING, XGENE_NUM_TX_RING); + XGENE_NUM_TX_RING, XGENE_NUM_RX_RING); if (!ndev) return -ENOMEM;
From: Janne Karhunen janne.karhunen@gmail.com
commit 483ec26eed42bf050931d9a5c5f9f0b5f2ad5f3b upstream.
Keep the ima policy rules around from the beginning even if they appear invalid at the time of loading, as they may become active after an lsm policy load. However, loading a custom IMA policy with unknown LSM labels is only safe after we have transitioned from the "built-in" policy rules to a custom IMA policy.
Patch also fixes the rule re-use during the lsm policy reload and makes some prints a bit more human readable.
Changelog: v4: - Do not allow the initial policy load refer to non-existing lsm rules. v3: - Fix too wide policy rule matching for non-initialized LSMs v2: - Fix log prints
Fixes: b16942455193 ("ima: use the lsm policy update notifier") Cc: Casey Schaufler casey@schaufler-ca.com Reported-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Janne Karhunen janne.karhunen@gmail.com Signed-off-by: Konsta Karsisto konsta.karsisto@gmail.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/integrity/ima/ima_policy.c | 44 +++++++++++++++++++++--------------- 1 file changed, 26 insertions(+), 18 deletions(-)
--- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -263,7 +263,7 @@ static void ima_lsm_free_rule(struct ima static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry) { struct ima_rule_entry *nentry; - int i, result; + int i;
nentry = kmalloc(sizeof(*nentry), GFP_KERNEL); if (!nentry) @@ -277,7 +277,7 @@ static struct ima_rule_entry *ima_lsm_co memset(nentry->lsm, 0, sizeof_field(struct ima_rule_entry, lsm));
for (i = 0; i < MAX_LSM_RULES; i++) { - if (!entry->lsm[i].rule) + if (!entry->lsm[i].args_p) continue;
nentry->lsm[i].type = entry->lsm[i].type; @@ -286,13 +286,13 @@ static struct ima_rule_entry *ima_lsm_co if (!nentry->lsm[i].args_p) goto out_err;
- result = security_filter_rule_init(nentry->lsm[i].type, - Audit_equal, - nentry->lsm[i].args_p, - &nentry->lsm[i].rule); - if (result == -EINVAL) - pr_warn("ima: rule for LSM '%d' is undefined\n", - entry->lsm[i].type); + security_filter_rule_init(nentry->lsm[i].type, + Audit_equal, + nentry->lsm[i].args_p, + &nentry->lsm[i].rule); + if (!nentry->lsm[i].rule) + pr_warn("rule for LSM '%s' is undefined\n", + (char *)entry->lsm[i].args_p); } return nentry;
@@ -329,7 +329,7 @@ static void ima_lsm_update_rules(void) list_for_each_entry_safe(entry, e, &ima_policy_rules, list) { needs_update = 0; for (i = 0; i < MAX_LSM_RULES; i++) { - if (entry->lsm[i].rule) { + if (entry->lsm[i].args_p) { needs_update = 1; break; } @@ -339,8 +339,7 @@ static void ima_lsm_update_rules(void)
result = ima_lsm_update_rule(entry); if (result) { - pr_err("ima: lsm rule update error %d\n", - result); + pr_err("lsm rule update error %d\n", result); return; } } @@ -357,7 +356,7 @@ int ima_lsm_policy_change(struct notifie }
/** - * ima_match_rules - determine whether an inode matches the measure rule. + * ima_match_rules - determine whether an inode matches the policy rule. * @rule: a pointer to a rule * @inode: a pointer to an inode * @cred: a pointer to a credentials structure for user validation @@ -415,9 +414,12 @@ static bool ima_match_rules(struct ima_r int rc = 0; u32 osid;
- if (!rule->lsm[i].rule) - continue; - + if (!rule->lsm[i].rule) { + if (!rule->lsm[i].args_p) + continue; + else + return false; + } switch (i) { case LSM_OBJ_USER: case LSM_OBJ_ROLE: @@ -823,8 +825,14 @@ static int ima_lsm_rule_init(struct ima_ entry->lsm[lsm_rule].args_p, &entry->lsm[lsm_rule].rule); if (!entry->lsm[lsm_rule].rule) { - kfree(entry->lsm[lsm_rule].args_p); - return -EINVAL; + pr_warn("rule for LSM '%s' is undefined\n", + (char *)entry->lsm[lsm_rule].args_p); + + if (ima_rules == &ima_default_rules) { + kfree(entry->lsm[lsm_rule].args_p); + result = -EINVAL; + } else + result = 0; }
return result;
From: Masami Hiramatsu mhiramat@kernel.org
commit f66c0447cca1281116224d474cdb37d6a18e4b5b upstream.
Set the unoptimized flag after confirming the code is completely unoptimized. Without this fix, when a kprobe hits the intermediate modified instruction (the first byte is replaced by an INT3, but later bytes can still be a jump address operand) while unoptimizing, it can return to the middle byte of the modified code, which causes an invalid instruction exception in the kernel.
Usually, this is a rare case, but if we put a probe on the function call while text patching, it always causes a kernel panic as below:
# echo p text_poke+5 > kprobe_events # echo 1 > events/kprobes/enable # echo 0 > events/kprobes/enable
invalid opcode: 0000 [#1] PREEMPT SMP PTI RIP: 0010:text_poke+0x9/0x50 Call Trace: arch_unoptimize_kprobe+0x22/0x28 arch_unoptimize_kprobes+0x39/0x87 kprobe_optimizer+0x6e/0x290 process_one_work+0x2a0/0x610 worker_thread+0x28/0x3d0 ? process_one_work+0x610/0x610 kthread+0x10d/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x3a/0x50
text_poke() is used for patching the code in optprobes.
This can happen even if we blacklist text_poke() and other functions, because there is a small time window during which we show the intermediate code to other CPUs.
[ mingo: Edited the changelog. ]
Tested-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Cc: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: bristot@redhat.com Fixes: 6274de4984a6 ("kprobes: Support delayed unoptimizing") Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@devn... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/kprobes.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -510,6 +510,8 @@ static void do_unoptimize_kprobes(void) arch_unoptimize_kprobes(&unoptimizing_list, &freeing_list); /* Loop free_list for disarming */ list_for_each_entry_safe(op, tmp, &freeing_list, list) { + /* Switching from detour code to origin */ + op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED; /* Disarm probes if marked disabled */ if (kprobe_disabled(&op->kp)) arch_disarm_kprobe(&op->kp); @@ -665,6 +667,7 @@ static void force_unoptimize_kprobe(stru { lockdep_assert_cpus_held(); arch_unoptimize_kprobe(op); + op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED; if (kprobe_disabled(&op->kp)) arch_disarm_kprobe(&op->kp); } @@ -681,7 +684,6 @@ static void unoptimize_kprobe(struct kpr if (!kprobe_optimized(p)) return;
- op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED; if (!list_empty(&op->list)) { if (optprobe_queued_unopt(op)) { /* Queued in unoptimizing queue */
From: Thomas Gleixner tglx@linutronix.de
commit 9a6b55ac4a44060bcb782baf002859b2a2c63267 upstream.
The function name suggests that this is a boolean checking whether the architecture asks for an update of the VDSO data, but it works the other way round. To spare further confusion invert the logic.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20200114185946.656652824@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/include/asm/vdso/vsyscall.h | 4 ++-- include/asm-generic/vdso/vsyscall.h | 4 ++-- kernel/time/vsyscall.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-)
--- a/arch/arm/include/asm/vdso/vsyscall.h +++ b/arch/arm/include/asm/vdso/vsyscall.h @@ -34,9 +34,9 @@ struct vdso_data *__arm_get_k_vdso_data( #define __arch_get_k_vdso_data __arm_get_k_vdso_data
static __always_inline -int __arm_update_vdso_data(void) +bool __arm_update_vdso_data(void) { - return !cntvct_ok; + return cntvct_ok; } #define __arch_update_vdso_data __arm_update_vdso_data
--- a/include/asm-generic/vdso/vsyscall.h +++ b/include/asm-generic/vdso/vsyscall.h @@ -12,9 +12,9 @@ static __always_inline struct vdso_data #endif /* __arch_get_k_vdso_data */
#ifndef __arch_update_vdso_data -static __always_inline int __arch_update_vdso_data(void) +static __always_inline bool __arch_update_vdso_data(void) { - return 0; + return true; } #endif /* __arch_update_vdso_data */
--- a/kernel/time/vsyscall.c +++ b/kernel/time/vsyscall.c @@ -84,7 +84,7 @@ void update_vsyscall(struct timekeeper * struct vdso_timestamp *vdso_ts; u64 nsec;
- if (__arch_update_vdso_data()) { + if (!__arch_update_vdso_data()) { /* * Some architectures might want to skip the update of the * data page.
From: Thomas Gleixner tglx@linutronix.de
commit 9f24c540f7f8eb3a981528da9a9a636a5bdf5987 upstream.
The low resolution parts of the VDSO, i.e.:
clock_gettime(CLOCK_*_COARSE), clock_getres(), time()
can be used even if there is no VDSO capable clocksource.
But if an architecture opts out of the VDSO data update then this information becomes stale. This affects ARM when there is no architected timer available. The lack of update causes userspace to use stale data forever.
Make the update of the low resolution parts unconditional and only skip the update of the high resolution parts if the architecture requests it.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation") Signed-off-by: Thomas Gleixner tglx@linutronix.de Link: https://lore.kernel.org/r/20200114185946.765577901@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/time/vsyscall.c | 37 +++++++++++++++++-------------------- 1 file changed, 17 insertions(+), 20 deletions(-)
--- a/kernel/time/vsyscall.c +++ b/kernel/time/vsyscall.c @@ -28,11 +28,6 @@ static inline void update_vdso_data(stru vdata[CS_RAW].mult = tk->tkr_raw.mult; vdata[CS_RAW].shift = tk->tkr_raw.shift;
- /* CLOCK_REALTIME */ - vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME]; - vdso_ts->sec = tk->xtime_sec; - vdso_ts->nsec = tk->tkr_mono.xtime_nsec; - /* CLOCK_MONOTONIC */ vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC]; vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; @@ -70,12 +65,6 @@ static inline void update_vdso_data(stru vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI]; vdso_ts->sec = tk->xtime_sec + (s64)tk->tai_offset; vdso_ts->nsec = tk->tkr_mono.xtime_nsec; - - /* - * Read without the seqlock held by clock_getres(). - * Note: No need to have a second copy. - */ - WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution); }
void update_vsyscall(struct timekeeper *tk) @@ -84,20 +73,17 @@ void update_vsyscall(struct timekeeper * struct vdso_timestamp *vdso_ts; u64 nsec;
- if (!__arch_update_vdso_data()) { - /* - * Some architectures might want to skip the update of the - * data page. - */ - return; - } - /* copy vsyscall data */ vdso_write_begin(vdata);
vdata[CS_HRES_COARSE].clock_mode = __arch_get_clock_mode(tk); vdata[CS_RAW].clock_mode = __arch_get_clock_mode(tk);
+ /* CLOCK_REALTIME also required for time() */ + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME]; + vdso_ts->sec = tk->xtime_sec; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec; + /* CLOCK_REALTIME_COARSE */ vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME_COARSE]; vdso_ts->sec = tk->xtime_sec; @@ -110,7 +96,18 @@ void update_vsyscall(struct timekeeper * nsec = nsec + tk->wall_to_monotonic.tv_nsec; vdso_ts->sec += __iter_div_u64_rem(nsec, NSEC_PER_SEC, &vdso_ts->nsec);
- update_vdso_data(vdata, tk); + /* + * Read without the seqlock held by clock_getres(). + * Note: No need to have a second copy. + */ + WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution); + + /* + * Architectures can opt out of updating the high resolution part + * of the VDSO. + */ + if (__arch_update_vdso_data()) + update_vdso_data(vdata, tk);
__arch_update_vsyscall(vdata, tk);
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
commit c7cb3a1dd53f63c64fb2b567d0be130b92a44d91 upstream.
This was found by coccicheck:
drivers/pwm/pwm-omap-dmtimer.c:304:2-8: ERROR: missing put_device; call of_find_device_by_node on line 255, but without a corresponding object release within this function.
Reported-by: Markus Elfring elfring@users.sourceforge.net Fixes: 6604c6556db9 ("pwm: Add PWM driver for OMAP using dual-mode timers") Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pwm/pwm-omap-dmtimer.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
--- a/drivers/pwm/pwm-omap-dmtimer.c +++ b/drivers/pwm/pwm-omap-dmtimer.c @@ -256,7 +256,7 @@ static int pwm_omap_dmtimer_probe(struct if (!timer_pdev) { dev_err(&pdev->dev, "Unable to find Timer pdev\n"); ret = -ENODEV; - goto put; + goto err_find_timer_pdev; }
timer_pdata = dev_get_platdata(&timer_pdev->dev); @@ -264,7 +264,7 @@ static int pwm_omap_dmtimer_probe(struct dev_dbg(&pdev->dev, "dmtimer pdata structure NULL, deferring probe\n"); ret = -EPROBE_DEFER; - goto put; + goto err_platdata; }
pdata = timer_pdata->timer_ops; @@ -283,19 +283,19 @@ static int pwm_omap_dmtimer_probe(struct !pdata->write_counter) { dev_err(&pdev->dev, "Incomplete dmtimer pdata structure\n"); ret = -EINVAL; - goto put; + goto err_platdata; }
if (!of_get_property(timer, "ti,timer-pwm", NULL)) { dev_err(&pdev->dev, "Missing ti,timer-pwm capability\n"); ret = -ENODEV; - goto put; + goto err_timer_property; }
dm_timer = pdata->request_by_node(timer); if (!dm_timer) { ret = -EPROBE_DEFER; - goto put; + goto err_request_timer; }
omap = devm_kzalloc(&pdev->dev, sizeof(*omap), GFP_KERNEL); @@ -352,7 +352,14 @@ err_pwmchip_add: err_alloc_omap:
pdata->free(dm_timer); -put: +err_request_timer: + +err_timer_property: +err_platdata: + + put_device(&timer_pdev->dev); +err_find_timer_pdev: + of_node_put(timer);
return ret; @@ -372,6 +379,8 @@ static int pwm_omap_dmtimer_remove(struc
omap->pdata->free(omap->dm_timer);
+ put_device(&omap->dm_timer_pdev->dev); + mutex_destroy(&omap->mutex);
return 0;
From: Arnaldo Carvalho de Melo acme@redhat.com
commit 3f7774033e6820d25beee5cf7aefa11d4968b951 upstream.
We need to set actions->ms.map since 599a2f38a989 ("perf hists browser: Check sort keys before hot key actions"), as in that patch we bail out if map is NULL.
Reviewed-by: Jiri Olsa jolsa@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Namhyung Kim namhyung@kernel.org Fixes: 599a2f38a989 ("perf hists browser: Check sort keys before hot key actions") Link: https://lkml.kernel.org/n/tip-wp1ssoewy6zihwwexqpohv0j@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/ui/browsers/hists.c | 1 + 1 file changed, 1 insertion(+)
--- a/tools/perf/ui/browsers/hists.c +++ b/tools/perf/ui/browsers/hists.c @@ -3062,6 +3062,7 @@ static int perf_evsel__hists_browse(stru
continue; } + actions->ms.map = map; top = pstack__peek(browser->pstack); if (top == &browser->hists->dso_filter) { /*
From: Jiri Olsa jolsa@kernel.org
commit 604e2139a1026793b8c2172bd92c7e9d039a5cf0 upstream.
When we moved zalloc.o to the library we missed gtk library which needs it compiled in, otherwise the missing __zfree symbol will cause the library to fail to load.
Adding the zalloc object to the gtk library build.
Fixes: 7f7c536f23e6 ("tools lib: Adopt zalloc()/zfree() from tools/perf") Signed-off-by: Jiri Olsa jolsa@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jelle van der Waa jelle@vdwaa.nl Cc: Michael Petlan mpetlan@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/20200113104358.123511-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/ui/gtk/Build | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/perf/ui/gtk/Build +++ b/tools/perf/ui/gtk/Build @@ -7,3 +7,8 @@ gtk-y += util.o gtk-y += helpline.o gtk-y += progress.o gtk-y += annotate.o +gtk-y += zalloc.o + +$(OUTPUT)ui/gtk/zalloc.o: ../lib/zalloc.c FORCE + $(call rule_mkdir) + $(call if_changed_dep,cc_o_c)
From: Cengiz Can cengiz@kernel.wtf
commit 85fc95d75970ee7dd8e01904e7fb1197c275ba6b upstream.
`tools/perf/util/map.c` has a function named `maps__insert` that acquires a write lock if its in multithread context.
Even though this lock is released when function successfully completes, there's a branch that is executed when `maps_by_name == NULL` that returns from this function without releasing the write lock.
Added an `up_write` to release the lock when this happens.
Fixes: a7c2b572e217 ("perf map_groups: Auto sort maps by name, if needed") Signed-off-by: Cengiz Can cengiz@kernel.wtf Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200120141553.23934-1-cengiz@kernel.wtf Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/map.c | 1 + 1 file changed, 1 insertion(+)
--- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -549,6 +549,7 @@ void maps__insert(struct maps *maps, str
if (maps_by_name == NULL) { __maps__free_maps_by_name(maps); + up_write(&maps->lock); return; }
From: Xiaochen Shen xiaochen.shen@intel.com
commit 536a0d8e79fb928f2735db37dda95682b6754f9a upstream.
Currently, there are three static keys in the resctrl file system: rdt_mon_enable_key and rdt_alloc_enable_key indicate if the monitoring feature and the allocation feature are enabled, respectively. The rdt_enable_key is enabled when either the monitoring feature or the allocation feature is enabled.
If no monitoring feature is present (either hardware doesn't support a monitoring feature or the feature is disabled by the kernel command line option "rdt="), rdt_enable_key is still enabled but rdt_mon_enable_key is disabled.
MBM is a monitoring feature. The MBM overflow handler intends to check if the monitoring feature is not enabled for fast return.
So check the rdt_mon_enable_key in it instead of the rdt_enable_key as former is the more accurate check.
[ bp: Massage commit message. ]
Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow") Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com Signed-off-by: Borislav Petkov bp@suse.de Link: https://lkml.kernel.org/r/1576094705-13660-1-git-send-email-xiaochen.shen@in... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/resctrl/internal.h | 1 + arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -57,6 +57,7 @@ static inline struct rdt_fs_context *rdt }
DECLARE_STATIC_KEY_FALSE(rdt_enable_key); +DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
/** * struct mon_evt - Entry in the event list of a resource --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -514,7 +514,7 @@ void mbm_handle_overflow(struct work_str
mutex_lock(&rdtgroup_mutex);
- if (!static_branch_likely(&rdt_enable_key)) + if (!static_branch_likely(&rdt_mon_enable_key)) goto out_unlock;
d = get_domain_from_cpu(cpu, &rdt_resources_all[RDT_RESOURCE_L3]); @@ -543,7 +543,7 @@ void mbm_setup_overflow_handler(struct r unsigned long delay = msecs_to_jiffies(delay_ms); int cpu;
- if (!static_branch_likely(&rdt_enable_key)) + if (!static_branch_likely(&rdt_mon_enable_key)) return; cpu = cpumask_any(&dom->cpu_mask); dom->mbm_work_cpu = cpu;
From: Peter Xu peterx@redhat.com
commit b4b2963616bbd91ebb33148522552e1135de56ae upstream.
The 3rd parameter of kvm_apic_match_dest() is the irq shorthand, rather than the irq delivery mode.
Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Peter Xu peterx@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/lapic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1150,7 +1150,7 @@ void kvm_bitmap_or_dest_vcpus(struct kvm if (!kvm_apic_present(vcpu)) continue; if (!kvm_apic_match_dest(vcpu, NULL, - irq->delivery_mode, + irq->shorthand, irq->dest_id, irq->dest_mode)) continue;
From: Sean Christopherson sean.j.christopherson@intel.com
commit 9d979c7e6ff43ca3200ffcb74f57415fd633a2da upstream.
x86 does not load its MMU until KVM_RUN, which cannot be invoked until after vCPU creation succeeds. Given that kvm_arch_vcpu_destroy() is called if and only if vCPU creation fails, it is impossible for the MMU to be loaded.
Note, the bogus kvm_mmu_unload() call was added during an unrelated refactoring of vCPU allocation, i.e. was presumably added as an opportunstic "fix" for a perceived leak.
Fixes: fb3f0f51d92d1 ("KVM: Dynamically allocate vcpus") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 4 ---- 1 file changed, 4 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9229,10 +9229,6 @@ void kvm_arch_vcpu_destroy(struct kvm_vc { vcpu->arch.apf.msr_val = 0;
- vcpu_load(vcpu); - kvm_mmu_unload(vcpu); - vcpu_put(vcpu); - kvm_arch_vcpu_free(vcpu); }
From: Sean Christopherson sean.j.christopherson@intel.com
commit 208050dac5ef4de5cb83ffcafa78499c94d0b5ad upstream.
Remove a bogus clearing of apf.msr_val from kvm_arch_vcpu_destroy().
apf.msr_val is only set to a non-zero value by kvm_pv_enable_async_pf(), which is only reachable by kvm_set_msr_common(), i.e. by writing MSR_KVM_ASYNC_PF_EN. KVM does not autonomously write said MSR, i.e. can only be written via KVM_SET_MSRS or KVM_RUN. Since KVM_SET_MSRS and KVM_RUN are vcpu ioctls, they require a valid vcpu file descriptor. kvm_arch_vcpu_destroy() is only called if KVM_CREATE_VCPU fails, and KVM declares KVM_CREATE_VCPU successful once the vcpu fd is installed and thus visible to userspace. Ergo, apf.msr_val cannot be non-zero when kvm_arch_vcpu_destroy() is called.
Fixes: 344d9588a9df0 ("KVM: Add PV MSR to enable asynchronous page faults delivery.") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 2 -- 1 file changed, 2 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9227,8 +9227,6 @@ void kvm_arch_vcpu_postcreate(struct kvm
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) { - vcpu->arch.apf.msr_val = 0; - kvm_arch_vcpu_free(vcpu); }
From: Neeraj Upadhyay neeraju@codeaurora.org
commit 4bc6b745e5cbefed92c48071e28a5f41246d0470 upstream.
The current expedited RCU grace-period code expects that a task requesting an expedited grace period cannot awaken until that grace period has reached the wakeup phase. However, it is possible for a long preemption to result in the waiting task never sleeping. For example, consider the following sequence of events:
1. Task A starts an expedited grace period by invoking synchronize_rcu_expedited(). It proceeds normally up to the wait_event() near the end of that function, and is then preempted (or interrupted or whatever).
2. The expedited grace period completes, and a kworker task starts the awaken phase, having incremented the counter and acquired the rcu_state structure's .exp_wake_mutex. This kworker task is then preempted or interrupted or whatever.
3. Task A resumes and enters wait_event(), which notes that the expedited grace period has completed, and thus doesn't sleep.
4. Task B starts an expedited grace period exactly as did Task A, complete with the preemption (or whatever delay) just before the call to wait_event().
5. The expedited grace period completes, and another kworker task starts the awaken phase, having incremented the counter. However, it blocks when attempting to acquire the rcu_state structure's .exp_wake_mutex because step 2's kworker task has not yet released it.
6. Steps 4 and 5 repeat, resulting in overflow of the rcu_node structure's ->exp_wq[] array.
In theory, this is harmless. Tasks waiting on the various ->exp_wq[] array will just be spuriously awakened, but they will just sleep again on noting that the rcu_state structure's ->expedited_sequence value has not advanced far enough.
In practice, this wastes CPU time and is an accident waiting to happen. This commit therefore moves the rcu_exp_gp_seq_end() call that officially ends the expedited grace period (along with associate tracing) until after the ->exp_wake_mutex has been acquired. This prevents Task A from awakening prematurely, thus preventing more than one expedited grace period from being in flight during a previous expedited grace period's wakeup phase.
Fixes: 3b5f668e715b ("rcu: Overlap wakeups with next expedited grace period") Signed-off-by: Neeraj Upadhyay neeraju@codeaurora.org [ paulmck: Added updated comment. ] Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/rcu/tree_exp.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -540,14 +540,13 @@ static void rcu_exp_wait_wake(unsigned l struct rcu_node *rnp;
synchronize_sched_expedited_wait(); - rcu_exp_gp_seq_end(); - trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
- /* - * Switch over to wakeup mode, allowing the next GP, but -only- the - * next GP, to proceed. - */ + // Switch over to wakeup mode, allowing the next GP to proceed. + // End the previous grace period only after acquiring the mutex + // to ensure that only one GP runs concurrently with wakeups. mutex_lock(&rcu_state.exp_wake_mutex); + rcu_exp_gp_seq_end(); + trace_rcu_exp_grace_period(rcu_state.name, s, TPS("end"));
rcu_for_each_node_breadth_first(rnp) { if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
From: Geert Uytterhoeven geert@linux-m68k.org
commit 155fc6ba488a8bdfd1d3be3d7ba98c9cec2b2429 upstream.
On alpha and s390x:
fs/ubifs/debug.h:158:11: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘ino_t {aka unsigned int}’ [-Wformat=] ... fs/ubifs/orphan.c:132:3: note: in expansion of macro ‘dbg_gen’ dbg_gen("deleted twice ino %lu", orph->inum); ... fs/ubifs/orphan.c:140:3: note: in expansion of macro ‘dbg_gen’ dbg_gen("delete later ino %lu", orph->inum);
__kernel_ino_t is "unsigned long" on most architectures, but not on alpha and s390x, where it is "unsigned int". Hence when printing an ino_t, it should always be cast to "unsigned long" first.
Fix this by re-adding the recently removed casts.
Fixes: 8009ce956c3d2802 ("ubifs: Don't leak orphans on memory during commit") Signed-off-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ubifs/orphan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ubifs/orphan.c +++ b/fs/ubifs/orphan.c @@ -129,7 +129,7 @@ static void __orphan_drop(struct ubifs_i static void orphan_delete(struct ubifs_info *c, struct ubifs_orphan *orph) { if (orph->del) { - dbg_gen("deleted twice ino %lu", orph->inum); + dbg_gen("deleted twice ino %lu", (unsigned long)orph->inum); return; }
@@ -137,7 +137,7 @@ static void orphan_delete(struct ubifs_i orph->del = 1; orph->dnext = c->orph_dnext; c->orph_dnext = orph; - dbg_gen("delete later ino %lu", orph->inum); + dbg_gen("delete later ino %lu", (unsigned long)orph->inum); return; }
From: Linus Walleij linus.walleij@linaro.org
commit c56dcfa3d4d0f49f0c37cd24886aa86db7aa7f30 upstream.
We are not interested in getting this debug print on our console all the time.
Cc: Daniel Lezcano daniel.lezcano@linaro.org Cc: Stephan Gerhold stephan@gerhold.net Fixes: 6c375eccded4 ("thermal: db8500: Rewrite to be a pure OF sensor") Signed-off-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Stephan Gerhold stephan@gerhold.net Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20191119074650.2664-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/thermal/db8500_thermal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/thermal/db8500_thermal.c +++ b/drivers/thermal/db8500_thermal.c @@ -152,8 +152,8 @@ static irqreturn_t prcmu_high_irq_handle db8500_thermal_update_config(th, idx, THERMAL_TREND_RAISING, next_low, next_high);
- dev_info(&th->tz->device, - "PRCMU set max %ld, min %ld\n", next_high, next_low); + dev_dbg(&th->tz->device, + "PRCMU set max %ld, min %ld\n", next_high, next_low); } else if (idx == num_points - 1) /* So we roof out 1 degree over the max point */ th->interpolated_temp = db8500_thermal_points[idx] + 1;
From: Florian Fainelli f.fainelli@gmail.com
commit e1ff6fc22f19e2af8adbad618526b80067911d40 upstream.
At the time the brcmstb_thermal driver and its binding were merged, the DT binding did not make the coefficients properties a mandatory one, therefore all users of the brcmstb_thermal driver out there have a non functional implementation with zero coefficients. Even if these properties were provided, the formula used for computation is incorrect.
The coefficients are entirely process specific (right now, only 28nm is supported) and not board or SoC specific, it is therefore appropriate to hard code them in the driver given the compatibility string we are probed with which has to be updated whenever a new process is introduced.
We remove the existing coefficients definition since subsequent patches are going to add support for a new process and will introduce new coefficients as well.
Fixes: 9e03cf1b2dd5 ("thermal: add brcmstb AVS TMON driver") Signed-off-by: Florian Fainelli f.fainelli@gmail.com Reviewed-by: Amit Kucheria amit.kucheria@linaro.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20200114190607.29339-2-f.fainelli@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/thermal/broadcom/brcmstb_thermal.c | 31 ++++++++--------------------- 1 file changed, 9 insertions(+), 22 deletions(-)
--- a/drivers/thermal/broadcom/brcmstb_thermal.c +++ b/drivers/thermal/broadcom/brcmstb_thermal.c @@ -49,7 +49,7 @@ #define AVS_TMON_TP_TEST_ENABLE 0x20
/* Default coefficients */ -#define AVS_TMON_TEMP_SLOPE -487 +#define AVS_TMON_TEMP_SLOPE 487 #define AVS_TMON_TEMP_OFFSET 410040
/* HW related temperature constants */ @@ -108,23 +108,12 @@ struct brcmstb_thermal_priv { struct thermal_zone_device *thermal; };
-static void avs_tmon_get_coeffs(struct thermal_zone_device *tz, int *slope, - int *offset) -{ - *slope = thermal_zone_get_slope(tz); - *offset = thermal_zone_get_offset(tz); -} - /* Convert a HW code to a temperature reading (millidegree celsius) */ static inline int avs_tmon_code_to_temp(struct thermal_zone_device *tz, u32 code) { - const int val = code & AVS_TMON_TEMP_MASK; - int slope, offset; - - avs_tmon_get_coeffs(tz, &slope, &offset); - - return slope * val + offset; + return (AVS_TMON_TEMP_OFFSET - + (int)((code & AVS_TMON_TEMP_MAX) * AVS_TMON_TEMP_SLOPE)); }
/* @@ -136,20 +125,18 @@ static inline int avs_tmon_code_to_temp( static inline u32 avs_tmon_temp_to_code(struct thermal_zone_device *tz, int temp, bool low) { - int slope, offset; - if (temp < AVS_TMON_TEMP_MIN) - return AVS_TMON_TEMP_MAX; /* Maximum code value */ - - avs_tmon_get_coeffs(tz, &slope, &offset); + return AVS_TMON_TEMP_MAX; /* Maximum code value */
- if (temp >= offset) + if (temp >= AVS_TMON_TEMP_OFFSET) return 0; /* Minimum code value */
if (low) - return (u32)(DIV_ROUND_UP(offset - temp, abs(slope))); + return (u32)(DIV_ROUND_UP(AVS_TMON_TEMP_OFFSET - temp, + AVS_TMON_TEMP_SLOPE)); else - return (u32)((offset - temp) / abs(slope)); + return (u32)((AVS_TMON_TEMP_OFFSET - temp) / + AVS_TMON_TEMP_SLOPE); }
static int brcmstb_get_temp(void *data, int *temp)
From: Xin Long lucien.xin@gmail.com
commit cf3e204a1ca5442190018a317d9ec181b4639bd6 upstream.
info->key.tp_src and tp_dst are __be16, when using nla_put_be16() to dump them, htons() is not needed, so remove it in this patch.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Xin Long lucien.xin@gmail.com Reviewed-by: Simon Horman simon.horman@netronome.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/netfilter/nft_tunnel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_tunnel.c +++ b/net/netfilter/nft_tunnel.c @@ -505,8 +505,8 @@ static int nft_tunnel_opts_dump(struct s static int nft_tunnel_ports_dump(struct sk_buff *skb, struct ip_tunnel_info *info) { - if (nla_put_be16(skb, NFTA_TUNNEL_KEY_SPORT, htons(info->key.tp_src)) < 0 || - nla_put_be16(skb, NFTA_TUNNEL_KEY_DPORT, htons(info->key.tp_dst)) < 0) + if (nla_put_be16(skb, NFTA_TUNNEL_KEY_SPORT, info->key.tp_src) < 0 || + nla_put_be16(skb, NFTA_TUNNEL_KEY_DPORT, info->key.tp_dst) < 0) return -1;
return 0;
From: Matteo Croce mcroce@redhat.com
commit 78e06cf430934fc3768c342cbebdd1013dcd6fa7 upstream.
In the flowtable documentation there is a missing semicolon, the command as is would give this error:
nftables.conf:5:27-33: Error: syntax error, unexpected devices, expecting newline or semicolon hook ingress priority 0 devices = { br0, pppoe-data }; ^^^^^^^ nftables.conf:4:12-13: Error: invalid hook (null) flowtable ft { ^^
Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation") Signed-off-by: Matteo Croce mcroce@redhat.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/networking/nf_flowtable.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/networking/nf_flowtable.txt +++ b/Documentation/networking/nf_flowtable.txt @@ -76,7 +76,7 @@ flowtable and add one rule to your forwa
table inet x { flowtable f { - hook ingress priority 0 devices = { eth0, eth1 }; + hook ingress priority 0; devices = { eth0, eth1 }; } chain y { type filter hook forward priority 0; policy accept;
From: Sameer Pujar spujar@nvidia.com
commit 2f56acf818a08a9187ac8ec6e3d994fc13dc368d upstream.
The ACONNECT bus driver does not use pm-clk interface anymore and hence the dependency can be removed from its Kconfig option.
Fixes: 0d7dab926130 ("bus: tegra-aconnect: use devm_clk_*() helpers") Signed-off-by: Sameer Pujar spujar@nvidia.com Acked-by: Jon Hunter jonathanh@nvidia.com Signed-off-by: Thierry Reding treding@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bus/Kconfig | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/bus/Kconfig +++ b/drivers/bus/Kconfig @@ -139,7 +139,6 @@ config TEGRA_ACONNECT tristate "Tegra ACONNECT Bus Driver" depends on ARCH_TEGRA_210_SOC depends on OF && PM - select PM_CLK help Driver for the Tegra ACONNECT bus which is used to interface with the devices inside the Audio Processing Engine (APE) for Tegra210.
From: Bjorn Andersson bjorn.andersson@linaro.org
commit 9e0cda721d18f44f1cd74d3a426931d71c1f1b30 upstream.
sc7180 was added to the end of the match table, sort the table.
Fixes: eee28109f871 ("clk: qcom: clk-rpmh: Add support for RPMHCC for SC7180") Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lkml.kernel.org/r/20200124175934.3937473-1-bjorn.andersson@linaro.or... Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/clk/qcom/clk-rpmh.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/clk/qcom/clk-rpmh.c +++ b/drivers/clk/qcom/clk-rpmh.c @@ -481,9 +481,9 @@ static int clk_rpmh_probe(struct platfor }
static const struct of_device_id clk_rpmh_match_table[] = { + { .compatible = "qcom,sc7180-rpmh-clk", .data = &clk_rpmh_sc7180}, { .compatible = "qcom,sdm845-rpmh-clk", .data = &clk_rpmh_sdm845}, { .compatible = "qcom,sm8150-rpmh-clk", .data = &clk_rpmh_sm8150}, - { .compatible = "qcom,sc7180-rpmh-clk", .data = &clk_rpmh_sc7180}, { } }; MODULE_DEVICE_TABLE(of, clk_rpmh_match_table);
From: Christoph Hellwig hch@lst.de
commit 953aa9d136f53e226448dbd801a905c28f8071bf upstream.
Don't allow passing arbitrary flags as they change behavior including memory allocation that the call stack is not prepared for.
Fixes: ddbca70cc45c ("xfs: allocate xattr buffer on demand") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/xfs/libxfs/xfs_attr.h | 7 +++++-- fs/xfs/xfs_ioctl.c | 2 ++ fs/xfs/xfs_ioctl32.c | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-)
--- a/fs/xfs/libxfs/xfs_attr.h +++ b/fs/xfs/libxfs/xfs_attr.h @@ -26,7 +26,7 @@ struct xfs_attr_list_context; *========================================================================*/
-#define ATTR_DONTFOLLOW 0x0001 /* -- unused, from IRIX -- */ +#define ATTR_DONTFOLLOW 0x0001 /* -- ignored, from IRIX -- */ #define ATTR_ROOT 0x0002 /* use attrs in root (trusted) namespace */ #define ATTR_TRUST 0x0004 /* -- unused, from IRIX -- */ #define ATTR_SECURE 0x0008 /* use attrs in security namespace */ @@ -37,7 +37,10 @@ struct xfs_attr_list_context; #define ATTR_KERNOVAL 0x2000 /* [kernel] get attr size only, not value */
#define ATTR_INCOMPLETE 0x4000 /* [kernel] return INCOMPLETE attr keys */ -#define ATTR_ALLOC 0x8000 /* allocate xattr buffer on demand */ +#define ATTR_ALLOC 0x8000 /* [kernel] allocate xattr buffer on demand */ + +#define ATTR_KERNEL_FLAGS \ + (ATTR_KERNOTIME | ATTR_KERNOVAL | ATTR_INCOMPLETE | ATTR_ALLOC)
#define XFS_ATTR_FLAGS \ { ATTR_DONTFOLLOW, "DONTFOLLOW" }, \ --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -462,6 +462,8 @@ xfs_attrmulti_by_handle(
error = 0; for (i = 0; i < am_hreq.opcount; i++) { + ops[i].am_flags &= ~ATTR_KERNEL_FLAGS; + ops[i].am_error = strncpy_from_user((char *)attr_name, ops[i].am_attrname, MAXNAMELEN); if (ops[i].am_error == 0 || ops[i].am_error == MAXNAMELEN) --- a/fs/xfs/xfs_ioctl32.c +++ b/fs/xfs/xfs_ioctl32.c @@ -450,6 +450,8 @@ xfs_compat_attrmulti_by_handle(
error = 0; for (i = 0; i < am_hreq.opcount; i++) { + ops[i].am_flags &= ~ATTR_KERNEL_FLAGS; + ops[i].am_error = strncpy_from_user((char *)attr_name, compat_ptr(ops[i].am_attrname), MAXNAMELEN);
From: Daniel Jordan daniel.m.jordan@oracle.com
commit 38228e8848cd7dd86ccb90406af32de0cad24be3 upstream.
lockdep complains when padata's paths to update cpumasks via CPU hotplug and sysfs are both taken:
# echo 0 > /sys/devices/system/cpu/cpu1/online # echo ff > /sys/kernel/pcrypt/pencrypt/parallel_cpumask
====================================================== WARNING: possible circular locking dependency detected 5.4.0-rc8-padata-cpuhp-v3+ #1 Not tainted ------------------------------------------------------ bash/205 is trying to acquire lock: ffffffff8286bcd0 (cpu_hotplug_lock.rw_sem){++++}, at: padata_set_cpumask+0x2b/0x120
but task is already holding lock: ffff8880001abfa0 (&pinst->lock){+.+.}, at: padata_set_cpumask+0x26/0x120
which lock already depends on the new lock.
padata doesn't take cpu_hotplug_lock and pinst->lock in a consistent order. Which should be first? CPU hotplug calls into padata with cpu_hotplug_lock already held, so it should have priority.
Fixes: 6751fb3c0e0c ("padata: Use get_online_cpus/put_online_cpus") Signed-off-by: Daniel Jordan daniel.m.jordan@oracle.com Cc: Eric Biggers ebiggers@kernel.org Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Steffen Klassert steffen.klassert@secunet.com Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/padata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/padata.c +++ b/kernel/padata.c @@ -643,8 +643,8 @@ int padata_set_cpumask(struct padata_ins struct cpumask *serial_mask, *parallel_mask; int err = -EINVAL;
- mutex_lock(&pinst->lock); get_online_cpus(); + mutex_lock(&pinst->lock);
switch (cpumask_type) { case PADATA_CPU_PARALLEL: @@ -662,8 +662,8 @@ int padata_set_cpumask(struct padata_ins err = __padata_set_cpumasks(pinst, parallel_mask, serial_mask);
out: - put_online_cpus(); mutex_unlock(&pinst->lock); + put_online_cpus();
return err; }
From: Waiman Long longman@redhat.com
commit a030f9767da1a6bbcec840fc54770eb11c2414b6 upstream.
It was found that two lines in the output of /proc/lockdep_stats have indentation problem:
# cat /proc/lockdep_stats : in-process chains: 25057 stack-trace entries: 137827 [max: 524288] number of stack traces: 7973 number of stack hash chains: 6355 combined max dependencies: 1356414598 hardirq-safe locks: 57 hardirq-unsafe locks: 1286 :
All the numbers displayed in /proc/lockdep_stats except the two stack trace numbers are formatted with a field with of 11. To properly align all the numbers, a field width of 11 is now added to the two stack trace numbers.
Fixes: 8c779229d0f4 ("locking/lockdep: Report more stack trace statistics") Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Bart Van Assche bvanassche@acm.org Link: https://lkml.kernel.org/r/20191211213139.29934-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/locking/lockdep_proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/locking/lockdep_proc.c +++ b/kernel/locking/lockdep_proc.c @@ -286,9 +286,9 @@ static int lockdep_stats_show(struct seq seq_printf(m, " stack-trace entries: %11lu [max: %lu]\n", nr_stack_trace_entries, MAX_STACK_TRACE_ENTRIES); #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_PROVE_LOCKING) - seq_printf(m, " number of stack traces: %llu\n", + seq_printf(m, " number of stack traces: %11llu\n", lockdep_stack_trace_count()); - seq_printf(m, " number of stack hash chains: %llu\n", + seq_printf(m, " number of stack hash chains: %11llu\n", lockdep_stack_hash_count()); #endif seq_printf(m, " combined max dependencies: %11u\n",
From: Vlastimil Babka vbabka@suse.cz
commit 5b57b8f22709f07c0ab5921c94fd66e8c59c3e11 upstream.
Commit 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") inadvertently removed printing of page flags for pages that are neither anon nor ksm nor have a mapping. Fix that.
Using pr_cont() again would be a solution, but the commit explicitly removed its use. Avoiding the danger of mixing up split lines from multiple CPUs might be beneficial for near-panic dumps like this, so fix this without reintroducing pr_cont().
Link: http://lkml.kernel.org/r/9f884d5c-ca60-dc7b-219c-c081c755fab6@suse.cz Fixes: 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line") Signed-off-by: Vlastimil Babka vbabka@suse.cz Reported-by: Anshuman Khandual anshuman.khandual@arm.com Reported-by: Michal Hocko mhocko@kernel.org Acked-by: Michal Hocko mhocko@suse.com Cc: David Hildenbrand david@redhat.com Cc: Qian Cai cai@lca.pw Cc: Oscar Salvador osalvador@suse.de Cc: Mel Gorman mgorman@techsingularity.net Cc: Mike Rapoport rppt@linux.ibm.com Cc: Dan Williams dan.j.williams@intel.com Cc: Pavel Tatashin pavel.tatashin@microsoft.com Cc: Ralph Campbell rcampbell@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/debug.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/mm/debug.c +++ b/mm/debug.c @@ -47,6 +47,7 @@ void __dump_page(struct page *page, cons struct address_space *mapping; bool page_poisoned = PagePoisoned(page); int mapcount; + char *type = "";
/* * If struct page is poisoned don't access Page*() functions as that @@ -78,9 +79,9 @@ void __dump_page(struct page *page, cons page, page_ref_count(page), mapcount, page->mapping, page_to_pgoff(page)); if (PageKsm(page)) - pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags); + type = "ksm "; else if (PageAnon(page)) - pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags); + type = "anon "; else if (mapping) { if (mapping->host && mapping->host->i_dentry.first) { struct dentry *dentry; @@ -88,10 +89,11 @@ void __dump_page(struct page *page, cons pr_warn("%ps name:"%pd"\n", mapping->a_ops, dentry); } else pr_warn("%ps\n", mapping->a_ops); - pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags); } BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
+ pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags); + hex_only: print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, sizeof(unsigned long), page,
From: John Hubbard jhubbard@nvidia.com
commit f4000fdf435b8301a11cf85237c561047f8c4c72 upstream.
Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast(). This, combined with the fact that get_user_pages_fast() falls back to "slow gup", which *does* accept FOLL_FORCE, leads to an odd situation: if you need FOLL_FORCE, you cannot call get_user_pages_fast().
There does not appear to be any reason for filtering out FOLL_FORCE. There is nothing in the _fast() implementation that requires that we avoid writing to the pages. So it appears to have been an oversight.
Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
Link: http://lkml.kernel.org/r/20200107224558.2362728-9-jhubbard@nvidia.com Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags") Signed-off-by: John Hubbard jhubbard@nvidia.com Reviewed-by: Leon Romanovsky leonro@mellanox.com Reviewed-by: Jan Kara jack@suse.cz Cc: Christoph Hellwig hch@lst.de Cc: Alex Williamson alex.williamson@redhat.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Björn Töpel bjorn.topel@intel.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dan Williams dan.j.williams@intel.com Cc: Hans Verkuil hverkuil-cisco@xs4all.nl Cc: Ira Weiny ira.weiny@intel.com Cc: Jason Gunthorpe jgg@mellanox.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Jens Axboe axboe@kernel.dk Cc: Jerome Glisse jglisse@redhat.com Cc: Jonathan Corbet corbet@lwn.net Cc: Kirill A. Shutemov kirill@shutemov.name Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: Mike Rapoport rppt@linux.ibm.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/gup.c +++ b/mm/gup.c @@ -2415,7 +2415,8 @@ int get_user_pages_fast(unsigned long st unsigned long addr, len, end; int nr = 0, ret = 0;
- if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM))) + if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | + FOLL_FORCE))) return -EINVAL;
start = untagged_addr(start) & PAGE_MASK;
From: Wei Yang richardw.yang@linux.intel.com
commit cb829624867b5ab10bc6a7036d183b1b82bfe9f8 upstream.
The page could be a tail page, if this is the case, this BUG_ON will never be triggered.
Link: http://lkml.kernel.org/r/20200110032610.26499-1-richardw.yang@linux.intel.co... Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")
Signed-off-by: Wei Yang richardw.yang@linux.intel.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2712,7 +2712,7 @@ int split_huge_page_to_list(struct page unsigned long flags; pgoff_t end;
- VM_BUG_ON_PAGE(is_huge_zero_page(page), page); + VM_BUG_ON_PAGE(is_huge_zero_page(head), head); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageCompound(page), page);
From: David Rientjes rientjes@google.com
commit f42f25526502d851d0e3ca1e46297da8aafce8a7 upstream.
If thp defrag setting "defer" is used and a newline is *not* used when writing to the sysfs file, this is interpreted as the "defer+madvise" option.
This is because we do prefix matching and if five characters are written without a newline, the current code ends up comparing to the first five bytes of the "defer+madvise" option and using that instead.
Use the more appropriate sysfs_streq() that handles the trailing newline for us. Since this doubles as a nice cleanup, do it in enabled_store() as well.
The current implementation relies on prefix matching: the number of bytes compared is either the number of bytes written or the length of the option being compared. With a newline, "defer\n" does not match "defer+"madvise"; without a newline, however, "defer" is considered to match "defer+madvise" (prefix matching is only comparing the first five bytes). End result is that writing "defer" is broken unless it has an additional trailing character.
This means that writing "madv" in the past would match and set "madvise". With strict checking, that no longer is the case but it is unlikely anybody is currently doing this.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2001171411020.56385@chino.kir.corp.... Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Signed-off-by: David Rientjes rientjes@google.com Suggested-by: Andrew Morton akpm@linux-foundation.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Mel Gorman mgorman@techsingularity.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -177,16 +177,13 @@ static ssize_t enabled_store(struct kobj { ssize_t ret = count;
- if (!memcmp("always", buf, - min(sizeof("always")-1, count))) { + if (sysfs_streq(buf, "always")) { clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("madvise", buf, - min(sizeof("madvise")-1, count))) { + } else if (sysfs_streq(buf, "madvise")) { clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("never", buf, - min(sizeof("never")-1, count))) { + } else if (sysfs_streq(buf, "never")) { clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags); } else @@ -250,32 +247,27 @@ static ssize_t defrag_store(struct kobje struct kobj_attribute *attr, const char *buf, size_t count) { - if (!memcmp("always", buf, - min(sizeof("always")-1, count))) { + if (sysfs_streq(buf, "always")) { clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("defer+madvise", buf, - min(sizeof("defer+madvise")-1, count))) { + } else if (sysfs_streq(buf, "defer+madvise")) { clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("defer", buf, - min(sizeof("defer")-1, count))) { + } else if (sysfs_streq(buf, "defer")) { clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("madvise", buf, - min(sizeof("madvise")-1, count))) { + } else if (sysfs_streq(buf, "madvise")) { clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags); set_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags); - } else if (!memcmp("never", buf, - min(sizeof("never")-1, count))) { + } else if (sysfs_streq(buf, "never")) { clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags); clear_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags);
From: Jim Mattson jmattson@google.com
commit dd2d6042b7f4a5440705b4ffc6c4c2dba81a43b7 upstream.
According to the SDM, a VMWRITE in VMX non-root operation with an invalid VMCS-link pointer results in VMfailInvalid before the validity of the VMCS field in the secondary source operand is checked.
For consistency, modify both handle_vmwrite and handle_vmread, even though there was no problem with the latter.
Fixes: 6d894f498f5d1 ("KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2") Signed-off-by: Jim Mattson jmattson@google.com Cc: Liran Alon liran.alon@oracle.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Vitaly Kuznetsov vkuznets@redhat.com Reviewed-by: Peter Shier pshier@google.com Reviewed-by: Oliver Upton oupton@google.com Reviewed-by: Jon Cargille jcargill@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/nested.c | 59 +++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 34 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4808,32 +4808,28 @@ static int handle_vmread(struct kvm_vcpu { unsigned long field; u64 field_value; + struct vcpu_vmx *vmx = to_vmx(vcpu); unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION); u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO); int len; gva_t gva = 0; - struct vmcs12 *vmcs12; + struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu) + : get_vmcs12(vcpu); struct x86_exception e; short offset;
if (!nested_vmx_check_permission(vcpu)) return 1;
- if (to_vmx(vcpu)->nested.current_vmptr == -1ull) + /* + * In VMX non-root operation, when the VMCS-link pointer is -1ull, + * any VMREAD sets the ALU flags for VMfailInvalid. + */ + if (vmx->nested.current_vmptr == -1ull || + (is_guest_mode(vcpu) && + get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)) return nested_vmx_failInvalid(vcpu);
- if (!is_guest_mode(vcpu)) - vmcs12 = get_vmcs12(vcpu); - else { - /* - * When vmcs->vmcs_link_pointer is -1ull, any VMREAD - * to shadowed-field sets the ALU flags for VMfailInvalid. - */ - if (get_vmcs12(vcpu)->vmcs_link_pointer == -1ull) - return nested_vmx_failInvalid(vcpu); - vmcs12 = get_shadow_vmcs12(vcpu); - } - /* Decode instruction info and find the field to read */ field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
@@ -4912,13 +4908,20 @@ static int handle_vmwrite(struct kvm_vcp */ u64 field_value = 0; struct x86_exception e; - struct vmcs12 *vmcs12; + struct vmcs12 *vmcs12 = is_guest_mode(vcpu) ? get_shadow_vmcs12(vcpu) + : get_vmcs12(vcpu); short offset;
if (!nested_vmx_check_permission(vcpu)) return 1;
- if (vmx->nested.current_vmptr == -1ull) + /* + * In VMX non-root operation, when the VMCS-link pointer is -1ull, + * any VMWRITE sets the ALU flags for VMfailInvalid. + */ + if (vmx->nested.current_vmptr == -1ull || + (is_guest_mode(vcpu) && + get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)) return nested_vmx_failInvalid(vcpu);
if (vmx_instruction_info & (1u << 10)) @@ -4946,24 +4949,12 @@ static int handle_vmwrite(struct kvm_vcp return nested_vmx_failValid(vcpu, VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT);
- if (!is_guest_mode(vcpu)) { - vmcs12 = get_vmcs12(vcpu); - - /* - * Ensure vmcs12 is up-to-date before any VMWRITE that dirties - * vmcs12, else we may crush a field or consume a stale value. - */ - if (!is_shadow_field_rw(field)) - copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12); - } else { - /* - * When vmcs->vmcs_link_pointer is -1ull, any VMWRITE - * to shadowed-field sets the ALU flags for VMfailInvalid. - */ - if (get_vmcs12(vcpu)->vmcs_link_pointer == -1ull) - return nested_vmx_failInvalid(vcpu); - vmcs12 = get_shadow_vmcs12(vcpu); - } + /* + * Ensure vmcs12 is up-to-date before any VMWRITE that dirties + * vmcs12, else we may crush a field or consume a stale value. + */ + if (!is_guest_mode(vcpu) && !is_shadow_field_rw(field)) + copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
offset = vmcs_field_to_offset(field); if (offset < 0)
From: Jim Mattson jmattson@google.com
commit 693e02cc24090c379217138719d9d84e50036b24 upstream.
According to the SDM, VMWRITE checks to see if the secondary source operand corresponds to an unsupported VMCS field before it checks to see if the secondary source operand corresponds to a VM-exit information field and the processor does not support writing to VM-exit information fields.
Fixes: 49f705c5324aa ("KVM: nVMX: Implement VMREAD and VMWRITE") Signed-off-by: Jim Mattson jmattson@google.com Cc: Paolo Bonzini pbonzini@redhat.com Reviewed-by: Peter Shier pshier@google.com Reviewed-by: Oliver Upton oupton@google.com Reviewed-by: Jon Cargille jcargill@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/nested.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4940,6 +4940,12 @@ static int handle_vmwrite(struct kvm_vcp
field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf)); + + offset = vmcs_field_to_offset(field); + if (offset < 0) + return nested_vmx_failValid(vcpu, + VMXERR_UNSUPPORTED_VMCS_COMPONENT); + /* * If the vCPU supports "VMWRITE to any supported field in the * VMCS," then the "read-only" fields are actually read/write. @@ -4956,11 +4962,6 @@ static int handle_vmwrite(struct kvm_vcp if (!is_guest_mode(vcpu) && !is_shadow_field_rw(field)) copy_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
- offset = vmcs_field_to_offset(field); - if (offset < 0) - return nested_vmx_failValid(vcpu, - VMXERR_UNSUPPORTED_VMCS_COMPONENT); - /* * Some Intel CPUs intentionally drop the reserved bits of the AR byte * fields on VMWRITE. Emulate this behavior to ensure consistent KVM
On 03/03/2020 17:41, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.5: 13 builds: 13 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 5.5.8-rc1-g3517b32c0774 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Cheers Jon
On Tue, Mar 03, 2020 at 10:11:19PM +0000, Jon Hunter wrote:
On 03/03/2020 17:41, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.5: 13 builds: 13 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 5.5.8-rc1-g3517b32c0774 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
thanks for testing all of these and letting me know.
greg k-h
On 3/3/20 10:41 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, Mar 03, 2020 at 04:01:34PM -0700, shuah wrote:
On 3/3/20 10:41 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Great, thanks for testing these and letting me know.
greg k-h
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
ref: https://github.com/speed47/spectre-meltdown-checker https://qa-reports.linaro.org/lkft/linux-stable-rc-5.5-oe/tests/spectre-melt... https://lkft.validation.linaro.org/scheduler/job/1264643#L21206
Summary ------------------------------------------------------------------------
kernel: 5.5.8-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.5.y git commit: 3517b32c0774341d492140b2be08c4bf6d1a833e git describe: v5.5.7-177-g3517b32c0774 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.5-oe/build/v5.5.7-177-g...
Regressions (compared to build v5.5.7) ------------------------------------------------------------------------
i386: spectre-meltdown-checker-test: * CVE-2017-5715
x86: spectre-meltdown-checker-test: * CVE-2017-5715
No fixes (compared to build v5.5.7)
Ran 25662 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - nxp-ls2088 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86
Test Suites ----------- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * linux-log-parser * perf * ltp-cap_bounds-tests * ltp-cpuhotplug-tests * ltp-fcntl-locktests-tests * ltp-fs-tests * ltp-ipc-tests * ltp-sched-tests * network-basic-tests * kvm-unit-tests * libhugetlbfs * ltp-commands-tests * ltp-containers-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-m[ * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-securebits-tests * ltp-syscalls-tests * spectre-meltdown-checker-test * v4l2-compliance * ltp-cap_bounds-64k-page_size-tests * ltp-cap_bounds-kasan-tests * ltp-commands-64k-page_size-tests * ltp-commands-kasan-tests * ltp-containers-64k-page_size-tests * ltp-containers-kasan-tests * ltp-cpuhotplug-64k-page_size-tests * ltp-cpuhotplug-kasan-tests * ltp-crypto-64k-page_size-tests * ltp-crypto-kasan-tests * ltp-cve-64k-page_size-tests * ltp-cve-kasan-tests * ltp-dio-64k-page_size-tests * ltp-dio-kasan-tests * ltp-fcntl-locktests-64k-page_size-tests * ltp-fcntl-locktests-kasan-tests * ltp-filecaps-64k-page_size-tests * ltp-filecaps-kasan-tests * ltp-fs-64k-page_size-tests * ltp-fs-kasan-tests * ltp-fs_bind-64k-page_size-tests * ltp-fs_bind-kasan-tests * ltp-fs_perms_simple-64k-page_size-tests * ltp-fs_perms_simple-kasan-tests * ltp-fsx-64k-page_size-tests * ltp-fsx-kasan-tests * ltp-hugetlb-64k-page_size-tests * ltp-hugetlb-kasan-tests * ltp-io-64k-page_size-tests * ltp-io-kasan-tests * ltp-ipc-64k-page_size-tests * ltp-ipc-kasan-tests * ltp-math-64k-page_size-tests * ltp-math-kasan-tests * ltp-math-tests * ltp-mm-64k-page_size-tests * ltp-mm-kasan-tests * ltp-nptl-64k-page_size-tests * ltp-nptl-kasan-tests * ltp-pty-64k-page_size-tests * ltp-pty-kasan-tests * ltp-sched-64k-page_size-tests * ltp-sched-kasan-tests * ltp-securebits-64k-page_size-tests * ltp-securebits-kasan-tests * ltp-syscalls-64k-page_size-tests * ltp-syscalls-compat-tests * ltp-syscalls-kasan-tests * ltp-open-posix-tests * ssuite * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Wed, Mar 04, 2020 at 12:43:42PM +0530, Naresh Kamboju wrote:
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
So these are regressions or just new tests?
If regressions, can you do 'git bisect' to find the offending commit?
Also, are you sure you have an updated microcode on these machines and a proper compiler for retpoline?
thanks,
greg k-h
On Wed, Mar 04, 2020 at 09:11:28AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 12:43:42PM +0530, Naresh Kamboju wrote:
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
So these are regressions or just new tests?
If regressions, can you do 'git bisect' to find the offending commit?
Also, are you sure you have an updated microcode on these machines and a proper compiler for retpoline?
As an example of just how crazy that script is, here's the output of my machine for that first CVE issue:
CVE-2017-5715 aka 'Spectre Variant 2, branch target injection' * Mitigated according to the /sys interface: YES (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling) * Mitigation 1 * Kernel is compiled with IBRS support: YES * IBRS enabled and active: YES (for firmware code only) * Kernel is compiled with IBPB support: YES * IBPB enabled and active: YES * Mitigation 2 * Kernel has branch predictor hardening (arm): NO * Kernel compiled with retpoline option: YES * Kernel compiled with a retpoline-aware compiler: YES (kernel reports full retpoline compilation) * Kernel supports RSB filling: UNKNOWN (couldn't check (couldn't find your kernel image in /boot, if you used netboot, this is normal))
STATUS: VULNERABLE (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate
So why is this "Vulnerable"? Because it didn't think it could find my kernel image for some odd reason, despite it really being in /boot/ (I don't use netboot)
So please verify that this really is a real issue, and not just the script doing foolish things.
thanks,
greg k-h
On Wed, Mar 04, 2020 at 09:47:02AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 09:11:28AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 12:43:42PM +0530, Naresh Kamboju wrote:
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
So these are regressions or just new tests?
If regressions, can you do 'git bisect' to find the offending commit?
Also, are you sure you have an updated microcode on these machines and a proper compiler for retpoline?
As an example of just how crazy that script is, here's the output of my machine for that first CVE issue:
CVE-2017-5715 aka 'Spectre Variant 2, branch target injection'
- Mitigated according to the /sys interface: YES (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling)
- Mitigation 1
- Kernel is compiled with IBRS support: YES
- IBRS enabled and active: YES (for firmware code only)
- Kernel is compiled with IBPB support: YES
- IBPB enabled and active: YES
- Mitigation 2
- Kernel has branch predictor hardening (arm): NO
- Kernel compiled with retpoline option: YES
- Kernel compiled with a retpoline-aware compiler: YES (kernel reports full retpoline compilation)
- Kernel supports RSB filling: UNKNOWN (couldn't check (couldn't find your kernel image in /boot, if you used netboot, this is normal))
STATUS: VULNERABLE (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate
So why is this "Vulnerable"? Because it didn't think it could find my kernel image for some odd reason, despite it really being in /boot/ (I don't use netboot)
So please verify that this really is a real issue, and not just the script doing foolish things.
And, if I tell the script where my kernel image is, suddenly all is good:
CVE-2017-5715 aka 'Spectre Variant 2, branch target injection' * Mitigation 1 * Kernel is compiled with IBRS support: YES * IBRS enabled and active: N/A (not testable in offline mode) * Kernel is compiled with IBPB support: YES * IBPB enabled and active: N/A (not testable in offline mode) * Mitigation 2 * Kernel has branch predictor hardening (arm): UNKNOWN * Kernel compiled with retpoline option: UNKNOWN (couldn't read your kernel configuration) * Kernel supports RSB filling: YES
STATUS: NOT VULNERABLE (offline mode: kernel supports IBRS + IBPB to mitigate the vulnerability)
On Wed, 4 Mar 2020 at 14:19, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Wed, Mar 04, 2020 at 09:47:02AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 09:11:28AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 12:43:42PM +0530, Naresh Kamboju wrote:
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
So these are regressions or just new tests?
If regressions, can you do 'git bisect' to find the offending commit?
Also, are you sure you have an updated microcode on these machines and a proper compiler for retpoline?
As an example of just how crazy that script is, here's the output of my machine for that first CVE issue:
CVE-2017-5715 aka 'Spectre Variant 2, branch target injection'
- Mitigated according to the /sys interface: YES (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling)
- Mitigation 1
- Kernel is compiled with IBRS support: YES
- IBRS enabled and active: YES (for firmware code only)
- Kernel is compiled with IBPB support: YES
- IBPB enabled and active: YES
- Mitigation 2
- Kernel has branch predictor hardening (arm): NO
- Kernel compiled with retpoline option: YES
- Kernel compiled with a retpoline-aware compiler: YES (kernel reports full retpoline compilation)
- Kernel supports RSB filling: UNKNOWN (couldn't check (couldn't find your kernel image in /boot, if you used netboot, this is normal))
STATUS: VULNERABLE (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate
So why is this "Vulnerable"? Because it didn't think it could find my kernel image for some odd reason, despite it really being in /boot/ (I don't use netboot)
Now I know the real reason why this test failed. With this note we can conclude this is not a regression.
No regressions on arm64, arm, x86_64, and i386 for 4.19, 5.4 and 5.5 branches.
Sorry for the noise.
On Wed, Mar 04, 2020 at 04:22:30PM +0530, Naresh Kamboju wrote:
On Wed, 4 Mar 2020 at 14:19, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Wed, Mar 04, 2020 at 09:47:02AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 09:11:28AM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 12:43:42PM +0530, Naresh Kamboju wrote:
On Tue, 3 Mar 2020 at 23:16, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. Regressions detected on x86_64 and i386.
Test failure output: CVE-2017-5715: VULN (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate the vulnerability)
Test description: CVE-2017-5715 branch target injection (Spectre Variant 2)
Impact: Kernel Mitigation 1: new opcode via microcode update that should be used by up to date compilers to protect the BTB (by flushing indirect branch predictors) Mitigation 2: introducing "retpoline" into compilers, and recompile software/OS with it Performance impact of the mitigation: high for mitigation 1, medium for mitigation 2, depending on your CPU
So these are regressions or just new tests?
If regressions, can you do 'git bisect' to find the offending commit?
Also, are you sure you have an updated microcode on these machines and a proper compiler for retpoline?
As an example of just how crazy that script is, here's the output of my machine for that first CVE issue:
CVE-2017-5715 aka 'Spectre Variant 2, branch target injection'
- Mitigated according to the /sys interface: YES (Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling)
- Mitigation 1
- Kernel is compiled with IBRS support: YES
- IBRS enabled and active: YES (for firmware code only)
- Kernel is compiled with IBPB support: YES
- IBPB enabled and active: YES
- Mitigation 2
- Kernel has branch predictor hardening (arm): NO
- Kernel compiled with retpoline option: YES
- Kernel compiled with a retpoline-aware compiler: YES (kernel reports full retpoline compilation)
- Kernel supports RSB filling: UNKNOWN (couldn't check (couldn't find your kernel image in /boot, if you used netboot, this is normal))
STATUS: VULNERABLE (IBRS+IBPB or retpoline+IBPB+RSB filling, is needed to mitigate
So why is this "Vulnerable"? Because it didn't think it could find my kernel image for some odd reason, despite it really being in /boot/ (I don't use netboot)
Now I know the real reason why this test failed. With this note we can conclude this is not a regression.
No regressions on arm64, arm, x86_64, and i386 for 4.19, 5.4 and 5.5 branches.
Great, thanks for confirming and for testing all of these.
greg k-h
On Wed, Mar 04, 2020 at 12:52:32PM +0100, Greg Kroah-Hartman wrote:
On Wed, Mar 04, 2020 at 04:22:30PM +0530, Naresh Kamboju wrote:
So why is this "Vulnerable"? Because it didn't think it could find my kernel image for some odd reason, despite it really being in /boot/ (I don't use netboot)
Now I know the real reason why this test failed. With this note we can conclude this is not a regression.
No regressions on arm64, arm, x86_64, and i386 for 4.19, 5.4 and 5.5 branches.
Great, thanks for confirming and for testing all of these.
We originally added spectre-meltdown-checker to lkft for informational purposes, so that we could compare its report to any actual tests that produce spectre/meltdown related failures (and help determine if the problem is hardware/firmware or kernel). In practice, it's never been helpful (because it's not an actual test) and so we'll be removing it from LKFT.
Dan
greg k-h
On Tue, Mar 03, 2020 at 06:41:04PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 423 pass: 423 fail: 0
Guenter
On Wed, Mar 04, 2020 at 08:53:12AM -0800, Guenter Roeck wrote:
On Tue, Mar 03, 2020 at 06:41:04PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.8 release. There are 176 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 05 Mar 2020 17:42:06 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 423 pass: 423 fail: 0
Wonderful, thanks for testing these and letting me konw.
greg k-h
linux-stable-mirror@lists.linaro.org