This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.4.270-rc1
Andrii Nakryiko andriin@fb.com scripts/bpf: Fix xdp_md forward declaration typo
Bart Van Assche bvanassche@acm.org fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
Erik Kurzinger ekurzinger@nvidia.com drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set
Christian König christian.koenig@amd.com drm/syncobj: make lockdep complain on WAIT_FOR_SUBMIT v3
Florian Westphal fw@strlen.de netfilter: nf_tables: set dormant flag on hook register failure
Sabrina Dubroca sd@queasysnail.net tls: stop recv() if initial process_rx_list gave us non-DATA
Jakub Kicinski kuba@kernel.org tls: rx: drop pointless else after goto
Jakub Kicinski kuba@kernel.org tls: rx: jump to a more appropriate label
Jason Gunthorpe jgg@nvidia.com s390: use the correct count for __iowrite64_copy()
Wolfram Sang wsa+renesas@sang-engineering.com packet: move from strlcpy with unused retval to strscpy
Vasiliy Kovalev kovalev@altlinux.org ipv6: sr: fix possible use-after-free and null-ptr-deref
Daniil Dulov d.dulov@aladdin.ru afs: Increase buffer size in afs_update_volume_status()
Eric Dumazet edumazet@google.com ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
Eric Dumazet edumazet@google.com ipv4: properly combine dev_base_seq and ipv4.dev_addr_genid
Arnd Bergmann arnd@arndb.de nouveau: fix function cast warnings
Randy Dunlap rdunlap@infradead.org scsi: jazz_esp: Only build if SCSI core is builtin
Gianmarco Lusvardi glusvardi@posteo.net bpf, scripts: Correct GPL license name
Andrii Nakryiko andriin@fb.com scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions
Arnd Bergmann arnd@arndb.de RDMA/srpt: fix function pointer cast warnings
Bart Van Assche bvanassche@acm.org RDMA/srpt: Make debug output more detailed
Kalesh AP kalesh-anakkur.purayil@broadcom.com RDMA/bnxt_re: Return error for SRQ resize
Zhipeng Lu alexious@zju.edu.cn IB/hfi1: Fix a memleak in init_credit_return
Xu Yang xu.yang_2@nxp.com usb: roles: don't get/set_role() when usb_role_switch is unregistered
Krishna Kurapati quic_kriskura@quicinc.com usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Frank Li Frank.Li@nxp.com usb: cdns3: fix memory double free when handle zero packet
Frank Li Frank.Li@nxp.com usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
Nikita Shubin nikita.shubin@maquefel.me ARM: ep93xx: Add terminator to gpiod_lookup_table
Tom Parkin tparkin@katalix.com l2tp: pass correct message length to ip6_append_data
Vidya Sagar vidyas@nvidia.com PCI/MSI: Prevent MSI hardware interrupt number truncation
Vasiliy Kovalev kovalev@altlinux.org gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
Mikulas Patocka mpatocka@redhat.com dm-crypt: don't modify the data when using authenticated encryption
Daniel Vacek neelx@redhat.com IB/hfi1: Fix sdma.h tx->num_descs off-by-one error
Christophe JAILLET christophe.jaillet@wanadoo.fr PCI: tegra: Fix OF node reference leak
Pali Rohár pali@kernel.org PCI: tegra: Fix reporting GPIO error value
Sireesh Kodali sireeshkodali1@gmail.com arm64: dts: qcom: msm8916: Fix typo in pronto remoteproc node
Nathan Chancellor nathan@kernel.org drm/amdgpu: Fix type of second parameter in trans_msg() callback
Matthew Wilcox (Oracle) willy@infradead.org iomap: Set all uptodate bits for an Uptodate page
Mikulas Patocka mpatocka@redhat.com dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata()
Kirill A. Shutemov kirill.shutemov@linux.intel.com x86/alternatives: Disable KASAN in apply_alternatives()
Trek trek00@inbox.ru drm/amdgpu: Check for valid number of registers to read
Icenowy Zheng icenowy@aosc.io Revert "drm/sun4i: dsi: Change the start delay calculation"
Kai-Heng Feng kai.heng.feng@canonical.com ALSA: hda/realtek - Enable micmute LED on and HP system
Björn Töpel bjorn.topel@gmail.com selftests/bpf: Avoid running unprivileged tests with alignment requirements
Nikolay Aleksandrov nikolay@cumulusnetworks.com net: bridge: clear bridge's private skb space on xmit
Christophe JAILLET christophe.jaillet@wanadoo.fr spi: mt7621: Fix an error message in mt7621_spi_probe()
John Stultz john.stultz@linaro.org driver core: Set deferred_probe_timeout to a longer default if CONFIG_MODULES is set
Miaoqian Lin linmq006@gmail.com pinctrl: rockchip: Fix refcount leak in rockchip_pinctrl_parse_groups
Lee Jones lee.jones@linaro.org pinctrl: pinctrl-rockchip: Fix a bunch of kerneldoc misdemeanours
Eric Dumazet edumazet@google.com tcp: add annotations around sk->sk_shutdown accesses
Soheil Hassas Yeganeh soheil@google.com tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit
Paolo Abeni pabeni@redhat.com tcp: factor out __tcp_close() helper
Geert Uytterhoeven geert+renesas@glider.be pmdomain: renesas: r8a77980-sysc: CR7 must be always on
Alexandra Winter wintera@linux.ibm.com s390/qeth: Fix potential loss of L3-IP@ in case of network issues
Yi Sun yi.sun@unisoc.com virtio-blk: Ensure no requests in virtqueues before deleting vqs.
Takashi Sakamoto o-takashi@sakamocchi.jp firewire: core: send bus reset promptly on gap count error
Hannes Reinecke hare@suse.de scsi: lpfc: Use unsigned type for num_sge
Zhang Rui rui.zhang@intel.com hwmon: (coretemp) Enlarge per package core count limit
Daniel Wagner dwagner@suse.de nvmet-fc: abort command when there is no binding
Xin Long lucien.xin@gmail.com netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
Chen-Yu Tsai wens@csie.org ASoC: sunxi: sun4i-spdif: Add support for Allwinner H616
Guixin Liu kanie@linux.alibaba.com nvmet-tcp: fix nvme tcp ida memory leak
Martin Blumenstingl martin.blumenstingl@googlemail.com regulator: pwm-regulator: Add validity checks in continuous .get_voltage
Baokun Li libaokun1@huawei.com ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
Baokun Li libaokun1@huawei.com ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
Lennert Buytenhek kernel@wantstofly.org ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers
Conrad Kostecki conikost@gentoo.org ahci: asm1166: correct count of reported ports
Fullway Wang fullwaywang@outlook.com fbdev: sis: Error out if pixclock equals zero
Fullway Wang fullwaywang@outlook.com fbdev: savage: Error out if pixclock equals zero
Felix Fietkau nbd@nbd.name wifi: mac80211: fix race condition on enabling fast-xmit
Michal Kazior michal@plume.com wifi: cfg80211: fix missing interfaces when dumping
Vinod Koul vkoul@kernel.org dmaengine: fsl-qdma: increase size of 'irq_name'
Vinod Koul vkoul@kernel.org dmaengine: shdma: increase size of 'dev_id'
Dmitry Bogdanov d.bogdanov@yadro.com scsi: target: core: Add TMF to tmr_list handling
Cyril Hrubis chrubis@suse.cz sched/rt: Disallow writing invalid values to sched_rt_period_us
Cyril Hrubis chrubis@suse.cz sched/rt: Fix sysctl_sched_rr_timeslice intial value
Lokesh Gidra lokeshgidra@google.com userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: replace WARN_ONs for invalid DAT metadata block requests
GONG, Ruiqi gongruiqi1@huawei.com memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
Cyril Hrubis chrubis@suse.cz sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire dsmark qdisc
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire ATM qdisc
Jamal Hadi Salim jhs@mojatatu.com net/sched: Retire CBQ qdisc
Oliver Upton oliver.upton@linux.dev KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
Oliver Upton oliver.upton@linux.dev KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
-------------
Diffstat:
Makefile | 4 +- arch/arm/mach-ep93xx/core.c | 1 + arch/arm64/boot/dts/qcom/msm8916.dtsi | 4 +- arch/s390/pci/pci.c | 2 +- arch/x86/kernel/alternative.c | 13 + drivers/ata/ahci.c | 34 +- drivers/ata/ahci.h | 1 + drivers/base/dd.c | 9 + drivers/block/virtio_blk.c | 7 +- drivers/dma/fsl-qdma.c | 2 +- drivers/dma/sh/shdma.h | 2 +- drivers/firewire/core-card.c | 18 +- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 5 +- drivers/gpu/drm/drm_syncobj.c | 16 +- drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c | 8 +- drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c | 3 +- drivers/hwmon/coretemp.c | 2 +- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 5 +- drivers/infiniband/hw/hfi1/pio.c | 6 +- drivers/infiniband/hw/hfi1/sdma.c | 2 +- drivers/infiniband/ulp/srpt/ib_srpt.c | 18 +- drivers/md/dm-crypt.c | 6 + drivers/md/dm-integrity.c | 11 +- drivers/net/gtp.c | 10 +- drivers/nvme/target/fc.c | 8 +- drivers/nvme/target/tcp.c | 1 + drivers/pci/controller/pci-tegra.c | 17 +- drivers/pci/msi.c | 2 +- drivers/pinctrl/pinctrl-rockchip.c | 23 +- drivers/regulator/pwm-regulator.c | 3 + drivers/s390/net/qeth_l3_main.c | 9 +- drivers/scsi/Kconfig | 2 +- drivers/scsi/lpfc/lpfc_scsi.c | 12 +- drivers/soc/renesas/r8a77980-sysc.c | 3 +- drivers/spi/spi-mt7621.c | 8 +- drivers/target/target_core_device.c | 5 - drivers/target/target_core_transport.c | 4 + drivers/usb/cdns3/gadget.c | 8 +- drivers/usb/gadget/function/f_ncm.c | 10 +- drivers/usb/roles/class.c | 12 +- drivers/video/fbdev/savage/savagefb_driver.c | 3 + drivers/video/fbdev/sis/sis_main.c | 2 + fs/afs/volume.c | 4 +- fs/aio.c | 9 +- fs/ext4/mballoc.c | 13 +- fs/iomap/buffered-io.c | 3 + fs/nilfs2/dat.c | 27 +- include/linux/fs.h | 2 + include/linux/lockdep.h | 5 + include/net/tcp.h | 1 + kernel/sched/rt.c | 10 +- kernel/sysctl.c | 4 + mm/memcontrol.c | 6 + mm/userfaultfd.c | 14 +- net/bridge/br_device.c | 2 + net/ipv4/af_inet.c | 2 +- net/ipv4/devinet.c | 21 +- net/ipv4/tcp.c | 27 +- net/ipv4/tcp_input.c | 4 +- net/ipv6/addrconf.c | 21 +- net/ipv6/seg6.c | 20 +- net/l2tp/l2tp_ip6.c | 2 +- net/mac80211/sta_info.c | 2 + net/mac80211/tx.c | 2 +- net/netfilter/nf_conntrack_proto_sctp.c | 2 +- net/netfilter/nf_tables_api.c | 1 + net/packet/af_packet.c | 4 +- net/sched/Kconfig | 42 - net/sched/Makefile | 3 - net/sched/sch_atm.c | 710 -------- net/sched/sch_cbq.c | 1818 --------------------- net/sched/sch_dsmark.c | 523 ------ net/tls/tls_sw.c | 12 +- net/wireless/nl80211.c | 1 + scripts/bpf_helpers_doc.py | 157 +- sound/pci/hda/patch_realtek.c | 6 +- sound/soc/sunxi/sun4i-spdif.c | 5 + tools/testing/selftests/bpf/test_verifier.c | 13 + virt/kvm/arm/vgic/vgic-its.c | 5 + 80 files changed, 572 insertions(+), 3255 deletions(-)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Upton oliver.upton@linux.dev
commit 8d3a7dfb801d157ac423261d7cd62c33e95375f8 upstream.
vgic_get_irq() may not return a valid descriptor if there is no ITS that holds a valid translation for the specified INTID. If that is the case, it is safe to silently ignore it and continue processing the LPI pending table.
Cc: stable@vger.kernel.org Fixes: 33d3bc9556a7 ("KVM: arm64: vgic-its: Read initial LPI pending table") Signed-off-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/arm/vgic/vgic-its.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -459,6 +459,9 @@ static int its_sync_lpi_pending_table(st }
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]); + if (!irq) + continue; + raw_spin_lock_irqsave(&irq->irq_lock, flags); irq->pending_latch = pendmask & (1U << bit_nr); vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Upton oliver.upton@linux.dev
commit 85a71ee9a0700f6c18862ef3b0011ed9dad99aca upstream.
It is possible that an LPI mapped in a different ITS gets unmapped while handling the MOVALL command. If that is the case, there is no state that can be migrated to the destination. Silently ignore it and continue migrating other LPIs.
Cc: stable@vger.kernel.org Fixes: ff9c114394aa ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE") Signed-off-by: Oliver Upton oliver.upton@linux.dev Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- virt/kvm/arm/vgic/vgic-its.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -1376,6 +1376,8 @@ static int vgic_its_cmd_handle_movall(st
for (i = 0; i < irq_count; i++) { irq = vgic_get_irq(kvm, NULL, intids[i]); + if (!irq) + continue;
update_affinity(irq, vcpu2);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit 051d442098421c28c7951625652f61b1e15c4bd5 upstream.
While this amazing qdisc has served us well over the years it has not been getting any tender love and care and has bitrotted over time. It has become mostly a shooting target for syzkaller lately. For this reason, we are retiring it. Goodbye CBQ - we loved you.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 17 net/sched/Makefile | 1 net/sched/sch_cbq.c | 1818 ---------------------------------------------------- 3 files changed, 1836 deletions(-) delete mode 100644 net/sched/sch_cbq.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/cbq.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -45,23 +45,6 @@ if NET_SCHED
comment "Queueing/Scheduling"
-config NET_SCH_CBQ - tristate "Class Based Queueing (CBQ)" - ---help--- - Say Y here if you want to use the Class-Based Queueing (CBQ) packet - scheduling algorithm. This algorithm classifies the waiting packets - into a tree-like hierarchy of classes; the leaves of this tree are - in turn scheduled by separate algorithms. - - See the top of file:net/sched/sch_cbq.c for more details. - - CBQ is a commonly used scheduler, so if you're unsure, you should - say Y here. Then say Y to all the queueing algorithms below that you - want to use as leaf disciplines. - - To compile this code as a module, choose M here: the - module will be called sch_cbq. - config NET_SCH_HTB tristate "Hierarchical Token Bucket (HTB)" ---help--- --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -31,7 +31,6 @@ obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_ obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o obj-$(CONFIG_NET_ACT_CT) += act_ct.o obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o -obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o --- a/net/sched/sch_cbq.c +++ /dev/null @@ -1,1818 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* - * net/sched/sch_cbq.c Class-Based Queueing discipline. - * - * Authors: Alexey Kuznetsov, kuznet@ms2.inr.ac.ru - */ - -#include <linux/module.h> -#include <linux/slab.h> -#include <linux/types.h> -#include <linux/kernel.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <net/netlink.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> - - -/* Class-Based Queueing (CBQ) algorithm. - ======================================= - - Sources: [1] Sally Floyd and Van Jacobson, "Link-sharing and Resource - Management Models for Packet Networks", - IEEE/ACM Transactions on Networking, Vol.3, No.4, 1995 - - [2] Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995 - - [3] Sally Floyd, "Notes on Class-Based Queueing: Setting - Parameters", 1996 - - [4] Sally Floyd and Michael Speer, "Experimental Results - for Class-Based Queueing", 1998, not published. - - ----------------------------------------------------------------------- - - Algorithm skeleton was taken from NS simulator cbq.cc. - If someone wants to check this code against the LBL version, - he should take into account that ONLY the skeleton was borrowed, - the implementation is different. Particularly: - - --- The WRR algorithm is different. Our version looks more - reasonable (I hope) and works when quanta are allowed to be - less than MTU, which is always the case when real time classes - have small rates. Note, that the statement of [3] is - incomplete, delay may actually be estimated even if class - per-round allotment is less than MTU. Namely, if per-round - allotment is W*r_i, and r_1+...+r_k = r < 1 - - delay_i <= ([MTU/(W*r_i)]*W*r + W*r + k*MTU)/B - - In the worst case we have IntServ estimate with D = W*r+k*MTU - and C = MTU*r. The proof (if correct at all) is trivial. - - - --- It seems that cbq-2.0 is not very accurate. At least, I cannot - interpret some places, which look like wrong translations - from NS. Anyone is advised to find these differences - and explain to me, why I am wrong 8). - - --- Linux has no EOI event, so that we cannot estimate true class - idle time. Workaround is to consider the next dequeue event - as sign that previous packet is finished. This is wrong because of - internal device queueing, but on a permanently loaded link it is true. - Moreover, combined with clock integrator, this scheme looks - very close to an ideal solution. */ - -struct cbq_sched_data; - - -struct cbq_class { - struct Qdisc_class_common common; - struct cbq_class *next_alive; /* next class with backlog in this priority band */ - -/* Parameters */ - unsigned char priority; /* class priority */ - unsigned char priority2; /* priority to be used after overlimit */ - unsigned char ewma_log; /* time constant for idle time calculation */ - - u32 defmap; - - /* Link-sharing scheduler parameters */ - long maxidle; /* Class parameters: see below. */ - long offtime; - long minidle; - u32 avpkt; - struct qdisc_rate_table *R_tab; - - /* General scheduler (WRR) parameters */ - long allot; - long quantum; /* Allotment per WRR round */ - long weight; /* Relative allotment: see below */ - - struct Qdisc *qdisc; /* Ptr to CBQ discipline */ - struct cbq_class *split; /* Ptr to split node */ - struct cbq_class *share; /* Ptr to LS parent in the class tree */ - struct cbq_class *tparent; /* Ptr to tree parent in the class tree */ - struct cbq_class *borrow; /* NULL if class is bandwidth limited; - parent otherwise */ - struct cbq_class *sibling; /* Sibling chain */ - struct cbq_class *children; /* Pointer to children chain */ - - struct Qdisc *q; /* Elementary queueing discipline */ - - -/* Variables */ - unsigned char cpriority; /* Effective priority */ - unsigned char delayed; - unsigned char level; /* level of the class in hierarchy: - 0 for leaf classes, and maximal - level of children + 1 for nodes. - */ - - psched_time_t last; /* Last end of service */ - psched_time_t undertime; - long avgidle; - long deficit; /* Saved deficit for WRR */ - psched_time_t penalized; - struct gnet_stats_basic_packed bstats; - struct gnet_stats_queue qstats; - struct net_rate_estimator __rcu *rate_est; - struct tc_cbq_xstats xstats; - - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - - int filters; - - struct cbq_class *defaults[TC_PRIO_MAX + 1]; -}; - -struct cbq_sched_data { - struct Qdisc_class_hash clhash; /* Hash table of all classes */ - int nclasses[TC_CBQ_MAXPRIO + 1]; - unsigned int quanta[TC_CBQ_MAXPRIO + 1]; - - struct cbq_class link; - - unsigned int activemask; - struct cbq_class *active[TC_CBQ_MAXPRIO + 1]; /* List of all classes - with backlog */ - -#ifdef CONFIG_NET_CLS_ACT - struct cbq_class *rx_class; -#endif - struct cbq_class *tx_class; - struct cbq_class *tx_borrowed; - int tx_len; - psched_time_t now; /* Cached timestamp */ - unsigned int pmask; - - struct hrtimer delay_timer; - struct qdisc_watchdog watchdog; /* Watchdog timer, - started when CBQ has - backlog, but cannot - transmit just now */ - psched_tdiff_t wd_expires; - int toplevel; - u32 hgenerator; -}; - - -#define L2T(cl, len) qdisc_l2t((cl)->R_tab, len) - -static inline struct cbq_class * -cbq_class_lookup(struct cbq_sched_data *q, u32 classid) -{ - struct Qdisc_class_common *clc; - - clc = qdisc_class_find(&q->clhash, classid); - if (clc == NULL) - return NULL; - return container_of(clc, struct cbq_class, common); -} - -#ifdef CONFIG_NET_CLS_ACT - -static struct cbq_class * -cbq_reclassify(struct sk_buff *skb, struct cbq_class *this) -{ - struct cbq_class *cl; - - for (cl = this->tparent; cl; cl = cl->tparent) { - struct cbq_class *new = cl->defaults[TC_PRIO_BESTEFFORT]; - - if (new != NULL && new != this) - return new; - } - return NULL; -} - -#endif - -/* Classify packet. The procedure is pretty complicated, but - * it allows us to combine link sharing and priority scheduling - * transparently. - * - * Namely, you can put link sharing rules (f.e. route based) at root of CBQ, - * so that it resolves to split nodes. Then packets are classified - * by logical priority, or a more specific classifier may be attached - * to the split node. - */ - -static struct cbq_class * -cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *head = &q->link; - struct cbq_class **defmap; - struct cbq_class *cl = NULL; - u32 prio = skb->priority; - struct tcf_proto *fl; - struct tcf_result res; - - /* - * Step 1. If skb->priority points to one of our classes, use it. - */ - if (TC_H_MAJ(prio ^ sch->handle) == 0 && - (cl = cbq_class_lookup(q, prio)) != NULL) - return cl; - - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - for (;;) { - int result = 0; - defmap = head->defaults; - - fl = rcu_dereference_bh(head->filter_list); - /* - * Step 2+n. Apply classifier. - */ - result = tcf_classify(skb, fl, &res, true); - if (!fl || result < 0) - goto fallback; - if (result == TC_ACT_SHOT) - return NULL; - - cl = (void *)res.class; - if (!cl) { - if (TC_H_MAJ(res.classid)) - cl = cbq_class_lookup(q, res.classid); - else if ((cl = defmap[res.classid & TC_PRIO_MAX]) == NULL) - cl = defmap[TC_PRIO_BESTEFFORT]; - - if (cl == NULL) - goto fallback; - } - if (cl->level >= head->level) - goto fallback; -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - /* fall through */ - fallthrough; - case TC_ACT_RECLASSIFY: - return cbq_reclassify(skb, cl); - } -#endif - if (cl->level == 0) - return cl; - - /* - * Step 3+n. If classifier selected a link sharing class, - * apply agency specific classifier. - * Repeat this procdure until we hit a leaf node. - */ - head = cl; - } - -fallback: - cl = head; - - /* - * Step 4. No success... - */ - if (TC_H_MAJ(prio) == 0 && - !(cl = head->defaults[prio & TC_PRIO_MAX]) && - !(cl = head->defaults[TC_PRIO_BESTEFFORT])) - return head; - - return cl; -} - -/* - * A packet has just been enqueued on the empty class. - * cbq_activate_class adds it to the tail of active class list - * of its priority band. - */ - -static inline void cbq_activate_class(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - int prio = cl->cpriority; - struct cbq_class *cl_tail; - - cl_tail = q->active[prio]; - q->active[prio] = cl; - - if (cl_tail != NULL) { - cl->next_alive = cl_tail->next_alive; - cl_tail->next_alive = cl; - } else { - cl->next_alive = cl; - q->activemask |= (1<<prio); - } -} - -/* - * Unlink class from active chain. - * Note that this same procedure is done directly in cbq_dequeue* - * during round-robin procedure. - */ - -static void cbq_deactivate_class(struct cbq_class *this) -{ - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - int prio = this->cpriority; - struct cbq_class *cl; - struct cbq_class *cl_prev = q->active[prio]; - - do { - cl = cl_prev->next_alive; - if (cl == this) { - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - if (cl == q->active[prio]) { - q->active[prio] = cl_prev; - if (cl == q->active[prio]) { - q->active[prio] = NULL; - q->activemask &= ~(1<<prio); - return; - } - } - return; - } - } while ((cl_prev = cl) != q->active[prio]); -} - -static void -cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) -{ - int toplevel = q->toplevel; - - if (toplevel > cl->level) { - psched_time_t now = psched_get_time(); - - do { - if (cl->undertime < now) { - q->toplevel = cl->level; - return; - } - } while ((cl = cl->borrow) != NULL && toplevel > cl->level); - } -} - -static int -cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - int ret; - struct cbq_class *cl = cbq_classify(skb, sch, &ret); - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = cl; -#endif - if (cl == NULL) { - if (ret & __NET_XMIT_BYPASS) - qdisc_qstats_drop(sch); - __qdisc_drop(skb, to_free); - return ret; - } - - ret = qdisc_enqueue(skb, cl->q, to_free); - if (ret == NET_XMIT_SUCCESS) { - sch->q.qlen++; - cbq_mark_toplevel(q, cl); - if (!cl->next_alive) - cbq_activate_class(cl); - return ret; - } - - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - cbq_mark_toplevel(q, cl); - cl->qstats.drops++; - } - return ret; -} - -/* Overlimit action: penalize leaf class by adding offtime */ -static void cbq_overlimit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - psched_tdiff_t delay = cl->undertime - q->now; - - if (!cl->delayed) { - delay += cl->offtime; - - /* - * Class goes to sleep, so that it will have no - * chance to work avgidle. Let's forgive it 8) - * - * BTW cbq-2.0 has a crap in this - * place, apparently they forgot to shift it by cl->ewma_log. - */ - if (cl->avgidle < 0) - delay -= (-cl->avgidle) - ((-cl->avgidle) >> cl->ewma_log); - if (cl->avgidle < cl->minidle) - cl->avgidle = cl->minidle; - if (delay <= 0) - delay = 1; - cl->undertime = q->now + delay; - - cl->xstats.overactions++; - cl->delayed = 1; - } - if (q->wd_expires == 0 || q->wd_expires > delay) - q->wd_expires = delay; - - /* Dirty work! We must schedule wakeups based on - * real available rate, rather than leaf rate, - * which may be tiny (even zero). - */ - if (q->toplevel == TC_CBQ_MAXLEVEL) { - struct cbq_class *b; - psched_tdiff_t base_delay = q->wd_expires; - - for (b = cl->borrow; b; b = b->borrow) { - delay = b->undertime - q->now; - if (delay < base_delay) { - if (delay <= 0) - delay = 1; - base_delay = delay; - } - } - - q->wd_expires = base_delay; - } -} - -static psched_tdiff_t cbq_undelay_prio(struct cbq_sched_data *q, int prio, - psched_time_t now) -{ - struct cbq_class *cl; - struct cbq_class *cl_prev = q->active[prio]; - psched_time_t sched = now; - - if (cl_prev == NULL) - return 0; - - do { - cl = cl_prev->next_alive; - if (now - cl->penalized > 0) { - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - cl->cpriority = cl->priority; - cl->delayed = 0; - cbq_activate_class(cl); - - if (cl == q->active[prio]) { - q->active[prio] = cl_prev; - if (cl == q->active[prio]) { - q->active[prio] = NULL; - return 0; - } - } - - cl = cl_prev->next_alive; - } else if (sched - cl->penalized > 0) - sched = cl->penalized; - } while ((cl_prev = cl) != q->active[prio]); - - return sched - now; -} - -static enum hrtimer_restart cbq_undelay(struct hrtimer *timer) -{ - struct cbq_sched_data *q = container_of(timer, struct cbq_sched_data, - delay_timer); - struct Qdisc *sch = q->watchdog.qdisc; - psched_time_t now; - psched_tdiff_t delay = 0; - unsigned int pmask; - - now = psched_get_time(); - - pmask = q->pmask; - q->pmask = 0; - - while (pmask) { - int prio = ffz(~pmask); - psched_tdiff_t tmp; - - pmask &= ~(1<<prio); - - tmp = cbq_undelay_prio(q, prio, now); - if (tmp > 0) { - q->pmask |= 1<<prio; - if (tmp < delay || delay == 0) - delay = tmp; - } - } - - if (delay) { - ktime_t time; - - time = 0; - time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay)); - hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED); - } - - __netif_schedule(qdisc_root(sch)); - return HRTIMER_NORESTART; -} - -/* - * It is mission critical procedure. - * - * We "regenerate" toplevel cutoff, if transmitting class - * has backlog and it is not regulated. It is not part of - * original CBQ description, but looks more reasonable. - * Probably, it is wrong. This question needs further investigation. - */ - -static inline void -cbq_update_toplevel(struct cbq_sched_data *q, struct cbq_class *cl, - struct cbq_class *borrowed) -{ - if (cl && q->toplevel >= borrowed->level) { - if (cl->q->q.qlen > 1) { - do { - if (borrowed->undertime == PSCHED_PASTPERFECT) { - q->toplevel = borrowed->level; - return; - } - } while ((borrowed = borrowed->borrow) != NULL); - } -#if 0 - /* It is not necessary now. Uncommenting it - will save CPU cycles, but decrease fairness. - */ - q->toplevel = TC_CBQ_MAXLEVEL; -#endif - } -} - -static void -cbq_update(struct cbq_sched_data *q) -{ - struct cbq_class *this = q->tx_class; - struct cbq_class *cl = this; - int len = q->tx_len; - psched_time_t now; - - q->tx_class = NULL; - /* Time integrator. We calculate EOS time - * by adding expected packet transmission time. - */ - now = q->now + L2T(&q->link, len); - - for ( ; cl; cl = cl->share) { - long avgidle = cl->avgidle; - long idle; - - cl->bstats.packets++; - cl->bstats.bytes += len; - - /* - * (now - last) is total time between packet right edges. - * (last_pktlen/rate) is "virtual" busy time, so that - * - * idle = (now - last) - last_pktlen/rate - */ - - idle = now - cl->last; - if ((unsigned long)idle > 128*1024*1024) { - avgidle = cl->maxidle; - } else { - idle -= L2T(cl, len); - - /* true_avgidle := (1-W)*true_avgidle + W*idle, - * where W=2^{-ewma_log}. But cl->avgidle is scaled: - * cl->avgidle == true_avgidle/W, - * hence: - */ - avgidle += idle - (avgidle>>cl->ewma_log); - } - - if (avgidle <= 0) { - /* Overlimit or at-limit */ - - if (avgidle < cl->minidle) - avgidle = cl->minidle; - - cl->avgidle = avgidle; - - /* Calculate expected time, when this class - * will be allowed to send. - * It will occur, when: - * (1-W)*true_avgidle + W*delay = 0, i.e. - * idle = (1/W - 1)*(-true_avgidle) - * or - * idle = (1 - W)*(-cl->avgidle); - */ - idle = (-avgidle) - ((-avgidle) >> cl->ewma_log); - - /* - * That is not all. - * To maintain the rate allocated to the class, - * we add to undertime virtual clock, - * necessary to complete transmitted packet. - * (len/phys_bandwidth has been already passed - * to the moment of cbq_update) - */ - - idle -= L2T(&q->link, len); - idle += L2T(cl, len); - - cl->undertime = now + idle; - } else { - /* Underlimit */ - - cl->undertime = PSCHED_PASTPERFECT; - if (avgidle > cl->maxidle) - cl->avgidle = cl->maxidle; - else - cl->avgidle = avgidle; - } - if ((s64)(now - cl->last) > 0) - cl->last = now; - } - - cbq_update_toplevel(q, this, q->tx_borrowed); -} - -static inline struct cbq_class * -cbq_under_limit(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *this_cl = cl; - - if (cl->tparent == NULL) - return cl; - - if (cl->undertime == PSCHED_PASTPERFECT || q->now >= cl->undertime) { - cl->delayed = 0; - return cl; - } - - do { - /* It is very suspicious place. Now overlimit - * action is generated for not bounded classes - * only if link is completely congested. - * Though it is in agree with ancestor-only paradigm, - * it looks very stupid. Particularly, - * it means that this chunk of code will either - * never be called or result in strong amplification - * of burstiness. Dangerous, silly, and, however, - * no another solution exists. - */ - cl = cl->borrow; - if (!cl) { - this_cl->qstats.overlimits++; - cbq_overlimit(this_cl); - return NULL; - } - if (cl->level > q->toplevel) - return NULL; - } while (cl->undertime != PSCHED_PASTPERFECT && q->now < cl->undertime); - - cl->delayed = 0; - return cl; -} - -static inline struct sk_buff * -cbq_dequeue_prio(struct Qdisc *sch, int prio) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl_tail, *cl_prev, *cl; - struct sk_buff *skb; - int deficit; - - cl_tail = cl_prev = q->active[prio]; - cl = cl_prev->next_alive; - - do { - deficit = 0; - - /* Start round */ - do { - struct cbq_class *borrow = cl; - - if (cl->q->q.qlen && - (borrow = cbq_under_limit(cl)) == NULL) - goto skip_class; - - if (cl->deficit <= 0) { - /* Class exhausted its allotment per - * this round. Switch to the next one. - */ - deficit = 1; - cl->deficit += cl->quantum; - goto next_class; - } - - skb = cl->q->dequeue(cl->q); - - /* Class did not give us any skb :-( - * It could occur even if cl->q->q.qlen != 0 - * f.e. if cl->q == "tbf" - */ - if (skb == NULL) - goto skip_class; - - cl->deficit -= qdisc_pkt_len(skb); - q->tx_class = cl; - q->tx_borrowed = borrow; - if (borrow != cl) { -#ifndef CBQ_XSTATS_BORROWS_BYTES - borrow->xstats.borrows++; - cl->xstats.borrows++; -#else - borrow->xstats.borrows += qdisc_pkt_len(skb); - cl->xstats.borrows += qdisc_pkt_len(skb); -#endif - } - q->tx_len = qdisc_pkt_len(skb); - - if (cl->deficit <= 0) { - q->active[prio] = cl; - cl = cl->next_alive; - cl->deficit += cl->quantum; - } - return skb; - -skip_class: - if (cl->q->q.qlen == 0 || prio != cl->cpriority) { - /* Class is empty or penalized. - * Unlink it from active chain. - */ - cl_prev->next_alive = cl->next_alive; - cl->next_alive = NULL; - - /* Did cl_tail point to it? */ - if (cl == cl_tail) { - /* Repair it! */ - cl_tail = cl_prev; - - /* Was it the last class in this band? */ - if (cl == cl_tail) { - /* Kill the band! */ - q->active[prio] = NULL; - q->activemask &= ~(1<<prio); - if (cl->q->q.qlen) - cbq_activate_class(cl); - return NULL; - } - - q->active[prio] = cl_tail; - } - if (cl->q->q.qlen) - cbq_activate_class(cl); - - cl = cl_prev; - } - -next_class: - cl_prev = cl; - cl = cl->next_alive; - } while (cl_prev != cl_tail); - } while (deficit); - - q->active[prio] = cl_prev; - - return NULL; -} - -static inline struct sk_buff * -cbq_dequeue_1(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct sk_buff *skb; - unsigned int activemask; - - activemask = q->activemask & 0xFF; - while (activemask) { - int prio = ffz(~activemask); - activemask &= ~(1<<prio); - skb = cbq_dequeue_prio(sch, prio); - if (skb) - return skb; - } - return NULL; -} - -static struct sk_buff * -cbq_dequeue(struct Qdisc *sch) -{ - struct sk_buff *skb; - struct cbq_sched_data *q = qdisc_priv(sch); - psched_time_t now; - - now = psched_get_time(); - - if (q->tx_class) - cbq_update(q); - - q->now = now; - - for (;;) { - q->wd_expires = 0; - - skb = cbq_dequeue_1(sch); - if (skb) { - qdisc_bstats_update(sch, skb); - sch->q.qlen--; - return skb; - } - - /* All the classes are overlimit. - * - * It is possible, if: - * - * 1. Scheduler is empty. - * 2. Toplevel cutoff inhibited borrowing. - * 3. Root class is overlimit. - * - * Reset 2d and 3d conditions and retry. - * - * Note, that NS and cbq-2.0 are buggy, peeking - * an arbitrary class is appropriate for ancestor-only - * sharing, but not for toplevel algorithm. - * - * Our version is better, but slower, because it requires - * two passes, but it is unavoidable with top-level sharing. - */ - - if (q->toplevel == TC_CBQ_MAXLEVEL && - q->link.undertime == PSCHED_PASTPERFECT) - break; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->link.undertime = PSCHED_PASTPERFECT; - } - - /* No packets in scheduler or nobody wants to give them to us :-( - * Sigh... start watchdog timer in the last case. - */ - - if (sch->q.qlen) { - qdisc_qstats_overlimit(sch); - if (q->wd_expires) - qdisc_watchdog_schedule(&q->watchdog, - now + q->wd_expires); - } - return NULL; -} - -/* CBQ class maintanance routines */ - -static void cbq_adjust_levels(struct cbq_class *this) -{ - if (this == NULL) - return; - - do { - int level = 0; - struct cbq_class *cl; - - cl = this->children; - if (cl) { - do { - if (cl->level > level) - level = cl->level; - } while ((cl = cl->sibling) != this->children); - } - this->level = level + 1; - } while ((this = this->tparent) != NULL); -} - -static void cbq_normalize_quanta(struct cbq_sched_data *q, int prio) -{ - struct cbq_class *cl; - unsigned int h; - - if (q->quanta[prio] == 0) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - /* BUGGGG... Beware! This expression suffer of - * arithmetic overflows! - */ - if (cl->priority == prio) { - cl->quantum = (cl->weight*cl->allot*q->nclasses[prio])/ - q->quanta[prio]; - } - if (cl->quantum <= 0 || - cl->quantum > 32*qdisc_dev(cl->qdisc)->mtu) { - pr_warn("CBQ: class %08x has bad quantum==%ld, repaired.\n", - cl->common.classid, cl->quantum); - cl->quantum = qdisc_dev(cl->qdisc)->mtu/2 + 1; - } - } - } -} - -static void cbq_sync_defmap(struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - struct cbq_class *split = cl->split; - unsigned int h; - int i; - - if (split == NULL) - return; - - for (i = 0; i <= TC_PRIO_MAX; i++) { - if (split->defaults[i] == cl && !(cl->defmap & (1<<i))) - split->defaults[i] = NULL; - } - - for (i = 0; i <= TC_PRIO_MAX; i++) { - int level = split->level; - - if (split->defaults[i]) - continue; - - for (h = 0; h < q->clhash.hashsize; h++) { - struct cbq_class *c; - - hlist_for_each_entry(c, &q->clhash.hash[h], - common.hnode) { - if (c->split == split && c->level < level && - c->defmap & (1<<i)) { - split->defaults[i] = c; - level = c->level; - } - } - } - } -} - -static void cbq_change_defmap(struct cbq_class *cl, u32 splitid, u32 def, u32 mask) -{ - struct cbq_class *split = NULL; - - if (splitid == 0) { - split = cl->split; - if (!split) - return; - splitid = split->common.classid; - } - - if (split == NULL || split->common.classid != splitid) { - for (split = cl->tparent; split; split = split->tparent) - if (split->common.classid == splitid) - break; - } - - if (split == NULL) - return; - - if (cl->split != split) { - cl->defmap = 0; - cbq_sync_defmap(cl); - cl->split = split; - cl->defmap = def & mask; - } else - cl->defmap = (cl->defmap & ~mask) | (def & mask); - - cbq_sync_defmap(cl); -} - -static void cbq_unlink_class(struct cbq_class *this) -{ - struct cbq_class *cl, **clp; - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - - qdisc_class_hash_remove(&q->clhash, &this->common); - - if (this->tparent) { - clp = &this->sibling; - cl = *clp; - do { - if (cl == this) { - *clp = cl->sibling; - break; - } - clp = &cl->sibling; - } while ((cl = *clp) != this->sibling); - - if (this->tparent->children == this) { - this->tparent->children = this->sibling; - if (this->sibling == this) - this->tparent->children = NULL; - } - } else { - WARN_ON(this->sibling != this); - } -} - -static void cbq_link_class(struct cbq_class *this) -{ - struct cbq_sched_data *q = qdisc_priv(this->qdisc); - struct cbq_class *parent = this->tparent; - - this->sibling = this; - qdisc_class_hash_insert(&q->clhash, &this->common); - - if (parent == NULL) - return; - - if (parent->children == NULL) { - parent->children = this; - } else { - this->sibling = parent->children->sibling; - parent->children->sibling = this; - } -} - -static void -cbq_reset(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - int prio; - unsigned int h; - - q->activemask = 0; - q->pmask = 0; - q->tx_class = NULL; - q->tx_borrowed = NULL; - qdisc_watchdog_cancel(&q->watchdog); - hrtimer_cancel(&q->delay_timer); - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - for (prio = 0; prio <= TC_CBQ_MAXPRIO; prio++) - q->active[prio] = NULL; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - qdisc_reset(cl->q); - - cl->next_alive = NULL; - cl->undertime = PSCHED_PASTPERFECT; - cl->avgidle = cl->maxidle; - cl->deficit = cl->quantum; - cl->cpriority = cl->priority; - } - } - sch->q.qlen = 0; -} - - -static int cbq_set_lss(struct cbq_class *cl, struct tc_cbq_lssopt *lss) -{ - if (lss->change & TCF_CBQ_LSS_FLAGS) { - cl->share = (lss->flags & TCF_CBQ_LSS_ISOLATED) ? NULL : cl->tparent; - cl->borrow = (lss->flags & TCF_CBQ_LSS_BOUNDED) ? NULL : cl->tparent; - } - if (lss->change & TCF_CBQ_LSS_EWMA) - cl->ewma_log = lss->ewma_log; - if (lss->change & TCF_CBQ_LSS_AVPKT) - cl->avpkt = lss->avpkt; - if (lss->change & TCF_CBQ_LSS_MINIDLE) - cl->minidle = -(long)lss->minidle; - if (lss->change & TCF_CBQ_LSS_MAXIDLE) { - cl->maxidle = lss->maxidle; - cl->avgidle = lss->maxidle; - } - if (lss->change & TCF_CBQ_LSS_OFFTIME) - cl->offtime = lss->offtime; - return 0; -} - -static void cbq_rmprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]--; - q->quanta[cl->priority] -= cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static void cbq_addprio(struct cbq_sched_data *q, struct cbq_class *cl) -{ - q->nclasses[cl->priority]++; - q->quanta[cl->priority] += cl->weight; - cbq_normalize_quanta(q, cl->priority); -} - -static int cbq_set_wrr(struct cbq_class *cl, struct tc_cbq_wrropt *wrr) -{ - struct cbq_sched_data *q = qdisc_priv(cl->qdisc); - - if (wrr->allot) - cl->allot = wrr->allot; - if (wrr->weight) - cl->weight = wrr->weight; - if (wrr->priority) { - cl->priority = wrr->priority - 1; - cl->cpriority = cl->priority; - if (cl->priority >= cl->priority2) - cl->priority2 = TC_CBQ_MAXPRIO - 1; - } - - cbq_addprio(q, cl); - return 0; -} - -static int cbq_set_fopt(struct cbq_class *cl, struct tc_cbq_fopt *fopt) -{ - cbq_change_defmap(cl, fopt->split, fopt->defmap, fopt->defchange); - return 0; -} - -static const struct nla_policy cbq_policy[TCA_CBQ_MAX + 1] = { - [TCA_CBQ_LSSOPT] = { .len = sizeof(struct tc_cbq_lssopt) }, - [TCA_CBQ_WRROPT] = { .len = sizeof(struct tc_cbq_wrropt) }, - [TCA_CBQ_FOPT] = { .len = sizeof(struct tc_cbq_fopt) }, - [TCA_CBQ_OVL_STRATEGY] = { .len = sizeof(struct tc_cbq_ovl) }, - [TCA_CBQ_RATE] = { .len = sizeof(struct tc_ratespec) }, - [TCA_CBQ_RTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, - [TCA_CBQ_POLICE] = { .len = sizeof(struct tc_cbq_police) }, -}; - -static int cbq_opt_parse(struct nlattr *tb[TCA_CBQ_MAX + 1], - struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - int err; - - if (!opt) { - NL_SET_ERR_MSG(extack, "CBQ options are required for this operation"); - return -EINVAL; - } - - err = nla_parse_nested_deprecated(tb, TCA_CBQ_MAX, opt, - cbq_policy, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_WRROPT]) { - const struct tc_cbq_wrropt *wrr = nla_data(tb[TCA_CBQ_WRROPT]); - - if (wrr->priority > TC_CBQ_MAXPRIO) { - NL_SET_ERR_MSG(extack, "priority is bigger than TC_CBQ_MAXPRIO"); - err = -EINVAL; - } - } - return err; -} - -static int cbq_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct tc_ratespec *r; - int err; - - qdisc_watchdog_init(&q->watchdog, sch); - hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); - q->delay_timer.function = cbq_undelay; - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (!tb[TCA_CBQ_RTAB] || !tb[TCA_CBQ_RATE]) { - NL_SET_ERR_MSG(extack, "Rate specification missing or incomplete"); - return -EINVAL; - } - - r = nla_data(tb[TCA_CBQ_RATE]); - - q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB], extack); - if (!q->link.R_tab) - return -EINVAL; - - err = tcf_block_get(&q->link.block, &q->link.filter_list, sch, extack); - if (err) - goto put_rtab; - - err = qdisc_class_hash_init(&q->clhash); - if (err < 0) - goto put_block; - - q->link.sibling = &q->link; - q->link.common.classid = sch->handle; - q->link.qdisc = sch; - q->link.q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (!q->link.q) - q->link.q = &noop_qdisc; - else - qdisc_hash_add(q->link.q, true); - - q->link.priority = TC_CBQ_MAXPRIO - 1; - q->link.priority2 = TC_CBQ_MAXPRIO - 1; - q->link.cpriority = TC_CBQ_MAXPRIO - 1; - q->link.allot = psched_mtu(qdisc_dev(sch)); - q->link.quantum = q->link.allot; - q->link.weight = q->link.R_tab->rate.rate; - - q->link.ewma_log = TC_CBQ_DEF_EWMA; - q->link.avpkt = q->link.allot/2; - q->link.minidle = -0x7FFFFFFF; - - q->toplevel = TC_CBQ_MAXLEVEL; - q->now = psched_get_time(); - - cbq_link_class(&q->link); - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(&q->link, nla_data(tb[TCA_CBQ_LSSOPT])); - - cbq_addprio(q, &q->link); - return 0; - -put_block: - tcf_block_put(q->link.block); - -put_rtab: - qdisc_put_rtab(q->link.R_tab); - return err; -} - -static int cbq_dump_rate(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - - if (nla_put(skb, TCA_CBQ_RATE, sizeof(cl->R_tab->rate), &cl->R_tab->rate)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_lss(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_lssopt opt; - - opt.flags = 0; - if (cl->borrow == NULL) - opt.flags |= TCF_CBQ_LSS_BOUNDED; - if (cl->share == NULL) - opt.flags |= TCF_CBQ_LSS_ISOLATED; - opt.ewma_log = cl->ewma_log; - opt.level = cl->level; - opt.avpkt = cl->avpkt; - opt.maxidle = cl->maxidle; - opt.minidle = (u32)(-cl->minidle); - opt.offtime = cl->offtime; - opt.change = ~0; - if (nla_put(skb, TCA_CBQ_LSSOPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_wrr(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_wrropt opt; - - memset(&opt, 0, sizeof(opt)); - opt.flags = 0; - opt.allot = cl->allot; - opt.priority = cl->priority + 1; - opt.cpriority = cl->cpriority + 1; - opt.weight = cl->weight; - if (nla_put(skb, TCA_CBQ_WRROPT, sizeof(opt), &opt)) - goto nla_put_failure; - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_fopt(struct sk_buff *skb, struct cbq_class *cl) -{ - unsigned char *b = skb_tail_pointer(skb); - struct tc_cbq_fopt opt; - - if (cl->split || cl->defmap) { - opt.split = cl->split ? cl->split->common.classid : 0; - opt.defmap = cl->defmap; - opt.defchange = ~0; - if (nla_put(skb, TCA_CBQ_FOPT, sizeof(opt), &opt)) - goto nla_put_failure; - } - return skb->len; - -nla_put_failure: - nlmsg_trim(skb, b); - return -1; -} - -static int cbq_dump_attr(struct sk_buff *skb, struct cbq_class *cl) -{ - if (cbq_dump_lss(skb, cl) < 0 || - cbq_dump_rate(skb, cl) < 0 || - cbq_dump_wrr(skb, cl) < 0 || - cbq_dump_fopt(skb, cl) < 0) - return -1; - return 0; -} - -static int cbq_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct nlattr *nest; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, &q->link) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_stats(struct Qdisc *sch, struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - q->link.xstats.avgidle = q->link.avgidle; - return gnet_stats_copy_app(d, &q->link.xstats, sizeof(q->link.xstats)); -} - -static int -cbq_dump_class(struct Qdisc *sch, unsigned long arg, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - struct nlattr *nest; - - if (cl->tparent) - tcm->tcm_parent = cl->tparent->common.classid; - else - tcm->tcm_parent = TC_H_ROOT; - tcm->tcm_handle = cl->common.classid; - tcm->tcm_info = cl->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - if (cbq_dump_attr(skb, cl) < 0) - goto nla_put_failure; - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} - -static int -cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - __u32 qlen; - - cl->xstats.avgidle = cl->avgidle; - cl->xstats.undertime = 0; - qdisc_qstats_qlen_backlog(cl->q, &qlen, &cl->qstats.backlog); - - if (cl->undertime != PSCHED_PASTPERFECT) - cl->xstats.undertime = cl->undertime - q->now; - - if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), - d, NULL, &cl->bstats) < 0 || - gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 || - gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0) - return -1; - - return gnet_stats_copy_app(d, &cl->xstats, sizeof(cl->xstats)); -} - -static int cbq_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, - struct Qdisc **old, struct netlink_ext_ack *extack) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - cl->common.classid, extack); - if (new == NULL) - return -ENOBUFS; - } - - *old = qdisc_replace(sch, new, &cl->q); - return 0; -} - -static struct Qdisc *cbq_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - return cl->q; -} - -static void cbq_qlen_notify(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cbq_deactivate_class(cl); -} - -static unsigned long cbq_find(struct Qdisc *sch, u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - return (unsigned long)cbq_class_lookup(q, classid); -} - -static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - - WARN_ON(cl->filters); - - tcf_block_put(cl->block); - qdisc_put(cl->q); - qdisc_put_rtab(cl->R_tab); - gen_kill_estimator(&cl->rate_est); - if (cl != &q->link) - kfree(cl); -} - -static void cbq_destroy(struct Qdisc *sch) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct hlist_node *next; - struct cbq_class *cl; - unsigned int h; - -#ifdef CONFIG_NET_CLS_ACT - q->rx_class = NULL; -#endif - /* - * Filters must be destroyed first because we don't destroy the - * classes from root to leafs which means that filters can still - * be bound to classes which have been destroyed already. --TGR '04 - */ - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - tcf_block_put(cl->block); - cl->block = NULL; - } - } - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry_safe(cl, next, &q->clhash.hash[h], - common.hnode) - cbq_destroy_class(sch, cl); - } - qdisc_class_hash_destroy(&q->clhash); -} - -static int -cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **tca, - unsigned long *arg, struct netlink_ext_ack *extack) -{ - int err; - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)*arg; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_CBQ_MAX + 1]; - struct cbq_class *parent; - struct qdisc_rate_table *rtab = NULL; - - err = cbq_opt_parse(tb, opt, extack); - if (err < 0) - return err; - - if (tb[TCA_CBQ_OVL_STRATEGY] || tb[TCA_CBQ_POLICE]) { - NL_SET_ERR_MSG(extack, "Neither overlimit strategy nor policing attributes can be used for changing class params"); - return -EOPNOTSUPP; - } - - if (cl) { - /* Check parent */ - if (parentid) { - if (cl->tparent && - cl->tparent->common.classid != parentid) { - NL_SET_ERR_MSG(extack, "Invalid parent id"); - return -EINVAL; - } - if (!cl->tparent && parentid != TC_H_ROOT) { - NL_SET_ERR_MSG(extack, "Parent must be root"); - return -EINVAL; - } - } - - if (tb[TCA_CBQ_RATE]) { - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), - tb[TCA_CBQ_RTAB], extack); - if (rtab == NULL) - return -EINVAL; - } - - if (tca[TCA_RATE]) { - err = gen_replace_estimator(&cl->bstats, NULL, - &cl->rate_est, - NULL, - qdisc_root_sleeping_running(sch), - tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator"); - qdisc_put_rtab(rtab); - return err; - } - } - - /* Change class parameters */ - sch_tree_lock(sch); - - if (cl->next_alive != NULL) - cbq_deactivate_class(cl); - - if (rtab) { - qdisc_put_rtab(cl->R_tab); - cl->R_tab = rtab; - } - - if (tb[TCA_CBQ_LSSOPT]) - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - - if (tb[TCA_CBQ_WRROPT]) { - cbq_rmprio(q, cl); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - } - - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - - if (cl->q->q.qlen) - cbq_activate_class(cl); - - sch_tree_unlock(sch); - - return 0; - } - - if (parentid == TC_H_ROOT) - return -EINVAL; - - if (!tb[TCA_CBQ_WRROPT] || !tb[TCA_CBQ_RATE] || !tb[TCA_CBQ_LSSOPT]) { - NL_SET_ERR_MSG(extack, "One of the following attributes MUST be specified: WRR, rate or link sharing"); - return -EINVAL; - } - - rtab = qdisc_get_rtab(nla_data(tb[TCA_CBQ_RATE]), tb[TCA_CBQ_RTAB], - extack); - if (rtab == NULL) - return -EINVAL; - - if (classid) { - err = -EINVAL; - if (TC_H_MAJ(classid ^ sch->handle) || - cbq_class_lookup(q, classid)) { - NL_SET_ERR_MSG(extack, "Specified class not found"); - goto failure; - } - } else { - int i; - classid = TC_H_MAKE(sch->handle, 0x8000); - - for (i = 0; i < 0x8000; i++) { - if (++q->hgenerator >= 0x8000) - q->hgenerator = 1; - if (cbq_class_lookup(q, classid|q->hgenerator) == NULL) - break; - } - err = -ENOSR; - if (i >= 0x8000) { - NL_SET_ERR_MSG(extack, "Unable to generate classid"); - goto failure; - } - classid = classid|q->hgenerator; - } - - parent = &q->link; - if (parentid) { - parent = cbq_class_lookup(q, parentid); - err = -EINVAL; - if (!parent) { - NL_SET_ERR_MSG(extack, "Failed to find parentid"); - goto failure; - } - } - - err = -ENOBUFS; - cl = kzalloc(sizeof(*cl), GFP_KERNEL); - if (cl == NULL) - goto failure; - - err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack); - if (err) { - kfree(cl); - goto failure; - } - - if (tca[TCA_RATE]) { - err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est, - NULL, - qdisc_root_sleeping_running(sch), - tca[TCA_RATE]); - if (err) { - NL_SET_ERR_MSG(extack, "Couldn't create new estimator"); - tcf_block_put(cl->block); - kfree(cl); - goto failure; - } - } - - cl->R_tab = rtab; - rtab = NULL; - cl->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - NULL); - if (!cl->q) - cl->q = &noop_qdisc; - else - qdisc_hash_add(cl->q, true); - - cl->common.classid = classid; - cl->tparent = parent; - cl->qdisc = sch; - cl->allot = parent->allot; - cl->quantum = cl->allot; - cl->weight = cl->R_tab->rate.rate; - - sch_tree_lock(sch); - cbq_link_class(cl); - cl->borrow = cl->tparent; - if (cl->tparent != &q->link) - cl->share = cl->tparent; - cbq_adjust_levels(parent); - cl->minidle = -0x7FFFFFFF; - cbq_set_lss(cl, nla_data(tb[TCA_CBQ_LSSOPT])); - cbq_set_wrr(cl, nla_data(tb[TCA_CBQ_WRROPT])); - if (cl->ewma_log == 0) - cl->ewma_log = q->link.ewma_log; - if (cl->maxidle == 0) - cl->maxidle = q->link.maxidle; - if (cl->avpkt == 0) - cl->avpkt = q->link.avpkt; - if (tb[TCA_CBQ_FOPT]) - cbq_set_fopt(cl, nla_data(tb[TCA_CBQ_FOPT])); - sch_tree_unlock(sch); - - qdisc_class_hash_grow(sch, &q->clhash); - - *arg = (unsigned long)cl; - return 0; - -failure: - qdisc_put_rtab(rtab); - return err; -} - -static int cbq_delete(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl->filters || cl->children || cl == &q->link) - return -EBUSY; - - sch_tree_lock(sch); - - qdisc_purge_queue(cl->q); - - if (cl->next_alive) - cbq_deactivate_class(cl); - - if (q->tx_borrowed == cl) - q->tx_borrowed = q->tx_class; - if (q->tx_class == cl) { - q->tx_class = NULL; - q->tx_borrowed = NULL; - } -#ifdef CONFIG_NET_CLS_ACT - if (q->rx_class == cl) - q->rx_class = NULL; -#endif - - cbq_unlink_class(cl); - cbq_adjust_levels(cl->tparent); - cl->defmap = 0; - cbq_sync_defmap(cl); - - cbq_rmprio(q, cl); - sch_tree_unlock(sch); - - cbq_destroy_class(sch, cl); - return 0; -} - -static struct tcf_block *cbq_tcf_block(struct Qdisc *sch, unsigned long arg, - struct netlink_ext_ack *extack) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl = (struct cbq_class *)arg; - - if (cl == NULL) - cl = &q->link; - - return cl->block; -} - -static unsigned long cbq_bind_filter(struct Qdisc *sch, unsigned long parent, - u32 classid) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *p = (struct cbq_class *)parent; - struct cbq_class *cl = cbq_class_lookup(q, classid); - - if (cl) { - if (p && p->level <= cl->level) - return 0; - cl->filters++; - return (unsigned long)cl; - } - return 0; -} - -static void cbq_unbind_filter(struct Qdisc *sch, unsigned long arg) -{ - struct cbq_class *cl = (struct cbq_class *)arg; - - cl->filters--; -} - -static void cbq_walk(struct Qdisc *sch, struct qdisc_walker *arg) -{ - struct cbq_sched_data *q = qdisc_priv(sch); - struct cbq_class *cl; - unsigned int h; - - if (arg->stop) - return; - - for (h = 0; h < q->clhash.hashsize; h++) { - hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) { - if (arg->count < arg->skip) { - arg->count++; - continue; - } - if (arg->fn(sch, (unsigned long)cl, arg) < 0) { - arg->stop = 1; - return; - } - arg->count++; - } - } -} - -static const struct Qdisc_class_ops cbq_class_ops = { - .graft = cbq_graft, - .leaf = cbq_leaf, - .qlen_notify = cbq_qlen_notify, - .find = cbq_find, - .change = cbq_change_class, - .delete = cbq_delete, - .walk = cbq_walk, - .tcf_block = cbq_tcf_block, - .bind_tcf = cbq_bind_filter, - .unbind_tcf = cbq_unbind_filter, - .dump = cbq_dump_class, - .dump_stats = cbq_dump_class_stats, -}; - -static struct Qdisc_ops cbq_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &cbq_class_ops, - .id = "cbq", - .priv_size = sizeof(struct cbq_sched_data), - .enqueue = cbq_enqueue, - .dequeue = cbq_dequeue, - .peek = qdisc_peek_dequeued, - .init = cbq_init, - .reset = cbq_reset, - .destroy = cbq_destroy, - .change = NULL, - .dump = cbq_dump, - .dump_stats = cbq_dump_stats, - .owner = THIS_MODULE, -}; - -static int __init cbq_module_init(void) -{ - return register_qdisc(&cbq_qdisc_ops); -} -static void __exit cbq_module_exit(void) -{ - unregister_qdisc(&cbq_qdisc_ops); -} -module_init(cbq_module_init) -module_exit(cbq_module_exit) -MODULE_LICENSE("GPL");
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit fb38306ceb9e770adfb5ffa6e3c64047b55f7a07 upstream.
The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 14 - net/sched/Makefile | 1 net/sched/sch_atm.c | 710 ---------------------------------------------------- 3 files changed, 725 deletions(-) delete mode 100644 net/sched/sch_atm.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/atm.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -68,20 +68,6 @@ config NET_SCH_HFSC To compile this code as a module, choose M here: the module will be called sch_hfsc.
-config NET_SCH_ATM - tristate "ATM Virtual Circuits (ATM)" - depends on ATM - ---help--- - Say Y here if you want to use the ATM pseudo-scheduler. This - provides a framework for invoking classifiers, which in turn - select classes of this queuing discipline. Each class maps - the flow(s) it is handling to a given virtual circuit. - - See the top of file:net/sched/sch_atm.c for more details. - - To compile this code as a module, choose M here: the - module will be called sch_atm. - config NET_SCH_PRIO tristate "Multi Band Priority Queueing (PRIO)" ---help--- --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -43,7 +43,6 @@ obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o obj-$(CONFIG_NET_SCH_TEQL) += sch_teql.o obj-$(CONFIG_NET_SCH_PRIO) += sch_prio.o obj-$(CONFIG_NET_SCH_MULTIQ) += sch_multiq.o -obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o obj-$(CONFIG_NET_SCH_NETEM) += sch_netem.o obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o --- a/net/sched/sch_atm.c +++ /dev/null @@ -1,710 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_atm.c - ATM VC selection "queueing discipline" */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - -#include <linux/module.h> -#include <linux/slab.h> -#include <linux/init.h> -#include <linux/interrupt.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <linux/atmdev.h> -#include <linux/atmclip.h> -#include <linux/rtnetlink.h> -#include <linux/file.h> /* for fput */ -#include <net/netlink.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> - -/* - * The ATM queuing discipline provides a framework for invoking classifiers - * (aka "filters"), which in turn select classes of this queuing discipline. - * Each class maps the flow(s) it is handling to a given VC. Multiple classes - * may share the same VC. - * - * When creating a class, VCs are specified by passing the number of the open - * socket descriptor by which the calling process references the VC. The kernel - * keeps the VC open at least until all classes using it are removed. - * - * In this file, most functions are named atm_tc_* to avoid confusion with all - * the atm_* in net/atm. This naming convention differs from what's used in the - * rest of net/sched. - * - * Known bugs: - * - sometimes messes up the IP stack - * - any manipulations besides the few operations described in the README, are - * untested and likely to crash the system - * - should lock the flow while there is data in the queue (?) - */ - -#define VCC2FLOW(vcc) ((struct atm_flow_data *) ((vcc)->user_back)) - -struct atm_flow_data { - struct Qdisc_class_common common; - struct Qdisc *q; /* FIFO, TBF, etc. */ - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct atm_vcc *vcc; /* VCC; NULL if VCC is closed */ - void (*old_pop)(struct atm_vcc *vcc, - struct sk_buff *skb); /* chaining */ - struct atm_qdisc_data *parent; /* parent qdisc */ - struct socket *sock; /* for closing */ - int ref; /* reference count */ - struct gnet_stats_basic_packed bstats; - struct gnet_stats_queue qstats; - struct list_head list; - struct atm_flow_data *excess; /* flow for excess traffic; - NULL to set CLP instead */ - int hdr_len; - unsigned char hdr[0]; /* header data; MUST BE LAST */ -}; - -struct atm_qdisc_data { - struct atm_flow_data link; /* unclassified skbs go here */ - struct list_head flows; /* NB: "link" is also on this - list */ - struct tasklet_struct task; /* dequeue tasklet */ -}; - -/* ------------------------- Class/flow operations ------------------------- */ - -static inline struct atm_flow_data *lookup_flow(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - list_for_each_entry(flow, &p->flows, list) { - if (flow->common.classid == classid) - return flow; - } - return NULL; -} - -static int atm_tc_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_graft(sch %p,[qdisc %p],flow %p,new %p,old %p)\n", - sch, p, flow, new, old); - if (list_empty(&flow->list)) - return -EINVAL; - if (!new) - new = &noop_qdisc; - *old = flow->q; - flow->q = new; - if (*old) - qdisc_reset(*old); - return 0; -} - -static struct Qdisc *atm_tc_leaf(struct Qdisc *sch, unsigned long cl) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_leaf(sch %p,flow %p)\n", sch, flow); - return flow ? flow->q : NULL; -} - -static unsigned long atm_tc_find(struct Qdisc *sch, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -static unsigned long atm_tc_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - struct atm_qdisc_data *p __maybe_unused = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", __func__, sch, p, classid); - flow = lookup_flow(sch, classid); - if (flow) - flow->ref++; - pr_debug("%s: flow %p\n", __func__, flow); - return (unsigned long)flow; -} - -/* - * atm_tc_put handles all destructions, including the ones that are explicitly - * requested (atm_tc_destroy, etc.). The assumption here is that we never drop - * anything that still seems to be in use. - */ -static void atm_tc_put(struct Qdisc *sch, unsigned long cl) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_put(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (--flow->ref) - return; - pr_debug("atm_tc_put: destroying\n"); - list_del_init(&flow->list); - pr_debug("atm_tc_put: qdisc %p\n", flow->q); - qdisc_put(flow->q); - tcf_block_put(flow->block); - if (flow->sock) { - pr_debug("atm_tc_put: f_count %ld\n", - file_count(flow->sock->file)); - flow->vcc->pop = flow->old_pop; - sockfd_put(flow->sock); - } - if (flow->excess) - atm_tc_put(sch, (unsigned long)flow->excess); - if (flow != &p->link) - kfree(flow); - /* - * If flow == &p->link, the qdisc no longer works at this point and - * needs to be removed. (By the caller of atm_tc_put.) - */ -} - -static void sch_atm_pop(struct atm_vcc *vcc, struct sk_buff *skb) -{ - struct atm_qdisc_data *p = VCC2FLOW(vcc)->parent; - - pr_debug("sch_atm_pop(vcc %p,skb %p,[qdisc %p])\n", vcc, skb, p); - VCC2FLOW(vcc)->old_pop(vcc, skb); - tasklet_schedule(&p->task); -} - -static const u8 llc_oui_ip[] = { - 0xaa, /* DSAP: non-ISO */ - 0xaa, /* SSAP: non-ISO */ - 0x03, /* Ctrl: Unnumbered Information Command PDU */ - 0x00, /* OUI: EtherType */ - 0x00, 0x00, - 0x08, 0x00 -}; /* Ethertype IP (0800) */ - -static const struct nla_policy atm_policy[TCA_ATM_MAX + 1] = { - [TCA_ATM_FD] = { .type = NLA_U32 }, - [TCA_ATM_EXCESS] = { .type = NLA_U32 }, -}; - -static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)*arg; - struct atm_flow_data *excess = NULL; - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_ATM_MAX + 1]; - struct socket *sock; - int fd, error, hdr_len; - void *hdr; - - pr_debug("atm_tc_change(sch %p,[qdisc %p],classid %x,parent %x," - "flow %p,opt %p)\n", sch, p, classid, parent, flow, opt); - /* - * The concept of parents doesn't apply for this qdisc. - */ - if (parent && parent != TC_H_ROOT && parent != sch->handle) - return -EINVAL; - /* - * ATM classes cannot be changed. In order to change properties of the - * ATM connection, that socket needs to be modified directly (via the - * native ATM API. In order to send a flow to a different VC, the old - * class needs to be removed and a new one added. (This may be changed - * later.) - */ - if (flow) - return -EBUSY; - if (opt == NULL) - return -EINVAL; - - error = nla_parse_nested_deprecated(tb, TCA_ATM_MAX, opt, atm_policy, - NULL); - if (error < 0) - return error; - - if (!tb[TCA_ATM_FD]) - return -EINVAL; - fd = nla_get_u32(tb[TCA_ATM_FD]); - pr_debug("atm_tc_change: fd %d\n", fd); - if (tb[TCA_ATM_HDR]) { - hdr_len = nla_len(tb[TCA_ATM_HDR]); - hdr = nla_data(tb[TCA_ATM_HDR]); - } else { - hdr_len = RFC1483LLC_LEN; - hdr = NULL; /* default LLC/SNAP for IP */ - } - if (!tb[TCA_ATM_EXCESS]) - excess = NULL; - else { - excess = (struct atm_flow_data *) - atm_tc_find(sch, nla_get_u32(tb[TCA_ATM_EXCESS])); - if (!excess) - return -ENOENT; - } - pr_debug("atm_tc_change: type %d, payload %d, hdr_len %d\n", - opt->nla_type, nla_len(opt), hdr_len); - sock = sockfd_lookup(fd, &error); - if (!sock) - return error; /* f_count++ */ - pr_debug("atm_tc_change: f_count %ld\n", file_count(sock->file)); - if (sock->ops->family != PF_ATMSVC && sock->ops->family != PF_ATMPVC) { - error = -EPROTOTYPE; - goto err_out; - } - /* @@@ should check if the socket is really operational or we'll crash - on vcc->send */ - if (classid) { - if (TC_H_MAJ(classid ^ sch->handle)) { - pr_debug("atm_tc_change: classid mismatch\n"); - error = -EINVAL; - goto err_out; - } - } else { - int i; - unsigned long cl; - - for (i = 1; i < 0x8000; i++) { - classid = TC_H_MAKE(sch->handle, 0x8000 | i); - cl = atm_tc_find(sch, classid); - if (!cl) - break; - } - } - pr_debug("atm_tc_change: new id %x\n", classid); - flow = kzalloc(sizeof(struct atm_flow_data) + hdr_len, GFP_KERNEL); - pr_debug("atm_tc_change: flow %p\n", flow); - if (!flow) { - error = -ENOBUFS; - goto err_out; - } - - error = tcf_block_get(&flow->block, &flow->filter_list, sch, - extack); - if (error) { - kfree(flow); - goto err_out; - } - - flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid, - extack); - if (!flow->q) - flow->q = &noop_qdisc; - pr_debug("atm_tc_change: qdisc %p\n", flow->q); - flow->sock = sock; - flow->vcc = ATM_SD(sock); /* speedup */ - flow->vcc->user_back = flow; - pr_debug("atm_tc_change: vcc %p\n", flow->vcc); - flow->old_pop = flow->vcc->pop; - flow->parent = p; - flow->vcc->pop = sch_atm_pop; - flow->common.classid = classid; - flow->ref = 1; - flow->excess = excess; - list_add(&flow->list, &p->link.list); - flow->hdr_len = hdr_len; - if (hdr) - memcpy(flow->hdr, hdr, hdr_len); - else - memcpy(flow->hdr, llc_oui_ip, sizeof(llc_oui_ip)); - *arg = (unsigned long)flow; - return 0; -err_out: - sockfd_put(sock); - return error; -} - -static int atm_tc_delete(struct Qdisc *sch, unsigned long arg) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - if (list_empty(&flow->list)) - return -EINVAL; - if (rcu_access_pointer(flow->filter_list) || flow == &p->link) - return -EBUSY; - /* - * Reference count must be 2: one for "keepalive" (set at class - * creation), and one for the reference held when calling delete. - */ - if (flow->ref < 2) { - pr_err("atm_tc_delete: flow->ref == %d\n", flow->ref); - return -EINVAL; - } - if (flow->ref > 2) - return -EBUSY; /* catch references via excess, etc. */ - atm_tc_put(sch, arg); - return 0; -} - -static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_walk(sch %p,[qdisc %p],walker %p)\n", sch, p, walker); - if (walker->stop) - return; - list_for_each_entry(flow, &p->flows, list) { - if (walker->count >= walker->skip && - walker->fn(sch, (unsigned long)flow, walker) < 0) { - walker->stop = 1; - break; - } - walker->count++; - } -} - -static struct tcf_block *atm_tc_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - - pr_debug("atm_tc_find_tcf(sch %p,[qdisc %p],flow %p)\n", sch, p, flow); - return flow ? flow->block : p->link.block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - struct tcf_result res; - int result; - int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; - - pr_debug("atm_tc_enqueue(skb %p,sch %p,[qdisc %p])\n", skb, sch, p); - result = TC_ACT_OK; /* be nice to gcc */ - flow = NULL; - if (TC_H_MAJ(skb->priority) != sch->handle || - !(flow = (struct atm_flow_data *)atm_tc_find(sch, skb->priority))) { - struct tcf_proto *fl; - - list_for_each_entry(flow, &p->flows, list) { - fl = rcu_dereference_bh(flow->filter_list); - if (fl) { - result = tcf_classify(skb, fl, &res, true); - if (result < 0) - continue; - if (result == TC_ACT_SHOT) - goto done; - - flow = (struct atm_flow_data *)res.class; - if (!flow) - flow = lookup_flow(sch, res.classid); - goto drop; - } - } - flow = NULL; -done: - ; - } - if (!flow) { - flow = &p->link; - } else { - if (flow->vcc) - ATM_SKB(skb)->atm_options = flow->vcc->atm_options; - /*@@@ looks good ... but it's not supposed to work :-) */ -#ifdef CONFIG_NET_CLS_ACT - switch (result) { - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - case TC_ACT_SHOT: - __qdisc_drop(skb, to_free); - goto drop; - case TC_ACT_RECLASSIFY: - if (flow->excess) - flow = flow->excess; - else - ATM_SKB(skb)->atm_options |= ATM_ATMOPT_CLP; - break; - } -#endif - } - - ret = qdisc_enqueue(skb, flow->q, to_free); - if (ret != NET_XMIT_SUCCESS) { -drop: __maybe_unused - if (net_xmit_drop_count(ret)) { - qdisc_qstats_drop(sch); - if (flow) - flow->qstats.drops++; - } - return ret; - } - /* - * Okay, this may seem weird. We pretend we've dropped the packet if - * it goes via ATM. The reason for this is that the outer qdisc - * expects to be able to q->dequeue the packet later on if we return - * success at this place. Also, sch->q.qdisc needs to reflect whether - * there is a packet egligible for dequeuing or not. Note that the - * statistics of the outer qdisc are necessarily wrong because of all - * this. There's currently no correct solution for this. - */ - if (flow == &p->link) { - sch->q.qlen++; - return NET_XMIT_SUCCESS; - } - tasklet_schedule(&p->task); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -/* - * Dequeue packets and send them over ATM. Note that we quite deliberately - * avoid checking net_device's flow control here, simply because sch_atm - * uses its own channels, which have nothing to do with any CLIP/LANE/or - * non-ATM interfaces. - */ - -static void sch_atm_dequeue(unsigned long data) -{ - struct Qdisc *sch = (struct Qdisc *)data; - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - struct sk_buff *skb; - - pr_debug("sch_atm_dequeue(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - if (flow == &p->link) - continue; - /* - * If traffic is properly shaped, this won't generate nasty - * little bursts. Otherwise, it may ... (but that's okay) - */ - while ((skb = flow->q->ops->peek(flow->q))) { - if (!atm_may_send(flow->vcc, skb->truesize)) - break; - - skb = qdisc_dequeue_peeked(flow->q); - if (unlikely(!skb)) - break; - - qdisc_bstats_update(sch, skb); - bstats_update(&flow->bstats, skb); - pr_debug("atm_tc_dequeue: sending on class %p\n", flow); - /* remove any LL header somebody else has attached */ - skb_pull(skb, skb_network_offset(skb)); - if (skb_headroom(skb) < flow->hdr_len) { - struct sk_buff *new; - - new = skb_realloc_headroom(skb, flow->hdr_len); - dev_kfree_skb(skb); - if (!new) - continue; - skb = new; - } - pr_debug("sch_atm_dequeue: ip %p, data %p\n", - skb_network_header(skb), skb->data); - ATM_SKB(skb)->vcc = flow->vcc; - memcpy(skb_push(skb, flow->hdr_len), flow->hdr, - flow->hdr_len); - refcount_add(skb->truesize, - &sk_atm(flow->vcc)->sk_wmem_alloc); - /* atm.atm_options are already set by atm_tc_enqueue */ - flow->vcc->send(flow->vcc, skb); - } - } -} - -static struct sk_buff *atm_tc_dequeue(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - - pr_debug("atm_tc_dequeue(sch %p,[qdisc %p])\n", sch, p); - tasklet_schedule(&p->task); - skb = qdisc_dequeue_peeked(p->link.q); - if (skb) - sch->q.qlen--; - return skb; -} - -static struct sk_buff *atm_tc_peek(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - - pr_debug("atm_tc_peek(sch %p,[qdisc %p])\n", sch, p); - - return p->link.q->ops->peek(p->link.q); -} - -static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt); - INIT_LIST_HEAD(&p->flows); - INIT_LIST_HEAD(&p->link.list); - list_add(&p->link.list, &p->flows); - p->link.q = qdisc_create_dflt(sch->dev_queue, - &pfifo_qdisc_ops, sch->handle, extack); - if (!p->link.q) - p->link.q = &noop_qdisc; - pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q); - p->link.vcc = NULL; - p->link.sock = NULL; - p->link.common.classid = sch->handle; - p->link.ref = 1; - - err = tcf_block_get(&p->link.block, &p->link.filter_list, sch, - extack); - if (err) - return err; - - tasklet_init(&p->task, sch_atm_dequeue, (unsigned long)sch); - return 0; -} - -static void atm_tc_reset(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow; - - pr_debug("atm_tc_reset(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) - qdisc_reset(flow->q); - sch->q.qlen = 0; -} - -static void atm_tc_destroy(struct Qdisc *sch) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow, *tmp; - - pr_debug("atm_tc_destroy(sch %p,[qdisc %p])\n", sch, p); - list_for_each_entry(flow, &p->flows, list) { - tcf_block_put(flow->block); - flow->block = NULL; - } - - list_for_each_entry_safe(flow, tmp, &p->flows, list) { - if (flow->ref > 1) - pr_err("atm_destroy: %p->ref = %d\n", flow, flow->ref); - atm_tc_put(sch, (unsigned long)flow); - } - tasklet_kill(&p->task); -} - -static int atm_tc_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct atm_qdisc_data *p = qdisc_priv(sch); - struct atm_flow_data *flow = (struct atm_flow_data *)cl; - struct nlattr *nest; - - pr_debug("atm_tc_dump_class(sch %p,[qdisc %p],flow %p,skb %p,tcm %p)\n", - sch, p, flow, skb, tcm); - if (list_empty(&flow->list)) - return -EINVAL; - tcm->tcm_handle = flow->common.classid; - tcm->tcm_info = flow->q->handle; - - nest = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (nest == NULL) - goto nla_put_failure; - - if (nla_put(skb, TCA_ATM_HDR, flow->hdr_len, flow->hdr)) - goto nla_put_failure; - if (flow->vcc) { - struct sockaddr_atmpvc pvc; - int state; - - memset(&pvc, 0, sizeof(pvc)); - pvc.sap_family = AF_ATMPVC; - pvc.sap_addr.itf = flow->vcc->dev ? flow->vcc->dev->number : -1; - pvc.sap_addr.vpi = flow->vcc->vpi; - pvc.sap_addr.vci = flow->vcc->vci; - if (nla_put(skb, TCA_ATM_ADDR, sizeof(pvc), &pvc)) - goto nla_put_failure; - state = ATM_VF2VS(flow->vcc->flags); - if (nla_put_u32(skb, TCA_ATM_STATE, state)) - goto nla_put_failure; - } - if (flow->excess) { - if (nla_put_u32(skb, TCA_ATM_EXCESS, flow->common.classid)) - goto nla_put_failure; - } else { - if (nla_put_u32(skb, TCA_ATM_EXCESS, 0)) - goto nla_put_failure; - } - return nla_nest_end(skb, nest); - -nla_put_failure: - nla_nest_cancel(skb, nest); - return -1; -} -static int -atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg, - struct gnet_dump *d) -{ - struct atm_flow_data *flow = (struct atm_flow_data *)arg; - - if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), - d, NULL, &flow->bstats) < 0 || - gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0) - return -1; - - return 0; -} - -static int atm_tc_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - return 0; -} - -static const struct Qdisc_class_ops atm_class_ops = { - .graft = atm_tc_graft, - .leaf = atm_tc_leaf, - .find = atm_tc_find, - .change = atm_tc_change, - .delete = atm_tc_delete, - .walk = atm_tc_walk, - .tcf_block = atm_tc_tcf_block, - .bind_tcf = atm_tc_bind_filter, - .unbind_tcf = atm_tc_put, - .dump = atm_tc_dump_class, - .dump_stats = atm_tc_dump_class_stats, -}; - -static struct Qdisc_ops atm_qdisc_ops __read_mostly = { - .cl_ops = &atm_class_ops, - .id = "atm", - .priv_size = sizeof(struct atm_qdisc_data), - .enqueue = atm_tc_enqueue, - .dequeue = atm_tc_dequeue, - .peek = atm_tc_peek, - .init = atm_tc_init, - .reset = atm_tc_reset, - .destroy = atm_tc_destroy, - .dump = atm_tc_dump, - .owner = THIS_MODULE, -}; - -static int __init atm_init(void) -{ - return register_qdisc(&atm_qdisc_ops); -} - -static void __exit atm_exit(void) -{ - unregister_qdisc(&atm_qdisc_ops); -} - -module_init(atm_init) -module_exit(atm_exit) -MODULE_LICENSE("GPL");
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jamal Hadi Salim jhs@mojatatu.com
commit bbe77c14ee6185a61ba6d5e435c1cbb489d2a9ed upstream.
The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Acked-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sched/Kconfig | 11 - net/sched/Makefile | 1 net/sched/sch_dsmark.c | 523 ------------------------------------------------- 3 files changed, 535 deletions(-) delete mode 100644 net/sched/sch_dsmark.c delete mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dsmark.json
--- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -186,17 +186,6 @@ config NET_SCH_GRED To compile this code as a module, choose M here: the module will be called sch_gred.
-config NET_SCH_DSMARK - tristate "Differentiated Services marker (DSMARK)" - ---help--- - Say Y if you want to schedule packets according to the - Differentiated Services architecture proposed in RFC 2475. - Technical information on this method, with pointers to associated - RFCs, is available at http://www.gta.ufrj.br/diffserv/. - - To compile this code as a module, choose M here: the - module will be called sch_dsmark. - config NET_SCH_NETEM tristate "Network emulator (NETEM)" ---help--- --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -36,7 +36,6 @@ obj-$(CONFIG_NET_SCH_HFSC) += sch_hfsc.o obj-$(CONFIG_NET_SCH_RED) += sch_red.o obj-$(CONFIG_NET_SCH_GRED) += sch_gred.o obj-$(CONFIG_NET_SCH_INGRESS) += sch_ingress.o -obj-$(CONFIG_NET_SCH_DSMARK) += sch_dsmark.o obj-$(CONFIG_NET_SCH_SFB) += sch_sfb.o obj-$(CONFIG_NET_SCH_SFQ) += sch_sfq.o obj-$(CONFIG_NET_SCH_TBF) += sch_tbf.o --- a/net/sched/sch_dsmark.c +++ /dev/null @@ -1,523 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* net/sched/sch_dsmark.c - Differentiated Services field marker */ - -/* Written 1998-2000 by Werner Almesberger, EPFL ICA */ - - -#include <linux/module.h> -#include <linux/init.h> -#include <linux/slab.h> -#include <linux/types.h> -#include <linux/string.h> -#include <linux/errno.h> -#include <linux/skbuff.h> -#include <linux/rtnetlink.h> -#include <linux/bitops.h> -#include <net/pkt_sched.h> -#include <net/pkt_cls.h> -#include <net/dsfield.h> -#include <net/inet_ecn.h> -#include <asm/byteorder.h> - -/* - * classid class marking - * ------- ----- ------- - * n/a 0 n/a - * x:0 1 use entry [0] - * ... ... ... - * x:y y>0 y+1 use entry [y] - * ... ... ... - * x:indices-1 indices use entry [indices-1] - * ... ... ... - * x:y y+1 use entry [y & (indices-1)] - * ... ... ... - * 0xffff 0x10000 use entry [indices-1] - */ - - -#define NO_DEFAULT_INDEX (1 << 16) - -struct mask_value { - u8 mask; - u8 value; -}; - -struct dsmark_qdisc_data { - struct Qdisc *q; - struct tcf_proto __rcu *filter_list; - struct tcf_block *block; - struct mask_value *mv; - u16 indices; - u8 set_tc_index; - u32 default_index; /* index range is 0...0xffff */ -#define DSMARK_EMBEDDED_SZ 16 - struct mask_value embedded[DSMARK_EMBEDDED_SZ]; -}; - -static inline int dsmark_valid_index(struct dsmark_qdisc_data *p, u16 index) -{ - return index <= p->indices && index > 0; -} - -/* ------------------------- Class/flow operations ------------------------- */ - -static int dsmark_graft(struct Qdisc *sch, unsigned long arg, - struct Qdisc *new, struct Qdisc **old, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p],new %p,old %p)\n", - __func__, sch, p, new, old); - - if (new == NULL) { - new = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, - sch->handle, NULL); - if (new == NULL) - new = &noop_qdisc; - } - - *old = qdisc_replace(sch, new, &p->q); - return 0; -} - -static struct Qdisc *dsmark_leaf(struct Qdisc *sch, unsigned long arg) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - return p->q; -} - -static unsigned long dsmark_find(struct Qdisc *sch, u32 classid) -{ - return TC_H_MIN(classid) + 1; -} - -static unsigned long dsmark_bind_filter(struct Qdisc *sch, - unsigned long parent, u32 classid) -{ - pr_debug("%s(sch %p,[qdisc %p],classid %x)\n", - __func__, sch, qdisc_priv(sch), classid); - - return dsmark_find(sch, classid); -} - -static void dsmark_unbind_filter(struct Qdisc *sch, unsigned long cl) -{ -} - -static const struct nla_policy dsmark_policy[TCA_DSMARK_MAX + 1] = { - [TCA_DSMARK_INDICES] = { .type = NLA_U16 }, - [TCA_DSMARK_DEFAULT_INDEX] = { .type = NLA_U16 }, - [TCA_DSMARK_SET_TC_INDEX] = { .type = NLA_FLAG }, - [TCA_DSMARK_MASK] = { .type = NLA_U8 }, - [TCA_DSMARK_VALUE] = { .type = NLA_U8 }, -}; - -static int dsmark_change(struct Qdisc *sch, u32 classid, u32 parent, - struct nlattr **tca, unsigned long *arg, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opt = tca[TCA_OPTIONS]; - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - - pr_debug("%s(sch %p,[qdisc %p],classid %x,parent %x), arg 0x%lx\n", - __func__, sch, p, classid, parent, *arg); - - if (!dsmark_valid_index(p, *arg)) { - err = -ENOENT; - goto errout; - } - - if (!opt) - goto errout; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - if (tb[TCA_DSMARK_VALUE]) - p->mv[*arg - 1].value = nla_get_u8(tb[TCA_DSMARK_VALUE]); - - if (tb[TCA_DSMARK_MASK]) - p->mv[*arg - 1].mask = nla_get_u8(tb[TCA_DSMARK_MASK]); - - err = 0; - -errout: - return err; -} - -static int dsmark_delete(struct Qdisc *sch, unsigned long arg) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - if (!dsmark_valid_index(p, arg)) - return -EINVAL; - - p->mv[arg - 1].mask = 0xff; - p->mv[arg - 1].value = 0; - - return 0; -} - -static void dsmark_walk(struct Qdisc *sch, struct qdisc_walker *walker) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int i; - - pr_debug("%s(sch %p,[qdisc %p],walker %p)\n", - __func__, sch, p, walker); - - if (walker->stop) - return; - - for (i = 0; i < p->indices; i++) { - if (p->mv[i].mask == 0xff && !p->mv[i].value) - goto ignore; - if (walker->count >= walker->skip) { - if (walker->fn(sch, i + 1, walker) < 0) { - walker->stop = 1; - break; - } - } -ignore: - walker->count++; - } -} - -static struct tcf_block *dsmark_tcf_block(struct Qdisc *sch, unsigned long cl, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - return p->block; -} - -/* --------------------------- Qdisc operations ---------------------------- */ - -static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch, - struct sk_buff **to_free) -{ - unsigned int len = qdisc_pkt_len(skb); - struct dsmark_qdisc_data *p = qdisc_priv(sch); - int err; - - pr_debug("%s(skb %p,sch %p,[qdisc %p])\n", __func__, skb, sch, p); - - if (p->set_tc_index) { - int wlen = skb_network_offset(skb); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - wlen += sizeof(struct iphdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv4_get_dsfield(ip_hdr(skb)) - & ~INET_ECN_MASK; - break; - - case htons(ETH_P_IPV6): - wlen += sizeof(struct ipv6hdr); - if (!pskb_may_pull(skb, wlen) || - skb_try_make_writable(skb, wlen)) - goto drop; - - skb->tc_index = ipv6_get_dsfield(ipv6_hdr(skb)) - & ~INET_ECN_MASK; - break; - default: - skb->tc_index = 0; - break; - } - } - - if (TC_H_MAJ(skb->priority) == sch->handle) - skb->tc_index = TC_H_MIN(skb->priority); - else { - struct tcf_result res; - struct tcf_proto *fl = rcu_dereference_bh(p->filter_list); - int result = tcf_classify(skb, fl, &res, false); - - pr_debug("result %d class 0x%04x\n", result, res.classid); - - switch (result) { -#ifdef CONFIG_NET_CLS_ACT - case TC_ACT_QUEUED: - case TC_ACT_STOLEN: - case TC_ACT_TRAP: - __qdisc_drop(skb, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; - - case TC_ACT_SHOT: - goto drop; -#endif - case TC_ACT_OK: - skb->tc_index = TC_H_MIN(res.classid); - break; - - default: - if (p->default_index != NO_DEFAULT_INDEX) - skb->tc_index = p->default_index; - break; - } - } - - err = qdisc_enqueue(skb, p->q, to_free); - if (err != NET_XMIT_SUCCESS) { - if (net_xmit_drop_count(err)) - qdisc_qstats_drop(sch); - return err; - } - - sch->qstats.backlog += len; - sch->q.qlen++; - - return NET_XMIT_SUCCESS; - -drop: - qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; -} - -static struct sk_buff *dsmark_dequeue(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct sk_buff *skb; - u32 index; - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - skb = qdisc_dequeue_peeked(p->q); - if (skb == NULL) - return NULL; - - qdisc_bstats_update(sch, skb); - qdisc_qstats_backlog_dec(sch, skb); - sch->q.qlen--; - - index = skb->tc_index & (p->indices - 1); - pr_debug("index %d->%d\n", skb->tc_index, index); - - switch (skb_protocol(skb, true)) { - case htons(ETH_P_IP): - ipv4_change_dsfield(ip_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - case htons(ETH_P_IPV6): - ipv6_change_dsfield(ipv6_hdr(skb), p->mv[index].mask, - p->mv[index].value); - break; - default: - /* - * Only complain if a change was actually attempted. - * This way, we can send non-IP traffic through dsmark - * and don't need yet another qdisc as a bypass. - */ - if (p->mv[index].mask != 0xff || p->mv[index].value) - pr_warn("%s: unsupported protocol %d\n", - __func__, ntohs(skb_protocol(skb, true))); - break; - } - - return skb; -} - -static struct sk_buff *dsmark_peek(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - return p->q->ops->peek(p->q); -} - -static int dsmark_init(struct Qdisc *sch, struct nlattr *opt, - struct netlink_ext_ack *extack) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *tb[TCA_DSMARK_MAX + 1]; - int err = -EINVAL; - u32 default_index = NO_DEFAULT_INDEX; - u16 indices; - int i; - - pr_debug("%s(sch %p,[qdisc %p],opt %p)\n", __func__, sch, p, opt); - - if (!opt) - goto errout; - - err = tcf_block_get(&p->block, &p->filter_list, sch, extack); - if (err) - return err; - - err = nla_parse_nested_deprecated(tb, TCA_DSMARK_MAX, opt, - dsmark_policy, NULL); - if (err < 0) - goto errout; - - err = -EINVAL; - if (!tb[TCA_DSMARK_INDICES]) - goto errout; - indices = nla_get_u16(tb[TCA_DSMARK_INDICES]); - - if (hweight32(indices) != 1) - goto errout; - - if (tb[TCA_DSMARK_DEFAULT_INDEX]) - default_index = nla_get_u16(tb[TCA_DSMARK_DEFAULT_INDEX]); - - if (indices <= DSMARK_EMBEDDED_SZ) - p->mv = p->embedded; - else - p->mv = kmalloc_array(indices, sizeof(*p->mv), GFP_KERNEL); - if (!p->mv) { - err = -ENOMEM; - goto errout; - } - for (i = 0; i < indices; i++) { - p->mv[i].mask = 0xff; - p->mv[i].value = 0; - } - p->indices = indices; - p->default_index = default_index; - p->set_tc_index = nla_get_flag(tb[TCA_DSMARK_SET_TC_INDEX]); - - p->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, sch->handle, - NULL); - if (p->q == NULL) - p->q = &noop_qdisc; - else - qdisc_hash_add(p->q, true); - - pr_debug("%s: qdisc %p\n", __func__, p->q); - - err = 0; -errout: - return err; -} - -static void dsmark_reset(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - if (p->q) - qdisc_reset(p->q); - sch->qstats.backlog = 0; - sch->q.qlen = 0; -} - -static void dsmark_destroy(struct Qdisc *sch) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - - pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p); - - tcf_block_put(p->block); - qdisc_put(p->q); - if (p->mv != p->embedded) - kfree(p->mv); -} - -static int dsmark_dump_class(struct Qdisc *sch, unsigned long cl, - struct sk_buff *skb, struct tcmsg *tcm) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - pr_debug("%s(sch %p,[qdisc %p],class %ld\n", __func__, sch, p, cl); - - if (!dsmark_valid_index(p, cl)) - return -EINVAL; - - tcm->tcm_handle = TC_H_MAKE(TC_H_MAJ(sch->handle), cl - 1); - tcm->tcm_info = p->q->handle; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u8(skb, TCA_DSMARK_MASK, p->mv[cl - 1].mask) || - nla_put_u8(skb, TCA_DSMARK_VALUE, p->mv[cl - 1].value)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static int dsmark_dump(struct Qdisc *sch, struct sk_buff *skb) -{ - struct dsmark_qdisc_data *p = qdisc_priv(sch); - struct nlattr *opts = NULL; - - opts = nla_nest_start_noflag(skb, TCA_OPTIONS); - if (opts == NULL) - goto nla_put_failure; - if (nla_put_u16(skb, TCA_DSMARK_INDICES, p->indices)) - goto nla_put_failure; - - if (p->default_index != NO_DEFAULT_INDEX && - nla_put_u16(skb, TCA_DSMARK_DEFAULT_INDEX, p->default_index)) - goto nla_put_failure; - - if (p->set_tc_index && - nla_put_flag(skb, TCA_DSMARK_SET_TC_INDEX)) - goto nla_put_failure; - - return nla_nest_end(skb, opts); - -nla_put_failure: - nla_nest_cancel(skb, opts); - return -EMSGSIZE; -} - -static const struct Qdisc_class_ops dsmark_class_ops = { - .graft = dsmark_graft, - .leaf = dsmark_leaf, - .find = dsmark_find, - .change = dsmark_change, - .delete = dsmark_delete, - .walk = dsmark_walk, - .tcf_block = dsmark_tcf_block, - .bind_tcf = dsmark_bind_filter, - .unbind_tcf = dsmark_unbind_filter, - .dump = dsmark_dump_class, -}; - -static struct Qdisc_ops dsmark_qdisc_ops __read_mostly = { - .next = NULL, - .cl_ops = &dsmark_class_ops, - .id = "dsmark", - .priv_size = sizeof(struct dsmark_qdisc_data), - .enqueue = dsmark_enqueue, - .dequeue = dsmark_dequeue, - .peek = dsmark_peek, - .init = dsmark_init, - .reset = dsmark_reset, - .destroy = dsmark_destroy, - .change = NULL, - .dump = dsmark_dump, - .owner = THIS_MODULE, -}; - -static int __init dsmark_module_init(void) -{ - return register_qdisc(&dsmark_qdisc_ops); -} - -static void __exit dsmark_module_exit(void) -{ - unregister_qdisc(&dsmark_qdisc_ops); -} - -module_init(dsmark_module_init) -module_exit(dsmark_module_exit) - -MODULE_LICENSE("GPL");
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe upstream.
The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1
Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Cc: Mahmoud Adam mngyadam@amazon.com Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2738,6 +2738,9 @@ int sched_rr_handler(struct ctl_table *t sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: "GONG, Ruiqi" gongruiqi1@huawei.com
commit 1a3e1f40962c445b997151a542314f3c6097f8c3 upstream.
NOTE: This is a partial backport since we only need the refcnt between memcg and stock to fix the problem stated below, and in this way multiple versions use the same code and align with each other.
There was a kernel panic happened on an in-house environment running 3.10, and the same problem was reproduced on 4.19:
general protection fault: 0000 [#1] SMP PTI CPU: 1 PID: 2085 Comm: bash Kdump: loaded Tainted: G L 4.19.90+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010 drain_all_stock+0xad/0x140 Code: 00 00 4d 85 ff 74 2c 45 85 c9 74 27 4d 39 fc 74 42 41 80 bc 24 28 04 00 00 00 74 17 49 8b 04 24 49 8b 17 48 8b 88 90 02 00 00 <48> 39 8a 90 02 00 00 74 02 eb 86 48 63 88 3c 01 00 00 39 8a 3c 01 RSP: 0018:ffffa7efc5813d70 EFLAGS: 00010202 RAX: ffff8cb185548800 RBX: ffff8cb89f420160 RCX: ffff8cb1867b6000 RDX: babababababababa RSI: 0000000000000001 RDI: 0000000000231876 RBP: 0000000000000000 R08: 0000000000000415 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8cb186f89040 R13: 0000000000020160 R14: 0000000000000001 R15: ffff8cb186b27040 FS: 00007f4a308d3740(0000) GS:ffff8cb89f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe4d634a68 CR3: 000000010b022000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mem_cgroup_force_empty_write+0x31/0xb0 cgroup_file_write+0x60/0x140 ? __check_object_size+0x136/0x147 kernfs_fop_write+0x10e/0x190 __vfs_write+0x37/0x1b0 ? selinux_file_permission+0xe8/0x130 ? security_file_permission+0x2e/0xb0 vfs_write+0xb6/0x1a0 ksys_write+0x57/0xd0 do_syscall_64+0x63/0x250 ? async_page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x5c/0xc1 Modules linked in: ...
It is found that in case of stock->nr_pages == 0, the memcg on stock->cached could be freed due to its refcnt decreased to 0, which made stock->cached become a dangling pointer. It could cause a UAF problem in drain_all_stock() in the following concurrent scenario. Note that drain_all_stock() doesn't disable irq but only preemption.
CPU1 CPU2 ============================================================================== stock->cached = memcgA (freed) drain_all_stock(memcgB) rcu_read_lock() memcg = CPU1's stock->cached (memcgA) (interrupted) refill_stock(memcgC) drain_stock(memcgA) stock->cached = memcgC stock->nr_pages += xxx (> 0) stock->nr_pages > 0 mem_cgroup_is_descendant(memcgA, memcgB) [UAF] rcu_read_unlock()
This problem is, unintentionally, fixed at 5.9, where commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") adds memcg refcnt for stock. Therefore affected LTS versions include 4.19 and 5.4.
For 4.19, memcg's css offline process doesn't call drain_all_stock(). so it's easier for the released memcg to be left on the stock. For 5.4, although mem_cgroup_css_offline() does call drain_all_stock(), but the flushing could be skipped when stock->nr_pages happens to be 0, and besides the async draining could be delayed and take place after the UAF problem has happened.
Fix this problem by adding (and decreasing) memcg's refcnt when memcg is put onto (and removed from) stock, just like how commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") does. After all, "being on the stock" is a kind of reference with regards to memcg. As such, it's guaranteed that a css on stock would not be freed.
It's good to mention that refill_stock() is executed in an irq-disabled context, so the drain_stock() patched with css_put() would not actually free memcgA until the end of refill_stock(), since css_put() is an RCU free and it's still in grace period. For CPU2, the access to CPU1's stock->cached is protected by rcu_read_lock(), so in this case it gets either NULL from stock->cached or a memcgA that is still good.
Cc: stable@vger.kernel.org # 4.19 5.4 Fixes: cdec2e4265df ("memcg: coalesce charging via percpu storage") Signed-off-by: GONG, Ruiqi gongruiqi1@huawei.com Acked-by: Michal Hocko mhocko@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memcontrol.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2214,6 +2214,9 @@ static void drain_stock(struct memcg_sto { struct mem_cgroup *old = stock->cached;
+ if (!old) + return; + if (stock->nr_pages) { page_counter_uncharge(&old->memory, stock->nr_pages); if (do_memsw_account()) @@ -2221,6 +2224,8 @@ static void drain_stock(struct memcg_sto css_put_many(&old->css, stock->nr_pages); stock->nr_pages = 0; } + + css_put(&old->css); stock->cached = NULL; }
@@ -2256,6 +2261,7 @@ static void refill_stock(struct mem_cgro stock = this_cpu_ptr(&memcg_stock); if (stock->cached != memcg) { /* reset if necessary */ drain_stock(stock); + css_get(&memcg->css); stock->cached = memcg; } stock->nr_pages += nr_pages;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 5124a0a549857c4b87173280e192eea24dea72ad upstream.
If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated.
This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should.
Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com Tested-by: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/dat.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
--- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -40,8 +40,21 @@ static inline struct nilfs_dat_info *NIL static int nilfs_dat_prepare_entry(struct inode *dat, struct nilfs_palloc_req *req, int create) { - return nilfs_palloc_get_entry_block(dat, req->pr_entry_nr, - create, &req->pr_entry_bh); + int ret; + + ret = nilfs_palloc_get_entry_block(dat, req->pr_entry_nr, + create, &req->pr_entry_bh); + if (unlikely(ret == -ENOENT)) { + nilfs_msg(dat->i_sb, KERN_ERR, + "DAT doesn't have a block to manage vblocknr = %llu", + (unsigned long long)req->pr_entry_nr); + /* + * Return internal code -EINVAL to notify bmap layer of + * metadata corruption. + */ + ret = -EINVAL; + } + return ret; }
static void nilfs_dat_commit_entry(struct inode *dat, @@ -123,11 +136,7 @@ static void nilfs_dat_commit_free(struct
int nilfs_dat_prepare_start(struct inode *dat, struct nilfs_palloc_req *req) { - int ret; - - ret = nilfs_dat_prepare_entry(dat, req, 0); - WARN_ON(ret == -ENOENT); - return ret; + return nilfs_dat_prepare_entry(dat, req, 0); }
void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, @@ -154,10 +163,8 @@ int nilfs_dat_prepare_end(struct inode * int ret;
ret = nilfs_dat_prepare_entry(dat, req, 0); - if (ret < 0) { - WARN_ON(ret == -ENOENT); + if (ret < 0) return ret; - }
kaddr = kmap_atomic(req->pr_entry_bh->b_page); entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lokesh Gidra lokeshgidra@google.com
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.
In mfill_atomic_hugetlb(), mmap_changing isn't being checked again if we drop mmap_lock and reacquire it. When the lock is not held, mmap_changing could have been incremented. This is also inconsistent with the behavior in mfill_atomic().
Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races") Signed-off-by: Lokesh Gidra lokeshgidra@google.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: Mike Rapoport rppt@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Brian Geffon bgeffon@google.com Cc: David Hildenbrand david@redhat.com Cc: Jann Horn jannh@google.com Cc: Kalesh Singh kaleshsingh@google.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Nicolas Geoffray ngeoffray@google.com Cc: Peter Xu peterx@redhat.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/userfaultfd.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -177,6 +177,7 @@ static __always_inline ssize_t __mcopy_a unsigned long dst_start, unsigned long src_start, unsigned long len, + bool *mmap_changing, bool zeropage) { int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED; @@ -308,6 +309,15 @@ retry: goto out; } down_read(&dst_mm->mmap_sem); + /* + * If memory mappings are changing because of non-cooperative + * operation (e.g. mremap) running in parallel, bail out and + * request the user to retry later + */ + if (mmap_changing && READ_ONCE(*mmap_changing)) { + err = -EAGAIN; + break; + }
dst_vma = NULL; goto retry; @@ -389,6 +399,7 @@ extern ssize_t __mcopy_atomic_hugetlb(st unsigned long dst_start, unsigned long src_start, unsigned long len, + bool *mmap_changing, bool zeropage); #endif /* CONFIG_HUGETLB_PAGE */
@@ -506,7 +517,8 @@ retry: */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, zeropage); + src_start, len, mmap_changing, + zeropage);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit c7fcb99877f9f542c918509b2801065adcaf46fa upstream.
There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Petr Vorel pvorel@suse.cz Acked-by: Mel Gorman mgorman@suse.de Tested-by: Petr Vorel pvorel@suse.cz Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.4 ] Signed-off-by: Petr Vorel pvorel@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cyril Hrubis chrubis@suse.cz
commit 079be8fc630943d9fc70a97807feb73d169ee3fc upstream.
The validation of the value written to sched_rt_period_us was broken because:
- the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check:
if (sysclt_sched_rt_period <= 0) return EINVAL;
This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX.
As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@...
Signed-off-by: Cyril Hrubis chrubis@suse.cz Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz [ pvorel: rebased for 5.4 ] Reviewed-by: Petr Vorel pvorel@suse.cz Signed-off-by: Petr Vorel pvorel@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/rt.c | 5 +---- kernel/sysctl.c | 4 ++++ 2 files changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2659,9 +2659,6 @@ static int sched_rt_global_constraints(v
static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -2693,7 +2690,7 @@ int sched_rt_handler(struct ctl_table *t old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) { ret = sched_rt_global_validate(); --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -465,6 +465,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -472,6 +474,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = &neg_one, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rr_timeslice_ms",
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Bogdanov d.bogdanov@yadro.com
[ Upstream commit 83ab68168a3d990d5ff39ab030ad5754cbbccb25 ]
An abort that is responded to by iSCSI itself is added to tmr_list but does not go to target core. A LUN_RESET that goes through tmr_list takes a refcounter on the abort and waits for completion. However, the abort will be never complete because it was not started in target core.
Unable to locate ITT: 0x05000000 on CID: 0 Unable to locate RefTaskTag: 0x05000000 on CID: 0. wait_for_tasks: Stopping tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop wait for tasks: tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop ... INFO: task kworker/0:2:49 blocked for more than 491 seconds. task:kworker/0:2 state:D stack: 0 pid: 49 ppid: 2 flags:0x00000800 Workqueue: events target_tmr_work [target_core_mod] Call Trace: __switch_to+0x2c4/0x470 _schedule+0x314/0x1730 schedule+0x64/0x130 schedule_timeout+0x168/0x430 wait_for_completion+0x140/0x270 target_put_cmd_and_wait+0x64/0xb0 [target_core_mod] core_tmr_lun_reset+0x30/0xa0 [target_core_mod] target_tmr_work+0xc8/0x1b0 [target_core_mod] process_one_work+0x2d4/0x5d0 worker_thread+0x78/0x6c0
To fix this, only add abort to tmr_list if it will be handled by target core.
Signed-off-by: Dmitry Bogdanov d.bogdanov@yadro.com Link: https://lore.kernel.org/r/20240111125941.8688-1-d.bogdanov@yadro.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_device.c | 5 ----- drivers/target/target_core_transport.c | 4 ++++ 2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 8ba134ccd3b9c..edddc66ad1337 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -151,7 +151,6 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd, u64 unpacked_lun) struct se_session *se_sess = se_cmd->se_sess; struct se_node_acl *nacl = se_sess->se_node_acl; struct se_tmr_req *se_tmr = se_cmd->se_tmr_req; - unsigned long flags;
rcu_read_lock(); deve = target_nacl_find_deve(nacl, unpacked_lun); @@ -182,10 +181,6 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd, u64 unpacked_lun) se_cmd->se_dev = rcu_dereference_raw(se_lun->lun_se_dev); se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev);
- spin_lock_irqsave(&se_tmr->tmr_dev->se_tmr_lock, flags); - list_add_tail(&se_tmr->tmr_list, &se_tmr->tmr_dev->dev_tmr_list); - spin_unlock_irqrestore(&se_tmr->tmr_dev->se_tmr_lock, flags); - return 0; } EXPORT_SYMBOL(transport_lookup_tmr_lun); diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index f52fe40002259..82d9e2659abe3 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -3392,6 +3392,10 @@ int transport_generic_handle_tmr( unsigned long flags; bool aborted = false;
+ spin_lock_irqsave(&cmd->se_dev->se_tmr_lock, flags); + list_add_tail(&cmd->se_tmr_req->tmr_list, &cmd->se_dev->dev_tmr_list); + spin_unlock_irqrestore(&cmd->se_dev->se_tmr_lock, flags); + spin_lock_irqsave(&cmd->t_state_lock, flags); if (cmd->transport_state & CMD_T_ABORTED) { aborted = true;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul vkoul@kernel.org
[ Upstream commit 404290240827c3bb5c4e195174a8854eef2f89ac ]
We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'dev_id'
drivers/dma/sh/shdmac.c: In function ‘sh_dmae_probe’: drivers/dma/sh/shdmac.c:541:34: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 9 [-Werror=format-truncation=] 541 | "sh-dmae%d.%d", pdev->id, id); | ^~ In function ‘sh_dmae_chan_probe’, inlined from ‘sh_dmae_probe’ at drivers/dma/sh/shdmac.c:845:9: drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 2147483647] 541 | "sh-dmae%d.%d", pdev->id, id); | ^~~~~~~~~~~~~~ drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 19] drivers/dma/sh/shdmac.c:540:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 540 | snprintf(sh_chan->dev_id, sizeof(sh_chan->dev_id), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 541 | "sh-dmae%d.%d", pdev->id, id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/sh/shdma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/sh/shdma.h b/drivers/dma/sh/shdma.h index 9c121a4b33ad8..f97d80343aea4 100644 --- a/drivers/dma/sh/shdma.h +++ b/drivers/dma/sh/shdma.h @@ -25,7 +25,7 @@ struct sh_dmae_chan { const struct sh_dmae_slave_config *config; /* Slave DMA configuration */ int xmit_shift; /* log_2(bytes_per_xfer) */ void __iomem *base; - char dev_id[16]; /* unique name per DMAC of channel */ + char dev_id[32]; /* unique name per DMAC of channel */ int pm_error; dma_addr_t slave_addr; };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vinod Koul vkoul@kernel.org
[ Upstream commit 6386f6c995b3ab91c72cfb76e4465553c555a8da ]
We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'irq_name'
drivers/dma/fsl-qdma.c: In function ‘fsl_qdma_irq_init’: drivers/dma/fsl-qdma.c:824:46: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=] 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~ drivers/dma/fsl-qdma.c:824:35: note: directive argument in the range [-2147483641, 2147483646] 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~~~~~~~~~~~~~ drivers/dma/fsl-qdma.c:824:17: note: ‘sprintf’ output between 12 and 22 bytes into a destination of size 20 824 | sprintf(irq_name, "qdma-queue%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-qdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c index ad5818c0ce223..f2f10aeeea319 100644 --- a/drivers/dma/fsl-qdma.c +++ b/drivers/dma/fsl-qdma.c @@ -754,7 +754,7 @@ fsl_qdma_irq_init(struct platform_device *pdev, int i; int cpu; int ret; - char irq_name[20]; + char irq_name[32];
fsl_qdma->error_irq = platform_get_irq_byname(pdev, "qdma-error");
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Kazior michal@plume.com
[ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ]
The nl80211_dump_interface() supports resumption in case nl80211_send_iface() doesn't have the resources to complete its work.
The logic would store the progress as iteration offsets for rdev and wdev loops.
However the logic did not properly handle resumption for non-last rdev. Assuming a system with 2 rdevs, with 2 wdevs each, this could happen:
dump(cb=[0, 0]): if_start=cb[1] (=0) send rdev0.wdev0 -> ok send rdev0.wdev1 -> yield cb[1] = 1
dump(cb=[0, 1]): if_start=cb[1] (=1) send rdev0.wdev1 -> ok // since if_start=1 the rdev0.wdev0 got skipped // through if_idx < if_start send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev loop end.
The problem is actually hard to hit on a desktop, and even on most routers. The prerequisites for this manifesting was: - more than 1 wiphy - a few handful of interfaces - dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each. It'd miss 6 interfaces from the last wiphy reported to userspace.
Signed-off-by: Michal Kazior michal@plume.com Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 0926a30bc7391..494de0161d2fc 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3350,6 +3350,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback * if_idx++; }
+ if_start = 0; wp_idx++; } out:
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit bcbc84af1183c8cf3d1ca9b78540c2185cd85e7f ]
fast-xmit must only be enabled after the sta has been uploaded to the driver, otherwise it could end up passing the not-yet-uploaded sta via drv_tx calls to the driver, leading to potential crashes because of uninitialized drv_priv data. Add a missing sta->uploaded check and re-check fast xmit after inserting a sta.
Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://msgid.link/20240104181059.84032-1-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/sta_info.c | 2 ++ net/mac80211/tx.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 0f97c6fcec174..e330036e02eac 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -683,6 +683,8 @@ static int sta_info_insert_finish(struct sta_info *sta) __acquires(RCU) if (ieee80211_vif_is_mesh(&sdata->vif)) mesh_accept_plinks_update(sdata);
+ ieee80211_check_fast_xmit(sta); + return 0; out_remove: sta_info_hash_del(local, sta); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 8bd01dfa75cb1..5fd9a6f752a1d 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -2919,7 +2919,7 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) sdata->vif.type == NL80211_IFTYPE_STATION) goto out;
- if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED)) + if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED) || !sta->uploaded) goto out;
if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fullway Wang fullwaywang@outlook.com
[ Upstream commit 04e5eac8f3ab2ff52fa191c187a46d4fdbc1e288 ]
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error.
Although pixclock is checked in savagefb_decode_var(), but it is not checked properly in savagefb_probe(). Fix this by checking whether pixclock is zero in the function savagefb_check_var() before info->var.pixclock is used as the divisor.
This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8.
Signed-off-by: Fullway Wang fullwaywang@outlook.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/savage/savagefb_driver.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c index d5d22d9c0f562..368500c401402 100644 --- a/drivers/video/fbdev/savage/savagefb_driver.c +++ b/drivers/video/fbdev/savage/savagefb_driver.c @@ -869,6 +869,9 @@ static int savagefb_check_var(struct fb_var_screeninfo *var,
DBG("savagefb_check_var");
+ if (!var->pixclock) + return -EINVAL; + var->transp.offset = 0; var->transp.length = 0; switch (var->bits_per_pixel) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fullway Wang fullwaywang@outlook.com
[ Upstream commit e421946be7d9bf545147bea8419ef8239cb7ca52 ]
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error.
In sisfb_check_var(), var->pixclock is used as a divisor to caculate drate before it is checked against zero. Fix this by checking it at the beginning.
This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8.
Signed-off-by: Fullway Wang fullwaywang@outlook.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/video/fbdev/sis/sis_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/video/fbdev/sis/sis_main.c b/drivers/video/fbdev/sis/sis_main.c index b443a8ed46003..2fdd02e51f5fc 100644 --- a/drivers/video/fbdev/sis/sis_main.c +++ b/drivers/video/fbdev/sis/sis_main.c @@ -1474,6 +1474,8 @@ sisfb_check_var(struct fb_var_screeninfo *var, struct fb_info *info)
vtotal = var->upper_margin + var->lower_margin + var->vsync_len;
+ if (!var->pixclock) + return -EINVAL; pixclock = var->pixclock;
if((var->vmode & FB_VMODE_MASK) == FB_VMODE_NONINTERLACED) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Conrad Kostecki conikost@gentoo.org
[ Upstream commit 0077a504e1a4468669fd2e011108db49133db56e ]
The ASM1166 SATA host controller always reports wrongly, that it has 32 ports. But in reality, it only has six ports.
This seems to be a hardware issue, as all tested ASM1166 SATA host controllers reports such high count of ports.
Example output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0xffffff3f impl SATA mode.
By adjusting the port_map, the count is limited to six ports.
New output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0x3f impl SATA mode.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=211873 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218346 Signed-off-by: Conrad Kostecki conikost@gentoo.org Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index aa35d1941d1fc..5db3dc45bdc4e 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -618,6 +618,11 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets"); static void ahci_pci_save_initial_config(struct pci_dev *pdev, struct ahci_host_priv *hpriv) { + if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == 0x1166) { + dev_info(&pdev->dev, "ASM1166 has only six ports\n"); + hpriv->saved_port_map = 0x3f; + } + if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) { dev_info(&pdev->dev, "JMB361 has only one port\n"); hpriv->force_port_map = 1;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lennert Buytenhek kernel@wantstofly.org
[ Upstream commit 20730e9b277873deeb6637339edcba64468f3da3 ]
With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was observed that was immediately preceded by the following kernel messages:
ahci 0000:28:00.0: Using 64-bit DMA addresses ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000]
The first message is produced by code in drivers/iommu/dma-iommu.c which is accompanied by the following comment that seems to apply:
/* * Try to use all the 32-bit PCI addresses first. The original SAC vs. * DAC reasoning loses relevance with PCIe, but enough hardware and * firmware bugs are still lurking out there that it's safest not to * venture into the 64-bit space until necessary. * * If your device goes wrong after seeing the notice then likely either * its driver is not setting DMA masks accurately, the hardware has * some inherent bug in handling >32-bit addresses, or not all the * expected address bits are wired up between the device and the IOMMU. */
Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual address 0xffffffff00000000 produces the following I/O page faults:
vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010]
Note that the upper 21 bits of the logged DMA address are zero. (When asking a different PCIe device in the same PCIe slot to DMA to the same I/O virtual address, we do see all the upper 32 bits of the DMA address as 1, so this is not an issue with the chipset or IOMMU configuration on the test system.)
Also, hacking libahci to always set the upper 21 bits of all DMA addresses to 1 produces no discernible effect on the behavior of the ASM1061, and mkfs/mount/scrub/etc work as without this hack.
This all strongly suggests that the ASM1061 has a 43 bit DMA address limit, and this commit therefore adds a quirk to deal with this limit.
This issue probably applies to (some of) the other supported ASMedia parts as well, but we limit it to the PCI IDs known to refer to ASM1061 parts, as that's the only part we know for sure to be affected by this issue at this point.
Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/ Signed-off-by: Lennert Buytenhek kernel@wantstofly.org [cassel: drop date from error messages in commit log] Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 29 +++++++++++++++++++++++------ drivers/ata/ahci.h | 1 + 2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 5db3dc45bdc4e..84c7519dddb19 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -48,6 +48,7 @@ enum { enum board_ids { /* board IDs by feature in alphabetical order */ board_ahci, + board_ahci_43bit_dma, board_ahci_ign_iferr, board_ahci_mobile, board_ahci_nomsi, @@ -126,6 +127,13 @@ static const struct ata_port_info ahci_port_info[] = { .udma_mask = ATA_UDMA6, .port_ops = &ahci_ops, }, + [board_ahci_43bit_dma] = { + AHCI_HFLAGS (AHCI_HFLAG_43BIT_ONLY), + .flags = AHCI_FLAG_COMMON, + .pio_mask = ATA_PIO4, + .udma_mask = ATA_UDMA6, + .port_ops = &ahci_ops, + }, [board_ahci_ign_iferr] = { AHCI_HFLAGS (AHCI_HFLAG_IGN_IRQ_IF_ERR), .flags = AHCI_FLAG_COMMON, @@ -561,11 +569,11 @@ static const struct pci_device_id ahci_pci_tbl[] = { { PCI_VDEVICE(PROMISE, 0x3f20), board_ahci }, /* PDC42819 */ { PCI_VDEVICE(PROMISE, 0x3781), board_ahci }, /* FastTrak TX8660 ahci-mode */
- /* Asmedia */ + /* ASMedia */ { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci }, /* ASM1060 */ { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci }, /* ASM1060 */ - { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci }, /* ASM1061 */ - { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci }, /* ASM1062 */ + { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci_43bit_dma }, /* ASM1061 */ + { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci_43bit_dma }, /* ASM1061/1062 */ { PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci }, /* ASM1061R */ { PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci }, /* ASM1062R */
@@ -916,11 +924,20 @@ static int ahci_pci_device_resume(struct device *dev)
#endif /* CONFIG_PM */
-static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac) +static int ahci_configure_dma_masks(struct pci_dev *pdev, + struct ahci_host_priv *hpriv) { - const int dma_bits = using_dac ? 64 : 32; + int dma_bits; int rc;
+ if (hpriv->cap & HOST_CAP_64) { + dma_bits = 64; + if (hpriv->flags & AHCI_HFLAG_43BIT_ONLY) + dma_bits = 43; + } else { + dma_bits = 32; + } + /* * If the device fixup already set the dma_mask to some non-standard * value, don't extend it here. This happens on STA2X11, for example. @@ -1880,7 +1897,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) ahci_gtf_filter_workaround(host);
/* initialize adapter */ - rc = ahci_configure_dma_masks(pdev, hpriv->cap & HOST_CAP_64); + rc = ahci_configure_dma_masks(pdev, hpriv); if (rc) return rc;
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h index 36dac58b5c413..bb1e52212f644 100644 --- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h @@ -244,6 +244,7 @@ enum { AHCI_HFLAG_IGN_NOTSUPP_POWER_ON = BIT(27), /* ignore -EOPNOTSUPP from phy_power_on() */ AHCI_HFLAG_NO_SXS = BIT(28), /* SXS not supported */ + AHCI_HFLAG_43BIT_ONLY = BIT(29), /* 43bit DMA addr limit */
/* ap->flags bits */
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 4530b3660d396a646aad91a787b6ab37cf604b53 ]
Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse.
ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap. ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 0745330228cf0..1af9daaff8ba2 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1802,6 +1802,9 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac, return err;
ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) + goto out; + max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
if (max > 0) { @@ -1809,6 +1812,7 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac, ext4_mb_use_best_found(ac, e4b); }
+out: ext4_unlock_group(ac->ac_sb, group); ext4_mb_unload_buddy(e4b);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 832698373a25950942c04a512daa652c18a9b513 ]
Places the logic for checking if the group's block bitmap is corrupt under the protection of the group lock to avoid allocating blocks from the group with a corrupted block bitmap.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20240104142040.2835097-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 1af9daaff8ba2..e823731110e3e 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1839,12 +1839,10 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac, if (err) return err;
- if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) { - ext4_mb_unload_buddy(e4b); - return 0; - } - ext4_lock_group(ac->ac_sb, group); + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) + goto out; + max = mb_find_extent(e4b, ac->ac_g_ex.fe_start, ac->ac_g_ex.fe_len, &ex); ex.fe_logical = 0xDEADFA11; /* debug value */ @@ -1877,6 +1875,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac, ac->ac_b_ex = ex; ext4_mb_use_best_found(ac, e4b); } +out: ext4_unlock_group(ac->ac_sb, group); ext4_mb_unload_buddy(e4b);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Blumenstingl martin.blumenstingl@googlemail.com
[ Upstream commit c92688cac239794e4a1d976afa5203a4d3a2ac0e ]
Continuous regulators can be configured to operate only in a certain duty cycle range (for example from 0..91%). Add a check to error out if the duty cycle translates to an unsupported (or out of range) voltage.
Suggested-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Link: https://msgid.link/r/20240113224628.377993-2-martin.blumenstingl@googlemail.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/regulator/pwm-regulator.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c index 0a9d61a91f436..1b06aaaaf8b8e 100644 --- a/drivers/regulator/pwm-regulator.c +++ b/drivers/regulator/pwm-regulator.c @@ -158,6 +158,9 @@ static int pwm_regulator_get_voltage(struct regulator_dev *rdev) pwm_get_state(drvdata->pwm, &pstate);
voltage = pwm_get_relative_duty_cycle(&pstate, duty_unit); + if (voltage < min(max_uV_duty, min_uV_duty) || + voltage > max(max_uV_duty, min_uV_duty)) + return -ENOTRECOVERABLE;
/* * The dutycycle for min_uV might be greater than the one for max_uV.
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guixin Liu kanie@linux.alibaba.com
[ Upstream commit 47c5dd66c1840524572dcdd956f4af2bdb6fbdff ]
The nvmet_tcp_queue_ida should be destroy when the nvmet-tcp module exit.
Signed-off-by: Guixin Liu kanie@linux.alibaba.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/tcp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index be9e976575578..d40bd57537ba1 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -1817,6 +1817,7 @@ static void __exit nvmet_tcp_exit(void) flush_scheduled_work();
destroy_workqueue(nvmet_tcp_wq); + ida_destroy(&nvmet_tcp_queue_ida); }
module_init(nvmet_tcp_init);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit 0adf963b8463faa44653e22e56ce55f747e68868 ]
The SPDIF hardware block found in the H616 SoC has the same layout as the one found in the H6 SoC, except that it is missing the receiver side.
Since the driver currently only supports the transmit function, support for the H616 is identical to what is currently done for the H6.
Signed-off-by: Chen-Yu Tsai wens@csie.org Reviewed-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Jernej Skrabec jernej.skrabec@gmail.com Link: https://msgid.link/r/20240127163247.384439-4-wens@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sunxi/sun4i-spdif.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/sound/soc/sunxi/sun4i-spdif.c b/sound/soc/sunxi/sun4i-spdif.c index cbe598b0fb107..680d64e0d69f4 100644 --- a/sound/soc/sunxi/sun4i-spdif.c +++ b/sound/soc/sunxi/sun4i-spdif.c @@ -464,6 +464,11 @@ static const struct of_device_id sun4i_spdif_of_match[] = { .compatible = "allwinner,sun50i-h6-spdif", .data = &sun50i_h6_spdif_quirks, }, + { + .compatible = "allwinner,sun50i-h616-spdif", + /* Essentially the same as the H6, but without RX */ + .data = &sun50i_h6_spdif_quirks, + }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, sun4i_spdif_of_match);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 6e348067ee4bc5905e35faa3a8fafa91c9124bc7 ]
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)". However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the conntrack entry(ct).
Because of that, if the ct in Router disappears for some reason in [1] with the packet sequence like below:
Client > Server: sctp (1) [INIT] [init tag: 3201533963] Server > Client: sctp (1) [INIT ACK] [init tag: 972498433] Client > Server: sctp (1) [COOKIE ECHO] Server > Client: sctp (1) [COOKIE ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057809] Server > Client: sctp (1) [SACK] [cum ack 3075057809] Server > Client: sctp (1) [HB REQ] (the ct in Router disappears somehow) <-------- [1] Client > Server: sctp (1) [HB ACK] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810] Client > Server: sctp (1) [HB REQ] Client > Server: sctp (1) [ABORT]
when processing HB ACK packet in Router it calls sctp_new() to initialize the new ct with vtag[REPLY] set to HB_ACK packet's vtag.
Later when sending DATA from Client, all the SACKs from Server will get dropped in Router, as the SACK packet's vtag does not match vtag[REPLY] in the ct. The worst thing is the vtag in this ct will never get fixed by the upcoming packets from Server.
This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY] in the ct in sctp_new() as the annotation says. With this fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the next HB REQ/ACK from Server is able to fix the vtag as its value is 0 in nf_conntrack_sctp_packet().
Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_conntrack_proto_sctp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c index e7545bcca805e..6b2a215b27862 100644 --- a/net/netfilter/nf_conntrack_proto_sctp.c +++ b/net/netfilter/nf_conntrack_proto_sctp.c @@ -299,7 +299,7 @@ sctp_new(struct nf_conn *ct, const struct sk_buff *skb, pr_debug("Setting vtag %x for secondary conntrack\n", sh->vtag); ct->proto.sctp.vtag[IP_CT_DIR_ORIGINAL] = sh->vtag; - } else { + } else if (sch->type == SCTP_CID_SHUTDOWN_ACK) { /* If it is a shutdown ack OOTB packet, we expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8) */ pr_debug("Setting vtag %x for new conn OOTB\n",
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit 3146345c2e9c2f661527054e402b0cfad80105a4 ]
When the target port has not active port binding, there is no point in trying to process the command as it has to fail anyway. Instead adding checks to all commands abort the command early.
Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Daniel Wagner dwagner@suse.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/fc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c index f74fc6481731d..2dd39299fba07 100644 --- a/drivers/nvme/target/fc.c +++ b/drivers/nvme/target/fc.c @@ -796,6 +796,9 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport) int idx; bool needrandom = true;
+ if (!tgtport->pe) + return NULL; + assoc = kzalloc(sizeof(*assoc), GFP_KERNEL); if (!assoc) return NULL; @@ -2180,8 +2183,9 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
fod->req.cmd = &fod->cmdiubuf.sqe; fod->req.cqe = &fod->rspiubuf.cqe; - if (tgtport->pe) - fod->req.port = tgtport->pe->port; + if (!tgtport->pe) + goto transport_error; + fod->req.port = tgtport->pe->port;
/* clear any response payload */ memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 34cf8c657cf0365791cdc658ddbca9cc907726ce ]
Currently, coretemp driver supports only 128 cores per package. This loses some core temperature information on systems that have more than 128 cores per package. [ 58.685033] coretemp coretemp.0: Adding Core 128 failed [ 58.692009] coretemp coretemp.0: Adding Core 129 failed ...
Enlarge the limitation to 512 because there are platforms with more than 256 cores per package.
Signed-off-by: Zhang Rui rui.zhang@intel.com Link: https://lore.kernel.org/r/20240202092144.71180-4-rui.zhang@intel.com Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/coretemp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c index ecee12d0346b2..04acb8274fdf8 100644 --- a/drivers/hwmon/coretemp.c +++ b/drivers/hwmon/coretemp.c @@ -40,7 +40,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define PKG_SYSFS_ATTR_NO 1 /* Sysfs attribute for package temp */ #define BASE_SYSFS_ATTR_NO 2 /* Sysfs Base attr no for coretemp */ -#define NUM_REAL_CORES 128 /* Number of Real cores per cpu */ +#define NUM_REAL_CORES 512 /* Number of Real cores per cpu */ #define CORETEMP_NAME_LENGTH 28 /* String Length of attrs */ #define MAX_CORE_ATTRS 4 /* Maximum no of basic attrs */ #define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hannes Reinecke hare@suse.de
[ Upstream commit d6c1b19153f92e95e5e1801d540e98771053afae ]
LUNs going into "failed ready running" state observed on >1T and on even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when DIF is enabled at the host.
The kernel logs:
Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
The host lpfc driver is failing to setup scatter/gather list (protection data) for the I/Os.
The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes the compiler to remove the most significant bit. Use an unsigned type instead.
Signed-off-by: Hannes Reinecke hare@suse.de [dwagner: added commit message] Signed-off-by: Daniel Wagner dwagner@suse.de Link: https://lore.kernel.org/r/20231220162658.12392-1-dwagner@suse.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/lpfc/lpfc_scsi.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c index cbab15d299ca2..816235ccd2992 100644 --- a/drivers/scsi/lpfc/lpfc_scsi.c +++ b/drivers/scsi/lpfc/lpfc_scsi.c @@ -1942,7 +1942,7 @@ lpfc_bg_setup_bpl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, * * Returns the number of SGEs added to the SGL. **/ -static int +static uint32_t lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct sli4_sge *sgl, int datasegcnt, struct lpfc_io_buf *lpfc_cmd) @@ -1950,8 +1950,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct scatterlist *sgde = NULL; /* s/g data entry */ struct sli4_sge_diseed *diseed = NULL; dma_addr_t physaddr; - int i = 0, num_sge = 0, status; - uint32_t reftag; + int i = 0, status; + uint32_t reftag, num_sge = 0; uint8_t txop, rxop; #ifdef CONFIG_SCSI_LPFC_DEBUG_FS uint32_t rc; @@ -2122,7 +2122,7 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc, * * Returns the number of SGEs added to the SGL. **/ -static int +static uint32_t lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, struct sli4_sge *sgl, int datacnt, int protcnt, struct lpfc_io_buf *lpfc_cmd) @@ -2146,8 +2146,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc, uint32_t rc; #endif uint32_t checking = 1; - uint32_t dma_offset = 0; - int num_sge = 0, j = 2; + uint32_t dma_offset = 0, num_sge = 0; + int j = 2; struct sli4_hybrid_sgl *sgl_xtra = NULL;
sgpe = scsi_prot_sglist(sc);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Sakamoto o-takashi@sakamocchi.jp
[ Upstream commit 7ed4380009e96d9e9c605e12822e987b35b05648 ]
If we are bus manager and the bus has inconsistent gap counts, send a bus reset immediately instead of trying to read the root node's config ROM first. Otherwise, we could spend a lot of time trying to read the config ROM but never succeeding.
This eliminates a 50+ second delay before the FireWire bus is usable after a newly connected device is powered on in certain circumstances.
The delay occurs if a gap count inconsistency occurs, we are not the root node, and we become bus manager. One scenario that causes this is with a TI XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being connected to the FireWire cable. In this configuration, the Linux box will not receive the initial PHY configuration packet sent by the DSR-25 as IRM, resulting in the DSR-25 having a gap count of 44 while the Linux box has a gap count of 63.
FireWire devices have a gap count parameter, which is set to 63 on power-up and can be changed with a PHY configuration packet. This determines the duration of the subaction and arbitration gaps. For reliable communication, all nodes on a FireWire bus must have the same gap count.
A node may have zero or more of the following roles: root node, bus manager (BM), isochronous resource manager (IRM), and cycle master. Unless a root node was forced with a PHY configuration packet, any node might become root node after a bus reset. Only the root node can become cycle master. If the root node is not cycle master capable, the BM or IRM should force a change of root node.
After a bus reset, each node sends a self-ID packet, which contains its current gap count. A single bus reset does not change the gap count, but two bus resets in a row will set the gap count to 63. Because a consistent gap count is required for reliable communication, IEEE 1394a-2000 requires that the bus manager generate a bus reset if it detects that the gap count is inconsistent.
When the gap count is inconsistent, build_tree() will notice this after the self identification process. It will set card->gap_count to the invalid value 0. If we become bus master, this will force bm_work() to send a bus reset when it performs gap count optimization.
After a bus reset, there is no bus manager. We will almost always try to become bus manager. Once we become bus manager, we will first determine whether the root node is cycle master capable. Then, we will determine if the gap count should be changed. If either the root node or the gap count should be changed, we will generate a bus reset.
To determine if the root node is cycle master capable, we read its configuration ROM. bm_work() will wait until we have finished trying to read the configuration ROM.
However, an inconsistent gap count can make this take a long time. read_config_rom() will read the first few quadlets from the config ROM. Due to the gap count inconsistency, eventually one of the reads will time out. When read_config_rom() fails, fw_device_init() calls it again until MAX_RETRIES is reached. This takes 50+ seconds.
Once we give up trying to read the configuration ROM, bm_work() will wake up, assume that the root node is not cycle master capable, and do a bus reset. Hopefully, this will resolve the gap count inconsistency.
This change makes bm_work() check for an inconsistent gap count before waiting for the root node's configuration ROM. If the gap count is inconsistent, bm_work() will immediately do a bus reset. This eliminates the 50+ second delay and rapidly brings the bus to a working state.
I considered that if the gap count is inconsistent, a PHY configuration packet might not be successful, so it could be desirable to skip the PHY configuration packet before the bus reset in this case. However, IEEE 1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY configuration packet before a bus reset when correcting a gap count error. Since the standard endorses this, I decided it's safe to retain the PHY configuration packet transmission.
Normally, after a topology change, we will reset the bus a maximum of 5 times to change the root node and perform gap count optimization. However, if there is a gap count inconsistency, we must always generate a bus reset. Otherwise the gap count inconsistency will persist and communication will be unreliable. For that reason, if there is a gap count inconstency, we generate a bus reset even if we already reached the 5 reset limit.
Signed-off-by: Adam Goldman adamg@pobox.com Reference: https://sourceforge.net/p/linux1394/mailman/message/58727806/ Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firewire/core-card.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/firewire/core-card.c b/drivers/firewire/core-card.c index f3b3953cac834..be195ba834632 100644 --- a/drivers/firewire/core-card.c +++ b/drivers/firewire/core-card.c @@ -429,7 +429,23 @@ static void bm_work(struct work_struct *work) */ card->bm_generation = generation;
- if (root_device == NULL) { + if (card->gap_count == 0) { + /* + * If self IDs have inconsistent gap counts, do a + * bus reset ASAP. The config rom read might never + * complete, so don't wait for it. However, still + * send a PHY configuration packet prior to the + * bus reset. The PHY configuration packet might + * fail, but 1394-2008 8.4.5.2 explicitly permits + * it in this case, so it should be safe to try. + */ + new_root_id = local_id; + /* + * We must always send a bus reset if the gap count + * is inconsistent, so bypass the 5-reset limit. + */ + card->bm_retries = 0; + } else if (root_device == NULL) { /* * Either link_on is false, or we failed to read the * config rom. In either case, pick another root.
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Sun yi.sun@unisoc.com
[ Upstream commit 4ce6e2db00de8103a0687fb0f65fd17124a51aaa ]
Ensure no remaining requests in virtqueues before resetting vdev and deleting virtqueues. Otherwise these requests will never be completed. It may cause the system to become unresponsive.
Function blk_mq_quiesce_queue() can ensure that requests have become in_flight status, but it cannot guarantee that requests have been processed by the device. Virtqueues should never be deleted before all requests become complete status.
Function blk_mq_freeze_queue() ensure that all requests in virtqueues become complete status. And no requests can enter in virtqueues.
Signed-off-by: Yi Sun yi.sun@unisoc.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/virtio_blk.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 9b3ea86c20e5e..3afc07b59477b 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -1063,14 +1063,15 @@ static int virtblk_freeze(struct virtio_device *vdev) { struct virtio_blk *vblk = vdev->priv;
+ /* Ensure no requests in virtqueues before deleting vqs. */ + blk_mq_freeze_queue(vblk->disk->queue); + /* Ensure we don't receive any more interrupts */ vdev->config->reset(vdev);
/* Make sure no work handler is accessing the device. */ flush_work(&vblk->config_work);
- blk_mq_quiesce_queue(vblk->disk->queue); - vdev->config->del_vqs(vdev); kfree(vblk->vqs);
@@ -1088,7 +1089,7 @@ static int virtblk_restore(struct virtio_device *vdev)
virtio_device_ready(vdev);
- blk_mq_unquiesce_queue(vblk->disk->queue); + blk_mq_unfreeze_queue(vblk->disk->queue); return 0; } #endif
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandra Winter wintera@linux.ibm.com
[ Upstream commit 2fe8a236436fe40d8d26a1af8d150fc80f04ee1a ]
Symptom: In case of a bad cable connection (e.g. dirty optics) a fast sequence of network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth interface. In case of a second DOWN while recovery is still ongoing, it can happen that the IP@ of a Layer3 qeth interface is lost and will not be recovered by the second UP.
Problem: When registration of IP addresses with Layer 3 qeth devices fails, (e.g. because of bad address format) the respective IP address is deleted from its hash-table in the driver. If registration fails because of a ENETDOWN condition, the address should stay in the hashtable, so a subsequent recovery can restore it.
3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure") fixes this for registration failures during normal operation, but not during recovery.
Solution: Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE, i.e. for some reason the card already/still has this address registered.
Fixes: 4a71df50047f ("qeth: new qeth device driver") Cc: stable@vger.kernel.org Signed-off-by: Alexandra Winter wintera@linux.ibm.com Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/net/qeth_l3_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 8dee16aca421f..fe2bea2409427 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -305,9 +305,10 @@ static void qeth_l3_clear_ip_htable(struct qeth_card *card, int recover) if (!recover) { hash_del(&addr->hnode); kfree(addr); - continue; + } else { + /* prepare for recovery */ + addr->disp_flag = QETH_DISP_ADDR_ADD; } - addr->disp_flag = QETH_DISP_ADDR_ADD; }
mutex_unlock(&card->ip_lock); @@ -335,11 +336,13 @@ static void qeth_l3_recover_ip(struct qeth_card *card) } else rc = qeth_l3_register_addr_entry(card, addr);
- if (!rc) { + if (!rc || rc == -EADDRINUSE || rc == -ENETDOWN) { + /* keep it in the records */ addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING; if (addr->ref_counter < 1) qeth_l3_delete_ip(card, addr); } else { + /* bad address */ hash_del(&addr->hnode); kfree(addr); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit f0e4a1356466ec1858ae8e5c70bea2ce5e55008b ]
The power domain containing the Cortex-R7 CPU core on the R-Car V3H SoC must always be in power-on state, unlike on other SoCs in the R-Car Gen3 family. See Table 9.4 "Power domains" in the R-Car Series, 3rd Generation Hardware User’s Manual Rev.1.00 and later.
Fix this by marking the domain as a CPU domain without control registers, so the driver will not touch it.
Fixes: 41d6d8bd8ae9 ("soc: renesas: rcar-sysc: add R8A77980 support") Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/fdad9a86132d53ecddf72b734dac406915c4edc0.170507673... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/renesas/r8a77980-sysc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/renesas/r8a77980-sysc.c b/drivers/soc/renesas/r8a77980-sysc.c index a8dbe55e8ba82..3d1ea245681b0 100644 --- a/drivers/soc/renesas/r8a77980-sysc.c +++ b/drivers/soc/renesas/r8a77980-sysc.c @@ -25,7 +25,8 @@ static const struct rcar_sysc_area r8a77980_areas[] __initconst = { PD_CPU_NOCR }, { "ca53-cpu3", 0x200, 3, R8A77980_PD_CA53_CPU3, R8A77980_PD_CA53_SCU, PD_CPU_NOCR }, - { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON }, + { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON, + PD_CPU_NOCR }, { "a3ir", 0x180, 0, R8A77980_PD_A3IR, R8A77980_PD_ALWAYS_ON }, { "a2ir0", 0x400, 0, R8A77980_PD_A2IR0, R8A77980_PD_A3IR }, { "a2ir1", 0x400, 1, R8A77980_PD_A2IR1, R8A77980_PD_A3IR },
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 77c3c95637526f1e4330cc9a4b2065f668c2c4fe ]
unlocked version of protocol level close, will be used by MPTCP to allow decouple orphaning and subflow level close.
Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/tcp.h | 1 + net/ipv4/tcp.c | 9 +++++++-- 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index af67e19eba392..164ba7b77bd9f 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -391,6 +391,7 @@ void tcp_update_metrics(struct sock *sk); void tcp_init_metrics(struct sock *sk); void tcp_metrics_init(void); bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst); +void __tcp_close(struct sock *sk, long timeout); void tcp_close(struct sock *sk, long timeout); void tcp_init_sock(struct sock *sk); void tcp_init_transfer(struct sock *sk, int bpf_op); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 53a8522adf681..6a52fdcf9e4ef 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2351,13 +2351,12 @@ bool tcp_check_oom(struct sock *sk, int shift) return too_many_orphans || out_of_socket_memory; }
-void tcp_close(struct sock *sk, long timeout) +void __tcp_close(struct sock *sk, long timeout) { struct sk_buff *skb; int data_was_unread = 0; int state;
- lock_sock(sk); sk->sk_shutdown = SHUTDOWN_MASK;
if (sk->sk_state == TCP_LISTEN) { @@ -2521,6 +2520,12 @@ void tcp_close(struct sock *sk, long timeout) out: bh_unlock_sock(sk); local_bh_enable(); +} + +void tcp_close(struct sock *sk, long timeout) +{ + lock_sock(sk); + __tcp_close(sk, timeout); release_sock(sk); sock_put(sk); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Soheil Hassas Yeganeh soheil@google.com
[ Upstream commit 8ba3c9d1c6d75d1e6af2087278b30e17f68e1fff ]
If there was any event available on the TCP socket, tcp_poll() will be called to retrieve all the events. In tcp_poll(), we call sk_stream_is_writeable() which returns true as long as we are at least one byte below notsent_lowat. This will result in quite a few spurious EPLLOUT and frequent tiny sendmsg() calls as a result.
Similar to sk_stream_write_space(), use __sk_stream_is_writeable with a wake value of 1, so that we set EPOLLOUT only if half the space is available for write.
Signed-off-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6a52fdcf9e4ef..e45c09977c600 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -566,7 +566,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) mask |= EPOLLIN | EPOLLRDNORM;
if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { - if (sk_stream_is_writeable(sk)) { + if (__sk_stream_is_writeable(sk, 1)) { mask |= EPOLLOUT | EPOLLWRNORM; } else { /* send SIGIO later */ sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk); @@ -578,7 +578,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) * pairs with the input side. */ smp_mb__after_atomic(); - if (sk_stream_is_writeable(sk)) + if (__sk_stream_is_writeable(sk, 1)) mask |= EPOLLOUT | EPOLLWRNORM; } } else
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit e14cadfd80d76f01bfaa1a8d745b1db19b57d6be ]
Now sk->sk_shutdown is no longer a bitfield, we can add standard READ_ONCE()/WRITE_ONCE() annotations to silence KCSAN reports like the following:
BUG: KCSAN: data-race in tcp_disconnect / tcp_poll
write to 0xffff88814588582c of 1 bytes by task 3404 on cpu 1: tcp_disconnect+0x4d6/0xdb0 net/ipv4/tcp.c:3121 __inet_stream_connect+0x5dd/0x6e0 net/ipv4/af_inet.c:715 inet_stream_connect+0x48/0x70 net/ipv4/af_inet.c:727 __sys_connect_file net/socket.c:2001 [inline] __sys_connect+0x19b/0x1b0 net/socket.c:2018 __do_sys_connect net/socket.c:2028 [inline] __se_sys_connect net/socket.c:2025 [inline] __x64_sys_connect+0x41/0x50 net/socket.c:2025 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88814588582c of 1 bytes by task 3374 on cpu 0: tcp_poll+0x2e6/0x7d0 net/ipv4/tcp.c:562 sock_poll+0x253/0x270 net/socket.c:1383 vfs_poll include/linux/poll.h:88 [inline] io_poll_check_events io_uring/poll.c:281 [inline] io_poll_task_func+0x15a/0x820 io_uring/poll.c:333 handle_tw_list io_uring/io_uring.c:1184 [inline] tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246 task_work_run+0x123/0x160 kernel/task_work.c:179 get_signal+0xe64/0xff0 kernel/signal.c:2635 arch_do_signal_or_restart+0x89/0x2a0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop+0x6f/0xe0 kernel/entry/common.c:168 exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297 do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x03 -> 0x00
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/af_inet.c | 2 +- net/ipv4/tcp.c | 14 ++++++++------ net/ipv4/tcp_input.c | 4 ++-- 3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index e05cdc6088507..d7ebee3c048d5 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -876,7 +876,7 @@ int inet_shutdown(struct socket *sock, int how) EPOLLHUP, even on eg. unconnected UDP sockets -- RR */ /* fall through */ default: - sk->sk_shutdown |= how; + WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | how); if (sk->sk_prot->shutdown) sk->sk_prot->shutdown(sk, how); break; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e45c09977c600..8d7933989de0e 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -505,6 +505,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) __poll_t mask; struct sock *sk = sock->sk; const struct tcp_sock *tp = tcp_sk(sk); + u8 shutdown; int state;
sock_poll_wait(file, sock, wait); @@ -547,9 +548,10 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) * NOTE. Check for TCP_CLOSE is added. The goal is to prevent * blocking on fresh not-connected or disconnected socket. --ANK */ - if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE) + shutdown = READ_ONCE(sk->sk_shutdown); + if (shutdown == SHUTDOWN_MASK || state == TCP_CLOSE) mask |= EPOLLHUP; - if (sk->sk_shutdown & RCV_SHUTDOWN) + if (shutdown & RCV_SHUTDOWN) mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
/* Connected or passive Fast Open socket? */ @@ -565,7 +567,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait) if (tcp_stream_is_readable(tp, target, sk)) mask |= EPOLLIN | EPOLLRDNORM;
- if (!(sk->sk_shutdown & SEND_SHUTDOWN)) { + if (!(shutdown & SEND_SHUTDOWN)) { if (__sk_stream_is_writeable(sk, 1)) { mask |= EPOLLOUT | EPOLLWRNORM; } else { /* send SIGIO later */ @@ -2357,7 +2359,7 @@ void __tcp_close(struct sock *sk, long timeout) int data_was_unread = 0; int state;
- sk->sk_shutdown = SHUTDOWN_MASK; + WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
if (sk->sk_state == TCP_LISTEN) { tcp_set_state(sk, TCP_CLOSE); @@ -2629,7 +2631,7 @@ int tcp_disconnect(struct sock *sk, int flags) if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) inet_reset_saddr(sk);
- sk->sk_shutdown = 0; + WRITE_ONCE(sk->sk_shutdown, 0); sock_reset_flag(sk, SOCK_DONE); tp->srtt_us = 0; tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT); @@ -3905,7 +3907,7 @@ void tcp_done(struct sock *sk) if (req) reqsk_fastopen_remove(sk, req, false);
- sk->sk_shutdown = SHUTDOWN_MASK; + WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
if (!sock_flag(sk, SOCK_DEAD)) sk->sk_state_change(sk); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 982fe464156a4..61243531a7f4c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4216,7 +4216,7 @@ void tcp_fin(struct sock *sk)
inet_csk_schedule_ack(sk);
- sk->sk_shutdown |= RCV_SHUTDOWN; + WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | RCV_SHUTDOWN); sock_set_flag(sk, SOCK_DONE);
switch (sk->sk_state) { @@ -6354,7 +6354,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) break;
tcp_set_state(sk, TCP_FIN_WAIT2); - sk->sk_shutdown |= SEND_SHUTDOWN; + WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | SEND_SHUTDOWN);
sk_dst_confirm(sk);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lee Jones lee.jones@linaro.org
[ Upstream commit e1524ea84af7172acc20827f8dca3fc8f72b8f37 ]
Demote headers which are clearly not kerneldoc, provide titles for struct definition blocks, fix API slip (bitrot) misspellings and provide some missing entries.
Fixes the following W=1 kernel build warning(s):
drivers/pinctrl/pinctrl-rockchip.c:82: warning: cannot understand function prototype: 'struct rockchip_iomux ' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_IO_DEFAULT' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_IO_1V8_OR_3V0' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_IO_1V8_ONLY' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_IO_1V8_3V0_AUTO' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_IO_3V3_ONLY' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:97: warning: Enum value 'DRV_TYPE_MAX' not described in enum 'rockchip_pin_drv_type' drivers/pinctrl/pinctrl-rockchip.c:106: warning: Enum value 'PULL_TYPE_IO_DEFAULT' not described in enum 'rockchip_pin_pull_type' drivers/pinctrl/pinctrl-rockchip.c:106: warning: Enum value 'PULL_TYPE_IO_1V8_ONLY' not described in enum 'rockchip_pin_pull_type' drivers/pinctrl/pinctrl-rockchip.c:106: warning: Enum value 'PULL_TYPE_MAX' not described in enum 'rockchip_pin_pull_type' drivers/pinctrl/pinctrl-rockchip.c:109: warning: Cannot understand * @drv_type: drive strength variant using rockchip_perpin_drv_type on line 109 - I thought it was a doc line drivers/pinctrl/pinctrl-rockchip.c:122: warning: Cannot understand * @reg_base: register base of the gpio bank on line 109 - I thought it was a doc line drivers/pinctrl/pinctrl-rockchip.c:325: warning: Function parameter or member 'route_location' not described in 'rockchip_mux_route_data' drivers/pinctrl/pinctrl-rockchip.c:328: warning: Cannot understand */ on line 109 - I thought it was a doc line drivers/pinctrl/pinctrl-rockchip.c:375: warning: Function parameter or member 'data' not described in 'rockchip_pin_group' drivers/pinctrl/pinctrl-rockchip.c:387: warning: Function parameter or member 'ngroups' not described in 'rockchip_pmx_func'
Signed-off-by: Lee Jones lee.jones@linaro.org Reviewed-by: Heiko Stuebner heiko@sntech.de Cc: Heiko Stuebner heiko@sntech.de Cc: Jean-Christophe PLAGNIOL-VILLARD plagnioj@jcrosoft.com Cc: linux-rockchip@lists.infradead.org Link: https://lore.kernel.org/r/20200713144930.1034632-20-lee.jones@linaro.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/pinctrl-rockchip.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index 4b972be3487f9..a44c2680c4230 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -62,7 +62,7 @@ enum rockchip_pinctrl_type { RK3399, };
-/** +/* * Encode variants of iomux registers into a type variable */ #define IOMUX_GPIO_ONLY BIT(0) @@ -72,6 +72,7 @@ enum rockchip_pinctrl_type { #define IOMUX_WIDTH_3BIT BIT(4)
/** + * struct rockchip_iomux * @type: iomux variant using IOMUX_* constants * @offset: if initialized to -1 it will be autocalculated, by specifying * an initial offset value the relevant source offset can be reset @@ -82,7 +83,7 @@ struct rockchip_iomux { int offset; };
-/** +/* * enum type index corresponding to rockchip_perpin_drv_list arrays index. */ enum rockchip_pin_drv_type { @@ -94,7 +95,7 @@ enum rockchip_pin_drv_type { DRV_TYPE_MAX };
-/** +/* * enum type index corresponding to rockchip_pull_list arrays index. */ enum rockchip_pin_pull_type { @@ -104,6 +105,7 @@ enum rockchip_pin_pull_type { };
/** + * struct rockchip_drv * @drv_type: drive strength variant using rockchip_perpin_drv_type * @offset: if initialized to -1 it will be autocalculated, by specifying * an initial offset value the relevant source offset can be reset @@ -117,8 +119,9 @@ struct rockchip_drv { };
/** + * struct rockchip_pin_bank * @reg_base: register base of the gpio bank - * @reg_pull: optional separate register for additional pull settings + * @regmap_pull: optional separate register for additional pull settings * @clk: clock of the gpio bank * @irq: interrupt of the gpio bank * @saved_masks: Saved content of GPIO_INTEN at suspend time. @@ -136,6 +139,8 @@ struct rockchip_drv { * @gpio_chip: gpiolib chip * @grange: gpio range * @slock: spinlock for the gpio bank + * @toggle_edge_mode: bit mask to toggle (falling/rising) edge mode + * @recalced_mask: bit mask to indicate a need to recalulate the mask * @route_mask: bits describing the routing pins of per bank */ struct rockchip_pin_bank { @@ -310,6 +315,7 @@ enum rockchip_mux_route_location { * @bank_num: bank number. * @pin: index at register or used to calc index. * @func: the min pin. + * @route_location: the mux route location (same, pmu, grf). * @route_offset: the max pin. * @route_val: the register offset. */ @@ -322,8 +328,6 @@ struct rockchip_mux_route_data { u32 route_val; };
-/** - */ struct rockchip_pin_ctrl { struct rockchip_pin_bank *pin_banks; u32 nr_banks; @@ -361,9 +365,7 @@ struct rockchip_pin_config { * @name: name of the pin group, used to lookup the group. * @pins: the pins included in this group. * @npins: number of pins included in this group. - * @func: the mux function number to be programmed when selected. - * @configs: the config values to be set for each pin - * @nconfigs: number of configs for each pin + * @data: local pin configuration */ struct rockchip_pin_group { const char *name; @@ -376,7 +378,7 @@ struct rockchip_pin_group { * struct rockchip_pmx_func: represent a pin function. * @name: name of the pin function, used to lookup the function. * @groups: one or more names of pin groups that provide this function. - * @num_groups: number of groups included in @groups. + * @ngroups: number of groups included in @groups. */ struct rockchip_pmx_func { const char *name;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit c818ae563bf99457f02e8170aabd6b174f629f65 ]
of_find_node_by_phandle() returns a node pointer with refcount incremented, We should use of_node_put() on it when not needed anymore. Add missing of_node_put() to avoid refcount leak.
Fixes: d3e5116119bd ("pinctrl: add pinctrl driver for Rockchip SoCs") Signed-off-by: Miaoqian Lin linmq006@gmail.com Link: https://lore.kernel.org/r/20230102112845.3982407-1-linmq006@gmail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/pinctrl-rockchip.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index a44c2680c4230..9388d6fac7d40 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -2536,6 +2536,7 @@ static int rockchip_pinctrl_parse_groups(struct device_node *np, np_config = of_find_node_by_phandle(be32_to_cpup(phandle)); ret = pinconf_generic_parse_dt_config(np_config, NULL, &grp->data[j].configs, &grp->data[j].nconfigs); + of_node_put(np_config); if (ret) return ret; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Stultz john.stultz@linaro.org
[ Upstream commit e2cec7d6853712295cef5377762165a489b2957f ]
When using modules, its common for the modules not to be loaded until quite late by userland. With the current code, driver_deferred_probe_check_state() will stop returning EPROBE_DEFER after late_initcall, which can cause module dependency resolution to fail after that.
So allow a longer window of 30 seconds (picked somewhat arbitrarily, but influenced by the similar regulator core timeout value) in the case where modules are enabled.
Cc: linux-pm@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Linus Walleij linus.walleij@linaro.org Cc: Thierry Reding treding@nvidia.com Cc: Mark Brown broonie@kernel.org Cc: Liam Girdwood lgirdwood@gmail.com Cc: Bjorn Andersson bjorn.andersson@linaro.org Cc: Saravana Kannan saravanak@google.com Cc: Todd Kjos tkjos@google.com Cc: Len Brown len.brown@intel.com Cc: Pavel Machek pavel@ucw.cz Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Kevin Hilman khilman@kernel.org Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Rob Herring robh@kernel.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: John Stultz john.stultz@linaro.org Link: https://lore.kernel.org/r/20200225050828.56458-3-john.stultz@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/dd.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 7941a8fd22841..0b97a0c96baa3 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -224,7 +224,16 @@ static int deferred_devs_show(struct seq_file *s, void *data) } DEFINE_SHOW_ATTRIBUTE(deferred_devs);
+#ifdef CONFIG_MODULES +/* + * In the case of modules, set the default probe timeout to + * 30 seconds to give userland some time to load needed modules + */ +static int deferred_probe_timeout = 30; +#else +/* In the case of !modules, no probe timeout needed */ static int deferred_probe_timeout = -1; +#endif static int __init deferred_probe_timeout_setup(char *str) { int timeout;
On Tue, Feb 27, 2024 at 5:27 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
From: John Stultz john.stultz@linaro.org
[ Upstream commit e2cec7d6853712295cef5377762165a489b2957f ]
When using modules, its common for the modules not to be loaded until quite late by userland. With the current code, driver_deferred_probe_check_state() will stop returning EPROBE_DEFER after late_initcall, which can cause module dependency resolution to fail after that.
So allow a longer window of 30 seconds (picked somewhat arbitrarily, but influenced by the similar regulator core timeout value) in the case where modules are enabled.
Cc: linux-pm@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Linus Walleij linus.walleij@linaro.org Cc: Thierry Reding treding@nvidia.com Cc: Mark Brown broonie@kernel.org Cc: Liam Girdwood lgirdwood@gmail.com Cc: Bjorn Andersson bjorn.andersson@linaro.org Cc: Saravana Kannan saravanak@google.com Cc: Todd Kjos tkjos@google.com Cc: Len Brown len.brown@intel.com Cc: Pavel Machek pavel@ucw.cz Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Kevin Hilman khilman@kernel.org Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Rob Herring robh@kernel.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: John Stultz john.stultz@linaro.org Link: https://lore.kernel.org/r/20200225050828.56458-3-john.stultz@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/base/dd.c | 9 +++++++++ 1 file changed, 9 insertions(+)
This change ended up being reverted upstream in ce68929f07de
Is there some specific reason it got selected to be pulled into -stable?
thanks -john
On Tue, Feb 27, 2024 at 11:38:01AM -0800, John Stultz wrote:
On Tue, Feb 27, 2024 at 5:27 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
5.4-stable review patch. If anyone has any objections, please let me know.
From: John Stultz john.stultz@linaro.org
[ Upstream commit e2cec7d6853712295cef5377762165a489b2957f ]
When using modules, its common for the modules not to be loaded until quite late by userland. With the current code, driver_deferred_probe_check_state() will stop returning EPROBE_DEFER after late_initcall, which can cause module dependency resolution to fail after that.
So allow a longer window of 30 seconds (picked somewhat arbitrarily, but influenced by the similar regulator core timeout value) in the case where modules are enabled.
Cc: linux-pm@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Linus Walleij linus.walleij@linaro.org Cc: Thierry Reding treding@nvidia.com Cc: Mark Brown broonie@kernel.org Cc: Liam Girdwood lgirdwood@gmail.com Cc: Bjorn Andersson bjorn.andersson@linaro.org Cc: Saravana Kannan saravanak@google.com Cc: Todd Kjos tkjos@google.com Cc: Len Brown len.brown@intel.com Cc: Pavel Machek pavel@ucw.cz Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Kevin Hilman khilman@kernel.org Cc: "Rafael J. Wysocki" rjw@rjwysocki.net Cc: Rob Herring robh@kernel.org Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: John Stultz john.stultz@linaro.org Link: https://lore.kernel.org/r/20200225050828.56458-3-john.stultz@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/base/dd.c | 9 +++++++++ 1 file changed, 9 insertions(+)
This change ended up being reverted upstream in ce68929f07de
Is there some specific reason it got selected to be pulled into -stable?
It was an attempt to sync up with what had been added to the 4.19.y tree but not newer kernels. I'll go drop this from 5.4.y now, thanks for noticing the revert, my scripts did not pick that up :(
Also, this is in the 4.19.y tree, but in a different form, and one that no one seemed to have problems with, so I'll leave it alone there for now.
thanks,
greg k-h
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit 2b2bf6b7faa9010fae10dc7de76627a3fdb525b3 ]
'status' is known to be 0 at this point. The expected error code is PTR_ERR(clk).
Switch to dev_err_probe() in order to display the expected error code (in a human readable way). This also filters -EPROBE_DEFER cases, should it happen.
Fixes: 1ab7f2a43558 ("staging: mt7621-spi: add mt7621 support") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Matthias Brugger matthias.bgg@gmail.com Link: https://lore.kernel.org/r/928f3fb507d53ba0774df27cea0bbba4b055993b.166159967... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-mt7621.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/spi/spi-mt7621.c b/drivers/spi/spi-mt7621.c index b4b9b7309b5e9..351b0ef52bbc8 100644 --- a/drivers/spi/spi-mt7621.c +++ b/drivers/spi/spi-mt7621.c @@ -340,11 +340,9 @@ static int mt7621_spi_probe(struct platform_device *pdev) return PTR_ERR(base);
clk = devm_clk_get(&pdev->dev, NULL); - if (IS_ERR(clk)) { - dev_err(&pdev->dev, "unable to get SYS clock, err=%d\n", - status); - return PTR_ERR(clk); - } + if (IS_ERR(clk)) + return dev_err_probe(&pdev->dev, PTR_ERR(clk), + "unable to get SYS clock\n");
status = clk_prepare_enable(clk); if (status)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov nikolay@cumulusnetworks.com
[ Upstream commit fd65e5a95d08389444e8591a20538b3edece0e15 ]
We need to clear all of the bridge private skb variables as they can be stale due to the packet being recirculated through the stack and then transmitted through the bridge device. Similar memset is already done on bridge's input. We've seen cases where proxyarp_replied was 1 on routed multicast packets transmitted through the bridge to ports with neigh suppress which were getting dropped. Same thing can in theory happen with the port isolation bit as well.
Fixes: 821f1b21cabb ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood") Signed-off-by: Nikolay Aleksandrov nikolay@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_device.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index f085b1648e66c..501f77f0f480a 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -35,6 +35,8 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev) const unsigned char *dest; u16 vid = 0;
+ memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); + rcu_read_lock(); nf_ops = rcu_dereference(nf_br_ops); if (nf_ops && nf_ops->br_dev_xmit_hook(skb)) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Björn Töpel bjorn.topel@gmail.com
[ Upstream commit c77b0589ca29ad1859fe7d7c1ecd63c0632379fa ]
Some architectures have strict alignment requirements. In that case, the BPF verifier detects if a program has unaligned accesses and rejects them. A user can pass BPF_F_ANY_ALIGNMENT to a program to override this check. That, however, will only work when a privileged user loads a program. An unprivileged user loading a program with this flag will be rejected prior entering the verifier.
Hence, it does not make sense to load unprivileged programs without strict alignment when testing the verifier. This patch avoids exactly that.
Signed-off-by: Björn Töpel bjorn.topel@gmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Acked-by: Luke Nelson luke.r.nels@gmail.com Link: https://lore.kernel.org/bpf/20201118071640.83773-3-bjorn.topel@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/test_verifier.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 43224c5ec1e9b..1bd285dc55e94 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -1091,6 +1091,19 @@ static void get_unpriv_disabled()
static bool test_as_unpriv(struct bpf_test *test) { +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS + /* Some architectures have strict alignment requirements. In + * that case, the BPF verifier detects if a program has + * unaligned accesses and rejects them. A user can pass + * BPF_F_ANY_ALIGNMENT to a program to override this + * check. That, however, will only work when a privileged user + * loads a program. An unprivileged user loading a program + * with this flag will be rejected prior entering the + * verifier. + */ + if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS) + return false; +#endif return !test->prog_type || test->prog_type == BPF_PROG_TYPE_SOCKET_FILTER || test->prog_type == BPF_PROG_TYPE_CGROUP_SKB;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai-Heng Feng kai.heng.feng@canonical.com
[ Upstream commit 3e0650ab26e2010ee312311612e40e076ed1feca ]
Though the system uses DMIC, headset mic still uses the HDA, let's use GPIO 0x1 to control the micmute LED.
The micmute LED GPIO has a different polarity to the mute LED GPIO, we can use the newly added micmute_led_polarity to indicate that.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Link: https://lore.kernel.org/r/20200430083255.5093-2-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index d8ee1b1ee7e3d..893b9fde59ac0 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4326,7 +4326,11 @@ static void alc269_fixup_hp_gpio_led(struct hda_codec *codec, static void alc285_fixup_hp_gpio_led(struct hda_codec *codec, const struct hda_fixup *fix, int action) { - alc_fixup_hp_gpio_led(codec, action, 0x04, 0x00); + struct alc_spec *spec = codec->spec; + + spec->micmute_led_polarity = 1; + + alc_fixup_hp_gpio_led(codec, action, 0x04, 0x01); }
static void alc286_fixup_hp_gpio_led(struct hda_codec *codec,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Icenowy Zheng icenowy@aosc.io
[ Upstream commit a00d17e0a71ae2e4fdaac46e1c12785d3346c3f2 ]
This reverts commit da676c6aa6413d59ab0a80c97bbc273025e640b2.
The original commit adds a start parameter to the calculation of the start delay according to some old BSP versions from Allwinner. However, there're two ways to add this delay -- add it in DSI controller or add it in the TCON. Add it in both controllers won't work.
The code before this commit is picked from new versions of BSP kernel, which has a comment for the 1 that says "put start_delay to tcon". By checking the sun4i_tcon0_mode_set_cpu() in sun4i_tcon driver, it has already added this delay, so we shouldn't repeat to add the delay in DSI controller, otherwise the timing won't match.
Signed-off-by: Icenowy Zheng icenowy@aosc.io Reviewed-by: Jagan Teki jagan@amarulasolutions.com Signed-off-by: Maxime Ripard mripard@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20191001080253.6135-2-icenowy@... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c index f2b288037b909..a18efd3055199 100644 --- a/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c +++ b/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c @@ -365,8 +365,7 @@ static void sun6i_dsi_inst_init(struct sun6i_dsi *dsi, static u16 sun6i_dsi_get_video_start_delay(struct sun6i_dsi *dsi, struct drm_display_mode *mode) { - u16 start = clamp(mode->vtotal - mode->vdisplay - 10, 8, 100); - u16 delay = mode->vtotal - (mode->vsync_end - mode->vdisplay) + start; + u16 delay = mode->vtotal - (mode->vsync_end - mode->vdisplay) + 1;
if (delay > mode->vtotal) delay = delay % mode->vtotal;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trek trek00@inbox.ru
[ Upstream commit 13238d4fa6764fa74dcf863d5f2227765b3753eb ]
Do not try to allocate any amount of memory requested by the user. Instead limit it to 128 registers. Actually the longest series of consecutive allowed registers are 48, mmGB_TILE_MODE0-31 and mmGB_MACROTILE_MODE0-15 (0x2644-0x2673).
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=111273 Signed-off-by: Trek trek00@inbox.ru Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index 26a1173df9586..1f4acb4c3efb8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -650,6 +650,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file if (info->read_mmr_reg.count > 128) return -EINVAL;
+ if (info->read_mmr_reg.count > 128) + return -EINVAL; + regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL); if (!regs) return -ENOMEM;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
[ Upstream commit d35652a5fc9944784f6f50a5c979518ff8dacf61 ]
Fei has reported that KASAN triggers during apply_alternatives() on a 5-level paging machine:
BUG: KASAN: out-of-bounds in rcu_is_watching() Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0 ... __asan_load4() rcu_is_watching() trace_hardirqs_on() text_poke_early() apply_alternatives() ...
On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57) gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on __VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled().
KASAN gets confused when apply_alternatives() patches the KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue.
Fix it for real by disabling KASAN while the kernel is patching alternatives.
[ mingo: updated the changelog ]
Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y") Reported-by: Fei Yang fei.yang@intel.com Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/alternative.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 15bad8d598949..faae8a1856709 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -374,6 +374,17 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, u8 insn_buff[MAX_PATCH_LEN];
DPRINTK("alt table %px, -> %px", start, end); + + /* + * In the case CONFIG_X86_5LEVEL=y, KASAN_SHADOW_START is defined using + * cpu_feature_enabled(X86_FEATURE_LA57) and is therefore patched here. + * During the process, KASAN becomes confused seeing partial LA57 + * conversion and triggers a false-positive out-of-bound report. + * + * Disable KASAN until the patching is complete. + */ + kasan_disable_current(); + /* * The scan order should be from start to end. A later scanned * alternative code can overwrite previously scanned alternative code. @@ -434,6 +445,8 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
text_poke_early(instr, insn_buff, insn_buff_sz); } + + kasan_enable_current(); }
#ifdef CONFIG_SMP
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit b86f4b790c998afdbc88fe1aa55cfe89c4068726 upstream.
__bio_for_each_segment assumes that the first struct bio_vec argument doesn't change - it calls "bio_advance_iter_single((bio), &(iter), (bvl).bv_len)" to advance the iterator. Unfortunately, the dm-integrity code changes the bio_vec with "bv.bv_len -= pos". When this code path is taken, the iterator would be out of sync and dm-integrity would report errors. This happens if the machine is out of memory and "kmalloc" fails.
Fix this bug by making a copy of "bv" and changing the copy instead.
Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-integrity.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index 81157801a3dc6..f3246f7407d61 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -1582,11 +1582,12 @@ static void integrity_metadata(struct work_struct *w) }
__bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) { + struct bio_vec bv_copy = bv; unsigned pos; char *mem, *checksums_ptr;
again: - mem = (char *)kmap_atomic(bv.bv_page) + bv.bv_offset; + mem = (char *)kmap_atomic(bv_copy.bv_page) + bv_copy.bv_offset; pos = 0; checksums_ptr = checksums; do { @@ -1595,7 +1596,7 @@ static void integrity_metadata(struct work_struct *w) sectors_to_process -= ic->sectors_per_block; pos += ic->sectors_per_block << SECTOR_SHIFT; sector += ic->sectors_per_block; - } while (pos < bv.bv_len && sectors_to_process && checksums != checksums_onstack); + } while (pos < bv_copy.bv_len && sectors_to_process && checksums != checksums_onstack); kunmap_atomic(mem);
r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset, @@ -1615,9 +1616,9 @@ static void integrity_metadata(struct work_struct *w) if (!sectors_to_process) break;
- if (unlikely(pos < bv.bv_len)) { - bv.bv_offset += pos; - bv.bv_len -= pos; + if (unlikely(pos < bv_copy.bv_len)) { + bv_copy.bv_offset += pos; + bv_copy.bv_len -= pos; goto again; } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Wilcox (Oracle) willy@infradead.org
[ Upstream commit 4595a298d5563cf76c1d852970f162051fd1a7a6 ]
For filesystems with block size < page size, we need to set all the per-block uptodate bits if the page was already uptodate at the time we create the per-block metadata. This can happen if the page is invalidated (eg by a write to drop_caches) but ultimately not removed from the page cache.
This is a data corruption issue as page writeback skips blocks which are marked !uptodate.
Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Reported-by: Qian Cai cai@redhat.com Cc: Brian Foster bfoster@redhat.com Reviewed-by: Gao Xiang hsiangkao@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- fs/iomap/buffered-io.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 53cd7b2bb580b..c28ba474a25e7 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -23,6 +23,7 @@ static struct iomap_page * iomap_page_create(struct inode *inode, struct page *page) { struct iomap_page *iop = to_iomap_page(page); + unsigned int nr_blocks = PAGE_SIZE / i_blocksize(inode);
if (iop || i_blocksize(inode) == PAGE_SIZE) return iop; @@ -32,6 +33,8 @@ iomap_page_create(struct inode *inode, struct page *page) atomic_set(&iop->write_count, 0); spin_lock_init(&iop->uptodate_lock); bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE); + if (PageUptodate(page)) + bitmap_fill(iop->uptodate, nr_blocks);
/* * migrate_page_move_mapping() assumes that pages with private data have
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit f0d0f1087333714ee683cc134a95afe331d7ddd9 ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed. A proposed warning in clang aims to catch these at compile time, which reveals:
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c:412:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict] .trans_msg = xgpu_ai_mailbox_trans_msg, ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated.
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c:435:15: error: incompatible function pointer types initializing 'void (*)(struct amdgpu_device *, u32, u32, u32, u32)' (aka 'void (*)(struct amdgpu_device *, unsigned int, unsigned int, unsigned int, unsigned int)') with an expression of type 'void (struct amdgpu_device *, enum idh_request, u32, u32, u32)' (aka 'void (struct amdgpu_device *, enum idh_request, unsigned int, unsigned int, unsigned int)') [-Werror,-Wincompatible-function-pointer-types-strict] .trans_msg = xgpu_nv_mailbox_trans_msg, ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated.
The type of the second parameter in the prototype should be 'enum idh_request' instead of 'u32'. Update it to clear up the warnings.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750 Reported-by: Sami Tolvanen samitolvanen@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h index b0b2bdc750df9..3e59b4e150237 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h @@ -48,6 +48,8 @@ struct amdgpu_vf_error_buffer { uint64_t data[AMDGPU_VF_ERROR_ENTRY_SIZE]; };
+enum idh_request; + /** * struct amdgpu_virt_ops - amdgpu device virt operations */ @@ -56,7 +58,8 @@ struct amdgpu_virt_ops { int (*rel_full_gpu)(struct amdgpu_device *adev, bool init); int (*reset_gpu)(struct amdgpu_device *adev); int (*wait_reset)(struct amdgpu_device *adev); - void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 data2, u32 data3); + void (*trans_msg)(struct amdgpu_device *adev, enum idh_request req, + u32 data1, u32 data2, u32 data3); int (*get_pp_clk)(struct amdgpu_device *adev, u32 type, char *buf); int (*force_dpm_level)(struct amdgpu_device *adev, u32 level); };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sireesh Kodali sireeshkodali1@gmail.com
[ Upstream commit 5458d6f2827cd30218570f266b8d238417461f2f ]
The smem-state properties for the pronto node were incorrectly labelled, reading `qcom,state*` rather than `qcom,smem-state*`. Fix that, allowing the stop state to be used.
Fixes: 88106096cbf8 ("ARM: dts: msm8916: Add and enable wcnss node") Signed-off-by: Sireesh Kodali sireeshkodali1@gmail.com Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Stephan Gerhold stephan@gerhold.net Signed-off-by: Bjorn Andersson bjorn.andersson@linaro.org Link: https://lore.kernel.org/r/20220526141740.15834-3-sireeshkodali1@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/boot/dts/qcom/msm8916.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi b/arch/arm64/boot/dts/qcom/msm8916.dtsi index bf40500adef73..e1097ba6c9481 100644 --- a/arch/arm64/boot/dts/qcom/msm8916.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi @@ -1097,8 +1097,8 @@ vddmx-supply = <&pm8916_l3>; vddpx-supply = <&pm8916_l7>;
- qcom,state = <&wcnss_smp2p_out 0>; - qcom,state-names = "stop"; + qcom,smem-states = <&wcnss_smp2p_out 0>; + qcom,smem-state-names = "stop";
pinctrl-names = "default"; pinctrl-0 = <&wcnss_pin_a>;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pali Rohár pali@kernel.org
[ Upstream commit 63605f1cfcc56bcb25c48bbee75a679d85ba7675 ]
Error code is stored in rp->reset_gpio and not in err variable.
Link: https://lore.kernel.org/r/20200414102512.27506-1-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Thierry Reding treding@nvidia.com Acked-by: Rob Herring robh@kernel.org Stable-dep-of: eff21f5da308 ("PCI: tegra: Fix OF node reference leak") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/pci-tegra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pci-tegra.c b/drivers/pci/controller/pci-tegra.c index 99d505a85067b..64921c63874fa 100644 --- a/drivers/pci/controller/pci-tegra.c +++ b/drivers/pci/controller/pci-tegra.c @@ -2289,8 +2289,8 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie) if (PTR_ERR(rp->reset_gpio) == -ENOENT) { rp->reset_gpio = NULL; } else { - dev_err(dev, "failed to get reset GPIO: %d\n", - err); + dev_err(dev, "failed to get reset GPIO: %ld\n", + PTR_ERR(rp->reset_gpio)); return PTR_ERR(rp->reset_gpio); } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit eff21f5da308265678e7e59821795e606f3e560f ]
Commit 9e38e690ace3 ("PCI: tegra: Fix OF node reference leak") has fixed some node reference leaks in this function but missed some of them.
In fact, having 'port' referenced in the 'rp' structure is not enough to prevent the leak, until 'rp' is actually added in the 'pcie->ports' list.
Add the missing 'goto err_node_put' accordingly.
Link: https://lore.kernel.org/r/55b11e9a7fa2987fbc0869d68ae59888954d65e2.162014853... Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Reviewed-by: Vidya Sagar vidyas@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pci/controller/pci-tegra.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/controller/pci-tegra.c b/drivers/pci/controller/pci-tegra.c index 64921c63874fa..74c0ddd433815 100644 --- a/drivers/pci/controller/pci-tegra.c +++ b/drivers/pci/controller/pci-tegra.c @@ -2267,13 +2267,15 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie) rp->np = port;
rp->base = devm_pci_remap_cfg_resource(dev, &rp->regs); - if (IS_ERR(rp->base)) - return PTR_ERR(rp->base); + if (IS_ERR(rp->base)) { + err = PTR_ERR(rp->base); + goto err_node_put; + }
label = devm_kasprintf(dev, GFP_KERNEL, "pex-reset-%u", index); if (!label) { - dev_err(dev, "failed to create reset GPIO label\n"); - return -ENOMEM; + err = -ENOMEM; + goto err_node_put; }
/* @@ -2291,7 +2293,8 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie) } else { dev_err(dev, "failed to get reset GPIO: %ld\n", PTR_ERR(rp->reset_gpio)); - return PTR_ERR(rp->reset_gpio); + err = PTR_ERR(rp->reset_gpio); + goto err_node_put; } }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Vacek neelx@redhat.com
commit e6f57c6881916df39db7d95981a8ad2b9c3458d6 upstream.
Unfortunately the commit `fd8958efe877` introduced another error causing the `descs` array to overflow. This reults in further crashes easily reproducible by `sendmsg` system call.
[ 1080.836473] general protection fault, probably for non-canonical address 0x400300015528b00a: 0000 [#1] PREEMPT SMP PTI [ 1080.869326] RIP: 0010:hfi1_ipoib_build_ib_tx_headers.constprop.0+0xe1/0x2b0 [hfi1] -- [ 1080.974535] Call Trace: [ 1080.976990] <TASK> [ 1081.021929] hfi1_ipoib_send_dma_common+0x7a/0x2e0 [hfi1] [ 1081.027364] hfi1_ipoib_send_dma_list+0x62/0x270 [hfi1] [ 1081.032633] hfi1_ipoib_send+0x112/0x300 [hfi1] [ 1081.042001] ipoib_start_xmit+0x2a9/0x2d0 [ib_ipoib] [ 1081.046978] dev_hard_start_xmit+0xc4/0x210 -- [ 1081.148347] __sys_sendmsg+0x59/0xa0
crash> ipoib_txreq 0xffff9cfeba229f00 struct ipoib_txreq { txreq = { list = { next = 0xffff9cfeba229f00, prev = 0xffff9cfeba229f00 }, descp = 0xffff9cfeba229f40, coalesce_buf = 0x0, wait = 0xffff9cfea4e69a48, complete = 0xffffffffc0fe0760 <hfi1_ipoib_sdma_complete>, packet_len = 0x46d, tlen = 0x0, num_desc = 0x0, desc_limit = 0x6, next_descq_idx = 0x45c, coalesce_idx = 0x0, flags = 0x0, descs = {{ qw = {0x8024000120dffb00, 0x4} # SDMA_DESC0_FIRST_DESC_FLAG (bit 63) }, { qw = { 0x3800014231b108, 0x4} }, { qw = { 0x310000e4ee0fcf0, 0x8} }, { qw = { 0x3000012e9f8000, 0x8} }, { qw = { 0x59000dfb9d0000, 0x8} }, { qw = { 0x78000e02e40000, 0x8} }} }, sdma_hdr = 0x400300015528b000, <<< invalid pointer in the tx request structure sdma_status = 0x0, SDMA_DESC0_LAST_DESC_FLAG (bit 62) complete = 0x0, priv = 0x0, txq = 0xffff9cfea4e69880, skb = 0xffff9d099809f400 }
If an SDMA send consists of exactly 6 descriptors and requires dword padding (in the 7th descriptor), the sdma_txreq descriptor array is not properly expanded and the packet will overflow into the container structure. This results in a panic when the send completion runs. The exact panic varies depending on what elements of the container structure get corrupted. The fix is to use the correct expression in _pad_sdma_tx_descs() to test the need to expand the descriptor array.
With this patch the crashes are no longer reproducible and the machine is stable.
Fixes: fd8958efe877 ("IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors") Cc: stable@vger.kernel.org Reported-by: Mats Kronberg kronberg@nsc.liu.se Tested-by: Mats Kronberg kronberg@nsc.liu.se Signed-off-by: Daniel Vacek neelx@redhat.com Link: https://lore.kernel.org/r/20240201081009.1109442-1-neelx@redhat.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/infiniband/hw/hfi1/sdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/hw/hfi1/sdma.c +++ b/drivers/infiniband/hw/hfi1/sdma.c @@ -3203,7 +3203,7 @@ int _pad_sdma_tx_descs(struct hfi1_devda { int rval = 0;
- if ((unlikely(tx->num_desc + 1 == tx->desc_limit))) { + if ((unlikely(tx->num_desc == tx->desc_limit))) { rval = _extend_sdma_tx_descs(dd, tx); if (rval) { __sdma_txclean(dd, tx);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 50c70240097ce41fe6bce6478b80478281e4d0f7 upstream.
It was said that authenticated encryption could produce invalid tag when the data that is being encrypted is modified [1]. So, fix this problem by copying the data into the clone bio first and then encrypt them inside the clone bio.
This may reduce performance, but it is needed to prevent the user from corrupting the device by writing data with O_DIRECT and modifying them at the same time.
[1] https://lore.kernel.org/all/20240207004723.GA35324@sol.localdomain/T/
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-crypt.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1627,6 +1627,12 @@ static void kcryptd_crypt_write_convert( io->ctx.bio_out = clone; io->ctx.iter_out = clone->bi_iter;
+ if (crypt_integrity_aead(cc)) { + bio_copy_data(clone, io->base_bio); + io->ctx.bio_in = clone; + io->ctx.iter_in = clone->bi_iter; + } + sector += bio_sectors(clone);
crypt_inc_pending(io);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasiliy Kovalev kovalev@altlinux.org
commit 136cfaca22567a03bbb3bf53a43d8cb5748b80ec upstream.
The gtp_net_ops pernet operations structure for the subsystem must be registered before registering the generic netlink family.
Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug:
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] CPU: 1 PID: 5826 Comm: gtp Not tainted 6.8.0-rc3-std-def-alt1 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014 RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00 00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74 RSP: 0018:ffff888014107220 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88800fcda588 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? show_regs+0x90/0xa0 ? die_addr+0x50/0xd0 ? exc_general_protection+0x148/0x220 ? asm_exc_general_protection+0x22/0x30 ? gtp_genl_dump_pdp+0x1be/0x800 [gtp] ? __alloc_skb+0x1dd/0x350 ? __pfx___alloc_skb+0x10/0x10 genl_dumpit+0x11d/0x230 netlink_dump+0x5b9/0xce0 ? lockdep_hardirqs_on_prepare+0x253/0x430 ? __pfx_netlink_dump+0x10/0x10 ? kasan_save_track+0x10/0x40 ? __kasan_kmalloc+0x9b/0xa0 ? genl_start+0x675/0x970 __netlink_dump_start+0x6fc/0x9f0 genl_family_rcv_msg_dumpit+0x1bb/0x2d0 ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10 ? genl_op_from_small+0x2a/0x440 ? cap_capable+0x1d0/0x240 ? __pfx_genl_start+0x10/0x10 ? __pfx_genl_dumpit+0x10/0x10 ? __pfx_genl_done+0x10/0x10 ? security_capable+0x9d/0xe0
Cc: stable@vger.kernel.org Signed-off-by: Vasiliy Kovalev kovalev@altlinux.org Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Link: https://lore.kernel.org/r/20240214162733.34214-1-kovalev@altlinux.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/gtp.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1381,20 +1381,20 @@ static int __init gtp_init(void) if (err < 0) goto error_out;
- err = genl_register_family(>p_genl_family); + err = register_pernet_subsys(>p_net_ops); if (err < 0) goto unreg_rtnl_link;
- err = register_pernet_subsys(>p_net_ops); + err = genl_register_family(>p_genl_family); if (err < 0) - goto unreg_genl_family; + goto unreg_pernet_subsys;
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n", sizeof(struct pdp_ctx)); return 0;
-unreg_genl_family: - genl_unregister_family(>p_genl_family); +unreg_pernet_subsys: + unregister_pernet_subsys(>p_net_ops); unreg_rtnl_link: rtnl_link_unregister(>p_link_ops); error_out:
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vidya Sagar vidyas@nvidia.com
commit db744ddd59be798c2627efbfc71f707f5a935a40 upstream.
While calculating the hardware interrupt number for a MSI interrupt, the higher bits (i.e. from bit-5 onwards a.k.a domain_nr >= 32) of the PCI domain number gets truncated because of the shifted value casting to return type of pci_domain_nr() which is 'int'. This for example is resulting in same hardware interrupt number for devices 0019:00:00.0 and 0039:00:00.0.
To address this cast the PCI domain number to 'irq_hw_number_t' before left shifting it to calculate the hardware interrupt number.
Please note that this fixes the issue only on 64-bit systems and doesn't change the behavior for 32-bit systems i.e. the 32-bit systems continue to have the issue. Since the issue surfaces only if there are too many PCIe controllers in the system which usually is the case in modern server systems and they don't tend to run 32-bit kernels.
Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain") Signed-off-by: Vidya Sagar vidyas@nvidia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Shanker Donthineni sdonthineni@nvidia.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1428,7 +1428,7 @@ irq_hw_number_t pci_msi_domain_calc_hwir { return (irq_hw_number_t)desc->msi_attrib.entry_nr | pci_dev_id(dev) << 11 | - (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27; + ((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27; }
static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Parkin tparkin@katalix.com
commit 359e54a93ab43d32ee1bff3c2f9f10cb9f6b6e79 upstream.
l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff.
To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data.
However, the code which performed the calculation was incorrect:
ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire.
Add parentheses to correct the calculation in line with the original intent.
Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells dhowells@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Tom Parkin tparkin@katalix.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/l2tp/l2tp_ip6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -644,7 +644,7 @@ static int l2tp_ip6_sendmsg(struct sock
back_from_confirm: lock_sock(sk); - ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; + ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0); err = ip6_append_data(sk, ip_generic_getfrag, msg, ulen, transhdrlen, &ipc6, &fl6, (struct rt6_info *)dst,
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Shubin nikita.shubin@maquefel.me
commit fdf87a0dc26d0550c60edc911cda42f9afec3557 upstream.
Without the terminator, if a con_id is passed to gpio_find() that does not exist in the lookup table the function will not stop looping correctly, and eventually cause an oops.
Cc: stable@vger.kernel.org Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors") Reported-by: Andy Shevchenko andriy.shevchenko@intel.com Signed-off-by: Nikita Shubin nikita.shubin@maquefel.me Reviewed-by: Linus Walleij linus.walleij@linaro.org Acked-by: Alexander Sverdlin alexander.sverdlin@gmail.com Signed-off-by: Alexander Sverdlin alexander.sverdlin@gmail.com Link: https://lore.kernel.org/r/20240205102337.439002-1-alexander.sverdlin@gmail.c... Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-ep93xx/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm/mach-ep93xx/core.c +++ b/arch/arm/mach-ep93xx/core.c @@ -337,6 +337,7 @@ static struct gpiod_lookup_table ep93xx_ GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN), GPIO_LOOKUP_IDX("G", 0, NULL, 1, GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN), + { } }, };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit cd45f99034b0c8c9cb346dd0d6407a95ca3d36f6 upstream.
... cdns3_gadget_ep_free_request(&priv_ep->endpoint, &priv_req->request); list_del_init(&priv_req->list); ...
'priv_req' actually free at cdns3_gadget_ep_free_request(). But list_del_init() use priv_req->list after it.
[ 1542.642868][ T534] BUG: KFENCE: use-after-free read in __list_del_entry_valid+0x10/0xd4 [ 1542.642868][ T534] [ 1542.653162][ T534] Use-after-free read at 0x000000009ed0ba99 (in kfence-#3): [ 1542.660311][ T534] __list_del_entry_valid+0x10/0xd4 [ 1542.665375][ T534] cdns3_gadget_ep_disable+0x1f8/0x388 [cdns3] [ 1542.671571][ T534] usb_ep_disable+0x44/0xe4 [ 1542.675948][ T534] ffs_func_eps_disable+0x64/0xc8 [ 1542.680839][ T534] ffs_func_set_alt+0x74/0x368 [ 1542.685478][ T534] ffs_func_disable+0x18/0x28
Move list_del_init() before cdns3_gadget_ep_free_request() to resolve this problem.
Cc: stable@vger.kernel.org Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Signed-off-by: Frank Li Frank.Li@nxp.com Reviewed-by: Roger Quadros rogerq@kernel.org Acked-by: Peter Chen peter.chen@kernel.org Link: https://lore.kernel.org/r/20240202154217.661867-1-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/cdns3/gadget.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/cdns3/gadget.c +++ b/drivers/usb/cdns3/gadget.c @@ -1951,11 +1951,11 @@ static int cdns3_gadget_ep_disable(struc
while (!list_empty(&priv_ep->wa2_descmiss_req_list)) { priv_req = cdns3_next_priv_request(&priv_ep->wa2_descmiss_req_list); + list_del_init(&priv_req->list);
kfree(priv_req->request.buf); cdns3_gadget_ep_free_request(&priv_ep->endpoint, &priv_req->request); - list_del_init(&priv_req->list); --priv_ep->wa2_counter; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
commit 5fd9e45f1ebcd57181358af28506e8a661a260b3 upstream.
829 if (request->complete) { 830 spin_unlock(&priv_dev->lock); 831 usb_gadget_giveback_request(&priv_ep->endpoint, 832 request); 833 spin_lock(&priv_dev->lock); 834 } 835 836 if (request->buf == priv_dev->zlp_buf) 837 cdns3_gadget_ep_free_request(&priv_ep->endpoint, request);
Driver append an additional zero packet request when queue a packet, which length mod max packet size is 0. When transfer complete, run to line 831, usb_gadget_giveback_request() will free this requestion. 836 condition is true, so cdns3_gadget_ep_free_request() free this request again.
Log:
[ 1920.140696][ T150] BUG: KFENCE: use-after-free read in cdns3_gadget_giveback+0x134/0x2c0 [cdns3] [ 1920.140696][ T150] [ 1920.151837][ T150] Use-after-free read at 0x000000003d1cd10b (in kfence-#36): [ 1920.159082][ T150] cdns3_gadget_giveback+0x134/0x2c0 [cdns3] [ 1920.164988][ T150] cdns3_transfer_completed+0x438/0x5f8 [cdns3]
Add check at line 829, skip call usb_gadget_giveback_request() if it is additional zero length packet request. Needn't call usb_gadget_giveback_request() because it is allocated in this driver.
Cc: stable@vger.kernel.org Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver") Signed-off-by: Frank Li Frank.Li@nxp.com Reviewed-by: Roger Quadros rogerq@kernel.org Acked-by: Peter Chen peter.chen@kernel.org Link: https://lore.kernel.org/r/20240202154217.661867-2-Frank.Li@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/cdns3/gadget.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/cdns3/gadget.c +++ b/drivers/usb/cdns3/gadget.c @@ -661,7 +661,11 @@ void cdns3_gadget_giveback(struct cdns3_ return; }
- if (request->complete) { + /* + * zlp request is appended by driver, needn't call usb_gadget_giveback_request() to notify + * gadget composite driver. + */ + if (request->complete && request->buf != priv_dev->zlp_buf) { spin_unlock(&priv_dev->lock); usb_gadget_giveback_request(&priv_ep->endpoint, request);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krishna Kurapati quic_kriskura@quicinc.com
commit 76c51146820c5dac629f21deafab0a7039bc3ccd upstream.
It is observed sometimes when tethering is used over NCM with Windows 11 as host, at some instances, the gadget_giveback has one byte appended at the end of a proper NTB. When the NTB is parsed, unwrap call looks for any leftover bytes in SKB provided by u_ether and if there are any pending bytes, it treats them as a separate NTB and parses it. But in case the second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that were parsed properly in the first NTB and saved in rx_list are dropped.
Adding a few custom traces showed the following: [002] d..1 7828.532866: dwc3_gadget_giveback: ep1out: req 000000003868811a length 1025/16384 zsI ==> 0 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025 [002] d..1 7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400 [002] d..1 7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10 [002] d..1 7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
In this case, the giveback is of 1025 bytes and block length is 1024. The rest 1 byte (which is 0x00) won't be parsed resulting in drop of all datagrams in rx_list.
Same is case with packets of size 2048: [002] d..1 7828.557948: dwc3_gadget_giveback: ep1out: req 0000000011dfd96e length 2049/16384 zsI ==> 0 [002] d..1 7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342 [002] d..1 7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
Lecroy shows one byte coming in extra confirming that the byte is coming in from PC:
Transfer 2959 - Bytes Transferred(1025) Timestamp((18.524 843 590) - Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590) --- Packet 4063861 Data(1024 bytes) Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590) --- Packet 4063863 Data(1 byte) Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
According to Windows driver, no ZLP is needed if wBlockLength is non-zero, because the non-zero wBlockLength has already told the function side the size of transfer to be expected. However, there are in-market NCM devices that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize. To deal with such devices, it pads an extra 0 at end so the transfer is no longer multiple of wMaxPacketSize.
Cc: stable@vger.kernel.org Fixes: 9f6ce4240a2b ("usb: gadget: f_ncm.c added") Signed-off-by: Krishna Kurapati quic_kriskura@quicinc.com Reviewed-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20240205074650.200304-1-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/function/f_ncm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -1349,7 +1349,15 @@ parse_ntb: "Parsed NTB with %d frames\n", dgram_counter);
to_process -= block_len; - if (to_process != 0) { + + /* + * Windows NCM driver avoids USB ZLPs by adding a 1-byte + * zero pad as needed. + */ + if (to_process == 1 && + (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) { + to_process--; + } else if (to_process > 0) { ntb_ptr = (unsigned char *)(ntb_ptr + block_len); goto parse_ntb; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit b787a3e781759026a6212736ef8e52cf83d1821a upstream.
There is a possibility that usb_role_switch device is unregistered before the user put usb_role_switch. In this case, the user may still want to get/set_role() since the user can't sense the changes of usb_role_switch.
This will add a flag to show if usb_role_switch is already registered and avoid unwanted behaviors.
Fixes: fde0aa6c175a ("usb: common: Small class for USB role switches") cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240129093739.2371530-2-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/roles/class.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/usb/roles/class.c +++ b/drivers/usb/roles/class.c @@ -20,6 +20,7 @@ struct usb_role_switch { struct device dev; struct mutex lock; /* device lock*/ enum usb_role role; + bool registered;
/* From descriptor */ struct device *usb2_port; @@ -46,6 +47,9 @@ int usb_role_switch_set_role(struct usb_ if (IS_ERR_OR_NULL(sw)) return 0;
+ if (!sw->registered) + return -EOPNOTSUPP; + mutex_lock(&sw->lock);
ret = sw->set(sw->dev.parent, role); @@ -69,7 +73,7 @@ enum usb_role usb_role_switch_get_role(s { enum usb_role role;
- if (IS_ERR_OR_NULL(sw)) + if (IS_ERR_OR_NULL(sw) || !sw->registered) return USB_ROLE_NONE;
mutex_lock(&sw->lock); @@ -319,6 +323,8 @@ usb_role_switch_register(struct device * return ERR_PTR(ret); }
+ sw->registered = true; + /* TODO: Symlinks for the host port and the device controller. */
return sw; @@ -333,8 +339,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_regist */ void usb_role_switch_unregister(struct usb_role_switch *sw) { - if (!IS_ERR_OR_NULL(sw)) + if (!IS_ERR_OR_NULL(sw)) { + sw->registered = false; device_unregister(&sw->dev); + } } EXPORT_SYMBOL_GPL(usb_role_switch_unregister);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit 809aa64ebff51eb170ee31a95f83b2d21efa32e2 ]
When dma_alloc_coherent fails to allocate dd->cr_base[i].va, init_credit_return should deallocate dd->cr_base and dd->cr_base[i] that allocated before. Or those resources would be never freed and a memleak is triggered.
Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Link: https://lore.kernel.org/r/20240112085523.3731720-1-alexious@zju.edu.cn Acked-by: Dennis Dalessandro dennis.dalessandro@cornelisnetworks.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/pio.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c index fa5de362010f2..8303c506733cc 100644 --- a/drivers/infiniband/hw/hfi1/pio.c +++ b/drivers/infiniband/hw/hfi1/pio.c @@ -2131,7 +2131,7 @@ int init_credit_return(struct hfi1_devdata *dd) "Unable to allocate credit return DMA range for NUMA %d\n", i); ret = -ENOMEM; - goto done; + goto free_cr_base; } } set_dev_node(&dd->pcidev->dev, dd->node); @@ -2139,6 +2139,10 @@ int init_credit_return(struct hfi1_devdata *dd) ret = 0; done: return ret; + +free_cr_base: + free_credit_return(dd); + goto done; }
void free_credit_return(struct hfi1_devdata *dd)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kalesh AP kalesh-anakkur.purayil@broadcom.com
[ Upstream commit 3687b450c5f32e80f179ce4b09e0454da1449eac ]
SRQ resize is not supported in the driver. But driver is not returning error from bnxt_re_modify_srq() for SRQ resize.
Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Selvin Xavier selvin.xavier@broadcom.com Link: https://lore.kernel.org/r/1705985677-15551-5-git-send-email-selvin.xavier@br... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 183f1c3c1f5ef..c9a7c03403ac0 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -1623,7 +1623,7 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr, switch (srq_attr_mask) { case IB_SRQ_MAX_WR: /* SRQ resize is not supported */ - break; + return -EINVAL; case IB_SRQ_LIMIT: /* Change the SRQ threshold */ if (srq_attr->srq_limit > srq->qplib_srq.max_wqe) @@ -1638,13 +1638,12 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr, /* On success, update the shadow */ srq->srq_limit = srq_attr->srq_limit; /* No need to Build and send response back to udata */ - break; + return 0; default: dev_err(rdev_to_dev(rdev), "Unsupported srq_attr_mask 0x%x", srq_attr_mask); return -EINVAL; } - return 0; }
int bnxt_re_query_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
[ Upstream commit d4ee7f3a4445ec1b0b88af216f4032c4d30abf5a ]
Since the session name by itself is not sufficient to uniquely identify a queue pair, include the queue pair number. Show the ASCII channel state name instead of the numeric value. This change makes the ib_srpt debug output more consistent.
Link: https://lore.kernel.org/r/20200525172212.14413-3-bvanassche@acm.org Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Jason Gunthorpe jgg@mellanox.com Stable-dep-of: eb5c7465c324 ("RDMA/srpt: fix function pointer cast warnings") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/srpt/ib_srpt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 2822ca5e82779..ccd9811c6c1e2 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -217,8 +217,9 @@ static const char *get_ch_state_name(enum rdma_ch_state s) */ static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) { - pr_debug("QP event %d on ch=%p sess_name=%s state=%d\n", - event->event, ch, ch->sess_name, ch->state); + pr_debug("QP event %d on ch=%p sess_name=%s-%d state=%s\n", + event->event, ch, ch->sess_name, ch->qp->qp_num, + get_ch_state_name(ch->state));
switch (event->event) { case IB_EVENT_COMM_EST: @@ -2005,8 +2006,8 @@ static void __srpt_close_all_ch(struct srpt_port *sport) list_for_each_entry(nexus, &sport->nexus_list, entry) { list_for_each_entry(ch, &nexus->ch_list, list) { if (srpt_disconnect_ch(ch) >= 0) - pr_info("Closing channel %s because target %s_%d has been disabled\n", - ch->sess_name, + pr_info("Closing channel %s-%d because target %s_%d has been disabled\n", + ch->sess_name, ch->qp->qp_num, dev_name(&sport->sdev->device->dev), sport->port); srpt_close_ch(ch);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit eb5c7465c3240151cd42a55c7ace9da0026308a1 ]
clang-16 notices that srpt_qp_event() gets called through an incompatible pointer here:
drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1815 | = (void(*)(struct ib_event *, void*))srpt_qp_event;
Change srpt_qp_event() to use the correct prototype and adjust the argument inside of it.
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/ulp/srpt/ib_srpt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index ccd9811c6c1e2..d03a4f2e006fb 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -213,10 +213,12 @@ static const char *get_ch_state_name(enum rdma_ch_state s) /** * srpt_qp_event - QP event callback function * @event: Description of the event that occurred. - * @ch: SRPT RDMA channel. + * @ptr: SRPT RDMA channel. */ -static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch) +static void srpt_qp_event(struct ib_event *event, void *ptr) { + struct srpt_rdma_ch *ch = ptr; + pr_debug("QP event %d on ch=%p sess_name=%s-%d state=%s\n", event->event, ch, ch->sess_name, ch->qp->qp_num, get_ch_state_name(ch->state)); @@ -1802,8 +1804,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch) }
qp_init->qp_context = (void *)ch; - qp_init->event_handler - = (void(*)(struct ib_event *, void*))srpt_qp_event; + qp_init->event_handler = srpt_qp_event; qp_init->send_cq = ch->cq; qp_init->recv_cq = ch->cq; qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andriin@fb.com
[ Upstream commit 7a387bed47f7e80e257d966cd64a3e92a63e26a1 ]
Enhance scripts/bpf_helpers_doc.py to emit C header with BPF helper definitions (to be included from libbpf's bpf_helpers.h).
Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Stable-dep-of: e37243b65d52 ("bpf, scripts: Correct GPL license name") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/bpf_helpers_doc.py | 155 ++++++++++++++++++++++++++++++++++++- 1 file changed, 154 insertions(+), 1 deletion(-)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 894cc58c1a034..15d3d83d6297c 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -391,6 +391,154 @@ SEE ALSO
print('')
+class PrinterHelpers(Printer): + """ + A printer for dumping collected information about helpers as C header to + be included from BPF program. + @helpers: array of Helper objects to print to standard output + """ + + type_fwds = [ + 'struct bpf_fib_lookup', + 'struct bpf_perf_event_data', + 'struct bpf_perf_event_value', + 'struct bpf_sock', + 'struct bpf_sock_addr', + 'struct bpf_sock_ops', + 'struct bpf_sock_tuple', + 'struct bpf_spin_lock', + 'struct bpf_sysctl', + 'struct bpf_tcp_sock', + 'struct bpf_tunnel_key', + 'struct bpf_xfrm_state', + 'struct pt_regs', + 'struct sk_reuseport_md', + 'struct sockaddr', + 'struct tcphdr', + + 'struct __sk_buff', + 'struct sk_msg_md', + 'struct xpd_md', + ] + known_types = { + '...', + 'void', + 'const void', + 'char', + 'const char', + 'int', + 'long', + 'unsigned long', + + '__be16', + '__be32', + '__wsum', + + 'struct bpf_fib_lookup', + 'struct bpf_perf_event_data', + 'struct bpf_perf_event_value', + 'struct bpf_sock', + 'struct bpf_sock_addr', + 'struct bpf_sock_ops', + 'struct bpf_sock_tuple', + 'struct bpf_spin_lock', + 'struct bpf_sysctl', + 'struct bpf_tcp_sock', + 'struct bpf_tunnel_key', + 'struct bpf_xfrm_state', + 'struct pt_regs', + 'struct sk_reuseport_md', + 'struct sockaddr', + 'struct tcphdr', + } + mapped_types = { + 'u8': '__u8', + 'u16': '__u16', + 'u32': '__u32', + 'u64': '__u64', + 's8': '__s8', + 's16': '__s16', + 's32': '__s32', + 's64': '__s64', + 'size_t': 'unsigned long', + 'struct bpf_map': 'void', + 'struct sk_buff': 'struct __sk_buff', + 'const struct sk_buff': 'const struct __sk_buff', + 'struct sk_msg_buff': 'struct sk_msg_md', + 'struct xdp_buff': 'struct xdp_md', + } + + def print_header(self): + header = '''\ +/* This is auto-generated file. See bpf_helpers_doc.py for details. */ + +/* Forward declarations of BPF structs */''' + + print(header) + for fwd in self.type_fwds: + print('%s;' % fwd) + print('') + + def print_footer(self): + footer = '' + print(footer) + + def map_type(self, t): + if t in self.known_types: + return t + if t in self.mapped_types: + return self.mapped_types[t] + print("") + print("Unrecognized type '%s', please add it to known types!" % t) + sys.exit(1) + + seen_helpers = set() + + def print_one(self, helper): + proto = helper.proto_break_down() + + if proto['name'] in self.seen_helpers: + return + self.seen_helpers.add(proto['name']) + + print('/*') + print(" * %s" % proto['name']) + print(" *") + if (helper.desc): + # Do not strip all newline characters: formatted code at the end of + # a section must be followed by a blank line. + for line in re.sub('\n$', '', helper.desc, count=1).split('\n'): + print(' *{}{}'.format(' \t' if line else '', line)) + + if (helper.ret): + print(' *') + print(' * Returns') + for line in helper.ret.rstrip().split('\n'): + print(' *{}{}'.format(' \t' if line else '', line)) + + print(' */') + print('static %s %s(*%s)(' % (self.map_type(proto['ret_type']), + proto['ret_star'], proto['name']), end='') + comma = '' + for i, a in enumerate(proto['args']): + t = a['type'] + n = a['name'] + if proto['name'] == 'bpf_get_socket_cookie' and i == 0: + t = 'void' + n = 'ctx' + one_arg = '{}{}'.format(comma, self.map_type(t)) + if n: + if a['star']: + one_arg += ' {}'.format(a['star']) + else: + one_arg += ' ' + one_arg += '{}'.format(n) + comma = ', ' + print(one_arg, end='') + + print(') = (void *) %d;' % len(self.seen_helpers)) + print('') + ###############################################################################
# If script is launched from scripts/ from kernel tree and can access @@ -405,6 +553,8 @@ Parse eBPF header file and generate documentation for eBPF helper functions. The RST-formatted output produced can be turned into a manual page with the rst2man utility. """) +argParser.add_argument('--header', action='store_true', + help='generate C header file') if (os.path.isfile(bpfh)): argParser.add_argument('--filename', help='path to include/uapi/linux/bpf.h', default=bpfh) @@ -417,5 +567,8 @@ headerParser = HeaderParser(args.filename) headerParser.run()
# Print formatted output to standard output. -printer = PrinterRST(headerParser.helpers) +if args.header: + printer = PrinterHelpers(headerParser.helpers) +else: + printer = PrinterRST(headerParser.helpers) printer.print_all()
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gianmarco Lusvardi glusvardi@posteo.net
[ Upstream commit e37243b65d528a8a9f8b9a57a43885f8e8dfc15c ]
The bpf_doc script refers to the GPL as the "GNU Privacy License". I strongly suspect that the author wanted to refer to the GNU General Public License, under which the Linux kernel is released, as, to the best of my knowledge, there is no license named "GNU Privacy License". This patch corrects the license name in the script accordingly.
Fixes: 56a092c89505 ("bpf: add script and prepare bpf.h for new helpers documentation") Signed-off-by: Gianmarco Lusvardi glusvardi@posteo.net Signed-off-by: Daniel Borkmann daniel@iogearbox.net Reviewed-by: Quentin Monnet quentin@isovalent.com Link: https://lore.kernel.org/bpf/20240213230544.930018-3-glusvardi@posteo.net Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/bpf_helpers_doc.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 15d3d83d6297c..78b7194a05521 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -286,7 +286,7 @@ eBPF programs can have an associated license, passed along with the bytecode instructions to the kernel when the programs are loaded. The format for that string is identical to the one in use for kernel modules (Dual licenses, such as "Dual BSD/GPL", may be used). Some helper functions are only accessible to -programs that are compatible with the GNU Privacy License (GPL). +programs that are compatible with the GNU General Public License (GNU GPL).
In order to use such helpers, the eBPF program must be loaded with the correct license string passed (via **attr**) to the **bpf**\ () system call, and this
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 9ddf190a7df77b77817f955fdb9c2ae9d1c9c9a3 ]
JAZZ_ESP is a bool kconfig symbol that selects SCSI_SPI_ATTRS. When CONFIG_SCSI=m, this results in SCSI_SPI_ATTRS=m while JAZZ_ESP=y, which causes many undefined symbol linker errors.
Fix this by only offering to build this driver when CONFIG_SCSI=y.
[mkp: JAZZ_ESP is unique in that it does not support being compiled as a module unlike the remaining SPI SCSI HBA drivers]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap rdunlap@infradead.org Link: https://lore.kernel.org/r/20240214055953.9612-1-rdunlap@infradead.org Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: linux-mips@vger.kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Masahiro Yamada masahiroy@kernel.org Cc: Nicolas Schier nicolas@fjasle.eu Cc: James E.J. Bottomley jejb@linux.ibm.com Cc: Martin K. Petersen martin.petersen@oracle.com Cc: linux-scsi@vger.kernel.org Cc: Geert Uytterhoeven geert@linux-m68k.org Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202402112222.Gl0udKyU-lkp@intel.com/ Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig index 51bbc858a944c..cab6e67ea606b 100644 --- a/drivers/scsi/Kconfig +++ b/drivers/scsi/Kconfig @@ -1286,7 +1286,7 @@ source "drivers/scsi/arm/Kconfig"
config JAZZ_ESP bool "MIPS JAZZ FAS216 SCSI support" - depends on MACH_JAZZ && SCSI + depends on MACH_JAZZ && SCSI=y select SCSI_SPI_ATTRS help This is the driver for the onboard SCSI host adapter of MIPS Magnum
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 0affdba22aca5573f9d989bcb1d71d32a6a03efe ]
clang-16 warns about casting between incompatible function types:
drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c:161:10: error: cast from 'void (*)(const struct firmware *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 161 | .fini = (void(*)(void *))release_firmware,
This one was done to use the generic shadow_fw_release() function as a callback for struct nvbios_source. Change it to use the same prototype as the other five instances, with a trivial helper function that actually calls release_firmware.
Fixes: 70c0f263cc2e ("drm/nouveau/bios: pull in basic vbios subdev, more to come later") Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240213095753.455062-1-arnd@k... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c index 4b571cc6bc70f..6597def18627e 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c @@ -154,11 +154,17 @@ shadow_fw_init(struct nvkm_bios *bios, const char *name) return (void *)fw; }
+static void +shadow_fw_release(void *fw) +{ + release_firmware(fw); +} + static const struct nvbios_source shadow_fw = { .name = "firmware", .init = shadow_fw_init, - .fini = (void(*)(void *))release_firmware, + .fini = shadow_fw_release, .read = shadow_fw_read, .rw = false, };
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 081a0e3b0d4c061419d3f4679dec9f68725b17e4 ]
net->dev_base_seq and ipv4.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values were changed with the same amount.
Fixes: 0465277f6b3f ("ipv4: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Nicolas Dichtel nicolas.dichtel@6wind.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/devinet.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 4c013f8800f0c..ed00b233cee2e 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1798,6 +1798,21 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb, return err; }
+/* Combine dev_addr_genid and dev_base_seq to detect changes. + */ +static u32 inet_base_seq(const struct net *net) +{ + u32 res = atomic_read(&net->ipv4.dev_addr_genid) + + net->dev_base_seq; + + /* Must not return 0 (see nl_dump_check_consistent()). + * Chose a value far away from 0. + */ + if (!res) + res = 0x80000000; + return res; +} + static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) { const struct nlmsghdr *nlh = cb->nlh; @@ -1849,8 +1864,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) idx = 0; head = &tgt_net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^ - tgt_net->dev_base_seq; + cb->seq = inet_base_seq(tgt_net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; @@ -2249,8 +2263,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb, idx = 0; head = &net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^ - net->dev_base_seq; + cb->seq = inet_base_seq(net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit e898e4cd1aab271ca414f9ac6e08e4c761f6913c ]
net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing.
If we XOR their values, we could miss to detect if both values were changed with the same amount.
Fixes: 63998ac24f83 ("ipv6: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Nicolas Dichtel nicolas.dichtel@6wind.com
Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/addrconf.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 4bec4c0617412..6fcbe8912b431 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -696,6 +696,22 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb, return err; }
+/* Combine dev_addr_genid and dev_base_seq to detect changes. + */ +static u32 inet6_base_seq(const struct net *net) +{ + u32 res = atomic_read(&net->ipv6.dev_addr_genid) + + net->dev_base_seq; + + /* Must not return 0 (see nl_dump_check_consistent()). + * Chose a value far away from 0. + */ + if (!res) + res = 0x80000000; + return res; +} + + static int inet6_netconf_dump_devconf(struct sk_buff *skb, struct netlink_callback *cb) { @@ -729,8 +745,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb, idx = 0; head = &net->dev_index_head[h]; rcu_read_lock(); - cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ - net->dev_base_seq; + cb->seq = inet6_base_seq(net); hlist_for_each_entry_rcu(dev, head, index_hlist) { if (idx < s_idx) goto cont; @@ -5232,7 +5247,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb, }
rcu_read_lock(); - cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq; + cb->seq = inet6_base_seq(tgt_net); for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { idx = 0; head = &tgt_net->dev_index_head[h];
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniil Dulov d.dulov@aladdin.ru
[ Upstream commit 6ea38e2aeb72349cad50e38899b0ba6fbcb2af3d ]
The max length of volume->vid value is 20 characters. So increase idbuf[] size up to 24 to avoid overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()]
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Daniil Dulov d.dulov@aladdin.ru Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20240211150442.3416-1-d.dulov@aladdin.ru/ # v1 Link: https://lore.kernel.org/r/20240212083347.10742-1-d.dulov@aladdin.ru/ # v2 Link: https://lore.kernel.org/r/20240219143906.138346-3-dhowells@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/volume.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/afs/volume.c b/fs/afs/volume.c index 4310336b9bb8c..4225bc766d464 100644 --- a/fs/afs/volume.c +++ b/fs/afs/volume.c @@ -221,7 +221,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) { struct afs_server_list *new, *old, *discard; struct afs_vldb_entry *vldb; - char idbuf[16]; + char idbuf[24]; int ret, idsz;
_enter(""); @@ -229,7 +229,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) /* We look up an ID by passing it as a decimal string in the * operation's name parameter. */ - idsz = sprintf(idbuf, "%llu", volume->vid); + idsz = snprintf(idbuf, sizeof(idbuf), "%llu", volume->vid);
vldb = afs_vl_lookup_vldb(volume->cell, key, idbuf, idsz); if (IS_ERR(vldb)) {
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasiliy Kovalev kovalev@altlinux.org
[ Upstream commit 5559cea2d5aa3018a5f00dd2aca3427ba09b386b ]
The pernet operations structure for the subsystem must be registered before registering the generic netlink family.
Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Vasiliy Kovalev kovalev@altlinux.org Link: https://lore.kernel.org/r/20240215202717.29815-1-kovalev@altlinux.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/seg6.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index f5c448c276fef..7094f8691ac68 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -441,22 +441,24 @@ int __init seg6_init(void) { int err = -ENOMEM;
- err = genl_register_family(&seg6_genl_family); + err = register_pernet_subsys(&ip6_segments_ops); if (err) goto out;
- err = register_pernet_subsys(&ip6_segments_ops); + err = genl_register_family(&seg6_genl_family); if (err) - goto out_unregister_genl; + goto out_unregister_pernet;
#ifdef CONFIG_IPV6_SEG6_LWTUNNEL err = seg6_iptunnel_init(); if (err) - goto out_unregister_pernet; + goto out_unregister_genl;
err = seg6_local_init(); - if (err) - goto out_unregister_pernet; + if (err) { + seg6_iptunnel_exit(); + goto out_unregister_genl; + } #endif
#ifdef CONFIG_IPV6_SEG6_HMAC @@ -477,11 +479,11 @@ int __init seg6_init(void) #endif #endif #ifdef CONFIG_IPV6_SEG6_LWTUNNEL -out_unregister_pernet: - unregister_pernet_subsys(&ip6_segments_ops); -#endif out_unregister_genl: genl_unregister_family(&seg6_genl_family); +#endif +out_unregister_pernet: + unregister_pernet_subsys(&ip6_segments_ops); goto out; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 8fc9d51ea2d32a05f7d7cf86a25cc86ecc57eb45 ]
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmkn... Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Link: https://lore.kernel.org/r/20220818210227.8611-1-wsa+renesas@sang-engineering... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: a7d6027790ac ("arp: Prevent overflow in arp_req_get().") Signed-off-by: Sasha Levin sashal@kernel.org --- net/packet/af_packet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 600a84a5c582a..451f9c43b34b8 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1850,7 +1850,7 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, */
spkt->spkt_family = dev->type; - strlcpy(spkt->spkt_device, dev->name, sizeof(spkt->spkt_device)); + strscpy(spkt->spkt_device, dev->name, sizeof(spkt->spkt_device)); spkt->spkt_protocol = skb->protocol;
/* @@ -3511,7 +3511,7 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr, rcu_read_lock(); dev = dev_get_by_index_rcu(sock_net(sk), READ_ONCE(pkt_sk(sk)->ifindex)); if (dev) - strlcpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data)); + strscpy(uaddr->sa_data, dev->name, sizeof(uaddr->sa_data)); rcu_read_unlock();
return sizeof(*uaddr);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@nvidia.com
[ Upstream commit 723a2cc8d69d4342b47dfddbfe6c19f1b135f09b ]
The signature for __iowrite64_copy() requires the number of 64 bit quantities, not bytes. Multiple by 8 to get to a byte length before invoking zpci_memcpy_toio()
Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn") Acked-by: Niklas Schnelle schnelle@linux.ibm.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia... Signed-off-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index b8ddacf1efe11..8d241f3c3b78b 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -223,7 +223,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res, /* combine single writes by using store-block insn */ void __iowrite64_copy(void __iomem *to, const void *from, size_t count) { - zpci_memcpy_toio(to, from, count); + zpci_memcpy_toio(to, from, count * 8); }
void __iomem *ioremap(unsigned long ioaddr, unsigned long size)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit bfc06e1aaa130b86a81ce3c41ec71a2f5e191690 ]
'recv_end:' checks num_async and decrypted, and is then followed by the 'end' label. Since we know that decrypted and num_async are 0 at the start we can jump to 'end'.
Move the init of decrypted and num_async to let the compiler catch if I'm wrong.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 38592d0871a3f..606a51237c50f 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1748,6 +1748,7 @@ int tls_sw_recvmsg(struct sock *sk, struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx); struct tls_prot_info *prot = &tls_ctx->prot_info; struct sk_psock *psock; + int num_async, pending; unsigned char control = 0; ssize_t decrypted = 0; struct strp_msg *rxm; @@ -1760,8 +1761,6 @@ int tls_sw_recvmsg(struct sock *sk, bool is_kvec = iov_iter_is_kvec(&msg->msg_iter); bool is_peek = flags & MSG_PEEK; bool bpf_strp_enabled; - int num_async = 0; - int pending;
flags |= nonblock;
@@ -1783,12 +1782,14 @@ int tls_sw_recvmsg(struct sock *sk, }
if (len <= copied) - goto recv_end; + goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len); len = len - copied; timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+ decrypted = 0; + num_async = 0; while (len && (decrypted + copied < target || ctx->recv_pkt)) { bool retain_skb = false; bool zc = false;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit d5123edd10cf9d324fcb88e276bdc7375f3c5321 ]
Pointless else branch after goto makes the code harder to refactor down the line.
Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA") Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 606a51237c50f..fb7428f222a8f 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1777,10 +1777,9 @@ int tls_sw_recvmsg(struct sock *sk, if (err < 0) { tls_err_abort(sk, err); goto end; - } else { - copied = err; }
+ copied = err; if (len <= copied) goto end;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sabrina Dubroca sd@queasysnail.net
[ Upstream commit fdfbaec5923d9359698cbb286bc0deadbb717504 ]
If we have a non-DATA record on the rx_list and another record of the same type still on the queue, we will end up merging them: - process_rx_list copies the non-DATA record - we start the loop and process the first available record since it's of the same type - we break out of the loop since the record was not DATA
Just check the record type and jump to the end in case process_rx_list did some work.
Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca sd@queasysnail.net Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.170800737... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index fb7428f222a8f..910da98d6bfb3 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1780,7 +1780,7 @@ int tls_sw_recvmsg(struct sock *sk, }
copied = err; - if (len <= copied) + if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA)) goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit bccebf64701735533c8db37773eeacc6566cc8ec ]
We need to set the dormant flag again if we fail to register the hooks.
During memory pressure hook registration can fail and we end up with a table marked as active but no registered hooks.
On table/base chain deletion, nf_tables will attempt to unregister the hook again which yields a warn splat from the nftables core.
Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 6c83d3e169c9c..c5dbb950822fd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -951,6 +951,7 @@ static int nf_tables_updtable(struct nft_ctx *ctx) return 0;
err_register_hooks: + ctx->table->flags |= NFT_TABLE_F_DORMANT; nft_trans_destroy(trans); return ret; }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian König christian.koenig@amd.com
[ Upstream commit 7621350c6bb20fb6ab7eb988833ab96eac3dcbef ]
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT can't be used when we hold locks since we are basically waiting for userspace to do something.
Holding a lock while doing so can trivial deadlock with page faults etc...
So make lockdep complain when a driver tries to do this.
v2: Add lockdep_assert_none_held() macro. v3: Add might_sleep() and also use lockdep_assert_none_held() in the IOCTL path.
Signed-off-by: Christian König christian.koenig@amd.com Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://patchwork.freedesktop.org/patch/414944/ Stable-dep-of: 3c43177ffb54 ("drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_syncobj.c | 12 ++++++++++++ include/linux/lockdep.h | 5 +++++ 2 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 8b155e3377cfe..a1d2a3dc00768 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -325,6 +325,15 @@ int drm_syncobj_find_fence(struct drm_file *file_private, if (!syncobj) return -ENOENT;
+ /* Waiting for userspace with locks help is illegal cause that can + * trivial deadlock with page faults for example. Make lockdep complain + * about it early on. + */ + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + might_sleep(); + lockdep_assert_none_held_once(); + } + *fence = drm_syncobj_fence_get(syncobj);
if (*fence) { @@ -889,6 +898,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint64_t *points; uint32_t signaled_count, i;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + lockdep_assert_none_held_once(); + points = kmalloc_array(count, sizeof(*points), GFP_KERNEL); if (points == NULL) return -ENOMEM; diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index b8a835fd611b2..15d92abb3f2d8 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -403,6 +403,10 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); WARN_ON_ONCE(debug_locks && !lockdep_is_held(l)); \ } while (0)
+#define lockdep_assert_none_held_once() do { \ + WARN_ON_ONCE(debug_locks && current->lockdep_depth); \ + } while (0) + #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion)
#define lockdep_pin_lock(l) lock_pin_lock(&(l)->dep_map) @@ -479,6 +483,7 @@ struct lockdep_map { }; #define lockdep_assert_held_write(l) do { (void)(l); } while (0) #define lockdep_assert_held_read(l) do { (void)(l); } while (0) #define lockdep_assert_held_once(l) do { (void)(l); } while (0) +#define lockdep_assert_none_held_once() do { } while (0)
#define lockdep_recursing(tsk) (0)
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erik Kurzinger ekurzinger@nvidia.com
[ Upstream commit 3c43177ffb54ea5be97505eb8e2690e99ac96bc9 ]
When waiting for a syncobj timeline point whose fence has not yet been submitted with the WAIT_FOR_SUBMIT flag, a callback is registered using drm_syncobj_fence_add_wait and the thread is put to sleep until the timeout expires. If the fence is submitted before then, drm_syncobj_add_point will wake up the sleeping thread immediately which will proceed to wait for the fence to be signaled.
However, if the WAIT_AVAILABLE flag is used instead, drm_syncobj_fence_add_wait won't get called, meaning the waiting thread will always sleep for the full timeout duration, even if the fence gets submitted earlier. If it turns out that the fence *has* been submitted by the time it eventually wakes up, it will still indicate to userspace that the wait completed successfully (it won't return -ETIME), but it will have taken much longer than it should have.
To fix this, we must call drm_syncobj_fence_add_wait if *either* the WAIT_FOR_SUBMIT flag or the WAIT_AVAILABLE flag is set. The only difference being that with WAIT_FOR_SUBMIT we will also wait for the fence to be signaled after it has been submitted while with WAIT_AVAILABLE we will return immediately.
IGT test patch: https://lists.freedesktop.org/archives/igt-dev/2024-January/067537.html
v1 -> v2: adjust lockdep_assert_none_held_once condition
(cherry picked from commit 8c44ea81634a4a337df70a32621a5f3791be23df)
Fixes: 01d6c3578379 ("drm/syncobj: add support for timeline point wait v8") Signed-off-by: Erik Kurzinger ekurzinger@nvidia.com Signed-off-by: Simon Ser contact@emersion.fr Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch Reviewed-by: Simon Ser contact@emersion.fr Link: https://patchwork.freedesktop.org/patch/msgid/20240119163208.3723457-1-ekurz... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_syncobj.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index a1d2a3dc00768..9baf95a4fc9ff 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -898,7 +898,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint64_t *points; uint32_t signaled_count, i;
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) lockdep_assert_none_held_once();
points = kmalloc_array(count, sizeof(*points), GFP_KERNEL); @@ -967,7 +968,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) { for (i = 0; i < count; ++i) drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]); }
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream.
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio.
Suggested-by: Jens Axboe axboe@kernel.dk Cc: Christoph Hellwig hch@lst.de Cc: Avi Kivity avi@scylladb.com Cc: Sandeep Dhavale dhavale@google.com Cc: Jens Axboe axboe@kernel.dk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Kent Overstreet kent.overstreet@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/aio.c | 9 ++++++++- include/linux/fs.h | 2 ++ 2 files changed, 10 insertions(+), 1 deletion(-)
--- a/fs/aio.c +++ b/fs/aio.c @@ -570,6 +570,13 @@ void kiocb_set_cancel_fn(struct kiocb *i struct kioctx *ctx = req->ki_ctx; unsigned long flags;
+ /* + * kiocb didn't come from aio or is neither a read nor a write, hence + * ignore it. + */ + if (!(iocb->ki_flags & IOCB_AIO_RW)) + return; + if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return;
@@ -1455,7 +1462,7 @@ static int aio_prep_rw(struct kiocb *req req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; - req->ki_flags = iocb_flags(req->ki_filp); + req->ki_flags = iocb_flags(req->ki_filp) | IOCB_AIO_RW; if (iocb->aio_flags & IOCB_FLAG_RESFD) req->ki_flags |= IOCB_EVENTFD; req->ki_hint = ki_hint_validate(file_write_hint(req->ki_filp)); --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,8 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +/* kiocb is a read or write operation submitted by fs/aio.c. */ +#define IOCB_AIO_RW (1 << 23)
struct kiocb { struct file *ki_filp;
5.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrii Nakryiko andriin@fb.com
commit e0b68fb186b251374adbd870f99b1ecea236e770 upstream.
Fix typo in struct xpd_md, generated from bpf_helpers_doc.py, which is causing compilation warnings for programs using bpf_helpers.h
Fixes: 7a387bed47f7 ("scripts/bpf: teach bpf_helpers_doc.py to dump BPF helper definitions") Signed-off-by: Andrii Nakryiko andriin@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20191010042534.290562-1-andriin@fb.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/bpf_helpers_doc.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -418,7 +418,7 @@ class PrinterHelpers(Printer):
'struct __sk_buff', 'struct sk_msg_md', - 'struct xpd_md', + 'struct xdp_md', ] known_types = { '...',
On 2/27/24 05:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Tue, 27 Feb 2024 14:26:27 +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v5.4: 10 builds: 10 pass, 0 fail 24 boots: 24 pass, 0 fail 54 tests: 54 pass, 0 fail
Linux version: 5.4.270-rc1-g331c26fedbd3 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 2/27/24 06:26, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, 27 Feb 2024 at 19:52, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.4.270-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.4.y * git commit: 331c26fedbd3ec4b446d0f2ae002f93421d3daa6 * git describe: v5.4.269-85-g331c26fedbd3 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.26...
## Test Regressions (compared to v5.4.269)
## Metric Regressions (compared to v5.4.269)
## Test Fixes (compared to v5.4.269)
## Metric Fixes (compared to v5.4.269)
## Test result summary total: 97761, pass: 78074, fail: 3337, skip: 16283, xfail: 67
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 138 total, 138 passed, 0 failed * arm64: 38 total, 36 passed, 2 failed * i386: 25 total, 19 passed, 6 failed * mips: 27 total, 27 passed, 0 failed * parisc: 3 total, 0 passed, 3 failed * powerpc: 32 total, 31 passed, 1 failed * riscv: 12 total, 12 passed, 0 failed * s390: 8 total, 8 passed, 0 failed * sh: 10 total, 10 passed, 0 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 34 total, 34 passed, 0 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-lib * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mm * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-smoketest * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On 27/02/24 18:56, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
thanks,
greg k-h
On Tuesday, February 27, 2024 18:56 IST, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.4.270 release. There are 84 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Feb 2024 13:15:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.270-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y and the diffstat can be found below.
KernelCI report for stable-rc/linux-5.4.y for this week.
## stable-rc HEAD for linux-5.4.y: Date: 2024-02-27 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
## Build failures: No build failures seen for the stable-rc/linux-5.4.y commit head \o/
## Boot failures: No **new** boot failures seen for the stable-rc/linux-5.4.y commit head \o/
Tested-by: kernelci.org bot bot@kernelci.org
Thanks, Shreeya Patel
linux-stable-mirror@lists.linaro.org