February 2025 - Linux-stable-mirror

[PATCH 0/2] bus: mhi: host: pci_generic: Couple of recovery fixes

by Manivannan Sadhasivam via B4 Relay

Hi, This series fixes a couple of issues reported by Johan in [1]. First one fixes a deadlock that happens during shutdown and suspend. Second one fixes the driver's PM behavior. [1] https://lore.kernel.org/mhi/Z1me8iaK7cwgjL92@hovoldconsulting.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org> --- Manivannan Sadhasivam (2): bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock bus: mhi: host: pci_generic: Recover the device synchronously from mhi_pci_runtime_resume() drivers/bus/mhi/host/pci_generic.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) --- base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5 change-id: 20250108-mhi_recovery_fix-a8f37168f91c Best regards, -- Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>

6 months

4
13
0 0

[PATCH v2] arm64: dts: socfpga: agilex5: fix gpio0 address

by niravkumar.l.rabara＠intel.com

From: Niravkumar L Rabara <niravkumar.l.rabara(a)intel.com> Use the correct gpio0 address for Agilex5. Fixes: 3f7c869e143a ("arm64: dts: socfpga: agilex5: Add gpio0 node and spi dma handshake id") Cc: stable(a)vger.kernel.org Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara(a)intel.com> --- changes in v2: * Fix dtbs_check warning and update commit message for better clarity. link to v1: - https://lore.kernel.org/all/20250212100131.2668403-1-niravkumar.l.rabara@in… arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi b/arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi index 51c6e19e40b8..7d9394a04302 100644 --- a/arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi +++ b/arch/arm64/boot/dts/intel/socfpga_agilex5.dtsi @@ -222,9 +222,9 @@ i3c1: i3c@10da1000 { status = "disabled"; }; - gpio0: gpio@ffc03200 { + gpio0: gpio@10c03200 { compatible = "snps,dw-apb-gpio"; - reg = <0xffc03200 0x100>; + reg = <0x10c03200 0x100>; #address-cells = <1>; #size-cells = <0>; resets = <&rst GPIO0_RESET>; -- 2.25.1

6 months

3
2
0 0

[PATCH 1/2] drm/amdgpu/gfx9: manually control gfxoff for CS on RV

by Alex Deucher

When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Backport the change to 6.13 and older kernels due to a change in function signatures. Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com> Suggested-by: Błażej Szczygieł <mumei6102(a)gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering(a)gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> Cc: stable(a)vger.kernel.org # 6.12.x (cherry picked from commit b35eb9128ebeec534eed1cefd6b9b1b7282cf5ba) --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 32 +++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 0b6f09f2cc9b..d28258bb6d29 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -7439,6 +7439,34 @@ static void gfx_v9_0_ring_emit_cleaner_shader(struct amdgpu_ring *ring) amdgpu_ring_write(ring, 0); /* RESERVED field, programmed to zero */ } +static void gfx_v9_0_ring_begin_use_compute(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + amdgpu_gfx_enforce_isolation_ring_begin_use(ring); + + /* Raven and PCO APUs seem to have stability issues + * with compute and gfxoff and gfx pg. Disable gfx pg during + * submission and allow again afterwards. + */ + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0)) + gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_UNGATE); +} + +static void gfx_v9_0_ring_end_use_compute(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + /* Raven and PCO APUs seem to have stability issues + * with compute and gfxoff and gfx pg. Disable gfx pg during + * submission and allow again afterwards. + */ + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 1, 0)) + gfx_v9_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + + amdgpu_gfx_enforce_isolation_ring_end_use(ring); +} + static const struct amd_ip_funcs gfx_v9_0_ip_funcs = { .name = "gfx_v9_0", .early_init = gfx_v9_0_early_init, @@ -7615,8 +7643,8 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_compute = { .emit_wave_limit = gfx_v9_0_emit_wave_limit, .reset = gfx_v9_0_reset_kcq, .emit_cleaner_shader = gfx_v9_0_ring_emit_cleaner_shader, - .begin_use = amdgpu_gfx_enforce_isolation_ring_begin_use, - .end_use = amdgpu_gfx_enforce_isolation_ring_end_use, + .begin_use = gfx_v9_0_ring_begin_use_compute, + .end_use = gfx_v9_0_ring_end_use_compute, }; static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_kiq = { -- 2.48.1

6 months

1
1
0 0

[PATCH net-next v2 1/2] net: advertise 'netns local' property via netlink

by Nicolas Dichtel

Since the below commit, there is no way to see if the netns_local property is set on a device. Let's add a netlink attribute to advertise it. CC: stable(a)vger.kernel.org Fixes: 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local") Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com> --- Documentation/netlink/specs/rt_link.yaml | 3 +++ include/uapi/linux/if_link.h | 1 + net/core/rtnetlink.c | 3 +++ 3 files changed, 7 insertions(+) diff --git a/Documentation/netlink/specs/rt_link.yaml b/Documentation/netlink/specs/rt_link.yaml index 0d492500c7e5..a646d8a6bf9d 100644 --- a/Documentation/netlink/specs/rt_link.yaml +++ b/Documentation/netlink/specs/rt_link.yaml @@ -1148,6 +1148,9 @@ attribute-sets: name: max-pacing-offload-horizon type: uint doc: EDT offload horizon supported by the device (in nsec). + - + name: netns-local + type: u8 - name: af-spec-attrs attributes: diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index bfe880fbbb24..ed4a64e1c8f1 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -378,6 +378,7 @@ enum { IFLA_GRO_IPV4_MAX_SIZE, IFLA_DPLL_PIN, IFLA_MAX_PACING_OFFLOAD_HORIZON, + IFLA_NETNS_LOCAL, __IFLA_MAX }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index abe1a461ea67..acf787e4d22d 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1292,6 +1292,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(4) /* IFLA_TSO_MAX_SEGS */ + nla_total_size(1) /* IFLA_OPERSTATE */ + nla_total_size(1) /* IFLA_LINKMODE */ + + nla_total_size(1) /* IFLA_NETNS_LOCAL */ + nla_total_size(4) /* IFLA_CARRIER_CHANGES */ + nla_total_size(4) /* IFLA_LINK_NETNSID */ + nla_total_size(4) /* IFLA_GROUP */ @@ -2046,6 +2047,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, netif_running(dev) ? READ_ONCE(dev->operstate) : IF_OPER_DOWN) || nla_put_u8(skb, IFLA_LINKMODE, READ_ONCE(dev->link_mode)) || + nla_put_u8(skb, IFLA_NETNS_LOCAL, dev->netns_local) || nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) || nla_put_u32(skb, IFLA_MIN_MTU, READ_ONCE(dev->min_mtu)) || nla_put_u32(skb, IFLA_MAX_MTU, READ_ONCE(dev->max_mtu)) || @@ -2234,6 +2236,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_ALLMULTI] = { .type = NLA_REJECT }, [IFLA_GSO_IPV4_MAX_SIZE] = NLA_POLICY_MIN(NLA_U32, MAX_TCP_HEADER + 1), [IFLA_GRO_IPV4_MAX_SIZE] = { .type = NLA_U32 }, + [IFLA_NETNS_LOCAL] = { .type = NLA_REJECT }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { -- 2.47.1

6 months

2
2
0 0

[PATCH 6.1] block, bfq: split sync bfq_queues on a per-actuator basis

by Hagar Hemdan

From: Paolo Valente <paolo.valente(a)linaro.org> commit 9778369a2d6c5ed2b81a04164c4aa9da1bdb193d upstream. Single-LUN multi-actuator SCSI drives, as well as all multi-actuator SATA drives appear as a single device to the I/O subsystem [1]. Yet they address commands to different actuators internally, as a function of Logical Block Addressing (LBAs). A given sector is reachable by only one of the actuators. For example, Seagate’s Serial Advanced Technology Attachment (SATA) version contains two actuators and maps the lower half of the SATA LBA space to the lower actuator and the upper half to the upper actuator. Evidently, to fully utilize actuators, no actuator must be left idle or underutilized while there is pending I/O for it. The block layer must somehow control the load of each actuator individually. This commit lays the ground for allowing BFQ to provide such a per-actuator control. BFQ associates an I/O-request sync bfq_queue with each process doing synchronous I/O, or with a group of processes, in case of queue merging. Then BFQ serves one bfq_queue at a time. While in service, a bfq_queue is emptied in request-position order. Yet the same process, or group of processes, may generate I/O for different actuators. In this case, different streams of I/O (each for a different actuator) get all inserted into the same sync bfq_queue. So there is basically no individual control on when each stream is served, i.e., on when the I/O requests of the stream are picked from the bfq_queue and dispatched to the drive. This commit enables BFQ to control the service of each actuator individually for synchronous I/O, by simply splitting each sync bfq_queue into N queues, one for each actuator. In other words, a sync bfq_queue is now associated to a pair (process, actuator). As a consequence of this split, the per-queue proportional-share policy implemented by BFQ will guarantee that the sync I/O generated for each actuator, by each process, receives its fair share of service. This is just a preparatory patch. If the I/O of the same process happens to be sent to different queues, then each of these queues may undergo queue merging. To handle this event, the bfq_io_cq data structure must be properly extended. In addition, stable merging must be disabled to avoid loss of control on individual actuators. Finally, also async queues must be split. These issues are described in detail and addressed in next commits. As for this commit, although multiple per-process bfq_queues are provided, the I/O of each process or group of processes is still sent to only one queue, regardless of the actuator the I/O is for. The forwarding to distinct bfq_queues will be enabled after addressing the above issues. [1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-opt… Reviewed-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com> Signed-off-by: Gabriele Felici <felicigb(a)gmail.com> Signed-off-by: Carmine Zaccagnino <carmine(a)carminezacc.com> Signed-off-by: Paolo Valente <paolo.valente(a)linaro.org> Link: https://lore.kernel.org/r/20230103145503.71712-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Stable-dep-of: e8b8344de398 ("block, bfq: fix bfqq uaf in bfq_limit_depth()") [Hagar: needed contextual fixes] Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- block/bfq-cgroup.c | 95 +++++++++++++------------- block/bfq-iosched.c | 160 +++++++++++++++++++++++++++++--------------- block/bfq-iosched.h | 51 +++++++++++--- 3 files changed, 196 insertions(+), 110 deletions(-) diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index 60b4299bec8ec..5d202b775beb4 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -704,6 +704,46 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, bfq_put_queue(bfqq); } +static void bfq_sync_bfqq_move(struct bfq_data *bfqd, + struct bfq_queue *sync_bfqq, + struct bfq_io_cq *bic, + struct bfq_group *bfqg, + unsigned int act_idx) +{ + struct bfq_queue *bfqq; + + if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) { + /* We are the only user of this bfqq, just move it */ + if (sync_bfqq->entity.sched_data != &bfqg->sched_data) + bfq_bfqq_move(bfqd, sync_bfqq, bfqg); + return; + } + + /* + * The queue was merged to a different queue. Check + * that the merge chain still belongs to the same + * cgroup. + */ + for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq) + if (bfqq->entity.sched_data != &bfqg->sched_data) + break; + if (bfqq) { + /* + * Some queue changed cgroup so the merge is not valid + * anymore. We cannot easily just cancel the merge (by + * clearing new_bfqq) as there may be other processes + * using this queue and holding refs to all queues + * below sync_bfqq->new_bfqq. Similarly if the merge + * already happened, we need to detach from bfqq now + * so that we cannot merge bio to a request from the + * old cgroup. + */ + bfq_put_cooperator(sync_bfqq); + bfq_release_process_ref(bfqd, sync_bfqq); + bic_set_bfqq(bic, NULL, true, act_idx); + } +} + /** * __bfq_bic_change_cgroup - move @bic to @bfqg. * @bfqd: the queue descriptor. @@ -714,60 +754,25 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, * sure that the reference to cgroup is valid across the call (see * comments in bfq_bic_update_cgroup on this issue) */ -static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd, +static void __bfq_bic_change_cgroup(struct bfq_data *bfqd, struct bfq_io_cq *bic, struct bfq_group *bfqg) { - struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false); - struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true); - struct bfq_entity *entity; + unsigned int act_idx; - if (async_bfqq) { - entity = &async_bfqq->entity; + for (act_idx = 0; act_idx < bfqd->num_actuators; act_idx++) { + struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false, act_idx); + struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true, act_idx); - if (entity->sched_data != &bfqg->sched_data) { - bic_set_bfqq(bic, NULL, false); + if (async_bfqq && + async_bfqq->entity.sched_data != &bfqg->sched_data) { + bic_set_bfqq(bic, NULL, false, act_idx); bfq_release_process_ref(bfqd, async_bfqq); } - } - if (sync_bfqq) { - if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) { - /* We are the only user of this bfqq, just move it */ - if (sync_bfqq->entity.sched_data != &bfqg->sched_data) - bfq_bfqq_move(bfqd, sync_bfqq, bfqg); - } else { - struct bfq_queue *bfqq; - - /* - * The queue was merged to a different queue. Check - * that the merge chain still belongs to the same - * cgroup. - */ - for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq) - if (bfqq->entity.sched_data != - &bfqg->sched_data) - break; - if (bfqq) { - /* - * Some queue changed cgroup so the merge is - * not valid anymore. We cannot easily just - * cancel the merge (by clearing new_bfqq) as - * there may be other processes using this - * queue and holding refs to all queues below - * sync_bfqq->new_bfqq. Similarly if the merge - * already happened, we need to detach from - * bfqq now so that we cannot merge bio to a - * request from the old cgroup. - */ - bfq_put_cooperator(sync_bfqq); - bic_set_bfqq(bic, NULL, true); - bfq_release_process_ref(bfqd, sync_bfqq); - } - } + if (sync_bfqq) + bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx); } - - return bfqg; } void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index f759457646530..76b2e66d09070 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -377,16 +377,23 @@ static const unsigned long bfq_late_stable_merging = 600; #define RQ_BIC(rq) ((struct bfq_io_cq *)((rq)->elv.priv[0])) #define RQ_BFQQ(rq) ((rq)->elv.priv[1]) -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync) +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync, + unsigned int actuator_idx) { - return bic->bfqq[is_sync]; + if (is_sync) + return bic->bfqq[1][actuator_idx]; + + return bic->bfqq[0][actuator_idx]; } static void bfq_put_stable_ref(struct bfq_queue *bfqq); -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync) +void bic_set_bfqq(struct bfq_io_cq *bic, + struct bfq_queue *bfqq, + bool is_sync, + unsigned int actuator_idx) { - struct bfq_queue *old_bfqq = bic->bfqq[is_sync]; + struct bfq_queue *old_bfqq = bic->bfqq[is_sync][actuator_idx]; /* Clear bic pointer if bfqq is detached from this bic */ if (old_bfqq && old_bfqq->bic == bic) @@ -405,7 +412,10 @@ void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync) * we cancel the stable merge if * bic->stable_merge_bfqq == bfqq. */ - bic->bfqq[is_sync] = bfqq; + if (is_sync) + bic->bfqq[1][actuator_idx] = bfqq; + else + bic->bfqq[0][actuator_idx] = bfqq; if (bfqq && bic->stable_merge_bfqq == bfqq) { /* @@ -680,9 +690,9 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) { struct bfq_data *bfqd = data->q->elevator->elevator_data; struct bfq_io_cq *bic = bfq_bic_lookup(data->q); - struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(opf)) : NULL; int depth; unsigned limit = data->q->nr_requests; + unsigned int act_idx; /* Sync reads have full depth available */ if (op_is_sync(opf) && !op_is_write(opf)) { @@ -692,14 +702,21 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data) limit = (limit * depth) >> bfqd->full_depth_shift; } - /* - * Does queue (or any parent entity) exceed number of requests that - * should be available to it? Heavily limit depth so that it cannot - * consume more available requests and thus starve other entities. - */ - if (bfqq && bfqq_request_over_limit(bfqq, limit)) - depth = 1; + for (act_idx = 0; bic && act_idx < bfqd->num_actuators; act_idx++) { + struct bfq_queue *bfqq = + bic_to_bfqq(bic, op_is_sync(opf), act_idx); + /* + * Does queue (or any parent entity) exceed number of + * requests that should be available to it? Heavily + * limit depth so that it cannot consume more + * available requests and thus starve other entities. + */ + if (bfqq && bfqq_request_over_limit(bfqq, limit)) { + depth = 1; + break; + } + } bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u", __func__, bfqd->wr_busy_queues, op_is_sync(opf), depth); if (depth) @@ -1820,6 +1837,18 @@ static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq, return bfqq_weight > in_serv_weight; } +/* + * Get the index of the actuator that will serve bio. + */ +static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio) +{ + /* + * Multi-actuator support not complete yet, so always return 0 + * for the moment (to keep incomplete mechanisms off). + */ + return 0; +} + static bool bfq_better_to_idle(struct bfq_queue *bfqq); static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd, @@ -2150,7 +2179,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq, * We reset waker detection logic also if too much time has passed * since the first detection. If wakeups are rare, pointless idling * doesn't hurt throughput that much. The condition below makes sure - * we do not uselessly idle blocking waker in more than 1/64 cases. + * we do not uselessly idle blocking waker in more than 1/64 cases. */ if (bfqd->last_completed_rq_bfqq != bfqq->tentative_waker_bfqq || @@ -2486,7 +2515,8 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio, */ bfq_bic_update_cgroup(bic, bio); - bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf)); + bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf), + bfq_actuator_index(bfqd, bio)); } else { bfqd->bio_bfqq = NULL; } @@ -3188,7 +3218,7 @@ static struct bfq_queue *bfq_merge_bfqqs(struct bfq_data *bfqd, /* * Merge queues (that is, let bic redirect its requests to new_bfqq) */ - bic_set_bfqq(bic, new_bfqq, true); + bic_set_bfqq(bic, new_bfqq, true, bfqq->actuator_idx); bfq_mark_bfqq_coop(new_bfqq); /* * new_bfqq now belongs to at least two bics (it is a shared queue): @@ -4818,11 +4848,8 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) */ if (bfq_bfqq_wait_request(bfqq) || (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) { - struct bfq_queue *async_bfqq = - bfqq->bic && bfqq->bic->bfqq[0] && - bfq_bfqq_busy(bfqq->bic->bfqq[0]) && - bfqq->bic->bfqq[0]->next_rq ? - bfqq->bic->bfqq[0] : NULL; + unsigned int act_idx = bfqq->actuator_idx; + struct bfq_queue *async_bfqq = NULL; struct bfq_queue *blocked_bfqq = !hlist_empty(&bfqq->woken_list) ? container_of(bfqq->woken_list.first, @@ -4830,6 +4857,10 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) woken_list_node) : NULL; + if (bfqq->bic && bfqq->bic->bfqq[0][act_idx] && + bfq_bfqq_busy(bfqq->bic->bfqq[0][act_idx]) && + bfqq->bic->bfqq[0][act_idx]->next_rq) + async_bfqq = bfqq->bic->bfqq[0][act_idx]; /* * The next four mutually-exclusive ifs decide * whether to try injection, and choose the queue to @@ -4914,7 +4945,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic && bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <= bfq_bfqq_budget_left(async_bfqq)) - bfqq = bfqq->bic->bfqq[0]; + bfqq = bfqq->bic->bfqq[0][act_idx]; else if (bfqq->waker_bfqq && bfq_bfqq_busy(bfqq->waker_bfqq) && bfqq->waker_bfqq->next_rq && @@ -5375,48 +5406,54 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq) bfq_release_process_ref(bfqd, bfqq); } -static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync) +static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync, + unsigned int actuator_idx) { - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync); + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, actuator_idx); struct bfq_data *bfqd; if (bfqq) bfqd = bfqq->bfqd; /* NULL if scheduler already exited */ if (bfqq && bfqd) { - unsigned long flags; - - spin_lock_irqsave(&bfqd->lock, flags); - bic_set_bfqq(bic, NULL, is_sync); + bic_set_bfqq(bic, NULL, is_sync, actuator_idx); bfq_exit_bfqq(bfqd, bfqq); - spin_unlock_irqrestore(&bfqd->lock, flags); } } static void bfq_exit_icq(struct io_cq *icq) { struct bfq_io_cq *bic = icq_to_bic(icq); + struct bfq_data *bfqd = bic_to_bfqd(bic); + unsigned long flags; + unsigned int act_idx; + /* + * If bfqd and thus bfqd->num_actuators is not available any + * longer, then cycle over all possible per-actuator bfqqs in + * next loop. We rely on bic being zeroed on creation, and + * therefore on its unused per-actuator fields being NULL. + */ + unsigned int num_actuators = BFQ_MAX_ACTUATORS; - if (bic->stable_merge_bfqq) { - struct bfq_data *bfqd = bic->stable_merge_bfqq->bfqd; + /* + * bfqd is NULL if scheduler already exited, and in that case + * this is the last time these queues are accessed. + */ + if (bfqd) { + spin_lock_irqsave(&bfqd->lock, flags); + num_actuators = bfqd->num_actuators; + } - /* - * bfqd is NULL if scheduler already exited, and in - * that case this is the last time bfqq is accessed. - */ - if (bfqd) { - unsigned long flags; + if (bic->stable_merge_bfqq) + bfq_put_stable_ref(bic->stable_merge_bfqq); - spin_lock_irqsave(&bfqd->lock, flags); - bfq_put_stable_ref(bic->stable_merge_bfqq); - spin_unlock_irqrestore(&bfqd->lock, flags); - } else { - bfq_put_stable_ref(bic->stable_merge_bfqq); - } + for (act_idx = 0; act_idx < num_actuators; act_idx++) { + bfq_exit_icq_bfqq(bic, true, act_idx); + bfq_exit_icq_bfqq(bic, false, act_idx); } - bfq_exit_icq_bfqq(bic, true); - bfq_exit_icq_bfqq(bic, false); + if (bfqd) + spin_unlock_irqrestore(&bfqd->lock, flags); } /* @@ -5493,25 +5530,27 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio) bic->ioprio = ioprio; - bfqq = bic_to_bfqq(bic, false); + bfqq = bic_to_bfqq(bic, false, bfq_actuator_index(bfqd, bio)); if (bfqq) { struct bfq_queue *old_bfqq = bfqq; bfqq = bfq_get_queue(bfqd, bio, false, bic, true); - bic_set_bfqq(bic, bfqq, false); + bic_set_bfqq(bic, bfqq, false, bfq_actuator_index(bfqd, bio)); bfq_release_process_ref(bfqd, old_bfqq); } - bfqq = bic_to_bfqq(bic, true); + bfqq = bic_to_bfqq(bic, true, bfq_actuator_index(bfqd, bio)); if (bfqq) bfq_set_next_ioprio_data(bfqq, bic); } static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq, - struct bfq_io_cq *bic, pid_t pid, int is_sync) + struct bfq_io_cq *bic, pid_t pid, int is_sync, + unsigned int act_idx) { u64 now_ns = ktime_get_ns(); + bfqq->actuator_idx = act_idx; RB_CLEAR_NODE(&bfqq->entity.rb_node); INIT_LIST_HEAD(&bfqq->fifo); INIT_HLIST_NODE(&bfqq->burst_list_node); @@ -5762,7 +5801,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd, if (bfqq) { bfq_init_bfqq(bfqd, bfqq, bic, current->pid, - is_sync); + is_sync, bfq_actuator_index(bfqd, bio)); bfq_init_entity(&bfqq->entity, bfqg); bfq_log_bfqq(bfqd, bfqq, "allocated"); } else { @@ -6078,7 +6117,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq) * then complete the merge and redirect it to * new_bfqq. */ - if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq) { + if (bic_to_bfqq(RQ_BIC(rq), true, + bfq_actuator_index(bfqd, rq->bio)) == bfqq) { while (bfqq != new_bfqq) bfqq = bfq_merge_bfqqs(bfqd, RQ_BIC(rq), bfqq); } @@ -6632,7 +6672,7 @@ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq) return bfqq; } - bic_set_bfqq(bic, NULL, true); + bic_set_bfqq(bic, NULL, true, bfqq->actuator_idx); bfq_put_cooperator(bfqq); @@ -6646,7 +6686,8 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd, bool split, bool is_sync, bool *new_queue) { - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync); + unsigned int act_idx = bfq_actuator_index(bfqd, bio); + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx); if (likely(bfqq && bfqq != &bfqd->oom_bfqq)) return bfqq; @@ -6658,7 +6699,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd, bfq_put_queue(bfqq); bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, split); - bic_set_bfqq(bic, bfqq, is_sync); + bic_set_bfqq(bic, bfqq, is_sync, act_idx); if (split && is_sync) { if ((bic->was_in_burst_list && bfqd->large_burst) || bic->saved_in_large_burst) @@ -7139,8 +7180,10 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues. * Grab a permanent reference to it, so that the normal code flow * will not attempt to free it. + * Set zero as actuator index: we will pretend that + * all I/O requests are for the same actuator. */ - bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0); + bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0, 0); bfqd->oom_bfqq.ref++; bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO; bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE; @@ -7159,6 +7202,13 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) bfqd->queue = q; + /* + * Multi-actuator support not complete yet, unconditionally + * set to only one actuator for the moment (to keep incomplete + * mechanisms off). + */ + bfqd->num_actuators = 1; + INIT_LIST_HEAD(&bfqd->dispatch); hrtimer_init(&bfqd->idle_slice_timer, CLOCK_MONOTONIC, diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index 71f721670ab62..2b413ddffbb95 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -33,6 +33,14 @@ */ #define BFQ_SOFTRT_WEIGHT_FACTOR 100 +/* + * Maximum number of actuators supported. This constant is used simply + * to define the size of the static array that will contain + * per-actuator data. The current value is hopefully a good upper + * bound to the possible number of actuators of any actual drive. + */ +#define BFQ_MAX_ACTUATORS 8 + struct bfq_entity; /** @@ -225,12 +233,14 @@ struct bfq_ttime { * struct bfq_queue - leaf schedulable entity. * * A bfq_queue is a leaf request queue; it can be associated with an - * io_context or more, if it is async or shared between cooperating - * processes. @cgroup holds a reference to the cgroup, to be sure that it - * does not disappear while a bfqq still references it (mostly to avoid - * races between request issuing and task migration followed by cgroup - * destruction). - * All the fields are protected by the queue lock of the containing bfqd. + * io_context or more, if it is async or shared between cooperating + * processes. Besides, it contains I/O requests for only one actuator + * (an io_context is associated with a different bfq_queue for each + * actuator it generates I/O for). @cgroup holds a reference to the + * cgroup, to be sure that it does not disappear while a bfqq still + * references it (mostly to avoid races between request issuing and + * task migration followed by cgroup destruction). All the fields are + * protected by the queue lock of the containing bfqd. */ struct bfq_queue { /* reference counter */ @@ -395,6 +405,9 @@ struct bfq_queue { * the woken queues when this queue exits. */ struct hlist_head woken_list; + + /* index of the actuator this queue is associated with */ + unsigned int actuator_idx; }; /** @@ -403,8 +416,17 @@ struct bfq_queue { struct bfq_io_cq { /* associated io_cq structure */ struct io_cq icq; /* must be the first member */ - /* array of two process queues, the sync and the async */ - struct bfq_queue *bfqq[2]; + /* + * Matrix of associated process queues: first row for async + * queues, second row sync queues. Each row contains one + * column for each actuator. An I/O request generated by the + * process is inserted into the queue pointed by bfqq[i][j] if + * the request is to be served by the j-th actuator of the + * drive, where i==0 or i==1, depending on whether the request + * is async or sync. So there is a distinct queue for each + * actuator. + */ + struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS]; /* per (request_queue, blkcg) ioprio */ int ioprio; #ifdef CONFIG_BFQ_GROUP_IOSCHED @@ -768,6 +790,13 @@ struct bfq_data { */ unsigned int word_depths[2][2]; unsigned int full_depth_shift; + + /* + * Number of independent actuators. This is equal to 1 in + * case of single-actuator drives. + */ + unsigned int num_actuators; + }; enum bfqq_state_flags { @@ -964,8 +993,10 @@ struct bfq_group { extern const int bfq_timeout; -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync); -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync); +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync, + unsigned int actuator_idx); +void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync, + unsigned int actuator_idx); struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic); void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq); void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq, -- 2.47.1

6 months

2
3
0 0

[PATCH] drm/xe/display: Add check for alloc_ordered_workqueue()

by Haoxiang Li

Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- drivers/gpu/drm/xe/display/xe_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index b3921dbc52ff..b5d6ae05d936 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -97,6 +97,8 @@ int xe_display_create(struct xe_device *xe) spin_lock_init(&xe->display.fb_tracking.lock); xe->display.hotplug.dp_wq = alloc_ordered_workqueue("xe-dp", 0); + if (!xe->display.hotplug.dp_wq) + return -ENOMEM; return drmm_add_action_or_reset(&xe->drm, display_destroy, NULL); } -- 2.25.1

6 months

1
0
0 0

[PATCH] mlx5: Add check for get_macsec_device()

by Haoxiang Li

Add check for the return value of get_macsec_device() in mlx5r_del_gid_macsec_operations() to prevent null pointer dereference. Fixes: 58dbd6428a68 ("RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- drivers/infiniband/hw/mlx5/macsec.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/macsec.c b/drivers/infiniband/hw/mlx5/macsec.c index 3c56eb5eddf3..623b0a58f721 100644 --- a/drivers/infiniband/hw/mlx5/macsec.c +++ b/drivers/infiniband/hw/mlx5/macsec.c @@ -354,6 +354,11 @@ void mlx5r_del_gid_macsec_operations(const struct ib_gid_attr *attr) } } macsec_device = get_macsec_device(ndev, &dev->macsec.macsec_devices_list); + if (!macsec_device) { + dev_put(ndev); + mutex_unlock(&dev->macsec.lock); + return; + } mlx5_macsec_del_roce_rule(attr->index, dev->mdev->macsec_fs, &macsec_device->tx_rules_list, &macsec_device->rx_rules_list); mlx5_macsec_del_roce_gid(macsec_device, attr->index); -- 2.25.1

6 months

2
1
0 0

FAILED: patch "[PATCH] io_uring/kbuf: reallocate buf lists on upgrade" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 8802766324e1f5d414a81ac43365c20142e85603 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021801-splinter-sappy-56b3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8802766324e1f5d414a81ac43365c20142e85603 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Wed, 12 Feb 2025 13:46:46 +0000 Subject: [PATCH] io_uring/kbuf: reallocate buf lists on upgrade IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for legacy selected buffer and has been emptied. It violates the requirement that most of the field should stay stable after publish. Always reallocate it instead. Cc: stable(a)vger.kernel.org Reported-by: Pumpkin Chang <pumpkin(a)devco.re> Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 04bf493eecae..8e72de7712ac 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -415,6 +415,13 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) } } +static void io_destroy_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + scoped_guard(mutex, &ctx->mmap_lock) + WARN_ON_ONCE(xa_erase(&ctx->io_bl_xa, bl->bgid) != bl); + io_put_bl(ctx, bl); +} + int io_remove_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_provide_buf *p = io_kiocb_to_cmd(req, struct io_provide_buf); @@ -636,12 +643,13 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) /* if mapped buffer ring OR classic exists, don't allow */ if (bl->flags & IOBL_BUF_RING || !list_empty(&bl->buf_list)) return -EEXIST; - } else { - free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); - if (!bl) - return -ENOMEM; + io_destroy_bl(ctx, bl); } + free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); + if (!bl) + return -ENOMEM; + mmap_offset = (unsigned long)reg.bgid << IORING_OFF_PBUF_SHIFT; ring_size = flex_array_size(br, bufs, reg.ring_entries);

6 months

2
1
0 0

FAILED: patch "[PATCH] io_uring/kbuf: reallocate buf lists on upgrade" failed to apply to 6.13-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.13-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y git checkout FETCH_HEAD git cherry-pick -x 8802766324e1f5d414a81ac43365c20142e85603 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021855-snugly-hacked-a8fa@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8802766324e1f5d414a81ac43365c20142e85603 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Wed, 12 Feb 2025 13:46:46 +0000 Subject: [PATCH] io_uring/kbuf: reallocate buf lists on upgrade IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for legacy selected buffer and has been emptied. It violates the requirement that most of the field should stay stable after publish. Always reallocate it instead. Cc: stable(a)vger.kernel.org Reported-by: Pumpkin Chang <pumpkin(a)devco.re> Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 04bf493eecae..8e72de7712ac 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -415,6 +415,13 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) } } +static void io_destroy_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + scoped_guard(mutex, &ctx->mmap_lock) + WARN_ON_ONCE(xa_erase(&ctx->io_bl_xa, bl->bgid) != bl); + io_put_bl(ctx, bl); +} + int io_remove_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_provide_buf *p = io_kiocb_to_cmd(req, struct io_provide_buf); @@ -636,12 +643,13 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) /* if mapped buffer ring OR classic exists, don't allow */ if (bl->flags & IOBL_BUF_RING || !list_empty(&bl->buf_list)) return -EEXIST; - } else { - free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); - if (!bl) - return -ENOMEM; + io_destroy_bl(ctx, bl); } + free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); + if (!bl) + return -ENOMEM; + mmap_offset = (unsigned long)reg.bgid << IORING_OFF_PBUF_SHIFT; ring_size = flex_array_size(br, bufs, reg.ring_entries);

6 months

3
4
0 0

Re: [PATCH 6.13 000/274] 6.13.4-rc1 review

by Ronald Warsow

Hi Greg no regressions here on x86_64 (RKL, Intel 11th Gen. CPU) Thanks Tested-by: Ronald Warsow <rwarsow(a)gmx.de>

6 months

1
0
0 0

[PATCH 5.10] mm: call the security_mmap_file() LSM hook in remap_file_pages()

by Pratyush Yadav

From: Shu Han <ebpqwerty472123(a)gmail.com> commit ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 upstream. The remap_file_pages syscall handler calls do_mmap() directly, which doesn't contain the LSM security check. And if the process has called personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for RW pages, this will actually result in remapping the pages to RWX, bypassing a W^X policy enforced by SELinux. So we should check prot by security_mmap_file LSM hook in the remap_file_pages syscall handler before do_mmap() is called. Otherwise, it potentially permits an attacker to bypass a W^X policy enforced by SELinux. The bypass is similar to CVE-2016-10044, which bypass the same thing via AIO and can be found in [1]. The PoC: $ cat > test.c int main(void) { size_t pagesz = sysconf(_SC_PAGE_SIZE); int mfd = syscall(SYS_memfd_create, "test", 0); const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0); unsigned int old = syscall(SYS_personality, 0xffffffff); syscall(SYS_personality, READ_IMPLIES_EXEC | old); syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0); syscall(SYS_personality, old); // show the RWX page exists even if W^X policy is enforced int fd = open("/proc/self/maps", O_RDONLY); unsigned char buf2[1024]; while (1) { int ret = read(fd, buf2, 1024); if (ret <= 0) break; write(1, buf2, ret); } close(fd); } $ gcc test.c -o test $ ./test | grep rwx 7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted) Link: https://project-zero.issues.chromium.org/issues/42452389 [1] Cc: stable(a)vger.kernel.org Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com> Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul(a)paul-moore.com> Signed-off-by: Pratyush Yadav <ptyadav(a)amazon.de> --- mm/mmap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index 9f76625a1743..2c17eb840e44 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3078,8 +3078,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, } file = get_file(vma->vm_file); + ret = security_mmap_file(vma->vm_file, prot, flags); + if (ret) + goto out_fput; ret = do_mmap(vma->vm_file, start, size, prot, flags, pgoff, &populate, NULL); +out_fput: fput(file); out: mmap_write_unlock(mm); -- 2.47.1

6 months

2
2
0 0

[PATCH] btrfs: fix data overwriting bug during buffered write when block size < page size

by Qu Wenruo

[BUG] When running generic/417 with a btrfs whose block size < page size (subpage cases), it always fails. And the following minimal reproducer is more than enough to trigger it reliably: workload() { mkfs.btrfs -s 4k -f $dev > /dev/null dmesg -C mount $dev $mnt $fsstree_dir/src/dio-invalidate-cache -r -b 4096 -n 3 -i 1 -f $mnt/diotest ret=$? umount $mnt stop_trace if [ $ret -ne 0 ]; then fail fi } for (( i = 0; i < 1024; i++)); do echo "=== $i/$runtime ===" workload done [CAUSE] With extra trace printk added to the following functions: - btrfs_buffered_write() * Which folio is touched * The file offset (start) where the buffered write is at * How many bytes are copied * The content of the write (the first 2 bytes) - submit_one_sector() * Which folio is touched * The position inside the folio - pagecache_isize_extended() * The parameters of the function itself * The parameters of the folio_zero_range() Which are enough to show the problem: 22.158114: btrfs_buffered_write: folio pos=0 start=0 copied=4096 content=0x0101 22.158161: submit_one_sector: r/i=5/257 folio=0 pos=0 content=0x0101 22.158609: btrfs_buffered_write: folio pos=0 start=4096 copied=4096 content=0x0101 22.158634: btrfs_buffered_write: folio pos=0 start=8192 copied=4096 content=0x0101 22.158650: pagecache_isize_extended: folio=0 from=4096 to=8192 bsize=4096 zero off=4096 len=8192 22.158682: submit_one_sector: r/i=5/257 folio=0 pos=4096 content=0x0000 22.158686: submit_one_sector: r/i=5/257 folio=0 pos=8192 content=0x0101 The tool dio-invalidate-cache will start 3 threads, each doing a buffered write with 0x01 at 4096 * i (i is 0, 1 ,2), do a fsync, then do a direct read, and compare the read buffer with the write buffer. Note that all 3 btrfs_buffered_write() are writing the correct 0x01 into the page cache. But at submit_one_sector(), at file offset 4096, the content is zeroed out, mostly by pagecache_isize_extended(). The race happens like this: Thread A is writing into range [4K, 8K). Thread B is writing into range [8K, 12k). Thread A | Thread B -------------------------------------+------------------------------------ btrfs_buffered_write() | btrfs_buffered_write() |- old_isize = 4K; | |- old_isize = 4096; |- btrfs_inode_lock() | | |- write into folio range [4K, 8K) | | |- pagecache_isize_extended() | | | extend isize from 4096 to 8192 | | | no folio_zero_range() called | | |- btrfs_inode_lock() | | | |- btrfs_inode_lock() | |- write into folio range [8K, 12K) | |- pagecache_isize_extended() | | calling folio_zero_range(4K, 8K) | | This is caused by the old_isize is | | grabbed too early, without any | | inode lock. | |- btrfs_inode_unlock() The @old_isize is grabbed without inode lock, causing race between two buffered write threads and making pagecache_isize_extended() to zero range which is still containing cached data. And this is only affecting subpage btrfs, because for regular blocksize == page size case, the function pagecache_isize_extended() will do nothing if the block size >= page size. [FIX] Grab the old isize with inode lock hold. This means each buffered write thread will have a stable view of the old inode size, thus avoid the above race. Cc: stable(a)vger.kernel.org Signed-off-by: Qu Wenruo <wqu(a)suse.com> --- fs/btrfs/file.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fd90855fe717..896dc03689d6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1090,7 +1090,7 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i) u64 lockend; size_t num_written = 0; ssize_t ret; - loff_t old_isize = i_size_read(inode); + loff_t old_isize; unsigned int ilock_flags = 0; const bool nowait = (iocb->ki_flags & IOCB_NOWAIT); unsigned int bdp_flags = (nowait ? BDP_ASYNC : 0); @@ -1103,6 +1103,13 @@ ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i) if (ret < 0) return ret; + /* + * We can only trust the isize with inode lock hold, or it can race with + * other buffered writes and cause incorrect call of + * pagecache_isize_extended() to overwrite existing data. + */ + old_isize = i_size_read(inode); + ret = generic_write_checks(iocb, i); if (ret <= 0) goto out; -- 2.48.1

6 months

2
1
0 0

Re: [External] : Re: Please backport: netfilter: nft_counter: Use u64_stats_t for statistic.

by MOESSBAUER, Felix

On Fri, 2024-10-04 at 09:39 +0200, Sebastian Andrzej Siewior wrote: > On 2024-09-27 15:01:00 [-0400], Joseph Salisbury wrote: > > Is it needed in all stable release patch sets, including v5.15? > > Yes. I would appreciate backporting it all the way where the code is > available. The dependencies > 1eacdd71b3436 ("netfilter: nft_counter: Disable BH in > nft_counter_offload_stats().") > a0b39e2dc7017 ("netfilter: nft_counter: Synchronize > nft_counter_reset() against reader.") > > were already routed via stable. > The problem is that the seqcount has no lock associated so a reader > could preempt a writer and then lockup spinning. Hi, this needs to be backported to all stable RT trees (just checked 4.19 and 6.1. 5.15 already has it). We observed the reader live-lock issue in "nft_counter_fetch" on 6.1.120-rt47 (leading to a system stall) and were also able to find it with lockdep (see stacktrace below). I'm wondering if this patch could be applied to linux-stable, even if it is just a performance optimization on non-rt kernels (not a fix). The patch "netfilter: nft_counter: Use u64_stats_t for statistic" cleanly applies on 6.1.y and 6.1.127-rt48. Stacktrace from lockdep: [ 33.643632] ------------[ cut here ]------------ [ 33.643637] WARNING: CPU: 0 PID: 972 at include/linux/seqlock.h:269 nft_counter_eval+0x6b/0xd0 [nf_tables] [ 33.643657] Modules linked in: br_netfilter bridge stp llc xt_comment xt_recent xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink rfkill intel_rapl_msr intel_rapl_common ccp binfmt_misc kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 ppdev snd_pcm snd_timer aesni_intel snd crypto_simd cryptd soundcore pcspkr parport_pc iTCO_wdt bochs parport drm_vram_helper intel_pmc_bxt drm_ttm_helper iTCO_vendor_support button ttm drm_kms_helper watchdog sg joydev evdev serio_raw drm fuse loop efi_pstore configfs qemu_fw_cfg ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc32c_generic crc16 mbcache jbd2 xts ecb squashfs dm_verity dm_bufio reed_solomon dm_mod sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic virtio_net net_failover ahci failover libahci crct10dif_pclmul [ 33.643727] crct10dif_common libata virtio_pci i2c_i801 crc32_pclmul scsi_mod crc32c_intel virtio_pci_legacy_dev i2c_smbus psmouse virtio_pci_modern_dev virtio scsi_common virtio_ring lpc_ich [ 33.643739] CPU: 0 PID: 972 Comm: onboardservice Not tainted 6.1.120-rt47 #1 [ 33.643742] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 33.643744] RIP: 0010:nft_counter_eval+0x6b/0xd0 [nf_tables] [ 33.643759] Code: 52 3f 85 d2 74 26 65 8b 05 ba bd 52 3f 85 c0 75 1b 65 8b 05 e7 b3 52 3f a9 ff ff ff 7f 75 0d 65 8b 05 dd ba 52 3f 85 c0 74 02 <0f> 0b ff 74 24 20 4c 8d 6d 08 45 31 c9 31 c9 41 b8 01 00 00 00 31 [ 33.643776] RSP: 0018:ffffa045007736a0 EFLAGS: 00010202 [ 33.643778] RAX: 0000000000000001 RBX: ffffc044ffc2ae80 RCX: 00000000000026af [ 33.643780] RDX: 0000000000000001 RSI: ffff8d29050db388 RDI: ffffffffc0af49a4 [ 33.643781] RBP: ffff8d293f638060 R08: 0000000000000000 R09: 0000000000000000 [ 33.643782] R10: 0000000000000001 R11: 000000009bb77572 R12: ffffa04500773920 [ 33.643783] R13: ffff8d29011db358 R14: ffff8d29011db208 R15: ffff8d29011db240 [ 33.643807] FS: 000000c000047c90(0000) GS:ffff8d293f600000(0000) knlGS:0000000000000000 [ 33.643811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 33.643813] CR2: 000000c0005fe000 CR3: 000000003a212000 CR4: 00000000003506f0 [ 33.643836] Call Trace: [ 33.643840] <TASK> [ 33.643844] ? __warn+0x82/0xe0 [ 33.643852] ? nft_counter_eval+0x6b/0xd0 [nf_tables] [ 33.643877] ? report_bug+0x10e/0x180 [ 33.643889] ? handle_bug+0x41/0x70 [ 33.643895] ? exc_invalid_op+0x13/0x60 [ 33.643899] ? asm_exc_invalid_op+0x16/0x20 [ 33.643912] ? nft_counter_eval+0x24/0xd0 [nf_tables] [ 33.643931] ? nft_counter_eval+0x6b/0xd0 [nf_tables] [ 33.643962] nft_do_chain+0x45b/0x690 [nf_tables] [ 33.644025] nft_do_chain_ipv4+0x78/0xa0 [nf_tables] [ 33.644046] nf_hook_slow+0x41/0xc0 [ 33.644054] __ip_local_out+0x14c/0x300 [ 33.644062] ? ip_output+0xb0/0xb0 [ 33.644074] __ip_queue_xmit+0x1c0/0x7f0 [ 33.644086] __tcp_transmit_skb+0xabe/0xcb0 [ 33.644107] tcp_write_xmit+0x521/0x14a0 [ 33.644117] __tcp_push_pending_frames+0x32/0xf0 [ 33.644120] tcp_sendmsg_locked+0x4cd/0xc20 [ 33.644133] tcp_sendmsg+0x27/0x40 [ 33.644137] __sock_sendmsg+0x58/0x70 [ 33.644142] sock_write_iter+0x9a/0x100 [ 33.644151] vfs_write+0x2c8/0x330 [ 33.644164] ksys_write+0xc3/0xf0 [ 33.644169] do_syscall_64+0x55/0xb0 [ 33.644173] ? lock_acquire+0xc4/0x2d0 [ 33.644178] ? find_held_lock+0x2b/0x80 [ 33.644182] ? finish_task_switch.isra.0+0xca/0x380 [ 33.644186] ? lock_release+0xd0/0x2d0 [ 33.644191] ? lockdep_hardirqs_on_prepare+0xdc/0x190 [ 33.644196] ? finish_task_switch.isra.0+0xcf/0x380 [ 33.644201] ? __schedule+0x3f8/0xd20 [ 33.644206] ? restore_fpregs_from_fpstate+0x38/0x90 [ 33.644211] ? trace_x86_fpu_regs_activated+0x1f/0xb0 [ 33.644213] ? switch_fpu_return+0x58/0x90 [ 33.644218] ? exit_to_user_mode_prepare+0x1af/0x250 [ 33.644223] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 33.644227] RIP: 0033:0x40720e [ 33.644230] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48 [ 33.644232] RSP: 002b:000000c000069980 EFLAGS: 00000216 ORIG_RAX: 0000000000000001 [ 33.644234] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000000040720e [ 33.644236] RDX: 000000000000008c RSI: 000000c0001746c0 RDI: 0000000000000009 [ 33.644237] RBP: 000000c0000699c0 R08: 0000000000000000 R09: 0000000000000000 [ 33.644238] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000069b00 [ 33.644239] R13: 000000000000000e R14: 000000c00016ed00 R15: 0000000000a88360 [ 33.644250] </TASK> [ 33.644250] irq event stamp: 10266 [ 33.644251] hardirqs last enabled at (10268): [<ffffffff96339836>] vprintk_store+0x326/0x550 [ 33.644256] hardirqs last disabled at (10269): [<ffffffff9633987c>] vprintk_store+0x36c/0x550 [ 33.644259] softirqs last enabled at (9900): [<ffffffff962af77e>] __local_bh_enable_ip+0xfe/0x140 [ 33.644264] softirqs last disabled at (9904): [<ffffffffc0af49a4>] nft_counter_eval+0x24/0xd0 [nf_tables] [ 33.644277] ---[ end trace 0000000000000000 ]--- Best regards, Felix > > Sebastian -- Siemens AG Linux Expert Center Friedrich-Ludwig-Bauer-Str. 3 85748 Garching, Germany

6 months

2
1
0 0

[PATCH v2] i2c: ls2x: Fix frequency division register access

by Binbin Zhou

According to the chip manual, the I2C register access type of Loongson-2K2000/LS7A is "B", so we can only access registers in byte form (readb/writeb). Although Loongson-2K0500/Loongson-2K1000 do not have similar constraints, register accesses in byte form also behave correctly. Also, in hardware, the frequency division registers are defined as two separate registers (high 8-bit and low 8-bit), so we just access them directly as bytes. Cc: stable(a)vger.kernel.org Fixes: 015e61f0bffd ("i2c: ls2x: Add driver for Loongson-2K/LS7A I2C controller") Co-developed-by: Hongliang Wang <wanghongliang(a)loongson.cn> Signed-off-by: Hongliang Wang <wanghongliang(a)loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin(a)loongson.cn> --- V2: - Add a comment to prevent from changing that back to 16-bit write. Link to V1: https://lore.kernel.org/all/20250218111133.3058590-1-zhoubinbin@loongson.cn/ drivers/i2c/busses/i2c-ls2x.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-ls2x.c b/drivers/i2c/busses/i2c-ls2x.c index 8821cac3897b..61377693a4d6 100644 --- a/drivers/i2c/busses/i2c-ls2x.c +++ b/drivers/i2c/busses/i2c-ls2x.c @@ -10,6 +10,7 @@ * Rewritten for mainline by Binbin Zhou <zhoubinbin(a)loongson.cn> */ +#include <linux/bitfield.h> #include <linux/bits.h> #include <linux/completion.h> #include <linux/device.h> @@ -26,7 +27,8 @@ #include <linux/units.h> /* I2C Registers */ -#define I2C_LS2X_PRER 0x0 /* Freq Division Register(16 bits) */ +#define I2C_LS2X_PRER_LO 0x0 /* Freq Division Low Byte Register */ +#define I2C_LS2X_PRER_HI 0x1 /* Freq Division High Byte Register */ #define I2C_LS2X_CTR 0x2 /* Control Register */ #define I2C_LS2X_TXR 0x3 /* Transport Data Register */ #define I2C_LS2X_RXR 0x3 /* Receive Data Register */ @@ -93,6 +95,7 @@ static irqreturn_t ls2x_i2c_isr(int this_irq, void *dev_id) */ static void ls2x_i2c_adjust_bus_speed(struct ls2x_i2c_priv *priv) { + u16 val; struct i2c_timings *t = &priv->i2c_t; struct device *dev = priv->adapter.dev.parent; u32 acpi_speed = i2c_acpi_find_bus_speed(dev); @@ -104,9 +107,14 @@ static void ls2x_i2c_adjust_bus_speed(struct ls2x_i2c_priv *priv) else t->bus_freq_hz = LS2X_I2C_FREQ_STD; - /* Calculate and set i2c frequency. */ - writew(LS2X_I2C_PCLK_FREQ / (5 * t->bus_freq_hz) - 1, - priv->base + I2C_LS2X_PRER); + /* + * According to the chip manual, we can only access the registers as bytes, + * otherwise the high bits will be truncated. + * So set the I2C frequency with a sequential writeb instead of writew. + */ + val = LS2X_I2C_PCLK_FREQ / (5 * t->bus_freq_hz) - 1; + writeb(FIELD_GET(GENMASK(7, 0), val), priv->base + I2C_LS2X_PRER_LO); + writeb(FIELD_GET(GENMASK(15, 8), val), priv->base + I2C_LS2X_PRER_HI); } static void ls2x_i2c_init(struct ls2x_i2c_priv *priv) base-commit: 7e45b505e699f4c80aa8bf79b4ea2a5f5a66bb51 -- 2.47.1

6 months

2
1
0 0

[PATCH 0/2] HID: intel-ish-hid: Fix use-after-free issues in driver removal process

by Zhang Lixu

These patches address use-after-free issues in the `intel_ishtp_hid` driver during the removal process. Zhang Lixu (2): HID: intel-ish-hid: Fix use-after-free issue in hid_ishtp_cl_remove() HID: intel-ish-hid: Fix use-after-free issue in ishtp_hid_remove() drivers/hid/intel-ish-hid/ishtp-hid-client.c | 2 +- drivers/hid/intel-ish-hid/ishtp-hid.c | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) base-commit: 0ae0fa3bf0b44c8611d114a9f69985bf451010c3 -- 2.43.0

6 months

2
3
0 0

[PATCH V4] mm/hugetlb: wait for hugetlb folios to be freed

by yangge1116＠126.com

From: Ge Yang <yangge1116(a)126.com> Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context"), which supports deferring the freeing of hugetlb pages, the allocation of contiguous memory through cma_alloc() may fail probabilistically. In the CMA allocation process, if it is found that the CMA area is occupied by in-use hugetlb folios, these in-use hugetlb folios need to be migrated to another location. When there are no available hugetlb folios in the free hugetlb pool during the migration of in-use hugetlb folios, new folios are allocated from the buddy system. A temporary state is set on the newly allocated folio. Upon completion of the hugetlb folio migration, the temporary state is transferred from the new folios to the old folios. Normally, when the old folios with the temporary state are freed, it is directly released back to the buddy system. However, due to the deferred freeing of hugetlb pages, the PageBuddy() check fails, ultimately leading to the failure of cma_alloc(). Here is a simplified call trace illustrating the process: cma_alloc() ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios ->unmap_and_move_huge_page() ->folio_putback_hugetlb() // Free old folios ->test_pages_isolated() ->__test_page_isolated_in_pageblock() ->PageBuddy(page) // Check if the page is in buddy To resolve this issue, we have implemented a function named wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb folios are properly released back to the buddy system after their migration is completed. By invoking wait_for_freed_hugetlb_folios() before calling PageBuddy(), we ensure that PageBuddy() will succeed. Fixes: c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context") Signed-off-by: Ge Yang <yangge1116(a)126.com> Cc: <stable(a)vger.kernel.org> --- V4: - add a check to determine if hpage_freelist is empty suggested by David V3: - adjust code and message suggested by Muchun and David V2: - flush all folios at once suggested by David include/linux/hugetlb.h | 5 +++++ mm/hugetlb.c | 8 ++++++++ mm/page_isolation.c | 10 ++++++++++ 3 files changed, 23 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6c6546b..0c54b3a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m); int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn); +void wait_for_freed_hugetlb_folios(void); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, bool cow_from_owner); struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid, @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn, return 0; } +static inline void wait_for_freed_hugetlb_folios(void) +{ +} + static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, bool cow_from_owner) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 30bc34d..8801dbc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) return ret; } +void wait_for_freed_hugetlb_folios(void) +{ + if (llist_empty(&hpage_freelist)) + return; + + flush_work(&free_hpage_work); +} + typedef enum { /* * For either 0/1: we checked the per-vma resv map, and one resv diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 8ed53ee0..b2fc526 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, int ret; /* + * Due to the deferred freeing of hugetlb folios, the hugepage folios may + * not immediately release to the buddy system. This can cause PageBuddy() + * to fail in __test_page_isolated_in_pageblock(). To ensure that the + * hugetlb folios are properly released back to the buddy system, we + * invoke the wait_for_freed_hugetlb_folios() function to wait for the + * release to complete. + */ + wait_for_freed_hugetlb_folios(); + + /* * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free * pages are not aligned to pageblock_nr_pages. * Then we just check migratetype first. -- 2.7.4

6 months

3
2
0 0

[PATCH] scsi: esas2r: Add check for alloc_ordered_workqueue()

by Haoxiang Li

Add check for the return value of alloc_ordered_workqueue() in esas2r_init_adapter() to catch potential exception. Fixes: 4cb1b41a5ee4 ("scsi: esas2r: Simplify an alloc_ordered_workqueue() invocation") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- drivers/scsi/esas2r/esas2r_init.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/esas2r/esas2r_init.c b/drivers/scsi/esas2r/esas2r_init.c index 0cea5f3d1a08..48bb8aaf9df4 100644 --- a/drivers/scsi/esas2r/esas2r_init.c +++ b/drivers/scsi/esas2r/esas2r_init.c @@ -314,6 +314,11 @@ int esas2r_init_adapter(struct Scsi_Host *host, struct pci_dev *pcid, a->fw_event_q = alloc_ordered_workqueue("esas2r/%d", WQ_MEM_RECLAIM, a->index); + if (!a->fw_event_q) { + esas2r_log(ESAS2R_LOG_CRIT, "failed to create work queue\n"); + esas2r_kill_adapter(index); + return 0; + } init_waitqueue_head(&a->buffered_ioctl_waiter); init_waitqueue_head(&a->nvram_waiter); init_waitqueue_head(&a->fm_api_waiter); -- 2.25.1

6 months

1
0
0 0

[PATCH] Revert "ext4: apply umask if ACL support is disabled"

by Max Kellermann

This reverts commit 484fd6c1de13b336806a967908a927cc0356e312. The commit caused a regression because now the umask was applied to symlinks and the fix is unnecessary because the umask/O_TMPFILE bug has been fixed somewhere else already. Fixes: https://lore.kernel.org/lkml/28DSITL9912E1.2LSZUVTGTO52Q@mforney.org/ Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com> --- fs/ext4/acl.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/ext4/acl.h b/fs/ext4/acl.h index ef4c19e5f570..0c5a79c3b5d4 100644 --- a/fs/ext4/acl.h +++ b/fs/ext4/acl.h @@ -68,11 +68,6 @@ extern int ext4_init_acl(handle_t *, struct inode *, struct inode *); static inline int ext4_init_acl(handle_t *handle, struct inode *inode, struct inode *dir) { - /* usually, the umask is applied by posix_acl_create(), but if - ext4 ACL support is disabled at compile time, we need to do - it here, because posix_acl_create() will never be called */ - inode->i_mode &= ~current_umask(); - return 0; } #endif /* CONFIG_EXT4_FS_POSIX_ACL */ -- 2.39.2

6 months

6
6
0 0

Re: [PATCH] jfs: fix slab-out-of-bounds read in ea_get()

by Qasim Ijaz

On Thu, Feb 13, 2025 at 11:07:07AM +0100, Greg KH wrote: > On Thu, Feb 13, 2025 at 12:20:25AM +0000, Qasim Ijaz wrote: > > During the "size_check" label in ea_get(), the code checks if the extended > > attribute list (xattr) size matches ea_size. If not, it logs > > "ea_get: invalid extended attribute" and calls print_hex_dump(). > > > > Here, EALIST_SIZE(ea_buf->xattr) returns 4110417968, which exceeds > > INT_MAX (2,147,483,647). Then ea_size is clamped: > > > > int size = clamp_t(int, ea_size, 0, EALIST_SIZE(ea_buf->xattr)); > > > > Although clamp_t aims to bound ea_size between 0 and 4110417968, the upper > > limit is treated as an int, causing an overflow above 2^31 - 1. This leads > > "size" to wrap around and become negative (-184549328). > > > > The "size" is then passed to print_hex_dump() (called "len" in > > print_hex_dump()), it is passed as type size_t (an unsigned > > type), this is then stored inside a variable called > > "int remaining", which is then assigned to "int linelen" which > > is then passed to hex_dump_to_buffer(). In print_hex_dump() > > the for loop, iterates through 0 to len-1, where len is > > 18446744073525002176, calling hex_dump_to_buffer() > > on each iteration: > > > > for (i = 0; i < len; i += rowsize) { > > linelen = min(remaining, rowsize); > > remaining -= rowsize; > > > > hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize, > > linebuf, sizeof(linebuf), ascii); > > > > ... > > } > > > > The expected stopping condition (i < len) is effectively broken > > since len is corrupted and very large. This eventually leads to > > the "ptr+i" being passed to hex_dump_to_buffer() to get closer > > to the end of the actual bounds of "ptr", eventually an out of > > bounds access is done in hex_dump_to_buffer() in the following > > for loop: > > > > for (j = 0; j < len; j++) { > > if (linebuflen < lx + 2) > > goto overflow2; > > ch = ptr[j]; > > ... > > } > > > > To fix this we should validate "EALIST_SIZE(ea_buf->xattr)" > > before it is utilised. > > > > Reported-by: syzbot <syzbot+4e6e7e4279d046613bc5(a)syzkaller.appspotmail.com> > > Tested-by: syzbot <syzbot+4e6e7e4279d046613bc5(a)syzkaller.appspotmail.com> > > Closes: https://syzkaller.appspot.com/bug?extid=4e6e7e4279d046613bc5 > > Fixes: d9f9d96136cb ("jfs: xattr: check invalid xattr size more strictly") > > Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com> > > --- > > fs/jfs/xattr.c | 15 ++++++++++----- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c > > index 24afbae87225..7575c51cce9b 100644 > > --- a/fs/jfs/xattr.c > > +++ b/fs/jfs/xattr.c > > @@ -559,11 +555,16 @@ static int ea_get(struct inode *inode, struct ea_buffer *ea_buf, int min_size) > > > > size_check: > > if (EALIST_SIZE(ea_buf->xattr) != ea_size) { > > - int size = clamp_t(int, ea_size, 0, EALIST_SIZE(ea_buf->xattr)); > > - > > - printk(KERN_ERR "ea_get: invalid extended attribute\n"); > > - print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, > > - ea_buf->xattr, size, 1); > > + if (unlikely(EALIST_SIZE(ea_buf->xattr) > INT_MAX)) { > > + printk(KERN_ERR "ea_get: extended attribute size too large: %u > INT_MAX\n", > > + EALIST_SIZE(ea_buf->xattr)); > > + } else { > > + int size = clamp_t(int, ea_size, 0, EALIST_SIZE(ea_buf->xattr)); > > + > > + printk(KERN_ERR "ea_get: invalid extended attribute\n"); > > + print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, > > + ea_buf->xattr, size, 1); > > + } > > ea_release(inode, ea_buf); > > rc = -EIO; > > goto clean_up; > > -- > > 2.39.5 > > > > Hi, > > This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him > a patch that has triggered this response. He used to manually respond > to these common problems, but in order to save his sanity (he kept > writing the same thing over and over, yet to different people), I was > created. Hopefully you will not take offence and will fix the problem > in your patch and resubmit it so that it can be accepted into the Linux > kernel tree. > > You are receiving this message because of the following common error(s) > as indicated below: > > - You have marked a patch with a "Fixes:" tag for a commit that is in an > older released kernel, yet you do not have a cc: stable line in the > signed-off-by area at all, which means that the patch will not be > applied to any older kernel releases. To properly fix this, please > follow the documented rules in the > Documentation/process/stable-kernel-rules.rst file for how to resolve > this. > > If you wish to discuss this problem further, or you have questions about > how to resolve this issue, please feel free to respond to this email and > Greg will reply once he has dug out from the pending patches received > from other developers. > Hi Greg, Just following up on this patch. I’ve sent v2 with the added CC stable tag. Here’s the link: https://lore.kernel.org/all/20250213210553.28613-1-qasdev00@gmail.com/ Let me know if any further changes are needed. Thanks, Qasim > thanks, > > greg k-h's patch email bot

6 months

2
1
0 0

[PATCH v2 Linux-6.12.y Linux-6.13.y 1/1] selftests/mm: build with -O2

by Yifei Liu

From: Kevin Brodsky <kevin.brodsky(a)arm.com> [ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ] The mm kselftests are currently built with no optimisation (-O0). It's unclear why, and besides being obviously suboptimal, this also prevents the pkeys tests from working as intended. Let's build all the tests with -O2. [kevin.brodsky(a)arm.com: silence unused-result warnings] Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Joey Gouly <joey.gouly(a)arm.com> Cc: Keith Lucas <keith.lucas(a)oracle.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shuah Khan <shuah(a)kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> (cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7) [Yifei: This commit also fix the failure of pkey_sighandler_tests_64, which is also in linux-6.12.y and linux-6.13.y, thus backport this commit] Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com> --- tools/testing/selftests/mm/Makefile | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 02e1204971b0..c0138cb19705 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -33,9 +33,16 @@ endif # LDLIBS. MAKEFLAGS += --no-builtin-rules -CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES) +CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES) LDLIBS = -lrt -lpthread -lm +# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is +# automatically enabled at -O1 or above. This triggers various unused-result +# warnings where functions such as read() or write() are called and their +# return value is not checked. Disable _FORTIFY_SOURCE to silence those +# warnings. +CFLAGS += -U_FORTIFY_SOURCE + TEST_GEN_FILES = cow TEST_GEN_FILES += compaction_test TEST_GEN_FILES += gup_longterm -- 2.46.0

6 months

4
7
0 0

[PATCH] drm/xe: Fix exporting xe buffers multiple times

by Jacek Lawrynowicz

From: Tomasz Rusinowicz <tomasz.rusinowicz(a)intel.com> The `struct ttm_resource->placement` contains TTM_PL_FLAG_* flags, but it was incorrectly tested for XE_PL_* flags. This caused xe_dma_buf_pin() to always fail when invoked for the second time. Fix this by checking the `mem_type` field instead. Fixes: 7764222d54b7 ("drm/xe: Disallow pinning dma-bufs in VRAM") Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: Nirmoy Das <nirmoy.das(a)intel.com> Cc: Jani Nikula <jani.nikula(a)intel.com> Cc: intel-xe(a)lists.freedesktop.org Cc: <stable(a)vger.kernel.org> # v6.8+ Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz(a)intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com> --- drivers/gpu/drm/xe/xe_bo.h | 2 -- drivers/gpu/drm/xe/xe_dma_buf.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h index d9386ab031404..43bf6f140d40d 100644 --- a/drivers/gpu/drm/xe/xe_bo.h +++ b/drivers/gpu/drm/xe/xe_bo.h @@ -341,7 +341,6 @@ static inline unsigned int xe_sg_segment_size(struct device *dev) return round_down(max / 2, PAGE_SIZE); } -#if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST) /** * xe_bo_is_mem_type - Whether the bo currently resides in the given * TTM memory type @@ -356,4 +355,3 @@ static inline bool xe_bo_is_mem_type(struct xe_bo *bo, u32 mem_type) return bo->ttm.resource->mem_type == mem_type; } #endif -#endif diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c index c5b95470fa324..f67803e15a0e6 100644 --- a/drivers/gpu/drm/xe/xe_dma_buf.c +++ b/drivers/gpu/drm/xe/xe_dma_buf.c @@ -58,7 +58,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach) * 1) Avoid pinning in a placement not accessible to some importers. * 2) Pinning in VRAM requires PIN accounting which is a to-do. */ - if (xe_bo_is_pinned(bo) && bo->ttm.resource->placement != XE_PL_TT) { + if (xe_bo_is_pinned(bo) && !xe_bo_is_mem_type(bo, XE_PL_TT)) { drm_dbg(&xe->drm, "Can't migrate pinned bo for dma-buf pin.\n"); return -EINVAL; } -- 2.45.1

6 months

2
1
0 0

[PATCH V3] mm/hugetlb: wait for hugetlb folios to be freed

by yangge1116＠126.com

From: Ge Yang <yangge1116(a)126.com> Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context"), which supports deferring the freeing of hugetlb pages, the allocation of contiguous memory through cma_alloc() may fail probabilistically. In the CMA allocation process, if it is found that the CMA area is occupied by in-use hugetlb folios, these in-use hugetlb folios need to be migrated to another location. When there are no available hugetlb folios in the free hugetlb pool during the migration of in-use hugetlb folios, new folios are allocated from the buddy system. A temporary state is set on the newly allocated folio. Upon completion of the hugetlb folio migration, the temporary state is transferred from the new folios to the old folios. Normally, when the old folios with the temporary state are freed, it is directly released back to the buddy system. However, due to the deferred freeing of hugetlb pages, the PageBuddy() check fails, ultimately leading to the failure of cma_alloc(). Here is a simplified call trace illustrating the process: cma_alloc() ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios ->unmap_and_move_huge_page() ->folio_putback_hugetlb() // Free old folios ->test_pages_isolated() ->__test_page_isolated_in_pageblock() ->PageBuddy(page) // Check if the page is in buddy To resolve this issue, we have implemented a function named wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb folios are properly released back to the buddy system after their migration is completed. By invoking wait_for_freed_hugetlb_folios() before calling PageBuddy(), we ensure that PageBuddy() will succeed. Fixes: c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context") Signed-off-by: Ge Yang <yangge1116(a)126.com> Cc: <stable(a)vger.kernel.org> --- V3: - adjust code and message suggested by Muchun and David V2: - flush all folios at once suggested by David include/linux/hugetlb.h | 5 +++++ mm/hugetlb.c | 5 +++++ mm/page_isolation.c | 10 ++++++++++ 3 files changed, 20 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6c6546b..0c54b3a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m); int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn); +void wait_for_freed_hugetlb_folios(void); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, bool cow_from_owner); struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid, @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn, return 0; } +static inline void wait_for_freed_hugetlb_folios(void) +{ +} + static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, bool cow_from_owner) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 30bc34d..b4630b3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2955,6 +2955,11 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) return ret; } +void wait_for_freed_hugetlb_folios(void) +{ + flush_work(&free_hpage_work); +} + typedef enum { /* * For either 0/1: we checked the per-vma resv map, and one resv diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 8ed53ee0..b2fc526 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, int ret; /* + * Due to the deferred freeing of hugetlb folios, the hugepage folios may + * not immediately release to the buddy system. This can cause PageBuddy() + * to fail in __test_page_isolated_in_pageblock(). To ensure that the + * hugetlb folios are properly released back to the buddy system, we + * invoke the wait_for_freed_hugetlb_folios() function to wait for the + * release to complete. + */ + wait_for_freed_hugetlb_folios(); + + /* * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free * pages are not aligned to pageblock_nr_pages. * Then we just check migratetype first. -- 2.7.4

6 months

3
4
0 0

[PATCH] wifi: mt76: Add check for devm_kstrdup()

by Haoxiang Li

Add check for the return value of devm_kstrdup() in mt76_get_of_data_from_mtd() to catch potential exception. Fixes: e7a6a044f9b9 ("mt76: testmode: move mtd part to mt76_dev") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- drivers/net/wireless/mediatek/mt76/eeprom.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/eeprom.c b/drivers/net/wireless/mediatek/mt76/eeprom.c index 0bc66cc19acd..443517d06c9f 100644 --- a/drivers/net/wireless/mediatek/mt76/eeprom.c +++ b/drivers/net/wireless/mediatek/mt76/eeprom.c @@ -95,6 +95,10 @@ int mt76_get_of_data_from_mtd(struct mt76_dev *dev, void *eep, int offset, int l #ifdef CONFIG_NL80211_TESTMODE dev->test_mtd.name = devm_kstrdup(dev->dev, part, GFP_KERNEL); + if (!dev->test_mtd.name) { + ret = -ENOMEM; + goto out_put_node; + } dev->test_mtd.offset = offset; #endif -- 2.25.1

6 months

1
0
0 0

[PATCH net] gve: set xdp redirect target only when it is available

by joshwash＠google.com

From: Joshua Washington <joshwash(a)google.com> Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by default as part of driver initialization, and is never cleared. However, this flag differs from others in that it is used as an indicator for whether the driver is ready to perform the ndo_xdp_xmit operation as part of an XDP_REDIRECT. Kernel helpers xdp_features_(set|clear)_redirect_target exist to convey this meaning. This patch ensures that the netdev is only reported as a redirect target when XDP queues exist to forward traffic. Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format") Cc: stable(a)vger.kernel.org Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com> Reviewed-by: Jeroen de Borst <jeroendb(a)google.com> Signed-off-by: Joshua Washington <joshwash(a)google.com> --- drivers/net/ethernet/google/gve/gve.h | 10 ++++++++++ drivers/net/ethernet/google/gve/gve_main.c | 6 +++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h index 8167cc5fb0df..78d2a19593d1 100644 --- a/drivers/net/ethernet/google/gve/gve.h +++ b/drivers/net/ethernet/google/gve/gve.h @@ -1116,6 +1116,16 @@ static inline u32 gve_xdp_tx_start_queue_id(struct gve_priv *priv) return gve_xdp_tx_queue_id(priv, 0); } +static inline bool gve_supports_xdp_xmit(struct gve_priv *priv) +{ + switch (priv->queue_format) { + case GVE_GQI_QPL_FORMAT: + return true; + default: + return false; + } +} + /* gqi napi handler defined in gve_main.c */ int gve_napi_poll(struct napi_struct *napi, int budget); diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 533e659b15b3..92237fb0b60c 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1903,6 +1903,8 @@ static void gve_turndown(struct gve_priv *priv) /* Stop tx queues */ netif_tx_disable(priv->dev); + xdp_features_clear_redirect_target(priv->dev); + gve_clear_napi_enabled(priv); gve_clear_report_stats(priv); @@ -1972,6 +1974,9 @@ static void gve_turnup(struct gve_priv *priv) napi_schedule(&block->napi); } + if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv)) + xdp_features_set_redirect_target(priv->dev, false); + gve_set_napi_enabled(priv); } @@ -2246,7 +2251,6 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv) if (priv->queue_format == GVE_GQI_QPL_FORMAT) { xdp_features = NETDEV_XDP_ACT_BASIC; xdp_features |= NETDEV_XDP_ACT_REDIRECT; - xdp_features |= NETDEV_XDP_ACT_NDO_XMIT; xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY; } else { xdp_features = 0; -- 2.48.1.601.g30ceb7b040-goog

6 months

2
1
0 0

FAILED: patch "[PATCH] io_uring/kbuf: reallocate buf lists on upgrade" failed to apply to 6.12-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y git checkout FETCH_HEAD git cherry-pick -x 8802766324e1f5d414a81ac43365c20142e85603 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021855-fancy-trough-87ae@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 8802766324e1f5d414a81ac43365c20142e85603 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Wed, 12 Feb 2025 13:46:46 +0000 Subject: [PATCH] io_uring/kbuf: reallocate buf lists on upgrade IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for legacy selected buffer and has been emptied. It violates the requirement that most of the field should stay stable after publish. Always reallocate it instead. Cc: stable(a)vger.kernel.org Reported-by: Pumpkin Chang <pumpkin(a)devco.re> Fixes: 2fcabce2d7d34 ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 04bf493eecae..8e72de7712ac 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -415,6 +415,13 @@ void io_destroy_buffers(struct io_ring_ctx *ctx) } } +static void io_destroy_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) +{ + scoped_guard(mutex, &ctx->mmap_lock) + WARN_ON_ONCE(xa_erase(&ctx->io_bl_xa, bl->bgid) != bl); + io_put_bl(ctx, bl); +} + int io_remove_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_provide_buf *p = io_kiocb_to_cmd(req, struct io_provide_buf); @@ -636,12 +643,13 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) /* if mapped buffer ring OR classic exists, don't allow */ if (bl->flags & IOBL_BUF_RING || !list_empty(&bl->buf_list)) return -EEXIST; - } else { - free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); - if (!bl) - return -ENOMEM; + io_destroy_bl(ctx, bl); } + free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL); + if (!bl) + return -ENOMEM; + mmap_offset = (unsigned long)reg.bgid << IORING_OFF_PBUF_SHIFT; ring_size = flex_array_size(br, bufs, reg.ring_entries);

6 months

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror February 2025