From: Brett Creeley <brett.creeley@intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix the cherry-pick manually as the patch added a line around
some context that was missing.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, one of
them being that an ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES handling was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (e.g. a VF ndo op) holds the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
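For a standalone picture of this locking scheme, here is a minimal
sketch with hypothetical names (my_vf, my_set_vf_trust, my_process_msg),
not the actual ice code; the real hunks follow in the diff below:

#include <linux/mutex.h>
#include <linux/types.h>

struct my_vf {
        struct mutex cfg_lock;
        bool trusted;
};

/* ndo-op side: hold the lock across the config change and the VF reset */
static void my_set_vf_trust(struct my_vf *vf, bool trusted)
{
        mutex_lock(&vf->cfg_lock);
        vf->trusted = trusted;
        /* my_reset_vf(vf): the reset invalidates any in-flight message */
        mutex_unlock(&vf->cfg_lock);
}

/* virtchnl side: contention means a VFR is coming, so drop the message */
static void my_process_msg(struct my_vf *vf)
{
        if (!mutex_trylock(&vf->cfg_lock))
                return;
        /* ... handle the virtchnl opcode ... */
        mutex_unlock(&vf->cfg_lock);
}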
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable@vger.kernel.org> # 5.14.x
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
This should apply to 5.14.x
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 7e3ae4cc17a3..d2f79d579745 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -646,6 +646,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1892,6 +1894,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
*/
ice_vf_ctrl_invalidate_vsi(vf);
ice_vf_fdir_init(vf);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -4080,6 +4084,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -4089,6 +4095,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4463,6 +4470,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -4551,6 +4567,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4666,6 +4684,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4684,6 +4704,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4713,11 +4734,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 38b4dc82c5c1..a750e9a9d712 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -74,6 +74,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
--
2.35.1.355.ge7e302376dd6
With KCFLAGS="-O3", I was able to trigger a fortify-source
memcpy() overflow panic on set_vi_srs_handler().
Although the O3 level is not supported in mainline, under some
conditions this could have happened with any optimization settings;
it's just a matter of inlining luck. The panic itself is correct;
more precisely, it is 50/50: a false positive and not one at the
same time.
On the one hand, no real overflow happens: the exception handler
defined in asm just gets copied to some reserved places in memory.
On the other hand, the C code that refers to that exception handler
declares it as `char`, i.e. something 1 byte long. The asm function
itself is obviously much larger than 1 byte, so the fortify logic
concluded we were about to copy past the declared symbol.
The standard way to refer from C code to asm symbols that are not
supposed to be called from C is to declare them as
`extern const u8[]`. This is fully correct from any point of view,
as any code is just a bunch of bytes (including 0 of them, as is
the case for symbols like _stext/_etext/etc.), and the exact size
is not known at compile time.
Adjust the type of the except_vec_vi_*() and related variables.
Make set_handler() take `const` as a second argument to avoid
cast-away warnings and give a little more room for optimization.
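For illustration, the resulting pattern of referring to an asm-provided
blob from C looks like this (a generic sketch with hypothetical my_vec_*
names, not the actual traps.c symbols):

#include <linux/types.h>
#include <linux/string.h>

/* Symbols defined in some .S file; const u8[] carries no assumed size,
 * so the difference of the two addresses is exact and fortify-source
 * has nothing to object to.
 */
extern const u8 my_vec_start[], my_vec_end[];

static void my_install_vec(void *dst)
{
        const int len = my_vec_end - my_vec_start;

        memcpy(dst, my_vec_start, len);
}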
Fixes: e01402b115cc ("More AP / SP bits for the 34K, the Malta bits and things. Still wants")
Fixes: c65a5480ff29 ("[MIPS] Fix potential latency problem due to non-atomic cpu_wait.")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
arch/mips/include/asm/setup.h | 2 +-
arch/mips/kernel/traps.c | 22 +++++++++++-----------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h
index bb36a400203d..8c56b862fd9c 100644
--- a/arch/mips/include/asm/setup.h
+++ b/arch/mips/include/asm/setup.h
@@ -16,7 +16,7 @@ static inline void setup_8250_early_printk_port(unsigned long base,
unsigned int reg_shift, unsigned int timeout) {}
#endif
-extern void set_handler(unsigned long offset, void *addr, unsigned long len);
+void set_handler(unsigned long offset, const void *addr, unsigned long len);
extern void set_uncached_handler(unsigned long offset, void *addr, unsigned long len);
typedef void (*vi_handler_t)(void);
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index a486486b2355..246c6a6b0261 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2091,19 +2091,19 @@ static void *set_vi_srs_handler(int n, vi_handler_t addr, int srs)
* If no shadow set is selected then use the default handler
* that does normal register saving and standard interrupt exit
*/
- extern char except_vec_vi, except_vec_vi_lui;
- extern char except_vec_vi_ori, except_vec_vi_end;
- extern char rollback_except_vec_vi;
- char *vec_start = using_rollback_handler() ?
- &rollback_except_vec_vi : &except_vec_vi;
+ extern const u8 except_vec_vi[], except_vec_vi_lui[];
+ extern const u8 except_vec_vi_ori[], except_vec_vi_end[];
+ extern const u8 rollback_except_vec_vi[];
+ const u8 *vec_start = using_rollback_handler() ?
+ rollback_except_vec_vi : except_vec_vi;
#if defined(CONFIG_CPU_MICROMIPS) || defined(CONFIG_CPU_BIG_ENDIAN)
- const int lui_offset = &except_vec_vi_lui - vec_start + 2;
- const int ori_offset = &except_vec_vi_ori - vec_start + 2;
+ const int lui_offset = except_vec_vi_lui - vec_start + 2;
+ const int ori_offset = except_vec_vi_ori - vec_start + 2;
#else
- const int lui_offset = &except_vec_vi_lui - vec_start;
- const int ori_offset = &except_vec_vi_ori - vec_start;
+ const int lui_offset = except_vec_vi_lui - vec_start;
+ const int ori_offset = except_vec_vi_ori - vec_start;
#endif
- const int handler_len = &except_vec_vi_end - vec_start;
+ const int handler_len = except_vec_vi_end - vec_start;
if (handler_len > VECTORSPACING) {
/*
@@ -2311,7 +2311,7 @@ void per_cpu_trap_init(bool is_boot_cpu)
}
/* Install CPU exception handler */
-void set_handler(unsigned long offset, void *addr, unsigned long size)
+void set_handler(unsigned long offset, const void *addr, unsigned long size)
{
#ifdef CONFIG_CPU_MICROMIPS
memcpy((void *)(ebase + offset), ((unsigned char *)addr - 1), size);
--
2.35.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 353050be4c19e102178ccc05988101887c25ae53 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 9 Nov 2021 18:48:08 +0000
Subject: [PATCH] bpf: Fix toctou on read-only map's constant scalar tracking
Commit a23740ec43ba ("bpf: Track contents of read-only maps as scalars") is
checking whether maps are read-only both from BPF program side and user space
side, and then, given their content is constant, reading out their data via
map->ops->map_direct_value_addr(), which is then used as the known
scalar value for the register, that is, it is marked as __mark_reg_known()
with the read value at verification time. Before a23740ec43ba, the register
content was marked as an unknown scalar so the verifier could not make any
assumptions about the map content.
The current implementation however is prone to a TOCTOU race, meaning, the
value read as known scalar for the register is not guaranteed to be exactly
the same at a later point when the program is executed, and as such, the
prior made assumptions of the verifier with regards to the program will be
invalid which can cause issues such as OOB access, etc.
While the BPF_F_RDONLY_PROG map flag is always fixed and required to be
specified at map creation time, the map->frozen property is initially set to
false for the map given the map value needs to be populated, e.g. for global
data sections. Once complete, the loader "freezes" the map from user space
such that no subsequent updates/deletes are possible anymore. For the rest
of the lifetime of the map, this freeze one-time trigger cannot be undone
anymore after a successful BPF_MAP_FREEZE cmd return. Meaning, any new BPF_*
cmd calls which would update/delete map entries will be rejected with -EPERM
since map_get_sys_perms() removes the FMODE_CAN_WRITE permission. This also
means that pending update/delete map entries must still complete before this
guarantee is given. This corner case is not an issue for loaders since they
create and prepare such program private map in successive steps.
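For reference, that loader-side create/populate/freeze sequence can be
sketched with libbpf's low-level map API (assuming libbpf v0.7+ for
bpf_map_create(); error handling elided, names arbitrary):

#include <bpf/bpf.h>
#include <linux/bpf.h>

static int make_frozen_rodata_map(void)
{
        LIBBPF_OPTS(bpf_map_create_opts, opts,
                    .map_flags = BPF_F_RDONLY_PROG); /* r/o from prog side */
        __u32 key = 0, value = 42;
        int fd;

        fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "my_rodata",
                            sizeof(key), sizeof(value), 1, &opts);

        /* populate while still writable from syscall side ... */
        bpf_map_update_elem(fd, &key, &value, BPF_ANY);

        /* ... then freeze: subsequent syscall-side writes fail with
         * -EPERM, and with this fix the freeze itself fails with -EBUSY
         * while any write is still in flight.
         */
        bpf_map_freeze(fd);

        return fd;
}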
However, a malicious user is able to trigger this TOCTOU race in two different
ways: i) via userfaultfd, and ii) via batched updates. For i) userfaultfd is
used to widen the race window, so that map_update_elem() can modify
the contents of the map after map_freeze() and bpf_prog_load() were executed.
This works, because userfaultfd halts the parallel thread which triggered a
map_update_elem() at the time where we copy key/value from the user buffer and
this already passed the FMODE_CAN_WRITE capability test given at that time the
map was not "frozen". Then, the main thread performs the map_freeze() and
bpf_prog_load(), and once that had completed successfully, the other thread
is woken up to complete the pending map_update_elem() which then changes the
map content. For ii) the idea of the batched update is similar, meaning, when
there are a large number of updates to be processed, it can increase the
race window between the two. It is therefore possible in practice to
modify the contents of the map after executing map_freeze() and bpf_prog_load().
One way to fix both i) and ii) at the same time is to expand the use of the
map's map->writecnt. The latter was introduced in fc9702273e2e ("bpf: Add mmap()
support for BPF_MAP_TYPE_ARRAY") and further refined in 1f6cb19be2e2 ("bpf:
Prevent re-mmap()'ing BPF map as writable for initially r/o mapping") with
the rationale to make a writable mmap()'ing of a map mutually exclusive with
read-only freezing. The counter indicates writable mmap() mappings and then
prevents/fails the freeze operation. Its semantics can be expanded beyond
just mmap() by generally indicating ongoing write phases. This would essentially
span any parallel regular and batched flavor of update/delete operation and
then also have map_freeze() fail with -EBUSY. For the check_mem_access() in
the verifier we expand upon the bpf_map_is_rdonly() check ensuring that all
last pending writes have completed via bpf_map_write_active() test. Once the
map->frozen is set and bpf_map_write_active() indicates a map->writecnt of 0
only then we are really guaranteed to use the map's data as known constants.
For map->frozen being set and pending writes in process of still being completed
we fall back to marking that register as unknown scalar so we don't end up
making assumptions about it. With this, both TOCTOU reproducers from i) and
ii) are fixed.
Note that map->writecnt has been converted into an atomic64 in the fix in
order to avoid a double freeze_mutex mutex_{un,}lock() pair when updating
map->writecnt in the various map update/delete BPF_* cmd flavors. Spanning
the freeze_mutex over entire map update/delete operations in syscall side
would not be possible due to then causing everything to be serialized.
Similarly, something like synchronize_rcu() after setting map->frozen to wait
for update/deletes to complete is not possible either since it would also
have to span the user copy which can sleep. On the libbpf side, this won't
break d66562fba1ce ("libbpf: Add BPF object skeleton support") as the
anonymous mmap()-ed "map initialization image" is remapped as a BPF map-backed
mmap()-ed memory where for .rodata it's non-writable.
Fixes: a23740ec43ba ("bpf: Track contents of read-only maps as scalars")
Reported-by: w1tcher.bupt@gmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f715e8863f4d..e7a163a3146b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -193,7 +193,7 @@ struct bpf_map {
atomic64_t usercnt;
struct work_struct work;
struct mutex freeze_mutex;
- u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+ atomic64_t writecnt;
};
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -1419,6 +1419,7 @@ void bpf_map_put(struct bpf_map *map);
void *bpf_map_area_alloc(u64 size, int numa_node);
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
void bpf_map_area_free(void *base);
+bool bpf_map_write_active(const struct bpf_map *map);
void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
int generic_map_lookup_batch(struct bpf_map *map,
const union bpf_attr *attr,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 50f96ea4452a..1033ee8c0caf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -132,6 +132,21 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
return map;
}
+static void bpf_map_write_active_inc(struct bpf_map *map)
+{
+ atomic64_inc(&map->writecnt);
+}
+
+static void bpf_map_write_active_dec(struct bpf_map *map)
+{
+ atomic64_dec(&map->writecnt);
+}
+
+bool bpf_map_write_active(const struct bpf_map *map)
+{
+ return atomic64_read(&map->writecnt) != 0;
+}
+
static u32 bpf_map_value_size(const struct bpf_map *map)
{
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -601,11 +616,8 @@ static void bpf_map_mmap_open(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt++;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_inc(map);
}
/* called for all unmapped memory region (including initial) */
@@ -613,11 +625,8 @@ static void bpf_map_mmap_close(struct vm_area_struct *vma)
{
struct bpf_map *map = vma->vm_file->private_data;
- if (vma->vm_flags & VM_MAYWRITE) {
- mutex_lock(&map->freeze_mutex);
- map->writecnt--;
- mutex_unlock(&map->freeze_mutex);
- }
+ if (vma->vm_flags & VM_MAYWRITE)
+ bpf_map_write_active_dec(map);
}
static const struct vm_operations_struct bpf_map_default_vmops = {
@@ -668,7 +677,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
goto out;
if (vma->vm_flags & VM_MAYWRITE)
- map->writecnt++;
+ bpf_map_write_active_inc(map);
out:
mutex_unlock(&map->freeze_mutex);
return err;
@@ -1139,6 +1148,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1174,6 +1184,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1196,6 +1207,7 @@ static int map_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
@@ -1226,6 +1238,7 @@ static int map_delete_elem(union bpf_attr *attr)
out:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1533,6 +1546,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
+ bpf_map_write_active_inc(map);
if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ) ||
!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
@@ -1597,6 +1611,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
free_key:
kvfree(key);
err_put:
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
@@ -1624,8 +1639,7 @@ static int map_freeze(const union bpf_attr *attr)
}
mutex_lock(&map->freeze_mutex);
-
- if (map->writecnt) {
+ if (bpf_map_write_active(map)) {
err = -EBUSY;
goto err_put;
}
@@ -4171,6 +4185,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
union bpf_attr __user *uattr,
int cmd)
{
+ bool has_read = cmd == BPF_MAP_LOOKUP_BATCH ||
+ cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH;
+ bool has_write = cmd != BPF_MAP_LOOKUP_BATCH;
struct bpf_map *map;
int err, ufd;
struct fd f;
@@ -4183,16 +4200,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-
- if ((cmd == BPF_MAP_LOOKUP_BATCH ||
- cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
+ if (has_write)
+ bpf_map_write_active_inc(map);
+ if (has_read && !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
err = -EPERM;
goto err_put;
}
-
- if (cmd != BPF_MAP_LOOKUP_BATCH &&
- !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+ if (has_write && !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
err = -EPERM;
goto err_put;
}
@@ -4205,8 +4219,9 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
BPF_DO_BATCH(map->ops->map_update_batch);
else
BPF_DO_BATCH(map->ops->map_delete_batch);
-
err_put:
+ if (has_write)
+ bpf_map_write_active_dec(map);
fdput(f);
return err;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 65d2f93b7030..50efda51515b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4056,7 +4056,22 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
static bool bpf_map_is_rdonly(const struct bpf_map *map)
{
- return (map->map_flags & BPF_F_RDONLY_PROG) && map->frozen;
+ /* A map is considered read-only if the following condition are true:
+ *
+ * 1) BPF program side cannot change any of the map content. The
+ * BPF_F_RDONLY_PROG flag is throughout the lifetime of a map
+ * and was set at map creation time.
+ * 2) The map value(s) have been initialized from user space by a
+ * loader and then "frozen", such that no new map update/delete
+ * operations from syscall side are possible for the rest of
+ * the map's lifetime from that point onwards.
+ * 3) Any parallel/pending map update/delete operations from syscall
+ * side have been completed. Only after that point, it's safe to
+ * assume that map value(s) are immutable.
+ */
+ return (map->map_flags & BPF_F_RDONLY_PROG) &&
+ READ_ONCE(map->frozen) &&
+ !bpf_map_write_active(map);
}
static int bpf_map_direct_read(struct bpf_map *map, int off, int size, u64 *val)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 47a1db8e797da01a1309bf42e0c0d771d4e4d4f3 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan@kernel.org>
Date: Wed, 1 Dec 2021 14:25:26 +0100
Subject: [PATCH] firmware: qemu_fw_cfg: fix kobject leak in probe error path
An initialised kobject must be freed using kobject_put() to avoid
leaking associated resources (e.g. the object name).
Commit fe3c60684377 ("firmware: Fix a reference count leak.") "fixed"
the leak in the first error path of the file registration helper but
left the second one unchanged. This "fix" would however result in a NULL
pointer dereference due to the release function also removing the never
added entry from the fw_cfg_entry_cache list. This has now been
addressed.
Fix the remaining kobject leak by restoring the common error path and
adding the missing kobject_put().
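The underlying rule is generic kobject lifecycle handling; as a minimal
sketch (hypothetical my_* names, not the qemu_fw_cfg code):

#include <linux/kobject.h>
#include <linux/slab.h>

struct my_entry {
        struct kobject kobj;
};

static void my_entry_release(struct kobject *kobj)
{
        kfree(container_of(kobj, struct my_entry, kobj));
}

static struct kobj_type my_ktype = {
        .release = my_entry_release,
};

static int my_register_entry(struct kobject *parent)
{
        struct my_entry *entry;
        int err;

        entry = kzalloc(sizeof(*entry), GFP_KERNEL);
        if (!entry)
                return -ENOMEM;

        /* from here on the kobject owns the entry: only kobject_put()
         * may release it, so the name allocated below is freed too
         */
        err = kobject_init_and_add(&entry->kobj, &my_ktype, parent, "my-entry");
        if (err)
                goto err_put;

        return 0;

err_put:
        kobject_put(&entry->kobj);      /* calls my_entry_release() */
        return err;
}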
Fixes: 75f3e8e47f38 ("firmware: introduce sysfs driver for QEMU's fw_cfg device")
Cc: stable@vger.kernel.org # 4.6
Cc: Gabriel Somlo <somlo@cmu.edu>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211201132528.30025-3-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index a9c64ebfc49a..ccb7ed62452f 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -603,15 +603,13 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
fw_cfg_sel_ko, "%d", entry->select);
- if (err) {
- kobject_put(&entry->kobj);
- return err;
- }
+ if (err)
+ goto err_put_entry;
/* add raw binary content access */
err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
if (err)
- goto err_add_raw;
+ goto err_del_entry;
/* try adding "/sys/firmware/qemu_fw_cfg/by_name/" symlink */
fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->name);
@@ -620,9 +618,10 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
fw_cfg_sysfs_cache_enlist(entry);
return 0;
-err_add_raw:
+err_del_entry:
kobject_del(&entry->kobj);
- kfree(entry);
+err_put_entry:
+ kobject_put(&entry->kobj);
return err;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23584c1ed3e15a6f4bfab8dc5a88d94ab929ee12 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Wed, 17 Nov 2021 23:22:09 +0100
Subject: [PATCH] PCI: pciehp: Fix infinite loop in IRQ handler upon power
fault
The Power Fault Detected bit in the Slot Status register differs from
all other hotplug events in that it is sticky: It can only be cleared
after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8:
If a power controller detects a main power fault on the hot-plug slot,
it must automatically set its internal main power fault latch [...].
The main power fault latch is cleared when software turns off power to
the hot-plug slot.
The stickiness used to cause interrupt storms and infinite loops which
were fixed in 2009 by commits 5651c48cfafe ("PCI pciehp: fix power fault
interrupt storm problem") and 99f0169c17f3 ("PCI: pciehp: enable
software notification on empty slots").
Unfortunately in 2020 the infinite loop issue was inadvertently
reintroduced by commit 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt
race"): The hardirq handler pciehp_isr() clears the PFD bit until
pciehp's power_fault_detected flag is set. That happens in the IRQ
thread pciehp_ist(), which never learns of the event because the hardirq
handler is stuck in an infinite loop. Fix by setting the
power_fault_detected flag already in the hardirq handler.
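Abstracted from pciehp, the pattern is to latch a sticky status bit in
the hardirq handler itself, because masking it out depends on a flag
that previously only the IRQ thread would set. A rough sketch with
hypothetical names (register accessors assumed, not the pciehp code):

#include <linux/interrupt.h>
#include <linux/types.h>

#define MY_STICKY_FAULT 0x0002  /* hypothetical sticky status bit */

struct my_ctrl {
        bool fault_latched;
};

static u16 my_read_status(struct my_ctrl *ctrl);        /* assumed helper */
static void my_ack_events(struct my_ctrl *ctrl, u16 events); /* assumed */

static irqreturn_t my_isr(int irq, void *dev_id)
{
        struct my_ctrl *ctrl = dev_id;
        u16 status = my_read_status(ctrl);

        if (ctrl->fault_latched)
                status &= ~MY_STICKY_FAULT;     /* already seen, mask out */
        else if (status & MY_STICKY_FAULT)
                ctrl->fault_latched = true;     /* latch here, not in thread */

        if (!status)
                return IRQ_NONE;

        my_ack_events(ctrl, status);    /* write-1-to-clear seen events */
        return IRQ_WAKE_THREAD;         /* thread consumes fault_latched */
}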
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989
Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@DM…
Fixes: 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt race")
Link: https://lore.kernel.org/r/66eaeef31d4997ceea357ad93259f290ededecfd.16371872…
Reported-by: Joseph Bao <joseph.bao@intel.com>
Tested-by: Joseph Bao <joseph.bao@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 83a0fa119cae..9535c61cbff3 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -642,6 +642,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
*/
if (ctrl->power_fault_detected)
status &= ~PCI_EXP_SLTSTA_PFD;
+ else if (status & PCI_EXP_SLTSTA_PFD)
+ ctrl->power_fault_detected = true;
events |= status;
if (!events) {
@@ -651,7 +653,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
}
if (status) {
- pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);
/*
* In MSI mode, all event bits must be zero before the port
@@ -725,8 +727,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
}
/* Check Power Fault Detected */
- if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
- ctrl->power_fault_detected = 1;
+ if (events & PCI_EXP_SLTSTA_PFD) {
ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(ctrl));
pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
PCI_EXP_SLTCTL_ATTN_IND_ON);
From: Brett Creeley <brett.creeley@intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix the cherry-pick manually as the patch added a line around
some context that was missing.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, one of
them being that an ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES handling was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (e.g. a VF ndo op) holds the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable@vger.kernel.org> # 5.8.x
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
This is for stable trees 5.8 through 5.12. I sent patches for 5.13 and 5.14
separately since they have slightly different context.
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 48dee9c5d534..66da8f540454 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -375,6 +375,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1556,6 +1558,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
set_bit(ICE_VIRTCHNL_VF_CAP_L2, &vf->vf_caps);
vf->spoofchk = true;
vf->num_vf_qs = pf->num_qps_per_vf;
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -3389,6 +3393,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -3398,6 +3404,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -3763,6 +3770,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -3839,6 +3855,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -3953,6 +3971,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -3970,6 +3990,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -3999,11 +4020,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 0f519fba3770..59e5b4f16e96 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -68,6 +68,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
/* first vector index of this VF in the PF space */
--
2.35.1.355.ge7e302376dd6
From: Brett Creeley <brett.creeley@intel.com>
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream.
[I had to fix the cherry-pick manually as the patch added a line around
some context that was missing.]
The VF can be configured via the PF's ndo ops at the same time the PF is
receiving/handling virtchnl messages. This has many issues, one of
them being that an ndo op could be actively resetting a VF (i.e.
resetting it to the default state and deleting/re-adding the VF's VSI)
while a virtchnl message is being handled. The following error was seen
because a VF ndo op was used to change a VF's trust setting while the
VIRTCHNL_OP_CONFIG_VSI_QUEUES handling was ongoing:
[35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM
[35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5
[35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
Fix this by making sure the virtchnl handling and VF ndo ops that
trigger VF resets cannot run concurrently. This is done by adding a
struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex
will be locked around the critical operations and VFR. Since the ndo ops
will trigger a VFR, the virtchnl thread will use mutex_trylock(). This
is done because if any other thread (e.g. a VF ndo op) holds the mutex, then
that means the current VF message being handled is no longer valid, so
just ignore it.
This issue can be seen using the following commands:
for i in {0..50}; do
rmmod ice
modprobe ice
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
sleep 2
echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs
ip link set ens785f1 vf 0 trust on
ip link set ens785f0 vf 0 trust on
done
Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations")
Cc: <stable@vger.kernel.org> # 5.13.x
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
This should apply to 5.13
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 25 +++++++++++++++++++
.../net/ethernet/intel/ice/ice_virtchnl_pf.h | 5 ++++
2 files changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 671902d9fc35..2629d670bbbf 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -647,6 +647,8 @@ void ice_free_vfs(struct ice_pf *pf)
set_bit(ICE_VF_STATE_DIS, pf->vf[i].vf_states);
ice_free_vf_res(&pf->vf[i]);
}
+
+ mutex_destroy(&pf->vf[i].cfg_lock);
}
if (ice_sriov_free_msix_res(pf))
@@ -1893,6 +1895,8 @@ static void ice_set_dflt_settings_vfs(struct ice_pf *pf)
*/
ice_vf_ctrl_invalidate_vsi(vf);
ice_vf_fdir_init(vf);
+
+ mutex_init(&vf->cfg_lock);
}
}
@@ -3955,6 +3959,8 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
return 0;
}
+ mutex_lock(&vf->cfg_lock);
+
vf->port_vlan_info = vlanprio;
if (vf->port_vlan_info)
@@ -3964,6 +3970,7 @@ ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
dev_info(dev, "Clearing port VLAN on VF %d\n", vf_id);
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4338,6 +4345,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
return;
}
+ /* VF is being configured in another context that triggers a VFR, so no
+ * need to process this message
+ */
+ if (!mutex_trylock(&vf->cfg_lock)) {
+ dev_info(dev, "VF %u is being configured in another context that will trigger a VFR, so there is no need to handle this message\n",
+ vf->vf_id);
+ return;
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ice_vc_get_ver_msg(vf, msg);
@@ -4426,6 +4442,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
dev_info(dev, "PF failed to honor VF %d, opcode %d, error %d\n",
vf_id, v_opcode, err);
}
+
+ mutex_unlock(&vf->cfg_lock);
}
/**
@@ -4540,6 +4558,8 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
return -EINVAL;
}
+ mutex_lock(&vf->cfg_lock);
+
/* VF is notified of its new MAC via the PF's response to the
* VIRTCHNL_OP_GET_VF_RESOURCES message after the VF has been reset
*/
@@ -4557,6 +4577,7 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
}
ice_vc_reset_vf(vf);
+ mutex_unlock(&vf->cfg_lock);
return 0;
}
@@ -4586,11 +4607,15 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
if (trusted == vf->trusted)
return 0;
+ mutex_lock(&vf->cfg_lock);
+
vf->trusted = trusted;
ice_vc_reset_vf(vf);
dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
vf_id, trusted ? "" : "un");
+ mutex_unlock(&vf->cfg_lock);
+
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index d800ed83d6c3..3da39d63a24b 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -69,6 +69,11 @@ struct ice_mdd_vf_events {
struct ice_vf {
struct ice_pf *pf;
+ /* Used during virtchnl message handling and NDO ops against the VF
+ * that will trigger a VFR
+ */
+ struct mutex cfg_lock;
+
u16 vf_id; /* VF ID in the PF space */
u16 lan_vsi_idx; /* index into PF struct */
u16 ctrl_vsi_idx;
--
2.35.1.355.ge7e302376dd6