On Thu, Jul 10, 2025 at 10:14:17AM +0200, Niklas Schnelle wrote:
On Tue, 2025-07-08 at 18:23 +0200, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
From: Niklas Schnelle schnelle@linux.ibm.com
[ Upstream commit 45537926dd2aaa9190ac0fac5a0fbeefcadfea95 ]
The error event information for PCI error events contains a function handle for the respective function. This handle is generally captured at the time the error event was recorded. Due to delays in processing or cascading issues, it may happen that during firmware recovery multiple events are generated. When processing these events in order Linux may already have recovered an affected function making the event information stale. Fix this by doing an unconditional CLP List PCI function retrieving the current function handle with the zdev->state_lock held and ignoring the event if its function handle is stale.
Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Julian Ruess julianr@linux.ibm.com Reviewed-by: Gerd Bayer gbayer@linux.ibm.com Reviewed-by: Farhan Ali alifm@linux.ibm.com Signed-off-by: Niklas Schnelle schnelle@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org
arch/s390/pci/pci_event.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c index d969f36bf186f..dc512c8f82324 100644 --- a/arch/s390/pci/pci_event.c +++ b/arch/s390/pci/pci_event.c @@ -257,6 +257,8 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf) struct zpci_dev *zdev = get_zdev_by_fid(ccdf->fid); struct pci_dev *pdev = NULL; pci_ers_result_t ers_res;
- u32 fh = 0;
- int rc;
zpci_dbg(3, "err fid:%x, fh:%x, pec:%x\n", ccdf->fid, ccdf->fh, ccdf->pec); @@ -264,9 +266,23 @@ static void __zpci_event_error(struct zpci_ccdf_err *ccdf) zpci_err_hex(ccdf, sizeof(*ccdf)); if (zdev) {
mutex_lock(&zdev->state_lock);
rc = clp_refresh_fh(zdev->fid, &fh);
if (rc) {
mutex_unlock(&zdev->state_lock);
goto no_pdev;
}
if (!fh || ccdf->fh != fh) {
/* Ignore events with stale handles */
zpci_dbg(3, "err fid:%x, fh:%x (stale %x)\n",
ccdf->fid, fh, ccdf->fh);
mutex_unlock(&zdev->state_lock);
goto no_pdev;
zpci_update_fh(zdev, ccdf->fh); if (zdev->zbus->bus) pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);}
}mutex_unlock(&zdev->state_lock);
pr_err("%s: Event 0x%x reports an error for PCI function 0x%x\n",
Sorry I only noticed this due to a build error report but this backport is NOT CORRECT. The mutex_lock(&zdev->state_lock) line that was context in the original commit was part of commit bcb5d6c76903 ("s390/pci: introduce lock to synchronize state of zpci_dev's") which also added the mutex and isn't in this tree. So without pulling that in as a prerequisite this won't compile.
Also and kind of worse the above puts the mutex_unlock() in the wrong place! Please drop/revert this patch.
The original commit here should work for its specific problem even without the backport of the mutex though I think it would be best to get that into stable as well. Sorry for not marking it as a dependency. That said, shouldn't there be a note that this backport deviates significantly from the upstream commit?
Already dropped, as something went wrong here.
thanks for the review.
greg k-h