During the EEH MMIO error checking, the current implementation fails to map the (virtual) MMIO address back to the pci device on radix with hugepage mappings for I/O. This results into failure to dispatch EEH event with no recovery even when EEH capability has been enabled on the device.
eeh_check_failure(token) # token = virtual MMIO address addr = eeh_token_to_phys(token); edev = eeh_addr_cache_get_dev(addr); if (!edev) return 0; eeh_dev_check_failure(edev); <= Dispatch the EEH event
In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys translation that results in wrong physical address, which is then passed to eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges to get to a PCI device. Hence, it fails to find a match and the EEH event never gets dispatched leaving the device in failed state.
The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space") introduced following logic to translate virt to phys for hugepage mappings:
eeh_token_to_phys(): + pa = pte_pfn(*ptep); + + /* On radix we can do hugepage mappings for io, so handle that */ + if (hugepage_shift) { + pa <<= hugepage_shift; <= This is wrong + pa |= token & ((1ul << hugepage_shift) - 1); + }
This patch fixes the virt -> phys translation in eeh_token_to_phys() function.
$ cat /sys/kernel/debug/powerpc/eeh_address_cache mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1 mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1 mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0 mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0 mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1 mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0 mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1 mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0
Above is the list of cached io address ranges of pci 0030:01:00.<fn>.
Before this patch:
Tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection clearly shows that 'addr=' contains wrong physical address:
kworker/u16:0-7 [001] .... 108.883775: eeh_addr_cache_get_dev: (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x80103000a510
dmesg shows no EEH recovery messages:
[ 108.563768] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x9ae) != mcp_pulse (0x7fff) [ 108.563788] bnx2x: [bnx2x_hw_stats_update:870(eth2)]NIG timer max (4294967295) [ 108.883788] bnx2x: [bnx2x_acquire_hw_lock:2013(eth1)]lock_status 0xffffffff resource_bit 0x1 [ 108.884407] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout [ 108.884976] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout <..>
After this patch:
eeh_addr_cache_get_dev() trace shows correct physical address:
<idle>-0 [001] ..s. 1043.123828: eeh_addr_cache_get_dev: (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x40080bc7cd8
dmesg logs shows EEH recovery getting triggerred:
[ 964.323980] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x746f) != mcp_pulse (0x7fff) [ 964.323991] EEH: Recovering PHB#30-PE#10000 [ 964.324002] EEH: PE location: N/A, PHB location: N/A [ 964.324006] EEH: Frozen PHB#30-PE#10000 detected <..>
Cc: stable@vger.kernel.org Fixes: 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space") Signed-off-by: Mahesh Salgaonkar mahesh@linux.ibm.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reported-by: Dominic DeMarco ddemarc@us.ibm.com --- Change in v2: - Fixed error reported by checkpatch.pl. --- arch/powerpc/kernel/eeh.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index cd60bc1c87011..7040e430a1249 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -362,14 +362,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token) pa = pte_pfn(*ptep);
/* On radix we can do hugepage mappings for io, so handle that */ - if (hugepage_shift) { - pa <<= hugepage_shift; - pa |= token & ((1ul << hugepage_shift) - 1); - } else { - pa <<= PAGE_SHIFT; - pa |= token & (PAGE_SIZE - 1); - } + if (!hugepage_shift) + hugepage_shift = PAGE_SHIFT;
+ pa <<= PAGE_SHIFT; + pa |= token & ((1ul << hugepage_shift) - 1); return pa; }
On Mon, Apr 12, 2021 at 5:52 PM Mahesh Salgaonkar mahesh@linux.ibm.com wrote:
During the EEH MMIO error checking, the current implementation fails to map the (virtual) MMIO address back to the pci device on radix with hugepage mappings for I/O. This results into failure to dispatch EEH event with no recovery even when EEH capability has been enabled on the device.
eeh_check_failure(token) # token = virtual MMIO address addr = eeh_token_to_phys(token); edev = eeh_addr_cache_get_dev(addr); if (!edev) return 0; eeh_dev_check_failure(edev); <= Dispatch the EEH event
In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys translation that results in wrong physical address, which is then passed to eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges to get to a PCI device. Hence, it fails to find a match and the EEH event never gets dispatched leaving the device in failed state.
The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space") introduced following logic to translate virt to phys for hugepage mappings:
eeh_token_to_phys():
pa = pte_pfn(*ptep);
/* On radix we can do hugepage mappings for io, so handle that */
if (hugepage_shift) {
pa <<= hugepage_shift; <= This is wrong
pa |= token & ((1ul << hugepage_shift) - 1);
}
I think I vaguely remember thinking "is this right?" at the time. Apparently not!
Reviewed-by: Oliver O'Halloran oohall@gmail.com
It would probably be a good idea to add a debugfs interface to help with testing the address translation. Maybe something like:
echo <mmio addr> > /sys/kernel/debug/powerpc/eeh_addr_check
Then in the kernel:
struct resource *r = lookup_resource(mmio_addr); void *virt = ioremap_resource(r); ret = eeh_check_failure(virt); iounmap(virt)
return ret;
A little tedious, but then you can write a selftest :)
Oliver
On 2021-04-13 10:53:39 Tue, Oliver O'Halloran wrote:
On Mon, Apr 12, 2021 at 5:52 PM Mahesh Salgaonkar mahesh@linux.ibm.com wrote:
During the EEH MMIO error checking, the current implementation fails to map the (virtual) MMIO address back to the pci device on radix with hugepage mappings for I/O. This results into failure to dispatch EEH event with no recovery even when EEH capability has been enabled on the device.
eeh_check_failure(token) # token = virtual MMIO address addr = eeh_token_to_phys(token); edev = eeh_addr_cache_get_dev(addr); if (!edev) return 0; eeh_dev_check_failure(edev); <= Dispatch the EEH event
In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys translation that results in wrong physical address, which is then passed to eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges to get to a PCI device. Hence, it fails to find a match and the EEH event never gets dispatched leaving the device in failed state.
The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space") introduced following logic to translate virt to phys for hugepage mappings:
eeh_token_to_phys():
pa = pte_pfn(*ptep);
/* On radix we can do hugepage mappings for io, so handle that */
if (hugepage_shift) {
pa <<= hugepage_shift; <= This is wrong
pa |= token & ((1ul << hugepage_shift) - 1);
}
I think I vaguely remember thinking "is this right?" at the time. Apparently not!
Reviewed-by: Oliver O'Halloran oohall@gmail.com
Thanks for the review.
It would probably be a good idea to add a debugfs interface to help with testing the address translation. Maybe something like:
echo <mmio addr> > /sys/kernel/debug/powerpc/eeh_addr_check
Then in the kernel:
struct resource *r = lookup_resource(mmio_addr); void *virt = ioremap_resource(r); ret = eeh_check_failure(virt); iounmap(virt)
return ret;
A little tedious, but then you can write a selftest :)
Sure, will give a try.
Thanks, -Mahesh.
On Mon, 12 Apr 2021 13:22:50 +0530, Mahesh Salgaonkar wrote:
During the EEH MMIO error checking, the current implementation fails to map the (virtual) MMIO address back to the pci device on radix with hugepage mappings for I/O. This results into failure to dispatch EEH event with no recovery even when EEH capability has been enabled on the device.
eeh_check_failure(token) # token = virtual MMIO address addr = eeh_token_to_phys(token); edev = eeh_addr_cache_get_dev(addr); if (!edev) return 0; eeh_dev_check_failure(edev); <= Dispatch the EEH event
[...]
Applied to powerpc/next.
[1/1] powerpc/eeh: Fix EEH handling for hugepages in ioremap space. https://git.kernel.org/powerpc/c/5ae5bc12d0728db60a0aa9b62160ffc038875f1a
cheers
linux-stable-mirror@lists.linaro.org