On Thu, May 30, 2019, Tony W Wang-oc wrote:
> Hi Ashok,
> I have two questions about this patch, could you help to check:
>
> 1. For broadcast #MC exceptions, this patch seems to require that the
> errors set MCG_STATUS_RIPV = 1.
> But on Intel CPUs, some #MC errors set MCG_STATUS_RIPV = 0 (such as
> "Recoverable-not-continuable SRAR Type" errors). The patch doesn't seem
> to work for these errors. Is that okay?
>
> 2. For LMCE exceptions, this patch seems to require that the errors set
> MCG_STATUS_RIPV = 0 to make sure the LMCE is handled normally even on an
> offline CPU.
> For LMCE errors that set MCG_STATUS_RIPV = 1, the patch prevents the
> offline CPU from handling them. Is that okay?
>
More specifically, this patch seems to require that #MC exceptions meet the
condition "MCG_STATUS_RIPV ^ MCG_STATUS_LMCES == 1". But on a Xeon X5650
machine (SMP), "Data CACHE Level-2 Generic Error" does not meet this condition.
I got the message below from: https://www.centos.org/forums/viewtopic.php?p=292742
Hardware event. This is not a software error.
MCE 0
CPU 4 BANK 6 TSC b7065eeaa18b0
TIME 1545643603 Mon Dec 24 10:26:43 2018
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-2 Generic Error
STATUS b200000080000106 MCGSTATUS 4
MCGCAP 1c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
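To make the condition concrete, here is a rough sketch of how I read it, applied to the MCGSTATUS value from the log above. The bit positions are the IA32_MCG_STATUS layout from the Intel SDM; the helper name ripv_xor_lmces() is made up for illustration and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit positions as documented in the Intel SDM. */
#define MCG_STATUS_RIPV  (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2)	/* machine check in progress */
#define MCG_STATUS_LMCES (1ULL << 3)	/* local machine check signaled */

/*
 * Hypothetical helper (not from the patch): returns true when exactly
 * one of RIPV and LMCES is set, i.e. "RIPV ^ LMCES == 1".
 */
static bool ripv_xor_lmces(uint64_t mcgstatus)
{
	bool ripv = !!(mcgstatus & MCG_STATUS_RIPV);
	bool lmces = !!(mcgstatus & MCG_STATUS_LMCES);

	return ripv != lmces;
}
```

With MCGSTATUS 4 only MCIP is set, so RIPV and LMCES are both 0 and ripv_xor_lmces(0x4) is false, which is why this error does not meet the condition.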
> Thanks
> Tony W Wang-oc
Please consider backporting this commit to 4.19-stable:
commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3
Author: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue Oct 23 01:11:04 2018 +0200
bpf: add bpf_jit_limit knob to restrict unpriv allocations
No other stable branches are affected by the issue.
Ben.
--
Ben Hutchings, Software Developer
Codethink Ltd
https://www.codethink.co.uk/
Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
Currently, there is only a 1 ms sleep after asserting PERST.
Reading the datasheets for different endpoints, some require PERST to be
asserted for 10 ms in order for the endpoint to perform a reset, others
require it to be asserted for 50 ms.
Several SoCs using this driver use PCIe Mini Card, where we don't know
what endpoint will be plugged in.
The PCI Express Card Electromechanical Specification specifies:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL) from
the power rails achieving specified operating limits."
Add a sleep of 100 ms before deasserting PERST, in order to ensure that
we are compliant with the spec.
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Niklas Cassel <niklas.cassel(a)linaro.org>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5+
---
Changes since v1:
Move the sleep into qcom_ep_reset_deassert()
drivers/pci/controller/dwc/pcie-qcom.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 0ed235d560e3..5d1713069d14 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -178,6 +178,8 @@ static void qcom_ep_reset_assert(struct qcom_pcie *pcie)
 static void qcom_ep_reset_deassert(struct qcom_pcie *pcie)
 {
+	/* Ensure that PERST has been asserted for at least 100 ms */
+	msleep(100);
 	gpiod_set_value_cansleep(pcie->reset, 0);
 	usleep_range(PERST_DELAY_US, PERST_DELAY_US + 500);
 }
--
2.21.0