On 03-Nov-22 9:42 PM, Ravi Bangoria wrote:
On 03-Nov-22 6:03 PM, Peter Zijlstra wrote:
On Thu, Nov 03, 2022 at 05:15:30PM +0530, Ravi Bangoria wrote:
Sorry, I was distracted a bit. So, this seems to be happening because of a race between amd_pmu_enable_all() and the perf event NMI. Something like:
    amd_pmu_enable_all() {
        if (!test_bit(idx, cpuc->active_mask))
                    --->/* perf NMI entry */
                    ...
                    x86_pmu_stop() {
                        __clear_bit(hwc->idx, cpuc->active_mask);
                        cpuc->events[hwc->idx] = NULL;
                    }
                    ...
                    <---/* perf NMI exit */
        amd_pmu_enable_event(cpuc->events[idx]);
    }
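The interleaving above can be reproduced in a minimal user-space C model. Everything in it is hypothetical and simplified: the structs stand in for the kernel's per-CPU state, `active_mask` is a single word rather than the kernel's bitmap, the perf NMI is modeled as a synchronous callback invoked right after the active_mask test passes, and `model_enable_all()`, `model_pmu_stop()` and `stop_counter_1()` are illustration names, not kernel functions. The `guard` path shows one possible mitigation (load `cpuc->events[idx]` once and skip a NULL slot). It is a sketch, not a claim about what the upstream fix looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_COUNTERS 4 /* hypothetical; the kernel uses x86_pmu.num_counters */

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct perf_event {
	bool enabled;
};

struct cpu_hw_events {
	unsigned long active_mask;               /* bit idx set => counter idx active */
	struct perf_event *events[NUM_COUNTERS]; /* NULL => slot torn down */
};

static bool test_bit_ul(int idx, const unsigned long *mask)
{
	return (*mask >> idx) & 1UL;
}

/* Models the two stores x86_pmu_stop() does in the trace above. */
static void model_pmu_stop(struct cpu_hw_events *cpuc, int idx)
{
	cpuc->active_mask &= ~(1UL << idx);
	cpuc->events[idx] = NULL;
}

/* "NMI" callback, run between the active_mask test and the events[] load. */
typedef void (*nmi_hook_t)(struct cpu_hw_events *cpuc, int idx);

/* Example hook: the NMI throttles and stops counter 1 only. */
static void stop_counter_1(struct cpu_hw_events *cpuc, int idx)
{
	if (idx == 1)
		model_pmu_stop(cpuc, idx);
}

/*
 * Returns how many events were enabled. With guard == false this follows
 * the sequence in the report; the assert stands in for the kernel's NULL
 * pointer dereference. With guard == true a torn-down slot is skipped.
 */
static int model_enable_all(struct cpu_hw_events *cpuc, nmi_hook_t nmi, bool guard)
{
	int enabled = 0;

	for (int idx = 0; idx < NUM_COUNTERS; idx++) {
		if (!test_bit_ul(idx, &cpuc->active_mask))
			continue;

		if (nmi)
			nmi(cpuc, idx); /* the race window: bit test already passed */

		struct perf_event *event = cpuc->events[idx]; /* single load */

		if (guard && !event)
			continue; /* slot cleared by the "NMI": do not touch it */

		assert(event != NULL && "models the NULL deref in the report");
		event->enabled = true;
		enabled++;
	}
	return enabled;
}
```

In kernel code the single load would likely be written with READ_ONCE() so the compiler cannot re-read the slot after the NULL check; the user-space model does not need it because its "NMI" is an ordinary function call.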
Hmm, do you have more information?
I've extracted the function_graph trace logs from the crash dump and uploaded them here: https://github.com/BangoriaRavi/function_graph/blob/main/trace.function_grap...
The crash was on CPU1.
git bisect led me to a BRS patch:
commit ada543459cab7f653dcacdaba4011a8bb19c627c
Author: Stephane Eranian eranian@google.com
Date:   Tue Mar 22 15:15:07 2022 -0700
perf/x86/amd: Add AMD Fam19h Branch Sampling support
Add support for the AMD Fam19h 16-deep branch sampling feature as described in the AMD PPR Fam19h Model 01h Revision B1. This is a model specific extension. It is not an architected AMD feature.
Thanks,
Ravi