Hi there,
I found a NULL pointer dereference in the perf subsystem that affects at least the v5.10 and v6.1 stable trees. (The same PoC does not trigger the crash on mainline.)
I have not been able to find the root cause of the bug. All I know is that it is a race condition in the logic that moves event groups from pure software perf events to hardware ones. More specifically, adding a hardware perf event to a software event group triggers the "move_group" logic in perf_event_open. When the "move_group" logic runs, it first removes all existing events from the context using `perf_remove_from_context`, which in turn invokes `__perf_remove_from_context` through `event_function_call`.
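For readers who want to see the shape of the triggering call sequence, here is a minimal sketch of opening a software group leader and then adding a hardware sibling to it. This is not the attached reproducer and does not attempt to win the race; the helper names `init_attr` and `open_mixed_group`, and the choice of `PERF_COUNT_SW_CPU_CLOCK` and `PERF_COUNT_HW_CPU_CYCLES`, are assumptions made only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall; glibc does not provide one. */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: fill in a plain counting event of the given type. */
static void init_attr(struct perf_event_attr *attr, __u32 type, __u64 config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/*
 * Hypothetical helper: open a software event as group leader on the calling
 * task, then add a hardware event to that group via group_fd.
 * Returns the sibling fd, or -1 on failure; *leader_fd receives the leader
 * fd, or a negative value if the leader could not be opened.
 */
static int open_mixed_group(int *leader_fd)
{
    struct perf_event_attr sw, hw;

    assert(leader_fd != NULL);
    init_attr(&sw, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK);
    init_attr(&hw, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);

    /* pid = 0, cpu = -1: measure the calling task on any CPU. */
    *leader_fd = perf_event_open(&sw, 0, -1, -1, 0);
    if (*leader_fd < 0)
        return -1;

    /* Passing the software leader as group_fd for a hardware event. */
    return perf_event_open(&hw, 0, -1, *leader_fd, 0);
}
```

Either call may fail, for example when the `perf_event_paranoid` setting forbids it or the machine exposes no hardware PMU, so callers should check both returned descriptors and close whichever ones are valid.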
Notice that `event_function_call` is defined as follows:
~~~
static void event_function_call(struct perf_event *event, event_f func, void *data)
{
	...
	func(event, NULL, ctx, data);
	...
}
~~~
This means `__perf_remove_from_context` can be invoked with cpuctx == NULL, which leads to `event_sched_out` being invoked with cpuctx == NULL. At that point, as long as the event is active, we reach the `if (event->attr.exclusive || !cpuctx->active_oncpu)` check, which is a NULL pointer dereference.
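To make the last step concrete, here is a small userspace model of that condition. It is only a sketch: `model_event`, `model_cpuctx` and `check_reads_null_cpuctx` are hypothetical stand-ins, not kernel types, and the model covers only the single check quoted above. It reports whether evaluating the check would read through a NULL cpuctx, without actually dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one cpuctx field the check reads. */
struct model_cpuctx {
    int active_oncpu;
};

/* Hypothetical stand-in for the event fields relevant to the check. */
struct model_event {
    int active;    /* models "the event is active" */
    int exclusive; /* models event->attr.exclusive */
};

/*
 * Returns 1 if evaluating
 *     event->attr.exclusive || !cpuctx->active_oncpu
 * would read through a NULL cpuctx, and 0 otherwise.
 */
static int check_reads_null_cpuctx(const struct model_event *event,
                                   const struct model_cpuctx *cpuctx)
{
    assert(event != NULL);

    if (!event->active)
        return 0; /* per the description above, only active events get here */

    if (event->exclusive)
        return 0; /* || short-circuits, so cpuctx is never read */

    /* The right-hand operand is evaluated and reads cpuctx->active_oncpu. */
    return cpuctx == NULL;
}
```

In this model, an active, non-exclusive event combined with a NULL cpuctx is the only case that reports a fault, which matches the path described above.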
I don't know the proper way to patch this bug, so I'm asking for help.
A reproducer is attached to this email.
Best,
Kyle Zeng