On 2024/9/2 17:50, Peter Zijlstra wrote:
On Sat, Aug 31, 2024 at 07:43:16AM +0000, Luo Gengkun wrote:
Perf events has the notion of sampling frequency which is implemented in software by dynamically adjusting the counter period so that samples occur at approximately the target frequency. Period adjustment is done in 2 places:
- when the counter overflows (and a sample is recorded)
- each timer tick, when the event is active
The later case is slightly flawed because it assumes that the time since the last timer-tick period adjustment is 1 tick, whereas the event may not have been active (e.g. for a task that is sleeping).
Fix by using jiffies to determine the elapsed time in that case.
Signed-off-by: Luo Gengkun luogengkun@huaweicloud.com Reviewed-by: Adrian Hunter adrian.hunter@intel.com
include/linux/perf_event.h | 1 + kernel/events/core.c | 12 +++++++++--- 2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1a8942277dda..d29b7cf971a1 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -265,6 +265,7 @@ struct hw_perf_event { * State for freq target events, see __perf_event_overflow() and * perf_adjust_freq_unthr_context(). */
- u64 freq_tick_stamp; u64 freq_time_stamp; u64 freq_count_stamp; #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c index a9395bbfd4aa..183291e0d070 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -55,6 +55,7 @@ #include <linux/pgtable.h> #include <linux/buildid.h> #include <linux/task_work.h> +#include <linux/jiffies.h> #include "internal.h" @@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list) { struct perf_event *event; struct hw_perf_event *hwc;
- u64 now, period = TICK_NSEC;
- u64 now, period, tick_stamp; s64 delta;
- tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
- list_for_each_entry(event, event_list, active_list) { if (event->state != PERF_EVENT_STATE_ACTIVE) continue;
@@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list) */ event->pmu->stop(event, PERF_EF_UPDATE);
period = tick_stamp - hwc->freq_tick_stamp;
hwc->freq_tick_stamp = tick_stamp;
- now = local64_read(&event->count); delta = now - hwc->freq_count_stamp; hwc->freq_count_stamp = now;
@@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list) * reload only if value has changed * we have stopped the event so tell that * to perf_adjust_period() to avoid stopping it
* twice.
*/* twice. And skip if it is the first tick adjust period.
if (delta > 0)
if (delta > 0 && likely(period != tick_stamp)) perf_adjust_period(event, period, delta, false);
event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
This one I'm less happy with.. that condition 'period != tick_stamp' doesn't make sense to me. That's only false if hwc->freq_tick_stamp == 0, which it will only be once after event creation. Even through the Changelog babbles about event scheduling.
Also, that all should then be written something like:
if (delta > 0 && ...) { perf_adjust_period(...); adjusted = true; }
event->pmu->start(event, adjusted ? PERF_EF_RELOAD : 0);
Thank for your review! That is a good point.
If freq_tick_stamp is initialized when an event is created
or enabled, the additional condition can be removed as follows:
+static bool is_freq_event(struct perf_event *event) +{ + return event->attr.freq && event->attr.sample_freq; +} + static void perf_event_set_state(struct perf_event *event, enum perf_event_state state) { @@ -665,6 +670,12 @@ perf_event_set_state(struct perf_event *event, enum perf_event_state state) */ if ((event->state < 0) ^ (state < 0)) perf_event_update_sibling_time(event); + /* + * Update freq_tick_stamp for freq event just enabled + */ + if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE && + event->state < PERF_EVENT_STATE_INACTIVE) + event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
WRITE_ONCE(event->state, state); } @@ -4165,7 +4176,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list) * to perf_adjust_period() to avoid stopping it * twice. And skip if it is the first tick adjust period. */ - if (delta > 0 && likely(period != tick_stamp)) + if (delta > 0) perf_adjust_period(event, period, delta, false);
event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0); @@ -12061,8 +12072,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
hwc = &event->hw; hwc->sample_period = attr->sample_period; - if (attr->freq && attr->sample_freq) + if (is_freq_event(event)) { hwc->sample_period = 1; + if (event->state == PERF_EVENT_STATE_INACTIVE) + event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64()); + }
And I'm wondering if we also need to update freq_count_stamp when
the freq event is enabled for the reason to keep they on the same "period".
+ if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE && + event->state < PERF_EVENT_STATE_INACTIVE) { + event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64()); + event->hw.freq_count_stamp = local64_read(&event->count); + }
Looking for your reply!
Thanks.