On Tue, Apr 22, 2025 at 02:49:54PM +0200, Ingo Molnar wrote:
[...]
Hi Yabin,
I was wondering whether this is just the opposite of PERF_PMU_CAP_AUX_NO_SG, and whether order 0 should be used by default for all devices to solve the issue you describe, since we already have PERF_PMU_CAP_AUX_NO_SG for devices that need contiguous pages. Then I found commit 5768402fd9c6 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically"), which explains that the current allocation strategy is an optimization.
Your change seems to decide that for certain devices we want to optimize for fragmentation rather than performance. If these are rarely used features, used specifically when looking at performance, should we not continue to optimize for performance? Or at least make it user-configurable?
So there seem to be 3 categories:
- Must have physically contiguous AUX buffers, because it's a hardware ABI. (PERF_PMU_CAP_AUX_NO_SG for Intel BTS and PT.)
- Would be nice to have contiguous AUX buffers, for a bit more performance.
- Doesn't really care.
So we do have #1, and it appears Yabin's use case is #3?
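To make the three categories concrete, here is a rough userspace sketch of how a starting allocation order could be picked per category. It is not the kernel's rb_alloc_aux() logic; the enum, aux_max_order(), ilog2_floor() and the highest_order parameter are all hypothetical names made up for illustration, and nr_pages is assumed to be at least 1.

```c
/* Hypothetical labels for the three categories; not kernel identifiers. */
enum aux_contig_need {
	AUX_CONTIG_REQUIRED,	/* #1: hardware ABI, e.g. PERF_PMU_CAP_AUX_NO_SG PMUs */
	AUX_CONTIG_PREFERRED,	/* #2: contiguity buys a bit of performance */
	AUX_CONTIG_DONT_CARE,	/* #3: contiguity brings no benefit */
};

/* floor(log2(n)) for n >= 1; a stand-in for the kernel's ilog2(). */
static int ilog2_floor(unsigned long n)
{
	int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/*
 * Pick the highest page order to try first (assumption: the caller falls
 * back to lower orders when an allocation fails). highest_order is the
 * largest order the allocator supports, passed in by the caller.
 */
static int aux_max_order(enum aux_contig_need need, unsigned long nr_pages,
			 int highest_order)
{
	int order;

	/* #3: go straight to single pages to avoid pressure on high orders. */
	if (need == AUX_CONTIG_DONT_CARE)
		return 0;

	/* #1 and #2: start from the largest order that fits in nr_pages. */
	order = ilog2_floor(nr_pages);
	return order < highest_order ? order : highest_order;
}
```

With this split, only the third category gives up the optimistic high-order attempt; the first two behave the same at this level and differ only in whether falling back to smaller chunks is acceptable.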
In Yabin's case, the AUX buffer works as a bounce buffer. The hardware trace data is copied by a driver from a low-level contiguous buffer to the AUX buffer.
In this case we cannot benefit much from contiguous AUX buffers.
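As a rough illustration of why contiguity matters little for a bounce buffer, here is a minimal userspace sketch of such a copy. It is not taken from any driver; aux_bounce_copy(), SKETCH_PAGE_SIZE and the pages[] layout are hypothetical, and it assumes nr_pages > 0 and head smaller than the total buffer size.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Copy len bytes from a contiguous source buffer into an AUX buffer made of
 * independently allocated pages, starting at byte offset head and wrapping
 * at the end of the buffer. Returns the new head offset.
 */
static size_t aux_bounce_copy(void **pages, size_t nr_pages, size_t head,
			      const void *src, size_t len)
{
	const unsigned char *from = src;
	size_t size = nr_pages * SKETCH_PAGE_SIZE;

	while (len) {
		size_t pg = head / SKETCH_PAGE_SIZE;
		size_t off = head % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;

		/* Each chunk stays within one page, so pages need not be adjacent. */
		memcpy((unsigned char *)pages[pg] + off, from, chunk);

		from += chunk;
		len -= chunk;
		head = (head + chunk) % size;
	}

	return head;
}
```

The copy is split at page boundaries either way, so physically adjacent pages would save at most a few loop iterations per copy.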
Thanks, Leo