Hi Mathieu,
Thank you for your suggestions!
I discussed a possible design with Suzuki, and he suggested that the N:N topology should follow a different approach since, in that case, the hardware is capable of triggering interrupts and each buffer is independent of the others.
Trace data is lost for two reasons: hardware/memory limitations, and the system not reading the trace buffer frequently enough.
This idea would improve the second case, since the trace buffer would be read independently of task scheduling.
At the moment it is indeed possible, for the same workload, to produce a final trace that is orders of magnitude larger just by artificially increasing the system load. It would be great to have the ETM subsystem produce a less variable perf.data size.
Strobing is very beneficial, but it works at a different layer: whether strobing is on or off, if the trace buffer is only read a couple of times because the system is idle, the final perf.data size will be the same.
I was thinking about defining an rcu_work to be submitted to a workqueue, so that the trace is read while it is being generated, but unfortunately Suzuki thinks an RCU-based solution would not be suitable for this use case, so I'm open to any other suggestions.
Please let me know if you are aware of any fundamental problem that could prevent a trace consumer and producer from running at the same time.
Thanks,
Andrea