On Fri, Apr 21, 2023 at 4:18 PM Peter Newman <peternewman@google.com> wrote:
__mon_event_count() only reads the current software count and does not cause CPUs in the domain to flush. For mbm_update() to be effective in preventing overflow in hardware counters with soft RMIDs, it needs to flush the domain CPUs so that all of the HW RMIDs are read.
When RMIDs are soft, mbm_update() is intended to push bandwidth counts to the software counters rather than pulling the counts from hardware when userspace reads event counts, as this is a lot more efficient when the number of HW RMIDs is fixed.
The overflow handler runs too infrequently to drive bandwidth calculations without introducing too much error, and running it more often, regardless of whether the user is requesting event count reads, is not a good use of CPU time.
mon_event_read() needs to pull fresh event count values from hardware.
When RMIDs are soft, mbm_update() only calls mbm_flush_cpu_handler() on each CPU in the domain rather than reading all RMIDs.
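To make the push model concrete, here is a minimal user-space sketch, not the kernel code. It assumes each CPU tracks one simulated hardware count plus the soft RMID currently running there, and that a flush adds the delta since the last flush to that RMID's software counter. All struct, field, and function names (cpu_state, domain, flush_cpu, flush_domain) are made up for illustration; only the idea of flushing each CPU in the domain comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CPUS       4   /* assumed domain size, illustration only */
#define NUM_SOFT_RMIDS 8   /* assumed number of soft RMIDs */

/* Hypothetical per-CPU state; not the kernel's actual layout. */
struct cpu_state {
    uint64_t hw_count;     /* stand-in for the hardware counter value */
    uint64_t prev_count;   /* value observed at the previous flush */
    unsigned int cur_rmid; /* soft RMID currently running on this CPU */
};

/* Hypothetical domain: a set of CPUs plus the software counters. */
struct domain {
    struct cpu_state cpus[NUM_CPUS];
    uint64_t soft_count[NUM_SOFT_RMIDS];
};

/* Push one CPU's delta into the software counter of its current RMID. */
static void flush_cpu(struct domain *d, size_t cpu)
{
    struct cpu_state *c = &d->cpus[cpu];

    assert(c->cur_rmid < NUM_SOFT_RMIDS);
    d->soft_count[c->cur_rmid] += c->hw_count - c->prev_count;
    c->prev_count = c->hw_count;
}

/* Flush every CPU in the domain; work scales with CPUs, not RMIDs. */
static void flush_domain(struct domain *d)
{
    for (size_t cpu = 0; cpu < NUM_CPUS; cpu++)
        flush_cpu(d, cpu);
}
```

In this model a reader of soft_count[] only sees counts as of the last flush, which is why the read path would still need some way to trigger a fresh flush.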
I'll try going back to Stephane's original approach of rate-limiting how often domain CPUs need to be flushed and allowing the user to configure the time threshold. This will allow mbm_update() to read all of the RMIDs without triggering lots of redundant IPIs (redundant because only the current RMID on each CPU can change when RMIDs are soft).
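A rough user-space sketch of the rate-limiting idea, under stated assumptions: time is passed in as milliseconds, the threshold is a plain struct field standing in for the user-configurable value, and a counter stands in for the IPIs a real flush would send. The names flush_limiter and flush_if_stale are hypothetical and do not come from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain rate limiter for flushes. */
struct flush_limiter {
    uint64_t last_flush_ms;   /* time of the most recent flush */
    uint64_t threshold_ms;    /* minimum interval between flushes */
    bool flushed_once;        /* false until the first flush happens */
    unsigned int flush_count; /* stands in for rounds of IPIs sent */
};

/*
 * Flush only if no flush has happened yet or the last one is at least
 * threshold_ms old. Returns true when a flush was performed, false
 * when the caller should reuse the existing software counts.
 */
static bool flush_if_stale(struct flush_limiter *fl, uint64_t now_ms)
{
    assert(fl != NULL);

    if (fl->flushed_once && now_ms - fl->last_flush_ms < fl->threshold_ms)
        return false;

    fl->last_flush_ms = now_ms;
    fl->flushed_once = true;
    fl->flush_count++;        /* a real flush would IPI the domain CPUs */
    return true;
}
```

With a threshold of 100 ms, reads arriving at 1000 ms, 1050 ms and 1100 ms would cause two flushes in this sketch: the middle read falls inside the window and reuses the counts from the first.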