Hi Reinette,
On Wed, May 17, 2023 at 2:06 AM Reinette Chatre <reinette.chatre@intel.com> wrote:
> On 5/16/2023 7:49 AM, Peter Newman wrote:
> > On Thu, May 11, 2023 at 11:40 PM Reinette Chatre <reinette.chatre@intel.com> wrote:
> > > I do not see any impact to the (soft) RMIDs that can be assigned to monitor groups, yet from what I understand a generic "RMID" is used as index to MBM state. Is this correct? A hardware RMID and software RMID would thus share the same MBM state. If this is correct I think we need to work on making the boundaries between hard and soft RMID more clear.
> > The only RMID-indexed state used by soft RMIDs right now is mbm_state::soft_rmid_bytes. The other aspect of the boundary is ensuring that nothing will access the hard RMID-specific state for a soft RMID.
> > The remainder of the mbm_state is only accessed by the software controller, which you suggested that I disable.
> > The arch_mbm_state is accessed only through resctrl_arch_rmid_read() and resctrl_arch_reset_rmid(), which are called by __mon_event_count() or the limbo handler.
> > __mon_event_count() is aware of soft RMIDs, so I would just need to ensure the software controller is disabled and never put any RMIDs on the limbo list. To be safe, I can also add WARN_ON_ONCE(rdt_mon_soft_rmid) to the rmid-indexing of the mbm_state arrays in the software controller and before the resctrl_arch_rmid_read() call in the limbo handler to catch if they're ever using soft RMIDs.
> I understand and trust that you can ensure that this implementation is done safely. Please also consider how future changes to resctrl may stumble if there are not clear boundaries. You may be able to "ensure the software controller is disabled and never put any RMIDs on the limbo list", but consider if these rules will be clear to somebody who comes along in a year or more.
> Documenting the data structures with these unique usages will help. Specific accessors can sometimes be useful to make it obvious in which state the data is being accessed and what data can be accessed. Using WARN as you suggest is a useful tool.
After studying the present usage of RMID values some more, I've concluded that I can cleanly move all knowledge of the soft RMID implementation into resctrl_arch_rmid_read(), so that none of the FS-layer code needs to be aware of soft RMIDs. However, doing this would require James's patch to allow resctrl_arch_rmid_read() to block[1], since resctrl_arch_rmid_read() would be the first opportunity architecture-dependent code has to IPI the other CPUs in the domain.
The alternative to blocking in resctrl_arch_rmid_read() would be to introduce an arch hook in mon_event_read(), where blocking is already possible today without James's patches. Architecture-dependent code could then IPI all CPUs in the target domain to flush their event counts to memory before mon_event_count() is called to total their MBM event counts.
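To make the ordering of that alternative concrete, here is a rough user-space sketch of "flush every CPU, then total". It is not kernel code: the array sizes, pending_bytes, cur_rmid, flush_cpu(), arch_pre_mon_event_read() and mon_event_read_sketch() are all hypothetical names for illustration, and the loop over CPUs only stands in for what would really be an IPI to each CPU in the domain. Only soft_rmid_bytes corresponds to a field mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS  4	/* hypothetical: CPUs in one monitoring domain */
#define NR_RMIDS 8	/* hypothetical: number of soft RMIDs */

/* Bytes each CPU has observed but not yet flushed to memory. */
static uint64_t pending_bytes[NR_CPUS];
/* Soft RMID currently active on each CPU. */
static unsigned int cur_rmid[NR_CPUS];
/* Stand-in for mbm_state::soft_rmid_bytes, indexed by RMID. */
static uint64_t soft_rmid_bytes[NR_RMIDS];

/* Stand-in for the per-CPU work an IPI handler would do. */
static void flush_cpu(unsigned int cpu)
{
	soft_rmid_bytes[cur_rmid[cpu]] += pending_bytes[cpu];
	pending_bytes[cpu] = 0;
}

/*
 * Hypothetical arch hook called from a context that may block.
 * In the kernel this would IPI the CPUs of the target domain;
 * here it simply visits them in a loop.
 */
static void arch_pre_mon_event_read(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		flush_cpu(cpu);
}

/* Flush first, then total: the read only sees flushed counts. */
static uint64_t mon_event_read_sketch(unsigned int rmid)
{
	assert(rmid < NR_RMIDS);
	arch_pre_mon_event_read();
	return soft_rmid_bytes[rmid];
}
```

The point of the sketch is only that the FS-layer read path stays unaware of how the counts were gathered, as long as the hook runs before the totals are read.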
The remaining special case for soft RMIDs would be knowing that they should never go on the limbo list. Right now I've hard-coded the soft RMID read to always return 0 bytes for occupancy events, but this answer is only correct in the context of deciding whether RMIDs are dirty, so I have to prevent the events from being presented to the user. If returning an error weren't considered "dirty", maybe that would work too.
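As a rough illustration of the "error is not dirty" idea, here is a user-space sketch of a dirtiness check in which one specific error from the occupancy read is treated as clean. Everything in it is an assumption for illustration: soft_rmid_enabled, realloc_threshold, hw_occupancy, occupancy_read(), rmid_is_dirty() and the choice of -EINVAL are hypothetical and do not reflect what the limbo handler does today.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_RMIDS 8	/* hypothetical RMID count */

static bool soft_rmid_enabled;			/* hypothetical mode flag */
static uint64_t realloc_threshold = 1024;	/* hypothetical threshold, bytes */
static uint64_t hw_occupancy[NR_RMIDS];		/* stand-in for hardware counters */

/*
 * Stand-in for an occupancy read. With soft RMIDs there is no
 * meaningful occupancy value, so report an error rather than 0.
 */
static int occupancy_read(unsigned int rmid, uint64_t *val)
{
	assert(rmid < NR_RMIDS);
	if (soft_rmid_enabled)
		return -EINVAL;
	*val = hw_occupancy[rmid];
	return 0;
}

/*
 * Hypothetical policy: "event not available" (-EINVAL) means the
 * RMID cannot be dirty; any other error is conservatively dirty.
 */
static bool rmid_is_dirty(unsigned int rmid)
{
	uint64_t val;
	int err = occupancy_read(rmid, &val);

	if (err == -EINVAL)
		return false;
	if (err)
		return true;
	return val >= realloc_threshold;
}
```

With a policy like this, the read would never need to fake a 0-byte occupancy value, so nothing incorrect could leak to the user.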
Maybe the cleanest approach would be for enabling soft RMIDs to somehow make is_llc_occupancy_enabled() return false, but this is difficult as long as soft RMIDs are configured at mount time and rdt_mon_features is set at boot time. If soft RMIDs move completely into the arch layer, is it preferable to configure them with an rdt boot option instead of adding an architecture-dependent mount option? I recall James being opposed to adding a boot option for this.
Thanks!
-Peter
[1] https://lore.kernel.org/lkml/20230525180209.19497-15-james.morse@arm.com/