On Fri, Jun 5, 2020 at 6:32 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
On Fri, May 8, 2020 at 1:55 AM Dan Williams <dan.j.williams@intel.com> wrote:
Recently a performance problem was reported for a process invoking a non-trivial ASL program. The method call in this case ends up repetitively triggering a call path like:
  acpi_ex_store
  acpi_ex_store_object_to_node
  acpi_ex_write_data_to_field
  acpi_ex_insert_into_field
  acpi_ex_write_with_update_rule
  acpi_ex_field_datum_io
  acpi_ex_access_region
  acpi_ev_address_space_dispatch
  acpi_ex_system_memory_space_handler
  acpi_os_map_cleanup.part.14
  _synchronize_rcu_expedited.constprop.89
  schedule
The end result of frequent synchronize_rcu_expedited() invocation is tiny sub-millisecond spurts of execution where the scheduler freely migrates this apparently sleepy task. The overhead of frequent scheduler invocation multiplies the execution time by a factor of 2-3X.
For example, performance improves from 16 minutes to 7 minutes for a firmware update procedure across 24 devices.
Perhaps the rcu usage was intended to allow for not taking a sleeping lock in the acpi_os_{read,write}_memory() path which ostensibly could be called from an APEI NMI error interrupt?
Not really.
acpi_os_{read|write}_memory() end up being called from non-NMI interrupt context via acpi_hw_{read|write}(), respectively. Quite obviously ioremap() cannot be run from there, but in those cases the mappings in question are already in the list, so ioremap() is not used.
RCU is there to protect these users from walking the list while it is being updated.
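To make the point above concrete, here is a minimal userspace sketch of the "look up first, map only if allowed" decision. It is not the kernel code: struct mapping, lookup_mapping(), and classify_access() are hypothetical names invented for illustration, and the list here is a plain singly linked list with no RCU or locking at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry in the list of existing mappings. */
struct mapping {
    uint64_t phys;          /* start of the mapped physical range */
    size_t size;            /* length of the range in bytes */
    struct mapping *next;
};

/* Walk the list and return the entry covering [phys, phys + len), if any. */
static struct mapping *lookup_mapping(struct mapping *head,
                                      uint64_t phys, size_t len)
{
    assert(len > 0); /* zero-length accesses are a caller bug in this sketch */

    for (struct mapping *m = head; m != NULL; m = m->next) {
        if (phys >= m->phys && phys + len <= m->phys + m->size)
            return m;
    }
    return NULL;
}

enum access_result { ACCESS_OK, ACCESS_NEEDS_MAP, ACCESS_REFUSED };

/*
 * Decide how an access may proceed. Creating a new mapping (the ioremap()
 * analogue) is only permitted when the caller is allowed to sleep.
 */
static enum access_result classify_access(struct mapping *head, uint64_t phys,
                                          size_t len, bool in_interrupt)
{
    if (lookup_mapping(head, phys, len) != NULL)
        return ACCESS_OK;               /* already mapped: no ioremap() needed */
    return in_interrupt ? ACCESS_REFUSED : ACCESS_NEEDS_MAP;
}
```

In this model an interrupt-context caller only succeeds when the range is already in the list, which is the situation described for the acpi_hw_{read|write}() users.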
Neither rcu_read_lock() nor ioremap() is interrupt safe, so add a WARN_ONCE() to validate that rcu was not serving as a mechanism to avoid direct calls to ioremap().
But it would produce false-positives if the IRQ context was not NMI, wouldn't it?
Even the original implementation had a spin_lock_irqsave(), but that is not NMI safe.
Which is not a problem (see above).
APEI itself already has some concept of avoiding ioremap() from interrupt context (see erst_exec_move_data()). If the new warning triggers, it means that APEI either needs more instrumentation like that to pre-emptively fail, or more infrastructure to arrange for pre-mapping the resources it needs in NMI context.
Well, I'm not sure about that.
Right, this patch set is about 2-3 generations behind the architecture of the fix we are discussing internally; you might mention that.
The fix we are looking at now is to pre-map operation regions in a similar manner as the way APEI resources are pre-mapped. The pre-mapping would arrange for synchronize_rcu_expedited() to be elided on each dynamic mapping attempt. The other piece is to arrange for operation-regions to be mapped at their full size at once rather than a page at a time.
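As a rough illustration of why pre-mapping helps, the toy cost model below counts how often the expensive teardown step (standing in for synchronize_rcu_expedited()) would run under each scheme. It is a userspace sketch under stated assumptions, not the proposed patch: struct stats, access_dynamic(), and access_premapped() are hypothetical names, and the model assumes every dynamic access maps and then tears down its own mapping, as the call path earlier in the thread suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Counters for the two operations whose frequency differs between schemes. */
struct stats {
    unsigned long maps;   /* mapping operations performed */
    unsigned long syncs;  /* teardown waits (synchronize_rcu_expedited() analogue) */
};

/* Dynamic scheme: each access maps what it needs, then unmaps and waits. */
static void access_dynamic(struct stats *s, size_t n_accesses)
{
    assert(s != NULL);
    for (size_t i = 0; i < n_accesses; i++) {
        s->maps++;   /* map the page containing the field */
        s->syncs++;  /* cleanup waits for readers before unmapping */
    }
}

/* Pre-mapped scheme: map the full region once and tear it down once. */
static void access_premapped(struct stats *s, size_t n_accesses)
{
    assert(s != NULL);
    (void)n_accesses; /* accesses reuse the existing mapping at no extra cost */
    s->maps++;        /* single full-size mapping made up front */
    s->syncs++;       /* single teardown when the region goes away */
}
```

Under these assumptions, 1000 field accesses cost 1000 teardown waits in the dynamic scheme and one in the pre-mapped scheme. The real saving depends on how the final fix tracks region lifetimes, which this sketch does not model.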