On Dec 23, 2020, at 8:01 PM, Andrea Arcangeli aarcange@redhat.com wrote:
On Wed, Dec 23, 2020 at 07:09:10PM -0800, Nadav Amit wrote:
Perhaps holding some small bitmap based on part of the deferred flushed pages (e.g., bits 12-17 of the address or some other kind of a single hash-function bloom-filter) would be more performant to avoid (most)
The concern here aren't only the page faults having to run the bloom filter, but how to manage the RAM storage pointed by the bloomfilter or whatever index into the storage, which would slowdown mprotect.
Granted that mprotect is slow to begin with, but the idea we can't make it any slower to make MADV_PAGEOUT or uffd-wp or clear_refs run faster since it's too important and too frequent in comparison.
Just to restrict the potential false positive IPI caused by page_count inevitable inaccuracies to uffd-wp and softdirty runtimes, a simple check on vm_flags should be enough.
Andrea,
I am not trying to be argumentative, and I did not think through about an alternative solution. It sounds to me that your proposed solution is correct and would probably be eventually (slightly) more efficient than anything that I can propose.
Yet, I do want to explain my position. Reasoning on TLB flushes is hard, as this long thread shows. The question is whether it has to be so hard. In theory, we can only think about architectural considerations - whether a PTE permissions are promoted/demoted and whether the PTE was changed/cleared.
Obviously, it is more complex than that. Yet, once you add into the equation various parameters such as the VMA flags or whether a page is locked (which Mel told me was once a consideration), things become much more complicated. If all the logic of TLB flushes had been concentrated in a single point and maintenance of this code did not require thought about users and use-cases, I think things would have been much simpler, at least for me.
Regards, Nadav