On 8/11/25 02:15, Uladzislau Rezki wrote:
>> kernel_pte_work.list is a global shared variable, so the producer pte_free_kernel() and the consumer kernel_pte_work_func() are forced to run serialized. On a large system, I don't think you designed this deliberately 🙂
> Sorry for jumping in.
> Agree, unless it is never considered a hot path or something that can be really contended. It looks like you can just use a per-cpu llist to drain things.
Remember, the code that has to run just before all this sent an IPI to every single CPU on the system to have them do a pretty expensive (on x86 at least) TLB flush.
If this is a hot path, we have bigger problems on our hands: the full TLB flush on every CPU.
So, sure, there are a million ways to make this deferred freeing more scalable. But the code that's here is dirt simple and self-contained. If someone has some ideas for something that's simpler and more scalable, then I'm totally open to it.
But this is _not_ the place to add complexity to get scalability.
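For anyone following along, the per-cpu llist idea quoted above amounts to roughly the following. This is only a userspace sketch using C11 atomics, not the patch and not kernel code. The names pending_push(), pending_take_all(), pending_drain_all() and NR_SLOTS are made up for illustration, and the slot index stands in for a CPU number. The push mirrors what the kernel's llist_add() does and the detach mirrors llist_del_all().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct node {
	struct node *next;
};

struct lf_head {
	_Atomic(struct node *) first;
};

/* One list head per slot, so producers on different slots never touch
 * the same cache line for the head pointer. Zero-initialized (empty). */
static struct lf_head pending[NR_SLOTS];

/* Producer side: lock-free push, the same idea as llist_add(). */
static void pending_push(unsigned int slot, struct node *n)
{
	struct lf_head *h = &pending[slot % NR_SLOTS];
	struct node *old = atomic_load(&h->first);

	assert(n != NULL);
	do {
		n->next = old;	/* 'old' is refreshed on a failed CAS */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Consumer side: detach the whole list with a single atomic exchange,
 * the same idea as llist_del_all(). */
static struct node *pending_take_all(unsigned int slot)
{
	return atomic_exchange(&pending[slot % NR_SLOTS].first, NULL);
}

/* Walk every slot and count what was queued. A real worker would free
 * each entry here instead of just counting it. */
static size_t pending_drain_all(void)
{
	size_t count = 0;

	for (unsigned int s = 0; s < NR_SLOTS; s++) {
		struct node *n = pending_take_all(s);

		while (n) {
			struct node *next = n->next;

			count++;
			n = next;
		}
	}
	return count;
}
```

Note that the consumer now has to visit every slot, which is part of the extra machinery being weighed against the single global list.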