On 1/16/20 12:21 PM, Jason Gunthorpe wrote:
On Thu, Jan 16, 2020 at 12:16:30PM -0800, Ralph Campbell wrote:
Can you point me to the latest ODP code? Seems like my understanding is quite off.
https://elixir.bootlin.com/linux/v5.5-rc6/source/drivers/infiniband/hw/mlx5/...
Look for the word 'implicit'
mlx5_ib_invalidate_range() releases the interval_notifier when there are no populated shadow PTEs left in its leaf.
pagefault_implicit_mr() creates an interval_notifier that covers the level in the page table that needs population. Notice it just uses an unlocked xa_load to find the page table level.
The locking is pretty tricky as it relies on RCU, but the fault flow is fairly lightweight.
Jason
Thanks for the information, Jason.
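As a rough illustration of the flow described above, here is a toy user-space model. It is not the mlx5 code: the leaf size, the fixed array standing in for the xarray, and every name in it are assumptions made for the sketch, and it ignores the RCU locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_SHIFT 30   /* hypothetical: each leaf covers 1 GiB */
#define NUM_LEAVES 16   /* hypothetical size of the toy address space */

struct leaf {
	bool notifier_active;   /* stands in for a registered interval notifier */
	unsigned int populated; /* number of populated shadow PTEs in this leaf */
};

/* A plain array stands in for the xarray of page table levels. */
static struct leaf leaves[NUM_LEAVES];

/* Find the leaf covering addr, or NULL if addr is outside the toy space. */
static struct leaf *leaf_lookup(uint64_t addr)
{
	uint64_t idx = addr >> LEAF_SHIFT;

	return idx < NUM_LEAVES ? &leaves[idx] : NULL;
}

/* Fault path: create the notifier for the leaf on first use, then populate one PTE. */
static int toy_fault(uint64_t addr)
{
	struct leaf *l = leaf_lookup(addr);

	if (!l)
		return -1;
	if (!l->notifier_active)
		l->notifier_active = true;
	l->populated++;
	return 0;
}

/* Invalidate path: drop npages PTEs and release the notifier once none remain. */
static void toy_invalidate(uint64_t addr, unsigned int npages)
{
	struct leaf *l = leaf_lookup(addr);

	if (!l || !l->notifier_active)
		return;
	/* Invariant of this model: an active notifier has at least one PTE. */
	assert(l->populated > 0);
	l->populated = npages >= l->populated ? 0 : l->populated - npages;
	if (l->populated == 0)
		l->notifier_active = false;
}
```

The point of the model is only that notifier lifetime follows leaf population: the fault side creates it lazily, and the invalidate side tears it down when the leaf becomes empty.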
I'm still interested in finding a way to support range-based hints to device drivers. madvise() looks like it only sets a bit in vma->vm_flags or acts on the advice immediately. mbind() and set_mempolicy() only work with CPUs and memory that have a NUMA node number. What I'm looking for is a way for the device to know whether to migrate pages to device private memory on a fault, whether to duplicate read-only pages in device private memory, or whether to remote map/access a page instead of migrating it. For example, there is a working draft extension to OpenCL, https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_uni..., that could provide a way to specify this sort of advice. C++ is also looking at extensions for specifying affinity attributes. In any case, these are probably a long way from being finalized and implemented.
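To make the kind of hint concrete, here is a minimal user-space sketch of a per-range hint table that a device fault handler could consult. Nothing like this exists in the kernel today: the enum values, hint_set(), hint_lookup(), the fixed-size table, and the default of migrating are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical advice values matching the three cases described above. */
enum dev_hint {
	DEV_HINT_MIGRATE,       /* migrate the page to device private memory on fault */
	DEV_HINT_DUPLICATE_RO,  /* duplicate read-only pages in device private memory */
	DEV_HINT_REMOTE_MAP,    /* map and access the page remotely, do not migrate */
};

struct hint_range {
	uint64_t start;         /* inclusive */
	uint64_t end;           /* exclusive */
	enum dev_hint hint;
};

#define MAX_HINTS 8             /* hypothetical limit for the toy table */

static struct hint_range hints[MAX_HINTS];
static size_t nr_hints;

/* Record advice for [start, end). Returns 0 on success, -1 on a bad range or full table. */
static int hint_set(uint64_t start, uint64_t end, enum dev_hint hint)
{
	if (start >= end || nr_hints == MAX_HINTS)
		return -1;
	hints[nr_hints++] = (struct hint_range){ start, end, hint };
	return 0;
}

/* Fault-time lookup: the most recently set range covering addr wins. */
static enum dev_hint hint_lookup(uint64_t addr)
{
	for (size_t i = nr_hints; i-- > 0; )
		if (addr >= hints[i].start && addr < hints[i].end)
			return hints[i].hint;
	return DEV_HINT_MIGRATE; /* assumed default when no advice was given */
}
```

A real interface would need an interval tree rather than a linear scan, plus some way to tie the ranges to the mm and to a specific device, which is exactly the part that madvise() and mbind() do not cover.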
I also have some changes to support THP migration to device private memory, but that will require updating nouveau to use 2MB TLB mappings.
In the meantime, I can update the HMM self tests to do something like ODP without changing mm/mmu_notifier.c, but I don't think I can easily change nouveau to that model.