On Tue, Apr 03, 2018 at 11:09:09AM +0200, Daniel Vetter wrote:
On Thu, Mar 29, 2018 at 01:34:24PM +0200, Christian König wrote:
On 29.03.2018 at 08:57, Daniel Vetter wrote:
On Sun, Mar 25, 2018 at 12:59:56PM +0200, Christian König wrote:
Add a peer2peer flag noting that the importer can deal with device resources which are not backed by pages.
Signed-off-by: Christian König <christian.koenig@amd.com>
Um, strictly speaking they all should, but ttm never bothered to use the real interfaces and just hacked around the provided sg list: grabbing the underlying struct pages, then rebuilding and remapping the sg list again.
Actually that isn't correct. TTM converts them to a dma address array because drivers need it like this (at least nouveau, radeon and amdgpu).
I've fixed radeon and amdgpu so they can cope without it, and I mailed Ben about nouveau, but the outcome is that they don't really know.
TTM itself doesn't have any need for the pages on imported BOs (you can't mmap them anyway); the real underlying problem is that sg tables don't provide what drivers need.
I think we could rather easily fix sg tables, but that is a totally separate task.
Looking at patch 8, the sg table seems perfectly sufficient to convey the right dma addresses to the importer. Of course the exporter has to set up the right kind of iommu mappings to make this work.
The entire point of using sg lists was exactly to allow this use case of peer2peer dma (or, well, in general to have special exporters which manage memory/IO ranges not backed by struct page). So essentially you're having an "I'm totally not broken" flag here.
No, independent of needed struct page pointers we need to note if the exporter can handle peer2peer stuff from the hardware side in general.
So what I did is just set peer2peer allowed on the importer because of the driver's needs, and clear it in the exporter if the hardware can't handle that.
The only thing the importer seems to do is call pci_peer_traffic_supported, which the exporter could call too. What am I missing (since the struct page stuff sounds like it's fixed already by you)?
-Daniel
AFAIK Logan's patchset requires registering and initializing struct page for the device memory you want to map (export, from the exporter's point of view).
With GPUs this isn't something we want. struct page is >~= 2^6 bytes, so with 4KB (2^12) pages:
  4GB GPU  = 2^6*2^32/2^12 = 2^26 = 64MB of RAM
  8GB GPU  = 2^6*2^33/2^12 = 2^27 = 128MB of RAM
  16GB GPU = 2^6*2^34/2^12 = 2^28 = 256MB of RAM
  32GB GPU = 2^6*2^35/2^12 = 2^29 = 512MB of RAM
All this is mostly wasted, as only a small subset (that cannot be constrained to a specific range) will ever be exported at any point in time. For GPU workloads this is hardly justifiable; even for HMM I do not plan to register all those pages.
Hence why I argue that dma_map_resource(), as used by Christian, is good enough for us. People who care about SG can fix that, but I'd rather not have to depend on it and waste system memory.
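For anyone who wants to check the overhead figures, here is a minimal sketch. It assumes 4KB pages and a 64-byte struct page (the real size depends on arch and kernel config, and 64 bytes is the lower bound used in the estimate above). The helper name struct_page_overhead is made up for illustration and is not a kernel interface.

```python
# Rough estimate of system RAM consumed by struct page entries
# if all of a GPU's memory were registered with the kernel.
# Assumptions (not taken from any kernel header): 4 KiB pages, 64-byte struct page.
PAGE_SIZE = 1 << 12         # 4 KiB
STRUCT_PAGE_SIZE = 1 << 6   # 64 bytes, treated as a lower bound
GIB = 1 << 30
MIB = 1 << 20


def struct_page_overhead(vram_bytes, page_size=PAGE_SIZE,
                         struct_page_size=STRUCT_PAGE_SIZE):
    """Return bytes of system RAM needed for one struct page per device page."""
    return (vram_bytes // page_size) * struct_page_size


for vram_gib in (4, 8, 16, 32):
    overhead = struct_page_overhead(vram_gib * GIB)
    print(f"{vram_gib}GB GPU -> {overhead // MIB}MB of RAM")
```

The overhead scales linearly with VRAM size, at 1/64 of the device memory under these assumptions, regardless of how little of it is exported at any given time.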
Cheers, Jérôme