On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:
On 2021-03-29 21:36 +0200, Christian König wrote:
Am 29.03.21 um 21:27 schrieb Xi Ruoyao:
Hi Christian,
I don't think there is any constraint implemented to ensure `num_entries % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`. For example, in `amdgpu_vm_bo_map()`:
/* validate the parameters */ if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK
size == 0 || size & AMDGPU_GPU_PAGE_MASK) return -EINVAL;
/* snip */
saddr /= AMDGPU_GPU_PAGE_SIZE; eaddr /= AMDGPU_GPU_PAGE_SIZE;
/* snip */
mapping->start = saddr; mapping->last = eaddr;
If we really want to ensure (mapping->last - mapping->start + 1) % AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace "AMDGPU_GPU_PAGE_MASK" in "validate the parameters" with "PAGE_MASK".
It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of "AMDGPU_GPU_PAGE_MASK" :(.
Yeah, good point.
I tried it and it broke userspace: Xorg startup fails with EINVAL with this change.
Well in theory it is possible that we always fill the GPUVM on a 4k basis while the native page size of the CPU is larger. Let me double check the code.
On my platform, there are two issues both causing the VM corruption. One is fixed by agd5f/linux@fe001e7. Another is in Mesa from userspace: it uses `dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver expects it to use `dev_info->virtual_address_alignment`.
If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue and make virtual_address_alignment = 4K. Otherwise, we should fortify the parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK". Then the userspace will just get an EINVAL, instead of a slient VM corruption. And someone should tell Mesa developers to fix the code in this case. -- Xi Ruoyao xry111@mengyan1223.wang School of Aerospace Science and Technology, Xidian University