On error, restore anything still on the repin_list back to the invalidated list. For the actual pin, so long as the vma is tracked on either list it should get picked up on the next pin. However, it looks possible for the vma to get destroyed while still present on this per-vm repin_list, leading to corruption. An alternative might be to instead just remove the link when destroying the vma.
Fixes: ed2bdf3b264d ("drm/xe/vm: Subclass userptr vmas")
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_vm.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d664f2e418b2..668b0bde7822 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -670,12 +670,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 	list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
 				 userptr.invalidate_link) {
 		list_del_init(&uvma->userptr.invalidate_link);
-		list_move_tail(&uvma->userptr.repin_link,
-			       &vm->userptr.repin_list);
+		list_add_tail(&uvma->userptr.repin_link,
+			      &vm->userptr.repin_list);
 	}
 	spin_unlock(&vm->userptr.invalidated_lock);
 
-	/* Pin and move to temporary list */
+	/* Pin and move to bind list */
 	list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
 				 userptr.repin_link) {
 		err = xe_vma_userptr_pin_pages(uvma);
@@ -691,10 +691,10 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 			err = xe_vm_invalidate_vma(&uvma->vma);
 			xe_vm_unlock(vm);
 			if (err)
-				return err;
+				break;
 		} else {
-			if (err < 0)
-				return err;
+			if (err)
+				break;
 
 			list_del_init(&uvma->userptr.repin_link);
 			list_move_tail(&uvma->vma.combined_links.rebind,
@@ -702,7 +702,19 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 		}
 	}
 
-	return 0;
+	if (err) {
+		down_write(&vm->userptr.notifier_lock);
+		spin_lock(&vm->userptr.invalidated_lock);
+		list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
+					 userptr.repin_link) {
+			list_del_init(&uvma->userptr.repin_link);
+			list_move_tail(&uvma->userptr.invalidate_link,
+				       &vm->userptr.invalidated);
+		}
+		spin_unlock(&vm->userptr.invalidated_lock);
+		up_write(&vm->userptr.notifier_lock);
+	}
+	return err;
 }
 
 /**
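The error-unwind idea in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `toy_*` names, the canned `pin_result` field and the hand-rolled list helpers are hypothetical stand-ins, there is no locking, and the special -EFAULT handling of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel list helpers (illustrative only). */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_del_init(e);
	list_add_tail(e, head);
}

#define toy_entry(ptr, member) \
	((struct toy_vma *)((char *)(ptr) - offsetof(struct toy_vma, member)))

/* Hypothetical, simplified userptr vma: three links plus a canned pin result. */
struct toy_vma {
	struct list_head invalidate_link, repin_link, rebind_link;
	int pin_result; /* 0 on success, negative errno on failure */
};

struct toy_vm { struct list_head invalidated, repin_list, rebind_list; };

static void toy_vm_init(struct toy_vm *vm)
{
	list_init(&vm->invalidated);
	list_init(&vm->repin_list);
	list_init(&vm->rebind_list);
}

static void toy_vma_init(struct toy_vma *v, int pin_result)
{
	list_init(&v->invalidate_link);
	list_init(&v->repin_link);
	list_init(&v->rebind_link);
	v->pin_result = pin_result;
}

static int toy_userptr_pin(struct toy_vm *vm)
{
	struct list_head *pos, *next;
	int err = 0;

	/* Collect everything that was invalidated onto the repin list. */
	for (pos = vm->invalidated.next; pos != &vm->invalidated; pos = next) {
		struct toy_vma *v = toy_entry(pos, invalidate_link);

		next = pos->next;
		list_del_init(&v->invalidate_link);
		assert(list_empty(&v->repin_link));
		list_add_tail(&v->repin_link, &vm->repin_list);
	}

	/* Pin each entry; stop at the first failure instead of returning. */
	for (pos = vm->repin_list.next; pos != &vm->repin_list; pos = next) {
		struct toy_vma *v = toy_entry(pos, repin_link);

		next = pos->next;
		err = v->pin_result;
		if (err)
			break;
		list_del_init(&v->repin_link);
		list_move_tail(&v->rebind_link, &vm->rebind_list);
	}

	/* On error, hand whatever is left back to the invalidated list. */
	if (err) {
		for (pos = vm->repin_list.next; pos != &vm->repin_list; pos = next) {
			struct toy_vma *v = toy_entry(pos, repin_link);

			next = pos->next;
			list_del_init(&v->repin_link);
			list_move_tail(&v->invalidate_link, &vm->invalidated);
		}
	}
	return err;
}
```

The point of the sketch is the invariant after a failed call: the repin list is empty again, so no vma is left dangling on it if it is destroyed before the next pin attempt.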
Currently we treat EFAULT from hmm_range_fault() as a non-fatal error when called from xe_vm_userptr_pin() with the idea that we want to avoid killing the entire vm and chucking an error, under the assumption that the user just did an unmap or something, and has no intention of actually touching that memory from the GPU. At this point we have already zapped the PTEs so any access should generate a page fault, and if the pin fails there also it will then become fatal.
However, it looks like the userptr vma can still be on the rebind list in preempt_rebind_work_func(): if the pin had to be retried because of something happening in the caller before the rebind step, the userptr may in the meantime need to be re-validated, and this time hit the EFAULT.
This might explain an internal user report of hitting:
[ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738690] Call Trace:
[ 191.738692] <TASK>
[ 191.738694] ? show_regs+0x69/0x80
[ 191.738698] ? __warn+0x93/0x1a0
[ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738759] ? report_bug+0x18f/0x1a0
[ 191.738764] ? handle_bug+0x63/0xa0
[ 191.738767] ? exc_invalid_op+0x19/0x70
[ 191.738770] ? asm_exc_invalid_op+0x1b/0x20
[ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738834] ? ret_from_fork_asm+0x1a/0x30
[ 191.738849] bind_op_prepare+0x105/0x7b0 [xe]
[ 191.738906] ? dma_resv_reserve_fences+0x301/0x380
[ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[ 191.738966] ? kmemleak_alloc+0x4b/0x80
[ 191.738973] ops_execute+0x188/0x9d0 [xe]
[ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe]
[ 191.739098] ? trace_hardirqs_on+0x4d/0x60
[ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe]
Followed by NPD, when running some workload, since the sg was never actually populated but the vma is still marked for rebind when it should be skipped for this special EFAULT case. And from the logs it does seem like we hit this special EFAULT case before the explosions.
v2 (MattB):
  - Move earlier

Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
---
 drivers/gpu/drm/xe/xe_vm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 668b0bde7822..f36e2cc1d155 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -681,6 +681,18 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 		err = xe_vma_userptr_pin_pages(uvma);
 		if (err == -EFAULT) {
 			list_del_init(&uvma->userptr.repin_link);
+			/*
+			 * We might have already done the pin once already, but
+			 * then had to retry before the re-bind happened, due
+			 * some other condition in the caller, but in the
+			 * meantime the userptr got dinged by the notifier such
+			 * that we need to revalidate here, but this time we hit
+			 * the EFAULT. In such a case make sure we remove
+			 * ourselves from the rebind list to avoid going down in
+			 * flames.
+			 */
+			if (!list_empty(&uvma->vma.combined_links.rebind))
+				list_del_init(&uvma->vma.combined_links.rebind);
 
 			/* Wait for pending binds */
 			xe_vm_lock(vm, false);
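The retry scenario described in the code comment can be replayed with a small userspace sketch. Everything here is an assumption made for illustration: `toy_vma`, `toy_handle_pin_result()` and the list helpers are hypothetical, the pin result is passed in rather than computed, and the invalidate step that the real code performs after an -EFAULT (which can itself fail) is not modelled.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-ins for the kernel list helpers (illustrative only). */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Hypothetical, simplified userptr vma with only the two relevant links. */
struct toy_vma { struct list_head repin_link, rebind_link; };

/*
 * Act on the result of one pin attempt. pin_err is the (hypothetical)
 * return value of the pin: 0, -EFAULT, or another negative errno.
 */
static int toy_handle_pin_result(struct toy_vma *v,
				 struct list_head *rebind_list, int pin_err)
{
	if (pin_err == -EFAULT) {
		list_del_init(&v->repin_link);
		/* An earlier, successful pin may have queued us for rebind. */
		if (!list_empty(&v->rebind_link))
			list_del_init(&v->rebind_link);
		return 0; /* non-fatal here: the vma is simply skipped */
	}
	if (pin_err)
		return pin_err;

	/* Success: leave the repin list and (re)queue for rebind. */
	list_del_init(&v->repin_link);
	list_del_init(&v->rebind_link);
	list_add_tail(&v->rebind_link, rebind_list);
	return 0;
}
```

Calling the helper once with 0 and then, after re-queuing the vma for repin, with -EFAULT reproduces the sequence from the comment: without the extra `list_del_init()` the vma would still sit on the rebind list even though nothing was pinned.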
On Fri, Feb 14, 2025 at 05:05:29PM +0000, Matthew Auld wrote:
> Followed by NPD, when running some workload, since the sg was never actually populated but the vma is still marked for rebind when it should be skipped for this special EFAULT case. And from the logs it does seem like we hit this special EFAULT case before the explosions.

It would be nice to verify if this fixes the bug report.

> v2 (MattB):
>   - Move earlier
>
> Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Anyways, LGTM:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.10+
On 15/02/2025 01:23, Matthew Brost wrote:
> On Fri, Feb 14, 2025 at 05:05:29PM +0000, Matthew Auld wrote:
> It would be nice to verify if this fixes the bug report.

Yes, reporter said it fixes it. Or at least the previous version did. See GSD-10562 if you are curious. Will re-phrase the commit message to make that clear.

> > v2 (MattB):
> >   - Move earlier
> >
> > Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
> > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
>
> Anyways, LGTM:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

Thanks.

> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v6.10+
On Fri, Feb 14, 2025 at 05:05:28PM +0000, Matthew Auld wrote:
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index d664f2e418b2..668b0bde7822 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -670,12 +670,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>  	list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
>  				 userptr.invalidate_link) {
>  		list_del_init(&uvma->userptr.invalidate_link);
> -		list_move_tail(&uvma->userptr.repin_link,
> -			       &vm->userptr.repin_list);
> +		list_add_tail(&uvma->userptr.repin_link,
> +			      &vm->userptr.repin_list);

Why this change?

>  	}
>  	spin_unlock(&vm->userptr.invalidated_lock);
>
> -	/* Pin and move to temporary list */
> +	/* Pin and move to bind list */
>  	list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
>  				 userptr.repin_link) {
>  		err = xe_vma_userptr_pin_pages(uvma);
> @@ -691,10 +691,10 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>  			err = xe_vm_invalidate_vma(&uvma->vma);
>  			xe_vm_unlock(vm);
>  			if (err)
> -				return err;
> +				break;
>  		} else {
> -			if (err < 0)
> -				return err;
> +			if (err)
> +				break;
>
>  			list_del_init(&uvma->userptr.repin_link);
>  			list_move_tail(&uvma->vma.combined_links.rebind,
> @@ -702,7 +702,19 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
>  		}
>  	}
>
> -	return 0;
> +	if (err) {
> +		down_write(&vm->userptr.notifier_lock);

Can you explain why you take the notifier lock here? I don't think this is required unless I'm missing something.

Matt

> +		spin_lock(&vm->userptr.invalidated_lock);
> +		list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> +					 userptr.repin_link) {
> +			list_del_init(&uvma->userptr.repin_link);
> +			list_move_tail(&uvma->userptr.invalidate_link,
> +				       &vm->userptr.invalidated);
> +		}
> +		spin_unlock(&vm->userptr.invalidated_lock);
> +		up_write(&vm->userptr.notifier_lock);
> +	}
> +	return err;
>  }
>
>  /**
> --
> 2.48.1
On 15/02/2025 01:28, Matthew Brost wrote:
> On Fri, Feb 14, 2025 at 05:05:28PM +0000, Matthew Auld wrote:
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index d664f2e418b2..668b0bde7822 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -670,12 +670,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> >  	list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
> >  				 userptr.invalidate_link) {
> >  		list_del_init(&uvma->userptr.invalidate_link);
> > -		list_move_tail(&uvma->userptr.repin_link,
> > -			       &vm->userptr.repin_list);
> > +		list_add_tail(&uvma->userptr.repin_link,
> > +			      &vm->userptr.repin_list);
>
> Why this change?

Just that with this patch the repin_link should now always be empty at this point, I think. add should complain if that is not the case.

> >  	}
> >  	spin_unlock(&vm->userptr.invalidated_lock);
> >
> > -	/* Pin and move to temporary list */
> > +	/* Pin and move to bind list */
> >  	list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> >  				 userptr.repin_link) {
> >  		err = xe_vma_userptr_pin_pages(uvma);
> > @@ -691,10 +691,10 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> >  			err = xe_vm_invalidate_vma(&uvma->vma);
> >  			xe_vm_unlock(vm);
> >  			if (err)
> > -				return err;
> > +				break;
> >  		} else {
> > -			if (err < 0)
> > -				return err;
> > +			if (err)
> > +				break;
> >
> >  			list_del_init(&uvma->userptr.repin_link);
> >  			list_move_tail(&uvma->vma.combined_links.rebind,
> > @@ -702,7 +702,19 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> >  		}
> >  	}
> >
> > -	return 0;
> > +	if (err) {
> > +		down_write(&vm->userptr.notifier_lock);
>
> Can you explain why you take the notifier lock here? I don't think this is required unless I'm missing something.

For the invalidated list, the docs say:

"Removing items from the list additionally requires @lock in write mode, and adding items to the list requires the @userptr.notifer_lock in write mode."

Not sure if the docs need to be updated here?

> Matt
>
> > +		spin_lock(&vm->userptr.invalidated_lock);
> > +		list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> > +					 userptr.repin_link) {
> > +			list_del_init(&uvma->userptr.repin_link);
> > +			list_move_tail(&uvma->userptr.invalidate_link,
> > +				       &vm->userptr.invalidated);
> > +		}
> > +		spin_unlock(&vm->userptr.invalidated_lock);
> > +		up_write(&vm->userptr.notifier_lock);
> > +	}
> > +	return err;
> >  }
> >
> >  /**
> > --
> > 2.48.1
On Mon, Feb 17, 2025 at 09:38:26AM +0000, Matthew Auld wrote:
> On 15/02/2025 01:28, Matthew Brost wrote:
> > On Fri, Feb 14, 2025 at 05:05:28PM +0000, Matthew Auld wrote:
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index d664f2e418b2..668b0bde7822 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -670,12 +670,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> > >  	list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
> > >  				 userptr.invalidate_link) {
> > >  		list_del_init(&uvma->userptr.invalidate_link);
> > > -		list_move_tail(&uvma->userptr.repin_link,
> > > -			       &vm->userptr.repin_list);
> > > +		list_add_tail(&uvma->userptr.repin_link,
> > > +			      &vm->userptr.repin_list);
> >
> > Why this change?
>
> Just that with this patch the repin_link should now always be empty at this point, I think. add should complain if that is not the case.

If it is always expected to be empty, then yea maybe add an xe_assert for this as the list management is pretty tricky.

> > >  	}
> > >  	spin_unlock(&vm->userptr.invalidated_lock);
> > >
> > > -	/* Pin and move to temporary list */
> > > +	/* Pin and move to bind list */
> > >  	list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> > >  				 userptr.repin_link) {
> > >  		err = xe_vma_userptr_pin_pages(uvma);
> > > @@ -691,10 +691,10 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> > >  			err = xe_vm_invalidate_vma(&uvma->vma);
> > >  			xe_vm_unlock(vm);
> > >  			if (err)
> > > -				return err;
> > > +				break;
> > >  		} else {
> > > -			if (err < 0)
> > > -				return err;
> > > +			if (err)
> > > +				break;
> > >
> > >  			list_del_init(&uvma->userptr.repin_link);
> > >  			list_move_tail(&uvma->vma.combined_links.rebind,
> > > @@ -702,7 +702,19 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
> > >  		}
> > >  	}
> > >
> > > -	return 0;
> > > +	if (err) {
> > > +		down_write(&vm->userptr.notifier_lock);
> >
> > Can you explain why you take the notifier lock here? I don't think this is required unless I'm missing something.
>
> For the invalidated list, the docs say:
>
> "Removing items from the list additionally requires @lock in write mode, and adding items to the list requires the @userptr.notifer_lock in write mode."
>
> Not sure if the docs need to be updated here?

Oh. I believe the part of the comment saying 'adding items to the list requires the @userptr.notifer_lock in write mode' really means something like this:

'When adding to @vm->userptr.invalidated in the notifier the @userptr.notifer_lock in write mode protects against concurrent VM binds from setting up newly invalidated pages.'

So with the above, and since this code path is in the VM bind path (i.e. we are not racing with other binds), I think the vm->userptr.invalidated_lock is sufficient. Maybe ask Thomas if he agrees here.

Matt

> > Matt
> >
> > > +		spin_lock(&vm->userptr.invalidated_lock);
> > > +		list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
> > > +					 userptr.repin_link) {
> > > +			list_del_init(&uvma->userptr.repin_link);
> > > +			list_move_tail(&uvma->userptr.invalidate_link,
> > > +				       &vm->userptr.invalidated);
> > > +		}
> > > +		spin_unlock(&vm->userptr.invalidated_lock);
> > > +		up_write(&vm->userptr.notifier_lock);
> > > +	}
> > > +	return err;
> > >  }
> > >
> > >  /**
> > > --
> > > 2.48.1
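The assertion suggested in the review could look roughly like the userspace sketch below. It is not the driver code: `toy_vma`, `toy_queue_repin()` and the list helpers are hypothetical stand-ins, and plain `assert()` is used where the driver would presumably use xe_assert(). The check matters because, unlike a move, a plain tail-add does not unlink the node first, so adding a node that is still on another list leaves that list's neighbours pointing at it.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel list helpers (illustrative only). */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
	e->prev = head->prev;
	e->next = head;
	head->prev->next = e;
	head->prev = e;
}

/* Count the entries on a list, not including the head itself. */
static int list_count(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical, simplified userptr vma with only the repin link. */
struct toy_vma { struct list_head repin_link; };

/* Queue a vma for repin, insisting that its link is not on any list yet. */
static void toy_queue_repin(struct toy_vma *v, struct list_head *repin_list)
{
	assert(list_empty(&v->repin_link)); /* stand-in for an xe_assert() */
	list_add_tail(&v->repin_link, repin_list);
}
```

With the error path always draining the repin list, every vma should reach this helper with an empty link, so the assertion documents the invariant rather than handling an expected case.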
linux-stable-mirror@lists.linaro.org