As the error capture will compress user buffers as directed to by the user, it can take an arbitrary amount of time and space. Break up the compression loops with a call to cond_resched(), that will allow other processes to schedule (avoiding the soft lockups) and also serve as a warning should we try to make this loop atomic in the future.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala mika.kuoppala@linux.intel.com --- drivers/gpu/drm/i915/i915_gpu_error.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 3e6cbb0d1150..a635ec8d0b94 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -311,6 +311,8 @@ static int compress_page(struct i915_vma_compress *c,
if (zlib_deflate(zstream, Z_NO_FLUSH) != Z_OK) return -EIO; + + cond_resched(); } while (zstream->avail_in);
/* Fallback to uncompressed if we increase size? */ @@ -397,6 +399,7 @@ static int compress_page(struct i915_vma_compress *c, if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE))) memcpy(ptr, src, PAGE_SIZE); dst->pages[dst->page_count++] = ptr; + cond_resched();
return 0; }
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.8.10, v5.4.66, v4.19.146, v4.14.198, v4.9.236, v4.4.236.
v5.8.10: Build OK! v5.4.66: Build OK! v4.19.146: Build OK! v4.14.198: Build OK! v4.9.236: Failed to apply! Possible dependencies: 0a97015d45ee ("drm/i915: Compress GPU objects in error state") 83bc0f5b432f ("drm/i915: Handle incomplete Z_FINISH for compressed error states") 95374d759ac7 ("drm/i915: Always use the GTT for error capture") 98a2f411671f ("drm/i915: Allow disabling error capture") 9f267eb8d2ea ("drm/i915: Stop the machine whilst capturing the GPU crash dump") d636951ec01b ("drm/i915: Cleanup instdone collection") fc4c79c37e82 ("drm/i915: Consolidate error object printing")
v4.4.236: Failed to apply! Possible dependencies: 0a97015d45ee ("drm/i915: Compress GPU objects in error state") 0bc40be85f33 ("drm/i915: Rename intel_engine_cs function parameters") 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd") 755412e29c77 ("drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER") 98a2f411671f ("drm/i915: Allow disabling error capture") ca82580c9cea ("drm/i915: Do not call API requiring struct_mutex where it is not available") cbdc12a9fc9d ("drm/i915: make A0 wa's applied to A1") e87a005d90c3 ("drm/i915: add helpers for platform specific revision id range checks") ef712bb4b700 ("drm/i915: remove parens around revision ids") fffda3f4fb49 ("drm/i915/bxt: add revision id for A1 stepping and use it")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
Hi!
As the error capture will compress user buffers as directed to by the user, it can take an arbitrary amount of time and space. Break up the compression loops with a call to cond_resched(), that will allow other processes to schedule (avoiding the soft lockups) and also serve as a warning should we try to make this loop atomic in the future.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala mika.kuoppala@linux.intel.com
This was queued for 4.19-stable, but is very likely wrong.
@@ -397,6 +399,7 @@ static int compress_page(struct i915_vma_compress *c, if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE))) memcpy(ptr, src, PAGE_SIZE); dst->pages[dst->page_count++] = ptr;
- cond_resched();
return 0; }
4.19 compress_page begins with
static int compress_page(struct compress *c, ... page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
and likely may not sleep. That changed with commit a42f45a2a85998453078, but that one is not present in 4.19..
I believe we don't need this in stable: dumping of error file will not take so long to trigger softlockup detectors... and if userland access blocked, we would be able to reschedule, anyway.
Best regards, Pavel
linux-stable-mirror@lists.linaro.org