[PATCH 1/9] drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memory


Alex Deucher

7 Jul 2023

3:07 p.m.

From: Christian König christian.koenig@amd.com

We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.

Signed-off-by: Christian König christian.koenig@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Acked-by: Guchun Chen guchun.chen@amd.com
Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
Cc: stable@vger.kernel.org # 6.3.x
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 +++++++++++++++-----------
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5b3a70becbdf..a252a206f37b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -920,42 +920,51 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	return r;
 }

+static void amdgpu_vm_bo_get_memory(struct amdgpu_bo_va *bo_va,
+				    struct amdgpu_mem_stats *stats)
+{
+	struct amdgpu_vm *vm = bo_va->base.vm;
+	struct amdgpu_bo *bo = bo_va->base.bo;
+
+	if (!bo)
+		return;
+
+	/*
+	 * For now ignore BOs which are currently locked and potentially
+	 * changing their location.
+	 */
+	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv &&
+	    !dma_resv_trylock(bo->tbo.base.resv))
+		return;
+
+	amdgpu_bo_get_memory(bo, stats);
+	if (bo->tbo.base.resv != vm->root.bo->tbo.base.resv)
+		dma_resv_unlock(bo->tbo.base.resv);
+}
+
 void amdgpu_vm_get_memory(struct amdgpu_vm *vm,
 			  struct amdgpu_mem_stats *stats)
 {
 	struct amdgpu_bo_va *bo_va, *tmp;

 	spin_lock(&vm->status_lock);
-	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
-	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
-	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
-	list_for_each_entry_safe(bo_va, tmp, &vm->moved, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
-	list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
-	list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status) {
-		if (!bo_va->base.bo)
-			continue;
-		amdgpu_bo_get_memory(bo_va->base.bo, stats);
-	}
+	list_for_each_entry_safe(bo_va, tmp, &vm->idle, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
+	list_for_each_entry_safe(bo_va, tmp, &vm->evicted, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
+	list_for_each_entry_safe(bo_va, tmp, &vm->relocated, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
+	list_for_each_entry_safe(bo_va, tmp, &vm->moved, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
+	list_for_each_entry_safe(bo_va, tmp, &vm->invalidated, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
+
+	list_for_each_entry_safe(bo_va, tmp, &vm->done, base.vm_status)
+		amdgpu_vm_bo_get_memory(bo_va, stats);
 	spin_unlock(&vm->status_lock);
 }

-- 2.41.0
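
Editorial illustration, not part of the patch: the "try to lock, otherwise skip" idea from patch 1 can be sketched in plain userspace C. Everything prefixed `demo_` is hypothetical; a C11 `atomic_flag` stands in for the reservation lock, the sketch assumes the caller already holds the shared root lock, and the `skipped` counter is an addition for demonstration that the patch does not have.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reservation lock (dma_resv in the patch). */
struct demo_lock {
	atomic_flag held;
};

static bool demo_trylock(struct demo_lock *lock)
{
	/* test_and_set returns the previous value: false means we got the lock. */
	return !atomic_flag_test_and_set_explicit(&lock->held,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_lock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Hypothetical stand-ins for amdgpu_bo and amdgpu_mem_stats. */
struct demo_bo {
	struct demo_lock *resv;
	uint64_t size;
};

struct demo_stats {
	uint64_t bytes;
	unsigned int skipped;	/* not in the patch, added for demonstration */
};

/*
 * Account one BO. A BO that shares the root lock is read directly
 * (assumption: the caller holds that lock). Any other BO is only read
 * if its own lock can be taken without blocking; otherwise it is skipped.
 */
static void demo_bo_get_memory(struct demo_bo *bo, struct demo_lock *root_resv,
			       struct demo_stats *stats)
{
	bool own_lock;

	if (!bo)
		return;

	own_lock = bo->resv != root_resv;
	if (own_lock && !demo_trylock(bo->resv)) {
		stats->skipped++;
		return;
	}

	stats->bytes += bo->size;

	if (own_lock)
		demo_unlock(bo->resv);
}
```

Skipping a contended BO trades accuracy of the statistics for never blocking while the caller is inside a spinlock-protected list walk, which appears to be the trade-off the patch comment describes.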


Alex Deucher

7 Jul

3:07 p.m.

New subject: [PATCH 2/9] drm/amdgpu: make sure that BOs have a backing store

From: Christian König christian.koenig@amd.com

It's perfectly possible that the BO is about to be destroyed and doesn't have a backing store associated with it.

Signed-off-by: Christian König christian.koenig@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Acked-by: Guchun Chen guchun.chen@amd.com
Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit ca0b954a4315ca2228001c439ae1062561c81989)
Cc: stable@vger.kernel.org # 6.3.x
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index a70103ac0026..46557bbbc18a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1266,8 +1266,12 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
 void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
 			  struct amdgpu_mem_stats *stats)
 {
-	unsigned int domain;
 	uint64_t size = amdgpu_bo_size(bo);
+	unsigned int domain;
+
+	/* Abort if the BO doesn't currently have a backing store */
+	if (!bo->tbo.resource)
+		return;

 	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
 	switch (domain) {

-- 2.41.0

Alex Deucher

3:07 p.m.

New subject: [PATCH 3/9] drm/amdgpu: Skip mark offset for high priority rings

From: Jiadong Zhu Jiadong.Zhu@amd.com

Only low-priority rings use chunks to save the offset, so skip the mark-offset calls for high-priority rings.

Signed-off-by: Jiadong Zhu Jiadong.Zhu@amd.com
Acked-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit ef3c36a6e025e9b16ca3321479ba016841fa17a0)
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 73516abef662..b779ee4bbaa7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -423,6 +423,9 @@ void amdgpu_sw_ring_ib_mark_offset(struct amdgpu_ring *ring, enum amdgpu_ring_mu
 	struct amdgpu_ring_mux *mux = &adev->gfx.muxer;
 	unsigned offset;

+	if (ring->hw_prio > AMDGPU_RING_PRIO_DEFAULT)
+		return;
+
 	offset = ring->wptr & ring->buf_mask;

 	amdgpu_ring_mux_ib_mark_offset(mux, ring, offset, type);

-- 2.41.0

Alex Deucher

3:07 p.m.

New subject: [PATCH 4/9] drm/amd/pm: revise the ASPM settings for thunderbolt attached scenario

From: Evan Quan evan.quan@amd.com

Also, correct the comment for NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT as 0x0000000E stands for 400ms instead of 4ms.

Signed-off-by: Evan Quan evan.quan@amd.com
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit fd21987274463a439c074b8f3c93d3b132e4c031)
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
index aa761ff3a5fa..7ba47fc1917b 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c
@@ -346,7 +346,7 @@ static void nbio_v2_3_init_registers(struct amdgpu_device *adev)

 #define NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT		0x00000000 // off by default, no gains over L1
 #define NAVI10_PCIE__LC_L1_INACTIVITY_DEFAULT		0x00000009 // 1=1us, 9=1ms
-#define NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT	0x0000000E // 4ms
+#define NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT	0x0000000E // 400ms

 static void nbio_v2_3_enable_aspm(struct amdgpu_device *adev,
 				  bool enable)
@@ -479,9 +479,12 @@ static void nbio_v2_3_program_aspm(struct amdgpu_device *adev)
 		WREG32_SOC15(NBIO, 0, mmRCC_BIF_STRAP5, data);

 	def = data = RREG32_PCIE(smnPCIE_LC_CNTL);
-	data &= ~PCIE_LC_CNTL__LC_L0S_INACTIVITY_MASK;
-	data |= 0x9 << PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
-	data |= 0x1 << PCIE_LC_CNTL__LC_PMI_TO_L1_DIS__SHIFT;
+	data |= NAVI10_PCIE__LC_L0S_INACTIVITY_DEFAULT << PCIE_LC_CNTL__LC_L0S_INACTIVITY__SHIFT;
+	if (pci_is_thunderbolt_attached(adev->pdev))
+		data |= NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT << PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
+	else
+		data |= NAVI10_PCIE__LC_L1_INACTIVITY_DEFAULT << PCIE_LC_CNTL__LC_L1_INACTIVITY__SHIFT;
+	data &= ~PCIE_LC_CNTL__LC_PMI_TO_L1_DIS_MASK;
 	if (def != data)
 		WREG32_PCIE(smnPCIE_LC_CNTL, data);

-- 2.41.0
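
Editorial illustration, not part of the patch: a minimal standalone sketch of picking the L1 inactivity value by attachment type and writing it into a register field, as patch 4 does. The two values come from the patch's defines; the `DEMO_*` shift and mask are hypothetical placeholders and do not reflect the real `PCIE_LC_CNTL` layout. Unlike the patch, which only ORs the value in, this sketch clears the field first so stale bits cannot survive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the defines in the patch. */
#define DEMO_L1_INACTIVITY_DEFAULT	0x9u	/* 1ms per the patch comment */
#define DEMO_L1_INACTIVITY_TBT_DEFAULT	0xEu	/* 400ms per the patch comment */

/* Hypothetical field position, chosen only for this illustration. */
#define DEMO_L1_INACTIVITY_SHIFT	12
#define DEMO_L1_INACTIVITY_MASK		(0xFu << DEMO_L1_INACTIVITY_SHIFT)

/* Return the register value with the L1 inactivity field reprogrammed. */
static uint32_t demo_program_l1_inactivity(uint32_t reg, bool thunderbolt)
{
	uint32_t val = thunderbolt ? DEMO_L1_INACTIVITY_TBT_DEFAULT
				   : DEMO_L1_INACTIVITY_DEFAULT;

	reg &= ~DEMO_L1_INACTIVITY_MASK;	/* clear the old field value */
	reg |= val << DEMO_L1_INACTIVITY_SHIFT;	/* write the new one */
	return reg;
}
```

As in the patch, a caller would compare the result with the value it read and only write the register back when the two differ.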

Alex Deucher

3:07 p.m.

New subject: [PATCH 5/9] drm/amdgpu/sdma4: set align mask to 255

The wptr needs to be incremented at intervals of at least 64 dwords; use 256 to align with Windows. This should fix potential hangs with unaligned updates.

Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com
Reviewed-by: Aaron Liu aaron.liu@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit e5df16d9428f5c6d2d0b1eff244d6c330ba9ef3a)
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 9295ac7edd56..d35c8a33d06d 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -2306,7 +2306,7 @@ const struct amd_ip_funcs sdma_v4_0_ip_funcs = {

 static const struct amdgpu_ring_funcs sdma_v4_0_ring_funcs = {
 	.type = AMDGPU_RING_TYPE_SDMA,
-	.align_mask = 0xf,
+	.align_mask = 0xff,
 	.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
 	.support_64bit_ptrs = true,
 	.secure_submission_supported = true,
@@ -2338,7 +2338,7 @@ static const struct amdgpu_ring_funcs sdma_v4_0_ring_funcs = {

 static const struct amdgpu_ring_funcs sdma_v4_0_page_ring_funcs = {
 	.type = AMDGPU_RING_TYPE_SDMA,
-	.align_mask = 0xf,
+	.align_mask = 0xff,
 	.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
 	.support_64bit_ptrs = true,
 	.secure_submission_supported = true,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 64dcaa2670dd..ac7aa8631f6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1740,7 +1740,7 @@ const struct amd_ip_funcs sdma_v4_4_2_ip_funcs = {

 static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {
 	.type = AMDGPU_RING_TYPE_SDMA,
-	.align_mask = 0xf,
+	.align_mask = 0xff,
 	.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
 	.support_64bit_ptrs = true,
 	.get_rptr = sdma_v4_4_2_ring_get_rptr,
@@ -1771,7 +1771,7 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = {

 static const struct amdgpu_ring_funcs sdma_v4_4_2_page_ring_funcs = {
 	.type = AMDGPU_RING_TYPE_SDMA,
-	.align_mask = 0xf,
+	.align_mask = 0xff,
 	.nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP),
 	.support_64bit_ptrs = true,
 	.get_rptr = sdma_v4_4_2_ring_get_rptr,

-- 2.41.0
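
Editorial illustration, not part of the patch: the effect of changing `align_mask` from 0xf to 0xff can be worked through with a small standalone sketch. It assumes the mask is used to pad the write pointer up to the next multiple of `align_mask + 1` dwords before it is committed; the `demo_` helpers are hypothetical and are not the driver's functions.

```c
#include <stdint.h>

/*
 * Number of padding (NOP) dwords needed to bring wptr up to the next
 * multiple of (align_mask + 1). align_mask must be a power of two minus 1.
 */
static uint32_t demo_pad_count(uint64_t wptr, uint32_t align_mask)
{
	return (uint32_t)((align_mask + 1 - (wptr & align_mask)) & align_mask);
}

/* Write pointer after padding, always a multiple of (align_mask + 1). */
static uint64_t demo_align_wptr(uint64_t wptr, uint32_t align_mask)
{
	return wptr + demo_pad_count(wptr, align_mask);
}
```

For a write pointer at dword 10, the old mask 0xf pads by 6 dwords to reach 16, which is below the 64-dword minimum the commit message mentions. The new mask 0xff pads by 246 dwords to reach 256, so every committed value lands on a 256-dword boundary.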

Alex Deucher

3:07 p.m.

New subject: [PATCH 6/9] drm/amd/pm: add abnormal fan detection for smu 13.0.0

From: Kenneth Feng kenneth.feng@amd.com

Add abnormal fan detection for SMU 13.0.0.

Signed-off-by: Kenneth Feng kenneth.feng@amd.com
Reviewed-by: Evan Quan evan.quan@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 2da0036ea99bccb27f7fe3cf2aa2900860e9be46)
Cc: stable@vger.kernel.org # 6.1.x
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index 08577d1b84ec..c42c0c1446f4 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -1300,6 +1300,7 @@ static int smu_v13_0_0_get_thermal_temperature_range(struct smu_context *smu,
 	range->mem_emergency_max = (pptable->SkuTable.TemperatureLimit[TEMP_MEM] + CTF_OFFSET_MEM)*
 		SMU_TEMPERATURE_UNITS_PER_CENTIGRADES;
 	range->software_shutdown_temp = powerplay_table->software_shutdown_temp;
+	range->software_shutdown_temp_offset = pptable->SkuTable.FanAbnormalTempLimitOffset;

 	return 0;
 }

-- 2.41.0

Alex Deucher

3:07 p.m.

New subject: [PATCH 7/9] drm/amdgpu: check RAS irq existence for VCN/JPEG

From: Tao Zhou tao.zhou1@amd.com

It is valid for an instance to have no RAS irq, so check that the irq exists before enabling it.

Signed-off-by: Tao Zhou tao.zhou1@amd.com
Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 4ff96bcc0d40b66bf3ddd6010830e9a4f9b85d53)
Cc: stable@vger.kernel.org # 6.1.x
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c  | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
index 4fa019c8aefc..fb9251d9c899 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c
@@ -251,7 +251,8 @@ int amdgpu_jpeg_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *

 	if (amdgpu_ras_is_supported(adev, ras_block->block)) {
 		for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) {
-			if (adev->jpeg.harvest_config & (1 << i))
+			if (adev->jpeg.harvest_config & (1 << i) ||
+			    !adev->jpeg.inst[i].ras_poison_irq.funcs)
 				continue;

 			r = amdgpu_irq_get(adev, &adev->jpeg.inst[i].ras_poison_irq, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 2d94f1b63bd6..b46a5771c3ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -1191,7 +1191,8 @@ int amdgpu_vcn_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *r

 	if (amdgpu_ras_is_supported(adev, ras_block->block)) {
 		for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
-			if (adev->vcn.harvest_config & (1 << i))
+			if (adev->vcn.harvest_config & (1 << i) ||
+			    !adev->vcn.inst[i].ras_poison_irq.funcs)
 				continue;

 			r = amdgpu_irq_get(adev, &adev->vcn.inst[i].ras_poison_irq, 0);

-- 2.41.0
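
Editorial illustration, not part of the patch: the per-instance filter that patch 7 adds can be sketched as a standalone predicate. An instance is skipped if its bit is set in the harvest mask or if no irq handler table was registered for it. The `demo_` types and helpers are hypothetical stand-ins for the driver structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an irq source; funcs is NULL when none exists. */
struct demo_irq_src {
	const void *funcs;
};

/* True if instance i is present (not harvested) and has a RAS irq source. */
static bool demo_instance_has_ras_irq(uint32_t harvest_config,
				      const struct demo_irq_src *irq,
				      unsigned int i)
{
	if (harvest_config & (1u << i))
		return false;

	return irq[i].funcs != NULL;
}

/* Count the instances for which the RAS irq would be enabled. */
static unsigned int demo_count_ras_irqs(uint32_t harvest_config,
					const struct demo_irq_src *irq,
					unsigned int num_inst)
{
	unsigned int i, n = 0;

	for (i = 0; i < num_inst; i++)
		if (demo_instance_has_ras_irq(harvest_config, irq, i))
			n++;

	return n;
}
```

With four instances, instance 1 harvested and instance 2 lacking a handler table, only instances 0 and 3 pass the filter.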

Alex Deucher

3:07 p.m.

New subject: [PATCH 8/9] drm/amdgpu: fix number of fence calculations

From: Christian König christian.koenig@amd.com

Since adding gang submit we need to take the gang size into account while reserving fences.

Signed-off-by: Christian König christian.koenig@amd.com
Fixes: 4624459c84d7 ("drm/amdgpu: add gang submit frontend v6")
Reviewed-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 570b295248b00c3cf4cf59e397de5cb2361e10c2)
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2eb2c66843a8..5612caf77dd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -133,9 +133,6 @@ static int amdgpu_cs_p1_user_fence(struct amdgpu_cs_parser *p,
 	bo = amdgpu_bo_ref(gem_to_amdgpu_bo(gobj));
 	p->uf_entry.priority = 0;
 	p->uf_entry.tv.bo = &bo->tbo;
-	/* One for TTM and two for the CS job */
-	p->uf_entry.tv.num_shared = 3;
-
 	drm_gem_object_put(gobj);

 	size = amdgpu_bo_size(bo);
@@ -882,15 +879,19 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,

 	mutex_lock(&p->bo_list->bo_list_mutex);

-	/* One for TTM and one for the CS job */
+	/* One for TTM and one for each CS job */
 	amdgpu_bo_list_for_each_entry(e, p->bo_list)
-		e->tv.num_shared = 2;
+		e->tv.num_shared = 1 + p->gang_size;
+	p->uf_entry.tv.num_shared = 1 + p->gang_size;

 	amdgpu_bo_list_get_list(p->bo_list, &p->validated);

 	INIT_LIST_HEAD(&duplicates);
 	amdgpu_vm_get_pd_bo(&fpriv->vm, &p->validated, &p->vm_pd);

+	/* Two for VM updates, one for TTM and one for each CS job */
+	p->vm_pd.tv.num_shared = 3 + p->gang_size;
+
 	if (p->uf_entry.tv.bo && !ttm_to_amdgpu_bo(p->uf_entry.tv.bo)->parent)
 		list_add(&p->uf_entry.tv.head, &p->validated);

-- 2.41.0
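
Editorial illustration, not part of the patch: the fence slot arithmetic from patch 8 written as two standalone helpers. The `demo_` names are hypothetical; the formulas are the ones in the diff (`1 + gang_size` for list and user-fence BOs, `3 + gang_size` for the page directory).

```c
/* Slots for a regular BO: one for TTM plus one per job in the gang. */
static unsigned int demo_bo_fence_slots(unsigned int gang_size)
{
	return 1 + gang_size;
}

/* Slots for the page directory: two more on top, for VM updates. */
static unsigned int demo_pd_fence_slots(unsigned int gang_size)
{
	return 3 + gang_size;
}
```

For a single job (`gang_size` of 1) the BO count is 2, which matches the old hard-coded value. For a gang of three jobs the BO count becomes 4, so the old fixed reservation of 2 would have been two slots short.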

Alex Deucher

3:07 p.m.

New subject: [PATCH 9/9] drm/amd: Don't try to enable secure display TA multiple times

From: Mario Limonciello mario.limonciello@amd.com

If the securedisplay TA failed to load the first time, it's unlikely to work again after a suspend/resume cycle or reset cycle and it appears to be causing problems in further attempts.

Fixes: e42dfa66d592 ("drm/amdgpu: Add secure display TA load for Renoir")
Reported-by: Filip Hejsek filip.hejsek@gmail.com
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2633
Signed-off-by: Mario Limonciello mario.limonciello@amd.com
Acked-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
(cherry picked from commit 5c6d52ff4b61e5267b25be714eb5a9ba2a338199)
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index a150b7a4b4aa..e4757a2807d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1947,6 +1947,8 @@ static int psp_securedisplay_initialize(struct psp_context *psp)
 		psp_securedisplay_parse_resp_status(psp, securedisplay_cmd->status);
 		dev_err(psp->adev->dev, "SECUREDISPLAY: query securedisplay TA failed. ret 0x%x\n",
 			securedisplay_cmd->securedisplay_out_message.query_ta.query_cmd_ret);
+		/* don't try again */
+		psp->securedisplay_context.context.bin_desc.size_bytes = 0;
 	}

 	return 0;

-- 2.41.0

Mario Limonciello

11 Jul

9:40 p.m.

On 7/7/23 10:07, Alex Deucher wrote:
> From: Christian König christian.koenig@amd.com
>
> We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.
>
> Signed-off-by: Christian König christian.koenig@amd.com
> Reviewed-by: Alex Deucher alexander.deucher@amd.com
> Acked-by: Guchun Chen guchun.chen@amd.com
> Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
> Signed-off-by: Alex Deucher alexander.deucher@amd.com
> (cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
> Cc: stable@vger.kernel.org # 6.3.x

Greg,

Just want to make sure you saw these 9 commits as you're processing queues since they don't stand out as being sent directly to stable.

>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 +++++++++++++++-----------
>  1 file changed, 39 insertions(+), 30 deletions(-)


Greg Kroah-Hartman

12 Jul

5:12 a.m.

On Tue, Jul 11, 2023 at 04:40:44PM -0500, Mario Limonciello wrote:
> On 7/7/23 10:07, Alex Deucher wrote:
> > From: Christian König christian.koenig@amd.com
> >
> > We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.
> >
> > Signed-off-by: Christian König christian.koenig@amd.com
> > Reviewed-by: Alex Deucher alexander.deucher@amd.com
> > Acked-by: Guchun Chen guchun.chen@amd.com
> > Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
> > Signed-off-by: Alex Deucher alexander.deucher@amd.com
> > (cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
> > Cc: stable@vger.kernel.org # 6.3.x
>
> Greg,
>
> Just want to make sure you saw these 9 commits as you're processing queues since they don't stand out as being sent directly to stable.

Thanks for the pointer, no, I had missed them in the flood of stable patches recently. I have many hundreds of other patches to still get to, and these are now in that review queue as well.

greg k-h

Greg KH

16 Jul

7:16 p.m.

On Fri, Jul 07, 2023 at 11:07:26AM -0400, Alex Deucher wrote:
> From: Christian König christian.koenig@amd.com
>
> We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.
>
> Signed-off-by: Christian König christian.koenig@amd.com
> Reviewed-by: Alex Deucher alexander.deucher@amd.com
> Acked-by: Guchun Chen guchun.chen@amd.com
> Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
> Signed-off-by: Alex Deucher alexander.deucher@amd.com
> (cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
> Cc: stable@vger.kernel.org # 6.3.x
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 +++++++++++++++-----------
>  1 file changed, 39 insertions(+), 30 deletions(-)

I've applied the first 7 patches here to 6.4.y, which I am guessing is where you want them applied to, yet you didn't really say?

The last 2 did not apply :(

And some of these should go into 6.1.y also? Please send a patch series and give me a hint as to where they should be applied to next time so I don't have to guess...

thanks,

greg k-h

Mario Limonciello

7:22 p.m.

On 7/16/23 14:16, Greg KH wrote:
> On Fri, Jul 07, 2023 at 11:07:26AM -0400, Alex Deucher wrote:
> > From: Christian König christian.koenig@amd.com
> >
> > We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.
> >
> > Signed-off-by: Christian König christian.koenig@amd.com
> > Reviewed-by: Alex Deucher alexander.deucher@amd.com
> > Acked-by: Guchun Chen guchun.chen@amd.com
> > Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
> > Signed-off-by: Alex Deucher alexander.deucher@amd.com
> > (cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
> > Cc: stable@vger.kernel.org # 6.3.x
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 +++++++++++++++-----------
> >  1 file changed, 39 insertions(+), 30 deletions(-)
>
> I've applied the first 7 patches here to 6.4.y, which I am guessing is where you want them applied to, yet you didn't really say?
>
> The last 2 did not apply :(
>
> And some of these should go into 6.1.y also? Please send a patch series and give me a hint as to where they should be applied to next time so I don't have to guess...
>
> thanks,
>
> greg k-h

In this case the individual patches with specific requirements have:

Cc: stable@vger.kernel.org # version

They were sent before 6.3 went EOL, so here is the updated summary from them:

6.4.y: 1, 2, 3, 4, 5, 6, 7, 8, 9

6.1.y: 3, 4, 5, 6, 7, 8, 9

3 is particularly important for 6.1.y as we have active regressions reported related to it on 6.1.y.

So can you please take 3-7 to 6.1.y and I'll look more closely at what is wrong with 8 and 9 on 6.1.y and 6.4.y and resend them?

Greg KH

7:28 p.m.

On Sun, Jul 16, 2023 at 02:22:36PM -0500, Mario Limonciello wrote:
> On 7/16/23 14:16, Greg KH wrote:
> > On Fri, Jul 07, 2023 at 11:07:26AM -0400, Alex Deucher wrote:
> > > From: Christian König christian.koenig@amd.com
> > >
> > > We need to grab the lock of the BO or otherwise can run into a crash when we try to inspect the current location.
> > >
> > > Signed-off-by: Christian König christian.koenig@amd.com
> > > Reviewed-by: Alex Deucher alexander.deucher@amd.com
> > > Acked-by: Guchun Chen guchun.chen@amd.com
> > > Tested-by: Mikhail Gavrilov mikhail.v.gavrilov@gmail.com
> > > Signed-off-by: Alex Deucher alexander.deucher@amd.com
> > > (cherry picked from commit e2ad8e2df432498b1cee2af04df605723f4d75e6)
> > > Cc: stable@vger.kernel.org # 6.3.x
> > >
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 +++++++++++++++-----------
> > >  1 file changed, 39 insertions(+), 30 deletions(-)
> >
> > I've applied the first 7 patches here to 6.4.y, which I am guessing is where you want them applied to, yet you didn't really say?
> >
> > The last 2 did not apply :(
> >
> > And some of these should go into 6.1.y also? Please send a patch series and give me a hint as to where they should be applied to next time so I don't have to guess...
> >
> > thanks,
> >
> > greg k-h
>
> In this case the individual patches with specific requirements have:
>
> Cc: stable@vger.kernel.org # version
>
> They were sent before 6.3 went EOL, so here is the updated summary from them:
>
> 6.4.y: 1, 2, 3, 4, 5, 6, 7, 8, 9
>
> 6.1.y: 3, 4, 5, 6, 7, 8, 9
>
> 3 is particularly important for 6.1.y as we have active regressions reported related to it on 6.1.y.
>
> So can you please take 3-7 to 6.1.y and I'll look more closely at what is wrong with 8 and 9 on 6.1.y and 6.4.y and resend them?

I can't really pick out these for 6.1 from the larger series as I'm drowning in patches at the moment. Please send a backported series and I'll be glad to queue that up.

thanks,

greg k-h


linux-stable-mirror@lists.linaro.org
