From: Kai Vehmanen kai.vehmanen@linux.intel.com
[ Upstream commit 44fda61d2bcfb74a942df93959e083a4e8eff75f ]
The unregister machine drivers call is not safe to do when kexec is used. Kexec-lite gets blocked with following backtrace:
[ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done. [ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds. [ 246.819035] Call Trace: [ 246.821782] <TASK> [ 246.824186] __schedule+0x5f9/0x1263 [ 246.828231] schedule+0x87/0xc5 [ 246.831779] snd_card_disconnect_sync+0xb5/0x127 ... [ 246.889249] snd_sof_device_shutdown+0xb4/0x150 [ 246.899317] pci_device_shutdown+0x37/0x61 [ 246.903990] device_shutdown+0x14c/0x1d6 [ 246.908391] kernel_kexec+0x45/0xb9
This reverts commit 83bfc7e793b555291785136c3ae86abcdc046887.
Reported-by: Ricardo Ribalda ribalda@chromium.org Cc: Ricardo Ribalda ribalda@chromium.org Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Péter Ujfalusi peter.ujfalusi@linux.intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Link: https://lore.kernel.org/r/20221209114529.3909192-3-kai.vehmanen@linux.intel.... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/core.c | 9 --------- 1 file changed, 9 deletions(-)
diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c index 3e6141d03770..625977a29d8a 100644 --- a/sound/soc/sof/core.c +++ b/sound/soc/sof/core.c @@ -475,19 +475,10 @@ EXPORT_SYMBOL(snd_sof_device_remove); int snd_sof_device_shutdown(struct device *dev) { struct snd_sof_dev *sdev = dev_get_drvdata(dev); - struct snd_sof_pdata *pdata = sdev->pdata;
if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE)) cancel_work_sync(&sdev->probe_work);
- /* - * make sure clients and machine driver(s) are unregistered to force - * all userspace devices to be closed prior to the DSP shutdown sequence - */ - sof_unregister_clients(sdev); - - snd_sof_machine_unregister(sdev, pdata); - if (sdev->fw_state == SOF_FW_BOOT_COMPLETE) return snd_sof_shutdown(sdev);
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit 1a4f69ef15ec29b213e2b086b2502644e8ef76ee ]
KCSAN reported a race between writing req->status in p9_client_cb and accessing it in p9_client_rpc's wait_event.
Accesses to req itself is protected by the data barrier (writing req fields, write barrier, writing status // reading status, read barrier, reading other req fields), but status accesses themselves apparently also must be annotated properly with WRITE_ONCE/READ_ONCE when we access it without locks.
Follows: - error paths writing status in various threads all can notify p9_client_rpc, so these all also need WRITE_ONCE - there's a similar read loop in trans_virtio for zc case that also needs READ_ONCE - other reads in trans_fd should be protected by the trans_fd lock and lists state machine, as corresponding writers all are within trans_fd and should be under the same lock. If KCSAN complains on them we likely will have something else to fix as well, so it's better to leave them unmarked and look again if required.
Link: https://lkml.kernel.org/r/20221205124756.426350-1-asmadeus@codewreck.org Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Suggested-by: Marco Elver elver@google.com Acked-by: Marco Elver elver@google.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/client.c | 15 ++++++++------- net/9p/trans_fd.c | 12 ++++++------ net/9p/trans_rdma.c | 4 ++-- net/9p/trans_virtio.c | 9 +++++---- net/9p/trans_xen.c | 4 ++-- 5 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/net/9p/client.c b/net/9p/client.c index aaa37b07e30a..29565e021495 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -438,7 +438,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status) * the status change is visible to another thread */ smp_wmb(); - req->status = status; + WRITE_ONCE(req->status, status);
wake_up(&req->wq); p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag); @@ -600,7 +600,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq) /* if we haven't received a response for oldreq, * remove it from the list */ - if (oldreq->status == REQ_STATUS_SENT) { + if (READ_ONCE(oldreq->status) == REQ_STATUS_SENT) { if (c->trans_mod->cancelled) c->trans_mod->cancelled(c, oldreq); } @@ -697,7 +697,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) } again: /* Wait for the response */ - err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD); + err = wait_event_killable(req->wq, + READ_ONCE(req->status) >= REQ_STATUS_RCVD);
/* Make sure our req is coherent with regard to updates in other * threads - echoes to wmb() in the callback @@ -711,7 +712,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) goto again; }
- if (req->status == REQ_STATUS_ERROR) { + if (READ_ONCE(req->status) == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); err = req->t_err; } @@ -724,7 +725,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...) p9_client_flush(c, req);
/* if we received the response anyway, don't signal error */ - if (req->status == REQ_STATUS_RCVD) + if (READ_ONCE(req->status) == REQ_STATUS_RCVD) err = 0; } recalc_sigpending: @@ -793,7 +794,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, if (err != -ERESTARTSYS) goto recalc_sigpending; } - if (req->status == REQ_STATUS_ERROR) { + if (READ_ONCE(req->status) == REQ_STATUS_ERROR) { p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err); err = req->t_err; } @@ -806,7 +807,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type, p9_client_flush(c, req);
/* if we received the response anyway, don't signal error */ - if (req->status == REQ_STATUS_RCVD) + if (READ_ONCE(req->status) == REQ_STATUS_RCVD) err = 0; } recalc_sigpending: diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 07db2f436d44..5a1aecf7fe48 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -202,11 +202,11 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) { list_move(&req->req_list, &cancel_list); - req->status = REQ_STATUS_ERROR; + WRITE_ONCE(req->status, REQ_STATUS_ERROR); } list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { list_move(&req->req_list, &cancel_list); - req->status = REQ_STATUS_ERROR; + WRITE_ONCE(req->status, REQ_STATUS_ERROR); }
spin_unlock(&m->req_lock); @@ -467,7 +467,7 @@ static void p9_write_work(struct work_struct *work)
req = list_entry(m->unsent_req_list.next, struct p9_req_t, req_list); - req->status = REQ_STATUS_SENT; + WRITE_ONCE(req->status, REQ_STATUS_SENT); p9_debug(P9_DEBUG_TRANS, "move req %p\n", req); list_move_tail(&req->req_list, &m->req_list);
@@ -676,7 +676,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req) return m->err;
spin_lock(&m->req_lock); - req->status = REQ_STATUS_UNSENT; + WRITE_ONCE(req->status, REQ_STATUS_UNSENT); list_add_tail(&req->req_list, &m->unsent_req_list); spin_unlock(&m->req_lock);
@@ -703,7 +703,7 @@ static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
if (req->status == REQ_STATUS_UNSENT) { list_del(&req->req_list); - req->status = REQ_STATUS_FLSHD; + WRITE_ONCE(req->status, REQ_STATUS_FLSHD); p9_req_put(client, req); ret = 0; } @@ -732,7 +732,7 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req) * remove it from the list. */ list_del(&req->req_list); - req->status = REQ_STATUS_FLSHD; + WRITE_ONCE(req->status, REQ_STATUS_FLSHD); spin_unlock(&m->req_lock);
p9_req_put(client, req); diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c index 6ff706760676..e9a830c69058 100644 --- a/net/9p/trans_rdma.c +++ b/net/9p/trans_rdma.c @@ -507,7 +507,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req) * because doing if after could erase the REQ_STATUS_RCVD * status in case of a very fast reply. */ - req->status = REQ_STATUS_SENT; + WRITE_ONCE(req->status, REQ_STATUS_SENT); err = ib_post_send(rdma->qp, &wr, NULL); if (err) goto send_error; @@ -517,7 +517,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
/* Handle errors that happened during or while preparing the send: */ send_error: - req->status = REQ_STATUS_ERROR; + WRITE_ONCE(req->status, REQ_STATUS_ERROR); kfree(c); p9_debug(P9_DEBUG_ERROR, "Error %d in rdma_request()\n", err);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index e757f0601304..3f3eb03cda7d 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -263,7 +263,7 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
p9_debug(P9_DEBUG_TRANS, "9p debug: virtio request\n");
- req->status = REQ_STATUS_SENT; + WRITE_ONCE(req->status, REQ_STATUS_SENT); req_retry: spin_lock_irqsave(&chan->lock, flags);
@@ -469,7 +469,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, inlen = n; } } - req->status = REQ_STATUS_SENT; + WRITE_ONCE(req->status, REQ_STATUS_SENT); req_retry_pinned: spin_lock_irqsave(&chan->lock, flags);
@@ -532,9 +532,10 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, spin_unlock_irqrestore(&chan->lock, flags); kicked = 1; p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n"); - err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD); + err = wait_event_killable(req->wq, + READ_ONCE(req->status) >= REQ_STATUS_RCVD); // RERROR needs reply (== error string) in static data - if (req->status == REQ_STATUS_RCVD && + if (READ_ONCE(req->status) == REQ_STATUS_RCVD && unlikely(req->rc.sdata[4] == P9_RERROR)) handle_rerror(req, in_hdr_len, offs, in_pages);
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c index aaa5fd364691..cf1b89ba522b 100644 --- a/net/9p/trans_xen.c +++ b/net/9p/trans_xen.c @@ -157,7 +157,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req) &masked_prod, masked_cons, XEN_9PFS_RING_SIZE(ring));
- p9_req->status = REQ_STATUS_SENT; + WRITE_ONCE(p9_req->status, REQ_STATUS_SENT); virt_wmb(); /* write ring before updating pointer */ prod += size; ring->intf->out_prod = prod; @@ -212,7 +212,7 @@ static void p9_xen_response(struct work_struct *work) dev_warn(&priv->dev->dev, "requested packet size too big: %d for tag %d with capacity %zd\n", h.size, h.tag, req->rc.capacity); - req->status = REQ_STATUS_ERROR; + WRITE_ONCE(req->status, REQ_STATUS_ERROR); goto recv_error; }
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit a1dec9d70b6ad97087b60b81d2492134a84208c6 ]
The Advantech MICA-071 tablet deviates from the defaults for a non CR Bay Trail based tablet in several ways:
1. It uses an analog MIC on IN3 rather then using DMIC1 2. It only has 1 speaker 3. It needs the OVCD current threshold to be set to 1500uA instead of the default 2000uA to reliable differentiate between headphones vs headsets
Add a quirk with these settings for this tablet.
Signed-off-by: Hans de Goede hdegoede@redhat.com Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20221213123246.11226-1-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index fb9d9e271845..ddd2625bed90 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -570,6 +570,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF1 | BYT_RT5640_MCLK_EN), }, + { + /* Advantech MICA-071 */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Advantech"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "MICA-071"), + }, + /* OVCD Th = 1500uA to reliable detect head-phones vs -set */ + .driver_data = (void *)(BYT_RT5640_IN3_MAP | + BYT_RT5640_JD_SRC_JD2_IN4N | + BYT_RT5640_OVCD_TH_1500UA | + BYT_RT5640_OVCD_SF_0P75 | + BYT_RT5640_MONO_SPEAKER | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_MCLK_EN), + }, { .matches = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ARCHOS"),
From: YC Hung yc.hung@mediatek.com
[ Upstream commit 7bd220f2ba9014b78f0304178103393554b8c4fe ]
Coverity spotted that panic_info is not initialized to zero in mtk_adsp_dump. Using uninitialized value panic_info.linenum when calling snd_sof_get_status. Fix this coverity by initializing panic_info struct as zero.
Signed-off-by: YC Hung yc.hung@mediatek.com Reviewed-by: Curtis Malainey cujomalainey@chromium.org Link: https://lore.kernel.org/r/20221213115617.25086-1-yc.hung@mediatek.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/sof/mediatek/mtk-adsp-common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/sof/mediatek/mtk-adsp-common.c b/sound/soc/sof/mediatek/mtk-adsp-common.c index 1e0769c668a7..de8dbe27cd0d 100644 --- a/sound/soc/sof/mediatek/mtk-adsp-common.c +++ b/sound/soc/sof/mediatek/mtk-adsp-common.c @@ -60,7 +60,7 @@ void mtk_adsp_dump(struct snd_sof_dev *sdev, u32 flags) { char *level = (flags & SOF_DBG_DUMP_OPTIONAL) ? KERN_DEBUG : KERN_ERR; struct sof_ipc_dsp_oops_xtensa xoops; - struct sof_ipc_panic_info panic_info; + struct sof_ipc_panic_info panic_info = {}; u32 stack[MTK_ADSP_STACK_DUMP_SIZE]; u32 status;
From: Luben Tuikov luben.tuikov@amd.com
[ Upstream commit 7554886daa31eacc8e7fac9e15bbce67d10b8f1f ]
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man".
v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug.
Cc: Alex Deucher Alexander.Deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: AMD Graphics amd-gfx@lists.freedesktop.org Signed-off-by: Luben Tuikov luben.tuikov@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 2e8f6cd7a729..33e266433e9b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -446,27 +446,24 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
/* * If GTT is part of requested domains the check must succeed to - * allow fall back to GTT + * allow fall back to GTT. */ if (domain & AMDGPU_GEM_DOMAIN_GTT) { man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
- if (size < man->size) + if (man && size < man->size) return true; - else - goto fail; - } - - if (domain & AMDGPU_GEM_DOMAIN_VRAM) { + else if (!man) + WARN_ON_ONCE("GTT domain requested but GTT mem manager uninitialized"); + goto fail; + } else if (domain & AMDGPU_GEM_DOMAIN_VRAM) { man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
- if (size < man->size) + if (man && size < man->size) return true; - else - goto fail; + goto fail; }
- /* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */ return true;
Hi Sasha,
The patch in the link is a Fixes patch of the quoted patch, and should also go in:
https://lore.kernel.org/all/20230104221935.113400-1-luben.tuikov@amd.com/
Regards, Luben
On 2022-12-31 15:04, Sasha Levin wrote:
From: Luben Tuikov luben.tuikov@amd.com
[ Upstream commit 7554886daa31eacc8e7fac9e15bbce67d10b8f1f ]
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man".
v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug.
Cc: Alex Deucher Alexander.Deucher@amd.com Cc: Christian König christian.koenig@amd.com Cc: AMD Graphics amd-gfx@lists.freedesktop.org Signed-off-by: Luben Tuikov luben.tuikov@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 2e8f6cd7a729..33e266433e9b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -446,27 +446,24 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev, /* * If GTT is part of requested domains the check must succeed to
* allow fall back to GTT
*/ if (domain & AMDGPU_GEM_DOMAIN_GTT) { man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);* allow fall back to GTT.
if (size < man->size)
if (man && size < man->size) return true;
else
goto fail;
- }
- if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
else if (!man)
WARN_ON_ONCE("GTT domain requested but GTT mem manager uninitialized");
goto fail;
- } else if (domain & AMDGPU_GEM_DOMAIN_VRAM) { man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
if (size < man->size)
if (man && size < man->size) return true;
else
goto fail;
}goto fail;
- /* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */ return true;
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 29d48b87db64b6697ddad007548e51d032081c59 ]
Should only destroy the ib_mem and let process cleanup worker to free the outstanding BOs. Reset the pointer in pdd->qpd structure, to avoid NULL pointer access in process destroy worker.
BUG: kernel NULL pointer dereference, address: 0000000000000010 Call Trace: amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel+0x46/0xb0 [amdgpu] kfd_process_device_destroy_cwsr_dgpu+0x40/0x70 [amdgpu] kfd_process_destroy_pdds+0x71/0x190 [amdgpu] kfd_process_wq_release+0x2a2/0x3b0 [amdgpu] process_one_work+0x2a1/0x600 worker_thread+0x39/0x3d0
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 951b63677248..9821fa9268d3 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -689,13 +689,13 @@ void kfd_process_destroy_wq(void) }
static void kfd_process_free_gpuvm(struct kgd_mem *mem, - struct kfd_process_device *pdd, void *kptr) + struct kfd_process_device *pdd, void **kptr) { struct kfd_dev *dev = pdd->dev;
- if (kptr) { + if (kptr && *kptr) { amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(mem); - kptr = NULL; + *kptr = NULL; }
amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(dev->adev, mem, pdd->drm_priv); @@ -795,7 +795,7 @@ static void kfd_process_device_destroy_ib_mem(struct kfd_process_device *pdd) if (!qpd->ib_kaddr || !qpd->ib_base) return;
- kfd_process_free_gpuvm(qpd->ib_mem, pdd, qpd->ib_kaddr); + kfd_process_free_gpuvm(qpd->ib_mem, pdd, &qpd->ib_kaddr); }
struct kfd_process *kfd_create_process(struct file *filep) @@ -1277,7 +1277,7 @@ static void kfd_process_device_destroy_cwsr_dgpu(struct kfd_process_device *pdd) if (!dev->cwsr_enabled || !qpd->cwsr_kaddr || !qpd->cwsr_base) return;
- kfd_process_free_gpuvm(qpd->cwsr_mem, pdd, qpd->cwsr_kaddr); + kfd_process_free_gpuvm(qpd->cwsr_mem, pdd, &qpd->cwsr_kaddr); }
void kfd_process_set_trap_handler(struct qcm_process_device *qpd, @@ -1598,8 +1598,8 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd, return 0;
err_init_cwsr: + kfd_process_device_destroy_ib_mem(pdd); err_reserve_ib_mem: - kfd_process_device_free_bos(pdd); pdd->drm_priv = NULL;
return ret;
From: Philip Yang Philip.Yang@amd.com
[ Upstream commit 1a799c4c190ea9f0e81028e3eb3037ed0ab17ff5 ]
If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access.
Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid.
amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10
BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10
Signed-off-by: Philip Yang Philip.Yang@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 4 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 39 +++++++++++++------ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 12 ++++-- 3 files changed, 40 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 647220a8762d..30f145dc8724 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -265,8 +265,10 @@ int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev, bool is_ (&((struct amdgpu_fpriv *) \ ((struct drm_file *)(drm_priv))->driver_priv)->vm)
+int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device *adev, + struct file *filp, u32 pasid); int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, - struct file *filp, u32 pasid, + struct file *filp, void **process_info, struct dma_fence **ef); void amdgpu_amdkfd_gpuvm_release_process_vm(struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 1f76e27f1a35..18a7cbbd00ff 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1473,10 +1473,9 @@ static void amdgpu_amdkfd_gpuvm_unpin_bo(struct amdgpu_bo *bo) amdgpu_bo_unreserve(bo); }
-int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, - struct file *filp, u32 pasid, - void **process_info, - struct dma_fence **ef) +int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device *adev, + struct file *filp, u32 pasid) + { struct amdgpu_fpriv *drv_priv; struct amdgpu_vm *avm; @@ -1487,10 +1486,6 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, return ret; avm = &drv_priv->vm;
- /* Already a compute VM? */ - if (avm->process_info) - return -EINVAL; - /* Free the original amdgpu allocated pasid, * will be replaced with kfd allocated pasid. */ @@ -1499,14 +1494,36 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, amdgpu_vm_set_pasid(adev, avm, 0); }
- /* Convert VM into a compute VM */ - ret = amdgpu_vm_make_compute(adev, avm); + ret = amdgpu_vm_set_pasid(adev, avm, pasid); if (ret) return ret;
- ret = amdgpu_vm_set_pasid(adev, avm, pasid); + return 0; +} + +int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, + struct file *filp, + void **process_info, + struct dma_fence **ef) +{ + struct amdgpu_fpriv *drv_priv; + struct amdgpu_vm *avm; + int ret; + + ret = amdgpu_file_to_fpriv(filp, &drv_priv); if (ret) return ret; + avm = &drv_priv->vm; + + /* Already a compute VM? */ + if (avm->process_info) + return -EINVAL; + + /* Convert VM into a compute VM */ + ret = amdgpu_vm_make_compute(adev, avm); + if (ret) + return ret; + /* Initialize KFD part of the VM and process info */ ret = init_kfd_vm(avm, process_info, ef); if (ret) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 9821fa9268d3..dd351105c1bc 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1576,9 +1576,9 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd, p = pdd->process; dev = pdd->dev;
- ret = amdgpu_amdkfd_gpuvm_acquire_process_vm( - dev->adev, drm_file, p->pasid, - &p->kgd_process_info, &p->ef); + ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, drm_file, + &p->kgd_process_info, + &p->ef); if (ret) { pr_err("Failed to create process VM object\n"); return ret; @@ -1593,10 +1593,16 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd, if (ret) goto err_init_cwsr;
+ ret = amdgpu_amdkfd_gpuvm_set_vm_pasid(dev->adev, drm_file, p->pasid); + if (ret) + goto err_set_pasid; + pdd->drm_file = drm_file;
return 0;
+err_set_pasid: + kfd_process_device_destroy_cwsr_dgpu(pdd); err_init_cwsr: kfd_process_device_destroy_ib_mem(pdd); err_reserve_ib_mem:
Hi,
On Sat, 31 Dec 2022, Sasha Levin wrote:
From: Kai Vehmanen kai.vehmanen@linux.intel.com
[ Upstream commit 44fda61d2bcfb74a942df93959e083a4e8eff75f ]
The unregister machine drivers call is not safe to do when kexec is used. Kexec-lite gets blocked with following backtrace:
this should be picked together with commit 2aa2a5ead0e ("ASoC: SOF: Intel: pci-tgl: unblock S5 entry if DMA stop has failed"), to not bring back old bugs (system failures to enter S5 on shutdown). The revert patch unfortunately fails to mention this dependency.
If I'm too late with my reply, I can send the second patch separately to stable.
Br, Kai
On Wed, Jan 04, 2023 at 02:34:55PM +0200, Kai Vehmanen wrote:
Hi,
On Sat, 31 Dec 2022, Sasha Levin wrote:
From: Kai Vehmanen kai.vehmanen@linux.intel.com
[ Upstream commit 44fda61d2bcfb74a942df93959e083a4e8eff75f ]
The unregister machine drivers call is not safe to do when kexec is used. Kexec-lite gets blocked with following backtrace:
this should be picked together with commit 2aa2a5ead0e ("ASoC: SOF: Intel: pci-tgl: unblock S5 entry if DMA stop has failed"), to not bring back old bugs (system failures to enter S5 on shutdown). The revert patch unfortunately fails to mention this dependency.
If I'm too late with my reply, I can send the second patch separately to stable.
I took 2aa2a5ead0e along with this patch, thanks!
linux-stable-mirror@lists.linaro.org