On Mon, 20 Jul 2020 at 21:27, Rob Clark robdclark@gmail.com wrote:
On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy robin.murphy@arm.com wrote:
On 2020-07-20 08:17, Arnd Bergmann wrote:
On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
<>
[ 5.444121] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018 [ 5.456615] ESR = 0x96000004 [ 5.464471] SET = 0, FnV = 0 [ 5.464487] EA = 0, S1PTW = 0 [ 5.466521] Data abort info: [ 5.469971] ISV = 0, ISS = 0x00000004 [ 5.472768] CM = 0, WnR = 0 [ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000 [ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000 [ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 5.492448] Modules linked in: crct10dif_ce adv7511(+) qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns rmtfs_mem fuse [ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 [ 5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) [ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 [ 5.541148] lr : free_io_pgtable_ops+0x28/0x58
<>
[ 5.628297] Call trace: [ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8
This means that dev_iommu_fwspec_get() has returned NULL in qcom_iommu_tlb_inv_context(), either because dev->iommu is NULL, or because dev->iommu->fwspec is NULL.
qcom_iommu_tlb_inv_context() does not check for a NULL pointer before using the returned object.
The bug is either in the lack of error handling, or the fact that it's possible to get into this function for a device that has not been fully set up.
Not quite - the device *was* properly set up, but has already been properly torn down again in the removal path by iommu_release_device(). The problem is that qcom-iommu kept the device pointer as its TLB cookie for the domain, but the domain has a longer lifespan than the validity of that device - that's a fundamental design flaw in the driver.
fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb cookie".. untested but looks like a straightforward enough change to switch over to using the domain rather than dev as cookie
The proposed patch tested and confirmed the reported problem fixed.
ref: https://lore.kernel.org/linux-iommu/CA+G9fYtj1RBYcPhXZRm-qm5ygtdLj1jD8vFZSqQ... https://lkft.validation.linaro.org/scheduler/job/1593950#L3392
BR, -R
- Naresh