On Tue, Sep 28, 2021 at 09:56:57PM +0200, Lino Sanfilippo wrote:
One example is if the BCM2835 driver is used together with the TPM SPI driver: At system shutdown first the TPM chip devices (pre) shutdown handler (tpm_class_shutdown) is called, stopping the chip and setting an operations pointer to NULL. Then since the BCM2835 shutdown handler unregisters the SPI controller the TPM SPI remove function (tpm_tis_spi_remove) is also called. In case of TPM 2 this function accesses the now nullified operations pointer, resulting in the following NULL pointer access:
This is a bug in that driver, it should be able to cope with a race between a removal (which might be triggered for some other reason) and a shutdown. Obviously this is actively triggered by this code path but it could happen via some other mechanism.
The first attempt to fix this was with an extra check in the tpm chip driver (see https://marc.info/?l=linux-integrity&m=163129718414118&w=2) to avoid the NULL pointer access. Then Jason Gunthorpe noted that the real issue was the BCM driver unregistering the chip in the shutdown handler(see https://marc.info/?l=linux-integrity&m=163129718414118&w=2) which led me to this solution.
Whatever happens here you should still fix the driver.
-static int bcm2835_spi_remove(struct platform_device *pdev) +static void bcm2835_spi_shutdown(struct platform_device *pdev) { struct spi_controller *ctlr = platform_get_drvdata(pdev); struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
- bcm2835_debugfs_remove(bs);
- spi_unregister_controller(ctlr);
- bcm2835_dma_release(ctlr, bs);
It is not at all clear to me that it is safe to deallocate the DMA resources the controller is using without first releasing the controller, I don't see what's stopping something coming along and submitting new transactions which could in turn try to start doing DMA.