On Mon, 2018-04-02 at 11:53 +0530, Sreekanth Reddy wrote:
On Fri, Mar 30, 2018 at 5:29 PM, Christoph Hellwig hch@infradead.org wrote:
On Fri, Mar 30, 2018 at 03:07:12PM +0530, Chaitra P B wrote:
Check scsi tracker for NULL before accessing it. And in some places there are possibilities for getting valid st but still other fields are not set.
Please explain how that could ever happen. You should never see a scsi_cmnd without the device pointer.
Chris,
Here is one example, During Host reset operation time driver will flush out all the outstanding IO's to the SML in below function,
static void _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc) { struct scsi_cmnd *scmd; struct scsiio_tracker *st; u16 smid; int count = 0;
[SR] Here driver is looping starting from smid one to HBA queue depth. for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
[SR] Some times it is possible that scsi_cmnd might have created at SML but it might not be issued to the driver as host reset operation is going on, So here we will get non-NULL scmd. scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); if (!scmd) continue; count++; _scsih_set_satl_pending(scmd, false); [SR] Here we are trying to get the scsi tracker 'st' for the above scmd (which is not received by the driver) and so fields of this 'st' are uninitialized. st = scsi_cmd_priv(scmd); [SR] And here we are trying to clear the scsi tracker 'st' which is not yet all initialized by the driver, in other terms we are trying to flush out the scmd command which is not yet all received by the driver. mpt3sas_base_clear_st(ioc, st); scsi_dma_unmap(scmd); if (ioc->pci_error_recovery || ioc->remove_host) scmd->result = DID_NO_CONNECT << 16; else scmd->result = DID_RESET << 16; scmd->scsi_done(scmd); } dtmprintk(ioc, pr_info(MPT3SAS_FMT "completing %d cmds\n", ioc->name, count)); }
Hello Sreekanth,
From mpt3sas_scsih_scsi_lookup_get():
st = scsi_cmd_priv(scmd); if (st->cb_idx == 0xFF) scmd = NULL;
From mpt3sas_base_get_smid(), mpt3sas_base_get_smid_scsiio() and mpt3sas_base_get_smid_hpr():
request->cb_idx = cb_idx;
Can _scsih_flush_running_cmds() be executed concurrently with scsih_qcmd()? In other words, can it happen that mpt3sas_scsih_scsi_lookup_get() sees that st->cb_idx == 0xff just before scsih_qcmd() assigns a value to .cb_idx? Can this cause _scsih_flush_running_cmds() to skip commands that it shouldn't skip?
Can it happen that _scsih_flush_running_cmds() calls .scsi_done() for a SCSI command just after scsih_qcmd() transferred control for that command to the firmware? Can that cause .scsi_done() to be called twice for the same command?
Thanks,
Bart.