On Wed, 2024-09-18 at 11:29 -0700, Bart Van Assche wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content. On 9/18/24 6:29 AM, Peter Wang (王信友) wrote:
Basically, this patch currently only needs to handle requeueing for the error handler abort. The approach for DBR mode and MCQ mode should be consistent. If receive an interrupt response (OCS:ABORTED or
INVALID_OCS_VALUE),
then set DID_REQUEUE. If there is no interrupt, it will also set SCSI DID_REQUEUE in ufshcd_err_handler through ufshcd_complete_requests with force_compl = true.
Reporting a completion for commands cleared by writing into the legacy UTRLCLR register is not compliant with any version of the UFSHCI standard. Reporting a completion for commands cleared by writing into that register is problematic because it causes ufshcd_release_scsi_cmd() to be called as follows:
ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = ... ufshcd_release_scsi_cmd() scsi_done()
Calling ufshcd_release_scsi_cmd() if a command has been cleared is problematic because the SCSI core does not expect this. If ufshcd_try_to_abort_task() clears a SCSI command, ufshcd_release_scsi_cmd() must not be called until the SCSI core decides to release the command. This is why I wrote in a previous mail that I think that a quirk should be introduced to suppress the completions generated by clearing a SCSI command.
Hi Bart,
I'm not sure if I'm misunderstanding your point, but I feel that ufshcd_release_scsi_cmd should always be called. It's scsi_done that shouldn't be called, as it should be left to the SCSI layer to decide how to handle this command. Because ufshcd_release_scsi_cmd is just about releasing resources related to ufshcd_map_sg and the clock at the UFS driver level. scsi_done is what notifies the SCSI layer that the cmd has finished, asking it to look at the result to decide how to proceed.
The more problematic part is with MCQ mode. To imitate the DBR approach, we just need to set DID_REQUEUE upon receiving an
interrupt.
Everything else remains the same. This would make things simpler.
Moving forward, if we want to simplify things and we have also taken stock of the two or three scenarios where OCS: ABORTED
occurs,
do we even need a flag? Couldn't we just set DID_REQUEUE directly for OCS: ABORTED? What do you think?
How about making ufshcd_compl_one_cqe() skip entries with status OCS_ABORTED? That would make ufshcd_compl_one_cqe() behave as the SCSI core expects, namely not freeing any command resources if a SCSI command is aborted successfully.
This approach may require further changes to ufshcd_abort_all(). In that function there are separate code paths for legacy and MCQ mode. This is less than ideal. Would it be possible to combine these code paths by removing the ufshcd_complete_requests() call from ufshcd_abort_all() and by handling completions from inside ufshcd_abort_one()?
Thanks,
Bart.
The four case flows for abort are as follows: ----------------------------------------------------------------
Case1: DBR ufshcd_abort
In this case, you can see that ufshcd_release_scsi_cmd will definitely be called.
ufshcd_abort() ufshcd_try_to_abort_task() // It should trigger an interrupt, but the tensor might not get outstanding_lock clear outstanding_reqs tag ufshcd_release_scsi_cmd() release outstanding_lock
ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done();
In most cases, ufshcd_intr will not reach scsi_done because the outstanding_reqs tag is cleared by the original thread. Therefore, whether there is an interrupt or not doesn't affect the result because the ISR will do nothing in most cases.
In a very low chance, the ISR will reach scsi_done and notify SCSI to requeue, and the original thread will not call ufshcd_release_scsi_cmd. MediaTek may need to change DID_ABORT to DID_REQUEUE in this situation, or perhaps not handle this ISR at all.
----------------------------------------------------------------
Case2: MCQ ufshcd_abort
In the case of MCQ ufshcd_abort, you can also see that ufshcd_release_scsi_cmd will definitely be called too. However, there seems to be a problem here, as ufshcd_release_scsi_cmd might be called twice. This is because cmd is not null in ufshcd_release_scsi_cmd, which the previous version would set cmd to null. Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed can avoid this problem. This part needs further consideration on how to handle it.
ufshcd_abort() ufshcd_mcq_abort() ufshcd_try_to_abort_task() // will trigger ISR ufshcd_release_scsi_cmd()
ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT ufshcd_release_scsi_cmd() // will release twice scsi_done()
----------------------------------------------------------------
Case3: DBR ufshcd_err_handler
In the case of the DBR mode error handler, it's the same; ufshcd_release_scsi_cmd will also be executed, and scsi_done will definitely be used to notify SCSI to requeue.
ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // It should trigger an interrupt, but the tensor might not ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done()
ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done();
At this time, the same actions are taken regardless of whether there is an ISR, and with the protection of outstanding_lock, only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
----------------------------------------------------------------
Case4: MCQ ufshcd_err_handler
It's the same with MCQ mode; there is protection from the cqe lock, so only one thread will execute. What my patch 2 aims to do is to change DID_ABORT to DID_REQUEUE in this situation.
ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task() // will trigger irq thread ufshcd_complete_requests() ufshcd_mcq_compl_pending_transfer() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done()
ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done()
Thanks Peter