On 9/19/24 5:16 AM, Peter Wang (王信友) wrote:
The four case flows for abort are as follows:
Case1: DBR ufshcd_abort
Please follow the terminology from the UFSHCI 4.0 standard and use the word "legacy" instead of "DBR".
In this case, you can see that ufshcd_release_scsi_cmd will definitely be called.
ufshcd_abort()
  ufshcd_try_to_abort_task()    // It should trigger an interrupt, but the tensor might not
  get outstanding_lock
  clear outstanding_reqs tag
  ufshcd_release_scsi_cmd()
  release outstanding_lock

ufshcd_intr()
  ufshcd_sl_intr()
    ufshcd_transfer_req_compl()
      ufshcd_poll()
        get outstanding_lock
        clear outstanding_reqs tag
        release outstanding_lock
        __ufshcd_transfer_req_compl()
          ufshcd_compl_one_cqe()
            cmd->result = DID_REQUEUE    // mediatek may need quirk change DID_ABORT to DID_REQUEUE
            ufshcd_release_scsi_cmd()
            scsi_done();
In most cases, ufshcd_intr will not reach scsi_done because the outstanding_reqs tag is cleared by the original thread. Therefore, whether there is an interrupt or not doesn't affect the result because the ISR will do nothing in most cases.
In rare cases, the ISR will reach scsi_done and notify SCSI to requeue, and the original thread will not call ufshcd_release_scsi_cmd. MediaTek may need to change DID_ABORT to DID_REQUEUE in this situation, or perhaps not handle this ISR at all.
Please modify ufshcd_compl_one_cqe() such that it ignores commands with status OCS_ABORTED. This will make the UFSHCI driver behave in the same way for all UFSHCI controllers, whether or not clearing a command triggers a completion interrupt.
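To illustrate the proposal, here is a minimal user-space sketch, not kernel code. All model_* names, the enum values and the counters are hypothetical stand-ins introduced only for this example. It assumes the abort path always releases the command itself and that the completion path skips commands with status OCS_ABORTED, so the result is the same whether or not the controller raises a completion interrupt for a cleared command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative status values; these are not the kernel's OCS encodings. */
enum model_ocs { MODEL_OCS_SUCCESS, MODEL_OCS_ABORTED, MODEL_OCS_OTHER_ERROR };

struct model_cmd {
	int release_count;	/* times the release path ran */
	int done_count;		/* times the SCSI layer was notified */
};

/* Stand-in for ufshcd_release_scsi_cmd(); must run once per command. */
static void model_release_cmd(struct model_cmd *cmd)
{
	assert(cmd != NULL);
	cmd->release_count++;
}

/*
 * Stand-in for ufshcd_compl_one_cqe(). Returns true if it completed the
 * command. Aborted commands are ignored and left to the abort path.
 */
static bool model_compl_one_cqe(struct model_cmd *cmd, enum model_ocs ocs)
{
	if (ocs == MODEL_OCS_ABORTED)
		return false;

	model_release_cmd(cmd);
	cmd->done_count++;	/* scsi_done() would be called here */
	return true;
}

/*
 * Stand-in for the abort path. The optional completion models a controller
 * that raises an interrupt when a command is cleared.
 */
static void model_abort(struct model_cmd *cmd, bool controller_raises_irq)
{
	if (controller_raises_irq)
		model_compl_one_cqe(cmd, MODEL_OCS_ABORTED);

	model_release_cmd(cmd);	/* the abort path is the only releaser */
}
```

In this model, calling model_abort() with or without the simulated interrupt leaves release_count at 1 and done_count at 0.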
Case2: MCQ ufshcd_abort
In the case of MCQ ufshcd_abort, ufshcd_release_scsi_cmd will definitely be called as well. However, there seems to be a problem here: ufshcd_release_scsi_cmd might be called twice. This is because cmd is not null in ufshcd_release_scsi_cmd, whereas the previous version set cmd to null. Skipping OCS: ABORTED in ufshcd_compl_one_cqe can indeed avoid this problem. How to handle this part needs further consideration.

ufshcd_abort()
  ufshcd_mcq_abort()
    ufshcd_try_to_abort_task()    // will trigger ISR
  ufshcd_release_scsi_cmd()

ufs_mtk_mcq_intr()
  ufshcd_mcq_poll_cqe_lock()
    ufshcd_mcq_process_cqe()
      ufshcd_compl_one_cqe()
        cmd->result = DID_ABORT
        ufshcd_release_scsi_cmd()    // will release twice
        scsi_done()
Do you agree that this case can be addressed with the ufshcd_compl_one_cqe() change proposed above?
Case3: DBR ufshcd_err_handler
In the case of the DBR mode error handler, it's the same; ufshcd_release_scsi_cmd will also be executed, and scsi_done will definitely be used to notify SCSI to requeue.
ufshcd_err_handler()
  ufshcd_abort_all()
    ufshcd_abort_one()
      ufshcd_try_to_abort_task()    // It should trigger an interrupt, but the tensor might not
    ufshcd_complete_requests()
      ufshcd_transfer_req_compl()
        ufshcd_poll()
          get outstanding_lock
          clear outstanding_reqs tag
          release outstanding_lock
          __ufshcd_transfer_req_compl()
            ufshcd_compl_one_cqe()
              cmd->result = DID_REQUEUE    // mediatek may need quirk change DID_ABORT to DID_REQUEUE
              ufshcd_release_scsi_cmd()
              scsi_done()

ufshcd_intr()
  ufshcd_sl_intr()
    ufshcd_transfer_req_compl()
      ufshcd_poll()
        get outstanding_lock
        clear outstanding_reqs tag
        release outstanding_lock
        __ufshcd_transfer_req_compl()
          ufshcd_compl_one_cqe()
            cmd->result = DID_REQUEUE    // mediatek may need quirk change DID_ABORT to DID_REQUEUE
            ufshcd_release_scsi_cmd()
            scsi_done();
At this time, the same actions are taken regardless of whether there is an ISR, and with the protection of outstanding_lock, only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
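The "only one thread completes the command" property described above can be sketched in user space as a test-and-clear on the outstanding-requests bitmap. This is only an illustration: it swaps in a C11 atomic for the driver's outstanding_lock spinlock, and struct model_hba, MODEL_MAX_TAGS and the model_* functions are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_TAGS 32	/* assumed tag limit for this model only */

struct model_hba {
	atomic_ulong outstanding_reqs;	/* one bit per in-flight tag */
};

/*
 * Atomically clear the tag bit and report whether this caller cleared it.
 * In the driver the same effect comes from testing and clearing the bit
 * while holding outstanding_lock.
 */
static bool model_claim_tag(struct model_hba *hba, unsigned int tag)
{
	unsigned long bit, old;

	assert(tag < MODEL_MAX_TAGS);
	bit = 1UL << tag;
	old = atomic_fetch_and(&hba->outstanding_reqs, ~bit);
	return (old & bit) != 0;
}

/*
 * Completion path shared by the ISR and the error handler in this model.
 * Returns 1 if this caller completed the command, 0 if another path
 * already claimed it.
 */
static int model_complete(struct model_hba *hba, unsigned int tag)
{
	if (!model_claim_tag(hba, tag))
		return 0;

	/* release the command and notify the SCSI layer here, exactly once */
	return 1;
}
```

If both the simulated ISR and the simulated error handler call model_complete() for the same tag, only the first call returns 1, so the release and scsi_done() steps run once.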
Case4: MCQ ufshcd_err_handler
It's the same with MCQ mode; there is protection from the cqe lock, so only one thread will execute. What my patch 2 aims to do is to change DID_ABORT to DID_REQUEUE in this situation.
ufshcd_err_handler()
  ufshcd_abort_all()
    ufshcd_abort_one()
      ufshcd_try_to_abort_task()    // will trigger irq thread
    ufshcd_complete_requests()
      ufshcd_mcq_compl_pending_transfer()
        ufshcd_mcq_poll_cqe_lock()
          ufshcd_mcq_process_cqe()
            ufshcd_compl_one_cqe()
              cmd->result = DID_ABORT    // should change to DID_REQUEUE
              ufshcd_release_scsi_cmd()
              scsi_done()

ufs_mtk_mcq_intr()
  ufshcd_mcq_poll_cqe_lock()
    ufshcd_mcq_process_cqe()
      ufshcd_compl_one_cqe()
        cmd->result = DID_ABORT    // should change to DID_REQUEUE
        ufshcd_release_scsi_cmd()
        scsi_done()
For legacy and MCQ mode, I prefer the following behavior for
ufshcd_abort_all():
* ufshcd_compl_one_cqe() ignores commands with status OCS_ABORTED.
* ufshcd_release_scsi_cmd() is called either by ufshcd_abort_one() or by
  ufshcd_abort_all().
Do you agree with making the changes proposed above?
Thank you,
Bart.