On Thu, 2024-09-19 at 11:49 -0700, Bart Van Assche wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content. On 9/19/24 5:16 AM, Peter Wang (王信友) wrote:
The four case flows for abort are as follows:
Case1: DBR ufshcd_abort
Please follow the terminology from the UFSHCI 4.0 standard and use the word "legacy" instead of "DBR".
Hi Bart,
Okay, but the current code comments all use 'SDB mode'. Should we just stick with that term?
In this case, you can see that ufshcd_release_scsi_cmd will definitely be called.
ufshcd_abort() ufshcd_try_to_abort_task()// It should trigger an interrupt, but the tensor might not get outstanding_lock clear outstanding_reqs tag ufshcd_release_scsi_cmd() release outstanding_lock
ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE// mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done();
In most cases, ufshcd_intr will not reach scsi_done because the outstanding_reqs tag is cleared by the original thread. Therefore, whether there is an interrupt or not doesn't affect the result because the ISR will do nothing in most cases.
In a very low chance, the ISR will reach scsi_done and notify SCSI to requeue, and the original thread will not call ufshcd_release_scsi_cmd. MediaTek may need to change DID_ABORT to DID_REQUEUE in this situation, or perhaps not handle this ISR at all.
Please modify ufshcd_compl_one_cqe() such that it ignores commands with status OCS_ABORTED. This will make the UFSHCI driver behave in the same way for all UFSHCI controllers, whether or not clearing a command triggers a completion interrupt.
Yes, I am considering how to modify the code here.
Case2: MCQ ufshcd_abort
In the case of MCQ ufshcd_abort, you can also see that ufshcd_release_scsi_cmd will definitely be called too. However, there seems to be a problem here, as ufshcd_release_scsi_cmd might be called twice. This is because cmd is not null in ufshcd_release_scsi_cmd, which the previous version would set cmd to null. Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed can avoid this problem. This part needs further consideration on how to handle it.
ufshcd_abort() ufshcd_mcq_abort() ufshcd_try_to_abort_task()// will trigger ISR ufshcd_release_scsi_cmd()
ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT ufshcd_release_scsi_cmd() // will release twice scsi_done()
Do you agree that this case can be addressed with the ufshcd_compl_one_cqe() change proposed above?
Agree.
Case3: DBR ufshcd_err_handler
In the case of the DBR mode error handler, it's the same; ufshcd_release_scsi_cmd will also be executed, and scsi_done will definitely be used to notify SCSI to requeue.
ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task()// It should trigger an interrupt, but the tensor might not ufshcd_complete_requests() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk change DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done()
ufshcd_intr() ufshcd_sl_intr() ufshcd_transfer_req_compl() ufshcd_poll() get outstanding_lock clear outstanding_reqs tag release outstanding_lock __ufshcd_transfer_req_compl() ufshcd_compl_one_cqe() cmd->result = DID_REQUEUE // mediatek may need quirk
change
DID_ABORT to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done();
At this time, the same actions are taken regardless of whether there is an ISR, and with the protection of outstanding_lock, only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
Case4: MCQ ufshcd_err_handler
It's the same with MCQ mode; there is protection from the cqe lock, so only one thread will execute. What my patch 2 aims to do is to change DID_ABORT to DID_REQUEUE in this situation.
ufshcd_err_handler() ufshcd_abort_all() ufshcd_abort_one() ufshcd_try_to_abort_task()// will trigger irq thread ufshcd_complete_requests() ufshcd_mcq_compl_pending_transfer() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to
DID_REQUEUE
ufshcd_release_scsi_cmd() scsi_done()
ufs_mtk_mcq_intr() ufshcd_mcq_poll_cqe_lock() ufshcd_mcq_process_cqe() ufshcd_compl_one_cqe() cmd->result = DID_ABORT // should change to DID_REQUEUE ufshcd_release_scsi_cmd() scsi_done()
For legacy and MCQ mode, I prefer the following behavior for ufshcd_abort_all():
- ufshcd_compl_one_cqe() ignores commands with status OCS_ABORTED.
- ufshcd_release_scsi_cmd() is called either by ufshcd_abort_one() or by ufshcd_abort_all().
Do you agree with making the changes proposed above?
Thank you,
This might not work, as SDB mode doesn't ignore OCS: INVALID_OCS_VALUE but rather notifies SCSI to requeue. So what we need to correct is to notify SCSI to requeue when MCQ mode receives OCS: ABORTED as well.
Furthermore, ufshcd_compl_one_cqe, whether it comes from ufshcd_abort_all or ISR, does the same thing and is protected by a lock. Therefore, there is no need for special handling specifically for ufshcd_abort_all.
After discussing with you, I realized that there are indeed many deficiencies and inconsistencies here that need to be addressed. I will submit a new patch for the content discussed above.
Thanks. Peter
Bar