On Mon, 9 Mar 2020 at 18:33, Sowjanya Komatineni skomatineni@nvidia.com wrote:
On 3/6/20 3:14 AM, Ulf Hansson wrote:
[...]
Actually we always use R1B with CMD6 as per spec.
I fully agree that R1B is preferable, but it's not against the spec to send CMD13 to poll for busy.
Moreover, we need to cope with the scenario where the host has specified a maximum timeout that isn't sufficiently long for the requested operation. Do you have another proposal for how to manage this, other than disabling MMC_RSP_BUSY?
Let's assume your driver would get an R1B for the CMD6 (we force it), then what timeout would the driver be using if we would set cmd.busy_timeout to 30ms?
Sorry, I didn't understand clearly. Are you asking, with a 30s timeout, what data timeout counter is used?
Yes. It seems like it will pick the maximum, which is 11s?
yes
Okay, thanks!
Because of the above-mentioned issue on our host, where the CMD interrupt happens after the busy state, polling for busy returns right away as not busy.
I see.
So issuing CMD13 after CMD6-R1, followed by a busy poll, should work. But it's weird that with a small delay of 1ms, or a debug print before CMD13, it doesn't time out and works all the time.
I have digested the information you provided in these emails. Let me summarize it, to see if I have understood correctly.
Your controller can't distinguish between R1 and R1B because of a limitation in the HW. So, in both cases you need to wait for the card to stop signaling busy before the controller can give an IRQ to notify that the R1 response has been received. Correct?
In this context, I am wondering if sdhci_send_command(), really conforms to these requirements. For example, depending on if the CMD6 has MMC_RSP_BUSY or not, it may pick either SDHCI_CMD_RESP_SHORT or SDHCI_CMD_RESP_SHORT_BUSY.
Does this work as expected for your case?
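For reference, the response-type selection in sdhci_send_command() is roughly the following (paraphrased from drivers/mmc/host/sdhci.c; the exact code may differ between kernel versions):

	/* Pick the controller response type from the core's cmd->flags. Only
	 * commands flagged MMC_RSP_BUSY get the "short response with busy"
	 * variant, which is what arms the controller's busy detection.
	 */
	if (!(cmd->flags & MMC_RSP_PRESENT))
		flags = SDHCI_CMD_RESP_NONE;
	else if (cmd->flags & MMC_RSP_136)
		flags = SDHCI_CMD_RESP_LONG;
	else if (cmd->flags & MMC_RSP_BUSY)
		flags = SDHCI_CMD_RESP_SHORT_BUSY;
	else
		flags = SDHCI_CMD_RESP_SHORT;

So a CMD6 sent with a plain R1 ends up as SDHCI_CMD_RESP_SHORT, without busy detection, and the core is then expected to poll for busy itself.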
The design team re-verified internally: the HW behavior where it waits for the busy state before raising the IRQ applies only to R1B, and R1 handling is spec compliant.
So, with R1, CMD complete is generated after the response is received.
Okay.
So, the issue we see for CMD6 with R1 is a software problem that we should be able to fix.
With R1B, CMD complete and xfer complete are both generated after the response is received and the device has exited busy (max timeout of 11s). A DATA timeout interrupt will be asserted in case HW busy detection fails.
So with R1B we may see a DATA timeout if the operation takes longer than the max busy timeout of 11s.
Okay, I see.
Assuming my interpretation of the above is somewhat correct, you then always need to set a busy timeout for R1/R1B responses in the controller. The maximum timeout seems to be 11s long. Obviously, this isn't enough for all cases, such as cache flushing and erase. So, what can we do to support timeouts longer than 11s? Would it be possible to disable the HW timeout if the requested timeout is longer than 11s, and use a SW timeout instead?
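For context, the core's current handling in __mmc_switch() is roughly as below (paraphrased from drivers/mmc/core/mmc_ops.c; exact code may vary per kernel version). This is where the 11s limit bites: when the requested timeout doesn't fit, the R1B gets converted to R1 and the core falls back to polling with CMD13 or ->card_busy().

	/*
	 * If the host has specified a max_busy_timeout that is too short for
	 * the requested operation, instruct the host to avoid HW busy
	 * detection by converting the R1B response to R1; the core then
	 * polls for busy on its own.
	 */
	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout))
		use_r1b_resp = false;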
Kind regards Uffe
For long operations like erase, we have a register bit to enable an infinite busy wait mode, where the host controller keeps monitoring while the card is busy.
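Just to illustrate the mechanism (this is only a sketch; the register and bit names below are placeholders, not the actual Tegra defines), the per-command choice between finite HW busy detection and the infinite wait mode could look something like:

	/* Placeholder names for illustration only. */
	u32 val = sdhci_readl(host, TEGRA_VENDOR_MISC_CTRL);

	if (cmd->busy_timeout > TEGRA_MAX_HW_BUSY_TIMEOUT_MS)
		val |= TEGRA_INFINITE_ERASE_TIMEOUT;	/* wait until the card stops signaling busy */
	else
		val &= ~TEGRA_INFINITE_ERASE_TIMEOUT;	/* finite mode, DATA timeout counter (<= ~11s) */

	sdhci_writel(host, val, TEGRA_VENDOR_MISC_CTRL);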
Alright, that sounds great!
But so far, for the eMMC devices we have used on our platforms, we haven't seen a cache flush take more than 11s.
I understand that 11s is probably fine to use, for most cases.
However, it's not spec compliant, as for some operations there is simply no timeout specified. BKOPS, cache flush and sanitize are cases like this - and then 11s is definitely not sufficient.
Will get back on the possibility of disabling the HW timeout and using a SW timeout.
Thanks!
I would like to get the regression fixed asap, but I would also like to avoid reverting patches unless really necessary. May I propose the following two options:
1. Find out why polling with ->card_busy() or CMD13, for a CMD6 with an R1 response, doesn't work - and then fix that behaviour.
2. Set mmc->max_busy_timeout to zero for sdhci-tegra, which makes the core always use R1B for CMD6 (and erase). This also means that when cmd->busy_timeout becomes longer than 11s, sdhci-tegra must disable the HW busy timeout and just wait "forever".
If you decide on option 2, you can add the software timeout support on top, but that can be considered a next step of improvement rather than needed as a fix. Note that I believe there is already some support for software timeouts in the sdhci core; maybe you need to tweak it a bit for your case, I don't know.
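A minimal sketch of the host side of option 2 (assumptions: the value is overridden after sdhci_setup_host() has computed it, and the existing SW timeout support in the sdhci core, e.g. the SDHCI_QUIRK2_DISABLE_HW_TIMEOUT handling, is used or extended to catch a card that never stops signaling busy):

	/*
	 * Sketch only, not actual driver code: reporting no busy timeout
	 * limit makes the core keep R1B for CMD6/erase instead of
	 * converting to R1 + CMD13 polling. The driver must then make sure
	 * the controller does not hit its 11s HW data timeout while the
	 * card is still legitimately busy.
	 */
	host->mmc->max_busy_timeout = 0;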
Kind regards Uffe
Hi Uffe
Will go with 2nd option and will send patches out when ready.
Okay, good.
BTW, the Tegra host also uses SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK, so the data timeout is based on the host clock when using finite mode (HW busy detection based on the DATA TIMEOUT count value, for command operation timeouts < 11s on the Tegra host).
So, it looks like we can't set the host max_busy_timeout to 0 for the Tegra host to force R1B during SWITCH and SLEEP_AWAKE.
So, I was thinking of introducing a host capability MMC_CAP2_LONG_WAIT_HW_BUSY, which can be used for hosts supporting long or infinite HW busy wait detection, and updating mmc and mmc_ops to not convert R1B to R1 during SLEEP_AWAKE and SWITCH for hosts with this capability.
That seems reasonable; it probably becomes both easier and clearer by adding a new host cap.
In any case, let me help out and cook a patch for this for the core part (I leave the sdhci change to you). It may be a bit tricky, especially since I currently have a bunch of new changes queued for v5.7 that enable more users of mmc_poll_for_busy() in the core. Maybe I need to temporarily drop them, so we can fix these problems first. I will check.
Probably, I would also name the cap MMC_CAP_HW_NEED_RSP_BUSY, as that seems to describe the common problem we have for sdhci-omap/tegra.
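A minimal sketch of what that core-side check might become, assuming the cap ends up being called MMC_CAP_HW_NEED_RSP_BUSY (names as proposed in this thread, not necessarily what gets merged):

	/*
	 * Hosts that must see R1B (because their HW waits out the busy
	 * phase anyway) keep R1B even when the requested timeout exceeds
	 * host->max_busy_timeout; everyone else falls back to R1 + polling
	 * as before.
	 */
	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout) &&
	    !(host->caps & MMC_CAP_HW_NEED_RSP_BUSY))
		use_r1b_resp = false;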
Finally, it seems like MMC_CAP_WAIT_WHILE_BUSY should be set for sdhci-tegra, so while at it, perhaps you can cook a patch for that as well.
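The sdhci-tegra part of that last point could be as small as the following (sketch only; the exact spot in sdhci_tegra_probe() is up to you):

	/* Tell the core the host signals busy completion itself, so it
	 * doesn't need to poll with CMD13 (sketch only). */
	host->mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;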
Kind regards Uffe