On 3/2/20 8:50 AM, Ulf Hansson wrote:
On Mon, 2 Mar 2020 at 14:11, Faiz Abbas <faiz_abbas@ti.com> wrote:
Uffe,
On 26/02/20 8:51 pm, Ulf Hansson wrote:
+ Anders, Kishon
On Tue, 25 Feb 2020 at 17:24, Jon Hunter <jonathanh@nvidia.com> wrote:
On 25/02/2020 14:26, Ulf Hansson wrote:
...
However, from the core's point of view, the response is still requested; we just don't want the driver to wait for the card to stop signaling busy. Instead we want to deal with that via "polling" from the core.
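For reference, the core-side "polling" amounts to roughly the following. This is a simplified sketch loosely based on mmc_busy_status() in drivers/mmc/core/mmc_ops.c; the helper name here is made up and the error handling is trimmed for illustration:

	/*
	 * Poll for the card to stop signaling busy, either by reading
	 * DAT0 through the host driver or by falling back to CMD13
	 * polling. Simplified sketch, not the exact mainline code.
	 */
	static int busy_poll_sketch(struct mmc_card *card, unsigned int timeout_ms)
	{
		struct mmc_host *host = card->host;
		unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;

		do {
			bool busy;

			if (host->ops->card_busy) {
				/* The host driver can read the DAT0 line directly. */
				busy = host->ops->card_busy(host);
			} else {
				/* Fall back to CMD13 (SEND_STATUS) polling. */
				u32 status;
				int err = mmc_send_status(card, &status);

				if (err)
					return err;
				busy = R1_CURRENT_STATE(status) == R1_STATE_PRG ||
				       !(status & R1_READY_FOR_DATA);
			}

			if (!busy)
				return 0;
		} while (time_before(jiffies, timeout));

		return -ETIMEDOUT;
	}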
This is rather worrying behaviour, as it seems like the host driver doesn't really follow these expectations from the core's point of view. And mmc_flush_cache() is not the only case; we have erase, bkops, sanitize, etc. Are all of these working, or are they not really well tested?
I don't believe that they are well tested. We have a simple test to mount an eMMC partition, create a file, check the contents, remove the file and unmount. The timeouts always occur during unmounting.
Earlier, before my three patches, if the timeout_ms parameter provided to __mmc_switch() was zero, which was the case for mmc_flush_cache(), then __mmc_switch() simply skipped validating it against host->max_busy_timeout, which was wrong. In any case, this also meant that an R1B response was always used for mmc_flush_cache(), as you also indicated above. Perhaps this is the critical part where things can go wrong.
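To make that concrete, the relevant part of __mmc_switch() looks roughly like this (a simplified sketch, not the exact mainline code):

	bool use_r1b_resp = true;

	/*
	 * Old behaviour (simplified): a timeout_ms of 0 skipped the
	 * validation below entirely, so R1B was always used for e.g.
	 * mmc_flush_cache(), no matter what the host could handle:
	 *
	 *	if (timeout_ms && host->max_busy_timeout &&
	 *	    timeout_ms > host->max_busy_timeout)
	 *		use_r1b_resp = false;
	 */
	if (host->max_busy_timeout && timeout_ms > host->max_busy_timeout)
		use_r1b_resp = false;

	cmd.opcode = MMC_SWITCH;
	cmd.flags = MMC_CMD_AC;
	if (use_r1b_resp) {
		/* Let the controller do HW busy detection. */
		cmd.flags |= MMC_RSP_SPI_R1B | MMC_RSP_R1B;
		cmd.busy_timeout = timeout_ms;
	} else {
		/* Plain R1; busy is handled by polling from the core. */
		cmd.flags |= MMC_RSP_SPI_R1 | MMC_RSP_R1;
	}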
BTW, have you tried erase commands with the sdhci-tegra driver? If those are working fine, do you have any special treatment for them?
That I am not sure, but I will check.
Great, thanks. Looking forward to your report.
So, from my side, Anders Roxell and I have been collaborating on testing the behaviour on a TI Beagleboard x15 (remotely, with limited debug options), which uses the sdhci-omap variant. I am trying to get hold of an Nvidia Jetson TX2, but have not found one yet. These are the conclusions from the observed behaviour on the Beagleboard for the CMD6 cache flush command.
First, the reported host->max_busy_timeout is 2581 ms for the sdhci-omap driver in this configuration.
- As we all know by now, the cache flush command (CMD6) currently fails with -110. This is when MMC_CACHE_FLUSH_TIMEOUT_MS is set to 30 * 1000 (30s), which means __mmc_switch() drops the MMC_RSP_BUSY flag from the command.
- Changing MMC_CACHE_FLUSH_TIMEOUT_MS to 2000 (2s) means that the MMC_RSP_BUSY flag becomes set by __mmc_switch(), because the timeout_ms parameter is less than max_busy_timeout (2000 < 2581). Then everything works fine.
- Updating the code to again use 30s as MMC_CACHE_FLUSH_TIMEOUT_MS, but instead forcing MMC_RSP_BUSY to be set even when timeout_ms is greater than max_busy_timeout, also works fine (see the sketch just below this list).
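The third experiment is essentially this one-line hack in __mmc_switch(), shown here as a debug sketch only, not a proposed fix:

	/*
	 * Debug hack: keep the R1B response even when timeout_ms
	 * exceeds host->max_busy_timeout, instead of falling back to
	 * R1 and core polling.
	 */
	if (host->max_busy_timeout && timeout_ms > host->max_busy_timeout)
		use_r1b_resp = true; /* was: false */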
Clearly this indicates a problem that I think needs to be addressed in the sdhci driver. Of course, I could revert the three discussed patches to fix the problem, but that would only hide the issue, and I am sure we would get back to it sooner or later.
To fix the problem in the sdhci driver, I would appreciate if someone from TI and Nvidia can step in to help, as I don't have the HW on my desk.
Comments or other ideas of how to move forward?
Sorry I missed this earlier.
I don't have an X15 with me here, but I'm trying to set one up in our remote farm. In the meantime, I tried to reproduce this issue on two platforms (dra72-evm and am57xx-evm) and wasn't able to, because those eMMCs don't even have a cache. I will keep you updated when I get a board with an eMMC that has a cache.
Is there a way to reproduce this CMD6 issue with another operation?
Yes, most definitely.
Let me cook a debug patch for you that should trigger the problem for another CMD6 operation. I will post something later this evening or in the morning (Swedish timezone).
Kind regards Uffe
Hi Ulf,
I could reproduce this during suspend on Jetson TX1/TX2, since that is when it performs the mmc cache flush.
The timeout I see is for the switch status command (CMD13) issued after CMD6: the device-side CMD6 is still in flight when the host sends CMD13, because with timeout_ms changed to 30s we are now using the R1 response type.
Earlier we used a timeout_ms of 0 for the CMD6 cache flush. With that, the R1B response type is used, so the host waits for the busy state followed by the response from the device for CMD6, and then the data lines go high.
Now, with timeout_ms changed to 30s, we use the R1 response, and software waits out the busy phase by checking for the DAT0 line to go high.
With the R1B type, the host design works like this: after sending the command, at the end of completion after the end bit, it waits two cycles for the data line to go low (the busy state from the device), then waits for the response cycles, after which the data lines go back high, and only then do we issue the switch status CMD13.
With the R1 type, after the host sends the command, at the end of completion after the end bit, the DATA lines go high immediately (since it is R1), and the switch status CMD13 gets issued. By this time, it looks like CMD6 on the device side is still in flight, sending its status and data.
The 30s timeout is the wait time for the DAT0 line to go high, but with the R1 response type mmc_busy_status() returns success right away, and software sends the switch status CMD13 while the device side is apparently still processing CMD6; we simply do not wait long enough when we use the R1 response type.
Actually, CMD6 always uses an R1B response as per the spec.
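To illustrate, the two cases boil down to the response flags on the CMD6 request. A simplified sketch (not the exact mainline structures; the 30s value is just the MMC_CACHE_FLUSH_TIMEOUT_MS discussed in this thread):

	struct mmc_command cmd = {
		.opcode = MMC_SWITCH,
		/*
		 * R1B: the controller itself waits for the DAT0 busy
		 * phase to end before the core moves on to CMD13.
		 */
		.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC,
		.busy_timeout = 30000, /* ms */
	};

	/*
	 * Plain R1: the controller treats the command as done as soon
	 * as the response arrives, so the following CMD13 can race
	 * with the CMD6 that is still in flight on the device side.
	 */
	cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC;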
Thanks
Sowjanya