Uffe,
On 26/02/20 8:51 pm, Ulf Hansson wrote:
- Anders, Kishon
On Tue, 25 Feb 2020 at 17:24, Jon Hunter jonathanh@nvidia.com wrote:
On 25/02/2020 14:26, Ulf Hansson wrote:
...
However, from the core point of view, the response is still requested, only that we don't want the driver to wait for the card to stop signaling busy. Instead we want to deal with that via "polling" from the core.
This is rather worrying behaviour, as it seems the host driver doesn't really follow these expectations from the core point of view. And mmc_flush_cache() is not the only case, as we have erase, bkops, sanitize, etc. Are all these working, or are they not really well tested?
I don't believe that they are well tested. We have a simple test to mount an eMMC partition, create a file, check the contents, remove the file and unmount. The timeouts always occur during unmounting.
Earlier, before my three patches, if the timeout_ms parameter provided to __mmc_switch() was zero, which was the case for mmc_flush_cache(), __mmc_switch() simply skipped validating it against host->max_busy_timeout, which was wrong. In any case, this also meant that an R1B response was always used for mmc_flush_cache(), as you also indicated above. Perhaps this is the critical part where things can go wrong.
BTW, have you tried erase commands for sdhci tegra driver? If those are working fine, do you have any special treatments for these?
That I am not sure, but I will check.
Great, thanks. Looking forward to your report.
So, from my side, Anders Roxell and I have been collaborating on testing the behaviour on a TI Beagleboard x15 (remotely, with limited debug options), which uses the sdhci-omap variant. I am trying to get hold of an Nvidia Jetson TX2, but have not found one yet. These are the conclusions from the observed behaviour on the Beagleboard for the CMD6 cache flush command.
First, the reported host->max_busy_timeout is 2581 (ms) for the sdhci-omap driver in this configuration.
- As we all know by now, the cache flush command (CMD6) currently fails with -110. This is when MMC_CACHE_FLUSH_TIMEOUT_MS is set to 30 * 1000 (30s), which means __mmc_switch() drops the MMC_RSP_BUSY flag from the command.
- Changing MMC_CACHE_FLUSH_TIMEOUT_MS to 2000 (2s) means that the MMC_RSP_BUSY flag gets set by __mmc_switch(), because the timeout_ms parameter is less than max_busy_timeout (2000 < 2581). Then everything works fine.
- Updating the code to again use 30s as MMC_CACHE_FLUSH_TIMEOUT_MS, but forcing MMC_RSP_BUSY to be set even when timeout_ms is greater than max_busy_timeout, also works fine.
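To make the three cases above concrete, here is a minimal standalone sketch of the decision as I understand it from this thread. It is not the kernel code: sketch_use_r1b() and its force_r1b parameter are hypothetical names used only for illustration, and the values are the ones reported above for sdhci-omap.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, NOT a kernel function. It mirrors the decision
 * discussed in this thread: keep the R1B response (MMC_RSP_BUSY set, so
 * the host does HW busy detection) unless both timeouts are specified
 * and the command timeout exceeds what the host reports it can handle.
 *
 * force_r1b models the experiment in the third bullet, where
 * MMC_RSP_BUSY is kept regardless of the timeouts.
 */
static bool sketch_use_r1b(unsigned int timeout_ms,
                           unsigned int max_busy_timeout_ms,
                           bool force_r1b)
{
    if (force_r1b)
        return true;

    /* Command timeout too long for the host: fall back to R1 and poll. */
    if (timeout_ms && max_busy_timeout_ms &&
        timeout_ms > max_busy_timeout_ms)
        return false;

    return true;
}
```

With max_busy_timeout at 2581 ms, a 30000 ms timeout selects R1 (the failing case), a 2000 ms timeout selects R1B, and forcing R1B with 30000 ms corresponds to the third experiment.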
Clearly this indicates a problem that I think needs to be addressed in the sdhci driver. However, of course I can revert the three discussed patches to fix the problem, but that would only hide the issues and I am sure we would then get back to this issue, sooner or later.
To fix the problem in the sdhci driver, I would appreciate it if someone from TI and Nvidia could step in to help, as I don't have the HW on my desk.
Comments or other ideas of how to move forward?
Sorry I missed this earlier.
I don't have an X15 with me here, but I'm trying to set one up in our remote farm. In the meantime, I tried to reproduce this issue on two platforms (dra72-evm and am57xx-evm) and wasn't able to see it because those eMMCs don't even have a cache. I will keep you updated when I do get a board with an eMMC that has a cache.
Is there a way to reproduce this CMD6 issue with another operation?
Thanks, Faiz