On Fri, Jul 15, 2022 at 09:54:10AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 09:46:31AM +0200, Sascha Hauer wrote:
On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
06781a5026350 fixes the calculation of the DEVICE_BUSY_TIMEOUT register value from busy_timeout_cycles. However, busy_timeout_cycles itself is calculated incorrectly: it is based on the maximum page read time, but the timeout is also used for page write and block erase operations, which require timeouts that are orders of magnitude larger.
Fix this by calculating busy_timeout_cycles from the maximum of tBERS_max and tPROG_max.
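As a rough illustration of the calculation described above, here is a minimal standalone sketch, not the actual driver code. The helper names, the picosecond units, and the assumption that the DEVICE_BUSY_TIMEOUT field counts in units of 4096 GPMI clock cycles are all illustrative assumptions, not taken from the patch itself.

```c
#include <stdint.h>

/* Assumption: the DEVICE_BUSY_TIMEOUT field counts in units of 4096 clock cycles. */
#define BUSY_TIMEOUT_UNIT_CYCLES 4096u

/* Integer division that rounds up instead of truncating. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Hypothetical helper: convert a duration in picoseconds to clock cycles, rounding up. */
static uint64_t ps_to_cycles(uint64_t duration_ps, uint64_t period_ps)
{
	return div_round_up(duration_ps, period_ps);
}

/*
 * Hypothetical helper: derive the register field value from the longer of
 * the block erase time (tBERS_max) and the page program time (tPROG_max),
 * rather than from the much shorter page read time.
 */
static uint64_t busy_timeout_field(uint64_t tBERS_max_ps, uint64_t tPROG_max_ps,
				   uint64_t period_ps)
{
	uint64_t busy_timeout_ps =
		tBERS_max_ps > tPROG_max_ps ? tBERS_max_ps : tPROG_max_ps;
	uint64_t busy_timeout_cycles = ps_to_cycles(busy_timeout_ps, period_ps);

	return div_round_up(busy_timeout_cycles, BUSY_TIMEOUT_UNIT_CYCLES);
}
```

With made-up example values of tBERS_max = 10 ms, tPROG_max = 600 us and a 10 ns clock period, busy_timeout_field() returns 245, whereas a 200 us page read time alone would give only 5 under the same assumptions.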
06781a5026350 was merged in v5.19-rc4 and was then picked up by several stable kernels, including v5.15.51. After we upgraded to v5.15.51 we observed the issue that Sascha mentioned in his email [1].
As v5.19-rc6 was released yesterday and this fix is still not applied, v5.19-rc6 (and all stable kernels that picked up the backport) cause NAND flash data loss on imx targets.
I have backported this patch to our internal v5.15.51 based kernel on 4th July 2022 and I can confirm that it does indeed solve the NAND data loss on imx targets.
Is it possible for this patch to make it into v5.19-rc7?
No response, so sending the email to more people so the voice is heard. Sorry if this is not the proper way, but I think the issue is serious.
Current prepatch kernels starting with v5.19-rc4 and stable kernels starting with v5.4.202, v5.10.127, v5.15.51, v5.18.8 contain a "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting" that is wreaking havoc on i.MX[678] or i.MX28 devices with NAND "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1].
The solution is to either:
- Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
setting") and all its cherry-picks to stable branches, *OR*
- Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
based on program/erase times") [2]
Please do whatever you see fit.
I can do a stable release with this reverted, but I really expected to see the fix in linux-next by now at the very least. Does this driver not have an active maintainer and subsystem maintainer for some reason?
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Ok, let me do a round of stable releases so that people don't get hit by this now...
Hopefully this gets fixed up by 5.19-final.
Note, one way this could have been fixed up sooner is if it had been reported to the regression bot, as Linus and other subsystem maintainers do monitor that and will pick up patches that are dropped like this one.
thanks,
greg k-h