Commit 06781a5026350 fixed the calculation of the DEVICE_BUSY_TIMEOUT register value from busy_timeout_cycles. busy_timeout_cycles itself is calculated wrongly, though: it is based on the maximum page read time, but the timeout is also used for page write and block erase operations, which require orders of magnitude longer timeouts.
Fix this by calculating busy_timeout_cycles from the maximum of tBERS_max and tPROG_max.
This is, for now, the easiest and most obvious way to fix the driver. There's room for improvement though: the NAND_OP_WAITRDY_INSTR instruction tells us the desired timeout for the current operation, so we could program the timeout dynamically for each operation instead of setting a fixed timeout. We could also wire up the interrupt handler to actually detect and forward timeouts that occur while waiting for the chip to become ready.
As a side note, I verified that the change in 06781a5026350 is really correct. I wired up the interrupt handler in my tree and measured the time between starting the operation and the timeout interrupt handler coming in. The time increases by 41us with each step in the timeout register, which corresponds to 4096 clock cycles at the 99MHz clock that I have.
Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
---
 drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 889e403299568..93da23682d862 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -850,9 +850,10 @@ static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 	unsigned int tRP_ps;
 	bool use_half_period;
 	int sample_delay_ps, sample_delay_factor;
-	u16 busy_timeout_cycles;
+	unsigned int busy_timeout_cycles;
 	u8 wrn_dly_sel;
 	unsigned long clk_rate, min_rate;
+	u64 busy_timeout_ps;
 
 	if (sdr->tRC_min >= 30000) {
 		/* ONFI non-EDO modes [0-3] */
@@ -885,7 +886,8 @@ static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 	addr_setup_cycles = TO_CYCLES(sdr->tALS_min, period_ps);
 	data_setup_cycles = TO_CYCLES(sdr->tDS_min, period_ps);
 	data_hold_cycles = TO_CYCLES(sdr->tDH_min, period_ps);
-	busy_timeout_cycles = TO_CYCLES(sdr->tWB_max + sdr->tR_max, period_ps);
+	busy_timeout_ps = max(sdr->tBERS_max, sdr->tPROG_max);
+	busy_timeout_cycles = TO_CYCLES(busy_timeout_ps, period_ps);
 
 	hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
 		      BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
-----Original Message-----
From: Sascha Hauer <s.hauer@pengutronix.de>
Sent: Friday, July 1, 2022 6:04 AM
To: linux-mtd@lists.infradead.org
Cc: Han Xu <han.xu@nxp.com>; Miquel Raynal <miquel.raynal@bootlin.com>; kernel@pengutronix.de; Richard Weinberger <richard@nod.at>; Sascha Hauer <s.hauer@pengutronix.de>; stable@vger.kernel.org
Subject: [PATCH] mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times
Acked-by: Han Xu han.xu@nxp.com
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
06781a5026350 was merged in v5.19-rc4 and then was picked up by several stable kernels, including v5.15.51. After we upgraded to v5.15.51 we observed the issue that Sascha mentioned in his email [1].
As v5.19-rc6 was released yesterday and this fix is still not applied, v5.19-rc6 (and all stable kernels that picked up the backport) causes NAND flash data loss on imx targets.
I backported this patch to our internal v5.15.51-based kernel on 4 July 2022 and I can confirm that it does indeed solve the NAND data loss on imx targets.
Is it possible for this patch to make it to the v5.19-rc7?
Tested-by: Tomasz Moń tomasz.mon@camlingroup.com
[1] https://lore.kernel.org/linux-mtd/20220701091909.GE2387@pengutronix.de/
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
Is it possible for this patch to make it to the v5.19-rc7?
No response, so I am sending this email to more people so the voice is heard. Sorry if this is not the proper way, but I think the issue is serious.
Current prepatch kernels starting with v5.19-rc4, and stable kernels starting with v5.4.202, v5.10.127, v5.15.51 and v5.18.8, contain a "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting" that is wreaking havoc on i.MX[678] and i.MX28 devices with NAND: "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]
The solution is to either:
- Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting") and all its cherry-picks to stable branches, *OR*
- Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times") [2]
Please do whatever you see fit.
Best Regards, Tomasz Moń
[1] https://lore.kernel.org/linux-mtd/20220701091909.GE2387@pengutronix.de/ [2] https://lore.kernel.org/linux-mtd/20220701110341.3094023-1-s.hauer@pengutron...
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
The solution is to either:
- Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
setting") and all its cherry-picks to stable branches, *OR*
- Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
based on program/erase times") [2]
Please do whatever you see fit.
I can do a stable release with this reverted, but I really expected to see the fix in linux-next by now at the very least. Does this driver not have an active maintainer and subsystem maintainer for some reason?
thanks,
greg k-h
On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
I can do do a stable release with this reverted, but I really expected to see the fix in linux-next by now at the very least. Does this driver not have an active maintainer and subsystem maintainer for some reason?
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Sascha
On Fri, Jul 15, 2022 at 09:46:31AM +0200, Sascha Hauer wrote:
On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Ok, let me do a round of stable releases so that people don't get hit by this now...
Hopefully this gets fixed up by 5.19-final.
thanks,
greg k-h
On Fri, Jul 15, 2022 at 09:54:10AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 09:46:31AM +0200, Sascha Hauer wrote:
On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Ok, let me do a round of stable releases so that people don't get hit by this now...
Hopefully this gets fixed up by 5.19-final.
Note, one way this could have been fixed up sooner is if it had been reported to the regression bot, as Linus and other subsystem maintainers do monitor that and will pick up patches that are dropped like this one.
thanks,
greg k-h
----- Original Message -----
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Exactly.
Ok, let me do a round of stable releases so that people don't get hit by this now...
Thanks a lot for doing so.
Hopefully this gets fixed up by 5.19-final.
Sure, I'll pick up this patch.
Thanks, //richard
Hey folks,
richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
----- Original Message -----
My IRC history doesn't go back far enough, but if I recall correctly Miquel is on vacation, he would have picked up this patch for linux-next otherwise.
Exactly.
Indeed, I was off for an extended period of time, I'm (very) slowly catching up now.
Ok, let me do a round of stable releases so that people don't get hit by this now...
Thanks a lot for doing so.
Hopefully this gets fixed up by 5.19-final.
Sure, I'll pickup this patch.
Thanks Greg & Richard for the handling of this issue.
Cheers, Miquèl
On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal miquel.raynal@bootlin.com wrote:
Hello All,
As Tomasz stated previously, 06781a5026350 was merged in v5.19-rc4 and then was picked up by several stable kernels. While the fix made it into the 5.15 and 5.18 stable branches, it did not make it into the following branches, which are thus currently broken: 5.10.y and 5.17.y.
How do we get this patch applied to those stable branches as well to resolve this?
Best regards,
Tim
On Fri, Oct 21, 2022 at 2:55 PM Tim Harvey tharvey@gateworks.com wrote:
Some more details here, as I find that stable 5.4 is broken as well as 5.10 and 5.17:
The issue I see here is on a Gateworks imx6q-gw54xx board with a 2GiB Cypress NAND part:
[ 2.703324] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd5
[ 2.709815] nand: AMD/Spansion S34ML16G2
[ 2.713758] nand: 2048 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
[ 2.721990] Scanning device for bad blocks
[ 2.727025] Bad eraseblock 0 at 0x000000000000
[ 2.731628] Bad eraseblock 1 at 0x000000020000
[ 2.736191] Bad eraseblock 2 at 0x000000040000
...
The original regression came from commit d99e7feaed4c ("mtd: rawnand: gpmi: fix controller timings setting"), which was fixed by commit 156427b3123c ("mtd: rawnand: gpmi: Fix setting busy timeout setting"). That was later deemed an invalid fix, so it got reverted (thus breaking things again), and the issue was fixed instead with commit 0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times"), which did not make it into all the stable branches that had the original commit or the revert of the potential fix.
Here is the history of each affected stable branch as best I can tell (I've actually built/booted all of these trying to understand this, so fixed/broken refers to the issue I see above):
v5.4:
- v5.4.189 broken with commit 0d0ee651e72c ("mtd: rawnand: gpmi: fix controller timings setting")
- v5.4.202 fixed with commit 71c76f56b97c ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
- v5.4.206 broken with commit 15a3adfe7593 ("Revert "mtd: rawnand: gpmi: Fix setting busy timeout setting"")
*** still broken as of v5.4.219, but can be fixed by commit 0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")

v5.10:
- v5.10.110 broken with commit d99e7feaed4c ("mtd: rawnand: gpmi: fix controller timings setting")
- v5.10.127 fixed with commit 156427b3123c ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
- v5.10.131 broken with commit cc5ee0e0eed0 ("Revert "mtd: rawnand: gpmi: Fix setting busy timeout setting"")
*** still broken as of v5.10.149, but can be fixed by commit 0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")

v5.15:
- v5.15.33 broken with commit 8480efe815e5 ("mtd: rawnand: gpmi: fix controller timings setting")
- v5.15.51 fixed with commit 0af674e7a764 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
- v5.15.55 broken with commit c80b15105a08 ("Revert "mtd: rawnand: gpmi: Fix setting busy timeout setting"")
- v5.15.58 fixed with commit 212a5360ef40 ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")

v5.17:
- v5.17.2 broken with commit 4b2eed1d1ee8 ("mtd: rawnand: gpmi: fix controller timings setting")
*** still broken as of v5.17.15, but can be fixed by commit 0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")

v5.18:
- v5.18-rc2 broken with commit 2970bf5a32f0 ("mtd: rawnand: gpmi: fix controller timings setting")
- v5.18.8 fixed with commit 8d45b05f7eaf ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
- v5.18.12 broken with commit f8d01e0f004a ("Revert "mtd: rawnand: gpmi: Fix setting busy timeout setting"")
- v5.18.15 fixed with commit c283223e769d ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")

v5.19:
- v5.19-rc3 broken by commit 06781a502635 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
- v5.19-rc7 fixed by commit 0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times")
Best Regards,
Tim
Hi Tim,
tharvey@gateworks.com wrote on Fri, 21 Oct 2022 14:55:15 -0700:
How do we get this patch applied to those stable branches as well to resolve this?
It is likely that the original patch (targeting a mainline kernel) did not apply to those branches. In this case you can adapt the fix to the concerned kernels and send it to stable@ (following the Documentation guidelines for backports).
Thanks, Miquèl
On Mon, Oct 24, 2022 at 1:01 AM Miquel Raynal miquel.raynal@bootlin.com wrote:
It is likely that the original patch (targeting a mainline kernel) did not apply to those branches. In this case you can adapt the fix to the concerned kernels and send it to stable@ (following the Documentation guidelines for backports).
Miquèl,
Thanks for the pointer. You are correct that this patch, which resolves the regression, does not apply directly to the 5.4/5.10/5.17 stable trees. I'm looking over https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html and I'm not clear what I need to put in the commit to make it clear that it only applies to those specific trees. Do I simply adjust the 'Fixes' tag to reference the commit from that specific stable branch and send one for each stable branch (so each would have a different sha in the Fixes tag), while also adding 'commit <sha> upstream' to the top?
Best Regards.
Tim
Hi Tim,
tharvey@gateworks.com wrote on Tue, 25 Oct 2022 15:02:27 -0700:
On Mon, Oct 24, 2022 at 1:01 AM Miquel Raynal miquel.raynal@bootlin.com wrote:
Miquèl,
Thanks for the pointer. You are correct that this patch which resolves the regression does not apply directly to 5.4/5.10/5.17 stable trees. I'm looking over https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html
Option 3 fits, right?
and I'm not clear what I need to put in the commit to make it clear that it only applies to those specific trees. Do I simply adjust the 'Fixes' tag to address the commit from that specific stable branch and send one for each stable branch (thus each would have a different sha in the Fixes tag) while also adding the 'commit <sha> upstream' to the top?
Here are the failures: https://lore.kernel.org/stable/165858381623360@kroah.com/ https://lore.kernel.org/stable/165858381472139@kroah.com/ https://lore.kernel.org/stable/165858381313218@kroah.com/
You should not adjust the Fixes tag; this tag should really reflect what you are trying to fix, not some kind of target for the backport. However, since you're changing the commit content you can also change the commit log, with e.g. the original upstream commit referenced somewhere in plain text. You can target a kernel version on the Cc: stable line (see the doc) and you can also use
--subject-prefix="PATCH stable <version>"
during git-format-patch. Even though I'm not sure this is actually needed, it's more a courtesy to let reviewers know what you are doing.
Here is an example, maybe not the best one (forget about the RESEND) but a typical case: https://lore.kernel.org/linux-mtd/20220223174431.1083-1-f.fainelli@gmail.com...
Thanks, Miquèl
On Tue, Oct 25, 2022 at 03:02:27PM -0700, Tim Harvey wrote:
Thanks for the pointer. You are correct that this patch which resolves the regression does not apply directly to 5.4/5.10/5.17 stable trees. I'm looking over https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html and I'm not clear what I need to put in the commit to make it clear that it only applies to those specific trees. Do I simply adjust the 'Fixes' tag to address the commit from that specific stable branch and send one for each stable branch (thus each would have a different sha in the Fixes tag) while also adding the 'commit <sha> upstream' to the top?
Please send multiple patches and say below the --- line which stable tree it should be applied to.
Note that 5.17 is long end-of-life, always check the front page of kernel.org for the active kernel versions.
thanks,
greg k-h