On Mon, Mar 03, 2025 at 03:58:30PM +0100, Eric wrote:
Hi Niklas
On 03/03/2025 at 07:25, Niklas Cassel wrote:
So far, this just sounds like a bug where UEFI cannot detect your SSD.
But it is detected during cold boot, though.
UEFI problems should be reported to your BIOS vendor.
I'll try to see what can be done, however I am not sure how responsive they will be for this board...
It would be interesting to see if _Linux_ can detect your SSD, after a reboot, without UEFI involvement.
If you kexec into the same kernel as you are currently running: https://manpages.debian.org/testing/kexec-tools/kexec.8.en.html
Do you see your SSD in the kexec'd kernel?
Sorry, I've tried that using several methods (systemctl kexec / kexec --load + kexec -e / kexec --load + shutdown --reboot now) and it failed each time.
I *don't* think it is related to this bug, however, because each time the process got stuck just after displaying "kexec_core: Starting new kernel".
I just tried (as root):
# kexec -l /boot/vmlinuz-6.13.5-200.fc41.x86_64 --initrd=/boot/initramfs-6.13.5-200.fc41.x86_64.img --reuse-cmd
# kexec -e
and FWIW, kexec worked fine.
Did you specify an initrd ? did you specify --reuse-cmd ?
If not, please try it.
It would be interesting to see if Linux can detect your SATA drive after a kexec. If it can't, then we need to report the issue to your drive vendor (Samsung).
Kind regards, Niklas
On Thu, Mar 06, 2025 at 11:37:08AM +0100, Niklas Cassel wrote:
Did you specify an initrd ? did you specify --reuse-cmd ?
Sorry, typo:
s/--reuse-cmd/--reuse-cmdline/
Kind regards, Niklas
On 06/03/2025 at 11:40, Niklas Cassel wrote:
Did you specify an initrd ? did you specify --reuse-cmd ?
At one time, I did, yes. I can't figure out what's wrong with kexec, but working from the assumption that another way around the UEFI's failure to wake the disk might yield the information you're looking for, I installed the same system on a USB stick, along with grub, so that the reboot no longer depends on whether the UEFI sees the SSD. I'll attach dmesg extracts (grep on ata or ahci) to this mail.
One is the dmesg after cold booting from the USB stick, the other after rebooting from the USB stick. First of all, the visible result: the SSD is not detected by Linux at reboot (but is when cold booting).
Here is what changes:
eric@gwaihir:~$ diff /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-coldboot.untimed.txt /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-reboot.untimed.txt
4c4
< ahci 0000:00:11.0: 4/4 ports implemented (port mask 0x3c)
---
> ahci 0000:00:11.0: 3/3 ports implemented (port mask 0x38)
14c14
< ata3: SATA max UDMA/133 abar m1024@0xfeb0b000 port 0xfeb0b200 irq 19 lpm-pol 3
---
> ata3: DUMMY
27,28d26
< ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
< ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
29a28
> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
31,34d29
< ata3.00: Model 'Samsung SSD 870 QVO 2TB', rev 'SVQ02B6Q', applying quirks: noncqtrim zeroaftertrim noncqonati
< ata3.00: supports DRM functions and may not be fully accessible
< ata3.00: ATA-11: Samsung SSD 870 QVO 2TB, SVQ02B6Q, max UDMA/133
< ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (not used)
37a33
> ata5.00: configured for UDMA/100
40d35
< ata5.00: configured for UDMA/100
43,46d37
< ata3.00: Features: Trust Dev-Sleep
< ata3.00: supports DRM functions and may not be fully accessible
< ata3.00: configured for UDMA/133
< scsi 2:0:0:0: Direct-Access ATA Samsung SSD 870 2B6Q PQ: 0 ANSI: 5
50,51d40
< ata3.00: Enabling discard_zeroes_data
< ata3.00: Enabling discard_zeroes_data
I hope this is useful for diagnosing the problem.
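For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the same idea: strip the timestamps, keep only the ata/ahci lines, and list what changed between two captures. The function names are illustrative, and the case-insensitive match is an assumption about how the extracts were grepped.

```python
import difflib
import re

# Matches the "[    1.234567] " prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def storage_lines(dmesg_text):
    """Strip timestamps and keep lines mentioning ata or ahci (like grep -i)."""
    kept = []
    for line in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", line)
        if re.search(r"ata|ahci", line, re.IGNORECASE):
            kept.append(line)
    return kept


def changed_lines(cold, warm):
    """Return only the removed ('-') and added ('+') lines between two captures."""
    diff = difflib.unified_diff(cold, warm, lineterm="", n=0)
    return [d for d in diff
            if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]


cold = storage_lines(
    "[    1.000000] ahci 0000:00:11.0: 4/4 ports implemented (port mask 0x3c)\n"
    "[    1.200000] usb 1-1: new high-speed USB device\n"
)
warm = storage_lines(
    "[    1.000000] ahci 0000:00:11.0: 3/3 ports implemented (port mask 0x38)\n"
)
for line in changed_lines(cold, warm):
    print(line)
```

Removing the timestamps first matters because otherwise nearly every line would differ between two boots.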
Sorry, typo:
s/--reuse-cmd/--reuse-cmdline/
Kind regards, Niklas
Kind regards,
Eric
Hello Eric,
On Thu, Mar 06, 2025 at 01:27:17PM +0100, Eric wrote:
I installed the same system on a USB stick, on which I also installed grub, so that the reboot is made independent of whether the UEFI sees the SSD disk or not. I'll attach dmesg extracts (grep on ata or ahci) to this mail.
Excellent idea!
One is the dmesg after coldbooting from the USB stick, the other is rebooting on the USB stick. First of all, the visible result : the SSD is not detected by linux at reboot (but is when coldbooting).
I hope this is useful for diagnosing the problem.
It is indeed!
Wow.
The problem does not appear to be with the SSD firmware.
The problem appears to be that your AHCI controller reports different values in the PI (Ports Implemented) register.
This is supposed to be a read-only register :)
At cold boot the print is: 4/4 ports implemented (port mask 0x3c) meaning ports 1,2 are not implemented (DUMMY ports).
At reboot the print is: 3/3 ports implemented (port mask 0x38) meaning ports 1,2,3 are not implemented (DUMMY ports).
So, the problem is that your AHCI controller appears to report different values in the PI register.
Most likely, if the AHCI controller reported the same register values the second boot, libata would be able to scan and detect the drive correctly.
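The mask arithmetic above can be checked with a short sketch. It assumes libata's 1-based ataN numbering (bit 0 of PI corresponds to ata1) and six port slots, inferred from ata6 being the highest port in the logs; the function names are illustrative, not kernel API.

```python
def implemented_ports(port_mask, n_ports=6):
    """Return the 1-based ataN numbers whose bit is set in the PI mask."""
    return [bit + 1 for bit in range(n_ports) if port_mask & (1 << bit)]


def dummy_ports(port_mask, n_ports=6):
    """Return the 1-based ataN numbers that libata would mark as DUMMY."""
    return [bit + 1 for bit in range(n_ports) if not port_mask & (1 << bit)]


cold_boot = 0x3C  # PI value printed at cold boot
reboot = 0x38     # PI value printed after a warm reboot

print(implemented_ports(cold_boot))  # [3, 4, 5, 6]
print(implemented_ports(reboot))     # [4, 5, 6]
# Ports present at cold boot but missing after reboot:
lost = sorted(set(implemented_ports(cold_boot)) - set(implemented_ports(reboot)))
print(lost)  # [3]
```

The only bit that differs is bit 2, which is ata3, the port the Samsung SSD is attached to.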
What AHCI controller is this?
$ sudo lspci -nns 0000:00:11.0
Which kernel version are you using?
Please test with v6.14-rc5 as there was a bug in v6.14-rc4 where mask_port_map would get incorrectly set. (Although, this bug should only affect device tree based platforms. Most often when using UEFI, you do not use device tree.)
I do see that your AHCI controller is < AHCI 1.3, so we do take this path: https://github.com/torvalds/linux/blob/v6.14-rc5/drivers/ata/libahci.c#L571-...
Could you please provide a full dmesg?
Also, it would be helpful if you could print every time we read/write the PI register. (Don't ask me why libata writes a read-only register... we were not always the maintainers for this driver...)
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e7ace4b10f15..dd837834245b 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -533,6 +533,7 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
 
 	/* Override the HBA ports mapping if the platform needs it */
 	port_map = readl(mmio + HOST_PORTS_IMPL);
+	dev_err(dev, "%s:%d PI: read: %#lx\n", __func__, __LINE__, port_map);
 	if (hpriv->saved_port_map && port_map != hpriv->saved_port_map) {
 		dev_info(dev, "forcing port_map 0x%lx -> 0x%x\n",
 			 port_map, hpriv->saved_port_map);
@@ -629,6 +630,7 @@ static void ahci_restore_initial_config(struct ata_host *host)
 	if (hpriv->saved_cap2)
 		writel(hpriv->saved_cap2, mmio + HOST_CAP2);
 	writel(hpriv->saved_port_map, mmio + HOST_PORTS_IMPL);
+	dev_err(host->dev, "%s:%d PI: wrote: %#x\n", __func__, __LINE__, hpriv->saved_port_map);
 	(void) readl(mmio + HOST_PORTS_IMPL);	/* flush */
 
 	for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) {
Kind regards, Niklas
Hello Niklas,
I'll begin with the quick parts; changing the kernel will take more time, as this box is fine for family use but takes a while to compile.
On 07/03/2025 at 10:53, Niklas Cassel wrote:
Hello Eric,
On Thu, Mar 06, 2025 at 01:27:17PM +0100, Eric wrote:
I installed the same system on a USB stick, on which I also installed grub, so that the reboot is made independent of whether the UEFI sees the SSD disk or not. I'll attach dmesg extracts (grep on ata or ahci) to this mail. [...]
I hope this is useful for diagnosing the problem.
It is indeed!
Wow.
The problem does not appear to be with the SSD firmware.
The problem appears to be that your AHCI controller reports different values in the PI (Ports Implemented) register.
This is supposed to be a read-only register :)
Hmm ... puzzled, but weird things roam in the wild ;)
At cold boot the print is: 4/4 ports implemented (port mask 0x3c) meaning ports 1,2 are not implemented (DUMMY ports).
At reboot the print is: 3/3 ports implemented (port mask 0x38) meaning ports 1,2,3 are not implemented (DUMMY ports).
So, the problem is that your AHCI controller appears to report different values in the PI register.
Most likely, if the AHCI controller reported the same register values the second boot, libata would be able to scan and detect the drive correctly.
What AHCI controller is this?
$ sudo lspci -nns 0000:00:11.0
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)
Which kernel version are you using?
This is trixie's latest, 6.12.12.
Please test with v6.14-rc5 as there was a bug in v6.14-rc4 where mask_port_map would get incorrectly set. (Although, this bug should only affect device tree based platforms. Most often when using UEFI, you do not use device tree.)
OK, I think I'll try directly with the patch, as it will save me a kernel build.
I do see that your AHCI controller is < AHCI 1.3, so we do take this path: https://github.com/torvalds/linux/blob/v6.14-rc5/drivers/ata/libahci.c#L571-...
Could you please provide a full dmesg?
Of course, full reboot dmesg attached to this message.
Kind regards,
Eric
On 08/03/2025 at 11:05, Eric wrote:
Hello Niklas,
I'll begin with the quick parts, changing kernel will take more time, this box is fit for family use but takes a bit of time to compile.
And now, results with the patched 6.14-rc5
On 07/03/2025 at 10:53, Niklas Cassel wrote:
The problem appears to be that your AHCI controller reports different values in the PI (Ports Implemented) register.
I have attached to this message the journal from a coldboot + reboot session.
Kind regards
Eric
On Sat, Mar 08, 2025 at 11:05:36AM +0100, Eric wrote:
$ sudo lspci -nns 0000:00:11.0
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)
Ok, so some old ATI controller that seems to have a bunch of workarounds.
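As a side note, the vendor and device IDs that the driver matches on are the bracketed pair at the end of the lspci -nn line. A minimal sketch of pulling them out, where the helper name pci_ids is purely illustrative:

```python
import re


def pci_ids(lspci_line):
    """Return (vendor_id, device_id) from an `lspci -nn` line, or None."""
    match = re.search(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]", lspci_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = ("00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] "
        "SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)")
vendor, device = pci_ids(line)
print(f"vendor={vendor:#06x} device={device:#06x}")
```

The class code in the first bracket ([0106]) has no colon, so the pattern skips it and only picks up the vendor:device pair.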
Mario, do you know anything about this AHCI controller?
""" 3.1.4 Offset 0Ch: PI – Ports Implemented
This register indicates which ports are exposed by the HBA. It is loaded by the BIOS. It indicates which ports that the HBA supports are available for software to use. For example, on an HBA that supports 6 ports as indicated in CAP.NP, only ports 1 and 3 could be available, with ports 0, 2, 4, and 5 being unavailable.
Software must not read or write to registers within unavailable ports.
The intent of this register is to allow system vendors to build platforms that support less than the full number of ports implemented on the HBA silicon. """
It seems quite clear that it is a BIOS bug. It is understandable that HBA vendors reuse the same silicon, but I would expect the BIOS to always write the same value to the PI register.
Kind regards, Niklas
On 3/10/2025 11:24, Niklas Cassel wrote:
On Sat, Mar 08, 2025 at 11:05:36AM +0100, Eric wrote:
$ sudo lspci -nns 0000:00:11.0
00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)
Ok, so some old ATI controller that seems to have a bunch of workarounds.
Mario, do you know anything about this AHCI controller?
Unfortunately not; this is one of those "Before my time" type of things. Let me add Shyam and Basavaraj, they may know more about it.
Something that comes to my mind though is this patch:
https://lore.kernel.org/linux-pci/20241208074147.22945-1-kaihengf@nvidia.com...
It's been shown to fix several issues where the Linux kernel doesn't put the devices into the proper state from the shutdown callbacks.
Maybe it helps here too?
Hi,
On 7-Mar-25 10:53, Niklas Cassel wrote:
The problem does not appear to be with the SSD firmware.
The problem appears to be that your AHCI controller reports different values in the PI (Ports Implemented) register.
This is supposed to be a read-only register :)
At cold boot the print is: 4/4 ports implemented (port mask 0x3c) meaning ports 1,2 are not implemented (DUMMY ports).
At reboot the print is: 3/3 ports implemented (port mask 0x38) meaning ports 1,2,3 are not implemented (DUMMY ports).
So, the problem is that your AHCI controller appears to report different values in the PI register.
Most likely, if the AHCI controller reported the same register values the second boot, libata would be able to scan and detect the drive correctly.
I think that the port-mask register is only read-only from an OS point of view; the BIOS/UEFI/firmware can likely set it, e.g. to exclude ports which are not enabled on the motherboard (e.g. an M.2 slot which can do both PCIe and SATA and is used in PCIe mode, so the SATA port on that slot should be ignored).
What we seem to be hitting here is a bug where the UEFI cannot detect the SATA SSD after reboot if ALPM was used by the OS before reboot, and the UEFI's SATA driver responds to the failed detection by clearing the bit in the port-mask register.
The UEFI not detecting the disk after reboot when ALPM was in use also matches with not being able to boot from the disk after reboot.
I think what would be worth a try would be to disable ALPM on reboot from a driver shutdown hook. IIRC the ALPM level can be changed at runtime from a sysfs file, so we should be able to do the same at shutdown ?
It's been a while since I last touched the AHCI code, so I hope someone else can write a proof-of-concept patch with the shutdown handler disabling ALPM on reboot?
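For reference, the runtime policy mentioned above can be inspected from userspace. The sketch below assumes the per-host attribute lives at /sys/class/scsi_host/hostN/link_power_management_policy; the function name is illustrative, and it only reads the value, it is not the shutdown-hook patch being asked for.

```python
from pathlib import Path


def read_lpm_policies(base="/sys/class/scsi_host"):
    """Map each SCSI host name to its current link power management policy."""
    policies = {}
    for attr in sorted(Path(base).glob("host*/link_power_management_policy")):
        policies[attr.parent.name] = attr.read_text().strip()
    return policies


if __name__ == "__main__":
    for host, policy in read_lpm_policies().items():
        print(f"{host}: {policy}")
```

Writing a policy name to the same file (as root) changes the policy at runtime, which is what makes a shutdown-time switch plausible.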
Regards,
Hans
Hello Hans,
On Mon, Mar 10, 2025 at 10:34:13AM +0100, Hans de Goede wrote:
I think that the port-mask register is only read-only from an OS point of view; the BIOS/UEFI/firmware can likely set it, e.g. to exclude ports which are not enabled on the motherboard (e.g. an M.2 slot which can do both PCIe and SATA and is used in PCIe mode, so the SATA port on that slot should be ignored).
What we seem to be hitting here is a bug where the UEFI cannot detect the SATA SSD after reboot if ALPM was used by the OS before reboot, and the UEFI's SATA driver responds to the failed detection by clearing the bit in the port-mask register.
The UEFI not detecting the disk after reboot when ALPM was in use also matches with not being able to boot from the disk after reboot.
If we look at dmesg:
ahci 0000:00:11.0: AHCI vers 0001.0200, 32 command slots, 6 Gbps, SATA mode
ahci 0000:00:11.0: 3/3 ports implemented (port mask 0x38)
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
We can see that the controller supports slumber, partial, and aggressive link power management ("pm").
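A minimal sketch of reading those three capabilities out of the flags line; the helper name and the short descriptions are illustrative, while the token meanings follow the sentence above.

```python
# Capability tokens in the flags line that relate to link power management.
LPM_FLAGS = {
    "pm": "aggressive link power management",
    "slum": "slumber state",
    "part": "partial state",
}


def parse_ahci_flags(line):
    """Return the set of capability tokens after 'flags:' in an ahci dmesg line."""
    _, _, tail = line.partition("flags:")
    return set(tail.split())


line = "ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part"
flags = parse_ahci_flags(line)
for token, meaning in LPM_FLAGS.items():
    print(f"{token:5} {meaning}: {'yes' if token in flags else 'no'}")
```

Matching whole tokens rather than substrings keeps "pm" from being confused with "pmp" (port multiplier).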
A COMRESET is supposed to take the device out of partial or slumber.
Now, we do not know if the BIOS code sends a COMRESET, but it definitely should.
Anyway, it is stated in AHCI 1.3.1 "10.1 Software Initialization of HBA",
"To aid system software during runtime, the BIOS shall ensure that the following registers are initialized to values that are reflective of the capabilities supported by the platform."
"-PI (ports implemented)"
I think what would be worth a try would be to disable ALPM on reboot from a driver shutdown hook. IIRC the ALPM level can be changed at runtime from a sysfs file, so we should be able to do the same at shutdown ?
It's been a while since I last touched the AHCI code, so I hope someone else can write a proof-of-concept patch with the shutdown handler disabling ALPM on reboot?
I mean, that would be a quirk, and if such a quirk is created, it should only be applied for buggy BIOS versions.
(Since BIOS is supposed to initialize the PI register properly.)
If ahci.mobile_lpm_policy=1 or ahci.mobile_lpm_policy=2 works around your buggy BIOS, then I suggest you keep that until your BIOS vendor manages to release a new BIOS version.
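To see which policy a port actually ended up with, the "lpm-pol N" value on the ataN line in dmesg can be read back. The sketch below assumes the number-to-name mapping follows libata's enum ata_lpm_policy; that table is an assumption here and should be checked against include/linux/libata.h for the kernel in use.

```python
import re

# Assumed mapping (verify against your kernel's enum ata_lpm_policy).
LPM_POLICY_NAMES = {
    0: "keep_firmware_settings",
    1: "max_performance",
    2: "medium_power",
    3: "med_power_with_dipm",
    4: "min_power_with_partial",
    5: "min_power",
}


def lpm_policy_from_dmesg(line):
    """Extract the numeric 'lpm-pol N' value from an ataN port line, if present."""
    match = re.search(r"\blpm-pol (\d+)\b", line)
    return int(match.group(1)) if match else None


line = ("ata3: SATA max UDMA/133 abar m1024@0xfeb0b000 "
        "port 0xfeb0b200 irq 19 lpm-pol 3")
policy = lpm_policy_from_dmesg(line)
print(policy, LPM_POLICY_NAMES.get(policy, "unknown"))
```

With ahci.mobile_lpm_policy=1 or 2 on the kernel command line, the same line would be expected to report the lower value.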
Kind regards, Niklas
Hi Niklas,
On 10-Mar-25 7:13 PM, Niklas Cassel wrote:
I mean, that would be a quirk, and if such a quirk is created, it should only be applied for buggy BIOS versions.
(Since BIOS is supposed to initialize the PI register properly.)
If ahci.mobile_lpm_policy=1 or ahci.mobile_lpm_policy=2 works around your buggy BIOS, then I suggest you keep that until your BIOS vendor manages to release a new BIOS version.
I agree with you that this is a BIOS bug of the motherboard in question and/or a bad interaction between the ATI SATA controller and Samsung SSD 870* models. Note that given the age of the motherboard there are likely not going to be any BIOS updates fixing this though.
Certainly ahci.mobile_lpm_policy=x can be used to work around this, but going by my experience from being involved in resolving https://bugzilla.kernel.org/show_bug.cgi?id=201693, which took a long time to resolve and has many comments (1), I'm afraid that we are going to see more users hit this. This seems to be another case of Samsung SATA SSDs and ATI SATA chipsets not liking each other, but this time the problem is triggered by LPM rather than by NCQ. We likely did not hit this the last time, when we were seeing a lot of users reporting issues on this combo, because so far LPM has defaulted to off in these cases.
Note that in commit 7a8526a5cd51 which fixes the bug linked above, we already disable all NCQ use for "Samsung SSD 870*" models when used together with SATA controllers with a PCI vendor-id of ATI because of various severe issues when it is enabled.
I strongly believe that to avoid further regressions from commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") on ATI SATA controller + Samsung SSD combinations we should probably extend the special handling of ATI SATA chipsets to also disable LPM.
IOW add a new ATA_QUIRK_NO_LPM_ON_ATI flag which mirrors how the current ATA_QUIRK_NO_NCQ_ON_ATI works but then for LPM and set that for "Samsung SSD 870*".
I can prepare a patch for this if that sounds like an acceptable solution to you.
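To make the proposal concrete, here is a rough sketch of the matching logic such a quirk would imply: a model glob plus a check of the controller's PCI vendor ID. Everything in it is hypothetical (the table, the function name, the return convention); it only illustrates the idea and is not the kernel implementation.

```python
from fnmatch import fnmatchcase

PCI_VENDOR_ID_ATI = 0x1002  # vendor ID shown by lspci for this controller

# Hypothetical table: model patterns that would lose LPM behind an ATI controller.
NO_LPM_ON_ATI_MODELS = ["Samsung SSD 870*"]


def lpm_allowed(model, controller_vendor_id):
    """Return False when the proposed quirk would disable LPM for this pairing."""
    if controller_vendor_id != PCI_VENDOR_ID_ATI:
        return True
    return not any(fnmatchcase(model, pat) for pat in NO_LPM_ON_ATI_MODELS)


print(lpm_allowed("Samsung SSD 870 QVO 2TB", 0x1002))  # False
print(lpm_allowed("Samsung SSD 870 QVO 2TB", 0x8086))  # True
print(lpm_allowed("MAXTOR STM3250310AS", 0x1002))      # True
```

Keying on both the drive model and the controller vendor leaves the same SSD free to use LPM behind other controllers.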
Regards,
Hans
1) I don't know if you know, but I'm the author of the initial addition of the "low power policy board type" list, because back then we needed ALPM to reach high PC-states (e.g. PC10) on Broadwell and newer Intel SoCs, while at the same time there were reports that ALPM was causing issues. I also added quite a few of the initial NOLPM __ata_dev_quirks[] entries.
Hello Hans, Eric,
On Mon, Mar 10, 2025 at 09:12:13PM +0100, Hans de Goede wrote:
I agree with you that this is a BIOS bug of the motherboard in question and/or a bad interaction between the ATI SATA controller and Samsung SSD 870* models. Note that given the age of the motherboard there are likely not going to be any BIOS updates fixing this though.
Looking at the number of quirks for some of the ATI SB7x0/SB8x0/SB9x0 SATA controllers, they really look like something special (not in a good way): https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L236-L24...
- Ignore SError internal
- No MSI
- Max 255 sectors
- Broken 64-bit DMA
- Retry SRST (software reset)
And that is even without the weird "disable NCQ but only for Samsung SSD 8xx drives" quirk when using these ATI controllers.
What does bother me is that we don't know if it is this specific mobo/BIOS:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A99X EVO R2.0
Version: Rev 1.xx

M5A99X EVO R2.0 BIOS 2501
Version 2501
3.06 MB
2014/05/14
that should have a NOLPM quirk, like we do for specific BIOSes: https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L1402-L1...
Or if it is this ATI SATA controller that is always broken when it comes to LPM, regardless of the drive, or if it is only Samsung drives.
Considering the dmesg comparing cold boot and reboot, the Maxtor drive and the ASUS ATAPI device seem to be recognized correctly.
Eric, could you please run:
$ sudo hdparm -I /dev/sdX | grep "interface power management"
on both your Samsung and Maxtor drive? (A star to the left of the feature means that the feature is enabled)
One guess... perhaps it could be Device Initiated PM that is broken with these controllers? (Even though the controller does claim to support it.)
Eric, could you please try this patch:
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f813dbdc2346..ca690fde8842 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -244,7 +244,7 @@ static const struct ata_port_info ahci_port_info[] = {
 	},
 	[board_ahci_sb700] = {	/* for SB700 and SB800 */
 		AHCI_HFLAGS	(AHCI_HFLAG_IGN_SERR_INTERNAL),
-		.flags		= AHCI_FLAG_COMMON,
+		.flags		= AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
 		.pio_mask	= ATA_PIO4,
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_pmp_retry_srst_ops,
Normally, I do think that we need more reports, to see if it is just this specific BIOS, or all the ATI SB7x0/SB8x0/SB9x0 SATA controllers that are broken...
...but, considering how many quirks these ATI controllers have already...
...and the fact that the one (Dieter) who reported that his Samsung SSD 870 QVO could enter deeper sleep states just fine was running an Intel AHCI controller (with the same FW version as Eric), I would be open to a patch that sets ATA_FLAG_NO_LPM for all these ATI controllers.
Or an ATA_QUIRK_NO_LPM_ON_ATI, like you suggested, if we are certain that it is only Samsung drives that don't work with these ATI SATA controllers.
Kind regards, Niklas
Hi Niklas
On 11/03/2025 at 15:14, Niklas Cassel wrote:
Hello Hans, Eric,
Eric, could you please run:
$ sudo hdparm -I /dev/sdX | grep "interface power management"
on both your Samsung and Maxtor drive? (A star to the left of the feature means that the feature is enabled)
Here is the result (apparently PM is enabled on the Maxtor too, but it doesn't create the same problem):
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-MAXTOR_STM3250310AS_6RY2WB82 | grep "interface power management"
   *    Device-initiated interface power management
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_S5RPNF0T419459E | grep "interface power management"
   *    Device-initiated interface power management
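For checking several drives at once, the star convention can be tested mechanically. A minimal sketch, assuming the hdparm -I feature line looks like the output above; the function name dipm_enabled is illustrative.

```python
def dipm_enabled(hdparm_output):
    """Return True/False for the DIPM feature line, or None if it is absent."""
    for line in hdparm_output.splitlines():
        if "Device-initiated interface power management" in line:
            # hdparm -I prefixes enabled features with '*'.
            return line.strip().startswith("*")
    return None


sample = "\t   *\tDevice-initiated interface power management\n"
print(dipm_enabled(sample))
```

A result of None means the drive does not list the feature at all, which is different from listing it as disabled.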
One guess... perhaps it could be Device Initiated PM that is broken with these controllers? (Even though the controller does claim to support it.)
Eric, could you please try this patch:
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f813dbdc2346..ca690fde8842 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -244,7 +244,7 @@ static const struct ata_port_info ahci_port_info[] = {
 	},
 	[board_ahci_sb700] = {	/* for SB700 and SB800 */
 		AHCI_HFLAGS	(AHCI_HFLAG_IGN_SERR_INTERNAL),
-		.flags		= AHCI_FLAG_COMMON,
+		.flags		= AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
 		.pio_mask	= ATA_PIO4,
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_pmp_retry_srst_ops,
Will do. I'll report back as soon as I've built the modified kernel and tested it.
Kind regards, Niklas
Kind regards
Eric
Le 12/03/2025 à 18:11, Eric a écrit :
Hi Niklas
Le 11/03/2025 à 15:14, Niklas Cassel a écrit :
Hello Hans, Eric,
Eric, could you please run: $ sudo hdparm -I /dev/sdX | grep "interface power management"
on both your Samsung and Maxtor drive? (A star to the left of feature means that the feature is enabled)
Here is the result (apparently PM is enabled on the Maxtor, but it doesn't create the same problem):
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-MAXTOR_STM3250310AS_6RY2WB82 | grep "interface power management"
   *    Device-initiated interface power management
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_S5RPNF0T419459E | grep "interface power management"
   *    Device-initiated interface power management
One guess... perhaps it could be Device Initiated PM that is broken with these controllers? (Even though the controller does claim to support it.)
Eric, could you please try this patch:
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f813dbdc2346..ca690fde8842 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -244,7 +244,7 @@ static const struct ata_port_info ahci_port_info[] = {
 	},
 	[board_ahci_sb700] = {	/* for SB700 and SB800 */
 		AHCI_HFLAGS	(AHCI_HFLAG_IGN_SERR_INTERNAL),
-		.flags		= AHCI_FLAG_COMMON,
+		.flags		= AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
 		.pio_mask	= ATA_PIO4,
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_pmp_retry_srst_ops,
Will do. I'll report back as soon as I've built the modified kernel and tested it.
Tested. Both disks now respond that way (no Device-initiated interface power management):
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-MAXTOR_STM3250310AS_6RY2WB82 | grep "interface power management"
        Device-initiated interface power management
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_S5RPNF0T419459E | grep "interface power management"
        Device-initiated interface power management
With the patch you asked me to test, the SSD is properly detected at reboot, both by the UEFI and the kernel.
Kind regards, Niklas
Kind regards
Eric
Hello Eric,
On Wed, Mar 12, 2025 at 10:39:37PM +0100, Eric wrote:
With the patch you asked me to test, the SSD is properly detected at reboot, both by the UEFI and the kernel.
Thanks a lot for testing!
Interesting that DIPM is enabled on both, but only causes a problem on Samsung.
Kind regards, Niklas
Hi Niklas, Eric,
On 11-Mar-25 3:14 PM, Niklas Cassel wrote:
Hello Hans, Eric,
On Mon, Mar 10, 2025 at 09:12:13PM +0100, Hans de Goede wrote:
I agree with you that this is a BIOS bug of the motherboard in question and/or a bad interaction between the ATI SATA controller and Samsung SSD 870* models. Note that given the age of the motherboard there are likely not going to be any BIOS updates fixing this though.
Looking at the number of quirks for some of the ATI SB7x0/SB8x0/SB9x0 SATA controllers, they really look like something special (not in a good way): https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L236-L24...
- Ignore SError internal
- No MSI
- Max 255 sectors
- Broken 64-bit DMA
- Retry SRST (software reset)
And that is even without the weird "disable NCQ but only for Samsung SSD 8xx drives" quirk when using these ATI controllers.
What does bother me is that we don't know if it is this specific mobo/BIOS:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A99X EVO R2.0
Version: Rev 1.xx
M5A99X EVO R2.0 BIOS 2501 Version 2501 3.06 MB 2014/05/14
that should have a NOLPM quirk, like we do for specific BIOSes: https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L1402-L1...
That seems to be a Lenovo only thing though and with Intel chipsets.
Or if it is this ATI SATA controller that is always broken when it comes to LPM, regardless of the drive, or if it is only Samsung drives.
I'm pretty sure we can assume this will happen on all ATI SATA controllers. The new LPM default is pretty recent and these boards are getting old, so they likely do not have many users who run distros that ship cutting-edge kernels.
I do agree with you that it is a question if this is another bad interaction with Samsung SATA SSDs, or if it is a general ATI SATA controller problem, but see below.
Considering the dmesg comparing cold boot, the Maxtor drive and the ASUS ATAPI device seem to be recognized correctly.
Eric, could you please run: $ sudo hdparm -I /dev/sdX | grep "interface power management"
on both your Samsung and Maxtor drive? (A star to the left of feature means that the feature is enabled)
One guess... perhaps it could be Device Initiated PM that is broken with these controllers? (Even though the controller does claim to support it.)
Eric, could you please try this patch:
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f813dbdc2346..ca690fde8842 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -244,7 +244,7 @@ static const struct ata_port_info ahci_port_info[] = {
 	},
 	[board_ahci_sb700] = {	/* for SB700 and SB800 */
 		AHCI_HFLAGS	(AHCI_HFLAG_IGN_SERR_INTERNAL),
-		.flags		= AHCI_FLAG_COMMON,
+		.flags		= AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
 		.pio_mask	= ATA_PIO4,
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &ahci_pmp_retry_srst_ops,
Normally, I do think that we need more reports, to see if it is just this specific BIOS, or all the ATI SB7x0/SB8x0/SB9x0 SATA controllers that are broken...
...but, considering how many quirks these ATI controllers have already...
Right, in the meantime Eric has reported back that the above patch fixes this. Thank you for testing this, Eric.
One reason why ATA_QUIRK_NO_NCQ_ON_ATI was introduced is that disabling NCQ has a severe performance impact on SSDs, so we did not want to do this for all ATI controllers, or for all Samsung drives. Given that until the recent LPM default change we did not use DIPM on ATI chipsets, the above fix IMHO is a good fix, which even keeps the rest of the LPM power savings.
...and the fact that the one (Dieter) who reported that his Samsung SSD 870 QVO could enter deeper sleep states just fine was running an Intel AHCI controller (with the same FW version as Eric), I would be open to a patch that sets ATA_FLAG_NO_LPM for all these ATI controllers.
Right, I think it is safe to assume that this is not a Samsung drive problem; it is an ATI controller problem. The only question is if this only impacts ATI <-> Samsung SSD combinations or if it is a general issue with ATI controllers. But given the combination of DIPM not having been enabled on these controllers by default anyway, combined with the age of these motherboards (*), I believe that the above patch is a good compromise to fix the regression without needing to wait for more data.
Regards,
Hans
*) And there thus being fewer users, which makes getting more data hard. It also means that not having DIPM will impact only the relatively few remaining users.
Hello Hans,
On Thu, Mar 13, 2025 at 11:04:13AM +0100, Hans de Goede wrote:
I do agree with you that it is a question if this is another bad interaction with Samsung SATA SSDs, or if it is a general ATI SATA controller problem, but see below.
(snip)
Right in the mean time Eric has reported back that the above patch fixes this. Thank you for testing this Eric,
One reason why ATA_QUIRK_NO_NCQ_ON_ATI was introduced is because disabling NCQ has severe performance impacts for SSDs, so we did not want to do this for all ATI controllers; or for all Samsung drives. Given that until the recent LPM default change we did not use DIPM on ATI chipsets the above fix IMHO is a good fix, which even keeps the rest of the LPM power-savings.
One slightly interesting thing was that neither the Maxtor nor the Samsung drive reported support for Host-Initiated Power Management (HIPM).
Both drives supported Device-Initiated Power Management (DIPM), and we could see that DIPM was enabled on both drives.
We already know that LPM works on the Samsung drive with an Intel AHCI controller. (But since the device does not report support for HIPM, even on Intel, only DIPM will be used/enabled.)
Right I think it is safe to assume that this is not a Samsung drive problem it is an ATI controller problem. The only question is if this only impacts ATI <-> Samsung SSD combinations or if it is a general issue with ATI controllers. But given the combination of DIPM not having been enabled on these controllers by default anyways, combined with the age of these motherboards (*) I believe that the above patch is a good compromise to fix the regression without needing to wait for more data.
Regards,
Hans
*) And there thus being fewer users, which makes getting more data hard. It also means that not having DIPM will impact only the relatively few remaining users.
I'm still not 100% sure about the best way forward.
The ATI SATA controller reports that it supports ALPM (i.e. also HIPM). It also reports support for slumber and partial, which means that it must support both host initiated and device initiated requests to these states. (See AHCI spec 3.1.1 - Offset 00h: CAP – HBA Capabilities, CAP.PSC and CAP.SSC fields.)
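As an illustration of the capability bits referred to above, here is a minimal sketch in C that decodes the LPM-related fields of a raw CAP register value. The bit positions (PSC = bit 13, SSC = bit 14, SALP = bit 26) follow the AHCI CAP register layout; the macro, struct and function names are hypothetical and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the AHCI CAP (HBA Capabilities) register, offset 00h. */
#define CAP_PSC		(UINT32_C(1) << 13)	/* Partial State Capable */
#define CAP_SSC		(UINT32_C(1) << 14)	/* Slumber State Capable */
#define CAP_SALP	(UINT32_C(1) << 26)	/* Supports Aggressive LPM */

/* Hypothetical container for the decoded bits (illustration only). */
struct hba_lpm_caps {
	bool partial;	/* CAP.PSC set */
	bool slumber;	/* CAP.SSC set */
	bool alpm;	/* CAP.SALP set */
};

/* Decode the LPM-related capability bits from a raw CAP value. */
static void decode_lpm_caps(uint32_t cap, struct hba_lpm_caps *out)
{
	assert(out != NULL);

	out->partial = (cap & CAP_PSC) != 0;
	out->slumber = (cap & CAP_SSC) != 0;
	out->alpm = (cap & CAP_SALP) != 0;
}
```

Note that this only shows what the controller claims to support. As this thread shows, a controller can advertise these bits and still misbehave with a particular drive.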
Considering that DIPM seems to work fine on the Maxtor drive, I guess your initial suggestion of a Samsung only quirk which only disables LPM on ATI is the best way?
It seems that ATI and Samsung must have interpreted some spec differently from each other; otherwise, I don't understand why this combination specifically seems to be so extremely bad. ATI + anything other than Samsung, or Samsung + anything other than ATI, seems to work.
Kind regards, Niklas
Hi Niklas,
On 13-Mar-25 1:48 PM, Niklas Cassel wrote:
Hello Hans,
On Thu, Mar 13, 2025 at 11:04:13AM +0100, Hans de Goede wrote:
I do agree with you that it is a question if this is another bad interaction with Samsung SATA SSDs, or if it is a general ATI SATA controller problem, but see below.
(snip)
Right, in the meantime Eric has reported back that the above patch fixes this. Thank you for testing this, Eric.
One reason why ATA_QUIRK_NO_NCQ_ON_ATI was introduced is that disabling NCQ has a severe performance impact on SSDs, so we did not want to do this for all ATI controllers, or for all Samsung drives. Given that until the recent LPM default change we did not use DIPM on ATI chipsets, the above fix IMHO is a good fix, which even keeps the rest of the LPM power savings.
One slightly interesting thing was that neither the Maxtor nor the Samsung drive reported support for Host-Initiated Power Management (HIPM).
Both drives supported Device-Initiated Power Management (DIPM), and we could see that DIPM was enabled on both drives.
We already know that LPM works on the Samsung drive with an Intel AHCI controller. (But since the device does not report support for HIPM, even on Intel, only DIPM will be used/enabled.)
Right I think it is safe to assume that this is not a Samsung drive problem it is an ATI controller problem. The only question is if this only impacts ATI <-> Samsung SSD combinations or if it is a general issue with ATI controllers. But given the combination of DIPM not having been enabled on these controllers by default anyways, combined with the age of these motherboards (*) I believe that the above patch is a good compromise to fix the regression without needing to wait for more data.
Regards,
Hans
*) And there thus being fewer users, which makes getting more data hard. It also means that not having DIPM will impact only the relatively few remaining users.
I'm still not 100% sure about the best way forward.
The ATI SATA controller reports that it supports ALPM (i.e. also HIPM). It also reports support for slumber and partial, which means that it must support both host initiated and device initiated requests to these states. (See AHCI spec 3.1.1 - Offset 00h: CAP – HBA Capabilities, CAP.PSC and CAP.SSC fields.)
Considering that DIPM seems to work fine on the Maxtor drive, I guess your initial suggestion of a Samsung only quirk which only disables LPM on ATI is the best way?
I have no objections against going that route, except that I guess this should then be something like ATA_QUIRK_NO_DIPM_ON_ATI to not lose the other LPM modes / savings? AFAIK/IIRC there still is quite some power saving to be had without DIPM.
It seems that ATI and Samsung must have interpreted some spec differently from each other; otherwise, I don't understand why this combination specifically seems to be so extremely bad. ATI + anything other than Samsung, or Samsung + anything other than ATI, seems to work.
Yes the most severe problems do seem to come from that specific mix, although the long list of other ATI controller quirks also shows those controllers are somewhat finicky.
Regards,
Hans
Hello Hans,
On Thu, Mar 13, 2025 at 04:13:24PM +0100, Hans de Goede wrote:
Considering that DIPM seems to work fine on the Maxtor drive, I guess your initial suggestion of a Samsung only quirk which only disables LPM on ATI is the best way?
I have no objections against going that route, except that I guess this should then be something like ATA_QUIRK_NO_DIPM_ON_ATI to not lose the other LPM modes / savings? AFAIK/IIRC there still is quite some power saving to be had without DIPM.
I was thinking like your original suggestion, i.e. setting: ATA_QUIRK_NO_LPM_ON_ATI
for all the Samsung devices that currently have: ATA_QUIRK_NO_NCQ_ON_ATI
Considering that this Samsung device only supports DIPM (and not HIPM), I'm guessing the same is true for the other Samsung devices as well.
So we might as well just do: ATA_QUIRK_NO_LPM_ON_ATI
to disable both HIPM and DIPM (since only DIPM would have been enabled without this quirk anyway).
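The decision logic discussed here can be sketched in a few lines of self-contained C. This is only an illustration of the idea (a per-device quirk that takes effect only behind an ATI controller), not the actual libata implementation. The flag values, the table contents, the model prefix and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
enum {
	QUIRK_NO_NCQ_ON_ATI = 1 << 0,
	QUIRK_NO_LPM_ON_ATI = 1 << 1,
};

struct quirk_entry {
	const char *model_prefix;	/* matched against the start of the model string */
	unsigned int quirks;
};

/* Illustrative table: devices that carry the NCQ quirk also get the LPM quirk. */
static const struct quirk_entry quirk_table[] = {
	{ "Samsung SSD 870", QUIRK_NO_NCQ_ON_ATI | QUIRK_NO_LPM_ON_ATI },
};

/* Return the quirk flags of the first table entry whose prefix matches. */
static unsigned int lookup_quirks(const char *model)
{
	assert(model != NULL);

	for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		size_t n = strlen(quirk_table[i].model_prefix);

		if (strncmp(model, quirk_table[i].model_prefix, n) == 0)
			return quirk_table[i].quirks;
	}
	return 0;
}

/* LPM stays allowed unless the device has the quirk AND the host is ATI. */
static bool lpm_allowed(const char *model, bool host_is_ati)
{
	return !(host_is_ati && (lookup_quirks(model) & QUIRK_NO_LPM_ON_ATI));
}
```

With this shape, the Samsung drive keeps LPM on an Intel controller, and the Maxtor drive keeps LPM on the ATI controller, which matches the behaviour wanted in this thread.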
Yes the most severe problems do seem to come from that specific mix, although the long list of other ATI controller quirks also shows those controllers are somewhat finicky.
Definitely!
Kind regards, Niklas
Hi,
On 13-Mar-25 4:28 PM, Niklas Cassel wrote:
Hello Hans,
On Thu, Mar 13, 2025 at 04:13:24PM +0100, Hans de Goede wrote:
Considering that DIPM seems to work fine on the Maxtor drive, I guess your initial suggestion of a Samsung only quirk which only disables LPM on ATI is the best way?
I have no objections against going that route, except that I guess this should then be something like ATA_QUIRK_NO_DIPM_ON_ATI to not lose the other LPM modes / savings? AFAIK/IIRC there still is quite some power saving to be had without DIPM.
I was thinking like your original suggestion, i.e. setting: ATA_QUIRK_NO_LPM_ON_ATI
for all the Samsung devices that currently have: ATA_QUIRK_NO_NCQ_ON_ATI
Considering that this Samsung device only supports DIPM (and not HIPM), I'm guessing the same is true for the other Samsung devices as well.
Ah I see ...
So we might as well just do: ATA_QUIRK_NO_LPM_ON_ATI
Yes I agree and that will nicely work as a combination of ATA_QUIRK_NO_LPM + ATA_QUIRK_NO_NCQ_ON_ATI functionality so using tested code-paths in a slightly new way.
Regards,
Hans
to disable both HIPM and DIPM (since only DIPM would have been enabled without this quirk anyway).
Yes the most severe problems do seem to come from that specific mix, although the long list of other ATI controller quirks also shows those controllers are somewhat finicky.
Definitely!
Kind regards, Niklas
Hello Hans,
On Thu, Mar 13, 2025 at 07:47:11PM +0100, Hans de Goede wrote:
Hi,
On 13-Mar-25 4:28 PM, Niklas Cassel wrote:
Hello Hans,
On Thu, Mar 13, 2025 at 04:13:24PM +0100, Hans de Goede wrote:
Considering that DIPM seems to work fine on the Maxtor drive, I guess your initial suggestion of a Samsung only quirk which only disables LPM on ATI is the best way?
I have no objections against going that route, except that I guess this should then be something like ATA_QUIRK_NO_DIPM_ON_ATI to not lose the other LPM modes / savings? AFAIK/IIRC there still is quite some power saving to be had without DIPM.
I was thinking like your original suggestion, i.e. setting: ATA_QUIRK_NO_LPM_ON_ATI
for all the Samsung devices that currently have: ATA_QUIRK_NO_NCQ_ON_ATI
Considering that this Samsung device only supports DIPM (and not HIPM), I'm guessing the same is true for the other Samsung devices as well.
Ah I see ...
So we might as well just do: ATA_QUIRK_NO_LPM_ON_ATI
Yes I agree and that will nicely work as a combination of ATA_QUIRK_NO_LPM + ATA_QUIRK_NO_NCQ_ON_ATI functionality so using tested code-paths in a slightly new way.
I sent a patch that implements your original suggestion here: https://lore.kernel.org/linux-ide/20250317170348.1748671-2-cassel@kernel.org...
I forgot to add your Suggested-by tag. If the patch solves Eric's problem, I could add the tag when applying.
Kind regards, Niklas
Hi Niklas,
Le 17/03/2025 à 18:09, Niklas Cassel a écrit :
I sent a patch that implements your original suggestion here: https://lore.kernel.org/linux-ide/20250317170348.1748671-2-cassel@kernel.org...
I forgot to add your Suggested-by tag. If the patch solves Eric's problem, I could add the tag when applying.
I'll report back when the kernel with your proposed patch is built and tested on my system.
Kind regards, Niklas
Kind regards
Eric
Hi Niklas, hi Hans,
Le 17/03/2025 à 20:15, Eric a écrit :
Hi Niklas,
Le 17/03/2025 à 18:09, Niklas Cassel a écrit :
I sent a patch that implements your original suggestion here: https://lore.kernel.org/linux-ide/20250317170348.1748671-2-cassel@kernel.org...
I forgot to add your Suggested-by tag. If the patch solves Eric's problem, I could add the tag when applying.
I'll report back when the kernel with your proposed patch is built and tested on my system.
The test is a success as far as I am concerned. With this new patch, DIPM is disabled on the Samsung SSD, but not the Maxtor disk on the same controller:
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_S5RPNF0T419459E | grep "interface power management"
        Device-initiated interface power management
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-MAXTOR_STM3250310AS_6RY2WB82 | grep "interface power management"
   *    Device-initiated interface power management
and the SSD is successfully detected at reboot by both the UEFI and the linux kernel.
Kind regards, Niklas
Kind regards
Eric
Hello Eric,
On Tue, Mar 18, 2025 at 01:04:48AM +0100, Eric wrote:
Hi Niklas, hi Hans,
Le 17/03/2025 à 20:15, Eric a écrit :
Hi Niklas,
Le 17/03/2025 à 18:09, Niklas Cassel a écrit :
I sent a patch that implements your original suggestion here: https://lore.kernel.org/linux-ide/20250317170348.1748671-2-cassel@kernel.org...
I forgot to add your Suggested-by tag. If the patch solves Eric's problem, I could add the tag when applying.
I'll report back when the kernel with your proposed patch is built and tested on my system.
The test is a success as far as I am concerned. With this new patch, DIPM is disabled on the Samsung SSD, but not the Maxtor disk on the same controller:
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-Samsung_SSD_870_QVO_2TB_S5RPNF0T419459E | grep "interface power management"
        Device-initiated interface power management
(trixieUSB)eric@gwaihir:~$ sudo hdparm -I /dev/disk/by-id/ata-MAXTOR_STM3250310AS_6RY2WB82 | grep "interface power management"
   *    Device-initiated interface power management
and the SSD is successfully detected at reboot by both the UEFI and the linux kernel.
Thank you for all your perseverance!
Hopefully, your efforts will make sure that others with ATI AHCI do not encounter the same issue that you faced.
Kind regards, Niklas
Hi Niklas, hi Hans
Le 18/03/2025 à 10:10, Niklas Cassel a écrit :
Hello Eric,
On Tue, Mar 18, 2025 at 01:04:48AM +0100, Eric wrote:
Hi Niklas, hi Hans,
[...] The test is a success as far as I am concerned. With this new patch, DIPM is disabled on the Samsung SSD, but not the Maxtor disk on the same controller:
[...]
and the SSD is successfully detected at reboot by both the UEFI and the linux kernel.
Thank you for all your perseverance!
Hopefully, your efforts will make sure that others with ATI AHCI do not encounter the same issue that you faced.
Thank you for your work !
Kind regards, Niklas
Kind regards,
Eric