On Thu, Aug 21, 2025 at 08:54:52AM +0800, Ethan Zhao wrote:
On 8/21/2025 1:26 AM, Brian Norris wrote:
From: Brian Norris briannorris@google.com
max_link_speed, max_link_width, current_link_speed, current_link_width, secondary_bus_number, and subordinate_bus_number all access config registers, but they don't check the runtime PM state. If the device is in D3cold, we may see -EINVAL or even bogus values.
My understanding, if your device is in D3cold, returning of -EINVAL is the right behavior.
That's not the guaranteed result though. Some hosts don't properly return PCIBIOS_DEVICE_NOT_FOUND, for one. But also, it's racy -- because we don't even try to hold a pm_runtime reference, the device could possibly enter D3cold while we're in the middle of reading from it. If you're lucky, that'll get you a completion timeout and an all-1's result, and we'll return a garbage result.
So if we want to purposely not resume the device and retain "I can't give you what you asked for" behavior, we'd at least need a pm_runtime_get_noresume() or similar.
Wrap these access in pci_config_pm_runtime_{get,put}() like most of the rest of the similar sysfs attributes.
Fixes: 56c1af4606f0 ("PCI: Add sysfs max_link_speed/width, current_link_speed/width, etc") Cc: stable@vger.kernel.org Signed-off-by: Brian Norris briannorris@google.com Signed-off-by: Brian Norris briannorris@chromium.org
drivers/pci/pci-sysfs.c | 32 +++++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index 5eea14c1f7f5..160df897dc5e 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -191,9 +191,16 @@ static ssize_t max_link_speed_show(struct device *dev, struct device_attribute *attr, char *buf) { struct pci_dev *pdev = to_pci_dev(dev);
- ssize_t ret;
- pci_config_pm_runtime_get(pdev);
This function would potentially change the power state of device, that would be a complex process, beyond the meaning of max_link_speed_show(), given the semantics of these functions ( max_link_speed_show()/max_link_width_show()/current_link_speed_show()/ ....), this cannot be done !
What makes this different than the 'config' attribute (i.e., "read config register")? Why shouldn't that just return -EINVAL? I don't really buy your reasoning -- "it's a complex process" is not a reason not to do something. The user asked for the link speed; why not give it? If the user wanted to know if the device was powered, they could check the 'power_state' attribute instead.
(Side note: these attributes don't show up anywhere in Documentation/, so it's also a bit hard to declare "best" semantics for them.)
To flip this question around a bit: if I have a system that aggressively suspends devices when there's no recent activity, how am I supposed to check what the link speed is? Probabilistically hammer the file while hoping some other activity wakes the device, so I can find the small windows of time where it's RPM_ACTIVE? Disable runtime_pm for the device while I check?
Brian