The Cadence PCIe Controller integrated in the TI K3 SoCs supports both Root-Complex and Endpoint modes of operation. The Glue Layer allows "strapping" the Mode of operation of the Controller, the Link Speed and the Link Width. This is enabled by programming the "PCIEn_CTRL" register (n corresponds to the PCIe instance) within the CTRL_MMR memory-mapped register space. The "reset-values" of the registers are also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately after being powered on, if the Glue Layer configuration is not done while the PCIe Controller is off, it will result in the PCIe Controller latching onto the wrong "reset-values". In practice, this will show up as a wrong representation of the PCIe Controller's capability structures in the PCIe Configuration Space. Some such capabilities which are supported by the PCIe Controller in the Root-Complex mode but are incorrectly latched onto as being unsupported are: - Link Bandwidth Notification - Alternate Routing ID (ARI) Forwarding Support - Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap" settings and powering it on after that.
Fixes: f3e25911a430 ("PCI: j721e: Add TI J721E PCIe driver") Cc: stable@vger.kernel.org Signed-off-by: Siddharth Vadapalli s-vadapalli@ti.com ---
Hello,
This patch is based on commit 07d9df80082b Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools of Mainline Linux.
v2 of this patch is at: https://lore.kernel.org/r/20250819101336.292013-1-s-vadapalli@ti.com/ Changes since v2: - Based on Bjorn's feedback at: https://lore.kernel.org/r/20250819221748.GA598958@bhelgaas/ 1) Commit message has been rephrased to summarize the issue and the fix without elaborating too much on the details. 2) Description of the issue's symptoms noticeable by a user has been added to the commit message. 3) Comment has been wrapped to fit within 80 columns. 4) The implementation has been simplified by moving the Controller Power OFF and Power ON sequence into j721e_pcie_ctrl_init() as a result of which the code reordering as well as function parameter changes are no longer required. - Based on offline feedback from Vignesh, Runtime PM APIs are used instead of PM DOMAIN APIs to power off and power on the PCIe Controller. - Rebased patch on latest Mainline Linux.
Test Logs on J7200 EVM without the current patch applied show that the ARI Forwarding Capability incorrectly shows up as not being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/768bca36025ed630c4e69bcc3d...
Test Logs on J7200 EVM with the current patch applied show that the ARI Forwarding Capability correctly shows up as being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/fc1752d17140646c8fa57209ec...
As explained in the commit message, this discrepancy is solely due to the PCIe Controller latching onto the incorrect reset-values which occurs when the strap settings are programmed after the PCIe Controller is powered on, at which point, the reset-values don't toggle anymore.
Regards, Siddharth.
drivers/pci/controller/cadence/pci-j721e.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c index 6c93f39d0288..c178b117215a 100644 --- a/drivers/pci/controller/cadence/pci-j721e.c +++ b/drivers/pci/controller/cadence/pci-j721e.c @@ -284,6 +284,22 @@ static int j721e_pcie_ctrl_init(struct j721e_pcie *pcie) if (!ret) offset = args.args[0];
+ /* + * The PCIe Controller's registers have different "reset-values" + * depending on the "strap" settings programmed into the PCIEn_CTRL + * register within the CTRL_MMR memory-mapped register space. + * The registers latch onto a "reset-value" based on the "strap" + * settings sampled after the PCIe Controller is powered on. + * To ensure that the "reset-values" are sampled accurately, power + * off the PCIe Controller before programming the "strap" settings + * and power it on after that. + */ + ret = pm_runtime_put_sync(dev); + if (ret < 0) { + dev_err(dev, "Failed to power off PCIe Controller\n"); + return ret; + } + ret = j721e_pcie_set_mode(pcie, syscon, offset); if (ret < 0) { dev_err(dev, "Failed to set pci mode\n"); @@ -302,6 +318,12 @@ static int j721e_pcie_ctrl_init(struct j721e_pcie *pcie) return ret; }
+ ret = pm_runtime_get_sync(dev); + if (ret < 0) { + dev_err(dev, "Failed to power on PCIe Controller\n"); + return ret; + } + /* Enable ACSPCIE refclk output if the optional property exists */ syscon = syscon_regmap_lookup_by_phandle_optional(node, "ti,syscon-acspcie-proxy-ctrl");
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
The Cadence PCIe Controller integrated in the TI K3 SoCs supports both Root-Complex and Endpoint modes of operation. The Glue Layer allows "strapping" the Mode of operation of the Controller, the Link Speed and the Link Width. This is enabled by programming the "PCIEn_CTRL" register (n corresponds to the PCIe instance) within the CTRL_MMR memory-mapped register space. The "reset-values" of the registers are also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately after being powered on, if the Glue Layer configuration is not done while the PCIe Controller is off, it will result in the PCIe Controller latching onto the wrong "reset-values". In practice, this will show up as a wrong representation of the PCIe Controller's capability structures in the PCIe Configuration Space. Some such capabilities which are supported by the PCIe Controller in the Root-Complex mode but are incorrectly latched onto as being unsupported are:
- Link Bandwidth Notification
- Alternate Routing ID (ARI) Forwarding Support
- Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap" settings and powering it on after that.
Fixes: f3e25911a430 ("PCI: j721e: Add TI J721E PCIe driver") Cc: stable@vger.kernel.org Signed-off-by: Siddharth Vadapalli s-vadapalli@ti.com
Hello,
This patch is based on commit 07d9df80082b Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools of Mainline Linux.
v2 of this patch is at: https://lore.kernel.org/r/20250819101336.292013-1-s-vadapalli@ti.com/ Changes since v2:
- Based on Bjorn's feedback at: https://lore.kernel.org/r/20250819221748.GA598958@bhelgaas/
fix without elaborating too much on the details. 2) Description of the issue's symptoms noticeable by a user has been added to the commit message. 3) Comment has been wrapped to fit within 80 columns. 4) The implementation has been simplified by moving the Controller Power OFF and Power ON sequence into j721e_pcie_ctrl_init() as a result of which the code reordering as well as function parameter changes are no longer required.
- Commit message has been rephrased to summarize the issue and the
- Based on offline feedback from Vignesh, Runtime PM APIs are used instead of PM DOMAIN APIs to power off and power on the PCIe Controller.
- Rebased patch on latest Mainline Linux.
Test Logs on J7200 EVM without the current patch applied show that the ARI Forwarding Capability incorrectly shows up as not being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/768bca36025ed630c4e69bcc3d...
Test Logs on J7200 EVM with the current patch applied show that the ARI Forwarding Capability correctly shows up as being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/fc1752d17140646c8fa57209ec...
As explained in the commit message, this discrepancy is solely due to the PCIe Controller latching onto the incorrect reset-values which occurs when the strap settings are programmed after the PCIe Controller is powered on, at which point, the reset-values don't toggle anymore.
Regards, Siddharth.
drivers/pci/controller/cadence/pci-j721e.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c index 6c93f39d0288..c178b117215a 100644 --- a/drivers/pci/controller/cadence/pci-j721e.c +++ b/drivers/pci/controller/cadence/pci-j721e.c @@ -284,6 +284,22 @@ static int j721e_pcie_ctrl_init(struct j721e_pcie *pcie) if (!ret) offset = args.args[0];
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
- Mani
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
The Cadence PCIe Controller integrated in the TI K3 SoCs supports both Root-Complex and Endpoint modes of operation. The Glue Layer allows "strapping" the Mode of operation of the Controller, the Link Speed and the Link Width. This is enabled by programming the "PCIEn_CTRL" register (n corresponds to the PCIe instance) within the CTRL_MMR memory-mapped register space. The "reset-values" of the registers are also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately after being powered on, if the Glue Layer configuration is not done while the PCIe Controller is off, it will result in the PCIe Controller latching onto the wrong "reset-values". In practice, this will show up as a wrong representation of the PCIe Controller's capability structures in the PCIe Configuration Space. Some such capabilities which are supported by the PCIe Controller in the Root-Complex mode but are incorrectly latched onto as being unsupported are:
- Link Bandwidth Notification
- Alternate Routing ID (ARI) Forwarding Support
- Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap" settings and powering it on after that.
Fixes: f3e25911a430 ("PCI: j721e: Add TI J721E PCIe driver") Cc: stable@vger.kernel.org Signed-off-by: Siddharth Vadapalli s-vadapalli@ti.com
Hello,
This patch is based on commit 07d9df80082b Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools of Mainline Linux.
v2 of this patch is at: https://lore.kernel.org/r/20250819101336.292013-1-s-vadapalli@ti.com/ Changes since v2:
- Based on Bjorn's feedback at: https://lore.kernel.org/r/20250819221748.GA598958@bhelgaas/
fix without elaborating too much on the details. 2) Description of the issue's symptoms noticeable by a user has been added to the commit message. 3) Comment has been wrapped to fit within 80 columns. 4) The implementation has been simplified by moving the Controller Power OFF and Power ON sequence into j721e_pcie_ctrl_init() as a result of which the code reordering as well as function parameter changes are no longer required.
- Commit message has been rephrased to summarize the issue and the
- Based on offline feedback from Vignesh, Runtime PM APIs are used instead of PM DOMAIN APIs to power off and power on the PCIe Controller.
- Rebased patch on latest Mainline Linux.
Test Logs on J7200 EVM without the current patch applied show that the ARI Forwarding Capability incorrectly shows up as not being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/768bca36025ed630c4e69bcc3d...
Test Logs on J7200 EVM with the current patch applied show that the ARI Forwarding Capability correctly shows up as being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/fc1752d17140646c8fa57209ec...
As explained in the commit message, this discrepancy is solely due to the PCIe Controller latching onto the incorrect reset-values which occurs when the strap settings are programmed after the PCIe Controller is powered on, at which point, the reset-values don't toggle anymore.
Regards, Siddharth.
drivers/pci/controller/cadence/pci-j721e.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c index 6c93f39d0288..c178b117215a 100644 --- a/drivers/pci/controller/cadence/pci-j721e.c +++ b/drivers/pci/controller/cadence/pci-j721e.c @@ -284,6 +284,22 @@ static int j721e_pcie_ctrl_init(struct j721e_pcie *pcie) if (!ret) offset = args.args[0];
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed. During driver probe, we see the logs corresponding to "devm_pci_alloc_host_bridge()" from the timestamp of '25.234848' which is prior to the invocation of 'j721e_pcie_ctrl_init()'. Some time around the '25.267098' timestamp, the 'j721e_pcie_ctrl_init()' function is invoked which then decrements the ref-count via 'pm_runtime_put_sync(dev)' leading to the PCIe Controller being powered off. This seems to be consistent across boot unlike the usage of 'dev_pm_domain_detach' which handles the device power off via a workqueue as a result of which it may not be powered off yet when 'j721e_pcie_ctrl_init()' is programming the strap settings. Hence, I switched from 'dev_pm_domain_detach()' to 'pm_runtime_put_sync()' in the v3 patch.
Please let me know if you have any suggestions for alternative means to power off the device in a reliable manner without deferring it to a workqueue as done by the 'dev_pm_domain_detach()' API.
Regards, Siddharth.
On Mon, Sep 01, 2025 at 10:27:51AM GMT, Siddharth Vadapalli wrote:
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
The Cadence PCIe Controller integrated in the TI K3 SoCs supports both Root-Complex and Endpoint modes of operation. The Glue Layer allows "strapping" the Mode of operation of the Controller, the Link Speed and the Link Width. This is enabled by programming the "PCIEn_CTRL" register (n corresponds to the PCIe instance) within the CTRL_MMR memory-mapped register space. The "reset-values" of the registers are also different depending on the mode of operation.
Since the PCIe Controller latches onto the "reset-values" immediately after being powered on, if the Glue Layer configuration is not done while the PCIe Controller is off, it will result in the PCIe Controller latching onto the wrong "reset-values". In practice, this will show up as a wrong representation of the PCIe Controller's capability structures in the PCIe Configuration Space. Some such capabilities which are supported by the PCIe Controller in the Root-Complex mode but are incorrectly latched onto as being unsupported are:
- Link Bandwidth Notification
- Alternate Routing ID (ARI) Forwarding Support
- Next capability offset within Advanced Error Reporting (AER) capability
Fix this by powering off the PCIe Controller before programming the "strap" settings and powering it on after that.
Fixes: f3e25911a430 ("PCI: j721e: Add TI J721E PCIe driver") Cc: stable@vger.kernel.org Signed-off-by: Siddharth Vadapalli s-vadapalli@ti.com
Hello,
This patch is based on commit 07d9df80082b Merge tag 'perf-tools-fixes-for-v6.17-2025-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools of Mainline Linux.
v2 of this patch is at: https://lore.kernel.org/r/20250819101336.292013-1-s-vadapalli@ti.com/ Changes since v2:
- Based on Bjorn's feedback at: https://lore.kernel.org/r/20250819221748.GA598958@bhelgaas/
fix without elaborating too much on the details. 2) Description of the issue's symptoms noticeable by a user has been added to the commit message. 3) Comment has been wrapped to fit within 80 columns. 4) The implementation has been simplified by moving the Controller Power OFF and Power ON sequence into j721e_pcie_ctrl_init() as a result of which the code reordering as well as function parameter changes are no longer required.
- Commit message has been rephrased to summarize the issue and the
- Based on offline feedback from Vignesh, Runtime PM APIs are used instead of PM DOMAIN APIs to power off and power on the PCIe Controller.
- Rebased patch on latest Mainline Linux.
Test Logs on J7200 EVM without the current patch applied show that the ARI Forwarding Capability incorrectly shows up as not being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/768bca36025ed630c4e69bcc3d...
Test Logs on J7200 EVM with the current patch applied show that the ARI Forwarding Capability correctly shows up as being supported: https://gist.github.com/Siddharth-Vadapalli-at-TI/fc1752d17140646c8fa57209ec...
As explained in the commit message, this discrepancy is solely due to the PCIe Controller latching onto the incorrect reset-values which occurs when the strap settings are programmed after the PCIe Controller is powered on, at which point, the reset-values don't toggle anymore.
Regards, Siddharth.
drivers/pci/controller/cadence/pci-j721e.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/pci/controller/cadence/pci-j721e.c b/drivers/pci/controller/cadence/pci-j721e.c index 6c93f39d0288..c178b117215a 100644 --- a/drivers/pci/controller/cadence/pci-j721e.c +++ b/drivers/pci/controller/cadence/pci-j721e.c @@ -284,6 +284,22 @@ static int j721e_pcie_ctrl_init(struct j721e_pcie *pcie) if (!ret) offset = args.args[0];
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed.
In that case, the driver should not call pm_runtime_get_sync() in its probe. What it should do is:
pm_runtime_set_active() pm_runtime_enable()
But the driver is supporting several SoC variants. Does the bootloader enable PCIe controller for all of them?
During driver probe, we see the logs corresponding to "devm_pci_alloc_host_bridge()" from the timestamp of '25.234848' which is prior to the invocation of 'j721e_pcie_ctrl_init()'. Some time around the '25.267098' timestamp, the 'j721e_pcie_ctrl_init()' function is invoked which then decrements the ref-count via 'pm_runtime_put_sync(dev)' leading to the PCIe Controller being powered off. This seems to be consistent across boot unlike the usage of 'dev_pm_domain_detach' which handles the device power off via a workqueue as a result of which it may not be powered off yet when 'j721e_pcie_ctrl_init()' is programming the strap settings. Hence, I switched from 'dev_pm_domain_detach()' to 'pm_runtime_put_sync()' in the v3 patch.
No using dev_pm_domain_detach() is a wrong approach.
Please let me know if you have any suggestions for alternative means to power off the device in a reliable manner without deferring it to a workqueue as done by the 'dev_pm_domain_detach()' API.
Using pm_runtime_put_sync() is the correct way, but the comment and patch description needs to be improved. In the comment, you are claiming that pm_runtime_put_sync() will power off the controller, even though it is true, it is not clear who is responsible for doing that. So reword it to reflect the fact that the power domain (genpd?) will turn off the controller.
- Mani
On Mon, Sep 01, 2025 at 11:18:23AM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 10:27:51AM GMT, Siddharth Vadapalli wrote:
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
[...]
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed.
In that case, the driver should not call pm_runtime_get_sync() in its probe. What it should do is:
pm_runtime_set_active() pm_runtime_enable()
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
But the driver is supporting several SoC variants. Does the bootloader enable PCIe controller for all of them?
By 'bootloader', I assume that you are referring to Firmware that is responsible for powering on or off the Controller on the basis of Power Management APIs from Linux. If so, yes, all the SoC variants are Powered on prior to the probe function being invoked via the 'dev_pm_domain_attach()' API called in drivers/base/platform.c.
During driver probe, we see the logs corresponding to "devm_pci_alloc_host_bridge()" from the timestamp of '25.234848' which is prior to the invocation of 'j721e_pcie_ctrl_init()'. Some time around the '25.267098' timestamp, the 'j721e_pcie_ctrl_init()' function is invoked which then decrements the ref-count via 'pm_runtime_put_sync(dev)' leading to the PCIe Controller being powered off. This seems to be consistent across boot unlike the usage of 'dev_pm_domain_detach' which handles the device power off via a workqueue as a result of which it may not be powered off yet when 'j721e_pcie_ctrl_init()' is programming the strap settings. Hence, I switched from 'dev_pm_domain_detach()' to 'pm_runtime_put_sync()' in the v3 patch.
No using dev_pm_domain_detach() is a wrong approach.
Please let me know if you have any suggestions for alternative means to power off the device in a reliable manner without deferring it to a workqueue as done by the 'dev_pm_domain_detach()' API.
Using pm_runtime_put_sync() is the correct way, but the comment and patch description needs to be improved. In the comment, you are claiming that pm_runtime_put_sync() will power off the controller, even though it is true, it is not clear who is responsible for doing that. So reword it to reflect the fact that the power domain (genpd?) will turn off the controller.
Thank you for the feedback. Yes, it is indeed genpd that powers off the PCIe Controller via corresponding requests to the Firmware that manages the Power states of all devices in the SoC. I will update the commit message to indicate this.
Regards, Siddharth.
On Mon, Sep 01, 2025 at 11:51:33AM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 11:18:23AM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 10:27:51AM GMT, Siddharth Vadapalli wrote:
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
[...]
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed.
In that case, the driver should not call pm_runtime_get_sync() in its probe. What it should do is:
pm_runtime_set_active() pm_runtime_enable()
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
This pm_runtime_get_sync() should be part of j721e_pcie_ctrl_init() where you do power off, program strap and power on.
This should not be part of probe() as by that time, controller is already powered on. So pm_runtime_set_active() and pm_runtime_enable() should be enough to convey the state of the device to PM core.
- Mani
On Mon, Sep 01, 2025 at 12:14:51PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 11:51:33AM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 11:18:23AM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 10:27:51AM GMT, Siddharth Vadapalli wrote:
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
[...]
- /*
* The PCIe Controller's registers have different "reset-values"
* depending on the "strap" settings programmed into the PCIEn_CTRL
* register within the CTRL_MMR memory-mapped register space.
* The registers latch onto a "reset-value" based on the "strap"
* settings sampled after the PCIe Controller is powered on.
* To ensure that the "reset-values" are sampled accurately, power
* off the PCIe Controller before programming the "strap" settings
* and power it on after that.
*/
- ret = pm_runtime_put_sync(dev);
- if (ret < 0) {
dev_err(dev, "Failed to power off PCIe Controller\n");
return ret;
- }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed.
In that case, the driver should not call pm_runtime_get_sync() in its probe. What it should do is:
pm_runtime_set_active() pm_runtime_enable()
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
This pm_runtime_get_sync() should be part of j721e_pcie_ctrl_init() where you do power off, program strap and power on.
This should not be part of probe() as by that time, controller is already powered on. So pm_runtime_set_active() and pm_runtime_enable() should be enough to convey the state of the device to PM core.
I have tried out the suggestion in the following manner:
j721e_pcie_probe() ... pm_runtime_set_active(dev); pm_runtime_enable(dev); ret = j721e_pcie_ctrl_init(pcie); ... within j721e_pcie_ctrl_init() ret = pm_runtime_put_sync(dev); /* Program Strap Settings */ ret = pm_runtime_get_sync(dev); ... ... exit probe
Since a 'pm_runtime_get_sync()' hasn't yet been invoked prior to the section where 'pm_runtime_put_sync()' is invoked, it leads to a ref-count underflow error at runtime. Please let me know if I am missing something.
The following is the working implementation from the current patch:
j721e_pcie_probe() ... /* * The following two lines are identical to the existing * driver code */ pm_runtime_enable(dev); ret = pm_runtime_get_sync(dev); ... ret = j721e_pcie_ctrl_init(pcie); ... within j721e_pcie_ctrl_init() ret = pm_runtime_put_sync(dev); ... /* PCIe Controller is now powered off */ /* Program the strap settings */ ... ret = pm_runtime_get_sync(dev); ... ... exit probe
Regards, Siddharth.
On Mon, Sep 01, 2025 at 04:50:02PM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 12:14:51PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 11:51:33AM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 11:18:23AM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 10:27:51AM GMT, Siddharth Vadapalli wrote:
On Sun, Aug 31, 2025 at 06:15:13PM +0530, Manivannan Sadhasivam wrote:
Hello Mani,
On Fri, Aug 29, 2025 at 02:46:28PM GMT, Siddharth Vadapalli wrote:
[...]
> + /* > + * The PCIe Controller's registers have different "reset-values" > + * depending on the "strap" settings programmed into the PCIEn_CTRL > + * register within the CTRL_MMR memory-mapped register space. > + * The registers latch onto a "reset-value" based on the "strap" > + * settings sampled after the PCIe Controller is powered on. > + * To ensure that the "reset-values" are sampled accurately, power > + * off the PCIe Controller before programming the "strap" settings > + * and power it on after that. > + */ > + ret = pm_runtime_put_sync(dev); > + if (ret < 0) { > + dev_err(dev, "Failed to power off PCIe Controller\n"); > + return ret; > + }
How does the controller gets powered off after pm_runtime_put_sync() since you do not have runtime PM callbacks? I believe the parent is turning off the power domain?
By invoking 'pm_runtime_put_sync(dev)', the ref-count is being decremented and it results in the device being powered off. I have verified it by printing the power domain index within the functions for powering off and powering on devices on the J7200 SoC. Logs:
root@j7200-evm:~# modprobe pci_j721e [ 25.231675] [Powering On]: 240 [ 25.234848] j721e-pcie 2910000.pcie: host bridge /bus@100000/pcie@2910000 ranges: [ 25.242378] j721e-pcie 2910000.pcie: IO 0x4100001000..0x4100100fff -> 0x0000001000 [ 25.250496] j721e-pcie 2910000.pcie: MEM 0x4100101000..0x41ffffffff -> 0x0000101000 [ 25.258588] j721e-pcie 2910000.pcie: IB MEM 0x0000000000..0xffffffffffff -> 0x0000000000 [ 25.267098] [Powering Off]: 240 [ 25.270318] [Powering On]: 240 [ 25.273511] [Powering On]: 187 [ 26.372361] j721e-pcie 2910000.pcie: PCI host bridge to bus 0000:00 [ 26.378666] pci_bus 0000:00: root bus resource [bus 00-ff] [ 26.384156] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus address [0x1000-0x100fff]) [ 26.393197] pci_bus 0000:00: root bus resource [mem 0x4100101000-0x41ffffffff] (bus address [0x00101000-0xffffffff]) [ 26.403728] pci 0000:00:00.0: [104c:b00f] type 01 class 0x060400 PCIe Root Port [ 26.411044] pci 0000:00:00.0: PCI bridge to [bus 00] [ 26.416009] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 26.422091] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] [ 26.428874] pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] [ 26.436676] pci 0000:00:00.0: supports D1 [ 26.440699] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 26.448064] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 26.456274] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 [ 26.462923] pci 0000:00:00.0: PCI bridge to [bus 01] [ 26.467933] pci_bus 0000:00: resource 4 [io 0x0000-0xfffff] [ 26.473595] pci_bus 0000:00: resource 5 [mem 0x4100101000-0x41ffffffff] [ 26.480337] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 26.487479] pcieport 0000:00:00.0: PME: Signaling with IRQ 701 [ 26.493909] pcieport 0000:00:00.0: AER: enabled with IRQ 701
In the above logs, '240' is the Power Domain Index for the PCIe Controller on J7200 SoC. It is powered on initially before the driver is probed.
In that case, the driver should not call pm_runtime_get_sync() in its probe. What it should do is:
pm_runtime_set_active() pm_runtime_enable()
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
This pm_runtime_get_sync() should be part of j721e_pcie_ctrl_init() where you do power off, program strap and power on.
This should not be part of probe() as by that time, controller is already powered on. So pm_runtime_set_active() and pm_runtime_enable() should be enough to convey the state of the device to PM core.
I have tried out the suggestion in the following manner:
j721e_pcie_probe() ... pm_runtime_set_active(dev); pm_runtime_enable(dev); ret = j721e_pcie_ctrl_init(pcie); ... within j721e_pcie_ctrl_init() ret = pm_runtime_put_sync(dev); /* Program Strap Settings */ ret = pm_runtime_get_sync(dev); ... ... exit probe
Since a 'pm_runtime_get_sync()' hasn't yet been invoked prior to the section where 'pm_runtime_put_sync()' is invoked, it leads to a ref-count underflow error at runtime. Please let me know if I am missing something.
Doh... At the start of probe(), device PM usage_count will be 0. So we cannot decrement it without incrementing it.
Could you try below sequence?
probe() ... pm_runtime_set_active() pm_runtime_enable() j721e_pcie_ctrl_init() ... pm_runtime_get() /* Just increment usage_count */ pm_runtime_put_sync() /* ask PM core to turn off */ /* program strap setting */ pm_runtime_get_sync() /* ask PM core to turn on */ pm_runtime_put_noidle() /* balance the usage_count without suspending the device */ ...
- Mani
On Mon, Sep 01, 2025 at 08:15:24PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 04:50:02PM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 12:14:51PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 11:51:33AM GMT, Siddharth Vadapalli wrote:
[...]
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
This pm_runtime_get_sync() should be part of j721e_pcie_ctrl_init() where you do power off, program strap and power on.
This should not be part of probe() as by that time, controller is already powered on. So pm_runtime_set_active() and pm_runtime_enable() should be enough to convey the state of the device to PM core.
I have tried out the suggestion in the following manner:
j721e_pcie_probe() ... pm_runtime_set_active(dev); pm_runtime_enable(dev); ret = j721e_pcie_ctrl_init(pcie); ... within j721e_pcie_ctrl_init() ret = pm_runtime_put_sync(dev); /* Program Strap Settings */ ret = pm_runtime_get_sync(dev); ... ... exit probe
Since a 'pm_runtime_get_sync()' hasn't yet been invoked prior to the section where 'pm_runtime_put_sync()' is invoked, it leads to a ref-count underflow error at runtime. Please let me know if I am missing something.
Doh... At the start of probe(), device PM usage_count will be 0. So we cannot decrement it without incrementing it.
Could you try below sequence?
probe() ... pm_runtime_set_active() pm_runtime_enable() j721e_pcie_ctrl_init() ... pm_runtime_get() /* Just increment usage_count */ pm_runtime_put_sync() /* ask PM core to turn off */ /* program strap setting */ pm_runtime_get_sync() /* ask PM core to turn on */ pm_runtime_put_noidle() /* balance the usage_count without suspending the device */ ...
The above sequence powers off the controller at the point in time that the strap settings are programmed. 'pm_runtime_get_sync()' is powering on the controller afterwards. However, the 'pm_runtime_put_noidle()' at the end is causing the controller to be powered off again leading to a crash. Removing 'pm_runtime_put_noidle()' results in a functional sequence.
Please consider the existing sequence prior to this patch:
probe() ... pm_runtime_enable() pm_runtime_get_sync() => usage_count is 1 ... exit probe
With the suggested sequence above, we have:
probe() ... pm_runtime_set_active() pm_runtime_enable() j721e_pcie_ctrl_init() ... pm_runtime_get() => usage_count is 1 pm_runtime_put_sync() => usage_count is 0 /* Controller is powered off now */ /* Strap settings are programmed */ pm_runtime_get_sync() => usage_count is 1 /* Controller is powered on now */ pm_runtime_put_noidle() => usage_count is 0 /* Controller is powered off in a while */ ... /* Crash is observed aroung this point before probe finishes */
Please let me know if the fix is to drop 'pm_runtime_put_noidle()' from the above sequence.
Regards, Siddharth.
On Tue, Sep 02, 2025 at 10:34:51AM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 08:15:24PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 04:50:02PM GMT, Siddharth Vadapalli wrote:
On Mon, Sep 01, 2025 at 12:14:51PM +0530, Manivannan Sadhasivam wrote:
On Mon, Sep 01, 2025 at 11:51:33AM GMT, Siddharth Vadapalli wrote:
[...]
If I understand correctly, are you suggesting the following?
j721e_pcie_probe() pm_runtime_set_active() pm_runtime_enable() ret = j721e_pcie_ctrl_init(pcie); /* * PCIe Controller should be powered off here, but is there * a way to ensure that it has been powered off? */ => Program the strap settings and return to j721e_pcie_probe() /* Power on the PCIe Controller now */ ret = pm_runtime_get_sync(dev);
This pm_runtime_get_sync() should be part of j721e_pcie_ctrl_init() where you do power off, program strap and power on.
This should not be part of probe() as by that time, controller is already powered on. So pm_runtime_set_active() and pm_runtime_enable() should be enough to convey the state of the device to PM core.
I have tried out the suggestion in the following manner:
j721e_pcie_probe() ... pm_runtime_set_active(dev); pm_runtime_enable(dev); ret = j721e_pcie_ctrl_init(pcie); ... within j721e_pcie_ctrl_init() ret = pm_runtime_put_sync(dev); /* Program Strap Settings */ ret = pm_runtime_get_sync(dev); ... ... exit probe
Since a 'pm_runtime_get_sync()' hasn't yet been invoked prior to the section where 'pm_runtime_put_sync()' is invoked, it leads to a ref-count underflow error at runtime. Please let me know if I am missing something.
Doh... At the start of probe(), device PM usage_count will be 0. So we cannot decrement it without incrementing it.
Could you try below sequence?
probe() ... pm_runtime_set_active() pm_runtime_enable() j721e_pcie_ctrl_init() ... pm_runtime_get() /* Just increment usage_count */ pm_runtime_put_sync() /* ask PM core to turn off */ /* program strap setting */ pm_runtime_get_sync() /* ask PM core to turn on */ pm_runtime_put_noidle() /* balance the usage_count without suspending the device */ ...
The above sequence powers off the controller at the point in time that the strap settings are programmed. 'pm_runtime_get_sync()' is powering on the controller afterwards. However, the 'pm_runtime_put_noidle()' at the end is causing the controller to be powered off again leading to a crash. Removing 'pm_runtime_put_noidle()' results in a functional sequence.
Please consider the existing sequence prior to this patch:
probe() ... pm_runtime_enable() pm_runtime_get_sync() => usage_count is 1 ... exit probe
With the suggested sequence above, we have:
probe() ... pm_runtime_set_active() pm_runtime_enable() j721e_pcie_ctrl_init() ... pm_runtime_get() => usage_count is 1 pm_runtime_put_sync() => usage_count is 0 /* Controller is powered off now */ /* Strap settings are programmed */ pm_runtime_get_sync() => usage_count is 1 /* Controller is powered on now */ pm_runtime_put_noidle() => usage_count is 0 /* Controller is powered off in a while */ ... /* Crash is observed aroung this point before probe finishes */
Please let me know if the fix is to drop 'pm_runtime_put_noidle()' from the above sequence.
I thought put_noidle() will just reduce the refcount and not invoke the idle/suspend callbacks, but I seem to be wrong here.
Anyway, I guess we have no option here other than to drop the pm_runtime_put_noidle() call. This will keep refcount as 1 and will prevent the parent (genpd) to not enter runtime suspend, but we have to live with it (this was also the previous beahvior as well).
Btw, pm_runtime_set_active/enable change belongs to a separate patch.
- Mani
On Wed, Sep 03, 2025 at 06:51:14PM +0530, Manivannan Sadhasivam wrote:
[...]
I thought put_noidle() will just reduce the refcount and not invoke the idle/suspend callbacks, but I seem to be wrong here.
Anyway, I guess we have no option here other than to drop the pm_runtime_put_noidle() call. This will keep refcount as 1 and will prevent the parent (genpd) to not enter runtime suspend, but we have to live with it (this was also the previous beahvior as well).
Btw, pm_runtime_set_active/enable change belongs to a separate patch.
I will work on and post the v4 patch with the following change: - The commit message and the comment in the code will be updated to indicate that the runtime PM APIs are used to cause 'genpd' to power off and power on the controller.
I assume that the driver changes can remain as they are in this patch given that the existing behavior has to be preserved. As j721e_pcie_ctrl_init() programs the strap settings, powering off the controller via pm_runtime_put_sync() and powering it on via pm_runtime_get_sync() shall be implemented within j721e_pcie_ctrl_init() itself leaving the rest of the probe function as-is. To summarize, the driver changes will be:
j721e_pcie_ctrl_init() ...Existing code... + Add pm_runtime_put_sync() here to power off controller ...Existing code to program strap settings... + Add pm_runtime_get_sync() here to power on controller ...Existing code...
Please let me know what you think.
Regards, Siddharth.
On Thu, Sep 04, 2025 at 11:33:00AM GMT, Siddharth Vadapalli wrote:
On Wed, Sep 03, 2025 at 06:51:14PM +0530, Manivannan Sadhasivam wrote:
[...]
I thought put_noidle() will just reduce the refcount and not invoke the idle/suspend callbacks, but I seem to be wrong here.
Anyway, I guess we have no option here other than to drop the pm_runtime_put_noidle() call. This will keep refcount as 1 and will prevent the parent (genpd) to not enter runtime suspend, but we have to live with it (this was also the previous beahvior as well).
Btw, pm_runtime_set_active/enable change belongs to a separate patch.
I will work on and post the v4 patch with the following change:
- The commit message and the comment in the code will be updated to indicate that the runtime PM APIs are used to cause 'genpd' to power off and power on the controller.
I assume that the driver changes can remain as they are in this patch given that the existing behavior has to be preserved. As j721e_pcie_ctrl_init() programs the strap settings, powering off the controller via pm_runtime_put_sync() and powering it on via pm_runtime_get_sync() shall be implemented within j721e_pcie_ctrl_init() itself leaving the rest of the probe function as-is. To summarize, the driver changes will be:
j721e_pcie_ctrl_init() ...Existing code...
- Add pm_runtime_put_sync() here to power off controller ...Existing code to program strap settings...
- Add pm_runtime_get_sync() here to power on controller ...Existing code...
Please let me know what you think.
I would've preferred changing the runtime PM calls as I suggested initially, but still we would keep it incremented which brings the net effect to the same.
So you can drop my initial comment and post just this patch with improved patch description.
- Mani
On Mon, Sep 08, 2025 at 12:58:51PM +0530, Manivannan Sadhasivam wrote:
On Thu, Sep 04, 2025 at 11:33:00AM GMT, Siddharth Vadapalli wrote:
On Wed, Sep 03, 2025 at 06:51:14PM +0530, Manivannan Sadhasivam wrote:
[...]
I thought put_noidle() will just reduce the refcount and not invoke the idle/suspend callbacks, but I seem to be wrong here.
Anyway, I guess we have no option here other than to drop the pm_runtime_put_noidle() call. This will keep refcount as 1 and will prevent the parent (genpd) to not enter runtime suspend, but we have to live with it (this was also the previous beahvior as well).
Btw, pm_runtime_set_active/enable change belongs to a separate patch.
I will work on and post the v4 patch with the following change:
- The commit message and the comment in the code will be updated to indicate that the runtime PM APIs are used to cause 'genpd' to power off and power on the controller.
I assume that the driver changes can remain as they are in this patch given that the existing behavior has to be preserved. As j721e_pcie_ctrl_init() programs the strap settings, powering off the controller via pm_runtime_put_sync() and powering it on via pm_runtime_get_sync() shall be implemented within j721e_pcie_ctrl_init() itself leaving the rest of the probe function as-is. To summarize, the driver changes will be:
j721e_pcie_ctrl_init() ...Existing code...
- Add pm_runtime_put_sync() here to power off controller ...Existing code to program strap settings...
- Add pm_runtime_get_sync() here to power on controller ...Existing code...
Please let me know what you think.
I would've preferred changing the runtime PM calls as I suggested initially, but still we would keep it incremented which brings the net effect to the same.
So you can drop my initial comment and post just this patch with improved patch description.
Thank you for the clarification. I will implement the above in the v4 patch.
Regards, Siddharth.
linux-stable-mirror@lists.linaro.org