When bnxt_init_one() fails during initialization (e.g., bnxt_init_int_mode returns -ENODEV), the error path calls bnxt_free_hwrm_resources() which destroys the DMA pool and sets bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called, which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events"), ptp_clock_unregister() now calls ptp_disable_all_events(), which in turn invokes the driver's .enable() callback (bnxt_ptp_enable()) to disable PTP events before completing the unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin() and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This function tries to allocate from bp->hwrm_dma_pool, causing a NULL pointer dereference:
bnxt_en 0000:01:00.0 (unnamed net_device) (uninitialized): bnxt_init_int_mode err: ffffffed KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] Call Trace: __hwrm_req_init (drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c:72) bnxt_ptp_enable (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:323 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:517) ptp_disable_all_events (drivers/ptp/ptp_chardev.c:66) ptp_clock_unregister (drivers/ptp/ptp_clock.c:518) bnxt_ptp_clear (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:1134) bnxt_init_one (drivers/net/ethernet/broadcom/bnxt/bnxt.c:16889)
Lines are against commit f8f9c1f4d0c7 ("Linux 6.19-rc3")
Fix this by clearing and unregistering ptp (bnxt_ptp_clear()) before freeing HWRM resources.
Suggested-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Breno Leitao leitao@debian.org Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events") Cc: stable@vger.kernel.org --- Changes in v2: - Instead of checking for HWRM resources in bnxt_ptp_enable(), call it when HWRM resources are availble (Pavan Chebbi) - Link to v1: https://patch.msgid.link/20251231-bnxt-v1-1-8f9cde6698b4@debian.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index d160e54ac121..5a4af8abf848 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -16891,11 +16891,11 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
init_err_pci_clean: bnxt_hwrm_func_drv_unrgtr(bp); + bnxt_ptp_clear(bp); + kfree(bp->ptp_cfg); bnxt_free_hwrm_resources(bp); bnxt_hwmon_uninit(bp); bnxt_ethtool_free(bp); - bnxt_ptp_clear(bp); - kfree(bp->ptp_cfg); bp->ptp_cfg = NULL; kfree(bp->fw_health); bp->fw_health = NULL;
--- base-commit: e146b276a817807b8f4a94b5781bf80c6c00601b change-id: 20251231-bnxt-c54d317d8bfe
Best regards, -- Breno Leitao leitao@debian.org
On Mon, Jan 05, 2026 at 04:00:16AM -0800, Breno Leitao wrote:
When bnxt_init_one() fails during initialization (e.g., bnxt_init_int_mode returns -ENODEV), the error path calls bnxt_free_hwrm_resources() which destroys the DMA pool and sets bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called, which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events"), ptp_clock_unregister() now calls ptp_disable_all_events(), which in turn invokes the driver's .enable() callback (bnxt_ptp_enable()) to disable PTP events before completing the unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin() and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This function tries to allocate from bp->hwrm_dma_pool, causing a NULL pointer dereference:
This has revealed a latent bug in this driver. All the time that the PTP clock is registered, userspace can interact with it, and thus bnxt_ptp_enable() can be called. ptp_clock_unregister() unpublishes that interface.
ptp_clock_unregister() must always be called _before_ tearing down any resources that the PTP clock implementation may use.
From what you describe, it sounds like this patch fixes that.
Looking at the driver, however, it looks very suspicious.
__bnxt_hwrm_ptp_qcfg() seems to be the place where PTP is setup and initialised (and ptp_clock_register() called in bnxt_ptp_init()).
First, it looks like bnxt_ptp_init() will tear down an existing PTP clock via bnxt_ptp_free() before then re-registering it. That seems odd.
Second, __bnxt_hwrm_ptp_qcfg() calls bnxt_ptp_clear() if bp->hwrm_spec_code < 0x10801 || !BNXT_CHIP_P5_PLUS(bp) is true or hwrm_req_init() fails. Is it really possible that we have the PTP clock registered when PTP isn't supported?
Third, same concern but with __bnxt_hwrm_func_qcaps().
My guess is that this has something to do with firmware, and maybe upgrading it at runtime - so if the firmware gets upgraded to a version that doesn't support PTP, the driver removes PTP. However, can PTP be used while firmware is being upgraded, and what happens if, e.g. bnxt_ptp_enable() were called mid-upgrade? Would that be safe?
Hello Russell,
On Mon, Jan 05, 2026 at 01:29:40PM +0000, Russell King (Oracle) wrote:
On Mon, Jan 05, 2026 at 04:00:16AM -0800, Breno Leitao wrote: My guess is that this has something to do with firmware, and maybe upgrading it at runtime - so if the firmware gets upgraded to a version that doesn't support PTP, the driver removes PTP. However, can PTP be used while firmware is being upgraded, and what happens if, e.g. bnxt_ptp_enable() were called mid-upgrade? Would that be safe?
This crash happened at boot time, when the kernel was having another at DMA path, which was triggering this bug. There was no firmare upgrade at all. Just rebooting the machine with 6.19 was crashing everytime due to the early failure to initialize the driver.
--breno
On Mon, Jan 05, 2026 at 06:11:26AM -0800, Breno Leitao wrote:
Hello Russell,
On Mon, Jan 05, 2026 at 01:29:40PM +0000, Russell King (Oracle) wrote:
On Mon, Jan 05, 2026 at 04:00:16AM -0800, Breno Leitao wrote: My guess is that this has something to do with firmware, and maybe upgrading it at runtime - so if the firmware gets upgraded to a version that doesn't support PTP, the driver removes PTP. However, can PTP be used while firmware is being upgraded, and what happens if, e.g. bnxt_ptp_enable() were called mid-upgrade? Would that be safe?
This crash happened at boot time, when the kernel was having another at DMA path, which was triggering this bug. There was no firmare upgrade at all. Just rebooting the machine with 6.19 was crashing everytime due to the early failure to initialize the driver.
Please read my email again. I wasn't questioning _when_ the problem you were seeing was occuring. I was questioning the overall structural quality of the driver, suggesting that there are further issues with it around PTP.
On Mon, Jan 5, 2026 at 6:59 PM Russell King (Oracle) linux@armlinux.org.uk wrote:
On Mon, Jan 05, 2026 at 04:00:16AM -0800, Breno Leitao wrote:
When bnxt_init_one() fails during initialization (e.g., bnxt_init_int_mode returns -ENODEV), the error path calls bnxt_free_hwrm_resources() which destroys the DMA pool and sets bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called, which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events"), ptp_clock_unregister() now calls ptp_disable_all_events(), which in turn invokes the driver's .enable() callback (bnxt_ptp_enable()) to disable PTP events before completing the unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin() and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This function tries to allocate from bp->hwrm_dma_pool, causing a NULL pointer dereference:
This has revealed a latent bug in this driver. All the time that the PTP clock is registered, userspace can interact with it, and thus bnxt_ptp_enable() can be called. ptp_clock_unregister() unpublishes that interface.
ptp_clock_unregister() must always be called _before_ tearing down any resources that the PTP clock implementation may use.
From what you describe, it sounds like this patch fixes that.
Looking at the driver, however, it looks very suspicious.
__bnxt_hwrm_ptp_qcfg() seems to be the place where PTP is setup and initialised (and ptp_clock_register() called in bnxt_ptp_init()).
First, it looks like bnxt_ptp_init() will tear down an existing PTP clock via bnxt_ptp_free() before then re-registering it. That seems odd.
This is to handle the firmware capabilities changes post an update, like you guessed.
Second, __bnxt_hwrm_ptp_qcfg() calls bnxt_ptp_clear() if bp->hwrm_spec_code < 0x10801 || !BNXT_CHIP_P5_PLUS(bp) is true or hwrm_req_init() fails. Is it really possible that we have the PTP clock registered when PTP isn't supported?
Right, this check may not make much sense because we call __bnxt_hwrm_ptp_qcfg() only after we know PTP is supported. Michael may tell better but I think we could improve by removing that check.
Third, same concern but with __bnxt_hwrm_func_qcaps().
This case is different? __bnxt_hwrm_func_qcaps() is always called post fw reset, where the capability could have changed.
My guess is that this has something to do with firmware, and maybe upgrading it at runtime - so if the firmware gets upgraded to a version that doesn't support PTP, the driver removes PTP. However, can PTP be used while firmware is being upgraded, and what happens if, e.g. bnxt_ptp_enable() were called mid-upgrade? Would that be safe?
Hm.. you have a point. This is a consequence of commit a60fc3294a37. We never had this situation before. As it stands now, from what I know, bnxt_ptp_enable()'s fw commands will be no-ops. But yes, to be correct, I think we should have a fw PTP capability check inside bnxt_ptp_enable().
-- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Jan 5, 2026 at 7:51 AM Pavan Chebbi pavan.chebbi@broadcom.com wrote:
On Mon, Jan 5, 2026 at 6:59 PM Russell King (Oracle) linux@armlinux.org.uk wrote:
Second, __bnxt_hwrm_ptp_qcfg() calls bnxt_ptp_clear() if bp->hwrm_spec_code < 0x10801 || !BNXT_CHIP_P5_PLUS(bp) is true or hwrm_req_init() fails. Is it really possible that we have the PTP clock registered when PTP isn't supported?
Right, this check may not make much sense because we call __bnxt_hwrm_ptp_qcfg() only after we know PTP is supported. Michael may tell better but I think we could improve by removing that check.
Some older FW may advertise support for PTP using an older scheme that the driver does not support. The FW running on an older class of chips may also advertise support for PTP and it's also not supported by the driver. In the former case, if FW is downgraded, the test may become true.
On Mon, Jan 05, 2026 at 09:40:03AM -0800, Michael Chan wrote:
On Mon, Jan 5, 2026 at 7:51 AM Pavan Chebbi pavan.chebbi@broadcom.com wrote:
On Mon, Jan 5, 2026 at 6:59 PM Russell King (Oracle) linux@armlinux.org.uk wrote:
Second, __bnxt_hwrm_ptp_qcfg() calls bnxt_ptp_clear() if bp->hwrm_spec_code < 0x10801 || !BNXT_CHIP_P5_PLUS(bp) is true or hwrm_req_init() fails. Is it really possible that we have the PTP clock registered when PTP isn't supported?
Right, this check may not make much sense because we call __bnxt_hwrm_ptp_qcfg() only after we know PTP is supported. Michael may tell better but I think we could improve by removing that check.
Some older FW may advertise support for PTP using an older scheme that the driver does not support. The FW running on an older class of chips may also advertise support for PTP and it's also not supported by the driver. In the former case, if FW is downgraded, the test may become true.
I'd like to restate my question, as it is the crux of the issue: as the PTP clock remains registered during the firmware change, userspace can interact with that device in every way possible.
If the firmware is in the process of being changed, and e.g. bnxt_ptp_enable() were to be called by way of userspace interacting with the PTP clock, we have already established that bnxt_ptp_enable() will talk to the firmware - but what happens if bnxt_ptp_enable() attempts to while the firmware is being changed? Is this safe?
On Mon, Jan 5, 2026 at 10:03 AM Russell King (Oracle) linux@armlinux.org.uk wrote:
I'd like to restate my question, as it is the crux of the issue: as the PTP clock remains registered during the firmware change, userspace can interact with that device in every way possible.
If the firmware is in the process of being changed, and e.g. bnxt_ptp_enable() were to be called by way of userspace interacting with the PTP clock, we have already established that bnxt_ptp_enable() will talk to the firmware - but what happens if bnxt_ptp_enable() attempts to while the firmware is being changed? Is this safe?
I believe we have code to deal with that. During FW upgrade/downgrade/reset, the BNXT_STATE_IN_FW_RESET flag should be set. The operation that needs to be avoided in this window is reading the clock registers. A quick check of the code shows that we take the ptp_lock and check the flag before we call readl().
On Mon, 05 Jan 2026 04:00:16 -0800 Breno Leitao wrote:
init_err_pci_clean: bnxt_hwrm_func_drv_unrgtr(bp);
- bnxt_ptp_clear(bp);
- kfree(bp->ptp_cfg); bnxt_free_hwrm_resources(bp); bnxt_hwmon_uninit(bp); bnxt_ethtool_free(bp);
- bnxt_ptp_clear(bp);
- kfree(bp->ptp_cfg); bp->ptp_cfg = NULL;
Is there a reason to leave clearing of the pointer behind? I don't see it mentioned in the commit msg.. Checking previous discussion it sounds like Pavan asked for the clearing to also be moved.
kfree(bp->fw_health); bp->fw_health = NULL;
On Mon, Jan 05, 2026 at 04:04:58PM -0800, Jakub Kicinski wrote:
On Mon, 05 Jan 2026 04:00:16 -0800 Breno Leitao wrote:
init_err_pci_clean: bnxt_hwrm_func_drv_unrgtr(bp);
- bnxt_ptp_clear(bp);
- kfree(bp->ptp_cfg); bnxt_free_hwrm_resources(bp); bnxt_hwmon_uninit(bp); bnxt_ethtool_free(bp);
- bnxt_ptp_clear(bp);
- kfree(bp->ptp_cfg); bp->ptp_cfg = NULL;
Is there a reason to leave clearing of the pointer behind? I don't see it mentioned in the commit msg.. Checking previous discussion it sounds like Pavan asked for the clearing to also be moved.
no reason, just a mistake. I will update.
linux-stable-mirror@lists.linaro.org