On Mon, 29 Apr 2019 16:01:29 +1000 Alexey Kardashevskiy aik@ozlabs.ru wrote:
On 20/04/2019 01:34, Greg Kurz wrote:
Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This has the effect of incrementing the reference count of the PCI device, as explained in drivers/pci/search.c:
- Given a PCI domain, bus, and slot/function number, the desired PCI
- device is located in the list of PCI devices. If the device is
- found, its reference count is increased and this function returns a
- pointer to its data structure. The caller must decrement the
- reference count by calling pci_dev_put(). If no device is found,
- %NULL is returned.
Nothing was done to call pci_dev_put() and the reference count of GPU and NPU PCI devices rockets up.
A natural way to fix this would be to teach the callers about the change, so that they call pci_dev_put() when done with the pointer. This turns out to be quite intrusive, as it affects many paths in npu-dma.c, pci-ioda.c and vfio_pci_nvlink2.c.
afaict this referencing is only done to protect the current traverser and what you've done is actually a natural way (and the generic pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
Not exactly the same: pci_get_dev_by_id() always increment the refcount of the returned PCI device. The refcount is only decremented when this device is passed to pci_get_dev_by_id() to continue searching.
That means that the users of the PCI device pointer returned by pci_get_dev_by_id() or its exported variants pci_get_subsys(), pci_get_device() and pci_get_class() do handle the refcount. They all pass the pointer to pci_dev_put() or continue the search, which calls pci_dev_put() internally.
Direct and indirect callers of get_pci_dev() don't care for the refcount at all unless I'm missing something.
Also, the issue appeared in 4.16 and some affected code got moved around since then: it would be problematic to backport the fix to stable releases.
All that code never cared for reference counting anyway. Call pci_dev_put() from get_pci_dev() to revert to the previous behavior.
Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
from pci_dn")
Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Greg Kurz groug@kaod.org
arch/powerpc/platforms/powernv/npu-dma.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index e713ade30087..d8f3647e8fb2 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock); static struct pci_dev *get_pci_dev(struct device_node *dn) { struct pci_dn *pdn = PCI_DN(dn);
- struct pci_dev *pdev;
- return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
- pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), pdn->busno, pdn->devfn);
- /*
* pci_get_domain_bus_and_slot() increased the reference count of
* the PCI device, but callers don't need that actually as the PE
* already holds a reference to the device.
Imho this would be just enough.
Anyway,
Reviewed-by: Alexey Kardashevskiy aik@ozlabs.ru
Thanks !
I now realize that I forgot to add the --cc option for stable on my stgit command line :-.
Cc'ing now.
How did you find it? :)
While reading code to find some inspiration for OpenCAPI passthrough. :)
I saw the following in vfio_pci_ibm_npu2_init():
if (!pnv_pci_get_gpu_dev(vdev->pdev)) return -ENODEV;
and simply followed the function calls.
Since callers aren't
* aware of the reference count change, call pci_dev_put() now to
* avoid leaks.
*/
- if (pdev)
pci_dev_put(pdev);
- return pdev;
} /* Given a NPU device get the associated PCI device. */
Michael,
Any comments on this patch ? Should I repost with a shorter comment as suggested by Alexey ?
Cheers,
-- Greg
On Mon, 29 Apr 2019 12:36:59 +0200 Greg Kurz groug@kaod.org wrote:
On Mon, 29 Apr 2019 16:01:29 +1000 Alexey Kardashevskiy aik@ozlabs.ru wrote:
On 20/04/2019 01:34, Greg Kurz wrote:
Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This has the effect of incrementing the reference count of the PCI device, as explained in drivers/pci/search.c:
- Given a PCI domain, bus, and slot/function number, the desired PCI
- device is located in the list of PCI devices. If the device is
- found, its reference count is increased and this function returns a
- pointer to its data structure. The caller must decrement the
- reference count by calling pci_dev_put(). If no device is found,
- %NULL is returned.
Nothing was done to call pci_dev_put() and the reference count of GPU and NPU PCI devices rockets up.
A natural way to fix this would be to teach the callers about the change, so that they call pci_dev_put() when done with the pointer. This turns out to be quite intrusive, as it affects many paths in npu-dma.c, pci-ioda.c and vfio_pci_nvlink2.c.
afaict this referencing is only done to protect the current traverser and what you've done is actually a natural way (and the generic pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
Not exactly the same: pci_get_dev_by_id() always increment the refcount of the returned PCI device. The refcount is only decremented when this device is passed to pci_get_dev_by_id() to continue searching.
That means that the users of the PCI device pointer returned by pci_get_dev_by_id() or its exported variants pci_get_subsys(), pci_get_device() and pci_get_class() do handle the refcount. They all pass the pointer to pci_dev_put() or continue the search, which calls pci_dev_put() internally.
Direct and indirect callers of get_pci_dev() don't care for the refcount at all unless I'm missing something.
Also, the issue appeared in 4.16 and some affected code got moved around since then: it would be problematic to backport the fix to stable releases.
All that code never cared for reference counting anyway. Call pci_dev_put() from get_pci_dev() to revert to the previous behavior.
Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
from pci_dn")
Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Greg Kurz groug@kaod.org
arch/powerpc/platforms/powernv/npu-dma.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index e713ade30087..d8f3647e8fb2 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock); static struct pci_dev *get_pci_dev(struct device_node *dn) { struct pci_dn *pdn = PCI_DN(dn);
- struct pci_dev *pdev;
- return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
- pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), pdn->busno, pdn->devfn);
- /*
* pci_get_domain_bus_and_slot() increased the reference count of
* the PCI device, but callers don't need that actually as the PE
* already holds a reference to the device.
Imho this would be just enough.
Anyway,
Reviewed-by: Alexey Kardashevskiy aik@ozlabs.ru
Thanks !
I now realize that I forgot to add the --cc option for stable on my stgit command line :-.
Cc'ing now.
How did you find it? :)
While reading code to find some inspiration for OpenCAPI passthrough. :)
I saw the following in vfio_pci_ibm_npu2_init():
if (!pnv_pci_get_gpu_dev(vdev->pdev)) return -ENODEV;
and simply followed the function calls.
Since callers aren't
* aware of the reference count change, call pci_dev_put() now to
* avoid leaks.
*/
- if (pdev)
pci_dev_put(pdev);
- return pdev;
} /* Given a NPU device get the associated PCI device. */
Greg Kurz groug@kaod.org writes:
Michael,
Any comments on this patch ? Should I repost with a shorter comment as suggested by Alexey ?
No the longer comment seems fine to me.
I'm not a big fan of the patch, it's basically a hack :)
But for a backportable fix I guess it is OK.
I would be happier though if we eventually fix up the code to do the refcounting properly.
cheers
On Mon, 29 Apr 2019 12:36:59 +0200 Greg Kurz groug@kaod.org wrote:
On Mon, 29 Apr 2019 16:01:29 +1000 Alexey Kardashevskiy aik@ozlabs.ru wrote:
On 20/04/2019 01:34, Greg Kurz wrote:
Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This has the effect of incrementing the reference count of the PCI device, as explained in drivers/pci/search.c:
- Given a PCI domain, bus, and slot/function number, the desired PCI
- device is located in the list of PCI devices. If the device is
- found, its reference count is increased and this function returns a
- pointer to its data structure. The caller must decrement the
- reference count by calling pci_dev_put(). If no device is found,
- %NULL is returned.
Nothing was done to call pci_dev_put() and the reference count of GPU and NPU PCI devices rockets up.
A natural way to fix this would be to teach the callers about the change, so that they call pci_dev_put() when done with the pointer. This turns out to be quite intrusive, as it affects many paths in npu-dma.c, pci-ioda.c and vfio_pci_nvlink2.c.
afaict this referencing is only done to protect the current traverser and what you've done is actually a natural way (and the generic pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
Not exactly the same: pci_get_dev_by_id() always increment the refcount of the returned PCI device. The refcount is only decremented when this device is passed to pci_get_dev_by_id() to continue searching.
That means that the users of the PCI device pointer returned by pci_get_dev_by_id() or its exported variants pci_get_subsys(), pci_get_device() and pci_get_class() do handle the refcount. They all pass the pointer to pci_dev_put() or continue the search, which calls pci_dev_put() internally.
Direct and indirect callers of get_pci_dev() don't care for the refcount at all unless I'm missing something.
Also, the issue appeared in 4.16 and some affected code got moved around since then: it would be problematic to backport the fix to stable releases.
All that code never cared for reference counting anyway. Call pci_dev_put() from get_pci_dev() to revert to the previous behavior.
Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
from pci_dn")
Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Greg Kurz groug@kaod.org
arch/powerpc/platforms/powernv/npu-dma.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index e713ade30087..d8f3647e8fb2 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock); static struct pci_dev *get_pci_dev(struct device_node *dn) { struct pci_dn *pdn = PCI_DN(dn);
- struct pci_dev *pdev;
- return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
- pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), pdn->busno, pdn->devfn);
- /*
* pci_get_domain_bus_and_slot() increased the reference count of
* the PCI device, but callers don't need that actually as the PE
* already holds a reference to the device.
Imho this would be just enough.
Anyway,
Reviewed-by: Alexey Kardashevskiy aik@ozlabs.ru
Thanks !
I now realize that I forgot to add the --cc option for stable on my stgit command line :-.
Cc'ing now.
How did you find it? :)
While reading code to find some inspiration for OpenCAPI passthrough. :)
I saw the following in vfio_pci_ibm_npu2_init():
if (!pnv_pci_get_gpu_dev(vdev->pdev)) return -ENODEV;
and simply followed the function calls.
Since callers aren't
* aware of the reference count change, call pci_dev_put() now to
* avoid leaks.
*/
- if (pdev)
pci_dev_put(pdev);
- return pdev;
} /* Given a NPU device get the associated PCI device. */
On Tue, 14 May 2019 21:13:40 +1000 Michael Ellerman mpe@ellerman.id.au wrote:
Greg Kurz groug@kaod.org writes:
Michael,
Any comments on this patch ? Should I repost with a shorter comment as suggested by Alexey ?
No the longer comment seems fine to me.
I'm not a big fan of the patch, it's basically a hack :)
Yeah :)
But for a backportable fix I guess it is OK.
I would be happier though if we eventually fix up the code to do the refcounting properly.
I had started to do just that before deciding to go for the backportable hack. Should I rebase the other patches I have on top of this patch and repost the whole thing, so that we have both the ugly fix for stable and the pretty one for 5.2 ?
Cheers,
-- Greg
cheers
On Mon, 29 Apr 2019 12:36:59 +0200 Greg Kurz groug@kaod.org wrote:
On Mon, 29 Apr 2019 16:01:29 +1000 Alexey Kardashevskiy aik@ozlabs.ru wrote:
On 20/04/2019 01:34, Greg Kurz wrote:
Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This has the effect of incrementing the reference count of the PCI device, as explained in drivers/pci/search.c:
- Given a PCI domain, bus, and slot/function number, the desired PCI
- device is located in the list of PCI devices. If the device is
- found, its reference count is increased and this function returns a
- pointer to its data structure. The caller must decrement the
- reference count by calling pci_dev_put(). If no device is found,
- %NULL is returned.
Nothing was done to call pci_dev_put() and the reference count of GPU and NPU PCI devices rockets up.
A natural way to fix this would be to teach the callers about the change, so that they call pci_dev_put() when done with the pointer. This turns out to be quite intrusive, as it affects many paths in npu-dma.c, pci-ioda.c and vfio_pci_nvlink2.c.
afaict this referencing is only done to protect the current traverser and what you've done is actually a natural way (and the generic pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
Not exactly the same: pci_get_dev_by_id() always increment the refcount of the returned PCI device. The refcount is only decremented when this device is passed to pci_get_dev_by_id() to continue searching.
That means that the users of the PCI device pointer returned by pci_get_dev_by_id() or its exported variants pci_get_subsys(), pci_get_device() and pci_get_class() do handle the refcount. They all pass the pointer to pci_dev_put() or continue the search, which calls pci_dev_put() internally.
Direct and indirect callers of get_pci_dev() don't care for the refcount at all unless I'm missing something.
Also, the issue appeared in 4.16 and some affected code got moved around since then: it would be problematic to backport the fix to stable releases.
All that code never cared for reference counting anyway. Call pci_dev_put() from get_pci_dev() to revert to the previous behavior.
Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
from pci_dn")
Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Greg Kurz groug@kaod.org
arch/powerpc/platforms/powernv/npu-dma.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index e713ade30087..d8f3647e8fb2 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock); static struct pci_dev *get_pci_dev(struct device_node *dn) { struct pci_dn *pdn = PCI_DN(dn);
- struct pci_dev *pdev;
- return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
- pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), pdn->busno, pdn->devfn);
- /*
* pci_get_domain_bus_and_slot() increased the reference count of
* the PCI device, but callers don't need that actually as the PE
* already holds a reference to the device.
Imho this would be just enough.
Anyway,
Reviewed-by: Alexey Kardashevskiy aik@ozlabs.ru
Thanks !
I now realize that I forgot to add the --cc option for stable on my stgit command line :-.
Cc'ing now.
How did you find it? :)
While reading code to find some inspiration for OpenCAPI passthrough. :)
I saw the following in vfio_pci_ibm_npu2_init():
if (!pnv_pci_get_gpu_dev(vdev->pdev)) return -ENODEV;
and simply followed the function calls.
Since callers aren't
* aware of the reference count change, call pci_dev_put() now to
* avoid leaks.
*/
- if (pdev)
pci_dev_put(pdev);
- return pdev;
} /* Given a NPU device get the associated PCI device. */
linux-stable-mirror@lists.linaro.org