Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices.
This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored.
Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
---
 drivers/iommu/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4f91a740c15f..e01df4c3e709 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3440,7 +3440,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 
 	mutex_lock(&group->mutex);
 	for_each_group_device(group, device) {
-		if (pasid >= device->dev->iommu->max_pasids) {
+		/*
+		 * Skip PASID validation for devices without PASID support
+		 * (max_pasids = 0). These devices cannot issue transactions
+		 * with PASID, so they don't affect group's PASID usage.
+		 */
+		if ((device->dev->iommu->max_pasids > 0) &&
+		    (pasid >= device->dev->iommu->max_pasids)) {
 			ret = -EINVAL;
 			goto out_unlock;
 		}
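The effect of the changed condition can be illustrated outside the kernel with a small model. This is only a sketch under stated assumptions: `struct model_dev`, `validate_pasid_old()`, `validate_pasid_new()` and `example_group` are hypothetical names invented for illustration, not kernel APIs, and the `max_pasids` values mirror a group with three bridge ports that report 0 and one endpoint that reports 1048576.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state the core code inspects. */
struct model_dev {
	const char *name;
	uint32_t max_pasids;	/* 0 means the device has no PASID support */
};

/* Check before the patch: every device in the group must cover the PASID. */
static int validate_pasid_old(const struct model_dev *devs, size_t n,
			      uint32_t pasid)
{
	assert(devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pasid >= devs[i].max_pasids)
			return -EINVAL;
	}
	return 0;
}

/* Check after the patch: devices with max_pasids == 0 are skipped. */
static int validate_pasid_new(const struct model_dev *devs, size_t n,
			      uint32_t pasid)
{
	assert(devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].max_pasids > 0 && pasid >= devs[i].max_pasids)
			return -EINVAL;
	}
	return 0;
}

/* Assumed group layout: three no-PASID bridge ports and one PASID endpoint. */
static const struct model_dev example_group[] = {
	{ "bridge port A", 0 },
	{ "bridge port B", 0 },
	{ "bridge port C", 0 },
	{ "endpoint", 1048576 },
};
```

With this model, `validate_pasid_old(example_group, 4, 1)` rejects the attach because the first bridge port has `max_pasids == 0`, while `validate_pasid_new()` accepts it and still rejects a PASID that is out of range for the endpoint.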
On 4/24/25 10:06, Tushar Dave wrote:
Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices.
This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored.
Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
---
 drivers/iommu/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4f91a740c15f..e01df4c3e709 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3440,7 +3440,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 
 	mutex_lock(&group->mutex);
 	for_each_group_device(group, device) {
-		if (pasid >= device->dev->iommu->max_pasids) {
+		/*
+		 * Skip PASID validation for devices without PASID support
+		 * (max_pasids = 0). These devices cannot issue transactions
+		 * with PASID, so they don't affect group's PASID usage.
+		 */
+		if ((device->dev->iommu->max_pasids > 0) &&
+		    (pasid >= device->dev->iommu->max_pasids)) {
What should the iommu driver do when set_dev_pasid is called for a non-PASID device? The iommu driver has no notion of an iommu group, hence it has no knowledge about this device sharing an iommu group with another PASID-capable device.
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Thursday, April 24, 2025 11:27 AM
On 4/24/25 10:06, Tushar Dave wrote:
Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices.
Is there a detailed example?
This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored.
Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
---
 drivers/iommu/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4f91a740c15f..e01df4c3e709 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3440,7 +3440,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 
 	mutex_lock(&group->mutex);
 	for_each_group_device(group, device) {
-		if (pasid >= device->dev->iommu->max_pasids) {
+		/*
+		 * Skip PASID validation for devices without PASID support
+		 * (max_pasids = 0). These devices cannot issue transactions
+		 * with PASID, so they don't affect group's PASID usage.
+		 */
+		if ((device->dev->iommu->max_pasids > 0) &&
+		    (pasid >= device->dev->iommu->max_pasids)) {
What should the iommu driver do when set_dev_pasid is called for a non-PASID device? The iommu driver has no notion of an iommu group, hence it has no knowledge about this device sharing an iommu group with another PASID-capable device.
could add a similar check in __iommu_set_group_pasid() and __iommu_remove_group_pasid() to skip those devices.
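The suggestion above can be sketched as a standalone model. This is a hedged illustration, not the kernel implementation: `struct model_dev`, `mock_set_dev_pasid()`, `mock_remove_dev_pasid()`, `model_set_group_pasid()` and `model_remove_group_pasid()` are hypothetical names, and the rollback order is an assumption made for the sketch. The point it shows is that devices with `max_pasids == 0` are skipped on the set path, the remove path, and the error-unwind path alike, so the driver callback is never reached for them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device model; not a kernel structure. */
struct model_dev {
	uint32_t max_pasids;	/* 0: device has no PASID support */
	int attached;		/* number of PASID attachments in effect */
	int fail_set;		/* test hook: make the mock driver fail */
};

/* Mock driver op: it trusts the caller to have validated the PASID. */
static int mock_set_dev_pasid(struct model_dev *dev, uint32_t pasid)
{
	assert(pasid < dev->max_pasids);
	if (dev->fail_set)
		return -ENOMEM;
	dev->attached++;
	return 0;
}

static void mock_remove_dev_pasid(struct model_dev *dev, uint32_t pasid)
{
	assert(pasid < dev->max_pasids);
	dev->attached--;
}

/* Detach the PASID from every PASID-capable device in the group. */
static void model_remove_group_pasid(struct model_dev *devs, size_t n,
				     uint32_t pasid)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].max_pasids == 0)
			continue;	/* never attached, nothing to undo */
		mock_remove_dev_pasid(&devs[i], pasid);
	}
}

/* Attach the PASID to every PASID-capable device, unwinding on failure. */
static int model_set_group_pasid(struct model_dev *devs, size_t n,
				 uint32_t pasid)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		if (devs[i].max_pasids == 0)
			continue;	/* do not call the driver at all */
		ret = mock_set_dev_pasid(&devs[i], pasid);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Roll back the devices that were attached before the failure. */
	while (i-- > 0) {
		if (devs[i].max_pasids == 0)
			continue;
		mock_remove_dev_pasid(&devs[i], pasid);
	}
	return ret;
}
```

The model assumes the caller has already checked that the PASID is below `max_pasids` for every PASID-capable device, which is what the validation loop in the patch does.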
On 4/24/2025 8:57 AM, Baolu Lu wrote:
On 4/24/25 10:06, Tushar Dave wrote:
Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices.
This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored.
Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable@vger.kernel.org
Signed-off-by: Tushar Dave <tdave@nvidia.com>
---
 drivers/iommu/iommu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4f91a740c15f..e01df4c3e709 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3440,7 +3440,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 
 	mutex_lock(&group->mutex);
 	for_each_group_device(group, device) {
-		if (pasid >= device->dev->iommu->max_pasids) {
+		/*
+		 * Skip PASID validation for devices without PASID support
+		 * (max_pasids = 0). These devices cannot issue transactions
+		 * with PASID, so they don't affect group's PASID usage.
+		 */
+		if ((device->dev->iommu->max_pasids > 0) &&
+		    (pasid >= device->dev->iommu->max_pasids)) {
What should the iommu driver do when set_dev_pasid is called for a non-PASID device?
A per-device max_pasids check should cover that, right?
FYI. One example of such a device is some of the AMD GPUs, which have both VGA and audio in the same group. While VGA supports PASID, audio does not. This used to work fine when we had the AMD IOMMU PASID-specific driver. GPUs stopped using PASIDs in the upstream kernel, so I didn't look into this part in detail.
-Vasant
On Thu, Apr 24, 2025 at 12:08:56PM +0530, Vasant Hegde wrote:
What should the iommu driver do when set_dev_pasid is called for a non-PASID device?
That's a good point, maybe the core code should filter that out based on max_pasids? I think we do run into trouble here because the drivers are allocating PASID table space based on max_pasids so the non-pasid device should fail to add the pasid. Tushar, you should have hit this in your testing???
We also have a problem setting up the default domain - it won't compute IOMMU_HWPT_ALLOC_PASID properly across the group. If the no-pasid device probes first then PASID will be broken on the group.
Tushar isn't hitting this because ARM always uses a PASID compatible domain today, but it will not work on AMD.
That's a huge pain to deal with :\
A per-device max_pasids check should cover that, right?
The driver shouldn't be doing this though, if the driver is told to make a pasid then it should make a pasid.. The driver can fail attaching a pasid to a device that is over the device's max_pasid.
FYI. One example of such a device is some of the AMD GPUs, which have both VGA and audio in the same group. While VGA supports PASID, audio does not. This used to work fine when we had the AMD IOMMU PASID-specific driver. GPUs stopped using PASIDs in the upstream kernel, so I didn't look into this part in detail.
Uhhh.. That sounds like a worse problem. The only way you should end up with the same group is if the ACS flags are missing on the GPU, so Linux assumes the VGA and audio can loop back to each other internally.
That should completely block PASID support on the GPU side due to the wrong routing. We can't have a hole in the PASID address space where the audio BAR is.
I suppose the HW doesn't actually behave this way but since it doesn't have the right ACS flags the SW doesn't know? Guessing..
Jason
Jason,
On 4/24/2025 6:01 PM, Jason Gunthorpe wrote:
On Thu, Apr 24, 2025 at 12:08:56PM +0530, Vasant Hegde wrote:
What should the iommu driver do when set_dev_pasid is called for a non-PASID device?
That's a good point, maybe the core code should filter that out based on max_pasids? I think we do run into trouble here because the drivers are allocating PASID table space based on max_pasids so the non-pasid device should fail to add the pasid. Tushar, you should have hit this in your testing???
We also have a problem setting up the default domain - it won't compute IOMMU_HWPT_ALLOC_PASID properly across the group. If the no-pasid device probes first then PASID will be broken on the group.
Tushar isn't hitting this because ARM always uses a PASID compatible domain today, but it will not work on AMD.
That's a huge pain to deal with :\
Agree. That will complicate things.
Just to be clear, I gave some of the AMD GPUs as an example of a group that contains both PASID and non-PASID devices. Currently AMDGPU is not using PASID, and I am not looking to support SVA for amdgpu with such configs.
A per-device max_pasids check should cover that, right?
The driver shouldn't be doing this though, if the driver is told to make a pasid then it should make a pasid.. The driver can fail attaching a pasid to a device that is over the device's max_pasid.
FYI. One example of such a device is some of the AMD GPUs, which have both VGA and audio in the same group. While VGA supports PASID, audio does not. This used to work fine when we had the AMD IOMMU PASID-specific driver. GPUs stopped using PASIDs in the upstream kernel, so I didn't look into this part in detail.
Uhhh.. That sounds like a worse problem. The only way you should end up with the same group is if the ACS flags are missing on the GPU, so Linux assumes the VGA and audio can loop back to each other internally.
That should completely block PASID support on the GPU side due to the wrong routing. We can't have a hole in the PASID address space where the audio BAR is.
I suppose the HW doesn't actually behave this way but since it doesn't have the right ACS flags the SW doesn't know? Guessing..
Honestly I have no idea. Since they had stopped using PASID support, I never dug into the details!
-Vasant
On 4/24/25 05:31, Jason Gunthorpe wrote:
On Thu, Apr 24, 2025 at 12:08:56PM +0530, Vasant Hegde wrote:
What should the iommu driver do when set_dev_pasid is called for a non-PASID device?
That's a good point, maybe the core code should filter that out based on max_pasids? I think we do run into trouble here because the drivers are allocating PASID table space based on max_pasids so the non-pasid device should fail to add the pasid. Tushar, you should have hit this in your testing???
When we have a multi-device group with a PASID device and non-PASID devices, set_dev_pasid doesn't fail for the non-PASID devices in my testing.
Here is the example topology and a bit more detail:
0008:00:00.0 root_port
└─0008:01:00.0 upstream_port
  ├─0008:02:00.0 downstream_port
  │ └─0008:03:00.0 endpoint (NIC DMA-PF)
  └─0008:02:03.0 downstream_port
    └─0008:04:00.0 upstream_port
      └─0008:05:00.0 downstream_port
        └─0008:06:00.0 endpoint (GPU)
In the above topology, we set up ACS flags on DSPs 0008:02:03.0 and 0008:02:00.0 to achieve the desired p2p configuration for the GPU and DMA-PF. Apparently, this creates a multi-device group with the GPU being the only device with PASID support in that group. In this case, the set_dev_pasid() op is invoked for each device within the group with pasid=1 and doesn't fail.
e.g.
...
..
.
pcieport 0008:02:03.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:02:03.0: debug: __iommu_set_group_pasid(): ret 0
pcieport 0008:04:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:04:00.0: debug: __iommu_set_group_pasid(): ret 0
pcieport 0008:05:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:05:00.0: debug: __iommu_set_group_pasid(): ret 0
nvidia 0008:06:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=1048576 iommu_group 30
nvidia 0008:06:00.0: debug: __iommu_set_group_pasid(): ret 0
IMO this outcome is expected. Quoting a text from commit https://github.com/torvalds/linux/commit/16603704559c7a68718059c4f75287886c0...
"If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table."
-Tushar
We also have a problem setting up the default domain - it won't compute IOMMU_HWPT_ALLOC_PASID properly across the group. If the no-pasid device probes first then PASID will be broken on the group.
Tushar isn't hitting this because ARM always uses a PASID compatible domain today, but it will not work on AMD.
That's a huge pain to deal with :\
A per-device max_pasids check should cover that, right?
The driver shouldn't be doing this though, if the driver is told to make a pasid then it should make a pasid.. The driver can fail attaching a pasid to a device that is over the device's max_pasid.
FYI. One example of such a device is some of the AMD GPUs, which have both VGA and audio in the same group. While VGA supports PASID, audio does not. This used to work fine when we had the AMD IOMMU PASID-specific driver. GPUs stopped using PASIDs in the upstream kernel, so I didn't look into this part in detail.
Uhhh.. That sounds like a worse problem. The only way you should end up with the same group is if the ACS flags are missing on the GPU, so Linux assumes the VGA and audio can loop back to each other internally.
That should completely block PASID support on the GPU side due to the wrong routing. We can't have a hole in the PASID address space where the audio BAR is.
I suppose the HW doesn't actually behave this way but since it doesn't have the right ACS flags the SW doesn't know? Guessing..
Jason
On Thu, Apr 24, 2025 at 05:49:20PM -0700, Tushar Dave wrote:
In the above topology, we set up ACS flags on DSPs 0008:02:03.0 and 0008:02:00.0 to achieve the desired p2p configuration for the GPU and DMA-PF. Apparently, this creates a multi-device group with the GPU being the only device with PASID support in that group. In this case, the set_dev_pasid() op is invoked for each device within the group with pasid=1 and doesn't fail.
Hurm, it doesn't fail, but it corrupts memory in the driver :\
int arm_smmu_set_pasid(struct arm_smmu_master *master,
		       struct arm_smmu_domain *smmu_domain, ioasid_t pasid,
		       struct arm_smmu_cd *cd, struct iommu_domain *old)
{
	struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev);
	struct arm_smmu_attach_state state = {
		.master = master,
		.ssid = pasid,
		.old_domain = old,
	};
	struct arm_smmu_cd *cdptr;
	int ret;

	/* The core code validates pasid */
	   ^^^^^^^^^^
Which is not true after this patch.
The core code may not call the driver's set_pasid() function with a PASID larger than that specific device's device->dev->iommu->max_pasids
Jason
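The memory-corruption concern can be illustrated with a toy table that, as described earlier in the thread, is sized from max_pasids. This is a sketch under loud assumptions: `struct model_cd_table`, `model_table_init()`, `model_get_cd_ptr()` and `model_table_free()` are hypothetical names, and the flat array is a simplification that does not reproduce the real arm-smmu-v3 context-descriptor layout. It only shows why an unvalidated PASID indexes past the end of a table allocated for a device with `max_pasids == 0`, and what a bounds check at lookup time would return instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat PASID table; real drivers use different layouts. */
struct model_cd_table {
	uint32_t num_entries;	/* sized from the device's max_pasids */
	uint64_t *entries;
};

/* Allocate one slot per supported PASID; a no-PASID device gets none. */
static int model_table_init(struct model_cd_table *t, uint32_t max_pasids)
{
	assert(t != NULL);
	t->num_entries = max_pasids;
	t->entries = NULL;
	if (max_pasids == 0)
		return 0;
	t->entries = calloc(max_pasids, sizeof(*t->entries));
	return t->entries ? 0 : -ENOMEM;
}

/*
 * Bounds-checked lookup. Without the range check, pasid=1 on a table
 * with zero entries would touch memory outside the allocation.
 */
static uint64_t *model_get_cd_ptr(struct model_cd_table *t, uint32_t pasid)
{
	assert(t != NULL);
	if (pasid >= t->num_entries)
		return NULL;
	return &t->entries[pasid];
}

static void model_table_free(struct model_cd_table *t)
{
	assert(t != NULL);
	free(t->entries);
	t->entries = NULL;
	t->num_entries = 0;
}
```

In this model, a lookup of pasid 1 on a table built with `max_pasids == 0` returns NULL, while the same lookup on a table built with a non-zero `max_pasids` above 1 returns a valid slot. A driver that relies on the core code for validation has no such check, which is why the core must filter the call.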
On 4/25/25 05:00, Jason Gunthorpe wrote:
On Thu, Apr 24, 2025 at 05:49:20PM -0700, Tushar Dave wrote:
In the above topology, we set up ACS flags on DSPs 0008:02:03.0 and 0008:02:00.0 to achieve the desired p2p configuration for the GPU and DMA-PF. Apparently, this creates a multi-device group with the GPU being the only device with PASID support in that group. In this case, the set_dev_pasid() op is invoked for each device within the group with pasid=1 and doesn't fail.
Hurm, it doesn't fail, but it corrupts memory in the driver :\
int arm_smmu_set_pasid(struct arm_smmu_master *master,
		       struct arm_smmu_domain *smmu_domain, ioasid_t pasid,
		       struct arm_smmu_cd *cd, struct iommu_domain *old)
{
	struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev);
	struct arm_smmu_attach_state state = {
		.master = master,
		.ssid = pasid,
		.old_domain = old,
	};
	struct arm_smmu_cd *cdptr;
	int ret;

	/* The core code validates pasid */
	   ^^^^^^^^^^
Which is not true after this patch.
The core code may not call the driver's set_pasid() function with a PASID larger than that specific device's device->dev->iommu->max_pasids
Yup. And I should add a similar check (i.e. max_pasids > 0) before invoking set_dev_pasid and remove_dev_pasid (as Kevin already suggested earlier). I can do that in v2.
-Tushar
Jason
linux-stable-mirror@lists.linaro.org