From: Artem Sadovnikov <a.sadovnikov@ispras.ru>
[ Upstream commit b3060198483bac43ec113c62ae3837076f61f5de ]
MLX cap pg_track_log_max_msg_size consists of 5 bits, value of which is used as power of 2 for max_msg_size. This can lead to multiplication overflow between max_msg_size (u32) and integer constant, and afterwards incorrect value is being written to rq_size.
Fix this issue by extending integer constant to u64 type.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20250701144017.2410-2-a.sadovnikov@ispras.ru
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**YES**
This commit should be backported to stable kernel trees for the following reasons:
## Critical Integer Overflow Bug Fix
1. **Clear Bug with Security Implications**: The commit fixes a critical integer overflow bug in the VFIO MLX5 driver. When `log_max_msg_size` reaches its maximum value (31, as it's a 5-bit field), the calculation `4 * max_msg_size` overflows:
   - `max_msg_size = (1ULL << 31) = 2147483648 (0x80000000)`
   - `4 * max_msg_size` is evaluated in 32-bit arithmetic (u32 * int) and wraps to 0; the same wrap already occurs at a value of 30, since 4 * 2^30 = 2^32
   - The check then compares `rq_size` against the wrapped value, so `rq_size` can be left smaller than the four messages the RQ must hold
2. **Introduced by Recent Code**: The vulnerable code was introduced in commit 9c7c5430bca36 ("vfio/mlx5: Align the page tracking max message size with the device capability") from December 2024, which itself has a Fixes tag for commit 79c3cf279926 from 2022. This indicates the affected code exists in stable kernels.
3. **Simple and Contained Fix**: The fix is minimal: it changes `4` to `4ULL` in two places to ensure 64-bit arithmetic:
   ```c
   -	if (rq_size < 4 * max_msg_size)
   -		rq_size = 4 * max_msg_size;
   +	if (rq_size < 4ULL * max_msg_size)
   +		rq_size = 4ULL * max_msg_size;
   ```
4. **Affects User-Visible Functionality**: This bug affects VFIO device passthrough for MLX5 devices, specifically the page tracking functionality used for live migration. An incorrect `rq_size` could lead to:
   - Failed QP (Queue Pair) creation
   - Memory corruption
   - System instability during VM migration
5. **Low Risk of Regression**: The change only affects the calculation precision by ensuring 64-bit arithmetic. It doesn't change any logic, APIs, or data structures.
6. **Found by Verification Tools**: The bug was discovered by the Linux Verification Center using SVACE static analysis. That confirms the arithmetic defect exists in the code; it does not by itself show that any device reports a capability value large enough to trigger it.
7. **Reviewed by Maintainers**: The fix was suggested by the VFIO maintainer (Alex Williamson) and reviewed by the MLX5 driver maintainer (Yishai Hadas), showing consensus on the solution.
The commit meets all criteria for stable backporting: it fixes a real bug that affects users, the fix is small and contained, there's minimal risk of regression, and it addresses a potential security/stability issue in virtualization infrastructure.
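The wrap described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not driver code: the helper names `min_rq_size_narrow` and `min_rq_size_wide` are hypothetical, `uint32_t`/`uint64_t` stand in for the kernel's `u32`/`u64`, and it assumes a platform where `int` is 32 bits wide.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the pre-fix expression `4 * max_msg_size`.
 * max_msg_size is 32-bit unsigned, so the int constant 4 is converted to
 * a 32-bit unsigned value and the product is reduced modulo 2^32 before
 * it is widened to 64 bits for the return value.
 */
static uint64_t min_rq_size_narrow(unsigned int log_max_msg_size)
{
	uint32_t max_msg_size = (uint32_t)(1ULL << log_max_msg_size);

	return 4 * max_msg_size;
}

/*
 * Hypothetical helper mirroring the post-fix expression
 * `4ULL * max_msg_size`. The 4ULL constant promotes max_msg_size to
 * 64 bits first, so the multiplication cannot wrap for a 5-bit exponent.
 */
static uint64_t min_rq_size_wide(unsigned int log_max_msg_size)
{
	uint32_t max_msg_size = (uint32_t)(1ULL << log_max_msg_size);

	return 4ULL * max_msg_size;
}
```

Under these assumptions both helpers agree for exponents up to 29. For 30 and 31 the narrow version returns 0, while the wide version returns 4294967296 and 8589934592 respectively.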
 drivers/vfio/pci/mlx5/cmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 5b919a0b2524..a92b095b90f6 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -1523,8 +1523,8 @@ int mlx5vf_start_page_tracker(struct vfio_device *vdev,
 	log_max_msg_size = MLX5_CAP_ADV_VIRTUALIZATION(mdev, pg_track_log_max_msg_size);
 	max_msg_size = (1ULL << log_max_msg_size);
 	/* The RQ must hold at least 4 WQEs/messages for successful QP creation */
-	if (rq_size < 4 * max_msg_size)
-		rq_size = 4 * max_msg_size;
+	if (rq_size < 4ULL * max_msg_size)
+		rq_size = 4ULL * max_msg_size;
 
 	memset(tracker, 0, sizeof(*tracker));
 	tracker->uar = mlx5_get_uars_page(mdev);