[PATCH AUTOSEL 6.16-5.4] RDMA: hfi1: fix possible divide-by-zero in find_hw_thread_mask()

5 Aug 2025

From: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
[ Upstream commit 59f7d2138591ef8f0e4e4ab5f1ab674e8181ad3a ]

The function divides the number of online CPUs by num_core_siblings, and
only later checks the divisor against zero. This leaves open the
possibility of a divide-by-zero runtime error. Fix it by moving the check
before the division. This also saves one indentation level.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Link: https://patch.msgid.link/20250604193947.11834-3-yury.norov@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
**Backport Status: YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Bug Fix Analysis
The commit fixes a **genuine divide-by-zero bug** in the
`find_hw_thread_mask()` function. The code changes show:
1. **Original bug**: The division `num_cores_per_socket =
   node_affinity.num_online_cpus / affinity->num_core_siblings /
   node_affinity.num_online_nodes` occurs at lines 967-969 BEFORE
   checking if `num_core_siblings > 0` at line 972.
2. **The fix**: Moves the check `if (affinity->num_core_siblings == 0)
   return;` to line 973-974 (in the new code) BEFORE the division
   operation, preventing the divide-by-zero.
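
The reordering described above can be modelled in userspace. This is a minimal sketch, not the driver's code: `cores_per_socket()` and its parameters are hypothetical stand-ins for the fields of `node_affinity` and `affinity`, and it assumes the node count is nonzero, as the patch itself does.

```c
#include <stdio.h>

/*
 * Hypothetical userspace model of the fixed ordering: test the divisor
 * first and divide only afterwards. The names are illustrative only.
 * Assumes online_nodes != 0, which the patch does not check either.
 * Returns 0 on success, -1 when siblings is zero.
 */
static int cores_per_socket(unsigned int online_cpus, unsigned int siblings,
			    unsigned int online_nodes, unsigned int *out)
{
	if (siblings == 0)
		return -1;	/* bail out before any division happens */

	*out = online_cpus / siblings / online_nodes;
	return 0;
}
```

Calling `cores_per_socket(16, 2, 2, &n)` stores 4 in `n`, while a zero `siblings` argument returns -1 without dividing.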
## When the Bug Can Trigger
The `num_core_siblings` value is initialized as:
```c
cpumask_weight(topology_sibling_cpumask(cpumask_first(&node_affinity.proc.mask)))
```
This is 0 only when the sibling mask of that CPU is empty, which might
happen in scenarios such as:
- Single-core systems without SMT/hyperthreading
- Systems where SMT is disabled at runtime
- Virtualized environments with unusual CPU topology
- Certain ARM or other architectures where topology_sibling_cpumask()
  returns empty
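
`cpumask_weight()` returns the number of set bits in a mask, so the expression is 0 exactly when the sibling mask has no bits set. A rough userspace sketch of that counting follows; a single 64-bit word stands in for the kernel's `struct cpumask`, and `mask_weight()` is a hypothetical helper, not a kernel API.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for cpumask_weight(): count the set bits of a
 * 64-bit word that models a CPU mask (bit n set means CPU n is present).
 */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int weight = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}
```

In this model a mask holding only the CPU itself has weight 1, and only an all-zero mask has weight 0.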
## Stable Kernel Criteria Met
1. **Fixes a real bug**: ✓ - Prevents kernel divide-by-zero crash
2. **Small and contained**: ✓ - 24 insertions and 20 deletions, all in
   one function
3. **No side effects**: ✓ - Early return preserves existing behavior
   when num_core_siblings==0
4. **No architectural changes**: ✓ - Simple defensive programming fix
5. **Clear bug fix**: ✓ - Not a feature or optimization
6. **Low regression risk**: ✓ - Only moves the existing zero check ahead
   of the division; the mask computation itself is unchanged
## Impact Assessment
- **Severity**: Medium-High - Can cause kernel panic on affected systems
- **Affected systems**: HFI1 InfiniBand hardware on systems with
  specific CPU configurations
- **User impact**: System crash when loading HFI1 driver on vulnerable
  configurations
The commit message clearly states "fix possible divide-by-zero" and the
code change unambiguously moves a zero-check before a division operation
that uses that value as divisor. This is a textbook example of a bug fix
that should be backported to stable kernels to prevent crashes on
systems with certain CPU topologies.

 drivers/infiniband/hw/hfi1/affinity.c | 44 +++++++++++++++------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 7ead8746b79b..f2c530ab85a5 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -964,31 +964,35 @@ static void find_hw_thread_mask(uint hw_thread_no, cpumask_var_t hw_thread_mask,
 				struct hfi1_affinity_node_list *affinity)
 {
 	int possible, curr_cpu, i;
-	uint num_cores_per_socket = node_affinity.num_online_cpus /
+	uint num_cores_per_socket;
+
+	cpumask_copy(hw_thread_mask, &affinity->proc.mask);
+
+	if (affinity->num_core_siblings == 0)
+		return;
+
+	num_cores_per_socket = node_affinity.num_online_cpus /
 					affinity->num_core_siblings /
 						node_affinity.num_online_nodes;
 
-	cpumask_copy(hw_thread_mask, &affinity->proc.mask);
-	if (affinity->num_core_siblings > 0) {
-		/* Removing other siblings not needed for now */
-		possible = cpumask_weight(hw_thread_mask);
-		curr_cpu = cpumask_first(hw_thread_mask);
-		for (i = 0;
-		     i < num_cores_per_socket * node_affinity.num_online_nodes;
-		     i++)
-			curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
-
-		for (; i < possible; i++) {
-			cpumask_clear_cpu(curr_cpu, hw_thread_mask);
-			curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
-		}
+	/* Removing other siblings not needed for now */
+	possible = cpumask_weight(hw_thread_mask);
+	curr_cpu = cpumask_first(hw_thread_mask);
+	for (i = 0;
+	     i < num_cores_per_socket * node_affinity.num_online_nodes;
+	     i++)
+		curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
 
-		/* Identifying correct HW threads within physical cores */
-		cpumask_shift_left(hw_thread_mask, hw_thread_mask,
-				   num_cores_per_socket *
-				   node_affinity.num_online_nodes *
-				   hw_thread_no);
+	for (; i < possible; i++) {
+		cpumask_clear_cpu(curr_cpu, hw_thread_mask);
+		curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
 	}
+
+	/* Identifying correct HW threads within physical cores */
+	cpumask_shift_left(hw_thread_mask, hw_thread_mask,
+			   num_cores_per_socket *
+			   node_affinity.num_online_nodes *
+			   hw_thread_no);
 }
 
 int hfi1_get_proc_affinity(int node)
-- 
2.39.5



    
