From: Ankit Garg <nktgrg@google.com>
This series fixes a kernel panic in the GVE driver caused by out-of-bounds array access when the network stack provides an invalid TX queue index.
The issue impacts both GQI and DQO queue formats. For both cases, the driver is updated to validate the queue index and drop the packet if the index is out of range.
Ankit Garg (2):
  gve: drop packets on invalid queue indices in GQI TX path
  gve: drop packets on invalid queue indices in DQO TX path

 drivers/net/ethernet/google/gve/gve_tx.c     | 12 +++++++++---
 drivers/net/ethernet/google/gve/gve_tx_dqo.c |  9 ++++++++-
 2 files changed, 17 insertions(+), 4 deletions(-)
From: Ankit Garg <nktgrg@google.com>
The driver currently assumes that the skb queue mapping is within the range of configured TX queues. However, the stack may provide an index that exceeds the number of active queues.
In GQI format, an out-of-range index triggers a warning, but the driver still goes on to dereference the tx array, potentially causing a crash like the one below:
[ 6.700970] Call Trace:
[ 6.703576] ? __warn+0x94/0xe0
[ 6.706863] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.712223] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.716197] ? report_bug+0xb1/0xe0
[ 6.721195] ? do_error_trap+0x9e/0xd0
[ 6.725084] ? do_invalid_op+0x36/0x40
[ 6.730355] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.734353] ? invalid_op+0x14/0x20
[ 6.739372] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.743350] ? netif_skb_features+0xcf/0x2a0
[ 6.749137] dev_hard_start_xmit+0xd7/0x240
Change that behavior to log a warning and drop the packet.
Cc: stable@vger.kernel.org
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 97efc8d..30d1686 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -739,12 +739,18 @@ drop:
 netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gve_priv *priv = netdev_priv(dev);
+	u16 qid = skb_get_queue_mapping(skb);
 	struct gve_tx_ring *tx;
 	int nsegs;
 
-	WARN(skb_get_queue_mapping(skb) >= priv->tx_cfg.num_queues,
-	     "skb queue index out of range");
-	tx = &priv->tx[skb_get_queue_mapping(skb)];
+	if (unlikely(qid >= priv->tx_cfg.num_queues)) {
+		net_warn_ratelimited("%s: skb qid %d out of range, num tx queue %d. dropping packet",
+				     dev->name, qid, priv->tx_cfg.num_queues);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+
+	tx = &priv->tx[qid];
 	if (unlikely(gve_maybe_stop_tx(priv, tx, skb))) {
 		/* We need to ring the txq doorbell -- we have stopped the Tx
 		 * queue for want of resources, but prior calls to gve_tx()
From: Ankit Garg <nktgrg@google.com>
The driver currently assumes that the skb queue mapping is within the range of configured TX queues. However, the stack may provide an index that exceeds the number of active queues.
In DQO format, the driver performs no validation at all and dereferences the tx array directly, potentially causing a crash like the one below (the trace is from GQI format, but both formats handle an out-of-bounds queue index the same way):
[ 6.700970] Call Trace:
[ 6.703576] ? __warn+0x94/0xe0
[ 6.706863] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.712223] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.716197] ? report_bug+0xb1/0xe0
[ 6.721195] ? do_error_trap+0x9e/0xd0
[ 6.725084] ? do_invalid_op+0x36/0x40
[ 6.730355] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.734353] ? invalid_op+0x14/0x20
[ 6.739372] ? gve_tx+0xa9f/0xc30 [gve]
[ 6.743350] ? netif_skb_features+0xcf/0x2a0
[ 6.749137] dev_hard_start_xmit+0xd7/0x240
Change that behavior to log a warning and drop the packet.
Cc: stable@vger.kernel.org
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
---
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
index 40b89b3..8ebcc84 100644
--- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c
@@ -1045,9 +1045,16 @@ static void gve_xsk_reorder_queue_pop_dqo(struct gve_tx_ring *tx)
 netdev_tx_t gve_tx_dqo(struct sk_buff *skb, struct net_device *dev)
 {
 	struct gve_priv *priv = netdev_priv(dev);
+	u16 qid = skb_get_queue_mapping(skb);
 	struct gve_tx_ring *tx;
 
-	tx = &priv->tx[skb_get_queue_mapping(skb)];
+	if (unlikely(qid >= priv->tx_cfg.num_queues)) {
+		net_warn_ratelimited("%s: skb qid %d out of range, num tx queue %d. dropping packet",
+				     dev->name, qid, priv->tx_cfg.num_queues);
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+	tx = &priv->tx[qid];
 	if (unlikely(gve_try_tx_skb(priv, tx, skb) < 0)) {
 		/* We need to ring the txq doorbell -- we have stopped the Tx
 		 * queue for want of resources, but prior calls to gve_tx()
-- 
2.52.0.351.gbe84eed79e-goog
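For readers who want to see the shape of the change outside the kernel tree, the check that both patches add can be modeled in plain userspace C. This is only a sketch under stated assumptions: `struct nic`, `struct tx_ring`, `nic_xmit()` and the `dropped` counter are hypothetical stand-ins and are not gve types or functions. What it illustrates is the ordering: the index is validated before it is ever used to index the ring array, and an invalid index ends in a drop rather than a dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are the real gve types. */
enum tx_status { TX_SENT, TX_DROPPED };

struct tx_ring {
	unsigned long packets;	/* packets accepted by this ring */
};

struct nic {
	struct tx_ring *tx;	/* array of num_queues rings */
	uint16_t num_queues;	/* rings that are actually configured */
	unsigned long dropped;	/* packets rejected by the bounds check */
};

/*
 * Validate the queue index before it is used as an array index.
 * An out-of-range index drops the packet instead of touching nic->tx[qid].
 */
static enum tx_status nic_xmit(struct nic *nic, uint16_t qid)
{
	assert(nic != NULL && nic->tx != NULL);

	if (qid >= nic->num_queues) {
		nic->dropped++;		/* the driver would free the skb here */
		return TX_DROPPED;
	}

	nic->tx[qid].packets++;
	return TX_SENT;
}
```

In the patches themselves the drop path frees the skb with dev_kfree_skb_any() and returns NETDEV_TX_OK, since the packet has been consumed; the separate TX_DROPPED value in this model exists only to make the two outcomes easy to tell apart.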
On Mon, 5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> This series fixes a kernel panic in the GVE driver caused by out-of-bounds array access when the network stack provides an invalid TX queue index.
Do you know how? I seem to recall we had such issues due to bugs in the qdisc layer, most of which were fixed.
Fixing this at the source, if possible, would be far preferable to sprinkling this condition to all the drivers.
On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > This series fixes a kernel panic in the GVE driver caused by out-of-bounds array access when the network stack provides an invalid TX queue index.
> Do you know how? I seem to recall we had such issues due to bugs in the qdisc layer, most of which were fixed.
> Fixing this at the source, if possible, would be far preferable to sprinkling this condition to all the drivers.
That matches our observation: we have encountered this panic on older kernels (specifically Rocky Linux 8) but have not been able to reproduce it on recent upstream kernels.
Could you point us to the specific qdisc fixes you recall? We'd like to verify if the issue we are seeing on the older kernel is indeed one of those known/fixed bugs.
If it turns out this is fully resolved in the core network stack upstream, we can drop this patch for the mainline driver. However, if there is ambiguity, do you think there is value in keeping this check to prevent the driver from crashing on invalid input?
Thanks,
Ankit Garg
On Thu, 8 Jan 2026 07:35:59 -0800 Ankit Garg wrote:
> On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > > This series fixes a kernel panic in the GVE driver caused by out-of-bounds array access when the network stack provides an invalid TX queue index.
> > Do you know how? I seem to recall we had such issues due to bugs in the qdisc layer, most of which were fixed.
> > Fixing this at the source, if possible, would be far preferable to sprinkling this condition to all the drivers.
> That matches our observation: we have encountered this panic on older kernels (specifically Rocky Linux 8) but have not been able to reproduce it on recent upstream kernels.
> Could you point us to the specific qdisc fixes you recall? We'd like to verify if the issue we are seeing on the older kernel is indeed one of those known/fixed bugs.
Very old - ac5b70198adc25
> If it turns out this is fully resolved in the core network stack upstream, we can drop this patch for the mainline driver. However, if there is ambiguity, do you think there is value in keeping this check to prevent the driver from crashing on invalid input?
The API contract is that the stack does not send frames for queues which don't exist (>= real_num_tx_queues) down to the drivers. There's no ambiguity, IMO: if the stack sends such frames, it's a bug in the stack.
On Thu, Jan 8, 2026 at 4:36 PM Ankit Garg <nktgrg@google.com> wrote:
> On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
> > > This series fixes a kernel panic in the GVE driver caused by out-of-bounds array access when the network stack provides an invalid TX queue index.
> > Do you know how? I seem to recall we had such issues due to bugs in the qdisc layer, most of which were fixed.
> > Fixing this at the source, if possible, would be far preferable to sprinkling this condition to all the drivers.
> That matches our observation: we have encountered this panic on older kernels (specifically Rocky Linux 8) but have not been able to reproduce it on recent upstream kernels.
What is the kernel version used in Rocky Linux 8?
Note that the test against real_num_tx_queues is done before reaching the Qdisc layer.
It might help to give a stack trace of a panic.
> Could you point us to the specific qdisc fixes you recall? We'd like to verify if the issue we are seeing on the older kernel is indeed one of those known/fixed bugs.
> If it turns out this is fully resolved in the core network stack upstream, we can drop this patch for the mainline driver. However, if there is ambiguity, do you think there is value in keeping this check to prevent the driver from crashing on invalid input?
We already have many costly checks, and netdev_core_pick_tx() should already prevent such a panic.
> Thanks,
> Ankit Garg
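The contract described in the replies above, that a queue selection is checked against real_num_tx_queues before the frame reaches the Qdisc layer or the driver, can be pictured with a small userspace model. This is a sketch only: `struct model_dev`, `model_cap_txqueue()` and the `capped` counter are hypothetical names, and falling back to queue 0 on an invalid selection is an assumption made for the illustration rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net device; not the kernel's struct net_device. */
struct model_dev {
	uint16_t real_num_tx_queues;	/* queues the driver has made active */
	unsigned long capped;		/* how many selections had to be corrected */
};

/*
 * Clamp a selected TX queue index before any driver sees it.
 * Assumption for this sketch: an invalid selection falls back to queue 0.
 */
static uint16_t model_cap_txqueue(struct model_dev *dev, uint16_t queue_index)
{
	assert(dev != NULL && dev->real_num_tx_queues > 0);

	if (queue_index >= dev->real_num_tx_queues) {
		dev->capped++;
		return 0;
	}
	return queue_index;
}
```

If every selection passes through a clamp like this, a driver's transmit routine only ever sees indices below real_num_tx_queues, which is why the reviewers treat an out-of-range index as a bug in the stack rather than as input each driver must re-validate.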
linux-stable-mirror@lists.linaro.org