On 2024/3/7 6:10, Mina Almasry wrote:
...
+static int netdev_restart_rx_queue(struct net_device *dev, int rxq_idx) +{
void *new_mem;
void *old_mem;
int err;
if (!dev || !dev->netdev_ops)
return -EINVAL;
if (!dev->netdev_ops->ndo_queue_stop ||
!dev->netdev_ops->ndo_queue_mem_free ||
!dev->netdev_ops->ndo_queue_mem_alloc ||
!dev->netdev_ops->ndo_queue_start)
return -EOPNOTSUPP;
new_mem = dev->netdev_ops->ndo_queue_mem_alloc(dev, rxq_idx);
if (!new_mem)
return -ENOMEM;
err = dev->netdev_ops->ndo_queue_stop(dev, rxq_idx, &old_mem);
if (err)
goto err_free_new_mem;
err = dev->netdev_ops->ndo_queue_start(dev, rxq_idx, new_mem);
if (err)
goto err_start_queue;
dev->netdev_ops->ndo_queue_mem_free(dev, old_mem);
return 0;
+err_start_queue:
dev->netdev_ops->ndo_queue_start(dev, rxq_idx, old_mem);
It might worth mentioning why queue start with old_mem will always success here as the return value seems to be ignored here.
So the old queue, we stopped it, and if we fail to bring up the new queue, then we want to start the old queue back up to get the queue back to a workable state.
I don't see what we can do to recover if restarting the old queue fails. Seems like it should be a requirement that the driver tries as much as possible to keep the old queue restartable.
Is it possible that we may have the 'old_mem' leaking if the driver fails to restart the old queue? how does the driver handle the firmware cmd failure for ndo_queue_start()? it seems a little tricky to implement it.
I'm not sure what we can do to meaningfully recover from failure to restarting the old queue, except log it so the error is visible. In theory because we have not modifying any queue configurations restarting it would be straight forward, but since it's dealing with hardware then any failures are possible.
Yes, we may need to have a clear semantics of how should the driver implement the above interface, for example if the driver should free the memory when fail to start a queue or the driver should restart the queue when fail to stop a queue? Otherwise we may have different driver implementing different behavior.
From the disscusion you mentioned below, does it make senses to modeling rdma subsystem by using create_queue/modify_queue/destroy_queue semantics instead?
I can improve this by at least logging or warning if restarting the old queue fails.
Also the semantics of the above function seems odd that it is not only restarting rx queue, but also freeing and allocating memory despite the name only suggests 'restart', I am a litte afraid that it may conflict with future usecae when user only need the 'restart' part, perhaps rename it to a more appropriate name.
Oh, what we want here is just the 'restart' part. However, Jakub mandates that if you restart a queue (or a driver), you do it like this, hence the slightly more complicated implementation.
https://patchwork.kernel.org/project/netdevbpf/patch/20231106024413.2801438-... https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
Thanks for the link.
I like david's idea of "a more generic design where H/W queues are created and destroyed - e.g., queues unique to a process which makes the cleanup so much easier." , but it seems it is a lot of work for networking to implement that for now.