On Fri, Nov 10, 2023 at 3:13 PM Jakub Kicinski <kuba@kernel.org> wrote:
My brain is slightly fried after trying to catch up on the thread for close to 2h. So forgive me if I'm missing something. This applies to all emails I'm about to send :)
On Sun, 5 Nov 2023 18:44:11 -0800 Mina Almasry wrote:
trigger_device_reset();
The user space must not be responsible for the reset. We can add some temporary "recreate page pools" ndo until the queue API is ready.
Thanks for the clear requirement. I clearly had something different in mind.
These might be dumb suggestions, but instead of creating a new ndo that we may end up wanting to deprecate once the queue API is ready, how about we use either of these existing APIs?
+void netdev_reset(struct net_device *dev)
+{
+	int flags = ETH_RESET_ALL;
+	int err;
+
+#if 1
+	__dev_close(dev);
+	err = __dev_open(dev, NULL);
+#else
+	err = dev->ethtool_ops->reset(dev, &flags);
+#endif
+}
+
I've tested that both of these work with GVE, on both bind (via the netlink API) and unbind (via the netlink socket close), but I'm not enough of an expert to tell whether either one has some bad side effect.
But it should not be visible to the user in any way.
And then the kernel can issue the same reset when the netlink socket dies to flush device free lists.
Sure thing, I can do that.
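To make the intended flow concrete, here is a rough userspace sketch of "reset on bind, and the same reset when the netlink socket dies". It is not kernel code: every mock_* name is made up for illustration, and the page pool counter only stands in for "device free lists were flushed and pools recreated".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; NOT the kernel type. */
struct mock_netdev {
	bool up;
	int page_pool_generation;	/* bumped whenever pools are recreated */
};

/* Hypothetical stand-in for a binding owned by a netlink socket. */
struct mock_binding {
	struct mock_netdev *dev;
	bool active;
};

static void mock_dev_close(struct mock_netdev *dev)
{
	dev->up = false;
}

static int mock_dev_open(struct mock_netdev *dev)
{
	/* Assumption: reopening the device recreates its page pools. */
	dev->page_pool_generation++;
	dev->up = true;
	return 0;
}

/* Kernel-issued reset, modelled as close followed by open. */
static int mock_netdev_reset(struct mock_netdev *dev)
{
	mock_dev_close(dev);
	return mock_dev_open(dev);
}

/* Bind path: the kernel resets the device, user space never does. */
static int mock_bind(struct mock_binding *b, struct mock_netdev *dev)
{
	b->dev = dev;
	b->active = true;
	return mock_netdev_reset(dev);
}

/* Socket release path: the same reset, issued once per binding. */
static int mock_socket_release(struct mock_binding *b)
{
	if (!b->active)
		return 0;
	b->active = false;
	return mock_netdev_reset(b->dev);
}
```

The only point of the sketch is that both paths funnel into one kernel-side helper, so nothing about the reset is exposed to user space.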
Maybe we should also add an "allow device/all-queues reload" flag to the netlink API to differentiate drivers which can't implement full queue API later on. We want to make sure the defaults work well in our "target design", rather than at the first stage. And target design will reload queues one by one.
I can add a flag, yes.
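One possible reading of how such a flag could be consulted, as a userspace sketch only. The flag name, the capability struct and the helper are all hypothetical and do not exist in the netlink API today; the default (flag not set) follows the target design of reloading queues one by one.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bind flag: caller accepts a device/all-queues reload. */
#define MOCK_BIND_F_ALLOW_FULL_RELOAD (1u << 0)

enum mock_reload_action {
	MOCK_RELOAD_QUEUE,	/* restart only the bound queue */
	MOCK_RELOAD_DEVICE,	/* reload the device and all its queues */
};

/* Hypothetical driver capability: does it implement the queue API? */
struct mock_driver_caps {
	bool per_queue_restart;
};

/*
 * Pick the reload strategy for a bind request.
 * Returns 0 and sets *action on success, -EOPNOTSUPP if the driver
 * can only do a full reload and the caller did not allow one.
 */
static int mock_pick_reload(const struct mock_driver_caps *caps,
			    unsigned int flags,
			    enum mock_reload_action *action)
{
	if (caps->per_queue_restart) {
		*action = MOCK_RELOAD_QUEUE;
		return 0;
	}
	if (flags & MOCK_BIND_F_ALLOW_FULL_RELOAD) {
		*action = MOCK_RELOAD_DEVICE;
		return 0;
	}
	return -EOPNOTSUPP;
}
```

With this shape, a driver that later gains the queue API needs no user-visible change: the flag simply stops mattering for it.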