On Fri, 10 Nov 2023 18:27:08 -0800 Mina Almasry wrote:
> Thanks for the clear requirement. I clearly had something different in mind.
>
> Might be dumb suggestions, but instead of creating a new ndo that we
> maybe end up wanting to deprecate once the queue API is ready, how
> about we use either of those existing APIs?
>
> +void netdev_reset(struct net_device *dev)
> +{
> + int flags = ETH_RESET_ALL;
> + int err;
> +
> +#if 1
> + __dev_close(dev);
> + err = __dev_open(dev, NULL);
> +#else
> + err = dev->ethtool_ops->reset(dev, &flags);
> +#endif
> +}
> +
>
> I've tested both of these and they work with GVE, both for bind via the
> netlink API and unbind via netlink socket close, but I'm not enough of
> an expert to tell if there is some bad side effect that can happen.
We generally don't accept drivers doing device reconfiguration with a
full close() + open(), because if the open() fails your machine may be
cut off from the network.
There are drivers which do it, but they are either old... or weren't
reviewed hard enough.
The driver should allocate memory and whatever else it can without
stopping the queues first. Once it has all of that, stop the queues,
reconfigure with the already-allocated resources, start the queues, and
free the old resources.
Even without the queue API in place, good drivers do full device
reconfig this way. Hence my mind goes towards a new (temporary?)
ndo. It will be replaced by the queue API, but whoever implements
it for now has to follow this careful reconfig strategy...
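To make the ordering concrete, here is a minimal sketch of that pattern
(everything named dummy_* is made up for illustration, this is not an
existing API):

/* Careful reconfig: pre-allocate, then stop/swap/start, free old last.
 * All dummy_* helpers and types are hypothetical.
 */
static int dummy_reconfig_rx(struct net_device *dev)
{
	struct dummy_rings *new_rings, *old_rings;
	int err;

	/* 1) Allocate all new resources while traffic still flows. */
	new_rings = dummy_alloc_rings(dev);
	if (!new_rings)
		return -ENOMEM;

	/* 2) Only once everything is ready, stop the queues... */
	dummy_stop_queues(dev);

	/* 3) ...swap in the pre-allocated resources and restart... */
	old_rings = dummy_swap_rings(dev, new_rings);
	err = dummy_start_queues(dev);

	/* 4) ...and free the old resources last. */
	dummy_free_rings(old_rings);

	return err;
}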
On Sun, 5 Nov 2023 18:44:07 -0800 Mina Almasry wrote:
> #include <net/net_debug.h>
> #include <net/dropreason-core.h>
> +#include <net/page_pool/types.h>
> +#include <net/page_pool/helpers.h>
> /**
> * DOC: skb checksums
> @@ -3402,15 +3404,38 @@ static inline void skb_frag_off_copy(skb_frag_t *fragto,
> fragto->bv_offset = fragfrom->bv_offset;
> }
>
> +/* Returns true if the skb_frag contains a page_pool_iov. */
> +static inline bool skb_frag_is_page_pool_iov(const skb_frag_t *frag)
> +{
> + return page_is_page_pool_iov(frag->bv_page);
> +}
Maybe we can create a new header for skb + page pool helpers?
skbuff.h is included by a quarter of the kernel objects; we should not
be adding dependencies to this header, as it really slows down
incremental builds.
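E.g. the helper above could live in a small dedicated header instead of
skbuff.h, roughly like this (the file name is just a suggestion):

/* include/net/skb_page_pool.h -- hypothetical new header, name illustrative */
#ifndef _NET_SKB_PAGE_POOL_H
#define _NET_SKB_PAGE_POOL_H

#include <linux/skbuff.h>
#include <net/page_pool/helpers.h>

/* Returns true if the skb_frag is backed by a page_pool_iov. */
static inline bool skb_frag_is_page_pool_iov(const skb_frag_t *frag)
{
	return page_is_page_pool_iov(frag->bv_page);
}

#endif /* _NET_SKB_PAGE_POOL_H */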
On Tue, 7 Nov 2023 11:53:22 -0800 Mina Almasry wrote:
> My bad on not including some docs about this. The next version should
> have the commit message beefed up to explain all this, or a docs
> patch.
Yes, please. It would be great to have the user-facing interface well
explained under Documentation/.
On Sun, 5 Nov 2023 18:44:09 -0800 Mina Almasry wrote:
> + if (!skb_frags_not_readable(skb)) {
nit: maybe call the helper skb_frags_readable() and flip the polarity,
to avoid double negatives.
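For illustration, the call sites then read without the double negative
(sketch only; how the state is actually tracked doesn't matter here):

	if (skb_frags_readable(skb)) {
		/* all frags are plain host pages, normal copy paths apply */
	} else {
		/* devmem frags, the CPU must not touch the payload */
	}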
On Sun, 5 Nov 2023 18:44:01 -0800 Mina Almasry wrote:
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 6fc5134095ed..d4bea053bb7e 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -60,6 +60,8 @@ struct page_pool_params {
> int nid;
> struct device *dev;
> struct napi_struct *napi;
> + u8 memory_provider;
> + void *mp_priv;
> enum dma_data_direction dma_dir;
> unsigned int max_len;
> unsigned int offset;
You should rebase on top of net-next.
More importantly, I was expecting those fields to be gone from params.
The fact that the page pool is configured with a specific provider
should be fully transparent to the driver: the driver should just tell
the core which queue it's creating the pool for, and if there's a
dmabuf bound to that queue - out pops a page pool backed by the dmabuf.
My RFC had the page pool params fields here only as a hack.
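Put differently, the driver-visible part could stay close to what
page_pool_create() takes today, with only the queue identity added; the
provider hookup would then happen inside the core. A rough sketch
(struct dummy_rx_queue and the queue_idx field are hypothetical):

/* Hypothetical driver-side view; the core would notice that a dmabuf is
 * bound to this queue and attach the devmem provider behind the scenes.
 */
static struct page_pool *dummy_create_rx_pool(struct dummy_rx_queue *rxq)
{
	struct page_pool_params pp_params = {
		.dev		= rxq->dev,
		.napi		= &rxq->napi,
		.queue_idx	= rxq->index,	/* hypothetical field */
	};

	return page_pool_create(&pp_params);
}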
On Tue, 7 Nov 2023 14:23:20 -0800 Stanislav Fomichev wrote:
> Can we mark a socket as devmem-only? Do we have any use-case for those
> hybrid setups? Or, let me put it that way: do we expect API callers
> to handle both linear and non-linear cases correctly?
> As a consumer of the previous versions of these apis internally,
> I find all those corner cases confusing :-( Hence trying to understand
> whether we can make it a bit more rigid and properly defined upstream.
FWIW I'd also prefer to allow mixing. "Some NICs" can decide header-data
split (HDS) very flexibly, including landing full jumbo frames into the
"headers". There's no sender API today to signal how to mark the data
for selective landing, but if Mina already has the rx side written
to allow that...
On Sun, 5 Nov 2023 18:44:02 -0800 Mina Almasry wrote:
> + -
> + name: queues
> + doc: receive queues to bind the dma-buf to.
> + type: u32
> + multi-attr: true
I think that you should throw in the queue type.
I know you made the op imply RX:
> + -
> + name: bind-rx
but if we decide to create a separate "type" for some magic queue type
one day, we'll have to ponder how to extend this API.
IMHO a queue should always be identified by a <type, id> tuple.
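For illustration only, such a handle could look something like this as a
nested attribute set (the names are made up, not the actual uAPI):

/* Hypothetical nested "queue" attribute: always carry type + id. */
enum {
	HYPOTHETICAL_A_QUEUE_TYPE = 1,	/* u32: rx, tx, or a future type */
	HYPOTHETICAL_A_QUEUE_ID,	/* u32: index within that type */
};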
On Sun, 5 Nov 2023 18:44:05 -0800 Mina Almasry wrote:
> +static int mp_dmabuf_devmem_init(struct page_pool *pool)
> +{
> + struct netdev_dmabuf_binding *binding = pool->mp_priv;
> +
> + if (!binding)
> + return -EINVAL;
> +
> + if (pool->p.flags & PP_FLAG_DMA_MAP ||
> + pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> + return -EOPNOTSUPP;
This looks backwards; we should _force_ the driver to use the DMA
mapping built into the page pool APIs, to isolate the driver from how
the DMA address actually gets obtained. Right?
Maybe seeing the driver patches would illuminate things.
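Concretely, the condition above would roughly flip (sketch):

	/* Require the page pool's built-in DMA mapping instead of rejecting it. */
	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
		return -EOPNOTSUPP;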