On Thu, 09 Nov 2023 12:05:37 +0100 Paolo Abeni wrote:
> > I suppose we just disagree on the elegance of the API.
>
> FWIW, I think sockopt +cmsg is the right API.
FWIW it's fine by me as well.
My brain is slightly fried after trying to catch up on the thread
for close to 2h. So forgive me if I'm missing something.
This applies to all emails I'm about to send :)
On Sun, 5 Nov 2023 18:44:11 -0800 Mina Almasry wrote:
> + trigger_device_reset();
The user space must not be responsible for the reset.
We can add some temporary "recreate page pools" ndo
until the queue API is ready.
But it should not be visible to the user in any way.
And then the kernel can issue the same reset when the netlink
socket dies to flush device free lists.
Maybe we should also add an "allow device/all-queues reload" flag
to the netlink API to differentiate drivers which can't implement
the full queue API later on. We want to make sure the defaults work
well in our "target design", rather than just at the first stage,
and the target design will reload queues one by one.
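For illustration, a minimal sketch of such a stopgap, assuming a
made-up ndo name and hypothetical mydrv_open()/mydrv_stop() helpers
(no such hook exists today):

#include <linux/netdevice.h>

static int mydrv_open(struct net_device *dev);   /* hypothetical */
static void mydrv_stop(struct net_device *dev);  /* hypothetical */

/* Hypothetical "recreate page pools" ndo: the simplest implementation
 * is a full down/up cycle, which destroys and reallocates every RX
 * page pool. The core, not user space, would invoke it, e.g. when the
 * binding's netlink socket dies. */
static int mydrv_ndo_pp_recreate(struct net_device *dev)
{
        mydrv_stop(dev);
        return mydrv_open(dev);
}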
Am 09.11.23 um 04:20 schrieb Mina Almasry:
> [SNIP]
> >> But we might be able to do something like what folio is doing now: the mm
> >> subsystem still sees 'struct folio/page', but other subsystems like slab use
> >> 'struct slab', and there are still some common fields shared between
> >> 'struct folio' and 'struct slab'.
>>
> In my eyes this is almost exactly what I suggested in RFC v1 and got
> immediately nacked with no room to negotiate. What we did for v1 was to
> allocate struct pages for dma-bufs to make them look like struct pages
> to the mm subsystem. Almost exactly what you're describing above.
> It's a no-go. I don't think renaming struct page to netmem is going to
> move the needle (it also re-introduces code churn). What I feel like I
> learnt is that dma-bufs are not struct pages and can't be made to look
> like one.
Yeah, that pretty much hits the nail on the head. What hasn't been
mentioned yet, and what you could potentially try, is to go the other
way around.
In other words instead of importing a DMA-buf file descriptor into the
page-pool, you take some netdev page-pool pages and export them as
DMA-buf handle.
All you then need to do is implement the necessary DMA-buf interface,
e.g. wait for DMA-fences before accessing the buffer, and when you have
an async operation in flight, install a DMA-fence so that others can
wait for your operation to finish, and so on.
This still doesn't give you peer-to-peer over, for example, the PCIe
bus, but it would give you zero copy in the sense that a GPU or other
acceleration device writes directly into memory a network device can
work with.
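For illustration, a minimal sketch of that export direction; the
dma-buf calls are the real kernel API, while every pp_* callback and
the page-pool glue behind them are hypothetical placeholders:

#include <linux/dma-buf.h>
#include <linux/err.h>

static struct sg_table *pp_map_dma_buf(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        /* build an sg_table over the page-pool pages and dma-map it */
        return ERR_PTR(-ENOSYS);        /* placeholder */
}

static void pp_unmap_dma_buf(struct dma_buf_attachment *attach,
                             struct sg_table *sgt,
                             enum dma_data_direction dir)
{
        /* dma-unmap and free the sg_table */
}

static void pp_release(struct dma_buf *dmabuf)
{
        /* return the pages to their page pool */
}

static const struct dma_buf_ops pp_dmabuf_ops = {
        .map_dma_buf    = pp_map_dma_buf,
        .unmap_dma_buf  = pp_unmap_dma_buf,
        .release        = pp_release,
};

static struct dma_buf *pp_export(void *pool_priv, size_t size)
{
        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

        exp_info.ops = &pp_dmabuf_ops;
        exp_info.size = size;
        exp_info.flags = O_RDWR;
        exp_info.priv = pool_priv;

        /* async device accesses would install DMA-fences in the
         * buffer's reservation object so importers can wait on them */
        return dma_buf_export(&exp_info);
}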
We already have some similar functionality for at least Intel and AMD
hw, where a userptr mechanism is used to make malloced memory (backed
by normal struct pages) available to the GPU. The problem with this
approach is that most GPUs currently can't do good recoverable page
faults, which in turn makes the whole MMU-notifier-based approach
horribly inefficient.
Just giving a few more pointers you might want to look into...
Cheers,
Christian.
From: Mina Almasry
> Sent: 06 November 2023 02:44
>
> For device memory TCP, we expect the skb headers to be available in host
> memory for access, and we expect the skb frags to be in device memory
> and inaccessible to the host. We expect there to be no mixing and
> matching of device memory frags (inaccessible) with host memory frags
> (accessible) in the same skb.
>
> Add a skb->devmem flag which indicates whether the frags in this skb
> are device memory frags or not.
>
...
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 1fae276c1353..8fb468ff8115 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -805,6 +805,8 @@ typedef unsigned char *sk_buff_data_t;
> * @csum_level: indicates the number of consecutive checksums found in
> * the packet minus one that have been verified as
> * CHECKSUM_UNNECESSARY (max 3)
> + * @devmem: indicates that all the fragments in this skb are backed by
> + * device memory.
> * @dst_pending_confirm: need to confirm neighbour
> * @decrypted: Decrypted SKB
> * @slow_gro: state present at GRO time, slower prepare step required
> @@ -991,7 +993,7 @@ struct sk_buff {
> #if IS_ENABLED(CONFIG_IP_SCTP)
> __u8 csum_not_inet:1;
> #endif
> -
> + __u8 devmem:1;
> #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
> __u16 tc_index; /* traffic control index */
> #endif
> @@ -1766,6 +1768,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
> __skb_zcopy_downgrade_managed(skb);
> }
Doesn't that bloat struct sk_buff?
I'm not sure there are any spare bits available.
Although CONFIG_NET_SWITCHDEV and CONFIG_NET_SCHED seem to
already add padding.
David
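For illustration, a hedged sketch of how a host-access path (e.g. a
plain recvmsg) might honor such a flag; skb_host_access_check() is a
made-up helper, and skb->devmem only exists with the patch above:

#include <linux/skbuff.h>
#include <linux/errno.h>

/* devmem frags are not host-accessible: refuse to linearize or copy
 * them to user space */
static inline int skb_host_access_check(const struct sk_buff *skb)
{
        if (skb->devmem)
                return -EFAULT;
        return 0;
}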
On Mon, Oct 30, 2023 at 11:58:20AM +0100, Marco Pagani wrote:
> On 2023-10-25 10:43, Maxime Ripard wrote:
> > On Tue, Oct 24, 2023 at 07:14:25PM +0200, Marco Pagani wrote:
> >>>> +static void drm_gem_shmem_test_obj_create_private(struct kunit *test)
> >>>> +{
> >>>> + struct fake_dev *fdev = test->priv;
> >>>> + struct drm_gem_shmem_object *shmem;
> >>>> + struct drm_gem_object *gem_obj;
> >>>> + struct dma_buf buf_mock;
> >>>> + struct dma_buf_attachment attach_mock;
> >>>> + struct sg_table *sgt;
> >>>> + char *buf;
> >>>> + int ret;
> >>>> +
> >>>> + /* Create a mock scatter/gather table */
> >>>> + buf = kunit_kzalloc(test, TEST_SIZE, GFP_KERNEL);
> >>>> + KUNIT_ASSERT_NOT_NULL(test, buf);
> >>>> +
> >>>> + sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> >>>> + KUNIT_ASSERT_NOT_NULL(test, sgt);
> >>>> +
> >>>> + ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> >>>> + KUNIT_ASSERT_EQ(test, ret, 0);
> >>>> + sg_init_one(sgt->sgl, buf, TEST_SIZE);
> >>>> +
> >>>> + /* Init a mock DMA-BUF */
> >>>> + buf_mock.size = TEST_SIZE;
> >>>> + attach_mock.dmabuf = &buf_mock;
> >>>> +
> >>>> + gem_obj = drm_gem_shmem_prime_import_sg_table(&fdev->drm_dev, &attach_mock, sgt);
> >>>> + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gem_obj);
> >>>> + KUNIT_ASSERT_EQ(test, gem_obj->size, TEST_SIZE);
> >>>> + KUNIT_ASSERT_NULL(test, gem_obj->filp);
> >>>> + KUNIT_ASSERT_NOT_NULL(test, gem_obj->funcs);
> >>>> +
> >>>> + shmem = to_drm_gem_shmem_obj(gem_obj);
> >>>> + KUNIT_ASSERT_PTR_EQ(test, shmem->sgt, sgt);
> >>>> +
> >>>> + /* The scatter/gather table is freed by drm_gem_shmem_free */
> >>>> + drm_gem_shmem_free(shmem);
> >>>> +}
> >>>
> >>> KUNIT_ASSERT_* will stop the execution of the test on failure; you
> >>> should probably use a few more KUNIT_EXPECT_* calls, otherwise you'll
> >>> leak resources.
> >>>
> >>> You also probably want to use a kunit_action to clean up and avoid that
> >>> whole discussion
> >>>
> >>
> >> You are right. I slightly prefer using KUnit expectations (unless actions
> >> are strictly necessary) since I feel using actions makes test cases a bit
> >> less straightforward to understand. Is this okay for you?
> >
> > I disagree. Actions make it easier to reason about, even when comparing
> > assertion vs expectation
> >
> > Like, for the call to sg_alloc_table and
> > drm_gem_shmem_prime_import_sg_table(), the reasonable use of assert vs
> > expect would be something like:
> >
> > sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > KUNIT_ASSERT_NOT_NULL(test, sgt);
> >
> > ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> > KUNIT_ASSERT_EQ(test, ret, 0);
> >
> > /*
> > * Here, it's already not super clear whether you want to expect vs
> > * assert. expect will make you handle the failure case later, assert will
> > * force you to call kfree on sgt. Both kind of suck in their own ways.
> > */
> >
> > sg_init_one(sgt->sgl, buf, TEST_SIZE);
> >
> > gem_obj = drm_gem_shmem_prime_import_sg_table(&fdev->drm_dev, &attach_mock, sgt);
> > KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gem_obj);
> >
> > /*
> > * If the assert fails, we forgot to call sg_free_table(sgt) and kfree(sgt).
> > */
> >
> > KUNIT_EXPECT_EQ(test, gem_obj->size, TEST_SIZE);
> > KUNIT_EXPECT_NULL(test, gem_obj->filp);
> > KUNIT_EXPECT_NOT_NULL(test, gem_obj->funcs);
> >
> > /*
> > * And here we have to handle the case where the expectation was wrong,
> > * but the test still continued.
> > */
> >
> > But if you're not using an action, you still have to call kfree(sgt),
> > which means that you might still leak it if an earlier assertion fails.
> >
> > shmem = to_drm_gem_shmem_obj(gem_obj);
> > KUNIT_ASSERT_PTR_EQ(test, shmem->sgt, sgt);
> >
> > /*
> > * If the assertion fails, we now have to call drm_gem_shmem_free(shmem)
> > */
> >
> > /* The scatter/gather table is freed by drm_gem_shmem_free */
> > drm_gem_shmem_free(shmem);
> >
> > /* everything's fine now */
> >
> > The semantics around drm_gem_shmem_free make it a bit convoluted, but
> > doing it using goto/labels, plus handling the assertions and error
> > reporting would be difficult.
> >
> > Using actions, we have:
> >
> > sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > KUNIT_ASSERT_NOT_NULL(test, sgt);
> >
> > ret = kunit_add_action_or_reset(test, kfree_wrapper, sgt);
> > KUNIT_ASSERT_EQ(test, ret, 0);
> >
> > ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> > KUNIT_ASSERT_EQ(test, ret, 0);
> >
> > ret = kunit_add_action_or_reset(test, sg_free_table_wrapper, sgt);
> > KUNIT_ASSERT_EQ(test, ret, 0);
> >
> > sg_init_one(sgt->sgl, buf, TEST_SIZE);
> >
> > gem_obj = drm_gem_shmem_prime_import_sg_table(&fdev->drm_dev, &attach_mock, sgt);
> > KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gem_obj);
> > KUNIT_EXPECT_EQ(test, gem_obj->size, TEST_SIZE);
> > KUNIT_EXPECT_NULL(test, gem_obj->filp);
> > KUNIT_EXPECT_NOT_NULL(test, gem_obj->funcs);
> >
> > /* drm_gem_shmem_free will free the struct sg_table itself */
> > kunit_remove_action(test, sg_free_table_wrapper, sgt);
> > kunit_remove_action(test, kfree_wrapper, sgt);
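(For reference, the kfree_wrapper()/sg_free_table_wrapper() helpers the
example assumes could be as small as this sketch; KUnit actions take a
single void pointer:)

#include <linux/scatterlist.h>
#include <linux/slab.h>

static void kfree_wrapper(void *ptr)
{
        kfree(ptr);
}

static void sg_free_table_wrapper(void *sgt)
{
        sg_free_table(sgt);     /* void * converts implicitly in C */
}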
>
> I agree that using actions makes error handling cleaner. However, I still
> have some concerns about the additional complexity that actions introduce.
> For instance, I feel these two lines make the testing harness more complex
> without asserting any additional property of the component under test.
If anything, it's the drm_gem_shmem_free() semantics that make this
more difficult to deal with. It would actually be harder/messier to
handle without an action.
> In some sense, I wonder if it is worth worrying about memory leaks when
> a test case fails. At that point, the system is already in an inconsistent
> state due to a bug in the component under test, so it is unsafe to continue
> anyway.
I guess the larger issue is: once that code is merged, we're going
to get patches converting it to actions, because they make things nicer
and fix a couple of issues anyway.
So, if that's the state we're going to end up in, why not do it
right from the beginning?
Maxime
On Wed, 2023-11-01 at 11:20 +0530, Jaskaran Singh wrote:
>
> On 10/20/2023 3:20 PM, Yong Wu (吴勇) wrote:
> > On Thu, 2023-10-19 at 10:16 +0530, Vijayanand Jitta wrote:
> >>
> >> Instead of having a vendor specific binding for the cma area, how
> >> about retrieving
> >> https://lore.kernel.org/lkml/1594948208-4739-1-git-send-email-hayashi.kunih…
> >> ?
> >> dma_heap_add_cma can just associate a cma region and create a heap.
> >> So, we can reuse the cma heap code for allocation instead of
> >> replicating that code here.
> >>
> >
> > Thanks for the reference. I guess we can't use it. There are two
> > reasons:
> >
> > a) The secure heap driver is a pure software driver and we have no
> > device for it, therefore we cannot call dma_heap_add_cma.
> >
>
> Hi Yong,
>
> We're considering using struct cma as the function argument to
> dma_heap_add_cma() rather than struct device. Would this help
> resolve the problem of usage with dma_heap_add_cma()?
Yes. If we use "struct cma", I guess it works.
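(For illustration, the variant under discussion might look like the
following hypothetical prototype; dma_heap_add_cma() comes from the
series linked above and is not upstream:)

/* hypothetical: register a dma-buf heap directly on a CMA area, so a
 * pure software driver without a struct device can still use it */
int dma_heap_add_cma(struct cma *cma);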
>
> > b) The CMA area here is dynamic for SVP. Normally this CMA can be used
> > in the kernel. In the SVP case we use cma_alloc to get it and pass the
> > entire CMA physical start address and size into TEE to protect the CMA
> > region. The original CMA heap cannot help with the TEE part.
> >
>
> Referring to the conversation at
> https://lore.kernel.org/lkml/7a2995de23c24ef22c071c6976c02b97e9b50126.camel…
>
> since we're considering abstracting secure mem ops, would it make sense
> to use the default CMA heap ops (cma_heap_ops), allocate buffers from it
> and secure each allocated buffer?
Then it looks like you also need TEE operations like
tee_client_open_session and tee_client_invoke_func, right?
It seems we also need to change "cma_heap_allocate" a bit to allow the
CMA heap to support operations from the secure heap.
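For illustration, a hedged sketch of that TEE step: after cma_alloc(),
pass the region's base and size to the TEE for protection. The command
ID is made up, and session setup via tee_client_open_session() is
omitted:

#include <linux/tee_drv.h>

#define HYP_TZCMD_PROTECT_REGION 0x1    /* made-up TEE command ID */

static int secure_cma_region(struct tee_context *ctx, u32 session,
                             phys_addr_t base, size_t size)
{
        struct tee_ioctl_invoke_arg arg = {
                .func = HYP_TZCMD_PROTECT_REGION,
                .session = session,
                .num_params = 2,
        };
        struct tee_param params[2] = {};

        params[0].attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT;
        params[0].u.value.a = base;     /* CMA region start */
        params[1].attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INPUT;
        params[1].u.value.a = size;     /* CMA region size */

        return tee_client_invoke_func(ctx, &arg, params);
}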
I will send a v2 to move the discussion forward. The v2 is based on our
use case; it won't include the CMA part.
>
> Thanks,
> Jaskaran.
>
> > Thanks.
> >
> >> Thanks,
> >> Vijay
> >>
> >>
> >>
>