On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng 11065417@vivo.com wrote:
在 2024/4/8 15:58, Christian König 写道:
Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
[SNIP]
Am 13.11.21 um 07:22 schrieb Jianqun Xu:
Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with offset and len.
You have not given an use case for this so it is a bit hard to review. And from the existing use cases I don't see why this should be necessary.
Even worse from the existing backend implementation I don't even see how drivers should be able to fulfill this semantics.
Please explain further, Christian.
Here is a practical case: The user space can allocate a large chunk of dma-buf for self-management, used as a shared memory pool. Small dma-buf can be allocated from this shared memory pool and released back to it after use, thus improving the speed of dma-buf allocation and release. Additionally, custom functionalities such as memory statistics and boundary checking can be implemented in the user space. Of course, the above-mentioned functionalities require the implementation of a partial cache sync interface.
Well that is obvious, but where is the code doing that?
You can't send out code without an actual user of it. That will obviously be rejected.
Regards, Christian.
In fact, we have already used the user-level dma-buf memory pool in the camera shooting scenario on the phone.
From the test results, The execution time of the photo shooting algorithm has been reduced from 3.8s to 3s.
For phones, the (out of tree) Android version of the system heap has a page pool connected to a shrinker. That allows you to skip page allocation without fully pinning the memory like you get when allocating a dma-buf that's way larger than necessary. If it's for a camera maybe you need physically contiguous memory, but it's also possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/dr...
To be honest, I didn't understand your concern "...how drivers should be able to fulfill this semantics." Can you please help explain it in more detail?
Thanks,
Rong Qianfeng.
Thanks Rong Qianfeng.
Am 09.04.24 um 18:37 schrieb T.J. Mercier:
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng 11065417@vivo.com wrote:
在 2024/4/8 15:58, Christian König 写道:
Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
[SNIP]
Am 13.11.21 um 07:22 schrieb Jianqun Xu:
Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with offset and len.
You have not given an use case for this so it is a bit hard to review. And from the existing use cases I don't see why this should be necessary.
Even worse from the existing backend implementation I don't even see how drivers should be able to fulfill this semantics.
Please explain further, Christian.
Here is a practical case: The user space can allocate a large chunk of dma-buf for self-management, used as a shared memory pool. Small dma-buf can be allocated from this shared memory pool and released back to it after use, thus improving the speed of dma-buf allocation and release. Additionally, custom functionalities such as memory statistics and boundary checking can be implemented in the user space. Of course, the above-mentioned functionalities require the implementation of a partial cache sync interface.
Well that is obvious, but where is the code doing that?
You can't send out code without an actual user of it. That will obviously be rejected.
Regards, Christian.
In fact, we have already used the user-level dma-buf memory pool in the camera shooting scenario on the phone.
From the test results, The execution time of the photo shooting algorithm has been reduced from 3.8s to 3s.
For phones, the (out of tree) Android version of the system heap has a page pool connected to a shrinker.
Well, it should be obvious but I'm going to repeat it here: Submitting kernel patches for our of tree code is a rather *extreme* no-go.
That in kernel code *must* have an in kernel user is a number one rule.
When somebody violates this rule he pretty much disqualifying himself as reliable source of patches since maintainers then have to assume that this person tries to submit code which doesn't have a justification to be upstream.
So while this actually looks useful from the technical side as long as nobody does an implementation based on an upstream driver I absolutely have to reject it from the organizational side.
Regards, Christian.
That allows you to skip page allocation without fully pinning the memory like you get when allocating a dma-buf that's way larger than necessary. If it's for a camera maybe you need physically contiguous memory, but it's also possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/dr...
To be honest, I didn't understand your concern "...how drivers should be able to fulfill this semantics." Can you please help explain it in more detail?
Thanks,
Rong Qianfeng.
Thanks Rong Qianfeng.
On Wed, Apr 10, 2024 at 7:22 AM Christian König christian.koenig@amd.com wrote:
Am 09.04.24 um 18:37 schrieb T.J. Mercier:
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng 11065417@vivo.com wrote:
在 2024/4/8 15:58, Christian König 写道:
Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
[SNIP]
Am 13.11.21 um 07:22 schrieb Jianqun Xu: > Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with > offset and len. You have not given an use case for this so it is a bit hard to review. And from the existing use cases I don't see why this should be necessary.
Even worse from the existing backend implementation I don't even see how drivers should be able to fulfill this semantics.
Please explain further, Christian.
Here is a practical case: The user space can allocate a large chunk of dma-buf for self-management, used as a shared memory pool. Small dma-buf can be allocated from this shared memory pool and released back to it after use, thus improving the speed of dma-buf allocation and release. Additionally, custom functionalities such as memory statistics and boundary checking can be implemented in the user space. Of course, the above-mentioned functionalities require the implementation of a partial cache sync interface.
Well that is obvious, but where is the code doing that?
You can't send out code without an actual user of it. That will obviously be rejected.
Regards, Christian.
In fact, we have already used the user-level dma-buf memory pool in the camera shooting scenario on the phone.
From the test results, The execution time of the photo shooting algorithm has been reduced from 3.8s to 3s.
For phones, the (out of tree) Android version of the system heap has a page pool connected to a shrinker.
Well, it should be obvious but I'm going to repeat it here: Submitting kernel patches for our of tree code is a rather *extreme* no-go.
Sorry I think my comment led to some confusion. I wasn't suggesting you should take the patch; it's clearly incomplete. I was pointing out another option to Rong. It appears Rong is creating a single oversized dma-buf, and subdividing it in userspace to avoid multiple allocations for multiple buffers. That would lead to a need for a partial sync of the buffer from userspace. Instead of that, using a heap with a page pool gets you kind of the same thing with a lot less headache in userspace, and no need for partial sync. So I wanted to share that option, and that can go in just Android if necessary without this patch.
That in kernel code *must* have an in kernel user is a number one rule.
When somebody violates this rule he pretty much disqualifying himself as reliable source of patches since maintainers then have to assume that this person tries to submit code which doesn't have a justification to be upstream.
So while this actually looks useful from the technical side as long as nobody does an implementation based on an upstream driver I absolutely have to reject it from the organizational side.
Regards, Christian.
That allows you to skip page allocation without fully pinning the memory like you get when allocating a dma-buf that's way larger than necessary. If it's for a camera maybe you need physically contiguous memory, but it's also possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/dr...
To be honest, I didn't understand your concern "...how drivers should be able to fulfill this semantics." Can you please help explain it in more detail?
Thanks,
Rong Qianfeng.
Thanks Rong Qianfeng.
Am 10.04.24 um 17:07 schrieb T.J. Mercier:
On Wed, Apr 10, 2024 at 7:22 AM Christian König christian.koenig@amd.com wrote:
Am 09.04.24 um 18:37 schrieb T.J. Mercier:
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng 11065417@vivo.com wrote:
在 2024/4/8 15:58, Christian König 写道:
Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
[SNIP] > Am 13.11.21 um 07:22 schrieb Jianqun Xu: >> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with >> offset and len. > You have not given an use case for this so it is a bit hard to > review. And from the existing use cases I don't see why this should > be necessary. > > Even worse from the existing backend implementation I don't even see > how drivers should be able to fulfill this semantics. > > Please explain further, > Christian. Here is a practical case: The user space can allocate a large chunk of dma-buf for self-management, used as a shared memory pool. Small dma-buf can be allocated from this shared memory pool and released back to it after use, thus improving the speed of dma-buf allocation and release. Additionally, custom functionalities such as memory statistics and boundary checking can be implemented in the user space. Of course, the above-mentioned functionalities require the implementation of a partial cache sync interface.
Well that is obvious, but where is the code doing that?
You can't send out code without an actual user of it. That will obviously be rejected.
Regards, Christian.
In fact, we have already used the user-level dma-buf memory pool in the camera shooting scenario on the phone.
From the test results, The execution time of the photo shooting algorithm has been reduced from 3.8s to 3s.
For phones, the (out of tree) Android version of the system heap has a page pool connected to a shrinker.
Well, it should be obvious but I'm going to repeat it here: Submitting kernel patches for our of tree code is a rather *extreme* no-go.
Sorry I think my comment led to some confusion. I wasn't suggesting you should take the patch; it's clearly incomplete. I was pointing out another option to Rong. It appears Rong is creating a single oversized dma-buf, and subdividing it in userspace to avoid multiple allocations for multiple buffers. That would lead to a need for a partial sync of the buffer from userspace. Instead of that, using a heap with a page pool gets you kind of the same thing with a lot less headache in userspace, and no need for partial sync. So I wanted to share that option, and that can go in just Android if necessary without this patch.
Ah! Thanks for the clarification and sorry for any noise I created.
I mean from the technical side the patch doesn't looks invalid or anything, but there is simply no upstream use case.
Regards, Christian.
That in kernel code *must* have an in kernel user is a number one rule.
When somebody violates this rule he pretty much disqualifying himself as reliable source of patches since maintainers then have to assume that this person tries to submit code which doesn't have a justification to be upstream.
So while this actually looks useful from the technical side as long as nobody does an implementation based on an upstream driver I absolutely have to reject it from the organizational side.
Regards, Christian.
That allows you to skip page allocation without fully pinning the memory like you get when allocating a dma-buf that's way larger than necessary. If it's for a camera maybe you need physically contiguous memory, but it's also possible to set that up.
https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/dr...
To be honest, I didn't understand your concern "...how drivers should be able to fulfill this semantics." Can you please help explain it in more detail?
Thanks,
Rong Qianfeng.
Thanks Rong Qianfeng.
linaro-mm-sig@lists.linaro.org