On 2024/9/10 0:54, Mina Almasry wrote:
On Mon, Sep 9, 2024 at 4:21 AM Yunsheng Lin linyunsheng@huawei.com wrote:
On 2024/9/9 13:43, Mina Almasry wrote:
Perf - page-pool benchmark:
bench_page_pool_simple.ko tests with and without these changes: https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8 cycles without the changes but there is some 1 cycle noise in some results.
With the patches this regresses to 9 cycles with the changes but there is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in netmem_is_net_iov() check. To my surprise disabling the static_branch_unlikely() check reduces the fast path back to 8 cycles, but the 1 cycle noise remains.
Sorry for the late report, as I was adding a testing page_pool ko basing on [1] to avoid introducing performance regression when fixing the bug in [2]. I used it to test the performance impact of devmem patchset for page_pool too, it seems there might be some noticable performance impact quite stably for the below testcases, about 5%~16% performance degradation as below in the arm64 system:
Correct me if I'm wrong here, but on the surface here it seems that you're re-reporting a known issue. Consensus seems to be that it's a non-issue.
In v6 I reported that the bench_page_pool_simple.ko test reports a 1 cycle regression with these patches, from 8->9 cycles. That is roughly consistent with the 5-15% you're reporting.
From the description above in the cover letter, I thought the performance data using the out of tree testing ko is not stable enough to justify the performance impact.
I root caused the reason for the regression to be the netmem_is_net_iov() check in the fast path. I removed this regression in v7 (see the change log) by conditionally compiling the check in that function.
In v8, Pavel/Jens/David pushed back on the ifdef check. See this entire thread, but in particular this response from Jens:
It seemed the main objection is about how to enable this feature for the io_uring?
And it seemed that you had added the CONFIG_NET_DEVMEM for this devmem thing, why not use it for that?
https://lore.kernel.org/lkml/11f52113-7b67-4b45-ba1d-29b070050cec@kernel.dk/
Seems consensus that it's 'not really worth it in this scenario'.
I was only reading through the above thread, it didn't seemed to reach to consensus as Jesper pointed out the performance impact for the XDP DROP case in the same thread.
https://lore.kernel.org/lkml/779b9542-4170-483a-af54-ca0dd471f774@kernel.org...