On Tue, Sep 29, 2020 at 04:12:16PM +0200, Peter Zijlstra wrote:
On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
From: Mike Rapoport <rppt@linux.ibm.com>
Removing a PAGE_SIZE page from the direct map every time such a page is allocated for a secret memory mapping will cause severe fragmentation of the direct map. This fragmentation can be reduced by using PMD-size pages as a pool for small pages for secret memory mappings.
Add a gen_pool per secretmem inode and lazily populate this pool with PMD-size pages.
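To make the pooling scheme easier to reason about, here is a toy Python model of it. It is not the kernel implementation (which uses gen_pool); the class, the helper and the fake chunk allocator are all hypothetical. It only shows when a new PMD-size chunk has to be taken (and, by assumption, removed from the direct map) and what a per-inode pool means when there are many inodes.

```python
# Toy model (not kernel code) of a per-inode pool that lazily takes PMD-size
# chunks and hands out PAGE_SIZE pages from them. All names are hypothetical.
PAGE_SIZE = 4096
PMD_SIZE = 2 * 1024 * 1024
PAGES_PER_PMD = PMD_SIZE // PAGE_SIZE  # 512 with 4K pages and 2M PMDs


def make_chunk_allocator():
    """Return a callable handing out consecutive PMD-aligned base addresses.

    It stands in for the page allocator; every chunk it returns is assumed to
    be removed from the direct map as a whole.
    """
    next_base = 0

    def alloc_chunk():
        nonlocal next_base
        base = next_base
        next_base += PMD_SIZE
        return base

    return alloc_chunk


class SecretmemPoolModel:
    """One instance per (hypothetical) secretmem inode."""

    def __init__(self, alloc_chunk):
        self._alloc_chunk = alloc_chunk
        self._free_pages = []  # unused PAGE_SIZE pages from this inode's chunks
        self.chunks = []       # PMD-size chunks taken out of the direct map

    def alloc_page(self):
        if not self._free_pages:
            # Pool is empty: lazily grab one more PMD-size chunk and split it.
            base = self._alloc_chunk()
            self.chunks.append(base)
            self._free_pages = [base + i * PAGE_SIZE for i in range(PAGES_PER_PMD)]
        return self._free_pages.pop()

    def free_page(self, addr):
        # In this model freed pages go back to the inode's pool; the chunk is kept.
        self._free_pages.append(addr)


if __name__ == "__main__":
    shared = make_chunk_allocator()
    big = SecretmemPoolModel(shared)
    for _ in range(513):
        big.alloc_page()
    print(len(big.chunks))  # 2: page 513 forces a second chunk
    # Eight inodes with one secret page each pin eight chunks.
    small = [SecretmemPoolModel(shared) for _ in range(8)]
    for pool in small:
        pool.alloc_page()
    print(sum(len(p.chunks) for p in small))  # 8
```

The second half of the usage example is the situation raised below: because the pool is per inode, one secret page per inode still costs one whole PMD-size chunk.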
What's the actual efficacy of this? Since the pmd is per inode, all I need is a lot of inodes and we're in business to destroy the directmap, no?
Afaict no privileges are needed to use this; all a process needs is to stay below the mlock limit, so a 'fork-bomb' that maps a single secret page will utterly destroy the direct map.
This indeed will cause 1G pages in the direct map to be split into 2M chunks, but I disagree with the term 'destroy' here. Citing the cover letter of an earlier version of this series:
It will drop them down to 4k pages. Given enough inodes, and allocating only a single sekrit page per pmd, we'll shatter the directmap into 4k.
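The disagreement here is about how far the split goes. A back-of-the-envelope Python model can count the leaf mappings that cover a region which starts out mapped with 1G pages. Whether secretmem leaves 2M-sized or 4K-sized holes is exactly what is being argued, so it is a parameter of the sketch rather than a claim; the worst-case placement of chunks is also an assumption, and the function name is hypothetical.

```python
# Back-of-the-envelope count of direct-map leaf mappings covering a region that
# starts out fully mapped with 1G pages, after `n_chunks` PMD-size chunks are
# handed to secretmem. Hypothetical assumptions:
#   * worst-case placement: each chunk sits in its own 2M region, and chunks are
#     spread so that every 1G page is split before any is hit twice;
#   * hole="2M": the whole chunk is unmapped from the direct map;
#   * hole="4K": only one 4K page of each chunk is unmapped, so the rest of
#     that 2M region has to be remapped with 4K pages.
PMDS_PER_1G = 512
PTES_PER_2M = 512


def leaf_mappings(total_gib, n_chunks, hole="2M"):
    if hole not in ("2M", "4K"):
        raise ValueError("hole must be '2M' or '4K'")
    if not 0 <= n_chunks <= total_gib * PMDS_PER_1G:
        raise ValueError("n_chunks must fit into the region's 2M slots")
    split_1g = min(n_chunks, total_gib)  # 1G pages that had to be broken up
    counts = {
        "1G": total_gib - split_1g,
        # each split 1G page yields 512 2M slots; every chunk takes one away
        "2M": split_1g * PMDS_PER_1G - n_chunks,
        "4K": 0,
    }
    if hole == "4K":
        # the remaining 511 pages of every affected 2M region stay mapped at 4K
        counts["4K"] = n_chunks * (PTES_PER_2M - 1)
    return counts


if __name__ == "__main__":
    for n in (1, 28, 1000):
        print(n, leaf_mappings(28, n, "2M"), leaf_mappings(28, n, "4K"))
```

With 2M holes, the cost of many inodes stays at the 2M level; with 4K holes, every inode adds 511 4K mappings, which is the "shatter into 4k" scenario.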
I've tried to find some numbers that show the benefit of using larger pages in the direct map, but I couldn't find anything, so I've run a couple of benchmarks from phoronix-test-suite on my laptop (i7-8650U with 32G RAM).
Existing benchmarks suck at this, but FB had a load that had a deterministic enough performance regression to bisect to a directmap issue, fixed by:
7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text")
I tried to dig up the regression report on the mailing list, and the best I could find is
https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/
which does not mention the actual performance regression; it only complains about the kernel text mapping being split into 4K pages.
Any chance you have the regression report handy?
This commit talks about large page splits for the text and mentions iTLB performance. Could it be that for data the behaviour is different?
I've tested three variants: the default, with 28G of the physical memory covered with 1G pages; then with 1G pages disabled using "nogbpages" on the kernel command line; and finally with the entire direct map forced to use 4K pages by a simple patch to arch/x86/mm/init.c. I ran the benchmarks with both SSD and tmpfs. Surprisingly, the results do not show a huge advantage for large pages. For instance, here are the results for a kernel build with 'make -j8', in seconds:
Your benchmark should stress the TLB of your uarch, such that additional pressure added by the shattered directmap shows up.
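Why the TLB matters here can be shown with simple arithmetic on the three variants above: the number of leaf entries needed to map 28G, and how much memory a TLB of a given size can cover with each page size. The TLB size below is an illustrative placeholder, not the real geometry of the i7-8650U; on real cores the number of entries typically differs per page size.

```python
# Rough arithmetic for the three direct-map variants (28G mapped with 1G, 2M
# or 4K pages). HYPOTHETICAL_TLB_ENTRIES is an illustrative placeholder, not
# the real TLB geometry of the i7-8650U or any other CPU.
MIB = 1 << 20
GIB = 1 << 30
PAGE_SIZES = {"1G": GIB, "2M": 2 * MIB, "4K": 4096}
HYPOTHETICAL_TLB_ENTRIES = 1536


def entries_needed(region_bytes, page_bytes):
    """Leaf entries needed to map a region; also the TLB entries to cover it all."""
    return -(-region_bytes // page_bytes)  # ceiling division


def tlb_reach(tlb_entries, page_bytes):
    """Bytes covered if every TLB entry holds a translation of this page size."""
    return tlb_entries * page_bytes


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: {entries_needed(28 * GIB, size)} entries for 28G, "
              f"reach {tlb_reach(HYPOTHETICAL_TLB_ENTRIES, size) // MIB} MiB")
```

A workload only sees the difference once its kernel-side working set no longer fits in the reach of the smaller pages, which is why the benchmark has to generate that pressure.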
I understand that the benchmark should stress the TLB, but it's not as if we can add something like random access to a large working set as a kernel module and insmod it. Userspace has to do something that stresses the TLB so that the entries corresponding to the direct map are evicted frequently. And, frankly,
And no, I don't have one either.
                      |   1G   |   2M   |   4K
----------------------+--------+--------+---------
ssd, mitigations=on   | 308.75 | 317.37 | 314.9
ssd, mitigations=off  | 305.25 | 295.32 | 304.92
ram, mitigations=on   | 301.58 | 322.49 | 306.54
ram, mitigations=off  | 299.32 | 288.44 | 310.65
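Expressing each row relative to the 1G baseline makes the size of the gaps easier to compare. The numbers are copied from the table; the function name is hypothetical, and positive values mean a slower build than with 1G pages.

```python
# Kernel build times in seconds, copied from the table above.
RESULTS = {
    "ssd, mitigations=on": {"1G": 308.75, "2M": 317.37, "4K": 314.9},
    "ssd, mitigations=off": {"1G": 305.25, "2M": 295.32, "4K": 304.92},
    "ram, mitigations=on": {"1G": 301.58, "2M": 322.49, "4K": 306.54},
    "ram, mitigations=off": {"1G": 299.32, "2M": 288.44, "4K": 310.65},
}


def relative_to_1g(row):
    """Percent change of each variant vs. the 1G baseline (positive = slower)."""
    base = row["1G"]
    return {k: 100.0 * (v - base) / base for k, v in row.items() if k != "1G"}


if __name__ == "__main__":
    for config, row in RESULTS.items():
        deltas = relative_to_1g(row)
        print(config, {k: f"{d:+.2f}%" for k, d in deltas.items()})
```

The 2M column is faster than 1G in both mitigations=off rows and slower in both mitigations=on rows, so the direction of the effect is not consistent across configurations.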
These results lack error data, but assuming the results are significant, then this very much makes a case for 1G mappings. 5s on a kernel build is pretty good.
The standard errors for those are between 2.5 and 4.5 seconds, over 3 runs for each variant.
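A rough way to put the error data next to the gaps is to express a difference in units of the standard error of that difference. This sketch assumes each table entry is the mean of independent runs and that the quoted 2.5 to 4.5 s figures are standard errors of those means; giving both variants the same standard error is a hypothetical simplification.

```python
import math

# Rough check of whether a gap between two variants stands out from the noise,
# assuming each table entry is a mean of independent runs and the quoted
# figures (2.5 to 4.5 s) are standard errors of those means.


def diff_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Distance between two means in units of the standard error of their difference."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / se_diff


if __name__ == "__main__":
    # 1G vs 4K, "ram, mitigations=on": 301.58 s vs 306.54 s
    for se in (2.5, 4.5):
        ratio = diff_in_standard_errors(301.58, se, 306.54, se)
        print(f"SE {se} s each: gap = {ratio:.2f} standard errors")
```

A gap of roughly 5s comes out between about 0.8 and 1.4 standard errors under these assumptions; with only 3 runs per variant, a t-test would set the bar for significance higher still.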
For the kernel build, 1G mappings do perform better, but 5s is only 1.6% of 300s, and the direct map fragmentation was taken to the extreme here. I'm not saying that direct map fragmentation comes with no cost, but the cost is not so big that features causing fragmentation should be dismissed out of hand.
There were also benchmarks that actually performed better with 2M pages in the direct map, so I'm still not convinced that 1G pages in the direct map are the clear-cut winner.