On Tue, 2020-09-29 at 16:12 +0200, Peter Zijlstra wrote:
> On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
> > On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > Removing a PAGE_SIZE page from the direct map every time such page is allocated for a secret memory mapping will cause severe fragmentation of the direct map. This fragmentation can be reduced by using PMD-size pages as a pool for small pages for secret memory mappings.
> > > > Add a gen_pool per secretmem inode and lazily populate this pool with PMD-size pages.
> > > What's the actual efficacy of this? Since the pmd is per inode, all I need is a lot of inodes and we're in business to destroy the directmap, no?
> > > Afaict there's no privs needed to use this, all a process needs is to stay below the mlock limit, so a 'fork-bomb' that maps a single secret page will utterly destroy the direct map.
> > This indeed will cause 1G pages in the direct map to be split into 2M chunks, but I disagree with 'destroy' term here. Citing the cover letter of an earlier version of this series:
> It will drop them down to 4k pages. Given enough inodes, and allocating only a single sekrit page per pmd, we'll shatter the directmap into 4k.
Since the only requirement is 2M chunks, even if this happens (which I'm not sure it does), it should be fixable so that we only fragment down to 2M, right?
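To put rough numbers on the two cases, here is a back-of-the-envelope userspace sketch. It is not kernel code: it assumes x86-64 page sizes (4K, 2M, 1G), and the helper names are made up for illustration. It compares the worst case described above (one 4K secret page handed out per 2M pool chunk, so the chunk's direct-map entry is split into 4K PTEs) with the case where splitting stops at 2M.

```c
#include <stddef.h>

/* Assumed x86-64 page sizes; other architectures differ. */
#define SZ_4K (1UL << 12)
#define SZ_2M (1UL << 21)
#define SZ_1G (1UL << 30)

/* Number of 4K PTE slots needed to cover one 2M region (512 here). */
static unsigned long ptes_per_pmd(void)
{
	return SZ_2M / SZ_4K;
}

/*
 * Worst case from the thread: every inode's pool holds one 2M chunk and
 * hands out a single 4K page, so each chunk's 2M direct-map entry is
 * replaced by a full table of 4K PTE slots.
 * Hypothetical helper: returns the total number of 4K slots created.
 */
static unsigned long direct_map_4k_slots(unsigned long nr_inodes)
{
	return nr_inodes * ptes_per_pmd();
}

/*
 * If a whole 2M chunk is removed from the direct map at once, no 4K
 * split is needed for it. The remaining cost is that the enclosing 1G
 * mapping is split into 2M entries. In the worst case every chunk lands
 * in a different 1G region, bounded by the number of 1G regions that
 * exist. Hypothetical helper: returns that upper bound.
 */
static unsigned long max_1g_regions_split(unsigned long nr_inodes,
					  unsigned long nr_1g_regions)
{
	return nr_inodes < nr_1g_regions ? nr_inodes : nr_1g_regions;
}
```

With 1000 inodes, direct_map_4k_slots(1000) gives 512000 4K slots in the first case, whereas on a machine with 64G of direct map max_1g_regions_split(1000, 64) caps out at 64 regions split down to 2M.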
We could also enforce a global limit in the secretmem syscall, so the fork bomb problem can be made to go away.
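Something along these lines is what I have in mind for the global limit. This is only a userspace sketch using C11 atomics, not a patch: the counter, the limit value and the function names are all hypothetical, and a real implementation would use the kernel's own atomics and pick the limit some other way.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical system-wide accounting of secretmem pages. */
static atomic_size_t secretmem_global_pages;
/* Hypothetical cap; the value 1024 is arbitrary. */
static const size_t secretmem_global_limit = 1024;

/*
 * Try to account nr_pages against the global limit.
 * Returns false, without changing the counter, if the request would
 * push the total over the limit.
 */
static bool secretmem_charge(size_t nr_pages)
{
	size_t old = atomic_load(&secretmem_global_pages);

	do {
		/* Written this way so old + nr_pages cannot overflow. */
		if (old > secretmem_global_limit ||
		    nr_pages > secretmem_global_limit - old)
			return false;
		/* On failure the CAS reloads 'old' and we re-check. */
	} while (!atomic_compare_exchange_weak(&secretmem_global_pages,
					       &old, old + nr_pages));
	return true;
}

/* Release pages that were previously charged. */
static void secretmem_uncharge(size_t nr_pages)
{
	atomic_fetch_sub(&secretmem_global_pages, nr_pages);
}
```

The allocation path would call secretmem_charge() before taking pages out of the direct map and fail the request when it returns false, so the number of inodes a fork bomb can create no longer determines how much of the direct map gets split.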
Lastly, we could go back to boot time allocation as the previous patch did, so this isn't even a fundamental problem with the patch set.
That said, I think investigation of the importance of direct map tiling is useful, since it does fragment for other reasons, and fixing or proving that the fragmentation doesn't matter is also something we'll keep on investigating. But it would be useful in the meantime to explore things which may be more fundamental issues with the approach.
Regards,
James