Re: [PATCH v4 0/3] mm: process/cgroup ksm support

15 Mar 2023


On Wed, Mar 15, 2023 at 05:05:47PM -0400, Johannes Weiner wrote:
...
On Wed, Mar 15, 2023 at 09:03:57PM +0100, David Hildenbrand wrote:
...
On 10.03.23 19:28, Stefan Roesch wrote:
...
So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
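For readers less familiar with the existing interface, the per-region opt-in described above looks roughly like the sketch below. It is only an illustration, not code from this patch series: round_up_to_page() and map_mergeable() are hypothetical helper names, while madvise() and MADV_MERGEABLE are the existing Linux interface. The advice only has an effect on a kernel built with CONFIG_KSM and with ksmd running.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MADV_MERGEABLE and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round len up to a multiple of page (a power of two). */
static size_t round_up_to_page(size_t len, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (len + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper: map an anonymous private region and ask KSM to
 * consider it for merging. Returns the mapping, or NULL if mmap() failed.
 * *advised is set to 1 if madvise(MADV_MERGEABLE) succeeded and to 0 if
 * the kernel rejected it (for example EINVAL without CONFIG_KSM).
 */
static void *map_mergeable(size_t len, int *advised)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = round_up_to_page(len, page);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *advised = (madvise(p, size, MADV_MERGEABLE) == 0);
    return p;
}
```

The point of the sketch is that the caller has to know the address and length of every region it wants merged, which is exactly what the use cases below say is not always possible.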
Use case 1:
The madvise call is not available in the programming language. An example of
this is programs with forked workloads using a garbage-collected language without
pointers. In such a language madvise cannot be made available.
In addition, the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these types of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call, sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportunities may exist across multiple workloads or jobs. Only a higher-level
entity like a job scheduler or container can know for certain if it's running
one or more instances of a job. That job scheduler, however, doesn't have
the necessary internal workload knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, it's the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
Stefan, can you do me a favor and investigate which pages we end up
deduplicating -- especially if it's mostly only the zeropage and if it's
still that significant when disabling THP?
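As a starting point for that kind of investigation, the global KSM counters in sysfs can be sampled with something like the sketch below. The files /sys/kernel/mm/ksm/pages_shared and /sys/kernel/mm/ksm/pages_sharing are the documented KSM counters; read_counter() and print_ksm_counters() are hypothetical helper names. Note that these counters only say how much is being shared, not which pages, so they can at best hint at the answer: a very large pages_sharing relative to pages_shared would suggest that a few pages (possibly zero-filled ones) account for most of the savings.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style file.
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_counter(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long v = -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/*
 * Print the global KSM counters: pages_shared is the number of KSM pages
 * in use, pages_sharing is the number of additional sites mapping them.
 */
static void print_ksm_counters(void)
{
    long shared  = read_counter("/sys/kernel/mm/ksm/pages_shared");
    long sharing = read_counter("/sys/kernel/mm/ksm/pages_sharing");

    if (shared < 0 || sharing < 0) {
        puts("KSM counters not available");
        return;
    }
    printf("pages_shared=%ld pages_sharing=%ld\n", shared, sharing);
}
```

Sampling these once with THP enabled and once with THP disabled would be one rough way to compare the two configurations.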
I'm currently working with some engineers on experimentally enabling KSM
on some selected processes (enabling it blindly on all VMAs of that process
via madvise() ).
One thing we noticed is that such 20 MiB processes (~50 of them) end up saving
~2 MiB of memory per process. That made me suspicious, because it's the THP
size.
What I think happens is that we have a 2 MiB area (stack?) and only touch a
single page. We get a whole 2 MiB THP populated. Most of that THP is zeroes.
KSM somehow ends up splitting that THP and deduplicates all resulting
zeropages. Thus, we "save" 2 MiB. Actually, it's more like we no longer
"waste" 2 MiB. I think the processes with KSM have fewer (no) THPs than the
processes with THP enabled, but I only took a look at a sample of the
process' smaps so far.
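The arithmetic behind that suspicion can be written out as a small sketch. It assumes 4 KiB base pages and a 2 MiB THP (the usual x86-64 values, not stated explicitly above), and zero_subpage_bytes() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes that stop being "wasted" when a THP with only `touched` non-zero
 * subpages is split and all of its zero-filled subpages are deduplicated.
 * The single page they end up sharing is ignored as negligible.
 */
static size_t zero_subpage_bytes(size_t thp_size, size_t page_size,
                                 size_t touched)
{
    assert(page_size != 0 && thp_size % page_size == 0);

    size_t subpages = thp_size / page_size;   /* 512 for 2 MiB / 4 KiB */
    assert(touched <= subpages);

    return (subpages - touched) * page_size;
}
```

Under those assumptions, zero_subpage_bytes(2097152, 4096, 1) covers 511 subpages, or 2093056 bytes (2044 KiB), which is just under the ~2 MiB per process observed above.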
THP and KSM is indeed an interesting problem. Better TLB hits with
THPs, but reduced chance of deduplicating memory - which may or may
not result in more IO that outweighs any THP benefits.
That said, the service in the experiment referenced above has swap
turned on and is under significant memory pressure. Unused split pages
would get swapped out. The difference from KSM was from deduplicating
pages that were in active use, not internal THP fragmentation.
Brainfart, my apologies. It could have been the ksm-induced splits
themselves that allowed the unused subpages to get swapped out in the
first place.
But no, I double checked that workload just now. On a weekly average,
it has about 50 anon THPs and 12 million regular anon. THP is not a
factor in the reduction results.
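To put those two numbers side by side, here is a back-of-the-envelope sketch. It assumes that "12 million regular anon" means 12 million base pages and that each THP covers 512 base pages (2 MiB THP, 4 KiB pages); neither is spelled out above, and thp_fraction() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Fraction of anonymous memory, counted in base pages, that is backed by
 * THPs. Returns 0.0 if there is no anonymous memory at all.
 */
static double thp_fraction(long thps, long subpages_per_thp,
                           long regular_pages)
{
    assert(thps >= 0 && subpages_per_thp > 0 && regular_pages >= 0);

    double thp_pages = (double)thps * (double)subpages_per_thp;
    double total = thp_pages + (double)regular_pages;

    return total > 0.0 ? thp_pages / total : 0.0;
}
```

With those assumptions, thp_fraction(50, 512, 12000000) is 25600 out of 12025600 base pages, or roughly 0.2% of anonymous memory, which fits the conclusion that THP cannot explain the reduction.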
