On 10.09.25 18:15, Kyle Meyer wrote:
Soft offlining a HugeTLB page reduces the available HugeTLB page pool. Since HugeTLB pages are preallocated, reducing the available HugeTLB page pool can cause allocation failures.
/proc/sys/vm/enable_soft_offline provides a sysctl interface to disable/enable soft offline:
0 - Soft offline is disabled.
1 - Soft offline is enabled.
The current sysctl interface does not distinguish between HugeTLB pages and other page types.
Disable soft offline for HugeTLB pages by default (1) and extend the sysctl interface to preserve existing behavior (2):
0 - Soft offline is disabled.
1 - Soft offline is enabled (excluding HugeTLB pages).
2 - Soft offline is enabled (including HugeTLB pages).
Update documentation for the sysctl interface, reference the sysctl interface in the sysfs ABI documentation, and update HugeTLB soft offline selftests.
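For reference, the proposed tri-state semantics quoted above could be sketched roughly as below. This is only an illustration, not the code from the patch: the enum names and the helper soft_offline_allowed() are hypothetical, and treating unknown values as "disabled" is an assumption. Only the values 0, 1, and 2 and their meanings come from the description.

```c
#include <stdbool.h>

/* Hypothetical names; only the numeric values come from the proposal. */
enum soft_offline_mode {
	SOFT_OFFLINE_DISABLED = 0,             /* soft offline is disabled */
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB = 1, /* enabled, HugeTLB pages excluded */
	SOFT_OFFLINE_ENABLED = 2,              /* enabled, HugeTLB pages included */
};

/* Return true if a page may be soft offlined under the given mode. */
static bool soft_offline_allowed(int mode, bool is_hugetlb)
{
	switch (mode) {
	case SOFT_OFFLINE_ENABLED:
		return true;
	case SOFT_OFFLINE_ENABLED_SKIP_HUGETLB:
		return !is_hugetlb;
	default:
		/* 0 and any unknown value: treated as disabled (an assumption). */
		return false;
	}
}
```

Under the proposed default of 1, soft_offline_allowed(1, true) returns false, so HugeTLB pages would be skipped unless an administrator writes 2 to the sysctl.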
I'm sure you spotted that the documentation for "/sys/devices/system/memory/soft_offline_pag" resides under "testing".
If you read about MADV_SOFT_OFFLINE in the man page, it clearly says:
"This feature is intended for testing of memory error-handling code; it is available only if the kernel was configured with CONFIG_MEMORY_FAILURE."
So I'm sorry to say: I fail to see why we should add all this complexity to make a feature used for testing soft-offlining work differently for hugetlb folios -- with a testing interface.