On Wed, Mar 09, 2022 at 10:01:17AM +0200, Jarkko Sakkinen wrote:
On Tue, Mar 08, 2022 at 11:16:19AM -0800, Reinette Chatre wrote:
Hi,
On 3/3/2022 2:38 PM, Jarkko Sakkinen wrote:
There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed.
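The waste described above can be illustrated with a toy model. This is a minimal userspace sketch, not kernel code. Every name in it (`toy_enclave`, `toy_swap_out`, `toy_swap_in`, `toy_run`, `TOY_NR_PAGES`) is hypothetical. It assumes each enclave page either sits in EPC or has a copy in one shmem backing page, and it counts how many backing pages are still held after every page has been swapped out and faulted back in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: each enclave page is either
 * resident in EPC or has a copy in a shmem backing page. */
#define TOY_NR_PAGES 4

struct toy_enclave {
	bool in_epc[TOY_NR_PAGES];       /* page resident in EPC */
	bool backing_used[TOY_NR_PAGES]; /* shmem backing page allocated */
};

/* EPC -> shmem: the page leaves EPC and occupies a backing page. */
static void toy_swap_out(struct toy_enclave *e, int i)
{
	assert(i >= 0 && i < TOY_NR_PAGES);
	e->in_epc[i] = false;
	e->backing_used[i] = true;
}

/* shmem -> EPC: with free_backing == false the shmem copy stays
 * allocated, modelling the waste described in the commit message. */
static void toy_swap_in(struct toy_enclave *e, int i, bool free_backing)
{
	assert(i >= 0 && i < TOY_NR_PAGES);
	e->in_epc[i] = true;
	if (free_backing)
		e->backing_used[i] = false;
}

/* Swap every page out and back in once, then count the backing pages
 * still held even though all enclave pages are back in EPC. */
static int toy_run(bool free_backing)
{
	struct toy_enclave e = {0};
	int held = 0;

	for (int i = 0; i < TOY_NR_PAGES; i++) {
		toy_swap_out(&e, i);
		toy_swap_in(&e, i, free_backing);
	}
	for (int i = 0; i < TOY_NR_PAGES; i++)
		held += e.backing_used[i];
	return held;
}
```

In this model toy_run(false) returns 4 (one stale shmem page per enclave page), while toy_run(true) returns 0.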
Sort this out by freeing memory with shmem_truncate_range() as soon as a page is faulted back to the EPC. In addition, free the memory for a PCMD page as soon as all PCMDs in it have been marked as unused by zeroing their contents.
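The PCMD rule can be sketched in userspace C. The sketch does not call shmem_truncate_range() and is not the patch's code. It assumes 4 KiB pages and 128-byte PCMD entries, which gives 32 entries per PCMD page. The names (`toy_pcmd_page`, `toy_pcmd_page_empty`, `toy_pcmd_release`) are hypothetical, and clearing `allocated` stands in for truncating that page's shmem range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumptions for this sketch: 4 KiB pages and 128-byte PCMD entries,
 * giving 32 entries per PCMD page. */
#define TOY_PAGE_SIZE 4096
#define TOY_PCMD_SIZE 128
#define TOY_PCMDS_PER_PAGE (TOY_PAGE_SIZE / TOY_PCMD_SIZE)

/* Hypothetical stand-in for one page of PCMD backing storage. */
struct toy_pcmd_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool allocated;
};

/* True if every byte of the page is zero, i.e. no PCMD is in use. */
static bool toy_pcmd_page_empty(const struct toy_pcmd_page *p)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (p->data[i] != 0)
			return false;
	return true;
}

/* Mark one PCMD slot unused by zeroing it, then release the whole page
 * once every slot is zero (the analogue of truncating its shmem range). */
static void toy_pcmd_release(struct toy_pcmd_page *p, size_t slot)
{
	assert(slot < TOY_PCMDS_PER_PAGE);
	memset(p->data + slot * TOY_PCMD_SIZE, 0, TOY_PCMD_SIZE);
	if (toy_pcmd_page_empty(p))
		p->allocated = false;
}
```

With two live entries in a page, releasing the first leaves the page allocated. Releasing the second frees it. The page is only dropped once no entry in it can still be needed.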
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
I can reliably reproduce the issue this patch aims to solve by creating a virtual machine that has a significant portion of its memory consumed by EPC:
qemu-system-x86_64 -smp 4 -m 4G \
    -enable-kvm \
    -cpu host,+sgx-provisionkey \
    -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \
    -object memory-backend-epc,id=mem0,size=1536M,prealloc=on,host-nodes=0,policy=bind \
    -numa node,nodeid=0,cpus=0-1,memdev=node0 \
    -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \
    -object memory-backend-epc,id=mem1,size=1536M,prealloc=on,host-nodes=1,policy=bind \
    -numa node,nodeid=1,cpus=2-3,memdev=node1 \
    -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \
    ...
Before this patch, running the very stressful SGX2 oversubscription test case (unclobbered_vdso_oversubscribed_remove) in this environment always triggers the OOM killer. However, no number of killed tasks can save the system, which always ends up deadlocked on memory:
[ 58.642719] Tasks state (memory values in pages):
[ 58.644324] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 58.647237] [ 195] 0 195 3153 197 45056 0 -1000 systemd-udevd
[ 58.650238] [ 281] 0 281 1836367 0 10817536 0 0 test_sgx
[ 58.653088] Out of memory and no killable processes...
[ 58.654832] Kernel panic - not syncing: System is deadlocked on memory
After applying this patch, I was able to run the SGX2 selftest unclobbered_vdso_oversubscribed_remove successfully ten times.
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Thank you.
Reinette
Dave, can this be picked up?
BR, Jarkko