November 2019 - Linux-kselftest-mirror

[PATCH] kunit: testing kunit: Bug fix in test_run_timeout function

by Heidi Fahim

Assert in test_run_timeout was not updated with the build_dir argument and caused the following error:

AssertionError: Expected call: run_kernel(timeout=3453)
Actual call: run_kernel(build_dir=None, timeout=3453)

Needed to update kunit_tool_test to reflect this fix
https://lkml.org/lkml/2019/9/6/3

Signed-off-by: Heidi Fahim <heidifahim(a)google.com>
Change-Id: I6f161c72c6a5f071a4dc31582ba08b91974502ce
---
 tools/testing/kunit/kunit_tool_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 4a12baa0cd4e..a2a8ea6beae3 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -199,7 +199,7 @@ class KUnitMainTest(unittest.TestCase):
 		timeout = 3453
 		kunit.main(['run', '--timeout', str(timeout)], self.linux_source_mock)
 		assert self.linux_source_mock.build_reconfig.call_count == 1
-		self.linux_source_mock.run_kernel.assert_called_once_with(timeout=timeout)
+		self.linux_source_mock.run_kernel.assert_called_once_with(build_dir=None, timeout=timeout)
 		self.print_mock.assert_any_call(StrContains('Testing complete.'))
 
 if __name__ == '__main__':
-- 
2.24.0.432.g9d3f5f5b63-goog


[PATCH v4 0/4] PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY

by Leonard Crestez

Support for frequency limits in dev_pm_qos was removed when cpufreq was switched to freq_qos; this series attempts to restore it by reimplementing it on top of freq_qos.

Discussion about removal is here:
https://lore.kernel.org/linux-pm/VI1PR04MB7023DF47D046AEADB4E051EBEE680@VI1…

The cpufreq core switched away because it needs constraints at the level of a "cpufreq_policy", which covers multiple CPUs, so dev_pm_qos coupling to struct device was not useful. Cpufreq could only use dev_pm_qos by implementing an additional layer of aggregation anyway.

However, in the devfreq subsystem scaling is always performed on a per-device basis, so dev_pm_qos is a very good match. Support for dev_pm_qos in the devfreq core is here (latest version, no dependencies outside this series):
https://patchwork.kernel.org/cover/11252409/

That series is RFC mostly because it needs these PM core patches. Earlier versions got entangled in some locking cleanups, but those are not strictly necessary to get dev_pm_qos functionality.

In theory, if freq_qos is extended to handle conflicting min/max values, then this sharing would be valuable. Right now freq_qos just ties together two unrelated pm_qos aggregations for min and max freq.

---

This is implemented by embedding a freq_qos_request inside dev_pm_qos_request: the data field was already a union in order to deal with flag requests.

The internal freq_qos_apply is exported so that it can be called from dev_pm_qos apply_constraints.

The dev_pm_qos_constraints_destroy function has no obvious equivalent in freq_qos, and the whole approach of "removing requests" is somewhat dubious: request objects should be owned by consumers, and the list of QoS requests will most likely be empty when the target device is deleted.

The series follows the current pattern for dev_pm_qos. The first two patches can be applied separately.
Changes since v3:
* Fix s/QOS/QoS in patch 2 title
* Improve comments in kunit test
* Fix assertions after freq_qos_remove_request
* Remove (c) from NXP copyright header
* Wrap long lines in qos.c to be under 80 chars. This fixes checkpatch, but the rule is already broken by code in the files.
* Collect reviews

Link to v3: https://patchwork.kernel.org/cover/11260627/

Changes since v2:
* #define PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE FREQ_QOS_MAX_DEFAULT_VALUE
* #define FREQ_QOS_MAX_DEFAULT_VALUE S32_MAX (in new patch)
* Add initial kunit test for freq_qos, validating the MAX_DEFAULT_VALUE found by Matthias and another recent fix. Testing this should be easier!

Link to v2: https://patchwork.kernel.org/cover/11250413/

Changes since v1:
* Don't rename or EXPORT_SYMBOL_GPL the freq_qos_apply function; just drop the static marker.

Link to v1: https://patchwork.kernel.org/cover/11212887/

Leonard Crestez (4):
  PM / QoS: Initial kunit test
  PM / QoS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX
  PM / QoS: Reorder pm_qos/freq_qos/dev_pm_qos structs
  PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY

 drivers/base/Kconfig          |   4 ++
 drivers/base/power/Makefile   |   1 +
 drivers/base/power/qos-test.c | 117 ++++++++++++++++++++++++++++++++++
 drivers/base/power/qos.c      |  73 +++++++++++++++++++--
 include/linux/pm_qos.h        |  86 ++++++++++++++-----------
 kernel/power/qos.c            |   4 +-
 6 files changed, 242 insertions(+), 43 deletions(-)
 create mode 100644 drivers/base/power/qos-test.c

-- 
2.17.1
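
A note on the aggregation these patches build on: freq_qos keeps two independent pm_qos lists per target, and the effective minimum frequency is the largest of all minimum requests while the effective maximum is the smallest of all maximum requests. The sketch below is a simplified userspace model of that behaviour, not the kernel's pm_qos code. The helper names are hypothetical, a default minimum of 0 is an assumption, and S32_MAX (INT32_MAX here) is used as the default maximum to match the FREQ_QOS_MAX_DEFAULT_VALUE redefinition in this series. Because the two lists are aggregated separately, nothing stops the effective minimum from exceeding the effective maximum, which is the "conflicting min/max values" case the cover letter mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed defaults for this model: min requests default to 0, max requests
 * to S32_MAX, the value FREQ_QOS_MAX_DEFAULT_VALUE is redefined to here. */
#define MODEL_MIN_DEFAULT 0
#define MODEL_MAX_DEFAULT INT32_MAX

/* Effective minimum: the strictest (largest) of all min-frequency requests. */
static int32_t aggregate_min(const int32_t *reqs, size_t n)
{
	int32_t v = MODEL_MIN_DEFAULT;

	for (size_t i = 0; i < n; i++)
		if (reqs[i] > v)
			v = reqs[i];
	return v;
}

/* Effective maximum: the strictest (smallest) of all max-frequency requests. */
static int32_t aggregate_max(const int32_t *reqs, size_t n)
{
	int32_t v = MODEL_MAX_DEFAULT;

	for (size_t i = 0; i < n; i++)
		if (reqs[i] < v)
			v = reqs[i];
	return v;
}

/* The two aggregations never look at each other, so min > max is possible. */
static bool limits_conflict(const int32_t *min_reqs, size_t nmin,
			    const int32_t *max_reqs, size_t nmax)
{
	return aggregate_min(min_reqs, nmin) > aggregate_max(max_reqs, nmax);
}
```

For example, minimum requests of 100000 and 400000 with maximum requests of 800000 and 1200000 (illustrative kHz values) give an effective range of 400000 to 800000; adding a minimum request of 900000 makes limits_conflict() return true.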


[PATCH] fs/ext4/inode-test: Fix inode test on 32 bit platforms.

by Iurii Zaikin

Fix the issue caused by the fact that, in C, in an expression of the form -1234L only 1234L is the actual literal; the unary minus is an operation applied to the literal. This means that to express the lower bound for the type, one has to negate the upper bound and subtract 1.

Original error:

Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
    test_data[i].expected.tv_sec == -2147483648
    timestamp.tv_sec == 2147483648
1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits: msb:1 lower_bound:1 extra_bits: 0
Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
    test_data[i].expected.tv_sec == 2147483648
    timestamp.tv_sec == 6442450944
2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on: msb:1 lower_bound:1 extra_bits: 1
Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but
    test_data[i].expected.tv_sec == 6442450944
    timestamp.tv_sec == 10737418240
2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on: msb:1 lower_bound:1 extra_bits: 2
not ok 1 - inode_test_xtimestamp_decoding
not ok 1 - ext4_inode_test

Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Signed-off-by: Iurii Zaikin <yzaikin(a)google.com>
Tested-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
 fs/ext4/inode-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode-test.c b/fs/ext4/inode-test.c
index 92a9da1774aa..bbce1c328d85 100644
--- a/fs/ext4/inode-test.c
+++ b/fs/ext4/inode-test.c
@@ -25,7 +25,7 @@
  * For constructing the negative timestamp lower bound value.
  * binary: 10000000 00000000 00000000 00000000
  */
-#define LOWER_MSB_1 (-0x80000000L)
+#define LOWER_MSB_1 (-(UPPER_MSB_0) - 1L) /* avoid overflow */
 /*
  * For constructing the negative timestamp upper bound value.
  * binary: 11111111 11111111 11111111 11111111
-- 
2.24.0.432.g9d3f5f5b63-goog
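
Why the old macro only breaks on 32-bit platforms: C has no negative integer literals, and a hexadecimal constant that does not fit in long takes the next type in the list, unsigned long. On a 32-bit target 0x80000000L is therefore unsigned, and negating it wraps back to +2147483648 instead of producing -2147483648, the same pair of values seen in the first failure. The sketch below models the 32-bit types with fixed-width integers so it behaves the same on a 64-bit host. The function names are illustrative, and UPPER_MSB_0 is assumed to be 0x7FFFFFFF, since its definition is outside this hunk.

```c
#include <stdint.h>

/*
 * Model of the ILP32 case: there, 0x80000000L does not fit in a 32-bit
 * long, so the hex literal's type becomes a 32-bit unsigned long, which
 * uint32_t stands in for here.
 */
static int64_t broken_lower_bound(void)
{
	uint32_t literal = 0x80000000u;  /* what 0x80000000L is on ILP32 */
	uint32_t negated = -literal;     /* unsigned negation wraps mod 2^32 */

	return (int64_t)negated;         /* widened later, e.g. into a 64-bit tv_sec */
}

/*
 * The patch's approach: negate the representable upper bound, then
 * subtract 1. Assumes UPPER_MSB_0 is 0x7FFFFFFF (INT32_MAX).
 */
static int64_t fixed_lower_bound(void)
{
	int32_t upper_msb_0 = 0x7FFFFFFF;

	return (int64_t)(-upper_msb_0 - 1);  /* no overflow: -2147483647 - 1 */
}
```

broken_lower_bound() returns 2147483648 and fixed_lower_bound() returns -2147483648, matching the sign flip in the report.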


[GIT PULL] seccomp updates for v5.5-rc1

by Kees Cook

Hi Linus,

Please pull these seccomp updates for v5.5-rc1. Mostly this is implementing the new flag SECCOMP_USER_NOTIF_FLAG_CONTINUE, but there are cleanups as well. Most notably, the secure_computing() prototype has changed (to remove an unused argument). This happened at the same time as riscv added seccomp support, so the cleanest merge order would be to merge riscv first, then seccomp with the following patch for riscv to handle the change from "seccomp: simplify secure_computing()":

diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index 0f84628b9385..407464201b91 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -159,7 +159,7 @@ __visible void do_syscall_trace_enter(struct pt_regs *regs)
 	 * If this fails we might have return value in a0 from seccomp
 	 * (via SECCOMP_RET_ERRNO/TRACE).
 	 */
-	if (secure_computing(NULL) == -1) {
+	if (secure_computing() == -1) {
 		syscall_set_nr(current, regs, -1);
 		return;
 	}

Thanks!

-Kees

The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce:

  Linux 5.4-rc2 (2019-10-06 14:27:30 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/seccomp-v5.5-rc1

for you to fetch changes up to 23b2c96fad21886c53f5e1a4ffedd45ddd2e85ba:

  seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE (2019-10-28 12:29:46 -0700)

----------------------------------------------------------------
seccomp updates for v5.5

- implement SECCOMP_USER_NOTIF_FLAG_CONTINUE (Christian Brauner)
- fixes to selftests (Christian Brauner)
- remove secure_computing() argument (Christian Brauner)

----------------------------------------------------------------
Christian Brauner (6):
      seccomp: avoid overflow in implicit constant conversion
      seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
      seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
      seccomp: simplify secure_computing()
      seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test
      seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE

 arch/arm/kernel/ptrace.c                      |   2 +-
 arch/arm64/kernel/ptrace.c                    |   2 +-
 arch/parisc/kernel/ptrace.c                   |   2 +-
 arch/s390/kernel/ptrace.c                     |   2 +-
 arch/um/kernel/skas/syscall.c                 |   2 +-
 arch/x86/entry/vsyscall/vsyscall_64.c         |   2 +-
 include/linux/seccomp.h                       |   6 +-
 include/uapi/linux/seccomp.h                  |  29 +++++++
 kernel/seccomp.c                              |  28 +++++--
 tools/testing/selftests/seccomp/seccomp_bpf.c | 110 +++++++++++++++++++++++++-
 10 files changed, 169 insertions(+), 16 deletions(-)

-- 
Kees Cook
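
For readers new to the flag at the centre of this pull: a seccomp user-notification supervisor answers each intercepted syscall with a struct seccomp_notif_resp, and setting SECCOMP_USER_NOTIF_FLAG_CONTINUE in its flags asks the kernel to run the syscall normally instead of returning the supervisor's val/error. A minimal sketch of building such a response follows. It assumes Linux UAPI headers from v5.0 or later for the struct definitions, and the helper name build_continue_resp is hypothetical. The filled-in response would then be sent on the listener fd with the SECCOMP_IOCTL_NOTIF_SEND ioctl. The UAPI header cautions that this flag is not suitable for enforcing a security policy, because the syscall arguments can change after the supervisor has inspected them.

```c
#include <string.h>
#include <linux/seccomp.h>

/* Headers older than v5.5 lack the flag; this matches the v5.5 UAPI value. */
#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
#endif

/*
 * Hypothetical helper: answer notification @req with "let the syscall
 * proceed". error and val must stay 0 when CONTINUE is set; the kernel
 * rejects the response otherwise.
 */
static void build_continue_resp(const struct seccomp_notif *req,
				struct seccomp_notif_resp *resp)
{
	memset(resp, 0, sizeof(*resp));
	resp->id = req->id;  /* must echo the id of the notification */
	resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}
```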


[PATCH v7 00/24] mm/gup: track dma-pinned pages: FOLL_PIN

by John Hubbard

Hi,

OK, here is v7, maybe this is the last one.

The corresponding git repo and branch is:

    git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v7

Ira, you reviewed the gup_benchmark patches a bit earlier, but I removed one or two of those reviewed-by tags, due to invasive changes I made after your review (in response to further reviews). So could you please reply to any patches you'd like to have reviewed-by's restored to, if any? Mainly I'm thinking of "mm/gup_benchmark: support pin_user_pages() and related calls". Also various FOLL_LONGTERM vs pin_longterm*() patches.

The following blurb from the v6 cover letter is still applicable, and I'll repeat it here so it doesn't get lost in the patch blizzard:

Christoph Hellwig has a preference to do things a little differently, for the devmap cleanup in patch 5 ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages"). That came up in a different review thread, because the patch is out for review in two locations. Here's that review thread:

https://lore.kernel.org/r/20191118070826.GB3099@infradead.org

...and I'm hoping that we can defer that request, because otherwise it derails this series, which is starting to otherwise look like it could be ready for 5.5.

Changes since v6:

* Renamed a couple of routines, to get rid of unnecessary leading underscores:

    __pin_compound_head() --> grab_compound_head()
    __record_subpages()   --> record_subpages()

* Fixed the error fallback (put_compound_head()) so as to match the fix in the previous version: need to put back N * GUP_PIN_COUNTING_BIAS pages, for FOLL_PIN cases.
* Factored out yet another common chunk of code, into a new grab_page() routine.
* Added a missing compound_head() call to put_compound_head().
* [Re-]added Jens Axboe's reviewed-by tag to the fs/io_uring patch.
* Added more reviewed-by's from Jan Kara.
Changes since v5:

* Fixed the refcounting for huge pages: in most cases, it was only taking one GUP_PIN_COUNTING_BIAS's worth of refs, when it should have been taking one GUP_PIN_COUNTING_BIAS for each subpage. (Much thanks to Jan Kara for spotting that one!)
* Renamed user_page_ref_inc() to try_pin_page(), and added a new try_pin_compound_head(). This definitely improves readability.
* Factored out some more duplication in the FOLL_PIN and FOLL_GET cases, in gup.c.
* Fixed up some straggling "get_" --> "pin_" references in the comments.
* Added reviewed-by tags.

Changes since v4:

* Renamed put_user_page*() --> unpin_user_page().
* Removed all pin_longterm_pages*() calls. We will use FOLL_LONGTERM at the call sites. (FOLL_PIN, however, remains an internal gup flag.) This is very nice: many patches just change three characters now: get_user_pages --> pin_user_pages. I think we've found the right balance of wrapper calls and gup flags, for the call sites.
* Updated a lot of documentation and commit logs to match the above two large changes.
* Changed gup_benchmark tests and run_vmtests, to adapt to one less use case: there is no pin_longterm_pages() call anymore.
* This includes a new devmap cleanup patch from Dan Williams, along with a rebased follow-up: patches 4 and 5, already mentioned above.
* Fixed patch 10 ("mm/gup: introduce pin_user_pages*() and FOLL_PIN"), so as to make pin_user_pages*() calls act as placeholders for the corresponding get_user_pages*() calls, until a later patch fully implements the DMA-pinning functionality. Thanks to Jan Kara for noticing that.
* Fixed the implementation of pin_user_pages_remote().
* Further tweaked patch 2 ("mm/gup: factor out duplicate code from four routines"), in response to Jan Kara's feedback.
* Dropped a few reviewed-by tags due to changes that invalidated them.

Changes since v3:

* VFIO fix (patch 8): applied further cleanup: removed a pre-existing, unnecessary release and reacquire of mmap_sem. Moved the DAX vma checks from the vfio call site to gup internals, and added comments (and commit log) to clarify.
* Due to the above, made a corresponding fix to pin_longterm_pages_remote(), which was actually calling the wrong gup internal function.
* Changed put_user_page() comments, to refer to pin*() APIs, rather than get_user_pages*() APIs.
* Reverted an accidental whitespace-only change in the IB ODP code.
* Added a few more reviewed-by tags.

Changes since v2:

* Added a patch to convert IB/umem from normal gup, to gup_fast(). This is also posted separately, in order to hopefully get some runtime testing.
* Changed the page devmap code to be a little clearer, thanks to Jerome for that.
* Split out the page devmap changes into a separate patch (and moved Ira's Signed-off-by to that patch).
* Fixed my bug in IB: ODP code does not require pin_user_pages() semantics. Therefore, revert the put_user_page() calls to put_page(), and leave the get_user_pages() call as-is.
* As part of the revert, I am proposing here a change directly from put_user_pages(), to release_pages(). I'd feel better if someone agrees that this is the best way. It uses the more efficient release_pages(), instead of put_page() in a loop, and keeps the change to just a few characters on one line, but OTOH it is not a pure revert.
* Loosened the FOLL_LONGTERM restrictions in the __get_user_pages_locked() implementation, and used that in order to fix up a VFIO bug. Thanks to Jason for that idea.
* Note the use of release_pages() in IB: is that OK?
* Added a few more WARN's and clarifying comments nearby.
* Many documentation improvements in various comments.
* Moved the new pin_user_pages.rst from Documentation/vm/ to Documentation/core-api/ .
* Commit descriptions: added clarifying notes to the three patches (drm/via, fs/io_uring, net/xdp) that already had put_user_page() calls in place.
* Collected all pending Reviewed-by and Acked-by tags, from v1 and v2 email threads.
* A lot of churn from v2 --> v3, so it's possible that new bugs sneaked in.

NOT DONE: separate patchset is required:

* __get_user_pages_locked(): stop compensating for buggy callers who failed to set FOLL_GET. Instead, assert that FOLL_GET is set (and fail if it's not).

======================================================================
Original cover letter (edited to fix up the patch description numbers)

This applies cleanly to linux-next and mmotm, and also to linux.git if linux-next's commit 20cac10710c9 ("mm/gup_benchmark: fix MAP_HUGETLB case") is first applied there.

This provides tracking of dma-pinned pages. This is a prerequisite to solving the larger problem of proper interactions between file-backed pages, and [R]DMA activities, as discussed in [1], [2], [3], and in a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN, is introduced, and thoroughly documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the layout lease work that Ira Weiny has been working on. That's because these new wrapper functions provide a clean, constrained, systematically named set of functionality that, again, is required in order to even know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, opt-in by changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages() (sets FOLL_PIN)
    put_user_page()

Because there are interdependencies with FOLL_LONGTERM, a conversion similar to the one for FOLL_PIN was applied. The change was from this:

    get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
    put_page()

to this:

    pin_longterm_pages() (sets FOLL_PIN | FOLL_LONGTERM)
    put_user_page()

============================================================
Patch summary:

* Patches 1-9: refactoring and preparatory cleanup, independent fixes
* Patch 10: introduce pin_user_pages(), FOLL_PIN, but no functional changes yet
* Patches 11-16: Convert existing put_user_page() callers, to use the new pin*()
* Patch 17: Activate tracking of FOLL_PIN pages.
* Patches 18-20: convert various callers
* Patches 21-23: gup_benchmark and run_vmtests support
* Patch 24: rename put_user_page*() --> unpin_user_page*()

============================================================
Testing:

* I've done some overall kernel testing (LTP, and a few other goodies), and some directed testing to exercise some of the changes. And as you can see, gup_benchmark is enhanced to exercise this. Basically, I've been able to runtime test the core get_user_pages() and pin_user_pages() and related routines, but not so much on several of the call sites--but those are generally just a couple of lines changed, each.

Not much of the kernel is actually using this, which on one hand reduces risk quite a lot. But on the other hand, testing coverage is low. So I'd love it if, in particular, the Infiniband and PowerPC folks could do a smoke test of this series for me.

Also, my runtime testing for the call sites so far is very weak:

* io_uring: Some directed tests from liburing exercise this, and they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and passes.
* infiniband (still only have crude "IB pingpong" working, on a good day: it's not exercising my conversions at runtime...)
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

============================================================
Next:

* Get the block/bio_vec sites converted to use pin_user_pages().
* Work with Ira and Dave Chinner to weave this together with the layout lease stuff.

============================================================

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

Dan Williams (1):
  mm: Cleanup __put_devmap_managed_page() vs ->page_free()

John Hubbard (23):
  mm/gup: pass flags arg to __gup_device_* functions
  mm/gup: factor out duplicate code from four routines
  mm/gup: move try_get_compound_head() to top, fix minor issues
  mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
  goldish_pipe: rename local pin_user_pages() routine
  IB/umem: use get_user_pages_fast() to pin DMA pages
  media/v4l2-core: set pages dirty upon releasing DMA buffers
  vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM
  mm/gup: introduce pin_user_pages*() and FOLL_PIN
  goldish_pipe: convert to pin_user_pages() and put_user_page()
  IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
  mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
  drm/via: set FOLL_PIN via pin_user_pages_fast()
  fs/io_uring: set FOLL_PIN via pin_user_pages()
  net/xdp: set FOLL_PIN via pin_user_pages()
  mm/gup: track FOLL_PIN pages
  media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  powerpc: book3s64: convert to pin_user_pages() and put_user_page()
  mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"
  mm/gup_benchmark: support pin_user_pages() and related calls
  selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage
  mm, tree-wide: rename put_user_page*() to unpin_user_page*()
 Documentation/core-api/index.rst            |   1 +
 Documentation/core-api/pin_user_pages.rst   | 233 +++++++++
 arch/powerpc/mm/book3s64/iommu_api.c        |  12 +-
 drivers/gpu/drm/via/via_dmablit.c           |   6 +-
 drivers/infiniband/core/umem.c              |  19 +-
 drivers/infiniband/core/umem_odp.c          |  13 +-
 drivers/infiniband/hw/hfi1/user_pages.c     |   4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |   8 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |   4 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |   8 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |   4 +-
 drivers/infiniband/sw/siw/siw_mem.c         |   4 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c   |   8 +-
 drivers/nvdimm/pmem.c                       |   6 -
 drivers/platform/goldfish/goldfish_pipe.c   |  35 +-
 drivers/vfio/vfio_iommu_type1.c             |  35 +-
 fs/io_uring.c                               |   6 +-
 include/linux/mm.h                          | 195 ++++++-
 include/linux/mmzone.h                      |   2 +
 include/linux/page_ref.h                    |  10 +
 mm/gup.c                                    | 553 +++++++++++++++-----
 mm/gup_benchmark.c                          |  74 ++-
 mm/huge_memory.c                            |  44 +-
 mm/hugetlb.c                                |  36 +-
 mm/memremap.c                               |  76 ++-
 mm/process_vm_access.c                      |  28 +-
 mm/vmstat.c                                 |   2 +
 net/xdp/xdp_umem.c                          |   4 +-
 tools/testing/selftests/vm/gup_benchmark.c  |  21 +-
 tools/testing/selftests/vm/run_vmtests      |  22 +
 30 files changed, 1121 insertions(+), 352 deletions(-)
 create mode 100644 Documentation/core-api/pin_user_pages.rst

-- 
2.24.0
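
The v5 and v6 changelog entries turn on how FOLL_PIN pins are counted: rather than a separate counter, each pin adds GUP_PIN_COUNTING_BIAS to the page refcount, a compound (huge) page's head must be charged one bias per pinned subpage, and the error fallback must put back exactly the same amount. The sketch below is a simplified userspace model of that bookkeeping, assuming a bias of 1024; struct model_page and the helpers are hypothetical and are not the kernel's struct page or gup API. The "maybe pinned" check mirrors the idea of treating a refcount at or above one bias as a likely pin, which can give false positives.

```c
#include <stdbool.h>

#define MODEL_PIN_BIAS 1024  /* assumed value of GUP_PIN_COUNTING_BIAS */

/* Hypothetical stand-in for a compound page's head and its refcount. */
struct model_page {
	int refcount;
};

/* Pin nr_subpages subpages: one bias each, all charged to the head page. */
static void model_pin_compound(struct model_page *head, int nr_subpages)
{
	head->refcount += nr_subpages * MODEL_PIN_BIAS;
}

/* Unpin or error fallback: put back exactly what the pin took. */
static void model_unpin_compound(struct model_page *head, int nr_subpages)
{
	head->refcount -= nr_subpages * MODEL_PIN_BIAS;
}

/* Heuristic: a refcount of at least one bias suggests a DMA pin, but a page
 * with very many ordinary references would also pass (false positive). */
static bool model_maybe_dma_pinned(const struct model_page *page)
{
	return page->refcount >= MODEL_PIN_BIAS;
}
```

In this model, charging a single bias for a 512-subpage head (the v5 bug) would leave the head 511 biases short once each subpage is unpinned separately.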


[PATCH] tools/testing/selftests/seccomp: change USER_NOTIF_MAGIC definition

by Max Filippov

USER_NOTIF_MAGIC is used both to initialize seccomp_notif_resp::val and to verify the syscall return value. On 32-bit architectures the syscall return value has type long, but the value of USER_NOTIF_MAGIC has type long long because it doesn't fit into long. As a result, all syscall return value comparisons with USER_NOTIF_MAGIC are false. This is also reported by the compiler when '-W' is added to CFLAGS.

Add an explicit type cast to the USER_NOTIF_MAGIC definition.

This fixes the following seccomp_bpf tests on 32-bit architectures:

  global.user_notification_basic
  global.user_notification_child_pid_ns
  global.user_notification_sibling_pid_ns
  global.user_notification_fault_recv

Signed-off-by: Max Filippov <jcmvbkbc(a)gmail.com>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 7f8b5c8982e3..16cc30e2ade4 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -3077,7 +3077,7 @@ static int user_trap_syscall(int nr, unsigned int flags)
 	return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
 }
 
-#define USER_NOTIF_MAGIC 116983961184613L
+#define USER_NOTIF_MAGIC ((unsigned long)116983961184613L)
 TEST(user_notification_basic)
 {
 	pid_t pid;
-- 
2.20.1
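
The failure can be reproduced on any host by modelling the 32-bit types explicitly. The syscall hands back only the low 32 bits of the magic value; comparing that against the unmodified long long constant is always false, while comparing against the constant cast to a 32-bit unsigned type matches. This sketch assumes the value reaches the caller truncated to its low 32 bits. The names are illustrative, and uint32_t stands in for the 32-bit unsigned long used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAGIC 116983961184613LL  /* needs more than 32 bits */

/* Model a 32-bit long syscall return: only the low 32 bits survive.
 * For this magic they are below 2^31, so the conversion to int32_t
 * is well defined. */
static int32_t model_32bit_return(long long val)
{
	return (int32_t)(uint32_t)val;
}

/* Before the fix: ret is widened to long long and compared against the
 * full constant, so it can never match. */
static bool matches_uncast(int32_t ret)
{
	return ret == MODEL_MAGIC;
}

/* After the fix: both sides are reduced to the same 32 bits first. */
static bool matches_cast(int32_t ret)
{
	return (uint32_t)ret == (uint32_t)MODEL_MAGIC;
}
```

Here model_32bit_return(MODEL_MAGIC) yields 1936943461 (0x73736965), which fails matches_uncast() and passes matches_cast().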


[PATCH v24 22/24] selftests/x86: Add vDSO selftest for SGX

by Jarkko Sakkinen

Expand the selftest to also invoke the enclave through __vdso_sgx_enter_enclave(), in addition to direct ENCLU[EENTER].

Cc: linux-sgx(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
---
 tools/testing/selftests/x86/sgx/main.c     | 132 +++++++++++++++++++++
 tools/testing/selftests/x86/sgx/sgx_call.S |  43 +++++++
 tools/testing/selftests/x86/sgx/sgx_call.h |   3 +
 3 files changed, 178 insertions(+)

diff --git a/tools/testing/selftests/x86/sgx/main.c b/tools/testing/selftests/x86/sgx/main.c
index 06c761c83cdf..a94ba894b020 100644
--- a/tools/testing/selftests/x86/sgx/main.c
+++ b/tools/testing/selftests/x86/sgx/main.c
@@ -23,6 +23,109 @@
 #define PAGE_SIZE 4096
 
 static const uint64_t MAGIC = 0x1122334455667788ULL;
+void *eenter;
+
+struct vdso_symtab {
+	Elf64_Sym *elf_symtab;
+	const char *elf_symstrtab;
+	Elf64_Word *elf_hashtab;
+};
+
+static void *vdso_get_base_addr(char *envp[])
+{
+	Elf64_auxv_t *auxv;
+	int i;
+
+	for (i = 0; envp[i]; i++)
+		;
+
+	auxv = (Elf64_auxv_t *)&envp[i + 1];
+
+	for (i = 0; auxv[i].a_type != AT_NULL; i++) {
+		if (auxv[i].a_type == AT_SYSINFO_EHDR)
+			return (void *)auxv[i].a_un.a_val;
+	}
+
+	return NULL;
+}
+
+static Elf64_Dyn *vdso_get_dyntab(void *addr)
+{
+	Elf64_Ehdr *ehdr = addr;
+	Elf64_Phdr *phdrtab = addr + ehdr->e_phoff;
+	int i;
+
+	for (i = 0; i < ehdr->e_phnum; i++)
+		if (phdrtab[i].p_type == PT_DYNAMIC)
+			return addr + phdrtab[i].p_offset;
+
+	return NULL;
+}
+
+static void *vdso_get_dyn(void *addr, Elf64_Dyn *dyntab, Elf64_Sxword tag)
+{
+	int i;
+
+	for (i = 0; dyntab[i].d_tag != DT_NULL; i++)
+		if (dyntab[i].d_tag == tag)
+			return addr + dyntab[i].d_un.d_ptr;
+
+	return NULL;
+}
+
+static bool vdso_get_symtab(void *addr, struct vdso_symtab *symtab)
+{
+	Elf64_Dyn *dyntab = vdso_get_dyntab(addr);
+
+	symtab->elf_symtab = vdso_get_dyn(addr, dyntab, DT_SYMTAB);
+	if (!symtab->elf_symtab)
+		return false;
+
+	symtab->elf_symstrtab = vdso_get_dyn(addr, dyntab, DT_STRTAB);
+	if (!symtab->elf_symstrtab)
+		return false;
+
+	symtab->elf_hashtab = vdso_get_dyn(addr, dyntab, DT_HASH);
+	if (!symtab->elf_hashtab)
+		return false;
+
+	return true;
+}
+
+static unsigned long elf_sym_hash(const char *name)
+{
+	unsigned long h = 0, high;
+
+	while (*name) {
+		h = (h << 4) + *name++;
+		high = h & 0xf0000000;
+
+		if (high)
+			h ^= high >> 24;
+
+		h &= ~high;
+	}
+
+	return h;
+}
+
+static Elf64_Sym *vdso_symtab_get(struct vdso_symtab *symtab, const char *name)
+{
+	Elf64_Word bucketnum = symtab->elf_hashtab[0];
+	Elf64_Word *buckettab = &symtab->elf_hashtab[2];
+	Elf64_Word *chaintab = &symtab->elf_hashtab[2 + bucketnum];
+	Elf64_Sym *sym;
+	Elf64_Word i;
+
+	for (i = buckettab[elf_sym_hash(name) % bucketnum]; i != STN_UNDEF;
+	     i = chaintab[i]) {
+		sym = &symtab->elf_symtab[i];
+		if (!strcmp(name, &symtab->elf_symstrtab[sym->st_name]))
+			return sym;
+	}
+
+	return NULL;
+}
 
 static bool encl_create(int dev_fd, unsigned long bin_size,
 			struct sgx_secs *secs)
@@ -220,10 +323,14 @@ bool load_sigstruct(const char *path, void *sigstruct)
 
 int main(int argc, char *argv[], char *envp[])
 {
+	struct sgx_enclave_exception exception;
 	struct sgx_sigstruct sigstruct;
+	struct vdso_symtab symtab;
+	Elf64_Sym *eenter_sym;
 	struct sgx_secs secs;
 	uint64_t result = 0;
 	off_t bin_size;
+	void *addr;
 	void *bin;
 
 	if (!encl_data_map("encl.bin", &bin, &bin_size))
@@ -245,5 +352,30 @@ int main(int argc, char *argv[], char *envp[])
 
 	printf("Output: 0x%lx\n", result);
 
+	memset(&exception, 0, sizeof(exception));
+
+	addr = vdso_get_base_addr(envp);
+	if (!addr)
+		exit(1);
+
+	if (!vdso_get_symtab(addr, &symtab))
+		exit(1);
+
+	eenter_sym = vdso_symtab_get(&symtab, "__vdso_sgx_enter_enclave");
+	if (!eenter_sym)
+		exit(1);
+	eenter = addr + eenter_sym->st_value;
+
+	printf("Input: 0x%lx\n", MAGIC);
+
+	sgx_call_vdso((void *)&MAGIC, &result, 0, NULL, NULL, NULL,
+		      (void *)secs.base, &exception, NULL);
+	if (result != MAGIC) {
+		fprintf(stderr, "0x%lx != 0x%lx\n", result, MAGIC);
+		exit(1);
+	}
+
+	printf("Output: 0x%lx\n", result);
+
 	exit(0);
 }
diff --git a/tools/testing/selftests/x86/sgx/sgx_call.S b/tools/testing/selftests/x86/sgx/sgx_call.S
index ca4c7893f9d9..e71f44f7a995 100644
--- a/tools/testing/selftests/x86/sgx/sgx_call.S
+++ b/tools/testing/selftests/x86/sgx/sgx_call.S
@@ -21,3 +21,46 @@ sgx_async_exit:
 	ENCLU
 	pop %rbx
 	ret
+
+	.global sgx_call_vdso
+sgx_call_vdso:
+	.cfi_startproc
+	push %r15
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %r15, 0
+	push %r14
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %r14, 0
+	push %r13
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %r13, 0
+	push %r12
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %r12, 0
+	push %rbx
+	.cfi_adjust_cfa_offset 8
+	.cfi_rel_offset %rbx, 0
+	push $0
+	.cfi_adjust_cfa_offset 8
+	push 0x48(%rsp)
+	.cfi_adjust_cfa_offset 8
+	push 0x48(%rsp)
+	.cfi_adjust_cfa_offset 8
+	push 0x48(%rsp)
+	.cfi_adjust_cfa_offset 8
+	mov $2, %eax
+	call *eenter(%rip)
+	add $0x20, %rsp
+	.cfi_adjust_cfa_offset -0x20
+	pop %rbx
+	.cfi_adjust_cfa_offset -8
+	pop %r12
+	.cfi_adjust_cfa_offset -8
+	pop %r13
+	.cfi_adjust_cfa_offset -8
+	pop %r14
+	.cfi_adjust_cfa_offset -8
+	pop %r15
+	.cfi_adjust_cfa_offset -8
+	ret
+	.cfi_endproc
diff --git a/tools/testing/selftests/x86/sgx/sgx_call.h b/tools/testing/selftests/x86/sgx/sgx_call.h
index bf72068ada23..a4072c5ecce7 100644
--- a/tools/testing/selftests/x86/sgx/sgx_call.h
+++ b/tools/testing/selftests/x86/sgx/sgx_call.h
@@ -8,4 +8,7 @@
 
 void sgx_call_eenter(void *rdi, void *rsi, void *entry);
 
+int sgx_call_vdso(void *rdi, void *rsi, long rdx, void *rcx, void *r8, void *r9,
+		  void *tcs, struct sgx_enclave_exception *ei, void *cb);
+
 #endif /* SGX_CALL_H */
-- 
2.20.1
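
vdso_symtab_get() walks the vDSO's DT_HASH table: the first word is the bucket count, the second the chain count, then come the buckets and the chains. The SysV ELF hash of the name selects a bucket, and the chain is followed until index 0 (STN_UNDEF). The standalone sketch below runs the same lookup against a tiny hand-built table so the layout can be checked without SGX hardware or a vDSO. The table contents and helper names are made up for illustration; only the hash algorithm and the DT_HASH layout follow the ELF format.

```c
#include <stdint.h>
#include <string.h>

/* SysV ELF hash, the same algorithm as elf_sym_hash() in the patch. */
static uint32_t sysv_elf_hash(const char *name)
{
	uint32_t h = 0, high;

	while (*name) {
		h = (h << 4) + (unsigned char)*name++;
		high = h & 0xf0000000u;
		if (high)
			h ^= high >> 24;
		h &= ~high;
	}
	return h;
}

/*
 * Look up @name in a DT_HASH-style table:
 *   hashtab = [nbucket, nchain, bucket[0..nbucket-1], chain[0..nchain-1]]
 * names[i] plays the role of the symbol name of symbol index i.
 * Returns the symbol index, or -1 if the chain ends at index 0.
 */
static int hash_lookup(const uint32_t *hashtab, const char *const *names,
		       const char *name)
{
	uint32_t nbucket = hashtab[0];
	const uint32_t *bucket = &hashtab[2];
	const uint32_t *chain = &hashtab[2 + nbucket];

	for (uint32_t i = bucket[sysv_elf_hash(name) % nbucket]; i != 0; i = chain[i])
		if (strcmp(names[i], name) == 0)
			return (int)i;
	return -1;
}
```

With names {"", "foo", "bar"} and the table {1, 3, 2, 0, 0, 1} (one bucket whose chain visits index 2, then 1, then stops), looking up "foo" returns 1, "bar" returns 2, and "baz" returns -1.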


[PATCH v24 14/24] selftests/x86: Add a selftest for SGX

by Jarkko Sakkinen

Add a selftest for SGX. It is a trivial test where a simple enclave copies one 64-bit word of memory between two memory locations given to the enclave as arguments. Use ENCLS[EENTER] to invoke the enclave. Cc: linux-sgx(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> --- tools/testing/selftests/x86/sgx/Makefile | 47 ++ tools/testing/selftests/x86/sgx/defines.h | 39 ++ tools/testing/selftests/x86/sgx/encl.c | 20 + tools/testing/selftests/x86/sgx/encl.lds | 34 ++ .../selftests/x86/sgx/encl_bootstrap.S | 94 ++++ tools/testing/selftests/x86/sgx/main.c | 249 +++++++++ tools/testing/selftests/x86/sgx/sgx_call.S | 23 + tools/testing/selftests/x86/sgx/sgx_call.h | 11 + tools/testing/selftests/x86/sgx/sgxsign.c | 493 ++++++++++++++++++ .../testing/selftests/x86/sgx/signing_key.pem | 39 ++ 10 files changed, 1049 insertions(+) create mode 100644 tools/testing/selftests/x86/sgx/Makefile create mode 100644 tools/testing/selftests/x86/sgx/defines.h create mode 100644 tools/testing/selftests/x86/sgx/encl.c create mode 100644 tools/testing/selftests/x86/sgx/encl.lds create mode 100644 tools/testing/selftests/x86/sgx/encl_bootstrap.S create mode 100644 tools/testing/selftests/x86/sgx/main.c create mode 100644 tools/testing/selftests/x86/sgx/sgx_call.S create mode 100644 tools/testing/selftests/x86/sgx/sgx_call.h create mode 100644 tools/testing/selftests/x86/sgx/sgxsign.c create mode 100644 tools/testing/selftests/x86/sgx/signing_key.pem diff --git a/tools/testing/selftests/x86/sgx/Makefile b/tools/testing/selftests/x86/sgx/Makefile new file mode 100644 index 000000000000..a09ef5f965dc --- /dev/null +++ b/tools/testing/selftests/x86/sgx/Makefile @@ -0,0 +1,47 @@ +top_srcdir = ../../../../.. 
+ +include ../../lib.mk + +ifndef OBJCOPY +OBJCOPY := $(CROSS_COMPILE)objcopy +endif + +HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack +ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \ + -fno-stack-protector -mrdrnd $(INCLUDES) + +TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss + +all: $(TEST_CUSTOM_PROGS) + +$(OUTPUT)/test_sgx: $(OUTPUT)/main.o $(OUTPUT)/sgx_call.o + $(CC) $(HOST_CFLAGS) -o $@ $^ + +$(OUTPUT)/main.o: main.c + $(CC) $(HOST_CFLAGS) -c $< -o $@ + +$(OUTPUT)/sgx_call.o: sgx_call.S + $(CC) $(HOST_CFLAGS) -c $< -o $@ + +$(OUTPUT)/encl.bin: $(OUTPUT)/encl.elf $(OUTPUT)/sgxsign + $(OBJCOPY) -O binary $< $@ + +$(OUTPUT)/encl.elf: encl.lds encl.c encl_bootstrap.S + $(CC) $(ENCL_CFLAGS) -T $^ -o $@ + +$(OUTPUT)/encl.ss: $(OUTPUT)/encl.bin + $(OUTPUT)/sgxsign signing_key.pem $(OUTPUT)/encl.bin $(OUTPUT)/encl.ss + +$(OUTPUT)/sgxsign: sgxsign.c + $(CC) -o $@ $< -lcrypto + +EXTRA_CLEAN := \ + $(OUTPUT)/encl.bin \ + $(OUTPUT)/encl.elf \ + $(OUTPUT)/encl.ss \ + $(OUTPUT)/sgx_call.o \ + $(OUTPUT)/sgxsign \ + $(OUTPUT)/test_sgx \ + $(OUTPUT)/test_sgx.o \ + +.PHONY: clean diff --git a/tools/testing/selftests/x86/sgx/defines.h b/tools/testing/selftests/x86/sgx/defines.h new file mode 100644 index 000000000000..1e67f2f29f42 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/defines.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright(c) 2016-19 Intel Corporation. + */ + +#ifndef DEFINES_H +#define DEFINES_H + +#include <stdint.h> + +typedef uint8_t u8; +typedef uint16_t u16; +typedef uint32_t u32; +typedef uint64_t u64; + +#define __aligned(x) __attribute__((__aligned__(x))) +#define __packed __attribute__((packed)) + +/* Derived from asm-generic/bitsperlong.h. */ +#if __x86_64__ +#define BITS_PER_LONG 64 +#else +#define BITS_PER_LONG 32 +#endif +#define BITS_PER_LONG_LONG 64 + +/* Taken from linux/bits.h. 
*/ +#define BIT(nr) (1UL << (nr)) +#define BIT_ULL(nr) (1ULL << (nr)) +#define GENMASK(h, l) \ + (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h)))) +#define GENMASK_ULL(h, l) \ + (((~0ULL) - (1ULL << (l)) + 1) & \ + (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h)))) + +#include "../../../../../arch/x86/kernel/cpu/sgx/arch.h" +#include "../../../../../arch/x86/include/uapi/asm/sgx.h" + +#endif /* DEFINES_H */ diff --git a/tools/testing/selftests/x86/sgx/encl.c b/tools/testing/selftests/x86/sgx/encl.c new file mode 100644 index 000000000000..ede915399742 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/encl.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-18 Intel Corporation. + +#include <stddef.h> +#include "defines.h" + +static void *memcpy(void *dest, const void *src, size_t n) +{ + size_t i; + + for (i = 0; i < n; i++) + ((char *)dest)[i] = ((char *)src)[i]; + + return dest; +} + +void encl_body(void *rdi, void *rsi) +{ + memcpy(rsi, rdi, 8); +} diff --git a/tools/testing/selftests/x86/sgx/encl.lds b/tools/testing/selftests/x86/sgx/encl.lds new file mode 100644 index 000000000000..9a56d3064104 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/encl.lds @@ -0,0 +1,34 @@ +OUTPUT_FORMAT(elf64-x86-64) + +SECTIONS +{ + . = 0; + .tcs : { + *(.tcs*) + } + + . = ALIGN(4096); + .text : { + *(.text*) + *(.rodata*) + } + + . 
= ALIGN(4096); + .data : { + *(.data*) + } + + /DISCARD/ : { + *(.data*) + *(.comment*) + *(.note*) + *(.debug*) + *(.eh_frame*) + } +} + +ASSERT(!DEFINED(.altinstructions), "ALTERNATIVES are not supported in enclaves") +ASSERT(!DEFINED(.altinstr_replacement), "ALTERNATIVES are not supported in enclaves") +ASSERT(!DEFINED(.discard.retpoline_safe), "RETPOLINE ALTERNATIVES are not supported in enclaves") +ASSERT(!DEFINED(.discard.nospec), "RETPOLINE ALTERNATIVES are not supported in enclaves") +ASSERT(!DEFINED(.got.plt), "Libcalls are not supported in enclaves") diff --git a/tools/testing/selftests/x86/sgx/encl_bootstrap.S b/tools/testing/selftests/x86/sgx/encl_bootstrap.S new file mode 100644 index 000000000000..d07f970ccdf9 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/encl_bootstrap.S @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +/* + * Copyright(c) 2016-18 Intel Corporation. + */ + + .macro ENCLU + .byte 0x0f, 0x01, 0xd7 + .endm + + .section ".tcs", "a" + .balign 4096 + + .fill 1, 8, 0 # STATE (set by CPU) + .fill 1, 8, 0 # FLAGS + .quad encl_ssa # OSSA + .fill 1, 4, 0 # CSSA (set by CPU) + .fill 1, 4, 1 # NSSA + .quad encl_entry # OENTRY + .fill 1, 8, 0 # AEP (set by EENTER and ERESUME) + .fill 1, 8, 0 # OFSBASE + .fill 1, 8, 0 # OGSBASE + .fill 1, 4, 0xFFFFFFFF # FSLIMIT + .fill 1, 4, 0xFFFFFFFF # GSLIMIT + .fill 4024, 1, 0 # Reserved + + .text + +encl_entry: + # RBX contains the base address for TCS, which is also the first address + # inside the enclave. By adding the offset of encl_stack to it, we get + # the absolute address for the stack. + lea (encl_stack)(%rbx), %rax + xchg %rsp, %rax + push %rax + + push %rcx # push the address after EENTER + push %rbx # push the enclave base address + + call encl_body + + pop %rbx # pop the enclave base address + + # Restore XSAVE registers to a synthetic state. + mov $0xFFFFFFFF, %rax + mov $0xFFFFFFFF, %rdx + lea (xsave_area)(%rbx), %rdi + fxrstor (%rdi) + + # Clear GPRs.
+ xor %rcx, %rcx + xor %rdx, %rdx + xor %rdi, %rdi + xor %rsi, %rsi + xor %r8, %r8 + xor %r9, %r9 + xor %r10, %r10 + xor %r11, %r11 + xor %r12, %r12 + xor %r13, %r13 + xor %r14, %r14 + xor %r15, %r15 + + # Reset status flags. + add %rdx, %rdx # OF = SF = AF = CF = 0; ZF = PF = 1 + + # Prepare EEXIT target by popping the address of the instruction after + # EENTER to RBX. + pop %rbx + + # Restore the caller stack. + pop %rax + mov %rax, %rsp + + # EEXIT + mov $4, %rax + enclu + + .section ".data", "aw" + +encl_ssa: + .space 4096 + +xsave_area: + .fill 1, 4, 0x037F # FCW + .fill 5, 4, 0 + .fill 1, 4, 0x1F80 # MXCSR + .fill 1, 4, 0xFFFF # MXCSR_MASK + .fill 123, 4, 0 + .fill 1, 4, 0x80000000 # XCOMP_BV[63] = 1, compaction mode + .fill 12, 4, 0 + + .balign 4096 + .space 8192 +encl_stack: diff --git a/tools/testing/selftests/x86/sgx/main.c b/tools/testing/selftests/x86/sgx/main.c new file mode 100644 index 000000000000..06c761c83cdf --- /dev/null +++ b/tools/testing/selftests/x86/sgx/main.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-18 Intel Corporation. 
+ +#include <elf.h> +#include <errno.h> +#include <fcntl.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdint.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <sys/stat.h> +#include <sys/time.h> +#include <sys/types.h> +#include "defines.h" +#include "../../../../../arch/x86/kernel/cpu/sgx/arch.h" +#include "../../../../../arch/x86/include/uapi/asm/sgx.h" +#include "sgx_call.h" + +#define PAGE_SIZE 4096 + +static const uint64_t MAGIC = 0x1122334455667788ULL; + +static bool encl_create(int dev_fd, unsigned long bin_size, + struct sgx_secs *secs) +{ + struct sgx_enclave_create ioc; + void *area; + int rc; + + memset(secs, 0, sizeof(*secs)); + secs->ssa_frame_size = 1; + secs->attributes = SGX_ATTR_MODE64BIT; + secs->xfrm = 3; + + for (secs->size = 4096; secs->size < bin_size; ) + secs->size <<= 1; + + area = mmap(NULL, secs->size * 2, PROT_NONE, MAP_SHARED, dev_fd, 0); + if (area == MAP_FAILED) { + perror("mmap"); + return false; + } + + secs->base = ((uint64_t)area + secs->size - 1) & ~(secs->size - 1); + + munmap(area, secs->base - (uint64_t)area); + munmap((void *)(secs->base + secs->size), + (uint64_t)area + secs->size - secs->base); + + ioc.src = (unsigned long)secs; + rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_CREATE, &ioc); + if (rc) { + fprintf(stderr, "ECREATE failed rc=%d, err=%d.\n", rc, errno); + munmap((void *)secs->base, secs->size); + return false; + } + + return true; +} + +static bool encl_add_pages(int dev_fd, unsigned long offset, void *data, + unsigned long length, uint64_t flags) +{ + struct sgx_enclave_add_pages ioc; + struct sgx_secinfo secinfo; + int rc; + + memset(&secinfo, 0, sizeof(secinfo)); + secinfo.flags = flags; + + ioc.src = (uint64_t)data; + ioc.offset = offset; + ioc.length = length; + ioc.secinfo = (unsigned long)&secinfo; + ioc.flags = SGX_PAGE_MEASURE; + + rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ioc); + if (rc) { + fprintf(stderr, "EADD failed 
rc=%d.\n", rc); + return false; + } + + if (ioc.count != ioc.length) { + fprintf(stderr, "Partially processed, update the test.\n"); + return false; + } + + return true; +} + +#define SGX_REG_PAGE_FLAGS \ + (SGX_SECINFO_REG | SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X) + +static bool encl_build(struct sgx_secs *secs, void *bin, + unsigned long bin_size, struct sgx_sigstruct *sigstruct) +{ + struct sgx_enclave_init ioc; + void *addr; + int dev_fd; + int rc; + + dev_fd = open("/dev/sgx/enclave", O_RDWR); + if (dev_fd < 0) { + fprintf(stderr, "Unable to open /dev/sgx/enclave\n"); + return false; + } + + if (!encl_create(dev_fd, bin_size, secs)) + goto out_dev_fd; + + if (!encl_add_pages(dev_fd, 0, bin, PAGE_SIZE, SGX_SECINFO_TCS)) + goto out_map; + + if (!encl_add_pages(dev_fd, PAGE_SIZE, bin + PAGE_SIZE, + bin_size - PAGE_SIZE, SGX_REG_PAGE_FLAGS)) + goto out_map; + + ioc.sigstruct = (uint64_t)sigstruct; + rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_INIT, &ioc); + if (rc) { + fprintf(stderr, "EINIT failed rc=%d\n", rc); + goto out_map; + } + + addr = mmap((void *)secs->base, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed on TCS, errno=%d.\n", errno); + goto out_map; + } + + addr = mmap((void *)(secs->base + PAGE_SIZE), bin_size - PAGE_SIZE, + PROT_READ | PROT_WRITE | PROT_EXEC, + MAP_SHARED | MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed, errno=%d.\n", errno); + goto out_map; + } + + close(dev_fd); + return true; +out_map: + munmap((void *)secs->base, secs->size); +out_dev_fd: + close(dev_fd); + return false; +} + +bool get_file_size(const char *path, off_t *bin_size) +{ + struct stat sb; + int ret; + + ret = stat(path, &sb); + if (ret) { + perror("stat"); + return false; + } + + if (!sb.st_size || sb.st_size & 0xfff) { + fprintf(stderr, "Invalid blob size %lu\n", sb.st_size); + return false; + } + + *bin_size = sb.st_size; + return true; +} + +bool
encl_data_map(const char *path, void **bin, off_t *bin_size) +{ + int fd; + + fd = open(path, O_RDONLY); + if (fd == -1) { + fprintf(stderr, "open() %s failed, errno=%d.\n", path, errno); + return false; + } + + if (!get_file_size(path, bin_size)) + goto err_out; + + *bin = mmap(NULL, *bin_size, PROT_READ, MAP_PRIVATE, fd, 0); + if (*bin == MAP_FAILED) { + fprintf(stderr, "mmap() %s failed, errno=%d.\n", path, errno); + goto err_out; + } + + close(fd); + return true; + +err_out: + close(fd); + return false; +} + +bool load_sigstruct(const char *path, void *sigstruct) +{ + int fd; + + fd = open(path, O_RDONLY); + if (fd == -1) { + fprintf(stderr, "open() %s failed, errno=%d.\n", path, errno); + return false; + } + + if (read(fd, sigstruct, sizeof(struct sgx_sigstruct)) != + sizeof(struct sgx_sigstruct)) { + fprintf(stderr, "read() %s failed, errno=%d.\n", path, errno); + close(fd); + return false; + } + + close(fd); + return true; +} + +int main(int argc, char *argv[], char *envp[]) +{ + struct sgx_sigstruct sigstruct; + struct sgx_secs secs; + uint64_t result = 0; + off_t bin_size; + void *bin; + + if (!encl_data_map("encl.bin", &bin, &bin_size)) + exit(1); + + if (!load_sigstruct("encl.ss", &sigstruct)) + exit(1); + + if (!encl_build(&secs, bin, bin_size, &sigstruct)) + exit(1); + + printf("Input: 0x%lx\n", MAGIC); + + sgx_call_eenter((void *)&MAGIC, &result, (void *)secs.base); + if (result != MAGIC) { + fprintf(stderr, "0x%lx != 0x%lx\n", result, MAGIC); + exit(1); + } + + printf("Output: 0x%lx\n", result); + + exit(0); +} diff --git a/tools/testing/selftests/x86/sgx/sgx_call.S b/tools/testing/selftests/x86/sgx/sgx_call.S new file mode 100644 index 000000000000..ca4c7893f9d9 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/sgx_call.S @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +/** +* Copyright(c) 2016-18 Intel Corporation. 
+*/ + + .text + + .macro ENCLU + .byte 0x0f, 0x01, 0xd7 + .endm + + .text + + .global sgx_call_eenter +sgx_call_eenter: + push %rbx + mov $0x02, %rax + mov %rdx, %rbx + lea sgx_async_exit(%rip), %rcx +sgx_async_exit: + ENCLU + pop %rbx + ret diff --git a/tools/testing/selftests/x86/sgx/sgx_call.h b/tools/testing/selftests/x86/sgx/sgx_call.h new file mode 100644 index 000000000000..bf72068ada23 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/sgx_call.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright(c) 2016-19 Intel Corporation. + */ + +#ifndef SGX_CALL_H +#define SGX_CALL_H + +void sgx_call_eenter(void *rdi, void *rsi, void *entry); + +#endif /* SGX_CALL_H */ diff --git a/tools/testing/selftests/x86/sgx/sgxsign.c b/tools/testing/selftests/x86/sgx/sgxsign.c new file mode 100644 index 000000000000..3d9007af40c9 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/sgxsign.c @@ -0,0 +1,493 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-18 Intel Corporation. 
+ +#define _GNU_SOURCE +#include <getopt.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <unistd.h> +#include <openssl/err.h> +#include <openssl/pem.h> +#include "defines.h" + +struct sgx_sigstruct_payload { + struct sgx_sigstruct_header header; + struct sgx_sigstruct_body body; +}; + +static bool check_crypto_errors(void) +{ + int err; + bool had_errors = false; + const char *filename; + int line; + char str[256]; + + for ( ; ; ) { + if (ERR_peek_error() == 0) + break; + + had_errors = true; + err = ERR_get_error_line(&filename, &line); + ERR_error_string_n(err, str, sizeof(str)); + fprintf(stderr, "crypto: %s: %s:%d\n", str, filename, line); + } + + return had_errors; +} + +static void exit_usage(const char *program) +{ + fprintf(stderr, + "Usage: %s <key> <enclave> <sigstruct>\n", program); + exit(1); +} + +static inline const BIGNUM *get_modulus(RSA *key) +{ +#if OPENSSL_VERSION_NUMBER < 0x10100000L + return key->n; +#else + const BIGNUM *n; + + RSA_get0_key(key, &n, NULL, NULL); + return n; +#endif +} + +static RSA *load_sign_key(const char *path) +{ + FILE *f; + RSA *key; + + f = fopen(path, "rb"); + if (!f) { + fprintf(stderr, "Unable to open %s\n", path); + return NULL; + } + key = PEM_read_RSAPrivateKey(f, NULL, NULL, NULL); + fclose(f); + if (!key) + return NULL; + + if (BN_num_bytes(get_modulus(key)) != SGX_MODULUS_SIZE) { + fprintf(stderr, "Invalid key size %d\n", + BN_num_bytes(get_modulus(key))); + RSA_free(key); + return NULL; + } + + return key; +} + +static void reverse_bytes(void *data, int length) +{ + int i = 0; + int j = length - 1; + uint8_t temp; + uint8_t *ptr = data; + + while (i < j) { + temp = ptr[i]; + ptr[i] = ptr[j]; + ptr[j] = temp; + i++; + j--; + } +} + +enum mrtags { + MRECREATE = 0x0045544145524345, + MREADD = 0x0000000044444145, + MREEXTEND = 0x00444E4554584545, +}; + +static bool
mrenclave_update(EVP_MD_CTX *ctx, const void *data) +{ + if (!EVP_DigestUpdate(ctx, data, 64)) { + fprintf(stderr, "digest update failed\n"); + return false; + } + + return true; +} + +static bool mrenclave_commit(EVP_MD_CTX *ctx, uint8_t *mrenclave) +{ + unsigned int size; + + if (!EVP_DigestFinal_ex(ctx, (unsigned char *)mrenclave, &size)) { + fprintf(stderr, "digest commit failed\n"); + return false; + } + + if (size != 32) { + fprintf(stderr, "invalid digest size = %u\n", size); + return false; + } + + return true; +} + +struct mrecreate { + uint64_t tag; + uint32_t ssaframesize; + uint64_t size; + uint8_t reserved[44]; +} __attribute__((__packed__)); + + +static bool mrenclave_ecreate(EVP_MD_CTX *ctx, uint64_t blob_size) +{ + struct mrecreate mrecreate; + uint64_t encl_size; + + for (encl_size = 0x1000; encl_size < blob_size; ) + encl_size <<= 1; + + memset(&mrecreate, 0, sizeof(mrecreate)); + mrecreate.tag = MRECREATE; + mrecreate.ssaframesize = 1; + mrecreate.size = encl_size; + + if (!EVP_DigestInit_ex(ctx, EVP_sha256(), NULL)) + return false; + + return mrenclave_update(ctx, &mrecreate); +} + +struct mreadd { + uint64_t tag; + uint64_t offset; + uint64_t flags; /* SECINFO flags */ + uint8_t reserved[40]; +} __attribute__((__packed__)); + +static bool mrenclave_eadd(EVP_MD_CTX *ctx, uint64_t offset, uint64_t flags) +{ + struct mreadd mreadd; + + memset(&mreadd, 0, sizeof(mreadd)); + mreadd.tag = MREADD; + mreadd.offset = offset; + mreadd.flags = flags; + + return mrenclave_update(ctx, &mreadd); +} + +struct mreextend { + uint64_t tag; + uint64_t offset; + uint8_t reserved[48]; +} __attribute__((__packed__)); + +static bool mrenclave_eextend(EVP_MD_CTX *ctx, uint64_t offset, uint8_t *data) +{ + struct mreextend mreextend; + int i; + + for (i = 0; i < 0x1000; i += 0x100) { + memset(&mreextend, 0, sizeof(mreextend)); + mreextend.tag = MREEXTEND; + mreextend.offset = offset + i; + + if (!mrenclave_update(ctx, &mreextend)) + return false; + + if 
(!mrenclave_update(ctx, &data[i + 0x00])) + return false; + + if (!mrenclave_update(ctx, &data[i + 0x40])) + return false; + + if (!mrenclave_update(ctx, &data[i + 0x80])) + return false; + + if (!mrenclave_update(ctx, &data[i + 0xC0])) + return false; + } + + return true; +} + +/** + * measure_encl - measure enclave + * @path: path to the enclave + * @mrenclave: measurement + * + * Calculates MRENCLAVE. Assumes that the very first page is a TCS page and + * following pages are regular pages. The contents of every page are also + * measured (EEXTEND over each 256-byte chunk), matching the test's use of + * SGX_PAGE_MEASURE when adding pages. + */ +static bool measure_encl(const char *path, uint8_t *mrenclave) +{ + FILE *file; + struct stat sb; + EVP_MD_CTX *ctx; + uint64_t flags; + uint64_t offset; + uint8_t data[0x1000]; + int rc; + + ctx = EVP_MD_CTX_create(); + if (!ctx) + return false; + + file = fopen(path, "rb"); + if (!file) { + perror("fopen"); + EVP_MD_CTX_destroy(ctx); + return false; + } + + rc = stat(path, &sb); + if (rc) { + perror("stat"); + goto out; + } + + if (!sb.st_size || sb.st_size & 0xfff) { + fprintf(stderr, "Invalid blob size %lu\n", sb.st_size); + goto out; + } + + if (!mrenclave_ecreate(ctx, sb.st_size)) + goto out; + + for (offset = 0; offset < sb.st_size; offset += 0x1000) { + if (!offset) + flags = SGX_SECINFO_TCS; + else + flags = SGX_SECINFO_REG | SGX_SECINFO_R | + SGX_SECINFO_W | SGX_SECINFO_X; + + if (!mrenclave_eadd(ctx, offset, flags)) + goto out; + + rc = fread(data, 1, 0x1000, file); + if (!rc) + break; + if (rc < 0x1000) + goto out; + + if (!mrenclave_eextend(ctx, offset, data)) + goto out; + } + + if (!mrenclave_commit(ctx, mrenclave)) + goto out; + + fclose(file); + EVP_MD_CTX_destroy(ctx); + return true; +out: + fclose(file); + EVP_MD_CTX_destroy(ctx); + return false; +} + +/** + * sign_encl - sign enclave + * @sigstruct: pointer to SIGSTRUCT + * @key: 3072-bit RSA key + * @signature: byte array for the
signature + * + * Calculates EMSA-PKCSv1.5 signature for the given SIGSTRUCT. The result is + * stored in big-endian format so that it can be further passed to OpenSSL + * libcrypto functions. + */ +static bool sign_encl(const struct sgx_sigstruct *sigstruct, RSA *key, + uint8_t *signature) +{ + struct sgx_sigstruct_payload payload; + unsigned int siglen; + uint8_t digest[SHA256_DIGEST_LENGTH]; + bool ret; + + memcpy(&payload.header, &sigstruct->header, sizeof(sigstruct->header)); + memcpy(&payload.body, &sigstruct->body, sizeof(sigstruct->body)); + + SHA256((unsigned char *)&payload, sizeof(payload), digest); + + ret = RSA_sign(NID_sha256, digest, SHA256_DIGEST_LENGTH, signature, + &siglen, key); + + return ret; +} + +struct q1q2_ctx { + BN_CTX *bn_ctx; + BIGNUM *m; + BIGNUM *s; + BIGNUM *q1; + BIGNUM *qr; + BIGNUM *q2; +}; + +static void free_q1q2_ctx(struct q1q2_ctx *ctx) +{ + BN_CTX_free(ctx->bn_ctx); + BN_free(ctx->m); + BN_free(ctx->s); + BN_free(ctx->q1); + BN_free(ctx->qr); + BN_free(ctx->q2); +} + +static bool alloc_q1q2_ctx(const uint8_t *s, const uint8_t *m, + struct q1q2_ctx *ctx) +{ + ctx->bn_ctx = BN_CTX_new(); + ctx->s = BN_bin2bn(s, SGX_MODULUS_SIZE, NULL); + ctx->m = BN_bin2bn(m, SGX_MODULUS_SIZE, NULL); + ctx->q1 = BN_new(); + ctx->qr = BN_new(); + ctx->q2 = BN_new(); + + if (!ctx->bn_ctx || !ctx->s || !ctx->m || !ctx->q1 || !ctx->qr || + !ctx->q2) { + free_q1q2_ctx(ctx); + return false; + } + + return true; +} + +static bool calc_q1q2(const uint8_t *s, const uint8_t *m, uint8_t *q1, + uint8_t *q2) +{ + struct q1q2_ctx ctx; + + if (!alloc_q1q2_ctx(s, m, &ctx)) { + fprintf(stderr, "Not enough memory for Q1Q2 calculation\n"); + return false; + } + + if (!BN_mul(ctx.q1, ctx.s, ctx.s, ctx.bn_ctx)) + goto out; + + if (!BN_div(ctx.q1, ctx.qr, ctx.q1, ctx.m, ctx.bn_ctx)) + goto out; + + if (BN_num_bytes(ctx.q1) > SGX_MODULUS_SIZE) { + fprintf(stderr, "Too large Q1 %d bytes\n", + BN_num_bytes(ctx.q1)); + goto out; + } + + if (!BN_mul(ctx.q2, ctx.s, 
ctx.qr, ctx.bn_ctx)) + goto out; + + if (!BN_div(ctx.q2, NULL, ctx.q2, ctx.m, ctx.bn_ctx)) + goto out; + + if (BN_num_bytes(ctx.q2) > SGX_MODULUS_SIZE) { + fprintf(stderr, "Too large Q2 %d bytes\n", + BN_num_bytes(ctx.q2)); + goto out; + } + + BN_bn2bin(ctx.q1, q1); + BN_bn2bin(ctx.q2, q2); + + free_q1q2_ctx(&ctx); + return true; +out: + free_q1q2_ctx(&ctx); + return false; +} + +static bool save_sigstruct(const struct sgx_sigstruct *sigstruct, + const char *path) +{ + FILE *f = fopen(path, "wb"); + + if (!f) { + fprintf(stderr, "Unable to open %s\n", path); + return false; + } + + fwrite(sigstruct, sizeof(*sigstruct), 1, f); + fclose(f); + return true; +} + +int main(int argc, char **argv) +{ + uint64_t header1[2] = {0x000000E100000006, 0x0000000000010000}; + uint64_t header2[2] = {0x0000006000000101, 0x0000000100000060}; + struct sgx_sigstruct ss; + const char *program; + int opt; + RSA *sign_key; + + memset(&ss, 0, sizeof(ss)); + ss.header.header1[0] = header1[0]; + ss.header.header1[1] = header1[1]; + ss.header.header2[0] = header2[0]; + ss.header.header2[1] = header2[1]; + ss.exponent = 3; + +#ifndef CONFIG_EINITTOKENKEY + ss.body.attributes = SGX_ATTR_MODE64BIT; +#else + ss.body.attributes = SGX_ATTR_MODE64BIT | SGX_ATTR_EINITTOKENKEY; +#endif + ss.body.xfrm = 3, + + program = argv[0]; + + do { + opt = getopt(argc, argv, ""); + switch (opt) { + case -1: + break; + default: + exit_usage(program); + } + } while (opt != -1); + + argc -= optind; + argv += optind; + + if (argc < 3) + exit_usage(program); + + /* sanity check only */ + if (check_crypto_errors()) + exit(1); + + sign_key = load_sign_key(argv[0]); + if (!sign_key) + goto out; + + BN_bn2bin(get_modulus(sign_key), ss.modulus); + + if (!measure_encl(argv[1], ss.body.mrenclave)) + goto out; + + if (!sign_encl(&ss, sign_key, ss.signature)) + goto out; + + if (!calc_q1q2(ss.signature, ss.modulus, ss.q1, ss.q2)) + goto out; + + /* convert to little endian */ + reverse_bytes(ss.signature, SGX_MODULUS_SIZE); + 
reverse_bytes(ss.modulus, SGX_MODULUS_SIZE); + reverse_bytes(ss.q1, SGX_MODULUS_SIZE); + reverse_bytes(ss.q2, SGX_MODULUS_SIZE); + + if (!save_sigstruct(&ss, argv[2])) + goto out; + exit(0); +out: + check_crypto_errors(); + exit(1); +} diff --git a/tools/testing/selftests/x86/sgx/signing_key.pem b/tools/testing/selftests/x86/sgx/signing_key.pem new file mode 100644 index 000000000000..d76f21f19187 --- /dev/null +++ b/tools/testing/selftests/x86/sgx/signing_key.pem @@ -0,0 +1,39 @@ +-----BEGIN RSA PRIVATE KEY----- +MIIG4wIBAAKCAYEApalGbq7Q+usM91CPtksu3D+b0Prc8gAFL6grM3mg85A5Bx8V +cfMXPgtrw8EYFwQxDAvzZWwl+9VfOX0ECrFRBkOHcOiG0SnADN8+FLj1UiNUQwbp +S6OzhNWuRcSbGraSOyUlVlV0yMQSvewyzGklOaXBe30AJqzIBc8QfdSxKuP8rs0Z +ga6k/Bl73osrYKByILJTUUeZqjLERsE6GebsdzbWgKn8qVqng4ZS4yMNg6LeRlH3 ++9CIPgg4jwpSLHcp7dq2qTIB9a0tGe9ayp+5FbucpB6U7ePold0EeRN6RlJGDF9k +L93v8P5ykz5G5gYZ2g0K1X2sHIWV4huxPgv5PXgdyQYbK+6olqj0d5rjYuwX57Ul +k6SroPS1U6UbdCjG5txM+BNGU0VpD0ZhrIRw0leQdnNcCO9sTJuInZrgYacSVJ7u +mtB+uCt+uzUesc+l+xPRYA+9e14lLkZp7AAmo9FvL816XDI09deehJ3i/LmHKCRN +tuqC5TprRjFwUr6dAgEDAoIBgG5w2Z8fNfycs0+LCnmHdJLVEotR6KFVWMpwHMz7 +wKJgJgS/Y6FMuilc8oKAuroCy11dTO5IGVKOP3uorVx2NgQtBPXwWeDGgAiU1A3Q +o4wXjYIEm4fCd63jyYPYZ2ckYXzDbjmOTdstYdPyzIhGGNEZK6eoqsRzMAPfYFPj +IMdCqHSIu6vJw1K7p+myHOsVoWshjODaZnF3LYSA0WaZ8vokjwBxUxuRxQJZjJds +s60XPtmL+qfgWtQFewoG4XL6GuD8FcXccynRRtzrLtFNPIl9BQfWfjBBhTC1/Te1 +0Z6XbZvpdUTD9OfLB7SbR2OUFNpKQgriO0iYVdbW3cr7uu38Zwp4W1TX73DPjoi6 +KNooP6SGWd4mRJW2+dUmSYS4QNG8eVVZswKcploEIXlAKRsOe4kzJJ1iETugIe85 +uX8nd1WYEp65xwoRUg8hqng0MeyveVbXqNKuJG6tzNDt9kgFYo+hmC/oouAW2Dtc +T9jdRAwKJXqA2Eg6OkgXCEv+kwKBwQDYaQiFMlFhsmLlqI+EzCUh7c941/cL7m6U +7j98+8ngl0HgCEcrc10iJVCKakQW3YbPzAx3XkKTaGjWazvvrFarXIGlOud64B8a +iWyQ7VdlnmZnNEdk+C83tI91OQeaTKqRLDGzKh29Ry/jL8Pcbazt+kDgxa0H7qJp +roADUanLQuNkYubpbhFBh3xpa2EExaVq6rF7nIVsD8W9TrbmPKA4LgH7z0iy544D +kVCNYsTjYDdUWP+WiSor8kCnnpjnN9sCgcEAw/eNezUD1UDf6OYFC9+5JZJFn4Tg +mZMyN93JKIb199ffwnjtHUSjcyiWeesXucpzwtGbTcwQnDisSW4oneYKLSEBlBaq 
+scqiUugyGZZOthFSCbdXYXMViK2vHrKlkse7GxVlROKcEhM/pRBrmjaGO8eWR+D4 +FO2wCXzVs3KgV6j779frw0vC54oHOxc9+Lu1rSHp4i+600koyvL/zF6U/5tZXIvN +YW2yoiQJnjCmVA1pwbwV6KAUTPDTMnBK+YjnAoHBAJBGBa4hi5Z27JkbCliIGMFJ +NPs6pLKe9GNJf6in2+sPgUAFhMeiPhbDiwbxgrnpBIqICE+ULGJFmzmc0p/IOceT +ARjR76dAFLxbnbXzj5kURETNhO36yiUjCk4mBRGIcbYddndxaSjaH+zKgpLzyJ6m +1esuc1qfFvEfAAI2cTIsl5hB70ZJYNZaUvDyQK3ZGPHxy6e9rkgKg9OJz0QoatAe +q/002yHvtAJg4F5B2JeVejg7VQ8GHB1MKxppu0TP5wKBwQCCpQj8zgKOKz/wmViy +lSYZDC5qWJW7t3bP6TDFr06lOpUsUJ4TgxeiGw778g/RMaKB4RIz3WBoJcgw9BsT +7rFza1ZiucchMcGMmswRDt8kC4wGejpA92Owc8oUdxkMhSdnY5jYlxK2t3/DYEe8 +JFl9L7mFQKVjSSAGUzkiTGrlG1Kf5UfXh9dFBq98uilQfSPIwUaWynyM23CHTKqI +Pw3/vOY9sojrnncWwrEUIG7is5vWfWPwargzSzd29YdRBe8CgcEAuRVewK/YeNOX +B7ZG6gKKsfsvrGtY7FPETzLZAHjoVXYNea4LVZ2kn4hBXXlvw/4HD+YqcTt4wmif +5JQlDvjNobUiKJZpzy7hklVhF7wZFl4pCF7Yh43q9iQ7gKTaeUG7MiaK+G8Zz8aY +HW9rsiihbdZkccMvnPfO9334XMxl3HtBRzLstjUlbLB7Sdh+7tZ3JQidCOFNs5pE +XyWwnASPu4tKfDahH1UUTp1uJcq/6716CSWg080avYxFcn75qqsb +-----END RSA PRIVATE KEY----- -- 2.20.1

[PATCH v2 00/19] pin_user_pages(): reduced-risk series for Linux 5.5

by John Hubbard

Hi,

Changes since v1:

* Fixed up ppc in response to Jan Kara's review comments (thanks for those!).

* Fixed a kbuild robot-detected build failure: added a stub function for the !CONFIG_MMU case.

* Cover letter: now refers to "unpin_user_page()", reflecting the name change in the last patch (instead of "put_user_page()").

* Rebased onto today's linux-next: c165016bac27 ("Add linux-next specific files for 20191125")

========================================================================

Here is a set of well-reviewed (except for one patch), lower-risk items that can go into Linux 5.5. (Update: the powerpc conversion patch has had some initial review now, since v1 was posted.)

This is essentially a cut-down v8 of "mm/gup: track dma-pinned pages: FOLL_PIN" [1], with one of the VFIO patches split into two patches. The idea here is to get this long list of "noise" checked into 5.5, so that the actual, higher-risk "track FOLL_PIN pages" work (which is deferred and not part of this series) will be a much shorter patchset to review.

For the v4l2-core changes, I've left those here (instead of sending them separately to the -media tree), in order to get the name change done now (put_user_page --> unpin_user_page). However, I've added a Cc stable, as recommended during the last round of reviews.

Here are the relevant notes from the original cover letter, edited to match the current situation:

This is a prerequisite to tracking dma-pinned pages. That in turn is a prerequisite to solving the larger problem of proper interactions between file-backed pages and [R]DMA activities, as discussed in [1], [2], [3], and in a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN, is introduced, and thoroughly documented in the last patch's Documentation/core-api/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the layout lease work that Ira Weiny has been working on.
That's because these new wrapper functions provide a clean, constrained, systematically named set of functionality that, again, is required in order to even know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, call sites opt in by changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages() (sets FOLL_PIN)
    unpin_user_page()

Because there are interdependencies with FOLL_LONGTERM, a conversion similar to the one for FOLL_PIN was applied. The change was from this:

    get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
    put_page()

to this:

    pin_longterm_pages() (sets FOLL_PIN | FOLL_LONGTERM)
    unpin_user_page()

[1] https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com

thanks,
John Hubbard
NVIDIA

Dan Williams (1):
  mm: Cleanup __put_devmap_managed_page() vs ->page_free()

John Hubbard (18):
  mm/gup: factor out duplicate code from four routines
  mm/gup: move try_get_compound_head() to top, fix minor issues
  goldish_pipe: rename local pin_user_pages() routine
  mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
  vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
  mm/gup: introduce pin_user_pages*() and FOLL_PIN
  goldish_pipe: convert to pin_user_pages() and put_user_page()
  IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
  mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
  drm/via: set FOLL_PIN via pin_user_pages_fast()
  fs/io_uring: set FOLL_PIN via pin_user_pages()
  net/xdp: set FOLL_PIN via pin_user_pages()
  media/v4l2-core: set pages dirty upon releasing DMA buffers
  media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
  powerpc: book3s64: convert to pin_user_pages() and put_user_page()
  mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"
  mm, tree-wide: rename put_user_page*() to unpin_user_page*()

 Documentation/core-api/index.rst | 1 +
 Documentation/core-api/pin_user_pages.rst | 233 ++++++++++++++
 arch/powerpc/mm/book3s64/iommu_api.c | 12 +-
 drivers/gpu/drm/via/via_dmablit.c | 6 +-
 drivers/infiniband/core/umem.c | 4 +-
 drivers/infiniband/core/umem_odp.c | 13 +-
 drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
 drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
 drivers/infiniband/sw/siw/siw_mem.c | 4 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
 drivers/nvdimm/pmem.c | 6 -
 drivers/platform/goldfish/goldfish_pipe.c | 35 +-
 drivers/vfio/vfio_iommu_type1.c | 35 +-
 fs/io_uring.c | 6 +-
 include/linux/mm.h | 77 +++--
 mm/gup.c | 340 +++++++++++++-------
 mm/gup_benchmark.c | 9 +-
 mm/memremap.c | 80 ++---
 mm/process_vm_access.c | 28 +-
 net/xdp/xdp_umem.c | 4 +-
 tools/testing/selftests/vm/gup_benchmark.c | 6 +-
 24 files changed, 650 insertions(+), 285 deletions(-)
 create mode 100644 Documentation/core-api/pin_user_pages.rst

--
2.24.0
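The opt-in conversion described in this cover letter boils down to a mapping from the gup entry point a caller uses to the FOLL_* flags it sets, and from those flags to the matching release call. The user-space model below is only a sketch: the TOY_* flag values are hypothetical (the real FOLL_* constants live in include/linux/mm.h and are not reproduced here), and the rule that FOLL_GET and FOLL_PIN are never set together is an assumption of the model.

```c
#include <assert.h>

/* Hypothetical bit values; not the kernel's FOLL_* numbers. */
enum toy_foll {
	TOY_FOLL_GET      = 1 << 0,
	TOY_FOLL_PIN      = 1 << 1,
	TOY_FOLL_LONGTERM = 1 << 2,
};

/* The four acquisition styles listed in the cover letter. */
enum toy_gup_call {
	TOY_GET_USER_PAGES,		/* get_user_pages() */
	TOY_GET_USER_PAGES_LONGTERM,	/* get_user_pages(FOLL_LONGTERM) */
	TOY_PIN_USER_PAGES,		/* pin_user_pages() */
	TOY_PIN_LONGTERM_PAGES,		/* pin_longterm_pages() */
};

/* The two release calls those styles pair with. */
enum toy_release {
	TOY_PUT_PAGE,			/* put_page() */
	TOY_UNPIN_USER_PAGE,		/* unpin_user_page() */
};

/* Flags each entry point sets internally, per the cover letter. */
unsigned int toy_gup_flags(enum toy_gup_call call)
{
	switch (call) {
	case TOY_GET_USER_PAGES:
		return TOY_FOLL_GET;
	case TOY_GET_USER_PAGES_LONGTERM:
		return TOY_FOLL_GET | TOY_FOLL_LONGTERM;
	case TOY_PIN_USER_PAGES:
		return TOY_FOLL_PIN;
	case TOY_PIN_LONGTERM_PAGES:
		return TOY_FOLL_PIN | TOY_FOLL_LONGTERM;
	}
	assert(0 && "unknown gup call");
	return 0;
}

/* Pinned pages go back via unpin_user_page(), plain references via put_page(). */
enum toy_release toy_release_for(unsigned int flags)
{
	/* Model assumption: one call never sets both FOLL_GET and FOLL_PIN. */
	assert(!((flags & TOY_FOLL_GET) && (flags & TOY_FOLL_PIN)));
	return (flags & TOY_FOLL_PIN) ? TOY_UNPIN_USER_PAGE : TOY_PUT_PAGE;
}
```

The point of the split, as the cover letter puts it, is that call sites can opt in one at a time: unconverted callers keep the FOLL_GET and put_page() pair, while converted ones switch both halves together.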

[PATCH v3 00/19] x86/cpu: Clean up handling of VMX features

by Sean Christopherson

Clean up a handful of interrelated warts in the kernel's handling of VMX:

  - Enable VMX in IA32_FEATURE_CONTROL during boot instead of on demand during KVM load, to avoid future contention over IA32_FEATURE_CONTROL.
  - Rework VMX feature reporting so that it is accurate and up to date, now and in the future.
  - Consolidate code across CPUs that support VMX.

This series stems from two separate but related issues. The first issue, pointed out by Boris in the SGX enabling series [1], is that the kernel currently doesn't ensure the IA32_FEATURE_CONTROL MSR is configured during boot. The second issue is that the kernel's reporting of VMX features is stale, potentially inaccurate, and difficult to maintain.

Please holler if you don't want to be cc'd on future versions of this series, or only want to be cc'd on select patches.

v3:
  - Rebase to tip/master, ceceaf1f12ba ("Merge branch 'WIP.x86/cleanups'").
  - Rename the feature control MSR bit defines [Boris].
  - Rewrite the error message displayed when reading the feature control MSR faults on a VMX-capable CPU to explicitly state that it's likely a hardware or hypervisor issue [Boris].
  - Collect a Reviewed-by for the LMCE change [Boris].
  - Enable VMX in feature control (if it's unlocked) if and only if KVM is enabled [Paolo].
  - Remove a big pile of redundant MSR defines from the KVM selftests that was discovered when renaming the feature control defines.
  - Fix a changelog typo [Boris].

v2:
  - Rebase to latest tip/x86/cpu (1edae1ae6258, "x86/Kconfig: Enforce...)
  - Collect Jim's reviews.
  - Fix a typo in setting of EPT capabilities [TonyWWang-oc].
  - Remove defines for reserved VMX feature flags [Paolo].
  - Print the VMX features under "flags" and maintain all existing names to be backward compatible with the ABI [Paolo].
  - Create aggregate APIC features to report FLEXPRIORITY and APICV, so that both the aggregate features *and* their associated individual features are printed, e.g. to aid in recognizing why an APIC feature isn't being used.
  - Fix a few copy-paste errors in changelogs.

v1 cover letter:

== IA32_FEATURE_CONTROL ==

Lack of IA32_FEATURE_CONTROL configuration during boot isn't a functional issue in the current kernel because the majority of platforms set and lock IA32_FEATURE_CONTROL in firmware. And when the MSR is left unlocked, KVM is the only subsystem that writes IA32_FEATURE_CONTROL. That will change if/when SGX support is enabled, as SGX will also want to fully enable itself when IA32_FEATURE_CONTROL is unlocked.

== VMX Feature Reporting ==

VMX features are not enumerated via CPUID, but instead are enumerated through VMX MSRs. As a result, new VMX features are not automatically reported via /proc/cpuinfo.

An attempt was made long ago to report interesting and/or meaningful VMX features by synthesizing select features into a Linux-defined cpufeatures word. Synthetic feature flags worked for the initial purpose, but the existence of the synthetic flags was forgotten almost immediately, e.g. only one new flag (EPT A/D) has been added in the decade since the synthetic VMX features were introduced, while VMX and KVM have gained support for many new features.

Placing the synthetic flags in x86_capability also allows them to be queried via cpu_has() and company, which is misleading as the flags exist purely for reporting via /proc/cpuinfo. KVM, the only in-kernel user of VMX, ignores the flags.

Last but not least, VMX features are reported in /proc/cpuinfo even when VMX is unusable due to lack of enabling in IA32_FEATURE_CONTROL.

== Caveats ==

All of the testing of non-standard flows was done in a VM, as I don't have a system that leaves IA32_FEATURE_CONTROL unlocked, or locks it with VMX disabled.

The Centaur and Zhaoxin changes are somewhat speculative, as I haven't confirmed they actually support IA32_FEATURE_CONTROL, or that they want to gain "official" KVM support. I assume they unofficially support KVM given that both CPUs went through the effort of enumerating VMX features.
That in turn would require them to support IA32_FEATURE_CONTROL since KVM will fault and refuse to load if the MSR doesn't exist.

[1] https://lkml.kernel.org/r/20190925085156.GA3891@zn.tnic

Sean Christopherson (19):
  x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
  selftests: kvm: Replace manual MSR defs with common msr-index.h
  tools arch x86: Sync msr-index.h from kernel sources
  x86/intel: Initialize IA32_FEATURE_CONTROL MSR at boot
  x86/mce: WARN once if IA32_FEATURE_CONTROL MSR is left unlocked
  x86/centaur: Use common IA32_FEATURE_CONTROL MSR initialization
  x86/zhaoxin: Use common IA32_FEATURE_CONTROL MSR initialization
  KVM: VMX: Drop initialization of IA32_FEATURE_CONTROL MSR
  x86/cpu: Clear VMX feature flag if VMX is not fully enabled
  KVM: VMX: Use VMX feature flag to query BIOS enabling
  KVM: VMX: Check for full VMX support when verifying CPU compatibility
  x86/vmx: Introduce VMX_FEATURES_*
  x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
  x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
  x86/cpufeatures: Drop synthetic VMX feature flags
  KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits
  x86/cpufeatures: Clean up synthetic virtualization flags
  perf/x86: Provide stubs of KVM helpers for non-Intel CPUs
  KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs

 MAINTAINERS | 2 +-
 arch/x86/Kconfig.cpu | 8 +
 arch/x86/boot/mkcpustr.c | 1 +
 arch/x86/include/asm/cpufeatures.h | 15 +-
 arch/x86/include/asm/msr-index.h | 11 +-
 arch/x86/include/asm/perf_event.h | 22 +-
 arch/x86/include/asm/processor.h | 4 +
 arch/x86/include/asm/vmx.h | 105 +--
 arch/x86/include/asm/vmxfeatures.h | 86 +++
 arch/x86/kernel/cpu/Makefile | 6 +-
 arch/x86/kernel/cpu/centaur.c | 35 +-
 arch/x86/kernel/cpu/common.c | 3 +
 arch/x86/kernel/cpu/cpu.h | 4 +
 arch/x86/kernel/cpu/feature_control.c | 127 +++
 arch/x86/kernel/cpu/intel.c | 49 +-
 arch/x86/kernel/cpu/mce/intel.c | 7 +-
 arch/x86/kernel/cpu/mkcapflags.sh | 15 +-
 arch/x86/kernel/cpu/proc.c | 14 +
 arch/x86/kernel/cpu/zhaoxin.c | 35 +-
 arch/x86/kvm/Kconfig | 10 +-
 arch/x86/kvm/vmx/nested.c | 4 +-
 arch/x86/kvm/vmx/vmx.c | 57 +-
 arch/x86/kvm/vmx/vmx.h | 2 +-
 tools/arch/x86/include/asm/msr-index.h | 27 +-
 tools/testing/selftests/kvm/Makefile | 4 +-
 .../selftests/kvm/include/x86_64/processor.h | 726 +-----------------
 tools/testing/selftests/kvm/lib/x86_64/vmx.c | 4 +-
 27 files changed, 400 insertions(+), 983 deletions(-)
 create mode 100644 arch/x86/include/asm/vmxfeatures.h
 create mode 100644 arch/x86/kernel/cpu/feature_control.c

--
2.24.0
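The boot-time policy described in the cover letter (configure IA32_FEATURE_CONTROL once at boot, enable VMX only if the MSR is unlocked and KVM is enabled, and treat VMX as unusable when the MSR is locked without VMX enabled) can be illustrated as pure functions on the MSR value. This is a minimal sketch, not the series' code: the FC_* macros, the helper names, and the `in_smx` parameter are hypothetical; only the bit positions (bit 0 lock, bit 1 VMXON inside SMX, bit 2 VMXON outside SMX) come from the Intel SDM. The real rdmsr/wrmsr accesses are left out, and locking the MSR even when KVM is disabled is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit positions of IA32_FEATURE_CONTROL (MSR 0x3A) per the Intel SDM.
 * The FC_* names are hypothetical, not the defines used by the series.
 */
#define FC_LOCKED          (UINT64_C(1) << 0)  /* no further writes until reset */
#define FC_VMX_INSIDE_SMX  (UINT64_C(1) << 1)  /* VMXON allowed in SMX operation */
#define FC_VMX_OUTSIDE_SMX (UINT64_C(1) << 2)  /* VMXON allowed outside SMX */

/*
 * Value to write back at boot, given what firmware left in the MSR.
 * A locked MSR cannot be changed, so it is returned as-is. Otherwise VMX
 * is enabled only if the CPU supports it and KVM is configured in, and
 * the MSR is then locked so later users cannot contend over it.
 * in_smx: hypothetical flag for "kernel runs in SMX operation" (e.g. TXT).
 */
uint64_t fc_boot_value(uint64_t msr, bool cpu_has_vmx, bool kvm_enabled,
                       bool in_smx)
{
	if (msr & FC_LOCKED)
		return msr;

	if (cpu_has_vmx && kvm_enabled) {
		msr |= FC_VMX_OUTSIDE_SMX;
		if (in_smx)
			msr |= FC_VMX_INSIDE_SMX;
	}
	return msr | FC_LOCKED;
}

/*
 * VMXON faults unless the MSR is locked with the enable bit for the
 * current mode set, so anything else means VMX is not fully enabled and
 * the VMX feature flag should be cleared rather than advertised.
 */
bool fc_vmx_fully_enabled(uint64_t msr, bool in_smx)
{
	uint64_t need = in_smx ? FC_VMX_INSIDE_SMX : FC_VMX_OUTSIDE_SMX;

	return (msr & FC_LOCKED) && (msr & need);
}
```

The "locked by firmware with VMX disabled" case needs no special handling here: fc_boot_value() leaves the value untouched and fc_vmx_fully_enabled() returns false, which is the situation in which the series clears the VMX flag instead of reporting VMX features in /proc/cpuinfo.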
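Because VMX features are enumerated through VMX MSRs rather than CPUID, detecting them means decoding capability MSRs. A rough sketch, assuming the Intel SDM layout in which the high 32 bits of IA32_VMX_PINBASED_CTLS, IA32_VMX_PROCBASED_CTLS and IA32_VMX_PROCBASED_CTLS2 list the controls that may be set to 1, and in which secondary controls exist only if "activate secondary controls" may be set. It also derives the aggregate FLEXPRIORITY and APICV features from their individual controls. The macro, struct and function names are hypothetical, and the controls behind each aggregate are an assumption (APICV here uses the three controls KVM checks for), not necessarily the series' definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control bit positions per the Intel SDM; macro names are illustrative. */
#define PIN_POSTED_INTR          (1u << 7)   /* pin-based: process posted interrupts */
#define PROC_TPR_SHADOW          (1u << 21)  /* primary: use TPR shadow */
#define PROC_SECONDARY_CTLS      (1u << 31)  /* primary: activate secondary controls */
#define PROC2_VIRT_APIC_ACCESSES (1u << 0)   /* secondary: virtualize APIC accesses */
#define PROC2_APIC_REG_VIRT      (1u << 8)   /* secondary: APIC-register virtualization */
#define PROC2_VIRT_INTR_DELIVERY (1u << 9)   /* secondary: virtual-interrupt delivery */

/* The high dword of a VMX control capability MSR holds the allowed-1 settings. */
static inline uint32_t vmx_allowed1(uint64_t cap_msr)
{
	return (uint32_t)(cap_msr >> 32);
}

/* Supported (allowed-1) controls, one word per control field. */
struct vmx_caps {
	uint32_t pin, proc, proc2;
};

/* Inputs are raw values of the pin-based, primary and secondary capability MSRs. */
struct vmx_caps vmx_decode(uint64_t pin_msr, uint64_t proc_msr, uint64_t proc2_msr)
{
	struct vmx_caps c = {
		.pin  = vmx_allowed1(pin_msr),
		.proc = vmx_allowed1(proc_msr),
	};

	/* Secondary controls are only meaningful if they can be activated. */
	c.proc2 = (c.proc & PROC_SECONDARY_CTLS) ? vmx_allowed1(proc2_msr) : 0;
	return c;
}

/* Aggregate: TPR shadow plus virtualized APIC accesses. */
bool vmx_has_flexpriority(const struct vmx_caps *c)
{
	return (c->proc & PROC_TPR_SHADOW) && (c->proc2 & PROC2_VIRT_APIC_ACCESSES);
}

/* Aggregate: APIC-register virtualization, virtual-interrupt delivery, posted interrupts. */
bool vmx_has_apicv(const struct vmx_caps *c)
{
	return (c->proc2 & PROC2_APIC_REG_VIRT) &&
	       (c->proc2 & PROC2_VIRT_INTR_DELIVERY) &&
	       (c->pin & PIN_POSTED_INTR);
}
```

Reporting an aggregate next to its components is what lets a reader see, for example, that FLEXPRIORITY is missing because virtualized APIC accesses are unavailable even though the TPR shadow is supported.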
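The v2 note about printing VMX features while keeping all existing names for ABI compatibility comes down to mapping a feature bitmap onto a stable name table. A minimal sketch: the enum, its bit numbering, and format_vmx_flags() are hypothetical, and the series' VMX_FEATURES_* layout is not reproduced. Only the flag names are the legacy synthetic ones (tpr_shadow, vnmi, flexpriority, ept, vpid, ept_ad) that /proc/cpuinfo already exposed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit numbering; only the names match the legacy cpuinfo flags. */
enum {
	VMXF_TPR_SHADOW,
	VMXF_VNMI,
	VMXF_FLEXPRIORITY,
	VMXF_EPT,
	VMXF_VPID,
	VMXF_EPT_AD,
	VMXF_NR,
};

static const char *const vmx_flag_names[VMXF_NR] = {
	[VMXF_TPR_SHADOW]   = "tpr_shadow",
	[VMXF_VNMI]         = "vnmi",
	[VMXF_FLEXPRIORITY] = "flexpriority",
	[VMXF_EPT]          = "ept",
	[VMXF_VPID]         = "vpid",
	[VMXF_EPT_AD]       = "ept_ad",
};

/*
 * Write " name" for every set bit that has a name, space-separated like
 * the cpuinfo "flags" line. Returns the number of characters written
 * (excluding the NUL), or -1 if buf is too small.
 */
int format_vmx_flags(uint32_t features, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (int bit = 0; bit < VMXF_NR; bit++) {
		if (!(features & (UINT32_C(1) << bit)) || !vmx_flag_names[bit])
			continue;

		int n = snprintf(buf + used, len - used, " %s", vmx_flag_names[bit]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */
		used += (size_t)n;
	}
	return (int)used;
}
```

Keeping the name table separate from the bit layout is what allows bit assignments to change while the user-visible names stay the same.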

Linux-kselftest-mirror November 2019