More details of the seal can be found in the LKML patch:
https://lore.kernel.org/lkml/20181120052137.74317-1-joel@joelfernandes.org/…
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
man2/fcntl.2 | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/man2/fcntl.2 b/man2/fcntl.2
index 03533d65b49d..54772f94964c 100644
--- a/man2/fcntl.2
+++ b/man2/fcntl.2
@@ -1525,6 +1525,21 @@ Furthermore, if there are any asynchronous I/O operations
.RB ( io_submit (2))
pending on the file,
all outstanding writes will be discarded.
+.TP
+.BR F_SEAL_FUTURE_WRITE
+If this seal is set, the contents of the file can be modified only from
+existing writeable mappings that were created prior to the seal being set.
+Any attempt to create a new writeable mapping on the memfd via
+.BR mmap (2)
+will fail with
+.BR EPERM .
+Also any attempts to write to the memfd via
+.BR write (2)
+will fail with
+.BR EPERM .
+This is useful in situations where existing writable mapped regions need to be
+kept intact while preventing any future writes. For example, to share a
+read-only memory buffer to other processes that only the sender can write to.
.\"
.SS File read/write hints
Write lifetime hints can be used to inform the kernel about the relative
--
2.19.1.1215.g8438c0b245-goog
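The semantics described in the man page text above can be illustrated with a minimal C sketch. It is not part of the patch: the helper name check_future_write_seal() and its return convention are made up for this example, and the fallback constant definitions assume the values from the Linux UAPI headers (F_SEAL_FUTURE_WRITE as 0x0010, the value used by the test program posted with the kernel patch). On a kernel without the seal the helper only reports that the check could not run.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback definitions for older userspace headers (assumed UAPI values). */
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef F_ADD_SEALS
#define F_ADD_SEALS 1033
#endif
#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if the seal behaves as documented, 1 if the check could not
 * run (no memfd, or the kernel rejects the seal), -1 on a mismatch.
 */
int check_future_write_seal(size_t size)
{
	int result = 1;
	char *old_map, *new_map;
	int fd;

	assert(size > 0);

#ifdef SYS_memfd_create
	fd = syscall(SYS_memfd_create, "seal_sketch", MFD_ALLOW_SEALING);
#else
	fd = -1;
#endif
	if (fd < 0)
		return 1;
	if (ftruncate(fd, size) < 0)
		goto out_close;

	/* Writable mapping created before the seal is set. */
	old_map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (old_map == MAP_FAILED)
		goto out_close;

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0)
		goto out_unmap;		/* seal not supported here */

	result = -1;			/* from here on, failures are mismatches */

	/* The pre-existing mapping must stay writable. */
	old_map[0] = 'x';

	/* write(2) on the sealed fd must fail with EPERM. */
	if (write(fd, "y", 1) != -1 || errno != EPERM)
		goto out_unmap;

	/* A new shared writable mapping must fail with EPERM. */
	new_map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (new_map != MAP_FAILED) {
		munmap(new_map, size);
		goto out_unmap;
	}
	if (errno != EPERM)
		goto out_unmap;

	/* A read-only mapping is still allowed and sees the earlier write. */
	new_map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (new_map == MAP_FAILED)
		goto out_unmap;
	if (new_map[0] == 'x')
		result = 0;
	munmap(new_map, size);

out_unmap:
	munmap(old_map, size);
out_close:
	close(fd);
	return result;
}
```

A caller would typically treat a return value of 1 as "skipped" rather than as a failure, since the seal is only available on kernels that carry this feature.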
Petr says:
This patchset adds several tests for VXLAN attached to an 802.1d bridge
and fixes a related bug.
First, patch #1 fixes a bug in propagating SKB already-forwarded marks
over veth to bridges, where they are irrelevant. This bug causes the
vxlan_bridge_1d test suite from this patchset to fail as the packets
aren't forwarded by br2.
In patches #2 and #3, lib.sh is extended to support network namespaces.
The use of namespaces is necessitated by VXLAN, which allows only one
VXLAN device with a given VNI per namespace. Thus, to host the full topology
on a single box for selftests, the "remote" endpoints need to be in
namespaces.
In patches #4-#6, lib.sh is extended in other ways to facilitate the
following patches.
In patches #7-#15, first the skeleton, and later the generic tests
themselves are added.
Patch #16 then adds another test that serves as a wrapper around the
previous one, and runs it with a non-default port number.
Patches #17 and #18 add mlxsw-specific tests. About those, Ido writes:
The first test creates various configurations with regard to the VxLAN
and bridge devices and makes sure the driver correctly forbids
unsupported configurations and permits supported ones. It also verifies
that the driver correctly sets the offload indication on FDB entries and
the local route used for VxLAN decapsulation.
The second test verifies that the driver correctly configures the singly
linked list used to flood BUM traffic and that traffic is flooded as
expected.
Ido Schimmel (2):
selftests: mlxsw: Add a test for VxLAN configuration
selftests: mlxsw: Add a test for VxLAN flooding
Petr Machata (16):
net: skb_scrub_packet(): Scrub offload_fwd_mark
selftests: forwarding: lib: Support NUM_NETIFS of 0
selftests: forwarding: lib: Add in_ns()
selftests: forwarding: ping{6,}_test(): Add description argument
selftests: forwarding: ping{6,}_do(): Allow passing ping arguments
selftests: forwarding: lib: Add link_stats_rx_errors_get()
selftests: forwarding: Add a skeleton of vxlan_bridge_1d
selftests: forwarding: vxlan_bridge_1d: Add ping test
selftests: forwarding: vxlan_bridge_1d: Add flood test
selftests: forwarding: vxlan_bridge_1d: Add unicast test
selftests: forwarding: vxlan_bridge_1d: Reconfigure & rerun tests
selftests: forwarding: vxlan_bridge_1d: Add a TTL test
selftests: forwarding: vxlan_bridge_1d: Add a TOS test
selftests: forwarding: vxlan_bridge_1d: Add an ECN encap test
selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test
selftests: forwarding: vxlan_bridge_1d_port_8472: New test
net/core/skbuff.c | 5 +
.../selftests/drivers/net/mlxsw/vxlan.sh | 664 +++++++++++++++++
.../drivers/net/mlxsw/vxlan_flooding.sh | 309 ++++++++
tools/testing/selftests/net/forwarding/lib.sh | 42 +-
.../net/forwarding/vxlan_bridge_1d.sh | 678 ++++++++++++++++++
.../forwarding/vxlan_bridge_1d_port_8472.sh | 10 +
6 files changed, 1700 insertions(+), 8 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/mlxsw/vxlan.sh
create mode 100755 tools/testing/selftests/drivers/net/mlxsw/vxlan_flooding.sh
create mode 100755 tools/testing/selftests/net/forwarding/vxlan_bridge_1d.sh
create mode 100755 tools/testing/selftests/net/forwarding/vxlan_bridge_1d_port_8472.sh
--
2.19.1
If the cgroup destruction races with the exit() of one or more processes
belonging to the cgroup, cg_killall() may fail. That is not a good reason
to make cg_destroy() fail and leave the cgroup in place, potentially
causing subsequent test runs to fail.
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: kernel-team(a)fb.com
Cc: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/cgroup/cgroup_util.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 14c9fe284806..eba06f94433b 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -227,9 +227,7 @@ int cg_destroy(const char *cgroup)
 retry:
 	ret = rmdir(cgroup);
 	if (ret && errno == EBUSY) {
-		ret = cg_killall(cgroup);
-		if (ret)
-			return ret;
+		cg_killall(cgroup);
 		usleep(100);
 		goto retry;
 	}
--
2.17.2
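The retry behaviour after this change can be sketched outside the selftest tree as follows. This is only an illustration under stated assumptions: kill_all_best_effort() is a hypothetical stand-in for cg_killall() that always fails, destroy_with_retry() and demo_destroy() are made-up names, and the max_retries bound does not exist in the patch (it is added here so the example cannot loop forever).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for cg_killall(). A real helper would signal every
 * process in the cgroup; this one always fails to simulate the race with
 * an exiting process.
 */
static int kill_all_best_effort(const char *path)
{
	(void)path;
	return -1;
}

/*
 * Sketch of the post-patch loop: retry rmdir() while it reports EBUSY and
 * ignore the result of the kill helper. max_retries is an assumption added
 * for this example only.
 */
int destroy_with_retry(const char *path, int max_retries)
{
	int ret;

	assert(path != NULL);

	for (;;) {
		ret = rmdir(path);
		if (ret == 0 || errno != EBUSY || max_retries-- <= 0)
			return ret;
		kill_all_best_effort(path);	/* a failure here is not fatal */
		usleep(100);
	}
}

/* Usage example: create a scratch directory and destroy it again. */
int demo_destroy(void)
{
	char path[] = "/tmp/cg_sketch_XXXXXX";

	if (!mkdtemp(path))
		return -1;
	return destroy_with_retry(path, 10);
}
```

The point of the design is that only the rmdir() result decides the outcome: a failed kill attempt simply leads to another rmdir() attempt instead of aborting the cleanup.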
Hi
Now that I got into adding selftests [1], I started to wonder whether I should
also move my TPM2 smoke tests into them.
The project resides here:
https://github.com/jsakkine-intel/tpm2-scripts
I wonder if selftests can be done with Python in the first place or do
they have to be implemented in C?
[1] https://lkml.org/lkml/2018/11/16/274
/Jarkko
For some unknown reason this never reached any MLs (I used the same command
line for git send-email as usual).
/Jarkko
On Fri, Nov 16, 2018 at 03:38:08AM +0200, Jarkko Sakkinen wrote:
> Intel(R) SGX is a set of CPU instructions that can be used by applications
> to set aside private regions of code and data. The code outside the enclave
> is disallowed to access the memory inside the enclave by the CPU access
> control. In a way, you can think of SGX as providing an inverted sandbox. It
> protects the application from a malicious host.
>
> There is a new hardware unit in the processor called Memory Encryption
> Engine (MEE) starting from the Skylake microarchitecture. BIOS can define
> one or many MEE regions that can hold enclave data by configuring them with
> PRMRR registers.
>
> The MEE automatically encrypts the data leaving the processor package to
> the MEE regions. The data is encrypted using a random key whose life-time
> is exactly one power cycle.
>
> The current implementation requires that the firmware sets
> IA32_SGXLEPUBKEYHASH* MSRs as writable so that ultimately the kernel can
> decide what enclaves it wants to run. The implementation does not create
> any bottlenecks to support read-only MSRs later on.
>
> You can tell if your CPU supports SGX by looking into /proc/cpuinfo:
>
> cat /proc/cpuinfo | grep sgx
>
> v17:
> * Add a simple selftest.
> * Fix a null pointer dereference to section->pages when its
> allocation fails.
> * Add Sean's description of the exception handling to the documentation.
>
> v16:
> * Fixed SOB's in the commits that were a bit corrupted in v15.
> * Implemented exception handling properly to detect_sgx().
> * Use GENMASK() to define SGX_CPUID_SUB_LEAF_TYPE_MASK.
> * Updated the documentation to use rst definition lists.
> * Added the missing Documentation/x86/index.rst, which has a link to
> intel_sgx.rst. Now the SGX and uapi documentation is properly generated
> with 'make htmldocs'.
> * While enumerating EPC sections, if an undefined section is found, fail
> the driver initialization instead of continuing the initialization.
> * Issue a warning if there are more than %SGX_MAX_EPC_SECTIONS.
> * Remove copyright notice from arch/x86/include/asm/sgx.h.
> * Migrated from ioremap_cache() to memremap().
>
> v15:
> * Split into more digestible size patches.
> * Lots of small fixes and clean ups.
> * Signal a "plain" SIGSEGV on an EPCM violation.
>
> v14:
> * Change the comment about X86_FEATURE_SGX_LC from “SGX launch
> configuration” to “SGX launch control”.
> * Move the SGX-related CPU feature flags as part of the Linux defined
> virtual leaf 8.
> * Add SGX_ prefix to the constants defining the ENCLS leaf functions.
> * Use GENMASK*() and BIT*() in sgx_arch.h instead of raw hex numbers.
> * Refine the long description for CONFIG_INTEL_SGX_CORE.
> * Do not use pr_*_ratelimited() in the driver. The use of the rate limited
> versions is legacy cruft from the prototyping phase.
> * Detect sleep with SGX_INVALID_EINIT_TOKEN instead of counting power
> cycles.
> * Manually prefix with “sgx:” in the core SGX code instead of redefining
> pr_fmt.
> * Report if IA32_SGXLEPUBKEYHASHx MSRs are not writable in the driver
> instead of core because it is a driver requirement.
> * Change prompt to bool in the entry for CONFIG_INTEL_SGX_CORE because the
> default is ‘n’.
> * Rename struct sgx_epc_bank as struct sgx_epc_section in order to match
> the SDM.
> * Allocate struct sgx_epc_page instances one at a time.
> * Use “__iomem void *” pointers for the mapped EPC memory consistently.
> * Retry once on SGX_INVALID_TOKEN in sgx_einit() instead of counting power
> cycles.
> * Call enclave swapping operations directly from the driver instead of
> calling them indirectly through struct sgx_epc_page_ops because indirect
> calls are not required yet as the patch set does not contain the KVM
> support.
> * Added special signal SEGV_SGXERR to notify about SGX EPCM violation
> errors.
>
> v13:
> * Always use SGX_CPUID constant instead of a hardcoded value.
> * Simplified and documented the macros and functions for ENCLS leaves.
> * Enable sgx_free_page() to free active enclave pages on demand
> in order to allow sgx_invalidate() to delete enclave pages.
> It no longer performs EREMOVE if a page is in the process of
> being reclaimed.
> * Use PM notifier per enclave so that we don't have to traverse
> the global list of active EPC pages to find enclaves.
> * Removed unused SGX_LE_ROLLBACK constant from uapi/asm/sgx.h
> * Always use ioremap() to map EPC banks as we only support 64-bit kernel.
> * Invalidate IA32_SGXLEPUBKEYHASH cache used by sgx_einit() when going
> to sleep.
>
> v12:
> * Split to more narrow scoped commits in order to ease the review process and
> use co-developed-by tag for co-authors of commits instead of listing them in
> the source files.
> * Removed cruft EXPORT_SYMBOL() declarations and converted to static variables.
> * Removed in-kernel LE i.e. this version of the SGX software stack only
> supports unlocked IA32_SGXLEPUBKEYHASHx MSRs.
> * Refined documentation on launching enclaves, swapping and enclave
> construction.
> * Refined sgx_arch.h to include alignment information for every struct that
> requires it and removed structs that are not needed without an LE.
> * Got rid of SGX_CPUID.
> * SGX detection now prints log messages about firmware configuration issues.
>
> v11:
> * Polished ENCLS wrappers with refined exception handling.
> * ksgxswapd was not stopped (regression in v5) in
> sgx_page_cache_teardown(), which causes a leaked kthread after driver
> deinitialization.
> * Shutdown sgx_le_proxy when going to suspend because its EPC pages will be
> invalidated when resuming, which will cause it not to function properly
> anymore.
> * Set EINITTOKEN.VALID to zero for a token that is passed when
> SGXLEPUBKEYHASH matches MRSIGNER as alloc_page() does not give a zero
> page.
> * Fixed the check in sgx_edbgrd() for a TCS page. It allowed reading offsets
> around the flags field, which causes a #GP. Only the flags field is readable.
> * On read access memcpy() call inside sgx_vma_access() had src and dest
> parameters in wrong order.
> * The build issue with CONFIG_KASAN is now fixed. Added undefined symbols
> to LE even if “KASAN_SANITIZE := false” was set in the makefile.
> * Fixed a regression in the #PF handler. If a page has
> SGX_ENCL_PAGE_RESERVED flag the #PF handler should unconditionally fail.
> It did not, which caused weird races when trying to change other parts of
> swapping code.
> * EPC management has been refactored to a flat LRU cache and moved to
> arch/x86. The swapper thread reads a cluster of EPC pages and swaps all
> of them. It can now swap from multiple enclaves in the same round.
> * For the sake of consistency with SGX_IOC_ENCLAVE_ADD_PAGE, return -EINVAL
> when an enclave is already initialized or dead instead of zero.
>
> v10:
> * Cleaned up anon inode based IPC between the ring-0 and ring-3 parts
> of the driver.
> * Unset the reserved flag from an enclave page if EDBGRD/WR fails
> (regression in v6).
> * Close the anon inode when LE is stopped (regression in v9).
> * Update the documentation with a more detailed description of SGX.
>
> v9:
> * Replaced kernel-LE IPC based on pipes with an anonymous inode.
> The driver does not require anymore new exports.
>
> v8:
> * Check that public key MSRs match the LE public key hash in the
> driver initialization when the MSRs are read-only.
> * Fix the race in VA slot allocation by checking the fullness
> immediately after successful allocation.
> * Fix the race in hash mrsigner calculation between the launch
> enclave and user enclaves by having a separate lock for hash
> calculation.
>
> v7:
> * Fixed offset calculation in sgx_edbgr/wr(). Address was masked with PAGE_MASK
> when it should have been masked with ~PAGE_MASK.
> * Fixed a memory leak in sgx_ioc_enclave_create().
> * Simplified swapping code by using a pointer array for a cluster
> instead of a linked list.
> * Squeezed struct sgx_encl_page to 32 bytes.
> * Fixed dereferencing of an RSA key on OpenSSL 1.1.0.
> * Modified TC's CMAC to use kernel AES-NI. Restructured the code
> a bit in order to better align with kernel conventions.
>
> v6:
> * Fixed semaphore underrun when accessing /dev/sgx from the launch enclave.
> * In sgx_encl_create() s/IS_ERR(secs)/IS_ERR(encl)/.
> * Removed virtualization chapter from the documentation.
> * Changed the default filename for the signing key as signing_key.pem.
> * Reworked EPC management in a way that instead of a linked list of
> struct sgx_epc_page instances there is an array of integers that
> encodes address and bank of an EPC page (the same data as 'pa' field
> earlier). The locking has been moved to the EPC bank level instead
> of a global lock.
> * Relaxed locking requirements for EPC management. EPC pages can be
> released back to the EPC bank concurrently.
> * Cleaned up ptrace() code.
> * Refined commit messages for new architectural constants.
> * Sorted includes in every source file.
> * Sorted local variable declarations according to the line length in
> every function.
> * Style fixes based on Darren's comments to sgx_le.c.
>
> v5:
> * Described IPC between the Launch Enclave and kernel in the commit messages.
> * Fixed all relevant checkpatch.pl issues that I had forgotten to fix in earlier
> versions except those that exist in the imported TinyCrypt code.
> * Fixed spelling mistakes in the documentation.
> * Forgot to check the return value of sgx_drv_subsys_init().
> * Encapsulated properly page cache init and teardown.
> * Collect epc pages to a temp list in sgx_add_epc_bank
> * Removed SGX_ENCLAVE_INIT_ARCH constant.
>
> v4:
> * Tied life-cycle of the sgx_le_proxy process to /dev/sgx.
> * Removed __exit annotation from sgx_drv_subsys_exit().
> * Fixed a leak of a backing page in sgx_process_add_page_req() in the
> case when vm_insert_pfn() fails.
> * Removed unused symbol exports for sgx_page_cache.c.
> * Updated sgx_alloc_page() to require encl parameter and documented the
> behavior (Sean Christopherson).
> * Refactored a more lean API for sgx_encl_find() and documented the behavior.
> * Moved #PF handler to sgx_fault.c.
> * Replaced subsys_system_register() with plain bus_register().
> * Retry EINIT 2nd time only if MSRs are not locked.
>
> v3:
> * Check that FEATURE_CONTROL_LOCKED and FEATURE_CONTROL_SGX_ENABLE are set.
> * Return -ERESTARTSYS in __sgx_encl_add_page() when sgx_alloc_page() fails.
> * Use unused bits in epc_page->pa to store the bank number.
> * Removed #ifdef for WQ_NONREENTRANT.
> * If mmu_notifier_register() fails with -EINTR, return -ERESTARTSYS.
> * Added --remove-section=.got.plt to objcopy flags in order to prevent a
> dummy .got.plt, which will cause an inconsistent size for the LE.
> * Documented sgx_encl_* functions.
> * Added remark about AES implementation used inside the LE.
> * Removed redundant sgx_sys_exit() from le/main.c.
> * Fixed struct sgx_secinfo alignment from 128 to 64 bytes.
> * Validate miscselect in sgx_encl_create().
> * Fixed SSA frame size calculation to take the misc region into account.
> * Implemented consistent exception handling to __encls() and __encls_ret().
> * Implemented a proper device model in order to allow sysfs attributes
> and in-kernel API.
> * Cleaned up various "find enclave" implementations to the unified
> sgx_encl_find().
> * Validate that vm_pgoff is zero.
> * Discard backing pages with shmem_truncate_range() after EADD.
> * Added missing EEXTEND operations to LE signing and launch.
> * Fixed SSA size for GPRS region from 168 to 184 bytes.
> * Fixed the checks for TCS flags. Now DBGOPTIN is allowed.
> * Check that TCS addresses are in ELRANGE and not just page aligned.
> * Require kernel to be compiled with X86_64 and CPU_SUP_INTEL.
> * Fixed an incorrect value for SGX_ATTR_DEBUG from 0x01 to 0x02.
>
> v2:
> * get_rand_uint32() changed the value of the pointer instead of value
> where it is pointing at.
> * Launch enclave incorrectly used sigstruct attributes-field instead of
> enclave attributes-field.
> * Removed unused struct sgx_add_page_req from sgx_ioctl.c
> * Removed unused sgx_has_sgx2.
> * Updated arch/x86/include/asm/sgx.h so that it provides stub
> implementations when sgx is not enabled.
> * Removed cruft rdmsr-calls from sgx_set_pubkeyhash_msrs().
> * return -ENOMEM in sgx_alloc_page() when VA pages consume too much space
> * removed unused global sgx_nr_pids
> * moved sgx_encl_release to sgx_encl.c
> * return -ERESTARTSYS instead of -EINTR in sgx_encl_init()
>
>
> Jarkko Sakkinen (13):
> x86/sgx: Update MAINTAINERS
> x86/sgx: Define SGX1 and SGX2 ENCLS leafs
> x86/sgx: Add ENCLS architectural error codes
> x86/sgx: Add SGX1 and SGX2 architectural data structures
> x86/sgx: Add definitions for SGX's CPUID leaf and variable sub-leafs
> x86/sgx: Add wrappers for ENCLS leaf functions
> x86/sgx: Add functions to allocate and free EPC pages
> platform/x86: Intel SGX driver
> platform/x86: sgx: Add swapping functionality to the Intel SGX driver
> x86/sgx: Add a simple swapper for the EPC memory manager
> platform/x86: ptrace() support for the SGX driver
> x86/sgx: SGX documentation
> selftests/x86: Add a selftest for SGX
>
> Kai Huang (2):
> x86/cpufeatures: Add Intel-defined SGX feature bit
> x86/cpufeatures: Add Intel-defined SGX_LC feature bit
>
> Sean Christopherson (8):
> x86/cpufeatures: Add SGX sub-features (as Linux-defined bits)
> x86/msr: Add IA32_FEATURE_CONTROL.SGX_ENABLE definition
> x86/cpu/intel: Detect SGX support and update caps appropriately
> x86/mm: x86/sgx: Add new 'PF_SGX' page fault error code bit
> x86/mm: x86/sgx: Signal SIGSEGV for userspace #PFs w/ PF_SGX
> x86/msr: Add SGX Launch Control MSR definitions
> x86/sgx: Enumerate and track EPC sections
> x86/sgx: Add sgx_einit() for initializing enclaves
>
> Documentation/index.rst | 1 +
> Documentation/x86/index.rst | 8 +
> Documentation/x86/intel_sgx.rst | 233 +++++
> MAINTAINERS | 7 +
> arch/x86/Kconfig | 18 +
> arch/x86/include/asm/cpufeatures.h | 23 +-
> arch/x86/include/asm/msr-index.h | 8 +
> arch/x86/include/asm/sgx.h | 324 ++++++
> arch/x86/include/asm/sgx_arch.h | 400 +++++++
> arch/x86/include/asm/traps.h | 1 +
> arch/x86/include/uapi/asm/sgx.h | 59 ++
> arch/x86/include/uapi/asm/sgx_errno.h | 91 ++
> arch/x86/kernel/cpu/Makefile | 1 +
> arch/x86/kernel/cpu/intel.c | 37 +
> arch/x86/kernel/cpu/intel_sgx.c | 488 +++++++++
> arch/x86/kernel/cpu/scattered.c | 2 +
> arch/x86/mm/fault.c | 13 +
> drivers/platform/x86/Kconfig | 2 +
> drivers/platform/x86/Makefile | 1 +
> drivers/platform/x86/intel_sgx/Kconfig | 20 +
> drivers/platform/x86/intel_sgx/Makefile | 14 +
> drivers/platform/x86/intel_sgx/sgx.h | 212 ++++
> drivers/platform/x86/intel_sgx/sgx_encl.c | 977 ++++++++++++++++++
> .../platform/x86/intel_sgx/sgx_encl_page.c | 178 ++++
> drivers/platform/x86/intel_sgx/sgx_fault.c | 109 ++
> drivers/platform/x86/intel_sgx/sgx_ioctl.c | 234 +++++
> drivers/platform/x86/intel_sgx/sgx_main.c | 267 +++++
> drivers/platform/x86/intel_sgx/sgx_util.c | 156 +++
> drivers/platform/x86/intel_sgx/sgx_vma.c | 167 +++
> tools/arch/x86/include/asm/cpufeatures.h | 21 +-
> tools/testing/selftests/x86/Makefile | 10 +
> tools/testing/selftests/x86/sgx/Makefile | 47 +
> tools/testing/selftests/x86/sgx/encl.c | 20 +
> tools/testing/selftests/x86/sgx/encl.lds | 33 +
> .../selftests/x86/sgx/encl_bootstrap.S | 94 ++
> tools/testing/selftests/x86/sgx/encl_piggy.S | 16 +
> tools/testing/selftests/x86/sgx/encl_piggy.h | 13 +
> .../testing/selftests/x86/sgx/sgx-selftest.c | 149 +++
> tools/testing/selftests/x86/sgx/sgx_arch.h | 109 ++
> tools/testing/selftests/x86/sgx/sgx_call.S | 20 +
> tools/testing/selftests/x86/sgx/sgx_uapi.h | 100 ++
> tools/testing/selftests/x86/sgx/sgxsign.c | 503 +++++++++
> .../testing/selftests/x86/sgx/signing_key.pem | 39 +
> 43 files changed, 5213 insertions(+), 12 deletions(-)
> create mode 100644 Documentation/x86/index.rst
> create mode 100644 Documentation/x86/intel_sgx.rst
> create mode 100644 arch/x86/include/asm/sgx.h
> create mode 100644 arch/x86/include/asm/sgx_arch.h
> create mode 100644 arch/x86/include/uapi/asm/sgx.h
> create mode 100644 arch/x86/include/uapi/asm/sgx_errno.h
> create mode 100644 arch/x86/kernel/cpu/intel_sgx.c
> create mode 100644 drivers/platform/x86/intel_sgx/Kconfig
> create mode 100644 drivers/platform/x86/intel_sgx/Makefile
> create mode 100644 drivers/platform/x86/intel_sgx/sgx.h
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_encl.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_encl_page.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_fault.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_ioctl.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_main.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_util.c
> create mode 100644 drivers/platform/x86/intel_sgx/sgx_vma.c
> create mode 100644 tools/testing/selftests/x86/sgx/Makefile
> create mode 100644 tools/testing/selftests/x86/sgx/encl.c
> create mode 100644 tools/testing/selftests/x86/sgx/encl.lds
> create mode 100644 tools/testing/selftests/x86/sgx/encl_bootstrap.S
> create mode 100644 tools/testing/selftests/x86/sgx/encl_piggy.S
> create mode 100644 tools/testing/selftests/x86/sgx/encl_piggy.h
> create mode 100644 tools/testing/selftests/x86/sgx/sgx-selftest.c
> create mode 100644 tools/testing/selftests/x86/sgx/sgx_arch.h
> create mode 100644 tools/testing/selftests/x86/sgx/sgx_call.S
> create mode 100644 tools/testing/selftests/x86/sgx/sgx_uapi.h
> create mode 100644 tools/testing/selftests/x86/sgx/sgxsign.c
> create mode 100644 tools/testing/selftests/x86/sgx/signing_key.pem
>
> --
> 2.19.1
>
>
This series attempts to make the fsgsbase test in the x86 kselftest
report a stable result. On some Intel systems there are intermittent
failures in this testcase which have been reported and discussed
previously:
https://lore.kernel.org/lkml/20180126153631.ha7yc33fj5uhitjo@xps/
with the analysis concluding that this is a hardware issue affecting a
subset of systems but no fix has been merged as yet. In order to at
least make the test more solid for use in automated testing this series
modifies it to execute the test often enough to reproduce the problem
reliably.
I'm not happy with this: it doesn't fix the actual problem, the code
isn't particularly clean, and it makes the execution time for the
selftests much longer. My main goal here is to restart the discussion
of the test failure; I don't think merging this is a great idea.
Mark Brown (2):
selftests/x86/fsgsbase: Indirect output through a wrapper function
selftests/x86/fsgsbase: Default to trying to run the test repeatedly
tools/testing/selftests/x86/fsgsbase.c | 79 +++++++++++++++++++++++++---------
1 file changed, 58 insertions(+), 21 deletions(-)
Including Shuah and kselftest list...
On Sat, Nov 10, 2018, at 4:49 PM, Alexey Dobriyan wrote:
> https://bugs.linaro.org/show_bug.cgi?id=3782
>
> Turns out arm doesn't allow mapping address 0, so try the minimum virtual
> address instead.
>
> Reported-by: Rafael David Tinoco <rafael.tinoco(a)linaro.org>
> Signed-off-by: Alexey Dobriyan <adobriyan(a)gmail.com>
> ---
>
> tools/testing/selftests/proc/proc-self-map-files-002.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> --- a/tools/testing/selftests/proc/proc-self-map-files-002.c
> +++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
> @@ -13,7 +13,7 @@
> * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> */
> -/* Test readlink /proc/self/map_files/... with address 0. */
> +/* Test readlink /proc/self/map_files/... with minimum address. */
> #include <errno.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> @@ -47,6 +47,11 @@ static void fail(const char *fmt, unsigned long a, unsigned long b)
> int main(void)
> {
> const unsigned int PAGE_SIZE = sysconf(_SC_PAGESIZE);
> +#ifdef __arm__
> + unsigned long va = 2 * PAGE_SIZE;
> +#else
> + unsigned long va = 0;
> +#endif
> void *p;
> int fd;
> unsigned long a, b;
> @@ -55,7 +60,7 @@ int main(void)
> if (fd == -1)
> return 1;
>
> - p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
> + p = mmap((void *)va, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
> if (p == MAP_FAILED) {
> if (errno == EPERM)
> return 2;
I have sent a patch that removes proc-self-map-files-002 and makes 001 use
*at least* (2 * PAGE_SIZE) as the HINT for mmap (MAP_FIXED). That would likely
work on all architectures, avoids making the test specific to one of them,
and still tests the symlinks for issues (bad chars, spaces, and so on).
Both tests (001 and 002) have pretty much the same code, and they could be 2
tests in a single file using the kselftest framework. Is NULL hint + MAP_FIXED
imperative for this test? Why not have everything in a single test? Are
you keeping the NULL hint just to test mmap, apart from the core of this test?
Sorry to insist. If you want to keep it like this, I can create a similar test
in LTP (for the symlinks only, which seem important) and blacklist this one in
our function tests kselftest list (https://lkft.linaro.org/); then no change is
needed on your side.
Thanks
v5 changes:
* FILE -> PATH for load/loadall (can be either file or directory now)
* simpler implementation for __bpf_program__pin_name
* removed p_err for REQ_ARGS checks
* parse_atach_detach_args -> parse_attach_detach_args
* for -> while in bpf_object__pin_{programs,maps} recovery
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* fourth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
Stanislav Fomichev (7):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
libbpf: add internal pin_name
bpftool: add loadall command
bpftool: add pinmaps argument to the load/loadall
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 42 +-
tools/bpf/bpftool/bash-completion/bpftool | 21 +-
tools/bpf/bpftool/common.c | 31 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 183 ++++++---
tools/lib/bpf/libbpf.c | 359 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 18 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 537 insertions(+), 122 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v3:
* kprobe: fix build on !CONFIG_KPROBES
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 27 +++++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 177 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at any time.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>
#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)
int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
F_SEAL_GROW |
F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FUTURE_WRITE) &&
+ !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+ /*
+ * The FUTURE_WRITE seal also prevents growing and shrinking
+ * so we need them to be already set, or requested now.
+ */
+ int test_seals = (seals | *file_seals) &
+ (F_SEAL_GROW | F_SEAL_SHRINK);
+
+ if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+ error = -EINVAL;
+ goto unlock;
+ }
+
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+ spin_unlock(&file->f_lock);
+ }
+
*file_seals |= seals;
error = 0;
--
2.19.1.930.g4563a0d9d0-goog
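The precondition added to memfd_add_seals() in the hunk above can be modelled in userspace as follows. This is a sketch for illustration only: model_add_seals() and the MODEL_* names are hypothetical, the seal values are copied from the fcntl.h hunk of this patch, and the f_mode update and locking done by the kernel code are deliberately left out.

```c
#include <errno.h>

/* Seal bits as defined in this patch's include/uapi/linux/fcntl.h hunk. */
#define MODEL_F_SEAL_SHRINK		0x0002
#define MODEL_F_SEAL_GROW		0x0004
#define MODEL_F_SEAL_FUTURE_WRITE	0x0010

/*
 * Userspace model of the new check. Returns 0 and updates *file_seals,
 * or -EINVAL if FUTURE_WRITE is requested while GROW and SHRINK are
 * neither already set nor requested in the same call.
 */
static int model_add_seals(unsigned int *file_seals, unsigned int seals)
{
	if ((seals & MODEL_F_SEAL_FUTURE_WRITE) &&
	    !(*file_seals & MODEL_F_SEAL_FUTURE_WRITE)) {
		unsigned int need = MODEL_F_SEAL_GROW | MODEL_F_SEAL_SHRINK;

		if (((seals | *file_seals) & need) != need)
			return -EINVAL;
		/* The kernel patch also clears the file's write modes here. */
	}

	*file_seals |= seals;
	return 0;
}
```

In this model, requesting F_SEAL_FUTURE_WRITE alone on an unsealed file is rejected, while requesting it together with the grow and shrink seals (as the test program does) succeeds.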
MAP_FIXED is important for this test but, unfortunately, lowest virtual
address for user space mapping on arm is (PAGE_SIZE * 2) and NULL hint
does not seem to guarantee that when MAP_FIXED is given. This patch sets
the virtual address that will hold the mapping for the test, fixing the
issue.
Link: https://bugs.linaro.org/show_bug.cgi?id=3782
Signed-off-by: Rafael David Tinoco <rafael.tinoco(a)linaro.org>
---
tools/testing/selftests/proc/proc-self-map-files-002.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-self-map-files-002.c b/tools/testing/selftests/proc/proc-self-map-files-002.c
index 6f1f4a6e1ecb..0a47eaca732a 100644
--- a/tools/testing/selftests/proc/proc-self-map-files-002.c
+++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
@@ -55,7 +55,9 @@ int main(void)
if (fd == -1)
return 1;
- p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+ p = mmap((void *) (2 * PAGE_SIZE), PAGE_SIZE, PROT_NONE,
+ MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+
if (p == MAP_FAILED) {
if (errno == EPERM)
return 2;
--
2.19.1
On Sat, Nov 10, 2018 at 04:26:46AM -0800, Daniel Colascione wrote:
> On Friday, November 9, 2018, Joel Fernandes <joel(a)joelfernandes.org> wrote:
>
> > On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> > > On Fri, Nov 9, 2018 at 10:06 PM Jann Horn <jannh(a)google.com> wrote:
> > > > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> > > > <joel(a)joelfernandes.org> wrote:
> > > > > Android uses ashmem for sharing memory regions. We are looking forward
> > > > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > > > remove the ashmem driver in the future from staging while also
> > > > > benefiting from using memfd and contributing to it. Note staging drivers
> > > > > are also not ABI and generally can be removed at anytime.
> > > > >
> > > > > One of the main usecases Android has is the ability to create a region
> > > > > and mmap it as writeable, then add protection against making any
> > > > > "future" writes while keeping the existing already mmap'ed
> > > > > writeable-region active. This allows us to implement a usecase where
> > > > > receivers of the shared memory buffer can get a read-only view, while
> > > > > the sender continues to write to the buffer.
> > > > > See CursorWindow documentation in Android for more details:
> > > > > https://developer.android.com/reference/android/database/CursorWindow
> > > > >
> > > > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> > > > > which prevents any future mmap and write syscalls from succeeding while
> > > > > keeping the existing mmap active.
> > > >
> > > > Please CC linux-api@ on patches like this. If you had done that, I
> > > > might have criticized your v1 patch instead of your v3 patch...
> > > >
> > > > > The following program shows the seal
> > > > > working in action:
> > > > [...]
> > > > > Cc: jreck(a)google.com
> > > > > Cc: john.stultz(a)linaro.org
> > > > > Cc: tkjos(a)google.com
> > > > > Cc: gregkh(a)linuxfoundation.org
> > > > > Cc: hch(a)infradead.org
> > > > > Reviewed-by: John Stultz <john.stultz(a)linaro.org>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
> > > > > ---
> > > > [...]
> > > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > > index 2bb5e257080e..5ba9804e9515 100644
> > > > > --- a/mm/memfd.c
> > > > > +++ b/mm/memfd.c
> > > > [...]
> > > > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
> > > > > }
> > > > > }
> > > > >
> > > > > + if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > > > + !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > > > + /*
> > > > > + * The FUTURE_WRITE seal also prevents growing and shrinking
> > > > > + * so we need them to be already set, or requested now.
> > > > > + */
> > > > > + int test_seals = (seals | *file_seals) &
> > > > > + (F_SEAL_GROW | F_SEAL_SHRINK);
> > > > > +
> > > > > + if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > > > + error = -EINVAL;
> > > > > + goto unlock;
> > > > > + }
> > > > > +
> > > > > + spin_lock(&file->f_lock);
> > > > > + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > > + spin_unlock(&file->f_lock);
> > > > > + }
> > > >
> > > > So you're fiddling around with the file, but not the inode? How are
> > > > you preventing code like the following from re-opening the file as
> > > > writable?
> > > >
> > > > $ cat memfd.c
> > > > #define _GNU_SOURCE
> > > > #include <unistd.h>
> > > > #include <sys/syscall.h>
> > > > #include <printf.h>
> > > > #include <fcntl.h>
> > > > #include <err.h>
> > > > #include <stdio.h>
> > > >
> > > > int main(void) {
> > > > int fd = syscall(__NR_memfd_create, "testfd", 0);
> > > > if (fd == -1) err(1, "memfd");
> > > > char path[100];
> > > > sprintf(path, "/proc/self/fd/%d", fd);
> > > > int fd2 = open(path, O_RDWR);
> > > > if (fd2 == -1) err(1, "reopen");
> > > > printf("reopen successful: %d\n", fd2);
> > > > }
> > > > $ gcc -o memfd memfd.c
> > > > $ ./memfd
> > > > reopen successful: 4
> > > > $
> > > >
> > > > That aside: I wonder whether a better API would be something that
> > > > allows you to create a new readonly file descriptor, instead of
> > > > fiddling with the writability of an existing fd.
> > >
> > > My favorite approach would be to forbid open() on memfds, hope that
> > > nobody notices the tiny API break, and then add an ioctl for "reopen
> > > this memfd with reduced permissions" - but that's just my personal
> > > opinion.
> >
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its
> > not an ABI break because this is a brand new seal. That seems the least
> > intrusive solution and it works. Do you mind testing it and I'll add your
> > and Tested-by to the new fix? The patch is based on top of this series.
> >
>
> Please don't forbid reopens entirely. You're taking a feature that works
> generally (reopens) and breaking it in one specific case (memfd write
> sealed files). The open modes are available in .open in the struct file:
> you can deny *only* opens for write instead of denying reopens generally.
Yes, as we discussed over chat already, I will implement it that way.
Also, let's continue to discuss the concerns Andy raised on the other thread.
thanks,
- Joel
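The approach agreed on above (deny only opens for write, not reopens in general) can be sketched as a userspace model. Everything here is an assumption made for illustration: model_memfd_open_check() is a hypothetical helper, not the kernel's .open hook, and returning -EPERM is chosen only to match the error this series uses for mmap and write.

```c
#include <errno.h>
#include <fcntl.h>

#define MODEL_F_SEAL_FUTURE_WRITE 0x0010 /* value from the patch under discussion */

/*
 * Userspace model: decide whether a reopen of a sealed memfd is allowed.
 * Read-only opens always pass; opens that request write access are
 * denied once the future-write seal is set.
 */
static int model_memfd_open_check(unsigned int seals, int open_flags)
{
	int acc = open_flags & O_ACCMODE;

	if ((seals & MODEL_F_SEAL_FUTURE_WRITE) &&
	    (acc == O_WRONLY || acc == O_RDWR))
		return -EPERM; /* assumed error code, see note above */

	return 0;
}
```

With such a check, the /proc/self/fd reopen from Jann's reproducer would still work with O_RDONLY but would fail with O_RDWR on a sealed file.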
From: Stanislav Fomichev <sdf(a)google.com>
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* fourth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
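The pin naming described in the third and fourth items can be sketched roughly as below. This is not libbpf code: the helper names and the example directory are hypothetical, and replacing '/' with '_' is an assumption about how a section name is turned into a usable pin name. The part taken from the cover letter is only the rule itself: a program with a single instance is pinned at <prog>, not <prog>/0.

```c
#include <stdio.h>
#include <string.h>

/*
 * Derive a pin name from a section name. Assumption: every '/' is
 * replaced with '_' so the name is a single path component.
 * Returns 0 on success, -1 if the name does not fit in 'out'.
 */
static int model_pin_name(const char *section, char *out, size_t len)
{
	char *p;
	int n = snprintf(out, len, "%s", section);

	if (n < 0 || (size_t)n >= len)
		return -1;
	for (p = out; (p = strchr(p, '/')) != NULL; p++)
		*p = '_';
	return 0;
}

/*
 * Build the pin path: <dir>/<name> for a single instance, otherwise
 * <dir>/<name>/<instance>. Returns 0 on success, -1 on truncation.
 */
static int model_pin_path(char *buf, size_t len, const char *dir,
			  const char *pin_name, int nr_instances, int instance)
{
	int n;

	if (nr_instances == 1)
		n = snprintf(buf, len, "%s/%s", dir, pin_name);
	else
		n = snprintf(buf, len, "%s/%s/%d", dir, pin_name, instance);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a section named "flow_dissector/0" and a hypothetical directory /sys/fs/bpf/flow, the single-instance case yields /sys/fs/bpf/flow/flow_dissector_0.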
Stanislav Fomichev (7):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
libbpf: add internal pin_name
bpftool: add loadall command
bpftool: add pinmaps argument to the load/loadall
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 42 +-
tools/bpf/bpftool/bash-completion/bpftool | 21 +-
tools/bpf/bpftool/common.c | 31 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 185 ++++++---
tools/lib/bpf/libbpf.c | 364 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 18 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 546 insertions(+), 120 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin which is now being
used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* fourth patch adds actual support to the bpftool
See fourth patch for the description/details.
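The "cleanup after partial failure" item follows a common unwind pattern, sketched here as a self-contained model. Nothing below is libbpf API: the state struct, the fail_at knob and the function names are hypothetical and exist only to show the control flow (pin items in order, and on the first failure unpin everything pinned so far, in reverse).

```c
#include <stddef.h>

#define MODEL_MAX_ITEMS 8

/* Hypothetical bookkeeping used only by this model. */
struct model_pin_state {
	int pinned[MODEL_MAX_ITEMS];	/* 1 if item i is currently pinned */
	size_t fail_at;			/* index whose pin attempt fails */
};

static int model_pin_one(struct model_pin_state *s, size_t i)
{
	if (i == s->fail_at)
		return -1;
	s->pinned[i] = 1;
	return 0;
}

static void model_unpin_one(struct model_pin_state *s, size_t i)
{
	s->pinned[i] = 0;
}

/* Pin items 0..n-1; on failure, undo the ones already pinned. */
static int model_pin_all(struct model_pin_state *s, size_t n)
{
	size_t i;

	if (n > MODEL_MAX_ITEMS)
		return -1;

	for (i = 0; i < n; i++)
		if (model_pin_one(s, i) != 0)
			goto err_unpin;
	return 0;

err_unpin:
	while (i-- > 0)
		model_unpin_one(s, i);
	return -1;
}
```

The point of the pattern is that a failed call leaves nothing half-pinned behind, so the caller can retry or report the error without manual cleanup.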
Stanislav Fomichev (4):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 36 ++-
tools/bpf/bpftool/bash-completion/bpftool | 6 +-
tools/bpf/bpftool/common.c | 30 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 112 ++++++--
tools/lib/bpf/libbpf.c | 258 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 11 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 368 insertions(+), 90 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
All,
Updating Kselftest wiki and providing links to overview and how-to documents
has been on my list of things to do for a while.
It is now updated with the current status and links to documents. I am planning
to write a detailed how-to blog/article.
https://kselftest.wiki.kernel.org
thanks,
-- Shuah
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin which is now being
used to attach all flow dissector progs/maps
* third patch adds actual support to the bpftool
See third patch for the description/details.
Stanislav Fomichev (3):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 26 +++--
tools/bpf/bpftool/bash-completion/bpftool | 2 +-
tools/bpf/bpftool/common.c | 30 +++---
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 94 ++++++++++++++-----
tools/lib/bpf/libbpf.c | 58 ++++++++++--
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
8 files changed, 151 insertions(+), 64 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds actual support to the bpftool
See second patch for the description/details.
Stanislav Fomichev (2):
selftests/bpf: rename flow dissector section to flow_dissector
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 16 ++-
tools/bpf/bpftool/common.c | 32 +++--
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 135 +++++++++++++++---
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
6 files changed, 143 insertions(+), 45 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
From: Randy Dunlap <rdunlap(a)infradead.org>
This is a small cleanup to kselftest.rst:
- Fix some language typos in the usage instructions.
- Change one non-ASCII space to an ASCII space.
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
---
Documentation/dev-tools/kselftest.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- linux-next-20181101.orig/Documentation/dev-tools/kselftest.rst
+++ linux-next-20181101/Documentation/dev-tools/kselftest.rst
@@ -9,7 +9,7 @@ and booting a kernel.
On some systems, hot-plug tests could hang forever waiting for cpu and
memory to be ready to be offlined. A special hot-plug target is created
-to run full range of hot-plug tests. In default mode, hot-plug tests run
+to run the full range of hot-plug tests. In default mode, hot-plug tests run
in safe mode with a limited scope. In limited mode, cpu-hotplug test is
run on a single cpu as opposed to all hotplug capable cpus, and memory
hotplug test is run on 2% of hotplug capable memory instead of 10%.
@@ -89,9 +89,9 @@ Note that some tests will require root p
Install selftests
=================
-You can use kselftest_install.sh tool installs selftests in default
-location which is tools/testing/selftests/kselftest or a user specified
-location.
+You can use the kselftest_install.sh tool to install selftests in the
+default location, which is tools/testing/selftests/kselftest, or in a
+user specified location.
To install selftests in default location::
@@ -109,7 +109,7 @@ Running installed selftests
Kselftest install as well as the Kselftest tarball provide a script
named "run_kselftest.sh" to run the tests.
-You can simply do the following to run the installed Kselftests. Please
+You can simply do the following to run the installed Kselftests. Please
note some tests will require root privileges::
$ cd kselftest
@@ -139,7 +139,7 @@ Contributing new tests (details)
default.
TEST_CUSTOM_PROGS should be used by tests that require custom build
- rule and prevent common build rule use.
+ rules and prevent common build rule use.
TEST_PROGS are for test shell scripts. Please ensure shell script has
its exec bit set. Otherwise, lib.mk run_tests will generate a warning.
Running a test with 200 threads can take a long time on machines with
a small number of CPUs.
Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Watson <davejwatson(a)fb.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Chris Lameter <cl(a)linux.com>
Cc: Russell King <linux(a)arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Maurer <bmaurer(a)fb.com>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
---
tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+ or MIT
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
EXTRA_ARGS=${@}
OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
REPS=1000
SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
function do_tests()
{
local i=0
while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
echo "Running test ${TEST_NAME[$i]}"
- ./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
echo "Running compare-twice test ${TEST_NAME[$i]}"
- ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
let "i++"
done
}
--
2.11.0
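For comparison, the same thread-count calculation can be written in C. This is a sketch and not part of the patch, which does the computation in the shell script by counting processor lines in /proc/cpuinfo; the helper name and the fallback to one CPU are assumptions.

```c
#include <unistd.h>

#define MODEL_THREADS_PER_CPU 6 /* multiplier stated in the commit message */

/* Return 6 threads per online CPU, assuming at least one CPU. */
static long model_rseq_thread_count(void)
{
	long cpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (cpus < 1)
		cpus = 1; /* assumed fallback when the count is unavailable */

	return MODEL_THREADS_PER_CPU * cpus;
}
```

On a machine with 4 online CPUs this gives 24 threads, which is the value the script would pass via -t.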
Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
(I forgot to Cc the BPF folks in v1, I've added them now.)
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 15 +++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 165 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Discussions around time virtualization have been going on for a long time.
The first attempt to implement a time namespace was made in 2006 by Jeff Dike.
Since then, the topic has come up on and off in various discussions.
There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.
“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)
The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but
their start points are not defined and are different for each running
system. When a container is migrated from one node to another, all
clocks have to be restored into consistent states; in other words, they
have to continue running from the same points where they have been
dumped.
The main idea behind this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests
time of a clock, a namespace offset is added to the current value of
this clock on a host and the sum is returned.
All offsets are placed on a separate page; this allows us to map it as
part of vvar into user processes and use the offsets from vdso calls.
Currently, offsets are implemented for the CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks.
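The offset arithmetic described above can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel or vdso implementation: the struct, its field names and both functions are hypothetical, and the "host" value is simply whatever clock_gettime() returns in the calling process.

```c
#define _GNU_SOURCE
#include <time.h>

#define MODEL_NSEC_PER_SEC 1000000000L

/* Hypothetical per-namespace offsets; field names are illustrative. */
struct timens_offsets_model {
	struct timespec monotonic;
	struct timespec boottime;
};

/* Add two timespecs and normalize tv_nsec into [0, 1e9). */
static struct timespec model_timespec_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= MODEL_NSEC_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= MODEL_NSEC_PER_SEC;
	} else if (r.tv_nsec < 0) {
		r.tv_sec--;
		r.tv_nsec += MODEL_NSEC_PER_SEC;
	}
	return r;
}

/* Read the host clock and add the namespace offset for that clock. */
static int model_ns_clock_gettime(clockid_t clk,
				  const struct timens_offsets_model *off,
				  struct timespec *out)
{
	struct timespec host;

	if (clock_gettime(clk, &host) != 0)
		return -1;

	switch (clk) {
	case CLOCK_MONOTONIC:
		*out = model_timespec_add(host, off->monotonic);
		break;
	case CLOCK_BOOTTIME:
		*out = model_timespec_add(host, off->boottime);
		break;
	default:
		*out = host; /* no offset modelled for other clocks */
		break;
	}
	return 0;
}
```

A restored container would pick the offsets so that host time plus offset continues from the value the clock had when it was dumped.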
Questions to discuss:
* Clone flags exhaustion. Currently there is only one unused clone flag
bit left, and it may be worth using it to extend the arguments of the clone
system call.
* Realtime clock implementation details:
Is having a simple offset enough?
What to do when date and time is changed on the host?
Is there a need to adjust vfs modification and creation times?
Implementation for adjtime() syscall.
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Adrian Reber <adrian(a)lisas.de>
Cc: Andrei Vagin <avagin(a)openvz.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Cyrill Gorcunov <gorcunov(a)openvz.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jeff Dike <jdike(a)addtoit.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: containers(a)lists.linux-foundation.org
Cc: criu(a)openvz.org
Cc: linux-api(a)vger.kernel.org
Cc: x86(a)kernel.org
Andrei Vagin (12):
ns: Introduce Time Namespace
timens: Add timens_offsets
timens: Introduce CLOCK_MONOTONIC offsets
timens: Introduce CLOCK_BOOTTIME offset
timerfd/timens: Take into account ns clock offsets
kernel: Take into account timens clock offsets in clock_nanosleep
x86/vdso/timens: Add offsets page in vvar
x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
posix-timers/timens: Take into account clock offsets
selftest/timens: Add test for timerfd
selftest/timens: Add test for clock_nanosleep
timens/selftest: Add timer offsets test
Dmitry Safonov (8):
timens: Shift /proc/uptime
x86/vdso: Restrict splitting vvar vma
x86/vdso: Purge timens page on setns()/unshare()/clone()
x86/vdso: Look for vvar vma to purge timens page
timens: Add align for timens_offsets
timens: Optimize zero-offsets
selftest: Add Time Namespace test for supported clocks
timens/selftest: Add procfs selftest
arch/Kconfig | 5 +
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vclock_gettime.c | 52 +++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 67 +++++++
arch/x86/include/asm/vdso.h | 2 +
fs/proc/namespaces.c | 3 +
fs/proc/uptime.c | 3 +
fs/timerfd.c | 16 +-
include/linux/nsproxy.h | 1 +
include/linux/proc_ns.h | 1 +
include/linux/time_namespace.h | 72 +++++++
include/linux/timens_offsets.h | 25 +++
include/linux/user_namespace.h | 1 +
include/uapi/linux/sched.h | 1 +
init/Kconfig | 8 +
kernel/Makefile | 1 +
kernel/fork.c | 3 +-
kernel/nsproxy.c | 19 +-
kernel/time/hrtimer.c | 8 +
kernel/time/posix-timers.c | 89 ++++++++-
kernel/time/posix-timers.h | 2 +
kernel/time_namespace.c | 230 +++++++++++++++++++++++
tools/testing/selftests/timens/.gitignore | 5 +
tools/testing/selftests/timens/Makefile | 6 +
tools/testing/selftests/timens/clock_nanosleep.c | 98 ++++++++++
tools/testing/selftests/timens/config | 1 +
tools/testing/selftests/timens/log.h | 21 +++
tools/testing/selftests/timens/procfs.c | 145 ++++++++++++++
tools/testing/selftests/timens/timens.c | 196 +++++++++++++++++++
tools/testing/selftests/timens/timer.c | 95 ++++++++++
tools/testing/selftests/timens/timerfd.c | 96 ++++++++++
33 files changed, 1272 insertions(+), 13 deletions(-)
create mode 100644 include/linux/time_namespace.h
create mode 100644 include/linux/timens_offsets.h
create mode 100644 kernel/time_namespace.c
create mode 100644 tools/testing/selftests/timens/.gitignore
create mode 100644 tools/testing/selftests/timens/Makefile
create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
create mode 100644 tools/testing/selftests/timens/config
create mode 100644 tools/testing/selftests/timens/log.h
create mode 100644 tools/testing/selftests/timens/procfs.c
create mode 100644 tools/testing/selftests/timens/timens.c
create mode 100644 tools/testing/selftests/timens/timer.c
create mode 100644 tools/testing/selftests/timens/timerfd.c
--
2.13.6
This series fixes issues I encountered building and running the
selftests on an Ubuntu Cosmic ppc64le system.
Joel Stanley (6):
selftests: powerpc/ptrace: Make tests build
selftests: powerpc/ptrace: Remove clean rule
selftests: powerpc/ptrace: Fix linking against pthread
selftests: powerpc/signal: Make tests build
selftests: powerpc/signal: Fix signal_tm CFLAGS
selftests: powerpc/pmu: Link ebb tests with -no-pie
tools/testing/selftests/powerpc/pmu/ebb/Makefile | 3 +++
tools/testing/selftests/powerpc/ptrace/Makefile | 11 ++++-------
tools/testing/selftests/powerpc/signal/Makefile | 9 +++------
3 files changed, 10 insertions(+), 13 deletions(-)
--
2.19.1
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver from staging in the future, while also
benefiting from using memfd and contributing to it. Note that staging
drivers are not ABI and can generally be removed at any time.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>

#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)

int memfd_create_region(const char *name, size_t size)
{
    int ret;
    int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);

    if (fd < 0) return fd;
    ret = ftruncate(fd, size);
    if (ret < 0) { close(fd); return ret; }
    return fd;
}

int main() {
    int ret, fd;
    void *addr, *addr2, *addr3, *addr1;

    ret = memfd_create_region("test_region", REGION_SIZE);
    printf("ret=%d\n", ret);
    fd = ret;

    // Create map
    addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        printf("map 0 failed\n");
    else
        printf("map 0 passed\n");

    if ((ret = write(fd, "test", 4)) != 4)
        printf("write failed even though no future-write seal "
               "(ret=%d errno =%d)\n", ret, errno);
    else
        printf("write passed\n");

    addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr1 == MAP_FAILED)
        perror("map 1 prot-write failed even though no seal\n");
    else
        printf("map 1 prot-write passed as expected\n");

    ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
                                 F_SEAL_GROW |
                                 F_SEAL_SHRINK);
    if (ret == -1)
        printf("fcntl failed, errno: %d\n", errno);
    else
        printf("future-write seal now active\n");

    if ((ret = write(fd, "test", 4)) != 4)
        printf("write failed as expected due to future-write seal\n");
    else
        printf("write passed (unexpected)\n");

    addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr2 == MAP_FAILED)
        perror("map 2 prot-write failed as expected due to seal\n");
    else
        printf("map 2 passed\n");

    addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (addr3 == MAP_FAILED)
        perror("map 3 failed\n");
    else
        printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
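The program above does not write through the pre-seal mapping after the seal is set. The sender/receiver case described earlier could be checked with a sketch like the one below. It is only an illustration: the function name `future_write_seal_demo` and the 4096-byte size are made up for this example, it assumes a libc that provides the `memfd_create()` wrapper, and it treats a kernel without the seal as "unsupported" rather than as a failure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010  /* value from the uapi hunk in this patch */
#endif

#define DEMO_SIZE 4096  /* arbitrary size chosen for this sketch */

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if the seal behaves as described, 1 if memfd or the seal is
 * not supported by the running kernel, and -1 on unexpected behaviour.
 */
int future_write_seal_demo(void)
{
    int result = -1;
    char *sender = MAP_FAILED, *receiver = MAP_FAILED;
    void *late;
    int fd = memfd_create("future_write_demo", MFD_ALLOW_SEALING);

    if (fd < 0)
        return 1;
    if (ftruncate(fd, DEMO_SIZE) < 0)
        goto out;

    /* Sender: writable mapping created before the seal is set. */
    sender = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (sender == MAP_FAILED)
        goto out;

    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_FUTURE_WRITE | F_SEAL_GROW | F_SEAL_SHRINK) < 0) {
        result = (errno == EINVAL) ? 1 : -1;  /* EINVAL: seal unknown */
        goto out;
    }

    /* Receiver: a read-only mapping is still allowed after the seal. */
    receiver = mmap(NULL, DEMO_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (receiver == MAP_FAILED)
        goto out;

    /* New writable mappings and write(2) must now be refused. */
    late = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (late != MAP_FAILED) {
        munmap(late, DEMO_SIZE);
        goto out;
    }
    if (write(fd, "x", 1) >= 0)
        goto out;

    /* The pre-seal mapping keeps working and the receiver sees the data. */
    memcpy(sender, "hello", 6);
    if (memcmp(receiver, "hello", 6) == 0)
        result = 0;
out:
    if (receiver != MAP_FAILED)
        munmap(receiver, DEMO_SIZE);
    if (sender != MAP_FAILED)
        munmap(sender, DEMO_SIZE);
    close(fd);
    return result;
}
```

Calling `future_write_seal_demo()` from a small `main()` and printing the return value is enough to try it. In a real sender/receiver setup the read-only mapping would be created in another process after passing the fd, which this single-process sketch does not attempt.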
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previously set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
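The grow/shrink requirement from the v3 notes above can be modelled in userspace roughly as follows. This is a sketch of the rule only, not the kernel code: `check_future_write_request` and the `SEAL_*` names are invented for the example, with bit values copied from the uapi header touched by this patch.

```c
#include <errno.h>

/* Seal bit values as listed in include/uapi/linux/fcntl.h in this patch. */
#define SEAL_SHRINK        0x0002
#define SEAL_GROW          0x0004
#define SEAL_FUTURE_WRITE  0x0010

/*
 * Hypothetical userspace model of the v3 precondition: FUTURE_WRITE may
 * be newly added only if GROW and SHRINK are both already set on the
 * file or are part of the same request.
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 */
int check_future_write_request(unsigned int existing, unsigned int requested)
{
    const unsigned int size_seals = SEAL_GROW | SEAL_SHRINK;

    if ((requested & SEAL_FUTURE_WRITE) && !(existing & SEAL_FUTURE_WRITE)) {
        if (((requested | existing) & size_seals) != size_seals)
            return -EINVAL;
    }
    return 0;
}
```

For instance, requesting only the future-write seal on an unsealed file is rejected, while requesting it together with both size seals, or after both were set earlier, is accepted.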
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
 		}
 	}
+	if ((seals & F_SEAL_FUTURE_WRITE) &&
+	    !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+		/*
+		 * The FUTURE_WRITE seal also prevents growing and shrinking
+		 * so we need them to be already set, or requested now.
+		 */
+		int test_seals = (seals | *file_seals) &
+				 (F_SEAL_GROW | F_SEAL_SHRINK);
+
+		if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+			error = -EINVAL;
+			goto unlock;
+		}
+
+		spin_lock(&file->f_lock);
+		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+		spin_unlock(&file->f_lock);
+	}
+
 	*file_seals |= seals;
 	error = 0;
--
2.19.1.331.ge82ca0e54c-goog
arm64 has a feature called Top Byte Ignore, which allows pointer tags to
be embedded in the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
When passing tagged pointers to syscalls, there's a special case of such a
pointer being passed to one of the memory syscalls (mmap, mprotect, etc.).
These syscalls don't do memory accesses but rather deal with memory
ranges, hence an untagged pointer is better suited.
This patchset extends tagged pointer support to non-memory syscalls. This
is done by reusing the untagged_addr macro to untag user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok).
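To make the idea of tagging and untagging concrete, here is a small userspace model working on plain 64-bit integers. It is a sketch under assumptions: `set_tag` and `untag_addr` are names made up for this example, and untagging is modelled as sign-extension from bit 55, which is my understanding of what the arm64 untagged_addr macro does rather than something stated in this cover letter.

```c
#include <stdint.h>

#define TAG_SHIFT 56  /* the tag occupies bits 63:56 */

/* Place an 8-bit tag in the top byte of an address (illustrative helper). */
uint64_t set_tag(uint64_t addr, uint8_t tag)
{
    return (addr & ~(UINT64_C(0xff) << TAG_SHIFT)) |
           ((uint64_t)tag << TAG_SHIFT);
}

/*
 * Strip the tag by sign-extending from bit 55: addresses with bit 55
 * clear (userspace) get a zero top byte, addresses with bit 55 set
 * (kernel) get an all-ones top byte.
 */
uint64_t untag_addr(uint64_t addr)
{
    uint64_t low = addr & ((UINT64_C(1) << TAG_SHIFT) - 1);  /* bits 55:0 */

    if (addr & (UINT64_C(1) << 55))
        return low | (UINT64_C(0xff) << TAG_SHIFT);
    return low;
}
```

With these helpers, tagging 0x0000007fdeadbeef with 0x2a gives 0x2a00007fdeadbeef, and untagging that value returns the original address, which is the form a range check such as access_ok would want to see.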
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
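The dynamic check in item 2 can be pictured with a toy lookup over address ranges. Everything here is hypothetical scaffolding for illustration: `struct range`, `demo_ranges` and `lookup_range` are invented, `has_tag` is written the way I would expect such a helper to look for user addresses, and a distinct return code stands in for the BUG_ON().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Sample table, sorted by address; purely illustrative. */
static const struct range demo_ranges[] = {
    { 0x1000, 0x2000 },
    { 0x4000, 0x5000 },
};
#define DEMO_COUNT (sizeof(demo_ranges) / sizeof(demo_ranges[0]))

/* A user address carries a tag if its top byte (bits 63:56) is non-zero. */
static bool has_tag(uint64_t addr)
{
    return (addr >> 56) != 0;
}

/*
 * Returns the index of the first range whose end lies above addr,
 * -1 if there is none, and -2 if a tagged address reached the lookup.
 * The -2 return plays the role of the BUG_ON() in the test setup.
 */
int lookup_range(uint64_t addr)
{
    size_t i;

    if (has_tag(addr))
        return -2;
    for (i = 0; i < DEMO_COUNT; i++)
        if (addr < demo_ranges[i].end)
            return (int)i;
    return -1;
}
```

A caller that forgets to untag before the lookup is caught immediately by the -2 return, which mirrors how a fuzzer passing tagged pointers would expose missing untagging.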
Based on the results of the testing the required patches have been added
to the patchset.
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [3].
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion in the replies to v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Andrey Konovalov (8):
arm64: add type casts to untagged_addr macro
uaccess: add untagged_addr definition for other arches
arm64: untag user addresses in access_ok and __uaccess_mask_ptr
mm, arm64: untag user addresses in mm/gup.c
lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
fs, arm64: untag user address in copy_mount_options
arm64: update Documentation/arm64/tagged-pointers.txt
selftests, arm64: add a selftest for passing tagged pointers to kernel
Documentation/arm64/tagged-pointers.txt | 24 +++++++++++--------
arch/arm64/include/asm/uaccess.h | 14 +++++++----
fs/namespace.c | 2 +-
include/linux/uaccess.h | 4 ++++
lib/strncpy_from_user.c | 2 ++
lib/strnlen_user.c | 2 ++
mm/gup.c | 4 ++++
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++++++
.../testing/selftests/arm64/run_tags_test.sh | 12 ++++++++++
tools/testing/selftests/arm64/tags_test.c | 19 +++++++++++++++
11 files changed, 79 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.19.0.605.g01d371f741-goog
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they can test all code paths easily, solving the classic problem of
exercising error-handling code.
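As a rough illustration of testing a unit in isolation, the plain C sketch below replaces a dependency with a fake so that both the normal path and the error path can be exercised deterministically. It does not use the KUnit API; all names (`sensor_ops`, `is_overheated`, `fake_sensor`, `run_case`) are hypothetical and exist only for this example.

```c
#include <errno.h>

/* Dependency seam: the unit under test reads a sensor only through this op. */
struct sensor_ops {
    int (*read_temp)(void *ctx, int *out_millicelsius);
};

/*
 * Unit under test: returns 1 if the temperature exceeds the limit,
 * 0 if it does not, or a negative errno if the read fails.
 */
int is_overheated(const struct sensor_ops *ops, void *ctx, int limit)
{
    int temp;
    int err = ops->read_temp(ctx, &temp);

    if (err)
        return err;  /* error-handling path */
    return temp > limit;
}

/* Fake dependency fully controlled by the test. */
struct fake_sensor {
    int temp;
    int err;
};

static int fake_read_temp(void *ctx, int *out_millicelsius)
{
    struct fake_sensor *fake = ctx;

    if (fake->err)
        return fake->err;
    *out_millicelsius = fake->temp;
    return 0;
}

static const struct sensor_ops fake_ops = { .read_temp = fake_read_temp };

/* Runs one test case against the fake and returns the unit's result. */
int run_case(int temp, int err, int limit)
{
    struct fake_sensor fake = { .temp = temp, .err = err };

    return is_overheated(&fake_ops, &fake, limit);
}
```

Because the fake can be told to fail, the error path is as easy to reach as the success path: `run_case(0, -EIO, 85000)` returns `-EIO` without any hardware or special setup.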
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
--
2.19.1.331.ge82ca0e54c-goog