When running the openat2 test suite on ARM64 platform, we got below failure,
since the definition of the O_LARGEFILE is different on ARM64. So we can
set the correct O_LARGEFILE definition on ARM64 to fix this issue.
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
---
tools/testing/selftests/openat2/openat2_test.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index d7ec1e7..1bddbe9 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -22,7 +22,11 @@
* XXX: This is wrong on {mips, parisc, powerpc, sparc}.
*/
#undef O_LARGEFILE
+#ifdef __aarch64__
+#define O_LARGEFILE 0x20000
+#else
#define O_LARGEFILE 0x8000
+#endif
struct open_how_ext {
struct open_how inner;
--
1.8.3.1
This patchset is an implementation of futex2 interface on top of existing
futex.c code.
* What happened to the current futex()?
The futex() is implemented using a multiplexed interface that doesn't
scale well and gives headaches to people. We don't want to add more
features there.
* New features at futex2()
** NUMA-awareness
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** Variable sized futexes
Futexes are used to implement atomic operations in userspace.
Supporting 8, 16, 32 and 64 bit sized futexes allows user libraries to
implement all those sizes in a performant way. Thanks Boost devs for
feedback: https://lists.boost.org/Archives/boost/2021/05/251508.php
Embedded systems or anything with memory constrains could benefit of
using smaller sizes for the futex userspace integer.
** Wait on multiple futexes
Proton's (a set of compatibility tools to run Windows games) fork of Wine
benefits of this feature to implement WaitForMultipleObjects from Win32 in
a performant way. Native game engines will benefit from this as well,
given that this is a common wait pattern for games.
* The interface
The new interface has one syscall per operation as opposed to the
current multiplexing one. The details can be found in the following
patches, but this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- The following features will be implemented in next patchset versions:
- Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* The patchset
Given that futex2 reuses futex code, the patches make futex.c functions
public and modify them as needed.
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements 32bit wait/wake
- Patches 2-3: Implement waitv and requeue.
- Patch 4: Add a documentation file which details the interface and
the internal implementation.
- Patches 5-10: Selftests for all operations along with perf
support for futex2.
- Patch 11: Proof of concept of waking threads at waitpid(), not to be
merged as it is.
* Testing
** Stability
- glibc[1]: nptl's low level locking was modified to use futex2 API
(except for PI). All nptl/ tests passed.
- Proton's Wine: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
[1] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2-dev
** Performance
- Using perf, no significant difference was measured when comparing
futex() and futex2() for the following benchmarks: hash, wake and
wake-parallel.
- I measured a 15% overhead for the perf's requeue benchmark, comparing
futex2() to futex(). Requeue patch provides more details about why this
happens and how to overcome this.
* Changelog
Changes from v4:
- Use existing futex.c code when possible
- Cleaned up cover letter, check v4 for a more verbose version
v4: https://lore.kernel.org/lkml/20210603195924.361327-1-andrealmeid@collabora.…
André Almeida (11):
futex2: Implement wait and wake functions
futex2: Implement vectorized wait
futex2: Implement requeue operation
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 185 ++++++
Documentation/locking/index.rst | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
include/linux/compat.h | 23 +
include/linux/futex.h | 103 ++++
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/unistd.h | 11 +-
include/uapi/linux/futex.h | 27 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex.c | 111 +---
kernel/futex2.c | 566 ++++++++++++++++++
kernel/sys_ni.c | 9 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 ++
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 6 +-
.../futex/functional/futex2_requeue.c | 164 +++++
.../selftests/futex/functional/futex2_wait.c | 195 ++++++
.../selftests/futex/functional/futex2_waitv.c | 154 +++++
.../futex/functional/futex_wait_timeout.c | 24 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 112 ++++
32 files changed, 1865 insertions(+), 134 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.32.0
Introduce selftests to validate the functionality of KSM. The tests are
run on private anonymous pages. Since some KSM tunables are modified,
their starting values are saved and restored after testing. At the
start, run is set to 2 to ensure that only test pages will be merged (we
assume that no applications make madvise syscalls in the background). If
KSM config not enabled, all tests will be skipped.
Zhansaya Bagdauletkyzy (4):
selftests: vm: add KSM merge test
selftests: vm: add KSM unmerge test
selftests: vm: add KSM zero page merging test
selftests: vm: add KSM merging across nodes test
v1 -> v2:
- add a test to check KSM unmerging
- add a test to check merging of zero pages
- add a test to check merging in different NUMA nodes
- include command line options for each test
- new options to specify use_zero_pages and merge_across_nodes
- run each test case in run_vmtests.sh
- add some helper functions to make the code more compact:
allocate_memory(), ksm_do_scan(), ksm_merge_pages()
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +
tools/testing/selftests/vm/ksm_tests.c | 516 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 96 ++++
4 files changed, 616 insertions(+)
create mode 100644 tools/testing/selftests/vm/ksm_tests.c
--
2.25.1
This patch series add support for unix stream type
for sockmap. Sockmap already supports TCP, UDP,
unix dgram types. The unix stream support is similar
to unix dgram.
Also add selftests for unix stream type in sockmap tests.
Jiang Wang (5):
af_unix: add read_sock for stream socket types
af_unix: add unix_stream_proto for sockmap
selftest/bpf: add tests for sockmap with unix stream type.
selftest/bpf: change udp to inet in some function names
selftest/bpf: add new tests in sockmap for unix stream to tcp.
include/net/af_unix.h | 8 +-
net/core/sock_map.c | 8 +-
net/unix/af_unix.c | 89 ++++++++++++++++--
net/unix/unix_bpf.c | 93 ++++++++++++++-----
.../selftests/bpf/prog_tests/sockmap_listen.c | 48 ++++++----
5 files changed, 194 insertions(+), 52 deletions(-)
--
2.20.1
v3:
- Add two new patches (patches 2 & 3) to fix bugs found during the
testing process.
- Add a new patch to enable inotify event notification when partition
become invalid.
- Add a test to test event notification when partition become invalid.
v2:
- Drop v1 patch 1.
- Break out some cosmetic changes into a separate patch (patch #1).
- Add a new patch to clarify the transition to invalid partition root
is mainly caused by hotplug events.
- Enhance the partition root state test including CPU online/offline
behavior and fix issues found by the test.
This patchset fixes two bugs and makes four enhancements to the cpuset
v2 code.
Bug fixes:
Patch 2: Fix a hotplug handling bug when just all cpus in subparts_cpus
are offlined.
Patch 3: Fix violation of cpuset locking rule.
Enhancements:
Patch 4: Enable event notification on "cpuset.cpus.partition" when
a partition become invalid.
Patch 5: Clarify the use of invalid partition root and add new checks
to make sure that normal cpuset control file operations will not be
allowed to create invalid partition root. It also fixes some of the
issues in existing code.
Patch 6: Add a new partition state "isolated" to create a partition
root without load balancing. This is for handling intermitten workloads
that have a strict low latency requirement.
Patch 7: Allow partition roots that are not the top cpuset to distribute
all its cpus to child partitions as long as there is no task associated
with that partition root. This allows more flexibility for middleware
to manage multiple partitions.
Patch 8 updates the cgroup-v2.rst file accordingly. Patch 9 adds a new
cpuset test to test the new cpuset partition code.
Waiman Long (9):
cgroup/cpuset: Miscellaneous code cleanup
cgroup/cpuset: Fix a partition bug with hotplug
cgroup/cpuset: Fix violation of cpuset locking rule
cgroup/cpuset: Enable event notification when partition become invalid
cgroup/cpuset: Clarify the use of invalid partition root
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Allow non-top parent partition root to distribute out
all CPUs
cgroup/cpuset: Update description of cpuset.cpus.partition in
cgroup-v2.rst
kselftest/cgroup: Add cpuset v2 partition root state test
Documentation/admin-guide/cgroup-v2.rst | 94 ++-
kernel/cgroup/cpuset.c | 360 +++++++---
tools/testing/selftests/cgroup/Makefile | 5 +-
.../selftests/cgroup/test_cpuset_prs.sh | 626 ++++++++++++++++++
tools/testing/selftests/cgroup/wait_inotify.c | 67 ++
5 files changed, 1007 insertions(+), 145 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c
--
2.18.1
Current PTP driver exposes one PTP device to user which binds network
interface/interfaces to provide timestamping. Actually we have a way
utilizing timecounter/cyclecounter to virtualize any number of PTP
clocks based on a same free running physical clock for using.
The purpose of having multiple PTP virtual clocks is for user space
to directly/easily use them for multiple domains synchronization.
user
space: ^ ^
| SO_TIMESTAMPING new flag: | Packets with
| SOF_TIMESTAMPING_BIND_PHC | TX/RX HW timestamps
v v
+--------------------------------------------+
sock: | sock (new member sk_bind_phc) |
+--------------------------------------------+
^ ^
| ethtool_get_phc_vclocks | Convert HW timestamps
| | to sk_bind_phc
v v
+--------------+--------------+--------------+
vclock: | ptp1 | ptp2 | ptpN |
+--------------+--------------+--------------+
pclock: | ptp0 free running |
+--------------------------------------------+
The block diagram may explain how it works. Besides the PTP virtual
clocks, the packet HW timestamp converting to the bound PHC is also
done in sock driver. For user space, PTP virtual clocks can be
created via sysfs, and extended SO_TIMESTAMPING API (new flag
SOF_TIMESTAMPING_BIND_PHC) can be used to bind one PTP virtual clock
for timestamping.
The test tool timestamping.c (together with linuxptp phc_ctl tool) can
be used to verify:
# echo 4 > /sys/class/ptp/ptp0/n_vclocks
[ 129.399472] ptp ptp0: new virtual clock ptp2
[ 129.404234] ptp ptp0: new virtual clock ptp3
[ 129.409532] ptp ptp0: new virtual clock ptp4
[ 129.413942] ptp ptp0: new virtual clock ptp5
[ 129.418257] ptp ptp0: guarantee physical clock free running
#
# phc_ctl /dev/ptp2 set 10000
# phc_ctl /dev/ptp3 set 20000
#
# timestamping eno0 2 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 2 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 3 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 3 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
Changes for v2:
- Converted to num_vclocks for creating virtual clocks.
- Guranteed physical clock free running when using virtual
clocks.
- Fixed build warning.
- Updated copyright.
Changes for v3:
- Supported PTP virtual clock in default in PTP driver.
- Protected concurrency of ptp->num_vclocks accessing.
- Supported PHC vclocks query via ethtool.
- Extended SO_TIMESTAMPING API for PHC binding.
- Converted HW timestamps to PHC bound, instead of previous
binding domain value to PHC idea.
- Other minor fixes.
Changes for v4:
- Used do_aux_work callback for vclock refreshing instead.
- Used unsigned int for vclocks number, and max_vclocks
for limitiation.
- Fixed mutex locking.
- Dynamically allocated memory for vclock index storage.
- Removed ethtool ioctl command for vclocks getting.
- Updated doc for ethtool phc vclocks get.
- Converted to mptcp_setsockopt_sol_socket_timestamping().
- Passed so_timestamping for sock_set_timestamping.
- Fixed checkpatch/build.
- Other minor fixed.
Changes for v5:
- Fixed checkpatch/build/bug reported by test robot.
Yangbo Lu (11):
ptp: add ptp virtual clock driver framework
ptp: support ptp physical/virtual clocks conversion
ptp: track available ptp vclocks information
ptp: add kernel API ptp_get_vclocks_index()
ethtool: add a new command for getting PHC virtual clocks
ptp: add kernel API ptp_convert_timestamp()
mptcp: setsockopt: convert to
mptcp_setsockopt_sol_socket_timestamping()
net: sock: extend SO_TIMESTAMPING for PHC binding
net: socket: support hardware timestamp conversion to PHC bound
selftests/net: timestamping: support binding PHC
MAINTAINERS: add entry for PTP virtual clock driver
Documentation/ABI/testing/sysfs-ptp | 20 ++
Documentation/networking/ethtool-netlink.rst | 22 ++
MAINTAINERS | 7 +
drivers/ptp/Makefile | 2 +-
drivers/ptp/ptp_clock.c | 42 +++-
drivers/ptp/ptp_private.h | 39 ++++
drivers/ptp/ptp_sysfs.c | 160 ++++++++++++++
drivers/ptp/ptp_vclock.c | 219 +++++++++++++++++++
include/linux/ethtool.h | 10 +
include/linux/ptp_clock_kernel.h | 31 ++-
include/net/sock.h | 8 +-
include/uapi/linux/ethtool_netlink.h | 15 ++
include/uapi/linux/net_tstamp.h | 17 +-
net/core/sock.c | 65 +++++-
net/ethtool/Makefile | 2 +-
net/ethtool/common.c | 14 ++
net/ethtool/netlink.c | 10 +
net/ethtool/netlink.h | 2 +
net/ethtool/phc_vclocks.c | 94 ++++++++
net/mptcp/sockopt.c | 68 ++++--
net/socket.c | 19 +-
tools/testing/selftests/net/timestamping.c | 55 +++--
22 files changed, 867 insertions(+), 54 deletions(-)
create mode 100644 drivers/ptp/ptp_vclock.c
create mode 100644 net/ethtool/phc_vclocks.c
base-commit: b6df00789e2831fff7a2c65aa7164b2a4dcbe599
--
2.25.1
This patch set depends on:
- https://lore.kernel.org/linux-integrity/20210723085304.1760138-1-roberto.sa…
- https://lore.kernel.org/linux-integrity/20210705115650.3373599-1-roberto.sa…
I still kept pointer math to optimize the size of the digest_list_item_ref
structure. Replacing offsets with pointers would cause the size of the
structure to double. I could do this in the next version of the patch set
if the size change is acceptable.
Digest Lists Integrity Module (DIGLIM) is a new component added to the
integrity subsystem in the kernel, primarily aiming to aid Integrity
Measurement Architecture (IMA) in the process of checking the integrity
of file content and metadata. It accomplishes this task by storing
reference values coming from software vendors and by reporting whether
or not the digest of file content or metadata calculated by IMA (or EVM)
is found among those values. In this way, IMA can decide, depending on
the result of a query, if a measurement should be taken or access to the
file should be granted. The Security Assumptions section explains more
in detail why this component has been placed in the kernel.
The main benefits of using IMA in conjunction with DIGLIM are the
ability to implement advanced remote attestation schemes based on the
usage of a TPM key for establishing a TLS secure channel [1][2], and to
reduce the burden on Linux distribution vendors to extend secure boot at
OS level to applications.
DIGLIM does not have the complexity of feature-rich databases. In fact,
its main functionality comes from the hash table primitives already in
the kernel. It does not have an ad-hoc storage module, it just indexes
data in a fixed format (digest lists, a set of concatenated digests
preceded by a header), copied to kernel memory as they are. Lastly, it
does not support database-oriented languages such as SQL, but only
accepts a digest and its algorithm as a query.
The only digest list format supported by DIGLIM is called compact.
However, Linux distribution vendors don't have to generate new digest
lists in this format for the packages they release, as already available
information, such as RPM headers and DEB package metadata, can be
already used as a source for reference values (they already include file
digests), with a user space parser taking care of the conversion to the
compact format.
Although one might perceive that storing file or metadata digests for a
Linux distribution would significantly increase the memory usage, this
does not seem to be the case. As an anticipation of the evaluation done
in the Preliminary Performance Evaluation section, protecting binaries
and shared libraries of a minimal Fedora 33 installation requires 208K
of memory for the digest lists plus 556K for indexing.
In exchange for a slightly increased memory usage, DIGLIM improves the
performance of the integrity subsystem. In the considered scenario, IMA
measurement and appraisal with digest lists requires respectively less
than one quarter and less than half the time, compared to the current
solution.
DIGLIM also keeps track of whether digest lists have been processed in
some way (e.g. measured or appraised by IMA). This is important for
example for remote attestation, so that remote verifiers understand what
has been uploaded to the kernel.
DIGLIM behaves like a transactional database, i.e. it has the ability to
roll back to the beginning of the transaction if an error occurred
during the addition of a digest list (the deletion operation always
succeeds). This capability has been tested with an ad-hoc fault
injection mechanism capable of simulating failures during the
operations.
Finally, DIGLIM exposes to user space, through securityfs, the digest
lists currently loaded, the number of digests added, a query interface
and an interface to set digest list labels.
[1] LSS EU 2019
- slides:
https://static.sched.com/hosted_files/lsseu2019/bd/secure_attested_communic…
- video: https://youtu.be/mffdQgkvDNY
[2] FutureTPM EU project, final review meeting demo
- slides:
https://futuretpm.eu/images/07-3-FutureTPM-Final-Review-Slides-WP6-Device-M…
- video: https://vimeo.com/528251864/4c1d55abcd
Binary Integrity
Integrity is a fundamental security property in information systems.
Integrity could be described as the condition in which a generic
component is just after it has been released by the entity that created
it.
One way to check whether a component is in this condition (called binary
integrity) is to calculate its digest and to compare it with a reference
value (i.e. the digest calculated in controlled conditions, when the
component is released).
IMA, a software part of the integrity subsystem, can perform such
evaluation and execute different actions:
- store the digest in an integrity-protected measurement list, so that
it can be sent to a remote verifier for analysis;
- compare the calculated digest with a reference value (usually
protected with a signature) and deny operations if the file is found
corrupted;
- store the digest in the system log.
Contribution
DIGLIM further enhances the capabilities offered by IMA-based solutions
and, at the same time, makes them more practical to adopt by reusing
existing sources as reference values for integrity decisions.
Possible sources for digest lists are:
- RPM headers;
- Debian repository metadata.
Benefits for IMA Measurement
One of the issues that arises when files are measured by the OS is that,
due to parallel execution, the order in which file accesses happen
cannot be predicted. Since the TPM Platform Configuration Register (PCR)
extend operation, executed after each file measurement,
cryptographically binds the current measurement to the previous ones,
the PCR value at the end of a workload cannot be predicted too.
Thus, even if the usage of a TPM key, bound to a PCR value, should be
allowed when only good files were accessed, the TPM could unexpectedly
deny an operation on that key if files accesses did not happen as stated
by the key policy (which allows only one of the possible sequences).
DIGLIM solves this issue by making the PCR value stable over the time
and not dependent on file accesses. The following figure depicts the
current and the new approaches:
IMA measurement list (current)
entry# 1st boot 2nd boot 3rd boot
+----+---------------+ +----+---------------+ +----+---------------+
1: | 10 | file1 measur. | | 10 | file3 measur. | | 10 | file2 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
2: | 10 | file2 measur. | | 10 | file2 measur. | | 10 | file3 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
3: | 10 | file3 measur. | | 10 | file1 measur. | | 10 | file4 measur. |
+----+---------------+ +----+---------------+ +----+---------------+
PCR: Extend != Extend != Extend
file1, file2, file3 file3, file2, file1 file2, file3, file4
PCR Extend definition:
PCR(new value) = Hash(Hash(meas. entry), PCR(previous value))
A new entry in the measurement list is created by IMA for each file
access. Assuming that file1, file2 and file3 are files provided by the
software vendor, file4 is an unknown file, the first two PCR values
above represent a good system state, the third a bad system state. The
PCR values are the result of the PCR extend operation performed for each
measurement entry with the digest of the measurement entry as an input.
IMA measurement list (with DIGLIM)
dlist
+--------------+
| header |
+--------------+
| file1 digest |
| file2 digest |
| file3 digest |
+--------------+
dlist is a digest list containing the digest of file1, file2 and file3.
In the intended scenario, it is generated by a software vendor at the
end of the building process, and retrieved by the administrator of the
system where the digest list is loaded.
entry# 1st boot 2nd boot 3rd boot
+----+---------------+ +----+---------------+ +----+---------------+
0: | 11 | dlist measur. | | 11 | dlist measur. | | 11 | dlist measur. |
+----+---------------+ +----+---------------+ +----+---------------+
1: < file1 measur. skip > < file3 measur. skip > < file2 measur. skip >
2: < file2 measur. skip > < file2 measur. skip > < file3 measur. skip >
+----+---------------+
3: < file3 measur. skip > < file1 measur. skip > | 11 | file4 measur. |
+----+---------------+
PCR: Extend = Extend != Extend
dlist dlist dlist, file4
The first entry in the measurement list contains the digest of the
digest list uploaded to the kernel at kernel initialization time.
When a file is accessed, IMA queries DIGLIM with the calculated file
digest and, if it is found, IMA skips the measurement.
Thus, the only information sent to remote verifiers are: the list of
files that could possibly be accessed (from the digest list), but not if
they were accessed and when; the measurement of unknown files.
Despite providing less information, this solution has the advantage that
the good system state (i.e. when only file1, file2 and file3 are
accessed) now can be represented with a deterministic PCR value (the PCR
is extended only with the measurement of the digest list). Also, the bad
system state can still be distinguished from the good state (the PCR is
extended also with the measurement of file4).
If a TPM key is bound to the good PCR value, the TPM would allow the key
to be used if file1, file2 or file3 are accessed, regardless of the
sequence in which they are accessed (the PCR value does not change), and
would revoke the permission when the unknown file4 is accessed (the PCR
value changes). If a system is able to establish a TLS connection with a
peer, this implicitly means that the system was in a good state (i.e.
file4 was not accessed, otherwise the TPM would have denied the usage of
the TPM key due to the key policy).
Benefits for IMA Appraisal
Extending secure boot to applications means being able to verify the
provenance of files accessed. IMA does it by verifying file signatures
with a key that it trusts, which requires Linux distribution vendors to
additionally include in the package header a signature for each file
that must be verified (there is the dedicated RPMTAG_FILESIGNATURES
section in the RPM header).
The proposed approach would be instead to verify data provenance from
already available metadata (file digests) in existing packages. IMA
would verify the signature of package metadata and search file digests
extracted from package metadata and added to the hash table in the
kernel.
For RPMs, file digests can be found in the RPMTAG_FILEDIGESTS section of
RPMTAG_IMMUTABLE, whose signature is in RPMTAG_RSAHEADER. For DEBs, file
digests (unsafe to use due to a weak digest algorithm) can be found in
the md5sum file, which can be indirectly verified from Release.gpg.
The following figure highlights the differences between the current and
the proposed approach.
IMA appraisal (current solution, with file signatures):
appraise
+-----------+
V |
+-------------------------+-----+ +-------+-----+ |
| RPM header | | ima rpm | file1 | sig | |
| ... | | plugin +-------+-----+ +-----+
| file1 sig [to be added] | sig |--------> ... | IMA |
| ... | | +-------+-----+ +-----+
| fileN sig [to be added] | | | fileN | sig |
+-------------------------+-----+ +-------+-----+
In this case, file signatures must be added to the RPM header, so that
the ima rpm plugin can extract them together with the file content. The
RPM header signature is not used.
IMA appraisal (with DIGLIM):
kernel hash table
with RPM header content
+---+ +--------------+
| |--->| file1 digest |
+---+ +--------------+
...
+---+ appraise (file1)
| | <--------------+
+----------------+-----+ +---+ |
| RPM header | | ^ |
| ... | | digest_list | |
| file1 digest | sig | rpm plugin | +-------+ +-----+
| ... | |-------------+--->| file1 | | IMA |
| fileN digest | | +-------+ +-----+
+----------------+-----+ |
^ |
+------------------------------------+
appraise (RPM header)
In this case, the RPM header is used as it is, and its signature is used
for IMA appraisal. Then, the digest_list rpm plugin executes the user
space parser to parse the RPM header and add the extracted digests to an
hash table in the kernel. IMA appraisal of the files in the RPM package
consists in searching their digest in the hash table.
Other than reusing available information as digest list, another
advantage is the lower computational overhead compared to the solution
with file signatures (only one signature verification for many files and
digest lookup, instead of per file signature verification, see
Preliminary Performance Evaluation for more details).
Lifecycle
The lifecycle of DIGLIM is represented in the following figure:
Vendor premises (release process with modifications):
+------------+ +-----------------------+ +------------------------+
| 1. build a | | 2. generate and sign | | 3. publish the package |
| package |-->| a digest list from |-->| and digest list in |
| | | packaged files | | a repository |
+------------+ +-----------------------+ +------------------------+
|
|
User premises: |
V
+---------------------+ +------------------------+ +-----------------+
| 6. use digest lists | | 5. download the digest | | 4. download and |
| for measurement |<--| list and upload to |<--| install the |
| and/or appraisal | | the kernel | | package |
+---------------------+ +------------------------+ +-----------------+
The figure above represents all the steps when a digest list is
generated separately. However, as mentioned in Contribution, in most
cases existing packages can be already used as a source for digest
lists, limiting the effort for software vendors.
If, for example, RPMs are used as a source for digest lists, the figure
above becomes:
Vendor premises (release process without modifications):
+------------+ +------------------------+
| 1. build a | | 2. publish the package |
| package |-->| in a repository |---------------------+
| | | | |
+------------+ +------------------------+ |
|
|
User premises: |
V
+---------------------+ +------------------------+ +-----------------+
| 5. use digest lists | | 4. extract digest list | | 3. download and |
| for measurement |<--| from the package |<--| install the |
| and/or appraisal | | and upload to the | | package |
| | | kernel | | |
+---------------------+ +------------------------+ +-----------------+
Step 4 can be performed with the digest_list rpm plugin and the user
space parser, without changes to rpm itself.
Security Assumptions
As mentioned in the Introduction, DIGLIM will be primarily used in
conjunction with IMA to enforce a mandatory policy on all user space
processes, including those owned by root. Even root, in a system with a
locked-down kernel, cannot affect the enforcement of the mandatory
policy or, if changes are permitted, it cannot do so without being
detected.
Given that the target of the enforcement are user space processes,
DIGLIM cannot be placed in the target, as a Mandatory Access Control
(MAC) design is required to have the components responsible to enforce
the mandatory policy separated from the target.
While locking-down a system and limiting actions with a mandatory policy
is generally perceived by users as an obstacle, it has noteworthy
benefits for the users themselves.
First, it would timely block attempts by malicious software to steal or
misuse user assets. Although users could query the package managers to
detect them, detection would happen after the fact, or it wouldn't
happen at all if the malicious software tampered with package managers.
With a mandatory policy enforced by the kernel, users would still be
able to decide which software they want to be executed except that,
unlike package managers, the kernel is not affected by user space
processes or root.
Second, it might make systems more easily verifiable from outside, due
to the limited actions the system allows. When users connect to a
server, not only they would be able to verify the server identity, which
is already possible with communication protocols like TLS, but also if
the software running on that server can be trusted to handle their
sensitive data.
Adoption
A former version of DIGLIM is used in the following OSes:
- openEuler 20.09
https://github.com/openeuler-mirror/kernel/tree/openEuler-20.09
- openEuler 21.03
https://github.com/openeuler-mirror/kernel/tree/openEuler-21.03
Originally, DIGLIM was part of IMA (known as IMA Digest Lists). In this
version, it has been redesigned as a standalone module with an API that
makes its functionality accessible by IMA and, eventually, other
subsystems.
User Space Support
Digest lists can be generated and managed with digest-list-tools:
https://github.com/openeuler-mirror/digest-list-tools
It includes two main applications:
- gen_digest_lists: generates digest lists from files in the
filesystem or from the RPM database (more digest list sources can be
supported);
- manage_digest_lists: converts and uploads digest lists to the
kernel.
Integration with rpm is done with the digest_list plugin:
https://gitee.com/src-openeuler/rpm/blob/master/Add-digest-list-plugin.patch
This plugin writes the RPM header and its signature to a file, so that
the file is ready to be appraised by IMA, and calls the user space
parser to convert and upload the digest list to the kernel.
Simple Usage Example (Tested with Fedora 33)
1. Digest list generation (RPM headers and their signature are copied
to the specified directory):
# mkdir /etc/digest_lists
# gen_digest_lists -t file -f rpm+db -d /etc/digest_lists -o add
2. Digest list upload with the user space parser:
# manage_digest_lists -p add-digest -d /etc/digest_lists
3. First digest list query:
# echo sha256-$(sha256sum /bin/cat) > /sys/kernel/security/integrity/diglim/digest_query
# cat /sys/kernel/security/integrity/diglim/digest_query
sha256-[...]-0-file_list-rpm-coreutils-8.32-18.fc33.x86_64 (actions: 0): version: 1, algo: sha256, type: 2, modifiers: 1, count: 106, datalen: 3392
4. Second digest list query:
# echo sha256-$(sha256sum /bin/zip) > /sys/kernel/security/integrity/diglim/digest_query
# cat /sys/kernel/security/integrity/diglim/digest_query
sha256-[...]-0-file_list-rpm-zip-3.0-27.fc33.x86_64 (actions: 0): version: 1, algo: sha256, type: 2, modifiers: 1, count: 4, datalen: 128
Preliminary Performance Evaluation
This section provides an initial estimation of the overhead introduced
by DIGLIM. The estimation has been performed on a Fedora 33 virtual
machine with 1447 packages installed. The virtual machine has 16 vCPU
(host CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores) and 2G of RAM
(host memory: 64G). The virtual machine also has a vTPM with libtpms and
swtpm as backend.
After writing the RPM headers to files, the size of the directory
containing them is 36M.
After converting the RPM headers to the compact digest list, the size of
the data being uploaded to the kernel is 3.6M.
The time to load the entire RPM database is 0.628s.
After loading the digest lists to the kernel, the slab usage due to
indexing is (obtained with slab_nomerge in the kernel command line):
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
118144 118144 100% 0,03K 923 128 3692K digest_list_item_ref_cache
102400 102400 100% 0,03K 800 128 3200K digest_item_cache
2646 2646 100% 0,09K 63 42 252K digest_list_item_cache
The stats, obtained from the digests_count interface, introduced later,
are:
Parser digests: 0
File digests: 99100
Metadata digests: 0
Digest list digests: 1423
On this installation, this would be the worst case in which all files
are measured and/or appraised, which is currently not recommended
without enforcing an integrity policy protecting mutable files. Infoflow
LSM is a component to accomplish this task:
https://patchwork.kernel.org/project/linux-integrity/cover/20190818235745.1…
The first manageable goal of IMA with DIGLIM is to use an execution
policy, with measurement and/or appraisal of files executed or mapped in
memory as executable (in addition to kernel modules and firmware). In
this case, the digest list contains the digest only for those files. The
numbers above change as follows.
After converting the RPM headers to the compact digest list, the size of
the data being uploaded to the kernel is 208K.
The time to load the digest of binaries and shared libraries is 0.062s.
After loading the digest lists to the kernel, the slab usage due to
indexing is:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
7168 7168 100% 0,03K 56 128 224K digest_list_item_ref_cache
7168 7168 100% 0,03K 56 128 224K digest_item_cache
1134 1134 100% 0,09K 27 42 108K digest_list_item_cache
The stats, obtained from the digests_count interface, are:
Parser digests: 0
File digests: 5986
Metadata digests: 0
Digest list digests: 1104
Comparison with IMA
This section compares the performance between the current solution for
IMA measurement and appraisal, and IMA with DIGLIM.
Workload A (without DIGLIM):
1. cat file[0-5985] > /dev/null
Workload B (with DIGLIM):
1. echo $PWD/0-file_list-compact-file[0-1103] >
<securityfs>/integrity/diglim/digest_list_add
2. cat file[0-5985] > /dev/null
Workload A execution time without IMA policy:
real 0m0,155s
user 0m0,008s
sys 0m0,066s
Measurement
IMA policy:
measure fowner=2000 func=FILE_CHECK mask=MAY_READ use_diglim=allow pcr=11 ima_template=ima-sig
use_diglim is a policy keyword not yet supported by IMA.
Workload A execution time with IMA and 5986 files with signature
measured:
real 0m8,273s
user 0m0,008s
sys 0m2,537s
Workload B execution time with IMA, 1104 digest lists with signature
measured and uploaded to the kernel, and 5986 files with signature
accessed but not measured (due to the file digest being found in the
hash table):
real 0m1,837s
user 0m0,036s
sys 0m0,583s
Appraisal
IMA policy:
appraise fowner=2000 func=FILE_CHECK mask=MAY_READ use_diglim=allow
use_diglim is a policy keyword not yet supported by IMA.
Workload A execution time with IMA and 5986 files with file signature
appraised:
real 0m2,197s
user 0m0,011s
sys 0m2,022s
Workload B execution time with IMA, 1104 digest lists with signature
appraised and uploaded to the kernel, and with 5986 files with signature
not verified (due to the file digest being found in the hash table):
real 0m0,982s
user 0m0,020s
sys 0m0,865s
Changelog
v1:
- remove 'ima: Add digest, algo, measured parameters to
ima_measure_critical_data()', replaced by:
https://lore.kernel.org/linux-integrity/20210705090922.3321178-1-roberto.sa…
- add 'Lifecycle' subsection to better clarify how digest lists are
generated and used (suggested by Greg KH)
- remove 'Possible Usages' subsection and add 'Benefits for IMA
Measurement' and 'Benefits for IMA Appraisal' subsubsections
- add 'Preliminary Performance Evaluation' subsection
- declare digest_offset and hdr_offset in the digest_list_item_ref
structure as u32 (sufficient for digest lists of 4G) to make room for a
list_head structure (digest_list_item_ref size: 32)
- implement digest list reference management with a linked list instead of
an array
- reorder structure members for better alignment (suggested by Mauro)
- rename digest_lookup() to __digest_lookup() (suggested by Mauro)
- introduce an object cache for each defined structure
- replace atomic_long_t with unsigned long in h_table structure definition
(suggested by Greg KH)
- remove GPL2 license text and file names (suggested by Greg KH)
- ensure that the _reserved field of compact_list_hdr is equal to zero
(suggested by Greg KH)
- dynamically allocate the buffer in digest_lists_show_htable_len() to
avoid frame size warning (reported by kernel test robot, dynamic
allocation suggested by Mauro)
- split documentation in multiple files and reference the source code
(suggested by Mauro)
- use #ifdef in include/linux/diglim.h
- improve generation of event name for IMA measurements
- add new patch to introduce the 'Remote Attestation' section in the
documentation
- fix assignment of actions variable in digest_list_read() and
digest_list_write()
- always release dentry reference when digest_list_get_secfs_files() is
called
- rewrite add/del and query interfaces to take advantage of m->private
- prevent deletion of a digest list only if there are actions done at
addition time that are not currently being performed
- fix doc warnings (replace Returns with Return:)
- perform queries of digest list digests in the existing tests
- add new tests: digest_list_add_del_test_file_upload_measured,
digest_list_check_measurement_list_test_file_upload and
digest_list_check_measurement_list_test_buffer_upload
- don't return a value from digest_del(), digest_list_ref_del, and
digest_list_del()
- improve Makefile for tests
Roberto Sassu (12):
diglim: Overview
diglim: Basic definitions
diglim: Objects
diglim: Methods
diglim: Parser
diglim: Interfaces - digest_list_add, digest_list_del
diglim: Interfaces - digest_lists_loaded
diglim: Interfaces - digest_label
diglim: Interfaces - digest_query
diglim: Interfaces - digests_count
diglim: Remote Attestation
diglim: Tests
.../security/diglim/architecture.rst | 45 +
.../security/diglim/implementation.rst | 255 +++
Documentation/security/diglim/index.rst | 14 +
.../security/diglim/introduction.rst | 631 ++++++++
.../security/diglim/remote_attestation.rst | 87 ++
Documentation/security/diglim/tests.rst | 66 +
Documentation/security/index.rst | 1 +
MAINTAINERS | 19 +
include/linux/diglim.h | 28 +
include/linux/kernel_read_file.h | 1 +
include/uapi/linux/diglim.h | 51 +
security/integrity/Kconfig | 1 +
security/integrity/Makefile | 1 +
security/integrity/diglim/Kconfig | 11 +
security/integrity/diglim/Makefile | 8 +
security/integrity/diglim/diglim.h | 157 ++
security/integrity/diglim/fs.c | 782 ++++++++++
security/integrity/diglim/methods.c | 499 ++++++
security/integrity/diglim/parser.c | 274 ++++
security/integrity/integrity.h | 4 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/diglim/Makefile | 19 +
tools/testing/selftests/diglim/common.c | 115 ++
tools/testing/selftests/diglim/common.h | 31 +
tools/testing/selftests/diglim/config | 3 +
tools/testing/selftests/diglim/selftest.c | 1382 +++++++++++++++++
26 files changed, 4486 insertions(+)
create mode 100644 Documentation/security/diglim/architecture.rst
create mode 100644 Documentation/security/diglim/implementation.rst
create mode 100644 Documentation/security/diglim/index.rst
create mode 100644 Documentation/security/diglim/introduction.rst
create mode 100644 Documentation/security/diglim/remote_attestation.rst
create mode 100644 Documentation/security/diglim/tests.rst
create mode 100644 include/linux/diglim.h
create mode 100644 include/uapi/linux/diglim.h
create mode 100644 security/integrity/diglim/Kconfig
create mode 100644 security/integrity/diglim/Makefile
create mode 100644 security/integrity/diglim/diglim.h
create mode 100644 security/integrity/diglim/fs.c
create mode 100644 security/integrity/diglim/methods.c
create mode 100644 security/integrity/diglim/parser.c
create mode 100644 tools/testing/selftests/diglim/Makefile
create mode 100644 tools/testing/selftests/diglim/common.c
create mode 100644 tools/testing/selftests/diglim/common.h
create mode 100644 tools/testing/selftests/diglim/config
create mode 100644 tools/testing/selftests/diglim/selftest.c
--
2.25.1