There is a potential for us to hit a type conflict when including
netinet/tcp.h with sys/socket.h, we can remove these as they are not
actually needed.
Fixes errors like:
In file included from /usr/include/netinet/tcp.h:91,
from progs/bind4_prog.c:10:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/bind4_prog.c:9:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
Changes v1 -> v2:
- just remove netinet/tcp.h and sys/socket.h
---
tools/testing/selftests/bpf/progs/bind4_prog.c | 2 --
tools/testing/selftests/bpf/progs/bind6_prog.c | 2 --
2 files changed, 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bind4_prog.c b/tools/testing/selftests/bpf/progs/bind4_prog.c
index 474c6a62078a..a487f60b73ac 100644
--- a/tools/testing/selftests/bpf/progs/bind4_prog.c
+++ b/tools/testing/selftests/bpf/progs/bind4_prog.c
@@ -6,8 +6,6 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
#include <linux/if.h>
#include <errno.h>
diff --git a/tools/testing/selftests/bpf/progs/bind6_prog.c b/tools/testing/selftests/bpf/progs/bind6_prog.c
index c19cfa869f30..d62cd9e9cf0e 100644
--- a/tools/testing/selftests/bpf/progs/bind6_prog.c
+++ b/tools/testing/selftests/bpf/progs/bind6_prog.c
@@ -6,8 +6,6 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
#include <linux/if.h>
#include <errno.h>
--
2.34.1
Attestation is used to verify the TDX guest trustworthiness to other
entities before provisioning secrets to the guest. For example, a key
server may request for attestation before releasing the encryption keys
to mount the encrypted rootfs or secondary drive.
During the TDX guest launch, the initial contents (including the
firmware image) and configuration of the guest are recorded by the
Intel TDX module in build time measurement register (MRTD). After TDX
guest is created, run-time measurement registers (RTMRs) can be used by
the guest software to extend the measurements. TDX supports 4 RTMR
registers, and TDG.MR.RTMR.EXTEND TDCALL is used to update the RTMR
registers securely. RTMRs are mainly used to record measurements
related to sections like the kernel image, command line parameters,
initrd, ACPI tables, firmware data, configuration firmware volume (CFV)
of TDVF, etc. For complete details, please refer to TDX Virtual
Firmware design specification, sec titled "TD Measurement".
At TDX guest runtime, the Intel TDX module reuses the Intel SGX
attestation infrastructure to provide support for attesting to these
measurements as described below.
The attestation process consists of two steps: TDREPORT generation and
Quote generation.
TDREPORT (TDREPORT_STRUCT) is a fixed-size data structure generated by
the TDX module which contains guest-specific information (such as build
and boot measurements), platform security version, and the MAC to
protect the integrity of the TDREPORT. The guest kernel uses
TDCALL[TDG.MR.REPORT] to get the TDREPORT from the TDX module. A
user-provided 64-Byte REPORTDATA is used as input and included in the
TDREPORT. Typically it can be some nonce provided by attestation
service so the TDREPORT can be verified uniquely. More details about
the TDREPORT can be found in Intel TDX Module specification, section
titled "TDG.MR.REPORT Leaf".
TDREPORT by design can only be verified on the local platform as the
MAC key is bound to the platform. To support remote verification of
the TDREPORT, TDX leverages Intel SGX Quote Enclave (QE) to verify
the TDREPORT locally and convert it to a remote verifiable Quote.
After getting the TDREPORT, the second step of the attestation process
is to send it to the QE to generate the Quote. TDX doesn't support SGX
inside the guest, so the QE can be deployed in the host, or in another
legacy VM with SGX support. QE checks the integrity of TDREPORT and if
it is valid, a certified quote signing key is used to sign the Quote.
How to send the TDREPORT to QE and receive the Quote is implementation
and deployment specific.
Implement a basic guest misc driver to allow userspace to get the
TDREPORT. After getting TDREPORT, the userspace attestation software
can choose whatever communication channel available (i.e. vsock or
hypercall) to send the TDREPORT to QE and receive the Quote.
Also note that explicit access permissions are not enforced in this
driver because the quote and measurements are not a secret. However
the access permissions of the device node can be used to set any
desired access policy. The udev default is usually root access
only.
Operations like getting TDREPORT or Quote generation involves sending
a blob of data as input and getting another blob of data as output. It
was considered to use a sysfs interface for this, but it doesn't fit
well into the standard sysfs model for configuring values. It would be
possible to do read/write on files, but it would need multiple file
descriptors, which would be somewhat messy. IOCTLs seems to be the best
fitting and simplest model for this use case. This is similar to AMD
SEV platform, which also uses IOCTL interface to support attestation.
Any distribution enabling TDX is also expected to need attestation. So
enable it by default with TDX guest support.
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: Andi Kleen <ak(a)linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Kai Huang <kai.huang(a)intel.com>
Acked-by: Wander Lairson Costa <wander(a)redhat.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
---
Changes since v10:
* Replaced TD/TD Guest usage with TDX Guest or Guest.
* Removed unnecessary comments.
* Added more validation to user input in tdx_get_report().
* Used u64_to_user_ptr when reading user u64 pointers.
* Fixed commit log as per review comments.
Changes since v9:
* Dropped the cover letter. Since this patch set only adds
TDREPORT support, the commit log itself has all the required details.
* Dropped the Quote support and event IRQ support as per Dave's
review suggestion.
* Dropped attest.c and moved its contents to tdx.c
* Updated commit log and comments to reflect latest changes.
Changes since v8:
* Please refer to https://lore.kernel.org/all/ \
20220728034420.648314-1-sathyanarayanan.kuppuswamy(a)linux.intel.com/
arch/x86/coco/tdx/tdx.c | 114 ++++++++++++++++++++++++++++++++
arch/x86/include/uapi/asm/tdx.h | 51 ++++++++++++++
2 files changed, 165 insertions(+)
create mode 100644 arch/x86/include/uapi/asm/tdx.h
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 928dcf7a20d9..0888bdf93a4e 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -5,16 +5,21 @@
#define pr_fmt(fmt) "tdx: " fmt
#include <linux/cpufeature.h>
+#include <linux/miscdevice.h>
+#include <linux/mm.h>
+#include <linux/io.h>
#include <asm/coco.h>
#include <asm/tdx.h>
#include <asm/vmx.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
+#include <uapi/asm/tdx.h>
/* TDX module Call Leaf IDs */
#define TDX_GET_INFO 1
#define TDX_GET_VEINFO 3
+#define TDX_GET_REPORT 4
#define TDX_ACCEPT_PAGE 6
/* TDX hypercall Leaf IDs */
@@ -34,6 +39,10 @@
#define VE_GET_PORT_NUM(e) ((e) >> 16)
#define VE_IS_IO_STRING(e) ((e) & BIT(4))
+#define DRIVER_NAME "tdx-guest"
+
+static struct miscdevice tdx_misc_dev;
+
/*
* Wrapper for standard use of __tdx_hypercall with no output aside from
* return code.
@@ -775,3 +784,108 @@ void __init tdx_early_init(void)
pr_info("Guest detected\n");
}
+
+static long tdx_get_report(void __user *argp)
+{
+ u8 *reportdata, *tdreport;
+ struct tdx_report_req req;
+ long ret;
+
+ if (copy_from_user(&req, argp, sizeof(req)))
+ return -EFAULT;
+
+ /*
+ * Per TDX Module 1.0 specification, section titled
+ * "TDG.MR.REPORT", REPORTDATA length is fixed as
+ * TDX_REPORTDATA_LEN, TDREPORT length is fixed as
+ * TDX_REPORT_LEN, and TDREPORT subtype is fixed as
+ * 0. Also check for valid user pointers.
+ */
+ if (!req.reportdata || !req.tdreport || req.subtype ||
+ req.rpd_len != TDX_REPORTDATA_LEN ||
+ req.tdr_len != TDX_REPORT_LEN)
+ return -EINVAL;
+
+ reportdata = kmalloc(req.rpd_len, GFP_KERNEL);
+ if (!reportdata)
+ return -ENOMEM;
+
+ tdreport = kzalloc(req.tdr_len, GFP_KERNEL);
+ if (!tdreport) {
+ kfree(reportdata);
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(reportdata, u64_to_user_ptr(req.reportdata),
+ req.rpd_len)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * Generate TDREPORT using "TDG.MR.REPORT" TDCALL.
+ *
+ * Get the TDREPORT using REPORTDATA as input. Refer to
+ * section 22.3.3 TDG.MR.REPORT leaf in the TDX Module 1.0
+ * Specification for detailed information.
+ */
+ ret = __tdx_module_call(TDX_GET_REPORT, virt_to_phys(tdreport),
+ virt_to_phys(reportdata), req.subtype,
+ 0, NULL);
+ if (ret) {
+ ret = -EIO;
+ goto out;
+ }
+
+ if (copy_to_user(u64_to_user_ptr(req.tdreport), tdreport, req.tdr_len))
+ ret = -EFAULT;
+
+out:
+ kfree(reportdata);
+ kfree(tdreport);
+ return ret;
+}
+static long tdx_guest_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+ long ret = -EINVAL;
+
+ switch (cmd) {
+ case TDX_CMD_GET_REPORT:
+ ret = tdx_get_report(argp);
+ break;
+ default:
+ pr_debug("cmd %d not supported\n", cmd);
+ break;
+ }
+
+ return ret;
+}
+
+static const struct file_operations tdx_guest_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = tdx_guest_ioctl,
+ .llseek = no_llseek,
+};
+
+static int __init tdx_guest_init(void)
+{
+ int ret;
+
+ if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+ return -EIO;
+
+ tdx_misc_dev.name = DRIVER_NAME;
+ tdx_misc_dev.minor = MISC_DYNAMIC_MINOR;
+ tdx_misc_dev.fops = &tdx_guest_fops;
+
+ ret = misc_register(&tdx_misc_dev);
+ if (ret) {
+ pr_err("misc device registration failed\n");
+ return ret;
+ }
+
+ return 0;
+}
+device_initcall(tdx_guest_init)
diff --git a/arch/x86/include/uapi/asm/tdx.h b/arch/x86/include/uapi/asm/tdx.h
new file mode 100644
index 000000000000..c1667b20fe20
--- /dev/null
+++ b/arch/x86/include/uapi/asm/tdx.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_ASM_X86_TDX_H
+#define _UAPI_ASM_X86_TDX_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* Length of the REPORTDATA used in TDG.MR.REPORT TDCALL */
+#define TDX_REPORTDATA_LEN 64
+
+/* Length of TDREPORT used in TDG.MR.REPORT TDCALL */
+#define TDX_REPORT_LEN 1024
+
+/**
+ * struct tdx_report_req: Get TDREPORT using REPORTDATA as input.
+ *
+ * @subtype : Subtype of TDREPORT (fixed as 0 by TDX Module
+ * specification, but added a parameter to handle
+ * future extension).
+ * @reportdata : User-defined REPORTDATA to be included into
+ * TDREPORT. Typically it can be some nonce
+ * provided by attestation service, so the
+ * generated TDREPORT can be uniquely verified.
+ * @rpd_len : Length of the REPORTDATA (fixed as 64 bytes by
+ * the TDX Module specification, but parameter is
+ * added to handle future extension).
+ * @tdreport : TDREPORT output from TDCALL[TDG.MR.REPORT].
+ * @tdr_len : Length of the TDREPORT (fixed as 1024 bytes by
+ * the TDX Module specification, but a parameter
+ * is added to accommodate future extension).
+ *
+ * Used in TDX_CMD_GET_REPORT IOCTL request.
+ */
+struct tdx_report_req {
+ __u8 subtype;
+ __u64 reportdata;
+ __u32 rpd_len;
+ __u64 tdreport;
+ __u32 tdr_len;
+};
+
+/*
+ * TDX_CMD_GET_REPORT - Get TDREPORT using TDCALL[TDG.MR.REPORT]
+ *
+ * Return 0 on success, -EIO on TDCALL execution failure, and
+ * standard errno on other general error cases.
+ *
+ */
+#define TDX_CMD_GET_REPORT _IOWR('T', 0x01, __u64)
+
+#endif /* _UAPI_ASM_X86_TDX_H */
--
2.25.1
There is a potential for us to hit a type conflict when including
netinet/tcp.h and sys/socket.h, we can replace both of these includes
with linux/tcp.h and bpf_tcp_helpers.h to avoid this conflict.
Fixes the following error:
In file included from /usr/include/netinet/tcp.h:91,
from progs/connect4_prog.c:11:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/connect4_prog.c:10:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
Changes v1 -> v2:
- use bpf_tcp_helpers.h so we can use SOL_TCP
---
tools/testing/selftests/bpf/progs/connect4_prog.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/connect4_prog.c b/tools/testing/selftests/bpf/progs/connect4_prog.c
index b241932911db..23d85f5027d3 100644
--- a/tools/testing/selftests/bpf/progs/connect4_prog.c
+++ b/tools/testing/selftests/bpf/progs/connect4_prog.c
@@ -7,13 +7,13 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
+#include <linux/tcp.h>
#include <linux/if.h>
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
+#include "bpf_tcp_helpers.h"
#define SRC_REWRITE_IP4 0x7f000004U
#define DST_REWRITE_IP4 0x7f000001U
--
2.34.1
There is a potential for us to hit a type conflict when including
netinet/tcp.h with sys/socket.h, we can replace both of these includes
with linux/tcp.h to avoid this conflict.
Fixes errors like:
In file included from /usr/include/netinet/tcp.h:91,
from progs/bind4_prog.c:10:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/bind4_prog.c:9:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
make: *** [Makefile:537: /home/buildroot/bpf-next/tools/testing/selftests/bpf/bpf_gcc/bind4_prog.o] Error 1
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
tools/testing/selftests/bpf/progs/bind4_prog.c | 3 +--
tools/testing/selftests/bpf/progs/bind6_prog.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bind4_prog.c b/tools/testing/selftests/bpf/progs/bind4_prog.c
index 474c6a62078a..6bd20042fd53 100644
--- a/tools/testing/selftests/bpf/progs/bind4_prog.c
+++ b/tools/testing/selftests/bpf/progs/bind4_prog.c
@@ -6,8 +6,7 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
+#include <linux/tcp.h>
#include <linux/if.h>
#include <errno.h>
diff --git a/tools/testing/selftests/bpf/progs/bind6_prog.c b/tools/testing/selftests/bpf/progs/bind6_prog.c
index c19cfa869f30..f37617b35a55 100644
--- a/tools/testing/selftests/bpf/progs/bind6_prog.c
+++ b/tools/testing/selftests/bpf/progs/bind6_prog.c
@@ -6,8 +6,7 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
+#include <linux/tcp.h>
#include <linux/if.h>
#include <errno.h>
--
2.34.1
There is a potential for us to hit a type conflict when including
netinet/tcp.h and sys/socket.h, we can replace both of these includes
with linux/tcp.h to avoid this conflict.
We also need to replace SOL_TCP with equivalent IPPROTO_TCP.
Fixes the following error:
In file included from /usr/include/netinet/tcp.h:91,
from progs/connect4_prog.c:11:
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:34:23: error: conflicting types for 'int8_t'; have 'char'
34 | typedef __INT8_TYPE__ int8_t;
| ^~~~~~
In file included from /usr/include/x86_64-linux-gnu/sys/types.h:155,
from /usr/include/x86_64-linux-gnu/bits/socket.h:29,
from /usr/include/x86_64-linux-gnu/sys/socket.h:33,
from progs/connect4_prog.c:10:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:24:18: note: previous declaration of 'int8_t' with type 'int8_t' {aka 'signed char'}
24 | typedef __int8_t int8_t;
| ^~~~~~
/home/buildroot/opt/cross/lib/gcc/bpf/13.0.0/include/stdint.h:43:24: error: conflicting types for 'int64_t'; have 'long int'
43 | typedef __INT64_TYPE__ int64_t;
| ^~~~~~~
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous declaration of 'int64_t' with type 'int64_t' {aka 'long long int'}
27 | typedef __int64_t int64_t;
| ^~~~~~~
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
.../selftests/bpf/progs/connect4_prog.c | 21 +++++++++----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/connect4_prog.c b/tools/testing/selftests/bpf/progs/connect4_prog.c
index b241932911db..0f68b8d756b3 100644
--- a/tools/testing/selftests/bpf/progs/connect4_prog.c
+++ b/tools/testing/selftests/bpf/progs/connect4_prog.c
@@ -7,8 +7,7 @@
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/in6.h>
-#include <sys/socket.h>
-#include <netinet/tcp.h>
+#include <linux/tcp.h>
#include <linux/if.h>
#include <errno.h>
@@ -52,7 +51,7 @@ static __inline int verify_cc(struct bpf_sock_addr *ctx,
char buf[TCP_CA_NAME_MAX];
int i;
- if (bpf_getsockopt(ctx, SOL_TCP, TCP_CONGESTION, &buf, sizeof(buf)))
+ if (bpf_getsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, &buf, sizeof(buf)))
return 1;
for (i = 0; i < TCP_CA_NAME_MAX; i++) {
@@ -70,12 +69,12 @@ static __inline int set_cc(struct bpf_sock_addr *ctx)
char reno[TCP_CA_NAME_MAX] = "reno";
char cubic[TCP_CA_NAME_MAX] = "cubic";
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_CONGESTION, &reno, sizeof(reno)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, &reno, sizeof(reno)))
return 1;
if (verify_cc(ctx, reno))
return 1;
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_CONGESTION, &cubic, sizeof(cubic)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_CONGESTION, &cubic, sizeof(cubic)))
return 1;
if (verify_cc(ctx, cubic))
return 1;
@@ -113,15 +112,15 @@ static __inline int set_keepalive(struct bpf_sock_addr *ctx)
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one)))
return 1;
if (ctx->type == SOCK_STREAM) {
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_KEEPIDLE, &one, sizeof(one)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_KEEPIDLE, &one, sizeof(one)))
return 1;
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_KEEPINTVL, &one, sizeof(one)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_KEEPINTVL, &one, sizeof(one)))
return 1;
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_KEEPCNT, &one, sizeof(one)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_KEEPCNT, &one, sizeof(one)))
return 1;
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_SYNCNT, &one, sizeof(one)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_SYNCNT, &one, sizeof(one)))
return 1;
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_USER_TIMEOUT, &one, sizeof(one)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_USER_TIMEOUT, &one, sizeof(one)))
return 1;
}
if (bpf_setsockopt(ctx, SOL_SOCKET, SO_KEEPALIVE, &zero, sizeof(zero)))
@@ -135,7 +134,7 @@ static __inline int set_notsent_lowat(struct bpf_sock_addr *ctx)
int lowat = 65535;
if (ctx->type == SOCK_STREAM) {
- if (bpf_setsockopt(ctx, SOL_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat)))
+ if (bpf_setsockopt(ctx, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat)))
return 1;
}
--
2.34.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
One of the desirable features in security is the ability to restrict import
of data to a given system based on data authenticity. If data import can be
restricted, it would be possible to enforce a system-wide policy based on
the signing keys the system owner trusts.
This feature is widely used in the kernel. For example, if the restriction
is enabled, kernel modules can be plugged in only if they are signed with a
key whose public part is in the primary or secondary keyring.
For eBPF, it can be useful as well. For example, it might be useful to
authenticate data an eBPF program makes security decisions on.
After a discussion in the eBPF mailing list, it was decided that the stated
goal should be accomplished by introducing four new kfuncs:
bpf_lookup_user_key() and bpf_lookup_system_key(), for retrieving a keyring
with keys trusted for signature verification, respectively from its serial
and from a pre-determined ID; bpf_key_put(), to release the reference
obtained with the former two kfuncs, bpf_verify_pkcs7_signature(), for
verifying PKCS#7 signatures.
Other than the key serial, bpf_lookup_user_key() also accepts key lookup
flags, that influence the behavior of the lookup. bpf_lookup_system_key()
accepts pre-determined IDs defined in include/linux/verification.h.
bpf_key_put() accepts the new bpf_key structure, introduced to tell whether
the other structure member, a key pointer, is valid or not. The reason is
that verify_pkcs7_signature() also accepts invalid pointers, set with the
pre-determined ID, to select a system-defined keyring. key_put() must be
called only for valid key pointers.
Since the two key lookup functions allocate memory and one increments a key
reference count, they must be used in conjunction with bpf_key_put(). The
latter must be called only if the lookup functions returned a non-NULL
pointer. The verifier denies the execution of eBPF programs that don't
respect this rule.
The two key lookup functions should be used in alternative, depending on
the use case. While bpf_lookup_user_key() provides great flexibility, it
seems suboptimal in terms of security guarantees, as even if the eBPF
program is assumed to be trusted, the serial used to obtain the key pointer
might come from untrusted user space not choosing one that the system
administrator approves to enforce a mandatory policy.
bpf_lookup_system_key() instead provides much stronger guarantees,
especially if the pre-determined ID is not passed by user space but is
hardcoded in the eBPF program, and that program is signed. In this case,
bpf_verify_pkcs7_signature() will always perform signature verification
with a key that the system administrator approves, i.e. the primary,
secondary or platform keyring.
Nevertheless, key permission checks need to be done accurately. Since
bpf_lookup_user_key() cannot determine how a key will be used by other
kfuncs, it has to defer the permission check to the actual kfunc using the
key. It does it by calling lookup_user_key() with KEY_DEFER_PERM_CHECK as
needed permission. Later, bpf_verify_pkcs7_signature(), if called,
completes the permission check by calling key_validate(). It does not need
to call key_task_permission() with permission KEY_NEED_SEARCH, as it is
already done elsewhere by the key subsystem. Future kfuncs using the
bpf_key structure need to implement the proper checks as well.
Finally, the last kfunc, bpf_verify_pkcs7_signature(), accepts the data and
signature to verify as eBPF dynamic pointers, to minimize the number of
kfunc parameters, and the keyring with keys for signature verification as a
bpf_key structure, returned by one of the two key lookup functions.
bpf_lookup_user_key() and bpf_verify_pkcs7_signature() can be called only
from sleepable programs, because of memory allocation and crypto
operations. For example, the lsm.s/bpf attach point is suitable,
fexit/array_map_update_elem is not.
The correctness of implementation of the new kfuncs and of their usage is
checked with the introduced tests.
The patch set includes a patch from another author (dependency) for sake of
completeness. It is organized as follows.
Patch 1 from KP Singh allows kfuncs to be used by LSM programs. Patch 2
allows dynamic pointers to be used as kfunc parameters. Patch 3 exports
bpf_dynptr_get_size(), to obtain the real size of data carried by a dynamic
pointer. Patch 4 makes available for new eBPF kfuncs some key-related
definitions. Patch 5 introduces the bpf_lookup_*_key() and bpf_key_put()
kfuncs. Patch 6 introduces the bpf_verify_pkcs7_signature() kfunc. Patch 7
changes the testing kernel configuration to compile everything as built-in.
Finally, patches 8-10 introduce the tests.
Changelog
v11:
- Move stringify_struct() macro to include/linux/btf.h (suggested by
Daniel)
- Change kernel configuration options in
tools/testing/selftests/bpf/config* from =m to =y
v10:
- Introduce key_lookup_flags_check() and system_keyring_id_check() inline
functions to check parameters (suggested by KP)
- Fix descriptions and comment of key-related kfuncs (suggested by KP)
- Register kfunc set only once (suggested by Alexei)
- Move needed kernel options to the architecture-independent configuration
for testing
v9:
- Drop patch to introduce KF_SLEEPABLE kfunc flag (already merged)
- Rename valid_ptr member of bpf_key to has_ref (suggested by Daniel)
- Check dynamic pointers in kfunc definition with bpf_dynptr_kern struct
definition instead of string, to detect structure renames (suggested by
Daniel)
- Explicitly say that we permit initialized dynamic pointers in kfunc
definition (suggested by Daniel)
- Remove noinline __weak from kfuncs definition (reported by Daniel)
- Simplify key lookup flags check in bpf_lookup_user_key() (suggested by
Daniel)
- Explain the reason for deferring key permission check (suggested by
Daniel)
- Allocate memory with GFP_ATOMIC in bpf_lookup_system_key(), and remove
KF_SLEEPABLE kfunc flag from kfunc declaration (suggested by Daniel)
- Define only one kfunc set and remove the loop for registration
(suggested by Alexei)
v8:
- Define the new bpf_key structure to carry the key pointer and whether
that pointer is valid or not (suggested by Daniel)
- Drop patch to mark a kfunc parameter with the __maybe_null suffix
- Improve documentation of kfuncs
- Introduce bpf_lookup_system_key() to obtain a key pointer suitable for
verify_pkcs7_signature() (suggested by Daniel)
- Use the new kfunc registration API
- Drop patch to test the __maybe_null suffix
- Add tests for bpf_lookup_system_key()
v7:
- Add support for using dynamic and NULL pointers in kfunc (suggested by
Alexei)
- Add new kfunc-related tests
v6:
- Switch back to key lookup helpers + signature verification (until v5),
and defer permission check from bpf_lookup_user_key() to
bpf_verify_pkcs7_signature()
- Add additional key lookup test to illustrate the usage of the
KEY_LOOKUP_CREATE flag and validate the flags (suggested by Daniel)
- Make description of flags of bpf_lookup_user_key() more user-friendly
(suggested by Daniel)
- Fix validation of flags parameter in bpf_lookup_user_key() (reported by
Daniel)
- Rename bpf_verify_pkcs7_signature() keyring-related parameters to
user_keyring and system_keyring to make their purpose more clear
- Accept keyring-related parameters of bpf_verify_pkcs7_signature() as
alternatives (suggested by KP)
- Replace unsigned long type with u64 in helper declaration (suggested by
Daniel)
- Extend the bpf_verify_pkcs7_signature() test by calling the helper
without data, by ensuring that the helper enforces the keyring-related
parameters as alternatives, by ensuring that the helper rejects
inaccessible and expired keyrings, and by checking all system keyrings
- Move bpf_lookup_user_key() and bpf_key_put() usage tests to
ref_tracking.c (suggested by John)
- Call bpf_lookup_user_key() and bpf_key_put() only in sleepable programs
v5:
- Move KEY_LOOKUP_ to include/linux/key.h
for validation of bpf_verify_pkcs7_signature() parameter
- Remove bpf_lookup_user_key() and bpf_key_put() helpers, and the
corresponding tests
- Replace struct key parameter of bpf_verify_pkcs7_signature() with the
keyring serial and lookup flags
- Call lookup_user_key() and key_put() in bpf_verify_pkcs7_signature()
code, to ensure that the retrieved key is used according to the
permission requested at lookup time
- Clarified keyring precedence in the description of
bpf_verify_pkcs7_signature() (suggested by John)
- Remove newline in the second argument of ASSERT_
- Fix helper prototype regular expression in bpf_doc.py
v4:
- Remove bpf_request_key_by_id(), don't return an invalid pointer that
other helpers can use
- Pass the keyring ID (without ULONG_MAX, suggested by Alexei) to
bpf_verify_pkcs7_signature()
- Introduce bpf_lookup_user_key() and bpf_key_put() helpers (suggested by
Alexei)
- Add lookup_key_norelease test, to ensure that the verifier blocks eBPF
programs which don't decrement the key reference count
- Parse raw PKCS#7 signature instead of module-style signature in the
verify_pkcs7_signature test (suggested by Alexei)
- Parse kernel module in user space and pass raw PKCS#7 signature to the
eBPF program for signature verification
v3:
- Rename bpf_verify_signature() back to bpf_verify_pkcs7_signature() to
avoid managing different parameters for each signature verification
function in one helper (suggested by Daniel)
- Use dynamic pointers and export bpf_dynptr_get_size() (suggested by
Alexei)
- Introduce bpf_request_key_by_id() to give more flexibility to the caller
of bpf_verify_pkcs7_signature() to retrieve the appropriate keyring
(suggested by Alexei)
- Fix test by reordering the gcc command line, always compile sign-file
- Improve helper support check mechanism in the test
v2:
- Rename bpf_verify_pkcs7_signature() to a more generic
bpf_verify_signature() and pass the signature type (suggested by KP)
- Move the helper and prototype declaration under #ifdef so that user
space can probe for support for the helper (suggested by Daniel)
- Describe better the keyring types (suggested by Daniel)
- Include linux/bpf.h instead of vmlinux.h to avoid implicit or
redeclaration
- Make the test selfcontained (suggested by Alexei)
v1:
- Don't define new map flag but introduce simple wrapper of
verify_pkcs7_signature() (suggested by Alexei and KP)
KP Singh (1):
bpf: Allow kfuncs to be used in LSM programs
Roberto Sassu (9):
btf: Handle dynamic pointer parameter in kfuncs
bpf: Export bpf_dynptr_get_size()
KEYS: Move KEY_LOOKUP_ to include/linux/key.h
bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
bpf: Add bpf_verify_pkcs7_signature() kfunc
selftests/bpf: Compile kernel with everything as built-in
selftests/bpf: Add verifier tests for bpf_lookup_*_key() and
bpf_key_put()
selftests/bpf: Add additional tests for bpf_lookup_*_key()
selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc
include/linux/bpf.h | 7 +
include/linux/bpf_verifier.h | 3 +
include/linux/btf.h | 9 +
include/linux/key.h | 11 +
include/linux/verification.h | 8 +
kernel/bpf/btf.c | 19 +
kernel/bpf/helpers.c | 2 +-
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 180 ++++++++
security/keys/internal.h | 2 -
tools/testing/selftests/bpf/Makefile | 14 +-
tools/testing/selftests/bpf/config | 32 +-
tools/testing/selftests/bpf/config.x86_64 | 7 +-
.../selftests/bpf/prog_tests/lookup_key.c | 112 +++++
.../bpf/prog_tests/verify_pkcs7_sig.c | 399 ++++++++++++++++++
.../selftests/bpf/progs/test_lookup_key.c | 46 ++
.../bpf/progs/test_verify_pkcs7_sig.c | 100 +++++
tools/testing/selftests/bpf/test_verifier.c | 3 +-
.../selftests/bpf/verifier/ref_tracking.c | 139 ++++++
.../testing/selftests/bpf/verify_sig_setup.sh | 104 +++++
20 files changed, 1173 insertions(+), 28 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/lookup_key.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
create mode 100644 tools/testing/selftests/bpf/progs/test_lookup_key.c
create mode 100644 tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
create mode 100755 tools/testing/selftests/bpf/verify_sig_setup.sh
--
2.25.1
The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap so
the oom reaper invocation can be dropped. The unmapping part can be done
with the non-exclusive mmap_sem and the exclusive one is only required
when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog