Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org ---
v4 -> v5: (Thanks to Masahiro Yamada for several excellent suggestions) - used incbin instead of bin2c (Masahiro did similar idea) - added module.lds if ia64 otherwise ia64 may fail to build. - added clean-files rule to Makefile - removed strip-comments script and doing it inline - added set -e to header generated to die on errorsr - fixed a minor issue where find command was noisy. - removed unneeded tar.xz rule from kernel/.gitignore - added Tested-by tags from ARM folks.
Changes since v3: - Blank tar was being generated because of a one line I forgot to push. It is updated now. - Added module.lds since arm64 needs it to build modules.
Changes since v2: (Thanks to Masahiro Yamada for several excellent suggestions) - Added support for out of tree builds. - Added incremental build support bringing down build time of incremental builds from 50 seconds to 5 seconds. - Fixed various small nits / cleanups. - clean ups to kheaders.c pointed by Alexey Dobriyan. - Fixed MODULE_LICENSE in test module and kheaders.c - Dropped Module.symvers from archive due to circular dependency.
Changes since v1: - removed IKH_EXTRA variable, not needed (Masahiro Yamada) - small fix ups to selftest - added target to main Makefile etc - added MODULE_LICENSE to test module - made selftest more quiet
Changes since RFC: Both changes bring size down to 3.8MB: - use xz for compression - strip comments except SPDX lines - Call out the module name in Kconfig - Also added selftests in second patch to ensure headers are always working.
init/Kconfig | 11 ++++++ kernel/.gitignore | 1 + kernel/Makefile | 28 ++++++++++++++ kernel/kheaders.c | 73 +++++++++++++++++++++++++++++++++++ scripts/gen_ikh_data.sh | 84 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 197 insertions(+) create mode 100644 kernel/kheaders.c create mode 100755 scripts/gen_ikh_data.sh
diff --git a/init/Kconfig b/init/Kconfig index 4592bf7997c0..ea75bfbf7dfa 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -580,6 +580,17 @@ config IKCONFIG_PROC This option enables access to the kernel configuration file through /proc/config.gz.
+config IKHEADERS_PROC + tristate "Enable kernel header artifacts through /proc/kheaders.tar.xz" + depends on PROC_FS + help + This option enables access to the kernel header and other artifacts that + are generated during the build process. These can be used to build kernel + modules or by other in-kernel programs such as those generated by eBPF + and systemtap tools. If you build the headers as a module, a module + called kheaders.ko is built which can be loaded on-demand to get access + to the headers. + config LOG_BUF_SHIFT int "Kernel log buffer size (16 => 64KB, 17 => 128KB)" range 12 25 diff --git a/kernel/.gitignore b/kernel/.gitignore index 6e699100872f..34d1e77ee9df 100644 --- a/kernel/.gitignore +++ b/kernel/.gitignore @@ -1,5 +1,6 @@ # # Generated files # +kheaders.md5 timeconst.h hz.bc diff --git a/kernel/Makefile b/kernel/Makefile index 6c57e78817da..7c486bc25fd7 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -70,6 +70,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o obj-$(CONFIG_USER_NS) += user_namespace.o obj-$(CONFIG_PID_NS) += pid_namespace.o obj-$(CONFIG_IKCONFIG) += configs.o +obj-$(CONFIG_IKHEADERS_PROC) += kheaders.o obj-$(CONFIG_SMP) += stop_machine.o obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o obj-$(CONFIG_AUDIT) += audit.o auditfilter.o @@ -121,3 +122,30 @@ $(obj)/configs.o: $(obj)/config_data.gz targets += config_data.gz $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE $(call if_changed,gzip) + +# Build a list of in-kernel headers for building kernel modules +ikh_file_list := include/ +ikh_file_list += arch/$(SRCARCH)/Makefile +ikh_file_list += arch/$(SRCARCH)/include/ +ikh_file_list += arch/$(SRCARCH)/module.lds +ikh_file_list += arch/$(SRCARCH)/kernel/module.lds +ikh_file_list += scripts/ +ikh_file_list += Makefile + +# Things we need from the $objtree. "OBJDIR" is for the gen_ikh_data.sh +# script to identify that this comes from the $objtree directory +ikh_file_list += OBJDIR/scripts/ +ikh_file_list += OBJDIR/include/ +ikh_file_list += OBJDIR/arch/$(SRCARCH)/include/ +ifeq ($(CONFIG_STACK_VALIDATION), y) +ikh_file_list += OBJDIR/tools/objtool/objtool +endif + +$(obj)/kheaders.o: $(obj)/kheaders_data.tar.xz + +quiet_cmd_genikh = GEN $(obj)/kheaders_data.tar.xz +cmd_genikh = $(srctree)/scripts/gen_ikh_data.sh $@ $(ikh_file_list) +$(obj)/kheaders_data.tar.xz: FORCE + $(call cmd,genikh) + +clean-files := kheaders_data.tar.xz kheaders.md5 diff --git a/kernel/kheaders.c b/kernel/kheaders.c new file mode 100644 index 000000000000..d072a958a8f1 --- /dev/null +++ b/kernel/kheaders.c @@ -0,0 +1,73 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * kernel/kheaders.c + * Provide headers and artifacts needed to build kernel modules. + * (Borrowed code from kernel/configs.c) + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/proc_fs.h> +#include <linux/init.h> +#include <linux/uaccess.h> + +/* + * Define kernel_headers_data and kernel_headers_data_end, within which the the + * compressed kernel headers are stpred. The file is first compressed with xz. + */ + +asm ( +" .pushsection .rodata, "a" \n" +" .global kernel_headers_data \n" +"kernel_headers_data: \n" +" .incbin "kernel/kheaders_data.tar.xz" \n" +" .global kernel_headers_data_end \n" +"kernel_headers_data_end: \n" +" .popsection \n" +); + +extern char kernel_headers_data; +extern char kernel_headers_data_end; + +static ssize_t +ikheaders_read_current(struct file *file, char __user *buf, + size_t len, loff_t *offset) +{ + return simple_read_from_buffer(buf, len, offset, + &kernel_headers_data, + &kernel_headers_data_end - + &kernel_headers_data); +} + +static const struct file_operations ikheaders_file_ops = { + .read = ikheaders_read_current, + .llseek = default_llseek, +}; + +static int __init ikheaders_init(void) +{ + struct proc_dir_entry *entry; + + /* create the current headers file */ + entry = proc_create("kheaders.tar.xz", S_IRUGO, NULL, + &ikheaders_file_ops); + if (!entry) + return -ENOMEM; + + proc_set_size(entry, + &kernel_headers_data_end - + &kernel_headers_data); + return 0; +} + +static void __exit ikheaders_cleanup(void) +{ + remove_proc_entry("kheaders.tar.xz", NULL); +} + +module_init(ikheaders_init); +module_exit(ikheaders_cleanup); + +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Joel Fernandes"); +MODULE_DESCRIPTION("Echo the kernel header artifacts used to build the kernel"); diff --git a/scripts/gen_ikh_data.sh b/scripts/gen_ikh_data.sh new file mode 100755 index 000000000000..3f9cae72c2a4 --- /dev/null +++ b/scripts/gen_ikh_data.sh @@ -0,0 +1,84 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# This script generates an archive consisting of kernel headers +# for CONFIG_IKHEADERS_PROC. +set -e + +spath="$(dirname "$(readlink -f "$0")")" +kroot="$spath/.." +outdir="$(pwd)" +tarfile=$1 +cpio_dir=$outdir/$tarfile.tmp + +file_list=${@:2} + +src_file_list="" +for f in $file_list; do + if [ ! -f "$kroot/$f" ] && [ ! -d "$kroot/$f" ]; then continue; fi + src_file_list="$src_file_list $(echo $f | grep -v OBJDIR)" +done + +obj_file_list="" +for f in $file_list; do + f=$(echo $f | grep OBJDIR | sed -e 's/OBJDIR///g') + if [ ! -f $f ] && [ ! -d $f ]; then continue; fi + obj_file_list="$obj_file_list $f"; +done + +# Support incremental builds by skipping archive generation +# if timestamps of files being archived are not changed. + +# This block is useful for debugging the incremental builds. +# Uncomment it for debugging. +# iter=1 +# if [ ! -f /tmp/iter ]; then echo 1 > /tmp/iter; +# else; iter=$(($(cat /tmp/iter) + 1)); fi +# find $src_file_list -type f | xargs ls -lR > /tmp/src-ls-$iter +# find $obj_file_list -type f | xargs ls -lR > /tmp/obj-ls-$iter + +# modules.order and include/generated/compile.h are ignored because these are +# touched even when none of the source files changed. This causes pointless +# regeneration, so let us ignore them for md5 calculation. +pushd $kroot > /dev/null +src_files_md5="$(find $src_file_list -type f ! -name modules.order | + grep -v "include/generated/compile.h" | + xargs ls -lR | md5sum | cut -d ' ' -f1)" +popd > /dev/null +obj_files_md5="$(find $obj_file_list -type f ! -name modules.order | + grep -v "include/generated/compile.h" | + xargs ls -lR | md5sum | cut -d ' ' -f1)" + +if [ -f $tarfile ]; then tarfile_md5="$(md5sum $tarfile | cut -d ' ' -f1)"; fi +if [ -f kernel/kheaders.md5 ] && + [ "$(cat kernel/kheaders.md5|head -1)" == "$src_files_md5" ] && + [ "$(cat kernel/kheaders.md5|head -2|tail -1)" == "$obj_files_md5" ] && + [ "$(cat kernel/kheaders.md5|tail -1)" == "$tarfile_md5" ]; then + exit +fi + +rm -rf $cpio_dir +mkdir $cpio_dir + +pushd $kroot > /dev/null +for f in $src_file_list; + do find "$f" ! -name "*.c" ! -name "*.o" ! -name "*.cmd" ! -name ".*"; +done | cpio --quiet -pd $cpio_dir +popd > /dev/null + +# The second CPIO can complain if files already exist which can +# happen with out of tree builds. Just silence CPIO for now. +for f in $obj_file_list; + do find "$f" ! -name "*.c" ! -name "*.o" ! -name "*.cmd" ! -name ".*"; +done | cpio --quiet -pd $cpio_dir >/dev/null 2>&1 + +find $cpio_dir -type f -print0 | + xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s//*((?!SPDX).)*?*///smg;' + +tar -Jcf $tarfile -C $cpio_dir/ . > /dev/null + +echo "$src_files_md5" > kernel/kheaders.md5 +echo "$obj_files_md5" >> kernel/kheaders.md5 +echo "$(md5sum $tarfile | cut -d ' ' -f1)" >> kernel/kheaders.md5 + +rm -rf $cpio_dir
This test tries to build a module successfully using the in-kernel headers found in /proc/kheaders.tar.xz.
Verified pass and fail scenarios by running: make -C tools/testing/selftests TARGETS=kheaders run_tests
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/kheaders/Makefile | 5 +++++ tools/testing/selftests/kheaders/config | 1 + .../kheaders/run_kheaders_modbuild.sh | 18 +++++++++++++++++ .../selftests/kheaders/testmod/Makefile | 3 +++ .../testing/selftests/kheaders/testmod/test.c | 20 +++++++++++++++++++ 6 files changed, 48 insertions(+) create mode 100644 tools/testing/selftests/kheaders/Makefile create mode 100644 tools/testing/selftests/kheaders/config create mode 100755 tools/testing/selftests/kheaders/run_kheaders_modbuild.sh create mode 100644 tools/testing/selftests/kheaders/testmod/Makefile create mode 100644 tools/testing/selftests/kheaders/testmod/test.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 971fc8428117..2446cfeae1c5 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -20,6 +20,7 @@ TARGETS += intel_pstate TARGETS += ipc TARGETS += ir TARGETS += kcmp +TARGETS += kheaders TARGETS += kvm TARGETS += lib TARGETS += livepatch diff --git a/tools/testing/selftests/kheaders/Makefile b/tools/testing/selftests/kheaders/Makefile new file mode 100644 index 000000000000..51035ab0732b --- /dev/null +++ b/tools/testing/selftests/kheaders/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +TEST_PROGS := run_kheaders_modbuild.sh + +include ../lib.mk diff --git a/tools/testing/selftests/kheaders/config b/tools/testing/selftests/kheaders/config new file mode 100644 index 000000000000..5221f9fb5e79 --- /dev/null +++ b/tools/testing/selftests/kheaders/config @@ -0,0 +1 @@ +CONFIG_IKHEADERS_PROC=y diff --git a/tools/testing/selftests/kheaders/run_kheaders_modbuild.sh b/tools/testing/selftests/kheaders/run_kheaders_modbuild.sh new file mode 100755 index 000000000000..f001568e08b0 --- /dev/null +++ b/tools/testing/selftests/kheaders/run_kheaders_modbuild.sh @@ -0,0 +1,18 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +HEADERS_XZ=/proc/kheaders.tar.xz +TMP_DIR_HEADERS=$(mktemp -d) +TMP_DIR_MODULE=$(mktemp -d) +SPATH="$(dirname "$(readlink -f "$0")")" + +tar -xvf $HEADERS_XZ -C $TMP_DIR_HEADERS > /dev/null + +cp -r $SPATH/testmod $TMP_DIR_MODULE/ + +pushd $TMP_DIR_MODULE/testmod > /dev/null +make -C $TMP_DIR_HEADERS M=$(pwd) modules +popd > /dev/null + +rm -rf $TMP_DIR_HEADERS +rm -rf $TMP_DIR_MODULE diff --git a/tools/testing/selftests/kheaders/testmod/Makefile b/tools/testing/selftests/kheaders/testmod/Makefile new file mode 100644 index 000000000000..7083e28706e8 --- /dev/null +++ b/tools/testing/selftests/kheaders/testmod/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-m += test.o diff --git a/tools/testing/selftests/kheaders/testmod/test.c b/tools/testing/selftests/kheaders/testmod/test.c new file mode 100644 index 000000000000..6eb0b8492ffa --- /dev/null +++ b/tools/testing/selftests/kheaders/testmod/test.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> + +static int __init hello_init(void) +{ + printk(KERN_INFO "Hello, world\n"); + return 0; +} + +static void __exit hello_exit(void) +{ + printk(KERN_INFO "Goodbye, world\n"); +} + +module_init(hello_init); +module_exit(hello_exit); +MODULE_LICENSE("GPL v2");
Since commit 13610aa908dc ("kernel/configs: use .incbin directive to embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent it from being selected in Kconfig.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- init/Kconfig | 1 - 1 file changed, 1 deletion(-)
diff --git a/init/Kconfig b/init/Kconfig index ea75bfbf7dfa..ffcb2f90543f 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -562,7 +562,6 @@ config BUILD_BIN2C
config IKCONFIG tristate "Kernel .config support" - select BUILD_BIN2C ---help--- This option enables the complete Linux kernel ".config" file contents to be saved in the kernel. It provides documentation
On Wed, 20 Mar 2019 12:31:14 -0400 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file).
Lazyass here hasn't been following the recent review discussion and is going to try cheating. Are there significant outstanding concerns now, or are we good to go?
On Wed, Mar 20, 2019 at 11:31:11AM -0700, Andrew Morton wrote:
On Wed, 20 Mar 2019 12:31:14 -0400 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file).
Lazyass here hasn't been following the recent review discussion and is going to try cheating. Are there significant outstanding concerns now, or are we good to go?
Looks to me like it is pretty solid. v4->v5 was just cosmetic changes, rebasing etc. Unless Masahiro has any other concerns, it could be a linux-next candidate. I tested it quite heavily, but would be nice to get some linux-next testing as well.
thanks,
- Joel
Hi,
On Wed, Mar 20, 2019 at 9:31 AM Joel Fernandes (Google) joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term.
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have it in a known location?
Something like:
- Pseudo-filesystem, that can just be mounted under /sys/kernel/headers or something (similar to debugfs or /proc/device-tree). - Exporting something like a squashfs image instead, allowing loopback mounting of it (or by providing a pseudo-/dev entry for it), again allowing direct export of the contents and avoiding the extracted directory from being out of sync with currently running kernel.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
-Olof
On Mon, Apr 8, 2019 at 9:29 AM Olof Johansson olof@lixom.net wrote:
Hi,
On Wed, Mar 20, 2019 at 9:31 AM Joel Fernandes (Google) joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term. [snip]
The approaches you proposed were explored in detail on this thread and other related threads. The tarball in proc approach is a simple, pragmatic approach that allows makes a lot of scenarios Just Work where they didn't before. Approaches like a new filesystem, a mountable block device, a custom debuginfo format, and so on add complexity without providing concrete gains in functionality. We'd like to get this work into the tree sooner rather than later.
On Mon, Apr 8, 2019 at 9:37 AM Daniel Colascione dancol@google.com wrote:
On Mon, Apr 8, 2019 at 9:29 AM Olof Johansson olof@lixom.net wrote:
Hi,
On Wed, Mar 20, 2019 at 9:31 AM Joel Fernandes (Google) joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term. [snip]
The approaches you proposed were explored in detail on this thread and other related threads.
Not on this thread, but maybe on others. Since they weren't linked and referenced, I didn't find them. Having them summarized in the patch description would be a great idea.
The tarball in proc approach is a simple, pragmatic approach that allows makes a lot of scenarios Just Work where they didn't before.
"My pragmatic solution, your messy hack".
It's an awkward solution that will now be permanently locked in due to the /proc interface being an ABI.
Approaches like a new filesystem, a mountable block device, a custom debuginfo format, and so on add complexity without providing concrete gains in functionality.
It definitely provides concrete gains in functionality, in particular it provides a significantly less fragile way of providing the data such that having it out of sync with the running kernel is a lot less accident-prone.
We'd like to get this work into the tree sooner rather than later.
That has _never_ been a good argument for picking up something that will need to be supported as an ABI forever.
We've solved these kind of things in the past without exporting tarballs from the kernel. We can do it again.
I literally didn't hear a single valid reason for why this patch should go in from you.
-Olof
On Mon, Apr 08, 2019 at 09:29:30AM -0700, Olof Johansson wrote:
Hi,
On Wed, Mar 20, 2019 at 9:31 AM Joel Fernandes (Google) joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term.
No problem, your feedback is welcome.
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
The location of the archive is fixed/known. If you are talking of the location where the user decompresses it to, then they a;ready know where they are decompressing to.
Something like:
- Pseudo-filesystem, that can just be mounted under
/sys/kernel/headers or something (similar to debugfs or /proc/device-tree).
The headers are huge if uncompressed (~30MB). Currently we use xz compression in the archive. It would be a huge waste to decompress everything into memory such as through an in-memory filesystem. And compressing on a per-file basis would be too slow for build time. Currently the build of the archive is extrememly fast.
- Exporting something like a squashfs image instead, allowing
loopback mounting of it (or by providing a pseudo-/dev entry for it), again allowing direct export of the contents and avoiding the extracted directory from being out of sync with currently running kernel.
One drawback of squashfs (other than possibly the compression ratio) is that this would be kernel build unfriendly in comparison to tar+xz. On my machine, squashfs-tools needed to be installed. For users who don't have this package, that would break their kernel build.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
Yes. We discussed in previous threads that for users who really want the archive to be completely uncompressed and in-memory, can just load the module, decompress into tmpfs, and unload the module. That is an extra step, yes.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
thanks,
- Joel
On Mon, Apr 8, 2019 at 1:36 PM Joel Fernandes joel@joelfernandes.org wrote:
On Mon, Apr 08, 2019 at 09:29:30AM -0700, Olof Johansson wrote:
Hi,
On Wed, Mar 20, 2019 at 9:31 AM Joel Fernandes (Google) joel@joelfernandes.org wrote:
Introduce in-kernel headers and other artifacts which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to build kernel modules, run eBPF programs, and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers and build artifacts.
On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time.
The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they donot want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program, or a kernel module builder can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.8MB.
By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers.
The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine: modprobe kheaders rm -rf $HOME/headers mkdir -p $HOME/headers tar -xvf /proc/kheaders.tar.xz -C $HOME/headers >/dev/null cd my-kernel-module make -C $HOME/headers M=$(pwd) modules rmmod kheaders
Additional notes: (1) external modules must be built on the same arch as the host that built vmlinux. This can be done either in a qemu emulated chroot on the target, or natively. This is due to host arch dependency of kernel scripts.
(2) If module building is used, since Module.symvers is not available in the archive due to a cyclic dependency with building of the archive into the kernel or module binaries, the modules built using the archive will not contain symbol versioning (modversion). This is usually not an issue since the idea of this patch is to build a kernel module on the fly and load it into the same kernel. An appropriate warning is already printed by the kernel to alert the user of modules not having modversions when built using the archive. For building with modversions, the user can use traditional header packages. For our tracing usecases, we build modules on the fly with this so it is not a concern.
(3) I have left IKHD_ST and IKHD_ED markers as is to facilitate future patches that would extract the headers from a kernel or module image.
(v4 was Tested-by the following folks, v5 only has minor changes and has passed my testing). Tested-by: qais.yousef@arm.com Tested-by: dietmar.eggemann@arm.com Tested-by: linux@manojrajarao.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term.
No problem, your feedback is welcome.
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
The location of the archive is fixed/known. If you are talking of the location where the user decompresses it to, then they a;ready know where they are decompressing to.
The location _of_ the archive, sure. But the format of what is in the tarball, how it is versioned, and how to manage it will have to be done by every user.
For any script that doesn't depend on some shared system state that wants to, say, build a eBPF program and load it, it would need to extract the tarball from scratch to make sure it is the current correct version of it.
If that's required by all users, why not just present the data in a way that it can be used directly?
Something like:
- Pseudo-filesystem, that can just be mounted under
/sys/kernel/headers or something (similar to debugfs or /proc/device-tree).
The headers are huge if uncompressed (~30MB). Currently we use xz compression in the archive. It would be a huge waste to decompress everything into memory such as through an in-memory filesystem. And compressing on a per-file basis would be too slow for build time. Currently the build of the archive is extrememly fast.
Keeping it around at all times in memory seems like a significant waste, I agree.
Providing a standard way of presenting the contents without more requirements on userspace, and without building up new cargo cult methods for how to prepare the headers, would still be useful though (see below).
- Exporting something like a squashfs image instead, allowing
loopback mounting of it (or by providing a pseudo-/dev entry for it), again allowing direct export of the contents and avoiding the extracted directory from being out of sync with currently running kernel.
One drawback of squashfs (other than possibly the compression ratio) is that this would be kernel build unfriendly in comparison to tar+xz. On my machine, squashfs-tools needed to be installed. For users who don't have this package, that would break their kernel build.
Adding a new tool that is required to use a new feature isn't that bad -- it's not like you're breaking the build for everyone.
We've also done this before in the past, by importing the tools into the kernel tree if needed. It can be solved.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
Yes. We discussed in previous threads that for users who really want the archive to be completely uncompressed and in-memory, can just load the module, decompress into tmpfs, and unload the module. That is an extra step, yes.
Most users will need to decompress it every time they use it anyway, especially if there's no versioned prefix in the tarball that they can use to key to a previously decompressed version with the exact same kernel version and config.
So, if you need to do that anyway, wouldn't it be easier if you just mounted a FS to get to it. If you're on a system where you can't use it in-place for resource reasons, you can copy it off and unmount it. No extra tools needed in userspace then at run/use time.
Said filesystem could be populated by a compressed cpio archive since we already have code in the kernel to do this for initramfs, and could do so at mount time -- and at unmount time it'd be freed up.
If you absolutely need to export a file to userspace with the archive, my suggestion is to do it through debugfs. That way the format isn't in a /proc ABI that can't be changed in the future (debugfs isn't required to be stable in the same way). This way we can change the format carried in the kernel over time without changing the official way we present the data to userspace (via a filesystem view).
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
I've found some of the old discussion and read up on it. I think it was pretty quick at dismissing ideas for more robust implementations ("it needs squashfs-tools"), and had some narrow viewpoints (exporting a tarball is the least amount of kernel change, while adding complexity at the system/usage side).
I'd also like to clarify: I'm not opposed to the general idea of providing the needed headers with the kernel somehow. I just think it's worth spending effort making sure an interface for it that we'll need to live with forever is appropriately thought through and not rushed in, especially since we're likely to get substantial infrastructure on top of it quickly (eBPF and friends in particular).
-Olof
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
The location of the archive is fixed/known. If you are talking of the location where the user decompresses it to, then they a;ready know where they are decompressing to.
The location _of_ the archive, sure. But the format of what is in the tarball, how it is versioned, and how to manage it will have to be done by every user.
For any script that doesn't depend on some shared system state that wants to, say, build a eBPF program and load it, it would need to extract the tarball from scratch to make sure it is the current correct version of it.
If that's required by all users, why not just present the data in a way that it can be used directly?
That is the part that is unclear from your proposal. If we present a filesystem view, then I am assuming the data will have to be decompressed first into memory. That means you are proposing use of 30MB uncompressed memory. The whole archive has to be decompressed but the whole archive if compressed with XZ for a maximum compression ratio.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
Yes. We discussed in previous threads that for users who really want the archive to be completely uncompressed and in-memory, can just load the module, decompress into tmpfs, and unload the module. That is an extra step, yes.
Most users will need to decompress it every time they use it anyway, especially if there's no versioned prefix in the tarball that they can use to key to a previously decompressed version with the exact same kernel version and config.
So, if you need to do that anyway, wouldn't it be easier if you just mounted a FS to get to it. If you're on a system where you can't use it in-place for resource reasons, you can copy it off and unmount it. No extra tools needed in userspace then at run/use time.
Said filesystem could be populated by a compressed cpio archive since we already have code in the kernel to do this for initramfs, and could do so at mount time -- and at unmount time it'd be freed up.
But still, decompressing to the filesystem in a scratch area may be better than decompressing to RAM, for some users who have lesser RAM. This patchset does not enforce a certain way of doing things and leaves it to the user.
If you absolutely need to export a file to userspace with the archive, my suggestion is to do it through debugfs. That way the format isn't in a /proc ABI that can't be changed in the future (debugfs isn't required to be stable in the same way). This way we can change the format carried in the kernel over time without changing the official way we present the data to userspace (via a filesystem view).
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
debugfs is going away for Android and is controversial in the fact that its functionality isn't guaranteed to be there (debugfs breakages aren't necessarily bugs AFAIK). So this isn't an option.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
I've found some of the old discussion and read up on it. I think it was pretty quick at dismissing ideas for more robust implementations ("it needs squashfs-tools"), and had some narrow viewpoints (exporting a tarball is the least amount of kernel change, while adding complexity at the system/usage side).
Honestly, that's kind of unfair to be quoting just a few points like that. If I remember there were 100s of emails and many good view points were brought up by many people. We have done the diligence in the discussions of this over a period of time.
I'd also like to clarify: I'm not opposed to the general idea of providing the needed headers with the kernel somehow. I just think it's worth spending effort making sure an interface for it that we'll need to live with forever is appropriately thought through and not rushed in, especially since we're likely to get substantial infrastructure on top of it quickly (eBPF and friends in particular).
We have spent the time :) This seems like the best solution of all. Greg KH and other maintainers are also supportive of it as can be seen in other threads. We can consider an alternate proposal if it is better, but I don't see any better one proposed at the moment.
- squashfs-tools requirement on the build really sucks - cpio uncompressed to memory equally sucks because it consumes all the memory uncompressed instead of reclaimable pages - decompressing into tmpfs will suck for Android because we don't use disk-based swap and we run into the same cpio issue above. We use ZRAM for compressed swap. - debugfs is a non-option for Android
The tar+xz is trivially created without depending on squashfs-tools. And xz provides the maximum compression ratio in our experiments. Decompression time is a non-issue since trace tools are using it.
The filesystem view sounds using mount/unmount like a pony to me, but it does not meet the requirements above. Let me know if I am missing something.
thanks, - Joel
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
The location of the archive is fixed/known. If you are talking of the location where the user decompresses it to, then they a;ready know where they are decompressing to.
The location _of_ the archive, sure. But the format of what is in the tarball, how it is versioned, and how to manage it will have to be done by every user.
For any script that doesn't depend on some shared system state that wants to, say, build a eBPF program and load it, it would need to extract the tarball from scratch to make sure it is the current correct version of it.
If that's required by all users, why not just present the data in a way that it can be used directly?
That is the part that is unclear from your proposal. If we present a filesystem view, then I am assuming the data will have to be decompressed first into memory. That means you are proposing use of 30MB uncompressed memory. The whole archive has to be decompressed but the whole archive if compressed with XZ for a maximum compression ratio.
Only while the filesystem is mounted. So you would do something like:
- Mount filesystem - Build and load - Unmount
The 30MB would only be used while the filesystem is mounted.
Compared to: - Extract tarball - Build and load - Remove file tree from filesystem
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
Yes. We discussed in previous threads that for users who really want the archive to be completely uncompressed and in-memory, can just load the module, decompress into tmpfs, and unload the module. That is an extra step, yes.
Most users will need to decompress it every time they use it anyway, especially if there's no versioned prefix in the tarball that they can use to key to a previously decompressed version with the exact same kernel version and config.
So, if you need to do that anyway, wouldn't it be easier if you just mounted a FS to get to it. If you're on a system where you can't use it in-place for resource reasons, you can copy it off and unmount it. No extra tools needed in userspace then at run/use time.
Said filesystem could be populated by a compressed cpio archive since we already have code in the kernel to do this for initramfs, and could do so at mount time -- and at unmount time it'd be freed up.
But still, decompressing to the filesystem in a scratch area may be better than decompressing to RAM, for some users who have lesser RAM. This patchset does not enforce a certain way of doing things and leaves it to the user.
There are lots of things where we provide suitable ways of doing things to the user instead of making them come up with their own handling of things. devtmpfs is a perfect example of this -- doing things in userspace was perfectly possible but still a hassle in many cases, and having the kernel do it for you when it already has the data makes sense.
I'd expect many users to still want to do this to tmpfs. Also, I expect whatever userspace tools and programs that will consume this data is likely to consume similar or more memory while running anyway. So mounting + copying + unmounting on the heavily constrained systems shouldn't be raising the high water mark on memory consumption.
If you absolutely need to export a file to userspace with the archive, my suggestion is to do it through debugfs. That way the format isn't in a /proc ABI that can't be changed in the future (debugfs isn't required to be stable in the same way). This way we can change the format carried in the kernel over time without changing the official way we present the data to userspace (via a filesystem view).
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
debugfs is going away for Android and is controversial in the fact that its functionality isn't guaranteed to be there (debugfs breakages aren't necessarily bugs AFAIK). So this isn't an option.
The argument that this needs to go into /proc because Android is removing debugfs isn't a very strong one.
And "debugfs breakages aren't bugs" is exactly why I'm suggesting to do the non-supported export of the archive that way instead.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
I've found some of the old discussion and read up on it. I think it was pretty quick at dismissing ideas for more robust implementations ("it needs squashfs-tools"), and had some narrow viewpoints (exporting a tarball is the least amount of kernel change, while adding complexity at the system/usage side).
Honestly, that's kind of unfair to be quoting just a few points like that. If I remember there were 100s of emails and many good view points were brought up by many people. We have done the diligence in the discussions of this over a period of time.
That wasn't captured with the patch submission, and having people go find 100s of emails to figure out why your seemingly lacking solution is the best one available is not how you motivate getting your code into the kernel.
I'd also like to clarify: I'm not opposed to the general idea of providing the needed headers with the kernel somehow. I just think it's worth spending effort making sure an interface for it that we'll need to live with forever is appropriately thought through and not rushed in, especially since we're likely to get substantial infrastructure on top of it quickly (eBPF and friends in particular).
We have spent the time :) This seems like the best solution of all.
That should be documented.
Greg KH and other maintainers are also supportive of it as can be seen in other threads.
I've found support for the desire to provide headers. If there's so much support for this solution, the number of Acks to the patch should have been higher.
We can consider an alternate proposal if it is better, but I don't see any better one proposed at the moment.
Really?
- squashfs-tools requirement on the build really sucks
Nah, this is a minor detail.
- cpio uncompressed to memory equally sucks because it consumes all
the memory uncompressed instead of reclaimable pages
Only while mounted.
- decompressing into tmpfs will suck for Android because we don't use
disk-based swap and we run into the same cpio issue above. We use ZRAM for compressed swap.
See comments above about high water marks for memory consumption likely not moving much.
- debugfs is a non-option for Android
Not my problem.
The tar+xz is trivially created without depending on squashfs-tools.
It adds a new dependency on tar.
And xz provides the maximum compression ratio in our experiments.
Sure.
Decompression time is a non-issue since trace tools are using it.
Sure.
The filesystem view sounds using mount/unmount like a pony to me, but it does not meet the requirements above. Let me know if I am missing something.
What requirements?
-Olof
On Wed, Apr 10, 2019 at 12:35 PM Olof Johansson olof@lixom.net wrote:
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
The location of the archive is fixed/known. If you are talking of the location where the user decompresses it to, then they a;ready know where they are decompressing to.
The location _of_ the archive, sure. But the format of what is in the tarball, how it is versioned, and how to manage it will have to be done by every user.
For any script that doesn't depend on some shared system state that wants to, say, build a eBPF program and load it, it would need to extract the tarball from scratch to make sure it is the current correct version of it.
If that's required by all users, why not just present the data in a way that it can be used directly?
That is the part that is unclear from your proposal. If we present a filesystem view, then I am assuming the data will have to be decompressed first into memory. That means you are proposing use of 30MB uncompressed memory. The whole archive has to be decompressed but the whole archive if compressed with XZ for a maximum compression ratio.
Only while the filesystem is mounted. So you would do something like:
- Mount filesystem
- Build and load
- Unmount
The 30MB would only be used while the filesystem is mounted.
Compared to:
- Extract tarball
- Build and load
- Remove file tree from filesystem
I feel there is no benefit in this proposal and adds considerable complexity to the kernel for no benefit. Only drawbacks - will likely do much poorly on lower memory devices, addition of more complexity and likely bugs, etc.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
Yes. We discussed in previous threads that for users who really want the archive to be completely uncompressed and in-memory, can just load the module, decompress into tmpfs, and unload the module. That is an extra step, yes.
Most users will need to decompress it every time they use it anyway, especially if there's no versioned prefix in the tarball that they can use to key to a previously decompressed version with the exact same kernel version and config.
So, if you need to do that anyway, wouldn't it be easier if you just mounted a FS to get to it. If you're on a system where you can't use it in-place for resource reasons, you can copy it off and unmount it. No extra tools needed in userspace then at run/use time.
Said filesystem could be populated by a compressed cpio archive since we already have code in the kernel to do this for initramfs, and could do so at mount time -- and at unmount time it'd be freed up.
But still, decompressing to the filesystem in a scratch area may be better than decompressing to RAM, for some users who have lesser RAM. This patchset does not enforce a certain way of doing things and leaves it to the user.
There are lots of things where we provide suitable ways of doing things to the user instead of making them come up with their own handling of things. devtmpfs is a perfect example of this -- doing things in userspace was perfectly possible but still a hassle in many cases, and having the kernel do it for you when it already has the data makes sense.
I'd expect many users to still want to do this to tmpfs. Also, I expect whatever userspace tools and programs that will consume this data is likely to consume similar or more memory while running anyway. So mounting + copying + unmounting on the heavily constrained systems shouldn't be raising the high water mark on memory consumption.
With this patch, a user can decompress the archive into their own tmpfs instance if they want to. This was also mentioned on previous threads. I don't see your point at all.
If you absolutely need to export a file to userspace with the archive, my suggestion is to do it through debugfs. That way the format isn't in a /proc ABI that can't be changed in the future (debugfs isn't required to be stable in the same way). This way we can change the format carried in the kernel over time without changing the official way we present the data to userspace (via a filesystem view).
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
debugfs is going away for Android and is controversial in the fact that its functionality isn't guaranteed to be there (debugfs breakages aren't necessarily bugs AFAIK). So this isn't an option.
The argument that this needs to go into /proc because Android is removing debugfs isn't a very strong one.
And "debugfs breakages aren't bugs" is exactly why I'm suggesting to do the non-supported export of the archive that way instead.
BPF tools are shipped on production systems. They should not break, that you want put them into debugfs to make them more likely to break does not make any sense.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
I've found some of the old discussion and read up on it. I think it was pretty quick at dismissing ideas for more robust implementations ("it needs squashfs-tools"), and had some narrow viewpoints (exporting a tarball is the least amount of kernel change, while adding complexity at the system/usage side).
Honestly, that's kind of unfair to be quoting just a few points like that. If I remember there were 100s of emails and many good view points were brought up by many people. We have done the diligence in the discussions of this over a period of time.
That wasn't captured with the patch submission, and having people go find 100s of emails to figure out why your seemingly lacking solution is the best one available is not how you motivate getting your code into the kernel.
I can summarize it better in the commit message. That's fine with me.
Greg KH and other maintainers are also supportive of it as can be seen in other threads.
I've found support for the desire to provide headers. If there's so much support for this solution, the number of Acks to the patch should have been higher.
There was at least one Ack on a prior revision, one Reviewed-by, and at least 4 Tested-by(s). I dropped the tags since I changed the patch a bit recently although the user interface and the idea is fundamentally the same. Also Masahiro Yamada is happy with the quality of the v6 patch, I privately chatted him. He mentioned he will likely give his Acked-by tag.
We can consider an alternate proposal if it is better, but I don't see any better one proposed at the moment.
Really?
What do you mean? I meant better, as in, a proposal that works and makes sense. Is simple, bug-free and solves the problem we are trying to solve.
- cpio uncompressed to memory equally sucks because it consumes all
the memory uncompressed instead of reclaimable pages
Only while mounted.
Still a disadvantage.
- decompressing into tmpfs will suck for Android because we don't use
disk-based swap and we run into the same cpio issue above. We use ZRAM for compressed swap.
See comments above about high water marks for memory consumption likely not moving much.
- debugfs is a non-option for Android
Not my problem.
Really? Android runs on billions of devices. That is arrogant / ignorant to say Android's requirements of moving away from debugfs are not your problem.
The filesystem view sounds using mount/unmount like a pony to me, but it does not meet the requirements above. Let me know if I am missing something.
What requirements?
I think you know this already - we don't want 30MB of active RAM being used, that does not make much sense. debugfs doesn't work because tools that need this will need to work even on production systems. That you want debugfs because it is more susceptible to ABI breakage doesn't make much sense. Not having all of the deal breakers above are requirements.
thanks!
- Joel
-Olof
-- You received this message because you are subscribed to the Google Groups "kernel-team" group. To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
On Wed, Apr 10, 2019 at 9:35 AM Olof Johansson olof@lixom.net wrote:
There are lots of things where we provide suitable ways of doing things to the user instead of making them come up with their own handling of things. devtmpfs is a perfect example of this -- doing things in userspace was perfectly possible but still a hassle in many cases, and having the kernel do it for you when it already has the data makes sense.
devtmpfs is something that the kernel can do *better* than userspace. The userspace equivalent is error-prone. Unpacking an archive, on the other hand, is trivial. Does userspace really need the kernel's help to do tar xvf? Are we really so worried about userspace getting tar xvf wrong somehow that we should provide tar xvf in the kernel? The kernel should have a feature only if there's some particular reason to put that feature in the kernel instead of userspace. A smaller kernel is a better kernel, all things being equal.
It's also worth noting that years and years ago, many proposals to do what udevd does (but in the kernel) failed to make it into the kernel.
I'd expect many users to still want to do this to tmpfs. Also, I expect whatever userspace tools and programs that will consume this data is likely to consume similar or more memory while running anyway. So mounting + copying + unmounting on the heavily constrained systems shouldn't be raising the high water mark on memory consumption.
Consider the embedded case: who's to say that the machine decompressing the header bundle is the machine that provides it? We can suck a compressed archive off the device without ever decompressing it. I know you that you proposed providing access to both a compressed cpio archive and a filesystem view of that archive, but in this scheme, the filesystem view is redundant. If someone wants a filesystem view of an archive, he can make one with FUSE without the kernel's help and in a general way.
If you absolutely need to export a file to userspace with the archive, my suggestion is to do it through debugfs. That way the format isn't in a /proc ABI that can't be changed in the future (debugfs isn't required to be stable in the same way). This way we can change the format carried in the kernel over time without changing the official way we present the data to userspace (via a filesystem view).
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
debugfs is going away for Android and is controversial in the fact that its functionality isn't guaranteed to be there (debugfs breakages aren't necessarily bugs AFAIK). So this isn't an option.
The argument that this needs to go into /proc because Android is removing debugfs isn't a very strong one.
And "debugfs breakages aren't bugs" is exactly why I'm suggesting to do the non-supported export of the archive that way instead.
Breaking this header bundle *should* be a bug though: tools will rely on it. It's not critical interface in the same way that, say, open(2) is, but the interface stability guarantee should nevertheless be stronger than what debugfs provides.
We had close to 2-3 months of discussions now with various folks up until v5. I am about to post v6 which is in line with Masahiro Yamada's expecations. In that I will be dropping module building artifacts due to his module building concerns and only include the headers.
I've found some of the old discussion and read up on it. I think it was pretty quick at dismissing ideas for more robust implementations ("it needs squashfs-tools"), and had some narrow viewpoints (exporting a tarball is the least amount of kernel change, while adding complexity at the system/usage side).
Honestly, that's kind of unfair to be quoting just a few points like that. If I remember there were 100s of emails and many good view points were brought up by many people. We have done the diligence in the discussions of this over a period of time.
That wasn't captured with the patch submission, and having people go find 100s of emails to figure out why your seemingly lacking solution is the best one available is not how you motivate getting your code into the kernel.
I'd also like to clarify: I'm not opposed to the general idea of providing the needed headers with the kernel somehow. I just think it's worth spending effort making sure an interface for it that we'll need to live with forever is appropriately thought through and not rushed in, especially since we're likely to get substantial infrastructure on top of it quickly (eBPF and friends in particular).
We have spent the time :) This seems like the best solution of all.
That should be documented.
Greg KH and other maintainers are also supportive of it as can be seen in other threads.
I've found support for the desire to provide headers. If there's so much support for this solution, the number of Acks to the patch should have been higher.
We can consider an alternate proposal if it is better, but I don't see any better one proposed at the moment.
Really?
Joel's proposal is the simplest approach that solves the problem we're trying to solve. The model you're proposing adds a lot of complexity, and I'm not convinced that the complexity buys us anything.
- squashfs-tools requirement on the build really sucks
Nah, this is a minor detail.
tar is ubiquitous. The squashfs tool isn't. Treating both dependencies the same way is a false equivalence.
- cpio uncompressed to memory equally sucks because it consumes all
the memory uncompressed instead of reclaimable pages
Only while mounted.
- decompressing into tmpfs will suck for Android because we don't use
disk-based swap and we run into the same cpio issue above. We use ZRAM for compressed swap.
See comments above about high water marks for memory consumption likely not moving much.
- debugfs is a non-option for Android
Not my problem.
It's important to make interfaces that work for everyone. Robust eBPF use should be possible without debugfs.
On Wed, Apr 10, 2019 at 09:34:49AM -0700, Olof Johansson wrote:
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
...
Compared to:
- Extract tarball
- Build and load
- Remove file tree from filesystem
I think there are too many assumptions in this thread in regard to what is more convenient for user space. I think bcc should try to avoid extracting tarball into file system. For example libbcc can uncompress kheader.tar.xz into virtual file system of clang front-end. It's more or less std::map<string, string> with key=path, value=content of the file. Access to such in-memory 'files' is obviously faster than doing open/read syscalls. bcc already uses this approach for some bcc internal 'files' that it passes to clang during compilation. All of /proc/kheaders.tar.xz files can be passed the same way without extracting them into real file system.
Joel, would be great if you can share corresponding bcc patch that takes advantage of /proc/kheaders
On Wed, Apr 10, 2019 at 08:15:42PM -0700, Alexei Starovoitov wrote:
On Wed, Apr 10, 2019 at 09:34:49AM -0700, Olof Johansson wrote:
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
...
Compared to:
- Extract tarball
- Build and load
- Remove file tree from filesystem
I think there are too many assumptions in this thread in regard to what is more convenient for user space. I think bcc should try to avoid extracting tarball into file system. For example libbcc can uncompress kheader.tar.xz into virtual file system of clang front-end. It's more or less std::map<string, string> with key=path, value=content of the file. Access to such in-memory 'files' is obviously faster than doing open/read syscalls. bcc already uses this approach for some bcc internal 'files' that it passes to clang during compilation. All of /proc/kheaders.tar.xz files can be passed the same way without extracting them into real file system.
Joel, would be great if you can share corresponding bcc patch that takes advantage of /proc/kheaders
That sounds good to me. Others have tested this with BCC with a manual extraction and setting environment variable to point BCC to the extracted location, but I can work BCC patch that makes this automatic and share it (I was planning to do the "automatically make BCC do it" patch for the BCC project, but first wanted this in. But I agree with Alexei, its a good idea to do it now and share it). Masahiro had some nits in v6 that I need to address anyway ;-) And Olof grumbled a bit about the commit message which could be polished a bit more :-)
thanks!
- Joel
On Wed, Apr 10, 2019 at 8:15 PM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Wed, Apr 10, 2019 at 09:34:49AM -0700, Olof Johansson wrote:
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have This is that form IMO.
...
Compared to:
- Extract tarball
- Build and load
- Remove file tree from filesystem
I think there are too many assumptions in this thread in regard to what is more convenient for user space. I think bcc should try to avoid extracting tarball into file system. For example libbcc can uncompress kheader.tar.xz into virtual file system of clang front-end. It's more or less std::map<string, string> with key=path, value=content of the file. Access to such in-memory 'files' is obviously faster than doing open/read syscalls.
I think performance is a red herring, especially since you have to uncompress it on every compiler invocation. With this you'd need to read/touch/write _all_ header files, not just the one your current compiler invocation will use.
In the grand scheme of things, open/mmap syscalls wouldn't necessarily be slower.
bcc already uses this approach for some bcc internal 'files' that it passes to clang during compilation. All of /proc/kheaders.tar.xz files can be passed the same way without extracting them into real file system.
This is now a circular argument. Joel was stating that the plain text headers took up too much memory, but now it's preferred to create such filesystem in userspace memory on *every compiler invocation*? That's... not better. And definitely worse if you want to compile in parallel.
From my perspective, this is where we're at:
This patch seems to have been met with a lot of responses in the tone of "this is not an appealing solution". Meanwhile, some of the suggested alternative solutions have not worked out, and we are now at a point where there's less interest in exploring alternatives and arguments to merge as-is with only minor adjustments.
I understand the desire to solve this. It'd be really convenient to have a way to runtime build against the same structure layouts that the kernel was built with. But I haven't heard anyone say that they *like* the solution proposed, and I haven't seen many of those expressing concerns being converted to support it.
Usually what we do at times like this is that we say "Yeah, this is a problem that should be solved, but this solution doesn't seem to be the right one and we would need to maintain it forever as part of the ABI. Let's wait until a better solution is found." With time, sometimes a better solution becomes obvious, or circumstances change enough to allow for some different approach, or someone has a new idea from a different perspective that solves the same problem.
All of that being said, I don't have veto rights on code going into the kernel, even if I think picking up this patch would be the wrong thing to do.
I'd be a *lot* less hesitant if this went into debugfs or another location than /proc, which is one of the most regression-sensitive interfaces we have in the kernel.
Joel, would be great if you can share corresponding bcc patch that takes advantage of /proc/kheaders
-Olof
On 14.04.19 21:38, Olof Johansson wrote:
Hi folks,
I haven't followed these longs discussions completely, so forgive me if I've missed something. But here're my comments on this ...
In the grand scheme of things, open/mmap syscalls wouldn't necessarily> be slower.
ACK. In my own experience on dealing w/ lots of files, eg. headers, in compilation processes, there indeed is a bottleneck, when thousands of files have to be processed, but:
a) most the kernel-side delay's were coming from actual IO - w/ tmpfs, that easily goes away, and the syscall overhead isn't so much. b) most of the total computation was preprocessor and parsing.
This patch seems to have been met with a lot of responses in the tone> of "this is not an appealing solution".
Personally, having generic helpers for putting blobs into /proc files (like config.gz) sound appealing. But I'm not sure whether doing that w/ kernel headers this way is a good solution. Actually, I'm even not sure whether raw kernel headers are at all are a good way. (can't we use compiler-generated debug info ?)
Usually what we do at times like this is that we say "Yeah, this is a> problem that should be solved, but this solution doesn't seem to be>
the right one and we would need to maintain it forever as part of the> ABI. Let's wait until a better solution is found." With time,> sometimes a better solution becomes obvious, or circumstances change> enough to allow for some different approach, or someone has a new idea> from a different perspective that solves the same problem. ACK. For now, this is an Android-only debug tool, just needed there because of it's unusual partitioning/deployment mechanisms - on usual GNU/Linux distros, we just have the kheaders in the file system. (and even on my small embedded devices, I either run the DUTs via NFS, 9P2k, initrd, etc or just deploy kernel and headers into the filesystem)
As Android already is in it's own universe, why can't that stuff remain incubated there, until we have more field experience w/ it and more time to rethink the whole idea very carefully ?
The patch is pretty small, so it's trivial cherry-pick, in case somebody outside Android universe wants to use it.
I see two smaller sub-problems that IMHO deserve a more generic solution:
#1: generic helpers for easily exposing in-kernel blobs as /proc files (potentially w/ transparent decompression)
#2: generic way for easily linking in files with very few LoC
one scenario that I've got in mind is using dtb snippets for board drivers, that only need to initialize some generic drivers w/ board specific configuration, so that doesn't have to be hand- written anymore.
I'd be a *lot* less hesitant if this went into debugfs or another location than /proc, which is one of the most regression-sensitive interfaces we have in the kernel.
ACK. I don't think that /proc really is the right place for that. Actually, I even doubt that for config.gz , it's just historically there (many distros already disabled it for many years). IMHO, /proc is for process information. (i guess that was also a reason for creating debugfs instead of putting that stuff into /proc).
--mtx
On Mon, Apr 15, 2019 at 11:41:18AM +0200, Enrico Weigelt, metux IT consult wrote: [snip]
This patch seems to have been met with a lot of responses in the tone> of "this is not an appealing solution".
Personally, having generic helpers for putting blobs into /proc files (like config.gz) sound appealing. But I'm not sure whether doing that w/ kernel headers this way is a good solution. Actually, I'm even not sure whether raw kernel headers are at all are a good way. (can't we use compiler-generated debug info ?)
We can't use compiler generated debug info for this.
As discussed previously here, eBPF tools need kernel headers, DWARF and compiler debug information wont help: https://lkml.org/lkml/2019/3/11/1358 https://lkml.org/lkml/2019/3/11/1363
Usually what we do at times like this is that we say "Yeah, this is a> problem that should be solved, but this solution doesn't seem to be>
the right one and we would need to maintain it forever as part of the> ABI. Let's wait until a better solution is found." With time,> sometimes a better solution becomes obvious, or circumstances change> enough to allow for some different approach, or someone has a new idea> from a different perspective that solves the same problem. ACK. For now, this is an Android-only debug tool, just needed there because of it's unusual partitioning/deployment mechanisms - on usual GNU/Linux distros, we just have the kheaders in the file system. (and even on my small embedded devices, I either run the DUTs via NFS, 9P2k, initrd, etc or just deploy kernel and headers into the filesystem)
As Android already is in it's own universe, why can't that stuff remain incubated there, until we have more field experience w/ it and more time to rethink the whole idea very carefully ?
Well, we follow mostly an upstream first process.
The patch is pretty small, so it's trivial cherry-pick, in case somebody outside Android universe wants to use it.
It could break very easily if things upstream change in some way, and adds a lot of maintenance burden, besides I don't see a good reason it should not be upstreamed tbh.
thanks,
- Joel
On Sun, Apr 14, 2019 at 12:38:34PM -0700, Olof Johansson wrote:
On Wed, Apr 10, 2019 at 8:15 PM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Wed, Apr 10, 2019 at 09:34:49AM -0700, Olof Johansson wrote:
On Wed, Apr 10, 2019 at 8:51 AM Joel Fernandes joelaf@google.com wrote:
On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson olof@lixom.net wrote: [snip]
> Wouldn't it be more convenient to provide it in a standardized format > such that you won't have to take an additional step, and always have > This is that form IMO.
...
Compared to:
- Extract tarball
- Build and load
- Remove file tree from filesystem
I think there are too many assumptions in this thread in regard to what is more convenient for user space. I think bcc should try to avoid extracting tarball into file system. For example libbcc can uncompress kheader.tar.xz into virtual file system of clang front-end. It's more or less std::map<string, string> with key=path, value=content of the file. Access to such in-memory 'files' is obviously faster than doing open/read syscalls.
I think performance is a red herring, especially since you have to uncompress it on every compiler invocation. With this you'd need to read/touch/write _all_ header files, not just the one your current compiler invocation will use.
In the grand scheme of things, open/mmap syscalls wouldn't necessarily be slower.
Agreed.
bcc already uses this approach for some bcc internal 'files' that it passes to clang during compilation. All of /proc/kheaders.tar.xz files can be passed the same way without extracting them into real file system.
This is now a circular argument. Joel was stating that the plain text headers took up too much memory, but now it's preferred to create such filesystem in userspace memory on *every compiler invocation*? That's... not better. And definitely worse if you want to compile in parallel.
The BCC patch does not extract purely into memory, but uses temporary directory: https://github.com/iovisor/bcc/pull/2312 I believe this is a good approach.
From my perspective, this is where we're at:
This patch seems to have been met with a lot of responses in the tone of "this is not an appealing solution". Meanwhile, some of the suggested alternative solutions have not worked out, and we are now at a point where there's less interest in exploring alternatives and arguments to merge as-is with only minor adjustments.
I understand the desire to solve this. It'd be really convenient to have a way to runtime build against the same structure layouts that the kernel was built with. But I haven't heard anyone say that they
About structure layouts, I'm assuming you mean compiler generated debug info. That does not work for eBPF tools as was mentioned previously in these threads: https://lkml.org/lkml/2019/3/11/1358 https://lkml.org/lkml/2019/3/11/1363
*like* the solution proposed, and I haven't seen many of those expressing concerns being converted to support it.
IMHO there has been good number of people on both sides of the argument. If it were as strong of an opposition as you think, then I would have personally not wanted this merged tbh. We do want a solution that is clean and works, and I think this is a candidate.
[snip]
I'd be a *lot* less hesitant if this went into debugfs or another location than /proc, which is one of the most regression-sensitive interfaces we have in the kernel.
The solution should be regression-sensitive imho, we don't want the tracing tools to break, people use them. And their usability and robustness is important which prompted these patches. For some time we have been hosting downloadable headers for popular kernels (such as LTS versions) to workaround this issue, but this both a maintenance issue and non-scalable, and not robust (someone boots a custom kernel or we forget to update headers for a future LTS release and the tools break).
thanks,
- Joel
On Sun, 14 Apr 2019 12:38:34 -0700 Olof Johansson olof@lixom.net wrote:
From my perspective, this is where we're at:
This patch seems to have been met with a lot of responses in the tone of "this is not an appealing solution". Meanwhile, some of the suggested alternative solutions have not worked out, and we are now at a point where there's less interest in exploring alternatives and arguments to merge as-is with only minor adjustments.
Another consideration to make is difficulty of support. Having a tarball compressed headers may not be an appealing solution, but it isn't one that would be too much of an issue to support. Having a better interface would be difficult to get right, and if you get it wrong, you are now stuck with supporting something that may become a big pain to do so in the future.
I'd be a *lot* less hesitant if this went into debugfs or another location than /proc, which is one of the most regression-sensitive interfaces we have in the kernel.
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
-- Steve
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
On Mon, 15 Apr 2019 22:50:10 -0500 Kees Cook keescook@chromium.org wrote:
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
-- Steve
On Tue, Apr 16, 2019 at 08:33:06AM -0400, Steven Rostedt wrote:
On Mon, 15 Apr 2019 22:50:10 -0500 Kees Cook keescook@chromium.org wrote:
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
thanks,
greg k-h
On Tue, Apr 16, 2019 at 02:49:39PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 08:33:06AM -0400, Steven Rostedt wrote:
On Mon, 15 Apr 2019 22:50:10 -0500 Kees Cook keescook@chromium.org wrote:
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
thanks,
- Joel
thanks,
greg k-h
On 4/16/19 9:04 AM, Joel Fernandes wrote:
On Tue, Apr 16, 2019 at 02:49:39PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 08:33:06AM -0400, Steven Rostedt wrote:
On Mon, 15 Apr 2019 22:50:10 -0500 Kees Cook keescook@chromium.org wrote:
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
-- Steve
On Tue, Apr 16, 2019 at 09:45:09AM -0400, Steven Rostedt wrote:
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
:-) In theory, the headers could also be hosted in tracefs since the scope of the patch right now is to help tracing tools (BCC / eBPF). Although /sys/kernel might be better in case the scope is expanded to other things.
thanks,
- Joel
On Tue, Apr 16, 2019 at 09:45:09AM -0400, Steven Rostedt wrote:
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
No need to have this be a separate filesystem, we can use a binary sysfs file in /sys/kernel/ for this as the kernel is not doing any "parsing" of the data, it is just dumping it out to userspace.
If I make /sys/kernel/config.gz be the same thing as /proc/config.gz will you fix up 'make localmodconfig' to use it? :)
thanks,
greg k-h
On Tue, 16 Apr 2019 16:22:40 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
If I make /sys/kernel/config.gz be the same thing as /proc/config.gz will you fix up 'make localmodconfig' to use it? :)
Sure.
-- Steve
On Tue, Apr 16, 2019 at 7:43 AM Steven Rostedt rostedt@goodmis.org wrote:
On Tue, 16 Apr 2019 16:22:40 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
If I make /sys/kernel/config.gz be the same thing as /proc/config.gz will you fix up 'make localmodconfig' to use it? :)
Sure.
Doing a link to the new location is probably best, just like /proc/device-tree is today.
-Olof
On Tue, Apr 16, 2019 at 04:22:40PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:45:09AM -0400, Steven Rostedt wrote:
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
No need to have this be a separate filesystem, we can use a binary sysfs file in /sys/kernel/ for this as the kernel is not doing any "parsing" of the data, it is just dumping it out to userspace.
What folks keep saying that an fs of header files is easier to use than tarball from bcc and cleaner from architectural pov. That's not the case.
From bcc side I'd rather have a single precompiled headers blob
that I can feed into clang and improve bpf program compilation time. Having a set of headers is a step to generate such .pch file, but once generated the headers can be removed from fs and kheaders module unloaded. The sequence is: bcc checks standard /lib/module location, if not there loads kheader mod, extracts into known location, and unloads. The extraced headers are in plain fs cache and will be evicted from memory when bcc is done compiling progs. imo much cleaner than kernel maintaining headers-fs and wasting memory.
Where kheaders.tar.xz is placed doesn't really matter. /proc or /sys/kernel makes no real difference.
On Tue, Apr 16, 2019 at 9:46 AM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Tue, Apr 16, 2019 at 04:22:40PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:45:09AM -0400, Steven Rostedt wrote:
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
> Then we should perhaps make a new file system call tarballs ;-) > > /sys/kernel/tarballs/ > > and place everything there. That way it removes it from /proc (which is > the worse place for that) and also makes it something other than debug. > That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
No need to have this be a separate filesystem, we can use a binary sysfs file in /sys/kernel/ for this as the kernel is not doing any "parsing" of the data, it is just dumping it out to userspace.
What folks keep saying that an fs of header files is easier to use than tarball from bcc and cleaner from architectural pov. That's not the case. From bcc side I'd rather have a single precompiled headers blob that I can feed into clang and improve bpf program compilation time. Having a set of headers is a step to generate such .pch file, but once generated the headers can be removed from fs and kheaders module unloaded. The sequence is: bcc checks standard /lib/module location, if not there loads kheader mod, extracts into known location, and unloads.
May I suggest keeping the bcc-populated headers somewhere else? Ideally something cleaned out on every reboot in case kernel changes without version string doing it.
That way you can by default prefer the module-exported tarball, and fall back to /lib/module/$(uname -r)/ if not available, instead of the other way around and instead of having to check creation times on the dir vs boot time of the kernel, etc.
Anyway, that's just an implementation detail. But it's the kind of detail that all tools that use this would need to get right, instead of doing it right once by exporting it in a way that it can be directly used.
The extraced headers are in plain fs cache and will be evicted from memory when bcc is done compiling progs. imo much cleaner than kernel maintaining headers-fs and wasting memory.
So, in my original proposal I recommended unmounting when not needing it, which would remove the memory usage as well.
Where kheaders.tar.xz is placed doesn't really matter. /proc or /sys/kernel makes no real difference.
If done in a location that isn't a perpetual ABI commitment, a tarball solution is something we can work with.
-Olof
On Tue, Apr 16, 2019 at 12:57 PM Olof Johansson olof@lixom.net wrote: [snip]
Where kheaders.tar.xz is placed doesn't really matter. /proc or /sys/kernel makes no real difference.
If done in a location that isn't a perpetual ABI commitment, a tarball solution is something we can work with.
This is the part I don't get, Olof. It has to be an ABI commitment so that the tools always work. Why are we so interested in breaking userspace tools? We want a robust solution, not a fragile one.
thanks,
- Joel
On Tue, Apr 16, 2019 at 09:57:08AM -0700, Olof Johansson wrote:
On Tue, Apr 16, 2019 at 9:46 AM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Tue, Apr 16, 2019 at 04:22:40PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 09:45:09AM -0400, Steven Rostedt wrote:
On Tue, 16 Apr 2019 09:32:37 -0400 Karim Yaghmour karim.yaghmour@opersys.com wrote:
>> Then we should perhaps make a new file system call tarballs ;-) >> >> /sys/kernel/tarballs/ >> >> and place everything there. That way it removes it from /proc (which is >> the worse place for that) and also makes it something other than debug. >> That's what I did for tracefs. > > As horrible as that suggestion is, it does kind of make sense :) > > We can't put this in debugfs as that's only for debugging and systems > should never have that mounted for normal operations (users want to > build ebpf programs), and /proc really should be for processes but that > horse is long left the barn. > > But, I'm willing to consider putting this either in a system-fs-like > filesystem, or just in sysfs itself, we do have /sys/kernel/ to play > around in if the main objection is that we should not be cluttering up > /proc with stuff like this. >
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
BTW, the name "tarballs" was kind of a joke. Probably should come up with a better name. Although, I'm fine with tarballsfs ;-)
No need to have this be a separate filesystem, we can use a binary sysfs file in /sys/kernel/ for this as the kernel is not doing any "parsing" of the data, it is just dumping it out to userspace.
What folks keep saying that an fs of header files is easier to use than tarball from bcc and cleaner from architectural pov. That's not the case. From bcc side I'd rather have a single precompiled headers blob that I can feed into clang and improve bpf program compilation time. Having a set of headers is a step to generate such .pch file, but once generated the headers can be removed from fs and kheaders module unloaded. The sequence is: bcc checks standard /lib/module location, if not there loads kheader mod, extracts into known location, and unloads.
May I suggest keeping the bcc-populated headers somewhere else?
what do you mean by bcc-populated? bcc keeps its own headers inside libbcc.so .data section and provides them to clang as 'memory buffer' in clang's virtual file system.
Ideally something cleaned out on every reboot in case kernel changes without version string doing it.
That way you can by default prefer the module-exported tarball, and fall back to /lib/module/$(uname -r)/ if not available, instead of the other way around and instead of having to check creation times on the dir vs boot time of the kernel, etc.
the order of checks is bcc implementation detail. we can change that later. we've seen issues with /lib/modules/`uname -r` being corrupted by chef, so we might actually extract from kheaders.tar.xz all the time and more than once. Like try-compiling a simple prog and if it doesn't work, do the extract.
Anyway, that's just an implementation detail. But it's the kind of detail that all tools that use this would need to get right, instead of doing it right once by exporting it in a way that it can be directly used.
Today bcc is the only tool that interacts with clang this way. There is enough complexity and plenty of complex issues with on-the-fly recompile approach. I strongly suggest anyone considering new on-the-fly recompile to work with us on BTF instead.
The set of headers is not an ultimate goal. See the example with pch. bpf tracing needs three components: - all types and layout of datastructures; including all function prototypes with arg names - all macros - all inline functions
The first one is solved by BTF based solution, but macroses and infline functions have no substitute, but C header files. That is today. Eventually we might find a way to reduce dependency on headers and have macroses and infline functions represented some other way. Like mini-pch where only relevant bits of headers are represented as clang's syntax tree or mini C code. The key point is that having headers is not a goal. Making kernel maintain an fs of headers is imo a waste of kernel code. The most minimal approach of compressed tarball is preferred.
The extraced headers are in plain fs cache and will be evicted from memory when bcc is done compiling progs. imo much cleaner than kernel maintaining headers-fs and wasting memory.
So, in my original proposal I recommended unmounting when not needing it, which would remove the memory usage as well.
and such header-fs would uncompress internal tarball, create inodes, dentries and to make sure all that stuff is cleanly refcnted and freed. imo that is plenty of kernel code for no good reason.
Where kheaders.tar.xz is placed doesn't really matter. /proc or /sys/kernel makes no real difference.
If done in a location that isn't a perpetual ABI commitment, a tarball solution is something we can work with.
Fair enough. My guess that kheaders.tar.xz in this shape we would need for at least 5 years. After that we'll come up with better approach.
On Tue, Apr 16, 2019 at 6:32 AM Karim Yaghmour karim.yaghmour@opersys.com wrote:
On 4/16/19 9:04 AM, Joel Fernandes wrote:
On Tue, Apr 16, 2019 at 02:49:39PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 16, 2019 at 08:33:06AM -0400, Steven Rostedt wrote:
On Mon, 15 Apr 2019 22:50:10 -0500 Kees Cook keescook@chromium.org wrote:
On Mon, Apr 15, 2019 at 9:41 AM Steven Rostedt rostedt@goodmis.org wrote:
I agree with this assessment. We shouldn't use config.gz as precedence for this solution. config.gz should have been in debugfs to begin with, but I don't believe debugfs was around when config.gz was introduced. (Don't have time to look into the history of the two).
I don't agree with this: /proc/config.gz is used by a lot of tools that do sanity-check of running systems. This isn't _debugging_... it's verifying correct kernel builds. It's a fancy version of checking /proc/version.
Then we should perhaps make a new file system call tarballs ;-)
/sys/kernel/tarballs/
and place everything there. That way it removes it from /proc (which is the worse place for that) and also makes it something other than debug. That's what I did for tracefs.
As horrible as that suggestion is, it does kind of make sense :)
We can't put this in debugfs as that's only for debugging and systems should never have that mounted for normal operations (users want to build ebpf programs), and /proc really should be for processes but that horse is long left the barn.
But, I'm willing to consider putting this either in a system-fs-like filesystem, or just in sysfs itself, we do have /sys/kernel/ to play around in if the main objection is that we should not be cluttering up /proc with stuff like this.
I am ok with the suggestion of /sys/kernel for the archive. That also seems to fit well with the idea that the headers are kernel related and probably belong here more strictly speaking, than /proc.
This makes sense. And if it alleviates concerns regarding extending /proc ABIs then might as well switch to this.
Olof, what do you think of this?
In practice we've been more lenient with changes to /sys over time, so I think this might be a reasonable compromise.
I still think that a filesystem view is the cleanest way to do this, but I won't push back from this going in. It does solve a real problem, and if we want a different format later we can revisit it then.
-Olof
On Wed, Apr 10, 2019 at 5:09 PM Olof Johansson olof@lixom.net wrote:
As far as format goes; there's clear precedent on cpio being used and supported; we already have build time requirements on the userspace tools with some options. Using tar would actually be a new dependency even if it is a common tool to have installed. With a self-populating FS, there's no new tool requirements on the runtime side either.
The decision between tar and cpio directly follows from whether the headers are uncompressed in kernel space or in user space:
- we use cpio for initramfs because unpacking cpio in C code is fairly simple, while unpacking tar is not (see the wikipedia page on tar). If we were to unpack it from the kernel, the initramfs code could be trivially reused. - nobody sane uses cpio in user space, since tar is already present almost everywhere, while cpio is rather obscure in comparison.
Arnd
On 10.04.19 21:19, Arnd Bergmann wrote:
- we use cpio for initramfs because unpacking cpio in C code is fairly simple, while unpacking tar is not (see the wikipedia page on tar). If we were to unpack it from the kernel, the initramfs code could be trivially reused.
What about ar ? ;-)
- nobody sane uses cpio in user space, since tar is already present almost everywhere, while cpio is rather obscure in comparison.
So, rpm doesn't count as "sane" ? :o
--mtx
On Fri, 12 Apr 2019 18:16:19 +0200 "Enrico Weigelt, metux IT consult" lkml@metux.net wrote:
So, rpm doesn't count as "sane" ? :o
As a Debian user... No ;-)
/me runs
-- Steve
Hi Olof,
On 4/8/19 12:29 PM, Olof Johansson wrote:
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term.
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have it in a known location?
Something like:
- Pseudo-filesystem, that can just be mounted under
/sys/kernel/headers or something (similar to debugfs or /proc/device-tree).
- Exporting something like a squashfs image instead, allowing
loopback mounting of it (or by providing a pseudo-/dev entry for it), again allowing direct export of the contents and avoiding the extracted directory from being out of sync with currently running kernel.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
One of the things I pointed out earlier in the thread is that /proc/config.gz has already set a precedent as to the interface for this sort of artifact. It's a plain compressed file and it's directly accessible from toplevel /proc. From a consistency perspective there's an idiomatic angle to some sort of "/proc/kheaders.gz".
In some offline discussions I was also told that squashfs (I'm no expert of it) required special user-space tools and had some security issues.
Cheers,
On Mon, Apr 8, 2019 at 1:52 PM Karim Yaghmour karim.yaghmour@opersys.com wrote:
Hi Olof,
On 4/8/19 12:29 PM, Olof Johansson wrote:
Sorry to be late at the party with this kind of feedback, but I find the whole ".tar.gz in procfs" to be an awkward solution, especially if there's expected to be userspace tooling that depends on this long-term.
Wouldn't it be more convenient to provide it in a standardized format such that you won't have to take an additional step, and always have it in a known location?
Something like:
- Pseudo-filesystem, that can just be mounted under
/sys/kernel/headers or something (similar to debugfs or /proc/device-tree).
- Exporting something like a squashfs image instead, allowing
loopback mounting of it (or by providing a pseudo-/dev entry for it), again allowing direct export of the contents and avoiding the extracted directory from being out of sync with currently running kernel.
Having to copy and extract the tarball is the most awkward step, IMHO. I also find the waste of kernel memory for it to be an issue, but given that it can be built as a module I guess that's the obvious solution for those who care about memory consumption.
One of the things I pointed out earlier in the thread is that /proc/config.gz has already set a precedent as to the interface for this sort of artifact. It's a plain compressed file and it's directly accessible from toplevel /proc. From a consistency perspective there's an idiomatic angle to some sort of "/proc/kheaders.gz".
I'm not arguing against providing the headers in some format, I think that's a good idea.
On similarities, there are some but there are also substantial differences in the use model.
For the config file, the main use cases are:
- Checking to make sure that the running kernel has a particular set of config options set or cleared. - Ease of cloning the config of a running kernel when building a new one.
The file format is just a plain text file, even if compressed. No real internal structure to consider.
Both of the above uses are relatively rare (well, the first might be done in some startup scripts, etc).
The kernel headers case is different. The file format is more complex (tarball, which would also include the structure of said tarball). You can't just zgrep to get some data out.
Also, the way the contents is used is different, in that it will be needed by runtime tools that build and load eBPF programs. For the build to always be known to be against the running headers, every build would likely need to decompress and stage said tarball independently and not rely on previous state. If that's needed, why not just provide it once in the right format and avoid people building userspace solutions in several different ways to do the same thing?
In some offline discussions I was also told that squashfs (I'm no expert of it) required special user-space tools and had some security issues.
I'm unaware of what the security issues are, and there's indeed a GPLv2 tool needed to construct the filesystem. The latter can be solved, the former I don't know enough about to have an opinion.
Anyway, see my other reply just now -- CPIO + a filesystem view, and providing said cpio archive in debugfs for those who want to copy it off themselves might be something that fits everybody.
-Olof
On Wed, Apr 10, 2019 at 8:16 AM Olof Johansson olof@lixom.net wrote:
Anyway, see my other reply just now -- CPIO + a filesystem view, and providing said cpio archive in debugfs for those who want to copy it off themselves might be something that fits everybody.
I don't think it's worth increasing the complexity of the kernel for the sake of a little userspace convenience. By including the headers in the kernel, we solve an important coordination problem --- but we should solve this problem in the simplest possible way, pushing whatever complexity possible to userspace, which is more flexible and which evolves faster. Providing a compressed archive is fine. The user burden is not large: it's just extracting an archive from a standard container format. The kernel has no special advantageous way of doing this job. If userspace wants to cache extraction, it can hash the compressed archive itself and use the result as a key: it's only 3MB. But extracting the archive every time is fine. We're not talking about a huge amount of data, and I don't see a need for the kind of long-term caching that would require revalidation.
As for cpio vs tar: IME, people are much more familiar with the latter, and they're both omnipresent on unixish systems. I think it's fine to rely on a tool that's been part of POSIX for over 30 years.
linux-kselftest-mirror@lists.linaro.org