On Tue, Jan 3, 2023 at 7:03 AM Quentin Monnet quentin@isovalent.com wrote:
2022-12-20 16:13 UTC-0800 ~ Andrii Nakryiko andrii.nakryiko@gmail.com
On Tue, Dec 20, 2022 at 3:34 AM Leo Yan leo.yan@linaro.org wrote:
On Tue, Dec 20, 2022 at 09:31:14AM +0800, Changbin Du wrote:
[...]
Now it will print the info below: libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
Recently I encountered the same issue. It can be caused either by not having the pahole tool installed or by not enabling the kernel configuration option CONFIG_DEBUG_INFO_BTF.
Could we give explicit information about the reason for the failure? Like:
"libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux, please install pahole and enable CONFIG_DEBUG_INFO_BTF=y for kernel building".
That information is specific to vmlinux, and similar tips were removed in patch v2. libbpf is common to all ELF files.
Okay, I see. Sorry for the noise.
Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
This log is confusing when the vmlinux file can be found but has no BTF section. Consider using a separate patch to detect the vmlinux-not-found case and print "No such file or directory"?
I think it's already there. If the file doesn't exist, open will fail.
[...]
@@ -990,6 +990,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
	err = 0;
	if (!btf_data) {
		pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
		err = -ENOENT;
btf_parse_elf() returns -ENOENT when the ELF file doesn't contain a BTF section; therefore, bpftool dumps the error string "No such file or directory". This is confusing, because the vmlinux file actually exists.
I am wondering if we can use the error -LIBBPF_ERRNO__FORMAT (or any better choice?) to replace -ENOENT here. This would keep bpftool from printing "No such file or directory" in this case.
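To make the ambiguity above concrete, here is a minimal sketch of how a caller that only sees -ENOENT could tell "file missing" apart from "BTF section missing". The names classify_btf_error() and enum enoent_cause are hypothetical and exist neither in libbpf nor in bpftool; the sketch only assumes that the loader returns -ENOENT in both situations, as described in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical classification, not part of libbpf or bpftool. */
enum enoent_cause {
	CAUSE_FILE_MISSING,		/* the path itself does not exist */
	CAUSE_BTF_SECTION_MISSING,	/* file exists, so -ENOENT came from the parser */
	CAUSE_OTHER,			/* any error other than -ENOENT */
};

/*
 * Hypothetical helper: given the path passed to the BTF loader and the
 * negative errno it returned, guess why -ENOENT was reported.
 */
static enum enoent_cause classify_btf_error(const char *path, int err)
{
	assert(path != NULL);

	if (err != -ENOENT)
		return CAUSE_OTHER;

	/* access() with F_OK only checks that the path exists. */
	if (access(path, F_OK) != 0)
		return CAUSE_FILE_MISSING;

	return CAUSE_BTF_SECTION_MISSING;
}
```

A tool could then print a different message for each case. Note that the check is racy (the file may appear or vanish between the load attempt and the access() call), so this is a diagnostic aid rather than a substitute for a distinct error code.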
The only really meaningful error code would be -ESRCH, which strerror() will translate to "No such process", which is also completely confusing.
In general, I have always found these strerror() messages extremely unhelpful and confusing. I wonder if we should make an effort to actually emit symbolic names of errors instead (literally, "-ENOENT" in this case). This is all tooling for engineers; I find -ENOENT or -ESRCH much more meaningful as an error message than the seemingly human-readable "No such file" interpretation.
Quentin, what do you think about the above proposal for bpftool? We can have some libbpf helper internally and do it in libbpf error messages as well and just reuse the logic in bpftool, perhaps?
Apologies for the delay. What you're proposing is to replace all messages currently looking like this:
$ bpftool prog
Error: can't get next program: Operation not permitted
by:
$ bpftool prog
Error: can't get next program: -EPERM
Do I understand correctly?
yep, that's what I had in mind
I think the strerror() messages are helpful on some occasions (they _are_ more human-friendly to many users), but it's also true that they're not always precise. With bpftool, "Invalid argument" is a classic when the program doesn't load, and may lead to confusion with the args passed to bpftool on the command line. Then there are the other corner cases like the one discussed in this thread. So, why not.
maybe the right approach would be to have both the symbolic error name and its human-readable representation, so for the example above:
Error: can't get next program: [-EPERM] Operation not permitted
or something like that? And if error value is unknown, just keep it as integer: "[-5555]" ?
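A minimal sketch of the format proposed above, for illustration only. err_name() and format_err() are hypothetical helpers, not existing libbpf functions, and the name table covers just the error codes mentioned in this thread; a real implementation would need a much larger table.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: symbolic names for a handful of errno values. */
static const char *err_name(int code)
{
	switch (code) {
	case EPERM:	return "EPERM";
	case ENOENT:	return "ENOENT";
	case ESRCH:	return "ESRCH";
	case EINVAL:	return "EINVAL";
	default:	return NULL;
	}
}

/*
 * Hypothetical helper: write "[-EPERM] Operation not permitted" style text
 * into buf, or just "[-5555]" when the code has no known symbolic name.
 * Accepts either sign for err. Returns what snprintf() returns.
 */
static int format_err(int err, char *buf, size_t len)
{
	int code = err < 0 ? -err : err;
	const char *name = err_name(code);

	assert(buf != NULL && len > 0);

	if (!name)
		return snprintf(buf, len, "[-%d]", code);

	/* strerror() is not thread-safe; fine for a single-threaded sketch. */
	return snprintf(buf, len, "[-%s] %s", name, strerror(code));
}
```

With this, a caller would do something like `char msg[128]; format_err(-EPERM, msg, sizeof(msg));` and append msg to "Error: can't get next program: ". The description after the bracketed name comes from the C library, so its exact wording depends on the platform.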
If we do make the change, yeah, I'd rather have as much of this handling as possible in libbpf itself, and then adjust bpftool to handle the remaining cases, for consistency.
we can teach libbpf_strerror_r() to do this and if bpftool is going to use it consistently then it would get the benefit automatically
Quentin