On Thu, 24 Sep 2020, Alexei Starovoitov wrote:
> to whatever number, but printing a single task_struct needs ~800 lines and
> ~18 kbytes. Humans can scroll through that much spam, but can we make it
> less verbose by default somehow? Maybe not in this patch set, but in a
> follow-up?
One approach that might work would be to devote 4 bits or so of flag space to a "maximum depth" specifier; i.e. at depth 1, only base types are displayed, and aggregate types like arrays, structs and unions are elided. We already have depth processing in the code to figure out whether possibly zeroed nested data needs to be displayed, so it should hopefully be a simple follow-up. A rough sketch of the encoding I have in mind is below.
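Something like this, say (the shift and mask values here are placeholders I made up for illustration, not anything from a posted patch):

	/* hypothetical encoding: 4 bits of the flags word carry the maximum
	 * display depth; 0 keeps today's "print everything" behaviour.
	 */
	#define BTF_F_MAX_DEPTH_SHIFT	4
	#define BTF_F_MAX_DEPTH_MASK	(0xf << BTF_F_MAX_DEPTH_SHIFT)
	#define BTF_F_MAX_DEPTH(depth)	\
		(((depth) << BTF_F_MAX_DEPTH_SHIFT) & BTF_F_MAX_DEPTH_MASK)

A caller would then pass BTF_F_MAX_DEPTH(1) as the flags argument to bpf_seq_printf_btf() to get the truncated display shown below.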
One way to express it would be to use "..." to denote that field(s) were omitted. We could even use the number of "."s to denote cases where multiple fields were omitted, giving a visual sense of how much data was dropped. So for example, with BTF_F_MAX_DEPTH(1) task_struct looks like this:
(struct task_struct){
 .state = (long int)1,
 .stack = (void *)0x00000000029d1e6f,
 ...
 .flags = (unsigned int)4194560,
 ...
 .cpu = (unsigned int)36,
 .wakee_flips = (unsigned int)11,
 .wakee_flip_decay_ts = (long unsigned int)4294914874,
 .last_wakee = (struct task_struct *)0x000000006c7dfe6d,
 .recent_used_cpu = (int)19,
 .wake_cpu = (int)36,
 .prio = (int)120,
 .static_prio = (int)120,
 .normal_prio = (int)120,
 .sched_class = (struct sched_class *)0x00000000ad1561e6,
 ...
 .exec_start = (u64)674402577156,
 .sum_exec_runtime = (u64)5009664110,
 .vruntime = (u64)167038057,
 .prev_sum_exec_runtime = (u64)5009578167,
 .nr_migrations = (u64)54,
 .depth = (int)1,
 .parent = (struct sched_entity *)0x00000000cba60e7d,
 .cfs_rq = (struct cfs_rq *)0x0000000014f353ed,
 ...
...etc. What do you think?
+SEC("iter/task") +int dump_task_fs_struct(struct bpf_iter__task *ctx) +{
- static const char fs_type[] = "struct fs_struct";
- struct seq_file *seq = ctx->meta->seq;
- struct task_struct *task = ctx->task;
- struct fs_struct *fs = (void *)0;
- static struct btf_ptr ptr = { };
- long ret;
- if (task)
fs = task->fs;
- ptr.type = fs_type;
- ptr.ptr = fs;
> imo the following is better:
>
> 	ptr.type_id = __builtin_btf_type_id(*fs, 1);
> 	ptr.ptr = fs;
I'm still seeing lookup failures using __builtin_btf_type_id(,1), whereas both __builtin_btf_type_id(,0) and Andrii's suggestion of bpf_core_type_id_kernel() work. Not sure what's going on; pahole is v1.17, and clang is:
clang version 12.0.0 (/mnt/src/llvm-project/clang 7ab7b979d29e1e43701cf690f5cf1903740f50e3)
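For reference, this is the variant that works for me (my rendering of Andrii's suggestion, not code from the posted series; bpf_core_type_id_kernel() is the CO-RE helper macro from <bpf/bpf_core_read.h>):

	#include <bpf/bpf_core_read.h>

	/* resolve the kernel-side BTF type id for struct fs_struct at load
	 * time via a CO-RE relocation; same intent as
	 * __builtin_btf_type_id(*fs, 1)
	 */
	ptr.type_id = bpf_core_type_id_kernel(struct fs_struct);
	ptr.ptr = fs;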
> +	if (ctx->meta->seq_num == 0)
> +		BPF_SEQ_PRINTF(seq, "Raw BTF fs_struct per task\n");
> +
> +	ret = bpf_seq_printf_btf(seq, &ptr, sizeof(ptr), 0);
> +	switch (ret) {
> +	case 0:
> +		tasks++;
> +		break;
> +	case -ERANGE:
> +		/* NULL task or task->fs, don't count it as an error. */
> +		break;
> +	default:
> +		seq_err = ret;
> +		break;
> +	}
> Please add handling of E2BIG to this switch. Otherwise printing a large
> number of tiny structs will overflow PAGE_SIZE and E2BIG will be sent to
> user space. Like this:
>
> @@ -40,6 +40,8 @@ int dump_task_fs_struct(struct bpf_iter__task *ctx)
>  	case -ERANGE:
>  		/* NULL task or task->fs, don't count it as an error. */
>  		break;
> +	case -E2BIG:
> +		return 1;
Done.
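For completeness, the switch now reads as follows (my own rendering of the end result rather than a quote from v7; as I understand the iterator semantics, returning 1 asks bpf_seq_read() to retry the object with an empty buffer instead of reporting the overflow to user space):

	switch (ret) {
	case 0:
		tasks++;
		break;
	case -ERANGE:
		/* NULL task or task->fs, don't count it as an error. */
		break;
	case -E2BIG:
		/* seq buffer overflowed mid-object; retry it */
		return 1;
	default:
		seq_err = ret;
		break;
	}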
> Also please change bpf_seq_read() like this:
>
> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
> index 30833bbf3019..8f10e30ea0b0 100644
> --- a/kernel/bpf/bpf_iter.c
> +++ b/kernel/bpf/bpf_iter.c
> @@ -88,8 +88,8 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>  	mutex_lock(&seq->lock);
>
>  	if (!seq->buf) {
> -		seq->size = PAGE_SIZE;
> -		seq->buf = kmalloc(seq->size, GFP_KERNEL);
> +		seq->size = PAGE_SIZE << 3;
> +		seq->buf = kvmalloc(seq->size, GFP_KERNEL);
> So users can print task_struct by default. Hopefully we will figure out
> how to deal with the spam later.
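To put numbers on that (my arithmetic, assuming the common 4 KB PAGE_SIZE rather than anything stated in the thread):

	PAGE_SIZE << 3 == 4096 << 3 == 32768	/* 32 KB, vs ~18 kbytes for one task_struct */

and switching kmalloc() to kvmalloc() means the allocation can fall back to vmalloc() if physically contiguous pages for the larger buffer aren't available.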
Thanks for all the help and suggestions! I didn't want to attribute the patch bumping the seq size in v7 to you without your permission, but it's all your work, so if I need to respin, let me know if you'd like me to fix the attribution. Thanks again!
Alan