Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), tailcalls on x64 work better than before.
Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.
Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF programs are able to trace other BPF programs.
How about combining them all together?
1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.
As a result, a tailcall infinite loop comes up, and the loop halts the machine.
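For illustration, here is a minimal reproducer sketch, modeled on the existing tailcall_bpf2bpf2 selftest (simplified and untested; names are illustrative). classifier_0 is placed in slot 0 of jmp_table, and FENTRY/FEXIT is attached to subprog_tail:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
		__uint(max_entries, 1);
		__uint(key_size, sizeof(__u32));
		__uint(value_size, sizeof(__u32));
	} jmp_table SEC(".maps");

	int count = 0;

	static __noinline int subprog_tail(struct __sk_buff *skb)
	{
		/* Tail-call back into classifier_0 (slot 0), forming a loop. */
		bpf_tail_call_static(skb, &jmp_table, 0);
		return 0;
	}

	SEC("tc")
	int classifier_0(struct __sk_buff *skb)
	{
		count++;	/* expected to stop at the tail call limit */
		return subprog_tail(skb);
	}

	char _license[] SEC("license") = "GPL";

With FENTRY/FEXIT attached to subprog_tail, count keeps growing instead of being capped at MAX_TAIL_CALL_CNT.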
As we know, in the tail call context, tail_call_cnt is propagated between BPF subprograms via the stack and the RAX register. So do the same in FENTRY/FEXIT trampolines.
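For reference, this is roughly how that propagation looks in the x64 JIT today (simplified excerpt from emit_prologue() and the bpf2bpf call handling in do_jit(); not the complete code):

	/* emit_prologue(): entry prog of the tailcall context */
	EMIT2(0x31, 0xC0);	/* xor eax, eax: init tail_call_cnt */
	...
	/* emit_prologue(): any prog with tail_call_reachable */
	EMIT1(0x50);		/* push rax: spill tail_call_cnt to stack */

	/* do_jit(): right before a bpf2bpf call, when tail_call_reachable */
	/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
	EMIT3_off32(0x48, 0x8B, 0x85,
		    -round_up(bpf_prog->aux->stack_depth, 8) - 8);

A FENTRY/FEXIT trampoline attached to a subprogram runs between the caller's reload and the callee's spill and clobbers RAX, so the propagated tail_call_cnt is lost. Hence this series makes the trampoline save and restore it.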
How did I discover the bug?
Since commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for subprogs with tailcalls"), the total stack size is limited to around 8KiB. So I wrote some bpf progs to validate the stack consumption: tailcalls running in bpf2bpf, and FENTRY/FEXIT tracing on bpf2bpf[1].
At that time, I accidentally created a tailcall loop, and the loop halted my VM. Without the loop, the bpf progs would consume over 8KiB of stack, but that stack overflow did not halt my VM.
With bpf_printk(), I confirmed that the tailcall count limit did not work as expected. So I read the code and fixed it.
Unfortunately, I have only fixed it on x64, not on the other arches. As a result, CI tests fail because this bug has not been fixed on s390x.
Some help is requested.
[1]: https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/tailcall-stackov...
Leon Hwang (2):
  bpf, x64: Fix tailcall infinite loop bug
  selftests/bpf: Add testcases for tailcall infinite loop bug fixing
 arch/x86/net/bpf_jit_comp.c                   |  23 ++-
 include/linux/bpf.h                           |   6 +
 kernel/bpf/trampoline.c                       |   5 +-
 kernel/bpf/verifier.c                         |   9 +-
 .../selftests/bpf/prog_tests/tailcalls.c      | 194 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 ++
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 ++
 7 files changed, 264 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
base-commit: 9930e4af4b509bcf6f060b09b16884f26102d110
Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT"), tailcalls on x64 work better than before.
Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.
Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program to other BPF programs"), BPF programs are able to trace other BPF programs.
How about combining them all together?
1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.
As a result, a tailcall infinite loop comes up, and the loop halts the machine.
As we know, in the tail call context, tail_call_cnt is propagated between BPF subprograms via the stack and the RAX register. So do the same in FENTRY/FEXIT trampolines.
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 23 +++++++++++++++++++----
 include/linux/bpf.h         |  6 ++++++
 kernel/bpf/trampoline.c     |  5 +++--
 kernel/bpf/verifier.c       |  9 +++++++--
 4 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a5930042139d3..ca5366d97ad04 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1018,6 +1018,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
+/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
+#define RESTORE_TAIL_CALL_CNT(stack)				\
+	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
 {
@@ -1623,9 +1627,7 @@ st:			if (is_imm8(insn->off))
 
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
-				/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
-				EMIT3_off32(0x48, 0x8B, 0x85,
-					    -round_up(bpf_prog->aux->stack_depth, 8) - 8);
+				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
 				if (!imm32)
 					return -EINVAL;
 				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
@@ -2464,6 +2466,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	else
 		/* sub rsp, stack_size */
 		EMIT4(0x48, 0x83, 0xEC, stack_size);
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		EMIT1(0x50);		/* push rax */
 	/* mov QWORD PTR [rbp - rbx_off], rbx */
 	emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_6, -rbx_off);
 
@@ -2516,6 +2520,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		restore_regs(m, &prog, regs_off);
 		save_args(m, &prog, arg_stack_off, true);
 
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			/* Before calling the original function, restore the
+			 * tail_call_cnt from stack.
+			 */
+			RESTORE_TAIL_CALL_CNT(stack_size);
+
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
 			emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8);
 			EMIT2(0xff, 0xd0); /* call *rax */
@@ -2569,7 +2579,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 			ret = -EINVAL;
 			goto cleanup;
 		}
-	}
+	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		/* Before running the original function, restore the
+		 * tail_call_cnt from stack.
+		 */
+		RESTORE_TAIL_CALL_CNT(stack_size);
+
 	/* restore return value of orig_call or fentry prog back into RAX */
 	if (save_ret)
 		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb8..55c72086034ef 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1028,6 +1028,11 @@ struct btf_func_model {
  */
 #define BPF_TRAMP_F_SHARE_IPMODIFY	BIT(6)
 
+/* Indicate that current trampoline is in a tail call context. Then, it has to
+ * cache and restore tail_call_cnt to avoid infinite tail call loop.
+ */
+#define BPF_TRAMP_F_TAIL_CALL_CTX	BIT(7)
+
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
  * bytes on x86.
  */
@@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
 	struct module *tgt_mod;
 	const char *tgt_name;
 	const struct btf_type *tgt_type;
+	bool tail_call_ctx;
 };
 
 #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 78acf28d48732..0fae334e3f7b8 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -415,8 +415,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
 		goto out;
 	}
 
-	/* clear all bits except SHARE_IPMODIFY */
-	tr->flags &= BPF_TRAMP_F_SHARE_IPMODIFY;
+	/* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */
+	tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
 
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
 	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
@@ -783,6 +783,7 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key,
 
 	memcpy(&tr->func.model, &tgt_info->fmodel, sizeof(tgt_info->fmodel));
 	tr->func.addr = (void *)tgt_info->tgt_addr;
+	tr->flags = (tgt_info->tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);
 out:
 	mutex_unlock(&tr->mutex);
 	return tr;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c9981..a78e5a2ae5c72 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19400,10 +19400,15 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 			return -EINVAL;
 		fallthrough;
 	case BPF_MODIFY_RETURN:
-	case BPF_LSM_MAC:
-	case BPF_LSM_CGROUP:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
+		if (tgt_prog && subprog > 0 &&
+		    tgt_prog->aux->func[subprog]->is_func &&
+		    tgt_prog->aux->tail_call_reachable)
+			tgt_info->tail_call_ctx = true;
+		fallthrough;
+	case BPF_LSM_MAC:
+	case BPF_LSM_CGROUP:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
On Mon, 2023-08-14 at 21:41 +0800, Leon Hwang wrote:
[...]
Hi Leon,
I'm not familiar with this part of the jit compiler, so decided that taking a look at your series might be a good learning point. I think I got the gist of it, but I don't understand where the initial value of RAX (== 0) is coming from in arch_prepare_bpf_trampoline(), could you please help me out?
Also a nitpick:
- in arch_prepare_bpf_trampoline() there is a comment detailing the stack layout, it probably should be updated to say that tail call count is stored as well;
- before arch_prepare_bpf_trampoline() there is a comment with an example of generated assembly, should it be updated?
Thanks, Eduard
[...]
On 15/8/23 08:52, Eduard Zingerman wrote:
On Mon, 2023-08-14 at 21:41 +0800, Leon Hwang wrote:
[...]
Hi Leon,
I'm not familiar with this part of the jit compiler, so decided that taking a look at your series might be a good learning point. I think I got the gist of it, but I don't understand where the initial value of RAX (== 0) is coming from in arch_prepare_bpf_trampoline(), could you please help me out?
Also a nitpick:
- in arch_prepare_bpf_trampoline() there is a comment detailing the stack layout, it probably should be updated to say that tail call count is stored as well;
- before arch_prepare_bpf_trampoline() there is a comment with an example of generated assembly, should it be updated?
Thanks, Eduard
a) Initial value of RAX is in emit_prologue().

	if (!ebpf_from_cbpf) {
		if (tail_call_reachable && !is_subprog)
			/* When it's the entry of the whole
			 * tailcall context, zeroing the RAX
			 * means init tail_call_cnt.
			 */
			EMIT2(0x31, 0xC0); /* xor eax, eax */
		else
			// Keep the same asm layout.
			EMIT2(0x66, 0x90); /* nop2 */
	}

I'd like to add this comment to emit_prologue().
b) Good to update the stack layout. I'll do it.
c) Its comment will be updated also.
Thanks, Leon
[...]
On Tue, 2023-08-15 at 11:01 +0800, Leon Hwang wrote: [...]
a) Initial value of RAX is in emit_prologue(). [...]
Got it, thank you.
[...]
On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
@@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
 	struct module *tgt_mod;
 	const char *tgt_name;
 	const struct btf_type *tgt_type;
+	bool tail_call_ctx;
Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id() and set tr->flags there? Other than this the fix makes sense. Please trim your cc list when you respin. Just maintainers, Maciej (author of fixes tag) and bpf@vger is enough.
On 18/8/23 06:31, Alexei Starovoitov wrote:
On Mon, Aug 14, 2023 at 09:41:46PM +0800, Leon Hwang wrote:
@@ -1147,6 +1152,7 @@ struct bpf_attach_target_info {
 	struct module *tgt_mod;
 	const char *tgt_name;
 	const struct btf_type *tgt_type;
+	bool tail_call_ctx;
Instead of extra flag here can you check tgt_prog->aux->tail_call_reachable in check_attach_btf_id() and set tr->flags there?
Should we check tgt_prog->aux->func[subprog]->is_func? Or is tgt_prog->aux->tail_call_reachable enough?
I think tgt_prog->aux->func[subprog]->is_func is required to be checked, because this bug is about the subprog rather than the tgt_prog.
In check_attach_btf_id():
	bool tail_call_ctx;

	// ...

	ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id,
				      &tgt_info, &tail_call_ctx);

	// ...

	tr->flags = (tail_call_ctx ? BPF_TRAMP_F_TAIL_CALL_CTX : 0);
How about changing it like this? However, it's not great to change the bpf_check_attach_target() declaration.
Other than this the fix makes sense. Please trim your cc list when you respin. Just maintainers, Maciej (author of fixes tag) and bpf@vger is enough.
I'll trim it.
Thanks, Leon
On Thu, Aug 17, 2023 at 7:10 PM Leon Hwang hffilwlqm@gmail.com wrote:
On 18/8/23 06:31, Alexei Starovoitov wrote:
[...]
Should we check tgt_prog->aux->func[subprog]->is_func? Or is tgt_prog->aux->tail_call_reachable enough?
Please let the thread continue to a logical conclusion before resending new version. Will reply there.
On 2023/8/19 03:59, Alexei Starovoitov wrote:
[...]
Please let the thread continue to a logical conclusion before resending new version. Will reply there.
Sorry for sending a new version before the thread reached a logical conclusion.
I'll do it better in the future.
Additionally, I'm looking forward to fixing it, and then I plan to add a feature to trace tailcalls with trampolines.
Thanks, Leon
Add 3 test cases to confirm the tailcall infinite loop bug has been fixed.
Like the tailcall_bpf2bpf cases, attach fentry/fexit to the bpf2bpf callee, and then check the final count result.
tools/testing/selftests/bpf/test_progs -t tailcalls
226/13  tailcalls/tailcall_bpf2bpf_fentry:OK
226/14  tailcalls/tailcall_bpf2bpf_fexit:OK
226/15  tailcalls/tailcall_bpf2bpf_fentry_fexit:OK
226     tailcalls:OK
Summary: 1/15 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 194 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 ++
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 ++
 3 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 58fe2c586ed76..a47c2fd6b8d37 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -634,7 +634,7 @@ static void test_tailcall_bpf2bpf_2(void)
 		return;
 
 	data_fd = bpf_map__fd(data_map);
-	if (CHECK_FAIL(map_fd < 0))
+	if (CHECK_FAIL(data_fd < 0))
 		return;
 
 	i = 0;
@@ -884,6 +884,191 @@ static void test_tailcall_bpf2bpf_6(void)
 	tailcall_bpf2bpf6__destroy(obj);
 }
 
+static void __tailcall_bpf2bpf_fentry_fexit(bool test_fentry, bool test_fexit)
+{
+	struct bpf_object *tgt_obj = NULL, *fentry_obj = NULL, *fexit_obj = NULL;
+	struct bpf_link *fentry_link = NULL, *fexit_link = NULL;
+	int err, map_fd, prog_fd, main_fd, data_fd, i, val;
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_program *prog;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load("tailcall_bpf2bpf2.bpf.o",
+				 BPF_PROG_TYPE_SCHED_CLS,
+				 &tgt_obj, &prog_fd);
+	if (!ASSERT_OK(err, "load tgt_obj"))
+		return;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	main_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(main_fd < 0, "find entry prog fd"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(tgt_obj, "jmp_table");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_FALSE(map_fd < 0, "find jmp_table map fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "classifier_0");
+	if (!ASSERT_OK_PTR(prog, "find classifier_0 prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(prog_fd < 0, "find classifier_0 prog fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	if (test_fentry) {
+		fentry_obj = bpf_object__open_file("tailcall_bpf2bpf_fentry.bpf.o",
+						   NULL);
+		if (!ASSERT_OK_PTR(fentry_obj, "open fentry_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fentry_obj, "fentry");
+		if (!ASSERT_OK_PTR(prog, "find fentry prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fentry_obj);
+		if (!ASSERT_OK(err, "load fentry_obj"))
+			goto out;
+
+		fentry_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fentry_link, "attach_trace"))
+			goto out;
+	}
+
+	if (test_fexit) {
+		fexit_obj = bpf_object__open_file("tailcall_bpf2bpf_fexit.bpf.o",
+						  NULL);
+		if (!ASSERT_OK_PTR(fexit_obj, "open fexit_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fexit_obj, "fexit");
+		if (!ASSERT_OK_PTR(prog, "find fexit prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fexit_obj);
+		if (!ASSERT_OK(err, "load fexit_obj"))
+			goto out;
+
+		fexit_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fexit_link, "attach_trace"))
+			goto out;
+	}
+
+	err = bpf_prog_test_run_opts(main_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	data_map = bpf_object__find_map_by_name(tgt_obj, "tailcall.bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find tailcall.bss map"))
+		goto out;
+
+	data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_FALSE(data_fd < 0, "find tailcall.bss map fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall count");
+	ASSERT_EQ(val, 33, "tailcall count");
+
+	if (test_fentry) {
+		data_map = bpf_object__find_map_by_name(fentry_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fentry.bss.bss map"))
+			goto out;
+
+		data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_FALSE(data_fd < 0,
+				  "find tailcall_bpf2bpf_fentry.bss.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(data_fd, &i, &val);
+		ASSERT_OK(err, "fentry count");
+		ASSERT_EQ(val, 33, "fentry count");
+	}
+
+	if (test_fexit) {
+		data_map = bpf_object__find_map_by_name(fexit_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fexit.bss map"))
+			goto out;
+
+		data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_FALSE(data_fd < 0,
+				  "find tailcall_bpf2bpf_fexit.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(data_fd, &i, &val);
+		ASSERT_OK(err, "fexit count");
+		ASSERT_EQ(val, 33, "fexit count");
+	}
+
+out:
+	bpf_link__destroy(fentry_link);
+	bpf_link__destroy(fexit_link);
+	bpf_object__close(fentry_obj);
+	bpf_object__close(fexit_obj);
+	bpf_object__close(tgt_obj);
+}
+
+/* test_tailcall_bpf2bpf_fentry checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by fentry.
+ */
+static void test_tailcall_bpf2bpf_fentry(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(true, false);
+}
+
+/* test_tailcall_bpf2bpf_fexit checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by fexit.
+ */
+static void test_tailcall_bpf2bpf_fexit(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(false, true);
+}
+
+/* test_tailcall_bpf2bpf_fentry_fexit checks that the count value of the tail call
+ * limit enforcement matches with expectations when tailcall is preceded with
+ * bpf2bpf call, and the bpf2bpf call is traced by both fentry and fexit.
+ */
+static void test_tailcall_bpf2bpf_fentry_fexit(void)
+{
+	__tailcall_bpf2bpf_fentry_fexit(true, true);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -910,4 +1095,11 @@ void test_tailcalls(void)
 		test_tailcall_bpf2bpf_4(true);
 	if (test__start_subtest("tailcall_bpf2bpf_6"))
 		test_tailcall_bpf2bpf_6();
+	if (test__start_subtest("tailcall_bpf2bpf_fentry"))
+		test_tailcall_bpf2bpf_fentry();
+	if (test__start_subtest("tailcall_bpf2bpf_fexit"))
+		test_tailcall_bpf2bpf_fexit();
+	if (test__start_subtest("tailcall_bpf2bpf_fentry_fexit"))
+		test_tailcall_bpf2bpf_fentry_fexit();
 }
+
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
new file mode 100644
index 0000000000000..8436c6729167c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Leon Hwang */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int count = 0;
+
+SEC("fentry/subprog_tail")
+int BPF_PROG(fentry, struct sk_buff *skb)
+{
+	count++;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
new file mode 100644
index 0000000000000..fe16412c6e6e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright Leon Hwang */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int count = 0;
+
+SEC("fexit/subprog_tail")
+int BPF_PROG(fexit, struct sk_buff *skb)
+{
+	count++;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";