Hi,
===== START =====
TEST: enq_last_no_enq_fails
DESCRIPTION: Verify we fail to load a scheduler if we specify
the SCX_OPS_ENQ_LAST flag without defining ops.enqueue()
OUTPUT:
ERR: enq_last_no_enq_fails.c:35
Incorrectly succeeded in to attaching scheduler
not ok 2 enq_last_no_enq_fails #
===== END =====
The above selftest fails even though the BPF scheduler is not loaded into the kernel.
Below is a snippet from dmesg verifying that the BPF program was not loaded:

sched_ext: enq_last_no_enq_fails: SCX_OPS_ENQ_LAST requires ops.enqueue() to be implemented
   scx_ops_enable.isra.0+0xde8/0xe30
   bpf_struct_ops_link_create+0x1ac/0x240
   link_create+0x178/0x400
   __sys_bpf+0x7ac/0xd50
   sys_bpf+0x2c/0x70
   system_call_exception+0x148/0x310
   system_call_vectored_common+0x15c/0x2ec
sched_ext: "enq_select_cpu_fails" does not implement cgroup cpu.weight
sched_ext: BPF scheduler "enq_select_cpu_fails" enabled
sched_ext: BPF scheduler "enq_select_cpu_fails" disabled (runtime error)
static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
{
	...
	ret = validate_ops(ops);
	if (ret)
		goto err_disable;
	...
err_disable:
	mutex_unlock(&scx_ops_enable_mutex);
	/*
	 * Returning an error code here would not pass all the error information
	 * to userspace. Record errno using scx_ops_error() for cases
	 * scx_ops_error() wasn't already invoked and exit indicating success so
	 * that the error is notified through ops.exit() with all the details.
	 *
	 * Flush scx_ops_disable_work to ensure that error is reported before
	 * init completion.
	 */
	scx_ops_error("scx_ops_enable() failed (%d)", ret);
	kthread_flush_work(&scx_ops_disable_work);
	return 0;
}
validate_ops() correctly reports the error, but the err_disable path ultimately returns zero, so the attach appears to succeed from userspace.
From enq_last_no_enq_fails.c:

static enum scx_test_status run(void *ctx)
{
	struct enq_last_no_enq_fails *skel = ctx;
	struct bpf_link *link;

	link = bpf_map__attach_struct_ops(skel->maps.enq_last_no_enq_fails_ops);
	if (link) {
		SCX_ERR("Incorrectly succeeded in to attaching scheduler");
		return SCX_TEST_FAIL;
	}

	bpf_link__destroy(link);

	return SCX_TEST_PASS;
}
Hello,
On Wed, Oct 23, 2024 at 10:13:19PM +0530, Vishal Chourasia wrote: ...
> static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
> {
> 	...
> 	ret = validate_ops(ops);
> 	if (ret)
> 		goto err_disable;
> 	...
> err_disable:
> 	mutex_unlock(&scx_ops_enable_mutex);
> 	/*
> 	 * Returning an error code here would not pass all the error information
> 	 * to userspace. Record errno using scx_ops_error() for cases
> 	 * scx_ops_error() wasn't already invoked and exit indicating success so
> 	 * that the error is notified through ops.exit() with all the details.
> 	 *
> 	 * Flush scx_ops_disable_work to ensure that error is reported before
> 	 * init completion.
> 	 */
> 	scx_ops_error("scx_ops_enable() failed (%d)", ret);
> 	kthread_flush_work(&scx_ops_disable_work);
> 	return 0;
> }
>
> validate_ops() correctly reports the error, but err_disable path ultimately
> returns with a value of zero
Yeah, this is because the failure is now communicated through the scheduler unload path which has richer error reporting. The exit is triggered immediately but loading still succeeds. We need to update the test framework to detect this failure mode too.
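As a rough illustration of that failure mode, here is a self-contained model of the check, not the actual selftest code. Every mock_* name below is hypothetical and only stands in for the real pieces (the link returned by the attach call and whatever exit information the scheduler's ops.exit() callback records). The idea is that a test expecting the load to be rejected should pass either when the attach fails outright or when the attach returns a link but an error exit was recorded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the exit kind recorded by ops.exit(). */
enum mock_exit_kind {
	MOCK_EXIT_NONE = 0,	/* scheduler has not exited */
	MOCK_EXIT_ERROR,	/* scheduler was disabled due to an error */
};

/* Hypothetical stand-in for the exit info shared with userspace. */
struct mock_exit_info {
	enum mock_exit_kind kind;
};

/* Hypothetical stand-in for struct bpf_link. */
struct mock_link {
	int attached;
};

enum mock_test_status {
	MOCK_TEST_PASS,
	MOCK_TEST_FAIL,
};

/*
 * Models the behavior described above: the enable path returns 0, so the
 * attach always hands back a link, and an invalid ops table is reported
 * only through the exit info.
 */
static struct mock_link *mock_attach(struct mock_link *storage,
				     struct mock_exit_info *ei,
				     bool ops_valid)
{
	storage->attached = 1;
	if (!ops_valid)
		ei->kind = MOCK_EXIT_ERROR;
	return storage;
}

/*
 * A test that expects the load to be rejected passes if the attach failed
 * outright (no link) or if an error exit was recorded despite the link.
 */
static enum mock_test_status check_load_rejected(const struct mock_link *link,
						 const struct mock_exit_info *ei)
{
	if (!link)
		return MOCK_TEST_PASS;
	if (ei->kind == MOCK_EXIT_ERROR)
		return MOCK_TEST_PASS;
	return MOCK_TEST_FAIL;
}
```

In this model, a test that only looks at the returned link would report a failure for the invalid-ops case, which is the symptom reported above. Checking the recorded exit kind as well covers both ways of signalling the error.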
Thanks.
linux-kselftest-mirror@lists.linaro.org