My motivation comes from debugging cgroup selftests, where strace is quite useful, and your implementation adds an unnecessary fork, which makes the strace output (slightly) less readable.
This makes sense, thank you for the context. I hadn't given much thought to debugging, but I can imagine that both the code and the strace output become harder to read once they are cluttered with the extra fork.
Do you think that this increase in granularity / accuracy is worth the increase in code complexity? I do agree that it would be much easier to read if there were no fork.
I think both options (dropping cg_run, or extending cpu_hog_func_param) could be reasonably small changes. With the extension, existing users of cpu_hog_func_param would default to a nice value of zero, so the actual change would only be in hog_cpus_timed().
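To illustrate why existing callers would not need to change under the cpu_hog_func_param option, here is a minimal sketch. It assumes the struct is filled in with designated initializers, and the `nice` member plus the two helper functions are hypothetical names made up for illustration; the other member and enum names are written from memory of test_cpu.c and may not match the tree exactly. In C, any member left out of an initializer is zero-initialized, so a caller that never mentions `nice` gets 0.

```c
#include <time.h>

/* Assumed to resemble the selftest's definitions; not copied from the tree. */
enum hog_clock_type {
	CPU_HOG_CLOCK_PROCESS,
	CPU_HOG_CLOCK_WALL,
};

struct cpu_hog_func_param {
	int nprocs;
	struct timespec ts;
	enum hog_clock_type clock_type;
	int nice;	/* hypothetical new member for this sketch */
};

/* Hypothetical existing-style caller: it never mentions `nice`. */
static struct cpu_hog_func_param legacy_param(void)
{
	struct cpu_hog_func_param param = {
		.nprocs = 1,
		.ts = { .tv_sec = 10, .tv_nsec = 0 },
		.clock_type = CPU_HOG_CLOCK_WALL,
	};

	/* Members omitted from the initializer are zero, so param.nice == 0. */
	return param;
}

/* Hypothetical new-style caller: it opts in to a non-zero nice value. */
static struct cpu_hog_func_param niced_param(int nice_val)
{
	struct cpu_hog_func_param param = {
		.nprocs = 1,
		.ts = { .tv_sec = 10, .tv_nsec = 0 },
		.clock_type = CPU_HOG_CLOCK_WALL,
		.nice = nice_val,
	};

	return param;
}
```

With a layout like this, hog_cpus_timed() would be the only place that has to read the new member and act on it.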
I think I will go with the no-cg_run option. Initially, I had wanted to use cg_run to keep the same style as the other selftests in test_cpu.c, but I think it makes the test unnecessarily hard to read.
Thank you again,
Joshua