Hi Alexandre, Palmer,
On 2025/6/23 20:15, Alexandre Ghiti wrote:
Hi Jingwei, Palmer,
On 6/20/25 10:43, Jingwei Wang wrote:
Hi Palmer,
On 2025/6/11 06:25, Palmer Dabbelt wrote:
On Wed, 28 May 2025 07:28:19 PDT (-0700), wangjingwei@iscas.ac.cn wrote:
The riscv_hwprobe vDSO data is populated by init_hwprobe_vdso_data(), an arch_initcall_sync. However, the underlying data for some keys, like RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF, is determined asynchronously.
Specifically, the per_cpu(vector_misaligned_access, cpu) values are set by the vec_check_unaligned_access_speed_all_cpus kthread. This kthread is spawned by an earlier arch_initcall (check_unaligned_access_all_cpus) and may complete its benchmark *after* init_hwprobe_vdso_data() has already populated the vDSO with default/stale values.
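For context, a minimal sketch of the ordering in question; the function names come from the description above, but the bodies are simplified placeholders rather than a copy of the actual arch/riscv sources:

#include <linux/init.h>
#include <linux/kthread.h>

/* Hypothetical placeholder for the per-CPU benchmark described above. */
static int vec_check_unaligned_access_speed_all_cpus(void *unused)
{
	/* ... measure misaligned vector access speed on each CPU ... */
	return 0;
}

static int __init check_unaligned_access_all_cpus(void)
{
	/* The benchmark is handed off to the scheduler and may finish late. */
	kthread_run(vec_check_unaligned_access_speed_all_cpus, NULL,
		    "vec_unalign_probe");
	return 0;
}
arch_initcall(check_unaligned_access_all_cpus);

static int __init init_hwprobe_vdso_data(void)
{
	/*
	 * Runs at arch_initcall_sync, i.e. after the initcall above has
	 * *returned*, but with no guarantee that the kthread has completed,
	 * so the snapshot can capture default/stale values.
	 */

	/* ... copy per-key values into the vDSO data page ... */
	return 0;
}
arch_initcall_sync(init_hwprobe_vdso_data);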
IIUC there's another race here: we don't ensure these probes complete before allowing userspace to see the values, so if they took long enough that userspace started making hwprobe() calls before they got scheduled, we'd be providing the wrong answer.
Unless I'm just missing something, though -- I thought we'd looked at that case?
Thanks for the review. You're right, my current patch doesn't fix the race condition with userspace.
I don't think there could be a race since all initcalls are executed sequentially, meaning userspace won't be up before the arch_initcall level is finished.
Yes, the initcall sequence itself provides a guarantee before userspace starts. The theoretical race condition I'm concerned about comes from kthread_run(), which hands the probe's execution off to the scheduler. This introduces timing uncertainty that we shouldn't rely on. The completion mechanism is intended to replace this timing assumption with an explicit, guaranteed synchronization point.
But that means that this patch in its current form will probably slow down the whole boot process. To avoid that (and in addition to this patch), can we move init_hwprobe_vdso_data() to late_initcall()?
That's a great point. To check the actual impact, I ran a test on a 64-core QEMU setup where I deliberately increased the probe time (from ~2ms to ~256ms by adjusting JIFFIES). Even in this worst-case scenario, the probe consistently finished while the subsys_initcall level was still running, which suggests the real-world slowdown from waiting is minimal.
However, I completely agree with your suggestion. Combining completion with a later initcall is the ideal solution to guarantee both correctness and performance. Since the vDSO data is only consumed by userspace, moving its initialization to late_initcall is perfectly safe and makes the most sense.
The robust solution here is to use the kernel's completion mechanism. I've tested this approach: the async probing thread calls complete() when it finishes, and init_hwprobe_vdso_data() blocks on wait_for_completion(). This guarantees the vDSO data is finalized before userspace can access it.
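A minimal sketch of this approach, assuming a single completion dedicated to the vector misalignment probe and that the probe kthread is always started; the names are illustrative, not necessarily what the final patch will use:

#include <linux/completion.h>
#include <linux/init.h>

/* Hypothetical completion signalled once the vector probe has finished. */
static DECLARE_COMPLETION(vec_probe_done);

static int vec_check_unaligned_access_speed_all_cpus(void *unused)
{
	/* ... run the per-CPU benchmark and record the results ... */

	complete(&vec_probe_done);	/* results are now final */
	return 0;
}

static int __init init_hwprobe_vdso_data(void)
{
	/* Block until the asynchronous probe has published its results. */
	wait_for_completion(&vec_probe_done);

	/* ... copy the now-final key values into the vDSO data page ... */
	return 0;
}
arch_initcall_sync(init_hwprobe_vdso_data);

If init_hwprobe_vdso_data() is additionally moved to late_initcall() as discussed above, the wait would typically already be satisfied by the time it runs.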
So, refresh the vDSO data for the affected keys (e.g., MISALIGNED_VECTOR_PERF), ensuring it reflects the final boot-time values.
Tested by comparing the vDSO and syscall results for the affected keys (e.g., MISALIGNED_VECTOR_PERF); they now match their final boot-time values.
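For reference, a userspace check along these lines could look like the sketch below; it assumes glibc >= 2.40 for the __riscv_hwprobe() wrapper in <sys/hwprobe.h> and kernel headers recent enough to define the MISALIGNED_VECTOR_PERF key:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/hwprobe.h>	/* __riscv_hwprobe(); pulls in <asm/hwprobe.h> */

int main(void)
{
	struct riscv_hwprobe fast = {
		.key = RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF,
	};
	struct riscv_hwprobe slow = fast;

	/* Fast path: the libc wrapper, which resolves through the vDSO. */
	if (__riscv_hwprobe(&fast, 1, 0, NULL, 0))
		return 1;

	/* Slow path: force the real syscall, bypassing the vDSO data. */
	if (syscall(__NR_riscv_hwprobe, &slow, 1, 0, NULL, 0))
		return 1;

	printf("vDSO: %llu  syscall: %llu  -> %s\n",
	       (unsigned long long)fast.value,
	       (unsigned long long)slow.value,
	       fast.value == slow.value ? "match" : "MISMATCH");
	return fast.value != slow.value;
}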
Wouldn't all the other keys we probe via workqueue be racy as well?
The completion mechanism is easily reusable. If this approach is accepted, I will then identify other probe keys with similar asynchronous initialization and integrate them into this synchronization logic.
Yes, I'd say that's the right way to do it; there isn't a lot of asynchronous initialization, so we can handle that when new ones land.
Thank you. I'll prepare a new version of the patch that implements this completion + late_initcall approach. I've also had a look, and for now, the asynchronous probe for vector misalignment seems to be a special case. The completion framework can be easily extended if similar ones land in the future.
Thanks, Jingwei