Hi,
On Wed, Dec 18, 2024 at 8:36 AM Julius Werner jwerner@chromium.org wrote:
Given that I'm not going to change the way the existing predicates work, if I move the "fallback" setting `max_bhb_k` to 32 to spectre_bhb_enable_mitigation() then when we set `max_bhb_k` becomes inconsistent between recognized and unrecognized CPUs.
A clean way to fix that could be to change spectre_bhb_loop_affected() to just return the K-value (rather than change max_bhb_k directly), and then set max_bhb_k to the max() of that return value and the existing value when it is called from spectre_bhb_enable_mitigation(). That way, max_bhb_k would only be updated from spectre_bhb_enable_mitigation().
Sure, you could do that. ...but in my patch series I need to add _another_ static boolean that's updated in is_spectre_bhb_affected() anyway. I need to add one to the new is_spectre_bhb_safe() function that you requested. Specifically, the moment I detect any CPU ID that's not in the "safe" list then I need to note that down. If I don't do that then later calls to is_spectre_bhb_affected(SCOPE_SYSTEM) will return the wrong value. So while all the other "static" caches in is_spectre_bhb_affected() could be removed because we changed the default return to "true", the one in is_spectre_bhb_safe() (which causes the function to return "false") can't be removed.
In any case, the predicates updating the static caches predates my series and IMO my series doesn't make it worse. If you want to post a followup series changing how the predicates work and can convince others that it's worth doing then great, but I don't think it should block forward progress here.
I would also say that having `max_bhb_k` get set in an inconsistent place opens us up for bugs in the future. Even if it works today, I imagine someone could change things in the future such that spectre_bhb_enable_mitigation() reads `max_bhb_k` and essentially caches it (maybe it constructs an instruction based on it). If that happened things could be subtly broken for the "unrecognized CPU" case because the first CPU would "cache" the value without it having been called on all CPUs.
This would likely already be a problem with the current code, since spectre_bhb_enable_mitigations() is called one at a time for each CPU and the max_bhb_k value is only valid once it has been called on all CPUs. If someone tried to just immediately use the value inside the function that would just be a bug either way. (Right now this should not be a problem because max_bhb_k is only used by spectre_bhb_patch_loop_iter() which ends up being called through apply_alternatives_all(), which should only run after all those CPU errata cpu_enable callbacks have been called.)
Actually, today is_spectre_bhb_affected() is called much earlier I believe. It's installed (via cpu_errata.c) and called like this:
is_spectre_bhb_affected+0x124/0x2d8 update_cpu_capabilities+0xa0/0x158 setup_boot_cpu_capabilities+0x20/0x40 setup_boot_cpu_features+0x38/0x50 smp_prepare_boot_cpu+0x38/0x60 start_kernel+0x90/0x438
...but then spectre_bhb_enable_mitigations() is called later and by that time is_spectre_bhb_affected() should have been called for each of the CPUs:
spectre_bhb_enable_mitigation+0xbc/0x340 cpu_enable_non_boot_scope_capabilities+0x74/0xc8 multi_cpu_stop+0xf0/0x1b8 cpu_stopper_thread+0xac/0x148 smpboot_thread_fn+0xb0/0x238
I agree that it doesn't seem to be a problem today, though.
-Doug