On Fri, Apr 12, 2024 at 01:34:43PM -0700, Charlie Jenkins wrote:
On Fri, Apr 12, 2024 at 08:26:12PM +0100, Conor Dooley wrote:
On Fri, Apr 12, 2024 at 11:46:21AM -0700, Charlie Jenkins wrote:
On Fri, Apr 12, 2024 at 07:38:04PM +0100, Conor Dooley wrote:
On Fri, Apr 12, 2024 at 10:04:17AM -0700, Evan Green wrote:
On Fri, Apr 12, 2024 at 3:26 AM Conor Dooley conor.dooley@microchip.com wrote:
On Thu, Apr 11, 2024 at 09:11:08PM -0700, Charlie Jenkins wrote: > The riscv_cpuinfo struct that contains mvendorid and marchid is not > populated until all harts are booted which happens after the DT parsing. > Use the vendorid/archid values from the DT if available or assume all > harts have the same values as the boot hart as a fallback. > > Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs")
If this is our only use case for getting the mvendorid/marchid stuff from dt, then I don't think we should add it. None of the devicetrees that the commit you're fixing here addresses will have these properties and if they did have them, they'd then also be new enough to hopefully not have "v" either - the issue is they're using whatever crap the vendor shipped. If we're gonna get the information from DT, we already have something that we can look at to perform the disable as the cpu compatibles give us enough information to make the decision.
I also think that we could just cache the boot CPU's marchid/mvendorid, since we already have to look at it in riscv_fill_cpu_mfr_info(), avoid repeating these ecalls on all systems.
Perhaps for now we could just look at the boot CPU alone? To my knowledge the systems that this targets all have homogeneous marchid/mvendorid values of 0x0.
It's possible I'm misinterpreting, but is the suggestion to apply the marchid/mvendorid we find on the boot CPU and assume it's the same on all other CPUs? Since we're reporting the marchid/mvendorid/mimpid to usermode in a per-hart way, it would be better IMO if we really do query marchid/mvendorid/mimpid on each hart. The problem with applying the boot CPU's value everywhere is if we're ever wrong in the future (ie that assumption doesn't hold on some machine), we'll only find out about it after the fact. Since we reported the wrong information to usermode via hwprobe, it'll be an ugly userspace ABI issue to clean up.
You're misinterpreting, we do get the values on all individually as they're brought online. This is only used by the code that throws a bone to people with crappy vendor dtbs that put "v" in riscv,isa when they support the unratified version.
Not quite,
Remember that this patch stands in isolation and the justification given in your commit message does not mention anything other than fixing my broken patch.
Fixing the patch in the simplest sense would be to eagerly get the mvendorid/marchid without using the cached version. But this assumes that all harts have the same mvendorid/marchid. This is not something that I am strongly attached to. If it truly is detrimental to Linux to allow a user a way to specify different vendorids for different harts then I will remove that code.
I think that the simple fix is all that we need to do here, perhaps updating the comment to point out how naive we are being. `
the alternatives are patched before the other cpus are booted, so the alternatives will have false positives resulting in broken kernels.
Over-eagerly disabling vector isn't going to break any kernels and really should not break a behaving userspace either. Under-eagerly disabling it (in a way that this approach could solve) is only going to happen on a system where the boot hart has non-zero values and claims support for v but a non-boot hart has zero values and claims support for v but actually doesn't implement the ratified version. If the boot hart doesn't support v, then we currently disable the extension as only homogeneous stuff is supported by Linux. If the boot hart claims support for "v" but doesn't actually implement the ratified version neither the intent of my original patch nor this fix for it are going to help avoid a broken kernel.
I think we do have a problem if the boot cpu having some erratum leads to the kernel being patched in a way that does not work for the other CPUs on the system, but I don't think this series addresses that sort of issue at all as you'd be adding code to the pi section if you were fixing it. I also don't think we should be making pre-emptive changes to the errata patching code either to solve that sort of problem, until an SoC shows up where things don't work. Cheers, Conor.