On Wed, Jul 10, 2024 at 04:17:02PM +0100, Marc Zyngier wrote:
Mark Brown broonie@kernel.org wrote:
- if (ctxt_has_gcs(ctxt)) {
Since this is conditioned on S1PIE, it should be only be evaluated when PIE is enabled in the guest.
So make ctxt_has_gcs() embed a check of ctxt_has_s1pie()?
ctxt_sys_reg(ctxt, GCSPR_EL1) = read_sysreg_el1(SYS_GCSPR);
ctxt_sys_reg(ctxt, GCSCR_EL1) = read_sysreg_el1(SYS_GCSCR);
ctxt_sys_reg(ctxt, GCSCRE0_EL1) = read_sysreg_s(SYS_GCSCRE0_EL1);
Why is this part of the EL1 context? It clearly only matters to EL0 execution, so it could be switched in load/put on nVHE as well. And actually, given that the whole thing is strictly for userspace, why do we switch *anything* eagerly at all?
GCS can also be used independently at EL1 (and EL2 for that matter), it's not purely for userspace even though this series only implements use in userspace. GCSPR_EL1 and GCSCR_EL1 control the use of GCS at EL1, not EL0, and the guest might be using GCS at EL1 even if the host doesn't.
GCSCRE0_EL1 is for EL0 though, it ended up here mainly because it's an _EL1 register and we are already context switching PIRE0_EL1 in the EL1 functions so it seemed consistent to follow the same approach for GCS. The _el1 and _user save/restore functions are called from the same place for both VHE and nVHE so the practical impact of the placement should be minimal AFAICT. Unlike PIRE0_EL1 GCSCRE0_EL1 only has an impact for code runnning at EL0 so I can move it to the _user functions.
TBH I'm not following your comments about switching eagerly too here at all, where would you expect to see the switching done? You've said something along these lines before which prompted me to send a patch to only save the S1PIE registers if they'd been written to which you were quite reasonably not happy with given the extra traps it would cause:
https://lore.kernel.org/r/20240301-kvm-arm64-defer-regs-v1-1-401e3de92e97@ke...
but I'm at a loss as to how to make things less eager otherwise.
@@ -2306,7 +2323,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_AA64PFR0_EL1_GIC | ID_AA64PFR0_EL1_AdvSIMD | ID_AA64PFR0_EL1_FP), },
- ID_SANITISED(ID_AA64PFR1_EL1),
- ID_WRITABLE(ID_AA64PFR1_EL1, ~(ID_AA64PFR1_EL1_RES0 |
ID_AA64PFR1_EL1_BT)),
I don't know what you're trying to do here, but that's not right. If you want to make this register writable, here's the shopping list:
Yes, trying to make things writable. I do see we need to exclude more bits there and I'm not clear why I excluded BTI, looks like I forgot to add a TODO comment at some point and finish that off. Sorry about that.
In the linked mail you say you want to see all fields explicitly handled, could you be more direct about what such explicit handling would look like? I see a number of examples in the existing code like:
ID_WRITABLE(ID_AA64ZFR0_EL1, ~ID_AA64ZFR0_EL1_RES0),
ID_WRITABLE(ID_AA64ISAR0_EL1, ~ID_AA64ISAR0_EL1_RES0), ID_WRITABLE(ID_AA64ISAR1_EL1, ~(ID_AA64ISAR1_EL1_GPI | ID_AA64ISAR1_EL1_GPA | ID_AA64ISAR1_EL1_API | ID_AA64ISAR1_EL1_APA)),
which look to my eye very similar to the above, they do not visibliy explictly enumerate every field in the registers and given that there's a single mask specified it's not clear how that would look. If ID_WRITABLE() took separate read/write masks and combined them it'd be more obvious but it's just not written that way.