User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Also contains some fixes/changes for the memop selftest independent of the cmpxchg changes.
v4 -> v5 * refuse cmpxchg if not write (thanks Thomas) * minor doc changes (thanks Claudio) * picked up R-b's (thanks Thomas & Claudio) * memop selftest fixes * rebased
v3 -> v4 * no functional change intended * rework documentation a bit * name extension cap cmpxchg bit * picked up R-b (thanks Thomas) * various changes (rename variable, comments, ...) see range-diff below
v2 -> v3 * rebase onto the wip/cmpxchg_user_key branch in the s390 kernel repo * use __uint128_t instead of unsigned __int128 * put moving of testlist into main into separate patch * pick up R-b's (thanks Nico)
v1 -> v2 * get rid of xrk instruction for cmpxchg byte and short implementation * pass old parameter via pointer instead of in mem_op struct * indicate failure of cmpxchg due to wrong old value by special return code * picked up R-b's (thanks Thomas)
Janis Schoetterl-Glausch (10): KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG KVM: s390: selftest: memop: Pass mop_desc via pointer KVM: s390: selftest: memop: Replace macros by functions KVM: s390: selftest: memop: Move testlist into main KVM: s390: selftest: memop: Add cmpxchg tests KVM: s390: selftest: memop: Add bad address test KVM: s390: selftest: memop: Fix typo KVM: s390: selftest: memop: Fix wrong address being used in test KVM: s390: selftest: memop: Fix integer literal
Documentation/virt/kvm/api.rst | 20 +- include/uapi/linux/kvm.h | 7 + arch/s390/kvm/gaccess.h | 3 + arch/s390/kvm/gaccess.c | 102 ++++ arch/s390/kvm/kvm-s390.c | 41 +- tools/testing/selftests/kvm/s390x/memop.c | 675 +++++++++++++++++----- 6 files changed, 696 insertions(+), 152 deletions(-)
Range-diff against v4: 1: 75a20d2e27f2 ! 1: 6adc166ee141 KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg @@ arch/s390/kvm/kvm-s390.c: static int kvm_s390_vm_mem_op(struct kvm *kvm, struct + */ + if (mop->size > sizeof(new)) + return -EINVAL; ++ if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE) ++ return -EINVAL; + if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size)) + return -EFAULT; + if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size)) 2: 5c5ad96a4c81 ! 2: fce9a063ab70 Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG @@ Commit message checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com + Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com
## Documentation/virt/kvm/api.rst ## @@ Documentation/virt/kvm/api.rst: The fields in each entry are defined as follows: @@ Documentation/virt/kvm/api.rst: The fields in each entry are defined as follows: :Returns: = 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM), - > 0 if an exception occurred while walking the page tables -+ 16 bit program exception code if the access causes such an exception -+ other code > 0xffff with special meaning ++ 16 bit program exception code if the access causes such an exception, ++ other code > 0xffff with special meaning.
Read or write data from/to the VM's memory. The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is 3: 9cbcb313d91d = 3: 94c1165ae24a KVM: s390: selftest: memop: Pass mop_desc via pointer 4: 21d98b9deaae ! 4: 027c87eee0ac KVM: s390: selftest: memop: Replace macros by functions @@ Commit message need the exta flexibility.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com + Reviewed-by: Thomas Huth thuth@redhat.com
## tools/testing/selftests/kvm/s390x/memop.c ## @@ tools/testing/selftests/kvm/s390x/memop.c: struct mop_desc { 5: 866fcd7fbc97 ! 5: 16ac410ecc0f KVM: s390: selftest: memop: Move testlist into main @@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void) { int extension_cap, idx;
-+ setbuf(stdout, NULL); /* Tell stdout not to buffer its content */ TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_MEM_OP)); + extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION);
-- setbuf(stdout, NULL); /* Tell stdout not to buffer its content */ +- ksft_print_header(); + struct testdef { + const char *name; + void (*test)(void); @@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void) + }, + };
- ksft_print_header(); -- ++ ksft_print_header(); ksft_set_plan(ARRAY_SIZE(testlist));
- extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION); @@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void) - ksft_test_result_skip("%s - extension level %d not supported\n", - testlist[idx].name, - testlist[idx].extension); -+ ksft_test_result_skip("%s - requirements not met (kernel has extension cap %#x\n)", ++ ksft_test_result_skip("%s - requirements not met (kernel has extension cap %#x)\n", + testlist[idx].name, extension_cap); } } 6: c3e473677786 ! 6: 214281b6eb96 KVM: s390: selftest: memop: Add cmpxchg tests @@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void) + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR((void *)~0xfffUL), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv > 0, "ioctl allows bad guest address for cmpxchg"); ++ rv = ERR_MOP(t.vm, ABSOLUTE, READ, mem1, i, GADDR_V(mem1), ++ CMPXCHG_OLD(&old)); ++ TEST_ASSERT(rv == -1 && errno == EINVAL, ++ "ioctl allows read cmpxchg call"); + } + for (i = 2; i <= 16; i *= 2) { + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1 + 1), 7: 90288760656e = 7: 2d6776733e64 KVM: s390: selftest: memop: Add bad address test 8: e3d4b9b2ba61 = 8: 8c49eafd2881 KVM: s390: selftest: memop: Fix typo 9: 13fedd6e3d9e = 9: 0af907110b34 KVM: s390: selftest: memop: Fix wrong address being used in test -: ------------ > 10: 886c80b2bdce KVM: s390: selftest: memop: Fix integer literal
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com --- include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */ + __u8 pad1[6]; /* ignored */ + __u64 old_addr; /* ignored if flag unset */ }; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */ @@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */ +#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2 +/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16)
/* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode);
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, + __uint128_t *old, __uint128_t new, u8 access_key); + /** * write_guest_with_key - copy data from kernel space to guest space * @vcpu: virtual cpu diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; }
+/** + * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address. + * @kvm: Virtual machine instance. + * @gpa: Absolute guest address of the location to be changed. + * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a + * non power of two will result in failure. + * @old_addr: Pointer to old value. If the location at @gpa contains this value, the + * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old + * contains the value at @gpa before the attempt to exchange the value. + * @new: The value to place at @gpa. + * @access_key: The access key to use for the guest access. + * + * Atomically exchange the value at @gpa by @new, if it contains *@old. + * Honors storage keys. + * + * Return: * 0: successful exchange + * * 1: exchange unsuccessful + * * a program interruption code indicating the reason cmpxchg could + * not be attempted + * * -EINVAL: address misaligned or len not power of two + * * -EAGAIN: transient failure (len 1 or 2) + * * -EOPNOTSUPP: read-only memslot (should never occur) + */ +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, + __uint128_t *old_addr, __uint128_t new, + u8 access_key) +{ + gfn_t gfn = gpa >> PAGE_SHIFT; + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); + bool writable; + hva_t hva; + int ret; + + if (!IS_ALIGNED(gpa, len)) + return -EINVAL; + + hva = gfn_to_hva_memslot_prot(slot, gfn, &writable); + if (kvm_is_error_hva(hva)) + return PGM_ADDRESSING; + /* + * Check if it's a read-only memslot, even though that cannot occur + * since those are unsupported. + * Don't try to actually handle that case. + */ + if (!writable) + return -EOPNOTSUPP; + + hva += offset_in_page(gpa); + switch (len) { + case 1: { + u8 old; + + ret = cmpxchg_user_key((u8 *)hva, &old, *old_addr, new, access_key); + ret = ret < 0 ? ret : old != *old_addr; + *old_addr = old; + break; + } + case 2: { + u16 old; + + ret = cmpxchg_user_key((u16 *)hva, &old, *old_addr, new, access_key); + ret = ret < 0 ? ret : old != *old_addr; + *old_addr = old; + break; + } + case 4: { + u32 old; + + ret = cmpxchg_user_key((u32 *)hva, &old, *old_addr, new, access_key); + ret = ret < 0 ? ret : old != *old_addr; + *old_addr = old; + break; + } + case 8: { + u64 old; + + ret = cmpxchg_user_key((u64 *)hva, &old, *old_addr, new, access_key); + ret = ret < 0 ? ret : old != *old_addr; + *old_addr = old; + break; + } + case 16: { + __uint128_t old; + + ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_addr, new, access_key); + ret = ret < 0 ? ret : old != *old_addr; + *old_addr = old; + break; + } + default: + return -EINVAL; + } + mark_page_dirty_in_slot(kvm, slot, gfn); + /* + * Assume that the fault is caused by protection, either key protection + * or user page write protection. + */ + if (ret == -EFAULT) + ret = PGM_PROTECTION; + return ret; +} + /** * guest_translate_address_with_key - translate guest logical into guest absolute address * @vcpu: virtual cpu diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e4890e04b210..56f4f6ddd5bb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -584,7 +584,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_VCPU_RESETS: case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_S390_DIAG318: - case KVM_CAP_S390_MEM_OP_EXTENSION: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: @@ -598,6 +597,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_MEM_OP: r = MEM_OP_MAX_SIZE; break; + case KVM_CAP_S390_MEM_OP_EXTENSION: + /* + * Flag bits indicating which extensions are supported. + * The first extension doesn't use a flag, but pretend it does, + * this way that can be changed in the future. + */ + r = KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG | 1; + break; case KVM_CAP_NR_VCPUS: case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID: @@ -2772,12 +2779,19 @@ static bool access_key_invalid(u8 access_key) static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) { void __user *uaddr = (void __user *)mop->buf; + void __user *old_addr = (void __user *)mop->old_addr; + union { + __uint128_t quad; + char raw[sizeof(__uint128_t)]; + } old = { .quad = 0}, new = { .quad = 0 }; + unsigned int off_in_quad = sizeof(new) - mop->size; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx;
supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION - | KVM_S390_MEMOP_F_CHECK_ONLY; + | KVM_S390_MEMOP_F_CHECK_ONLY + | KVM_S390_MEMOP_F_CMPXCHG; if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE) @@ -2799,6 +2813,21 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; } + if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) { + /* + * This validates off_in_quad. Checking that size is a power + * of two is not necessary, as cmpxchg_guest_abs_with_key + * takes care of that + */ + if (mop->size > sizeof(new)) + return -EINVAL; + if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE) + return -EINVAL; + if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size)) + return -EFAULT; + if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size)) + return -EFAULT; + } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf) @@ -2829,6 +2858,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key); + } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) { + r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, + &old.quad, new.quad, mop->key); + if (r == 1) { + r = KVM_S390_MEMOP_R_NO_XCHG; + if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size)) + r = -EFAULT; + } } else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT;
On 10/01/2023 21.26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
}; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */__u64 old_addr; /* ignored if flag unset */
@@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */ +#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2 +/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16) /* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old, __uint128_t new, u8 access_key);
- /**
- write_guest_with_key - copy data from kernel space to guest space
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; } +/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
PGM_OPERATION has also the value 1 ... can we be sure that it never happens here? ... maybe it would make sense to use KVM_S390_MEMOP_R_NO_XCHG for return value here instead of 1, too, just to be on the safe side?
Apart from that, patch looks fine to me.
Thomas
* -EINVAL: address misaligned or len not power of two
* -EAGAIN: transient failure (len 1 or 2)
* -EOPNOTSUPP: read-only memslot (should never occur)
- */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old_addr, __uint128_t new,
u8 access_key)
+{
- gfn_t gfn = gpa >> PAGE_SHIFT;
- struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
- bool writable;
- hva_t hva;
- int ret;
- if (!IS_ALIGNED(gpa, len))
return -EINVAL;
- hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
- if (kvm_is_error_hva(hva))
return PGM_ADDRESSING;
- /*
* Check if it's a read-only memslot, even though that cannot occur
* since those are unsupported.
* Don't try to actually handle that case.
*/
- if (!writable)
return -EOPNOTSUPP;
- hva += offset_in_page(gpa);
- switch (len) {
- case 1: {
u8 old;
ret = cmpxchg_user_key((u8 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 2: {
u16 old;
ret = cmpxchg_user_key((u16 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 4: {
u32 old;
ret = cmpxchg_user_key((u32 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 8: {
u64 old;
ret = cmpxchg_user_key((u64 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 16: {
__uint128_t old;
ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- default:
return -EINVAL;
- }
- mark_page_dirty_in_slot(kvm, slot, gfn);
- /*
* Assume that the fault is caused by protection, either key protection
* or user page write protection.
*/
- if (ret == -EFAULT)
ret = PGM_PROTECTION;
- return ret;
+}
- /**
- guest_translate_address_with_key - translate guest logical into guest absolute address
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e4890e04b210..56f4f6ddd5bb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -584,7 +584,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_VCPU_RESETS: case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_S390_DIAG318:
- case KVM_CAP_S390_MEM_OP_EXTENSION: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2:
@@ -598,6 +597,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_MEM_OP: r = MEM_OP_MAX_SIZE; break;
- case KVM_CAP_S390_MEM_OP_EXTENSION:
/*
* Flag bits indicating which extensions are supported.
* The first extension doesn't use a flag, but pretend it does,
* this way that can be changed in the future.
*/
r = KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG | 1;
case KVM_CAP_NR_VCPUS: case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID:break;
@@ -2772,12 +2779,19 @@ static bool access_key_invalid(u8 access_key) static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) { void __user *uaddr = (void __user *)mop->buf;
- void __user *old_addr = (void __user *)mop->old_addr;
- union {
__uint128_t quad;
char raw[sizeof(__uint128_t)];
- } old = { .quad = 0}, new = { .quad = 0 };
- unsigned int off_in_quad = sizeof(new) - mop->size; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx;
supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
| KVM_S390_MEMOP_F_CHECK_ONLY
if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE)| KVM_S390_MEMOP_F_CMPXCHG;
@@ -2799,6 +2813,21 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; }
- if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
/*
* This validates off_in_quad. Checking that size is a power
* of two is not necessary, as cmpxchg_guest_abs_with_key
* takes care of that
*/
if (mop->size > sizeof(new))
return -EINVAL;
if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE)
return -EINVAL;
if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
return -EFAULT;
if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
return -EFAULT;
- } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf)
@@ -2829,6 +2858,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
} else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
&old.quad, new.quad, mop->key);
if (r == 1) {
r = KVM_S390_MEMOP_R_NO_XCHG;
if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
r = -EFAULT;
} else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT;}
On Wed, 2023-01-11 at 08:59 +0100, Thomas Huth wrote:
On 10/01/2023 21.26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
[...]
+/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
PGM_OPERATION has also the value 1 ... can we be sure that it never happens here?
Currently yes, only program errors are those explicit in the code, PGM_ADDRESSING and PGM_PROTECTION.
... maybe it would make sense to use KVM_S390_MEMOP_R_NO_XCHG for return value here instead of 1, too, just to be on the safe side?
I didn't like that idea because I consider KVM_S390_MEMOP_R_NO_XCHG to be part of the KVM's api surface and cmpxchg_guest_abs_with_key is an internal function that shouldn't concern itself with that.
But being unclear on PGM_OPERATION is indeed ugly. Maybe I should just replace "a program interruption code ..." with the specific ones?
Apart from that, patch looks fine to me.
Thomas
* -EINVAL: address misaligned or len not power of two
* -EAGAIN: transient failure (len 1 or 2)
* -EOPNOTSUPP: read-only memslot (should never occur)
- */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old_addr, __uint128_t new,
u8 access_key)
+{
- gfn_t gfn = gpa >> PAGE_SHIFT;
- struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
- bool writable;
- hva_t hva;
- int ret;
- if (!IS_ALIGNED(gpa, len))
return -EINVAL;
- hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
- if (kvm_is_error_hva(hva))
return PGM_ADDRESSING;
- /*
* Check if it's a read-only memslot, even though that cannot occur
* since those are unsupported.
* Don't try to actually handle that case.
*/
- if (!writable)
return -EOPNOTSUPP;
- hva += offset_in_page(gpa);
- switch (len) {
- case 1: {
u8 old;
ret = cmpxchg_user_key((u8 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 2: {
u16 old;
ret = cmpxchg_user_key((u16 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 4: {
u32 old;
ret = cmpxchg_user_key((u32 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 8: {
u64 old;
ret = cmpxchg_user_key((u64 *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- case 16: {
__uint128_t old;
ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_addr, new, access_key);
ret = ret < 0 ? ret : old != *old_addr;
*old_addr = old;
break;
- }
- default:
return -EINVAL;
- }
- mark_page_dirty_in_slot(kvm, slot, gfn);
- /*
* Assume that the fault is caused by protection, either key protection
* or user page write protection.
*/
- if (ret == -EFAULT)
ret = PGM_PROTECTION;
- return ret;
+}
[...]
On 11/01/2023 11.00, Janis Schoetterl-Glausch wrote:
On Wed, 2023-01-11 at 08:59 +0100, Thomas Huth wrote:
On 10/01/2023 21.26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
[...]
+/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
PGM_OPERATION has also the value 1 ... can we be sure that it never happens here?
Currently yes, only program errors are those explicit in the code, PGM_ADDRESSING and PGM_PROTECTION.
... maybe it would make sense to use KVM_S390_MEMOP_R_NO_XCHG for return value here instead of 1, too, just to be on the safe side?
I didn't like that idea because I consider KVM_S390_MEMOP_R_NO_XCHG to be part of the KVM's api surface and cmpxchg_guest_abs_with_key is an internal function that shouldn't concern itself with that.
But being unclear on PGM_OPERATION is indeed ugly. Maybe I should just replace "a program interruption code ..." with the specific ones?
Yes, that would help to avoid this confusion. With such a change feel free to add: Reviewed-by: Thomas Huth thuth@redhat.com
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
}; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */__u64 old_addr; /* ignored if flag unset */
@@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */
Would that fit behind the bit shifts without getting into the "line too long" territory?
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2
\n please
+/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16) /* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old, __uint128_t new, u8 access_key);
- /**
- write_guest_with_key - copy data from kernel space to guest space
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; } +/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
1 Access related program interruption code indicating the reason
cmpxchg could not be attempted
< 1 Kernel / input data error codes
* -EINVAL: address misaligned or len not power of two
* -EAGAIN: transient failure (len 1 or 2)
* -EOPNOTSUPP: read-only memslot (should never occur)
Would PGM_PROTECTED also make sense here instead of EOPNOTSUPP?
- */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old_addr, __uint128_t new,
u8 access_key)
+{
- gfn_t gfn = gpa >> PAGE_SHIFT;
- struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
- bool writable;
- hva_t hva;
- int ret;
- if (!IS_ALIGNED(gpa, len))
return -EINVAL;
- hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
- if (kvm_is_error_hva(hva))
return PGM_ADDRESSING;
- /*
* Check if it's a read-only memslot, even though that cannot occur
* since those are unsupported.
* Don't try to actually handle that case.
*/
- if (!writable)
return -EOPNOTSUPP;
[...]
/**
- guest_translate_address_with_key - translate guest logical into guest absolute address
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e4890e04b210..56f4f6ddd5bb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -584,7 +584,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_VCPU_RESETS: case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_S390_DIAG318:
- case KVM_CAP_S390_MEM_OP_EXTENSION: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2:
@@ -598,6 +597,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_MEM_OP: r = MEM_OP_MAX_SIZE; break;
- case KVM_CAP_S390_MEM_OP_EXTENSION:
/*
* Flag bits indicating which extensions are supported.
* The first extension doesn't use a flag, but pretend it does,
* this way that can be changed in the future.
*/
r = KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG | 1;
case KVM_CAP_NR_VCPUS: case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID:break;
@@ -2772,12 +2779,19 @@ static bool access_key_invalid(u8 access_key) static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) { void __user *uaddr = (void __user *)mop->buf;
- void __user *old_addr = (void __user *)mop->old_addr;
- union {
__uint128_t quad;
char raw[sizeof(__uint128_t)];
- } old = { .quad = 0}, new = { .quad = 0 };
- unsigned int off_in_quad = sizeof(new) - mop->size; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx;
supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
| KVM_S390_MEMOP_F_CHECK_ONLY
if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE)| KVM_S390_MEMOP_F_CMPXCHG;
@@ -2799,6 +2813,21 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; }
- if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
/*
* This validates off_in_quad. Checking that size is a power
* of two is not necessary, as cmpxchg_guest_abs_with_key
* takes care of that
*/
if (mop->size > sizeof(new))
return -EINVAL;
!mop->size || mop->size > sizeof(new)
if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE)
return -EINVAL;
if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
return -EFAULT;
if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
return -EFAULT;
- } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf)
@@ -2829,6 +2858,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
} else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
&old.quad, new.quad, mop->key);
if (r == 1) {
r = KVM_S390_MEMOP_R_NO_XCHG;
Why don't we return KVM_S390_MEMOP_R_NO_XCHG from cmpxchg_guest_abs_with_key instead of aliasing 1 here?
if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
r = -EFAULT;
} else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT;}
On Wed, 2023-01-11 at 10:35 +0100, Janosch Frank wrote:
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
}; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */__u64 old_addr; /* ignored if flag unset */
@@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */
Would that fit behind the bit shifts without getting into the "line too long" territory?
Bit shifts or the next line?
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2
\n please
Not sure about all that, this is the way it looks right now:
/* types for kvm_s390_mem_op->op */ #define KVM_S390_MEMOP_LOGICAL_READ 0 #define KVM_S390_MEMOP_LOGICAL_WRITE 1 #define KVM_S390_MEMOP_SIDA_READ 2 #define KVM_S390_MEMOP_SIDA_WRITE 3 #define KVM_S390_MEMOP_ABSOLUTE_READ 4 #define KVM_S390_MEMOP_ABSOLUTE_WRITE 5 /* flags for kvm_s390_mem_op->flags */ #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) #define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) /* flags specifying extension support */ #define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2 /* Non program exception return codes (pgm codes are 16 bit) */ #define KVM_S390_MEMOP_R_NO_XCHG (1 << 16)
Seems consistent to me.
+/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16) /* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old, __uint128_t new, u8 access_key);
- /**
- write_guest_with_key - copy data from kernel space to guest space
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; } +/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
1 Access related program interruption code indicating the reason
cmpxchg could not be attempted
< 1 Kernel / input data error codes
I think I'll do it like I said in the email to Thomas, that way it's maximally explicit about the return values one might get.
* -EINVAL: address misaligned or len not power of two
* -EAGAIN: transient failure (len 1 or 2)
* -EOPNOTSUPP: read-only memslot (should never occur)
Would PGM_PROTECTED also make sense here instead of EOPNOTSUPP?
I don't think so, if you get EOPNOTSUPP there's a programming error somewhere that needs to be fixed. I wouldn't want to mix that with the totally fine case of a protection exception.
[...]
@@ -2772,12 +2779,19 @@ static bool access_key_invalid(u8 access_key) static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) { void __user *uaddr = (void __user *)mop->buf;
- void __user *old_addr = (void __user *)mop->old_addr;
- union {
__uint128_t quad;
char raw[sizeof(__uint128_t)];
- } old = { .quad = 0}, new = { .quad = 0 };
- unsigned int off_in_quad = sizeof(new) - mop->size; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx;
supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
| KVM_S390_MEMOP_F_CHECK_ONLY
if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE)| KVM_S390_MEMOP_F_CMPXCHG;
@@ -2799,6 +2813,21 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; }
- if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
/*
* This validates off_in_quad. Checking that size is a power
* of two is not necessary, as cmpxchg_guest_abs_with_key
* takes care of that
*/
if (mop->size > sizeof(new))
return -EINVAL;
!mop->size || mop->size > sizeof(new)
Not sure why that would be necessary, but I did write "Operand length of the cmpxchg, required: 1 <= len <= 16", so I'll trust my past self on that.
if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE)
return -EINVAL;
if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
return -EFAULT;
if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
return -EFAULT;
- } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf)
@@ -2829,6 +2858,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
} else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
&old.quad, new.quad, mop->key);
if (r == 1) {
r = KVM_S390_MEMOP_R_NO_XCHG;
Why don't we return KVM_S390_MEMOP_R_NO_XCHG from cmpxchg_guest_abs_with_key instead of aliasing 1 here?
I think it's a bit ugly, since cmpxchg_guest_abs_with_key is an internal function and not memop specific. I don't like returning a MEMOP API constant there.
if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
r = -EFAULT;
} else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT;}
On 1/11/23 16:19, Janis Schoetterl-Glausch wrote:
On Wed, 2023-01-11 at 10:35 +0100, Janosch Frank wrote:
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
__u64 old_addr; /* ignored if flag unset */
Which flag? Maybe that would be clearer with a cmpxchg_ prefix.
}; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */
@@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */
via KVM_CAP_S390_MEM_OP_EXTENSION
This is part of the CAP api but is nestled between the memop api. I'm not entirely happy about that.
Would that fit behind the bit shifts without getting into the "line too long" territory?
Bit shifts or the next line?
Seems like I didn't see that this is grouped to the next line so forget about my comment.
\n please
Not sure about all that, this is the way it looks right now:
/* types for kvm_s390_mem_op->op */ #define KVM_S390_MEMOP_LOGICAL_READ 0 #define KVM_S390_MEMOP_LOGICAL_WRITE 1 #define KVM_S390_MEMOP_SIDA_READ 2 #define KVM_S390_MEMOP_SIDA_WRITE 3 #define KVM_S390_MEMOP_ABSOLUTE_READ 4 #define KVM_S390_MEMOP_ABSOLUTE_WRITE 5
/* flags for kvm_s390_mem_op->flags */ #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) #define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
/* flags specifying extension support */
#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2
#define KVM_S390_MEMOP_EXTENSION_CAP_ABSOLUTE 1 << 0 #define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 1 << 1
Or maybe BASE instead of ABSOLUTE
/* Non program exception return codes (pgm codes are 16 bit) */ #define KVM_S390_MEMOP_R_NO_XCHG (1 << 16)
Seems consistent to me.
Consistent and hardly readable once you add two more "categories" of values.
+/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16) /* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old, __uint128_t new, u8 access_key);
- /**
- write_guest_with_key - copy data from kernel space to guest space
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; } +/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
1 Access related program interruption code indicating the reason
cmpxchg could not be attempted
< 1 Kernel / input data error codes
I think I'll do it like I said in the email to Thomas, that way it's maximally explicit about the return values one might get.
This shows me that we're overloading the return value way too much. Not just of this function but also of the memop with KVM_S390_MEMOP_R_NO_XCHG.
A return of < 0, 0, 1 and a passed int reference for the PGM codes that's updated if the rc is 1 would make this clearer.
Btw. could a user specify check + cmpxchange as the flags? Are we checking that and I've missed to see such a check?
I don't like that we throw in yet another feature as a flag thereby blowing up kvm_s390_vm_mem_op() and adding new branches to it. I'll need to think about that some more.
* -EINVAL: address misaligned or len not power of two
* -EAGAIN: transient failure (len 1 or 2)
* -EOPNOTSUPP: read-only memslot (should never occur)
Would PGM_PROTECTED also make sense here instead of EOPNOTSUPP?
I don't think so, if you get EOPNOTSUPP there's a programming error somewhere that needs to be fixed. I wouldn't want to mix that with the totally fine case of a protection exception.
[...]
@@ -2772,12 +2779,19 @@ static bool access_key_invalid(u8 access_key) static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) { void __user *uaddr = (void __user *)mop->buf;
- void __user *old_addr = (void __user *)mop->old_addr;
- union {
__uint128_t quad;
char raw[sizeof(__uint128_t)];
- } old = { .quad = 0}, new = { .quad = 0 };
- unsigned int off_in_quad = sizeof(new) - mop->size; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx; supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
| KVM_S390_MEMOP_F_CHECK_ONLY;
| KVM_S390_MEMOP_F_CHECK_ONLY
if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE)| KVM_S390_MEMOP_F_CMPXCHG;
@@ -2799,6 +2813,21 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; }
- if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
/*
* This validates off_in_quad. Checking that size is a power
* of two is not necessary, as cmpxchg_guest_abs_with_key
* takes care of that
*/
if (mop->size > sizeof(new))
return -EINVAL;
!mop->size || mop->size > sizeof(new)
Not sure why that would be necessary, but I did write "Operand length of the cmpxchg, required: 1 <= len <= 16", so I'll trust my past self on that.
It's already being checked right after the flags are specified so we're golden anyway.
if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE)
return -EINVAL;
if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
return -EFAULT;
if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
return -EFAULT;
- } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf)
@@ -2829,6 +2858,14 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
} else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
&old.quad, new.quad, mop->key);
if (r == 1) {
r = KVM_S390_MEMOP_R_NO_XCHG;
Why don't we return KVM_S390_MEMOP_R_NO_XCHG from cmpxchg_guest_abs_with_key instead of aliasing 1 here?
I think it's a bit ugly, since cmpxchg_guest_abs_with_key is an internal function and not memop specific. I don't like returning a MEMOP API constant there.
if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
r = -EFAULT;
} } else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT;
On Wed, 2023-01-11 at 18:26 +0100, Janosch Frank wrote:
On 1/11/23 16:19, Janis Schoetterl-Glausch wrote:
On Wed, 2023-01-11 at 10:35 +0100, Janosch Frank wrote:
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
__u64 old_addr; /* ignored if flag unset */
Which flag? Maybe that would be clearer with a cmpxchg_ prefix.
Yes.
}; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */
@@ -599,6 +601,11 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3) +/* flags specifying extension support */
via KVM_CAP_S390_MEM_OP_EXTENSION
This is part of the CAP api but is nestled between the memop api. I'm not entirely happy about that.
Yes, I wasn't sure where to put it.
Would that fit behind the bit shifts without getting into the "line too long" territory?
Bit shifts or the next line?
Seems like I didn't see that this is grouped to the next line so forget about my comment.
\n please
Not sure about all that, this is the way it looks right now:
/* types for kvm_s390_mem_op->op */ #define KVM_S390_MEMOP_LOGICAL_READ 0 #define KVM_S390_MEMOP_LOGICAL_WRITE 1 #define KVM_S390_MEMOP_SIDA_READ 2 #define KVM_S390_MEMOP_SIDA_WRITE 3 #define KVM_S390_MEMOP_ABSOLUTE_READ 4 #define KVM_S390_MEMOP_ABSOLUTE_WRITE 5
/* flags for kvm_s390_mem_op->flags */ #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) #define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
/* flags specifying extension support */
#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2
#define KVM_S390_MEMOP_EXTENSION_CAP_ABSOLUTE 1 << 0
Unfortunately, I designed this badly for the first memop extension, the absolute memop is supported if extension_cap > 0. But we can always also set bit 0 in that case.
#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 1 << 1
Or maybe BASE instead of ABSOLUTE
/* Non program exception return codes (pgm codes are 16 bit) */ #define KVM_S390_MEMOP_R_NO_XCHG (1 << 16)
Seems consistent to me.
Consistent and hardly readable once you add two more "categories" of values.
I'll add newlines then.
+/* Non program exception return codes (pgm codes are 16 bit) */ +#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16) /* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..92a3b9fb31ec 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,9 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode); +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
__uint128_t *old, __uint128_t new, u8 access_key);
- /**
- write_guest_with_key - copy data from kernel space to guest space
- @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..6165e761a637 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,108 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; } +/**
- cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
- @kvm: Virtual machine instance.
- @gpa: Absolute guest address of the location to be changed.
- @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
non power of two will result in failure.
- @old_addr: Pointer to old value. If the location at @gpa contains this value, the
exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
contains the value at @gpa before the attempt to exchange the value.
- @new: The value to place at @gpa.
- @access_key: The access key to use for the guest access.
- Atomically exchange the value at @gpa by @new, if it contains *@old.
- Honors storage keys.
- Return: * 0: successful exchange
* 1: exchange unsuccessful
* a program interruption code indicating the reason cmpxchg could
not be attempted
1 Access related program interruption code indicating the reason
cmpxchg could not be attempted
< 1 Kernel / input data error codes
I think I'll do it like I said in the email to Thomas, that way it's maximally explicit about the return values one might get.
This shows me that we're overloading the return value way too much. Not just of this function but also of the memop with KVM_S390_MEMOP_R_NO_XCHG.
A return of < 0, 0, 1 and a passed int reference for the PGM codes that's updated if the rc is 1 would make this clearer.
The return code is complicated, using effectively two return codes does cleanly separate the possible error types, but also means one has to check two return codes. I'm fine with that, but in that case it shouldn't be the PGM code that gets separated, but the success of the xchg. So <0 kernel error, >0 PGM code, like everywhere else and ==0 -> check *success.
Btw. could a user specify check + cmpxchange as the flags? Are we checking that and I've missed to see such a check?
Yes, what that does is basically to check if the cmpxchg args are valid. Then it checks if the target address is writable. Not much code necessary for that other than not doing the cmpxchg if check_only is set.
I don't like that we throw in yet another feature as a flag thereby blowing up kvm_s390_vm_mem_op() and adding new branches to it. I'll need to think about that some more.
[...]
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
__u64 old_addr; /* ignored if flag unset */
These 3 are only used for flag values >=4, no? They aren't used for all flag values but for specific ones, right?
On Wed, 2023-01-11 at 10:38 +0100, Janosch Frank wrote:
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
include/uapi/linux/kvm.h | 7 +++ arch/s390/kvm/gaccess.h | 3 ++ arch/s390/kvm/gaccess.c | 102 +++++++++++++++++++++++++++++++++++++++ arch/s390/kvm/kvm-s390.c | 41 +++++++++++++++- 4 files changed, 151 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 55155e262646..452f43c1cc34 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -583,6 +583,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */
__u8 pad1[6]; /* ignored */
__u64 old_addr; /* ignored if flag unset */
These 3 are only used for flag values >=4, no? They aren't used for all flag values but for specific ones, right?
key is used with the skey flag, old_addr with the cmpxchg flag, so yes only used with specific flags. ar is used for logical accesses.
Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for absolute vm write memops which allows user space to perform (storage key checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Claudio Imbrenda imbrenda@linux.ibm.com --- Documentation/virt/kvm/api.rst | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index deb494f759ed..be4bdcd2d489 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3728,7 +3728,8 @@ The fields in each entry are defined as follows: :Parameters: struct kvm_s390_mem_op (in) :Returns: = 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM), - > 0 if an exception occurred while walking the page tables + 16 bit program exception code if the access causes such an exception, + other code > 0xffff with special meaning.
Read or write data from/to the VM's memory. The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is @@ -3746,6 +3747,8 @@ Parameters are specified via the following structure:: struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */ + __u8 pad1[6]; /* ignored */ + __u64 old_addr; /* ignored if flag unset */ }; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */ @@ -3828,8 +3831,21 @@ Absolute accesses are permitted for non-protected guests only. Supported flags: * ``KVM_S390_MEMOP_F_CHECK_ONLY`` * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` + * ``KVM_S390_MEMOP_F_CMPXCHG`` + +The semantics of the flags common with logical accesses are as for logical +accesses. + +For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag is supported if +KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set. +In this case, instead of doing an unconditional write, the access occurs +only if the target location contains the value pointed to by "old_addr". +This is performed as an atomic cmpxchg with the length specified by the "size" +parameter. "size" must be a power of two up to and including 16. +If the exchange did not take place because the target value doesn't match the +old value, KVM_S390_MEMOP_R_NO_XCHG is returned. +In this case the value "old_addr" points to is replaced by the target value.
-The semantics of the flags are as for logical accesses.
SIDA read/write: ^^^^^^^^^^^^^^^^
The struct is quite large, so this seems nicer.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Thomas Huth thuth@redhat.com --- tools/testing/selftests/kvm/s390x/memop.c | 44 +++++++++++------------ 1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 3fd81e58f40c..9c05d1205114 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -48,53 +48,53 @@ struct mop_desc { uint8_t key; };
-static struct kvm_s390_mem_op ksmo_from_desc(struct mop_desc desc) +static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) { struct kvm_s390_mem_op ksmo = { - .gaddr = (uintptr_t)desc.gaddr, - .size = desc.size, - .buf = ((uintptr_t)desc.buf), + .gaddr = (uintptr_t)desc->gaddr, + .size = desc->size, + .buf = ((uintptr_t)desc->buf), .reserved = "ignored_ignored_ignored_ignored" };
- switch (desc.target) { + switch (desc->target) { case LOGICAL: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_LOGICAL_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_LOGICAL_WRITE; break; case SIDA: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_SIDA_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_SIDA_WRITE; break; case ABSOLUTE: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_ABSOLUTE_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_ABSOLUTE_WRITE; break; case INVALID: ksmo.op = -1; } - if (desc.f_check) + if (desc->f_check) ksmo.flags |= KVM_S390_MEMOP_F_CHECK_ONLY; - if (desc.f_inject) + if (desc->f_inject) ksmo.flags |= KVM_S390_MEMOP_F_INJECT_EXCEPTION; - if (desc._set_flags) - ksmo.flags = desc.set_flags; - if (desc.f_key) { + if (desc->_set_flags) + ksmo.flags = desc->set_flags; + if (desc->f_key) { ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; - ksmo.key = desc.key; + ksmo.key = desc->key; } - if (desc._ar) - ksmo.ar = desc.ar; + if (desc->_ar) + ksmo.ar = desc->ar; else ksmo.ar = 0; - if (desc._sida_offset) - ksmo.sida_offset = desc.sida_offset; + if (desc->_sida_offset) + ksmo.sida_offset = desc->sida_offset;
return ksmo; } @@ -183,7 +183,7 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) else \ __desc.gaddr = __desc.gaddr_v; \ } \ - __ksmo = ksmo_from_desc(__desc); \ + __ksmo = ksmo_from_desc(&__desc); \ print_memop(__info.vcpu, &__ksmo); \ err##memop_ioctl(__info, &__ksmo); \ })
Replace the DEFAULT_* test helpers by functions, as they don't need the exta flexibility.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Thomas Huth thuth@redhat.com --- tools/testing/selftests/kvm/s390x/memop.c | 82 +++++++++++------------ 1 file changed, 39 insertions(+), 43 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 9c05d1205114..df1c726294b2 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -48,6 +48,8 @@ struct mop_desc { uint8_t key; };
+const uint8_t NO_KEY = 0xff; + static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) { struct kvm_s390_mem_op ksmo = { @@ -85,7 +87,7 @@ static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) ksmo.flags |= KVM_S390_MEMOP_F_INJECT_EXCEPTION; if (desc->_set_flags) ksmo.flags = desc->set_flags; - if (desc->f_key) { + if (desc->f_key && desc->key != NO_KEY) { ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; ksmo.key = desc->key; } @@ -268,34 +270,28 @@ static void prepare_mem12(void) #define ASSERT_MEM_EQ(p1, p2, size) \ TEST_ASSERT(!memcmp(p1, p2, size), "Memory contents do not match!")
-#define DEFAULT_WRITE_READ(copy_cpu, mop_cpu, mop_target_p, size, ...) \ -({ \ - struct test_info __copy_cpu = (copy_cpu), __mop_cpu = (mop_cpu); \ - enum mop_target __target = (mop_target_p); \ - uint32_t __size = (size); \ - \ - prepare_mem12(); \ - CHECK_N_DO(MOP, __mop_cpu, __target, WRITE, mem1, __size, \ - GADDR_V(mem1), ##__VA_ARGS__); \ - HOST_SYNC(__copy_cpu, STAGE_COPIED); \ - CHECK_N_DO(MOP, __mop_cpu, __target, READ, mem2, __size, \ - GADDR_V(mem2), ##__VA_ARGS__); \ - ASSERT_MEM_EQ(mem1, mem2, __size); \ -}) +static void default_write_read(struct test_info copy_cpu, struct test_info mop_cpu, + enum mop_target mop_target, uint32_t size, uint8_t key) +{ + prepare_mem12(); + CHECK_N_DO(MOP, mop_cpu, mop_target, WRITE, mem1, size, + GADDR_V(mem1), KEY(key)); + HOST_SYNC(copy_cpu, STAGE_COPIED); + CHECK_N_DO(MOP, mop_cpu, mop_target, READ, mem2, size, + GADDR_V(mem2), KEY(key)); + ASSERT_MEM_EQ(mem1, mem2, size); +}
-#define DEFAULT_READ(copy_cpu, mop_cpu, mop_target_p, size, ...) \ -({ \ - struct test_info __copy_cpu = (copy_cpu), __mop_cpu = (mop_cpu); \ - enum mop_target __target = (mop_target_p); \ - uint32_t __size = (size); \ - \ - prepare_mem12(); \ - CHECK_N_DO(MOP, __mop_cpu, __target, WRITE, mem1, __size, \ - GADDR_V(mem1)); \ - HOST_SYNC(__copy_cpu, STAGE_COPIED); \ - CHECK_N_DO(MOP, __mop_cpu, __target, READ, mem2, __size, ##__VA_ARGS__);\ - ASSERT_MEM_EQ(mem1, mem2, __size); \ -}) +static void default_read(struct test_info copy_cpu, struct test_info mop_cpu, + enum mop_target mop_target, uint32_t size, uint8_t key) +{ + prepare_mem12(); + CHECK_N_DO(MOP, mop_cpu, mop_target, WRITE, mem1, size, GADDR_V(mem1)); + HOST_SYNC(copy_cpu, STAGE_COPIED); + CHECK_N_DO(MOP, mop_cpu, mop_target, READ, mem2, size, + GADDR_V(mem2), KEY(key)); + ASSERT_MEM_EQ(mem1, mem2, size); +}
static void guest_copy(void) { @@ -310,7 +306,7 @@ static void test_copy(void)
HOST_SYNC(t.vcpu, STAGE_INITED);
- DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, NO_KEY);
kvm_vm_free(t.kvm_vm); } @@ -357,26 +353,26 @@ static void test_copy_key(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vm, no key */ - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, NO_KEY);
/* vm/vcpu, machting key or key 0 */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(9)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size, KEY(9)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 0); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 9); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, 0); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, 9); /* * There used to be different code paths for key handling depending on * if the region crossed a page boundary. * There currently are not, but the more tests the merrier. */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, 1, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, 1, KEY(9)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, 1, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, 1, KEY(9)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, 1, 0); + default_write_read(t.vcpu, t.vcpu, LOGICAL, 1, 9); + default_write_read(t.vcpu, t.vm, ABSOLUTE, 1, 0); + default_write_read(t.vcpu, t.vm, ABSOLUTE, 1, 9);
/* vm/vcpu, mismatching keys on read, but no fetch protection */ - DEFAULT_READ(t.vcpu, t.vcpu, LOGICAL, t.size, GADDR_V(mem2), KEY(2)); - DEFAULT_READ(t.vcpu, t.vm, ABSOLUTE, t.size, GADDR_V(mem1), KEY(2)); + default_read(t.vcpu, t.vcpu, LOGICAL, t.size, 2); + default_read(t.vcpu, t.vm, ABSOLUTE, t.size, 2);
kvm_vm_free(t.kvm_vm); } @@ -409,7 +405,7 @@ static void test_copy_key_storage_prot_override(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vcpu, mismatching keys, storage protection override in effect */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(2)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 2);
kvm_vm_free(t.kvm_vm); } @@ -422,8 +418,8 @@ static void test_copy_key_fetch_prot(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vm/vcpu, matching key, fetch protection in effect */ - DEFAULT_READ(t.vcpu, t.vcpu, LOGICAL, t.size, GADDR_V(mem2), KEY(9)); - DEFAULT_READ(t.vcpu, t.vm, ABSOLUTE, t.size, GADDR_V(mem2), KEY(9)); + default_read(t.vcpu, t.vcpu, LOGICAL, t.size, 9); + default_read(t.vcpu, t.vm, ABSOLUTE, t.size, 9);
kvm_vm_free(t.kvm_vm); }
This allows checking if the necessary requirements for a test case are met via an arbitrary expression. In particular, it is easy to check if certain bits are set in the memop extension capability.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Thomas Huth thuth@redhat.com --- tools/testing/selftests/kvm/s390x/memop.c | 131 +++++++++++----------- 1 file changed, 66 insertions(+), 65 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index df1c726294b2..bbc191a13760 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -690,85 +690,86 @@ static void test_errors(void) kvm_vm_free(t.kvm_vm); }
-struct testdef { - const char *name; - void (*test)(void); - int extension; -} testlist[] = { - { - .name = "simple copy", - .test = test_copy, - }, - { - .name = "generic error checks", - .test = test_errors, - }, - { - .name = "copy with storage keys", - .test = test_copy_key, - .extension = 1, - }, - { - .name = "copy with key storage protection override", - .test = test_copy_key_storage_prot_override, - .extension = 1, - }, - { - .name = "copy with key fetch protection", - .test = test_copy_key_fetch_prot, - .extension = 1, - }, - { - .name = "copy with key fetch protection override", - .test = test_copy_key_fetch_prot_override, - .extension = 1, - }, - { - .name = "error checks with key", - .test = test_errors_key, - .extension = 1, - }, - { - .name = "termination", - .test = test_termination, - .extension = 1, - }, - { - .name = "error checks with key storage protection override", - .test = test_errors_key_storage_prot_override, - .extension = 1, - }, - { - .name = "error checks without key fetch prot override", - .test = test_errors_key_fetch_prot_override_not_enabled, - .extension = 1, - }, - { - .name = "error checks with key fetch prot override", - .test = test_errors_key_fetch_prot_override_enabled, - .extension = 1, - }, -};
int main(int argc, char *argv[]) { int extension_cap, idx;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_MEM_OP)); + extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION);
- ksft_print_header(); + struct testdef { + const char *name; + void (*test)(void); + bool requirements_met; + } testlist[] = { + { + .name = "simple copy", + .test = test_copy, + .requirements_met = true, + }, + { + .name = "generic error checks", + .test = test_errors, + .requirements_met = true, + }, + { + .name = "copy with storage keys", + .test = test_copy_key, + .requirements_met = extension_cap > 0, + }, + { + .name = "copy with key storage protection override", + .test = test_copy_key_storage_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "copy with key fetch protection", + .test = test_copy_key_fetch_prot, + .requirements_met = extension_cap > 0, + }, + { + .name = "copy with key fetch protection override", + .test = test_copy_key_fetch_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key", + .test = test_errors_key, + .requirements_met = extension_cap > 0, + }, + { + .name = "termination", + .test = test_termination, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key storage protection override", + .test = test_errors_key_storage_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks without key fetch prot override", + .test = test_errors_key_fetch_prot_override_not_enabled, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key fetch prot override", + .test = test_errors_key_fetch_prot_override_enabled, + .requirements_met = extension_cap > 0, + }, + };
+ ksft_print_header(); ksft_set_plan(ARRAY_SIZE(testlist));
- extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION); for (idx = 0; idx < ARRAY_SIZE(testlist); idx++) { - if (extension_cap >= testlist[idx].extension) { + if (testlist[idx].requirements_met) { testlist[idx].test(); ksft_test_result_pass("%s\n", testlist[idx].name); } else { - ksft_test_result_skip("%s - extension level %d not supported\n", - testlist[idx].name, - testlist[idx].extension); + ksft_test_result_skip("%s - requirements not met (kernel has extension cap %#x)\n", + testlist[idx].name, extension_cap); } }
Test successful exchange, unsuccessful exchange, storage key protection and invalid arguments.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com --- tools/testing/selftests/kvm/s390x/memop.c | 408 +++++++++++++++++++++- 1 file changed, 394 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index bbc191a13760..77cac3e502ec 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -9,6 +9,7 @@ #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> +#include <pthread.h>
#include <linux/bits.h>
@@ -44,6 +45,8 @@ struct mop_desc { enum mop_access_mode mode; void *buf; uint32_t sida_offset; + void *old; + bool *cmpxchg_success; uint8_t ar; uint8_t key; }; @@ -91,6 +94,10 @@ static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; ksmo.key = desc->key; } + if (desc->old) { + ksmo.flags |= KVM_S390_MEMOP_F_CMPXCHG; + ksmo.old_addr = (uint64_t)desc->old; + } if (desc->_ar) ksmo.ar = desc->ar; else @@ -136,35 +143,45 @@ static void print_memop(struct kvm_vcpu *vcpu, const struct kvm_s390_mem_op *ksm printf("ABSOLUTE, WRITE, "); break; } - printf("gaddr=%llu, size=%u, buf=%llu, ar=%u, key=%u", - ksmo->gaddr, ksmo->size, ksmo->buf, ksmo->ar, ksmo->key); + printf("gaddr=%llu, size=%u, buf=%llu, ar=%u, key=%u, old_addr=%llx", + ksmo->gaddr, ksmo->size, ksmo->buf, ksmo->ar, ksmo->key, + ksmo->old_addr); if (ksmo->flags & KVM_S390_MEMOP_F_CHECK_ONLY) printf(", CHECK_ONLY"); if (ksmo->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION) printf(", INJECT_EXCEPTION"); if (ksmo->flags & KVM_S390_MEMOP_F_SKEY_PROTECTION) printf(", SKEY_PROTECTION"); + if (ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG) + printf(", CMPXCHG"); puts(")"); }
-static void memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) +static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo, + struct mop_desc *desc) { struct kvm_vcpu *vcpu = info.vcpu;
if (!vcpu) - vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); + return __vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); else - vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); + return __vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); }
-static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) +static void memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo, + struct mop_desc *desc) { - struct kvm_vcpu *vcpu = info.vcpu; + int r; + + r = err_memop_ioctl(info, ksmo, desc); + if (ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG) { + if (desc->cmpxchg_success) + *desc->cmpxchg_success = !r; + if (r == KVM_S390_MEMOP_R_NO_XCHG) + r = 0; + } + TEST_ASSERT(!r, __KVM_IOCTL_ERROR("KVM_S390_MEM_OP", r));
- if (!vcpu) - return __vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); - else - return __vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); }
#define MEMOP(err, info_p, mop_target_p, access_mode_p, buf_p, size_p, ...) \ @@ -187,7 +204,7 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) } \ __ksmo = ksmo_from_desc(&__desc); \ print_memop(__info.vcpu, &__ksmo); \ - err##memop_ioctl(__info, &__ksmo); \ + err##memop_ioctl(__info, &__ksmo, &__desc); \ })
#define MOP(...) MEMOP(, __VA_ARGS__) @@ -201,6 +218,8 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) #define AR(a) ._ar = 1, .ar = (a) #define KEY(a) .f_key = 1, .key = (a) #define INJECT .f_inject = 1 +#define CMPXCHG_OLD(o) .old = (o) +#define CMPXCHG_SUCCESS(s) .cmpxchg_success = (s)
#define CHECK_N_DO(f, ...) ({ f(__VA_ARGS__, CHECK_ONLY); f(__VA_ARGS__); })
@@ -210,8 +229,8 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) #define CR0_FETCH_PROTECTION_OVERRIDE (1UL << (63 - 38)) #define CR0_STORAGE_PROTECTION_OVERRIDE (1UL << (63 - 39))
-static uint8_t mem1[65536]; -static uint8_t mem2[65536]; +static uint8_t __aligned(PAGE_SIZE) mem1[65536]; +static uint8_t __aligned(PAGE_SIZE) mem2[65536];
struct test_default { struct kvm_vm *kvm_vm; @@ -243,6 +262,8 @@ enum stage { STAGE_SKEYS_SET, /* Guest copied memory (locations up to test case) */ STAGE_COPIED, + /* End of guest code reached */ + STAGE_DONE, };
#define HOST_SYNC(info_p, stage) \ @@ -254,6 +275,9 @@ enum stage { \ vcpu_run(__vcpu); \ get_ucall(__vcpu, &uc); \ + if (uc.cmd == UCALL_ABORT) { \ + REPORT_GUEST_ASSERT_2(uc, "hints: %lu, %lu"); \ + } \ ASSERT_EQ(uc.cmd, UCALL_SYNC); \ ASSERT_EQ(uc.args[1], __stage); \ }) \ @@ -293,6 +317,44 @@ static void default_read(struct test_info copy_cpu, struct test_info mop_cpu, ASSERT_MEM_EQ(mem1, mem2, size); }
+static void default_cmpxchg(struct test_default *test, uint8_t key) +{ + for (int size = 1; size <= 16; size *= 2) { + for (int offset = 0; offset < 16; offset += size) { + uint8_t __aligned(16) new[16] = {}; + uint8_t __aligned(16) old[16]; + bool succ; + + prepare_mem12(); + default_write_read(test->vcpu, test->vcpu, LOGICAL, 16, NO_KEY); + + memcpy(&old, mem1, 16); + CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset, + size, GADDR_V(mem1 + offset), + CMPXCHG_OLD(old + offset), + CMPXCHG_SUCCESS(&succ), KEY(key)); + HOST_SYNC(test->vcpu, STAGE_COPIED); + MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2)); + TEST_ASSERT(succ, "exchange of values should succeed"); + memcpy(mem1 + offset, new + offset, size); + ASSERT_MEM_EQ(mem1, mem2, 16); + + memcpy(&old, mem1, 16); + new[offset]++; + old[offset]++; + CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset, + size, GADDR_V(mem1 + offset), + CMPXCHG_OLD(old + offset), + CMPXCHG_SUCCESS(&succ), KEY(key)); + HOST_SYNC(test->vcpu, STAGE_COPIED); + MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2)); + TEST_ASSERT(!succ, "exchange of values should not succeed"); + ASSERT_MEM_EQ(mem1, mem2, 16); + ASSERT_MEM_EQ(&old, mem1, 16); + } + } +} + static void guest_copy(void) { GUEST_SYNC(STAGE_INITED); @@ -377,6 +439,250 @@ static void test_copy_key(void) kvm_vm_free(t.kvm_vm); }
+static void test_cmpxchg_key(void) +{ + struct test_default t = test_default_init(guest_copy_key); + + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + + default_cmpxchg(&t, NO_KEY); + default_cmpxchg(&t, 0); + default_cmpxchg(&t, 9); + + kvm_vm_free(t.kvm_vm); +} + +static __uint128_t cut_to_size(int size, __uint128_t val) +{ + switch (size) { + case 1: + return (uint8_t)val; + case 2: + return (uint16_t)val; + case 4: + return (uint32_t)val; + case 8: + return (uint64_t)val; + case 16: + return val; + } + GUEST_ASSERT_1(false, "Invalid size"); + return 0; +} + +static bool popcount_eq(__uint128_t a, __uint128_t b) +{ + unsigned int count_a, count_b; + + count_a = __builtin_popcountl((uint64_t)(a >> 64)) + + __builtin_popcountl((uint64_t)a); + count_b = __builtin_popcountl((uint64_t)(b >> 64)) + + __builtin_popcountl((uint64_t)b); + return count_a == count_b; +} + +static __uint128_t rotate(int size, __uint128_t val, int amount) +{ + unsigned int bits = size * 8; + + amount = (amount + bits) % bits; + val = cut_to_size(size, val); + return (val << (bits - amount)) | (val >> amount); +} + +const unsigned int max_block = 16; + +static void choose_block(bool guest, int i, int *size, int *offset) +{ + unsigned int rand; + + rand = i; + if (guest) { + rand = rand * 19 + 11; + *size = 1 << ((rand % 3) + 2); + rand = rand * 19 + 11; + *offset = (rand % max_block) & ~(*size - 1); + } else { + rand = rand * 17 + 5; + *size = 1 << (rand % 5); + rand = rand * 17 + 5; + *offset = (rand % max_block) & ~(*size - 1); + } +} + +static __uint128_t permutate_bits(bool guest, int i, int size, __uint128_t old) +{ + unsigned int rand; + bool swap; + + rand = i; + rand = rand * 3 + 1; + if (guest) + rand = rand * 3 + 1; + swap = rand % 2 == 0; + if (swap) { + int i, j; + __uint128_t new; + uint8_t byte0, byte1; + + rand = rand * 3 + 1; + i = rand % size; + rand = rand * 3 + 1; + j = rand % size; + if (i == j) + return old; + new = rotate(16, old, i * 8); + byte0 = new & 0xff; + new &= ~0xff; + new = rotate(16, new, -i * 8); + new = rotate(16, new, j * 8); + byte1 = new & 0xff; + new = (new & ~0xff) | byte0; + new = rotate(16, new, -j * 8); + new = rotate(16, new, i * 8); + new = new | byte1; + new = rotate(16, new, -i * 8); + return new; + } else { + int amount; + + rand = rand * 3 + 1; + amount = rand % (size * 8); + return rotate(size, old, amount); + } +} + +static bool _cmpxchg(int size, void *target, __uint128_t *old_addr, __uint128_t new) +{ + bool ret; + + switch (size) { + case 4: { + uint32_t old = *old_addr; + + asm volatile ("cs %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(uint32_t *)(target)) + : [new] "d" ((uint32_t)new) + : "cc" + ); + ret = old == (uint32_t)*old_addr; + *old_addr = old; + return ret; + } + case 8: { + uint64_t old = *old_addr; + + asm volatile ("csg %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(uint64_t *)(target)) + : [new] "d" ((uint64_t)new) + : "cc" + ); + ret = old == (uint64_t)*old_addr; + *old_addr = old; + return ret; + } + case 16: { + __uint128_t old = *old_addr; + + asm volatile ("cdsg %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(__uint128_t *)(target)) + : [new] "d" (new) + : "cc" + ); + ret = old == *old_addr; + *old_addr = old; + return ret; + } + } + GUEST_ASSERT_1(false, "Invalid size"); + return 0; +} + +const unsigned int cmpxchg_iter_outer = 100, cmpxchg_iter_inner = 10000; + +static void guest_cmpxchg_key(void) +{ + int size, offset; + __uint128_t old, new; + + set_storage_key_range(mem1, max_block, 0x10); + set_storage_key_range(mem2, max_block, 0x10); + GUEST_SYNC(STAGE_SKEYS_SET); + + for (int i = 0; i < cmpxchg_iter_outer; i++) { + do { + old = 1; + } while (!_cmpxchg(16, mem1, &old, 0)); + for (int j = 0; j < cmpxchg_iter_inner; j++) { + choose_block(true, i + j, &size, &offset); + do { + new = permutate_bits(true, i + j, size, old); + } while (!_cmpxchg(size, mem2 + offset, &old, new)); + } + } + + GUEST_SYNC(STAGE_DONE); +} + +static void *run_guest(void *data) +{ + struct test_info *info = data; + + HOST_SYNC(*info, STAGE_DONE); + return NULL; +} + +static char *quad_to_char(__uint128_t *quad, int size) +{ + return ((char *)quad) + (sizeof(*quad) - size); +} + +static void test_cmpxchg_key_concurrent(void) +{ + struct test_default t = test_default_init(guest_cmpxchg_key); + int size, offset; + __uint128_t old, new; + bool success; + pthread_t thread; + + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + prepare_mem12(); + MOP(t.vcpu, LOGICAL, WRITE, mem1, max_block, GADDR_V(mem2)); + pthread_create(&thread, NULL, run_guest, &t.vcpu); + + for (int i = 0; i < cmpxchg_iter_outer; i++) { + do { + old = 0; + new = 1; + MOP(t.vm, ABSOLUTE, WRITE, &new, + sizeof(new), GADDR_V(mem1), + CMPXCHG_OLD(&old), + CMPXCHG_SUCCESS(&success), KEY(1)); + } while (!success); + for (int j = 0; j < cmpxchg_iter_inner; j++) { + choose_block(false, i + j, &size, &offset); + do { + new = permutate_bits(false, i + j, size, old); + MOP(t.vm, ABSOLUTE, WRITE, quad_to_char(&new, size), + size, GADDR_V(mem2 + offset), + CMPXCHG_OLD(quad_to_char(&old, size)), + CMPXCHG_SUCCESS(&success), KEY(1)); + } while (!success); + } + } + + pthread_join(thread, NULL); + + MOP(t.vcpu, LOGICAL, READ, mem2, max_block, GADDR_V(mem2)); + TEST_ASSERT(popcount_eq(*(__uint128_t *)mem1, *(__uint128_t *)mem2), + "Must retain number of set bits"); + + kvm_vm_free(t.kvm_vm); +} + static void guest_copy_key_fetch_prot(void) { /* @@ -457,6 +763,24 @@ static void test_errors_key(void) kvm_vm_free(t.kvm_vm); }
+static void test_errors_cmpxchg_key(void) +{ + struct test_default t = test_default_init(guest_copy_key_fetch_prot); + int i; + + HOST_SYNC(t.vcpu, STAGE_INITED); + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + + for (i = 1; i <= 16; i *= 2) { + __uint128_t old = 0; + + CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem2, i, GADDR_V(mem2), + CMPXCHG_OLD(&old), KEY(2)); + } + + kvm_vm_free(t.kvm_vm); +} + static void test_termination(void) { struct test_default t = test_default_init(guest_error_key); @@ -690,6 +1014,42 @@ static void test_errors(void) kvm_vm_free(t.kvm_vm); }
+static void test_errors_cmpxchg(void) +{ + struct test_default t = test_default_init(guest_idle); + __uint128_t old; + int rv, i, power = 1; + + HOST_SYNC(t.vcpu, STAGE_INITED); + + for (i = 0; i < 32; i++) { + if (i == power) { + power *= 2; + continue; + } + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv == -1 && errno == EINVAL, + "ioctl allows bad size for cmpxchg"); + } + for (i = 1; i <= 16; i *= 2) { + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR((void *)~0xfffUL), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv > 0, "ioctl allows bad guest address for cmpxchg"); + rv = ERR_MOP(t.vm, ABSOLUTE, READ, mem1, i, GADDR_V(mem1), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv == -1 && errno == EINVAL, + "ioctl allows read cmpxchg call"); + } + for (i = 2; i <= 16; i *= 2) { + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1 + 1), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv == -1 && errno == EINVAL, + "ioctl allows bad alignment for cmpxchg"); + } + + kvm_vm_free(t.kvm_vm); +}
int main(int argc, char *argv[]) { @@ -718,6 +1078,16 @@ int main(int argc, char *argv[]) .test = test_copy_key, .requirements_met = extension_cap > 0, }, + { + .name = "cmpxchg with storage keys", + .test = test_cmpxchg_key, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "concurrently cmpxchg with storage keys", + .test = test_cmpxchg_key_concurrent, + .requirements_met = extension_cap & 0x2, + }, { .name = "copy with key storage protection override", .test = test_copy_key_storage_prot_override, @@ -738,6 +1108,16 @@ int main(int argc, char *argv[]) .test = test_errors_key, .requirements_met = extension_cap > 0, }, + { + .name = "error checks for cmpxchg with key", + .test = test_errors_cmpxchg_key, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "error checks for cmpxchg", + .test = test_errors_cmpxchg, + .requirements_met = extension_cap & 0x2, + }, { .name = "termination", .test = test_termination,
Add test that tries to access, instead of CHECK_ONLY.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Nico Boehr nrb@linux.ibm.com --- tools/testing/selftests/kvm/s390x/memop.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 77cac3e502ec..5d5096fad161 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -965,7 +965,9 @@ static void _test_errors_common(struct test_info info, enum mop_target target, i
/* Bad guest address: */ rv = ERR_MOP(info, target, WRITE, mem1, size, GADDR((void *)~0xfffUL), CHECK_ONLY); - TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory access"); + TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory address"); + rv = ERR_MOP(info, target, WRITE, mem1, size, GADDR((void *)~0xfffUL)); + TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory address");
/* Bad host address: */ rv = ERR_MOP(info, target, WRITE, 0, size, GADDR_V(mem1));
"acceeded" isn't a word, should be "exceeded".
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Thomas Huth thuth@redhat.com Reviewed-by: Nico Boehr nrb@linux.ibm.com --- tools/testing/selftests/kvm/s390x/memop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 5d5096fad161..238a3f20bcc1 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -926,7 +926,7 @@ static void test_errors_key_fetch_prot_override_enabled(void)
/* * vcpu, mismatching keys on fetch, - * fetch protection override does not apply because memory range acceeded + * fetch protection override does not apply because memory range exceeded */ CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, 2048 + 1, GADDR_V(0), KEY(2)); CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, PAGE_SIZE + 2048 + 1,
The guest code sets the key for mem1 only. In order to provoke a protection exception the test codes needs to address mem1.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Reviewed-by: Nico Boehr nrb@linux.ibm.com --- tools/testing/selftests/kvm/s390x/memop.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 238a3f20bcc1..03f74c6c9fee 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -756,9 +756,9 @@ static void test_errors_key(void)
/* vm/vcpu, mismatching keys, fetch protection in effect */ CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, WRITE, mem1, t.size, GADDR_V(mem1), KEY(2)); - CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, t.size, GADDR_V(mem2), KEY(2)); + CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, t.size, GADDR_V(mem1), KEY(2)); CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem1, t.size, GADDR_V(mem1), KEY(2)); - CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, READ, mem2, t.size, GADDR_V(mem2), KEY(2)); + CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, READ, mem2, t.size, GADDR_V(mem1), KEY(2));
kvm_vm_free(t.kvm_vm); }
The address is a 64 bit value, specifying a 32 bit value can crash the guest. In this case things worked out with -O2 but not -O0.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com Fixes: 1bb873495a9e ("KVM: s390: selftests: Add more copy memop tests") --- tools/testing/selftests/kvm/s390x/memop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 03f74c6c9fee..2173f7bca601 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -838,7 +838,7 @@ static void guest_copy_key_fetch_prot_override(void) GUEST_SYNC(STAGE_INITED); set_storage_key_range(0, PAGE_SIZE, 0x18); set_storage_key_range((void *)last_page_addr, PAGE_SIZE, 0x0); - asm volatile ("sske %[key],%[addr]\n" :: [addr] "r"(0), [key] "r"(0x18) : "cc"); + asm volatile ("sske %[key],%[addr]\n" :: [addr] "r"(0L), [key] "r"(0x18) : "cc"); GUEST_SYNC(STAGE_SKEYS_SET);
for (;;) {
On 1/10/23 21:26, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be use, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Also contains some fixes/changes for the memop selftest independent of the cmpxchg changes.
Since the selftest fixes seem to apply and run without the new code I'm considering splitting them off entirely.
Most of them have reviews already and they are lower risk anyway so we could add them to devel rather soonish.
linux-kselftest-mirror@lists.linaro.org