User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest; however, it has no way of performing atomic, key checked accesses to the guest. Extend the MEM_OP ioctl to allow for this by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
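As a rough illustration of that use case, user space could set such an indicator with a retry loop like the one below; read_guest_u64() and cmpxchg_guest_u64() are hypothetical wrappers around the MEM_OP ioctl, not functions from this series:

```c
#include <stdint.h>

/* Assumed wrappers around KVM_S390_MEM_OP (hypothetical): a plain
 * absolute read, and the new cmpxchg mode returning 0 on exchange,
 * 1 on mismatch (updating *old with the value actually found). */
uint64_t read_guest_u64(int vm_fd, uint64_t gaddr);
int cmpxchg_guest_u64(int vm_fd, uint64_t gaddr, uint64_t *old, uint64_t new);

/* Atomically set an indicator bit in guest storage. */
static void set_indicator_bit(int vm_fd, uint64_t gaddr, uint64_t bit)
{
	uint64_t old = read_guest_u64(vm_fd, gaddr);

	/* retry as long as another writer changes the doubleword */
	while (cmpxchg_guest_u64(vm_fd, gaddr, &old, old | bit) == 1)
		;
}
```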
Janis Schoetterl-Glausch (9):
  s390/uaccess: Add storage key checked cmpxchg access to user space
  KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
  Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
  KVM: s390: selftest: memop: Pass mop_desc via pointer
  KVM: s390: selftest: memop: Replace macros by functions
  KVM: s390: selftest: memop: Add bad address test
  KVM: s390: selftest: memop: Add cmpxchg tests
  KVM: s390: selftest: memop: Fix typo
  KVM: s390: selftest: memop: Fix wrong address being used in test
 Documentation/virt/kvm/api.rst            |  18 +-
 include/uapi/linux/kvm.h                  |   5 +
 arch/s390/include/asm/uaccess.h           | 187 ++++
 arch/s390/kvm/gaccess.h                   |   4 +
 arch/s390/kvm/gaccess.c                   |  56 ++
 arch/s390/kvm/kvm-s390.c                  |  50 +-
 tools/testing/selftests/kvm/s390x/memop.c | 704 +++++++++++++++++-----
 7 files changed, 874 insertions(+), 150 deletions(-)
base-commit: f76349cf41451c5c42a99f18a9163377e4b364ff
Add cmpxchg functionality similar to that in cmpxchg.h except that the target is a user space address and that the address' storage key is matched with the access_key argument in order to honor key-controlled protection. The access is performed by changing to the secondary-spaces mode and setting the PSW key for the duration of the compare and swap.
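For orientation, key-controlled protection reduces to roughly the following checks; this is a simplified model of the architecture for illustration only (storage-protection-override and fetch-protection-override are ignored, just as this function ignores them):

```c
#include <stdbool.h>

/* Simplified model: an access key of 0 matches everything; stores
 * always require a key match, fetches only if the storage key's
 * fetch-protection bit is set. */
static bool store_permitted(unsigned char access_key, unsigned char acc)
{
	return access_key == 0 || access_key == acc;
}

static bool fetch_permitted(unsigned char access_key, unsigned char acc,
			    bool fetch_protection)
{
	return access_key == 0 || access_key == acc || !fetch_protection;
}
```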
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
Possible variations:
 * check the assumptions made in cmpxchg_user_key_size and error out
 * call functions called by copy_to_user
   * access_ok? is a nop
   * should_fail_usercopy?
   * instrument_copy_to_user? doesn't make sense IMO
 * don't be overly strict in cmpxchg_user_key
 arch/s390/include/asm/uaccess.h | 187 ++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index f7038b800cc3..0ce90b7e2b75 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -19,6 +19,8 @@
 #include <asm/extable.h>
 #include <asm/facility.h>
 #include <asm-generic/access_ok.h>
+#include <asm/page.h>
+#include <linux/log2.h>
void debug_user_asce(int exit);
@@ -390,4 +392,189 @@ do {						\
 		goto err_label;				\
 } while (0)
+static __always_inline int __cmpxchg_user_key_small(int size, u64 address,
+						    unsigned __int128 *old_p,
+						    unsigned __int128 new, u8 access_key)
+{
+	u32 shift, mask, old_word, new_word, align_mask, tmp, diff;
+	u64 aligned;
+	int ret = -EFAULT;
+
+	switch (size) {
+	case 2:
+		align_mask = 2;
+		aligned = (address ^ (address & align_mask));
+		shift = (sizeof(u32) - (address & align_mask) - size) * 8;
+		mask = 0xffff << shift;
+		old_word = ((u16)*old_p) << shift;
+		new_word = ((u16)new) << shift;
+		break;
+	case 1:
+		align_mask = 3;
+		aligned = (address ^ (address & align_mask));
+		shift = (sizeof(u32) - (address & align_mask) - size) * 8;
+		mask = 0xff << shift;
+		old_word = ((u8)*old_p) << shift;
+		new_word = ((u8)new) << shift;
+		break;
+	}
+	asm volatile(
+		       "spka	0(%[access_key])\n"
+		"	sacf	256\n"
+		"0:	l	%[tmp],%[aligned]\n"
+		"1:	nr	%[tmp],%[hole_mask]\n"
+		"	or	%[new_word],%[tmp]\n"
+		"	or	%[old_word],%[tmp]\n"
+		"	lr	%[tmp],%[old_word]\n"
+		"2:	cs	%[tmp],%[new_word],%[aligned]\n"
+		"3:	jnl	4f\n"
+		"	xrk	%[diff],%[tmp],%[old_word]\n"
+		"	nr	%[diff],%[hole_mask]\n"
+		"	xr	%[new_word],%[diff]\n"
+		"	xr	%[old_word],%[diff]\n"
+		"	xrk	%[diff],%[tmp],%[old_word]\n"
+		"	jz	2b\n"
+		"4:	ipm	%[ret]\n"
+		"	srl	%[ret],28\n"
+		"5:	sacf	768\n"
+		"	spka	%[default_key]\n"
+		EX_TABLE(0b, 5b) EX_TABLE(1b, 5b)
+		EX_TABLE(2b, 5b) EX_TABLE(3b, 5b)
+		: [old_word] "+&d" (old_word),
+		  [new_word] "+&d" (new_word),
+		  [tmp] "=&d" (tmp),
+		  [aligned] "+Q" (*(u32 *)aligned),
+		  [diff] "=&d" (diff),
+		  [ret] "+d" (ret)
+		: [access_key] "a" (access_key << 4),
+		  [hole_mask] "d" (~mask),
+		  [default_key] "J" (PAGE_DEFAULT_KEY)
+		: "cc"
+	);
+	*old_p = (tmp & mask) >> shift;
+	return ret;
+}
+
+/**
+ * cmpxchg_user_key_size() - cmpxchg with user space target, honoring storage keys
+ * @size: Size of the value being cmpxchg'ed, one of 1,2,4,8,16.
+ * @address: User space address of value to compare to *@old_p and exchange with
+ *	     *@new. Must be aligned to @size.
+ * @old_p: Pointer to old value. Interpreted as a @size byte integer and compared
+ *	   to the content pointed to by @address in order to determine if the
+ *	   exchange occurs. The value read from @address is written back to *@old_p.
+ * @new: New value to place at @address, interpreted as a @size byte integer.
+ * @access_key: Access key to use for checking storage key protection.
+ *
+ * Perform a cmpxchg on a user space target, honoring storage key protection.
+ * @access_key alone determines how key checking is performed, neither
+ * storage-protection-override nor fetch-protection-override apply.
+ *
+ * Return: 0: successful exchange
+ *	   1: exchange failed
+ *	   -EFAULT: @address not accessible or not naturally aligned
+ *	   -EINVAL: invalid @size
+ */
+static __always_inline int cmpxchg_user_key_size(int size, void __user *address,
+						 unsigned __int128 *old_p,
+						 unsigned __int128 new, u8 access_key)
+{
+	union {
+		u32 word;
+		u64 doubleword;
+	} old;
+	int ret = -EFAULT;
+
+	/*
+	 * The following assumes that:
+	 *  * the current psw key is the default key
+	 *  * no storage protection overrides are in effect
+	 */
+	might_fault();
+	switch (size) {
+	case 16:
+		asm volatile(
+			       "spka	0(%[access_key])\n"
+			"	sacf	256\n"
+			"0:	cdsg	%[old],%[new],%[target]\n"
+			"1:	ipm	%[ret]\n"
+			"	srl	%[ret],28\n"
+			"2:	sacf	768\n"
+			"	spka	%[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (*old_p),
+			  [target] "+Q" (*(unsigned __int128 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" (new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
+		);
+		return ret;
+	case 8:
+		old.doubleword = *old_p;
+		asm volatile(
+			       "spka	0(%[access_key])\n"
+			"	sacf	256\n"
+			"0:	csg	%[old],%[new],%[target]\n"
+			"1:	ipm	%[ret]\n"
+			"	srl	%[ret],28\n"
+			"2:	sacf	768\n"
+			"	spka	%[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (old.doubleword),
+			  [target] "+Q" (*(u64 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" ((u64)new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
+		);
+		*old_p = old.doubleword;
+		return ret;
+	case 4:
+		old.word = *old_p;
+		asm volatile(
+			       "spka	0(%[access_key])\n"
+			"	sacf	256\n"
+			"0:	cs	%[old],%[new],%[target]\n"
+			"1:	ipm	%[ret]\n"
+			"	srl	%[ret],28\n"
+			"2:	sacf	768\n"
+			"	spka	%[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (old.word),
+			  [target] "+Q" (*(u32 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" ((u32)new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
+		);
+		*old_p = old.word;
+		return ret;
+	case 2:
+	case 1:
+		return __cmpxchg_user_key_small(size, (u64)address, old_p, new, access_key);
+	default:
+		return -EINVAL;
+	}
+}
+
+#define cmpxchg_user_key(target_p, old_p, new, access_key)		\
+({									\
+	__typeof__(old_p) __old_p = (old_p);				\
+	unsigned __int128 __old = *__old_p;				\
+	size_t __size = sizeof(*(target_p));				\
+	int __ret;							\
+									\
+	BUILD_BUG_ON(__size != sizeof(*__old_p));			\
+	BUILD_BUG_ON(__size != sizeof(new));				\
+	BUILD_BUG_ON(__size > 16 || !is_power_of_2(__size));		\
+	__ret = cmpxchg_user_key_size(__size, (target_p), &__old, (new), \
+				      (access_key));			\
+	*__old_p = __old;						\
+	__ret;								\
+})
+
 #endif /* __S390_UACCESS_H */
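To make the byte/halfword emulation above easier to follow, here is a rough plain-C model of the same retry loop, with storage keys, address-space switching and fault handling stripped out; nothing below is kernel code, it is a sketch for illustration only:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Model of __cmpxchg_user_key_small(): emulate a 1- or 2-byte cmpxchg
 * with a 4-byte compare-and-swap on the aligned word containing the
 * operand.  Retry when only bytes *outside* the operand changed.
 */
static int small_cmpxchg_model(int size, _Atomic uint32_t *word,
			       unsigned int byte_offset, /* of operand in word */
			       uint32_t *old_p, uint32_t new)
{
	int shift = (4 - byte_offset - size) * 8;	/* s390 is big endian */
	uint32_t mask = ((1u << (size * 8)) - 1) << shift;
	uint32_t old_f = (*old_p << shift) & mask;	/* old, in field position */
	uint32_t new_f = (new << shift) & mask;
	uint32_t cur = atomic_load(word);

	for (;;) {
		uint32_t expected = (cur & ~mask) | old_f;

		/* corresponds to the CS instruction in the asm */
		if (atomic_compare_exchange_strong(word, &expected,
						   (cur & ~mask) | new_f))
			return 0;	/* exchange done */
		if ((expected & mask) != old_f) {
			/* the operand bytes really differ: report failure */
			*old_p = (expected & mask) >> shift;
			return 1;
		}
		cur = expected;	/* only surrounding bytes changed, retry */
	}
}
```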
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest; however, it has no way of performing atomic, key checked accesses to the guest. Extend the MEM_OP ioctl to allow for this by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
The return value of MEM_OP is:
  0 on success,
  < 0 on generic error (e.g. -EFAULT or -ENOMEM),
  > 0 if an exception occurred while walking the page tables.
A cmpxchg failing because the old value doesn't match is neither an error nor an exception, so the question is how best to signal that condition. Signaling it is not strictly necessary, since user space can compare the value of "old" after the MEM_OP with the value it set: if they differ, the cmpxchg failed. Still, it might make for a better user interface if there were an easier way to see whether the cmpxchg failed. This patch sets the cmpxchg flag bit to 0 on a successful cmpxchg, so user space can compare against a constant instead of against the previous value of "old". This has the disadvantage of being a bit unusual; other suggestions welcome.
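With the scheme proposed here, a caller could then test for success against a constant like this (a sketch assuming headers with this series applied; the helper name is made up):

```c
#include <err.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: *ksmo has been fully set up by the caller, with
 * KVM_S390_MEMOP_F_CMPXCHG already set in flags. */
static bool cmpxchg_succeeded(int vm_fd, struct kvm_s390_mem_op *ksmo)
{
	if (ioctl(vm_fd, KVM_S390_MEM_OP, ksmo) < 0)
		err(1, "KVM_S390_MEM_OP");
	/* The flag is cleared on success; on failure, ksmo->old holds
	 * the value that was found at the target location. */
	return !(ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG);
}
```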
 include/uapi/linux/kvm.h |  5 ++++
 arch/s390/kvm/gaccess.h  |  4 +++
 arch/s390/kvm/gaccess.c  | 56 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c | 50 ++++++++++++++++++++++++++++++-----
 4 files changed, 109 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index eed0315a77a6..b856705f3f6b 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -580,7 +580,9 @@ struct kvm_translation { struct kvm_s390_mem_op { /* in */ __u64 gaddr; /* the guest address */ + /* in & out */ __u64 flags; /* flags */ + /* in */ __u32 size; /* amount of bytes */ __u32 op; /* type of operation */ __u64 buf; /* buffer in userspace */ @@ -588,6 +590,8 @@ struct kvm_s390_mem_op { struct { __u8 ar; /* the access register number */ __u8 key; /* access key, ignored if flag unset */ + /* in & out */ + __u64 old[2]; /* ignored if flag unset */ }; __u32 sida_offset; /* offset into the sida */ __u8 reserved[32]; /* ignored */ @@ -604,6 +608,7 @@ struct kvm_s390_mem_op { #define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0) #define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1) #define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2) +#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
/* for KVM_INTERRUPT */ struct kvm_interrupt { diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h index 9408d6cc8e2c..a1cb66ae0995 100644 --- a/arch/s390/kvm/gaccess.h +++ b/arch/s390/kvm/gaccess.h @@ -206,6 +206,10 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar, int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data, unsigned long len, enum gacc_mode mode);
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, + unsigned __int128 *old, + unsigned __int128 new, u8 access_key); + /** * write_guest_with_key - copy data from kernel space to guest space * @vcpu: virtual cpu diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c index 0243b6e38d36..c0e490ecc372 100644 --- a/arch/s390/kvm/gaccess.c +++ b/arch/s390/kvm/gaccess.c @@ -1161,6 +1161,62 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, return rc; }
+/** + * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address. + * @kvm: Virtual machine instance. + * @gpa: Absolute guest address of the location to be changed. + * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a + * non power of two will result in failure. + * @old_p: Pointer to old value. If the location at @gpa contains this value, the + * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old + * contains the value at @gpa before the attempt to exchange the value. + * @new: The value to place at @gpa. + * @access_key: The access key to use for the guest access. + * + * Atomically exchange the value at @gpa by @new, if it contains *@old. + * Honors storage keys. + * + * Return: * 0: successful exchange + * * 1: exchange unsuccessful + * * a program interruption code indicating the reason cmpxchg could + * not be attempted + * * -EINVAL: address misaligned or len not power of two + */ +int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, + unsigned __int128 *old_p, unsigned __int128 new, + u8 access_key) +{ + gfn_t gfn = gpa >> PAGE_SHIFT; + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); + bool writable; + hva_t hva; + int ret; + + if (!IS_ALIGNED(gpa, len)) + return -EINVAL; + + hva = gfn_to_hva_memslot_prot(slot, gfn, &writable); + if (kvm_is_error_hva(hva)) + return PGM_ADDRESSING; + /* + * Check if it's a ro memslot, even tho that can't occur (they're unsupported). + * Don't try to actually handle that case. + */ + if (!writable) + return -EOPNOTSUPP; + + hva += offset_in_page(gpa); + ret = cmpxchg_user_key_size(len, (void __user *)hva, old_p, new, access_key); + mark_page_dirty_in_slot(kvm, slot, gfn); + /* + * Assume that the fault is caused by key protection, the alternative + * is that the user page is write protected. + */ + if (ret == -EFAULT) + ret = PGM_PROTECTION; + return ret; +} + /** * guest_translate_address_with_key - translate guest logical into guest absolute address * @vcpu: virtual cpu diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index b7ef0b71014d..d594d1318d2a 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -576,7 +576,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_VCPU_RESETS: case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_S390_DIAG318: - case KVM_CAP_S390_MEM_OP_EXTENSION: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: @@ -590,6 +589,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_S390_MEM_OP: r = MEM_OP_MAX_SIZE; break; + case KVM_CAP_S390_MEM_OP_EXTENSION: + r = 0x3; + break; case KVM_CAP_NR_VCPUS: case KVM_CAP_MAX_VCPUS: case KVM_CAP_MAX_VCPU_ID: @@ -2711,15 +2713,22 @@ static bool access_key_invalid(u8 access_key) return access_key > 0xf; }
-static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) +static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop, bool *modified) { void __user *uaddr = (void __user *)mop->buf; + unsigned __int128 old; + union { + unsigned __int128 quad; + char raw[sizeof(unsigned __int128)]; + } new = { .quad = 0 }; u64 supported_flags; void *tmpbuf = NULL; int r, srcu_idx;
+ *modified = false; supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION - | KVM_S390_MEMOP_F_CHECK_ONLY; + | KVM_S390_MEMOP_F_CHECK_ONLY + | KVM_S390_MEMOP_F_CMPXCHG; if (mop->flags & ~supported_flags || !mop->size) return -EINVAL; if (mop->size > MEM_OP_MAX_SIZE) @@ -2741,6 +2750,13 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) } else { mop->key = 0; } + if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) { + if (mop->size > sizeof(new)) + return -EINVAL; + if (copy_from_user(&new.raw[sizeof(new) - mop->size], uaddr, mop->size)) + return -EFAULT; + memcpy(&old, mop->old, sizeof(old)); + } if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) { tmpbuf = vmalloc(mop->size); if (!tmpbuf) @@ -2771,6 +2787,16 @@ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop) case KVM_S390_MEMOP_ABSOLUTE_WRITE: { if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) { r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key); + } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) { + r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, + &old, new.quad, mop->key); + if (!r) { + mop->flags &= ~KVM_S390_MEMOP_F_CMPXCHG; + } else if (r == 1) { + memcpy(mop->old, &old, sizeof(old)); + r = 0; + } + *modified = true; } else { if (copy_from_user(tmpbuf, uaddr, mop->size)) { r = -EFAULT; @@ -2918,11 +2944,23 @@ long kvm_arch_vm_ioctl(struct file *filp, } case KVM_S390_MEM_OP: { struct kvm_s390_mem_op mem_op; + bool modified;
- if (copy_from_user(&mem_op, argp, sizeof(mem_op)) == 0) - r = kvm_s390_vm_mem_op(kvm, &mem_op); - else + r = copy_from_user(&mem_op, argp, sizeof(mem_op)); + if (r) { r = -EFAULT; + break; + } + r = kvm_s390_vm_mem_op(kvm, &mem_op, &modified); + if (r) + break; + if (modified) { + r = copy_to_user(argp, &mem_op, sizeof(mem_op)); + if (r) { + r = -EFAULT; + break; + } + } break; } case KVM_S390_ZPCI_OP: {
Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for absolute vm write memops which allows user space to perform (storage key checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..0e02d66e38ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3771,6 +3771,7 @@ Parameters are specified via the following structure::
 		struct {
 			__u8 ar;	/* the access register number */
 			__u8 key;	/* access key, ignored if flag unset */
+			__u64 old[2];	/* ignored if flag unset */
 		};
 		__u32 sida_offset; /* offset into the sida */
 		__u8 reserved[32]; /* ignored */
@@ -3853,8 +3854,23 @@ Absolute accesses are permitted for non-protected guests only.
 Supported flags:
   * ``KVM_S390_MEMOP_F_CHECK_ONLY``
   * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
+  * ``KVM_S390_MEMOP_F_CMPXCHG``
+
+The semantics of the flags common with logical accesses are as for logical
+accesses.
+
+For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag might be supported.
+In this case, instead of doing an unconditional write, the access occurs only
+if the target location contains the value provided in "old". This is performed
+as an atomic cmpxchg. "size" must be a power of two up to and including 16.
+Values with sizes <8 bytes are to be provided by assignment to "old[1]".
+Doublewords are provided with the higher value word in "old[0]" and the lower
+word in "old[1]".
+The value at the target location is returned in "old", encoded in the same
+manner.
+If the value was exchanged, the KVM_S390_MEMOP_F_CMPXCHG bit in "flags" is set
+to 0, otherwise it remains set.
+The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
+has bit 1 (i.e. bit with value 2) set.

-The semantics of the flags are as for logical accesses.

 SIDA read/write:
 ^^^^^^^^^^^^^^^^
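To illustrate the "old" encoding documented above, a 4-byte cmpxchg through this interface could look roughly as follows (a sketch assuming headers with this series applied; vm_fd and the helper name are made up):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: compare-and-swap a 4-byte value at guest absolute address
 * gaddr, expecting old_val, writing new_val; returns 1 if exchanged,
 * 0 if the compare failed, -1 on ioctl error. */
static int guest_cmpxchg32(int vm_fd, uint64_t gaddr,
			   uint32_t old_val, uint32_t new_val)
{
	struct kvm_s390_mem_op ksmo = {
		.gaddr = gaddr,
		.size = sizeof(new_val),
		.op = KVM_S390_MEMOP_ABSOLUTE_WRITE,
		.buf = (uintptr_t)&new_val,
		.flags = KVM_S390_MEMOP_F_CMPXCHG,
	};

	ksmo.old[1] = old_val;	/* sizes < 8 bytes go into old[1] */
	if (ioctl(vm_fd, KVM_S390_MEM_OP, &ksmo))
		return -1;
	return !(ksmo.flags & KVM_S390_MEMOP_F_CMPXCHG);
}
```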
The struct is quite large, so passing it via pointer instead of by value seems nicer.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 44 +++++++++++------------
 1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 9113696d5178..69869c7e2ab1 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -48,53 +48,53 @@ struct mop_desc { uint8_t key; };
-static struct kvm_s390_mem_op ksmo_from_desc(struct mop_desc desc) +static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) { struct kvm_s390_mem_op ksmo = { - .gaddr = (uintptr_t)desc.gaddr, - .size = desc.size, - .buf = ((uintptr_t)desc.buf), + .gaddr = (uintptr_t)desc->gaddr, + .size = desc->size, + .buf = ((uintptr_t)desc->buf), .reserved = "ignored_ignored_ignored_ignored" };
- switch (desc.target) { + switch (desc->target) { case LOGICAL: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_LOGICAL_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_LOGICAL_WRITE; break; case SIDA: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_SIDA_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_SIDA_WRITE; break; case ABSOLUTE: - if (desc.mode == READ) + if (desc->mode == READ) ksmo.op = KVM_S390_MEMOP_ABSOLUTE_READ; - if (desc.mode == WRITE) + if (desc->mode == WRITE) ksmo.op = KVM_S390_MEMOP_ABSOLUTE_WRITE; break; case INVALID: ksmo.op = -1; } - if (desc.f_check) + if (desc->f_check) ksmo.flags |= KVM_S390_MEMOP_F_CHECK_ONLY; - if (desc.f_inject) + if (desc->f_inject) ksmo.flags |= KVM_S390_MEMOP_F_INJECT_EXCEPTION; - if (desc._set_flags) - ksmo.flags = desc.set_flags; - if (desc.f_key) { + if (desc->_set_flags) + ksmo.flags = desc->set_flags; + if (desc->f_key) { ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; - ksmo.key = desc.key; + ksmo.key = desc->key; } - if (desc._ar) - ksmo.ar = desc.ar; + if (desc->_ar) + ksmo.ar = desc->ar; else ksmo.ar = 0; - if (desc._sida_offset) - ksmo.sida_offset = desc.sida_offset; + if (desc->_sida_offset) + ksmo.sida_offset = desc->sida_offset;
return ksmo; } @@ -183,7 +183,7 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) else \ __desc.gaddr = __desc.gaddr_v; \ } \ - __ksmo = ksmo_from_desc(__desc); \ + __ksmo = ksmo_from_desc(&__desc); \ print_memop(__info.vcpu, &__ksmo); \ err##memop_ioctl(__info, &__ksmo); \ })
Replace the DEFAULT_* test helpers with functions, as they don't need the extra flexibility of macros.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 82 +++++++++++------------
 1 file changed, 39 insertions(+), 43 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 69869c7e2ab1..286185a59238 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -48,6 +48,8 @@ struct mop_desc { uint8_t key; };
+const uint8_t NO_KEY = 0xff; + static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) { struct kvm_s390_mem_op ksmo = { @@ -85,7 +87,7 @@ static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) ksmo.flags |= KVM_S390_MEMOP_F_INJECT_EXCEPTION; if (desc->_set_flags) ksmo.flags = desc->set_flags; - if (desc->f_key) { + if (desc->f_key && desc->key != NO_KEY) { ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; ksmo.key = desc->key; } @@ -268,34 +270,28 @@ static void prepare_mem12(void) #define ASSERT_MEM_EQ(p1, p2, size) \ TEST_ASSERT(!memcmp(p1, p2, size), "Memory contents do not match!")
-#define DEFAULT_WRITE_READ(copy_cpu, mop_cpu, mop_target_p, size, ...) \ -({ \ - struct test_info __copy_cpu = (copy_cpu), __mop_cpu = (mop_cpu); \ - enum mop_target __target = (mop_target_p); \ - uint32_t __size = (size); \ - \ - prepare_mem12(); \ - CHECK_N_DO(MOP, __mop_cpu, __target, WRITE, mem1, __size, \ - GADDR_V(mem1), ##__VA_ARGS__); \ - HOST_SYNC(__copy_cpu, STAGE_COPIED); \ - CHECK_N_DO(MOP, __mop_cpu, __target, READ, mem2, __size, \ - GADDR_V(mem2), ##__VA_ARGS__); \ - ASSERT_MEM_EQ(mem1, mem2, __size); \ -}) +static void default_write_read(struct test_info copy_cpu, struct test_info mop_cpu, + enum mop_target mop_target, uint32_t size, uint8_t key) +{ + prepare_mem12(); + CHECK_N_DO(MOP, mop_cpu, mop_target, WRITE, mem1, size, + GADDR_V(mem1), KEY(key)); + HOST_SYNC(copy_cpu, STAGE_COPIED); + CHECK_N_DO(MOP, mop_cpu, mop_target, READ, mem2, size, + GADDR_V(mem2), KEY(key)); + ASSERT_MEM_EQ(mem1, mem2, size); +}
-#define DEFAULT_READ(copy_cpu, mop_cpu, mop_target_p, size, ...) \ -({ \ - struct test_info __copy_cpu = (copy_cpu), __mop_cpu = (mop_cpu); \ - enum mop_target __target = (mop_target_p); \ - uint32_t __size = (size); \ - \ - prepare_mem12(); \ - CHECK_N_DO(MOP, __mop_cpu, __target, WRITE, mem1, __size, \ - GADDR_V(mem1)); \ - HOST_SYNC(__copy_cpu, STAGE_COPIED); \ - CHECK_N_DO(MOP, __mop_cpu, __target, READ, mem2, __size, ##__VA_ARGS__);\ - ASSERT_MEM_EQ(mem1, mem2, __size); \ -}) +static void default_read(struct test_info copy_cpu, struct test_info mop_cpu, + enum mop_target mop_target, uint32_t size, uint8_t key) +{ + prepare_mem12(); + CHECK_N_DO(MOP, mop_cpu, mop_target, WRITE, mem1, size, GADDR_V(mem1)); + HOST_SYNC(copy_cpu, STAGE_COPIED); + CHECK_N_DO(MOP, mop_cpu, mop_target, READ, mem2, size, + GADDR_V(mem2), KEY(key)); + ASSERT_MEM_EQ(mem1, mem2, size); +}
static void guest_copy(void) { @@ -310,7 +306,7 @@ static void test_copy(void)
HOST_SYNC(t.vcpu, STAGE_INITED);
- DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, NO_KEY);
kvm_vm_free(t.kvm_vm); } @@ -357,26 +353,26 @@ static void test_copy_key(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vm, no key */ - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, NO_KEY);
/* vm/vcpu, machting key or key 0 */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(9)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, t.size, KEY(9)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 0); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 9); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, 0); + default_write_read(t.vcpu, t.vm, ABSOLUTE, t.size, 9); /* * There used to be different code paths for key handling depending on * if the region crossed a page boundary. * There currently are not, but the more tests the merrier. */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, 1, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, 1, KEY(9)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, 1, KEY(0)); - DEFAULT_WRITE_READ(t.vcpu, t.vm, ABSOLUTE, 1, KEY(9)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, 1, 0); + default_write_read(t.vcpu, t.vcpu, LOGICAL, 1, 9); + default_write_read(t.vcpu, t.vm, ABSOLUTE, 1, 0); + default_write_read(t.vcpu, t.vm, ABSOLUTE, 1, 9);
/* vm/vcpu, mismatching keys on read, but no fetch protection */ - DEFAULT_READ(t.vcpu, t.vcpu, LOGICAL, t.size, GADDR_V(mem2), KEY(2)); - DEFAULT_READ(t.vcpu, t.vm, ABSOLUTE, t.size, GADDR_V(mem1), KEY(2)); + default_read(t.vcpu, t.vcpu, LOGICAL, t.size, 2); + default_read(t.vcpu, t.vm, ABSOLUTE, t.size, 2);
kvm_vm_free(t.kvm_vm); } @@ -409,7 +405,7 @@ static void test_copy_key_storage_prot_override(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vcpu, mismatching keys, storage protection override in effect */ - DEFAULT_WRITE_READ(t.vcpu, t.vcpu, LOGICAL, t.size, KEY(2)); + default_write_read(t.vcpu, t.vcpu, LOGICAL, t.size, 2);
kvm_vm_free(t.kvm_vm); } @@ -422,8 +418,8 @@ static void test_copy_key_fetch_prot(void) HOST_SYNC(t.vcpu, STAGE_SKEYS_SET);
/* vm/vcpu, matching key, fetch protection in effect */ - DEFAULT_READ(t.vcpu, t.vcpu, LOGICAL, t.size, GADDR_V(mem2), KEY(9)); - DEFAULT_READ(t.vcpu, t.vm, ABSOLUTE, t.size, GADDR_V(mem2), KEY(9)); + default_read(t.vcpu, t.vcpu, LOGICAL, t.size, 9); + default_read(t.vcpu, t.vm, ABSOLUTE, t.size, 9);
kvm_vm_free(t.kvm_vm); }
Add a test that actually performs the bad-address access, instead of only checking it via CHECK_ONLY.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 286185a59238..311cf7d33247 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -641,7 +641,9 @@ static void _test_errors_common(struct test_info info, enum mop_target target, i
/* Bad guest address: */ rv = ERR_MOP(info, target, WRITE, mem1, size, GADDR((void *)~0xfffUL), CHECK_ONLY); - TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory access"); + TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory address"); + rv = ERR_MOP(info, target, WRITE, mem1, size, GADDR((void *)~0xfffUL)); + TEST_ASSERT(rv > 0, "ioctl does not report bad guest memory address");
/* Bad host address: */ rv = ERR_MOP(info, target, WRITE, 0, size, GADDR_V(mem1));
Test successful exchange, unsuccessful exchange, storage key protection and invalid arguments.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 570 +++++++++++++++++++---
 1 file changed, 495 insertions(+), 75 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c index 311cf7d33247..3a160ab0415b 100644 --- a/tools/testing/selftests/kvm/s390x/memop.c +++ b/tools/testing/selftests/kvm/s390x/memop.c @@ -9,6 +9,7 @@ #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> +#include <pthread.h>
#include <linux/bits.h>
@@ -44,10 +45,14 @@ struct mop_desc { enum mop_access_mode mode; void *buf; uint32_t sida_offset; + void *old; + bool *cmpxchg_success; uint8_t ar; uint8_t key; };
+typedef unsigned __int128 uint128; + const uint8_t NO_KEY = 0xff;
static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) @@ -91,6 +96,26 @@ static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION; ksmo.key = desc->key; } + if (desc->old) { + ksmo.flags |= KVM_S390_MEMOP_F_CMPXCHG; + switch (ksmo.size) { + case 1: + ksmo.old[1] = *(uint8_t *)desc->old; + break; + case 2: + ksmo.old[1] = *(uint16_t *)desc->old; + break; + case 4: + ksmo.old[1] = *(uint32_t *)desc->old; + break; + case 8: + ksmo.old[1] = *(uint64_t *)desc->old; + break; + case 16: + memcpy(ksmo.old, desc->old, sizeof(ksmo.old)); + break; + } + } if (desc->_ar) ksmo.ar = desc->ar; else @@ -101,6 +126,31 @@ static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc) return ksmo; }
+static void cmpxchg_write_back(struct kvm_s390_mem_op *ksmo, struct mop_desc *desc) +{ + if (desc->old) { + switch (ksmo->size) { + case 1: + *(uint8_t *)desc->old = ksmo->old[1]; + break; + case 2: + *(uint16_t *)desc->old = ksmo->old[1]; + break; + case 4: + *(uint32_t *)desc->old = ksmo->old[1]; + break; + case 8: + *(uint64_t *)desc->old = ksmo->old[1]; + break; + case 16: + memcpy(desc->old, ksmo->old, sizeof(ksmo->old)); + break; + } + } + if (desc->cmpxchg_success) + *desc->cmpxchg_success = !(ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG); +} + struct test_info { struct kvm_vm *vm; struct kvm_vcpu *vcpu; @@ -136,18 +186,22 @@ static void print_memop(struct kvm_vcpu *vcpu, const struct kvm_s390_mem_op *ksm printf("ABSOLUTE, WRITE, "); break; } - printf("gaddr=%llu, size=%u, buf=%llu, ar=%u, key=%u", - ksmo->gaddr, ksmo->size, ksmo->buf, ksmo->ar, ksmo->key); + printf("gaddr=%llu, size=%u, buf=%llu, ar=%u, key=%u, old[0]=%llu, old[1]=%llu", + ksmo->gaddr, ksmo->size, ksmo->buf, ksmo->ar, ksmo->key, + ksmo->old[0], ksmo->old[1]); if (ksmo->flags & KVM_S390_MEMOP_F_CHECK_ONLY) printf(", CHECK_ONLY"); if (ksmo->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION) printf(", INJECT_EXCEPTION"); if (ksmo->flags & KVM_S390_MEMOP_F_SKEY_PROTECTION) printf(", SKEY_PROTECTION"); + if (ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG) + printf(", CMPXCHG"); puts(")"); }
-static void memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) +static void memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo, + struct mop_desc *desc) { struct kvm_vcpu *vcpu = info.vcpu;
@@ -155,16 +209,21 @@ static void memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); else vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); + cmpxchg_write_back(ksmo, desc); }
-static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) +static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo, + struct mop_desc *desc) { struct kvm_vcpu *vcpu = info.vcpu; + int r;
if (!vcpu) - return __vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); + r = __vm_ioctl(info.vm, KVM_S390_MEM_OP, ksmo); else - return __vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); + r = __vcpu_ioctl(vcpu, KVM_S390_MEM_OP, ksmo); + cmpxchg_write_back(ksmo, desc); + return r; }
#define MEMOP(err, info_p, mop_target_p, access_mode_p, buf_p, size_p, ...) \ @@ -187,7 +246,7 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) } \ __ksmo = ksmo_from_desc(&__desc); \ print_memop(__info.vcpu, &__ksmo); \ - err##memop_ioctl(__info, &__ksmo); \ + err##memop_ioctl(__info, &__ksmo, &__desc); \ })
#define MOP(...) MEMOP(, __VA_ARGS__) @@ -201,6 +260,8 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) #define AR(a) ._ar = 1, .ar = (a) #define KEY(a) .f_key = 1, .key = (a) #define INJECT .f_inject = 1 +#define CMPXCHG_OLD(o) .old = (o) +#define CMPXCHG_SUCCESS(s) .cmpxchg_success = (s)
#define CHECK_N_DO(f, ...) ({ f(__VA_ARGS__, CHECK_ONLY); f(__VA_ARGS__); })
@@ -210,8 +271,8 @@ static int err_memop_ioctl(struct test_info info, struct kvm_s390_mem_op *ksmo) #define CR0_FETCH_PROTECTION_OVERRIDE (1UL << (63 - 38)) #define CR0_STORAGE_PROTECTION_OVERRIDE (1UL << (63 - 39))
-static uint8_t mem1[65536]; -static uint8_t mem2[65536]; +static uint8_t __aligned(PAGE_SIZE) mem1[65536]; +static uint8_t __aligned(PAGE_SIZE) mem2[65536];
struct test_default { struct kvm_vm *kvm_vm; @@ -243,6 +304,8 @@ enum stage { STAGE_SKEYS_SET, /* Guest copied memory (locations up to test case) */ STAGE_COPIED, + /* End of guest code reached */ + STAGE_DONE, };
#define HOST_SYNC(info_p, stage) \ @@ -254,6 +317,11 @@ enum stage { \ vcpu_run(__vcpu); \ get_ucall(__vcpu, &uc); \ + if (uc.cmd == UCALL_ABORT) { \ + TEST_FAIL("line %lu: %s, hints: %lu, %lu", \ + uc.args[1], (const char *)uc.args[0], \ + uc.args[2], uc.args[3]); \ + } \ ASSERT_EQ(uc.cmd, UCALL_SYNC); \ ASSERT_EQ(uc.args[1], __stage); \ }) \ @@ -293,6 +361,44 @@ static void default_read(struct test_info copy_cpu, struct test_info mop_cpu, ASSERT_MEM_EQ(mem1, mem2, size); }
+static void default_cmpxchg(struct test_default *test, uint8_t key) +{ + for (int size = 1; size <= 16; size *= 2) { + for (int offset = 0; offset < 16; offset += size) { + uint8_t __aligned(16) new[16] = {}; + uint8_t __aligned(16) old[16]; + bool succ; + + prepare_mem12(); + default_write_read(test->vcpu, test->vcpu, LOGICAL, 16, NO_KEY); + + memcpy(&old, mem1, 16); + CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset, + size, GADDR_V(mem1 + offset), + CMPXCHG_OLD(old + offset), + CMPXCHG_SUCCESS(&succ), KEY(key)); + HOST_SYNC(test->vcpu, STAGE_COPIED); + MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2)); + TEST_ASSERT(succ, "exchange of values should succeed"); + memcpy(mem1 + offset, new + offset, size); + ASSERT_MEM_EQ(mem1, mem2, 16); + + memcpy(&old, mem1, 16); + new[offset]++; + old[offset]++; + CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset, + size, GADDR_V(mem1 + offset), + CMPXCHG_OLD(old + offset), + CMPXCHG_SUCCESS(&succ), KEY(key)); + HOST_SYNC(test->vcpu, STAGE_COPIED); + MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2)); + TEST_ASSERT(!succ, "exchange of values should not succeed"); + ASSERT_MEM_EQ(mem1, mem2, 16); + ASSERT_MEM_EQ(&old, mem1, 16); + } + } +} + static void guest_copy(void) { GUEST_SYNC(STAGE_INITED); @@ -377,6 +483,250 @@ static void test_copy_key(void) kvm_vm_free(t.kvm_vm); }
+static void test_cmpxchg_key(void) +{ + struct test_default t = test_default_init(guest_copy_key); + + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + + default_cmpxchg(&t, NO_KEY); + default_cmpxchg(&t, 0); + default_cmpxchg(&t, 9); + + kvm_vm_free(t.kvm_vm); +} + +static uint128 cut_to_size(int size, uint128 val) +{ + switch (size) { + case 1: + return (uint8_t)val; + case 2: + return (uint16_t)val; + case 4: + return (uint32_t)val; + case 8: + return (uint64_t)val; + case 16: + return val; + } + GUEST_ASSERT_1(false, "Invalid size"); + return 0; +} + +static bool popcount_eq(uint128 a, uint128 b) +{ + unsigned int count_a, count_b; + + count_a = __builtin_popcountl((uint64_t)(a >> 64)) + + __builtin_popcountl((uint64_t)a); + count_b = __builtin_popcountl((uint64_t)(b >> 64)) + + __builtin_popcountl((uint64_t)b); + return count_a == count_b; +} + +static uint128 rotate(int size, uint128 val, int amount) +{ + unsigned int bits = size * 8; + + amount = (amount + bits) % bits; + val = cut_to_size(size, val); + return (val << (bits - amount)) | (val >> amount); +} + +const unsigned int max_block = 16; + +static void choose_block(bool guest, int i, int *size, int *offset) +{ + unsigned int rand; + + rand = i; + if (guest) { + rand = rand * 19 + 11; + *size = 1 << ((rand % 3) + 2); + rand = rand * 19 + 11; + *offset = (rand % max_block) & ~(*size - 1); + } else { + rand = rand * 17 + 5; + *size = 1 << (rand % 5); + rand = rand * 17 + 5; + *offset = (rand % max_block) & ~(*size - 1); + } +} + +static uint128 permutate_bits(bool guest, int i, int size, uint128 old) +{ + unsigned int rand; + bool swap; + + rand = i; + rand = rand * 3 + 1; + if (guest) + rand = rand * 3 + 1; + swap = rand % 2 == 0; + if (swap) { + int i, j; + uint128 new; + uint8_t byte0, byte1; + + rand = rand * 3 + 1; + i = rand % size; + rand = rand * 3 + 1; + j = rand % size; + if (i == j) + return old; + new = rotate(16, old, i * 8); + byte0 = new & 0xff; + new &= ~0xff; + new = rotate(16, new, -i * 8); + new = rotate(16, new, j * 8); + byte1 = new & 0xff; + new = (new & ~0xff) | byte0; + new = rotate(16, new, -j * 8); + new = rotate(16, new, i * 8); + new = new | byte1; + new = rotate(16, new, -i * 8); + return new; + } else { + int amount; + + rand = rand * 3 + 1; + amount = rand % (size * 8); + return rotate(size, old, amount); + } +} + +static bool _cmpxchg(int size, void *target, uint128 *old_p, uint128 new) +{ + bool ret; + + switch (size) { + case 4: { + uint32_t old = *old_p; + + asm volatile ("cs %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(uint32_t *)(target)) + : [new] "d" ((uint32_t)new) + : "cc" + ); + ret = old == (uint32_t)*old_p; + *old_p = old; + return ret; + } + case 8: { + uint64_t old = *old_p; + + asm volatile ("csg %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(uint64_t *)(target)) + : [new] "d" ((uint64_t)new) + : "cc" + ); + ret = old == (uint64_t)*old_p; + *old_p = old; + return ret; + } + case 16: { + uint128 old = *old_p; + + asm volatile ("cdsg %[old],%[new],%[address]" + : [old] "+d" (old), + [address] "+Q" (*(uint128 *)(target)) + : [new] "d" (new) + : "cc" + ); + ret = old == *old_p; + *old_p = old; + return ret; + } + } + GUEST_ASSERT_1(false, "Invalid size"); + return 0; +} + +const unsigned int cmpxchg_iter_outer = 100, cmpxchg_iter_inner = 10000; + +static void guest_cmpxchg_key(void) +{ + int size, offset; + uint128 old, new; + + set_storage_key_range(mem1, max_block, 0x10); + set_storage_key_range(mem2, max_block, 0x10); + GUEST_SYNC(STAGE_SKEYS_SET); + 
+ for (int i = 0; i < cmpxchg_iter_outer; i++) { + do { + old = 1; + } while (!_cmpxchg(16, mem1, &old, 0)); + for (int j = 0; j < cmpxchg_iter_inner; j++) { + choose_block(true, i + j, &size, &offset); + do { + new = permutate_bits(true, i + j, size, old); + } while (!_cmpxchg(size, mem2 + offset, &old, new)); + } + } + + GUEST_SYNC(STAGE_DONE); +} + +static void *run_guest(void *data) +{ + struct test_info *info = data; + + HOST_SYNC(*info, STAGE_DONE); + return NULL; +} + +static char *quad_to_char(uint128 *quad, int size) +{ + return ((char *)quad) + (sizeof(*quad) - size); +} + +static void test_cmpxchg_key_concurrent(void) +{ + struct test_default t = test_default_init(guest_cmpxchg_key); + int size, offset; + uint128 old, new; + bool success; + pthread_t thread; + + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + prepare_mem12(); + MOP(t.vcpu, LOGICAL, WRITE, mem1, max_block, GADDR_V(mem2)); + pthread_create(&thread, NULL, run_guest, &t.vcpu); + + for (int i = 0; i < cmpxchg_iter_outer; i++) { + do { + old = 0; + new = 1; + MOP(t.vm, ABSOLUTE, WRITE, &new, + sizeof(new), GADDR_V(mem1), + CMPXCHG_OLD(&old), + CMPXCHG_SUCCESS(&success), KEY(1)); + } while (!success); + for (int j = 0; j < cmpxchg_iter_inner; j++) { + choose_block(false, i + j, &size, &offset); + do { + new = permutate_bits(false, i + j, size, old); + MOP(t.vm, ABSOLUTE, WRITE, quad_to_char(&new, size), + size, GADDR_V(mem2 + offset), + CMPXCHG_OLD(quad_to_char(&old, size)), + CMPXCHG_SUCCESS(&success), KEY(1)); + } while (!success); + } + } + + pthread_join(thread, NULL); + + MOP(t.vcpu, LOGICAL, READ, mem2, max_block, GADDR_V(mem2)); + TEST_ASSERT(popcount_eq(*(uint128 *)mem1, *(uint128 *)mem2), + "Must retain number of set bits"); + + kvm_vm_free(t.kvm_vm); +} + static void guest_copy_key_fetch_prot(void) { /* @@ -457,6 +807,24 @@ static void test_errors_key(void) kvm_vm_free(t.kvm_vm); }
+static void test_errors_cmpxchg_key(void) +{ + struct test_default t = test_default_init(guest_copy_key_fetch_prot); + int i; + + HOST_SYNC(t.vcpu, STAGE_INITED); + HOST_SYNC(t.vcpu, STAGE_SKEYS_SET); + + for (i = 1; i <= 16; i *= 2) { + uint128 old = 0; + + CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem2, i, GADDR_V(mem2), + CMPXCHG_OLD(&old), KEY(2)); + } + + kvm_vm_free(t.kvm_vm); +} + static void test_termination(void) { struct test_default t = test_default_init(guest_error_key); @@ -692,87 +1060,139 @@ static void test_errors(void) kvm_vm_free(t.kvm_vm); }
-struct testdef { - const char *name; - void (*test)(void); - int extension; -} testlist[] = { - { - .name = "simple copy", - .test = test_copy, - }, - { - .name = "generic error checks", - .test = test_errors, - }, - { - .name = "copy with storage keys", - .test = test_copy_key, - .extension = 1, - }, - { - .name = "copy with key storage protection override", - .test = test_copy_key_storage_prot_override, - .extension = 1, - }, - { - .name = "copy with key fetch protection", - .test = test_copy_key_fetch_prot, - .extension = 1, - }, - { - .name = "copy with key fetch protection override", - .test = test_copy_key_fetch_prot_override, - .extension = 1, - }, - { - .name = "error checks with key", - .test = test_errors_key, - .extension = 1, - }, - { - .name = "termination", - .test = test_termination, - .extension = 1, - }, - { - .name = "error checks with key storage protection override", - .test = test_errors_key_storage_prot_override, - .extension = 1, - }, - { - .name = "error checks without key fetch prot override", - .test = test_errors_key_fetch_prot_override_not_enabled, - .extension = 1, - }, - { - .name = "error checks with key fetch prot override", - .test = test_errors_key_fetch_prot_override_enabled, - .extension = 1, - }, -}; +static void test_errors_cmpxchg(void) +{ + struct test_default t = test_default_init(guest_idle); + uint128 old; + int rv, i, power = 1; + + HOST_SYNC(t.vcpu, STAGE_INITED); + + for (i = 0; i < 32; i++) { + if (i == power) { + power *= 2; + continue; + } + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv == -1 && errno == EINVAL, + "ioctl allows bad size for cmpxchg"); + } + for (i = 1; i <= 16; i *= 2) { + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR((void *)~0xfffUL), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv > 0, "ioctl allows bad guest address for cmpxchg"); + } + for (i = 2; i <= 16; i *= 2) { + rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1 + 1), + CMPXCHG_OLD(&old)); + TEST_ASSERT(rv == -1 && errno == EINVAL, + "ioctl allows bad alignment for cmpxchg"); + } + + kvm_vm_free(t.kvm_vm); +}
int main(int argc, char *argv[]) { int extension_cap, idx;
+ setbuf(stdout, NULL); /* Tell stdout not to buffer its content */ TEST_REQUIRE(kvm_has_cap(KVM_CAP_S390_MEM_OP)); + extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION);
- setbuf(stdout, NULL); /* Tell stdout not to buffer its content */ + struct testdef { + const char *name; + void (*test)(void); + bool requirements_met; + } testlist[] = { + { + .name = "simple copy", + .test = test_copy, + .requirements_met = true, + }, + { + .name = "generic error checks", + .test = test_errors, + .requirements_met = true, + }, + { + .name = "copy with storage keys", + .test = test_copy_key, + .requirements_met = extension_cap > 0, + }, + { + .name = "cmpxchg with storage keys", + .test = test_cmpxchg_key, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "concurrently cmpxchg with storage keys", + .test = test_cmpxchg_key_concurrent, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "copy with key storage protection override", + .test = test_copy_key_storage_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "copy with key fetch protection", + .test = test_copy_key_fetch_prot, + .requirements_met = extension_cap > 0, + }, + { + .name = "copy with key fetch protection override", + .test = test_copy_key_fetch_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key", + .test = test_errors_key, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks for cmpxchg with key", + .test = test_errors_cmpxchg_key, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "error checks for cmpxchg", + .test = test_errors_cmpxchg, + .requirements_met = extension_cap & 0x2, + }, + { + .name = "termination", + .test = test_termination, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key storage protection override", + .test = test_errors_key_storage_prot_override, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks without key fetch prot override", + .test = test_errors_key_fetch_prot_override_not_enabled, + .requirements_met = extension_cap > 0, + }, + { + .name = "error checks with key fetch prot override", + .test = test_errors_key_fetch_prot_override_enabled, + .requirements_met = extension_cap > 0, + }, + };
ksft_print_header(); - ksft_set_plan(ARRAY_SIZE(testlist));
- extension_cap = kvm_check_cap(KVM_CAP_S390_MEM_OP_EXTENSION); for (idx = 0; idx < ARRAY_SIZE(testlist); idx++) { - if (extension_cap >= testlist[idx].extension) { + if (testlist[idx].requirements_met) { testlist[idx].test(); ksft_test_result_pass("%s\n", testlist[idx].name); } else { - ksft_test_result_skip("%s - extension level %d not supported\n", - testlist[idx].name, - testlist[idx].extension); + ksft_test_result_skip("%s - requirements not met (kernel has extension cap %#x)\n", + testlist[idx].name, extension_cap); } }
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index 3a160ab0415b..1887685b41d2 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -970,7 +970,7 @@ static void test_errors_key_fetch_prot_override_enabled(void)

 	/*
 	 * vcpu, mismatching keys on fetch,
-	 * fetch protection override does not apply because memory range acceeded
+	 * fetch protection override does not apply because memory range exceeded
 	 */
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, 2048 + 1, GADDR_V(0), KEY(2));
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, PAGE_SIZE + 2048 + 1,
The guest code sets the key for mem1 only. In order to provoke a protection exception, the test code needs to address mem1.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 tools/testing/selftests/kvm/s390x/memop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index 1887685b41d2..82fc65cca8a3 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -800,9 +800,9 @@ static void test_errors_key(void)

 	/* vm/vcpu, mismatching keys, fetch protection in effect */
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, WRITE, mem1, t.size, GADDR_V(mem1), KEY(2));
-	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, t.size, GADDR_V(mem2), KEY(2));
+	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, t.size, GADDR_V(mem1), KEY(2));
 	CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem1, t.size, GADDR_V(mem1), KEY(2));
-	CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, READ, mem2, t.size, GADDR_V(mem2), KEY(2));
+	CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, READ, mem2, t.size, GADDR_V(mem1), KEY(2));

 	kvm_vm_free(t.kvm_vm);
 }
On 10/1/22 04:07, Janis Schoetterl-Glausch wrote:
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
tools/testing/selftests/kvm/s390x/memop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index 3a160ab0415b..1887685b41d2 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -970,7 +970,7 @@ static void test_errors_key_fetch_prot_override_enabled(void)
 	/*
 	 * vcpu, mismatching keys on fetch,
-	 * fetch protection override does not apply because memory range acceeded
+	 * fetch protection override does not apply because memory range exceeded
 	 */
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, 2048 + 1, GADDR_V(0), KEY(2));
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, PAGE_SIZE + 2048 + 1,
No patch description? Seriously?
What about this one?:
```
The word "acceeded" isn't the right word to describe a situation where too
much memory is accessed. Substitute it with "exceeded".
```
Thanks.
Hi Janis,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on f76349cf41451c5c42a99f18a9163377e4b364ff]
url:    https://github.com/intel-lab-lkp/linux/commits/Janis-Schoetterl-Glausch/KVM-...
base:   f76349cf41451c5c42a99f18a9163377e4b364ff
config: s390-randconfig-m031-20220925
compiler: s390-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b53d129604de03147fce1d353698f9...
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Janis-Schoetterl-Glausch/KVM-s390-Extend-MEM_OP-ioctl-by-storage-key-checked-cmpxchg/20221001-050945
        git checkout b53d129604de03147fce1d353698f961b256f895
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash
If you fix the issue, kindly add the following tag where applicable:
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arch/s390/include/asm/uaccess.h: Assembler messages:
>> arch/s390/include/asm/uaccess.h:430: Error: Unrecognized opcode: `xrk'
>> arch/s390/include/asm/uaccess.h:434: Error: Unrecognized opcode: `xrk'
vim +430 arch/s390/include/asm/uaccess.h
110a6dbb2eca6b Heiko Carstens 2020-09-14  394
(Lines 395-457 below are all attributed to commit db6f2eb3910899,
Janis Schoetterl-Glausch, 2022-09-30; the repeated per-line blame
columns are omitted here for readability.)

   395	static __always_inline int __cmpxchg_user_key_small(int size, u64 address,
   396						    unsigned __int128 *old_p,
   397						    unsigned __int128 new, u8 access_key)
   398	{
   399		u32 shift, mask, old_word, new_word, align_mask, tmp, diff;
   400		u64 aligned;
   401		int ret = -EFAULT;
   402	
   403		switch (size) {
   404		case 2:
   405			align_mask = 2;
   406			aligned = (address ^ (address & align_mask));
   407			shift = (sizeof(u32) - (address & align_mask) - size) * 8;
   408			mask = 0xffff << shift;
   409			old_word = ((u16)*old_p) << shift;
   410			new_word = ((u16)new) << shift;
   411			break;
   412		case 1:
   413			align_mask = 3;
   414			aligned = (address ^ (address & align_mask));
   415			shift = (sizeof(u32) - (address & align_mask) - size) * 8;
   416			mask = 0xff << shift;
   417			old_word = ((u8)*old_p) << shift;
   418			new_word = ((u8)new) << shift;
   419			break;
   420		}
   421		asm volatile(
   422			       "spka	0(%[access_key])\n"
   423			"	sacf	256\n"
   424			"0:	l	%[tmp],%[aligned]\n"
   425			"1:	nr	%[tmp],%[hole_mask]\n"
   426			"	or	%[new_word],%[tmp]\n"
   427			"	or	%[old_word],%[tmp]\n"
   428			"	lr	%[tmp],%[old_word]\n"
   429			"2:	cs	%[tmp],%[new_word],%[aligned]\n"
 > 430			"3:	jnl	4f\n"
   431			"	xrk	%[diff],%[tmp],%[old_word]\n"
   432			"	nr	%[diff],%[hole_mask]\n"
   433			"	xr	%[new_word],%[diff]\n"
   434			"	xr	%[old_word],%[diff]\n"
   435			"	xrk	%[diff],%[tmp],%[old_word]\n"
   436			"	jz	2b\n"
   437			"4:	ipm	%[ret]\n"
   438			"	srl	%[ret],28\n"
   439			"5:	sacf	768\n"
   440			"	spka	%[default_key]\n"
   441			EX_TABLE(0b, 5b) EX_TABLE(1b, 5b)
   442			EX_TABLE(2b, 5b) EX_TABLE(3b, 5b)
   443			: [old_word] "+&d" (old_word),
   444			  [new_word] "+&d" (new_word),
   445			  [tmp] "=&d" (tmp),
   446			  [aligned] "+Q" (*(u32 *)aligned),
   447			  [diff] "=&d" (diff),
   448			  [ret] "+d" (ret)
   449			: [access_key] "a" (access_key << 4),
   450			  [hole_mask] "d" (~mask),
   451			  [default_key] "J" (PAGE_DEFAULT_KEY)
   452			: "cc"
   453			);
   454		*old_p = (tmp & mask) >> shift;
   455		return ret;
   456	}
   457	
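For context: xrk is the distinct-operands form of EXCLUSIVE OR and was introduced with z196, so an assembler targeting an older machine generation, as this randconfig apparently does, rejects it. One possible fallback, offered here as an untested sketch rather than a fix, is the classic two-instruction sequence, which works because %[diff] is an earlyclobber operand distinct from %[old_word]:

```c
	/* hypothetical pre-z196 replacement for:
	 *	xrk	%[diff],%[tmp],%[old_word]
	 */
	"	lr	%[diff],%[tmp]\n"
	"	xr	%[diff],%[old_word]\n"
```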
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
...
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index eed0315a77a6..b856705f3f6b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -580,7 +580,9 @@ struct kvm_translation {
 struct kvm_s390_mem_op {
 	/* in */
 	__u64 gaddr;		/* the guest address */
+	/* in & out */
 	__u64 flags;		/* flags */
+	/* in */
 	__u32 size;		/* amount of bytes */
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
@@ -588,6 +590,8 @@ struct kvm_s390_mem_op {
 		struct {
 			__u8 ar;	/* the access register number */
 			__u8 key;	/* access key, ignored if flag unset */
+			/* in & out */
+			__u64 old[2];	/* ignored if flag unset */
The alignment looks very unfortunate now ... could you please add a "__u8 pad[6]" or "__u8 pad[14]" in front of the new field?
 		};
 		__u32 sida_offset; /* offset into the sida */
 		__u8 reserved[32]; /* ignored */
@@ -604,6 +608,7 @@ struct kvm_s390_mem_op {
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
 #define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
+#define KVM_S390_MEMOP_F_CMPXCHG		(1ULL << 3)

 /* for KVM_INTERRUPT */
 struct kvm_interrupt {

diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 9408d6cc8e2c..a1cb66ae0995 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -206,6 +206,10 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 		      void *data, unsigned long len, enum gacc_mode mode);

+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
+			       unsigned __int128 *old,
+			       unsigned __int128 new, u8 access_key);
+
 /**
  * write_guest_with_key - copy data from kernel space to guest space
  * @vcpu: virtual cpu
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0243b6e38d36..c0e490ecc372 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1161,6 +1161,62 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
 	return rc;
 }

+/**
+ * cmpxchg_guest_abs_with_key() - Perform cmpxchg on guest absolute address.
+ * @kvm: Virtual machine instance.
+ * @gpa: Absolute guest address of the location to be changed.
+ * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
+ *       non power of two will result in failure.
+ * @old_p: Pointer to old value. If the location at @gpa contains this value, the
+ *         exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
+ *         contains the value at @gpa before the attempt to exchange the value.
+ * @new: The value to place at @gpa.
+ * @access_key: The access key to use for the guest access.
+ *
+ * Atomically exchange the value at @gpa by @new, if it contains *@old.
+ * Honors storage keys.
+ *
+ * Return: * 0: successful exchange
+ *         * 1: exchange unsuccessful
+ *         * a program interruption code indicating the reason cmpxchg could
+ *           not be attempted
+ *         * -EINVAL: address misaligned or len not power of two
+ */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
+			       unsigned __int128 *old_p, unsigned __int128 new,
+			       u8 access_key)
+{
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+	bool writable;
+	hva_t hva;
+	int ret;
+
+	if (!IS_ALIGNED(gpa, len))
+		return -EINVAL;
+
+	hva = gfn_to_hva_memslot_prot(slot, gfn, &writable);
+	if (kvm_is_error_hva(hva))
+		return PGM_ADDRESSING;
+	/*
+	 * Check if it's a ro memslot, even tho that can't occur (they're unsupported).
Not everybody is used to reading such abbreviated English ... I'd recommend spelling it out properly (ro ==> read-only, tho ==> though).
+	 * Don't try to actually handle that case.
+	 */
+	if (!writable)
+		return -EOPNOTSUPP;
+
+	hva += offset_in_page(gpa);
+	ret = cmpxchg_user_key_size(len, (void __user *)hva, old_p, new, access_key);
+	mark_page_dirty_in_slot(kvm, slot, gfn);
+	/*
+	 * Assume that the fault is caused by key protection, the alternative
+	 * is that the user page is write protected.
+	 */
+	if (ret == -EFAULT)
+		ret = PGM_PROTECTION;
+	return ret;
+}
+
 /**
  * guest_translate_address_with_key - translate guest logical into guest absolute address
  * @vcpu: virtual cpu
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b7ef0b71014d..d594d1318d2a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -576,7 +576,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_VCPU_RESETS:
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_S390_DIAG318:
-	case KVM_CAP_S390_MEM_OP_EXTENSION:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -590,6 +589,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_MEM_OP:
 		r = MEM_OP_MAX_SIZE;
 		break;
+	case KVM_CAP_S390_MEM_OP_EXTENSION:
+		r = 0x3;
Add a comment to explain that magic value 0x3 ?
+		break;
 	case KVM_CAP_NR_VCPUS:
 	case KVM_CAP_MAX_VCPUS:
 	case KVM_CAP_MAX_VCPU_ID:
Thomas
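As an aside, one possible shape for such a comment or for self-documenting defines, based on the capability semantics described later in the thread (bit 0: the pre-existing extension for key-checked/absolute accesses, bit 1: cmpxchg support); the define names here are hypothetical, the patch itself returns the literal 0x3:

	/* Hypothetical defines, not part of the patch: */
	#define MEM_OP_EXTENSION_BASE     (1 << 0) /* absolute + key-checked memops */
	#define MEM_OP_EXTENSION_CMPXCHG  (1 << 1) /* KVM_S390_MEMOP_F_CMPXCHG supported */

	case KVM_CAP_S390_MEM_OP_EXTENSION:
		r = MEM_OP_EXTENSION_BASE | MEM_OP_EXTENSION_CMPXCHG;
		break;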
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for absolute vm write memops which allows user space to perform (storage key checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
Documentation/virt/kvm/api.rst | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..0e02d66e38ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3771,6 +3771,7 @@ Parameters are specified via the following structure::
 		struct {
 			__u8 ar;	/* the access register number */
 			__u8 key;	/* access key, ignored if flag unset */
Padding / alignment?
+			__u64 old[2];	/* ignored if flag unset */
 		};
 		__u32 sida_offset; /* offset into the sida */
 		__u8 reserved[32]; /* ignored */
@@ -3853,8 +3854,23 @@ Absolute accesses are permitted for non-protected guests only.

 Supported flags:
   * ``KVM_S390_MEMOP_F_CHECK_ONLY``
   * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
+  * ``KVM_S390_MEMOP_F_CMPXCHG``

+The semantics of the flags common with logical accesses are as for logical
+accesses.
+
+For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag might be supported.
+In this case, instead of doing an unconditional write, the access occurs only
+if the target location contains the value provided in "old". This is performed
+as an atomic cmpxchg. "size" must be a power of two up to and including 16.
+Values with sizes <8 byte are to be provided by assignment to "old[1]".
+Doublewords are provided with the higher value word in "old[0]" and the lower
+word in "old[1]".
+The value at the target location is returned in "old", encoded in the same
+manner.
+If the value was exchanged the KVM_S390_MEMOP_F_CMPXCHG bit in "flags" is set
+to 0, otherwise it remains set.
+The KVM_S390_MEMOP_F_CMPXCHG flag is supported if KVM_CAP_S390_MEM_OP_EXTENSION
+has bit 1 (i.e. bit with value 2) set.
Please try to fit the text within 80 columns.
Thanks, Thomas
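To make the "old" encoding above concrete, a minimal sketch of how user space would fill the field (struct fields as in the uapi patch; the values are arbitrary examples, and the ambiguous exactly-8-byte case is left out):

	struct kvm_s390_mem_op ksmo = {};

	/* size < 8: the expected old value is assigned to old[1] */
	ksmo.size = 4;
	ksmo.old[1] = 0x01020304;

	/* size == 16: high doubleword in old[0], low doubleword in old[1] */
	ksmo.size = 16;
	ksmo.old[0] = 0x0102030405060708ULL;	/* bits 127..64 of the expected value */
	ksmo.old[1] = 0x090a0b0c0d0e0f10ULL;	/* bits 63..0 */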
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
The struct is quite large, so this seems nicer.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
tools/testing/selftests/kvm/s390x/memop.c | 44 +++++++++++------------ 1 file changed, 22 insertions(+), 22 deletions(-)
Reviewed-by: Thomas Huth thuth@redhat.com
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
tools/testing/selftests/kvm/s390x/memop.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index 3a160ab0415b..1887685b41d2 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -970,7 +970,7 @@ static void test_errors_key_fetch_prot_override_enabled(void)
 	/*
 	 * vcpu, mismatching keys on fetch,
-	 * fetch protection override does not apply because memory range acceeded
+	 * fetch protection override does not apply because memory range exceeded
 	 */
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, 2048 + 1, GADDR_V(0), KEY(2));
 	CHECK_N_DO(ERR_PROT_MOP, t.vcpu, LOGICAL, READ, mem2, PAGE_SIZE + 2048 + 1,
Reviewed-by: Thomas Huth thuth@redhat.com
On Tue, 2022-10-04 at 10:16 +0200, Thomas Huth wrote:
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
Describe the semantics of the new KVM_S390_MEMOP_F_CMPXCHG flag for absolute vm write memops which allows user space to perform (storage key checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
Documentation/virt/kvm/api.rst | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd7c32126ce..0e02d66e38ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3771,6 +3771,7 @@ Parameters are specified via the following structure::
 		struct {
What is the reason you initially didn't copy the /* in */ comment here?
Thanks for the reviews.
On 04/10/2022 20.51, Janis Schoetterl-Glausch wrote:
On Tue, 2022-10-04 at 10:16 +0200, Thomas Huth wrote:
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
[...]
What is the reason you initially didn't copy the /* in */ comment here?
You mean in commit 41408c28f283b ? Uh, don't ask me, that's more than 7 years ago...
Anyway, please be aware that the MEMOP ioctl is defined as IOW only:
#define KVM_S390_MEM_OP _IOW(KVMIO, 0xb1, struct kvm_s390_mem_op)
... so if you now introduce an "out" field in that struct, this might have some impact, e.g. on Valgrind etc.
Thomas
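For context: the direction is encoded in the ioctl number itself, so tools that decode _IOC_DIR() (Valgrind among them) rely on it being accurate. A sketch of what an in/out definition would look like; the _RW name is hypothetical and not something the patch proposes:

	/* current, declared write-only (user space -> kernel): */
	#define KVM_S390_MEM_OP		_IOW(KVMIO, 0xb1, struct kvm_s390_mem_op)
	/* hypothetical in/out variant, advertising that the kernel writes back: */
	#define KVM_S390_MEM_OP_RW	_IOWR(KVMIO, 0xb1, struct kvm_s390_mem_op)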
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
The return value of MEM_OP is: 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM),
> 0 if an exception occurred while walking the page tables
A cmpxchg failing because the old value doesn't match is neither an error nor an exception, so the question is how best to signal that condition. This is not strictly necessary, since user space can compare the value of "old" after the MEM_OP with the value it set: if they're different, the cmpxchg failed. Still, it might make for a better user interface if there were an easier way to see whether the cmpxchg failed. This patch sets the cmpxchg flag bit to 0 on a successful cmpxchg, so you can compare against a constant instead of the old "old" value. This has the disadvantage of being a bit weird; other suggestions welcome.
This also breaks the old API of defining the ioctl as _IOW only ... with your change to the flags field, it effectively gets IOWR instead.
Maybe it would be better to put all the new logic into a new struct and only pass a pointer to that struct in kvm_s390_mem_op, so that the ioctl stays IOW ? ... or maybe even introduce a completely new ioctl for this functionality instead?
Thomas
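To illustrate the interface under discussion, a sketch of how a caller would distinguish the outcomes with the flag-clearing scheme (a 4-byte example; gpa, expected_old, new_val and vm_fd are assumed to be set up elsewhere, error handling abbreviated):

	__u32 new_val = 42;	/* hypothetical replacement value */
	struct kvm_s390_mem_op ksmo = {
		.gaddr	= gpa,
		.size	= 4,
		.op	= KVM_S390_MEMOP_ABSOLUTE_WRITE,
		.flags	= KVM_S390_MEMOP_F_CMPXCHG,
		.buf	= (__u64)&new_val,
		.old	= { 0, expected_old },	/* 4-byte expected value goes in old[1] */
	};
	int rc = ioctl(vm_fd, KVM_S390_MEM_OP, &ksmo);

	if (rc < 0) {
		/* generic error, e.g. -EFAULT or -ENOMEM */
	} else if (rc > 0) {
		/* program interruption code, e.g. PGM_PROTECTION */
	} else if (ksmo.flags & KVM_S390_MEMOP_F_CMPXCHG) {
		/* flag still set: values didn't match, ksmo.old holds the actual value */
	} else {
		/* flag cleared: the exchange took place */
	}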
On Fri, 30 Sep 2022 23:07:43 +0200 Janis Schoetterl-Glausch scgl@linux.ibm.com wrote:
Add cmpxchg functionality similar to that in cmpxchg.h except that the target is a user space address and that the address' storage key is matched with the access_key argument in order to honor key-controlled protection. The access is performed by changing to the secondary-spaces mode and setting the PSW key for the duration of the compare and swap.
This whole patch is very complex; I think it can be simplified and made more maintainable (see my comments below).
In the end, what we need here is an atomic compare and swap with key checking; if we are doing a syscall for it, we are clearly not looking for performance.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
Possible variations:
- check the assumptions made in cmpxchg_user_key_size and error out
- call functions called by copy_to_user
- access_ok? is a nop
- should_fail_usercopy?
- instrument_copy_to_user? doesn't make sense IMO
- don't be overly strict in cmpxchg_user_key
arch/s390/include/asm/uaccess.h | 187 ++++++++++++++++++++++++++++++++ 1 file changed, 187 insertions(+)
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index f7038b800cc3..0ce90b7e2b75 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -19,6 +19,8 @@
 #include <asm/extable.h>
 #include <asm/facility.h>
 #include <asm-generic/access_ok.h>
+#include <asm/page.h>
+#include <linux/log2.h>

 void debug_user_asce(int exit);

@@ -390,4 +392,189 @@ do {								\
 	goto err_label;						\
 } while (0)

+static __always_inline int __cmpxchg_user_key_small(int size, u64 address,
+						    unsigned __int128 *old_p,
+						    unsigned __int128 new, u8 access_key)
+{
can this whole function be simplified to be a C wrapper for the 4 byte version of compare and swap?
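For reference, a rough sketch of that suggestion in C, not part of the patch: emulate the 1- and 2-byte cases by doing the 4-byte cmpxchg_user_key_size (defined further down) on the containing aligned word, retrying while only the surrounding bytes change:

	/* Hedged sketch of the suggested wrapper; assumes cmpxchg_user_key_size(4, ...). */
	static int cmpxchg_user_key_small_c(int size, void __user *address,
					    unsigned __int128 *old_p,
					    unsigned __int128 new, u8 access_key)
	{
		unsigned long misalign = (unsigned long)address & 3;
		void __user *aligned = address - misalign;
		u32 shift = (4 - misalign - size) * 8;
		u32 mask = (size == 2 ? 0xffffU : 0xffU) << shift;
		u32 old_bits = ((u32)*old_p << shift) & mask;
		u32 new_bits = ((u32)new << shift) & mask;
		unsigned __int128 word = old_bits;	/* first guess for the containing word */
		int ret;

		for (;;) {
			u32 other = (u32)word & ~mask;	/* bytes surrounding the target */

			word = other | old_bits;
			ret = cmpxchg_user_key_size(4, aligned, &word,
						    other | new_bits, access_key);
			/* stop on success, fault, or a real mismatch in the target bytes */
			if (ret != 1 || ((u32)word & mask) != old_bits)
				break;
			/* only the surrounding bytes changed concurrently: retry */
		}
		*old_p = ((u32)word & mask) >> shift;
		return ret;
	}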
[...]
+/**
+ * cmpxchg_user_key_size() - cmpxchg with user space target, honoring storage keys
+ * @size: Size of the value being cmpxchg'ed, one of 1,2,4,8,16.
+ * @address: User space address of value to compare to *@old_p and exchange with
+ *           *@new. Must be aligned to @size.
+ * @old_p: Pointer to old value. Interpreted as a @size byte integer and compared
+ *         to the content pointed to by @address in order to determine if the
+ *         exchange occurs. The value read from @address is written back to *@old_p.
+ * @new: New value to place at @address, interpreted as a @size byte integer.
+ * @access_key: Access key to use for checking storage key protection.
+ *
+ * Perform a cmpxchg on a user space target, honoring storage key protection.
+ * @access_key alone determines how key checking is performed, neither
+ * storage-protection-override nor fetch-protection-override apply.
+ *
+ * Return: 0: successful exchange
+ *         1: exchange failed
+ *         -EFAULT: @address not accessible or not naturally aligned
+ *         -EINVAL: invalid @size
+ */
+static __always_inline int cmpxchg_user_key_size(int size, void __user *address,
+						 unsigned __int128 *old_p,
+						 unsigned __int128 new, u8 access_key)
+{
+	union {
+		u32 word;
+		u64 doubleword;
+	} old;
+	int ret = -EFAULT;
+
+	/*
+	 * The following assumes that:
+	 *  * the current psw key is the default key
+	 *  * no storage protection overrides are in effect
+	 */
+	might_fault();
+	switch (size) {
+	case 16:
+		asm volatile(
+			"spka 0(%[access_key])\n"
+			"	sacf 256\n"
+			"0:	cdsg %[old],%[new],%[target]\n"
+			"1:	ipm %[ret]\n"
+			"	srl %[ret],28\n"
+			"2:	sacf 768\n"
+			"	spka %[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (*old_p),
+			  [target] "+Q" (*(unsigned __int128 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" (new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
+		);
+		return ret;
+	case 8:
+		old.doubleword = *old_p;
+		asm volatile(
+			"spka 0(%[access_key])\n"
+			"	sacf 256\n"
+			"0:	csg %[old],%[new],%[target]\n"
+			"1:	ipm %[ret]\n"
+			"	srl %[ret],28\n"
+			"2:	sacf 768\n"
+			"	spka %[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (old.doubleword),
+			  [target] "+Q" (*(u64 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" ((u64)new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
+		);
+		*old_p = old.doubleword;
+		return ret;
+	case 4:
+		old.word = *old_p;
+		asm volatile(
+			"spka 0(%[access_key])\n"
+			"	sacf 256\n"
+			"0:	cs %[old],%[new],%[target]\n"
+			"1:	ipm %[ret]\n"
+			"	srl %[ret],28\n"
+			"2:	sacf 768\n"
+			"	spka %[default_key]\n"
+			EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)
+			: [old] "+d" (old.word),
+			  [target] "+Q" (*(u32 __user *)address),
+			  [ret] "+d" (ret)
+			: [access_key] "a" (access_key << 4),
+			  [new] "d" ((u32)new),
+			  [default_key] "J" (PAGE_DEFAULT_KEY)
+			: "cc"
this is the same code 3 times with only very minimal changes. can you factor it out in macros?
something like this:
#define DO_COMPARE_AND_SWAP(instr, _old, _addr, _ret, _key, _new)	\
	asm volatile(							\
		"spka 0(%[access_key])\n"				\
		"	sacf 256\n"					\
		"0:	" instr " %[old],%[new],%[target]\n"		\
		"1:	ipm %[ret]\n"					\
		"	srl %[ret],28\n"				\
		"2:	sacf 768\n"					\
		"	spka %[default_key]\n"				\
		EX_TABLE(0b, 2b) EX_TABLE(1b, 2b)			\
		: [old] "+d" (_old),					\
		  [target] "+Q" (*(_addr)),				\
		  [ret] "+d" (_ret)					\
		: [access_key] "a" ((_key) << 4),			\
		  [new] "d" (_new),					\
		  [default_key] "J" (PAGE_DEFAULT_KEY)			\
		: "cc"
and then in the code:
DO_COMPARE_AND_SWAP("cs", old.word, (u32 __user *)address, ret, access_key, (u32)new)
this way the code is not duplicated
or have you tried it already and there are issues I didn't think of?
+		);
+		*old_p = old.word;
+		return ret;
+	case 2:
+	case 1:
+		return __cmpxchg_user_key_small(size, (u64)address, old_p,
+						new, access_key);
+	default:
+		return -EINVAL;
+	}
+}
+#define cmpxchg_user_key(target_p, old_p, new, access_key)		\
+({									\
+	__typeof__(old_p) __old_p = (old_p);				\
+	unsigned __int128 __old = *__old_p;				\
+	size_t __size = sizeof(*(target_p));				\
+	int __ret;							\
+									\
+	BUILD_BUG_ON(__size != sizeof(*__old_p));			\
+	BUILD_BUG_ON(__size != sizeof(new));				\
+	BUILD_BUG_ON(__size > 16 || !is_power_of_2(__size));		\
and here an if to see if you need the _small version or the regular one, with the _small version being a wrapper around the regular one
+	__ret = cmpxchg_user_key_size(__size, (target_p), &__old, (new),\
+				      (access_key));			\
+	*__old_p = __old;						\
+	__ret;								\
+})
#endif /* __S390_UACCESS_H */
On Wed, 2022-10-05 at 16:13 +0200, Claudio Imbrenda wrote:
On Fri, 30 Sep 2022 23:07:43 +0200 Janis Schoetterl-Glausch scgl@linux.ibm.com wrote:
Add cmpxchg functionality similar to that in cmpxchg.h except that the target is a user space address and that the address' storage key is matched with the access_key argument in order to honor key-controlled protection. The access is performed by changing to the secondary-spaces mode and setting the PSW key for the duration of the compare and swap.
This whole patch is very complex; I think it can be simplified and made more maintainable (see my comments below).
In the end, what we need here is an atomic compare and swap with key checking; if we are doing a syscall for it, we are clearly not looking for performance.
If you only consider this in the context of KVM you are correct, but Heiko wanted me not to specialize this for KVM.
[...]
can this whole function be simplified to be a C wrapper for the 4 byte version of compare and swap?
I think so, but all of this is supposed to mirror arch/s390/include/asm/cmpxchg.h, although I did depart from that somewhat. For one, I changed the decision for retrying the loop, but I'll have to undo that since compilation for older machines complains about xrk.
[...]
this is the same code 3 times with only very minimal changes. can you factor it out in macros?
[...]
this way the code is not duplicated
or have you tried it already and there are issues I didn't think of?
I'd prefer that, but it's different from how cmpxchg.h does it. But then that has simpler asm and needs to special case int128 so the benefit isn't as great. I guess Heiko should make that call.
[...]
+#define cmpxchg_user_key(target_p, old_p, new, access_key) \
Note that this macro isn't being used because I also deviated from the functions in cmpxchg.h here. Since we need to return an error in case of a fault the return type cannot be void. So we can also return EINVAL in case of an invalid size. Then cmpxchg_user_key_size is perfectly fine to call directly, which avoids awkwardness in KVM converting the numeric size we got from user space into the right types. So this macro only exists for other future possible users of key checked cmpxchg where the types are fixed at compile time. So with your version cmpxchg_user_key_size should just recurse for the small sizes.
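To illustrate the direct call with a runtime size, a rough sketch of the KVM side (the surrounding kvm-s390.c code isn't quoted in this thread, so names like mop and new_val are assumptions; new_val would be assembled from the user buffer):

	unsigned __int128 old_val =
		((unsigned __int128)mop->old[0] << 64) | mop->old[1];
	int ret;

	/* mop->size comes straight from user space, checked to be a power of two */
	ret = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
					 &old_val, new_val, mop->key);
	if (ret == 0)
		mop->flags &= ~KVM_S390_MEMOP_F_CMPXCHG;	/* signal success */
	mop->old[0] = (u64)(old_val >> 64);
	mop->old[1] = (u64)old_val;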
On Wed, 2022-10-05 at 08:32 +0200, Thomas Huth wrote:
On 30/09/2022 23.07, Janis Schoetterl-Glausch wrote:
User space can use the MEM_OP ioctl to make storage key checked reads and writes to the guest, however, it has no way of performing atomic, key checked, accesses to the guest. Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg mode. For now, support this mode for absolute accesses only.
This mode can be used, for example, to set the device-state-change indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch scgl@linux.ibm.com
The return value of MEM_OP is: 0 on success, < 0 on generic error (e.g. -EFAULT or -ENOMEM),
> 0 if an exception occurred while walking the page tables
A cmpxchg failing because the old value doesn't match is neither an error nor an exception, so the question is how best to signal that condition. This is not strictly necessary, since user space can compare the value of "old" after the MEM_OP with the value it set: if they're different, the cmpxchg failed. Still, it might make for a better user interface if there were an easier way to see whether the cmpxchg failed. This patch sets the cmpxchg flag bit to 0 on a successful cmpxchg, so you can compare against a constant instead of the old "old" value. This has the disadvantage of being a bit weird; other suggestions welcome.
This also breaks the old API of defining the ioctl as _IOW only ... with your change to the flags field, it effectively gets IOWR instead.
Oh, right.
Maybe it would be better to put all the new logic into a new struct and only pass a pointer to that struct in kvm_s390_mem_op, so that the ioctl stays IOW ? ... or maybe even introduce a completely new ioctl for this functionality instead?
Hmmm, the latter seems a bit ugly since there is so much commonality with the existing memop.