The selftests, when built with newer versions of clang, are found to have an over-optimized guest ucall() function, with the stores to uc.cmd eliminated (perhaps because they have no immediate readers). As a result, the userspace side always read a value of '0', causing multiple test failures.
Hence, prevent the compiler from optimizing away the stores in ucall() by using WRITE_ONCE().
Suggested-by: Ricardo Koller <ricarkol@google.com>
Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 tools/testing/selftests/kvm/lib/aarch64/ucall.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/aarch64/ucall.c b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
index e0b0164e9af8..be1d9728c4ce 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/ucall.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
@@ -73,20 +73,19 @@ void ucall_uninit(struct kvm_vm *vm)

 void ucall(uint64_t cmd, int nargs, ...)
 {
-	struct ucall uc = {
-		.cmd = cmd,
-	};
+	struct ucall uc = {};
 	va_list va;
 	int i;

+	WRITE_ONCE(uc.cmd, cmd);
 	nargs = nargs <= UCALL_MAX_ARGS ? nargs : UCALL_MAX_ARGS;

 	va_start(va, nargs);
 	for (i = 0; i < nargs; ++i)
-		uc.args[i] = va_arg(va, uint64_t);
+		WRITE_ONCE(uc.args[i], va_arg(va, uint64_t));
 	va_end(va);

-	*ucall_exit_mmio_addr = (vm_vaddr_t)&uc;
+	WRITE_ONCE(*ucall_exit_mmio_addr, (vm_vaddr_t)&uc);
 }

 uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
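For readers unfamiliar with the macro, the effect the patch relies on can be sketched in isolation. The block below is only an illustration, not the selftests code: DEMO_WRITE_ONCE, struct demo_ucall and demo_fill_ucall() are hypothetical names, and the macro is a simplified stand-in (a store through a volatile-qualified lvalue, using the GNU __typeof__ extension) for the more elaborate WRITE_ONCE() in the kernel tree.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for WRITE_ONCE(): the store goes through
 * a volatile-qualified lvalue, so the compiler has to emit it.
 */
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define DEMO_MAX_ARGS 6

/* Hypothetical miniature of struct ucall. */
struct demo_ucall {
	uint64_t cmd;
	uint64_t args[DEMO_MAX_ARGS];
};

/*
 * Fill *uc the way the patched ucall() fills its local struct: clamp nargs,
 * then store cmd and each argument with the volatile store macro.
 * Returns the number of arguments actually stored.
 */
static int demo_fill_ucall(struct demo_ucall *uc, uint64_t cmd,
			   int nargs, const uint64_t *args)
{
	int i;

	nargs = nargs <= DEMO_MAX_ARGS ? nargs : DEMO_MAX_ARGS;

	DEMO_WRITE_ONCE(uc->cmd, cmd);
	for (i = 0; i < nargs; ++i)
		DEMO_WRITE_ONCE(uc->args[i], args[i]);

	return nargs;
}
```

Because every store goes through a volatile lvalue, the compiler may not drop it even when it sees no later reader of the struct.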
On Wed, Jun 15, 2022 at 06:57:06PM +0000, Raghavendra Rao Ananta wrote:
> The selftests, when built with newer versions of clang, are found to have an over-optimized guest ucall() function, with the stores to uc.cmd eliminated (perhaps because they have no immediate readers). As a result, the userspace side always read a value of '0', causing multiple test failures.
> Hence, prevent the compiler from optimizing away the stores in ucall() by using WRITE_ONCE().
> Suggested-by: Ricardo Koller <ricarkol@google.com>
> Suggested-by: Reiji Watanabe <reijiw@google.com>
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  tools/testing/selftests/kvm/lib/aarch64/ucall.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/lib/aarch64/ucall.c b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> index e0b0164e9af8..be1d9728c4ce 100644
> --- a/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> +++ b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> @@ -73,20 +73,19 @@ void ucall_uninit(struct kvm_vm *vm)
>
>  void ucall(uint64_t cmd, int nargs, ...)
>  {
> -	struct ucall uc = {
> -		.cmd = cmd,
> -	};
> +	struct ucall uc = {};
>  	va_list va;
>  	int i;
>
> +	WRITE_ONCE(uc.cmd, cmd);
>  	nargs = nargs <= UCALL_MAX_ARGS ? nargs : UCALL_MAX_ARGS;
>
>  	va_start(va, nargs);
>  	for (i = 0; i < nargs; ++i)
> -		uc.args[i] = va_arg(va, uint64_t);
> +		WRITE_ONCE(uc.args[i], va_arg(va, uint64_t));
>  	va_end(va);
>
> -	*ucall_exit_mmio_addr = (vm_vaddr_t)&uc;
> +	WRITE_ONCE(*ucall_exit_mmio_addr, (vm_vaddr_t)&uc);
>  }
>
>  uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
> --
> 2.36.1.476.g0c4daa206d-goog
Reviewed-by: Andrew Jones drjones@redhat.com
Thanks, drew
From: Andrew Jones
Sent: 16 June 2022 13:03
> On Wed, Jun 15, 2022 at 06:57:06PM +0000, Raghavendra Rao Ananta wrote:
> > The selftests, when built with newer versions of clang, are found to have an over-optimized guest ucall() function, with the stores to uc.cmd eliminated (perhaps because they have no immediate readers). As a result, the userspace side always read a value of '0', causing multiple test failures.
> > Hence, prevent the compiler from optimizing away the stores in ucall() by using WRITE_ONCE().
> > Suggested-by: Ricardo Koller <ricarkol@google.com>
> > Suggested-by: Reiji Watanabe <reijiw@google.com>
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  tools/testing/selftests/kvm/lib/aarch64/ucall.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/aarch64/ucall.c b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> > index e0b0164e9af8..be1d9728c4ce 100644
> > --- a/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> > +++ b/tools/testing/selftests/kvm/lib/aarch64/ucall.c
> > @@ -73,20 +73,19 @@ void ucall_uninit(struct kvm_vm *vm)
> >
> >  void ucall(uint64_t cmd, int nargs, ...)
> >  {
> > -	struct ucall uc = {
> > -		.cmd = cmd,
> > -	};
> > +	struct ucall uc = {};
> >  	va_list va;
> >  	int i;
> >
> > +	WRITE_ONCE(uc.cmd, cmd);
> >  	nargs = nargs <= UCALL_MAX_ARGS ? nargs : UCALL_MAX_ARGS;
> >
> >  	va_start(va, nargs);
> >  	for (i = 0; i < nargs; ++i)
> > -		uc.args[i] = va_arg(va, uint64_t);
> > +		WRITE_ONCE(uc.args[i], va_arg(va, uint64_t));
> >  	va_end(va);
> >
> > -	*ucall_exit_mmio_addr = (vm_vaddr_t)&uc;
> > +	WRITE_ONCE(*ucall_exit_mmio_addr, (vm_vaddr_t)&uc);
> >  }
Am I misreading things again? That function looks like it writes the address of an on-stack item into global data.
Maybe 'uc' ought to be static?
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Thu, Jun 16, 2022 at 03:58:52PM +0000, David Laight wrote:
> Am I misreading things again? That function looks like it writes the address of an on-stack item into global data.
The write to the address that the global points at causes a switch from guest to host context. The guest's stack remains intact while executing host code and the host can access the uc stack variable directly by its address. Take a look at lib/aarch64/ucall.c to see all the details.
Thanks, drew
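The handoff Andrew describes can be mimicked in plain userspace C. This is only a toy model under loud assumptions: all demo_* names are hypothetical, a direct function call stands in for the trapping MMIO store, and the guest-to-host address translation that a real host side would need is skipped entirely. What it does show is that the on-stack struct is still live while the "host" reads it, because the "guest" function has not returned yet.

```c
#include <stdint.h>

/* Hypothetical miniature of struct ucall. */
struct demo_ucall {
	uint64_t cmd;
	uint64_t args[2];
};

/* What the simulated host observed during the last simulated exit. */
static uint64_t demo_host_seen_cmd;

/*
 * Stand-in for the host side. It runs while the guest's stack frame is
 * still intact, so dereferencing the address it was handed is valid.
 */
static void demo_host_handle_exit(uintptr_t guest_addr)
{
	const struct demo_ucall *uc = (const struct demo_ucall *)guest_addr;

	demo_host_seen_cmd = uc->cmd;
}

/* Stand-in for the MMIO store: here a plain call models the guest exit. */
static void demo_mmio_write(uintptr_t val)
{
	demo_host_handle_exit(val);
}

/* Stand-in for the guest's ucall(): publish the address of a stack object. */
static void demo_guest_ucall(uint64_t cmd)
{
	struct demo_ucall uc = { .cmd = cmd };

	demo_mmio_write((uintptr_t)&uc);
	/* uc only goes out of scope here, after the "host" has finished. */
}
```

In this model the compiler can see that the address escapes through a call. In the real guest code the escape happens through a store that traps, which the compiler cannot see, and that is what the rest of the thread is about.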
From: Andrew Jones
Sent: 16 June 2022 17:26
On Thu, Jun 16, 2022 at 03:58:52PM +0000, David Laight wrote:
> > Am I misreading things again? That function looks like it writes the address of an on-stack item into global data.
> The write to the address that the global points at causes a switch from guest to host context. The guest's stack remains intact while executing host code and the host can access the uc stack variable directly by its address. Take a look at lib/aarch64/ucall.c to see all the details.
No wonder I was confused. It's not surprising the compiler optimises it all away.
It doesn't seem right to be 'abusing' WRITE_ONCE() here. Just adding barrier() should be enough and much more descriptive.
David
June 16, 2022 11:48 AM, "David Laight" <David.Laight@aculab.com> wrote:
> No wonder I was confused. It's not surprising the compiler optimises it all away.
> It doesn't seem right to be 'abusing' WRITE_ONCE() here. Just adding barrier() should be enough and much more descriptive.
I had the same thought, although I do not believe barrier() is sufficient on its own. barrier_data() with a pointer to uc passed through is required to keep clang from eliminating the dead store.
-- Thanks, Oliver
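To make the two primitives under discussion concrete, here is a small sketch. The demo_barrier() and demo_barrier_data() macros are hypothetical names modelled on the usual GNU C inline-asm idiom (an empty asm with a "memory" clobber, optionally taking a pointer as an input operand); struct demo_ucall, demo_exit_slot and demo_publish() are likewise made up for illustration. Whether a plain barrier() is enough for clang in ucall() is exactly what is being debated in this thread, so the sketch takes the conservative form Oliver describes.

```c
#include <stdint.h>

/* Empty asm with a "memory" clobber: a compiler-only barrier. */
#define demo_barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Same, but the pointer is also an input operand, so the compiler has to
 * assume the asm may look at the object it points to.
 */
#define demo_barrier_data(ptr) \
	__asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Hypothetical miniature of struct ucall. */
struct demo_ucall {
	uint64_t cmd;
};

/* Hypothetical stand-in for the location the guest publishes to. */
static volatile uintptr_t demo_exit_slot;

/*
 * Build an on-stack struct, force its contents to be materialised, publish
 * its address, and read the command back through the published address.
 */
static uint64_t demo_publish(uint64_t cmd)
{
	struct demo_ucall uc = { .cmd = cmd };

	demo_barrier_data(&uc);		/* stores to uc must be done by here */
	demo_exit_slot = (uintptr_t)&uc;
	demo_barrier_data(&uc);		/* keep uc alive past the publish */

	return ((const struct demo_ucall *)demo_exit_slot)->cmd;
}
```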
From: oliver.upton@linux.dev
Sent: 16 June 2022 19:45
> June 16, 2022 11:48 AM, "David Laight" <David.Laight@aculab.com> wrote:
> > No wonder I was confused. It's not surprising the compiler optimises it all away.
> > It doesn't seem right to be 'abusing' WRITE_ONCE() here. Just adding barrier() should be enough and much more descriptive.
> I had the same thought, although I do not believe barrier() is sufficient on its own. barrier_data() with a pointer to uc passed through is required to keep clang from eliminating the dead store.
A barrier() (full memory clobber) ought to be stronger than the partial one that barrier_data() generates.
I can't quite decide whether you need a barrier() both sides of the 'magic write'. Plausibly the compiler could discard the on-stack data after the barrier() and before the 'magic write'.
Certainly putting the 'magic write' inside an asm block that has a memory clobber is a more correct solution.
David
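A rough sketch of what "the magic write inside an asm block with a memory clobber" could look like follows. It is not taken from the selftests: demo_magic_write() is a hypothetical helper, the aarch64 branch assumes a 64-bit str through a register-held address, and the fallback branch (a volatile store bracketed by compiler barriers) exists only so the sketch builds on other architectures.

```c
#include <stdint.h>

/*
 * Hypothetical helper: perform the store from inside an asm statement that
 * clobbers memory, so earlier stores cannot be dropped or moved past it.
 */
static inline void demo_magic_write(volatile uint64_t *addr, uint64_t val)
{
#if defined(__aarch64__)
	/* 64-bit store of val to *addr; "memory" orders it for the compiler. */
	__asm__ __volatile__("str %x1, [%0]"
			     : /* no outputs */
			     : "r"(addr), "r"(val)
			     : "memory");
#else
	/* Portable approximation for non-aarch64 builds of this sketch. */
	__asm__ __volatile__("" : : : "memory");
	*addr = val;
	__asm__ __volatile__("" : : : "memory");
#endif
}
```

With the store and the clobber in a single asm statement there is no window between a separate barrier and the write, which is the concern raised above about needing a barrier on both sides.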
On Thu, Jun 16, 2022 at 09:54:16PM +0000, David Laight wrote:
> A barrier() (full memory clobber) ought to be stronger than the partial one that barrier_data() generates.
> I can't quite decide whether you need a barrier() both sides of the 'magic write'. Plausibly the compiler could discard the on-stack data after the barrier() and before the 'magic write'.
> Certainly putting the 'magic write' inside an asm block that has a memory clobber is a more correct solution.
Indeed, since the magic write is actually a guest MMIO write, then it should be using writeq().
Thanks, drew
On 6/17/22 09:28, Andrew Jones wrote:
> Indeed, since the magic write is actually a guest MMIO write, then it should be using writeq().
It doesn't need to use writeq() because no special precautions are needed with respect to cacheability or instruction reordering (as is the case with hardware registers).
WRITE_ONCE is okay, especially since the code never reads it (and if it did it would also use READ_ONCE).
Paolo
linux-kselftest-mirror@lists.linaro.org