vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls, to reduce their cost in cycles where possible. This works because certain syscalls, such as gettimeofday(), do not write any data and only return one or more values that are stored in the kernel, which makes it relatively safe to call them directly as library functions.
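As a small illustration of the "library function" view, the sketch below locates the vDSO mapping of the current process from userspace. It assumes Linux with glibc, where getauxval() and AT_SYSINFO_EHDR are available; the helper names vdso_base() and vdso_looks_like_elf() are made up for this example and are not part of the patchset.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <stdbool.h>
#include <string.h>
#include <sys/auxv.h>

/* Address at which the kernel mapped the vDSO, or 0 if there is none. */
unsigned long vdso_base(void)
{
	return getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is a regular ELF image, so it must start with the ELF magic. */
bool vdso_looks_like_elf(unsigned long base)
{
	return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

A caller would typically pass the result of vdso_base() to vdso_looks_like_elf() before parsing the image any further; libc does the equivalent lookup internally when it resolves symbols such as clock_gettime from the vDSO.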
Even though the mechanism is pretty much standard, over the last few years every architecture ended up implementing its own vDSO library in its architecture code.
The purpose of this patch-set is to identify the commonalities between the architectures and to consolidate the common code paths, starting with gettimeofday().
This implementation contains the following design choices:
 * Every architecture defines the arch specific code in a header in
   "asm/vdso/".
 * The generic implementation includes the arch specific one and lives in
   "lib/vdso".
 * The arch specific code for gettimeofday lives in
   "<arch path>/vdso/gettimeofday.c" and includes the generic code only.
 * The generic implementation of update_vsyscall and update_vsyscall_tz
   lives in kernel/vdso and provides the bindings that can be implemented
   by each architecture.
 * Each architecture provides its implementation of the bindings in
   "asm/vdso/vsyscall.h".
 * This approach consolidates the common code in a single place, with the
   benefit of avoiding code duplication.
This implementation contains the portings to the common library for: arm64, compat mode for arm64, arm, mips, x86_64, x32, compat mode for x86_64 and i386.
The mips porting has been tested on qemu for mips32el. A configuration to repeat the tests can be found at [4].
The x86_64 porting has been tested on an Intel Xeon 5120T based machine running Ubuntu 18.04 and using the Ubuntu provided defconfig.
The i386 porting has been tested on qemu using the i386_defconfig configuration.
Last but not least, from this porting arm64, compat arm64, arm and mips gain support, for both clock_gettime and clock_getres, for:
 * CLOCK_BOOTTIME, which can be useful in certain scenarios since it keeps
   track of the time during sleep as well.
 * CLOCK_TAI, which is like CLOCK_REALTIME but uses the International Atomic
   Time (TAI) reference instead of UTC, to avoid jumping on leap second
   updates.
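From userspace nothing changes in how these clocks are queried; only the path taken (vDSO or syscall) differs. A minimal sketch of reading both the value and the resolution of a clock follows. The helper name read_clock() is hypothetical, and the fallback #defines use the clock id values from the kernel UAPI headers in case the libc headers do not expose them.

```c
#define _GNU_SOURCE
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7	/* value from the kernel UAPI headers */
#endif
#ifndef CLOCK_TAI
#define CLOCK_TAI 11		/* value from the kernel UAPI headers */
#endif

/*
 * Read the current value and the resolution of a clock.
 * Returns 0 on success and -1 on failure (errno is set by libc).
 */
int read_clock(clockid_t id, struct timespec *now, struct timespec *res)
{
	if (clock_gettime(id, now) != 0)
		return -1;
	return clock_getres(id, res);
}
```

Calling read_clock(CLOCK_BOOTTIME, &now, &res) or read_clock(CLOCK_TAI, &now, &res) exercises the two clock ids listed above through whatever path libc selects.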
The porting has been validated using the vdsotest test-suite [1] extended to cover all the clock ids [2].
A new test has been added to the linux kselftest in order to validate the newly added library.
The porting has been benchmarked and the performance results are provided as part of this cover letter.
To simplify the testing, a copy of the patchset on top of a recent linux tree can be found at [3] and [4].
[1] https://github.com/nathanlynch/vdsotest
[2] https://github.com/fvincenzo/vdsotest
[3] git://linux-arm.org/linux-vf.git vdso/v6
[4] git://linux-arm.org/linux-vf.git vdso-mips/v6
Changes:
--------
v6:
  - Rebased on 5.2-rc2.
  - Added performance numbers.
  - Removed vdso_types.h.
  - Unified update_vsyscall and update_vsyscall_tz.
  - Reworked the kselftest included in this patchset.
  - Addressed review comments.
v5:
  - Rebased on 5.0-rc7.
  - Added x86_64, compat mode for x86_64 and i386 portings.
  - Extended vDSO kselftest.
  - Addressed review comments.
v4:
  - Rebased on 5.0-rc2.
  - Addressed review comments.
  - Disabled compat vdso on arm64 when the kernel is compiled with clang.
v3:
  - Ported the latest fixes and optimizations done on the x86 architecture
    to the generic library.
  - Addressed review comments.
  - Improved the documentation of the interfaces.
  - Changed the HAVE_ARCH_TIMER config option to a more generic
    HAVE_HW_COUNTER.
v2:
  - Added -ffixed-x18 to arm64
  - Replaced occurrences of timeval and timespec
  - Modified datapage.h to be compliant with y2038 on all the architectures
  - Removed __u_vdso type
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Russell King linux@armlinux.org.uk
Cc: Ralf Baechle ralf@linux-mips.org
Cc: Paul Burton paul.burton@mips.com
Cc: Daniel Lezcano daniel.lezcano@linaro.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Mark Salyzyn salyzyn@android.com
Cc: Peter Collingbourne pcc@google.com
Cc: Shuah Khan shuah@kernel.org
Cc: Dmitry Safonov 0x7f454c46@gmail.com
Cc: Rasmus Villemoes linux@rasmusvillemoes.dk
Cc: Huw Davies huw@codeweavers.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Performance Numbers: Linux 5.2.0-rc2 - Xeon Gold 5120T
======================================================

Unified vDSO:
-------------

clock-gettime-monotonic: syscall: 342 nsec/call
clock-gettime-monotonic: libc: 25 nsec/call
clock-gettime-monotonic: vdso: 24 nsec/call
clock-getres-monotonic: syscall: 296 nsec/call
clock-getres-monotonic: libc: 296 nsec/call
clock-getres-monotonic: vdso: 3 nsec/call
clock-gettime-monotonic-coarse: syscall: 294 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 5 nsec/call
clock-getres-monotonic-coarse: syscall: 295 nsec/call
clock-getres-monotonic-coarse: libc: 292 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 343 nsec/call
clock-gettime-monotonic-raw: libc: 25 nsec/call
clock-gettime-monotonic-raw: vdso: 23 nsec/call
clock-getres-monotonic-raw: syscall: 290 nsec/call
clock-getres-monotonic-raw: libc: 290 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 332 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 288 nsec/call
clock-getres-tai: libc: 288 nsec/call
clock-getres-tai: vdso: 3 nsec/call
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 24 nsec/call
clock-gettime-boottime: vdso: 23 nsec/call
clock-getres-boottime: syscall: 284 nsec/call
clock-getres-boottime: libc: 291 nsec/call
clock-getres-boottime: vdso: 3 nsec/call
clock-gettime-realtime: syscall: 337 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 23 nsec/call
clock-getres-realtime: syscall: 287 nsec/call
clock-getres-realtime: libc: 284 nsec/call
clock-getres-realtime: vdso: 3 nsec/call
clock-gettime-realtime-coarse: syscall: 307 nsec/call
clock-gettime-realtime-coarse: libc: 4 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 294 nsec/call
clock-getres-realtime-coarse: libc: 291 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 246 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 26 nsec/call
gettimeofday: vdso: 25 nsec/call

Stock Kernel:
-------------

clock-gettime-monotonic: syscall: 338 nsec/call
clock-gettime-monotonic: libc: 24 nsec/call
clock-gettime-monotonic: vdso: 23 nsec/call
clock-getres-monotonic: syscall: 291 nsec/call
clock-getres-monotonic: libc: 304 nsec/call
clock-getres-monotonic: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-coarse: syscall: 297 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 4 nsec/call
clock-getres-monotonic-coarse: syscall: 281 nsec/call
clock-getres-monotonic-coarse: libc: 286 nsec/call
clock-getres-monotonic-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-raw: syscall: 336 nsec/call
clock-gettime-monotonic-raw: libc: 340 nsec/call
clock-gettime-monotonic-raw: vdso: 346 nsec/call
clock-getres-monotonic-raw: syscall: 297 nsec/call
clock-getres-monotonic-raw: libc: 301 nsec/call
clock-getres-monotonic-raw: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-tai: syscall: 351 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 298 nsec/call
clock-getres-tai: libc: 290 nsec/call
clock-getres-tai: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 347 nsec/call
clock-gettime-boottime: vdso: 355 nsec/call
clock-getres-boottime: syscall: 296 nsec/call
clock-getres-boottime: libc: 295 nsec/call
clock-getres-boottime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime: syscall: 346 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 22 nsec/call
clock-getres-realtime: syscall: 295 nsec/call
clock-getres-realtime: libc: 291 nsec/call
clock-getres-realtime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime-coarse: syscall: 292 nsec/call
clock-gettime-realtime-coarse: libc: 5 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 300 nsec/call
clock-getres-realtime-coarse: libc: 301 nsec/call
clock-getres-realtime-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
getcpu: syscall: 252 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 24 nsec/call
gettimeofday: vdso: 25 nsec/call
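The "syscall" versus "libc" rows can be approximated with a very small loop. The sketch below is not how vdsotest measures; it is a rough stand-in, assuming Linux with glibc, that times clock_gettime() through libc (which uses the vDSO when one is available) against the raw syscall. The function names bench_libc() and bench_syscall() are hypothetical.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Elapsed nanoseconds from a to b. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/* Average nsec/call of clock_gettime() as provided by libc. */
double bench_libc(clockid_t id, long iters)
{
	struct timespec start, end, ts;

	if (iters <= 0)
		return 0.0;
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++)
		clock_gettime(id, &ts);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return (double)ts_diff_ns(&start, &end) / (double)iters;
}

/* Average nsec/call when the kernel is always entered via syscall(). */
double bench_syscall(clockid_t id, long iters)
{
	struct timespec start, end, ts;

	if (iters <= 0)
		return 0.0;
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++)
		syscall(SYS_clock_gettime, id, &ts);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return (double)ts_diff_ns(&start, &end) / (double)iters;
}
```

With a large iteration count, a clock id that is handled in the vDSO should show a much lower figure from bench_libc() than from bench_syscall(), while a clock id that falls back to the syscall should show similar figures for both, as in the stock kernel rows for clock-gettime-monotonic-raw.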
Peter Collingbourne (1):
  arm64: Build vDSO with -ffixed-x18

Vincenzo Frascino (18):
  kernel: Standardize vdso_datapage
  kernel: Define gettimeofday vdso common code
  kernel: Unify update_vsyscall implementation
  arm64: Substitute gettimeofday with C implementation
  arm64: compat: Add missing syscall numbers
  arm64: compat: Expose signal related structures
  arm64: compat: Generate asm offsets for signals
  lib: vdso: Add compat support
  arm64: compat: Add vDSO
  arm64: Refactor vDSO code
  arm64: compat: vDSO setup for compat layer
  arm64: elf: vDSO code page discovery
  arm64: compat: Get sigreturn trampolines from vDSO
  arm64: Add vDSO compat support
  arm: Add support for generic vDSO
  mips: Add support for generic vDSO
  x86: Add support for generic vDSO
  kselftest: Extend vDSO selftest
 arch/arm/Kconfig | 3 +
 arch/arm/include/asm/vdso/gettimeofday.h | 96 +++++
 arch/arm/include/asm/vdso/vsyscall.h | 71 ++++
 arch/arm/include/asm/vdso_datapage.h | 29 +-
 arch/arm/kernel/vdso.c | 87 +----
 arch/arm/vdso/Makefile | 13 +-
 arch/arm/vdso/note.c | 15 +
 arch/arm/vdso/vdso.lds.S | 2 +
 arch/arm/vdso/vgettimeofday.c | 268 +------------
 arch/arm64/Kconfig | 3 +
 arch/arm64/Makefile | 23 +-
 arch/arm64/include/asm/elf.h | 14 +
 arch/arm64/include/asm/signal32.h | 46 +++
 arch/arm64/include/asm/unistd.h | 5 +
 arch/arm64/include/asm/vdso.h | 3 +
 arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++
 .../include/asm/vdso/compat_gettimeofday.h | 108 ++++++
 arch/arm64/include/asm/vdso/gettimeofday.h | 84 +++++
 arch/arm64/include/asm/vdso/vsyscall.h | 53 +++
 arch/arm64/include/asm/vdso_datapage.h | 48 ---
 arch/arm64/kernel/Makefile | 6 +-
 arch/arm64/kernel/asm-offsets.c | 39 +-
 arch/arm64/kernel/signal32.c | 72 ++--
 arch/arm64/kernel/vdso.c | 356 ++++++++++++------
 arch/arm64/kernel/vdso/Makefile | 34 +-
 arch/arm64/kernel/vdso/gettimeofday.S | 334 ----------------
 arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++
 arch/arm64/kernel/vdso32/.gitignore | 2 +
 arch/arm64/kernel/vdso32/Makefile | 184 +++++++++
 arch/arm64/kernel/vdso32/note.c | 15 +
 arch/arm64/kernel/vdso32/sigreturn.S | 62 +++
 arch/arm64/kernel/vdso32/vdso.S | 19 +
 arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++
 arch/arm64/kernel/vdso32/vgettimeofday.c | 59 +++
 arch/mips/Kconfig | 2 +
 arch/mips/include/asm/vdso.h | 78 +---
 arch/mips/include/asm/vdso/gettimeofday.h | 175 +++++++++
 arch/mips/{ => include/asm}/vdso/vdso.h | 6 +-
 arch/mips/include/asm/vdso/vsyscall.h | 43 +++
 arch/mips/kernel/vdso.c | 37 +-
 arch/mips/vdso/Makefile | 25 +-
 arch/mips/vdso/elf.S | 2 +-
 arch/mips/vdso/gettimeofday.c | 273 --------------
 arch/mips/vdso/sigreturn.S | 2 +-
 arch/mips/vdso/vdso.lds.S | 4 +
 arch/mips/vdso/vgettimeofday.c | 57 +++
 arch/x86/Kconfig | 3 +
 arch/x86/entry/vdso/Makefile | 9 +
 arch/x86/entry/vdso/vclock_gettime.c | 251 +++---------
 arch/x86/entry/vdso/vdso.lds.S | 2 +
 arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
 arch/x86/entry/vdso/vdsox32.lds.S | 1 +
 arch/x86/entry/vsyscall/Makefile | 2 -
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 ----
 arch/x86/include/asm/mshyperv-tsc.h | 76 ++++
 arch/x86/include/asm/mshyperv.h | 70 +---
 arch/x86/include/asm/pvclock.h | 2 +-
 arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++
 arch/x86/include/asm/vdso/vsyscall.h | 44 +++
 arch/x86/include/asm/vgtod.h | 75 +---
 arch/x86/include/asm/vvar.h | 7 +-
 arch/x86/kernel/pvclock.c | 1 +
 include/asm-generic/vdso/vsyscall.h | 56 +++
 include/linux/hrtimer.h | 15 +-
 include/linux/hrtimer_defs.h | 25 ++
 include/linux/timekeeper_internal.h | 9 +
 include/vdso/datapage.h | 91 +++++
 include/vdso/helpers.h | 56 +++
 include/vdso/vsyscall.h | 11 +
 kernel/Makefile | 1 +
 kernel/vdso/Makefile | 2 +
 kernel/vdso/vsyscall.c | 139 +++++++
 lib/Kconfig | 5 +
 lib/vdso/Kconfig | 36 ++
 lib/vdso/Makefile | 22 ++
 lib/vdso/gettimeofday.c | 229 +++++++++++
 tools/testing/selftests/vDSO/Makefile | 2 +
 tools/testing/selftests/vDSO/vdso_full_test.c | 261 +++++++++++++
 78 files changed, 3042 insertions(+), 1767 deletions(-)
 create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/arm/include/asm/vdso/vsyscall.h
 create mode 100644 arch/arm/vdso/note.c
 create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h
 create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h
 create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h
 delete mode 100644 arch/arm64/include/asm/vdso_datapage.h
 delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
 create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
 create mode 100644 arch/arm64/kernel/vdso32/.gitignore
 create mode 100644 arch/arm64/kernel/vdso32/Makefile
 create mode 100644 arch/arm64/kernel/vdso32/note.c
 create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S
 create mode 100644 arch/arm64/kernel/vdso32/vdso.S
 create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S
 create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h
 rename arch/mips/{ => include/asm}/vdso/vdso.h (90%)
 create mode 100644 arch/mips/include/asm/vdso/vsyscall.h
 delete mode 100644 arch/mips/vdso/gettimeofday.c
 create mode 100644 arch/mips/vdso/vgettimeofday.c
 delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
 create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
 create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
 create mode 100644 include/asm-generic/vdso/vsyscall.h
 create mode 100644 include/linux/hrtimer_defs.h
 create mode 100644 include/vdso/datapage.h
 create mode 100644 include/vdso/helpers.h
 create mode 100644 include/vdso/vsyscall.h
 create mode 100644 kernel/vdso/Makefile
 create mode 100644 kernel/vdso/vsyscall.c
 create mode 100644 lib/vdso/Kconfig
 create mode 100644 lib/vdso/Makefile
 create mode 100644 lib/vdso/gettimeofday.c
 create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c
In an effort to unify the common code for managing the vdso library across all the architectures that support it, this patch tries to provide a common format for the vdso datapage.
As a result, this patch generalizes the data structures in vgtod.h, moving them from the x86 private includes to the general includes (include/vdso).
Cc: Arnd Bergmann arnd@arndb.de
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
---
 include/vdso/datapage.h | 91 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)
 create mode 100644 include/vdso/datapage.h
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
new file mode 100644
index 000000000000..bb7087eec9bd
--- /dev/null
+++ b/include/vdso/datapage.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __VDSO_DATAPAGE_H
+#define __VDSO_DATAPAGE_H
+
+#ifdef __KERNEL__
+
+#ifndef __ASSEMBLY__
+
+#include <linux/bits.h>
+#include <linux/time.h>
+#include <linux/types.h>
+
+#define VDSO_BASES	(CLOCK_TAI + 1)
+#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
+			 BIT(CLOCK_MONOTONIC) | \
+			 BIT(CLOCK_BOOTTIME) | \
+			 BIT(CLOCK_TAI))
+#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
+			 BIT(CLOCK_MONOTONIC_COARSE))
+#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
+
+#define CS_HRES_COARSE	0
+#define CS_RAW		1
+#define CS_BASES	(CS_RAW + 1)
+
+/**
+ * struct vdso_timestamp - basetime per clock_id
+ * @sec: seconds
+ * @nsec: nanoseconds
+ *
+ * There is one vdso_timestamp object in vvar for each vDSO-accelerated
+ * clock_id. For high-resolution clocks, this encodes the time
+ * corresponding to vdso_data.cycle_last. For coarse clocks this encodes
+ * the actual time.
+ *
+ * To be noticed that for highres clocks nsec is left-shifted by
+ * vdso_data.cs[x].shift.
+ */
+struct vdso_timestamp {
+	u64 sec;
+	u64 nsec;
+};
+
+/**
+ * struct vdso_data - vdso datapage representation
+ * @seq: timebase sequence counter
+ * @clock_mode: clock mode
+ * @cycle_last: timebase at clocksource init
+ * @mask: clocksource mask
+ * @mult: clocksource multiplier
+ * @shift: clocksource shift
+ * @basetime[clock_id]: basetime per clock_id
+ * @tz_minuteswest: minutes west of Greenwich
+ * @tz_dsttime: type of DST correction
+ * @hrtimer_res: hrtimer resolution
+ *
+ * vdso_data will be accessed by 64 bit and compat code at the same time
+ * so we should be careful before modifying this structure.
+ */
+struct vdso_data {
+	u32 seq;
+
+	s32 clock_mode;
+	u64 cycle_last;
+	u64 mask;
+	u32 mult;
+	u32 shift;
+
+	struct vdso_timestamp basetime[VDSO_BASES];
+
+	s32 tz_minuteswest;
+	s32 tz_dsttime;
+	u32 hrtimer_res;
+};
+
+/*
+ * We use the hidden visibility to prevent the compiler from generating a GOT
+ * relocation. Not only is going through a GOT useless (the entry couldn't and
+ * must not be overridden by another library), it does not even work: the linker
+ * cannot generate an absolute address to the data page.
+ *
+ * With the hidden visibility, the compiler simply generates a PC-relative
+ * relocation, and this is what we need.
+ */
+extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden")));
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
+
+#endif /* __VDSO_DATAPAGE_H */
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:

> + * vdso_data will be accessed by 64 bit and compat code at the same time
> + * so we should be careful before modifying this structure.
> + */
> +struct vdso_data {
> +	u32 seq;
> +	s32 clock_mode;
> +	u64 cycle_last;
> +	u64 mask;
> +	u32 mult;
> +	u32 shift;
> +	struct vdso_timestamp basetime[VDSO_BASES];
> +	s32 tz_minuteswest;
> +	s32 tz_dsttime;
> +	u32 hrtimer_res;
> +};
The structure contains four padding bytes at the end, which is something we try to avoid, at least if this ends up being used as an ABI. Maybe add "u32 __unused" at the end?
Arnd
On 31/05/2019 09:16, Arnd Bergmann wrote:
> On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
>
> > + * vdso_data will be accessed by 64 bit and compat code at the same time
> > + * so we should be careful before modifying this structure.
> > + */
> > +struct vdso_data {
> > +	u32 seq;
> > +	s32 clock_mode;
> > +	u64 cycle_last;
> > +	u64 mask;
> > +	u32 mult;
> > +	u32 shift;
> > +	struct vdso_timestamp basetime[VDSO_BASES];
> > +	s32 tz_minuteswest;
> > +	s32 tz_dsttime;
> > +	u32 hrtimer_res;
> > +};
>
> The structure contains four padding bytes at the end, which is something
> we try to avoid, at least if this ends up being used as an ABI. Maybe add
> "u32 __unused" at the end?

Agreed, I will fix this in v7.

> Arnd
On Tue, Jun 04, 2019 at 01:05:40PM +0100, Vincenzo Frascino wrote:
> On 31/05/2019 09:16, Arnd Bergmann wrote:
> > On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
> >
> > > + * vdso_data will be accessed by 64 bit and compat code at the same time
> > > + * so we should be careful before modifying this structure.
> > > + */
> > > +struct vdso_data {
> > > +	u32 seq;
> > > +	s32 clock_mode;
> > > +	u64 cycle_last;
> > > +	u64 mask;
> > > +	u32 mult;
> > > +	u32 shift;
> > > +	struct vdso_timestamp basetime[VDSO_BASES];
> > > +	s32 tz_minuteswest;
> > > +	s32 tz_dsttime;
> > > +	u32 hrtimer_res;
> > > +};
> >
> > The structure contains four padding bytes at the end, which is something
> > we try to avoid, at least if this ends up being used as an ABI. Maybe add
> > "u32 __unused" at the end?
>
> Agreed, I will fix this in v7.
Note that this is also necessary to ensure that CLOCK_MONOTONIC_RAW works in the 32-bit vDSO on x86_64 kernels.
Huw.
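The padding discussed in this sub-thread can be checked from userspace with a mirror of the structure. The sketch below is only an illustration: the struct names vdso_data_v6 and vdso_data_padded are made up, the kernel types are replaced with their stdint.h equivalents, VDSO_BASES is hard-coded to 12 (CLOCK_TAI is 11), and the extra member is called unused_pad here rather than the "__unused" proposed above to stay clear of reserved identifiers.

```c
#include <stddef.h>
#include <stdint.h>

#define VDSO_BASES 12	/* CLOCK_TAI (11) + 1, as in the patch */

struct vdso_timestamp {
	uint64_t sec;
	uint64_t nsec;
};

/* Userspace mirror of the layout posted in v6: ends on a lone u32. */
struct vdso_data_v6 {
	uint32_t seq;
	int32_t clock_mode;
	uint64_t cycle_last;
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
	struct vdso_timestamp basetime[VDSO_BASES];
	int32_t tz_minuteswest;
	int32_t tz_dsttime;
	uint32_t hrtimer_res;
};

/* Same layout with the suggested explicit trailing member. */
struct vdso_data_padded {
	uint32_t seq;
	int32_t clock_mode;
	uint64_t cycle_last;
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
	struct vdso_timestamp basetime[VDSO_BASES];
	int32_t tz_minuteswest;
	int32_t tz_dsttime;
	uint32_t hrtimer_res;
	uint32_t unused_pad;
};

/* Bytes the compiler adds after the last member of the v6 layout. */
size_t tail_padding_v6(void)
{
	return sizeof(struct vdso_data_v6) -
	       (offsetof(struct vdso_data_v6, hrtimer_res) + sizeof(uint32_t));
}

/* Bytes the compiler adds after the last member of the padded layout. */
size_t tail_padding_padded(void)
{
	return sizeof(struct vdso_data_padded) -
	       (offsetof(struct vdso_data_padded, unused_pad) + sizeof(uint32_t));
}
```

On an ABI that aligns 64-bit integers to 8 bytes (x86_64, for example) the members of the v6 layout end at byte 236 and the compiler rounds the size up to 240, while an ABI that aligns them to 4 bytes inside structures (i386, for example) leaves the size at 236. Since _vdso_data is an array of CS_BASES elements, a different element size would place the CS_RAW entry at a different offset for 64-bit and 32-bit code, which is presumably what the CLOCK_MONOTONIC_RAW remark above refers to. With the explicit member the size is 240 bytes under both alignments.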
On Thu, May 30, 2019 at 03:15:13PM +0100, Vincenzo Frascino wrote:
> --- /dev/null
> +++ b/include/vdso/datapage.h
> @@ -0,0 +1,91 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __VDSO_DATAPAGE_H
> +#define __VDSO_DATAPAGE_H
> +
> +#ifdef __KERNEL__
> +
> +#ifndef __ASSEMBLY__
> +
> +#include <linux/bits.h>
> +#include <linux/time.h>
> +#include <linux/types.h>
> +
> +#define VDSO_BASES	(CLOCK_TAI + 1)
> +#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
> +			 BIT(CLOCK_MONOTONIC) | \
> +			 BIT(CLOCK_BOOTTIME) | \
> +			 BIT(CLOCK_TAI))
> +#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
> +			 BIT(CLOCK_MONOTONIC_COARSE))
> +#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
> +
> +#define CS_HRES_COARSE	0
> +#define CS_RAW		1

CS_HRES_COARSE seems like a confusing name choice to me. What you really mean is not RAW.

How about CS_ADJ to indicate that it's updated by adjtime? CS_XTIME might be another option.
Huw.
Hi Huw,
thank you for your review.
On 10/06/2019 10:27, Huw Davies wrote:
> On Thu, May 30, 2019 at 03:15:13PM +0100, Vincenzo Frascino wrote:
> > --- /dev/null
> > +++ b/include/vdso/datapage.h
> > @@ -0,0 +1,91 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __VDSO_DATAPAGE_H
> > +#define __VDSO_DATAPAGE_H
> > +
> > +#ifdef __KERNEL__
> > +
> > +#ifndef __ASSEMBLY__
> > +
> > +#include <linux/bits.h>
> > +#include <linux/time.h>
> > +#include <linux/types.h>
> > +
> > +#define VDSO_BASES	(CLOCK_TAI + 1)
> > +#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
> > +			 BIT(CLOCK_MONOTONIC) | \
> > +			 BIT(CLOCK_BOOTTIME) | \
> > +			 BIT(CLOCK_TAI))
> > +#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
> > +			 BIT(CLOCK_MONOTONIC_COARSE))
> > +#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
> > +
> > +#define CS_HRES_COARSE	0
> > +#define CS_RAW		1
>
> CS_HRES_COARSE seems like a confusing name choice to me. What you really
> mean is not RAW.
>
> How about CS_ADJ to indicate that it's updated by adjtime? CS_XTIME might
> be another option.

I divided the timers in 3 sets (HRES, COARSE, RAW), CS_HRES_COARSE refers to the first two and CS_RAW to the third. I will add a comment to explain the logic in the next iteration.

> Huw.
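The grouping described in this reply (three sets of clock ids, two clocksource slots) can be sketched in userspace as a lookup from clock id to slot. The macro values mirror the ones in the patch; the function clock_to_cs() and the local VDSO_MAX_CLOCKS constant (set to 16, the value of MAX_CLOCKS in the kernel UAPI headers) are assumptions made for this illustration and do not exist in the patchset.

```c
#define _GNU_SOURCE
#include <time.h>

/* Fallbacks with the kernel UAPI values, in case libc does not define them. */
#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW 4
#endif
#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE 5
#endif
#ifndef CLOCK_MONOTONIC_COARSE
#define CLOCK_MONOTONIC_COARSE 6
#endif
#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7
#endif
#ifndef CLOCK_TAI
#define CLOCK_TAI 11
#endif

#define BIT(n)		(1U << (n))
#define VDSO_HRES	(BIT(CLOCK_REALTIME) | BIT(CLOCK_MONOTONIC) | \
			 BIT(CLOCK_BOOTTIME) | BIT(CLOCK_TAI))
#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | BIT(CLOCK_MONOTONIC_COARSE))
#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))

#define CS_HRES_COARSE	0
#define CS_RAW		1

#define VDSO_MAX_CLOCKS	16	/* local stand-in for the kernel's MAX_CLOCKS */

/*
 * Map a clock id to the clocksource slot that serves it:
 * HRES and COARSE clocks share slot 0, the RAW clock uses slot 1.
 * Returns -1 for ids that are not handled by these three sets.
 */
int clock_to_cs(clockid_t clock)
{
	unsigned int msk;

	/* Rejects negative ids as well, since they become large when cast. */
	if ((unsigned int)clock >= VDSO_MAX_CLOCKS)
		return -1;

	msk = 1U << clock;
	if (msk & (VDSO_HRES | VDSO_COARSE))
		return CS_HRES_COARSE;
	if (msk & VDSO_RAW)
		return CS_RAW;
	return -1;
}
```

The -1 case corresponds to the clock ids for which the generic code would call the syscall fallback, for example CLOCK_PROCESS_CPUTIME_ID.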
On Mon, Jun 10, 2019 at 11:17:48AM +0100, Vincenzo Frascino wrote:
> On 10/06/2019 10:27, Huw Davies wrote:
> > On Thu, May 30, 2019 at 03:15:13PM +0100, Vincenzo Frascino wrote:
> > > --- /dev/null
> > > +++ b/include/vdso/datapage.h
> > > @@ -0,0 +1,91 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef __VDSO_DATAPAGE_H
> > > +#define __VDSO_DATAPAGE_H
> > > +
> > > +#ifdef __KERNEL__
> > > +
> > > +#ifndef __ASSEMBLY__
> > > +
> > > +#include <linux/bits.h>
> > > +#include <linux/time.h>
> > > +#include <linux/types.h>
> > > +
> > > +#define VDSO_BASES	(CLOCK_TAI + 1)
> > > +#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
> > > +			 BIT(CLOCK_MONOTONIC) | \
> > > +			 BIT(CLOCK_BOOTTIME) | \
> > > +			 BIT(CLOCK_TAI))
> > > +#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
> > > +			 BIT(CLOCK_MONOTONIC_COARSE))
> > > +#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
> > > +
> > > +#define CS_HRES_COARSE	0
> > > +#define CS_RAW		1
> >
> > CS_HRES_COARSE seems like a confusing name choice to me. What you really
> > mean is not RAW.
> >
> > How about CS_ADJ to indicate that it's updated by adjtime? CS_XTIME might
> > be another option.
>
> I divided the timers in 3 sets (HRES, COARSE, RAW), CS_HRES_COARSE refers
> to the first two and CS_RAW to the third. I will add a comment to explain
> the logic in the next iteration.
I'm thinking ahead about a possible CLOCK_MONOTONIC_RAW_COARSE (which would be useful at least for Wine). In that case you'd have four clock types non-raw and raw, each with either hres or coarse.
Huw.
Hi Huw,
On 10/06/2019 11:31, Huw Davies wrote:
> On Mon, Jun 10, 2019 at 11:17:48AM +0100, Vincenzo Frascino wrote:
> > On 10/06/2019 10:27, Huw Davies wrote:
> > > On Thu, May 30, 2019 at 03:15:13PM +0100, Vincenzo Frascino wrote:
> > > > --- /dev/null
> > > > +++ b/include/vdso/datapage.h
> > > > @@ -0,0 +1,91 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#ifndef __VDSO_DATAPAGE_H
> > > > +#define __VDSO_DATAPAGE_H
> > > > +
> > > > +#ifdef __KERNEL__
> > > > +
> > > > +#ifndef __ASSEMBLY__
> > > > +
> > > > +#include <linux/bits.h>
> > > > +#include <linux/time.h>
> > > > +#include <linux/types.h>
> > > > +
> > > > +#define VDSO_BASES	(CLOCK_TAI + 1)
> > > > +#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
> > > > +			 BIT(CLOCK_MONOTONIC) | \
> > > > +			 BIT(CLOCK_BOOTTIME) | \
> > > > +			 BIT(CLOCK_TAI))
> > > > +#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
> > > > +			 BIT(CLOCK_MONOTONIC_COARSE))
> > > > +#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
> > > > +
> > > > +#define CS_HRES_COARSE	0
> > > > +#define CS_RAW		1
> > >
> > > CS_HRES_COARSE seems like a confusing name choice to me. What you really
> > > mean is not RAW.
> > >
> > > How about CS_ADJ to indicate that it's updated by adjtime? CS_XTIME might
> > > be another option.
> >
> > I divided the timers in 3 sets (HRES, COARSE, RAW), CS_HRES_COARSE refers
> > to the first two and CS_RAW to the third. I will add a comment to explain
> > the logic in the next iteration.
>
> I'm thinking ahead about a possible CLOCK_MONOTONIC_RAW_COARSE (which
> would be useful at least for Wine). In that case you'd have four clock
> types non-raw and raw, each with either hres or coarse.

Thanks for this, I was not aware of CLOCK_MONOTONIC_RAW_COARSE. I tried to find, though, some details, but I could not find any. Could you please provide some reference?

> Huw.
On Mon, Jun 10, 2019 at 12:07:45PM +0100, Vincenzo Frascino wrote:
> On 10/06/2019 11:31, Huw Davies wrote:
> > On Mon, Jun 10, 2019 at 11:17:48AM +0100, Vincenzo Frascino wrote:
> > > On 10/06/2019 10:27, Huw Davies wrote:
> > > > On Thu, May 30, 2019 at 03:15:13PM +0100, Vincenzo Frascino wrote:
> > > > > --- /dev/null
> > > > > +++ b/include/vdso/datapage.h
> > > > > @@ -0,0 +1,91 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > +#ifndef __VDSO_DATAPAGE_H
> > > > > +#define __VDSO_DATAPAGE_H
> > > > > +
> > > > > +#ifdef __KERNEL__
> > > > > +
> > > > > +#ifndef __ASSEMBLY__
> > > > > +
> > > > > +#include <linux/bits.h>
> > > > > +#include <linux/time.h>
> > > > > +#include <linux/types.h>
> > > > > +
> > > > > +#define VDSO_BASES	(CLOCK_TAI + 1)
> > > > > +#define VDSO_HRES	(BIT(CLOCK_REALTIME) | \
> > > > > +			 BIT(CLOCK_MONOTONIC) | \
> > > > > +			 BIT(CLOCK_BOOTTIME) | \
> > > > > +			 BIT(CLOCK_TAI))
> > > > > +#define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE) | \
> > > > > +			 BIT(CLOCK_MONOTONIC_COARSE))
> > > > > +#define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
> > > > > +
> > > > > +#define CS_HRES_COARSE	0
> > > > > +#define CS_RAW		1
> > > >
> > > > CS_HRES_COARSE seems like a confusing name choice to me. What you really
> > > > mean is not RAW.
> > > >
> > > > How about CS_ADJ to indicate that it's updated by adjtime? CS_XTIME might
> > > > be another option.
> > >
> > > I divided the timers in 3 sets (HRES, COARSE, RAW), CS_HRES_COARSE refers
> > > to the first two and CS_RAW to the third. I will add a comment to explain
> > > the logic in the next iteration.
> >
> > I'm thinking ahead about a possible CLOCK_MONOTONIC_RAW_COARSE (which
> > would be useful at least for Wine). In that case you'd have four clock
> > types non-raw and raw, each with either hres or coarse.
>
> Thanks for this, I was not aware of CLOCK_MONOTONIC_RAW_COARSE. I tried to
> find, though, some details, but I could not find any. Could you please
> provide some reference?
It doesn't exist yet ;-) However it doesn't seem crazy that such a clock should exist. I was really using it to illustrate that raw / non-raw is orthogonal to hres / coarse.
That being said, this really doesn't matter that much.
Huw.
In the last few years we have witnessed an explosion of vdso implementations that mostly share similar code.
Try to unify the gettimeofday vdso implementation by introducing lib/vdso. The code contained in this library can ideally be reused by all the architectures, avoiding code duplication where possible.
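To give an idea of the kind of logic that ends up shared, the sketch below models in userspace the two ingredients the generic code relies on: a sequence counter that lets readers detect a concurrent update, and the mult/shift conversion from counter cycles to nanoseconds. It is a simplified illustration under stated assumptions, not the code from this patch: it uses C11 atomics instead of the kernel's READ_ONCE()/smp_rmb() helpers, handles a single clock, assumes a single writer, takes the counter value as a parameter instead of reading hardware, and uses plain division where the kernel uses __iter_div_u64_rem(). All names (clock_page, page_init, page_update, page_read) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for one vdso_data entry, reduced to one clock. */
struct clock_page {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t cycle_last;	/* counter value at the last update */
	_Atomic uint64_t sec;		/* base time: seconds */
	_Atomic uint64_t nsec;		/* base time: nanoseconds << shift */
	uint32_t mult;			/* fixed after page_init() in this sketch */
	uint32_t shift;
};

void page_init(struct clock_page *p, uint32_t mult, uint32_t shift)
{
	atomic_init(&p->seq, 0);
	atomic_init(&p->cycle_last, 0);
	atomic_init(&p->sec, 0);
	atomic_init(&p->nsec, 0);
	p->mult = mult;
	p->shift = shift;
}

/* Writer side (single writer assumed): publish a new base time. */
void page_update(struct clock_page *p, uint64_t cycles, uint64_t sec,
		 uint64_t nsec)
{
	unsigned int seq = atomic_load_explicit(&p->seq, memory_order_relaxed);

	/* Make the counter odd so that readers wait or retry. */
	atomic_store_explicit(&p->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&p->cycle_last, cycles, memory_order_relaxed);
	atomic_store_explicit(&p->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&p->nsec, nsec << p->shift, memory_order_relaxed);

	/* Make the counter even again: the update is complete. */
	atomic_store_explicit(&p->seq, seq + 2, memory_order_release);
}

/* Reader side: compute the time that corresponds to counter value cycles. */
void page_read(struct clock_page *p, uint64_t cycles, uint64_t *sec_out,
	       uint64_t *nsec_out)
{
	uint64_t last, sec, ns;
	unsigned int seq;

	do {
		/* Wait until no update is in progress. */
		while ((seq = atomic_load_explicit(&p->seq,
						   memory_order_acquire)) & 1)
			;
		last = atomic_load_explicit(&p->cycle_last, memory_order_relaxed);
		ns = atomic_load_explicit(&p->nsec, memory_order_relaxed);
		sec = atomic_load_explicit(&p->sec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		/* Retry if the writer changed the data in the meantime. */
	} while (atomic_load_explicit(&p->seq, memory_order_relaxed) != seq);

	/* Scale the elapsed cycles and undo the fixed-point shift. */
	if (cycles > last)
		ns += (cycles - last) * p->mult;
	ns >>= p->shift;

	*sec_out = sec + ns / NSEC_PER_SEC;
	*nsec_out = ns % NSEC_PER_SEC;
}
```

Keeping the nanosecond base pre-shifted means the reader performs one multiply, one add and one shift per call, and the normalization to seconds happens outside the retry loop.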
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
---
 include/linux/hrtimer.h | 15 +--
 include/linux/hrtimer_defs.h | 25 ++++
 include/vdso/helpers.h | 56 +++++++++
 lib/Kconfig | 5 +
 lib/vdso/Kconfig | 36 ++++++
 lib/vdso/Makefile | 22 ++++
 lib/vdso/gettimeofday.c | 225 +++++++++++++++++++++++++++++++++++
 7 files changed, 370 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/hrtimer_defs.h
 create mode 100644 include/vdso/helpers.h
 create mode 100644 lib/vdso/Kconfig
 create mode 100644 lib/vdso/Makefile
 create mode 100644 lib/vdso/gettimeofday.c
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 2e8957eac4d4..c922ce02e2e6 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -12,6 +12,7 @@ #ifndef _LINUX_HRTIMER_H #define _LINUX_HRTIMER_H
+#include <linux/hrtimer_defs.h> #include <linux/rbtree.h> #include <linux/ktime.h> #include <linux/init.h> @@ -298,26 +299,12 @@ struct clock_event_device;
extern void hrtimer_interrupt(struct clock_event_device *dev);
-/* - * The resolution of the clocks. The resolution value is returned in - * the clock_getres() system call to give application programmers an - * idea of the (in)accuracy of timers. Timer values are rounded up to - * this resolution values. - */ -# define HIGH_RES_NSEC 1 -# define KTIME_HIGH_RES (HIGH_RES_NSEC) -# define MONOTONIC_RES_NSEC HIGH_RES_NSEC -# define KTIME_MONOTONIC_RES KTIME_HIGH_RES - extern void clock_was_set_delayed(void);
extern unsigned int hrtimer_resolution;
#else
-# define MONOTONIC_RES_NSEC LOW_RES_NSEC -# define KTIME_MONOTONIC_RES KTIME_LOW_RES - #define hrtimer_resolution (unsigned int)LOW_RES_NSEC
static inline void clock_was_set_delayed(void) { } diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h new file mode 100644 index 000000000000..7179bfc04115 --- /dev/null +++ b/include/linux/hrtimer_defs.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_HRTIMER_DEFS_H +#define _LINUX_HRTIMER_DEFS_H + +#ifdef CONFIG_HIGH_RES_TIMERS + +/* + * The resolution of the clocks. The resolution value is returned in + * the clock_getres() system call to give application programmers an + * idea of the (in)accuracy of timers. Timer values are rounded up to + * this resolution values. + */ +# define HIGH_RES_NSEC 1 +# define KTIME_HIGH_RES (HIGH_RES_NSEC) +# define MONOTONIC_RES_NSEC HIGH_RES_NSEC +# define KTIME_MONOTONIC_RES KTIME_HIGH_RES + +#else + +# define MONOTONIC_RES_NSEC LOW_RES_NSEC +# define KTIME_MONOTONIC_RES KTIME_LOW_RES + +#endif + +#endif diff --git a/include/vdso/helpers.h b/include/vdso/helpers.h new file mode 100644 index 000000000000..4d66f4ffa1a2 --- /dev/null +++ b/include/vdso/helpers.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __VDSO_HELPERS_H +#define __VDSO_HELPERS_H + +#ifndef __ASSEMBLY__ + +#include <vdso/datapage.h> + +static __always_inline notrace u32 vdso_read_begin(const struct vdso_data *vd) +{ + u32 seq; + + while ((seq = READ_ONCE(vd->seq)) & 1) + cpu_relax(); + + smp_rmb(); + return seq; +} + +static __always_inline notrace u32 vdso_read_retry(const struct vdso_data *vd, + u32 start) +{ + u32 seq; + + smp_rmb(); + seq = READ_ONCE(vd->seq); + return seq != start; +} + +static __always_inline notrace void vdso_write_begin(struct vdso_data *vd) +{ + /* + * WRITE_ONCE it is required otherwise the compiler can validly tear + * updates to vd[x].seq and it is possible that the value seen by the + * reader it is inconsistent. 
+ */ + WRITE_ONCE(vd[CS_HRES_COARSE].seq, vd[CS_HRES_COARSE].seq + 1); + WRITE_ONCE(vd[CS_RAW].seq, vd[CS_RAW].seq + 1); + smp_wmb(); +} + +static __always_inline notrace void vdso_write_end(struct vdso_data *vd) +{ + smp_wmb(); + /* + * WRITE_ONCE it is required otherwise the compiler can validly tear + * updates to vd[x].seq and it is possible that the value seen by the + * reader it is inconsistent. + */ + WRITE_ONCE(vd[CS_HRES_COARSE].seq, vd[CS_HRES_COARSE].seq + 1); + WRITE_ONCE(vd[CS_RAW].seq, vd[CS_RAW].seq + 1); +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __VDSO_HELPERS_H */ diff --git a/lib/Kconfig b/lib/Kconfig index 90623a0e1942..8c8eefc5e54c 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -576,6 +576,11 @@ config OID_REGISTRY config UCS2_STRING tristate
+# +# generic vdso +# +source "lib/vdso/Kconfig" + source "lib/fonts/Kconfig"
config SG_SPLIT diff --git a/lib/vdso/Kconfig b/lib/vdso/Kconfig new file mode 100644 index 000000000000..cc00364bd2c2 --- /dev/null +++ b/lib/vdso/Kconfig @@ -0,0 +1,36 @@ +# SPDX-License-Identifier: GPL-2.0 + +config HAVE_GENERIC_VDSO + bool + +if HAVE_GENERIC_VDSO + +config GENERIC_GETTIMEOFDAY + bool + help + This is a generic implementation of gettimeofday vdso. + Each architecture that enables this feature has to + provide the fallback implementation. + +config GENERIC_VDSO_32 + bool + depends on GENERIC_GETTIMEOFDAY && !64BIT + help + This config option helps to avoid possible performance issues + in 32 bit only architectures. + +config GENERIC_COMPAT_VDSO + bool + help + This config option enables the compat VDSO layer. + +config CROSS_COMPILE_COMPAT_VDSO + string "32 bit Toolchain prefix for compat vDSO" + default "" + depends on GENERIC_COMPAT_VDSO + help + Defines the cross-compiler prefix for compiling compat vDSO. + If a 64 bit compiler (i.e. x86_64) can compile the VDSO for + 32 bit, it does not need to define this parameter. + +endif diff --git a/lib/vdso/Makefile b/lib/vdso/Makefile new file mode 100644 index 000000000000..c415a685d61b --- /dev/null +++ b/lib/vdso/Makefile @@ -0,0 +1,22 @@ +# SPDX-License-Identifier: GPL-2.0 + +GENERIC_VDSO_MK_PATH := $(abspath $(lastword $(MAKEFILE_LIST))) +GENERIC_VDSO_DIR := $(dir $(GENERIC_VDSO_MK_PATH)) + +c-gettimeofday-$(CONFIG_GENERIC_GETTIMEOFDAY) := $(addprefix $(GENERIC_VDSO_DIR), gettimeofday.c) + +# This cmd checks that the vdso library does not contain absolute relocation +# It has to be called after the linking of the vdso library and requires it +# as a parameter. +# +# $(ARCH_REL_TYPE_ABS) is defined in the arch specific makefile and corresponds +# to the absolute relocation types printed by "objdump -R" and accepted by the +# dynamic linker. 
+ifndef ARCH_REL_TYPE_ABS +$(error ARCH_REL_TYPE_ABS is not set) +endif + +quiet_cmd_vdso_check = VDSOCHK $@ + cmd_vdso_check = if $(OBJDUMP) -R $@ | egrep -h "$(ARCH_REL_TYPE_ABS)"; \ + then (echo >&2 "$@: dynamic relocations are not supported"; \ + rm -f $@; /bin/false); fi diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c new file mode 100644 index 000000000000..a226675731f4 --- /dev/null +++ b/lib/vdso/gettimeofday.c @@ -0,0 +1,225 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Generic userspace implementations of gettimeofday() and similar. + */ +#include <linux/compiler.h> +#include <linux/math64.h> +#include <linux/time.h> +#include <linux/kernel.h> +#include <linux/ktime.h> +#include <linux/hrtimer_defs.h> +#include <vdso/datapage.h> +#include <vdso/helpers.h> + +/* + * The generic vDSO implementation requires that gettimeofday.h + * provides: + * - __arch_get_vdso_data(): to get the vdso datapage. + * - __arch_get_hw_counter(): to get the hw counter based on the + * clock_mode. + * - gettimeofday_fallback(): fallback for gettimeofday. + * - clock_gettime_fallback(): fallback for clock_gettime. + * - clock_getres_fallback(): fallback for clock_getres. + */ +#include <asm/vdso/gettimeofday.h> + +static notrace int do_hres(const struct vdso_data *vd, + clockid_t clk, + struct __kernel_timespec *ts) +{ + const struct vdso_timestamp *vdso_ts = &vd->basetime[clk]; + u64 cycles, last, sec, ns; + u32 seq; + + do { + seq = vdso_read_begin(vd); + cycles = __arch_get_hw_counter(vd->clock_mode) & + vd->mask; + ns = vdso_ts->nsec; + last = vd->cycle_last; + if (unlikely((s64)cycles < 0)) + return clock_gettime_fallback(clk, ts); + if (cycles > last) + ns += (cycles - last) * vd->mult; + ns >>= vd->shift; + sec = vdso_ts->sec; + } while (unlikely(vdso_read_retry(vd, seq))); + + /* + * Do this outside the loop: a race inside the loop could result + * in __iter_div_u64_rem() being extremely slow. 
+ */ + ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns); + ts->tv_nsec = ns; + + return 0; +} + +static notrace void do_coarse(const struct vdso_data *vd, + clockid_t clk, + struct __kernel_timespec *ts) +{ + const struct vdso_timestamp *vdso_ts = &vd->basetime[clk]; + u32 seq; + + do { + seq = vdso_read_begin(vd); + ts->tv_sec = vdso_ts->sec; + ts->tv_nsec = vdso_ts->nsec; + } while (unlikely(vdso_read_retry(vd, seq))); +} + +static notrace __maybe_unused int +__cvdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts) +{ + const struct vdso_data *vd = __arch_get_vdso_data(); + u32 msk; + + /* Check for negative values or invalid clocks */ + if (unlikely((u32) clock >= MAX_CLOCKS)) + goto fallback; + + /* + * Convert the clockid to a bitmask and use it to check which + * clocks are handled in the VDSO directly. + */ + msk = 1U << clock; + if (likely(msk & VDSO_HRES)) { + return do_hres(&vd[CS_HRES_COARSE], clock, ts); + } else if (msk & VDSO_COARSE) { + do_coarse(&vd[CS_HRES_COARSE], clock, ts); + return 0; + } else if (msk & VDSO_RAW) { + return do_hres(&vd[CS_RAW], clock, ts); + } + +fallback: + return clock_gettime_fallback(clock, ts); +} + +static notrace __maybe_unused int +__cvdso_clock_gettime32(clockid_t clock, struct old_timespec32 *res) +{ + struct __kernel_timespec ts; + int ret; + + if (res == NULL) + goto fallback; + + ret = __cvdso_clock_gettime(clock, &ts); + + if (ret == 0) { + res->tv_sec = ts.tv_sec; + res->tv_nsec = ts.tv_nsec; + } + + return ret; + +fallback: + return clock_gettime_fallback(clock, (struct __kernel_timespec *)res); +} + +static notrace __maybe_unused int +__cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz) +{ + const struct vdso_data *vd = __arch_get_vdso_data(); + + if (likely(tv != NULL)) { + struct __kernel_timespec ts; + + if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts)) + return gettimeofday_fallback(tv, tz); + + tv->tv_sec = ts.tv_sec; + tv->tv_usec = (u32)ts.tv_nsec / 
NSEC_PER_USEC; + } + + if (unlikely(tz != NULL)) { + tz->tz_minuteswest = vd[CS_HRES_COARSE].tz_minuteswest; + tz->tz_dsttime = vd[CS_HRES_COARSE].tz_dsttime; + } + + return 0; +} + +#ifdef VDSO_HAS_TIME +static notrace __maybe_unused time_t __cvdso_time(time_t *time) +{ + const struct vdso_data *vd = __arch_get_vdso_data(); + time_t t = READ_ONCE(vd[CS_HRES_COARSE].basetime[CLOCK_REALTIME].sec); + + if (time) + *time = t; + + return t; +} +#endif /* VDSO_HAS_TIME */ + +static notrace __maybe_unused +int __cvdso_clock_getres(clockid_t clock, struct __kernel_timespec *res) +{ + const struct vdso_data *vd = __arch_get_vdso_data(); + u64 ns; + u32 msk; + u64 hrtimer_res = READ_ONCE(vd[CS_HRES_COARSE].hrtimer_res); + + /* Check for negative values or invalid clocks */ + if (unlikely((u32) clock >= MAX_CLOCKS)) + goto fallback; + + /* + * Convert the clockid to a bitmask and use it to check which + * clocks are handled in the VDSO directly. + */ + msk = 1U << clock; + if (msk & VDSO_HRES) { + /* + * Preserves the behaviour of posix_get_hrtimer_res(). + */ + ns = hrtimer_res; + } else if (msk & VDSO_COARSE) { + /* + * Preserves the behaviour of posix_get_coarse_res(). + */ + ns = LOW_RES_NSEC; + } else if (msk & VDSO_RAW) { + /* + * Preserves the behaviour of posix_get_hrtimer_res(). + */ + ns = hrtimer_res; + } else { + goto fallback; + } + + if (res) { + res->tv_sec = 0; + res->tv_nsec = ns; + } + + return 0; + +fallback: + return clock_getres_fallback(clock, res); +} + +static notrace __maybe_unused int +__cvdso_clock_getres_time32(clockid_t clock, struct old_timespec32 *res) +{ + struct __kernel_timespec ts; + int ret; + + if (res == NULL) + goto fallback; + + ret = __cvdso_clock_getres(clock, &ts); + + if (ret == 0) { + res->tv_sec = ts.tv_sec; + res->tv_nsec = ts.tv_nsec; + } + + return ret; + +fallback: + return clock_getres_fallback(clock, (struct __kernel_timespec *)res); +}
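The read side of the patch above combines two ideas: a sequence-counter retry loop (vdso_read_begin()/vdso_read_retry()) and a mult/shift conversion from counter cycles to nanoseconds. What follows is only a userspace sketch of that shape, not the kernel code. All demo_* names are hypothetical, the counter value is passed in as a parameter instead of being read from hardware, and C11 sequentially consistent atomics stand in for WRITE_ONCE()/smp_wmb(). The plain field accesses are fine for this single-threaded illustration but would not be a faithful memory-ordering model for concurrent use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical, much reduced stand-in for struct vdso_data. */
struct demo_data {
	atomic_uint seq;	/* odd while an update is in flight */
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t sec;		/* base seconds */
	uint64_t nsec;		/* base nanoseconds, left-shifted by 'shift' */
	uint32_t mult;
	uint32_t shift;
};

struct demo_ts {
	uint64_t sec;
	uint64_t nsec;
};

/* Writer: make seq odd, publish the new base, make seq even again. */
static void demo_update(struct demo_data *d, uint64_t cycles,
			uint64_t sec, uint64_t nsec)
{
	assert(nsec < DEMO_NSEC_PER_SEC && d->shift < 32);

	atomic_fetch_add(&d->seq, 1);
	d->cycle_last = cycles;
	d->sec = sec;
	d->nsec = nsec << d->shift;
	atomic_fetch_add(&d->seq, 1);
}

/* Reader: retry until a consistent snapshot has been observed. */
static struct demo_ts demo_read(struct demo_data *d, uint64_t cycles)
{
	struct demo_ts ts;
	unsigned int seq;
	uint64_t ns;

	do {
		while ((seq = atomic_load(&d->seq)) & 1)
			;	/* update in progress */
		ns = d->nsec;
		if (cycles > d->cycle_last)
			ns += (cycles - d->cycle_last) * d->mult;
		ns >>= d->shift;
		ts.sec = d->sec;
	} while (atomic_load(&d->seq) != seq);

	/* Fold whole seconds outside the retry loop, as do_hres() does. */
	while (ns >= DEMO_NSEC_PER_SEC) {
		ns -= DEMO_NSEC_PER_SEC;
		ts.sec++;
	}
	ts.nsec = ns;

	return ts;
}
```

With mult = 256 and shift = 8 one cycle corresponds to one nanosecond, so a base of 5 s + 999999000 ns read 2000 cycles later should come back as 6 s + 1000 ns.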
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
+static __always_inline notrace void vdso_write_end(struct vdso_data *vd) +{
Rather than marking every single function in here as "notrace", I think it would be more robust to remove the '-pg' flag in the CFLAGS used for compiling the vdso files.
Arnd
On 31/05/2019 09:19, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
+static __always_inline notrace void vdso_write_end(struct vdso_data *vd) +{
Rather than marking every single function in here as "notrace", I think it would be more robust to remove the '-pg' flag in the CFLAGS used for compiling the vdso files.
All the architectures that I added to this patchset are already compiled with $(CC_FLAGS_FTRACE), hence I think I just forgot to remove the "notrace" around the code. Will fix in v7.
Arnd
On Thu, May 30, 2019 at 03:15:14PM +0100, Vincenzo Frascino wrote:
In the last few years we assisted to an explosion of vdso implementations that mostly share similar code.
This doesn't make much sense. Perhaps: "In the last few years we have seen an explosion in vdso..." ?
Huw.
On 10/06/2019 10:31, Huw Davies wrote:
On Thu, May 30, 2019 at 03:15:14PM +0100, Vincenzo Frascino wrote:
In the last few years we assisted to an explosion of vdso implementations that mostly share similar code.
This doesn't make much sense. Perhaps: "In the last few years we have seen an explosion in vdso..." ?
Thanks for this, I will fix in v7.
Huw.
With the definition of the unified vDSO library the implementations of update_vsyscall and update_vsyscall_tz became quite similar across architectures.
Define a unified implementation of this two functions in kernel/vdso and provide the bindings that can be implemented by every architecture that takes advantage of the unified vDSO library.
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- include/asm-generic/vdso/vsyscall.h | 56 +++++++++++ include/linux/timekeeper_internal.h | 9 ++ include/vdso/vsyscall.h | 11 +++ kernel/Makefile | 1 + kernel/vdso/Makefile | 2 + kernel/vdso/vsyscall.c | 139 ++++++++++++++++++++++++++++ 6 files changed, 218 insertions(+) create mode 100644 include/asm-generic/vdso/vsyscall.h create mode 100644 include/vdso/vsyscall.h create mode 100644 kernel/vdso/Makefile create mode 100644 kernel/vdso/vsyscall.c
diff --git a/include/asm-generic/vdso/vsyscall.h b/include/asm-generic/vdso/vsyscall.h new file mode 100644 index 000000000000..9a4b9fbcc9b6 --- /dev/null +++ b/include/asm-generic/vdso/vsyscall.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_GENERIC_VSYSCALL_H +#define __ASM_GENERIC_VSYSCALL_H + +#ifndef __ASSEMBLY__ + +#ifndef __arch_get_k_vdso_data +static __always_inline +struct vdso_data *__arch_get_k_vdso_data(void) +{ + return NULL; +} +#endif /* __arch_get_k_vdso_data */ + +#ifndef __arch_update_vdso_data +static __always_inline +int __arch_update_vdso_data(void) +{ + return 0; +} +#endif /* __arch_update_vdso_data */ + +#ifndef __arch_get_clock_mode +static __always_inline +int __arch_get_clock_mode(struct timekeeper *tk) +{ + return 0; +} +#endif /* __arch_get_clock_mode */ + +#ifndef __arch_use_vsyscall +static __always_inline +int __arch_use_vsyscall(struct vdso_data *vdata) +{ + return 1; +} +#endif /* __arch_use_vsyscall */ + +#ifndef __arch_update_vsyscall +static __always_inline +void __arch_update_vsyscall(struct vdso_data *vdata, + struct timekeeper *tk) +{ +} +#endif /* __arch_update_vsyscall */ + +#ifndef __arch_sync_vdso_data +static __always_inline +void __arch_sync_vdso_data(struct vdso_data *vdata) +{ +} +#endif /* __arch_sync_vdso_data */ + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_GENERIC_VSYSCALL_H */ diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h index 7acb953298a7..8177e75a71eb 100644 --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -135,9 +135,18 @@ struct timekeeper {
#ifdef CONFIG_GENERIC_TIME_VSYSCALL
+#ifdef CONFIG_HAVE_GENERIC_VDSO + +void update_vsyscall(struct timekeeper *tk); +void update_vsyscall_tz(void); + +#else + extern void update_vsyscall(struct timekeeper *tk); extern void update_vsyscall_tz(void);
+#endif /* CONFIG_HAVE_GENERIC_VDSO */ + #else
static inline void update_vsyscall(struct timekeeper *tk) diff --git a/include/vdso/vsyscall.h b/include/vdso/vsyscall.h new file mode 100644 index 000000000000..2c6134e0c23d --- /dev/null +++ b/include/vdso/vsyscall.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __VDSO_VSYSCALL_H +#define __VDSO_VSYSCALL_H + +#ifndef __ASSEMBLY__ + +#include <asm/vdso/vsyscall.h> + +#endif /* !__ASSEMBLY__ */ + +#endif /* __VDSO_VSYSCALL_H */ diff --git a/kernel/Makefile b/kernel/Makefile index 33824f0385b3..56a98ebb7772 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -48,6 +48,7 @@ obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o obj-$(CONFIG_FREEZER) += freezer.o obj-$(CONFIG_PROFILING) += profile.o obj-$(CONFIG_STACKTRACE) += stacktrace.o +obj-$(CONFIG_HAVE_GENERIC_VDSO) += vdso/ obj-y += time/ obj-$(CONFIG_FUTEX) += futex.o obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o diff --git a/kernel/vdso/Makefile b/kernel/vdso/Makefile new file mode 100644 index 000000000000..ad0d3b1a475c --- /dev/null +++ b/kernel/vdso/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_HAVE_GENERIC_VDSO) += vsyscall.o diff --git a/kernel/vdso/vsyscall.c b/kernel/vdso/vsyscall.c new file mode 100644 index 000000000000..49409eece728 --- /dev/null +++ b/kernel/vdso/vsyscall.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2019 ARM Ltd. + * + * Generic implementation of update_vsyscall and update_vsyscall_tz. + */ + +#include <linux/hrtimer.h> +#include <linux/timekeeper_internal.h> +#include <vdso/datapage.h> +#include <vdso/helpers.h> +#include <vdso/vsyscall.h> + +void update_vsyscall(struct timekeeper *tk) +{ + struct vdso_data *vdata = __arch_get_k_vdso_data(); + struct vdso_timestamp *vdso_ts; + u64 nsec; + + if (__arch_update_vdso_data()) { + /* + * Some architectures might want to skip the update of the + * data page. 
+ */ + return; + } + + /* copy vsyscall data */ + vdso_write_begin(vdata); + + vdata[CS_HRES_COARSE].clock_mode = __arch_get_clock_mode(tk); + vdata[CS_RAW].clock_mode = __arch_get_clock_mode(tk); + + /* CLOCK_REALTIME_COARSE */ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME_COARSE]; + vdso_ts->sec = tk->xtime_sec; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; + /* CLOCK_MONOTONIC_COARSE */ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC_COARSE]; + vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; + nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; + nsec = nsec + tk->wall_to_monotonic.tv_nsec; + while (nsec >= NSEC_PER_SEC) { + nsec = nsec - NSEC_PER_SEC; + vdso_ts->sec++; + } + vdso_ts->nsec = nsec; + + if (__arch_use_vsyscall(vdata)) { + vdata[CS_HRES_COARSE].cycle_last = + tk->tkr_mono.cycle_last; + vdata[CS_HRES_COARSE].mask = + tk->tkr_mono.mask; + vdata[CS_HRES_COARSE].mult = + tk->tkr_mono.mult; + vdata[CS_HRES_COARSE].shift = + tk->tkr_mono.shift; + vdata[CS_RAW].cycle_last = + tk->tkr_raw.cycle_last; + vdata[CS_RAW].mask = + tk->tkr_raw.mask; + vdata[CS_RAW].mult = + tk->tkr_raw.mult; + vdata[CS_RAW].shift = + tk->tkr_raw.shift; + /* CLOCK_REALTIME */ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME]; + vdso_ts->sec = tk->xtime_sec; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec; + /* CLOCK_MONOTONIC */ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC]; + vdso_ts->sec = tk->xtime_sec + + tk->wall_to_monotonic.tv_sec; + nsec = tk->tkr_mono.xtime_nsec; + nsec = nsec + + ((u64)tk->wall_to_monotonic.tv_nsec << + tk->tkr_mono.shift); + while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { + nsec = nsec - + (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); + vdso_ts->sec++; + } + vdso_ts->nsec = nsec; + /* CLOCK_MONOTONIC_RAW */ + vdso_ts = + &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW]; + vdso_ts->sec = tk->raw_sec; + vdso_ts->nsec = tk->tkr_raw.xtime_nsec; + /* CLOCK_BOOTTIME 
*/ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME]; + vdso_ts->sec = tk->xtime_sec + + tk->wall_to_monotonic.tv_sec; + nsec = tk->tkr_mono.xtime_nsec; + nsec = nsec + + ((u64)(tk->wall_to_monotonic.tv_nsec + + ktime_to_ns(tk->offs_boot)) << + tk->tkr_mono.shift); + while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { + nsec = nsec - + (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); + vdso_ts->sec++; + } + vdso_ts->nsec = nsec; + /* CLOCK_TAI */ + vdso_ts = + &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI]; + vdso_ts->sec = tk->xtime_sec + (s64)tk->tai_offset; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec; + + /* + * Read without the seqlock held by clock_getres(). + * Note: No need to have a second copy. + */ + WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution); + } + + __arch_update_vsyscall(vdata, tk); + + vdso_write_end(vdata); + + __arch_sync_vdso_data(vdata); +} + +void update_vsyscall_tz(void) +{ + struct vdso_data *vdata = __arch_get_k_vdso_data(); + + if (__arch_use_vsyscall(vdata)) { + vdata[CS_HRES_COARSE].tz_minuteswest = sys_tz.tz_minuteswest; + vdata[CS_HRES_COARSE].tz_dsttime = sys_tz.tz_dsttime; + } + + __arch_sync_vdso_data(vdata); +}
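The binding headers in this patch rely on a preprocessor idiom: the architecture defines its helper and a macro of the generic name, and the asm-generic header only supplies a default when that macro is not defined. A minimal single-file sketch of the idiom follows. The demo_* names are hypothetical stand-ins for the __arch_* bindings, and the "arch" and "generic" parts, which would live in separate headers, are placed in one file for brevity.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct vdso_data. */
struct demo_vdata {
	int clock_mode;			/* non-zero: fall back to the syscall */
	unsigned long long mask;
};

/* "arch" part: provide an implementation and claim the generic name. */
static inline int demo_arm64_use_vsyscall(struct demo_vdata *vd)
{
	return !vd->clock_mode;
}
#define demo_arch_use_vsyscall demo_arm64_use_vsyscall

/* "generic" part: defaults, used only if the arch did not override. */
#ifndef demo_arch_use_vsyscall
static inline int demo_arch_use_vsyscall(struct demo_vdata *vd)
{
	(void)vd;
	return 1;
}
#endif

#ifndef demo_arch_update_vsyscall
static inline void demo_arch_update_vsyscall(struct demo_vdata *vd)
{
	(void)vd;	/* default: nothing to do */
}
#endif

/* Generic caller: written once, against the generic names only. */
static int demo_update(struct demo_vdata *vd, unsigned long long mask)
{
	int updated = 0;

	assert(vd);

	if (demo_arch_use_vsyscall(vd)) {
		vd->mask = mask;
		updated = 1;
	}
	demo_arch_update_vsyscall(vd);

	return updated;
}
```

The ordering matters in the same way as in the arm64 header: the override macros have to be visible before the defaults are parsed, which is why the asm-generic header is included after the arch definitions.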
On Thu, May 30, 2019 at 03:15:15PM +0100, Vincenzo Frascino wrote:
With the definition of the unified vDSO library the implementations of update_vsyscall and update_vsyscall_tz became quite similar across architectures.
Define a unified implementation of this two functions in kernel/vdso and
... of these two functions ...
provide the bindings that can be implemented by every architecture that takes advantage of the unified vDSO library.
On 10/06/2019 10:34, Huw Davies wrote:
On Thu, May 30, 2019 at 03:15:15PM +0100, Vincenzo Frascino wrote:
With the definition of the unified vDSO library the implementations of update_vsyscall and update_vsyscall_tz became quite similar across architectures.
Define a unified implementation of this two functions in kernel/vdso and
... of these two functions ...
Thanks for this, I will fix in v7.
provide the bindings that can be implemented by every architecture that takes advantage of the unified vDSO library.
On Thu, 30 May 2019, Vincenzo Frascino wrote:
- if (__arch_use_vsyscall(vdata)) {
vdata[CS_HRES_COARSE].cycle_last =
tk->tkr_mono.cycle_last;
vdata[CS_HRES_COARSE].mask =
tk->tkr_mono.mask;
vdata[CS_HRES_COARSE].mult =
tk->tkr_mono.mult;
These line breaks make it really hard to read. Can you fold in the patch below please?
Thanks,
tglx 8<----------- --- a/kernel/vdso/vsyscall.c +++ b/kernel/vdso/vsyscall.c @@ -11,6 +11,66 @@ #include <vdso/helpers.h> #include <vdso/vsyscall.h>
+static inline void update_vdata(struct vdso_data *vdata, struct timekeeper *tk) +{ + struct vdso_timestamp *vdso_ts; + u64 nsec; + + vdata[CS_HRES_COARSE].cycle_last = tk->tkr_mono.cycle_last; + vdata[CS_HRES_COARSE].mask = tk->tkr_mono.mask; + vdata[CS_HRES_COARSE].mult = tk->tkr_mono.mult; + vdata[CS_HRES_COARSE].shift = tk->tkr_mono.shift; + vdata[CS_RAW].cycle_last = tk->tkr_raw.cycle_last; + vdata[CS_RAW].mask = tk->tkr_raw.mask; + vdata[CS_RAW].mult = tk->tkr_raw.mult; + vdata[CS_RAW].shift = tk->tkr_raw.shift; + + /* CLOCK_REALTIME */ + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME]; + vdso_ts->sec = tk->xtime_sec; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec; + + /* CLOCK_MONOTONIC */ + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC]; + vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; + + nsec = tk->tkr_mono.xtime_nsec; + nsec += ((u64)tk->wall_to_monotonic.tv_nsec << tk->tkr_mono.shift); + while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { + nsec -= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); + vdso_ts->sec++; + } + vdso_ts->nsec = nsec; + + /* CLOCK_MONOTONIC_RAW */ + vdso_ts = &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW]; + vdso_ts->sec = tk->raw_sec; + vdso_ts->nsec = tk->tkr_raw.xtime_nsec; + + /* CLOCK_BOOTTIME */ + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME]; + vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; + nsec = tk->tkr_mono.xtime_nsec; + nsec += ((u64)(tk->wall_to_monotonic.tv_nsec + + ktime_to_ns(tk->offs_boot)) << tk->tkr_mono.shift); + while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { + nsec -= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); + vdso_ts->sec++; + } + vdso_ts->nsec = nsec; + + /* CLOCK_TAI */ + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI]; + vdso_ts->sec = tk->xtime_sec + (s64)tk->tai_offset; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec; + + /* + * Read without the seqlock held by clock_getres(). + * Note: No need to have a second copy. 
+ */ + WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution); +} + void update_vsyscall(struct timekeeper *tk) { struct vdso_data *vdata = __arch_get_k_vdso_data(); @@ -32,92 +92,23 @@ void update_vsyscall(struct timekeeper * vdata[CS_RAW].clock_mode = __arch_get_clock_mode(tk);
/* CLOCK_REALTIME_COARSE */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME_COARSE]; - vdso_ts->sec = tk->xtime_sec; - vdso_ts->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME_COARSE]; + vdso_ts->sec = tk->xtime_sec; + vdso_ts->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; + /* CLOCK_MONOTONIC_COARSE */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC_COARSE]; - vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; - nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; - nsec = nsec + tk->wall_to_monotonic.tv_nsec; + vdso_ts = &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC_COARSE]; + vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec; + nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift; + nsec = nsec + tk->wall_to_monotonic.tv_nsec; while (nsec >= NSEC_PER_SEC) { nsec = nsec - NSEC_PER_SEC; vdso_ts->sec++; } - vdso_ts->nsec = nsec; + vdso_ts->nsec = nsec;
- if (__arch_use_vsyscall(vdata)) { - vdata[CS_HRES_COARSE].cycle_last = - tk->tkr_mono.cycle_last; - vdata[CS_HRES_COARSE].mask = - tk->tkr_mono.mask; - vdata[CS_HRES_COARSE].mult = - tk->tkr_mono.mult; - vdata[CS_HRES_COARSE].shift = - tk->tkr_mono.shift; - vdata[CS_RAW].cycle_last = - tk->tkr_raw.cycle_last; - vdata[CS_RAW].mask = - tk->tkr_raw.mask; - vdata[CS_RAW].mult = - tk->tkr_raw.mult; - vdata[CS_RAW].shift = - tk->tkr_raw.shift; - /* CLOCK_REALTIME */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_REALTIME]; - vdso_ts->sec = tk->xtime_sec; - vdso_ts->nsec = tk->tkr_mono.xtime_nsec; - /* CLOCK_MONOTONIC */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC]; - vdso_ts->sec = tk->xtime_sec + - tk->wall_to_monotonic.tv_sec; - nsec = tk->tkr_mono.xtime_nsec; - nsec = nsec + - ((u64)tk->wall_to_monotonic.tv_nsec << - tk->tkr_mono.shift); - while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { - nsec = nsec - - (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); - vdso_ts->sec++; - } - vdso_ts->nsec = nsec; - /* CLOCK_MONOTONIC_RAW */ - vdso_ts = - &vdata[CS_RAW].basetime[CLOCK_MONOTONIC_RAW]; - vdso_ts->sec = tk->raw_sec; - vdso_ts->nsec = tk->tkr_raw.xtime_nsec; - /* CLOCK_BOOTTIME */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_BOOTTIME]; - vdso_ts->sec = tk->xtime_sec + - tk->wall_to_monotonic.tv_sec; - nsec = tk->tkr_mono.xtime_nsec; - nsec = nsec + - ((u64)(tk->wall_to_monotonic.tv_nsec + - ktime_to_ns(tk->offs_boot)) << - tk->tkr_mono.shift); - while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) { - nsec = nsec - - (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift); - vdso_ts->sec++; - } - vdso_ts->nsec = nsec; - /* CLOCK_TAI */ - vdso_ts = - &vdata[CS_HRES_COARSE].basetime[CLOCK_TAI]; - vdso_ts->sec = tk->xtime_sec + (s64)tk->tai_offset; - vdso_ts->nsec = tk->tkr_mono.xtime_nsec; - - /* - * Read without the seqlock held by clock_getres(). - * Note: No need to have a second copy. 
- */ - WRITE_ONCE(vdata[CS_HRES_COARSE].hrtimer_res, hrtimer_resolution); - } + if (__arch_use_vsyscall(vdata)) + update_vdata(vdata, tk);
__arch_update_vsyscall(vdata, tk);
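For readers following the CLOCK_MONOTONIC computation in the patch above: the high resolution base times keep their nanoseconds in shifted form, so the carry loop compares against NSEC_PER_SEC << shift rather than NSEC_PER_SEC. A standalone sketch of that arithmetic, assuming hypothetical demo_* names and plain integer parameters in place of the struct timekeeper fields:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for struct vdso_timestamp. */
struct demo_timestamp {
	uint64_t sec;
	uint64_t nsec;	/* shifted nanoseconds */
};

/*
 * xtime_nsec is already shifted (it carries sub-nanosecond bits),
 * wtm_nsec is a plain nanosecond count and is shifted before adding.
 */
static struct demo_timestamp
demo_monotonic_base(uint64_t xtime_sec, uint64_t xtime_nsec,
		    int64_t wtm_sec, uint32_t wtm_nsec, uint32_t shift)
{
	struct demo_timestamp ts;
	uint64_t one_sec;

	assert(shift < 32 && wtm_nsec < DEMO_NSEC_PER_SEC);
	one_sec = DEMO_NSEC_PER_SEC << shift;

	ts.sec = (uint64_t)((int64_t)xtime_sec + wtm_sec);
	ts.nsec = xtime_nsec + ((uint64_t)wtm_nsec << shift);

	/* Carry whole (shifted) seconds into the seconds field. */
	while (ts.nsec >= one_sec) {
		ts.nsec -= one_sec;
		ts.sec++;
	}

	return ts;
}
```

Because both operands are below one second, the loop runs at most once here. The sub-nanosecond bits of xtime_nsec survive the carry untouched, which is the reason for doing the comparison in shifted units.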
Hi Thomas,
On 6/14/19 12:10 PM, Thomas Gleixner wrote:
On Thu, 30 May 2019, Vincenzo Frascino wrote:
- if (__arch_use_vsyscall(vdata)) {
vdata[CS_HRES_COARSE].cycle_last =
tk->tkr_mono.cycle_last;
vdata[CS_HRES_COARSE].mask =
tk->tkr_mono.mask;
vdata[CS_HRES_COARSE].mult =
tk->tkr_mono.mult;
These line breaks make it really hard to read. Can you fold in the patch below please?
Thanks for this. I will do it in v7.
On Fri, 14 Jun 2019, Vincenzo Frascino wrote:
On 6/14/19 12:10 PM, Thomas Gleixner wrote:
On Thu, 30 May 2019, Vincenzo Frascino wrote:
- if (__arch_use_vsyscall(vdata)) {
vdata[CS_HRES_COARSE].cycle_last =
tk->tkr_mono.cycle_last;
vdata[CS_HRES_COARSE].mask =
tk->tkr_mono.mask;
vdata[CS_HRES_COARSE].mult =
tk->tkr_mono.mult;
These line breaks make it really hard to read. Can you fold in the patch below please?
Thanks for this. I will do it in v7.
Talking about v7. I'd like to get this into 5.3. That means you'd have to rebase it on
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git hyperv-next
to avoid the hyperv conflict. I'll sort out with the hyperv folks how I can get these bits as a base for a tip branch which holds all the vdso pieces.
Thanks,
tglx
On 6/14/19 1:19 PM, Thomas Gleixner wrote:
On Fri, 14 Jun 2019, Vincenzo Frascino wrote:
On 6/14/19 12:10 PM, Thomas Gleixner wrote:
On Thu, 30 May 2019, Vincenzo Frascino wrote:
- if (__arch_use_vsyscall(vdata)) {
vdata[CS_HRES_COARSE].cycle_last =
tk->tkr_mono.cycle_last;
vdata[CS_HRES_COARSE].mask =
tk->tkr_mono.mask;
vdata[CS_HRES_COARSE].mult =
tk->tkr_mono.mult;
These line breaks make it really hard to read. Can you fold in the patch below please?
Thanks for this. I will do it in v7.
Talking about v7. I'd like to get this into 5.3. That means you'd have to rebase it on
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git hyperv-next
to avoid the hyperv conflict. I'll sort out with the hyperv folks how I can get these bits as a base for a tip branch which holds all the vdso pieces.
Ok, I will rebase and test the patches against the hyperv-next branch. Could you please let me know when all the bits are sorted?
Thanks,
tglx
On Fri, 14 Jun 2019, Vincenzo Frascino wrote:
On 6/14/19 1:19 PM, Thomas Gleixner wrote:
On Fri, 14 Jun 2019, Vincenzo Frascino wrote:
On 6/14/19 12:10 PM, Thomas Gleixner wrote:
On Thu, 30 May 2019, Vincenzo Frascino wrote:
- if (__arch_use_vsyscall(vdata)) {
vdata[CS_HRES_COARSE].cycle_last =
tk->tkr_mono.cycle_last;
vdata[CS_HRES_COARSE].mask =
tk->tkr_mono.mask;
vdata[CS_HRES_COARSE].mult =
tk->tkr_mono.mult;
These line breaks make it really hard to read. Can you fold in the patch below please?
Thanks for this. I will do it in v7.
Talking about v7. I'd like to get this into 5.3. That means you'd have to rebase it on
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git hyperv-next
to avoid the hyperv conflict. I'll sort out with the hyperv folks how I can get these bits as a base for a tip branch which holds all the vdso pieces.
Ok, I will rebase and test the patches against the hyperv-next branch. Could you please let me know when all the bits are sorted?
Don't worry. Just post it against that branch and I'll sort out the logistics independently.
Thanks,
tglx
To take advantage of the commonly defined vdso interface for gettimeofday the architectural code requires an adaptation.
Re-implement the gettimeofday vdso in C in order to use lib/vdso.
With the new implementation arm64 gains support for CLOCK_BOOTTIME and CLOCK_TAI.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm64/Kconfig | 2 + arch/arm64/include/asm/vdso/gettimeofday.h | 84 ++++++ arch/arm64/include/asm/vdso/vsyscall.h | 53 ++++ arch/arm64/include/asm/vdso_datapage.h | 48 --- arch/arm64/kernel/asm-offsets.c | 33 +- arch/arm64/kernel/vdso.c | 51 +--- arch/arm64/kernel/vdso/Makefile | 34 ++- arch/arm64/kernel/vdso/gettimeofday.S | 334 --------------------- arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++ 9 files changed, 221 insertions(+), 446 deletions(-) create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h delete mode 100644 arch/arm64/include/asm/vdso_datapage.h delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 697ea0510729..952c9f8cf3b8 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -107,6 +107,7 @@ config ARM64 select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL + select GENERIC_GETTIMEOFDAY select HANDLE_DOMAIN_IRQ select HARDIRQS_SW_RESEND select HAVE_PCI @@ -160,6 +161,7 @@ config ARM64 select HAVE_SYSCALL_TRACEPOINTS select HAVE_KPROBES select HAVE_KRETPROBES + select HAVE_GENERIC_VDSO select IOMMU_DMA if IOMMU_SUPPORT select IRQ_DOMAIN select IRQ_FORCED_THREADING diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h new file mode 100644 index 000000000000..dcfe408b9b11 --- /dev/null +++ b/arch/arm64/include/asm/vdso/gettimeofday.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 ARM Limited + */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#ifndef __ASSEMBLY__ + +#include <asm/unistd.h> +#include <uapi/linux/time.h> + +static __always_inline notrace int gettimeofday_fallback( + struct __kernel_old_timeval *_tv, + struct timezone *_tz) +{ + register struct timezone *tz asm("x1") = _tz; + register struct __kernel_old_timeval *tv asm("x0") = _tv; + register long ret asm ("x0"); + register long nr asm("x8") = __NR_gettimeofday; + + asm volatile( + " svc #0\n" + : "=r" (ret) + : "r" (tv), "r" (tz), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace long clock_gettime_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("x1") = _ts; + register clockid_t clkid asm("x0") = _clkid; + register long ret asm ("x0"); + register long nr asm("x8") = __NR_clock_gettime; + + asm volatile( + " svc #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace int clock_getres_fallback( + clockid_t _clkid, + struct 
__kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("x1") = _ts; + register clockid_t clkid asm("x0") = _clkid; + register long ret asm ("x0"); + register long nr asm("x8") = __NR_clock_getres; + + asm volatile( + " svc #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace u64 __arch_get_hw_counter(s32 clock_mode) +{ + u64 res; + + asm volatile("mrs %0, cntvct_el0" : "=r" (res) :: "memory"); + + return res; +} + +static __always_inline +notrace const struct vdso_data *__arch_get_vdso_data(void) +{ + return _vdso_data; +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h new file mode 100644 index 000000000000..0c731bfc7c8c --- /dev/null +++ b/arch/arm64/include/asm/vdso/vsyscall.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_VSYSCALL_H +#define __ASM_VDSO_VSYSCALL_H + +#ifndef __ASSEMBLY__ + +#include <linux/timekeeper_internal.h> +#include <vdso/datapage.h> + +#define VDSO_PRECISION_MASK ~(0xFF00ULL<<48) + +extern struct vdso_data *vdso_data; + +/* + * Update the vDSO data page to keep in sync with kernel timekeeping. 
+ */ +static __always_inline +struct vdso_data *__arm64_get_k_vdso_data(void) +{ + return vdso_data; +} +#define __arch_get_k_vdso_data __arm64_get_k_vdso_data + +static __always_inline +int __arm64_get_clock_mode(struct timekeeper *tk) +{ + u32 use_syscall = !tk->tkr_mono.clock->archdata.vdso_direct; + + return use_syscall; +} +#define __arch_get_clock_mode __arm64_get_clock_mode + +static __always_inline +int __arm64_use_vsyscall(struct vdso_data *vdata) +{ + return !vdata[CS_HRES_COARSE].clock_mode; +} +#define __arch_use_vsyscall __arm64_use_vsyscall + +static __always_inline +void __arm64_update_vsyscall(struct vdso_data *vdata, struct timekeeper *tk) +{ + vdata[CS_HRES_COARSE].mask = VDSO_PRECISION_MASK; + vdata[CS_RAW].mask = VDSO_PRECISION_MASK; +} +#define __arch_update_vsyscall __arm64_update_vsyscall + +/* The asm-generic header needs to be included after the definitions above */ +#include <asm-generic/vdso/vsyscall.h> + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_VSYSCALL_H */ diff --git a/arch/arm64/include/asm/vdso_datapage.h b/arch/arm64/include/asm/vdso_datapage.h deleted file mode 100644 index f89263c8e11a..000000000000 --- a/arch/arm64/include/asm/vdso_datapage.h +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Copyright (C) 2012 ARM Limited - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. 
- */ -#ifndef __ASM_VDSO_DATAPAGE_H -#define __ASM_VDSO_DATAPAGE_H - -#ifdef __KERNEL__ - -#ifndef __ASSEMBLY__ - -struct vdso_data { - __u64 cs_cycle_last; /* Timebase at clocksource init */ - __u64 raw_time_sec; /* Raw time */ - __u64 raw_time_nsec; - __u64 xtime_clock_sec; /* Kernel time */ - __u64 xtime_clock_nsec; - __u64 xtime_coarse_sec; /* Coarse time */ - __u64 xtime_coarse_nsec; - __u64 wtm_clock_sec; /* Wall to monotonic time */ - __u64 wtm_clock_nsec; - __u32 tb_seq_count; /* Timebase sequence counter */ - /* cs_* members must be adjacent and in this order (ldp accesses) */ - __u32 cs_mono_mult; /* NTP-adjusted clocksource multiplier */ - __u32 cs_shift; /* Clocksource shift (mono = raw) */ - __u32 cs_raw_mult; /* Raw clocksource multiplier */ - __u32 tz_minuteswest; /* Whacky timezone stuff */ - __u32 tz_dsttime; - __u32 use_syscall; - __u32 hrtimer_res; -}; - -#endif /* !__ASSEMBLY__ */ - -#endif /* __KERNEL__ */ - -#endif /* __ASM_VDSO_DATAPAGE_H */ diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 947e39896e28..9e4b7ccbab2f 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -25,13 +25,13 @@ #include <linux/kvm_host.h> #include <linux/preempt.h> #include <linux/suspend.h> +#include <vdso/datapage.h> #include <asm/cpufeature.h> #include <asm/fixmap.h> #include <asm/thread_info.h> #include <asm/memory.h> #include <asm/smp_plat.h> #include <asm/suspend.h> -#include <asm/vdso_datapage.h> #include <linux/kbuild.h> #include <linux/arm-smccc.h>
@@ -100,17 +100,28 @@ int main(void) DEFINE(CLOCK_COARSE_RES, LOW_RES_NSEC); DEFINE(NSEC_PER_SEC, NSEC_PER_SEC); BLANK(); - DEFINE(VDSO_CS_CYCLE_LAST, offsetof(struct vdso_data, cs_cycle_last)); - DEFINE(VDSO_RAW_TIME_SEC, offsetof(struct vdso_data, raw_time_sec)); - DEFINE(VDSO_XTIME_CLK_SEC, offsetof(struct vdso_data, xtime_clock_sec)); - DEFINE(VDSO_XTIME_CRS_SEC, offsetof(struct vdso_data, xtime_coarse_sec)); - DEFINE(VDSO_XTIME_CRS_NSEC, offsetof(struct vdso_data, xtime_coarse_nsec)); - DEFINE(VDSO_WTM_CLK_SEC, offsetof(struct vdso_data, wtm_clock_sec)); - DEFINE(VDSO_TB_SEQ_COUNT, offsetof(struct vdso_data, tb_seq_count)); - DEFINE(VDSO_CS_MONO_MULT, offsetof(struct vdso_data, cs_mono_mult)); - DEFINE(VDSO_CS_SHIFT, offsetof(struct vdso_data, cs_shift)); + DEFINE(VDSO_SEQ, offsetof(struct vdso_data, seq)); + DEFINE(VDSO_CLK_MODE, offsetof(struct vdso_data, clock_mode)); + DEFINE(VDSO_CYCLE_LAST, offsetof(struct vdso_data, cycle_last)); + DEFINE(VDSO_MASK, offsetof(struct vdso_data, mask)); + DEFINE(VDSO_MULT, offsetof(struct vdso_data, mult)); + DEFINE(VDSO_SHIFT, offsetof(struct vdso_data, shift)); + DEFINE(VDSO_REALTIME_SEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME].sec)); + DEFINE(VDSO_REALTIME_NSEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME].nsec)); + DEFINE(VDSO_MONO_SEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC].sec)); + DEFINE(VDSO_MONO_NSEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC].nsec)); + DEFINE(VDSO_MONO_RAW_SEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_RAW].sec)); + DEFINE(VDSO_MONO_RAW_NSEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_RAW].nsec)); + DEFINE(VDSO_BOOTTIME_SEC, offsetof(struct vdso_data, basetime[CLOCK_BOOTTIME].sec)); + DEFINE(VDSO_BOOTTIME_NSEC, offsetof(struct vdso_data, basetime[CLOCK_BOOTTIME].nsec)); + DEFINE(VDSO_TAI_SEC, offsetof(struct vdso_data, basetime[CLOCK_TAI].sec)); + DEFINE(VDSO_TAI_NSEC, offsetof(struct vdso_data, basetime[CLOCK_TAI].nsec)); + 
DEFINE(VDSO_RT_COARSE_SEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME_COARSE].sec)); + DEFINE(VDSO_RT_COARSE_NSEC, offsetof(struct vdso_data, basetime[CLOCK_REALTIME_COARSE].nsec)); + DEFINE(VDSO_MONO_COARSE_SEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_COARSE].sec)); + DEFINE(VDSO_MONO_COARSE_NSEC, offsetof(struct vdso_data, basetime[CLOCK_MONOTONIC_COARSE].nsec)); DEFINE(VDSO_TZ_MINWEST, offsetof(struct vdso_data, tz_minuteswest)); - DEFINE(VDSO_USE_SYSCALL, offsetof(struct vdso_data, use_syscall)); + DEFINE(VDSO_TZ_DSTTIME, offsetof(struct vdso_data, tz_dsttime)); BLANK(); DEFINE(TVAL_TV_SEC, offsetof(struct timeval, tv_sec)); DEFINE(TSPEC_TV_SEC, offsetof(struct timespec, tv_sec)); diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 8074cbd3a3a8..23c38303a52a 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -31,11 +31,13 @@ #include <linux/slab.h> #include <linux/timekeeper_internal.h> #include <linux/vmalloc.h> +#include <vdso/datapage.h> +#include <vdso/helpers.h> +#include <vdso/vsyscall.h>
#include <asm/cacheflush.h> #include <asm/signal32.h> #include <asm/vdso.h> -#include <asm/vdso_datapage.h>
extern char vdso_start[], vdso_end[]; static unsigned long vdso_pages __ro_after_init; @@ -44,10 +46,10 @@ static unsigned long vdso_pages __ro_after_init; * The vDSO data page. */ static union { - struct vdso_data data; + struct vdso_data data[CS_BASES]; u8 page[PAGE_SIZE]; } vdso_data_store __page_aligned_data; -struct vdso_data *vdso_data = &vdso_data_store.data; +struct vdso_data *vdso_data = vdso_data_store.data;
#ifdef CONFIG_COMPAT /* @@ -280,46 +282,3 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, up_write(&mm->mmap_sem); return PTR_ERR(ret); } - -/* - * Update the vDSO data page to keep in sync with kernel timekeeping. - */ -void update_vsyscall(struct timekeeper *tk) -{ - u32 use_syscall = !tk->tkr_mono.clock->archdata.vdso_direct; - - ++vdso_data->tb_seq_count; - smp_wmb(); - - vdso_data->use_syscall = use_syscall; - vdso_data->xtime_coarse_sec = tk->xtime_sec; - vdso_data->xtime_coarse_nsec = tk->tkr_mono.xtime_nsec >> - tk->tkr_mono.shift; - vdso_data->wtm_clock_sec = tk->wall_to_monotonic.tv_sec; - vdso_data->wtm_clock_nsec = tk->wall_to_monotonic.tv_nsec; - - /* Read without the seqlock held by clock_getres() */ - WRITE_ONCE(vdso_data->hrtimer_res, hrtimer_resolution); - - if (!use_syscall) { - /* tkr_mono.cycle_last == tkr_raw.cycle_last */ - vdso_data->cs_cycle_last = tk->tkr_mono.cycle_last; - vdso_data->raw_time_sec = tk->raw_sec; - vdso_data->raw_time_nsec = tk->tkr_raw.xtime_nsec; - vdso_data->xtime_clock_sec = tk->xtime_sec; - vdso_data->xtime_clock_nsec = tk->tkr_mono.xtime_nsec; - vdso_data->cs_mono_mult = tk->tkr_mono.mult; - vdso_data->cs_raw_mult = tk->tkr_raw.mult; - /* tkr_mono.shift == tkr_raw.shift */ - vdso_data->cs_shift = tk->tkr_mono.shift; - } - - smp_wmb(); - ++vdso_data->tb_seq_count; -} - -void update_vsyscall_tz(void) -{ - vdso_data->tz_minuteswest = sys_tz.tz_minuteswest; - vdso_data->tz_dsttime = sys_tz.tz_dsttime; -} diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index fa230ff09aa1..00bcfd2672cf 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -6,7 +6,12 @@ # Heavily based on the vDSO Makefiles for other archs. #
-obj-vdso := gettimeofday.o note.o sigreturn.o +# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before +# the inclusion of generic Makefile. +ARCH_REL_TYPE_ABS := R_AARCH64_JUMP_SLOT|R_AARCH64_GLOB_DAT|R_AARCH64_ABS64 +include $(srctree)/lib/vdso/Makefile + +obj-vdso := vgettimeofday.o note.o sigreturn.o
# Build rules targets := $(obj-vdso) vdso.so vdso.so.dbg @@ -15,6 +20,24 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ --build-id -n -T
+ccflags-y := -fno-common -fno-builtin -fno-stack-protector +ccflags-y += -DDISABLE_BRANCH_PROFILING + +VDSO_LDFLAGS := -Bsymbolic + +CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os +KBUILD_CFLAGS += $(DISABLE_LTO) +KASAN_SANITIZE := n +UBSAN_SANITIZE := n +OBJECT_FILES_NON_STANDARD := y +KCOV_INSTRUMENT := n + +ifeq ($(c-gettimeofday-y),) +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny +else +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -include $(c-gettimeofday-y) +endif + # Disable gcov profiling for VDSO code GCOV_PROFILE := n
@@ -28,6 +51,7 @@ $(obj)/vdso.o : $(obj)/vdso.so # Link rule for the .so file, .lds has to be first $(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE $(call if_changed,ld) + $(call if_changed,vdso_check)
# Strip rule for the .so file $(obj)/%.so: OBJCOPYFLAGS := -S @@ -42,13 +66,9 @@ quiet_cmd_vdsosym = VDSOSYM $@ include/generated/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE $(call if_changed,vdsosym)
-# Assembly rules for the .S files -$(obj-vdso): %.o: %.S FORCE - $(call if_changed_dep,vdsoas) - # Actual build commands -quiet_cmd_vdsoas = VDSOA $@ - cmd_vdsoas = $(CC) $(a_flags) -c -o $@ $< +quiet_cmd_vdsocc = VDSOCC $@ + cmd_vdsocc = $(CC) $(a_flags) $(c_flags) -c -o $@ $<
# Install commands for the unstripped file quiet_cmd_vdso_install = INSTALL $@ diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S deleted file mode 100644 index 856fee6d3512..000000000000 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ /dev/null @@ -1,334 +0,0 @@ -/* - * Userspace implementations of gettimeofday() and friends. - * - * Copyright (C) 2012 ARM Limited - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. - * - * Author: Will Deacon will.deacon@arm.com - */ - -#include <linux/linkage.h> -#include <asm/asm-offsets.h> -#include <asm/unistd.h> - -#define NSEC_PER_SEC_LO16 0xca00 -#define NSEC_PER_SEC_HI16 0x3b9a - -vdso_data .req x6 -seqcnt .req w7 -w_tmp .req w8 -x_tmp .req x8 - -/* - * Conventions for macro arguments: - * - An argument is write-only if its name starts with "res". - * - All other arguments are read-only, unless otherwise specified. 
- */ - - .macro seqcnt_acquire -9999: ldr seqcnt, [vdso_data, #VDSO_TB_SEQ_COUNT] - tbnz seqcnt, #0, 9999b - dmb ishld - .endm - - .macro seqcnt_check fail - dmb ishld - ldr w_tmp, [vdso_data, #VDSO_TB_SEQ_COUNT] - cmp w_tmp, seqcnt - b.ne \fail - .endm - - .macro syscall_check fail - ldr w_tmp, [vdso_data, #VDSO_USE_SYSCALL] - cbnz w_tmp, \fail - .endm - - .macro get_nsec_per_sec res - mov \res, #NSEC_PER_SEC_LO16 - movk \res, #NSEC_PER_SEC_HI16, lsl #16 - .endm - - /* - * Returns the clock delta, in nanoseconds left-shifted by the clock - * shift. - */ - .macro get_clock_shifted_nsec res, cycle_last, mult - /* Read the virtual counter. */ - isb - mrs x_tmp, cntvct_el0 - /* Calculate cycle delta and convert to ns. */ - sub \res, x_tmp, \cycle_last - /* We can only guarantee 56 bits of precision. */ - movn x_tmp, #0xff00, lsl #48 - and \res, x_tmp, \res - mul \res, \res, \mult - /* - * Fake address dependency from the value computed from the counter - * register to subsequent data page accesses so that the sequence - * locking also orders the read of the counter. - */ - and x_tmp, \res, xzr - add vdso_data, vdso_data, x_tmp - .endm - - /* - * Returns in res_{sec,nsec} the REALTIME timespec, based on the - * "wall time" (xtime) and the clock_mono delta. - */ - .macro get_ts_realtime res_sec, res_nsec, \ - clock_nsec, xtime_sec, xtime_nsec, nsec_to_sec - add \res_nsec, \clock_nsec, \xtime_nsec - udiv x_tmp, \res_nsec, \nsec_to_sec - add \res_sec, \xtime_sec, x_tmp - msub \res_nsec, x_tmp, \nsec_to_sec, \res_nsec - .endm - - /* - * Returns in res_{sec,nsec} the timespec based on the clock_raw delta, - * used for CLOCK_MONOTONIC_RAW. - */ - .macro get_ts_clock_raw res_sec, res_nsec, clock_nsec, nsec_to_sec - udiv \res_sec, \clock_nsec, \nsec_to_sec - msub \res_nsec, \res_sec, \nsec_to_sec, \clock_nsec - .endm - - /* sec and nsec are modified in place. */ - .macro add_ts sec, nsec, ts_sec, ts_nsec, nsec_to_sec - /* Add timespec. 
*/ - add \sec, \sec, \ts_sec - add \nsec, \nsec, \ts_nsec - - /* Normalise the new timespec. */ - cmp \nsec, \nsec_to_sec - b.lt 9999f - sub \nsec, \nsec, \nsec_to_sec - add \sec, \sec, #1 -9999: - cmp \nsec, #0 - b.ge 9998f - add \nsec, \nsec, \nsec_to_sec - sub \sec, \sec, #1 -9998: - .endm - - .macro clock_gettime_return, shift=0 - .if \shift == 1 - lsr x11, x11, x12 - .endif - stp x10, x11, [x1, #TSPEC_TV_SEC] - mov x0, xzr - ret - .endm - - .macro jump_slot jumptable, index, label - .if (. - \jumptable) != 4 * (\index) - .error "Jump slot index mismatch" - .endif - b \label - .endm - - .text - -/* int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); */ -ENTRY(__kernel_gettimeofday) - .cfi_startproc - adr vdso_data, _vdso_data - /* If tv is NULL, skip to the timezone code. */ - cbz x0, 2f - - /* Compute the time of day. */ -1: seqcnt_acquire - syscall_check fail=4f - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - /* w11 = cs_mono_mult, w12 = cs_shift */ - ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] - ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - - get_nsec_per_sec res=x9 - lsl x9, x9, x12 - - get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 - seqcnt_check fail=1b - get_ts_realtime res_sec=x10, res_nsec=x11, \ - clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 - - /* Convert ns to us. */ - mov x13, #1000 - lsl x13, x13, x12 - udiv x11, x11, x13 - stp x10, x11, [x0, #TVAL_TV_SEC] -2: - /* If tz is NULL, return 0. */ - cbz x1, 3f - ldp w4, w5, [vdso_data, #VDSO_TZ_MINWEST] - stp w4, w5, [x1, #TZ_MINWEST] -3: - mov x0, xzr - ret -4: - /* Syscall fallback. 
*/ - mov x8, #__NR_gettimeofday - svc #0 - ret - .cfi_endproc -ENDPROC(__kernel_gettimeofday) - -#define JUMPSLOT_MAX CLOCK_MONOTONIC_COARSE - -/* int __kernel_clock_gettime(clockid_t clock_id, struct timespec *tp); */ -ENTRY(__kernel_clock_gettime) - .cfi_startproc - cmp w0, #JUMPSLOT_MAX - b.hi syscall - adr vdso_data, _vdso_data - adr x_tmp, jumptable - add x_tmp, x_tmp, w0, uxtw #2 - br x_tmp - - ALIGN -jumptable: - jump_slot jumptable, CLOCK_REALTIME, realtime - jump_slot jumptable, CLOCK_MONOTONIC, monotonic - b syscall - b syscall - jump_slot jumptable, CLOCK_MONOTONIC_RAW, monotonic_raw - jump_slot jumptable, CLOCK_REALTIME_COARSE, realtime_coarse - jump_slot jumptable, CLOCK_MONOTONIC_COARSE, monotonic_coarse - - .if (. - jumptable) != 4 * (JUMPSLOT_MAX + 1) - .error "Wrong jumptable size" - .endif - - ALIGN -realtime: - seqcnt_acquire - syscall_check fail=syscall - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - /* w11 = cs_mono_mult, w12 = cs_shift */ - ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] - ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - - /* All computations are done with left-shifted nsecs. */ - get_nsec_per_sec res=x9 - lsl x9, x9, x12 - - get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 - seqcnt_check fail=realtime - get_ts_realtime res_sec=x10, res_nsec=x11, \ - clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 - clock_gettime_return, shift=1 - - ALIGN -monotonic: - seqcnt_acquire - syscall_check fail=syscall - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - /* w11 = cs_mono_mult, w12 = cs_shift */ - ldp w11, w12, [vdso_data, #VDSO_CS_MONO_MULT] - ldp x13, x14, [vdso_data, #VDSO_XTIME_CLK_SEC] - ldp x3, x4, [vdso_data, #VDSO_WTM_CLK_SEC] - - /* All computations are done with left-shifted nsecs. 
*/ - lsl x4, x4, x12 - get_nsec_per_sec res=x9 - lsl x9, x9, x12 - - get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 - seqcnt_check fail=monotonic - get_ts_realtime res_sec=x10, res_nsec=x11, \ - clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9 - - add_ts sec=x10, nsec=x11, ts_sec=x3, ts_nsec=x4, nsec_to_sec=x9 - clock_gettime_return, shift=1 - - ALIGN -monotonic_raw: - seqcnt_acquire - syscall_check fail=syscall - ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST] - /* w11 = cs_raw_mult, w12 = cs_shift */ - ldp w12, w11, [vdso_data, #VDSO_CS_SHIFT] - ldp x13, x14, [vdso_data, #VDSO_RAW_TIME_SEC] - - /* All computations are done with left-shifted nsecs. */ - get_nsec_per_sec res=x9 - lsl x9, x9, x12 - - get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11 - seqcnt_check fail=monotonic_raw - get_ts_clock_raw res_sec=x10, res_nsec=x11, \ - clock_nsec=x15, nsec_to_sec=x9 - - add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9 - clock_gettime_return, shift=1 - - ALIGN -realtime_coarse: - seqcnt_acquire - ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC] - seqcnt_check fail=realtime_coarse - clock_gettime_return - - ALIGN -monotonic_coarse: - seqcnt_acquire - ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC] - ldp x13, x14, [vdso_data, #VDSO_WTM_CLK_SEC] - seqcnt_check fail=monotonic_coarse - - /* Computations are done in (non-shifted) nsecs. */ - get_nsec_per_sec res=x9 - add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9 - clock_gettime_return - - ALIGN -syscall: /* Syscall fallback. 
*/ - mov x8, #__NR_clock_gettime - svc #0 - ret - .cfi_endproc -ENDPROC(__kernel_clock_gettime) - -/* int __kernel_clock_getres(clockid_t clock_id, struct timespec *res); */ -ENTRY(__kernel_clock_getres) - .cfi_startproc - cmp w0, #CLOCK_REALTIME - ccmp w0, #CLOCK_MONOTONIC, #0x4, ne - ccmp w0, #CLOCK_MONOTONIC_RAW, #0x4, ne - b.ne 1f - - adr vdso_data, _vdso_data - ldr w2, [vdso_data, #CLOCK_REALTIME_RES] - b 2f -1: - cmp w0, #CLOCK_REALTIME_COARSE - ccmp w0, #CLOCK_MONOTONIC_COARSE, #0x4, ne - b.ne 4f - ldr x2, 5f -2: - cbz x1, 3f - stp xzr, x2, [x1] - -3: /* res == NULL. */ - mov w0, wzr - ret - -4: /* Syscall fallback. */ - mov x8, #__NR_clock_getres - svc #0 - ret -5: - .quad CLOCK_COARSE_RES - .cfi_endproc -ENDPROC(__kernel_clock_getres) diff --git a/arch/arm64/kernel/vdso/vgettimeofday.c b/arch/arm64/kernel/vdso/vgettimeofday.c new file mode 100644 index 000000000000..bbc83b6e2b1a --- /dev/null +++ b/arch/arm64/kernel/vdso/vgettimeofday.c @@ -0,0 +1,28 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ARM64 userspace implementations of gettimeofday() and similar. + * + * Copyright (C) 2018 ARM Limited + * + */ +#include <linux/time.h> +#include <linux/types.h> + +notrace int __kernel_clock_gettime(clockid_t clock, + struct __kernel_timespec *ts) +{ + return __cvdso_clock_gettime(clock, ts); +} + +notrace int __kernel_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) +{ + return __cvdso_gettimeofday(tv, tz); +} + +notrace int __kernel_clock_getres(clockid_t clock_id, + struct __kernel_timespec *res) +{ + return __cvdso_clock_getres(clock_id, res); +} +
From: Peter Collingbourne pcc@google.com
The vDSO needs to be built with x18 reserved in order to accommodate userspace platform ABIs built on top of Linux that use the register to carry inter-procedural state, as provided for by the AAPCS. An example of such a platform ABI is the one that will be used by an upcoming version of Android.
Although this change is currently a no-op because the arm64 vDSO is implemented in pure assembly, it is necessary to prepare for another change [1] that will add C code to the vDSO.
[1] https://patchwork.kernel.org/patch/10044501/
Signed-off-by: Peter Collingbourne pcc@google.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com Cc: Mark Salyzyn salyzyn@google.com Cc: Will Deacon will.deacon@arm.com Cc: linux-arm-kernel@lists.infradead.org --- arch/arm64/kernel/vdso/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 00bcfd2672cf..d98230f70caf 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -20,7 +20,7 @@ obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ --build-id -n -T
-ccflags-y := -fno-common -fno-builtin -fno-stack-protector +ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 ccflags-y += -DDISABLE_BRANCH_PROFILING
VDSO_LDFLAGS := -Bsymbolic
The compat vDSO requires the gettimeofday, clock_gettime and clock_getres syscall numbers, including the time64 variants, to implement the fallback mechanism.
Add the missing syscall numbers to unistd.h for arm64.
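As a userspace illustration of what the fallback path amounts to, the sketch below issues clock_gettime as a real system call through syscall(2), so the request always enters the kernel instead of being served from the vDSO data page. It is not code from this series: raw_clock_gettime() is a hypothetical helper name, and the sketch assumes a 64-bit Linux userspace where SYS_clock_gettime takes a struct timespec.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): read a clock by trapping
 * into the kernel directly, which is what the vDSO fallback does when
 * the fast path cannot be used.
 */
static long raw_clock_gettime(clockid_t clk, struct timespec *ts)
{
	return syscall(SYS_clock_gettime, clk, ts);
}
```

Comparing the result with the libc clock_gettime() for the same clock is a quick way to check that both paths agree.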
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Cc: Arnd Bergmann arnd@arndb.de Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm64/include/asm/unistd.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 70e6882853c0..81cc05acccc9 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -33,8 +33,13 @@ #define __NR_compat_exit 1 #define __NR_compat_read 3 #define __NR_compat_write 4 +#define __NR_compat_gettimeofday 78 #define __NR_compat_sigreturn 119 #define __NR_compat_rt_sigreturn 173 +#define __NR_compat_clock_getres 247 +#define __NR_compat_clock_gettime 263 +#define __NR_compat_clock_gettime64 403 +#define __NR_compat_clock_getres_time64 406
/* * The following SVCs are ARM private.
The compat signal data structures are required as part of the compat vDSO implementation in order to provide the unwinding information for the sigreturn trampolines.
Expose these data structures in signal32.h.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm64/include/asm/signal32.h | 46 +++++++++++++++++++++++++++++++ arch/arm64/kernel/signal32.c | 46 ------------------------------- 2 files changed, 46 insertions(+), 46 deletions(-)
diff --git a/arch/arm64/include/asm/signal32.h b/arch/arm64/include/asm/signal32.h index 58e288aaf0ba..1f05268f4c6d 100644 --- a/arch/arm64/include/asm/signal32.h +++ b/arch/arm64/include/asm/signal32.h @@ -20,6 +20,52 @@ #ifdef CONFIG_COMPAT #include <linux/compat.h>
+struct compat_sigcontext { + /* We always set these two fields to 0 */ + compat_ulong_t trap_no; + compat_ulong_t error_code; + + compat_ulong_t oldmask; + compat_ulong_t arm_r0; + compat_ulong_t arm_r1; + compat_ulong_t arm_r2; + compat_ulong_t arm_r3; + compat_ulong_t arm_r4; + compat_ulong_t arm_r5; + compat_ulong_t arm_r6; + compat_ulong_t arm_r7; + compat_ulong_t arm_r8; + compat_ulong_t arm_r9; + compat_ulong_t arm_r10; + compat_ulong_t arm_fp; + compat_ulong_t arm_ip; + compat_ulong_t arm_sp; + compat_ulong_t arm_lr; + compat_ulong_t arm_pc; + compat_ulong_t arm_cpsr; + compat_ulong_t fault_address; +}; + +struct compat_ucontext { + compat_ulong_t uc_flags; + compat_uptr_t uc_link; + compat_stack_t uc_stack; + struct compat_sigcontext uc_mcontext; + compat_sigset_t uc_sigmask; + int __unused[32 - (sizeof(compat_sigset_t) / sizeof(int))]; + compat_ulong_t uc_regspace[128] __attribute__((__aligned__(8))); +}; + +struct compat_sigframe { + struct compat_ucontext uc; + compat_ulong_t retcode[2]; +}; + +struct compat_rt_sigframe { + struct compat_siginfo info; + struct compat_sigframe sig; +}; + int compat_setup_frame(int usig, struct ksignal *ksig, sigset_t *set, struct pt_regs *regs); int compat_setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index caea6e25db2a..74e06d8c7c2b 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -30,42 +30,6 @@ #include <linux/uaccess.h> #include <asm/unistd.h>
-struct compat_sigcontext { - /* We always set these two fields to 0 */ - compat_ulong_t trap_no; - compat_ulong_t error_code; - - compat_ulong_t oldmask; - compat_ulong_t arm_r0; - compat_ulong_t arm_r1; - compat_ulong_t arm_r2; - compat_ulong_t arm_r3; - compat_ulong_t arm_r4; - compat_ulong_t arm_r5; - compat_ulong_t arm_r6; - compat_ulong_t arm_r7; - compat_ulong_t arm_r8; - compat_ulong_t arm_r9; - compat_ulong_t arm_r10; - compat_ulong_t arm_fp; - compat_ulong_t arm_ip; - compat_ulong_t arm_sp; - compat_ulong_t arm_lr; - compat_ulong_t arm_pc; - compat_ulong_t arm_cpsr; - compat_ulong_t fault_address; -}; - -struct compat_ucontext { - compat_ulong_t uc_flags; - compat_uptr_t uc_link; - compat_stack_t uc_stack; - struct compat_sigcontext uc_mcontext; - compat_sigset_t uc_sigmask; - int __unused[32 - (sizeof (compat_sigset_t) / sizeof (int))]; - compat_ulong_t uc_regspace[128] __attribute__((__aligned__(8))); -}; - struct compat_vfp_sigframe { compat_ulong_t magic; compat_ulong_t size; @@ -92,16 +56,6 @@ struct compat_aux_sigframe { unsigned long end_magic; } __attribute__((__aligned__(8)));
-struct compat_sigframe { - struct compat_ucontext uc; - compat_ulong_t retcode[2]; -}; - -struct compat_rt_sigframe { - struct compat_siginfo info; - struct compat_sigframe sig; -}; - #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
static inline int put_sigset_t(compat_sigset_t __user *uset, sigset_t *set)
Update asm-offsets for arm64 to generate the correct offsets for compat signals.
They will be useful for the implementation of the compat sigreturn trampolines in vDSO context.
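The offsets are generated with offsetof() applied to a nested member path. A minimal standalone sketch of the same idea follows; the demo_* structures are hypothetical, cut-down stand-ins for the compat signal structures and do not match the real layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for compat_sigcontext and friends. */
struct demo_sigcontext {
	uint32_t trap_no;
	uint32_t error_code;
	uint32_t oldmask;
	uint32_t arm_r0;
};

struct demo_ucontext {
	uint32_t uc_flags;
	uint32_t uc_link;
	struct demo_sigcontext uc_mcontext;
};

struct demo_sigframe {
	struct demo_ucontext uc;
	uint32_t retcode[2];
};

/*
 * Same shape as a DEFINE() entry in asm-offsets.c: the byte offset of the
 * first saved register, reached through two levels of nesting.
 */
enum {
	DEMO_SIGFRAME_REGS_OFFSET =
		offsetof(struct demo_sigframe, uc.uc_mcontext.arm_r0),
};
```

Assembly code can then address the saved registers relative to the frame pointer using the generated constant rather than a hard-coded number.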
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm64/kernel/asm-offsets.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 9e4b7ccbab2f..c0d8b9f40022 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -30,6 +30,7 @@ #include <asm/fixmap.h> #include <asm/thread_info.h> #include <asm/memory.h> +#include <asm/signal32.h> #include <asm/smp_plat.h> #include <asm/suspend.h> #include <linux/kbuild.h> @@ -77,6 +78,11 @@ int main(void) DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe)); DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs)); BLANK(); +#ifdef CONFIG_COMPAT + DEFINE(COMPAT_SIGFRAME_REGS_OFFSET, offsetof(struct compat_sigframe, uc.uc_mcontext.arm_r0)); + DEFINE(COMPAT_RT_SIGFRAME_REGS_OFFSET, offsetof(struct compat_rt_sigframe, sig.uc.uc_mcontext.arm_r0)); + BLANK(); +#endif DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id.counter)); BLANK(); DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm));
Some 64 bit architectures have support for 32 bit applications that require a separate version of the vDSOs.
Add support to the generic code for compat fallback functions.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- lib/vdso/gettimeofday.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c index a226675731f4..44b9c6a0cb95 100644 --- a/lib/vdso/gettimeofday.c +++ b/lib/vdso/gettimeofday.c @@ -21,7 +21,11 @@ * - clock_gettime_fallback(): fallback for clock_gettime. * - clock_getres_fallback(): fallback for clock_getres. */ +#ifdef ENABLE_COMPAT_VDSO +#include <asm/vdso/compat_gettimeofday.h> +#else #include <asm/vdso/gettimeofday.h> +#endif /* ENABLE_COMPAT_VDSO */
static notrace int do_hres(const struct vdso_data *vd, clockid_t clk,
Provide the arm64 compat (AArch32) vDSO in kernel/vdso32 in a similar way to what happens in kernel/vdso.
The compat vDSO is an adaptation of the arm architecture code with a few changes:
 - Use of lib/vdso for gettimeofday
 - Implementation of syscall based fallback
 - Introduction of clock_getres for the compat library
 - Implementation of trampolines
 - Implementation of the ELF note
Building the compat vDSO requires a 32-bit compiler, which must be specified via CONFIG_CROSS_COMPILE_COMPAT_VDSO.
The implementation of the configuration option will be contained in a future patch.
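From the application side, the clock_getres entry point mentioned above is reached through the ordinary libc call. The sketch below is a userspace example under that assumption, not code from this patch; clock_res_ns() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical helper (not part of the patch): return the resolution of
 * a clock in nanoseconds, or -1 if clock_getres() fails. Whether the call
 * is served by the vDSO or by the syscall fallback is transparent here.
 */
static long long clock_res_ns(clockid_t clk)
{
	struct timespec res;

	if (clock_getres(clk, &res) != 0)
		return -1;

	return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Built as a 32-bit binary and run on an arm64 kernel with the compat vDSO enabled, such a call would be expected to go through the compat library.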
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++++ .../include/asm/vdso/compat_gettimeofday.h | 108 ++++++++++ arch/arm64/kernel/vdso32/.gitignore | 2 + arch/arm64/kernel/vdso32/Makefile | 184 ++++++++++++++++++ arch/arm64/kernel/vdso32/note.c | 15 ++ arch/arm64/kernel/vdso32/sigreturn.S | 62 ++++++ arch/arm64/kernel/vdso32/vdso.S | 19 ++ arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++++++ arch/arm64/kernel/vdso32/vgettimeofday.c | 59 ++++++ 9 files changed, 582 insertions(+) create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h create mode 100644 arch/arm64/kernel/vdso32/.gitignore create mode 100644 arch/arm64/kernel/vdso32/Makefile create mode 100644 arch/arm64/kernel/vdso32/note.c create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S create mode 100644 arch/arm64/kernel/vdso32/vdso.S create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
diff --git a/arch/arm64/include/asm/vdso/compat_barrier.h b/arch/arm64/include/asm/vdso/compat_barrier.h new file mode 100644 index 000000000000..ea24ea856b07 --- /dev/null +++ b/arch/arm64/include/asm/vdso/compat_barrier.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 ARM Limited + */ +#ifndef __COMPAT_BARRIER_H +#define __COMPAT_BARRIER_H + +#ifndef __ASSEMBLY__ +/* + * Warning: This code is meant to be used with + * ENABLE_COMPAT_VDSO only. + */ +#ifndef ENABLE_COMPAT_VDSO +#error This header is meant to be used with ENABLE_COMPAT_VDSO only +#endif + +#ifdef dmb +#undef dmb +#endif + +#if __LINUX_ARM_ARCH__ >= 7 +#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory") +#elif __LINUX_ARM_ARCH__ == 6 +#define dmb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5" \ + : : "r" (0) : "memory") +#else +#define dmb(x) __asm__ __volatile__ ("" : : : "memory") +#endif + +#if __LINUX_ARM_ARCH__ >= 8 +#define aarch32_smp_mb() dmb(ish) +#define aarch32_smp_rmb() dmb(ishld) +#define aarch32_smp_wmb() dmb(ishst) +#else +#define aarch32_smp_mb() dmb(ish) +#define aarch32_smp_rmb() aarch32_smp_mb() +#define aarch32_smp_wmb() dmb(ishst) +#endif + + +#undef smp_mb +#undef smp_rmb +#undef smp_wmb + +#define smp_mb() aarch32_smp_mb() +#define smp_rmb() aarch32_smp_rmb() +#define smp_wmb() aarch32_smp_wmb() + +#endif /* !__ASSEMBLY__ */ + +#endif /* __COMPAT_BARRIER_H */ diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h new file mode 100644 index 000000000000..e9d44b363bf2 --- /dev/null +++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h @@ -0,0 +1,108 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 ARM Limited + */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#ifndef __ASSEMBLY__ + +#include <asm/unistd.h> +#include <uapi/linux/time.h> + +#include <asm/vdso/compat_barrier.h> + +static __always_inline notrace 
int gettimeofday_fallback( + struct __kernel_old_timeval *_tv, + struct timezone *_tz) +{ + register struct timezone *tz asm("r1") = _tz; + register struct __kernel_old_timeval *tv asm("r0") = _tv; + register long ret asm ("r0"); + register long nr asm("r7") = __NR_compat_gettimeofday; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (tv), "r" (tz), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace long clock_gettime_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("r1") = _ts; + register clockid_t clkid asm("r0") = _clkid; + register long ret asm ("r0"); + register long nr asm("r7") = __NR_compat_clock_gettime64; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace int clock_getres_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("r1") = _ts; + register clockid_t clkid asm("r0") = _clkid; + register long ret asm ("r0"); + register long nr asm("r7") = __NR_compat_clock_getres_time64; + + /* The checks below are required for ABI consistency with arm */ + if ((_clkid >= MAX_CLOCKS) && (_ts == NULL)) + return -EINVAL; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace u64 __arch_get_hw_counter(s32 clock_mode) +{ + u64 res; + + isb(); + asm volatile("mrrc p15, 1, %Q0, %R0, c14" : "=r" (res)); + + return res; +} + +static __always_inline notrace const struct vdso_data *__arch_get_vdso_data(void) +{ + const struct vdso_data *ret; + + /* + * This simply puts &_vdso_data into ret. 
The reason why we don't use + * `ret = _vdso_data` is that the compiler tends to optimise this in a + * very suboptimal way: instead of keeping &_vdso_data in a register, + * it goes through a relocation almost every time _vdso_data must be + * accessed (even in subfunctions). This is both time and space + * consuming: each relocation uses a word in the code section, and it + * has to be loaded at runtime. + * + * This trick hides the assignment from the compiler. Since it cannot + * track where the pointer comes from, it will only use one relocation + * where __arch_get_vdso_data() is called, and then keep the result in + * a register. + */ + asm volatile("mov %0, %1" : "=r"(ret) : "r"(_vdso_data)); + + return ret; +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/arm64/kernel/vdso32/.gitignore b/arch/arm64/kernel/vdso32/.gitignore new file mode 100644 index 000000000000..4fea950fa5ed --- /dev/null +++ b/arch/arm64/kernel/vdso32/.gitignore @@ -0,0 +1,2 @@ +vdso.lds +vdso.so.raw diff --git a/arch/arm64/kernel/vdso32/Makefile b/arch/arm64/kernel/vdso32/Makefile new file mode 100644 index 000000000000..0f1a02ccacbc --- /dev/null +++ b/arch/arm64/kernel/vdso32/Makefile @@ -0,0 +1,184 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for vdso32 +# + +# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before +# the inclusion of generic Makefile. 
+ARCH_REL_TYPE_ABS := R_ARM_JUMP_SLOT|R_ARM_GLOB_DAT|R_ARM_ABS32 +include $(srctree)/lib/vdso/Makefile + +COMPATCC := $(CROSS_COMPILE_COMPAT)gcc + +# Same as cc-*option, but using COMPATCC instead of CC +cc32-option = $(call try-run,\ + $(COMPATCC) $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2)) +cc32-disable-warning = $(call try-run,\ + $(COMPATCC) -W$(strip $(1)) -c -x c /dev/null -o "$$TMP",-Wno-$(strip $(1))) +cc32-ldoption = $(call try-run,\ + $(COMPATCC) $(1) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2)) + +# We cannot use the global flags to compile the vDSO files, the main reason +# being that the 32-bit compiler may be older than the main (64-bit) compiler +# and therefore may not understand flags set using $(cc-option ...). Besides, +# arch-specific options should be taken from the arm Makefile instead of the +# arm64 one. +# As a result we set our own flags here. + +# From top-level Makefile +# NOSTDINC_FLAGS +VDSO_CPPFLAGS := -nostdinc -isystem $(shell $(COMPATCC) -print-file-name=include) +VDSO_CPPFLAGS += $(LINUXINCLUDE) +VDSO_CPPFLAGS += $(KBUILD_CPPFLAGS) + +# Common C and assembly flags +# From top-level Makefile +VDSO_CAFLAGS := $(VDSO_CPPFLAGS) +VDSO_CAFLAGS += $(call cc32-option,-fno-PIE) +ifdef CONFIG_DEBUG_INFO +VDSO_CAFLAGS += -g +endif +ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(COMPATCC)), y) +VDSO_CAFLAGS += -DCC_HAVE_ASM_GOTO +endif + +# From arm Makefile +VDSO_CAFLAGS += $(call cc32-option,-fno-dwarf2-cfi-asm) +VDSO_CAFLAGS += -mabi=aapcs-linux -mfloat-abi=soft +ifeq ($(CONFIG_CPU_BIG_ENDIAN), y) +VDSO_CAFLAGS += -mbig-endian +else +VDSO_CAFLAGS += -mlittle-endian +endif + +# From arm vDSO Makefile +VDSO_CAFLAGS += -fPIC -fno-builtin -fno-stack-protector +VDSO_CAFLAGS += -DDISABLE_BRANCH_PROFILING + +# Try to compile for ARMv8. If the compiler is too old and doesn't support it, +# fall back to v7. 
There is no easy way to check for what architecture the code +# is being compiled, so define a macro specifying that (see arch/arm/Makefile). +VDSO_CAFLAGS += $(call cc32-option,-march=armv8-a -D__LINUX_ARM_ARCH__=8,\ + -march=armv7-a -D__LINUX_ARM_ARCH__=7) + +VDSO_CFLAGS := $(VDSO_CAFLAGS) +VDSO_CFLAGS += -DENABLE_COMPAT_VDSO=1 +# KBUILD_CFLAGS from top-level Makefile +VDSO_CFLAGS += -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ + -fno-strict-aliasing -fno-common \ + -Werror-implicit-function-declaration \ + -Wno-format-security \ + -std=gnu89 +VDSO_CFLAGS += -O2 +# Some useful compiler-dependent flags from top-level Makefile +VDSO_CFLAGS += $(call cc32-option,-Wdeclaration-after-statement,) +VDSO_CFLAGS += $(call cc32-option,-Wno-pointer-sign) +VDSO_CFLAGS += $(call cc32-option,-fno-strict-overflow) +VDSO_CFLAGS += $(call cc32-option,-Werror=strict-prototypes) +VDSO_CFLAGS += $(call cc32-option,-Werror=date-time) +VDSO_CFLAGS += $(call cc32-option,-Werror=incompatible-pointer-types) + +# The 32-bit compiler does not provide 128-bit integers, which are used in +# some headers that are indirectly included from the vDSO code. +# This hack makes the compiler happy and should trigger a warning/error if +# variables of such type are referenced. 
+VDSO_CFLAGS += -D__uint128_t='void*' +# Silence some warnings coming from headers that operate on long's +# (on GCC 4.8 or older, there is unfortunately no way to silence this warning) +VDSO_CFLAGS += $(call cc32-disable-warning,shift-count-overflow) +VDSO_CFLAGS += -Wno-int-to-pointer-cast + +VDSO_AFLAGS := $(VDSO_CAFLAGS) +VDSO_AFLAGS += -D__ASSEMBLY__ + +VDSO_LDFLAGS := $(VDSO_CPPFLAGS) +# From arm vDSO Makefile +VDSO_LDFLAGS += -Wl,-Bsymbolic -Wl,--no-undefined -Wl,-soname=linux-vdso.so.1 +VDSO_LDFLAGS += -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 +VDSO_LDFLAGS += -nostdlib -shared -mfloat-abi=soft +VDSO_LDFLAGS += $(call cc32-ldoption,-Wl$(comma)--hash-style=sysv) +VDSO_LDFLAGS += $(call cc32-ldoption,-Wl$(comma)--build-id) +VDSO_LDFLAGS += $(call cc32-ldoption,-fuse-ld=bfd) + + +# Borrow vdsomunge.c from the arm vDSO +# We have to use a relative path because scripts/Makefile.host prefixes +# $(hostprogs-y) with $(obj) +munge := ../../../arm/vdso/vdsomunge +hostprogs-y := $(munge) + +c-obj-vdso := note.o +c-obj-vdso-gettimeofday := vgettimeofday.o +asm-obj-vdso := sigreturn.o + +ifneq ($(c-gettimeofday-y),) +VDSO_CFLAGS_gettimeofday_o += -include $(c-gettimeofday-y) +endif + +# Build rules +targets := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) vdso.so vdso.so.dbg vdso.so.raw +c-obj-vdso := $(addprefix $(obj)/, $(c-obj-vdso)) +c-obj-vdso-gettimeofday := $(addprefix $(obj)/, $(c-obj-vdso-gettimeofday)) +asm-obj-vdso := $(addprefix $(obj)/, $(asm-obj-vdso)) +obj-vdso := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso) + +obj-y += vdso.o +extra-y += vdso.lds +CPPFLAGS_vdso.lds += -P -C -U$(ARCH) + +# Force dependency (vdso.s includes vdso.so through incbin) +$(obj)/vdso.o: $(obj)/vdso.so + +include/generated/vdso32-offsets.h: $(obj)/vdso.so.dbg FORCE + $(call if_changed,vdsosym) + +# Strip rule for vdso.so +$(obj)/vdso.so: OBJCOPYFLAGS := -S +$(obj)/vdso.so: $(obj)/vdso.so.dbg FORCE + $(call if_changed,objcopy) + 
+$(obj)/vdso.so.dbg: $(obj)/vdso.so.raw $(obj)/$(munge) FORCE + $(call if_changed,vdsomunge) + +# Link rule for the .so file, .lds has to be first +$(obj)/vdso.so.raw: $(src)/vdso.lds $(obj-vdso) FORCE + $(call if_changed,vdsold) + $(call if_changed,vdso_check) + +# Compilation rules for the vDSO sources +$(c-obj-vdso): %.o: %.c FORCE + $(call if_changed_dep,vdsocc) +$(c-obj-vdso-gettimeofday): %.o: %.c FORCE + $(call if_changed_dep,vdsocc_gettimeofday) +$(asm-obj-vdso): %.o: %.S FORCE + $(call if_changed_dep,vdsoas) + +# Actual build commands +quiet_cmd_vdsold = VDSOL $@ + cmd_vdsold = $(COMPATCC) -Wp,-MD,$(depfile) $(VDSO_LDFLAGS) \ + -Wl,-T $(filter %.lds,$^) $(filter %.o,$^) -o $@ +quiet_cmd_vdsocc = VDSOC $@ + cmd_vdsocc = $(COMPATCC) -Wp,-MD,$(depfile) $(VDSO_CFLAGS) -c -o $@ $< +quiet_cmd_vdsocc_gettimeofday = VDSOC_GTD $@ + cmd_vdsocc_gettimeofday = $(COMPATCC) -Wp,-MD,$(depfile) $(VDSO_CFLAGS) $(VDSO_CFLAGS_gettimeofday_o) -c -o $@ $< +quiet_cmd_vdsoas = VDSOA $@ + cmd_vdsoas = $(COMPATCC) -Wp,-MD,$(depfile) $(VDSO_AFLAGS) -c -o $@ $< + +quiet_cmd_vdsomunge = MUNGE $@ + cmd_vdsomunge = $(obj)/$(munge) $< $@ + +# Generate vDSO offsets using helper script (borrowed from the 64-bit vDSO) +gen-vdsosym := $(srctree)/$(src)/../vdso/gen_vdso_offsets.sh +quiet_cmd_vdsosym = VDSOSYM $@ +# The AArch64 nm should be able to read an AArch32 binary + cmd_vdsosym = $(NM) $< | $(gen-vdsosym) | LC_ALL=C sort > $@ + +# Install commands for the unstripped file +quiet_cmd_vdso_install = INSTALL $@ + cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/vdso32.so + +vdso.so: $(obj)/vdso.so.dbg + @mkdir -p $(MODLIB)/vdso + $(call cmd,vdso_install) + +vdso_install: vdso.so diff --git a/arch/arm64/kernel/vdso32/note.c b/arch/arm64/kernel/vdso32/note.c new file mode 100644 index 000000000000..eff5bf9efb8b --- /dev/null +++ b/arch/arm64/kernel/vdso32/note.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2012-2018 ARM Limited + * + * This supplies .note.* 
sections to go into the PT_NOTE inside the vDSO text. + * Here we can supply some information useful to userland. + */ + +#include <linux/uts.h> +#include <linux/version.h> +#include <linux/elfnote.h> +#include <linux/build-salt.h> + +ELFNOTE32("Linux", 0, LINUX_VERSION_CODE); +BUILD_SALT; diff --git a/arch/arm64/kernel/vdso32/sigreturn.S b/arch/arm64/kernel/vdso32/sigreturn.S new file mode 100644 index 000000000000..1a81277c2d09 --- /dev/null +++ b/arch/arm64/kernel/vdso32/sigreturn.S @@ -0,0 +1,62 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This file provides both A32 and T32 versions, in accordance with the + * arm sigreturn code. + * + * Copyright (C) 2018 ARM Limited + */ + +#include <linux/linkage.h> +#include <asm/asm-offsets.h> +#include <asm/unistd.h> + +#define ARM_ENTRY(name) \ + ENTRY(name) + +#define ARM_ENDPROC(name) \ + .type name, %function; \ + END(name) + + .text + + .arm + .fnstart + .save {r0-r15} + .pad #COMPAT_SIGFRAME_REGS_OFFSET + nop +ARM_ENTRY(__kernel_sigreturn_arm) + mov r7, #__NR_compat_sigreturn + svc #0 + .fnend +ARM_ENDPROC(__kernel_sigreturn_arm) + + .fnstart + .save {r0-r15} + .pad #COMPAT_RT_SIGFRAME_REGS_OFFSET + nop +ARM_ENTRY(__kernel_rt_sigreturn_arm) + mov r7, #__NR_compat_rt_sigreturn + svc #0 + .fnend +ARM_ENDPROC(__kernel_rt_sigreturn_arm) + + .thumb + .fnstart + .save {r0-r15} + .pad #COMPAT_SIGFRAME_REGS_OFFSET + nop +ARM_ENTRY(__kernel_sigreturn_thumb) + mov r7, #__NR_compat_sigreturn + svc #0 + .fnend +ARM_ENDPROC(__kernel_sigreturn_thumb) + + .fnstart + .save {r0-r15} + .pad #COMPAT_RT_SIGFRAME_REGS_OFFSET + nop +ARM_ENTRY(__kernel_rt_sigreturn_thumb) + mov r7, #__NR_compat_rt_sigreturn + svc #0 + .fnend +ARM_ENDPROC(__kernel_rt_sigreturn_thumb) diff --git a/arch/arm64/kernel/vdso32/vdso.S b/arch/arm64/kernel/vdso32/vdso.S new file mode 100644 index 000000000000..e72ac7bc4c04 --- /dev/null +++ b/arch/arm64/kernel/vdso32/vdso.S @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2012 
ARM Limited + */ + +#include <linux/init.h> +#include <linux/linkage.h> +#include <linux/const.h> +#include <asm/page.h> + + .globl vdso32_start, vdso32_end + .section .rodata + .balign PAGE_SIZE +vdso32_start: + .incbin "arch/arm64/kernel/vdso32/vdso.so" + .balign PAGE_SIZE +vdso32_end: + + .previous diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S new file mode 100644 index 000000000000..a3944927eaeb --- /dev/null +++ b/arch/arm64/kernel/vdso32/vdso.lds.S @@ -0,0 +1,82 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Adapted from arm64 version. + * + * GNU linker script for the VDSO library. + * Heavily based on the vDSO linker scripts for other archs. + * + * Copyright (C) 2012-2018 ARM Limited + */ + +#include <linux/const.h> +#include <asm/page.h> +#include <asm/vdso.h> + +OUTPUT_FORMAT("elf32-littlearm", "elf32-bigarm", "elf32-littlearm") +OUTPUT_ARCH(arm) + +SECTIONS +{ + PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE); + . = VDSO_LBASE + SIZEOF_HEADERS; + + .hash : { *(.hash) } :text + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + + .note : { *(.note.*) } :text :note + + .dynamic : { *(.dynamic) } :text :dynamic + + .rodata : { *(.rodata*) } :text + + .text : { *(.text*) } :text =0xe7f001f2 + + .got : { *(.got) } + .rel.plt : { *(.rel.plt) } + + /DISCARD/ : { + *(.note.GNU-stack) + *(.data .data.* .gnu.linkonce.d.* .sdata*) + *(.bss .sbss .dynbss .dynsbss) + } +} + +/* + * We must supply the ELF program headers explicitly to get just one + * PT_LOAD segment, and set the flags explicitly to make segments read-only. 
+ */ +PHDRS +{ + text PT_LOAD FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */ + dynamic PT_DYNAMIC FLAGS(4); /* PF_R */ + note PT_NOTE FLAGS(4); /* PF_R */ +} + +VERSION +{ + LINUX_2.6 { + global: + __vdso_clock_gettime; + __vdso_gettimeofday; + __vdso_clock_getres; + __kernel_sigreturn_arm; + __kernel_sigreturn_thumb; + __kernel_rt_sigreturn_arm; + __kernel_rt_sigreturn_thumb; + __vdso_clock_gettime64; + local: *; + }; +} + +/* + * Make the sigreturn code visible to the kernel. + */ +VDSO_compat_sigreturn_arm = __kernel_sigreturn_arm; +VDSO_compat_sigreturn_thumb = __kernel_sigreturn_thumb; +VDSO_compat_rt_sigreturn_arm = __kernel_rt_sigreturn_arm; +VDSO_compat_rt_sigreturn_thumb = __kernel_rt_sigreturn_thumb; diff --git a/arch/arm64/kernel/vdso32/vgettimeofday.c b/arch/arm64/kernel/vdso32/vgettimeofday.c new file mode 100644 index 000000000000..a6c8afe14ebc --- /dev/null +++ b/arch/arm64/kernel/vdso32/vgettimeofday.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ARM64 compat userspace implementations of gettimeofday() and similar. 
+ * + * Copyright (C) 2018 ARM Limited + * + */ +#include <linux/time.h> +#include <linux/types.h> + +notrace int __vdso_clock_gettime(clockid_t clock, + struct old_timespec32 *ts) +{ + /* The checks below are required for ABI consistency with arm */ + if ((u32)ts >= TASK_SIZE_32) + return -EFAULT; + + return __cvdso_clock_gettime32(clock, ts); +} + +notrace int __vdso_clock_gettime64(clockid_t clock, + struct __kernel_timespec *ts) +{ + /* The checks below are required for ABI consistency with arm */ + if ((u32)ts >= TASK_SIZE_32) + return -EFAULT; + + return __cvdso_clock_gettime(clock, ts); +} + +notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) +{ + return __cvdso_gettimeofday(tv, tz); +} + +notrace int __vdso_clock_getres(clockid_t clock_id, + struct old_timespec32 *res) +{ + /* The checks below are required for ABI consistency with arm */ + if ((u32)res >= TASK_SIZE_32) + return -EFAULT; + + return __cvdso_clock_getres_time32(clock_id, res); +} + +/* Avoid unresolved references emitted by GCC */ + +void __aeabi_unwind_cpp_pr0(void) +{ +} + +void __aeabi_unwind_cpp_pr1(void) +{ +} + +void __aeabi_unwind_cpp_pr2(void) +{ +}
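The entry points exported above (`__vdso_clock_gettime`, `__vdso_clock_gettime64`, `__vdso_gettimeofday`, `__vdso_clock_getres`) are normally reached through the C library rather than called directly. The following is a minimal user-space sketch, not part of the patch, that reads two clocks through the standard `clock_gettime()` interface. The helper name `clock_delta_ns` is hypothetical. Whether a given call is serviced by the vDSO or falls back to a system call depends on the libc and the kernel; the result is the same either way.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <time.h>

/*
 * Hypothetical helper: read clock `a`, then clock `b`, and store
 * (b - a) in nanoseconds in *delta_ns.
 * Returns 0 on success, -1 if either clock could not be read.
 */
static int clock_delta_ns(clockid_t a, clockid_t b, long long *delta_ns)
{
	struct timespec ta, tb;

	if (clock_gettime(a, &ta) != 0 || clock_gettime(b, &tb) != 0)
		return -1;

	*delta_ns = (long long)(tb.tv_sec - ta.tv_sec) * 1000000000LL +
		    (long long)(tb.tv_nsec - ta.tv_nsec);
	return 0;
}
```

Calling `clock_delta_ns(CLOCK_MONOTONIC, CLOCK_BOOTTIME, &d)` gives a rough idea of how much time the system has spent suspended, since CLOCK_BOOTTIME keeps counting during sleep.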
Most of the code that initializes the arm64 and compat vDSOs will be shared, so the current code needs to be refactored to avoid duplication and simplify maintenance.
Refactor vdso.c to simplify the implementation of the arm64 compat vDSO (which will be added by a later patch).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/kernel/vdso.c | 215 ++++++++++++++++++++++++++-------------
 1 file changed, 144 insertions(+), 71 deletions(-)
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index 23c38303a52a..aa1fb25a9fe4 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -40,7 +40,31 @@ #include <asm/vdso.h>
extern char vdso_start[], vdso_end[]; -static unsigned long vdso_pages __ro_after_init; + +/* vdso_lookup arch_index */ +enum arch_vdso_type { + ARM64_VDSO = 0, +}; +#define VDSO_TYPES (ARM64_VDSO + 1) + +struct __vdso_abi { + const char *name; + const char *vdso_code_start; + const char *vdso_code_end; + unsigned long vdso_pages; + /* Data Mapping */ + struct vm_special_mapping *dm; + /* Code Mapping */ + struct vm_special_mapping *cm; +}; + +static struct __vdso_abi vdso_lookup[VDSO_TYPES] __ro_after_init = { + { + .name = "vdso", + .vdso_code_start = vdso_start, + .vdso_code_end = vdso_end, + }, +};
/* * The vDSO data page. @@ -51,10 +75,110 @@ static union { } vdso_data_store __page_aligned_data; struct vdso_data *vdso_data = vdso_data_store.data;
+static int __vdso_remap(enum arch_vdso_type arch_index, + const struct vm_special_mapping *sm, + struct vm_area_struct *new_vma) +{ + unsigned long new_size = new_vma->vm_end - new_vma->vm_start; + unsigned long vdso_size = vdso_lookup[arch_index].vdso_code_end - + vdso_lookup[arch_index].vdso_code_start; + + if (vdso_size != new_size) + return -EINVAL; + + current->mm->context.vdso = (void *)new_vma->vm_start; + + return 0; +} + +static int __vdso_init(enum arch_vdso_type arch_index) +{ + int i; + struct page **vdso_pagelist; + unsigned long pfn; + + if (memcmp(vdso_lookup[arch_index].vdso_code_start, "\177ELF", 4)) { + pr_err("vDSO is not a valid ELF object!\n"); + return -EINVAL; + } + + vdso_lookup[arch_index].vdso_pages = ( + vdso_lookup[arch_index].vdso_code_end - + vdso_lookup[arch_index].vdso_code_start) >> + PAGE_SHIFT; + + /* Allocate the vDSO pagelist, plus a page for the data. */ + vdso_pagelist = kcalloc(vdso_lookup[arch_index].vdso_pages + 1, + sizeof(struct page *), + GFP_KERNEL); + if (vdso_pagelist == NULL) + return -ENOMEM; + + /* Grab the vDSO data page. */ + vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data)); + + + /* Grab the vDSO code pages. 
*/ + pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start); + + for (i = 0; i < vdso_lookup[arch_index].vdso_pages; i++) + vdso_pagelist[i + 1] = pfn_to_page(pfn + i); + + vdso_lookup[arch_index].dm->pages = &vdso_pagelist[0]; + vdso_lookup[arch_index].cm->pages = &vdso_pagelist[1]; + + return 0; +} + +static int __setup_additional_pages(enum arch_vdso_type arch_index, + struct mm_struct *mm, + struct linux_binprm *bprm, + int uses_interp) +{ + unsigned long vdso_base, vdso_text_len, vdso_mapping_len; + void *ret; + + vdso_text_len = vdso_lookup[arch_index].vdso_pages << PAGE_SHIFT; + /* Be sure to map the data page */ + vdso_mapping_len = vdso_text_len + PAGE_SIZE; + + vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0); + if (IS_ERR_VALUE(vdso_base)) { + ret = ERR_PTR(vdso_base); + goto up_fail; + } + + ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE, + VM_READ|VM_MAYREAD, + vdso_lookup[arch_index].dm); + if (IS_ERR(ret)) + goto up_fail; + + vdso_base += PAGE_SIZE; + mm->context.vdso = (void *)vdso_base; + ret = _install_special_mapping(mm, vdso_base, vdso_text_len, + VM_READ|VM_EXEC| + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + vdso_lookup[arch_index].cm); + if (IS_ERR(ret)) + goto up_fail; + + return 0; + +up_fail: + mm->context.vdso = NULL; + return PTR_ERR(ret); +} + #ifdef CONFIG_COMPAT /* * Create and map the vectors page for AArch32 tasks. 
*/ +/* + * aarch32_vdso_pages: + * 0 - kuser helpers + * 1 - sigreturn code + */ #define C_VECTORS 0 #define C_SIGPAGE 1 #define C_PAGES (C_SIGPAGE + 1) @@ -183,18 +307,18 @@ int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma) { - unsigned long new_size = new_vma->vm_end - new_vma->vm_start; - unsigned long vdso_size = vdso_end - vdso_start; - - if (vdso_size != new_size) - return -EINVAL; - - current->mm->context.vdso = (void *)new_vma->vm_start; - - return 0; + return __vdso_remap(ARM64_VDSO, sm, new_vma); }
-static struct vm_special_mapping vdso_spec[2] __ro_after_init = { +/* + * aarch64_vdso_pages: + * 0 - vvar + * 1 - vdso + */ +#define A_VVAR 0 +#define A_VDSO 1 +#define A_PAGES (A_VDSO + 1) +static struct vm_special_mapping vdso_spec[A_PAGES] __ro_after_init = { { .name = "[vvar]", }, @@ -206,37 +330,10 @@ static struct vm_special_mapping vdso_spec[2] __ro_after_init = {
static int __init vdso_init(void) { - int i; - struct page **vdso_pagelist; - unsigned long pfn; - - if (memcmp(vdso_start, "\177ELF", 4)) { - pr_err("vDSO is not a valid ELF object!\n"); - return -EINVAL; - } - - vdso_pages = (vdso_end - vdso_start) >> PAGE_SHIFT; - - /* Allocate the vDSO pagelist, plus a page for the data. */ - vdso_pagelist = kcalloc(vdso_pages + 1, sizeof(struct page *), - GFP_KERNEL); - if (vdso_pagelist == NULL) - return -ENOMEM; - - /* Grab the vDSO data page. */ - vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data)); - + vdso_lookup[ARM64_VDSO].dm = &vdso_spec[A_VVAR]; + vdso_lookup[ARM64_VDSO].cm = &vdso_spec[A_VDSO];
- /* Grab the vDSO code pages. */ - pfn = sym_to_pfn(vdso_start); - - for (i = 0; i < vdso_pages; i++) - vdso_pagelist[i + 1] = pfn_to_page(pfn + i); - - vdso_spec[0].pages = &vdso_pagelist[0]; - vdso_spec[1].pages = &vdso_pagelist[1]; - - return 0; + return __vdso_init(ARM64_VDSO); } arch_initcall(vdso_init);
@@ -244,41 +341,17 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) { struct mm_struct *mm = current->mm; - unsigned long vdso_base, vdso_text_len, vdso_mapping_len; - void *ret; - - vdso_text_len = vdso_pages << PAGE_SHIFT; - /* Be sure to map the data page */ - vdso_mapping_len = vdso_text_len + PAGE_SIZE; + int ret;
if (down_write_killable(&mm->mmap_sem)) return -EINTR; - vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0); - if (IS_ERR_VALUE(vdso_base)) { - ret = ERR_PTR(vdso_base); - goto up_fail; - } - ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE, - VM_READ|VM_MAYREAD, - &vdso_spec[0]); - if (IS_ERR(ret)) - goto up_fail; - - vdso_base += PAGE_SIZE; - mm->context.vdso = (void *)vdso_base; - ret = _install_special_mapping(mm, vdso_base, vdso_text_len, - VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, - &vdso_spec[1]); - if (IS_ERR(ret)) - goto up_fail;
+ ret = __setup_additional_pages(ARM64_VDSO, + mm, + bprm, + uses_interp);
up_write(&mm->mmap_sem); - return 0;
-up_fail: - mm->context.vdso = NULL; - up_write(&mm->mmap_sem); - return PTR_ERR(ret); + return ret; }
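The refactoring above replaces the single `vdso_pages` variable with a table of descriptors indexed by an enum, so that the same helpers can serve more than one vDSO image. The sketch below shows that pattern in plain user-space C. It is an illustration under stated assumptions, not kernel code: every `sketch_*` and `SKETCH_*` name is hypothetical, the "image" is a dummy array, and a 4 KiB page size is assumed.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* One enumerator per vDSO flavour; a compat entry could be added later. */
enum sketch_vdso_type {
	SKETCH_VDSO64 = 0,
	SKETCH_VDSO_TYPES,
};

/* Per-image descriptor, loosely modelled on struct __vdso_abi. */
struct sketch_vdso_abi {
	const char *name;
	const char *code_start;
	const char *code_end;
	unsigned long pages;
};

/* Dummy two-page "image" standing in for vdso_start..vdso_end. */
static char sketch_image[2 << SKETCH_PAGE_SHIFT];

static struct sketch_vdso_abi sketch_lookup[SKETCH_VDSO_TYPES] = {
	[SKETCH_VDSO64] = {
		.name = "vdso",
		.code_start = sketch_image,
		.code_end = sketch_image + sizeof(sketch_image),
	},
};

/* Same page-count computation as __vdso_init(), driven by the table. */
static void sketch_vdso_init(enum sketch_vdso_type idx)
{
	struct sketch_vdso_abi *abi = &sketch_lookup[idx];

	abi->pages = (unsigned long)(abi->code_end - abi->code_start) >>
		     SKETCH_PAGE_SHIFT;
}

/* Same size check as __vdso_remap(): a remap must preserve the image size. */
static int sketch_vdso_remap(enum sketch_vdso_type idx,
			     unsigned long new_start, unsigned long new_end)
{
	const struct sketch_vdso_abi *abi = &sketch_lookup[idx];
	unsigned long vdso_size =
		(unsigned long)(abi->code_end - abi->code_start);

	if (vdso_size != new_end - new_start)
		return -EINVAL;

	return 0;
}
```

Adding a second image then only requires a new enumerator and a new table entry; the helpers stay unchanged.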
If CONFIG_GENERIC_COMPAT_VDSO is enabled, the compat vDSO is installed in a compat (32-bit) process instead of the sigpage.
Add the code needed to set up the pages required by the vDSO.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/kernel/vdso.c | 90 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c index aa1fb25a9fe4..ad3a81b2c7ce 100644 --- a/arch/arm64/kernel/vdso.c +++ b/arch/arm64/kernel/vdso.c @@ -40,12 +40,22 @@ #include <asm/vdso.h>
extern char vdso_start[], vdso_end[]; +#ifdef CONFIG_COMPAT_VDSO +extern char vdso32_start[], vdso32_end[]; +#endif /* CONFIG_COMPAT_VDSO */
/* vdso_lookup arch_index */ enum arch_vdso_type { ARM64_VDSO = 0, +#ifdef CONFIG_COMPAT_VDSO + ARM64_VDSO32 = 1, +#endif /* CONFIG_COMPAT_VDSO */ }; +#ifdef CONFIG_COMPAT_VDSO +#define VDSO_TYPES (ARM64_VDSO32 + 1) +#else #define VDSO_TYPES (ARM64_VDSO + 1) +#endif /* CONFIG_COMPAT_VDSO */
struct __vdso_abi { const char *name; @@ -64,6 +74,13 @@ static struct __vdso_abi vdso_lookup[VDSO_TYPES] __ro_after_init = { .vdso_code_start = vdso_start, .vdso_code_end = vdso_end, }, +#ifdef CONFIG_COMPAT_VDSO + { + .name = "vdso32", + .vdso_code_start = vdso32_start, + .vdso_code_end = vdso32_end, + }, +#endif /* CONFIG_COMPAT_VDSO */ };
/* @@ -174,24 +191,52 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index, /* * Create and map the vectors page for AArch32 tasks. */ +#ifdef CONFIG_COMPAT_VDSO +static int aarch32_vdso_mremap(const struct vm_special_mapping *sm, + struct vm_area_struct *new_vma) +{ + return __vdso_remap(ARM64_VDSO32, sm, new_vma); +} +#endif /* CONFIG_COMPAT_VDSO */ + /* * aarch32_vdso_pages: * 0 - kuser helpers * 1 - sigreturn code + * or (CONFIG_COMPAT_VDSO): + * 0 - kuser helpers + * 1 - vdso data + * 2 - vdso code */ #define C_VECTORS 0 +#ifdef CONFIG_COMPAT_VDSO +#define C_VVAR 1 +#define C_VDSO 2 +#define C_PAGES (C_VDSO + 1) +#else #define C_SIGPAGE 1 #define C_PAGES (C_SIGPAGE + 1) +#endif /* CONFIG_COMPAT_VDSO */ static struct page *aarch32_vdso_pages[C_PAGES] __ro_after_init; -static const struct vm_special_mapping aarch32_vdso_spec[C_PAGES] = { +static struct vm_special_mapping aarch32_vdso_spec[C_PAGES] = { { .name = "[vectors]", /* ABI */ .pages = &aarch32_vdso_pages[C_VECTORS], }, +#ifdef CONFIG_COMPAT_VDSO + { + .name = "[vvar]", + }, + { + .name = "[vdso]", + .mremap = aarch32_vdso_mremap, + }, +#else { .name = "[sigpage]", /* ABI */ .pages = &aarch32_vdso_pages[C_SIGPAGE], }, +#endif /* CONFIG_COMPAT_VDSO */ };
static int aarch32_alloc_kuser_vdso_page(void) @@ -214,7 +259,33 @@ static int aarch32_alloc_kuser_vdso_page(void) return 0; }
-static int __init aarch32_alloc_vdso_pages(void) +#ifdef CONFIG_COMPAT_VDSO +static int __aarch32_alloc_vdso_pages(void) +{ + int ret; + + vdso_lookup[ARM64_VDSO32].dm = &aarch32_vdso_spec[C_VVAR]; + vdso_lookup[ARM64_VDSO32].cm = &aarch32_vdso_spec[C_VDSO]; + + ret = __vdso_init(ARM64_VDSO32); + if (ret) + return ret; + + ret = aarch32_alloc_kuser_vdso_page(); + if (ret) { + unsigned long c_vvar = + (unsigned long)page_to_virt(aarch32_vdso_pages[C_VVAR]); + unsigned long c_vdso = + (unsigned long)page_to_virt(aarch32_vdso_pages[C_VDSO]); + + free_page(c_vvar); + free_page(c_vdso); + } + + return ret; +} +#else +static int __aarch32_alloc_vdso_pages(void) { extern char __aarch32_sigret_code_start[], __aarch32_sigret_code_end[]; int sigret_sz = __aarch32_sigret_code_end - __aarch32_sigret_code_start; @@ -235,6 +306,12 @@ static int __init aarch32_alloc_vdso_pages(void)
return ret; } +#endif /* CONFIG_COMPAT_VDSO */ + +static int __init aarch32_alloc_vdso_pages(void) +{ + return __aarch32_alloc_vdso_pages(); +} arch_initcall(aarch32_alloc_vdso_pages);
static int aarch32_kuser_helpers_setup(struct mm_struct *mm) @@ -256,6 +333,7 @@ static int aarch32_kuser_helpers_setup(struct mm_struct *mm) return PTR_ERR_OR_ZERO(ret); }
+#ifndef CONFIG_COMPAT_VDSO static int aarch32_sigreturn_setup(struct mm_struct *mm) { unsigned long addr; @@ -283,6 +361,7 @@ static int aarch32_sigreturn_setup(struct mm_struct *mm) out: return PTR_ERR_OR_ZERO(ret); } +#endif /* !CONFIG_COMPAT_VDSO */
int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) { @@ -296,7 +375,14 @@ int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) if (ret) goto out;
+#ifdef CONFIG_COMPAT_VDSO + ret = __setup_additional_pages(ARM64_VDSO32, + mm, + bprm, + uses_interp); +#else ret = aarch32_sigreturn_setup(mm); +#endif /* CONFIG_COMPAT_VDSO */
out: up_write(&mm->mmap_sem);
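Both the native and the compat path end up in `__setup_additional_pages()`, which reserves one data page followed by the code pages and records the start of the code as the vDSO base. The address arithmetic can be sketched as follows. This is a user-space illustration only: the `sketch_*` names are hypothetical and a 4 KiB page size is assumed.

```c
/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

struct sketch_layout {
	unsigned long data_base; /* [vvar]: one read-only data page */
	unsigned long code_base; /* [vdso]: first code page, stored as the vDSO base */
	unsigned long total_len; /* length requested from the address-space allocator */
};

/*
 * Given the start of a free region and the number of code pages, compute
 * where the data page and the code pages are placed.
 */
static struct sketch_layout sketch_vdso_layout(unsigned long base,
					       unsigned long code_pages)
{
	struct sketch_layout l;
	unsigned long text_len = code_pages * SKETCH_PAGE_SIZE;

	l.data_base = base;
	l.code_base = base + SKETCH_PAGE_SIZE;
	/* Be sure to account for the data page as well. */
	l.total_len = text_len + SKETCH_PAGE_SIZE;

	return l;
}
```

Placing the data page directly below the code matches the linker script, which defines `_vdso_data` as one page before the start of the image.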
As with the native vDSO, when the compat vDSO is enabled the auxiliary vector entry AT_SYSINFO_EHDR must point to the address of the vDSO code so that the dynamic linker can find it.
Add the necessary code to the arm64 ELF header (asm/elf.h) to make this possible.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/elf.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 355d120b78cb..34cabaf78011 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -213,7 +213,21 @@ typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG]; ({ \ set_thread_flag(TIF_32BIT); \ }) +#ifdef CONFIG_GENERIC_COMPAT_VDSO +#define COMPAT_ARCH_DLINFO \ +do { \ + /* \ + * Note that we use Elf64_Off instead of elf_addr_t because \ + * elf_addr_t in compat is defined as Elf32_Addr and casting \ + * current->mm->context.vdso to it triggers a cast warning of \ + * cast from pointer to integer of different size. \ + */ \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, \ + (Elf64_Off)current->mm->context.vdso); \ +} while (0) +#else #define COMPAT_ARCH_DLINFO +#endif extern int aarch32_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages \
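From user space, the AT_SYSINFO_EHDR entry added above can be read with `getauxval()` from `<sys/auxv.h>`. The following minimal sketch, which is not part of the patch, checks that the value points at an ELF header. The helper name `vdso_ehdr_status` is hypothetical; on a kernel or configuration that does not provide the entry, `getauxval()` returns 0 and the helper reports that instead of failing.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper.
 * Returns  1 if AT_SYSINFO_EHDR points at something starting with the ELF magic,
 *          0 if the kernel did not provide the entry,
 *         -1 if the entry is present but the magic does not match.
 */
static int vdso_ehdr_status(void)
{
	unsigned long base = getauxval(AT_SYSINFO_EHDR);

	if (base == 0)
		return 0;

	return memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : -1;
}
```

This is the same lookup a dynamic linker performs before parsing the vDSO's dynamic symbol table.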
When the compat vDSO is enabled, the sigreturn trampolines are no longer available through [sigpage] but through [vdso].
Add the relevant code to enable the feature.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/vdso.h |  3 +++
 arch/arm64/kernel/signal32.c  | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)
diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h index 839ce0031bd5..9b197e5ea759 100644 --- a/arch/arm64/include/asm/vdso.h +++ b/arch/arm64/include/asm/vdso.h @@ -28,6 +28,9 @@ #ifndef __ASSEMBLY__
#include <generated/vdso-offsets.h> +#ifdef CONFIG_COMPAT_VDSO +#include <generated/vdso32-offsets.h> +#endif
#define VDSO_SYMBOL(base, name) \ ({ \ diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c index 74e06d8c7c2b..4fca2e1937b2 100644 --- a/arch/arm64/kernel/signal32.c +++ b/arch/arm64/kernel/signal32.c @@ -29,6 +29,7 @@ #include <asm/traps.h> #include <linux/uaccess.h> #include <asm/unistd.h> +#include <asm/vdso.h>
struct compat_vfp_sigframe { compat_ulong_t magic; @@ -352,6 +353,30 @@ static void compat_setup_return(struct pt_regs *regs, struct k_sigaction *ka, retcode = ptr_to_compat(ka->sa.sa_restorer); } else { /* Set up sigreturn pointer */ +#ifdef CONFIG_COMPAT_VDSO + void *vdso_base = current->mm->context.vdso; + void *vdso_trampoline; + + if (ka->sa.sa_flags & SA_SIGINFO) { + if (thumb) { + vdso_trampoline = VDSO_SYMBOL(vdso_base, + compat_rt_sigreturn_thumb); + } else { + vdso_trampoline = VDSO_SYMBOL(vdso_base, + compat_rt_sigreturn_arm); + } + } else { + if (thumb) { + vdso_trampoline = VDSO_SYMBOL(vdso_base, + compat_sigreturn_thumb); + } else { + vdso_trampoline = VDSO_SYMBOL(vdso_base, + compat_sigreturn_arm); + } + } + + retcode = ptr_to_compat(vdso_trampoline) + thumb; +#else unsigned int idx = thumb << 1;
if (ka->sa.sa_flags & SA_SIGINFO) @@ -359,6 +384,7 @@ static void compat_setup_return(struct pt_regs *regs, struct k_sigaction *ka,
retcode = (unsigned long)current->mm->context.vdso + (idx << 2) + thumb; +#endif }
regs->regs[0] = usig;
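The trampoline selection in the hunk above depends on two inputs: whether the handler was installed with SA_SIGINFO, and whether it runs in Thumb state. The sketch below restates that decision as a small pure function. It is an illustration only: `pick_sigreturn` and the `OFF_*` offsets are hypothetical placeholders, whereas in the kernel the offsets come from the generated vdso32-offsets.h through `VDSO_SYMBOL()`.

```c
#include <stdint.h>

/* Hypothetical trampoline offsets inside the compat vDSO image. */
enum {
	OFF_SIGRETURN_ARM      = 0x100,
	OFF_RT_SIGRETURN_ARM   = 0x110,
	OFF_SIGRETURN_THUMB    = 0x120,
	OFF_RT_SIGRETURN_THUMB = 0x130,
};

/*
 * Compute the return address for a compat signal handler.
 * `siginfo` is non-zero when SA_SIGINFO is set; `thumb` is non-zero when
 * the handler runs in Thumb state.
 */
static uint32_t pick_sigreturn(uint32_t vdso_base, int siginfo, int thumb)
{
	uint32_t off;

	if (siginfo)
		off = thumb ? OFF_RT_SIGRETURN_THUMB : OFF_RT_SIGRETURN_ARM;
	else
		off = thumb ? OFF_SIGRETURN_THUMB : OFF_SIGRETURN_ARM;

	/* Bit 0 of the return address selects Thumb state on return. */
	return vdso_base + off + (thumb ? 1u : 0u);
}
```

Keeping the four cases explicit mirrors the patch, which names each trampoline symbol instead of deriving it from an index as the [sigpage] path does.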
Add compat vDSO support to the arm64 build system.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/Kconfig         |  1 +
 arch/arm64/Makefile        | 23 +++++++++++++++++++++--
 arch/arm64/kernel/Makefile |  6 +++++-
 3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 952c9f8cf3b8..3e1d4f8347f4 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -108,6 +108,7 @@ config ARM64 select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL select GENERIC_GETTIMEOFDAY + select GENERIC_COMPAT_VDSO if !CPU_BIG_ENDIAN select HANDLE_DOMAIN_IRQ select HARDIRQS_SW_RESEND select HAVE_PCI diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index b025304bde46..4db50d4b2476 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -49,9 +49,25 @@ $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif endif
-KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) +ifeq ($(CONFIG_GENERIC_COMPAT_VDSO), y) + CROSS_COMPILE_COMPAT ?= $(CONFIG_CROSS_COMPILE_COMPAT_VDSO:"%"=%) + + ifeq ($(CONFIG_CC_IS_CLANG), y) + $(warning CROSS_COMPILE_COMPAT is clang, the compat vDSO will not be built) + else ifeq ($(CROSS_COMPILE_COMPAT),) + $(warning CROSS_COMPILE_COMPAT not defined or empty, the compat vDSO will not be built) + else ifeq ($(shell which $(CROSS_COMPILE_COMPAT)gcc 2> /dev/null),) + $(error $(CROSS_COMPILE_COMPAT)gcc not found, check CROSS_COMPILE_COMPAT) + else + export CROSS_COMPILE_COMPAT + export CONFIG_COMPAT_VDSO := y + compat_vdso := -DCONFIG_COMPAT_VDSO=1 + endif +endif + +KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst) $(compat_vdso) KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) +KBUILD_AFLAGS += $(lseinstr) $(brokengasinst) $(compat_vdso)
KBUILD_CFLAGS += $(call cc-option,-mabi=lp64) KBUILD_AFLAGS += $(call cc-option,-mabi=lp64) @@ -163,6 +179,9 @@ ifeq ($(KBUILD_EXTMOD),) prepare: vdso_prepare vdso_prepare: prepare0 $(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso include/generated/vdso-offsets.h + $(if $(CONFIG_COMPAT_VDSO),$(Q)$(MAKE) \ + $(build)=arch/arm64/kernel/vdso32 \ + include/generated/vdso32-offsets.h) endif
define archhelp diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 9e7dcb2c31c7..478491f07b4f 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -28,7 +28,10 @@ $(obj)/%.stub.o: $(obj)/%.o FORCE $(call if_changed,objcopy)
obj-$(CONFIG_COMPAT) += sys32.o signal32.o \ - sigreturn32.o sys_compat.o + sys_compat.o +ifneq ($(CONFIG_COMPAT_VDSO), y) +obj-$(CONFIG_COMPAT) += sigreturn32.o +endif obj-$(CONFIG_KUSER_HELPERS) += kuser32.o obj-$(CONFIG_FUNCTION_TRACER) += ftrace.o entry-ftrace.o obj-$(CONFIG_MODULES) += module.o @@ -62,6 +65,7 @@ obj-$(CONFIG_ARM64_SSBD) += ssbd.o obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
obj-y += vdso/ probes/ +obj-$(CONFIG_COMPAT_VDSO) += vdso32/ head-y := head.o extra-y += $(head-y) vmlinux.lds
On Thu, May 30, 2019 at 03:15:27PM +0100, Vincenzo Frascino wrote:
Add vDSO compat support to the arm64 build system.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 23 +++++++++++++++++++++-- arch/arm64/kernel/Makefile | 6 +++++- 3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 952c9f8cf3b8..3e1d4f8347f4 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -108,6 +108,7 @@ config ARM64 select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL select GENERIC_GETTIMEOFDAY
+ select GENERIC_COMPAT_VDSO if !CPU_BIG_ENDIAN
This select needs to also depend on COMPAT (or rather be selected from the COMPAT menuconfig), otherwise, trying to build this series with 64K pages where COMPAT is disabled, I get:
VDSOC_GTD arch/arm64/kernel/vdso32/vgettimeofday.o VDSOA arch/arm64/kernel/vdso32/sigreturn.o arch/arm64/kernel/vdso32/sigreturn.S: Assembler messages: arch/arm64/kernel/vdso32/sigreturn.S:25: Error: expected #constant arch/arm64/kernel/vdso32/sigreturn.S:35: Error: expected #constant arch/arm64/kernel/vdso32/sigreturn.S:46: Error: expected #constant arch/arm64/kernel/vdso32/sigreturn.S:56: Error: expected #constant arch/arm64/kernel/vdso32/sigreturn.S:28: Error: undefined symbol __NR_compat_sigreturn used as an immediate value arch/arm64/kernel/vdso32/sigreturn.S:38: Error: undefined symbol __NR_compat_rt_sigreturn used as an immediate value make[2]: *** [arch/arm64/kernel/vdso32/Makefile:154: arch/arm64/kernel/vdso32/sigreturn.o] Error 1 make[2]: *** Waiting for unfinished jobs.... In file included from lib/vdso/gettimeofday.c:25:0, from <command-line>:0: arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 'gettimeofday_fallback': arch/arm64/include/asm/vdso/compat_gettimeofday.h:22:31: error: '__NR_compat_gettimeofday' undeclared (first use in this function); did you mean '__NR_gettimeofday'? register long nr asm("r7") = __NR_compat_gettimeofday; ^~~~~~~~~~~~~~~~~~~~~~~~ __NR_gettimeofday arch/arm64/include/asm/vdso/compat_gettimeofday.h:22:31: note: each undeclared identifier is reported only once for each function it appears in arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 'clock_gettime_fallback': arch/arm64/include/asm/vdso/compat_gettimeofday.h:40:31: error: '__NR_compat_clock_gettime64' undeclared (first use in this function); did you mean '__NR_clock_gettime'? register long nr asm("r7") = __NR_compat_clock_gettime64; ^~~~~~~~~~~~~~~~~~~~~~~~~~~ __NR_clock_gettime arch/arm64/include/asm/vdso/compat_gettimeofday.h: In function 'clock_getres_fallback': arch/arm64/include/asm/vdso/compat_gettimeofday.h:58:31: error: '__NR_compat_clock_getres_time64' undeclared (first use in this function); did you mean '__NR_clock_gettime'? 
register long nr asm("r7") = __NR_compat_clock_getres_time64; ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ __NR_clock_gettime arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_gettime': arch/arm64/kernel/vdso32/vgettimeofday.c:15:17: error: 'TASK_SIZE_32' undeclared (first use in this function); did you mean 'TASK_SIZE_64'? if ((u32)ts >= TASK_SIZE_32) ^~~~~~~~~~~~ TASK_SIZE_64 arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_gettime64': arch/arm64/kernel/vdso32/vgettimeofday.c:25:17: error: 'TASK_SIZE_32' undeclared (first use in this function); did you mean 'TASK_SIZE_64'? if ((u32)ts >= TASK_SIZE_32) ^~~~~~~~~~~~ TASK_SIZE_64 arch/arm64/kernel/vdso32/vgettimeofday.c: In function '__vdso_clock_getres': arch/arm64/kernel/vdso32/vgettimeofday.c:41:18: error: 'TASK_SIZE_32' undeclared (first use in this function); did you mean 'TASK_SIZE_64'? if ((u32)res >= TASK_SIZE_32) ^~~~~~~~~~~~ TASK_SIZE_64 make[2]: *** [arch/arm64/kernel/vdso32/Makefile:152: arch/arm64/kernel/vdso32/vgettimeofday.o] Error 1 make[1]: *** [arch/arm64/Makefile:182: vdso_prepare] Error 2 make: *** [Makefile:179: sub-make] Error 2
Hi Catalin,
thank you for testing my patches and providing the scripts you used to reproduce the issue.
On 01/06/2019 10:38, Catalin Marinas wrote:
On Thu, May 30, 2019 at 03:15:27PM +0100, Vincenzo Frascino wrote:
Add vDSO compat support to the arm64 build system.
Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will.deacon@arm.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 23 +++++++++++++++++++++-- arch/arm64/kernel/Makefile | 6 +++++- 3 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 952c9f8cf3b8..3e1d4f8347f4 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -108,6 +108,7 @@ config ARM64 select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL select GENERIC_GETTIMEOFDAY
+ select GENERIC_COMPAT_VDSO if !CPU_BIG_ENDIAN
This select needs to also depend on COMPAT (or rather be selected from the COMPAT menuconfig), otherwise, trying to build this series with 64K pages where COMPAT is disabled, I get:
This is a very good catch, my bad, will definitely fix in v7.
...
The arm vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes: - Modification of vdso.c to be compliant with the common vdso datapage - Use of lib/vdso for gettimeofday - Implementation of the ELF note
Cc: Russell King linux@armlinux.org.uk Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/arm/Kconfig | 3 + arch/arm/include/asm/vdso/gettimeofday.h | 96 ++++++++ arch/arm/include/asm/vdso/vsyscall.h | 71 ++++++ arch/arm/include/asm/vdso_datapage.h | 29 +-- arch/arm/kernel/vdso.c | 87 +------- arch/arm/vdso/Makefile | 13 +- arch/arm/vdso/note.c | 15 ++ arch/arm/vdso/vdso.lds.S | 2 + arch/arm/vdso/vgettimeofday.c | 268 ++--------------------- 9 files changed, 222 insertions(+), 362 deletions(-) create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h create mode 100644 arch/arm/include/asm/vdso/vsyscall.h create mode 100644 arch/arm/vdso/note.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 8869742a85df..795cd2f7f437 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -53,6 +53,8 @@ config ARM select GENERIC_SMP_IDLE_THREAD select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER + select GENERIC_GETTIMEOFDAY + select GENERIC_VDSO_32 select HANDLE_DOMAIN_IRQ select HARDIRQS_SW_RESEND select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT @@ -101,6 +103,7 @@ config ARM select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 select HAVE_VIRT_CPU_ACCOUNTING_GEN + select HAVE_GENERIC_VDSO select IRQ_FORCED_THREADING select MODULES_USE_ELF_REL select NEED_DMA_MAP_STATE diff --git a/arch/arm/include/asm/vdso/gettimeofday.h b/arch/arm/include/asm/vdso/gettimeofday.h new file mode 100644 index 000000000000..eeeb319840ba --- /dev/null +++ b/arch/arm/include/asm/vdso/gettimeofday.h @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 ARM Limited + */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#ifndef __ASSEMBLY__ + +#include <asm/barrier.h> +#include <asm/cp15.h> +#include <asm/unistd.h> +#include <uapi/linux/time.h> + +#ifndef CONFIG_AEABI +#error This code depends on AEABI system call conventions +#endif + +extern struct vdso_data *__get_datapage(void); + +static __always_inline notrace int gettimeofday_fallback( + struct __kernel_old_timeval *_tv, + struct timezone *_tz) +{ + register struct timezone *tz asm("r1") = _tz; + register struct __kernel_old_timeval *tv asm("r0") = _tv; + register long ret asm ("r0"); + register long nr asm("r7") = __NR_gettimeofday; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (tv), "r" (tz), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace long clock_gettime_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("r1") = _ts; + register clockid_t clkid asm("r0") = _clkid; + register long ret asm ("r0"); + register long nr 
asm("r7") = __NR_clock_gettime64; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace int clock_getres_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("r1") = _ts; + register clockid_t clkid asm("r0") = _clkid; + register long ret asm ("r0"); + register long nr asm("r7") = __NR_clock_getres_time64; + + asm volatile( + " swi #0\n" + : "=r" (ret) + : "r" (clkid), "r" (ts), "r" (nr) + : "memory"); + + return ret; +} + +static __always_inline notrace u64 __arch_get_hw_counter(int clock_mode) +{ +#ifdef CONFIG_ARM_ARCH_TIMER + u64 cycle_now; + + isb(); + cycle_now = read_sysreg(CNTVCT); + + return cycle_now; +#else + return -EINVAL; /* use fallback */ +#endif +} + +static __always_inline notrace const struct vdso_data *__arch_get_vdso_data(void) +{ + return __get_datapage(); +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git a/arch/arm/include/asm/vdso/vsyscall.h b/arch/arm/include/asm/vdso/vsyscall.h new file mode 100644 index 000000000000..c4166f317071 --- /dev/null +++ b/arch/arm/include/asm/vdso/vsyscall.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_VSYSCALL_H +#define __ASM_VDSO_VSYSCALL_H + +#ifndef __ASSEMBLY__ + +#include <linux/timekeeper_internal.h> +#include <vdso/datapage.h> +#include <asm/cacheflush.h> + +extern struct vdso_data *vdso_data; +extern bool cntvct_ok; + +static __always_inline +bool tk_is_cntvct(const struct timekeeper *tk) +{ + if (!IS_ENABLED(CONFIG_ARM_ARCH_TIMER)) + return false; + + if (!tk->tkr_mono.clock->archdata.vdso_direct) + return false; + + return true; +} + +/* + * Update the vDSO data page to keep in sync with kernel timekeeping. 
+ */ +static __always_inline +struct vdso_data *__arm_get_k_vdso_data(void) +{ + return vdso_data; +} +#define __arch_get_k_vdso_data __arm_get_k_vdso_data + +static __always_inline +int __arm_update_vdso_data(void) +{ + return !cntvct_ok; +} +#define __arch_update_vdso_data __arm_update_vdso_data + +static __always_inline +int __arm_get_clock_mode(struct timekeeper *tk) +{ + u32 __tk_is_cntvct = tk_is_cntvct(tk); + + return __tk_is_cntvct; +} +#define __arch_get_clock_mode __arm_get_clock_mode + +static __always_inline +int __arm_use_vsyscall(struct vdso_data *vdata) +{ + return vdata[CS_HRES_COARSE].clock_mode; +} +#define __arch_use_vsyscall __arm_use_vsyscall + +static __always_inline +void __arm_sync_vdso_data(struct vdso_data *vdata) +{ + flush_dcache_page(virt_to_page(vdata)); +} +#define __arch_sync_vdso_data __arm_sync_vdso_data + +/* The asm-generic header needs to be included after the definitions above */ +#include <asm-generic/vdso/vsyscall.h> + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_VSYSCALL_H */ diff --git a/arch/arm/include/asm/vdso_datapage.h b/arch/arm/include/asm/vdso_datapage.h index 9be259442fca..bfdbbd2b5fe7 100644 --- a/arch/arm/include/asm/vdso_datapage.h +++ b/arch/arm/include/asm/vdso_datapage.h @@ -22,35 +22,12 @@
#ifndef __ASSEMBLY__
+#include <vdso/datapage.h> #include <asm/page.h>
-/* Try to be cache-friendly on systems that don't implement the - * generic timer: fit the unconditionally updated fields in the first - * 32 bytes. - */ -struct vdso_data { - u32 seq_count; /* sequence count - odd during updates */ - u16 tk_is_cntvct; /* fall back to syscall if false */ - u16 cs_shift; /* clocksource shift */ - u32 xtime_coarse_sec; /* coarse time */ - u32 xtime_coarse_nsec; - - u32 wtm_clock_sec; /* wall to monotonic offset */ - u32 wtm_clock_nsec; - u32 xtime_clock_sec; /* CLOCK_REALTIME - seconds */ - u32 cs_mult; /* clocksource multiplier */ - - u64 cs_cycle_last; /* last cycle value */ - u64 cs_mask; /* clocksource mask */ - - u64 xtime_clock_snsec; /* CLOCK_REALTIME sub-ns base */ - u32 tz_minuteswest; /* timezone info for gettimeofday(2) */ - u32 tz_dsttime; -}; - union vdso_data_store { - struct vdso_data data; - u8 page[PAGE_SIZE]; + struct vdso_data data[CS_BASES]; + u8 page[PAGE_SIZE]; };
#endif /* !__ASSEMBLY__ */ diff --git a/arch/arm/kernel/vdso.c b/arch/arm/kernel/vdso.c index f4dd7f9663c1..9a9cea8b333d 100644 --- a/arch/arm/kernel/vdso.c +++ b/arch/arm/kernel/vdso.c @@ -34,6 +34,8 @@ #include <asm/vdso.h> #include <asm/vdso_datapage.h> #include <clocksource/arm_arch_timer.h> +#include <vdso/helpers.h> +#include <vdso/vsyscall.h>
#define MAX_SYMNAME 64
@@ -48,7 +50,7 @@ unsigned int vdso_total_pages __ro_after_init; * The VDSO data page. */ static union vdso_data_store vdso_data_store __page_aligned_data; -static struct vdso_data *vdso_data = &vdso_data_store.data; +struct vdso_data *vdso_data = vdso_data_store.data;
static struct page *vdso_data_page __ro_after_init; static const struct vm_special_mapping vdso_data_mapping = { @@ -88,7 +90,7 @@ struct elfinfo { /* Cached result of boot-time check for whether the arch timer exists, * and if so, whether the virtual counter is useable. */ -static bool cntvct_ok __ro_after_init; +bool cntvct_ok __ro_after_init;
static bool __init cntvct_functional(void) { @@ -274,84 +276,3 @@ void arm_install_vdso(struct mm_struct *mm, unsigned long addr) mm->context.vdso = addr; }
-static void vdso_write_begin(struct vdso_data *vdata) -{ - ++vdso_data->seq_count; - smp_wmb(); /* Pairs with smp_rmb in vdso_read_retry */ -} - -static void vdso_write_end(struct vdso_data *vdata) -{ - smp_wmb(); /* Pairs with smp_rmb in vdso_read_begin */ - ++vdso_data->seq_count; -} - -static bool tk_is_cntvct(const struct timekeeper *tk) -{ - if (!IS_ENABLED(CONFIG_ARM_ARCH_TIMER)) - return false; - - if (!tk->tkr_mono.clock->archdata.vdso_direct) - return false; - - return true; -} - -/** - * update_vsyscall - update the vdso data page - * - * Increment the sequence counter, making it odd, indicating to - * userspace that an update is in progress. Update the fields used - * for coarse clocks and, if the architected system timer is in use, - * the fields used for high precision clocks. Increment the sequence - * counter again, making it even, indicating to userspace that the - * update is finished. - * - * Userspace is expected to sample seq_count before reading any other - * fields from the data page. If seq_count is odd, userspace is - * expected to wait until it becomes even. After copying data from - * the page, userspace must sample seq_count again; if it has changed - * from its previous value, userspace must retry the whole sequence. - * - * Calls to update_vsyscall are serialized by the timekeeping core. - */ -void update_vsyscall(struct timekeeper *tk) -{ - struct timespec64 *wtm = &tk->wall_to_monotonic; - - if (!cntvct_ok) { - /* The entry points have been zeroed, so there is no - * point in updating the data page. 
- */ - return; - } - - vdso_write_begin(vdso_data); - - vdso_data->tk_is_cntvct = tk_is_cntvct(tk); - vdso_data->xtime_coarse_sec = tk->xtime_sec; - vdso_data->xtime_coarse_nsec = (u32)(tk->tkr_mono.xtime_nsec >> - tk->tkr_mono.shift); - vdso_data->wtm_clock_sec = wtm->tv_sec; - vdso_data->wtm_clock_nsec = wtm->tv_nsec; - - if (vdso_data->tk_is_cntvct) { - vdso_data->cs_cycle_last = tk->tkr_mono.cycle_last; - vdso_data->xtime_clock_sec = tk->xtime_sec; - vdso_data->xtime_clock_snsec = tk->tkr_mono.xtime_nsec; - vdso_data->cs_mult = tk->tkr_mono.mult; - vdso_data->cs_shift = tk->tkr_mono.shift; - vdso_data->cs_mask = tk->tkr_mono.mask; - } - - vdso_write_end(vdso_data); - - flush_dcache_page(virt_to_page(vdso_data)); -} - -void update_vsyscall_tz(void) -{ - vdso_data->tz_minuteswest = sys_tz.tz_minuteswest; - vdso_data->tz_dsttime = sys_tz.tz_dsttime; - flush_dcache_page(virt_to_page(vdso_data)); -} diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile index fadf554d9391..0c8a819ef4f1 100644 --- a/arch/arm/vdso/Makefile +++ b/arch/arm/vdso/Makefile @@ -1,7 +1,13 @@ # SPDX-License-Identifier: GPL-2.0 + +# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before +# the inclusion of generic Makefile. +ARCH_REL_TYPE_ABS := R_ARM_JUMP_SLOT|R_ARM_GLOB_DAT|R_ARM_ABS32 +include $(srctree)/lib/vdso/Makefile + hostprogs-y := vdsomunge
-obj-vdso := vgettimeofday.o datapage.o +obj-vdso := vgettimeofday.o datapage.o note.o
# Build rules targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.so.raw vdso.lds @@ -25,7 +31,11 @@ CFLAGS_REMOVE_vdso.o = -pg
# Force -O2 to avoid libgcc dependencies CFLAGS_REMOVE_vgettimeofday.o = -pg -Os +ifeq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o = -O2 +else +CFLAGS_vgettimeofday.o = -O2 -include $(c-gettimeofday-y) +endif
# Disable gcov profiling for VDSO code GCOV_PROFILE := n @@ -39,6 +49,7 @@ $(obj)/vdso.o : $(obj)/vdso.so # Link rule for the .so file $(obj)/vdso.so.raw: $(obj)/vdso.lds $(obj-vdso) FORCE $(call if_changed,ld) + $(call if_changed,vdso_check)
$(obj)/vdso.so.dbg: $(obj)/vdso.so.raw $(obj)/vdsomunge FORCE $(call if_changed,vdsomunge) diff --git a/arch/arm/vdso/note.c b/arch/arm/vdso/note.c new file mode 100644 index 000000000000..eff5bf9efb8b --- /dev/null +++ b/arch/arm/vdso/note.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2012-2018 ARM Limited + * + * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text. + * Here we can supply some information useful to userland. + */ + +#include <linux/uts.h> +#include <linux/version.h> +#include <linux/elfnote.h> +#include <linux/build-salt.h> + +ELFNOTE32("Linux", 0, LINUX_VERSION_CODE); +BUILD_SALT; diff --git a/arch/arm/vdso/vdso.lds.S b/arch/arm/vdso/vdso.lds.S index 89ca89f12d23..05581140fd12 100644 --- a/arch/arm/vdso/vdso.lds.S +++ b/arch/arm/vdso/vdso.lds.S @@ -82,6 +82,8 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday; + __vdso_clock_getres; + __vdso_clock_gettime64; local: *; }; } diff --git a/arch/arm/vdso/vgettimeofday.c b/arch/arm/vdso/vgettimeofday.c index 7bdbf5d5c47d..0964b07890eb 100644 --- a/arch/arm/vdso/vgettimeofday.c +++ b/arch/arm/vdso/vgettimeofday.c @@ -1,271 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0 /* - * Copyright 2015 Mentor Graphics Corporation. + * ARM userspace implementations of gettimeofday() and similar. * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; version 2 of the - * License. + * Copyright (C) 2018 ARM Limited * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see http://www.gnu.org/licenses/. 
*/ - -#include <linux/compiler.h> -#include <linux/hrtimer.h> #include <linux/time.h> -#include <asm/barrier.h> -#include <asm/bug.h> -#include <asm/cp15.h> -#include <asm/page.h> -#include <asm/unistd.h> -#include <asm/vdso_datapage.h> - -#ifndef CONFIG_AEABI -#error This code depends on AEABI system call conventions -#endif - -extern struct vdso_data *__get_datapage(void); - -static notrace u32 __vdso_read_begin(const struct vdso_data *vdata) -{ - u32 seq; -repeat: - seq = READ_ONCE(vdata->seq_count); - if (seq & 1) { - cpu_relax(); - goto repeat; - } - return seq; -} - -static notrace u32 vdso_read_begin(const struct vdso_data *vdata) -{ - u32 seq; - - seq = __vdso_read_begin(vdata); - - smp_rmb(); /* Pairs with smp_wmb in vdso_write_end */ - return seq; -} - -static notrace int vdso_read_retry(const struct vdso_data *vdata, u32 start) -{ - smp_rmb(); /* Pairs with smp_wmb in vdso_write_begin */ - return vdata->seq_count != start; -} - -static notrace long clock_gettime_fallback(clockid_t _clkid, - struct timespec *_ts) -{ - register struct timespec *ts asm("r1") = _ts; - register clockid_t clkid asm("r0") = _clkid; - register long ret asm ("r0"); - register long nr asm("r7") = __NR_clock_gettime; - - asm volatile( - " swi #0\n" - : "=r" (ret) - : "r" (clkid), "r" (ts), "r" (nr) - : "memory"); - - return ret; -} - -static notrace int do_realtime_coarse(struct timespec *ts, - struct vdso_data *vdata) -{ - u32 seq; - - do { - seq = vdso_read_begin(vdata); - - ts->tv_sec = vdata->xtime_coarse_sec; - ts->tv_nsec = vdata->xtime_coarse_nsec; - - } while (vdso_read_retry(vdata, seq)); - - return 0; -} - -static notrace int do_monotonic_coarse(struct timespec *ts, - struct vdso_data *vdata) -{ - struct timespec tomono; - u32 seq; - - do { - seq = vdso_read_begin(vdata); - - ts->tv_sec = vdata->xtime_coarse_sec; - ts->tv_nsec = vdata->xtime_coarse_nsec; - - tomono.tv_sec = vdata->wtm_clock_sec; - tomono.tv_nsec = vdata->wtm_clock_nsec; - - } while (vdso_read_retry(vdata, 
seq)); - - ts->tv_sec += tomono.tv_sec; - timespec_add_ns(ts, tomono.tv_nsec); - - return 0; -} - -#ifdef CONFIG_ARM_ARCH_TIMER - -static notrace u64 get_ns(struct vdso_data *vdata) -{ - u64 cycle_delta; - u64 cycle_now; - u64 nsec; +#include <linux/types.h>
- isb(); - cycle_now = read_sysreg(CNTVCT); - - cycle_delta = (cycle_now - vdata->cs_cycle_last) & vdata->cs_mask; - - nsec = (cycle_delta * vdata->cs_mult) + vdata->xtime_clock_snsec; - nsec >>= vdata->cs_shift; - - return nsec; -} - -static notrace int do_realtime(struct timespec *ts, struct vdso_data *vdata) +notrace int __vdso_clock_gettime(clockid_t clock, + struct old_timespec32 *ts) { - u64 nsecs; - u32 seq; - - do { - seq = vdso_read_begin(vdata); - - if (!vdata->tk_is_cntvct) - return -1; - - ts->tv_sec = vdata->xtime_clock_sec; - nsecs = get_ns(vdata); - - } while (vdso_read_retry(vdata, seq)); - - ts->tv_nsec = 0; - timespec_add_ns(ts, nsecs); - - return 0; + return __cvdso_clock_gettime32(clock, ts); }
-static notrace int do_monotonic(struct timespec *ts, struct vdso_data *vdata) +notrace int __vdso_clock_gettime64(clockid_t clock, + struct __kernel_timespec *ts) { - struct timespec tomono; - u64 nsecs; - u32 seq; - - do { - seq = vdso_read_begin(vdata); - - if (!vdata->tk_is_cntvct) - return -1; - - ts->tv_sec = vdata->xtime_clock_sec; - nsecs = get_ns(vdata); - - tomono.tv_sec = vdata->wtm_clock_sec; - tomono.tv_nsec = vdata->wtm_clock_nsec; - - } while (vdso_read_retry(vdata, seq)); - - ts->tv_sec += tomono.tv_sec; - ts->tv_nsec = 0; - timespec_add_ns(ts, nsecs + tomono.tv_nsec); - - return 0; + return __cvdso_clock_gettime(clock, ts); }
-#else /* CONFIG_ARM_ARCH_TIMER */ - -static notrace int do_realtime(struct timespec *ts, struct vdso_data *vdata) +notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) { - return -1; + return __cvdso_gettimeofday(tv, tz); }
-static notrace int do_monotonic(struct timespec *ts, struct vdso_data *vdata) +notrace int __vdso_clock_getres(clockid_t clock_id, + struct old_timespec32 *res) { - return -1; -} - -#endif /* CONFIG_ARM_ARCH_TIMER */ - -notrace int __vdso_clock_gettime(clockid_t clkid, struct timespec *ts) -{ - struct vdso_data *vdata; - int ret = -1; - - vdata = __get_datapage(); - - switch (clkid) { - case CLOCK_REALTIME_COARSE: - ret = do_realtime_coarse(ts, vdata); - break; - case CLOCK_MONOTONIC_COARSE: - ret = do_monotonic_coarse(ts, vdata); - break; - case CLOCK_REALTIME: - ret = do_realtime(ts, vdata); - break; - case CLOCK_MONOTONIC: - ret = do_monotonic(ts, vdata); - break; - default: - break; - } - - if (ret) - ret = clock_gettime_fallback(clkid, ts); - - return ret; -} - -static notrace long gettimeofday_fallback(struct timeval *_tv, - struct timezone *_tz) -{ - register struct timezone *tz asm("r1") = _tz; - register struct timeval *tv asm("r0") = _tv; - register long ret asm ("r0"); - register long nr asm("r7") = __NR_gettimeofday; - - asm volatile( - " swi #0\n" - : "=r" (ret) - : "r" (tv), "r" (tz), "r" (nr) - : "memory"); - - return ret; -} - -notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz) -{ - struct timespec ts; - struct vdso_data *vdata; - int ret; - - vdata = __get_datapage(); - - ret = do_realtime(&ts, vdata); - if (ret) - return gettimeofday_fallback(tv, tz); - - if (tv) { - tv->tv_sec = ts.tv_sec; - tv->tv_usec = ts.tv_nsec / 1000; - } - if (tz) { - tz->tz_minuteswest = vdata->tz_minuteswest; - tz->tz_dsttime = vdata->tz_dsttime; - } - - return ret; + return __cvdso_clock_getres_time32(clock_id, res); }
/* Avoid unresolved references emitted by GCC */
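The comment removed from arch/arm/kernel/vdso.c above describes the sequence-counter protocol that the generic library now implements: the writer makes the counter odd, updates the fields, and makes it even again, while the reader retries if it saw an odd value or if the counter changed during the read. A minimal userspace sketch of that protocol, together with the cycles-to-nanoseconds arithmetic from the removed get_ns(), is shown below. It assumes GCC or Clang (for the __atomic builtins, which stand in for smp_wmb()/smp_rmb()); struct demo_vdata, demo_update() and demo_read() are hypothetical names and are not the layout or API of the real struct vdso_data.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the data page; NOT the real struct vdso_data. */
struct demo_vdata {
	uint32_t seq;        /* odd while an update is in progress */
	uint64_t cycle_last; /* counter value at the last update */
	uint64_t mask;       /* counter width mask */
	uint32_t mult;       /* cycles to shifted-nanoseconds multiplier */
	uint32_t shift;      /* right shift applied after the multiply */
	uint64_t sec;        /* seconds at the last update */
	uint64_t snsec;      /* shifted nanoseconds at the last update */
};

/* Writer side: odd counter, update fields, even counter. */
static void demo_update(struct demo_vdata *vd, uint64_t cycles,
			uint64_t sec, uint64_t snsec)
{
	vd->seq++;                                /* now odd */
	__atomic_thread_fence(__ATOMIC_RELEASE);  /* stand-in for smp_wmb() */

	vd->cycle_last = cycles;
	vd->sec = sec;
	vd->snsec = snsec;

	__atomic_thread_fence(__ATOMIC_RELEASE);  /* stand-in for smp_wmb() */
	vd->seq++;                                /* even again */
}

/* Reader side: wait for an even counter, read, retry if it moved. */
static int demo_read(const struct demo_vdata *vd, uint64_t now,
		     uint64_t *sec, uint64_t *nsec)
{
	uint32_t seq;
	uint64_t ns, s;

	assert(sec && nsec);

	do {
		/* Spin while an update is in progress (odd counter). */
		while ((seq = __atomic_load_n(&vd->seq, __ATOMIC_ACQUIRE)) & 1)
			;

		/* Same arithmetic as the removed get_ns(). */
		ns = ((now - vd->cycle_last) & vd->mask) * vd->mult + vd->snsec;
		ns >>= vd->shift;
		s = vd->sec;

		__atomic_thread_fence(__ATOMIC_ACQUIRE); /* stand-in for smp_rmb() */
	} while (__atomic_load_n(&vd->seq, __ATOMIC_RELAXED) != seq);

	/* Normalize, as timespec_add_ns() did in the old code. */
	*sec = s + ns / 1000000000ull;
	*nsec = ns % 1000000000ull;
	return 0;
}
```

This sketch is only exercised single-threaded here; a concurrent version would also need the data fields accessed in a way the C memory model permits, which the kernel handles with its own barriers and READ_ONCE().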
On Thu, May 30, 2019 at 4:16 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
diff --git a/arch/arm/include/asm/vdso/gettimeofday.h b/arch/arm/include/asm/vdso/gettimeofday.h new file mode 100644 index 000000000000..eeeb319840ba --- /dev/null +++ b/arch/arm/include/asm/vdso/gettimeofday.h @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/*
+ * Copyright (C) 2018 ARM Limited
+ */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H
+#ifndef __ASSEMBLY__
+#include <asm/barrier.h> +#include <asm/cp15.h> +#include <asm/unistd.h> +#include <uapi/linux/time.h>
+#ifndef CONFIG_AEABI +#error This code depends on AEABI system call conventions +#endif
Instead of an #error here, I would use a Kconfig conditional and make it
'select HAVE_GENERIC_VDSO if AEABI'
diff --git a/arch/arm/vdso/vdso.lds.S b/arch/arm/vdso/vdso.lds.S index 89ca89f12d23..05581140fd12 100644 --- a/arch/arm/vdso/vdso.lds.S +++ b/arch/arm/vdso/vdso.lds.S @@ -82,6 +82,8 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday;
+ __vdso_clock_getres;
+ __vdso_clock_gettime64; local: *; };
Why are you adding __vdso_clock_getres here? I would probably leave the addition of the new entry point(s) for a separate patch at the end, adding __vdso_clock_gettime64 to all 32-bit ABIs at once, since while that part is a trivial change, it's also user visible and deserves its own changelog text.
Arnd
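As context for why the new entry points are user visible: userspace normally reaches them through the C library rather than by name. A minimal sketch of such a caller is shown below. Whether a given libc actually resolves clock_getres() or clock_gettime() to __vdso_clock_getres or __vdso_clock_gettime64 depends on the libc version and architecture, so that routing is an assumption here; the helper name query_monotonic() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/*
 * Query the resolution and current value of CLOCK_MONOTONIC.
 * Returns 0 on success, -1 if either call fails. The calls go through
 * libc, which may or may not use the vDSO entry points underneath.
 */
static int query_monotonic(struct timespec *res, struct timespec *now)
{
	if (clock_getres(CLOCK_MONOTONIC, res) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, now) != 0)
		return -1;

	/* A valid timespec keeps nanoseconds within one second. */
	assert(now->tv_nsec >= 0 && now->tv_nsec < 1000000000L);
	return 0;
}
```

Because the vDSO path and the syscall fallback are expected to return the same results, a caller like this cannot tell which one served the request, which is why the exported symbol list is the part of the change that forms ABI.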
The mips vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes: - Modification of vdso.c to be compliant with the common vdso datapage - Use of lib/vdso for gettimeofday
Cc: Ralf Baechle ralf@linux-mips.org Cc: Paul Burton paul.burton@mips.com Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com --- arch/mips/Kconfig | 2 + arch/mips/include/asm/vdso.h | 78 +------ arch/mips/include/asm/vdso/gettimeofday.h | 175 ++++++++++++++ arch/mips/{ => include/asm}/vdso/vdso.h | 6 +- arch/mips/include/asm/vdso/vsyscall.h | 43 ++++ arch/mips/kernel/vdso.c | 37 +-- arch/mips/vdso/Makefile | 25 +- arch/mips/vdso/elf.S | 2 +- arch/mips/vdso/gettimeofday.c | 273 ---------------------- arch/mips/vdso/sigreturn.S | 2 +- arch/mips/vdso/vdso.lds.S | 4 + arch/mips/vdso/vgettimeofday.c | 57 +++++ 12 files changed, 316 insertions(+), 388 deletions(-) create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h rename arch/mips/{ => include/asm}/vdso/vdso.h (90%) create mode 100644 arch/mips/include/asm/vdso/vsyscall.h delete mode 100644 arch/mips/vdso/gettimeofday.c create mode 100644 arch/mips/vdso/vgettimeofday.c
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 70d3200476bf..390c052cac9a 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -34,6 +34,7 @@ config MIPS select GENERIC_SCHED_CLOCK if !CAVIUM_OCTEON_SOC select GENERIC_SMP_IDLE_THREAD select GENERIC_TIME_VSYSCALL + select GENERIC_GETTIMEOFDAY select HANDLE_DOMAIN_IRQ select HAVE_ARCH_COMPILER_H select HAVE_ARCH_JUMP_LABEL @@ -72,6 +73,7 @@ config MIPS select HAVE_STACKPROTECTOR select HAVE_SYSCALL_TRACEPOINTS select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP + select HAVE_GENERIC_VDSO select IRQ_FORCED_THREADING select ISA if EISA select MODULES_USE_ELF_RELA if MODULES && 64BIT diff --git a/arch/mips/include/asm/vdso.h b/arch/mips/include/asm/vdso.h index 91bf0c2c265c..285241da2284 100644 --- a/arch/mips/include/asm/vdso.h +++ b/arch/mips/include/asm/vdso.h @@ -12,6 +12,7 @@ #define __ASM_VDSO_H
#include <linux/mm_types.h> +#include <vdso/datapage.h>
#include <asm/barrier.h>
@@ -53,84 +54,9 @@ extern struct mips_vdso_image vdso_image_o32; extern struct mips_vdso_image vdso_image_n32; #endif
-/** - * union mips_vdso_data - Data provided by the kernel for the VDSO. - * @xtime_sec: Current real time (seconds part). - * @xtime_nsec: Current real time (nanoseconds part, shifted). - * @wall_to_mono_sec: Wall-to-monotonic offset (seconds part). - * @wall_to_mono_nsec: Wall-to-monotonic offset (nanoseconds part). - * @seq_count: Counter to synchronise updates (odd = updating). - * @cs_shift: Clocksource shift value. - * @clock_mode: Clocksource to use for time functions. - * @cs_mult: Clocksource multiplier value. - * @cs_cycle_last: Clock cycle value at last update. - * @cs_mask: Clocksource mask value. - * @tz_minuteswest: Minutes west of Greenwich (from timezone). - * @tz_dsttime: Type of DST correction (from timezone). - * - * This structure contains data needed by functions within the VDSO. It is - * populated by the kernel and mapped read-only into user memory. The time - * fields are mirrors of internal data from the timekeeping infrastructure. - * - * Note: Care should be taken when modifying as the layout must remain the same - * for both 64- and 32-bit (for 32-bit userland on 64-bit kernel). - */ union mips_vdso_data { - struct { - u64 xtime_sec; - u64 xtime_nsec; - u64 wall_to_mono_sec; - u64 wall_to_mono_nsec; - u32 seq_count; - u32 cs_shift; - u8 clock_mode; - u32 cs_mult; - u64 cs_cycle_last; - u64 cs_mask; - s32 tz_minuteswest; - s32 tz_dsttime; - }; - + struct vdso_data data[CS_BASES]; u8 page[PAGE_SIZE]; };
-static inline u32 vdso_data_read_begin(const union mips_vdso_data *data) -{ - u32 seq; - - while (true) { - seq = READ_ONCE(data->seq_count); - if (likely(!(seq & 1))) { - /* Paired with smp_wmb() in vdso_data_write_*(). */ - smp_rmb(); - return seq; - } - - cpu_relax(); - } -} - -static inline bool vdso_data_read_retry(const union mips_vdso_data *data, - u32 start_seq) -{ - /* Paired with smp_wmb() in vdso_data_write_*(). */ - smp_rmb(); - return unlikely(data->seq_count != start_seq); -} - -static inline void vdso_data_write_begin(union mips_vdso_data *data) -{ - ++data->seq_count; - - /* Ensure sequence update is written before other data page values. */ - smp_wmb(); -} - -static inline void vdso_data_write_end(union mips_vdso_data *data) -{ - /* Ensure data values are written before updating sequence again. */ - smp_wmb(); - ++data->seq_count; -} - #endif /* __ASM_VDSO_H */ diff --git a/arch/mips/include/asm/vdso/gettimeofday.h b/arch/mips/include/asm/vdso/gettimeofday.h new file mode 100644 index 000000000000..a9fb871fb096 --- /dev/null +++ b/arch/mips/include/asm/vdso/gettimeofday.h @@ -0,0 +1,175 @@ +/* + * Copyright (C) 2018 ARM Limited + * Copyright (C) 2015 Imagination Technologies + * Author: Alex Smith alex.smith@imgtec.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#ifndef __ASSEMBLY__ + +#include <linux/compiler.h> +#include <linux/time.h> + +#include <asm/vdso/vdso.h> +#include <asm/clocksource.h> +#include <asm/io.h> +#include <asm/unistd.h> +#include <asm/vdso.h> + +#ifdef CONFIG_MIPS_CLOCK_VSYSCALL + +static __always_inline notrace long gettimeofday_fallback( + struct __kernel_old_timeval *_tv, + struct timezone *_tz) +{ + register struct timezone *tz asm("a1") = _tz; + register struct __kernel_old_timeval *tv asm("a0") = _tv; + register long ret asm("v0"); + register long nr asm("v0") = __NR_gettimeofday; + register long error asm("a3"); + + asm volatile( + " syscall\n" + : "=r" (ret), "=r" (error) + : "r" (tv), "r" (tz), "r" (nr) + : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13", + "$14", "$15", "$24", "$25", "hi", "lo", "memory"); + + return error ? -ret : ret; +} + +#else + +static __always_inline notrace long gettimeofday_fallback( + struct __kernel_old_timeval *_tv, + struct timezone *_tz) +{ + return -1; +} + +#endif + +static __always_inline notrace long clock_gettime_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("a1") = _ts; + register clockid_t clkid asm("a0") = _clkid; + register long ret asm("v0"); +#if _MIPS_SIM == _MIPS_SIM_ABI64 + register long nr asm("v0") = __NR_clock_gettime; +#else + register long nr asm("v0") = __NR_clock_gettime64; +#endif + register long error asm("a3"); + + asm volatile( + " syscall\n" + : "=r" (ret), "=r" (error) + : "r" (clkid), "r" (ts), "r" (nr) + : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13", + "$14", "$15", "$24", "$25", "hi", "lo", "memory"); + + return error ? 
-ret : ret; +} + +static __always_inline notrace int clock_getres_fallback( + clockid_t _clkid, + struct __kernel_timespec *_ts) +{ + register struct __kernel_timespec *ts asm("a1") = _ts; + register clockid_t clkid asm("a0") = _clkid; + register long ret asm("v0"); +#if _MIPS_SIM == _MIPS_SIM_ABI64 + register long nr asm("v0") = __NR_clock_getres; +#else + register long nr asm("v0") = __NR_clock_getres_time64; +#endif + register long error asm("a3"); + + asm volatile( + " syscall\n" + : "=r" (ret), "=r" (error) + : "r" (clkid), "r" (ts), "r" (nr) + : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13", + "$14", "$15", "$24", "$25", "hi", "lo", "memory"); + + return error ? -ret : ret; +} + +#ifdef CONFIG_CSRC_R4K + +static __always_inline u64 read_r4k_count(void) +{ + unsigned int count; + + __asm__ __volatile__( + " .set push\n" + " .set mips32r2\n" + " rdhwr %0, $2\n" + " .set pop\n" + : "=r" (count)); + + return count; +} + +#endif + +#ifdef CONFIG_CLKSRC_MIPS_GIC + +static __always_inline u64 read_gic_count(const struct vdso_data *data) +{ + void __iomem *gic = get_gic(data); + u32 hi, hi2, lo; + + do { + hi = __raw_readl(gic + sizeof(lo)); + lo = __raw_readl(gic); + hi2 = __raw_readl(gic + sizeof(lo)); + } while (hi2 != hi); + + return (((u64)hi) << 32) + lo; +} + +#endif + +static __always_inline notrace u64 __arch_get_hw_counter(s32 clock_mode) +{ +#ifdef CONFIG_CLKSRC_MIPS_GIC + const struct vdso_data *data = get_vdso_data(); +#endif + u64 cycle_now; + + switch (clock_mode) { +#ifdef CONFIG_CSRC_R4K + case VDSO_CLOCK_R4K: + cycle_now = read_r4k_count(); + break; +#endif +#ifdef CONFIG_CLKSRC_MIPS_GIC + case VDSO_CLOCK_GIC: + cycle_now = read_gic_count(data); + break; +#endif + default: + cycle_now = 0; + break; + } + + return cycle_now; +} + +static __always_inline notrace const struct vdso_data *__arch_get_vdso_data(void) +{ + return get_vdso_data(); +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETTIMEOFDAY_H */ diff --git 
a/arch/mips/vdso/vdso.h b/arch/mips/include/asm/vdso/vdso.h similarity index 90% rename from arch/mips/vdso/vdso.h rename to arch/mips/include/asm/vdso/vdso.h index cfb1be441dec..048d12fbb925 100644 --- a/arch/mips/vdso/vdso.h +++ b/arch/mips/include/asm/vdso/vdso.h @@ -72,14 +72,14 @@ static inline unsigned long get_vdso_base(void) return addr; }
-static inline const union mips_vdso_data *get_vdso_data(void) +static inline const struct vdso_data *get_vdso_data(void) { - return (const union mips_vdso_data *)(get_vdso_base() - PAGE_SIZE); + return (const struct vdso_data *)(get_vdso_base() - PAGE_SIZE); }
#ifdef CONFIG_CLKSRC_MIPS_GIC
-static inline void __iomem *get_gic(const union mips_vdso_data *data) +static inline void __iomem *get_gic(const struct vdso_data *data) { return (void __iomem *)data - PAGE_SIZE; } diff --git a/arch/mips/include/asm/vdso/vsyscall.h b/arch/mips/include/asm/vdso/vsyscall.h new file mode 100644 index 000000000000..195314732233 --- /dev/null +++ b/arch/mips/include/asm/vdso/vsyscall.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_VSYSCALL_H +#define __ASM_VDSO_VSYSCALL_H + +#ifndef __ASSEMBLY__ + +#include <linux/timekeeper_internal.h> +#include <vdso/datapage.h> + +extern struct vdso_data *vdso_data; + +/* + * Update the vDSO data page to keep in sync with kernel timekeeping. + */ +static __always_inline +struct vdso_data *__mips_get_k_vdso_data(void) +{ + return vdso_data; +} +#define __arch_get_k_vdso_data __mips_get_k_vdso_data + +static __always_inline +int __mips_get_clock_mode(struct timekeeper *tk) +{ + u32 clock_mode = tk->tkr_mono.clock->archdata.vdso_clock_mode; + + return clock_mode; +} +#define __arch_get_clock_mode __mips_get_clock_mode + +static __always_inline +int __mips_use_vsyscall(struct vdso_data *vdata) +{ + return (vdata[CS_HRES_COARSE].clock_mode != VDSO_CLOCK_NONE); +} +#define __arch_use_vsyscall __mips_use_vsyscall + +/* The asm-generic header needs to be included after the definitions above */ +#include <asm-generic/vdso/vsyscall.h> + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_VSYSCALL_H */ diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 9df3ebdc7b0f..157ee8045035 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -24,9 +24,12 @@ #include <asm/mips-cps.h> #include <asm/page.h> #include <asm/vdso.h> +#include <vdso/helpers.h> +#include <vdso/vsyscall.h>
/* Kernel-provided data used by the VDSO. */ -static union mips_vdso_data vdso_data __page_aligned_data; +static union mips_vdso_data mips_vdso_data __page_aligned_data; +struct vdso_data *vdso_data = mips_vdso_data.data;
/* * Mapping for the VDSO data/GIC pages. The real pages are mapped manually, as @@ -70,34 +73,6 @@ static int __init init_vdso(void) } subsys_initcall(init_vdso);
-void update_vsyscall(struct timekeeper *tk) -{ - vdso_data_write_begin(&vdso_data); - - vdso_data.xtime_sec = tk->xtime_sec; - vdso_data.xtime_nsec = tk->tkr_mono.xtime_nsec; - vdso_data.wall_to_mono_sec = tk->wall_to_monotonic.tv_sec; - vdso_data.wall_to_mono_nsec = tk->wall_to_monotonic.tv_nsec; - vdso_data.cs_shift = tk->tkr_mono.shift; - - vdso_data.clock_mode = tk->tkr_mono.clock->archdata.vdso_clock_mode; - if (vdso_data.clock_mode != VDSO_CLOCK_NONE) { - vdso_data.cs_mult = tk->tkr_mono.mult; - vdso_data.cs_cycle_last = tk->tkr_mono.cycle_last; - vdso_data.cs_mask = tk->tkr_mono.mask; - } - - vdso_data_write_end(&vdso_data); -} - -void update_vsyscall_tz(void) -{ - if (vdso_data.clock_mode != VDSO_CLOCK_NONE) { - vdso_data.tz_minuteswest = sys_tz.tz_minuteswest; - vdso_data.tz_dsttime = sys_tz.tz_dsttime; - } -} - static unsigned long vdso_base(void) { unsigned long base; @@ -167,7 +142,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) */ if (cpu_has_dc_aliases) { base = __ALIGN_MASK(base, shm_align_mask); - base += ((unsigned long)&vdso_data - gic_size) & shm_align_mask; + base += ((unsigned long)vdso_data - gic_size) & shm_align_mask; }
data_addr = base + gic_size; @@ -193,7 +168,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
/* Map data page. */ ret = remap_pfn_range(vma, data_addr, - virt_to_phys(&vdso_data) >> PAGE_SHIFT, + virt_to_phys(vdso_data) >> PAGE_SHIFT, PAGE_SIZE, PAGE_READONLY); if (ret) goto out; diff --git a/arch/mips/vdso/Makefile b/arch/mips/vdso/Makefile index 7221df24cb23..96420357d5bf 100644 --- a/arch/mips/vdso/Makefile +++ b/arch/mips/vdso/Makefile @@ -1,6 +1,12 @@ # SPDX-License-Identifier: GPL-2.0 # Objects to go into the VDSO. -obj-vdso-y := elf.o gettimeofday.o sigreturn.o + +# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before +# the inclusion of generic Makefile. +ARCH_REL_TYPE_ABS := R_MIPS_JUMP_SLOT|R_MIPS_GLOB_DAT +include $(srctree)/lib/vdso/Makefile + +obj-vdso-y := elf.o vgettimeofday.o sigreturn.o
# Common compiler flags between ABIs. ccflags-vdso := \ @@ -15,15 +21,23 @@ ifdef CONFIG_CC_IS_CLANG ccflags-vdso += $(filter --target=%,$(KBUILD_CFLAGS)) endif
+# +# The -fno-jump-tables flag only prevents the compiler from generating +# jump tables but does not prevent the compiler from emitting absolute +# offsets. cflags-vdso := $(ccflags-vdso) \ $(filter -W%,$(filter-out -Wa$(comma)%,$(KBUILD_CFLAGS))) \ - -O2 -g -fPIC -fno-strict-aliasing -fno-common -fno-builtin -G 0 \ - -DDISABLE_BRANCH_PROFILING \ + -O3 -g -fPIC -fno-strict-aliasing -fno-common -fno-builtin -G 0 \ + -fno-stack-protector -fno-jump-tables -DDISABLE_BRANCH_PROFILING \ $(call cc-option, -fno-asynchronous-unwind-tables) \ $(call cc-option, -fno-stack-protector) aflags-vdso := $(ccflags-vdso) \ -D__ASSEMBLY__ -Wa,-gdwarf-2
+ifneq ($(c-gettimeofday-y),) +CFLAGS_vgettimeofday.o = -include $(c-gettimeofday-y) +endif + # # For the pre-R6 code in arch/mips/vdso/vdso.h for locating # the base address of VDSO, the linker will emit a R_MIPS_PC32 @@ -48,6 +62,8 @@ VDSO_LDFLAGS := \ $(addprefix -Wl$(comma),$(filter -E%,$(KBUILD_CFLAGS))) \ -nostdlib -shared -Wl,--hash-style=sysv -Wl,--build-id
+CFLAGS_REMOVE_vdso.o = -pg + GCOV_PROFILE := n UBSAN_SANITIZE := n
@@ -96,6 +112,7 @@ $(obj)/vdso.lds: KBUILD_CPPFLAGS := $(ccflags-vdso) $(native-abi)
$(obj)/vdso.so.dbg.raw: $(obj)/vdso.lds $(obj-vdso) FORCE $(call if_changed,vdsold) + $(call if_changed,vdso_check)
$(obj)/vdso-image.c: $(obj)/vdso.so.dbg.raw $(obj)/vdso.so.raw \ $(obj)/genvdso FORCE @@ -134,6 +151,7 @@ $(obj)/vdso-o32.lds: $(src)/vdso.lds.S FORCE
$(obj)/vdso-o32.so.dbg.raw: $(obj)/vdso-o32.lds $(obj-vdso-o32) FORCE $(call if_changed,vdsold) + $(call if_changed,vdso_check)
$(obj)/vdso-o32-image.c: VDSO_NAME := o32 $(obj)/vdso-o32-image.c: $(obj)/vdso-o32.so.dbg.raw $(obj)/vdso-o32.so.raw \ @@ -174,6 +192,7 @@ $(obj)/vdso-n32.lds: $(src)/vdso.lds.S FORCE
$(obj)/vdso-n32.so.dbg.raw: $(obj)/vdso-n32.lds $(obj-vdso-n32) FORCE $(call if_changed,vdsold) + $(call if_changed,vdso_check)
$(obj)/vdso-n32-image.c: VDSO_NAME := n32 $(obj)/vdso-n32-image.c: $(obj)/vdso-n32.so.dbg.raw $(obj)/vdso-n32.so.raw \ diff --git a/arch/mips/vdso/elf.S b/arch/mips/vdso/elf.S index 428a1917afc6..c0c85d126094 100644 --- a/arch/mips/vdso/elf.S +++ b/arch/mips/vdso/elf.S @@ -8,7 +8,7 @@ * option) any later version. */
-#include "vdso.h" +#include <asm/vdso/vdso.h>
#include <asm/isa-rev.h>
diff --git a/arch/mips/vdso/gettimeofday.c b/arch/mips/vdso/gettimeofday.c deleted file mode 100644 index e22b422f282c..000000000000 --- a/arch/mips/vdso/gettimeofday.c +++ /dev/null @@ -1,273 +0,0 @@ -/* - * Copyright (C) 2015 Imagination Technologies - * Author: Alex Smith alex.smith@imgtec.com - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - */ - -#include "vdso.h" - -#include <linux/compiler.h> -#include <linux/time.h> - -#include <asm/clocksource.h> -#include <asm/io.h> -#include <asm/unistd.h> -#include <asm/vdso.h> - -#ifdef CONFIG_MIPS_CLOCK_VSYSCALL - -static __always_inline long gettimeofday_fallback(struct timeval *_tv, - struct timezone *_tz) -{ - register struct timezone *tz asm("a1") = _tz; - register struct timeval *tv asm("a0") = _tv; - register long ret asm("v0"); - register long nr asm("v0") = __NR_gettimeofday; - register long error asm("a3"); - - asm volatile( - " syscall\n" - : "=r" (ret), "=r" (error) - : "r" (tv), "r" (tz), "r" (nr) - : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13", - "$14", "$15", "$24", "$25", "hi", "lo", "memory"); - - return error ? -ret : ret; -} - -#endif - -static __always_inline long clock_gettime_fallback(clockid_t _clkid, - struct timespec *_ts) -{ - register struct timespec *ts asm("a1") = _ts; - register clockid_t clkid asm("a0") = _clkid; - register long ret asm("v0"); - register long nr asm("v0") = __NR_clock_gettime; - register long error asm("a3"); - - asm volatile( - " syscall\n" - : "=r" (ret), "=r" (error) - : "r" (clkid), "r" (ts), "r" (nr) - : "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13", - "$14", "$15", "$24", "$25", "hi", "lo", "memory"); - - return error ? 
-ret : ret; -} - -static __always_inline int do_realtime_coarse(struct timespec *ts, - const union mips_vdso_data *data) -{ - u32 start_seq; - - do { - start_seq = vdso_data_read_begin(data); - - ts->tv_sec = data->xtime_sec; - ts->tv_nsec = data->xtime_nsec >> data->cs_shift; - } while (vdso_data_read_retry(data, start_seq)); - - return 0; -} - -static __always_inline int do_monotonic_coarse(struct timespec *ts, - const union mips_vdso_data *data) -{ - u32 start_seq; - u64 to_mono_sec; - u64 to_mono_nsec; - - do { - start_seq = vdso_data_read_begin(data); - - ts->tv_sec = data->xtime_sec; - ts->tv_nsec = data->xtime_nsec >> data->cs_shift; - - to_mono_sec = data->wall_to_mono_sec; - to_mono_nsec = data->wall_to_mono_nsec; - } while (vdso_data_read_retry(data, start_seq)); - - ts->tv_sec += to_mono_sec; - timespec_add_ns(ts, to_mono_nsec); - - return 0; -} - -#ifdef CONFIG_CSRC_R4K - -static __always_inline u64 read_r4k_count(void) -{ - unsigned int count; - - __asm__ __volatile__( - " .set push\n" - " .set mips32r2\n" - " rdhwr %0, $2\n" - " .set pop\n" - : "=r" (count)); - - return count; -} - -#endif - -#ifdef CONFIG_CLKSRC_MIPS_GIC - -static __always_inline u64 read_gic_count(const union mips_vdso_data *data) -{ - void __iomem *gic = get_gic(data); - u32 hi, hi2, lo; - - do { - hi = __raw_readl(gic + sizeof(lo)); - lo = __raw_readl(gic); - hi2 = __raw_readl(gic + sizeof(lo)); - } while (hi2 != hi); - - return (((u64)hi) << 32) + lo; -} - -#endif - -static __always_inline u64 get_ns(const union mips_vdso_data *data) -{ - u64 cycle_now, delta, nsec; - - switch (data->clock_mode) { -#ifdef CONFIG_CSRC_R4K - case VDSO_CLOCK_R4K: - cycle_now = read_r4k_count(); - break; -#endif -#ifdef CONFIG_CLKSRC_MIPS_GIC - case VDSO_CLOCK_GIC: - cycle_now = read_gic_count(data); - break; -#endif - default: - return 0; - } - - delta = (cycle_now - data->cs_cycle_last) & data->cs_mask; - - nsec = (delta * data->cs_mult) + data->xtime_nsec; - nsec >>= data->cs_shift; - - return 
nsec; -} - -static __always_inline int do_realtime(struct timespec *ts, - const union mips_vdso_data *data) -{ - u32 start_seq; - u64 ns; - - do { - start_seq = vdso_data_read_begin(data); - - if (data->clock_mode == VDSO_CLOCK_NONE) - return -ENOSYS; - - ts->tv_sec = data->xtime_sec; - ns = get_ns(data); - } while (vdso_data_read_retry(data, start_seq)); - - ts->tv_nsec = 0; - timespec_add_ns(ts, ns); - - return 0; -} - -static __always_inline int do_monotonic(struct timespec *ts, - const union mips_vdso_data *data) -{ - u32 start_seq; - u64 ns; - u64 to_mono_sec; - u64 to_mono_nsec; - - do { - start_seq = vdso_data_read_begin(data); - - if (data->clock_mode == VDSO_CLOCK_NONE) - return -ENOSYS; - - ts->tv_sec = data->xtime_sec; - ns = get_ns(data); - - to_mono_sec = data->wall_to_mono_sec; - to_mono_nsec = data->wall_to_mono_nsec; - } while (vdso_data_read_retry(data, start_seq)); - - ts->tv_sec += to_mono_sec; - ts->tv_nsec = 0; - timespec_add_ns(ts, ns + to_mono_nsec); - - return 0; -} - -#ifdef CONFIG_MIPS_CLOCK_VSYSCALL - -/* - * This is behind the ifdef so that we don't provide the symbol when there's no - * possibility of there being a usable clocksource, because there's nothing we - * can do without it. When libc fails the symbol lookup it should fall back on - * the standard syscall path. 
- */ -int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz) -{ - const union mips_vdso_data *data = get_vdso_data(); - struct timespec ts; - int ret; - - ret = do_realtime(&ts, data); - if (ret) - return gettimeofday_fallback(tv, tz); - - if (tv) { - tv->tv_sec = ts.tv_sec; - tv->tv_usec = ts.tv_nsec / 1000; - } - - if (tz) { - tz->tz_minuteswest = data->tz_minuteswest; - tz->tz_dsttime = data->tz_dsttime; - } - - return 0; -} - -#endif /* CONFIG_MIPS_CLOCK_VSYSCALL */ - -int __vdso_clock_gettime(clockid_t clkid, struct timespec *ts) -{ - const union mips_vdso_data *data = get_vdso_data(); - int ret = -1; - - switch (clkid) { - case CLOCK_REALTIME_COARSE: - ret = do_realtime_coarse(ts, data); - break; - case CLOCK_MONOTONIC_COARSE: - ret = do_monotonic_coarse(ts, data); - break; - case CLOCK_REALTIME: - ret = do_realtime(ts, data); - break; - case CLOCK_MONOTONIC: - ret = do_monotonic(ts, data); - break; - default: - break; - } - - if (ret) - ret = clock_gettime_fallback(clkid, ts); - - return ret; -} diff --git a/arch/mips/vdso/sigreturn.S b/arch/mips/vdso/sigreturn.S index 30c6219912ac..c2b05956e4cb 100644 --- a/arch/mips/vdso/sigreturn.S +++ b/arch/mips/vdso/sigreturn.S @@ -8,7 +8,7 @@ * option) any later version. */
-#include "vdso.h" +#include <asm/vdso/vdso.h>
#include <uapi/asm/unistd.h>
diff --git a/arch/mips/vdso/vdso.lds.S b/arch/mips/vdso/vdso.lds.S index 8df7dd53e8e0..659fe0c3750a 100644 --- a/arch/mips/vdso/vdso.lds.S +++ b/arch/mips/vdso/vdso.lds.S @@ -99,6 +99,10 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday; + __vdso_clock_getres; +#if _MIPS_SIM != _MIPS_SIM_ABI64 + __vdso_clock_gettime64; +#endif #endif local: *; }; diff --git a/arch/mips/vdso/vgettimeofday.c b/arch/mips/vdso/vgettimeofday.c new file mode 100644 index 000000000000..41b6a21cd1d1 --- /dev/null +++ b/arch/mips/vdso/vgettimeofday.c @@ -0,0 +1,57 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * MIPS64 and compat userspace implementations of gettimeofday() + * and similar. + * + * Copyright (C) 2018 ARM Limited + * + */ +#include <linux/time.h> +#include <linux/types.h> + +#if _MIPS_SIM != _MIPS_SIM_ABI64 +notrace int __vdso_clock_gettime(clockid_t clock, + struct old_timespec32 *ts) +{ + return __cvdso_clock_gettime32(clock, ts); +} + +notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) +{ + return __cvdso_gettimeofday(tv, tz); +} + +notrace int __vdso_clock_getres(clockid_t clock_id, + struct old_timespec32 *res) +{ + return __cvdso_clock_getres_time32(clock_id, res); +} + +notrace int __vdso_clock_gettime_time64(clockid_t clock, + struct __kernel_timespec *ts) +{ + return __cvdso_clock_gettime(clock, ts); +} + +#else + +notrace int __vdso_clock_gettime(clockid_t clock, + struct __kernel_timespec *ts) +{ + return __cvdso_clock_gettime(clock, ts); +} + +notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) +{ + return __cvdso_gettimeofday(tv, tz); +} + +notrace int __vdso_clock_getres(clockid_t clock_id, + struct __kernel_timespec *res) +{ + return __cvdso_clock_getres(clock_id, res); +} + +#endif
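For readers following the removal of vdso_data_read_begin() and vdso_data_read_retry() above: those helpers implement a sequence-counter read loop, which the generic library now provides instead. The following is only a user-space sketch of that pattern using C11 atomics, assuming a single writer. The names demo_data, demo_write and demo_read are hypothetical and do not exist in the kernel, and the memory-ordering calls only approximate smp_wmb()/smp_rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the vDSO data page (not struct vdso_data). */
struct demo_data {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;
};

/* Writer side, assuming exactly one writer (the kernel timekeeping code). */
static void demo_write(struct demo_data *d, uint64_t sec, uint64_t nsec)
{
	unsigned int s = atomic_load_explicit(&d->seq, memory_order_relaxed);

	/* Make the counter odd before touching the payload. */
	atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&d->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&d->nsec, nsec, memory_order_relaxed);

	/* Publish the payload and make the counter even again. */
	atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* Reader side: retry until a consistent snapshot has been observed. */
static void demo_read(struct demo_data *d, uint64_t *sec, uint64_t *nsec)
{
	unsigned int start;

	do {
		/* Wait for an even counter, i.e. no update in progress. */
		while ((start = atomic_load_explicit(&d->seq,
						     memory_order_acquire)) & 1)
			;

		*sec = atomic_load_explicit(&d->sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&d->nsec, memory_order_relaxed);

		/* Order the payload loads before the counter re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&d->seq, memory_order_relaxed) != start);
}
```

The reader never writes to the shared data, which is what allows the page to be mapped read-only into user space.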
On Thu, May 30, 2019 at 4:16 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
--- a/arch/mips/vdso/vdso.lds.S +++ b/arch/mips/vdso/vdso.lds.S @@ -99,6 +99,10 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday;
__vdso_clock_getres;
+#if _MIPS_SIM != _MIPS_SIM_ABI64
__vdso_clock_gettime64;
+#endif #endif local: *; };
Same comment as for the corresponding arm change: I'd leave the ABI changes to a separate patch, and probably not add __vdso_clock_getres at all.
Also, you seem to have a typo here:
+notrace int __vdso_clock_gettime_time64(clockid_t clock,
struct __kernel_timespec *ts)
+{
return __cvdso_clock_gettime(clock, ts);
+}
This should be __vdso_clock_gettime64, not __vdso_clock_gettime_time64 I think.
Arnd
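To illustrate why the 32-bit ABIs end up with two differently named entry points (__vdso_clock_gettime taking a 32-bit timespec and __vdso_clock_gettime64 taking a 64-bit one), here is a minimal sketch of narrowing a 64-bit result into a 32-bit layout. The struct names are hypothetical stand-ins for __kernel_timespec and old_timespec32, and the explicit overflow check is an illustrative choice, not necessarily what the generic library does.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct __kernel_timespec. */
struct demo_timespec64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical stand-in for struct old_timespec32. */
struct demo_timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Copy a 64-bit timespec into the 32-bit layout.
 * Returns 0 on success, -1 if the seconds value does not fit in 32 bits.
 */
static int demo_narrow_timespec(const struct demo_timespec64 *in,
				struct demo_timespec32 *out)
{
	if (in->tv_sec > INT32_MAX || in->tv_sec < INT32_MIN)
		return -1;

	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;	/* always below 1e9, so it fits */
	return 0;
}
```

A caller that needs times beyond the 32-bit range has to use the 64-bit entry point, which is why the exported symbol name has to match what user space looks up.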
On 05/31/2019 01:34 AM, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:16 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
--- a/arch/mips/vdso/vdso.lds.S +++ b/arch/mips/vdso/vdso.lds.S @@ -99,6 +99,10 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday;
__vdso_clock_getres;
+#if _MIPS_SIM != _MIPS_SIM_ABI64
__vdso_clock_gettime64;
+#endif #endif local: *; };
Same comment as for the corresponding arm change: I'd leave the ABI changes to a separate patch, and probably not add __vdso_clock_getres at all.
Removing this would break the ABI (would it really? It just replaces the syscall ... so it is more of a user space expectation), as it is already present in arm64 before this series.
-- Mark
On Mon, Jun 3, 2019 at 4:54 PM Mark Salyzyn salyzyn@android.com wrote:
On 05/31/2019 01:34 AM, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:16 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
--- a/arch/mips/vdso/vdso.lds.S +++ b/arch/mips/vdso/vdso.lds.S @@ -99,6 +99,10 @@ VERSION global: __vdso_clock_gettime; __vdso_gettimeofday;
__vdso_clock_getres;
+#if _MIPS_SIM != _MIPS_SIM_ABI64
__vdso_clock_gettime64;
+#endif #endif local: *; };
Same comment as for the corresponding arm change: I'd leave the ABI changes to a separate patch, and probably not add __vdso_clock_getres at all.
Removing this would break the ABI (would it really? It just replaces the syscall ... so it is more of a user space expectation), as it is already present in arm64 before this series.
What I meant is that we should only keep clock_getres() in the vdso for architectures that already have it, to keep the ABI unchanged, but not add it to new ones.
At the moment, arm64, nds32, ppc, riscv and s390 have clock_getres, while arm, mips, sparc, and x86 don't.
Also: on 32-bit architectures with 64-bit time_t, the series only adds clock_gettime(), not clock_getres(), so user space should stop assuming it's there.
Arnd
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes:
 - Modification of vdso.c to be compliant with the common vdso datapage
 - Use of lib/vdso for gettimeofday
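As background for what "use of lib/vdso for gettimeofday" takes over from the architecture code, the high-resolution path boils down to converting a hardware counter delta into nanoseconds with a multiplier and a shift. The sketch below mirrors the arithmetic of the get_ns() helper removed from the MIPS code earlier in this series. The struct demo_clock and the function demo_hres are hypothetical names, and the sequence-counter protection around the reads is omitted.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical snapshot of the timekeeping values exported to the vDSO. */
struct demo_clock {
	uint64_t cycle_last;		/* counter value at the last update */
	uint64_t mask;			/* valid bits of the hardware counter */
	uint64_t base_sec;		/* seconds at the last update */
	uint64_t base_nsec_shifted;	/* nanoseconds at the last update << shift */
	uint32_t mult;			/* cycles -> shifted nanoseconds multiplier */
	uint32_t shift;
};

/* Convert a raw counter reading into seconds and nanoseconds. */
static void demo_hres(const struct demo_clock *c, uint64_t cycles_now,
		      uint64_t *sec, uint64_t *nsec)
{
	uint64_t delta = (cycles_now - c->cycle_last) & c->mask;
	uint64_t ns = (delta * c->mult + c->base_nsec_shifted) >> c->shift;

	/* Carry whole seconds out of the nanosecond part. */
	*sec = c->base_sec + ns / DEMO_NSEC_PER_SEC;
	*nsec = ns % DEMO_NSEC_PER_SEC;
}
```

Only the counter read is architecture specific (rdtsc, pvclock or the Hyper-V TSC page on x86), which is why the arithmetic can live in common code.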
Cc: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
---
 arch/x86/Kconfig | 3 +
 arch/x86/entry/vdso/Makefile | 9 +
 arch/x86/entry/vdso/vclock_gettime.c | 251 +++++------------------
 arch/x86/entry/vdso/vdso.lds.S | 2 +
 arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
 arch/x86/entry/vdso/vdsox32.lds.S | 1 +
 arch/x86/entry/vsyscall/Makefile | 2 -
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 --------
 arch/x86/include/asm/mshyperv-tsc.h | 76 +++++++
 arch/x86/include/asm/mshyperv.h | 70 +------
 arch/x86/include/asm/pvclock.h | 2 +-
 arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++++++++++
 arch/x86/include/asm/vdso/vsyscall.h | 44 ++++
 arch/x86/include/asm/vgtod.h | 75 +------
 arch/x86/include/asm/vvar.h | 7 +-
 arch/x86/kernel/pvclock.c | 1 +
 16 files changed, 396 insertions(+), 435 deletions(-)
 delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
 create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
 create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2bbbd4d1ba31..51a98d6eae8e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -17,6 +17,7 @@ config X86_32 select HAVE_DEBUG_STACKOVERFLOW select MODULES_USE_ELF_REL select OLD_SIGACTION + select GENERIC_VDSO_32
config X86_64 def_bool y @@ -121,6 +122,7 @@ config X86 select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL + select GENERIC_GETTIMEOFDAY select HARDLOCKUP_CHECK_TIMESTAMP if X86_64 select HAVE_ACPI_APEI if ACPI select HAVE_ACPI_APEI_NMI if ACPI @@ -202,6 +204,7 @@ config X86 select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER + select HAVE_GENERIC_VDSO select HOTPLUG_SMT if SMP select IRQ_FORCED_THREADING select NEED_SG_DMA_LENGTH diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 42fe42e82baf..39106111be86 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -3,6 +3,12 @@ # Building vDSO images for x86. #
+# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before +# the inclusion of generic Makefile. +ARCH_REL_TYPE_ABS := R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE| +ARCH_REL_TYPE_ABS += R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE +include $(srctree)/lib/vdso/Makefile + KBUILD_CFLAGS += $(DISABLE_LTO) KASAN_SANITIZE := n UBSAN_SANITIZE := n @@ -51,6 +57,7 @@ VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-undefined \
$(obj)/vdso64.so.dbg: $(obj)/vdso.lds $(vobjs) FORCE $(call if_changed,vdso) + $(call if_changed,vdso_check)
HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/$(SUBARCH)/include/uapi hostprogs-y += vdso2c @@ -121,6 +128,7 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE
$(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE $(call if_changed,vdso) + $(call if_changed,vdso_check)
CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds) VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1 @@ -160,6 +168,7 @@ $(obj)/vdso32.so.dbg: FORCE \ $(obj)/vdso32/system_call.o \ $(obj)/vdso32/sigreturn.o $(call if_changed,vdso) + $(call if_changed,vdso_check)
# # The DSO images are built using a special linker script. diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c index 98c7d12b945c..39268f941878 100644 --- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -1,240 +1,81 @@ +// SPDX-License-Identifier: GPL-2.0 /* - * Copyright 2006 Andi Kleen, SUSE Labs. - * Subject to the GNU Public License, v.2 - * * Fast user context implementation of clock_gettime, gettimeofday, and time. * + * Copyright 2019 ARM Limited + * Copyright 2006 Andi Kleen, SUSE Labs. * 32 Bit compat layer by Stefani Seibold stefani@seibold.net * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany - * - * The code should have no internal unresolved relocations. - * Check with readelf after changing. */ - -#include <uapi/linux/time.h> -#include <asm/vgtod.h> -#include <asm/vvar.h> -#include <asm/unistd.h> -#include <asm/msr.h> -#include <asm/pvclock.h> -#include <asm/mshyperv.h> -#include <linux/math64.h> #include <linux/time.h> -#include <linux/kernel.h> +#include <linux/types.h>
-#define gtod (&VVAR(vsyscall_gtod_data)) +#include "../../../../lib/vdso/gettimeofday.c"
-extern int __vdso_clock_gettime(clockid_t clock, struct timespec *ts); -extern int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz); +extern int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz); extern time_t __vdso_time(time_t *t);
-#ifdef CONFIG_PARAVIRT_CLOCK -extern u8 pvclock_page[PAGE_SIZE] - __attribute__((visibility("hidden"))); -#endif - -#ifdef CONFIG_HYPERV_TSCPAGE -extern u8 hvclock_page[PAGE_SIZE] - __attribute__((visibility("hidden"))); -#endif - -#ifndef BUILD_VDSO32 - -notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) -{ - long ret; - asm ("syscall" : "=a" (ret), "=m" (*ts) : - "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : - "rcx", "r11"); - return ret; -} - -#else - -notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) +notrace int __vdso_gettimeofday(struct __kernel_old_timeval *tv, + struct timezone *tz) { - long ret; - - asm ( - "mov %%ebx, %%edx \n" - "mov %[clock], %%ebx \n" - "call __kernel_vsyscall \n" - "mov %%edx, %%ebx \n" - : "=a" (ret), "=m" (*ts) - : "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts) - : "edx"); - return ret; + return __cvdso_gettimeofday(tv, tz); } +int gettimeofday(struct __kernel_old_timeval *, struct timezone *) + __attribute__((weak, alias("__vdso_gettimeofday")));
-#endif - -#ifdef CONFIG_PARAVIRT_CLOCK -static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void) +notrace time_t __vdso_time(time_t *t) { - return (const struct pvclock_vsyscall_time_info *)&pvclock_page; + return __cvdso_time(t); } +time_t time(time_t *t) + __attribute__((weak, alias("__vdso_time")));
-static notrace u64 vread_pvclock(void)
-{
-	const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
-	u32 version;
-	u64 ret;
-
-	/*
-	 * Note: The kernel and hypervisor must guarantee that cpu ID
-	 * number maps 1:1 to per-CPU pvclock time info.
-	 *
-	 * Because the hypervisor is entirely unaware of guest userspace
-	 * preemption, it cannot guarantee that per-CPU pvclock time
-	 * info is updated if the underlying CPU changes or that that
-	 * version is increased whenever underlying CPU changes.
-	 *
-	 * On KVM, we are guaranteed that pvti updates for any vCPU are
-	 * atomic as seen by *all* vCPUs. This is an even stronger
-	 * guarantee than we get with a normal seqlock.
-	 *
-	 * On Xen, we don't appear to have that guarantee, but Xen still
-	 * supplies a valid seqlock using the version field.
-	 *
-	 * We only do pvclock vdso timing at all if
-	 * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
-	 * mean that all vCPUs have matching pvti and that the TSC is
-	 * synced, so we can just look at vCPU 0's pvti.
-	 */
-
-	do {
-		version = pvclock_read_begin(pvti);
-
-		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
-			return U64_MAX;
-
-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
-	} while (pvclock_read_retry(pvti, version));
-
-	return ret;
-}
-#endif
-#ifdef CONFIG_HYPERV_TSCPAGE
-static notrace u64 vread_hvclock(void)
-{
-	const struct ms_hyperv_tsc_page *tsc_pg =
-		(const struct ms_hyperv_tsc_page *)&hvclock_page;

-	return hv_read_tsc_page(tsc_pg);
-}
-#endif
+#if defined(CONFIG_X86_64) && !defined(BUILD_VDSO32_64)
+/* both 64-bit and x32 use these */
+extern int __vdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts);
+extern int __vdso_clock_getres(clockid_t clock, struct __kernel_timespec *res);

-notrace static inline u64 vgetcyc(int mode)
+notrace int __vdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts)
 {
-	if (mode == VCLOCK_TSC)
-		return (u64)rdtsc_ordered();
-#ifdef CONFIG_PARAVIRT_CLOCK
-	else if (mode == VCLOCK_PVCLOCK)
-		return vread_pvclock();
-#endif
-#ifdef CONFIG_HYPERV_TSCPAGE
-	else if (mode == VCLOCK_HVCLOCK)
-		return vread_hvclock();
-#endif
-	return U64_MAX;
+	return __cvdso_clock_gettime(clock, ts);
 }
+int clock_gettime(clockid_t, struct __kernel_timespec *)
+	__attribute__((weak, alias("__vdso_clock_gettime")));

-notrace static int do_hres(clockid_t clk, struct timespec *ts)
+notrace int __vdso_clock_getres(clockid_t clock,
+				struct __kernel_timespec *res)
 {
-	struct vgtod_ts *base = &gtod->basetime[clk];
-	u64 cycles, last, sec, ns;
-	unsigned int seq;
-
-	do {
-		seq = gtod_read_begin(gtod);
-		cycles = vgetcyc(gtod->vclock_mode);
-		ns = base->nsec;
-		last = gtod->cycle_last;
-		if (unlikely((s64)cycles < 0))
-			return vdso_fallback_gettime(clk, ts);
-		if (cycles > last)
-			ns += (cycles - last) * gtod->mult;
-		ns >>= gtod->shift;
-		sec = base->sec;
-	} while (unlikely(gtod_read_retry(gtod, seq)));
-
-	/*
-	 * Do this outside the loop: a race inside the loop could result
-	 * in __iter_div_u64_rem() being extremely slow.
-	 */
-	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return 0;
+	return __cvdso_clock_getres(clock, res);
 }
+int clock_getres(clockid_t, struct __kernel_timespec *)
+	__attribute__((weak, alias("__vdso_clock_getres")));

-notrace static void do_coarse(clockid_t clk, struct timespec *ts)
-{
-	struct vgtod_ts *base = &gtod->basetime[clk];
-	unsigned int seq;
-
-	do {
-		seq = gtod_read_begin(gtod);
-		ts->tv_sec = base->sec;
-		ts->tv_nsec = base->nsec;
-	} while (unlikely(gtod_read_retry(gtod, seq)));
-}
+#else
+/* i386 only */
+extern int __vdso_clock_gettime(clockid_t clock, struct old_timespec32 *ts);
+extern int __vdso_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts);
+extern int __vdso_clock_getres(clockid_t clock, struct old_timespec32 *res);

-notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+notrace int __vdso_clock_gettime(clockid_t clock, struct old_timespec32 *ts)
 {
-	unsigned int msk;
-
-	/* Sort out negative (CPU/FD) and invalid clocks */
-	if (unlikely((unsigned int) clock >= MAX_CLOCKS))
-		return vdso_fallback_gettime(clock, ts);
-
-	/*
-	 * Convert the clockid to a bitmask and use it to check which
-	 * clocks are handled in the VDSO directly.
-	 */
-	msk = 1U << clock;
-	if (likely(msk & VGTOD_HRES)) {
-		return do_hres(clock, ts);
-	} else if (msk & VGTOD_COARSE) {
-		do_coarse(clock, ts);
-		return 0;
-	}
-	return vdso_fallback_gettime(clock, ts);
+	return __cvdso_clock_gettime32(clock, ts);
 }
-
-int clock_gettime(clockid_t, struct timespec *)
+int clock_gettime(clockid_t, struct old_timespec32 *)
 	__attribute__((weak, alias("__vdso_clock_gettime")));

-notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+notrace int __vdso_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts)
 {
-	if (likely(tv != NULL)) {
-		struct timespec *ts = (struct timespec *) tv;
-
-		do_hres(CLOCK_REALTIME, ts);
-		tv->tv_usec /= 1000;
-	}
-	if (unlikely(tz != NULL)) {
-		tz->tz_minuteswest = gtod->tz_minuteswest;
-		tz->tz_dsttime = gtod->tz_dsttime;
-	}
-
-	return 0;
+	return __cvdso_clock_gettime(clock, ts);
 }
-int gettimeofday(struct timeval *, struct timezone *)
-	__attribute__((weak, alias("__vdso_gettimeofday")));
+int clock_gettime64(clockid_t, struct __kernel_timespec *)
+	__attribute__((weak, alias("__vdso_clock_gettime64")));

-/*
- * This will break when the xtime seconds get inaccurate, but that is
- * unlikely
- */
-notrace time_t __vdso_time(time_t *t)
+notrace int __vdso_clock_getres(clockid_t clock,
+				struct old_timespec32 *res)
 {
-	/* This is atomic on x86 so we don't need any locks. */
-	time_t result = READ_ONCE(gtod->basetime[CLOCK_REALTIME].sec);
-
-	if (t)
-		*t = result;
-	return result;
+	return __cvdso_clock_getres_time32(clock, res);
 }
-time_t time(time_t *t)
-	__attribute__((weak, alias("__vdso_time")));
+int clock_getres(clockid_t, struct old_timespec32 *)
+	__attribute__((weak, alias("__vdso_clock_getres")));
+#endif
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index d3a2dce4cfa9..36b644e16272 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -25,6 +25,8 @@ VERSION {
 		__vdso_getcpu;
 		time;
 		__vdso_time;
+		clock_getres;
+		__vdso_clock_getres;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vdso32/vdso32.lds.S b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
index 422764a81d32..c7720995ab1a 100644
--- a/arch/x86/entry/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/entry/vdso/vdso32/vdso32.lds.S
@@ -26,6 +26,8 @@ VERSION
 		__vdso_clock_gettime;
 		__vdso_gettimeofday;
 		__vdso_time;
+		__vdso_clock_getres;
+		__vdso_clock_gettime64;
 	};

 	LINUX_2.5 {
diff --git a/arch/x86/entry/vdso/vdsox32.lds.S b/arch/x86/entry/vdso/vdsox32.lds.S
index 05cd1c5c4a15..16a8050a4fb6 100644
--- a/arch/x86/entry/vdso/vdsox32.lds.S
+++ b/arch/x86/entry/vdso/vdsox32.lds.S
@@ -21,6 +21,7 @@ VERSION {
 		__vdso_gettimeofday;
 		__vdso_getcpu;
 		__vdso_time;
+		__vdso_clock_getres;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vsyscall/Makefile b/arch/x86/entry/vsyscall/Makefile
index 1ac4dd116c26..93c1b3e949a7 100644
--- a/arch/x86/entry/vsyscall/Makefile
+++ b/arch/x86/entry/vsyscall/Makefile
@@ -2,7 +2,5 @@
 #
 # Makefile for the x86 low level vsyscall code
 #
-obj-y := vsyscall_gtod.o
-
 obj-$(CONFIG_X86_VSYSCALL_EMULATION) += vsyscall_64.o vsyscall_emu_64.o
diff --git a/arch/x86/entry/vsyscall/vsyscall_gtod.c b/arch/x86/entry/vsyscall/vsyscall_gtod.c
deleted file mode 100644
index cfcdba082feb..000000000000
--- a/arch/x86/entry/vsyscall/vsyscall_gtod.c
+++ /dev/null
@@ -1,83 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (C) 2001 Andrea Arcangeli andrea@suse.de SuSE
- * Copyright 2003 Andi Kleen, SuSE Labs.
- *
- * Modified for x86 32 bit architecture by
- * Stefani Seibold stefani@seibold.net
- * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
- *
- * Thanks to hpa@transmeta.com for some useful hint.
- * Special thanks to Ingo Molnar for his early experience with
- * a different vsyscall implementation for Linux/IA32 and for the name.
- *
- */
-
-#include <linux/timekeeper_internal.h>
-#include <asm/vgtod.h>
-#include <asm/vvar.h>
-
-int vclocks_used __read_mostly;
-
-DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data);
-
-void update_vsyscall_tz(void)
-{
-	vsyscall_gtod_data.tz_minuteswest = sys_tz.tz_minuteswest;
-	vsyscall_gtod_data.tz_dsttime = sys_tz.tz_dsttime;
-}
-
-void update_vsyscall(struct timekeeper *tk)
-{
-	int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
-	struct vsyscall_gtod_data *vdata = &vsyscall_gtod_data;
-	struct vgtod_ts *base;
-	u64 nsec;
-
-	/* Mark the new vclock used. */
-	BUILD_BUG_ON(VCLOCK_MAX >= 32);
-	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
-
-	gtod_write_begin(vdata);
-
-	/* copy vsyscall data */
-	vdata->vclock_mode = vclock_mode;
-	vdata->cycle_last = tk->tkr_mono.cycle_last;
-	vdata->mask = tk->tkr_mono.mask;
-	vdata->mult = tk->tkr_mono.mult;
-	vdata->shift = tk->tkr_mono.shift;
-
-	base = &vdata->basetime[CLOCK_REALTIME];
-	base->sec = tk->xtime_sec;
-	base->nsec = tk->tkr_mono.xtime_nsec;
-
-	base = &vdata->basetime[CLOCK_TAI];
-	base->sec = tk->xtime_sec + (s64)tk->tai_offset;
-	base->nsec = tk->tkr_mono.xtime_nsec;
-
-	base = &vdata->basetime[CLOCK_MONOTONIC];
-	base->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
-	nsec = tk->tkr_mono.xtime_nsec;
-	nsec += ((u64)tk->wall_to_monotonic.tv_nsec << tk->tkr_mono.shift);
-	while (nsec >= (((u64)NSEC_PER_SEC) << tk->tkr_mono.shift)) {
-		nsec -= ((u64)NSEC_PER_SEC) << tk->tkr_mono.shift;
-		base->sec++;
-	}
-	base->nsec = nsec;
-
-	base = &vdata->basetime[CLOCK_REALTIME_COARSE];
-	base->sec = tk->xtime_sec;
-	base->nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
-
-	base = &vdata->basetime[CLOCK_MONOTONIC_COARSE];
-	base->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
-	nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
-	nsec += tk->wall_to_monotonic.tv_nsec;
-	while (nsec >= NSEC_PER_SEC) {
-		nsec -= NSEC_PER_SEC;
-		base->sec++;
-	}
-	base->nsec = nsec;
-
-	gtod_write_end(vdata);
-}
diff --git a/arch/x86/include/asm/mshyperv-tsc.h b/arch/x86/include/asm/mshyperv-tsc.h
new file mode 100644
index 000000000000..99c98ccea0bf
--- /dev/null
+++ b/arch/x86/include/asm/mshyperv-tsc.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MSHYPER_TSCPAGE_H
+#define _ASM_X86_MSHYPER_TSCPAGE_H
+
+#include <asm/hyperv-tlfs.h>
+
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	u64 scale, offset;
+	u32 sequence;
+
+	/*
+	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+	 * Top-Level Functional Specification ver. 3.0 and above. To get the
+	 * reference time we must do the following:
+	 * - READ ReferenceTscSequence
+	 *   A special '0' value indicates the time source is unreliable and we
+	 *   need to use something else. The currently published specification
+	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+	 *   instead of '0' as the special value, see commit c35b82ef0294.
+	 * - ReferenceTime =
+	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+	 * - READ ReferenceTscSequence again. In case its value has changed
+	 *   since our first reading we need to discard ReferenceTime and repeat
+	 *   the whole sequence as the hypervisor was updating the page in
+	 *   between.
+	 */
+	do {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			return U64_MAX;
+		/*
+		 * Make sure we read sequence before we read other values from
+		 * TSC page.
+		 */
+		smp_rmb();
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		*cur_tsc = rdtsc_ordered();
+
+		/*
+		 * Make sure we read sequence after we read all other values
+		 * from TSC page.
+		 */
+		smp_rmb();
+
+	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
+
+	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+}
+
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 cur_tsc;
+
+	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+}
+
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	BUG();
+	return U64_MAX;
+}
+#endif
+#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cc60e617931c..db095a992f3e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
 #include <linux/nmi.h>
 #include <asm/io.h>
 #include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv-tsc.h>
 #include <asm/nospec-branch.h>

 #define VP_INVAL U32_MAX
@@ -387,73 +388,4 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 }
 #endif /* CONFIG_HYPERV */

-#ifdef CONFIG_HYPERV_TSCPAGE
-struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	u64 scale, offset;
-	u32 sequence;
-
-	/*
-	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
-	 * Top-Level Functional Specification ver. 3.0 and above. To get the
-	 * reference time we must do the following:
-	 * - READ ReferenceTscSequence
-	 *   A special '0' value indicates the time source is unreliable and we
-	 *   need to use something else. The currently published specification
-	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
-	 *   instead of '0' as the special value, see commit c35b82ef0294.
-	 * - ReferenceTime =
-	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
-	 * - READ ReferenceTscSequence again. In case its value has changed
-	 *   since our first reading we need to discard ReferenceTime and repeat
-	 *   the whole sequence as the hypervisor was updating the page in
-	 *   between.
-	 */
-	do {
-		sequence = READ_ONCE(tsc_pg->tsc_sequence);
-		if (!sequence)
-			return U64_MAX;
-		/*
-		 * Make sure we read sequence before we read other values from
-		 * TSC page.
-		 */
-		smp_rmb();
-
-		scale = READ_ONCE(tsc_pg->tsc_scale);
-		offset = READ_ONCE(tsc_pg->tsc_offset);
-		*cur_tsc = rdtsc_ordered();
-
-		/*
-		 * Make sure we read sequence after we read all other values
-		 * from TSC page.
-		 */
-		smp_rmb();
-
-	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
-
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
-}
-
-#else
-static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
-{
-	return NULL;
-}
-
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	BUG();
-	return U64_MAX;
-}
-#endif
 #endif
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index b6033680d458..19b695ff2c68 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_X86_PVCLOCK_H
 #define _ASM_X86_PVCLOCK_H

-#include <linux/clocksource.h>
+#include <asm/clocksource.h>
 #include <asm/pvclock-abi.h>
 /* some helper functions for xen and kvm pv clock sources */
diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
new file mode 100644
index 000000000000..45608b1d6ff8
--- /dev/null
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Fast user context implementation of clock_gettime, gettimeofday, and time.
+ *
+ * Copyright (C) 2019 ARM Limited.
+ * Copyright 2006 Andi Kleen, SUSE Labs.
+ * 32 Bit compat layer by Stefani Seibold stefani@seibold.net
+ * sponsored by Rohde & Schwarz GmbH & Co. KG Munich/Germany
+ */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#ifndef __ASSEMBLY__
+
+#include <uapi/linux/time.h>
+#include <asm/vgtod.h>
+#include <asm/vvar.h>
+#include <asm/unistd.h>
+#include <asm/msr.h>
+#include <asm/pvclock.h>
+#include <asm/mshyperv-tsc.h>
+
+#define __vdso_data (VVAR(_vdso_data))
+
+#define VDSO_HAS_TIME 1
+
+#ifdef CONFIG_PARAVIRT_CLOCK
+extern u8 pvclock_page[PAGE_SIZE]
+	__attribute__((visibility("hidden")));
+#endif
+
+#ifdef CONFIG_HYPERV_TSCPAGE
+extern u8 hvclock_page[PAGE_SIZE]
+	__attribute__((visibility("hidden")));
+#endif
+
+#ifndef BUILD_VDSO32
+
+static __always_inline notrace long clock_gettime_fallback(
+						clockid_t _clkid,
+						struct __kernel_timespec *_ts)
+{
+	long ret;
+	asm ("syscall" : "=a" (ret), "=m" (*_ts) :
+	     "0" (__NR_clock_gettime), "D" (_clkid), "S" (_ts) :
+	     "rcx", "r11");
+	return ret;
+}
+
+static __always_inline notrace long gettimeofday_fallback(
+						struct __kernel_old_timeval *_tv,
+						struct timezone *_tz)
+{
+	long ret;
+	asm("syscall" : "=a" (ret) :
+	    "0" (__NR_gettimeofday), "D" (_tv), "S" (_tz) : "memory");
+	return ret;
+}
+
+static __always_inline notrace long clock_getres_fallback(
+						clockid_t _clkid,
+						struct __kernel_timespec *_ts)
+{
+	long ret;
+	asm ("syscall" : "=a" (ret), "=m" (*_ts) :
+	     "0" (__NR_clock_getres), "D" (_clkid), "S" (_ts) :
+	     "rcx", "r11");
+	return ret;
+}
+
+#else
+
+static __always_inline notrace long clock_gettime_fallback(
+						clockid_t _clkid,
+						struct __kernel_timespec *_ts)
+{
+	long ret;
+
+	asm (
+		"mov %%ebx, %%edx \n"
+		"mov %[clock], %%ebx \n"
+		"call __kernel_vsyscall \n"
+		"mov %%edx, %%ebx \n"
+		: "=a" (ret), "=m" (*_ts)
+		: "0" (__NR_clock_gettime64), [clock] "g" (_clkid), "c" (_ts)
+		: "edx");
+	return ret;
+}
+
+static __always_inline notrace long gettimeofday_fallback(
+						struct __kernel_old_timeval *_tv,
+						struct timezone *_tz)
+{
+	long ret;
+	asm(
+		"mov %%ebx, %%edx \n"
+		"mov %2, %%ebx \n"
+		"call __kernel_vsyscall \n"
+		"mov %%edx, %%ebx \n"
+		: "=a" (ret)
+		: "0" (__NR_gettimeofday), "g" (_tv), "c" (_tz)
+		: "memory", "edx");
+	return ret;
+}
+
+static __always_inline notrace long clock_getres_fallback(
+						clockid_t _clkid,
+						struct __kernel_timespec *_ts)
+{
+	long ret;
+
+	asm (
+		"mov %%ebx, %%edx \n"
+		"mov %[clock], %%ebx \n"
+		"call __kernel_vsyscall \n"
+		"mov %%edx, %%ebx \n"
+		: "=a" (ret), "=m" (*_ts)
+		: "0" (__NR_clock_getres_time64), [clock] "g" (_clkid), "c" (_ts)
+		: "edx");
+	return ret;
+}
+
+#endif
+
+#ifdef CONFIG_PARAVIRT_CLOCK
+static notrace const struct pvclock_vsyscall_time_info *get_pvti0(void)
+{
+	return (const struct pvclock_vsyscall_time_info *)&pvclock_page;
+}
+
+static notrace u64 vread_pvclock(void)
+{
+	const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
+	u32 version;
+	u64 ret;
+
+	/*
+	 * Note: The kernel and hypervisor must guarantee that cpu ID
+	 * number maps 1:1 to per-CPU pvclock time info.
+	 *
+	 * Because the hypervisor is entirely unaware of guest userspace
+	 * preemption, it cannot guarantee that per-CPU pvclock time
+	 * info is updated if the underlying CPU changes or that that
+	 * version is increased whenever underlying CPU changes.
+	 *
+	 * On KVM, we are guaranteed that pvti updates for any vCPU are
+	 * atomic as seen by *all* vCPUs. This is an even stronger
+	 * guarantee than we get with a normal seqlock.
+	 *
+	 * On Xen, we don't appear to have that guarantee, but Xen still
+	 * supplies a valid seqlock using the version field.
+	 *
+	 * We only do pvclock vdso timing at all if
+	 * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
+	 * mean that all vCPUs have matching pvti and that the TSC is
+	 * synced, so we can just look at vCPU 0's pvti.
+	 */
+
+	do {
+		version = pvclock_read_begin(pvti);
+
+		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
+			return U64_MAX;
+
+		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+	} while (pvclock_read_retry(pvti, version));
+
+	return ret;
+}
+#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(void)
+{
+	const struct ms_hyperv_tsc_page *tsc_pg =
+		(const struct ms_hyperv_tsc_page *)&hvclock_page;
+
+	return hv_read_tsc_page(tsc_pg);
+}
+#endif
+
+notrace static inline u64 __arch_get_hw_counter(s32 clock_mode)
+{
+	if (clock_mode == VCLOCK_TSC)
+		return (u64)rdtsc_ordered();
+#ifdef CONFIG_PARAVIRT_CLOCK
+	else if (clock_mode == VCLOCK_PVCLOCK)
+		return vread_pvclock();
+#endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+	else if (clock_mode == VCLOCK_HVCLOCK)
+		return vread_hvclock();
+#endif
+	return U64_MAX;
+}
+
+static __always_inline notrace const struct vdso_data *__arch_get_vdso_data(void)
+{
+	return __vdso_data;
+}
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/x86/include/asm/vdso/vsyscall.h b/arch/x86/include/asm/vdso/vsyscall.h
new file mode 100644
index 000000000000..0026ab2123ce
--- /dev/null
+++ b/arch/x86/include/asm/vdso/vsyscall.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_VSYSCALL_H
+#define __ASM_VDSO_VSYSCALL_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/hrtimer.h>
+#include <linux/timekeeper_internal.h>
+#include <vdso/datapage.h>
+#include <asm/vgtod.h>
+#include <asm/vvar.h>
+
+int vclocks_used __read_mostly;
+
+DEFINE_VVAR(struct vdso_data, _vdso_data);
+/*
+ * Update the vDSO data page to keep in sync with kernel timekeeping.
+ */
+static __always_inline
+struct vdso_data *__x86_get_k_vdso_data(void)
+{
+	return _vdso_data;
+}
+#define __arch_get_k_vdso_data __x86_get_k_vdso_data
+
+static __always_inline
+int __x86_get_clock_mode(struct timekeeper *tk)
+{
+	int vclock_mode = tk->tkr_mono.clock->archdata.vclock_mode;
+
+	/* Mark the new vclock used. */
+	BUILD_BUG_ON(VCLOCK_MAX >= 32);
+	WRITE_ONCE(vclocks_used, READ_ONCE(vclocks_used) | (1 << vclock_mode));
+
+	return vclock_mode;
+}
+#define __arch_get_clock_mode __x86_get_clock_mode
+
+/* The asm-generic header needs to be included after the definitions above */
+#include <asm-generic/vdso/vsyscall.h>
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_VSYSCALL_H */
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 913a133f8e6f..a2638c6124ed 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -3,7 +3,9 @@
 #define _ASM_X86_VGTOD_H

 #include <linux/compiler.h>
-#include <linux/clocksource.h>
+#include <asm/clocksource.h>
+#include <vdso/datapage.h>
+#include <vdso/helpers.h>

 #include <uapi/linux/time.h>

@@ -13,81 +15,10 @@ typedef u64 gtod_long_t;
 typedef unsigned long gtod_long_t;
 #endif

-/*
- * There is one of these objects in the vvar page for each
- * vDSO-accelerated clockid. For high-resolution clocks, this encodes
- * the time corresponding to vsyscall_gtod_data.cycle_last. For coarse
- * clocks, this encodes the actual time.
- *
- * To confuse the reader, for high-resolution clocks, nsec is left-shifted
- * by vsyscall_gtod_data.shift.
- */
-struct vgtod_ts {
-	u64 sec;
-	u64 nsec;
-};
-
-#define VGTOD_BASES (CLOCK_TAI + 1)
-#define VGTOD_HRES (BIT(CLOCK_REALTIME) | BIT(CLOCK_MONOTONIC) | BIT(CLOCK_TAI))
-#define VGTOD_COARSE (BIT(CLOCK_REALTIME_COARSE) | BIT(CLOCK_MONOTONIC_COARSE))
-
-/*
- * vsyscall_gtod_data will be accessed by 32 and 64 bit code at the same time
- * so be carefull by modifying this structure.
- */
-struct vsyscall_gtod_data {
-	unsigned int seq;
-
-	int vclock_mode;
-	u64 cycle_last;
-	u64 mask;
-	u32 mult;
-	u32 shift;
-
-	struct vgtod_ts basetime[VGTOD_BASES];
-
-	int tz_minuteswest;
-	int tz_dsttime;
-};
-extern struct vsyscall_gtod_data vsyscall_gtod_data;
-
 extern int vclocks_used;
 static inline bool vclock_was_used(int vclock)
 {
 	return READ_ONCE(vclocks_used) & (1 << vclock);
 }

-static inline unsigned int gtod_read_begin(const struct vsyscall_gtod_data *s)
-{
-	unsigned int ret;
-
-repeat:
-	ret = READ_ONCE(s->seq);
-	if (unlikely(ret & 1)) {
-		cpu_relax();
-		goto repeat;
-	}
-	smp_rmb();
-	return ret;
-}
-
-static inline int gtod_read_retry(const struct vsyscall_gtod_data *s,
-				  unsigned int start)
-{
-	smp_rmb();
-	return unlikely(s->seq != start);
-}
-
-static inline void gtod_write_begin(struct vsyscall_gtod_data *s)
-{
-	++s->seq;
-	smp_wmb();
-}
-
-static inline void gtod_write_end(struct vsyscall_gtod_data *s)
-{
-	smp_wmb();
-	++s->seq;
-}
-
 #endif /* _ASM_X86_VGTOD_H */
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 3f32dfc2ab73..2dbde48f27a9 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -32,19 +32,20 @@ extern char __vvar_page;

 #define DECLARE_VVAR(offset, type, name) \
-	extern type vvar_ ## name __attribute__((visibility("hidden")));
+	extern type vvar_ ## name[CS_BASES] \
+	__attribute__((visibility("hidden")));

 #define VVAR(name) (vvar_ ## name)

 #define DEFINE_VVAR(type, name) \
-	type name \
+	type name[CS_BASES] \
 	__attribute__((section(".vvar_" #name), aligned(16))) __visible

 #endif

 /* DECLARE_VVAR(offset, type, name) */

-DECLARE_VVAR(128, struct vsyscall_gtod_data, vsyscall_gtod_data)
+DECLARE_VVAR(128, struct vdso_data, _vdso_data)

 #undef DECLARE_VVAR

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 0ff3e294d0e5..10125358b9c4 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -3,6 +3,7 @@

 */

+#include <linux/clocksource.h>
 #include <linux/kernel.h>
 #include <linux/percpu.h>
 #include <linux/notifier.h>
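As a reading aid for the do_hres() path removed above, here is a minimal userspace sketch of the cycles-to-nanoseconds arithmetic it performed. The names hres_snapshot, hres_time and hres_convert are hypothetical, the seqcount retry loop is left out, and plain division stands in for __iter_div_u64_rem(); treat it as an illustration under those assumptions, not as the code that lives in lib/vdso.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical snapshot of the fields a reader takes from the data page. */
struct hres_snapshot {
	uint64_t sec;        /* base seconds */
	uint64_t nsec;       /* base nanoseconds, left-shifted by 'shift' */
	uint64_t cycle_last; /* counter value at the last kernel update */
	uint32_t mult;       /* cycles to shifted-nanoseconds multiplier */
	uint32_t shift;      /* right shift applied after the multiply */
};

struct hres_time {
	uint64_t sec;
	uint64_t nsec;
};

/* Convert a raw counter reading into seconds and nanoseconds. */
static struct hres_time hres_convert(const struct hres_snapshot *s,
				     uint64_t cycles)
{
	struct hres_time out;
	uint64_t ns = s->nsec;

	assert(s->shift < 64);

	/* Only add a delta when the counter moved past the last update. */
	if (cycles > s->cycle_last)
		ns += (cycles - s->cycle_last) * s->mult;
	ns >>= s->shift;

	/* Carry whole seconds out of the nanosecond part. */
	out.sec = s->sec + ns / NSEC_PER_SEC;
	out.nsec = ns % NSEC_PER_SEC;
	return out;
}
```

The base nanoseconds are kept left-shifted so that the delta can be added before the single right shift, which mirrors the ordering in the removed loop.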
From: Vincenzo Frascino vincenzo.frascino@arm.com On Thursday, May 30, 2019 7:16 AM
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes:
- Modification of vdso.c to be compliant with the common vdso datapage
- Use of lib/vdso for gettimeofday
Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
diff --git a/arch/x86/include/asm/mshyperv-tsc.h b/arch/x86/include/asm/mshyperv-tsc.h
new file mode 100644
index 000000000000..99c98ccea0bf
--- /dev/null
+++ b/arch/x86/include/asm/mshyperv-tsc.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MSHYPER_TSCPAGE_H
+#define _ASM_X86_MSHYPER_TSCPAGE_H
+
+#include <asm/hyperv-tlfs.h>
+
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	u64 scale, offset;
+	u32 sequence;
+
+	/*
+	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
+	 * Top-Level Functional Specification ver. 3.0 and above. To get the
+	 * reference time we must do the following:
+	 * - READ ReferenceTscSequence
+	 *   A special '0' value indicates the time source is unreliable and we
+	 *   need to use something else. The currently published specification
+	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
+	 *   instead of '0' as the special value, see commit c35b82ef0294.
+	 * - ReferenceTime =
+	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+	 * - READ ReferenceTscSequence again. In case its value has changed
+	 *   since our first reading we need to discard ReferenceTime and repeat
+	 *   the whole sequence as the hypervisor was updating the page in
+	 *   between.
+	 */
+	do {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			return U64_MAX;
+		/*
+		 * Make sure we read sequence before we read other values from
+		 * TSC page.
+		 */
+		smp_rmb();
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		*cur_tsc = rdtsc_ordered();
+
+		/*
+		 * Make sure we read sequence after we read all other values
+		 * from TSC page.
+		 */
+		smp_rmb();
+
+	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
+
+	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+}
+
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 cur_tsc;
+
+	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+}
+
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	BUG();
+	return U64_MAX;
+}
+#endif
+#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cc60e617931c..db095a992f3e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
 #include <linux/nmi.h>
 #include <asm/io.h>
 #include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv-tsc.h>
 #include <asm/nospec-branch.h>

 #define VP_INVAL U32_MAX
@@ -387,73 +388,4 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 }
 #endif /* CONFIG_HYPERV */

-#ifdef CONFIG_HYPERV_TSCPAGE
-struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	u64 scale, offset;
-	u32 sequence;
-
-	/*
-	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
-	 * Top-Level Functional Specification ver. 3.0 and above. To get the
-	 * reference time we must do the following:
-	 * - READ ReferenceTscSequence
-	 *   A special '0' value indicates the time source is unreliable and we
-	 *   need to use something else. The currently published specification
-	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
-	 *   instead of '0' as the special value, see commit c35b82ef0294.
-	 * - ReferenceTime =
-	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
-	 * - READ ReferenceTscSequence again. In case its value has changed
-	 *   since our first reading we need to discard ReferenceTime and repeat
-	 *   the whole sequence as the hypervisor was updating the page in
-	 *   between.
-	 */
-	do {
-		sequence = READ_ONCE(tsc_pg->tsc_sequence);
-		if (!sequence)
-			return U64_MAX;
-		/*
-		 * Make sure we read sequence before we read other values from
-		 * TSC page.
-		 */
-		smp_rmb();
-
-		scale = READ_ONCE(tsc_pg->tsc_scale);
-		offset = READ_ONCE(tsc_pg->tsc_offset);
-		*cur_tsc = rdtsc_ordered();
-
-		/*
-		 * Make sure we read sequence after we read all other values
-		 * from TSC page.
-		 */
-		smp_rmb();
-
-	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
-
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
-}
-
-#else
-static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
-{
-	return NULL;
-}
-
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	BUG();
-	return U64_MAX;
-}
-#endif
 #endif
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in includes/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Michael
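For readers who want the TSC page read protocol from the quoted comment in isolation, the following is a rough userspace sketch of it. The struct tsc_page_sketch and the functions mul_shift64 and read_ref_time are hypothetical names, unsigned __int128 is a GCC/Clang extension assumed to be available, the counter value is passed in rather than read with RDTSC, and the read barriers are only marked in comments. It is not the kernel helper and not the header from either patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields read from the TSC page. */
struct tsc_page_sketch {
	uint32_t sequence; /* 0 means the time source is unreliable */
	uint64_t scale;
	uint64_t offset;
};

/* (a * b) >> 64, relying on the unsigned __int128 compiler extension. */
static uint64_t mul_shift64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/*
 * Returns UINT64_MAX when the page is marked invalid, otherwise
 * ((tsc * scale) >> 64) + offset computed from a consistent snapshot.
 */
static uint64_t read_ref_time(const volatile struct tsc_page_sketch *pg,
			      uint64_t tsc)
{
	uint32_t seq;
	uint64_t scale, offset;

	assert(pg);

	do {
		seq = pg->sequence;
		if (seq == 0)
			return UINT64_MAX;
		/* A real reader needs a read barrier here. */
		scale = pg->scale;
		offset = pg->offset;
		/* ... and another read barrier here before the re-check. */
	} while (pg->sequence != seq);

	return mul_shift64(tsc, scale) + offset;
}
```

A caller would treat UINT64_MAX as the signal to fall back to another clock source, which is how the vDSO code in the patch uses the U64_MAX return value.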
Hi Michael,
On 30/05/2019 16:41, Michael Kelley wrote:
From: Vincenzo Frascino vincenzo.frascino@arm.com On Thursday, May 30, 2019 7:16 AM
The x86 vDSO library requires some adaptations to take advantage of the newly introduced generic vDSO library.
Introduce the following changes:
- Modification of vdso.c to be compliant with the common vdso datapage
- Use of lib/vdso for gettimeofday
Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
diff --git a/arch/x86/include/asm/mshyperv-tsc.h b/arch/x86/include/asm/mshyperv-tsc.h new file mode 100644 index 000000000000..99c98ccea0bf --- /dev/null +++ b/arch/x86/include/asm/mshyperv-tsc.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MSHYPER_TSCPAGE_H +#define _ASM_X86_MSHYPER_TSCPAGE_H
+#include <asm/hyperv-tlfs.h>
+#ifdef CONFIG_HYPERV_TSCPAGE +struct ms_hyperv_tsc_page *hv_get_tsc_page(void); +static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
u64 *cur_tsc)
+{
- u64 scale, offset;
- u32 sequence;
- /*
* The protocol for reading Hyper-V TSC page is specified in Hypervisor
* Top-Level Functional Specification ver. 3.0 and above. To get the
* reference time we must do the following:
* - READ ReferenceTscSequence
* A special '0' value indicates the time source is unreliable and we
* need to use something else. The currently published specification
* versions (up to 4.0b) contain a mistake and wrongly claim '-1'
* instead of '0' as the special value, see commit c35b82ef0294.
+	 * - ReferenceTime =
+	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
+	 * - READ ReferenceTscSequence again. In case its value has changed
+	 *   since our first reading we need to discard ReferenceTime and repeat
+	 *   the whole sequence as the hypervisor was updating the page in
+	 *   between.
+	 */
+	do {
+		sequence = READ_ONCE(tsc_pg->tsc_sequence);
+		if (!sequence)
+			return U64_MAX;
+		/*
+		 * Make sure we read sequence before we read other values from
+		 * TSC page.
+		 */
+		smp_rmb();
+
+		scale = READ_ONCE(tsc_pg->tsc_scale);
+		offset = READ_ONCE(tsc_pg->tsc_offset);
+		*cur_tsc = rdtsc_ordered();
+
+		/*
+		 * Make sure we read sequence after we read all other values
+		 * from TSC page.
+		 */
+		smp_rmb();
+
+	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
+
+	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+}
+
+static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
+{
+	u64 cur_tsc;
+
+	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+}
+
+#else
+static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
+{
+	return NULL;
+}
+
+static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+				       u64 *cur_tsc)
+{
+	BUG();
+	return U64_MAX;
+}
+#endif
+#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index cc60e617931c..db095a992f3e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
 #include <linux/nmi.h>
 #include <asm/io.h>
 #include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv-tsc.h>
 #include <asm/nospec-branch.h>
 
 #define VP_INVAL U32_MAX
@@ -387,73 +388,4 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
 }
 #endif /* CONFIG_HYPERV */
 
-#ifdef CONFIG_HYPERV_TSCPAGE
-struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	u64 scale, offset;
-	u32 sequence;
-
-	/*
-	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
-	 * Top-Level Functional Specification ver. 3.0 and above. To get the
-	 * reference time we must do the following:
-	 * - READ ReferenceTscSequence
-	 *   A special '0' value indicates the time source is unreliable and we
-	 *   need to use something else. The currently published specification
-	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
-	 *   instead of '0' as the special value, see commit c35b82ef0294.
-	 * - ReferenceTime =
-	 *        ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
-	 * - READ ReferenceTscSequence again. In case its value has changed
-	 *   since our first reading we need to discard ReferenceTime and repeat
-	 *   the whole sequence as the hypervisor was updating the page in
-	 *   between.
-	 */
-	do {
-		sequence = READ_ONCE(tsc_pg->tsc_sequence);
-		if (!sequence)
-			return U64_MAX;
-		/*
-		 * Make sure we read sequence before we read other values from
-		 * TSC page.
-		 */
-		smp_rmb();
-
-		scale = READ_ONCE(tsc_pg->tsc_scale);
-		offset = READ_ONCE(tsc_pg->tsc_offset);
-		*cur_tsc = rdtsc_ordered();
-
-		/*
-		 * Make sure we read sequence after we read all other values
-		 * from TSC page.
-		 */
-		smp_rmb();
-
-	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
-
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
-}
-
-#else
-static inline struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
-{
-	return NULL;
-}
-
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
-{
-	BUG();
-	return U64_MAX;
-}
-#endif
 #endif
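For readers who want to try the read protocol from the comment in the hunk above outside the kernel, here is a minimal user-space sketch. It is not the kernel code: `struct tsc_page_sketch` and `read_tsc_page_sketch()` are hypothetical names, C11 atomics stand in for READ_ONCE()/smp_rmb(), the raw counter value is passed in as a parameter instead of being read with rdtsc_ordered() inside the loop, and the 128-bit multiply relies on the GCC/Clang `unsigned __int128` extension.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct ms_hyperv_tsc_page. */
struct tsc_page_sketch {
	_Atomic uint32_t sequence;	/* 0 means "time source unreliable" */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* (a * b) >> 64, the arithmetic performed by mul_u64_u64_shr(a, b, 64). */
static uint64_t mul_u64_u64_shr64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/*
 * Returns the reference time for the raw counter value 'tsc', or
 * UINT64_MAX when the page is marked invalid (sequence == 0).
 */
static uint64_t read_tsc_page_sketch(struct tsc_page_sketch *pg, uint64_t tsc)
{
	uint32_t seq;
	uint64_t scale, offset;

	do {
		/* Read the sequence before any other value from the page. */
		seq = atomic_load_explicit(&pg->sequence, memory_order_acquire);
		if (!seq)
			return UINT64_MAX;

		scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
		offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);

		/* Order the data loads before the sequence re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&pg->sequence, memory_order_relaxed) != seq);

	return mul_u64_u64_shr64(tsc, scale) + offset;
}
```

With a scale of 2^63 (a factor of 0.5 in 64.64 fixed point) and an offset of 100, a counter value of 1000 maps to a reference time of 600.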
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in include/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Thank you for pointing this out, I will rebase my changes on your patches.
Michael
On Thu, 30 May 2019, Michael Kelley wrote:
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in include/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Grrr. That's queued in hyperv-next for whatever reasons.
Sasha, can you please provide me the branch to pull from so I can have a common base for all the various changes floating around?
Thanks,
tglx
On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
On Thu, 30 May 2019, Michael Kelley wrote:
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in include/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Grrr. That's queued in hyperv-next for whatever reasons.
I queue up our future pull requests there to give them some soaking in -next.
Sasha, can you please provide me the branch to pull from so I can have a common base for all the various changes floating around?
I'll send you a unified pull request for these changes.
-- Thanks, Sasha
On Fri, 14 Jun 2019, Sasha Levin wrote:
On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
On Thu, 30 May 2019, Michael Kelley wrote:
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in include/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Grrr. That's queued in hyperv-next for whatever reasons.
I queue up our future pull requests there to give them some soaking in -next.
What? You queue completely unreviewed stuff which touches two other subsystems to let it soak in next?
Sasha, can you please provide me the branch to pull from so I can have a common base for all the various changes floating around?
I'll send you a unified pull request for these changes.
Which has not materialized yet.
TBH, I'm pretty grumpy about those clocksource changes. Here is the diffstat:
 MAINTAINERS                          |   2
 arch/x86/entry/vdso/vclock_gettime.c |   1
 arch/x86/entry/vdso/vma.c            |   2
 arch/x86/hyperv/hv_init.c            |  91 ---------
 arch/x86/include/asm/hyperv-tlfs.h   |   6
 arch/x86/include/asm/mshyperv.h      |  81 +-------
 arch/x86/kernel/cpu/mshyperv.c       |   2
 arch/x86/kvm/x86.c                   |   1
 drivers/clocksource/Makefile         |   1
 drivers/clocksource/hyperv_timer.c   | 322 +++++++++++++++++++++++++++++++++++
 drivers/hv/Kconfig                   |   3
 drivers/hv/hv.c                      | 156 ----------------
 drivers/hv/hv_util.c                 |   1
 drivers/hv/hyperv_vmbus.h            |   3
 drivers/hv/vmbus_drv.c               |  42 ++--
 include/clocksource/hyperv_timer.h   | 105 +++++++++++
While the world and some more people have been CC'ed on those patches, neither the clocksource nor the x86 maintainer have been.
When I gave Vincenzo the advice to base his code on that hyper-v branch, I expected to find the related patches in my mail backlog. No, they have not been there because I was not on CC.
Folks, please stop choosing Cc lists as you like. We have well established rules for that. And please stop queueing random unreviewed patches in next. Next is not a playground for not ready and unreviewed stuff. No, the hyper-v inbred Reviewed-by is not sufficient for anything x86 and clocksource related.
After chasing and looking at those patches, which have horrible subject lines and changelogs btw, I was not able to judge quickly whether that stuff is self contained or not. So no, I fixed up the fallout and rebased Vincenzo's VDSO stuff on mainline w/o those hyperv changes simply because if they are not self contained they will break bisection badly.
I'm going to push out the VDSO series later today. That will nicely break in combination with the hyperv-next branch. Stephen, please drop that and do not try to handle the fallout. That stuff needs to go through the proper channels or at least be acked/reviewed by the relevant maintainers. So the hyper-v folks can rebase themselves and post it properly.
Yours grumpy,
tglx
On Sat, Jun 22, 2019 at 04:46:28PM +0200, Thomas Gleixner wrote:
On Fri, 14 Jun 2019, Sasha Levin wrote:
On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
On Thu, 30 May 2019, Michael Kelley wrote:
Vincenzo -- these changes for Hyper-V are a subset of a larger patch set I have that moves all of the Hyper-V clock/timer code into a separate clocksource driver in drivers/clocksource, with an include file in include/clocksource. That new include file should be able to work instead of your new mshyperv-tsc.h. It also has the benefit of being ISA neutral, so it will work with my in-progress patch set to support Linux on Hyper-V on ARM64. See https://lkml.org/lkml/2019/5/27/231 for the new clocksource driver patch set.
Grrr. That's queued in hyperv-next for whatever reasons.
I queue up our future pull requests there to give them some soaking in -next.
What? You queue completely unreviewed stuff which touches two other subsystems to let it soak in next?
It was out on LKML for 2+ weeks before I pulled it in. As it mostly touches hyperv bits I felt comfortable giving it time in -next (but not actually trying to merge it until it gets a few acks).
Sasha, can you please provide me the branch to pull from so I can have a common base for all the various changes floating around?
I'll send you a unified pull request for these changes.
Which has not materialized yet.
Apologies about this. I ended up with way more travel than I would have liked (writing this from an airport). I've reset our hyperv-next branch to remove these 3 commits until we figure this out.
TBH, I'm pretty grumpy about those clocksource changes. Here is the diffstat:
 MAINTAINERS                          |   2
 arch/x86/entry/vdso/vclock_gettime.c |   1
 arch/x86/entry/vdso/vma.c            |   2
 arch/x86/hyperv/hv_init.c            |  91 ---------
 arch/x86/include/asm/hyperv-tlfs.h   |   6
 arch/x86/include/asm/mshyperv.h      |  81 +-------
 arch/x86/kernel/cpu/mshyperv.c       |   2
 arch/x86/kvm/x86.c                   |   1
 drivers/clocksource/Makefile         |   1
 drivers/clocksource/hyperv_timer.c   | 322 +++++++++++++++++++++++++++++++++++
 drivers/hv/Kconfig                   |   3
 drivers/hv/hv.c                      | 156 ----------------
 drivers/hv/hv_util.c                 |   1
 drivers/hv/hyperv_vmbus.h            |   3
 drivers/hv/vmbus_drv.c               |  42 ++--
 include/clocksource/hyperv_timer.h   | 105 +++++++++++
While the world and some more people have been CC'ed on those patches, neither the clocksource nor the x86 maintainer have been.
When I gave Vincenzo the advice to base his code on that hyper-v branch, I expected to find the related patches in my mail backlog. No, they have not been there because I was not on CC.
Folks, please stop choosing Cc lists as you like. We have well established rules for that. And please stop queueing random unreviewed patches in next. Next is not a playground for not ready and unreviewed stuff. No, the hyper-v inbred Reviewed-by is not sufficient for anything x86 and clocksource related.
I'm sorry for this, you were supposed to be Cc'ed on these patches and I see that you were not.
After chasing and looking at those patches, which have horrible subject lines and changelogs btw, I was not able to judge quickly whether that stuff is self contained or not. So no, I fixed up the fallout and rebased Vincenzo's VDSO stuff on mainline w/o those hyperv changes simply because if they are not self contained they will break bisection badly.
I'm going to push out the VDSO series later today. That will nicely break in combination with the hyperv-next branch. Stephen, please drop that and do not try to handle the fallout. That stuff needs to go through the proper channels or at least be acked/reviewed by the relevant maintainers. So the hyper-v folks can rebase themselves and post it properly.
Okay, thank you. We'll rebase and resend.
-- Thanks, Sasha
Hi Sasha,
On Sun, 23 Jun 2019 15:09:29 -0400 Sasha Levin sashal@kernel.org wrote:
Apologies about this. I ended up with way more travel than I would have liked (writing this from an airport). I've reset our hyperv-next branch to remove these 3 commits until we figure this out.
But not pushed out, yet?
On Mon, Jun 24, 2019 at 07:58:34AM +1000, Stephen Rothwell wrote:
Hi Sasha,
On Sun, 23 Jun 2019 15:09:29 -0400 Sasha Levin sashal@kernel.org wrote:
Apologies about this. I ended up with way more travel than I would have liked (writing this from an airport). I've reset our hyperv-next branch to remove these 3 commits until we figure this out.
But not pushed out, yet?
Pushed now. For some reason the airport wifi was blocking ssh :/
-- Thanks, Sasha
Hi Sasha,
On Sun, 23 Jun 2019 20:24:30 -0400 Sasha Levin sashal@kernel.org wrote:
Pushed now. For some reason the airport wifi was blocking ssh :/
Thanks.
Sasha,
On Sun, 23 Jun 2019, Sasha Levin wrote:
On Sat, Jun 22, 2019 at 04:46:28PM +0200, Thomas Gleixner wrote:
Folks, please stop choosing Cc lists as you like. We have well established rules for that. And please stop queueing random unreviewed patches in next. Next is not a playground for not ready and unreviewed stuff. No, the hyper-v inbred Reviewed-by is not sufficient for anything x86 and clocksource related.
I'm sorry for this, you were supposed to be Cc'ed on these patches and I see that you were not.
All good. I've vented steam and am back to normal pressure :)
After chasing and looking at those patches, which have horrible subject lines and changelogs btw, I was not able to judge quickly whether that stuff is self contained or not. So no, I fixed up the fallout and rebased Vincenzo's VDSO stuff on mainline w/o those hyperv changes simply because if they are not self contained they will break bisection badly.
I'm going to push out the VDSO series later today. That will nicely break
Not yet, but soon :)
in combination with the hyperv-next branch. Stephen, please drop that and do not try to handle the fallout. That stuff needs to go through the proper channels or at least be acked/reviewed by the relevant maintainers. So the hyper-v folks can rebase themselves and post it properly.
Okay, thank you. We'll rebase and resend.
I have no objections if you collect hyper-v stuff, quite the contrary, but changes which touch other subsystems need to be coordinated upfront. That's all I'm asking for.
Btw, that clocksource stuff looks good code wise, just the change logs need some care and after the VDSO stuff hits next we need to sort out the logistics. I hope these changes are completely self contained. If not we'll find a solution.
Thanks,
tglx
From: Thomas Gleixner tglx@linutronix.de Sent: Sunday, June 23, 2019 3:13 PM
I have no objections if you collect hyper-v stuff, quite the contrary, but changes which touch other subsystems need to be coordinated upfront. That's all I'm asking for.
Btw, that clocksource stuff looks good code wise, just the change logs need some care and after the VDSO stuff hits next we need to sort out the logistics. I hope these changes are completely self contained. If not we'll find a solution.
In my view, the only thing that potentially needs a solution is where the Hyper-V clock code used by VDSO ends up in the code tree. I think the right long term place is include/clocksource/hyperv_timer.h. That location is architecture neutral, and the same Hyper-V clock code will be shared by the Hyper-V on ARM64 support that's in process.
Vincenzo's patch set creates a new file arch/x86/include/asm/mshyperv-tsc.h, which I will want to move when creating the separate Hyper-V clocksource driver. If you're OK with that file existing for a release and then going away, that's fine. Alternatively, put the code in include/clocksource/hyperv_timer.h now as part of the VDSO patch set so it's in the right place from the start. My subsequent patch set will add a few additional tweaks to remove x86-isms and fully integrate with the separate Hyper-V clocksource driver.
Michael
On Mon, 24 Jun 2019, Michael Kelley wrote:
From: Thomas Gleixner tglx@linutronix.de Sent: Sunday, June 23, 2019 3:13 PM
I have no objections if you collect hyper-v stuff, quite the contrary, but changes which touch other subsystems need to be coordinated upfront. That's all I'm asking for.
Btw, that clocksource stuff looks good code wise, just the change logs need some care and after the VDSO stuff hits next we need to sort out the logistics. I hope these changes are completely self contained. If not we'll find a solution.
In my view, the only thing that potentially needs a solution is where the Hyper-V clock code used by VDSO ends up in the code tree. I think the right long term place is include/clocksource/hyperv_timer.h. That location is architecture neutral, and the same Hyper-V clock code will be shared by the Hyper-V on ARM64 support that's in process.
Vincenzo's patch set creates a new file arch/x86/include/asm/mshyperv-tsc.h, which I will want to move when creating the separate Hyper-V clocksource driver. If you're OK with that file existing for a release and then going away, that's fine. Alternatively, put the code in include/clocksource/hyperv_timer.h now as part of the VDSO patch set so it's in the right place from the start. My subsequent patch set will add a few additional tweaks to remove x86-isms and fully integrate with the separate Hyper-V clocksource driver.
I don't care whether this goes into 5.3 or later. If you can provide me rebased self contained patches on top of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso
I'm happy to pull them in on top.
Thanks,
tglx
From: Thomas Gleixner tglx@linutronix.de Sent: Sunday, June 23, 2019 5:25 PM
I don't care whether this goes into 5.3 or later. If you can provide me rebased self contained patches on top of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso
I'm happy to pull them in on top.
I've sent out "v4" of the patch set to create a Hyper-V clocksource, based on the above tree. It is confined to Hyper-V code, plus updating a #include statement in two of the VDSO files and in one KVM file. If the KVM file update is problematic, the patch set can just wait until 5.3-rc1.
Michael
The current version of the multiarch vDSO selftest verifies only gettimeofday.
Extend the vDSO selftest to the other library functions:
 - time
 - clock_getres
 - clock_gettime
The extension has been used to verify the unified vdso library on the supported architectures.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 tools/testing/selftests/vDSO/Makefile         |   2 +
 tools/testing/selftests/vDSO/vdso_full_test.c | 261 ++++++++++++++++++
 2 files changed, 263 insertions(+)
 create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c

diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index 9e03d61f52fd..68e9b4a1cdcf 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -5,6 +5,7 @@ uname_M := $(shell uname -m 2>/dev/null || echo not)
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 
 TEST_GEN_PROGS := $(OUTPUT)/vdso_test
+TEST_GEN_PROGS += $(OUTPUT)/vdso_full_test
 ifeq ($(ARCH),x86)
 TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
 endif
@@ -18,6 +19,7 @@ endif
 
 all: $(TEST_GEN_PROGS)
 $(OUTPUT)/vdso_test: parse_vdso.c vdso_test.c
+$(OUTPUT)/vdso_full_test: parse_vdso.c vdso_full_test.c
 $(OUTPUT)/vdso_standalone_test_x86: vdso_standalone_test_x86.c parse_vdso.c
 	$(CC) $(CFLAGS) $(CFLAGS_vdso_standalone_test_x86) \
 		vdso_standalone_test_x86.c parse_vdso.c \
diff --git a/tools/testing/selftests/vDSO/vdso_full_test.c b/tools/testing/selftests/vDSO/vdso_full_test.c
new file mode 100644
index 000000000000..62001d3d241b
--- /dev/null
+++ b/tools/testing/selftests/vDSO/vdso_full_test.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vdso_full_test.c: Sample code to test all the timers.
+ * Copyright (c) 2019 Arm Ltd.
+ *
+ * Compile with:
+ * gcc -std=gnu99 vdso_full_test.c parse_vdso.c
+ *
+ * Tested on ARM, ARM64, MIPS32 and x86 (32-bit and 64-bit).
+ * Might work on other architectures.
+ */
+
+#include <stdint.h>
+#include <elf.h>
+#include <stdio.h>
+#include <time.h>
+#include <sys/auxv.h>
+#include <sys/time.h>
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+
+#include "../kselftest.h"
+
+extern void *vdso_sym(const char *version, const char *name);
+extern void vdso_init_from_sysinfo_ehdr(uintptr_t base);
+extern void vdso_init_from_auxv(void *auxv);
+
+/*
+ * ARM64's vDSO exports its vDSO implementation with different names and
+ * a different version from other architectures, so we need to handle it
+ * as a special case.
+ */
+#if defined(__aarch64__)
+const char *version = "LINUX_2.6.39";
+const char *name[4] = {
+	"__kernel_gettimeofday",
+	"__kernel_clock_gettime",
+	"__kernel_time",
+	"__kernel_clock_getres",
+};
+#else
+/* Tested on x86, arm, mips */
+const char *version = "LINUX_2.6";
+const char *name[4] = {
+	"__vdso_gettimeofday",
+	"__vdso_clock_gettime",
+	"__vdso_time",
+	"__vdso_clock_getres",
+};
+#endif
+
+typedef long (*vdso_gettimeofday_t)(struct timeval *tv, struct timezone *tz);
+typedef long (*vdso_clock_gettime_t)(clockid_t clk_id, struct timespec *ts);
+typedef long (*vdso_clock_getres_t)(clockid_t clk_id, struct timespec *ts);
+typedef time_t (*vdso_time_t)(time_t *t);
+
+static int vdso_test_gettimeofday(void)
+{
+	/* Find gettimeofday. */
+	vdso_gettimeofday_t vdso_gettimeofday =
+		(vdso_gettimeofday_t)vdso_sym(version, name[0]);
+
+	if (!vdso_gettimeofday) {
+		printf("Could not find %s\n", name[0]);
+		return KSFT_SKIP;
+	}
+
+	struct timeval tv;
+	long ret = vdso_gettimeofday(&tv, 0);
+
+	if (ret == 0) {
+		printf("The time is %lld.%06lld\n",
+		       (long long)tv.tv_sec, (long long)tv.tv_usec);
+	} else {
+		printf("%s failed\n", name[0]);
+		return KSFT_FAIL;
+	}
+
+	return KSFT_PASS;
+}
+
+static int vdso_test_clock_gettime(clockid_t clk_id)
+{
+	/* Find clock_gettime. */
+	vdso_clock_gettime_t vdso_clock_gettime =
+		(vdso_clock_gettime_t)vdso_sym(version, name[1]);
+
+	if (!vdso_clock_gettime) {
+		printf("Could not find %s\n", name[1]);
+		return KSFT_SKIP;
+	}
+
+	struct timespec ts;
+	long ret = vdso_clock_gettime(clk_id, &ts);
+
+	if (ret == 0) {
+		printf("The time is %lld.%06lld\n",
+		       (long long)ts.tv_sec, (long long)ts.tv_nsec);
+	} else {
+		printf("%s failed\n", name[1]);
+		return KSFT_FAIL;
+	}
+
+	return KSFT_PASS;
+}
+
+static int vdso_test_time(void)
+{
+	/* Find time. */
+	vdso_time_t vdso_time =
+		(vdso_time_t)vdso_sym(version, name[2]);
+
+	if (!vdso_time) {
+		printf("Could not find %s\n", name[2]);
+		return KSFT_SKIP;
+	}
+
+	long ret = vdso_time(NULL);
+
+	if (ret > 0) {
+		printf("The time in hours since January 1, 1970 is %lld\n",
+		       (long long)(ret / 3600));
+	} else {
+		printf("%s failed\n", name[2]);
+		return KSFT_FAIL;
+	}
+
+	return KSFT_PASS;
+}
+
+static int vdso_test_clock_getres(clockid_t clk_id)
+{
+	/* Find clock_getres. */
+	vdso_clock_getres_t vdso_clock_getres =
+		(vdso_clock_getres_t)vdso_sym(version, name[3]);
+
+	if (!vdso_clock_getres) {
+		printf("Could not find %s\n", name[3]);
+		return KSFT_SKIP;
+	}
+
+	struct timespec ts, sys_ts;
+	long ret = vdso_clock_getres(clk_id, &ts);
+
+	if (ret == 0) {
+		printf("The resolution is %lld %lld\n",
+		       (long long)ts.tv_sec, (long long)ts.tv_nsec);
+	} else {
+		printf("%s failed\n", name[3]);
+		return KSFT_FAIL;
+	}
+
+	ret = syscall(SYS_clock_getres, clk_id, &sys_ts);
+
+	if ((sys_ts.tv_sec != ts.tv_sec) || (sys_ts.tv_nsec != ts.tv_nsec)) {
+		printf("%s failed\n", name[3]);
+		return KSFT_FAIL;
+	}
+
+	return KSFT_PASS;
+}
+
+const char *vdso_clock_name[12] = {
+	"CLOCK_REALTIME",
+	"CLOCK_MONOTONIC",
+	"CLOCK_PROCESS_CPUTIME_ID",
+	"CLOCK_THREAD_CPUTIME_ID",
+	"CLOCK_MONOTONIC_RAW",
+	"CLOCK_REALTIME_COARSE",
+	"CLOCK_MONOTONIC_COARSE",
+	"CLOCK_BOOTTIME",
+	"CLOCK_REALTIME_ALARM",
+	"CLOCK_BOOTTIME_ALARM",
+	"CLOCK_SGI_CYCLE",
+	"CLOCK_TAI",
+};
+
+/*
+ * This function calls vdso_test_clock_gettime and vdso_test_clock_getres
+ * with different values for clock_id.
+ */
+static inline int vdso_test_clock(clockid_t clock_id)
+{
+	int ret0, ret1;
+
+	ret0 = vdso_test_clock_gettime(clock_id);
+	/* A skipped test is considered passed */
+	if (ret0 == KSFT_SKIP)
+		ret0 = KSFT_PASS;
+
+	ret1 = vdso_test_clock_getres(clock_id);
+	/* A skipped test is considered passed */
+	if (ret1 == KSFT_SKIP)
+		ret1 = KSFT_PASS;
+
+	ret0 += ret1;
+
+	printf("clock_id: %s", vdso_clock_name[clock_id]);
+
+	if (ret0 > 0)
+		printf(" [FAIL]\n");
+	else
+		printf(" [PASS]\n");
+
+	return ret0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned long sysinfo_ehdr = getauxval(AT_SYSINFO_EHDR);
+	int ret;
+
+	if (!sysinfo_ehdr) {
+		printf("AT_SYSINFO_EHDR is not present!\n");
+		return KSFT_SKIP;
+	}
+
+	vdso_init_from_sysinfo_ehdr(getauxval(AT_SYSINFO_EHDR));
+
+	ret = vdso_test_gettimeofday();
+
+#if _POSIX_TIMERS > 0
+
+#ifdef CLOCK_REALTIME
+	ret += vdso_test_clock(CLOCK_REALTIME);
+#endif
+
+#ifdef CLOCK_BOOTTIME
+	ret += vdso_test_clock(CLOCK_BOOTTIME);
+#endif
+
+#ifdef CLOCK_TAI
+	ret += vdso_test_clock(CLOCK_TAI);
+#endif
+
+#ifdef CLOCK_REALTIME_COARSE
+	ret += vdso_test_clock(CLOCK_REALTIME_COARSE);
+#endif
+
+#ifdef CLOCK_MONOTONIC
+	ret += vdso_test_clock(CLOCK_MONOTONIC);
+#endif
+
+#ifdef CLOCK_MONOTONIC_RAW
+	ret += vdso_test_clock(CLOCK_MONOTONIC_RAW);
+#endif
+
+#ifdef CLOCK_MONOTONIC_COARSE
+	ret += vdso_test_clock(CLOCK_MONOTONIC_COARSE);
+#endif
+
+#endif
+
+	ret += vdso_test_time();
+
+	if (ret > 0)
+		return KSFT_FAIL;
+
+	return KSFT_PASS;
+}
On Thu, May 30, 2019 at 4:16 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
+/*
+ * ARM64's vDSO exports its vDSO implementation with different names and
+ * a different version from other architectures, so we need to handle it
+ * as a special case.
+ */
+#if defined(__aarch64__)
+const char *version = "LINUX_2.6.39";
+const char *name[4] = {
+	"__kernel_gettimeofday",
+	"__kernel_clock_gettime",
+	"__kernel_time",
+	"__kernel_clock_getres",
+};
+#else
+/* Tested on x86, arm, mips */
+const char *version = "LINUX_2.6";
+const char *name[4] = {
+	"__vdso_gettimeofday",
+	"__vdso_clock_gettime",
+	"__vdso_time",
+	"__vdso_clock_getres",
+};
+#endif
I see the __kernel_* name used on arm64, powerpc and s390, while the __vdso_* name is used on arm, mips, nds32, riscv, sparc, and x86.
Also the versions have more variants:
$ git ls-files arch | grep vdso | xargs grep '(LINUX_[2345]|VDSO_VERSION_STRING)'
arch/arm/vdso/vdso.lds.S: LINUX_2.6 {
arch/arm64/kernel/vdso/vdso.lds.S: LINUX_2.6.39 {
arch/mips/vdso/vdso.lds.S: LINUX_2.6 {
arch/nds32/kernel/vdso/vdso.lds.S: LINUX_4 {
arch/powerpc/include/asm/vdso.h:#define VDSO_VERSION_STRING LINUX_2.6.15
arch/powerpc/kernel/vdso32/vdso32.lds.S: VDSO_VERSION_STRING {
arch/powerpc/kernel/vdso64/vdso64.lds.S: VDSO_VERSION_STRING {
arch/riscv/kernel/vdso/vdso.lds.S: LINUX_4.15 {
arch/s390/include/asm/vdso.h:#define VDSO_VERSION_STRING LINUX_2.6.29
arch/s390/kernel/vdso32/vdso32.lds.S: VDSO_VERSION_STRING {
arch/s390/kernel/vdso64/vdso64.lds.S: VDSO_VERSION_STRING {
arch/sparc/vdso/vdso.lds.S: LINUX_2.6 {
arch/sparc/vdso/vdso32/vdso32.lds.S: LINUX_2.6 {
arch/x86/entry/vdso/vdso.lds.S: LINUX_2.6 {
arch/x86/entry/vdso/vdso32/vdso32.lds.S: LINUX_2.6 {
arch/x86/entry/vdso/vdso32/vdso32.lds.S: LINUX_2.5 {
arch/x86/entry/vdso/vdsox32.lds.S: LINUX_2.6 {
arch/x86/um/vdso/vdso.lds.S: LINUX_2.6 {
Maybe change the test case to just try all combinations of the above (and __vdso_clock_gettime64 as well) and stop checking the architecture?
Arnd
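The "try all combinations" idea above could look roughly like the sketch below. It is only an illustration: `vdso_sym_any()` and `fake_arm64_lookup()` are hypothetical names, the version strings are copied from the grep output above, and the lookup is passed in as a callback with the same shape as vdso_sym() from parse_vdso.c so that the helper can be tried without a real vDSO.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same shape as vdso_sym() from parse_vdso.c. */
typedef void *(*vdso_lookup_t)(const char *version, const char *name);

/* Version strings taken from the grep output in this thread. */
static const char *const vdso_versions[] = {
	"LINUX_2.6", "LINUX_2.6.39", "LINUX_2.6.15", "LINUX_2.6.29",
	"LINUX_4", "LINUX_4.15", "LINUX_2.5",
};

static const char *const vdso_prefixes[] = { "__vdso_", "__kernel_" };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical helper: look up e.g. "clock_gettime" under every known
 * prefix and version, without checking the architecture.
 */
static void *vdso_sym_any(vdso_lookup_t lookup, const char *base)
{
	char name[64];
	size_t p, v;

	for (p = 0; p < ARRAY_LEN(vdso_prefixes); p++) {
		int n = snprintf(name, sizeof(name), "%s%s",
				 vdso_prefixes[p], base);

		if (n < 0 || (size_t)n >= sizeof(name))
			continue;	/* name would be truncated */

		for (v = 0; v < ARRAY_LEN(vdso_versions); v++) {
			void *sym = lookup(vdso_versions[v], name);

			if (sym)
				return sym;
		}
	}

	return NULL;
}

/* Stand-in lookup that mimics a single arm64 export, for demonstration. */
static void *fake_arm64_lookup(const char *version, const char *name)
{
	static int dummy;

	if (!strcmp(version, "LINUX_2.6.39") &&
	    !strcmp(name, "__kernel_clock_gettime"))
		return &dummy;

	return NULL;
}
```

In the selftest the callback would simply be vdso_sym, and "clock_gettime64" could be passed as another base name.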
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls to reduce where possible the costs in terms of cycles. This is possible because certain syscalls like gettimeofday() do not write any data and return one or more values that are stored in the kernel, which makes relatively safe calling them directly as a library function.
Hi Vincenzo,
I'm very happy with how this turned out overall, and as far as I can tell you have addressed all my previous comments. I had another look through the series and only noticed a few very minor issues.
I hope Thomas can have another look soon, he will probably also find a few things, and then it should be ready for inclusion in linux-next and the coming merge window.
One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
We shouldn't need it just for consistency, because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
If anyone comes up with a good reason why it should be added after all, let me know and I'll stop mentioning it.
Arnd
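To make the caching and fallback argument concrete, here is a minimal sketch of what a caller (or a libc) could do. Everything in it is an assumption for illustration: `cached_clock_getres()` is a hypothetical helper, the vDSO entry point is passed in as a possibly-NULL function pointer, the fallback goes through the libc clock_getres(), the cache is not thread-safe, and it relies on the statement above that the results do not change at runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

typedef long (*vdso_clock_getres_t)(clockid_t clk_id, struct timespec *ts);

#define GETRES_CACHE_SLOTS 16

/*
 * Hypothetical helper: return the resolution of clk_id, using the vDSO
 * entry point when one was found and falling back otherwise. Results
 * for small, non-negative clock ids are cached after the first call.
 * Returns 0 on success and -1 on failure.
 */
static int cached_clock_getres(vdso_clock_getres_t vdso_fn, clockid_t clk_id,
			       struct timespec *res)
{
	static struct timespec cache[GETRES_CACHE_SLOTS];
	static int cached[GETRES_CACHE_SLOTS];
	int cacheable = clk_id >= 0 && clk_id < GETRES_CACHE_SLOTS;
	struct timespec ts;

	if (cacheable && cached[clk_id]) {
		*res = cache[clk_id];
		return 0;
	}

	/* Old kernels may not export the symbol, so a fallback is needed. */
	if (!vdso_fn || vdso_fn(clk_id, &ts) != 0) {
		if (clock_getres(clk_id, &ts) != 0)
			return -1;
	}

	if (cacheable) {
		cache[clk_id] = ts;
		cached[clk_id] = 1;
	}

	*res = ts;
	return 0;
}
```

After the first call for a given clock id, later calls return from the cache without entering the kernel or the vDSO at all, which is why the speed of the vDSO variant matters little here.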
Hi Arnd,
thank you for your review.
On 31/05/2019 09:46, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls to reduce where possible the costs in terms of cycles. This is possible because certain syscalls like gettimeofday() do not write any data and return one or more values that are stored in the kernel, which makes relatively safe calling them directly as a library function.
Hi Vincenzo,
I'm very happy with how this turned out overall, and as far as I can tell you have addressed all my previous comments. I had another look through the series and only noticed a few very minor issues.
Thanks! I agree with what you pointed out in the individual patches. I will wait for Thomas to review them as well and then address all the comments in v7.
...
One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
I thought about it and I ended up with what is proposed in this patchset mainly for symmetry across all the architectures, since in the end they use the same common code.
There also seems to be some performance impact, for example:
clock-getres-monotonic: libc(system call): 296 nsec/call
clock-getres-monotonic: libc(vdso): 5 nsec/call
I agree with you though when you say that caching it in the libc is a possibility to overcome the performance impact.
We shouldn't need it just for consistency, because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
A way to address this issue would be to use versioning, which seems to be supported in the vdso library (e.g. arch/x86/entry/vdso/vdso32/vdso32.lds.S).
For example for x86 (vdso32) we would have something like:
VERSION {
	LINUX_5.3 (being optimistic here :) ) {
	global:
		__vdso_clock_getres;
		__vdso_clock_gettime64;
	};
	LINUX_2.6 {
	global:
		__vdso_clock_gettime;
		__vdso_gettimeofday;
		__vdso_time;
	};

	LINUX_2.5 {
	global:
		__kernel_vsyscall;
		__kernel_sigreturn;
		__kernel_rt_sigreturn;
	local: *;
	};
}
What do you think? Would this be a viable solution?
If anyone comes up with a good reason why it should be added after all, let me know and I'll stop mentioning it.
Arnd
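For anyone who wants to reproduce the kind of nsec/call comparison quoted above, a rough micro-benchmark sketch follows. It is not the tool that produced those figures: the function names are hypothetical, the raw syscall path assumes a 64-bit Linux target where struct timespec matches what SYS_clock_getres expects, and whether the libc path actually goes through the vDSO depends on the libc and kernel in use.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost of clock_getres() through libc, in nsec per call. */
static double libc_getres_ns_per_call(clockid_t clk, unsigned long iters)
{
	struct timespec res;
	uint64_t start, end;
	unsigned long i;

	if (!iters)
		return 0.0;

	start = now_ns();
	for (i = 0; i < iters; i++)
		clock_getres(clk, &res);
	end = now_ns();

	return (double)(end - start) / (double)iters;
}

/* Average cost of the raw system call, bypassing libc and the vDSO. */
static double syscall_getres_ns_per_call(clockid_t clk, unsigned long iters)
{
	struct timespec res;
	uint64_t start, end;
	unsigned long i;

	if (!iters)
		return 0.0;

	start = now_ns();
	for (i = 0; i < iters; i++)
		syscall(SYS_clock_getres, clk, &res);
	end = now_ns();

	return (double)(end - start) / (double)iters;
}
```

Calling both helpers with CLOCK_MONOTONIC and a large iteration count gives two numbers that can be compared directly; absolute values vary with the machine and kernel configuration.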
On Tue, Jun 4, 2019 at 2:05 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
On 31/05/2019 09:46, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote: One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
I thought about it and I ended up with what is proposed in this patchset mainly for symmetry across all the architectures, since in the end they use the same common code.
There also seems to be some performance impact, for example:
clock-getres-monotonic: libc(system call): 296 nsec/call
clock-getres-monotonic: libc(vdso): 5 nsec/call
I agree with you though when you say that caching it in the libc is a possibility to overcome the performance impact.
It's clear that the vdso version is much faster, my point was that I could not think of any use case that cared about it being fast.
If there is a good reason for it, I also don't mind adding a clock_getres_time64() vdso version everywhere.
We shouldn't just need it for consistency because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
A way to address this issue would be to use versioning, which seems supported in the vdso library (i.e. arch/x86/entry/vdso/vdso32/vdso32.lds.S).
For example for x86 (vdso32) we would have something like:
VERSION
{
        LINUX_5.3 (being optimistic here :) ) {
        global:
                __vdso_clock_getres;
                __vdso_clock_gettime64;
        };
        LINUX_2.6 {
        global:
                __vdso_clock_gettime;
                __vdso_gettimeofday;
                __vdso_time;
        };

        LINUX_2.5 {
        global:
                __kernel_vsyscall;
                __kernel_sigreturn;
                __kernel_rt_sigreturn;
        local: *;
        };
}
What do you think? Would this be a viable solution?
I actually never understood the point of symbol versioning in the vdso. What does that gain us? Note that there are no conflicting symbol names between the versions, and that nothing enforces the kernel headers to match the symbol version used when linking.
Arnd
On 6/4/19 1:12 PM, Arnd Bergmann wrote:
On Tue, Jun 4, 2019 at 2:05 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
On 31/05/2019 09:46, Arnd Bergmann wrote:
On Thu, May 30, 2019 at 4:15 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote: One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
I thought about it and ended up with what is proposed in this patchset mainly for symmetry across all the architectures, since in the end they use the same common code.
It also seems that there is some performance impact, e.g.:
clock-getres-monotonic: libc(system call): 296 nsec/call
clock-getres-monotonic: libc(vdso): 5 nsec/call
I agree with you though when you say that caching it in the libc is a possibility to overcome the performance impact.
It's clear that the vdso version is much faster, my point was that I could not think of any use case that cared about it being fast.
I do not know of any use case that cares, my point was that since we need to implement it in the generic library for some architectures, for symmetry we can extend it to all the architectures that support the generic vdso library.
If there is a good reason for it, I also don't mind adding a clock_getres_time64() vdso version everywhere.
Totally agree on this.
We shouldn't just need it for consistency because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
A way to address this issue would be to use versioning, which seems supported in the vdso library (i.e. arch/x86/entry/vdso/vdso32/vdso32.lds.S).
For example for x86 (vdso32) we would have something like:
VERSION
{
        LINUX_5.3 (being optimistic here :) ) {
        global:
                __vdso_clock_getres;
                __vdso_clock_gettime64;
        };
        LINUX_2.6 {
        global:
                __vdso_clock_gettime;
                __vdso_gettimeofday;
                __vdso_time;
        };

        LINUX_2.5 {
        global:
                __kernel_vsyscall;
                __kernel_sigreturn;
                __kernel_rt_sigreturn;
        local: *;
        };
}
What do you think? Would this be a viable solution?
I actually never understood the point of symbol versioning in the vdso. What does that gain us? Note that there are no conflicting symbol names between the versions, and that nothing enforces the kernel headers to match the symbol version used when linking.
My understanding, based on [1] and [2], is that the version defines the minimum kernel version from which a specific symbol is exposed, and that whenever this symbol is requested from the vDSO the correct version needs to be specified. Every "new" library that deals with an "old" kernel and is compliant with the exposed ABI should implement the vDSO calls in this way and provide a fallback if the vDSO function is not present (e.g. [3]).
[1] Documentation/ABI/stable/vdso
[2] tools/testing/selftests/vDSO/parse_vdso.c
[3] https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/aarch64...
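To make the "versioned lookup plus fallback" pattern concrete, here is a minimal user-space sketch. It is not how glibc does it internally. It assumes glibc (for `dlvsym()`), the x86-64 vDSO name "linux-vdso.so.1", and the version string "LINUX_2.6"; other architectures use different names and versions. `lookup_vdso_getres()` and `getres_with_fallback()` are hypothetical helpers. Older glibc releases need `-ldl` when linking.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*getres_fn)(clockid_t, struct timespec *);

/*
 * Look up the versioned vDSO symbol. The vDSO is already mapped, so
 * RTLD_NOLOAD only asks for a handle to it. Returns NULL when the
 * vDSO or the symbol at that version is missing (e.g. an old kernel).
 */
static getres_fn lookup_vdso_getres(const char *version)
{
	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_NOLOAD);

	if (!vdso)
		return NULL;
	return (getres_fn)dlvsym(vdso, "__vdso_clock_getres", version);
}

/* Use the vDSO entry point when present, otherwise the system call. */
int getres_with_fallback(clockid_t clk, struct timespec *res)
{
	static getres_fn vdso_getres;
	static int looked_up;

	if (!looked_up) {
		vdso_getres = lookup_vdso_getres("LINUX_2.6"); /* assumed version */
		looked_up = 1;
	}
	/* Note: the raw vDSO call reports errors as negative errno values. */
	if (vdso_getres)
		return vdso_getres(clk, res);
	return (int)syscall(SYS_clock_getres, clk, res);
}
```

The caller never needs to know which kernel it is running on: a missing symbol simply routes the request through the system call.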
Arnd
On Tue, 4 Jun 2019, Vincenzo Frascino wrote:
On 31/05/2019 09:46, Arnd Bergmann wrote:
One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
I thought about it and ended up with what is proposed in this patchset mainly for symmetry across all the architectures, since in the end they use the same common code.
It also seems that there is some performance impact, e.g.:
clock-getres-monotonic: libc(system call): 296 nsec/call
clock-getres-monotonic: libc(vdso): 5 nsec/call
clock_getres() is usually not a hot path operation.
I agree with you though when you say that caching it in the libc is a possibility to overcome the performance impact.
We shouldn't just need it for consistency because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
libc has the fallback already. Let's aim for 1:1 replacement of the architecture code first and then add the extra bits in separate patches.
Thanks,
tglx
On 6/14/19 1:16 PM, Thomas Gleixner wrote:
On Tue, 4 Jun 2019, Vincenzo Frascino wrote:
On 31/05/2019 09:46, Arnd Bergmann wrote:
One open question I touched in my review is whether we want to have a vdso version of clock_getres() in all architectures or not. I'd prefer to leave it out because there is very little advantage to it over the system call (the results don't change at runtime and can easily be cached by libc if performance ever matters), and it takes up a small amount of memory for the implementation.
I thought about it and ended up with what is proposed in this patchset mainly for symmetry across all the architectures, since in the end they use the same common code.
It also seems that there is some performance impact, e.g.:
clock-getres-monotonic: libc(system call): 296 nsec/call
clock-getres-monotonic: libc(vdso): 5 nsec/call
clock_getres() is usually not a hot path operation.
I agree with you though when you say that caching it in the libc is a possibility to overcome the performance impact.
We shouldn't just need it for consistency because all callers would require implementing a fallback to the system call anyway, to deal with old kernels.
libc has the fallback already. Let's aim for 1:1 replacement of the architecture code first and then add the extra bits in separate patches.
Ok, thanks Thomas, I will split the patches accordingly.
Thanks,
tglx
Hi Vincenzo,
On 5/30/19 7:15 AM, Vincenzo Frascino wrote:
vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls, to reduce their cost in cycles where possible. This is possible because certain syscalls like gettimeofday() do not write any data and return one or more values that are stored in the kernel, which makes it relatively safe to call them directly as library functions.
Even if the mechanism is pretty much standard, every architecture in the last few years ended up implementing their own vDSO library in the architectural code.
The purpose of this patch-set is to identify the commonalities in between the architectures and try to consolidate the common code paths, starting with gettimeofday().
This implementation contains the following design choices:
- Every architecture defines the arch specific code in a header in "asm/vdso/".
- The generic implementation includes the arch specific one and lives in "lib/vdso".
- The arch specific code for gettimeofday lives in "<arch path>/vdso/gettimeofday.c" and includes the generic code only.
- The generic implementation of update_vsyscall and update_vsyscall_tz lives in kernel/vdso and provides the bindings that can be implemented by each architecture.
- Each architecture provides its implementation of the bindings in "asm/vdso/vsyscall.h".
- This approach allows consolidating the common code in a single place, with the benefit of avoiding code duplication.
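The split between arch specific and generic code described in the list above can be modelled in user space roughly as follows. This is only an illustration: `vdso_data_sketch`, `arch_read_counter()`, `fake_counter` and `generic_gettime_ns()` are hypothetical names, not the interfaces of the patchset, and the memory barriers a real implementation needs are left out.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the shared vDSO data page. */
struct vdso_data_sketch {
	uint32_t seq;        /* even: stable, odd: update in progress */
	uint64_t cycle_last; /* counter value at the last update */
	uint64_t base_ns;    /* time in ns at the last update */
	uint32_t mult;       /* cycles -> ns multiplier */
	uint32_t shift;      /* cycles -> ns shift */
};

/* "Arch" part: each architecture would supply its own counter read. */
static uint64_t fake_counter; /* stand-in for a hardware counter */

static inline uint64_t arch_read_counter(void)
{
	return fake_counter;
}

/* "Generic" part: shared by everyone, uses only the hook above. */
uint64_t generic_gettime_ns(const volatile struct vdso_data_sketch *vd)
{
	uint32_t seq;
	uint64_t ns;

	do {
		/* Wait until no update is in progress. */
		while ((seq = vd->seq) & 1)
			;
		ns = vd->base_ns +
		     (((arch_read_counter() - vd->cycle_last) * vd->mult) >>
		      vd->shift);
		/* Retry if the data changed while it was being read. */
	} while (seq != vd->seq);

	return ns;
}
```

Only `arch_read_counter()` would differ per architecture in this model; the retry loop and the cycles-to-nanoseconds conversion live in one place.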
This implementation contains the portings to the common library for: arm64, compat mode for arm64, arm, mips, x86_64, x32, compat mode for x86_64 and i386.
The mips porting has been tested on qemu for mips32el. A configuration to repeat the tests can be found at [4].
The x86_64 porting has been tested on an Intel Xeon 5120T based machine running Ubuntu 18.04 and using the Ubuntu provided defconfig.
The i386 porting has been tested on qemu using the i386_defconfig configuration.
Last but not least from this porting arm64, compat arm64, arm and mips gain the support for:
- CLOCK_BOOTTIME that can be useful in certain scenarios since it keeps track of the time during sleep as well.
- CLOCK_TAI that is like CLOCK_REALTIME, but uses the International Atomic Time (TAI) reference instead of UTC to avoid jumping on leap second updates.
for both clock_gettime and clock_getres.
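From user space the two clocks are read through the ordinary clock_gettime() call. The sketch below assumes a glibc recent enough to define CLOCK_BOOTTIME and CLOCK_TAI; `print_clock()` is a hypothetical helper. Whether the call is served by the vDSO or by a system call depends on the kernel and architecture.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

/* Print the current value of a clock; returns 0 on success, -1 on error. */
int print_clock(const char *name, clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0) {
		perror(name);
		return -1;
	}
	printf("%-16s %lld.%09ld\n", name, (long long)ts.tv_sec, ts.tv_nsec);
	return 0;
}
```

Calling `print_clock("CLOCK_BOOTTIME", CLOCK_BOOTTIME)` and `print_clock("CLOCK_TAI", CLOCK_TAI)` shows the time since boot including suspend, and the TAI based wall clock, respectively.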
The porting has been validated using the vdsotest test-suite [1] extended to cover all the clock ids [2].
A new test has been added to the linux kselftest in order to validate the newly added library.
The porting has been benchmarked and the performance results are provided as part of this cover letter.
To simplify the testing, a copy of the patchset on top of a recent linux tree can be found at [3] and [4].
[1] https://github.com/nathanlynch/vdsotest
[2] https://github.com/fvincenzo/vdsotest
[3] git://linux-arm.org/linux-vf.git vdso/v6
[4] git://linux-arm.org/linux-vf.git vdso-mips/v6
Changes:
v6:
- Rebased on 5.2-rc2.
- Added performance numbers.
- Removed vdso_types.h.
- Unified update_vsyscall and update_vsyscall_tz.
- Reworked the kselftest included in this patchset.
- Addressed review comments.
v5:
- Rebased on 5.0-rc7.
- Added x86_64, compat mode for x86_64 and i386 portings.
- Extended vDSO kselftest.
- Addressed review comments.
v4:
- Rebased on 5.0-rc2.
- Addressed review comments.
- Disabled compat vdso on arm64 when the kernel is compiled with clang.
v3:
- Ported the latest fixes and optimizations done on the x86 architecture to the generic library.
- Addressed review comments.
- Improved the documentation of the interfaces.
- Changed the HAVE_ARCH_TIMER config option to a more generic HAVE_HW_COUNTER.
v2:
- Added -ffixed-x18 to arm64
- Replaced occurrences of timeval and timespec
- Modified datapage.h to be compliant with y2038 on all the architectures
- Removed __u_vdso type
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Russell King linux@armlinux.org.uk
Cc: Ralf Baechle ralf@linux-mips.org
Cc: Paul Burton paul.burton@mips.com
Cc: Daniel Lezcano daniel.lezcano@linaro.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Mark Salyzyn salyzyn@android.com
Cc: Peter Collingbourne pcc@google.com
Cc: Shuah Khan shuah@kernel.org
Cc: Dmitry Safonov 0x7f454c46@gmail.com
Cc: Rasmus Villemoes linux@rasmusvillemoes.dk
Cc: Huw Davies huw@codeweavers.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
Performance Numbers: Linux 5.2.0-rc2 - Xeon Gold 5120T
Unified vDSO:
clock-gettime-monotonic: syscall: 342 nsec/call
clock-gettime-monotonic: libc: 25 nsec/call
clock-gettime-monotonic: vdso: 24 nsec/call
clock-getres-monotonic: syscall: 296 nsec/call
clock-getres-monotonic: libc: 296 nsec/call
clock-getres-monotonic: vdso: 3 nsec/call
clock-gettime-monotonic-coarse: syscall: 294 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 5 nsec/call
clock-getres-monotonic-coarse: syscall: 295 nsec/call
clock-getres-monotonic-coarse: libc: 292 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 343 nsec/call
clock-gettime-monotonic-raw: libc: 25 nsec/call
clock-gettime-monotonic-raw: vdso: 23 nsec/call
clock-getres-monotonic-raw: syscall: 290 nsec/call
clock-getres-monotonic-raw: libc: 290 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 332 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 288 nsec/call
clock-getres-tai: libc: 288 nsec/call
clock-getres-tai: vdso: 3 nsec/call
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 24 nsec/call
clock-gettime-boottime: vdso: 23 nsec/call
clock-getres-boottime: syscall: 284 nsec/call
clock-getres-boottime: libc: 291 nsec/call
clock-getres-boottime: vdso: 3 nsec/call
clock-gettime-realtime: syscall: 337 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 23 nsec/call
clock-getres-realtime: syscall: 287 nsec/call
clock-getres-realtime: libc: 284 nsec/call
clock-getres-realtime: vdso: 3 nsec/call
clock-gettime-realtime-coarse: syscall: 307 nsec/call
clock-gettime-realtime-coarse: libc: 4 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 294 nsec/call
clock-getres-realtime-coarse: libc: 291 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 246 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 26 nsec/call
gettimeofday: vdso: 25 nsec/call
Stock Kernel:
clock-gettime-monotonic: syscall: 338 nsec/call
clock-gettime-monotonic: libc: 24 nsec/call
clock-gettime-monotonic: vdso: 23 nsec/call
clock-getres-monotonic: syscall: 291 nsec/call
clock-getres-monotonic: libc: 304 nsec/call
clock-getres-monotonic: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-coarse: syscall: 297 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 4 nsec/call
clock-getres-monotonic-coarse: syscall: 281 nsec/call
clock-getres-monotonic-coarse: libc: 286 nsec/call
clock-getres-monotonic-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-raw: syscall: 336 nsec/call
clock-gettime-monotonic-raw: libc: 340 nsec/call
clock-gettime-monotonic-raw: vdso: 346 nsec/call
clock-getres-monotonic-raw: syscall: 297 nsec/call
clock-getres-monotonic-raw: libc: 301 nsec/call
clock-getres-monotonic-raw: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-tai: syscall: 351 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 298 nsec/call
clock-getres-tai: libc: 290 nsec/call
clock-getres-tai: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 347 nsec/call
clock-gettime-boottime: vdso: 355 nsec/call
clock-getres-boottime: syscall: 296 nsec/call
clock-getres-boottime: libc: 295 nsec/call
clock-getres-boottime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime: syscall: 346 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 22 nsec/call
clock-getres-realtime: syscall: 295 nsec/call
clock-getres-realtime: libc: 291 nsec/call
clock-getres-realtime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime-coarse: syscall: 292 nsec/call
clock-gettime-realtime-coarse: libc: 5 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 300 nsec/call
clock-getres-realtime-coarse: libc: 301 nsec/call
clock-getres-realtime-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
getcpu: syscall: 252 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 24 nsec/call
gettimeofday: vdso: 25 nsec/call
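For readers who want to reproduce numbers of this kind without vdsotest, the "syscall" and "libc" rows can be approximated with a loop like the one below. This is a rough sketch, not the vdsotest implementation: `now_ns()`, `bench_libc()` and `bench_syscall()` are hypothetical helpers, and the result includes loop overhead.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average nsec/call through libc, which normally dispatches to the vDSO. */
uint64_t bench_libc(clockid_t clk, unsigned long iters)
{
	struct timespec ts;
	uint64_t start;
	unsigned long i;

	if (iters == 0)
		return 0;
	start = now_ns();
	for (i = 0; i < iters; i++)
		clock_gettime(clk, &ts);
	return (now_ns() - start) / iters;
}

/* Average nsec/call when forcing a real system call. */
uint64_t bench_syscall(clockid_t clk, unsigned long iters)
{
	struct timespec ts;
	uint64_t start;
	unsigned long i;

	if (iters == 0)
		return 0;
	start = now_ns();
	for (i = 0; i < iters; i++)
		syscall(SYS_clock_gettime, clk, &ts);
	return (now_ns() - start) / iters;
}
```

Comparing `bench_libc(CLOCK_BOOTTIME, n)` with `bench_syscall(CLOCK_BOOTTIME, n)` on a given kernel indicates whether that clock id is served by the vDSO or falls back to the system call.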
Peter Collingbourne (1):
  arm64: Build vDSO with -ffixed-x18

Vincenzo Frascino (18):
  kernel: Standardize vdso_datapage
  kernel: Define gettimeofday vdso common code
  kernel: Unify update_vsyscall implementation
  arm64: Substitute gettimeofday with C implementation
  arm64: compat: Add missing syscall numbers
  arm64: compat: Expose signal related structures
  arm64: compat: Generate asm offsets for signals
  lib: vdso: Add compat support
  arm64: compat: Add vDSO
  arm64: Refactor vDSO code
  arm64: compat: vDSO setup for compat layer
  arm64: elf: vDSO code page discovery
  arm64: compat: Get sigreturn trampolines from vDSO
  arm64: Add vDSO compat support
  arm: Add support for generic vDSO
  mips: Add support for generic vDSO
  x86: Add support for generic vDSO
  kselftest: Extend vDSO selftest
 arch/arm/Kconfig | 3 +
 arch/arm/include/asm/vdso/gettimeofday.h | 96 +++++
 arch/arm/include/asm/vdso/vsyscall.h | 71 ++++
 arch/arm/include/asm/vdso_datapage.h | 29 +-
 arch/arm/kernel/vdso.c | 87 +----
 arch/arm/vdso/Makefile | 13 +-
 arch/arm/vdso/note.c | 15 +
 arch/arm/vdso/vdso.lds.S | 2 +
 arch/arm/vdso/vgettimeofday.c | 268 +------------
 arch/arm64/Kconfig | 3 +
 arch/arm64/Makefile | 23 +-
 arch/arm64/include/asm/elf.h | 14 +
 arch/arm64/include/asm/signal32.h | 46 +++
 arch/arm64/include/asm/unistd.h | 5 +
 arch/arm64/include/asm/vdso.h | 3 +
 arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++
 .../include/asm/vdso/compat_gettimeofday.h | 108 ++++++
 arch/arm64/include/asm/vdso/gettimeofday.h | 84 +++++
 arch/arm64/include/asm/vdso/vsyscall.h | 53 +++
 arch/arm64/include/asm/vdso_datapage.h | 48 ---
 arch/arm64/kernel/Makefile | 6 +-
 arch/arm64/kernel/asm-offsets.c | 39 +-
 arch/arm64/kernel/signal32.c | 72 ++--
 arch/arm64/kernel/vdso.c | 356 ++++++++++++------
 arch/arm64/kernel/vdso/Makefile | 34 +-
 arch/arm64/kernel/vdso/gettimeofday.S | 334 ----------------
 arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++
 arch/arm64/kernel/vdso32/.gitignore | 2 +
 arch/arm64/kernel/vdso32/Makefile | 184 +++++++++
 arch/arm64/kernel/vdso32/note.c | 15 +
 arch/arm64/kernel/vdso32/sigreturn.S | 62 +++
 arch/arm64/kernel/vdso32/vdso.S | 19 +
 arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++
 arch/arm64/kernel/vdso32/vgettimeofday.c | 59 +++
 arch/mips/Kconfig | 2 +
 arch/mips/include/asm/vdso.h | 78 +---
 arch/mips/include/asm/vdso/gettimeofday.h | 175 +++++++++
 arch/mips/{ => include/asm}/vdso/vdso.h | 6 +-
 arch/mips/include/asm/vdso/vsyscall.h | 43 +++
 arch/mips/kernel/vdso.c | 37 +-
 arch/mips/vdso/Makefile | 25 +-
 arch/mips/vdso/elf.S | 2 +-
 arch/mips/vdso/gettimeofday.c | 273 --------------
 arch/mips/vdso/sigreturn.S | 2 +-
 arch/mips/vdso/vdso.lds.S | 4 +
 arch/mips/vdso/vgettimeofday.c | 57 +++
 arch/x86/Kconfig | 3 +
 arch/x86/entry/vdso/Makefile | 9 +
 arch/x86/entry/vdso/vclock_gettime.c | 251 +++---------
 arch/x86/entry/vdso/vdso.lds.S | 2 +
 arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
 arch/x86/entry/vdso/vdsox32.lds.S | 1 +
 arch/x86/entry/vsyscall/Makefile | 2 -
 arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 ----
 arch/x86/include/asm/mshyperv-tsc.h | 76 ++++
 arch/x86/include/asm/mshyperv.h | 70 +---
 arch/x86/include/asm/pvclock.h | 2 +-
 arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++
 arch/x86/include/asm/vdso/vsyscall.h | 44 +++
 arch/x86/include/asm/vgtod.h | 75 +---
 arch/x86/include/asm/vvar.h | 7 +-
 arch/x86/kernel/pvclock.c | 1 +
 include/asm-generic/vdso/vsyscall.h | 56 +++
 include/linux/hrtimer.h | 15 +-
 include/linux/hrtimer_defs.h | 25 ++
 include/linux/timekeeper_internal.h | 9 +
 include/vdso/datapage.h | 91 +++++
 include/vdso/helpers.h | 56 +++
 include/vdso/vsyscall.h | 11 +
 kernel/Makefile | 1 +
 kernel/vdso/Makefile | 2 +
 kernel/vdso/vsyscall.c | 139 +++++++
 lib/Kconfig | 5 +
 lib/vdso/Kconfig | 36 ++
 lib/vdso/Makefile | 22 ++
 lib/vdso/gettimeofday.c | 229 +++++++++++
 tools/testing/selftests/vDSO/Makefile | 2 +
 tools/testing/selftests/vDSO/vdso_full_test.c | 261 +++++++++++++
 78 files changed, 3042 insertions(+), 1767 deletions(-)
 create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/arm/include/asm/vdso/vsyscall.h
 create mode 100644 arch/arm/vdso/note.c
 create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h
 create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h
 create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h
 delete mode 100644 arch/arm64/include/asm/vdso_datapage.h
 delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
 create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
 create mode 100644 arch/arm64/kernel/vdso32/.gitignore
 create mode 100644 arch/arm64/kernel/vdso32/Makefile
 create mode 100644 arch/arm64/kernel/vdso32/note.c
 create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S
 create mode 100644 arch/arm64/kernel/vdso32/vdso.S
 create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S
 create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h
 rename arch/mips/{ => include/asm}/vdso/vdso.h (90%)
 create mode 100644 arch/mips/include/asm/vdso/vsyscall.h
 delete mode 100644 arch/mips/vdso/gettimeofday.c
 create mode 100644 arch/mips/vdso/vgettimeofday.c
 delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
 create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
 create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
 create mode 100644 include/asm-generic/vdso/vsyscall.h
 create mode 100644 include/linux/hrtimer_defs.h
 create mode 100644 include/vdso/datapage.h
 create mode 100644 include/vdso/helpers.h
 create mode 100644 include/vdso/vsyscall.h
 create mode 100644 kernel/vdso/Makefile
 create mode 100644 kernel/vdso/vsyscall.c
 create mode 100644 lib/vdso/Kconfig
 create mode 100644 lib/vdso/Makefile
 create mode 100644 lib/vdso/gettimeofday.c
 create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c
Ran vdsotest-bench on ThunderX2 (arm64) with and without unified patchset. The numbers are as below:
Unified vDSO:
-------------
clock-gettime-monotonic: syscall: 346 nsec/call
clock-gettime-monotonic: libc: 38 nsec/call
clock-gettime-monotonic: vdso: 36 nsec/call
clock-getres-monotonic: syscall: 262 nsec/call
clock-getres-monotonic: libc: 6 nsec/call
clock-getres-monotonic: vdso: 5 nsec/call
clock-gettime-monotonic-coarse: syscall: 296 nsec/call
clock-gettime-monotonic-coarse: libc: 39 nsec/call
clock-gettime-monotonic-coarse: vdso: 38 nsec/call
clock-getres-monotonic-coarse: syscall: 260 nsec/call
clock-getres-monotonic-coarse: libc: 8 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 345 nsec/call
clock-gettime-monotonic-raw: libc: 35 nsec/call
clock-gettime-monotonic-raw: vdso: 34 nsec/call
clock-getres-monotonic-raw: syscall: 261 nsec/call
clock-getres-monotonic-raw: libc: 7 nsec/call
clock-getres-monotonic-raw: vdso: 5 nsec/call
clock-gettime-tai: syscall: 357 nsec/call
clock-gettime-tai: libc: 38 nsec/call
clock-gettime-tai: vdso: 36 nsec/call
clock-getres-tai: syscall: 257 nsec/call
clock-getres-tai: libc: 7 nsec/call
clock-getres-tai: vdso: 5 nsec/call
clock-gettime-boottime: syscall: 356 nsec/call
clock-gettime-boottime: libc: 38 nsec/call
clock-gettime-boottime: vdso: 36 nsec/call
clock-getres-boottime: syscall: 257 nsec/call
clock-getres-boottime: libc: 6 nsec/call
clock-getres-boottime: vdso: 5 nsec/call
clock-gettime-realtime: syscall: 345 nsec/call
clock-gettime-realtime: libc: 38 nsec/call
clock-gettime-realtime: vdso: 36 nsec/call
clock-getres-realtime: syscall: 257 nsec/call
clock-getres-realtime: libc: 7 nsec/call
clock-getres-realtime: vdso: 5 nsec/call
clock-gettime-realtime-coarse: syscall: 295 nsec/call
clock-gettime-realtime-coarse: libc: 39 nsec/call
clock-gettime-realtime-coarse: vdso: 38 nsec/call
clock-getres-realtime-coarse: syscall: 260 nsec/call
clock-getres-realtime-coarse: libc: 8 nsec/call
clock-getres-realtime-coarse: vdso: 5 nsec/call
getcpu: syscall: 244 nsec/call
getcpu: libc: 247 nsec/call
getcpu: vdso: not tested
Note: vDSO version of getcpu not found
gettimeofday: syscall: 383 nsec/call
gettimeofday: libc: 39 nsec/call
gettimeofday: vdso: 35 nsec/call
Stock Kernel:
-------------
clock-gettime-monotonic: syscall: 344 nsec/call
clock-gettime-monotonic: libc: 74 nsec/call
clock-gettime-monotonic: vdso: 73 nsec/call
clock-getres-monotonic: syscall: 258 nsec/call
clock-getres-monotonic: libc: 6 nsec/call
clock-getres-monotonic: vdso: 4 nsec/call
clock-gettime-monotonic-coarse: syscall: 300 nsec/call
clock-gettime-monotonic-coarse: libc: 36 nsec/call
clock-gettime-monotonic-coarse: vdso: 34 nsec/call
clock-getres-monotonic-coarse: syscall: 261 nsec/call
clock-getres-monotonic-coarse: libc: 6 nsec/call
clock-getres-monotonic-coarse: vdso: 4 nsec/call
clock-gettime-monotonic-raw: syscall: 346 nsec/call
clock-gettime-monotonic-raw: libc: 74 nsec/call
clock-gettime-monotonic-raw: vdso: 72 nsec/call
clock-getres-monotonic-raw: syscall: 254 nsec/call
clock-getres-monotonic-raw: libc: 6 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 345 nsec/call
clock-gettime-tai: libc: 361 nsec/call
clock-gettime-tai: vdso: 359 nsec/call
clock-getres-tai: syscall: 259 nsec/call
clock-getres-tai: libc: 262 nsec/call
clock-getres-tai: vdso: 258 nsec/call
clock-gettime-boottime: syscall: 353 nsec/call
clock-gettime-boottime: libc: 365 nsec/call
clock-gettime-boottime: vdso: 362 nsec/call
clock-getres-boottime: syscall: 260 nsec/call
clock-getres-boottime: libc: 267 nsec/call
clock-getres-boottime: vdso: 259 nsec/call
clock-gettime-realtime: syscall: 344 nsec/call
clock-gettime-realtime: libc: 73 nsec/call
clock-gettime-realtime: vdso: 72 nsec/call
clock-getres-realtime: syscall: 255 nsec/call
clock-getres-realtime: libc: 7 nsec/call
clock-getres-realtime: vdso: 4 nsec/call
clock-gettime-realtime-coarse: syscall: 296 nsec/call
clock-gettime-realtime-coarse: libc: 35 nsec/call
clock-gettime-realtime-coarse: vdso: 33 nsec/call
clock-getres-realtime-coarse: syscall: 258 nsec/call
clock-getres-realtime-coarse: libc: 6 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 237 nsec/call
getcpu: libc: 242 nsec/call
getcpu: vdso: not tested
Note: vDSO version of getcpu not found
gettimeofday: syscall: 378 nsec/call
gettimeofday: libc: 73 nsec/call
gettimeofday: vdso: 70 nsec/call
Observed good improvement for some APIs with the patch.
Tested-by: Shijith Thotton sthotton@marvell.com
Thanks, Shijith
Hi Shijith,
...
Observed good improvement for some APIs with the patch.
Looks good. Thanks for testing the set, I will add your tag to my patches.
Tested-by: Shijith Thotton sthotton@marvell.com
Thanks, Shijith
On Thu, 30 May 2019 15:15:12 +0100 Vincenzo Frascino vincenzo.frascino@arm.com wrote:
Hi,
vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls to reduce where possible the costs in terms of cycles. [ ... ] The porting has been benchmarked and the performance results are provided as part of this cover letter.
I can't reveal the absolute numbers here, but vdsotest-bench gives me quite some performance gain on my board here ("time needed on v6" divided by "time needed on 5.2-rc1", so smaller percentages are better):
clock-gettime-monotonic: 23 %
clock-gettime-monotonic-raw: 30 %
clock-gettime-tai: 5 %
clock-getres-tai: 5 %
clock-gettime-boottime: 5 %
clock-getres-boottime: 5 %
clock-gettime-realtime: 25 %
gettimeofday: 26 %
The other numbers stayed the same or differed by just 1 ns, which seems to be within the margin of error, as repeated runs on the same kernel suggest. The 5% numbers are of course those where we went from a syscall-only to the newly added arm64 VDSO implementation, but even the other calls improved by a factor of 3 or more.
Sounds like a strong indicator that this is a good thing to have.
Not sure if "running some benchmark a couple of times on a single machine" qualifies for this, but I guess it means:
Tested-by: Andre Przywara andre.przywara@arm.com
Cheers, Andre.
On 20/06/2019 17:27, Andre Przywara wrote:
On Thu, 30 May 2019 15:15:12 +0100 Vincenzo Frascino vincenzo.frascino@arm.com wrote:
Hi,
vDSO (virtual dynamic shared object) is a mechanism that the Linux kernel provides as an alternative to system calls to reduce where possible the costs in terms of cycles. [ ... ] The porting has been benchmarked and the performance results are provided as part of this cover letter.
I can't reveal the absolute numbers here, but vdsotest-bench gives me quite some performance gain on my board here ("time needed on v6" divided by "time needed on 5.2-rc1", so smaller percentages are better):
clock-gettime-monotonic: 23 %
clock-gettime-monotonic-raw: 30 %
clock-gettime-tai: 5 %
clock-getres-tai: 5 %
clock-gettime-boottime: 5 %
clock-getres-boottime: 5 %
clock-gettime-realtime: 25 %
gettimeofday: 26 %
The other numbers stayed the same or differed by just 1 ns, which seems to be within the margin of error, as repeated runs on the same kernel suggest. The 5% numbers are of course those where we went from a syscall-only to the newly added arm64 VDSO implementation, but even the other calls improved by a factor of 3 or more.
Sounds like a strong indicator that this is a good thing to have.
Not sure if "running some benchmark a couple of times on a single machine" qualifies for this, but I guess it means:
Tested-by: Andre Przywara andre.przywara@arm.com
Thanks Andre, it sounds great! I will add your tag as well to my patches.
Cheers, Andre.