This patch series is to refine the memory barriers for AUX ring buffer.
Patches 01 ~ 04 to address the barriers usage in the kernel. The first patch is to make clear comment for how to use the barriers between the data store and aux_head store, this asks the driver to make sure the data is visible. Patches 02 ~ 04 is to refine the drivers for barriers after the data store.
Patch 05 is to use WRITE_ONCE() for updating aux_tail.
Patches 06 ~ 09 is to drop the legacy __sync functions, and polish for duplicate code and cleanup the build and feature test after SYNC_COMPARE_AND_SWAP is not used.
Patch 10 introduces a new global variable to indicate the kernel runs in 64-bit mode which can be used to confirm if in compat mode; patch 11 introduces variant functions for accessing AUX head/tail, it resolves the aotmicity for reading head pointer, and returns error for the tail is bigger than 4GB.
Have testes the patches on Arm64 Juno platform.
Changes from v3: - Removed the inapprocate paragraph in the commit log for patch "perf auxtrace: Drop legacy __sync functions" (Adrian); - Added new patch to remove feature-sync-compare-and-swap test (Adrian); - Th patch for "perf auxtrace: Use WRITE_ONCE() for updating aux_tail", is a standlone and simple change, so moved it ahead in the patch set for better ordering; - Minor improvement for commit logs in the last two patches.
Changes from v2: - Removed auxtrace_mmap__read_snapshot_head(), which has the duplicated code with auxtrace_mmap__read_head(); - Cleanuped the build for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT (Adrian); - Added global variable "kernel_is_64_bit" (Adrian); - Added compat variants compat_auxtrace_mmap__{read_head|write_tail} (Adrian).
Leo Yan (11): perf/ring_buffer: Add comment for barriers on AUX ring buffer coresight: tmc-etr: Add barrier after updating AUX ring buffer coresight: tmc-etf: Add comment for store ordering perf/x86: Add barrier after updating bts perf auxtrace: Use WRITE_ONCE() for updating aux_tail perf auxtrace: Drop legacy __sync functions perf auxtrace: Remove auxtrace_mmap__read_snapshot_head() perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT tools: Remove feature-sync-compare-and-swap feature detection perf env: Set flag for kernel is 64-bit mode perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
arch/x86/events/intel/bts.c | 3 + .../hwtracing/coresight/coresight-tmc-etf.c | 6 + .../hwtracing/coresight/coresight-tmc-etr.c | 8 ++ kernel/events/ring_buffer.c | 9 ++ tools/build/Makefile.feature | 1 - tools/build/feature/Makefile | 4 - tools/build/feature/test-all.c | 4 - .../feature/test-sync-compare-and-swap.c | 15 --- tools/perf/Makefile.config | 4 - tools/perf/util/auxtrace.c | 19 ++- tools/perf/util/auxtrace.h | 109 ++++++++++++++---- tools/perf/util/env.c | 17 ++- tools/perf/util/env.h | 1 + 13 files changed, 136 insertions(+), 64 deletions(-) delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c
AUX ring buffer applies almost the same barriers as perf ring buffer, but there has an exception for ordering between writing the AUX trace data and updating user_page::aux_head.
This patch adds comment for how to use the barriers on AUX ring buffer, and gives comment to ask the drivers to flush the trace data into AUX ring buffer prior to updating user_page::aux_head.
Signed-off-by: Leo Yan leo.yan@linaro.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org --- kernel/events/ring_buffer.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 52868716ec35..5cf6579be05e 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -509,6 +509,15 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size) perf_event_aux_event(handle->event, aux_head, size, handle->aux_flags);
+ /* + * See perf_output_put_handle(), AUX ring buffer applies the same + * barrier pairing as the perf ring buffer; except for B, since + * AUX ring buffer is written by hardware trace, we cannot simply + * use the generic memory barrier (like smp_wmb()) prior to update + * user_page::aux_head, the hardware trace driver takes the + * responsibility to ensure the trace data has been flushed into + * the AUX buffer before calling perf_aux_output_end(). + */ WRITE_ONCE(rb->user_page->aux_head, rb->aux_head); if (rb_need_aux_wakeup(rb)) wakeup = true;
Since a memory barrier is required between AUX trace data store and aux_head store, and the AUX trace data is filled with memcpy(), it's sufficient to use smp_wmb() so can ensure the trace data is visible prior to updating aux_head.
Signed-off-by: Leo Yan leo.yan@linaro.org --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index acdb59e0e661..713205db15a1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev, */ if (etr_perf->snapshot) handle->head += size; + + /* + * It requires the ordering between the AUX trace data and aux_head + * store, below smp_wmb() ensures the AUX trace data is visible prior + * to updating aux_head. + */ + smp_wmb(); + out: /* * Don't set the TRUNCATED flag in snapshot mode because 1) the
On 11/07/2021 11:40, Leo Yan wrote:
Since a memory barrier is required between AUX trace data store and aux_head store, and the AUX trace data is filled with memcpy(), it's sufficient to use smp_wmb() so can ensure the trace data is visible prior to updating aux_head.
Signed-off-by: Leo Yan leo.yan@linaro.org
drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index acdb59e0e661..713205db15a1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev, */ if (etr_perf->snapshot) handle->head += size;
- /*
* It requires the ordering between the AUX trace data and aux_head
* store, below smp_wmb() ensures the AUX trace data is visible prior
* to updating aux_head.
*/
Please could we reword this a bit, something like :
/* * Ensure that the AUX trace data is visible before the aux_head * is updated via perf_aux_output_end(), as expected by the * perf ring buffer. */
- smp_wmb();
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
Hi Suzuki,
On Mon, Jul 12, 2021 at 11:40:12AM +0100, Suzuki Kuruppassery Poulose wrote:
On 11/07/2021 11:40, Leo Yan wrote:
Since a memory barrier is required between AUX trace data store and aux_head store, and the AUX trace data is filled with memcpy(), it's sufficient to use smp_wmb() so can ensure the trace data is visible prior to updating aux_head.
Signed-off-by: Leo Yan leo.yan@linaro.org
drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index acdb59e0e661..713205db15a1 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1563,6 +1563,14 @@ tmc_update_etr_buffer(struct coresight_device *csdev, */ if (etr_perf->snapshot) handle->head += size;
- /*
* It requires the ordering between the AUX trace data and aux_head
* store, below smp_wmb() ensures the AUX trace data is visible prior
* to updating aux_head.
*/
Please could we reword this a bit, something like :
/* * Ensure that the AUX trace data is visible before the aux_head * is updated via perf_aux_output_end(), as expected by the * perf ring buffer. */
Will refine with this in next spin. Thanks for review!
- smp_wmb();
Reviewed-by: Suzuki K Poulose suzuki.poulose@arm.com
AUX ring buffer is required to separate the data store and aux_head store, since the function CS_LOCK() has contained memory barrier mb(), mb() is a more conservative barrier than smp_wmb() on Arm32/Arm64, thus it's needless to add any explicit barrier anymore.
Add comment to make clear for the barrier usage for ETF.
Signed-off-by: Leo Yan leo.yan@linaro.org --- drivers/hwtracing/coresight/coresight-tmc-etf.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 45b85edfc690..9a42ee689921 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -553,6 +553,12 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev, if (buf->snapshot) handle->head += to_read;
+ /* + * AUX ring buffer requires to use memory barrier to separate the trace + * data store and aux_head store, because CS_LOCK() contains mb() which + * gives more heavy barrier than smp_wmb(), it's not necessary to + * explicitly invoke any barrier. + */ CS_LOCK(drvdata->base); out: spin_unlock_irqrestore(&drvdata->spinlock, flags);
Add barrier wmb() to separate the AUX data store and aux_head store.
Signed-off-by: Leo Yan leo.yan@linaro.org --- arch/x86/events/intel/bts.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c index 6320d2cfd9d3..4a015d160bc5 100644 --- a/arch/x86/events/intel/bts.c +++ b/arch/x86/events/intel/bts.c @@ -209,6 +209,9 @@ static void bts_update(struct bts_ctx *bts) } else { local_set(&buf->data_size, head); } + + /* The WMB separates data store and aux_head store matches. */ + wmb(); }
static int
Use WRITE_ONCE() for updating aux_tail, so can avoid unexpected memory behaviour.
Signed-off-by: Leo Yan leo.yan@linaro.org Acked-by: Adrian Hunter adrian.hunter@intel.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org --- tools/perf/util/auxtrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index cc1c1b9cec9c..79227b8864cd 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -480,7 +480,7 @@ static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail) /* Ensure all reads are done before we write the tail out */ smp_mb(); #if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT) - pc->aux_tail = tail; + WRITE_ONCE(pc->aux_tail, tail); #else do { old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
The main purpose for using __sync built-in functions is to support compat mode for 32-bit perf with 64-bit kernel. But using these built-in functions might cause potential issues.
__sync functions originally support Intel Itanium processoer [1] but it cannot promise to support all 32-bit archs. Now these functions have become the legacy functions.
Considering __sync functions cannot really fix the 64-bit value atomicity on 32-bit archs, thus this patch drops __sync functions.
Credits to Peter for detailed analysis.
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005...
Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/auxtrace.h | 19 ------------------- 1 file changed, 19 deletions(-)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index 79227b8864cd..4f9176368134 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -440,12 +440,6 @@ struct auxtrace_cache;
#ifdef HAVE_AUXTRACE_SUPPORT
-/* - * In snapshot mode the mmapped page is read-only which makes using - * __sync_val_compare_and_swap() problematic. However, snapshot mode expects - * the buffer is not updated while the snapshot is made (e.g. Intel PT disables - * the event) so there is not a race anyway. - */ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm) { struct perf_event_mmap_page *pc = mm->userpg; @@ -459,11 +453,7 @@ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm) static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm) { struct perf_event_mmap_page *pc = mm->userpg; -#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT) u64 head = READ_ONCE(pc->aux_head); -#else - u64 head = __sync_val_compare_and_swap(&pc->aux_head, 0, 0); -#endif
/* Ensure all reads are done after we read the head */ smp_rmb(); @@ -473,19 +463,10 @@ static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm) static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail) { struct perf_event_mmap_page *pc = mm->userpg; -#if BITS_PER_LONG != 64 && defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT) - u64 old_tail; -#endif
/* Ensure all reads are done before we write the tail out */ smp_mb(); -#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT) WRITE_ONCE(pc->aux_tail, tail); -#else - do { - old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0); - } while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail)); -#endif }
int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
Since the function auxtrace_mmap__read_snapshot_head() is exactly same with auxtrace_mmap__read_head(), whether the session is in snapshot mode or not, it's unified to use function auxtrace_mmap__read_head() for reading AUX buffer head.
And the function auxtrace_mmap__read_snapshot_head() is unused so this patch removes it.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/auxtrace.c | 5 ++--- tools/perf/util/auxtrace.h | 10 ---------- 2 files changed, 2 insertions(+), 13 deletions(-)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index cb19669d2a5b..7958e17229ea 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -1686,13 +1686,12 @@ static int __auxtrace_mmap__read(struct mmap *map, union perf_event ev; void *data1, *data2;
+ head = auxtrace_mmap__read_head(mm); + if (snapshot) { - head = auxtrace_mmap__read_snapshot_head(mm); if (auxtrace_record__find_snapshot(itr, mm->idx, mm, data, &head, &old)) return -1; - } else { - head = auxtrace_mmap__read_head(mm); }
if (old == head) diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index 4f9176368134..d68a5e80b217 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -440,16 +440,6 @@ struct auxtrace_cache;
#ifdef HAVE_AUXTRACE_SUPPORT
-static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm) -{ - struct perf_event_mmap_page *pc = mm->userpg; - u64 head = READ_ONCE(pc->aux_head); - - /* Ensure all reads are done after we read the head */ - smp_rmb(); - return head; -} - static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm) { struct perf_event_mmap_page *pc = mm->userpg;
Since the __sync functions have been dropped, This patch removes unused build and checking for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT in perf tool.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/Makefile.config | 4 ---- tools/perf/util/auxtrace.c | 5 ----- 2 files changed, 9 deletions(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index eb8e487ef90b..4a0d9a6defc7 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -349,10 +349,6 @@ CXXFLAGS += $(INC_FLAGS)
LIBPERF_CFLAGS := $(CORE_CFLAGS) $(EXTRA_CFLAGS)
-ifeq ($(feature-sync-compare-and-swap), 1) - CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT -endif - ifeq ($(feature-pthread-attr-setaffinity-np), 1) CFLAGS += -DHAVE_PTHREAD_ATTR_SETAFFINITY_NP endif diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 7958e17229ea..6a63be8b2430 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -130,11 +130,6 @@ int auxtrace_mmap__mmap(struct auxtrace_mmap *mm, return 0; }
-#if BITS_PER_LONG != 64 && !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT) - pr_err("Cannot use AUX area tracing mmaps\n"); - return -1; -#endif - pc->aux_offset = mp->offset; pc->aux_size = mp->len;
Since the __sync functions have been removed from perf, it's needless for perf tool to test the feature sync-compare-and-swap.
The feature test is not used by any other components, remove it.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/build/Makefile.feature | 1 - tools/build/feature/Makefile | 4 ---- tools/build/feature/test-all.c | 4 ---- tools/build/feature/test-sync-compare-and-swap.c | 15 --------------- 4 files changed, 24 deletions(-) delete mode 100644 tools/build/feature/test-sync-compare-and-swap.c
diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index 04a8e3db8a54..3dd2f68366f9 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -34,7 +34,6 @@ FEATURE_TESTS_BASIC := \ dwarf_getlocations \ eventfd \ fortify-source \ - sync-compare-and-swap \ get_current_dir_name \ gettid \ glibc \ diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index ec203e28407f..eff55d287db1 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -9,7 +9,6 @@ FILES= \ test-dwarf_getlocations.bin \ test-eventfd.bin \ test-fortify-source.bin \ - test-sync-compare-and-swap.bin \ test-get_current_dir_name.bin \ test-glibc.bin \ test-gtk2.bin \ @@ -260,9 +259,6 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin: $(OUTPUT)test-libbabeltrace.bin: $(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)
-$(OUTPUT)test-sync-compare-and-swap.bin: - $(BUILD) - $(OUTPUT)test-compile-32.bin: $(CC) -m32 -o $@ test-compile.c
diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c index 464873883396..920439527291 100644 --- a/tools/build/feature/test-all.c +++ b/tools/build/feature/test-all.c @@ -106,10 +106,6 @@ # include "test-libdw-dwarf-unwind.c" #undef main
-#define main main_test_sync_compare_and_swap -# include "test-sync-compare-and-swap.c" -#undef main - #define main main_test_zlib # include "test-zlib.c" #undef main diff --git a/tools/build/feature/test-sync-compare-and-swap.c b/tools/build/feature/test-sync-compare-and-swap.c deleted file mode 100644 index 3bc6b0768a53..000000000000 --- a/tools/build/feature/test-sync-compare-and-swap.c +++ /dev/null @@ -1,15 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include <stdint.h> - -volatile uint64_t x; - -int main(int argc, char *argv[]) -{ - uint64_t old, new = argc; - - (void)argv; - do { - old = __sync_val_compare_and_swap(&x, 0, 0); - } while (!__sync_bool_compare_and_swap(&x, old, new)); - return old == new; -}
It's useful to know that the kernel is running in 32-bit or 64-bit mode. E.g. We can decide if perf tool is running in compat mode from this info.
This patch adds a global variable "kernel_is_64_bit", it's initialized when a session setups environment, its value is decided by checking the architecture string.
Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/env.c | 17 ++++++++++++++++- tools/perf/util/env.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index ebc5e9ad35db..345635a2e842 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -11,6 +11,7 @@ #include <stdlib.h> #include <string.h>
+int kernel_is_64_bit; struct perf_env perf_env;
#ifdef HAVE_LIBBPF_SUPPORT @@ -172,6 +173,19 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused) } #endif // HAVE_LIBBPF_SUPPORT
+static void perf_env__init_kernel_mode(struct perf_env *env) +{ + const char *arch = perf_env__raw_arch(env); + + if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) || + !strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) || + !strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) || + !strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7)) + kernel_is_64_bit = 1; + else + kernel_is_64_bit = 0; +} + void perf_env__exit(struct perf_env *env) { int i; @@ -217,13 +231,14 @@ void perf_env__exit(struct perf_env *env) zfree(&env->hybrid_cpc_nodes); }
-void perf_env__init(struct perf_env *env __maybe_unused) +void perf_env__init(struct perf_env *env) { #ifdef HAVE_LIBBPF_SUPPORT env->bpf_progs.infos = RB_ROOT; env->bpf_progs.btfs = RB_ROOT; init_rwsem(&env->bpf_progs.lock); #endif + perf_env__init_kernel_mode(env); }
int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]) diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 6824a7423a2d..cc989ff49740 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -139,6 +139,7 @@ enum perf_compress_type { struct bpf_prog_info_node; struct btf_node;
+extern int kernel_is_64_bit; extern struct perf_env perf_env;
void perf_env__exit(struct perf_env *env);
Em Sun, Jul 11, 2021 at 06:41:04PM +0800, Leo Yan escreveu:
It's useful to know that the kernel is running in 32-bit or 64-bit mode. E.g. We can decide if perf tool is running in compat mode from this info.
This patch adds a global variable "kernel_is_64_bit", it's initialized when a session setups environment, its value is decided by checking the architecture string.
Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/util/env.c | 17 ++++++++++++++++- tools/perf/util/env.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index ebc5e9ad35db..345635a2e842 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -11,6 +11,7 @@ #include <stdlib.h> #include <string.h> +int kernel_is_64_bit; struct perf_env perf_env;
Why can't this be in 'struct perf_env'?
- Arnaldo
#ifdef HAVE_LIBBPF_SUPPORT @@ -172,6 +173,19 @@ static void perf_env__purge_bpf(struct perf_env *env __maybe_unused) } #endif // HAVE_LIBBPF_SUPPORT +static void perf_env__init_kernel_mode(struct perf_env *env) +{
- const char *arch = perf_env__raw_arch(env);
- if (!strncmp(arch, "x86_64", 6) || !strncmp(arch, "aarch64", 7) ||
!strncmp(arch, "arm64", 5) || !strncmp(arch, "mips64", 6) ||
!strncmp(arch, "parisc64", 8) || !strncmp(arch, "riscv64", 7) ||
!strncmp(arch, "s390x", 5) || !strncmp(arch, "sparc64", 7))
kernel_is_64_bit = 1;
- else
kernel_is_64_bit = 0;
+}
void perf_env__exit(struct perf_env *env) { int i; @@ -217,13 +231,14 @@ void perf_env__exit(struct perf_env *env) zfree(&env->hybrid_cpc_nodes); } -void perf_env__init(struct perf_env *env __maybe_unused) +void perf_env__init(struct perf_env *env) { #ifdef HAVE_LIBBPF_SUPPORT env->bpf_progs.infos = RB_ROOT; env->bpf_progs.btfs = RB_ROOT; init_rwsem(&env->bpf_progs.lock); #endif
- perf_env__init_kernel_mode(env);
} int perf_env__set_cmdline(struct perf_env *env, int argc, const char *argv[]) diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 6824a7423a2d..cc989ff49740 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -139,6 +139,7 @@ enum perf_compress_type { struct bpf_prog_info_node; struct btf_node; +extern int kernel_is_64_bit; extern struct perf_env perf_env; void perf_env__exit(struct perf_env *env); -- 2.25.1
Hi Arnaldo, Adrian,
On Mon, Jul 12, 2021 at 03:14:35PM -0300, Arnaldo Carvalho de Melo wrote:
Em Sun, Jul 11, 2021 at 06:41:04PM +0800, Leo Yan escreveu:
It's useful to know that the kernel is running in 32-bit or 64-bit mode. E.g. We can decide if perf tool is running in compat mode from this info.
This patch adds a global variable "kernel_is_64_bit", it's initialized when a session setups environment, its value is decided by checking the architecture string.
Signed-off-by: Leo Yan leo.yan@linaro.org
tools/perf/util/env.c | 17 ++++++++++++++++- tools/perf/util/env.h | 1 + 2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index ebc5e9ad35db..345635a2e842 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -11,6 +11,7 @@ #include <stdlib.h> #include <string.h> +int kernel_is_64_bit; struct perf_env perf_env;
Why can't this be in 'struct perf_env'?
Good question. I considered to add it in struct perf_env but finally I used this way; the reason is this variable "kernel_is_64_bit" is only used during recording phase for AUX ring buffer, and don't use it for report. So seems to me it's over complexity to add a new field and just wander if it's necessary to save this field as new feature in the perf header.
Combining the comment from Adrian in another email, I think it's good to add a new field "compat_mode" in the struct perf_env, and this field will be initialized in build-record.c. Currently we don't need to save this value into the perf file, if later we need to use this value for decoding phase, then we can add a new feature item to save "compat_mode" into the perf file's header.
If you have any different idea, please let me know. Thanks!
Leo
When perf runs in compat mode (kernel in 64-bit mode and the perf is in 32-bit mode), the 64-bit value atomicity in the user space cannot be assured, E.g. on some architectures, the 64-bit value accessing is split into two instructions, one is for the low 32-bit word accessing and another is for the high 32-bit word.
This patch introduces two functions compat_auxtrace_mmap__read_head() and compat_auxtrace_mmap__write_tail(), as their naming indicates, when perf tool works in compat mode, it uses these two functions to access the AUX head and tail. These two functions can allow the perf tool to work properly in certain conditions, e.g. when perf tool works in snapshot mode with only using AUX head pointer, or perf tool uses the AUX buffer and the incremented tail is not bigger than 4GB.
When perf tool cannot handle the case when the AUX tail is bigger than 4GB, the function compat_auxtrace_mmap__write_tail() returns -1 and tells the caller to bail out for the error.
Suggested-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Leo Yan leo.yan@linaro.org --- tools/perf/util/auxtrace.c | 9 ++-- tools/perf/util/auxtrace.h | 94 +++++++++++++++++++++++++++++++++++++- 2 files changed, 98 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 6a63be8b2430..d6fc250fbf97 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -1766,10 +1766,13 @@ static int __auxtrace_mmap__read(struct mmap *map, mm->prev = head;
if (!snapshot) { - auxtrace_mmap__write_tail(mm, head); - if (itr->read_finish) { - int err; + int err;
+ err = auxtrace_mmap__write_tail(mm, head); + if (err < 0) + return err; + + if (itr->read_finish) { err = itr->read_finish(itr, mm->idx); if (err < 0) return err; diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index d68a5e80b217..66de7b6e65ec 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -18,6 +18,8 @@ #include <asm/bitsperlong.h> #include <asm/barrier.h>
+#include "env.h" + union perf_event; struct perf_session; struct evlist; @@ -440,23 +442,111 @@ struct auxtrace_cache;
#ifdef HAVE_AUXTRACE_SUPPORT
+/* + * In the compat mode kernel runs in 64-bit and perf tool runs in 32-bit mode, + * 32-bit perf tool cannot access 64-bit value atomically, which might lead to + * the issues caused by the below sequence on multiple CPUs: when perf tool + * accesses either the load operation or the store operation for 64-bit value, + * on some architectures the operation is divided into two instructions, one + * is for accessing the low 32-bit value and another is for the high 32-bit; + * thus these two user operations can give the kernel chances to access the + * 64-bit value, and thus leads to the unexpected load values. + * + * kernel (64-bit) user (32-bit) + * + * if (LOAD ->aux_tail) { --, LOAD ->aux_head_lo + * STORE $aux_data | ,---> + * FLUSH $aux_data | | LOAD ->aux_head_hi + * STORE ->aux_head --|-------` smp_rmb() + * } | LOAD $data + * | smp_mb() + * | STORE ->aux_tail_lo + * `-----------> + * STORE ->aux_tail_hi + * + * For this reason, it's impossible for the perf tool to work correctly when + * the AUX head or tail is bigger than 4GB (more than 32 bits length); and we + * can not simply limit the AUX ring buffer to less than 4GB, the reason is + * the pointers can be increased monotonically (e.g in snapshot mode), whatever + * the buffer size it is, at the end the head and tail can be bigger than 4GB + * and carry out to the high 32-bit. + * + * To mitigate the issues and improve the user experience, we can allow the + * perf tool working in certain conditions and bail out with error if detect + * any overflow cannot be handled. + * + * For reading the AUX head, it reads out the values for three times, and + * compares the high 4 bytes of the values between the first time and the last + * time, if there has no change for high 4 bytes injected by the kernel during + * the user reading sequence, it's safe for use the second value. + * + * When update the AUX tail and detects any carrying in the high 32 bits, it + * means there have two store operations in user space and it cannot promise + * the atomicity for 64-bit write, so return '-1' in this case to tell the + * caller an overflow error has happened. + */ +static inline u64 compat_auxtrace_mmap__read_head(struct auxtrace_mmap *mm) +{ + struct perf_event_mmap_page *pc = mm->userpg; + u64 first, second, last; + u64 mask = (u64)(UINT32_MAX) << 32; + + do { + first = READ_ONCE(pc->aux_head); + /* Ensure all reads are done after we read the head */ + smp_rmb(); + second = READ_ONCE(pc->aux_head); + /* Ensure all reads are done after we read the head */ + smp_rmb(); + last = READ_ONCE(pc->aux_head); + } while ((first & mask) != (last & mask)); + + return second; +} + +static inline int compat_auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, + u64 tail) +{ + struct perf_event_mmap_page *pc = mm->userpg; + u64 mask = (u64)(UINT32_MAX) << 32; + + if (tail & mask) + return -1; + + /* Ensure all reads are done before we write the tail out */ + smp_mb(); + WRITE_ONCE(pc->aux_tail, tail); + return 0; +} + static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm) { struct perf_event_mmap_page *pc = mm->userpg; - u64 head = READ_ONCE(pc->aux_head); + u64 head; + +#if BITS_PER_LONG == 32 + if (kernel_is_64_bit) + return compat_auxtrace_mmap__read_head(mm); +#endif + head = READ_ONCE(pc->aux_head);
/* Ensure all reads are done after we read the head */ smp_rmb(); return head; }
-static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail) +static inline int auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail) { struct perf_event_mmap_page *pc = mm->userpg;
+#if BITS_PER_LONG == 32 + if (kernel_is_64_bit) + return compat_auxtrace_mmap__write_tail(mm, tail); +#endif /* Ensure all reads are done before we write the tail out */ smp_mb(); WRITE_ONCE(pc->aux_tail, tail); + return 0; }
int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,