- Robustify the user backtrace code, as done on other architectures. - Provide the symbols resolution when triggering from tracepoints.
Big thanks to Steve Capper for the help in debugging and rephrasing the commits descriptions.
Stress tested with perf record and tracepoints triggering (-e <tracepoint>), with unwinding using fp (--call-graph fp) and dwarf info (--call-graph dwarf).
Jean Pihet (3): ARM: perf: Check that current->mm is alive before getting user callchain ARM: perf: disable the pagefault handler when reading from user space ARM: perf: allow tracing with kernel tracepoints events
arch/arm/include/asm/perf_event.h | 19 +++++++++++++++++++ arch/arm/kernel/perf_event.c | 13 +++++++++++-- 2 files changed, 30 insertions(+), 2 deletions(-)
An event may occur when an mm is already released.
As per commit 20afc60f892d285fde179ead4b24e6a7938c2f1b 'x86, perf: Check that current->mm is alive before getting user callchain'
Signed-off-by: Jean Pihet jean.pihet@linaro.org Cc: Will Deacon will.deacon@arm.com Acked-by: Will Deacon will.deacon@arm.com --- arch/arm/kernel/perf_event.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c index 4238bcb..6493c4c 100644 --- a/arch/arm/kernel/perf_event.c +++ b/arch/arm/kernel/perf_event.c @@ -590,6 +590,10 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) }
perf_callchain_store(entry, regs->ARM_pc); + + if (!current->mm) + return; + tail = (struct frame_tail __user *)regs->ARM_fp - 1;
while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
Under perf, the fp unwinding scheme requires access to user space memory and can provoke a pagefault via call to __copy_from_user_inatomic from user_backtrace. This unwinding can take place in response to an interrupt (__perf_event_overflow). This is undesirable as we may already have mmap_sem held for write. One example being a process that calls mprotect just as a the PMU counters overflow.
An example that can provoke this behaviour: perf record -e event:tocapture --call-graph fp ./application_to_test
This patch addresses this issue by disabling pagefaults briefly in user_backtrace (as is done in the other architectures: ARM64, x86, Sparc etc.).
Without the patch a deadlock occurs when __perf_event_overflow is called while reading the data from the user space:
[ INFO: possible recursive locking detected ] 3.16.0-rc2-00038-g0ed7ff6 #46 Not tainted --------------------------------------------- stress/1634 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<c001dc04>] do_page_fault+0xa8/0x428
but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<c00f4098>] SyS_mprotect+0xa8/0x1c8
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&mm->mmap_sem); lock(&mm->mmap_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by stress/1634: #0: (&mm->mmap_sem){++++++}, at: [<c00f4098>] SyS_mprotect+0xa8/0x1c8 #1: (rcu_read_lock){......}, at: [<c00c29dc>] __perf_event_overflow+0x120/0x294
stack backtrace: CPU: 1 PID: 1634 Comm: stress Not tainted 3.16.0-rc2-00038-g0ed7ff6 #46 [<c0017c8c>] (unwind_backtrace) from [<c0012eec>] (show_stack+0x20/0x24) [<c0012eec>] (show_stack) from [<c04de914>] (dump_stack+0x7c/0x98) [<c04de914>] (dump_stack) from [<c006a360>] (__lock_acquire+0x1484/0x1cf0) [<c006a360>] (__lock_acquire) from [<c006b14c>] (lock_acquire+0xa4/0x11c) [<c006b14c>] (lock_acquire) from [<c04e3880>] (down_read+0x40/0x7c) [<c04e3880>] (down_read) from [<c001dc04>] (do_page_fault+0xa8/0x428) [<c001dc04>] (do_page_fault) from [<c00084ec>] (do_DataAbort+0x44/0xa8) [<c00084ec>] (do_DataAbort) from [<c0013a1c>] (__dabt_svc+0x3c/0x60) Exception stack(0xed7c5ae0 to 0xed7c5b28) 5ae0: ed7c5b5c b6dadff4 ffffffec 00000000 b6dadff4 ebc08000 00000000 ebc08000 5b00: 0000007e 00000000 ed7c4000 ed7c5b94 00000014 ed7c5b2c c001a438 c0236c60 5b20: 00000013 ffffffff [<c0013a1c>] (__dabt_svc) from [<c0236c60>] (__copy_from_user+0xa4/0x3a4)
Signed-off-by: Jean Pihet jean.pihet@linaro.org Cc: Will Deacon will.deacon@arm.com Acked-by: Steve Capper steve.capper@linaro.org --- arch/arm/kernel/perf_event.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c index 6493c4c..f5aeca2 100644 --- a/arch/arm/kernel/perf_event.c +++ b/arch/arm/kernel/perf_event.c @@ -560,11 +560,16 @@ user_backtrace(struct frame_tail __user *tail, struct perf_callchain_entry *entry) { struct frame_tail buftail; + unsigned long err;
- /* Also check accessibility of one struct frame_tail beyond */ if (!access_ok(VERIFY_READ, tail, sizeof(buftail))) return NULL; - if (__copy_from_user_inatomic(&buftail, tail, sizeof(buftail))) + + pagefault_disable(); + err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail)); + pagefault_enable(); + + if (err) return NULL;
perf_callchain_store(entry, buftail.lr);
When tracing with tracepoints events the IP and CPSR are set to 0, preventing the perf code to resolve the symbols:
./perf record -e kmem:kmalloc cal [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.007 MB perf.data (~321 samples) ]
./perf report Overhead Command Shared Object Symbol ........ ....... ............. ........... 40.78% cal [unknown] [.]00000000 31.6% cal [unknown] [.]00000000
The examination of the gathered samples (perf report -D) shows the IP is set to 0 and that the samples are considered as user space samples, while the IP should be set from the registers and the samples should be considered as kernel samples.
The fix is to implement perf_arch_fetch_caller_regs for ARM, which fills the necessary registers used for the callchain unwinding and to determine the user/kernel space property of the samples: - program counter for PERF_SAMPLE_IP, as used by dwarf unwinding, - sp & fp for fp based callchain unwinding, - cpsr for user_mode() tests.
Tested with perf record and tracepoints triggering (-e <tracepoint>), with unwinding using fp (--call-graph fp) and dwarf info (--call-graph dwarf).
Reported by Sneha Priya on linaro-dev, cf. http://lists.linaro.org/pipermail/linaro-dev/2014-May/017151.html
Signed-off-by: Jean Pihet jean.pihet@linaro.org Cc: Will Deacon will.deacon@arm.com Cc: Steve Capper steve.capper@linaro.org Reported-by: Sneha Priya sneha.cse@hotmail.com --- arch/arm/include/asm/perf_event.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h index 7558775..b02b5d3 100644 --- a/arch/arm/include/asm/perf_event.h +++ b/arch/arm/include/asm/perf_event.h @@ -26,6 +26,25 @@ struct pt_regs; extern unsigned long perf_instruction_pointer(struct pt_regs *regs); extern unsigned long perf_misc_flags(struct pt_regs *regs); #define perf_misc_flags(regs) perf_misc_flags(regs) + +/* + * Take a snapshot of the regs. + * We only need a few of the regs: + * - program counter for PERF_SAMPLE_IP, as used by dwarf unwinding, + * - sp & fp for fp based callchain unwinding, + * - cpsr for user_mode() tests. + */ +#define perf_arch_fetch_caller_regs(regs, __ip) { \ + instruction_pointer(regs)= (__ip); \ + __asm__ ( \ + "str sp, %[_ARM_sp] \n\t" \ + "str fp, %[_ARM_fp] \n\t" \ + "mrs %[_ARM_cpsr], cpsr \n\t" \ + : [_ARM_sp] "=m" (regs->ARM_sp), \ + [_ARM_fp] "=m" (regs->ARM_fp), \ + [_ARM_cpsr] "=r" (regs->ARM_cpsr) \ + ); \ +} #endif
#endif /* __ARM_PERF_EVENT_H__ */
Hi Jean,
On Mon, Jul 07, 2014 at 02:45:07PM +0100, Jean Pihet wrote:
- Robustify the user backtrace code, as done on other architectures.
- Provide the symbols resolution when triggering from tracepoints.
Big thanks to Steve Capper for the help in debugging and rephrasing the commits descriptions.
Stress tested with perf record and tracepoints triggering (-e <tracepoint>), with unwinding using fp (--call-graph fp) and dwarf info (--call-graph dwarf).
I've taken the first two in this series, but patch 3 still doesn't address my concerns about saving the required register state for unwinding.
Thanks!
Will
linaro-kernel@lists.linaro.org