This patch series adds the infrastructure required to support kprobes in Thumb code. As it stands, it allows probes to be inserted onto NOP instructions.
Work Remaining
--------------
- Add decoding and emulation code for all the different Thumb instructions.
- Make the framework test for probes on conditionally executed instructions
  and not execute callback functions if the relevant condition isn't met.
- Write test code for all Thumb and ARM instruction types. (Does Linux have
  in-tree test code or do I need to stash this elsewhere?)
Open Issue
----------
32-bit Thumb breakpoints may straddle two memory words, which means that when we set or clear them there is a window of opportunity where another CPU may see only half of the new instruction and execute invalid code. To prevent this I've used stop_machine() to get all CPUs to synchronously modify the instruction and update their I-caches. To my thinking, something like this would also be needed so that other CPUs see the new instruction, otherwise they could indefinitely be executing the old one from their local I-cache.
The problem with using stop_machine() is that the breakpoint-setting code is called from enable_kprobe(), which holds the text_mutex and carries this comment:
since [the breakpoint setting code] doesn't use stop_machine(), this doesn't cause deadlock on text_mutex. So, we don't need get_online_cpus()
Now that I am using stop_machine() I need to understand what the consequences and alternatives are.
I do note that when probes are disabled, the existing ARM kprobe implementation uses stop_machine(), and we have a similar issue with this being called from disarm_kprobe(), which takes the text_mutex.
From: Jon Medhurst tixy@yxit.co.uk
The implementation of svc_exit didn't take into account any stack hole created by svc_entry; as happens with the undef handler when kprobes are configured. The fix is to read the saved value of SP rather than trying to calculate it.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
---
 arch/arm/kernel/entry-header.S |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index 051166c..83e29ad 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -121,15 +121,13 @@
 	.endm
 #else	/* CONFIG_THUMB2_KERNEL */
 	.macro	svc_exit, rpsr
+	ldr	lr, [sp, #S_SP]			@ top of the stack
+	ldrd	r0, r1, [sp, #S_LR]		@ calling lr and pc
 	clrex					@ clear the exclusive monitor
-	ldr	r0, [sp, #S_SP]			@ top of the stack
-	ldr	r1, [sp, #S_PC]			@ return address
-	tst	r0, #4				@ orig stack 8-byte aligned?
-	stmdb	r0, {r1, \rpsr}			@ rfe context
+	stmdb	lr!, {r0, r1, \rpsr}		@ calling lr and rfe context
 	ldmia	sp, {r0 - r12}
-	ldr	lr, [sp, #S_LR]
-	addeq	sp, sp, #S_FRAME_SIZE - 8	@ aligned
-	addne	sp, sp, #S_FRAME_SIZE - 4	@ not aligned
+	mov	sp, lr
+	ldr	lr, [sp], #4
 	rfeia	sp!
 	.endm
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
The implementation of svc_exit didn't take into account any stack hole created by svc_entry; as happens with the undef handler when kprobes are configured. The fix is to read the saved value of SP rather than trying to calculate it.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
Acked-by: Nicolas Pitre nicolas.pitre@linaro.org
 arch/arm/kernel/entry-header.S |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S
index 051166c..83e29ad 100644
--- a/arch/arm/kernel/entry-header.S
+++ b/arch/arm/kernel/entry-header.S
@@ -121,15 +121,13 @@
 	.endm
 #else	/* CONFIG_THUMB2_KERNEL */
 	.macro	svc_exit, rpsr
+	ldr	lr, [sp, #S_SP]			@ top of the stack
+	ldrd	r0, r1, [sp, #S_LR]		@ calling lr and pc
 	clrex					@ clear the exclusive monitor
-	ldr	r0, [sp, #S_SP]			@ top of the stack
-	ldr	r1, [sp, #S_PC]			@ return address
-	tst	r0, #4				@ orig stack 8-byte aligned?
-	stmdb	r0, {r1, \rpsr}			@ rfe context
+	stmdb	lr!, {r0, r1, \rpsr}		@ calling lr and rfe context
 	ldmia	sp, {r0 - r12}
-	ldr	lr, [sp, #S_LR]
-	addeq	sp, sp, #S_FRAME_SIZE - 8	@ aligned
-	addne	sp, sp, #S_FRAME_SIZE - 4	@ not aligned
+	mov	sp, lr
+	ldr	lr, [sp], #4
 	rfeia	sp!
 	.endm
1.7.2.5
linaro-kernel mailing list
linaro-kernel@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-kernel
From: Jon Medhurst tixy@yxit.co.uk
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
---
 arch/arm/kernel/traps.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 21ac43f..6468100 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -356,7 +356,12 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 	pc = (void __user *)instruction_pointer(regs);

 	if (processor_mode(regs) == SVC_MODE) {
-		instr = *(u32 *) pc;
+#ifdef CONFIG_THUMB2_KERNEL
+		if (thumb_mode(regs))
+			instr = *(u16 *) pc;
+		else
+#endif
+		instr = *(u32 *) pc;
 	} else if (thumb_mode(regs)) {
 		get_user(instr, (u16 __user *)pc);
 	} else {
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
 arch/arm/kernel/traps.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 21ac43f..6468100 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -356,7 +356,12 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 	pc = (void __user *)instruction_pointer(regs);

 	if (processor_mode(regs) == SVC_MODE) {
-		instr = *(u32 *) pc;
+#ifdef CONFIG_THUMB2_KERNEL
+		if (thumb_mode(regs))
+			instr = *(u16 *) pc;
+		else
+#endif
+		instr = *(u32 *) pc;
Although I suggested always using a 16-bit instruction for kprobe breakpoint, the above looks to be broken for 32-bit Thumb instructions.
Nicolas
On Mon, 2011-03-21 at 13:18 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
 arch/arm/kernel/traps.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 21ac43f..6468100 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -356,7 +356,12 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 	pc = (void __user *)instruction_pointer(regs);

 	if (processor_mode(regs) == SVC_MODE) {
-		instr = *(u32 *) pc;
+#ifdef CONFIG_THUMB2_KERNEL
+		if (thumb_mode(regs))
+			instr = *(u16 *) pc;
+		else
+#endif
+		instr = *(u32 *) pc;
Although I suggested always using a 16-bit instruction for kprobe breakpoint, the above looks to be broken for 32-bit Thumb instructions.
It follows the existing pattern for user-side breakpoints and ptrace. These only use the first halfword of the instruction to select the hook to call; this hook then reads the second half and decides if it wants to handle it.
I initially thought of fixing do_undefinstr to use the full 32 bits of the instruction, which seems the right thing to do, but I backed off because it would introduce compatibility problems with existing handlers like ptrace and possibly other out-of-tree code. I could change ptrace and not worry about other code I can't see.
One of my biggest headaches with this is deciding the best way of storing a 32-bit thumb instruction in an integer. The human readable way [ 1st-half : 2nd-half ], or the way that a 32-bit value would be stored in memory, which varies between little/big-endian, or even always as little-endian.
In the end I chickened out and avoided rewriting things I didn't need to. Perhaps I should have gone the whole hog and made all code dealing with ARM instructions use a union like:
struct arm_instruction {
	union {
		u32 arm;
		/*
		 * thumb[0] = 16-bit thumb instruction or
		 *            1st half-word of 32-bit thumb instruction
		 * thumb[1] = 2nd half-word of 32-bit thumb instruction
		 */
		u16 thumb[2];
	};
};
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 13:18 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
 arch/arm/kernel/traps.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 21ac43f..6468100 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -356,7 +356,12 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 	pc = (void __user *)instruction_pointer(regs);

 	if (processor_mode(regs) == SVC_MODE) {
-		instr = *(u32 *) pc;
+#ifdef CONFIG_THUMB2_KERNEL
+		if (thumb_mode(regs))
+			instr = *(u16 *) pc;
+		else
+#endif
+		instr = *(u32 *) pc;
Although I suggested always using a 16-bit instruction for kprobe breakpoint, the above looks to be broken for 32-bit Thumb instructions.
It follows the existing pattern for user-side breakpoints and ptrace. These only use the first halfword of the instruction to select the hook to call; this hook then reads the second half and decides if it wants to handle it.
Right... well the user space Thumb access was originally meant for Thumb1 which has next to no 32-bit instructions. If it is easy to determine if an insn is 16 or 32 bits (I've seen x >= 0xe800 somewhere in your code) then I'd go with that and fetch the second half right away instead of duplicating the fetching of the second half everywhere.
I initially thought of fixing do_undefinstr to use the full 32 bits of the instruction, which seems the right thing to do. But backed off because it would introduce compatibility problems with existing handlers like ptrace and possibly other out-of-tree code.
We don't care about out-of-tree code. There is a reason for merging code into mainline after all.
From a quick glance only one hook so far expects a 32-bit Thumb2 instruction, i.e. for ptrace. So IMHO this is worth doing.
One of my biggest headaches with this is deciding the best way of storing a 32-bit thumb instruction in an integer. The human readable way [ 1st-half : 2nd-half ], or the way that a 32-bit value would be stored in memory, which varies between little/big-endian, or even always as little-endian.
I'd suggest:
	insn = ((u16 *)pc)[0];
	if (insn_is_wide(insn)) {
		insn <<= 16;
		insn |= ((u16 *)pc)[1];
	}
effectively using a big endian representation, but which is similar to ARM mode instructions, and which would make sense regardless of the instruction mode/width when printed.
In the end I chickened out and avoided rewriting things I didn't need to. Perhaps I should have gone the whole hog and made all code dealing with ARM instructions use a union like:
struct arm_instruction {
	union {
		u32 arm;
		/*
		 * thumb[0] = 16-bit thumb instruction or
		 *            1st half-word of 32-bit thumb instruction
		 * thumb[1] = 2nd half-word of 32-bit thumb instruction
		 */
		u16 thumb[2];
	};
};
I don't think going that far is necessary. That won't gain much in clarity and this has the potential of making the resulting compiled code a bit less efficient.
Nicolas
On Mon, 2011-03-21 at 14:56 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 13:18 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
[...]
I initially thought of fixing do_undefinstr to use the full 32 bits of the instruction, which seems the right thing to do. But backed off because it would introduce compatibility problems with existing handlers like ptrace and possibly other out-of-tree code.
We don't care about out-of-tree code. There is a reason for merging code into mainline after all.
From a quick glance only one hook so far expects a 32-bit Thumb2 instruction i.e. for ptrace. So IMHO this is worth doing.
OK, I'll add the changes for this.
One of my biggest headaches with this is deciding the best way of storing a 32-bit thumb instruction in an integer. The human readable way [ 1st-half : 2nd-half ], or the way that a 32-bit value would be stored in memory, which varies between little/big-endian, or even always as little-endian.
I'd suggest:
	insn = ((u16 *)pc)[0];
	if (insn_is_wide(insn)) {
		insn <<= 16;
		insn |= ((u16 *)pc)[1];
	}
effectively using a big endian representation, but which is similar to ARM mode instructions, and which would make sense regardless of the instruction mode/width when printed.
This is what I've done in the kprobe code that handles thumb, except that I've also shifted 16-bit thumb instructions into the top half. This was so we always know where the first half-word is and can test it for 16/32-bit size with "if(first_half<0xe800)".
However, I've just realised that I didn't need to do this, with your example "if(inst<0xe800)" still works; so my other patches can be simplified :-)
We could even just test the upper 16-bits for non-zero to indicate 32-bit. E.g. to store a thumb instruction:
	u16 *pc = (u16 *)addr;
	u16 half = inst >> 16;
	if (half)
		*pc++ = half;
	*pc++ = inst & 0xffff;
which should compile to just three or four CPU instructions.
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 14:56 -0400, Nicolas Pitre wrote:
I'd suggest:
	insn = ((u16 *)pc)[0];
	if (insn_is_wide(insn)) {
		insn <<= 16;
		insn |= ((u16 *)pc)[1];
	}
effectively using a big endian representation, but which is similar to ARM mode instructions, and which would make sense regardless of the instruction mode/width when printed.
This is what I've done in the kprobe code that handles thumb, except that I've also shifted 16-bit thumb instructions into the top half. This was so we always know where the first half-word is and can test it for 16/32-bit size with "if(first_half<0xe800)".
However, I've just realised that I didn't need to do this, with your example "if(inst<0xe800)" still works; so my other patches can be simplified :-)
Another reason to keep the 16-bit insn in the low half-word is to simplify printing.
printk("instruction: 0x%04x\n", insn);
would display something sensible regardless of the mode or width (no fancy spacing in the 32-bit Thumb2 case but that's not so bad).
We could even just test the upper 16-bits for non-zero to indicate 32-bit. E.g. to store a thumb instruction:
	u16 *pc = (u16 *)addr;
	u16 half = inst >> 16;
	if (half)
		*pc++ = half;
	*pc++ = inst & 0xffff;
which should compile to just three or four CPU instructions.
Indeed. And given the pointer is u16*, you may omit the 0xffff mask.
Nicolas
From: Jon Medhurst tixy@yxit.co.uk
On Thumb-2 kernels there may still be some kernel code using the ARM instruction set, so kprobes will need to support this. This patch fixes the ARM emulation code so that it builds and works correctly when built as Thumb.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
---
 arch/arm/kernel/kprobes-decode.c |   61 +++++++++++++++++++++----------------
 1 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/arch/arm/kernel/kprobes-decode.c b/arch/arm/kernel/kprobes-decode.c
index 8f6ed43..0532ba3 100644
--- a/arch/arm/kernel/kprobes-decode.c
+++ b/arch/arm/kernel/kprobes-decode.c
@@ -94,6 +94,14 @@ union reg_pair {
 #endif
 };

+
+#ifdef CONFIG_THUMB2_KERNEL
+#define BLX(reg)	"blx	"reg"		\n\t"
+#else
+#define BLX(reg)	"mov	lr, pc		\n\t"	\
+			"mov	pc, "reg"	\n\t"
+#endif
+
 /*
  * For STR and STM instructions, an ARM core may choose to use either
  * a +8 or a +12 displacement from the current instruction's address.
@@ -108,8 +116,22 @@ static void __init find_str_pc_offset(void)
 	int addr, scratch, ret;

 	__asm__ (
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch to ARM mode */
+		"adr	%[scr], 1f	\n\t"
+		"bx	%[scr]		\n\t"
+		".arm			\n\t"
+		"1:			\n\t"
+#endif
 		"sub	%[ret], pc, #4	\n\t"
 		"str	pc, %[addr]	\n\t"
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch back to Thumb mode */
+		"adr	%[scr], 2f + 1	\n\t"
+		"bx	%[scr]		\n\t"
+		".thumb			\n\t"
+		"2:			\n\t"
+#endif
 		"ldr	%[scr], %[addr]	\n\t"
 		"sub	%[ret], %[scr], %[ret]	\n\t"
 		: [ret] "=r" (ret), [scr] "=r" (scratch), [addr] "+m" (addr));
@@ -131,8 +153,7 @@ insnslot_0arg_rflags(long cpsr, insn_0arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret)
 		: [cpsr] "r" (cpsr), [fn] "r" (fn)
 		: "lr", "cc"
@@ -149,8 +170,7 @@ insnslot_llret_0arg_rflags(long cpsr, insn_llret_0arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret0), "=r" (ret1)
 		: [cpsr] "r" (cpsr), [fn] "r" (fn)
 		: "lr", "cc"
@@ -168,8 +188,7 @@ insnslot_1arg_rflags(long r0, long cpsr, insn_1arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret)
 		: "0" (rr0), [cpsr] "r" (cpsr), [fn] "r" (fn)
 		: "lr", "cc"
@@ -186,8 +205,7 @@ insnslot_2arg_rflags(long r0, long r1, long cpsr, insn_2arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret)
 		: "0" (rr0), "r" (rr1),
 		  [cpsr] "r" (cpsr), [fn] "r" (fn)
@@ -206,8 +224,7 @@ insnslot_3arg_rflags(long r0, long r1, long r2, long cpsr, insn_3arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret)
 		: "0" (rr0), "r" (rr1), "r" (rr2),
 		  [cpsr] "r" (cpsr), [fn] "r" (fn)
@@ -229,8 +246,7 @@ insnslot_llret_3arg_rflags(long r0, long r1, long r2, long cpsr,

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret0), "=r" (ret1)
 		: "0" (rr0), "r" (rr1), "r" (rr2),
 		  [cpsr] "r" (cpsr), [fn] "r" (fn)
@@ -253,8 +269,7 @@ insnslot_4arg_rflags(long r0, long r1, long r2, long r3, long cpsr,

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		: "=r" (ret)
 		: "0" (rr0), "r" (rr1), "r" (rr2), "r" (rr3),
 		  [cpsr] "r" (cpsr), [fn] "r" (fn)
@@ -273,8 +288,7 @@ insnslot_1arg_rwflags(long r0, long *cpsr, insn_1arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: "=r" (ret), [newcpsr] "=r" (newcpsr)
 		: "0" (rr0), [oldcpsr] "r" (oldcpsr), [fn] "r" (fn)
@@ -295,8 +309,7 @@ insnslot_2arg_rwflags(long r0, long r1, long *cpsr, insn_2arg_fn_t *fn)

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: "=r" (ret), [newcpsr] "=r" (newcpsr)
 		: "0" (rr0), "r" (rr1), [oldcpsr] "r" (oldcpsr), [fn] "r" (fn)
@@ -319,8 +332,7 @@ insnslot_3arg_rwflags(long r0, long r1, long r2, long *cpsr,

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: "=r" (ret), [newcpsr] "=r" (newcpsr)
 		: "0" (rr0), "r" (rr1), "r" (rr2),
@@ -345,8 +357,7 @@ insnslot_4arg_rwflags(long r0, long r1, long r2, long r3, long *cpsr,

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: "=r" (ret), [newcpsr] "=r" (newcpsr)
 		: "0" (rr0), "r" (rr1), "r" (rr2), "r" (rr3),
@@ -373,8 +384,7 @@ insnslot_llret_4arg_rwflags(long r0, long r1, long r2, long r3, long *cpsr,

 	__asm__ __volatile__ (
 		"msr	cpsr_fs, %[oldcpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[fn]	\n\t"
+		BLX("%[fn]")
 		"mrs	%[newcpsr], cpsr	\n\t"
 		: "=r" (ret0), "=r" (ret1), [newcpsr] "=r" (newcpsr)
 		: "0" (rr0), "r" (rr1), "r" (rr2), "r" (rr3),
@@ -549,8 +559,7 @@ static void __kprobes emulate_ldrd(struct kprobe *p, struct pt_regs *regs)
 		"ldr	r0, %[rn]	\n\t"
 		"ldr	r1, %[rm]	\n\t"
 		"msr	cpsr_fs, %[cpsr]	\n\t"
-		"mov	lr, pc		\n\t"
-		"mov	pc, %[i_fn]	\n\t"
+		BLX("%[i_fn]")
 		"str	r0, %[rn]	\n\t"	/* in case of writeback */
 		"str	r2, %[rd0]	\n\t"
 		"str	r3, %[rd1]	\n\t"
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
On Thumb-2 kernels there may still be some kernel code using the ARM instruction set, so kprobes will need to support this. This patch fixes the ARM emulation code so that it builds and works correctly when built as Thumb.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
[...]
@@ -108,8 +116,22 @@ static void __init find_str_pc_offset(void)
 	int addr, scratch, ret;

 	__asm__ (
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch to ARM mode */
+		"adr	%[scr], 1f	\n\t"
+		"bx	%[scr]		\n\t"
A .align would be required here.
+		".arm			\n\t"
+		"1:			\n\t"
+#endif
 		"sub	%[ret], pc, #4	\n\t"
 		"str	pc, %[addr]	\n\t"
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch back to Thumb mode */
+		"adr	%[scr], 2f + 1	\n\t"
+		"bx	%[scr]		\n\t"
+		".thumb			\n\t"
+		"2:			\n\t"
Here you could simply use something like:
sub pc, pc, #3
+#endif
 		"ldr	%[scr], %[addr]	\n\t"
 		"sub	%[ret], %[scr], %[ret]	\n\t"
 		: [ret] "=r" (ret), [scr] "=r" (scratch), [addr] "+m" (addr));
Isn't the direct storage of pc architecturally defined with Thumb2 capable processors? If so this whole code could be omitted in that case.
Nicolas
On Mon, 2011-03-21 at 15:58 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
From: Jon Medhurst tixy@yxit.co.uk
On Thumb-2 kernels there may still be some kernel code using the ARM instruction set, so kprobes will need to support this. This patch fixes the ARM emulation code so that it builds and works correctly when built as Thumb.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
[...]
@@ -108,8 +116,22 @@ static void __init find_str_pc_offset(void)
 	int addr, scratch, ret;

 	__asm__ (
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch to ARM mode */
+		"adr	%[scr], 1f	\n\t"
+		"bx	%[scr]		\n\t"
A .align would be required here.
OK. I had assumed that the assembler would automatically align ARM instructions, but it doesn't say that in the docs.
+		".arm			\n\t"
+		"1:			\n\t"
+#endif
 		"sub	%[ret], pc, #4	\n\t"
 		"str	pc, %[addr]	\n\t"
+#ifdef CONFIG_THUMB2_KERNEL
+		/* Switch back to Thumb mode */
+		"adr	%[scr], 2f + 1	\n\t"
+		"bx	%[scr]		\n\t"
+		".thumb			\n\t"
+		"2:			\n\t"
Here you could simply use something like:
sub pc, pc, #3
Only on ARMv7, on ARMv6T2 it doesn't change state.
+#endif
 		"ldr	%[scr], %[addr]	\n\t"
 		"sub	%[ret], %[scr], %[ret]	\n\t"
 		: [ret] "=r" (ret), [scr] "=r" (scratch), [addr] "+m" (addr));
Isn't the direct storage of pc architecturally defined with Thumb2 capable processors? If so this whole code could be omitted in that case.
Again, only on ARMv7, not ARMv6T2.
Does Linux support ARMv6T2? arch/arm/Kconfig has
config THUMB2_KERNEL
	depends on CPU_V7 && !CPU_V6 && !CPU_V6K && EXPERIMENTAL
so if there isn't any other CPU_V6x config option then THUMB2_KERNEL implies ARMv7.
From: Jon Medhurst tixy@yxit.co.uk
This adds support to kprobes for setting, processing and clearing of Thumb breakpoints.
When kprobes fire, the probed instruction needs simulating. The code for handling this is in two new source files, kprobes-thumb16.c and kprobes-thumb32.c; currently these files are stubs which just handle NOP instructions. For reference, the simulation of ARM instructions is handled in kprobes-decode.c. It is anticipated that some of the helper functions in that file will be factored out into a header file so they can be shared with the Thumb code.
Signed-off-by: Jon Medhurst tixy@yxit.co.uk
---
 arch/arm/Kconfig                  |    2 +-
 arch/arm/include/asm/kprobes.h    |   33 +++++-
 arch/arm/kernel/Makefile          |    3 +
 arch/arm/kernel/kprobes-thumb16.c |   33 ++++++
 arch/arm/kernel/kprobes-thumb32.c |   32 +++++
 arch/arm/kernel/kprobes.c         |  230 ++++++++++++++++++++++++++++++++++---
 6 files changed, 315 insertions(+), 18 deletions(-)
 create mode 100644 arch/arm/kernel/kprobes-thumb16.c
 create mode 100644 arch/arm/kernel/kprobes-thumb32.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 6b6a4fd..9cda52f 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -10,7 +10,7 @@ config ARM select GENERIC_ATOMIC64 if (CPU_V6 || !CPU_32v6K || !AEABI) select HAVE_OPROFILE if (HAVE_PERF_EVENTS) select HAVE_ARCH_KGDB - select HAVE_KPROBES if (!XIP_KERNEL && !THUMB2_KERNEL) + select HAVE_KPROBES if (!XIP_KERNEL) select HAVE_KRETPROBES if (HAVE_KPROBES) select HAVE_FUNCTION_TRACER if (!XIP_KERNEL) select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL) diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h index bb8a19b..dc50b85 100644 --- a/arch/arm/include/asm/kprobes.h +++ b/arch/arm/include/asm/kprobes.h @@ -25,10 +25,12 @@ #define MAX_STACK_SIZE 64 /* 32 would probably be OK */
/* - * This undefined instruction must be unique and + * These undefined instructions must be unique and * reserved solely for kprobes' use. */ -#define KPROBE_BREAKPOINT_INSTRUCTION 0xe7f001f8 +#define KPROBE_ARM_BREAKPOINT_INSTRUCTION 0xe7f001f8 +#define KPROBE_THUMB16_BREAKPOINT_INSTRUCTION 0xdef8 +#define KPROBE_THUMB32_BREAKPOINT_INSTRUCTION 0xf7f101f8
#define regs_return_value(regs) ((regs)->ARM_r0) #define flush_insn_slot(p) do { } while (0) @@ -36,6 +38,24 @@
typedef u32 kprobe_opcode_t;
+/* + * When we store a 32-bit Thumb instruction in a 32-bit integer we store + * the first halfword in the upper 16 bits and the second halfword in the + * lower 16 bits. This is the reverse of the order as they would appear in + * memory on a little-endian system but it matches the convention used in + * the ARM ARM and makes the decoding functions more readable. + * + * A 16-bit Thumb instruction is stored in the upper half of a 32-bit integer + * to be consistent with the above. + * + * The following macro definitions let us manipulate instructions in these + * formats... + */ +#define THUMB_1ST_HALF(i) ((i) >> 16) +#define THUMB_2ND_HALF(i) ((i) & 0xffff) +#define MAKE_THUMB32(a, b) ((kprobe_opcode_t)(((a) << 16) + (b))) +#define MAKE_THUMB16(a) ((kprobe_opcode_t)((a) << 16)) + struct kprobe; typedef void (kprobe_insn_handler_t)(struct kprobe *, struct pt_regs *);
@@ -73,6 +93,15 @@ enum kprobe_insn {
enum kprobe_insn arm_kprobe_decode_insn(kprobe_opcode_t, struct arch_specific_insn *); + +#ifdef CONFIG_THUMB2_KERNEL +enum kprobe_insn thumb16_kprobe_decode_insn(kprobe_opcode_t, + struct arch_specific_insn *); + +enum kprobe_insn thumb32_kprobe_decode_insn(kprobe_opcode_t, + struct arch_specific_insn *); +#endif + void __init arm_kprobe_decode_init(void);
#endif /* _ARM_KPROBES_H */ diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index 8df05fd..d345bdf 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -37,6 +37,9 @@ obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o obj-$(CONFIG_KPROBES) += kprobes.o kprobes-decode.o +ifdef CONFIG_THUMB2_KERNEL +obj-$(CONFIG_KPROBES) += kprobes-thumb16.o kprobes-thumb32.o +endif obj-$(CONFIG_ATAGS_PROC) += atags.o obj-$(CONFIG_OABI_COMPAT) += sys_oabi-compat.o obj-$(CONFIG_ARM_THUMBEE) += thumbee.o diff --git a/arch/arm/kernel/kprobes-thumb16.c b/arch/arm/kernel/kprobes-thumb16.c new file mode 100644 index 0000000..3cb37aa --- /dev/null +++ b/arch/arm/kernel/kprobes-thumb16.c @@ -0,0 +1,33 @@ +/* + * arch/arm/kernel/kprobes-thumb16.c + * + * Copyright (C) 2011 Jon Medhurst tixy@yxit.co.uk. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/kernel.h> +#include <linux/kprobes.h> + +static void __kprobes +emulate_nop(struct kprobe *p, struct pt_regs *regs) +{ +} + +enum kprobe_insn __kprobes +thumb16_kprobe_decode_insn(kprobe_opcode_t insn, struct arch_specific_insn *asi) +{ + insn = THUMB_1ST_HALF(insn); + + ((u16 *)asi->insn)[1] = 0x4770; /* bx lr */ + + if (insn == 0xbf00) { + /* Special case NOP for now. TODO: Remove this. */ + asi->insn_handler = emulate_nop; + return INSN_GOOD_NO_SLOT; + } + + return INSN_REJECTED; +} diff --git a/arch/arm/kernel/kprobes-thumb32.c b/arch/arm/kernel/kprobes-thumb32.c new file mode 100644 index 0000000..4dc64d3 --- /dev/null +++ b/arch/arm/kernel/kprobes-thumb32.c @@ -0,0 +1,32 @@ +/* + * arch/arm/kernel/kprobes-thumb32.c + * + * Copyright (C) 2011 Jon Medhurst tixy@yxit.co.uk. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/kernel.h> +#include <linux/kprobes.h> + +static void __kprobes +emulate_nop(struct kprobe *p, struct pt_regs *regs) +{ +} + +enum kprobe_insn __kprobes +thumb32_kprobe_decode_insn(kprobe_opcode_t insn, struct arch_specific_insn *asi) +{ + ((u16 *)asi->insn)[2] = 0x4770; /* bx lr */ + + if (insn == 0xf3af8000) { + /* Special case NOP for now. TODO: Remove this. */ + asi->insn_handler = emulate_nop; + return INSN_GOOD_NO_SLOT; + } + + return INSN_REJECTED; +} + diff --git a/arch/arm/kernel/kprobes.c b/arch/arm/kernel/kprobes.c index 2ba7deb..1946384 100644 --- a/arch/arm/kernel/kprobes.c +++ b/arch/arm/kernel/kprobes.c @@ -35,7 +35,7 @@ #define flush_insns(addr, cnt) \ flush_icache_range((unsigned long)(addr), \ (unsigned long)(addr) + \ - sizeof(kprobe_opcode_t) * (cnt)) + (cnt))
/* Used as a marker in ARM_pc to note when we're in a jprobe. */ #define JPROBE_MAGIC_ADDR 0xffffffff @@ -49,16 +49,43 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) kprobe_opcode_t insn; kprobe_opcode_t tmp_insn[MAX_INSN_SIZE]; unsigned long addr = (unsigned long)p->addr; + enum kprobe_insn (*decode_insn)(kprobe_opcode_t, + struct arch_specific_insn *); int is;
- if (addr & 0x3 || in_exception_text(addr)) + if (in_exception_text(addr)) return -EINVAL;
+#ifdef CONFIG_THUMB2_KERNEL + if (addr & 1) { + /* Probe is in Thumb code */ + u16 *thumb_addr = (u16 *)(addr & ~1); + unsigned long insn1 = thumb_addr[0]; + if (insn1 < 0xe800) { + /* 16bit Thumb instruction */ + insn = MAKE_THUMB16(insn1); + decode_insn = thumb16_kprobe_decode_insn; + } else { + /* 32bit Thumb instruction */ + insn = MAKE_THUMB32(insn1, thumb_addr[1]); + decode_insn = thumb32_kprobe_decode_insn; + } + } else { + /* Probe is in ARM code */ + insn = *(u32 *)addr; + decode_insn = arm_kprobe_decode_insn; + } +#else /* !CONFIG_THUMB2_KERNEL */ + if (addr & 0x3) + return -EINVAL; insn = *p->addr; + decode_insn = arm_kprobe_decode_insn; +#endif + p->opcode = insn; p->ainsn.insn = tmp_insn;
- switch (arm_kprobe_decode_insn(insn, &p->ainsn)) { + switch ((*decode_insn)(insn, &p->ainsn)) { case INSN_REJECTED: /* not supported */ return -EINVAL;
@@ -68,7 +95,8 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) return -ENOMEM; for (is = 0; is < MAX_INSN_SIZE; ++is) p->ainsn.insn[is] = tmp_insn[is]; - flush_insns(p->ainsn.insn, MAX_INSN_SIZE); + flush_insns(p->ainsn.insn, + sizeof(p->ainsn.insn[0]) * MAX_INSN_SIZE); break;
case INSN_GOOD_NO_SLOT: /* instruction doesn't need insn slot */ @@ -79,24 +107,93 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) return 0; }
+#ifndef CONFIG_THUMB2_KERNEL + void __kprobes arch_arm_kprobe(struct kprobe *p) { - *p->addr = KPROBE_BREAKPOINT_INSTRUCTION; - flush_insns(p->addr, 1); + *p->addr = KPROBE_ARM_BREAKPOINT_INSTRUCTION; + flush_insns(p->addr, sizeof(p->addr[0])); }
+#else /* CONFIG_THUMB2_KERNEL */ + +/* + * The actual arming is done here on each CPU and synchronized using + * stop_machine. This synchronization is necessary on SMP to atomically set the + * two half-words of a 32-bit Thumb breakpoint. It is also needed so that we can + * guarentee that all CPUs will actually hit the probe after the caller thinks + * they have inserted it. (Is this last staement true?) + */ +int __kprobes __arch_arm_kprobe(void *p) +{ + struct kprobe *kp = p; + kprobe_opcode_t *addr = kp->addr; + unsigned int len; + if ((uintptr_t)addr & 1) { + /* Thumb instruction */ + addr = (kprobe_opcode_t *)((uintptr_t)addr & ~1); + if (THUMB_1ST_HALF(kp->opcode) >= 0xe800) { + /* 32-bit thumb instruction */ + const u32 brkp = KPROBE_THUMB32_BREAKPOINT_INSTRUCTION; + ((u16 *)addr)[0] = THUMB_1ST_HALF(brkp); + ((u16 *)addr)[1] = THUMB_2ND_HALF(brkp); + len = 2*sizeof(u16); + } else { + /* 16-bit thumb instruction */ + *(u16 *)addr = KPROBE_THUMB16_BREAKPOINT_INSTRUCTION; + len = sizeof(u16); + } + } else { + /* ARM instruction */ + *addr = KPROBE_ARM_BREAKPOINT_INSTRUCTION; + len = sizeof(addr[0]); + } + flush_insns(addr, len); + return 0; +} + +void __kprobes arch_arm_kprobe(struct kprobe *p) +{ + stop_machine(__arch_arm_kprobe, p, &cpu_online_map); +} + +#endif /* !CONFIG_THUMB2_KERNEL */ + /* * The actual disarming is done here on each CPU and synchronized using * stop_machine. This synchronization is necessary on SMP to avoid removing * a probe between the moment the 'Undefined Instruction' exception is raised * and the moment the exception handler reads the faulting instruction from - * memory. + * memory. It is also needed to atomically set the two half-words of a 32-bit + * Thumb breakpoint. 
  */
 int __kprobes __arch_disarm_kprobe(void *p)
 {
 	struct kprobe *kp = p;
-	*kp->addr = kp->opcode;
-	flush_insns(kp->addr, 1);
+	kprobe_opcode_t *addr = kp->addr;
+#ifdef CONFIG_THUMB2_KERNEL
+	unsigned int len;
+	if ((uintptr_t)addr & 1) {
+		/* Thumb instruction */
+		u16 inst1 = THUMB_1ST_HALF(kp->opcode);
+		addr = (kprobe_opcode_t *)((uintptr_t)addr & ~1);
+		((u16 *)addr)[0] = inst1;
+		len = sizeof(u16);
+		if (inst1 >= 0xe800) {
+			/* 32-bit Thumb instruction */
+			((u16 *)addr)[1] = THUMB_2ND_HALF(kp->opcode);
+			len = 2*sizeof(u16);
+		}
+	} else {
+		/* ARM instruction */
+		*addr = kp->opcode;
+		len = sizeof(addr[0]);
+	}
+	flush_insns(addr, len);
+#else /* !CONFIG_THUMB2_KERNEL */
+	*addr = kp->opcode;
+	flush_insns(addr, sizeof(addr[0]));
+#endif
 	return 0;
 }
@@ -133,7 +230,19 @@ static void __kprobes set_current_kprobe(struct kprobe *p)
 static void __kprobes singlestep(struct kprobe *p, struct pt_regs *regs,
 				 struct kprobe_ctlblk *kcb)
 {
-	regs->ARM_pc += 4;
+	unsigned long pc = regs->ARM_pc;
+#ifdef CONFIG_THUMB2_KERNEL
+	unsigned long thumb = regs->ARM_cpsr & PSR_T_BIT;
+	if (thumb && (THUMB_1ST_HALF(p->opcode) < 0xe800))
+		/* A 16-bit Thumb instruction */
+		pc += 2;
+	else
+		/* An ARM or 32-bit Thumb instruction */
+		pc += 4;
+#else /* !CONFIG_THUMB2_KERNEL */
+	pc += 4;
+#endif
+	regs->ARM_pc = pc;
 	p->ainsn.insn_handler(p, regs);
 }
@@ -149,6 +258,11 @@ void __kprobes kprobe_handler(struct pt_regs *regs)
 	struct kprobe *p, *cur;
 	struct kprobe_ctlblk *kcb;
 	kprobe_opcode_t *addr = (kprobe_opcode_t *)regs->ARM_pc;
+#ifdef CONFIG_THUMB2_KERNEL
+	if (regs->ARM_cpsr & PSR_T_BIT)
+		/* Set bit zero of addr for Thumb */
+		addr = (kprobe_opcode_t *)((unsigned long)addr | 1);
+#endif
 	kcb = get_kprobe_ctlblk();
 	cur = kprobe_running();
@@ -223,6 +337,31 @@ static int __kprobes kprobe_trap_handler(struct pt_regs *regs, unsigned int inst
 	return 0;
 }
+#ifdef CONFIG_THUMB2_KERNEL
+
+static int __kprobes
+kprobe_thumb32_trap_handler(struct pt_regs *regs, unsigned int instr)
+{
+	int ret;
+	u16 instr2;
+	unsigned long flags;
+	local_irq_save(flags);
+
+	/* Check the second half of the instruction. */
+	instr2 = *(u16 *)(instruction_pointer(regs) + 2);
+	if (instr2 == THUMB_2ND_HALF(KPROBE_THUMB32_BREAKPOINT_INSTRUCTION)) {
+		kprobe_handler(regs);
+		ret = 0;
+	} else {
+		ret = 1;
+	}
+
+	local_irq_restore(flags);
+	return ret;
+}
+
+#endif /* CONFIG_THUMB2_KERNEL */
+
 int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
 {
 	struct kprobe *cur = kprobe_running();
@@ -299,7 +438,11 @@ void __naked __kprobes kretprobe_trampoline(void)
 		"bl	trampoline_handler	\n\t"
 		"mov	lr, r0			\n\t"
 		"ldmia	sp!, {r0 - r11}		\n\t"
+#ifdef CONFIG_THUMB2_KERNEL
+		"bx	lr			\n\t"
+#else
 		"mov	pc, lr			\n\t"
+#endif
 		: : : "memory");
 }
@@ -377,11 +520,22 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
 	struct jprobe *jp = container_of(p, struct jprobe, kp);
 	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
 	long sp_addr = regs->ARM_sp;
+	long cpsr;
 	kcb->jprobe_saved_regs = *regs;
 	memcpy(kcb->jprobes_stack, (void *)sp_addr, MIN_STACK_SIZE(sp_addr));
+
 	regs->ARM_pc = (long)jp->entry;
-	regs->ARM_cpsr |= PSR_I_BIT;
+
+	cpsr = regs->ARM_cpsr | PSR_I_BIT;
+#ifdef CONFIG_THUMB2_KERNEL
+	/* Set correct Thumb state in cpsr */
+	if (regs->ARM_pc & 1)
+		cpsr |= PSR_T_BIT;
+	else
+		cpsr &= ~PSR_T_BIT;
+#endif
+	regs->ARM_cpsr = cpsr;
+
 	preempt_disable();
 	return 1;
 }
@@ -403,7 +557,12 @@ void __kprobes jprobe_return(void)
 		 * This is to prevent any simulated instruction from writing
 		 * over the regs when they are accessing the stack.
 		 */
+#ifdef CONFIG_THUMB2_KERNEL
+		"sub	r0, %0, %1		\n\t"
+		"mov	sp, r0			\n\t"
+#else
 		"sub	sp, %0, %1		\n\t"
+#endif
 		"ldr	r0, ="__stringify(JPROBE_MAGIC_ADDR)"\n\t"
 		"str	%0, [sp, %2]		\n\t"
 		"str	r0, [sp, %3]		\n\t"
@@ -414,15 +573,28 @@ void __kprobes jprobe_return(void)
 		 * Return to the context saved by setjmp_pre_handler
 		 * and restored by longjmp_break_handler.
 		 */
+#ifdef CONFIG_THUMB2_KERNEL
+		"ldr	lr, [sp, %2]		\n\t"	/* lr = saved sp */
+		"ldrd	r0, r1, [sp, %5]	\n\t"	/* r0,r1 = saved lr,pc */
+		"ldr	r2, [sp, %4]		\n\t"	/* r2 = saved psr */
+		"stmdb	lr!, {r0, r1, r2}	\n\t"	/* push saved lr and */
+							/* rfe context */
+		"ldmia	sp, {r0 - r12}		\n\t"
+		"mov	sp, lr			\n\t"
+		"ldr	lr, [sp], #4		\n\t"
+		"rfeia	sp!			\n\t"
+#else
 		"ldr	r0, [sp, %4]		\n\t"
 		"msr	cpsr_cxsf, r0		\n\t"
 		"ldmia	sp, {r0 - pc}		\n\t"
+#endif
 		:
 		: "r" (kcb->jprobe_saved_regs.ARM_sp),
 		  "I" (sizeof(struct pt_regs) * 2),
 		  "J" (offsetof(struct pt_regs, ARM_sp)),
 		  "J" (offsetof(struct pt_regs, ARM_pc)),
-		  "J" (offsetof(struct pt_regs, ARM_cpsr))
+		  "J" (offsetof(struct pt_regs, ARM_cpsr)),
+		  "J" (offsetof(struct pt_regs, ARM_lr))
 		: "memory", "cc");
 }
@@ -459,17 +631,45 @@ int __kprobes arch_trampoline_kprobe(struct kprobe *p)
 	return 0;
 }
-static struct undef_hook kprobes_break_hook = {
+static struct undef_hook kprobes_arm_break_hook = {
 	.instr_mask	= 0xffffffff,
-	.instr_val	= KPROBE_BREAKPOINT_INSTRUCTION,
+	.instr_val	= KPROBE_ARM_BREAKPOINT_INSTRUCTION,
 	.cpsr_mask	= MODE_MASK,
 	.cpsr_val	= SVC_MODE,
 	.fn		= kprobe_trap_handler,
 };
+#ifdef CONFIG_THUMB2_KERNEL
+
+static struct undef_hook kprobes_thumb16_break_hook = {
+	.instr_mask	= 0xffff,
+	.instr_val	= KPROBE_THUMB16_BREAKPOINT_INSTRUCTION,
+	.cpsr_mask	= MODE_MASK,
+	.cpsr_val	= SVC_MODE,
+	.fn		= kprobe_trap_handler,
+};
+
+static struct undef_hook kprobes_thumb32_break_hook = {
+	.instr_mask	= 0xffff,
+	.instr_val	= THUMB_1ST_HALF(KPROBE_THUMB32_BREAKPOINT_INSTRUCTION),
+	.cpsr_mask	= MODE_MASK,
+	.cpsr_val	= SVC_MODE,
+	.fn		= kprobe_thumb32_trap_handler,
+};
+
+#endif /* CONFIG_THUMB2_KERNEL */
+
 int __init arch_init_kprobes()
 {
 	arm_kprobe_decode_init();
-	register_undef_hook(&kprobes_break_hook);
+#ifdef CONFIG_THUMB2_KERNEL
+	/*
+	 * Thumb breakpoints in Thumb kernels are the usual case, so register
+	 * them before the ARM hooks so they are found quicker.
+	 */
+	register_undef_hook(&kprobes_thumb16_break_hook);
+	register_undef_hook(&kprobes_thumb32_break_hook);
+#endif
+	register_undef_hook(&kprobes_arm_break_hook);
 	return 0;
 }
On Mon, 21 Mar 2011, Tixy wrote:
Open Issue
32-bit Thumb breakpoints may straddle two memory words, which means that when we set or clear them there is a window of opportunity where another CPU may see only half of the new instruction and execute invalid code. To prevent this I've used stop_machine() to get all CPUs to modify the instruction and update their I-caches synchronously. To my thinking, something like this would also be needed so that other CPUs see the new instruction; otherwise they could be executing the old one from their local I-cache indefinitely.
The problem with using stop_machine() is that the breakpoint setting code is called from enable_kprobe() which holds the text_mutex and has this comment which says:
since [the breakpoint setting code] doesn't use stop_machine(), this doesn't cause deadlock on text_mutex. So, we don't need get_online_cpus()
Now that I am using stop_machine(), I need to understand what the consequences and alternatives are.
Why not always use a 16-bit Thumb breakpoint instruction, even in place of a 32-bit Thumb instruction? That way you sidestep all the issues with atomically updating an instruction across two words. The instruction to emulate might still be 32-bit, and the pc would therefore be advanced appropriately.
Nicolas
On Mon, 2011-03-21 at 12:01 -0400, Nicolas Pitre wrote:
Why not always use a 16-bit Thumb breakpoint instruction, even in place of a 32-bit Thumb instruction? That way you sidestep all the issues with atomically updating an instruction across two words. The instruction to emulate might still be 32-bit, and the pc would therefore be advanced appropriately.
If the breakpoint is in an IT block and its execution condition is false, then the breakpoint may not cause an exception, and the CPU will go on to try to execute the other half of the instruction.
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 12:01 -0400, Nicolas Pitre wrote:
Why not always use a 16-bit Thumb breakpoint instruction, even in place of a 32-bit Thumb instruction? That way you sidestep all the issues with atomically updating an instruction across two words. The instruction to emulate might still be 32-bit, and the pc would therefore be advanced appropriately.
If the breakpoint is in an IT block and its execution condition is false, then the breakpoint may not cause an exception, and the CPU will go on to try to execute the other half of the instruction.
yeah... bummer.
In this case it might be a worthwhile optimization not to do the heavy dance when the instruction is word aligned.
Nicolas
On Mon, 2011-03-21 at 16:06 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 12:01 -0400, Nicolas Pitre wrote:
Why not always use a 16-bit Thumb breakpoint instruction, even in place of a 32-bit Thumb instruction? That way you sidestep all the issues with atomically updating an instruction across two words. The instruction to emulate might still be 32-bit, and the pc would therefore be advanced appropriately.
If the breakpoint is in an IT block and its execution condition is false, then the breakpoint may not cause an exception, and the CPU will go on to try to execute the other half of the instruction.
yeah... bummer.
In this case it might be a worthwhile optimization not to do the heavy dance when the instruction is word aligned.
Is it worth adding extra code to optimise 50% of the cases? I've been assuming inserting and removing probes is a fairly heavyweight process anyway.
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 16:06 -0400, Nicolas Pitre wrote:
On Mon, 21 Mar 2011, Tixy wrote:
On Mon, 2011-03-21 at 12:01 -0400, Nicolas Pitre wrote:
Why not always use a 16-bit Thumb breakpoint instruction, even in place of a 32-bit Thumb instruction? That way you sidestep all the issues with atomically updating an instruction across two words. The instruction to emulate might still be 32-bit, and the pc would therefore be advanced appropriately.
If the breakpoint is in an IT block and its execution condition is false, then the breakpoint may not cause an exception, and the CPU will go on to try to execute the other half of the instruction.
yeah... bummer.
In this case it might be a worthwhile optimization not to do the heavy dance when the instruction is word aligned.
Is it worth adding extra code to optimise 50% of the cases? I've been assuming inserting and removing probes is a fairly heavyweight process anyway.
You don't need the big hammer for 16-bit insns either.
In the end it depends on how complex the extra code is. If it is like
if (!(addr & 2)) goto skip;
then it might be worth it. Of course this would look better as:
	if (insn_is_wide(insn) && addr & 2) {
		do_the_heavy_locking;
	}
	...
Nicolas
linaro-kernel@lists.linaro.org