Hi all,
Apologies if this is the wrong list, and for the somewhat vague description of my problem.
I've been working on porting Go (via gccgo) to aarch64 and things have mostly been going well. However, under some circumstances, I'm seeing crashes. What's happening is that when a signal -- SIGCHLD in this case -- is being handled, instead of being executed on the stack passed to sigaltstack, the signal is being handled on some *other* thread's stack, which unsurprisingly ends badly when a signal context object is smashed over whatever the original thread had put there.
By setting breakpoints on the signal handler in gdb and printing $sp, I can actually see that signals are never being executed on the altstack, but it takes a random number of signals before one is executed somewhere that causes a crash. So I don't know if signals are always being handled on other threads' stacks or if it's just at random-ish locations in the heap. (Goroutines run with stacks allocated in the heap.)
Writing a very simple program that calls sigaltstack does behave as expected, but the Go runtime is doing all sorts of things with multiple threads and getcontext/makecontext/setcontext, so I guess something is getting confused.
There are some more details on this bug: https://bugs.launchpad.net/ubuntu/+source/gcc-4.8/+bug/1279620 but I don't have anything like a minimal example unfortunately. I'll try to come up with one tomorrow, but in the mean time: does this ring any bells at all with anyone? I couldn't see any obvious reasons for this behaviour in the kernel code :/
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
> Hi all,
> Apologies if this is the wrong list, and for the somewhat vague description of my problem.
> I've been working on porting Go (via gccgo) to aarch64 and things have mostly been going well. However, under some circumstances, I'm seeing crashes. What's happening is that when a signal -- SIGCHLD in this case -- is being handled, instead of being executed on the stack passed to sigaltstack, the signal is being handled on some *other* thread's stack, which unsurprisingly ends badly when a signal context object is smashed over whatever the original thread had put there.
> [...]
> There are some more details on this bug: https://bugs.launchpad.net/ubuntu/+source/gcc-4.8/+bug/1279620 but I don't have anything like a minimal example unfortunately. I'll try to come up with one tomorrow, but in the mean time: does this ring any bells at all with anyone? I couldn't see any obvious reasons for this behaviour in the kernel code :/
It's not minimal, but I have a way that someone other than me can (hopefully) reproduce the bug.
1) Get yourself an Ubuntu Trusty image running on aarch64
2) Install libgo4-dbg and gdb and their dependencies
3) Turn off transparent huge pages:
   * echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
4) Download these files and chuck them in the same directory:
   * http://people.linaro.org/~mwhudson/sas-test
   * http://people.linaro.org/~mwhudson/pkgs.json
5) Chuck this in .gdbinit:
set height 0
define hook-quit
  set confirm off
end
set breakpoint pending on
br __go_panic
set breakpoint pending on
br runtime_signalstack
commands
  printf "XXX sigaltstack requesting stack: 0x%lx - 0x%lx\n", p, p+n
  c
end
br sig_tramp_info
commands
  printf "XXX signal: $sp: 0x%010lx", $sp
  c
end
6) Run "gdb --args ./sas-test install | tee gdb.out" and type run
7) grep for XXX in gdb.out and see if the stacks that the signals are being handled on line up with what is being passed to sigaltstack.
When I do this I see:
XXX sigaltstack requesting stack: 0x8506000 - 0x850e000
XXX sigaltstack requesting stack: 0x8b2c000 - 0x8b34000
XXX sigaltstack requesting stack: 0x9180000 - 0x9188000
XXX sigaltstack requesting stack: 0x9192000 - 0x919a000
XXX sigaltstack requesting stack: 0x919f000 - 0x91a7000
XXX signal: $sp: 0x0009604240[Switching to Thread 0x7fa6c36270 (LWP 1034)]
XXX signal: $sp: 0x0009804710
XXX signal: $sp: 0x0009a040a0[Switching to Thread 0x7fb7ff3000 (LWP 1028)]
XXX signal: $sp: 0x00098042b0[Switching to Thread 0x7fa6c36270 (LWP 1034)]
XXX signal: $sp: 0x0009a04780
XXX signal: $sp: 0x0009604780
XXX signal: $sp: 0x0009c09710
XXX signal: $sp: 0x0009a04710
XXX signal: $sp: 0x0009604710
XXX signal: $sp: 0x0009c09780
XXX signal: $sp: 0x0009804710
XXX signal: $sp: 0x0009a04780
XXX signal: $sp: 0x0009604780
XXX signal: $sp: 0x0009c09710
XXX signal: $sp: 0x0009804780
...
Those signal $sp's don't appear to bear any relation to the addresses passed to sigaltstack...
Cheers, mwh
Michael Hudson-Doyle michael.hudson@linaro.org writes:
> Michael Hudson-Doyle michael.hudson@linaro.org writes:
> > Hi all,
> > Apologies if this is the wrong list, and for the somewhat vague description of my problem.
> > I've been working on porting Go (via gccgo) to aarch64 and things have mostly been going well. However, under some circumstances, I'm seeing crashes. What's happening is that when a signal -- SIGCHLD in this case -- is being handled, instead of being executed on the stack passed to sigaltstack, the signal is being handled on some *other* thread's stack, which unsurprisingly ends badly when a signal context object is smashed over whatever the original thread had put there.
I finally chased this down to (what at least I think is) a glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16629
Cheers, mwh
On 24 February 2014 03:15, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
> Michael Hudson-Doyle michael.hudson@linaro.org writes:
> > Michael Hudson-Doyle michael.hudson@linaro.org writes:
> > > Hi all,
> > > Apologies if this is the wrong list, and for the somewhat vague description of my problem.
> > > I've been working on porting Go (via gccgo) to aarch64 and things have mostly been going well. However, under some circumstances, I'm seeing crashes. What's happening is that when a signal -- SIGCHLD in this case -- is being handled, instead of being executed on the stack passed to sigaltstack, the signal is being handled on some *other* thread's stack, which unsurprisingly ends badly when a signal context object is smashed over whatever the original thread had put there.
> I finally chased this down to (what at least I think is) a glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16629
Thanks for another great debugging job! ;-)
I'll take a look at this as soon as I'm "back in the office" unless someone gets there first.
Will Newton will.newton@linaro.org writes:
> On 24 February 2014 03:15, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
> > Michael Hudson-Doyle michael.hudson@linaro.org writes:
> > > Michael Hudson-Doyle michael.hudson@linaro.org writes:
> > > > Hi all,
> > > > Apologies if this is the wrong list, and for the somewhat vague description of my problem.
> > > > I've been working on porting Go (via gccgo) to aarch64 and things have mostly been going well. However, under some circumstances, I'm seeing crashes. What's happening is that when a signal -- SIGCHLD in this case -- is being handled, instead of being executed on the stack passed to sigaltstack, the signal is being handled on some *other* thread's stack, which unsurprisingly ends badly when a signal context object is smashed over whatever the original thread had put there.
> > I finally chased this down to (what at least I think is) a glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16629
> Thanks for another great debugging job! ;-)
Heh, I find it hard to give up... I should probably get help about that!
> I'll take a look at this as soon as I'm "back in the office" unless someone gets there first.
Cool. I'm certainly happy to test any fixes, and could even try to work on one myself with a few hints...
Cheers, mwh