On Tue, Jul 12, 2011 at 01:10:24PM +0100, David Gilbert wrote:
On 12 July 2011 11:43, Dave Martin dave.martin@linaro.org wrote:
Just for context, I had a quick play to get a feel for the feasibility of implementing this directly, without relying either on a VDSO or on IFUNC.
I originally thought about doing something similar to what you've done with the indirection; but eventually convinced myself that it didn't provide anything that the initial constructor check didn't provide.
It's possible to produce something which works reasonably well: see the attachment. The result is almost the same as what IFUNC would achieve (although &__kernel_cmpxchg64 leaves something to be desired; macros could potentially fix that).
Does it help address rth's concerns though?
Which ones in particular?
The various helpers are checked independently at startup if they are used, by registering the resolver functions as constructors as others have suggested. (Note that gcc/libc already fails to do this check for the older kernel helper functions; it's just that you're unlikely to observe a problem on any reasonably new kernel.)
I don't see any point going back and adding checks for those; the code is already there.
Indeed -- I'm merely observing that this problem is not actually new with cmpxchg64; it's just that it wasn't solved previously.
The problem of constructor ordering is solved by pointing each indirect function pointer at the appropriate resolver function initially. Threading is not taken into account because no additional threads can exist during execution of constructors (though if more libraries are later dlopen()'d they may invoke additional constructors in the presence of threads, so this might need fixing).
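To make the ordering argument concrete, here is a minimal sketch of the "pointer starts at the resolver" idea. It is not the code from the attachment: every name in it (my_cmpxchg64, lazy_cmpxchg64, resolve_cmpxchg64, helper_available, standin_cmpxchg64) is hypothetical, the stand-in helper is deliberately non-atomic so the example runs on any host, and __attribute__((constructor)) assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical signature: returns 0 if *ptr was updated, non-zero otherwise. */
typedef int (*cmpxchg64_fn)(const int64_t *oldval, const int64_t *newval,
                            volatile int64_t *ptr);

/* Stand-in for the real helper so the sketch runs anywhere. NOT atomic. */
static int standin_cmpxchg64(const int64_t *oldval, const int64_t *newval,
                             volatile int64_t *ptr)
{
    if (*ptr != *oldval)
        return 1;
    *ptr = *newval;
    return 0;
}

/* Hypothetical probe; a real one would check what the kernel provides. */
static int helper_available(void)
{
    return 1;
}

static int lazy_cmpxchg64(const int64_t *oldval, const int64_t *newval,
                          volatile int64_t *ptr);

/* Callers go through this pointer. It starts out aimed at the lazy resolver,
 * so a call made from an earlier constructor still resolves correctly. */
cmpxchg64_fn my_cmpxchg64 = lazy_cmpxchg64;

/* Runs as a constructor, or earlier if somebody calls through the pointer. */
__attribute__((constructor))
static void resolve_cmpxchg64(void)
{
    if (!helper_available()) {
        static const char msg[] = "fatal: cmpxchg64 helper not available\n";
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;
        abort();
    }
    my_cmpxchg64 = standin_cmpxchg64;
}

/* First-call path: resolve, then forward the call to the real target. */
static int lazy_cmpxchg64(const int64_t *oldval, const int64_t *newval,
                          volatile int64_t *ptr)
{
    resolve_cmpxchg64();
    /* Guard against forwarding back into ourselves. */
    assert(my_cmpxchg64 != lazy_cmpxchg64);
    return my_cmpxchg64(oldval, newval, ptr);
}
```

Whichever runs first, the constructor or a call through the pointer, the pointer ends up at the same target, so the order in which constructors run stops mattering.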
I'm not sure threads are a problem. You could have two threads going into the check at the same time; with any luck they would both make the same decision, rewrite the pointer to the same value, and either call the function or exit. I guess you could end up with two threads trying to print the error and exit at the same time.
Hmmm, probably right.
The main question is the correct way to blow the program up if a needed helper function is not available. Calling exit() is possibly too gentle, whereas calling abort() may be too aggressive. libc may have some correct internal mechanism for doing this, but my knowledge falls short of that.
I took the write() -> abort() from some other libc code that forced an exit.
What to do in the case of recursive lookup failures isn't obvious: currently I just call abort(). By this point the process should be running atexit handlers or destructors, so calling exit() isn't going to help in this situation anyway.
You could always just send yourself a SIGKILL.
Probably, yes. abort() is almost the same, except that SIGKILL is not catchable.
It's probably up to libc guys to judge what would be correct.
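For comparison, here is a small sketch of the two ways of dying discussed above, write() followed by abort() versus sending yourself SIGKILL. The function names and the message text are made up for illustration; fatal_signal_of() only exists so the difference can be observed from a parent process.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Report with raw write(): no stdio buffers or locks are involved. */
static void report(void)
{
    static const char msg[] = "fatal: required kernel helper is missing\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
}

/* Variant 1: abort() raises SIGABRT, which a program may catch. */
static void die_abort(void)
{
    report();
    abort();
}

/* Variant 2: SIGKILL cannot be caught, blocked or ignored. */
static void die_sigkill(void)
{
    report();
    kill(getpid(), SIGKILL);
    _exit(127); /* not reached in practice */
}

/* Run fn in a child process. Returns the signal that terminated it,
 * 0 if it exited normally, or -1 if fork()/waitpid() failed. */
static int fatal_signal_of(void (*fn)(void))
{
    int status;
    pid_t pid;

    assert(fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A parent calling fatal_signal_of(die_abort) sees SIGABRT, while fatal_signal_of(die_sigkill) sees SIGKILL. Either way the shell or init learns that the process died abnormally; the only difference is whether the program gets a chance to intercept it.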
The other question is whether and how the program can be enabled to catch the error and report something intelligible to the user. If there is a hook for catching dynamic link failures, we should maybe use the same. If this problem is not solved for dynamic linking in general, it doesn't make sense to require the kuser functions to solve it either (except that if the problem is eventually solved, we want the fix to work smoothly for both).
Isn't this a bit over-the-top for the failure we're trying to catch?
Yes and no. Implementing such a solution just for this case would be very over-the-top. But integrating with a general mechanism for reporting errors if it exists (now or in the future) is very sensible.
Of course, all of these are generic dynamic linking challenges which ld.so likely has some policy or solution for already.
Overall, this isn't a perfect solution, but it doesn't feel catastrophically bad either.
Any thoughts?
We seem to be growing solutions rather than figuring out which ones satisfy the requirements of both gcc and kernel:
- my original simple solution (about 5 lines of code)
- IFUNC
- Your more general indirected code
- A VDSO
I was just exploring the problem space; I appreciate that's not the same thing as choosing a solution.
To be honest I don't see the point in the more general indirected approach; if we want to be more general then I think we should use IFUNC (it would be the 1st use of it, which means we may have to fix some issues but hey that's what we're here for).
Does IFUNC rely on support from libc in order to get resolved correctly? My concern is that if so, binaries using it will silently misbehave on older libcs. This seems to be the case at present, though I haven't figured out whether this is caused by libc or the toolchain. It may be that I'm worrying for nothing -- I don't know exactly how IFUNC works.
There is some stuff that is a bit of a shame that it wasn't more general already; e.g. those slots in the commpage for helper functions - if they were filled with a known marker, you could just check the marker rather than having a separate version, or a faulting instruction or the like.
If it's necessary to do any check at all before concluding that it's safe to call a function, I don't see the difference. Checking a magic number in one place seems no better than checking for a version number in another place, AFAICT.
Having a faulting instruction doesn't permit the program to abort cleanly ahead of time (unless you check in advance -- in which case you're really back to a magic number again).
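As a rough illustration of "checking in advance", the sketch below tests a helper version number before the helper is ever called. The address 0xffff0ffc for __kuser_helper_version and the minimum version of 5 for the cmpxchg64 helper are taken from the kernel's ARM user helper documentation as I understand it, not from this thread, so treat both as assumptions; the function names are hypothetical. The read is only compiled on 32-bit ARM Linux, and it would fault on a kernel that does not map the helper page at all.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum helper version that provides the cmpxchg64 helper. */
#define CMPXCHG64_MIN_HELPER_VERSION 5

/* Read the version word from the kernel helper page on 32-bit ARM Linux.
 * On any other target, report version 0, meaning "no helpers". */
static int32_t read_helper_version(void)
{
#if defined(__arm__) && defined(__linux__)
    return *(const volatile int32_t *)0xffff0ffc; /* assumed address */
#else
    return 0;
#endif
}

/* Pure comparison, kept separate so it can be exercised on any host. */
static int helper_supported(int32_t have, int32_t need)
{
    assert(need > 0);
    return have >= need;
}

/* The ahead-of-time check a constructor or resolver would make. */
int cmpxchg64_supported(void)
{
    return helper_supported(read_helper_version(),
                            CMPXCHG64_MIN_HELPER_VERSION);
}
```

A marker word stored in the slot itself would be tested in exactly the same place and in the same way, which is why the two schemes look equivalent from the caller's side.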
This matters if constructors might do things like screwing up the terminal -- if so, the chance to un-screw things up should be offered to the program before it exits.
Possibly constructors should never do things like that, in which case these concerns don't apply. I don't know what the official line is regarding what kind of things global constructors should and shouldn't do.
Also, remember this whole discussion is just to print a message and exit nicely in the case of someone using a currently incredibly rare function on an old kernel!
I'd say we want to notify the operating environment and/or the user. This may be realised by writing some text to stderr, but that's not useful in the context of GUI environments.
I am not suggesting that we should engineer a special solution to that problem here, so long as we don't defeat potential solutions either. For now, it doesn't look like any of the proposals risk that, though.
Cheers ---Dave