Hi, Willy
On Sun, Aug 13, 2023 at 09:26:20PM +0800, Zhangjin Wu wrote: [...]
With this exception, s390 no long need to provide its own mmap definition, it (seems i386 too, but it uses mmap2 currently) can simply define '__ARCH_WANT_SYS_OLD_MMAP' as the '__ARCH_WANT_SYS_OLD_SELECT' we are using for old_select.
The same method applies to the selection of the different backward version of the sys_clone() syscall (from kernel/fork.c):
(...)
#ifdef __NR_clone #undef sys_clone #define __sys_clone(...) __sysdef(clone, __VA_ARGS__)
static __attribute__((unused)) int sys_clone(unsigned long clone_flags, unsigned long newsp, int __attribute__((unused)) stack_size, int parent_tidptr, int child_tidptr, unsigned long tls) { long ret; #ifdef __ARCH_WANT_SYS_CLONE_BACKWARDS ret = __sys_clone(clone_flags, newsp, parent_tidptr, tls, child_tidptr); #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS2) ret = __sys_clone(newsp, clone_flags, parent_tidptr, child_tidptr, tls); #elif defined(__ARCH_WANT_SYS_CLONE_BACKWARDS3) ret = __sys_clone(clone_flags, newsp, stack_size, parent_tidptr, child_tidptr, tls); #else ret = __sys_clone(clone_flags, newsp, parent_tidptr, child_tidptr, tls); #endif return ret; } #endif /* __NR_clone */
s390 only requires to define '__ARCH_WANT_SYS_CLONE_BACKWARDS2', no need to provide its own sys_fork() version, in the __NR_clone branch of fork(), __ARCH_WANT_SYS_CLONE_BACKWARDS2 can directly select the right version of sys_clone() for s390).
Maybe but with much less #define indirections it would be significantly better.
(...)
We only have these three exceptions currently, with this normalization, the library routines from sys.h can directly think sys_* macros are generic, if not, let syscall.h take care of the right exceptions.
I see the point. But that doesn't remove the need to write the exported function itself. I'm not saying there's nothing to save here, I see your point, I'm just wondering if we really gain something in terms of ease of declaring new syscalls especially for first-time contributors and if we're not losing in maintenance. If at least it's easy enough to implement exceptions, maybe it could be worth further investigating.
I will delay the whole work about __sysdef(), but work on some more generic parts (like the exceptions above) at first.
static __attribute__((unused)) int dup2(int old, int new) {
int ret = sys_dup3(old, new, 0);
if (ret == -ENOSYS) ret = sys_dup2(old, new);
return __sysret(ret); }
But this will add a useless test after all such syscalls, we'd rather not do that!
Indeed, I found this issue too, when __NR_dup3 not defined, it returns -ENOSYS, than, no size issue, otherwise, the compiler will not be able to learn what the ret of sys_dup3() will be, so, it can not optimize the second call to sys_dup2().
So, the '#ifdef' logic must be used like we did in sys_* functions, but it is really not that meaningful (no big gain as you mentioned above) if we only move them from the sys_* functions to the library routines.
At last, I found the ternary operation together with the initialization of the not-defined __NR_* as NOLIBC__NR_NOSYS help this a lot, at last, we get something like this:
/* __systry2() is used to select one of two provided low level syscalls */ #define __systry2(a, sys_a, sys_b) \ ((NOLIBC__NR_##a != NOLIBC__NR_NOSYS) ? (sys_a) : (sys_b))
But this supposes that all of them are manually defined as you did above. I'd rather implement an ugly is_numeric() macro based on argument resolution. I've done it once in another project, I don't remember precisely where it is but I vaguely remember that it used to check that the string resolution of the argument gave a letter (when it does not exist) or a digit (when it does). I can look into that later if needed. But please avoid extra macro definitions as much as possible, they're a real pain to handle in the code. There's no error when one is missing or has a typo, it's difficult to follow them and they don't appear in the debugger.
Yeah, your reply inspired me to look into the IS_ENABLED() from ../include/linux/kconfig.h macro again, there was a __is_defined() there, let's throw away the ugly sysnr.h. I thought of IS_ENABLED() was only for y/n/m before, but it does return 0 when the macro is not defined, it uses the same trick in syscall() to calculate the number of arguments, if the macro is not defined, then, 0 "argument".
It can eliminate all of the '#ifdef' stuffs, using the chmod example you mentioned above, it becomes something like this:
/* * int chmod(const char *path, mode_t mode); */
static __attribute__((unused)) int chmod(const char *path, mode_t mode) { return __sysret(__systry2(chmod, sys_chmod(path, mode), sys_fchmodat(AT_FDCWD, path, mode, 0))); }
Purely clean and clear.
That's a matter of taste and it may explain why we often disagree. For me it's horrid. If I'm the one implementing chmod for my platform and it does not work, what should I do when facing that, except want to cry ? Think that right now we have this:
static __attribute__((unused)) int sys_chmod(const char *path, mode_t mode) { #ifdef __NR_fchmodat return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0); #elif defined(__NR_chmod) return my_syscall2(__NR_chmod, path, mode); #else return -ENOSYS; #endif }
Sure it can be called not pretty, but I think it has the merit of being totally explicit, and whoever sees chmod() fail can quickly check based on the test in what situation they're supposed to be and what to check.
One thing I'm worried about also regarding using my_syscall() without the number is that it's easy to miss an argument and have random values taken from registers (or the stack) passed as argument. For example above we can see that the flags part is 0 in fchmodat(). It's easy to miss themn and while the syscall4() will complain, syscall() will silently turn that into syscall3(). That's not necessarily a big deal, but we've seen during the development that it's easy to make mistakes and they're not trivial to spot. So again I'm really wondering about the benefits in such a case.
This is well illustrated in your example below:
return __sysret(__systry2(newselect, sys_newselect(nfds, rfds, wfds, efds, timeout), sys_pselect6(nfds, rfds, wfds, efds, timeout ? &t : NULL, NULL)));
How many attempts to get it right ? Just skip one NULL and you don't see it.
Yeah, seems we have missed the last 0 in ppoll() before and the test may not report about it either.
[...]
Sure it's not pretty, and I'd rather just go back to SET_ERRNO() to be honest, because we're there just because of the temptation to remove lines that were not causing any difficulties :-/
I think we can do something in-between and deal only with signed returns, and explicitly place the test for MAX_ERRNO on the two unsigned ones (brk and mmap). It should look approximately like this:
#define __sysret(arg) \ ({ \ __typeof__(arg) __sysret_arg = (arg); \ (__sysret_arg < 0) ? ({ /* error ? */ \ SET_ERRNO(-__sysret_arg); /* yes: errno != -ret */ \ ((__typeof__(arg)) -1); /* return -1 */ \ }) : __sysret_arg; /* return original value */ \ })
I like this one very much, a simple test shows, it saves one more byte.
I'm going to do that and revert the 3 affected syscalls.
Ok.
Only a quesiton, why 'errno != -ret' has a '!'? and we have post-tab in above two lines of __sysret() too, I have changed them to whitespaces.
[...]
But let them align with the others may be better, so, most of the sys_* macros can be simply mapped with a simple line (all of them are generated automatically), without the care of the return types changing.
So, Willy, as a summary:
one solution is your new __sysret() + restore the original SET_ERRNO for mmap and brk [1].
another solution is your new __sysret() + my patch [2] to let mmap and brk return 'long' as the other sys_* function does.
No, because it will completely break them when they'll need to access the second half of the memory, as I already explained somewhere else in one of these numerous discussions.
Sorry, will recheck this part later, please ignore it.
[...]
I will resell my proposed patchset above, at least a RFC patchset, please ignore this currently ;-)
Please allow me to breathe a little bit. Really I mean, I'm already worn by having to constantly review breaking changes that either introduce bugs or break maintainability, and to have to justify myself for things that I thought would be obvious to anyone. Massive changes are extremely time consuming to review, and trying to figure subtle breakage in such low-level stuff is even harder. I simply can't assign more time to this, particularly for the expected gains which for me or often perceived as losses of maintainability instead :-/
Take a rest, I will delay the whole __sysdef() stuff and continue to work on the last tinyconfig patchset after v6.5, it is the last one before our early rv32 work ;-)
Thanks, Zhangjin
Thanks, Willy