The Chromium sandbox apparently wants to deny statx [1] so that it can properly inspect arguments after the sandboxed process later falls back to fstat. Because there is currently no "fd-only" version of statx, the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat calls into fstat, which is allowed and has the same type of return value.
But, as LoongArch is the first architecture to have neither fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there!
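The decision such a handler has to make can be sketched roughly as follows. This is only an illustration under assumptions, not Chromium's actual code: the function name emulate_trapped_fstatat and the choice of -EPERM for refused calls are hypothetical, and a real SIGSYS handler would take the arguments from the trapped syscall's register state instead of receiving them as parameters.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes AT_EMPTY_PATH in <fcntl.h> on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000    /* Linux value, used if the header did not expose it */
#endif

/*
 * Hypothetical stand-in for a sandbox SIGSYS handler that has trapped a
 * newfstatat() call. Returns 0 on success or a negative errno value, in
 * the style of a raw syscall return.
 */
int emulate_trapped_fstatat(int fd, const char *path, struct stat *st, int flags)
{
    assert(st != NULL);

    /* Only the "fd-only" form is accepted: empty path plus AT_EMPTY_PATH. */
    if (path != NULL && path[0] == '\0' && flags == AT_EMPTY_PATH) {
        /* Rewrite the request into fstat(), which the filter allows. */
        return fstat(fd, st) == 0 ? 0 : -errno;
    }

    /* Anything that names a path is refused (error code is an assumption). */
    return -EPERM;
}
```

The point of the sketch is that the check needs to read the path string, which a handler running inside the sandboxed process can do but a plain seccomp filter cannot.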
Actually, back when the LoongArch port was under review, people were aware of the same problem with sandboxing clone3 [3], so clone was eventually kept. Unfortunately it seems that no one noticed statx at that time, so besides restoring fstat/newfstatat to the LoongArch uapi (and postponing the problem further), it seems inevitable that we would need to tackle seccomp deep argument inspection.
However, this is obviously a decision that shouldn't be taken lightly, so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT in unistd.h. This is the simplest solution for now, and we hope the community will tackle the long-standing problem of seccomp deep argument inspection in the future [4][5].
For more information, please read this thread [6].
[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/...
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal....
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-in...
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@br...
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 arch/loongarch/include/uapi/asm/unistd.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/loongarch/include/uapi/asm/unistd.h b/arch/loongarch/include/uapi/asm/unistd.h
index fcb668984f03..b344b1f91715 100644
--- a/arch/loongarch/include/uapi/asm/unistd.h
+++ b/arch/loongarch/include/uapi/asm/unistd.h
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#define __ARCH_WANT_NEW_STAT
 #define __ARCH_WANT_SYS_CLONE
 #define __ARCH_WANT_SYS_CLONE3
On Sat, May 11, 2024, at 12:01, Huacai Chen wrote:
> Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value.
> But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there!
My main objection here is that this is inconsistent with 32-bit architectures: we normally have newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones. While loongarch64 is the first 64-bit one that is missing newfstatat(), we have riscv32 already without fstatat64().
Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary. I would not mind adding a variant of statx() that works for both riscv32 and loongarch64 though, if it gets added to all architectures.
Arnd
Hi, Arnd,
On Sat, May 11, 2024 at 8:17 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Sat, May 11, 2024, at 12:01, Huacai Chen wrote:
> > Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value.
> > But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there!
> My main objection here is that this is inconsistent with 32-bit architectures: we normally have newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones. While loongarch64 is the first 64-bit one that is missing newfstatat(), we have riscv32 already without fstatat64().
Then how do we move forward? Xuerui said that he wants to improve seccomp, but a long time has already passed. And I think we should solve this problem before the Debian loong64 port becomes usable.
> Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary. I would not mind adding a variant of statx() that works for both riscv32 and loongarch64 though, if it gets added to all architectures.
As far as I know, Ren Guo is trying to implement riscv64 kernel + riscv32 userspace, so I think riscv32 kernel won't be widely used?
Huacai
> Arnd
On Sat, May 11, 2024, at 16:28, Huacai Chen wrote:
> On Sat, May 11, 2024 at 8:17 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary. I would not mind adding a variant of statx() that works for both riscv32 and loongarch64 though, if it gets added to all architectures.
> As far as I know, Ren Guo is trying to implement riscv64 kernel + riscv32 userspace, so I think riscv32 kernel won't be widely used?
I was talking about the ABI, so it doesn't actually matter what the kernel is: any userspace ABI without CONFIG_COMPAT_32BIT_TIME is equally affected here. On riscv32 this is the only allowed configuration, while on others (arm32 or x86-32 userland) you can turn off COMPAT_32BIT_TIME on both 32-bit kernels and 64-bit kernels with compat mode.
Arnd
On Sat, May 11, 2024 at 11:39 PM Arnd Bergmann <arnd@arndb.de> wrote:
> On Sat, May 11, 2024, at 16:28, Huacai Chen wrote:
> > On Sat, May 11, 2024 at 8:17 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary. I would not mind adding a variant of statx() that works for both riscv32 and loongarch64 though, if it gets added to all architectures.
> > As far as I know, Ren Guo is trying to implement riscv64 kernel + riscv32 userspace, so I think riscv32 kernel won't be widely used?
> I was talking about the ABI, so it doesn't actually matter what the kernel is: any userspace ABI without CONFIG_COMPAT_32BIT_TIME is equally affected here. On riscv32 this is the only allowed configuration, while on others (arm32 or x86-32 userland) you can turn off COMPAT_32BIT_TIME on both 32-bit kernel and on 64-bit kernels with compat mode.
I don't know the details well, but I think riscv32 can do something similar to arm32 and x86-32, or we can wait for Xuerui to improve seccomp. But there is not much time for loongarch because the Debian loong64 port is coming soon.
Huacai
> Arnd
On Sun, May 12, 2024, at 05:11, Huacai Chen wrote:
> On Sat, May 11, 2024 at 11:39 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > On Sat, May 11, 2024, at 16:28, Huacai Chen wrote:
> > > On Sat, May 11, 2024 at 8:17 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > CONFIG_COMPAT_32BIT_TIME is equally affected here. On riscv32 this is the only allowed configuration, while on others (arm32 or x86-32 userland) you can turn off COMPAT_32BIT_TIME on both 32-bit kernel and on 64-bit kernels with compat mode.
> I don't know too much detail, but I think riscv32 can do something similar to arm32 and x86-32, or we can wait for Xuerui to improve seccomp. But there is no much time for loongarch because the Debian loong64 port is coming soon.
What I meant is that the other architectures only work by accident if COMPAT_32BIT_TIME is enabled and statx() gets blocked, but then they truncate the timestamps to the time32 range, which is not acceptable behavior. Actually mips64 is in the same situation because it also only supports 32-bit timestamps in newfstatat(), despite being a 64-bit architecture with a 64-bit time_t in all other syscalls.
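To illustrate what is lost when a 64-bit timestamp has to pass through a 32-bit field, here is a small sketch. The helper name store_in_time32 is hypothetical, and wrapping modulo 2^32 is only one possible outcome; the sketch does not claim to model how any particular kernel path treats out-of-range values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Keep only the low 32 bits of a 64-bit seconds value and read them back
 * as a signed 32-bit number. The result is stored in *out. Returns 1 if
 * the value survived unchanged and 0 if information was lost.
 */
int store_in_time32(int64_t secs, int32_t *out)
{
    assert(out != NULL);

    /* Conversion to an unsigned type is well-defined: reduce modulo 2^32. */
    uint32_t low = (uint32_t)(uint64_t)secs;

    /* Interpret the 32 bits as two's complement without relying on
     * implementation-defined narrowing conversions. */
    int64_t wrapped = (low <= (uint32_t)INT32_MAX)
                          ? (int64_t)low
                          : (int64_t)low - 4294967296LL;

    *out = (int32_t)wrapped;
    return wrapped == secs;
}
```

With this helper, 2147483647 (the last second representable in a signed 32-bit field, in January 2038) survives unchanged, while 2147483648 comes back as -2147483648, a date in 1901.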
Arnd
On 2024/5/11 8:17 PM, Arnd Bergmann wrote:
> On Sat, May 11, 2024, at 12:01, Huacai Chen wrote:
> > Chromium sandbox apparently wants to deny statx [1] so it could properly inspect arguments after the sandboxed process later falls back to fstat. Because there's currently not a "fd-only" version of statx, so that the sandbox has no way to ensure the path argument is empty without being able to peek into the sandboxed process's memory. For architectures able to do newfstatat though, glibc falls back to newfstatat after getting -ENOSYS for statx, then the respective SIGSYS handler [2] takes care of inspecting the path argument, transforming allowed newfstatat's into fstat instead which is allowed and has the same type of return value.
> > But, as LoongArch is the first architecture to not have fstat nor newfstatat, the LoongArch glibc does not attempt falling back at all when it gets -ENOSYS for statx -- and you see the problem there!
> My main objection here is that this is inconsistent with 32-bit architectures: we normally have newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones. While loongarch64 is the first 64-bit one that is missing newfstatat(), we have riscv32 already without fstatat64().
> Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary.
Yes, I agree. Normally there is newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones.
I do not understand why fstatat64() still cannot be added for riscv32. 32-bit timestamps seem to work well for the present; they are valid until year (0x1UL << 32) / 365 / 24 / 3600 + 1970 == 2106. Year 2106 should be enough for a 32-bit system.
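The estimate can be reproduced with plain integer arithmetic. The sketch below uses a hypothetical helper, approx_last_year, and the same simplification as the formula (365-day years, no leap days). It uses a 64-bit type for the shift, since shifting a 32-bit unsigned long by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate last year representable by a seconds counter that starts
 * in 1970 and has `bits` value bits. Assumes 365-day years, as in the
 * formula quoted in the thread.
 */
unsigned approx_last_year(unsigned bits)
{
    assert(bits < 64);                       /* keep the shift well-defined */

    uint64_t range = (uint64_t)1 << bits;    /* number of distinct seconds */
    return (unsigned)(range / 365 / 24 / 3600) + 1970u;
}
```

Under these assumptions approx_last_year(32) gives 2106 for an unsigned 32-bit counter, while approx_last_year(31) gives 2038, the limit of a signed 32-bit time_t.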
Regards
Bibo Mao
> I would not mind adding a variant of statx() that works for both riscv32 and loongarch64 though, if it gets added to all architectures.
> Arnd
On Wed, May 15, 2024, at 09:30, maobibo wrote:
> On 2024/5/11 8:17 PM, Arnd Bergmann wrote:
> > On Sat, May 11, 2024, at 12:01, Huacai Chen wrote:
> > Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary.
> yes, I agree. Normally there is newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones.
> I do not understand why fstatat64() can be added for riscv32 still. 32bit timestamp seems works well for the present, it is valid until (0x1UL << 32) / 365 / 24 / 3600 + 1970 == 2106 year. Year 2106 should be enough for 32bit system.
There is a very small number of interfaces for which we ended up not using a 64-bit time_t replacement, but those are only for relative times, not epoch based offsets. The main problems here are:
- time_t is defined to be a signed value in POSIX, and we need to handle file timestamps before 1970 in stat(), so changing this one to be unsigned is not an option.
- A lot of products have already shipped that will have to be supported past 2038 on existing 32-bit hardware. We cannot regress on architectures that have already been fixed to support this.
- File timestamps can also be set into the future, so applications relying on this are broken before 2038.
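The trade-off in the points above can be checked with a short sketch. The names time32_kind and fits_time32 are hypothetical and only serve the illustration: neither a signed nor an unsigned 32-bit field covers both a pre-1970 timestamp and one set past 2038.

```c
#include <assert.h>
#include <stdint.h>

enum time32_kind { TIME32_SIGNED, TIME32_UNSIGNED };

/*
 * Return 1 if `secs` (seconds relative to 1970-01-01 UTC) fits in a
 * 32-bit field of the given signedness, 0 otherwise.
 */
int fits_time32(int64_t secs, enum time32_kind kind)
{
    assert(kind == TIME32_SIGNED || kind == TIME32_UNSIGNED);

    if (kind == TIME32_SIGNED)
        return secs >= INT32_MIN && secs <= INT32_MAX;

    /* Unsigned: no way to express anything before the epoch. */
    return secs >= 0 && secs <= (int64_t)UINT32_MAX;
}
```

For example, -86400 (one day before the epoch) fits only the signed variant, and 2208988800 (2040-01-01 UTC) fits only the unsigned one, so a 64-bit field is needed to represent both.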
Arnd
On 2024/5/15 10:25 PM, Arnd Bergmann wrote:
> On Wed, May 15, 2024, at 09:30, maobibo wrote:
> > On 2024/5/11 8:17 PM, Arnd Bergmann wrote:
> > > On Sat, May 11, 2024, at 12:01, Huacai Chen wrote:
> > > Importantly, we can't just add fstatat64() on riscv32 because there is no time64 version for it other than statx(), and I don't want the architectures to diverge more than necessary.
> > yes, I agree. Normally there is newfstatat() on 64-bit architectures but fstatat64() on 32-bit ones.
> > I do not understand why fstatat64() can be added for riscv32 still. 32bit timestamp seems works well for the present, it is valid until (0x1UL << 32) / 365 / 24 / 3600 + 1970 == 2106 year. Year 2106 should be enough for 32bit system.
> There is a very small number of interfaces for which we ended up not using a 64-bit time_t replacement, but those are only for relative times, not epoch based offsets. The main problems here are:
> - time_t is defined to be a signed value in posix, and we need to handle file timestamps before 1970 in stat(), so changing this one to be unsigned is not an option.
> - A lot of products have already shipped that will have to be supported past 2038 on existing 32-bit hardware. We cannot regress on architectures that have already been fixed to support this.
> - file timestamps can also be set into the future, so applications relying on this are broken before 2038.
I see. Thanks for the detailed explanation.
Regards
Bibo Mao
> Arnd