For a while now we have supported file handles for pidfds. This has proven to be very useful.
Extend the concept to cover namespaces as well. After this patchset it is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() APIs.
Namespace file descriptors can already be derived from pidfds, which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory, as they could always just derive the namespace from a pidfd.
This has the same advantages as pidfds: a namespace can be referred to reliably for the lifetime of the system without pinning any resources, and namespaces can be compared.
Permission checking is kept simple. If the caller is located in the namespace the file handle refers to, they are able to open it; otherwise they must hold privilege over the owning namespace of the relevant namespace.
Both the network namespace and the mount namespace already have an associated cookie that isn't recycled and is fully exposed to userspace. Move this into ns_common and use the same id space for all namespaces so they can trivially and reliably be compared.
There's more coming based on the iterator infrastructure, but the series is already large enough, so it focuses on file handles.
Extensive selftests included. I still have various other test-suites to run but it holds up so far.
Signed-off-by: Christian Brauner brauner@kernel.org
---
Christian Brauner (32):
      pidfs: validate extensible ioctls
      nsfs: validate extensible ioctls
      block: use extensible_ioctl_valid()
      ns: move to_ns_common() to ns_common.h
      nsfs: add nsfs.h header
      ns: uniformly initialize ns_common
      mnt: use ns_common_init()
      ipc: use ns_common_init()
      cgroup: use ns_common_init()
      pid: use ns_common_init()
      time: use ns_common_init()
      uts: use ns_common_init()
      user: use ns_common_init()
      net: use ns_common_init()
      ns: remove ns_alloc_inum()
      nstree: make iterator generic
      mnt: support iterator
      cgroup: support iterator
      ipc: support iterator
      net: support iterator
      pid: support iterator
      time: support iterator
      userns: support iterator
      uts: support iterator
      ns: add to_<type>_ns() to respective headers
      nsfs: add current_in_namespace()
      nsfs: support file handles
      nsfs: support exhaustive file handles
      nsfs: add missing id retrieval support
      tools: update nsfs.h uapi header
      selftests/namespaces: add identifier selftests
      selftests/namespaces: add file handle selftests

 block/blk-integrity.c | 8 +-
 fs/fhandle.c | 6 +
 fs/internal.h | 1 +
 fs/mount.h | 10 +-
 fs/namespace.c | 156 +--
 fs/nsfs.c | 266 +++-
 fs/pidfs.c | 2 +-
 include/linux/cgroup.h | 5 +
 include/linux/exportfs.h | 6 +
 include/linux/fs.h | 14 +
 include/linux/ipc_namespace.h | 5 +
 include/linux/ns_common.h | 29 +
 include/linux/nsfs.h | 40 +
 include/linux/nsproxy.h | 11 -
 include/linux/nstree.h | 89 ++
 include/linux/pid_namespace.h | 5 +
 include/linux/proc_ns.h | 32 +-
 include/linux/time_namespace.h | 9 +
 include/linux/user_namespace.h | 5 +
 include/linux/utsname.h | 5 +
 include/net/net_namespace.h | 6 +
 include/uapi/linux/fcntl.h | 1 +
 include/uapi/linux/nsfs.h | 12 +-
 init/main.c | 2 +
 ipc/msgutil.c | 1 +
 ipc/namespace.c | 12 +-
 ipc/shm.c | 2 +
 kernel/Makefile | 2 +-
 kernel/cgroup/cgroup.c | 2 +
 kernel/cgroup/namespace.c | 24 +-
 kernel/nstree.c | 233 ++++
 kernel/pid_namespace.c | 13 +-
 kernel/time/namespace.c | 23 +-
 kernel/user_namespace.c | 17 +-
 kernel/utsname.c | 28 +-
 net/core/net_namespace.c | 59 +-
 tools/include/uapi/linux/nsfs.h | 23 +-
 tools/testing/selftests/namespaces/.gitignore | 2 +
 tools/testing/selftests/namespaces/Makefile | 7 +
 tools/testing/selftests/namespaces/config | 7 +
 .../selftests/namespaces/file_handle_test.c | 1410 ++++++++++++++++++++
 tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
 42 files changed, 3306 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4
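For anyone who wants to poke at this from userspace, here is a rough sketch of encoding and comparing namespace file handles. It only relies on the existing name_to_handle_at() interface. The helper names ns_encode_handle() and ns_handle_equal() are made up for illustration, and treating byte-wise identical handles as "same namespace" is an assumption drawn from the description above rather than something the series spells out. On kernels without this series the encode step is expected to fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: encode a file handle for a namespace file such as
 * "/proc/self/ns/net". Returns a malloc()ed handle the caller must free(),
 * or NULL with errno set. Kernels without nsfs file handle support are
 * expected to fail the name_to_handle_at() call.
 */
static struct file_handle *ns_encode_handle(const char *ns_path)
{
	struct file_handle *fh;
	int mount_id, fd, ret, saved;

	fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return NULL;

	fh = calloc(1, sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		close(fd);
		errno = ENOMEM;
		return NULL;
	}
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* Encode the handle of the already opened namespace fd itself. */
	ret = name_to_handle_at(fd, "", fh, &mount_id, AT_EMPTY_PATH);
	saved = errno;
	close(fd);
	if (ret < 0) {
		free(fh);
		errno = saved;
		return NULL;
	}
	return fh;
}

/*
 * Hypothetical helper: compare two handles by type and payload.
 * Assumption: equal handles refer to the same namespace.
 */
static int ns_handle_equal(const struct file_handle *a,
			   const struct file_handle *b)
{
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

A caller would pass something like "/proc/self/ns/net" and could later hand the stored handle to open_by_handle_at() to get a namespace fd back. Which mount fd to pass for nsfs handles is left out here on purpose.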
Validate extensible ioctls stricter than we do now.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 fs/pidfs.c | 2 +-
 include/linux/fs.h | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index edc35522d75c..0a5083b9cce5 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -440,7 +440,7 @@ static bool pidfs_ioctl_valid(unsigned int cmd)
 		 * erronously mistook the file descriptor for a pidfd.
 		 * This is not perfect but will catch most cases.
 		 */
-		return (_IOC_TYPE(cmd) == _IOC_TYPE(PIDFD_GET_INFO));
+		return extensible_ioctl_valid(cmd, PIDFD_GET_INFO, PIDFD_INFO_SIZE_VER0);
 	}
 
 	return false;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..2f2edc53bf3c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -4023,4 +4023,18 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
 
 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
 
+static inline bool extensible_ioctl_valid(unsigned int cmd_a,
+					  unsigned int cmd_b, size_t min_size)
+{
+	if (_IOC_DIR(cmd_a) != _IOC_DIR(cmd_b))
+		return false;
+	if (_IOC_TYPE(cmd_a) != _IOC_TYPE(cmd_b))
+		return false;
+	if (_IOC_NR(cmd_a) != _IOC_NR(cmd_b))
+		return false;
+	if (_IOC_SIZE(cmd_a) < min_size)
+		return false;
+	return true;
+}
+
 #endif /* _LINUX_FS_H */
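To illustrate what the stricter check accepts and rejects, the helper can be copied into a small userspace file. The DEMO_* ioctl numbers and the demo_info structs below are invented for this example and do not correspond to any real ioctl; only the helper body is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical extensible struct: v0 has one field, v1 appends another. */
struct demo_info_v0 {
	uint64_t mask;
};

struct demo_info_v1 {
	uint64_t mask;
	uint64_t extra;
};

#define DEMO_INFO_SIZE_VER0	sizeof(struct demo_info_v0)

/* Hypothetical ioctl numbers, made up for this example. */
#define DEMO_GET_INFO		_IOWR(0xFF, 11, struct demo_info_v0)
#define DEMO_GET_INFO_V1	_IOWR(0xFF, 11, struct demo_info_v1)
#define DEMO_OTHER		_IOWR(0xFF, 12, struct demo_info_v0)

/* Userspace copy of the helper added by the patch. */
static inline bool extensible_ioctl_valid(unsigned int cmd_a,
					  unsigned int cmd_b, size_t min_size)
{
	/* Direction, type and number must match the reference command. */
	if (_IOC_DIR(cmd_a) != _IOC_DIR(cmd_b))
		return false;
	if (_IOC_TYPE(cmd_a) != _IOC_TYPE(cmd_b))
		return false;
	if (_IOC_NR(cmd_a) != _IOC_NR(cmd_b))
		return false;
	/* The size may grow over time but never drop below version 0. */
	if (_IOC_SIZE(cmd_a) < min_size)
		return false;
	return true;
}
```

The point of comparing everything except the size exactly is that a caller built against a newer, larger struct still passes, while a command that merely shares the type byte no longer does.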
On Wed 10-09-25 16:36:46, Christian Brauner wrote:
Validate extensible ioctls stricter than we do now.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
Validate extensible ioctls stricter than we do now.
Signed-off-by: Christian Brauner brauner@kernel.org
nit: I know only we use them for now, but does this maybe belong in ioctl.h (or even uaccess.h with the other extensible struct stuff)?
Otherwise,
Reviewed-by: Aleksa Sarai cyphar@cyphar.com
Validate extensible ioctls stricter than we do now.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 fs/nsfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 59aa801347a7..34f0b35d3ead 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -169,9 +169,11 @@ static bool nsfs_ioctl_valid(unsigned int cmd)
 	/* Extensible ioctls require some extra handling. */
 	switch (_IOC_NR(cmd)) {
 	case _IOC_NR(NS_MNT_GET_INFO):
+		return extensible_ioctl_valid(cmd, NS_MNT_GET_INFO, MNT_NS_INFO_SIZE_VER0);
 	case _IOC_NR(NS_MNT_GET_NEXT):
+		return extensible_ioctl_valid(cmd, NS_MNT_GET_NEXT, MNT_NS_INFO_SIZE_VER0);
 	case _IOC_NR(NS_MNT_GET_PREV):
-		return (_IOC_TYPE(cmd) == _IOC_TYPE(cmd));
+		return extensible_ioctl_valid(cmd, NS_MNT_GET_PREV, MNT_NS_INFO_SIZE_VER0);
 	}
 
 	return false;
On Wed 10-09-25 16:36:47, Christian Brauner wrote:
Validate extensible ioctls stricter than we do now.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Use the new extensible_ioctl_valid() helper which is equivalent to what is done here.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 block/blk-integrity.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 056b8948369d..609d75d6a39b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -58,16 +58,14 @@ int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
 int blk_get_meta_cap(struct block_device *bdev, unsigned int cmd,
 		     struct logical_block_metadata_cap __user *argp)
 {
-	struct blk_integrity *bi = blk_get_integrity(bdev->bd_disk);
+	struct blk_integrity *bi;
 	struct logical_block_metadata_cap meta_cap = {};
 	size_t usize = _IOC_SIZE(cmd);
 
-	if (_IOC_DIR(cmd) != _IOC_DIR(FS_IOC_GETLBMD_CAP) ||
-	    _IOC_TYPE(cmd) != _IOC_TYPE(FS_IOC_GETLBMD_CAP) ||
-	    _IOC_NR(cmd) != _IOC_NR(FS_IOC_GETLBMD_CAP) ||
-	    _IOC_SIZE(cmd) < LBMD_SIZE_VER0)
+	if (!extensible_ioctl_valid(cmd, FS_IOC_GETLBMD_CAP, LBMD_SIZE_VER0))
 		return -ENOIOCTLCMD;
 
+	bi = blk_get_integrity(bdev->bd_disk);
 	if (!bi)
 		goto out;
 
On Wed 10-09-25 16:36:48, Christian Brauner wrote:
Use the new extensible_ioctl_valid() helper which is equivalent to what is done here.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Reviewed-by: Jens Axboe axboe@kernel.dk
Move the helper to ns_common.h where it belongs.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 include/linux/ns_common.h | 20 ++++++++++++++++++++
 include/linux/nsproxy.h | 11 -----------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 7d22ea50b098..bc2e0758e1c9 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -6,6 +6,15 @@
 
 struct proc_ns_operations;
 
+struct cgroup_namespace;
+struct ipc_namespace;
+struct mnt_namespace;
+struct net;
+struct pid_namespace;
+struct time_namespace;
+struct user_namespace;
+struct uts_namespace;
+
 struct ns_common {
 	struct dentry *stashed;
 	const struct proc_ns_operations *ops;
@@ -13,4 +22,15 @@ struct ns_common {
 	refcount_t count;
 };
 
+#define to_ns_common(__ns) \
+	_Generic((__ns), \
+		struct cgroup_namespace *: &(__ns)->ns, \
+		struct ipc_namespace *: &(__ns)->ns, \
+		struct mnt_namespace *: &(__ns)->ns, \
+		struct net *: &(__ns)->ns, \
+		struct pid_namespace *: &(__ns)->ns, \
+		struct time_namespace *: &(__ns)->ns, \
+		struct user_namespace *: &(__ns)->ns, \
+		struct uts_namespace *: &(__ns)->ns)
+
 #endif
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index dab6a1734a22..e6bec522b139 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -42,17 +42,6 @@ struct nsproxy {
 };
 extern struct nsproxy init_nsproxy;
 
-#define to_ns_common(__ns) \
-	_Generic((__ns), \
-		struct cgroup_namespace *: &(__ns->ns), \
-		struct ipc_namespace *: &(__ns->ns), \
-		struct net *: &(__ns->ns), \
-		struct pid_namespace *: &(__ns->ns), \
-		struct mnt_namespace *: &(__ns->ns), \
-		struct time_namespace *: &(__ns->ns), \
-		struct user_namespace *: &(__ns->ns), \
-		struct uts_namespace *: &(__ns->ns))
-
 /*
  * A structure to encompass all bits needed to install
  * a partial or complete new set of namespaces.
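As a small illustration of how the _Generic based to_ns_common() dispatch behaves, here is a userspace sketch. The demo_net_ns and demo_uts_ns types and the trimmed-down struct ns_common are stand-ins invented for the example; only the shape of the macro mirrors the patch.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct ns_common. */
struct ns_common {
	unsigned int inum;
};

/* Hypothetical namespace types, each embedding ns_common as member 'ns'. */
struct demo_net_ns {
	int ifcount;
	struct ns_common ns;
};

struct demo_uts_ns {
	char name[16];
	struct ns_common ns;
};

/*
 * Same shape as the macro in the patch: only the listed pointer types are
 * accepted, anything else fails to compile. Note the argument is
 * parenthesized before '->', as in the moved version of the macro.
 */
#define to_ns_common(__ns)					\
	_Generic((__ns),					\
		struct demo_net_ns *: &(__ns)->ns,		\
		struct demo_uts_ns *: &(__ns)->ns)
```

Besides moving the macro, the patch also changes `&(__ns->ns)` to `&(__ns)->ns`, which presumably is meant to keep the expansion correct when the argument is an expression rather than a plain identifier.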
On Wed 10-09-25 16:36:49, Christian Brauner wrote:
Move the helper to ns_common.h where it belongs.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
And move the stuff out from proc_ns.h where it really doesn't belong.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 include/linux/nsfs.h | 26 ++++++++++++++++++++++++++
 include/linux/proc_ns.h | 13 +------------
 2 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/linux/nsfs.h b/include/linux/nsfs.h
new file mode 100644
index 000000000000..fb84aa538091
--- /dev/null
+++ b/include/linux/nsfs.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2025 Christian Brauner brauner@kernel.org */
+
+#ifndef _LINUX_NSFS_H
+#define _LINUX_NSFS_H
+
+#include <linux/ns_common.h>
+
+struct path;
+struct task_struct;
+struct proc_ns_operations;
+
+int ns_get_path(struct path *path, struct task_struct *task,
+		const struct proc_ns_operations *ns_ops);
+typedef struct ns_common *ns_get_path_helper_t(void *);
+int ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
+		   void *private_data);
+
+bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);
+
+int ns_get_name(char *buf, size_t size, struct task_struct *task,
+		const struct proc_ns_operations *ns_ops);
+void nsfs_init(void);
+
+#endif /* _LINUX_NSFS_H */
+
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 4b20375f3783..5e1a4b378b79 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -5,7 +5,7 @@
 #ifndef _LINUX_PROC_NS_H
 #define _LINUX_PROC_NS_H
 
-#include <linux/ns_common.h>
+#include <linux/nsfs.h>
 #include <uapi/linux/nsfs.h>
 
 struct pid_namespace;
@@ -75,16 +75,5 @@ static inline int ns_alloc_inum(struct ns_common *ns)
 #define ns_free_inum(ns) proc_free_inum((ns)->inum)
 
 #define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
-extern int ns_get_path(struct path *path, struct task_struct *task,
-			const struct proc_ns_operations *ns_ops);
-typedef struct ns_common *ns_get_path_helper_t(void *);
-extern int ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb,
-			  void *private_data);
-
-extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);
-
-extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
-			const struct proc_ns_operations *ns_ops);
-extern void nsfs_init(void);
 
 #endif /* _LINUX_PROC_NS_H */
On Wed 10-09-25 16:36:50, Christian Brauner wrote:
And move the stuff out from proc_ns.h where it really doesn't belong.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks sensible. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
No point in cargo-culting the same code across all the different types. Use one common initializer.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 include/linux/proc_ns.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 5e1a4b378b79..dbb119bda097 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -72,6 +72,22 @@ static inline int ns_alloc_inum(struct ns_common *ns)
 	return proc_alloc_inum(&ns->inum);
 }
 
+static inline int ns_common_init(struct ns_common *ns,
+				 const struct proc_ns_operations *ops,
+				 bool alloc_inum)
+{
+	if (alloc_inum) {
+		int ret;
+		ret = proc_alloc_inum(&ns->inum);
+		if (ret)
+			return ret;
+	}
+	refcount_set(&ns->count, 1);
+	ns->stashed = NULL;
+	ns->ops = ops;
+	return 0;
+}
+
 #define ns_free_inum(ns) proc_free_inum((ns)->inum)
 
 #define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
On Wed 10-09-25 16:36:51, Christian Brauner wrote:
No point in cargo-culting the same code across all the different types. Use one common initializer.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ddfd4457d338..14c5cdbdd6e1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4177,18 +4177,15 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
 		dec_mnt_namespaces(ucounts);
 		return ERR_PTR(-ENOMEM);
 	}
-	if (!anon) {
-		ret = ns_alloc_inum(&new_ns->ns);
-		if (ret) {
-			kfree(new_ns);
-			dec_mnt_namespaces(ucounts);
-			return ERR_PTR(ret);
-		}
+
+	ret = ns_common_init(&new_ns->ns, &mntns_operations, !anon);
+	if (ret) {
+		kfree(new_ns);
+		dec_mnt_namespaces(ucounts);
+		return ERR_PTR(ret);
 	}
-	new_ns->ns.ops = &mntns_operations;
 	if (!anon)
 		new_ns->seq = atomic64_inc_return(&mnt_ns_seq);
-	refcount_set(&new_ns->ns.count, 1);
 	refcount_set(&new_ns->passive, 1);
 	new_ns->mounts = RB_ROOT;
 	INIT_LIST_HEAD(&new_ns->mnt_ns_list);
On Wed 10-09-25 16:36:52, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 ipc/namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/ipc/namespace.c b/ipc/namespace.c
index 4df91ceeeafe..d4188a88ee57 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -61,12 +61,10 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	if (ns == NULL)
 		goto fail_dec;
 
-	err = ns_alloc_inum(&ns->ns);
+	err = ns_common_init(&ns->ns, &ipcns_operations, true);
 	if (err)
 		goto fail_free;
-	ns->ns.ops = &ipcns_operations;
 
-	refcount_set(&ns->ns.count, 1);
 	ns->user_ns = get_user_ns(user_ns);
 	ns->ucounts = ucounts;
 
On Wed 10-09-25 16:36:53, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 kernel/cgroup/namespace.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup/namespace.c b/kernel/cgroup/namespace.c
index 144a464e45c6..0391b6ab0bf1 100644
--- a/kernel/cgroup/namespace.c
+++ b/kernel/cgroup/namespace.c
@@ -21,20 +21,16 @@ static void dec_cgroup_namespaces(struct ucounts *ucounts)
 
 static struct cgroup_namespace *alloc_cgroup_ns(void)
 {
-	struct cgroup_namespace *new_ns;
+	struct cgroup_namespace *new_ns __free(kfree) = NULL;
 	int ret;
 
 	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL_ACCOUNT);
 	if (!new_ns)
 		return ERR_PTR(-ENOMEM);
-	ret = ns_alloc_inum(&new_ns->ns);
-	if (ret) {
-		kfree(new_ns);
+	ret = ns_common_init(&new_ns->ns, &cgroupns_operations, true);
+	if (ret)
 		return ERR_PTR(ret);
-	}
-	refcount_set(&new_ns->ns.count, 1);
-	new_ns->ns.ops = &cgroupns_operations;
-	return new_ns;
+	return no_free_ptr(new_ns);
 }
 
 void free_cgroup_ns(struct cgroup_namespace *ns)
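The cgroup conversion also switches to scope-based cleanup via __free(kfree) and no_free_ptr(). For readers unfamiliar with that pattern, the following userspace approximation may help. It relies on the GCC/Clang cleanup attribute and statement expressions; DEMO_FREE, demo_no_free_ptr(), struct demo_ns and the demo_frees counter are all invented for the example and are not the kernel's definitions.

```c
#include <stdlib.h>

/* Hypothetical object standing in for a namespace allocation. */
struct demo_ns {
	int id;
};

/* Counts how often the automatic cleanup actually freed something. */
static int demo_frees;

/* Runs when a DEMO_FREE variable goes out of scope. */
static void demo_ns_cleanup(struct demo_ns **nsp)
{
	if (*nsp) {
		free(*nsp);
		demo_frees++;
	}
}

/* Rough userspace analogues of __free(kfree) and no_free_ptr(). */
#define DEMO_FREE	__attribute__((cleanup(demo_ns_cleanup)))
#define demo_no_free_ptr(p) \
	({ __typeof__(p) tmp_ = (p); (p) = NULL; tmp_; })

/* init_error simulates a failing initialization step after allocation. */
static struct demo_ns *demo_alloc_ns(int init_error)
{
	struct demo_ns *ns DEMO_FREE = NULL;

	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return NULL;
	if (init_error)
		return NULL;		/* ns is freed automatically here */
	return demo_no_free_ptr(ns);	/* ownership moves to the caller */
}
```

With this shape the error path needs no explicit free, which matches why the explicit kfree() call disappears from alloc_cgroup_ns() in the diff above.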
On Wed 10-09-25 16:36:54, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 kernel/pid_namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 7098ed44e717..20ce4052d1c5 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -102,17 +102,15 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 	if (ns->pid_cachep == NULL)
 		goto out_free_idr;
 
-	err = ns_alloc_inum(&ns->ns);
+	err = ns_common_init(&ns->ns, &pidns_operations, true);
 	if (err)
 		goto out_free_idr;
-	ns->ns.ops = &pidns_operations;
 
 	ns->pid_max = PID_MAX_LIMIT;
 	err = register_pidns_sysctls(ns);
 	if (err)
 		goto out_free_inum;
 
-	refcount_set(&ns->ns.count, 1);
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
 	ns->user_ns = get_user_ns(user_ns);
On Wed 10-09-25 16:36:55, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 kernel/time/namespace.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 667452768ed3..80b3d2ce2fb6 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -92,18 +92,15 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 	if (!ns)
 		goto fail_dec;
 
-	refcount_set(&ns->ns.count, 1);
-
 	ns->vvar_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!ns->vvar_page)
 		goto fail_free;
 
-	err = ns_alloc_inum(&ns->ns);
+	err = ns_common_init(&ns->ns, &timens_operations, true);
 	if (err)
 		goto fail_free_page;
 
 	ns->ucounts = ucounts;
-	ns->ns.ops = &timens_operations;
 	ns->user_ns = get_user_ns(user_ns);
 	ns->offsets = old_ns->offsets;
 	ns->frozen_offsets = false;
On Wed, Sep 10 2025 at 16:36, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Reviewed-by: Thomas Gleixner tglx@linutronix.de
On Wed 10-09-25 16:36:56, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
kernel/time/namespace.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c index 667452768ed3..80b3d2ce2fb6 100644 --- a/kernel/time/namespace.c +++ b/kernel/time/namespace.c @@ -92,18 +92,15 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns, if (!ns) goto fail_dec;
- refcount_set(&ns->ns.count, 1);
-
ns->vvar_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!ns->vvar_page) goto fail_free;
- err = ns_alloc_inum(&ns->ns);
+ err = ns_common_init(&ns->ns, &timens_operations, true); if (err) goto fail_free_page;
ns->ucounts = ucounts;
- ns->ns.ops = &timens_operations; ns->user_ns = get_user_ns(user_ns); ns->offsets = old_ns->offsets; ns->frozen_offsets = false;
-- 2.47.3
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/utsname.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/kernel/utsname.c b/kernel/utsname.c index b1ac3ca870f2..02037010b378 100644 --- a/kernel/utsname.c +++ b/kernel/utsname.c @@ -27,16 +27,6 @@ static void dec_uts_namespaces(struct ucounts *ucounts) dec_ucount(ucounts, UCOUNT_UTS_NAMESPACES); }
-static struct uts_namespace *create_uts_ns(void) -{ - struct uts_namespace *uts_ns; - - uts_ns = kmem_cache_alloc(uts_ns_cache, GFP_KERNEL); - if (uts_ns) - refcount_set(&uts_ns->ns.count, 1); - return uts_ns; -} - /* * Clone a new ns copying an original utsname, setting refcount to 1 * @old_ns: namespace to clone @@ -55,17 +45,15 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns, goto fail;
err = -ENOMEM; - ns = create_uts_ns(); + ns = kmem_cache_zalloc(uts_ns_cache, GFP_KERNEL); if (!ns) goto fail_dec;
- err = ns_alloc_inum(&ns->ns); + err = ns_common_init(&ns->ns, &utsns_operations, true); if (err) goto fail_free;
ns->ucounts = ucounts; - ns->ns.ops = &utsns_operations; - down_read(&uts_sem); memcpy(&ns->name, &old_ns->name, sizeof(ns->name)); ns->user_ns = get_user_ns(user_ns);
On Wed 10-09-25 16:36:57, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
kernel/utsname.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/kernel/utsname.c b/kernel/utsname.c index b1ac3ca870f2..02037010b378 100644 --- a/kernel/utsname.c +++ b/kernel/utsname.c @@ -27,16 +27,6 @@ static void dec_uts_namespaces(struct ucounts *ucounts) dec_ucount(ucounts, UCOUNT_UTS_NAMESPACES); } -static struct uts_namespace *create_uts_ns(void) -{
- struct uts_namespace *uts_ns;
-
- uts_ns = kmem_cache_alloc(uts_ns_cache, GFP_KERNEL);
- if (uts_ns)
- refcount_set(&uts_ns->ns.count, 1);
- return uts_ns;
-}
-
/*
 * Clone a new ns copying an original utsname, setting refcount to 1
 * @old_ns: namespace to clone
@@ -55,17 +45,15 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns, goto fail; err = -ENOMEM;
- ns = create_uts_ns();
+ ns = kmem_cache_zalloc(uts_ns_cache, GFP_KERNEL); if (!ns) goto fail_dec;
- err = ns_alloc_inum(&ns->ns);
+ err = ns_common_init(&ns->ns, &utsns_operations, true); if (err) goto fail_free;
ns->ucounts = ucounts;
- ns->ns.ops = &utsns_operations;
-
down_read(&uts_sem); memcpy(&ns->name, &old_ns->name, sizeof(ns->name)); ns->user_ns = get_user_ns(user_ns);
-- 2.47.3
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/user_namespace.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 682f40d5632d..98f4fe84d039 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -124,12 +124,11 @@ int create_user_ns(struct cred *new) goto fail_dec;
ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP); - ret = ns_alloc_inum(&ns->ns); + + ret = ns_common_init(&ns->ns, &userns_operations, true); if (ret) goto fail_free; - ns->ns.ops = &userns_operations;
- refcount_set(&ns->ns.count, 1); /* Leave the new->user_ns reference with the new user namespace. */ ns->parent = parent_ns; ns->level = parent_ns->level + 1;
On Wed 10-09-25 16:36:58, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
kernel/user_namespace.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 682f40d5632d..98f4fe84d039 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -124,12 +124,11 @@ int create_user_ns(struct cred *new) goto fail_dec; ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP);
- ret = ns_alloc_inum(&ns->ns);
+
+ ret = ns_common_init(&ns->ns, &userns_operations, true); if (ret) goto fail_free;
- ns->ns.ops = &userns_operations;
- refcount_set(&ns->ns.count, 1); /* Leave the new->user_ns reference with the new user namespace. */ ns->parent = parent_ns; ns->level = parent_ns->level + 1;
-- 2.47.3
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org --- net/core/net_namespace.c | 46 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 14 deletions(-)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 1b6f3826dd0e..dafb3d947043 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -397,10 +397,22 @@ static __net_init void preinit_net_sysctl(struct net *net) }
/* init code that must occur even if setup_net() is not called. */ -static __net_init void preinit_net(struct net *net, struct user_namespace *user_ns) +static __net_init int preinit_net(struct net *net, struct user_namespace *user_ns) { + const struct proc_ns_operations *ns_ops; + int ret; + +#ifdef CONFIG_NET_NS + ns_ops = &netns_operations; +#else + ns_ops = NULL; +#endif + + ret = ns_common_init(&net->ns, ns_ops, false); + if (ret) + return ret; + refcount_set(&net->passive, 1); - refcount_set(&net->ns.count, 1); ref_tracker_dir_init(&net->refcnt_tracker, 128, "net_refcnt"); ref_tracker_dir_init(&net->notrefcnt_tracker, 128, "net_notrefcnt");
@@ -420,6 +432,7 @@ static __net_init void preinit_net(struct net *net, struct user_namespace *user_ INIT_LIST_HEAD(&net->ptype_all); INIT_LIST_HEAD(&net->ptype_specific); preinit_net_sysctl(net); + return 0; }
/* @@ -559,7 +572,9 @@ struct net *copy_net_ns(unsigned long flags, goto dec_ucounts; }
- preinit_net(net, user_ns); + rv = preinit_net(net, user_ns); + if (rv < 0) + goto dec_ucounts; net->ucounts = ucounts; get_user_ns(user_ns);
@@ -573,6 +588,7 @@ struct net *copy_net_ns(unsigned long flags,
if (rv < 0) { put_userns: + ns_free_inum(&net->ns); #ifdef CONFIG_KEYS key_remove_domain(net->key_domain); #endif @@ -812,17 +828,14 @@ static void net_ns_net_debugfs(struct net *net)
static __net_init int net_ns_net_init(struct net *net) { -#ifdef CONFIG_NET_NS - net->ns.ops = &netns_operations; -#endif - net->ns.inum = PROC_NET_INIT_INO; - if (net != &init_net) { - int ret = ns_alloc_inum(&net->ns); - if (ret) - return ret; - } + int ret = 0; + + if (net == &init_net) + net->ns.inum = PROC_NET_INIT_INO; + else + ret = proc_alloc_inum(&to_ns_common(net)->inum); net_ns_net_debugfs(net); - return 0; + return ret; }
static __net_exit void net_ns_net_exit(struct net *net) @@ -1282,7 +1295,12 @@ void __init net_ns_init(void) #ifdef CONFIG_KEYS init_net.key_domain = &init_net_key_domain; #endif - preinit_net(&init_net, &init_user_ns); + /* + * This currently cannot fail as the initial network namespace + * has a static inode number. + */ + if (preinit_net(&init_net, &init_user_ns)) + panic("Could not preinitialize the initial network namespace");
down_write(&pernet_ops_rwsem); if (setup_net(&init_net))
On Wed 10-09-25 16:36:59, Christian Brauner wrote:
Don't cargo-cult the same thing over and over.
Signed-off-by: Christian Brauner brauner@kernel.org
One comment below.
@@ -812,17 +828,14 @@ static void net_ns_net_debugfs(struct net *net) static __net_init int net_ns_net_init(struct net *net) {
-#ifdef CONFIG_NET_NS
- net->ns.ops = &netns_operations;
-#endif
- net->ns.inum = PROC_NET_INIT_INO;
- if (net != &init_net) {
- int ret = ns_alloc_inum(&net->ns);
- if (ret)
- return ret;
- }
+ int ret = 0;
+
+ if (net == &init_net)
+ net->ns.inum = PROC_NET_INIT_INO;
+ else
+ ret = proc_alloc_inum(&to_ns_common(net)->inum);
net_ns_net_debugfs(net);
Here you're calling net_ns_net_debugfs() even if proc_alloc_inum() failed which looks like a bug to me...
Honza
- return 0;
+ return ret;
} static __net_exit void net_ns_net_exit(struct net *net) @@ -1282,7 +1295,12 @@ void __init net_ns_init(void) #ifdef CONFIG_KEYS init_net.key_domain = &init_net_key_domain; #endif
- preinit_net(&init_net, &init_user_ns);
+ /*
+ * This currently cannot fail as the initial network namespace
+ * has a static inode number.
+ */
+ if (preinit_net(&init_net, &init_user_ns))
+ panic("Could not preinitialize the initial network namespace");
down_write(&pernet_ops_rwsem); if (setup_net(&init_net))
-- 2.47.3
On Wed, Sep 10, 2025 at 05:57:52PM +0200, Jan Kara wrote:
Here you're calling net_ns_net_debugfs() even if proc_alloc_inum() failed which looks like a bug to me...
Yes, good catch!
Fyi, I have been out properly sick this week and that's why I haven't been very active on-list. I hope to be back in a more functional state tomorrow and will process the backlog.
On Thu 11-09-25 10:46:11, Christian Brauner wrote:
Yes, good catch!
Fyi, I have been out properly sick this week and that's why I haven't been very active on-list. I hope to be back in a more functional state tomorrow and will process the backlog.
There's no rush. Get well soon!
Honza
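To make the control-flow problem raised in this sub-thread concrete, here is a minimal userspace sketch. It is not the fix that was eventually applied; every model_* name is hypothetical, and the allocator and debugfs step are reduced to stubs. It contrasts the posted shape, where the follow-up step runs regardless of the allocation result, with one possible correction that returns early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counts how often the "debugfs" step ran. */
static int model_debugfs_calls;

static void model_debugfs_setup(void)
{
	model_debugfs_calls++;
}

/* Stub allocator; returns -12 (-ENOMEM) when a failure is injected. */
static int model_alloc_inum(bool fail)
{
	return fail ? -12 : 0;
}

/* Shape of the posted code: the follow-up step runs unconditionally. */
static int model_init_posted(bool is_init, bool fail)
{
	int ret = 0;

	if (!is_init)
		ret = model_alloc_inum(fail);
	model_debugfs_setup();
	return ret;
}

/* One possible correction: bail out before the follow-up step. */
static int model_init_checked(bool is_init, bool fail)
{
	if (!is_init) {
		int ret = model_alloc_inum(fail);

		if (ret)
			return ret;
	}
	model_debugfs_setup();
	return 0;
}
```

With an injected allocation failure, model_init_posted() still runs the debugfs stub once, while model_init_checked() returns the error without running it.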
On Wed, Sep 10, 2025 at 04:36:59PM +0200, Christian Brauner wrote:
@@ -573,6 +588,7 @@ struct net *copy_net_ns(unsigned long flags,
if (rv < 0) { put_userns:
+ ns_free_inum(&net->ns);
I've ended up looking at this patch because of Jan's earlier comment about a different issue in this patch.
Aren't we double-freeing net->ns here if setup_net() failed?
setup_net() can call ops_undo_list() on failure, which will call ns_free_inum(&net->ns) once, and then we do it again in the put_userns error handling label.
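The scenario described in this reply can be modelled in a few lines of userspace C. This is only a sketch under the assumption that the failing setup step releases the inode number during its own unwinding, as the reply suggests; the model_* names are hypothetical and nothing here is kernel code. The counters make the second release on an already-free slot visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an inode-number slot; not kernel code. */
struct model_ns {
	bool inum_held;
	int release_calls;	/* how many times release was attempted */
	int double_frees;	/* releases that hit an already-free slot */
};

static void model_free_inum(struct model_ns *ns)
{
	ns->release_calls++;
	if (!ns->inum_held)
		ns->double_frees++;
	ns->inum_held = false;
}

/* Setup that fails and, as part of its own unwinding, releases the slot. */
static int model_setup(struct model_ns *ns, bool fail)
{
	ns->inum_held = true;
	if (fail) {
		model_free_inum(ns);	/* undo path */
		return -1;
	}
	return 0;
}

/* Caller whose error label releases the slot again. */
static int model_copy(struct model_ns *ns, bool fail)
{
	int rv = model_setup(ns, fail);

	if (rv < 0)
		model_free_inum(ns);	/* second release on the same slot */
	return rv;
}
```

On the failure path the model records two release calls, one of which lands on a slot that is no longer held. Avoiding that requires a single owner for the release: either the undo path or the caller's error label, but not both.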
It's now unused.
Signed-off-by: Christian Brauner brauner@kernel.org --- include/linux/proc_ns.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index dbb119bda097..e50d312f9fee 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -66,12 +66,6 @@ static inline void proc_free_inum(unsigned int inum) {}
#endif /* CONFIG_PROC_FS */
-static inline int ns_alloc_inum(struct ns_common *ns) -{ - WRITE_ONCE(ns->stashed, NULL); - return proc_alloc_inum(&ns->inum); -} - static inline int ns_common_init(struct ns_common *ns, const struct proc_ns_operations *ops, bool alloc_inum)
On Wed 10-09-25 16:37:00, Christian Brauner wrote:
It's now unused.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good. Feel free to add:
Reviewed-by: Jan Kara jack@suse.cz
Honza
include/linux/proc_ns.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index dbb119bda097..e50d312f9fee 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -66,12 +66,6 @@ static inline void proc_free_inum(unsigned int inum) {} #endif /* CONFIG_PROC_FS */ -static inline int ns_alloc_inum(struct ns_common *ns) -{
- WRITE_ONCE(ns->stashed, NULL);
- return proc_alloc_inum(&ns->inum);
-}
static inline int ns_common_init(struct ns_common *ns, const struct proc_ns_operations *ops, bool alloc_inum)
-- 2.47.3
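The conversions up to this point all fold the same open-coded steps (set the reference count to 1, clear the stashed pointer, record the ops, optionally allocate an inode number) into one helper. The following userspace sketch models that idea. It assumes the allocation happens before the other fields are set, which the quoted hunks do not show, and all model_* names plus the failure-injection parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_ns_ops {
	int type;
};

struct model_ns_common {
	const struct model_ns_ops *ops;
	void *stashed;
	unsigned int inum;
	unsigned int count;	/* plain counter standing in for refcount_t */
};

static unsigned int model_next_inum = 100;

/* Stand-in for an inode number allocator; fails when asked to. */
static int model_alloc_inum(unsigned int *inum, bool fail)
{
	if (fail)
		return -12;	/* -ENOMEM */
	*inum = model_next_inum++;
	return 0;
}

/*
 * One helper does what each namespace type used to open-code:
 * optionally allocate an inode number, then set the reference count,
 * clear the stashed pointer and record the ops.
 */
static int model_ns_common_init(struct model_ns_common *ns,
				const struct model_ns_ops *ops,
				bool alloc_inum, bool inject_failure)
{
	if (alloc_inum) {
		int ret = model_alloc_inum(&ns->inum, inject_failure);

		if (ret)
			return ret;
	}
	ns->count = 1;
	ns->stashed = NULL;
	ns->ops = ops;
	return 0;
}
```

Callers that assign a fixed inode number themselves, as the network namespace code does for its initial namespace, would pass false for alloc_inum.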
Move the namespace iteration infrastructure originally introduced for mount namespaces into a generic library usable by all namespace types.
Signed-off-by: Christian Brauner brauner@kernel.org --- include/linux/ns_common.h | 9 ++ include/linux/nstree.h | 89 ++++++++++++++++++ include/linux/proc_ns.h | 3 + kernel/Makefile | 2 +- kernel/nstree.c | 233 ++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 335 insertions(+), 1 deletion(-)
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h index bc2e0758e1c9..7224072cccc5 100644 --- a/include/linux/ns_common.h +++ b/include/linux/ns_common.h @@ -3,6 +3,7 @@ #define _LINUX_NS_COMMON_H
#include <linux/refcount.h> +#include <linux/rbtree.h>
struct proc_ns_operations;
@@ -20,6 +21,14 @@ struct ns_common { const struct proc_ns_operations *ops; unsigned int inum; refcount_t count; + union { + struct { + u64 ns_id; + struct rb_node ns_tree_node; + struct list_head ns_list_node; + }; + struct rcu_head ns_rcu; + }; };
#define to_ns_common(__ns) \ diff --git a/include/linux/nstree.h b/include/linux/nstree.h new file mode 100644 index 000000000000..e26951a83924 --- /dev/null +++ b/include/linux/nstree.h @@ -0,0 +1,89 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_NSTREE_H +#define _LINUX_NSTREE_H + +#include <linux/ns_common.h> +#include <linux/nsproxy.h> +#include <linux/rbtree.h> +#include <linux/seqlock.h> +#include <linux/rculist.h> +#include <linux/cookie.h> + +/** + * struct ns_tree - Namespace tree + * @ns_tree: Rbtree of namespaces of a particular type + * @ns_list: Sequentially walkable list of all namespaces of this type + * @ns_tree_lock: Seqlock to protect the tree and list + */ +struct ns_tree { + struct rb_root ns_tree; + struct list_head ns_list; + seqlock_t ns_tree_lock; + int type; +}; + +extern struct ns_tree cgroup_ns_tree; +extern struct ns_tree ipc_ns_tree; +extern struct ns_tree mnt_ns_tree; +extern struct ns_tree net_ns_tree; +extern struct ns_tree pid_ns_tree; +extern struct ns_tree time_ns_tree; +extern struct ns_tree user_ns_tree; +extern struct ns_tree uts_ns_tree; + +#define to_ns_tree(__ns) \ + _Generic((__ns), \ + struct cgroup_namespace *: &(cgroup_ns_tree), \ + struct ipc_namespace *: &(ipc_ns_tree), \ + struct net *: &(net_ns_tree), \ + struct pid_namespace *: &(pid_ns_tree), \ + struct mnt_namespace *: &(mnt_ns_tree), \ + struct time_namespace *: &(time_ns_tree), \ + struct user_namespace *: &(user_ns_tree), \ + struct uts_namespace *: &(uts_ns_tree)) + +u64 ns_tree_gen_id(struct ns_common *ns); +void __ns_tree_add_raw(struct ns_common *ns, struct ns_tree *ns_tree); +void __ns_tree_remove(struct ns_common *ns, struct ns_tree *ns_tree); +struct ns_common *ns_tree_lookup_rcu(u64 ns_id, int ns_type); +struct ns_common *__ns_tree_adjoined_rcu(struct ns_common *ns, + struct ns_tree *ns_tree, + bool previous); + +static inline void __ns_tree_add(struct ns_common *ns, struct ns_tree *ns_tree) +{ + ns_tree_gen_id(ns); + __ns_tree_add_raw(ns, 
ns_tree); +} + +/** + * ns_tree_add_raw - Add a namespace to a namespace + * @ns: Namespace to add + * + * This function adds a namespace to the appropriate namespace tree + * without assigning a id. + */ +#define ns_tree_add_raw(__ns) __ns_tree_add_raw(to_ns_common(__ns), to_ns_tree(__ns)) + +/** + * ns_tree_add - Add a namespace to a namespace tree + * @ns: Namespace to add + * + * This function assigns a new id to the namespace and adds it to the + * appropriate namespace tree and list. + */ +#define ns_tree_add(__ns) __ns_tree_add(to_ns_common(__ns), to_ns_tree(__ns)) + +/** + * ns_tree_remove - Remove a namespace from a namespace tree + * @ns: Namespace to remove + * + * This function removes a namespace from the appropriate namespace + * tree and list. + */ +#define ns_tree_remove(__ns) __ns_tree_remove(to_ns_common(__ns), to_ns_tree(__ns)) + +#define ns_tree_adjoined_rcu(__ns, __previous) \ + __ns_tree_adjoined_rcu(to_ns_common(__ns), to_ns_tree(__ns), __previous) + +#endif /* _LINUX_NSTREE_H */ diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index e50d312f9fee..7f89f0829e60 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -79,6 +79,9 @@ static inline int ns_common_init(struct ns_common *ns, refcount_set(&ns->count, 1); ns->stashed = NULL; ns->ops = ops; + ns->ns_id = 0; + RB_CLEAR_NODE(&ns->ns_tree_node); + INIT_LIST_HEAD(&ns->ns_list_node); return 0; }
diff --git a/kernel/Makefile b/kernel/Makefile index c60623448235..b807516a1b43 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -8,7 +8,7 @@ obj-y = fork.o exec_domain.o panic.o \ sysctl.o capability.o ptrace.o user.o \ signal.o sys.o umh.o workqueue.o pid.o task_work.o \ extable.o params.o \ - kthread.o sys_ni.o nsproxy.o \ + kthread.o sys_ni.o nsproxy.o nstree.o \ notifier.o ksysfs.o cred.o reboot.o \ async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
diff --git a/kernel/nstree.c b/kernel/nstree.c new file mode 100644 index 000000000000..bbe8bedc924c --- /dev/null +++ b/kernel/nstree.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <linux/nstree.h> +#include <linux/proc_ns.h> +#include <linux/vfsdebug.h> + +struct ns_tree mnt_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(mnt_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(mnt_ns_tree.ns_tree_lock), + .type = CLONE_NEWNS, +}; + +struct ns_tree net_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(net_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(net_ns_tree.ns_tree_lock), + .type = CLONE_NEWNET, +}; +EXPORT_SYMBOL_GPL(net_ns_tree); + +struct ns_tree uts_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(uts_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(uts_ns_tree.ns_tree_lock), + .type = CLONE_NEWUTS, +}; + +struct ns_tree user_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(user_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(user_ns_tree.ns_tree_lock), + .type = CLONE_NEWUSER, +}; + +struct ns_tree ipc_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(ipc_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(ipc_ns_tree.ns_tree_lock), + .type = CLONE_NEWIPC, +}; + +struct ns_tree pid_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(pid_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(pid_ns_tree.ns_tree_lock), + .type = CLONE_NEWPID, +}; + +struct ns_tree cgroup_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(cgroup_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(cgroup_ns_tree.ns_tree_lock), + .type = CLONE_NEWCGROUP, +}; + +struct ns_tree time_ns_tree = { + .ns_tree = RB_ROOT, + .ns_list = LIST_HEAD_INIT(time_ns_tree.ns_list), + .ns_tree_lock = __SEQLOCK_UNLOCKED(time_ns_tree.ns_tree_lock), + .type = CLONE_NEWTIME, +}; + +DEFINE_COOKIE(namespace_cookie); + +static inline struct ns_common 
*node_to_ns(const struct rb_node *node) +{ + if (!node) + return NULL; + return rb_entry(node, struct ns_common, ns_tree_node); +} + +static inline int ns_cmp(struct rb_node *a, const struct rb_node *b) +{ + struct ns_common *ns_a = node_to_ns(a); + struct ns_common *ns_b = node_to_ns(b); + u64 ns_id_a = ns_a->ns_id; + u64 ns_id_b = ns_b->ns_id; + + if (ns_id_a < ns_id_b) + return -1; + if (ns_id_a > ns_id_b) + return 1; + return 0; +} + +void __ns_tree_add_raw(struct ns_common *ns, struct ns_tree *ns_tree) +{ + struct rb_node *node, *prev; + + VFS_WARN_ON_ONCE(!ns->ns_id); + + write_seqlock(&ns_tree->ns_tree_lock); + + VFS_WARN_ON_ONCE(ns->ops->type != ns_tree->type); + + node = rb_find_add_rcu(&ns->ns_tree_node, &ns_tree->ns_tree, ns_cmp); + /* + * If there's no previous entry simply add it after the + * head and if there is add it after the previous entry. + */ + prev = rb_prev(&ns->ns_tree_node); + if (!prev) + list_add_rcu(&ns->ns_list_node, &ns_tree->ns_list); + else + list_add_rcu(&ns->ns_list_node, &node_to_ns(prev)->ns_list_node); + + write_sequnlock(&ns_tree->ns_tree_lock); + + VFS_WARN_ON_ONCE(node); +} + +void __ns_tree_remove(struct ns_common *ns, struct ns_tree *ns_tree) +{ + VFS_WARN_ON_ONCE(RB_EMPTY_NODE(&ns->ns_tree_node)); + VFS_WARN_ON_ONCE(list_empty(&ns->ns_list_node)); + VFS_WARN_ON_ONCE(ns->ops->type != ns_tree->type); + + write_seqlock(&ns_tree->ns_tree_lock); + rb_erase(&ns->ns_tree_node, &ns_tree->ns_tree); + list_bidir_del_rcu(&ns->ns_list_node); + RB_CLEAR_NODE(&ns->ns_tree_node); + write_sequnlock(&ns_tree->ns_tree_lock); +} +EXPORT_SYMBOL_GPL(__ns_tree_remove); + +static int ns_find(const void *key, const struct rb_node *node) +{ + const u64 ns_id = *(u64 *)key; + const struct ns_common *ns = node_to_ns(node); + + if (ns_id < ns->ns_id) + return -1; + if (ns_id > ns->ns_id) + return 1; + return 0; +} + + +static struct ns_tree *ns_tree_from_type(int ns_type) +{ + switch (ns_type) { + case CLONE_NEWCGROUP: + return &cgroup_ns_tree; + 
case CLONE_NEWIPC: + return &ipc_ns_tree; + case CLONE_NEWNS: + return &mnt_ns_tree; + case CLONE_NEWNET: + return &net_ns_tree; + case CLONE_NEWPID: + return &pid_ns_tree; + case CLONE_NEWUSER: + return &user_ns_tree; + case CLONE_NEWUTS: + return &uts_ns_tree; + case CLONE_NEWTIME: + return &time_ns_tree; + } + + return NULL; +} + +struct ns_common *ns_tree_lookup_rcu(u64 ns_id, int ns_type) +{ + struct ns_tree *ns_tree; + struct rb_node *node; + unsigned int seq; + + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious ns_tree_lookup_rcu() usage"); + + ns_tree = ns_tree_from_type(ns_type); + if (!ns_tree) + return NULL; + + do { + seq = read_seqbegin(&ns_tree->ns_tree_lock); + node = rb_find_rcu(&ns_id, &ns_tree->ns_tree, ns_find); + if (node) + break; + } while (read_seqretry(&ns_tree->ns_tree_lock, seq)); + + if (!node) + return NULL; + + VFS_WARN_ON_ONCE(node_to_ns(node)->ops->type != ns_type); + + return node_to_ns(node); +} + +/** + * ns_tree_adjoined_rcu - find the next/previous namespace in the same + * tree + * @ns: namespace to start from + * @previous: if true find the previous namespace, otherwise the next + * + * Find the next or previous namespace in the same tree as @ns. If + * there is no next/previous namespace, -ENOENT is returned. 
+ */ +struct ns_common *__ns_tree_adjoined_rcu(struct ns_common *ns, + struct ns_tree *ns_tree, bool previous) +{ + struct list_head *list; + + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious ns_tree_adjoined_rcu() usage"); + + if (previous) + list = rcu_dereference(list_bidir_prev_rcu(&ns->ns_list_node)); + else + list = rcu_dereference(list_next_rcu(&ns->ns_list_node)); + if (list_is_head(list, &ns_tree->ns_list)) + return ERR_PTR(-ENOENT); + + VFS_WARN_ON_ONCE(list_entry_rcu(list, struct ns_common, ns_list_node)->ops->type != ns_tree->type); + + return list_entry_rcu(list, struct ns_common, ns_list_node); +} + +/** + * ns_tree_gen_id - generate a new namespace id + * @ns: namespace to generate id for + * + * Generates a new namespace id and assigns it to the namespace. All + * namespaces types share the same id space and thus can be compared + * directly. IOW, when two ids of two namespace are equal, they are + * identical. + */ +u64 ns_tree_gen_id(struct ns_common *ns) +{ + guard(preempt)(); + ns->ns_id = gen_cookie_next(&namespace_cookie); + return ns->ns_id; +}
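As a rough illustration of what the patch above sets up, the sketch below models one id space shared by all namespace types plus a per-type, id-ordered container. It deliberately swaps the kernel's rbtree, RCU list and seqlock for a plain sorted singly linked list with no locking, and uses a C11 atomic counter in place of the cookie generator. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; names are illustrative only. */
struct model_ns {
	uint64_t id;
	int type;
	struct model_ns *next;	/* list kept sorted by ascending id */
};

struct model_tree {
	int type;
	struct model_ns *head;
};

/* One counter for every namespace type, so ids never collide across types. */
static _Atomic uint64_t model_next_id = 1;

static uint64_t model_gen_id(struct model_ns *ns)
{
	ns->id = atomic_fetch_add(&model_next_id, 1);
	return ns->id;
}

/* Insert in front of the first entry whose id is not smaller. */
static void model_tree_add(struct model_tree *tree, struct model_ns *ns)
{
	struct model_ns **pos = &tree->head;

	while (*pos && (*pos)->id < ns->id)
		pos = &(*pos)->next;
	ns->type = tree->type;
	ns->next = *pos;
	*pos = ns;
}

/* Walk the sorted list; stop early once ids exceed the one searched for. */
static struct model_ns *model_tree_lookup(struct model_tree *tree, uint64_t id)
{
	for (struct model_ns *ns = tree->head; ns && ns->id <= id; ns = ns->next)
		if (ns->id == id)
			return ns;
	return NULL;
}
```

Because the ids come from a single counter, an id that belongs to a namespace in one container is never found in another, which is the property the ns_tree_gen_id() comment relies on when it says ids can be compared directly.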
Move the mount namespace to the generic iterator. This allows us to drop a bunch of members from struct mnt_namespace. Signed-off-by: Christian Brauner brauner@kernel.org --- fs/mount.h | 10 +--- fs/namespace.c | 141 +++++++++++++-------------------------------------------- fs/nsfs.c | 4 +- 3 files changed, 35 insertions(+), 120 deletions(-)
diff --git a/fs/mount.h b/fs/mount.h index 97737051a8b9..76bf863c9ae2 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -17,11 +17,7 @@ struct mnt_namespace { }; struct user_namespace *user_ns; struct ucounts *ucounts; - u64 seq; /* Sequence number to prevent loops */ - union { - wait_queue_head_t poll; - struct rcu_head mnt_ns_rcu; - }; + wait_queue_head_t poll; u64 seq_origin; /* Sequence number of origin mount namespace */ u64 event; #ifdef CONFIG_FSNOTIFY @@ -30,8 +26,6 @@ struct mnt_namespace { #endif unsigned int nr_mounts; /* # of mounts in the namespace */ unsigned int pending_mounts; - struct rb_node mnt_ns_tree_node; /* node in the mnt_ns_tree */ - struct list_head mnt_ns_list; /* entry in the sequential list of mounts namespace */ refcount_t passive; /* number references not pinning @mounts */ } __randomize_layout;
@@ -173,7 +167,7 @@ static inline bool is_local_mountpoint(const struct dentry *dentry)
static inline bool is_anon_ns(struct mnt_namespace *ns) { - return ns->seq == 0; + return ns->ns.ns_id == 0; }
static inline bool anon_ns_root(const struct mount *m) diff --git a/fs/namespace.c b/fs/namespace.c index 14c5cdbdd6e1..40a8d75f6b16 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -33,6 +33,7 @@ #include <linux/shmem_fs.h> #include <linux/mnt_idmapping.h> #include <linux/pidfs.h> +#include <linux/nstree.h>
#include "pnode.h" #include "internal.h" @@ -80,13 +81,10 @@ static DECLARE_RWSEM(namespace_sem); static HLIST_HEAD(unmounted); /* protected by namespace_sem */ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */ static struct mnt_namespace *emptied_ns; /* protected by namespace_sem */ -static DEFINE_SEQLOCK(mnt_ns_tree_lock);
#ifdef CONFIG_FSNOTIFY LIST_HEAD(notify_list); /* protected by namespace_sem */ #endif -static struct rb_root mnt_ns_tree = RB_ROOT; /* protected by mnt_ns_tree_lock */ -static LIST_HEAD(mnt_ns_list); /* protected by mnt_ns_tree_lock */
enum mount_kattr_flags_t { MOUNT_KATTR_RECURSE = (1 << 0), @@ -119,53 +117,12 @@ __cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);
static inline struct mnt_namespace *node_to_mnt_ns(const struct rb_node *node) { + struct ns_common *ns; + if (!node) return NULL; - return rb_entry(node, struct mnt_namespace, mnt_ns_tree_node); -} - -static int mnt_ns_cmp(struct rb_node *a, const struct rb_node *b) -{ - struct mnt_namespace *ns_a = node_to_mnt_ns(a); - struct mnt_namespace *ns_b = node_to_mnt_ns(b); - u64 seq_a = ns_a->seq; - u64 seq_b = ns_b->seq; - - if (seq_a < seq_b) - return -1; - if (seq_a > seq_b) - return 1; - return 0; -} - -static inline void mnt_ns_tree_write_lock(void) -{ - write_seqlock(&mnt_ns_tree_lock); -} - -static inline void mnt_ns_tree_write_unlock(void) -{ - write_sequnlock(&mnt_ns_tree_lock); -} - -static void mnt_ns_tree_add(struct mnt_namespace *ns) -{ - struct rb_node *node, *prev; - - mnt_ns_tree_write_lock(); - node = rb_find_add_rcu(&ns->mnt_ns_tree_node, &mnt_ns_tree, mnt_ns_cmp); - /* - * If there's no previous entry simply add it after the - * head and if there is add it after the previous entry. - */ - prev = rb_prev(&ns->mnt_ns_tree_node); - if (!prev) - list_add_rcu(&ns->mnt_ns_list, &mnt_ns_list); - else - list_add_rcu(&ns->mnt_ns_list, &node_to_mnt_ns(prev)->mnt_ns_list); - mnt_ns_tree_write_unlock(); - - WARN_ON_ONCE(node); + ns = rb_entry(node, struct ns_common, ns_tree_node); + return container_of(ns, struct mnt_namespace, ns); }
static void mnt_ns_release(struct mnt_namespace *ns) @@ -181,32 +138,16 @@ DEFINE_FREE(mnt_ns_release, struct mnt_namespace *, if (_T) mnt_ns_release(_T))
static void mnt_ns_release_rcu(struct rcu_head *rcu) { - mnt_ns_release(container_of(rcu, struct mnt_namespace, mnt_ns_rcu)); + mnt_ns_release(container_of(rcu, struct mnt_namespace, ns.ns_rcu)); }
static void mnt_ns_tree_remove(struct mnt_namespace *ns) { /* remove from global mount namespace list */ - if (!is_anon_ns(ns)) { - mnt_ns_tree_write_lock(); - rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree); - list_bidir_del_rcu(&ns->mnt_ns_list); - mnt_ns_tree_write_unlock(); - } - - call_rcu(&ns->mnt_ns_rcu, mnt_ns_release_rcu); -} - -static int mnt_ns_find(const void *key, const struct rb_node *node) -{ - const u64 mnt_ns_id = *(u64 *)key; - const struct mnt_namespace *ns = node_to_mnt_ns(node); + if (!is_anon_ns(ns)) + ns_tree_remove(ns);
- if (mnt_ns_id < ns->seq) - return -1; - if (mnt_ns_id > ns->seq) - return 1; - return 0; + call_rcu(&ns->ns.ns_rcu, mnt_ns_release_rcu); }
/* @@ -225,28 +166,21 @@ static int mnt_ns_find(const void *key, const struct rb_node *node) */ static struct mnt_namespace *lookup_mnt_ns(u64 mnt_ns_id) { - struct mnt_namespace *ns; - struct rb_node *node; - unsigned int seq; + struct mnt_namespace *mnt_ns; + struct ns_common *ns;
guard(rcu)(); - do { - seq = read_seqbegin(&mnt_ns_tree_lock); - node = rb_find_rcu(&mnt_ns_id, &mnt_ns_tree, mnt_ns_find); - if (node) - break; - } while (read_seqretry(&mnt_ns_tree_lock, seq)); - - if (!node) + ns = ns_tree_lookup_rcu(mnt_ns_id, CLONE_NEWNS); + if (!ns) return NULL;
/* * The last reference count is put with RCU delay so we can * unconditonally acquire a reference here. */ - ns = node_to_mnt_ns(node); - refcount_inc(&ns->passive); - return ns; + mnt_ns = container_of(ns, struct mnt_namespace, ns); + refcount_inc(&mnt_ns->passive); + return mnt_ns; }
static inline void lock_mount_hash(void) @@ -1017,7 +951,7 @@ static inline bool check_anonymous_mnt(struct mount *mnt) return false;
seq = mnt->mnt_ns->seq_origin; - return !seq || (seq == current->nsproxy->mnt_ns->seq); + return !seq || (seq == current->nsproxy->mnt_ns->ns.ns_id); }
/* @@ -2155,19 +2089,16 @@ struct ns_common *from_mnt_ns(struct mnt_namespace *mnt)
struct mnt_namespace *get_sequential_mnt_ns(struct mnt_namespace *mntns, bool previous) { + struct ns_common *ns; + guard(rcu)();
for (;;) { - struct list_head *list; - - if (previous) - list = rcu_dereference(list_bidir_prev_rcu(&mntns->mnt_ns_list)); - else - list = rcu_dereference(list_next_rcu(&mntns->mnt_ns_list)); - if (list_is_head(list, &mnt_ns_list)) - return ERR_PTR(-ENOENT); + ns = ns_tree_adjoined_rcu(mntns, previous); + if (IS_ERR(ns)) + return ERR_CAST(ns);
- mntns = list_entry_rcu(list, struct mnt_namespace, mnt_ns_list); + mntns = to_mnt_ns(ns);
/* * The last passive reference count is put with RCU @@ -2207,7 +2138,7 @@ static bool mnt_ns_loop(struct dentry *dentry) if (!mnt_ns) return false;
- return current->nsproxy->mnt_ns->seq >= mnt_ns->seq; + return current->nsproxy->mnt_ns->ns.ns_id >= mnt_ns->ns.ns_id; }
struct mount *copy_tree(struct mount *src_root, struct dentry *dentry, @@ -3070,7 +3001,7 @@ static struct file *open_detached_copy(struct path *path, bool recursive) if (is_anon_ns(src_mnt_ns)) ns->seq_origin = src_mnt_ns->seq_origin; else - ns->seq_origin = src_mnt_ns->seq; + ns->seq_origin = src_mnt_ns->ns.ns_id; }
mnt = __do_loopback(path, recursive); @@ -4153,15 +4084,6 @@ static void free_mnt_ns(struct mnt_namespace *ns) mnt_ns_tree_remove(ns); }
-/* - * Assign a sequence number so we can detect when we attempt to bind - * mount a reference to an older mount namespace into the current - * mount namespace, preventing reference counting loops. A 64bit - * number incrementing at 10Ghz will take 12,427 years to wrap which - * is effectively never, so we can ignore the possibility. - */ -static atomic64_t mnt_ns_seq = ATOMIC64_INIT(1); - static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool anon) { struct mnt_namespace *new_ns; @@ -4185,11 +4107,11 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a return ERR_PTR(ret); } if (!anon) - new_ns->seq = atomic64_inc_return(&mnt_ns_seq); + ns_tree_gen_id(&new_ns->ns); + RB_CLEAR_NODE(&new_ns->ns.ns_tree_node); + INIT_LIST_HEAD(&new_ns->ns.ns_list_node); refcount_set(&new_ns->passive, 1); new_ns->mounts = RB_ROOT; - INIT_LIST_HEAD(&new_ns->mnt_ns_list); - RB_CLEAR_NODE(&new_ns->mnt_ns_tree_node); init_waitqueue_head(&new_ns->poll); new_ns->user_ns = get_user_ns(user_ns); new_ns->ucounts = ucounts; @@ -4275,7 +4197,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns, if (pwdmnt) mntput(pwdmnt);
- mnt_ns_tree_add(new_ns); + ns_tree_add_raw(new_ns); return new_ns; }
@@ -5385,7 +5307,7 @@ static int statmount_sb_source(struct kstatmount *s, struct seq_file *seq) static void statmount_mnt_ns_id(struct kstatmount *s, struct mnt_namespace *ns) { s->sm.mask |= STATMOUNT_MNT_NS_ID; - s->sm.mnt_ns_id = ns->seq; + s->sm.mnt_ns_id = ns->ns.ns_id; }
static int statmount_mnt_opts(struct kstatmount *s, struct seq_file *seq) @@ -6090,7 +6012,6 @@ static void __init init_mount_tree(void) ns = alloc_mnt_ns(&init_user_ns, true); if (IS_ERR(ns)) panic("Can't allocate initial namespace"); - ns->seq = atomic64_inc_return(&mnt_ns_seq); ns->ns.inum = PROC_MNT_INIT_INO; m = real_mount(mnt); ns->root = m; @@ -6105,7 +6026,7 @@ static void __init init_mount_tree(void) set_fs_pwd(current->fs, &root); set_fs_root(current->fs, &root);
- mnt_ns_tree_add(ns); + ns_tree_add(ns); }
void __init mnt_init(void) diff --git a/fs/nsfs.c b/fs/nsfs.c index 34f0b35d3ead..6f8008177133 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -139,7 +139,7 @@ static int copy_ns_info_to_user(const struct mnt_namespace *mnt_ns, * the size value will be set to the size the kernel knows about. */ kinfo->size = min(usize, sizeof(*kinfo)); - kinfo->mnt_ns_id = mnt_ns->seq; + kinfo->mnt_ns_id = mnt_ns->ns.ns_id; kinfo->nr_mounts = READ_ONCE(mnt_ns->nr_mounts); /* Subtract the root mount of the mount namespace. */ if (kinfo->nr_mounts) @@ -221,7 +221,7 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl,
mnt_ns = container_of(ns, struct mnt_namespace, ns); idp = (__u64 __user *)arg; - id = mnt_ns->seq; + id = mnt_ns->ns.ns_id; return put_user(id, idp); } case NS_GET_PID_FROM_PIDNS:
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/cgroup/cgroup.c | 2 ++ kernel/cgroup/namespace.c | 7 +++++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 312c6a8b55bb..092e6bf081ed 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -59,6 +59,7 @@ #include <linux/sched/cputime.h> #include <linux/sched/deadline.h> #include <linux/psi.h> +#include <linux/nstree.h> #include <net/sock.h>
#define CREATE_TRACE_POINTS @@ -6312,6 +6313,7 @@ int __init cgroup_init(void) WARN_ON(register_filesystem(&cpuset_fs_type)); #endif
+ ns_tree_add(&init_cgroup_ns); return 0; }
diff --git a/kernel/cgroup/namespace.c b/kernel/cgroup/namespace.c index 0391b6ab0bf1..fc12c416dfeb 100644 --- a/kernel/cgroup/namespace.c +++ b/kernel/cgroup/namespace.c @@ -5,7 +5,7 @@ #include <linux/slab.h> #include <linux/nsproxy.h> #include <linux/proc_ns.h> - +#include <linux/nstree.h>
/* cgroup namespaces */
@@ -30,16 +30,19 @@ static struct cgroup_namespace *alloc_cgroup_ns(void) ret = ns_common_init(&new_ns->ns, &cgroupns_operations, true); if (ret) return ERR_PTR(ret); + ns_tree_add(new_ns); return no_free_ptr(new_ns); }
void free_cgroup_ns(struct cgroup_namespace *ns) { + ns_tree_remove(ns); put_css_set(ns->root_cset); dec_cgroup_namespaces(ns->ucounts); put_user_ns(ns->user_ns); ns_free_inum(&ns->ns); - kfree(ns); + /* Concurrent nstree traversal depends on a grace period. */ + kfree_rcu(ns, ns.ns_rcu); } EXPORT_SYMBOL(free_cgroup_ns);
On Wed, Sep 10, 2025 at 04:37:03PM +0200, Christian Brauner wrote:
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
The patch subject seems a bit too generic and could be misleading. Maybe it should mention it's for namespaces? Other than that,
Acked-by: Tejun Heo tj@kernel.org
Thanks.
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- ipc/msgutil.c | 1 + ipc/namespace.c | 3 +++ ipc/shm.c | 2 ++ 3 files changed, 6 insertions(+)
diff --git a/ipc/msgutil.c b/ipc/msgutil.c index c7be0c792647..bbf61275df41 100644 --- a/ipc/msgutil.c +++ b/ipc/msgutil.c @@ -15,6 +15,7 @@ #include <linux/proc_ns.h> #include <linux/uaccess.h> #include <linux/sched.h> +#include <linux/nstree.h>
#include "util.h"
diff --git a/ipc/namespace.c b/ipc/namespace.c index d4188a88ee57..9f923c1a1eb3 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -15,6 +15,7 @@ #include <linux/mount.h> #include <linux/user_namespace.h> #include <linux/proc_ns.h> +#include <linux/nstree.h> #include <linux/sched/task.h>
#include "util.h" @@ -85,6 +86,7 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
sem_init_ns(ns); shm_init_ns(ns); + ns_tree_add(ns);
return ns;
@@ -201,6 +203,7 @@ void put_ipc_ns(struct ipc_namespace *ns) mq_clear_sbinfo(ns); spin_unlock(&mq_lock);
+ ns_tree_remove(ns); if (llist_add(&ns->mnt_llist, &free_ipc_list)) schedule_work(&free_ipc_work); } diff --git a/ipc/shm.c b/ipc/shm.c index a9310b6dbbc3..3db36773dd10 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -45,6 +45,7 @@ #include <linux/mount.h> #include <linux/ipc_namespace.h> #include <linux/rhashtable.h> +#include <linux/nstree.h>
#include <linux/uaccess.h>
@@ -148,6 +149,7 @@ void shm_exit_ns(struct ipc_namespace *ns) static int __init ipc_ns_init(void) { shm_init_ns(&init_ipc_ns); + ns_tree_add(&init_ipc_ns); return 0; }
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
The network namespace has a separate list with different lifetime rules which we can just leave intact. We have a similar concept for mount namespaces as well, where a namespace is on two different lists for different purposes.
Signed-off-by: Christian Brauner brauner@kernel.org --- include/net/net_namespace.h | 1 + net/core/net_namespace.c | 8 ++++++-- 2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 025a7574b275..42075748dff1 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -11,6 +11,7 @@ #include <linux/list.h> #include <linux/sysctl.h> #include <linux/uidgid.h> +#include <linux/nstree.h>
#include <net/flow.h> #include <net/netns/core.h> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index dafb3d947043..b85e303400be 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -20,6 +20,7 @@ #include <linux/sched/task.h> #include <linux/uidgid.h> #include <linux/proc_fs.h> +#include <linux/nstree.h>
#include <net/aligned_data.h> #include <net/sock.h> @@ -445,7 +446,7 @@ static __net_init int setup_net(struct net *net) LIST_HEAD(net_exit_list); int error = 0;
- net->net_cookie = atomic64_inc_return(&net_aligned_data.net_cookie); + net->net_cookie = ns_tree_gen_id(&net->ns);
list_for_each_entry(ops, &pernet_list, list) { error = ops_init(ops, net); @@ -455,6 +456,7 @@ static __net_init int setup_net(struct net *net) down_write(&net_rwsem); list_add_tail_rcu(&net->list, &net_namespace_list); up_write(&net_rwsem); + ns_tree_add_raw(net); out: return error;
@@ -675,8 +677,10 @@ static void cleanup_net(struct work_struct *work)
/* Don't let anyone else find us. */ down_write(&net_rwsem); - llist_for_each_entry(net, net_kill_list, cleanup_list) + llist_for_each_entry(net, net_kill_list, cleanup_list) { + ns_tree_remove(net); list_del_rcu(&net->list); + } /* Cache last net. After we unlock rtnl, no one new net * added to net_namespace_list can assign nsid pointer * to a net from net_kill_list (see peernet2id_alloc()).
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/pid_namespace.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 20ce4052d1c5..228ae20299f9 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -23,6 +23,7 @@ #include <linux/sched/task.h> #include <linux/sched/signal.h> #include <linux/idr.h> +#include <linux/nstree.h> #include <uapi/linux/wait.h> #include "pid_sysctl.h"
@@ -122,6 +123,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns ns->memfd_noexec_scope = pidns_memfd_noexec_scope(parent_pid_ns); #endif
+ ns_tree_add(ns); return ns;
out_free_inum: @@ -147,6 +149,7 @@ static void delayed_free_pidns(struct rcu_head *p)
static void destroy_pid_namespace(struct pid_namespace *ns) { + ns_tree_remove(ns); unregister_pidns_sysctls(ns);
ns_free_inum(&ns->ns); @@ -473,6 +476,7 @@ static __init int pid_namespaces_init(void) #endif
register_pid_ns_sysctl_table_vm(); + ns_tree_add(&init_pid_ns); return 0; }
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- include/linux/time_namespace.h | 5 +++++ init/main.c | 2 ++ kernel/time/namespace.c | 13 +++++++++++-- 3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h index bb2c52f4fc94..7f6af7a9771e 100644 --- a/include/linux/time_namespace.h +++ b/include/linux/time_namespace.h @@ -33,6 +33,7 @@ struct time_namespace { extern struct time_namespace init_time_ns;
#ifdef CONFIG_TIME_NS +void __init time_ns_init(void); extern int vdso_join_timens(struct task_struct *task, struct time_namespace *ns); extern void timens_commit(struct task_struct *tsk, struct time_namespace *ns); @@ -108,6 +109,10 @@ static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim) }
#else +static inline void __init time_ns_init(void) +{ +} + static inline int vdso_join_timens(struct task_struct *task, struct time_namespace *ns) { diff --git a/init/main.c b/init/main.c index 0ee0ee7b7c2c..e7d2c57c65a7 100644 --- a/init/main.c +++ b/init/main.c @@ -103,6 +103,7 @@ #include <linux/randomize_kstack.h> #include <linux/pidfs.h> #include <linux/ptdump.h> +#include <linux/time_namespace.h> #include <net/net_namespace.h>
#include <asm/io.h> @@ -1072,6 +1073,7 @@ void start_kernel(void) fork_init(); proc_caches_init(); uts_ns_init(); + time_ns_init(); key_init(); security_init(); dbg_late_init(); diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c index 80b3d2ce2fb6..408f60d0a3b6 100644 --- a/kernel/time/namespace.c +++ b/kernel/time/namespace.c @@ -12,6 +12,7 @@ #include <linux/seq_file.h> #include <linux/proc_ns.h> #include <linux/export.h> +#include <linux/nstree.h> #include <linux/time.h> #include <linux/slab.h> #include <linux/cred.h> @@ -88,7 +89,7 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns, goto fail;
err = -ENOMEM; - ns = kmalloc(sizeof(*ns), GFP_KERNEL_ACCOUNT); + ns = kzalloc(sizeof(*ns), GFP_KERNEL_ACCOUNT); if (!ns) goto fail_dec;
@@ -104,6 +105,7 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns, ns->user_ns = get_user_ns(user_ns); ns->offsets = old_ns->offsets; ns->frozen_offsets = false; + ns_tree_add(ns); return ns;
fail_free_page: @@ -250,11 +252,13 @@ static void timens_set_vvar_page(struct task_struct *task,
void free_time_ns(struct time_namespace *ns) { + ns_tree_remove(ns); dec_time_namespaces(ns->ucounts); put_user_ns(ns->user_ns); ns_free_inum(&ns->ns); __free_page(ns->vvar_page); - kfree(ns); + /* Concurrent nstree traversal depends on a grace period. */ + kfree_rcu(ns, ns.ns_rcu); }
static struct time_namespace *to_time_ns(struct ns_common *ns) @@ -487,3 +491,8 @@ struct time_namespace init_time_ns = { .ns.ops = &timens_operations, .frozen_offsets = true, }; + +void __init time_ns_init(void) +{ + ns_tree_add(&init_time_ns); +}
On Wed, Sep 10 2025 at 16:37, Christian Brauner wrote:
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org
Reviewed-by: Thomas Gleixner tglx@linutronix.de
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/user_namespace.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 98f4fe84d039..ade5b6806c5c 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -21,6 +21,7 @@ #include <linux/fs_struct.h> #include <linux/bsearch.h> #include <linux/sort.h> +#include <linux/nstree.h>
static struct kmem_cache *user_ns_cachep __ro_after_init; static DEFINE_MUTEX(userns_state_mutex); @@ -158,6 +159,7 @@ int create_user_ns(struct cred *new) goto fail_keyring;
set_cred_user_ns(new, ns); + ns_tree_add(ns); return 0; fail_keyring: #ifdef CONFIG_PERSISTENT_KEYRINGS @@ -200,6 +202,7 @@ static void free_user_ns(struct work_struct *work) do { struct ucounts *ucounts = ns->ucounts; parent = ns->parent; + ns_tree_remove(ns); if (ns->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) { kfree(ns->gid_map.forward); kfree(ns->gid_map.reverse); @@ -218,7 +221,8 @@ static void free_user_ns(struct work_struct *work) retire_userns_sysctls(ns); key_free_user_ns(ns); ns_free_inum(&ns->ns); - kmem_cache_free(user_ns_cachep, ns); + /* Concurrent nstree traversal depends on a grace period. */ + kfree_rcu(ns, ns.ns_rcu); dec_user_namespaces(ucounts); ns = parent; } while (refcount_dec_and_test(&parent->ns.count)); @@ -1412,6 +1416,7 @@ const struct proc_ns_operations userns_operations = { static __init int user_namespaces_init(void) { user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC | SLAB_ACCOUNT); + ns_tree_add(&init_user_ns); return 0; } subsys_initcall(user_namespaces_init);
Support the generic namespace iterator and lookup infrastructure to support file handles for namespaces.
Signed-off-by: Christian Brauner brauner@kernel.org --- kernel/utsname.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/utsname.c b/kernel/utsname.c index 02037010b378..64155417ae0c 100644 --- a/kernel/utsname.c +++ b/kernel/utsname.c @@ -13,6 +13,7 @@ #include <linux/cred.h> #include <linux/user_namespace.h> #include <linux/proc_ns.h> +#include <linux/nstree.h> #include <linux/sched/task.h>
static struct kmem_cache *uts_ns_cache __ro_after_init; @@ -58,6 +59,7 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns, memcpy(&ns->name, &old_ns->name, sizeof(ns->name)); ns->user_ns = get_user_ns(user_ns); up_read(&uts_sem); + ns_tree_add(ns); return ns;
fail_free: @@ -93,10 +95,12 @@ struct uts_namespace *copy_utsname(unsigned long flags,
void free_uts_ns(struct uts_namespace *ns) { + ns_tree_remove(ns); dec_uts_namespaces(ns->ucounts); put_user_ns(ns->user_ns); ns_free_inum(&ns->ns); - kmem_cache_free(uts_ns_cache, ns); + /* Concurrent nstree traversal depends on a grace period. */ + kfree_rcu(ns, ns.ns_rcu); }
static inline struct uts_namespace *to_uts_ns(struct ns_common *ns) @@ -162,4 +166,5 @@ void __init uts_ns_init(void) offsetof(struct uts_namespace, name), sizeof_field(struct uts_namespace, name), NULL); + ns_tree_add(&init_uts_ns); }
Every namespace type has a container_of(ns, <ns_type>, ns) static inline function that is currently not exposed in the header. So we have a bunch of places that open-code it via container_of(). Move it to the headers so we can use it directly.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 include/linux/cgroup.h         | 5 +++++
 include/linux/ipc_namespace.h  | 5 +++++
 include/linux/pid_namespace.h  | 5 +++++
 include/linux/time_namespace.h | 4 ++++
 include/linux/user_namespace.h | 5 +++++
 include/linux/utsname.h        | 5 +++++
 include/net/net_namespace.h    | 5 +++++
 ipc/namespace.c                | 5 -----
 kernel/cgroup/namespace.c      | 5 -----
 kernel/pid_namespace.c         | 5 -----
 kernel/time/namespace.c        | 5 -----
 kernel/user_namespace.c        | 5 -----
 kernel/utsname.c               | 5 -----
 net/core/net_namespace.c       | 5 -----
 14 files changed, 34 insertions(+), 35 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b18fb5fcb38e..9ca25346f7cb 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -794,6 +794,11 @@ extern struct cgroup_namespace init_cgroup_ns;
#ifdef CONFIG_CGROUPS
+static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) +{ + return container_of(ns, struct cgroup_namespace, ns); +} + void free_cgroup_ns(struct cgroup_namespace *ns);
struct cgroup_namespace *copy_cgroup_ns(unsigned long flags, diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index e8240cf2611a..924e4754374f 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -129,6 +129,11 @@ static inline int mq_init_ns(struct ipc_namespace *ns) { return 0; } #endif
#if defined(CONFIG_IPC_NS) +static inline struct ipc_namespace *to_ipc_ns(struct ns_common *ns) +{ + return container_of(ns, struct ipc_namespace, ns); +} + extern struct ipc_namespace *copy_ipcs(unsigned long flags, struct user_namespace *user_ns, struct ipc_namespace *ns);
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 7c67a5811199..ba0efc8c8596 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -54,6 +54,11 @@ extern struct pid_namespace init_pid_ns; #define PIDNS_ADDING (1U << 31)
#ifdef CONFIG_PID_NS +static inline struct pid_namespace *to_pid_ns(struct ns_common *ns) +{ + return container_of(ns, struct pid_namespace, ns); +} + static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) { if (ns != &init_pid_ns) diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h index 7f6af7a9771e..a47a4ce4183e 100644 --- a/include/linux/time_namespace.h +++ b/include/linux/time_namespace.h @@ -33,6 +33,10 @@ struct time_namespace { extern struct time_namespace init_time_ns;
#ifdef CONFIG_TIME_NS +static inline struct time_namespace *to_time_ns(struct ns_common *ns) +{ + return container_of(ns, struct time_namespace, ns); +} void __init time_ns_init(void); extern int vdso_join_timens(struct task_struct *task, struct time_namespace *ns); diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index a0bb6d012137..a09056ad090e 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -168,6 +168,11 @@ static inline void set_userns_rlimit_max(struct user_namespace *ns,
#ifdef CONFIG_USER_NS
+static inline struct user_namespace *to_user_ns(struct ns_common *ns) +{ + return container_of(ns, struct user_namespace, ns); +} + static inline struct user_namespace *get_user_ns(struct user_namespace *ns) { if (ns) diff --git a/include/linux/utsname.h b/include/linux/utsname.h index bf7613ba412b..5d34c4f0f945 100644 --- a/include/linux/utsname.h +++ b/include/linux/utsname.h @@ -30,6 +30,11 @@ struct uts_namespace { extern struct uts_namespace init_uts_ns;
#ifdef CONFIG_UTS_NS +static inline struct uts_namespace *to_uts_ns(struct ns_common *ns) +{ + return container_of(ns, struct uts_namespace, ns); +} + static inline void get_uts_ns(struct uts_namespace *ns) { refcount_inc(&ns->ns.count); diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 42075748dff1..b9c5f6c7ee1e 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -263,6 +263,11 @@ void ipx_unregister_sysctl(void); #ifdef CONFIG_NET_NS void __put_net(struct net *net);
+static inline struct net *to_net_ns(struct ns_common *ns) +{ + return container_of(ns, struct net, ns); +} + /* Try using get_net_track() instead */ static inline struct net *get_net(struct net *net) { diff --git a/ipc/namespace.c b/ipc/namespace.c index 9f923c1a1eb3..89588819956b 100644 --- a/ipc/namespace.c +++ b/ipc/namespace.c @@ -209,11 +209,6 @@ void put_ipc_ns(struct ipc_namespace *ns) } }
-static inline struct ipc_namespace *to_ipc_ns(struct ns_common *ns) -{ - return container_of(ns, struct ipc_namespace, ns); -} - static struct ns_common *ipcns_get(struct task_struct *task) { struct ipc_namespace *ns = NULL; diff --git a/kernel/cgroup/namespace.c b/kernel/cgroup/namespace.c index fc12c416dfeb..5a327914b565 100644 --- a/kernel/cgroup/namespace.c +++ b/kernel/cgroup/namespace.c @@ -89,11 +89,6 @@ struct cgroup_namespace *copy_cgroup_ns(unsigned long flags, return new_ns; }
-static inline struct cgroup_namespace *to_cg_ns(struct ns_common *ns) -{ - return container_of(ns, struct cgroup_namespace, ns); -} - static int cgroupns_install(struct nsset *nsset, struct ns_common *ns) { struct nsproxy *nsproxy = nsset->nsproxy; diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 228ae20299f9..9b327420309e 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -345,11 +345,6 @@ int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd) return 0; }
-static inline struct pid_namespace *to_pid_ns(struct ns_common *ns) -{ - return container_of(ns, struct pid_namespace, ns); -} - static struct ns_common *pidns_get(struct task_struct *task) { struct pid_namespace *ns; diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c index 408f60d0a3b6..20b65f90549e 100644 --- a/kernel/time/namespace.c +++ b/kernel/time/namespace.c @@ -261,11 +261,6 @@ void free_time_ns(struct time_namespace *ns) kfree_rcu(ns, ns.ns_rcu); }
-static struct time_namespace *to_time_ns(struct ns_common *ns) -{ - return container_of(ns, struct time_namespace, ns); -} - static struct ns_common *timens_get(struct task_struct *task) { struct time_namespace *ns = NULL; diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index ade5b6806c5c..cfb0e28f2779 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -1325,11 +1325,6 @@ bool current_in_userns(const struct user_namespace *target_ns) } EXPORT_SYMBOL(current_in_userns);
-static inline struct user_namespace *to_user_ns(struct ns_common *ns) -{ - return container_of(ns, struct user_namespace, ns); -} - static struct ns_common *userns_get(struct task_struct *task) { struct user_namespace *user_ns; diff --git a/kernel/utsname.c b/kernel/utsname.c index 64155417ae0c..a682830742d3 100644 --- a/kernel/utsname.c +++ b/kernel/utsname.c @@ -103,11 +103,6 @@ void free_uts_ns(struct uts_namespace *ns) kfree_rcu(ns, ns.ns_rcu); }
-static inline struct uts_namespace *to_uts_ns(struct ns_common *ns) -{ - return container_of(ns, struct uts_namespace, ns); -} - static struct ns_common *utsns_get(struct task_struct *task) { struct uts_namespace *ns = NULL; diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index b85e303400be..ca9b06f3925f 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -1539,11 +1539,6 @@ static struct ns_common *netns_get(struct task_struct *task) return net ? &net->ns : NULL; }
-static inline struct net *to_net_ns(struct ns_common *ns) -{ - return container_of(ns, struct net, ns); -} - static void netns_put(struct ns_common *ns) { put_net(to_net_ns(ns));
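For readers less familiar with the pattern, the to_<type>_ns() helpers above can be illustrated with a minimal userspace sketch. The structures below are hypothetical, heavily reduced stand-ins for the kernel ones, and the container_of() macro is a simplified version without the kernel's type checking; only the shape of to_uts_ns() follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced versions of the kernel structures. */
struct ns_common {
	unsigned long long ns_id;
};

struct uts_namespace {
	char name[16];
	struct ns_common ns;	/* embedded member, deliberately not at offset 0 */
};

/* Same shape as the helpers the patch moves into the headers. */
static inline struct uts_namespace *to_uts_ns(struct ns_common *ns)
{
	return container_of(ns, struct uts_namespace, ns);
}

/* Go from the embedded ns_common back to its containing namespace. */
static int roundtrip_ok(void)
{
	struct uts_namespace uts = { .name = "demo", .ns = { .ns_id = 42 } };
	struct uts_namespace *back = to_uts_ns(&uts.ns);

	assert(back->ns.ns_id == 42);
	return back == &uts;
}
```

Generic code that only holds a struct ns_common pointer can use such a helper to get back to the type-specific structure without open-coding the offset arithmetic at every call site.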
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
Every namespace type has a container_of(ns, <ns_type>, ns) static inline function that is currently not exposed in the header. So we have a bunch of places that open-code it via container_of(). Move it to the headers so we can use it directly.
Yes please! Feel free to add my
Reviewed-by: Aleksa Sarai cyphar@cyphar.com
Add a helper to easily check whether a given namespace is the caller's current namespace. This is currently open-coded in a lot of places. Simply switch on the type and compare the results.
Signed-off-by: Christian Brauner brauner@kernel.org --- include/linux/nsfs.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/nsfs.h b/include/linux/nsfs.h index fb84aa538091..e5a5fa83d36b 100644 --- a/include/linux/nsfs.h +++ b/include/linux/nsfs.h @@ -5,6 +5,8 @@ #define _LINUX_NSFS_H
#include <linux/ns_common.h> +#include <linux/cred.h> +#include <linux/pid_namespace.h>
struct path; struct task_struct; @@ -22,5 +24,17 @@ int ns_get_name(char *buf, size_t size, struct task_struct *task, const struct proc_ns_operations *ns_ops); void nsfs_init(void);
-#endif /* _LINUX_NSFS_H */
+#define __current_namespace_from_type(__ns)				\
+	_Generic((__ns),						\
+		struct cgroup_namespace *: current->nsproxy->cgroup_ns,	\
+		struct ipc_namespace *: current->nsproxy->ipc_ns,	\
+		struct net *: current->nsproxy->net_ns,			\
+		struct pid_namespace *: task_active_pid_ns(current),	\
+		struct mnt_namespace *: current->nsproxy->mnt_ns,	\
+		struct time_namespace *: current->nsproxy->time_ns,	\
+		struct user_namespace *: current_user_ns(),		\
+		struct uts_namespace *: current->nsproxy->uts_ns)
+
+#define current_in_namespace(__ns) (__current_namespace_from_type(__ns) == __ns)
+#endif /* _LINUX_NSFS_H */
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
Add a helper to easily check whether a given namespace is the caller's current namespace. This is currently open-coded in a lot of places. Simply switch on the type and compare the results.
Signed-off-by: Christian Brauner brauner@kernel.org
Looks good, feel free to add my
Reviewed-by: Aleksa Sarai cyphar@cyphar.com
include/linux/nsfs.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/nsfs.h b/include/linux/nsfs.h index fb84aa538091..e5a5fa83d36b 100644 --- a/include/linux/nsfs.h +++ b/include/linux/nsfs.h @@ -5,6 +5,8 @@ #define _LINUX_NSFS_H #include <linux/ns_common.h> +#include <linux/cred.h> +#include <linux/pid_namespace.h> struct path; struct task_struct; @@ -22,5 +24,17 @@ int ns_get_name(char *buf, size_t size, struct task_struct *task, const struct proc_ns_operations *ns_ops); void nsfs_init(void); -#endif /* _LINUX_NSFS_H */ +#define __current_namespace_from_type(__ns) \
_Generic((__ns), \
struct cgroup_namespace *: current->nsproxy->cgroup_ns, \
struct ipc_namespace *: current->nsproxy->ipc_ns, \
struct net *: current->nsproxy->net_ns, \
struct pid_namespace *: task_active_pid_ns(current), \
struct mnt_namespace *: current->nsproxy->mnt_ns, \
struct time_namespace *: current->nsproxy->time_ns, \
struct user_namespace *: current_user_ns(), \
struct uts_namespace *: current->nsproxy->uts_ns)
+#define current_in_namespace(__ns) (__current_namespace_from_type(__ns) == __ns) +#endif /* _LINUX_NSFS_H */
-- 2.47.3
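The type dispatch behind current_in_namespace() can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: the two struct types, the nsproxy stand-in, and the global current_nsproxy are hypothetical placeholders for the real namespace types and for current->nsproxy. It only illustrates how _Generic selects the caller's namespace of the matching type before the pointer comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for two kernel namespace types. */
struct uts_namespace { int id; };
struct net { int id; };

/* Hypothetical stand-in for current->nsproxy. */
struct nsproxy {
	struct uts_namespace *uts_ns;
	struct net *net_ns;
};

static struct uts_namespace init_uts = { 1 };
static struct net init_net = { 2 };
static struct nsproxy current_nsproxy = { &init_uts, &init_net };

/* Pick the caller's namespace whose type matches the argument's type. */
#define current_namespace_from_type(ns)                         \
	_Generic((ns),                                          \
		struct uts_namespace *: current_nsproxy.uts_ns, \
		struct net *: current_nsproxy.net_ns)

/* True if ns is the namespace the caller currently lives in. */
#define current_in_namespace(ns) (current_namespace_from_type(ns) == (ns))

/* Small self-check: the initial namespaces are the caller's own. */
void self_check(void)
{
	assert(current_in_namespace(&init_uts));
	assert(current_in_namespace(&init_net));
}
```

Because the selection happens at compile time, passing a pointer type that has no association is a build error rather than a runtime failure, which is presumably part of the appeal over open-coded comparisons.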
A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful. Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in.
Implement the exhaustive file handles for namespaces as well.
Signed-off-by: Christian Brauner brauner@kernel.org --- fs/nsfs.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/exportfs.h | 6 ++ 2 files changed, 182 insertions(+)
diff --git a/fs/nsfs.c b/fs/nsfs.c index 6f8008177133..a1585a2f4f03 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -13,6 +13,12 @@ #include <linux/nsfs.h> #include <linux/uaccess.h> #include <linux/mnt_namespace.h> +#include <linux/ipc_namespace.h> +#include <linux/time_namespace.h> +#include <linux/utsname.h> +#include <linux/exportfs.h> +#include <linux/nstree.h> +#include <net/net_namespace.h>
#include "mount.h" #include "internal.h" @@ -417,12 +423,182 @@ static const struct stashed_operations nsfs_stashed_ops = { .put_data = nsfs_put_data, };
+struct nsfs_fid { + u64 ns_id; + u32 ns_type; + u32 ns_inum; +} __attribute__ ((packed)); + +#define NSFS_FID_SIZE (sizeof(struct nsfs_fid) / sizeof(u32)) + +static int nsfs_encode_fh(struct inode *inode, u32 *fh, int *max_len, + struct inode *parent) +{ + struct nsfs_fid *fid = (struct nsfs_fid *)fh; + struct ns_common *ns = inode->i_private; + int len = *max_len; + + /* + * TODO: + * For hierarchical namespaces we should start to encode the + * parent namespace. Then userspace can walk a namespace + * hierarchy purely based on file handles. + */ + if (parent) + return FILEID_INVALID; + + if (len < NSFS_FID_SIZE) { + *max_len = NSFS_FID_SIZE; + return FILEID_INVALID; + } + + len = NSFS_FID_SIZE; + + fid->ns_id = ns->ns_id; + fid->ns_type = ns->ops->type; + fid->ns_inum = inode->i_ino; + *max_len = len; + return FILEID_NSFS; +} + +static struct dentry *nsfs_fh_to_dentry(struct super_block *sb, struct fid *fh, + int fh_len, int fh_type) +{ + struct path path __free(path_put) = {}; + struct nsfs_fid *fid = (struct nsfs_fid *)fh; + struct user_namespace *owning_ns = NULL; + struct ns_common *ns; + int ret; + + if (fh_len < NSFS_FID_SIZE) + return NULL; + + switch (fh_type) { + case FILEID_NSFS: + break; + default: + return NULL; + } + + scoped_guard(rcu) { + ns = ns_tree_lookup_rcu(fid->ns_id, fid->ns_type); + if (!ns) + return NULL; + + VFS_WARN_ON_ONCE(ns->ns_id != fid->ns_id); + VFS_WARN_ON_ONCE(ns->ops->type != fid->ns_type); + VFS_WARN_ON_ONCE(ns->inum != fid->ns_inum); + + if (!refcount_inc_not_zero(&ns->count)) + return NULL; + } + + switch (ns->ops->type) { +#ifdef CONFIG_CGROUPS + case CLONE_NEWCGROUP: + if (!current_in_namespace(to_cg_ns(ns))) + owning_ns = to_cg_ns(ns)->user_ns; + break; +#endif +#ifdef CONFIG_IPC_NS + case CLONE_NEWIPC: + if (!current_in_namespace(to_ipc_ns(ns))) + owning_ns = to_ipc_ns(ns)->user_ns; + break; +#endif + case CLONE_NEWNS: + if (!current_in_namespace(to_mnt_ns(ns))) + owning_ns = to_mnt_ns(ns)->user_ns; + break; +#ifdef 
CONFIG_NET_NS + case CLONE_NEWNET: + if (!current_in_namespace(to_net_ns(ns))) + owning_ns = to_net_ns(ns)->user_ns; + break; +#endif +#ifdef CONFIG_PID_NS + case CLONE_NEWPID: + if (!current_in_namespace(to_pid_ns(ns))) { + owning_ns = to_pid_ns(ns)->user_ns; + } else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) { + ns->ops->put(ns); + return ERR_PTR(-EPERM); + } + break; +#endif +#ifdef CONFIG_TIME_NS + case CLONE_NEWTIME: + if (!current_in_namespace(to_time_ns(ns))) + owning_ns = to_time_ns(ns)->user_ns; + break; +#endif +#ifdef CONFIG_USER_NS + case CLONE_NEWUSER: + if (!current_in_namespace(to_user_ns(ns))) + owning_ns = to_user_ns(ns); + break; +#endif +#ifdef CONFIG_UTS_NS + case CLONE_NEWUTS: + if (!current_in_namespace(to_uts_ns(ns))) + owning_ns = to_uts_ns(ns)->user_ns; + break; +#endif + default: + return ERR_PTR(-EOPNOTSUPP); + } + + if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) { + ns->ops->put(ns); + return ERR_PTR(-EPERM); + } + + /* path_from_stashed() unconditionally consumes the reference. */ + ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path); + if (ret) + return ERR_PTR(ret); + + return no_free_ptr(path.dentry); +} + +/* + * Make sure that we reject any nonsensical flags that users pass via + * open_by_handle_at(). + */ +#define VALID_FILE_HANDLE_OPEN_FLAGS \ + (O_RDONLY | O_WRONLY | O_RDWR | O_NONBLOCK | O_CLOEXEC | O_EXCL) + +static int nsfs_export_permission(struct handle_to_path_ctx *ctx, + unsigned int oflags) +{ + if (oflags & ~(VALID_FILE_HANDLE_OPEN_FLAGS | O_LARGEFILE)) + return -EINVAL; + + /* nsfs_fh_to_dentry() is performs further permission checks. */ + return 0; +} + +static struct file *nsfs_export_open(struct path *path, unsigned int oflags) +{ + /* Clear O_LARGEFILE as open_by_handle_at() forces it. 
*/ + oflags &= ~O_LARGEFILE; + return file_open_root(path, "", oflags, 0); +} + +static const struct export_operations nsfs_export_operations = { + .encode_fh = nsfs_encode_fh, + .fh_to_dentry = nsfs_fh_to_dentry, + .open = nsfs_export_open, + .permission = nsfs_export_permission, +}; + static int nsfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &nsfs_ops; + ctx->eops = &nsfs_export_operations; ctx->dops = &ns_dentry_operations; fc->s_fs_info = (void *)&nsfs_stashed_ops; return 0; diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index cfb0dd1ea49c..3aac58a520c7 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -122,6 +122,12 @@ enum fid_type { FILEID_BCACHEFS_WITHOUT_PARENT = 0xb1, FILEID_BCACHEFS_WITH_PARENT = 0xb2,
+ /* + * + * 64 bit namespace identifier, 32 bit namespace type, 32 bit inode number. + */ + FILEID_NSFS = 0xf1, + /* * 64 bit unique kernfs id */
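To make the handle layout concrete, here is a minimal userspace sketch of the encode step, assuming the same 64-bit id, 32-bit type, 32-bit inode layout as struct nsfs_fid above. The function encode_fid() and the FID_OK/FID_INVALID codes are hypothetical names used only for this illustration; the kernel returns FILEID_NSFS and FILEID_INVALID instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as in the patch: 8 + 4 + 4 bytes, no padding. */
struct nsfs_fid {
	uint64_t ns_id;
	uint32_t ns_type;
	uint32_t ns_inum;
} __attribute__((packed));

/* Handle lengths are counted in 32-bit words, so this evaluates to 4. */
#define NSFS_FID_WORDS (sizeof(struct nsfs_fid) / sizeof(uint32_t))

static_assert(sizeof(struct nsfs_fid) == 16, "fid must be 16 bytes");

/* Hypothetical return codes for this sketch only. */
enum { FID_INVALID = -1, FID_OK = 0 };

int encode_fid(uint32_t *fh, int *max_len, uint64_t id, uint32_t type,
	       uint32_t inum)
{
	struct nsfs_fid fid = { .ns_id = id, .ns_type = type, .ns_inum = inum };

	if (*max_len < (int)NSFS_FID_WORDS) {
		/* Report the required size so the caller can retry. */
		*max_len = NSFS_FID_WORDS;
		return FID_INVALID;
	}

	memcpy(fh, &fid, sizeof(fid));
	*max_len = NSFS_FID_WORDS;
	return FID_OK;
}
```

A caller that passes a buffer that is too small gets the required length back in *max_len and can retry, which matches the retry convention of name_to_handle_at().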
On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner brauner@kernel.org wrote:
A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful.
Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in.
Implement the exhaustive file handles for namespaces as well.
I think you decided to split the "exhaustive" part into another patch, so better to drop this paragraph?
I am missing an explanation about the permissions for opening these file handles.
My understanding of the code is that the opener needs to meet one of the conditions:
1. user has CAP_SYS_ADMIN in the userns owning the opened namespace
2. current task is in the opened namespace
But I do not fully understand the rationale behind the 2nd condition, that is, when is it useful? And as far as I can tell, your selftest does not cover this condition (only both true or both false)?
I suggest starting with only the useful and important cases: if cond #1 is useful enough, drop cond #2. We can add it later if needed, and your selftests already cover cond #1 true and false.
Signed-off-by: Christian Brauner brauner@kernel.org
After documenting the permissions, with or without dropping cond #2, feel free to add:
Reviewed-by: Amir Goldstein amir73il@gmail.com
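The two conditions, plus the extra pid namespace case in nsfs_fh_to_dentry(), can be written down as a small decision function. This is a minimal sketch of my reading of the patch, not kernel code: struct open_request and may_open_by_handle() are hypothetical names, and the booleans stand in for current_in_namespace(), ns_capable(owning_ns, CAP_SYS_ADMIN), and the child_reaper check.

```c
#include <assert.h>
#include <stdbool.h>

/* Inputs to the decision; every name here is hypothetical. */
struct open_request {
	bool caller_in_ns;      /* caller is a member of the namespace */
	bool caller_has_admin;  /* CAP_SYS_ADMIN in the owning user namespace */
	bool is_pid_ns;         /* the handle refers to a pid namespace */
	bool pid_ns_has_reaper; /* that pid namespace has a child reaper */
};

bool may_open_by_handle(const struct open_request *req)
{
	assert(req);

	if (req->caller_in_ns) {
		/* A pid namespace without a child reaper is refused to members. */
		if (req->is_pid_ns && !req->pid_ns_has_reaper)
			return false;
		return true;
	}

	/* Outsiders need privilege over the owning user namespace. */
	return req->caller_has_admin;
}
```

Note that in this reading membership is checked first, so the capability check only matters for callers outside the namespace.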
fs/nsfs.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/exportfs.h | 6 ++ 2 files changed, 182 insertions(+)
diff --git a/fs/nsfs.c b/fs/nsfs.c index 6f8008177133..a1585a2f4f03 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -13,6 +13,12 @@ #include <linux/nsfs.h> #include <linux/uaccess.h> #include <linux/mnt_namespace.h> +#include <linux/ipc_namespace.h> +#include <linux/time_namespace.h> +#include <linux/utsname.h> +#include <linux/exportfs.h> +#include <linux/nstree.h> +#include <net/net_namespace.h>
#include "mount.h" #include "internal.h" @@ -417,12 +423,182 @@ static const struct stashed_operations nsfs_stashed_ops = { .put_data = nsfs_put_data, };
+struct nsfs_fid {
u64 ns_id;
u32 ns_type;
u32 ns_inum;
+} __attribute__ ((packed));
+#define NSFS_FID_SIZE (sizeof(struct nsfs_fid) / sizeof(u32))
+static int nsfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
struct inode *parent)
+{
struct nsfs_fid *fid = (struct nsfs_fid *)fh;
struct ns_common *ns = inode->i_private;
int len = *max_len;
/*
* TODO:
* For hierarchical namespaces we should start to encode the
* parent namespace. Then userspace can walk a namespace
* hierarchy purely based on file handles.
*/
if (parent)
return FILEID_INVALID;
if (len < NSFS_FID_SIZE) {
*max_len = NSFS_FID_SIZE;
return FILEID_INVALID;
}
len = NSFS_FID_SIZE;
fid->ns_id = ns->ns_id;
fid->ns_type = ns->ops->type;
fid->ns_inum = inode->i_ino;
*max_len = len;
return FILEID_NSFS;
+}
+static struct dentry *nsfs_fh_to_dentry(struct super_block *sb, struct fid *fh,
int fh_len, int fh_type)
+{
struct path path __free(path_put) = {};
struct nsfs_fid *fid = (struct nsfs_fid *)fh;
struct user_namespace *owning_ns = NULL;
struct ns_common *ns;
int ret;
if (fh_len < NSFS_FID_SIZE)
return NULL;
switch (fh_type) {
case FILEID_NSFS:
break;
default:
return NULL;
}
scoped_guard(rcu) {
ns = ns_tree_lookup_rcu(fid->ns_id, fid->ns_type);
if (!ns)
return NULL;
VFS_WARN_ON_ONCE(ns->ns_id != fid->ns_id);
VFS_WARN_ON_ONCE(ns->ops->type != fid->ns_type);
VFS_WARN_ON_ONCE(ns->inum != fid->ns_inum);
if (!refcount_inc_not_zero(&ns->count))
return NULL;
}
switch (ns->ops->type) {
+#ifdef CONFIG_CGROUPS
case CLONE_NEWCGROUP:
if (!current_in_namespace(to_cg_ns(ns)))
owning_ns = to_cg_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_IPC_NS
case CLONE_NEWIPC:
if (!current_in_namespace(to_ipc_ns(ns)))
owning_ns = to_ipc_ns(ns)->user_ns;
break;
+#endif
case CLONE_NEWNS:
if (!current_in_namespace(to_mnt_ns(ns)))
owning_ns = to_mnt_ns(ns)->user_ns;
break;
+#ifdef CONFIG_NET_NS
case CLONE_NEWNET:
if (!current_in_namespace(to_net_ns(ns)))
owning_ns = to_net_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_PID_NS
case CLONE_NEWPID:
if (!current_in_namespace(to_pid_ns(ns))) {
owning_ns = to_pid_ns(ns)->user_ns;
} else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) {
ns->ops->put(ns);
return ERR_PTR(-EPERM);
}
break;
+#endif +#ifdef CONFIG_TIME_NS
case CLONE_NEWTIME:
if (!current_in_namespace(to_time_ns(ns)))
owning_ns = to_time_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_USER_NS
case CLONE_NEWUSER:
if (!current_in_namespace(to_user_ns(ns)))
owning_ns = to_user_ns(ns);
break;
+#endif +#ifdef CONFIG_UTS_NS
case CLONE_NEWUTS:
if (!current_in_namespace(to_uts_ns(ns)))
owning_ns = to_uts_ns(ns)->user_ns;
break;
+#endif
default:
return ERR_PTR(-EOPNOTSUPP);
}
if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) {
ns->ops->put(ns);
return ERR_PTR(-EPERM);
}
/* path_from_stashed() unconditionally consumes the reference. */
ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
if (ret)
return ERR_PTR(ret);
return no_free_ptr(path.dentry);
+}
+/*
 * Make sure that we reject any nonsensical flags that users pass via
 * open_by_handle_at().
 */
+#define VALID_FILE_HANDLE_OPEN_FLAGS \
(O_RDONLY | O_WRONLY | O_RDWR | O_NONBLOCK | O_CLOEXEC | O_EXCL)
+static int nsfs_export_permission(struct handle_to_path_ctx *ctx,
unsigned int oflags)
+{
if (oflags & ~(VALID_FILE_HANDLE_OPEN_FLAGS | O_LARGEFILE))
return -EINVAL;
/* nsfs_fh_to_dentry() is performs further permission checks. */
return 0;
+}
+static struct file *nsfs_export_open(struct path *path, unsigned int oflags) +{
/* Clear O_LARGEFILE as open_by_handle_at() forces it. */
oflags &= ~O_LARGEFILE;
return file_open_root(path, "", oflags, 0);
+}
+static const struct export_operations nsfs_export_operations = {
.encode_fh = nsfs_encode_fh,
.fh_to_dentry = nsfs_fh_to_dentry,
.open = nsfs_export_open,
.permission = nsfs_export_permission,
+};
static int nsfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &nsfs_ops;
ctx->eops = &nsfs_export_operations; ctx->dops = &ns_dentry_operations; fc->s_fs_info = (void *)&nsfs_stashed_ops; return 0;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index cfb0dd1ea49c..3aac58a520c7 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -122,6 +122,12 @@ enum fid_type { FILEID_BCACHEFS_WITHOUT_PARENT = 0xb1, FILEID_BCACHEFS_WITH_PARENT = 0xb2,
/*
*
* 64 bit namespace identifier, 32 bit namespace type, 32 bit inode number.
*/
FILEID_NSFS = 0xf1,
/* * 64 bit unique kernfs id */
-- 2.47.3
On Wed, Sep 10, 2025 at 07:21:22PM +0200, Amir Goldstein wrote:
On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner brauner@kernel.org wrote:
A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful.
Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in.
Implement the exhaustive file handles for namespaces as well.
I think you decided to split the "exhaustive" part into another patch, so better to drop this paragraph?
Yes, good point. I've done that.
I am missing an explanation about the permissions for opening these file handles.
My understanding of the code is that the opener needs to meet one of the conditions:
- user has CAP_SYS_ADMIN in the userns owning the opened namespace
- current task is in the opened namespace
Yes.
But I do not fully understand the rationale behind the 2nd condition, that is, when is it useful?
A caller is always able to open a file descriptor to its own set of namespaces. File handles will behave the same way.
And as far as I can tell, your selftest does not cover this condition (only both true or both false)?
I've added this now.
I suggest starting with only the useful and important cases: if cond #1 is useful enough, drop cond #2. We can add it later if needed, and your selftests already cover cond #1 true and false.
Signed-off-by: Christian Brauner brauner@kernel.org
After documenting the permissions, with or without dropping cond #2, feel free to add:
Reviewed-by: Amir Goldstein amir73il@gmail.com
Thanks!
fs/nsfs.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/exportfs.h | 6 ++ 2 files changed, 182 insertions(+)
diff --git a/fs/nsfs.c b/fs/nsfs.c index 6f8008177133..a1585a2f4f03 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -13,6 +13,12 @@ #include <linux/nsfs.h> #include <linux/uaccess.h> #include <linux/mnt_namespace.h> +#include <linux/ipc_namespace.h> +#include <linux/time_namespace.h> +#include <linux/utsname.h> +#include <linux/exportfs.h> +#include <linux/nstree.h> +#include <net/net_namespace.h>
#include "mount.h" #include "internal.h" @@ -417,12 +423,182 @@ static const struct stashed_operations nsfs_stashed_ops = { .put_data = nsfs_put_data, };
+struct nsfs_fid {
u64 ns_id;
u32 ns_type;
u32 ns_inum;
+} __attribute__ ((packed));
+#define NSFS_FID_SIZE (sizeof(struct nsfs_fid) / sizeof(u32))
+static int nsfs_encode_fh(struct inode *inode, u32 *fh, int *max_len,
struct inode *parent)
+{
struct nsfs_fid *fid = (struct nsfs_fid *)fh;
struct ns_common *ns = inode->i_private;
int len = *max_len;
/*
* TODO:
* For hierarchical namespaces we should start to encode the
* parent namespace. Then userspace can walk a namespace
* hierarchy purely based on file handles.
*/
if (parent)
return FILEID_INVALID;
if (len < NSFS_FID_SIZE) {
*max_len = NSFS_FID_SIZE;
return FILEID_INVALID;
}
len = NSFS_FID_SIZE;
fid->ns_id = ns->ns_id;
fid->ns_type = ns->ops->type;
fid->ns_inum = inode->i_ino;
*max_len = len;
return FILEID_NSFS;
+}
+static struct dentry *nsfs_fh_to_dentry(struct super_block *sb, struct fid *fh,
int fh_len, int fh_type)
+{
struct path path __free(path_put) = {};
struct nsfs_fid *fid = (struct nsfs_fid *)fh;
struct user_namespace *owning_ns = NULL;
struct ns_common *ns;
int ret;
if (fh_len < NSFS_FID_SIZE)
return NULL;
switch (fh_type) {
case FILEID_NSFS:
break;
default:
return NULL;
}
scoped_guard(rcu) {
ns = ns_tree_lookup_rcu(fid->ns_id, fid->ns_type);
if (!ns)
return NULL;
VFS_WARN_ON_ONCE(ns->ns_id != fid->ns_id);
VFS_WARN_ON_ONCE(ns->ops->type != fid->ns_type);
VFS_WARN_ON_ONCE(ns->inum != fid->ns_inum);
if (!refcount_inc_not_zero(&ns->count))
return NULL;
}
switch (ns->ops->type) {
+#ifdef CONFIG_CGROUPS
case CLONE_NEWCGROUP:
if (!current_in_namespace(to_cg_ns(ns)))
owning_ns = to_cg_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_IPC_NS
case CLONE_NEWIPC:
if (!current_in_namespace(to_ipc_ns(ns)))
owning_ns = to_ipc_ns(ns)->user_ns;
break;
+#endif
case CLONE_NEWNS:
if (!current_in_namespace(to_mnt_ns(ns)))
owning_ns = to_mnt_ns(ns)->user_ns;
break;
+#ifdef CONFIG_NET_NS
case CLONE_NEWNET:
if (!current_in_namespace(to_net_ns(ns)))
owning_ns = to_net_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_PID_NS
case CLONE_NEWPID:
if (!current_in_namespace(to_pid_ns(ns))) {
owning_ns = to_pid_ns(ns)->user_ns;
} else if (!READ_ONCE(to_pid_ns(ns)->child_reaper)) {
ns->ops->put(ns);
return ERR_PTR(-EPERM);
}
break;
+#endif +#ifdef CONFIG_TIME_NS
case CLONE_NEWTIME:
if (!current_in_namespace(to_time_ns(ns)))
owning_ns = to_time_ns(ns)->user_ns;
break;
+#endif +#ifdef CONFIG_USER_NS
case CLONE_NEWUSER:
if (!current_in_namespace(to_user_ns(ns)))
owning_ns = to_user_ns(ns);
break;
+#endif +#ifdef CONFIG_UTS_NS
case CLONE_NEWUTS:
if (!current_in_namespace(to_uts_ns(ns)))
owning_ns = to_uts_ns(ns)->user_ns;
break;
+#endif
default:
return ERR_PTR(-EOPNOTSUPP);
}
if (owning_ns && !ns_capable(owning_ns, CAP_SYS_ADMIN)) {
ns->ops->put(ns);
return ERR_PTR(-EPERM);
}
/* path_from_stashed() unconditionally consumes the reference. */
ret = path_from_stashed(&ns->stashed, nsfs_mnt, ns, &path);
if (ret)
return ERR_PTR(ret);
return no_free_ptr(path.dentry);
+}
+/*
 * Make sure that we reject any nonsensical flags that users pass via
 * open_by_handle_at().
 */
+#define VALID_FILE_HANDLE_OPEN_FLAGS \
(O_RDONLY | O_WRONLY | O_RDWR | O_NONBLOCK | O_CLOEXEC | O_EXCL)
+static int nsfs_export_permission(struct handle_to_path_ctx *ctx,
unsigned int oflags)
+{
if (oflags & ~(VALID_FILE_HANDLE_OPEN_FLAGS | O_LARGEFILE))
return -EINVAL;
/* nsfs_fh_to_dentry() is performs further permission checks. */
return 0;
+}
+static struct file *nsfs_export_open(struct path *path, unsigned int oflags) +{
/* Clear O_LARGEFILE as open_by_handle_at() forces it. */
oflags &= ~O_LARGEFILE;
return file_open_root(path, "", oflags, 0);
+}
+static const struct export_operations nsfs_export_operations = {
.encode_fh = nsfs_encode_fh,
.fh_to_dentry = nsfs_fh_to_dentry,
.open = nsfs_export_open,
.permission = nsfs_export_permission,
+};
static int nsfs_init_fs_context(struct fs_context *fc) { struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC); if (!ctx) return -ENOMEM; ctx->ops = &nsfs_ops;
ctx->eops = &nsfs_export_operations; ctx->dops = &ns_dentry_operations; fc->s_fs_info = (void *)&nsfs_stashed_ops; return 0;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index cfb0dd1ea49c..3aac58a520c7 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -122,6 +122,12 @@ enum fid_type { FILEID_BCACHEFS_WITHOUT_PARENT = 0xb1, FILEID_BCACHEFS_WITH_PARENT = 0xb2,
/*
*
* 64 bit namespace identifier, 32 bit namespace type, 32 bit inode number.
*/
FILEID_NSFS = 0xf1,
/* * 64 bit unique kernfs id */
-- 2.47.3
On Thu, Sep 11, 2025 at 11:31 AM Christian Brauner brauner@kernel.org wrote:
On Wed, Sep 10, 2025 at 07:21:22PM +0200, Amir Goldstein wrote:
On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner brauner@kernel.org wrote:
A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful.
Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in.
Implement the exhaustive file handles for namespaces as well.
I think you decided to split the "exhaustive" part into another patch, so better to drop this paragraph?
Yes, good point. I've done that.
I am missing an explanation about the permissions for opening these file handles.
My understanding of the code is that the opener needs to meet one of the conditions:
- user has CAP_SYS_ADMIN in the userns owning the opened namespace
- current task is in the opened namespace
Yes.
But I do not fully understand the rationale behind the 2nd condition, that is, when is it useful?
A caller is always able to open a file descriptor to its own set of namespaces. File handles will behave the same way.
I understand why it's safe, and I do not object to it at all. I just feel that I do not fully understand the use case of how ns file handles are expected to be used. A process can always open /proc/self/ns/mnt. What's the use case where a process may need to open its own ns by handle?
I will explain. For CAP_SYS_ADMIN I can see why keeping handles that do not keep an elevated refcount of ns object could be useful in the same way that an NFS client keeps file handles without keeping the file object alive.
But if you do not have CAP_SYS_ADMIN and can only open your own ns by handle, what is the application that could make use of this? And what's the benefit of such an application keeping a file handle instead of an ns fd?
Sorry. I feel that I may be missing something in the big picture.
Thanks, Amir.
Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in. Instead it can be derived from the file handle itself. The same is possible for namespace file handles.
Signed-off-by: Christian Brauner brauner@kernel.org --- fs/fhandle.c | 6 ++++++ fs/internal.h | 1 + fs/nsfs.c | 10 ++++++++++ include/uapi/linux/fcntl.h | 1 + 4 files changed, 18 insertions(+)
diff --git a/fs/fhandle.c b/fs/fhandle.c index 7c236f64cdea..f18c855bb0c2 100644 --- a/fs/fhandle.c +++ b/fs/fhandle.c @@ -11,6 +11,7 @@ #include <linux/personality.h> #include <linux/uaccess.h> #include <linux/compat.h> +#include <linux/nsfs.h> #include "internal.h" #include "mount.h"
@@ -189,6 +190,11 @@ static int get_path_anchor(int fd, struct path *root) return 0; }
+ if (fd == FD_NSFS_ROOT) { + nsfs_get_root(root); + return 0; + } + return -EBADF; }
diff --git a/fs/internal.h b/fs/internal.h index 38e8aab27bbd..a33d18ee5b74 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -355,3 +355,4 @@ int anon_inode_getattr(struct mnt_idmap *idmap, const struct path *path, int anon_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr); void pidfs_get_root(struct path *path); +void nsfs_get_root(struct path *path); diff --git a/fs/nsfs.c b/fs/nsfs.c index a1585a2f4f03..3c6fcf652633 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -25,6 +25,14 @@
static struct vfsmount *nsfs_mnt;
+static struct path nsfs_root_path = {}; + +void nsfs_get_root(struct path *path) +{ + *path = nsfs_root_path; + path_get(path); +} + static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); static const struct file_operations ns_file_operations = { @@ -616,4 +624,6 @@ void __init nsfs_init(void) if (IS_ERR(nsfs_mnt)) panic("can't set nsfs up\n"); nsfs_mnt->mnt_sb->s_flags &= ~SB_NOUSER; + nsfs_root_path.mnt = nsfs_mnt; + nsfs_root_path.dentry = nsfs_mnt->mnt_root; } diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index f291ab4f94eb..3741ea1b73d8 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -111,6 +111,7 @@ #define PIDFD_SELF_THREAD_GROUP -10001 /* Current thread group leader. */
#define FD_PIDFS_ROOT -10002 /* Root of the pidfs filesystem */ +#define FD_NSFS_ROOT -10003 /* Root of the nsfs filesystem */ #define FD_INVALID -10009 /* Invalid file descriptor: -10000 - EBADF = -10009 */
/* Generic flags for the *at(2) family of syscalls. */
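From userspace, the exhaustive handle means no nsfs file descriptor has to be kept around just to decode. Below is a minimal sketch, assuming a kernel with this series applied: it encodes a handle for a namespace file and reopens it through FD_NSFS_ROOT. The helper name reopen_ns_by_handle() is hypothetical, and the FD_NSFS_ROOT fallback define simply copies the value from this patch for systems whose headers lack it. On kernels without the series the helper returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef FD_NSFS_ROOT
#define FD_NSFS_ROOT -10003 /* value from this patch; absent from older headers */
#endif

/*
 * Encode a file handle for the namespace file at ns_path, then reopen the
 * namespace using only that handle. Returns a new fd, or -1 if any step
 * fails (for example on a kernel without this series).
 */
int reopen_ns_by_handle(const char *ns_path)
{
	struct file_handle *fh;
	int mount_id, ns_fd, fd = -1;

	assert(ns_path);

	/* /proc/<pid>/ns/<type> is a magic link, so open it to reach nsfs. */
	ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0)
		return -1;

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		close(ns_fd);
		return -1;
	}
	fh->handle_bytes = MAX_HANDLE_SZ;

	/* AT_EMPTY_PATH encodes the handle of ns_fd itself. */
	if (name_to_handle_at(ns_fd, "", fh, &mount_id, AT_EMPTY_PATH) == 0)
		fd = open_by_handle_at(FD_NSFS_ROOT, fh, O_RDONLY | O_CLOEXEC);

	free(fh);
	close(ns_fd);
	return fd;
}
```

A caller would pass something like "/proc/self/ns/net" and treat -1 as "not supported or not permitted". In a real program the handle would be stored and decoded later, after the original descriptor has been closed.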
On Wed, Sep 10, 2025 at 4:39 PM Christian Brauner brauner@kernel.org wrote:
Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in. Instead it can be derived from the file handle itself. The same is possible for namespace file handles.
Signed-off-by: Christian Brauner brauner@kernel.org
Reviewed-by: Amir Goldstein amir73il@gmail.com
fs/fhandle.c | 6 ++++++ fs/internal.h | 1 + fs/nsfs.c | 10 ++++++++++ include/uapi/linux/fcntl.h | 1 + 4 files changed, 18 insertions(+)
diff --git a/fs/fhandle.c b/fs/fhandle.c index 7c236f64cdea..f18c855bb0c2 100644 --- a/fs/fhandle.c +++ b/fs/fhandle.c @@ -11,6 +11,7 @@ #include <linux/personality.h> #include <linux/uaccess.h> #include <linux/compat.h> +#include <linux/nsfs.h> #include "internal.h" #include "mount.h"
@@ -189,6 +190,11 @@ static int get_path_anchor(int fd, struct path *root) return 0; }
if (fd == FD_NSFS_ROOT) {
nsfs_get_root(root);
return 0;
}
return -EBADF;
}
diff --git a/fs/internal.h b/fs/internal.h index 38e8aab27bbd..a33d18ee5b74 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -355,3 +355,4 @@ int anon_inode_getattr(struct mnt_idmap *idmap, const struct path *path, int anon_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr); void pidfs_get_root(struct path *path); +void nsfs_get_root(struct path *path); diff --git a/fs/nsfs.c b/fs/nsfs.c index a1585a2f4f03..3c6fcf652633 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -25,6 +25,14 @@
static struct vfsmount *nsfs_mnt;
+static struct path nsfs_root_path = {};
+void nsfs_get_root(struct path *path) +{
*path = nsfs_root_path;
path_get(path);
+}
static long ns_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); static const struct file_operations ns_file_operations = { @@ -616,4 +624,6 @@ void __init nsfs_init(void) if (IS_ERR(nsfs_mnt)) panic("can't set nsfs up\n"); nsfs_mnt->mnt_sb->s_flags &= ~SB_NOUSER;
nsfs_root_path.mnt = nsfs_mnt;
nsfs_root_path.dentry = nsfs_mnt->mnt_root;
} diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index f291ab4f94eb..3741ea1b73d8 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -111,6 +111,7 @@ #define PIDFD_SELF_THREAD_GROUP -10001 /* Current thread group leader. */
#define FD_PIDFS_ROOT -10002 /* Root of the pidfs filesystem */ +#define FD_NSFS_ROOT -10003 /* Root of the nsfs filesystem */ #define FD_INVALID -10009 /* Invalid file descriptor: -10000 - EBADF = -10009 */
/* Generic flags for the *at(2) family of syscalls. */
-- 2.47.3
The mount namespace has supported id retrieval for a while already. Add support for the other types as well.
Signed-off-by: Christian Brauner brauner@kernel.org --- fs/nsfs.c | 74 +++++++++++++++++++++++++++++++++++++++-------- include/uapi/linux/nsfs.h | 12 ++++++-- 2 files changed, 72 insertions(+), 14 deletions(-)
diff --git a/fs/nsfs.c b/fs/nsfs.c index 3c6fcf652633..527480e67fd1 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -173,6 +173,13 @@ static bool nsfs_ioctl_valid(unsigned int cmd) case NS_GET_NSTYPE: case NS_GET_OWNER_UID: case NS_GET_MNTNS_ID: + case NS_GET_NETNS_ID: + case NS_GET_CGROUPNS_ID: + case NS_GET_IPCNS_ID: + case NS_GET_UTSNS_ID: + case NS_GET_PIDNS_ID: + case NS_GET_TIMENS_ID: + case NS_GET_USERNS_ID: case NS_GET_PID_FROM_PIDNS: case NS_GET_TGID_FROM_PIDNS: case NS_GET_PID_IN_PIDNS: @@ -226,18 +233,6 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl, argp = (uid_t __user *) arg; uid = from_kuid_munged(current_user_ns(), user_ns->owner); return put_user(uid, argp); - case NS_GET_MNTNS_ID: { - __u64 __user *idp; - __u64 id; - - if (ns->ops->type != CLONE_NEWNS) - return -EINVAL; - - mnt_ns = container_of(ns, struct mnt_namespace, ns); - idp = (__u64 __user *)arg; - id = mnt_ns->ns.ns_id; - return put_user(id, idp); - } case NS_GET_PID_FROM_PIDNS: fallthrough; case NS_GET_TGID_FROM_PIDNS: @@ -283,6 +278,61 @@ static long ns_ioctl(struct file *filp, unsigned int ioctl, ret = -ESRCH; return ret; } + case NS_GET_MNTNS_ID: + fallthrough; + case NS_GET_NETNS_ID: + fallthrough; + case NS_GET_CGROUPNS_ID: + fallthrough; + case NS_GET_IPCNS_ID: + fallthrough; + case NS_GET_UTSNS_ID: + fallthrough; + case NS_GET_PIDNS_ID: + fallthrough; + case NS_GET_TIMENS_ID: + fallthrough; + case NS_GET_USERNS_ID: { + __u64 __user *idp; + __u64 id; + int expected_type; + + switch (ioctl) { + case NS_GET_MNTNS_ID: + expected_type = CLONE_NEWNS; + break; + case NS_GET_NETNS_ID: + expected_type = CLONE_NEWNET; + break; + case NS_GET_CGROUPNS_ID: + expected_type = CLONE_NEWCGROUP; + break; + case NS_GET_IPCNS_ID: + expected_type = CLONE_NEWIPC; + break; + case NS_GET_UTSNS_ID: + expected_type = CLONE_NEWUTS; + break; + case NS_GET_PIDNS_ID: + expected_type = CLONE_NEWPID; + break; + case NS_GET_TIMENS_ID: + expected_type = CLONE_NEWTIME; + break; + case NS_GET_USERNS_ID: + 
expected_type = CLONE_NEWUSER; + break; + default: + return -EINVAL; + } + + if (ns->ops->type != expected_type) + return -EINVAL; + + idp = (__u64 __user *)arg; + id = ns->ns_id; + return put_user(id, idp); + } }
/* extensible ioctls */ diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h index 97d8d80d139f..f7c21840cc09 100644 --- a/include/uapi/linux/nsfs.h +++ b/include/uapi/linux/nsfs.h @@ -16,8 +16,6 @@ #define NS_GET_NSTYPE _IO(NSIO, 0x3) /* Get owner UID (in the caller's user namespace) for a user namespace */ #define NS_GET_OWNER_UID _IO(NSIO, 0x4) -/* Get the id for a mount namespace */ -#define NS_GET_MNTNS_ID _IOR(NSIO, 0x5, __u64) /* Translate pid from target pid namespace into the caller's pid namespace. */ #define NS_GET_PID_FROM_PIDNS _IOR(NSIO, 0x6, int) /* Return thread-group leader id of pid in the callers pid namespace. */ @@ -42,6 +40,16 @@ struct mnt_ns_info { /* Get previous namespace. */ #define NS_MNT_GET_PREV _IOR(NSIO, 12, struct mnt_ns_info)
+/* Retrieve namespace identifiers. */ +#define NS_GET_MNTNS_ID _IOR(NSIO, 5, __u64) +#define NS_GET_NETNS_ID _IOR(NSIO, 13, __u64) +#define NS_GET_CGROUPNS_ID _IOR(NSIO, 14, __u64) +#define NS_GET_IPCNS_ID _IOR(NSIO, 15, __u64) +#define NS_GET_UTSNS_ID _IOR(NSIO, 16, __u64) +#define NS_GET_PIDNS_ID _IOR(NSIO, 17, __u64) +#define NS_GET_TIMENS_ID _IOR(NSIO, 18, __u64) +#define NS_GET_USERNS_ID _IOR(NSIO, 19, __u64) + enum init_ns_ino { IPC_NS_INIT_INO = 0xEFFFFFFFU, UTS_NS_INIT_INO = 0xEFFFFFFEU,
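The inner switch in ns_ioctl() is effectively a lookup from ioctl number to the expected namespace type. A table-driven userspace sketch of the same mapping follows. It is an illustration only: the struct and function names are hypothetical, the numbers are the _IOR() nr values from the uapi hunk above, and the sketch returns -1 where the kernel returns -EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Fallbacks for older libc headers. */
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

/* One entry per id-retrieval ioctl: nr argument and required ns type. */
struct ns_id_ioctl {
	unsigned int nr;
	int expected_type;
};

static const struct ns_id_ioctl ns_id_ioctls[] = {
	{  5, CLONE_NEWNS },     /* NS_GET_MNTNS_ID */
	{ 13, CLONE_NEWNET },    /* NS_GET_NETNS_ID */
	{ 14, CLONE_NEWCGROUP }, /* NS_GET_CGROUPNS_ID */
	{ 15, CLONE_NEWIPC },    /* NS_GET_IPCNS_ID */
	{ 16, CLONE_NEWUTS },    /* NS_GET_UTSNS_ID */
	{ 17, CLONE_NEWPID },    /* NS_GET_PIDNS_ID */
	{ 18, CLONE_NEWTIME },   /* NS_GET_TIMENS_ID */
	{ 19, CLONE_NEWUSER },   /* NS_GET_USERNS_ID */
};

static_assert(sizeof(ns_id_ioctls) / sizeof(ns_id_ioctls[0]) == 8,
	      "one entry per namespace type");

/* Return the CLONE_NEW* type an id ioctl expects, or -1 if nr is unknown. */
int expected_type_for(unsigned int nr)
{
	for (size_t i = 0; i < sizeof(ns_id_ioctls) / sizeof(ns_id_ioctls[0]); i++)
		if (ns_id_ioctls[i].nr == nr)
			return ns_id_ioctls[i].expected_type;
	return -1;
}

/* Mirror of the kernel check: the fd's namespace type must match. */
int check_id_ioctl(unsigned int nr, int fd_ns_type)
{
	int expected = expected_type_for(nr);

	return (expected != -1 && expected == fd_ns_type) ? 0 : -1;
}
```

So asking a mount namespace fd for a network namespace id is rejected, which keeps each ioctl tied to one namespace type even though all ids now come from the same id space.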
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
The mount namespace has supported id retrieval for a while already. Add support for the other types as well.
Signed-off-by: Christian Brauner brauner@kernel.org
if (ns->ops->type != expected_type)
return -EINVAL;
While I get that having this be per-ns-type lets programs avoid being tricked into thinking that one namespace ID is actually another namespace, it feels a bit ugly to have to add a new ioctl for every new namespace.
If we added a way to get the CLONE_* flag for a namespace (NS_GET_TYPE) we could have just NS_GET_ID. Of course, we would have to trust userspace to do the right thing...
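To make the trade-off concrete, here is a small userspace model of the two behaviours. It is only a sketch and not kernel code: struct ns_model, get_typed_id() and get_any_id() are hypothetical names made up for illustration. The first helper mimics the per-type check in the hunk above, the second what a single NS_GET_ID would do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <linux/sched.h>	/* CLONE_NEW* constants */

/* Hypothetical stand-in for the kernel's ns_common: a type and an id. */
struct ns_model {
	int type;		/* one of the CLONE_NEW* flags */
	uint64_t ns_id;		/* id taken from the shared id space */
};

/* Models the per-type ioctls: refuse if the namespace type differs. */
int get_typed_id(const struct ns_model *ns, int expected_type, uint64_t *id)
{
	assert(ns && id);
	if (ns->type != expected_type)
		return -EINVAL;
	*id = ns->ns_id;
	return 0;
}

/* Models a single NS_GET_ID: always succeeds, the caller checks the type. */
int get_any_id(const struct ns_model *ns, uint64_t *id)
{
	assert(ns && id);
	*id = ns->ns_id;
	return 0;
}
```

With the typed variant a caller that passes a netns where it expected a mntns gets -EINVAL and the output is left untouched; with the generic variant it gets an id back and has to verify the type itself.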
-- 2.47.3
On Thu, Sep 11, 2025 at 02:49:49AM +1000, Aleksa Sarai wrote:
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
The mount namespace has supported id retrieval for a while already. Add support for the other types as well.
Signed-off-by: Christian Brauner brauner@kernel.org
if (ns->ops->type != expected_type)
return -EINVAL;
While I get that having this be per-ns-type lets programs avoid being tricked into thinking that one namespace ID is actually another namespace, it feels a bit ugly to have to add a new ioctl for every new namespace.
If we added a way to get the CLONE_* flag for a namespace (NS_GET_TYPE)
That exists afaict: NS_GET_NSTYPE.
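For reference, a minimal sketch of how userspace can query the type today. NS_GET_NSTYPE returns the CLONE_NEW* value as the ioctl return value. The helper names ns_type_name() and ns_type_of_path() are hypothetical, and the CLONE_NEWTIME fallback is only there for older headers.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nsfs.h>		/* NS_GET_NSTYPE */
#include <linux/sched.h>	/* CLONE_NEW* constants */

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080	/* missing from older headers */
#endif

/* Map a CLONE_NEW* value to the name used under /proc/<pid>/ns/. */
const char *ns_type_name(int type)
{
	switch (type) {
	case CLONE_NEWNS:	return "mnt";
	case CLONE_NEWNET:	return "net";
	case CLONE_NEWCGROUP:	return "cgroup";
	case CLONE_NEWIPC:	return "ipc";
	case CLONE_NEWUTS:	return "uts";
	case CLONE_NEWPID:	return "pid";
	case CLONE_NEWTIME:	return "time";
	case CLONE_NEWUSER:	return "user";
	default:		return NULL;
	}
}

/* Return the CLONE_NEW* type of the namespace at @path, or -1 on error. */
int ns_type_of_path(const char *path)
{
	int fd, type;

	assert(path);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The type is the ioctl's return value, there is no out-pointer. */
	type = ioctl(fd, NS_GET_NSTYPE);
	close(fd);
	return type;
}
```

For example, ns_type_of_path("/proc/self/ns/net") is expected to yield CLONE_NEWNET on a kernel with network namespaces enabled.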
we could have just NS_GET_ID. Of course, we would have to trust userspace to do the right thing...
So NS_GET_ID can just return the id and be done with it. If userspace wants to know what type it is, they can issue a separate ioctl. But since the id space is shared, the ids of all namespaces can be compared with each other reliably, so for comparison you wouldn't need to care about the type at all. IOW, yes.
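As a rough illustration of that point, here is a sketch of what userspace would have to do with the per-type ioctls from this version of the patch to get the same effect as a single NS_GET_ID: look up the type first, then dispatch. The ioctl numbers are copied from the patch above and may well change before anything is merged; ns_id_ioctl_for_type(), ns_get_id() and ns_same() are hypothetical helper names.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/nsfs.h>		/* NSIO, NS_GET_NSTYPE */
#include <linux/sched.h>	/* CLONE_NEW* constants */

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080	/* missing from older headers */
#endif

/* Numbers as proposed in this patch; fallbacks for headers without them. */
#ifndef NS_GET_MNTNS_ID
#define NS_GET_MNTNS_ID		_IOR(NSIO, 5, __u64)
#endif
#ifndef NS_GET_NETNS_ID
#define NS_GET_NETNS_ID		_IOR(NSIO, 13, __u64)
#define NS_GET_CGROUPNS_ID	_IOR(NSIO, 14, __u64)
#define NS_GET_IPCNS_ID		_IOR(NSIO, 15, __u64)
#define NS_GET_UTSNS_ID		_IOR(NSIO, 16, __u64)
#define NS_GET_PIDNS_ID		_IOR(NSIO, 17, __u64)
#define NS_GET_TIMENS_ID	_IOR(NSIO, 18, __u64)
#define NS_GET_USERNS_ID	_IOR(NSIO, 19, __u64)
#endif

/* Pick the per-type id ioctl for a CLONE_NEW* type, 0 if unknown. */
unsigned long ns_id_ioctl_for_type(int type)
{
	switch (type) {
	case CLONE_NEWNS:	return NS_GET_MNTNS_ID;
	case CLONE_NEWNET:	return NS_GET_NETNS_ID;
	case CLONE_NEWCGROUP:	return NS_GET_CGROUPNS_ID;
	case CLONE_NEWIPC:	return NS_GET_IPCNS_ID;
	case CLONE_NEWUTS:	return NS_GET_UTSNS_ID;
	case CLONE_NEWPID:	return NS_GET_PIDNS_ID;
	case CLONE_NEWTIME:	return NS_GET_TIMENS_ID;
	case CLONE_NEWUSER:	return NS_GET_USERNS_ID;
	default:		return 0;
	}
}

/* Fetch the id of the namespace behind @fd without knowing its type. */
int ns_get_id(int fd, __u64 *id)
{
	unsigned long cmd;
	int type;

	assert(id);
	type = ioctl(fd, NS_GET_NSTYPE);
	if (type < 0)
		return -1;
	cmd = ns_id_ioctl_for_type(type);
	if (!cmd)
		return -1;
	return ioctl(fd, cmd, id);
}

/* Shared id space: equal ids mean the same namespace, whatever the type. */
int ns_same(int fd_a, int fd_b)
{
	__u64 a, b;

	if (ns_get_id(fd_a, &a) || ns_get_id(fd_b, &b))
		return -1;
	return a == b;
}
```

With a single NS_GET_ID the type lookup and the switch in ns_get_id() would collapse into one ioctl call, and ns_same() would not change at all.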
-- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
On 2025-09-11, Christian Brauner brauner@kernel.org wrote:
On Thu, Sep 11, 2025 at 02:49:49AM +1000, Aleksa Sarai wrote:
On 2025-09-10, Christian Brauner brauner@kernel.org wrote:
The mount namespace has supported id retrieval for a while already. Add support for the other types as well.
Signed-off-by: Christian Brauner brauner@kernel.org
if (ns->ops->type != expected_type)
return -EINVAL;
While I get that having this be per-ns-type lets programs avoid being tricked into thinking that one namespace ID is actually another namespace, it feels a bit ugly to have to add a new ioctl for every new namespace.
If we added a way to get the CLONE_* flag for a namespace (NS_GET_TYPE)
That exists afaict: NS_GET_NSTYPE.
D'oh, yeah that's all you need.
we could have just NS_GET_ID. Of course, we would have to trust userspace to do the right thing...
So NS_GET_ID can just return the id and be done with it. If userspace wants to know what type it is, they can issue a separate ioctl. But since the id space is shared, the ids of all namespaces can be compared with each other reliably, so for comparison you wouldn't need to care about the type at all. IOW, yes.
Ah, I didn't realise they're all in the same id-space -- in that case it makes even more sense to just have a single NS_GET_ID IMHO.
-- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
Sync the tools copy of nsfs.h with the uapi nsfs.h header so the selftests can rely on it.
Signed-off-by: Christian Brauner brauner@kernel.org
---
 tools/include/uapi/linux/nsfs.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/nsfs.h b/tools/include/uapi/linux/nsfs.h
index 34127653fd00..f7c21840cc09 100644
--- a/tools/include/uapi/linux/nsfs.h
+++ b/tools/include/uapi/linux/nsfs.h
@@ -16,8 +16,6 @@
 #define NS_GET_NSTYPE _IO(NSIO, 0x3)
 /* Get owner UID (in the caller's user namespace) for a user namespace */
 #define NS_GET_OWNER_UID _IO(NSIO, 0x4)
-/* Get the id for a mount namespace */
-#define NS_GET_MNTNS_ID _IOR(NSIO, 0x5, __u64)
 /* Translate pid from target pid namespace into the caller's pid namespace. */
 #define NS_GET_PID_FROM_PIDNS _IOR(NSIO, 0x6, int)
 /* Return thread-group leader id of pid in the callers pid namespace. */
@@ -42,4 +40,25 @@ struct mnt_ns_info {
 /* Get previous namespace. */
 #define NS_MNT_GET_PREV _IOR(NSIO, 12, struct mnt_ns_info)

+/* Retrieve namespace identifiers. */
+#define NS_GET_MNTNS_ID _IOR(NSIO, 5, __u64)
+#define NS_GET_NETNS_ID _IOR(NSIO, 13, __u64)
+#define NS_GET_CGROUPNS_ID _IOR(NSIO, 14, __u64)
+#define NS_GET_IPCNS_ID _IOR(NSIO, 15, __u64)
+#define NS_GET_UTSNS_ID _IOR(NSIO, 16, __u64)
+#define NS_GET_PIDNS_ID _IOR(NSIO, 17, __u64)
+#define NS_GET_TIMENS_ID _IOR(NSIO, 18, __u64)
+#define NS_GET_USERNS_ID _IOR(NSIO, 19, __u64)
+
+enum init_ns_ino {
+	IPC_NS_INIT_INO = 0xEFFFFFFFU,
+	UTS_NS_INIT_INO = 0xEFFFFFFEU,
+	USER_NS_INIT_INO = 0xEFFFFFFDU,
+	PID_NS_INIT_INO = 0xEFFFFFFCU,
+	CGROUP_NS_INIT_INO = 0xEFFFFFFBU,
+	TIME_NS_INIT_INO = 0xEFFFFFFAU,
+	NET_NS_INIT_INO = 0xEFFFFFF9U,
+	MNT_NS_INIT_INO = 0xEFFFFFF8U,
+};
+
 #endif /* __LINUX_NSFS_H */
Add a bunch of selftests for the identifier retrieval ioctls.
Signed-off-by: Christian Brauner brauner@kernel.org --- tools/testing/selftests/namespaces/.gitignore | 1 + tools/testing/selftests/namespaces/Makefile | 7 + tools/testing/selftests/namespaces/config | 7 + tools/testing/selftests/namespaces/nsid_test.c | 986 +++++++++++++++++++++++++ 4 files changed, 1001 insertions(+)
diff --git a/tools/testing/selftests/namespaces/.gitignore b/tools/testing/selftests/namespaces/.gitignore new file mode 100644 index 000000000000..c1e8d634dd21 --- /dev/null +++ b/tools/testing/selftests/namespaces/.gitignore @@ -0,0 +1 @@ +nsid_test diff --git a/tools/testing/selftests/namespaces/Makefile b/tools/testing/selftests/namespaces/Makefile new file mode 100644 index 000000000000..9280c703533e --- /dev/null +++ b/tools/testing/selftests/namespaces/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only +CFLAGS += -Wall -O0 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES) + +TEST_GEN_PROGS := nsid_test + +include ../lib.mk + diff --git a/tools/testing/selftests/namespaces/config b/tools/testing/selftests/namespaces/config new file mode 100644 index 000000000000..d09836260262 --- /dev/null +++ b/tools/testing/selftests/namespaces/config @@ -0,0 +1,7 @@ +CONFIG_UTS_NS=y +CONFIG_TIME_NS=y +CONFIG_IPC_NS=y +CONFIG_USER_NS=y +CONFIG_PID_NS=y +CONFIG_NET_NS=y +CONFIG_CGROUPS=y diff --git a/tools/testing/selftests/namespaces/nsid_test.c b/tools/testing/selftests/namespaces/nsid_test.c new file mode 100644 index 000000000000..280dde9b71dc --- /dev/null +++ b/tools/testing/selftests/namespaces/nsid_test.c @@ -0,0 +1,986 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <assert.h> +#include <fcntl.h> +#include <inttypes.h> +#include <libgen.h> +#include <limits.h> +#include <pthread.h> +#include <string.h> +#include <sys/mount.h> +#include <poll.h> +#include <sys/epoll.h> +#include <sys/resource.h> +#include <sys/stat.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <unistd.h> +#include <linux/fs.h> +#include <linux/limits.h> +#include <linux/nsfs.h> +#include "../kselftest_harness.h" + +TEST(nsid_mntns_basic) +{ + __u64 mnt_ns_id = 0; + int fd_mntns; + int ret; + + /* Open the current mount namespace */ + fd_mntns = open("/proc/self/ns/mnt", O_RDONLY); + ASSERT_GE(fd_mntns, 0); + + /* Get the mount namespace ID */ + ret = ioctl(fd_mntns, 
NS_GET_MNTNS_ID, &mnt_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(mnt_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 mnt_ns_id2 = 0; + ret = ioctl(fd_mntns, NS_GET_MNTNS_ID, &mnt_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(mnt_ns_id, mnt_ns_id2); + + close(fd_mntns); +} + +TEST(nsid_mntns_separate) +{ + __u64 parent_mnt_ns_id = 0; + __u64 child_mnt_ns_id = 0; + int fd_parent_mntns, fd_child_mntns; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's mount namespace ID */ + fd_parent_mntns = open("/proc/self/ns/mnt", O_RDONLY); + ASSERT_GE(fd_parent_mntns, 0); + ret = ioctl(fd_parent_mntns, NS_GET_MNTNS_ID, &parent_mnt_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_mnt_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new mount namespace */ + ret = unshare(CLONE_NEWNS); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_mntns); + SKIP(return, "No permission to create mount namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's mount namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/mnt", pid); + fd_child_mntns = open(path, O_RDONLY); + ASSERT_GE(fd_child_mntns, 0); + + /* Get child's mount namespace ID */ + ret = ioctl(fd_child_mntns, NS_GET_MNTNS_ID, &child_mnt_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_mnt_ns_id, 0); + + /* Parent and child 
should have different mount namespace IDs */ + ASSERT_NE(parent_mnt_ns_id, child_mnt_ns_id); + + close(fd_parent_mntns); + close(fd_child_mntns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_cgroupns_basic) +{ + __u64 cgroup_ns_id = 0; + int fd_cgroupns; + int ret; + + /* Open the current cgroup namespace */ + fd_cgroupns = open("/proc/self/ns/cgroup", O_RDONLY); + ASSERT_GE(fd_cgroupns, 0); + + /* Get the cgroup namespace ID */ + ret = ioctl(fd_cgroupns, NS_GET_CGROUPNS_ID, &cgroup_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(cgroup_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 cgroup_ns_id2 = 0; + ret = ioctl(fd_cgroupns, NS_GET_CGROUPNS_ID, &cgroup_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(cgroup_ns_id, cgroup_ns_id2); + + close(fd_cgroupns); +} + +TEST(nsid_cgroupns_separate) +{ + __u64 parent_cgroup_ns_id = 0; + __u64 child_cgroup_ns_id = 0; + int fd_parent_cgroupns, fd_child_cgroupns; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's cgroup namespace ID */ + fd_parent_cgroupns = open("/proc/self/ns/cgroup", O_RDONLY); + ASSERT_GE(fd_parent_cgroupns, 0); + ret = ioctl(fd_parent_cgroupns, NS_GET_CGROUPNS_ID, &parent_cgroup_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_cgroup_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new cgroup namespace */ + ret = unshare(CLONE_NEWCGROUP); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* 
Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_cgroupns); + SKIP(return, "No permission to create cgroup namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's cgroup namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/cgroup", pid); + fd_child_cgroupns = open(path, O_RDONLY); + ASSERT_GE(fd_child_cgroupns, 0); + + /* Get child's cgroup namespace ID */ + ret = ioctl(fd_child_cgroupns, NS_GET_CGROUPNS_ID, &child_cgroup_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_cgroup_ns_id, 0); + + /* Parent and child should have different cgroup namespace IDs */ + ASSERT_NE(parent_cgroup_ns_id, child_cgroup_ns_id); + + close(fd_parent_cgroupns); + close(fd_child_cgroupns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_ipcns_basic) +{ + __u64 ipc_ns_id = 0; + int fd_ipcns; + int ret; + + /* Open the current IPC namespace */ + fd_ipcns = open("/proc/self/ns/ipc", O_RDONLY); + ASSERT_GE(fd_ipcns, 0); + + /* Get the IPC namespace ID */ + ret = ioctl(fd_ipcns, NS_GET_IPCNS_ID, &ipc_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(ipc_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 ipc_ns_id2 = 0; + ret = ioctl(fd_ipcns, NS_GET_IPCNS_ID, &ipc_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(ipc_ns_id, ipc_ns_id2); + + close(fd_ipcns); +} + +TEST(nsid_ipcns_separate) +{ + __u64 parent_ipc_ns_id = 0; + __u64 child_ipc_ns_id = 0; + int fd_parent_ipcns, fd_child_ipcns; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's IPC namespace ID */ + fd_parent_ipcns = open("/proc/self/ns/ipc", O_RDONLY); + ASSERT_GE(fd_parent_ipcns, 0); + ret = ioctl(fd_parent_ipcns, NS_GET_IPCNS_ID, &parent_ipc_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_ipc_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create 
new IPC namespace */ + ret = unshare(CLONE_NEWIPC); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_ipcns); + SKIP(return, "No permission to create IPC namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's IPC namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/ipc", pid); + fd_child_ipcns = open(path, O_RDONLY); + ASSERT_GE(fd_child_ipcns, 0); + + /* Get child's IPC namespace ID */ + ret = ioctl(fd_child_ipcns, NS_GET_IPCNS_ID, &child_ipc_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_ipc_ns_id, 0); + + /* Parent and child should have different IPC namespace IDs */ + ASSERT_NE(parent_ipc_ns_id, child_ipc_ns_id); + + close(fd_parent_ipcns); + close(fd_child_ipcns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_utsns_basic) +{ + __u64 uts_ns_id = 0; + int fd_utsns; + int ret; + + /* Open the current UTS namespace */ + fd_utsns = open("/proc/self/ns/uts", O_RDONLY); + ASSERT_GE(fd_utsns, 0); + + /* Get the UTS namespace ID */ + ret = ioctl(fd_utsns, NS_GET_UTSNS_ID, &uts_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(uts_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 uts_ns_id2 = 0; + ret = ioctl(fd_utsns, NS_GET_UTSNS_ID, &uts_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(uts_ns_id, uts_ns_id2); + + close(fd_utsns); +} + +TEST(nsid_utsns_separate) +{ + __u64 parent_uts_ns_id = 0; + __u64 child_uts_ns_id = 0; + int fd_parent_utsns, fd_child_utsns; + int ret; + 
pid_t pid; + int pipefd[2]; + + /* Get parent's UTS namespace ID */ + fd_parent_utsns = open("/proc/self/ns/uts", O_RDONLY); + ASSERT_GE(fd_parent_utsns, 0); + ret = ioctl(fd_parent_utsns, NS_GET_UTSNS_ID, &parent_uts_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_uts_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new UTS namespace */ + ret = unshare(CLONE_NEWUTS); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_utsns); + SKIP(return, "No permission to create UTS namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's UTS namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid); + fd_child_utsns = open(path, O_RDONLY); + ASSERT_GE(fd_child_utsns, 0); + + /* Get child's UTS namespace ID */ + ret = ioctl(fd_child_utsns, NS_GET_UTSNS_ID, &child_uts_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_uts_ns_id, 0); + + /* Parent and child should have different UTS namespace IDs */ + ASSERT_NE(parent_uts_ns_id, child_uts_ns_id); + + close(fd_parent_utsns); + close(fd_child_utsns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_userns_basic) +{ + __u64 user_ns_id = 0; + int fd_userns; + int ret; + + /* Open the current user namespace */ + fd_userns = open("/proc/self/ns/user", O_RDONLY); + ASSERT_GE(fd_userns, 0); + + /* Get 
the user namespace ID */ + ret = ioctl(fd_userns, NS_GET_USERNS_ID, &user_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(user_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 user_ns_id2 = 0; + ret = ioctl(fd_userns, NS_GET_USERNS_ID, &user_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(user_ns_id, user_ns_id2); + + close(fd_userns); +} + +TEST(nsid_userns_separate) +{ + __u64 parent_user_ns_id = 0; + __u64 child_user_ns_id = 0; + int fd_parent_userns, fd_child_userns; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's user namespace ID */ + fd_parent_userns = open("/proc/self/ns/user", O_RDONLY); + ASSERT_GE(fd_parent_userns, 0); + ret = ioctl(fd_parent_userns, NS_GET_USERNS_ID, &parent_user_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_user_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new user namespace */ + ret = unshare(CLONE_NEWUSER); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_userns); + SKIP(return, "No permission to create user namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's user namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/user", pid); + fd_child_userns = open(path, O_RDONLY); + ASSERT_GE(fd_child_userns, 0); + + /* Get child's user namespace ID */ + ret = ioctl(fd_child_userns, NS_GET_USERNS_ID, &child_user_ns_id); + 
ASSERT_EQ(ret, 0); + ASSERT_NE(child_user_ns_id, 0); + + /* Parent and child should have different user namespace IDs */ + ASSERT_NE(parent_user_ns_id, child_user_ns_id); + + close(fd_parent_userns); + close(fd_child_userns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_timens_basic) +{ + __u64 time_ns_id = 0; + int fd_timens; + int ret; + + /* Open the current time namespace */ + fd_timens = open("/proc/self/ns/time", O_RDONLY); + if (fd_timens < 0) { + SKIP(return, "Time namespaces not supported"); + } + + /* Get the time namespace ID */ + ret = ioctl(fd_timens, NS_GET_TIMENS_ID, &time_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(time_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 time_ns_id2 = 0; + ret = ioctl(fd_timens, NS_GET_TIMENS_ID, &time_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(time_ns_id, time_ns_id2); + + close(fd_timens); +} + +TEST(nsid_timens_separate) +{ + __u64 parent_time_ns_id = 0; + __u64 child_time_ns_id = 0; + int fd_parent_timens, fd_child_timens; + int ret; + pid_t pid; + int pipefd[2]; + + /* Open the current time namespace */ + fd_parent_timens = open("/proc/self/ns/time", O_RDONLY); + if (fd_parent_timens < 0) { + SKIP(return, "Time namespaces not supported"); + } + + /* Get parent's time namespace ID */ + ret = ioctl(fd_parent_timens, NS_GET_TIMENS_ID, &parent_time_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_time_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new time namespace */ + ret = unshare(CLONE_NEWTIME); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES || errno == EINVAL) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Fork a grandchild to actually enter the new namespace */ + pid_t grandchild = fork(); + if 
(grandchild == 0) { + /* Grandchild is in the new namespace */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + pause(); + _exit(0); + } else if (grandchild > 0) { + /* Child writes grandchild PID and waits */ + write(pipefd[1], "Y", 1); + write(pipefd[1], &grandchild, sizeof(grandchild)); + close(pipefd[1]); + pause(); /* Keep the parent alive to maintain the grandchild */ + _exit(0); + } else { + _exit(1); + } + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_timens); + close(pipefd[0]); + SKIP(return, "Cannot create time namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + pid_t grandchild_pid; + ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid)); + close(pipefd[0]); + + /* Open grandchild's time namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/time", grandchild_pid); + fd_child_timens = open(path, O_RDONLY); + ASSERT_GE(fd_child_timens, 0); + + /* Get child's time namespace ID */ + ret = ioctl(fd_child_timens, NS_GET_TIMENS_ID, &child_time_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_time_ns_id, 0); + + /* Parent and child should have different time namespace IDs */ + ASSERT_NE(parent_time_ns_id, child_time_ns_id); + + close(fd_parent_timens); + close(fd_child_timens); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_pidns_basic) +{ + __u64 pid_ns_id = 0; + int fd_pidns; + int ret; + + /* Open the current PID namespace */ + fd_pidns = open("/proc/self/ns/pid", O_RDONLY); + ASSERT_GE(fd_pidns, 0); + + /* Get the PID namespace ID */ + ret = ioctl(fd_pidns, NS_GET_PIDNS_ID, &pid_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(pid_ns_id, 0); + + /* Verify we can get the same ID again */ + __u64 pid_ns_id2 = 0; + ret = ioctl(fd_pidns, NS_GET_PIDNS_ID, &pid_ns_id2); + 
ASSERT_EQ(ret, 0); + ASSERT_EQ(pid_ns_id, pid_ns_id2); + + close(fd_pidns); +} + +TEST(nsid_pidns_separate) +{ + __u64 parent_pid_ns_id = 0; + __u64 child_pid_ns_id = 0; + int fd_parent_pidns, fd_child_pidns; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's PID namespace ID */ + fd_parent_pidns = open("/proc/self/ns/pid", O_RDONLY); + ASSERT_GE(fd_parent_pidns, 0); + ret = ioctl(fd_parent_pidns, NS_GET_PIDNS_ID, &parent_pid_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_pid_ns_id, 0); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new PID namespace */ + ret = unshare(CLONE_NEWPID); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Fork a grandchild to actually enter the new namespace */ + pid_t grandchild = fork(); + if (grandchild == 0) { + /* Grandchild is in the new namespace; only the child signals the parent so the pipe data stays ordered */ + close(pipefd[1]); + pause(); + _exit(0); + } else if (grandchild > 0) { + /* Child writes grandchild PID and waits */ + write(pipefd[1], "Y", 1); + write(pipefd[1], &grandchild, sizeof(grandchild)); + close(pipefd[1]); + pause(); /* Keep the parent alive to maintain the grandchild */ + _exit(0); + } else { + _exit(1); + } + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_pidns); + close(pipefd[0]); + SKIP(return, "No permission to create PID namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + pid_t grandchild_pid; + ASSERT_EQ(read(pipefd[0], &grandchild_pid, sizeof(grandchild_pid)), sizeof(grandchild_pid)); + close(pipefd[0]); + + /* Open grandchild's PID namespace */ + char
path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/pid", grandchild_pid); + fd_child_pidns = open(path, O_RDONLY); + ASSERT_GE(fd_child_pidns, 0); + + /* Get child's PID namespace ID */ + ret = ioctl(fd_child_pidns, NS_GET_PIDNS_ID, &child_pid_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_pid_ns_id, 0); + + /* Parent and child should have different PID namespace IDs */ + ASSERT_NE(parent_pid_ns_id, child_pid_ns_id); + + close(fd_parent_pidns); + close(fd_child_pidns); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST(nsid_netns_basic) +{ + __u64 net_ns_id = 0; + __u64 netns_cookie = 0; + int fd_netns; + int sock; + socklen_t optlen; + int ret; + + /* Open the current network namespace */ + fd_netns = open("/proc/self/ns/net", O_RDONLY); + ASSERT_GE(fd_netns, 0); + + /* Get the network namespace ID via ioctl */ + ret = ioctl(fd_netns, NS_GET_NETNS_ID, &net_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(net_ns_id, 0); + + /* Create a socket to get the SO_NETNS_COOKIE */ + sock = socket(AF_UNIX, SOCK_STREAM, 0); + ASSERT_GE(sock, 0); + + /* Get the network namespace cookie via socket option */ + optlen = sizeof(netns_cookie); + ret = getsockopt(sock, SOL_SOCKET, SO_NETNS_COOKIE, &netns_cookie, &optlen); + ASSERT_EQ(ret, 0); + ASSERT_EQ(optlen, sizeof(netns_cookie)); + + /* The namespace ID and cookie should be identical */ + ASSERT_EQ(net_ns_id, netns_cookie); + + /* Verify we can get the same ID again */ + __u64 net_ns_id2 = 0; + ret = ioctl(fd_netns, NS_GET_NETNS_ID, &net_ns_id2); + ASSERT_EQ(ret, 0); + ASSERT_EQ(net_ns_id, net_ns_id2); + + close(sock); + close(fd_netns); +} + +TEST(nsid_netns_separate) +{ + __u64 parent_net_ns_id = 0; + __u64 parent_netns_cookie = 0; + __u64 child_net_ns_id = 0; + __u64 child_netns_cookie = 0; + int fd_parent_netns, fd_child_netns; + int parent_sock, child_sock; + socklen_t optlen; + int ret; + pid_t pid; + int pipefd[2]; + + /* Get parent's network namespace ID */ + fd_parent_netns = 
open("/proc/self/ns/net", O_RDONLY); + ASSERT_GE(fd_parent_netns, 0); + ret = ioctl(fd_parent_netns, NS_GET_NETNS_ID, &parent_net_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(parent_net_ns_id, 0); + + /* Get parent's network namespace cookie */ + parent_sock = socket(AF_UNIX, SOCK_STREAM, 0); + ASSERT_GE(parent_sock, 0); + optlen = sizeof(parent_netns_cookie); + ret = getsockopt(parent_sock, SOL_SOCKET, SO_NETNS_COOKIE, &parent_netns_cookie, &optlen); + ASSERT_EQ(ret, 0); + + /* Verify parent's ID and cookie match */ + ASSERT_EQ(parent_net_ns_id, parent_netns_cookie); + + /* Create a pipe for synchronization */ + ASSERT_EQ(pipe(pipefd), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* Create new network namespace */ + ret = unshare(CLONE_NEWNET); + if (ret != 0) { + /* Skip test if we don't have permission */ + if (errno == EPERM || errno == EACCES) { + write(pipefd[1], "S", 1); /* Signal skip */ + _exit(0); + } + _exit(1); + } + + /* Signal success */ + write(pipefd[1], "Y", 1); + close(pipefd[1]); + + /* Keep namespace alive */ + pause(); + _exit(0); + } + + /* Parent process */ + close(pipefd[1]); + + char buf; + ASSERT_EQ(read(pipefd[0], &buf, 1), 1); + close(pipefd[0]); + + if (buf == 'S') { + /* Child couldn't create namespace, skip test */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); + close(fd_parent_netns); + close(parent_sock); + SKIP(return, "No permission to create network namespace"); + } + + ASSERT_EQ(buf, 'Y'); + + /* Open child's network namespace */ + char path[256]; + snprintf(path, sizeof(path), "/proc/%d/ns/net", pid); + fd_child_netns = open(path, O_RDONLY); + ASSERT_GE(fd_child_netns, 0); + + /* Get child's network namespace ID */ + ret = ioctl(fd_child_netns, NS_GET_NETNS_ID, &child_net_ns_id); + ASSERT_EQ(ret, 0); + ASSERT_NE(child_net_ns_id, 0); + + /* Create socket in child's namespace to get cookie */ + ret = setns(fd_child_netns, CLONE_NEWNET); + if (ret == 0) { + child_sock = 
socket(AF_UNIX, SOCK_STREAM, 0); + ASSERT_GE(child_sock, 0); + + optlen = sizeof(child_netns_cookie); + ret = getsockopt(child_sock, SOL_SOCKET, SO_NETNS_COOKIE, &child_netns_cookie, &optlen); + ASSERT_EQ(ret, 0); + + /* Verify child's ID and cookie match */ + ASSERT_EQ(child_net_ns_id, child_netns_cookie); + + close(child_sock); + + /* Return to parent namespace */ + setns(fd_parent_netns, CLONE_NEWNET); + } + + /* Parent and child should have different network namespace IDs */ + ASSERT_NE(parent_net_ns_id, child_net_ns_id); + if (child_netns_cookie != 0) { + ASSERT_NE(parent_netns_cookie, child_netns_cookie); + } + + close(fd_parent_netns); + close(fd_child_netns); + close(parent_sock); + + /* Clean up child process */ + kill(pid, SIGTERM); + waitpid(pid, NULL, 0); +} + +TEST_HARNESS_MAIN
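Add selftests for namespace file handles. They encode a handle for each namespace type with name_to_handle_at(), reopen it via open_by_handle_at() with FD_NSFS_ROOT, and check that opening a handle from a different user namespace fails with ESTALE.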
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 tools/testing/selftests/namespaces/.gitignore      |    1 +
 tools/testing/selftests/namespaces/Makefile        |    2 +-
 .../selftests/namespaces/file_handle_test.c        | 1410 ++++++++++++++++++++
 3 files changed, 1412 insertions(+), 1 deletion(-)
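For readers who want to try the round trip outside the kselftest harness, the following is a minimal sketch of what each per-namespace test below does. The helper name ns_handle_roundtrip() and its return convention are hypothetical and not part of this patch; the FD_NSFS_ROOT value is copied from the selftest. The sketch treats any failing step as "not available here" rather than asserting, since it may run on kernels without nsfs file handle support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef FD_NSFS_ROOT
#define FD_NSFS_ROOT -10003 /* value taken from the selftest in this patch */
#endif

#ifndef MAX_HANDLE_SZ
#define MAX_HANDLE_SZ 128 /* fallback if the libc headers do not define it */
#endif

/*
 * Hypothetical helper, not part of the patch.
 * Returns 1 if the handle reopened the same namespace, 0 if any step is
 * unavailable on the running system, and -1 if a different object was opened.
 */
int ns_handle_roundtrip(const char *path)
{
	struct file_handle *handle;
	struct stat st1, st2;
	int mount_id, ns_fd, fd, ret = 0;

	assert(path != NULL);

	handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
	if (!handle)
		return 0;

	ns_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0)
		goto out_free;

	/* Encode: ask the kernel for a handle to the namespace file. */
	handle->handle_bytes = MAX_HANDLE_SZ;
	if (name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH) < 0)
		goto out_close;

	/* Decode: reopen the namespace relative to the nsfs root. */
	fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
	if (fd < 0)
		goto out_close;

	/* Both descriptors should refer to the same nsfs inode. */
	if (fstat(ns_fd, &st1) == 0 && fstat(fd, &st2) == 0)
		ret = (st1.st_ino == st2.st_ino &&
		       st1.st_dev == st2.st_dev) ? 1 : -1;
	close(fd);
out_close:
	close(ns_fd);
out_free:
	free(handle);
	return ret;
}
```

Calling ns_handle_roundtrip("/proc/self/ns/net") should return 1 on a kernel carrying this series and 0 on kernels where nsfs does not support file handles. The selftests distinguish the individual failure modes (EOPNOTSUPP, EINVAL) and report a skip; the sketch collapses them for brevity.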
diff --git a/tools/testing/selftests/namespaces/.gitignore b/tools/testing/selftests/namespaces/.gitignore
index c1e8d634dd21..7639dbf58bbf 100644
--- a/tools/testing/selftests/namespaces/.gitignore
+++ b/tools/testing/selftests/namespaces/.gitignore
@@ -1 +1,2 @@
 nsid_test
+file_handle_test
diff --git a/tools/testing/selftests/namespaces/Makefile b/tools/testing/selftests/namespaces/Makefile
index 9280c703533e..f6c117ce2c2b 100644
--- a/tools/testing/selftests/namespaces/Makefile
+++ b/tools/testing/selftests/namespaces/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -Wall -O0 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES)

-TEST_GEN_PROGS := nsid_test
+TEST_GEN_PROGS := nsid_test file_handle_test

 include ../lib.mk
diff --git a/tools/testing/selftests/namespaces/file_handle_test.c b/tools/testing/selftests/namespaces/file_handle_test.c new file mode 100644 index 000000000000..87573fa06990 --- /dev/null +++ b/tools/testing/selftests/namespaces/file_handle_test.c @@ -0,0 +1,1410 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <grp.h> +#include <limits.h> +#include <sched.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mount.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> +#include <linux/unistd.h> +#include "../kselftest_harness.h" + +#ifndef FD_NSFS_ROOT +#define FD_NSFS_ROOT -10003 /* Root of the nsfs filesystem */ +#endif + +TEST(nsfs_net_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open a namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); 
+ free(handle); +} + +TEST(nsfs_uts_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open UTS namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/uts"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_ipc_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open IPC namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/ipc"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + 
ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_pid_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open PID namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/pid"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_mnt_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = 
malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open mount namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/mnt"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_user_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open user namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/user"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + 
SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_cgroup_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open cgroup namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/cgroup"); + ns_fd = open(ns_path, O_RDONLY); + if (ns_fd < 0) { + SKIP(free(handle); return, "cgroup namespace not available"); + } + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_time_handle) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + struct stat st1, st2; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open time namespace file descriptor */ + snprintf(ns_path, 
sizeof(ns_path), "/proc/self/ns/time"); + ns_fd = open(ns_path, O_RDONLY); + if (ns_fd < 0) { + SKIP(free(handle); return, "time namespace not available"); + } + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Try to open using FD_NSFS_ROOT */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) { + SKIP(free(handle); close(ns_fd); + return, + "open_by_handle_at with FD_NSFS_ROOT not supported"); + } + ASSERT_GE(fd, 0); + + /* Verify we opened the correct namespace */ + ASSERT_EQ(fstat(ns_fd, &st1), 0); + ASSERT_EQ(fstat(fd, &st2), 0); + ASSERT_EQ(st1.st_ino, st2.st_ino); + ASSERT_EQ(st1.st_dev, st2.st_dev); + + close(fd); + close(ns_fd); + free(handle); +} + +TEST(nsfs_user_net_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current network namespace */ + ns_fd = open("/proc/self/ns/net", O_RDONLY); + ASSERT_GE(ns_fd, 0); + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret 
< 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new network namespace */ + ret = unshare(CLONE_NEWNET); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create network namespace */ + close(pipefd[1]); + exit(0); + } + + /* Try to open parent's network namespace handle from new user+net namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", 1); /* Unexpected success */ + close(fd); + } else if (errno == ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + 
return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new network namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_uts_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current UTS namespace */ + ns_fd = open("/proc/self/ns/uts", O_RDONLY); + ASSERT_GE(ns_fd, 0); + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping,
sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new UTS namespace */ + ret = unshare(CLONE_NEWUTS); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create UTS namespace */ + close(pipefd[1]); + exit(0); + } + + /* Try to open parent's UTS namespace handle from new user+uts namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", 1); /* Unexpected success */ + close(fd); + } else if (errno == ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new UTS namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_ipc_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current IPC namespace */ + ns_fd = open("/proc/self/ns/ipc", 
O_RDONLY); + ASSERT_GE(ns_fd, 0); + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new IPC namespace */ + ret = unshare(CLONE_NEWIPC); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create IPC namespace */ + close(pipefd[1]); + exit(0); + } + + /* Try to open parent's IPC namespace handle from new user+ipc namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", 1); /* Unexpected success */ + close(fd); + } else if (errno == 
ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new IPC namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_mnt_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current mount namespace */ + ns_fd = open("/proc/self/ns/mnt", O_RDONLY); + ASSERT_GE(ns_fd, 0); + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int 
uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new mount namespace */ + ret = unshare(CLONE_NEWNS); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create mount namespace */ + close(pipefd[1]); + exit(0); + } + + /* Try to open parent's mount namespace handle from new user+mnt namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", 1); /* Unexpected success */ + close(fd); + } else if (errno == ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new mount namespace"); + } + + /* Should fail with 
ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_cgroup_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current cgroup namespace */ + ns_fd = open("/proc/self/ns/cgroup", O_RDONLY); + if (ns_fd < 0) { + SKIP(free(handle); close(pipefd[0]); close(pipefd[1]); + return, "cgroup namespace not available"); + } + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + 
close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new cgroup namespace */ + ret = unshare(CLONE_NEWCGROUP); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create cgroup namespace */ + close(pipefd[1]); + exit(0); + } + + /* Try to open parent's cgroup namespace handle from new user+cgroup namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", 1); /* Unexpected success */ + close(fd); + } else if (errno == ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new cgroup namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_pid_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current PID namespace */ + ns_fd = open("/proc/self/ns/pid", O_RDONLY); + ASSERT_GE(ns_fd, 0); + + handle->handle_bytes = 
MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new PID namespace - requires fork to take effect */ + ret = unshare(CLONE_NEWPID); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create PID namespace */ + close(pipefd[1]); + exit(0); + } + + /* Fork again for PID namespace to take effect */ + pid_t child_pid = fork(); + if (child_pid < 0) { + write(pipefd[1], "N", + 1); /* Unable to fork in PID namespace */ + close(pipefd[1]); + exit(0); + } + + if (child_pid == 0) { + /* Grandchild in new PID namespace */ + /* Try to open parent's PID namespace handle from new user+pid 
namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", + 1); /* Unexpected success */ + close(fd); + } else if (errno == ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Wait for grandchild */ + waitpid(child_pid, NULL, 0); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new PID namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_user_time_namespace_isolation) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + pid_t pid; + int status; + int pipefd[2]; + char result; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Create pipe for communication */ + ASSERT_EQ(pipe(pipefd), 0); + + /* Get handle for current time namespace */ + ns_fd = open("/proc/self/ns/time", O_RDONLY); + if (ns_fd < 0) { + SKIP(free(handle); close(pipefd[0]); close(pipefd[1]); + return, "time namespace not available"); + } + + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); close(pipefd[0]); + close(pipefd[1]); + return, "nsfs doesn't support file handles"); + } 
+ ASSERT_EQ(ret, 0); + close(ns_fd); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* Child process */ + close(pipefd[0]); + + /* First create new user namespace to drop privileges */ + ret = unshare(CLONE_NEWUSER); + if (ret < 0) { + write(pipefd[1], "U", + 1); /* Unable to create user namespace */ + close(pipefd[1]); + exit(0); + } + + /* Write uid/gid mappings to maintain some capabilities */ + int uid_map_fd = open("/proc/self/uid_map", O_WRONLY); + int gid_map_fd = open("/proc/self/gid_map", O_WRONLY); + int setgroups_fd = open("/proc/self/setgroups", O_WRONLY); + + if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) { + write(pipefd[1], "M", 1); /* Unable to set mappings */ + close(pipefd[1]); + exit(0); + } + + /* Disable setgroups to allow gid mapping */ + write(setgroups_fd, "deny", 4); + close(setgroups_fd); + + /* Map current uid/gid to root in the new namespace */ + char mapping[64]; + snprintf(mapping, sizeof(mapping), "0 %d 1", getuid()); + write(uid_map_fd, mapping, strlen(mapping)); + close(uid_map_fd); + + snprintf(mapping, sizeof(mapping), "0 %d 1", getgid()); + write(gid_map_fd, mapping, strlen(mapping)); + close(gid_map_fd); + + /* Now create new time namespace - requires fork to take effect */ + ret = unshare(CLONE_NEWTIME); + if (ret < 0) { + write(pipefd[1], "N", + 1); /* Unable to create time namespace */ + close(pipefd[1]); + exit(0); + } + + /* Fork again for time namespace to take effect */ + pid_t child_pid = fork(); + if (child_pid < 0) { + write(pipefd[1], "N", + 1); /* Unable to fork in time namespace */ + close(pipefd[1]); + exit(0); + } + + if (child_pid == 0) { + /* Grandchild in new time namespace */ + /* Try to open parent's time namespace handle from new user+time namespace */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY); + + if (fd >= 0) { + /* Should NOT succeed - we're in a different user namespace */ + write(pipefd[1], "S", + 1); /* Unexpected success */ + close(fd); + } else if (errno == 
ESTALE) { + /* Expected: Stale file handle */ + write(pipefd[1], "P", 1); + } else { + /* Other error */ + write(pipefd[1], "F", 1); + } + + close(pipefd[1]); + exit(0); + } + + /* Wait for grandchild */ + waitpid(child_pid, NULL, 0); + exit(0); + } + + /* Parent process */ + close(pipefd[1]); + ASSERT_EQ(read(pipefd[0], &result, 1), 1); + + waitpid(pid, &status, 0); + ASSERT_TRUE(WIFEXITED(status)); + ASSERT_EQ(WEXITSTATUS(status), 0); + + if (result == 'U') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new user namespace"); + } + if (result == 'M') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot set uid/gid mappings"); + } + if (result == 'N') { + SKIP(free(handle); close(pipefd[0]); + return, "Cannot create new time namespace"); + } + + /* Should fail with ESTALE since we're in a different user namespace */ + ASSERT_EQ(result, 'P'); + + close(pipefd[0]); + free(handle); +} + +TEST(nsfs_open_flags) +{ + struct file_handle *handle; + int mount_id; + int ret; + int fd; + int ns_fd; + char ns_path[256]; + + handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ); + ASSERT_NE(handle, NULL); + + /* Open a namespace file descriptor */ + snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net"); + ns_fd = open(ns_path, O_RDONLY); + ASSERT_GE(ns_fd, 0); + + /* Get handle for the namespace */ + handle->handle_bytes = MAX_HANDLE_SZ; + ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH); + if (ret < 0 && errno == EOPNOTSUPP) { + SKIP(free(handle); close(ns_fd); + return, "nsfs doesn't support file handles"); + } + ASSERT_EQ(ret, 0); + ASSERT_GT(handle->handle_bytes, 0); + + /* Test invalid flags that should fail */ + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_WRONLY); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EPERM); + + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDWR); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EPERM); + + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_CREAT); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EINVAL); + + fd = 
open_by_handle_at(FD_NSFS_ROOT, handle, O_TRUNC); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EINVAL); + + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_APPEND); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EINVAL); + + fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_DIRECT); + ASSERT_LT(fd, 0); + ASSERT_EQ(errno, EINVAL); + + close(ns_fd); + free(handle); +} + +TEST_HARNESS_MAIN
On Wed, Sep 10, 2025 at 4:40 PM Christian Brauner <brauner@kernel.org> wrote:
Add a bunch of selftests for namespace file handles.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Obviously, I did not go over every single line, but for the general test template and test coverage you may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
However, see my comment on the file handle support patch: the test matrix is incomplete. It might be complete if the test is run as root and then as non-root, but then I think the test needs some changes for running as root and opening a non-self ns.
I am not sure what the standard is wrt running the selftests as root/non-root.
I see that the userns isolation tests do: /* Map current uid/gid to root in the new namespace */
Are you assuming that a non-root user is running this test, or am I missing something?
Wouldn't mapping uid 0 to uid 0 in the new userns cause the test to fail because opening by handle will succeed?
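To make the question concrete, here is a minimal sketch (not part of the patch) of the mapping line the isolation tests write. The helper name format_id_map() is hypothetical; it only reproduces the snprintf() call from the tests so the two cases can be compared side by side.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: builds the single-extent mapping line that the
 * isolation tests write to /proc/self/uid_map and /proc/self/gid_map.
 * The line reads "<id inside the new userns> <id outside> <count>".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_id_map(char *buf, size_t size, unsigned int outside_id)
{
	int len = snprintf(buf, size, "0 %u 1", outside_id);

	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

For an unprivileged caller with uid 1000 this yields "0 1000 1". For a caller that is already root it yields "0 0 1", which is the case I am asking about.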
Thanks, Amir.
 tools/testing/selftests/namespaces/.gitignore      |    1 +
 tools/testing/selftests/namespaces/Makefile        |    2 +-
 .../selftests/namespaces/file_handle_test.c        | 1410 ++++++++++++++++++++
 3 files changed, 1412 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/namespaces/.gitignore b/tools/testing/selftests/namespaces/.gitignore
index c1e8d634dd21..7639dbf58bbf 100644
--- a/tools/testing/selftests/namespaces/.gitignore
+++ b/tools/testing/selftests/namespaces/.gitignore
@@ -1 +1,2 @@
 nsid_test
+file_handle_test
diff --git a/tools/testing/selftests/namespaces/Makefile b/tools/testing/selftests/namespaces/Makefile
index 9280c703533e..f6c117ce2c2b 100644
--- a/tools/testing/selftests/namespaces/Makefile
+++ b/tools/testing/selftests/namespaces/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -Wall -O0 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES)

-TEST_GEN_PROGS := nsid_test
+TEST_GEN_PROGS := nsid_test file_handle_test

 include ../lib.mk
diff --git a/tools/testing/selftests/namespaces/file_handle_test.c b/tools/testing/selftests/namespaces/file_handle_test.c
new file mode 100644
index 000000000000..87573fa06990
--- /dev/null
+++ b/tools/testing/selftests/namespaces/file_handle_test.c
@@ -0,0 +1,1410 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <grp.h>
+#include <limits.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <linux/unistd.h>
+#include "../kselftest_harness.h"
+
+#ifndef FD_NSFS_ROOT
+#define FD_NSFS_ROOT -10003 /* Root of the nsfs filesystem */
+#endif
+TEST(nsfs_net_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open a namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
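The per-type tests all end with the same verification step. As a sketch only (the helpers same_inode() and fd_matches_path() are hypothetical and not in the patch), the fstat() comparison could be factored out along these lines:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both file descriptors refer to the
 * same inode on the same device, 0 if they differ, -1 if fstat() fails.
 */
static int same_inode(int fd1, int fd2)
{
	struct stat st1, st2;

	if (fstat(fd1, &st1) < 0 || fstat(fd2, &st2) < 0)
		return -1;

	return st1.st_ino == st2.st_ino && st1.st_dev == st2.st_dev;
}

/*
 * Hypothetical convenience wrapper: opens path read-only and compares
 * it against an already open descriptor. Returns -1 if open() fails.
 */
static int fd_matches_path(int fd, const char *path)
{
	int other, ret;

	other = open(path, O_RDONLY);
	if (other < 0)
		return -1;

	ret = same_inode(fd, other);
	close(other);
	return ret;
}
```

In a test body this would let the four fstat()/ASSERT_EQ lines collapse into a single ASSERT_EQ(same_inode(ns_fd, fd), 1).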
+TEST(nsfs_uts_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open UTS namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/uts");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_ipc_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open IPC namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/ipc");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_pid_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open PID namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/pid");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_mnt_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open mount namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/mnt");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_user_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open user namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/user");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_cgroup_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open cgroup namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/cgroup");
ns_fd = open(ns_path, O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); return, "cgroup namespace not available");
}
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_time_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open time namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/time");
ns_fd = open(ns_path, O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); return, "time namespace not available");
}
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
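Since these eight tests differ only in the /proc/self/ns/ entry they open, a table-driven variant might be easier to maintain. A rough sketch, where the names ns_names, NS_COUNT and ns_path() are hypothetical and not taken from the patch:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Namespace types covered by the per-type handle tests above. */
static const char *const ns_names[] = {
	"net", "uts", "ipc", "pid", "mnt", "user", "cgroup", "time",
};

#define NS_COUNT (sizeof(ns_names) / sizeof(ns_names[0]))

/*
 * Hypothetical helper: formats "/proc/self/ns/<name>" into buf.
 * Returns 0 on success, -1 if the index is out of range or the
 * buffer is too small.
 */
static int ns_path(char *buf, size_t size, size_t idx)
{
	int len;

	if (idx >= NS_COUNT)
		return -1;

	len = snprintf(buf, size, "/proc/self/ns/%s", ns_names[idx]);
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

A single test could then loop over the table and run the encode/decode round trip for each entry, skipping types whose /proc entry cannot be opened.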
+TEST(nsfs_user_net_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current network namespace */
ns_fd = open("/proc/self/ns/net", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new network namespace */
ret = unshare(CLONE_NEWNET);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create network namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's network namespace handle from new user+net namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new network namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
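All of the isolation tests use the same fork-and-pipe pattern to report a one-byte status code from the child. A compact sketch of that pattern, assuming a hypothetical run_in_child() helper and an example callback report_pass(), neither of which is part of the patch:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks, runs fn() in the child, and sends the
 * single status byte it returns back to the parent through a pipe.
 * Returns that byte, or 0 if pipe/fork/read failed or the child did
 * not exit cleanly.
 */
static char run_in_child(char (*fn)(void))
{
	int pipefd[2];
	char result = 0;
	int status;
	pid_t pid;

	if (pipe(pipefd) < 0)
		return 0;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return 0;
	}

	if (pid == 0) {
		/* Child: compute the status byte, report it, and exit. */
		char c = fn();

		close(pipefd[0]);
		if (write(pipefd[1], &c, 1) != 1)
			_exit(1);
		close(pipefd[1]);
		_exit(0);
	}

	/* Parent: read the status byte, then reap the child. */
	close(pipefd[1]);
	if (read(pipefd[0], &result, 1) != 1)
		result = 0;
	close(pipefd[0]);

	if (waitpid(pid, &status, 0) != pid ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return 0;

	return result;
}

/* Example child body: always reports 'P', the "expected" code. */
static char report_pass(void)
{
	return 'P';
}
```

With something like this, each test body would reduce to the unshare() sequence plus the open_by_handle_at() call, and the parent would only map 'U', 'M' and 'N' to SKIP and compare the rest against 'P'.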
+TEST(nsfs_user_uts_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current UTS namespace */
ns_fd = open("/proc/self/ns/uts", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new UTS namespace */
ret = unshare(CLONE_NEWUTS);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create UTS namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's UTS namespace handle from new user+uts namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new UTS namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_ipc_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current IPC namespace */
ns_fd = open("/proc/self/ns/ipc", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new IPC namespace */
ret = unshare(CLONE_NEWIPC);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create IPC namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's IPC namespace handle from new user+ipc namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new IPC namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_mnt_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current mount namespace */
ns_fd = open("/proc/self/ns/mnt", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new mount namespace */
ret = unshare(CLONE_NEWNS);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create mount namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's mount namespace handle from new user+mnt namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new mount namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_cgroup_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current cgroup namespace */
ns_fd = open("/proc/self/ns/cgroup", O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); close(pipefd[0]); close(pipefd[1]);
return, "cgroup namespace not available");
}
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new cgroup namespace */
ret = unshare(CLONE_NEWCGROUP);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create cgroup namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's cgroup namespace handle from new user+cgroup namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new cgroup namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_pid_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current PID namespace */
ns_fd = open("/proc/self/ns/pid", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new PID namespace - requires fork to take effect */
ret = unshare(CLONE_NEWPID);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create PID namespace */
close(pipefd[1]);
exit(0);
}
/* Fork again for PID namespace to take effect */
pid_t child_pid = fork();
if (child_pid < 0) {
write(pipefd[1], "N",
1); /* Unable to fork in PID namespace */
close(pipefd[1]);
exit(0);
}
if (child_pid == 0) {
/* Grandchild in new PID namespace */
/* Try to open parent's PID namespace handle from new user+pid namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S",
1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Wait for grandchild */
waitpid(child_pid, NULL, 0);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new PID namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_time_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current time namespace */
ns_fd = open("/proc/self/ns/time", O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); close(pipefd[0]); close(pipefd[1]);
return, "time namespace not available");
}
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new time namespace - requires fork to take effect */
ret = unshare(CLONE_NEWTIME);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create time namespace */
close(pipefd[1]);
exit(0);
}
/* Fork again for time namespace to take effect */
pid_t child_pid = fork();
if (child_pid < 0) {
write(pipefd[1], "N",
1); /* Unable to fork in time namespace */
close(pipefd[1]);
exit(0);
}
if (child_pid == 0) {
/* Grandchild in new time namespace */
/* Try to open parent's time namespace handle from new user+time namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S",
1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Wait for grandchild */
waitpid(child_pid, NULL, 0);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new time namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_open_flags)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open a namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Test invalid flags that should fail */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_WRONLY);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EPERM);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDWR);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EPERM);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_CREAT);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_TRUNC);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_APPEND);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_DIRECT);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
close(ns_fd);
free(handle);
+}
+TEST_HARNESS_MAIN
-- 2.47.3
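The isolation tests above all report the child's outcome to the parent as a single byte over a pipe ('U', 'M', 'N', 'S', 'P', 'F'). As a reading aid, here is a minimal sketch of that convention collected into one helper. The enum and the function name classify_child_result() are hypothetical and not part of the patch, which open-codes these checks in every test.

```c
#include <stddef.h>

/* Hypothetical names; the patch open-codes these checks in each test. */
enum child_outcome {
	CHILD_PASS,	/* 'P': open_by_handle_at() failed with ESTALE */
	CHILD_SKIP,	/* 'U', 'M', 'N': a setup step was not possible */
	CHILD_FAIL,	/* 'S', 'F', or an unknown byte */
};

/*
 * Map the byte read from the pipe to an outcome. If @reason is non-NULL
 * it receives a short description based on the comments in the tests.
 */
static enum child_outcome classify_child_result(char result, const char **reason)
{
	const char *msg;
	enum child_outcome outcome;

	switch (result) {
	case 'P':
		msg = "handle rejected with ESTALE";
		outcome = CHILD_PASS;
		break;
	case 'U':
		msg = "cannot create new user namespace";
		outcome = CHILD_SKIP;
		break;
	case 'M':
		msg = "cannot set uid/gid mappings";
		outcome = CHILD_SKIP;
		break;
	case 'N':
		msg = "cannot create the target namespace";
		outcome = CHILD_SKIP;
		break;
	case 'S':
		msg = "open_by_handle_at() unexpectedly succeeded";
		outcome = CHILD_FAIL;
		break;
	case 'F':
		msg = "open_by_handle_at() failed with an unexpected errno";
		outcome = CHILD_FAIL;
		break;
	default:
		msg = "unknown result byte";
		outcome = CHILD_FAIL;
		break;
	}

	if (reason)
		*reason = msg;
	return outcome;
}
```

In the tests as posted, 'S' and 'F' are not handled separately in the parent; they simply fail the final ASSERT_EQ(result, 'P').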
On Wed, Sep 10, 2025 at 07:30:03PM +0200, Amir Goldstein wrote:
> On Wed, Sep 10, 2025 at 4:40 PM Christian Brauner brauner@kernel.org wrote:
> >
> > Add a bunch of selftests for namespace file handles.
> >
> > Signed-off-by: Christian Brauner brauner@kernel.org
>
> Obviously, I did not go over every single line, but for the general
> test template and test coverage you may add:
>
> Reviewed-by: Amir Goldstein amir73il@gmail.com
>
> However, see my comment on file handle support patch.
> The test matrix is incomplete.

I mean, I'll just drop to non-root in the non-cross ns tests:

	/* Drop to unprivileged uid/gid */
	ASSERT_EQ(setresgid(65534, 65534, 65534), 0); /* nogroup */
	ASSERT_EQ(setresuid(65534, 65534, 65534), 0); /* nobody */

> Maybe it would be complete if test is run as root and then
> as non root, but then I think the test needs some changes
> for running as root and opening non-self ns.
>
> I am not sure what the standard is wrt running the selftests
> as root / non-root.
>
> I see that the userns isolation tests do:
> /* Map current uid/gid to root in the new namespace */
>
> Are you assuming that non root is running this test
> or am I missing something?

No, I'm not assuming that. I just need a new user namespace and become
root in it to assume privilege over it so I can test that decoding
doesn't work from an ancestor userns owned namespace.

> Wouldn't mapping uid 0 to uid 0 in the new userns cause the test
> to fail because opening by handle will succeed?
>
> Thanks,
> Amir.
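For reference, the mapping line under discussion is a single "inside outside count" extent. A minimal sketch, assuming a hypothetical helper format_id_map() that is not part of the patch: when the test runs as root, getuid() returns 0 and the line written to uid_map becomes "0 0 1", which is the case being asked about. The sketch only shows the string that gets written; it makes no claim about how the kernel then treats the handle.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the single-extent "inside outside count"
 * line that the tests write to /proc/self/uid_map and gid_map.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int format_id_map(char *buf, size_t size, unsigned int inside,
			 unsigned int outside, unsigned int count)
{
	int n = snprintf(buf, size, "%u %u %u", inside, outside, count);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

With an unprivileged caller (say uid 1000) the helper yields "0 1000 1"; with a root caller it yields "0 0 1".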
 tools/testing/selftests/namespaces/.gitignore      |    1 +
 tools/testing/selftests/namespaces/Makefile        |    2 +-
 .../selftests/namespaces/file_handle_test.c        | 1410 ++++++++++++++++++++
 3 files changed, 1412 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/namespaces/.gitignore b/tools/testing/selftests/namespaces/.gitignore
index c1e8d634dd21..7639dbf58bbf 100644
--- a/tools/testing/selftests/namespaces/.gitignore
+++ b/tools/testing/selftests/namespaces/.gitignore
@@ -1 +1,2 @@
 nsid_test
+file_handle_test
diff --git a/tools/testing/selftests/namespaces/Makefile b/tools/testing/selftests/namespaces/Makefile
index 9280c703533e..f6c117ce2c2b 100644
--- a/tools/testing/selftests/namespaces/Makefile
+++ b/tools/testing/selftests/namespaces/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -Wall -O0 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
 
-TEST_GEN_PROGS := nsid_test
+TEST_GEN_PROGS := nsid_test file_handle_test
 
 include ../lib.mk
diff --git a/tools/testing/selftests/namespaces/file_handle_test.c b/tools/testing/selftests/namespaces/file_handle_test.c
new file mode 100644
index 000000000000..87573fa06990
--- /dev/null
+++ b/tools/testing/selftests/namespaces/file_handle_test.c
@@ -0,0 +1,1410 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <grp.h>
+#include <limits.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <linux/unistd.h>
+#include "../kselftest_harness.h"
+
+#ifndef FD_NSFS_ROOT
+#define FD_NSFS_ROOT -10003 /* Root of the nsfs filesystem */
+#endif
+TEST(nsfs_net_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open a namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_uts_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open UTS namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/uts");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_ipc_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open IPC namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/ipc");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_pid_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open PID namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/pid");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_mnt_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open mount namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/mnt");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_user_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open user namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/user");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_cgroup_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open cgroup namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/cgroup");
ns_fd = open(ns_path, O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); return, "cgroup namespace not available");
}
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_time_handle)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
struct stat st1, st2;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open time namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/time");
ns_fd = open(ns_path, O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); return, "time namespace not available");
}
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Try to open using FD_NSFS_ROOT */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd < 0 && (errno == EINVAL || errno == EOPNOTSUPP)) {
SKIP(free(handle); close(ns_fd);
return,
"open_by_handle_at with FD_NSFS_ROOT not supported");
}
ASSERT_GE(fd, 0);
/* Verify we opened the correct namespace */
ASSERT_EQ(fstat(ns_fd, &st1), 0);
ASSERT_EQ(fstat(fd, &st2), 0);
ASSERT_EQ(st1.st_ino, st2.st_ino);
ASSERT_EQ(st1.st_dev, st2.st_dev);
close(fd);
close(ns_fd);
free(handle);
+}
+TEST(nsfs_user_net_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current network namespace */
ns_fd = open("/proc/self/ns/net", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new network namespace */
ret = unshare(CLONE_NEWNET);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create network namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's network namespace handle from new user+net namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new network namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
+}
+TEST(nsfs_user_uts_namespace_isolation)
+{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current UTS namespace */
ns_fd = open("/proc/self/ns/uts", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new UTS namespace */
ret = unshare(CLONE_NEWUTS);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create UTS namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's UTS namespace handle from new user+uts namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new UTS namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_user_ipc_namespace_isolation)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current IPC namespace */
ns_fd = open("/proc/self/ns/ipc", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new IPC namespace */
ret = unshare(CLONE_NEWIPC);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create IPC namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's IPC namespace handle from new user+ipc namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new IPC namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_user_mnt_namespace_isolation)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current mount namespace */
ns_fd = open("/proc/self/ns/mnt", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new mount namespace */
ret = unshare(CLONE_NEWNS);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create mount namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's mount namespace handle from new user+mnt namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new mount namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_user_cgroup_namespace_isolation)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current cgroup namespace */
ns_fd = open("/proc/self/ns/cgroup", O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); close(pipefd[0]); close(pipefd[1]);
return, "cgroup namespace not available");
}
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new cgroup namespace */
ret = unshare(CLONE_NEWCGROUP);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create cgroup namespace */
close(pipefd[1]);
exit(0);
}
/* Try to open parent's cgroup namespace handle from new user+cgroup namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S", 1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new cgroup namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_user_pid_namespace_isolation)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current PID namespace */
ns_fd = open("/proc/self/ns/pid", O_RDONLY);
ASSERT_GE(ns_fd, 0);
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new PID namespace - requires fork to take effect */
ret = unshare(CLONE_NEWPID);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create PID namespace */
close(pipefd[1]);
exit(0);
}
/* Fork again for PID namespace to take effect */
pid_t child_pid = fork();
if (child_pid < 0) {
write(pipefd[1], "N",
1); /* Unable to fork in PID namespace */
close(pipefd[1]);
exit(0);
}
if (child_pid == 0) {
/* Grandchild in new PID namespace */
/* Try to open parent's PID namespace handle from new user+pid namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S",
1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Wait for grandchild */
waitpid(child_pid, NULL, 0);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new PID namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_user_time_namespace_isolation)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
pid_t pid;
int status;
int pipefd[2];
char result;
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Create pipe for communication */
ASSERT_EQ(pipe(pipefd), 0);
/* Get handle for current time namespace */
ns_fd = open("/proc/self/ns/time", O_RDONLY);
if (ns_fd < 0) {
SKIP(free(handle); close(pipefd[0]); close(pipefd[1]);
return, "time namespace not available");
}
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd); close(pipefd[0]);
close(pipefd[1]);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
close(ns_fd);
pid = fork();
ASSERT_GE(pid, 0);
if (pid == 0) {
/* Child process */
close(pipefd[0]);
/* First create new user namespace to drop privileges */
ret = unshare(CLONE_NEWUSER);
if (ret < 0) {
write(pipefd[1], "U",
1); /* Unable to create user namespace */
close(pipefd[1]);
exit(0);
}
/* Write uid/gid mappings to maintain some capabilities */
int uid_map_fd = open("/proc/self/uid_map", O_WRONLY);
int gid_map_fd = open("/proc/self/gid_map", O_WRONLY);
int setgroups_fd = open("/proc/self/setgroups", O_WRONLY);
if (uid_map_fd < 0 || gid_map_fd < 0 || setgroups_fd < 0) {
write(pipefd[1], "M", 1); /* Unable to set mappings */
close(pipefd[1]);
exit(0);
}
/* Disable setgroups to allow gid mapping */
write(setgroups_fd, "deny", 4);
close(setgroups_fd);
/* Map current uid/gid to root in the new namespace */
char mapping[64];
snprintf(mapping, sizeof(mapping), "0 %d 1", getuid());
write(uid_map_fd, mapping, strlen(mapping));
close(uid_map_fd);
snprintf(mapping, sizeof(mapping), "0 %d 1", getgid());
write(gid_map_fd, mapping, strlen(mapping));
close(gid_map_fd);
/* Now create new time namespace - requires fork to take effect */
ret = unshare(CLONE_NEWTIME);
if (ret < 0) {
write(pipefd[1], "N",
1); /* Unable to create time namespace */
close(pipefd[1]);
exit(0);
}
/* Fork again for time namespace to take effect */
pid_t child_pid = fork();
if (child_pid < 0) {
write(pipefd[1], "N",
1); /* Unable to fork in time namespace */
close(pipefd[1]);
exit(0);
}
if (child_pid == 0) {
/* Grandchild in new time namespace */
/* Try to open parent's time namespace handle from new user+time namespace */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDONLY);
if (fd >= 0) {
/* Should NOT succeed - we're in a different user namespace */
write(pipefd[1], "S",
1); /* Unexpected success */
close(fd);
} else if (errno == ESTALE) {
/* Expected: Stale file handle */
write(pipefd[1], "P", 1);
} else {
/* Other error */
write(pipefd[1], "F", 1);
}
close(pipefd[1]);
exit(0);
}
/* Wait for grandchild */
waitpid(child_pid, NULL, 0);
exit(0);
}
/* Parent process */
close(pipefd[1]);
ASSERT_EQ(read(pipefd[0], &result, 1), 1);
waitpid(pid, &status, 0);
ASSERT_TRUE(WIFEXITED(status));
ASSERT_EQ(WEXITSTATUS(status), 0);
if (result == 'U') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new user namespace");
}
if (result == 'M') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot set uid/gid mappings");
}
if (result == 'N') {
SKIP(free(handle); close(pipefd[0]);
return, "Cannot create new time namespace");
}
/* Should fail with ESTALE since we're in a different user namespace */
ASSERT_EQ(result, 'P');
close(pipefd[0]);
free(handle);
}

TEST(nsfs_open_flags)
{
struct file_handle *handle;
int mount_id;
int ret;
int fd;
int ns_fd;
char ns_path[256];
handle = malloc(sizeof(*handle) + MAX_HANDLE_SZ);
ASSERT_NE(handle, NULL);
/* Open a namespace file descriptor */
snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
ns_fd = open(ns_path, O_RDONLY);
ASSERT_GE(ns_fd, 0);
/* Get handle for the namespace */
handle->handle_bytes = MAX_HANDLE_SZ;
ret = name_to_handle_at(ns_fd, "", handle, &mount_id, AT_EMPTY_PATH);
if (ret < 0 && errno == EOPNOTSUPP) {
SKIP(free(handle); close(ns_fd);
return, "nsfs doesn't support file handles");
}
ASSERT_EQ(ret, 0);
ASSERT_GT(handle->handle_bytes, 0);
/* Test invalid flags that should fail */
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_WRONLY);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EPERM);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_RDWR);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EPERM);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_CREAT);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_TRUNC);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_APPEND);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
fd = open_by_handle_at(FD_NSFS_ROOT, handle, O_DIRECT);
ASSERT_LT(fd, 0);
ASSERT_EQ(errno, EINVAL);
close(ns_fd);
free(handle);
}

TEST_HARNESS_MAIN
-- 2.47.3
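The encode step that every test in this patch starts with can be sketched in isolation. This is a minimal sketch and not part of the patch: `encode_ns_handle()` and `handles_equal()` are hypothetical helper names, and it assumes a libc that exposes name_to_handle_at() and struct file_handle under _GNU_SOURCE. The decode side (open_by_handle_at() with FD_NSFS_ROOT) is left out because that constant comes from the updated uapi header in this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef MAX_HANDLE_SZ
#define MAX_HANDLE_SZ 128
#endif

/*
 * Hypothetical helper: encode a file handle for the namespace file at
 * `path` (for example "/proc/self/ns/net"). Returns a malloc()ed handle
 * that the caller must free(), or NULL with *err set to an errno value.
 */
static struct file_handle *encode_ns_handle(const char *path, int *err)
{
	struct file_handle *fh;
	int mount_id;
	int fd;

	*err = 0;
	fh = calloc(1, sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		*err = ENOMEM;
		return NULL;
	}

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0) {
		*err = errno;
		free(fh);
		return NULL;
	}

	fh->handle_bytes = MAX_HANDLE_SZ;
	if (name_to_handle_at(fd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
		/* The tests treat EOPNOTSUPP as "nsfs has no handle support". */
		*err = errno;
		close(fd);
		free(fh);
		return NULL;
	}

	/* The tests also close the fd before decoding the handle. */
	close(fd);
	return fh;
}

/* Assumption: handles are compared by type, length and payload bytes. */
static int handles_equal(const struct file_handle *a,
			 const struct file_handle *b)
{
	return a->handle_type == b->handle_type &&
	       a->handle_bytes == b->handle_bytes &&
	       memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```

On a kernel with this series applied, encoding "/proc/self/ns/net" twice would be expected to yield handles for which `handles_equal()` returns nonzero. On kernels without nsfs handle support the helper returns NULL and reports the errno value through `err`.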
On Thu, Sep 11, 2025 at 11:15 AM Christian Brauner brauner@kernel.org wrote:
On Wed, Sep 10, 2025 at 07:30:03PM +0200, Amir Goldstein wrote:
On Wed, Sep 10, 2025 at 4:40 PM Christian Brauner brauner@kernel.org wrote:
Add a bunch of selftests for namespace file handles.
Signed-off-by: Christian Brauner brauner@kernel.org
Obviously, I did not go over every single line, but for the general test template and test coverage you may add:
Reviewed-by: Amir Goldstein amir73il@gmail.com
However, see my comment on file handle support patch. The test matrix is incomplete.
I mean, I'll just drop to non-root in the non-cross ns tests:
	/* Drop to unprivileged uid/gid */
	ASSERT_EQ(setresgid(65534, 65534, 65534), 0); /* nogroup */
	ASSERT_EQ(setresuid(65534, 65534, 65534), 0); /* nobody */
That would be good I think.
Maybe it would be complete if the test is run as root and then as non-root, but then I think the test needs some changes for running as root and opening a non-self ns.
I am not sure what the standard is wrt running the selftests as root/non-root.
I see that the userns isolation tests do: /* Map current uid/gid to root in the new namespace */
Are you assuming that non root is running this test or am I missing something?
No, I'm not assuming that. I just need a new user namespace and become root in it to assume privilege over it so I can test that decoding doesn't work from an ancestor userns owned namespace.
With dropping to unprivileged uid/gid in parent, I understand it should work. I guess I wasn't sure if dropping to unprivileged uid/gid was required for the test to pass when the test is run as root user, but with the addition of dropping to unprivileged uid/gid - feel free to add:
Reviewed-by: Amir Goldstein amir73il@gmail.com
Thanks, Amir.
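The privilege drop discussed in this exchange could be confined to a forked child so that the rest of the test binary keeps its original credentials. What follows is only a sketch under that assumption and not what the patch does: `run_unprivileged()`, `check_is_nobody()` and the `DROP_SKIPPED` code are hypothetical names, and 65534 is the nobody/nogroup ID taken from the snippet quoted above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define UNPRIV_ID    65534	/* nobody/nogroup, as in the quoted snippet */
#define DROP_SKIPPED 77		/* hypothetical code: could not drop privileges */

/*
 * Fork, drop to UNPRIV_ID in the child, run `fn` there and return its
 * exit code to the caller. Returns DROP_SKIPPED if the child could not
 * change IDs (for example when not run as root), -1 on fork/wait error.
 */
static int run_unprivileged(int (*fn)(void))
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Group first: after setresuid() the gid change may be denied. */
		if (setresgid(UNPRIV_ID, UNPRIV_ID, UNPRIV_ID) < 0 ||
		    setresuid(UNPRIV_ID, UNPRIV_ID, UNPRIV_ID) < 0)
			_exit(DROP_SKIPPED);
		_exit(fn());
	}

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Example check: returns 0 only if the drop really took effect. */
static int check_is_nobody(void)
{
	return (getuid() == UNPRIV_ID && getgid() == UNPRIV_ID) ? 0 : 1;
}
```

A test body would call `run_unprivileged()` with a function that performs the open_by_handle_at() attempt and map `DROP_SKIPPED` to SKIP(). Note that setresgid() leaves supplementary groups untouched, so a fuller version would probably also call setgroups() before dropping.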
On 9/10/25 7:37 AM, Christian Brauner wrote:
- snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
- ns_fd = open(ns_path, O_RDONLY);
Here and also in TEST(nsfs_uts_handle), ns_path is not modified. Does this mean that "/proc/self/ns/net" can be stored in a static const char array and also that the snprintf() call can be left out? In case I would have missed the reason why the path is copied, how about using asprintf() or strdup() instead of snprintf()?
Thanks,
Bart.
On Wed, Sep 10, 2025 at 02:46:21PM -0700, Bart Van Assche wrote:
On 9/10/25 7:37 AM, Christian Brauner wrote:
- snprintf(ns_path, sizeof(ns_path), "/proc/self/ns/net");
- ns_fd = open(ns_path, O_RDONLY);
Here and also in TEST(nsfs_uts_handle), ns_path is not modified. Does this mean that "/proc/self/ns/net" can be stored in a static const char array and also that the snprintf() call can be left out? In case I would have missed the reason why the path is copied, how about using asprintf() or strdup() instead of snprintf()?
Yep, that can just be a static string. Thanks.
syzbot ci has tested the following series
[v1] ns: support file handles
https://lore.kernel.org/all/20250910-work-namespace-v1-0-4dd56e7359d8@kernel...
* [PATCH 01/32] pidfs: validate extensible ioctls
* [PATCH 02/32] nsfs: validate extensible ioctls
* [PATCH 03/32] block: use extensible_ioctl_valid()
* [PATCH 04/32] ns: move to_ns_common() to ns_common.h
* [PATCH 05/32] nsfs: add nsfs.h header
* [PATCH 06/32] ns: uniformly initialize ns_common
* [PATCH 07/32] mnt: use ns_common_init()
* [PATCH 08/32] ipc: use ns_common_init()
* [PATCH 09/32] cgroup: use ns_common_init()
* [PATCH 10/32] pid: use ns_common_init()
* [PATCH 11/32] time: use ns_common_init()
* [PATCH 12/32] uts: use ns_common_init()
* [PATCH 13/32] user: use ns_common_init()
* [PATCH 14/32] net: use ns_common_init()
* [PATCH 15/32] ns: remove ns_alloc_inum()
* [PATCH 16/32] nstree: make iterator generic
* [PATCH 17/32] mnt: support iterator
* [PATCH 18/32] cgroup: support iterator
* [PATCH 19/32] ipc: support iterator
* [PATCH 20/32] net: support iterator
* [PATCH 21/32] pid: support iterator
* [PATCH 22/32] time: support iterator
* [PATCH 23/32] userns: support iterator
* [PATCH 24/32] uts: support iterator
* [PATCH 25/32] ns: add to_<type>_ns() to respective headers
* [PATCH 26/32] nsfs: add current_in_namespace()
* [PATCH 27/32] nsfs: support file handles
* [PATCH 28/32] nsfs: support exhaustive file handles
* [PATCH 29/32] nsfs: add missing id retrieval support
* [PATCH 30/32] tools: update nsfs.h uapi header
* [PATCH 31/32] selftests/namespaces: add identifier selftests
* [PATCH 32/32] selftests/namespaces: add file handle selftests
and found the following issue: WARNING in copy_net_ns
Full report is available here: https://ci.syzbot.org/series/bc3dfd83-98cc-488c-b046-f849c79a6a41
***
WARNING in copy_net_ns
tree: net-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base: deb105f49879dd50d595f7f55207d6e74dec34e6
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/a560fd28-b788-4442-a7c8-10c6240b4dbf/config
syz repro: https://ci.syzbot.org/findings/18e91b10-567e-4cae-a279-8a5f2f2cde80/syz_repr...
------------[ cut here ]------------
ida_free called for id=1326 which is not allocated.
WARNING: CPU: 0 PID: 6146 at lib/idr.c:592 ida_free+0x280/0x310 lib/idr.c:592
Modules linked in:
CPU: 0 UID: 0 PID: 6146 Comm: syz.1.60 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:ida_free+0x280/0x310 lib/idr.c:592
Code: 00 00 00 00 fc ff df 48 8b 5c 24 10 48 8b 7c 24 40 48 89 de e8 d1 8a 0c 00 90 48 c7 c7 80 ee ba 8c 44 89 fe e8 11 87 12 f6 90 <0f> 0b 90 90 eb 34 e8 95 02 4f f6 49 bd 00 00 00 00 00 fc ff df eb
RSP: 0018:ffffc9000302fba0 EFLAGS: 00010246
RAX: c838d58ce4bb0000 RBX: 0000000000000a06 RCX: ffff88801eac0000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000002
RBP: ffffc9000302fca0 R08: ffff88804b024293 R09: 1ffff11009604852
R10: dffffc0000000000 R11: ffffed1009604853 R12: 1ffff92000605f78
R13: dffffc0000000000 R14: ffff888026c1fd00 R15: 000000000000052e
FS: 00007f6d7aab16c0(0000) GS:ffff8880b8613000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000004000 CR3: 000000002726e000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 copy_net_ns+0x37a/0x510 net/core/net_namespace.c:593
 create_new_namespaces+0x3f3/0x720 kernel/nsproxy.c:110
 unshare_nsproxy_namespaces+0x11c/0x170 kernel/nsproxy.c:218
 ksys_unshare+0x4c8/0x8c0 kernel/fork.c:3127
 __do_sys_unshare kernel/fork.c:3198 [inline]
 __se_sys_unshare kernel/fork.c:3196 [inline]
 __x64_sys_unshare+0x38/0x50 kernel/fork.c:3196
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6d79b8eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6d7aab1038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f6d79dd5fa0 RCX: 00007f6d79b8eba9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062040200
RBP: 00007f6d79c11e19 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6d79dd6038 R14: 00007f6d79dd5fa0 R15: 00007ffd5ab830f8
 </TASK>
***
If these findings have caused you to resend the series or submit a separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.