[PATCH] fs/ceph/mds_client: give up on paths longer than PATH_MAX

List overview All Threads
Download

newer

older

Apply 60fc1e675013 to 6.12.y

[PATCH v4 00/15] media: i2c:...

Max Kellermann

18 Nov 2024 18 Nov '24

10:28 p.m.

If the full path to be built by ceph_mdsc_build_path() happens to be longer than PATH_MAX, then this function will enter an endless (retry) loop, effectively blocking the whole task. Most of the machine becomes unusable, making this a very simple and effective DoS vulnerability.

I cannot imagine why this retry was ever implemented, but it seems rather useless and harmful to me. Let's remove it and fail with ENAMETOOLONG instead.

Cc: stable@vger.kernel.org Reported-by: Dario Weißer dario@cure53.de Signed-off-by: Max Kellermann max.kellermann@ionos.com --- fs/ceph/mds_client.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index c4a5fd94bbbb..4f6ac015edcd 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2808,12 +2808,11 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,

if (pos < 0) { /* - * A rename didn't occur, but somehow we didn't end up where - * we thought we would. Throw a warning and try again. + * The path is longer than PATH_MAX and this function + * cannot ever succeed. Creating paths that long is + * possible with Ceph, but Linux cannot use them. */ - pr_warn_client(cl, "did not end path lookup where expected (pos = %d)\n", - pos); - goto retry; + return ERR_PTR(-ENAMETOOLONG); }

*pbase = base;

-- 2.45.2

Show replies by date

Ilya Dryomov

19 Nov 19 Nov

12:38 p.m.

On Mon, Nov 18, 2024 at 11:28 PM Max Kellermann max.kellermann@ionos.com wrote:

...

If the full path to be built by ceph_mdsc_build_path() happens to be longer than PATH_MAX, then this function will enter an endless (retry) loop, effectively blocking the whole task. Most of the machine becomes unusable, making this a very simple and effective DoS vulnerability.

I cannot imagine why this retry was ever implemented, but it seems rather useless and harmful to me. Let's remove it and fail with ENAMETOOLONG instead.

Hi Max,

When this was put in place in 2009, I think the idea of a retry was copied from CIFS. Jeff preserved the retry when he massaged this code to not warn in case a rename race is detected [1]. CIFS got rid of it only a couple of years ago [2][3].

Adding Patrick and Venky as well, please chime in.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

Thanks,

Ilya

...

Cc: stable@vger.kernel.org Reported-by: Dario Weißer dario@cure53.de Signed-off-by: Max Kellermann max.kellermann@ionos.com

fs/ceph/mds_client.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index c4a5fd94bbbb..4f6ac015edcd 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2808,12 +2808,11 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
    if (pos < 0) {
            /*
           * A rename didn't occur, but somehow we didn't end up where
           * we thought we would. Throw a warning and try again.
           * The path is longer than PATH_MAX and this function
           * cannot ever succeed.  Creating paths that long is
           * possible with Ceph, but Linux cannot use them.
           */
          pr_warn_client(cl, "did not end path lookup where expected (pos = %d)\n",
                         pos);
          goto retry;
          return ERR_PTR(-ENAMETOOLONG);
  }

  *pbase = base;
-- 2.45.2

Jeff Layton

12:51 p.m.

On Tue, 2024-11-19 at 13:38 +0100, Ilya Dryomov wrote:

...

On Mon, Nov 18, 2024 at 11:28 PM Max Kellermann max.kellermann@ionos.com wrote:

...
If the full path to be built by ceph_mdsc_build_path() happens to be longer than PATH_MAX, then this function will enter an endless (retry) loop, effectively blocking the whole task. Most of the machine becomes unusable, making this a very simple and effective DoS vulnerability.

I cannot imagine why this retry was ever implemented, but it seems rather useless and harmful to me. Let's remove it and fail with ENAMETOOLONG instead.

Hi Max,

When this was put in place in 2009, I think the idea of a retry was copied from CIFS. Jeff preserved the retry when he massaged this code to not warn in case a rename race is detected [1]. CIFS got rid of it only a couple of years ago [2][3].

Adding Patrick and Venky as well, please chime in.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...

Thanks,
            Ilya
...
Cc: stable@vger.kernel.org Reported-by: Dario Weißer dario@cure53.de Signed-off-by: Max Kellermann max.kellermann@ionos.com

fs/ceph/mds_client.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index c4a5fd94bbbb..4f6ac015edcd 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -2808,12 +2808,11 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
    if (pos < 0) {
            /*
           * A rename didn't occur, but somehow we didn't end up where
           * we thought we would. Throw a warning and try again.
           * The path is longer than PATH_MAX and this function
           * cannot ever succeed.  Creating paths that long is
           * possible with Ceph, but Linux cannot use them.
           */
          pr_warn_client(cl, "did not end path lookup where expected (pos = %d)\n",
                         pos);
          goto retry;
          return ERR_PTR(-ENAMETOOLONG);
  }

  *pbase = base;
-- 2.45.2

I think the idea was that if we ended up with a path longer than PATH_MAX that something must have changed in the middle of the reverse pathwalk to make it too long a path, and we should just go and try it again. At the very least, this code should cap the retries to a certain number if you don't just return an error here.

-ENAMETOOLONG could be problematic there. This function is often called when we have a dentry and need to build a path to it to send to the MDS in a call. The system call that caused us to generate this path probably doesn't involve a pathname itself, so the caller may be confused by an -ENAMETOOLONG return.

You may want to go with a more generic error code here -- -EIO or something. It might also be worthwhile to leave in a pr_warn_once or something since there may be users confused by this error return.

-- Jeff Layton jlayton@kernel.org

Max Kellermann

1:02 p.m.

On Tue, Nov 19, 2024 at 1:51 PM Jeff Layton jlayton@kernel.org wrote:

...

-ENAMETOOLONG could be problematic there. This function is often called when we have a dentry and need to build a path to it to send to the MDS in a call. The system call that caused us to generate this path probably doesn't involve a pathname itself, so the caller may be confused by an -ENAMETOOLONG return.

It is unfortunate that the Ceph-MDS protocol requires having to convert a file descriptor back to a path name - but do you really believe EIO would cause less confusion? ENAMETOOLONG is exactly what happens, even if it's an internal error. But there are many error codes that describe internal errors, so there's some prior art.

EIO just doesn't fit, returning EIO would be confusing - even more so because EIO isn't a documented error code for open().

If this is about building path names for sending to the MDS, and not for the userspace ABI, maybe the PATH_MAX limitation is wrong here. If Ceph doesn't have such a limitation, the Ceph code shouldn't use the userspace ABI limit for protocol use.

...

You may want to go with a more generic error code here -- -EIO or something. It might also be worthwhile to leave in a pr_warn_once or something since there may be users confused by this error return.

Users cannot read the kernel log, and this allows users to flood the kernel log. So we get all the disadvantages of the kernel log while our users get none of the advantages.

Jeff Layton

1:32 p.m.

On Tue, 2024-11-19 at 14:02 +0100, Max Kellermann wrote:

...

On Tue, Nov 19, 2024 at 1:51 PM Jeff Layton jlayton@kernel.org wrote:

...
-ENAMETOOLONG could be problematic there. This function is often called when we have a dentry and need to build a path to it to send to the MDS in a call. The system call that caused us to generate this path probably doesn't involve a pathname itself, so the caller may be confused by an -ENAMETOOLONG return.

It is unfortunate that the Ceph-MDS protocol requires having to convert a file descriptor back to a path name - but do you really believe EIO would cause less confusion? ENAMETOOLONG is exactly what happens, even if it's an internal error. But there are many error codes that describe internal errors, so there's some prior art.

EIO just doesn't fit, returning EIO would be confusing - even more so because EIO isn't a documented error code for open().

Fair enough. EIO is just my goto when I don't have a better idea.

Probably you want some weird error code that will make people stand up and take notice. Maybe?

#define ENOSR 63 /* Out of streams resources */

...

If this is about building path names for sending to the MDS, and not for the userspace ABI, maybe the PATH_MAX limitation is wrong here. If Ceph doesn't have such a limitation, the Ceph code shouldn't use the userspace ABI limit for protocol use.

It is possible to build a dir hierarchy that is deeper than can be represented by a PATH_MAX-length string. No reason you can't allocate a bigger buffer there. Does the MDS have a limit on the size of a path string that it'll accept?

...

...
You may want to go with a more generic error code here -- -EIO or something. It might also be worthwhile to leave in a pr_warn_once or something since there may be users confused by this error return.

Users cannot read the kernel log, and this allows users to flood the kernel log. So we get all the disadvantages of the kernel log while our users get none of the advantages.

Sure. I'd make this a pr_warn_once or heavily ratelimit it or something. No need for a lot of these messages, but eventually users are going to wonder why they're getting these weird errors and bug the admins. Giving them this hint might be helpful.

Alternately you could add a conditional tracepoint or something, but those are more obscure.

-- Jeff Layton jlayton@kernel.org

Patrick Donnelly

1:58 p.m.

On Tue, Nov 19, 2024 at 8:02 AM Max Kellermann max.kellermann@ionos.com wrote:

...

On Tue, Nov 19, 2024 at 1:51 PM Jeff Layton jlayton@kernel.org wrote:

...
-ENAMETOOLONG could be problematic there. This function is often called when we have a dentry and need to build a path to it to send to the MDS in a call. The system call that caused us to generate this path probably doesn't involve a pathname itself, so the caller may be confused by an -ENAMETOOLONG return.

It is unfortunate that the Ceph-MDS protocol requires having to convert a file descriptor back to a path name - but do you really believe EIO would cause less confusion? ENAMETOOLONG is exactly what happens, even if it's an internal error. But there are many error codes that describe internal errors, so there's some prior art.

The protocol does **not** require building the full path for most operations unless it involves a snapshot. For snapshots we have to climb the directory tree until we find the directory with the snapshot. e.g.:

$ tree -a foo/ foo/.snap/ foo/bar/baz/.snap/ foo/ └── bar └── baz └── file foo/.snap/ └── 1 └── bar └── baz └── file foo/bar/baz/.snap/ ├── _1_1099511627779 │ └── file └── 2 └── file

If you read "file" via foo/.snap/1 you get:

2024-11-19T13:47:23.523+0000 7f9b3b79b640 1 -- [v2:172.21.10.4:6874/192645635,v1:172.21.10.4:6875/192645635] <== client.4417 172.21.10.4:0/1322260999 43506 ==== client_request(client.4417:121 open #0x10000000003//1/bar/baz/file 2024-11-19T13:47:23.524518+0000 caller_uid=1141, caller_gid=1141{1000,1141,}) ==== 199+0+0 (crc 0 0 0) 0x55acb2d55180 con 0x55acb2b3cc00

and for foo/bar/baz/.snap/2/

2024-11-19T13:47:56.796+0000 7f9b3b79b640 1 -- [v2:172.21.10.4:6874/192645635,v1:172.21.10.4:6875/192645635] <== client.4417 172.21.10.4:0/1322260999 43578 ==== client_request(client.4417:155 open #0x10000000005//2/file 2024-11-19T13:47:56.798370+0000 caller_uid=1141, caller_gid=1141{1000,1141,}) ==== 191+0+0 (crc 0 0 0) 0x55acb2d56000 con 0x55acb2b3cc00

(Note: the MDS protocol indicates a snapshot in the relative file path via double forward slash.)

If you create "file":

2024-11-19T13:56:56.895+0000 7f9b34f8e640 7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.4467:7 create owner_uid=1141, owner_gid=1141 #0x10000000005/file 2024-11-19T13:56:56.890430+0000 caller_uid=1141, caller_gid=1141{1000,1141,})

(During path lookups when planning to read the file, the client will usually get read caps so it doesn't need to formally open the file. So this last example uses a create.)

-- Patrick Donnelly, Ph.D. He / Him / His Red Hat Partner Engineer IBM, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Max Kellermann

2:54 p.m.

On Tue, Nov 19, 2024 at 2:58 PM Patrick Donnelly pdonnell@redhat.com wrote:

...

The protocol does **not** require building the full path for most operations unless it involves a snapshot.

We don't use Ceph snapshots, but before today's emergency update, we could shoot down an arbitrary server with a single (unprivileged) system call using this vulnerability.

I'm not sure what your point is, but this vulnerability exists, it works without snapshots and we think it's serious.

Patrick Donnelly

3:13 p.m.

On Tue, Nov 19, 2024 at 9:54 AM Max Kellermann max.kellermann@ionos.com wrote:

...

On Tue, Nov 19, 2024 at 2:58 PM Patrick Donnelly pdonnell@redhat.com wrote:

...
The protocol does **not** require building the full path for most operations unless it involves a snapshot.

We don't use Ceph snapshots, but before today's emergency update, we could shoot down an arbitrary server with a single (unprivileged) system call using this vulnerability.

I'm not sure what your point is, but this vulnerability exists, it works without snapshots and we think it's serious.

I'm not suggesting there isn't a bug. I'm correcting a misunderstanding.

-- Patrick Donnelly, Ph.D. He / Him / His Red Hat Partner Engineer IBM, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Alex Markuze

21 Nov 21 Nov

10:48 a.m.

IMHO, we should first have a solution for the immediate problem, remove infinite retries and fail early, and cap it at 3 retries in case there is a temporary issue here. I would use ENAMETOOLONG as the primary error code, as it is the most informative, and couple it with a rate-limited kernel log (pr_warn_once) for debugging without flooding. I would also open a bug/feature request for a dynamic buffer allocation that bypasses PATH_MAX for protocol-specific paths.

On Tue, Nov 19, 2024 at 5:17 PM Patrick Donnelly pdonnell@redhat.com wrote:

...

On Tue, Nov 19, 2024 at 9:54 AM Max Kellermann max.kellermann@ionos.com wrote:

...
On Tue, Nov 19, 2024 at 2:58 PM Patrick Donnelly pdonnell@redhat.com wrote:

...
The protocol does **not** require building the full path for most operations unless it involves a snapshot.

We don't use Ceph snapshots, but before today's emergency update, we could shoot down an arbitrary server with a single (unprivileged) system call using this vulnerability.

I'm not sure what your point is, but this vulnerability exists, it works without snapshots and we think it's serious.

I'm not suggesting there isn't a bug. I'm correcting a misunderstanding.

-- Patrick Donnelly, Ph.D. He / Him / His Red Hat Partner Engineer IBM, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Alex Markuze

25 Nov 25 Nov

1:23 p.m.

Hey, Folks. This seems important, so I'm bumping this thread. Max has a valid concern, and this issue must be addressed. Jeff seems to think keeping at least a few retries might be a good idea.

Max, could you add a cap on the retry count to your original patch? I will discuss the PATH_MAX with Patrick and open a feature request for myself to alleviate the limitation.

On Thu, Nov 21, 2024 at 12:48 PM Alex Markuze amarkuze@redhat.com wrote:

...

IMHO, we should first have a solution for the immediate problem, remove infinite retries and fail early, and cap it at 3 retries in case there is a temporary issue here. I would use ENAMETOOLONG as the primary error code, as it is the most informative, and couple it with a rate-limited kernel log (pr_warn_once) for debugging without flooding. I would also open a bug/feature request for a dynamic buffer allocation that bypasses PATH_MAX for protocol-specific paths.

On Tue, Nov 19, 2024 at 5:17 PM Patrick Donnelly pdonnell@redhat.com wrote:

...
On Tue, Nov 19, 2024 at 9:54 AM Max Kellermann max.kellermann@ionos.com wrote:

...
On Tue, Nov 19, 2024 at 2:58 PM Patrick Donnelly pdonnell@redhat.com wrote:

...
The protocol does **not** require building the full path for most operations unless it involves a snapshot.

We don't use Ceph snapshots, but before today's emergency update, we could shoot down an arbitrary server with a single (unprivileged) system call using this vulnerability.

I'm not sure what your point is, but this vulnerability exists, it works without snapshots and we think it's serious.

I'm not suggesting there isn't a bug. I'm correcting a misunderstanding.

-- Patrick Donnelly, Ph.D. He / Him / His Red Hat Partner Engineer IBM, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

Max Kellermann

1:59 p.m.

On Mon, Nov 25, 2024 at 2:24 PM Alex Markuze amarkuze@redhat.com wrote:

...

Max, could you add a cap on the retry count to your original patch?

Before I wrote code that's not useful at all: I don't quite get why retry on buffer overflow is necessary at all. It looks like it once seemed to be a useful kludge, but then 1b71fe2efa31 ("ceph analog of cifs build_path_from_dentry() race fix") added the read_seqretry() check which, to my limited understanding, is a more robust implementation of rename detection.

Max

Alex Markuze

2:32 p.m.

You and Illia agree on this point. I'll wait for replies and take your original patch into the testing branch unless any concerns are raised.

On Mon, Nov 25, 2024 at 3:59 PM Max Kellermann max.kellermann@ionos.com wrote:

...

On Mon, Nov 25, 2024 at 2:24 PM Alex Markuze amarkuze@redhat.com wrote:

...
Max, could you add a cap on the retry count to your original patch?

Before I wrote code that's not useful at all: I don't quite get why retry on buffer overflow is necessary at all. It looks like it once seemed to be a useful kludge, but then 1b71fe2efa31 ("ceph analog of cifs build_path_from_dentry() race fix") added the read_seqretry() check which, to my limited understanding, is a more robust implementation of rename detection.

Max

Max Kellermann

3 Dec 3 Dec

12:20 p.m.

On Mon, Nov 25, 2024 at 3:33 PM Alex Markuze amarkuze@redhat.com wrote:

...

You and Illia agree on this point. I'll wait for replies and take your original patch into the testing branch unless any concerns are raised.

How long will you wait? It's been more than two weeks since I reported this DoS vulnerability.

Alex Markuze

4 Dec 4 Dec

12:51 p.m.

It's already in a testing branch; what branch are you working on?

On Tue, Dec 3, 2024 at 2:21 PM Max Kellermann max.kellermann@ionos.com wrote:

...

On Mon, Nov 25, 2024 at 3:33 PM Alex Markuze amarkuze@redhat.com wrote:

...
You and Illia agree on this point. I'll wait for replies and take your original patch into the testing branch unless any concerns are raised.

How long will you wait? It's been more than two weeks since I reported this DoS vulnerability.

Max Kellermann

1:05 p.m.

On Wed, Dec 4, 2024 at 1:51 PM Alex Markuze amarkuze@redhat.com wrote:

...

It's already in a testing branch; what branch are you working on?

It is? Which one? I checked https://github.com/ceph/ceph-client/commits/testing/ and it's not there. This is the repository mentioned on https://docs.ceph.com/en/latest/dev/developer_guide/testing_integration_test...

Max Kellermann

5 Dec 5 Dec

8:24 a.m.

On Wed, Dec 4, 2024 at 1:51 PM Alex Markuze amarkuze@redhat.com wrote:

...

It's already in a testing branch; what branch are you working on?

I found this on branch "wip-shirnk-crash": https://github.com/ceph/ceph-client/commit/6cdec9f931e38980eb007d9704c5a2453... - did you mean this branch?

This is my patch; but you removed the commit message, removed the explanation I wrote from the code comment, left the (useless and confusing) log message in, and then claimed authorship for my work.

Alex Markuze

8 Dec 8 Dec

10:16 a.m.

Illya, this patch is tested and it has my review by.

On Thu, Dec 5, 2024 at 10:24 AM Max Kellermann max.kellermann@ionos.com wrote:

...

On Wed, Dec 4, 2024 at 1:51 PM Alex Markuze amarkuze@redhat.com wrote:

...
It's already in a testing branch; what branch are you working on?

I found this on branch "wip-shirnk-crash": https://github.com/ceph/ceph-client/commit/6cdec9f931e38980eb007d9704c5a2453...

did you mean this branch?

This is my patch; but you removed the commit message, removed the explanation I wrote from the code comment, left the (useless and confusing) log message in, and then claimed authorship for my work.

Ilya Dryomov

10 Dec 10 Dec

8:42 a.m.

On Sun, Dec 8, 2024 at 11:17 AM Alex Markuze amarkuze@redhat.com wrote:

...

Illya, this patch is tested and it has my review by.

Max's original patch has been applied (with the authorship, commit message, etc preserved).

Thanks,

Ilya

...

On Thu, Dec 5, 2024 at 10:24 AM Max Kellermann max.kellermann@ionos.com wrote:

...
On Wed, Dec 4, 2024 at 1:51 PM Alex Markuze amarkuze@redhat.com wrote:

...
It's already in a testing branch; what branch are you working on?

I found this on branch "wip-shirnk-crash": https://github.com/ceph/ceph-client/commit/6cdec9f931e38980eb007d9704c5a2453...

did you mean this branch?

This is my patch; but you removed the commit message, removed the explanation I wrote from the code comment, left the (useless and confusing) log message in, and then claimed authorship for my work.

384

days inactive

406

days old

linux-stable-mirror@lists.linaro.org

17 comments

participants

tags (0)

participants (5)

Alex Markuze
Ilya Dryomov
Jeff Layton
Max Kellermann
Patrick Donnelly