[PATCHv5 0/7] tty: Hold write ldisc sem in tty_reopen()

List overview All Threads
Download

newer

older

[git:media_tree/master] media:...

[PATCH] fs: fix possible Spectre...

Dmitry Safonov

17 Sep 2018 17 Sep '18

11:52 p.m.

Hi all,

Three fixes that worth to have in the @stable, as they were hit by different people, including Arista on v4.9 stable.

And for linux-next - adding lockdep asserts for line discipline changing code, verifying that write ldisc sem will be held forthwith.

The last patch is an optional and probably, timeout can be dropped for read_lock(). I'll do it if everyone agrees. (Or as per discussion with Peter in v3, just convert ldisc to a regular rwsem).

Thanks, Dima

Changes since v4: - back to lock ldisc with (5*HZ) timeout in tty_reopen() (LKP report link: lkml.kernel.org/r/1536940609.3185.29.camel@arista.com) - reordered 3/7 with 2/7 for LKP robot

Changes since v3: - Added tested-by Mark Rutland (thanks!) - Dropped patch with smp_wmb() - wrong idea - lockdep_assert_held() should be actually lockdep_assert_held_exclusive() - Described why tty_ldisc_open() can be called without ldisc_sem held for pty slave end (o_tty). - Added Peter's patch for dropping self-made lockdep annotations - Fix for a reader(s) of ldisc semaphore waiting for an active reader(s)

Changes since v2: - Added reviewed-by tags - Hopefully, fixed reported by 0-day issue. - Added optional fix for wait_readers decrement

Changes since v1: - Added tested-by/reported-by tags - Dropped 3/4 (locking tty pair for lockdep sake), Because of that - not adding lockdep_assert_held() in tty_ldisc_open() - Added 4/4 cleanup to inc tty->count only on success of tty_ldisc_reinit() - lock ldisc without (5*HZ) timeout in tty_reopen()

v1 link: lkml.kernel.org/r/20180829022353.23568-1-dima@arista.com v2 link: lkml.kernel.org/r/20180903165257.29227-1-dima@arista.com v3 link: lkml.kernel.org/r/20180911014821.26286-1-dima@arista.com v4 link: lkml.kernel.org/r/20180912001702.18522-1-dima@arista.com

Cc: Daniel Axtens dja@axtens.net Cc: Dmitry Vyukov dvyukov@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Michael Neuling mikey@neuling.org Cc: Mikulas Patocka mpatocka@redhat.com Cc: Nathan March nathan@gt.net Cc: Pasi Kärkkäinen pasik@iki.fi Cc: Peter Hurley peter@hurleysoftware.com Cc: Peter Zijlstra peterz@infradead.org Cc: "Rong, Chen" rong.a.chen@intel.com Cc: Sergey Senozhatsky sergey.senozhatsky.work@gmail.com Cc: Tan Xiaojun tanxiaojun@huawei.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp (please, ignore if I Cc'ed you mistakenly)

Dmitry Safonov (6): tty: Drop tty->count on tty_reopen() failure tty/ldsem: Wake up readers after timed out down_write() tty: Hold tty_ldisc_lock() during tty_reopen() tty: Simplify tty->count math in tty_reopen() tty/ldsem: Add lockdep asserts for ldisc_sem tty/ldsem: Decrement wait_readers on timeouted down_read()

Peter Zijlstra (1): tty/ldsem: Convert to regular lockdep annotations

drivers/tty/tty_io.c | 13 ++++++++--- drivers/tty/tty_ldisc.c | 9 +++++++ drivers/tty/tty_ldsem.c | 62 ++++++++++++++++++++----------------------------- 3 files changed, 44 insertions(+), 40 deletions(-)

-- 2.13.6

Show replies by date

Dmitry Safonov

17 Sep 17 Sep

11:52 p.m.

New subject: [PATCHv5 1/7] tty: Drop tty->count on tty_reopen() failure

In case of tty_ldisc_reinit() failure, tty->count should be decremented back, otherwise we will never release_tty(). Tetsuo reported that it fixes noisy warnings on tty release like: pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Cc: stable@vger.kernel.org # v4.6+ Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jslaby@suse.com Reviewed-by: Jiri Slaby jslaby@suse.cz Tested-by: Jiri Slaby jslaby@suse.com Tested-by: Mark Rutland mark.rutland@arm.com Tested-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Signed-off-by: Dmitry Safonov dima@arista.com --- drivers/tty/tty_io.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 32bc3e3fe4d3..5e5da9acaf0a 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1255,6 +1255,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct * static int tty_reopen(struct tty_struct *tty) { struct tty_driver *driver = tty->driver; + int retval;

if (driver->type == TTY_DRIVER_TYPE_PTY && driver->subtype == PTY_TYPE_MASTER) @@ -1268,10 +1269,14 @@ static int tty_reopen(struct tty_struct *tty)

tty->count++;

- if (!tty->ldisc) - return tty_ldisc_reinit(tty, tty->termios.c_line); + if (tty->ldisc) + return 0;

- return 0; + retval = tty_ldisc_reinit(tty, tty->termios.c_line); + if (retval) + tty->count--; + + return retval; }

/**

-- 2.13.6

Greg Kroah-Hartman

18 Sep 18 Sep

1:41 p.m.

New subject: [PATCHv5 1/7] tty: Drop tty->count on tty_reopen() failure

On Tue, Sep 18, 2018 at 12:52:52AM +0100, Dmitry Safonov wrote:

...

In case of tty_ldisc_reinit() failure, tty->count should be decremented back, otherwise we will never release_tty(). Tetsuo reported that it fixes noisy warnings on tty release like: pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Minor nit, no need to have "commit" in this line, it's implied.

Also, no need to put an extra line here either.

I'll queue this patch up now, have some question about the rest...

thanks,

greg k-h

Dmitry Safonov

2:15 p.m.

New subject: [PATCHv5 1/7] tty: Drop tty->count on tty_reopen() failure

On Tue, 2018-09-18 at 15:41 +0200, Greg Kroah-Hartman wrote:

...

On Tue, Sep 18, 2018 at 12:52:52AM +0100, Dmitry Safonov wrote:

...
In case of tty_ldisc_reinit() failure, tty->count should be decremented back, otherwise we will never release_tty(). Tetsuo reported that it fixes noisy warnings on tty release like: pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Minor nit, no need to have "commit" in this line, it's implied.

Also, no need to put an extra line here either.

I'll queue this patch up now, have some question about the rest...

Thanks, Greg, will have in mind further.

-- Dima

Dmitry Safonov

17 Sep 17 Sep

11:52 p.m.

New subject: [PATCHv5 3/7] tty: Hold tty_ldisc_lock() during tty_reopen()

tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup() nor set_ldisc() nor tty_ldisc_release() as they use tty lock. But it races with anyone who expects line discipline to be the same after hoding read semaphore in tty_ldisc_ref().

We've seen the following crash on v4.9.108 stable:

BUG: unable to handle kernel paging request at 0000000000002260 IP: [..] n_tty_receive_buf_common+0x5f/0x86d Workqueue: events_unbound flush_to_ldisc Call Trace: [..] n_tty_receive_buf2 [..] tty_ldisc_receive_buf [..] flush_to_ldisc [..] process_one_work [..] worker_thread [..] kthread [..] ret_from_fork

tty_ldisc_reinit() should be called with ldisc_sem hold for writing, which will protect any reader against line discipline changes.

Backport-first: b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf") Cc: stable@vger.kernel.org

Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jslaby@suse.com Reviewed-by: Jiri Slaby jslaby@suse.cz Reported-by: syzbot+3aa9784721dfb90e984d@syzkaller.appspotmail.com Tested-by: Mark Rutland mark.rutland@arm.com Tested-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Signed-off-by: Dmitry Safonov dima@arista.com --- drivers/tty/tty_io.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 5e5da9acaf0a..3ef8b977b167 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty) if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN)) return -EBUSY;

- tty->count++; + retval = tty_ldisc_lock(tty, 5 * HZ); + if (retval) + return retval;

+ tty->count++; if (tty->ldisc) - return 0; + goto out_unlock;

retval = tty_ldisc_reinit(tty, tty->termios.c_line); if (retval) tty->count--;

+out_unlock: + tty_ldisc_unlock(tty); return retval; }

-- 2.13.6

Greg Kroah-Hartman

18 Sep 18 Sep

1:47 p.m.

New subject: [PATCHv5 3/7] tty: Hold tty_ldisc_lock() during tty_reopen()

On Tue, Sep 18, 2018 at 12:52:54AM +0100, Dmitry Safonov wrote:

...

tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup() nor set_ldisc() nor tty_ldisc_release() as they use tty lock. But it races with anyone who expects line discipline to be the same after hoding read semaphore in tty_ldisc_ref().

We've seen the following crash on v4.9.108 stable:

BUG: unable to handle kernel paging request at 0000000000002260 IP: [..] n_tty_receive_buf_common+0x5f/0x86d Workqueue: events_unbound flush_to_ldisc Call Trace: [..] n_tty_receive_buf2 [..] tty_ldisc_receive_buf [..] flush_to_ldisc [..] process_one_work [..] worker_thread [..] kthread [..] ret_from_fork

tty_ldisc_reinit() should be called with ldisc_sem hold for writing, which will protect any reader against line discipline changes.

Backport-first: b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf")

What does this mean?

Does this require that patch for a stable patch? If so, just do: Cc: stable@vger.kernel.org # b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf") in the signed-off-by area. The stable documentation should describe this pretty well. If not, we can modify it to make it more obvious.

can you fix this up for the next resend of this series?

thanks,

greg k-h

Dmitry Safonov

2:19 p.m.

New subject: [PATCHv5 3/7] tty: Hold tty_ldisc_lock() during tty_reopen()

On Tue, 2018-09-18 at 15:47 +0200, Greg Kroah-Hartman wrote:

...

On Tue, Sep 18, 2018 at 12:52:54AM +0100, Dmitry Safonov wrote:

...
tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup() nor set_ldisc() nor tty_ldisc_release() as they use tty lock. But it races with anyone who expects line discipline to be the same after hoding read semaphore in tty_ldisc_ref().

We've seen the following crash on v4.9.108 stable:

BUG: unable to handle kernel paging request at 0000000000002260 IP: [..] n_tty_receive_buf_common+0x5f/0x86d Workqueue: events_unbound flush_to_ldisc Call Trace: [..] n_tty_receive_buf2 [..] tty_ldisc_receive_buf [..] flush_to_ldisc [..] process_one_work [..] worker_thread [..] kthread [..] ret_from_fork

tty_ldisc_reinit() should be called with ldisc_sem hold for writing, which will protect any reader against line discipline changes.

Backport-first: b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf")

What does this mean?

Does this require that patch for a stable patch? If so, just do: Cc: stable@vger.kernel.org # b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf") in the signed-off-by area. The stable documentation should describe this pretty well. If not, we can modify it to make it more obvious.

can you fix this up for the next resend of this series?

Sure, sorry about that non-obvious tag.

-- Thanks, Dmitry

Mikulas Patocka

19 Sep 19 Sep

4:56 p.m.

On Tue, 18 Sep 2018, Dmitry Safonov wrote:

...

Hi all,

Three fixes that worth to have in the @stable, as they were hit by different people, including Arista on v4.9 stable.

And for linux-next - adding lockdep asserts for line discipline changing code, verifying that write ldisc sem will be held forthwith.

The last patch is an optional and probably, timeout can be dropped for read_lock(). I'll do it if everyone agrees. (Or as per discussion with Peter in v3, just convert ldisc to a regular rwsem).

Thanks, Dima

I confirm that this patch series fixes the crash for me.

Tested-by: Mikulas Patocka mpatocka@redhat.com

...

Changes since v4:

back to lock ldisc with (5*HZ) timeout in tty_reopen() (LKP report link: lkml.kernel.org/r/1536940609.3185.29.camel@arista.com)

reordered 3/7 with 2/7 for LKP robot

Changes since v3:

Added tested-by Mark Rutland (thanks!)

Dropped patch with smp_wmb() - wrong idea

lockdep_assert_held() should be actually lockdep_assert_held_exclusive()

Described why tty_ldisc_open() can be called without ldisc_sem held for pty slave end (o_tty).

Added Peter's patch for dropping self-made lockdep annotations

Fix for a reader(s) of ldisc semaphore waiting for an active reader(s)

Changes since v2:

Added reviewed-by tags

Hopefully, fixed reported by 0-day issue.

Added optional fix for wait_readers decrement

Changes since v1:

Added tested-by/reported-by tags

Dropped 3/4 (locking tty pair for lockdep sake), Because of that - not adding lockdep_assert_held() in tty_ldisc_open()

Added 4/4 cleanup to inc tty->count only on success of tty_ldisc_reinit()

lock ldisc without (5*HZ) timeout in tty_reopen()

v1 link: lkml.kernel.org/r/20180829022353.23568-1-dima@arista.com v2 link: lkml.kernel.org/r/20180903165257.29227-1-dima@arista.com v3 link: lkml.kernel.org/r/20180911014821.26286-1-dima@arista.com v4 link: lkml.kernel.org/r/20180912001702.18522-1-dima@arista.com

Cc: Daniel Axtens dja@axtens.net Cc: Dmitry Vyukov dvyukov@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Michael Neuling mikey@neuling.org Cc: Mikulas Patocka mpatocka@redhat.com Cc: Nathan March nathan@gt.net Cc: Pasi Kärkkäinen pasik@iki.fi Cc: Peter Hurley peter@hurleysoftware.com Cc: Peter Zijlstra peterz@infradead.org Cc: "Rong, Chen" rong.a.chen@intel.com Cc: Sergey Senozhatsky sergey.senozhatsky.work@gmail.com Cc: Tan Xiaojun tanxiaojun@huawei.com Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp (please, ignore if I Cc'ed you mistakenly)

Dmitry Safonov (6): tty: Drop tty->count on tty_reopen() failure tty/ldsem: Wake up readers after timed out down_write() tty: Hold tty_ldisc_lock() during tty_reopen() tty: Simplify tty->count math in tty_reopen() tty/ldsem: Add lockdep asserts for ldisc_sem tty/ldsem: Decrement wait_readers on timeouted down_read()

Peter Zijlstra (1): tty/ldsem: Convert to regular lockdep annotations

drivers/tty/tty_io.c | 13 ++++++++--- drivers/tty/tty_ldisc.c | 9 +++++++ drivers/tty/tty_ldsem.c | 62 ++++++++++++++++++++----------------------------- 3 files changed, 44 insertions(+), 40 deletions(-)

-- 2.13.6

Mikulas Patocka

5:35 p.m.

On Wed, 19 Sep 2018, Mikulas Patocka wrote:

...

On Tue, 18 Sep 2018, Dmitry Safonov wrote:

...
Hi all,

Three fixes that worth to have in the @stable, as they were hit by different people, including Arista on v4.9 stable.

And for linux-next - adding lockdep asserts for line discipline changing code, verifying that write ldisc sem will be held forthwith.

The last patch is an optional and probably, timeout can be dropped for read_lock(). I'll do it if everyone agrees. (Or as per discussion with Peter in v3, just convert ldisc to a regular rwsem).

Thanks, Dima

I confirm that this patch series fixes the crash for me.

Tested-by: Mikulas Patocka mpatocka@redhat.com

I was too quick to acknowledge this patchset. It doesn't work.

This patchset fixes the crash, but it introduces another bug - when I type 'reboot' on the console, it prints 'INIT: Switching to runlevel: 6' and then it gets stuck for 80 seconds before proceeding with the reboot. When I revert this patchset 'reboot' reboots the machine without any delay. This bug was reproduced on Debian 5 userspace on pa-risc.

Mikulas

Dmitry Safonov

5:45 p.m.

On Wed, 2018-09-19 at 13:35 -0400, Mikulas Patocka wrote:

...

On Wed, 19 Sep 2018, Mikulas Patocka wrote:

...
On Tue, 18 Sep 2018, Dmitry Safonov wrote:

...
Hi all,

Three fixes that worth to have in the @stable, as they were hit

by

...
...
different people, including Arista on v4.9 stable.

And for linux-next - adding lockdep asserts for line discipline

changing

...
...
code, verifying that write ldisc sem will be held forthwith.

The last patch is an optional and probably, timeout can be

dropped for

...
...
read_lock(). I'll do it if everyone agrees. (Or as per discussion with Peter in v3, just convert ldisc to a regular rwsem).

Thanks, Dima

I confirm that this patch series fixes the crash for me.

Tested-by: Mikulas Patocka mpatocka@redhat.com

I was too quick to acknowledge this patchset. It doesn't work.

This patchset fixes the crash, but it introduces another bug - when I type 'reboot' on the console, it prints 'INIT: Switching to runlevel: 6' and then it gets stuck for 80 seconds before proceeding with the reboot. When I revert this patchset 'reboot' reboots the machine without any delay. This bug was reproduced on Debian 5 userspace on pa-risc.

Thanks much for the testing, Mikulas. Could you try to bisect which of the patches causes it? The most important are 1,2,3 - probably, one of them caused it.. I'll stare a bit into the code.

-- Thanks, Dmitry

Mikulas Patocka

8:03 p.m.

On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...

On Wed, 2018-09-19 at 13:35 -0400, Mikulas Patocka wrote:

...
On Wed, 19 Sep 2018, Mikulas Patocka wrote:

...
On Tue, 18 Sep 2018, Dmitry Safonov wrote:

...
Hi all,

Three fixes that worth to have in the @stable, as they were hit

by

...
...
different people, including Arista on v4.9 stable.

And for linux-next - adding lockdep asserts for line discipline

changing

...
...
code, verifying that write ldisc sem will be held forthwith.

The last patch is an optional and probably, timeout can be

dropped for

...
...
read_lock(). I'll do it if everyone agrees. (Or as per discussion with Peter in v3, just convert ldisc to a regular rwsem).

Thanks, Dima

I confirm that this patch series fixes the crash for me.

Tested-by: Mikulas Patocka mpatocka@redhat.com

I was too quick to acknowledge this patchset. It doesn't work.

This patchset fixes the crash, but it introduces another bug - when I type 'reboot' on the console, it prints 'INIT: Switching to runlevel: 6' and then it gets stuck for 80 seconds before proceeding with the reboot. When I revert this patchset 'reboot' reboots the machine without any delay. This bug was reproduced on Debian 5 userspace on pa-risc.

Thanks much for the testing, Mikulas. Could you try to bisect which of the patches causes it? The most important are 1,2,3 - probably, one of them caused it.. I'll stare a bit into the code.

The patch 3 causes it.

The hangs during reboot take either 80 seconds, 3 minutes or 3:25.

Mikulas

Dmitry Safonov

8:21 p.m.

On Wed, 2018-09-19 at 16:03 -0400, Mikulas Patocka wrote:

...

On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...
Thanks much for the testing, Mikulas. Could you try to bisect which of the patches causes it? The most important are 1,2,3 - probably, one of them caused it.. I'll stare a bit into the code.

The patch 3 causes it.

The hangs during reboot take either 80 seconds, 3 minutes or 3:25.

Thanks much for that again. Any chance you have sysrq enabled and could print: locks held in system and kernel backtraces for applications?

I suppose, userspace hangs in n_tty_receive_buf_common(), which is better of cause than null-ptr dereference, but I believe we can improve it by stopping a reader earlier if there is any writer pending.

-- Thanks, Dmitry

Mikulas Patocka

20 Sep 20 Sep

2:25 p.m.

On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...

On Wed, 2018-09-19 at 16:03 -0400, Mikulas Patocka wrote:

...
On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...
Thanks much for the testing, Mikulas. Could you try to bisect which of the patches causes it? The most important are 1,2,3 - probably, one of them caused it.. I'll stare a bit into the code.

The patch 3 causes it.

The hangs during reboot take either 80 seconds, 3 minutes or 3:25.

Thanks much for that again. Any chance you have sysrq enabled and could print: locks held in system and kernel backtraces for applications?

Stacktrace doesn't work on parisc. It used to work on 4.18, but the kernel 4.19 reportedly requires patched binutils.

Mikulas

...

I suppose, userspace hangs in n_tty_receive_buf_common(), which is better of cause than null-ptr dereference, but I believe we can improve it by stopping a reader earlier if there is any writer pending.

-- Thanks, Dmitry

Pasi Kärkkäinen

15 Oct 15 Oct

1:37 p.m.

Hi,

On Thu, Sep 20, 2018 at 10:25:01AM -0400, Mikulas Patocka wrote:

...

On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...
On Wed, 2018-09-19 at 16:03 -0400, Mikulas Patocka wrote:

...
On Wed, 19 Sep 2018, Dmitry Safonov wrote:

...
Thanks much for the testing, Mikulas. Could you try to bisect which of the patches causes it? The most important are 1,2,3 - probably, one of them caused it.. I'll stare a bit into the code.

The patch 3 causes it.

The hangs during reboot take either 80 seconds, 3 minutes or 3:25.

Thanks much for that again. Any chance you have sysrq enabled and could print: locks held in system and kernel backtraces for applications?

Stacktrace doesn't work on parisc. It used to work on 4.18, but the kernel 4.19 reportedly requires patched binutils.

Mikulas

...
I suppose, userspace hangs in n_tty_receive_buf_common(), which is better of cause than null-ptr dereference, but I believe we can improve it by stopping a reader earlier if there is any writer pending.

-- Thanks, Dmitry

Any progress here? Or is more info/stacktrace needed about the hang?

Thanks,

-- Pasi

2624

days inactive

2652

days old

linux-stable-mirror@lists.linaro.org

13 comments

participants

tags (0)

participants (4)

Dmitry Safonov
Greg Kroah-Hartman
Mikulas Patocka
Pasi Kärkkäinen