- Linux-stable-mirror - lists.linaro.org

[Linux-stable-mirror] FAILED: patch "[PATCH] fscrypt: lock mutex before checking for bounce page pool" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Sun, 29 Oct 2017 06:30:19 -0400 Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool fscrypt_initialize(), which allocates the global bounce page pool when an encrypted file is first accessed, uses "double-checked locking" to try to avoid locking fscrypt_init_mutex. However, it doesn't use any memory barriers, so it's theoretically possible for a thread to observe a bounce page pool which has not been fully initialized. This is a classic bug with "double-checked locking". While "only a theoretical issue" in the latest kernel, in pre-4.8 kernels the pointer that was checked was not even the last to be initialized, so it was easily possible for a crash (NULL pointer dereference) to happen. This was changed only incidentally by the large refactor to use fs/crypto/. Solve both problems in a trivial way that can easily be backported: just always take the mutex. It's theoretically less efficient, but it shouldn't be noticeable in practice as the mutex is only acquired very briefly once per encrypted file. Later I'd like to make this use a helper macro like DO_ONCE(). However, DO_ONCE() runs in atomic context, so we'd need to add a new macro that allows blocking. Cc: stable(a)vger.kernel.org # v4.1+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 608f6bbe0f31..472326737717 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags) { int i, res = -ENOMEM; - /* - * No need to allocate a bounce page pool if there already is one or - * this FS won't use it. - */ - if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool) + /* No need to allocate a bounce page pool if this FS won't use it. */ + if (cop_flags & FS_CFLG_OWN_PAGES) return 0; mutex_lock(&fscrypt_init_mutex);

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] fscrypt: lock mutex before checking for bounce page pool" failed to apply to 4.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Sun, 29 Oct 2017 06:30:19 -0400 Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool fscrypt_initialize(), which allocates the global bounce page pool when an encrypted file is first accessed, uses "double-checked locking" to try to avoid locking fscrypt_init_mutex. However, it doesn't use any memory barriers, so it's theoretically possible for a thread to observe a bounce page pool which has not been fully initialized. This is a classic bug with "double-checked locking". While "only a theoretical issue" in the latest kernel, in pre-4.8 kernels the pointer that was checked was not even the last to be initialized, so it was easily possible for a crash (NULL pointer dereference) to happen. This was changed only incidentally by the large refactor to use fs/crypto/. Solve both problems in a trivial way that can easily be backported: just always take the mutex. It's theoretically less efficient, but it shouldn't be noticeable in practice as the mutex is only acquired very briefly once per encrypted file. Later I'd like to make this use a helper macro like DO_ONCE(). However, DO_ONCE() runs in atomic context, so we'd need to add a new macro that allows blocking. Cc: stable(a)vger.kernel.org # v4.1+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 608f6bbe0f31..472326737717 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags) { int i, res = -ENOMEM; - /* - * No need to allocate a bounce page pool if there already is one or - * this FS won't use it. - */ - if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool) + /* No need to allocate a bounce page pool if this FS won't use it. */ + if (cop_flags & FS_CFLG_OWN_PAGES) return 0; mutex_lock(&fscrypt_init_mutex);

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] fscrypt: lock mutex before checking for bounce page pool" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a0b3bc855374c50b5ea85273553485af48caf2f7 Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Sun, 29 Oct 2017 06:30:19 -0400 Subject: [PATCH] fscrypt: lock mutex before checking for bounce page pool fscrypt_initialize(), which allocates the global bounce page pool when an encrypted file is first accessed, uses "double-checked locking" to try to avoid locking fscrypt_init_mutex. However, it doesn't use any memory barriers, so it's theoretically possible for a thread to observe a bounce page pool which has not been fully initialized. This is a classic bug with "double-checked locking". While "only a theoretical issue" in the latest kernel, in pre-4.8 kernels the pointer that was checked was not even the last to be initialized, so it was easily possible for a crash (NULL pointer dereference) to happen. This was changed only incidentally by the large refactor to use fs/crypto/. Solve both problems in a trivial way that can easily be backported: just always take the mutex. It's theoretically less efficient, but it shouldn't be noticeable in practice as the mutex is only acquired very briefly once per encrypted file. Later I'd like to make this use a helper macro like DO_ONCE(). However, DO_ONCE() runs in atomic context, so we'd need to add a new macro that allows blocking. Cc: stable(a)vger.kernel.org # v4.1+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Theodore Ts'o <tytso(a)mit.edu> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 608f6bbe0f31..472326737717 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -410,11 +410,8 @@ int fscrypt_initialize(unsigned int cop_flags) { int i, res = -ENOMEM; - /* - * No need to allocate a bounce page pool if there already is one or - * this FS won't use it. - */ - if (cop_flags & FS_CFLG_OWN_PAGES || fscrypt_bounce_page_pool) + /* No need to allocate a bounce page pool if this FS won't use it. */ + if (cop_flags & FS_CFLG_OWN_PAGES) return 0; mutex_lock(&fscrypt_init_mutex);

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] mm/z3fold.c: use kref to prevent page free/compact race" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5d03a6613957785e94af7a4a6212ad4af66aa5c2 Mon Sep 17 00:00:00 2001 From: Vitaly Wool <vitalywool(a)gmail.com> Date: Fri, 17 Nov 2017 15:26:16 -0800 Subject: [PATCH] mm/z3fold.c: use kref to prevent page free/compact race There is a race in the current z3fold implementation between do_compact() called in a work queue context and the page release procedure when page's kref goes to 0. do_compact() may be waiting for page lock, which is released by release_z3fold_page_locked right before putting the page onto the "stale" list, and then the page may be freed as do_compact() modifies its contents. The mechanism currently implemented to handle that (checking the PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref counter to guarantee that the page is not released if its compaction is scheduled. It then becomes compaction function's responsibility to decrease the counter and quit immediately if the page was actually freed. Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com Signed-off-by: Vitaly Wool <vitaly.wool(a)sonymobile.com> Cc: <Oleksiy.Avramchenko(a)sony.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> diff --git a/mm/z3fold.c b/mm/z3fold.c index b2ba2ba585f3..39e19125d6a0 100644 --- a/mm/z3fold.c +++ b/mm/z3fold.c @@ -404,8 +404,7 @@ static void do_compact_page(struct z3fold_header *zhdr, bool locked) WARN_ON(z3fold_page_trylock(zhdr)); else z3fold_page_lock(zhdr); - if (test_bit(PAGE_STALE, &page->private) || - !test_and_clear_bit(NEEDS_COMPACTING, &page->private)) { + if (WARN_ON(!test_and_clear_bit(NEEDS_COMPACTING, &page->private))) { z3fold_page_unlock(zhdr); return; } @@ -413,6 +412,11 @@ static void do_compact_page(struct z3fold_header *zhdr, bool locked) list_del_init(&zhdr->buddy); spin_unlock(&pool->lock); + if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) { + atomic64_dec(&pool->pages_nr); + return; + } + z3fold_compact_page(zhdr); unbuddied = get_cpu_ptr(pool->unbuddied); fchunks = num_free_chunks(zhdr); @@ -753,9 +757,11 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle) list_del_init(&zhdr->buddy); spin_unlock(&pool->lock); zhdr->cpu = -1; + kref_get(&zhdr->refcount); do_compact_page(zhdr, true); return; } + kref_get(&zhdr->refcount); queue_work_on(zhdr->cpu, pool->compact_wq, &zhdr->work); z3fold_page_unlock(zhdr); }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: math-emu: Fix final emulation phase for certain" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 409fcace9963c1e8d2cb0f7ac62e8b34d47ef979 Mon Sep 17 00:00:00 2001 From: Aleksandar Markovic <aleksandar.markovic(a)mips.com> Date: Thu, 2 Nov 2017 12:13:58 +0100 Subject: [PATCH] MIPS: math-emu: Fix final emulation phase for certain instructions Fix final phase of <CLASS|MADDF|MSUBF|MAX|MIN|MAXA|MINA>.<D|S> emulation. Provide proper generation of SIGFPE signal and updating debugfs FP exception stats in cases of any exception flags set in preceding phases of emulation. CLASS.<D|S> instruction may generate "Unimplemented Operation" FP exception. <MADDF|MSUBF>.<D|S> instructions may generate "Inexact", "Unimplemented Operation", "Invalid Operation", "Overflow", and "Underflow" FP exceptions. <MAX|MIN|MAXA|MINA>.<D|S> instructions can generate "Unimplemented Operation" and "Invalid Operation" FP exceptions. The proper final processing of the cases when any FP exception flag is set is achieved by replacing "break" statement with "goto copcsr" statement. With such solution, this patch brings the final phase of emulation of the above instructions consistent with the one corresponding to the previously implemented emulation of other related FPU instructions (ADD, SUB, etc.). Fixes: 38db37ba069f ("MIPS: math-emu: Add support for the MIPS R6 CLASS FPU instruction") Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Fixes: a79f5f9ba508 ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction") Fixes: 4e9561b20e2f ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction") Signed-off-by: Aleksandar Markovic <aleksandar.markovic(a)mips.com> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Douglas Leung <douglas.leung(a)mips.com> Cc: Goran Ferenc <goran.ferenc(a)mips.com> Cc: "Maciej W. Rozycki" <macro(a)imgtec.com> Cc: Miodrag Dinic <miodrag.dinic(a)mips.com> Cc: Paul Burton <paul.burton(a)mips.com> Cc: Petar Jovanovic <petar.jovanovic(a)mips.com> Cc: Raghu Gandham <raghu.gandham(a)mips.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.3+ Patchwork: https://patchwork.linux-mips.org/patch/17581/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/math-emu/cp1emu.c b/arch/mips/math-emu/cp1emu.c index 192542dbd972..d2fcb3084279 100644 --- a/arch/mips/math-emu/cp1emu.c +++ b/arch/mips/math-emu/cp1emu.c @@ -1795,7 +1795,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(fs, MIPSInst_FS(ir)); SPFROMREG(fd, MIPSInst_FD(ir)); rv.s = ieee754sp_maddf(fd, fs, ft); - break; + goto copcsr; } case fmsubf_op: { @@ -1809,7 +1809,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(fs, MIPSInst_FS(ir)); SPFROMREG(fd, MIPSInst_FD(ir)); rv.s = ieee754sp_msubf(fd, fs, ft); - break; + goto copcsr; } case frint_op: { @@ -1834,7 +1834,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(fs, MIPSInst_FS(ir)); rv.w = ieee754sp_2008class(fs); rfmt = w_fmt; - break; + goto copcsr; } case fmin_op: { @@ -1847,7 +1847,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(ft, MIPSInst_FT(ir)); SPFROMREG(fs, MIPSInst_FS(ir)); rv.s = ieee754sp_fmin(fs, ft); - break; + goto copcsr; } case fmina_op: { @@ -1860,7 +1860,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(ft, MIPSInst_FT(ir)); SPFROMREG(fs, MIPSInst_FS(ir)); rv.s = ieee754sp_fmina(fs, ft); - break; + goto copcsr; } case fmax_op: { @@ -1873,7 +1873,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(ft, MIPSInst_FT(ir)); SPFROMREG(fs, MIPSInst_FS(ir)); rv.s = ieee754sp_fmax(fs, ft); - break; + goto copcsr; } case fmaxa_op: { @@ -1886,7 +1886,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, SPFROMREG(ft, MIPSInst_FT(ir)); SPFROMREG(fs, MIPSInst_FS(ir)); rv.s = ieee754sp_fmaxa(fs, ft); - break; + goto copcsr; } case fabs_op: @@ -2165,7 +2165,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(fs, MIPSInst_FS(ir)); DPFROMREG(fd, MIPSInst_FD(ir)); rv.d = ieee754dp_maddf(fd, fs, ft); - break; + goto copcsr; } case fmsubf_op: { @@ -2179,7 +2179,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(fs, MIPSInst_FS(ir)); DPFROMREG(fd, MIPSInst_FD(ir)); rv.d = ieee754dp_msubf(fd, fs, ft); - break; + goto copcsr; } case frint_op: { @@ -2204,7 +2204,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(fs, MIPSInst_FS(ir)); rv.l = ieee754dp_2008class(fs); rfmt = l_fmt; - break; + goto copcsr; } case fmin_op: { @@ -2217,7 +2217,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(ft, MIPSInst_FT(ir)); DPFROMREG(fs, MIPSInst_FS(ir)); rv.d = ieee754dp_fmin(fs, ft); - break; + goto copcsr; } case fmina_op: { @@ -2230,7 +2230,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(ft, MIPSInst_FT(ir)); DPFROMREG(fs, MIPSInst_FS(ir)); rv.d = ieee754dp_fmina(fs, ft); - break; + goto copcsr; } case fmax_op: { @@ -2243,7 +2243,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(ft, MIPSInst_FT(ir)); DPFROMREG(fs, MIPSInst_FS(ir)); rv.d = ieee754dp_fmax(fs, ft); - break; + goto copcsr; } case fmaxa_op: { @@ -2256,7 +2256,7 @@ static int fpu_emu(struct pt_regs *xcp, struct mips_fpu_struct *ctx, DPFROMREG(ft, MIPSInst_FT(ir)); DPFROMREG(fs, MIPSInst_FS(ir)); rv.d = ieee754dp_fmaxa(fs, ft); - break; + goto copcsr; } case fabs_op:

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: Fix odd fp register warnings with MIPS64r2" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From c7fd89a6407ea3a44a2a2fa12d290162c42499c4 Mon Sep 17 00:00:00 2001 From: James Hogan <jhogan(a)kernel.org> Date: Fri, 10 Nov 2017 11:46:54 +0000 Subject: [PATCH] MIPS: Fix odd fp register warnings with MIPS64r2 Building 32-bit MIPS64r2 kernels produces warnings like the following on certain toolchains (such as GNU assembler 2.24.90, but not GNU assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd from fpu_save_double and fpu_restore_16odd from fpu_restore_double: arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1 ... arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1 ... This appears to be because .set mips64r2 does not change the FPU ABI to 64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the command line on that toolchain, from the default FPU ABI of 32-bit due to the -mabi=32. This makes access to the odd FPU registers invalid. Fix by explicitly changing the FPU ABI with .set fp=64 directives in fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up in asmmacro.h so fp doesn't turn into $30. Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels") Signed-off-by: James Hogan <jhogan(a)kernel.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Paul Burton <paul.burton(a)imgtec.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.0+: 22b8ba765a72: MIPS: Fix MIPS64 FP save/restore on 32-bit kernels Cc: <stable(a)vger.kernel.org> # 4.0+ Patchwork: https://patchwork.linux-mips.org/patch/17656/ diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h index b815d7b3bd27..feb069cbf44e 100644 --- a/arch/mips/include/asm/asmmacro.h +++ b/arch/mips/include/asm/asmmacro.h @@ -19,6 +19,9 @@ #include <asm/asmmacro-64.h> #endif +/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */ +#undef fp + /* * Helper macros for generating raw instruction encodings. */ @@ -105,6 +108,7 @@ .macro fpu_save_16odd thread .set push .set mips64r2 + .set fp=64 SET_HARDFLOAT sdc1 $f1, THREAD_FPR1(\thread) sdc1 $f3, THREAD_FPR3(\thread) @@ -163,6 +167,7 @@ .macro fpu_restore_16odd thread .set push .set mips64r2 + .set fp=64 SET_HARDFLOAT ldc1 $f1, THREAD_FPR1(\thread) ldc1 $f3, THREAD_FPR3(\thread) @@ -234,9 +239,6 @@ .endm #ifdef TOOLCHAIN_SUPPORTS_MSA -/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */ -#undef fp - .macro _cfcmsa rd, cs .set push .set mips32r2

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 22b8ba765a726d90e9830ff6134c32b04f12c10f Mon Sep 17 00:00:00 2001 From: James Hogan <jhogan(a)kernel.org> Date: Mon, 3 Jul 2017 23:41:47 +0100 Subject: [PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels 32-bit kernels can be configured to support MIPS64, in which case neither CONFIG_64BIT or CONFIG_CPU_MIPS32_R* will be set. This causes the CP0_Status.FR checks at the point of floating point register save and restore to be compiled out, which results in odd FP registers not being saved or restored to the task or signal context even when CP0_Status.FR is set. Fix the ifdefs to use CONFIG_CPU_MIPSR2 and CONFIG_CPU_MIPSR6, which are enabled for the relevant revisions of either MIPS32 or MIPS64, along with some other CPUs such as Octeon (r2), Loongson1 (r2), XLP (r2), Loongson 3A R2. The suspect code originates from commit 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries") in v3.14, however the code in __enable_fpu() was consistent and refused to set FR=1, falling back to software FPU emulation. This was suboptimal but should be functionally correct. Commit fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") in v4.2 (and stable tagged back to 4.0) later introduced the bug by updating __enable_fpu() to set FR=1 but failing to update the other similar ifdefs to enable FR=1 state handling. Fixes: fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") Signed-off-by: James Hogan <jhogan(a)kernel.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Paul Burton <paul.burton(a)imgtec.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.0+ Patchwork: https://patchwork.linux-mips.org/patch/16739/ diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h index 83054f79f72a..b815d7b3bd27 100644 --- a/arch/mips/include/asm/asmmacro.h +++ b/arch/mips/include/asm/asmmacro.h @@ -126,8 +126,8 @@ .endm .macro fpu_save_double thread status tmp -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) sll \tmp, \status, 5 bgez \tmp, 10f fpu_save_16odd \thread @@ -184,8 +184,8 @@ .endm .macro fpu_restore_double thread status tmp -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) sll \tmp, \status, 5 bgez \tmp, 10f # 16 register mode? diff --git a/arch/mips/kernel/r4k_fpu.S b/arch/mips/kernel/r4k_fpu.S index 0a83b1708b3c..8e3a6020c613 100644 --- a/arch/mips/kernel/r4k_fpu.S +++ b/arch/mips/kernel/r4k_fpu.S @@ -40,8 +40,8 @@ */ LEAF(_save_fp) EXPORT_SYMBOL(_save_fp) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) mfc0 t0, CP0_STATUS #endif fpu_save_double a0 t0 t1 # clobbers t1 @@ -52,8 +52,8 @@ EXPORT_SYMBOL(_save_fp) * Restore a thread's fp context. */ LEAF(_restore_fp) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) mfc0 t0, CP0_STATUS #endif fpu_restore_double a0 t0 t1 # clobbers t1 @@ -246,11 +246,11 @@ LEAF(_save_fp_context) cfc1 t1, fcr31 .set pop -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) .set push SET_HARDFLOAT -#ifdef CONFIG_CPU_MIPS32_R2 +#ifdef CONFIG_CPU_MIPSR2 .set mips32r2 .set fp=64 mfc0 t0, CP0_STATUS @@ -314,11 +314,11 @@ LEAF(_save_fp_context) LEAF(_restore_fp_context) EX lw t1, 0(a1) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) .set push SET_HARDFLOAT -#ifdef CONFIG_CPU_MIPS32_R2 +#ifdef CONFIG_CPU_MIPSR2 .set mips32r2 .set fp=64 mfc0 t0, CP0_STATUS

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 22b8ba765a726d90e9830ff6134c32b04f12c10f Mon Sep 17 00:00:00 2001 From: James Hogan <jhogan(a)kernel.org> Date: Mon, 3 Jul 2017 23:41:47 +0100 Subject: [PATCH] MIPS: Fix MIPS64 FP save/restore on 32-bit kernels 32-bit kernels can be configured to support MIPS64, in which case neither CONFIG_64BIT or CONFIG_CPU_MIPS32_R* will be set. This causes the CP0_Status.FR checks at the point of floating point register save and restore to be compiled out, which results in odd FP registers not being saved or restored to the task or signal context even when CP0_Status.FR is set. Fix the ifdefs to use CONFIG_CPU_MIPSR2 and CONFIG_CPU_MIPSR6, which are enabled for the relevant revisions of either MIPS32 or MIPS64, along with some other CPUs such as Octeon (r2), Loongson1 (r2), XLP (r2), Loongson 3A R2. The suspect code originates from commit 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries") in v3.14, however the code in __enable_fpu() was consistent and refused to set FR=1, falling back to software FPU emulation. This was suboptimal but should be functionally correct. Commit fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") in v4.2 (and stable tagged back to 4.0) later introduced the bug by updating __enable_fpu() to set FR=1 but failing to update the other similar ifdefs to enable FR=1 state handling. Fixes: fcc53b5f6c38 ("MIPS: fpu.h: Allow 64-bit FPU on a 64-bit MIPS R6 CPU") Signed-off-by: James Hogan <jhogan(a)kernel.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Paul Burton <paul.burton(a)imgtec.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.0+ Patchwork: https://patchwork.linux-mips.org/patch/16739/ diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h index 83054f79f72a..b815d7b3bd27 100644 --- a/arch/mips/include/asm/asmmacro.h +++ b/arch/mips/include/asm/asmmacro.h @@ -126,8 +126,8 @@ .endm .macro fpu_save_double thread status tmp -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) sll \tmp, \status, 5 bgez \tmp, 10f fpu_save_16odd \thread @@ -184,8 +184,8 @@ .endm .macro fpu_restore_double thread status tmp -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) sll \tmp, \status, 5 bgez \tmp, 10f # 16 register mode? diff --git a/arch/mips/kernel/r4k_fpu.S b/arch/mips/kernel/r4k_fpu.S index 0a83b1708b3c..8e3a6020c613 100644 --- a/arch/mips/kernel/r4k_fpu.S +++ b/arch/mips/kernel/r4k_fpu.S @@ -40,8 +40,8 @@ */ LEAF(_save_fp) EXPORT_SYMBOL(_save_fp) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) mfc0 t0, CP0_STATUS #endif fpu_save_double a0 t0 t1 # clobbers t1 @@ -52,8 +52,8 @@ EXPORT_SYMBOL(_save_fp) * Restore a thread's fp context. */ LEAF(_restore_fp) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) mfc0 t0, CP0_STATUS #endif fpu_restore_double a0 t0 t1 # clobbers t1 @@ -246,11 +246,11 @@ LEAF(_save_fp_context) cfc1 t1, fcr31 .set pop -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) .set push SET_HARDFLOAT -#ifdef CONFIG_CPU_MIPS32_R2 +#ifdef CONFIG_CPU_MIPSR2 .set mips32r2 .set fp=64 mfc0 t0, CP0_STATUS @@ -314,11 +314,11 @@ LEAF(_save_fp_context) LEAF(_restore_fp_context) EX lw t1, 0(a1) -#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPS32_R2) || \ - defined(CONFIG_CPU_MIPS32_R6) +#if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \ + defined(CONFIG_CPU_MIPSR6) .set push SET_HARDFLOAT -#ifdef CONFIG_CPU_MIPS32_R2 +#ifdef CONFIG_CPU_MIPSR2 .set mips32r2 .set fp=64 mfc0 t0, CP0_STATUS

8 years

1
0
0 0

Re: [Linux-stable-mirror] WTF: patch "[PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver" was seriously submitted to be applied to the 4.9-stable tree?

by Greg KH

On Mon, Nov 27, 2017 at 02:13:00PM +0000, James Hogan wrote: > On Mon, Nov 27, 2017 at 01:56:49PM +0100, Greg KH wrote: > > On Mon, Nov 27, 2017 at 12:40:36PM +0000, James Hogan wrote: > > > Hi Greg, > > > > > > On Mon, Nov 27, 2017 at 01:35:46PM +0100, gregkh(a)linuxfoundation.org wrote: > > > > The patch below was submitted to be applied to the 4.9-stable tree. > > > > > > > > I fail to see how this patch meets the stable kernel rules as found at > > > > Documentation/process/stable-kernel-rules.rst. > > > > > > > > I could be totally wrong, and if so, please respond to > > > > <stable(a)vger.kernel.org> and let me know why this patch should be > > > > applied. Otherwise, it is now dropped from my patch queues, never to be > > > > seen again. > > > > > > I should have adjusted the commit message. KERN_WARN doesn't exist so it > > > actually fixes a build error as well as switching to pr_warn(). > > > > What build error? I have not heard of this breaking the build on 4.9 > > for the past year, is it in some config that no one uses? :) > > The LEDE project has been carrying the patch [1] since February when > they added 4.9 support (their 4.4 support had a slightly earlier version > of the driver added with just a plain printk, no KERN_WARN). > > They have both CONFIG_SOC_MT7620 and CONFIG_PCI=y in their ralink mt7620 > config [2], and they are keeping up to date with stable releases [3], so > I have no doubt they would appreciate having the patch applied to > upstream stable to reduce their delta. > > The only defconfigs in mainline which enable this platform > (CONFIG_SOC_MT7620) are omega2p_defconfig and vocore2_defconfig, which > were added in August by Harvey to help widen our internal continuous > build & boot test coverage. Neither defconfig enables CONFIG_PCI yet > which is required to see the build failure below, but regardless it is a > valid configuration which LEDE is actively using. > > arch/mips/pci/pci-mt7620.c: In function ‘wait_pciephy_busy’: > arch/mips/pci/pci-mt7620.c:123:11: error: ‘KERN_WARN’ undeclared (first use in this function) > printk(KERN_WARN "PCIE-PHY retry failed.\n"); > ^~~~~~~~~ > > John: I'm not familiar with the hardware, but would it be appropriate to > add CONFIG_PCI=y to either of those 2 defconfigs (omega2p_defconfig and > vocore2_defconfig) so this driver gets some upstream build[/boot] > testing? > > Anyway, hopefully that helps allay stable backport concerns. Yes, thanks, that explains it a lot better, now queued up. greg k-h

8 years

1
0
0 0

[Linux-stable-mirror] [-stable PATCH] sched/rt: Simplify the IPI based RT balancing logic

by Ingo Molnar

hi Greg, Please include the commit below in -stable, once it gets upstream. Maybe also add: Fixes: b6366f048e0c: ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") because it fixes a pretty old bug. Thanks, Ingo ----- Forwarded message from "tip-bot for Steven Rostedt (Red Hat)" <tipbot(a)zytor.com> ----- Date: Tue, 10 Oct 2017 04:02:17 -0700 From: "tip-bot for Steven Rostedt (Red Hat)" <tipbot(a)zytor.com> To: linux-tip-commits(a)vger.kernel.org Cc: rostedt(a)goodmis.org, williams(a)redhat.com, mingo(a)kernel.org, tglx(a)linutronix.de, peterz(a)infradead.org, linux-kernel(a)vger.kernel.org, jkacur(a)redhat.com, bristot(a)redhat.com, torvalds(a)linux-foundation.org, efault(a)gmx.de, hpa(a)zytor.com, swood(a)redhat.com Subject: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic Commit-ID: 4bdced5c9a2922521e325896a7bbbf0132c94e56 Gitweb: https://git.kernel.org/tip/4bdced5c9a2922521e325896a7bbbf0132c94e56 Author: Steven Rostedt (Red Hat) <rostedt(a)goodmis.org> AuthorDate: Fri, 6 Oct 2017 14:05:04 -0400 Committer: Ingo Molnar <mingo(a)kernel.org> CommitDate: Tue, 10 Oct 2017 11:45:40 +0200 sched/rt: Simplify the IPI based RT balancing logic When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Clark Williams <williams(a)redhat.com> Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com> Cc: John Kacur <jkacur(a)redhat.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mike Galbraith <efault(a)gmx.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Scott Wood <swood(a)redhat.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> --- kernel/sched/rt.c | 316 ++++++++++++++++++------------------------------ kernel/sched/sched.h | 24 ++-- kernel/sched/topology.c | 6 + 3 files changed, 138 insertions(+), 208 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 0af5ca9..fda2799 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -73,10 +73,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } -#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI) -static void push_irq_work_func(struct irq_work *work); -#endif - void init_rt_rq(struct rt_rq *rt_rq) { struct rt_prio_array *array; @@ -96,13 +92,6 @@ void init_rt_rq(struct rt_rq *rt_rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); - -#ifdef HAVE_RT_PUSH_IPI - rt_rq->push_flags = 0; - rt_rq->push_cpu = nr_cpu_ids; - raw_spin_lock_init(&rt_rq->push_lock); - init_irq_work(&rt_rq->push_work, push_irq_work_func); -#endif #endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1875,241 +1864,166 @@ static void push_rt_tasks(struct rq *rq) } #ifdef HAVE_RT_PUSH_IPI + /* - * The search for the next cpu always starts at rq->cpu and ends - * when we reach rq->cpu again. It will never return rq->cpu. - * This returns the next cpu to check, or nr_cpu_ids if the loop - * is complete. + * When a high priority task schedules out from a CPU and a lower priority + * task is scheduled in, a check is made to see if there's any RT tasks + * on other CPUs that are waiting to run because a higher priority RT task + * is currently running on its CPU. In this case, the CPU with multiple RT + * tasks queued on it (overloaded) needs to be notified that a CPU has opened + * up that may be able to run one of its non-running queued RT tasks. + * + * All CPUs with overloaded RT tasks need to be notified as there is currently + * no way to know which of these CPUs have the highest priority task waiting + * to run. Instead of trying to take a spinlock on each of these CPUs, + * which has shown to cause large latency when done on machines with many + * CPUs, sending an IPI to the CPUs to have them push off the overloaded + * RT tasks waiting to run. + * + * Just sending an IPI to each of the CPUs is also an issue, as on large + * count CPU machines, this can cause an IPI storm on a CPU, especially + * if its the only CPU with multiple RT tasks queued, and a large number + * of CPUs scheduling a lower priority task at the same time. + * + * Each root domain has its own irq work function that can iterate over + * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT + * tassk must be checked if there's one or many CPUs that are lowering + * their priority, there's a single irq work iterator that will try to + * push off RT tasks that are waiting to run. + * + * When a CPU schedules a lower priority task, it will kick off the + * irq work iterator that will jump to each CPU with overloaded RT tasks. + * As it only takes the first CPU that schedules a lower priority task + * to start the process, the rto_start variable is incremented and if + * the atomic result is one, then that CPU will try to take the rto_lock. + * This prevents high contention on the lock as the process handles all + * CPUs scheduling lower priority tasks. + * + * All CPUs that are scheduling a lower priority task will increment the + * rt_loop_next variable. This will make sure that the irq work iterator + * checks all RT overloaded CPUs whenever a CPU schedules a new lower + * priority task, even if the iterator is in the middle of a scan. Incrementing + * the rt_loop_next will cause the iterator to perform another scan. * - * rq->rt.push_cpu holds the last cpu returned by this function, - * or if this is the first instance, it must hold rq->cpu. */ static int rto_next_cpu(struct rq *rq) { - int prev_cpu = rq->rt.push_cpu; + struct root_domain *rd = rq->rd; + int next; int cpu; - cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); - /* - * If the previous cpu is less than the rq's CPU, then it already - * passed the end of the mask, and has started from the beginning. - * We end if the next CPU is greater or equal to rq's CPU. + * When starting the IPI RT pushing, the rto_cpu is set to -1, + * rt_next_cpu() will simply return the first CPU found in + * the rto_mask. + * + * If rto_next_cpu() is called with rto_cpu is a valid cpu, it + * will return the next CPU found in the rto_mask. + * + * If there are no more CPUs left in the rto_mask, then a check is made + * against rto_loop and rto_loop_next. rto_loop is only updated with + * the rto_lock held, but any CPU may increment the rto_loop_next + * without any locking. */ - if (prev_cpu < rq->cpu) { - if (cpu >= rq->cpu) - return nr_cpu_ids; + for (;;) { - } else if (cpu >= nr_cpu_ids) { - /* - * We passed the end of the mask, start at the beginning. - * If the result is greater or equal to the rq's CPU, then - * the loop is finished. - */ - cpu = cpumask_first(rq->rd->rto_mask); - if (cpu >= rq->cpu) - return nr_cpu_ids; - } - rq->rt.push_cpu = cpu; + /* When rto_cpu is -1 this acts like cpumask_first() */ + cpu = cpumask_next(rd->rto_cpu, rd->rto_mask); - /* Return cpu to let the caller know if the loop is finished or not */ - return cpu; -} + rd->rto_cpu = cpu; -static int find_next_push_cpu(struct rq *rq) -{ - struct rq *next_rq; - int cpu; + if (cpu < nr_cpu_ids) + return cpu; - while (1) { - cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - break; - next_rq = cpu_rq(cpu); + rd->rto_cpu = -1; + + /* + * ACQUIRE ensures we see the @rto_mask changes + * made prior to the @next value observed. + * + * Matches WMB in rt_set_overload(). + */ + next = atomic_read_acquire(&rd->rto_loop_next); - /* Make sure the next rq can push to this rq */ - if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + if (rd->rto_loop == next) break; + + rd->rto_loop = next; } - return cpu; + return -1; } -#define RT_PUSH_IPI_EXECUTING 1 -#define RT_PUSH_IPI_RESTART 2 +static inline bool rto_start_trylock(atomic_t *v) +{ + return !atomic_cmpxchg_acquire(v, 0, 1); +} -/* - * When a high priority task schedules out from a CPU and a lower priority - * task is scheduled in, a check is made to see if there's any RT tasks - * on other CPUs that are waiting to run because a higher priority RT task - * is currently running on its CPU. In this case, the CPU with multiple RT - * tasks queued on it (overloaded) needs to be notified that a CPU has opened - * up that may be able to run one of its non-running queued RT tasks. - * - * On large CPU boxes, there's the case that several CPUs could schedule - * a lower priority task at the same time, in which case it will look for - * any overloaded CPUs that it could pull a task from. To do this, the runqueue - * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting - * for a single overloaded CPU's runqueue lock can produce a large latency. - * (This has actually been observed on large boxes running cyclictest). - * Instead of taking the runqueue lock of the overloaded CPU, each of the - * CPUs that scheduled a lower priority task simply sends an IPI to the - * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with - * lots of contention. The overloaded CPU will look to push its non-running - * RT task off, and if it does, it can then ignore the other IPIs coming - * in, and just pass those IPIs off to any other overloaded CPU. - * - * When a CPU schedules a lower priority task, it only sends an IPI to - * the "next" CPU that has overloaded RT tasks. This prevents IPI storms, - * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with - * RT overloaded tasks, would cause 100 IPIs to go out at once. - * - * The overloaded RT CPU, when receiving an IPI, will try to push off its - * overloaded RT tasks and then send an IPI to the next CPU that has - * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks - * have completed. Just because a CPU may have pushed off its own overloaded - * RT task does not mean it should stop sending the IPI around to other - * overloaded CPUs. There may be another RT task waiting to run on one of - * those CPUs that are of higher priority than the one that was just - * pushed. - * - * An optimization that could possibly be made is to make a CPU array similar - * to the cpupri array mask of all running RT tasks, but for the overloaded - * case, then the IPI could be sent to only the CPU with the highest priority - * RT task waiting, and that CPU could send off further IPIs to the CPU with - * the next highest waiting task. Since the overloaded case is much less likely - * to happen, the complexity of this implementation may not be worth it. - * Instead, just send an IPI around to all overloaded CPUs. - * - * The rq->rt.push_flags holds the status of the IPI that is going around. - * A run queue can only send out a single IPI at a time. The possible flags - * for rq->rt.push_flags are: - * - * (None or zero): No IPI is going around for the current rq - * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around - * RT_PUSH_IPI_RESTART: The priority of the running task for the rq - * has changed, and the IPI should restart - * circulating the overloaded CPUs again. - * - * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated - * before sending to the next CPU. - * - * Instead of having all CPUs that schedule a lower priority task send - * an IPI to the same "first" CPU in the RT overload mask, they send it - * to the next overloaded CPU after their own CPU. This helps distribute - * the work when there's more than one overloaded CPU and multiple CPUs - * scheduling in lower priority tasks. - * - * When a rq schedules a lower priority task than what was currently - * running, the next CPU with overloaded RT tasks is examined first. - * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower - * priority task, it will send an IPI first to CPU 5, then CPU 5 will - * send to CPU 1 if it is still overloaded. CPU 1 will clear the - * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set. - * - * The first CPU to notice IPI_RESTART is set, will clear that flag and then - * send an IPI to the next overloaded CPU after the rq->cpu and not the next - * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3 - * schedules a lower priority task, and the IPI_RESTART gets set while the - * handling is being done on CPU 5, it will clear the flag and send it back to - * CPU 4 instead of CPU 1. - * - * Note, the above logic can be disabled by turning off the sched_feature - * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be - * taken by the CPU requesting a pull and the waiting RT task will be pulled - * by that CPU. This may be fine for machines with few CPUs. - */ -static void tell_cpu_to_push(struct rq *rq) +static inline void rto_start_unlock(atomic_t *v) { - int cpu; + atomic_set_release(v, 0); +} - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - raw_spin_lock(&rq->rt.push_lock); - /* Make sure it's still executing */ - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - /* - * Tell the IPI to restart the loop as things have - * changed since it started. - */ - rq->rt.push_flags |= RT_PUSH_IPI_RESTART; - raw_spin_unlock(&rq->rt.push_lock); - return; - } - raw_spin_unlock(&rq->rt.push_lock); - } +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu = -1; - /* When here, there's no IPI going around */ + /* Keep the loop going if the IPI is currently active */ + atomic_inc(&rq->rd->rto_loop_next); - rq->rt.push_cpu = rq->cpu; - cpu = find_next_push_cpu(rq); - if (cpu >= nr_cpu_ids) + /* Only one CPU can initiate a loop at a time */ + if (!rto_start_trylock(&rq->rd->rto_loop_start)) return; - rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + raw_spin_lock(&rq->rd->rto_lock); + + /* + * The rto_cpu is updated under the lock, if it has a valid cpu + * then the IPI is still running and will continue due to the + * update to loop_next, and nothing needs to be done here. + * Otherwise it is finishing up and an ipi needs to be sent. + */ + if (rq->rd->rto_cpu < 0) + cpu = rto_next_cpu(rq); - irq_work_queue_on(&rq->rt.push_work, cpu); + raw_spin_unlock(&rq->rd->rto_lock); + + rto_start_unlock(&rq->rd->rto_loop_start); + + if (cpu >= 0) + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } /* Called from hardirq context */ -static void try_to_push_tasks(void *arg) +void rto_push_irq_work_func(struct irq_work *work) { - struct rt_rq *rt_rq = arg; - struct rq *rq, *src_rq; - int this_cpu; + struct rq *rq; int cpu; - this_cpu = rt_rq->push_cpu; + rq = this_rq(); - /* Paranoid check */ - BUG_ON(this_cpu != smp_processor_id()); - - rq = cpu_rq(this_cpu); - src_rq = rq_of_rt_rq(rt_rq); - -again: + /* + * We do not need to grab the lock to check for has_pushable_tasks. + * When it gets updated, a check is made if a push is possible. + */ if (has_pushable_tasks(rq)) { raw_spin_lock(&rq->lock); - push_rt_task(rq); + push_rt_tasks(rq); raw_spin_unlock(&rq->lock); } - /* Pass the IPI to the next rt overloaded queue */ - raw_spin_lock(&rt_rq->push_lock); - /* - * If the source queue changed since the IPI went out, - * we need to restart the search from that CPU again. - */ - if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { - rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; - rt_rq->push_cpu = src_rq->cpu; - } + raw_spin_lock(&rq->rd->rto_lock); - cpu = find_next_push_cpu(src_rq); + /* Pass the IPI to the next rt overloaded queue */ + cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; - raw_spin_unlock(&rt_rq->push_lock); + raw_spin_unlock(&rq->rd->rto_lock); - if (cpu >= nr_cpu_ids) + if (cpu < 0) return; - /* - * It is possible that a restart caused this CPU to be - * chosen again. Don't bother with an IPI, just see if we - * have more to push. - */ - if (unlikely(cpu == rq->cpu)) - goto again; - /* Try the next RT overloaded CPU */ - irq_work_queue_on(&rt_rq->push_work, cpu); -} - -static void push_irq_work_func(struct irq_work *work) -{ - struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); - - try_to_push_tasks(rt_rq); + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } #endif /* HAVE_RT_PUSH_IPI */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a81c978..8aa24b4 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -505,7 +505,7 @@ static inline int rt_bandwidth_enabled(void) } /* RT IPI pull logic requires IRQ_WORK */ -#ifdef CONFIG_IRQ_WORK +#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP) # define HAVE_RT_PUSH_IPI #endif @@ -527,12 +527,6 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; -#ifdef HAVE_RT_PUSH_IPI - int push_flags; - int push_cpu; - struct irq_work push_work; - raw_spinlock_t push_lock; -#endif #endif /* CONFIG_SMP */ int rt_queued; @@ -641,6 +635,19 @@ struct root_domain { struct dl_bw dl_bw; struct cpudl cpudl; +#ifdef HAVE_RT_PUSH_IPI + /* + * For IPI pull requests, loop across the rto_mask. + */ + struct irq_work rto_push_work; + raw_spinlock_t rto_lock; + /* These are only updated and read within rto_lock */ + int rto_loop; + int rto_cpu; + /* These atomics are updated outside of a lock */ + atomic_t rto_loop_next; + atomic_t rto_loop_start; +#endif /* * The "RT overload" flag: it gets set if a CPU has more than * one runnable RT task. @@ -658,6 +665,9 @@ extern void init_defrootdomain(void); extern int sched_init_domains(const struct cpumask *cpu_map); extern void rq_attach_root(struct rq *rq, struct root_domain *rd); +#ifdef HAVE_RT_PUSH_IPI +extern void rto_push_irq_work_func(struct irq_work *work); +#endif #endif /* CONFIG_SMP */ /* diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index f51d123..e50450c 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -268,6 +268,12 @@ static int init_rootdomain(struct root_domain *rd) if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL)) goto free_dlo_mask; +#ifdef HAVE_RT_PUSH_IPI + rd->rto_cpu = -1; + raw_spin_lock_init(&rd->rto_lock); + init_irq_work(&rd->rto_push_work, rto_push_irq_work_func); +#endif + init_dl_bw(&rd->dl_bw); if (cpudl_init(&rd->cpudl) != 0) goto free_rto_mask; ----- End forwarded message -----

8 years

3
8
0 0

[Linux-stable-mirror] Patch "sched/rt: Simplify the IPI based RT balancing logic" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled sched/rt: Simplify the IPI based RT balancing logic to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: sched-rt-simplify-the-ipi-based-rt-balancing-logic.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 4bdced5c9a2922521e325896a7bbbf0132c94e56 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" <rostedt(a)goodmis.org> Date: Fri, 6 Oct 2017 14:05:04 -0400 Subject: sched/rt: Simplify the IPI based RT balancing logic From: Steven Rostedt (Red Hat) <rostedt(a)goodmis.org> commit 4bdced5c9a2922521e325896a7bbbf0132c94e56 upstream. When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Clark Williams <williams(a)redhat.com> Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com> Cc: John Kacur <jkacur(a)redhat.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mike Galbraith <efault(a)gmx.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Scott Wood <swood(a)redhat.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/rt.c | 316 +++++++++++++++++------------------------------- kernel/sched/sched.h | 24 ++- kernel/sched/topology.c | 6 3 files changed, 138 insertions(+), 208 deletions(-) --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -74,10 +74,6 @@ static void start_rt_bandwidth(struct rt raw_spin_unlock(&rt_b->rt_runtime_lock); } -#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI) -static void push_irq_work_func(struct irq_work *work); -#endif - void init_rt_rq(struct rt_rq *rt_rq) { struct rt_prio_array *array; @@ -97,13 +93,6 @@ void init_rt_rq(struct rt_rq *rt_rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); - -#ifdef HAVE_RT_PUSH_IPI - rt_rq->push_flags = 0; - rt_rq->push_cpu = nr_cpu_ids; - raw_spin_lock_init(&rt_rq->push_lock); - init_irq_work(&rt_rq->push_work, push_irq_work_func); -#endif #endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1876,241 +1865,166 @@ static void push_rt_tasks(struct rq *rq) } #ifdef HAVE_RT_PUSH_IPI + /* - * The search for the next cpu always starts at rq->cpu and ends - * when we reach rq->cpu again. It will never return rq->cpu. - * This returns the next cpu to check, or nr_cpu_ids if the loop - * is complete. + * When a high priority task schedules out from a CPU and a lower priority + * task is scheduled in, a check is made to see if there's any RT tasks + * on other CPUs that are waiting to run because a higher priority RT task + * is currently running on its CPU. In this case, the CPU with multiple RT + * tasks queued on it (overloaded) needs to be notified that a CPU has opened + * up that may be able to run one of its non-running queued RT tasks. + * + * All CPUs with overloaded RT tasks need to be notified as there is currently + * no way to know which of these CPUs have the highest priority task waiting + * to run. Instead of trying to take a spinlock on each of these CPUs, + * which has shown to cause large latency when done on machines with many + * CPUs, sending an IPI to the CPUs to have them push off the overloaded + * RT tasks waiting to run. + * + * Just sending an IPI to each of the CPUs is also an issue, as on large + * count CPU machines, this can cause an IPI storm on a CPU, especially + * if its the only CPU with multiple RT tasks queued, and a large number + * of CPUs scheduling a lower priority task at the same time. + * + * Each root domain has its own irq work function that can iterate over + * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT + * tassk must be checked if there's one or many CPUs that are lowering + * their priority, there's a single irq work iterator that will try to + * push off RT tasks that are waiting to run. + * + * When a CPU schedules a lower priority task, it will kick off the + * irq work iterator that will jump to each CPU with overloaded RT tasks. + * As it only takes the first CPU that schedules a lower priority task + * to start the process, the rto_start variable is incremented and if + * the atomic result is one, then that CPU will try to take the rto_lock. + * This prevents high contention on the lock as the process handles all + * CPUs scheduling lower priority tasks. + * + * All CPUs that are scheduling a lower priority task will increment the + * rt_loop_next variable. This will make sure that the irq work iterator + * checks all RT overloaded CPUs whenever a CPU schedules a new lower + * priority task, even if the iterator is in the middle of a scan. Incrementing + * the rt_loop_next will cause the iterator to perform another scan. * - * rq->rt.push_cpu holds the last cpu returned by this function, - * or if this is the first instance, it must hold rq->cpu. */ static int rto_next_cpu(struct rq *rq) { - int prev_cpu = rq->rt.push_cpu; + struct root_domain *rd = rq->rd; + int next; int cpu; - cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); - /* - * If the previous cpu is less than the rq's CPU, then it already - * passed the end of the mask, and has started from the beginning. - * We end if the next CPU is greater or equal to rq's CPU. + * When starting the IPI RT pushing, the rto_cpu is set to -1, + * rt_next_cpu() will simply return the first CPU found in + * the rto_mask. + * + * If rto_next_cpu() is called with rto_cpu is a valid cpu, it + * will return the next CPU found in the rto_mask. + * + * If there are no more CPUs left in the rto_mask, then a check is made + * against rto_loop and rto_loop_next. rto_loop is only updated with + * the rto_lock held, but any CPU may increment the rto_loop_next + * without any locking. */ - if (prev_cpu < rq->cpu) { - if (cpu >= rq->cpu) - return nr_cpu_ids; + for (;;) { - } else if (cpu >= nr_cpu_ids) { - /* - * We passed the end of the mask, start at the beginning. - * If the result is greater or equal to the rq's CPU, then - * the loop is finished. - */ - cpu = cpumask_first(rq->rd->rto_mask); - if (cpu >= rq->cpu) - return nr_cpu_ids; - } - rq->rt.push_cpu = cpu; + /* When rto_cpu is -1 this acts like cpumask_first() */ + cpu = cpumask_next(rd->rto_cpu, rd->rto_mask); - /* Return cpu to let the caller know if the loop is finished or not */ - return cpu; -} + rd->rto_cpu = cpu; -static int find_next_push_cpu(struct rq *rq) -{ - struct rq *next_rq; - int cpu; + if (cpu < nr_cpu_ids) + return cpu; - while (1) { - cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - break; - next_rq = cpu_rq(cpu); + rd->rto_cpu = -1; + + /* + * ACQUIRE ensures we see the @rto_mask changes + * made prior to the @next value observed. + * + * Matches WMB in rt_set_overload(). + */ + next = atomic_read_acquire(&rd->rto_loop_next); - /* Make sure the next rq can push to this rq */ - if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + if (rd->rto_loop == next) break; + + rd->rto_loop = next; } - return cpu; + return -1; } -#define RT_PUSH_IPI_EXECUTING 1 -#define RT_PUSH_IPI_RESTART 2 +static inline bool rto_start_trylock(atomic_t *v) +{ + return !atomic_cmpxchg_acquire(v, 0, 1); +} -/* - * When a high priority task schedules out from a CPU and a lower priority - * task is scheduled in, a check is made to see if there's any RT tasks - * on other CPUs that are waiting to run because a higher priority RT task - * is currently running on its CPU. In this case, the CPU with multiple RT - * tasks queued on it (overloaded) needs to be notified that a CPU has opened - * up that may be able to run one of its non-running queued RT tasks. - * - * On large CPU boxes, there's the case that several CPUs could schedule - * a lower priority task at the same time, in which case it will look for - * any overloaded CPUs that it could pull a task from. To do this, the runqueue - * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting - * for a single overloaded CPU's runqueue lock can produce a large latency. - * (This has actually been observed on large boxes running cyclictest). - * Instead of taking the runqueue lock of the overloaded CPU, each of the - * CPUs that scheduled a lower priority task simply sends an IPI to the - * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with - * lots of contention. The overloaded CPU will look to push its non-running - * RT task off, and if it does, it can then ignore the other IPIs coming - * in, and just pass those IPIs off to any other overloaded CPU. - * - * When a CPU schedules a lower priority task, it only sends an IPI to - * the "next" CPU that has overloaded RT tasks. This prevents IPI storms, - * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with - * RT overloaded tasks, would cause 100 IPIs to go out at once. - * - * The overloaded RT CPU, when receiving an IPI, will try to push off its - * overloaded RT tasks and then send an IPI to the next CPU that has - * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks - * have completed. Just because a CPU may have pushed off its own overloaded - * RT task does not mean it should stop sending the IPI around to other - * overloaded CPUs. There may be another RT task waiting to run on one of - * those CPUs that are of higher priority than the one that was just - * pushed. - * - * An optimization that could possibly be made is to make a CPU array similar - * to the cpupri array mask of all running RT tasks, but for the overloaded - * case, then the IPI could be sent to only the CPU with the highest priority - * RT task waiting, and that CPU could send off further IPIs to the CPU with - * the next highest waiting task. Since the overloaded case is much less likely - * to happen, the complexity of this implementation may not be worth it. - * Instead, just send an IPI around to all overloaded CPUs. - * - * The rq->rt.push_flags holds the status of the IPI that is going around. - * A run queue can only send out a single IPI at a time. The possible flags - * for rq->rt.push_flags are: - * - * (None or zero): No IPI is going around for the current rq - * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around - * RT_PUSH_IPI_RESTART: The priority of the running task for the rq - * has changed, and the IPI should restart - * circulating the overloaded CPUs again. - * - * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated - * before sending to the next CPU. - * - * Instead of having all CPUs that schedule a lower priority task send - * an IPI to the same "first" CPU in the RT overload mask, they send it - * to the next overloaded CPU after their own CPU. This helps distribute - * the work when there's more than one overloaded CPU and multiple CPUs - * scheduling in lower priority tasks. - * - * When a rq schedules a lower priority task than what was currently - * running, the next CPU with overloaded RT tasks is examined first. - * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower - * priority task, it will send an IPI first to CPU 5, then CPU 5 will - * send to CPU 1 if it is still overloaded. CPU 1 will clear the - * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set. - * - * The first CPU to notice IPI_RESTART is set, will clear that flag and then - * send an IPI to the next overloaded CPU after the rq->cpu and not the next - * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3 - * schedules a lower priority task, and the IPI_RESTART gets set while the - * handling is being done on CPU 5, it will clear the flag and send it back to - * CPU 4 instead of CPU 1. - * - * Note, the above logic can be disabled by turning off the sched_feature - * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be - * taken by the CPU requesting a pull and the waiting RT task will be pulled - * by that CPU. This may be fine for machines with few CPUs. - */ -static void tell_cpu_to_push(struct rq *rq) +static inline void rto_start_unlock(atomic_t *v) { - int cpu; + atomic_set_release(v, 0); +} - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - raw_spin_lock(&rq->rt.push_lock); - /* Make sure it's still executing */ - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - /* - * Tell the IPI to restart the loop as things have - * changed since it started. - */ - rq->rt.push_flags |= RT_PUSH_IPI_RESTART; - raw_spin_unlock(&rq->rt.push_lock); - return; - } - raw_spin_unlock(&rq->rt.push_lock); - } +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu = -1; - /* When here, there's no IPI going around */ + /* Keep the loop going if the IPI is currently active */ + atomic_inc(&rq->rd->rto_loop_next); - rq->rt.push_cpu = rq->cpu; - cpu = find_next_push_cpu(rq); - if (cpu >= nr_cpu_ids) + /* Only one CPU can initiate a loop at a time */ + if (!rto_start_trylock(&rq->rd->rto_loop_start)) return; - rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + raw_spin_lock(&rq->rd->rto_lock); - irq_work_queue_on(&rq->rt.push_work, cpu); + /* + * The rto_cpu is updated under the lock, if it has a valid cpu + * then the IPI is still running and will continue due to the + * update to loop_next, and nothing needs to be done here. + * Otherwise it is finishing up and an ipi needs to be sent. + */ + if (rq->rd->rto_cpu < 0) + cpu = rto_next_cpu(rq); + + raw_spin_unlock(&rq->rd->rto_lock); + + rto_start_unlock(&rq->rd->rto_loop_start); + + if (cpu >= 0) + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } /* Called from hardirq context */ -static void try_to_push_tasks(void *arg) +void rto_push_irq_work_func(struct irq_work *work) { - struct rt_rq *rt_rq = arg; - struct rq *rq, *src_rq; - int this_cpu; + struct rq *rq; int cpu; - this_cpu = rt_rq->push_cpu; + rq = this_rq(); - /* Paranoid check */ - BUG_ON(this_cpu != smp_processor_id()); - - rq = cpu_rq(this_cpu); - src_rq = rq_of_rt_rq(rt_rq); - -again: + /* + * We do not need to grab the lock to check for has_pushable_tasks. + * When it gets updated, a check is made if a push is possible. + */ if (has_pushable_tasks(rq)) { raw_spin_lock(&rq->lock); - push_rt_task(rq); + push_rt_tasks(rq); raw_spin_unlock(&rq->lock); } - /* Pass the IPI to the next rt overloaded queue */ - raw_spin_lock(&rt_rq->push_lock); - /* - * If the source queue changed since the IPI went out, - * we need to restart the search from that CPU again. - */ - if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { - rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; - rt_rq->push_cpu = src_rq->cpu; - } + raw_spin_lock(&rq->rd->rto_lock); - cpu = find_next_push_cpu(src_rq); + /* Pass the IPI to the next rt overloaded queue */ + cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; - raw_spin_unlock(&rt_rq->push_lock); + raw_spin_unlock(&rq->rd->rto_lock); - if (cpu >= nr_cpu_ids) + if (cpu < 0) return; - /* - * It is possible that a restart caused this CPU to be - * chosen again. Don't bother with an IPI, just see if we - * have more to push. - */ - if (unlikely(cpu == rq->cpu)) - goto again; - /* Try the next RT overloaded CPU */ - irq_work_queue_on(&rt_rq->push_work, cpu); -} - -static void push_irq_work_func(struct irq_work *work) -{ - struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); - - try_to_push_tasks(rt_rq); + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } #endif /* HAVE_RT_PUSH_IPI */ --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -502,7 +502,7 @@ static inline int rt_bandwidth_enabled(v } /* RT IPI pull logic requires IRQ_WORK */ -#ifdef CONFIG_IRQ_WORK +#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP) # define HAVE_RT_PUSH_IPI #endif @@ -524,12 +524,6 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; -#ifdef HAVE_RT_PUSH_IPI - int push_flags; - int push_cpu; - struct irq_work push_work; - raw_spinlock_t push_lock; -#endif #endif /* CONFIG_SMP */ int rt_queued; @@ -638,6 +632,19 @@ struct root_domain { struct dl_bw dl_bw; struct cpudl cpudl; +#ifdef HAVE_RT_PUSH_IPI + /* + * For IPI pull requests, loop across the rto_mask. + */ + struct irq_work rto_push_work; + raw_spinlock_t rto_lock; + /* These are only updated and read within rto_lock */ + int rto_loop; + int rto_cpu; + /* These atomics are updated outside of a lock */ + atomic_t rto_loop_next; + atomic_t rto_loop_start; +#endif /* * The "RT overload" flag: it gets set if a CPU has more than * one runnable RT task. @@ -655,6 +662,9 @@ extern void init_defrootdomain(void); extern int sched_init_domains(const struct cpumask *cpu_map); extern void rq_attach_root(struct rq *rq, struct root_domain *rd); +#ifdef HAVE_RT_PUSH_IPI +extern void rto_push_irq_work_func(struct irq_work *work); +#endif #endif /* CONFIG_SMP */ /* --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -269,6 +269,12 @@ static int init_rootdomain(struct root_d if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL)) goto free_dlo_mask; +#ifdef HAVE_RT_PUSH_IPI + rd->rto_cpu = -1; + raw_spin_lock_init(&rd->rto_lock); + init_irq_work(&rd->rto_push_work, rto_push_irq_work_func); +#endif + init_dl_bw(&rd->dl_bw); if (cpudl_init(&rd->cpudl) != 0) goto free_rto_mask; Patches currently in stable-queue which might be from rostedt(a)goodmis.org are queue-4.14/sched-make-resched_cpu-unconditional.patch queue-4.14/sched-rt-simplify-the-ipi-based-rt-balancing-logic.patch

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/rt: Simplify the IPI based RT balancing logic" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4bdced5c9a2922521e325896a7bbbf0132c94e56 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Red Hat)" <rostedt(a)goodmis.org> Date: Fri, 6 Oct 2017 14:05:04 -0400 Subject: [PATCH] sched/rt: Simplify the IPI based RT balancing logic When a CPU lowers its priority (schedules out a high priority task for a lower priority one), a check is made to see if any other CPU has overloaded RT tasks (more than one). It checks the rto_mask to determine this and if so it will request to pull one of those tasks to itself if the non running RT task is of higher priority than the new priority of the next task to run on the current CPU. When we deal with large number of CPUs, the original pull logic suffered from large lock contention on a single CPU run queue, which caused a huge latency across all CPUs. This was caused by only having one CPU having overloaded RT tasks and a bunch of other CPUs lowering their priority. To solve this issue, commit: b6366f048e0c ("sched/rt: Use IPI to trigger RT task push migration instead of pulling") changed the way to request a pull. Instead of grabbing the lock of the overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work. Although the IPI logic worked very well in removing the large latency build up, it still could suffer from a large number of IPIs being sent to a single CPU. On a 80 CPU box, I measured over 200us of processing IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test that had lots of RT tasks scheduling on all CPUs, it actually triggered the hard lockup detector! One CPU had so many IPIs sent to it, and due to the restart mechanism that is triggered when the source run queue has a priority status change, the CPU spent minutes! processing the IPIs. Thinking about this further, I realized there's no reason for each run queue to send its own IPI. As all CPUs with overloaded tasks must be scanned regardless if there's one or many CPUs lowering their priority, because there's no current way to find the CPU with the highest priority task that can schedule to one of these CPUs, there really only needs to be one IPI being sent around at a time. This greatly simplifies the code! The new approach is to have each root domain have its own irq work, as the rto_mask is per root domain. The root domain has the following fields attached to it: rto_push_work - the irq work to process each CPU set in rto_mask rto_lock - the lock to protect some of the other rto fields rto_loop_start - an atomic that keeps contention down on rto_lock the first CPU scheduling in a lower priority task is the one to kick off the process. rto_loop_next - an atomic that gets incremented for each CPU that schedules in a lower priority task. rto_loop - a variable protected by rto_lock that is used to compare against rto_loop_next rto_cpu - The cpu to send the next IPI to, also protected by the rto_lock. When a CPU schedules in a lower priority task and wants to make sure overloaded CPUs know about it. It increments the rto_loop_next. Then it atomically sets rto_loop_start with a cmpxchg. If the old value is not "0", then it is done, as another CPU is kicking off the IPI loop. If the old value is "0", then it will take the rto_lock to synchronize with a possible IPI being sent around to the overloaded CPUs. If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no IPI being sent around, or one is about to finish. Then rto_cpu is set to the first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set in rto_mask, then there's nothing to be done. When the CPU receives the IPI, it will first try to push any RT tasks that is queued on the CPU but can't run because a higher priority RT task is currently running on that CPU. Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it finds one, it simply sends an IPI to that CPU and the process continues. If there's no more CPUs in the rto_mask, then rto_loop is compared with rto_loop_next. If they match, everything is done and the process is over. If they do not match, then a CPU scheduled in a lower priority task as the IPI was being passed around, and the process needs to start again. The first CPU in rto_mask is sent the IPI. This change removes this duplication of work in the IPI logic, and greatly lowers the latency caused by the IPIs. This removed the lockup happening on the 120 CPU machine. It also simplifies the code tremendously. What else could anyone ask for? Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and supplying me with the rto_start_trylock() and rto_start_unlock() helper functions. Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Clark Williams <williams(a)redhat.com> Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com> Cc: John Kacur <jkacur(a)redhat.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mike Galbraith <efault(a)gmx.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Scott Wood <swood(a)redhat.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 0af5ca9e3e3f..fda27991699a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -73,10 +73,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } -#if defined(CONFIG_SMP) && defined(HAVE_RT_PUSH_IPI) -static void push_irq_work_func(struct irq_work *work); -#endif - void init_rt_rq(struct rt_rq *rt_rq) { struct rt_prio_array *array; @@ -96,13 +92,6 @@ void init_rt_rq(struct rt_rq *rt_rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); - -#ifdef HAVE_RT_PUSH_IPI - rt_rq->push_flags = 0; - rt_rq->push_cpu = nr_cpu_ids; - raw_spin_lock_init(&rt_rq->push_lock); - init_irq_work(&rt_rq->push_work, push_irq_work_func); -#endif #endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1875,241 +1864,166 @@ static void push_rt_tasks(struct rq *rq) } #ifdef HAVE_RT_PUSH_IPI + /* - * The search for the next cpu always starts at rq->cpu and ends - * when we reach rq->cpu again. It will never return rq->cpu. - * This returns the next cpu to check, or nr_cpu_ids if the loop - * is complete. + * When a high priority task schedules out from a CPU and a lower priority + * task is scheduled in, a check is made to see if there's any RT tasks + * on other CPUs that are waiting to run because a higher priority RT task + * is currently running on its CPU. In this case, the CPU with multiple RT + * tasks queued on it (overloaded) needs to be notified that a CPU has opened + * up that may be able to run one of its non-running queued RT tasks. + * + * All CPUs with overloaded RT tasks need to be notified as there is currently + * no way to know which of these CPUs have the highest priority task waiting + * to run. Instead of trying to take a spinlock on each of these CPUs, + * which has shown to cause large latency when done on machines with many + * CPUs, sending an IPI to the CPUs to have them push off the overloaded + * RT tasks waiting to run. + * + * Just sending an IPI to each of the CPUs is also an issue, as on large + * count CPU machines, this can cause an IPI storm on a CPU, especially + * if its the only CPU with multiple RT tasks queued, and a large number + * of CPUs scheduling a lower priority task at the same time. + * + * Each root domain has its own irq work function that can iterate over + * all CPUs with RT overloaded tasks. Since all CPUs with overloaded RT + * tassk must be checked if there's one or many CPUs that are lowering + * their priority, there's a single irq work iterator that will try to + * push off RT tasks that are waiting to run. + * + * When a CPU schedules a lower priority task, it will kick off the + * irq work iterator that will jump to each CPU with overloaded RT tasks. + * As it only takes the first CPU that schedules a lower priority task + * to start the process, the rto_start variable is incremented and if + * the atomic result is one, then that CPU will try to take the rto_lock. + * This prevents high contention on the lock as the process handles all + * CPUs scheduling lower priority tasks. + * + * All CPUs that are scheduling a lower priority task will increment the + * rt_loop_next variable. This will make sure that the irq work iterator + * checks all RT overloaded CPUs whenever a CPU schedules a new lower + * priority task, even if the iterator is in the middle of a scan. Incrementing + * the rt_loop_next will cause the iterator to perform another scan. * - * rq->rt.push_cpu holds the last cpu returned by this function, - * or if this is the first instance, it must hold rq->cpu. */ static int rto_next_cpu(struct rq *rq) { - int prev_cpu = rq->rt.push_cpu; + struct root_domain *rd = rq->rd; + int next; int cpu; - cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); - /* - * If the previous cpu is less than the rq's CPU, then it already - * passed the end of the mask, and has started from the beginning. - * We end if the next CPU is greater or equal to rq's CPU. + * When starting the IPI RT pushing, the rto_cpu is set to -1, + * rt_next_cpu() will simply return the first CPU found in + * the rto_mask. + * + * If rto_next_cpu() is called with rto_cpu is a valid cpu, it + * will return the next CPU found in the rto_mask. + * + * If there are no more CPUs left in the rto_mask, then a check is made + * against rto_loop and rto_loop_next. rto_loop is only updated with + * the rto_lock held, but any CPU may increment the rto_loop_next + * without any locking. */ - if (prev_cpu < rq->cpu) { - if (cpu >= rq->cpu) - return nr_cpu_ids; + for (;;) { - } else if (cpu >= nr_cpu_ids) { - /* - * We passed the end of the mask, start at the beginning. - * If the result is greater or equal to the rq's CPU, then - * the loop is finished. - */ - cpu = cpumask_first(rq->rd->rto_mask); - if (cpu >= rq->cpu) - return nr_cpu_ids; - } - rq->rt.push_cpu = cpu; + /* When rto_cpu is -1 this acts like cpumask_first() */ + cpu = cpumask_next(rd->rto_cpu, rd->rto_mask); - /* Return cpu to let the caller know if the loop is finished or not */ - return cpu; -} + rd->rto_cpu = cpu; -static int find_next_push_cpu(struct rq *rq) -{ - struct rq *next_rq; - int cpu; + if (cpu < nr_cpu_ids) + return cpu; - while (1) { - cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - break; - next_rq = cpu_rq(cpu); + rd->rto_cpu = -1; + + /* + * ACQUIRE ensures we see the @rto_mask changes + * made prior to the @next value observed. + * + * Matches WMB in rt_set_overload(). + */ + next = atomic_read_acquire(&rd->rto_loop_next); - /* Make sure the next rq can push to this rq */ - if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + if (rd->rto_loop == next) break; + + rd->rto_loop = next; } - return cpu; + return -1; } -#define RT_PUSH_IPI_EXECUTING 1 -#define RT_PUSH_IPI_RESTART 2 +static inline bool rto_start_trylock(atomic_t *v) +{ + return !atomic_cmpxchg_acquire(v, 0, 1); +} -/* - * When a high priority task schedules out from a CPU and a lower priority - * task is scheduled in, a check is made to see if there's any RT tasks - * on other CPUs that are waiting to run because a higher priority RT task - * is currently running on its CPU. In this case, the CPU with multiple RT - * tasks queued on it (overloaded) needs to be notified that a CPU has opened - * up that may be able to run one of its non-running queued RT tasks. - * - * On large CPU boxes, there's the case that several CPUs could schedule - * a lower priority task at the same time, in which case it will look for - * any overloaded CPUs that it could pull a task from. To do this, the runqueue - * lock must be taken from that overloaded CPU. Having 10s of CPUs all fighting - * for a single overloaded CPU's runqueue lock can produce a large latency. - * (This has actually been observed on large boxes running cyclictest). - * Instead of taking the runqueue lock of the overloaded CPU, each of the - * CPUs that scheduled a lower priority task simply sends an IPI to the - * overloaded CPU. An IPI is much cheaper than taking an runqueue lock with - * lots of contention. The overloaded CPU will look to push its non-running - * RT task off, and if it does, it can then ignore the other IPIs coming - * in, and just pass those IPIs off to any other overloaded CPU. - * - * When a CPU schedules a lower priority task, it only sends an IPI to - * the "next" CPU that has overloaded RT tasks. This prevents IPI storms, - * as having 10 CPUs scheduling lower priority tasks and 10 CPUs with - * RT overloaded tasks, would cause 100 IPIs to go out at once. - * - * The overloaded RT CPU, when receiving an IPI, will try to push off its - * overloaded RT tasks and then send an IPI to the next CPU that has - * overloaded RT tasks. This stops when all CPUs with overloaded RT tasks - * have completed. Just because a CPU may have pushed off its own overloaded - * RT task does not mean it should stop sending the IPI around to other - * overloaded CPUs. There may be another RT task waiting to run on one of - * those CPUs that are of higher priority than the one that was just - * pushed. - * - * An optimization that could possibly be made is to make a CPU array similar - * to the cpupri array mask of all running RT tasks, but for the overloaded - * case, then the IPI could be sent to only the CPU with the highest priority - * RT task waiting, and that CPU could send off further IPIs to the CPU with - * the next highest waiting task. Since the overloaded case is much less likely - * to happen, the complexity of this implementation may not be worth it. - * Instead, just send an IPI around to all overloaded CPUs. - * - * The rq->rt.push_flags holds the status of the IPI that is going around. - * A run queue can only send out a single IPI at a time. The possible flags - * for rq->rt.push_flags are: - * - * (None or zero): No IPI is going around for the current rq - * RT_PUSH_IPI_EXECUTING: An IPI for the rq is being passed around - * RT_PUSH_IPI_RESTART: The priority of the running task for the rq - * has changed, and the IPI should restart - * circulating the overloaded CPUs again. - * - * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated - * before sending to the next CPU. - * - * Instead of having all CPUs that schedule a lower priority task send - * an IPI to the same "first" CPU in the RT overload mask, they send it - * to the next overloaded CPU after their own CPU. This helps distribute - * the work when there's more than one overloaded CPU and multiple CPUs - * scheduling in lower priority tasks. - * - * When a rq schedules a lower priority task than what was currently - * running, the next CPU with overloaded RT tasks is examined first. - * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower - * priority task, it will send an IPI first to CPU 5, then CPU 5 will - * send to CPU 1 if it is still overloaded. CPU 1 will clear the - * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set. - * - * The first CPU to notice IPI_RESTART is set, will clear that flag and then - * send an IPI to the next overloaded CPU after the rq->cpu and not the next - * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3 - * schedules a lower priority task, and the IPI_RESTART gets set while the - * handling is being done on CPU 5, it will clear the flag and send it back to - * CPU 4 instead of CPU 1. - * - * Note, the above logic can be disabled by turning off the sched_feature - * RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be - * taken by the CPU requesting a pull and the waiting RT task will be pulled - * by that CPU. This may be fine for machines with few CPUs. - */ -static void tell_cpu_to_push(struct rq *rq) +static inline void rto_start_unlock(atomic_t *v) { - int cpu; + atomic_set_release(v, 0); +} - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - raw_spin_lock(&rq->rt.push_lock); - /* Make sure it's still executing */ - if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { - /* - * Tell the IPI to restart the loop as things have - * changed since it started. - */ - rq->rt.push_flags |= RT_PUSH_IPI_RESTART; - raw_spin_unlock(&rq->rt.push_lock); - return; - } - raw_spin_unlock(&rq->rt.push_lock); - } +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu = -1; - /* When here, there's no IPI going around */ + /* Keep the loop going if the IPI is currently active */ + atomic_inc(&rq->rd->rto_loop_next); - rq->rt.push_cpu = rq->cpu; - cpu = find_next_push_cpu(rq); - if (cpu >= nr_cpu_ids) + /* Only one CPU can initiate a loop at a time */ + if (!rto_start_trylock(&rq->rd->rto_loop_start)) return; - rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + raw_spin_lock(&rq->rd->rto_lock); + + /* + * The rto_cpu is updated under the lock, if it has a valid cpu + * then the IPI is still running and will continue due to the + * update to loop_next, and nothing needs to be done here. + * Otherwise it is finishing up and an ipi needs to be sent. + */ + if (rq->rd->rto_cpu < 0) + cpu = rto_next_cpu(rq); - irq_work_queue_on(&rq->rt.push_work, cpu); + raw_spin_unlock(&rq->rd->rto_lock); + + rto_start_unlock(&rq->rd->rto_loop_start); + + if (cpu >= 0) + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } /* Called from hardirq context */ -static void try_to_push_tasks(void *arg) +void rto_push_irq_work_func(struct irq_work *work) { - struct rt_rq *rt_rq = arg; - struct rq *rq, *src_rq; - int this_cpu; + struct rq *rq; int cpu; - this_cpu = rt_rq->push_cpu; + rq = this_rq(); - /* Paranoid check */ - BUG_ON(this_cpu != smp_processor_id()); - - rq = cpu_rq(this_cpu); - src_rq = rq_of_rt_rq(rt_rq); - -again: + /* + * We do not need to grab the lock to check for has_pushable_tasks. + * When it gets updated, a check is made if a push is possible. + */ if (has_pushable_tasks(rq)) { raw_spin_lock(&rq->lock); - push_rt_task(rq); + push_rt_tasks(rq); raw_spin_unlock(&rq->lock); } - /* Pass the IPI to the next rt overloaded queue */ - raw_spin_lock(&rt_rq->push_lock); - /* - * If the source queue changed since the IPI went out, - * we need to restart the search from that CPU again. - */ - if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { - rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; - rt_rq->push_cpu = src_rq->cpu; - } + raw_spin_lock(&rq->rd->rto_lock); - cpu = find_next_push_cpu(src_rq); + /* Pass the IPI to the next rt overloaded queue */ + cpu = rto_next_cpu(rq); - if (cpu >= nr_cpu_ids) - rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; - raw_spin_unlock(&rt_rq->push_lock); + raw_spin_unlock(&rq->rd->rto_lock); - if (cpu >= nr_cpu_ids) + if (cpu < 0) return; - /* - * It is possible that a restart caused this CPU to be - * chosen again. Don't bother with an IPI, just see if we - * have more to push. - */ - if (unlikely(cpu == rq->cpu)) - goto again; - /* Try the next RT overloaded CPU */ - irq_work_queue_on(&rt_rq->push_work, cpu); -} - -static void push_irq_work_func(struct irq_work *work) -{ - struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); - - try_to_push_tasks(rt_rq); + irq_work_queue_on(&rq->rd->rto_push_work, cpu); } #endif /* HAVE_RT_PUSH_IPI */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a81c9782e98c..8aa24b41f652 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -505,7 +505,7 @@ static inline int rt_bandwidth_enabled(void) } /* RT IPI pull logic requires IRQ_WORK */ -#ifdef CONFIG_IRQ_WORK +#if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP) # define HAVE_RT_PUSH_IPI #endif @@ -527,12 +527,6 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; -#ifdef HAVE_RT_PUSH_IPI - int push_flags; - int push_cpu; - struct irq_work push_work; - raw_spinlock_t push_lock; -#endif #endif /* CONFIG_SMP */ int rt_queued; @@ -641,6 +635,19 @@ struct root_domain { struct dl_bw dl_bw; struct cpudl cpudl; +#ifdef HAVE_RT_PUSH_IPI + /* + * For IPI pull requests, loop across the rto_mask. + */ + struct irq_work rto_push_work; + raw_spinlock_t rto_lock; + /* These are only updated and read within rto_lock */ + int rto_loop; + int rto_cpu; + /* These atomics are updated outside of a lock */ + atomic_t rto_loop_next; + atomic_t rto_loop_start; +#endif /* * The "RT overload" flag: it gets set if a CPU has more than * one runnable RT task. @@ -658,6 +665,9 @@ extern void init_defrootdomain(void); extern int sched_init_domains(const struct cpumask *cpu_map); extern void rq_attach_root(struct rq *rq, struct root_domain *rd); +#ifdef HAVE_RT_PUSH_IPI +extern void rto_push_irq_work_func(struct irq_work *work); +#endif #endif /* CONFIG_SMP */ /* diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index f51d123f9fe1..e50450c2fed8 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -268,6 +268,12 @@ static int init_rootdomain(struct root_domain *rd) if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL)) goto free_dlo_mask; +#ifdef HAVE_RT_PUSH_IPI + rd->rto_cpu = -1; + raw_spin_lock_init(&rd->rto_lock); + init_irq_work(&rd->rto_push_work, rto_push_irq_work_func); +#endif + init_dl_bw(&rd->dl_bw); if (cpudl_init(&rd->cpudl) != 0) goto free_rto_mask;

8 years

1
0
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] timekeeping: Eliminate the stale declaration of" was seriously submitted to be applied to the 4.14-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.14-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8a7a8e1eab929eb3a5b735a788a23b9731139046 Mon Sep 17 00:00:00 2001 From: Dou Liyang <douly.fnst(a)cn.fujitsu.com> Date: Mon, 13 Nov 2017 13:49:04 +0800 Subject: [PATCH] timekeeping: Eliminate the stale declaration of ktime_get_raw_and_real_ts64() Commit ba26621e63ce got rid of ktime_get_raw_and_real_ts64(), but left its declaration behind. Remove it. Fixes: ba26621e63ce ("time: Remove duplicated code in ktime_get_raw_and_real()") Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: Christopher S. Hall <christopher.s.hall(a)intel.com> Cc: joelaf(a)google.com Cc: arnd(a)arndb.de Cc: gregkh(a)linuxfoundation.org Cc: john.stultz(a)linaro.org Cc: deepa.kernel(a)gmail.com Cc: stable(a)vger.kernel.org Link: https://lkml.kernel.org/r/1510552144-20831-1-git-send-email-douly.fnst@cn.f… diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index 0021575fe871..51293e1aa4da 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -272,12 +272,6 @@ extern bool timekeeping_rtc_skipresume(void); extern void timekeeping_inject_sleeptime64(struct timespec64 *delta); -/* - * PPS accessor - */ -extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw, - struct timespec64 *ts_real); - /* * struct system_time_snapshot - simultaneous raw/real time capture with * counter value

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm bufio: fix integer overflow when limiting maximum cache size" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm bufio: fix integer overflow when limiting maximum cache size to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Wed, 15 Nov 2017 16:38:09 -0800 Subject: dm bufio: fix integer overflow when limiting maximum cache size From: Eric Biggers <ebiggers(a)google.com> commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream. The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-bufio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -937,7 +937,8 @@ static void __get_memory_limit(struct dm buffers = c->minimum_buffers; *limit_buffers = buffers; - *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100; + *threshold_buffers = mult_frac(buffers, + DM_BUFIO_WRITEBACK_PERCENT, 100); } /* @@ -1856,19 +1857,15 @@ static int __init dm_bufio_init(void) memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches); memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names); - mem = (__u64)((totalram_pages - totalhigh_pages) * - DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT; + mem = (__u64)mult_frac(totalram_pages - totalhigh_pages, + DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT; if (mem > ULONG_MAX) mem = ULONG_MAX; #ifdef CONFIG_MMU - /* - * Get the size of vmalloc space the same way as VMALLOC_TOTAL - * in fs/proc/internal.h - */ - if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100) - mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100; + if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100)) + mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100); #endif dm_bufio_default_cache_size = mem; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.9/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch queue-4.9/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm: allocate struct mapped_device with kvzalloc" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm: allocate struct mapped_device with kvzalloc to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-allocate-struct-mapped_device-with-kvzalloc.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 856eb0916d181da6d043cc33e03f54d5c5bbe54a Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka(a)redhat.com> Date: Tue, 31 Oct 2017 19:33:02 -0400 Subject: dm: allocate struct mapped_device with kvzalloc From: Mikulas Patocka <mpatocka(a)redhat.com> commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream. The structure srcu_struct can be very big, its size is proportional to the value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field io_barrier in the struct mapped_device has 84kB in the debugging kernel and 50kB in the non-debugging kernel. The large size may result in failure of the function kzalloc_node. In order to avoid the allocation failure, we use the function kvzalloc_node, this function falls back to vmalloc if a large contiguous chunk of memory is not available. This patch also moves the field io_barrier to the last position of struct mapped_device - the reason is that on many processor architectures, short memory offsets result in smaller code than long memory offsets - on x86-64 it reduces code size by 320 bytes. Note to stable kernel maintainers - the kernels 4.11 and older don't have the function kvzalloc_node, you can use the function vzalloc_node instead. Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-core.h | 3 ++- drivers/md/dm.c | 7 ++++--- 2 files changed, 6 insertions(+), 4 deletions(-) --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -29,7 +29,6 @@ struct dm_kobject_holder { * DM targets must _not_ deference a mapped_device to directly access its members! */ struct mapped_device { - struct srcu_struct io_barrier; struct mutex suspend_lock; /* @@ -127,6 +126,8 @@ struct mapped_device { struct blk_mq_tag_set *tag_set; bool use_blk_mq:1; bool init_tio_pdu:1; + + struct srcu_struct io_barrier; }; void dm_init_md_queue(struct mapped_device *md); --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -21,6 +21,7 @@ #include <linux/delay.h> #include <linux/wait.h> #include <linux/pr.h> +#include <linux/vmalloc.h> #define DM_MSG_PREFIX "core" @@ -1511,7 +1512,7 @@ static struct mapped_device *alloc_dev(i struct mapped_device *md; void *old_md; - md = kzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id); + md = vzalloc_node(sizeof(*md), numa_node_id); if (!md) { DMWARN("unable to allocate device, out of memory."); return NULL; @@ -1605,7 +1606,7 @@ bad_io_barrier: bad_minor: module_put(THIS_MODULE); bad_module_get: - kfree(md); + kvfree(md); return NULL; } @@ -1624,7 +1625,7 @@ static void free_dev(struct mapped_devic free_minor(minor); module_put(THIS_MODULE); - kfree(md); + kvfree(md); } static void __bind_mempools(struct mapped_device *md, struct dm_table *t) Patches currently in stable-queue which might be from mpatocka(a)redhat.com are queue-4.9/dm-allocate-struct-mapped_device-with-kvzalloc.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm bufio: fix integer overflow when limiting maximum cache size" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm bufio: fix integer overflow when limiting maximum cache size to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Wed, 15 Nov 2017 16:38:09 -0800 Subject: dm bufio: fix integer overflow when limiting maximum cache size From: Eric Biggers <ebiggers(a)google.com> commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream. The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-bufio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -928,7 +928,8 @@ static void __get_memory_limit(struct dm buffers = c->minimum_buffers; *limit_buffers = buffers; - *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100; + *threshold_buffers = mult_frac(buffers, + DM_BUFIO_WRITEBACK_PERCENT, 100); } /* @@ -1829,19 +1830,15 @@ static int __init dm_bufio_init(void) memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches); memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names); - mem = (__u64)((totalram_pages - totalhigh_pages) * - DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT; + mem = (__u64)mult_frac(totalram_pages - totalhigh_pages, + DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT; if (mem > ULONG_MAX) mem = ULONG_MAX; #ifdef CONFIG_MMU - /* - * Get the size of vmalloc space the same way as VMALLOC_TOTAL - * in fs/proc/internal.h - */ - if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100) - mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100; + if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100)) + mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100); #endif dm_bufio_default_cache_size = mem; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.4/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch queue-4.4/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ovl: Put upperdentry if ovl_check_origin() fails" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ovl: Put upperdentry if ovl_check_origin() fails to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: ovl-put-upperdentry-if-ovl_check_origin-fails.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 5455f92b54e516995a9ca45bbf790d3629c27a93 Mon Sep 17 00:00:00 2001 From: Vivek Goyal <vgoyal(a)redhat.com> Date: Wed, 1 Nov 2017 15:37:22 -0400 Subject: ovl: Put upperdentry if ovl_check_origin() fails From: Vivek Goyal <vgoyal(a)redhat.com> commit 5455f92b54e516995a9ca45bbf790d3629c27a93 upstream. If ovl_check_origin() fails, we should put upperdentry. We have a reference on it by now. So goto out_put_upper instead of out. Fixes: a9d019573e88 ("ovl: lookup non-dir copy-up-origin by file handle") Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com> Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- fs/overlayfs/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -630,7 +630,7 @@ struct dentry *ovl_lookup(struct inode * err = ovl_check_origin(upperdentry, roe->lowerstack, roe->numlower, &stack, &ctr); if (err) - goto out; + goto out_put_upper; } if (d.redirect) { Patches currently in stable-queue which might be from vgoyal(a)redhat.com are queue-4.14/ovl-put-upperdentry-if-ovl_check_origin-fails.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm zoned: ignore last smaller runt zone" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm zoned: ignore last smaller runt zone to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-zoned-ignore-last-smaller-runt-zone.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 114e025968b5990ad0b57bf60697ea64ee206aac Mon Sep 17 00:00:00 2001 From: Damien Le Moal <damien.lemoal(a)wdc.com> Date: Sat, 28 Oct 2017 16:39:34 +0900 Subject: dm zoned: ignore last smaller runt zone From: Damien Le Moal <damien.lemoal(a)wdc.com> commit 114e025968b5990ad0b57bf60697ea64ee206aac upstream. The SCSI layer allows ZBC drives to have a smaller last runt zone. For such a device, specifying the entire capacity for a dm-zoned target table entry fails because the specified capacity is not aligned on a device zone size indicated in the request queue structure of the device. Fix this problem by ignoring the last runt zone in the entry length when seting up the dm-zoned target (ctr method) and when iterating table entries of the target (iterate_devices method). This allows dm-zoned users to still easily setup a target using the entire device capacity (as mandated by dm-zoned) or the aligned capacity excluding the last runt zone. While at it, replace direct references to the device queue chunk_sectors limit with calls to the accessor blk_queue_zone_sectors(). Reported-by: Peter Desnoyers <pjd(a)ccs.neu.edu> Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-zoned-target.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) --- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -660,6 +660,7 @@ static int dmz_get_zoned_device(struct d struct dmz_target *dmz = ti->private; struct request_queue *q; struct dmz_dev *dev; + sector_t aligned_capacity; int ret; /* Get the target device */ @@ -685,15 +686,17 @@ static int dmz_get_zoned_device(struct d goto err; } + q = bdev_get_queue(dev->bdev); dev->capacity = i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT; - if (ti->begin || (ti->len != dev->capacity)) { + aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1); + if (ti->begin || + ((ti->len != dev->capacity) && (ti->len != aligned_capacity))) { ti->error = "Partial mapping not supported"; ret = -EINVAL; goto err; } - q = bdev_get_queue(dev->bdev); - dev->zone_nr_sectors = q->limits.chunk_sectors; + dev->zone_nr_sectors = blk_queue_zone_sectors(q); dev->zone_nr_sectors_shift = ilog2(dev->zone_nr_sectors); dev->zone_nr_blocks = dmz_sect2blk(dev->zone_nr_sectors); @@ -929,8 +932,10 @@ static int dmz_iterate_devices(struct dm iterate_devices_callout_fn fn, void *data) { struct dmz_target *dmz = ti->private; + struct dmz_dev *dev = dmz->dev; + sector_t capacity = dev->capacity & ~(dev->zone_nr_sectors - 1); - return fn(ti, dmz->ddev, 0, dmz->dev->capacity, data); + return fn(ti, dmz->ddev, 0, capacity, data); } static struct target_type dmz_type = { Patches currently in stable-queue which might be from damien.lemoal(a)wdc.com are queue-4.14/dm-zoned-ignore-last-smaller-runt-zone.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm mpath: remove annoying message of 'blk_get_request() returned -11'" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm mpath: remove annoying message of 'blk_get_request() returned -11' to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-mpath-remove-annoying-message-of-blk_get_request-returned-11.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 9dc112e2daf87b40607fd8d357e2d7de32290d45 Mon Sep 17 00:00:00 2001 From: Ming Lei <ming.lei(a)redhat.com> Date: Sat, 30 Sep 2017 19:46:48 +0800 Subject: dm mpath: remove annoying message of 'blk_get_request() returned -11' From: Ming Lei <ming.lei(a)redhat.com> commit 9dc112e2daf87b40607fd8d357e2d7de32290d45 upstream. It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit 7083abbbf. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Signed-off-by: Ming Lei <ming.lei(a)redhat.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-mpath.c | 2 -- 1 file changed, 2 deletions(-) --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -499,8 +499,6 @@ static int multipath_clone_and_map(struc if (IS_ERR(clone)) { /* EBUSY, ENODEV or EWOULDBLOCK: requeue */ bool queue_dying = blk_queue_dying(q); - DMERR_LIMIT("blk_get_request() returned %ld%s - requeuing", - PTR_ERR(clone), queue_dying ? " (path offline)" : ""); if (queue_dying) { atomic_inc(&m->pg_init_in_progress); activate_or_offline_path(pgpath); Patches currently in stable-queue which might be from ming.lei(a)redhat.com are queue-4.14/dm-mpath-remove-annoying-message-of-blk_get_request-returned-11.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm integrity: allow unaligned bv_offset" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm integrity: allow unaligned bv_offset to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-integrity-allow-unaligned-bv_offset.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 95b1369a9638cfa322ad1c0cde8efbe524059884 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka(a)redhat.com> Date: Tue, 7 Nov 2017 10:40:40 -0500 Subject: dm integrity: allow unaligned bv_offset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Mikulas Patocka <mpatocka(a)redhat.com> commit 95b1369a9638cfa322ad1c0cde8efbe524059884 upstream. When slub_debug is enabled kmalloc returns unaligned memory. XFS uses this unaligned memory for its buffers (if an unaligned buffer crosses a page, XFS frees it and allocates a full page instead - see the function xfs_buf_allocate_memory). dm-integrity checks if bv_offset is aligned on page size and this check fail with slub_debug and XFS. Fix this bug by removing the bv_offset check, leaving only the check for bv_len. Fixes: 7eada909bfd7 ("dm: add integrity target") Reported-by: Bruno Prémont <bonbons(a)sysophe.eu> Reviewed-by: Milan Broz <gmazyland(a)gmail.com> Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-integrity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -1376,7 +1376,7 @@ static int dm_integrity_map(struct dm_ta struct bvec_iter iter; struct bio_vec bv; bio_for_each_segment(bv, bio, iter) { - if (unlikely((bv.bv_offset | bv.bv_len) & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) { + if (unlikely(bv.bv_len & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) { DMERR("Bio vector (%u,%u) is not aligned on %u-sector boundary", bv.bv_offset, bv.bv_len, ic->sectors_per_block); return DM_MAPIO_KILL; Patches currently in stable-queue which might be from mpatocka(a)redhat.com are queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm crypt: allow unaligned bv_offset" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm crypt: allow unaligned bv_offset to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-crypt-allow-unaligned-bv_offset.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 0440d5c0ca9744b92a07aeb6df0a9a75db6f4280 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka(a)redhat.com> Date: Tue, 7 Nov 2017 10:35:57 -0500 Subject: dm crypt: allow unaligned bv_offset MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Mikulas Patocka <mpatocka(a)redhat.com> commit 0440d5c0ca9744b92a07aeb6df0a9a75db6f4280 upstream. When slub_debug is enabled kmalloc returns unaligned memory. XFS uses this unaligned memory for its buffers (if an unaligned buffer crosses a page, XFS frees it and allocates a full page instead - see the function xfs_buf_allocate_memory). dm-crypt checks if bv_offset is aligned on page size and these checks fail with slub_debug and XFS. Fix this bug by removing the bv_offset checks. Switch to checking if bv_len is aligned instead of bv_offset (this check should be sufficient to prevent overruns if a bio with too small bv_len is received). Fixes: 8f0009a22517 ("dm crypt: optionally support larger encryption sector size") Reported-by: Bruno Prémont <bonbons(a)sysophe.eu> Tested-by: Bruno Prémont <bonbons(a)sysophe.eu> Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com> Reviewed-by: Milan Broz <gmazyland(a)gmail.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-crypt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1075,7 +1075,7 @@ static int crypt_convert_block_aead(stru BUG_ON(cc->integrity_iv_size && cc->integrity_iv_size != cc->iv_size); /* Reject unexpected unaligned bio. */ - if (unlikely(bv_in.bv_offset & (cc->sector_size - 1))) + if (unlikely(bv_in.bv_len & (cc->sector_size - 1))) return -EIO; dmreq = dmreq_of_req(cc, req); @@ -1168,7 +1168,7 @@ static int crypt_convert_block_skcipher( int r = 0; /* Reject unexpected unaligned bio. */ - if (unlikely(bv_in.bv_offset & (cc->sector_size - 1))) + if (unlikely(bv_in.bv_len & (cc->sector_size - 1))) return -EIO; dmreq = dmreq_of_req(cc, req); Patches currently in stable-queue which might be from mpatocka(a)redhat.com are queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm cache: fix race condition in the writeback mode overwrite_bio optimisation" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm cache: fix race condition in the writeback mode overwrite_bio optimisation to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-cache-fix-race-condition-in-the-writeback-mode-overwrite_bio-optimisation.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d1260e2a3f85f4c1010510a15f89597001318b1b Mon Sep 17 00:00:00 2001 From: Joe Thornber <ejt(a)redhat.com> Date: Fri, 10 Nov 2017 07:53:31 -0500 Subject: dm cache: fix race condition in the writeback mode overwrite_bio optimisation From: Joe Thornber <ejt(a)redhat.com> commit d1260e2a3f85f4c1010510a15f89597001318b1b upstream. When a DM cache in writeback mode moves data between the slow and fast device it can often avoid a copy if the triggering bio either: i) covers the whole block (no point copying if we're about to overwrite it) ii) the migration is a promotion and the origin block is currently discarded Prior to this fix there was a race with case (ii). The discard status was checked with a shared lock held (rather than exclusive). This meant another bio could run in parallel and write data to the origin, removing the discard state. After the promotion the parallel write would have been lost. With this fix the discard status is re-checked once the exclusive lock has been aquired. If the block is no longer discarded it falls back to the slower full copy path. Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Joe Thornber <ejt(a)redhat.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-cache-target.c | 86 ++++++++++++++++++++++++++----------------- 1 file changed, 53 insertions(+), 33 deletions(-) --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -1201,6 +1201,18 @@ static void background_work_end(struct c /*----------------------------------------------------------------*/ +static bool bio_writes_complete_block(struct cache *cache, struct bio *bio) +{ + return (bio_data_dir(bio) == WRITE) && + (bio->bi_iter.bi_size == (cache->sectors_per_block << SECTOR_SHIFT)); +} + +static bool optimisable_bio(struct cache *cache, struct bio *bio, dm_oblock_t block) +{ + return writeback_mode(&cache->features) && + (is_discarded_oblock(cache, block) || bio_writes_complete_block(cache, bio)); +} + static void quiesce(struct dm_cache_migration *mg, void (*continuation)(struct work_struct *)) { @@ -1474,13 +1486,51 @@ static void mg_upgrade_lock(struct work_ } } +static void mg_full_copy(struct work_struct *ws) +{ + struct dm_cache_migration *mg = ws_to_mg(ws); + struct cache *cache = mg->cache; + struct policy_work *op = mg->op; + bool is_policy_promote = (op->op == POLICY_PROMOTE); + + if ((!is_policy_promote && !is_dirty(cache, op->cblock)) || + is_discarded_oblock(cache, op->oblock)) { + mg_upgrade_lock(ws); + return; + } + + init_continuation(&mg->k, mg_upgrade_lock); + + if (copy(mg, is_policy_promote)) { + DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache)); + mg->k.input = BLK_STS_IOERR; + mg_complete(mg, false); + } +} + static void mg_copy(struct work_struct *ws) { - int r; struct dm_cache_migration *mg = ws_to_mg(ws); if (mg->overwrite_bio) { /* + * No exclusive lock was held when we last checked if the bio + * was optimisable. So we have to check again in case things + * have changed (eg, the block may no longer be discarded). + */ + if (!optimisable_bio(mg->cache, mg->overwrite_bio, mg->op->oblock)) { + /* + * Fallback to a real full copy after doing some tidying up. + */ + bool rb = bio_detain_shared(mg->cache, mg->op->oblock, mg->overwrite_bio); + BUG_ON(rb); /* An exclussive lock must _not_ be held for this block */ + mg->overwrite_bio = NULL; + inc_io_migrations(mg->cache); + mg_full_copy(ws); + return; + } + + /* * It's safe to do this here, even though it's new data * because all IO has been locked out of the block. * @@ -1489,26 +1539,8 @@ static void mg_copy(struct work_struct * */ overwrite(mg, mg_update_metadata_after_copy); - } else { - struct cache *cache = mg->cache; - struct policy_work *op = mg->op; - bool is_policy_promote = (op->op == POLICY_PROMOTE); - - if ((!is_policy_promote && !is_dirty(cache, op->cblock)) || - is_discarded_oblock(cache, op->oblock)) { - mg_upgrade_lock(ws); - return; - } - - init_continuation(&mg->k, mg_upgrade_lock); - - r = copy(mg, is_policy_promote); - if (r) { - DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache)); - mg->k.input = BLK_STS_IOERR; - mg_complete(mg, false); - } - } + } else + mg_full_copy(ws); } static int mg_lock_writes(struct dm_cache_migration *mg) @@ -1748,18 +1780,6 @@ static void inc_miss_counter(struct cach /*----------------------------------------------------------------*/ -static bool bio_writes_complete_block(struct cache *cache, struct bio *bio) -{ - return (bio_data_dir(bio) == WRITE) && - (bio->bi_iter.bi_size == (cache->sectors_per_block << SECTOR_SHIFT)); -} - -static bool optimisable_bio(struct cache *cache, struct bio *bio, dm_oblock_t block) -{ - return writeback_mode(&cache->features) && - (is_discarded_oblock(cache, block) || bio_writes_complete_block(cache, bio)); -} - static int map_bio(struct cache *cache, struct bio *bio, dm_oblock_t block, bool *commit_needed) { Patches currently in stable-queue which might be from ejt(a)redhat.com are queue-4.14/dm-cache-fix-race-condition-in-the-writeback-mode-overwrite_bio-optimisation.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm bufio: fix integer overflow when limiting maximum cache size" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm bufio: fix integer overflow when limiting maximum cache size to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Wed, 15 Nov 2017 16:38:09 -0800 Subject: dm bufio: fix integer overflow when limiting maximum cache size From: Eric Biggers <ebiggers(a)google.com> commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream. The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-bufio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -974,7 +974,8 @@ static void __get_memory_limit(struct dm buffers = c->minimum_buffers; *limit_buffers = buffers; - *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100; + *threshold_buffers = mult_frac(buffers, + DM_BUFIO_WRITEBACK_PERCENT, 100); } /* @@ -1910,19 +1911,15 @@ static int __init dm_bufio_init(void) memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches); memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names); - mem = (__u64)((totalram_pages - totalhigh_pages) * - DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT; + mem = (__u64)mult_frac(totalram_pages - totalhigh_pages, + DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT; if (mem > ULONG_MAX) mem = ULONG_MAX; #ifdef CONFIG_MMU - /* - * Get the size of vmalloc space the same way as VMALLOC_TOTAL - * in fs/proc/internal.h - */ - if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100) - mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100; + if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100)) + mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100); #endif dm_bufio_default_cache_size = mem; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.14/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch queue-4.14/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm: allocate struct mapped_device with kvzalloc" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm: allocate struct mapped_device with kvzalloc to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-allocate-struct-mapped_device-with-kvzalloc.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 856eb0916d181da6d043cc33e03f54d5c5bbe54a Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka(a)redhat.com> Date: Tue, 31 Oct 2017 19:33:02 -0400 Subject: dm: allocate struct mapped_device with kvzalloc From: Mikulas Patocka <mpatocka(a)redhat.com> commit 856eb0916d181da6d043cc33e03f54d5c5bbe54a upstream. The structure srcu_struct can be very big, its size is proportional to the value CONFIG_NR_CPUS. The Fedora kernel has CONFIG_NR_CPUS 8192, the field io_barrier in the struct mapped_device has 84kB in the debugging kernel and 50kB in the non-debugging kernel. The large size may result in failure of the function kzalloc_node. In order to avoid the allocation failure, we use the function kvzalloc_node, this function falls back to vmalloc if a large contiguous chunk of memory is not available. This patch also moves the field io_barrier to the last position of struct mapped_device - the reason is that on many processor architectures, short memory offsets result in smaller code than long memory offsets - on x86-64 it reduces code size by 320 bytes. Note to stable kernel maintainers - the kernels 4.11 and older don't have the function kvzalloc_node, you can use the function vzalloc_node instead. Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-core.h | 3 ++- drivers/md/dm.c | 6 +++--- 2 files changed, 5 insertions(+), 4 deletions(-) --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -29,7 +29,6 @@ struct dm_kobject_holder { * DM targets must _not_ deference a mapped_device to directly access its members! */ struct mapped_device { - struct srcu_struct io_barrier; struct mutex suspend_lock; /* @@ -127,6 +126,8 @@ struct mapped_device { struct blk_mq_tag_set *tag_set; bool use_blk_mq:1; bool init_tio_pdu:1; + + struct srcu_struct io_barrier; }; void dm_init_md_queue(struct mapped_device *md); --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1695,7 +1695,7 @@ static struct mapped_device *alloc_dev(i struct mapped_device *md; void *old_md; - md = kzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id); + md = kvzalloc_node(sizeof(*md), GFP_KERNEL, numa_node_id); if (!md) { DMWARN("unable to allocate device, out of memory."); return NULL; @@ -1795,7 +1795,7 @@ bad_io_barrier: bad_minor: module_put(THIS_MODULE); bad_module_get: - kfree(md); + kvfree(md); return NULL; } @@ -1814,7 +1814,7 @@ static void free_dev(struct mapped_devic free_minor(minor); module_put(THIS_MODULE); - kfree(md); + kvfree(md); } static void __bind_mempools(struct mapped_device *md, struct dm_table *t) Patches currently in stable-queue which might be from mpatocka(a)redhat.com are queue-4.14/dm-allocate-struct-mapped_device-with-kvzalloc.patch queue-4.14/dm-integrity-allow-unaligned-bv_offset.patch queue-4.14/dm-crypt-allow-unaligned-bv_offset.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "dm bufio: fix integer overflow when limiting maximum cache size" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled dm bufio: fix integer overflow when limiting maximum cache size to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 74d4108d9e681dbbe4a2940ed8fdff1f6868184c Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Wed, 15 Nov 2017 16:38:09 -0800 Subject: dm bufio: fix integer overflow when limiting maximum cache size From: Eric Biggers <ebiggers(a)google.com> commit 74d4108d9e681dbbe4a2940ed8fdff1f6868184c upstream. The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: 9993bc635 ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: 95d402f057f2 ("dm: add bufio") Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Mike Snitzer <snitzer(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/md/dm-bufio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -876,7 +876,8 @@ static void __get_memory_limit(struct dm buffers = c->minimum_buffers; *limit_buffers = buffers; - *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100; + *threshold_buffers = mult_frac(buffers, + DM_BUFIO_WRITEBACK_PERCENT, 100); } /* @@ -1764,19 +1765,15 @@ static int __init dm_bufio_init(void) memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches); memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names); - mem = (__u64)((totalram_pages - totalhigh_pages) * - DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT; + mem = (__u64)mult_frac(totalram_pages - totalhigh_pages, + DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT; if (mem > ULONG_MAX) mem = ULONG_MAX; #ifdef CONFIG_MMU - /* - * Get the size of vmalloc space the same way as VMALLOC_TOTAL - * in fs/proc/internal.h - */ - if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100) - mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100; + if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100)) + mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100); #endif dm_bufio_default_cache_size = mem; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-3.18/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch queue-3.18/dm-bufio-fix-integer-overflow-when-limiting-maximum-cache-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7f342678634f16795892677204366e835e450dda Mon Sep 17 00:00:00 2001 From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> Date: Tue, 17 Oct 2017 05:47:38 -0700 Subject: PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> commit 7f342678634f16795892677204366e835e450dda upstream. The Cavium ThunderX (CN8XXX) family of PCIe Root Ports does not advertise an ACS capability. However, the RTL internally implements similar protection as if ACS had Request Redirection, Completion Redirection, Source Validation, and Upstream Forwarding features enabled. Change Cavium ACS capabilities quirk flags accordingly. Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices") Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> [bhelgaas: tidy changelog, comment, stable tag] Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/quirks.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4088,12 +4088,14 @@ static int pci_quirk_amd_sb_acs(struct p static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags) { /* - * Cavium devices matching this quirk do not perform peer-to-peer - * with other functions, allowing masking out these bits as if they - * were unimplemented in the ACS capability. + * Cavium root ports don't advertise an ACS capability. However, + * the RTL internally implements similar protection as if ACS had + * Request Redirection, Completion Redirection, Source Validation, + * and Upstream Forwarding features enabled. Assert that the + * hardware implements and enables equivalent ACS functionality for + * these flags. */ - acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | - PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT); + acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF); return acs_flags ? 0 : 1; } Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are queue-4.9/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ALSA: hda: Add Raven PCI ID" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ALSA: hda: Add Raven PCI ID to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: alsa-hda-add-raven-pci-id.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001 From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Date: Thu, 23 Nov 2017 20:07:00 +0530 Subject: ALSA: hda: Add Raven PCI ID From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream. This commit adds PCI ID for Raven platform Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- sound/pci/hda/hda_intel.c | 3 +++ 1 file changed, 3 insertions(+) --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2305,6 +2305,9 @@ static const struct pci_device_id azx_id /* AMD Hudson */ { PCI_DEVICE(0x1022, 0x780d), .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, + /* AMD Raven */ + { PCI_DEVICE(0x1022, 0x15e3), + .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, /* ATI HDMI */ { PCI_DEVICE(0x1002, 0x0002), .driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS }, Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are queue-4.9/alsa-hda-add-raven-pci-id.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ALSA: hda: Add Raven PCI ID" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ALSA: hda: Add Raven PCI ID to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: alsa-hda-add-raven-pci-id.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001 From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Date: Thu, 23 Nov 2017 20:07:00 +0530 Subject: ALSA: hda: Add Raven PCI ID From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream. This commit adds PCI ID for Raven platform Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- sound/pci/hda/hda_intel.c | 3 +++ 1 file changed, 3 insertions(+) --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2316,6 +2316,9 @@ static const struct pci_device_id azx_id /* AMD Hudson */ { PCI_DEVICE(0x1022, 0x780d), .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, + /* AMD Raven */ + { PCI_DEVICE(0x1022, 0x15e3), + .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, /* ATI HDMI */ { PCI_DEVICE(0x1002, 0x0002), .driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS }, Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are queue-4.4/alsa-hda-add-raven-pci-id.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PM / OPP: Add missing of_node_put(np)" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PM / OPP: Add missing of_node_put(np) to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pm-opp-add-missing-of_node_put-np.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001 From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> Date: Wed, 4 Oct 2017 11:35:03 +0530 Subject: PM / OPP: Add missing of_node_put(np) From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> commit 7978db344719dab1e56d05e6fc04aaaddcde0a5e upstream. The for_each_available_child_of_node() loop in _of_add_opp_table_v2() doesn't drop the reference to "np" on errors. Fix that. Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings) Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> [ VK: Improved commit log. ] Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org> Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/base/power/opp/of.c | 1 + 1 file changed, 1 insertion(+) --- a/drivers/base/power/opp/of.c +++ b/drivers/base/power/opp/of.c @@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct d dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, ret); _dev_pm_opp_remove_table(opp_table, dev, false); + of_node_put(np); goto put_opp_table; } } Patches currently in stable-queue which might be from Tobias.Jordan(a)elektrobit.com are queue-4.14/pm-opp-add-missing-of_node_put-np.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7f342678634f16795892677204366e835e450dda Mon Sep 17 00:00:00 2001 From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> Date: Tue, 17 Oct 2017 05:47:38 -0700 Subject: PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> commit 7f342678634f16795892677204366e835e450dda upstream. The Cavium ThunderX (CN8XXX) family of PCIe Root Ports does not advertise an ACS capability. However, the RTL internally implements similar protection as if ACS had Request Redirection, Completion Redirection, Source Validation, and Upstream Forwarding features enabled. Change Cavium ACS capabilities quirk flags accordingly. Fixes: b404bcfbf035 ("PCI: Add ACS quirk for all Cavium devices") Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> [bhelgaas: tidy changelog, comment, stable tag] Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/quirks.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4215,12 +4215,14 @@ static int pci_quirk_amd_sb_acs(struct p static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags) { /* - * Cavium devices matching this quirk do not perform peer-to-peer - * with other functions, allowing masking out these bits as if they - * were unimplemented in the ACS capability. + * Cavium root ports don't advertise an ACS capability. However, + * the RTL internally implements similar protection as if ACS had + * Request Redirection, Completion Redirection, Source Validation, + * and Upstream Forwarding features enabled. Assert that the + * hardware implements and enables equivalent ACS functionality for + * these flags. */ - acs_flags &= ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | - PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_DT); + acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF); if (!((dev->device >= 0xa000) && (dev->device <= 0xa0ff))) return -ENOTTY; Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI: hv: Use effective affinity mask" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI: hv: Use effective affinity mask to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-hv-use-effective-affinity-mask.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 79aa801e899417a56863d6713f76c4e108856000 Mon Sep 17 00:00:00 2001 From: Dexuan Cui <decui(a)microsoft.com> Date: Wed, 1 Nov 2017 20:30:53 +0000 Subject: PCI: hv: Use effective affinity mask From: Dexuan Cui <decui(a)microsoft.com> commit 79aa801e899417a56863d6713f76c4e108856000 upstream. The effective_affinity_mask is always set when an interrupt is assigned in __assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic apic_physflat: -> default_cpu_mask_to_apicid() -> irq_data_update_effective_affinity(), but it looks d->common->affinity remains all-1's before the user space or the kernel changes it later. In the early allocation/initialization phase of an IRQ, we should use the effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to the expected CPU. Without the patch, if we assign 7 Mellanox ConnectX-3 VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts. Tested-by: Adrian Suhov <v-adsuho(a)microsoft.com> Signed-off-by: Dexuan Cui <decui(a)microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Reviewed-by: Jake Oshins <jakeo(a)microsoft.com> Cc: Jork Loeser <jloeser(a)microsoft.com> Cc: Stephen Hemminger <sthemmin(a)microsoft.com> Cc: K. Y. Srinivasan <kys(a)microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/host/pci-hyperv.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) --- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -879,7 +879,7 @@ static void hv_irq_unmask(struct irq_dat int cpu; u64 res; - dest = irq_data_get_affinity_mask(data); + dest = irq_data_get_effective_affinity_mask(data); pdev = msi_desc_to_pci_dev(msi_desc); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); @@ -1042,6 +1042,7 @@ static void hv_compose_msi_msg(struct ir struct hv_pci_dev *hpdev; struct pci_bus *pbus; struct pci_dev *pdev; + struct cpumask *dest; struct compose_comp_ctxt comp; struct tran_int_desc *int_desc; struct { @@ -1056,6 +1057,7 @@ static void hv_compose_msi_msg(struct ir int ret; pdev = msi_desc_to_pci_dev(irq_data_get_msi_desc(data)); + dest = irq_data_get_effective_affinity_mask(data); pbus = pdev->bus; hbus = container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); hpdev = get_pcichild_wslot(hbus, devfn_to_wslot(pdev->devfn)); @@ -1081,14 +1083,14 @@ static void hv_compose_msi_msg(struct ir switch (pci_protocol_version) { case PCI_PROTOCOL_VERSION_1_1: size = hv_compose_msi_req_v1(&ctxt.int_pkts.v1, - irq_data_get_affinity_mask(data), + dest, hpdev->desc.win_slot.slot, cfg->vector); break; case PCI_PROTOCOL_VERSION_1_2: size = hv_compose_msi_req_v2(&ctxt.int_pkts.v2, - irq_data_get_affinity_mask(data), + dest, hpdev->desc.win_slot.slot, cfg->vector); break; Patches currently in stable-queue which might be from decui(a)microsoft.com are queue-4.14/pci-hv-use-effective-affinity-mask.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From c00054f540bf81e592e1fee709b0bdbf20f478b5 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas <bhelgaas(a)google.com> Date: Mon, 13 Nov 2017 15:05:50 -0600 Subject: PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD From: Bjorn Helgaas <bhelgaas(a)google.com> commit c00054f540bf81e592e1fee709b0bdbf20f478b5 upstream. Previously we programmed the LTR_L1.2_THRESHOLD in the parent (upstream) device using the capability pointer of the *child* (downstream) device, which corrupted some random word of the parent's config space. Use the parent's L1 SS capability pointer to program its LTR_L1.2_THRESHOLD. Fixes: aeda9adebab8 ("PCI/ASPM: Configure L1 substate settings") Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Reviewed-by: Vidya Sagar <vidyas(a)nvidia.com> CC: Rajat Jain <rajatja(a)google.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/pcie/aspm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -658,7 +658,7 @@ static void pcie_config_aspm_l1ss(struct 0xFF00, link->l1ss.ctl1); /* Program LTR L1.2 threshold in both ports */ - pci_clear_and_set_dword(parent, dw_cap_ptr + PCI_L1SS_CTL1, + pci_clear_and_set_dword(parent, up_cap_ptr + PCI_L1SS_CTL1, 0xE3FF0000, link->l1ss.ctl1); pci_clear_and_set_dword(child, dw_cap_ptr + PCI_L1SS_CTL1, 0xE3FF0000, link->l1ss.ctl1); Patches currently in stable-queue which might be from bhelgaas(a)google.com are queue-4.14/pci-hv-use-effective-affinity-mask.patch queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch queue-4.14/pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch queue-4.14/pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 94ac327e043ee40d7fc57b54541da50507ef4e99 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas <bhelgaas(a)google.com> Date: Mon, 13 Nov 2017 08:50:30 -0600 Subject: PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time From: Bjorn Helgaas <bhelgaas(a)google.com> commit 94ac327e043ee40d7fc57b54541da50507ef4e99 upstream. Every Port that supports the L1.2 substate advertises its Port Common_Mode_Restore_Time, i.e., the time the Port requires to re-establish common mode when exiting L1.2 (see PCIe r3.1, sec 7.33.2). Per sec 5.5.3.3.1, when exiting L1.2, the Downstream Port (the device at the upstream end of the link) must send TS1 training sequences for at least T(COMMONMODE) after it detects electrical idle exit on the Link. We want this to be long enough for both ends of the Link, so we should set it to the maximum of the Port Common_Mode_Restore_Time for the upstream and downstream components on the Link. Previously we only looked at the Port Common_Mode_Restore_Time of the upstream device, so if the downstream device required more time, we didn't program the upstream device's T(COMMONMODE) correctly. Fixes: f1f0366dd6be ("PCI/ASPM: Calculate and save the L1.2 timing parameters") Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Reviewed-by: Vidya Sagar <vidyas(a)nvidia.com> Acked-by: Rajat Jain <rajatja(a)google.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/pcie/aspm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -453,7 +453,7 @@ static void aspm_calc_l1ss_info(struct p /* Choose the greater of the two T_cmn_mode_rstr_time */ val1 = (upreg->l1ss_cap >> 8) & 0xFF; - val2 = (upreg->l1ss_cap >> 8) & 0xFF; + val2 = (dwreg->l1ss_cap >> 8) & 0xFF; if (val1 > val2) link->l1ss.ctl1 |= val1 << 8; else Patches currently in stable-queue which might be from bhelgaas(a)google.com are queue-4.14/pci-hv-use-effective-affinity-mask.patch queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch queue-4.14/pci-aspm-account-for-downstream-device-s-port-common_mode_restore_time.patch queue-4.14/pci-aspm-use-correct-capability-pointer-to-program-ltr_l1.2_threshold.patch queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "PCI: Apply Cavium ThunderX ACS quirk to more Root Ports" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled PCI: Apply Cavium ThunderX ACS quirk to more Root Ports to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From f2ddaf8dfd4a5071ad09074d2f95ab85d35c8a1e Mon Sep 17 00:00:00 2001 From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> Date: Tue, 17 Oct 2017 05:47:39 -0700 Subject: PCI: Apply Cavium ThunderX ACS quirk to more Root Ports From: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> commit f2ddaf8dfd4a5071ad09074d2f95ab85d35c8a1e upstream. Extend the Cavium ThunderX ACS quirk to cover more device IDs and restrict it to only Root Ports. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev(a)cavium.com> [bhelgaas: changelog, stable tag] Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/pci/quirks.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4212,6 +4212,19 @@ static int pci_quirk_amd_sb_acs(struct p #endif } +static bool pci_quirk_cavium_acs_match(struct pci_dev *dev) +{ + /* + * Effectively selects all downstream ports for whole ThunderX 1 + * family by 0xf800 mask (which represents 8 SoCs), while the lower + * bits of device ID are used to indicate which subdevice is used + * within the SoC. + */ + return (pci_is_pcie(dev) && + (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) && + ((dev->device & 0xf800) == 0xa000)); +} + static int pci_quirk_cavium_acs(struct pci_dev *dev, u16 acs_flags) { /* @@ -4224,7 +4237,7 @@ static int pci_quirk_cavium_acs(struct p */ acs_flags &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_SV | PCI_ACS_UF); - if (!((dev->device >= 0xa000) && (dev->device <= 0xa0ff))) + if (!pci_quirk_cavium_acs_match(dev)) return -ENOTTY; return acs_flags ? 0 : 1; Patches currently in stable-queue which might be from Vadim.Lomovtsev(a)cavium.com are queue-4.14/pci-set-cavium-acs-capability-quirk-flags-to-assert-rr-cr-sv-uf.patch queue-4.14/pci-apply-cavium-thunderx-acs-quirk-to-more-root-ports.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "net: mvneta: fix handling of the Tx descriptor counter" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled net: mvneta: fix handling of the Tx descriptor counter to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: net-mvneta-fix-handling-of-the-tx-descriptor-counter.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 0d63785c6b94b5d2f095f90755825f90eea791f5 Mon Sep 17 00:00:00 2001 From: Simon Guinot <simon.guinot(a)sequanux.org> Date: Mon, 13 Nov 2017 16:27:02 +0100 Subject: net: mvneta: fix handling of the Tx descriptor counter From: Simon Guinot <simon.guinot(a)sequanux.org> commit 0d63785c6b94b5d2f095f90755825f90eea791f5 upstream. The mvneta controller provides a 8-bit register to update the pending Tx descriptor counter. Then, a maximum of 255 Tx descriptors can be added at once. In the current code the mvneta_txq_pend_desc_add function assumes the caller takes care of this limit. But it is not the case. In some situations (xmit_more flag), more than 255 descriptors are added. When this happens, the Tx descriptor counter register is updated with a wrong value, which breaks the whole Tx queue management. This patch fixes the issue by allowing the mvneta_txq_pend_desc_add function to process more than 255 Tx descriptors. Fixes: 2a90f7e1d5d0 ("net: mvneta: add xmit_more support") Signed-off-by: Simon Guinot <simon.guinot(a)sequanux.org> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/net/ethernet/marvell/mvneta.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -816,11 +816,14 @@ static void mvneta_txq_pend_desc_add(str { u32 val; - /* Only 255 descriptors can be added at once ; Assume caller - * process TX desriptors in quanta less than 256 - */ - val = pend_desc + txq->pending; - mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val); + pend_desc += txq->pending; + + /* Only 255 Tx descriptors can be added at once */ + do { + val = min(pend_desc, 255); + mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val); + pend_desc -= val; + } while (pend_desc > 0); txq->pending = 0; } Patches currently in stable-queue which might be from simon.guinot(a)sequanux.org are queue-4.14/net-mvneta-fix-handling-of-the-tx-descriptor-counter.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "nbd: wait uninterruptible for the dead timeout" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled nbd: wait uninterruptible for the dead timeout to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: nbd-wait-uninterruptible-for-the-dead-timeout.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ff57dc94faec023abc267cdc45766fccff497557 Mon Sep 17 00:00:00 2001 From: Josef Bacik <jbacik(a)fb.com> Date: Mon, 6 Nov 2017 16:11:57 -0500 Subject: nbd: wait uninterruptible for the dead timeout From: Josef Bacik <jbacik(a)fb.com> commit ff57dc94faec023abc267cdc45766fccff497557 upstream. If we have a pending signal or the user kills their application then it'll bring down the whole device, which is less than awesome. Instead wait uninterruptible for the dead timeout so we're sure we gave it our best shot. Fixes: 560bc4b39952 ("nbd: handle dead connections") Signed-off-by: Josef Bacik <jbacik(a)fb.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/nbd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -723,9 +723,9 @@ static int wait_for_reconnect(struct nbd return 0; if (test_bit(NBD_DISCONNECTED, &config->runtime_flags)) return 0; - wait_event_interruptible_timeout(config->conn_wait, - atomic_read(&config->live_connections), - config->dead_conn_timeout); + wait_event_timeout(config->conn_wait, + atomic_read(&config->live_connections), + config->dead_conn_timeout); return atomic_read(&config->live_connections); } Patches currently in stable-queue which might be from jbacik(a)fb.com are queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "nbd: don't start req until after the dead connection logic" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled nbd: don't start req until after the dead connection logic to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: nbd-don-t-start-req-until-after-the-dead-connection-logic.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 Mon Sep 17 00:00:00 2001 From: Josef Bacik <jbacik(a)fb.com> Date: Mon, 6 Nov 2017 16:11:58 -0500 Subject: nbd: don't start req until after the dead connection logic From: Josef Bacik <jbacik(a)fb.com> commit 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 upstream. We can end up sleeping for a while waiting for the dead timeout, which means we could get the per request timer to fire. We did handle this case, but if the dead timeout happened right after we submitted we'd either tear down the connection or possibly requeue as we're handling an error and race with the endio which can lead to panics and other hilarity. Fixes: 560bc4b39952 ("nbd: handle dead connections") Signed-off-by: Josef Bacik <jbacik(a)fb.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/nbd.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-) --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -288,15 +288,6 @@ static enum blk_eh_timer_return nbd_xmit cmd->status = BLK_STS_TIMEOUT; return BLK_EH_HANDLED; } - - /* If we are waiting on our dead timer then we could get timeout - * callbacks for our request. For this we just want to reset the timer - * and let the queue side take care of everything. - */ - if (!completion_done(&cmd->send_complete)) { - nbd_config_put(nbd); - return BLK_EH_RESET_TIMER; - } config = nbd->config; if (config->num_connections > 1) { @@ -740,6 +731,7 @@ static int nbd_handle_cmd(struct nbd_cmd if (!refcount_inc_not_zero(&nbd->config_refs)) { dev_err_ratelimited(disk_to_dev(nbd->disk), "Socks array is empty\n"); + blk_mq_start_request(req); return -EINVAL; } config = nbd->config; @@ -748,6 +740,7 @@ static int nbd_handle_cmd(struct nbd_cmd dev_err_ratelimited(disk_to_dev(nbd->disk), "Attempted send on invalid socket\n"); nbd_config_put(nbd); + blk_mq_start_request(req); return -EINVAL; } cmd->status = BLK_STS_OK; @@ -771,6 +764,7 @@ again: */ sock_shutdown(nbd); nbd_config_put(nbd); + blk_mq_start_request(req); return -EIO; } goto again; @@ -781,6 +775,7 @@ again: * here so that it gets put _after_ the request that is already on the * dispatch list. */ + blk_mq_start_request(req); if (unlikely(nsock->pending && nsock->pending != req)) { blk_mq_requeue_request(req, true); ret = 0; @@ -793,10 +788,10 @@ again: ret = nbd_send_cmd(nbd, cmd, index); if (ret == -EAGAIN) { dev_err_ratelimited(disk_to_dev(nbd->disk), - "Request send failed trying another connection\n"); + "Request send failed, requeueing\n"); nbd_mark_nsock_dead(nbd, nsock, 1); - mutex_unlock(&nsock->tx_lock); - goto again; + blk_mq_requeue_request(req, true); + ret = 0; } out: mutex_unlock(&nsock->tx_lock); @@ -820,7 +815,6 @@ static blk_status_t nbd_queue_rq(struct * done sending everything over the wire. */ init_completion(&cmd->send_complete); - blk_mq_start_request(bd->rq); /* We can be called directly from the user space process, which means we * could possibly have signals pending so our sendmsg will fail. In Patches currently in stable-queue which might be from jbacik(a)fb.com are queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ALSA: hda: Add Raven PCI ID" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ALSA: hda: Add Raven PCI ID to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: alsa-hda-add-raven-pci-id.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001 From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Date: Thu, 23 Nov 2017 20:07:00 +0530 Subject: ALSA: hda: Add Raven PCI ID From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream. This commit adds PCI ID for Raven platform Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- sound/pci/hda/hda_intel.c | 3 +++ 1 file changed, 3 insertions(+) --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2463,6 +2463,9 @@ static const struct pci_device_id azx_id /* AMD Hudson */ { PCI_DEVICE(0x1022, 0x780d), .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, + /* AMD Raven */ + { PCI_DEVICE(0x1022, 0x15e3), + .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, /* ATI HDMI */ { PCI_DEVICE(0x1002, 0x0002), .driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS }, Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are queue-4.14/alsa-hda-add-raven-pci-id.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ALSA: hda: Add Raven PCI ID" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ALSA: hda: Add Raven PCI ID to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: alsa-hda-add-raven-pci-id.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 9ceace3c9c18c67676e75141032a65a8e01f9a7a Mon Sep 17 00:00:00 2001 From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Date: Thu, 23 Nov 2017 20:07:00 +0530 Subject: ALSA: hda: Add Raven PCI ID From: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> commit 9ceace3c9c18c67676e75141032a65a8e01f9a7a upstream. This commit adds PCI ID for Raven platform Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- sound/pci/hda/hda_intel.c | 3 +++ 1 file changed, 3 insertions(+) --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2092,6 +2092,9 @@ static const struct pci_device_id azx_id /* AMD Hudson */ { PCI_DEVICE(0x1022, 0x780d), .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, + /* AMD Raven */ + { PCI_DEVICE(0x1022, 0x15e3), + .driver_data = AZX_DRIVER_GENERIC | AZX_DCAPS_PRESET_ATI_SB }, /* ATI HDMI */ { PCI_DEVICE(0x1002, 0x0002), .driver_data = AZX_DRIVER_ATIHDMI_NS | AZX_DCAPS_PRESET_ATI_HDMI_NS }, Patches currently in stable-queue which might be from Vijendar.Mukunda(a)amd.com are queue-3.18/alsa-hda-add-raven-pci-id.patch

8 years

1
0
0 0

Re: [Linux-stable-mirror] WTF: patch "[PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver" was seriously submitted to be applied to the 4.9-stable tree?

by Greg KH

On Mon, Nov 27, 2017 at 12:40:36PM +0000, James Hogan wrote: > Hi Greg, > > On Mon, Nov 27, 2017 at 01:35:46PM +0100, gregkh(a)linuxfoundation.org wrote: > > The patch below was submitted to be applied to the 4.9-stable tree. > > > > I fail to see how this patch meets the stable kernel rules as found at > > Documentation/process/stable-kernel-rules.rst. > > > > I could be totally wrong, and if so, please respond to > > <stable(a)vger.kernel.org> and let me know why this patch should be > > applied. Otherwise, it is now dropped from my patch queues, never to be > > seen again. > > I should have adjusted the commit message. KERN_WARN doesn't exist so it > actually fixes a build error as well as switching to pr_warn(). What build error? I have not heard of this breaking the build on 4.9 for the past year, is it in some config that no one uses? :) thanks, greg k-h

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] PM / OPP: Add missing of_node_put(np)" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001 From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> Date: Wed, 4 Oct 2017 11:35:03 +0530 Subject: [PATCH] PM / OPP: Add missing of_node_put(np) The for_each_available_child_of_node() loop in _of_add_opp_table_v2() doesn't drop the reference to "np" on errors. Fix that. Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings) Cc: 4.3+ <stable(a)vger.kernel.org> # 4.3+ Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> [ VK: Improved commit log. ] Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org> Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 0b718886479b..87509cb69f79 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct device *dev, struct device_node *opp_np) dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, ret); _dev_pm_opp_remove_table(opp_table, dev, false); + of_node_put(np); goto put_opp_table; } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] PM / OPP: Add missing of_node_put(np)" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 7978db344719dab1e56d05e6fc04aaaddcde0a5e Mon Sep 17 00:00:00 2001 From: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> Date: Wed, 4 Oct 2017 11:35:03 +0530 Subject: [PATCH] PM / OPP: Add missing of_node_put(np) The for_each_available_child_of_node() loop in _of_add_opp_table_v2() doesn't drop the reference to "np" on errors. Fix that. Fixes: 274659029c9d (PM / OPP: Add support to parse "operating-points-v2" bindings) Cc: 4.3+ <stable(a)vger.kernel.org> # 4.3+ Signed-off-by: Tobias Jordan <Tobias.Jordan(a)elektrobit.com> [ VK: Improved commit log. ] Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org> Reviewed-by: Stephen Boyd <sboyd(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 0b718886479b..87509cb69f79 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -397,6 +397,7 @@ static int _of_add_opp_table_v2(struct device *dev, struct device_node *opp_np) dev_err(dev, "%s: Failed to add OPP, %d\n", __func__, ret); _dev_pm_opp_remove_table(opp_table, dev, false); + of_node_put(np); goto put_opp_table; } }

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix typo in mt7628 pinmux function" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix typo in mt7628 pinmux function to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-typo-in-mt7628-pinmux-function.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: MIPS: ralink: Fix typo in mt7628 pinmux function From: Mathias Kresin <dev(a)kresin.me> commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream. There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -141,7 +141,7 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.9/mips-ralink-fix-mt7628-pinmux.patch queue-4.9/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/decoder: Add new TEST instruction pattern" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/decoder: Add new TEST instruction pattern to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-decoder-add-new-test-instruction-pattern.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu <mhiramat(a)kernel.org> Date: Fri, 24 Nov 2017 13:56:30 +0900 Subject: x86/decoder: Add new TEST instruction pattern From: Masami Hiramatsu <mhiramat(a)kernel.org> commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream. The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb Patches currently in stable-queue which might be from mhiramat(a)kernel.org are queue-4.9/x86-decoder-add-new-test-instruction-pattern.patch

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: [PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 3.19+ Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/ralink/mt7620.c b/arch/mips/ralink/mt7620.c index fda073cd7b73..41b71c4352c2 100644 --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -145,7 +145,7 @@ static struct rt2880_pmx_func i2c_grp_mt7628[] = { FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };

8 years

2
1
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix MT7628 pinmux" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix MT7628 pinmux to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-mt7628-pinmux.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:14 +0200 Subject: MIPS: ralink: Fix MT7628 pinmux From: Mathias Kresin <dev(a)kresin.me> commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream. According to the datasheet the REFCLK pin is shared with GPIO#37 and the PERST pin is shared with GPIO#36. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16046/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -141,8 +141,8 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) }; -static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.9/mips-ralink-fix-mt7628-pinmux.patch queue-4.9/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "arm64: Implement arch-specific pte_access_permitted()" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled arm64: Implement arch-specific pte_access_permitted() to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm64-implement-arch-specific-pte_access_permitted.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 6218f96c58dbf44a06aeaf767aab1f54fc397838 Mon Sep 17 00:00:00 2001 From: Catalin Marinas <catalin.marinas(a)arm.com> Date: Thu, 26 Oct 2017 18:36:47 +0100 Subject: arm64: Implement arch-specific pte_access_permitted() From: Catalin Marinas <catalin.marinas(a)arm.com> commit 6218f96c58dbf44a06aeaf767aab1f54fc397838 upstream. The generic pte_access_permitted() implementation only checks for pte_present() (together with the write permission where applicable). However, for both kernel ptes and PROT_NONE mappings pte_present() also returns true on arm64 even though such mappings are not user accessible. Additionally, arm64 now supports execute-only user permission (PROT_EXEC) which is implemented by clearing the PTE_USER bit. With this patch the arm64 implementation of pte_access_permitted() checks for the PTE_VALID and PTE_USER bits together with writable access if applicable. Reported-by: Al Viro <viro(a)zeniv.linux.org.uk> Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com> Signed-off-by: Will Deacon <will.deacon(a)arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -91,6 +91,8 @@ extern unsigned long empty_zero_page[PAG ((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN)) #define pte_valid_young(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF)) +#define pte_valid_user(pte) \ + ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) /* * Could the pte be present in the TLB? We must check mm_tlb_flush_pending @@ -100,6 +102,18 @@ extern unsigned long empty_zero_page[PAG #define pte_accessible(mm, pte) \ (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte)) +/* + * p??_access_permitted() is true for valid user mappings (subject to the + * write permission check) other than user execute-only which do not have the + * PTE_USER bit set. PROT_NONE mappings do not have the PTE_VALID bit set. + */ +#define pte_access_permitted(pte, write) \ + (pte_valid_user(pte) && (!(write) || pte_write(pte))) +#define pmd_access_permitted(pmd, write) \ + (pte_access_permitted(pmd_pte(pmd), (write))) +#define pud_access_permitted(pud, write) \ + (pte_access_permitted(pud_pte(pud), (write))) + static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot) { pte_val(pte) &= ~pgprot_val(prot); Patches currently in stable-queue which might be from catalin.marinas(a)arm.com are queue-4.9/arm64-implement-arch-specific-pte_access_permitted.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8721/1: mm: dump: check hardware RO bit for LPAE" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8721/1: mm: dump: check hardware RO bit for LPAE to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:26 +0100 Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE From: Philip Derrin <philip(a)cog.systems> commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream. When CONFIG_ARM_LPAE is set, the PMD dump relies on the software read-only bit to determine whether a page is writable. This concealed a bug which left the kernel text section writable (AP2=0) while marked read-only in the software bit. In a kernel with the AP2 bug, the dump looks like this: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M ro x SHD 0xc0600000-0xc0800000 2M ro NX SHD 0xc0800000-0xc4800000 64M RW NX SHD The fix is to check that the software and hardware bits are both set before displaying "ro". The dump then shows the true perms: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M RW x SHD 0xc0600000-0xc0800000 2M RW NX SHD 0xc0800000-0xc4800000 64M RW NX SHD Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE") Signed-off-by: Philip Derrin <philip(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/dump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/dump.c +++ b/arch/arm/mm/dump.c @@ -126,8 +126,8 @@ static const struct prot_bits section_bi .val = PMD_SECT_USER, .set = "USR", }, { - .mask = L_PMD_SECT_RDONLY, - .val = L_PMD_SECT_RDONLY, + .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2, + .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2, .set = "ro", .clear = "RW", #elif __LINUX_ARM_ARCH__ >= 6 Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.9/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.9/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:25 +0100 Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE From: Philip Derrin <philip(a)cog.systems> commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream. Currently, for ARM kernels with CONFIG_ARM_LPAE and CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the kernel code and rodata are writable. They are marked read-only in a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit is not set (PMD_SECT_AP2). For user mappings, the logic that propagates the software bit to the hardware bit is in set_pmd_at(); but for the kernel, section_update() writes the PMDs directly, skipping this logic. The fix is to set PMD_SECT_AP2 for read-only sections in section_update(), at the same time as L_PMD_SECT_RDONLY. Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error") Signed-off-by: Philip Derrin <philip(a)cog.systems> Reported-by: Neil Dick <neil(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Tested-by: Laura Abbott <labbott(a)redhat.com> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -619,8 +619,8 @@ static struct section_perm ro_perms[] = .start = (unsigned long)_stext, .end = (unsigned long)__init_begin, #ifdef CONFIG_ARM_LPAE - .mask = ~L_PMD_SECT_RDONLY, - .prot = L_PMD_SECT_RDONLY, + .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2), + .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2, #else .mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE), .prot = PMD_SECT_APX | PMD_SECT_AP_WRITE, Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.9/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.9/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/decoder: Add new TEST instruction pattern" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/decoder: Add new TEST instruction pattern to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-decoder-add-new-test-instruction-pattern.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu <mhiramat(a)kernel.org> Date: Fri, 24 Nov 2017 13:56:30 +0900 Subject: x86/decoder: Add new TEST instruction pattern From: Masami Hiramatsu <mhiramat(a)kernel.org> commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream. The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -833,7 +833,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb Patches currently in stable-queue which might be from mhiramat(a)kernel.org are queue-4.4/x86-decoder-add-new-test-instruction-pattern.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix typo in mt7628 pinmux function" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix typo in mt7628 pinmux function to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-typo-in-mt7628-pinmux-function.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: MIPS: ralink: Fix typo in mt7628 pinmux function From: Mathias Kresin <dev(a)kresin.me> commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream. There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -141,7 +141,7 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.4/mips-ralink-fix-mt7628-pinmux.patch queue-4.4/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix MT7628 pinmux" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix MT7628 pinmux to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-mt7628-pinmux.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:14 +0200 Subject: MIPS: ralink: Fix MT7628 pinmux From: Mathias Kresin <dev(a)kresin.me> commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream. According to the datasheet the REFCLK pin is shared with GPIO#37 and the PERST pin is shared with GPIO#36. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16046/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -141,8 +141,8 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) }; -static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.4/mips-ralink-fix-mt7628-pinmux.patch queue-4.4/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:25 +0100 Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE From: Philip Derrin <philip(a)cog.systems> commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream. Currently, for ARM kernels with CONFIG_ARM_LPAE and CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the kernel code and rodata are writable. They are marked read-only in a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit is not set (PMD_SECT_AP2). For user mappings, the logic that propagates the software bit to the hardware bit is in set_pmd_at(); but for the kernel, section_update() writes the PMDs directly, skipping this logic. The fix is to set PMD_SECT_AP2 for read-only sections in section_update(), at the same time as L_PMD_SECT_RDONLY. Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error") Signed-off-by: Philip Derrin <philip(a)cog.systems> Reported-by: Neil Dick <neil(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Tested-by: Laura Abbott <labbott(a)redhat.com> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -611,8 +611,8 @@ static struct section_perm ro_perms[] = .start = (unsigned long)_stext, .end = (unsigned long)__init_begin, #ifdef CONFIG_ARM_LPAE - .mask = ~L_PMD_SECT_RDONLY, - .prot = L_PMD_SECT_RDONLY, + .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2), + .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2, #else .mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE), .prot = PMD_SECT_APX | PMD_SECT_AP_WRITE, Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.4/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.4/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8721/1: mm: dump: check hardware RO bit for LPAE" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8721/1: mm: dump: check hardware RO bit for LPAE to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:26 +0100 Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE From: Philip Derrin <philip(a)cog.systems> commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream. When CONFIG_ARM_LPAE is set, the PMD dump relies on the software read-only bit to determine whether a page is writable. This concealed a bug which left the kernel text section writable (AP2=0) while marked read-only in the software bit. In a kernel with the AP2 bug, the dump looks like this: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M ro x SHD 0xc0600000-0xc0800000 2M ro NX SHD 0xc0800000-0xc4800000 64M RW NX SHD The fix is to check that the software and hardware bits are both set before displaying "ro". The dump then shows the true perms: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M RW x SHD 0xc0600000-0xc0800000 2M RW NX SHD 0xc0800000-0xc4800000 64M RW NX SHD Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE") Signed-off-by: Philip Derrin <philip(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/dump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/dump.c +++ b/arch/arm/mm/dump.c @@ -126,8 +126,8 @@ static const struct prot_bits section_bi .val = PMD_SECT_USER, .set = "USR", }, { - .mask = L_PMD_SECT_RDONLY, - .val = L_PMD_SECT_RDONLY, + .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2, + .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2, .set = "ro", .clear = "RW", #elif __LINUX_ARM_ARCH__ >= 6 Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.4/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.4/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 548c3050ea8d16997ae27f9e080a8338a606fc93 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski <luto(a)kernel.org> Date: Tue, 21 Nov 2017 20:43:56 -0800 Subject: x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing From: Andy Lutomirski <luto(a)kernel.org> commit 548c3050ea8d16997ae27f9e080a8338a606fc93 upstream. When I added entry_SYSCALL_64_after_hwframe(), I left TRACE_IRQS_OFF before it. This means that users of entry_SYSCALL_64_after_hwframe() were responsible for invoking TRACE_IRQS_OFF, and the one and only user (Xen, added in the same commit) got it wrong. I think this would manifest as a warning if a Xen PV guest with CONFIG_DEBUG_LOCKDEP=y were used with context tracking. (The context tracking bit is to cause lockdep to get invoked before we turn IRQs back on.) I haven't tested that for real yet because I can't get a kernel configured like that to boot at all on Xen PV. Move TRACE_IRQS_OFF below the label. Signed-off-by: Andy Lutomirski <luto(a)kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com> Cc: Borislav Petkov <bpetkov(a)suse.de> Cc: Brian Gerst <brgerst(a)gmail.com> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries") Link: http://lkml.kernel.org/r/9150aac013b7b95d62c2336751d5b6e91d2722aa.151132544… Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/entry/entry_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -148,8 +148,6 @@ ENTRY(entry_SYSCALL_64) movq %rsp, PER_CPU_VAR(rsp_scratch) movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp - TRACE_IRQS_OFF - /* Construct struct pt_regs on stack */ pushq $__USER_DS /* pt_regs->ss */ pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */ @@ -170,6 +168,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe) sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */ UNWIND_HINT_REGS extra=0 + TRACE_IRQS_OFF + /* * If we need to do entry work or if we guess we'll need to do * exit work, go straight to the slow path. Patches currently in stable-queue which might be from luto(a)kernel.org are queue-4.14/x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch queue-4.14/x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/entry/64: Add missing irqflags tracing to native_load_gs_index() to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski <luto(a)kernel.org> Date: Wed, 22 Nov 2017 20:39:16 -0800 Subject: x86/entry/64: Add missing irqflags tracing to native_load_gs_index() From: Andy Lutomirski <luto(a)kernel.org> commit ca37e57bbe0cf1455ea3e84eb89ed04a132d59e1 upstream. Running this code with IRQs enabled (where dummy_lock is a spinlock): static void check_load_gs_index(void) { /* This will fail. */ load_gs_index(0xffff); spin_lock(&dummy_lock); spin_unlock(&dummy_lock); } Will generate a lockdep warning. The issue is that the actual write to %gs would cause an exception with IRQs disabled, and the exception handler would, as an inadvertent side effect, update irqflag tracing to reflect the IRQs-off status. native_load_gs_index() would then turn IRQs back on and return with irqflag tracing still thinking that IRQs were off. The dummy lock-and-unlock causes lockdep to notice the error and warn. Fix it by adding the missing tracing. Apparently nothing did this in a context where it mattered. I haven't tried to find a code path that would actually exhibit the warning if appropriately nasty user code were running. I suspect that the security impact of this bug is very, very low -- production systems don't run with lockdep enabled, and the warning is mostly harmless anyway. Found during a quick audit of the entry code to try to track down an unrelated bug that Ingo found in some still-in-development code. Signed-off-by: Andy Lutomirski <luto(a)kernel.org> Cc: Borislav Petkov <bpetkov(a)suse.de> Cc: Brian Gerst <brgerst(a)gmail.com> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.151141191… Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/entry/entry_64.S | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -51,15 +51,19 @@ ENTRY(native_usergs_sysret64) END(native_usergs_sysret64) #endif /* CONFIG_PARAVIRT */ -.macro TRACE_IRQS_IRETQ +.macro TRACE_IRQS_FLAGS flags:req #ifdef CONFIG_TRACE_IRQFLAGS - bt $9, EFLAGS(%rsp) /* interrupts off? */ + bt $9, \flags /* interrupts off? */ jnc 1f TRACE_IRQS_ON 1: #endif .endm +.macro TRACE_IRQS_IRETQ + TRACE_IRQS_FLAGS EFLAGS(%rsp) +.endm + /* * When dynamic function tracer is enabled it will add a breakpoint * to all locations that it is about to modify, sync CPUs, update @@ -923,11 +927,13 @@ ENTRY(native_load_gs_index) FRAME_BEGIN pushfq DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) + TRACE_IRQS_OFF SWAPGS .Lgs_change: movl %edi, %gs 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE SWAPGS + TRACE_IRQS_FLAGS (%rsp) popfq FRAME_END ret Patches currently in stable-queue which might be from luto(a)kernel.org are queue-4.14/x86-entry-64-fix-entry_syscall_64_after_hwframe-irq-tracing.patch queue-4.14/x86-entry-64-add-missing-irqflags-tracing-to-native_load_gs_index.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/decoder: Add new TEST instruction pattern" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/decoder: Add new TEST instruction pattern to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-decoder-add-new-test-instruction-pattern.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu <mhiramat(a)kernel.org> Date: Fri, 24 Nov 2017 13:56:30 +0900 Subject: x86/decoder: Add new TEST instruction pattern From: Masami Hiramatsu <mhiramat(a)kernel.org> commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream. The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb Patches currently in stable-queue which might be from mhiramat(a)kernel.org are queue-4.14/x86-decoder-add-new-test-instruction-pattern.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/boot: Fix boot failure when SMP MP-table is based at 0" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/boot: Fix boot failure when SMP MP-table is based at 0 to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-boot-fix-boot-failure-when-smp-mp-table-is-based-at-0.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ac5292e9a294618cecb31109d1ba265e3d027ba2 Mon Sep 17 00:00:00 2001 From: Tom Lendacky <thomas.lendacky(a)amd.com> Date: Mon, 6 Nov 2017 14:17:53 -0600 Subject: x86/boot: Fix boot failure when SMP MP-table is based at 0 From: Tom Lendacky <thomas.lendacky(a)amd.com> commit ac5292e9a294618cecb31109d1ba265e3d027ba2 upstream. When crosvm is used to boot a kernel as a VM, the SMP MP-table is found at physical address 0x0. This causes mpf_base to be set to 0 and a subsequent "if (!mpf_base)" check in default_get_smp_config() results in the MP-table not being parsed. Further into the boot this results in an oops when attempting a read_apic_id(). Add a boolean variable that is set to true when the MP-table is found. Use this variable for testing if the MP-table was found so that even a value of 0 for mpf_base will result in continued parsing of the MP-table. Fixes: 5997efb96756 ("x86/boot: Use memremap() to map the MPF and MPC data") Reported-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Borislav Petkov <bp(a)alien8.de> Cc: regression(a)leemhuis.info Link: https://lkml.kernel.org/r/20171106201753.23059.86674.stgit@tlendack-t1.amdo… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/mpparse.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/mpparse.c +++ b/arch/x86/kernel/mpparse.c @@ -431,6 +431,7 @@ static inline void __init construct_defa } static unsigned long mpf_base; +static bool mpf_found; static unsigned long __init get_mpc_size(unsigned long physptr) { @@ -504,7 +505,7 @@ void __init default_get_smp_config(unsig if (!smp_found_config) return; - if (!mpf_base) + if (!mpf_found) return; if (acpi_lapic && early) @@ -593,6 +594,7 @@ static int __init smp_scan_config(unsign smp_found_config = 1; #endif mpf_base = base; + mpf_found = true; pr_info("found SMP MP-table at [mem %#010lx-%#010lx] mapped at [%p]\n", base, base + sizeof(*mpf) - 1, mpf); @@ -858,7 +860,7 @@ static int __init update_mp_table(void) if (!enable_update_mptable) return 0; - if (!mpf_base) + if (!mpf_found) return 0; mpf = early_memremap(mpf_base, sizeof(*mpf)); Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are queue-4.14/x86-boot-fix-boot-failure-when-smp-mp-table-is-based-at-0.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "uapi: fix linux/tls.h userspace compilation error" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled uapi: fix linux/tls.h userspace compilation error to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: uapi-fix-linux-tls.h-userspace-compilation-error.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b9f3eb499d84f8d4adcb2f9212ec655700b28228 Mon Sep 17 00:00:00 2001 From: "Dmitry V. Levin" <ldv(a)altlinux.org> Date: Tue, 14 Nov 2017 06:30:11 +0300 Subject: uapi: fix linux/tls.h userspace compilation error From: Dmitry V. Levin <ldv(a)altlinux.org> commit b9f3eb499d84f8d4adcb2f9212ec655700b28228 upstream. Move inclusion of a private kernel header <net/tcp.h> from uapi/linux/tls.h to its only user - net/tls.h, to fix the following linux/tls.h userspace compilation error: /usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory As to this point uapi/linux/tls.h was totaly unusuable for userspace, cleanup this header file further by moving other redundant includes to net/tls.h. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- include/net/tls.h | 4 ++++ include/uapi/linux/tls.h | 4 ---- 2 files changed, 4 insertions(+), 4 deletions(-) --- a/include/net/tls.h +++ b/include/net/tls.h @@ -35,6 +35,10 @@ #define _TLS_OFFLOAD_H #include <linux/types.h> +#include <asm/byteorder.h> +#include <linux/socket.h> +#include <linux/tcp.h> +#include <net/tcp.h> #include <uapi/linux/tls.h> --- a/include/uapi/linux/tls.h +++ b/include/uapi/linux/tls.h @@ -35,10 +35,6 @@ #define _UAPI_LINUX_TLS_H #include <linux/types.h> -#include <asm/byteorder.h> -#include <linux/socket.h> -#include <linux/tcp.h> -#include <net/tcp.h> /* TLS socket options */ #define TLS_TX 1 /* Set transmit parameters */ Patches currently in stable-queue which might be from ldv(a)altlinux.org are queue-4.14/uapi-fix-linux-tls.h-userspace-compilation-error.patch queue-4.14/uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "uapi: fix linux/rxrpc.h userspace compilation errors" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled uapi: fix linux/rxrpc.h userspace compilation errors to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 0eef304bc9f7d079a1165e8cd2f24b078e9e1f2a Mon Sep 17 00:00:00 2001 From: "Dmitry V. Levin" <ldv(a)altlinux.org> Date: Mon, 13 Nov 2017 03:37:06 +0300 Subject: uapi: fix linux/rxrpc.h userspace compilation errors From: Dmitry V. Levin <ldv(a)altlinux.org> commit 0eef304bc9f7d079a1165e8cd2f24b078e9e1f2a upstream. Consistently use types provided by <linux/types.h> to fix the following linux/rxrpc.h userspace compilation errors: /usr/include/linux/rxrpc.h:24:2: error: unknown type name 'u16' u16 srx_service; /* service desired */ /usr/include/linux/rxrpc.h:25:2: error: unknown type name 'u16' u16 transport_type; /* type of transport socket (SOCK_DGRAM) */ /usr/include/linux/rxrpc.h:26:2: error: unknown type name 'u16' u16 transport_len; /* length of transport address */ Use __kernel_sa_family_t instead of sa_family_t the same way as uapi/linux/in.h does, to fix the following linux/rxrpc.h userspace compilation errors: /usr/include/linux/rxrpc.h:23:2: error: unknown type name 'sa_family_t' sa_family_t srx_family; /* address family */ /usr/include/linux/rxrpc.h:28:3: error: unknown type name 'sa_family_t' sa_family_t family; /* transport address family */ Fixes: 727f8914477e ("rxrpc: Expose UAPI definitions to userspace") Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- include/uapi/linux/rxrpc.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) --- a/include/uapi/linux/rxrpc.h +++ b/include/uapi/linux/rxrpc.h @@ -20,12 +20,12 @@ * RxRPC socket address */ struct sockaddr_rxrpc { - sa_family_t srx_family; /* address family */ - u16 srx_service; /* service desired */ - u16 transport_type; /* type of transport socket (SOCK_DGRAM) */ - u16 transport_len; /* length of transport address */ + __kernel_sa_family_t srx_family; /* address family */ + __u16 srx_service; /* service desired */ + __u16 transport_type; /* type of transport socket (SOCK_DGRAM) */ + __u16 transport_len; /* length of transport address */ union { - sa_family_t family; /* transport address family */ + __kernel_sa_family_t family; /* transport address family */ struct sockaddr_in sin; /* IPv4 transport address */ struct sockaddr_in6 sin6; /* IPv6 transport address */ } transport; Patches currently in stable-queue which might be from ldv(a)altlinux.org are queue-4.14/uapi-fix-linux-tls.h-userspace-compilation-error.patch queue-4.14/uapi-fix-linux-rxrpc.h-userspace-compilation-errors.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "perf/x86/intel: Hide TSX events when RTM is not supported" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled perf/x86/intel: Hide TSX events when RTM is not supported to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: perf-x86-intel-hide-tsx-events-when-rtm-is-not-supported.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 58ba4d5a25579e5c7e312bd359c95f3a9a0a242c Mon Sep 17 00:00:00 2001 From: Andi Kleen <ak(a)linux.intel.com> Date: Wed, 8 Nov 2017 16:07:18 -0800 Subject: perf/x86/intel: Hide TSX events when RTM is not supported From: Andi Kleen <ak(a)linux.intel.com> commit 58ba4d5a25579e5c7e312bd359c95f3a9a0a242c upstream. 0day testing reported a perf test regression on Haswell systems without RTM. Commit a5df70c35 hides the in_tx/in_tx_cp attributes when RTM is not available, but the TSX events are still available in sysfs. Due to the missing attributes the event parser fails on those files. Don't show the TSX events in sysfs when RTM is not available on Haswell/Broadwell/Skylake. Fixes: a5df70c354c2 (perf/x86: Only show format attributes when supported) Reported-by: kernel test robot <xiaolong.ye(a)intel.com> Tested-by: Jin Yao <yao.jin(a)linux.intel.com> Signed-off-by: Andi Kleen <ak(a)linux.intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Acked-by: Peter Zijlstra <peterz(a)infradead.org> Link: https://lkml.kernel.org/r/20171109000718.14137-1-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/events/intel/core.c | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -3730,6 +3730,19 @@ EVENT_ATTR_STR(cycles-t, cycles_t, "even EVENT_ATTR_STR(cycles-ct, cycles_ct, "event=0x3c,in_tx=1,in_tx_cp=1"); static struct attribute *hsw_events_attrs[] = { + EVENT_PTR(mem_ld_hsw), + EVENT_PTR(mem_st_hsw), + EVENT_PTR(td_slots_issued), + EVENT_PTR(td_slots_retired), + EVENT_PTR(td_fetch_bubbles), + EVENT_PTR(td_total_slots), + EVENT_PTR(td_total_slots_scale), + EVENT_PTR(td_recovery_bubbles), + EVENT_PTR(td_recovery_bubbles_scale), + NULL +}; + +static struct attribute *hsw_tsx_events_attrs[] = { EVENT_PTR(tx_start), EVENT_PTR(tx_commit), EVENT_PTR(tx_abort), @@ -3742,18 +3755,16 @@ static struct attribute *hsw_events_attr EVENT_PTR(el_conflict), EVENT_PTR(cycles_t), EVENT_PTR(cycles_ct), - EVENT_PTR(mem_ld_hsw), - EVENT_PTR(mem_st_hsw), - EVENT_PTR(td_slots_issued), - EVENT_PTR(td_slots_retired), - EVENT_PTR(td_fetch_bubbles), - EVENT_PTR(td_total_slots), - EVENT_PTR(td_total_slots_scale), - EVENT_PTR(td_recovery_bubbles), - EVENT_PTR(td_recovery_bubbles_scale), NULL }; +static __init struct attribute **get_hsw_events_attrs(void) +{ + return boot_cpu_has(X86_FEATURE_RTM) ? + merge_attr(hsw_events_attrs, hsw_tsx_events_attrs) : + hsw_events_attrs; +} + static ssize_t freeze_on_smi_show(struct device *cdev, struct device_attribute *attr, char *buf) @@ -4182,7 +4193,7 @@ __init int intel_pmu_init(void) x86_pmu.hw_config = hsw_hw_config; x86_pmu.get_event_constraints = hsw_get_event_constraints; - x86_pmu.cpu_events = hsw_events_attrs; + x86_pmu.cpu_events = get_hsw_events_attrs(); x86_pmu.lbr_double_abort = true; extra_attr = boot_cpu_has(X86_FEATURE_RTM) ? hsw_format_attr : nhm_format_attr; @@ -4221,7 +4232,7 @@ __init int intel_pmu_init(void) x86_pmu.hw_config = hsw_hw_config; x86_pmu.get_event_constraints = hsw_get_event_constraints; - x86_pmu.cpu_events = hsw_events_attrs; + x86_pmu.cpu_events = get_hsw_events_attrs(); x86_pmu.limit_period = bdw_limit_period; extra_attr = boot_cpu_has(X86_FEATURE_RTM) ? hsw_format_attr : nhm_format_attr; @@ -4279,7 +4290,7 @@ __init int intel_pmu_init(void) extra_attr = boot_cpu_has(X86_FEATURE_RTM) ? hsw_format_attr : nhm_format_attr; extra_attr = merge_attr(extra_attr, skl_format_attr); - x86_pmu.cpu_events = hsw_events_attrs; + x86_pmu.cpu_events = get_hsw_events_attrs(); intel_pmu_pebs_data_source_skl( boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X); pr_cont("Skylake events, "); Patches currently in stable-queue which might be from ak(a)linux.intel.com are queue-4.14/perf-x86-intel-hide-tsx-events-when-rtm-is-not-supported.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix MT7628 pinmux" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix MT7628 pinmux to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-mt7628-pinmux.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 8ef4b43cd3794d63052d85898e42424fd3b14d24 Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:14 +0200 Subject: MIPS: ralink: Fix MT7628 pinmux From: Mathias Kresin <dev(a)kresin.me> commit 8ef4b43cd3794d63052d85898e42424fd3b14d24 upstream. According to the datasheet the REFCLK pin is shared with GPIO#37 and the PERST pin is shared with GPIO#36. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16046/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -145,8 +145,8 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 36, 1) }; -static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.14/mips-ralink-fix-mt7628-pinmux.patch queue-4.14/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: ralink: Fix typo in mt7628 pinmux function" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: ralink: Fix typo in mt7628 pinmux function to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-ralink-fix-typo-in-mt7628-pinmux-function.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: MIPS: ralink: Fix typo in mt7628 pinmux function From: Mathias Kresin <dev(a)kresin.me> commit 05a67cc258e75ac9758e6f13d26337b8be51162a upstream. There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/ralink/mt7620.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -145,7 +145,7 @@ static struct rt2880_pmx_func i2c_grp_mt FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) }; Patches currently in stable-queue which might be from dev(a)kresin.me are queue-4.14/mips-ralink-fix-mt7628-pinmux.patch queue-4.14/mips-ralink-fix-typo-in-mt7628-pinmux-function.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "arm64: Implement arch-specific pte_access_permitted()" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled arm64: Implement arch-specific pte_access_permitted() to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm64-implement-arch-specific-pte_access_permitted.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 6218f96c58dbf44a06aeaf767aab1f54fc397838 Mon Sep 17 00:00:00 2001 From: Catalin Marinas <catalin.marinas(a)arm.com> Date: Thu, 26 Oct 2017 18:36:47 +0100 Subject: arm64: Implement arch-specific pte_access_permitted() From: Catalin Marinas <catalin.marinas(a)arm.com> commit 6218f96c58dbf44a06aeaf767aab1f54fc397838 upstream. The generic pte_access_permitted() implementation only checks for pte_present() (together with the write permission where applicable). However, for both kernel ptes and PROT_NONE mappings pte_present() also returns true on arm64 even though such mappings are not user accessible. Additionally, arm64 now supports execute-only user permission (PROT_EXEC) which is implemented by clearing the PTE_USER bit. With this patch the arm64 implementation of pte_access_permitted() checks for the PTE_VALID and PTE_USER bits together with writable access if applicable. Reported-by: Al Viro <viro(a)zeniv.linux.org.uk> Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com> Signed-off-by: Will Deacon <will.deacon(a)arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -98,6 +98,8 @@ extern unsigned long empty_zero_page[PAG ((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN)) #define pte_valid_young(pte) \ ((pte_val(pte) & (PTE_VALID | PTE_AF)) == (PTE_VALID | PTE_AF)) +#define pte_valid_user(pte) \ + ((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) /* * Could the pte be present in the TLB? We must check mm_tlb_flush_pending @@ -107,6 +109,18 @@ extern unsigned long empty_zero_page[PAG #define pte_accessible(mm, pte) \ (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte)) +/* + * p??_access_permitted() is true for valid user mappings (subject to the + * write permission check) other than user execute-only which do not have the + * PTE_USER bit set. PROT_NONE mappings do not have the PTE_VALID bit set. + */ +#define pte_access_permitted(pte, write) \ + (pte_valid_user(pte) && (!(write) || pte_write(pte))) +#define pmd_access_permitted(pmd, write) \ + (pte_access_permitted(pmd_pte(pmd), (write))) +#define pud_access_permitted(pud, write) \ + (pte_access_permitted(pud_pte(pud), (write))) + static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot) { pte_val(pte) &= ~pgprot_val(prot); Patches currently in stable-queue which might be from catalin.marinas(a)arm.com are queue-4.14/arm64-implement-arch-specific-pte_access_permitted.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mips-cmpxchg64-and-have_virt_cpu_accounting_gen-don-t-work-for-32-bit-smp.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001 From: Ben Hutchings <ben(a)decadent.org.uk> Date: Wed, 4 Oct 2017 03:46:14 +0100 Subject: MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP From: Ben Hutchings <ben(a)decadent.org.uk> commit a3f143106596d739e7fbc4b84c96b1475247d876 upstream. __cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts on the same CPU (that's what the 'local' means). We can't use it to implement cmpxchg64() in SMP configurations. So, for 32-bit SMP configurations: - Don't define cmpxchg64() - Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...") Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting") Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com> Cc: linux-mips(a)linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/17413/ Signed-off-by: James Hogan <jhogan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/mips/Kconfig | 2 +- arch/mips/include/asm/cmpxchg.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -65,7 +65,7 @@ config MIPS select HAVE_PERF_EVENTS select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS - select HAVE_VIRT_CPU_ACCOUNTING_GEN + select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA if MODULES && 64BIT select MODULES_USE_ELF_REL if MODULES --- a/arch/mips/include/asm/cmpxchg.h +++ b/arch/mips/include/asm/cmpxchg.h @@ -204,8 +204,10 @@ static inline unsigned long __cmpxchg(vo #else #include <asm-generic/cmpxchg-local.h> #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n)) +#ifndef CONFIG_SMP #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n)) #endif +#endif #undef __scbeqz Patches currently in stable-queue which might be from ben(a)decadent.org.uk are queue-4.14/mips-cmpxchg64-and-have_virt_cpu_accounting_gen-don-t-work-for-32-bit-smp.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:25 +0100 Subject: ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE From: Philip Derrin <philip(a)cog.systems> commit 400eeffaffc7232c0ae1134fe04e14ae4fb48d8c upstream. Currently, for ARM kernels with CONFIG_ARM_LPAE and CONFIG_STRICT_KERNEL_RWX enabled, the 2MiB pages mapping the kernel code and rodata are writable. They are marked read-only in a software bit (L_PMD_SECT_RDONLY) but the hardware read-only bit is not set (PMD_SECT_AP2). For user mappings, the logic that propagates the software bit to the hardware bit is in set_pmd_at(); but for the kernel, section_update() writes the PMDs directly, skipping this logic. The fix is to set PMD_SECT_AP2 for read-only sections in section_update(), at the same time as L_PMD_SECT_RDONLY. Fixes: 1e3479225acb ("ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error") Signed-off-by: Philip Derrin <philip(a)cog.systems> Reported-by: Neil Dick <neil(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Tested-by: Laura Abbott <labbott(a)redhat.com> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -639,8 +639,8 @@ static struct section_perm ro_perms[] = .start = (unsigned long)_stext, .end = (unsigned long)__init_begin, #ifdef CONFIG_ARM_LPAE - .mask = ~L_PMD_SECT_RDONLY, - .prot = L_PMD_SECT_RDONLY, + .mask = ~(L_PMD_SECT_RDONLY | PMD_SECT_AP2), + .prot = L_PMD_SECT_RDONLY | PMD_SECT_AP2, #else .mask = ~(PMD_SECT_APX | PMD_SECT_AP_WRITE), .prot = PMD_SECT_APX | PMD_SECT_AP_WRITE, Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.14/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.14/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8721/1: mm: dump: check hardware RO bit for LPAE" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8721/1: mm: dump: check hardware RO bit for LPAE to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:26 +0100 Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE From: Philip Derrin <philip(a)cog.systems> commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream. When CONFIG_ARM_LPAE is set, the PMD dump relies on the software read-only bit to determine whether a page is writable. This concealed a bug which left the kernel text section writable (AP2=0) while marked read-only in the software bit. In a kernel with the AP2 bug, the dump looks like this: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M ro x SHD 0xc0600000-0xc0800000 2M ro NX SHD 0xc0800000-0xc4800000 64M RW NX SHD The fix is to check that the software and hardware bits are both set before displaying "ro". The dump then shows the true perms: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M RW x SHD 0xc0600000-0xc0800000 2M RW NX SHD 0xc0800000-0xc4800000 64M RW NX SHD Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE") Signed-off-by: Philip Derrin <philip(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/dump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/dump.c +++ b/arch/arm/mm/dump.c @@ -129,8 +129,8 @@ static const struct prot_bits section_bi .val = PMD_SECT_USER, .set = "USR", }, { - .mask = L_PMD_SECT_RDONLY, - .val = L_PMD_SECT_RDONLY, + .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2, + .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2, .set = "ro", .clear = "RW", #elif __LINUX_ARM_ARCH__ >= 6 Patches currently in stable-queue which might be from philip(a)cog.systems are queue-4.14/arm-8722-1-mm-make-strict_kernel_rwx-effective-for-lpae.patch queue-4.14/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "x86/decoder: Add new TEST instruction pattern" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/decoder: Add new TEST instruction pattern to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-decoder-add-new-test-instruction-pattern.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu <mhiramat(a)kernel.org> Date: Fri, 24 Nov 2017 13:56:30 +0900 Subject: x86/decoder: Add new TEST instruction pattern From: Masami Hiramatsu <mhiramat(a)kernel.org> commit 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df upstream. The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -814,7 +814,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb Patches currently in stable-queue which might be from mhiramat(a)kernel.org are queue-3.18/x86-decoder-add-new-test-instruction-pattern.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ARM: 8721/1: mm: dump: check hardware RO bit for LPAE" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ARM: 8721/1: mm: dump: check hardware RO bit for LPAE to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 3b0c0c922ff4be275a8beb87ce5657d16f355b54 Mon Sep 17 00:00:00 2001 From: Philip Derrin <philip(a)cog.systems> Date: Tue, 14 Nov 2017 00:55:26 +0100 Subject: ARM: 8721/1: mm: dump: check hardware RO bit for LPAE From: Philip Derrin <philip(a)cog.systems> commit 3b0c0c922ff4be275a8beb87ce5657d16f355b54 upstream. When CONFIG_ARM_LPAE is set, the PMD dump relies on the software read-only bit to determine whether a page is writable. This concealed a bug which left the kernel text section writable (AP2=0) while marked read-only in the software bit. In a kernel with the AP2 bug, the dump looks like this: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M ro x SHD 0xc0600000-0xc0800000 2M ro NX SHD 0xc0800000-0xc4800000 64M RW NX SHD The fix is to check that the software and hardware bits are both set before displaying "ro". The dump then shows the true perms: ---[ Kernel Mapping ]--- 0xc0000000-0xc0200000 2M RW NX SHD 0xc0200000-0xc0600000 4M RW x SHD 0xc0600000-0xc0800000 2M RW NX SHD 0xc0800000-0xc4800000 64M RW NX SHD Fixes: ded947798469 ("ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE") Signed-off-by: Philip Derrin <philip(a)cog.systems> Tested-by: Neil Dick <neil(a)cog.systems> Reviewed-by: Kees Cook <keescook(a)chromium.org> Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/arm/mm/dump.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/arm/mm/dump.c +++ b/arch/arm/mm/dump.c @@ -126,8 +126,8 @@ static const struct prot_bits section_bi .val = PMD_SECT_USER, .set = "USR", }, { - .mask = L_PMD_SECT_RDONLY, - .val = L_PMD_SECT_RDONLY, + .mask = L_PMD_SECT_RDONLY | PMD_SECT_AP2, + .val = L_PMD_SECT_RDONLY | PMD_SECT_AP2, .set = "ro", .clear = "RW", #elif __LINUX_ARM_ARCH__ >= 6 Patches currently in stable-queue which might be from philip(a)cog.systems are queue-3.18/arm-8721-1-mm-dump-check-hardware-ro-bit-for-lpae.patch

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: [PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 3.19+ Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/ralink/mt7620.c b/arch/mips/ralink/mt7620.c index fda073cd7b73..41b71c4352c2 100644 --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -145,7 +145,7 @@ static struct rt2880_pmx_func i2c_grp_mt7628[] = { FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 05a67cc258e75ac9758e6f13d26337b8be51162a Mon Sep 17 00:00:00 2001 From: Mathias Kresin <dev(a)kresin.me> Date: Thu, 11 May 2017 08:11:15 +0200 Subject: [PATCH] MIPS: ralink: Fix typo in mt7628 pinmux function There is a typo inside the pinmux setup code. The function is called refclk and not reclk. Fixes: 53263a1c6852 ("MIPS: ralink: add mt7628an support") Signed-off-by: Mathias Kresin <dev(a)kresin.me> Acked-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 3.19+ Patchwork: https://patchwork.linux-mips.org/patch/16047/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/ralink/mt7620.c b/arch/mips/ralink/mt7620.c index fda073cd7b73..41b71c4352c2 100644 --- a/arch/mips/ralink/mt7620.c +++ b/arch/mips/ralink/mt7620.c @@ -145,7 +145,7 @@ static struct rt2880_pmx_func i2c_grp_mt7628[] = { FUNC("i2c", 0, 4, 2), }; -static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("reclk", 0, 37, 1) }; +static struct rt2880_pmx_func refclk_grp_mt7628[] = { FUNC("refclk", 0, 37, 1) }; static struct rt2880_pmx_func perst_grp_mt7628[] = { FUNC("perst", 0, 36, 1) }; static struct rt2880_pmx_func wdt_grp_mt7628[] = { FUNC("wdt", 0, 38, 1) }; static struct rt2880_pmx_func spi_grp_mt7628[] = { FUNC("spi", 0, 7, 4) };

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH RFC] EDAC: octeon: Fix uninitialized variable warning

by James Hogan

From: James Hogan <jhogan(a)kernel.org> Fix uninitialized variable warning in the Octeon EDAC driver, as seen in MIPS cavium_octeon_defconfig builds since v4.14 with Codescape GNU Tools 2016.05-03: drivers/edac/octeon_edac-lmc.c In function ‘octeon_lmc_edac_poll_o2’: drivers/edac/octeon_edac-lmc.c:87:24: warning: ‘((long unsigned int*)&int_reg)[1]’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (int_reg.s.sec_err || int_reg.s.ded_err) { ^ This was introduced in commit 1bc021e81565 ("EDAC: Octeon: Add error injection support"), and is fixed by initialising the whole int_reg variable to zero before the conditional assignments in the error injection case. Fixes: 1bc021e81565 ("EDAC: Octeon: Add error injection support") Signed-off-by: James Hogan <jhogan(a)kernel.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: David Daney <david.daney(a)cavium.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org> Cc: Daniel Walker <dwalker(a)fifo99.com> Cc: Steven J. Hill <steven.hill(a)cavium.com> Cc: linux-edac(a)vger.kernel.org Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 3.15+ --- Comments appreciated. Is this correct? I've added the stable tag on the assumption that this might matter. If not it can be changed. It'd be nice to have it in 4.14 though to silence the warning since the driver was added to cavium_octeon_defconfig in commit f922bc0ad08b ("MIPS: Octeon: cavium_octeon_defconfig: Enable more drivers"). --- drivers/edac/octeon_edac-lmc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/edac/octeon_edac-lmc.c b/drivers/edac/octeon_edac-lmc.c index 9c1ffe3e912b..aeb222ca3ed1 100644 --- a/drivers/edac/octeon_edac-lmc.c +++ b/drivers/edac/octeon_edac-lmc.c @@ -78,6 +78,7 @@ static void octeon_lmc_edac_poll_o2(struct mem_ctl_info *mci) if (!pvt->inject) int_reg.u64 = cvmx_read_csr(CVMX_LMCX_INT(mci->mc_idx)); else { + int_reg.u64 = 0; if (pvt->error_type == 1) int_reg.s.sec_err = 1; if (pvt->error_type == 2) -- 2.14.1

8 years

3
2
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver" was seriously submitted to be applied to the 4.9-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.9-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8593b18ad348733b5d5ddfa0c79dcabf51dff308 Mon Sep 17 00:00:00 2001 From: John Crispin <john(a)phrozen.org> Date: Mon, 20 Feb 2017 10:29:43 +0100 Subject: [PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver Switch the printk() call to the prefered pr_warn() api. Fixes: 7e5873d3755c ("MIPS: pci: Add MT7620a PCIE driver") Signed-off-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.5+ Patchwork: https://patchwork.linux-mips.org/patch/15321/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/pci/pci-mt7620.c b/arch/mips/pci/pci-mt7620.c index 22e4c22e750b..1a0b80a1cc4a 100644 --- a/arch/mips/pci/pci-mt7620.c +++ b/arch/mips/pci/pci-mt7620.c @@ -120,7 +120,7 @@ static int wait_pciephy_busy(void) else break; if (retry++ > WAITRETRY_MAX) { - printk(KERN_WARN "PCIE-PHY retry failed.\n"); + pr_warn("PCIE-PHY retry failed.\n"); return -1; } }

8 years

1
0
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver" was seriously submitted to be applied to the 4.14-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.14-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 8593b18ad348733b5d5ddfa0c79dcabf51dff308 Mon Sep 17 00:00:00 2001 From: John Crispin <john(a)phrozen.org> Date: Mon, 20 Feb 2017 10:29:43 +0100 Subject: [PATCH] MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver Switch the printk() call to the prefered pr_warn() api. Fixes: 7e5873d3755c ("MIPS: pci: Add MT7620a PCIE driver") Signed-off-by: John Crispin <john(a)phrozen.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.5+ Patchwork: https://patchwork.linux-mips.org/patch/15321/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/pci/pci-mt7620.c b/arch/mips/pci/pci-mt7620.c index 22e4c22e750b..1a0b80a1cc4a 100644 --- a/arch/mips/pci/pci-mt7620.c +++ b/arch/mips/pci/pci-mt7620.c @@ -120,7 +120,7 @@ static int wait_pciephy_busy(void) else break; if (retry++ > WAITRETRY_MAX) { - printk(KERN_WARN "PCIE-PHY retry failed.\n"); + pr_warn("PCIE-PHY retry failed.\n"); return -1; } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001 From: Ben Hutchings <ben(a)decadent.org.uk> Date: Wed, 4 Oct 2017 03:46:14 +0100 Subject: [PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP __cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts on the same CPU (that's what the 'local' means). We can't use it to implement cmpxchg64() in SMP configurations. So, for 32-bit SMP configurations: - Don't define cmpxchg64() - Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...") Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting") Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.1+ Patchwork: https://patchwork.linux-mips.org/patch/17413/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 43411118dd4f..b09b644cd57c 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -64,7 +64,7 @@ config MIPS select HAVE_PERF_EVENTS select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS - select HAVE_VIRT_CPU_ACCOUNTING_GEN + select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA if MODULES && 64BIT select MODULES_USE_ELF_REL if MODULES diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h index 903f3bf48419..ae2b4583b486 100644 --- a/arch/mips/include/asm/cmpxchg.h +++ b/arch/mips/include/asm/cmpxchg.h @@ -202,8 +202,10 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old, #else #include <asm-generic/cmpxchg-local.h> #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n)) +#ifndef CONFIG_SMP #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n)) #endif +#endif #undef __scbeqz

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3f143106596d739e7fbc4b84c96b1475247d876 Mon Sep 17 00:00:00 2001 From: Ben Hutchings <ben(a)decadent.org.uk> Date: Wed, 4 Oct 2017 03:46:14 +0100 Subject: [PATCH] MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP __cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts on the same CPU (that's what the 'local' means). We can't use it to implement cmpxchg64() in SMP configurations. So, for 32-bit SMP configurations: - Don't define cmpxchg64() - Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it Fixes: e2093c7b03c1 ("MIPS: Fall back to generic implementation of ...") Fixes: bb877e96bea1 ("MIPS: Add support for full dynticks CPU time accounting") Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Deng-Cheng Zhu <dengcheng.zhu(a)mips.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.1+ Patchwork: https://patchwork.linux-mips.org/patch/17413/ Signed-off-by: James Hogan <jhogan(a)kernel.org> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 43411118dd4f..b09b644cd57c 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -64,7 +64,7 @@ config MIPS select HAVE_PERF_EVENTS select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS - select HAVE_VIRT_CPU_ACCOUNTING_GEN + select HAVE_VIRT_CPU_ACCOUNTING_GEN if 64BIT || !SMP select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA if MODULES && 64BIT select MODULES_USE_ELF_REL if MODULES diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h index 903f3bf48419..ae2b4583b486 100644 --- a/arch/mips/include/asm/cmpxchg.h +++ b/arch/mips/include/asm/cmpxchg.h @@ -202,8 +202,10 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old, #else #include <asm-generic/cmpxchg-local.h> #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n)) +#ifndef CONFIG_SMP #define cmpxchg64(ptr, o, n) cmpxchg64_local((ptr), (o), (n)) #endif +#endif #undef __scbeqz

8 years

1
0
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] arm64: mm: Remove arch_apei_flush_tlb_one()" was seriously submitted to be applied to the 4.14-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.14-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 18b4b276b490a8b9f86c512de8a6054c27bb87c2 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:26 +0000 Subject: [PATCH] arm64: mm: Remove arch_apei_flush_tlb_one() Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_fixmap() to do the invalidation. Remove it. Move the IPI-considered-harmful comment to __set_fixmap(). Signed-off-by: James Morse <james.morse(a)arm.com> Acked-by: Will Deacon <will.deacon(a)arm.com> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h index 59cca1d6ec54..32f465a80e4e 100644 --- a/arch/arm64/include/asm/acpi.h +++ b/arch/arm64/include/asm/acpi.h @@ -126,18 +126,6 @@ static inline const char *acpi_get_enable_method(int cpu) */ #define acpi_disable_cmcff 1 pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); - -/* - * Despite its name, this function must still broadcast the TLB - * invalidation in order to ensure other CPUs don't end up with junk - * entries as a result of speculation. Unusually, its also called in - * IRQ context (ghes_iounmap_irq) so if we ever need to use IPIs for - * TLB broadcasting, then we're in trouble here. - */ -static inline void arch_apei_flush_tlb_one(unsigned long addr) -{ - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); -} #endif /* CONFIG_ACPI_APEI */ #ifdef CONFIG_ACPI_NUMA diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index f1eb15e0e864..267d2b79d52d 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -778,6 +778,10 @@ void __init early_fixmap_init(void) } } +/* + * Unusually, this is also called in IRQ context (ghes_iounmap_irq) so if we + * ever need to use IPIs for TLB broadcasting, then we're in trouble here. + */ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags) {

8 years

1
0
0 0

[Linux-stable-mirror] Patch "sched: Make resched_cpu() unconditional" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled sched: Make resched_cpu() unconditional to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: sched-make-resched_cpu-unconditional.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com> Date: Mon, 18 Sep 2017 08:54:40 -0700 Subject: sched: Make resched_cpu() unconditional From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream. The current implementation of synchronize_sched_expedited() incorrectly assumes that resched_cpu() is unconditional, which it is not. This means that synchronize_sched_expedited() can hang when resched_cpu()'s trylock fails as follows (analysis by Neeraj Upadhyay): o CPU1 is waiting for expedited wait to complete: sync_rcu_exp_select_cpus rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5 IPI sent to CPU5 synchronize_sched_expedited_wait ret = swait_event_timeout(rsp->expedited_wq, sync_rcu_preempt_exp_done(rnp_root), jiffies_stall); expmask = 0x20, CPU 5 in idle path (in cpuidle_enter()) o CPU5 handles IPI and fails to acquire rq lock. Handles IPI sync_sched_exp_handler resched_cpu returns while failing to try lock acquire rq->lock need_resched is not set o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to idle (schedule() is not called). o CPU 1 reports RCU stall. Given that resched_cpu() is now used only by RCU, this commit fixes the assumption by making resched_cpu() unconditional. Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -507,8 +507,7 @@ void resched_cpu(int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - if (!raw_spin_trylock_irqsave(&rq->lock, flags)) - return; + raw_spin_lock_irqsave(&rq->lock, flags); resched_curr(rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are queue-4.9/sched-make-resched_cpu-unconditional.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "lib/mpi: call cond_resched() from mpi_powm() loop" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled lib/mpi: call cond_resched() from mpi_powm() loop to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: lib-mpi-call-cond_resched-from-mpi_powm-loop.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Tue, 7 Nov 2017 14:15:27 -0800 Subject: lib/mpi: call cond_resched() from mpi_powm() loop From: Eric Biggers <ebiggers(a)google.com> commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream. On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the largest permitted inputs (16384 bits), the kernel spends 10+ seconds doing modular exponentiation in mpi_powm() without rescheduling. If all threads do it, it locks up the system. Moreover, it can cause rcu_sched-stall warnings. Notwithstanding the insanity of doing this calculation in kernel mode rather than in userspace, fix it by calling cond_resched() as each bit from the exponent is processed. It's still noninterruptible, but at least it's preemptible now. Do the cond_resched() once per bit rather than once per MPI limb because each limb might still easily take 100+ milliseconds on slow CPUs. Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- lib/mpi/mpi-pow.c | 2 ++ 1 file changed, 2 insertions(+) --- a/lib/mpi/mpi-pow.c +++ b/lib/mpi/mpi-pow.c @@ -26,6 +26,7 @@ * however I decided to publish this code under the plain GPL. */ +#include <linux/sched.h> #include <linux/string.h> #include "mpi-internal.h" #include "longlong.h" @@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp, } e <<= 1; c--; + cond_resched(); } i--; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.9/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "sched: Make resched_cpu() unconditional" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled sched: Make resched_cpu() unconditional to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: sched-make-resched_cpu-unconditional.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com> Date: Mon, 18 Sep 2017 08:54:40 -0700 Subject: sched: Make resched_cpu() unconditional From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream. The current implementation of synchronize_sched_expedited() incorrectly assumes that resched_cpu() is unconditional, which it is not. This means that synchronize_sched_expedited() can hang when resched_cpu()'s trylock fails as follows (analysis by Neeraj Upadhyay): o CPU1 is waiting for expedited wait to complete: sync_rcu_exp_select_cpus rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5 IPI sent to CPU5 synchronize_sched_expedited_wait ret = swait_event_timeout(rsp->expedited_wq, sync_rcu_preempt_exp_done(rnp_root), jiffies_stall); expmask = 0x20, CPU 5 in idle path (in cpuidle_enter()) o CPU5 handles IPI and fails to acquire rq lock. Handles IPI sync_sched_exp_handler resched_cpu returns while failing to try lock acquire rq->lock need_resched is not set o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to idle (schedule() is not called). o CPU 1 reports RCU stall. Given that resched_cpu() is now used only by RCU, this commit fixes the assumption by making resched_cpu() unconditional. Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -600,8 +600,7 @@ void resched_cpu(int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - if (!raw_spin_trylock_irqsave(&rq->lock, flags)) - return; + raw_spin_lock_irqsave(&rq->lock, flags); resched_curr(rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are queue-4.4/sched-make-resched_cpu-unconditional.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "lib/mpi: call cond_resched() from mpi_powm() loop" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled lib/mpi: call cond_resched() from mpi_powm() loop to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: lib-mpi-call-cond_resched-from-mpi_powm-loop.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Tue, 7 Nov 2017 14:15:27 -0800 Subject: lib/mpi: call cond_resched() from mpi_powm() loop From: Eric Biggers <ebiggers(a)google.com> commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream. On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the largest permitted inputs (16384 bits), the kernel spends 10+ seconds doing modular exponentiation in mpi_powm() without rescheduling. If all threads do it, it locks up the system. Moreover, it can cause rcu_sched-stall warnings. Notwithstanding the insanity of doing this calculation in kernel mode rather than in userspace, fix it by calling cond_resched() as each bit from the exponent is processed. It's still noninterruptible, but at least it's preemptible now. Do the cond_resched() once per bit rather than once per MPI limb because each limb might still easily take 100+ milliseconds on slow CPUs. Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- lib/mpi/mpi-pow.c | 2 ++ 1 file changed, 2 insertions(+) --- a/lib/mpi/mpi-pow.c +++ b/lib/mpi/mpi-pow.c @@ -26,6 +26,7 @@ * however I decided to publish this code under the plain GPL. */ +#include <linux/sched.h> #include <linux/string.h> #include "mpi-internal.h" #include "longlong.h" @@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp, } e <<= 1; c--; + cond_resched(); } i--; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.4/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "serdev: fix registration of second slave" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled serdev: fix registration of second slave to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: serdev-fix-registration-of-second-slave.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 08fcee289f341786eb3b44e5f2d1dc850943238e Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan(a)kernel.org> Date: Tue, 10 Oct 2017 18:09:49 +0200 Subject: serdev: fix registration of second slave From: Johan Hovold <johan(a)kernel.org> commit 08fcee289f341786eb3b44e5f2d1dc850943238e upstream. Serdev currently only supports a single slave device, but the required sanity checks to prevent further registration attempts were missing. If a serial-port node has two child nodes with compatible properties, the OF code would try to register two slave devices using the same id and name. Driver core will not allow this (and there will be loud complaints), but the controller's slave pointer would already have been set to address of the soon to be deallocated second struct serdev_device. As the first slave device remains registered, this can lead to later use-after-free issues when the slave callbacks are accessed. Note that while the serdev registration helpers are exported, they are typically only called by serdev core. Any other (out-of-tree) callers must serialise registration and deregistration themselves. Fixes: cd6484e1830b ("serdev: Introduce new bus for serial attached devices") Cc: Rob Herring <robh(a)kernel.org> Signed-off-by: Johan Hovold <johan(a)kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/tty/serdev/core.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) --- a/drivers/tty/serdev/core.c +++ b/drivers/tty/serdev/core.c @@ -65,21 +65,32 @@ static int serdev_uevent(struct device * */ int serdev_device_add(struct serdev_device *serdev) { + struct serdev_controller *ctrl = serdev->ctrl; struct device *parent = serdev->dev.parent; int err; dev_set_name(&serdev->dev, "%s-%d", dev_name(parent), serdev->nr); + /* Only a single slave device is currently supported. */ + if (ctrl->serdev) { + dev_err(&serdev->dev, "controller busy\n"); + return -EBUSY; + } + ctrl->serdev = serdev; + err = device_add(&serdev->dev); if (err < 0) { dev_err(&serdev->dev, "Can't add %s, status %d\n", dev_name(&serdev->dev), err); - goto err_device_add; + goto err_clear_serdev; } dev_dbg(&serdev->dev, "device %s registered\n", dev_name(&serdev->dev)); -err_device_add: + return 0; + +err_clear_serdev: + ctrl->serdev = NULL; return err; } EXPORT_SYMBOL_GPL(serdev_device_add); @@ -90,7 +101,10 @@ EXPORT_SYMBOL_GPL(serdev_device_add); */ void serdev_device_remove(struct serdev_device *serdev) { + struct serdev_controller *ctrl = serdev->ctrl; + device_unregister(&serdev->dev); + ctrl->serdev = NULL; } EXPORT_SYMBOL_GPL(serdev_device_remove); @@ -295,7 +309,6 @@ struct serdev_device *serdev_device_allo return NULL; serdev->ctrl = ctrl; - ctrl->serdev = serdev; device_initialize(&serdev->dev); serdev->dev.parent = &ctrl->dev; serdev->dev.bus = &serdev_bus_type; Patches currently in stable-queue which might be from johan(a)kernel.org are queue-4.14/serdev-fix-registration-of-second-slave.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "sched: Make resched_cpu() unconditional" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled sched: Make resched_cpu() unconditional to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: sched-make-resched_cpu-unconditional.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com> Date: Mon, 18 Sep 2017 08:54:40 -0700 Subject: sched: Make resched_cpu() unconditional From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream. The current implementation of synchronize_sched_expedited() incorrectly assumes that resched_cpu() is unconditional, which it is not. This means that synchronize_sched_expedited() can hang when resched_cpu()'s trylock fails as follows (analysis by Neeraj Upadhyay): o CPU1 is waiting for expedited wait to complete: sync_rcu_exp_select_cpus rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5 IPI sent to CPU5 synchronize_sched_expedited_wait ret = swait_event_timeout(rsp->expedited_wq, sync_rcu_preempt_exp_done(rnp_root), jiffies_stall); expmask = 0x20, CPU 5 in idle path (in cpuidle_enter()) o CPU5 handles IPI and fails to acquire rq lock. Handles IPI sync_sched_exp_handler resched_cpu returns while failing to try lock acquire rq->lock need_resched is not set o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to idle (schedule() is not called). o CPU 1 reports RCU stall. Given that resched_cpu() is now used only by RCU, this commit fixes the assumption by making resched_cpu() unconditional. Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -505,8 +505,7 @@ void resched_cpu(int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - if (!raw_spin_trylock_irqsave(&rq->lock, flags)) - return; + raw_spin_lock_irqsave(&rq->lock, flags); resched_curr(rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are queue-4.14/sched-make-resched_cpu-unconditional.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "lib/mpi: call cond_resched() from mpi_powm() loop" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled lib/mpi: call cond_resched() from mpi_powm() loop to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: lib-mpi-call-cond_resched-from-mpi_powm-loop.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Tue, 7 Nov 2017 14:15:27 -0800 Subject: lib/mpi: call cond_resched() from mpi_powm() loop From: Eric Biggers <ebiggers(a)google.com> commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream. On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the largest permitted inputs (16384 bits), the kernel spends 10+ seconds doing modular exponentiation in mpi_powm() without rescheduling. If all threads do it, it locks up the system. Moreover, it can cause rcu_sched-stall warnings. Notwithstanding the insanity of doing this calculation in kernel mode rather than in userspace, fix it by calling cond_resched() as each bit from the exponent is processed. It's still noninterruptible, but at least it's preemptible now. Do the cond_resched() once per bit rather than once per MPI limb because each limb might still easily take 100+ milliseconds on slow CPUs. Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- lib/mpi/mpi-pow.c | 2 ++ 1 file changed, 2 insertions(+) --- a/lib/mpi/mpi-pow.c +++ b/lib/mpi/mpi-pow.c @@ -26,6 +26,7 @@ * however I decided to publish this code under the plain GPL. */ +#include <linux/sched.h> #include <linux/string.h> #include "mpi-internal.h" #include "longlong.h" @@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp, } e <<= 1; c--; + cond_resched(); } i--; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-4.14/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: cpufreq-schedutil-reset-cached_raw_freq-when-not-in-sync-with-next_freq.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 Mon Sep 17 00:00:00 2001 From: Viresh Kumar <viresh.kumar(a)linaro.org> Date: Wed, 8 Nov 2017 20:23:55 +0530 Subject: cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq From: Viresh Kumar <viresh.kumar(a)linaro.org> commit 07458f6a5171d97511dfbdf6ce549ed2ca0280c7 upstream. 'cached_raw_freq' is used to get the next frequency quickly but should always be in sync with sg_policy->next_freq. There is a case where it is not and in such cases it should be reset to avoid switching to incorrect frequencies. Consider this case for example: - policy->cur is 1.2 GHz (Max) - New request comes for 780 MHz and we store that in cached_raw_freq. - Based on 780 MHz, we calculate the effective frequency as 800 MHz. - We then see the CPU wasn't idle recently and choose to keep the next freq as 1.2 GHz. - Now we have cached_raw_freq is 780 MHz and sg_policy->next_freq is 1.2 GHz. - Now if the utilization doesn't change in then next request, then the next target frequency will still be 780 MHz and it will match with cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here. Fixes: b7eaf1aab9f8 (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely) Signed-off-by: Viresh Kumar <viresh.kumar(a)linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/cpufreq_schedutil.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -282,8 +282,12 @@ static void sugov_update_single(struct u * Do not reduce the frequency if the CPU has not been idle * recently, as the reduction is likely to be premature then. */ - if (busy && next_f < sg_policy->next_freq) + if (busy && next_f < sg_policy->next_freq) { next_f = sg_policy->next_freq; + + /* Reset cached freq as next_freq has changed */ + sg_policy->cached_raw_freq = 0; + } } sugov_update_commit(sg_policy, time, next_f); } Patches currently in stable-queue which might be from viresh.kumar(a)linaro.org are queue-4.14/cpufreq-schedutil-reset-cached_raw_freq-when-not-in-sync-with-next_freq.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "sched: Make resched_cpu() unconditional" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled sched: Make resched_cpu() unconditional to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: sched-make-resched_cpu-unconditional.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com> Date: Mon, 18 Sep 2017 08:54:40 -0700 Subject: sched: Make resched_cpu() unconditional From: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> commit 7c2102e56a3f7d85b5d8f33efbd7aecc1f36fdd8 upstream. The current implementation of synchronize_sched_expedited() incorrectly assumes that resched_cpu() is unconditional, which it is not. This means that synchronize_sched_expedited() can hang when resched_cpu()'s trylock fails as follows (analysis by Neeraj Upadhyay): o CPU1 is waiting for expedited wait to complete: sync_rcu_exp_select_cpus rdp->exp_dynticks_snap & 0x1 // returns 1 for CPU5 IPI sent to CPU5 synchronize_sched_expedited_wait ret = swait_event_timeout(rsp->expedited_wq, sync_rcu_preempt_exp_done(rnp_root), jiffies_stall); expmask = 0x20, CPU 5 in idle path (in cpuidle_enter()) o CPU5 handles IPI and fails to acquire rq lock. Handles IPI sync_sched_exp_handler resched_cpu returns while failing to try lock acquire rq->lock need_resched is not set o CPU5 calls rcu_idle_enter() and as need_resched is not set, goes to idle (schedule() is not called). o CPU 1 reports RCU stall. Given that resched_cpu() is now used only by RCU, this commit fixes the assumption by making resched_cpu() unconditional. Reported-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Suggested-by: Neeraj Upadhyay <neeraju(a)codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- kernel/sched/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -632,8 +632,7 @@ void resched_cpu(int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - if (!raw_spin_trylock_irqsave(&rq->lock, flags)) - return; + raw_spin_lock_irqsave(&rq->lock, flags); resched_curr(rq); raw_spin_unlock_irqrestore(&rq->lock, flags); } Patches currently in stable-queue which might be from paulmck(a)linux.vnet.ibm.com are queue-3.18/sched-make-resched_cpu-unconditional.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "lib/mpi: call cond_resched() from mpi_powm() loop" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled lib/mpi: call cond_resched() from mpi_powm() loop to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: lib-mpi-call-cond_resched-from-mpi_powm-loop.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca Mon Sep 17 00:00:00 2001 From: Eric Biggers <ebiggers(a)google.com> Date: Tue, 7 Nov 2017 14:15:27 -0800 Subject: lib/mpi: call cond_resched() from mpi_powm() loop From: Eric Biggers <ebiggers(a)google.com> commit 1d9ddde12e3c9bab7f3d3484eb9446315e3571ca upstream. On a non-preemptible kernel, if KEYCTL_DH_COMPUTE is called with the largest permitted inputs (16384 bits), the kernel spends 10+ seconds doing modular exponentiation in mpi_powm() without rescheduling. If all threads do it, it locks up the system. Moreover, it can cause rcu_sched-stall warnings. Notwithstanding the insanity of doing this calculation in kernel mode rather than in userspace, fix it by calling cond_resched() as each bit from the exponent is processed. It's still noninterruptible, but at least it's preemptible now. Do the cond_resched() once per bit rather than once per MPI limb because each limb might still easily take 100+ milliseconds on slow CPUs. Signed-off-by: Eric Biggers <ebiggers(a)google.com> Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- lib/mpi/mpi-pow.c | 2 ++ 1 file changed, 2 insertions(+) --- a/lib/mpi/mpi-pow.c +++ b/lib/mpi/mpi-pow.c @@ -26,6 +26,7 @@ * however I decided to publish this code under the plain GPL. */ +#include <linux/sched.h> #include <linux/string.h> #include "mpi-internal.h" #include "longlong.h" @@ -256,6 +257,7 @@ int mpi_powm(MPI res, MPI base, MPI exp, } e <<= 1; c--; + cond_resched(); } i--; Patches currently in stable-queue which might be from ebiggers(a)google.com are queue-3.18/lib-mpi-call-cond_resched-from-mpi_powm-loop.patch

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH] sunxi-rsb: Include OF based modalias in device uevent

by Stefan Brüns

Include the OF-based modalias in the uevent sent when registering devices on the sunxi RSB bus, so that user space has a chance to autoload the kernel module for the device. Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64: pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for the AXP803 PMIC is built as a module, it is not loaded and the system ends up with an disfunctional MMC controller. Cc: stable <stable(a)vger.kernel.org> Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de> --- drivers/bus/sunxi-rsb.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/bus/sunxi-rsb.c b/drivers/bus/sunxi-rsb.c index 328ca93781cf..37cb57244cbe 100644 --- a/drivers/bus/sunxi-rsb.c +++ b/drivers/bus/sunxi-rsb.c @@ -173,11 +173,24 @@ static int sunxi_rsb_device_remove(struct device *dev) return drv->remove(to_sunxi_rsb_device(dev)); } +static int sunxi_rsb_device_uevent(struct device *dev, + struct kobj_uevent_env *env) +{ + int ret; + + ret = of_device_uevent_modalias(dev, env); + if (ret != -ENODEV) + return ret; + + return 0; +} + static struct bus_type sunxi_rsb_bus = { .name = RSB_CTRL_NAME, .match = sunxi_rsb_device_match, .probe = sunxi_rsb_device_probe, .remove = sunxi_rsb_device_remove, + .uevent = sunxi_rsb_device_uevent, }; static void sunxi_rsb_dev_release(struct device *dev) -- 2.15.0

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ACPI / APEI: Remove arch_apei_flush_tlb_one()" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / APEI: Remove arch_apei_flush_tlb_one() to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-apei-remove-arch_apei_flush_tlb_one.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:27 +0000 Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one() From: James Morse <james.morse(a)arm.com> commit 4a75aeacda3c2455954596593d89187df5420d0a upstream. Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_pte_vaddr() to do the invalidation when called from clear_fixmap() Remove arch_apei_flush_tlb_one(). Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/acpi/apei.c | 5 ----- include/acpi/apei.h | 1 - 2 files changed, 6 deletions(-) --- a/arch/x86/kernel/acpi/apei.c +++ b/arch/x86/kernel/acpi/apei.c @@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev, apei_mce_report_mem_error(sev, mem_err); #endif } - -void arch_apei_flush_tlb_one(unsigned long addr) -{ - __flush_tlb_one(addr); -} --- a/include/acpi/apei.h +++ b/include/acpi/apei.h @@ -44,7 +44,6 @@ int erst_clear(u64 record_id); int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data); void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err); -void arch_apei_flush_tlb_one(unsigned long addr); #endif #endif Patches currently in stable-queue which might be from james.morse(a)arm.com are queue-4.9/acpi-apei-remove-arch_apei_flush_tlb_one.patch

8 years

3
3
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/debug: Fix task state recording/printout" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3f5fe9fef5b2da06b6319fab8123056da5217c3f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx(a)linutronix.de> Date: Wed, 22 Nov 2017 13:05:48 +0100 Subject: [PATCH] sched/debug: Fix task state recording/printout The recent conversion of the task state recording to use task_state_index() broke the sched_switch tracepoint task state output. task_state_index() returns surprisingly an index (0-7) which is then printed with __print_flags() applying bitmasks. Not really working and resulting in weird states like 'prev_state=t' instead of 'prev_state=I'. Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a bitmask from the return value of task_state_index() and store it in entry->prev_state, which makes __print_flags() work as expected. Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: stable(a)vger.kernel.org Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h index 306b31de5194..bc01e06bc716 100644 --- a/include/trace/events/sched.h +++ b/include/trace/events/sched.h @@ -116,9 +116,9 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct * * RUNNING (we will not have dequeued if state != RUNNING). */ if (preempt) - return TASK_STATE_MAX; + return TASK_REPORT_MAX; - return task_state_index(p); + return 1 << task_state_index(p); } #endif /* CREATE_TRACE_POINTS */ @@ -164,7 +164,7 @@ TRACE_EVENT(sched_switch, { 0x40, "P" }, { 0x80, "I" }) : "R", - __entry->prev_state & TASK_STATE_MAX ? "+" : "", + __entry->prev_state & TASK_REPORT_MAX ? "+" : "", __entry->next_comm, __entry->next_pid, __entry->next_prio) );

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/rt: Use IPI to trigger RT task push migration instead" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001 From: Steven Rostedt <rostedt(a)goodmis.org> Date: Wed, 18 Mar 2015 14:49:46 -0400 Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead of pulling When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org> Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Joern Engel <joern(a)purestorage.com> Cc: Clark Williams <williams(a)redhat.com> Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com> Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 90284d117fe6..91e33cd485f6 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true) */ SCHED_FEAT(TTWU_QUEUE, true) +#ifdef HAVE_RT_PUSH_IPI +/* + * In order to avoid a thundering herd attack of CPUs that are + * lowering their priorities at the same time, and there being + * a single CPU that has an RT task that can migrate and is waiting + * to run, where the other CPUs will try to take that CPUs + * rq lock and possibly create a large contention, sending an + * IPI to that CPU and let that CPU push the RT task to where + * it should go may be a better scenario. + */ +SCHED_FEAT(RT_PUSH_IPI, true) +#endif + SCHED_FEAT(FORCE_SD_OVERLAP, false) SCHED_FEAT(RT_RUNTIME_SHARE, true) SCHED_FEAT(LB_MIN, false) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f4d4b077eba0..ad0241561c3e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -6,6 +6,7 @@ #include "sched.h" #include <linux/slab.h> +#include <linux/irq_work.h> int sched_rr_timeslice = RR_TIMESLICE; @@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } +#ifdef CONFIG_SMP +static void push_irq_work_func(struct irq_work *work); +#endif + void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) { struct rt_prio_array *array; @@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); + +#ifdef HAVE_RT_PUSH_IPI + rt_rq->push_flags = 0; + rt_rq->push_cpu = nr_cpu_ids; + raw_spin_lock_init(&rt_rq->push_lock); + init_irq_work(&rt_rq->push_work, push_irq_work_func); #endif +#endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq) ; } +#ifdef HAVE_RT_PUSH_IPI +/* + * The search for the next cpu always starts at rq->cpu and ends + * when we reach rq->cpu again. It will never return rq->cpu. + * This returns the next cpu to check, or nr_cpu_ids if the loop + * is complete. + * + * rq->rt.push_cpu holds the last cpu returned by this function, + * or if this is the first instance, it must hold rq->cpu. + */ +static int rto_next_cpu(struct rq *rq) +{ + int prev_cpu = rq->rt.push_cpu; + int cpu; + + cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); + + /* + * If the previous cpu is less than the rq's CPU, then it already + * passed the end of the mask, and has started from the beginning. + * We end if the next CPU is greater or equal to rq's CPU. + */ + if (prev_cpu < rq->cpu) { + if (cpu >= rq->cpu) + return nr_cpu_ids; + + } else if (cpu >= nr_cpu_ids) { + /* + * We passed the end of the mask, start at the beginning. + * If the result is greater or equal to the rq's CPU, then + * the loop is finished. + */ + cpu = cpumask_first(rq->rd->rto_mask); + if (cpu >= rq->cpu) + return nr_cpu_ids; + } + rq->rt.push_cpu = cpu; + + /* Return cpu to let the caller know if the loop is finished or not */ + return cpu; +} + +static int find_next_push_cpu(struct rq *rq) +{ + struct rq *next_rq; + int cpu; + + while (1) { + cpu = rto_next_cpu(rq); + if (cpu >= nr_cpu_ids) + break; + next_rq = cpu_rq(cpu); + + /* Make sure the next rq can push to this rq */ + if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + break; + } + + return cpu; +} + +#define RT_PUSH_IPI_EXECUTING 1 +#define RT_PUSH_IPI_RESTART 2 + +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu; + + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + raw_spin_lock(&rq->rt.push_lock); + /* Make sure it's still executing */ + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + /* + * Tell the IPI to restart the loop as things have + * changed since it started. + */ + rq->rt.push_flags |= RT_PUSH_IPI_RESTART; + raw_spin_unlock(&rq->rt.push_lock); + return; + } + raw_spin_unlock(&rq->rt.push_lock); + } + + /* When here, there's no IPI going around */ + + rq->rt.push_cpu = rq->cpu; + cpu = find_next_push_cpu(rq); + if (cpu >= nr_cpu_ids) + return; + + rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + + irq_work_queue_on(&rq->rt.push_work, cpu); +} + +/* Called from hardirq context */ +static void try_to_push_tasks(void *arg) +{ + struct rt_rq *rt_rq = arg; + struct rq *rq, *src_rq; + int this_cpu; + int cpu; + + this_cpu = rt_rq->push_cpu; + + /* Paranoid check */ + BUG_ON(this_cpu != smp_processor_id()); + + rq = cpu_rq(this_cpu); + src_rq = rq_of_rt_rq(rt_rq); + +again: + if (has_pushable_tasks(rq)) { + raw_spin_lock(&rq->lock); + push_rt_task(rq); + raw_spin_unlock(&rq->lock); + } + + /* Pass the IPI to the next rt overloaded queue */ + raw_spin_lock(&rt_rq->push_lock); + /* + * If the source queue changed since the IPI went out, + * we need to restart the search from that CPU again. + */ + if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { + rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; + rt_rq->push_cpu = src_rq->cpu; + } + + cpu = find_next_push_cpu(src_rq); + + if (cpu >= nr_cpu_ids) + rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; + raw_spin_unlock(&rt_rq->push_lock); + + if (cpu >= nr_cpu_ids) + return; + + /* + * It is possible that a restart caused this CPU to be + * chosen again. Don't bother with an IPI, just see if we + * have more to push. + */ + if (unlikely(cpu == rq->cpu)) + goto again; + + /* Try the next RT overloaded CPU */ + irq_work_queue_on(&rt_rq->push_work, cpu); +} + +static void push_irq_work_func(struct irq_work *work) +{ + struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); + + try_to_push_tasks(rt_rq); +} +#endif /* HAVE_RT_PUSH_IPI */ + static int pull_rt_task(struct rq *this_rq) { int this_cpu = this_rq->cpu, ret = 0, cpu; @@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq) */ smp_rmb(); +#ifdef HAVE_RT_PUSH_IPI + if (sched_feat(RT_PUSH_IPI)) { + tell_cpu_to_push(this_rq); + return 0; + } +#endif + for_each_cpu(cpu, this_rq->rd->rto_mask) { if (this_cpu == cpu) continue; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index dc0f435a2779..c2c0d7bd5027 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -6,6 +6,7 @@ #include <linux/mutex.h> #include <linux/spinlock.h> #include <linux/stop_machine.h> +#include <linux/irq_work.h> #include <linux/tick.h> #include <linux/slab.h> @@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void) return sysctl_sched_rt_runtime >= 0; } +/* RT IPI pull logic requires IRQ_WORK */ +#ifdef CONFIG_IRQ_WORK +# define HAVE_RT_PUSH_IPI +#endif + /* Real-Time classes' related field in a runqueue: */ struct rt_rq { struct rt_prio_array active; @@ -435,7 +441,13 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; +#ifdef HAVE_RT_PUSH_IPI + int push_flags; + int push_cpu; + struct irq_work push_work; + raw_spinlock_t push_lock; #endif +#endif /* CONFIG_SMP */ int rt_queued; int rt_throttled;

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/rt: Use IPI to trigger RT task push migration instead" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001 From: Steven Rostedt <rostedt(a)goodmis.org> Date: Wed, 18 Mar 2015 14:49:46 -0400 Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead of pulling When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org> Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Joern Engel <joern(a)purestorage.com> Cc: Clark Williams <williams(a)redhat.com> Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com> Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 90284d117fe6..91e33cd485f6 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true) */ SCHED_FEAT(TTWU_QUEUE, true) +#ifdef HAVE_RT_PUSH_IPI +/* + * In order to avoid a thundering herd attack of CPUs that are + * lowering their priorities at the same time, and there being + * a single CPU that has an RT task that can migrate and is waiting + * to run, where the other CPUs will try to take that CPUs + * rq lock and possibly create a large contention, sending an + * IPI to that CPU and let that CPU push the RT task to where + * it should go may be a better scenario. + */ +SCHED_FEAT(RT_PUSH_IPI, true) +#endif + SCHED_FEAT(FORCE_SD_OVERLAP, false) SCHED_FEAT(RT_RUNTIME_SHARE, true) SCHED_FEAT(LB_MIN, false) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f4d4b077eba0..ad0241561c3e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -6,6 +6,7 @@ #include "sched.h" #include <linux/slab.h> +#include <linux/irq_work.h> int sched_rr_timeslice = RR_TIMESLICE; @@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } +#ifdef CONFIG_SMP +static void push_irq_work_func(struct irq_work *work); +#endif + void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) { struct rt_prio_array *array; @@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); + +#ifdef HAVE_RT_PUSH_IPI + rt_rq->push_flags = 0; + rt_rq->push_cpu = nr_cpu_ids; + raw_spin_lock_init(&rt_rq->push_lock); + init_irq_work(&rt_rq->push_work, push_irq_work_func); #endif +#endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq) ; } +#ifdef HAVE_RT_PUSH_IPI +/* + * The search for the next cpu always starts at rq->cpu and ends + * when we reach rq->cpu again. It will never return rq->cpu. + * This returns the next cpu to check, or nr_cpu_ids if the loop + * is complete. + * + * rq->rt.push_cpu holds the last cpu returned by this function, + * or if this is the first instance, it must hold rq->cpu. + */ +static int rto_next_cpu(struct rq *rq) +{ + int prev_cpu = rq->rt.push_cpu; + int cpu; + + cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); + + /* + * If the previous cpu is less than the rq's CPU, then it already + * passed the end of the mask, and has started from the beginning. + * We end if the next CPU is greater or equal to rq's CPU. + */ + if (prev_cpu < rq->cpu) { + if (cpu >= rq->cpu) + return nr_cpu_ids; + + } else if (cpu >= nr_cpu_ids) { + /* + * We passed the end of the mask, start at the beginning. + * If the result is greater or equal to the rq's CPU, then + * the loop is finished. + */ + cpu = cpumask_first(rq->rd->rto_mask); + if (cpu >= rq->cpu) + return nr_cpu_ids; + } + rq->rt.push_cpu = cpu; + + /* Return cpu to let the caller know if the loop is finished or not */ + return cpu; +} + +static int find_next_push_cpu(struct rq *rq) +{ + struct rq *next_rq; + int cpu; + + while (1) { + cpu = rto_next_cpu(rq); + if (cpu >= nr_cpu_ids) + break; + next_rq = cpu_rq(cpu); + + /* Make sure the next rq can push to this rq */ + if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + break; + } + + return cpu; +} + +#define RT_PUSH_IPI_EXECUTING 1 +#define RT_PUSH_IPI_RESTART 2 + +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu; + + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + raw_spin_lock(&rq->rt.push_lock); + /* Make sure it's still executing */ + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + /* + * Tell the IPI to restart the loop as things have + * changed since it started. + */ + rq->rt.push_flags |= RT_PUSH_IPI_RESTART; + raw_spin_unlock(&rq->rt.push_lock); + return; + } + raw_spin_unlock(&rq->rt.push_lock); + } + + /* When here, there's no IPI going around */ + + rq->rt.push_cpu = rq->cpu; + cpu = find_next_push_cpu(rq); + if (cpu >= nr_cpu_ids) + return; + + rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + + irq_work_queue_on(&rq->rt.push_work, cpu); +} + +/* Called from hardirq context */ +static void try_to_push_tasks(void *arg) +{ + struct rt_rq *rt_rq = arg; + struct rq *rq, *src_rq; + int this_cpu; + int cpu; + + this_cpu = rt_rq->push_cpu; + + /* Paranoid check */ + BUG_ON(this_cpu != smp_processor_id()); + + rq = cpu_rq(this_cpu); + src_rq = rq_of_rt_rq(rt_rq); + +again: + if (has_pushable_tasks(rq)) { + raw_spin_lock(&rq->lock); + push_rt_task(rq); + raw_spin_unlock(&rq->lock); + } + + /* Pass the IPI to the next rt overloaded queue */ + raw_spin_lock(&rt_rq->push_lock); + /* + * If the source queue changed since the IPI went out, + * we need to restart the search from that CPU again. + */ + if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { + rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; + rt_rq->push_cpu = src_rq->cpu; + } + + cpu = find_next_push_cpu(src_rq); + + if (cpu >= nr_cpu_ids) + rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; + raw_spin_unlock(&rt_rq->push_lock); + + if (cpu >= nr_cpu_ids) + return; + + /* + * It is possible that a restart caused this CPU to be + * chosen again. Don't bother with an IPI, just see if we + * have more to push. + */ + if (unlikely(cpu == rq->cpu)) + goto again; + + /* Try the next RT overloaded CPU */ + irq_work_queue_on(&rt_rq->push_work, cpu); +} + +static void push_irq_work_func(struct irq_work *work) +{ + struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); + + try_to_push_tasks(rt_rq); +} +#endif /* HAVE_RT_PUSH_IPI */ + static int pull_rt_task(struct rq *this_rq) { int this_cpu = this_rq->cpu, ret = 0, cpu; @@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq) */ smp_rmb(); +#ifdef HAVE_RT_PUSH_IPI + if (sched_feat(RT_PUSH_IPI)) { + tell_cpu_to_push(this_rq); + return 0; + } +#endif + for_each_cpu(cpu, this_rq->rd->rto_mask) { if (this_cpu == cpu) continue; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index dc0f435a2779..c2c0d7bd5027 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -6,6 +6,7 @@ #include <linux/mutex.h> #include <linux/spinlock.h> #include <linux/stop_machine.h> +#include <linux/irq_work.h> #include <linux/tick.h> #include <linux/slab.h> @@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void) return sysctl_sched_rt_runtime >= 0; } +/* RT IPI pull logic requires IRQ_WORK */ +#ifdef CONFIG_IRQ_WORK +# define HAVE_RT_PUSH_IPI +#endif + /* Real-Time classes' related field in a runqueue: */ struct rt_rq { struct rt_prio_array active; @@ -435,7 +441,13 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; +#ifdef HAVE_RT_PUSH_IPI + int push_flags; + int push_cpu; + struct irq_work push_work; + raw_spinlock_t push_lock; #endif +#endif /* CONFIG_SMP */ int rt_queued; int rt_throttled;

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/rt: Use IPI to trigger RT task push migration instead" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001 From: Steven Rostedt <rostedt(a)goodmis.org> Date: Wed, 18 Mar 2015 14:49:46 -0400 Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead of pulling When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org> Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Joern Engel <joern(a)purestorage.com> Cc: Clark Williams <williams(a)redhat.com> Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com> Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 90284d117fe6..91e33cd485f6 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true) */ SCHED_FEAT(TTWU_QUEUE, true) +#ifdef HAVE_RT_PUSH_IPI +/* + * In order to avoid a thundering herd attack of CPUs that are + * lowering their priorities at the same time, and there being + * a single CPU that has an RT task that can migrate and is waiting + * to run, where the other CPUs will try to take that CPUs + * rq lock and possibly create a large contention, sending an + * IPI to that CPU and let that CPU push the RT task to where + * it should go may be a better scenario. + */ +SCHED_FEAT(RT_PUSH_IPI, true) +#endif + SCHED_FEAT(FORCE_SD_OVERLAP, false) SCHED_FEAT(RT_RUNTIME_SHARE, true) SCHED_FEAT(LB_MIN, false) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f4d4b077eba0..ad0241561c3e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -6,6 +6,7 @@ #include "sched.h" #include <linux/slab.h> +#include <linux/irq_work.h> int sched_rr_timeslice = RR_TIMESLICE; @@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } +#ifdef CONFIG_SMP +static void push_irq_work_func(struct irq_work *work); +#endif + void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) { struct rt_prio_array *array; @@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); + +#ifdef HAVE_RT_PUSH_IPI + rt_rq->push_flags = 0; + rt_rq->push_cpu = nr_cpu_ids; + raw_spin_lock_init(&rt_rq->push_lock); + init_irq_work(&rt_rq->push_work, push_irq_work_func); #endif +#endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq) ; } +#ifdef HAVE_RT_PUSH_IPI +/* + * The search for the next cpu always starts at rq->cpu and ends + * when we reach rq->cpu again. It will never return rq->cpu. + * This returns the next cpu to check, or nr_cpu_ids if the loop + * is complete. + * + * rq->rt.push_cpu holds the last cpu returned by this function, + * or if this is the first instance, it must hold rq->cpu. + */ +static int rto_next_cpu(struct rq *rq) +{ + int prev_cpu = rq->rt.push_cpu; + int cpu; + + cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); + + /* + * If the previous cpu is less than the rq's CPU, then it already + * passed the end of the mask, and has started from the beginning. + * We end if the next CPU is greater or equal to rq's CPU. + */ + if (prev_cpu < rq->cpu) { + if (cpu >= rq->cpu) + return nr_cpu_ids; + + } else if (cpu >= nr_cpu_ids) { + /* + * We passed the end of the mask, start at the beginning. + * If the result is greater or equal to the rq's CPU, then + * the loop is finished. + */ + cpu = cpumask_first(rq->rd->rto_mask); + if (cpu >= rq->cpu) + return nr_cpu_ids; + } + rq->rt.push_cpu = cpu; + + /* Return cpu to let the caller know if the loop is finished or not */ + return cpu; +} + +static int find_next_push_cpu(struct rq *rq) +{ + struct rq *next_rq; + int cpu; + + while (1) { + cpu = rto_next_cpu(rq); + if (cpu >= nr_cpu_ids) + break; + next_rq = cpu_rq(cpu); + + /* Make sure the next rq can push to this rq */ + if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + break; + } + + return cpu; +} + +#define RT_PUSH_IPI_EXECUTING 1 +#define RT_PUSH_IPI_RESTART 2 + +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu; + + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + raw_spin_lock(&rq->rt.push_lock); + /* Make sure it's still executing */ + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + /* + * Tell the IPI to restart the loop as things have + * changed since it started. + */ + rq->rt.push_flags |= RT_PUSH_IPI_RESTART; + raw_spin_unlock(&rq->rt.push_lock); + return; + } + raw_spin_unlock(&rq->rt.push_lock); + } + + /* When here, there's no IPI going around */ + + rq->rt.push_cpu = rq->cpu; + cpu = find_next_push_cpu(rq); + if (cpu >= nr_cpu_ids) + return; + + rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + + irq_work_queue_on(&rq->rt.push_work, cpu); +} + +/* Called from hardirq context */ +static void try_to_push_tasks(void *arg) +{ + struct rt_rq *rt_rq = arg; + struct rq *rq, *src_rq; + int this_cpu; + int cpu; + + this_cpu = rt_rq->push_cpu; + + /* Paranoid check */ + BUG_ON(this_cpu != smp_processor_id()); + + rq = cpu_rq(this_cpu); + src_rq = rq_of_rt_rq(rt_rq); + +again: + if (has_pushable_tasks(rq)) { + raw_spin_lock(&rq->lock); + push_rt_task(rq); + raw_spin_unlock(&rq->lock); + } + + /* Pass the IPI to the next rt overloaded queue */ + raw_spin_lock(&rt_rq->push_lock); + /* + * If the source queue changed since the IPI went out, + * we need to restart the search from that CPU again. + */ + if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { + rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; + rt_rq->push_cpu = src_rq->cpu; + } + + cpu = find_next_push_cpu(src_rq); + + if (cpu >= nr_cpu_ids) + rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; + raw_spin_unlock(&rt_rq->push_lock); + + if (cpu >= nr_cpu_ids) + return; + + /* + * It is possible that a restart caused this CPU to be + * chosen again. Don't bother with an IPI, just see if we + * have more to push. + */ + if (unlikely(cpu == rq->cpu)) + goto again; + + /* Try the next RT overloaded CPU */ + irq_work_queue_on(&rt_rq->push_work, cpu); +} + +static void push_irq_work_func(struct irq_work *work) +{ + struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); + + try_to_push_tasks(rt_rq); +} +#endif /* HAVE_RT_PUSH_IPI */ + static int pull_rt_task(struct rq *this_rq) { int this_cpu = this_rq->cpu, ret = 0, cpu; @@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq) */ smp_rmb(); +#ifdef HAVE_RT_PUSH_IPI + if (sched_feat(RT_PUSH_IPI)) { + tell_cpu_to_push(this_rq); + return 0; + } +#endif + for_each_cpu(cpu, this_rq->rd->rto_mask) { if (this_cpu == cpu) continue; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index dc0f435a2779..c2c0d7bd5027 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -6,6 +6,7 @@ #include <linux/mutex.h> #include <linux/spinlock.h> #include <linux/stop_machine.h> +#include <linux/irq_work.h> #include <linux/tick.h> #include <linux/slab.h> @@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void) return sysctl_sched_rt_runtime >= 0; } +/* RT IPI pull logic requires IRQ_WORK */ +#ifdef CONFIG_IRQ_WORK +# define HAVE_RT_PUSH_IPI +#endif + /* Real-Time classes' related field in a runqueue: */ struct rt_rq { struct rt_prio_array active; @@ -435,7 +441,13 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; +#ifdef HAVE_RT_PUSH_IPI + int push_flags; + int push_cpu; + struct irq_work push_work; + raw_spinlock_t push_lock; #endif +#endif /* CONFIG_SMP */ int rt_queued; int rt_throttled;

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] sched/rt: Use IPI to trigger RT task push migration instead" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b6366f048e0caff28af5335b7af2031266e1b06b Mon Sep 17 00:00:00 2001 From: Steven Rostedt <rostedt(a)goodmis.org> Date: Wed, 18 Mar 2015 14:49:46 -0400 Subject: [PATCH] sched/rt: Use IPI to trigger RT task push migration instead of pulling When debugging the latencies on a 40 core box, where we hit 300 to 500 microsecond latencies, I found there was a huge contention on the runqueue locks. Investigating it further, running ftrace, I found that it was due to the pulling of RT tasks. The test that was run was the following: cyclictest --numa -p95 -m -d0 -i100 This created a thread on each CPU, that would set its wakeup in iterations of 100 microseconds. The -d0 means that all the threads had the same interval (100us). Each thread sleeps for 100us and wakes up and measures its latencies. cyclictest is maintained at: git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git What happened was another RT task would be scheduled on one of the CPUs that was running our test, when the other CPU tests went to sleep and scheduled idle. This caused the "pull" operation to execute on all these CPUs. Each one of these saw the RT task that was overloaded on the CPU of the test that was still running, and each one tried to grab that task in a thundering herd way. To grab the task, each thread would do a double rq lock grab, grabbing its own lock as well as the rq of the overloaded CPU. As the sched domains on this box was rather flat for its size, I saw up to 12 CPUs block on this lock at once. This caused a ripple affect with the rq locks especially since the taking was done via a double rq lock, which means that several of the CPUs had their own rq locks held while trying to take this rq lock. As these locks were blocked, any wakeups or load balanceing on these CPUs would also block on these locks, and the wait time escalated. I've tried various methods to lessen the load, but things like an atomic counter to only let one CPU grab the task wont work, because the task may have a limited affinity, and we may pick the wrong CPU to take that lock and do the pull, to only find out that the CPU we picked isn't in the task's affinity. Instead of doing the PULL, I now have the CPUs that want the pull to send over an IPI to the overloaded CPU, and let that CPU pick what CPU to push the task to. No more need to grab the rq lock, and the push/pull algorithm still works fine. With this patch, the latency dropped to just 150us over a 20 hour run. Without the patch, the huge latencies would trigger in seconds. I've created a new sched feature called RT_PUSH_IPI, which is enabled by default. When RT_PUSH_IPI is not enabled, the old method of grabbing the rq locks and having the pulling CPU do the work is implemented. When RT_PUSH_IPI is enabled, the IPI is sent to the overloaded CPU to do a push. To enabled or disable this at run time: # mount -t debugfs nodev /sys/kernel/debug # echo RT_PUSH_IPI > /sys/kernel/debug/sched_features or # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched_features Update: This original patch would send an IPI to all CPUs in the RT overload list. But that could theoretically cause the reverse issue. That is, there could be lots of overloaded RT queues and one CPU lowers its priority. It would then send an IPI to all the overloaded RT queues and they could then all try to grab the rq lock of the CPU lowering its priority, and then we have the same problem. The latest design sends out only one IPI to the first overloaded CPU. It tries to push any tasks that it can, and then looks for the next overloaded CPU that can push to the source CPU. The IPIs stop when all overloaded CPUs that have pushable tasks that have priorities greater than the source CPU are covered. In case the source CPU lowers its priority again, a flag is set to tell the IPI traversal to restart with the first RT overloaded CPU after the source CPU. Parts-suggested-by: Peter Zijlstra <peterz(a)infradead.org> Signed-off-by: Steven Rostedt <rostedt(a)goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: Joern Engel <joern(a)purestorage.com> Cc: Clark Williams <williams(a)redhat.com> Cc: Mike Galbraith <umgwanakikbuti(a)gmail.com> Cc: Paul E. McKenney <paulmck(a)linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Link: http://lkml.kernel.org/r/20150318144946.2f3cc982@gandalf.local.home Signed-off-by: Ingo Molnar <mingo(a)kernel.org> diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 90284d117fe6..91e33cd485f6 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -56,6 +56,19 @@ SCHED_FEAT(NONTASK_CAPACITY, true) */ SCHED_FEAT(TTWU_QUEUE, true) +#ifdef HAVE_RT_PUSH_IPI +/* + * In order to avoid a thundering herd attack of CPUs that are + * lowering their priorities at the same time, and there being + * a single CPU that has an RT task that can migrate and is waiting + * to run, where the other CPUs will try to take that CPUs + * rq lock and possibly create a large contention, sending an + * IPI to that CPU and let that CPU push the RT task to where + * it should go may be a better scenario. + */ +SCHED_FEAT(RT_PUSH_IPI, true) +#endif + SCHED_FEAT(FORCE_SD_OVERLAP, false) SCHED_FEAT(RT_RUNTIME_SHARE, true) SCHED_FEAT(LB_MIN, false) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f4d4b077eba0..ad0241561c3e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -6,6 +6,7 @@ #include "sched.h" #include <linux/slab.h> +#include <linux/irq_work.h> int sched_rr_timeslice = RR_TIMESLICE; @@ -59,6 +60,10 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b) raw_spin_unlock(&rt_b->rt_runtime_lock); } +#ifdef CONFIG_SMP +static void push_irq_work_func(struct irq_work *work); +#endif + void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) { struct rt_prio_array *array; @@ -78,7 +83,14 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq) rt_rq->rt_nr_migratory = 0; rt_rq->overloaded = 0; plist_head_init(&rt_rq->pushable_tasks); + +#ifdef HAVE_RT_PUSH_IPI + rt_rq->push_flags = 0; + rt_rq->push_cpu = nr_cpu_ids; + raw_spin_lock_init(&rt_rq->push_lock); + init_irq_work(&rt_rq->push_work, push_irq_work_func); #endif +#endif /* CONFIG_SMP */ /* We start is dequeued state, because no RT tasks are queued */ rt_rq->rt_queued = 0; @@ -1778,6 +1790,164 @@ static void push_rt_tasks(struct rq *rq) ; } +#ifdef HAVE_RT_PUSH_IPI +/* + * The search for the next cpu always starts at rq->cpu and ends + * when we reach rq->cpu again. It will never return rq->cpu. + * This returns the next cpu to check, or nr_cpu_ids if the loop + * is complete. + * + * rq->rt.push_cpu holds the last cpu returned by this function, + * or if this is the first instance, it must hold rq->cpu. + */ +static int rto_next_cpu(struct rq *rq) +{ + int prev_cpu = rq->rt.push_cpu; + int cpu; + + cpu = cpumask_next(prev_cpu, rq->rd->rto_mask); + + /* + * If the previous cpu is less than the rq's CPU, then it already + * passed the end of the mask, and has started from the beginning. + * We end if the next CPU is greater or equal to rq's CPU. + */ + if (prev_cpu < rq->cpu) { + if (cpu >= rq->cpu) + return nr_cpu_ids; + + } else if (cpu >= nr_cpu_ids) { + /* + * We passed the end of the mask, start at the beginning. + * If the result is greater or equal to the rq's CPU, then + * the loop is finished. + */ + cpu = cpumask_first(rq->rd->rto_mask); + if (cpu >= rq->cpu) + return nr_cpu_ids; + } + rq->rt.push_cpu = cpu; + + /* Return cpu to let the caller know if the loop is finished or not */ + return cpu; +} + +static int find_next_push_cpu(struct rq *rq) +{ + struct rq *next_rq; + int cpu; + + while (1) { + cpu = rto_next_cpu(rq); + if (cpu >= nr_cpu_ids) + break; + next_rq = cpu_rq(cpu); + + /* Make sure the next rq can push to this rq */ + if (next_rq->rt.highest_prio.next < rq->rt.highest_prio.curr) + break; + } + + return cpu; +} + +#define RT_PUSH_IPI_EXECUTING 1 +#define RT_PUSH_IPI_RESTART 2 + +static void tell_cpu_to_push(struct rq *rq) +{ + int cpu; + + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + raw_spin_lock(&rq->rt.push_lock); + /* Make sure it's still executing */ + if (rq->rt.push_flags & RT_PUSH_IPI_EXECUTING) { + /* + * Tell the IPI to restart the loop as things have + * changed since it started. + */ + rq->rt.push_flags |= RT_PUSH_IPI_RESTART; + raw_spin_unlock(&rq->rt.push_lock); + return; + } + raw_spin_unlock(&rq->rt.push_lock); + } + + /* When here, there's no IPI going around */ + + rq->rt.push_cpu = rq->cpu; + cpu = find_next_push_cpu(rq); + if (cpu >= nr_cpu_ids) + return; + + rq->rt.push_flags = RT_PUSH_IPI_EXECUTING; + + irq_work_queue_on(&rq->rt.push_work, cpu); +} + +/* Called from hardirq context */ +static void try_to_push_tasks(void *arg) +{ + struct rt_rq *rt_rq = arg; + struct rq *rq, *src_rq; + int this_cpu; + int cpu; + + this_cpu = rt_rq->push_cpu; + + /* Paranoid check */ + BUG_ON(this_cpu != smp_processor_id()); + + rq = cpu_rq(this_cpu); + src_rq = rq_of_rt_rq(rt_rq); + +again: + if (has_pushable_tasks(rq)) { + raw_spin_lock(&rq->lock); + push_rt_task(rq); + raw_spin_unlock(&rq->lock); + } + + /* Pass the IPI to the next rt overloaded queue */ + raw_spin_lock(&rt_rq->push_lock); + /* + * If the source queue changed since the IPI went out, + * we need to restart the search from that CPU again. + */ + if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) { + rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART; + rt_rq->push_cpu = src_rq->cpu; + } + + cpu = find_next_push_cpu(src_rq); + + if (cpu >= nr_cpu_ids) + rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING; + raw_spin_unlock(&rt_rq->push_lock); + + if (cpu >= nr_cpu_ids) + return; + + /* + * It is possible that a restart caused this CPU to be + * chosen again. Don't bother with an IPI, just see if we + * have more to push. + */ + if (unlikely(cpu == rq->cpu)) + goto again; + + /* Try the next RT overloaded CPU */ + irq_work_queue_on(&rt_rq->push_work, cpu); +} + +static void push_irq_work_func(struct irq_work *work) +{ + struct rt_rq *rt_rq = container_of(work, struct rt_rq, push_work); + + try_to_push_tasks(rt_rq); +} +#endif /* HAVE_RT_PUSH_IPI */ + static int pull_rt_task(struct rq *this_rq) { int this_cpu = this_rq->cpu, ret = 0, cpu; @@ -1793,6 +1963,13 @@ static int pull_rt_task(struct rq *this_rq) */ smp_rmb(); +#ifdef HAVE_RT_PUSH_IPI + if (sched_feat(RT_PUSH_IPI)) { + tell_cpu_to_push(this_rq); + return 0; + } +#endif + for_each_cpu(cpu, this_rq->rd->rto_mask) { if (this_cpu == cpu) continue; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index dc0f435a2779..c2c0d7bd5027 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -6,6 +6,7 @@ #include <linux/mutex.h> #include <linux/spinlock.h> #include <linux/stop_machine.h> +#include <linux/irq_work.h> #include <linux/tick.h> #include <linux/slab.h> @@ -418,6 +419,11 @@ static inline int rt_bandwidth_enabled(void) return sysctl_sched_rt_runtime >= 0; } +/* RT IPI pull logic requires IRQ_WORK */ +#ifdef CONFIG_IRQ_WORK +# define HAVE_RT_PUSH_IPI +#endif + /* Real-Time classes' related field in a runqueue: */ struct rt_rq { struct rt_prio_array active; @@ -435,7 +441,13 @@ struct rt_rq { unsigned long rt_nr_total; int overloaded; struct plist_head pushable_tasks; +#ifdef HAVE_RT_PUSH_IPI + int push_flags; + int push_cpu; + struct irq_work push_work; + raw_spinlock_t push_lock; #endif +#endif /* CONFIG_SMP */ int rt_queued; int rt_throttled;

8 years

1
0
0 0

[Linux-stable-mirror] Patch "vsock: use new wait API for vsock_stream_sendmsg()" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled vsock: use new wait API for vsock_stream_sendmsg() to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 499fde662f1957e3cb8d192a94a099ebe19c714b Mon Sep 17 00:00:00 2001 From: WANG Cong <xiyou.wangcong(a)gmail.com> Date: Fri, 19 May 2017 11:21:59 -0700 Subject: vsock: use new wait API for vsock_stream_sendmsg() From: WANG Cong <xiyou.wangcong(a)gmail.com> commit 499fde662f1957e3cb8d192a94a099ebe19c714b upstream. As reported by Michal, vsock_stream_sendmsg() could still sleep at vsock_stream_has_space() after prepare_to_wait(): vsock_stream_has_space vmci_transport_stream_has_space vmci_qpair_produce_free_space qp_lock qp_acquire_queue_mutex mutex_lock Just switch to the new wait API like we did for commit d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()"). Reported-by: Michal Kubecek <mkubecek(a)suse.cz> Cc: Stefan Hajnoczi <stefanha(a)redhat.com> Cc: Jorgen Hansen <jhansen(a)vmware.com> Cc: "Michael S. Tsirkin" <mst(a)redhat.com> Cc: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha(a)redhat.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: "Jorgen S. Hansen" <jhansen(a)vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/vmw_vsock/af_vsock.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1524,8 +1524,7 @@ static int vsock_stream_sendmsg(struct s long timeout; int err; struct vsock_transport_send_notify_data send_data; - - DEFINE_WAIT(wait); + DEFINE_WAIT_FUNC(wait, woken_wake_function); sk = sock->sk; vsk = vsock_sk(sk); @@ -1568,11 +1567,10 @@ static int vsock_stream_sendmsg(struct s if (err < 0) goto out; - while (total_written < len) { ssize_t written; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + add_wait_queue(sk_sleep(sk), &wait); while (vsock_stream_has_space(vsk) == 0 && sk->sk_err == 0 && !(sk->sk_shutdown & SEND_SHUTDOWN) && @@ -1581,33 +1579,30 @@ static int vsock_stream_sendmsg(struct s /* Don't wait for non-blocking sockets. */ if (timeout == 0) { err = -EAGAIN; - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } err = transport->notify_send_pre_block(vsk, &send_data); if (err < 0) { - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } release_sock(sk); - timeout = schedule_timeout(timeout); + timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); lock_sock(sk); if (signal_pending(current)) { err = sock_intr_errno(timeout); - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } else if (timeout == 0) { err = -EAGAIN; - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } - - prepare_to_wait(sk_sleep(sk), &wait, - TASK_INTERRUPTIBLE); } - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); /* These checks occur both as part of and after the loop * conditional since we need to check before and after Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are queue-4.9/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch queue-4.9/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001 From: WANG Cong <xiyou.wangcong(a)gmail.com> Date: Tue, 20 Jun 2017 11:42:27 -0700 Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER From: WANG Cong <xiyou.wangcong(a)gmail.com> commit 76da0704507bbc51875013f6557877ab308cfd0a upstream. In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy <jeffy.chen(a)rock-chips.com> Cc: David Ahern <dsahern(a)gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com> Acked-by: David Ahern <dsahern(a)gmail.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/ipv6/route.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3495,7 +3495,11 @@ static int ip6_route_dev_notify(struct n net->ipv6.ip6_blk_hole_entry->dst.dev = dev; net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev); #endif - } else if (event == NETDEV_UNREGISTER) { + } else if (event == NETDEV_UNREGISTER && + dev->reg_state != NETREG_UNREGISTERED) { + /* NETDEV_UNREGISTER could be fired for multiple times by + * netdev_wait_allrefs(). Make sure we only call this once. + */ in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev); #ifdef CONFIG_IPV6_MULTIPLE_TABLES in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev); Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are queue-4.9/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch queue-4.9/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "vsock: use new wait API for vsock_stream_sendmsg()" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled vsock: use new wait API for vsock_stream_sendmsg() to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 499fde662f1957e3cb8d192a94a099ebe19c714b Mon Sep 17 00:00:00 2001 From: WANG Cong <xiyou.wangcong(a)gmail.com> Date: Fri, 19 May 2017 11:21:59 -0700 Subject: vsock: use new wait API for vsock_stream_sendmsg() From: WANG Cong <xiyou.wangcong(a)gmail.com> commit 499fde662f1957e3cb8d192a94a099ebe19c714b upstream. As reported by Michal, vsock_stream_sendmsg() could still sleep at vsock_stream_has_space() after prepare_to_wait(): vsock_stream_has_space vmci_transport_stream_has_space vmci_qpair_produce_free_space qp_lock qp_acquire_queue_mutex mutex_lock Just switch to the new wait API like we did for commit d9dc8b0f8b4e ("net: fix sleeping for sk_wait_event()"). Reported-by: Michal Kubecek <mkubecek(a)suse.cz> Cc: Stefan Hajnoczi <stefanha(a)redhat.com> Cc: Jorgen Hansen <jhansen(a)vmware.com> Cc: "Michael S. Tsirkin" <mst(a)redhat.com> Cc: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com> Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha(a)redhat.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: "Jorgen S. Hansen" <jhansen(a)vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/vmw_vsock/af_vsock.c | 21 ++++++++------------- 1 file changed, 8 insertions(+), 13 deletions(-) --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1512,8 +1512,7 @@ static int vsock_stream_sendmsg(struct s long timeout; int err; struct vsock_transport_send_notify_data send_data; - - DEFINE_WAIT(wait); + DEFINE_WAIT_FUNC(wait, woken_wake_function); sk = sock->sk; vsk = vsock_sk(sk); @@ -1556,11 +1555,10 @@ static int vsock_stream_sendmsg(struct s if (err < 0) goto out; - while (total_written < len) { ssize_t written; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + add_wait_queue(sk_sleep(sk), &wait); while (vsock_stream_has_space(vsk) == 0 && sk->sk_err == 0 && !(sk->sk_shutdown & SEND_SHUTDOWN) && @@ -1569,33 +1567,30 @@ static int vsock_stream_sendmsg(struct s /* Don't wait for non-blocking sockets. */ if (timeout == 0) { err = -EAGAIN; - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } err = transport->notify_send_pre_block(vsk, &send_data); if (err < 0) { - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } release_sock(sk); - timeout = schedule_timeout(timeout); + timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); lock_sock(sk); if (signal_pending(current)) { err = sock_intr_errno(timeout); - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } else if (timeout == 0) { err = -EAGAIN; - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); goto out_err; } - - prepare_to_wait(sk_sleep(sk), &wait, - TASK_INTERRUPTIBLE); } - finish_wait(sk_sleep(sk), &wait); + remove_wait_queue(sk_sleep(sk), &wait); /* These checks occur both as part of and after the loop * conditional since we need to check before and after Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are queue-4.4/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch queue-4.4/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "AF_VSOCK: Shrink the area influenced by prepare_to_wait" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled AF_VSOCK: Shrink the area influenced by prepare_to_wait to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: af_vsock-shrink-the-area-influenced-by-prepare_to_wait.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From f7f9b5e7f8eccfd68ffa7b8d74b07c478bb9e7f0 Mon Sep 17 00:00:00 2001 From: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com> Date: Tue, 22 Mar 2016 17:05:52 +0100 Subject: AF_VSOCK: Shrink the area influenced by prepare_to_wait From: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com> commit f7f9b5e7f8eccfd68ffa7b8d74b07c478bb9e7f0 upstream. When a thread is prepared for waiting by calling prepare_to_wait, sleeping is not allowed until either the wait has taken place or finish_wait has been called. The existing code in af_vsock imposed unnecessary no-sleep assumptions to a broad list of backend functions. This patch shrinks the influence of prepare_to_wait to the area where it is strictly needed, therefore relaxing the no-sleep restriction there. Signed-off-by: Claudio Imbrenda <imbrenda(a)linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: "Jorgen S. Hansen" <jhansen(a)vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/vmw_vsock/af_vsock.c | 158 +++++++++++++++++++++++++---------------------- 1 file changed, 85 insertions(+), 73 deletions(-) --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1209,10 +1209,14 @@ static int vsock_stream_connect(struct s if (signal_pending(current)) { err = sock_intr_errno(timeout); - goto out_wait_error; + sk->sk_state = SS_UNCONNECTED; + sock->state = SS_UNCONNECTED; + goto out_wait; } else if (timeout == 0) { err = -ETIMEDOUT; - goto out_wait_error; + sk->sk_state = SS_UNCONNECTED; + sock->state = SS_UNCONNECTED; + goto out_wait; } prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); @@ -1220,20 +1224,17 @@ static int vsock_stream_connect(struct s if (sk->sk_err) { err = -sk->sk_err; - goto out_wait_error; - } else + sk->sk_state = SS_UNCONNECTED; + sock->state = SS_UNCONNECTED; + } else { err = 0; + } out_wait: finish_wait(sk_sleep(sk), &wait); out: release_sock(sk); return err; - -out_wait_error: - sk->sk_state = SS_UNCONNECTED; - sock->state = SS_UNCONNECTED; - goto out_wait; } static int vsock_accept(struct socket *sock, struct socket *newsock, int flags) @@ -1270,18 +1271,20 @@ static int vsock_accept(struct socket *s listener->sk_err == 0) { release_sock(listener); timeout = schedule_timeout(timeout); + finish_wait(sk_sleep(listener), &wait); lock_sock(listener); if (signal_pending(current)) { err = sock_intr_errno(timeout); - goto out_wait; + goto out; } else if (timeout == 0) { err = -EAGAIN; - goto out_wait; + goto out; } prepare_to_wait(sk_sleep(listener), &wait, TASK_INTERRUPTIBLE); } + finish_wait(sk_sleep(listener), &wait); if (listener->sk_err) err = -listener->sk_err; @@ -1301,19 +1304,15 @@ static int vsock_accept(struct socket *s */ if (err) { vconnected->rejected = true; - release_sock(connected); - sock_put(connected); - goto out_wait; + } else { + newsock->state = SS_CONNECTED; + sock_graft(connected, newsock); } - newsock->state = SS_CONNECTED; - sock_graft(connected, newsock); release_sock(connected); sock_put(connected); } -out_wait: - finish_wait(sk_sleep(listener), &wait); out: release_sock(listener); return err; @@ -1557,11 +1556,11 @@ static int vsock_stream_sendmsg(struct s if (err < 0) goto out; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); while (total_written < len) { ssize_t written; + prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); while (vsock_stream_has_space(vsk) == 0 && sk->sk_err == 0 && !(sk->sk_shutdown & SEND_SHUTDOWN) && @@ -1570,27 +1569,33 @@ static int vsock_stream_sendmsg(struct s /* Don't wait for non-blocking sockets. */ if (timeout == 0) { err = -EAGAIN; - goto out_wait; + finish_wait(sk_sleep(sk), &wait); + goto out_err; } err = transport->notify_send_pre_block(vsk, &send_data); - if (err < 0) - goto out_wait; + if (err < 0) { + finish_wait(sk_sleep(sk), &wait); + goto out_err; + } release_sock(sk); timeout = schedule_timeout(timeout); lock_sock(sk); if (signal_pending(current)) { err = sock_intr_errno(timeout); - goto out_wait; + finish_wait(sk_sleep(sk), &wait); + goto out_err; } else if (timeout == 0) { err = -EAGAIN; - goto out_wait; + finish_wait(sk_sleep(sk), &wait); + goto out_err; } prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); } + finish_wait(sk_sleep(sk), &wait); /* These checks occur both as part of and after the loop * conditional since we need to check before and after @@ -1598,16 +1603,16 @@ static int vsock_stream_sendmsg(struct s */ if (sk->sk_err) { err = -sk->sk_err; - goto out_wait; + goto out_err; } else if ((sk->sk_shutdown & SEND_SHUTDOWN) || (vsk->peer_shutdown & RCV_SHUTDOWN)) { err = -EPIPE; - goto out_wait; + goto out_err; } err = transport->notify_send_pre_enqueue(vsk, &send_data); if (err < 0) - goto out_wait; + goto out_err; /* Note that enqueue will only write as many bytes as are free * in the produce queue, so we don't need to ensure len is @@ -1620,7 +1625,7 @@ static int vsock_stream_sendmsg(struct s len - total_written); if (written < 0) { err = -ENOMEM; - goto out_wait; + goto out_err; } total_written += written; @@ -1628,14 +1633,13 @@ static int vsock_stream_sendmsg(struct s err = transport->notify_send_post_enqueue( vsk, written, &send_data); if (err < 0) - goto out_wait; + goto out_err; } -out_wait: +out_err: if (total_written > 0) err = total_written; - finish_wait(sk_sleep(sk), &wait); out: release_sock(sk); return err; @@ -1716,21 +1720,61 @@ vsock_stream_recvmsg(struct socket *sock if (err < 0) goto out; - prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); while (1) { - s64 ready = vsock_stream_has_data(vsk); + s64 ready; - if (ready < 0) { - /* Invalid queue pair content. XXX This should be - * changed to a connection reset in a later change. - */ + prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE); + ready = vsock_stream_has_data(vsk); - err = -ENOMEM; - goto out_wait; - } else if (ready > 0) { + if (ready == 0) { + if (sk->sk_err != 0 || + (sk->sk_shutdown & RCV_SHUTDOWN) || + (vsk->peer_shutdown & SEND_SHUTDOWN)) { + finish_wait(sk_sleep(sk), &wait); + break; + } + /* Don't wait for non-blocking sockets. */ + if (timeout == 0) { + err = -EAGAIN; + finish_wait(sk_sleep(sk), &wait); + break; + } + + err = transport->notify_recv_pre_block( + vsk, target, &recv_data); + if (err < 0) { + finish_wait(sk_sleep(sk), &wait); + break; + } + release_sock(sk); + timeout = schedule_timeout(timeout); + lock_sock(sk); + + if (signal_pending(current)) { + err = sock_intr_errno(timeout); + finish_wait(sk_sleep(sk), &wait); + break; + } else if (timeout == 0) { + err = -EAGAIN; + finish_wait(sk_sleep(sk), &wait); + break; + } + } else { ssize_t read; + finish_wait(sk_sleep(sk), &wait); + + if (ready < 0) { + /* Invalid queue pair content. XXX This should + * be changed to a connection reset in a later + * change. + */ + + err = -ENOMEM; + goto out; + } + err = transport->notify_recv_pre_dequeue( vsk, target, &recv_data); if (err < 0) @@ -1750,42 +1794,12 @@ vsock_stream_recvmsg(struct socket *sock vsk, target, read, !(flags & MSG_PEEK), &recv_data); if (err < 0) - goto out_wait; + goto out; if (read >= target || flags & MSG_PEEK) break; target -= read; - } else { - if (sk->sk_err != 0 || (sk->sk_shutdown & RCV_SHUTDOWN) - || (vsk->peer_shutdown & SEND_SHUTDOWN)) { - break; - } - /* Don't wait for non-blocking sockets. */ - if (timeout == 0) { - err = -EAGAIN; - break; - } - - err = transport->notify_recv_pre_block( - vsk, target, &recv_data); - if (err < 0) - break; - - release_sock(sk); - timeout = schedule_timeout(timeout); - lock_sock(sk); - - if (signal_pending(current)) { - err = sock_intr_errno(timeout); - break; - } else if (timeout == 0) { - err = -EAGAIN; - break; - } - - prepare_to_wait(sk_sleep(sk), &wait, - TASK_INTERRUPTIBLE); } } @@ -1797,8 +1811,6 @@ vsock_stream_recvmsg(struct socket *sock if (copied > 0) err = copied; -out_wait: - finish_wait(sk_sleep(sk), &wait); out: release_sock(sk); return err; Patches currently in stable-queue which might be from imbrenda(a)linux.vnet.ibm.com are queue-4.4/af_vsock-shrink-the-area-influenced-by-prepare_to_wait.patch queue-4.4/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001 From: WANG Cong <xiyou.wangcong(a)gmail.com> Date: Tue, 20 Jun 2017 11:42:27 -0700 Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER From: WANG Cong <xiyou.wangcong(a)gmail.com> commit 76da0704507bbc51875013f6557877ab308cfd0a upstream. In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy <jeffy.chen(a)rock-chips.com> Cc: David Ahern <dsahern(a)gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com> Acked-by: David Ahern <dsahern(a)gmail.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/ipv6/route.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3378,7 +3378,11 @@ static int ip6_route_dev_notify(struct n net->ipv6.ip6_blk_hole_entry->dst.dev = dev; net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev); #endif - } else if (event == NETDEV_UNREGISTER) { + } else if (event == NETDEV_UNREGISTER && + dev->reg_state != NETREG_UNREGISTERED) { + /* NETDEV_UNREGISTER could be fired for multiple times by + * netdev_wait_allrefs(). Make sure we only call this once. + */ in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev); #ifdef CONFIG_IPV6_MULTIPLE_TABLES in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev); Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are queue-4.4/vsock-use-new-wait-api-for-vsock_stream_sendmsg.patch queue-4.4/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 76da0704507bbc51875013f6557877ab308cfd0a Mon Sep 17 00:00:00 2001 From: WANG Cong <xiyou.wangcong(a)gmail.com> Date: Tue, 20 Jun 2017 11:42:27 -0700 Subject: ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER From: WANG Cong <xiyou.wangcong(a)gmail.com> commit 76da0704507bbc51875013f6557877ab308cfd0a upstream. In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy <jeffy.chen(a)rock-chips.com> Cc: David Ahern <dsahern(a)gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com> Acked-by: David Ahern <dsahern(a)gmail.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- net/ipv6/route.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2827,7 +2827,11 @@ static int ip6_route_dev_notify(struct n net->ipv6.ip6_blk_hole_entry->dst.dev = dev; net->ipv6.ip6_blk_hole_entry->rt6i_idev = in6_dev_get(dev); #endif - } else if (event == NETDEV_UNREGISTER) { + } else if (event == NETDEV_UNREGISTER && + dev->reg_state != NETREG_UNREGISTERED) { + /* NETDEV_UNREGISTER could be fired for multiple times by + * netdev_wait_allrefs(). Make sure we only call this once. + */ in6_dev_put(net->ipv6.ip6_null_entry->rt6i_idev); #ifdef CONFIG_IPV6_MULTIPLE_TABLES in6_dev_put(net->ipv6.ip6_prohibit_entry->rt6i_idev); Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are queue-3.18/ipv6-only-call-ip6_route_dev_notify-once-for-netdev_unregister.patch

8 years

1
0
0 0

[Linux-stable-mirror] Backport request to 4.4 and 4.9 for af_vsock.c

by Jorgen S. Hansen

Hi, Customers running VMware Workstation on 4.4 kernels have been hitting a couple of bugs where net/vmw_vsock/af_vsock.c is calling functions that may sleep when not allowed to. These issues have already been fixed in later kernels, and we would like to request these fixes backported to 4.4 and 4.9 LTS branches. To resolve the above issue, we would like the following commit backported to 4.4 (it applies when using 3-way merge): commit 265563fc8f123641d006d65f26d18d0b24d3022d "AF_VSOCK: Shrink the area influenced by prepare_to_wait" and the following commit (that needs to be applied on top of the above) backported to both 4.4 and 4.9: commit 359669f3b8eab6cbfb83eb1a46ec6ba089d47d18 "vsock: use new wait API for vsock_stream_sendmsg()” The backports have been tested with kernel 4.4.100 and 4.9.64. Thanks, Jorgen

8 years

2
3
0 0

[Linux-stable-mirror] Please pick 76da0704507b ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")

by Konstantin Khlebnikov

At least 3.18, 4.4 and 4.9 already have 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") which has bug fixed in: 76da0704507b ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER")

8 years

2
1
0 0

[Linux-stable-mirror] Patch "ACPI / APEI: Remove arch_apei_flush_tlb_one()" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / APEI: Remove arch_apei_flush_tlb_one() to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-apei-remove-arch_apei_flush_tlb_one.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:27 +0000 Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one() From: James Morse <james.morse(a)arm.com> commit 4a75aeacda3c2455954596593d89187df5420d0a upstream. Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_pte_vaddr() to do the invalidation when called from clear_fixmap() Remove arch_apei_flush_tlb_one(). Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/acpi/apei.c | 5 ----- include/acpi/apei.h | 1 - 2 files changed, 6 deletions(-) --- a/arch/x86/kernel/acpi/apei.c +++ b/arch/x86/kernel/acpi/apei.c @@ -52,8 +52,3 @@ void arch_apei_report_mem_error(int sev, apei_mce_report_mem_error(sev, mem_err); #endif } - -void arch_apei_flush_tlb_one(unsigned long addr) -{ - __flush_tlb_one(addr); -} --- a/include/acpi/apei.h +++ b/include/acpi/apei.h @@ -51,7 +51,6 @@ int erst_clear(u64 record_id); int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data); void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err); -void arch_apei_flush_tlb_one(unsigned long addr); #endif #endif Patches currently in stable-queue which might be from james.morse(a)arm.com are queue-4.14/acpi-apei-remove-arch_apei_flush_tlb_one.patch

8 years

2
1
0 0

[Linux-stable-mirror] Patch "ACPI / APEI: Remove arch_apei_flush_tlb_one()" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / APEI: Remove arch_apei_flush_tlb_one() to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-apei-remove-arch_apei_flush_tlb_one.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:27 +0000 Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one() From: James Morse <james.morse(a)arm.com> commit 4a75aeacda3c2455954596593d89187df5420d0a upstream. Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_pte_vaddr() to do the invalidation when called from clear_fixmap() Remove arch_apei_flush_tlb_one(). Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/acpi/apei.c | 5 ----- include/acpi/apei.h | 1 - 2 files changed, 6 deletions(-) --- a/arch/x86/kernel/acpi/apei.c +++ b/arch/x86/kernel/acpi/apei.c @@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev, apei_mce_report_mem_error(sev, mem_err); #endif } - -void arch_apei_flush_tlb_one(unsigned long addr) -{ - __flush_tlb_one(addr); -} --- a/include/acpi/apei.h +++ b/include/acpi/apei.h @@ -44,7 +44,6 @@ int erst_clear(u64 record_id); int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data); void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err); -void arch_apei_flush_tlb_one(unsigned long addr); #endif #endif Patches currently in stable-queue which might be from james.morse(a)arm.com are queue-4.4/acpi-apei-remove-arch_apei_flush_tlb_one.patch

8 years

3
3
0 0

Re: [Linux-stable-mirror] flow cache removed = xfrm doesnt work

by Steffen Klassert

On Mon, Nov 27, 2017 at 10:39:25AM +0100, Tomas Charvat wrote: > Ok under heavy load patch > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?i… > > cause following error once in few minutes, it doesn't happen instantly This patch simply reverts to the old behaviour, this bug does not seem to be related to the revert. Is the commit that you referring the head commit of your branch, or did you picked this one to some other branch? Can you give some details on your setup and send your .config?

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH 1/3] ALSA: usb-audio: Fix out-of-bound error

by Jaejoong Kim

The snd_usb_copy_string_desc() retrieves the usb string corresponding to the index number through the usb_string(). The problem is that the usb_string() returns the length of the string (>= 0) when successful, but it can also return a negative value about the error case or status of usb_control_msg(). If iClockSource is '0' as shown below, usb_string() will returns -EINVAL. This will result in '0' being inserted into buf[-22], and the following KASAN out-of-bound error message will be output. AudioControl Interface Descriptor: bLength 8 bDescriptorType 36 bDescriptorSubtype 10 (CLOCK_SOURCE) bClockID 1 bmAttributes 0x07 Internal programmable Clock (synced to SOF) bmControls 0x07 Clock Frequency Control (read/write) Clock Validity Control (read-only) bAssocTerminal 0 iClockSource 0 To fix it, check usb_string() return value and bail out. ================================================================== BUG: KASAN: stack-out-of-bounds in parse_audio_unit+0x1327/0x1960 [snd_usb_audio] Write of size 1 at addr ffff88007e66735a by task systemd-udevd/18376 CPU: 0 PID: 18376 Comm: systemd-udevd Not tainted 4.13.0+ #3 Hardware name: LG Electronics 15N540-RFLGL/White Tip Mountain, BIOS 15N5 Call Trace: dump_stack+0x63/0x8d print_address_description+0x70/0x290 ? parse_audio_unit+0x1327/0x1960 [snd_usb_audio] kasan_report+0x265/0x350 __asan_store1+0x4a/0x50 parse_audio_unit+0x1327/0x1960 [snd_usb_audio] ? save_stack+0xb5/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? kasan_kmalloc+0xad/0xe0 ? kmem_cache_alloc_trace+0xff/0x230 ? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio] ? usb_audio_probe+0x4de/0xf40 [snd_usb_audio] ? usb_probe_interface+0x1f5/0x440 ? driver_probe_device+0x3ed/0x660 ? build_feature_ctl+0xb10/0xb10 [snd_usb_audio] ? save_stack_trace+0x1b/0x20 ? init_object+0x69/0xa0 ? snd_usb_find_csint_desc+0xa8/0xf0 [snd_usb_audio] snd_usb_mixer_controls+0x1dc/0x370 [snd_usb_audio] ? build_audio_procunit+0x890/0x890 [snd_usb_audio] ? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio] ? kmem_cache_alloc_trace+0xff/0x230 ? usb_ifnum_to_if+0xbd/0xf0 snd_usb_create_mixer+0x25b/0x4b0 [snd_usb_audio] ? snd_usb_create_stream+0x255/0x2c0 [snd_usb_audio] usb_audio_probe+0x4de/0xf40 [snd_usb_audio] ? snd_usb_autosuspend.part.7+0x30/0x30 [snd_usb_audio] ? __pm_runtime_idle+0x90/0x90 ? kernfs_activate+0xa6/0xc0 ? usb_match_one_id_intf+0xdc/0x130 ? __pm_runtime_set_status+0x2d4/0x450 usb_probe_interface+0x1f5/0x440 Cc: <stable(a)vger.kernel.org> Signed-off-by: Jaejoong Kim <climbbb.kim(a)gmail.com> --- sound/usb/mixer.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index e630813..da7cbe7 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -204,6 +204,10 @@ static int snd_usb_copy_string_desc(struct mixer_build *state, int index, char *buf, int maxlen) { int len = usb_string(state->chip->dev, index, buf, maxlen - 1); + + if (len < 0) + return len; + buf[len] = 0; return len; } -- 2.7.4

8 years

1
0
0 0

[Linux-stable-mirror] Apply "x86/mm: fix use-after-free of vma during userfaultfd fault" to 4.9-stable

by Eric Biggers

Commit cb0631fd3cf9 ("x86/mm: fix use-after-free of vma during userfaultfd fault") went into mainline without Cc: stable. It appears to be a use-after-free reachable by unprivileged users -- at least with CONFIG_USERFAULTFD=y. Can it please be applied to 4.9-stable? Eric

8 years

3
2
0 0

[Linux-stable-mirror] Patch "x86/mm: fix use-after-free of vma during userfaultfd fault" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled x86/mm: fix use-after-free of vma during userfaultfd fault to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: x86-mm-fix-use-after-free-of-vma-during-userfaultfd-fault.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From cb0631fd3cf9e989cd48293fe631cbc402aec9a9 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka <vbabka(a)suse.cz> Date: Wed, 1 Nov 2017 08:21:25 +0100 Subject: x86/mm: fix use-after-free of vma during userfaultfd fault From: Vlastimil Babka <vbabka(a)suse.cz> commit cb0631fd3cf9e989cd48293fe631cbc402aec9a9 upstream. Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in __do_page_fault() with the following reproducer: mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0) mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0) r0 = userfaultfd(0x0) ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0}) ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0}) r1 = gettid() syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0) tkill(r1, 0x7) The vma should be pinned by mmap_sem, but handle_userfault() might (in a return to userspace scenario) release it and then acquire again, so when we return to __do_page_fault() (with other result than VM_FAULT_RETRY), the vma might be gone. Specifically, per Andrea the scenario is "A return to userland to repeat the page fault later with a VM_FAULT_NOPAGE retval (potentially after handling any pending signal during the return to userland). The return to userland is identified whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in vmf->flags" However, since commit a3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after that point, which can thus become use-after-free. Fix this by moving the read before calling handle_mm_fault(). Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6(a)syzkaller.appspotmail.com> Reported-by: Dmitry Vyukov <dvyukov(a)google.com> Suggested-by: Kirill A. Shutemov <kirill(a)shutemov.name> Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer") Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com> Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Eric Biggers <ebiggers3(a)gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/mm/fault.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1393,7 +1393,17 @@ good_area: * make sure we exit gracefully rather than endlessly redo * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. + * + * Note that handle_userfault() may also release and reacquire mmap_sem + * (and not return with VM_FAULT_RETRY), when returning to userland to + * repeat the page fault later with a VM_FAULT_NOPAGE retval + * (potentially after handling any pending signal during the return to + * userland). The return to userland is identified whenever + * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags. + * Thus we have to be careful about not touching vma after handling the + * fault, so we read the pkey beforehand. */ + pkey = vma_pkey(vma); fault = handle_mm_fault(vma, address, flags); major |= fault & VM_FAULT_MAJOR; @@ -1420,7 +1430,6 @@ good_area: return; } - pkey = vma_pkey(vma); up_read(&mm->mmap_sem); if (unlikely(fault & VM_FAULT_ERROR)) { mm_fault_error(regs, error_code, address, &pkey, fault); Patches currently in stable-queue which might be from vbabka(a)suse.cz are queue-4.9/x86-mm-fix-use-after-free-of-vma-during-userfaultfd-fault.patch

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH] X.509: reject invalid BIT STRING for subjectPublicKey

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> Adding a specially crafted X.509 certificate whose subjectPublicKey ASN.1 value is zero-length caused x509_extract_key_data() to set the public key size to SIZE_MAX, as it subtracted the nonexistent BIT STRING metadata byte. Then, x509_cert_parse() called kmemdup() with that bogus size, triggering the WARN_ON_ONCE() in kmalloc_slab(). This appears to be harmless, but it still must be fixed since WARNs are never supposed to be user-triggerable. Fix it by updating x509_cert_parse() to validate that the value has a BIT STRING metadata byte, and that the byte is 0 which indicates that the number of bits in the bitstring is a multiple of 8. It would be nice to handle the metadata byte in asn1_ber_decoder() instead. But that would be tricky because in the general case a BIT STRING could be implicitly tagged, and/or could legitimately have a length that is not a whole number of bytes. Here was the WARN (cleaned up slightly): WARNING: CPU: 1 PID: 202 at mm/slab_common.c:971 kmalloc_slab+0x5d/0x70 mm/slab_common.c:971 Modules linked in: CPU: 1 PID: 202 Comm: keyctl Tainted: G B 4.14.0-09238-g1d3b78bbc6e9 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 task: ffff880033014180 task.stack: ffff8800305c8000 Call Trace: __do_kmalloc mm/slab.c:3706 [inline] __kmalloc_track_caller+0x22/0x2e0 mm/slab.c:3726 kmemdup+0x17/0x40 mm/util.c:118 kmemdup include/linux/string.h:414 [inline] x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Cc: <stable(a)vger.kernel.org> # v3.7+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- crypto/asymmetric_keys/x509_cert_parser.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c index dd03fead1ca3..ce2df8c9c583 100644 --- a/crypto/asymmetric_keys/x509_cert_parser.c +++ b/crypto/asymmetric_keys/x509_cert_parser.c @@ -409,6 +409,8 @@ int x509_extract_key_data(void *context, size_t hdrlen, ctx->cert->pub->pkey_algo = "rsa"; /* Discard the BIT STRING metadata */ + if (vlen < 1 || *(const u8 *)value != 0) + return -EBADMSG; ctx->key = value + 1; ctx->key_size = vlen - 1; return 0; -- 2.15.0

8 years

2
1
0 0

[Linux-stable-mirror] [PATCH] ASN.1: check for error from ASN1_OP_END__ACT actions

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> asn1_ber_decoder() was ignoring errors from actions associated with the opcodes ASN1_OP_END_SEQ_ACT, ASN1_OP_END_SET_ACT, ASN1_OP_END_SEQ_OF_ACT, and ASN1_OP_END_SET_OF_ACT. In practice, this meant the pkcs7_note_signed_info() action (since that was the only user of those opcodes). Fix it by checking for the error, just like the decoder does for actions associated with the other opcodes. This bug allowed users to leak slab memory by repeatedly trying to add a specially crafted "pkcs7_test" key (requires CONFIG_PKCS7_TEST_KEY). In theory, this bug could also be used to bypass module signature verification, by providing a PKCS#7 message that is misparsed such that a signature's ->authattrs do not contain its ->msgdigest. But it doesn't seem practical in normal cases, due to restrictions on the format of the ->authattrs. Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Cc: <stable(a)vger.kernel.org> # v3.7+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- lib/asn1_decoder.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c index d77cdfc4b554..dc14beae2c9a 100644 --- a/lib/asn1_decoder.c +++ b/lib/asn1_decoder.c @@ -439,6 +439,8 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder, else act = machine[pc + 1]; ret = actions[act](context, hdr, 0, data + tdp, len); + if (ret < 0) + return ret; } pc += asn1_op_lengths[op]; goto next_op; -- 2.15.0

8 years

2
1
0 0

[Linux-stable-mirror] [PATCH] ASN.1: fix out-of-bounds read when parsing indefinite length item

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed to the action functions before their lengths had been computed, using the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in reading data past the end of the input buffer, when given a specially crafted message. Fix it by rearranging the code so that the indefinite length is resolved before the action is called. This bug was originally found by fuzzing the X.509 parser in userspace using libFuzzer from the LLVM project. KASAN report (cleaned up slightly): BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline] BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366 Read of size 128 at addr ffff880035dd9eaf by task keyctl/195 CPU: 1 PID: 195 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0xd1/0x175 lib/dump_stack.c:53 print_address_description+0x78/0x260 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x23f/0x350 mm/kasan/report.c:409 memcpy+0x1f/0x50 mm/kasan/kasan.c:302 memcpy ./include/linux/string.h:341 [inline] x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366 asn1_ber_decoder+0xb4a/0x1fd0 lib/asn1_decoder.c:447 x509_cert_parse+0x1c7/0x620 crypto/asymmetric_keys/x509_cert_parser.c:89 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Allocated by task 195: __do_kmalloc_node mm/slab.c:3675 [inline] __kmalloc_node+0x47/0x60 mm/slab.c:3682 kvmalloc ./include/linux/mm.h:540 [inline] SYSC_add_key security/keys/keyctl.c:104 [inline] SyS_add_key+0x19e/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Reported-by: Alexander Potapenko <glider(a)google.com> Cc: <stable(a)vger.kernel.org> # v3.7+ Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- lib/asn1_decoder.c | 47 ++++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c index 1ef0cec38d78..d77cdfc4b554 100644 --- a/lib/asn1_decoder.c +++ b/lib/asn1_decoder.c @@ -313,42 +313,47 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder, /* Decide how to handle the operation */ switch (op) { - case ASN1_OP_MATCH_ANY_ACT: - case ASN1_OP_MATCH_ANY_ACT_OR_SKIP: - case ASN1_OP_COND_MATCH_ANY_ACT: - case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP: - ret = actions[machine[pc + 1]](context, hdr, tag, data + dp, len); - if (ret < 0) - return ret; - goto skip_data; - - case ASN1_OP_MATCH_ACT: - case ASN1_OP_MATCH_ACT_OR_SKIP: - case ASN1_OP_COND_MATCH_ACT_OR_SKIP: - ret = actions[machine[pc + 2]](context, hdr, tag, data + dp, len); - if (ret < 0) - return ret; - goto skip_data; - case ASN1_OP_MATCH: case ASN1_OP_MATCH_OR_SKIP: + case ASN1_OP_MATCH_ACT: + case ASN1_OP_MATCH_ACT_OR_SKIP: case ASN1_OP_MATCH_ANY: case ASN1_OP_MATCH_ANY_OR_SKIP: + case ASN1_OP_MATCH_ANY_ACT: + case ASN1_OP_MATCH_ANY_ACT_OR_SKIP: case ASN1_OP_COND_MATCH_OR_SKIP: + case ASN1_OP_COND_MATCH_ACT_OR_SKIP: case ASN1_OP_COND_MATCH_ANY: case ASN1_OP_COND_MATCH_ANY_OR_SKIP: - skip_data: + case ASN1_OP_COND_MATCH_ANY_ACT: + case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP: + if (!(flags & FLAG_CONS)) { if (flags & FLAG_INDEFINITE_LENGTH) { + size_t tmp = dp; + ret = asn1_find_indefinite_length( - data, datalen, &dp, &len, &errmsg); + data, datalen, &tmp, &len, &errmsg); if (ret < 0) goto error; - } else { - dp += len; } pr_debug("- LEAF: %zu\n", len); } + + if (op & ASN1_OP_MATCH__ACT) { + unsigned char act; + + if (op & ASN1_OP_MATCH__ANY) + act = machine[pc + 1]; + else + act = machine[pc + 2]; + ret = actions[act](context, hdr, tag, data + dp, len); + if (ret < 0) + return ret; + } + + if (!(flags & FLAG_CONS)) + dp += len; pc += asn1_op_lengths[op]; goto next_op; -- 2.15.0

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH rdma-rc v1 1/2] IB/core: Only enforce security for InfiniBand

by Leon Romanovsky

From: Daniel Jurgens <danielj(a)mellanox.com> For now the only LSM security enforcement mechanism available is specific to InfiniBand. Bypass enforcement for non-IB link types. This fixes a regression where modify_qp fails for iWARP because querying the PKEY returns -EINVAL. Cc: Paul Moore <paul(a)paul-moore.com> Cc: Don Dutile <ddutile(a)redhat.com> Cc: stable(a)vger.kernel.org Reported-by: Potnuri Bharat Teja <bharat(a)chelsio.com> Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs") Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com> Reviewed-by: Parav Pandit <parav(a)mellanox.com> Tested-by: Potnuri Bharat Teja <bharat(a)chelsio.com> Signed-off-by: Leon Romanovsky <leon(a)kernel.org> --- drivers/infiniband/core/security.c | 43 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 41 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c index 23278ed5be45..4b7fd68e1174 100644 --- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec) int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev) { + u8 i = rdma_start_port(dev); + bool is_ib = false; int ret; + while (i <= rdma_end_port(dev) && !is_ib) + is_ib = rdma_protocol_ib(dev, i++); + + /* If this isn't an IB device don't create the security context */ + if (!is_ib) + return 0; + qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL); if (!qp->qp_sec) return -ENOMEM; @@ -441,6 +450,10 @@ EXPORT_SYMBOL(ib_create_qp_security); void ib_destroy_qp_security_begin(struct ib_qp_security *sec) { + /* Return if not IB */ + if (!sec) + return; + mutex_lock(&sec->mutex); /* Remove the QP from the lists so it won't get added to @@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec) int ret; int i; + /* Return if not IB */ + if (!sec) + return; + /* If a concurrent cache update is in progress this * QP security could be marked for an error state * transition. Wait for this to complete. @@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec) { int i; + /* Return if not IB */ + if (!sec) + return; + /* If a concurrent cache update is occurring we must * wait until this QP security structure is processed * in the QP to error flow before destroying it because @@ -565,13 +586,19 @@ int ib_security_modify_qp(struct ib_qp *qp, bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) || (qp_attr_mask & IB_QP_ALT_PATH)); + WARN_ONCE((qp_attr_mask & IB_QP_PORT && + rdma_protocol_ib(real_qp->device, qp_attr->port_num) && + !real_qp->qp_sec), + "%s: QP security is not initialized for IB QP: %d\n", + __func__, real_qp->qp_num); + /* The port/pkey settings are maintained only for the real QP. Open * handles on the real QP will be in the shared_qp_list. When * enforcing security on the real QP all the shared QPs will be * checked as well. */ - if (pps_change && !special_qp) { + if (pps_change && !special_qp` && real_qp->qp_sec) { mutex_lock(&real_qp->qp_sec->mutex); new_pps = get_new_pps(real_qp, qp_attr, @@ -600,7 +627,7 @@ int ib_security_modify_qp(struct ib_qp *qp, qp_attr_mask, udata); - if (pps_change && !special_qp) { + if (pps_change && !special_qpp && real_qp->qp_sec) { /* Clean up the lists and free the appropriate * ports_pkeys structure. */ @@ -631,6 +658,9 @@ int ib_security_pkey_access(struct ib_device *dev, u16 pkey; int ret; + if (!rdma_protocol_ib(dev, port_num)) + return 0; + ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey); if (ret) return ret; @@ -665,6 +695,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent, { int ret; + if (!rdma_protocol_ib(agent->device, agent->port_num)) + return 0; + ret = security_ib_alloc_security(&agent->security); if (ret) return ret; @@ -690,6 +723,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent, void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) { + if (!rdma_protocol_ib(agent->device, agent->port_num)) + return; + security_ib_free_security(agent->security); if (agent->lsm_nb_reg) unregister_lsm_notifier(&agent->lsm_nb); @@ -697,6 +733,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index) { + if (!rdma_protocol_ib(map->agent.device, map->agent.port_num)) + return 0; + if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed) return -EACCES; -- 2.15.0

8 years

1
0
0 0

[Linux-stable-mirror] v3.2.96 build: 4 failures 20 warnings (v3.2.96)

by Build bot for Mark Brown

Tree/Branch: v3.2.96 Git describe: v3.2.96 Commit: 07a40fa222 Linux 3.2.96 Build Time: 0 min 3 sec Passed: 0 / 4 ( 0.00 %) Failed: 4 / 4 (100.00 %) Errors: 6 Warnings: 20 Section Mismatches: 0 Failed defconfigs: x86_64-allnoconfig arm-allmodconfig arm-allnoconfig x86_64-defconfig Errors: x86_64-allnoconfig /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode arm-allnoconfig /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function) /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type x86_64-defconfig /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): 2 warnings 0 mismatches : arm-allmodconfig 19 warnings 0 mismatches : arm-allnoconfig ------------------------------------------------------------------------------- Errors summary: 6 2 /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode 2 /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode 1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type 1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type 1 /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model 1 /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function) Warnings Summary: 20 2 .config:27:warning: symbol value '' invalid for PHYS_OFFSET 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] 1 /home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- x86_64-allnoconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches Errors: /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode ------------------------------------------------------------------------------- arm-allmodconfig : FAIL, 0 errors, 2 warnings, 0 section mismatches Warnings: .config:27:warning: symbol value '' invalid for PHYS_OFFSET .config:27:warning: symbol value '' invalid for PHYS_OFFSET ------------------------------------------------------------------------------- arm-allnoconfig : FAIL, 4 errors, 19 warnings, 0 section mismatches Errors: /home/broonie/build/linux-stable/arch/arm/include/asm/div64.h:77:7: error: '__LINUX_ARM_ARCH__' undeclared (first use in this function) /home/broonie/build/linux-stable/arch/arm/include/asm/glue-cache.h:129:2: error: #error Unknown cache maintenance model /home/broonie/build/linux-stable/arch/arm/include/asm/glue-df.h:107:2: error: #error Unknown data abort handler type /home/broonie/build/linux-stable/arch/arm/include/asm/glue-pf.h:54:2: error: #error Unknown prefetch abort handler type Warnings: /home/broonie/build/linux-stable/arch/arm/include/asm/irqflags.h:11:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:114:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:120:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:121:3: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:127:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:131:35: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:265:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:272:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/system.h:342:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/bitops.h:217:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/swab.h:25:28: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/fpstate.h:32:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:82:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/processor.h:102:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/atomic.h:30:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:28:5: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/cachetype.h:33:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:194:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] /home/broonie/build/linux-stable/arch/arm/include/asm/cacheflush.h:196:7: warning: "__LINUX_ARM_ARCH__" is not defined [-Wundef] ------------------------------------------------------------------------------- x86_64-defconfig : FAIL, 2 errors, 0 warnings, 0 section mismatches Errors: /home/broonie/build/linux-stable/scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode /home/broonie/build/linux-stable/kernel/bounds.c:1:0: error: code model kernel does not support PIC mode ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches:

8 years

1
0
0 0

[Linux-stable-mirror] v3.16.51 build: 0 failures 7 warnings (v3.16.51)

by Build bot for Mark Brown

Tree/Branch: v3.16.51 Git describe: v3.16.51 Commit: c45c05f42d Linux 3.16.51 Build Time: 37 min 49 sec Passed: 9 / 9 (100.00 %) Failed: 0 / 9 ( 0.00 %) Errors: 0 Warnings: 7 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): 1 warnings 0 mismatches : arm64-allnoconfig 2 warnings 0 mismatches : arm64-allmodconfig 1 warnings 0 mismatches : arm-multi_v5_defconfig 1 warnings 0 mismatches : arm-multi_v7_defconfig 1 warnings 0 mismatches : x86_64-defconfig 4 warnings 0 mismatches : arm-allmodconfig 1 warnings 0 mismatches : arm-allnoconfig 1 warnings 0 mismatches : x86_64-allnoconfig 3 warnings 0 mismatches : arm64-defconfig ------------------------------------------------------------------------------- Warnings Summary: 7 7 ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void 2 ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized] 2 ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable] 1 ../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized] 1 ../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function] 1 ../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label] 1 ../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function] =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- arm64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ------------------------------------------------------------------------------- arm64-allmodconfig : PASS, 0 errors, 2 warnings, 0 section mismatches Warnings: ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable] ------------------------------------------------------------------------------- arm-multi_v5_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ------------------------------------------------------------------------------- arm-multi_v7_defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ------------------------------------------------------------------------------- x86_64-defconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../drivers/platform/x86/eeepc-laptop.c:279:10: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized] ------------------------------------------------------------------------------- arm-allmodconfig : PASS, 0 errors, 4 warnings, 0 section mismatches Warnings: ../drivers/media/platform/davinci/vpfe_capture.c:1718:1: warning: label 'unlock_out' defined but not used [-Wunused-label] ../drivers/media/platform/davinci/vpfe_capture.c:291:12: warning: 'vpfe_get_ccdc_image_format' defined but not used [-Wunused-function] ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ../drivers/net/ethernet/broadcom/genet/bcmgenet.c:1346:17: warning: unused variable 'kdev' [-Wunused-variable] ------------------------------------------------------------------------------- arm-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ------------------------------------------------------------------------------- x86_64-allnoconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../arch/x86/kernel/cpu/common.c:961:13: warning: 'syscall32_cpu_init' defined but not used [-Wunused-function] ------------------------------------------------------------------------------- arm64-defconfig : PASS, 0 errors, 3 warnings, 0 section mismatches Warnings: ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized] ../ipc/sem.c:377:6: warning: '___p1' may be used uninitialized in this function [-Wmaybe-uninitialized] ../include/linux/stddef.h:8:14: warning: 'return' with a value, in function returning void ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches:

8 years

1
0
0 0

[Linux-stable-mirror] HELLO!

by Mr. G. CHA

I have important transaction for you as next of kin to claim US$8.37m email me at changgordon61(a)yahoo.com.hk so I can send you more details

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] tpm_tis_spi: Use DMA-safe memory for SPI transfers" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 6b3a13173f23e798e1ba213dd4a2c065a3b8d751 Mon Sep 17 00:00:00 2001 From: Alexander Steffen <Alexander.Steffen(a)infineon.com> Date: Mon, 11 Sep 2017 12:26:52 +0200 Subject: [PATCH] tpm_tis_spi: Use DMA-safe memory for SPI transfers The buffers used as tx_buf/rx_buf in a SPI transfer need to be DMA-safe. This cannot be guaranteed for the buffers passed to tpm_tis_spi_read_bytes and tpm_tis_spi_write_bytes. Therefore, we need to use our own DMA-safe buffer and copy the data to/from it. The buffer needs to be allocated separately, to ensure that it is cacheline-aligned and not shared with other data, so that DMA can work correctly. Fixes: 0edbfea537d1 ("tpm/tpm_tis_spi: Add support for spi phy") Cc: stable(a)vger.kernel.org Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Signed-off-by: Alexander Steffen <Alexander.Steffen(a)infineon.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c index e49f5b9b739a..8ab0bd8445f6 100644 --- a/drivers/char/tpm/tpm_tis_spi.c +++ b/drivers/char/tpm/tpm_tis_spi.c @@ -46,9 +46,7 @@ struct tpm_tis_spi_phy { struct tpm_tis_data priv; struct spi_device *spi_device; - - u8 tx_buf[4]; - u8 rx_buf[4]; + u8 *iobuf; }; static inline struct tpm_tis_spi_phy *to_tpm_tis_spi_phy(struct tpm_tis_data *data) @@ -71,14 +69,14 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, while (len) { transfer_len = min_t(u16, len, MAX_SPI_FRAMESIZE); - phy->tx_buf[0] = (in ? 0x80 : 0) | (transfer_len - 1); - phy->tx_buf[1] = 0xd4; - phy->tx_buf[2] = addr >> 8; - phy->tx_buf[3] = addr; + phy->iobuf[0] = (in ? 0x80 : 0) | (transfer_len - 1); + phy->iobuf[1] = 0xd4; + phy->iobuf[2] = addr >> 8; + phy->iobuf[3] = addr; memset(&spi_xfer, 0, sizeof(spi_xfer)); - spi_xfer.tx_buf = phy->tx_buf; - spi_xfer.rx_buf = phy->rx_buf; + spi_xfer.tx_buf = phy->iobuf; + spi_xfer.rx_buf = phy->iobuf; spi_xfer.len = 4; spi_xfer.cs_change = 1; @@ -88,9 +86,9 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, if (ret < 0) goto exit; - if ((phy->rx_buf[3] & 0x01) == 0) { + if ((phy->iobuf[3] & 0x01) == 0) { // handle SPI wait states - phy->tx_buf[0] = 0; + phy->iobuf[0] = 0; for (i = 0; i < TPM_RETRY; i++) { spi_xfer.len = 1; @@ -99,7 +97,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, ret = spi_sync_locked(phy->spi_device, &m); if (ret < 0) goto exit; - if (phy->rx_buf[0] & 0x01) + if (phy->iobuf[0] & 0x01) break; } @@ -112,8 +110,14 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, spi_xfer.cs_change = 0; spi_xfer.len = transfer_len; spi_xfer.delay_usecs = 5; - spi_xfer.tx_buf = out; - spi_xfer.rx_buf = in; + + if (in) { + spi_xfer.tx_buf = NULL; + } else if (out) { + spi_xfer.rx_buf = NULL; + memcpy(phy->iobuf, out, transfer_len); + out += transfer_len; + } spi_message_init(&m); spi_message_add_tail(&spi_xfer, &m); @@ -121,11 +125,12 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, if (ret < 0) goto exit; - len -= transfer_len; - if (in) + if (in) { + memcpy(in, phy->iobuf, transfer_len); in += transfer_len; - if (out) - out += transfer_len; + } + + len -= transfer_len; } exit: @@ -191,6 +196,10 @@ static int tpm_tis_spi_probe(struct spi_device *dev) phy->spi_device = dev; + phy->iobuf = devm_kmalloc(&dev->dev, MAX_SPI_FRAMESIZE, GFP_KERNEL); + if (!phy->iobuf) + return -ENOMEM; + return tpm_tis_core_init(&dev->dev, &phy->priv, -1, &tpm_spi_phy_ops, NULL); }

8 years

2
1
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] tpm-dev-common: Reject too short writes" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ee70bc1e7b63ac8023c9ff9475d8741e397316e7 Mon Sep 17 00:00:00 2001 From: Alexander Steffen <Alexander.Steffen(a)infineon.com> Date: Fri, 8 Sep 2017 17:21:32 +0200 Subject: [PATCH] tpm-dev-common: Reject too short writes tpm_transmit() does not offer an explicit interface to indicate the number of valid bytes in the communication buffer. Instead, it relies on the commandSize field in the TPM header that is encoded within the buffer. Therefore, ensure that a) enough data has been written to the buffer, so that the commandSize field is present and b) the commandSize field does not announce more data than has been written to the buffer. This should have been fixed with CVE-2011-1161 long ago, but apparently a correct version of that patch never made it into the kernel. Cc: stable(a)vger.kernel.org Signed-off-by: Alexander Steffen <Alexander.Steffen(a)infineon.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 610638a80383..461bf0b8a094 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -110,6 +110,12 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, return -EFAULT; } + if (in_size < 6 || + in_size < be32_to_cpu(*((__be32 *) (priv->data_buffer + 2)))) { + mutex_unlock(&priv->buffer_mutex); + return -EINVAL; + } + /* atomic tpm command send and result receive. We only hold the ops * lock during this period so that the tpm can be unregistered even if * the char dev is held open.

8 years

2
1
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] tpm: constify transmit data pointers" was seriously submitted to be applied to the 4.14-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.14-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From c37fbc09bd4977736f6bc4050c6f099c587052a7 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann <arnd(a)arndb.de> Date: Thu, 7 Sep 2017 15:30:45 +0200 Subject: [PATCH] tpm: constify transmit data pointers Making cmd_getticks 'const' introduced a couple of harmless warnings: drivers/char/tpm/tpm_tis_core.c: In function 'probe_itpm': drivers/char/tpm/tpm_tis_core.c:469:31: error: passing argument 2 of 'tpm_tis_send_data' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] rc = tpm_tis_send_data(chip, cmd_getticks, len); drivers/char/tpm/tpm_tis_core.c:477:31: error: passing argument 2 of 'tpm_tis_send_data' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] rc = tpm_tis_send_data(chip, cmd_getticks, len); drivers/char/tpm/tpm_tis_core.c:255:12: note: expected 'u8 * {aka unsigned char *}' but argument is of type 'const u8 * {aka const unsigned char *}' static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) This changes the related functions to all take 'const' pointers so that gcc can see this as being correct. I had to slightly modify the logic around tpm_tis_spi_transfer() for this to work without introducing ugly casts. Cc: stable(a)vger.kernel.org Fixes: 5e35bd8e06b9 ("tpm_tis: make array cmd_getticks static const to shink object code size") Signed-off-by: Arnd Bergmann <arnd(a)arndb.de> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 7e55aa9ce680..ebd0e75a3e4d 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -223,7 +223,7 @@ static int tpm_tcg_read_bytes(struct tpm_tis_data *data, u32 addr, u16 len, } static int tpm_tcg_write_bytes(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *value) + const u8 *value) { struct tpm_tis_tcg_phy *phy = to_tpm_tis_tcg_phy(data); diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 1e957e923d21..fdde971bc810 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -252,7 +252,7 @@ static int tpm_tis_recv(struct tpm_chip *chip, u8 *buf, size_t count) * tpm.c can skip polling for the data to be available as the interrupt is * waited for here */ -static int tpm_tis_send_data(struct tpm_chip *chip, u8 *buf, size_t len) +static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int rc, status, burstcnt; @@ -343,7 +343,7 @@ static void disable_interrupts(struct tpm_chip *chip) * tpm.c can skip polling for the data to be available as the interrupt is * waited for here */ -static int tpm_tis_send_main(struct tpm_chip *chip, u8 *buf, size_t len) +static int tpm_tis_send_main(struct tpm_chip *chip, const u8 *buf, size_t len) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int rc; diff --git a/drivers/char/tpm/tpm_tis_core.h b/drivers/char/tpm/tpm_tis_core.h index e2212f021a02..6bbac319ff3b 100644 --- a/drivers/char/tpm/tpm_tis_core.h +++ b/drivers/char/tpm/tpm_tis_core.h @@ -98,7 +98,7 @@ struct tpm_tis_phy_ops { int (*read_bytes)(struct tpm_tis_data *data, u32 addr, u16 len, u8 *result); int (*write_bytes)(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *value); + const u8 *value); int (*read16)(struct tpm_tis_data *data, u32 addr, u16 *result); int (*read32)(struct tpm_tis_data *data, u32 addr, u32 *result); int (*write32)(struct tpm_tis_data *data, u32 addr, u32 src); @@ -128,7 +128,7 @@ static inline int tpm_tis_read32(struct tpm_tis_data *data, u32 addr, } static inline int tpm_tis_write_bytes(struct tpm_tis_data *data, u32 addr, - u16 len, u8 *value) + u16 len, const u8 *value) { return data->phy_ops->write_bytes(data, addr, len, value); } diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c index 88fe72ae967f..e49f5b9b739a 100644 --- a/drivers/char/tpm/tpm_tis_spi.c +++ b/drivers/char/tpm/tpm_tis_spi.c @@ -57,7 +57,7 @@ static inline struct tpm_tis_spi_phy *to_tpm_tis_spi_phy(struct tpm_tis_data *da } static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, - u8 *buffer, u8 direction) + u8 *in, const u8 *out) { struct tpm_tis_spi_phy *phy = to_tpm_tis_spi_phy(data); int ret = 0; @@ -71,7 +71,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, while (len) { transfer_len = min_t(u16, len, MAX_SPI_FRAMESIZE); - phy->tx_buf[0] = direction | (transfer_len - 1); + phy->tx_buf[0] = (in ? 0x80 : 0) | (transfer_len - 1); phy->tx_buf[1] = 0xd4; phy->tx_buf[2] = addr >> 8; phy->tx_buf[3] = addr; @@ -112,14 +112,8 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, spi_xfer.cs_change = 0; spi_xfer.len = transfer_len; spi_xfer.delay_usecs = 5; - - if (direction) { - spi_xfer.tx_buf = NULL; - spi_xfer.rx_buf = buffer; - } else { - spi_xfer.tx_buf = buffer; - spi_xfer.rx_buf = NULL; - } + spi_xfer.tx_buf = out; + spi_xfer.rx_buf = in; spi_message_init(&m); spi_message_add_tail(&spi_xfer, &m); @@ -128,7 +122,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, goto exit; len -= transfer_len; - buffer += transfer_len; + if (in) + in += transfer_len; + if (out) + out += transfer_len; } exit: @@ -139,13 +136,13 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len, static int tpm_tis_spi_read_bytes(struct tpm_tis_data *data, u32 addr, u16 len, u8 *result) { - return tpm_tis_spi_transfer(data, addr, len, result, 0x80); + return tpm_tis_spi_transfer(data, addr, len, result, NULL); } static int tpm_tis_spi_write_bytes(struct tpm_tis_data *data, u32 addr, - u16 len, u8 *value) + u16 len, const u8 *value) { - return tpm_tis_spi_transfer(data, addr, len, value, 0); + return tpm_tis_spi_transfer(data, addr, len, NULL, value); } static int tpm_tis_spi_read16(struct tpm_tis_data *data, u32 addr, u16 *result)

8 years

2
1
0 0

[Linux-stable-mirror] WTF: patch "[PATCH] tpm_tis: make array cmd_getticks static const to shrink" was seriously submitted to be applied to the 4.14-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 4.14-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 0bbc931a074a741cf8e6279e8045cf7118586780 Mon Sep 17 00:00:00 2001 From: Colin Ian King <colin.king(a)canonical.com> Date: Fri, 25 Aug 2017 17:45:05 +0100 Subject: [PATCH] tpm_tis: make array cmd_getticks static const to shrink object code size Don't populate array cmd_getticks on the stack, instead make it static const. Makes the object code smaller by over 160 bytes: Before: text data bss dec hex filename 18813 3152 128 22093 564d drivers/char/tpm/tpm_tis_core.o After: text data bss dec hex filename 18554 3248 128 21930 55aa drivers/char/tpm/tpm_tis_core.o Cc: stable(a)vger.kernel.org Signed-off-by: Colin Ian King <colin.king(a)canonical.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 63bc6c3b949e..1e957e923d21 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -445,7 +445,7 @@ static int probe_itpm(struct tpm_chip *chip) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); int rc = 0; - u8 cmd_getticks[] = { + static const u8 cmd_getticks[] = { 0x00, 0xc1, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0xf1 };

8 years

2
1
0 0

[Linux-stable-mirror] [PATCH] drm/i915/fbdev: Serialise early hotplug events with async fbdev config

by Chris Wilson

As both the hotplug event and fbdev configuration run asynchronously, it is possible for them to run concurrently. If configuration fails, we were freeing the fbdev causing a use-after-free in the hotplug event. <7>[ 3069.935211] [drm:intel_fb_initial_config [i915]] Not using firmware configuration <7>[ 3069.935225] [drm:drm_setup_crtcs] looking for cmdline mode on connector 77 <7>[ 3069.935229] [drm:drm_setup_crtcs] looking for preferred mode on connector 77 0 <7>[ 3069.935233] [drm:drm_setup_crtcs] found mode 3200x1800 <7>[ 3069.935236] [drm:drm_setup_crtcs] picking CRTCs for 8192x8192 config <7>[ 3069.935253] [drm:drm_setup_crtcs] desired mode 3200x1800 set on crtc 43 (0,0) <7>[ 3069.935323] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one <4>[ 3069.967737] general protection fault: 0000 [#1] PREEMPT SMP <0>[ 3069.977453] --------------------------------- <4>[ 3069.977457] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mei_me mii prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: i915] <4>[ 3069.977492] CPU: 1 PID: 15414 Comm: kworker/1:0 Tainted: G U 4.14.0-CI-CI_DRM_3388+ #1 <4>[ 3069.977497] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 3069.977508] Workqueue: events output_poll_execute <4>[ 3069.977512] task: ffff880177734e40 task.stack: ffffc90001fe4000 <4>[ 3069.977519] RIP: 0010:__lock_acquire+0x109/0x1b60 <4>[ 3069.977523] RSP: 0018:ffffc90001fe7bb0 EFLAGS: 00010002 <4>[ 3069.977526] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000282 RCX: 0000000000000000 <4>[ 3069.977530] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880170d4efd0 <4>[ 3069.977534] RBP: ffffc90001fe7c70 R08: 0000000000000001 R09: 0000000000000000 <4>[ 3069.977538] R10: 0000000000000000 R11: ffffffff81899609 R12: ffff880170d4efd0 <4>[ 3069.977542] R13: ffff880177734e40 R14: 0000000000000001 R15: 0000000000000000 <4>[ 3069.977547] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000 <4>[ 3069.977551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 3069.977555] CR2: 00007f7e8b7bcf04 CR3: 0000000003e0f000 CR4: 00000000003406e0 <4>[ 3069.977559] Call Trace: <4>[ 3069.977565] ? mark_held_locks+0x64/0x90 <4>[ 3069.977571] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 3069.977575] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 3069.977579] ? trace_hardirqs_on_caller+0xde/0x1c0 <4>[ 3069.977583] ? _raw_spin_unlock_irq+0x2f/0x50 <4>[ 3069.977588] ? finish_task_switch+0xa5/0x210 <4>[ 3069.977592] ? lock_acquire+0xaf/0x200 <4>[ 3069.977596] lock_acquire+0xaf/0x200 <4>[ 3069.977600] ? __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977604] _raw_spin_lock+0x2a/0x40 <4>[ 3069.977608] ? __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977612] __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977616] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977621] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977625] drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977630] output_poll_execute+0x8d/0x180 <4>[ 3069.977635] process_one_work+0x22e/0x660 <4>[ 3069.977640] worker_thread+0x48/0x3a0 <4>[ 3069.977644] ? _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 3069.977649] kthread+0x102/0x140 <4>[ 3069.977653] ? process_one_work+0x660/0x660 <4>[ 3069.977657] ? kthread_create_on_node+0x40/0x40 <4>[ 3069.977662] ret_from_fork+0x27/0x40 <4>[ 3069.977666] Code: 8d 62 f8 c3 49 81 3c 24 e0 fa 3c 82 41 be 00 00 00 00 45 0f 45 f0 83 fe 01 77 86 89 f0 49 8b 44 c4 08 48 85 c0 0f 84 76 ff ff ff <f0> ff 80 38 01 00 00 8b 1d 62 f9 e8 01 45 8b 85 b8 08 00 00 85 <1>[ 3069.977707] RIP: __lock_acquire+0x109/0x1b60 RSP: ffffc90001fe7bb0 <4>[ 3069.977712] ---[ end trace 4ad012eb3af62df7 ]--- In order to keep the dev_priv->ifbdev alive after failure, we have to avoid the free and leave it empty until we unload the module. Then we can use intel_fbdev_sync() to serialise the hotplug event with the configuration. The serialisation between the two was removed in commit 934458c2c95d ("Revert "drm/i915: Fix races on fbdev""), but the use after free is much older, commit 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails") Fixes: 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails") Fixes: 934458c2c95d ("Revert "drm/i915: Fix races on fbdev"") Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk> Cc: Lukas Wunner <lukas(a)wunner.de> Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com> Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch> Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/i915/intel_fbdev.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index b8af35187d22..ea96682568e8 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -697,10 +697,8 @@ static void intel_fbdev_initial_config(void *data, async_cookie_t cookie) /* Due to peculiar init order wrt to hpd handling this is separate. */ if (drm_fb_helper_initial_config(&ifbdev->helper, - ifbdev->preferred_bpp)) { + ifbdev->preferred_bpp)) intel_fbdev_unregister(to_i915(ifbdev->helper.dev)); - intel_fbdev_fini(to_i915(ifbdev->helper.dev)); - } } void intel_fbdev_initial_config_async(struct drm_device *dev) @@ -800,7 +798,11 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev) { struct intel_fbdev *ifbdev = to_i915(dev)->fbdev; - if (ifbdev) + if (!ifbdev) + return; + + intel_fbdev_sync(ifbdev); + if (ifbdev->vma) drm_fb_helper_hotplug_event(&ifbdev->helper); } -- 2.15.0

8 years

2
3
0 0

Re: [Linux-stable-mirror] [PATCH] drm/i915/fbdev: Serialise early hotplug events with async fbdev config

by Lukas Wunner

On Sun, Nov 26, 2017 at 11:58:43AM +0000, Chris Wilson wrote: > Quoting Lukas Wunner (2017-11-26 11:49:19) > > Hm, the race at hand would be solved by the intel_fbdev_sync() below, > > or am I missing something? Still wondering why it's necessary to > > leave the fbdev around... > > The race is solved, but if we do free ifbdev, we can't dereference > ifbdev prior to the sync; and we store the async cookie inside ifbdev. > Bleugh. Catch 22. Right. Oh dear god! We could move the cookie into dev_priv, the fbdev_suspend_work is also there, outside of struct intel_fbdev. > What we might do then is just pull the struct into dev_priv under > ifdef FBDEV. I vaguely remember something that dev_priv deliberately only contains a pointer to struct intel_fbdev, that it *was* embedded in dev_priv in the past but moved out for some reason. > > However the "if (ifbdev->vma)" looks a bit fishy, ifbdev could be NULL > > (e.g. if BIOS fb was too small but intelfb_alloc() failed) so I think > > this might lead to a null pointer deref. Does it make a difference > > if we check for ifbdev versus ifbdev->vma? I also notice that you > > added a check for ifbdev->vma with 15727ed0d944 but Daniel later > > removed it with 88be58be886f. > > We know that ifbdev is non-NULL and can't become NULL until fini. So > after the sync point, we want to ask the question of whether the config > was successful, for that I used to use ->fb which now replaced by ->vma. Yes if the fbdev is kept around then obviously it's fine to deref it. Thanks, Lukas

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ACPI / APEI: Remove arch_apei_flush_tlb_one()" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / APEI: Remove arch_apei_flush_tlb_one() to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-apei-remove-arch_apei_flush_tlb_one.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 4a75aeacda3c2455954596593d89187df5420d0a Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:27 +0000 Subject: ACPI / APEI: Remove arch_apei_flush_tlb_one() From: James Morse <james.morse(a)arm.com> commit 4a75aeacda3c2455954596593d89187df5420d0a upstream. Nothing calls arch_apei_flush_tlb_one() anymore, instead relying on __set_pte_vaddr() to do the invalidation when called from clear_fixmap() Remove arch_apei_flush_tlb_one(). Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/acpi/apei.c | 5 ----- include/acpi/apei.h | 1 - 2 files changed, 6 deletions(-) --- a/arch/x86/kernel/acpi/apei.c +++ b/arch/x86/kernel/acpi/apei.c @@ -55,8 +55,3 @@ void arch_apei_report_mem_error(int sev, apei_mce_report_mem_error(sev, mem_err); #endif } - -void arch_apei_flush_tlb_one(unsigned long addr) -{ - __flush_tlb_one(addr); -} --- a/include/acpi/apei.h +++ b/include/acpi/apei.h @@ -44,7 +44,6 @@ int erst_clear(u64 record_id); int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data); void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err); -void arch_apei_flush_tlb_one(unsigned long addr); #endif #endif Patches currently in stable-queue which might be from james.morse(a)arm.com are queue-3.18/acpi-apei-remove-arch_apei_flush_tlb_one.patch

8 years

2
1
0 0

[Linux-stable-mirror] [PATCH 4/6] iwlwifi: mvm: fix packet injection

by Luca Coelho

From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> We need to have a station and a queue for the monitor interface to be able to inject traffic. We used to have this traffic routed to the auxiliary queue, but this queue isn't scheduled for the station we had linked to the monitor vif. Allocate a new queue, link it to the monitor vif's station and make that queue use the BE fifo. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=196715 Cc: stable(a)vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/fw/api/txq.h | 4 ++ drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 1 + drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 1 + drivers/net/wireless/intel/iwlwifi/mvm/sta.c | 53 +++++++++++++++++------ drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 3 +- 6 files changed, 49 insertions(+), 15 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h index 87b4434224a1..dfa111bb411e 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h @@ -68,6 +68,9 @@ * @IWL_MVM_DQA_CMD_QUEUE: a queue reserved for sending HCMDs to the FW * @IWL_MVM_DQA_AUX_QUEUE: a queue reserved for aux frames * @IWL_MVM_DQA_P2P_DEVICE_QUEUE: a queue reserved for P2P device frames + * @IWL_MVM_DQA_INJECT_MONITOR_QUEUE: a queue reserved for injection using + * monitor mode. Note this queue is the same as the queue for P2P device + * but we can't have active monitor mode along with P2P device anyway. * @IWL_MVM_DQA_GCAST_QUEUE: a queue reserved for P2P GO/SoftAP GCAST frames * @IWL_MVM_DQA_BSS_CLIENT_QUEUE: a queue reserved for BSS activity, to ensure * that we are never left without the possibility to connect to an AP. @@ -87,6 +90,7 @@ enum iwl_mvm_dqa_txq { IWL_MVM_DQA_CMD_QUEUE = 0, IWL_MVM_DQA_AUX_QUEUE = 1, IWL_MVM_DQA_P2P_DEVICE_QUEUE = 2, + IWL_MVM_DQA_INJECT_MONITOR_QUEUE = 2, IWL_MVM_DQA_GCAST_QUEUE = 3, IWL_MVM_DQA_BSS_CLIENT_QUEUE = 4, IWL_MVM_DQA_MIN_MGMT_QUEUE = 5, diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c index a2bf530eeae4..2f22e14e00fe 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c @@ -787,7 +787,7 @@ static int iwl_mvm_mac_ctxt_cmd_listener(struct iwl_mvm *mvm, u32 action) { struct iwl_mac_ctx_cmd cmd = {}; - u32 tfd_queue_msk = 0; + u32 tfd_queue_msk = BIT(mvm->snif_queue); int ret; WARN_ON(vif->type != NL80211_IFTYPE_MONITOR); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 4575595ab022..6a9a25beab3f 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -972,6 +972,7 @@ struct iwl_mvm { /* Tx queues */ u16 aux_queue; + u16 snif_queue; u16 probe_queue; u16 p2p_dev_queue; diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 7078b7e458be..45470b6b351a 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -624,6 +624,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg, mvm->fw_restart = iwlwifi_mod_params.fw_restart ? -1 : 0; mvm->aux_queue = IWL_MVM_DQA_AUX_QUEUE; + mvm->snif_queue = IWL_MVM_DQA_INJECT_MONITOR_QUEUE; mvm->probe_queue = IWL_MVM_DQA_AP_PROBE_RESP_QUEUE; mvm->p2p_dev_queue = IWL_MVM_DQA_P2P_DEVICE_QUEUE; diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c index c19f98489d4e..1add5615fc3a 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c @@ -1709,29 +1709,29 @@ void iwl_mvm_dealloc_int_sta(struct iwl_mvm *mvm, struct iwl_mvm_int_sta *sta) sta->sta_id = IWL_MVM_INVALID_STA; } -static void iwl_mvm_enable_aux_queue(struct iwl_mvm *mvm) +static void iwl_mvm_enable_aux_snif_queue(struct iwl_mvm *mvm, u16 *queue, + u8 sta_id, u8 fifo) { unsigned int wdg_timeout = iwlmvm_mod_params.tfd_q_hang_detect ? mvm->cfg->base_params->wd_timeout : IWL_WATCHDOG_DISABLED; if (iwl_mvm_has_new_tx_api(mvm)) { - int queue = iwl_mvm_tvqm_enable_txq(mvm, mvm->aux_queue, - mvm->aux_sta.sta_id, - IWL_MAX_TID_COUNT, - wdg_timeout); - mvm->aux_queue = queue; + int tvqm_queue = + iwl_mvm_tvqm_enable_txq(mvm, *queue, sta_id, + IWL_MAX_TID_COUNT, + wdg_timeout); + *queue = tvqm_queue; } else { struct iwl_trans_txq_scd_cfg cfg = { - .fifo = IWL_MVM_TX_FIFO_MCAST, - .sta_id = mvm->aux_sta.sta_id, + .fifo = fifo, + .sta_id = sta_id, .tid = IWL_MAX_TID_COUNT, .aggregate = false, .frame_limit = IWL_FRAME_LIMIT, }; - iwl_mvm_enable_txq(mvm, mvm->aux_queue, mvm->aux_queue, 0, &cfg, - wdg_timeout); + iwl_mvm_enable_txq(mvm, *queue, *queue, 0, &cfg, wdg_timeout); } } @@ -1750,7 +1750,9 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) /* Map Aux queue to fifo - needs to happen before adding Aux station */ if (!iwl_mvm_has_new_tx_api(mvm)) - iwl_mvm_enable_aux_queue(mvm); + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->aux_queue, + mvm->aux_sta.sta_id, + IWL_MVM_TX_FIFO_MCAST); ret = iwl_mvm_add_int_sta_common(mvm, &mvm->aux_sta, NULL, MAC_INDEX_AUX, 0); @@ -1764,7 +1766,9 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) * to firmware so enable queue here - after the station was added */ if (iwl_mvm_has_new_tx_api(mvm)) - iwl_mvm_enable_aux_queue(mvm); + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->aux_queue, + mvm->aux_sta.sta_id, + IWL_MVM_TX_FIFO_MCAST); return 0; } @@ -1772,10 +1776,31 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) int iwl_mvm_add_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) { struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif); + int ret; lockdep_assert_held(&mvm->mutex); - return iwl_mvm_add_int_sta_common(mvm, &mvm->snif_sta, vif->addr, + + /* Map snif queue to fifo - must happen before adding snif station */ + if (!iwl_mvm_has_new_tx_api(mvm)) + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->snif_queue, + mvm->snif_sta.sta_id, + IWL_MVM_TX_FIFO_BE); + + ret = iwl_mvm_add_int_sta_common(mvm, &mvm->snif_sta, vif->addr, mvmvif->id, 0); + if (ret) + return ret; + + /* + * For 22000 firmware and on we cannot add queue to a station unknown + * to firmware so enable queue here - after the station was added + */ + if (iwl_mvm_has_new_tx_api(mvm)) + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->snif_queue, + mvm->snif_sta.sta_id, + IWL_MVM_TX_FIFO_BE); + + return 0; } int iwl_mvm_rm_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) @@ -1784,6 +1809,8 @@ int iwl_mvm_rm_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) lockdep_assert_held(&mvm->mutex); + iwl_mvm_disable_txq(mvm, mvm->snif_queue, mvm->snif_queue, + IWL_MAX_TID_COUNT, 0); ret = iwl_mvm_rm_sta_common(mvm, mvm->snif_sta.sta_id); if (ret) IWL_WARN(mvm, "Failed sending remove station\n"); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c index 593b7f97b29c..333bcb75b8af 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c @@ -657,7 +657,8 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb) if (ap_sta_id != IWL_MVM_INVALID_STA) sta_id = ap_sta_id; } else if (info.control.vif->type == NL80211_IFTYPE_MONITOR) { - queue = mvm->aux_queue; + queue = mvm->snif_queue; + sta_id = mvm->snif_sta.sta_id; } } -- 2.15.0

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH 2/6] iwlwifi: mvm: don't use transmit queue hang detection when it is not possible

by Luca Coelho

From: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> When we act as an AP, new firmware versions handle internally the power saving clients and the driver doesn't know that the peers went to sleep. It is, hence, possible that a peer goes to sleep for a long time and stop pulling frames. This will cause its transmit queue to hang which is a condition that triggers the recovery flow in the driver. While this client is certainly buggy (it should have pulled the frame based on the TIM IE in the beacon), we can't blow up because of a buggy client. Change the current implementation to not enable the transmit queue hang detection on queues that serve peers when we act as an AP / GO. We can still enable this mechanism using the debug configuration which can come in handy when we want to debug why the client doesn't wake up. Cc: stable(a)vger.kernel.org # v4.13 Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/utils.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c index d46115e2d69e..19c1d1f76e15 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c @@ -1134,9 +1134,18 @@ unsigned int iwl_mvm_get_wd_timeout(struct iwl_mvm *mvm, unsigned int default_timeout = cmd_q ? IWL_DEF_WD_TIMEOUT : mvm->cfg->base_params->wd_timeout; - if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS)) + if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS)) { + /* + * We can't know when the station is asleep or awake, so we + * must disable the queue hang detection. + */ + if (fw_has_capa(&mvm->fw->ucode_capa, + IWL_UCODE_TLV_CAPA_STA_PM_NOTIF) && + vif && vif->type == NL80211_IFTYPE_AP) + return IWL_WATCHDOG_DISABLED; return iwlmvm_mod_params.tfd_q_hang_detect ? default_timeout : IWL_WATCHDOG_DISABLED; + } trigger = iwl_fw_dbg_get_trigger(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS); txq_timer = (void *)trigger->data; -- 2.15.0

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/runtime instrumention: fix possible memory corruption" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/runtime instrumention: fix possible memory corruption to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-runtime-instrumention-fix-possible-memory-corruption.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: s390/runtime instrumention: fix possible memory corruption From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit d6e646ad7cfa7034d280459b2b2546288f247144 upstream. For PREEMPT enabled kernels the runtime instrumentation (RI) code contains a possible use-after-free bug. If a task that makes use of RI exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_runtime_instr() gets preempted after the RI control block of the task has been freed but before the pointer to it is set to NULL, then save_ri_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation") Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/runtime_instr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/runtime_instr.c +++ b/arch/s390/kernel/runtime_instr.c @@ -47,11 +47,13 @@ void exit_thread_runtime_instr(void) { struct task_struct *task = current; + preempt_disable(); if (!task->thread.ri_cb) return; disable_runtime_instr(); kfree(task->thread.ri_cb); task->thread.ri_cb = NULL; + preempt_enable(); } SYSCALL_DEFINE1(s390_runtime_instr, int, command) @@ -62,9 +64,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, return -EOPNOTSUPP; if (command == S390_RUNTIME_INSTR_STOP) { - preempt_disable(); exit_thread_runtime_instr(); - preempt_enable(); return 0; } Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.9/s390-fix-transactional-execution-control-register-handling.patch queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390: fix transactional execution control register handling" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390: fix transactional execution control register handling to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-fix-transactional-execution-control-register-handling.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Thu, 9 Nov 2017 12:29:34 +0100 Subject: s390: fix transactional execution control register handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream. Dan Horák reported the following crash related to transactional execution: User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1 Hardware name: IBM 2827 H43 400 (z/VM 6.4.0) task: 00000000fafc8000 task.stack: 00000000fafc4000 User PSW : 0705200180000000 000003ff93c14e70 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3 User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e 0000000000000000 0000000000000002 0000000000000000 000003ff00000000 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0 User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11) 000003ff93c14e66: 60f0b038 std %f15,56(%r11) #000003ff93c14e6a: e5600000ff0e tbegin 0,65294 >000003ff93c14e70: a7740006 brc 7,3ff93c14e7c 000003ff93c14e74: a7080000 lhi %r0,0 000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe 000003ff93c14e7c: b2220000 ipm %r0 000003ff93c14e80: 8800001c srl %r0,28 There are several bugs with control register handling with respect to transactional execution: - on task switch update_per_regs() is only called if the next task has an mm (is not a kernel thread). This however is incorrect. This breaks e.g. for user mode helper handling, where the kernel creates a kernel thread and then execve's a user space program. Control register contents related to transactional execution won't be updated on execve. If the previous task ran with transactional execution disabled then the new task will also run with transactional execution disabled, which is incorrect. Therefore call update_per_regs() unconditionally within switch_to(). - on startup the transactional execution facility is not enabled for the idle thread. This is not really a bug, but an inconsistency to other facilities. Therefore enable the facility if it is available. - on fork the new thread's per_flags field is not cleared. This means that a child process inherits the PER_FLAG_NO_TE flag. This flag can be set with a ptrace request to disable transactional execution for the current process. It should not be inherited by new child processes in order to be consistent with the handling of all other PER related debugging options. Therefore clear the per_flags field in copy_thread_tls(). Reported-and-tested-by: Dan Horák <dan(a)danny.cz> Fixes: d35339a42dd1 ("s390: add support for transactional memory") Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/include/asm/switch_to.h | 2 +- arch/s390/kernel/early.c | 4 +++- arch/s390/kernel/process.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -34,8 +34,8 @@ static inline void restore_access_regs(u save_access_regs(&prev->thread.acrs[0]); \ save_ri_cb(prev->thread.ri_cb); \ } \ + update_cr_regs(next); \ if (next->mm) { \ - update_cr_regs(next); \ set_cpu_flag(CIF_FPU); \ restore_access_regs(&next->thread.acrs[0]); \ restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -345,8 +345,10 @@ static __init void detect_machine_facili S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE; if (test_facility(40)) S390_lowcore.machine_flags |= MACHINE_FLAG_LPP; - if (test_facility(50) && test_facility(73)) + if (test_facility(50) && test_facility(73)) { S390_lowcore.machine_flags |= MACHINE_FLAG_TE; + __ctl_set_bit(0, 55); + } if (test_facility(51)) S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC; if (test_facility(129)) { --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -120,6 +120,7 @@ int copy_thread(unsigned long clone_flag memset(&p->thread.per_user, 0, sizeof(p->thread.per_user)); memset(&p->thread.per_event, 0, sizeof(p->thread.per_event)); clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + p->thread.per_flags = 0; /* Initialize per thread user and system timer values */ ti = task_thread_info(p); ti->user_timer = 0; Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.9/s390-fix-transactional-execution-control-register-handling.patch queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: increase show_code buffer size" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: increase show_code buffer size to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-increase-show_code-buffer-size.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 Mon Sep 17 00:00:00 2001 From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Date: Wed, 15 Nov 2017 14:15:36 +0100 Subject: s390/disassembler: increase show_code buffer size From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> commit b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 upstream. Current buffer size of 64 is too small. objdump shows that there are instructions which would require up to 75 bytes buffer (with current formating). 128 bytes "ought to be enough for anybody". Also replaces 8 spaces with a single tab to reduce the memory footprint. Fixes the following KASAN finding: BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538 Write of size 1 at addr 000000005a4a75a0 by task bash/1282 CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Call Trace: ([<000000000011eeb6>] show_stack+0x56/0x88) [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0 [<00000000004e2994>] print_address_description+0xf4/0x288 [<00000000004e2cf2>] kasan_report+0x13a/0x230 [<0000000000e38ae6>] number+0x3fe/0x538 [<0000000000e3dfe4>] vsnprintf+0x194/0x948 [<0000000000e3ea42>] sprintf+0xa2/0xb8 [<00000000001198dc>] print_insn+0x374/0x500 [<0000000000119346>] show_code+0x4ee/0x538 [<000000000011f234>] show_registers+0x34c/0x388 [<000000000011f2ae>] show_regs+0x3e/0xa8 [<000000000011f502>] die+0x1ea/0x2e8 [<0000000000138f0e>] do_no_context+0x106/0x168 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0 [<000000000090639e>] sysrq_handle_crash+0x46/0x58 ([<0000000000000007>] 0x7) [<00000000009073fa>] __handle_sysrq+0x102/0x218 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100 [<000000000061d67a>] proc_reg_write+0xb2/0x128 [<0000000000520be6>] __vfs_write+0xee/0x368 [<0000000000521222>] vfs_write+0x21a/0x278 [<000000000052156a>] SyS_write+0xda/0x178 [<0000000000e555cc>] system_call+0xc4/0x270 The buggy address belongs to the page: page:000003d1016929c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000 raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00 >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 ^ 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00 ================================================================== Signed-off-by: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1954,7 +1954,7 @@ void show_code(struct pt_regs *regs) { char *mode = user_mode(regs) ? "User" : "Krnl"; unsigned char code[64]; - char buffer[64], *ptr; + char buffer[128], *ptr; mm_segment_t old_fs; unsigned long addr; int start, end, opsize, hops, i; @@ -2017,7 +2017,7 @@ void show_code(struct pt_regs *regs) start += opsize; pr_cont("%s", buffer); ptr = buffer; - ptr += sprintf(ptr, "\n "); + ptr += sprintf(ptr, "\n\t "); hops++; } pr_cont("\n"); Patches currently in stable-queue which might be from gor(a)linux.vnet.ibm.com are queue-4.9/s390-disassembler-increase-show_code-buffer-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ACPI / EC: Fix regression related to triggering source of EC event handling ACPI / EC: Fix an issue that SCI_EVT cannot be detected ACPI / EC: Remove old CLEAR_ON_RESUME quirk Revert "ACPI / EC: Enable event freeze mode..." to fix" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / EC: Fix regression related to triggering source of EC event handling ACPI / EC: Fix an issue that SCI_EVT cannot be detected ACPI / EC: Remove old CLEAR_ON_RESUME quirk Revert "ACPI / EC: Enable event freeze mode..." to fix to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb Mon Sep 17 00:00:00 2001 From: Lv Zheng <lv.zheng(a)intel.com> Date: Tue, 26 Sep 2017 16:54:03 +0800 Subject: ACPI / EC: Fix regression related to triggering source of EC event handling From: Lv Zheng <lv.zheng(a)intel.com> commit 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb upstream. Originally the Samsung quirks removed by commit 4c237371 can be covered by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956 changed ec_freeze_events=Y back to N, making this problem re-surface. Actually, if commit e923e8e7 is robust enough, we can freely change ec_freeze_events mode, so this patch fixes the issue by improving commit e923e8e7. Related commits listed in the merged order: Commit: e923e8e79e18fd6be9162f1be6b99a002e9df2cb Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled Commit: 4c237371f290d1ed3b2071dd43554362137b1cce Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk Commit: 9c40f956ce9b331493347d1b3cb7e384f7dc0581 Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix a regression This patch not only fixes the reported post-resume EC event triggering source issue, but also fixes an unreported similar issue related to the driver bind by adding EC event triggering source in ec_install_handlers(). Fixes: e923e8e79e18 (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled) Fixes: 4c237371f290 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk) Fixes: 9c40f956ce9b (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression) Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833 Signed-off-by: Lv Zheng <lv.zheng(a)intel.com> Reported-by: Alistair Hamilton <ahpatent(a)gmail.com> Tested-by: Alistair Hamilton <ahpatent(a)gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/acpi/ec.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -482,8 +482,11 @@ static inline void __acpi_ec_enable_even { if (!test_and_set_bit(EC_FLAGS_QUERY_ENABLED, &ec->flags)) ec_log_drv("event unblocked"); - if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) - advance_transaction(ec); + /* + * Unconditionally invoke this once after enabling the event + * handling mechanism to detect the pending events. + */ + advance_transaction(ec); } static inline void __acpi_ec_disable_event(struct acpi_ec *ec) @@ -1458,11 +1461,10 @@ static int ec_install_handlers(struct ac if (test_bit(EC_FLAGS_STARTED, &ec->flags) && ec->reference_count >= 1) acpi_ec_enable_gpe(ec, true); - - /* EC is fully operational, allow queries */ - acpi_ec_enable_event(ec); } } + /* EC is fully operational, allow queries */ + acpi_ec_enable_event(ec); return 0; } Patches currently in stable-queue which might be from lv.zheng(a)intel.com are queue-4.9/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: add missing end marker for e7 table" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: add missing end marker for e7 table to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-add-missing-end-marker-for-e7-table.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Tue, 26 Sep 2017 09:16:48 +0200 Subject: s390/disassembler: add missing end marker for e7 table From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit 5c50538752af7968f53924b22dede8ed4ce4cb3b upstream. The e7 opcode table does not have an end marker. Hence when trying to find an unknown e7 instruction the code will access memory behind the table until it finds something that matches the opcode, or the kernel crashes, whatever comes first. This affects not only the in-kernel disassembler but also uprobes and kprobes which refuse to set a probe on unknown instructions, and therefore search the opcode tables to figure out if instructions are known or not. Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions") Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 1 + 1 file changed, 1 insertion(+) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = { { "vfsq", 0xce, INSTR_VRR_VV000MM }, { "vfs", 0xe2, INSTR_VRR_VVV00MM }, { "vftci", 0x4a, INSTR_VRI_VVIMM }, + { "", 0, INSTR_INVALID } }; static struct s390_insn opcode_eb[] = { Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.9/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.9/s390-fix-transactional-execution-control-register-handling.patch queue-4.9/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/runtime instrumention: fix possible memory corruption" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/runtime instrumention: fix possible memory corruption to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-runtime-instrumention-fix-possible-memory-corruption.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: s390/runtime instrumention: fix possible memory corruption From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit d6e646ad7cfa7034d280459b2b2546288f247144 upstream. For PREEMPT enabled kernels the runtime instrumentation (RI) code contains a possible use-after-free bug. If a task that makes use of RI exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_runtime_instr() gets preempted after the RI control block of the task has been freed but before the pointer to it is set to NULL, then save_ri_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation") Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/runtime_instr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/runtime_instr.c +++ b/arch/s390/kernel/runtime_instr.c @@ -47,11 +47,13 @@ void exit_thread_runtime_instr(void) { struct task_struct *task = current; + preempt_disable(); if (!task->thread.ri_cb) return; disable_runtime_instr(); kfree(task->thread.ri_cb); task->thread.ri_cb = NULL; + preempt_enable(); } SYSCALL_DEFINE1(s390_runtime_instr, int, command) @@ -62,9 +64,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, return -EOPNOTSUPP; if (command == S390_RUNTIME_INSTR_STOP) { - preempt_disable(); exit_thread_runtime_instr(); - preempt_enable(); return 0; } Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.4/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.4/s390-fix-transactional-execution-control-register-handling.patch queue-4.4/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390: fix transactional execution control register handling" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390: fix transactional execution control register handling to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-fix-transactional-execution-control-register-handling.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Thu, 9 Nov 2017 12:29:34 +0100 Subject: s390: fix transactional execution control register handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream. Dan Horák reported the following crash related to transactional execution: User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1 Hardware name: IBM 2827 H43 400 (z/VM 6.4.0) task: 00000000fafc8000 task.stack: 00000000fafc4000 User PSW : 0705200180000000 000003ff93c14e70 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3 User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e 0000000000000000 0000000000000002 0000000000000000 000003ff00000000 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0 User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11) 000003ff93c14e66: 60f0b038 std %f15,56(%r11) #000003ff93c14e6a: e5600000ff0e tbegin 0,65294 >000003ff93c14e70: a7740006 brc 7,3ff93c14e7c 000003ff93c14e74: a7080000 lhi %r0,0 000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe 000003ff93c14e7c: b2220000 ipm %r0 000003ff93c14e80: 8800001c srl %r0,28 There are several bugs with control register handling with respect to transactional execution: - on task switch update_per_regs() is only called if the next task has an mm (is not a kernel thread). This however is incorrect. This breaks e.g. for user mode helper handling, where the kernel creates a kernel thread and then execve's a user space program. Control register contents related to transactional execution won't be updated on execve. If the previous task ran with transactional execution disabled then the new task will also run with transactional execution disabled, which is incorrect. Therefore call update_per_regs() unconditionally within switch_to(). - on startup the transactional execution facility is not enabled for the idle thread. This is not really a bug, but an inconsistency to other facilities. Therefore enable the facility if it is available. - on fork the new thread's per_flags field is not cleared. This means that a child process inherits the PER_FLAG_NO_TE flag. This flag can be set with a ptrace request to disable transactional execution for the current process. It should not be inherited by new child processes in order to be consistent with the handling of all other PER related debugging options. Therefore clear the per_flags field in copy_thread_tls(). Reported-and-tested-by: Dan Horák <dan(a)danny.cz> Fixes: d35339a42dd1 ("s390: add support for transactional memory") Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/include/asm/switch_to.h | 2 +- arch/s390/kernel/early.c | 4 +++- arch/s390/kernel/process.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -34,8 +34,8 @@ static inline void restore_access_regs(u save_access_regs(&prev->thread.acrs[0]); \ save_ri_cb(prev->thread.ri_cb); \ } \ + update_cr_regs(next); \ if (next->mm) { \ - update_cr_regs(next); \ set_cpu_flag(CIF_FPU); \ restore_access_regs(&next->thread.acrs[0]); \ restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -325,8 +325,10 @@ static __init void detect_machine_facili S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE; if (test_facility(40)) S390_lowcore.machine_flags |= MACHINE_FLAG_LPP; - if (test_facility(50) && test_facility(73)) + if (test_facility(50) && test_facility(73)) { S390_lowcore.machine_flags |= MACHINE_FLAG_TE; + __ctl_set_bit(0, 55); + } if (test_facility(51)) S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC; if (test_facility(129)) { --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -137,6 +137,7 @@ int copy_thread(unsigned long clone_flag memset(&p->thread.per_user, 0, sizeof(p->thread.per_user)); memset(&p->thread.per_event, 0, sizeof(p->thread.per_event)); clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + p->thread.per_flags = 0; /* Initialize per thread user and system timer values */ ti = task_thread_info(p); ti->user_timer = 0; Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.4/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.4/s390-fix-transactional-execution-control-register-handling.patch queue-4.4/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: increase show_code buffer size" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: increase show_code buffer size to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-increase-show_code-buffer-size.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 Mon Sep 17 00:00:00 2001 From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Date: Wed, 15 Nov 2017 14:15:36 +0100 Subject: s390/disassembler: increase show_code buffer size From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> commit b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 upstream. Current buffer size of 64 is too small. objdump shows that there are instructions which would require up to 75 bytes buffer (with current formating). 128 bytes "ought to be enough for anybody". Also replaces 8 spaces with a single tab to reduce the memory footprint. Fixes the following KASAN finding: BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538 Write of size 1 at addr 000000005a4a75a0 by task bash/1282 CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Call Trace: ([<000000000011eeb6>] show_stack+0x56/0x88) [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0 [<00000000004e2994>] print_address_description+0xf4/0x288 [<00000000004e2cf2>] kasan_report+0x13a/0x230 [<0000000000e38ae6>] number+0x3fe/0x538 [<0000000000e3dfe4>] vsnprintf+0x194/0x948 [<0000000000e3ea42>] sprintf+0xa2/0xb8 [<00000000001198dc>] print_insn+0x374/0x500 [<0000000000119346>] show_code+0x4ee/0x538 [<000000000011f234>] show_registers+0x34c/0x388 [<000000000011f2ae>] show_regs+0x3e/0xa8 [<000000000011f502>] die+0x1ea/0x2e8 [<0000000000138f0e>] do_no_context+0x106/0x168 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0 [<000000000090639e>] sysrq_handle_crash+0x46/0x58 ([<0000000000000007>] 0x7) [<00000000009073fa>] __handle_sysrq+0x102/0x218 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100 [<000000000061d67a>] proc_reg_write+0xb2/0x128 [<0000000000520be6>] __vfs_write+0xee/0x368 [<0000000000521222>] vfs_write+0x21a/0x278 [<000000000052156a>] SyS_write+0xda/0x178 [<0000000000e555cc>] system_call+0xc4/0x270 The buggy address belongs to the page: page:000003d1016929c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000 raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00 >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 ^ 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00 ================================================================== Signed-off-by: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1962,7 +1962,7 @@ void show_code(struct pt_regs *regs) { char *mode = user_mode(regs) ? "User" : "Krnl"; unsigned char code[64]; - char buffer[64], *ptr; + char buffer[128], *ptr; mm_segment_t old_fs; unsigned long addr; int start, end, opsize, hops, i; @@ -2025,7 +2025,7 @@ void show_code(struct pt_regs *regs) start += opsize; printk(buffer); ptr = buffer; - ptr += sprintf(ptr, "\n "); + ptr += sprintf(ptr, "\n\t "); hops++; } printk("\n"); Patches currently in stable-queue which might be from gor(a)linux.vnet.ibm.com are queue-4.4/s390-disassembler-increase-show_code-buffer-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: add missing end marker for e7 table" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: add missing end marker for e7 table to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-add-missing-end-marker-for-e7-table.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Tue, 26 Sep 2017 09:16:48 +0200 Subject: s390/disassembler: add missing end marker for e7 table From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit 5c50538752af7968f53924b22dede8ed4ce4cb3b upstream. The e7 opcode table does not have an end marker. Hence when trying to find an unknown e7 instruction the code will access memory behind the table until it finds something that matches the opcode, or the kernel crashes, whatever comes first. This affects not only the in-kernel disassembler but also uprobes and kprobes which refuse to set a probe on unknown instructions, and therefore search the opcode tables to figure out if instructions are known or not. Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions") Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 1 + 1 file changed, 1 insertion(+) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1549,6 +1549,7 @@ static struct s390_insn opcode_e7[] = { { "vfsq", 0xce, INSTR_VRR_VV000MM }, { "vfs", 0xe2, INSTR_VRR_VVV00MM }, { "vftci", 0x4a, INSTR_VRI_VVIMM }, + { "", 0, INSTR_INVALID } }; static struct s390_insn opcode_eb[] = { Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.4/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.4/s390-fix-transactional-execution-control-register-handling.patch queue-4.4/s390-runtime-instrumention-fix-possible-memory-corruption.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/runtime instrumention: fix possible memory corruption" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/runtime instrumention: fix possible memory corruption to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-runtime-instrumention-fix-possible-memory-corruption.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: s390/runtime instrumention: fix possible memory corruption From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit d6e646ad7cfa7034d280459b2b2546288f247144 upstream. For PREEMPT enabled kernels the runtime instrumentation (RI) code contains a possible use-after-free bug. If a task that makes use of RI exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_runtime_instr() gets preempted after the RI control block of the task has been freed but before the pointer to it is set to NULL, then save_ri_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation") Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/runtime_instr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/runtime_instr.c +++ b/arch/s390/kernel/runtime_instr.c @@ -50,11 +50,13 @@ void exit_thread_runtime_instr(void) { struct task_struct *task = current; + preempt_disable(); if (!task->thread.ri_cb) return; disable_runtime_instr(); kfree(task->thread.ri_cb); task->thread.ri_cb = NULL; + preempt_enable(); } SYSCALL_DEFINE1(s390_runtime_instr, int, command) @@ -65,9 +67,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, return -EOPNOTSUPP; if (command == S390_RUNTIME_INSTR_STOP) { - preempt_disable(); exit_thread_runtime_instr(); - preempt_enable(); return 0; } Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.14/s390-fix-transactional-execution-control-register-handling.patch queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/noexec: execute kexec datamover without DAT" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/noexec: execute kexec datamover without DAT to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-noexec-execute-kexec-datamover-without-dat.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d0e810eeb3d326978f248b8f0233a2f30f58c72d Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Thu, 9 Nov 2017 23:00:14 +0100 Subject: s390/noexec: execute kexec datamover without DAT From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit d0e810eeb3d326978f248b8f0233a2f30f58c72d upstream. Rebooting into a new kernel with kexec fails (system dies) if tried on a machine that has no-execute support. Reason for this is that the so called datamover code gets executed with DAT on (MMU is active) and the page that contains the datamover is marked as non-executable. Therefore when branching into the datamover an unexpected program check happens and afterwards the machine is dead. This can be simply avoided by disabling DAT, which also disables any no-execute checks, just before the datamover gets executed. In fact the first thing done by the datamover is to disable DAT. The code in the datamover that disables DAT can be removed as well. Thanks to Michael Holzheu and Gerald Schaefer for tracking this down. Reviewed-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com> Reviewed-by: Philipp Rudo <prudo(a)linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer(a)de.ibm.com> Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Fixes: 57d7f939e7bd ("s390: add no-execute support") Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/machine_kexec.c | 1 + arch/s390/kernel/relocate_kernel.S | 3 --- 2 files changed, 1 insertion(+), 3 deletions(-) --- a/arch/s390/kernel/machine_kexec.c +++ b/arch/s390/kernel/machine_kexec.c @@ -269,6 +269,7 @@ static void __do_machine_kexec(void *dat s390_reset_system(); data_mover = (relocate_kernel_t) page_to_phys(image->control_code_page); + __arch_local_irq_stnsm(0xfb); /* disable DAT - avoid no-execute */ /* Call the moving routine */ (*data_mover)(&image->head, image->start); --- a/arch/s390/kernel/relocate_kernel.S +++ b/arch/s390/kernel/relocate_kernel.S @@ -29,7 +29,6 @@ ENTRY(relocate_kernel) basr %r13,0 # base address .base: - stnsm sys_msk-.base(%r13),0xfb # disable DAT stctg %c0,%c15,ctlregs-.base(%r13) stmg %r0,%r15,gprregs-.base(%r13) lghi %r0,3 @@ -103,8 +102,6 @@ ENTRY(relocate_kernel) .align 8 load_psw: .long 0x00080000,0x80000000 - sys_msk: - .quad 0 ctlregs: .rept 16 .quad 0 Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.14/s390-fix-transactional-execution-control-register-handling.patch queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/guarded storage: fix possible memory corruption" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/guarded storage: fix possible memory corruption to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-guarded-storage-fix-possible-memory-corruption.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From fa1edf3f63c05ca8eacafcd7048ed91e5360f1a8 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: s390/guarded storage: fix possible memory corruption From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit fa1edf3f63c05ca8eacafcd7048ed91e5360f1a8 upstream. For PREEMPT enabled kernels the guarded storage (GS) code contains a possible use-after-free bug. If a task that makes use of GS exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_gs() gets preempted after the GS control block of the task has been freed but before the pointer to it is set to NULL, then save_gs_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: 916cda1aa1b4 ("s390: add a system call for guarded storage") Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/guarded_storage.c | 2 ++ 1 file changed, 2 insertions(+) --- a/arch/s390/kernel/guarded_storage.c +++ b/arch/s390/kernel/guarded_storage.c @@ -14,9 +14,11 @@ void exit_thread_gs(void) { + preempt_disable(); kfree(current->thread.gs_cb); kfree(current->thread.gs_bc_cb); current->thread.gs_cb = current->thread.gs_bc_cb = NULL; + preempt_enable(); } static int gs_enable(void) Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.14/s390-fix-transactional-execution-control-register-handling.patch queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390: fix transactional execution control register handling" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390: fix transactional execution control register handling to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-fix-transactional-execution-control-register-handling.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Thu, 9 Nov 2017 12:29:34 +0100 Subject: s390: fix transactional execution control register handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 upstream. Dan Horák reported the following crash related to transactional execution: User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1 Hardware name: IBM 2827 H43 400 (z/VM 6.4.0) task: 00000000fafc8000 task.stack: 00000000fafc4000 User PSW : 0705200180000000 000003ff93c14e70 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3 User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e 0000000000000000 0000000000000002 0000000000000000 000003ff00000000 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0 User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11) 000003ff93c14e66: 60f0b038 std %f15,56(%r11) #000003ff93c14e6a: e5600000ff0e tbegin 0,65294 >000003ff93c14e70: a7740006 brc 7,3ff93c14e7c 000003ff93c14e74: a7080000 lhi %r0,0 000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe 000003ff93c14e7c: b2220000 ipm %r0 000003ff93c14e80: 8800001c srl %r0,28 There are several bugs with control register handling with respect to transactional execution: - on task switch update_per_regs() is only called if the next task has an mm (is not a kernel thread). This however is incorrect. This breaks e.g. for user mode helper handling, where the kernel creates a kernel thread and then execve's a user space program. Control register contents related to transactional execution won't be updated on execve. If the previous task ran with transactional execution disabled then the new task will also run with transactional execution disabled, which is incorrect. Therefore call update_per_regs() unconditionally within switch_to(). - on startup the transactional execution facility is not enabled for the idle thread. This is not really a bug, but an inconsistency to other facilities. Therefore enable the facility if it is available. - on fork the new thread's per_flags field is not cleared. This means that a child process inherits the PER_FLAG_NO_TE flag. This flag can be set with a ptrace request to disable transactional execution for the current process. It should not be inherited by new child processes in order to be consistent with the handling of all other PER related debugging options. Therefore clear the per_flags field in copy_thread_tls(). Reported-and-tested-by: Dan Horák <dan(a)danny.cz> Fixes: d35339a42dd1 ("s390: add support for transactional memory") Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/include/asm/switch_to.h | 2 +- arch/s390/kernel/early.c | 4 +++- arch/s390/kernel/process.c | 1 + 3 files changed, 5 insertions(+), 2 deletions(-) --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -37,8 +37,8 @@ static inline void restore_access_regs(u save_ri_cb(prev->thread.ri_cb); \ save_gs_cb(prev->thread.gs_cb); \ } \ + update_cr_regs(next); \ if (next->mm) { \ - update_cr_regs(next); \ set_cpu_flag(CIF_FPU); \ restore_access_regs(&next->thread.acrs[0]); \ restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -375,8 +375,10 @@ static __init void detect_machine_facili S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE; if (test_facility(40)) S390_lowcore.machine_flags |= MACHINE_FLAG_LPP; - if (test_facility(50) && test_facility(73)) + if (test_facility(50) && test_facility(73)) { S390_lowcore.machine_flags |= MACHINE_FLAG_TE; + __ctl_set_bit(0, 55); + } if (test_facility(51)) S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC; if (test_facility(129)) { --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -100,6 +100,7 @@ int copy_thread_tls(unsigned long clone_ memset(&p->thread.per_user, 0, sizeof(p->thread.per_user)); memset(&p->thread.per_event, 0, sizeof(p->thread.per_event)); clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + p->thread.per_flags = 0; /* Initialize per thread user and system timer values */ p->thread.user_timer = 0; p->thread.guest_timer = 0; Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.14/s390-fix-transactional-execution-control-register-handling.patch queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: increase show_code buffer size" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: increase show_code buffer size to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-increase-show_code-buffer-size.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 Mon Sep 17 00:00:00 2001 From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Date: Wed, 15 Nov 2017 14:15:36 +0100 Subject: s390/disassembler: increase show_code buffer size From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> commit b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 upstream. Current buffer size of 64 is too small. objdump shows that there are instructions which would require up to 75 bytes buffer (with current formating). 128 bytes "ought to be enough for anybody". Also replaces 8 spaces with a single tab to reduce the memory footprint. Fixes the following KASAN finding: BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538 Write of size 1 at addr 000000005a4a75a0 by task bash/1282 CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Call Trace: ([<000000000011eeb6>] show_stack+0x56/0x88) [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0 [<00000000004e2994>] print_address_description+0xf4/0x288 [<00000000004e2cf2>] kasan_report+0x13a/0x230 [<0000000000e38ae6>] number+0x3fe/0x538 [<0000000000e3dfe4>] vsnprintf+0x194/0x948 [<0000000000e3ea42>] sprintf+0xa2/0xb8 [<00000000001198dc>] print_insn+0x374/0x500 [<0000000000119346>] show_code+0x4ee/0x538 [<000000000011f234>] show_registers+0x34c/0x388 [<000000000011f2ae>] show_regs+0x3e/0xa8 [<000000000011f502>] die+0x1ea/0x2e8 [<0000000000138f0e>] do_no_context+0x106/0x168 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0 [<000000000090639e>] sysrq_handle_crash+0x46/0x58 ([<0000000000000007>] 0x7) [<00000000009073fa>] __handle_sysrq+0x102/0x218 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100 [<000000000061d67a>] proc_reg_write+0xb2/0x128 [<0000000000520be6>] __vfs_write+0xee/0x368 [<0000000000521222>] vfs_write+0x21a/0x278 [<000000000052156a>] SyS_write+0xda/0x178 [<0000000000e555cc>] system_call+0xc4/0x270 The buggy address belongs to the page: page:000003d1016929c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000 raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00 >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 ^ 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00 ================================================================== Signed-off-by: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1954,7 +1954,7 @@ void show_code(struct pt_regs *regs) { char *mode = user_mode(regs) ? "User" : "Krnl"; unsigned char code[64]; - char buffer[64], *ptr; + char buffer[128], *ptr; mm_segment_t old_fs; unsigned long addr; int start, end, opsize, hops, i; @@ -2017,7 +2017,7 @@ void show_code(struct pt_regs *regs) start += opsize; pr_cont("%s", buffer); ptr = buffer; - ptr += sprintf(ptr, "\n "); + ptr += sprintf(ptr, "\n\t "); hops++; } pr_cont("\n"); Patches currently in stable-queue which might be from gor(a)linux.vnet.ibm.com are queue-4.14/s390-disassembler-increase-show_code-buffer-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: add missing end marker for e7 table" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: add missing end marker for e7 table to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-add-missing-end-marker-for-e7-table.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Tue, 26 Sep 2017 09:16:48 +0200 Subject: s390/disassembler: add missing end marker for e7 table From: Heiko Carstens <heiko.carstens(a)de.ibm.com> commit 5c50538752af7968f53924b22dede8ed4ce4cb3b upstream. The e7 opcode table does not have an end marker. Hence when trying to find an unknown e7 instruction the code will access memory behind the table until it finds something that matches the opcode, or the kernel crashes, whatever comes first. This affects not only the in-kernel disassembler but also uprobes and kprobes which refuse to set a probe on unknown instructions, and therefore search the opcode tables to figure out if instructions are known or not. Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions") Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 1 + 1 file changed, 1 insertion(+) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = { { "vfsq", 0xce, INSTR_VRR_VV000MM }, { "vfs", 0xe2, INSTR_VRR_VVV00MM }, { "vftci", 0x4a, INSTR_VRI_VVIMM }, + { "", 0, INSTR_INVALID } }; static struct s390_insn opcode_eb[] = { Patches currently in stable-queue which might be from heiko.carstens(a)de.ibm.com are queue-4.14/s390-guarded-storage-fix-possible-memory-corruption.patch queue-4.14/s390-disassembler-add-missing-end-marker-for-e7-table.patch queue-4.14/s390-fix-transactional-execution-control-register-handling.patch queue-4.14/s390-runtime-instrumention-fix-possible-memory-corruption.patch queue-4.14/s390-noexec-execute-kexec-datamover-without-dat.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-pm-fix-acpi_pm_notifier_lock-vs-flush_workqueue-deadlock.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com> Date: Tue, 7 Nov 2017 23:08:10 +0200 Subject: ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Ville Syrjälä <ville.syrjala(a)linux.intel.com> commit ff1656790b3a4caca94505c52fd0250f981ea187 upstream. acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by by the work via acpi_pm_notify_handler(). This can deadlock. To fix the problem let's split the single lock into two: one to protect the dev->wakeup between the work vs. add/remove, and another one to handle notifier installation vs. removal. After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work() annotation" I was able to kill the machine (Intel Braswell) very easily with 'powertop --auto-tune', runtime suspending i915, and trying to wake it up via the USB keyboard. The cases when it didn't die are presumably explained by lockdep getting disabled by something else (cpu hotplug locking issues usually). Fortunately I still got a lockdep report over netconsole (trickling in very slowly), even though the machine was otherwise practically dead: [ 112.179806] ====================================================== [ 114.670858] WARNING: possible circular locking dependency detected [ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted [ 119.658101] ------------------------------------------------------ [ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. [ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead [ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up [ 121.313485] usb 1-6: USB disconnect, device number 3 [ 121.313501] usb 1-6.2: USB disconnect, device number 4 [ 134.747383] kworker/0:2/47 is trying to acquire lock: [ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80 [ 139.721524] [ 139.721524] but task is already holding lock: [ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 147.184450] [ 147.184450] which lock already depends on the new lock. [ 147.184450] [ 154.604711] [ 154.604711] the existing dependency chain (in reverse order) is: [ 159.447888] [ 159.447888] -> #2 ((&dpc->work)){+.+.}: [ 164.183486] __lock_acquire+0x1255/0x13f0 [ 166.504313] lock_acquire+0xb5/0x210 [ 168.778973] process_one_work+0x1b9/0x720 [ 171.030316] worker_thread+0x4c/0x440 [ 173.257184] kthread+0x154/0x190 [ 175.456143] ret_from_fork+0x27/0x40 [ 177.624348] [ 177.624348] -> #1 ("kacpi_notify"){+.+.}: [ 181.850351] __lock_acquire+0x1255/0x13f0 [ 183.941695] lock_acquire+0xb5/0x210 [ 186.046115] flush_workqueue+0xdd/0x510 [ 190.408153] acpi_os_wait_events_complete+0x31/0x40 [ 192.625303] acpi_remove_notify_handler+0x133/0x188 [ 194.820829] acpi_remove_pm_notifier+0x56/0x90 [ 196.989068] acpi_dev_pm_detach+0x5f/0xa0 [ 199.145866] dev_pm_domain_detach+0x27/0x30 [ 201.285614] i2c_device_probe+0x100/0x210 [ 203.411118] driver_probe_device+0x23e/0x310 [ 205.522425] __driver_attach+0xa3/0xb0 [ 207.634268] bus_for_each_dev+0x69/0xa0 [ 209.714797] driver_attach+0x1e/0x20 [ 211.778258] bus_add_driver+0x1bc/0x230 [ 213.837162] driver_register+0x60/0xe0 [ 215.868162] i2c_register_driver+0x42/0x70 [ 217.869551] 0xffffffffa0172017 [ 219.863009] do_one_initcall+0x45/0x170 [ 221.843863] do_init_module+0x5f/0x204 [ 223.817915] load_module+0x225b/0x29b0 [ 225.757234] SyS_finit_module+0xc6/0xd0 [ 227.661851] do_syscall_64+0x5c/0x120 [ 229.536819] return_from_SYSCALL_64+0x0/0x7a [ 231.392444] [ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}: [ 235.124914] check_prev_add+0x44e/0x8a0 [ 237.024795] __lock_acquire+0x1255/0x13f0 [ 238.937351] lock_acquire+0xb5/0x210 [ 240.840799] __mutex_lock+0x75/0x940 [ 242.709517] mutex_lock_nested+0x1c/0x20 [ 244.551478] acpi_pm_notify_handler+0x2f/0x80 [ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c [ 248.194412] acpi_os_execute_deferred+0x14/0x30 [ 250.003925] process_one_work+0x1ec/0x720 [ 251.803191] worker_thread+0x4c/0x440 [ 253.605307] kthread+0x154/0x190 [ 255.387498] ret_from_fork+0x27/0x40 [ 257.153175] [ 257.153175] other info that might help us debug this: [ 257.153175] [ 262.324392] Chain exists of: [ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work) [ 262.324392] [ 267.391997] Possible unsafe locking scenario: [ 267.391997] [ 270.758262] CPU0 CPU1 [ 272.431713] ---- ---- [ 274.060756] lock((&dpc->work)); [ 275.646532] lock("kacpi_notify"); [ 277.260772] lock((&dpc->work)); [ 278.839146] lock(acpi_pm_notifier_lock); [ 280.391902] [ 280.391902] *** DEADLOCK *** [ 280.391902] [ 284.986385] 2 locks held by kworker/0:2/47: [ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 289.727725] Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications) Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/acpi/device_pm.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable) #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct if (!dev && !func) return AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); - adev->wakeup.context.dev = dev; - adev->wakeup.context.func = func; - status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); + adev->wakeup.context.dev = dev; + adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } @@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(stru { acpi_status status = AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; @@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(stru if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); - adev->wakeup.flags.notifier_present = false; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } Patches currently in stable-queue which might be from ville.syrjala(a)linux.intel.com are queue-4.14/acpi-pm-fix-acpi_pm_notifier_lock-vs-flush_workqueue-deadlock.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "ACPI / EC: Fix regression related to triggering source of EC event handling ACPI / EC: Fix an issue that SCI_EVT cannot be detected ACPI / EC: Remove old CLEAR_ON_RESUME quirk Revert "ACPI / EC: Enable event freeze mode..." to fix" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ACPI / EC: Fix regression related to triggering source of EC event handling ACPI / EC: Fix an issue that SCI_EVT cannot be detected ACPI / EC: Remove old CLEAR_ON_RESUME quirk Revert "ACPI / EC: Enable event freeze mode..." to fix to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb Mon Sep 17 00:00:00 2001 From: Lv Zheng <lv.zheng(a)intel.com> Date: Tue, 26 Sep 2017 16:54:03 +0800 Subject: ACPI / EC: Fix regression related to triggering source of EC event handling From: Lv Zheng <lv.zheng(a)intel.com> commit 53c5eaabaea9a1b7a96f95ccc486d2ad721d95bb upstream. Originally the Samsung quirks removed by commit 4c237371 can be covered by commit e923e8e7 and ec_freeze_events=Y mode. But commit 9c40f956 changed ec_freeze_events=Y back to N, making this problem re-surface. Actually, if commit e923e8e7 is robust enough, we can freely change ec_freeze_events mode, so this patch fixes the issue by improving commit e923e8e7. Related commits listed in the merged order: Commit: e923e8e79e18fd6be9162f1be6b99a002e9df2cb Subject: ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled Commit: 4c237371f290d1ed3b2071dd43554362137b1cce Subject: ACPI / EC: Remove old CLEAR_ON_RESUME quirk Commit: 9c40f956ce9b331493347d1b3cb7e384f7dc0581 Subject: Revert "ACPI / EC: Enable event freeze mode..." to fix a regression This patch not only fixes the reported post-resume EC event triggering source issue, but also fixes an unreported similar issue related to the driver bind by adding EC event triggering source in ec_install_handlers(). Fixes: e923e8e79e18 (ACPI / EC: Fix an issue that SCI_EVT cannot be detected after event is enabled) Fixes: 4c237371f290 (ACPI / EC: Remove old CLEAR_ON_RESUME quirk) Fixes: 9c40f956ce9b (Revert "ACPI / EC: Enable event freeze mode..." to fix a regression) Link: https://bugzilla.kernel.org/show_bug.cgi?id=196833 Signed-off-by: Lv Zheng <lv.zheng(a)intel.com> Reported-by: Alistair Hamilton <ahpatent(a)gmail.com> Tested-by: Alistair Hamilton <ahpatent(a)gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/acpi/ec.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -486,8 +486,11 @@ static inline void __acpi_ec_enable_even { if (!test_and_set_bit(EC_FLAGS_QUERY_ENABLED, &ec->flags)) ec_log_drv("event unblocked"); - if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags)) - advance_transaction(ec); + /* + * Unconditionally invoke this once after enabling the event + * handling mechanism to detect the pending events. + */ + advance_transaction(ec); } static inline void __acpi_ec_disable_event(struct acpi_ec *ec) @@ -1456,11 +1459,10 @@ static int ec_install_handlers(struct ac if (test_bit(EC_FLAGS_STARTED, &ec->flags) && ec->reference_count >= 1) acpi_ec_enable_gpe(ec, true); - - /* EC is fully operational, allow queries */ - acpi_ec_enable_event(ec); } } + /* EC is fully operational, allow queries */ + acpi_ec_enable_event(ec); return 0; } Patches currently in stable-queue which might be from lv.zheng(a)intel.com are queue-4.14/acpi-ec-fix-regression-related-to-triggering-source-of-ec-event-handling.patch

8 years

1
0
0 0

[Linux-stable-mirror] Patch "s390/disassembler: increase show_code buffer size" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled s390/disassembler: increase show_code buffer size to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: s390-disassembler-increase-show_code-buffer-size.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 Mon Sep 17 00:00:00 2001 From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Date: Wed, 15 Nov 2017 14:15:36 +0100 Subject: s390/disassembler: increase show_code buffer size From: Vasily Gorbik <gor(a)linux.vnet.ibm.com> commit b192571d1ae375e0bbe0aa3ccfa1a3c3704454b9 upstream. Current buffer size of 64 is too small. objdump shows that there are instructions which would require up to 75 bytes buffer (with current formating). 128 bytes "ought to be enough for anybody". Also replaces 8 spaces with a single tab to reduce the memory footprint. Fixes the following KASAN finding: BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538 Write of size 1 at addr 000000005a4a75a0 by task bash/1282 CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Call Trace: ([<000000000011eeb6>] show_stack+0x56/0x88) [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0 [<00000000004e2994>] print_address_description+0xf4/0x288 [<00000000004e2cf2>] kasan_report+0x13a/0x230 [<0000000000e38ae6>] number+0x3fe/0x538 [<0000000000e3dfe4>] vsnprintf+0x194/0x948 [<0000000000e3ea42>] sprintf+0xa2/0xb8 [<00000000001198dc>] print_insn+0x374/0x500 [<0000000000119346>] show_code+0x4ee/0x538 [<000000000011f234>] show_registers+0x34c/0x388 [<000000000011f2ae>] show_regs+0x3e/0xa8 [<000000000011f502>] die+0x1ea/0x2e8 [<0000000000138f0e>] do_no_context+0x106/0x168 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0 [<000000000090639e>] sysrq_handle_crash+0x46/0x58 ([<0000000000000007>] 0x7) [<00000000009073fa>] __handle_sysrq+0x102/0x218 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100 [<000000000061d67a>] proc_reg_write+0xb2/0x128 [<0000000000520be6>] __vfs_write+0xee/0x368 [<0000000000521222>] vfs_write+0x21a/0x278 [<000000000052156a>] SyS_write+0xda/0x178 [<0000000000e555cc>] system_call+0xc4/0x270 The buggy address belongs to the page: page:000003d1016929c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000 raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00 >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 ^ 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00 ================================================================== Signed-off-by: Vasily Gorbik <gor(a)linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/s390/kernel/dis.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1997,7 +1997,7 @@ void show_code(struct pt_regs *regs) { char *mode = user_mode(regs) ? "User" : "Krnl"; unsigned char code[64]; - char buffer[64], *ptr; + char buffer[128], *ptr; mm_segment_t old_fs; unsigned long addr; int start, end, opsize, hops, i; @@ -2060,7 +2060,7 @@ void show_code(struct pt_regs *regs) start += opsize; printk(buffer); ptr = buffer; - ptr += sprintf(ptr, "\n "); + ptr += sprintf(ptr, "\n\t "); hops++; } printk("\n"); Patches currently in stable-queue which might be from gor(a)linux.vnet.ibm.com are queue-3.18/s390-disassembler-increase-show_code-buffer-size.patch

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:24 +0000 Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu(a)intel.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Tested-by: Toshi Kani <toshi.kani(a)hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon(a)arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h index caf86be815ba..4052ec39e8db 100644 --- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses { FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses, /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index dcd9fb55e679..b0c505fe9a95 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses, /* diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index cb7aceae3553..572b6c7303ed 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include <acpi/actbl1.h> #include <acpi/ghes.h> #include <acpi/apei.h> +#include <asm/fixmap.h> #include <asm/tlbflush.h> #include <ras/ras_event.h> @@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */ /* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. */ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void) static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); } static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); } -static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); } -static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); } static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:24 +0000 Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu(a)intel.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Tested-by: Toshi Kani <toshi.kani(a)hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon(a)arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h index caf86be815ba..4052ec39e8db 100644 --- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses { FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses, /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index dcd9fb55e679..b0c505fe9a95 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses, /* diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index cb7aceae3553..572b6c7303ed 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include <acpi/actbl1.h> #include <acpi/ghes.h> #include <acpi/apei.h> +#include <asm/fixmap.h> #include <asm/tlbflush.h> #include <ras/ras_event.h> @@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */ /* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. */ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void) static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); } static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); } -static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); } -static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); } static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:24 +0000 Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu(a)intel.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Tested-by: Toshi Kani <toshi.kani(a)hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon(a)arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h index caf86be815ba..4052ec39e8db 100644 --- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses { FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses, /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index dcd9fb55e679..b0c505fe9a95 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses, /* diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index cb7aceae3553..572b6c7303ed 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include <acpi/actbl1.h> #include <acpi/ghes.h> #include <acpi/apei.h> +#include <asm/fixmap.h> #include <asm/tlbflush.h> #include <ras/ras_event.h> @@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */ /* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. */ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void) static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); } static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); } -static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); } -static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); } static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4f89fa286f6729312e227e7c2d764e8e7b9d340e Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:24 +0000 Subject: [PATCH] ACPI / APEI: Replace ioremap_page_range() with fixmap Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu(a)intel.com> Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Tested-by: Toshi Kani <toshi.kani(a)hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon(a)arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo(a)kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h index caf86be815ba..4052ec39e8db 100644 --- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses { FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses, /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index dcd9fb55e679..b0c505fe9a95 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses, /* diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index cb7aceae3553..572b6c7303ed 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include <acpi/actbl1.h> #include <acpi/ghes.h> #include <acpi/apei.h> +#include <asm/fixmap.h> #include <asm/tlbflush.h> #include <ras/ras_event.h> @@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */ /* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. */ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void) static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); } static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); } -static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); } -static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); } static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Remove ghes_ioremap_area" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:25 +0000 Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 572b6c7303ed..f14695e744d0 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so * the fixmap is used instead. - */ - -/* - * Two virtual pages are used, one for IRQ/PROCESS context, the other for - * NMI context (optionally). - */ -#define GHES_IOREMAP_PAGES 2 -#define GHES_IOREMAP_IRQ_PAGE(base) (base) -#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE) - -/* virtual memory area for atomic ioremap */ -static struct vm_struct *ghes_ioremap_area; -/* + * * These 2 spinlocks are used to prevent the fixmap entries from being used * simultaneously. */ @@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced; static int ghes_panic_timeout __read_mostly = 30; -static int ghes_ioremap_init(void) -{ - ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, - VM_IOREMAP, VMALLOC_START, VMALLOC_END); - if (!ghes_ioremap_area) { - pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n"); - return -ENOMEM; - } - - return 0; -} - -static void ghes_ioremap_exit(void) -{ - free_vm_area(ghes_ioremap_area); -} - static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { phys_addr_t paddr; @@ -1247,13 +1218,9 @@ static int __init ghes_init(void) ghes_nmi_init_cxt(); - rc = ghes_ioremap_init(); - if (rc) - goto err; - rc = ghes_estatus_pool_init(); if (rc) - goto err_ioremap_exit; + goto err; rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX); @@ -1277,8 +1244,6 @@ static int __init ghes_init(void) return 0; err_pool_exit: ghes_estatus_pool_exit(); -err_ioremap_exit: - ghes_ioremap_exit(); err: return rc; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Remove ghes_ioremap_area" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:25 +0000 Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 572b6c7303ed..f14695e744d0 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so * the fixmap is used instead. - */ - -/* - * Two virtual pages are used, one for IRQ/PROCESS context, the other for - * NMI context (optionally). - */ -#define GHES_IOREMAP_PAGES 2 -#define GHES_IOREMAP_IRQ_PAGE(base) (base) -#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE) - -/* virtual memory area for atomic ioremap */ -static struct vm_struct *ghes_ioremap_area; -/* + * * These 2 spinlocks are used to prevent the fixmap entries from being used * simultaneously. */ @@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced; static int ghes_panic_timeout __read_mostly = 30; -static int ghes_ioremap_init(void) -{ - ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, - VM_IOREMAP, VMALLOC_START, VMALLOC_END); - if (!ghes_ioremap_area) { - pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n"); - return -ENOMEM; - } - - return 0; -} - -static void ghes_ioremap_exit(void) -{ - free_vm_area(ghes_ioremap_area); -} - static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { phys_addr_t paddr; @@ -1247,13 +1218,9 @@ static int __init ghes_init(void) ghes_nmi_init_cxt(); - rc = ghes_ioremap_init(); - if (rc) - goto err; - rc = ghes_estatus_pool_init(); if (rc) - goto err_ioremap_exit; + goto err; rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX); @@ -1277,8 +1244,6 @@ static int __init ghes_init(void) return 0; err_pool_exit: ghes_estatus_pool_exit(); -err_ioremap_exit: - ghes_ioremap_exit(); err: return rc; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Remove ghes_ioremap_area" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:25 +0000 Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 572b6c7303ed..f14695e744d0 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so * the fixmap is used instead. - */ - -/* - * Two virtual pages are used, one for IRQ/PROCESS context, the other for - * NMI context (optionally). - */ -#define GHES_IOREMAP_PAGES 2 -#define GHES_IOREMAP_IRQ_PAGE(base) (base) -#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE) - -/* virtual memory area for atomic ioremap */ -static struct vm_struct *ghes_ioremap_area; -/* + * * These 2 spinlocks are used to prevent the fixmap entries from being used * simultaneously. */ @@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced; static int ghes_panic_timeout __read_mostly = 30; -static int ghes_ioremap_init(void) -{ - ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, - VM_IOREMAP, VMALLOC_START, VMALLOC_END); - if (!ghes_ioremap_area) { - pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n"); - return -ENOMEM; - } - - return 0; -} - -static void ghes_ioremap_exit(void) -{ - free_vm_area(ghes_ioremap_area); -} - static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { phys_addr_t paddr; @@ -1247,13 +1218,9 @@ static int __init ghes_init(void) ghes_nmi_init_cxt(); - rc = ghes_ioremap_init(); - if (rc) - goto err; - rc = ghes_estatus_pool_init(); if (rc) - goto err_ioremap_exit; + goto err; rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX); @@ -1277,8 +1244,6 @@ static int __init ghes_init(void) return 0; err_pool_exit: ghes_estatus_pool_exit(); -err_ioremap_exit: - ghes_ioremap_exit(); err: return rc; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / APEI: Remove ghes_ioremap_area" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 520e18a5080d2c444a03280d99c8a35cb667d321 Mon Sep 17 00:00:00 2001 From: James Morse <james.morse(a)arm.com> Date: Mon, 6 Nov 2017 18:44:25 +0000 Subject: [PATCH] ACPI / APEI: Remove ghes_ioremap_area Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse(a)arm.com> Reviewed-by: Borislav Petkov <bp(a)suse.de> Tested-by: Tyler Baicar <tbaicar(a)codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Cc: All applicable <stable(a)vger.kernel.org> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 572b6c7303ed..f14695e744d0 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -114,19 +114,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so * the fixmap is used instead. - */ - -/* - * Two virtual pages are used, one for IRQ/PROCESS context, the other for - * NMI context (optionally). - */ -#define GHES_IOREMAP_PAGES 2 -#define GHES_IOREMAP_IRQ_PAGE(base) (base) -#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE) - -/* virtual memory area for atomic ioremap */ -static struct vm_struct *ghes_ioremap_area; -/* + * * These 2 spinlocks are used to prevent the fixmap entries from being used * simultaneously. */ @@ -141,23 +129,6 @@ static atomic_t ghes_estatus_cache_alloced; static int ghes_panic_timeout __read_mostly = 30; -static int ghes_ioremap_init(void) -{ - ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES, - VM_IOREMAP, VMALLOC_START, VMALLOC_END); - if (!ghes_ioremap_area) { - pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n"); - return -ENOMEM; - } - - return 0; -} - -static void ghes_ioremap_exit(void) -{ - free_vm_area(ghes_ioremap_area); -} - static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { phys_addr_t paddr; @@ -1247,13 +1218,9 @@ static int __init ghes_init(void) ghes_nmi_init_cxt(); - rc = ghes_ioremap_init(); - if (rc) - goto err; - rc = ghes_estatus_pool_init(); if (rc) - goto err_ioremap_exit; + goto err; rc = ghes_estatus_pool_expand(GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX); @@ -1277,8 +1244,6 @@ static int __init ghes_init(void) return 0; err_pool_exit: ghes_estatus_pool_exit(); -err_ioremap_exit: - ghes_ioremap_exit(); err: return rc; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue()" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com> Date: Tue, 7 Nov 2017 23:08:10 +0200 Subject: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by by the work via acpi_pm_notify_handler(). This can deadlock. To fix the problem let's split the single lock into two: one to protect the dev->wakeup between the work vs. add/remove, and another one to handle notifier installation vs. removal. After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work() annotation" I was able to kill the machine (Intel Braswell) very easily with 'powertop --auto-tune', runtime suspending i915, and trying to wake it up via the USB keyboard. The cases when it didn't die are presumably explained by lockdep getting disabled by something else (cpu hotplug locking issues usually). Fortunately I still got a lockdep report over netconsole (trickling in very slowly), even though the machine was otherwise practically dead: [ 112.179806] ====================================================== [ 114.670858] WARNING: possible circular locking dependency detected [ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted [ 119.658101] ------------------------------------------------------ [ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. [ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead [ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up [ 121.313485] usb 1-6: USB disconnect, device number 3 [ 121.313501] usb 1-6.2: USB disconnect, device number 4 [ 134.747383] kworker/0:2/47 is trying to acquire lock: [ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80 [ 139.721524] [ 139.721524] but task is already holding lock: [ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 147.184450] [ 147.184450] which lock already depends on the new lock. [ 147.184450] [ 154.604711] [ 154.604711] the existing dependency chain (in reverse order) is: [ 159.447888] [ 159.447888] -> #2 ((&dpc->work)){+.+.}: [ 164.183486] __lock_acquire+0x1255/0x13f0 [ 166.504313] lock_acquire+0xb5/0x210 [ 168.778973] process_one_work+0x1b9/0x720 [ 171.030316] worker_thread+0x4c/0x440 [ 173.257184] kthread+0x154/0x190 [ 175.456143] ret_from_fork+0x27/0x40 [ 177.624348] [ 177.624348] -> #1 ("kacpi_notify"){+.+.}: [ 181.850351] __lock_acquire+0x1255/0x13f0 [ 183.941695] lock_acquire+0xb5/0x210 [ 186.046115] flush_workqueue+0xdd/0x510 [ 190.408153] acpi_os_wait_events_complete+0x31/0x40 [ 192.625303] acpi_remove_notify_handler+0x133/0x188 [ 194.820829] acpi_remove_pm_notifier+0x56/0x90 [ 196.989068] acpi_dev_pm_detach+0x5f/0xa0 [ 199.145866] dev_pm_domain_detach+0x27/0x30 [ 201.285614] i2c_device_probe+0x100/0x210 [ 203.411118] driver_probe_device+0x23e/0x310 [ 205.522425] __driver_attach+0xa3/0xb0 [ 207.634268] bus_for_each_dev+0x69/0xa0 [ 209.714797] driver_attach+0x1e/0x20 [ 211.778258] bus_add_driver+0x1bc/0x230 [ 213.837162] driver_register+0x60/0xe0 [ 215.868162] i2c_register_driver+0x42/0x70 [ 217.869551] 0xffffffffa0172017 [ 219.863009] do_one_initcall+0x45/0x170 [ 221.843863] do_init_module+0x5f/0x204 [ 223.817915] load_module+0x225b/0x29b0 [ 225.757234] SyS_finit_module+0xc6/0xd0 [ 227.661851] do_syscall_64+0x5c/0x120 [ 229.536819] return_from_SYSCALL_64+0x0/0x7a [ 231.392444] [ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}: [ 235.124914] check_prev_add+0x44e/0x8a0 [ 237.024795] __lock_acquire+0x1255/0x13f0 [ 238.937351] lock_acquire+0xb5/0x210 [ 240.840799] __mutex_lock+0x75/0x940 [ 242.709517] mutex_lock_nested+0x1c/0x20 [ 244.551478] acpi_pm_notify_handler+0x2f/0x80 [ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c [ 248.194412] acpi_os_execute_deferred+0x14/0x30 [ 250.003925] process_one_work+0x1ec/0x720 [ 251.803191] worker_thread+0x4c/0x440 [ 253.605307] kthread+0x154/0x190 [ 255.387498] ret_from_fork+0x27/0x40 [ 257.153175] [ 257.153175] other info that might help us debug this: [ 257.153175] [ 262.324392] Chain exists of: [ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work) [ 262.324392] [ 267.391997] Possible unsafe locking scenario: [ 267.391997] [ 270.758262] CPU0 CPU1 [ 272.431713] ---- ---- [ 274.060756] lock((&dpc->work)); [ 275.646532] lock("kacpi_notify"); [ 277.260772] lock((&dpc->work)); [ 278.839146] lock(acpi_pm_notifier_lock); [ 280.391902] [ 280.391902] *** DEADLOCK *** [ 280.391902] [ 284.986385] 2 locks held by kworker/0:2/47: [ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 289.727725] Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications) Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Cc: 3.17+ <stable(a)vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 17e8eb93a76c..69ffd1dc1de7 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, if (!dev && !func) return AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); - adev->wakeup.context.dev = dev; - adev->wakeup.context.func = func; - status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); + adev->wakeup.context.dev = dev; + adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } @@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { acpi_status status = AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; @@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); - adev->wakeup.flags.notifier_present = false; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue()" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com> Date: Tue, 7 Nov 2017 23:08:10 +0200 Subject: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by by the work via acpi_pm_notify_handler(). This can deadlock. To fix the problem let's split the single lock into two: one to protect the dev->wakeup between the work vs. add/remove, and another one to handle notifier installation vs. removal. After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work() annotation" I was able to kill the machine (Intel Braswell) very easily with 'powertop --auto-tune', runtime suspending i915, and trying to wake it up via the USB keyboard. The cases when it didn't die are presumably explained by lockdep getting disabled by something else (cpu hotplug locking issues usually). Fortunately I still got a lockdep report over netconsole (trickling in very slowly), even though the machine was otherwise practically dead: [ 112.179806] ====================================================== [ 114.670858] WARNING: possible circular locking dependency detected [ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted [ 119.658101] ------------------------------------------------------ [ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. [ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead [ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up [ 121.313485] usb 1-6: USB disconnect, device number 3 [ 121.313501] usb 1-6.2: USB disconnect, device number 4 [ 134.747383] kworker/0:2/47 is trying to acquire lock: [ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80 [ 139.721524] [ 139.721524] but task is already holding lock: [ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 147.184450] [ 147.184450] which lock already depends on the new lock. [ 147.184450] [ 154.604711] [ 154.604711] the existing dependency chain (in reverse order) is: [ 159.447888] [ 159.447888] -> #2 ((&dpc->work)){+.+.}: [ 164.183486] __lock_acquire+0x1255/0x13f0 [ 166.504313] lock_acquire+0xb5/0x210 [ 168.778973] process_one_work+0x1b9/0x720 [ 171.030316] worker_thread+0x4c/0x440 [ 173.257184] kthread+0x154/0x190 [ 175.456143] ret_from_fork+0x27/0x40 [ 177.624348] [ 177.624348] -> #1 ("kacpi_notify"){+.+.}: [ 181.850351] __lock_acquire+0x1255/0x13f0 [ 183.941695] lock_acquire+0xb5/0x210 [ 186.046115] flush_workqueue+0xdd/0x510 [ 190.408153] acpi_os_wait_events_complete+0x31/0x40 [ 192.625303] acpi_remove_notify_handler+0x133/0x188 [ 194.820829] acpi_remove_pm_notifier+0x56/0x90 [ 196.989068] acpi_dev_pm_detach+0x5f/0xa0 [ 199.145866] dev_pm_domain_detach+0x27/0x30 [ 201.285614] i2c_device_probe+0x100/0x210 [ 203.411118] driver_probe_device+0x23e/0x310 [ 205.522425] __driver_attach+0xa3/0xb0 [ 207.634268] bus_for_each_dev+0x69/0xa0 [ 209.714797] driver_attach+0x1e/0x20 [ 211.778258] bus_add_driver+0x1bc/0x230 [ 213.837162] driver_register+0x60/0xe0 [ 215.868162] i2c_register_driver+0x42/0x70 [ 217.869551] 0xffffffffa0172017 [ 219.863009] do_one_initcall+0x45/0x170 [ 221.843863] do_init_module+0x5f/0x204 [ 223.817915] load_module+0x225b/0x29b0 [ 225.757234] SyS_finit_module+0xc6/0xd0 [ 227.661851] do_syscall_64+0x5c/0x120 [ 229.536819] return_from_SYSCALL_64+0x0/0x7a [ 231.392444] [ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}: [ 235.124914] check_prev_add+0x44e/0x8a0 [ 237.024795] __lock_acquire+0x1255/0x13f0 [ 238.937351] lock_acquire+0xb5/0x210 [ 240.840799] __mutex_lock+0x75/0x940 [ 242.709517] mutex_lock_nested+0x1c/0x20 [ 244.551478] acpi_pm_notify_handler+0x2f/0x80 [ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c [ 248.194412] acpi_os_execute_deferred+0x14/0x30 [ 250.003925] process_one_work+0x1ec/0x720 [ 251.803191] worker_thread+0x4c/0x440 [ 253.605307] kthread+0x154/0x190 [ 255.387498] ret_from_fork+0x27/0x40 [ 257.153175] [ 257.153175] other info that might help us debug this: [ 257.153175] [ 262.324392] Chain exists of: [ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work) [ 262.324392] [ 267.391997] Possible unsafe locking scenario: [ 267.391997] [ 270.758262] CPU0 CPU1 [ 272.431713] ---- ---- [ 274.060756] lock((&dpc->work)); [ 275.646532] lock("kacpi_notify"); [ 277.260772] lock((&dpc->work)); [ 278.839146] lock(acpi_pm_notifier_lock); [ 280.391902] [ 280.391902] *** DEADLOCK *** [ 280.391902] [ 284.986385] 2 locks held by kworker/0:2/47: [ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 289.727725] Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications) Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Cc: 3.17+ <stable(a)vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 17e8eb93a76c..69ffd1dc1de7 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, if (!dev && !func) return AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); - adev->wakeup.context.dev = dev; - adev->wakeup.context.func = func; - status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); + adev->wakeup.context.dev = dev; + adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } @@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { acpi_status status = AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; @@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); - adev->wakeup.flags.notifier_present = false; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue()" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ff1656790b3a4caca94505c52fd0250f981ea187 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com> Date: Tue, 7 Nov 2017 23:08:10 +0200 Subject: [PATCH] ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit acpi_remove_pm_notifier() ends up calling flush_workqueue() while holding acpi_pm_notifier_lock, and that same lock is taken by by the work via acpi_pm_notify_handler(). This can deadlock. To fix the problem let's split the single lock into two: one to protect the dev->wakeup between the work vs. add/remove, and another one to handle notifier installation vs. removal. After commit a1d14934ea4b "workqueue/lockdep: 'Fix' flush_work() annotation" I was able to kill the machine (Intel Braswell) very easily with 'powertop --auto-tune', runtime suspending i915, and trying to wake it up via the USB keyboard. The cases when it didn't die are presumably explained by lockdep getting disabled by something else (cpu hotplug locking issues usually). Fortunately I still got a lockdep report over netconsole (trickling in very slowly), even though the machine was otherwise practically dead: [ 112.179806] ====================================================== [ 114.670858] WARNING: possible circular locking dependency detected [ 117.155663] 4.13.0-rc6-bsw-bisect-00169-ga1d14934ea4b #119 Not tainted [ 119.658101] ------------------------------------------------------ [ 121.310242] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command. [ 121.313294] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead [ 121.313346] xhci_hcd 0000:00:14.0: HC died; cleaning up [ 121.313485] usb 1-6: USB disconnect, device number 3 [ 121.313501] usb 1-6.2: USB disconnect, device number 4 [ 134.747383] kworker/0:2/47 is trying to acquire lock: [ 137.220790] (acpi_pm_notifier_lock){+.+.}, at: [<ffffffff813cafdf>] acpi_pm_notify_handler+0x2f/0x80 [ 139.721524] [ 139.721524] but task is already holding lock: [ 144.672922] ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 147.184450] [ 147.184450] which lock already depends on the new lock. [ 147.184450] [ 154.604711] [ 154.604711] the existing dependency chain (in reverse order) is: [ 159.447888] [ 159.447888] -> #2 ((&dpc->work)){+.+.}: [ 164.183486] __lock_acquire+0x1255/0x13f0 [ 166.504313] lock_acquire+0xb5/0x210 [ 168.778973] process_one_work+0x1b9/0x720 [ 171.030316] worker_thread+0x4c/0x440 [ 173.257184] kthread+0x154/0x190 [ 175.456143] ret_from_fork+0x27/0x40 [ 177.624348] [ 177.624348] -> #1 ("kacpi_notify"){+.+.}: [ 181.850351] __lock_acquire+0x1255/0x13f0 [ 183.941695] lock_acquire+0xb5/0x210 [ 186.046115] flush_workqueue+0xdd/0x510 [ 190.408153] acpi_os_wait_events_complete+0x31/0x40 [ 192.625303] acpi_remove_notify_handler+0x133/0x188 [ 194.820829] acpi_remove_pm_notifier+0x56/0x90 [ 196.989068] acpi_dev_pm_detach+0x5f/0xa0 [ 199.145866] dev_pm_domain_detach+0x27/0x30 [ 201.285614] i2c_device_probe+0x100/0x210 [ 203.411118] driver_probe_device+0x23e/0x310 [ 205.522425] __driver_attach+0xa3/0xb0 [ 207.634268] bus_for_each_dev+0x69/0xa0 [ 209.714797] driver_attach+0x1e/0x20 [ 211.778258] bus_add_driver+0x1bc/0x230 [ 213.837162] driver_register+0x60/0xe0 [ 215.868162] i2c_register_driver+0x42/0x70 [ 217.869551] 0xffffffffa0172017 [ 219.863009] do_one_initcall+0x45/0x170 [ 221.843863] do_init_module+0x5f/0x204 [ 223.817915] load_module+0x225b/0x29b0 [ 225.757234] SyS_finit_module+0xc6/0xd0 [ 227.661851] do_syscall_64+0x5c/0x120 [ 229.536819] return_from_SYSCALL_64+0x0/0x7a [ 231.392444] [ 231.392444] -> #0 (acpi_pm_notifier_lock){+.+.}: [ 235.124914] check_prev_add+0x44e/0x8a0 [ 237.024795] __lock_acquire+0x1255/0x13f0 [ 238.937351] lock_acquire+0xb5/0x210 [ 240.840799] __mutex_lock+0x75/0x940 [ 242.709517] mutex_lock_nested+0x1c/0x20 [ 244.551478] acpi_pm_notify_handler+0x2f/0x80 [ 246.382052] acpi_ev_notify_dispatch+0x44/0x5c [ 248.194412] acpi_os_execute_deferred+0x14/0x30 [ 250.003925] process_one_work+0x1ec/0x720 [ 251.803191] worker_thread+0x4c/0x440 [ 253.605307] kthread+0x154/0x190 [ 255.387498] ret_from_fork+0x27/0x40 [ 257.153175] [ 257.153175] other info that might help us debug this: [ 257.153175] [ 262.324392] Chain exists of: [ 262.324392] acpi_pm_notifier_lock --> "kacpi_notify" --> (&dpc->work) [ 262.324392] [ 267.391997] Possible unsafe locking scenario: [ 267.391997] [ 270.758262] CPU0 CPU1 [ 272.431713] ---- ---- [ 274.060756] lock((&dpc->work)); [ 275.646532] lock("kacpi_notify"); [ 277.260772] lock((&dpc->work)); [ 278.839146] lock(acpi_pm_notifier_lock); [ 280.391902] [ 280.391902] *** DEADLOCK *** [ 280.391902] [ 284.986385] 2 locks held by kworker/0:2/47: [ 286.524895] #0: ("kacpi_notify"){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 288.112927] #1: ((&dpc->work)){+.+.}, at: [<ffffffff8109ce90>] process_one_work+0x160/0x720 [ 289.727725] Fixes: c072530f391e (ACPI / PM: Revork the handling of ACPI device wakeup notifications) Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> Cc: 3.17+ <stable(a)vger.kernel.org> # 3.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 17e8eb93a76c..69ffd1dc1de7 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -387,6 +387,7 @@ EXPORT_SYMBOL(acpi_bus_power_manageable); #ifdef CONFIG_PM static DEFINE_MUTEX(acpi_pm_notifier_lock); +static DEFINE_MUTEX(acpi_pm_notifier_install_lock); void acpi_pm_wakeup_event(struct device *dev) { @@ -443,24 +444,25 @@ acpi_status acpi_add_pm_notifier(struct acpi_device *adev, struct device *dev, if (!dev && !func) return AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (adev->wakeup.flags.notifier_present) goto out; - adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); - adev->wakeup.context.dev = dev; - adev->wakeup.context.func = func; - status = acpi_install_notify_handler(adev->handle, ACPI_SYSTEM_NOTIFY, acpi_pm_notify_handler, NULL); if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); + adev->wakeup.ws = wakeup_source_register(dev_name(&adev->dev)); + adev->wakeup.context.dev = dev; + adev->wakeup.context.func = func; adev->wakeup.flags.notifier_present = true; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; } @@ -472,7 +474,7 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) { acpi_status status = AE_BAD_PARAMETER; - mutex_lock(&acpi_pm_notifier_lock); + mutex_lock(&acpi_pm_notifier_install_lock); if (!adev->wakeup.flags.notifier_present) goto out; @@ -483,14 +485,15 @@ acpi_status acpi_remove_pm_notifier(struct acpi_device *adev) if (ACPI_FAILURE(status)) goto out; + mutex_lock(&acpi_pm_notifier_lock); adev->wakeup.context.func = NULL; adev->wakeup.context.dev = NULL; wakeup_source_unregister(adev->wakeup.ws); - adev->wakeup.flags.notifier_present = false; + mutex_unlock(&acpi_pm_notifier_lock); out: - mutex_unlock(&acpi_pm_notifier_lock); + mutex_unlock(&acpi_pm_notifier_install_lock); return status; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] s390/disassembler: add missing end marker for e7 table" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5c50538752af7968f53924b22dede8ed4ce4cb3b Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Tue, 26 Sep 2017 09:16:48 +0200 Subject: [PATCH] s390/disassembler: add missing end marker for e7 table The e7 opcode table does not have an end marker. Hence when trying to find an unknown e7 instruction the code will access memory behind the table until it finds something that matches the opcode, or the kernel crashes, whatever comes first. This affects not only the in-kernel disassembler but also uprobes and kprobes which refuse to set a probe on unknown instructions, and therefore search the opcode tables to figure out if instructions are known or not. Cc: <stable(a)vger.kernel.org> # v3.18+ Fixes: 3585cb0280654 ("s390/disassembler: add vector instructions") Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> diff --git a/arch/s390/kernel/dis.c b/arch/s390/kernel/dis.c index f7e82302a71e..d9970c15f79d 100644 --- a/arch/s390/kernel/dis.c +++ b/arch/s390/kernel/dis.c @@ -1548,6 +1548,7 @@ static struct s390_insn opcode_e7[] = { { "vfsq", 0xce, INSTR_VRR_VV000MM }, { "vfs", 0xe2, INSTR_VRR_VVV00MM }, { "vftci", 0x4a, INSTR_VRI_VVIMM }, + { "", 0, INSTR_INVALID } }; static struct s390_insn opcode_eb[] = {

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] s390/runtime instrumention: fix possible memory corruption" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From d6e646ad7cfa7034d280459b2b2546288f247144 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: [PATCH] s390/runtime instrumention: fix possible memory corruption For PREEMPT enabled kernels the runtime instrumentation (RI) code contains a possible use-after-free bug. If a task that makes use of RI exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_runtime_instr() gets preempted after the RI control block of the task has been freed but before the pointer to it is set to NULL, then save_ri_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: e4b8b3f33fca ("s390: add support for runtime instrumentation") Cc: <stable(a)vger.kernel.org> # v3.7+ Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky(a)de.ibm.com> diff --git a/arch/s390/kernel/runtime_instr.c b/arch/s390/kernel/runtime_instr.c index 429d3a782f1c..b9738ae2e1de 100644 --- a/arch/s390/kernel/runtime_instr.c +++ b/arch/s390/kernel/runtime_instr.c @@ -49,11 +49,13 @@ void exit_thread_runtime_instr(void) { struct task_struct *task = current; + preempt_disable(); if (!task->thread.ri_cb) return; disable_runtime_instr(); kfree(task->thread.ri_cb); task->thread.ri_cb = NULL; + preempt_enable(); } SYSCALL_DEFINE1(s390_runtime_instr, int, command) @@ -64,9 +66,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, command) return -EOPNOTSUPP; if (command == S390_RUNTIME_INSTR_STOP) { - preempt_disable(); exit_thread_runtime_instr(); - preempt_enable(); return 0; }

8 years

1
0
0 0

[Linux-stable-mirror] FAILED: patch "[PATCH] s390: fix transactional execution control register handling" failed to apply to 3.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 3.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a1c5befc1c24eb9c1ee83f711e0f21ee79cbb556 Mon Sep 17 00:00:00 2001 From: Heiko Carstens <heiko.carstens(a)de.ibm.com> Date: Thu, 9 Nov 2017 12:29:34 +0100 Subject: [PATCH] s390: fix transactional execution control register handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Dan Horák reported the following crash related to transactional execution: User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000] CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1 Hardware name: IBM 2827 H43 400 (z/VM 6.4.0) task: 00000000fafc8000 task.stack: 00000000fafc4000 User PSW : 0705200180000000 000003ff93c14e70 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3 User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e 0000000000000000 0000000000000002 0000000000000000 000003ff00000000 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0 User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11) 000003ff93c14e66: 60f0b038 std %f15,56(%r11) #000003ff93c14e6a: e5600000ff0e tbegin 0,65294 >000003ff93c14e70: a7740006 brc 7,3ff93c14e7c 000003ff93c14e74: a7080000 lhi %r0,0 000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe 000003ff93c14e7c: b2220000 ipm %r0 000003ff93c14e80: 8800001c srl %r0,28 There are several bugs with control register handling with respect to transactional execution: - on task switch update_per_regs() is only called if the next task has an mm (is not a kernel thread). This however is incorrect. This breaks e.g. for user mode helper handling, where the kernel creates a kernel thread and then execve's a user space program. Control register contents related to transactional execution won't be updated on execve. If the previous task ran with transactional execution disabled then the new task will also run with transactional execution disabled, which is incorrect. Therefore call update_per_regs() unconditionally within switch_to(). - on startup the transactional execution facility is not enabled for the idle thread. This is not really a bug, but an inconsistency to other facilities. Therefore enable the facility if it is available. - on fork the new thread's per_flags field is not cleared. This means that a child process inherits the PER_FLAG_NO_TE flag. This flag can be set with a ptrace request to disable transactional execution for the current process. It should not be inherited by new child processes in order to be consistent with the handling of all other PER related debugging options. Therefore clear the per_flags field in copy_thread_tls(). Reported-and-tested-by: Dan Horák <dan(a)danny.cz> Fixes: d35339a42dd1 ("s390: add support for transactional memory") Cc: <stable(a)vger.kernel.org> # v3.7+ Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger(a)de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner(a)linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens(a)de.ibm.com> diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h index f6c2b5814ab0..8e6b07609ff4 100644 --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -36,8 +36,8 @@ static inline void restore_access_regs(unsigned int *acrs) save_ri_cb(prev->thread.ri_cb); \ save_gs_cb(prev->thread.gs_cb); \ } \ + update_cr_regs(next); \ if (next->mm) { \ - update_cr_regs(next); \ set_cpu_flag(CIF_FPU); \ restore_access_regs(&next->thread.acrs[0]); \ restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ diff --git a/arch/s390/kernel/early.c b/arch/s390/kernel/early.c index 389e9f61c76f..5096875f7822 100644 --- a/arch/s390/kernel/early.c +++ b/arch/s390/kernel/early.c @@ -238,8 +238,10 @@ static __init void detect_machine_facilities(void) S390_lowcore.machine_flags |= MACHINE_FLAG_IDTE; if (test_facility(40)) S390_lowcore.machine_flags |= MACHINE_FLAG_LPP; - if (test_facility(50) && test_facility(73)) + if (test_facility(50) && test_facility(73)) { S390_lowcore.machine_flags |= MACHINE_FLAG_TE; + __ctl_set_bit(0, 55); + } if (test_facility(51)) S390_lowcore.machine_flags |= MACHINE_FLAG_TLB_LC; if (test_facility(129)) { diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c index 080c851dd9a5..cee658e27732 100644 --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -86,6 +86,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long new_stackp, memset(&p->thread.per_user, 0, sizeof(p->thread.per_user)); memset(&p->thread.per_event, 0, sizeof(p->thread.per_event)); clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + p->thread.per_flags = 0; /* Initialize per thread user and system timer values */ p->thread.user_timer = 0; p->thread.guest_timer = 0;

8 years

1
0
0 0

[Linux-stable-mirror] v4.4.102 build: 0 failures 0 warnings (v4.4.102)

by Build bot for Mark Brown

Tree/Branch: v4.4.102 Git describe: v4.4.102 Commit: 29ffb9c1fb Linux 4.4.102 Build Time: 87 min 9 sec Passed: 9 / 9 (100.00 %) Failed: 0 / 9 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

[Linux-stable-mirror] v4.14.2 build: 0 failures 0 warnings (v4.14.2)

by Build bot for Mark Brown

Tree/Branch: v4.14.2 Git describe: v4.14.2 Commit: f9f0b03ded Linux 4.14.2 Build Time: 119 min 13 sec Passed: 10 / 10 (100.00 %) Failed: 0 / 10 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

[Linux-stable-mirror] v4.13.16 build: 0 failures 0 warnings (v4.13.16)

by Build bot for Mark Brown

Tree/Branch: v4.13.16 Git describe: v4.13.16 Commit: e87c13993f Linux 4.13.16 Build Time: 112 min 15 sec Passed: 10 / 10 (100.00 %) Failed: 0 / 10 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH] crypto: skcipher - Fix skcipher_walk_aead_common

by Ondrej Mosnacek

The skcipher_walk_aead_common function calls scatterwalk_copychunks on the input and output walks to skip the associated data. If the AD end at an SG list entry boundary, then after these calls the walks will still be pointing to the end of the skipped region. These offsets are later checked for alignment in skcipher_walk_next, so the skcipher_walk may detect the alignment incorrectly. This patch fixes it by calling scatterwalk_done after the copychunks calls to ensure that the offsets refer to the right SG list entry. Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Cc: <stable(a)vger.kernel.org> Signed-off-by: Ondrej Mosnacek <omosnacek(a)gmail.com> --- crypto/skcipher.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 4faa0fd53b0c..6c45ed536664 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -517,6 +517,9 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk, scatterwalk_copychunks(NULL, &walk->in, req->assoclen, 2); scatterwalk_copychunks(NULL, &walk->out, req->assoclen, 2); + scatterwalk_done(&walk->in, 0, walk->total); + scatterwalk_done(&walk->out, 0, walk->total); + walk->iv = req->iv; walk->oiv = req->iv; -- 2.14.1

8 years

3
3
0 0

[Linux-stable-mirror] [PATCH 3/4] bcache: recover data from backing when data is clean

by Michael Lyle

From: Rui Hua <huarui.dev(a)gmail.com> When we send a read request and hit the clean data in cache device, there is a situation called cache read race in bcache(see the commit in the tail of cache_look_up(), the following explaination just copy from there): The bucket we're reading from might be reused while our bio is in flight, and we could then end up reading the wrong data. We guard against this by checking (in bch_cache_read_endio()) if the pointer is stale again; if so, we treat it as an error (s->iop.error = -EINTR) and reread from the backing device (but we don't pass that error up anywhere) It should be noted that cache read race happened under normal circumstances, not the circumstance when SSD failed, it was counted and shown in /sys/fs/bcache/XXX/internal/cache_read_races. Without this patch, when we use writeback mode, we will never reread from the backing device when cache read race happened, until the whole cache device is clean, because the condition (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR) will be passed up, at last, user will receive -EINTR when it's bio end, this is not suitable, and wield to up-application. In this patch, we use s->read_dirty_data to judge whether the read request hit dirty data in cache device, it is safe to reread data from the backing device when the read request hit clean data. This can not only handle cache read race, but also recover data when failed read request from cache device. [edited by mlyle to fix up whitespace, commit log title, comment spelling] Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean") Cc: <stable(a)vger.kernel.org> # 4.14 Signed-off-by: Hua Rui <huarui.dev(a)gmail.com> Reviewed-by: Michael Lyle <mlyle(a)lyle.org> Reviewed-by: Coly Li <colyli(a)suse.de> Signed-off-by: Michael Lyle <mlyle(a)lyle.org> --- drivers/md/bcache/request.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 3a7aed7282b2..643c3021624f 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -708,16 +708,15 @@ static void cached_dev_read_error(struct closure *cl) { struct search *s = container_of(cl, struct search, cl); struct bio *bio = &s->bio.bio; - struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); /* - * If cache device is dirty (dc->has_dirty is non-zero), then - * recovery a failed read request from cached device may get a - * stale data back. So read failure recovery is only permitted - * when cache device is clean. + * If read request hit dirty data (s->read_dirty_data is true), + * then recovery a failed read request from cached device may + * get a stale data back. So read failure recovery is only + * permitted when read request hit clean data in cache device, + * or when cache read race happened. */ - if (s->recoverable && - (dc && !atomic_read(&dc->has_dirty))) { + if (s->recoverable && !s->read_dirty_data) { /* Retry from the backing device: */ trace_bcache_read_retry(s->orig_bio); -- 2.14.1

8 years

1
0
0 0

[Linux-stable-mirror] v3.18.84 build: 0 failures 1 warnings (v3.18.84)

by Build bot for Mark Brown

Tree/Branch: v3.18.84 Git describe: v3.18.84 Commit: 7166ceea0a Linux 3.18.84 Build Time: 67 min 6 sec Passed: 9 / 9 (100.00 %) Failed: 0 / 9 ( 0.00 %) Errors: 0 Warnings: 1 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): 2 warnings 0 mismatches : x86_64-defconfig ------------------------------------------------------------------------------- Warnings Summary: 1 2 ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- x86_64-defconfig : PASS, 0 errors, 2 warnings, 0 section mismatches Warnings: ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH 1/3] drm/i915: Pass the correct msgs to gmbus_is_index_read()

by Ville Syrjala

From: Ville Syrjälä <ville.syrjala(a)linux.intel.com> We're supposed to examine msgs[i] and msgs[i+1] to see if they form a pair suitable for an indexed transfer. But in reality we're examining msgs[0] and msgs[1]. Fix this. Cc: stable(a)vger.kernel.org Cc: Daniel Kurtz <djkurtz(a)chromium.org> Cc: Chris Wilson <chris(a)chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch> Cc: Sean Paul <seanpaul(a)chromium.org> Fixes: 56f9eac05489 ("drm/i915/intel_i2c: use INDEX cycles for i2c read transactions") Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> --- drivers/gpu/drm/i915/intel_i2c.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index eb5827110d8f..165375cbef2f 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -484,7 +484,7 @@ do_gmbus_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs, int num) for (; i < num; i += inc) { inc = 1; - if (gmbus_is_index_read(msgs, i, num)) { + if (gmbus_is_index_read(&msgs[i], i, num)) { ret = gmbus_xfer_index_read(dev_priv, &msgs[i]); inc = 2; /* an index read is two msgs */ } else if (msgs[i].flags & I2C_M_RD) { -- 2.13.6

8 years

3
7
0 0

[Linux-stable-mirror] v4.9.65 build: 0 failures 0 warnings (v4.9.65)

by Build bot for Mark Brown

Tree/Branch: v4.9.65 Git describe: v4.9.65 Commit: 133e6ccf46 Linux 4.9.65 Build Time: 101 min 1 sec Passed: 10 / 10 (100.00 %) Failed: 0 / 10 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

[Linux-stable-mirror] v4.4.101 build: 0 failures 0 warnings (v4.4.101)

by Build bot for Mark Brown

Tree/Branch: v4.4.101 Git describe: v4.4.101 Commit: 5baf0fb260 Linux 4.4.101 Build Time: 86 min 54 sec Passed: 9 / 9 (100.00 %) Failed: 0 / 9 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

8 years

1
0
0 0

Re: [Linux-stable-mirror] Linux 4.4.102

by Greg KH

diff --git a/Makefile b/Makefile index 0d7b050427ed..9e036fac9c04 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 101 +SUBLEVEL = 102 EXTRAVERSION = NAME = Blurry Fish Butt diff --git a/mm/debug-pagealloc.c b/mm/debug-pagealloc.c index fe1c61f7cf26..3b8f1b83610e 100644 --- a/mm/debug-pagealloc.c +++ b/mm/debug-pagealloc.c @@ -34,7 +34,7 @@ static inline void set_page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return; __set_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -44,7 +44,7 @@ static inline void clear_page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return; __clear_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -54,7 +54,7 @@ static inline bool page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return false; return test_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); }

8 years

1
0
0 0

[Linux-stable-mirror] Patch "mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites" to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: mm-hwpoison-fixup-mm-check-the-return-value-of-lookup_page_ext-for-all-call-sites.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From foo@baz Fri Nov 24 11:13:07 CET 2017 Date: Fri, 24 Nov 2017 11:13:07 +0100 To: Greg KH <gregkh(a)linuxfoundation.org> From: Michal Hocko <mhocko(a)suse.com> Subject: mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites" From: Michal Hocko <mhocko(a)suse.com> Backport of the upstream commit f86e4271978b ("mm: check the return value of lookup_page_ext for all call sites") is wrong for hwpoison pages. I have accidentally negated the condition for bailout. This basically disables hwpoison pages tracking while the code still might crash on unusual configurations when struct pages do not have page_ext allocated. The fix is trivial to invert the condition. Reported-by: Jiri Slaby <jslaby(a)suse.cz> Signed-off-by: Michal Hocko <mhocko(a)suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- mm/debug-pagealloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/mm/debug-pagealloc.c +++ b/mm/debug-pagealloc.c @@ -34,7 +34,7 @@ static inline void set_page_poison(struc struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return; __set_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -44,7 +44,7 @@ static inline void clear_page_poison(str struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return; __clear_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -54,7 +54,7 @@ static inline bool page_poison(struct pa struct page_ext *page_ext; page_ext = lookup_page_ext(page); - if (page_ext) + if (!page_ext) return false; return test_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } Patches currently in stable-queue which might be from gregkh(a)linuxfoundation.org are queue-4.4/mm-hwpoison-fixup-mm-check-the-return-value-of-lookup_page_ext-for-all-call-sites.patch

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH v3] cxl: Check if vphb exists before iterating over AFU devices

by Vaibhav Jain

During an eeh a kernel-oops is reported if no vPHB is allocated to the AFU. This happens as during AFU init, an error in creation of vPHB is a non-fatal error. Hence afu->phb should always be checked for NULL before iterating over it for the virtual AFU pci devices. This patch fixes the kenel-oops by adding a NULL pointer check for afu->phb before it is dereferenced. Fixes: 9e8df8a2196("cxl: EEH support") Cc: stable(a)vger.kernel.org Signed-off-by: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com> --- Changelog: v3 -> Fixed a reverse NULL check [Fred] Resend -> Added the 'Fixes' info and marking the patch to stable tree [Mpe] v2 -> Added the vphb NULL check to cxl_vphb_error_detected() [Andrew] --- drivers/misc/cxl/pci.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index bb7fd3f4edab..19969ee86d6f 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -2083,6 +2083,9 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, /* There should only be one entry, but go through the list * anyway */ + if (afu->phb == NULL) + return result; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (!afu_dev->driver) continue; @@ -2124,8 +2127,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, * Tell the AFU drivers; but we don't care what they * say, we're going away. */ - if (afu->phb != NULL) - cxl_vphb_error_detected(afu, state); + cxl_vphb_error_detected(afu, state); } return PCI_ERS_RESULT_DISCONNECT; } @@ -2265,6 +2267,9 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) if (cxl_afu_select_best_mode(afu)) goto err; + if (afu->phb == NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { /* Reset the device context. * TODO: make this less disruptive @@ -2327,6 +2332,9 @@ static void cxl_pci_resume(struct pci_dev *pdev) for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; + if (afu->phb == NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (afu_dev->driver && afu_dev->driver->err_handler && afu_dev->driver->err_handler->resume) -- 2.14.3

8 years

4
3
0 0

[Linux-stable-mirror] [PATCH] alpha: Fix mixed up args in EXC macro in futex operations

by Michael Cree

Fix the typo (mixed up arguments) in the EXC macro in the futex definitions introduced by commit ca282f697381 (alpha: add a helper for emitting exception table entries). Signed-off-by: Michael Cree <mcree(a)orcon.net.nz> --- arch/alpha/include/asm/futex.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/alpha/include/asm/futex.h b/arch/alpha/include/asm/futex.h index d2e4da93e68c..ca3322536f72 100644 --- a/arch/alpha/include/asm/futex.h +++ b/arch/alpha/include/asm/futex.h @@ -20,8 +20,8 @@ "3: .subsection 2\n" \ "4: br 1b\n" \ " .previous\n" \ - EXC(1b,3b,%1,$31) \ - EXC(2b,3b,%1,$31) \ + EXC(1b,3b,$31,%1) \ + EXC(2b,3b,$31,%1) \ : "=&r" (oldval), "=&r"(ret) \ : "r" (uaddr), "r"(oparg) \ : "memory") @@ -82,8 +82,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, "3: .subsection 2\n" "4: br 1b\n" " .previous\n" - EXC(1b,3b,%0,$31) - EXC(2b,3b,%0,$31) + EXC(1b,3b,$31,%0) + EXC(2b,3b,$31,%0) : "+r"(ret), "=&r"(prev), "=&r"(cmp) : "r"(uaddr), "r"((long)(int)oldval), "r"(newval) : "memory"); -- 2.11.0

8 years

2
1
0 0

Re: [Linux-stable-mirror] Linux 4.14.2

by Greg KH

diff --git a/Makefile b/Makefile index 01f9df1af256..75d89dc2b94a 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 1 +SUBLEVEL = 2 EXTRAVERSION = NAME = Petit Gorille diff --git a/block/bio.c b/block/bio.c index 101c2a9b5481..33fa6b4af312 100644 --- a/block/bio.c +++ b/block/bio.c @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) * so we don't set nor calculate new physical/hw segment counts here */ bio->bi_disk = bio_src->bi_disk; + bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint; diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 810b138f5897..c82d9fd2f05a 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4030,7 +4030,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struct ipmi_recv_msg *recv_msg, } static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, - struct list_head *timeouts, long timeout_period, + struct list_head *timeouts, + unsigned long timeout_period, int slot, unsigned long *flags, unsigned int *waiting_msgs) { @@ -4043,8 +4044,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, if (!ent->inuse) return; - ent->timeout -= timeout_period; - if (ent->timeout > 0) { + if (timeout_period < ent->timeout) { + ent->timeout -= timeout_period; (*waiting_msgs)++; return; } @@ -4110,7 +4111,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, } } -static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period) +static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, + unsigned long timeout_period) { struct list_head timeouts; struct ipmi_recv_msg *msg, *msg2; diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index 36f47e8d06a3..bc3984ffe867 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3424,7 +3424,7 @@ static inline void wait_for_timer_and_thread(struct smi_info *smi_info) del_timer_sync(&smi_info->si_timer); } -static int is_new_interface(struct smi_info *info) +static struct smi_info *find_dup_si(struct smi_info *info) { struct smi_info *e; @@ -3439,24 +3439,36 @@ static int is_new_interface(struct smi_info *info) */ if (info->slave_addr && !e->slave_addr) e->slave_addr = info->slave_addr; - return 0; + return e; } } - return 1; + return NULL; } static int add_smi(struct smi_info *new_smi) { int rv = 0; + struct smi_info *dup; mutex_lock(&smi_infos_lock); - if (!is_new_interface(new_smi)) { - pr_info(PFX "%s-specified %s state machine: duplicate\n", - ipmi_addr_src_to_str(new_smi->addr_source), - si_to_str[new_smi->si_type]); - rv = -EBUSY; - goto out_err; + dup = find_dup_si(new_smi); + if (dup) { + if (new_smi->addr_source == SI_ACPI && + dup->addr_source == SI_SMBIOS) { + /* We prefer ACPI over SMBIOS. */ + dev_info(dup->dev, + "Removing SMBIOS-specified %s state machine in favor of ACPI\n", + si_to_str[new_smi->si_type]); + cleanup_one_si(dup); + } else { + dev_info(new_smi->dev, + "%s-specified %s state machine: duplicate\n", + ipmi_addr_src_to_str(new_smi->addr_source), + si_to_str[new_smi->si_type]); + rv = -EBUSY; + goto out_err; + } } pr_info(PFX "Adding %s-specified %s state machine\n", @@ -3865,7 +3877,8 @@ static void cleanup_one_si(struct smi_info *to_clean) poll(to_clean); schedule_timeout_uninterruptible(1); } - disable_si_irq(to_clean, false); + if (to_clean->handlers) + disable_si_irq(to_clean, false); while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) { poll(to_clean); schedule_timeout_uninterruptible(1); diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 610638a80383..461bf0b8a094 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -110,6 +110,12 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, return -EFAULT; } + if (in_size < 6 || + in_size < be32_to_cpu(*((__be32 *) (priv->data_buffer + 2)))) { + mutex_unlock(&priv->buffer_mutex); + return -EINVAL; + } + /* atomic tpm command send and result receive. We only hold the ops * lock during this period so that the tpm can be unregistered even if * the char dev is held open. diff --git a/drivers/net/ethernet/fealnx.c b/drivers/net/ethernet/fealnx.c index e92859dab7ae..e191c4ebeaf4 100644 --- a/drivers/net/ethernet/fealnx.c +++ b/drivers/net/ethernet/fealnx.c @@ -257,8 +257,8 @@ enum rx_desc_status_bits { RXFSD = 0x00000800, /* first descriptor */ RXLSD = 0x00000400, /* last descriptor */ ErrorSummary = 0x80, /* error summary */ - RUNT = 0x40, /* runt packet received */ - LONG = 0x20, /* long packet received */ + RUNTPKT = 0x40, /* runt packet received */ + LONGPKT = 0x20, /* long packet received */ FAE = 0x10, /* frame align error */ CRC = 0x08, /* crc error */ RXER = 0x04, /* receive error */ @@ -1632,7 +1632,7 @@ static int netdev_rx(struct net_device *dev) dev->name, rx_status); dev->stats.rx_errors++; /* end of a packet. */ - if (rx_status & (LONG | RUNT)) + if (rx_status & (LONGPKT | RUNTPKT)) dev->stats.rx_length_errors++; if (rx_status & RXER) dev->stats.rx_frame_errors++; diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 47cab1bde065..9e1b74590682 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -771,7 +771,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_ int err; u8 iface_no; struct usb_cdc_parsed_header hdr; - u16 curr_ntb_format; + __le16 curr_ntb_format; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) @@ -889,7 +889,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_ goto error2; } - if (curr_ntb_format == USB_CDC_NCM_NTB32_FORMAT) { + if (curr_ntb_format == cpu_to_le16(USB_CDC_NCM_NTB32_FORMAT)) { dev_info(&intf->dev, "resetting NTB format to 16-bit"); err = usbnet_write_cmd(dev, USB_CDC_SET_NTB_FORMAT, USB_TYPE_CLASS | USB_DIR_OUT diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index d7c49cf1d5e9..a2f4e52fadb5 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1623,26 +1623,19 @@ static struct sk_buff *vxlan_na_create(struct sk_buff *request, static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni) { struct vxlan_dev *vxlan = netdev_priv(dev); - struct nd_msg *msg; - const struct ipv6hdr *iphdr; const struct in6_addr *daddr; - struct neighbour *n; + const struct ipv6hdr *iphdr; struct inet6_dev *in6_dev; + struct neighbour *n; + struct nd_msg *msg; in6_dev = __in6_dev_get(dev); if (!in6_dev) goto out; - if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg))) - goto out; - iphdr = ipv6_hdr(skb); daddr = &iphdr->daddr; - msg = (struct nd_msg *)(iphdr + 1); - if (msg->icmph.icmp6_code != 0 || - msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION) - goto out; if (ipv6_addr_loopback(daddr) || ipv6_addr_is_multicast(&msg->target)) @@ -2240,11 +2233,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) { struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_rdst *rdst, *fdst = NULL; const struct ip_tunnel_info *info; - struct ethhdr *eth; bool did_rsc = false; - struct vxlan_rdst *rdst, *fdst = NULL; struct vxlan_fdb *f; + struct ethhdr *eth; __be32 vni = 0; info = skb_tunnel_info(skb); @@ -2269,12 +2262,14 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) if (ntohs(eth->h_proto) == ETH_P_ARP) return arp_reduce(dev, skb, vni); #if IS_ENABLED(CONFIG_IPV6) - else if (ntohs(eth->h_proto) == ETH_P_IPV6) { - struct ipv6hdr *hdr, _hdr; - if ((hdr = skb_header_pointer(skb, - skb_network_offset(skb), - sizeof(_hdr), &_hdr)) && - hdr->nexthdr == IPPROTO_ICMPV6) + else if (ntohs(eth->h_proto) == ETH_P_IPV6 && + pskb_may_pull(skb, sizeof(struct ipv6hdr) + + sizeof(struct nd_msg)) && + ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { + struct nd_msg *m = (struct nd_msg *)(ipv6_hdr(skb) + 1); + + if (m->icmph.icmp6_code == 0 && + m->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) return neigh_reduce(dev, skb, vni); } #endif diff --git a/drivers/tty/serial/8250/8250_fintek.c b/drivers/tty/serial/8250/8250_fintek.c index e500f7dd2470..4bd376c08b59 100644 --- a/drivers/tty/serial/8250/8250_fintek.c +++ b/drivers/tty/serial/8250/8250_fintek.c @@ -118,6 +118,9 @@ static int fintek_8250_enter_key(u16 base_port, u8 key) if (!request_muxed_region(base_port, 2, "8250_fintek")) return -EBUSY; + /* Force to deactive all SuperIO in this base_port */ + outb(EXIT_KEY, base_port + ADDR_PORT); + outb(key, base_port + ADDR_PORT); outb(key, base_port + ADDR_PORT); return 0; diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c index 7754053deeda..26a22b100df1 100644 --- a/drivers/tty/serial/omap-serial.c +++ b/drivers/tty/serial/omap-serial.c @@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct uart_port *port, unsigned int mctrl) if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS)) up->efr |= UART_EFR_RTS; else - up->efr &= UART_EFR_RTS; + up->efr &= ~UART_EFR_RTS; serial_out(up, UART_EFR, up->efr); serial_out(up, UART_LCR, lcr); diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index a37f003530d7..1175a1722411 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -447,8 +447,7 @@ int venus_fsync(struct super_block *sb, struct CodaFid *fid) UPARG(CODA_FSYNC); inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp); CODA_FREE(inp, insize); return error; diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index 74407c6dd592..ec8f75813beb 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node) dlm_lockres_put(res); continue; } + dlm_move_lockres_to_recovery_list(dlm, res); } else if (res->owner == dlm->node_num) { dlm_free_dead_locks(dlm, res, dead_node); __dlm_lockres_calc_usage(dlm, res); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 6e41fc8fabbe..dc455d45a66a 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1161,6 +1161,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE; if (size_change) { + /* + * Here we should wait dio to finish before inode lock + * to avoid a deadlock between ocfs2_setattr() and + * ocfs2_dio_end_io_write() + */ + inode_dio_wait(inode); + status = ocfs2_rw_lock(inode, 1); if (status < 0) { mlog_errno(status); @@ -1200,8 +1207,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if (status) goto bail_unlock; - inode_dio_wait(inode); - if (i_size_read(inode) >= attr->ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c9c4a81b9767..18b06983131a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -700,7 +700,8 @@ typedef struct pglist_data { * is the first PFN that needs to be initialised. */ unsigned long first_deferred_pfn; - unsigned long static_init_size; + /* Number of non-deferred pages */ + unsigned long static_init_pgcnt; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index e012b9be777e..fed95fa941e6 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1507,7 +1507,7 @@ static void rcu_prepare_for_idle(void) rdtp->last_accelerate = jiffies; for_each_rcu_flavor(rsp) { rdp = this_cpu_ptr(rsp->rda); - if (rcu_segcblist_pend_cbs(&rdp->cblist)) + if (!rcu_segcblist_pend_cbs(&rdp->cblist)) continue; rnp = rdp->mynode; raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 77e4d3c5c57b..82a6270c9743 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -290,28 +290,37 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + +/* + * Determine how many pages need to be initialized durig early boot + * (non-deferred initialization). + * The value of first_deferred_pfn will be set later, once non-deferred pages + * are initialized, but for now set it ULONG_MAX. + */ static inline void reset_deferred_meminit(pg_data_t *pgdat) { - unsigned long max_initialise; - unsigned long reserved_lowmem; + phys_addr_t start_addr, end_addr; + unsigned long max_pgcnt; + unsigned long reserved; /* * Initialise at least 2G of a node but also take into account that * two large system hashes that can take up 1GB for 0.25TB/node. */ - max_initialise = max(2UL << (30 - PAGE_SHIFT), - (pgdat->node_spanned_pages >> 8)); + max_pgcnt = max(2UL << (30 - PAGE_SHIFT), + (pgdat->node_spanned_pages >> 8)); /* * Compensate the all the memblock reservations (e.g. crash kernel) * from the initial estimation to make sure we will initialize enough * memory to boot. */ - reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn, - pgdat->node_start_pfn + max_initialise); - max_initialise += reserved_lowmem; + start_addr = PFN_PHYS(pgdat->node_start_pfn); + end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt); + reserved = memblock_reserved_memory_within(start_addr, end_addr); + max_pgcnt += PHYS_PFN(reserved); - pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages); + pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages); pgdat->first_deferred_pfn = ULONG_MAX; } @@ -338,7 +347,7 @@ static inline bool update_defer_init(pg_data_t *pgdat, if (zone_end < pgdat_end_pfn(pgdat)) return true; (*nr_initialised)++; - if ((*nr_initialised > pgdat->static_init_size) && + if ((*nr_initialised > pgdat->static_init_pgcnt) && (pfn & (PAGES_PER_SECTION - 1)) == 0) { pgdat->first_deferred_pfn = pfn; return false; diff --git a/mm/page_ext.c b/mm/page_ext.c index 4f0367d472c4..2c16216c29b6 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -125,7 +125,6 @@ struct page_ext *lookup_page_ext(struct page *page) struct page_ext *base; base = NODE_DATA(page_to_nid(page))->node_page_ext; -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -134,7 +133,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (unlikely(!base)) return NULL; -#endif index = pfn - round_down(node_start_pfn(page_to_nid(page)), MAX_ORDER_NR_PAGES); return get_entry(base, index); @@ -199,7 +197,6 @@ struct page_ext *lookup_page_ext(struct page *page) { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -208,7 +205,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (!section->page_ext) return NULL; -#endif return get_entry(section->page_ext, pfn); } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 8bd4afa83cb8..23a3e415ac2c 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -188,8 +188,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask, sz); - if (pte && walk->hugetlb_entry) + + if (pte) err = walk->hugetlb_entry(pte, hmask, addr, next, walk); + else if (walk->pte_hole) + err = walk->pte_hole(addr, next, walk); + if (err) break; } while (addr = next, addr != end); diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index b93148e8e9fb..15c99dfa3d72 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk) struct sk_buff *skb = NULL; struct nlmsghdr *nlh; struct module *module; - int len, err = -ENOBUFS; + int err = -ENOBUFS; int alloc_min_size; int alloc_size; @@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk) skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); - len = cb->dump(skb, cb); + if (nlk->dump_done_errno > 0) + nlk->dump_done_errno = cb->dump(skb, cb); - if (len > 0) { + if (nlk->dump_done_errno > 0 || + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { mutex_unlock(nlk->cb_mutex); if (sk_filter(sk, skb)) @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk) return 0; } - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); - if (!nlh) + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, + sizeof(nlk->dump_done_errno), NLM_F_MULTI); + if (WARN_ON(!nlh)) goto errout_skb; nl_dump_check_consistent(cb, nlh); - memcpy(nlmsg_data(nlh), &len, sizeof(len)); + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, + sizeof(nlk->dump_done_errno)); if (sk_filter(sk, skb)) kfree_skb(skb); @@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, } nlk->cb_running = true; + nlk->dump_done_errno = INT_MAX; mutex_unlock(nlk->cb_mutex); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 028188597eaa..962de7b3c023 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -34,6 +34,7 @@ struct netlink_sock { wait_queue_head_t wait; bool bound; bool cb_running; + int dump_done_errno; struct netlink_callback cb; struct mutex *cb_mutex; struct mutex cb_def_mutex; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index a6dfa86c0201..3b18085e3b10 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -807,9 +807,10 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname, addr->v6.sin6_flowinfo = 0; addr->v6.sin6_port = sh->source; addr->v6.sin6_addr = ipv6_hdr(skb)->saddr; - if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { + if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) addr->v6.sin6_scope_id = sctp_v6_skb_iif(skb); - } + else + addr->v6.sin6_scope_id = 0; } *addr_len = sctp_v6_addr_to_user(sctp_sk(skb->sk), addr); diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c index 809ba70fbbbf..7d769b948de8 100644 --- a/security/integrity/ima/ima_appraise.c +++ b/security/integrity/ima/ima_appraise.c @@ -320,6 +320,9 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) if (iint->flags & IMA_DIGSIG) return; + if (iint->ima_file_status != INTEGRITY_PASS) + return; + rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo); if (rc < 0) return;

8 years

1
0
0 0

Re: [Linux-stable-mirror] Linux 4.13.16

by Greg KH

diff --git a/Makefile b/Makefile index 3bd5d9d148d3..bc9a897e0431 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 13 -SUBLEVEL = 15 +SUBLEVEL = 16 EXTRAVERSION = NAME = Fearless Coyote diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c index c55fb2cb2acc..24f749324c0f 100644 --- a/arch/x86/kernel/cpu/intel_cacheinfo.c +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c @@ -811,7 +811,24 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index, struct cacheinfo *this_leaf; int i, sibling; - if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { + /* + * For L3, always use the pre-calculated cpu_llc_shared_mask + * to derive shared_cpu_map. + */ + if (index == 3) { + for_each_cpu(i, cpu_llc_shared_mask(cpu)) { + this_cpu_ci = get_cpu_cacheinfo(i); + if (!this_cpu_ci->info_list) + continue; + this_leaf = this_cpu_ci->info_list + index; + for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) { + if (!cpu_online(sibling)) + continue; + cpumask_set_cpu(sibling, + &this_leaf->shared_cpu_map); + } + } + } else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { unsigned int apicid, nshared, first, last; this_leaf = this_cpu_ci->info_list + index; @@ -839,19 +856,6 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index, &this_leaf->shared_cpu_map); } } - } else if (index == 3) { - for_each_cpu(i, cpu_llc_shared_mask(cpu)) { - this_cpu_ci = get_cpu_cacheinfo(i); - if (!this_cpu_ci->info_list) - continue; - this_leaf = this_cpu_ci->info_list + index; - for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) { - if (!cpu_online(sibling)) - continue; - cpumask_set_cpu(sibling, - &this_leaf->shared_cpu_map); - } - } } else return 0; diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 810b138f5897..c82d9fd2f05a 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4030,7 +4030,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struct ipmi_recv_msg *recv_msg, } static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, - struct list_head *timeouts, long timeout_period, + struct list_head *timeouts, + unsigned long timeout_period, int slot, unsigned long *flags, unsigned int *waiting_msgs) { @@ -4043,8 +4044,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, if (!ent->inuse) return; - ent->timeout -= timeout_period; - if (ent->timeout > 0) { + if (timeout_period < ent->timeout) { + ent->timeout -= timeout_period; (*waiting_msgs)++; return; } @@ -4110,7 +4111,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, } } -static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period) +static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, + unsigned long timeout_period) { struct list_head timeouts; struct ipmi_recv_msg *msg, *msg2; diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index 610638a80383..461bf0b8a094 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -110,6 +110,12 @@ ssize_t tpm_common_write(struct file *file, const char __user *buf, return -EFAULT; } + if (in_size < 6 || + in_size < be32_to_cpu(*((__be32 *) (priv->data_buffer + 2)))) { + mutex_unlock(&priv->buffer_mutex); + return -EINVAL; + } + /* atomic tpm command send and result receive. We only hold the ops * lock during this period so that the tpm can be unregistered even if * the char dev is held open. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index c99dc59d729b..76e8054bfc4e 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3253,7 +3253,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) hash ^= (hash >> 16); hash ^= (hash >> 8); - return hash; + return hash >> 1; } /*-------------------------- Device entry points ----------------------------*/ diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index c28fa5a8734c..ba15eeadfe21 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -1743,15 +1743,17 @@ static inline void bcm_sysport_mask_all_intrs(struct bcm_sysport_priv *priv) static inline void gib_set_pad_extension(struct bcm_sysport_priv *priv) { - u32 __maybe_unused reg; + u32 reg; - /* Include Broadcom tag in pad extension */ + reg = gib_readl(priv, GIB_CONTROL); + /* Include Broadcom tag in pad extension and fix up IPG_LENGTH */ if (netdev_uses_dsa(priv->netdev)) { - reg = gib_readl(priv, GIB_CONTROL); reg &= ~(GIB_PAD_EXTENSION_MASK << GIB_PAD_EXTENSION_SHIFT); reg |= ENET_BRCM_TAG_LEN << GIB_PAD_EXTENSION_SHIFT; - gib_writel(priv, reg, GIB_CONTROL); } + reg &= ~(GIB_IPG_LEN_MASK << GIB_IPG_LEN_SHIFT); + reg |= 12 << GIB_IPG_LEN_SHIFT; + gib_writel(priv, reg, GIB_CONTROL); } static int bcm_sysport_open(struct net_device *dev) diff --git a/drivers/net/ethernet/fealnx.c b/drivers/net/ethernet/fealnx.c index e92859dab7ae..e191c4ebeaf4 100644 --- a/drivers/net/ethernet/fealnx.c +++ b/drivers/net/ethernet/fealnx.c @@ -257,8 +257,8 @@ enum rx_desc_status_bits { RXFSD = 0x00000800, /* first descriptor */ RXLSD = 0x00000400, /* last descriptor */ ErrorSummary = 0x80, /* error summary */ - RUNT = 0x40, /* runt packet received */ - LONG = 0x20, /* long packet received */ + RUNTPKT = 0x40, /* runt packet received */ + LONGPKT = 0x20, /* long packet received */ FAE = 0x10, /* frame align error */ CRC = 0x08, /* crc error */ RXER = 0x04, /* receive error */ @@ -1632,7 +1632,7 @@ static int netdev_rx(struct net_device *dev) dev->name, rx_status); dev->stats.rx_errors++; /* end of a packet. */ - if (rx_status & (LONG | RUNT)) + if (rx_status & (LONGPKT | RUNTPKT)) dev->stats.rx_length_errors++; if (rx_status & RXER) dev->stats.rx_frame_errors++; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 7344433259fc..1c513dc0105e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -213,22 +213,20 @@ static inline bool mlx5e_rx_cache_get(struct mlx5e_rq *rq, static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - struct page *page; - if (mlx5e_rx_cache_get(rq, dma_info)) return 0; - page = dev_alloc_pages(rq->buff.page_order); - if (unlikely(!page)) + dma_info->page = dev_alloc_pages(rq->buff.page_order); + if (unlikely(!dma_info->page)) return -ENOMEM; - dma_info->addr = dma_map_page(rq->pdev, page, 0, + dma_info->addr = dma_map_page(rq->pdev, dma_info->page, 0, RQ_PAGE_SIZE(rq), rq->buff.map_dir); if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) { - put_page(page); + put_page(dma_info->page); + dma_info->page = NULL; return -ENOMEM; } - dma_info->page = page; return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 16885827367b..553bc230d70d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1545,9 +1545,16 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev) return -EAGAIN; } + /* Panic tear down fw command will stop the PCI bus communication + * with the HCA, so the health polll is no longer needed. + */ + mlx5_drain_health_wq(dev); + mlx5_stop_health_poll(dev); + ret = mlx5_cmd_force_teardown_hca(dev); if (ret) { mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret); + mlx5_start_health_poll(dev); return ret; } diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index b2ff88e69a81..3d4f7959dabb 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -626,7 +626,7 @@ static int asix_suspend(struct usb_interface *intf, pm_message_t message) struct usbnet *dev = usb_get_intfdata(intf); struct asix_common_private *priv = dev->driver_priv; - if (priv->suspend) + if (priv && priv->suspend) priv->suspend(dev); return usbnet_suspend(intf, message); @@ -678,7 +678,7 @@ static int asix_resume(struct usb_interface *intf) struct usbnet *dev = usb_get_intfdata(intf); struct asix_common_private *priv = dev->driver_priv; - if (priv->resume) + if (priv && priv->resume) priv->resume(dev); return usbnet_resume(intf); diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index 8ab281b478f2..4f88f64cccb4 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -221,7 +221,7 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf) goto bad_desc; } - if (header.usb_cdc_ether_desc) { + if (header.usb_cdc_ether_desc && info->ether->wMaxSegmentSize) { dev->hard_mtu = le16_to_cpu(info->ether->wMaxSegmentSize); /* because of Zaurus, we may be ignoring the host * side link address we were given. diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 9c80e80c5493..8d5e97251efe 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -771,7 +771,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_ int err; u8 iface_no; struct usb_cdc_parsed_header hdr; - u16 curr_ntb_format; + __le16 curr_ntb_format; ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) @@ -889,7 +889,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_ goto error2; } - if (curr_ntb_format == USB_CDC_NCM_NTB32_FORMAT) { + if (curr_ntb_format == cpu_to_le16(USB_CDC_NCM_NTB32_FORMAT)) { dev_info(&intf->dev, "resetting NTB format to 16-bit"); err = usbnet_write_cmd(dev, USB_CDC_SET_NTB_FORMAT, USB_TYPE_CLASS | USB_DIR_OUT diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 8c3733608271..8d4a6f7cba61 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -499,6 +499,7 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb) return 1; } if (rawip) { + skb_reset_mac_header(skb); skb->dev = dev->net; /* normally set by eth_type_trans */ skb->protocol = proto; return 1; @@ -681,7 +682,7 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf) } /* errors aren't fatal - we can live with the dynamic address */ - if (cdc_ether) { + if (cdc_ether && cdc_ether->wMaxSegmentSize) { dev->hard_mtu = le16_to_cpu(cdc_ether->wMaxSegmentSize); usbnet_get_ethernet_addr(dev, cdc_ether->iMACAddress); } diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 8a1eaf3c302a..e91ef5e236cc 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1271,7 +1271,7 @@ static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it) frh->family = family; frh->action = FR_ACT_TO_TBL; - if (nla_put_u32(skb, FRA_L3MDEV, 1)) + if (nla_put_u8(skb, FRA_L3MDEV, 1)) goto nla_put_failure; if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF)) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index e17baac70f43..436154720bf8 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1632,26 +1632,19 @@ static struct sk_buff *vxlan_na_create(struct sk_buff *request, static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni) { struct vxlan_dev *vxlan = netdev_priv(dev); - struct nd_msg *msg; - const struct ipv6hdr *iphdr; const struct in6_addr *daddr; - struct neighbour *n; + const struct ipv6hdr *iphdr; struct inet6_dev *in6_dev; + struct neighbour *n; + struct nd_msg *msg; in6_dev = __in6_dev_get(dev); if (!in6_dev) goto out; - if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg))) - goto out; - iphdr = ipv6_hdr(skb); daddr = &iphdr->daddr; - msg = (struct nd_msg *)(iphdr + 1); - if (msg->icmph.icmp6_code != 0 || - msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION) - goto out; if (ipv6_addr_loopback(daddr) || ipv6_addr_is_multicast(&msg->target)) @@ -2258,11 +2251,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) { struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_rdst *rdst, *fdst = NULL; const struct ip_tunnel_info *info; - struct ethhdr *eth; bool did_rsc = false; - struct vxlan_rdst *rdst, *fdst = NULL; struct vxlan_fdb *f; + struct ethhdr *eth; __be32 vni = 0; info = skb_tunnel_info(skb); @@ -2287,12 +2280,14 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) if (ntohs(eth->h_proto) == ETH_P_ARP) return arp_reduce(dev, skb, vni); #if IS_ENABLED(CONFIG_IPV6) - else if (ntohs(eth->h_proto) == ETH_P_IPV6) { - struct ipv6hdr *hdr, _hdr; - if ((hdr = skb_header_pointer(skb, - skb_network_offset(skb), - sizeof(_hdr), &_hdr)) && - hdr->nexthdr == IPPROTO_ICMPV6) + else if (ntohs(eth->h_proto) == ETH_P_IPV6 && + pskb_may_pull(skb, sizeof(struct ipv6hdr) + + sizeof(struct nd_msg)) && + ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { + struct nd_msg *m = (struct nd_msg *)(ipv6_hdr(skb) + 1); + + if (m->icmph.icmp6_code == 0 && + m->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) return neigh_reduce(dev, skb, vni); } #endif diff --git a/drivers/tty/serial/8250/8250_fintek.c b/drivers/tty/serial/8250/8250_fintek.c index e500f7dd2470..4bd376c08b59 100644 --- a/drivers/tty/serial/8250/8250_fintek.c +++ b/drivers/tty/serial/8250/8250_fintek.c @@ -118,6 +118,9 @@ static int fintek_8250_enter_key(u16 base_port, u8 key) if (!request_muxed_region(base_port, 2, "8250_fintek")) return -EBUSY; + /* Force to deactive all SuperIO in this base_port */ + outb(EXIT_KEY, base_port + ADDR_PORT); + outb(key, base_port + ADDR_PORT); outb(key, base_port + ADDR_PORT); return 0; diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c index 1ea05ac57aa7..670f7e334f93 100644 --- a/drivers/tty/serial/omap-serial.c +++ b/drivers/tty/serial/omap-serial.c @@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct uart_port *port, unsigned int mctrl) if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS)) up->efr |= UART_EFR_RTS; else - up->efr &= UART_EFR_RTS; + up->efr &= ~UART_EFR_RTS; serial_out(up, UART_EFR, up->efr); serial_out(up, UART_LCR, lcr); diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index e82357c89979..8cf16d8c5261 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb, struct CodaFid *fid) UPARG(CODA_FSYNC); inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp); CODA_FREE(inp, insize); return error; diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index 74407c6dd592..ec8f75813beb 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node) dlm_lockres_put(res); continue; } + dlm_move_lockres_to_recovery_list(dlm, res); } else if (res->owner == dlm->node_num) { dlm_free_dead_locks(dlm, res, dead_node); __dlm_lockres_calc_usage(dlm, res); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index bfeb647459d9..2fc8e65c07c5 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1168,6 +1168,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE; if (size_change) { + /* + * Here we should wait dio to finish before inode lock + * to avoid a deadlock between ocfs2_setattr() and + * ocfs2_dio_end_io_write() + */ + inode_dio_wait(inode); + status = ocfs2_rw_lock(inode, 1); if (status < 0) { mlog_errno(status); @@ -1207,8 +1214,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if (status) goto bail_unlock; - inode_dio_wait(inode); - if (i_size_read(inode) >= attr->ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fc14b8b3f6ce..1d86e09f17c1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -691,7 +691,8 @@ typedef struct pglist_data { * is the first PFN that needs to be initialised. */ unsigned long first_deferred_pfn; - unsigned long static_init_size; + /* Number of non-deferred pages */ + unsigned long static_init_pgcnt; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 63df75ae70ee..baf2dd102686 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3655,6 +3655,13 @@ static inline void nf_reset_trace(struct sk_buff *skb) #endif } +static inline void ipvs_reset(struct sk_buff *skb) +{ +#if IS_ENABLED(CONFIG_IP_VS) + skb->ipvs_property = 0; +#endif +} + /* Note: This doesn't put any conntrack and bridge info in dst. */ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, bool copy) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 908b309d60d7..b8f51dffeae9 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1493,7 +1493,7 @@ static void rcu_prepare_for_idle(void) rdtp->last_accelerate = jiffies; for_each_rcu_flavor(rsp) { rdp = this_cpu_ptr(rsp->rda); - if (rcu_segcblist_pend_cbs(&rdp->cblist)) + if (!rcu_segcblist_pend_cbs(&rdp->cblist)) continue; rnp = rdp->mynode; raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1423da8dd16f..3bd0999c266f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -289,28 +289,37 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + +/* + * Determine how many pages need to be initialized durig early boot + * (non-deferred initialization). + * The value of first_deferred_pfn will be set later, once non-deferred pages + * are initialized, but for now set it ULONG_MAX. + */ static inline void reset_deferred_meminit(pg_data_t *pgdat) { - unsigned long max_initialise; - unsigned long reserved_lowmem; + phys_addr_t start_addr, end_addr; + unsigned long max_pgcnt; + unsigned long reserved; /* * Initialise at least 2G of a node but also take into account that * two large system hashes that can take up 1GB for 0.25TB/node. */ - max_initialise = max(2UL << (30 - PAGE_SHIFT), - (pgdat->node_spanned_pages >> 8)); + max_pgcnt = max(2UL << (30 - PAGE_SHIFT), + (pgdat->node_spanned_pages >> 8)); /* * Compensate the all the memblock reservations (e.g. crash kernel) * from the initial estimation to make sure we will initialize enough * memory to boot. */ - reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn, - pgdat->node_start_pfn + max_initialise); - max_initialise += reserved_lowmem; + start_addr = PFN_PHYS(pgdat->node_start_pfn); + end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt); + reserved = memblock_reserved_memory_within(start_addr, end_addr); + max_pgcnt += PHYS_PFN(reserved); - pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages); + pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages); pgdat->first_deferred_pfn = ULONG_MAX; } @@ -337,7 +346,7 @@ static inline bool update_defer_init(pg_data_t *pgdat, if (zone_end < pgdat_end_pfn(pgdat)) return true; (*nr_initialised)++; - if ((*nr_initialised > pgdat->static_init_size) && + if ((*nr_initialised > pgdat->static_init_pgcnt) && (pfn & (PAGES_PER_SECTION - 1)) == 0) { pgdat->first_deferred_pfn = pfn; return false; diff --git a/mm/page_ext.c b/mm/page_ext.c index 88ccc044b09a..9dbabbfc4557 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -124,7 +124,6 @@ struct page_ext *lookup_page_ext(struct page *page) struct page_ext *base; base = NODE_DATA(page_to_nid(page))->node_page_ext; -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -133,7 +132,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (unlikely(!base)) return NULL; -#endif index = pfn - round_down(node_start_pfn(page_to_nid(page)), MAX_ORDER_NR_PAGES); return get_entry(base, index); @@ -198,7 +196,6 @@ struct page_ext *lookup_page_ext(struct page *page) { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); -#if defined(CONFIG_DEBUG_VM) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -207,7 +204,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (!section->page_ext) return NULL; -#endif return get_entry(section->page_ext, pfn); } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 1a4197965415..7d973f63088c 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -187,8 +187,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask, sz); - if (pte && walk->hugetlb_entry) + + if (pte) err = walk->hugetlb_entry(pte, hmask, addr, next, walk); + else if (walk->pte_hole) + err = walk->pte_hole(addr, next, walk); + if (err) break; } while (addr = next, addr != end); diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 9649579b5b9f..4a72ee4e2ae9 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -376,6 +376,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } + if (event == NETDEV_DOWN && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) @@ -423,9 +426,6 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, struct net_device *tmp; LIST_HEAD(close_list); - if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); - /* Put all VLANs for this dev in the down state too. */ vlan_group_for_each_dev(grp, i, vlandev) { flgs = vlandev->flags; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 72eb23d2426f..a0155578e951 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4476,6 +4476,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) if (!xnet) return; + ipvs_reset(skb); skb_orphan(skb); skb->mark = 0; } diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e92e5dbcb3d6..ffe96de8a079 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2613,7 +2613,6 @@ void tcp_simple_retransmit(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; unsigned int mss = tcp_current_mss(sk); - u32 prior_lost = tp->lost_out; tcp_for_write_queue(skb, sk) { if (skb == tcp_send_head(sk)) @@ -2630,7 +2629,7 @@ void tcp_simple_retransmit(struct sock *sk) tcp_clear_retrans_hints_partial(tp); - if (prior_lost == tp->lost_out) + if (!tp->lost_out) return; if (tcp_is_reno(tp)) diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c index 6d650ed3cb59..5c871666c561 100644 --- a/net/ipv4/tcp_nv.c +++ b/net/ipv4/tcp_nv.c @@ -263,7 +263,7 @@ static void tcpnv_acked(struct sock *sk, const struct ack_sample *sample) /* rate in 100's bits per second */ rate64 = ((u64)sample->in_flight) * 8000000; - rate = (u32)div64_u64(rate64, (u64)(avg_rtt * 100)); + rate = (u32)div64_u64(rate64, (u64)(avg_rtt ?: 1) * 100); /* Remember the maximum rate seen during this RTT * Note: It may be more than one RTT. This function should be diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 11f69bbf9307..b6a2aa1dcf56 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -149,11 +149,19 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, * is freed by GSO engine */ if (copy_destructor) { + int delta; + swap(gso_skb->sk, skb->sk); swap(gso_skb->destructor, skb->destructor); sum_truesize += skb->truesize; - refcount_add(sum_truesize - gso_skb->truesize, - &skb->sk->sk_wmem_alloc); + delta = sum_truesize - gso_skb->truesize; + /* In some pathological cases, delta can be negative. + * We need to either use refcount_add() or refcount_sub_and_test() + */ + if (likely(delta >= 0)) + refcount_add(delta, &skb->sk->sk_wmem_alloc); + else + WARN_ON_ONCE(refcount_sub_and_test(-delta, &skb->sk->sk_wmem_alloc)); } delta = htonl(oldlen + (skb_tail_pointer(skb) - diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 58587b0e2b5d..e359840f46c0 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3207,13 +3207,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, th->source = htons(ireq->ir_num); th->dest = ireq->ir_rmt_port; skb->mark = ireq->ir_mark; - /* Setting of flags are superfluous here for callers (and ECE is - * not even correctly set) - */ - tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn, - TCPHDR_SYN | TCPHDR_ACK); - - th->seq = htonl(TCP_SKB_CB(skb)->seq); + skb->ip_summed = CHECKSUM_PARTIAL; + th->seq = htonl(tcp_rsk(req)->snt_isn); /* XXX data is queued and acked as is. No buffer/window check */ th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt); diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c index 4d322c1b7233..e4280b6568b4 100644 --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -123,6 +123,7 @@ static int l2tp_ip_recv(struct sk_buff *skb) unsigned char *ptr, *optr; struct l2tp_session *session; struct l2tp_tunnel *tunnel = NULL; + struct iphdr *iph; int length; if (!pskb_may_pull(skb, 4)) @@ -178,24 +179,17 @@ static int l2tp_ip_recv(struct sk_buff *skb) goto discard; tunnel_id = ntohl(*(__be32 *) &skb->data[4]); - tunnel = l2tp_tunnel_find(net, tunnel_id); - if (tunnel) { - sk = tunnel->sock; - sock_hold(sk); - } else { - struct iphdr *iph = (struct iphdr *) skb_network_header(skb); - - read_lock_bh(&l2tp_ip_lock); - sk = __l2tp_ip_bind_lookup(net, iph->daddr, iph->saddr, - inet_iif(skb), tunnel_id); - if (!sk) { - read_unlock_bh(&l2tp_ip_lock); - goto discard; - } + iph = (struct iphdr *)skb_network_header(skb); - sock_hold(sk); + read_lock_bh(&l2tp_ip_lock); + sk = __l2tp_ip_bind_lookup(net, iph->daddr, iph->saddr, inet_iif(skb), + tunnel_id); + if (!sk) { read_unlock_bh(&l2tp_ip_lock); + goto discard; } + sock_hold(sk); + read_unlock_bh(&l2tp_ip_lock); if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_put; diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index 88b397c30d86..8bcaa975b432 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -136,6 +136,7 @@ static int l2tp_ip6_recv(struct sk_buff *skb) unsigned char *ptr, *optr; struct l2tp_session *session; struct l2tp_tunnel *tunnel = NULL; + struct ipv6hdr *iph; int length; if (!pskb_may_pull(skb, 4)) @@ -192,24 +193,17 @@ static int l2tp_ip6_recv(struct sk_buff *skb) goto discard; tunnel_id = ntohl(*(__be32 *) &skb->data[4]); - tunnel = l2tp_tunnel_find(net, tunnel_id); - if (tunnel) { - sk = tunnel->sock; - sock_hold(sk); - } else { - struct ipv6hdr *iph = ipv6_hdr(skb); - - read_lock_bh(&l2tp_ip6_lock); - sk = __l2tp_ip6_bind_lookup(net, &iph->daddr, &iph->saddr, - inet6_iif(skb), tunnel_id); - if (!sk) { - read_unlock_bh(&l2tp_ip6_lock); - goto discard; - } + iph = ipv6_hdr(skb); - sock_hold(sk); + read_lock_bh(&l2tp_ip6_lock); + sk = __l2tp_ip6_bind_lookup(net, &iph->daddr, &iph->saddr, + inet6_iif(skb), tunnel_id); + if (!sk) { read_unlock_bh(&l2tp_ip6_lock); + goto discard; } + sock_hold(sk); + read_unlock_bh(&l2tp_ip6_lock); if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_put; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 09c8dbbd2d70..2939a6b87c27 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2128,7 +2128,7 @@ static int netlink_dump(struct sock *sk) struct sk_buff *skb = NULL; struct nlmsghdr *nlh; struct module *module; - int len, err = -ENOBUFS; + int err = -ENOBUFS; int alloc_min_size; int alloc_size; @@ -2175,9 +2175,11 @@ static int netlink_dump(struct sock *sk) skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); - len = cb->dump(skb, cb); + if (nlk->dump_done_errno > 0) + nlk->dump_done_errno = cb->dump(skb, cb); - if (len > 0) { + if (nlk->dump_done_errno > 0 || + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { mutex_unlock(nlk->cb_mutex); if (sk_filter(sk, skb)) @@ -2187,13 +2189,15 @@ static int netlink_dump(struct sock *sk) return 0; } - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); - if (!nlh) + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, + sizeof(nlk->dump_done_errno), NLM_F_MULTI); + if (WARN_ON(!nlh)) goto errout_skb; nl_dump_check_consistent(cb, nlh); - memcpy(nlmsg_data(nlh), &len, sizeof(len)); + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, + sizeof(nlk->dump_done_errno)); if (sk_filter(sk, skb)) kfree_skb(skb); @@ -2265,6 +2269,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, } nlk->cb_running = true; + nlk->dump_done_errno = INT_MAX; mutex_unlock(nlk->cb_mutex); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 3490f2430532..8908fc2d3de0 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -33,6 +33,7 @@ struct netlink_sock { wait_queue_head_t wait; bool bound; bool cb_running; + int dump_done_errno; struct netlink_callback cb; struct mutex *cb_mutex; struct mutex cb_def_mutex; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 1344e3a411ae..edb462b0b73b 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -807,9 +807,10 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname, addr->v6.sin6_flowinfo = 0; addr->v6.sin6_port = sh->source; addr->v6.sin6_addr = ipv6_hdr(skb)->saddr; - if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { + if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) addr->v6.sin6_scope_id = sctp_v6_skb_iif(skb); - } + else + addr->v6.sin6_scope_id = 0; } *addr_len = sctp_v6_addr_to_user(sctp_sk(skb->sk), addr); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 3d79085eb4e0..083da13e1af4 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4924,6 +4924,10 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) struct socket *sock; int err = 0; + /* Do not peel off from one netns to another one. */ + if (!net_eq(current->nsproxy->net_ns, sock_net(sk))) + return -EINVAL; + if (!asoc) return -EINVAL; diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c index 809ba70fbbbf..7d769b948de8 100644 --- a/security/integrity/ima/ima_appraise.c +++ b/security/integrity/ima/ima_appraise.c @@ -320,6 +320,9 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) if (iint->flags & IMA_DIGSIG) return; + if (iint->ima_file_status != INTEGRITY_PASS) + return; + rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo); if (rc < 0) return;

8 years

1
0
0 0

Re: [Linux-stable-mirror] Linux 4.9.65

by Greg KH

diff --git a/Makefile b/Makefile index d29cace0da6d..87a641515e9c 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 64 +SUBLEVEL = 65 EXTRAVERSION = NAME = Roaring Lionus diff --git a/crypto/dh.c b/crypto/dh.c index 9d19360e7189..99e20fc63cc9 100644 --- a/crypto/dh.c +++ b/crypto/dh.c @@ -21,19 +21,12 @@ struct dh_ctx { MPI xa; }; -static inline void dh_clear_params(struct dh_ctx *ctx) +static void dh_clear_ctx(struct dh_ctx *ctx) { mpi_free(ctx->p); mpi_free(ctx->g); - ctx->p = NULL; - ctx->g = NULL; -} - -static void dh_free_ctx(struct dh_ctx *ctx) -{ - dh_clear_params(ctx); mpi_free(ctx->xa); - ctx->xa = NULL; + memset(ctx, 0, sizeof(*ctx)); } /* @@ -71,10 +64,8 @@ static int dh_set_params(struct dh_ctx *ctx, struct dh *params) return -EINVAL; ctx->g = mpi_read_raw_data(params->g, params->g_size); - if (!ctx->g) { - mpi_free(ctx->p); + if (!ctx->g) return -EINVAL; - } return 0; } @@ -84,19 +75,24 @@ static int dh_set_secret(struct crypto_kpp *tfm, void *buf, unsigned int len) struct dh_ctx *ctx = dh_get_ctx(tfm); struct dh params; + /* Free the old MPI key if any */ + dh_clear_ctx(ctx); + if (crypto_dh_decode_key(buf, len, &params) < 0) - return -EINVAL; + goto err_clear_ctx; if (dh_set_params(ctx, &params) < 0) - return -EINVAL; + goto err_clear_ctx; ctx->xa = mpi_read_raw_data(params.key, params.key_size); - if (!ctx->xa) { - dh_clear_params(ctx); - return -EINVAL; - } + if (!ctx->xa) + goto err_clear_ctx; return 0; + +err_clear_ctx: + dh_clear_ctx(ctx); + return -EINVAL; } static int dh_compute_value(struct kpp_request *req) @@ -154,7 +150,7 @@ static void dh_exit_tfm(struct crypto_kpp *tfm) { struct dh_ctx *ctx = dh_get_ctx(tfm); - dh_free_ctx(ctx); + dh_clear_ctx(ctx); } static struct kpp_alg dh = { diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 172a9dc06ec9..5d509ccf1299 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4029,7 +4029,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struct ipmi_recv_msg *recv_msg, } static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, - struct list_head *timeouts, long timeout_period, + struct list_head *timeouts, + unsigned long timeout_period, int slot, unsigned long *flags, unsigned int *waiting_msgs) { @@ -4042,8 +4043,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, if (!ent->inuse) return; - ent->timeout -= timeout_period; - if (ent->timeout > 0) { + if (timeout_period < ent->timeout) { + ent->timeout -= timeout_period; (*waiting_msgs)++; return; } @@ -4109,7 +4110,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, } } -static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period) +static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, + unsigned long timeout_period) { struct list_head timeouts; struct ipmi_recv_msg *msg, *msg2; diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index cf76fc6149e5..fbb75514dfb4 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -666,6 +666,7 @@ static int dmatest_func(void *data) * free it this time?" dancing. For now, just * leave it dangling. */ + WARN(1, "dmatest: Kernel stack may be corrupted!!\n"); dmaengine_unmap_put(um); result("test timed out", total_tests, src_off, dst_off, len, 0); diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 5fa36ebc0640..63d61c084815 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3217,7 +3217,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) hash ^= (hash >> 16); hash ^= (hash >> 8); - return hash; + return hash >> 1; } /*-------------------------- Device entry points ----------------------------*/ diff --git a/drivers/net/ethernet/fealnx.c b/drivers/net/ethernet/fealnx.c index c08bd763172a..a300ed48a7d8 100644 --- a/drivers/net/ethernet/fealnx.c +++ b/drivers/net/ethernet/fealnx.c @@ -257,8 +257,8 @@ enum rx_desc_status_bits { RXFSD = 0x00000800, /* first descriptor */ RXLSD = 0x00000400, /* last descriptor */ ErrorSummary = 0x80, /* error summary */ - RUNT = 0x40, /* runt packet received */ - LONG = 0x20, /* long packet received */ + RUNTPKT = 0x40, /* runt packet received */ + LONGPKT = 0x20, /* long packet received */ FAE = 0x10, /* frame align error */ CRC = 0x08, /* crc error */ RXER = 0x04, /* receive error */ @@ -1633,7 +1633,7 @@ static int netdev_rx(struct net_device *dev) dev->name, rx_status); dev->stats.rx_errors++; /* end of a packet. */ - if (rx_status & (LONG | RUNT)) + if (rx_status & (LONGPKT | RUNTPKT)) dev->stats.rx_length_errors++; if (rx_status & RXER) dev->stats.rx_frame_errors++; diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 50737def774c..32e9ec8f1521 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -624,7 +624,7 @@ static int asix_suspend(struct usb_interface *intf, pm_message_t message) struct usbnet *dev = usb_get_intfdata(intf); struct asix_common_private *priv = dev->driver_priv; - if (priv->suspend) + if (priv && priv->suspend) priv->suspend(dev); return usbnet_suspend(intf, message); @@ -676,7 +676,7 @@ static int asix_resume(struct usb_interface *intf) struct usbnet *dev = usb_get_intfdata(intf); struct asix_common_private *priv = dev->driver_priv; - if (priv->resume) + if (priv && priv->resume) priv->resume(dev); return usbnet_resume(intf); diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index b82be816256c..1fca0024f294 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -221,7 +221,7 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf) goto bad_desc; } - if (header.usb_cdc_ether_desc) { + if (header.usb_cdc_ether_desc && info->ether->wMaxSegmentSize) { dev->hard_mtu = le16_to_cpu(info->ether->wMaxSegmentSize); /* because of Zaurus, we may be ignoring the host * side link address we were given. diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 49a27dc46e5e..9cf11c83993a 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -205,6 +205,7 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb) return 1; } if (rawip) { + skb_reset_mac_header(skb); skb->dev = dev->net; /* normally set by eth_type_trans */ skb->protocol = proto; return 1; @@ -386,7 +387,7 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf) } /* errors aren't fatal - we can live with the dynamic address */ - if (cdc_ether) { + if (cdc_ether && cdc_ether->wMaxSegmentSize) { dev->hard_mtu = le16_to_cpu(cdc_ether->wMaxSegmentSize); usbnet_get_ethernet_addr(dev, cdc_ether->iMACAddress); } diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 578bd5001d93..346e48698555 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -1129,7 +1129,7 @@ static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it) frh->family = family; frh->action = FR_ACT_TO_TBL; - if (nla_put_u32(skb, FRA_L3MDEV, 1)) + if (nla_put_u8(skb, FRA_L3MDEV, 1)) goto nla_put_failure; if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF)) diff --git a/drivers/tty/serial/8250/8250_fintek.c b/drivers/tty/serial/8250/8250_fintek.c index 0facc789fe7d..f8c31070a337 100644 --- a/drivers/tty/serial/8250/8250_fintek.c +++ b/drivers/tty/serial/8250/8250_fintek.c @@ -54,6 +54,9 @@ static int fintek_8250_enter_key(u16 base_port, u8 key) if (!request_muxed_region(base_port, 2, "8250_fintek")) return -EBUSY; + /* Force to deactive all SuperIO in this base_port */ + outb(EXIT_KEY, base_port + ADDR_PORT); + outb(key, base_port + ADDR_PORT); outb(key, base_port + ADDR_PORT); return 0; diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c index 44e5b5bf713b..472ba3c813c1 100644 --- a/drivers/tty/serial/omap-serial.c +++ b/drivers/tty/serial/omap-serial.c @@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct uart_port *port, unsigned int mctrl) if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS)) up->efr |= UART_EFR_RTS; else - up->efr &= UART_EFR_RTS; + up->efr &= ~UART_EFR_RTS; serial_out(up, UART_EFR, up->efr); serial_out(up, UART_LCR, lcr); diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index f6c6c8adbc01..7289f0a7670b 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb, struct CodaFid *fid) UPARG(CODA_FSYNC); inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp); CODA_FREE(inp, insize); return error; diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index dd5cb8bcefd1..eef324823311 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node) dlm_lockres_put(res); continue; } + dlm_move_lockres_to_recovery_list(dlm, res); } else if (res->owner == dlm->node_num) { dlm_free_dead_locks(dlm, res, dead_node); __dlm_lockres_calc_usage(dlm, res); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 0db6f83fdea1..05a0fb9854f9 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1166,6 +1166,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE; if (size_change) { + /* + * Here we should wait dio to finish before inode lock + * to avoid a deadlock between ocfs2_setattr() and + * ocfs2_dio_end_io_write() + */ + inode_dio_wait(inode); + status = ocfs2_rw_lock(inode, 1); if (status < 0) { mlog_errno(status); @@ -1186,8 +1193,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if (status) goto bail_unlock; - inode_dio_wait(inode); - if (i_size_read(inode) >= attr->ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 6744eb40c4ea..fff21a82780c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -672,7 +672,8 @@ typedef struct pglist_data { * is the first PFN that needs to be initialised. */ unsigned long first_deferred_pfn; - unsigned long static_init_size; + /* Number of non-deferred pages */ + unsigned long static_init_pgcnt; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 32810f279f8e..601dfa849d30 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3584,6 +3584,13 @@ static inline void nf_reset_trace(struct sk_buff *skb) #endif } +static inline void ipvs_reset(struct sk_buff *skb) +{ +#if IS_ENABLED(CONFIG_IP_VS) + skb->ipvs_property = 0; +#endif +} + /* Note: This doesn't put any conntrack and bridge info in dst. */ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, bool copy) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7064aae8ded7..4a044134ce84 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -284,28 +284,37 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + +/* + * Determine how many pages need to be initialized durig early boot + * (non-deferred initialization). + * The value of first_deferred_pfn will be set later, once non-deferred pages + * are initialized, but for now set it ULONG_MAX. + */ static inline void reset_deferred_meminit(pg_data_t *pgdat) { - unsigned long max_initialise; - unsigned long reserved_lowmem; + phys_addr_t start_addr, end_addr; + unsigned long max_pgcnt; + unsigned long reserved; /* * Initialise at least 2G of a node but also take into account that * two large system hashes that can take up 1GB for 0.25TB/node. */ - max_initialise = max(2UL << (30 - PAGE_SHIFT), - (pgdat->node_spanned_pages >> 8)); + max_pgcnt = max(2UL << (30 - PAGE_SHIFT), + (pgdat->node_spanned_pages >> 8)); /* * Compensate the all the memblock reservations (e.g. crash kernel) * from the initial estimation to make sure we will initialize enough * memory to boot. */ - reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn, - pgdat->node_start_pfn + max_initialise); - max_initialise += reserved_lowmem; + start_addr = PFN_PHYS(pgdat->node_start_pfn); + end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt); + reserved = memblock_reserved_memory_within(start_addr, end_addr); + max_pgcnt += PHYS_PFN(reserved); - pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages); + pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages); pgdat->first_deferred_pfn = ULONG_MAX; } @@ -332,7 +341,7 @@ static inline bool update_defer_init(pg_data_t *pgdat, if (zone_end < pgdat_end_pfn(pgdat)) return true; (*nr_initialised)++; - if ((*nr_initialised > pgdat->static_init_size) && + if ((*nr_initialised > pgdat->static_init_pgcnt) && (pfn & (PAGES_PER_SECTION - 1)) == 0) { pgdat->first_deferred_pfn = pfn; return false; diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 207244489a68..d95341cffc2f 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -142,8 +142,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask); - if (pte && walk->hugetlb_entry) + + if (pte) err = walk->hugetlb_entry(pte, hmask, addr, next, walk); + else if (walk->pte_hole) + err = walk->pte_hole(addr, next, walk); + if (err) break; } while (addr = next, addr != end); diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 8d213f974448..4a47074d1d7f 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -376,6 +376,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } + if (event == NETDEV_DOWN && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) @@ -423,9 +426,6 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, struct net_device *tmp; LIST_HEAD(close_list); - if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); - /* Put all VLANs for this dev in the down state too. */ vlan_group_for_each_dev(grp, i, vlandev) { flgs = vlandev->flags; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index fe008f1bd930..aec5605944d3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4375,6 +4375,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) if (!xnet) return; + ipvs_reset(skb); skb_orphan(skb); skb->mark = 0; } diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c index 5de82a8d4d87..e45e2c41c7bd 100644 --- a/net/ipv4/tcp_nv.c +++ b/net/ipv4/tcp_nv.c @@ -263,7 +263,7 @@ static void tcpnv_acked(struct sock *sk, const struct ack_sample *sample) /* rate in 100's bits per second */ rate64 = ((u64)sample->in_flight) * 8000000; - rate = (u32)div64_u64(rate64, (u64)(avg_rtt * 100)); + rate = (u32)div64_u64(rate64, (u64)(avg_rtt ?: 1) * 100); /* Remember the maximum rate seen during this RTT * Note: It may be more than one RTT. This function should be diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 566b43afe378..3d7b59ecc76c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3110,13 +3110,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, tcp_ecn_make_synack(req, th); th->source = htons(ireq->ir_num); th->dest = ireq->ir_rmt_port; - /* Setting of flags are superfluous here for callers (and ECE is - * not even correctly set) - */ - tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn, - TCPHDR_SYN | TCPHDR_ACK); - - th->seq = htonl(TCP_SKB_CB(skb)->seq); + skb->ip_summed = CHECKSUM_PARTIAL; + th->seq = htonl(tcp_rsk(req)->snt_isn); /* XXX data is queued and acked as is. No buffer/window check */ th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt); diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index a1dca3b169a1..c9fac08a53b1 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2077,7 +2077,7 @@ static int netlink_dump(struct sock *sk) struct sk_buff *skb = NULL; struct nlmsghdr *nlh; struct module *module; - int len, err = -ENOBUFS; + int err = -ENOBUFS; int alloc_min_size; int alloc_size; @@ -2124,9 +2124,11 @@ static int netlink_dump(struct sock *sk) skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); - len = cb->dump(skb, cb); + if (nlk->dump_done_errno > 0) + nlk->dump_done_errno = cb->dump(skb, cb); - if (len > 0) { + if (nlk->dump_done_errno > 0 || + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { mutex_unlock(nlk->cb_mutex); if (sk_filter(sk, skb)) @@ -2136,13 +2138,15 @@ static int netlink_dump(struct sock *sk) return 0; } - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); - if (!nlh) + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, + sizeof(nlk->dump_done_errno), NLM_F_MULTI); + if (WARN_ON(!nlh)) goto errout_skb; nl_dump_check_consistent(cb, nlh); - memcpy(nlmsg_data(nlh), &len, sizeof(len)); + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, + sizeof(nlk->dump_done_errno)); if (sk_filter(sk, skb)) kfree_skb(skb); @@ -2214,6 +2218,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, } nlk->cb_running = true; + nlk->dump_done_errno = INT_MAX; mutex_unlock(nlk->cb_mutex); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index 4fdb38318977..bae961cfa3ad 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -24,6 +24,7 @@ struct netlink_sock { wait_queue_head_t wait; bool bound; bool cb_running; + int dump_done_errno; struct netlink_callback cb; struct mutex *cb_mutex; struct mutex cb_def_mutex; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index f7f00d012888..5d015270e454 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -806,9 +806,10 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname, addr->v6.sin6_flowinfo = 0; addr->v6.sin6_port = sh->source; addr->v6.sin6_addr = ipv6_hdr(skb)->saddr; - if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { + if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) addr->v6.sin6_scope_id = sctp_v6_skb_iif(skb); - } + else + addr->v6.sin6_scope_id = 0; } *addr_len = sctp_v6_addr_to_user(sctp_sk(skb->sk), addr); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index ffcc8aa78db7..c062ceae19e6 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4764,6 +4764,10 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) struct socket *sock; int err = 0; + /* Do not peel off from one netns to another one. */ + if (!net_eq(current->nsproxy->net_ns, sock_net(sk))) + return -EINVAL; + if (!asoc) return -EINVAL; diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c index 097459830454..6830d2427e47 100644 --- a/security/integrity/ima/ima_appraise.c +++ b/security/integrity/ima/ima_appraise.c @@ -303,6 +303,9 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) if (iint->flags & IMA_DIGSIG) return; + if (iint->ima_file_status != INTEGRITY_PASS) + return; + rc = ima_collect_measurement(iint, file, NULL, 0, ima_hash_algo); if (rc < 0) return;

8 years

1
0
0 0

Re: [Linux-stable-mirror] Linux 4.4.101

by Greg KH

diff --git a/Makefile b/Makefile index 91dd7832f499..0d7b050427ed 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 100 +SUBLEVEL = 101 EXTRAVERSION = NAME = Blurry Fish Butt diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index 210826d5bba5..9119722eb347 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -64,8 +64,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom, /* * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. + * to safely read from kernel space. */ fs = get_fs(); set_fs(KERNEL_DS); @@ -111,21 +110,12 @@ static void dump_backtrace_entry(unsigned long where) print_ip_sym(where); } -static void dump_instr(const char *lvl, struct pt_regs *regs) +static void __dump_instr(const char *lvl, struct pt_regs *regs) { unsigned long addr = instruction_pointer(regs); - mm_segment_t fs; char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str; int i; - /* - * We need to switch to kernel mode so that we can use __get_user - * to safely read from kernel space. Note that we now dump the - * code first, just in case the backtrace kills us. - */ - fs = get_fs(); - set_fs(KERNEL_DS); - for (i = -4; i < 1; i++) { unsigned int val, bad; @@ -139,8 +129,18 @@ static void dump_instr(const char *lvl, struct pt_regs *regs) } } printk("%sCode: %s\n", lvl, str); +} - set_fs(fs); +static void dump_instr(const char *lvl, struct pt_regs *regs) +{ + if (!user_mode(regs)) { + mm_segment_t fs = get_fs(); + set_fs(KERNEL_DS); + __dump_instr(lvl, regs); + set_fs(fs); + } else { + __dump_instr(lvl, regs); + } } static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index 25372dc381d4..5cb5e8ff0224 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4029,7 +4029,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struct ipmi_recv_msg *recv_msg, } static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, - struct list_head *timeouts, long timeout_period, + struct list_head *timeouts, + unsigned long timeout_period, int slot, unsigned long *flags, unsigned int *waiting_msgs) { @@ -4042,8 +4043,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, if (!ent->inuse) return; - ent->timeout -= timeout_period; - if (ent->timeout > 0) { + if (timeout_period < ent->timeout) { + ent->timeout -= timeout_period; (*waiting_msgs)++; return; } @@ -4109,7 +4110,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, } } -static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period) +static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, + unsigned long timeout_period) { struct list_head timeouts; struct ipmi_recv_msg *msg, *msg2; diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 5dca77e0ffed..2cb34b0f3856 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3166,7 +3166,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) hash ^= (hash >> 16); hash ^= (hash >> 8); - return hash; + return hash >> 1; } /*-------------------------- Device entry points ----------------------------*/ diff --git a/drivers/net/ethernet/fealnx.c b/drivers/net/ethernet/fealnx.c index b1b9ebafb354..a3b2e23921bf 100644 --- a/drivers/net/ethernet/fealnx.c +++ b/drivers/net/ethernet/fealnx.c @@ -257,8 +257,8 @@ enum rx_desc_status_bits { RXFSD = 0x00000800, /* first descriptor */ RXLSD = 0x00000400, /* last descriptor */ ErrorSummary = 0x80, /* error summary */ - RUNT = 0x40, /* runt packet received */ - LONG = 0x20, /* long packet received */ + RUNTPKT = 0x40, /* runt packet received */ + LONGPKT = 0x20, /* long packet received */ FAE = 0x10, /* frame align error */ CRC = 0x08, /* crc error */ RXER = 0x04, /* receive error */ @@ -1633,7 +1633,7 @@ static int netdev_rx(struct net_device *dev) dev->name, rx_status); dev->stats.rx_errors++; /* end of a packet. */ - if (rx_status & (LONG | RUNT)) + if (rx_status & (LONGPKT | RUNTPKT)) dev->stats.rx_length_errors++; if (rx_status & RXER) dev->stats.rx_frame_errors++; diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 669edbd47602..d6ceb8b91cd6 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -350,8 +350,8 @@ static void async_completion(struct nvme_queue *nvmeq, void *ctx, struct async_cmd_info *cmdinfo = ctx; cmdinfo->result = le32_to_cpup(&cqe->result); cmdinfo->status = le16_to_cpup(&cqe->status) >> 1; - queue_kthread_work(cmdinfo->worker, &cmdinfo->work); blk_mq_free_request(cmdinfo->req); + queue_kthread_work(cmdinfo->worker, &cmdinfo->work); } static inline struct nvme_cmd_info *get_cmd_from_tag(struct nvme_queue *nvmeq, diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c index de1c143b475f..21fc9b3a27cf 100644 --- a/drivers/tty/serial/omap-serial.c +++ b/drivers/tty/serial/omap-serial.c @@ -693,7 +693,7 @@ static void serial_omap_set_mctrl(struct uart_port *port, unsigned int mctrl) if ((mctrl & TIOCM_RTS) && (port->status & UPSTAT_AUTORTS)) up->efr |= UART_EFR_RTS; else - up->efr &= UART_EFR_RTS; + up->efr &= ~UART_EFR_RTS; serial_out(up, UART_EFR, up->efr); serial_out(up, UART_LCR, lcr); diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index f6c6c8adbc01..7289f0a7670b 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb, struct CodaFid *fid) UPARG(CODA_FSYNC); inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp); CODA_FREE(inp, insize); return error; diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 1d738723a41a..501ecc4a1ac4 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1166,6 +1166,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) } size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE; if (size_change) { + /* + * Here we should wait dio to finish before inode lock + * to avoid a deadlock between ocfs2_setattr() and + * ocfs2_dio_end_io_write() + */ + inode_dio_wait(inode); + status = ocfs2_rw_lock(inode, 1); if (status < 0) { mlog_errno(status); @@ -1186,8 +1193,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if (status) goto bail_unlock; - inode_dio_wait(inode); - if (i_size_read(inode) >= attr->ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 5b609a3ce3d7..ff88d6189411 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -688,7 +688,8 @@ typedef struct pglist_data { * is the first PFN that needs to be initialised. */ unsigned long first_deferred_pfn; - unsigned long static_init_size; + /* Number of non-deferred pages */ + unsigned long static_init_pgcnt; #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ } pg_data_t; diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index bf268fa92c5b..fec40271339f 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -46,33 +46,62 @@ extern struct page_ext_operations page_idle_ops; static inline bool page_is_young(struct page *page) { - return test_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return false; + + return test_bit(PAGE_EXT_YOUNG, &page_ext->flags); } static inline void set_page_young(struct page *page) { - set_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return; + + set_bit(PAGE_EXT_YOUNG, &page_ext->flags); } static inline bool test_and_clear_page_young(struct page *page) { - return test_and_clear_bit(PAGE_EXT_YOUNG, - &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return false; + + return test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags); } static inline bool page_is_idle(struct page *page) { - return test_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return false; + + return test_bit(PAGE_EXT_IDLE, &page_ext->flags); } static inline void set_page_idle(struct page *page) { - set_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return; + + set_bit(PAGE_EXT_IDLE, &page_ext->flags); } static inline void clear_page_idle(struct page *page) { - clear_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags); + struct page_ext *page_ext = lookup_page_ext(page); + + if (unlikely(!page_ext)) + return; + + clear_bit(PAGE_EXT_IDLE, &page_ext->flags); } #endif /* CONFIG_64BIT */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3f61c647fc5c..b5421f6f155a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3400,6 +3400,13 @@ static inline void nf_reset_trace(struct sk_buff *skb) #endif } +static inline void ipvs_reset(struct sk_buff *skb) +{ +#if IS_ENABLED(CONFIG_IP_VS) + skb->ipvs_property = 0; +#endif +} + /* Note: This doesn't put any conntrack and bridge info in dst. */ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, bool copy) diff --git a/mm/debug-pagealloc.c b/mm/debug-pagealloc.c index 5bf5906ce13b..fe1c61f7cf26 100644 --- a/mm/debug-pagealloc.c +++ b/mm/debug-pagealloc.c @@ -34,6 +34,8 @@ static inline void set_page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); + if (page_ext) + return; __set_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -42,6 +44,8 @@ static inline void clear_page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); + if (page_ext) + return; __clear_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } @@ -50,6 +54,8 @@ static inline bool page_poison(struct page *page) struct page_ext *page_ext; page_ext = lookup_page_ext(page); + if (page_ext) + return false; return test_bit(PAGE_EXT_DEBUG_POISON, &page_ext->flags); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6b5421ae86c6..3c70f03d91ec 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -267,28 +267,37 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + +/* + * Determine how many pages need to be initialized durig early boot + * (non-deferred initialization). + * The value of first_deferred_pfn will be set later, once non-deferred pages + * are initialized, but for now set it ULONG_MAX. + */ static inline void reset_deferred_meminit(pg_data_t *pgdat) { - unsigned long max_initialise; - unsigned long reserved_lowmem; + phys_addr_t start_addr, end_addr; + unsigned long max_pgcnt; + unsigned long reserved; /* * Initialise at least 2G of a node but also take into account that * two large system hashes that can take up 1GB for 0.25TB/node. */ - max_initialise = max(2UL << (30 - PAGE_SHIFT), - (pgdat->node_spanned_pages >> 8)); + max_pgcnt = max(2UL << (30 - PAGE_SHIFT), + (pgdat->node_spanned_pages >> 8)); /* * Compensate the all the memblock reservations (e.g. crash kernel) * from the initial estimation to make sure we will initialize enough * memory to boot. */ - reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn, - pgdat->node_start_pfn + max_initialise); - max_initialise += reserved_lowmem; + start_addr = PFN_PHYS(pgdat->node_start_pfn); + end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt); + reserved = memblock_reserved_memory_within(start_addr, end_addr); + max_pgcnt += PHYS_PFN(reserved); - pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages); + pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages); pgdat->first_deferred_pfn = ULONG_MAX; } @@ -324,7 +333,7 @@ static inline bool update_defer_init(pg_data_t *pgdat, return true; /* Initialise at least 2G of the highest zone */ (*nr_initialised)++; - if ((*nr_initialised > pgdat->static_init_size) && + if ((*nr_initialised > pgdat->static_init_pgcnt) && (pfn & (PAGES_PER_SECTION - 1)) == 0) { pgdat->first_deferred_pfn = pfn; return false; @@ -560,6 +569,9 @@ static inline void set_page_guard(struct zone *zone, struct page *page, return; page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + return; + __set_bit(PAGE_EXT_DEBUG_GUARD, &page_ext->flags); INIT_LIST_HEAD(&page->lru); @@ -577,6 +589,9 @@ static inline void clear_page_guard(struct zone *zone, struct page *page, return; page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + return; + __clear_bit(PAGE_EXT_DEBUG_GUARD, &page_ext->flags); set_page_private(page, 0); diff --git a/mm/page_ext.c b/mm/page_ext.c index 292ca7b8debd..4d1eac0d4fc5 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -106,7 +106,6 @@ struct page_ext *lookup_page_ext(struct page *page) struct page_ext *base; base = NODE_DATA(page_to_nid(page))->node_page_ext; -#ifdef CONFIG_DEBUG_VM /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -115,7 +114,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (unlikely(!base)) return NULL; -#endif offset = pfn - round_down(node_start_pfn(page_to_nid(page)), MAX_ORDER_NR_PAGES); return base + offset; @@ -180,7 +178,6 @@ struct page_ext *lookup_page_ext(struct page *page) { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); -#ifdef CONFIG_DEBUG_VM /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -189,7 +186,6 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (!section->page_ext) return NULL; -#endif return section->page_ext + pfn; } diff --git a/mm/page_owner.c b/mm/page_owner.c index 983c3a10fa07..dd6b9cebf981 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -53,6 +53,8 @@ void __reset_page_owner(struct page *page, unsigned int order) for (i = 0; i < (1 << order); i++) { page_ext = lookup_page_ext(page + i); + if (unlikely(!page_ext)) + continue; __clear_bit(PAGE_EXT_OWNER, &page_ext->flags); } } @@ -60,6 +62,7 @@ void __reset_page_owner(struct page *page, unsigned int order) void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask) { struct page_ext *page_ext = lookup_page_ext(page); + struct stack_trace trace = { .nr_entries = 0, .max_entries = ARRAY_SIZE(page_ext->trace_entries), @@ -67,6 +70,9 @@ void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask) .skip = 3, }; + if (unlikely(!page_ext)) + return; + save_stack_trace(&trace); page_ext->order = order; @@ -79,6 +85,12 @@ void __set_page_owner(struct page *page, unsigned int order, gfp_t gfp_mask) gfp_t __get_page_owner_gfp(struct page *page) { struct page_ext *page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + /* + * The caller just returns 0 if no valid gfp + * So return 0 here too. + */ + return 0; return page_ext->gfp_mask; } @@ -194,6 +206,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) } page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + continue; /* * Some pages could be missed by concurrent allocation or free, @@ -257,6 +271,8 @@ static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone) continue; page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + continue; /* Maybe overraping zone */ if (test_bit(PAGE_EXT_OWNER, &page_ext->flags)) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 29f2f8b853ae..c2cbd2620169 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -142,8 +142,12 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask); - if (pte && walk->hugetlb_entry) + + if (pte) err = walk->hugetlb_entry(pte, hmask, addr, next, walk); + else if (walk->pte_hole) + err = walk->pte_hole(addr, next, walk); + if (err) break; } while (addr = next, addr != end); diff --git a/mm/vmstat.c b/mm/vmstat.c index c54fd2924f25..c344e3609c53 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1091,6 +1091,8 @@ static void pagetypeinfo_showmixedcount_print(struct seq_file *m, continue; page_ext = lookup_page_ext(page); + if (unlikely(!page_ext)) + continue; if (!test_bit(PAGE_EXT_OWNER, &page_ext->flags)) continue; diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 5e4199d5a388..01abb6431fd9 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -376,6 +376,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } + if (event == NETDEV_DOWN && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) @@ -423,9 +426,6 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, struct net_device *tmp; LIST_HEAD(close_list); - if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); - /* Put all VLANs for this dev in the down state too. */ vlan_group_for_each_dev(grp, i, vlandev) { flgs = vlandev->flags; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 73dfd7729bc9..d33609c2f276 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4229,6 +4229,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) if (!xnet) return; + ipvs_reset(skb); skb_orphan(skb); skb->mark = 0; } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 64c7ce847584..39c2919fe0d3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3018,13 +3018,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, tcp_ecn_make_synack(req, th); th->source = htons(ireq->ir_num); th->dest = ireq->ir_rmt_port; - /* Setting of flags are superfluous here for callers (and ECE is - * not even correctly set) - */ - tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn, - TCPHDR_SYN | TCPHDR_ACK); - - th->seq = htonl(TCP_SKB_CB(skb)->seq); + skb->ip_summed = CHECKSUM_PARTIAL; + th->seq = htonl(tcp_rsk(req)->snt_isn); /* XXX data is queued and acked as is. No buffer/window check */ th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt); diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index acfb16fdcd55..9ecdd61c6463 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2077,7 +2077,7 @@ static int netlink_dump(struct sock *sk) struct sk_buff *skb = NULL; struct nlmsghdr *nlh; struct module *module; - int len, err = -ENOBUFS; + int err = -ENOBUFS; int alloc_min_size; int alloc_size; @@ -2125,9 +2125,11 @@ static int netlink_dump(struct sock *sk) skb_reserve(skb, skb_tailroom(skb) - alloc_size); netlink_skb_set_owner_r(skb, sk); - len = cb->dump(skb, cb); + if (nlk->dump_done_errno > 0) + nlk->dump_done_errno = cb->dump(skb, cb); - if (len > 0) { + if (nlk->dump_done_errno > 0 || + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { mutex_unlock(nlk->cb_mutex); if (sk_filter(sk, skb)) @@ -2137,13 +2139,15 @@ static int netlink_dump(struct sock *sk) return 0; } - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); - if (!nlh) + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, + sizeof(nlk->dump_done_errno), NLM_F_MULTI); + if (WARN_ON(!nlh)) goto errout_skb; nl_dump_check_consistent(cb, nlh); - memcpy(nlmsg_data(nlh), &len, sizeof(len)); + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, + sizeof(nlk->dump_done_errno)); if (sk_filter(sk, skb)) kfree_skb(skb); @@ -2208,6 +2212,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, cb->skb = skb; nlk->cb_running = true; + nlk->dump_done_errno = INT_MAX; mutex_unlock(nlk->cb_mutex); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index ea4600aea6b0..d987696c0eb4 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -38,6 +38,7 @@ struct netlink_sock { wait_queue_head_t wait; bool bound; bool cb_running; + int dump_done_errno; struct netlink_callback cb; struct mutex *cb_mutex; struct mutex cb_def_mutex; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index e33e9bd4ed5a..8a61ccc37e12 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -806,6 +806,8 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname, if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { struct sctp_ulpevent *ev = sctp_skb2event(skb); addr->v6.sin6_scope_id = ev->iif; + } else { + addr->v6.sin6_scope_id = 0; } } diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 7f0f689b8d2b..272edd7748a0 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4453,6 +4453,10 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) struct socket *sock; int err = 0; + /* Do not peel off from one netns to another one. */ + if (!net_eq(current->nsproxy->net_ns, sock_net(sk))) + return -EINVAL; + /* Do not peel off from one netns to another one. */ if (!net_eq(current->nsproxy->net_ns, sock_net(sk))) return -EINVAL; diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c index 9ce9d5003dcc..19014293f927 100644 --- a/security/integrity/ima/ima_appraise.c +++ b/security/integrity/ima/ima_appraise.c @@ -297,6 +297,9 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) if (iint->flags & IMA_DIGSIG) return; + if (iint->ima_file_status != INTEGRITY_PASS) + return; + rc = ima_collect_measurement(iint, file, NULL, NULL); if (rc < 0) return;

8 years

1
0
0 0

Re: [Linux-stable-mirror] Linux 3.18.84

by Greg KH

diff --git a/Makefile b/Makefile index 8a1e51e5b0cf..107b5778b864 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 3 PATCHLEVEL = 18 -SUBLEVEL = 83 +SUBLEVEL = 84 EXTRAVERSION = NAME = Diseased Newt diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c index f816211f062f..63164ff66bb4 100644 --- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -4010,7 +4010,8 @@ smi_from_recv_msg(ipmi_smi_t intf, struct ipmi_recv_msg *recv_msg, } static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, - struct list_head *timeouts, long timeout_period, + struct list_head *timeouts, + unsigned long timeout_period, int slot, unsigned long *flags, unsigned int *waiting_msgs) { @@ -4023,8 +4024,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, if (!ent->inuse) return; - ent->timeout -= timeout_period; - if (ent->timeout > 0) { + if (timeout_period < ent->timeout) { + ent->timeout -= timeout_period; (*waiting_msgs)++; return; } @@ -4091,7 +4092,8 @@ static void check_msg_timeout(ipmi_smi_t intf, struct seq_table *ent, } } -static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, long timeout_period) +static unsigned int ipmi_timeout_handler(ipmi_smi_t intf, + unsigned long timeout_period) { struct list_head timeouts; struct ipmi_recv_msg *msg, *msg2; diff --git a/drivers/net/ethernet/fealnx.c b/drivers/net/ethernet/fealnx.c index b1b9ebafb354..a3b2e23921bf 100644 --- a/drivers/net/ethernet/fealnx.c +++ b/drivers/net/ethernet/fealnx.c @@ -257,8 +257,8 @@ enum rx_desc_status_bits { RXFSD = 0x00000800, /* first descriptor */ RXLSD = 0x00000400, /* last descriptor */ ErrorSummary = 0x80, /* error summary */ - RUNT = 0x40, /* runt packet received */ - LONG = 0x20, /* long packet received */ + RUNTPKT = 0x40, /* runt packet received */ + LONGPKT = 0x20, /* long packet received */ FAE = 0x10, /* frame align error */ CRC = 0x08, /* crc error */ RXER = 0x04, /* receive error */ @@ -1633,7 +1633,7 @@ static int netdev_rx(struct net_device *dev) dev->name, rx_status); dev->stats.rx_errors++; /* end of a packet. */ - if (rx_status & (LONG | RUNT)) + if (rx_status & (LONGPKT | RUNTPKT)) dev->stats.rx_length_errors++; if (rx_status & RXER) dev->stats.rx_frame_errors++; diff --git a/fs/coda/upcall.c b/fs/coda/upcall.c index 5bb6e27298a4..21dbff85829a 100644 --- a/fs/coda/upcall.c +++ b/fs/coda/upcall.c @@ -446,8 +446,7 @@ int venus_fsync(struct super_block *sb, struct CodaFid *fid) UPARG(CODA_FSYNC); inp->coda_fsync.VFid = *fid; - error = coda_upcall(coda_vcp(sb), sizeof(union inputArgs), - &outsize, inp); + error = coda_upcall(coda_vcp(sb), insize, &outsize, inp); CODA_FREE(inp, insize); return error; diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 2adcb9876e91..6c6fa10a82ca 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1151,6 +1151,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) dquot_initialize(inode); size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE; if (size_change) { + /* + * Here we should wait dio to finish before inode lock + * to avoid a deadlock between ocfs2_setattr() and + * ocfs2_dio_end_io_write() + */ + inode_dio_wait(inode); + status = ocfs2_rw_lock(inode, 1); if (status < 0) { mlog_errno(status); @@ -1170,8 +1177,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if (status) goto bail_unlock; - inode_dio_wait(inode); - if (i_size_read(inode) >= attr->ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2ff757f2d3a3..e5ba0236047e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3117,6 +3117,13 @@ static inline void nf_reset_trace(struct sk_buff *skb) #endif } +static inline void ipvs_reset(struct sk_buff *skb) +{ +#if IS_ENABLED(CONFIG_IP_VS) + skb->ipvs_property = 0; +#endif +} + /* Note: This doesn't put any conntrack and bridge info in dst. */ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src, bool copy) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 4ccf021b85d5..6b9c7eaca478 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -376,6 +376,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } + if (event == NETDEV_DOWN && + (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) @@ -420,9 +423,6 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, break; case NETDEV_DOWN: - if (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER) - vlan_vid_del(dev, htons(ETH_P_8021Q), 0); - /* Put all VLANs for this dev in the down state too. */ vlan_group_for_each_dev(grp, i, vlandev) { flgs = vlandev->flags; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b2e2a53c2284..9c830242a1f9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4069,6 +4069,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet) if (!xnet) return; + ipvs_reset(skb); skb_orphan(skb); skb->mark = 0; } diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index ff186dac3e07..4d925dbe4bb7 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -487,6 +487,9 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, newsk->sk_backlog_rcv = dccp_v4_do_rcv; newnp->pktoptions = NULL; newnp->opt = NULL; + newnp->ipv6_mc_list = NULL; + newnp->ipv6_ac_list = NULL; + newnp->ipv6_fl_list = NULL; newnp->mcast_oif = inet6_iif(skb); newnp->mcast_hops = ipv6_hdr(skb)->hop_limit; @@ -562,6 +565,10 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, /* Clone RX bits */ newnp->rxopt.all = np->rxopt.all; + newnp->ipv6_mc_list = NULL; + newnp->ipv6_ac_list = NULL; + newnp->ipv6_fl_list = NULL; + /* Clone pktoptions received with SYN */ newnp->pktoptions = NULL; if (ireq->pktopts != NULL) { diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ff47e881e205..55da3338bfb2 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2911,13 +2911,8 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst, tcp_ecn_make_synack(req, th, sk); th->source = htons(ireq->ir_num); th->dest = ireq->ir_rmt_port; - /* Setting of flags are superfluous here for callers (and ECE is - * not even correctly set) - */ - tcp_init_nondata_skb(skb, tcp_rsk(req)->snt_isn, - TCPHDR_SYN | TCPHDR_ACK); - - th->seq = htonl(TCP_SKB_CB(skb)->seq); + skb->ip_summed = CHECKSUM_PARTIAL; + th->seq = htonl(tcp_rsk(req)->snt_isn); /* XXX data is queued and acked as is. No buffer/window check */ th->ack_seq = htonl(tcp_rsk(req)->rcv_nxt); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 19fe7b789a72..b8e05f16659c 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1113,6 +1113,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, newtp->af_specific = &tcp_sock_ipv6_mapped_specific; #endif + newnp->ipv6_mc_list = NULL; newnp->ipv6_ac_list = NULL; newnp->ipv6_fl_list = NULL; newnp->pktoptions = NULL; @@ -1184,6 +1185,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, First: no IPv4 options. */ newinet->inet_opt = NULL; + newnp->ipv6_mc_list = NULL; newnp->ipv6_ac_list = NULL; newnp->ipv6_fl_list = NULL; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 4792e76b7d4a..d22e8d210fce 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1927,7 +1927,7 @@ static int netlink_dump(struct sock *sk) struct sk_buff *skb = NULL; struct nlmsghdr *nlh; struct module *module; - int len, err = -ENOBUFS; + int err = -ENOBUFS; int alloc_size; mutex_lock(nlk->cb_mutex); @@ -1965,9 +1965,11 @@ static int netlink_dump(struct sock *sk) goto errout_skb; netlink_skb_set_owner_r(skb, sk); - len = cb->dump(skb, cb); + if (nlk->dump_done_errno > 0) + nlk->dump_done_errno = cb->dump(skb, cb); - if (len > 0) { + if (nlk->dump_done_errno > 0 || + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { mutex_unlock(nlk->cb_mutex); if (sk_filter(sk, skb)) @@ -1977,13 +1979,15 @@ static int netlink_dump(struct sock *sk) return 0; } - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI); - if (!nlh) + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, + sizeof(nlk->dump_done_errno), NLM_F_MULTI); + if (WARN_ON(!nlh)) goto errout_skb; nl_dump_check_consistent(cb, nlh); - memcpy(nlmsg_data(nlh), &len, sizeof(len)); + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, + sizeof(nlk->dump_done_errno)); if (sk_filter(sk, skb)) kfree_skb(skb); @@ -2048,6 +2052,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb, cb->skb = skb; nlk->cb_running = true; + nlk->dump_done_errno = INT_MAX; mutex_unlock(nlk->cb_mutex); diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h index c6bd3dd35cb3..81a2d371bcf6 100644 --- a/net/netlink/af_netlink.h +++ b/net/netlink/af_netlink.h @@ -35,6 +35,7 @@ struct netlink_sock { size_t max_recvmsg_len; wait_queue_head_t wait; bool cb_running; + int dump_done_errno; struct netlink_callback cb; struct mutex *cb_mutex; struct mutex cb_def_mutex; diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 70966fee8835..5b5ddc0f2dbc 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -805,6 +805,8 @@ static void sctp_inet6_skb_msgname(struct sk_buff *skb, char *msgname, if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) { struct sctp_ulpevent *ev = sctp_skb2event(skb); addr->v6.sin6_scope_id = ev->iif; + } else { + addr->v6.sin6_scope_id = 0; } } diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 07324ca1df1d..3f89cd063246 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4466,6 +4466,10 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) struct socket *sock; int err = 0; + /* Do not peel off from one netns to another one. */ + if (!net_eq(current->nsproxy->net_ns, sock_net(sk))) + return -EINVAL; + if (!asoc) return -EINVAL; diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c index dd88b6e1e8a1..ee7618115756 100644 --- a/security/integrity/ima/ima_appraise.c +++ b/security/integrity/ima/ima_appraise.c @@ -297,6 +297,9 @@ void ima_update_xattr(struct integrity_iint_cache *iint, struct file *file) if (iint->flags & IMA_DIGSIG) return; + if (iint->ima_file_status != INTEGRITY_PASS) + return; + rc = ima_collect_measurement(iint, file, NULL, NULL); if (rc < 0) return;

8 years

1
0
0 0

[Linux-stable-mirror] [tip:x86/urgent] x86/decoder: Add new TEST instruction pattern

by tip-bot for Masami Hiramatsu

Commit-ID: 12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Gitweb: https://git.kernel.org/tip/12a78d43de767eaf8fb272facb7a7b6f2dc6a9df Author: Masami Hiramatsu <mhiramat(a)kernel.org> AuthorDate: Fri, 24 Nov 2017 13:56:30 +0900 Committer: Ingo Molnar <mingo(a)kernel.org> CommitDate: Fri, 24 Nov 2017 08:36:12 +0100 x86/decoder: Add new TEST instruction pattern The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: <stable(a)vger.kernel.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index 12e3771..c4d5591 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb

8 years

1
0
0 0

[Linux-stable-mirror] [tip:x86/urgent] x86/decoder: Add new TEST instruction pattern

by tip-bot for Masami Hiramatsu

Commit-ID: 843c885f41b3c74212d8a85da95993075dd41415 Gitweb: https://git.kernel.org/tip/843c885f41b3c74212d8a85da95993075dd41415 Author: Masami Hiramatsu <mhiramat(a)kernel.org> AuthorDate: Fri, 24 Nov 2017 13:56:30 +0900 Committer: Ingo Molnar <mingo(a)kernel.org> CommitDate: Fri, 24 Nov 2017 08:16:09 +0100 x86/decoder: Add new TEST instruction pattern The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Reported-by: kbuild test robot <fengguang.wu(a)intel.com> Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: <stable(a)vger.kernel.org> Cc: H. Peter Anvin <hpa(a)zytor.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-kernel(a)vger.kernel.org Signed-off-by: Ingo Molnar <mingo(a)kernel.org> --- arch/x86/lib/x86-opcode-map.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index 12e3771..c4d5591 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb

8 years

1
0
0 0

[Linux-stable-mirror] HELLO!

by Mr. G. CHA

I have important transaction for you as next of kin to claim US$8.37m email me at changgordon61(a)yahoo.com.hk so I can send you more details

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH v2 0/4] fix device-dax pud crash and fixup {pte, pmd, pud}_write

by Dan Williams

Changes since v1 [1]: * fix arm64 compilation, add __HAVE_ARCH_PUD_WRITE * fix sparc64 compilation, add __HAVE_ARCH_PUD_WRITE * fix s390 compilation, add a pud_write() helper --- Andrew, Here is a third version to the pud_write() fix [2], and some follow-on patches to use the '_access_permitted' helpers in fault and get_user_pages() paths where we are checking if the thread has access to write. I explicitly omit conversions for places where the kernel is checking the _PAGE_RW flag for kernel purposes, not for userspace access. Beyond fixing the crash, this series also fixes get_user_pages() and fault paths to honor protection keys in the same manner as get_user_pages_fast(). Only the crash fix is tagged for -stable as the protection key check is done just for consistency reasons since userspace can change protection keys at will. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013249.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-November/013237.html --- Dan Williams (4): mm: fix device-dax pud write-faults triggered by get_user_pages() mm: replace pud_write with pud_access_permitted in fault + gup paths mm: replace pmd_write with pmd_access_permitted in fault + gup paths mm: replace pte_write with pte_access_permitted in fault + gup paths arch/arm64/include/asm/pgtable.h | 1 + arch/s390/include/asm/pgtable.h | 6 ++++++ arch/sparc/include/asm/pgtable_64.h | 1 + arch/sparc/mm/gup.c | 4 ++-- arch/x86/include/asm/pgtable.h | 6 ++++++ fs/dax.c | 3 ++- include/asm-generic/pgtable.h | 9 +++++++++ include/linux/hugetlb.h | 8 -------- mm/gup.c | 2 +- mm/hmm.c | 8 ++++---- mm/huge_memory.c | 6 +++--- mm/memory.c | 8 ++++---- 12 files changed, 39 insertions(+), 23 deletions(-)

8 years

2
2
0 0

[Linux-stable-mirror] [PATCH] HID: asus: Add product-id for the T100TAF keyboard dock

by Hans de Goede

The T100TAF keyboard dock has the same special keys and custom protocol multitouch touchpad as the T100TA, but uses a different product id, add support for this. Cc: stable(a)vger.kernel.org BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197849 Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> --- drivers/hid/hid-asus.c | 5 ++++- drivers/hid/hid-core.c | 3 ++- drivers/hid/hid-ids.h | 3 ++- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/hid/hid-asus.c b/drivers/hid/hid-asus.c index 1bb7b63b3150..4cdac9857037 100644 --- a/drivers/hid/hid-asus.c +++ b/drivers/hid/hid-asus.c @@ -751,7 +751,10 @@ static const struct hid_device_id asus_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD3), QUIRK_G752_KEYBOARD }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, - USB_DEVICE_ID_ASUSTEK_T100_KEYBOARD), + USB_DEVICE_ID_ASUSTEK_T100TA_KEYBOARD), + QUIRK_T100_KEYBOARD | QUIRK_NO_CONSUMER_USAGES }, + { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, + USB_DEVICE_ID_ASUSTEK_T100TAF_KEYBOARD), QUIRK_T100_KEYBOARD | QUIRK_NO_CONSUMER_USAGES }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_ASUS_AK1D) }, { HID_USB_DEVICE(USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_ASUS_MD_5110) }, diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index f3fcb836a1f9..d6780e1d2526 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1983,7 +1983,8 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD1) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD2) }, { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_ROG_KEYBOARD3) }, - { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100_KEYBOARD) }, + { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100TA_KEYBOARD) }, + { HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100TAF_KEYBOARD) }, { HID_USB_DEVICE(USB_VENDOR_ID_JESS, USB_DEVICE_ID_ASUS_MD_5112) }, { HID_USB_DEVICE(USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_ASUS_MD_5110) }, { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100CHI_KEYBOARD) }, diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 5da3d6256d25..dc1db8558850 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -178,7 +178,8 @@ #define USB_VENDOR_ID_ASUSTEK 0x0b05 #define USB_DEVICE_ID_ASUSTEK_LCM 0x1726 #define USB_DEVICE_ID_ASUSTEK_LCM2 0x175b -#define USB_DEVICE_ID_ASUSTEK_T100_KEYBOARD 0x17e0 +#define USB_DEVICE_ID_ASUSTEK_T100TA_KEYBOARD 0x17e0 +#define USB_DEVICE_ID_ASUSTEK_T100TAF_KEYBOARD 0x1807 #define USB_DEVICE_ID_ASUSTEK_T100CHI_KEYBOARD 0x8502 #define USB_DEVICE_ID_ASUSTEK_T304_KEYBOARD 0x184a #define USB_DEVICE_ID_ASUSTEK_I2C_KEYBOARD 0x8585 -- 2.14.3

8 years

2
3
0 0

[Linux-stable-mirror] [PATCH] USB: usbfs: Filter flags passed in from user space

by Oliver Neukum

USBDEVFS_URB_ISO_ASAP must be accepted only for ISO endpoints. Improve sanity checking. Signed-off-by: Oliver Neukum <oneukum(a)suse.com> --- drivers/usb/core/devio.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 705c573d0257..701ddada389a 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -1442,14 +1442,18 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb int number_of_packets = 0; unsigned int stream_id = 0; void *buf; - - if (uurb->flags & ~(USBDEVFS_URB_ISO_ASAP | - USBDEVFS_URB_SHORT_NOT_OK | + unsigned long mask = USBDEVFS_URB_SHORT_NOT_OK | USBDEVFS_URB_BULK_CONTINUATION | USBDEVFS_URB_NO_FSBR | - USBDEVFS_URB_ZERO_PACKET | - USBDEVFS_URB_NO_INTERRUPT)) - return -EINVAL; + USBDEVFS_URB_ZERO_PACKET | + USBDEVFS_URB_NO_INTERRUPT; + /* USBDEVFS_URB_ISO_ASAP is a special case */ + if (uurb->type == USBDEVFS_URB_TYPE_ISO) + mask |= USBDEVFS_URB_ISO_ASAP; + + if (uurb->flags & ~mask) + return -EINVAL; + if ((unsigned int)uurb->buffer_length >= USBFS_XFER_MAX) return -EINVAL; if (uurb->buffer_length > 0 && !uurb->buffer) -- 2.13.6

8 years

2
3
0 0

[Linux-stable-mirror] [PATCH 4.14 00/18] 4.14.2-stable review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.14.2 release. There are 18 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri Nov 24 10:11:38 UTC 2017. Anything received after that time might be too late. The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.2-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.14.2-rc1 Jan Harkes <jaharkes(a)cs.cmu.edu> coda: fix 'kernel memory exposure attempt' in fsync Jaewon Kim <jaewon31.kim(a)samsung.com> mm/page_ext.c: check if page_ext is not prepared Pavel Tatashin <pasha.tatashin(a)oracle.com> mm/page_alloc.c: broken deferred calculation Corey Minyard <cminyard(a)mvista.com> ipmi: fix unsigned long underflow alex chen <alex.chen(a)huawei.com> ocfs2: should wait dio before inode lock in ocfs2_setattr() Changwei Ge <ge.changwei(a)h3c.com> ocfs2: fix cluster hang after a node dies Jann Horn <jannh(a)google.com> mm/pagewalk.c: report holes in hugetlb ranges Neeraj Upadhyay <neeraju(a)codeaurora.org> rcu: Fix up pending cbs check in rcu_prepare_for_idle Alexander Steffen <Alexander.Steffen(a)infineon.com> tpm-dev-common: Reject too short writes Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com> serial: 8250_fintek: Fix finding base_port with activated SuperIO Lukas Wunner <lukas(a)wunner.de> serial: omap: Fix EFR write on RTS deassertion Roberto Sassu <roberto.sassu(a)huawei.com> ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Eric W. Biederman <ebiederm(a)xmission.com> net/sctp: Always set scope_id in sctp_inet6_skb_msgname Huacai Chen <chenhc(a)lemote.com> fealnx: Fix building error on MIPS Bjørn Mork <bjorn(a)mork.no> net: cdc_ncm: GetNtbFormat endian fix Xin Long <lucien.xin(a)gmail.com> vxlan: fix the issue that neigh proxy blocks all icmpv6 packets Jason A. Donenfeld <Jason(a)zx2c4.com> af_netlink: ensure that NLMSG_DONE never fails in dumps Michael Lyle <mlyle(a)lyle.org> bio: ensure __bio_clone_fast copies bi_partno ------------- Diffstat: Makefile | 4 ++-- block/bio.c | 1 + drivers/char/ipmi/ipmi_msghandler.c | 10 ++++++---- drivers/char/tpm/tpm-dev-common.c | 6 ++++++ drivers/net/ethernet/fealnx.c | 6 +++--- drivers/net/usb/cdc_ncm.c | 4 ++-- drivers/net/vxlan.c | 31 +++++++++++++------------------ drivers/tty/serial/8250/8250_fintek.c | 3 +++ drivers/tty/serial/omap-serial.c | 2 +- fs/coda/upcall.c | 3 +-- fs/ocfs2/dlm/dlmrecovery.c | 1 + fs/ocfs2/file.c | 9 +++++++-- include/linux/mmzone.h | 3 ++- kernel/rcu/tree_plugin.h | 2 +- mm/page_alloc.c | 27 ++++++++++++++++++--------- mm/page_ext.c | 4 ---- mm/pagewalk.c | 6 +++++- net/netlink/af_netlink.c | 17 +++++++++++------ net/netlink/af_netlink.h | 1 + net/sctp/ipv6.c | 5 +++-- security/integrity/ima/ima_appraise.c | 3 +++ 21 files changed, 90 insertions(+), 58 deletions(-)

8 years

3
21
0 0

[Linux-stable-mirror] [PATCH 4.13 00/35] 4.13.16-stable review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.13.16 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri Nov 24 10:11:25 UTC 2017. Anything received after that time might be too late. The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.16-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.13.16-rc1 Jan Harkes <jaharkes(a)cs.cmu.edu> coda: fix 'kernel memory exposure attempt' in fsync Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com> x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask Jaewon Kim <jaewon31.kim(a)samsung.com> mm/page_ext.c: check if page_ext is not prepared Pavel Tatashin <pasha.tatashin(a)oracle.com> mm/page_alloc.c: broken deferred calculation Corey Minyard <cminyard(a)mvista.com> ipmi: fix unsigned long underflow alex chen <alex.chen(a)huawei.com> ocfs2: should wait dio before inode lock in ocfs2_setattr() Changwei Ge <ge.changwei(a)h3c.com> ocfs2: fix cluster hang after a node dies Jann Horn <jannh(a)google.com> mm/pagewalk.c: report holes in hugetlb ranges Neeraj Upadhyay <neeraju(a)codeaurora.org> rcu: Fix up pending cbs check in rcu_prepare_for_idle Alexander Steffen <Alexander.Steffen(a)infineon.com> tpm-dev-common: Reject too short writes Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com> serial: 8250_fintek: Fix finding base_port with activated SuperIO Lukas Wunner <lukas(a)wunner.de> serial: omap: Fix EFR write on RTS deassertion Roberto Sassu <roberto.sassu(a)huawei.com> ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Eric W. Biederman <ebiederm(a)xmission.com> net/sctp: Always set scope_id in sctp_inet6_skb_msgname Huacai Chen <chenhc(a)lemote.com> fealnx: Fix building error on MIPS Xin Long <lucien.xin(a)gmail.com> sctp: do not peel off an assoc from one netns to another one Bjørn Mork <bjorn(a)mork.no> net: cdc_ncm: GetNtbFormat endian fix Xin Long <lucien.xin(a)gmail.com> vxlan: fix the issue that neigh proxy blocks all icmpv6 packets Jason A. Donenfeld <Jason(a)zx2c4.com> af_netlink: ensure that NLMSG_DONE never fails in dumps Inbar Karmy <inbark(a)mellanox.com> net/mlx5e: Set page to null in case dma mapping fails Huy Nguyen <huyn(a)mellanox.com> net/mlx5: Cancel health poll before sending panic teardown command Cong Wang <xiyou.wangcong(a)gmail.com> vlan: fix a use-after-free in vlan_device_event() Yuchung Cheng <ycheng(a)google.com> tcp: fix tcp_fastretrans_alert warning Eric Dumazet <edumazet(a)google.com> tcp: gso: avoid refcount_t warning from tcp_gso_segment() Andrey Konovalov <andreyknvl(a)google.com> net: usb: asix: fill null-ptr-deref in asix_suspend Kristian Evensen <kristian.evensen(a)gmail.com> qmi_wwan: Add missing skb_reset_mac_header-call Bjørn Mork <bjorn(a)mork.no> net: qmi_wwan: fix divide by 0 on bad descriptors Bjørn Mork <bjorn(a)mork.no> net: cdc_ether: fix divide by 0 on bad descriptors Hangbin Liu <liuhangbin(a)gmail.com> bonding: discard lowest hash bit for 802.3ad layer3+4 Guillaume Nault <g.nault(a)alphalink.fr> l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6 Ye Yin <hustcat(a)gmail.com> netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Florian Fainelli <f.fainelli(a)gmail.com> net: systemport: Correct IPG length settings Eric Dumazet <edumazet(a)google.com> tcp: do not mangle skb->cb[] in tcp_make_synack() Jeff Barnhill <0xeffeff(a)gmail.com> net: vrf: correct FRA_L3MDEV encode type Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> tcp_nv: fix division by zero in tcpnv_acked() ------------- Diffstat: Makefile | 4 ++-- arch/x86/kernel/cpu/intel_cacheinfo.c | 32 ++++++++++++++----------- drivers/char/ipmi/ipmi_msghandler.c | 10 ++++---- drivers/char/tpm/tpm-dev-common.c | 6 +++++ drivers/net/bonding/bond_main.c | 2 +- drivers/net/ethernet/broadcom/bcmsysport.c | 10 ++++---- drivers/net/ethernet/fealnx.c | 6 ++--- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++++------ drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 ++++++ drivers/net/usb/asix_devices.c | 4 ++-- drivers/net/usb/cdc_ether.c | 2 +- drivers/net/usb/cdc_ncm.c | 4 ++-- drivers/net/usb/qmi_wwan.c | 3 ++- drivers/net/vrf.c | 2 +- drivers/net/vxlan.c | 31 ++++++++++-------------- drivers/tty/serial/8250/8250_fintek.c | 3 +++ drivers/tty/serial/omap-serial.c | 2 +- fs/coda/upcall.c | 3 +-- fs/ocfs2/dlm/dlmrecovery.c | 1 + fs/ocfs2/file.c | 9 +++++-- include/linux/mmzone.h | 3 ++- include/linux/skbuff.h | 7 ++++++ kernel/rcu/tree_plugin.h | 2 +- mm/page_alloc.c | 27 ++++++++++++++------- mm/page_ext.c | 4 ---- mm/pagewalk.c | 6 ++++- net/8021q/vlan.c | 6 ++--- net/core/skbuff.c | 1 + net/ipv4/tcp_input.c | 3 +-- net/ipv4/tcp_nv.c | 2 +- net/ipv4/tcp_offload.c | 12 ++++++++-- net/ipv4/tcp_output.c | 9 ++----- net/l2tp/l2tp_ip.c | 24 +++++++------------ net/l2tp/l2tp_ip6.c | 24 +++++++------------ net/netlink/af_netlink.c | 17 ++++++++----- net/netlink/af_netlink.h | 1 + net/sctp/ipv6.c | 5 ++-- net/sctp/socket.c | 4 ++++ security/integrity/ima/ima_appraise.c | 3 +++ 39 files changed, 179 insertions(+), 134 deletions(-)

8 years

3
34
0 0

[Linux-stable-mirror] [PATCH] MIPS: Fix odd fp register warnings with MIPS64r2

by James Hogan

From: James Hogan <jhogan(a)kernel.org> Building 32-bit MIPS64r2 kernels produces warnings like the following on certain toolchains (such as GNU assembler 2.24.90, but not GNU assembler 2.28.51) since commit 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels"), due to the exposure of fpu_save_16odd from fpu_save_double and fpu_restore_16odd from fpu_restore_double: arch/mips/kernel/r4k_fpu.S:47: Warning: float register should be even, was 1 ... arch/mips/kernel/r4k_fpu.S:59: Warning: float register should be even, was 1 ... This appears to be because .set mips64r2 does not change the FPU ABI to 64-bit when -march=mips64r2 (or e.g. -march=xlp) is provided on the command line on that toolchain, from the default FPU ABI of 32-bit due to the -mabi=32. This makes access to the odd FPU registers invalid. Fix by explicitly changing the FPU ABI with .set fp=64 directives in fpu_save_16odd and fpu_restore_16odd, and moving the undefine of fp up in asmmacro.h so fp doesn't turn into $30. Fixes: 22b8ba765a72 ("MIPS: Fix MIPS64 FP save/restore on 32-bit kernels") Signed-off-by: James Hogan <jhogan(a)kernel.org> Cc: Ralf Baechle <ralf(a)linux-mips.org> Cc: Paul Burton <paul.burton(a)imgtec.com> Cc: linux-mips(a)linux-mips.org Cc: <stable(a)vger.kernel.org> # 4.0+: 22b8ba765a72: MIPS: Fix MIPS64 FP save/restore on 32-bit kernels Cc: <stable(a)vger.kernel.org> # 4.0+ --- arch/mips/include/asm/asmmacro.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h index b815d7b3bd27..feb069cbf44e 100644 --- a/arch/mips/include/asm/asmmacro.h +++ b/arch/mips/include/asm/asmmacro.h @@ -19,6 +19,9 @@ #include <asm/asmmacro-64.h> #endif +/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */ +#undef fp + /* * Helper macros for generating raw instruction encodings. */ @@ -105,6 +108,7 @@ .macro fpu_save_16odd thread .set push .set mips64r2 + .set fp=64 SET_HARDFLOAT sdc1 $f1, THREAD_FPR1(\thread) sdc1 $f3, THREAD_FPR3(\thread) @@ -163,6 +167,7 @@ .macro fpu_restore_16odd thread .set push .set mips64r2 + .set fp=64 SET_HARDFLOAT ldc1 $f1, THREAD_FPR1(\thread) ldc1 $f3, THREAD_FPR3(\thread) @@ -234,9 +239,6 @@ .endm #ifdef TOOLCHAIN_SUPPORTS_MSA -/* preprocessor replaces the fp in ".set fp=64" with $30 otherwise */ -#undef fp - .macro _cfcmsa rd, cs .set push .set mips32r2 -- 2.14.1

8 years

2
1
0 0

Re: [Linux-stable-mirror] [PATCH 3.16 100/133] bcache: correct cache_dirty_target in __update_writeback_rate()

by Joe Perches

On Thu, 2017-11-23 at 13:08 +0000, Ben Hutchings wrote: > On Tue, 2017-11-21 at 19:41 -0800, Joe Perches wrote: > > On Wed, 2017-11-22 at 01:58 +0000, Ben Hutchings wrote: > > > 3.16.51-rc1 review patch. If anyone has any objections, please let me know. > > [] > > > --- a/drivers/md/bcache/writeback.h > > > +++ b/drivers/md/bcache/writeback.h > > > @@ -14,6 +14,25 @@ static inline uint64_t bcache_dev_sector > > > return ret; > > > } > > > > > > +static inline uint64_t bcache_flash_devs_sectors_dirty(struct cache_set *c) > > > +{ > > > + uint64_t i, ret = 0; > > > > There's no reason i should be uint64_t > > as nr_uuids is unsigned int. > > But this still works, right? That's a minor issue to deal with > upstream, not in the backport. correct

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH 4.9 00/25] 4.9.65-stable review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.9.65 release. There are 25 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri Nov 24 10:11:07 UTC 2017. Anything received after that time might be too late. The whole patch series can be found in one patch at: kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.65-rc1.gz or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.9.65-rc1 Jan Harkes <jaharkes(a)cs.cmu.edu> coda: fix 'kernel memory exposure attempt' in fsync Pavel Tatashin <pasha.tatashin(a)oracle.com> mm/page_alloc.c: broken deferred calculation Corey Minyard <cminyard(a)mvista.com> ipmi: fix unsigned long underflow alex chen <alex.chen(a)huawei.com> ocfs2: should wait dio before inode lock in ocfs2_setattr() Changwei Ge <ge.changwei(a)h3c.com> ocfs2: fix cluster hang after a node dies Adam Wallis <awallis(a)codeaurora.org> dmaengine: dmatest: warn user when dma test times out Ji-Ze Hong (Peter Hong) <hpeter(a)gmail.com> serial: 8250_fintek: Fix finding base_port with activated SuperIO Lukas Wunner <lukas(a)wunner.de> serial: omap: Fix EFR write on RTS deassertion Roberto Sassu <roberto.sassu(a)huawei.com> ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Eric Biggers <ebiggers(a)google.com> crypto: dh - Fix double free of ctx->p Tudor-Dan Ambarus <tudor.ambarus(a)microchip.com> crypto: dh - fix memleak in setkey Eric W. Biederman <ebiederm(a)xmission.com> net/sctp: Always set scope_id in sctp_inet6_skb_msgname Huacai Chen <chenhc(a)lemote.com> fealnx: Fix building error on MIPS Xin Long <lucien.xin(a)gmail.com> sctp: do not peel off an assoc from one netns to another one Jason A. Donenfeld <Jason(a)zx2c4.com> af_netlink: ensure that NLMSG_DONE never fails in dumps Cong Wang <xiyou.wangcong(a)gmail.com> vlan: fix a use-after-free in vlan_device_event() Andrey Konovalov <andreyknvl(a)google.com> net: usb: asix: fill null-ptr-deref in asix_suspend Kristian Evensen <kristian.evensen(a)gmail.com> qmi_wwan: Add missing skb_reset_mac_header-call Bjørn Mork <bjorn(a)mork.no> net: qmi_wwan: fix divide by 0 on bad descriptors Bjørn Mork <bjorn(a)mork.no> net: cdc_ether: fix divide by 0 on bad descriptors Hangbin Liu <liuhangbin(a)gmail.com> bonding: discard lowest hash bit for 802.3ad layer3+4 Ye Yin <hustcat(a)gmail.com> netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Eric Dumazet <edumazet(a)google.com> tcp: do not mangle skb->cb[] in tcp_make_synack() Jeff Barnhill <0xeffeff(a)gmail.com> net: vrf: correct FRA_L3MDEV encode type Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> tcp_nv: fix division by zero in tcpnv_acked() ------------- Diffstat: Makefile | 4 ++-- crypto/dh.c | 34 +++++++++++++++------------------- drivers/char/ipmi/ipmi_msghandler.c | 10 ++++++---- drivers/dma/dmatest.c | 1 + drivers/net/bonding/bond_main.c | 2 +- drivers/net/ethernet/fealnx.c | 6 +++--- drivers/net/usb/asix_devices.c | 4 ++-- drivers/net/usb/cdc_ether.c | 2 +- drivers/net/usb/qmi_wwan.c | 3 ++- drivers/net/vrf.c | 2 +- drivers/tty/serial/8250/8250_fintek.c | 3 +++ drivers/tty/serial/omap-serial.c | 2 +- fs/coda/upcall.c | 3 +-- fs/ocfs2/dlm/dlmrecovery.c | 1 + fs/ocfs2/file.c | 9 +++++++-- include/linux/mmzone.h | 3 ++- include/linux/skbuff.h | 7 +++++++ mm/page_alloc.c | 27 ++++++++++++++++++--------- net/8021q/vlan.c | 6 +++--- net/core/skbuff.c | 1 + net/ipv4/tcp_nv.c | 2 +- net/ipv4/tcp_output.c | 9 ++------- net/netlink/af_netlink.c | 17 +++++++++++------ net/netlink/af_netlink.h | 1 + net/sctp/ipv6.c | 5 +++-- net/sctp/socket.c | 4 ++++ security/integrity/ima/ima_appraise.c | 3 +++ 27 files changed, 103 insertions(+), 68 deletions(-)

8 years

3
24
0 0

Re: [Linux-stable-mirror] [PATCH 3.16 105/133] mm/vmstat.c: fix wrong comment

by Michal Hocko

On Thu 23-11-17 13:05:10, Ben Hutchings wrote: > On Wed, 2017-11-22 at 08:41 +0100, Vlastimil Babka wrote: > > On 11/22/2017 02:58 AM, Ben Hutchings wrote: > > > 3.16.51-rc1 review patch. If anyone has any objections, please let me know. > > > > I don't really care much in the end, but is "fix wrong comment" really a > > stable patch material these days? :) > > It had a Fixes: field and it clearly won't do any harm. Fixes tag is sometimes abused this way. It will not do any harm but the fewer patch to backport the better IMHO > Still, none of the other stable branches has it so I'll drop it. Makes sense. -- Michal Hocko SUSE Labs

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH v2] drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU

by Marek Szyprowski

When no IOMMU is available, all GEM buffers allocated by Exynos DRM driver are contiguous, because of the underlying dma_alloc_attrs() function provides only such buffers. In such case it makes no sense to keep BO_NONCONTIG flag for the allocated GEM buffers. This allows to avoid failures for buffer contiguity checks in the subsequent operations on GEM objects. Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> CC: stable(a)vger.kernel.org # v4.4+ --- This issue is there since commit 0519f9a12d011 ("drm/exynos: add iommu support for exynos drm framework"), but this patch applies cleanly only to v4.4+ kernel releases due changes in the surrounding code. Changelog: v2: - added warning message when buffer flags are updadated (requested by Inki) v1: https://patchwork.kernel.org/patch/10034919/ - initial version --- drivers/gpu/drm/exynos/exynos_drm_gem.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index 077de014d610..4400efe3974a 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -247,6 +247,15 @@ struct exynos_drm_gem *exynos_drm_gem_create(struct drm_device *dev, if (IS_ERR(exynos_gem)) return exynos_gem; + if (!is_drm_iommu_supported(dev) && (flags & EXYNOS_BO_NONCONTIG)) { + /* + * when no IOMMU is available, all allocated buffers are + * contiguous anyway, so drop EXYNOS_BO_NONCONTIG flag + */ + flags &= ~EXYNOS_BO_NONCONTIG; + DRM_WARN("Non-contiguous allocation is not supported without IOMMU, falling back to contiguous buffer\n"); + } + /* set memory type and cache attribute from user side. */ exynos_gem->flags = flags; -- 2.14.2

8 years

3
3
0 0

[Linux-stable-mirror] [PATCH] drm/vblank: Pass crtc_id to page_flip_ioctl.

by Maarten Lankhorst

We added crtc_id to the atomic ioctl, but forgot to add it for vblank and page flip events. Commit bd386e518056 ("drm: Reorganize drm_pending_event to support future event types [v2]") added it to the vblank event, but page flip event was still missing. Correct this and add a test for making sure we always set crtc_id correctly. Fixes: bd386e518056 ("drm: Reorganize drm_pending_event to support future event types [v2]") Fixes: 5db06a8a98f5 ("drm: Pass CRTC ID in userspace vblank events") Cc: Daniel Stone <daniels(a)collabora.com> Cc: Daniel Vetter <daniel.vetter(a)intel.com> Cc: Gustavo Padovan <gustavo(a)padovan.org> Cc: Sean Paul <seanpaul(a)chromium.org> Cc: dri-devel(a)lists.freedesktop.org Cc: <stable(a)vger.kernel.org> # v4.12+ Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> --- drivers/gpu/drm/drm_plane.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c index 19404e34cd59..37a93cdffb4a 100644 --- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c @@ -1030,6 +1030,7 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev, e->event.base.type = DRM_EVENT_FLIP_COMPLETE; e->event.base.length = sizeof(e->event); e->event.vbl.user_data = page_flip->user_data; + e->event.vbl.crtc_id = crtc->base.id; ret = drm_event_reserve_init(dev, file_priv, &e->base, &e->event.base); if (ret) { kfree(e); -- 2.15.0

8 years

1
0
0 0

Re: [Linux-stable-mirror] [PATCH 4.14 00/18] 4.14.2-stable review

by Antoine Tenart

Hi Kevin, On Wed, Nov 22, 2017 at 03:43:51PM -0800, Kevin Hilman wrote: > kernelci.org bot <bot(a)kernelci.org> writes: > > > stable-rc/linux-4.14.y boot: 100 boots: 3 failed, 96 passed with 1 untried/unknown (v4.14.1-19-g6d0448582cf7) > > > > Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.1… > > Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.1-19-… > > > > Tree: stable-rc > > Branch: linux-4.14.y > > Git Describe: v4.14.1-19-g6d0448582cf7 > > Git Commit: 6d0448582cf754db0c8eedcb85edc01692892f50 > > Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > Tested: 43 unique boards, 16 SoC families, 18 builds out of 189 > > TL;DR; All is well, but Cc'd Free Electrons for double check. Thanks for the hint, we'll double check this one. Antoine -- Antoine Ténart, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com

8 years

1
0
0 0

Re: [Linux-stable-mirror] [PATCH 3.18 00/12] 3.18.84-stable review

by Greg Kroah-Hartman

On Wed, Nov 22, 2017 at 03:49:03PM -0800, Kevin Hilman wrote: > kernelci.org bot <bot(a)kernelci.org> writes: > > > stable-rc/linux-3.18.y boot: 44 boots: 2 failed, 42 passed (v3.18.83-12-g09d9c198f334) > > > > Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-3.18.y/kernel/v3.1… > > Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-3.18.y/kernel/v3.18.83-12… > > > > Tree: stable-rc > > Branch: linux-3.18.y > > Git Describe: v3.18.83-12-g09d9c198f334 > > Git Commit: 09d9c198f3343abad9fceeab61c33c351793aac7 > > Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > Tested: 18 unique boards, 9 SoC families, 13 builds out of 177 > > > > Boot Regressions Detected: > > > > arm: > > > > multi_v7_defconfig: > > armada-xp-openblocks-ax3-4: > > lab-free-electrons: new failure (last pass: v3.18.83-10-g097340129e75) > > > > sama5_defconfig: > > at91-sama5d3_xplained_rootfs:nfs: > > lab-free-electrons: new failure (last pass: v3.18.83-10-g097340129e75) > > These are both passing now in the latest 3.18.y branch[1], so whatever it > was is now gone: Thanks for the manual verification on all of these. greg k-h

8 years

1
0
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 4.9 01/54] dax: Avoid page invalidation races and unnecessary radix tree traversals

by alexander.levin＠verizon.com

From: Jan Kara <jack(a)suse.cz> [ Upstream commit e3fce68cdbed297d927e993b3ea7b8b1cee545da ] Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <hch(a)lst.de> Reviewed-by: Ross Zwisler <ross.zwisler(a)linux.intel.com> Signed-off-by: Jan Kara <jack(a)suse.cz> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- fs/dax.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index bf6218da7928..800748f10b3d 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1265,6 +1265,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data, if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) return -EIO; + /* + * Write can allocate block for an area which has a hole page mapped + * into page tables. We have to tear down these mappings so that data + * written by write(2) is visible in mmap. + */ + if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) { + invalidate_inode_pages2_range(inode->i_mapping, + pos >> PAGE_SHIFT, + (end - 1) >> PAGE_SHIFT); + } + while (pos < end) { unsigned offset = pos & (PAGE_SIZE - 1); struct blk_dax_ctl dax = { 0 }; @@ -1329,23 +1340,6 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter, if (iov_iter_rw(iter) == WRITE) flags |= IOMAP_WRITE; - /* - * Yes, even DAX files can have page cache attached to them: A zeroed - * page is inserted into the pagecache when we have to serve a write - * fault on a hole. It should never be dirtied and can simply be - * dropped from the pagecache once we get real data for the page. - * - * XXX: This is racy against mmap, and there's nothing we can do about - * it. We'll eventually need to shift this down even further so that - * we can check if we allocated blocks over a hole first. - */ - if (mapping->nrpages) { - ret = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, - (pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT); - WARN_ON_ONCE(ret); - } - while (iov_iter_count(iter)) { ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops, iter, iomap_dax_actor); -- 2.11.0

8 years

2
54
0 0

[Linux-stable-mirror] [PATCH] x86/entry/64: Add missing irqflags tracing to native_load_gs_index()

by Andy Lutomirski

Running this code with IRQs enabled (where dummy_lock is a spinlock): static void check_load_gs_index(void) { /* This will fail. */ load_gs_index(0xffff); spin_lock(&dummy_lock); spin_unlock(&dummy_lock); } Will generate a lockdep warning. The issue is that the actual write to %gs would cause an exception with IRQs disabled, and the exception handler would, as an inadvertent side effect, update irqflag tracing to reflect the IRQs-off status. native_load_gs_index() would then turn IRQs back on and return with irqflag tracing still thinking that IRQs were off. The dummy lock-and-unlock causes lockdep to notice the error and warn. Fix it by adding the missing tracing. Apparently nothing did this in a context where it mattered. I haven't tried to find a code path that would actually exhibit the warning if appropriately nasty user code were running. I suspect that the security impact of this bug is very, very low -- production systems don't run with lockdep enabled, and the warning is mostly harmless anyway. Found during a quick audit of the entry code to try to track down an unrelated bug that Ingo found in some still-in-development code. Cc: stable(a)vger.kernel.org Signed-off-by: Andy Lutomirski <luto(a)kernel.org> --- Hi Ingo- You asked me to look for an irqflag tracing bug, so I found one :) arch/x86/entry/entry_64.S | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a2b30ec69497..3c288f260fdf 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -51,15 +51,19 @@ ENTRY(native_usergs_sysret64) END(native_usergs_sysret64) #endif /* CONFIG_PARAVIRT */ -.macro TRACE_IRQS_IRETQ +.macro TRACE_IRQS_FLAGS flags:req #ifdef CONFIG_TRACE_IRQFLAGS - bt $9, EFLAGS(%rsp) /* interrupts off? */ + bt $9, \flags /* interrupts off? */ jnc 1f TRACE_IRQS_ON 1: #endif .endm +.macro TRACE_IRQS_IRETQ + TRACE_IRQS_FLAGS EFLAGS(%rsp) +.endm + /* * When dynamic function tracer is enabled it will add a breakpoint * to all locations that it is about to modify, sync CPUs, update @@ -943,11 +947,13 @@ ENTRY(native_load_gs_index) FRAME_BEGIN pushfq DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) + TRACE_IRQS_OFF SWAPGS .Lgs_change: movl %edi, %gs 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE SWAPGS + TRACE_IRQS_FLAGS (%rsp) popfq FRAME_END ret -- 2.13.6

8 years

1
0
0 0

[Linux-stable-mirror] [RESEND][PATCH v2] cxl: Check if vphb exists before iterating over AFU devices

by Vaibhav Jain

During an eeh a kernel-oops is reported if no vPHB to allocated to the AFU. This happens as during AFU init, an error in creation of vPHB is a non-fatal error. Hence afu->phb should always be checked for NULL before iterating over it for the virtual AFU pci devices. This patch fixes the kenel-oops by adding a NULL pointer check for afu->phb before it is dereferenced. Fixes: 9e8df8a2196("cxl: EEH support") Cc: stable(a)vger.kernel.org Signed-off-by: Vaibhav Jain <vaibhav(a)linux.vnet.ibm.com> --- Changelog: Resend -> Added the 'Fixes' info and marking the patch to stable tree [Mpe] v2 -> Added the vphb NULL check to cxl_vphb_error_detected() [Andrew] --- drivers/misc/cxl/pci.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index bb7fd3f4edab..18773343ab3e 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -2083,6 +2083,9 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, /* There should only be one entry, but go through the list * anyway */ + if (afu->phb == NULL) + return result; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (!afu_dev->driver) continue; @@ -2124,8 +2127,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, * Tell the AFU drivers; but we don't care what they * say, we're going away. */ - if (afu->phb != NULL) - cxl_vphb_error_detected(afu, state); + cxl_vphb_error_detected(afu, state); } return PCI_ERS_RESULT_DISCONNECT; } @@ -2265,6 +2267,9 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) if (cxl_afu_select_best_mode(afu)) goto err; + if (afu->phb == NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { /* Reset the device context. * TODO: make this less disruptive @@ -2327,6 +2332,9 @@ static void cxl_pci_resume(struct pci_dev *pdev) for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; + if (afu->phb != NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (afu_dev->driver && afu_dev->driver->err_handler && afu_dev->driver->err_handler->resume) -- 2.14.3

8 years

2
2
0 0

Re: [Linux-stable-mirror] [PATCH AUTOSEL for 4.9 01/56] ACPICA: Resources: Not a valid resource if buffer length too long

by alexander.levin＠verizon.com

On Wed, Nov 15, 2017 at 03:39:22PM +0000, Moore, Robert wrote: >> -----Original Message----- >> From: alexander.levin(a)verizon.com [mailto:alexander.levin@verizon.com] >> Sent: Tuesday, November 14, 2017 6:46 PM >> To: linux-kernel(a)vger.kernel.org; stable(a)vger.kernel.org >> Cc: Moore, Robert <robert.moore(a)intel.com>; Zheng, Lv >> <lv.zheng(a)intel.com>; Wysocki, Rafael J <rafael.j.wysocki(a)intel.com>; >> alexander.levin(a)verizon.com >> Subject: [PATCH AUTOSEL for 4.9 01/56] ACPICA: Resources: Not a valid >> resource if buffer length too long >> >> From: Bob Moore <robert.moore(a)intel.com> >> >> [ Upstream commit 57707a9a7780fab426b8ae9b4c7b65b912a748b3 ] >> >> ACPICA commit 9f76de2d249b18804e35fb55d14b1c2604d627a1 >> ACPICA commit b2e89d72ef1e9deefd63c3fd1dee90f893575b3a >> ACPICA commit 23b5bbe6d78afd3c5abf3adb91a1b098a3000b2e >> >> The declared buffer length must be the same as the length of the byte >> initializer list, otherwise not a valid resource descriptor. [snip] >[Moore, Robert] > >Please explain what you are doing here. Proposing this commit for the 4.9 LTS tree. -- Thanks, Sasha

8 years

2
4
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 3.18 1/8] ima: fix hash algorithm initialization

by alexander.levin＠verizon.com

From: Boshi Wang <wangboshi(a)huawei.com> [ Upstream commit ebe7c0a7be92bbd34c6ff5b55810546a0ee05bee ] The hash_setup function always sets the hash_setup_done flag, even when the hash algorithm is invalid. This prevents the default hash algorithm defined as CONFIG_IMA_DEFAULT_HASH from being used. This patch sets hash_setup_done flag only for valid hash algorithms. Fixes: e7a2ad7eb6f4 "ima: enable support for larger default filedata hash algorithms" Signed-off-by: Boshi Wang <wangboshi(a)huawei.com> Signed-off-by: Mimi Zohar <zohar(a)linux.vnet.ibm.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- security/integrity/ima/ima_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 62f59eca32d3..01590949eddd 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -52,6 +52,8 @@ static int __init hash_setup(char *str) ima_hash_algo = HASH_ALGO_SHA1; else if (strncmp(str, "md5", 3) == 0) ima_hash_algo = HASH_ALGO_MD5; + else + return 1; goto out; } @@ -61,6 +63,8 @@ static int __init hash_setup(char *str) break; } } + if (i == HASH_ALGO__LAST) + return 1; out: hash_setup_done = 1; return 1; -- 2.11.0

8 years

1
7
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 4.4 01/16] ima: fix hash algorithm initialization

by alexander.levin＠verizon.com

From: Boshi Wang <wangboshi(a)huawei.com> [ Upstream commit ebe7c0a7be92bbd34c6ff5b55810546a0ee05bee ] The hash_setup function always sets the hash_setup_done flag, even when the hash algorithm is invalid. This prevents the default hash algorithm defined as CONFIG_IMA_DEFAULT_HASH from being used. This patch sets hash_setup_done flag only for valid hash algorithms. Fixes: e7a2ad7eb6f4 "ima: enable support for larger default filedata hash algorithms" Signed-off-by: Boshi Wang <wangboshi(a)huawei.com> Signed-off-by: Mimi Zohar <zohar(a)linux.vnet.ibm.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- security/integrity/ima/ima_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index c21f09bf8b99..98289ba2a2e6 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -52,6 +52,8 @@ static int __init hash_setup(char *str) ima_hash_algo = HASH_ALGO_SHA1; else if (strncmp(str, "md5", 3) == 0) ima_hash_algo = HASH_ALGO_MD5; + else + return 1; goto out; } @@ -61,6 +63,8 @@ static int __init hash_setup(char *str) break; } } + if (i == HASH_ALGO__LAST) + return 1; out: hash_setup_done = 1; return 1; -- 2.11.0

8 years

1
15
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 4.9 01/24] ima: fix hash algorithm initialization

by alexander.levin＠verizon.com

From: Boshi Wang <wangboshi(a)huawei.com> [ Upstream commit ebe7c0a7be92bbd34c6ff5b55810546a0ee05bee ] The hash_setup function always sets the hash_setup_done flag, even when the hash algorithm is invalid. This prevents the default hash algorithm defined as CONFIG_IMA_DEFAULT_HASH from being used. This patch sets hash_setup_done flag only for valid hash algorithms. Fixes: e7a2ad7eb6f4 "ima: enable support for larger default filedata hash algorithms" Signed-off-by: Boshi Wang <wangboshi(a)huawei.com> Signed-off-by: Mimi Zohar <zohar(a)linux.vnet.ibm.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- security/integrity/ima/ima_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 0e8762945e79..2b3def14b4fb 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -51,6 +51,8 @@ static int __init hash_setup(char *str) ima_hash_algo = HASH_ALGO_SHA1; else if (strncmp(str, "md5", 3) == 0) ima_hash_algo = HASH_ALGO_MD5; + else + return 1; goto out; } @@ -60,6 +62,8 @@ static int __init hash_setup(char *str) break; } } + if (i == HASH_ALGO__LAST) + return 1; out: hash_setup_done = 1; return 1; -- 2.11.0

8 years

1
23
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 3.18 1/9] ARM: OMAP1: DMA: Correct the number of logical channels

by alexander.levin＠verizon.com

From: Peter Ujfalusi <peter.ujfalusi(a)ti.com> [ Upstream commit 657279778af54f35e54b07b6687918f254a2992c ] OMAP1510, OMAP5910 and OMAP310 have only 9 logical channels. OMAP1610, OMAP5912, OMAP1710, OMAP730, and OMAP850 have 16 logical channels available. The wired 17 for the lch_count must have been used to cover the 16 + 1 dedicated LCD channel, in reality we can only use 9 or 16 channels. The d->chan_count is not used by the omap-dma stack, so we can skip the setup. chan_count was configured to the number of logical channels and not the actual number of physical channels anyways. Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)ti.com> Acked-by: Aaro Koskinen <aaro.koskinen(a)iki.fi> Signed-off-by: Tony Lindgren <tony(a)atomide.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- arch/arm/mach-omap1/dma.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/arch/arm/mach-omap1/dma.c b/arch/arm/mach-omap1/dma.c index 4be601b638d7..8129e5f9c94d 100644 --- a/arch/arm/mach-omap1/dma.c +++ b/arch/arm/mach-omap1/dma.c @@ -31,7 +31,6 @@ #include <mach/irqs.h> #define OMAP1_DMA_BASE (0xfffed800) -#define OMAP1_LOGICAL_DMA_CH_COUNT 17 static u32 enable_1510_mode; @@ -311,8 +310,6 @@ static int __init omap1_system_dma_init(void) goto exit_iounmap; } - d->lch_count = OMAP1_LOGICAL_DMA_CH_COUNT; - /* Valid attributes for omap1 plus processors */ if (cpu_is_omap15xx()) d->dev_caps = ENABLE_1510_MODE; @@ -329,13 +326,14 @@ static int __init omap1_system_dma_init(void) d->dev_caps |= CLEAR_CSR_ON_READ; d->dev_caps |= IS_WORD_16; - if (cpu_is_omap15xx()) - d->chan_count = 9; - else if (cpu_is_omap16xx() || cpu_is_omap7xx()) { - if (!(d->dev_caps & ENABLE_1510_MODE)) - d->chan_count = 16; + /* available logical channels */ + if (cpu_is_omap15xx()) { + d->lch_count = 9; + } else { + if (d->dev_caps & ENABLE_1510_MODE) + d->lch_count = 9; else - d->chan_count = 9; + d->lch_count = 16; } p = dma_plat_info; -- 2.11.0

8 years

1
8
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 4.4 01/17] net: systemport: Utilize skb_put_padto()

by alexander.levin＠verizon.com

From: Florian Fainelli <f.fainelli(a)gmail.com> [ Upstream commit bb7da333d0a9f3bddc08f84187b7579a3f68fd24 ] Since we need to pad our packets, utilize skb_put_padto() which increases skb->len by how much we need to pad, allowing us to eliminate the test on skb->len right below. Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- drivers/net/ethernet/broadcom/bcmsysport.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index 8860e74aa28f..fae1a1ff53ab 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -1061,13 +1061,12 @@ static netdev_tx_t bcm_sysport_xmit(struct sk_buff *skb, * (including FCS and tag) because the length verification is done after * the Broadcom tag is stripped off the ingress packet. */ - if (skb_padto(skb, ETH_ZLEN + ENET_BRCM_TAG_LEN)) { + if (skb_put_padto(skb, ETH_ZLEN + ENET_BRCM_TAG_LEN)) { ret = NETDEV_TX_OK; goto out; } - skb_len = skb->len < ETH_ZLEN + ENET_BRCM_TAG_LEN ? - ETH_ZLEN + ENET_BRCM_TAG_LEN : skb->len; + skb_len = skb->len; mapping = dma_map_single(kdev, skb->data, skb_len, DMA_TO_DEVICE); if (dma_mapping_error(kdev, mapping)) { -- 2.11.0

8 years

1
16
0 0

[Linux-stable-mirror] [PATCH AUTOSEL for 4.9 01/56] ACPICA: Resources: Not a valid resource if buffer length too long

by alexander.levin＠verizon.com

From: Bob Moore <robert.moore(a)intel.com> [ Upstream commit 57707a9a7780fab426b8ae9b4c7b65b912a748b3 ] ACPICA commit 9f76de2d249b18804e35fb55d14b1c2604d627a1 ACPICA commit b2e89d72ef1e9deefd63c3fd1dee90f893575b3a ACPICA commit 23b5bbe6d78afd3c5abf3adb91a1b098a3000b2e The declared buffer length must be the same as the length of the byte initializer list, otherwise not a valid resource descriptor. Link: https://github.com/acpica/acpica/commit/9f76de2d Link: https://github.com/acpica/acpica/commit/b2e89d72 Link: https://github.com/acpica/acpica/commit/23b5bbe6 Signed-off-by: Bob Moore <robert.moore(a)intel.com> Signed-off-by: Lv Zheng <lv.zheng(a)intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com> --- drivers/acpi/acpica/utresrc.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/acpica/utresrc.c b/drivers/acpi/acpica/utresrc.c index 1de3376da66a..2ad99ea3d496 100644 --- a/drivers/acpi/acpica/utresrc.c +++ b/drivers/acpi/acpica/utresrc.c @@ -421,8 +421,10 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state, ACPI_FUNCTION_TRACE(ut_walk_aml_resources); - /* The absolute minimum resource template is one end_tag descriptor */ - + /* + * The absolute minimum resource template is one end_tag descriptor. + * However, we will treat a lone end_tag as just a simple buffer. + */ if (aml_length < sizeof(struct aml_resource_end_tag)) { return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG); } @@ -454,9 +456,8 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state, /* Invoke the user function */ if (user_function) { - status = - user_function(aml, length, offset, resource_index, - context); + status = user_function(aml, length, offset, + resource_index, context); if (ACPI_FAILURE(status)) { return_ACPI_STATUS(status); } @@ -480,6 +481,12 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state, *context = aml; } + /* Check if buffer is defined to be longer than the resource length */ + + if (aml_length > (offset + length)) { + return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG); + } + /* Normal exit */ return_ACPI_STATUS(AE_OK); -- 2.11.0

8 years

1
25
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror