- Linux-stable-mirror - lists.linaro.org

FAILED: patch "[PATCH] futex: Cure exit race" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From da791a667536bf8322042e38ca85d55a78d3c273 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx(a)linutronix.de> Date: Mon, 10 Dec 2018 14:35:14 +0100 Subject: [PATCH] futex: Cure exit race Stefan reported, that the glibc tst-robustpi4 test case fails occasionally. That case creates the following race between sys_exit() and sys_futex_lock_pi(): CPU0 CPU1 sys_exit() sys_futex() do_exit() futex_lock_pi() exit_signals(tsk) No waiters: tsk->flags |= PF_EXITING; *uaddr == 0x00000PID mm_release(tsk) Set waiter bit exit_robust_list(tsk) { *uaddr = 0x80000PID; Set owner died attach_to_pi_owner() { *uaddr = 0xC0000000; tsk = get_task(PID); } if (!tsk->flags & PF_EXITING) { ... attach(); tsk->flags |= PF_EXITPIDONE; } else { if (!(tsk->flags & PF_EXITPIDONE)) return -EAGAIN; return -ESRCH; <--- FAIL } ESRCH is returned all the way to user space, which triggers the glibc test case assert. Returning ESRCH unconditionally is wrong here because the user space value has been changed by the exiting task to 0xC0000000, i.e. the FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This is a valid state and the kernel has to handle it, i.e. taking the futex. Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE is set in the task which 'owns' the futex. If the value has changed, let the kernel retry the operation, which includes all regular sanity checks and correctly handles the FUTEX_OWNER_DIED case. If it hasn't changed, then return ESRCH as there is no way to distinguish this case from malfunctioning user space. This happens when the exiting task did not have a robust list, the robust list was corrupted or the user space value in the futex was simply bogus. Reported-by: Stefan Liebler <stli(a)linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Acked-by: Peter Zijlstra <peterz(a)infradead.org> Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com> Cc: Darren Hart <dvhart(a)infradead.org> Cc: Ingo Molnar <mingo(a)kernel.org> Cc: Sasha Levin <sashal(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467 Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de diff --git a/kernel/futex.c b/kernel/futex.c index f423f9b6577e..5cc8083a4c89 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval, return ret; } +static int handle_exit_race(u32 __user *uaddr, u32 uval, + struct task_struct *tsk) +{ + u32 uval2; + + /* + * If PF_EXITPIDONE is not yet set, then try again. + */ + if (tsk && !(tsk->flags & PF_EXITPIDONE)) + return -EAGAIN; + + /* + * Reread the user space value to handle the following situation: + * + * CPU0 CPU1 + * + * sys_exit() sys_futex() + * do_exit() futex_lock_pi() + * futex_lock_pi_atomic() + * exit_signals(tsk) No waiters: + * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID + * mm_release(tsk) Set waiter bit + * exit_robust_list(tsk) { *uaddr = 0x80000PID; + * Set owner died attach_to_pi_owner() { + * *uaddr = 0xC0000000; tsk = get_task(PID); + * } if (!tsk->flags & PF_EXITING) { + * ... attach(); + * tsk->flags |= PF_EXITPIDONE; } else { + * if (!(tsk->flags & PF_EXITPIDONE)) + * return -EAGAIN; + * return -ESRCH; <--- FAIL + * } + * + * Returning ESRCH unconditionally is wrong here because the + * user space value has been changed by the exiting task. + * + * The same logic applies to the case where the exiting task is + * already gone. + */ + if (get_futex_value_locked(&uval2, uaddr)) + return -EFAULT; + + /* If the user space value has changed, try again. */ + if (uval2 != uval) + return -EAGAIN; + + /* + * The exiting task did not have a robust list, the robust list was + * corrupted or the user space value in *uaddr is simply bogus. + * Give up and tell user space. + */ + return -ESRCH; +} + /* * Lookup the task for the TID provided from user space and attach to * it after doing proper sanity checks. */ -static int attach_to_pi_owner(u32 uval, union futex_key *key, +static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key, struct futex_pi_state **ps) { pid_t pid = uval & FUTEX_TID_MASK; @@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, /* * We are the first waiter - try to look up the real owner and attach * the new pi_state to it, but bail out when TID = 0 [1] + * + * The !pid check is paranoid. None of the call sites should end up + * with pid == 0, but better safe than sorry. Let the caller retry */ if (!pid) - return -ESRCH; + return -EAGAIN; p = find_get_task_by_vpid(pid); if (!p) - return -ESRCH; + return handle_exit_race(uaddr, uval, NULL); if (unlikely(p->flags & PF_KTHREAD)) { put_task_struct(p); @@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, * set, we know that the task has finished the * cleanup: */ - int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN; + int ret = handle_exit_race(uaddr, uval, p); raw_spin_unlock_irq(&p->pi_lock); put_task_struct(p); @@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval, * We are the first waiter - try to look up the owner based on * @uval and attach to it. */ - return attach_to_pi_owner(uval, key, ps); + return attach_to_pi_owner(uaddr, uval, key, ps); } static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval) @@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb, * attach to the owner. If that fails, no harm done, we only * set the FUTEX_WAITERS bit in the user space variable. */ - return attach_to_pi_owner(uval, key, ps); + return attach_to_pi_owner(uaddr, newval, key, ps); } /**

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] ubifs: Handle re-linking of inodes correctly while recovery" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From e58725d51fa8da9133f3f1c54170aa2e43056b91 Mon Sep 17 00:00:00 2001 From: Richard Weinberger <richard(a)nod.at> Date: Wed, 7 Nov 2018 23:04:43 +0100 Subject: [PATCH] ubifs: Handle re-linking of inodes correctly while recovery MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit UBIFS's recovery code strictly assumes that a deleted inode will never come back, therefore it removes all data which belongs to that inode as soon it faces an inode with link count 0 in the replay list. Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE it can lead to data loss upon a power-cut. Consider a journal with entries like: 0: inode X (nlink = 0) /* O_TMPFILE was created */ 1: data for inode X /* Someone writes to the temp file */ 2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */ 3: inode X (nlink = 1) /* inode was re-linked via linkat() */ Upon replay of entry #2 UBIFS will drop all data that belongs to inode X, this will lead to an empty file after mounting. As solution for this problem, scan the replay list for a re-link entry before dropping data. Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE") Cc: stable(a)vger.kernel.org Cc: Russell Senior <russell(a)personaltelco.net> Cc: Rafał Miłecki <zajec5(a)gmail.com> Reported-by: Russell Senior <russell(a)personaltelco.net> Reported-by: Rafał Miłecki <zajec5(a)gmail.com> Tested-by: Rafał Miłecki <rafal(a)milecki.pl> Signed-off-by: Richard Weinberger <richard(a)nod.at> diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c index a08c5b7030ea..0a0e65c07c6d 100644 --- a/fs/ubifs/replay.c +++ b/fs/ubifs/replay.c @@ -212,6 +212,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) return ubifs_tnc_remove_range(c, &min_key, &max_key); } +/** + * inode_still_linked - check whether inode in question will be re-linked. + * @c: UBIFS file-system description object + * @rino: replay entry to test + * + * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1. + * This case needs special care, otherwise all references to the inode will + * be removed upon the first replay entry of an inode with link count 0 + * is found. + */ +static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino) +{ + struct replay_entry *r; + + ubifs_assert(c, rino->deletion); + ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY); + + /* + * Find the most recent entry for the inode behind @rino and check + * whether it is a deletion. + */ + list_for_each_entry_reverse(r, &c->replay_list, list) { + ubifs_assert(c, r->sqnum >= rino->sqnum); + if (key_inum(c, &r->key) == key_inum(c, &rino->key)) + return r->deletion == 0; + + } + + ubifs_assert(c, 0); + return false; +} + /** * apply_replay_entry - apply a replay entry to the TNC. * @c: UBIFS file-system description object @@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { ino_t inum = key_inum(c, &r->key); + if (inode_still_linked(c, r)) { + err = 0; + break; + } + err = ubifs_tnc_remove_ino(c, inum); break; }

6 years, 9 months

1
0
0 0

FAILED: patch "[PATCH] ubifs: Handle re-linking of inodes correctly while recovery" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From e58725d51fa8da9133f3f1c54170aa2e43056b91 Mon Sep 17 00:00:00 2001 From: Richard Weinberger <richard(a)nod.at> Date: Wed, 7 Nov 2018 23:04:43 +0100 Subject: [PATCH] ubifs: Handle re-linking of inodes correctly while recovery MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit UBIFS's recovery code strictly assumes that a deleted inode will never come back, therefore it removes all data which belongs to that inode as soon it faces an inode with link count 0 in the replay list. Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE it can lead to data loss upon a power-cut. Consider a journal with entries like: 0: inode X (nlink = 0) /* O_TMPFILE was created */ 1: data for inode X /* Someone writes to the temp file */ 2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */ 3: inode X (nlink = 1) /* inode was re-linked via linkat() */ Upon replay of entry #2 UBIFS will drop all data that belongs to inode X, this will lead to an empty file after mounting. As solution for this problem, scan the replay list for a re-link entry before dropping data. Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE") Cc: stable(a)vger.kernel.org Cc: Russell Senior <russell(a)personaltelco.net> Cc: Rafał Miłecki <zajec5(a)gmail.com> Reported-by: Russell Senior <russell(a)personaltelco.net> Reported-by: Rafał Miłecki <zajec5(a)gmail.com> Tested-by: Rafał Miłecki <rafal(a)milecki.pl> Signed-off-by: Richard Weinberger <richard(a)nod.at> diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c index a08c5b7030ea..0a0e65c07c6d 100644 --- a/fs/ubifs/replay.c +++ b/fs/ubifs/replay.c @@ -212,6 +212,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r) return ubifs_tnc_remove_range(c, &min_key, &max_key); } +/** + * inode_still_linked - check whether inode in question will be re-linked. + * @c: UBIFS file-system description object + * @rino: replay entry to test + * + * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1. + * This case needs special care, otherwise all references to the inode will + * be removed upon the first replay entry of an inode with link count 0 + * is found. + */ +static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino) +{ + struct replay_entry *r; + + ubifs_assert(c, rino->deletion); + ubifs_assert(c, key_type(c, &rino->key) == UBIFS_INO_KEY); + + /* + * Find the most recent entry for the inode behind @rino and check + * whether it is a deletion. + */ + list_for_each_entry_reverse(r, &c->replay_list, list) { + ubifs_assert(c, r->sqnum >= rino->sqnum); + if (key_inum(c, &r->key) == key_inum(c, &rino->key)) + return r->deletion == 0; + + } + + ubifs_assert(c, 0); + return false; +} + /** * apply_replay_entry - apply a replay entry to the TNC. * @c: UBIFS file-system description object @@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r) { ino_t inum = key_inum(c, &r->key); + if (inode_still_linked(c, r)) { + err = 0; + break; + } + err = ubifs_tnc_remove_ino(c, inum); break; }

6 years, 9 months

1
0
0 0

[PATCH] memcg, oom: notify on oom killer invocation from the charge path

by Michal Hocko

From: Michal Hocko <mhocko(a)suse.com> Burt Holzman has noticed that memcg v1 doesn't notify about OOM events via eventfd anymore. The reason is that 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") has moved the oom handling back to the charge path. While doing so the notification was left behind in mem_cgroup_oom_synchronize. Fix the issue by replicating the oom hierarchy locking and the notification. Reported-by: Burt Holzman <burt(a)fnal.gov> Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") Cc: stable # 4.19+ Acked-by: Johannes Weiner <hannes(a)cmpxchg.org> Signed-off-by: Michal Hocko <mhocko(a)suse.com> --- Hi Andrew, I forgot to CC you on the patch sent as a reply to the original bug report [1] so I am reposting with Ack from Johannes. Burt has confirmed this is resolving the regression for him [2]. 4.20 is out but I have marked the patch for stable so it should hit both 4.19 and 4.20. [1] http://lkml.kernel.org/r/20181221153302.GB6410@dhcp22.suse.cz [2] http://lkml.kernel.org/r/96D4815C-420F-41B7-B1E9-A741E7523596@services.fnal… mm/memcontrol.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6e1469b80cb7..7e6bf74ddb1e 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1666,6 +1666,9 @@ enum oom_status { static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) { + enum oom_status ret; + bool locked; + if (order > PAGE_ALLOC_COSTLY_ORDER) return OOM_SKIPPED; @@ -1700,10 +1703,23 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } + mem_cgroup_mark_under_oom(memcg); + + locked = mem_cgroup_oom_trylock(memcg); + + if (locked) + mem_cgroup_oom_notify(memcg); + + mem_cgroup_unmark_under_oom(memcg); if (mem_cgroup_out_of_memory(memcg, mask, order)) - return OOM_SUCCESS; + ret = OOM_SUCCESS; + else + ret = OOM_FAILED; - return OOM_FAILED; + if (locked) + mem_cgroup_oom_unlock(memcg); + + return ret; } /** -- 2.19.2

6 years, 9 months

1
0
0 0

[PATCH 0/1] f2fs: fix sanity_check_raw_super on big endian machines

by Martin Blumenstingl

Big endian machines (at least the one I have access to) cannot mount f2fs filesystems anymore. This is with Linux 4.14.89 but I suspect that 4.9.144 (and later) is affected as well. commit 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows") treats the "block_count" from struct f2fs_super_block as 32-bit little endian value instead of a 64-bit little endian value. I tested this fix on top of Linux 4.14.49 but it seems that all stable and mainline kernel versions are affected: - 4.9.144 and later because 0cfe75c5b01199 was backported there - 4.14.86 and later because 0cfe75c5b01199 was backported there - 4.19 - 4.20-rcX Martin Blumenstingl (1): f2fs: fix validation of the block count in sanity_check_raw_super fs/f2fs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.20.1

6 years, 9 months

3
6
0 0

Re: [BREAKAGE] Since 4.18, kernel sets SB_I_NODEV implicitly on userns mounts, breaking systemd-nspawn

by Thomas Backlund

Den 23-12-2018 kl. 01:28, skrev Linus Torvalds: > On Sat, Dec 22, 2018 at 3:07 PM Christian Brauner > <christian.brauner(a)canonical.com> wrote: >> >> However, for this case should I resend the revert? > > Since I was pointed at the original email thread, I just picked it up > from there directly. It still applied cleanly, nothing had changed in > that area. > > Linus > This should also be picked up for 4.19 lts Greg, it's now upstream as: From 94f82008ce30e2624537d240d64ce718255e0b80 Mon Sep 17 00:00:00 2001 From: Christian Brauner <christian(a)brauner.io> Date: Thu, 5 Jul 2018 17:51:20 +0200 Subject: Revert "vfs: Allow userns root to call mknod on owned filesystems." -- Thomas

6 years, 9 months

2
1
0 0

[PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

by Paul Burton

Mapping the delay slot emulation page as both writeable & executable presents a security risk, in that if an exploit can write to & jump into the page then it can be used as an easy way to execute arbitrary code. Prevent this by mapping the page read-only for userland, and using access_process_vm() with the FOLL_FORCE flag to write to it from mips_dsemul(). This will likely be less efficient due to copy_to_user_page() performing cache maintenance on a whole page, rather than a single line as in the previous use of flush_cache_sigtramp(). However this delay slot emulation code ought not to be running in any performance critical paths anyway so this isn't really a problem, and we can probably do better in copy_to_user_page() anyway in future. A major advantage of this approach is that the fix is small & simple to backport to stable kernels. Reported-by: Andy Lutomirski <luto(a)kernel.org> Signed-off-by: Paul Burton <paul.burton(a)mips.com> Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") Cc: stable(a)vger.kernel.org # v4.8+ --- arch/mips/kernel/vdso.c | 4 ++-- arch/mips/math-emu/dsemul.c | 38 +++++++++++++++++++------------------ 2 files changed, 22 insertions(+), 20 deletions(-) diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 48a9c6b90e07..9df3ebdc7b0f 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -126,8 +126,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) /* Map delay slot emulation page */ base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, - VM_READ|VM_WRITE|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + VM_READ | VM_EXEC | + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC, 0, NULL); if (IS_ERR_VALUE(base)) { ret = base; diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c index 5450f4d1c920..e2d46cb93ca9 100644 --- a/arch/mips/math-emu/dsemul.c +++ b/arch/mips/math-emu/dsemul.c @@ -214,8 +214,9 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, { int isa16 = get_isa16_mode(regs->cp0_epc); mips_instruction break_math; - struct emuframe __user *fr; - int err, fr_idx; + unsigned long fr_uaddr; + struct emuframe fr; + int fr_idx, ret; /* NOP is easy */ if (ir == 0) @@ -250,27 +251,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, fr_idx = alloc_emuframe(); if (fr_idx == BD_EMUFRAME_NONE) return SIGBUS; - fr = &dsemul_page()[fr_idx]; /* Retrieve the appropriately encoded break instruction */ break_math = BREAK_MATH(isa16); /* Write the instructions to the frame */ if (isa16) { - err = __put_user(ir >> 16, - (u16 __user *)(&fr->emul)); - err |= __put_user(ir & 0xffff, - (u16 __user *)((long)(&fr->emul) + 2)); - err |= __put_user(break_math >> 16, - (u16 __user *)(&fr->badinst)); - err |= __put_user(break_math & 0xffff, - (u16 __user *)((long)(&fr->badinst) + 2)); + union mips_instruction _emul = { + .halfword = { ir >> 16, ir } + }; + union mips_instruction _badinst = { + .halfword = { break_math >> 16, break_math } + }; + + fr.emul = _emul.word; + fr.badinst = _badinst.word; } else { - err = __put_user(ir, &fr->emul); - err |= __put_user(break_math, &fr->badinst); + fr.emul = ir; + fr.badinst = break_math; } - if (unlikely(err)) { + /* Write the frame to user memory */ + fr_uaddr = (unsigned long)&dsemul_page()[fr_idx]; + ret = access_process_vm(current, fr_uaddr, &fr, sizeof(fr), + FOLL_FORCE | FOLL_WRITE); + if (unlikely(ret != sizeof(fr))) { MIPS_FPU_EMU_INC_STATS(errors); free_emuframe(fr_idx, current->mm); return SIGBUS; @@ -282,10 +287,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, atomic_set(&current->thread.bd_emu_frame, fr_idx); /* Change user register context to execute the frame */ - regs->cp0_epc = (unsigned long)&fr->emul | isa16; - - /* Ensure the icache observes our newly written frame */ - flush_cache_sigtramp((unsigned long)&fr->emul); + regs->cp0_epc = fr_uaddr | isa16; return 0; } -- 2.20.0

6 years, 9 months

1
1
0 0

[PATCH v2] ocxl: Fix endiannes bug in read_afu_name()

by Greg Kurz

The AFU Descriptor Template in the PCI config space has a Name Space field which is a 24 Byte ASCII character string of descriptive name space for the AFU. The OCXL driver read the string four characters at a time with pci_read_config_dword(). This optimization is valid on a little-endian system since this is PCI, but a big-endian system ends up with each subset of four characters in reverse order. This could be fixed by switching to read characters one by one. Another option is to swap the bytes if we're big-endian. Go for the latter with le32_to_cpu(). Cc: stable(a)vger.kernel.org # v4.16 Signed-off-by: Greg Kurz <groug(a)kaod.org> Acked-by: Frederic Barrat <fbarrat(a)linux.ibm.com> --- v2: - silence sparse with (__force __le32) cast - new changelog --- drivers/misc/ocxl/config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index 57a6bb1fd3c9..8f2c5d8bd2ee 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn, if (rc) return rc; ptr = (u32 *) &afu->name[i]; - *ptr = val; + *ptr = le32_to_cpu((__force __le32) val); } afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */ return 0;

6 years, 9 months

3
3
0 0

[PATCH v2] powerpc/tm: Set MSR[TS] just prior to recheckpoint

by Breno Leitao

On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable(a)vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao <leitao(a)debian.org> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index e6474a45cef5..fd59fef9931b 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs, /* If TM bits are set to the reserved value, it's an invalid context */ if (MSR_TM_RESV(msr_hi)) return 1; - /* Pull in the MSR TM bits from the user context */ + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + * + * Pull in the MSR TM bits from the user context + */ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK); /* Now, recheckpoint. This loads up all of the checkpointed (older) * registers, including FP and V[S]Rs. After recheckpointing, the @@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs, } #endif + preempt_enable(); + return 0; } #endif diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 83d51bf586c7..bbd1c73243d7 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, if (MSR_TM_RESV(msr)) return -EINVAL; - /* pull in MSR TS bits from user context */ - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); - - /* - * Ensure that TM is enabled in regs->msr before we leave the signal - * handler. It could be the case that (a) user disabled the TM bit - * through the manipulation of the MSR bits in uc_mcontext or (b) the - * TM bit was disabled because a sufficient number of context switches - * happened whilst in the signal handler and load_tm overflowed, - * disabling the TM bit. In either case we can end up with an illegal - * TM state leading to a TM Bad Thing when we return to userspace. - */ - regs->msr |= MSR_TM; - /* pull in MSR LE from user context */ regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); @@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, tm_enable(); /* Make sure the transaction is marked as failed */ tsk->thread.tm_texasr |= TEXASR_FS; + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* pull in MSR TS bits from user context */ + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); + + /* + * Ensure that TM is enabled in regs->msr before we leave the signal + * handler. It could be the case that (a) user disabled the TM bit + * through the manipulation of the MSR bits in uc_mcontext or (b) the + * TM bit was disabled because a sufficient number of context switches + * happened whilst in the signal handler and load_tm overflowed, + * disabling the TM bit. In either case we can end up with an illegal + * TM state leading to a TM Bad Thing when we return to userspace. + * + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + */ + regs->msr |= MSR_TM; + /* This loads the checkpointed FP/VEC state, if used */ tm_recheckpoint(&tsk->thread); @@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, regs->msr |= MSR_VEC; } + preempt_enable(); + return err; } #endif -- 2.19.0

6 years, 9 months

2
1
0 0

FAILED: patch "[PATCH] ASoC: sta32x: set ->component pointer in private struct" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 747df19747bc9752cd40b9cce761e17a033aa5c2 Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel(a)zonque.org> Date: Thu, 11 Oct 2018 20:32:05 +0200 Subject: [PATCH] ASoC: sta32x: set ->component pointer in private struct The ESD watchdog code in sta32x_watchdog() dereferences the pointer which is never assigned. This is a regression from a1be4cead9b950 ("ASoC: sta32x: Convert to direct regmap API usage.") which went unnoticed since nobody seems to use that ESD workaround. Fixes: a1be4cead9b950 ("ASoC: sta32x: Convert to direct regmap API usage.") Signed-off-by: Daniel Mack <daniel(a)zonque.org> Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org diff --git a/sound/soc/codecs/sta32x.c b/sound/soc/codecs/sta32x.c index d5035f2f2b2b..ce508b4cc85c 100644 --- a/sound/soc/codecs/sta32x.c +++ b/sound/soc/codecs/sta32x.c @@ -879,6 +879,9 @@ static int sta32x_probe(struct snd_soc_component *component) struct sta32x_priv *sta32x = snd_soc_component_get_drvdata(component); struct sta32x_platform_data *pdata = sta32x->pdata; int i, ret = 0, thermal = 0; + + sta32x->component = component; + ret = regulator_bulk_enable(ARRAY_SIZE(sta32x->supplies), sta32x->supplies); if (ret != 0) {

6 years, 9 months

6
5
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror