The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bfe8cc1db02ab243c62780f17fc57f65bde0afe1 Mon Sep 17 00:00:00 2001
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Date: Sat, 21 Nov 2020 22:17:15 -0800
Subject: [PATCH] mm/userfaultfd: do not access vma->vm_mm after calling
handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases. In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm. The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
create_huge_pmd mm/memory.c:4256 [inline]
__handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
handle_mm_fault+0x288/0x748 mm/memory.c:4607
do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
_copy_from_user+0x48/0xa8 lib/usercopy.c:16
copy_from_user include/linux/uaccess.h:192 [inline]
__do_sys_sigaltstack kernel/signal.c:4064 [inline]
__s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Allocated by task 9334:
slab_alloc_node mm/slub.c:2891 [inline]
slab_alloc mm/slub.c:2899 [inline]
kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
__split_vma+0xba/0x560 mm/mmap.c:2742
split_vma+0xca/0x108 mm/mmap.c:2800
mlock_fixup+0x4ae/0x600 mm/mlock.c:550
apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
do_mlock+0x1aa/0x718 mm/mlock.c:711
__do_sys_mlock2 mm/mlock.c:738 [inline]
__s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
Freed by task 9333:
slab_free mm/slub.c:3142 [inline]
kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
__vma_adjust+0x7b2/0x2508 mm/mmap.c:960
vma_merge+0x87e/0xce0 mm/mmap.c:1209
userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
__fput+0x22c/0x7a8 fs/file_table.c:281
task_work_run+0x200/0x320 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
system_call+0xe6/0x28c arch/s390/kernel/entry.S:416
The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
page dumped because: kasan: bad access detected
page->mem_cgroup:0000000096959501
Memory state around the buggy address:
00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [4.3+]
Link: https://lkml.kernel.org/r/20201110190329.11920-1-gerald.schaefer@linux.ibm.…
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..ec2bb93f7431 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -710,7 +710,6 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
transparent_hugepage_use_zero_page()) {
pgtable_t pgtable;
struct page *zero_page;
- bool set;
vm_fault_t ret;
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable))
@@ -723,25 +722,25 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
}
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
ret = 0;
- set = false;
if (pmd_none(*vmf->pmd)) {
ret = check_stable_address_space(vma->vm_mm);
if (ret) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
} else if (userfaultfd_missing(vma)) {
spin_unlock(vmf->ptl);
+ pte_free(vma->vm_mm, pgtable);
ret = handle_userfault(vmf, VM_UFFD_MISSING);
VM_BUG_ON(ret & VM_FAULT_FALLBACK);
} else {
set_huge_zero_page(pgtable, vma->vm_mm, vma,
haddr, vmf->pmd, zero_page);
spin_unlock(vmf->ptl);
- set = true;
}
- } else
+ } else {
spin_unlock(vmf->ptl);
- if (!set)
pte_free(vma->vm_mm, pgtable);
+ }
return ret;
}
gfp = alloc_hugepage_direct_gfpmask(vma);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 572c83acdcdafeb04e70aa46be1fa539310be20c Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 29 Sep 2020 08:53:54 -0400
Subject: [PATCH] btrfs: cleanup cow block on error
In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead
to a system lockup. It gets stuck trying to write back inodes, and the
write back thread was trying to lock an extent buffer:
$ cat /proc/2143497/stack
[<0>] __btrfs_tree_lock+0x108/0x250
[<0>] lock_extent_buffer_for_io+0x35e/0x3a0
[<0>] btree_write_cache_pages+0x15a/0x3b0
[<0>] do_writepages+0x28/0xb0
[<0>] __writeback_single_inode+0x54/0x5c0
[<0>] writeback_sb_inodes+0x1e8/0x510
[<0>] wb_writeback+0xcc/0x440
[<0>] wb_workfn+0xd7/0x650
[<0>] process_one_work+0x236/0x560
[<0>] worker_thread+0x55/0x3c0
[<0>] kthread+0x13a/0x150
[<0>] ret_from_fork+0x1f/0x30
This is because we got an error while COWing a block, specifically here
if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
ret = btrfs_reloc_cow_block(trans, root, buf, cow);
if (ret) {
btrfs_abort_transaction(trans, ret);
return ret;
}
}
[16402.241552] BTRFS: Transaction aborted (error -2)
[16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540
[16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8
[16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
[16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540
[16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282
[16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388
[16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380
[16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000
[16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0
[16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000
[16402.256321] FS: 00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000
[16402.256973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0
[16402.257867] Call Trace:
[16402.258072] btrfs_cow_block+0x109/0x230
[16402.258356] btrfs_search_slot+0x530/0x9d0
[16402.258655] btrfs_lookup_file_extent+0x37/0x40
[16402.259155] __btrfs_drop_extents+0x13c/0xd60
[16402.259628] ? btrfs_block_rsv_migrate+0x4f/0xb0
[16402.259949] btrfs_replace_file_extents+0x190/0x820
[16402.260873] btrfs_clone+0x9ae/0xc00
[16402.261139] btrfs_extent_same_range+0x66/0x90
[16402.261771] btrfs_remap_file_range+0x353/0x3b1
[16402.262333] vfs_dedupe_file_range_one.part.0+0xd5/0x140
[16402.262821] vfs_dedupe_file_range+0x189/0x220
[16402.263150] do_vfs_ioctl+0x552/0x700
[16402.263662] __x64_sys_ioctl+0x62/0xb0
[16402.264023] do_syscall_64+0x33/0x40
[16402.264364] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[16402.264862] RIP: 0033:0x7f542c7d15cb
[16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb
[16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003
[16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64
[16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80
[16402.271498] irq event stamp: 0
[16402.271846] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
[16402.273343] softirqs last enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
[16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0
[16402.274338] ---[ end trace 737874a5a41a8236 ]---
[16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
[16402.276179] BTRFS info (device dm-9): forced readonly
[16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry
[16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
[16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
[16402.280582] BTRFS info (device dm-9): balance: ended with status: -30
The problem here is that as soon as we allocate the new block it is
locked and marked dirty in the btree inode. This means that we could
attempt to writeback this block and need to lock the extent buffer.
However we're not unlocking it here and thus we deadlock.
Fix this by unlocking the cow block if we have any errors inside of
__btrfs_cow_block, and also free it so we do not leak it.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a165093739c4..113da62dc17f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1064,6 +1064,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
btrfs_abort_transaction(trans, ret);
return ret;
}
@@ -1071,6 +1073,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
ret = btrfs_reloc_cow_block(trans, root, buf, cow);
if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
btrfs_abort_transaction(trans, ret);
return ret;
}
@@ -1103,6 +1107,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
if (last_ref) {
ret = tree_mod_log_free_eb(buf);
if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
btrfs_abort_transaction(trans, ret);
return ret;
}
commit a2d375eda771 ("dyndbg: refine export, rename to dynamic_debug_exec_queries()")
Above commit copies a string before checking for null pointer, fix
this, and add a pr_err. Also trim comment, and add return val info.
Fixes: a2d375eda771
Cc: stable(a)vger.kernel.org
Signed-off-by: Jim Cromie <jim.cromie(a)gmail.com>
---
lib/dynamic_debug.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index bd7b3aaa93c3..711a9def8c83 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -553,17 +553,23 @@ static int ddebug_exec_queries(char *query, const char *modname)
* @query: query-string described in admin-guide/dynamic-debug-howto
* @modname: string containing module name, usually &module.mod_name
*
- * This uses the >/proc/dynamic_debug/control reader, allowing module
- * authors to modify their dynamic-debug callsites. The modname is
- * canonically struct module.mod_name, but can also be null or a
- * module-wildcard, for example: "drm*".
+ * This uses the >control reader, allowing module authors to modify
+ * their dynamic-debug callsites. The modname is canonically struct
+ * module.mod_name, but can also be null or a module-wildcard, for
+ * example: "drm*".
+ * Returns <0 on error, >=0 for callsites changed
*/
int dynamic_debug_exec_queries(const char *query, const char *modname)
{
int rc;
- char *qry = kstrndup(query, PAGE_SIZE, GFP_KERNEL);
+ char *qry; /* writable copy of query */
- if (!query)
+ if (!query) {
+ pr_err("non-null query/command string expected\n");
+ return -EINVAL;
+ }
+ qry = kstrndup(query, PAGE_SIZE, GFP_KERNEL);
+ if (!qry)
return -ENOMEM;
rc = ddebug_exec_queries(qry, modname);
--
2.28.0
A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.
The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.
Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.
However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).
Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.
In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.
We can't fundamentally change the ABI at this point, however we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.
The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.
Reported-by: Daniel Axtens <dja(a)axtens.net>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Donnellan <ajd(a)linux.ibm.com>
---
v1->v2:
- address comments from mpe
- shorten the names of some struct members
- make the filter array static/ro_after_init, use const char *
- genericise the fixed buffer size cases
- simplify/get rid of some of the error printing
- get rid of rtas_token_name()
---
arch/powerpc/Kconfig | 13 ++++
arch/powerpc/kernel/rtas.c | 153 +++++++++++++++++++++++++++++++++++++
2 files changed, 166 insertions(+)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..8dd42b82379b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -989,6 +989,19 @@ config PPC_SECVAR_SYSFS
read/write operations on these variables. Say Y if you have
secure boot enabled and want to expose variables to userspace.
+config PPC_RTAS_FILTER
+ bool "Enable filtering of RTAS syscalls"
+ default y
+ depends on PPC_RTAS
+ help
+ The RTAS syscall API has security issues that could be used to
+ compromise system integrity. This option enforces restrictions on the
+ RTAS calls and arguments passed by userspace programs to mitigate
+ these issues.
+
+ Say Y unless you know what you are doing and the filter is causing
+ problems for you.
+
endmenu
config ISA_DMA_API
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 806d554ce357..954f41676f69 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -992,6 +992,147 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
return NULL;
}
+#ifdef CONFIG_PPC_RTAS_FILTER
+
+/*
+ * The sys_rtas syscall, as originally designed, allows root to pass
+ * arbitrary physical addresses to RTAS calls. A number of RTAS calls
+ * can be abused to write to arbitrary memory and do other things that
+ * are potentially harmful to system integrity, and thus should only
+ * be used inside the kernel and not exposed to userspace.
+ *
+ * All known legitimate users of the sys_rtas syscall will only ever
+ * pass addresses that fall within the RMO buffer, and use a known
+ * subset of RTAS calls.
+ *
+ * Accordingly, we filter RTAS requests to check that the call is
+ * permitted, and that provided pointers fall within the RMO buffer.
+ * The rtas_filters list contains an entry for each permitted call,
+ * with the indexes of the parameters which are expected to contain
+ * addresses and sizes of buffers allocated inside the RMO buffer.
+ */
+struct rtas_filter {
+ const char *name;
+ int token;
+ /* Indexes into the args buffer, -1 if not used */
+ int buf_idx1;
+ int size_idx1;
+ int buf_idx2;
+ int size_idx2;
+
+ int fixed_size;
+};
+
+static struct rtas_filter rtas_filters[] __ro_after_init = {
+ { "ibm,activate-firmware", -1, -1, -1, -1, -1 },
+ { "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 }, /* Special cased */
+ { "display-character", -1, -1, -1, -1, -1 },
+ { "ibm,display-message", -1, 0, -1, -1, -1 },
+ { "ibm,errinjct", -1, 2, -1, -1, -1, 1024 },
+ { "ibm,close-errinjct", -1, -1, -1, -1, -1 },
+ { "ibm,open-errinct", -1, -1, -1, -1, -1 },
+ { "ibm,get-config-addr-info2", -1, -1, -1, -1, -1 },
+ { "ibm,get-dynamic-sensor-state", -1, 1, -1, -1, -1 },
+ { "ibm,get-indices", -1, 2, 3, -1, -1 },
+ { "get-power-level", -1, -1, -1, -1, -1 },
+ { "get-sensor-state", -1, -1, -1, -1, -1 },
+ { "ibm,get-system-parameter", -1, 1, 2, -1, -1 },
+ { "get-time-of-day", -1, -1, -1, -1, -1 },
+ { "ibm,get-vpd", -1, 0, -1, 1, 2 },
+ { "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
+ { "ibm,platform-dump", -1, 4, 5, -1, -1 },
+ { "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
+ { "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
+ { "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
+ { "ibm,set-eeh-option", -1, -1, -1, -1, -1 },
+ { "set-indicator", -1, -1, -1, -1, -1 },
+ { "set-power-level", -1, -1, -1, -1, -1 },
+ { "set-time-for-power-on", -1, -1, -1, -1, -1 },
+ { "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
+ { "set-time-of-day", -1, -1, -1, -1, -1 },
+ { "ibm,suspend-me", -1, -1, -1, -1, -1 },
+ { "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
+ { "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
+ { "ibm,physical-attestation", -1, 0, 1, -1, -1 },
+};
+
+static bool in_rmo_buf(u32 base, u32 end)
+{
+ return base >= rtas_rmo_buf &&
+ base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
+ base <= end &&
+ end >= rtas_rmo_buf &&
+ end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
+}
+
+static bool block_rtas_call(int token, int nargs,
+ struct rtas_args *args)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
+ struct rtas_filter *f = &rtas_filters[i];
+ u32 base, size, end;
+
+ if (token != f->token)
+ continue;
+
+ if (f->buf_idx1 != -1) {
+ base = be32_to_cpu(args->args[f->buf_idx1]);
+ if (f->size_idx1 != -1)
+ size = be32_to_cpu(args->args[f->size_idx1]);
+ else if (f->fixed_size)
+ size = f->fixed_size;
+ else
+ size = 1;
+
+ end = base + size - 1;
+ if (!in_rmo_buf(base, end))
+ goto err;
+ }
+
+ if (f->buf_idx2 != -1) {
+ base = be32_to_cpu(args->args[f->buf_idx2]);
+ if (f->size_idx2 != -1)
+ size = be32_to_cpu(args->args[f->size_idx2]);
+ else if (f->fixed_size)
+ size = f->fixed_size;
+ else
+ size = 1;
+ end = base + size - 1;
+
+ /*
+ * Special case for ibm,configure-connector where the
+ * address can be 0
+ */
+ if (!strcmp(f->name, "ibm,configure-connector") &&
+ base == 0)
+ return false;
+
+ if (!in_rmo_buf(base, end))
+ goto err;
+ }
+
+ return false;
+ }
+
+err:
+ pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt?\n");
+ pr_err_ratelimited("sys_rtas: token=0x%x, nargs=%d (called by %s)\n",
+ token, nargs, current->comm);
+ return true;
+}
+
+#else
+
+static bool block_rtas_call(int token, int nargs,
+ struct rtas_args *args)
+{
+ return false;
+}
+
+#endif /* CONFIG_PPC_RTAS_FILTER */
+
/* We assume to be passed big endian arguments */
SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
{
@@ -1029,6 +1170,9 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
args.rets = &args.args[nargs];
memset(args.rets, 0, nret * sizeof(rtas_arg_t));
+ if (block_rtas_call(token, nargs, &args))
+ return -EINVAL;
+
/* Need to handle ibm,suspend_me call specially */
if (token == ibm_suspend_me_token) {
@@ -1090,6 +1234,9 @@ void __init rtas_initialize(void)
unsigned long rtas_region = RTAS_INSTANTIATE_MAX;
u32 base, size, entry;
int no_base, no_size, no_entry;
+#ifdef CONFIG_PPC_RTAS_FILTER
+ int i;
+#endif
/* Get RTAS dev node and fill up our "rtas" structure with infos
* about it.
@@ -1129,6 +1276,12 @@ void __init rtas_initialize(void)
#ifdef CONFIG_RTAS_ERROR_LOGGING
rtas_last_error_token = rtas_token("rtas-last-error");
#endif
+
+#ifdef CONFIG_PPC_RTAS_FILTER
+ for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
+ rtas_filters[i].token = rtas_token(rtas_filters[i].name);
+ }
+#endif
}
int __init early_init_dt_scan_rtas(unsigned long node,
--
2.20.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 156abe2961601d60a8c2a60c6dc8dd6ce7adcdaf Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Sat, 6 Jun 2020 11:31:50 +0200
Subject: [PATCH] pinctrl: baytrail: Fix pin being driven low for a while on
gpiod_get(..., GPIOD_OUT_HIGH)
The pins on the Bay Trail SoC have separate input-buffer and output-buffer
enable bits and a read of the level bit of the value register will always
return the value from the input-buffer.
The BIOS of a device may configure a pin in output-only mode, only enabling
the output buffer, and write 1 to the level bit to drive the pin high.
This 1 written to the level bit will be stored inside the data-latch of the
output buffer.
But a subsequent read of the value register will return 0 for the level bit
because the input-buffer is disabled. This causes a read-modify-write as
done by byt_gpio_set_direction() to write 0 to the level bit, driving the
pin low!
Before this commit byt_gpio_direction_output() relied on
pinctrl_gpio_direction_output() to set the direction, followed by a call
to byt_gpio_set() to apply the selected value. This causes the pin to
go low between the pinctrl_gpio_direction_output() and byt_gpio_set()
calls.
Change byt_gpio_direction_output() to directly make the register
modifications itself instead. Replacing the 2 subsequent writes to the
value register with a single write.
Note that the pinctrl code does not keep track internally of the direction,
so not going through pinctrl_gpio_direction_output() is not an issue.
This issue was noticed on a Trekstor SurfTab Twin 10.1. When the panel is
already on at boot (no external monitor connected), then the i915 driver
does a gpiod_get(..., GPIOD_OUT_HIGH) for the panel-enable GPIO. The
temporarily going low of that GPIO was causing the panel to reset itself
after which it would not show an image until it was turned off and back on
again (until a full modeset was done on it). This commit fixes this.
This commit also updates the byt_gpio_direction_input() to use direct
register accesses instead of going through pinctrl_gpio_direction_input(),
to keep it consistent with byt_gpio_direction_output().
Note for backporting, this commit depends on:
commit e2b74419e5cc ("pinctrl: baytrail: Replace WARN with dev_info_once
when setting direct-irq pin to output")
Cc: stable(a)vger.kernel.org
Fixes: 86e3ef812fe3 ("pinctrl: baytrail: Update gpio chip operations")
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Acked-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c
index e3ceb3dfeabe..a917a2df520e 100644
--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
+++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
@@ -800,6 +800,21 @@ static void byt_gpio_disable_free(struct pinctrl_dev *pctl_dev,
pm_runtime_put(vg->dev);
}
+static void byt_gpio_direct_irq_check(struct intel_pinctrl *vg,
+ unsigned int offset)
+{
+ void __iomem *conf_reg = byt_gpio_reg(vg, offset, BYT_CONF0_REG);
+
+ /*
+ * Before making any direction modifications, do a check if gpio is set
+ * for direct IRQ. On Bay Trail, setting GPIO to output does not make
+ * sense, so let's at least inform the caller before they shoot
+ * themselves in the foot.
+ */
+ if (readl(conf_reg) & BYT_DIRECT_IRQ_EN)
+ dev_info_once(vg->dev, "Potential Error: Setting GPIO with direct_irq_en to output");
+}
+
static int byt_gpio_set_direction(struct pinctrl_dev *pctl_dev,
struct pinctrl_gpio_range *range,
unsigned int offset,
@@ -807,7 +822,6 @@ static int byt_gpio_set_direction(struct pinctrl_dev *pctl_dev,
{
struct intel_pinctrl *vg = pinctrl_dev_get_drvdata(pctl_dev);
void __iomem *val_reg = byt_gpio_reg(vg, offset, BYT_VAL_REG);
- void __iomem *conf_reg = byt_gpio_reg(vg, offset, BYT_CONF0_REG);
unsigned long flags;
u32 value;
@@ -817,14 +831,8 @@ static int byt_gpio_set_direction(struct pinctrl_dev *pctl_dev,
value &= ~BYT_DIR_MASK;
if (input)
value |= BYT_OUTPUT_EN;
- else if (readl(conf_reg) & BYT_DIRECT_IRQ_EN)
- /*
- * Before making any direction modifications, do a check if gpio
- * is set for direct IRQ. On baytrail, setting GPIO to output
- * does not make sense, so let's at least inform the caller before
- * they shoot themselves in the foot.
- */
- dev_info_once(vg->dev, "Potential Error: Setting GPIO with direct_irq_en to output");
+ else
+ byt_gpio_direct_irq_check(vg, offset);
writel(value, val_reg);
@@ -1165,19 +1173,50 @@ static int byt_gpio_get_direction(struct gpio_chip *chip, unsigned int offset)
static int byt_gpio_direction_input(struct gpio_chip *chip, unsigned int offset)
{
- return pinctrl_gpio_direction_input(chip->base + offset);
+ struct intel_pinctrl *vg = gpiochip_get_data(chip);
+ void __iomem *val_reg = byt_gpio_reg(vg, offset, BYT_VAL_REG);
+ unsigned long flags;
+ u32 reg;
+
+ raw_spin_lock_irqsave(&byt_lock, flags);
+
+ reg = readl(val_reg);
+ reg &= ~BYT_DIR_MASK;
+ reg |= BYT_OUTPUT_EN;
+ writel(reg, val_reg);
+
+ raw_spin_unlock_irqrestore(&byt_lock, flags);
+ return 0;
}
+/*
+ * Note despite the temptation this MUST NOT be converted into a call to
+ * pinctrl_gpio_direction_output() + byt_gpio_set() that does not work this
+ * MUST be done as a single BYT_VAL_REG register write.
+ * See the commit message of the commit adding this comment for details.
+ */
static int byt_gpio_direction_output(struct gpio_chip *chip,
unsigned int offset, int value)
{
- int ret = pinctrl_gpio_direction_output(chip->base + offset);
+ struct intel_pinctrl *vg = gpiochip_get_data(chip);
+ void __iomem *val_reg = byt_gpio_reg(vg, offset, BYT_VAL_REG);
+ unsigned long flags;
+ u32 reg;
- if (ret)
- return ret;
+ raw_spin_lock_irqsave(&byt_lock, flags);
+
+ byt_gpio_direct_irq_check(vg, offset);
- byt_gpio_set(chip, offset, value);
+ reg = readl(val_reg);
+ reg &= ~BYT_DIR_MASK;
+ if (value)
+ reg |= BYT_LEVEL;
+ else
+ reg &= ~BYT_LEVEL;
+ writel(reg, val_reg);
+
+ raw_spin_unlock_irqrestore(&byt_lock, flags);
return 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca10845a56856fff4de3804c85e6424d0f6d0cde Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 1 Sep 2020 08:09:01 -0400
Subject: [PATCH] btrfs: sysfs: init devices outside of the chunk_mutex
While running btrfs/061, btrfs/073, btrfs/078, or btrfs/178 we hit the
following lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc3+ #4 Not tainted
------------------------------------------------------
kswapd0/100 is trying to acquire lock:
ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330
but task is already holding lock:
ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0x65/0x80
slab_pre_alloc_hook.constprop.0+0x20/0x200
kmem_cache_alloc+0x37/0x270
alloc_inode+0x82/0xb0
iget_locked+0x10d/0x2c0
kernfs_get_inode+0x1b/0x130
kernfs_get_tree+0x136/0x240
sysfs_get_tree+0x16/0x40
vfs_get_tree+0x28/0xc0
path_mount+0x434/0xc00
__x64_sys_mount+0xe3/0x120
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (kernfs_mutex){+.+.}-{3:3}:
__mutex_lock+0x7e/0x7e0
kernfs_add_one+0x23/0x150
kernfs_create_link+0x63/0xa0
sysfs_do_create_link_sd+0x5e/0xd0
btrfs_sysfs_add_devices_dir+0x81/0x130
btrfs_init_new_device+0x67f/0x1250
btrfs_ioctl+0x1ef/0x2e20
__x64_sys_ioctl+0x83/0xb0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
__mutex_lock+0x7e/0x7e0
btrfs_chunk_alloc+0x125/0x3a0
find_free_extent+0xdf6/0x1210
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb0/0x310
alloc_tree_block_no_bg_flush+0x4a/0x60
__btrfs_cow_block+0x11a/0x530
btrfs_cow_block+0x104/0x220
btrfs_search_slot+0x52e/0x9d0
btrfs_insert_empty_items+0x64/0xb0
btrfs_insert_delayed_items+0x90/0x4f0
btrfs_commit_inode_delayed_items+0x93/0x140
btrfs_log_inode+0x5de/0x2020
btrfs_log_inode_parent+0x429/0xc90
btrfs_log_new_name+0x95/0x9b
btrfs_rename2+0xbb9/0x1800
vfs_rename+0x64f/0x9f0
do_renameat2+0x320/0x4e0
__x64_sys_rename+0x1f/0x30
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
__lock_acquire+0x119c/0x1fc0
lock_acquire+0xa7/0x3d0
__mutex_lock+0x7e/0x7e0
__btrfs_release_delayed_node.part.0+0x3f/0x330
btrfs_evict_inode+0x24c/0x500
evict+0xcf/0x1f0
dispose_list+0x48/0x70
prune_icache_sb+0x44/0x50
super_cache_scan+0x161/0x1e0
do_shrink_slab+0x178/0x3c0
shrink_slab+0x17c/0x290
shrink_node+0x2b2/0x6d0
balance_pgdat+0x30a/0x670
kswapd+0x213/0x4c0
kthread+0x138/0x160
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Chain exists of:
&delayed_node->mutex --> kernfs_mutex --> fs_reclaim
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(fs_reclaim);
lock(kernfs_mutex);
lock(fs_reclaim);
lock(&delayed_node->mutex);
*** DEADLOCK ***
3 locks held by kswapd0/100:
#0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
#1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
#2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
stack backtrace:
CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8b/0xb8
check_noncircular+0x12d/0x150
__lock_acquire+0x119c/0x1fc0
lock_acquire+0xa7/0x3d0
? __btrfs_release_delayed_node.part.0+0x3f/0x330
__mutex_lock+0x7e/0x7e0
? __btrfs_release_delayed_node.part.0+0x3f/0x330
? __btrfs_release_delayed_node.part.0+0x3f/0x330
? lock_acquire+0xa7/0x3d0
? find_held_lock+0x2b/0x80
__btrfs_release_delayed_node.part.0+0x3f/0x330
btrfs_evict_inode+0x24c/0x500
evict+0xcf/0x1f0
dispose_list+0x48/0x70
prune_icache_sb+0x44/0x50
super_cache_scan+0x161/0x1e0
do_shrink_slab+0x178/0x3c0
shrink_slab+0x17c/0x290
shrink_node+0x2b2/0x6d0
balance_pgdat+0x30a/0x670
kswapd+0x213/0x4c0
? _raw_spin_unlock_irqrestore+0x41/0x50
? add_wait_queue_exclusive+0x70/0x70
? balance_pgdat+0x670/0x670
kthread+0x138/0x160
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x1f/0x30
This happens because we are holding the chunk_mutex at the time of
adding in a new device. However we only need to hold the
device_list_mutex, as we're going to iterate over the fs_devices
devices. Move the sysfs init stuff outside of the chunk_mutex to get
rid of this lockdep splat.
CC: stable(a)vger.kernel.org # 4.4.x: f3cd2c58110dad14e: btrfs: sysfs, rename device_link add/remove functions
CC: stable(a)vger.kernel.org # 4.4.x
Reported-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9d169cba8514..c86ffad04641 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2597,9 +2597,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
btrfs_set_super_num_devices(fs_info->super_copy,
orig_super_num_devices + 1);
- /* add sysfs device entry */
- btrfs_sysfs_add_devices_dir(fs_devices, device);
-
/*
* we've got more storage, clear any full flags on the space
* infos
@@ -2607,6 +2604,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
btrfs_clear_space_info_full(fs_info);
mutex_unlock(&fs_info->chunk_mutex);
+
+ /* Add sysfs device entry */
+ btrfs_sysfs_add_devices_dir(fs_devices, device);
+
mutex_unlock(&fs_devices->device_list_mutex);
if (seeding_dev) {
At usual case DC3CO exit happen automatically by DMC f/w whenever
PSR2 clears idle. This happens smoothly by DMC f/w to work with flips.
But there are certain scenario where DC3CO Disallowed by driver
asynchronous with flips. In such scenario display engine could
be already in DC3CO state and driver has disallowed it,
It initiates DC3CO exit sequence in DMC f/w which requires a
dc3co exit delay of 200us in driver.
It requires to protect intel_pipe_update_{update_end} with
dc3co exit delay.
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Anshuman Gupta <anshuman.gupta(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index ba26545392bc..3b81b98c0daf 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -15924,6 +15924,8 @@ static void intel_update_crtc(struct intel_atomic_state *state,
else
intel_fbc_enable(state, crtc);
+ /* Protect intel_pipe_update_{start,end} with power_domians lock */
+ mutex_lock(&dev_priv->power_domains.lock);
/* Perform vblank evasion around commit operation */
intel_pipe_update_start(new_crtc_state);
@@ -15935,6 +15937,7 @@ static void intel_update_crtc(struct intel_atomic_state *state,
i9xx_update_planes_on_crtc(state, crtc);
intel_pipe_update_end(new_crtc_state);
+ mutex_unlock(&dev_prive->power_domains.lock);
/*
* We usually enable FIFO underrun interrupts as part of the
--
2.26.2