This fixes a Spectre-v1/L1TF vulnerability in picdev_write().
It replaces index computations based on the (attacked-controlled) port
number with constants through a minor refactoring.
Fixes: commit 85f455f7ddbe ("KVM: Add support for in-kernel PIC emulation")
Signed-off-by: Nick Finco <nifi(a)google.com>
Signed-off-by: Marios Pomonis <pomonis(a)google.com>
Reviewed-by: Andrew Honig <ahonig(a)google.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/kvm/i8259.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 8b38bb4868a6..629a09ca9860 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -460,10 +460,14 @@ static int picdev_write(struct kvm_pic *s,
switch (addr) {
case 0x20:
case 0x21:
+ pic_lock(s);
+ pic_ioport_write(&s->pics[0], addr, data);
+ pic_unlock(s);
+ break;
case 0xa0:
case 0xa1:
pic_lock(s);
- pic_ioport_write(&s->pics[addr >> 7], addr, data);
+ pic_ioport_write(&s->pics[1], addr, data);
pic_unlock(s);
break;
case 0x4d0:
--
2.24.0.525.g8f36a354ae-goog
This is a note to let you know that I've just added the patch titled
chardev: Avoid potential use-after-free in 'chrdev_open()'
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 68faa679b8be1a74e6663c21c3a9d25d32f1c079 Mon Sep 17 00:00:00 2001
From: Will Deacon <will(a)kernel.org>
Date: Thu, 19 Dec 2019 12:02:03 +0000
Subject: chardev: Avoid potential use-after-free in 'chrdev_open()'
'chrdev_open()' calls 'cdev_get()' to obtain a reference to the
'struct cdev *' stashed in the 'i_cdev' field of the target inode
structure. If the pointer is NULL, then it is initialised lazily by
looking up the kobject in the 'cdev_map' and so the whole procedure is
protected by the 'cdev_lock' spinlock to serialise initialisation of
the shared pointer.
Unfortunately, it is possible for the initialising thread to fail *after*
installing the new pointer, for example if the subsequent '->open()' call
on the file fails. In this case, 'cdev_put()' is called, the reference
count on the kobject is dropped and, if nobody else has taken a reference,
the release function is called which finally clears 'inode->i_cdev' from
'cdev_purge()' before potentially freeing the object. The problem here
is that a racing thread can happily take the 'cdev_lock' and see the
non-NULL pointer in the inode, which can result in a refcount increment
from zero and a warning:
| ------------[ cut here ]------------
| refcount_t: addition on 0; use-after-free.
| WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
| Modules linked in:
| CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ #22
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
| RIP: 0010:refcount_warn_saturate+0x6d/0xf0
| Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08
| RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282
| RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000
| RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798
| RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039
| R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700
| R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700
| FS: 00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| Call Trace:
| kobject_get+0x5c/0x60
| cdev_get+0x2b/0x60
| chrdev_open+0x55/0x220
| ? cdev_put.part.3+0x20/0x20
| do_dentry_open+0x13a/0x390
| path_openat+0x2c8/0x1470
| do_filp_open+0x93/0x100
| ? selinux_file_ioctl+0x17f/0x220
| do_sys_open+0x186/0x220
| do_syscall_64+0x48/0x150
| entry_SYSCALL_64_after_hwframe+0x44/0xa9
| RIP: 0033:0x7f3b87efcd0e
| Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4
| RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
| RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e
| RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c
| RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e
| R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000
| ---[ end trace 24f53ca58db8180a ]---
Since 'cdev_get()' can already fail to obtain a reference, simply move
it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()',
which will cause the racing thread to return -ENXIO if the initialising
thread fails unexpectedly.
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Reported-by: syzbot+82defefbbd8527e1c2cb(a)syzkaller.appspotmail.com
Signed-off-by: Will Deacon <will(a)kernel.org>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20191219120203.32691-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/char_dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/char_dev.c b/fs/char_dev.c
index 00dfe17871ac..c5e6eff5a381 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -352,7 +352,7 @@ static struct kobject *cdev_get(struct cdev *p)
if (owner && !try_module_get(owner))
return NULL;
- kobj = kobject_get(&p->kobj);
+ kobj = kobject_get_unless_zero(&p->kobj);
if (!kobj)
module_put(owner);
return kobj;
--
2.24.1
The rseq.h UAPI documents that the rseq_cs field must be cleared
before reclaiming memory that contains the targeted struct rseq_cs.
We should extend this comment to also dictate that the rseq_cs field
must be cleared before reclaiming memory of the code pointed to by
the rseq_cs start_ip and post_commit_offset fields.
While we can expect that use of dlclose(3) will typically unmap
both struct rseq_cs and its associated code at once, nothing would
theoretically prevent a JIT from reclaiming the code without
reclaiming the struct rseq_cs, which would erroneously allow the
kernel to consider new code which is not a rseq critical section
as a rseq critical section following a code reclaim.
Suggested-by: Florian Weimer <fw(a)deneb.enyo.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Florian Weimer <fw(a)deneb.enyo.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: "Paul E. McKenney" <paulmck(a)linux.ibm.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Neel Natu <neelnatu(a)google.com>
Cc: linux-api(a)vger.kernel.org
---
include/uapi/linux/rseq.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index 9a402fdb60e9..d94afdfc4b7c 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -100,7 +100,9 @@ struct rseq {
* instruction sequence block, as well as when the kernel detects that
* it is preempting or delivering a signal outside of the range
* targeted by the rseq_cs. Also needs to be set to NULL by user-space
- * before reclaiming memory that contains the targeted struct rseq_cs.
+ * before reclaiming memory that contains the targeted struct rseq_cs
+ * or reclaiming memory that contains the code referred to by the
+ * start_ip and post_commit_offset fields of struct rseq_cs.
*
* Read and set by the kernel. Set by user-space with single-copy
* atomicity semantics. This field should only be updated by the
--
2.17.1
Be sure to initialise the uabi_instance on the virtual engine to the
special invalid value, just in case we ever peek at it from the uAPI.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Fixes: 750e76b4f9f6 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.4+
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 170b5a0139a3..f07d93514a7c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4601,9 +4601,11 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
ve->base.gt = siblings[0]->gt;
ve->base.uncore = siblings[0]->uncore;
ve->base.id = -1;
+
ve->base.class = OTHER_CLASS;
ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+ ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
/*
* The decision on whether to submit a request using semaphores
--
2.25.0.rc1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From de7999afedff02c6631feab3ea726a0e8f8c3d40 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 11 Dec 2019 09:01:40 +0000
Subject: [PATCH] Btrfs: fix infinite loop during nocow writeback due to race
When starting writeback for a range that covers part of a preallocated
extent, due to a race with writeback for another range that also covers
another part of the same preallocated extent, we can end up in an infinite
loop.
Consider the following example where for inode 280 we have two dirty
ranges:
range A, from 294912 to 303103, 8192 bytes
range B, from 348160 to 438271, 90112 bytes
and we have the following file extent item layout for our inode:
leaf 38895616 gen 24544 total ptrs 29 free space 13820 owner 5
(...)
item 27 key (280 108 200704) itemoff 14598 itemsize 53
extent data disk bytenr 0 nr 0 type 1 (regular)
extent data offset 0 nr 94208 ram 94208
item 28 key (280 108 294912) itemoff 14545 itemsize 53
extent data disk bytenr 10433052672 nr 81920 type 2 (prealloc)
extent data offset 0 nr 81920 ram 81920
Then the following happens:
1) Writeback starts for range B (from 348160 to 438271), execution of
run_delalloc_nocow() starts;
2) The first iteration of run_delalloc_nocow()'s whil loop leaves us at
the extent item at slot 28, pointing to the prealloc extent item
covering the range from 294912 to 376831. This extent covers part of
our range;
3) An ordered extent is created against that extent, covering the file
range from 348160 to 376831 (28672 bytes);
4) We adjust 'cur_offset' to 376832 and move on to the next iteration of
the while loop;
5) The call to btrfs_lookup_file_extent() leaves us at the same leaf,
pointing to slot 29, 1 slot after the last item (the extent item
we processed in the previous iteration);
6) Because we are a slot beyond the last item, we call btrfs_next_leaf(),
which releases the search path before doing a another search for the
last key of the leaf (280 108 294912);
7) Right after btrfs_next_leaf() released the path, and before it did
another search for the last key of the leaf, writeback for the range
A (from 294912 to 303103) completes (it was previously started at
some point);
8) Upon completion of the ordered extent for range A, the prealloc extent
we previously found got split into two extent items, one covering the
range from 294912 to 303103 (8192 bytes), with a type of regular extent
(and no longer prealloc) and another covering the range from 303104 to
376831 (73728 bytes), with a type of prealloc and an offset of 8192
bytes. So our leaf now has the following layout:
leaf 38895616 gen 24544 total ptrs 31 free space 13664 owner 5
(...)
item 27 key (280 108 200704) itemoff 14598 itemsize 53
extent data disk bytenr 0 nr 0 type 1
extent data offset 0 nr 8192 ram 94208
item 28 key (280 108 208896) itemoff 14545 itemsize 53
extent data disk bytenr 10433142784 nr 86016 type 1
extent data offset 0 nr 86016 ram 86016
item 29 key (280 108 294912) itemoff 14492 itemsize 53
extent data disk bytenr 10433052672 nr 81920 type 1
extent data offset 0 nr 8192 ram 81920
item 30 key (280 108 303104) itemoff 14439 itemsize 53
extent data disk bytenr 10433052672 nr 81920 type 2
extent data offset 8192 nr 73728 ram 81920
9) After btrfs_next_leaf() returns, we have our path pointing to that same
leaf and at slot 30, since it has a key we didn't have before and it's
the first key greater then the key that was previously the last key of
the leaf (key (280 108 294912));
10) The extent item at slot 30 covers the range from 303104 to 376831
which is in our target range, so we process it, despite having already
created an ordered extent against this extent for the file range from
348160 to 376831. This is because we skip to the next extent item only
if its end is less than or equals to the start of our delalloc range,
and not less than or equals to the current offset ('cur_offset');
11) As a result we compute 'num_bytes' as:
num_bytes = min(end + 1, extent_end) - cur_offset;
= min(438271 + 1, 376832) - 376832 = 0
12) We then call create_io_em() for a 0 bytes range starting at offset
376832;
13) Then create_io_em() enters an infinite loop because its calls to
btrfs_drop_extent_cache() do nothing due to the 0 length range
passed to it. So no existing extent maps that cover the offset
376832 get removed, and therefore calls to add_extent_mapping()
return -EEXIST, resulting in an infinite loop. This loop from
create_io_em() is the following:
do {
btrfs_drop_extent_cache(BTRFS_I(inode), em->start,
em->start + em->len - 1, 0);
write_lock(&em_tree->lock);
ret = add_extent_mapping(em_tree, em, 1);
write_unlock(&em_tree->lock);
/*
* The caller has taken lock_extent(), who could race with us
* to add em?
*/
} while (ret == -EEXIST);
Also, each call to btrfs_drop_extent_cache() triggers a warning because
the start offset passed to it (376832) is smaller then the end offset
(376832 - 1) passed to it by -1, due to the 0 length:
[258532.052621] ------------[ cut here ]------------
[258532.052643] WARNING: CPU: 0 PID: 9987 at fs/btrfs/file.c:602 btrfs_drop_extent_cache+0x3f4/0x590 [btrfs]
(...)
[258532.052672] CPU: 0 PID: 9987 Comm: fsx Tainted: G W 5.4.0-rc7-btrfs-next-64 #1
[258532.052673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[258532.052691] RIP: 0010:btrfs_drop_extent_cache+0x3f4/0x590 [btrfs]
(...)
[258532.052695] RSP: 0018:ffffb4be0153f860 EFLAGS: 00010287
[258532.052700] RAX: ffff975b445ee360 RBX: ffff975b44eb3e08 RCX: 0000000000000000
[258532.052700] RDX: 0000000000038fff RSI: 0000000000039000 RDI: ffff975b445ee308
[258532.052700] RBP: 0000000000038fff R08: 0000000000000000 R09: 0000000000000001
[258532.052701] R10: ffff975b513c5c10 R11: 00000000e3c0cfa9 R12: 0000000000039000
[258532.052703] R13: ffff975b445ee360 R14: 00000000ffffffef R15: ffff975b445ee308
[258532.052705] FS: 00007f86a821de80(0000) GS:ffff975b76a00000(0000) knlGS:0000000000000000
[258532.052707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[258532.052708] CR2: 00007fdacf0f3ab4 CR3: 00000001f9d26002 CR4: 00000000003606f0
[258532.052712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[258532.052717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[258532.052717] Call Trace:
[258532.052718] ? preempt_schedule_common+0x32/0x70
[258532.052722] ? ___preempt_schedule+0x16/0x20
[258532.052741] create_io_em+0xff/0x180 [btrfs]
[258532.052767] run_delalloc_nocow+0x942/0xb10 [btrfs]
[258532.052791] btrfs_run_delalloc_range+0x30b/0x520 [btrfs]
[258532.052812] ? find_lock_delalloc_range+0x221/0x250 [btrfs]
[258532.052834] writepage_delalloc+0xe4/0x140 [btrfs]
[258532.052855] __extent_writepage+0x110/0x4e0 [btrfs]
[258532.052876] extent_write_cache_pages+0x21c/0x480 [btrfs]
[258532.052906] extent_writepages+0x52/0xb0 [btrfs]
[258532.052911] do_writepages+0x23/0x80
[258532.052915] __filemap_fdatawrite_range+0xd2/0x110
[258532.052938] btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
[258532.052954] start_ordered_ops+0x57/0xa0 [btrfs]
[258532.052973] ? btrfs_sync_file+0x225/0x490 [btrfs]
[258532.052988] btrfs_sync_file+0x225/0x490 [btrfs]
[258532.052997] __x64_sys_msync+0x199/0x200
[258532.053004] do_syscall_64+0x5c/0x250
[258532.053007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[258532.053010] RIP: 0033:0x7f86a7dfd760
(...)
[258532.053014] RSP: 002b:00007ffd99af0368 EFLAGS: 00000246 ORIG_RAX: 000000000000001a
[258532.053016] RAX: ffffffffffffffda RBX: 0000000000000ec9 RCX: 00007f86a7dfd760
[258532.053017] RDX: 0000000000000004 RSI: 000000000000836c RDI: 00007f86a8221000
[258532.053019] RBP: 0000000000021ec9 R08: 0000000000000003 R09: 00007f86a812037c
[258532.053020] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000000074a3
[258532.053021] R13: 00007f86a8221000 R14: 000000000000836c R15: 0000000000000001
[258532.053032] irq event stamp: 1653450494
[258532.053035] hardirqs last enabled at (1653450493): [<ffffffff9dec69f9>] _raw_spin_unlock_irq+0x29/0x50
[258532.053037] hardirqs last disabled at (1653450494): [<ffffffff9d4048ea>] trace_hardirqs_off_thunk+0x1a/0x20
[258532.053039] softirqs last enabled at (1653449852): [<ffffffff9e200466>] __do_softirq+0x466/0x6bd
[258532.053042] softirqs last disabled at (1653449845): [<ffffffff9d4c8a0c>] irq_exit+0xec/0x120
[258532.053043] ---[ end trace 8476fce13d9ce20a ]---
Which results in flooding dmesg/syslog since btrfs_drop_extent_cache()
uses WARN_ON() and not WARN_ON_ONCE().
So fix this issue by changing run_delalloc_nocow()'s loop to move to the
next extent item when the current extent item ends at at offset less than
or equals to the current offset instead of the start offset.
Fixes: 80ff385665b7fc ("Btrfs: update nodatacow code v2")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e3c76645cad7..5509c41a4f43 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1479,10 +1479,10 @@ next_slot:
disk_num_bytes =
btrfs_file_extent_disk_num_bytes(leaf, fi);
/*
- * If extent we got ends before our range starts, skip
- * to next extent
+ * If the extent we got ends before our current offset,
+ * skip to the next extent.
*/
- if (extent_end <= start) {
+ if (extent_end <= cur_offset) {
path->slots[0]++;
goto next_slot;
}