I didn't receive any replies on this, and it doesn't seem to have made
it into the latest 3.18 or 4.4 releases. Previously sent on Aug 23rd:
https://lists.linaro.org/pipermail/linux-stable-mirror/2018-August/056347.h…
I assume I sent it wrong; maybe missing keywords or recipients? If
anyone can tell me what I missed, I'd appreciate it.
Trivial backport of commit 3e536e222f293053; newer kernels have simply
moved the vararg macros.
Testing: 3.18 and 4.4 booted OK in qemu.
>8------------------------------------------------------8<
[backport of commit 3e536e222f293053 from mainline]
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Cc: stable(a)vger.kernel.org
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
[backported to 3.18 / 4.4 by Snild]
Signed-off-by: Snild Dolkow <snild(a)sony.com>
---
kernel/kthread.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 850b255..ac6849e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -313,10 +313,16 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
+ char name[TASK_COMM_LEN];
va_list args;
va_start(args, namefmt);
- vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
+ /*
+ * task is already visible to other tasks, so updating
+ * COMM must be protected.
+ */
+ vsnprintf(name, sizeof(name), namefmt, args);
+ set_task_comm(task, name);
va_end(args);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
--
2.7.4
Trivial backport of commit 3e536e222f293053; newer kernels have simply
moved the vararg macros.
Testing: 3.18 and 4.4 booted OK in qemu.
>8------------------------------------------------------8<
[backport of commit 3e536e222f293053 from mainline]
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Cc: stable(a)vger.kernel.org
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
[backported to 3.18 / 4.4 by Snild]
Signed-off-by: Snild Dolkow <snild(a)sony.com>
---
kernel/kthread.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 850b255..ac6849e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -313,10 +313,16 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
+ char name[TASK_COMM_LEN];
va_list args;
va_start(args, namefmt);
- vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
+ /*
+ * task is already visible to other tasks, so updating
+ * COMM must be protected.
+ */
+ vsnprintf(name, sizeof(name), namefmt, args);
+ set_task_comm(task, name);
va_end(args);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
--
2.7.4
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8a9dbb779fe882325b9a0238494a7afaff2eb444 Mon Sep 17 00:00:00 2001
From: Chanwoo Choi <cw00.choi(a)samsung.com>
Date: Thu, 14 Jun 2018 11:16:29 +0900
Subject: [PATCH] extcon: Release locking when sending the notification of
connector state
Previously, extcon used the spinlock before calling the notifier_call_chain
to prevent the scheduled out of task and to prevent the notification delay.
When spinlock is locked for sending the notification, deadlock issue
occured on the side of extcon consumer device. To fix this issue,
extcon consumer device should always use the work. it is always not
reasonable to use work.
To fix this issue on extcon consumer device, release locking when sending
the notification of connector state.
Fixes: ab11af049f88 ("extcon: Add the synchronization extcon APIs to support the notification")
Cc: stable(a)vger.kernel.org
Cc: Roger Quadros <rogerq(a)ti.com>
Cc: Kishon Vijay Abraham I <kishon(a)ti.com>
Signed-off-by: Chanwoo Choi <cw00.choi(a)samsung.com>
diff --git a/drivers/extcon/extcon.c b/drivers/extcon/extcon.c
index af83ad58819c..b9d27c8fe57e 100644
--- a/drivers/extcon/extcon.c
+++ b/drivers/extcon/extcon.c
@@ -433,8 +433,8 @@ int extcon_sync(struct extcon_dev *edev, unsigned int id)
return index;
spin_lock_irqsave(&edev->lock, flags);
-
state = !!(edev->state & BIT(index));
+ spin_unlock_irqrestore(&edev->lock, flags);
/*
* Call functions in a raw notifier chain for the specific one
@@ -448,6 +448,7 @@ int extcon_sync(struct extcon_dev *edev, unsigned int id)
*/
raw_notifier_call_chain(&edev->nh_all, state, edev);
+ spin_lock_irqsave(&edev->lock, flags);
/* This could be in interrupt handler */
prop_buf = (char *)get_zeroed_page(GFP_ATOMIC);
if (!prop_buf) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5b1fe7bec8a8d0cc547a22e7ddc2bd59acd67de4 Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Thu, 9 Aug 2018 12:38:28 +0200
Subject: [PATCH] dm cache metadata: set dirty on all cache blocks after a
crash
Quoting Documentation/device-mapper/cache.txt:
The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly. So we treat it as a hint. In normal
operation it will be written when the dm device is suspended. If the
system crashes all cache blocks will be assumed dirty when restarted.
This got broken in commit f177940a8091 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks. This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device. After the
crash those blocks are considered clean and may get evicted from the
cache at any time. This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:
### Disable auto-activation in lvm.conf to be able to do uncache in
### time (i.e. see uncache doing flushing) when the fix is applied.
# xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
# vgcreate vg_cache /dev/vdb /dev/vdc
# lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
# lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
# lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
# lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
# lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
# xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# dmsetup status vg_cache-lv_slowdev
0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
^^^^
7065 * 64k = 441M yet to be written to the slow device
# echo b >/proc/sysrq-trigger
# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 0 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
This is the case with both v1 and v2 cache pool metatata formats.
After applying this patch:
# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 3724 blocks for cache vg_cache/lv_slowdev.
...
Flushing 71 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Cc: stable(a)vger.kernel.org
Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 1a449105b007..69dddeab124c 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -1323,6 +1323,7 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
dm_oblock_t oblock;
unsigned flags;
+ bool dirty = true;
dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1333,8 +1334,10 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
memcpy(&hint, hint_value_le, sizeof(hint));
}
+ if (cmd->clean_when_opened)
+ dirty = flags & M_DIRTY;
- r = fn(context, oblock, to_cblock(cb), flags & M_DIRTY,
+ r = fn(context, oblock, to_cblock(cb), dirty,
le32_to_cpu(hint), hints_valid);
if (r) {
DMERR("policy couldn't load cache block %llu",
@@ -1362,7 +1365,7 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
dm_oblock_t oblock;
unsigned flags;
- bool dirty;
+ bool dirty = true;
dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1373,8 +1376,9 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
memcpy(&hint, hint_value_le, sizeof(hint));
}
+ if (cmd->clean_when_opened)
+ dirty = dm_bitset_cursor_get_value(dirty_cursor);
- dirty = dm_bitset_cursor_get_value(dirty_cursor);
r = fn(context, oblock, to_cblock(cb), dirty,
le32_to_cpu(hint), hints_valid);
if (r) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 75294442d896f2767be34f75aca7cc2b0d01301f Mon Sep 17 00:00:00 2001
From: Hou Tao <houtao1(a)huawei.com>
Date: Thu, 2 Aug 2018 16:18:24 +0800
Subject: [PATCH] dm thin: stop no_space_timeout worker when switching to
write-mode
Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space. If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.
precondition:
* error_if_no_space = false (aka "queue_if_no_space")
* pool is in Out-of-Data-Space (OODS) mode
* no_space_timeout worker has been queued
CPU 0: CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
pool->pf.error_if_no_space = \
pt->requested_pf.error_if_no_space
// timeout, pool is still in OODS mode
do_no_space_timeout
// "queue_if_no_space" config is lost
pool->pf.error_if_no_space = true
pool->pf.mode = new_mode
Fix it by stopping no_space_timeout worker when switching to write mode.
Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 5997d6808b57..7bd60a150f8f 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2503,6 +2503,8 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
case PM_WRITE:
if (old_mode != new_mode)
notify_of_pool_mode_change(pool, "write");
+ if (old_mode == PM_OUT_OF_DATA_SPACE)
+ cancel_delayed_work_sync(&pool->no_space_timeout);
pool->out_of_data_space = false;
pool->pf.error_if_no_space = pt->requested_pf.error_if_no_space;
dm_pool_metadata_read_write(pool->pmd);
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 05f58ceba123bdb420cf44c6ea04b6db467edd1c Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro(a)mellanox.com>
Date: Sun, 8 Jul 2018 13:50:21 +0300
Subject: [PATCH] RDMA/mlx5: Check that supplied blue flame index doesn't
overflow
User's supplied index is checked again total number of system pages, but
this number already includes num_static_sys_pages, so addition of that
value to supplied index causes to below error while trying to access
sys_pages[].
BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
Read of size 4 at addr ffff880065561904 by task syz-executor446/314
CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xef/0x17e
print_address_description+0x83/0x3b0
kasan_report+0x18d/0x4d0
bfregn_to_uar_index+0x34f/0x400
create_user_qp+0x272/0x227d
create_qp_common+0x32eb/0x43e0
mlx5_ib_create_qp+0x379/0x1ca0
create_qp.isra.5+0xc94/0x22d0
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0xc2c/0x1010
vfs_write+0x1b0/0x550
ksys_write+0xc6/0x1a0
do_syscall_64+0xa7/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433679
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006
Allocated by task 314:
kasan_kmalloc+0xa0/0xd0
__kmalloc+0x1a9/0x510
mlx5_ib_alloc_ucontext+0x966/0x2620
ib_uverbs_get_context+0x23f/0xa60
ib_uverbs_write+0xc2c/0x1010
__vfs_write+0x10d/0x720
vfs_write+0x1b0/0x550
ksys_write+0xc6/0x1a0
do_syscall_64+0xa7/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 1:
__kasan_slab_free+0x12e/0x180
kfree+0x159/0x630
kvfree+0x37/0x50
single_release+0x8e/0xf0
__fput+0x2d8/0x900
task_work_run+0x102/0x1f0
exit_to_usermode_loop+0x159/0x1c0
do_syscall_64+0x408/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff880065561100
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2052 bytes inside of
4096-byte region [ffff880065561100, ffff880065562100)
The buggy address belongs to the page:
page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Cc: <stable(a)vger.kernel.org> # 4.15
Fixes: 1ee47ab3e8d8 ("IB/mlx5: Enable QP creation with a given blue flame index")
Reported-by: Noa Osherovich <noaos(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 93087409f4b8..04a5d82c9cf3 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1330,6 +1330,6 @@ unsigned long mlx5_ib_get_xlt_emergency_page(void);
void mlx5_ib_put_xlt_emergency_page(void);
int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
- struct mlx5_bfreg_info *bfregi, int bfregn,
+ struct mlx5_bfreg_info *bfregi, u32 bfregn,
bool dyn_bfreg);
#endif /* MLX5_IB_H */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 51e68ca20215..d4414015b64f 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -631,22 +631,23 @@ static void mlx5_ib_unlock_cqs(struct mlx5_ib_cq *send_cq,
struct mlx5_ib_cq *recv_cq);
int bfregn_to_uar_index(struct mlx5_ib_dev *dev,
- struct mlx5_bfreg_info *bfregi, int bfregn,
+ struct mlx5_bfreg_info *bfregi, u32 bfregn,
bool dyn_bfreg)
{
- int bfregs_per_sys_page;
- int index_of_sys_page;
- int offset;
+ unsigned int bfregs_per_sys_page;
+ u32 index_of_sys_page;
+ u32 offset;
bfregs_per_sys_page = get_uars_per_sys_page(dev, bfregi->lib_uar_4k) *
MLX5_NON_FP_BFREGS_PER_UAR;
index_of_sys_page = bfregn / bfregs_per_sys_page;
- if (index_of_sys_page >= bfregi->num_sys_pages)
- return -EINVAL;
-
if (dyn_bfreg) {
index_of_sys_page += bfregi->num_static_sys_pages;
+
+ if (index_of_sys_page >= bfregi->num_sys_pages)
+ return -EINVAL;
+
if (bfregn > bfregi->num_dyn_bfregs ||
bfregi->sys_pages[index_of_sys_page] == MLX5_IB_INVALID_UAR_INDEX) {
mlx5_ib_dbg(dev, "Invalid dynamic uar index\n");
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 94675cceacaec27a30eefb142c4c59a9d3131742 Mon Sep 17 00:00:00 2001
From: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
Date: Wed, 4 Jul 2018 23:27:21 +0530
Subject: [PATCH] powerpc/pseries: Defer the logging of rtas error to irq work
queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.
On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.
Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.
Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable(a)vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..ad4f16164619 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -247,6 +247,9 @@ struct paca_struct {
void *rfi_flush_fallback_area;
u64 l1d_flush_size;
#endif
+#ifdef CONFIG_PPC_PSERIES
+ u8 *mce_data_buf; /* buffer to hold per cpu rtas errlog */
+#endif /* CONFIG_PPC_PSERIES */
} ____cacheline_aligned;
extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index ef104144d4bc..14a46b07ab2f 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -22,6 +22,7 @@
#include <linux/of.h>
#include <linux/fs.h>
#include <linux/reboot.h>
+#include <linux/irq_work.h>
#include <asm/machdep.h>
#include <asm/rtas.h>
@@ -32,11 +33,13 @@
static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
static DEFINE_SPINLOCK(ras_log_buf_lock);
-static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
-static DEFINE_PER_CPU(__u64, mce_data_buf);
-
static int ras_check_exception_token;
+static void mce_process_errlog_event(struct irq_work *work);
+static struct irq_work mce_errlog_process_work = {
+ .func = mce_process_errlog_event,
+};
+
#define EPOW_SENSOR_TOKEN 9
#define EPOW_SENSOR_INDEX 0
@@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
+static inline struct rtas_error_log *fwnmi_get_errlog(void)
+{
+ return (struct rtas_error_log *)local_paca->mce_data_buf;
+}
+
/*
* Get the error information for errors coming through the
* FWNMI vectors. The pt_regs' r3 will be updated to reflect
* the actual r3 if possible, and a ptr to the error log entry
* will be returned if found.
*
- * If the RTAS error is not of the extended type, then we put it in a per
- * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
+ * Use one buffer mce_data_buf per cpu to store RTAS error.
*
- * The global_mce_data_buf does not have any locks or protection around it,
+ * The mce_data_buf does not have any locks or protection around it,
* if a second machine check comes in, or a system reset is done
* before we have logged the error, then we will get corruption in the
* error log. This is preferable over holding off on calling
@@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
{
unsigned long *savep;
- struct rtas_error_log *h, *errhdr = NULL;
+ struct rtas_error_log *h;
/* Mask top two bits */
regs->gpr[3] &= ~(0x3UL << 62);
@@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
savep = __va(regs->gpr[3]);
regs->gpr[3] = savep[0]; /* restore original r3 */
- /* If it isn't an extended log we can use the per cpu 64bit buffer */
h = (struct rtas_error_log *)&savep[1];
+ /* Use the per cpu buffer from paca to store rtas error log */
+ memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
if (!rtas_error_extended(h)) {
- memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
- errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
+ memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
} else {
int len, error_log_length;
error_log_length = 8 + rtas_error_extended_log_length(h);
len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
- memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
- memcpy(global_mce_data_buf, h, len);
- errhdr = (struct rtas_error_log *)global_mce_data_buf;
+ memcpy(local_paca->mce_data_buf, h, len);
}
- return errhdr;
+ return (struct rtas_error_log *)local_paca->mce_data_buf;
}
/* Call this when done with the data returned by FWNMI_get_errinfo.
@@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
return 0; /* need to perform reset */
}
+/*
+ * Process MCE rtas errlog event.
+ */
+static void mce_process_errlog_event(struct irq_work *work)
+{
+ struct rtas_error_log *err;
+
+ err = fwnmi_get_errlog();
+ log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+}
+
/*
* See if we can recover from a machine check exception.
* This is only called on power4 (or above) and only via
@@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
recovered = 1;
}
- log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+ /* Queue irq work to log this rtas event later. */
+ irq_work_queue(&mce_errlog_process_work);
return recovered;
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 9948ad16f788..b411a74b861d 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -41,6 +41,7 @@
#include <linux/root_dev.h>
#include <linux/of.h>
#include <linux/of_pci.h>
+#include <linux/memblock.h>
#include <asm/mmu.h>
#include <asm/processor.h>
@@ -102,6 +103,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
static void __init fwnmi_init(void)
{
unsigned long system_reset_addr, machine_check_addr;
+ u8 *mce_data_buf;
+ unsigned int i;
+ int nr_cpus = num_possible_cpus();
int ibm_nmi_register = rtas_token("ibm,nmi-register");
if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
@@ -115,6 +119,18 @@ static void __init fwnmi_init(void)
if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
machine_check_addr))
fwnmi_active = 1;
+
+ /*
+ * Allocate a chunk for per cpu buffer to hold rtas errorlog.
+ * It will be used in real mode mce handler, hence it needs to be
+ * below RMA.
+ */
+ mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
+ RTAS_ERROR_LOG_MAX, ppc64_rma_size));
+ for_each_possible_cpu(i) {
+ paca_ptrs[i]->mce_data_buf = mce_data_buf +
+ (RTAS_ERROR_LOG_MAX * i);
+ }
}
static void pseries_8259_cascade(struct irq_desc *desc)
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 94675cceacaec27a30eefb142c4c59a9d3131742 Mon Sep 17 00:00:00 2001
From: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
Date: Wed, 4 Jul 2018 23:27:21 +0530
Subject: [PATCH] powerpc/pseries: Defer the logging of rtas error to irq work
queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.
On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.
Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.
Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable(a)vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..ad4f16164619 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -247,6 +247,9 @@ struct paca_struct {
void *rfi_flush_fallback_area;
u64 l1d_flush_size;
#endif
+#ifdef CONFIG_PPC_PSERIES
+ u8 *mce_data_buf; /* buffer to hold per cpu rtas errlog */
+#endif /* CONFIG_PPC_PSERIES */
} ____cacheline_aligned;
extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index ef104144d4bc..14a46b07ab2f 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -22,6 +22,7 @@
#include <linux/of.h>
#include <linux/fs.h>
#include <linux/reboot.h>
+#include <linux/irq_work.h>
#include <asm/machdep.h>
#include <asm/rtas.h>
@@ -32,11 +33,13 @@
static unsigned char ras_log_buf[RTAS_ERROR_LOG_MAX];
static DEFINE_SPINLOCK(ras_log_buf_lock);
-static char global_mce_data_buf[RTAS_ERROR_LOG_MAX];
-static DEFINE_PER_CPU(__u64, mce_data_buf);
-
static int ras_check_exception_token;
+static void mce_process_errlog_event(struct irq_work *work);
+static struct irq_work mce_errlog_process_work = {
+ .func = mce_process_errlog_event,
+};
+
#define EPOW_SENSOR_TOKEN 9
#define EPOW_SENSOR_INDEX 0
@@ -330,16 +333,20 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
((((A) >= 0x7000) && ((A) < 0x7ff0)) || \
(((A) >= rtas.base) && ((A) < (rtas.base + rtas.size - 16))))
+static inline struct rtas_error_log *fwnmi_get_errlog(void)
+{
+ return (struct rtas_error_log *)local_paca->mce_data_buf;
+}
+
/*
* Get the error information for errors coming through the
* FWNMI vectors. The pt_regs' r3 will be updated to reflect
* the actual r3 if possible, and a ptr to the error log entry
* will be returned if found.
*
- * If the RTAS error is not of the extended type, then we put it in a per
- * cpu 64bit buffer. If it is the extended type we use global_mce_data_buf.
+ * Use one buffer mce_data_buf per cpu to store RTAS error.
*
- * The global_mce_data_buf does not have any locks or protection around it,
+ * The mce_data_buf does not have any locks or protection around it,
* if a second machine check comes in, or a system reset is done
* before we have logged the error, then we will get corruption in the
* error log. This is preferable over holding off on calling
@@ -349,7 +356,7 @@ static irqreturn_t ras_error_interrupt(int irq, void *dev_id)
static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
{
unsigned long *savep;
- struct rtas_error_log *h, *errhdr = NULL;
+ struct rtas_error_log *h;
/* Mask top two bits */
regs->gpr[3] &= ~(0x3UL << 62);
@@ -362,22 +369,20 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
savep = __va(regs->gpr[3]);
regs->gpr[3] = savep[0]; /* restore original r3 */
- /* If it isn't an extended log we can use the per cpu 64bit buffer */
h = (struct rtas_error_log *)&savep[1];
+ /* Use the per cpu buffer from paca to store rtas error log */
+ memset(local_paca->mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
if (!rtas_error_extended(h)) {
- memcpy(this_cpu_ptr(&mce_data_buf), h, sizeof(__u64));
- errhdr = (struct rtas_error_log *)this_cpu_ptr(&mce_data_buf);
+ memcpy(local_paca->mce_data_buf, h, sizeof(__u64));
} else {
int len, error_log_length;
error_log_length = 8 + rtas_error_extended_log_length(h);
len = min_t(int, error_log_length, RTAS_ERROR_LOG_MAX);
- memset(global_mce_data_buf, 0, RTAS_ERROR_LOG_MAX);
- memcpy(global_mce_data_buf, h, len);
- errhdr = (struct rtas_error_log *)global_mce_data_buf;
+ memcpy(local_paca->mce_data_buf, h, len);
}
- return errhdr;
+ return (struct rtas_error_log *)local_paca->mce_data_buf;
}
/* Call this when done with the data returned by FWNMI_get_errinfo.
@@ -422,6 +427,17 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
return 0; /* need to perform reset */
}
+/*
+ * Process MCE rtas errlog event.
+ */
+static void mce_process_errlog_event(struct irq_work *work)
+{
+ struct rtas_error_log *err;
+
+ err = fwnmi_get_errlog();
+ log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+}
+
/*
* See if we can recover from a machine check exception.
* This is only called on power4 (or above) and only via
@@ -466,7 +482,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
recovered = 1;
}
- log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
+ /* Queue irq work to log this rtas event later. */
+ irq_work_queue(&mce_errlog_process_work);
return recovered;
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 9948ad16f788..b411a74b861d 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -41,6 +41,7 @@
#include <linux/root_dev.h>
#include <linux/of.h>
#include <linux/of_pci.h>
+#include <linux/memblock.h>
#include <asm/mmu.h>
#include <asm/processor.h>
@@ -102,6 +103,9 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
static void __init fwnmi_init(void)
{
unsigned long system_reset_addr, machine_check_addr;
+ u8 *mce_data_buf;
+ unsigned int i;
+ int nr_cpus = num_possible_cpus();
int ibm_nmi_register = rtas_token("ibm,nmi-register");
if (ibm_nmi_register == RTAS_UNKNOWN_SERVICE)
@@ -115,6 +119,18 @@ static void __init fwnmi_init(void)
if (0 == rtas_call(ibm_nmi_register, 2, 1, NULL, system_reset_addr,
machine_check_addr))
fwnmi_active = 1;
+
+ /*
+ * Allocate a chunk for per cpu buffer to hold rtas errorlog.
+ * It will be used in real mode mce handler, hence it needs to be
+ * below RMA.
+ */
+ mce_data_buf = __va(memblock_alloc_base(RTAS_ERROR_LOG_MAX * nr_cpus,
+ RTAS_ERROR_LOG_MAX, ppc64_rma_size));
+ for_each_possible_cpu(i) {
+ paca_ptrs[i]->mce_data_buf = mce_data_buf +
+ (RTAS_ERROR_LOG_MAX * i);
+ }
}
static void pseries_8259_cascade(struct irq_desc *desc)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4231aba000f5a4583dd9f67057aadb68c3eca99d Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Fri, 27 Jul 2018 21:48:17 +1000
Subject: [PATCH] powerpc/64s: Fix page table fragment refcount race vs
speculative references
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Cc: stable(a)vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 8b24168ea8c4..4a892d894a0f 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,9 +200,9 @@ static void pte_frag_destroy(void *pte_frag)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -215,9 +215,9 @@ static void pmd_frag_destroy(void *pmd_frag)
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PMD_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..78d0b3d5ebad 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -270,6 +270,8 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
+
ret = page_address(page);
/*
* if we support only one fragment just return the
@@ -285,7 +287,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
* count.
*/
if (likely(!mm->context.pmd_frag)) {
- set_page_count(page, PMD_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -308,9 +310,10 @@ void pmd_fragment_free(unsigned long *pmd)
{
struct page *page = virt_to_page(pmd);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -352,6 +355,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
ret = page_address(page);
/*
@@ -367,7 +371,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- set_page_count(page, PTE_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -390,10 +394,11 @@ void pte_fragment_free(unsigned long *table, int kernel)
{
struct page *page = virt_to_page(table);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
if (!kernel)
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..22651e124071 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,10 @@ struct page {
unsigned long _pt_pad_1; /* compound_head */
pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
- struct mm_struct *pt_mm; /* x86 pgds only */
+ union {
+ struct mm_struct *pt_mm; /* x86 pgds only */
+ atomic_t pt_frag_refcount; /* powerpc */
+ };
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4231aba000f5a4583dd9f67057aadb68c3eca99d Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Fri, 27 Jul 2018 21:48:17 +1000
Subject: [PATCH] powerpc/64s: Fix page table fragment refcount race vs
speculative references
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Cc: stable(a)vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 8b24168ea8c4..4a892d894a0f 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,9 +200,9 @@ static void pte_frag_destroy(void *pte_frag)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -215,9 +215,9 @@ static void pmd_frag_destroy(void *pmd_frag)
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PMD_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..78d0b3d5ebad 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -270,6 +270,8 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
+
ret = page_address(page);
/*
* if we support only one fragment just return the
@@ -285,7 +287,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
* count.
*/
if (likely(!mm->context.pmd_frag)) {
- set_page_count(page, PMD_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -308,9 +310,10 @@ void pmd_fragment_free(unsigned long *pmd)
{
struct page *page = virt_to_page(pmd);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -352,6 +355,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
ret = page_address(page);
/*
@@ -367,7 +371,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- set_page_count(page, PTE_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -390,10 +394,11 @@ void pte_fragment_free(unsigned long *table, int kernel)
{
struct page *page = virt_to_page(table);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
if (!kernel)
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..22651e124071 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,10 @@ struct page {
unsigned long _pt_pad_1; /* compound_head */
pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
- struct mm_struct *pt_mm; /* x86 pgds only */
+ union {
+ struct mm_struct *pt_mm; /* x86 pgds only */
+ atomic_t pt_frag_refcount; /* powerpc */
+ };
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4231aba000f5a4583dd9f67057aadb68c3eca99d Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Fri, 27 Jul 2018 21:48:17 +1000
Subject: [PATCH] powerpc/64s: Fix page table fragment refcount race vs
speculative references
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Cc: stable(a)vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 8b24168ea8c4..4a892d894a0f 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,9 +200,9 @@ static void pte_frag_destroy(void *pte_frag)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -215,9 +215,9 @@ static void pmd_frag_destroy(void *pmd_frag)
/* drop all the pending references */
count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- if (page_ref_sub_and_test(page, PMD_FRAG_NR - count)) {
+ if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 4afbfbb64bfd..78d0b3d5ebad 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -270,6 +270,8 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
+
ret = page_address(page);
/*
* if we support only one fragment just return the
@@ -285,7 +287,7 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
* count.
*/
if (likely(!mm->context.pmd_frag)) {
- set_page_count(page, PMD_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -308,9 +310,10 @@ void pmd_fragment_free(unsigned long *pmd)
{
struct page *page = virt_to_page(pmd);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
pgtable_pmd_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
@@ -352,6 +355,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
return NULL;
}
+ atomic_set(&page->pt_frag_refcount, 1);
ret = page_address(page);
/*
@@ -367,7 +371,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- set_page_count(page, PTE_FRAG_NR);
+ atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
@@ -390,10 +394,11 @@ void pte_fragment_free(unsigned long *table, int kernel)
{
struct page *page = virt_to_page(table);
- if (put_page_testzero(page)) {
+ BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
+ if (atomic_dec_and_test(&page->pt_frag_refcount)) {
if (!kernel)
pgtable_page_dtor(page);
- free_unref_page(page);
+ __free_page(page);
}
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 99ce070e7dcb..22651e124071 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -139,7 +139,10 @@ struct page {
unsigned long _pt_pad_1; /* compound_head */
pgtable_t pmd_huge_pte; /* protected by page->ptl */
unsigned long _pt_pad_2; /* mapping */
- struct mm_struct *pt_mm; /* x86 pgds only */
+ union {
+ struct mm_struct *pt_mm; /* x86 pgds only */
+ atomic_t pt_frag_refcount; /* powerpc */
+ };
#if ALLOC_SPLIT_PTLOCKS
spinlock_t *ptl;
#else
From: Andi Kleen <ak(a)linux.intel.com>
Mostly recycling the commit log from adaba23ccd7d which fixed
populate_pmd, but did not fix populate_pud. The same problem exists
there.
Stable trees reverted the following patch:
Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers"
This reverts commit 87e2bd898d3a79a8c609f183180adac47879a2a4 which is
commit edc3b9129cecd0f0857112136f5b8b1bc1d45918 upstream.
but the L1TF patch 02ff2769edbc backported here
x86/mm/pat: Make set_memory_np() L1TF safe
commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
set_memory_np() is used to mark kernel mappings not present, but it has
it's own open coded mechanism which does not have the L1TF protection of
inverting the address bits.
assumed that cpa->pfn contains a PFN. With the above patch reverted
it does not, which causes the PUD to be set to an incorrect address
shifted by 12 bits, which can cause various failures.
Convert the address to a PFN before passing it to pud_pfn().
This is a 4.4 stable only patch to fix the L1TF patches backport there.
Cc: stable(a)vger.kernel.org # 4.4-only
Cc: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
---
arch/x86/mm/pageattr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1007fa80f5a6..0e1dd7d47f05 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1079,7 +1079,7 @@ static int populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd,
* Map everything starting from the Gb boundary, possibly with 1G pages
*/
while (end - start >= PUD_SIZE) {
- set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn,
+ set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn >> PAGE_SHIFT,
canon_pgprot(pud_pgprot))));
start += PUD_SIZE;
--
2.18.0
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7444a8092906ed44c09459780c56ba57043e39b1 Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel(a)zonque.org>
Date: Wed, 27 Jun 2018 20:58:45 +0200
Subject: [PATCH] libertas: fix suspend and resume for SDIO connected cards
Prior to commit 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM
callbacks from the sdio bus"), the MMC core used to call into the power
management functions of SDIO clients itself and removed the card if the
return code was non-zero. IOW, the mmc handled errors gracefully and didn't
upchain them to the pm core.
Since this change, the mmc core relies on generic power management
functions which treat all errors as a reason to cancel the suspend
immediately. This causes suspend attempts to fail when the libertas
driver is loaded.
To fix this, power down the card explicitly in if_sdio_suspend() when we
know we're about to lose power and return success. Also set a flag in these
cases, and power up the card again in if_sdio_resume().
Fixes: 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Daniel Mack <daniel(a)zonque.org>
Reviewed-by: Chris Ball <chris(a)printf.net>
Reviewed-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
diff --git a/drivers/net/wireless/marvell/libertas/dev.h b/drivers/net/wireless/marvell/libertas/dev.h
index dd1ee1f0af48..469134930026 100644
--- a/drivers/net/wireless/marvell/libertas/dev.h
+++ b/drivers/net/wireless/marvell/libertas/dev.h
@@ -104,6 +104,7 @@ struct lbs_private {
u8 fw_ready;
u8 surpriseremoved;
u8 setup_fw_on_resume;
+ u8 power_up_on_resume;
int (*hw_host_to_card) (struct lbs_private *priv, u8 type, u8 *payload, u16 nb);
void (*reset_card) (struct lbs_private *priv);
int (*power_save) (struct lbs_private *priv);
diff --git a/drivers/net/wireless/marvell/libertas/if_sdio.c b/drivers/net/wireless/marvell/libertas/if_sdio.c
index 2300e796c6ab..43743c26c071 100644
--- a/drivers/net/wireless/marvell/libertas/if_sdio.c
+++ b/drivers/net/wireless/marvell/libertas/if_sdio.c
@@ -1290,15 +1290,23 @@ static void if_sdio_remove(struct sdio_func *func)
static int if_sdio_suspend(struct device *dev)
{
struct sdio_func *func = dev_to_sdio_func(dev);
- int ret;
struct if_sdio_card *card = sdio_get_drvdata(func);
+ struct lbs_private *priv = card->priv;
+ int ret;
mmc_pm_flag_t flags = sdio_get_host_pm_caps(func);
+ priv->power_up_on_resume = false;
/* If we're powered off anyway, just let the mmc layer remove the
* card. */
- if (!lbs_iface_active(card->priv))
- return -ENOSYS;
+ if (!lbs_iface_active(priv)) {
+ if (priv->fw_ready) {
+ priv->power_up_on_resume = true;
+ if_sdio_power_off(card);
+ }
+
+ return 0;
+ }
dev_info(dev, "%s: suspend: PM flags = 0x%x\n",
sdio_func_id(func), flags);
@@ -1306,9 +1314,14 @@ static int if_sdio_suspend(struct device *dev)
/* If we aren't being asked to wake on anything, we should bail out
* and let the SD stack power down the card.
*/
- if (card->priv->wol_criteria == EHS_REMOVE_WAKEUP) {
+ if (priv->wol_criteria == EHS_REMOVE_WAKEUP) {
dev_info(dev, "Suspend without wake params -- powering down card\n");
- return -ENOSYS;
+ if (priv->fw_ready) {
+ priv->power_up_on_resume = true;
+ if_sdio_power_off(card);
+ }
+
+ return 0;
}
if (!(flags & MMC_PM_KEEP_POWER)) {
@@ -1321,7 +1334,7 @@ static int if_sdio_suspend(struct device *dev)
if (ret)
return ret;
- ret = lbs_suspend(card->priv);
+ ret = lbs_suspend(priv);
if (ret)
return ret;
@@ -1336,6 +1349,11 @@ static int if_sdio_resume(struct device *dev)
dev_info(dev, "%s: resume: we're back\n", sdio_func_id(func));
+ if (card->priv->power_up_on_resume) {
+ if_sdio_power_on(card);
+ wait_event(card->pwron_waitq, card->priv->fw_ready);
+ }
+
ret = lbs_resume(card->priv);
return ret;
Hello,
Tested without any problem so please picked up this.
From: Matthew Auld
[ Upstream commit c11c7bfd213495784b22ef82a69b6489f8d0092f ]
Operating on a zero sized GEM userptr object will lead to explosions.
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into
video memory (userptr) ioctl")
Testcase: igt/gem_userptr_blits/input-checking
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Link:
https://patchwork.freedesktop.org/patch/msgid/20180502195021.30900-1-matthe…
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c
b/drivers/gpu/drm/i915/i915_gem_userptr.c
index d596a8302ca3c..854bd51b9478a 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -778,6 +778,9 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
I915_USERPTR_UNSYNCHRONIZED))
return -EINVAL;
+ if (!args->user_size)
+ return -EINVAL;
+
if (offset_in_page(args->user_ptr | args->user_size))
return -EINVAL;
--
2.17.1
The MTK xHCI controller use some reserved bytes in endpoint context for
bandwidth scheduling, so need keep them in xhci_endpoint_copy();
The issue is introduced by:
commit f5249461b504 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
It resets endpoints and will drop bandwidth scheduling parameters used
by interrupt or isochronous endpoints on MTK xHCI controller.
Fixes: f5249461b504 ("xhci: Clear the host side toggle manually when
endpoint is soft reset")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chunfeng Yun <chunfeng.yun(a)mediatek.com>
Tested-by: Sean Wang <sean.wang(a)mediatek.com>
---
v2: add fix tag, Cc and Tested-by
---
drivers/usb/host/xhci-mem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index ef350c3..b1f27aa 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1613,6 +1613,10 @@ void xhci_endpoint_copy(struct xhci_hcd *xhci,
in_ep_ctx->ep_info2 = out_ep_ctx->ep_info2;
in_ep_ctx->deq = out_ep_ctx->deq;
in_ep_ctx->tx_info = out_ep_ctx->tx_info;
+ if (xhci->quirks & XHCI_MTK_HOST) {
+ in_ep_ctx->reserved[0] = out_ep_ctx->reserved[0];
+ in_ep_ctx->reserved[1] = out_ep_ctx->reserved[1];
+ }
}
/* Copy output xhci_slot_ctx to the input xhci_slot_ctx.
--
1.9.1
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ea8c5356d39048bc94bae068228f51ddbecc6b89 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Thu, 9 Aug 2018 15:48:49 +0800
Subject: [PATCH] bcache: set max writeback rate when I/O request is idle
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce performance regression because multiple faster
writeback threads of the idle bcache devices will compete the btree level
locks with the bcache device who have I/O requests coming.
This patch fixes the above issue by only permitting fast writebac when
all bcache devices attached on the cache set are idle. And if one of the
bcache devices has new I/O request coming, minimized all writeback
throughput immediately and let PI controller __update_writeback_rate()
to decide the upcoming writeback rate for each bcache device.
Also when all bcache devices are idle, limited wrieback rate to a small
number is wast of thoughput, especially when backing devices are slower
non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
rate for each backing device if the whole cache set is idle. A faster
writeback rate in idle time means new I/Os may have more available space
for dirty data, and people may observe a better write performance then.
Please note bcache may change its cache mode in run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.
Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable(a)vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli(a)suse.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Tested-by: Stefan Priebe <s.priebe(a)profihost.ag>
Cc: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b393b3fd06b6..05f82ff6f016 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -328,13 +328,6 @@ struct cached_dev {
*/
atomic_t has_dirty;
- /*
- * Set to zero by things that touch the backing volume-- except
- * writeback. Incremented by writeback. Used to determine when to
- * accelerate idle writeback.
- */
- atomic_t backing_idle;
-
struct bch_ratelimit writeback_rate;
struct delayed_work writeback_rate_update;
@@ -515,6 +508,8 @@ struct cache_set {
struct cache_accounting accounting;
unsigned long flags;
+ atomic_t idle_counter;
+ atomic_t at_max_writeback_rate;
struct cache_sb sb;
@@ -524,6 +519,7 @@ struct cache_set {
struct bcache_device **devices;
unsigned devices_max_used;
+ atomic_t attached_dev_nr;
struct list_head cached_devs;
uint64_t cached_dev_sectors;
atomic_long_t flash_dev_dirty_sectors;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 914d501ad1e0..7dbe8b6316a0 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1103,6 +1103,44 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
generic_make_request(bio);
}
+static void quit_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *this_dc)
+{
+ int i;
+ struct bcache_device *d;
+ struct cached_dev *dc;
+
+ /*
+ * mutex bch_register_lock may compete with other parallel requesters,
+ * or attach/detach operations on other backing device. Waiting to
+ * the mutex lock may increase I/O request latency for seconds or more.
+ * To avoid such situation, if mutext_trylock() failed, only writeback
+ * rate of current cached device is set to 1, and __update_write_back()
+ * will decide writeback rate of other cached devices (remember now
+ * c->idle_counter is 0 already).
+ */
+ if (mutex_trylock(&bch_register_lock)) {
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ /*
+ * set writeback rate to default minimum value,
+ * then let update_writeback_rate() to decide the
+ * upcoming rate.
+ */
+ atomic_long_set(&dc->writeback_rate.rate, 1);
+ }
+ mutex_unlock(&bch_register_lock);
+ } else
+ atomic_long_set(&this_dc->writeback_rate.rate, 1);
+}
+
/* Cached devices - read & write stuff */
static blk_qc_t cached_dev_make_request(struct request_queue *q,
@@ -1120,8 +1158,25 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
- atomic_set(&dc->backing_idle, 0);
- generic_start_io_acct(q, bio_op(bio), bio_sectors(bio), &d->disk->part0);
+ if (likely(d->c)) {
+ if (atomic_read(&d->c->idle_counter))
+ atomic_set(&d->c->idle_counter, 0);
+ /*
+ * If at_max_writeback_rate of cache set is true and new I/O
+ * comes, quit max writeback rate of all cached devices
+ * attached to this cache set, and set at_max_writeback_rate
+ * to false.
+ */
+ if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
+ atomic_set(&d->c->at_max_writeback_rate, 0);
+ quit_max_writeback_rate(d->c, dc);
+ }
+ }
+
+ generic_start_io_acct(q,
+ bio_op(bio),
+ bio_sectors(bio),
+ &d->disk->part0);
bio_set_dev(bio, dc->bdev);
bio->bi_iter.bi_sector += dc->sb.data_offset;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1e85cbb4c159..55a37641aa95 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -696,6 +696,8 @@ static void bcache_device_detach(struct bcache_device *d)
{
lockdep_assert_held(&bch_register_lock);
+ atomic_dec(&d->c->attached_dev_nr);
+
if (test_bit(BCACHE_DEV_DETACHING, &d->flags)) {
struct uuid_entry *u = d->c->uuids + d->id;
@@ -1144,6 +1146,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
bch_cached_dev_run(dc);
bcache_device_link(&dc->disk, c, "bdev");
+ atomic_inc(&c->attached_dev_nr);
/* Allow the writeback thread to proceed */
up_write(&dc->writeback_lock);
@@ -1696,6 +1699,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->block_bits = ilog2(sb->block_size);
c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
c->devices_max_used = 0;
+ atomic_set(&c->attached_dev_nr, 0);
c->btree_pages = bucket_pages(c);
if (c->btree_pages > BTREE_MAX_PAGES)
c->btree_pages = max_t(int, c->btree_pages / 4,
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 3e9d3459a224..6e88142514fb 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -171,7 +171,8 @@ SHOW(__bch_cached_dev)
var_printf(writeback_running, "%i");
var_print(writeback_delay);
var_print(writeback_percent);
- sysfs_hprint(writeback_rate, wb ? dc->writeback_rate.rate << 9 : 0);
+ sysfs_hprint(writeback_rate,
+ wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable);
@@ -193,7 +194,9 @@ SHOW(__bch_cached_dev)
* Except for dirty and target, other values should
* be 0 if writeback is not running.
*/
- bch_hprint(rate, wb ? dc->writeback_rate.rate << 9 : 0);
+ bch_hprint(rate,
+ wb ? atomic_long_read(&dc->writeback_rate.rate) << 9
+ : 0);
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,
@@ -261,8 +264,12 @@ STORE(__cached_dev)
sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40);
- sysfs_strtoul_clamp(writeback_rate,
- dc->writeback_rate.rate, 1, INT_MAX);
+ if (attr == &sysfs_writeback_rate) {
+ int v;
+
+ sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
+ atomic_long_set(&dc->writeback_rate.rate, v);
+ }
sysfs_strtoul_clamp(writeback_rate_update_seconds,
dc->writeback_rate_update_seconds,
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index fc479b026d6d..b15256bcf0e7 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
{
uint64_t now = local_clock();
- d->next += div_u64(done * NSEC_PER_SEC, d->rate);
+ d->next += div_u64(done * NSEC_PER_SEC, atomic_long_read(&d->rate));
/* Bound the time. Don't let us fall further than 2 seconds behind
* (this prevents unnecessary backlog that would make it impossible
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cced87f8eb27..f7b0133c9d2f 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -442,7 +442,7 @@ struct bch_ratelimit {
* Rate at which we want to do work, in units per second
* The units here correspond to the units passed to bch_next_delay()
*/
- uint32_t rate;
+ atomic_long_t rate;
};
static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 912e969fedba..481d4cf38ac0 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -104,11 +104,56 @@ static void __update_writeback_rate(struct cached_dev *dc)
dc->writeback_rate_proportional = proportional_scaled;
dc->writeback_rate_integral_scaled = integral_scaled;
- dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
- dc->writeback_rate.rate = new_rate;
+ dc->writeback_rate_change = new_rate -
+ atomic_long_read(&dc->writeback_rate.rate);
+ atomic_long_set(&dc->writeback_rate.rate, new_rate);
dc->writeback_rate_target = target;
}
+static bool set_at_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *dc)
+{
+ /*
+ * Idle_counter is increased everytime when update_writeback_rate() is
+ * called. If all backing devices attached to the same cache set have
+ * identical dc->writeback_rate_update_seconds values, it is about 6
+ * rounds of update_writeback_rate() on each backing device before
+ * c->at_max_writeback_rate is set to 1, and then max wrteback rate set
+ * to each dc->writeback_rate.rate.
+ * In order to avoid extra locking cost for counting exact dirty cached
+ * devices number, c->attached_dev_nr is used to calculate the idle
+ * throushold. It might be bigger if not all cached device are in write-
+ * back mode, but it still works well with limited extra rounds of
+ * update_writeback_rate().
+ */
+ if (atomic_inc_return(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6)
+ return false;
+
+ if (atomic_read(&c->at_max_writeback_rate) != 1)
+ atomic_set(&c->at_max_writeback_rate, 1);
+
+ atomic_long_set(&dc->writeback_rate.rate, INT_MAX);
+
+ /* keep writeback_rate_target as existing value */
+ dc->writeback_rate_proportional = 0;
+ dc->writeback_rate_integral_scaled = 0;
+ dc->writeback_rate_change = 0;
+
+ /*
+ * Check c->idle_counter and c->at_max_writeback_rate agagain in case
+ * new I/O arrives during before set_at_max_writeback_rate() returns.
+ * Then the writeback rate is set to 1, and its new value should be
+ * decided via __update_writeback_rate().
+ */
+ if ((atomic_read(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6) ||
+ !atomic_read(&c->at_max_writeback_rate))
+ return false;
+
+ return true;
+}
+
static void update_writeback_rate(struct work_struct *work)
{
struct cached_dev *dc = container_of(to_delayed_work(work),
@@ -136,13 +181,20 @@ static void update_writeback_rate(struct work_struct *work)
return;
}
- down_read(&dc->writeback_lock);
-
- if (atomic_read(&dc->has_dirty) &&
- dc->writeback_percent)
- __update_writeback_rate(dc);
+ if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+ /*
+ * If the whole cache set is idle, set_at_max_writeback_rate()
+ * will set writeback rate to a max number. Then it is
+ * unncessary to update writeback rate for an idle cache set
+ * in maximum writeback rate number(s).
+ */
+ if (!set_at_max_writeback_rate(c, dc)) {
+ down_read(&dc->writeback_lock);
+ __update_writeback_rate(dc);
+ up_read(&dc->writeback_lock);
+ }
+ }
- up_read(&dc->writeback_lock);
/*
* CACHE_SET_IO_DISABLE might be set via sysfs interface,
@@ -422,27 +474,6 @@ static void read_dirty(struct cached_dev *dc)
delay = writeback_delay(dc, size);
- /* If the control system would wait for at least half a
- * second, and there's been no reqs hitting the backing disk
- * for awhile: use an alternate mode where we have at most
- * one contiguous set of writebacks in flight at a time. If
- * someone wants to do IO it will be quick, as it will only
- * have to contend with one operation in flight, and we'll
- * be round-tripping data to the backing disk as quickly as
- * it can accept it.
- */
- if (delay >= HZ / 2) {
- /* 3 means at least 1.5 seconds, up to 7.5 if we
- * have slowed way down.
- */
- if (atomic_inc_return(&dc->backing_idle) >= 3) {
- /* Wait for current I/Os to finish */
- closure_sync(&cl);
- /* And immediately launch a new set. */
- delay = 0;
- }
- }
-
while (!kthread_should_stop() &&
!test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
delay) {
@@ -741,7 +772,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_running = true;
dc->writeback_percent = 10;
dc->writeback_delay = 30;
- dc->writeback_rate.rate = 1024;
+ atomic_long_set(&dc->writeback_rate.rate, 1024);
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
The patch below was submitted to be applied to the 4.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 78ac2107176baa0daf65b0fb8e561d2ed14c83ca Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Thu, 9 Aug 2018 15:48:42 +0800
Subject: [PATCH] bcache: do not check return value of debugfs_create_dir()
Greg KH suggests that normal code should not care about debugfs. Therefore
no matter successful or failed of debugfs_create_dir() execution, it is
unncessary to check its return value.
There are two functions called debugfs_create_dir() and check the return
value, which are bch_debug_init() and closure_debug_init(). This patch
changes these two functions from int to void type, and ignore return values
of debugfs_create_dir().
This patch does not fix exact bug, just makes things work as they should.
Signed-off-by: Coly Li <colyli(a)suse.de>
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org
Cc: Kai Krakow <kai(a)kaishome.de>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 872ef4d67711..0a3e82b0876d 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1001,7 +1001,7 @@ void bch_open_buckets_free(struct cache_set *);
int bch_cache_allocator_start(struct cache *ca);
void bch_debug_exit(void);
-int bch_debug_init(struct kobject *);
+void bch_debug_init(struct kobject *kobj);
void bch_request_exit(void);
int bch_request_init(void);
diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
index 0e14969182c6..618253683d40 100644
--- a/drivers/md/bcache/closure.c
+++ b/drivers/md/bcache/closure.c
@@ -199,11 +199,16 @@ static const struct file_operations debug_ops = {
.release = single_release
};
-int __init closure_debug_init(void)
+void __init closure_debug_init(void)
{
- closure_debug = debugfs_create_file("closures",
- 0400, bcache_debug, NULL, &debug_ops);
- return IS_ERR_OR_NULL(closure_debug);
+ if (!IS_ERR_OR_NULL(bcache_debug))
+ /*
+ * it is unnecessary to check return value of
+ * debugfs_create_file(), we should not care
+ * about this.
+ */
+ closure_debug = debugfs_create_file(
+ "closures", 0400, bcache_debug, NULL, &debug_ops);
}
#endif
diff --git a/drivers/md/bcache/closure.h b/drivers/md/bcache/closure.h
index 71427eb5fdae..7c2c5bc7c88b 100644
--- a/drivers/md/bcache/closure.h
+++ b/drivers/md/bcache/closure.h
@@ -186,13 +186,13 @@ static inline void closure_sync(struct closure *cl)
#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
-int closure_debug_init(void);
+void closure_debug_init(void);
void closure_debug_create(struct closure *cl);
void closure_debug_destroy(struct closure *cl);
#else
-static inline int closure_debug_init(void) { return 0; }
+static inline void closure_debug_init(void) {}
static inline void closure_debug_create(struct closure *cl) {}
static inline void closure_debug_destroy(struct closure *cl) {}
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 04d146711950..12034c07257b 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -252,11 +252,12 @@ void bch_debug_exit(void)
debugfs_remove_recursive(bcache_debug);
}
-int __init bch_debug_init(struct kobject *kobj)
+void __init bch_debug_init(struct kobject *kobj)
{
- if (!IS_ENABLED(CONFIG_DEBUG_FS))
- return 0;
-
+ /*
+ * it is unnecessary to check return value of
+ * debugfs_create_file(), we should not care
+ * about this.
+ */
bcache_debug = debugfs_create_dir("bcache", NULL);
- return IS_ERR_OR_NULL(bcache_debug);
}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e0a92104ca23..c7ffa6ef3f82 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2345,10 +2345,12 @@ static int __init bcache_init(void)
goto err;
if (bch_request_init() ||
- bch_debug_init(bcache_kobj) || closure_debug_init() ||
sysfs_create_files(bcache_kobj, files))
goto err;
+ bch_debug_init(bcache_kobj);
+ closure_debug_init();
+
return 0;
err:
bcache_exit();
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54648cf1ec2d7f4b6a71767799c45676a138ca24 Mon Sep 17 00:00:00 2001
From: xiao jin <jin.xiao(a)intel.com>
Date: Mon, 30 Jul 2018 14:11:12 +0800
Subject: [PATCH] block: blk_init_allocated_queue() set q->fq as NULL in the
fail case
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Signed-off-by: xiao jin <jin.xiao(a)intel.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-core.c b/block/blk-core.c
index 03a4ea93a5f3..23cd1b7770e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1184,6 +1184,7 @@ int blk_init_allocated_queue(struct request_queue *q)
q->exit_rq_fn(q, q->fq->flush_rq);
out_free_flush_queue:
blk_free_flush_queue(q->fq);
+ q->fq = NULL;
return -ENOMEM;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54648cf1ec2d7f4b6a71767799c45676a138ca24 Mon Sep 17 00:00:00 2001
From: xiao jin <jin.xiao(a)intel.com>
Date: Mon, 30 Jul 2018 14:11:12 +0800
Subject: [PATCH] block: blk_init_allocated_queue() set q->fq as NULL in the
fail case
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Signed-off-by: xiao jin <jin.xiao(a)intel.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-core.c b/block/blk-core.c
index 03a4ea93a5f3..23cd1b7770e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1184,6 +1184,7 @@ int blk_init_allocated_queue(struct request_queue *q)
q->exit_rq_fn(q, q->fq->flush_rq);
out_free_flush_queue:
blk_free_flush_queue(q->fq);
+ q->fq = NULL;
return -ENOMEM;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 54648cf1ec2d7f4b6a71767799c45676a138ca24 Mon Sep 17 00:00:00 2001
From: xiao jin <jin.xiao(a)intel.com>
Date: Mon, 30 Jul 2018 14:11:12 +0800
Subject: [PATCH] block: blk_init_allocated_queue() set q->fq as NULL in the
fail case
We find the memory use-after-free issue in __blk_drain_queue()
on the kernel 4.14. After read the latest kernel 4.18-rc6 we
think it has the same problem.
Memory is allocated for q->fq in the blk_init_allocated_queue().
If the elevator init function called with error return, it will
run into the fail case to free the q->fq.
Then the __blk_drain_queue() uses the same memory after the free
of the q->fq, it will lead to the unpredictable event.
The patch is to set q->fq as NULL in the fail case of
blk_init_allocated_queue().
Fixes: commit 7c94e1c157a2 ("block: introduce blk_flush_queue to drive flush machinery")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Signed-off-by: xiao jin <jin.xiao(a)intel.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-core.c b/block/blk-core.c
index 03a4ea93a5f3..23cd1b7770e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1184,6 +1184,7 @@ int blk_init_allocated_queue(struct request_queue *q)
q->exit_rq_fn(q, q->fq->flush_rq);
out_free_flush_queue:
blk_free_flush_queue(q->fq);
+ q->fq = NULL;
return -ENOMEM;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
The default mid-level PLL bias current setting interferes with sigma
delta modulation. This manifests as decreased audio quality at lower
sampling rates, which sounds like radio broadcast quality, and
distortion noises at sampling rates at 48 kHz or above.
Changing the bias current settings to the lowest gets rid of the
noise.
Fixes: de3448519194 ("clk: sunxi-ng: sun4i: Use sigma-delta modulation
for audio PLL")
Cc: <stable(a)vger.kernel.org> # 4.15.x
Signed-off-by: Chen-Yu Tsai <wens(a)csie.org>
---
drivers/clk/sunxi-ng/ccu-sun4i-a10.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/sunxi-ng/ccu-sun4i-a10.c b/drivers/clk/sunxi-ng/ccu-sun4i-a10.c
index ffa5dac221e4..129ebd2588fd 100644
--- a/drivers/clk/sunxi-ng/ccu-sun4i-a10.c
+++ b/drivers/clk/sunxi-ng/ccu-sun4i-a10.c
@@ -1434,8 +1434,16 @@ static void __init sun4i_ccu_init(struct device_node *node,
return;
}
- /* Force the PLL-Audio-1x divider to 1 */
val = readl(reg + SUN4I_PLL_AUDIO_REG);
+
+ /*
+ * Force VCO and PLL bias current to lowest setting. Higher
+ * settings interfere with sigma-delta modulation and result
+ * in audible noise and distortions when using SPDIF or I2S.
+ */
+ val &= ~GENMASK(25, 16);
+
+ /* Force the PLL-Audio-1x divider to 1 */
val &= ~GENMASK(29, 26);
writel(val | (1 << 26), reg + SUN4I_PLL_AUDIO_REG);
--
2.19.0.rc1
Hello!
Please apply the following commit to the v4.18 -stable tree:
rcu: Make expedited GPs handle CPU 0 being offline
fcc63543650150629c8a873cbef3578770acecd9
This patch fixes a v4.18 regression that is causing the RISC-V people
(CCed) some trouble. It is a small patch and has been tested in the
failing situation running v4.18 by Atish Patra and Andreas Schwab
(both CCed).
Please let me know if more information is required.
Thanx, Paul