From: Huang Ying <huang.ying.caritas(a)gmail.com>
Subject: mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries after
the fault swap entry. The readahead algorithm calculates some of the swap
entries to readahead via increasing the offset of the fault swap entry
without checking whether they are beyond the end of the swap device and it
relys on the __swp_swapcount() and swapcache_prepare() to check it.
Although __swp_swapcount() checks for the swap entry passed in, it will
complain with the error message as follow for the expected invalid swap
entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and the
swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 3 +++
1 file changed, 3 insertions(+)
diff -puN mm/swap_state.c~mm-swap-fix-false-error-message-in-__swp_swapcount mm/swap_state.c
--- a/mm/swap_state.c~mm-swap-fix-false-error-message-in-__swp_swapcount
+++ a/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
_
From: alex chen <alex.chen(a)huawei.com>
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()
we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:
process 1 process 2 process 3
truncate file 'A' end_io of writing file 'A' receiving the bast messages
ocfs2_setattr
ocfs2_inode_lock_tracker
ocfs2_inode_lock_full
inode_dio_wait
__inode_dio_wait
-->waiting for all dio
requests finish
dlm_proxy_ast_handler
dlm_do_local_bast
ocfs2_blocking_ast
ocfs2_generic_handle_bast
set OCFS2_LOCK_BLOCKED flag
dio_end_io
dio_bio_end_aio
dio_complete
ocfs2_dio_end_io
ocfs2_dio_end_io_write
ocfs2_inode_lock
__ocfs2_cluster_lock
ocfs2_wait_for_mask
-->waiting for OCFS2_LOCK_BLOCKED
flag to be cleared, that is waiting
for 'process 1' unlocking the inode lock
inode_dio_end
-->here dec the i_dio_count, but will never
be called, so a deadlock happened.
Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen <alex.chen(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Acked-by: Changwei Ge <ge.changwei(a)h3c.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff -puN fs/ocfs2/file.c~ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr
+++ a/fs/ocfs2/file.c
@@ -1161,6 +1161,13 @@ int ocfs2_setattr(struct dentry *dentry,
}
size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
if (size_change) {
+ /*
+ * Here we should wait dio to finish before inode lock
+ * to avoid a deadlock between ocfs2_setattr() and
+ * ocfs2_dio_end_io_write()
+ */
+ inode_dio_wait(inode);
+
status = ocfs2_rw_lock(inode, 1);
if (status < 0) {
mlog_errno(status);
@@ -1200,8 +1207,6 @@ int ocfs2_setattr(struct dentry *dentry,
if (status)
goto bail_unlock;
- inode_dio_wait(inode);
-
if (i_size_read(inode) >= attr->ia_size) {
if (ocfs2_should_order_data(inode)) {
status = ocfs2_begin_ordered_truncate(inode,
_
From: Changwei Ge <ge.changwei(a)h3c.com>
Subject: ocfs2: fix cluster hang after a node dies
When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.
As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM changes
lock resource's master later.
So without invoking dlm_move_lockres_to_recovery_list, no master will be
choosed after dlm recovery accomplishment since no lock resource can be
found through ::resource list.
What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
resources mastered a dead node, it will break up synchronization among
nodes.
So invoke dlm_move_lockres_to_recovery_list again.
Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-…
Signed-off-by: Changwei Ge <ge.changwei(a)h3c.com>
Reported-by: Vitaly Mayatskih <v.mayatskih(a)gmail.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih(a)gmail.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/dlm/dlmrecovery.c | 1 +
1 file changed, 1 insertion(+)
diff -puN fs/ocfs2/dlm/dlmrecovery.c~ocfs2-fix-cluster-hang-after-a-node-dies fs/ocfs2/dlm/dlmrecovery.c
--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-fix-cluster-hang-after-a-node-dies
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanu
dlm_lockres_put(res);
continue;
}
+ dlm_move_lockres_to_recovery_list(dlm, res);
} else if (res->owner == dlm->node_num) {
dlm_free_dead_locks(dlm, res, dead_node);
__dlm_lockres_calc_usage(dlm, res);
_
Please can you queue up the following for stable:
df80cd9b28b9 sctp: do not peel off an assoc from one netns to another one
Ben.
--
Ben Hutchings
Software Developer, Codethink Ltd.
The patch titled
Subject: mm, swap: fix false error message in __swp_swapcount()
has been removed from the -mm tree. Its filename was
mm-swap-skip-swapcache-for-swapin-of-synchronous-device-fix.patch
This patch was dropped because it was folded into mm-swap-skip-swapcache-for-swapin-of-synchronous-device.patch
------------------------------------------------------
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Subject: mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries after
the fault swap entry. The readahead algorithm calculates some of the swap
entries to readahead via increasing the offset of the fault swap entry
without checking whether they are beyond the end of the swap device and it
relys on the __swp_swapcount() and swapcache_prepare() to check it.
Although __swp_swapcount() checks for the swap entry passed in, it will
complain with the error message as follow for the expected invalid swap
entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and the
swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 3 +++
1 file changed, 3 insertions(+)
diff -puN mm/swap_state.c~mm-swap-skip-swapcache-for-swapin-of-synchronous-device-fix mm/swap_state.c
--- a/mm/swap_state.c~mm-swap-skip-swapcache-for-swapin-of-synchronous-device-fix
+++ a/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
_
Patches currently in -mm which might be from huang.ying.caritas(a)gmail.com are
mm-swap-skip-swapcache-for-swapin-of-synchronous-device.patch
The default max_cache_size_bytes for dm-bufio is meant to be the lesser
of 25% of the size of the vmalloc area and 2% of the size of lowmem.
However, on 32-bit systems the intermediate result in the expression
(VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100
overflows, causing the wrong result to be computed. For example, on a
32-bit system where the vmalloc area is 520093696 bytes, the result is
1174405 rather than the expected 130023424, which makes the maximum
cache size much too small (far less than 2% of lowmem). This causes
severe performance problems for dm-verity users on affected systems.
Fix this by using mult_frac() to correctly multiply by a percentage. Do
this for all places in dm-bufio that multiply by a percentage. Also
replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary
to the comment is now defined in include/linux/vmalloc.h.
Fixes: 95d402f057f2 ("dm: add bufio")
Cc: <stable(a)vger.kernel.org> # v3.2+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
drivers/md/dm-bufio.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 33bb074d6941..b8ac591aaaa7 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -974,7 +974,8 @@ static void __get_memory_limit(struct dm_bufio_client *c,
buffers = c->minimum_buffers;
*limit_buffers = buffers;
- *threshold_buffers = buffers * DM_BUFIO_WRITEBACK_PERCENT / 100;
+ *threshold_buffers = mult_frac(buffers,
+ DM_BUFIO_WRITEBACK_PERCENT, 100);
}
/*
@@ -1910,19 +1911,15 @@ static int __init dm_bufio_init(void)
memset(&dm_bufio_caches, 0, sizeof dm_bufio_caches);
memset(&dm_bufio_cache_names, 0, sizeof dm_bufio_cache_names);
- mem = (__u64)((totalram_pages - totalhigh_pages) *
- DM_BUFIO_MEMORY_PERCENT / 100) << PAGE_SHIFT;
+ mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;
if (mem > ULONG_MAX)
mem = ULONG_MAX;
#ifdef CONFIG_MMU
- /*
- * Get the size of vmalloc space the same way as VMALLOC_TOTAL
- * in fs/proc/internal.h
- */
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100))
+ mem = mult_frac(VMALLOC_TOTAL, DM_BUFIO_VMALLOC_PERCENT, 100);
#endif
dm_bufio_default_cache_size = mem;
--
2.15.0.448.gf294e3d99a-goog
From: Ian Abbott <abbotti(a)mev.co.uk>
Both fpga_region_get_manager() and fpga_region_get_bridges() call
of_parse_phandle(), but nothing calls of_node_put() on the returned
struct device_node pointers. Make sure to do that to stop their
reference counters getting out of whack.
Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA")
Cc: <stable(a)vger.kernel.org> # 4.10+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Signed-off-by: Alan Tull <atull(a)kernel.org>
---
drivers/fpga/of-fpga-region.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/fpga/of-fpga-region.c b/drivers/fpga/of-fpga-region.c
index a0c13cb..7dfaa95 100644
--- a/drivers/fpga/of-fpga-region.c
+++ b/drivers/fpga/of-fpga-region.c
@@ -73,6 +73,7 @@ static struct fpga_manager *of_fpga_region_get_mgr(struct device_node *np)
mgr_node = of_parse_phandle(np, "fpga-mgr", 0);
if (mgr_node) {
mgr = of_fpga_mgr_get(mgr_node);
+ of_node_put(mgr_node);
of_node_put(np);
return mgr;
}
@@ -120,10 +121,13 @@ static int of_fpga_region_get_bridges(struct fpga_region *region)
parent_br = region_np->parent;
/* If overlay has a list of bridges, use it. */
- if (of_parse_phandle(info->overlay, "fpga-bridges", 0))
+ br = of_parse_phandle(info->overlay, "fpga-bridges", 0);
+ if (br) {
+ of_node_put(br);
np = info->overlay;
- else
+ } else {
np = region_np;
+ }
for (i = 0; ; i++) {
br = of_parse_phandle(np, "fpga-bridges", i);
@@ -131,12 +135,15 @@ static int of_fpga_region_get_bridges(struct fpga_region *region)
break;
/* If parent bridge is in list, skip it. */
- if (br == parent_br)
+ if (br == parent_br) {
+ of_node_put(br);
continue;
+ }
/* If node is a bridge, get it and add to list */
ret = of_fpga_bridge_get_to_list(br, info,
®ion->bridge_list);
+ of_node_put(br);
/* If any of the bridges are in use, give up */
if (ret == -EBUSY) {
--
2.7.4