The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ff4dd232ec45a0e45ea69f28f069f2ab22b4908a Mon Sep 17 00:00:00 2001
From: Paul Burton <paul.burton(a)mips.com>
Date: Tue, 4 Dec 2018 23:44:12 +0000
Subject: [PATCH] MIPS: Expand MIPS32 ASIDs to 64 bits
ASIDs have always been stored as unsigned longs, ie. 32 bits on MIPS32
kernels. This is problematic because it is feasible for the ASID version
to overflow & wrap around to zero.
We currently attempt to handle this overflow by simply setting the ASID
version to 1, using asid_first_version(), but we make no attempt to
account for the fact that there may be mm_structs with stale ASIDs that
have versions which we now reuse due to the overflow & wrap around.
Encountering this requires that:
1) A struct mm_struct X is active on CPU A using ASID (V,n).
2) That mm is not used on CPU A for the length of time that it takes
for CPU A's asid_cache to overflow & wrap around to the same
version V that the mm had in step 1. During this time tasks using
the mm could either be sleeping or only scheduled on other CPUs.
3) Some other mm Y becomes active on CPU A and is allocated the same
ASID (V,n).
4) mm X now becomes active on CPU A again, and now incorrectly has the
same ASID as mm Y.
Where struct mm_struct ASIDs are represented above in the format
(version, EntryHi.ASID), and on a typical MIPS32 system version will be
24 bits wide & EntryHi.ASID will be 8 bits wide.
The length of time required in step 2 is highly dependent upon the CPU &
workload, but for a hypothetical 2GHz CPU running a workload which
generates a new ASID every 10000 cycles this period is around 248 days.
Due to this long period of time & the fact that tasks need to be
scheduled in just the right (or wrong, depending upon your inclination)
way, this is obviously a difficult bug to encounter but it's entirely
possible as evidenced by reports.
In order to fix this, simply extend ASIDs to 64 bits even on MIPS32
builds. This will extend the period of time required for the
hypothetical system above to encounter the problem from 28 days to
around 3 trillion years, which feels safely outside of the realms of
possibility.
The cost of this is slightly more generated code in some commonly
executed paths, but this is pretty minimal:
| Code Size Gain | Percentage
-----------------------|----------------|-------------
decstation_defconfig | +270 | +0.00%
32r2el_defconfig | +652 | +0.01%
32r6el_defconfig | +1000 | +0.01%
I have been unable to measure any change in performance of the LMbench
lat_ctx or lat_proc tests resulting from the 64b ASIDs on either
32r2el_defconfig+interAptiv or 32r6el_defconfig+I6500 systems.
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Suggested-by: James Hogan <jhogan(a)kernel.org>
References: https://lore.kernel.org/linux-mips/80B78A8B8FEE6145A87579E8435D78C30205D5F3…
References: https://lore.kernel.org/linux-mips/1488684260-18867-1-git-send-email-jiwei.…
Cc: Jiwei Sun <jiwei.sun(a)windriver.com>
Cc: Yu Huabing <yhb(a)ruijie.com.cn>
Cc: stable(a)vger.kernel.org # 2.6.12+
Cc: linux-mips(a)vger.kernel.org
diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index a41059d47d31..ed7ffe4e63a3 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -50,7 +50,7 @@ struct guest_info {
#define MIPS_CACHE_PINDEX 0x00000020 /* Physically indexed cache */
struct cpuinfo_mips {
- unsigned long asid_cache;
+ u64 asid_cache;
#ifdef CONFIG_MIPS_ASID_BITS_VARIABLE
unsigned long asid_mask;
#endif
diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index 0740be7d5d4a..24d6b42345fb 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -7,7 +7,7 @@
#include <linux/wait.h>
typedef struct {
- unsigned long asid[NR_CPUS];
+ u64 asid[NR_CPUS];
void *vdso;
atomic_t fp_mode_switching;
diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
index 94414561de0e..a589585be21b 100644
--- a/arch/mips/include/asm/mmu_context.h
+++ b/arch/mips/include/asm/mmu_context.h
@@ -76,14 +76,14 @@ extern unsigned long pgd_current[];
* All unused by hardware upper bits will be considered
* as a software asid extension.
*/
-static unsigned long asid_version_mask(unsigned int cpu)
+static inline u64 asid_version_mask(unsigned int cpu)
{
unsigned long asid_mask = cpu_asid_mask(&cpu_data[cpu]);
- return ~(asid_mask | (asid_mask - 1));
+ return ~(u64)(asid_mask | (asid_mask - 1));
}
-static unsigned long asid_first_version(unsigned int cpu)
+static inline u64 asid_first_version(unsigned int cpu)
{
return ~asid_version_mask(cpu) + 1;
}
@@ -102,14 +102,12 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
static inline void
get_new_mmu_context(struct mm_struct *mm, unsigned long cpu)
{
- unsigned long asid = asid_cache(cpu);
+ u64 asid = asid_cache(cpu);
if (!((asid += cpu_asid_inc()) & cpu_asid_mask(&cpu_data[cpu]))) {
if (cpu_has_vtag_icache)
flush_icache_all();
local_flush_tlb_all(); /* start new asid cycle */
- if (!asid) /* fix version if needed */
- asid = asid_first_version(cpu);
}
cpu_context(cpu, mm) = asid_cache(cpu) = asid;
diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
index 3466fcdae0ca..01848cdf2074 100644
--- a/arch/mips/mm/c-r3k.c
+++ b/arch/mips/mm/c-r3k.c
@@ -245,7 +245,7 @@ static void r3k_flush_cache_page(struct vm_area_struct *vma,
pmd_t *pmdp;
pte_t *ptep;
- pr_debug("cpage[%08lx,%08lx]\n",
+ pr_debug("cpage[%08llx,%08lx]\n",
cpu_context(smp_processor_id(), mm), addr);
/* No ASID => no such page in the cache. */
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0ea295dd853e0879a9a30ab61f923c26be35b902 Mon Sep 17 00:00:00 2001
From: Pan Bian <bianpan2016(a)163.com>
Date: Thu, 22 Nov 2018 18:58:46 +0800
Subject: [PATCH] f2fs: read page index before freeing
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.
Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e60c7779e114..a2273340991f 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -826,6 +826,7 @@ static int truncate_node(struct dnode_of_data *dn)
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct node_info ni;
int err;
+ pgoff_t index;
err = f2fs_get_node_info(sbi, dn->nid, &ni);
if (err)
@@ -845,10 +846,11 @@ static int truncate_node(struct dnode_of_data *dn)
clear_node_page_dirty(dn->node_page);
set_sbi_flag(sbi, SBI_IS_DIRTY);
+ index = dn->node_page->index;
f2fs_put_page(dn->node_page, 1);
invalidate_mapping_pages(NODE_MAPPING(sbi),
- dn->node_page->index, dn->node_page->index);
+ index, index);
dn->node_page = NULL;
trace_f2fs_truncate_node(dn->inode, dn->nid, ni.blk_addr);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0ea295dd853e0879a9a30ab61f923c26be35b902 Mon Sep 17 00:00:00 2001
From: Pan Bian <bianpan2016(a)163.com>
Date: Thu, 22 Nov 2018 18:58:46 +0800
Subject: [PATCH] f2fs: read page index before freeing
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.
Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e60c7779e114..a2273340991f 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -826,6 +826,7 @@ static int truncate_node(struct dnode_of_data *dn)
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct node_info ni;
int err;
+ pgoff_t index;
err = f2fs_get_node_info(sbi, dn->nid, &ni);
if (err)
@@ -845,10 +846,11 @@ static int truncate_node(struct dnode_of_data *dn)
clear_node_page_dirty(dn->node_page);
set_sbi_flag(sbi, SBI_IS_DIRTY);
+ index = dn->node_page->index;
f2fs_put_page(dn->node_page, 1);
invalidate_mapping_pages(NODE_MAPPING(sbi),
- dn->node_page->index, dn->node_page->index);
+ index, index);
dn->node_page = NULL;
trace_f2fs_truncate_node(dn->inode, dn->nid, ni.blk_addr);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0ea295dd853e0879a9a30ab61f923c26be35b902 Mon Sep 17 00:00:00 2001
From: Pan Bian <bianpan2016(a)163.com>
Date: Thu, 22 Nov 2018 18:58:46 +0800
Subject: [PATCH] f2fs: read page index before freeing
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.
Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e60c7779e114..a2273340991f 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -826,6 +826,7 @@ static int truncate_node(struct dnode_of_data *dn)
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct node_info ni;
int err;
+ pgoff_t index;
err = f2fs_get_node_info(sbi, dn->nid, &ni);
if (err)
@@ -845,10 +846,11 @@ static int truncate_node(struct dnode_of_data *dn)
clear_node_page_dirty(dn->node_page);
set_sbi_flag(sbi, SBI_IS_DIRTY);
+ index = dn->node_page->index;
f2fs_put_page(dn->node_page, 1);
invalidate_mapping_pages(NODE_MAPPING(sbi),
- dn->node_page->index, dn->node_page->index);
+ index, index);
dn->node_page = NULL;
trace_f2fs_truncate_node(dn->inode, dn->nid, ni.blk_addr);
From: Ubuntu <ubuntu(a)petilil.segmaas.1ss>
[changelog]
- v2: include patch 5/5 (a very recent fix to patch 4/5) which is
not yet in Linus's tree but it's in nf.git + linux-next.git,
thus should make it shortly. Test results still consistent.
Thanks Florian Westphal for reviewing and pointing that out.
Recently, Alakesh Haloi reported the following issue [1] with stable/4.14:
"""
An iptable rule like the following on a multicore systems will result in
accepting more connections than set in the rule.
iptables -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit \
--connlimit-above 2000 --connlimit-mask 0 -j DROP
"""
And proposed a fix that is not in Linus's tree. The discussion went on to
confirm whether the issue was still reproducible with mainline/nf.git tip,
and to either identify the upstream fix or re-submit the non-upstream fix.
Alakesh eventually was able to test with upstream, and reported that issue
was still reproducible [2].
On that, our findinds diverge, at least in my test environment:
First, I verified that the suggested mainline fix for the issue [3] indeed
fixes it, by testing with it applied and reverted on v4.18, a clean revert.
(The issue is reproducible with the commit reverted).
Then, with a consistent reproducer, I moved to nf.git, with HEAD on commit
a007232 ("netfilter: nf_conncount: fix argument order to find_next_bit"),
and the issues was not reproducible (even with 20+ threads on client side,
the number Alakesh reported to achieve 2150+ connections [4], and I tried
spreading the network interface IRQ affinity over more and more CPUs too.)
Either way, the suggested mainline fix does actually fix the issue in 4.14
for at least one environment. So, it might well be the case that Alakesh's
test environment has differences/subtleties that leads to more connections
accepted, and more commits are needed for that particular environment type.
(v2 update: see Florian's reply to v1 thread [1]; these different results
are probably explained by very recent fixes still missing back then.)
But for now, with one bare-metal environment (24-core server, 4-core client)
verified, I thought of submitting the patches for review/comments/testing,
then looking for additional fixes for that environment separately.
The fix is PATCH 4 (needs fix in PATCH 5), and PATCHes 1-3 are helpers for
a cleaner backport.
All backports are simple, and essentially consist of refresh context lines
and use older struct/file names.
Reviews from netfilter maintainers are very appreciated, as I've no previous
experience in this area, and although the backports look simple and build/run
correctly, there's usually stuff that only more experienced people may notice.
Thanks,
Mauricio
Links:
=====
[1] https://www.spinics.net/lists/stable/msg270040.html
[2] https://www.spinics.net/lists/stable/msg273669.html
[3] https://www.spinics.net/lists/stable/msg271300.html
[4] https://www.spinics.net/lists/stable/msg273669.html
[5] https://www.spinics.net/lists/stable/msg276883.html
Test-case:
=========
- v4.14.91 (original): client achieves 2000+ connections (6000 target)
with 3 threads.
server # iptables -F
server # iptables -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit --connlimit-above 2000 --connlimit-mask 0 -j DROP
server # iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
DROP tcp -- anywhere anywhere tcp dpt:7777 flags:FIN,SYN,RST,ACK/SYN #conn src/0 > 2000
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
server # ulimit -SHn 65000
server # ruby server.rb
<... listening ...>
client # ulimit -SHn 65000
client # ruby client.rb 10.230.56.100 7777 6000 3
Connecting to ["10.230.56.100"]:7777 6000 times with 3
1
2
3
<...>
2000
<...>
6000
Target reached. Thread finishing
6001
Target reached. Thread finishing
6002
Target reached. Thread finishing
Threads done. 6002 connections
press enter to exit
- v4.14.91 + patches: client only achieved 2000 connections.
server # (same procedure)
client # (same procedure)
Connecting to ["10.230.56.100"]:7777 6000 times with 3
1
2
3
<...>
2000
<... blocked for a while...>
failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
Threads done. 2000 connections
press enter to exit
Florian Westphal (3):
netfilter: xt_connlimit: don't store address in the conn nodes
netfilter: nf_conncount: fix garbage collection confirm race
netfilter: nf_conncount: don't skip eviction when age is negative
Pablo Neira Ayuso (1):
netfilter: nf_conncount: expose connection list interface
Yi-Hung Wei (1):
netfilter: nf_conncount: Fix garbage collection with zones
include/net/netfilter/nf_conntrack_count.h | 15 +++++
net/netfilter/xt_connlimit.c | 99 +++++++++++++++++++++++-------
2 files changed, 91 insertions(+), 23 deletions(-)
create mode 100644 include/net/netfilter/nf_conntrack_count.h
--
2.7.4
The original upstream fix, commit 55e56f06ed71 "dax: Don't access a freed
inode", prompted an immediate cleanup request. Now that the cleanup has
landed, commit d8a706414af4 "dax: Use non-exclusive wait in
wait_entry_unlocked()", backport them both to -stable.
---
Dan Williams (1):
dax: Use non-exclusive wait in wait_entry_unlocked()
Matthew Wilcox (1):
dax: Don't access a freed inode
fs/dax.c | 69 ++++++++++++++++++++++++++++----------------------------------
1 file changed, 31 insertions(+), 38 deletions(-)
On Sun, Jan 6, 2019 at 4:58 AM Sebastian Kemper <sebastian_ml(a)gmx.net> wrote:
> Sorry for probably breaking the thread. This is really in reply to Paul
> Aubrich's patch titled "smb3: fix large reads on encrypted connections".
> I browsed linux-cifs and found Paul's patch. With the patch applied the
> problem is gone, kernel 4.19 works like 4.9 did without encryption.
>
> The patch title suggests that only large reads are not working. Well,
> considering the cutoff is between 13K and 17K I'd say that encryption on
> 4.19 for cifs can be considered broken without this patch. It'd be cool
> if this could make it to stable kernels pronto.
Yes - it is a a very important patch, and is merged in and marked for stable
(for 4.19 and 4.20 kernels). Hopefully can be included soon.
The good news though is that with the improved test automation that
Paulo/Aurelien/Ronnie have been doing, we have added more to
the automated xfstests run before checkin and more importantly
added various different mount configrations that are much
broader (used to be only a very small set that ran) and so would have
caught this (and also other regressions that had made it through over the
past couple years).
Now our next challenge is figuring out how to get automated tests for smb3
to run reasonably regularly against some of the stable kernels and also against
the full backports of cifs.ko to earlier kernels etc. which some need to get the
many security, performance and functional improvements over the past year.
--
Thanks,
Steve