The patch titled
Subject: mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-mark-devm_m…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-mark-devm_m…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
devm_memremap_pages() is a facility that can create struct page entries
for any arbitrary range and give drivers the ability to subvert core
aspects of page management.
Specifically the facility is tightly integrated with the kernel's memory
hotplug functionality. It injects an altmap argument deep into the
architecture specific vmemmap implementation to allow allocating from
specific reserved pages, and it has Linux specific assumptions about page
structure reference counting relative to get_user_pages() and
get_user_pages_fast(). It was an oversight and a mistake that this was
not marked EXPORT_SYMBOL_GPL from the outset.
Again, devm_memremap_pagex() exposes and relies upon core kernel internal
assumptions and will continue to evolve along with 'struct page', memory
hotplug, and support for new memory types / topologies. Only an in-kernel
GPL-only driver is expected to keep up with this ongoing evolution. This
interface, and functionality derived from this interface, is not suitable
for kernel-external drivers.
Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwil…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/kernel/memremap.c~mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl
+++ a/kernel/memremap.c
@@ -233,7 +233,7 @@ void *devm_memremap_pages(struct device
err_array:
return ERR_PTR(error);
}
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);
unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
{
--- a/tools/testing/nvdimm/test/iomap.c~mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl
+++ a/tools/testing/nvdimm/test/iomap.c
@@ -113,7 +113,7 @@ void *__wrap_devm_memremap_pages(struct
return nfit_res->buf + offset - nfit_res->res.start;
return devm_memremap_pages(dev, pgmap);
}
-EXPORT_SYMBOL(__wrap_devm_memremap_pages);
+EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
{
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
My apology that the v6 patches are missing the first two patch in
the series. Resending the patch series as v7.
Fix in this version bugs causing build problems for UP configuration.
Also merged in Jiri's change to extend STIBP for SECCOMP processes and
renaming TIF_STIBP to TIF_SPEC_INDIR_BRANCH.
I've updated the boot options spectre_v2_app2app to
on, off, auto, prctl and seccomp. This aligns with
the options for other speculation related mitigations.
I tried to incorporate sched_smt_present to detect when we have all SMT
going offline and we can disable the SMT path that Peter suggested.
this is an optimization that can be easily left out of the patch
series. I've put these two patches at the end and they can be considered
separately.
I've dropped the TIF flags re-organization patches
as they are orthogonal to this patch series.
To do: Create a dedicated document on the mitigation options for Spectre V2.
Since Jiri's patchset to always turn on STIBP
has big performance impact, I think that it should
be reverted from 4.20 and stable kernels for now, till this
patchset to mitigate its performance impact can be merged
with it.
Thanks.
Tim
Patch 1 to 3 are clean up patches.
Patch 4 and 5 disable STIBP for enhacned IBRS.
Patch 6 to 9 reorganize and clean up the code without affecting
functionality for easier modification later.
Patch 10 introduces the STIBP flag on a task to dynamically
enable STIBP for that task.
Patch 11 introduces different modes to protect a
task against Spectre v2 user space attack.
Patch 12 adds prctl interface to turn on Spectre v2 user mode defenses on a task.
Patch 13 Put IBPB usage under the mode chosen for app2app mitigation.
Patch 14 Add STIBP protection for SECCOMP tasks.
Patch 15-16 add Spectre v2 defenses for non-dumpable tasks.
Patch 15-16 reorganizes the TIF flags, and can be dropped without affecting this series
Patch 17-18 When there are no paired SMT left, disable SMT specific code
Changes:
v6:
1. Fix bugs for UP build configuration.
2. Add protection for SECCOMP tasks.
3. Rename TIF_STIBP to TIF_SPEC_INDIR_BRANCH.
4. Update boot options to align with other speculation mitigations.
5. Separate out IBPB change that makes it depend on TIF_SPEC_INDIR_BRANCH.
6. Move some checks for SPEC_CTRL updates to spec_ctrl_update_msr to avoid
unnecesseary MSR writes.
7. Drop TIF reorg patches.
8. Incorporate optimization to disable SMT code paths when no paired SMT is present.
v5:
1. Drop patch to extend TIF_STIBP changes to all related threads on
a task's dumpabibility change.
2. Drop patch to replace sched_smt_present with cpu_smt_enabled.
3. Drop export of cpu_smt_control in kernel/cpu.c and replace external
usages of cpu_smt_control with cpu_smt_enabled.
4. Rebase patch series on 4.20-rc2.
v4:
1. Extend STIBP update to all threads of a process changing
it dumpability.
2. Add logic to update SPEC_CTRL MSR on a remote CPU when TIF flags
affecting speculation changes for task running on the remote CPU.
3. Regroup x86 TIF_* flags according to their functions.
4. Various code clean up.
v3:
1. Add logic to skip STIBP when Enhanced IBRS is used.
2. Break up v2 patches into smaller logical patches.
3. Fix bug in arch_set_dumpable that did not update SPEC_CTRL
MSR right away when according to task's STIBP flag clearing which
caused SITBP to be left on.
4. Various code clean up.
v2:
1. Extend per process STIBP to AMD cpus
2. Add prctl option to control per process indirect branch speculation
3. Bug fixes and cleanups
Jiri's patchset to harden Spectre v2 user space mitigation makes IBPB
and STIBP in use for Spectre v2 mitigation on all processes. IBPB will
be issued for switching to an application that's not ptraceable by the
previous application and STIBP will be always turned on.
However, leaving STIBP on all the time is expensive for certain
applications that have frequent indirect branches. One such application
is perlbench in the SpecInt Rate 2006 test suite which shows a
21% reduction in throughput.
There're also reports of drop in performance on Python and PHP benchmarks:
https://www.phoronix.com/scan.php?page=article&item=linux-420-bisect&num=2
Other applications like bzip2 with minimal indirct branches have
only a 0.7% reduction in throughput. IBPB will also impose
overhead during context switches.
Users may not wish to incur performance overhead from IBPB and STIBP for
general non security sensitive processes and use these mitigations only
for security sensitive processes.
This patchset provides a process property based lite protection mode.
In this mode, IBPB and STIBP mitigation are applied only to security
sensitive non-dumpable processes and processes that users want to protect
by having indirect branch speculation disabled via PRCTL. So the overhead
from IBPB and STIBP are avoided for low security processes that don't
require extra protection.
Jiri Kosina (1):
x86/speculation: Add 'seccomp' Spectre v2 app to app protection mode
Peter Zijlstra (1):
sched/smt: Make sched_smt_present track topology
Tim Chen (16):
x86/speculation: Clean up spectre_v2_parse_cmdline()
x86/speculation: Remove unnecessary ret variable in cpu_show_common()
x86/speculation: Reorganize cpu_show_common()
x86/speculation: Add X86_FEATURE_USE_IBRS_ENHANCED
x86/speculation: Disable STIBP when enhanced IBRS is in use
x86/speculation: Rename SSBD update functions
x86/speculation: Reorganize speculation control MSRs update
smt: Create cpu_smt_enabled static key for SMT specific code
x86/smt: Convert cpu_smt_control check to cpu_smt_enabled static key
x86/speculation: Turn on or off STIBP according to a task's TIF_STIBP
x86/speculation: Add Spectre v2 app to app protection modes
x86/speculation: Create PRCTL interface to restrict indirect branch
speculation
x86/speculation: Enable IBPB for tasks with
TIF_SPEC_BRANCH_SPECULATION
security: Update speculation restriction of a process when modifying
its dumpability
x86/speculation: Use STIBP to restrict speculation on non-dumpable
task
x86/smt: Allow disabling of SMT when last SMT is offlined
Documentation/admin-guide/kernel-parameters.txt | 34 +++
Documentation/userspace-api/spec_ctrl.rst | 9 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 6 +-
arch/x86/include/asm/nospec-branch.h | 10 +
arch/x86/include/asm/spec-ctrl.h | 18 +-
arch/x86/include/asm/thread_info.h | 5 +-
arch/x86/kernel/cpu/bugs.c | 336 +++++++++++++++++++++---
arch/x86/kernel/process.c | 58 +++-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/mm/tlb.c | 23 +-
fs/exec.c | 3 +
include/linux/cpu.h | 31 ++-
include/linux/sched.h | 9 +
include/uapi/linux/prctl.h | 1 +
kernel/cpu.c | 28 +-
kernel/cred.c | 5 +-
kernel/sched/core.c | 19 +-
kernel/sched/sched.h | 2 -
kernel/sys.c | 7 +
tools/include/uapi/linux/prctl.h | 1 +
21 files changed, 526 insertions(+), 82 deletions(-)
--
2.9.4
Commit f77084d96355 "x86/mm/pat: Disable preemption around
__flush_tlb_all()" addressed a case where __flush_tlb_all() is called
without preemption being disabled. It also left a warning to catch other
cases where preemption is not disabled. That warning triggers for the
memory hotplug path which is also used for persistent memory enabling:
WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
RIP: 0010:__flush_tlb_all+0x1b/0x3a
[..]
Call Trace:
phys_pud_init+0x29c/0x2bb
kernel_physical_mapping_init+0xfc/0x219
init_memory_mapping+0x1a5/0x3b0
arch_add_memory+0x2c/0x50
devm_memremap_pages+0x3aa/0x610
pmem_attach_disk+0x585/0x700 [nd_pmem]
Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
and Dave confirmed the expectation for TLB flush is for modifying /
invalidating existing pte entries, but not initial population [2]. Drop
the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
expectation that this path is only ever populating empty entries for the
linear map. Note, at linear map teardown time there is a call to the
all-cpu flush_tlb_all() to invalidate the removed mappings.
[1]: https://lore.kernel.org/patchwork/patch/1009434/#1193941
[2]: https://lore.kernel.org/patchwork/patch/1009434/#1194540
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: <stable(a)vger.kernel.org>
Reported-by: Andy Lutomirski <luto(a)kernel.org>
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
arch/x86/mm/init_64.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..de95db8ac52f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
paddr_end,
page_size_mask,
prot);
- __flush_tlb_all();
continue;
}
/*
@@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
pud_populate(&init_mm, pud, pmd);
spin_unlock(&init_mm.page_table_lock);
}
- __flush_tlb_all();
update_page_count(PG_LEVEL_1G, pages);
@@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
paddr_last = phys_pud_init(pud, paddr,
paddr_end,
page_size_mask);
- __flush_tlb_all();
continue;
}
@@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
p4d_populate(&init_mm, p4d, pud);
spin_unlock(&init_mm.page_table_lock);
}
- __flush_tlb_all();
return paddr_last;
}
@@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start,
if (pgd_changed)
sync_global_pgds(vaddr_start, vaddr_end - 1);
- __flush_tlb_all();
-
return paddr_last;
}
The patch titled
Subject: scripts/spdxcheck.py: make python3 compliant
has been removed from the -mm tree. Its filename was
scripts-spdxcheck-make-python3-compliant.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Subject: scripts/spdxcheck.py: make python3 compliant
Without this change the following happens when using Python3 (3.6.6):
$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
FAIL: 'str' object has no attribute 'decode'
Traceback (most recent call last):
File "scripts/spdxcheck.py", line 253, in <module>
parser.parse_lines(sys.stdin, args.maxlines, '-')
File "scripts/spdxcheck.py", line 171, in parse_lines
line = line.decode(locale.getpreferredencoding(False), errors='ignore')
AttributeError: 'str' object has no attribute 'decode'
So as the line is already a string, there is no need to decode it and
the line can be dropped.
/usr/bin/python on Arch is Python 3. So this would indeed be worth
going into 4.19.
Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix…
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joe Perches <joe(a)perches.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/spdxcheck.py | 1 -
1 file changed, 1 deletion(-)
--- a/scripts/spdxcheck.py~scripts-spdxcheck-make-python3-compliant
+++ a/scripts/spdxcheck.py
@@ -168,7 +168,6 @@ class id_parser(object):
self.curline = 0
try:
for line in fd:
- line = line.decode(locale.getpreferredencoding(False), errors='ignore')
self.curline += 1
if self.curline > maxlines:
break
_
Patches currently in -mm which might be from u.kleine-koenig(a)pengutronix.de are
The patch titled
Subject: lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
has been removed from the -mm tree. Its filename was
ubsan-dont-mark-__ubsan_handle_builtin_unreachable-as-noreturn.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
gcc-8 complains about the prototype for this function:
lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]
This is actually a GCC's bug. In GCC internals
__ubsan_handle_builtin_unreachable() declared with both 'noreturn' and
'const' attributes instead of only 'noreturn':
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210
Workaround this by removing the noreturn attribute.
[aryabinin: add information about GCC bug in changelog]
Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Acked-by: Olof Johansson <olof(a)lixom.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/ubsan.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/ubsan.c~ubsan-dont-mark-__ubsan_handle_builtin_unreachable-as-noreturn
+++ a/lib/ubsan.c
@@ -427,8 +427,7 @@ void __ubsan_handle_shift_out_of_bounds(
EXPORT_SYMBOL(__ubsan_handle_shift_out_of_bounds);
-void __noreturn
-__ubsan_handle_builtin_unreachable(struct unreachable_data *data)
+void __ubsan_handle_builtin_unreachable(struct unreachable_data *data)
{
unsigned long flags;
_
Patches currently in -mm which might be from arnd(a)arndb.de are
vfs-replace-current_kernel_time64-with-ktime-equivalent.patch
The patch titled
Subject: ocfs2: free up write context when direct IO failed
has been removed from the -mm tree. Its filename was
ocfs2-free-up-write-context-when-direct-io-failed.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Wengang Wang <wen.gang.wang(a)oracle.com>
Subject: ocfs2: free up write context when direct IO failed
The write context should also be freed even when direct IO failed.
Otherwise a memory leak is introduced and entries remain in
oi->ip_unwritten_list causing the following BUG later in unlink path:
ERROR: bug expression: !list_empty(&oi->ip_unwritten_list)
ERROR: Clear inode of 215043, inode has unwritten extents
...
Call Trace:
? __set_current_blocked+0x42/0x68
ocfs2_evict_inode+0x91/0x6a0 [ocfs2]
? bit_waitqueue+0x40/0x33
evict+0xdb/0x1af
iput+0x1a2/0x1f7
do_unlinkat+0x194/0x28f
SyS_unlinkat+0x1b/0x2f
do_syscall_64+0x79/0x1ae
entry_SYSCALL_64_after_hwframe+0x151/0x0
This patch also logs, with frequency limit, direct IO failures.
Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang(a)oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Changwei Ge <ge.changwei(a)h3c.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 12 ++++++++++--
fs/ocfs2/cluster/masklog.h | 9 +++++++++
2 files changed, 19 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-free-up-write-context-when-direct-io-failed
+++ a/fs/ocfs2/aops.c
@@ -2411,8 +2411,16 @@ static int ocfs2_dio_end_io(struct kiocb
/* this io's submitter should not have unlocked this before we could */
BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
- if (bytes > 0 && private)
- ret = ocfs2_dio_end_io_write(inode, private, offset, bytes);
+ if (bytes <= 0)
+ mlog_ratelimited(ML_ERROR, "Direct IO failed, bytes = %lld",
+ (long long)bytes);
+ if (private) {
+ if (bytes > 0)
+ ret = ocfs2_dio_end_io_write(inode, private, offset,
+ bytes);
+ else
+ ocfs2_dio_free_write_ctx(inode, private);
+ }
ocfs2_iocb_clear_rw_locked(iocb);
--- a/fs/ocfs2/cluster/masklog.h~ocfs2-free-up-write-context-when-direct-io-failed
+++ a/fs/ocfs2/cluster/masklog.h
@@ -178,6 +178,15 @@ do { \
##__VA_ARGS__); \
} while (0)
+#define mlog_ratelimited(mask, fmt, ...) \
+do { \
+ static DEFINE_RATELIMIT_STATE(_rs, \
+ DEFAULT_RATELIMIT_INTERVAL, \
+ DEFAULT_RATELIMIT_BURST); \
+ if (__ratelimit(&_rs)) \
+ mlog(mask, fmt, ##__VA_ARGS__); \
+} while (0)
+
#define mlog_errno(st) ({ \
int _st = (st); \
if (_st != -ERESTARTSYS && _st != -EINTR && \
_
Patches currently in -mm which might be from wen.gang.wang(a)oracle.com are
The patch titled
Subject: mm: don't reclaim inodes with many attached pages
has been removed from the -mm tree. Its filename was
mm-dont-reclaim-inodes-with-many-attached-pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: don't reclaim inodes with many attached pages
Spock reported that the commit 172b06c32b94 ("mm: slowly shrink slabs with
a relatively small number of objects") leads to a regression on his setup:
periodically the majority of the pagecache is evicted without an obvious
reason, while before the change the amount of free memory was balancing
around the watermark.
The reason behind is that the mentioned above change created some minimal
background pressure on the inode cache. The problem is that if an inode
is considered to be reclaimed, all belonging pagecache page are stripped,
no matter how many of them are there. So, if a huge multi-gigabyte file
is cached in the memory, and the goal is to reclaim only few slab objects
(unused inodes), we still can eventually evict all gigabytes of the
pagecache at once.
The workload described by Spock has few large non-mapped files in the
pagecache, so it's especially noticeable.
To solve the problem let's postpone the reclaim of inodes, which have more
than 1 attached page. Let's wait until the pagecache pages will be
evicted naturally by scanning the corresponding LRU lists, and only then
reclaim the inode structure.
Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reported-by: Spock <dairinin(a)gmail.com>
Tested-by: Spock <dairinin(a)gmail.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.19.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/inode.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/fs/inode.c~mm-dont-reclaim-inodes-with-many-attached-pages
+++ a/fs/inode.c
@@ -730,8 +730,11 @@ static enum lru_status inode_lru_isolate
return LRU_REMOVED;
}
- /* recently referenced inodes get one more pass */
- if (inode->i_state & I_REFERENCED) {
+ /*
+ * Recently referenced inodes and inodes with many attached pages
+ * get one more pass.
+ */
+ if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
inode->i_state &= ~I_REFERENCED;
spin_unlock(&inode->i_lock);
return LRU_ROTATE;
_
Patches currently in -mm which might be from guro(a)fb.com are
The patch titled
Subject: mm/swapfile.c: use kvzalloc for swap_info_struct allocation
has been removed from the -mm tree. Its filename was
mm-use-kvzalloc-for-swap_info_struct-allocation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vasily Averin <vvs(a)virtuozzo.com>
Subject: mm/swapfile.c: use kvzalloc for swap_info_struct allocation
a2468cc9bfdf ("swap: choose swap device according to numa node") changed
'avail_lists' field of 'struct swap_info_struct' to an array. In popular
linux distros it increased size of swap_info_struct up to 40 Kbytes and
now swap_info_struct allocation requires order-4 page. Switch to
kvzmalloc allows to avoid unexpected allocation failures.
Link: http://lkml.kernel.org/r/fc23172d-3c75-21e2-d551-8b1808cbe593@virtuozzo.com
Fixes: a2468cc9bfdf ("swap: choose swap device according to numa node")
Signed-off-by: Vasily Averin <vvs(a)virtuozzo.com>
Acked-by: Aaron Lu <aaron.lu(a)intel.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/swapfile.c~mm-use-kvzalloc-for-swap_info_struct-allocation
+++ a/mm/swapfile.c
@@ -2813,7 +2813,7 @@ static struct swap_info_struct *alloc_sw
unsigned int type;
int i;
- p = kzalloc(sizeof(*p), GFP_KERNEL);
+ p = kvzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return ERR_PTR(-ENOMEM);
@@ -2824,7 +2824,7 @@ static struct swap_info_struct *alloc_sw
}
if (type >= MAX_SWAPFILES) {
spin_unlock(&swap_lock);
- kfree(p);
+ kvfree(p);
return ERR_PTR(-EPERM);
}
if (type >= nr_swapfiles) {
@@ -2838,7 +2838,7 @@ static struct swap_info_struct *alloc_sw
smp_wmb();
nr_swapfiles++;
} else {
- kfree(p);
+ kvfree(p);
p = swap_info[type];
/*
* Do not memset this entry: a racing procfs swap_next()
_
Patches currently in -mm which might be from vvs(a)virtuozzo.com are