The patch titled
Subject: mm: memcontrol: prevent starvation when writing memory.high
has been removed from the -mm tree. Its filename was
mm-memcontrol-prevent-starvation-when-writing-memoryhigh.patch
This patch was dropped because an alternative patch was merged
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: prevent starvation when writing memory.high
When a value is written to a cgroup's memory.high control file, the
write() context first tries to reclaim the cgroup to size before putting
the limit in place for the workload. Concurrent charges from the workload
can keep such a write() looping in reclaim indefinitely.
In the past, a write to memory.high would first put the limit in place for
the workload, then do targeted reclaim until the new limit had been met -
similar to how we do it for memory.max. This wasn't prone to the
described starvation issue. However, this sequence could cause excessive
latencies in the workload, as allocating threads could be put into long
penalty sleeps on the sudden memory.high overage created by the write(),
before the write() had a chance to work it off.
Now that memory_high_write() performs reclaim before enforcing the new
limit, reflect that the cgroup may well fail to converge due to concurrent
workload activity. Bail out of the loop after a few tries.
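For reference, here is a consolidated sketch of the write-side loop as it
would look with this patch applied. It is paraphrased from the hunks below;
the surrounding memory_high_write() plumbing and MAX_RECLAIM_RETRIES are
assumed from the kernel this was written against, not quoted verbatim:

    unsigned int nr_retries = MAX_RECLAIM_RETRIES;
    bool drained = false;

    for (;;) {
            unsigned long nr_pages = page_counter_read(&memcg->memory);

            if (nr_pages <= high)
                    break;

            if (signal_pending(current))
                    break;

            if (!drained) {
                    /* Flush per-cpu charge caches once before reclaiming. */
                    drain_all_stock(memcg);
                    drained = true;
                    continue;
            }

            try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
                                         GFP_KERNEL, true);

            /*
             * A concurrently charging workload can keep the counter above
             * 'high' indefinitely, so bail after a bounded number of tries
             * instead of waiting for reclaim to converge.
             */
            if (!nr_retries--)
                    break;
    }

    page_counter_set_high(&memcg->memory, high);  /* limit applied after reclaim */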
Link: https://lkml.kernel.org/r/20210112163011.127833-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Tejun Heo <tj(a)kernel.org>
Acked-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-prevent-starvation-when-writing-memoryhigh
+++ a/mm/memcontrol.c
@@ -6273,7 +6273,6 @@ static ssize_t memory_high_write(struct
for (;;) {
unsigned long nr_pages = page_counter_read(&memcg->memory);
- unsigned long reclaimed;
if (nr_pages <= high)
break;
@@ -6287,10 +6286,10 @@ static ssize_t memory_high_write(struct
continue;
}
- reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
- GFP_KERNEL, true);
+ try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+ GFP_KERNEL, true);
- if (!reclaimed && !nr_retries--)
+ if (!nr_retries--)
break;
}
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh.patch
Dear stable maintainers,
Please consider cherry-picking c8a950d0d3b9 ("tools: Factor HOSTCC,
HOSTLD, HOSTAR definitions") to 5.10, 5.4 and 4.19. It fixes a problem
where the host tools set by the user could be ignored for
'prepare-objtool', and is needed on x86 to enable the ORC unwinder,
dynamic ftrace, etc. with LLVM=1 and a 'hermetic' toolchain
environment.
On 5.10.10 it cherry-picks cleanly. On 5.4.92 and 4.19.170 it also
cherry-picks cleanly, except for 'tools/bpf/resolve_btfids/Makefile',
which was introduced after 5.8-rc2. (This file can be ignored.)
Thanks!
Hello stable,
The following printk patches fixing a buffer overflow potential in 5.10
are now available in Linus' tree:
f0e386ee0c0b71ea6f7238506a4d0965a2dbef11 ("printk: fix buffer overflow
potential for print_text()")
08d60e5999540110576e7c1346d486220751b7f9 ("printk: fix string
termination for record_print_text()")
The first one (f0e386ee0c0b) was already queued up for 5.10-stable but I
requested it not be applied until this second one was accepted. Now they
are both accepted and both should be applied. Thanks.
John Ogness
The patch titled
Subject: mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
has been added to the -mm tree. Its filename is
mm-filemap-adding-missing-mem_cgroup_uncharge-to-__add_to_page_cache_locked.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-filemap-adding-missing-mem_cgr…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-filemap-adding-missing-mem_cgr…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
Commit 3fea5a499d57 ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked(),
causing the following splat:
[ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
[ 1570.068333] pages's memcg:ffff8889a4116000
[ 1570.068343] ------------[ cut here ]------------
[ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
[ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1
[ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
[ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
:
[ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
[ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
[ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
[ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
[ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
[ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
[ 1570.068391] FS: 00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
[ 1570.068394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
[ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1570.068402] PKRU: 55555554
[ 1570.068404] Call Trace:
[ 1570.068407] mem_cgroup_charge+0x175/0x770
[ 1570.068413] __add_to_page_cache_locked+0x712/0xad0
[ 1570.068439] add_to_page_cache_lru+0xc5/0x1f0
[ 1570.068461] cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
[ 1570.068524] __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
[ 1570.068540] __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
[ 1570.068585] nfs_readpages+0x24e/0x540 [nfs]
[ 1570.068693] read_pages+0x5b1/0xc40
[ 1570.068711] page_cache_ra_unbounded+0x460/0x750
[ 1570.068729] generic_file_buffered_read_get_pages+0x290/0x1710
[ 1570.068756] generic_file_buffered_read+0x2a9/0xc30
[ 1570.068832] nfs_file_read+0x13f/0x230 [nfs]
[ 1570.068872] new_sync_read+0x3af/0x610
[ 1570.068901] vfs_read+0x339/0x4b0
[ 1570.068909] ksys_read+0xf1/0x1c0
[ 1570.068920] do_syscall_64+0x33/0x40
[ 1570.068926] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1570.068930] RIP: 0033:0x7ff039135595
Before that commit, __add_to_page_cache_locked() used separate try_charge()
and commit_charge() calls. These two charge functions were replaced by a
single mem_cgroup_charge(), but the conversion did not add a matching
mem_cgroup_uncharge() for the case where the xarray insertion fails and the
page is released back to the pool. Fix this by adding a
mem_cgroup_uncharge() call when an insertion error happens.
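In other words, the shape of the fix is the following charge/uncharge
pairing (a simplified sketch paraphrased from the hunks below, not an exact
copy of mm/filemap.c):

    bool charged = false;

    if (!huge) {
            error = mem_cgroup_charge(page, current->mm, gfp);
            if (error)
                    goto error;
            charged = true;         /* remember the page carries a charge */
    }

    /* ... xarray insertion under the xas lock ... */

    if (xas_error(&xas)) {
            error = xas_error(&xas);
            if (charged)
                    /* undo the charge before the page goes back to the pool */
                    mem_cgroup_uncharge(page);
            goto error;
    }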
Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com
Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Reviewed-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <smuchun(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/filemap.c~mm-filemap-adding-missing-mem_cgroup_uncharge-to-__add_to_page_cache_locked
+++ a/mm/filemap.c
@@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(
XA_STATE(xas, &mapping->i_pages, offset);
int huge = PageHuge(page);
int error;
+ bool charged = false;
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapBacked(page), page);
@@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(
error = mem_cgroup_charge(page, current->mm, gfp);
if (error)
goto error;
+ charged = true;
}
gfp &= GFP_RECLAIM_MASK;
@@ -896,6 +898,8 @@ unlock:
if (xas_error(&xas)) {
error = xas_error(&xas);
+ if (charged)
+ mem_cgroup_uncharge(page);
goto error;
}
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-filemap-adding-missing-mem_cgroup_uncharge-to-__add_to_page_cache_locked.patch
Ville noticed that the last mocs entry is used unconditionally by the HW
when it performs cache evictions, and noted that while the value is not
meant to be writable by the driver, we should program it to a reasonable
value nevertheless.
As it turns out, we can change the value of mocs:63 and the value we
were programming into it would cause hard hangs in conjunction with
atomic operations.
Suggested-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2707
Fixes: 3bbaba0ceaa2 ("drm/i915: Added Programming of the MOCS")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Jason Ekstrand <jason(a)jlekstrand.net>
Cc: <stable(a)vger.kernel.org> # v4.3+
---
drivers/gpu/drm/i915/gt/intel_mocs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 254873e1646e..6ae512847f64 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -131,7 +131,10 @@ static const struct drm_i915_mocs_entry skl_mocs_table[] = {
GEN9_MOCS_ENTRIES,
MOCS_ENTRY(I915_MOCS_CACHED,
LE_3_WB | LE_TC_2_LLC_ELLC | LE_LRUM(3),
- L3_3_WB)
+ L3_3_WB),
+ MOCS_ENTRY(63,
+ LE_3_WB | LE_TC_1_LLC | LE_LRUM(3),
+ L3_1_UC)
};
/* NOTE: the LE_TGT_CACHE is not used on Broxton */
--
2.20.1
The patch titled
Subject: proc_sysctl: fix oops caused by incorrect command parameters
has been removed from the -mm tree. Its filename was
proc_sysctl-fix-oops-caused-by-incorrect-command-parameters.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Xiaoming Ni <nixiaoming(a)huawei.com>
Subject: proc_sysctl: fix oops caused by incorrect command parameters
process_sysctl_arg() does not check whether val is empty before invoking
strlen(val). If a command line parameter is misconfigured so that val is
empty, an oops is triggered.
For example, if "hung_task_panic=1" is incorrectly written as
"hung_task_panic", an oops is triggered. The call stack is as follows:
Kernel command line: .... hung_task_panic
......
Call trace:
__pi_strlen+0x10/0x98
parse_args+0x278/0x344
do_sysctl_args+0x8c/0xfc
kernel_init+0x5c/0xf4
ret_from_fork+0x10/0x30
To fix it, check whether "val" is empty when "param" is a sysctl field.
Error codes are returned in the failure branch, and the error logs are
generated by parse_args().
Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming(a)huawei.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Iurii Zaikin <yzaikin(a)google.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Heiner Kallweit <hkallweit1(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/proc_sysctl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/proc/proc_sysctl.c~proc_sysctl-fix-oops-caused-by-incorrect-command-parameters
+++ a/fs/proc/proc_sysctl.c
@@ -1770,6 +1770,12 @@ static int process_sysctl_arg(char *para
return 0;
}
+ if (!val)
+ return -EINVAL;
+ len = strlen(val);
+ if (len == 0)
+ return -EINVAL;
+
/*
* To set sysctl options, we use a temporary mount of proc, look up the
* respective sys/ file and write to it. To avoid mounting it when no
@@ -1811,7 +1817,6 @@ static int process_sysctl_arg(char *para
file, param, val);
goto out;
}
- len = strlen(val);
wret = kernel_write(file, val, len, &pos);
if (wret < 0) {
err = wret;
_
Patches currently in -mm which might be from nixiaoming(a)huawei.com are
The patch titled
Subject: mm: fix page reference leak in soft_offline_page()
has been removed from the -mm tree. Its filename was
mm-fix-page-reference-leak-in-soft_offline_page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix page reference leak in soft_offline_page()
The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.
Note the direct sysfs-path to soft_offline_page() does not perform a
get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn, the kernel hangs at dax-device shutdown due to a leaked reference.
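Sketched out, the fixed entry path looks roughly like this (paraphrased
from the hunks below, not copied verbatim from mm/memory-failure.c):

    struct page *page, *ref_page = NULL;

    if (!pfn_valid(pfn))
            return -ENXIO;

    /*
     * madvise(MADV_SOFT_OFFLINE) passes MF_COUNT_INCREASED, i.e. it already
     * holds a get_user_pages() reference that must be dropped on every
     * early return.
     */
    if (flags & MF_COUNT_INCREASED)
            ref_page = pfn_to_page(pfn);

    page = pfn_to_online_page(pfn);
    if (!page) {
            put_ref_page(ref_page);         /* previously leaked here */
            return -EIO;
    }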
Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dw…
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c~mm-fix-page-reference-leak-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct
return rc;
}
+static void put_ref_page(struct page *page)
+{
+ if (page)
+ put_page(page);
+}
+
/**
* soft_offline_page - Soft offline a page.
* @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct
int soft_offline_page(unsigned long pfn, int flags)
{
int ret;
- struct page *page;
bool try_again = true;
+ struct page *page, *ref_page = NULL;
+
+ WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn))
return -ENXIO;
+ if (flags & MF_COUNT_INCREASED)
+ ref_page = pfn_to_page(pfn);
+
/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
page = pfn_to_online_page(pfn);
- if (!page)
+ if (!page) {
+ put_ref_page(ref_page);
return -EIO;
+ }
if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
- if (flags & MF_COUNT_INCREASED)
- put_page(page);
+ put_ref_page(ref_page);
return 0;
}
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-move-pfn_to_online_page-out-of-line.patch
mm-teach-pfn_to_online_page-to-consider-subsection-validity.patch
mm-teach-pfn_to_online_page-about-zone_device-section-collisions.patch
mm-teach-pfn_to_online_page-about-zone_device-section-collisions-fix.patch
mm-fix-memory_failure-handling-of-dax-namespace-metadata.patch
The patch titled
Subject: mm: fix numa stats for thp migration
has been removed from the -mm tree. Its filename was
mm-fix-numa-stats-for-thp-migration.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm: fix numa stats for thp migration
Currently the kernel does not correctly update the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that. For NR_FILE_DIRTY
and NR_ZONE_WRITE_PENDING there is no need to handle THP migration at the
moment, since the kernel does not yet have write support for file THP, but
to be more future proof this patch adds THP support for those stats as
well.
Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
--- a/mm/migrate.c~mm-fix-numa-stats-for-thp-migration
+++ a/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct add
struct zone *oldzone, *newzone;
int dirty;
int expected_count = expected_page_refs(mapping, page) + extra_count;
+ int nr = thp_nr_pages(page);
if (!mapping) {
/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct add
*/
newpage->index = page->index;
newpage->mapping = page->mapping;
- page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+ page_ref_add(newpage, nr); /* add cache reference */
if (PageSwapBacked(page)) {
__SetPageSwapBacked(newpage);
if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct add
if (PageTransHuge(page)) {
int i;
- for (i = 1; i < HPAGE_PMD_NR; i++) {
+ for (i = 1; i < nr; i++) {
xas_next(&xas);
xas_store(&xas, newpage);
}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct add
* to one less reference.
* We know this isn't the last reference.
*/
- page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+ page_ref_unfreeze(page, expected_count - nr);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct add
old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
- __dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
- __inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+ __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
if (PageSwapBacked(page) && !PageSwapCache(page)) {
- __dec_lruvec_state(old_lruvec, NR_SHMEM);
- __inc_lruvec_state(new_lruvec, NR_SHMEM);
+ __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+ __mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
- __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
- __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+ __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+ __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+ __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
}
}
local_irq_enable();
_
Patches currently in -mm which might be from shakeelb(a)google.com are
mm-memcg-add-swapcache-stat-for-memcg-v2.patch
The patch titled
Subject: mm: memcg: fix memcg file_dirty numa stat
has been removed from the -mm tree. Its filename was
mm-memcg-fix-memcg-file_dirty-numa-stat.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm: memcg: fix memcg file_dirty numa stat
The kernel updates the per-node NR_FILE_DIRTY stats on page migration but
not the memcg numa stats. That was not an issue until commit 5f9a4f4a7096
("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
recently exposed numa stats for the memcg. So fix the file_dirty
per-memcg numa stat.
Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-memcg-fix-memcg-file_dirty-numa-stat
+++ a/mm/migrate.c
@@ -500,9 +500,9 @@ int migrate_page_move_mapping(struct add
__inc_lruvec_state(new_lruvec, NR_SHMEM);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY);
+ __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_node_state(newzone->zone_pgdat, NR_FILE_DIRTY);
+ __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
}
}
_
Patches currently in -mm which might be from shakeelb(a)google.com are
mm-memcg-add-swapcache-stat-for-memcg-v2.patch
The patch titled
Subject: mm: memcg/slab: optimize objcg stock draining
has been removed from the -mm tree. Its filename was
mm-memcg-slab-optimize-objcg-stock-draining.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcg/slab: optimize objcg stock draining
Imran Khan reported a 16% regression in hackbench results caused by
commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
instead of pages"). The regression is noticeable in the case of a
consecutive allocation of several relatively large slab objects, e.g.
skb's. As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, which leads
to a number of atomic operations in page_counter_uncharge().
The corresponding call graph is below (provided by Imran Khan):
|__alloc_skb
| |
| |__kmalloc_reserve.isra.61
| | |
| | |__kmalloc_node_track_caller
| | | |
| | | |slab_pre_alloc_hook.constprop.88
| | | obj_cgroup_charge
| | | | |
| | | | |__memcg_kmem_charge
| | | | | |
| | | | | |page_counter_try_charge
| | | | |
| | | | |refill_obj_stock
| | | | | |
| | | | | |drain_obj_stock.isra.68
| | | | | | |
| | | | | | |__memcg_kmem_uncharge
| | | | | | | |
| | | | | | | |page_counter_uncharge
| | | | | | | | |
| | | | | | | | |page_counter_cancel
| | | |
| | | |
| | | |__slab_alloc
| | | | |
| | | | |___slab_alloc
| | | | |
| | | |slab_post_alloc_hook
Instead of directly uncharging the accounted kernel memory, it's possible
to refill the generic page-sized per-cpu stock instead. It's a much
faster operation, especially on a default hierarchy. As a bonus,
__memcg_kmem_uncharge_page() will also get faster, so the freeing of
page-sized kernel allocations (e.g. large kmallocs) will become faster.
A similar change was done earlier for socket memory by commit
475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket memory
uncharging").
Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reported-by: Imran Khan <imran.f.khan(a)oracle.com>
Tested-by: Imran Khan <imran.f.khan(a)oracle.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-slab-optimize-objcg-stock-draining
+++ a/mm/memcontrol.c
@@ -3115,9 +3115,7 @@ void __memcg_kmem_uncharge(struct mem_cg
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->kmem, nr_pages);
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw, nr_pages);
+ refill_stock(memcg, nr_pages);
}
/**
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch
mm-kmem-make-__memcg_kmem_uncharge-static.patch
mm-cma-allocate-cma-areas-bottom-up.patch
mm-cma-allocate-cma-areas-bottom-up-fix.patch
mm-cma-allocate-cma-areas-bottom-up-fix-2.patch
mm-cma-allocate-cma-areas-bottom-up-fix-3.patch
memblock-do-not-start-bottom-up-allocations-with-kernel_end.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
The patch titled
Subject: mm: fix initialization of struct page for holes in memory layout
has been removed from the -mm tree. Its filename was
mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: fix initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using the init_unavailable_mem()
function, which iterates through the PFNs in holes in memblock.memory and,
if there is a struct page corresponding to a PFN, sets the fields of this
page to default values and marks the page as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
instance in the configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
the unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because the same pageblock then contains pages from both ZONE_DMA32 and
ZONE_DMA (the latter only because of the unset zone link in struct page).
Update init_unavailable_mem() to use the zone constraints defined by the
architecture to properly set up the zone link, and use the node ID of the
adjacent range in memblock.memory to set the node link.
Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 84 +++++++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 34 deletions(-)
--- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -7078,23 +7078,26 @@ void __init free_area_init_memoryless_no
* Initialize all valid struct pages in the range [spfn, epfn) and mark them
* PageReserved(). Return the number of struct pages that were initialized.
*/
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int nid)
{
- unsigned long pfn;
+ unsigned long pfn, zone_spfn, zone_epfn;
u64 pgcnt = 0;
+ zone_spfn = arch_zone_lowest_possible_pfn[zone];
+ zone_epfn = arch_zone_highest_possible_pfn[zone];
+
+ spfn = clamp(spfn, zone_spfn, zone_epfn);
+ epfn = clamp(epfn, zone_spfn, zone_epfn);
+
for (pfn = spfn; pfn < epfn; pfn++) {
if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ pageblock_nr_pages - 1;
continue;
}
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
+
+ __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
__SetPageReserved(pfn_to_page(pfn));
pgcnt++;
}
@@ -7103,51 +7106,64 @@ static u64 __init init_unavailable_range
}
/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init().
*
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - physical memory bank size is not necessarily the exact multiple of the
+ * arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ * nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone link is set according to the architecture constraints
+ * - node is set to node id of the next populated region except for the
+ * trailing hole where last node id is used
*/
-static void __init init_unavailable_mem(void)
+static void __init init_zone_unavailable_mem(int zone)
{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
+ unsigned long start, end;
+ int i, nid;
+ u64 pgcnt;
+ unsigned long next = 0;
/*
- * Loop through unavailable ranges not covered by memblock.memory.
+ * Loop through holes in memblock.memory and initialize struct
+ * pages corresponding to these holes
*/
pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
+ pgcnt += init_unavailable_range(next, start, zone, nid);
next = end;
}
/*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
+ * Last section may surpass the actual end of memory (e.g. we can
+ * have 1Gb section and 512Mb of RAM populated).
+ * Make sure that memmap has a well defined state in this case.
*/
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
+ end = round_up(max_pfn, PAGES_PER_SECTION);
+ pgcnt += init_unavailable_range(next, end, zone, nid);
/*
* Struct pages that do not have backing memory. This could be because
* firmware is using some of this memory, or for some other reasons.
*/
if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
+ pr_info("Zone %s: zeroed struct page in unavailable ranges: %lld pages", zone_names[zone], pgcnt);
+}
+
+static void __init init_unavailable_mem(void)
+{
+ int zone;
+
+ for (zone = 0; zone < ZONE_MOVABLE; zone++)
+ init_zone_unavailable_mem(zone);
}
#else
static inline void __init init_unavailable_mem(void)
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
riscv-kconfig-make-direct-map-manipulation-options-depend-on-mmu.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-where-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
The patch titled
Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0
has been removed from the -mm tree. Its filename was
x86-setup-dont-remove-e820_type_ram-for-pfn-0.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0
Patch series "mm: fix initialization of struct page for holes in memory layout", v3.
Commit 73a6e474cb37 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN") exposed several issues with
the memory map initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the problem was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional
discussion at [3].
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
This patch (of 2):
The first 4Kb of memory is a BIOS-owned area and, to avoid allocating it
for the kernel, it was not listed in the e820 tables as memory. As a
result, pfn 0 was never recognised by the generic memory management and it
is part of neither node 0 nor ZONE_DMA.
If set_pfnblock_flags_mask() were ever called for the pageblock
corresponding to the first 2 Mbytes of memory, having pfn 0 outside of
ZONE_DMA would trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
Along with reserving the first 4Kb in the e820 tables, the first several
pages are reserved with memblock in several places during setup_arch().
These reservations are enough to ensure that the kernel does not touch the
BIOS area, so it is not necessary to remove E820_TYPE_RAM for pfn 0.
Remove the update of e820 table that changes the type of pfn 0 and move
the comment describing why it was done to trim_low_memory_range() that
reserves the beginning of the memory.
Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/setup.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/setup.c~x86-setup-dont-remove-e820_type_ram-for-pfn-0
+++ a/arch/x86/kernel/setup.c
@@ -661,17 +661,6 @@ static void __init trim_platform_memory_
static void __init trim_bios_range(void)
{
/*
- * A special case is the first 4Kb of memory;
- * This is a BIOS owned area, not kernel ram, but generally
- * not listed as such in the E820 table.
- *
- * This typically reserves additional memory (64KiB by default)
- * since some BIOSes are known to corrupt low memory. See the
- * Kconfig help text for X86_RESERVE_LOW.
- */
- e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
-
- /*
* special case: Some BIOSes report the PC BIOS
* area (640Kb -> 1Mb) as RAM even though it is not.
* take them out.
@@ -728,6 +717,15 @@ early_param("reservelow", parse_reservel
static void __init trim_low_memory_range(void)
{
+ /*
+ * A special case is the first 4Kb of memory;
+ * This is a BIOS owned area, not kernel ram, but generally
+ * not listed as such in the E820 table.
+ *
+ * This typically reserves additional memory (64KiB by default)
+ * since some BIOSes are known to corrupt low memory. See the
+ * Kconfig help text for X86_RESERVE_LOW.
+ */
memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
}
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
riscv-kconfig-make-direct-map-manipulation-options-depend-on-mmu.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-where-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
Backport a lazytime fix from upstream to 4.14-stable. To make it
cherry-pick cleanly, I first cherry-picked two other commits. Commit #2
had a conflict because it changed fs/xfs/, which isn't applicable to 4.14;
I dropped that part of the change.
Christoph Hellwig (1):
fs: move I_DIRTY_INODE to fs.h
Eric Biggers (1):
fs: fix lazytime expiration handling in __writeback_single_inode()
Jan Kara (1):
writeback: Drop I_DIRTY_TIME_EXPIRE
fs/ext4/inode.c | 6 ++---
fs/fs-writeback.c | 43 +++++++++++++-------------------
fs/gfs2/super.c | 2 +-
include/linux/fs.h | 4 +--
include/trace/events/writeback.h | 1 -
5 files changed, 24 insertions(+), 32 deletions(-)
--
2.30.0
Backport a lazytime fix from upstream to 4.19-stable. The first commit
had a trivial conflict in fs/xfs/ due to a file being renamed.
Following that, the second commit is a clean cherry-pick.
Eric Biggers (1):
fs: fix lazytime expiration handling in __writeback_single_inode()
Jan Kara (1):
writeback: Drop I_DIRTY_TIME_EXPIRE
fs/ext4/inode.c | 2 +-
fs/fs-writeback.c | 36 ++++++++++++++------------------
fs/xfs/xfs_trans_inode.c | 4 ++--
include/linux/fs.h | 1 -
include/trace/events/writeback.h | 1 -
5 files changed, 19 insertions(+), 25 deletions(-)
--
2.30.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 12 Jan 2021 11:02:43 -0800
Subject: [PATCH] fs: fix lazytime expiration handling in
__writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..c41cb887eb7d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
From: Gaurav Kohli <gkohli(a)codeaurora.org>
[ upstream commit bbeb97464eefc65f506084fd9f18f21653e01137 ]
The race below can occur if trace_open and a resize of the
cpu buffer run in parallel on different cpus:
CPUX CPUY
ring_buffer_resize
atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
rb_update_pages
remove/insert pages
resetting pointer
This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages() and rb_insert_pages() while checking pages
for sanity.
Take the buffer lock to fix this.
Link: https://lkml.kernel.org/r/1601976833-24377-1-git-send-email-gkohli@codeauro…
Cc: stable(a)vger.kernel.org
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Denis Efremov <efremov(a)linux.com>
Signed-off-by: Gaurav Kohli <gkohli(a)codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
I updated the "Fixes:" tag with the proper commit it fixes.
-- Steve
kernel/trace/ring_buffer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 077877ed54f7..728374166653 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4448,6 +4448,8 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
atomic_inc(&buffer->resize_disabled);
atomic_inc(&cpu_buffer->record_disabled);
@@ -4471,6 +4473,8 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&buffer->resize_disabled);
+
+ mutex_unlock(&buffer->mutex);
}
EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
--
2.25.4
commit dca5244d2f5b94f1809f0c02a549edf41ccd5493 upstream.
GCC versions >= 4.9 and < 5.1 have been shown to emit memory references
beyond the stack pointer, resulting in memory corruption if an interrupt
is taken after the stack pointer has been adjusted but before the
reference has been executed. This leads to subtle, infrequent data
corruption such as the EXT4 problems reported by Russell King at the
link below.
Life is too short for buggy compilers, so raise the minimum GCC version
required by arm64 to 5.1.
Reported-by: Russell King <linux(a)armlinux.org.uk>
Suggested-by: Arnd Bergmann <arnd(a)kernel.org>
Signed-off-by: Will Deacon <will(a)kernel.org>
Tested-by: Nathan Chancellor <natechancellor(a)gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Nathan Chancellor <natechancellor(a)gmail.com>
Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org> # 4.4.y, 4.9.y and 4.14.y only
Cc: Theodore Ts'o <tytso(a)mit.edu>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
[will: backport to 4.4.y/4.9.y/4.14.y; add __clang__ check]
Link: https://lore.kernel.org/r/CA+G9fYuzE9WMSB7uGjV4gTzK510SHEdJb_UXQCzsQ5MqA=h9…
Signed-off-by: Will Deacon <will(a)kernel.org>
---
include/linux/compiler-gcc.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index af8b4a879934..9485abe76b68 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -145,6 +145,12 @@
#if GCC_VERSION < 30200
# error Sorry, your compiler is too old - please upgrade it.
+#elif defined(CONFIG_ARM64) && GCC_VERSION < 50100 && !defined(__clang__)
+/*
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293
+ * https://lore.kernel.org/r/20210107111841.GN1551@shell.armlinux.org.uk
+ */
+# error Sorry, your version of GCC is too old - please use 5.1 or newer.
#endif
#if GCC_VERSION < 30300
--
2.30.0.280.ga3ce27912f-goog
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2a0435df963f996ca870a2ef1cbf1773dc0ea25a Mon Sep 17 00:00:00 2001
From: Stephan Gerhold <stephan(a)gerhold.net>
Date: Thu, 7 Jan 2021 17:51:31 +0100
Subject: [PATCH] ASoC: hdmi-codec: Fix return value in hdmi_codec_set_jack()
Sound has been broken on the DragonBoard 410c (apq8016_sbc) since 5.10:
hdmi-audio-codec hdmi-audio-codec.1.auto: ASoC: error at snd_soc_component_set_jack on hdmi-audio-codec.1.auto: -95
qcom-apq8016-sbc 7702000.sound: Failed to set jack: -95
ADV7533: ASoC: error at snd_soc_link_init on ADV7533: -95
hdmi-audio-codec hdmi-audio-codec.1.auto: ASoC: error at snd_soc_component_set_jack on hdmi-audio-codec.1.auto: -95
qcom-apq8016-sbc: probe of 7702000.sound failed with error -95
This happens because apq8016_sbc calls snd_soc_component_set_jack() on
all codec DAIs and attempts to ignore failures with return code -ENOTSUPP.
-ENOTSUPP is also excluded from error logging in soc_component_ret().
However, hdmi_codec_set_jack() returns -E*OP*NOTSUPP if jack detection
is not supported, which is not handled in apq8016_sbc and soc_component_ret().
Make it return -ENOTSUPP instead to fix sound and silence the errors.
Cc: Cheng-Yi Chiang <cychiang(a)chromium.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Fixes: 55c5cc63ab32 ("ASoC: hdmi-codec: Use set_jack ops to set jack")
Signed-off-by: Stephan Gerhold <stephan(a)gerhold.net>
Acked-by: Nicolin Chen <nicoleotsuka(a)gmail.com>
Link: https://lore.kernel.org/r/20210107165131.2535-1-stephan@gerhold.net
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/codecs/hdmi-codec.c b/sound/soc/codecs/hdmi-codec.c
index d5fcc4db8284..0f3ac22f2cf8 100644
--- a/sound/soc/codecs/hdmi-codec.c
+++ b/sound/soc/codecs/hdmi-codec.c
@@ -717,7 +717,7 @@ static int hdmi_codec_set_jack(struct snd_soc_component *component,
void *data)
{
struct hdmi_codec_priv *hcp = snd_soc_component_get_drvdata(component);
- int ret = -EOPNOTSUPP;
+ int ret = -ENOTSUPP;
if (hcp->hcd.ops->hook_plugged_cb) {
hcp->jack = jack;
diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
index ede4a9ad1054..dbbb7618351c 100644
--- a/sound/soc/fsl/imx-hdmi.c
+++ b/sound/soc/fsl/imx-hdmi.c
@@ -90,7 +90,7 @@ static int imx_hdmi_init(struct snd_soc_pcm_runtime *rtd)
}
ret = snd_soc_component_set_jack(component, &data->hdmi_jack, NULL);
- if (ret && ret != -EOPNOTSUPP) {
+ if (ret && ret != -ENOTSUPP) {
dev_err(card->dev, "Can't set HDMI Jack %d\n", ret);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a95881d6aa2c000e3649f27a1a7329cf356e6bb3 Mon Sep 17 00:00:00 2001
From: Douglas Anderson <dianders(a)chromium.org>
Date: Thu, 14 Jan 2021 19:16:23 -0800
Subject: [PATCH] pinctrl: qcom: Properly clear "intr_ack_high" interrupts when
unmasking
In commit 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for
msm gpio") we tried to Ack interrupts during unmask. However, that
patch forgot to check "intr_ack_high" so, presumably, it only worked
for a certain subset of SoCs.
Let's add a small accessor so we don't need to open-code the logic in
both places.
This was found by code inspection. I don't have any access to the
hardware in question nor software that needs the Ack during unmask.
Fixes: 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio")
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
Reviewed-by: Maulik Shah <mkshah(a)codeaurora.org>
Tested-by: Maulik Shah <mkshah(a)codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Link: https://lore.kernel.org/r/20210114191601.v7.3.I32d0f4e174d45363b49ab611a13c…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 2f363c28d9d9..192ed31eabf4 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -96,6 +96,14 @@ MSM_ACCESSOR(intr_cfg)
MSM_ACCESSOR(intr_status)
MSM_ACCESSOR(intr_target)
+static void msm_ack_intr_status(struct msm_pinctrl *pctrl,
+ const struct msm_pingroup *g)
+{
+ u32 val = g->intr_ack_high ? BIT(g->intr_status_bit) : 0;
+
+ msm_writel_intr_status(val, pctrl, g);
+}
+
static int msm_get_groups_count(struct pinctrl_dev *pctldev)
{
struct msm_pinctrl *pctrl = pinctrl_dev_get_drvdata(pctldev);
@@ -797,7 +805,7 @@ static void msm_gpio_irq_clear_unmask(struct irq_data *d, bool status_clear)
* when the interrupt is not in use.
*/
if (status_clear)
- msm_writel_intr_status(0, pctrl, g);
+ msm_ack_intr_status(pctrl, g);
val = msm_readl_intr_cfg(pctrl, g);
val |= BIT(g->intr_raw_status_bit);
@@ -890,7 +898,6 @@ static void msm_gpio_irq_ack(struct irq_data *d)
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
const struct msm_pingroup *g;
unsigned long flags;
- u32 val;
if (test_bit(d->hwirq, pctrl->skip_wake_irqs)) {
if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
@@ -902,8 +909,7 @@ static void msm_gpio_irq_ack(struct irq_data *d)
raw_spin_lock_irqsave(&pctrl->lock, flags);
- val = g->intr_ack_high ? BIT(g->intr_status_bit) : 0;
- msm_writel_intr_status(val, pctrl, g);
+ msm_ack_intr_status(pctrl, g);
if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
msm_gpio_update_dual_edge_pos(pctrl, g, d);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a95881d6aa2c000e3649f27a1a7329cf356e6bb3 Mon Sep 17 00:00:00 2001
From: Douglas Anderson <dianders(a)chromium.org>
Date: Thu, 14 Jan 2021 19:16:23 -0800
Subject: [PATCH] pinctrl: qcom: Properly clear "intr_ack_high" interrupts when
unmasking
In commit 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for
msm gpio") we tried to Ack interrupts during unmask. However, that
patch forgot to check "intr_ack_high" so, presumably, it only worked
for a certain subset of SoCs.
Let's add a small accessor so we don't need to open-code the logic in
both places.
This was found by code inspection. I don't have any access to the
hardware in question nor software that needs the Ack during unmask.
Fixes: 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio")
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
Reviewed-by: Maulik Shah <mkshah(a)codeaurora.org>
Tested-by: Maulik Shah <mkshah(a)codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Link: https://lore.kernel.org/r/20210114191601.v7.3.I32d0f4e174d45363b49ab611a13c…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 2f363c28d9d9..192ed31eabf4 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -96,6 +96,14 @@ MSM_ACCESSOR(intr_cfg)
MSM_ACCESSOR(intr_status)
MSM_ACCESSOR(intr_target)
+static void msm_ack_intr_status(struct msm_pinctrl *pctrl,
+ const struct msm_pingroup *g)
+{
+ u32 val = g->intr_ack_high ? BIT(g->intr_status_bit) : 0;
+
+ msm_writel_intr_status(val, pctrl, g);
+}
+
static int msm_get_groups_count(struct pinctrl_dev *pctldev)
{
struct msm_pinctrl *pctrl = pinctrl_dev_get_drvdata(pctldev);
@@ -797,7 +805,7 @@ static void msm_gpio_irq_clear_unmask(struct irq_data *d, bool status_clear)
* when the interrupt is not in use.
*/
if (status_clear)
- msm_writel_intr_status(0, pctrl, g);
+ msm_ack_intr_status(pctrl, g);
val = msm_readl_intr_cfg(pctrl, g);
val |= BIT(g->intr_raw_status_bit);
@@ -890,7 +898,6 @@ static void msm_gpio_irq_ack(struct irq_data *d)
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
const struct msm_pingroup *g;
unsigned long flags;
- u32 val;
if (test_bit(d->hwirq, pctrl->skip_wake_irqs)) {
if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
@@ -902,8 +909,7 @@ static void msm_gpio_irq_ack(struct irq_data *d)
raw_spin_lock_irqsave(&pctrl->lock, flags);
- val = g->intr_ack_high ? BIT(g->intr_status_bit) : 0;
- msm_writel_intr_status(val, pctrl, g);
+ msm_ack_intr_status(pctrl, g);
if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
msm_gpio_update_dual_edge_pos(pctrl, g, d);
Commit fe8abf332b8f ("usb: dwc3: support clocks and resets for DWC3
core") introduced clock support and a new function named
dwc3_core_init_for_resume() which enables the clock before calling
dwc3_core_init() during resume as clocks get disabled during suspend.
Unfortunately in this commit the DWC3_GCTL_PRTCAP_OTG case was forgotten
and therefore during resume, a platform could call dwc3_core_init()
without re-enabling the clocks first, preventing the controller from
resuming properly.
So update the resume path to call dwc3_core_init_for_resume() as it
should.
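For reference, here is a minimal sketch of what such a resume helper
looks like. The clock helpers and struct members used below are
assumptions based on the commit description, not a verbatim copy of the
driver:

static int dwc3_core_init_for_resume(struct dwc3 *dwc)
{
        int ret;

        /*
         * Clocks were dropped during suspend; bring them back before
         * touching the core. clk_bulk_*() and dwc->clks/num_clks are
         * assumed here for illustration.
         */
        ret = clk_bulk_prepare_enable(dwc->num_clks, dwc->clks);
        if (ret)
                return ret;

        ret = dwc3_core_init(dwc);
        if (ret)
                clk_bulk_disable_unprepare(dwc->num_clks, dwc->clks);

        return ret;
}

The point of the fix is simply that the DWC3_GCTL_PRTCAP_OTG branch of
dwc3_resume_common() must go through a helper like this instead of
calling dwc3_core_init() directly.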
Fixes: fe8abf332b8f ("usb: dwc3: support clocks and resets for DWC3 core")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gary Bisson <gary.bisson(a)boundarydevices.com>
---
drivers/usb/dwc3/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 841daec70b6e..3101f0dcf6ae 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1758,7 +1758,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
if (PMSG_IS_AUTO(msg))
break;
- ret = dwc3_core_init(dwc);
+ ret = dwc3_core_init_for_resume(dwc);
if (ret)
return ret;
--
2.29.2
Hi Greg,
On 2021-01-24, <gregkh(a)linuxfoundation.org> wrote:
> This is a note to let you know that I've just added the patch titled
>
> printk: fix buffer overflow potential for print_text()
We just learned that this patch introduces a new problem. I have just
posted a patch to fix the new problem:
https://lkml.kernel.org/r/20210124202728.4718-1-john.ogness@linutronix.de
You may want to hold off on applying the first fix until the second fix
has been accepted. Then you can apply both.
John Ogness
While running btrfs/011 in a loop I would often ASSERT() while trying to
add a new free space entry that already existed, or get an -EEXIST while
adding a new block to the extent tree, which is another indication of
double allocation.
This occurs because when we do the free space tree population, we create
the new root and then populate the tree and commit the transaction.
The problem is when you create a new root, the root node and commit root
node are the same. During this initial transaction commit we will run
all of the delayed refs that were paused during the free space tree
generation, and thus begin to cache block groups. While caching block
groups the caching thread will be reading from the main root for the
free space tree, so as we make allocations we'll be changing the free
space tree, which can cause us to add the same range twice which results
in either the ASSERT(ret != -EEXIST); in __btrfs_add_free_space, or in a
variety of different errors when running delayed refs because of a
double allocation.
Fix this by marking the fs_info as unsafe to load the free space tree,
and fall back on the old slow method. We could be smarter than this,
for example caching the block group while we're populating the free
space tree, but since this is a serious problem I've opted for the
simplest solution.
CC: stable(a)vger.kernel.org # 4.5+
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
v2->v3:
- Updated the changelog to be more specific about the scenario that the problem
happens under.
- Fixed the stable tag as well.
v1->v2:
- Dropped the UNTRUSTED bit on abort.
fs/btrfs/block-group.c | 11 ++++++++++-
fs/btrfs/ctree.h | 3 +++
fs/btrfs/free-space-tree.c | 10 +++++++++-
3 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0886e81e5540..290f05e87a6b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -673,7 +673,16 @@ static noinline void caching_thread(struct btrfs_work *work)
wake_up(&caching_ctl->wait);
}
- if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
+ /*
+ * If we are in the transaction that populated the free space tree we
+ * can't actually cache from the free space tree as our commit root and
+ * real root are the same, so we could change the contents of the blocks
+ * while caching. Instead do the slow caching in this case, and after
+ * the transaction has committed we will be safe.
+ */
+ if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
+ !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
+ &fs_info->flags)))
ret = load_free_space_tree(caching_ctl);
else
ret = load_extent_tree_free(caching_ctl);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1bd416e5a731..1fdb5dbe5287 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -563,6 +563,9 @@ enum {
/* Indicate that we need to cleanup space cache v1 */
BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
+
+ /* Indicate that we can't trust the free space tree for caching yet. */
+ BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
};
/*
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index e33a65bd9a0c..a33bca94d133 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
return PTR_ERR(trans);
set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+ set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
free_space_root = btrfs_create_tree(trans,
BTRFS_FREE_SPACE_TREE_OBJECTID);
if (IS_ERR(free_space_root)) {
@@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+ ret = btrfs_commit_transaction(trans);
- return btrfs_commit_transaction(trans);
+ /*
+ * Now that we've committed the transaction any reading of our commit
+ * root will be safe, so we can cache from the free space tree now.
+ */
+ clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
+ return ret;
abort:
clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
+ clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
btrfs_abort_transaction(trans, ret);
btrfs_end_transaction(trans);
return ret;
--
2.26.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b7ba6cfabc42fc846eb96e33f1edcd3ea6290a27 Mon Sep 17 00:00:00 2001
From: Yingjie Wang <wangyingjie55(a)126.com>
Date: Fri, 15 Jan 2021 06:10:04 -0800
Subject: [PATCH] octeontx2-af: Fix missing check bugs in rvu_cgx.c
In rvu_mbox_handler_cgx_mac_addr_get() and
rvu_mbox_handler_cgx_mac_addr_set(), the message is expected only from
PFs that are mapped to CGX LMACs. This should be checked before the
mapping is used, so add a call to is_cgx_config_permitted() in both
functions.
Fixes: 96be2e0da85e ("octeontx2-af: Support for MAC address filters in CGX")
Signed-off-by: Yingjie Wang <wangyingjie55(a)126.com>
Reviewed-by: Geetha sowjanya <gakula(a)marvell.com>
Link: https://lore.kernel.org/r/1610719804-35230-1-git-send-email-wangyingjie55@1…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
index d298b9357177..6c6b411e78fd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
@@ -469,6 +469,9 @@ int rvu_mbox_handler_cgx_mac_addr_set(struct rvu *rvu,
int pf = rvu_get_pf(req->hdr.pcifunc);
u8 cgx_id, lmac_id;
+ if (!is_cgx_config_permitted(rvu, req->hdr.pcifunc))
+ return -EPERM;
+
rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
cgx_lmac_addr_set(cgx_id, lmac_id, req->mac_addr);
@@ -485,6 +488,9 @@ int rvu_mbox_handler_cgx_mac_addr_get(struct rvu *rvu,
int rc = 0, i;
u64 cfg;
+ if (!is_cgx_config_permitted(rvu, req->hdr.pcifunc))
+ return -EPERM;
+
rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id);
rsp->hdr.rc = rc;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e708789c4a87989faff1131ccfdc465a1c1eddbc Mon Sep 17 00:00:00 2001
From: Miquel Raynal <miquel.raynal(a)bootlin.com>
Date: Thu, 7 Jan 2021 09:38:13 +0100
Subject: [PATCH] mtd: spinand: Fix MTD_OPS_AUTO_OOB requests
The initial change breaking the logic is
commit 3d1f08b032dc ("mtd: spinand: Use the external ECC engine logic")
It inadvertently dropped proper OOB support while doing something
else.
Shortly after, half of it was re-integrated by
commit 868cbe2a6dce ("mtd: spinand: Fix OOB read")
(which, incidentally, points to an earlier change that had nothing to do
with the issue). The problem is that this commit failed to revert the
faulty change entirely and missed the logic handling MTD_OPS_AUTO_OOB
requests.
Let's fix this mess by re-inserting the missing part now.
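To make the difference concrete, here is a small self-contained
userspace sketch contrasting a raw OOB copy with an MTD_OPS_AUTO_OOB
style copy that packs only the layout's free bytes (the 16-byte OOB
area and its free regions below are invented for the example; they do
not describe a real chip):

#include <stdio.h>
#include <string.h>

/* Invented free-byte regions of a 16-byte OOB area (not a real layout). */
struct oobfree { int offset, length; };
static const struct oobfree layout[] = { { 2, 6 }, { 10, 6 } };

/*
 * MTD_OPS_AUTO_OOB-style copy: skip the protected/ECC bytes and pack
 * only the free regions into the caller's buffer, which is what
 * mtd_ooblayout_get_databytes() does with the real layout.
 */
static int auto_oob_copy(char *dst, const char *oob)
{
        int copied = 0;
        size_t i;

        for (i = 0; i < sizeof(layout) / sizeof(layout[0]); i++) {
                memcpy(dst + copied, oob + layout[i].offset, layout[i].length);
                copied += layout[i].length;
        }
        return copied;
}

int main(void)
{
        const char oob[17] = "EEabcdefEEghijkl";        /* 'E' marks ECC bytes */
        char raw[17] = { 0 };
        char packed[13] = { 0 };

        memcpy(raw, oob, 16);           /* raw/place copy: caller sees ECC bytes */
        auto_oob_copy(packed, oob);     /* auto copy: caller sees free bytes only */

        printf("raw copy:  %s\n", raw);
        printf("auto copy: %s\n", packed);
        return 0;
}

Without the MTD_OPS_AUTO_OOB branch, every request behaves like the raw
copy above, which is exactly the regression the patch addresses.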
Fixes: 868cbe2a6dce ("mtd: spinand: Fix OOB read")
Reported-by: Felix Fietkau <nbd(a)nbd.name>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210107083813.24283-1-miquel.raynal@boot…
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 8ea545bb924d..61d932c1b718 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -343,6 +343,7 @@ static int spinand_read_from_cache_op(struct spinand_device *spinand,
const struct nand_page_io_req *req)
{
struct nand_device *nand = spinand_to_nand(spinand);
+ struct mtd_info *mtd = spinand_to_mtd(spinand);
struct spi_mem_dirmap_desc *rdesc;
unsigned int nbytes = 0;
void *buf = NULL;
@@ -382,9 +383,16 @@ static int spinand_read_from_cache_op(struct spinand_device *spinand,
memcpy(req->databuf.in, spinand->databuf + req->dataoffs,
req->datalen);
- if (req->ooblen)
- memcpy(req->oobbuf.in, spinand->oobbuf + req->ooboffs,
- req->ooblen);
+ if (req->ooblen) {
+ if (req->mode == MTD_OPS_AUTO_OOB)
+ mtd_ooblayout_get_databytes(mtd, req->oobbuf.in,
+ spinand->oobbuf,
+ req->ooboffs,
+ req->ooblen);
+ else
+ memcpy(req->oobbuf.in, spinand->oobbuf + req->ooboffs,
+ req->ooblen);
+ }
return 0;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e708789c4a87989faff1131ccfdc465a1c1eddbc Mon Sep 17 00:00:00 2001
From: Miquel Raynal <miquel.raynal(a)bootlin.com>
Date: Thu, 7 Jan 2021 09:38:13 +0100
Subject: [PATCH] mtd: spinand: Fix MTD_OPS_AUTO_OOB requests
The initial change breaking the logic is
commit 3d1f08b032dc ("mtd: spinand: Use the external ECC engine logic")
It inadvertently dropped proper OOB support while doing something
else.
Shortly after, half of it was re-integrated by
commit 868cbe2a6dce ("mtd: spinand: Fix OOB read")
(which, incidentally, points to an earlier change that had nothing to do
with the issue). The problem is that this commit failed to revert the
faulty change entirely and missed the logic handling MTD_OPS_AUTO_OOB
requests.
Let's fix this mess by re-inserting the missing part now.
Fixes: 868cbe2a6dce ("mtd: spinand: Fix OOB read")
Reported-by: Felix Fietkau <nbd(a)nbd.name>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210107083813.24283-1-miquel.raynal@boot…
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 8ea545bb924d..61d932c1b718 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -343,6 +343,7 @@ static int spinand_read_from_cache_op(struct spinand_device *spinand,
const struct nand_page_io_req *req)
{
struct nand_device *nand = spinand_to_nand(spinand);
+ struct mtd_info *mtd = spinand_to_mtd(spinand);
struct spi_mem_dirmap_desc *rdesc;
unsigned int nbytes = 0;
void *buf = NULL;
@@ -382,9 +383,16 @@ static int spinand_read_from_cache_op(struct spinand_device *spinand,
memcpy(req->databuf.in, spinand->databuf + req->dataoffs,
req->datalen);
- if (req->ooblen)
- memcpy(req->oobbuf.in, spinand->oobbuf + req->ooboffs,
- req->ooblen);
+ if (req->ooblen) {
+ if (req->mode == MTD_OPS_AUTO_OOB)
+ mtd_ooblayout_get_databytes(mtd, req->oobbuf.in,
+ spinand->oobbuf,
+ req->ooboffs,
+ req->ooblen);
+ else
+ memcpy(req->oobbuf.in, spinand->oobbuf + req->ooboffs,
+ req->ooblen);
+ }
return 0;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e73b0101ae5124bf7cd3fb5d250302ad2f16a416 Mon Sep 17 00:00:00 2001
From: Baruch Siach <baruch(a)tkos.co.il>
Date: Sun, 17 Jan 2021 15:17:02 +0200
Subject: [PATCH] gpio: mvebu: fix pwm .get_state period calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The period is the sum of on and off values. That is, calculate period as
($on + $off) / clkrate
instead of
$off / clkrate - $on / clkrate
which makes no sense.
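As a quick sanity check of the corrected formula, a self-contained
sketch of the computation (the clock rate and on/off counter values are
made-up numbers, not taken from real hardware):

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

#define NSEC_PER_SEC 1000000000ULL

int main(void)
{
        uint64_t clk_rate = 25000000;   /* assumed 25 MHz blink counter clock */
        uint64_t on = 1000;             /* blink-on duration register value */
        uint64_t off = 3000;            /* blink-off duration register value */

        uint64_t duty_cycle = on * NSEC_PER_SEC / clk_rate;
        /* Fixed formula: the period covers the whole on + off cycle. */
        uint64_t period = (on + off) * NSEC_PER_SEC / clk_rate;

        if (period > UINT_MAX)
                period = UINT_MAX;      /* same clamping as the driver */

        printf("duty_cycle = %llu ns, period = %llu ns\n",
               (unsigned long long)duty_cycle, (unsigned long long)period);
        return 0;
}

With these numbers the sketch prints a 40000 ns duty cycle and a
160000 ns period, whereas the old "$off - $on" arithmetic would have
reported a period of only 80000 ns.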
Reported-by: Russell King <linux(a)armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Fixes: 757642f9a584e ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch(a)tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 672681a976f5..a912a8fed197 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -676,20 +676,17 @@ static void mvebu_pwm_get_state(struct pwm_chip *chip,
else
state->duty_cycle = 1;
+ val = (unsigned long long) u; /* on duration */
regmap_read(mvpwm->regs, mvebu_pwmreg_blink_off_duration(mvpwm), &u);
- val = (unsigned long long) u * NSEC_PER_SEC;
+ val += (unsigned long long) u; /* period = on + off duration */
+ val *= NSEC_PER_SEC;
do_div(val, mvpwm->clk_rate);
- if (val < state->duty_cycle) {
+ if (val > UINT_MAX)
+ state->period = UINT_MAX;
+ else if (val)
+ state->period = val;
+ else
state->period = 1;
- } else {
- val -= state->duty_cycle;
- if (val > UINT_MAX)
- state->period = UINT_MAX;
- else if (val)
- state->period = val;
- else
- state->period = 1;
- }
regmap_read(mvchip->regs, GPIO_BLINK_EN_OFF + mvchip->offset, &u);
if (u)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e73b0101ae5124bf7cd3fb5d250302ad2f16a416 Mon Sep 17 00:00:00 2001
From: Baruch Siach <baruch(a)tkos.co.il>
Date: Sun, 17 Jan 2021 15:17:02 +0200
Subject: [PATCH] gpio: mvebu: fix pwm .get_state period calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The period is the sum of on and off values. That is, calculate period as
($on + $off) / clkrate
instead of
$off / clkrate - $on / clkrate
which makes no sense.
Reported-by: Russell King <linux(a)armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Fixes: 757642f9a584e ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch(a)tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 672681a976f5..a912a8fed197 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -676,20 +676,17 @@ static void mvebu_pwm_get_state(struct pwm_chip *chip,
else
state->duty_cycle = 1;
+ val = (unsigned long long) u; /* on duration */
regmap_read(mvpwm->regs, mvebu_pwmreg_blink_off_duration(mvpwm), &u);
- val = (unsigned long long) u * NSEC_PER_SEC;
+ val += (unsigned long long) u; /* period = on + off duration */
+ val *= NSEC_PER_SEC;
do_div(val, mvpwm->clk_rate);
- if (val < state->duty_cycle) {
+ if (val > UINT_MAX)
+ state->period = UINT_MAX;
+ else if (val)
+ state->period = val;
+ else
state->period = 1;
- } else {
- val -= state->duty_cycle;
- if (val > UINT_MAX)
- state->period = UINT_MAX;
- else if (val)
- state->period = val;
- else
- state->period = 1;
- }
regmap_read(mvchip->regs, GPIO_BLINK_EN_OFF + mvchip->offset, &u);
if (u)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e73b0101ae5124bf7cd3fb5d250302ad2f16a416 Mon Sep 17 00:00:00 2001
From: Baruch Siach <baruch(a)tkos.co.il>
Date: Sun, 17 Jan 2021 15:17:02 +0200
Subject: [PATCH] gpio: mvebu: fix pwm .get_state period calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The period is the sum of on and off values. That is, calculate period as
($on + $off) / clkrate
instead of
$off / clkrate - $on / clkrate
which makes no sense.
Reported-by: Russell King <linux(a)armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Fixes: 757642f9a584e ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch(a)tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 672681a976f5..a912a8fed197 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -676,20 +676,17 @@ static void mvebu_pwm_get_state(struct pwm_chip *chip,
else
state->duty_cycle = 1;
+ val = (unsigned long long) u; /* on duration */
regmap_read(mvpwm->regs, mvebu_pwmreg_blink_off_duration(mvpwm), &u);
- val = (unsigned long long) u * NSEC_PER_SEC;
+ val += (unsigned long long) u; /* period = on + off duration */
+ val *= NSEC_PER_SEC;
do_div(val, mvpwm->clk_rate);
- if (val < state->duty_cycle) {
+ if (val > UINT_MAX)
+ state->period = UINT_MAX;
+ else if (val)
+ state->period = val;
+ else
state->period = 1;
- } else {
- val -= state->duty_cycle;
- if (val > UINT_MAX)
- state->period = UINT_MAX;
- else if (val)
- state->period = val;
- else
- state->period = 1;
- }
regmap_read(mvchip->regs, GPIO_BLINK_EN_OFF + mvchip->offset, &u);
if (u)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e73b0101ae5124bf7cd3fb5d250302ad2f16a416 Mon Sep 17 00:00:00 2001
From: Baruch Siach <baruch(a)tkos.co.il>
Date: Sun, 17 Jan 2021 15:17:02 +0200
Subject: [PATCH] gpio: mvebu: fix pwm .get_state period calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The period is the sum of on and off values. That is, calculate period as
($on + $off) / clkrate
instead of
$off / clkrate - $on / clkrate
which makes no sense.
Reported-by: Russell King <linux(a)armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Fixes: 757642f9a584e ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch(a)tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski(a)baylibre.com>
diff --git a/drivers/gpio/gpio-mvebu.c b/drivers/gpio/gpio-mvebu.c
index 672681a976f5..a912a8fed197 100644
--- a/drivers/gpio/gpio-mvebu.c
+++ b/drivers/gpio/gpio-mvebu.c
@@ -676,20 +676,17 @@ static void mvebu_pwm_get_state(struct pwm_chip *chip,
else
state->duty_cycle = 1;
+ val = (unsigned long long) u; /* on duration */
regmap_read(mvpwm->regs, mvebu_pwmreg_blink_off_duration(mvpwm), &u);
- val = (unsigned long long) u * NSEC_PER_SEC;
+ val += (unsigned long long) u; /* period = on + off duration */
+ val *= NSEC_PER_SEC;
do_div(val, mvpwm->clk_rate);
- if (val < state->duty_cycle) {
+ if (val > UINT_MAX)
+ state->period = UINT_MAX;
+ else if (val)
+ state->period = val;
+ else
state->period = 1;
- } else {
- val -= state->duty_cycle;
- if (val > UINT_MAX)
- state->period = UINT_MAX;
- else if (val)
- state->period = val;
- else
- state->period = 1;
- }
regmap_read(mvchip->regs, GPIO_BLINK_EN_OFF + mvchip->offset, &u);
if (u)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9c7d9017a49fb8516c13b7bff59b7da2abed23e1 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Fri, 8 Jan 2021 19:05:59 +0100
Subject: [PATCH] x86: PM: Register syscore_ops for scale invariance
On x86 scale invariance tends to be disabled during resume from
suspend-to-RAM, because the MPERF or APERF MSR values are not as
expected then due to updates taking place after the platform
firmware has been invoked to complete the suspend transition.
That, of course, is not desirable, especially if the schedutil
scaling governor is in use, because the lack of scale invariance
causes it to be less reliable.
To counter that effect, modify init_freq_invariance() to register
a syscore_ops object for scale invariance with the ->resume callback
pointing to init_counter_refs() which will run on the CPU starting
the resume transition (the other CPUs will be taken care of by the
"online" operations taking place later).
Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Giovanni Gherdovich <ggherdovich(a)suse.cz>
Link: https://lkml.kernel.org/r/1803209.Mvru99baaF@kreacher
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8ca66af96a54..117e24fbfd8a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -56,6 +56,7 @@
#include <linux/numa.h>
#include <linux/pgtable.h>
#include <linux/overflow.h>
+#include <linux/syscore_ops.h>
#include <asm/acpi.h>
#include <asm/desc.h>
@@ -2083,6 +2084,23 @@ static void init_counter_refs(void)
this_cpu_write(arch_prev_mperf, mperf);
}
+#ifdef CONFIG_PM_SLEEP
+static struct syscore_ops freq_invariance_syscore_ops = {
+ .resume = init_counter_refs,
+};
+
+static void register_freq_invariance_syscore_ops(void)
+{
+ /* Bail out if registered already. */
+ if (freq_invariance_syscore_ops.node.prev)
+ return;
+
+ register_syscore_ops(&freq_invariance_syscore_ops);
+}
+#else
+static inline void register_freq_invariance_syscore_ops(void) {}
+#endif
+
static void init_freq_invariance(bool secondary, bool cppc_ready)
{
bool ret = false;
@@ -2109,6 +2127,7 @@ static void init_freq_invariance(bool secondary, bool cppc_ready)
if (ret) {
init_counter_refs();
static_branch_enable(&arch_scale_freq_key);
+ register_freq_invariance_syscore_ops();
pr_info("Estimated ratio of average max frequency by base frequency (times 1024): %llu\n", arch_max_freq_ratio);
} else {
pr_debug("Couldn't determine max cpu frequency, necessary for scale-invariant accounting.\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cf9d052aa6005f1e8dfaf491d83bf37f368af69e Mon Sep 17 00:00:00 2001
From: Douglas Anderson <dianders(a)chromium.org>
Date: Thu, 14 Jan 2021 19:16:24 -0800
Subject: [PATCH] pinctrl: qcom: Don't clear pending interrupts when enabling
In Linux, if a driver does disable_irq() and later does enable_irq()
on its interrupt, I believe it's expecting these properties:
* If an interrupt was pending when the driver disabled then it will
still be pending after the driver re-enables.
* If an edge-triggered interrupt comes in while an interrupt is
disabled it should assert when the interrupt is re-enabled.
If you think that the above sounds a lot like the disable_irq() and
enable_irq() are supposed to be masking/unmasking the interrupt
instead of disabling/enabling it then you've made an astute
observation. Specifically when talking about interrupts, "mask"
usually means to stop posting interrupts but keep tracking them and
"disable" means to fully shut off interrupt detection. It's
unfortunate that this is so confusing, but presumably this is all the
way it is for historical reasons.
Perhaps more confusing than the above is that, even though clients of
IRQs themselves don't have a way to request mask/unmask
vs. disable/enable calls, IRQ chips themselves can implement both.
...and yet more confusing is that if an IRQ chip implements
disable/enable then they will be called when a client driver calls
disable_irq() / enable_irq().
It does feel like some of the above could be cleared up. However,
without any other core interrupt changes it should be clear that when
an IRQ chip gets a request to "disable" an IRQ that it has to treat it
like a mask of that IRQ.
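Put in code, the contract a client driver relies on looks roughly like
this (a sketch only; the function and the critical section are
placeholders, and disable_irq()/enable_irq() are the only real APIs
used):

#include <linux/interrupt.h>

/* Sketch: "irq" is a previously requested interrupt number. */
static void example_quiesce_and_reconfigure(int irq)
{
        disable_irq(irq);       /* pending/edge events must not be lost */

        /* ... reconfigure the hardware; no handler runs in between ... */

        enable_irq(irq);        /* anything latched meanwhile fires now */
}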
In any case, after that long interlude you can see that the "unmask
and clear" can break things. Maulik tried to fix it so that we no
longer did "unmask and clear" in commit 71266d9d3936 ("pinctrl: qcom:
Move clearing pending IRQ to .irq_request_resources callback"), but it
only handled the PDC case and it had problems (it caused
sc7180-trogdor devices to fail to suspend). Let's fix.
From my understanding, the sources of the phantom interrupts were these
two things:
1. One that could have been introduced in msm_gpio_irq_set_type()
(only for the non-PDC case).
2. Edges could have been detected when a GPIO was muxed away.
Fixing case #1 is easy. We can just add a clear in
msm_gpio_irq_set_type().
Fixing case #2 is harder. Let's use a concrete example. In
sc7180-trogdor.dtsi we configure the uart3 to have two pinctrl states,
sleep and default, and mux between the two during runtime PM and
system suspend (see geni_se_resources_{on,off}() for more
details). The difference between the sleep and default state is that
the RX pin is muxed to a GPIO during sleep and muxed to the UART
otherwise.
As per Qualcomm, when we mux the pin over to the UART function the PDC
(or the non-PDC interrupt detection logic) is still watching it /
latching edges. These edges don't cause interrupts because the
current code masks the interrupt unless we're entering suspend.
However, as soon as we enter suspend we unmask the interrupt and it's
counted as a wakeup.
Let's deal with the problem like this:
* When we mux away, we'll mask our interrupt. This isn't necessary in
the above case since the client already masked us, but it's a good
idea in general.
* When we mux back, we'll clear any interrupts and unmask.
Fixes: 4b7618fdc7e6 ("pinctrl: qcom: Add irq_enable callback for msm gpio")
Fixes: 71266d9d3936 ("pinctrl: qcom: Move clearing pending IRQ to .irq_request_resources callback")
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
Reviewed-by: Maulik Shah <mkshah(a)codeaurora.org>
Tested-by: Maulik Shah <mkshah(a)codeaurora.org>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Link: https://lore.kernel.org/r/20210114191601.v7.4.I7cf3019783720feb57b958c95c2b…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index 192ed31eabf4..d70caecd21d2 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -51,6 +51,7 @@
* @dual_edge_irqs: Bitmap of irqs that need sw emulated dual edge
* detection.
* @skip_wake_irqs: Skip IRQs that are handled by wakeup interrupt controller
+ * @disabled_for_mux: These IRQs were disabled because we muxed away.
* @soc: Reference to soc_data of platform specific data.
* @regs: Base addresses for the TLMM tiles.
* @phys_base: Physical base address
@@ -72,6 +73,7 @@ struct msm_pinctrl {
DECLARE_BITMAP(dual_edge_irqs, MAX_NR_GPIO);
DECLARE_BITMAP(enabled_irqs, MAX_NR_GPIO);
DECLARE_BITMAP(skip_wake_irqs, MAX_NR_GPIO);
+ DECLARE_BITMAP(disabled_for_mux, MAX_NR_GPIO);
const struct msm_pinctrl_soc_data *soc;
void __iomem *regs[MAX_NR_TILES];
@@ -179,6 +181,10 @@ static int msm_pinmux_set_mux(struct pinctrl_dev *pctldev,
unsigned group)
{
struct msm_pinctrl *pctrl = pinctrl_dev_get_drvdata(pctldev);
+ struct gpio_chip *gc = &pctrl->chip;
+ unsigned int irq = irq_find_mapping(gc->irq.domain, group);
+ struct irq_data *d = irq_get_irq_data(irq);
+ unsigned int gpio_func = pctrl->soc->gpio_func;
const struct msm_pingroup *g;
unsigned long flags;
u32 val, mask;
@@ -195,6 +201,20 @@ static int msm_pinmux_set_mux(struct pinctrl_dev *pctldev,
if (WARN_ON(i == g->nfuncs))
return -EINVAL;
+ /*
+ * If an GPIO interrupt is setup on this pin then we need special
+ * handling. Specifically interrupt detection logic will still see
+ * the pin twiddle even when we're muxed away.
+ *
+ * When we see a pin with an interrupt setup on it then we'll disable
+ * (mask) interrupts on it when we mux away until we mux back. Note
+ * that disable_irq() refcounts and interrupts are disabled as long as
+ * at least one disable_irq() has been called.
+ */
+ if (d && i != gpio_func &&
+ !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
+ disable_irq(irq);
+
raw_spin_lock_irqsave(&pctrl->lock, flags);
val = msm_readl_ctl(pctrl, g);
@@ -204,6 +224,20 @@ static int msm_pinmux_set_mux(struct pinctrl_dev *pctldev,
raw_spin_unlock_irqrestore(&pctrl->lock, flags);
+ if (d && i == gpio_func &&
+ test_and_clear_bit(d->hwirq, pctrl->disabled_for_mux)) {
+ /*
+ * Clear interrupts detected while not GPIO since we only
+ * masked things.
+ */
+ if (d->parent_data && test_bit(d->hwirq, pctrl->skip_wake_irqs))
+ irq_chip_set_parent_state(d, IRQCHIP_STATE_PENDING, false);
+ else
+ msm_ack_intr_status(pctrl, g);
+
+ enable_irq(irq);
+ }
+
return 0;
}
@@ -781,7 +815,7 @@ static void msm_gpio_irq_mask(struct irq_data *d)
raw_spin_unlock_irqrestore(&pctrl->lock, flags);
}
-static void msm_gpio_irq_clear_unmask(struct irq_data *d, bool status_clear)
+static void msm_gpio_irq_unmask(struct irq_data *d)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
@@ -799,14 +833,6 @@ static void msm_gpio_irq_clear_unmask(struct irq_data *d, bool status_clear)
raw_spin_lock_irqsave(&pctrl->lock, flags);
- /*
- * clear the interrupt status bit before unmask to avoid
- * any erroneous interrupts that would have got latched
- * when the interrupt is not in use.
- */
- if (status_clear)
- msm_ack_intr_status(pctrl, g);
-
val = msm_readl_intr_cfg(pctrl, g);
val |= BIT(g->intr_raw_status_bit);
val |= BIT(g->intr_enable_bit);
@@ -826,7 +852,7 @@ static void msm_gpio_irq_enable(struct irq_data *d)
irq_chip_enable_parent(d);
if (!test_bit(d->hwirq, pctrl->skip_wake_irqs))
- msm_gpio_irq_clear_unmask(d, true);
+ msm_gpio_irq_unmask(d);
}
static void msm_gpio_irq_disable(struct irq_data *d)
@@ -841,11 +867,6 @@ static void msm_gpio_irq_disable(struct irq_data *d)
msm_gpio_irq_mask(d);
}
-static void msm_gpio_irq_unmask(struct irq_data *d)
-{
- msm_gpio_irq_clear_unmask(d, false);
-}
-
/**
* msm_gpio_update_dual_edge_parent() - Prime next edge for IRQs handled by parent.
* @d: The irq dta.
@@ -934,6 +955,7 @@ static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
const struct msm_pingroup *g;
unsigned long flags;
+ bool was_enabled;
u32 val;
if (msm_gpio_needs_dual_edge_parent_workaround(d, type)) {
@@ -995,6 +1017,7 @@ static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
* could cause the INTR_STATUS to be set for EDGE interrupts.
*/
val = msm_readl_intr_cfg(pctrl, g);
+ was_enabled = val & BIT(g->intr_raw_status_bit);
val |= BIT(g->intr_raw_status_bit);
if (g->intr_detection_width == 2) {
val &= ~(3 << g->intr_detection_bit);
@@ -1044,6 +1067,14 @@ static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
}
msm_writel_intr_cfg(val, pctrl, g);
+ /*
+ * The first time we set RAW_STATUS_EN it could trigger an interrupt.
+ * Clear the interrupt. This is safe because we have
+ * IRQCHIP_SET_TYPE_MASKED.
+ */
+ if (!was_enabled)
+ msm_ack_intr_status(pctrl, g);
+
if (test_bit(d->hwirq, pctrl->dual_edge_irqs))
msm_gpio_update_dual_edge_pos(pctrl, g, d);
@@ -1097,16 +1128,11 @@ static int msm_gpio_irq_reqres(struct irq_data *d)
}
/*
- * Clear the interrupt that may be pending before we enable
- * the line.
- * This is especially a problem with the GPIOs routed to the
- * PDC. These GPIOs are direct-connect interrupts to the GIC.
- * Disabling the interrupt line at the PDC does not prevent
- * the interrupt from being latched at the GIC. The state at
- * GIC needs to be cleared before enabling.
+ * The disable / clear-enable workaround we do in msm_pinmux_set_mux()
+ * only works if disable is not lazy since we only clear any bogus
+ * interrupt in hardware. Explicitly mark the interrupt as UNLAZY.
*/
- if (d->parent_data && test_bit(d->hwirq, pctrl->skip_wake_irqs))
- irq_chip_set_parent_state(d, IRQCHIP_STATE_PENDING, 0);
+ irq_set_status_flags(d->irq, IRQ_DISABLE_UNLAZY);
return 0;
out:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dad4e5b390866ca902653df0daa864ae4b8d4147 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Sat, 23 Jan 2021 21:01:52 -0800
Subject: [PATCH] mm: fix page reference leak in soft_offline_page()
The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.
Note the direct sysfs-path to soft_offline_page() does not perform a
get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn the kernel hangs at dax-device shutdown due to a leaked reference.
Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dw…
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 04d9f154a130..e9481632fcd1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct page *page)
return rc;
}
+static void put_ref_page(struct page *page)
+{
+ if (page)
+ put_page(page);
+}
+
/**
* soft_offline_page - Soft offline a page.
* @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct page *page)
int soft_offline_page(unsigned long pfn, int flags)
{
int ret;
- struct page *page;
bool try_again = true;
+ struct page *page, *ref_page = NULL;
+
+ WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn))
return -ENXIO;
+ if (flags & MF_COUNT_INCREASED)
+ ref_page = pfn_to_page(pfn);
+
/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
page = pfn_to_online_page(pfn);
- if (!page)
+ if (!page) {
+ put_ref_page(ref_page);
return -EIO;
+ }
if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
- if (flags & MF_COUNT_INCREASED)
- put_page(page);
+ put_ref_page(ref_page);
return 0;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5c447d274f3746fbed6e695e7b9a2d7bd8b31b71 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeelb(a)google.com>
Date: Sat, 23 Jan 2021 21:01:15 -0800
Subject: [PATCH] mm: fix numa stats for thp migration
Currently the kernel is not correctly updating the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that.
For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is no need to handle
THP migration at the moment, as the kernel does not yet have write
support for file THP, but to be more future proof this patch adds the
THP support for those stats as well.
Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index 613794f6a433..c0efe921bca5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
struct zone *oldzone, *newzone;
int dirty;
int expected_count = expected_page_refs(mapping, page) + extra_count;
+ int nr = thp_nr_pages(page);
if (!mapping) {
/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
*/
newpage->index = page->index;
newpage->mapping = page->mapping;
- page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+ page_ref_add(newpage, nr); /* add cache reference */
if (PageSwapBacked(page)) {
__SetPageSwapBacked(newpage);
if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
if (PageTransHuge(page)) {
int i;
- for (i = 1; i < HPAGE_PMD_NR; i++) {
+ for (i = 1; i < nr; i++) {
xas_next(&xas);
xas_store(&xas, newpage);
}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
* to one less reference.
* We know this isn't the last reference.
*/
- page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+ page_ref_unfreeze(page, expected_count - nr);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct address_space *mapping,
old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
- __dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
- __inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+ __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
if (PageSwapBacked(page) && !PageSwapCache(page)) {
- __dec_lruvec_state(old_lruvec, NR_SHMEM);
- __inc_lruvec_state(new_lruvec, NR_SHMEM);
+ __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+ __mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
- __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
- __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+ __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+ __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+ __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
}
}
local_irq_enable();
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5c447d274f3746fbed6e695e7b9a2d7bd8b31b71 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeelb(a)google.com>
Date: Sat, 23 Jan 2021 21:01:15 -0800
Subject: [PATCH] mm: fix numa stats for thp migration
Currently the kernel is not correctly updating the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that.
For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is no need to handle
THP migration at the moment, as the kernel does not yet have write
support for file THP, but to be more future proof this patch adds the
THP support for those stats as well.
Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/migrate.c b/mm/migrate.c
index 613794f6a433..c0efe921bca5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
struct zone *oldzone, *newzone;
int dirty;
int expected_count = expected_page_refs(mapping, page) + extra_count;
+ int nr = thp_nr_pages(page);
if (!mapping) {
/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
*/
newpage->index = page->index;
newpage->mapping = page->mapping;
- page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+ page_ref_add(newpage, nr); /* add cache reference */
if (PageSwapBacked(page)) {
__SetPageSwapBacked(newpage);
if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
if (PageTransHuge(page)) {
int i;
- for (i = 1; i < HPAGE_PMD_NR; i++) {
+ for (i = 1; i < nr; i++) {
xas_next(&xas);
xas_store(&xas, newpage);
}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
* to one less reference.
* We know this isn't the last reference.
*/
- page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+ page_ref_unfreeze(page, expected_count - nr);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct address_space *mapping,
old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
- __dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
- __inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+ __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
if (PageSwapBacked(page) && !PageSwapCache(page)) {
- __dec_lruvec_state(old_lruvec, NR_SHMEM);
- __inc_lruvec_state(new_lruvec, NR_SHMEM);
+ __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+ __mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
- __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
- __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+ __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+ __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+ __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
}
}
local_irq_enable();
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40c48fb79b9798954691f24b8ece1d3a7eb1b353 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Tue, 8 Dec 2020 15:36:40 +0100
Subject: [PATCH] iio: common: st_sensors: fix possible infinite loop in
st_sensors_irq_thread
Return a boolean value from the st_sensors_new_samples_available()
routine in order to avoid an infinite loop in st_sensors_irq_thread() if
stat_drdy.addr is not defined or the stat_drdy read fails.
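The failure mode is the classic one of a negative error code being used
in a boolean context; here is a tiny self-contained sketch of the bug
class (not the driver code itself):

#include <stdio.h>
#include <errno.h>
#include <stdbool.h>

/* Old-style helper: 0 = no data, 1 = data, negative = error/unknown. */
static int samples_available_int(void)
{
        return -EINVAL;         /* e.g. the status register read failed */
}

/* Fixed-style helper: plain true/false, a failed read reports false. */
static bool samples_available_bool(void)
{
        return false;
}

int main(void)
{
        int spins = 0;

        /*
         * -EINVAL is non-zero, i.e. "truthy", so a drain loop like the
         * one in the IRQ thread would never exit.  Capped here so the
         * demo terminates.
         */
        while (samples_available_int() && spins < 5)
                spins++;
        printf("int helper: looped %d times (would spin forever)\n", spins);

        while (samples_available_bool())
                ;       /* never entered */
        printf("bool helper: exits immediately\n");
        return 0;
}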
Fixes: 90efe05562921 ("iio: st_sensors: harden interrupt handling")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/c9ec69ed349e7200c779fd7a5bf04c1aaa2817aa.16074381…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c
index 0507283bd4c1..2dbd2646e44e 100644
--- a/drivers/iio/common/st_sensors/st_sensors_trigger.c
+++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c
@@ -23,35 +23,31 @@
* @sdata: Sensor data.
*
* returns:
- * 0 - no new samples available
- * 1 - new samples available
- * negative - error or unknown
+ * false - no new samples available or read error
+ * true - new samples available
*/
-static int st_sensors_new_samples_available(struct iio_dev *indio_dev,
- struct st_sensor_data *sdata)
+static bool st_sensors_new_samples_available(struct iio_dev *indio_dev,
+ struct st_sensor_data *sdata)
{
int ret, status;
/* How would I know if I can't check it? */
if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr)
- return -EINVAL;
+ return true;
/* No scan mask, no interrupt */
if (!indio_dev->active_scan_mask)
- return 0;
+ return false;
ret = regmap_read(sdata->regmap,
sdata->sensor_settings->drdy_irq.stat_drdy.addr,
&status);
if (ret < 0) {
dev_err(sdata->dev, "error checking samples available\n");
- return ret;
+ return false;
}
- if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask)
- return 1;
-
- return 0;
+ return !!(status & sdata->sensor_settings->drdy_irq.stat_drdy.mask);
}
/**
@@ -180,9 +176,15 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
/* Tell the interrupt handler that we're dealing with edges */
if (irq_trig == IRQF_TRIGGER_FALLING ||
- irq_trig == IRQF_TRIGGER_RISING)
+ irq_trig == IRQF_TRIGGER_RISING) {
+ if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) {
+ dev_err(&indio_dev->dev,
+ "edge IRQ not supported w/o stat register.\n");
+ err = -EOPNOTSUPP;
+ goto iio_trigger_free;
+ }
sdata->edge_irq = true;
- else
+ } else {
/*
* If we're not using edges (i.e. level interrupts) we
* just mask off the IRQ, handle one interrupt, then
@@ -190,6 +192,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
* interrupt handler top half again and start over.
*/
irq_trig |= IRQF_ONESHOT;
+ }
/*
* If the interrupt pin is Open Drain, by definition this
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40c48fb79b9798954691f24b8ece1d3a7eb1b353 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Tue, 8 Dec 2020 15:36:40 +0100
Subject: [PATCH] iio: common: st_sensors: fix possible infinite loop in
st_sensors_irq_thread
Return a boolean value from the st_sensors_new_samples_available()
routine in order to avoid an infinite loop in st_sensors_irq_thread() if
stat_drdy.addr is not defined or the stat_drdy read fails.
Fixes: 90efe05562921 ("iio: st_sensors: harden interrupt handling")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/c9ec69ed349e7200c779fd7a5bf04c1aaa2817aa.16074381…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c
index 0507283bd4c1..2dbd2646e44e 100644
--- a/drivers/iio/common/st_sensors/st_sensors_trigger.c
+++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c
@@ -23,35 +23,31 @@
* @sdata: Sensor data.
*
* returns:
- * 0 - no new samples available
- * 1 - new samples available
- * negative - error or unknown
+ * false - no new samples available or read error
+ * true - new samples available
*/
-static int st_sensors_new_samples_available(struct iio_dev *indio_dev,
- struct st_sensor_data *sdata)
+static bool st_sensors_new_samples_available(struct iio_dev *indio_dev,
+ struct st_sensor_data *sdata)
{
int ret, status;
/* How would I know if I can't check it? */
if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr)
- return -EINVAL;
+ return true;
/* No scan mask, no interrupt */
if (!indio_dev->active_scan_mask)
- return 0;
+ return false;
ret = regmap_read(sdata->regmap,
sdata->sensor_settings->drdy_irq.stat_drdy.addr,
&status);
if (ret < 0) {
dev_err(sdata->dev, "error checking samples available\n");
- return ret;
+ return false;
}
- if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask)
- return 1;
-
- return 0;
+ return !!(status & sdata->sensor_settings->drdy_irq.stat_drdy.mask);
}
/**
@@ -180,9 +176,15 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
/* Tell the interrupt handler that we're dealing with edges */
if (irq_trig == IRQF_TRIGGER_FALLING ||
- irq_trig == IRQF_TRIGGER_RISING)
+ irq_trig == IRQF_TRIGGER_RISING) {
+ if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) {
+ dev_err(&indio_dev->dev,
+ "edge IRQ not supported w/o stat register.\n");
+ err = -EOPNOTSUPP;
+ goto iio_trigger_free;
+ }
sdata->edge_irq = true;
- else
+ } else {
/*
* If we're not using edges (i.e. level interrupts) we
* just mask off the IRQ, handle one interrupt, then
@@ -190,6 +192,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
* interrupt handler top half again and start over.
*/
irq_trig |= IRQF_ONESHOT;
+ }
/*
* If the interrupt pin is Open Drain, by definition this
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40c48fb79b9798954691f24b8ece1d3a7eb1b353 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Tue, 8 Dec 2020 15:36:40 +0100
Subject: [PATCH] iio: common: st_sensors: fix possible infinite loop in
st_sensors_irq_thread
Return a boolean value from the st_sensors_new_samples_available()
routine in order to avoid an infinite loop in st_sensors_irq_thread() if
stat_drdy.addr is not defined or the stat_drdy read fails.
Fixes: 90efe05562921 ("iio: st_sensors: harden interrupt handling")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/c9ec69ed349e7200c779fd7a5bf04c1aaa2817aa.16074381…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c
index 0507283bd4c1..2dbd2646e44e 100644
--- a/drivers/iio/common/st_sensors/st_sensors_trigger.c
+++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c
@@ -23,35 +23,31 @@
* @sdata: Sensor data.
*
* returns:
- * 0 - no new samples available
- * 1 - new samples available
- * negative - error or unknown
+ * false - no new samples available or read error
+ * true - new samples available
*/
-static int st_sensors_new_samples_available(struct iio_dev *indio_dev,
- struct st_sensor_data *sdata)
+static bool st_sensors_new_samples_available(struct iio_dev *indio_dev,
+ struct st_sensor_data *sdata)
{
int ret, status;
/* How would I know if I can't check it? */
if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr)
- return -EINVAL;
+ return true;
/* No scan mask, no interrupt */
if (!indio_dev->active_scan_mask)
- return 0;
+ return false;
ret = regmap_read(sdata->regmap,
sdata->sensor_settings->drdy_irq.stat_drdy.addr,
&status);
if (ret < 0) {
dev_err(sdata->dev, "error checking samples available\n");
- return ret;
+ return false;
}
- if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask)
- return 1;
-
- return 0;
+ return !!(status & sdata->sensor_settings->drdy_irq.stat_drdy.mask);
}
/**
@@ -180,9 +176,15 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
/* Tell the interrupt handler that we're dealing with edges */
if (irq_trig == IRQF_TRIGGER_FALLING ||
- irq_trig == IRQF_TRIGGER_RISING)
+ irq_trig == IRQF_TRIGGER_RISING) {
+ if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) {
+ dev_err(&indio_dev->dev,
+ "edge IRQ not supported w/o stat register.\n");
+ err = -EOPNOTSUPP;
+ goto iio_trigger_free;
+ }
sdata->edge_irq = true;
- else
+ } else {
/*
* If we're not using edges (i.e. level interrupts) we
* just mask off the IRQ, handle one interrupt, then
@@ -190,6 +192,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
* interrupt handler top half again and start over.
*/
irq_trig |= IRQF_ONESHOT;
+ }
/*
* If the interrupt pin is Open Drain, by definition this
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40c48fb79b9798954691f24b8ece1d3a7eb1b353 Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Tue, 8 Dec 2020 15:36:40 +0100
Subject: [PATCH] iio: common: st_sensors: fix possible infinite loop in
st_sensors_irq_thread
Return a boolean value from the st_sensors_new_samples_available()
routine in order to avoid an infinite loop in st_sensors_irq_thread() if
stat_drdy.addr is not defined or the stat_drdy read fails.
Fixes: 90efe05562921 ("iio: st_sensors: harden interrupt handling")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/c9ec69ed349e7200c779fd7a5bf04c1aaa2817aa.16074381…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c
index 0507283bd4c1..2dbd2646e44e 100644
--- a/drivers/iio/common/st_sensors/st_sensors_trigger.c
+++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c
@@ -23,35 +23,31 @@
* @sdata: Sensor data.
*
* returns:
- * 0 - no new samples available
- * 1 - new samples available
- * negative - error or unknown
+ * false - no new samples available or read error
+ * true - new samples available
*/
-static int st_sensors_new_samples_available(struct iio_dev *indio_dev,
- struct st_sensor_data *sdata)
+static bool st_sensors_new_samples_available(struct iio_dev *indio_dev,
+ struct st_sensor_data *sdata)
{
int ret, status;
/* How would I know if I can't check it? */
if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr)
- return -EINVAL;
+ return true;
/* No scan mask, no interrupt */
if (!indio_dev->active_scan_mask)
- return 0;
+ return false;
ret = regmap_read(sdata->regmap,
sdata->sensor_settings->drdy_irq.stat_drdy.addr,
&status);
if (ret < 0) {
dev_err(sdata->dev, "error checking samples available\n");
- return ret;
+ return false;
}
- if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask)
- return 1;
-
- return 0;
+ return !!(status & sdata->sensor_settings->drdy_irq.stat_drdy.mask);
}
/**
@@ -180,9 +176,15 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
/* Tell the interrupt handler that we're dealing with edges */
if (irq_trig == IRQF_TRIGGER_FALLING ||
- irq_trig == IRQF_TRIGGER_RISING)
+ irq_trig == IRQF_TRIGGER_RISING) {
+ if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) {
+ dev_err(&indio_dev->dev,
+ "edge IRQ not supported w/o stat register.\n");
+ err = -EOPNOTSUPP;
+ goto iio_trigger_free;
+ }
sdata->edge_irq = true;
- else
+ } else {
/*
* If we're not using edges (i.e. level interrupts) we
* just mask off the IRQ, handle one interrupt, then
@@ -190,6 +192,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev,
* interrupt handler top half again and start over.
*/
irq_trig |= IRQF_ONESHOT;
+ }
/*
* If the interrupt pin is Open Drain, by definition this
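To see why the return type matters, here is a minimal, hedged sketch of the
consuming loop shape (illustrative only, to be read in the context of
st_sensors_trigger.c; the function name is made up for this example and it is
not the exact st_sensors_irq_thread() body). With the old int return, -EINVAL
or a regmap error is non-zero and keeps the edge-IRQ polling loop running
forever; with a strict bool the loop stops once no new samples can be
confirmed.
static irqreturn_t example_irq_thread(int irq, void *p)
{
        struct iio_trigger *trig = p;
        struct iio_dev *indio_dev = iio_trigger_get_drvdata(trig);
        struct st_sensor_data *sdata = iio_priv(indio_dev);

        iio_trigger_poll_chained(trig);

        /* Level IRQs re-enter through the top half if still asserted */
        if (!sdata->edge_irq)
                return IRQ_HANDLED;

        /*
         * Edge IRQs: keep picking up samples that arrived while we were
         * handling the previous ones. A negative errno from the old int
         * version evaluated as "true" here and could spin indefinitely.
         */
        while (st_sensors_new_samples_available(indio_dev, sdata))
                iio_trigger_poll_chained(trig);

        return IRQ_HANDLED;
}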
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: prevent starvation when writing memory.high
When a value is written to a cgroup's memory.high control file, the
write() context first tries to reclaim the cgroup to size before putting
the limit in place for the workload. Concurrent charges from the workload
can keep such a write() looping in reclaim indefinitely.
In the past, a write to memory.high would first put the limit in place for
the workload, then do targeted reclaim until the new limit has been met -
similar to how we do it for memory.max. This wasn't prone to the
described starvation issue. However, this sequence could cause excessive
latencies in the workload, when allocating threads could be put into long
penalty sleeps on the sudden memory.high overage created by the write(),
before that had a chance to work it off.
Now that memory_high_write() performs reclaim before enforcing the new
limit, reflect that the cgroup may well fail to converge due to concurrent
workload activity. Bail out of the loop after a few tries.
Link: https://lkml.kernel.org/r/20210112163011.127833-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Tejun Heo <tj(a)kernel.org>
Acked-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-prevent-starvation-when-writing-memoryhigh
+++ a/mm/memcontrol.c
@@ -6273,7 +6273,6 @@ static ssize_t memory_high_write(struct
for (;;) {
unsigned long nr_pages = page_counter_read(&memcg->memory);
- unsigned long reclaimed;
if (nr_pages <= high)
break;
@@ -6287,10 +6286,10 @@ static ssize_t memory_high_write(struct
continue;
}
- reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
- GFP_KERNEL, true);
+ try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+ GFP_KERNEL, true);
- if (!reclaimed && !nr_retries--)
+ if (!nr_retries--)
break;
}
_
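For illustration, a hedged condensation of the loop after this change
(simplified; the charge draining and signal handling in memory_high_write()
are omitted here): reclaim is still attempted each pass, but the retry budget
is spent unconditionally, so concurrent charges can no longer keep the writer
looping.
        for (;;) {
                unsigned long nr_pages = page_counter_read(&memcg->memory);

                if (nr_pages <= high)
                        break;

                try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
                                             GFP_KERNEL, true);

                /* bail out even if reclaim made progress */
                if (!nr_retries--)
                        break;
        }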
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
The MEI bus has special behavior on suspend: it destroys all the
attached devices, because the firmware context is not persistent
across power flows.
If the watchdog on the MEI bus is ticking before suspend, the
firmware times out and reports that the OS missed a watchdog tick.
Send the stop command to the firmware when the watchdog is
unregistered, to eliminate the false event on suspend.
This does not make things worse from the user-space perspective,
since user space already had to re-open the watchdog device after
suspend before this patch.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
---
V2: Update the commit message with better explanation
drivers/watchdog/mei_wdt.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/watchdog/mei_wdt.c b/drivers/watchdog/mei_wdt.c
index 5391bf3e6b11..c5967d8b4256 100644
--- a/drivers/watchdog/mei_wdt.c
+++ b/drivers/watchdog/mei_wdt.c
@@ -382,6 +382,7 @@ static int mei_wdt_register(struct mei_wdt *wdt)
watchdog_set_drvdata(&wdt->wdd, wdt);
watchdog_stop_on_reboot(&wdt->wdd);
+ watchdog_stop_on_unregister(&wdt->wdd);
ret = watchdog_register_device(&wdt->wdd);
if (ret)
--
2.26.2
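For reference, a hedged sketch of the registration pattern the fix relies on
(the function name is illustrative, not a driver symbol):
watchdog_stop_on_unregister() asks the watchdog core to stop the hardware when
the device is torn down, which is what silences the firmware-side timeout when
the MEI bus destroys its client devices on the way into suspend.
#include <linux/watchdog.h>

static int example_wdt_register(struct watchdog_device *wdd, void *priv)
{
        watchdog_set_drvdata(wdd, priv);
        watchdog_stop_on_reboot(wdd);
        /* ask the core to send a final stop when the device goes away */
        watchdog_stop_on_unregister(wdd);

        return watchdog_register_device(wdd);
}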
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 45db630e5f7ec83817c57c8ae387fe219bd42adf Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon, 18 Jan 2021 10:17:55 +0000
Subject: [PATCH] drm/i915: Check for rq->hwsp validity after acquiring RCU
lock
Since we allow removing the timeline map at runtime, there is a risk
that rq->hwsp points into a stale page. To control that risk, we hold
the RCU read lock while reading *rq->hwsp, but we missed a couple of
important barriers. First, the unpinning / removal of the timeline map
must be after all RCU readers into that map are complete, i.e. after an
rcu barrier (in this case courtesy of call_rcu()). Secondly, we must
make sure that the rq->hwsp we are about to dereference under the RCU
lock is valid. In this case, we make the rq->hwsp pointer safe during
i915_request_retire() and so we know that rq->hwsp may become invalid
only after the request has been signaled. Therefore, if the request is
not yet signaled when we acquire rq->hwsp under the RCU, we know that
rq->hwsp will remain valid for the duration of the RCU read lock.
This is a very small window that may lead to either considering the
request not completed (causing a delay until the request is checked
again; any wait for the request is not affected) or dereferencing an
invalid pointer.
Fixes: 3adac4689f58 ("drm/i915: Introduce concept of per-timeline (context) HWSP")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.1+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@…
(cherry picked from commit 9bb36cf66091ddf2d8840e5aa705ad3c93a6279b)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210118101755.476744-1-chris…
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index a24cc1ff08a0..0625cbb3b431 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -134,11 +134,6 @@ static bool remove_signaling_context(struct intel_breadcrumbs *b,
return true;
}
-static inline bool __request_completed(const struct i915_request *rq)
-{
- return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
-}
-
__maybe_unused static bool
check_signal_order(struct intel_context *ce, struct i915_request *rq)
{
@@ -257,7 +252,7 @@ static void signal_irq_work(struct irq_work *work)
list_for_each_entry_rcu(rq, &ce->signals, signal_link) {
bool release;
- if (!__request_completed(rq))
+ if (!__i915_request_is_complete(rq))
break;
if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
@@ -379,7 +374,7 @@ static void insert_breadcrumb(struct i915_request *rq)
* straight onto a signaled list, and queue the irq worker for
* its signal completion.
*/
- if (__request_completed(rq)) {
+ if (__i915_request_is_complete(rq)) {
if (__signal_request(rq) &&
llist_add(&rq->signal_node, &b->signaled_requests))
irq_work_queue(&b->irq_work);
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 7ea94d201fe6..8015964043eb 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -126,6 +126,10 @@ static void __rcu_cacheline_free(struct rcu_head *rcu)
struct intel_timeline_cacheline *cl =
container_of(rcu, typeof(*cl), rcu);
+ /* Must wait until after all *rq->hwsp are complete before removing */
+ i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+ __idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
+
i915_active_fini(&cl->active);
kfree(cl);
}
@@ -133,11 +137,6 @@ static void __rcu_cacheline_free(struct rcu_head *rcu)
static void __idle_cacheline_free(struct intel_timeline_cacheline *cl)
{
GEM_BUG_ON(!i915_active_is_idle(&cl->active));
-
- i915_gem_object_unpin_map(cl->hwsp->vma->obj);
- i915_vma_put(cl->hwsp->vma);
- __idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
-
call_rcu(&cl->rcu, __rcu_cacheline_free);
}
@@ -179,7 +178,6 @@ cacheline_alloc(struct intel_timeline_hwsp *hwsp, unsigned int cacheline)
return ERR_CAST(vaddr);
}
- i915_vma_get(hwsp->vma);
cl->hwsp = hwsp;
cl->vaddr = page_pack_bits(vaddr, cacheline);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 620b6fab2c5c..92adfee30c7c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -434,7 +434,7 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
static inline bool __i915_request_has_started(const struct i915_request *rq)
{
- return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
+ return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno - 1);
}
/**
@@ -465,11 +465,19 @@ static inline bool __i915_request_has_started(const struct i915_request *rq)
*/
static inline bool i915_request_started(const struct i915_request *rq)
{
+ bool result;
+
if (i915_request_signaled(rq))
return true;
- /* Remember: started but may have since been preempted! */
- return __i915_request_has_started(rq);
+ result = true;
+ rcu_read_lock(); /* the HWSP may be freed at runtime */
+ if (likely(!i915_request_signaled(rq)))
+ /* Remember: started but may have since been preempted! */
+ result = __i915_request_has_started(rq);
+ rcu_read_unlock();
+
+ return result;
}
/**
@@ -482,10 +490,16 @@ static inline bool i915_request_started(const struct i915_request *rq)
*/
static inline bool i915_request_is_running(const struct i915_request *rq)
{
+ bool result;
+
if (!i915_request_is_active(rq))
return false;
- return __i915_request_has_started(rq);
+ rcu_read_lock();
+ result = __i915_request_has_started(rq) && i915_request_is_active(rq);
+ rcu_read_unlock();
+
+ return result;
}
/**
@@ -509,12 +523,25 @@ static inline bool i915_request_is_ready(const struct i915_request *rq)
return !list_empty(&rq->sched.link);
}
+static inline bool __i915_request_is_complete(const struct i915_request *rq)
+{
+ return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
+}
+
static inline bool i915_request_completed(const struct i915_request *rq)
{
+ bool result;
+
if (i915_request_signaled(rq))
return true;
- return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
+ result = true;
+ rcu_read_lock(); /* the HWSP may be freed at runtime */
+ if (likely(!i915_request_signaled(rq)))
+ result = __i915_request_is_complete(rq);
+ rcu_read_unlock();
+
+ return result;
}
static inline void i915_request_mark_complete(struct i915_request *rq)
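Putting the hunks together, a hedged composite of the resulting completion
check (no new API, just the pattern; the function name is illustrative): the
HWSP-backed seqno is dereferenced only under the RCU read lock, and only when
the request is not already signaled, since the backing map is freed via
call_rcu() once no unsignaled request can reach it.
static inline bool example_request_completed(const struct i915_request *rq)
{
        bool result;

        if (i915_request_signaled(rq))
                return true;

        result = true;
        rcu_read_lock();        /* the HWSP may be freed at runtime */
        if (likely(!i915_request_signaled(rq)))
                result = __i915_request_is_complete(rq);
        rcu_read_unlock();

        return result;
}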
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 488751a0ef9b5ce572c47301ce62d54fc6b5a74d Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon, 18 Jan 2021 09:53:32 +0000
Subject: [PATCH] drm/i915/gt: Prevent use of engine->wa_ctx after error
On error we unpin and free the wa_ctx.vma, but do not clear any of the
derived flags. During lrc_init, we look at the flags and attempt to
dereference the wa_ctx.vma if they are set. To protect the error path
where we try to limp along without the wa_ctx, make sure we clear those
flags!
Reported-by: Matt Roper <matthew.d.roper(a)intel.com>
Fixes: 604a8f6f1e33 ("drm/i915/lrc: Only enable per-context and per-bb buffers if set")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Matt Roper <matthew.d.roper(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v4.15+
Reviewed-by: Matt Roper <matthew.d.roper(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210108204026.20682-1-chris@…
(cherry-picked from 5b4dc95cf7f573e927fbbd406ebe54225d41b9b2)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210118095332.458813-1-chris…
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7614a3d24fca..26c7d0a50585 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3988,6 +3988,9 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
static void lrc_destroy_wa_ctx(struct intel_engine_cs *engine)
{
i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0);
+
+ /* Called on error unwind, clear all flags to prevent further use */
+ memset(&engine->wa_ctx, 0, sizeof(engine->wa_ctx));
}
typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 51e87da7d4014f49769dcf60b8626a81492df2c4 Mon Sep 17 00:00:00 2001
From: Prike Liang <Prike.Liang(a)amd.com>
Date: Thu, 17 Dec 2020 13:55:46 +0800
Subject: [PATCH] drm/amdgpu/pm: no need GPU status set since
mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL added in FSDL
On Renoir there is no need to send the GpuChangeState message to exit gfxoff during s0i3 resume, since
mmnbif_gpu_BIF_DOORBELL_FENCE_CNTL has been added to the s0i3 FSDL.
Signed-off-by: Prike Liang <Prike.Liang(a)amd.com>
Reviewed-by: Huang Rui <ray.huang(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index f743685a20e8..9a9697038016 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -1121,7 +1121,7 @@ static ssize_t renoir_get_gpu_metrics(struct smu_context *smu,
static int renoir_gfx_state_change_set(struct smu_context *smu, uint32_t state)
{
- return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GpuChangeState, state, NULL);
+ return 0;
}
static const struct pptable_funcs renoir_ppt_funcs = {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c02406428d5219c367c5f53457698c58bc5f917 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Wed, 20 Jan 2021 13:59:11 -0500
Subject: [PATCH] dm integrity: conditionally disable "recalculate" feature
Otherwise a malicious user could (ab)use the "recalculate" feature
that makes dm-integrity calculate the checksums in the background
while the device is already usable. When the system restarts before all
checksums have been calculated, the calculation continues where it was
interrupted even if the recalculate feature is not requested the next
time the dm device is set up.
Disable recalculating if we use internal_hash or journal_hash with a
key (e.g. HMAC) and we don't have the "legacy_recalculate" flag.
This may break activation of a volume, created by an older kernel,
that is not yet fully recalculated -- if this happens, the user should
add the "legacy_recalculate" flag to constructor parameters.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Reported-by: Daniel Glockner <dg(a)emlix.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index 4e6f504474ac..2cc5488acbd9 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -177,14 +177,20 @@ bitmap_flush_interval:number
The bitmap flush interval in milliseconds. The metadata buffers
are synchronized when this interval expires.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
fix_padding
Use a smaller padding of the tag area that is more
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-allow_discards
- Allow block discard requests (a.k.a. TRIM) for the integrity device.
- Discards are only allowed to devices using internal hash.
+legacy_recalculate
+ Allow recalculating of volumes with HMAC keys. This is disabled by
+ default for security reasons - an attacker could modify the volume,
+ set recalc_sector to zero, and the kernel would not detect the
+ modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index cce203adcf77..b64fede032dc 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -257,8 +257,9 @@ struct dm_integrity_c {
bool journal_uptodate;
bool just_formatted;
bool recalculate_flag;
- bool fix_padding;
bool discard;
+ bool fix_padding;
+ bool legacy_recalculate;
struct alg_spec internal_hash_alg;
struct alg_spec journal_crypt_alg;
@@ -386,6 +387,14 @@ static int dm_integrity_failed(struct dm_integrity_c *ic)
return READ_ONCE(ic->failed);
}
+static bool dm_integrity_disable_recalculate(struct dm_integrity_c *ic)
+{
+ if ((ic->internal_hash_alg.key || ic->journal_mac_alg.key) &&
+ !ic->legacy_recalculate)
+ return true;
+ return false;
+}
+
static commit_id_t dm_integrity_commit_id(struct dm_integrity_c *ic, unsigned i,
unsigned j, unsigned char seq)
{
@@ -3140,6 +3149,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
arg_count += !!ic->journal_crypt_alg.alg_string;
arg_count += !!ic->journal_mac_alg.alg_string;
arg_count += (ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0;
+ arg_count += ic->legacy_recalculate;
DMEMIT("%s %llu %u %c %u", ic->dev->name, ic->start,
ic->tag_size, ic->mode, arg_count);
if (ic->meta_dev)
@@ -3163,6 +3173,8 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
}
if ((ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0)
DMEMIT(" fix_padding");
+ if (ic->legacy_recalculate)
+ DMEMIT(" legacy_recalculate");
#define EMIT_ALG(a, n) \
do { \
@@ -3792,7 +3804,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
unsigned extra_args;
struct dm_arg_set as;
static const struct dm_arg _args[] = {
- {0, 15, "Invalid number of feature args"},
+ {0, 16, "Invalid number of feature args"},
};
unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
bool should_write_sb;
@@ -3940,6 +3952,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
ic->discard = true;
} else if (!strcmp(opt_string, "fix_padding")) {
ic->fix_padding = true;
+ } else if (!strcmp(opt_string, "legacy_recalculate")) {
+ ic->legacy_recalculate = true;
} else {
r = -EINVAL;
ti->error = "Invalid argument";
@@ -4243,6 +4257,14 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
}
+ if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING) &&
+ le64_to_cpu(ic->sb->recalc_sector) < ic->provided_data_sectors &&
+ dm_integrity_disable_recalculate(ic)) {
+ ti->error = "Recalculating with HMAC is disabled for security reasons - if you really need it, use the argument \"legacy_recalculate\"";
+ r = -EOPNOTSUPP;
+ goto bad;
+ }
+
ic->bufio = dm_bufio_client_create(ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev,
1U << (SECTOR_SHIFT + ic->log2_buffer_sectors), 1, 0, NULL, NULL);
if (IS_ERR(ic->bufio)) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c02406428d5219c367c5f53457698c58bc5f917 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Wed, 20 Jan 2021 13:59:11 -0500
Subject: [PATCH] dm integrity: conditionally disable "recalculate" feature
Otherwise a malicious user could (ab)use the "recalculate" feature
that makes dm-integrity calculate the checksums in the background
while the device is already usable. When the system restarts before all
checksums have been calculated, the calculation continues where it was
interrupted even if the recalculate feature is not requested the next
time the dm device is set up.
Disable recalculating if we use internal_hash or journal_hash with a
key (e.g. HMAC) and we don't have the "legacy_recalculate" flag.
This may break activation of a volume, created by an older kernel,
that is not yet fully recalculated -- if this happens, the user should
add the "legacy_recalculate" flag to constructor parameters.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Reported-by: Daniel Glockner <dg(a)emlix.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index 4e6f504474ac..2cc5488acbd9 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -177,14 +177,20 @@ bitmap_flush_interval:number
The bitmap flush interval in milliseconds. The metadata buffers
are synchronized when this interval expires.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
fix_padding
Use a smaller padding of the tag area that is more
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-allow_discards
- Allow block discard requests (a.k.a. TRIM) for the integrity device.
- Discards are only allowed to devices using internal hash.
+legacy_recalculate
+ Allow recalculating of volumes with HMAC keys. This is disabled by
+ default for security reasons - an attacker could modify the volume,
+ set recalc_sector to zero, and the kernel would not detect the
+ modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index cce203adcf77..b64fede032dc 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -257,8 +257,9 @@ struct dm_integrity_c {
bool journal_uptodate;
bool just_formatted;
bool recalculate_flag;
- bool fix_padding;
bool discard;
+ bool fix_padding;
+ bool legacy_recalculate;
struct alg_spec internal_hash_alg;
struct alg_spec journal_crypt_alg;
@@ -386,6 +387,14 @@ static int dm_integrity_failed(struct dm_integrity_c *ic)
return READ_ONCE(ic->failed);
}
+static bool dm_integrity_disable_recalculate(struct dm_integrity_c *ic)
+{
+ if ((ic->internal_hash_alg.key || ic->journal_mac_alg.key) &&
+ !ic->legacy_recalculate)
+ return true;
+ return false;
+}
+
static commit_id_t dm_integrity_commit_id(struct dm_integrity_c *ic, unsigned i,
unsigned j, unsigned char seq)
{
@@ -3140,6 +3149,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
arg_count += !!ic->journal_crypt_alg.alg_string;
arg_count += !!ic->journal_mac_alg.alg_string;
arg_count += (ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0;
+ arg_count += ic->legacy_recalculate;
DMEMIT("%s %llu %u %c %u", ic->dev->name, ic->start,
ic->tag_size, ic->mode, arg_count);
if (ic->meta_dev)
@@ -3163,6 +3173,8 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
}
if ((ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0)
DMEMIT(" fix_padding");
+ if (ic->legacy_recalculate)
+ DMEMIT(" legacy_recalculate");
#define EMIT_ALG(a, n) \
do { \
@@ -3792,7 +3804,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
unsigned extra_args;
struct dm_arg_set as;
static const struct dm_arg _args[] = {
- {0, 15, "Invalid number of feature args"},
+ {0, 16, "Invalid number of feature args"},
};
unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
bool should_write_sb;
@@ -3940,6 +3952,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
ic->discard = true;
} else if (!strcmp(opt_string, "fix_padding")) {
ic->fix_padding = true;
+ } else if (!strcmp(opt_string, "legacy_recalculate")) {
+ ic->legacy_recalculate = true;
} else {
r = -EINVAL;
ti->error = "Invalid argument";
@@ -4243,6 +4257,14 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
}
+ if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING) &&
+ le64_to_cpu(ic->sb->recalc_sector) < ic->provided_data_sectors &&
+ dm_integrity_disable_recalculate(ic)) {
+ ti->error = "Recalculating with HMAC is disabled for security reasons - if you really need it, use the argument \"legacy_recalculate\"";
+ r = -EOPNOTSUPP;
+ goto bad;
+ }
+
ic->bufio = dm_bufio_client_create(ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev,
1U << (SECTOR_SHIFT + ic->log2_buffer_sectors), 1, 0, NULL, NULL);
if (IS_ERR(ic->bufio)) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c02406428d5219c367c5f53457698c58bc5f917 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Wed, 20 Jan 2021 13:59:11 -0500
Subject: [PATCH] dm integrity: conditionally disable "recalculate" feature
Otherwise a malicious user could (ab)use the "recalculate" feature
that makes dm-integrity calculate the checksums in the background
while the device is already usable. When the system restarts before all
checksums have been calculated, the calculation continues where it was
interrupted even if the recalculate feature is not requested the next
time the dm device is set up.
Disable recalculating if we use internal_hash or journal_hash with a
key (e.g. HMAC) and we don't have the "legacy_recalculate" flag.
This may break activation of a volume, created by an older kernel,
that is not yet fully recalculated -- if this happens, the user should
add the "legacy_recalculate" flag to constructor parameters.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Reported-by: Daniel Glockner <dg(a)emlix.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index 4e6f504474ac..2cc5488acbd9 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -177,14 +177,20 @@ bitmap_flush_interval:number
The bitmap flush interval in milliseconds. The metadata buffers
are synchronized when this interval expires.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
fix_padding
Use a smaller padding of the tag area that is more
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-allow_discards
- Allow block discard requests (a.k.a. TRIM) for the integrity device.
- Discards are only allowed to devices using internal hash.
+legacy_recalculate
+ Allow recalculating of volumes with HMAC keys. This is disabled by
+ default for security reasons - an attacker could modify the volume,
+ set recalc_sector to zero, and the kernel would not detect the
+ modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index cce203adcf77..b64fede032dc 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -257,8 +257,9 @@ struct dm_integrity_c {
bool journal_uptodate;
bool just_formatted;
bool recalculate_flag;
- bool fix_padding;
bool discard;
+ bool fix_padding;
+ bool legacy_recalculate;
struct alg_spec internal_hash_alg;
struct alg_spec journal_crypt_alg;
@@ -386,6 +387,14 @@ static int dm_integrity_failed(struct dm_integrity_c *ic)
return READ_ONCE(ic->failed);
}
+static bool dm_integrity_disable_recalculate(struct dm_integrity_c *ic)
+{
+ if ((ic->internal_hash_alg.key || ic->journal_mac_alg.key) &&
+ !ic->legacy_recalculate)
+ return true;
+ return false;
+}
+
static commit_id_t dm_integrity_commit_id(struct dm_integrity_c *ic, unsigned i,
unsigned j, unsigned char seq)
{
@@ -3140,6 +3149,7 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
arg_count += !!ic->journal_crypt_alg.alg_string;
arg_count += !!ic->journal_mac_alg.alg_string;
arg_count += (ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0;
+ arg_count += ic->legacy_recalculate;
DMEMIT("%s %llu %u %c %u", ic->dev->name, ic->start,
ic->tag_size, ic->mode, arg_count);
if (ic->meta_dev)
@@ -3163,6 +3173,8 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
}
if ((ic->sb->flags & cpu_to_le32(SB_FLAG_FIXED_PADDING)) != 0)
DMEMIT(" fix_padding");
+ if (ic->legacy_recalculate)
+ DMEMIT(" legacy_recalculate");
#define EMIT_ALG(a, n) \
do { \
@@ -3792,7 +3804,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
unsigned extra_args;
struct dm_arg_set as;
static const struct dm_arg _args[] = {
- {0, 15, "Invalid number of feature args"},
+ {0, 16, "Invalid number of feature args"},
};
unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
bool should_write_sb;
@@ -3940,6 +3952,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
ic->discard = true;
} else if (!strcmp(opt_string, "fix_padding")) {
ic->fix_padding = true;
+ } else if (!strcmp(opt_string, "legacy_recalculate")) {
+ ic->legacy_recalculate = true;
} else {
r = -EINVAL;
ti->error = "Invalid argument";
@@ -4243,6 +4257,14 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
}
+ if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING) &&
+ le64_to_cpu(ic->sb->recalc_sector) < ic->provided_data_sectors &&
+ dm_integrity_disable_recalculate(ic)) {
+ ti->error = "Recalculating with HMAC is disabled for security reasons - if you really need it, use the argument \"legacy_recalculate\"";
+ r = -EOPNOTSUPP;
+ goto bad;
+ }
+
ic->bufio = dm_bufio_client_create(ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev,
1U << (SECTOR_SHIFT + ic->log2_buffer_sectors), 1, 0, NULL, NULL);
if (IS_ERR(ic->bufio)) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca1219c0a7432272324660fc9f61a9940f90c50b Mon Sep 17 00:00:00 2001
From: Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
Date: Tue, 29 Dec 2020 16:16:25 +0800
Subject: [PATCH] mmc: sdhci-of-dwcmshc: fix rpmb access
Commit a44f7cb93732 ("mmc: core: use mrq->sbc when sending CMD23 for
RPMB") began to use ACMD23 for RPMB if the host supports ACMD23. In
RPMB ACM23 case, we need to set bit 31 to CMD23 argument, otherwise
RPMB write operation will return general fail.
However, no matter V4 is enabled or not, the dwcmshc's ARGUMENT2
register is 32-bit block count register which doesn't support stuff
bits of CMD23 argument. So let's handle this specific ACMD23 case.
>From another side, this patch also prepare for future v4 enabling
for dwcmshc, because from the 4.10 spec, the ARGUMENT2 register is
redefined as 32bit block count which doesn't support stuff bits of
CMD23 argument.
Fixes: a44f7cb93732 ("mmc: core: use mrq->sbc when sending CMD23 for RPMB")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20201229161625.38255233@xhacker.debian
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
index 4b673792b5a4..d90020ed3622 100644
--- a/drivers/mmc/host/sdhci-of-dwcmshc.c
+++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
@@ -16,6 +16,8 @@
#include "sdhci-pltfm.h"
+#define SDHCI_DWCMSHC_ARG2_STUFF GENMASK(31, 16)
+
/* DWCMSHC specific Mode Select value */
#define DWCMSHC_CTRL_HS400 0x7
@@ -49,6 +51,29 @@ static void dwcmshc_adma_write_desc(struct sdhci_host *host, void **desc,
sdhci_adma_write_desc(host, desc, addr, len, cmd);
}
+static void dwcmshc_check_auto_cmd23(struct mmc_host *mmc,
+ struct mmc_request *mrq)
+{
+ struct sdhci_host *host = mmc_priv(mmc);
+
+ /*
+ * No matter V4 is enabled or not, ARGUMENT2 register is 32-bit
+ * block count register which doesn't support stuff bits of
+ * CMD23 argument on dwcmsch host controller.
+ */
+ if (mrq->sbc && (mrq->sbc->arg & SDHCI_DWCMSHC_ARG2_STUFF))
+ host->flags &= ~SDHCI_AUTO_CMD23;
+ else
+ host->flags |= SDHCI_AUTO_CMD23;
+}
+
+static void dwcmshc_request(struct mmc_host *mmc, struct mmc_request *mrq)
+{
+ dwcmshc_check_auto_cmd23(mmc, mrq);
+
+ sdhci_request(mmc, mrq);
+}
+
static void dwcmshc_set_uhs_signaling(struct sdhci_host *host,
unsigned int timing)
{
@@ -133,6 +158,8 @@ static int dwcmshc_probe(struct platform_device *pdev)
sdhci_get_of_property(pdev);
+ host->mmc_host_ops.request = dwcmshc_request;
+
err = sdhci_add_host(host);
if (err)
goto err_clk;
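A hedged sketch of the decision the new request hook makes (the helper name is
illustrative): RPMB reliable writes set bit 31 of the CMD23 argument, which
lies in the stuff-bit range ARGUMENT2 cannot carry, so auto-CMD23 is dropped
for such requests and CMD23 is issued as an ordinary command instead.
static bool example_can_use_auto_cmd23(const struct mmc_request *mrq)
{
        /* no CMD23 at all, or a plain block count that fits ARGUMENT2 */
        return !mrq->sbc || !(mrq->sbc->arg & SDHCI_DWCMSHC_ARG2_STUFF);
}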
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca1219c0a7432272324660fc9f61a9940f90c50b Mon Sep 17 00:00:00 2001
From: Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
Date: Tue, 29 Dec 2020 16:16:25 +0800
Subject: [PATCH] mmc: sdhci-of-dwcmshc: fix rpmb access
Commit a44f7cb93732 ("mmc: core: use mrq->sbc when sending CMD23 for
RPMB") began to use ACMD23 for RPMB if the host supports ACMD23. In
RPMB ACM23 case, we need to set bit 31 to CMD23 argument, otherwise
RPMB write operation will return general fail.
However, no matter V4 is enabled or not, the dwcmshc's ARGUMENT2
register is 32-bit block count register which doesn't support stuff
bits of CMD23 argument. So let's handle this specific ACMD23 case.
>From another side, this patch also prepare for future v4 enabling
for dwcmshc, because from the 4.10 spec, the ARGUMENT2 register is
redefined as 32bit block count which doesn't support stuff bits of
CMD23 argument.
Fixes: a44f7cb93732 ("mmc: core: use mrq->sbc when sending CMD23 for RPMB")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20201229161625.38255233@xhacker.debian
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
index 4b673792b5a4..d90020ed3622 100644
--- a/drivers/mmc/host/sdhci-of-dwcmshc.c
+++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
@@ -16,6 +16,8 @@
#include "sdhci-pltfm.h"
+#define SDHCI_DWCMSHC_ARG2_STUFF GENMASK(31, 16)
+
/* DWCMSHC specific Mode Select value */
#define DWCMSHC_CTRL_HS400 0x7
@@ -49,6 +51,29 @@ static void dwcmshc_adma_write_desc(struct sdhci_host *host, void **desc,
sdhci_adma_write_desc(host, desc, addr, len, cmd);
}
+static void dwcmshc_check_auto_cmd23(struct mmc_host *mmc,
+ struct mmc_request *mrq)
+{
+ struct sdhci_host *host = mmc_priv(mmc);
+
+ /*
+ * No matter V4 is enabled or not, ARGUMENT2 register is 32-bit
+ * block count register which doesn't support stuff bits of
+ * CMD23 argument on dwcmsch host controller.
+ */
+ if (mrq->sbc && (mrq->sbc->arg & SDHCI_DWCMSHC_ARG2_STUFF))
+ host->flags &= ~SDHCI_AUTO_CMD23;
+ else
+ host->flags |= SDHCI_AUTO_CMD23;
+}
+
+static void dwcmshc_request(struct mmc_host *mmc, struct mmc_request *mrq)
+{
+ dwcmshc_check_auto_cmd23(mmc, mrq);
+
+ sdhci_request(mmc, mrq);
+}
+
static void dwcmshc_set_uhs_signaling(struct sdhci_host *host,
unsigned int timing)
{
@@ -133,6 +158,8 @@ static int dwcmshc_probe(struct platform_device *pdev)
sdhci_get_of_property(pdev);
+ host->mmc_host_ops.request = dwcmshc_request;
+
err = sdhci_add_host(host);
if (err)
goto err_clk;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b503087445ce7e45fabdee87ca9e460d5b5b5168 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne <pcc(a)google.com>
Date: Thu, 14 Jan 2021 12:14:05 -0800
Subject: [PATCH] mmc: core: don't initialize block size from ext_csd if not
present
If extended CSD was not available, the eMMC driver would incorrectly
set the block size to 0, as the data_sector_size field of ext_csd
was never initialized. This issue was exposed by commit 817046ecddbc
("block: Align max_hw_sectors to logical blocksize") which caused
max_sectors and max_hw_sectors to be set to 0 after setting the block
size to 0, resulting in a kernel panic in bio_split when attempting
to read from the device. Fix it by only reading the block size from
ext_csd if it is available.
Fixes: a5075eb94837 ("mmc: block: Allow disabling 512B sector size emulation")
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Link: https://linux-review.googlesource.com/id/If244d178da4d86b52034459438fec295b…
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210114201405.2934886-1-pcc@google.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index de7cb0369c30..002426e3cf76 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -384,8 +384,10 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
"merging was advertised but not possible");
blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));
- if (mmc_card_mmc(card))
+ if (mmc_card_mmc(card) && card->ext_csd.data_sector_size) {
block_size = card->ext_csd.data_sector_size;
+ WARN_ON(block_size != 512 && block_size != 4096);
+ }
blk_queue_logical_block_size(mq->queue, block_size);
/*
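A hedged restatement of the guarded assignment above, with the surrounding
default made explicit (assuming the usual 512-byte fallback in
mmc_setup_queue()): if ext_csd was never read, data_sector_size is still zero
and the queue keeps the 512-byte default instead of being configured with a
zero logical block size.
        unsigned int block_size = 512;

        if (mmc_card_mmc(card) && card->ext_csd.data_sector_size) {
                block_size = card->ext_csd.data_sector_size;
                WARN_ON(block_size != 512 && block_size != 4096);
        }
        blk_queue_logical_block_size(mq->queue, block_size);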
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b503087445ce7e45fabdee87ca9e460d5b5b5168 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne <pcc(a)google.com>
Date: Thu, 14 Jan 2021 12:14:05 -0800
Subject: [PATCH] mmc: core: don't initialize block size from ext_csd if not
present
If extended CSD was not available, the eMMC driver would incorrectly
set the block size to 0, as the data_sector_size field of ext_csd
was never initialized. This issue was exposed by commit 817046ecddbc
("block: Align max_hw_sectors to logical blocksize") which caused
max_sectors and max_hw_sectors to be set to 0 after setting the block
size to 0, resulting in a kernel panic in bio_split when attempting
to read from the device. Fix it by only reading the block size from
ext_csd if it is available.
Fixes: a5075eb94837 ("mmc: block: Allow disabling 512B sector size emulation")
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Link: https://linux-review.googlesource.com/id/If244d178da4d86b52034459438fec295b…
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210114201405.2934886-1-pcc@google.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index de7cb0369c30..002426e3cf76 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -384,8 +384,10 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
"merging was advertised but not possible");
blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));
- if (mmc_card_mmc(card))
+ if (mmc_card_mmc(card) && card->ext_csd.data_sector_size) {
block_size = card->ext_csd.data_sector_size;
+ WARN_ON(block_size != 512 && block_size != 4096);
+ }
blk_queue_logical_block_size(mq->queue, block_size);
/*
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b503087445ce7e45fabdee87ca9e460d5b5b5168 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne <pcc(a)google.com>
Date: Thu, 14 Jan 2021 12:14:05 -0800
Subject: [PATCH] mmc: core: don't initialize block size from ext_csd if not
present
If extended CSD was not available, the eMMC driver would incorrectly
set the block size to 0, as the data_sector_size field of ext_csd
was never initialized. This issue was exposed by commit 817046ecddbc
("block: Align max_hw_sectors to logical blocksize") which caused
max_sectors and max_hw_sectors to be set to 0 after setting the block
size to 0, resulting in a kernel panic in bio_split when attempting
to read from the device. Fix it by only reading the block size from
ext_csd if it is available.
Fixes: a5075eb94837 ("mmc: block: Allow disabling 512B sector size emulation")
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Link: https://linux-review.googlesource.com/id/If244d178da4d86b52034459438fec295b…
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210114201405.2934886-1-pcc@google.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index de7cb0369c30..002426e3cf76 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -384,8 +384,10 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
"merging was advertised but not possible");
blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));
- if (mmc_card_mmc(card))
+ if (mmc_card_mmc(card) && card->ext_csd.data_sector_size) {
block_size = card->ext_csd.data_sector_size;
+ WARN_ON(block_size != 512 && block_size != 4096);
+ }
blk_queue_logical_block_size(mq->queue, block_size);
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 12 Jan 2021 11:02:43 -0800
Subject: [PATCH] fs: fix lazytime expiration handling in
__writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..c41cb887eb7d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
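A hedged condensation of the reordering above (locking and error handling
trimmed): the lazytime expiration is reported to the filesystem before the
dirty flags are latched and cleared, so the I_DIRTY_SYNC it raises is written
out in this same pass instead of redirtying the inode afterwards.
        if ((inode->i_state & I_DIRTY_TIME) &&
            (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
             time_after(jiffies, inode->dirtied_time_when +
                                 dirtytime_expire_interval * HZ))) {
                trace_writeback_lazytime(inode);
                mark_inode_dirty_sync(inode);   /* I_DIRTY_TIME -> I_DIRTY_SYNC */
        }

        spin_lock(&inode->i_lock);
        dirty = inode->i_state & I_DIRTY;       /* now includes that I_DIRTY_SYNC */
        inode->i_state &= ~dirty;
        spin_unlock(&inode->i_lock);

        if (dirty & ~I_DIRTY_PAGES)
                write_inode(inode, wbc);        /* timestamps reach ->write_inode() */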
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 12 Jan 2021 11:02:43 -0800
Subject: [PATCH] fs: fix lazytime expiration handling in
__writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..c41cb887eb7d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 12 Jan 2021 11:02:43 -0800
Subject: [PATCH] fs: fix lazytime expiration handling in
__writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..c41cb887eb7d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 12 Jan 2021 11:02:43 -0800
Subject: [PATCH] fs: fix lazytime expiration handling in
__writeback_single_inode()
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af2..c41cb887eb7d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
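The ordering issue fixed by the lazytime patch above can be seen outside the kernel as well. Below is a minimal userspace sketch of the same idea, assuming nothing about the VFS: the object, the flag bits and the notify helper are invented for illustration and are not the writeback API. Notifying before the dirty flags are snapshotted and cleared lets the same pass write the object out clean, while notifying afterwards re-dirties it.

/* Simplified userspace analogy of the __writeback_single_inode() ordering
 * fix; the flag bits and helpers are invented for illustration only. */
#include <stdio.h>

#define D_TIME 0x1  /* analogous to I_DIRTY_TIME */
#define D_SYNC 0x2  /* analogous to I_DIRTY_SYNC */

struct obj { unsigned int state; };

/* analogous to mark_inode_dirty_sync(): turn a lazy-time flag into a
 * real "needs sync" flag */
static void notify_dirty_sync(struct obj *o)
{
        o->state &= ~D_TIME;
        o->state |= D_SYNC;
}

static void writeback_old(struct obj *o)
{
        unsigned int dirty = o->state;  /* snapshot */
        o->state &= ~dirty;             /* clear */
        if (dirty & D_TIME)
                notify_dirty_sync(o);   /* too late: re-dirties the object */
}

static void writeback_new(struct obj *o)
{
        if (o->state & D_TIME)
                notify_dirty_sync(o);   /* notify first */
        unsigned int dirty = o->state;  /* snapshot now sees D_SYNC */
        o->state &= ~dirty;             /* clear: object ends up clean */
}

int main(void)
{
        struct obj a = { .state = D_TIME }, b = { .state = D_TIME };

        writeback_old(&a);
        writeback_new(&b);
        printf("old ordering leaves state %#x, new ordering leaves %#x\n",
               a.state, b.state);       /* prints 0x2 vs 0 */
        return 0;
}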
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34d1eb0e599875064955a74712f08ff14c8e3d5f Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:17 -0500
Subject: [PATCH] btrfs: don't clear ret in btrfs_start_dirty_block_groups
If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly. Fix this by only running the delayed refs
if there was no failure.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..0886e81e5540 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2669,7 +2669,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
* Go through delayed refs for all the stuff we've just kicked off
* and then loop back (just once)
*/
- ret = btrfs_run_delayed_refs(trans, 0);
+ if (!ret)
+ ret = btrfs_run_delayed_refs(trans, 0);
if (!ret && loops == 0) {
loops++;
spin_lock(&cur_trans->dirty_bgs_lock);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34d1eb0e599875064955a74712f08ff14c8e3d5f Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:17 -0500
Subject: [PATCH] btrfs: don't clear ret in btrfs_start_dirty_block_groups
If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly. Fix this by only running the delayed refs
if there was no failure.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..0886e81e5540 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2669,7 +2669,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
* Go through delayed refs for all the stuff we've just kicked off
* and then loop back (just once)
*/
- ret = btrfs_run_delayed_refs(trans, 0);
+ if (!ret)
+ ret = btrfs_run_delayed_refs(trans, 0);
if (!ret && loops == 0) {
loops++;
spin_lock(&cur_trans->dirty_bgs_lock);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34d1eb0e599875064955a74712f08ff14c8e3d5f Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:17 -0500
Subject: [PATCH] btrfs: don't clear ret in btrfs_start_dirty_block_groups
If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly. Fix this by only running the delayed refs
if there was no failure.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..0886e81e5540 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2669,7 +2669,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
* Go through delayed refs for all the stuff we've just kicked off
* and then loop back (just once)
*/
- ret = btrfs_run_delayed_refs(trans, 0);
+ if (!ret)
+ ret = btrfs_run_delayed_refs(trans, 0);
if (!ret && loops == 0) {
loops++;
spin_lock(&cur_trans->dirty_bgs_lock);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34d1eb0e599875064955a74712f08ff14c8e3d5f Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:17 -0500
Subject: [PATCH] btrfs: don't clear ret in btrfs_start_dirty_block_groups
If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly. Fix this by only running the delayed refs
if there was no failure.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..0886e81e5540 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2669,7 +2669,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
* Go through delayed refs for all the stuff we've just kicked off
* and then loop back (just once)
*/
- ret = btrfs_run_delayed_refs(trans, 0);
+ if (!ret)
+ ret = btrfs_run_delayed_refs(trans, 0);
if (!ret && loops == 0) {
loops++;
spin_lock(&cur_trans->dirty_bgs_lock);
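The btrfs change above is an instance of a common error-propagation pattern: a later call must not clobber an error code that is already recorded in ret. A minimal userspace sketch of that pattern follows; step1() and step2() are made-up placeholders, not btrfs functions.

/* Minimal sketch of the "don't clobber an earlier error" pattern used in
 * the btrfs fix above; step1()/step2() are placeholders, not kernel APIs. */
#include <stdio.h>
#include <errno.h>

static int step1(void) { return -EIO; }  /* pretend this fails */
static int step2(void) { return 0; }     /* later follow-up succeeds */

int main(void)
{
        int ret = step1();

        /* Buggy pattern: ret = step2(); unconditionally overwrites the
         * earlier error, so the caller never sees it. */

        /* Fixed pattern: only run the follow-up when nothing failed yet,
         * so the first error is what the caller eventually sees. */
        if (!ret)
                ret = step2();

        printf("ret = %d\n", ret);  /* still -EIO */
        return 0;
}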
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ecc679ab48b40ca799bf94b327d5284eac9e46 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:11 -0500
Subject: [PATCH] btrfs: do not double free backref nodes on error
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..9cadacf3ec27 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
list_del_init(&lower->list);
if (lower == node)
node = NULL;
- btrfs_backref_free_node(cache, lower);
+ btrfs_backref_drop_node(cache, lower);
}
btrfs_backref_cleanup_node(cache, node);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ecc679ab48b40ca799bf94b327d5284eac9e46 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:11 -0500
Subject: [PATCH] btrfs: do not double free backref nodes on error
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..9cadacf3ec27 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
list_del_init(&lower->list);
if (lower == node)
node = NULL;
- btrfs_backref_free_node(cache, lower);
+ btrfs_backref_drop_node(cache, lower);
}
btrfs_backref_cleanup_node(cache, node);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ecc679ab48b40ca799bf94b327d5284eac9e46 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:11 -0500
Subject: [PATCH] btrfs: do not double free backref nodes on error
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..9cadacf3ec27 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
list_del_init(&lower->list);
if (lower == node)
node = NULL;
- btrfs_backref_free_node(cache, lower);
+ btrfs_backref_drop_node(cache, lower);
}
btrfs_backref_cleanup_node(cache, node);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ecc679ab48b40ca799bf94b327d5284eac9e46 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:11 -0500
Subject: [PATCH] btrfs: do not double free backref nodes on error
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..9cadacf3ec27 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
list_del_init(&lower->list);
if (lower == node)
node = NULL;
- btrfs_backref_free_node(cache, lower);
+ btrfs_backref_drop_node(cache, lower);
}
btrfs_backref_cleanup_node(cache, node);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ecc679ab48b40ca799bf94b327d5284eac9e46 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Wed, 16 Dec 2020 11:22:11 -0500
Subject: [PATCH] btrfs: do not double free backref nodes on error
Zygo reported the following KASAN splat:
BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836
CPU: 0 PID: 28836 Comm: btrfs Tainted: G W 5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0xbc/0xf9
? btrfs_backref_cleanup_node+0x18a/0x420
print_address_description.constprop.8+0x21/0x210
? record_print_text.cold.34+0x11/0x11
? btrfs_backref_cleanup_node+0x18a/0x420
? btrfs_backref_cleanup_node+0x18a/0x420
kasan_report.cold.10+0x20/0x37
? btrfs_backref_cleanup_node+0x18a/0x420
__asan_load8+0x69/0x90
btrfs_backref_cleanup_node+0x18a/0x420
btrfs_backref_release_cache+0x83/0x1b0
relocate_block_group+0x394/0x780
? merge_reloc_roots+0x4a0/0x4a0
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
? check_flags.part.50+0x6c/0x1e0
? btrfs_relocate_chunk+0x120/0x120
? kmem_cache_alloc_trace+0xa06/0xcb0
? _copy_from_user+0x83/0xc0
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
? __kasan_check_read+0x11/0x20
? check_chain_key+0x1f4/0x2f0
? __asan_loadN+0xf/0x20
? btrfs_ioctl_get_supported_features+0x30/0x30
? kvm_sched_clock_read+0x18/0x30
? check_chain_key+0x1f4/0x2f0
? lock_downgrade+0x3f0/0x3f0
? handle_mm_fault+0xad6/0x2150
? do_vfs_ioctl+0xfc/0x9d0
? ioctl_file_clone+0xe0/0xe0
? check_flags.part.50+0x6c/0x1e0
? check_flags.part.50+0x6c/0x1e0
? check_flags+0x26/0x30
? lock_is_held_type+0xc3/0xf0
? syscall_enter_from_user_mode+0x1b/0x60
? do_syscall_64+0x13/0x80
? rcu_read_lock_sched_held+0xa1/0xd0
? __kasan_check_read+0x11/0x20
? __fget_light+0xae/0x110
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427
Allocated by task 28836:
kasan_save_stack+0x21/0x50
__kasan_kmalloc.constprop.18+0xbe/0xd0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x410/0xcb0
btrfs_backref_alloc_node+0x46/0xf0
btrfs_backref_add_tree_node+0x60d/0x11d0
build_backref_tree+0xc5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 28836:
kasan_save_stack+0x21/0x50
kasan_set_track+0x20/0x30
kasan_set_free_info+0x1f/0x30
__kasan_slab_free+0xf3/0x140
kasan_slab_free+0xe/0x10
kfree+0xde/0x200
btrfs_backref_error_cleanup+0x452/0x530
build_backref_tree+0x1a5/0x700
relocate_tree_blocks+0x2be/0xb90
relocate_block_group+0x2eb/0x780
btrfs_relocate_block_group+0x26e/0x4c0
btrfs_relocate_chunk+0x52/0x120
btrfs_balance+0xe2e/0x1900
btrfs_ioctl_balance+0x3a7/0x460
btrfs_ioctl+0x24c8/0x4360
__x64_sys_ioctl+0xc3/0x100
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This occurred because we freed our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache(). This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up. However
btrfs_backref_error_cleanup() freed the backref node with
btrfs_backref_free_node(), which simply kfree()d the backref node
without unlinking it from the cache. Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.
Fixes: 75bfb9aff45e ("Btrfs: cleanup error handling in build_backref_tree")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..9cadacf3ec27 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
list_del_init(&lower->list);
if (lower == node)
node = NULL;
- btrfs_backref_free_node(cache, lower);
+ btrfs_backref_drop_node(cache, lower);
}
btrfs_backref_cleanup_node(cache, node);
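The underlying mistake in the backref patch above was freeing a node that was still linked into a list which a later teardown path walks. The userspace sketch below illustrates the free-versus-drop distinction with an ordinary linked list; the node type and helper name are invented and only loosely mirror the btrfs ones.

/* Userspace sketch of the free-vs-drop distinction behind the btrfs fix:
 * freeing a node that is still linked into a list leaves a dangling
 * pointer for whoever walks the list afterwards. Names are invented. */
#include <stdio.h>
#include <stdlib.h>

struct node {
        int val;
        struct node *next;
};

/* analogous to a "drop": unlink from the list, then free, so later list
 * walks never see the node */
static void drop_node(struct node **head, struct node *n)
{
        struct node **p = head;

        while (*p && *p != n)
                p = &(*p)->next;
        if (*p)
                *p = n->next;
        free(n);
}

int main(void)
{
        struct node *head = NULL;
        struct node *n = calloc(1, sizeof(*n));

        n->next = head;
        head = n;

        /* Buggy variant (what a bare free amounted to):
         *   free(n);  -- leaves a dangling pointer on the list below
         * Fixed variant: unlink first, then free. */
        drop_node(&head, n);

        for (struct node *it = head; it; it = it->next)
                printf("leftover node %d\n", it->val);  /* never runs */

        return 0;
}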
The following race can occur if trace_open and a resize of the
cpu buffer run in parallel on different CPUs:
CPUX                                  CPUY
                                      ring_buffer_resize
                                      atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
                                      rb_update_pages
                                      remove/insert pages
resetting pointer
This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages and rb_insert_pages while checking pages
for sanity.
Take the buffer lock to fix this.
Signed-off-by: Gaurav Kohli <gkohli(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
---
Changes since v0:
-Addressed Steven's review comments.
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 93ef0ab..15bf28b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4866,6 +4866,9 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
+
atomic_inc(&cpu_buffer->resize_disabled);
atomic_inc(&cpu_buffer->record_disabled);
@@ -4876,6 +4879,8 @@ void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
+
+ mutex_unlock(&buffer->mutex);
}
EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
@@ -4889,6 +4894,9 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
struct ring_buffer_per_cpu *cpu_buffer;
int cpu;
+ /* prevent another thread from changing buffer sizes */
+ mutex_lock(&buffer->mutex);
+
for_each_online_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
@@ -4907,6 +4915,8 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&cpu_buffer->resize_disabled);
}
+
+ mutex_unlock(&buffer->mutex);
}
/**
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
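The patch above serializes reset against resize with the buffer mutex. As a rough illustration of that locking shape, here is a small pthread sketch that assumes nothing about the ring buffer internals; the shared structure is just a heap array and all names are invented.

/* Minimal pthread sketch of the locking added above: reset and resize of a
 * shared structure are serialized by one mutex. Purely illustrative; the
 * structure here is a plain array, not the trace ring buffer. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t buf_mutex = PTHREAD_MUTEX_INITIALIZER;
static int *pages;
static size_t nr_pages = 4;

static void *resize_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&buf_mutex);   /* like mutex_lock(&buffer->mutex) */
        nr_pages = 8;
        pages = realloc(pages, nr_pages * sizeof(*pages));
        memset(pages, 0, nr_pages * sizeof(*pages));
        pthread_mutex_unlock(&buf_mutex);
        return NULL;
}

static void *reset_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&buf_mutex);   /* reset never sees a half-done resize */
        memset(pages, 0, nr_pages * sizeof(*pages));
        pthread_mutex_unlock(&buf_mutex);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pages = calloc(nr_pages, sizeof(*pages));
        pthread_create(&a, NULL, resize_thread, NULL);
        pthread_create(&b, NULL, reset_thread, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("final size: %zu pages\n", nr_pages);
        free(pages);
        return 0;
}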
musb_queue_resume_work() would call the provided callback if the runtime
PM status was 'active'. Otherwise, it would enqueue the request if the
hardware was still suspended (musb->is_runtime_suspended is true).
This causes a race with the runtime PM handlers, as it is possible that
the runtime PM status is not yet 'active' while the hardware has already
been awakened (the PM resume function has been called).
When the race was hit, the resume work was not enqueued, which probably
triggered other bugs further down the stack. For instance, a telnet
connection on Ingenic SoCs would result in a 50/50 chance of a
segmentation fault somewhere in the musb code.
Rework the code so that either we call the callback directly if
(musb->is_runtime_suspended == 0), or enqueue the query otherwise.
Fixes: ea2f35c01d5e ("usb: musb: Fix sleeping function called from invalid context for hdrc glue")
Cc: stable(a)vger.kernel.org # v4.9+
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
Reviewed-by: Tony Lindgren <tony(a)atomide.com>
Tested-by: Tony Lindgren <tony(a)atomide.com>
---
drivers/usb/musb/musb_core.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index 849e0b770130..1cd87729ba60 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2240,32 +2240,35 @@ int musb_queue_resume_work(struct musb *musb,
{
struct musb_pending_work *w;
unsigned long flags;
+ bool is_suspended;
int error;
if (WARN_ON(!callback))
return -EINVAL;
- if (pm_runtime_active(musb->controller))
- return callback(musb, data);
+ spin_lock_irqsave(&musb->list_lock, flags);
+ is_suspended = musb->is_runtime_suspended;
+
+ if (is_suspended) {
+ w = devm_kzalloc(musb->controller, sizeof(*w), GFP_ATOMIC);
+ if (!w) {
+ error = -ENOMEM;
+ goto out_unlock;
+ }
- w = devm_kzalloc(musb->controller, sizeof(*w), GFP_ATOMIC);
- if (!w)
- return -ENOMEM;
+ w->callback = callback;
+ w->data = data;
- w->callback = callback;
- w->data = data;
- spin_lock_irqsave(&musb->list_lock, flags);
- if (musb->is_runtime_suspended) {
list_add_tail(&w->node, &musb->pending_list);
error = 0;
- } else {
- dev_err(musb->controller, "could not add resume work %p\n",
- callback);
- devm_kfree(musb->controller, w);
- error = -EINPROGRESS;
}
+
+out_unlock:
spin_unlock_irqrestore(&musb->list_lock, flags);
+ if (!is_suspended)
+ error = callback(musb, data);
+
return error;
}
EXPORT_SYMBOL_GPL(musb_queue_resume_work);
--
2.29.2
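The musb rework above follows a common shape: sample the suspended state and enqueue under the lock, but invoke the callback only after the lock has been dropped. Below is a hedged userspace sketch of that control flow; every name in it is invented and it is not the musb API.

/* Userspace sketch of the "decide under the lock, call outside it" shape
 * used in the musb rework above. The point is only the control flow: the
 * suspended flag is sampled while the lock is held, the callback runs
 * after the lock is released. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct work {
        int (*callback)(void *data);
        void *data;
        struct work *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct work *pending;
static int is_suspended;        /* analogous to musb->is_runtime_suspended */

static int queue_or_call(int (*callback)(void *data), void *data)
{
        int suspended, error = 0;

        pthread_mutex_lock(&list_lock);
        suspended = is_suspended;
        if (suspended) {
                struct work *w = calloc(1, sizeof(*w));

                if (!w) {
                        error = -1;
                        goto out_unlock;
                }
                w->callback = callback;
                w->data = data;
                w->next = pending;
                pending = w;    /* run later, once the device resumes */
        }
out_unlock:
        pthread_mutex_unlock(&list_lock);

        if (!suspended)
                error = callback(data); /* never called with the lock held */
        return error;
}

static int say_hello(void *data)
{
        printf("callback ran directly: %s\n", (const char *)data);
        return 0;
}

int main(void)
{
        return queue_or_call(say_hello, "device already awake");
}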
From: Xiaoming Ni <nixiaoming(a)huawei.com>
Subject: proc_sysctl: fix oops caused by incorrect command parameters
process_sysctl_arg() does not check whether val is empty before
invoking strlen(val). If a command line parameter is incorrectly
configured and val is empty, an oops is triggered.
For example, if
"hung_task_panic=1" is incorrectly written as "hung_task_panic", an oops is
triggered. The call stack is as follows:
Kernel command line: .... hung_task_panic
......
Call trace:
__pi_strlen+0x10/0x98
parse_args+0x278/0x344
do_sysctl_args+0x8c/0xfc
kernel_init+0x5c/0xf4
ret_from_fork+0x10/0x30
To fix it, check whether "val" is empty when "param" is a sysctl field.
Error codes are returned in the failure branch, and error logs are
generated by parse_args().
Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming(a)huawei.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Iurii Zaikin <yzaikin(a)google.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Heiner Kallweit <hkallweit1(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/proc_sysctl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/proc/proc_sysctl.c~proc_sysctl-fix-oops-caused-by-incorrect-command-parameters
+++ a/fs/proc/proc_sysctl.c
@@ -1770,6 +1770,12 @@ static int process_sysctl_arg(char *para
return 0;
}
+ if (!val)
+ return -EINVAL;
+ len = strlen(val);
+ if (len == 0)
+ return -EINVAL;
+
/*
* To set sysctl options, we use a temporary mount of proc, look up the
* respective sys/ file and write to it. To avoid mounting it when no
@@ -1811,7 +1817,6 @@ static int process_sysctl_arg(char *para
file, param, val);
goto out;
}
- len = strlen(val);
wret = kernel_write(file, val, len, &pos);
if (wret < 0) {
err = wret;
_
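The fix boils down to rejecting a NULL or empty value before it ever reaches strlen() or kernel_write(). A tiny standalone sketch of that validation follows; parse_one() is a made-up stand-in for process_sysctl_arg(), not the kernel function.

/* Standalone sketch of the validation added above: bail out on a NULL or
 * empty value before using it. parse_one() is a made-up stand-in. */
#include <stdio.h>
#include <string.h>
#include <errno.h>

static int parse_one(const char *param, const char *val)
{
        size_t len;

        if (!val)
                return -EINVAL;         /* "hung_task_panic" with no '=' */
        len = strlen(val);
        if (len == 0)
                return -EINVAL;         /* "hung_task_panic=" */

        printf("would write \"%s\" (len %zu) to a sysctl file for %s\n",
               val, len, param);
        return 0;
}

int main(void)
{
        printf("%d\n", parse_one("hung_task_panic", NULL)); /* -EINVAL, no oops */
        printf("%d\n", parse_one("hung_task_panic", "1"));  /* 0 */
        return 0;
}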
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix page reference leak in soft_offline_page()
The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.
Note the direct sysfs-path to soft_offline_page() does not perform a
get_user_pages() lookup.
When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn, the kernel hangs at dax-device shutdown due to a leaked reference.
Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dw…
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c~mm-fix-page-reference-leak-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct
return rc;
}
+static void put_ref_page(struct page *page)
+{
+ if (page)
+ put_page(page);
+}
+
/**
* soft_offline_page - Soft offline a page.
* @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct
int soft_offline_page(unsigned long pfn, int flags)
{
int ret;
- struct page *page;
bool try_again = true;
+ struct page *page, *ref_page = NULL;
+
+ WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
if (!pfn_valid(pfn))
return -ENXIO;
+ if (flags & MF_COUNT_INCREASED)
+ ref_page = pfn_to_page(pfn);
+
/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
page = pfn_to_online_page(pfn);
- if (!page)
+ if (!page) {
+ put_ref_page(ref_page);
return -EIO;
+ }
if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
- if (flags & MF_COUNT_INCREASED)
- put_page(page);
+ put_ref_page(ref_page);
return 0;
}
_
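The leak came from an early-return path that forgot to drop a reference the caller had already taken. Below is a userspace sketch of that pattern with a plain refcounted object instead of struct page; get_obj()/put_obj() and the flag are invented for illustration only.

/* Userspace sketch of the "drop the caller's reference on every early
 * error path" pattern behind the soft_offline_page() fix. */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

struct obj { int refcount; };

static void get_obj(struct obj *o) { o->refcount++; }

static void put_obj(struct obj *o)
{
        if (--o->refcount == 0) {
                printf("object released\n");
                free(o);
        }
}

/* flag bit meaning "the caller already holds a reference" */
#define COUNT_INCREASED 0x1

static int offline_obj(struct obj *o, int flags, int usable)
{
        struct obj *ref = (flags & COUNT_INCREASED) ? o : NULL;

        if (!usable) {
                if (ref)
                        put_obj(ref);   /* the fix: don't leak on the error path */
                return -EIO;
        }
        /* ... real work would go here ... */
        if (ref)
                put_obj(ref);
        return 0;
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));

        o->refcount = 1;                /* owner reference */
        get_obj(o);                     /* caller reference, as madvise() would take */
        printf("offline: %d\n", offline_obj(o, COUNT_INCREASED, 0));
        put_obj(o);                     /* owner drops its reference; object freed */
        return 0;
}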
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm: fix numa stats for thp migration
Currently the kernel does not correctly update the numa stats for
NR_FILE_PAGES and NR_SHMEM on THP migration. Fix that. For NR_FILE_DIRTY
and NR_ZONE_WRITE_PENDING there is currently no need to handle THP
migration, as the kernel does not yet have write support for file THP,
but to be more future proof this patch adds THP support for those
stats as well.
Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
--- a/mm/migrate.c~mm-fix-numa-stats-for-thp-migration
+++ a/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct add
struct zone *oldzone, *newzone;
int dirty;
int expected_count = expected_page_refs(mapping, page) + extra_count;
+ int nr = thp_nr_pages(page);
if (!mapping) {
/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct add
*/
newpage->index = page->index;
newpage->mapping = page->mapping;
- page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+ page_ref_add(newpage, nr); /* add cache reference */
if (PageSwapBacked(page)) {
__SetPageSwapBacked(newpage);
if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct add
if (PageTransHuge(page)) {
int i;
- for (i = 1; i < HPAGE_PMD_NR; i++) {
+ for (i = 1; i < nr; i++) {
xas_next(&xas);
xas_store(&xas, newpage);
}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct add
* to one less reference.
* We know this isn't the last reference.
*/
- page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+ page_ref_unfreeze(page, expected_count - nr);
xas_unlock(&xas);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct add
old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
- __dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
- __inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+ __mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
if (PageSwapBacked(page) && !PageSwapCache(page)) {
- __dec_lruvec_state(old_lruvec, NR_SHMEM);
- __inc_lruvec_state(new_lruvec, NR_SHMEM);
+ __mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+ __mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
- __dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
- __inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+ __mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+ __mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+ __mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+ __mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
}
}
local_irq_enable();
_
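The core of the change is that a THP migration must adjust each counter by thp_nr_pages(page) rather than by one. A small arithmetic sketch of that accounting follows, independent of the kernel's vmstat API; the 512-subpage THP size is an assumption (2 MiB THP with 4 KiB base pages), and the counter names only mirror the patch.

/* Arithmetic sketch of the THP accounting fix above: when a compound page
 * of nr subpages moves between nodes, each per-node counter must change by
 * nr, not by 1. Not kernel code. */
#include <stdio.h>

#define HPAGE_NR 512    /* assumed: 2 MiB THP with 4 KiB base pages */

struct node_stats { long nr_file_pages; long nr_shmem; };

static void migrate_stats(struct node_stats *oldn, struct node_stats *newn,
                          int is_shmem, int nr)
{
        oldn->nr_file_pages -= nr;      /* was effectively -1, losing 511 pages per THP */
        newn->nr_file_pages += nr;
        if (is_shmem) {
                oldn->nr_shmem -= nr;
                newn->nr_shmem += nr;
        }
}

int main(void)
{
        struct node_stats n0 = { .nr_file_pages = 1024, .nr_shmem = 1024 };
        struct node_stats n1 = { 0 };

        migrate_stats(&n0, &n1, 1, HPAGE_NR);   /* one shmem THP moves node 0 -> 1 */
        printf("node0: %ld file, %ld shmem; node1: %ld file, %ld shmem\n",
               n0.nr_file_pages, n0.nr_shmem, n1.nr_file_pages, n1.nr_shmem);
        return 0;
}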
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm: memcg: fix memcg file_dirty numa stat
The kernel updates the per-node NR_FILE_DIRTY stats on page migration but
not the memcg numa stats. That was not an issue until recently, when commit
5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for
cgroup v2") exposed numa stats for the memcg. So fix the file_dirty
per-memcg numa stat.
Link: https://lkml.kernel.org/r/20210108155813.2914586-1-shakeelb@google.com
Fixes: 5f9a4f4a7096 ("mm: memcontrol: add the missing numa_stat interface for cgroup v2")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-memcg-fix-memcg-file_dirty-numa-stat
+++ a/mm/migrate.c
@@ -500,9 +500,9 @@ int migrate_page_move_mapping(struct add
__inc_lruvec_state(new_lruvec, NR_SHMEM);
}
if (dirty && mapping_can_writeback(mapping)) {
- __dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY);
+ __dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
- __inc_node_state(newzone->zone_pgdat, NR_FILE_DIRTY);
+ __inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
}
}
_
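This second stats fix is about which counter is updated: once per-memcg numa stats are exposed, the dirty count has to move in the cgroup-aware (lruvec) counters on migration, not only in the bare per-node ones. A rough sketch of that idea with invented types, not the kernel accounting API:

/* Rough sketch of the memcg dirty-accounting fix: the dirty counter has to
 * move in the cgroup-aware (lruvec-like) view on migration, otherwise the
 * cgroup v2 numa_stat output drifts. Types and names are invented. */
#include <stdio.h>

struct lruvec_stat { long nr_file_dirty; };     /* per (node, memcg) counter */

static void migrate_dirty(struct lruvec_stat *old_lruvec,
                          struct lruvec_stat *new_lruvec, long nr)
{
        /* the buggy path decremented only the plain per-node counter */
        old_lruvec->nr_file_dirty -= nr;
        new_lruvec->nr_file_dirty += nr;
}

int main(void)
{
        struct lruvec_stat a = { .nr_file_dirty = 8 }, b = { 0 };

        migrate_dirty(&a, &b, 1);
        printf("old lruvec: %ld dirty, new lruvec: %ld dirty\n",
               a.nr_file_dirty, b.nr_file_dirty);
        return 0;
}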
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: fix initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using the init_unavailable_mem()
function, which iterates through PFNs in holes in memblock.memory and, if
there is a struct page corresponding to a PFN, sets the fields of this page
to default values and marks the page as Reserved.
init_unavailable_mem() does not take into account the zone and node the page
belongs to, and sets both the zone and node links in struct page to zero.
On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
the unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.
Update init_unavailable_mem() to use zone constraints defined by an
architecture to properly setup the zone link and use node ID of the
adjacent range in memblock.memory to set the node link.
Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 84 +++++++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 34 deletions(-)
--- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -7078,23 +7078,26 @@ void __init free_area_init_memoryless_no
* Initialize all valid struct pages in the range [spfn, epfn) and mark them
* PageReserved(). Return the number of struct pages that were initialized.
*/
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int nid)
{
- unsigned long pfn;
+ unsigned long pfn, zone_spfn, zone_epfn;
u64 pgcnt = 0;
+ zone_spfn = arch_zone_lowest_possible_pfn[zone];
+ zone_epfn = arch_zone_highest_possible_pfn[zone];
+
+ spfn = clamp(spfn, zone_spfn, zone_epfn);
+ epfn = clamp(epfn, zone_spfn, zone_epfn);
+
for (pfn = spfn; pfn < epfn; pfn++) {
if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ pageblock_nr_pages - 1;
continue;
}
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
+
+ __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
__SetPageReserved(pfn_to_page(pfn));
pgcnt++;
}
@@ -7103,51 +7106,64 @@ static u64 __init init_unavailable_range
}
/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init().
*
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - phyiscal memory bank size is not necessarily the exact multiple of the
+ * arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ * nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone link is set accorging to the architecture constrains
+ * - node is set to node id of the next populated region except for the
+ * trailing hole where last node id is used
*/
-static void __init init_unavailable_mem(void)
+static void __init init_zone_unavailable_mem(int zone)
{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
+ unsigned long start, end;
+ int i, nid;
+ u64 pgcnt;
+ unsigned long next = 0;
/*
- * Loop through unavailable ranges not covered by memblock.memory.
+ * Loop through holes in memblock.memory and initialize struct
+ * pages corresponding to these holes
*/
pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
+ for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) {
if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
+ pgcnt += init_unavailable_range(next, start, zone, nid);
next = end;
}
/*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
+ * Last section may surpass the actual end of memory (e.g. we can
+ * have 1Gb section and 512Mb of RAM populated).
+ * Make sure that memmap has a well defined state in this case.
*/
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
+ end = round_up(max_pfn, PAGES_PER_SECTION);
+ pgcnt += init_unavailable_range(next, end, zone, nid);
/*
* Struct pages that do not have backing memory. This could be because
* firmware is using some of this memory, or for some other reasons.
*/
if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
+ pr_info("Zone %s: zeroed struct page in unavailable ranges: %lld pages", zone_names[zone], pgcnt);
+}
+
+static void __init init_unavailable_mem(void)
+{
+ int zone;
+
+ for (zone = 0; zone < ZONE_MOVABLE; zone++)
+ init_zone_unavailable_mem(zone);
}
#else
static inline void __init init_unavailable_mem(void)
_
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0
Patch series "mm: fix initialization of struct page for holes in memory layout", v3.
Commit 73a6e474cb37 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN") exposed several issues with
the memory map initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the problem was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and there was an additional
discussion at [3].
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
This patch (of 2):
The first 4Kb of memory is a BIOS-owned area and, to keep the kernel
from allocating it, it was not listed in the e820 tables as memory. As
a result, pfn 0 was never recognised by the generic memory management
and it is part of neither node 0 nor ZONE_DMA.
If set_pfnblock_flags_mask() were ever called for the pageblock
corresponding to the first 2Mbytes of memory, having pfn 0 outside of
ZONE_DMA would trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
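For illustration, here is a minimal user-space model of the span check
behind that VM_BUG_ON (the struct fields and the helper mirror the
kernel's zone_spans_pfn(), but this is only a sketch, not kernel code):
#include <stdbool.h>
#include <stdio.h>
struct zone_model {
	unsigned long zone_start_pfn;	/* first pfn the zone spans */
	unsigned long spanned_pages;	/* number of pfns it covers */
};
static bool zone_spans_pfn(const struct zone_model *zone, unsigned long pfn)
{
	return pfn >= zone->zone_start_pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}
int main(void)
{
	/* ZONE_DMA starting at pfn 1 because pfn 0 is not listed as RAM */
	struct zone_model dma = { .zone_start_pfn = 1, .spanned_pages = 4095 };
	printf("pfn 0 spanned: %d\n", zone_spans_pfn(&dma, 0)); /* 0 -> BUG path */
	printf("pfn 1 spanned: %d\n", zone_spans_pfn(&dma, 1)); /* 1 -> fine */
	return 0;
}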
Along with the first 4Kb being reserved in the e820 tables, the first
several pages are reserved with memblock in several places during
setup_arch(). These reservations are enough to ensure that the kernel
does not touch the BIOS area, so it is not necessary to remove
E820_TYPE_RAM for pfn 0.
Remove the update of e820 table that changes the type of pfn 0 and move
the comment describing why it was done to trim_low_memory_range() that
reserves the beginning of the memory.
Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/setup.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/setup.c~x86-setup-dont-remove-e820_type_ram-for-pfn-0
+++ a/arch/x86/kernel/setup.c
@@ -661,17 +661,6 @@ static void __init trim_platform_memory_
static void __init trim_bios_range(void)
{
/*
- * A special case is the first 4Kb of memory;
- * This is a BIOS owned area, not kernel ram, but generally
- * not listed as such in the E820 table.
- *
- * This typically reserves additional memory (64KiB by default)
- * since some BIOSes are known to corrupt low memory. See the
- * Kconfig help text for X86_RESERVE_LOW.
- */
- e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
-
- /*
* special case: Some BIOSes report the PC BIOS
* area (640Kb -> 1Mb) as RAM even though it is not.
* take them out.
@@ -728,6 +717,15 @@ early_param("reservelow", parse_reservel
static void __init trim_low_memory_range(void)
{
+ /*
+ * A special case is the first 4Kb of memory;
+ * This is a BIOS owned area, not kernel ram, but generally
+ * not listed as such in the E820 table.
+ *
+ * This typically reserves additional memory (64KiB by default)
+ * since some BIOSes are known to corrupt low memory. See the
+ * Kconfig help text for X86_RESERVE_LOW.
+ */
memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
}
_
The patch titled
Subject: mm: thp: fix MADV_REMOVE deadlock on shmem THP
has been added to the -mm tree. Its filename is
mm-thp-fix-madv_remove-deadlock-on-shmem-thp.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-fix-madv_remove-deadlock-o…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-fix-madv_remove-deadlock-o…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: thp: fix MADV_REMOVE deadlock on shmem THP
Sergey reported deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).
This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.
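The inverted lock ordering can be reproduced with a small user-space
model (pthread locks standing in for lock_page() and i_mmap_rwsem; this
only illustrates the ABBA pattern, it is not kernel code):
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
/* pthread stand-ins for the two kernel locks involved */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;  /* lock_page() */
static pthread_rwlock_t i_mmap = PTHREAD_RWLOCK_INITIALIZER;   /* i_mmap_rwsem */
static void *kswapd_path(void *arg)
{
	pthread_mutex_lock(&page_lock);     /* lock_page(page) */
	usleep(100000);                     /* widen the race window */
	pthread_rwlock_rdlock(&i_mmap);     /* down_read(i_mmap_rwsem) */
	pthread_rwlock_unlock(&i_mmap);
	pthread_mutex_unlock(&page_lock);
	return NULL;
}
static void *madv_remove_path(void *arg)
{
	pthread_rwlock_wrlock(&i_mmap);     /* down_write(i_mmap_rwsem) */
	usleep(100000);
	pthread_mutex_lock(&page_lock);     /* lock_page() in __split_huge_pmd() */
	pthread_mutex_unlock(&page_lock);
	pthread_rwlock_unlock(&i_mmap);
	return NULL;
}
int main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, kswapd_path, NULL);
	pthread_create(&b, NULL, madv_remove_path, NULL);
	pthread_join(a, NULL);  /* each thread now waits for the lock the other holds */
	pthread_join(b, NULL);
	puts("finished without deadlocking (unlikely with the sleeps above)");
	return 0;
}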
__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.
Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so that
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils
Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 37 +++++++++++++++++++++++--------------
1 file changed, 23 insertions(+), 14 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-madv_remove-deadlock-on-shmem-thp
+++ a/mm/huge_memory.c
@@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_str
{
spinlock_t *ptl;
struct mmu_notifier_range range;
- bool was_locked = false;
+ bool do_unlock_page = false;
pmd_t _pmd;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
@@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_str
VM_BUG_ON(freeze && !page);
if (page) {
VM_WARN_ON_ONCE(!PageLocked(page));
- was_locked = true;
if (page != pmd_page(*pmd))
goto out;
}
@@ -2227,19 +2226,29 @@ repeat:
if (pmd_trans_huge(*pmd)) {
if (!page) {
page = pmd_page(*pmd);
- if (unlikely(!trylock_page(page))) {
- get_page(page);
- _pmd = *pmd;
- spin_unlock(ptl);
- lock_page(page);
- spin_lock(ptl);
- if (unlikely(!pmd_same(*pmd, _pmd))) {
- unlock_page(page);
+ /*
+ * An anonymous page must be locked, to ensure that a
+ * concurrent reuse_swap_page() sees stable mapcount;
+ * but reuse_swap_page() is not used on shmem or file,
+ * and page lock must not be taken when zap_pmd_range()
+ * calls __split_huge_pmd() while i_mmap_lock is held.
+ */
+ if (PageAnon(page)) {
+ if (unlikely(!trylock_page(page))) {
+ get_page(page);
+ _pmd = *pmd;
+ spin_unlock(ptl);
+ lock_page(page);
+ spin_lock(ptl);
+ if (unlikely(!pmd_same(*pmd, _pmd))) {
+ unlock_page(page);
+ put_page(page);
+ page = NULL;
+ goto repeat;
+ }
put_page(page);
- page = NULL;
- goto repeat;
}
- put_page(page);
+ do_unlock_page = true;
}
}
if (PageMlocked(page))
@@ -2249,7 +2258,7 @@ repeat:
__split_huge_pmd_locked(vma, pmd, range.start, freeze);
out:
spin_unlock(ptl);
- if (!was_locked && page)
+ if (do_unlock_page)
unlock_page(page);
/*
* No need to double call mmu_notifier->invalidate_range() callback.
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-thp-fix-madv_remove-deadlock-on-shmem-thp.patch
The patch titled
Subject: Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
has been added to the -mm tree. Its filename is
revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/revert-mm-memcontrol-avoid-worklo…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/revert-mm-memcontrol-avoid-worklo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.
Before the patch, a write to memory.high would first put the new limit in
place for the workload, and then reclaim the requested delta. After the
patch, the kernel tries to reclaim the delta before putting the new limit
into place, in order to not overwhelm the workload with a sudden, large
excess over the limit. However, if reclaim is actively racing with new
allocations from the uncurbed workload, it can keep the write() working
inside the kernel indefinitely.
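As a toy illustration of the starvation (a stand-alone user-space model,
not the memcontrol code): if concurrent charges put back as many pages
as each reclaim pass frees, a loop that insists on reaching the target
before setting the limit never converges:
#include <stdio.h>
int main(void)
{
	unsigned long usage = 1024, high = 128;	/* pages */
	unsigned long loops = 0, max_loops = 1000000;
	while (usage > high && loops < max_loops) {
		usage -= 32;	/* what reclaim frees this iteration */
		usage += 32;	/* what the uncurbed workload charges meanwhile */
		loops++;
	}
	printf("gave up after %lu loops, usage still %lu (high %lu)\n",
	       loops, usage, high);
	return 0;
}
Setting the limit first, as the revert below restores and as memory.max
already does, lets regular limit enforcement curb the workload instead,
so the writer cannot be starved this way.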
This is causing problems in Facebook production. A privileged
system-level daemon that adjusts memory.high for various workloads running
on a host can get unexpectedly stuck in the kernel and essentially turn
into a sort of involuntary kswapd for one of the workloads. We've
observed the daemon busy-spinning in a write() for minutes at a time,
neglecting its other duties on the system, and expending privileged system
resources on behalf of a workload.
To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged to
the new limit or not - and bound the write() call this way. However, the
root cause that inspired the sequence change in the first place has been
fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.
The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is meant
to throttle excessive growth from below. Allocating threads could end up
sleeping long after the write() had already reclaimed the delta for which
they were being punished.
However, erroneous throttling also caused problems in other scenarios at
around the same time. This resulted in commit b3ff92916af3 ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included in
the same release as the offending commit. When allocating threads now
encounter large excess caused by a racing write() to memory.high, instead
of entering punitive sleeps, they will simply be tasked with helping
reclaim down the excess, and will be held no longer than it takes to
accomplish that. This is in line with regular limit enforcement - i.e.
if the workload allocates up against or over an otherwise unchanged limit
from below.
With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.
Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Koutný <mkoutny(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/mm/memcontrol.c~revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh
+++ a/mm/memcontrol.c
@@ -6271,6 +6271,8 @@ static ssize_t memory_high_write(struct
if (err)
return err;
+ page_counter_set_high(&memcg->memory, high);
+
for (;;) {
unsigned long nr_pages = page_counter_read(&memcg->memory);
@@ -6293,10 +6295,7 @@ static ssize_t memory_high_write(struct
break;
}
- page_counter_set_high(&memcg->memory, high);
-
memcg_wb_domain_size_changed(memcg);
-
return nbytes;
}
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-memcontrol-prevent-starvation-when-writing-memoryhigh.patch
revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh.patch
The patch titled
Subject: mm/vmalloc: separate put pages and flush VM flags
has been added to the -mm tree. Its filename is
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-vmalloc-separate-put-pages-and…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-vmalloc-separate-put-pages-and…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Subject: mm/vmalloc: separate put pages and flush VM flags
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things
so flags are less likely to be missed in the future.
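A stand-alone demo of the flag aliasing (the _OLD/_NEW constants are
local to this snippet and simply mirror the values visible in the hunk
below; they are not the kernel's definitions):
#include <stdio.h>
#define VM_FLUSH_RESET_PERMS_OLD 0x00000100
#define VM_MAP_PUT_PAGES_OLD     0x00000100  /* accidentally the same bit */
#define VM_FLUSH_RESET_PERMS_NEW 0x00000100
#define VM_MAP_PUT_PAGES_NEW     0x00000200  /* now a distinct bit */
int main(void)
{
	unsigned long flags = VM_MAP_PUT_PAGES_OLD;
	/* With the old values, asking for "put pages" also looks like a
	 * request to flush/reset permissions -> excess flushing on vfree. */
	printf("old: flush-reset also set? %s\n",
	       (flags & VM_FLUSH_RESET_PERMS_OLD) ? "yes (bug)" : "no");
	flags = VM_MAP_PUT_PAGES_NEW;
	printf("new: flush-reset also set? %s\n",
	       (flags & VM_FLUSH_RESET_PERMS_NEW) ? "yes" : "no (fixed)");
	return 0;
}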
Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Suggested-by: Matthew Wilcox <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/vmalloc.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
--- a/include/linux/vmalloc.h~mm-vmalloc-separate-put-pages-and-flush-vm-flags
+++ a/include/linux/vmalloc.h
@@ -24,7 +24,8 @@ struct notifier_block; /* in notifier.h
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
-#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
+#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
+#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
/*
* VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
@@ -37,12 +38,6 @@ struct notifier_block; /* in notifier.h
* determine which allocations need the module shadow freed.
*/
-/*
- * Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with
- * vfree_atomic().
- */
-#define VM_FLUSH_RESET_PERMS 0x00000100 /* Reset direct map and flush TLB on unmap */
-
/* bits [20..32] reserved for arch specific ioremap internals */
/*
_
Patches currently in -mm which might be from rick.p.edgecombe(a)intel.com are
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
The patch titled
Subject: mm/vmalloc: separate put pages and flush VM flags
has been removed from the -mm tree. Its filename was
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Subject: mm/vmalloc: separate put pages and flush VM flags
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, move the comment
and remove whitespace for VM_KASAN such that the flags lower down are less
likely to be missed in the future.
Link: https://lkml.kernel.org/r/20210121014118.31922-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/vmalloc.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/include/linux/vmalloc.h~mm-vmalloc-separate-put-pages-and-flush-vm-flags
+++ a/include/linux/vmalloc.h
@@ -23,9 +23,6 @@ struct notifier_block; /* in notifier.h
#define VM_DMA_COHERENT 0x00000010 /* dma_alloc_coherent */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
-#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
-#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
-
/*
* VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
*
@@ -36,12 +33,13 @@ struct notifier_block; /* in notifier.h
* Otherwise, VM_KASAN is set for kasan_module_alloc() allocations and used to
* determine which allocations need the module shadow freed.
*/
-
+#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
/*
* Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with
* vfree_atomic().
*/
#define VM_FLUSH_RESET_PERMS 0x00000100 /* Reset direct map and flush TLB on unmap */
+#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
/* bits [20..32] reserved for arch specific ioremap internals */
_
Patches currently in -mm which might be from rick.p.edgecombe(a)intel.com are
I'm announcing the release of the 4.19.170 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 2
drivers/md/dm-bufio.c | 6 ++
drivers/md/dm-integrity.c | 50 +++++++++++++++++--
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 --
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 -
drivers/net/usb/rndis_host.c | 2
drivers/spi/spi-cadence.c | 6 +-
drivers/usb/host/ohci-hcd.c | 2
fs/nfsd/nfs3xdr.c | 7 ++
include/linux/compiler-gcc.h | 6 ++
include/linux/dm-bufio.h | 1
include/linux/skbuff.h | 5 +
net/core/skbuff.c | 9 ++-
net/core/sock_reuseport.c | 2
net/dcb/dcbnl.c | 2
net/ipv4/esp4.c | 7 --
net/ipv6/esp6.c | 7 --
net/ipv6/ip6_output.c | 40 ++++++++++++++-
net/ipv6/sit.c | 5 +
net/rxrpc/input.c | 2
net/rxrpc/key.c | 6 +-
net/tipc/link.c | 9 ++-
24 files changed, 147 insertions(+), 43 deletions(-)
Andrey Zhizhikin (1):
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Arnd Bergmann (1):
crypto: x86/crc32c - fix building with clang ias
Aya Levin (1):
net: ipv6: Validate GSO SKB before finish IPv6 processing
Baptiste Lepers (2):
udp: Prevent reuseport_select_sock from reading uninitialized socks
rxrpc: Call state should be read with READ_ONCE() under some circumstances
David Howells (1):
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
David Wu (1):
net: stmmac: Fixed mtu channged by cache aligned
Eric Dumazet (1):
net: avoid 32 x truesize under-estimation for tiny skbs
Greg Kroah-Hartman (1):
Linux 4.19.170
Hamish Martin (1):
usb: ohci: Make distrust_firmware param default to false
Hoang Le (1):
tipc: fix NULL deref in tipc_link_xmit()
J. Bruce Fields (1):
nfsd4: readdirplus shouldn't return parent of export
Jakub Kicinski (1):
net: sit: unregister_netdevice on newlink's error path
Jason A. Donenfeld (2):
net: introduce skb_list_walk_safe for skb segment walking
net: skbuff: disambiguate argument and member for skb_list_walk_safe helper
Manish Chopra (1):
netxen_nic: fix MSI/MSI-x interrupts
Michael Hennerich (1):
spi: cadence: cache reference clock rate during probe
Mikulas Patocka (1):
dm integrity: fix flush with external metadata device
Petr Machata (2):
net: dcb: Validate netlink message in DCB handler
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Stefan Chulski (1):
net: mvpp2: Remove Pause and Asym_Pause support
Will Deacon (1):
compiler.h: Raise minimum version of GCC to 5.1 for arm64
Willem de Bruijn (1):
esp: avoid unneeded kmap_atomic call
This is the start of the stable review cycle for the 5.10.10 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.10-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.10-rc1
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Christophe Leroy <christophe.leroy(a)csgroup.eu>
spi: fsl: Fix driver breakage when SPI_CS_HIGH is not set in spi->mode
Ayush Sawal <ayush.sawal(a)chelsio.com>
cxgb4/chtls: Fix tid stuck due to wrong update of qid
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dsa: unbind all switches from tree when DSA master unbinds
Lorenzo Bianconi <lorenzo(a)kernel.org>
mac80211: check if atf has been disabled in __ieee80211_schedule_txq
Felix Fietkau <nbd(a)nbd.name>
mac80211: do not drop tx nulldata packets on encrypted links
Antonio Borneo <antonio.borneo(a)st.com>
drm/panel: otm8009a: allow using non-continuous dsi clock
Qinglang Miao <miaoqinglang(a)huawei.com>
can: mcp251xfd: mcp251xfd_handle_rxif_one(): fix wrong NULL pointer check
Seb Laveze <sebastien.laveze(a)nxp.com>
net: stmmac: use __napi_schedule() for PREEMPT_RT
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dsa: clear devlink port type before unregistering slave netdevs
Marco Felsch <m.felsch(a)pengutronix.de>
net: phy: smsc: fix clk error handling
Geert Uytterhoeven <geert+renesas(a)glider.be>
dt-bindings: net: renesas,etheravb: RZ/G2H needs tx-internal-delay-ps
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Yannick Vignon <yannick.vignon(a)nxp.com>
net: stmmac: fix taprio configuration when base_time is in the past
Yannick Vignon <yannick.vignon(a)nxp.com>
net: stmmac: fix taprio schedule configuration
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
David Wu <david.wu(a)rock-chips.com>
net: stmmac: Fixed mtu channged by cache aligned
Cristian Dumitrescu <cristian.dumitrescu(a)intel.com>
i40e: fix potential NULL pointer dereferencing
Baptiste Lepers <baptiste.lepers(a)gmail.com>
rxrpc: Call state should be read with READ_ONCE() under some circumstances
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Willem de Bruijn <willemb(a)google.com>
esp: avoid unneeded kmap_atomic call
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: Remove Pause and Asym_Pause support
Vadim Pasternak <vadimp(a)nvidia.com>
mlxsw: core: Increase critical threshold for ASIC thermal zone
Vadim Pasternak <vadimp(a)nvidia.com>
mlxsw: core: Add validation of transceiver temperature thresholds
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
Aya Levin <ayal(a)nvidia.com>
net: ipv6: Validate GSO SKB before finish IPv6 processing
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Baptiste Lepers <baptiste.lepers(a)gmail.com>
udp: Prevent reuseport_select_sock from reading uninitialized socks
Dongseok Yi <dseok.yi(a)samsung.com>
net: fix use-after-free when UDP GRO with shared fraglist
Stephan Gerhold <stephan(a)gerhold.net>
net: ipa: modem: add missing SET_NETDEV_DEV() for proper sysfs links
Mircea Cirjaliu <mcirjaliu(a)bitdefender.com>
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
Gilad Reti <gilad.reti(a)gmail.com>
bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
Stanislav Fomichev <sdf(a)google.com>
bpf: Don't leak memory in bpf getsockopt when optlen == 0
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Tianjia Zhang <tianjia.zhang(a)linux.alibaba.com>
X.509: Fix crash caused by NULL pointer
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix signed_{sub,add32}_overflows type handling
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/display: drop DCN support for aarch64
Dexuan Cui <decui(a)microsoft.com>
x86/hyperv: Initialize clockevents after LAPIC is initialized
Andrei Matei <andreimatei1(a)gmail.com>
bpf: Fix selftest compilation on clang 11
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "kconfig: remove 'kvmconfig' and 'xenconfig' shorthands"
-------------
Diffstat:
.../devicetree/bindings/net/renesas,etheravb.yaml | 1 +
Makefile | 4 +-
arch/x86/hyperv/hv_init.c | 29 +++++++-
crypto/asymmetric_keys/public_key.c | 3 +-
drivers/gpu/drm/amd/display/Kconfig | 2 +-
drivers/gpu/drm/amd/display/dc/calcs/Makefile | 7 --
drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile | 7 --
drivers/gpu/drm/amd/display/dc/dcn10/Makefile | 7 --
.../gpu/drm/amd/display/dc/dcn10/dcn10_resource.c | 81 +++++++++-------------
drivers/gpu/drm/amd/display/dc/dcn20/Makefile | 4 --
drivers/gpu/drm/amd/display/dc/dcn21/Makefile | 4 --
drivers/gpu/drm/amd/display/dc/dml/Makefile | 13 ----
drivers/gpu/drm/amd/display/dc/dsc/Makefile | 5 --
drivers/gpu/drm/amd/display/dc/os_types.h | 4 --
drivers/gpu/drm/panel/panel-orisetech-otm8009a.c | 2 +-
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 2 +-
drivers/net/ethernet/chelsio/cxgb4/t4_tcb.h | 7 ++
.../ethernet/chelsio/inline_crypto/chtls/chtls.h | 4 ++
.../chelsio/inline_crypto/chtls/chtls_cm.c | 32 ++++++++-
.../chelsio/inline_crypto/chtls/chtls_hw.c | 41 +++++++++++
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 -
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 13 ++--
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +-
drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 52 ++------------
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 20 +++++-
drivers/net/ipa/ipa_modem.c | 1 +
drivers/net/phy/smsc.c | 3 +-
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 +-
drivers/spi/spi-fsl-spi.c | 5 +-
fs/nfsd/nfs3xdr.c | 7 +-
kernel/bpf/cgroup.c | 5 +-
kernel/bpf/helpers.c | 2 +-
kernel/bpf/verifier.c | 8 ++-
net/core/skbuff.c | 29 +++++++-
net/core/sock_reuseport.c | 2 +-
net/dcb/dcbnl.c | 2 +
net/dsa/dsa2.c | 4 ++
net/dsa/master.c | 10 +++
net/ipv4/esp4.c | 7 +-
net/ipv6/esp6.c | 7 +-
net/ipv6/ip6_output.c | 41 ++++++++++-
net/ipv6/sit.c | 5 +-
net/mac80211/tx.c | 4 +-
net/rxrpc/input.c | 2 +-
net/rxrpc/key.c | 6 +-
net/tipc/link.c | 9 ++-
scripts/kconfig/Makefile | 10 +++
tools/testing/selftests/bpf/progs/profiler.inc.h | 2 +
51 files changed, 323 insertions(+), 218 deletions(-)
This is the start of the stable review cycle for the 5.4.92 release.
There are 33 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.92-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.92-rc1
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Lorenzo Bianconi <lorenzo(a)kernel.org>
mac80211: check if atf has been disabled in __ieee80211_schedule_txq
Felix Fietkau <nbd(a)nbd.name>
mac80211: do not drop tx nulldata packets on encrypted links
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
Daniel Borkmann <daniel(a)iogearbox.net>
net, sctp, filter: remap copy_from_user failure error
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
David Wu <david.wu(a)rock-chips.com>
net: stmmac: Fixed mtu channged by cache aligned
Baptiste Lepers <baptiste.lepers(a)gmail.com>
rxrpc: Call state should be read with READ_ONCE() under some circumstances
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Willem de Bruijn <willemb(a)google.com>
esp: avoid unneeded kmap_atomic call
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: Remove Pause and Asym_Pause support
Vadim Pasternak <vadimp(a)nvidia.com>
mlxsw: core: Increase critical threshold for ASIC thermal zone
Vadim Pasternak <vadimp(a)nvidia.com>
mlxsw: core: Add validation of transceiver temperature thresholds
Aya Levin <ayal(a)nvidia.com>
net: ipv6: Validate GSO SKB before finish IPv6 processing
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: skbuff: disambiguate argument and member for skb_list_walk_safe helper
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: introduce skb_list_walk_safe for skb segment walking
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Baptiste Lepers <baptiste.lepers(a)gmail.com>
udp: Prevent reuseport_select_sock from reading uninitialized socks
Mircea Cirjaliu <mcirjaliu(a)bitdefender.com>
bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
Stanislav Fomichev <sdf(a)google.com>
bpf: Don't leak memory in bpf getsockopt when optlen == 0
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Lukas Wunner <lukas(a)wunner.de>
spi: npcm-fiu: Disable clock in probe error path
Qinglang Miao <miaoqinglang(a)huawei.com>
spi: npcm-fiu: simplify the return expression of npcm_fiu_probe()
YueHaibing <yuehaibing(a)huawei.com>
scsi: lpfc: Make lpfc_defer_acc_rsp static
zhengbin <zhengbin13(a)huawei.com>
scsi: lpfc: Make function lpfc_defer_pt2pt_acc static
Arnd Bergmann <arnd(a)arndb.de>
elfcore: fix building with clang
Roger Pau Monne <roger.pau(a)citrix.com>
xen/privcmd: allow fetching resource sizes
Will Deacon <will(a)kernel.org>
compiler.h: Raise minimum version of GCC to 5.1 for arm64
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
-------------
Diffstat:
Makefile | 4 +--
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 --
drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 13 ++++---
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-
drivers/net/usb/rndis_host.c | 2 +-
drivers/scsi/lpfc/lpfc_nportdisc.c | 4 +--
drivers/spi/spi-cadence.c | 6 ++--
drivers/spi/spi-npcm-fiu.c | 7 ++--
drivers/usb/host/ohci-hcd.c | 2 +-
drivers/xen/privcmd.c | 25 +++++++++----
fs/nfsd/nfs3xdr.c | 7 +++-
include/linux/compiler-gcc.h | 6 ++++
include/linux/elfcore.h | 22 ++++++++++++
include/linux/skbuff.h | 5 +++
kernel/Makefile | 1 -
kernel/bpf/cgroup.c | 5 +--
kernel/bpf/helpers.c | 2 +-
kernel/elfcore.c | 26 --------------
net/core/filter.c | 2 +-
net/core/skbuff.c | 9 +++--
net/core/sock_reuseport.c | 2 +-
net/dcb/dcbnl.c | 2 ++
net/ipv4/esp4.c | 7 +---
net/ipv6/esp6.c | 7 +---
net/ipv6/ip6_output.c | 41 +++++++++++++++++++++-
net/ipv6/sit.c | 5 ++-
net/mac80211/tx.c | 4 +--
net/rxrpc/input.c | 2 +-
net/rxrpc/key.c | 6 ++--
net/sctp/socket.c | 2 +-
net/tipc/link.c | 9 +++--
32 files changed, 158 insertions(+), 89 deletions(-)
This is the start of the stable review cycle for the 4.4.253 release.
There are 29 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 16:08:14 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.253-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.253-rc2
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Jouni K. Seppänen <jks(a)iki.fi>
net: cdc_ncm: correct overhead in delayed_ndp_size
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Nuno Sá <nuno.sa(a)analog.com>
iio: buffer: Fix demux update
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
j.nixdorf(a)avm.de <j.nixdorf(a)avm.de>
net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn <jannh(a)google.com>
mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara <jack(a)suse.cz>
ext4: fix superblock checksum failure when setting password salt
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: nfs_igrab_and_active must first reference the superblock
Al Viro <viro(a)zeniv.linux.org.uk>
dump_common_audit_data(): fix racy accesses to ->d_name
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: uinput - avoid FF flush when destroying device
Arnd Bergmann <arnd(a)arndb.de>
ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo <shawn.guo(a)linaro.org>
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman <mpe(a)ellerman.id.au>
net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann <arnd(a)arndb.de>
misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap <rdunlap(a)infradead.org>
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add boot_targets to PHONY
yangerkun <yangerkun(a)huawei.com>
ext4: fix bug for rename with RENAME_WHITEOUT
Miaohe Lin <linmiaohe(a)huawei.com>
mm/hugetlb: fix potential missing huge page size info
Thomas Hebb <tommyhebb(a)gmail.com>
ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/Makefile | 1 +
arch/arc/include/asm/page.h | 1 +
arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 ++++
drivers/iio/industrialio-buffer.c | 6 +++---
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++
drivers/input/ff-core.c | 13 ++++++++++---
drivers/input/misc/uinput.c | 18 ++++++++++++++++++
drivers/isdn/mISDN/Kconfig | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 +
drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++++++-
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +------
drivers/net/usb/cdc_ncm.c | 8 ++++++--
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++++--
drivers/usb/host/ohci-hcd.c | 2 +-
fs/ext4/ioctl.c | 3 +++
fs/ext4/namei.c | 16 +++++++++-------
fs/nfs/internal.h | 12 +++++++-----
fs/nfsd/nfs3xdr.c | 7 ++++++-
include/linux/acpi.h | 7 +++++++
include/linux/input.h | 1 +
mm/hugetlb.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 9 +++++++--
net/dcb/dcbnl.c | 2 ++
net/ipv6/sit.c | 5 ++++-
net/rxrpc/ar-key.c | 6 ++++--
net/sunrpc/addr.c | 2 +-
security/lsm_audit.c | 7 +++++--
sound/soc/soc-dapm.c | 1 +
32 files changed, 125 insertions(+), 44 deletions(-)
This is the start of the stable review cycle for the 4.19.170 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.170-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.170-rc1
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Aya Levin <ayal(a)nvidia.com>
net: ipv6: Validate GSO SKB before finish IPv6 processing
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: skbuff: disambiguate argument and member for skb_list_walk_safe helper
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: introduce skb_list_walk_safe for skb segment walking
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
David Wu <david.wu(a)rock-chips.com>
net: stmmac: Fixed mtu channged by cache aligned
Baptiste Lepers <baptiste.lepers(a)gmail.com>
rxrpc: Call state should be read with READ_ONCE() under some circumstances
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Willem de Bruijn <willemb(a)google.com>
esp: avoid unneeded kmap_atomic call
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: Remove Pause and Asym_Pause support
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Baptiste Lepers <baptiste.lepers(a)gmail.com>
udp: Prevent reuseport_select_sock from reading uninitialized socks
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Arnd Bergmann <arnd(a)arndb.de>
crypto: x86/crc32c - fix building with clang ias
Mikulas Patocka <mpatocka(a)redhat.com>
dm integrity: fix flush with external metadata device
Will Deacon <will(a)kernel.org>
compiler.h: Raise minimum version of GCC to 5.1 for arm64
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
-------------
Diffstat:
Makefile | 4 +-
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 2 +-
drivers/md/dm-bufio.c | 6 +++
drivers/md/dm-integrity.c | 50 +++++++++++++++++++---
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 -
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++-
drivers/usb/host/ohci-hcd.c | 2 +-
fs/nfsd/nfs3xdr.c | 7 ++-
include/linux/compiler-gcc.h | 6 +++
include/linux/dm-bufio.h | 1 +
include/linux/skbuff.h | 5 +++
net/core/skbuff.c | 9 +++-
net/core/sock_reuseport.c | 2 +-
net/dcb/dcbnl.c | 2 +
net/ipv4/esp4.c | 7 +--
net/ipv6/esp6.c | 7 +--
net/ipv6/ip6_output.c | 40 ++++++++++++++++-
net/ipv6/sit.c | 5 ++-
net/rxrpc/input.c | 2 +-
net/rxrpc/key.c | 6 ++-
net/tipc/link.c | 9 +++-
24 files changed, 148 insertions(+), 44 deletions(-)
This is the start of the stable review cycle for the 4.14.217 release.
There are 48 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 16:08:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.217-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.217-rc2
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Aya Levin <ayal(a)nvidia.com>
net: ipv6: Validate GSO SKB before finish IPv6 processing
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: skbuff: disambiguate argument and member for skb_list_walk_safe helper
Jason A. Donenfeld <Jason(a)zx2c4.com>
net: introduce skb_list_walk_safe for skb segment walking
Edward Cree <ecree(a)solarflare.com>
net: use skb_list_del_init() to remove from RX sublists
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
David Wu <david.wu(a)rock-chips.com>
net: stmmac: Fixed mtu channged by cache aligned
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Willem de Bruijn <willemb(a)google.com>
esp: avoid unneeded kmap_atomic call
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
Jesper Dangaard Brouer <brouer(a)redhat.com>
netfilter: conntrack: fix reading nf_conntrack_buckets
Geert Uytterhoeven <geert+renesas(a)glider.be>
ALSA: fireface: Fix integer overflow in transmit_midi_msg()
Geert Uytterhoeven <geert+renesas(a)glider.be>
ALSA: firewire-tascam: Fix integer overflow in midi_port_work()
Mike Snitzer <snitzer(a)redhat.com>
dm: eliminate potential source of excessive kernel log noise
j.nixdorf(a)avm.de <j.nixdorf(a)avm.de>
net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn <jannh(a)google.com>
mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara <jack(a)suse.cz>
ext4: fix superblock checksum failure when setting password salt
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: nfs_igrab_and_active must first reference the superblock
Trond Myklebust <trond.myklebust(a)hammerspace.com>
pNFS: Mark layout for return if return-on-close was not sent
Dave Wysochanski <dwysocha(a)redhat.com>
NFS4: Fix use-after-free in trace_event_raw_event_nfs4_set_lock
Dan Carpenter <dan.carpenter(a)oracle.com>
ASoC: Intel: fix error code cnl_set_dsp_D0()
Al Viro <viro(a)zeniv.linux.org.uk>
dump_common_audit_data(): fix racy accesses to ->d_name
Arnd Bergmann <arnd(a)arndb.de>
ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo <shawn.guo(a)linaro.org>
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman <mpe(a)ellerman.id.au>
net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann <arnd(a)arndb.de>
misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap <rdunlap(a)infradead.org>
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add boot_targets to PHONY
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add uImage.lzma to the top-level target
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
yangerkun <yangerkun(a)huawei.com>
ext4: fix bug for rename with RENAME_WHITEOUT
Leon Schuermann <leon(a)is.currently.online>
r8152: Add Lenovo Powered USB-C Travel Hub
Akilesh Kailash <akailash(a)google.com>
dm snapshot: flush merged data before committing metadata
Miaohe Lin <linmiaohe(a)huawei.com>
mm/hugetlb: fix potential missing huge page size info
Dexuan Cui <decui(a)microsoft.com>
ACPI: scan: Harden acpi_device_add() against device ID overflows
Alexander Lobakin <alobakin(a)pm.me>
MIPS: relocatable: fix possible boot hangup with KASLR enabled
Paul Cercueil <paul(a)crapouillou.net>
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
Thomas Hebb <tommyhebb(a)gmail.com>
ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 +--
arch/arc/Makefile | 9 ++---
arch/arc/include/asm/page.h | 1 +
arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 +++
arch/mips/boot/compressed/decompress.c | 3 +-
arch/mips/kernel/relocate.c | 10 ++++--
drivers/acpi/internal.h | 2 +-
drivers/acpi/scan.c | 15 +++++++-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 ++
drivers/isdn/mISDN/Kconfig | 1 +
drivers/md/dm-snap.c | 24 +++++++++++++
drivers/md/dm.c | 2 +-
.../net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 +
drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++-
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-
drivers/net/usb/cdc_ether.c | 7 ++++
drivers/net/usb/r8152.c | 1 +
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++--
drivers/usb/host/ohci-hcd.c | 2 +-
fs/btrfs/qgroup.c | 13 +++++--
fs/btrfs/super.c | 8 +++++
fs/ext4/ioctl.c | 3 ++
fs/ext4/namei.c | 16 +++++----
fs/nfs/internal.h | 12 ++++---
fs/nfs/nfs4proc.c | 2 +-
fs/nfs/pnfs.c | 6 ++++
fs/nfsd/nfs3xdr.c | 7 +++-
include/linux/acpi.h | 7 ++++
include/linux/skbuff.h | 16 +++++++++
mm/hugetlb.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 9 +++--
net/dcb/dcbnl.c | 2 ++
net/ipv4/esp4.c | 7 +---
net/ipv6/esp6.c | 7 +---
net/ipv6/ip6_output.c | 40 +++++++++++++++++++++-
net/ipv6/sit.c | 5 ++-
net/netfilter/nf_conntrack_standalone.c | 3 ++
net/rxrpc/key.c | 6 ++--
net/sunrpc/addr.c | 2 +-
net/tipc/link.c | 9 +++--
security/lsm_audit.c | 7 ++--
sound/firewire/fireface/ff-transaction.c | 2 +-
sound/firewire/tascam/tascam-transaction.c | 2 +-
sound/soc/intel/skylake/cnl-sst.c | 1 +
sound/soc/soc-dapm.c | 1 +
49 files changed, 243 insertions(+), 71 deletions(-)
This is the start of the stable review cycle for the 4.9.253 release.
There are 33 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 16:08:20 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.253-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.253-rc2
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Jouni K. Seppänen <jks(a)iki.fi>
net: cdc_ncm: correct overhead in delayed_ndp_size
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
Jesper Dangaard Brouer <brouer(a)redhat.com>
netfilter: conntrack: fix reading nf_conntrack_buckets
j.nixdorf(a)avm.de <j.nixdorf(a)avm.de>
net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn <jannh(a)google.com>
mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara <jack(a)suse.cz>
ext4: fix superblock checksum failure when setting password salt
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: nfs_igrab_and_active must first reference the superblock
Al Viro <viro(a)zeniv.linux.org.uk>
dump_common_audit_data(): fix racy accesses to ->d_name
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: uinput - avoid FF flush when destroying device
Arnd Bergmann <arnd(a)arndb.de>
ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo <shawn.guo(a)linaro.org>
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman <mpe(a)ellerman.id.au>
net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann <arnd(a)arndb.de>
misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap <rdunlap(a)infradead.org>
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add boot_targets to PHONY
yangerkun <yangerkun(a)huawei.com>
ext4: fix bug for rename with RENAME_WHITEOUT
Miaohe Lin <linmiaohe(a)huawei.com>
mm/hugetlb: fix potential missing huge page size info
Dexuan Cui <decui(a)microsoft.com>
ACPI: scan: Harden acpi_device_add() against device ID overflows
Alexander Lobakin <alobakin(a)pm.me>
MIPS: relocatable: fix possible boot hangup with KASLR enabled
Paul Cercueil <paul(a)crapouillou.net>
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
Thomas Hebb <tommyhebb(a)gmail.com>
ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/Makefile | 1 +
arch/arc/include/asm/page.h | 1 +
arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 ++++
arch/mips/boot/compressed/decompress.c | 3 ++-
arch/mips/kernel/relocate.c | 10 ++++++++--
drivers/acpi/internal.h | 2 +-
drivers/acpi/scan.c | 15 ++++++++++++++-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++
drivers/input/ff-core.c | 13 ++++++++++---
drivers/input/misc/uinput.c | 18 ++++++++++++++++++
drivers/isdn/mISDN/Kconfig | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 +
drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++++++-
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +------
drivers/net/usb/cdc_ncm.c | 8 ++++++--
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++++--
drivers/usb/host/ohci-hcd.c | 2 +-
fs/ext4/ioctl.c | 3 +++
fs/ext4/namei.c | 16 +++++++++-------
fs/nfs/internal.h | 12 +++++++-----
fs/nfsd/nfs3xdr.c | 7 ++++++-
include/linux/acpi.h | 7 +++++++
include/linux/input.h | 1 +
mm/hugetlb.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 9 +++++++--
net/dcb/dcbnl.c | 2 ++
net/ipv6/sit.c | 5 ++++-
net/netfilter/nf_conntrack_standalone.c | 3 +++
net/rxrpc/key.c | 6 ++++--
net/sunrpc/addr.c | 2 +-
net/tipc/link.c | 11 +++++++++--
security/lsm_audit.c | 7 +++++--
sound/soc/soc-dapm.c | 1 +
37 files changed, 159 insertions(+), 48 deletions(-)
Hi Saeed,
On 01/22/21 at 05:14pm, Saeed Mirzamohammadi wrote:
> Hi,
>
> > On Jan 21, 2021, at 7:12 PM, Dave Young <dyoung(a)redhat.com> wrote:
> >
> > On 01/22/21 at 09:22am, Dave Young wrote:
> >> Hi John,
> >>
> >> On 01/21/21 at 09:32am, john.p.donnelly(a)oracle.com wrote:
> >>> On 11/22/20 9:47 PM, Dave Young wrote:
> >>>> Hi Guilherme,
> >>>> On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> >>>>> Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> >>>>> to you I'm fine with it. I'd just recommend testing recent kernels in
> >>>>> multiple distros with the minimum "range" to see if 64M is enough for
> >>>>> crashkernel; maybe we'd need to bump that.
> >>>>
> >>>> Given the different kernel configs and the different userspace
> >>>> initramfs setups, it is hard to get a uniform value for all distributions,
> >>>> but we can have an interface/kconfig-option for them to provide a value like this patch
> >>>> is doing. And it could be improved like Kairui said, adding extra values
> >>>> later for the known kernel cases, plus probably some more improvements if
> >>>> doable.
> >>>>
> >>>> Thanks
> >>>> Dave
> >>>>
> >>>
> >>> Hi.
> >>>
> >>> Are we going to move forward with implementing this for X86 and Arm ?
> >>>
> >>> If other platform maintainers want to include this CONFIG option in their
> >>> configuration settings they have a starting point.
> >>
> >> I would expect this become arch independent.
> >
> > Clarify a bit, it can be a general config option under arch/Kconfig and
> > just put the code in general arch independent part.
>
> Does this mean that we need to add the option to def_configs in all archs as well?
>
I think we do not need to touch the defconfigs; something like this will just work?
BTW, it should depend on CRASH_CORE instead of CRASH_DUMP; the logic for
parsing crashkernel is in kernel/crash_core.c.
diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..fa6efeb52dc5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,11 @@ menu "General architecture-dependent options"
config CRASH_CORE
bool
+config CRASH_AUTO_STR
+ depends on CRASH_CORE
+ string "Memory reserved for crash kernel"
+ default "1G-:128M"
+ ... help text [snip] ...
+
config KEXEC_CORE
select CRASH_CORE
bool
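For readers unfamiliar with the syntax, the proposed default "1G-:128M" follows
the usual crashkernel=range:size grammar: on systems with at least 1G of RAM,
reserve 128M. The standalone C sketch below is illustrative only (it is not the
parser in kernel/crash_core.c, and the table is hand-written) and just shows how
such a range maps system RAM to a reservation size.

/*
 * Minimal userspace illustration of the crashkernel range syntax used by
 * the proposed CRASH_AUTO_STR default ("1G-:128M"). NOT the kernel's
 * parser; the table below is a hand-written stand-in.
 */
#include <stdio.h>

#define SZ_1G (1024ULL * 1024 * 1024)
#define SZ_1M (1024ULL * 1024)

struct ck_range {
	unsigned long long start;	/* range start (inclusive) */
	unsigned long long end;		/* 0 means "no upper bound", i.e. "1G-" */
	unsigned long long size;	/* reservation for this range */
};

/* Equivalent of "1G-:128M" */
static const struct ck_range table[] = {
	{ .start = SZ_1G, .end = 0, .size = 128 * SZ_1M },
};

static unsigned long long crash_size(unsigned long long ram)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (ram >= table[i].start &&
		    (table[i].end == 0 || ram < table[i].end))
			return table[i].size;
	}
	return 0;	/* below 1G: no automatic reservation */
}

int main(void)
{
	unsigned long long ram[] = { 512 * SZ_1M, 2 * SZ_1G, 64 * SZ_1G };

	for (size_t i = 0; i < sizeof(ram) / sizeof(ram[0]); i++)
		printf("RAM %6lluM -> crashkernel %lluM\n",
		       ram[i] / SZ_1M, crash_size(ram[i]) / SZ_1M);
	return 0;
}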
[...]
> Thanks,
> Saeed
>
> >
> >>
> >> Saeed, Kairui, would any of you like to update the patch?
> >>
> >>>
> >>> Thank you,
> >>>
> >>> John.
> >>>
> >>> ( I am not currently on many of the included dist lists in this email, so
> >>> hopefully key contributors are included in this exchange )
> >>>
> >>
> >> Thanks
> >> Dave
>
Thanks
Dave
This is the start of the stable review cycle for the 4.4.253 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.253-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.253-rc1
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Jouni K. Seppänen <jks(a)iki.fi>
net: cdc_ncm: correct overhead in delayed_ndp_size
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Nuno Sá <nuno.sa(a)analog.com>
iio: buffer: Fix demux update
Will Deacon <will(a)kernel.org>
compiler.h: Raise minimum version of GCC to 5.1 for arm64
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
j.nixdorf(a)avm.de <j.nixdorf(a)avm.de>
net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn <jannh(a)google.com>
mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara <jack(a)suse.cz>
ext4: fix superblock checksum failure when setting password salt
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: nfs_igrab_and_active must first reference the superblock
Al Viro <viro(a)zeniv.linux.org.uk>
dump_common_audit_data(): fix racy accesses to ->d_name
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: uinput - avoid FF flush when destroying device
Arnd Bergmann <arnd(a)arndb.de>
ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo <shawn.guo(a)linaro.org>
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman <mpe(a)ellerman.id.au>
net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann <arnd(a)arndb.de>
misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap <rdunlap(a)infradead.org>
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add boot_targets to PHONY
yangerkun <yangerkun(a)huawei.com>
ext4: fix bug for rename with RENAME_WHITEOUT
Miaohe Lin <linmiaohe(a)huawei.com>
mm/hugetlb: fix potential missing huge page size info
Al Viro <viro(a)zeniv.linux.org.uk>
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Thomas Hebb <tommyhebb(a)gmail.com>
ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/Makefile | 1 +
arch/arc/include/asm/page.h | 1 +
arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 ++++
arch/mips/kernel/binfmt_elfn32.c | 7 +++++++
arch/mips/kernel/binfmt_elfo32.c | 7 +++++++
drivers/iio/industrialio-buffer.c | 6 +++---
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++
drivers/input/ff-core.c | 13 ++++++++++---
drivers/input/misc/uinput.c | 18 ++++++++++++++++++
drivers/isdn/mISDN/Kconfig | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 +
drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++++++-
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +------
drivers/net/usb/cdc_ncm.c | 8 ++++++--
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++++--
drivers/usb/host/ohci-hcd.c | 2 +-
fs/ext4/ioctl.c | 3 +++
fs/ext4/namei.c | 16 +++++++++-------
fs/nfs/internal.h | 12 +++++++-----
fs/nfsd/nfs3xdr.c | 7 ++++++-
include/linux/acpi.h | 7 +++++++
include/linux/compiler-gcc.h | 6 ++++++
include/linux/input.h | 1 +
mm/hugetlb.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 9 +++++++--
net/dcb/dcbnl.c | 2 ++
net/ipv6/sit.c | 5 ++++-
net/rxrpc/ar-key.c | 6 ++++--
net/sunrpc/addr.c | 2 +-
security/lsm_audit.c | 7 +++++--
sound/soc/soc-dapm.c | 1 +
35 files changed, 145 insertions(+), 44 deletions(-)
This is the start of the stable review cycle for the 4.9.253 release.
There are 35 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 24 Jan 2021 13:57:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.253-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.253-rc1
Michael Hennerich <michael.hennerich(a)analog.com>
spi: cadence: cache reference clock rate during probe
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix NULL deref in tipc_link_xmit()
David Howells <dhowells(a)redhat.com>
rxrpc: Fix handling of an unsupported token type in rxrpc_read()
Eric Dumazet <edumazet(a)google.com>
net: avoid 32 x truesize under-estimation for tiny skbs
Jakub Kicinski <kuba(a)kernel.org>
net: sit: unregister_netdevice on newlink's error path
Petr Machata <petrm(a)nvidia.com>
net: dcb: Accept RTM_GETDCB messages carrying set-like DCB commands
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Andrey Zhizhikin <andrey.zhizhikin(a)leica-geosystems.com>
rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
Manish Chopra <manishc(a)marvell.com>
netxen_nic: fix MSI/MSI-x interrupts
Jouni K. Seppänen <jks(a)iki.fi>
net: cdc_ncm: correct overhead in delayed_ndp_size
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: readdirplus shouldn't return parent of export
Will Deacon <will(a)kernel.org>
compiler.h: Raise minimum version of GCC to 5.1 for arm64
Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
usb: ohci: Make distrust_firmware param default to false
Jesper Dangaard Brouer <brouer(a)redhat.com>
netfilter: conntrack: fix reading nf_conntrack_buckets
j.nixdorf(a)avm.de <j.nixdorf(a)avm.de>
net: sunrpc: interpret the return value of kstrtou32 correctly
Jann Horn <jannh(a)google.com>
mm, slub: consider rest of partial list if acquire_slab() fails
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
Jan Kara <jack(a)suse.cz>
ext4: fix superblock checksum failure when setting password salt
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: nfs_igrab_and_active must first reference the superblock
Al Viro <viro(a)zeniv.linux.org.uk>
dump_common_audit_data(): fix racy accesses to ->d_name
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: uinput - avoid FF flush when destroying device
Arnd Bergmann <arnd(a)arndb.de>
ARM: picoxcell: fix missing interrupt-parent properties
Shawn Guo <shawn.guo(a)linaro.org>
ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPI
Michael Ellerman <mpe(a)ellerman.id.au>
net: ethernet: fs_enet: Add missing MODULE_LICENSE
Arnd Bergmann <arnd(a)arndb.de>
misdn: dsp: select CONFIG_BITREVERSE
Randy Dunlap <rdunlap(a)infradead.org>
arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix definition and size of ucc_geth_tx_global_pram
Masahiro Yamada <masahiroy(a)kernel.org>
ARC: build: add boot_targets to PHONY
yangerkun <yangerkun(a)huawei.com>
ext4: fix bug for rename with RENAME_WHITEOUT
Miaohe Lin <linmiaohe(a)huawei.com>
mm/hugetlb: fix potential missing huge page size info
Dexuan Cui <decui(a)microsoft.com>
ACPI: scan: Harden acpi_device_add() against device ID overflows
Alexander Lobakin <alobakin(a)pm.me>
MIPS: relocatable: fix possible boot hangup with KASLR enabled
Al Viro <viro(a)zeniv.linux.org.uk>
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Paul Cercueil <paul(a)crapouillou.net>
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
Thomas Hebb <tommyhebb(a)gmail.com>
ASoC: dapm: remove widget from dirty list on free
-------------
Diffstat:
Makefile | 4 ++--
arch/arc/Makefile | 1 +
arch/arc/include/asm/page.h | 1 +
arch/arm/boot/dts/picoxcell-pc3x2.dtsi | 4 ++++
arch/mips/boot/compressed/decompress.c | 3 ++-
arch/mips/kernel/binfmt_elfn32.c | 7 +++++++
arch/mips/kernel/binfmt_elfo32.c | 7 +++++++
arch/mips/kernel/relocate.c | 10 ++++++++--
drivers/acpi/internal.h | 2 +-
drivers/acpi/scan.c | 15 ++++++++++++++-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++
drivers/input/ff-core.c | 13 ++++++++++---
drivers/input/misc/uinput.c | 18 ++++++++++++++++++
drivers/isdn/mISDN/Kconfig | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-bitbang.c | 1 +
drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 1 +
drivers/net/ethernet/freescale/ucc_geth.h | 9 ++++++++-
drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 7 +------
drivers/net/usb/cdc_ncm.c | 8 ++++++--
drivers/net/usb/rndis_host.c | 2 +-
drivers/spi/spi-cadence.c | 6 ++++--
drivers/usb/host/ohci-hcd.c | 2 +-
fs/ext4/ioctl.c | 3 +++
fs/ext4/namei.c | 16 +++++++++-------
fs/nfs/internal.h | 12 +++++++-----
fs/nfsd/nfs3xdr.c | 7 ++++++-
include/linux/acpi.h | 7 +++++++
include/linux/compiler-gcc.h | 6 ++++++
include/linux/input.h | 1 +
mm/hugetlb.c | 2 +-
mm/slub.c | 2 +-
net/core/skbuff.c | 9 +++++++--
net/dcb/dcbnl.c | 2 ++
net/ipv6/sit.c | 5 ++++-
net/netfilter/nf_conntrack_standalone.c | 3 +++
net/rxrpc/key.c | 6 ++++--
net/sunrpc/addr.c | 2 +-
net/tipc/link.c | 11 +++++++++--
security/lsm_audit.c | 7 +++++--
sound/soc/soc-dapm.c | 1 +
40 files changed, 179 insertions(+), 48 deletions(-)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 698222457465ce343443be81c5512edda86e5914 Mon Sep 17 00:00:00 2001
From: Al Viro <viro(a)zeniv.linux.org.uk>
Date: Thu, 24 Dec 2020 19:44:38 +0000
Subject: [PATCH] MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable(a)kernel.org # v3.7+
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c4441416e96b 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index 6dd103d3cebb..7b2a23f48c1a 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -106,4 +106,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 698222457465ce343443be81c5512edda86e5914 Mon Sep 17 00:00:00 2001
From: Al Viro <viro(a)zeniv.linux.org.uk>
Date: Thu, 24 Dec 2020 19:44:38 +0000
Subject: [PATCH] MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable(a)kernel.org # v3.7+
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c4441416e96b 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index 6dd103d3cebb..7b2a23f48c1a 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -106,4 +106,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 698222457465ce343443be81c5512edda86e5914 Mon Sep 17 00:00:00 2001
From: Al Viro <viro(a)zeniv.linux.org.uk>
Date: Thu, 24 Dec 2020 19:44:38 +0000
Subject: [PATCH] MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable(a)kernel.org # v3.7+
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c4441416e96b 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index 6dd103d3cebb..7b2a23f48c1a 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -106,4 +106,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32
+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
Hi,
Please consider applying the following commit to v5.10:
880ee3b7615e ("drm/panel: otm8009a: allow using non-continuous dsi clock")
A related patch introduced in v5.10 has accidentally broken the display
on stm32mp DK2 boards. This commit resolves the issue.
Fixes: c6d94e37bdbb ("drm/bridge/synopsys: dsi: add support for
non-continuous HS clock")
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b210de4f8c97d57de051e805686248ec4c6cfc52 Mon Sep 17 00:00:00 2001
From: Aya Levin <ayal(a)nvidia.com>
Date: Thu, 7 Jan 2021 15:50:18 +0200
Subject: [PATCH] net: ipv6: Validate GSO SKB before finish IPv6 processing
There are cases where a GSO segment's length exceeds the egress MTU:
- Forwarding of a TCP GRO skb, when DF flag is not set.
- Forwarding of an skb that arrived on a virtualisation interface
(virtio-net/vhost/tap) with TSO/GSO size set by other network
stack.
- Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
interface with a smaller MTU.
- Arriving GRO skb (or GSO skb in a virtualised environment) that is
bridged to a NETIF_F_TSO tunnel stacked over an interface with an
insufficient MTU.
If so:
- Consume the SKB and its segments.
- Issue an ICMP packet with 'Packet Too Big' message containing the
MTU, allowing the source host to reduce its Path MTU appropriately.
Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.
Fixes: 9e50849054a4 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal(a)nvidia.com>
Reviewed-by: Tariq Toukan <tariqt(a)nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 749ad72386b2..077d43af8226 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -125,8 +125,43 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
return -EINVAL;
}
+static int
+ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
+ struct sk_buff *skb, unsigned int mtu)
+{
+ struct sk_buff *segs, *nskb;
+ netdev_features_t features;
+ int ret = 0;
+
+ /* Please see corresponding comment in ip_finish_output_gso
+ * describing the cases where GSO segment length exceeds the
+ * egress MTU.
+ */
+ features = netif_skb_features(skb);
+ segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+ if (IS_ERR_OR_NULL(segs)) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+
+ consume_skb(skb);
+
+ skb_list_walk_safe(segs, segs, nskb) {
+ int err;
+
+ skb_mark_not_on_list(segs);
+ err = ip6_fragment(net, sk, segs, ip6_finish_output2);
+ if (err && ret == 0)
+ ret = err;
+ }
+
+ return ret;
+}
+
static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
+ unsigned int mtu;
+
#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
/* Policy lookup after SNAT yielded a new policy */
if (skb_dst(skb)->xfrm) {
@@ -135,7 +170,11 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
}
#endif
- if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
+ mtu = ip6_skb_dst_mtu(skb);
+ if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+ return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+
+ if ((skb->len > mtu && !skb_is_gso(skb)) ||
dst_allfrag(skb_dst(skb)) ||
(IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
return ip6_fragment(net, sk, skb, ip6_finish_output2);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b210de4f8c97d57de051e805686248ec4c6cfc52 Mon Sep 17 00:00:00 2001
From: Aya Levin <ayal(a)nvidia.com>
Date: Thu, 7 Jan 2021 15:50:18 +0200
Subject: [PATCH] net: ipv6: Validate GSO SKB before finish IPv6 processing
There are cases where a GSO segment's length exceeds the egress MTU:
- Forwarding of a TCP GRO skb, when DF flag is not set.
- Forwarding of an skb that arrived on a virtualisation interface
(virtio-net/vhost/tap) with TSO/GSO size set by other network
stack.
- Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
interface with a smaller MTU.
- Arriving GRO skb (or GSO skb in a virtualised environment) that is
bridged to a NETIF_F_TSO tunnel stacked over an interface with an
insufficient MTU.
If so:
- Consume the SKB and its segments.
- Issue an ICMP packet with 'Packet Too Big' message containing the
MTU, allowing the source host to reduce its Path MTU appropriately.
Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.
Fixes: 9e50849054a4 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal(a)nvidia.com>
Reviewed-by: Tariq Toukan <tariqt(a)nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 749ad72386b2..077d43af8226 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -125,8 +125,43 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
return -EINVAL;
}
+static int
+ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
+ struct sk_buff *skb, unsigned int mtu)
+{
+ struct sk_buff *segs, *nskb;
+ netdev_features_t features;
+ int ret = 0;
+
+ /* Please see corresponding comment in ip_finish_output_gso
+ * describing the cases where GSO segment length exceeds the
+ * egress MTU.
+ */
+ features = netif_skb_features(skb);
+ segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+ if (IS_ERR_OR_NULL(segs)) {
+ kfree_skb(skb);
+ return -ENOMEM;
+ }
+
+ consume_skb(skb);
+
+ skb_list_walk_safe(segs, segs, nskb) {
+ int err;
+
+ skb_mark_not_on_list(segs);
+ err = ip6_fragment(net, sk, segs, ip6_finish_output2);
+ if (err && ret == 0)
+ ret = err;
+ }
+
+ return ret;
+}
+
static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
+ unsigned int mtu;
+
#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
/* Policy lookup after SNAT yielded a new policy */
if (skb_dst(skb)->xfrm) {
@@ -135,7 +170,11 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
}
#endif
- if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
+ mtu = ip6_skb_dst_mtu(skb);
+ if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+ return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+
+ if ((skb->len > mtu && !skb_is_gso(skb)) ||
dst_allfrag(skb_dst(skb)) ||
(IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
return ip6_fragment(net, sk, skb, ip6_finish_output2);
The "ibm,arch-vec-5-platform-support" property is a list of pairs of
bytes representing the options and values supported by the platform
firmware. At boot time, Linux scans this list and activates the
available features it recognizes : Radix and XIVE.
A recent change modified the number of entries to loop on and 8 bytes,
4 pairs of { options, values } entries are always scanned. This is
fine on KVM but not on PowerVM which can advertises less. As a
consequence on this platform, Linux reads extra entries pointing to
random data, interprets these as available features and tries to
activate them, leading to a firmware crash in
ibm,client-architecture-support.
Fix that by using the property length of "ibm,arch-vec-5-platform-support".
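To make the failure mode concrete, here is a standalone sketch (not the real
prom_init code, and the option bytes are made up) of why bounding the loop by
the property length matters when the platform advertises fewer pairs than the
buffer can hold.

/*
 * Standalone sketch of the loop-bound problem: if the platform only filled
 * prop_len bytes of the buffer, iterating over sizeof(vec) "parses" stale
 * bytes as { option, value } pairs.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned char vec[8];
	int prop_len = 4;	/* platform advertised only 2 pairs */

	memset(vec, 0xAA, sizeof(vec));	/* stale/garbage contents */
	vec[0] = 0x17; vec[1] = 0xC0;	/* made-up { option, value } pair */
	vec[2] = 0x19; vec[3] = 0x80;	/* made-up { option, value } pair */

	printf("looping to prop_len (%d bytes):\n", prop_len);
	for (int i = 0; i < prop_len; i += 2)
		printf("  index 0x%02x val 0x%02x\n", vec[i], vec[i + 1]);

	printf("looping to sizeof(vec) (%zu bytes):\n", sizeof(vec));
	for (int i = 0; i < (int)sizeof(vec); i += 2)
		printf("  index 0x%02x val 0x%02x\n", vec[i], vec[i + 1]);
	/* the last two "pairs" above are garbage, not platform features */
	return 0;
}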
Cc: stable(a)vger.kernel.org # v4.20+
Fixes: ab91239942a9 ("powerpc/prom: Remove VLA in prom_check_platform_support()")
Signed-off-by: Cédric Le Goater <clg(a)kaod.org>
---
arch/powerpc/kernel/prom_init.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index e9d4eb6144e1..ccf77b985c8f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1331,14 +1331,10 @@ static void __init prom_check_platform_support(void)
if (prop_len > sizeof(vec))
prom_printf("WARNING: ibm,arch-vec-5-platform-support longer than expected (len: %d)\n",
prop_len);
- prom_getprop(prom.chosen, "ibm,arch-vec-5-platform-support",
- &vec, sizeof(vec));
- for (i = 0; i < sizeof(vec); i += 2) {
- prom_debug("%d: index = 0x%x val = 0x%x\n", i / 2
- , vec[i]
- , vec[i + 1]);
- prom_parse_platform_support(vec[i], vec[i + 1],
- &supported);
+ prom_getprop(prom.chosen, "ibm,arch-vec-5-platform-support", &vec, sizeof(vec));
+ for (i = 0; i < prop_len; i += 2) {
+ prom_debug("%d: index = 0x%x val = 0x%x\n", i / 2, vec[i], vec[i + 1]);
+ prom_parse_platform_support(vec[i], vec[i + 1], &supported);
}
}
--
2.26.2
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a68d725e4ea384977445e0bcaed3d7de83ab5b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jouni=20K=2E=20Sepp=C3=A4nen?= <jks(a)iki.fi>
Date: Tue, 5 Jan 2021 06:52:49 +0200
Subject: [PATCH] net: cdc_ncm: correct overhead in delayed_ndp_size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Aligning to tx_ndp_modulus is not sufficient because the next align
call can be cdc_ncm_align_tail, which can add up to ctx->tx_modulus +
ctx->tx_remainder - 1 bytes. This used to lead to occasional crashes
on a Huawei 909s-120 LTE module as follows:
- the condition marked /* if there is a remaining skb [...] */ is true
so the swaps happen
- skb_out is set from ctx->tx_curr_skb
- skb_out->len is exactly 0x3f52
- ctx->tx_curr_size is 0x4000 and delayed_ndp_size is 0xac
(note that the sum of skb_out->len and delayed_ndp_size is 0x3ffe)
- the for loop over n is executed once
- the cdc_ncm_align_tail call marked /* align beginning of next frame */
increases skb_out->len to 0x3f56 (the sum is now 0x4002)
- the condition marked /* check if we had enough room left [...] */ is
false so we break out of the loop
- the condition marked /* If requested, put NDP at end of frame. */ is
true so the NDP is written into skb_out
- now skb_out->len is 0x4002, so padding_count is minus two interpreted
as an unsigned number, which is used as the length argument to memset,
leading to a crash with various symptoms but usually including
> Call Trace:
> <IRQ>
> cdc_ncm_fill_tx_frame+0x83a/0x970 [cdc_ncm]
> cdc_mbim_tx_fixup+0x1d9/0x240 [cdc_mbim]
> usbnet_start_xmit+0x5d/0x720 [usbnet]
The cdc_ncm_align_tail call first aligns on a ctx->tx_modulus
boundary (adding at most ctx->tx_modulus-1 bytes), then adds
ctx->tx_remainder bytes. Alternatively, the next alignment call can
occur in cdc_ncm_ndp16 or cdc_ncm_ndp32, in which case at most
ctx->tx_ndp_modulus-1 bytes are added.
A similar problem has occurred before, and the code is nontrivial to
reason about, so add a guard before the crashing call. By that time it
is too late to prevent any memory corruption (we'll have written past
the end of the buffer already) but we can at least try to get a warning
written into an on-disk log by avoiding the hard crash caused by padding
past the buffer with a huge number of zeros.
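To make the arithmetic above concrete, the following standalone snippet (not
kernel code) reproduces the unsigned wrap-around with the values from the
report.

/*
 * Userspace illustration of the underflow described above, using the
 * values from the report: tx_curr_size = 0x4000, skb_out->len = 0x4002.
 * It only shows what "minus two as an unsigned number" becomes when
 * handed to a memset()-style length parameter.
 */
#include <stdio.h>

int main(void)
{
	unsigned int tx_curr_size = 0x4000;
	unsigned int skb_len = 0x4002;	/* two bytes past the buffer size */
	unsigned int padding_count = tx_curr_size - skb_len;

	printf("padding_count = 0x%x (%u bytes)\n", padding_count, padding_count);
	/* Prints 0xfffffffe: a ~4 GiB zeroing length, hence the crash.
	 * The fix guards with WARN_ON(padding_count > ctx->tx_curr_size). */
	return 0;
}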
Signed-off-by: Jouni K. Seppänen <jks(a)iki.fi>
Fixes: 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209407
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Bjørn Mork <bjorn(a)mork.no>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 3b816a4731f2..5a78848db93f 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1199,7 +1199,10 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
* accordingly. Otherwise, we should check here.
*/
if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END)
- delayed_ndp_size = ALIGN(ctx->max_ndp_size, ctx->tx_ndp_modulus);
+ delayed_ndp_size = ctx->max_ndp_size +
+ max_t(u32,
+ ctx->tx_ndp_modulus,
+ ctx->tx_modulus + ctx->tx_remainder) - 1;
else
delayed_ndp_size = 0;
@@ -1410,7 +1413,8 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign)
if (!(dev->driver_info->flags & FLAG_SEND_ZLP) &&
skb_out->len > ctx->min_tx_pkt) {
padding_count = ctx->tx_curr_size - skb_out->len;
- skb_put_zero(skb_out, padding_count);
+ if (!WARN_ON(padding_count > ctx->tx_curr_size))
+ skb_put_zero(skb_out, padding_count);
} else if (skb_out->len < ctx->tx_curr_size &&
(skb_out->len % dev->maxpacket) == 0) {
skb_put_u8(skb_out, 0); /* force short packet */
The first thing the active retirement worker does is decrement the
i915_active count.
The first thing we do during i915_active_wait is try to increment the
i915_active count, but only if already active [non-zero].
The wait may see that the retirement has already started and marked the
i915_active as idle, and so skip waiting for the retirement handler.
However, the caller of i915_active_wait may immediately free the
i915_active upon returning (e.g. i915_vma_destroy), so we must not return
before the concurrent accesses from the worker have completed. We must
always flush the worker.
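As a rough userspace analogy of the race (pthreads, not the i915 code), the
sketch below shows why the waiter has to flush the worker before the object can
be freed; pthread_join() stands in for flush_work().

/*
 * Userspace analogy of the race: the waiter sees the object go "idle" and
 * could free it while the retire worker is still running with a pointer
 * to it. Joining the worker first - the analogue of flush_work() - is
 * what makes the free safe.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct active {
	atomic_int count;
	int payload;
};

static void *retire_worker(void *arg)
{
	struct active *ref = arg;

	atomic_store(&ref->count, 0);	/* first thing: mark idle */
	ref->payload++;			/* ...but keep touching ref afterwards */
	return NULL;
}

int main(void)
{
	struct active *ref = calloc(1, sizeof(*ref));
	pthread_t worker;

	atomic_store(&ref->count, 1);
	pthread_create(&worker, NULL, retire_worker, ref);

	while (atomic_load(&ref->count))
		;	/* "wait until idle" */

	/* free(ref) here would race with ref->payload++ in the worker. */
	pthread_join(worker, NULL);	/* flush the worker first */
	free(ref);			/* now safe */
	printf("done\n");
	return 0;
}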
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2473
Fixes: 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.5+
---
drivers/gpu/drm/i915/i915_active.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index ab4382841c6b..3bc616cc1ad2 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -628,24 +628,26 @@ static int flush_lazy_signals(struct i915_active *ref)
int __i915_active_wait(struct i915_active *ref, int state)
{
- int err;
-
might_sleep();
- if (!i915_active_acquire_if_busy(ref))
- return 0;
-
/* Any fence added after the wait begins will not be auto-signaled */
- err = flush_lazy_signals(ref);
- i915_active_release(ref);
- if (err)
- return err;
+ if (i915_active_acquire_if_busy(ref)) {
+ int err;
- if (!i915_active_is_idle(ref) &&
- ___wait_var_event(ref, i915_active_is_idle(ref),
- state, 0, 0, schedule()))
- return -EINTR;
+ err = flush_lazy_signals(ref);
+ i915_active_release(ref);
+ if (err)
+ return err;
+ if (___wait_var_event(ref, i915_active_is_idle(ref),
+ state, 0, 0, schedule()))
+ return -EINTR;
+ }
+
+ /*
+ * After the wait is complete, the caller may free the active.
+ * We have to flush any concurrent retirement before returning.
+ */
flush_work(&ref->work);
return 0;
}
--
2.20.1
The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
the guest.
In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.
Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Reviewed-by: Halil Pasic <pasic(a)linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck(a)redhat.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 49 ++++++++++++++++++-------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..7339043906cf 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,19 +1037,14 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
{
struct ap_matrix_mdev *m;
- mutex_lock(&matrix_dev->lock);
-
list_for_each_entry(m, &matrix_dev->mdev_list, node) {
- if ((m != matrix_mdev) && (m->kvm == kvm)) {
- mutex_unlock(&matrix_dev->lock);
+ if ((m != matrix_mdev) && (m->kvm == kvm))
return -EPERM;
- }
}
matrix_mdev->kvm = kvm;
kvm_get_kvm(kvm);
kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
- mutex_unlock(&matrix_dev->lock);
return 0;
}
@@ -1083,35 +1078,52 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
}
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+ vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+ kvm_put_kvm(matrix_mdev->kvm);
+ matrix_mdev->kvm = NULL;
+}
+
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
- int ret;
+ int ret, notify_rc = NOTIFY_OK;
struct ap_matrix_mdev *matrix_mdev;
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;
matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
+ mutex_lock(&matrix_dev->lock);
if (!data) {
- matrix_mdev->kvm = NULL;
- return NOTIFY_OK;
+ if (matrix_mdev->kvm)
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
+ goto notify_done;
}
ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
- if (ret)
- return NOTIFY_DONE;
+ if (ret) {
+ notify_rc = NOTIFY_DONE;
+ goto notify_done;
+ }
/* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd)
- return NOTIFY_DONE;
+ if (!matrix_mdev->kvm->arch.crypto.crycbd) {
+ notify_rc = NOTIFY_DONE;
+ goto notify_done;
+ }
kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
matrix_mdev->matrix.aqm,
matrix_mdev->matrix.adm);
- return NOTIFY_OK;
+notify_done:
+ mutex_unlock(&matrix_dev->lock);
+ return notify_rc;
}
static void vfio_ap_irq_disable_apqn(int apqn)
@@ -1222,13 +1234,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
mutex_lock(&matrix_dev->lock);
- if (matrix_mdev->kvm) {
- kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
- matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
- vfio_ap_mdev_reset_queues(mdev);
- kvm_put_kvm(matrix_mdev->kvm);
- matrix_mdev->kvm = NULL;
- }
+ if (matrix_mdev->kvm)
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
mutex_unlock(&matrix_dev->lock);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
--
2.21.1
The accelerated, instruction based implementations of SHA1, SHA2 and
SHA3 are autoloaded based on CPU capabilities, given that the code is
modest in size, and widely used, which means that resolving the algo
name, loading all compatible modules and picking the one with the
highest priority is taken to be suboptimal.
However, if these algorithms are requested before this CPU feature
based matching and autoloading occurs, these modules are not even
considered, and we end up with suboptimal performance.
So add the missing module aliases for the various SHA implementations.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/crypto/sha1-ce-glue.c | 1 +
arch/arm64/crypto/sha2-ce-glue.c | 2 ++
arch/arm64/crypto/sha3-ce-glue.c | 4 ++++
arch/arm64/crypto/sha512-ce-glue.c | 2 ++
4 files changed, 9 insertions(+)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index c93121bcfdeb..c1362861765f 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -19,6 +19,7 @@
MODULE_DESCRIPTION("SHA1 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel(a)linaro.org>");
MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha1");
struct sha1_ce_state {
struct sha1_state sst;
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index 31ba3da5e61b..ded3a6488f81 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -19,6 +19,8 @@
MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel(a)linaro.org>");
MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha224");
+MODULE_ALIAS_CRYPTO("sha256");
struct sha256_ce_state {
struct sha256_state sst;
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index e5a2936f0886..7288d3046354 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -23,6 +23,10 @@
MODULE_DESCRIPTION("SHA3 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel(a)linaro.org>");
MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha3-224");
+MODULE_ALIAS_CRYPTO("sha3-256");
+MODULE_ALIAS_CRYPTO("sha3-384");
+MODULE_ALIAS_CRYPTO("sha3-512");
asmlinkage void sha3_ce_transform(u64 *st, const u8 *data, int blocks,
int md_len);
diff --git a/arch/arm64/crypto/sha512-ce-glue.c b/arch/arm64/crypto/sha512-ce-glue.c
index faa83f6cf376..a6b1adf31c56 100644
--- a/arch/arm64/crypto/sha512-ce-glue.c
+++ b/arch/arm64/crypto/sha512-ce-glue.c
@@ -23,6 +23,8 @@
MODULE_DESCRIPTION("SHA-384/SHA-512 secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel(a)linaro.org>");
MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha384");
+MODULE_ALIAS_CRYPTO("sha512");
asmlinkage void sha512_ce_transform(struct sha512_state *sst, u8 const *src,
int blocks);
--
2.17.1
The function ovl_dir_real_file() currently uses the semaphore of the
inode to synchronize write to the upperfile cache field.
However, this function will get called by ovl_ioctl_set_flags(), which
utilizes the inode semaphore too. In this case ovl_dir_real_file() will
try to claim a lock that is owned by a function in its call stack, which
won't get released before ovl_dir_real_file() returns.
Define a dedicated semaphore for the upperfile cache, so that the
deadlock won't happen.
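As a rough userspace analogy of the deadlock (pthreads, not overlayfs code),
the sketch below shows a callee trying to take a non-recursive lock its caller
already holds; trylock is used so the demo reports the problem instead of
hanging, and the dedicated lock plays the role of the new upperfile mutex.

/*
 * Userspace analogy of the deadlock: a callee tries to take a
 * non-recursive lock that its caller already holds.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;	/* shared lock */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;	/* dedicated lock */

static void dir_real_file(int use_dedicated_lock)
{
	pthread_mutex_t *lock = use_dedicated_lock ? &cache_lock : &inode_lock;

	if (pthread_mutex_trylock(lock) != 0) {
		printf("  would deadlock: lock already held up the call stack\n");
		return;
	}
	printf("  cache updated under %s lock\n",
	       use_dedicated_lock ? "dedicated" : "inode");
	pthread_mutex_unlock(lock);
}

static void ioctl_set_flags(int use_dedicated_lock)
{
	pthread_mutex_lock(&inode_lock);	/* caller takes the inode lock */
	dir_real_file(use_dedicated_lock);	/* callee needs a lock too */
	pthread_mutex_unlock(&inode_lock);
}

int main(void)
{
	printf("callee reuses the inode lock:\n");
	ioctl_set_flags(0);
	printf("callee has its own cache lock (the fix):\n");
	ioctl_set_flags(1);
	return 0;
}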
Fixes: 61536bed2149 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
Cc: stable(a)vger.kernel.org # v5.10
Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
---
Changes in v2:
- Fixed missing replacement in error handling path.
Changes in v3:
- Use mutex instead of semaphore.
fs/overlayfs/readdir.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..3980f9982f34 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -56,6 +56,7 @@ struct ovl_dir_file {
struct list_head *cursor;
struct file *realfile;
struct file *upperfile;
+ struct mutex upperfile_mutex;
};
static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
@@ -874,8 +875,6 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
* Need to check if we started out being a lower dir, but got copied up
*/
if (!od->is_upper) {
- struct inode *inode = file_inode(file);
-
realfile = READ_ONCE(od->upperfile);
if (!realfile) {
struct path upperpath;
@@ -883,10 +882,10 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
ovl_path_upper(dentry, &upperpath);
realfile = ovl_dir_open_realfile(file, &upperpath);
- inode_lock(inode);
+ mutex_lock(&od->upperfile_mutex);
if (!od->upperfile) {
if (IS_ERR(realfile)) {
- inode_unlock(inode);
+ mutex_unlock(&od->upperfile_mutex);
return realfile;
}
smp_store_release(&od->upperfile, realfile);
@@ -896,7 +895,7 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
fput(realfile);
realfile = od->upperfile;
}
- inode_unlock(inode);
+ mutex_unlock(&od->upperfile_mutex);
}
}
@@ -959,6 +958,7 @@ static int ovl_dir_open(struct inode *inode, struct file *file)
od->realfile = realfile;
od->is_real = ovl_dir_is_real(file->f_path.dentry);
od->is_upper = OVL_TYPE_UPPER(type);
+ mutex_init(&od->upperfile_mutex);
file->private_data = od;
return 0;
--
2.28.0
On Fri, Nov 20, 2020 at 4:28 AM Saeed Mirzamohammadi
<saeed.mirzamohammadi(a)oracle.com> wrote:
>
> Hi,
>
> And I think crashkernel=auto could be used as an indicator that the user
> wants the kernel to control the crashkernel size, so some further work
> could be done to adjust the crashkernel size accordingly, e.g. when
> memory encryption is enabled, increase the crashkernel value for the
> auto estimation, as it's known to consume more crashkernel memory.
>
> Thanks for the suggestion! I tried to keep it simple and leave it to the user to change Kconfig in case a different range is needed. Based on experience, these ranges work well for most of the regular cases.
Yes, I think the current implementation is a very good start.
There are some use cases where the kernel is expected to reserve more memory, like:
- when memory encryption is enabled, an extra swiotlb size of memory
should be reserved
- on ppc, fadump will expect more memory to be reserved
I believe there are a lot more cases like these.
I tried to come up with some patches to let the kernel reserve more
memory automatically, when such conditions are detected, but changing
the crashkernel= specified value is really weird.
But if we have crashkernel=auto, then having the kernel automatically
reserve more memory will make sense.
> But why not make it arch-independent? This crashkernel=auto idea
> should simply work with every arch.
>
>
> Thanks! I’ll be making it arch-independent in the v2 patch.
>
>
> #include <asm/page.h>
> #include <asm/sections.h>
> @@ -41,6 +42,15 @@ static int __init parse_crashkernel_mem(char *cmdline,
> unsigned long long *crash_base)
> {
> char *cur = cmdline, *tmp;
> + unsigned long long total_mem = system_ram;
> +
> + /*
> + * Firmware sometimes reserves some memory regions for it's own use.
> + * so we get less than actual system memory size.
> + * Workaround this by round up the total size to 128M which is
> + * enough for most test cases.
> + */
> + total_mem = roundup(total_mem, SZ_128M);
>
>
> I think this rounding may be better moved to the arch-specific part
> where parse_crashkernel is called?
>
>
> Thanks for the suggestion. Could you please elaborate on why we need to do that?
Every arch gets its total memory value using different methods
(just check every parse_crashkernel call, and the system_ram param is
filled in many different ways), so I'm really not sure if this
rounding is always suitable.
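For reference, the rounding in the quoted hunk is plain integer arithmetic; the
standalone sketch below (with made-up values) mirrors the kernel's roundup()
macro to show what the 128M workaround does to a firmware-reduced RAM size.

/*
 * Quick arithmetic check of the roundup(total_mem, SZ_128M) workaround
 * quoted above. The roundup() here mirrors the kernel's integer macro;
 * the sample values are made up for illustration.
 */
#include <stdio.h>

#define SZ_128M (128ULL * 1024 * 1024)
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

int main(void)
{
	/* e.g. firmware reserved ~75M out of a nominal 4G */
	unsigned long long reported = 4ULL * 1024 * 1024 * 1024 - 75 * 1024 * 1024;
	unsigned long long rounded = roundup(reported, SZ_128M);

	printf("reported RAM: %lluM, after roundup: %lluM\n",
	       reported >> 20, rounded >> 20);	/* 4021M -> 4096M */
	return 0;
}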
>
> Thanks,
> Saeed
>
>
--
Best Regards,
Kairui Song
The patch titled
Subject: mm/vmalloc: separate put pages and flush VM flags
has been added to the -mm tree. Its filename is
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-vmalloc-separate-put-pages-and…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-vmalloc-separate-put-pages-and…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Subject: mm/vmalloc: separate put pages and flush VM flags
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, move the comment
and remove whitespace for VM_KASAN such that the flags lower down are less
likely to be missed in the future.
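The collision is easy to see in isolation; the standalone snippet below uses
the flag values from the diff to show that, before the change, testing for
VM_MAP_PUT_PAGES could not be told apart from testing for
VM_FLUSH_RESET_PERMS.

/*
 * Standalone illustration of the flag collision (values taken from the
 * quoted diff): while both flags are 0x00000100, testing for one also
 * matches the other, so VM_MAP_PUT_PAGES allocations get the
 * VM_FLUSH_RESET_PERMS treatment for free. Giving them distinct bits
 * (0x100 vs 0x200) makes the tests independent again.
 */
#include <stdio.h>

#define OLD_VM_FLUSH_RESET_PERMS 0x00000100
#define OLD_VM_MAP_PUT_PAGES     0x00000100	/* same bit: collision */

#define NEW_VM_FLUSH_RESET_PERMS 0x00000100
#define NEW_VM_MAP_PUT_PAGES     0x00000200	/* distinct bit: the fix */

int main(void)
{
	unsigned long flags;

	flags = OLD_VM_MAP_PUT_PAGES;
	printf("old: put-pages area also looks flush-reset? %s\n",
	       (flags & OLD_VM_FLUSH_RESET_PERMS) ? "yes" : "no");

	flags = NEW_VM_MAP_PUT_PAGES;
	printf("new: put-pages area also looks flush-reset? %s\n",
	       (flags & NEW_VM_FLUSH_RESET_PERMS) ? "yes" : "no");
	return 0;
}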
Link: https://lkml.kernel.org/r/20210121014118.31922-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Daniel Axtens <dja(a)axtens.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/vmalloc.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/include/linux/vmalloc.h~mm-vmalloc-separate-put-pages-and-flush-vm-flags
+++ a/include/linux/vmalloc.h
@@ -23,9 +23,6 @@ struct notifier_block; /* in notifier.h
#define VM_DMA_COHERENT 0x00000010 /* dma_alloc_coherent */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
-#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
-#define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
-
/*
* VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
*
@@ -36,12 +33,13 @@ struct notifier_block; /* in notifier.h
* Otherwise, VM_KASAN is set for kasan_module_alloc() allocations and used to
* determine which allocations need the module shadow freed.
*/
-
+#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
/*
* Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with
* vfree_atomic().
*/
#define VM_FLUSH_RESET_PERMS 0x00000100 /* Reset direct map and flush TLB on unmap */
+#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
/* bits [20..32] reserved for arch specific ioremap internals */
_
Patches currently in -mm which might be from rick.p.edgecombe(a)intel.com are
mm-vmalloc-separate-put-pages-and-flush-vm-flags.patch
The device link device's name was of the form:
<supplier-dev-name>--<consumer-dev-name>
This can cause name collisions, as reported here [1], because device names
are not globally unique. Since device names have to be unique within the
bus/class, add the bus/class name as a prefix to the device names used to
construct the device link device name.
So the device link device's name will be of the form:
<supplier-bus-name>:<supplier-dev-name>--<consumer-bus-name>:<consumer-dev-name>
[1] - https://lore.kernel.org/lkml/20201229033440.32142-1-michael@walle.cc/
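As an illustration of the old and new formats (with made-up bus and device
names, not taken from the report), the snippet below shows how encoding the
bus name removes the collision.

/*
 * Illustration of the old vs. new device link name format. The
 * dev_bus_name()/dev_name() values here are hypothetical.
 */
#include <stdio.h>

int main(void)
{
	/* two links whose supplier device names happen to match */
	const char *sup_bus[] = { "platform", "i2c" };
	const char *sup_dev[] = { "1-0050",   "1-0050" };
	const char *con_bus = "platform", *con_dev = "soc:display";
	char name[128];

	for (int i = 0; i < 2; i++) {
		/* old format: collides because the bus is not encoded */
		snprintf(name, sizeof(name), "%s--%s", sup_dev[i], con_dev);
		printf("old: %s\n", name);

		/* new format: <sup-bus>:<sup-dev>--<con-bus>:<con-dev> */
		snprintf(name, sizeof(name), "%s:%s--%s:%s",
			 sup_bus[i], sup_dev[i], con_bus, con_dev);
		printf("new: %s\n", name);
	}
	return 0;
}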
Cc: stable(a)vger.kernel.org
Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
Reported-by: Michael Walle <michael(a)walle.cc>
Tested-by: Michael Walle <michael(a)walle.cc>
Signed-off-by: Saravana Kannan <saravanak(a)google.com>
---
v1:
- Fixed the collision in the device link device name.
v1 -> v2:
- Tried to fixed collision in the supplier: and consumer: symlinks
v2 -> v3:
- Fixed the truncation of the symlink names caused by v2.
v3 -> v4:
- Did all the above fixes for the symlink removal path.
Documentation/ABI/testing/sysfs-class-devlink | 4 +--
.../ABI/testing/sysfs-devices-consumer | 5 ++--
.../ABI/testing/sysfs-devices-supplier | 5 ++--
drivers/base/core.c | 27 ++++++++++---------
include/linux/device.h | 12 +++++++++
5 files changed, 35 insertions(+), 18 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-class-devlink b/Documentation/ABI/testing/sysfs-class-devlink
index b662f747c83e..8a21ce515f61 100644
--- a/Documentation/ABI/testing/sysfs-class-devlink
+++ b/Documentation/ABI/testing/sysfs-class-devlink
@@ -5,8 +5,8 @@ Description:
Provide a place in sysfs for the device link objects in the
kernel at any given time. The name of a device link directory,
denoted as ... above, is of the form <supplier>--<consumer>
- where <supplier> is the supplier device name and <consumer> is
- the consumer device name.
+ where <supplier> is the supplier bus:device name and <consumer>
+ is the consumer bus:device name.
What: /sys/class/devlink/.../auto_remove_on
Date: May 2020
diff --git a/Documentation/ABI/testing/sysfs-devices-consumer b/Documentation/ABI/testing/sysfs-devices-consumer
index 1f06d74d1c3c..0809fda092e6 100644
--- a/Documentation/ABI/testing/sysfs-devices-consumer
+++ b/Documentation/ABI/testing/sysfs-devices-consumer
@@ -4,5 +4,6 @@ Contact: Saravana Kannan <saravanak(a)google.com>
Description:
The /sys/devices/.../consumer:<consumer> are symlinks to device
links where this device is the supplier. <consumer> denotes the
- name of the consumer in that device link. There can be zero or
- more of these symlinks for a given device.
+ name of the consumer in that device link and is of the form
+ bus:device name. There can be zero or more of these symlinks
+ for a given device.
diff --git a/Documentation/ABI/testing/sysfs-devices-supplier b/Documentation/ABI/testing/sysfs-devices-supplier
index a919e0db5e90..207f5972e98d 100644
--- a/Documentation/ABI/testing/sysfs-devices-supplier
+++ b/Documentation/ABI/testing/sysfs-devices-supplier
@@ -4,5 +4,6 @@ Contact: Saravana Kannan <saravanak(a)google.com>
Description:
The /sys/devices/.../supplier:<supplier> are symlinks to device
links where this device is the consumer. <supplier> denotes the
- name of the supplier in that device link. There can be zero or
- more of these symlinks for a given device.
+ name of the supplier in that device link and is of the form
+ bus:device name. There can be zero or more of these symlinks
+ for a given device.
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 25e08e5f40bd..47a6faf1605a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -456,7 +456,9 @@ static int devlink_add_symlinks(struct device *dev,
struct device *con = link->consumer;
char *buf;
- len = max(strlen(dev_name(sup)), strlen(dev_name(con)));
+ len = max(strlen(dev_bus_name(sup)) + strlen(dev_name(sup)),
+ strlen(dev_bus_name(con)) + strlen(dev_name(con)));
+ len += strlen(":");
len += strlen("supplier:") + 1;
buf = kzalloc(len, GFP_KERNEL);
if (!buf)
@@ -470,12 +472,12 @@ static int devlink_add_symlinks(struct device *dev,
if (ret)
goto err_con;
- snprintf(buf, len, "consumer:%s", dev_name(con));
+ snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
ret = sysfs_create_link(&sup->kobj, &link->link_dev.kobj, buf);
if (ret)
goto err_con_dev;
- snprintf(buf, len, "supplier:%s", dev_name(sup));
+ snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
ret = sysfs_create_link(&con->kobj, &link->link_dev.kobj, buf);
if (ret)
goto err_sup_dev;
@@ -483,7 +485,7 @@ static int devlink_add_symlinks(struct device *dev,
goto out;
err_sup_dev:
- snprintf(buf, len, "consumer:%s", dev_name(con));
+ snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
sysfs_remove_link(&sup->kobj, buf);
err_con_dev:
sysfs_remove_link(&link->link_dev.kobj, "consumer");
@@ -506,7 +508,9 @@ static void devlink_remove_symlinks(struct device *dev,
sysfs_remove_link(&link->link_dev.kobj, "consumer");
sysfs_remove_link(&link->link_dev.kobj, "supplier");
- len = max(strlen(dev_name(sup)), strlen(dev_name(con)));
+ len = max(strlen(dev_bus_name(sup)) + strlen(dev_name(sup)),
+ strlen(dev_bus_name(con)) + strlen(dev_name(con)));
+ len += strlen(":");
len += strlen("supplier:") + 1;
buf = kzalloc(len, GFP_KERNEL);
if (!buf) {
@@ -514,9 +518,9 @@ static void devlink_remove_symlinks(struct device *dev,
return;
}
- snprintf(buf, len, "supplier:%s", dev_name(sup));
+ snprintf(buf, len, "supplier:%s:%s", dev_bus_name(sup), dev_name(sup));
sysfs_remove_link(&con->kobj, buf);
- snprintf(buf, len, "consumer:%s", dev_name(con));
+ snprintf(buf, len, "consumer:%s:%s", dev_bus_name(con), dev_name(con));
sysfs_remove_link(&sup->kobj, buf);
kfree(buf);
}
@@ -737,8 +741,9 @@ struct device_link *device_link_add(struct device *consumer,
link->link_dev.class = &devlink_class;
device_set_pm_not_required(&link->link_dev);
- dev_set_name(&link->link_dev, "%s--%s",
- dev_name(supplier), dev_name(consumer));
+ dev_set_name(&link->link_dev, "%s:%s--%s:%s",
+ dev_bus_name(supplier), dev_name(supplier),
+ dev_bus_name(consumer), dev_name(consumer));
if (device_register(&link->link_dev)) {
put_device(consumer);
put_device(supplier);
@@ -1808,9 +1813,7 @@ const char *dev_driver_string(const struct device *dev)
* never change once they are set, so they don't need special care.
*/
drv = READ_ONCE(dev->driver);
- return drv ? drv->name :
- (dev->bus ? dev->bus->name :
- (dev->class ? dev->class->name : ""));
+ return drv ? drv->name : dev_bus_name(dev);
}
EXPORT_SYMBOL(dev_driver_string);
diff --git a/include/linux/device.h b/include/linux/device.h
index 89bb8b84173e..1779f90eeb4c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -609,6 +609,18 @@ static inline const char *dev_name(const struct device *dev)
return kobject_name(&dev->kobj);
}
+/**
+ * dev_bus_name - Return a device's bus/class name, if at all possible
+ * @dev: struct device to get the bus/class name of
+ *
+ * Will return the name of the bus/class the device is attached to. If it is
+ * not attached to a bus/class, an empty string will be returned.
+ */
+static inline const char *dev_bus_name(const struct device *dev)
+{
+ return dev->bus ? dev->bus->name : (dev->class ? dev->class->name : "");
+}
+
__printf(2, 3) int dev_set_name(struct device *dev, const char *name, ...);
#ifdef CONFIG_NUMA
--
2.30.0.284.gd98b1dd5eaa7-goog