This is the start of the stable review cycle for the 5.10.1 release.
There are 2 patches in this series, all of which will be posted as
responses to this one. If anyone has any issues with these being
applied, please let me know.
Responses should be made by Monday, 14 Dec 2020 18:04:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.1-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "dm raid: fix discard limits for raid1 and raid10"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "md: change mddev 'chunk_sectors' from int to unsigned"
-------------
Diffstat:
Makefile | 4 ++--
drivers/md/dm-raid.c | 12 +++++-------
drivers/md/md.h | 4 ++--
3 files changed, 9 insertions(+), 11 deletions(-)
Xattr code using inodes with large xattr data can end up dropping the
last inode reference (and thus deleting the inode) from places like
ext4_xattr_set_entry(). That function is called with a transaction
already started, and so ext4_evict_inode() can deadlock against fs
freezing like this:
CPU1                                    CPU2
removexattr()                           freeze_super()
  vfs_removexattr()
    ext4_xattr_set()
      handle = ext4_journal_start()
      ...
      ext4_xattr_set_entry()
        iput(old_ea_inode)
          ext4_evict_inode(old_ea_inode)
                                        sb->s_writers.frozen =
                                                SB_FREEZE_FS;
                                        sb_wait_write(sb, SB_FREEZE_FS);
                                        ext4_freeze()
                                          jbd2_journal_lock_updates()
                                          -> blocks waiting for all
                                             handles to stop
            sb_start_intwrite()
            -> blocks as sb is already
               in SB_FREEZE_FS state
Generally it is advisable to delete inodes from a separate transaction,
as deletion can consume quite a few credits; however, in this case that
would be quite clumsy, and furthermore the credits for inode deletion
are limited and already accounted for. So just tweak ext4_evict_inode()
to skip freeze protection when a transaction is already running, since
the protection is not needed then anyway.
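In outline, the change guards the freeze protection as follows (a
condensed sketch of the diff below, with the truncation path elided):

	bool freeze_protected = false;

	/*
	 * Take freeze protection only when iput() was called outside a
	 * transaction; inside one we are already protected against
	 * freezing, and calling sb_start_intwrite() here could deadlock
	 * against freeze_super() as shown above.
	 */
	if (!ext4_journal_current_handle()) {
		sb_start_intwrite(inode->i_sb);
		freeze_protected = true;
	}

	/* ... truncate and free the inode ... */

	if (freeze_protected)
		sb_end_intwrite(inode->i_sb);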
CC: stable(a)vger.kernel.org
Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
CC: Tahsin Erdogan <tahsin(a)google.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/inode.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 72534319fae5..777eb08b29cd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -175,6 +175,7 @@ void ext4_evict_inode(struct inode *inode)
*/
int extra_credits = 6;
struct ext4_xattr_inode_array *ea_inode_array = NULL;
+ bool freeze_protected = false;
trace_ext4_evict_inode(inode);
@@ -232,9 +233,14 @@ void ext4_evict_inode(struct inode *inode)
/*
* Protect us against freezing - iput() caller didn't have to have any
- * protection against it
+ * protection against it. When we are in a running transaction though,
+ * we are already protected against freezing and we cannot grab further
+ * protection due to lock ordering constraints.
*/
- sb_start_intwrite(inode->i_sb);
+ if (!ext4_journal_current_handle()) {
+ sb_start_intwrite(inode->i_sb);
+ freeze_protected = true;
+ }
if (!IS_NOQUOTA(inode))
extra_credits += EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb);
@@ -253,7 +259,8 @@ void ext4_evict_inode(struct inode *inode)
* cleaned up.
*/
ext4_orphan_del(NULL, inode);
- sb_end_intwrite(inode->i_sb);
+ if (freeze_protected)
+ sb_end_intwrite(inode->i_sb);
goto no_delete;
}
@@ -294,7 +301,8 @@ void ext4_evict_inode(struct inode *inode)
stop_handle:
ext4_journal_stop(handle);
ext4_orphan_del(NULL, inode);
- sb_end_intwrite(inode->i_sb);
+ if (freeze_protected)
+ sb_end_intwrite(inode->i_sb);
ext4_xattr_inode_array_free(ea_inode_array);
goto no_delete;
}
@@ -323,7 +331,8 @@ void ext4_evict_inode(struct inode *inode)
else
ext4_free_inode(handle, inode);
ext4_journal_stop(handle);
- sb_end_intwrite(inode->i_sb);
+ if (freeze_protected)
+ sb_end_intwrite(inode->i_sb);
ext4_xattr_inode_array_free(ea_inode_array);
return;
no_delete:
--
2.16.4
The patch titled
Subject: z3fold: simplify freeing slots
has been removed from the -mm tree. Its filename was
z3fold-simplify-freeing-slots.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vitaly Wool <vitaly.wool(a)konsulko.com>
Subject: z3fold: simplify freeing slots
Patch series "z3fold: stability / rt fixes".
This series addresses z3fold stability issues under stress load,
primarily in the reclaim and free paths. It also fixes locking problems
that were only seen with a real-time kernel configuration.
This patch (of 3):
There used to be two places in the code where slots could be freed,
namely when freeing the last allocated handle from the slots and when
releasing the z3fold header these slots are linked to. The logic to
decide whether to free certain slots was complicated and error prone in
both functions, and it led to failures in the RT case.
To fix that, make free_handle() the single point of freeing slots.
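Condensed from the hunks below, the consolidated function ends up with
roughly this shape (a sketch reconstructed from the diff, not the
verbatim kernel source):

	static inline void free_handle(unsigned long handle,
				       struct z3fold_header *zhdr)
	{
		struct z3fold_buddy_slots *slots = handle_to_slots(handle);
		bool is_free = true;
		int i;

		write_lock(&slots->lock);
		*(unsigned long *)handle = 0;
		if (zhdr->slots != slots)
			zhdr->foreign_handles--; /* freeing a foreign handle */

		/* Free the slots array only once no slot is in use. */
		for (i = 0; i <= BUDDY_MASK; i++) {
			if (slots->slot[i]) {
				is_free = false;
				break;
			}
		}
		write_unlock(&slots->lock);

		if (is_free) {
			struct z3fold_pool *pool = slots_to_pool(slots);

			if (zhdr->slots == slots)
				zhdr->slots = NULL; /* re-allocated on next use */
			kmem_cache_free(pool->c_handle, slots);
		}
	}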
Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com
Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool <vitaly.wool(a)konsulko.com>
Tested-by: Mike Galbraith <efault(a)gmx.de>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 55 +++++++++++---------------------------------------
1 file changed, 13 insertions(+), 42 deletions(-)
--- a/mm/z3fold.c~z3fold-simplify-freeing-slots
+++ a/mm/z3fold.c
@@ -90,7 +90,7 @@ struct z3fold_buddy_slots {
* be enough slots to hold all possible variants
*/
unsigned long slot[BUDDY_MASK + 1];
- unsigned long pool; /* back link + flags */
+ unsigned long pool; /* back link */
rwlock_t lock;
};
#define HANDLE_FLAG_MASK (0x03)
@@ -182,13 +182,6 @@ enum z3fold_page_flags {
};
/*
- * handle flags, go under HANDLE_FLAG_MASK
- */
-enum z3fold_handle_flags {
- HANDLES_ORPHANED = 0,
-};
-
-/*
* Forward declarations
*/
static struct z3fold_header *__z3fold_alloc(struct z3fold_pool *, size_t, bool);
@@ -303,10 +296,9 @@ static inline void put_z3fold_header(str
z3fold_page_unlock(zhdr);
}
-static inline void free_handle(unsigned long handle)
+static inline void free_handle(unsigned long handle, struct z3fold_header *zhdr)
{
struct z3fold_buddy_slots *slots;
- struct z3fold_header *zhdr;
int i;
bool is_free;
@@ -316,22 +308,13 @@ static inline void free_handle(unsigned
if (WARN_ON(*(unsigned long *)handle == 0))
return;
- zhdr = handle_to_z3fold_header(handle);
slots = handle_to_slots(handle);
write_lock(&slots->lock);
*(unsigned long *)handle = 0;
- if (zhdr->slots == slots) {
- write_unlock(&slots->lock);
- return; /* simple case, nothing else to do */
- }
+ if (zhdr->slots != slots)
+ zhdr->foreign_handles--;
- /* we are freeing a foreign handle if we are here */
- zhdr->foreign_handles--;
is_free = true;
- if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
- write_unlock(&slots->lock);
- return;
- }
for (i = 0; i <= BUDDY_MASK; i++) {
if (slots->slot[i]) {
is_free = false;
@@ -343,6 +326,8 @@ static inline void free_handle(unsigned
if (is_free) {
struct z3fold_pool *pool = slots_to_pool(slots);
+ if (zhdr->slots == slots)
+ zhdr->slots = NULL;
kmem_cache_free(pool->c_handle, slots);
}
}
@@ -525,8 +510,6 @@ static void __release_z3fold_page(struct
{
struct page *page = virt_to_page(zhdr);
struct z3fold_pool *pool = zhdr_to_pool(zhdr);
- bool is_free = true;
- int i;
WARN_ON(!list_empty(&zhdr->buddy));
set_bit(PAGE_STALE, &page->private);
@@ -536,21 +519,6 @@ static void __release_z3fold_page(struct
list_del_init(&page->lru);
spin_unlock(&pool->lock);
- /* If there are no foreign handles, free the handles array */
- read_lock(&zhdr->slots->lock);
- for (i = 0; i <= BUDDY_MASK; i++) {
- if (zhdr->slots->slot[i]) {
- is_free = false;
- break;
- }
- }
- if (!is_free)
- set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
- read_unlock(&zhdr->slots->lock);
-
- if (is_free)
- kmem_cache_free(pool->c_handle, zhdr->slots);
-
if (locked)
z3fold_page_unlock(zhdr);
@@ -973,6 +941,9 @@ lookup:
}
}
+ if (zhdr && !zhdr->slots)
+ zhdr->slots = alloc_slots(pool,
+ can_sleep ? GFP_NOIO : GFP_ATOMIC);
return zhdr;
}
@@ -1270,7 +1241,7 @@ static void z3fold_free(struct z3fold_po
}
if (!page_claimed)
- free_handle(handle);
+ free_handle(handle, zhdr);
if (kref_put(&zhdr->refcount, release_z3fold_page_locked_list)) {
atomic64_dec(&pool->pages_nr);
return;
@@ -1429,19 +1400,19 @@ static int z3fold_reclaim_page(struct z3
ret = pool->ops->evict(pool, middle_handle);
if (ret)
goto next;
- free_handle(middle_handle);
+ free_handle(middle_handle, zhdr);
}
if (first_handle) {
ret = pool->ops->evict(pool, first_handle);
if (ret)
goto next;
- free_handle(first_handle);
+ free_handle(first_handle, zhdr);
}
if (last_handle) {
ret = pool->ops->evict(pool, last_handle);
if (ret)
goto next;
- free_handle(last_handle);
+ free_handle(last_handle, zhdr);
}
next:
if (test_bit(PAGE_HEADLESS, &page->private)) {
_
syzbot reported the deadlock here [1]. The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation. It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed. So, we unmap any associated pages in children so that they
can be used by the faulting task that owns the reservation.
The unmapping code needs to hold i_mmap_rwsem in write mode. However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called. Technically, i_mmap_rwsem does not
need to be held in read mode for COW mappings, as they cannot share
PMDs. Modifying the fault code to not take i_mmap_rwsem in read mode
for COW (and other non-sharable) mappings is too involved for a stable
fix. Instead, simply drop the hugetlb_fault_mutex and i_mmap_rwsem
before unmapping. This is OK since, as noted, the semaphore is not
actually needed for COW mappings. Both locks are reacquired after
unmapping, as the calling code expects. Since this happens in an
uncommon error path, the overhead of dropping and reacquiring the
locks is acceptable.
While making changes, remove redundant BUG_ON after unmap_ref_private.
[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com
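In outline, the error path now performs the following lock dance (a
condensed sketch of the diff below; the identifiers are the ones used
in hugetlb_cow()):

	struct address_space *mapping = vma->vm_file->f_mapping;
	pgoff_t idx = vma_hugecache_offset(h, vma, haddr);
	u32 hash = hugetlb_fault_mutex_hash(mapping, idx);

	/*
	 * Drop both locks: unmap_ref_private() needs i_mmap_rwsem in
	 * write mode, and dropping the read hold here is safe because
	 * COW mappings do not participate in PMD sharing.
	 */
	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
	i_mmap_unlock_read(mapping);

	unmap_ref_private(mm, vma, old_page, haddr);

	/* Reacquire both, in the order the calling code expects. */
	i_mmap_lock_read(mapping);
	mutex_lock(&hugetlb_fault_mutex_table[hash]);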
Reported-by: syzbot+5eee4145df3c15e96625(a)syzkaller.appspotmail.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/hugetlb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d029d938d26d..8713f8ef0f4c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4106,10 +4106,30 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
* may get SIGKILLed if it later faults.
*/
if (outside_reserve) {
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ pgoff_t idx;
+ u32 hash;
+
put_page(old_page);
BUG_ON(huge_pte_none(pte));
+ /*
+ * Drop hugetlb_fault_mutex and i_mmap_rwsem before
+ * unmapping. unmapping needs to hold i_mmap_rwsem
+ * in write mode. Dropping i_mmap_rwsem in read mode
+ * here is OK as COW mappings do not interact with
+ * PMD sharing.
+ *
+ * Reacquire both after unmap operation.
+ */
+ idx = vma_hugecache_offset(h, vma, haddr);
+ hash = hugetlb_fault_mutex_hash(mapping, idx);
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(vma->vm_file->f_mapping);
+
unmap_ref_private(mm, vma, old_page, haddr);
- BUG_ON(huge_pte_none(pte));
+
+ i_mmap_lock_read(vma->vm_file->f_mapping);
+ mutex_lock(&hugetlb_fault_mutex_table[hash]);
spin_lock(ptl);
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (likely(ptep &&
--
2.29.2
When inserting a VMA, we restrict the placement to the low 4G unless
the caller opts into using the full range. This was done to give
userspace the opportunity to transition slowly from a 32b address
space, and to avoid breaking inherent 32b assumptions of some commands.
However, for insert we limited ourselves to 4G-4K, while on
verification we allowed the full 4G. This causes some attempts to bind
a new buffer to sporadically fail with -ENOSPC, while at other times
the same binding succeeds.
Commit 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1
page") suggests that there is a genuine problem with stateless
addressing that cannot utilize the last page in 4G, and so we
purposefully exclude it.
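To see the boundary behaviour concretely, here is a standalone
arithmetic check of the old and new forms of the test (a userspace
sketch, not driver code; a non-zero result means the object is treated
as misplaced and must be rebound):

#include <stdint.h>
#include <stdio.h>

/* Old form: only rejects objects whose last byte crosses 4G, so an
 * object may still occupy the final 4K page below the 4G boundary. */
static int misplaced_old(uint64_t start, uint64_t size)
{
	return ((start + size - 1) >> 32) != 0;
}

/* New form: rejects anything ending above 4G - 4K, matching the
 * PIN_ZONE_4G insert limit. */
static int misplaced_new(uint64_t start, uint64_t size)
{
	return ((start + size + 4095) >> 32) != 0;
}

int main(void)
{
	const uint64_t four_gb = 1ull << 32;

	/* Ends exactly at 4G: insert never places an object there, yet
	 * the old verification accepted it -> sporadic -ENOSPC. */
	printf("end at 4G:      old=%d new=%d\n",
	       misplaced_old(0, four_gb), misplaced_new(0, four_gb));

	/* Ends at 4G - 4K: accepted by both the old and new checks. */
	printf("end at 4G - 4K: old=%d new=%d\n",
	       misplaced_old(0, four_gb - 4096),
	       misplaced_new(0, four_gb - 4096));
	return 0;
}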
Reported-by: CQ Tang <cq.tang(a)intel.com>
Fixes: 48ea1e32c39d ("drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: CQ Tang <cq.tang(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 193996144c84..2ff32daa50bd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -382,7 +382,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
return true;
if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) &&
- (vma->node.start + vma->node.size - 1) >> 32)
+ (vma->node.start + vma->node.size + 4095) >> 32)
return true;
if (flags & __EXEC_OBJECT_NEEDS_MAP &&
--
2.20.1