A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
Matthew Wilcox (1):
  mm/filemap.c: fix NULL pointer in page_cache_tree_insert()

Michal Hocko (1):
  mm: allow GFP_{FS,IO} for page_cache_read page cache allocation

Theodore Ts'o (2):
  ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
  ext4: don't update checksum of new initialized bitmaps

wangguang (1):
  ext4: bugfix for mmaped pages in mpage_release_unused_pages()
 fs/ext4/balloc.c   |  3 +--
 fs/ext4/ialloc.c   | 43 +++---------------------------
 fs/ext4/inline.c   | 66 +++++++++++++++++++++-------------------
 fs/ext4/inode.c    |  2 ++
 fs/ext4/xattr.c    | 30 +++++++++------------
 fs/ext4/xattr.h    | 32 ++++++++++++++++++++++
 include/linux/mm.h |  4 +++
 mm/filemap.c       | 12 ++++-----
 mm/memory.c        | 17 ++++++++++++
 9 files changed, 107 insertions(+), 102 deletions(-)
From: Theodore Ts'o tytso@mit.edu
commit c755e251357a0cee0679081f08c3f4ba797a8009 upstream.
The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by 2e81a4eeedca.
The deadlock can be reproduced via:
dmesg -n 7
mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
mkdir /vdc/a
umount /vdc
mount -t ext4 /dev/vdc /vdc
echo foo > /vdc/a/foo
and looks like this:
[ 11.158815]
[ 11.160276] =============================================
[ 11.161960] [ INFO: possible recursive locking detected ]
[ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W
[ 11.161960] ---------------------------------------------
[ 11.161960] bash/2519 is trying to acquire lock:
[ 11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]
[ 11.161960] but task is already holding lock:
[ 11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] other info that might help us debug this:
[ 11.161960]  Possible unsafe locking scenario:
[ 11.161960]
[ 11.161960]        CPU0
[ 11.161960]        ----
[ 11.161960]   lock(&ei->xattr_sem);
[ 11.161960]   lock(&ei->xattr_sem);
[ 11.161960]
[ 11.161960]  *** DEADLOCK ***
[ 11.161960]
[ 11.161960]  May be due to missing lock nesting notation
[ 11.161960]
[ 11.161960] 4 locks held by bash/2519:
[ 11.161960]  #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
[ 11.161960]  #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
[ 11.161960]  #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
[ 11.161960]  #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] stack backtrace:
[ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160
[ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[ 11.161960] Call Trace:
[ 11.161960]  dump_stack+0x72/0xa3
[ 11.161960]  __lock_acquire+0xb7c/0xcb9
[ 11.161960]  ? kvm_clock_read+0x1f/0x29
[ 11.161960]  ? __lock_is_held+0x36/0x66
[ 11.161960]  ? __lock_is_held+0x36/0x66
[ 11.161960]  lock_acquire+0x106/0x18a
[ 11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]  down_write+0x39/0x72
[ 11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]  ? _raw_read_unlock+0x22/0x2c
[ 11.161960]  ? jbd2_journal_extend+0x1e2/0x262
[ 11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
[ 11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
[ 11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960]  ext4_try_add_inline_entry+0x69/0x152
[ 11.161960]  ext4_add_entry+0xa3/0x848
[ 11.161960]  ? __brelse+0x14/0x2f
[ 11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
[ 11.161960]  ext4_add_nondir+0x17/0x5b
[ 11.161960]  ext4_create+0xcf/0x133
[ 11.161960]  ? ext4_mknod+0x12f/0x12f
[ 11.161960]  lookup_open+0x39e/0x3fb
[ 11.161960]  ? __wake_up+0x1a/0x40
[ 11.161960]  ? lock_acquire+0x11e/0x18a
[ 11.161960]  path_openat+0x35c/0x67a
[ 11.161960]  ? sched_clock_cpu+0xd7/0xf2
[ 11.161960]  do_filp_open+0x36/0x7c
[ 11.161960]  ? _raw_spin_unlock+0x22/0x2c
[ 11.161960]  ? __alloc_fd+0x169/0x173
[ 11.161960]  do_sys_open+0x59/0xcc
[ 11.161960]  SyS_open+0x1d/0x1f
[ 11.161960]  do_int80_syscall_32+0x4f/0x61
[ 11.161960]  entry_INT80_32+0x2f/0x2f
[ 11.161960] EIP: 0xb76ad469
[ 11.161960] EFLAGS: 00000286 CPU: 0
[ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4eeedca as a prereq)
Reported-by: George Spelvin linux@sciencehorizons.net
Signed-off-by: Theodore Ts'o tytso@mit.edu
Signed-off-by: Harsh Shandilya harsh@prjkt.io
---
 fs/ext4/inline.c | 66 ++++++++++++++++++++++--------------------
 fs/ext4/xattr.c  | 30 +++++++++-------------
 fs/ext4/xattr.h  | 32 +++++++++++++++++++++++
 3 files changed, 74 insertions(+), 54 deletions(-)
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 8fc2357c6867..5070616e6247 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -374,7 +374,7 @@ out: static int ext4_prepare_inline_data(handle_t *handle, struct inode *inode, unsigned int len) { - int ret, size; + int ret, size, no_expand; struct ext4_inode_info *ei = EXT4_I(inode);
if (!ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) @@ -384,15 +384,14 @@ static int ext4_prepare_inline_data(handle_t *handle, struct inode *inode, if (size < len) return -ENOSPC;
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand);
if (ei->i_inline_off) ret = ext4_update_inline_data(handle, inode, len); else ret = ext4_create_inline_data(handle, inode, len);
- up_write(&EXT4_I(inode)->xattr_sem); - + ext4_write_unlock_xattr(inode, &no_expand); return ret; }
@@ -522,7 +521,7 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping, struct inode *inode, unsigned flags) { - int ret, needed_blocks; + int ret, needed_blocks, no_expand; handle_t *handle = NULL; int retries = 0, sem_held = 0; struct page *page = NULL; @@ -562,7 +561,7 @@ retry: goto out; }
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand); sem_held = 1; /* If some one has already done this for us, just exit. */ if (!ext4_has_inline_data(inode)) { @@ -598,7 +597,7 @@ retry: page_cache_release(page); page = NULL; ext4_orphan_add(handle, inode); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); sem_held = 0; ext4_journal_stop(handle); handle = NULL; @@ -624,7 +623,7 @@ out: page_cache_release(page); } if (sem_held) - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); if (handle) ext4_journal_stop(handle); brelse(iloc.bh); @@ -717,7 +716,7 @@ convert: int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied, struct page *page) { - int ret; + int ret, no_expand; void *kaddr; struct ext4_iloc iloc;
@@ -735,7 +734,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, goto out; }
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand); BUG_ON(!ext4_has_inline_data(inode));
kaddr = kmap_atomic(page); @@ -745,7 +744,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, /* clear page dirty so that writepages wouldn't work for us. */ ClearPageDirty(page);
- up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); brelse(iloc.bh); out: return copied; @@ -756,7 +755,7 @@ ext4_journalled_write_inline_data(struct inode *inode, unsigned len, struct page *page) { - int ret; + int ret, no_expand; void *kaddr; struct ext4_iloc iloc;
@@ -766,11 +765,11 @@ ext4_journalled_write_inline_data(struct inode *inode, return NULL; }
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand); kaddr = kmap_atomic(page); ext4_write_inline_data(inode, &iloc, kaddr, 0, len); kunmap_atomic(kaddr); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand);
return iloc.bh; } @@ -1245,7 +1244,7 @@ out: int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry, struct inode *inode) { - int ret, inline_size; + int ret, inline_size, no_expand; void *inline_start; struct ext4_iloc iloc; struct inode *dir = dentry->d_parent->d_inode; @@ -1254,7 +1253,7 @@ int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry, if (ret) return ret;
- down_write(&EXT4_I(dir)->xattr_sem); + ext4_write_lock_xattr(dir, &no_expand); if (!ext4_has_inline_data(dir)) goto out;
@@ -1299,7 +1298,7 @@ int ext4_try_add_inline_entry(handle_t *handle, struct dentry *dentry,
out: ext4_mark_inode_dirty(handle, dir); - up_write(&EXT4_I(dir)->xattr_sem); + ext4_write_unlock_xattr(dir, &no_expand); brelse(iloc.bh); return ret; } @@ -1655,7 +1654,7 @@ int ext4_delete_inline_entry(handle_t *handle, struct buffer_head *bh, int *has_inline_data) { - int err, inline_size; + int err, inline_size, no_expand; struct ext4_iloc iloc; void *inline_start;
@@ -1663,7 +1662,7 @@ int ext4_delete_inline_entry(handle_t *handle, if (err) return err;
- down_write(&EXT4_I(dir)->xattr_sem); + ext4_write_lock_xattr(dir, &no_expand); if (!ext4_has_inline_data(dir)) { *has_inline_data = 0; goto out; @@ -1698,7 +1697,7 @@ int ext4_delete_inline_entry(handle_t *handle,
ext4_show_inline_dir(dir, iloc.bh, inline_start, inline_size); out: - up_write(&EXT4_I(dir)->xattr_sem); + ext4_write_unlock_xattr(dir, &no_expand); brelse(iloc.bh); if (err != -ENOENT) ext4_std_error(dir->i_sb, err); @@ -1797,11 +1796,11 @@ out:
int ext4_destroy_inline_data(handle_t *handle, struct inode *inode) { - int ret; + int ret, no_expand;
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand); ret = ext4_destroy_inline_data_nolock(handle, inode); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand);
return ret; } @@ -1879,7 +1878,7 @@ out: void ext4_inline_data_truncate(struct inode *inode, int *has_inline) { handle_t *handle; - int inline_size, value_len, needed_blocks; + int inline_size, value_len, needed_blocks, no_expand; size_t i_size; void *value = NULL; struct ext4_xattr_ibody_find is = { @@ -1896,7 +1895,7 @@ void ext4_inline_data_truncate(struct inode *inode, int *has_inline) if (IS_ERR(handle)) return;
- down_write(&EXT4_I(inode)->xattr_sem); + ext4_write_lock_xattr(inode, &no_expand); if (!ext4_has_inline_data(inode)) { *has_inline = 0; ext4_journal_stop(handle); @@ -1954,7 +1953,7 @@ out_error: up_write(&EXT4_I(inode)->i_data_sem); out: brelse(is.iloc.bh); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); kfree(value); if (inode->i_nlink) ext4_orphan_del(handle, inode); @@ -1970,7 +1969,7 @@ out:
int ext4_convert_inline_data(struct inode *inode) { - int error, needed_blocks; + int error, needed_blocks, no_expand; handle_t *handle; struct ext4_iloc iloc;
@@ -1992,15 +1991,10 @@ int ext4_convert_inline_data(struct inode *inode) goto out_free; }
- down_write(&EXT4_I(inode)->xattr_sem); - if (!ext4_has_inline_data(inode)) { - up_write(&EXT4_I(inode)->xattr_sem); - goto out; - } - - error = ext4_convert_inline_data_nolock(handle, inode, &iloc); - up_write(&EXT4_I(inode)->xattr_sem); -out: + ext4_write_lock_xattr(inode, &no_expand); + if (ext4_has_inline_data(inode)) + error = ext4_convert_inline_data_nolock(handle, inode, &iloc); + ext4_write_unlock_xattr(inode, &no_expand); ext4_journal_stop(handle); out_free: brelse(iloc.bh); diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index c57c83806fb9..c8d782bf8c5c 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1120,16 +1120,14 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index, struct ext4_xattr_block_find bs = { .s = { .not_found = -ENODATA, }, }; - unsigned long no_expand; + int no_expand; int error;
if (!name) return -EINVAL; if (strlen(name) > 255) return -ERANGE; - down_write(&EXT4_I(inode)->xattr_sem); - no_expand = ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND); - ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND); + ext4_write_lock_xattr(inode, &no_expand);
error = ext4_reserve_inode_write(handle, inode, &is.iloc); if (error) @@ -1190,7 +1188,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index, ext4_xattr_update_super_block(handle, inode->i_sb); inode->i_ctime = ext4_current_time(inode); if (!value) - ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND); + no_expand = 0; error = ext4_mark_iloc_dirty(handle, inode, &is.iloc); /* * The bh is consumed by ext4_mark_iloc_dirty, even with @@ -1204,9 +1202,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index, cleanup: brelse(is.iloc.bh); brelse(bs.bh); - if (no_expand == 0) - ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); return error; }
@@ -1289,12 +1285,11 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, void *base, *start, *end; int extra_isize = 0, error = 0, tried_min_extra_isize = 0; int s_min_extra_isize = le16_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_min_extra_isize); + int no_expand; + + if (ext4_write_trylock_xattr(inode, &no_expand) == 0) + return 0;
- down_write(&EXT4_I(inode)->xattr_sem); - /* - * Set EXT4_STATE_NO_EXPAND to avoid recursion when marking inode dirty - */ - ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND); retry: if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) goto out; @@ -1487,8 +1482,7 @@ retry: } brelse(bh); out: - ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND); - up_write(&EXT4_I(inode)->xattr_sem); + ext4_write_unlock_xattr(inode, &no_expand); return 0;
cleanup: @@ -1500,10 +1494,10 @@ cleanup: kfree(bs); brelse(bh); /* - * We deliberately leave EXT4_STATE_NO_EXPAND set here since inode - * size expansion failed. + * Inode size expansion failed; don't try again */ - up_write(&EXT4_I(inode)->xattr_sem); + no_expand = 1; + ext4_write_unlock_xattr(inode, &no_expand); return error; }
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index 29bedf5589f6..2e8f23e78bc2 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -98,6 +98,38 @@ extern const struct xattr_handler ext4_xattr_user_handler; extern const struct xattr_handler ext4_xattr_trusted_handler; extern const struct xattr_handler ext4_xattr_security_handler;
+/* + * The EXT4_STATE_NO_EXPAND is overloaded and used for two purposes. + * The first is to signal that there the inline xattrs and data are + * taking up so much space that we might as well not keep trying to + * expand it. The second is that xattr_sem is taken for writing, so + * we shouldn't try to recurse into the inode expansion. For this + * second case, we need to make sure that we take save and restore the + * NO_EXPAND state flag appropriately. + */ +static inline void ext4_write_lock_xattr(struct inode *inode, int *save) +{ + down_write(&EXT4_I(inode)->xattr_sem); + *save = ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND); + ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND); +} + +static inline int ext4_write_trylock_xattr(struct inode *inode, int *save) +{ + if (down_write_trylock(&EXT4_I(inode)->xattr_sem) == 0) + return 0; + *save = ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND); + ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND); + return 1; +} + +static inline void ext4_write_unlock_xattr(struct inode *inode, int *save) +{ + if (*save == 0) + ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND); + up_write(&EXT4_I(inode)->xattr_sem); +} + extern ssize_t ext4_listxattr(struct dentry *, char *, size_t);
extern int ext4_xattr_get(struct inode *, int, const char *, void *, size_t);
From: wangguang wang.guang55@zte.com.cn
Commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream.
Pages have their buffers cleared after ext4 delayed block allocation fails; however, their pte_dirty flags are not cleaned. When such pages are later unmapped, unmap_page_range sees the dirty pte and may call __set_page_dirty,
which can lead to the BUG_ON in mpage_prepare_extent_to_map: head = page_buffers(page);.
This patch simply calls clear_page_dirty_for_io in mpage_release_unused_pages to clean the pte_dirty flag for mmapped pages.
Steps to reproduce the bug:
(1) mmap a file in ext4
    addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    memset(addr, 'i', 4096);
(2) return EIO at
ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent
which causes this log message to be printed:
    ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for "
             "inode %lu at logical offset %llu with"
             " max blocks %u with error %d",
             inode->i_ino,
             (unsigned long long)map->m_lblk,
             (unsigned)map->m_len, -err);
(3) Unmapping the addr causes a warning at
__set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page));
(4) Wait for a minute, then the BUG_ON triggers.
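For reference, a minimal userspace sketch of steps (1) and (3) could look like the following; this is illustrative only (the file path is made up, and the EIO of step (2) still has to be injected on the ext4 side):

/* Illustrative reproducer for steps (1) and (3) only; assumes an ext4
 * mount and that the delayed-allocation EIO of step (2) is injected
 * separately on the kernel side. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/ext4/testfile", O_RDWR | O_CREAT, 0644);
	char *addr;

	if (fd < 0 || ftruncate(fd, 4096) < 0)
		return 1;
	addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		return 1;
	memset(addr, 'i', 4096);	/* step (1): dirty the mmapped page */
	munmap(addr, 4096);		/* step (3): unmap triggers the warning */
	close(fd);
	return 0;
}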
Cc: stable@vger.kernel.org
Signed-off-by: wangguang wangguang03@zte.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
[@nathanchance: Resolved conflict from lack of 09cbfeaf1a5a6]
Signed-off-by: Nathan Chancellor natechancellor@gmail.com
Signed-off-by: Harsh Shandilya harsh@prjkt.io
---
 fs/ext4/inode.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 859af265ae1b..e7f75942aea5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1338,6 +1338,8 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd,
 			BUG_ON(!PageLocked(page));
 			BUG_ON(PageWriteback(page));
 			if (invalidate) {
+				if (page_mapped(page))
+					clear_page_dirty_for_io(page);
 				block_invalidatepage(page, 0, PAGE_CACHE_SIZE);
 				ClearPageUptodate(page);
 			}
From: Michal Hocko mhocko@suse.com
Commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
page_cache_read has been historically using page_cache_alloc_cold to allocate a new page. This means that mapping_gfp_mask is used as the base for the gfp_mask. Many filesystems are setting this mask to GFP_NOFS to prevent from fs recursion issues. page_cache_read is called from the vm_operations_struct::fault() context during the page fault. This context doesn't need the reclaim protection normally.
ceph and ocfs2 which call filemap_fault from their fault handlers seem to be OK because they are not taking any fs lock before invoking generic implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe from the reclaim recursion POV because this lock serializes truncate and punch hole with the page faults and it doesn't get involved in the reclaim.
There is simply no reason to deliberately use a weaker allocation context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS protection might be even harmful. There is a push to fail GFP_NOFS allocations rather than loop within allocator indefinitely with a very limited reclaim ability. Once we start failing those requests the OOM killer might be triggered prematurely because the page cache allocation failure is propagated up the page fault path and end up in pagefault_out_of_memory.
We cannot play with mapping_gfp_mask directly because that would be racy wrt. parallel page faults and it might interfere with other users who really rely on NOFS semantic from the stored gfp_mask. The mask is also inode proper so it would even be a layering violation. What we can do instead is to push the gfp_mask into struct vm_fault and allow fs layer to overwrite it should the callback need to be called with a different allocation context.
Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO) because this should be safe from the page fault path normally. Why do we care about mapping_gfp_mask at all then? Because this doesn't hold only reclaim protection flags but it also might contain zone and movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have to respect those.
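As an illustration of the mechanism described above (not part of this patch, and the handler name is made up), a filesystem's fault handler could tweak the pre-filled mask before handing the fault to the generic code, roughly like this:

/* Hypothetical sketch: the MM layer has already set vmf->gfp_mask to
 * mapping_gfp_mask() | __GFP_FS | __GFP_IO; a filesystem that cannot
 * tolerate fs recursion on this particular path could mask it again. */
static int example_fs_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	vmf->gfp_mask &= ~__GFP_FS;	/* assumption: this path must stay NOFS */

	return filemap_fault(vma, vmf);
}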
Signed-off-by: Michal Hocko mhocko@suse.com
Reported-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
Acked-by: Jan Kara jack@suse.com
Acked-by: Vlastimil Babka vbabka@suse.cz
Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
Cc: Mel Gorman mgorman@suse.de
Cc: Dave Chinner david@fromorbit.com
Cc: Mark Fasheh mfasheh@suse.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Harsh Shandilya harsh@prjkt.io
---
 include/linux/mm.h |  4 ++++
 mm/filemap.c       |  8 ++++----
 mm/memory.c        | 17 +++++++++++++++++
 3 files changed, 25 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5adffb0a468f..9ac4697979e8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -203,9 +203,13 @@ extern pgprot_t protection_map[16];
  *
  * pgoff should be used in favour of virtual_address, if possible. If pgoff
  * is used, one may implement ->remap_pages to get nonlinear mapping support.
+ *
+ * MM layer fills up gfp_mask for page allocations but fault handler might
+ * alter it if its implementation requires a different allocation context.
  */
 struct vm_fault {
 	unsigned int flags;		/* FAULT_FLAG_xxx flags */
+	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 	pgoff_t pgoff;			/* Logical page offset based on vma */
 	void __user *virtual_address;	/* Faulting virtual address */
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 7e6ab98d4d3c..aafeeefcb00d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1746,18 +1746,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int page_cache_read(struct file *file, pgoff_t offset)
+static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *page;
 	int ret;
 
 	do {
-		page = page_cache_alloc_cold(mapping);
+		page = __page_cache_alloc(gfp_mask|__GFP_COLD);
 		if (!page)
 			return -ENOMEM;
 
-		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
+		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
 		if (ret == 0)
 			ret = mapping->a_ops->readpage(file, page);
 		else if (ret == -EEXIST)
@@ -1940,7 +1940,7 @@ no_cached_page:
 	 * We're only likely to ever get here if MADV_RANDOM is in
 	 * effect.
 	 */
-	error = page_cache_read(file, offset);
+	error = page_cache_read(file, offset, vmf->gfp_mask);
 
 	/*
 	 * The page we want has now been added to the page cache.
diff --git a/mm/memory.c b/mm/memory.c
index 0c4f5e36b155..5a62c6a42143 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1973,6 +1973,20 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 	copy_user_highpage(dst, src, va, vma);
 }
 
+static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
+{
+	struct file *vm_file = vma->vm_file;
+
+	if (vm_file)
+		return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
+
+	/*
+	 * Special mappings (e.g. VDSO) do not have any file so fake
+	 * a default GFP_KERNEL for them.
+	 */
+	return GFP_KERNEL;
+}
+
 /*
  * Notify the address space that the page is about to become writable so that
  * it can prohibit this or wait for the page to get into an appropriate state.
@@ -1988,6 +2002,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
 	vmf.pgoff = page->index;
 	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 	vmf.page = page;
 
 	ret = vma->vm_ops->page_mkwrite(vma, &vmf);
@@ -2670,6 +2685,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
 	vmf.pgoff = pgoff;
 	vmf.flags = flags;
 	vmf.page = NULL;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 
 	ret = vma->vm_ops->fault(vma, &vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
@@ -2834,6 +2850,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 	vmf.pgoff = pgoff;
 	vmf.max_pgoff = max_pgoff;
 	vmf.flags = flags;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 	vma->vm_ops->map_pages(vma, &vmf);
 }
I am not really sure it is a good idea to blindly apply this patch to the older stable tree. Have you checked all filemap_fault handlers? This might have been quite different in 3.18 than for the kernel I was developing this against.
If the sole purpose of this backport is to make other patch (abc1be13fd11 ("mm/filemap.c: fix NULL pointer in page_cache_tree_insert()")) apply easier then I've already suggested how to handle those rejects.
On 23 April 2018 4:16:36 AM IST, Michal Hocko mhocko@kernel.org wrote:
I am not really sure it is a good idea to blindly apply this patch to the older stable tree. Have you checked all filemap_fault handlers? This might have been quite different in 3.18 than for the kernel I was developing this against.
I did, but it's likely that I missed a few instances.
If the sole purpose of this backport is to make other patch (abc1be13fd11 ("mm/filemap.c: fix NULL pointer in page_cache_tree_insert()")) apply easier then I've already suggested how to handle those rejects.
I'll look for the email after this exam and spin up a fixed series, thanks for the heads-up!
From: Matthew Wilcox mawilcox@microsoft.com
Commit abc1be13fd113ddef5e2d807a466286b864caed3 upstream.
f2fs specifies the __GFP_ZERO flag for allocating some of its pages. Unfortunately, the page cache also uses the mapping's GFP flags for allocating radix tree nodes. It always masked off the __GFP_HIGHMEM flag, and masks off __GFP_ZERO in some paths, but not all. That causes radix tree nodes to be allocated with a NULL list_head, which causes backtraces like:
  __list_del_entry+0x30/0xd0
  list_lru_del+0xac/0x1ac
  page_cache_tree_insert+0xd8/0x110
The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through if they are ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
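To make the failure mode a bit more concrete, here is a rough sketch (not literal code from any tree; the helper name is invented) of where the offending flags come from and what masking with GFP_RECLAIM_MASK at the innermost call site achieves:

/*
 * A filesystem may seed its mapping's allocation mask with flags that
 * only make sense for its data pages, e.g. something along the lines of:
 *
 *	mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS | __GFP_ZERO);
 *
 * If that mask is reused unfiltered for radix tree node allocation, the
 * nodes come back zero-filled (NULL list_head) and list_lru_del() blows
 * up later.  Masking keeps only the reclaim-related bits and drops
 * __GFP_ZERO, __GFP_HIGHMEM, __GFP_DMA32 and friends:
 */
static inline gfp_t radix_node_gfp(gfp_t mapping_gfp)
{
	return mapping_gfp & GFP_RECLAIM_MASK;
}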
Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Matthew Wilcox mawilcox@microsoft.com
Reported-by: Chris Fries cfries@google.com
Debugged-by: Minchan Kim minchan@kernel.org
Acked-by: Johannes Weiner hannes@cmpxchg.org
Acked-by: Michal Hocko mhocko@suse.com
Reviewed-by: Jan Kara jack@suse.cz
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Harsh Shandilya harsh@prjkt.io
---
 mm/filemap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index aafeeefcb00d..6b4bd08bebe1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -468,7 +468,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 	VM_BUG_ON_PAGE(!PageLocked(new), new);
 	VM_BUG_ON_PAGE(new->mapping, new);
 
-	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
 	if (!error) {
 		struct address_space *mapping = old->mapping;
 		void (*freepage)(struct page *);
@@ -561,7 +561,7 @@ static int __add_to_page_cache_locked(struct page *page,
 		return error;
 	}
 
-	error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+	error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
 	if (error) {
 		if (!huge)
 			mem_cgroup_cancel_charge(page, memcg);
@@ -1757,7 +1757,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 		if (!page)
 			return -ENOMEM;
 
-		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
+		ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
 		if (ret == 0)
 			ret = mapping->a_ops->readpage(file, page);
 		else if (ret == -EEXIST)
From: Theodore Ts'o tytso@mit.edu
Commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream.
When reading the inode or block allocation bitmap, if the bitmap needs to be initialized, do not update the checksum in the block group descriptor. That's because we're not set up to journal those changes. Instead, just set the verified bit on the bitmap block, so that it's not necessary to validate the checksum.
When a block or inode allocation actually happens, at that point the checksum will be calculated, and update of the bg descriptor block will be properly journalled.
Signed-off-by: Theodore Ts'o tytso@mit.edu
Cc: stable@vger.kernel.org
Signed-off-by: Harsh Shandilya harsh@prjkt.io
---
 fs/ext4/balloc.c |  3 +--
 fs/ext4/ialloc.c | 43 +++----------------------------------------
 2 files changed, 4 insertions(+), 42 deletions(-)
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 3b88f0ca0e82..cbc1c40818f5 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -243,8 +243,6 @@ static int ext4_init_block_bitmap(struct super_block *sb,
 	 */
 	ext4_mark_bitmap_end(num_clusters_in_group(sb, block_group),
 			     sb->s_blocksize * 8, bh->b_data);
-	ext4_block_bitmap_csum_set(sb, block_group, gdp, bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
 
 	return 0;
 }
@@ -458,6 +456,7 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group)
 		err = ext4_init_block_bitmap(sb, bh, block_group, desc);
 		set_bitmap_uptodate(bh);
 		set_buffer_uptodate(bh);
+		set_buffer_verified(bh);
 		ext4_unlock_group(sb, block_group);
 		unlock_buffer(bh);
 		if (err)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index be20e9028fa1..28aaf640745f 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -64,45 +64,6 @@ void ext4_mark_bitmap_end(int start_bit, int end_bit, char *bitmap)
 		memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3);
 }
 
-/* Initializes an uninitialized inode bitmap */
-static unsigned ext4_init_inode_bitmap(struct super_block *sb,
-				       struct buffer_head *bh,
-				       ext4_group_t block_group,
-				       struct ext4_group_desc *gdp)
-{
-	struct ext4_group_info *grp;
-	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	J_ASSERT_BH(bh, buffer_locked(bh));
-
-	/* If checksum is bad mark all blocks and inodes use to prevent
-	 * allocation, essentially implementing a per-group read-only flag. */
-	if (!ext4_group_desc_csum_verify(sb, block_group, gdp)) {
-		ext4_error(sb, "Checksum bad for group %u", block_group);
-		grp = ext4_get_group_info(sb, block_group);
-		if (!EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
-			percpu_counter_sub(&sbi->s_freeclusters_counter,
-					   grp->bb_free);
-		set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, &grp->bb_state);
-		if (!EXT4_MB_GRP_IBITMAP_CORRUPT(grp)) {
-			int count;
-			count = ext4_free_inodes_count(sb, gdp);
-			percpu_counter_sub(&sbi->s_freeinodes_counter,
-					   count);
-		}
-		set_bit(EXT4_GROUP_INFO_IBITMAP_CORRUPT_BIT, &grp->bb_state);
-		return 0;
-	}
-
-	memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
-	ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8,
-			     bh->b_data);
-	ext4_inode_bitmap_csum_set(sb, block_group, gdp, bh,
-				   EXT4_INODES_PER_GROUP(sb) / 8);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
-
-	return EXT4_INODES_PER_GROUP(sb);
-}
-
 void ext4_end_bitmap_read(struct buffer_head *bh, int uptodate)
 {
 	if (uptodate) {
@@ -157,7 +118,9 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 
 	ext4_lock_group(sb, block_group);
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
-		ext4_init_inode_bitmap(sb, bh, block_group, desc);
+		memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
+		ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb),
+				     sb->s_blocksize * 8, bh->b_data);
 		set_bitmap_uptodate(bh);
 		set_buffer_uptodate(bh);
 		set_buffer_verified(bh);
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other two.
thanks,
greg k-h
On 24 April 2018 6:00:05 PM IST, Greg KH greg@kroah.com wrote:
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other two.
thanks,
greg k-h
Ahh well, I missed this email and respun this entire series after incorporating Michal's feedback. You can pick up "mm/filemap.c: fix NULL pointer in page_cache_tree_insert()" from there, the other mm patch is not to be backported per feedback from the patch author.
On 24 April 2018 6:00:05 PM IST, Greg KH greg@kroah.com wrote:
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other two.
thanks,
greg k-h
Can you drop the "[@nathanchance: xxxx]" line from ext4-bugfix-for-mmaped-pages-in-mpage_release_unused_pages.patch file please? Once the patches start hitting GitHub repositories of various people it sends an email notification to Nathan each time, he added it out of force of habit and is already regretting it with 4.4 and I didn't pay attention either when I backported the patch from the 4.4 queue. Sorry for the trouble :(
On Tue, Apr 24, 2018 at 06:41:00PM +0530, Harsh Shandilya wrote:
On 24 April 2018 6:00:05 PM IST, Greg KH greg@kroah.com wrote:
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other two.
thanks,
greg k-h
Can you drop the "[@nathanchance: xxxx]" line from ext4-bugfix-for-mmaped-pages-in-mpage_release_unused_pages.patch file please? Once the patches start hitting GitHub repositories of various people it sends an email notification to Nathan each time, he added it out of force of habit and is already regretting it with 4.4 and I didn't pay attention either when I backported the patch from the 4.4 queue. Sorry for the trouble :(
As you backported his 4.4 patch, I'm loath to drop the signed off by as it shows "providence".
As for github notifications, just go turn them off. It's the only way to deal with that horrid site :(
greg k-h
On 24 April 2018 7:33:05 PM IST, Greg KH greg@kroah.com wrote:
On Tue, Apr 24, 2018 at 06:41:00PM +0530, Harsh Shandilya wrote:
On 24 April 2018 6:00:05 PM IST, Greg KH greg@kroah.com wrote:
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other
two.
thanks,
greg k-h
Can you drop the "[@nathanchance: xxxx]" line from ext4-bugfix-for-mmaped-pages-in-mpage_release_unused_pages.patch file please? Once the patches start hitting GitHub repositories of various people it sends an email notification to Nathan each time, he added
it
out of force of habit and is already regretting it with 4.4 and I didn't pay attention either when I backported the patch from the 4.4 queue. Sorry for the trouble :(
As you backported his 4.4 patch, I'm loath to drop the signed off by as it shows "providence".
Keep the sign-off and drop the line that mentions his GitHub username, that's what I requested for.
As for github notifications, just go turn them off. It's the only way to deal with that horrid site :(
I wish I could haha
On Tue, Apr 24, 2018 at 07:41:42PM +0530, Harsh Shandilya wrote:
On 24 April 2018 7:33:05 PM IST, Greg KH greg@kroah.com wrote:
On Tue, Apr 24, 2018 at 06:41:00PM +0530, Harsh Shandilya wrote:
On 24 April 2018 6:00:05 PM IST, Greg KH greg@kroah.com wrote:
On Mon, Apr 23, 2018 at 01:37:41AM +0530, Harsh Shandilya wrote:
A few more patches that were Cc'd stable but failed to apply to 3.18, backported with the 4.4 queue variants as reference wherever required.
I've applied the ext4 patches, I'll wait for you to redo the other
two.
thanks,
greg k-h
Can you drop the "[@nathanchance: xxxx]" line from ext4-bugfix-for-mmaped-pages-in-mpage_release_unused_pages.patch file please? Once the patches start hitting GitHub repositories of various people it sends an email notification to Nathan each time, he added
it
out of force of habit and is already regretting it with 4.4 and I didn't pay attention either when I backported the patch from the 4.4 queue. Sorry for the trouble :(
As you backported his 4.4 patch, I'm loath to drop the signed off by as it shows "providence".
Keep the sign-off and drop the line that mentions his GitHub username, that's what I requested for.
Ah, got it, now edited, I just dropped the '@' which should be fine.
thanks,
greg k-h