This series backports bugfixes that are already merged in Linux upstream. We hit these issues in our commercial products; they are serious and should be fixed immediately.
Note that it also includes some xarray modifications, since the upcoming patches depend heavily on them; this also reduces conflicts with later backports.
All patches have been tested again as a whole.
Thanks, Gao Xiang
Chen Gong (1):
  staging: erofs: replace BUG_ON with DBG_BUGON in data.c

Gao Xiang (11):
  staging: erofs: fix a bug when applying cache strategy
  staging: erofs: complete error handling of z_erofs_do_read_page
  staging: erofs: drop multiref support temporarily
  staging: erofs: remove the redundant d_rehash() for the root dentry
  staging: erofs: fix race when the managed cache is enabled
  staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
  staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
  staging: erofs: add a full barrier in erofs_workgroup_unfreeze
  staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
  staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
  staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs

 drivers/staging/erofs/data.c          |  31 ++++---
 drivers/staging/erofs/dir.c           |   7 +-
 drivers/staging/erofs/inode.c         |  10 ++-
 drivers/staging/erofs/internal.h      |  71 ++++++++++------
 drivers/staging/erofs/super.c         |  19 ++---
 drivers/staging/erofs/unzip_pagevec.h |   2 +-
 drivers/staging/erofs/unzip_vle.c     |  97 ++++++++--------------
 drivers/staging/erofs/unzip_vle.h     |  12 +--
 drivers/staging/erofs/unzip_vle_lz4.c |   2 +-
 drivers/staging/erofs/utils.c         | 150 +++++++++++++++++++++++-----------
 include/linux/xarray.h                |  48 +++++++++++
 11 files changed, 271 insertions(+), 178 deletions(-)
commit 0734ffbf574ee813b20899caef2fe0ed502bb783 upstream.
As described in Kconfig, the last compressed pack should be cached for further reading for either `EROFS_FS_ZIP_CACHE_UNIPOLAR' or `EROFS_FS_ZIP_CACHE_BIPOLAR' by design.
However, there is a bug in z_erofs_do_read_page: it switches `initial' to `false' at the very beginning, before it decides whether to cache the last compressed pack.
The caching strategy should work properly after applying this patch.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/unzip_vle.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c
index 0346630b67c8..35ae0865c1fd 100644
--- a/drivers/staging/erofs/unzip_vle.c
+++ b/drivers/staging/erofs/unzip_vle.c
@@ -624,7 +624,7 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe,
 		/* go ahead the next map_blocks */
 		debugln("%s: [out-of-range] pos %llu", __func__, offset + cur);
 
-		if (!z_erofs_vle_work_iter_end(builder))
+		if (z_erofs_vle_work_iter_end(builder))
 			fe->initial = false;
 
 		map->m_la = offset + cur;
commit 1e05ff36e6921ca61bdbf779f81a602863569ee3 upstream.
This patch completes the error handling code of z_erofs_do_read_page. PG_error will be set when a read error happens; therefore, z_erofs_onlinepage_endio will unlock the page without setting PG_uptodate.
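To illustrate that mechanism in isolation, here is a userspace toy model; struct page and the two helpers below are simplified stand-ins with assumed semantics, not the kernel implementations:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the PG_error/PG_uptodate interaction described above:
 * endio always unlocks the page, but only marks it uptodate if no
 * error was recorded while it was being processed. */
struct page { bool error, uptodate, locked; };

static void set_page_error(struct page *p)
{
	p->error = true;		/* models SetPageError() */
}

static void onlinepage_endio(struct page *p)
{
	if (!p->error)
		p->uptodate = true;	/* data became valid for readers */
	p->locked = false;		/* unlock in either case */
}

int main(void)
{
	struct page ok = { .locked = true }, bad = { .locked = true };

	set_page_error(&bad);		/* simulate a failed read */
	onlinepage_endio(&ok);
	onlinepage_endio(&bad);
	printf("ok: uptodate=%d, bad: uptodate=%d\n", ok.uptodate, bad.uptodate);
	return 0;
}

Real kernel pages use atomic page flags, and the erofs online-page machinery additionally tracks sub-page parts; none of that is modeled here.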
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts: drivers/staging/erofs/unzip_vle.c
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/unzip_vle.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c index 35ae0865c1fd..9ae1f8833b72 100644 --- a/drivers/staging/erofs/unzip_vle.c +++ b/drivers/staging/erofs/unzip_vle.c @@ -605,8 +605,8 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe, #endif
enum z_erofs_page_type page_type; - unsigned cur, end, spiltted, index; - int err; + unsigned int cur, end, spiltted, index; + int err = 0;
/* register locked file pages as online pages in pack */ z_erofs_onlinepage_init(page); @@ -633,12 +633,11 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe, if (unlikely(err)) goto err_out;
- /* deal with hole (FIXME! broken now) */ if (unlikely(!(map->m_flags & EROFS_MAP_MAPPED))) goto hitted;
DBG_BUGON(map->m_plen != 1 << sbi->clusterbits); - BUG_ON(erofs_blkoff(map->m_pa)); + DBG_BUGON(erofs_blkoff(map->m_pa));
err = z_erofs_vle_work_iter_begin(builder, sb, map, &fe->owned_head); if (unlikely(err)) @@ -683,7 +682,7 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe,
err = z_erofs_vle_work_add_page(builder, newpage, Z_EROFS_PAGE_TYPE_EXCLUSIVE); - if (!err) + if (likely(!err)) goto retry; }
@@ -694,9 +693,10 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe,
/* FIXME! avoid the last relundant fixup & endio */ z_erofs_onlinepage_fixup(page, index, true); - ++spiltted;
- /* also update nr_pages and increase queued_pages */ + /* bump up the number of spiltted parts of a page */ + ++spiltted; + /* also update nr_pages */ work->nr_pages = max_t(pgoff_t, work->nr_pages, index + 1); next_part: /* can be used for verification */ @@ -706,16 +706,18 @@ static int z_erofs_do_read_page(struct z_erofs_vle_frontend *fe, if (end > 0) goto repeat;
+out: /* FIXME! avoid the last relundant fixup & endio */ z_erofs_onlinepage_endio(page);
debugln("%s, finish page: %pK spiltted: %u map->m_llen %llu", __func__, page, spiltted, map->m_llen); - return 0; + return err;
+ /* if some error occurred while processing this page */ err_out: - /* TODO: the missing error handing cases */ - return err; + SetPageError(page); + goto out; }
static void z_erofs_vle_unzip_kickoff(void *ptr, int bios)
On Wed, Feb 20, 2019 at 05:18:44PM +0800, Gao Xiang wrote:
commit 1e05ff36e6921ca61bdbf779f81a602863569ee3 upstream.
This patch completes the error handling code of z_erofs_do_read_page. PG_error will be set when a read error happens; therefore, z_erofs_onlinepage_endio will unlock the page without setting PG_uptodate.
Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts: drivers/staging/erofs/unzip_vle.c
These types of lines are not needed in backports; they are just clutter. I'll go fix it up...
greg k-h
On 2019/2/25 22:59, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:44PM +0800, Gao Xiang wrote:
commit 1e05ff36e6921ca61bdbf779f81a602863569ee3 upstream.
This patch completes the error handling code of z_erofs_do_read_page. PG_error will be set when a read error happens; therefore, z_erofs_onlinepage_endio will unlock the page without setting PG_uptodate.
Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts: drivers/staging/erofs/unzip_vle.c
These types of lines are not needed in backports; they are just clutter. I'll go fix it up...
Thanks, Greg... I also added some comments in some patches... If they are not needed, I will take note for future submissions...
Thanks, Gao Xiang
greg k-h
From: Chen Gong gongchen4@huawei.com
commit 9141b60cf6a53c99f8a9309bf8e1c6650a6785c1 upstream.
This patch replaces BUG_ON with DBG_BUGON in data.c and adds the necessary error handling.
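For context, the difference between the two macros is that DBG_BUGON only trips in debug builds, which is why every conversion also needs a real error path. A minimal userspace sketch of the idea follows; the exact erofs definition in internal.h is assumed here, not quoted:

#include <assert.h>

/* Sketch of the BUG_ON vs. DBG_BUGON distinction (assumed shape):
 * BUG_ON() always crashes on a true condition, while DBG_BUGON()
 * compiles away unless the debug option is enabled, so a release
 * kernel must handle the condition at runtime instead. */
#ifdef CONFIG_EROFS_FS_DEBUG
#define DBG_BUGON(cond)	assert(!(cond))	/* debug build: loud */
#else
#define DBG_BUGON(cond)	((void)0)	/* release build: no-op */
#endif

int main(void)
{
	DBG_BUGON(1 + 1 != 2);	/* never trips */
	return 0;
}

The sketch borrows assert() for the debug case; the kernel macro crashes via BUG() instead.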
Signed-off-by: Chen Gong <gongchen4@huawei.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/data.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)
diff --git a/drivers/staging/erofs/data.c b/drivers/staging/erofs/data.c index ac263a180253..894e60ecebe2 100644 --- a/drivers/staging/erofs/data.c +++ b/drivers/staging/erofs/data.c @@ -25,7 +25,7 @@ static inline void read_endio(struct bio *bio) struct page *page = bvec->bv_page;
/* page is already locked */ - BUG_ON(PageUptodate(page)); + DBG_BUGON(PageUptodate(page));
if (unlikely(err)) SetPageError(page); @@ -91,12 +91,12 @@ static int erofs_map_blocks_flatmode(struct inode *inode, struct erofs_map_blocks *map, int flags) { + int err = 0; erofs_blk_t nblocks, lastblk; u64 offset = map->m_la; struct erofs_vnode *vi = EROFS_V(inode);
trace_erofs_map_blocks_flatmode_enter(inode, map, flags); - BUG_ON(is_inode_layout_compression(inode));
nblocks = DIV_ROUND_UP(inode->i_size, PAGE_SIZE); lastblk = nblocks - is_inode_layout_inline(inode); @@ -123,18 +123,27 @@ static int erofs_map_blocks_flatmode(struct inode *inode, map->m_plen = inode->i_size - offset;
/* inline data should locate in one meta block */ - BUG_ON(erofs_blkoff(map->m_pa) + map->m_plen > PAGE_SIZE); + if (erofs_blkoff(map->m_pa) + map->m_plen > PAGE_SIZE) { + DBG_BUGON(1); + err = -EIO; + goto err_out; + } + map->m_flags |= EROFS_MAP_META; } else { errln("internal error @ nid: %llu (size %llu), m_la 0x%llx", vi->nid, inode->i_size, map->m_la); - BUG(); + DBG_BUGON(1); + err = -EIO; + goto err_out; }
out: map->m_llen = map->m_plen; + +err_out: trace_erofs_map_blocks_flatmode_exit(inode, map, flags, 0); - return 0; + return err; }
#ifdef CONFIG_EROFS_FS_ZIP @@ -190,7 +199,7 @@ static inline struct bio *erofs_read_raw_page( erofs_off_t current_block = (erofs_off_t)page->index; int err;
- BUG_ON(!nblocks); + DBG_BUGON(!nblocks);
if (PageUptodate(page)) { err = 0; @@ -233,7 +242,7 @@ static inline struct bio *erofs_read_raw_page( }
/* for RAW access mode, m_plen must be equal to m_llen */ - BUG_ON(map.m_plen != map.m_llen); + DBG_BUGON(map.m_plen != map.m_llen);
blknr = erofs_blknr(map.m_pa); blkoff = erofs_blkoff(map.m_pa); @@ -243,7 +252,7 @@ static inline struct bio *erofs_read_raw_page( void *vsrc, *vto; struct page *ipage;
- BUG_ON(map.m_plen > PAGE_SIZE); + DBG_BUGON(map.m_plen > PAGE_SIZE);
ipage = erofs_get_meta_page(inode->i_sb, blknr, 0);
@@ -270,7 +279,7 @@ static inline struct bio *erofs_read_raw_page( }
/* pa must be block-aligned for raw reading */ - BUG_ON(erofs_blkoff(map.m_pa) != 0); + DBG_BUGON(erofs_blkoff(map.m_pa));
/* max # of continuous pages */ if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE)) @@ -331,7 +340,7 @@ static int erofs_raw_access_readpage(struct file *file, struct page *page) if (IS_ERR(bio)) return PTR_ERR(bio);
- BUG_ON(bio != NULL); /* since we have only one bio -- must be NULL */ + DBG_BUGON(bio); /* since we have only one bio -- must be NULL */ return 0; }
@@ -369,7 +378,7 @@ static int erofs_raw_access_readpages(struct file *filp, /* pages could still be locked */ put_page(page); } - BUG_ON(!list_empty(pages)); + DBG_BUGON(!list_empty(pages));
/* the rare case (end in gaps) */ if (unlikely(bio != NULL))
commit e5e3abbadf0dbd1068f64f8abe70401c5a178180 upstream.
Multiref support means that a compressed page could have more than one reference, which is designed for on-disk data deduplication. However, mkfs doesn't support this mode at this moment, and the kernel implementation is also broken.
Let's drop multiref support. If it is fully implemented in the future, it can be reverted later.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts: drivers/staging/erofs/unzip_vle.c
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/unzip_vle.c | 40 +++++++--------------------------------
 drivers/staging/erofs/unzip_vle.h | 12 +-----------
 2 files changed, 8 insertions(+), 44 deletions(-)
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c index 9ae1f8833b72..63accc4527ce 100644 --- a/drivers/staging/erofs/unzip_vle.c +++ b/drivers/staging/erofs/unzip_vle.c @@ -293,12 +293,9 @@ z_erofs_vle_work_lookup(struct super_block *sb, *grp_ret = grp = container_of(egrp, struct z_erofs_vle_workgroup, obj);
-#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF work = z_erofs_vle_grab_work(grp, pageofs); + /* if multiref is disabled, `primary' is always true */ primary = true; -#else - BUG(); -#endif
DBG_BUGON(work->pageofs != pageofs);
@@ -365,12 +362,9 @@ z_erofs_vle_work_register(struct super_block *sb, struct z_erofs_vle_workgroup *grp = *grp_ret; struct z_erofs_vle_work *work;
-#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF + /* if multiref is disabled, grp should never be nullptr */ BUG_ON(grp != NULL); -#else - if (grp != NULL) - goto skip; -#endif + /* no available workgroup, let's allocate one */ grp = kmem_cache_zalloc(z_erofs_workgroup_cachep, GFP_NOFS); if (unlikely(grp == NULL)) @@ -393,13 +387,7 @@ z_erofs_vle_work_register(struct super_block *sb, *hosted = true;
newgrp = true; -#ifdef CONFIG_EROFS_FS_ZIP_MULTIREF -skip: - /* currently unimplemented */ - BUG(); -#else work = z_erofs_vle_grab_primary_work(grp); -#endif work->pageofs = pageofs;
mutex_init(&work->lock); @@ -798,10 +786,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, const unsigned clusterpages = erofs_clusterpages(sbi);
struct z_erofs_pagevec_ctor ctor; - unsigned nr_pages; -#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF - unsigned sparsemem_pages = 0; -#endif + unsigned int nr_pages; + unsigned int sparsemem_pages = 0; struct page *pages_onstack[Z_EROFS_VLE_VMAP_ONSTACK_PAGES]; struct page **pages, **compressed_pages, *page; unsigned i, llen; @@ -813,11 +799,7 @@ static int z_erofs_vle_unzip(struct super_block *sb, int err;
might_sleep(); -#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF work = z_erofs_vle_grab_primary_work(grp); -#else - BUG(); -#endif BUG_ON(!READ_ONCE(work->nr_pages));
mutex_lock(&work->lock); @@ -868,13 +850,11 @@ static int z_erofs_vle_unzip(struct super_block *sb, pagenr = z_erofs_onlinepage_index(page);
BUG_ON(pagenr >= nr_pages); - -#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF BUG_ON(pages[pagenr] != NULL); - ++sparsemem_pages; -#endif + pages[pagenr] = page; } + sparsemem_pages = i;
z_erofs_pagevec_ctor_exit(&ctor, true);
@@ -904,10 +884,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, pagenr = z_erofs_onlinepage_index(page);
BUG_ON(pagenr >= nr_pages); -#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF BUG_ON(pages[pagenr] != NULL); ++sparsemem_pages; -#endif pages[pagenr] = page;
overlapped = true; @@ -933,12 +911,10 @@ static int z_erofs_vle_unzip(struct super_block *sb, if (err != -ENOTSUPP) goto out_percpu;
-#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF if (sparsemem_pages >= nr_pages) { BUG_ON(sparsemem_pages > nr_pages); goto skip_allocpage; } -#endif
for (i = 0; i < nr_pages; ++i) { if (pages[i] != NULL) @@ -947,9 +923,7 @@ static int z_erofs_vle_unzip(struct super_block *sb, pages[i] = __stagingpage_alloc(page_pool, GFP_NOFS); }
-#ifndef CONFIG_EROFS_FS_ZIP_MULTIREF skip_allocpage: -#endif vout = erofs_vmap(pages, nr_pages);
err = z_erofs_vle_unzip_vmap(compressed_pages, diff --git a/drivers/staging/erofs/unzip_vle.h b/drivers/staging/erofs/unzip_vle.h index 393998500865..3316bc36965d 100644 --- a/drivers/staging/erofs/unzip_vle.h +++ b/drivers/staging/erofs/unzip_vle.h @@ -47,13 +47,6 @@ static inline bool z_erofs_gather_if_stagingpage(struct list_head *page_pool, #define Z_EROFS_VLE_INLINE_PAGEVECS 3
struct z_erofs_vle_work { - /* struct z_erofs_vle_work *left, *right; */ - -#ifdef CONFIG_EROFS_FS_ZIP_MULTIREF - struct list_head list; - - atomic_t refcount; -#endif struct mutex lock;
/* I: decompression offset in page */ @@ -107,10 +100,8 @@ static inline void z_erofs_vle_set_workgrp_fmt( grp->flags = fmt | (grp->flags & ~Z_EROFS_VLE_WORKGRP_FMT_MASK); }
-#ifdef CONFIG_EROFS_FS_ZIP_MULTIREF -#error multiref decompression is unimplemented yet -#else
+/* definitions if multiref is disabled */ #define z_erofs_vle_grab_primary_work(grp) (&(grp)->work) #define z_erofs_vle_grab_work(grp, pageofs) (&(grp)->work) #define z_erofs_vle_work_workgroup(wrk, primary) \ @@ -118,7 +109,6 @@ static inline void z_erofs_vle_set_workgrp_fmt( struct z_erofs_vle_workgroup, work) : \ ({ BUG(); (void *)NULL; }))
-#endif
#define Z_EROFS_WORKGROUP_SIZE sizeof(struct z_erofs_vle_workgroup)
commit e9c892465583c8f42d61fafe30970d36580925df upstream.
As Al pointed out, there is actually no need at all to d_rehash() the root dentry; fix it.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/super.c | 6 ------
 1 file changed, 6 deletions(-)
diff --git a/drivers/staging/erofs/super.c b/drivers/staging/erofs/super.c
index 2df9768edac9..434616b3e76c 100644
--- a/drivers/staging/erofs/super.c
+++ b/drivers/staging/erofs/super.c
@@ -404,12 +404,6 @@ static int erofs_read_super(struct super_block *sb,
 
 	erofs_register_super(sb);
 
-	/*
-	 * We already have a positive dentry, which was instantiated
-	 * by d_make_root. Just need to d_rehash it.
-	 */
-	d_rehash(sb->s_root);
-
 	if (!silent)
 		infoln("mounted on %s with opts: %s.",
 			dev_name, (char *)data);
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
* grp is detached but still used, which violates cache-managed freeze constraint.
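To make the constraint concrete, the following is a userspace sketch in C11 atomics of the freeze-by-refcount idea; the names and semantics are assumed for illustration and are simplified from the erofs helpers referenced elsewhere in this series:

#include <stdatomic.h>
#include <stdbool.h>

/* Sketch (assumed semantics, not the erofs implementation): the
 * reclaimer may only tear a workgroup down if it atomically swaps
 * the expected last reference for a sentinel, so a concurrent get()
 * cannot succeed in the window between the check and the teardown;
 * plain reads of the refcount leave that window open. */
#define WORKGROUP_LOCKED	(-1)

struct workgroup { atomic_int refcount; };

static bool workgroup_try_to_freeze(struct workgroup *grp, int expect)
{
	/* succeeds only if we really hold the last reference */
	return atomic_compare_exchange_strong(&grp->refcount,
					      &expect, WORKGROUP_LOCKED);
}

static void workgroup_unfreeze(struct workgroup *grp, int value)
{
	atomic_store(&grp->refcount, value);
}

static bool workgroup_get(struct workgroup *grp)
{
	int cur = atomic_load(&grp->refcount);

	/* refuse to grab a frozen (or dying) workgroup */
	while (cur > 0)
		if (atomic_compare_exchange_weak(&grp->refcount,
						 &cur, cur + 1))
			return true;
	return false;
}

In the race above, Thread 2's workgroup_get() succeeds only because the reclaimer used a plain unfreeze/read sequence; with the CAS-based freeze, either the get() fails or the freeze does, never both succeeding.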
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Conflicts:
  drivers/staging/erofs/utils.c
Updates:
  include/linux/xarray.h: add xa_untag_pointer, xa_tag_pointer and
  xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/internal.h |   1 +
 drivers/staging/erofs/utils.c    | 138 +++++++++++++++++++++++++++------------
 include/linux/xarray.h           |  48 ++++++++++++++
 3 files changed, 145 insertions(+), 42 deletions(-)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index e6313c54e3ad..122ea5016f3b 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -240,6 +240,7 @@ static inline bool erofs_workgroup_get(struct erofs_workgroup *grp, int *ocnt) }
#define __erofs_workgroup_get(grp) atomic_inc(&(grp)->refcount) +#define __erofs_workgroup_put(grp) atomic_dec(&(grp)->refcount)
extern int erofs_workgroup_put(struct erofs_workgroup *grp);
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index 595cf90af9bb..389f6182721e 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -87,12 +87,21 @@ int erofs_register_workgroup(struct super_block *sb, grp = (void *)((unsigned long)grp | 1UL << RADIX_TREE_EXCEPTIONAL_SHIFT);
- err = radix_tree_insert(&sbi->workstn_tree, - grp->index, grp); + /* + * Bump up reference count before making this workgroup + * visible to other users in order to avoid potential UAF + * without serialized by erofs_workstn_lock. + */ + __erofs_workgroup_get(grp);
- if (!err) { - __erofs_workgroup_get(grp); - } + err = radix_tree_insert(&sbi->workstn_tree, + grp->index, grp); + if (unlikely(err)) + /* + * it's safe to decrease since the workgroup isn't visible + * and refcount >= 2 (cannot be freezed). + */ + __erofs_workgroup_put(grp);
erofs_workstn_unlock(sbi); radix_tree_preload_end(); @@ -101,19 +110,94 @@ int erofs_register_workgroup(struct super_block *sb,
extern void erofs_workgroup_free_rcu(struct erofs_workgroup *grp);
+static void __erofs_workgroup_free(struct erofs_workgroup *grp) +{ + atomic_long_dec(&erofs_global_shrink_cnt); + erofs_workgroup_free_rcu(grp); +} + int erofs_workgroup_put(struct erofs_workgroup *grp) { int count = atomic_dec_return(&grp->refcount);
if (count == 1) atomic_long_inc(&erofs_global_shrink_cnt); - else if (!count) { - atomic_long_dec(&erofs_global_shrink_cnt); - erofs_workgroup_free_rcu(grp); - } + else if (!count) + __erofs_workgroup_free(grp); return count; }
+#ifdef EROFS_FS_HAS_MANAGED_CACHE +/* for cache-managed case, customized reclaim paths exist */ +static void erofs_workgroup_unfreeze_final(struct erofs_workgroup *grp) +{ + erofs_workgroup_unfreeze(grp, 0); + __erofs_workgroup_free(grp); +} + +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + /* + * for managed cache enabled, the refcount of workgroups + * themselves could be < 0 (freezed). So there is no guarantee + * that all refcount > 0 if managed cache is enabled. + */ + if (!erofs_workgroup_try_to_freeze(grp, 1)) + return false; + + /* + * note that all cached pages should be unlinked + * before delete it from the radix tree. + * Otherwise some cached pages of an orphan old workgroup + * could be still linked after the new one is available. + */ + if (erofs_try_to_free_all_cached_pages(sbi, grp)) { + erofs_workgroup_unfreeze(grp, 1); + return false; + } + + /* + * it is impossible to fail after the workgroup is freezed, + * however in order to avoid some race conditions, add a + * DBG_BUGON to observe this in advance. + */ + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* + * if managed cache is enable, the last refcount + * should indicate the related workstation. + */ + erofs_workgroup_unfreeze_final(grp); + return true; +} + +#else +/* for nocache case, no customized reclaim path at all */ +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + int cnt = atomic_read(&grp->refcount); + + DBG_BUGON(cnt <= 0); + DBG_BUGON(cleanup && cnt != 1); + + if (cnt > 1) + return false; + + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* (rarely) could be grabbed again when freeing */ + erofs_workgroup_put(grp); + return true; +} + +#endif + unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, unsigned long nr_shrink, bool cleanup) @@ -130,44 +214,14 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, batch, first_index, PAGEVEC_SIZE);
for (i = 0; i < found; ++i) { - int cnt; - struct erofs_workgroup *grp = (void *) - ((unsigned long)batch[i] & - ~RADIX_TREE_EXCEPTIONAL_ENTRY); + struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
- cnt = atomic_read(&grp->refcount); - BUG_ON(cnt <= 0); - - if (cleanup) - BUG_ON(cnt != 1); - -#ifndef EROFS_FS_HAS_MANAGED_CACHE - else if (cnt > 1) -#else - if (!erofs_workgroup_try_to_freeze(grp, 1)) -#endif + /* try to shrink each valid workgroup */ + if (!erofs_try_to_release_workgroup(sbi, grp, cleanup)) continue;
- if (radix_tree_delete(&sbi->workstn_tree, - grp->index) != grp) { -#ifdef EROFS_FS_HAS_MANAGED_CACHE -skip: - erofs_workgroup_unfreeze(grp, 1); -#endif - continue; - } - -#ifdef EROFS_FS_HAS_MANAGED_CACHE - if (erofs_try_to_free_all_cached_pages(sbi, grp)) - goto skip; - - erofs_workgroup_unfreeze(grp, 1); -#endif - /* (rarely) grabbed again when freeing */ - erofs_workgroup_put(grp); - ++freed; if (unlikely(!--nr_shrink)) break; diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 2dfc8006fe64..1053785830c7 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -21,4 +21,52 @@ #define xa_unlock_irqrestore(xa, flags) \ spin_unlock_irqrestore(&(xa)->xa_lock, flags)
+/** + * xa_tag_pointer() - Create an XArray entry for a tagged pointer. + * @p: Plain pointer. + * @tag: Tag value (0, 1 or 3). + * + * If the user of the XArray prefers, they can tag their pointers instead + * of storing value entries. Three tags are available (0, 1 and 3). + * These are distinct from the xa_mark_t as they are not replicated up + * through the array and cannot be searched for. + * + * Context: Any context. + * Return: An XArray entry. + */ +static inline void *xa_tag_pointer(void *p, unsigned long tag) +{ + return (void *)((unsigned long)p | tag); +} + +/** + * xa_untag_pointer() - Turn an XArray entry into a plain pointer. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the untagged version of the pointer. + * + * Context: Any context. + * Return: A pointer. + */ +static inline void *xa_untag_pointer(void *entry) +{ + return (void *)((unsigned long)entry & ~3UL); +} + +/** + * xa_pointer_tag() - Get the tag stored in an XArray entry. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the tag of that pointer. + * + * Context: Any context. + * Return: A tag. + */ +static inline unsigned int xa_pointer_tag(void *entry) +{ + return (unsigned long)entry & 3UL; +} + #endif /* _LINUX_XARRAY_H */
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
- grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Conflicts: drivers/staging/erofs/utils.c Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
And even if we did, you do not slip it in as part of a different patch; it should come in as its own patch, with the same git commit id that it landed in 4.20 with.
Please fix this up...
greg k-h
On 2019/2/25 23:04, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
- grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Conflicts: drivers/staging/erofs/utils.c Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
Or can I define these xa_untag_pointer, xa_tag_pointer, xa_pointer_tag helpers in an erofs header internally? Is that acceptable?
Thanks, Gao Xiang
And even if we did, you do not slip it in as part of a different patch; it should come in as its own patch, with the same git commit id that it landed in 4.20 with.
Please fix this up...
greg k-h
On Mon, Feb 25, 2019 at 11:07:19PM +0800, Gao Xiang wrote:
On 2019/2/25 23:04, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
- grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Conflicts: drivers/staging/erofs/utils.c Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
Or can I define these xa_untag_pointer, xa_tag_pointer, xa_pointer_tag helpers in an erofs header internally? Is that acceptable?
No, that's not ok.
If you want to backport a subset of the api, and Matthew agrees with it (I think he did), then let's backport a subset, as a single patch, that matches the original patch comments and git id.
thanks,
greg k-h
On 2019/2/25 23:51, Greg Kroah-Hartman wrote:
On Mon, Feb 25, 2019 at 11:07:19PM +0800, Gao Xiang wrote:
On 2019/2/25 23:04, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
- grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com
Conflicts: drivers/staging/erofs/utils.c Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
Or can I define these xa_untag_pointer, xa_tag_pointer, xa_pointer_tag helpers in an erofs header internally? Is that acceptable?
No, that's not ok.
If you want to backport a subset of the api, and Matthew agrees with it (I think he did), then let's backport a subset, as a single patch, that matches the original patch comments and git id.
OK, I will backport a minimal subset of upstream 3159f943aafd, with xa_untag_pointer, xa_tag_pointer and xa_pointer_tag as the only valid part.
I will do this soon. Thanks, Greg and Matthew, for taking the time on this patch :)
Thanks, Gao Xiang
thanks,
greg k-h
From: Matthew Wilcox willy@infradead.org
commit 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 upstream.
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens.
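The "extra bit" comes from the value-entry encoding. As a rough sketch (the helper names are suffixed with _sketch because this is reconstructed from the commit description, not quoted from the patch):

#include <assert.h>

/* Sketch of the xarray value-entry encoding described above (shape
 * assumed from the upstream xa_mk_value()/xa_to_value() helpers):
 * a value v is stored as (v << 1) | 1, so bit 0 distinguishes value
 * entries from pointers and BITS_PER_LONG - 1 bits remain for v. */
static inline void *xa_mk_value_sketch(unsigned long v)
{
	assert(!(v >> (sizeof(v) * 8 - 1)));	/* must fit in one fewer bit */
	return (void *)((v << 1) | 1);
}

static inline unsigned long xa_to_value_sketch(const void *entry)
{
	return (unsigned long)entry >> 1;
}

int main(void)
{
	assert(xa_to_value_sketch(xa_mk_value_sketch(12345)) == 12345);
	return 0;
}

Note that the upstream tagged-pointer helper deliberately skips tag 2: in the new encoding that bit pattern is reserved for internal entries, while, as Matthew points out later in this thread, the old radix tree reserved a different pattern, which is what forces this backport to diverge.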
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
[ take the minimum subset of tagged pointer support only. ]
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/utils.c | 18 ++++----------
 include/linux/xarray.h        | 48 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 12 deletions(-)
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index 595cf90af9bb..bdee9bd09f11 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -35,7 +35,6 @@ static atomic_long_t erofs_global_shrink_cnt;
#ifdef CONFIG_EROFS_FS_ZIP
-/* radix_tree and the future XArray both don't use tagptr_t yet */ struct erofs_workgroup *erofs_find_workgroup( struct super_block *sb, pgoff_t index, bool *tag) { @@ -47,9 +46,8 @@ struct erofs_workgroup *erofs_find_workgroup( rcu_read_lock(); grp = radix_tree_lookup(&sbi->workstn_tree, index); if (grp != NULL) { - *tag = radix_tree_exceptional_entry(grp); - grp = (void *)((unsigned long)grp & - ~RADIX_TREE_EXCEPTIONAL_ENTRY); + *tag = xa_pointer_tag(grp); + grp = xa_untag_pointer(grp);
if (erofs_workgroup_get(grp, &oldcount)) { /* prefer to relax rcu read side */ @@ -83,9 +81,7 @@ int erofs_register_workgroup(struct super_block *sb, sbi = EROFS_SB(sb); erofs_workstn_lock(sbi);
- if (tag) - grp = (void *)((unsigned long)grp | - 1UL << RADIX_TREE_EXCEPTIONAL_SHIFT); + grp = xa_tag_pointer(grp, tag);
err = radix_tree_insert(&sbi->workstn_tree, grp->index, grp); @@ -131,9 +127,7 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
for (i = 0; i < found; ++i) { int cnt; - struct erofs_workgroup *grp = (void *) - ((unsigned long)batch[i] & - ~RADIX_TREE_EXCEPTIONAL_ENTRY); + struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
@@ -150,8 +144,8 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, #endif continue;
- if (radix_tree_delete(&sbi->workstn_tree, - grp->index) != grp) { + if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp) { #ifdef EROFS_FS_HAS_MANAGED_CACHE skip: erofs_workgroup_unfreeze(grp, 1); diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 2dfc8006fe64..6f5c380f2d80 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -9,6 +9,54 @@
#include <linux/spinlock.h>
+/** + * xa_tag_pointer() - Create an XArray entry for a tagged pointer. + * @p: Plain pointer. + * @tag: Tag value (0, 1 or 3). + * + * If the user of the XArray prefers, they can tag their pointers instead + * of storing value entries. Three tags are available (0, 1 and 3). + * These are distinct from the xa_mark_t as they are not replicated up + * through the array and cannot be searched for. + * + * Context: Any context. + * Return: An XArray entry. + */ +static inline void *xa_tag_pointer(void *p, unsigned long tag) +{ + return (void *)((unsigned long)p | tag); +} + +/** + * xa_untag_pointer() - Turn an XArray entry into a plain pointer. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the untagged version of the pointer. + * + * Context: Any context. + * Return: A pointer. + */ +static inline void *xa_untag_pointer(void *entry) +{ + return (void *)((unsigned long)entry & ~3UL); +} + +/** + * xa_pointer_tag() - Get the tag stored in an XArray entry. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the tag of that pointer. + * + * Context: Any context. + * Return: A tag. + */ +static inline unsigned int xa_pointer_tag(void *entry) +{ + return (unsigned long)entry & 3UL; +} + #define xa_trylock(xa) spin_trylock(&(xa)->xa_lock) #define xa_lock(xa) spin_lock(&(xa)->xa_lock) #define xa_unlock(xa) spin_unlock(&(xa)->xa_lock)
From: Gao Xiang gaoxiang25@huawei.com
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
* grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/internal.h |   1 +
 drivers/staging/erofs/utils.c    | 134 +++++++++++++++++++++++++++------------
 2 files changed, 96 insertions(+), 39 deletions(-)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index e6313c54e3ad..122ea5016f3b 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -240,6 +240,7 @@ static inline bool erofs_workgroup_get(struct erofs_workgroup *grp, int *ocnt) }
#define __erofs_workgroup_get(grp) atomic_inc(&(grp)->refcount) +#define __erofs_workgroup_put(grp) atomic_dec(&(grp)->refcount)
extern int erofs_workgroup_put(struct erofs_workgroup *grp);
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index bdee9bd09f11..bcabbb85d40e 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -83,12 +83,21 @@ int erofs_register_workgroup(struct super_block *sb,
grp = xa_tag_pointer(grp, tag);
- err = radix_tree_insert(&sbi->workstn_tree, - grp->index, grp); + /* + * Bump up reference count before making this workgroup + * visible to other users in order to avoid potential UAF + * without serialized by erofs_workstn_lock. + */ + __erofs_workgroup_get(grp);
- if (!err) { - __erofs_workgroup_get(grp); - } + err = radix_tree_insert(&sbi->workstn_tree, + grp->index, grp); + if (unlikely(err)) + /* + * it's safe to decrease since the workgroup isn't visible + * and refcount >= 2 (cannot be freezed). + */ + __erofs_workgroup_put(grp);
erofs_workstn_unlock(sbi); radix_tree_preload_end(); @@ -97,19 +106,94 @@ int erofs_register_workgroup(struct super_block *sb,
extern void erofs_workgroup_free_rcu(struct erofs_workgroup *grp);
+static void __erofs_workgroup_free(struct erofs_workgroup *grp) +{ + atomic_long_dec(&erofs_global_shrink_cnt); + erofs_workgroup_free_rcu(grp); +} + int erofs_workgroup_put(struct erofs_workgroup *grp) { int count = atomic_dec_return(&grp->refcount);
if (count == 1) atomic_long_inc(&erofs_global_shrink_cnt); - else if (!count) { - atomic_long_dec(&erofs_global_shrink_cnt); - erofs_workgroup_free_rcu(grp); - } + else if (!count) + __erofs_workgroup_free(grp); return count; }
+#ifdef EROFS_FS_HAS_MANAGED_CACHE +/* for cache-managed case, customized reclaim paths exist */ +static void erofs_workgroup_unfreeze_final(struct erofs_workgroup *grp) +{ + erofs_workgroup_unfreeze(grp, 0); + __erofs_workgroup_free(grp); +} + +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + /* + * for managed cache enabled, the refcount of workgroups + * themselves could be < 0 (freezed). So there is no guarantee + * that all refcount > 0 if managed cache is enabled. + */ + if (!erofs_workgroup_try_to_freeze(grp, 1)) + return false; + + /* + * note that all cached pages should be unlinked + * before delete it from the radix tree. + * Otherwise some cached pages of an orphan old workgroup + * could be still linked after the new one is available. + */ + if (erofs_try_to_free_all_cached_pages(sbi, grp)) { + erofs_workgroup_unfreeze(grp, 1); + return false; + } + + /* + * it is impossible to fail after the workgroup is freezed, + * however in order to avoid some race conditions, add a + * DBG_BUGON to observe this in advance. + */ + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* + * if managed cache is enable, the last refcount + * should indicate the related workstation. + */ + erofs_workgroup_unfreeze_final(grp); + return true; +} + +#else +/* for nocache case, no customized reclaim path at all */ +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + int cnt = atomic_read(&grp->refcount); + + DBG_BUGON(cnt <= 0); + DBG_BUGON(cleanup && cnt != 1); + + if (cnt > 1) + return false; + + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* (rarely) could be grabbed again when freeing */ + erofs_workgroup_put(grp); + return true; +} + +#endif + unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, unsigned long nr_shrink, bool cleanup) @@ -126,42 +210,14 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, batch, first_index, PAGEVEC_SIZE);
for (i = 0; i < found; ++i) { - int cnt; struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
- cnt = atomic_read(&grp->refcount); - BUG_ON(cnt <= 0); - - if (cleanup) - BUG_ON(cnt != 1); - -#ifndef EROFS_FS_HAS_MANAGED_CACHE - else if (cnt > 1) -#else - if (!erofs_workgroup_try_to_freeze(grp, 1)) -#endif + /* try to shrink each valid workgroup */ + if (!erofs_try_to_release_workgroup(sbi, grp, cleanup)) continue;
- if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, - grp->index)) != grp) { -#ifdef EROFS_FS_HAS_MANAGED_CACHE -skip: - erofs_workgroup_unfreeze(grp, 1); -#endif - continue; - } - -#ifdef EROFS_FS_HAS_MANAGED_CACHE - if (erofs_try_to_free_all_cached_pages(sbi, grp)) - goto skip; - - erofs_workgroup_unfreeze(grp, 1); -#endif - /* (rarely) grabbed again when freeing */ - erofs_workgroup_put(grp); - ++freed; if (unlikely(!--nr_shrink)) break;
On Tue, Feb 26, 2019 at 01:58:08AM +0800, Gao Xiang wrote:
+/**
+ * xa_tag_pointer() - Create an XArray entry for a tagged pointer.
+ * @p: Plain pointer.
+ * @tag: Tag value (0, 1 or 3).
+ *
+ * If the user of the XArray prefers, they can tag their pointers instead
+ * of storing value entries. Three tags are available (0, 1 and 3).
+ * These are distinct from the xa_mark_t as they are not replicated up
+ * through the array and cannot be searched for.
+ *
+ * Context: Any context.
+ * Return: An XArray entry.
+ */
+static inline void *xa_tag_pointer(void *p, unsigned long tag)
+{
+	return (void *)((unsigned long)p | tag);
+}
I think we have to diverge from upstream here. Part of the original commit is changing the format of internal & exceptional entries to give us an extra bit. This implementation of xa_tag_pointer would transform a pointer tagged with value 1 into an internal pointer, which would break the radix tree.
I would suggest:
+/**
+ * xa_tag_pointer() - Create an XArray entry for a tagged pointer.
+ * @p: Plain pointer.
+ * @tag: Tag value (0 or 1).
+ *
+ * If the user of the XArray prefers, they can tag their pointers instead
+ * of storing value entries. Two tags are available (0 and 1).
+ * These are distinct from the xa_mark_t as they are not replicated up
+ * through the array and cannot be searched for.
+ *
+ * Context: Any context.
+ * Return: An XArray entry.
+ */
+static inline void *xa_tag_pointer(void *p, unsigned long tag)
+{
+	BUG_ON(tag > 1);
+	return (void *)((unsigned long)p | (tag << 1));
+}
xa_untag_pointer() and xa_pointer_tag() will need corresponding changes.
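For reference, here is a standalone illustration of the collision described above. The two radix-tree constants are assumed from the 4.19-era encoding (RADIX_TREE_EXCEPTIONAL_ENTRY == 2 is also quoted later in this thread):

#include <stdio.h>

/* Assumed 4.19-era radix-tree entry encoding: low bits 01 mark an
 * internal node pointer, low bits 10 an exceptional entry. */
#define RADIX_TREE_INTERNAL_NODE	1UL
#define RADIX_TREE_EXCEPTIONAL_ENTRY	2UL

static int looks_internal(unsigned long entry)
{
	return (entry & 3UL) == RADIX_TREE_INTERNAL_NODE;
}

int main(void)
{
	unsigned long p = 0x1000;		/* an aligned pointer value */
	unsigned long naive = p | 1UL;		/* xa_tag_pointer(p, 1), v1 */
	unsigned long shifted = p | (1UL << 1);	/* xa_tag_pointer(p, 1), v2 */

	printf("naive   looks internal: %d\n", looks_internal(naive));   /* 1 */
	printf("shifted looks internal: %d\n", looks_internal(shifted)); /* 0 */
	printf("shifted is exceptional: %d\n",
	       !!(shifted & RADIX_TREE_EXCEPTIONAL_ENTRY));              /* 1 */
	return 0;
}

Tag 0 is harmless under either encoding, which is why the bug only bites for tagged (exceptional) entries.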
Hi Matthew,
On 2019/2/26 2:27, Matthew Wilcox wrote:
On Tue, Feb 26, 2019 at 01:58:08AM +0800, Gao Xiang wrote:
+/**
+ * xa_tag_pointer() - Create an XArray entry for a tagged pointer.
+ * @p: Plain pointer.
+ * @tag: Tag value (0, 1 or 3).
+ *
+ * If the user of the XArray prefers, they can tag their pointers instead
+ * of storing value entries. Three tags are available (0, 1 and 3).
+ * These are distinct from the xa_mark_t as they are not replicated up
+ * through the array and cannot be searched for.
+ *
+ * Context: Any context.
+ * Return: An XArray entry.
+ */
+static inline void *xa_tag_pointer(void *p, unsigned long tag)
+{
+	return (void *)((unsigned long)p | tag);
+}
I think we have to diverge from upstream here. Part of the original commit is changing the format of internal & exceptional entries to give us an extra bit. This implementation of xa_tag_pointer would transform a pointer tagged with value 1 into an internal pointer, which would break the radix tree.
Sorry for the delayed reply. I noticed this: the radix tree uses `1x' for storing exceptional entries (#define RADIX_TREE_EXCEPTIONAL_ENTRY 2), but the XArray uses `x1' instead.
However, I didn't notice that this commit changes the radix tree implementation as well; let me fix it soon, and thanks for pointing it out :)
Thanks, Gao Xiang
I would suggest:
+/**
+ * xa_tag_pointer() - Create an XArray entry for a tagged pointer.
+ * @p: Plain pointer.
+ * @tag: Tag value (0 or 1).
+ *
+ * If the user of the XArray prefers, they can tag their pointers instead
+ * of storing value entries. Two tags are available (0 and 1).
+ * These are distinct from the xa_mark_t as they are not replicated up
+ * through the array and cannot be searched for.
+ *
+ * Context: Any context.
+ * Return: An XArray entry.
+ */
+static inline void *xa_tag_pointer(void *p, unsigned long tag)
+{
+	BUG_ON(tag > 1);
+	return (void *)((unsigned long)p | (tag << 1));
+}
xa_untag_pointer() and xa_pointer_tag() will need corresponding changes.
From: Matthew Wilcox willy@infradead.org
commit 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 upstream.
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
[ take the minimum subset of tagged pointer support only. ]
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
change log v2:
 - fix the tagged pointer APIs to make them work properly, as pointed out by Matthew;

 drivers/staging/erofs/utils.c | 18 +++++----------
 include/linux/xarray.h        | 52 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 12 deletions(-)
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index 595cf90af9bb..bdee9bd09f11 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -35,7 +35,6 @@ static atomic_long_t erofs_global_shrink_cnt;
#ifdef CONFIG_EROFS_FS_ZIP
-/* radix_tree and the future XArray both don't use tagptr_t yet */ struct erofs_workgroup *erofs_find_workgroup( struct super_block *sb, pgoff_t index, bool *tag) { @@ -47,9 +46,8 @@ struct erofs_workgroup *erofs_find_workgroup( rcu_read_lock(); grp = radix_tree_lookup(&sbi->workstn_tree, index); if (grp != NULL) { - *tag = radix_tree_exceptional_entry(grp); - grp = (void *)((unsigned long)grp & - ~RADIX_TREE_EXCEPTIONAL_ENTRY); + *tag = xa_pointer_tag(grp); + grp = xa_untag_pointer(grp);
if (erofs_workgroup_get(grp, &oldcount)) { /* prefer to relax rcu read side */ @@ -83,9 +81,7 @@ int erofs_register_workgroup(struct super_block *sb, sbi = EROFS_SB(sb); erofs_workstn_lock(sbi);
- if (tag) - grp = (void *)((unsigned long)grp | - 1UL << RADIX_TREE_EXCEPTIONAL_SHIFT); + grp = xa_tag_pointer(grp, tag);
err = radix_tree_insert(&sbi->workstn_tree, grp->index, grp); @@ -131,9 +127,7 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
for (i = 0; i < found; ++i) { int cnt; - struct erofs_workgroup *grp = (void *) - ((unsigned long)batch[i] & - ~RADIX_TREE_EXCEPTIONAL_ENTRY); + struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
@@ -150,8 +144,8 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, #endif continue;
- if (radix_tree_delete(&sbi->workstn_tree, - grp->index) != grp) { + if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp) { #ifdef EROFS_FS_HAS_MANAGED_CACHE skip: erofs_workgroup_unfreeze(grp, 1); diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 2dfc8006fe64..51ae9779d08b 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -9,6 +9,58 @@
#include <linux/spinlock.h>
+/** + * xa_tag_pointer() - Create an XArray entry for a tagged pointer. + * @p: Plain pointer. + * @tag: Tag value (0 or 1). + * + * If the user of the XArray prefers, they can tag their pointers instead + * of storing value entries. Two tags are available (0 and 1). + * These are distinct from the xa_mark_t as they are not replicated up + * through the array and cannot be searched for. + * + * Context: Any context. + * Return: An XArray entry. + */ +static inline void *xa_tag_pointer(void *p, unsigned long tag) +{ + if (__builtin_constant_p(tag)) + BUILD_BUG_ON(tag > 1); + else + BUG_ON(tag > 1); + return (void *)((unsigned long)p | (tag << 1)); +} + +/** + * xa_untag_pointer() - Turn an XArray entry into a plain pointer. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the untagged version of the pointer. + * + * Context: Any context. + * Return: A pointer. + */ +static inline void *xa_untag_pointer(void *entry) +{ + return (void *)((unsigned long)entry & ~3UL); +} + +/** + * xa_pointer_tag() - Get the tag stored in an XArray entry. + * @entry: XArray entry. + * + * If you have stored a tagged pointer in the XArray, call this function + * to get the tag of that pointer. + * + * Context: Any context. + * Return: A tag. + */ +static inline unsigned int xa_pointer_tag(void *entry) +{ + return ((unsigned long)entry & 3UL) >> 1; +} + #define xa_trylock(xa) spin_trylock(&(xa)->xa_lock) #define xa_lock(xa) spin_lock(&(xa)->xa_lock) #define xa_unlock(xa) spin_unlock(&(xa)->xa_lock)
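As a quick sanity check of the v2 encoding, the three helpers can be exercised in userspace; this is the patch's code with BUILD_BUG_ON/BUG_ON collapsed into assert() and a made-up object standing in for a workgroup:

#include <assert.h>
#include <stdio.h>

/* The three helpers from the v2 patch above, with the kernel-only
 * BUILD_BUG_ON/BUG_ON checks replaced by assert() for userspace. */
static inline void *xa_tag_pointer(void *p, unsigned long tag)
{
	assert(tag <= 1);
	return (void *)((unsigned long)p | (tag << 1));
}

static inline void *xa_untag_pointer(void *entry)
{
	return (void *)((unsigned long)entry & ~3UL);
}

static inline unsigned int xa_pointer_tag(void *entry)
{
	return ((unsigned long)entry & 3UL) >> 1;
}

int main(void)
{
	long obj = 42;			/* stands in for a workgroup */
	void *entry = xa_tag_pointer(&obj, 1);

	/* the tag and the pointer survive a round trip independently */
	assert(xa_pointer_tag(entry) == 1);
	assert(xa_untag_pointer(entry) == (void *)&obj);
	/* and bit 0 stays clear, so the entry never looks internal */
	assert(((unsigned long)entry & 1UL) == 0);
	printf("tag=%u ptr=%p\n", xa_pointer_tag(entry), xa_untag_pointer(entry));
	return 0;
}

With the v1 encoding (p | tag), the last assertion would fail for tag == 1, which is exactly the internal-entry collision discussed above.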
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful.
A typical race as follows:
Thread 1 (In the reclaim path)             Thread 2
workgroup_freeze(grp, 1)    refcnt = 1
...
workgroup_unfreeze(grp, 1)  refcnt = 1
                                           workgroup_get(grp)   refcnt = 2 (x)
workgroup_put(grp)          refcnt = 1 (x)
...unexpected behaviors
* grp is detached but still used, which violates cache-managed freeze constraint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
---
 drivers/staging/erofs/internal.h |   1 +
 drivers/staging/erofs/utils.c    | 134 +++++++++++++++++++++++++++------------
 2 files changed, 96 insertions(+), 39 deletions(-)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index e6313c54e3ad..122ea5016f3b 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -240,6 +240,7 @@ static inline bool erofs_workgroup_get(struct erofs_workgroup *grp, int *ocnt) }
#define __erofs_workgroup_get(grp) atomic_inc(&(grp)->refcount) +#define __erofs_workgroup_put(grp) atomic_dec(&(grp)->refcount)
extern int erofs_workgroup_put(struct erofs_workgroup *grp);
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index bdee9bd09f11..bcabbb85d40e 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -83,12 +83,21 @@ int erofs_register_workgroup(struct super_block *sb,
grp = xa_tag_pointer(grp, tag);
- err = radix_tree_insert(&sbi->workstn_tree, - grp->index, grp); + /* + * Bump up reference count before making this workgroup + * visible to other users in order to avoid potential UAF + * without serialized by erofs_workstn_lock. + */ + __erofs_workgroup_get(grp);
- if (!err) { - __erofs_workgroup_get(grp); - } + err = radix_tree_insert(&sbi->workstn_tree, + grp->index, grp); + if (unlikely(err)) + /* + * it's safe to decrease since the workgroup isn't visible + * and refcount >= 2 (cannot be freezed). + */ + __erofs_workgroup_put(grp);
erofs_workstn_unlock(sbi); radix_tree_preload_end(); @@ -97,19 +106,94 @@ int erofs_register_workgroup(struct super_block *sb,
extern void erofs_workgroup_free_rcu(struct erofs_workgroup *grp);
+static void __erofs_workgroup_free(struct erofs_workgroup *grp) +{ + atomic_long_dec(&erofs_global_shrink_cnt); + erofs_workgroup_free_rcu(grp); +} + int erofs_workgroup_put(struct erofs_workgroup *grp) { int count = atomic_dec_return(&grp->refcount);
if (count == 1) atomic_long_inc(&erofs_global_shrink_cnt); - else if (!count) { - atomic_long_dec(&erofs_global_shrink_cnt); - erofs_workgroup_free_rcu(grp); - } + else if (!count) + __erofs_workgroup_free(grp); return count; }
+#ifdef EROFS_FS_HAS_MANAGED_CACHE +/* for cache-managed case, customized reclaim paths exist */ +static void erofs_workgroup_unfreeze_final(struct erofs_workgroup *grp) +{ + erofs_workgroup_unfreeze(grp, 0); + __erofs_workgroup_free(grp); +} + +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + /* + * for managed cache enabled, the refcount of workgroups + * themselves could be < 0 (freezed). So there is no guarantee + * that all refcount > 0 if managed cache is enabled. + */ + if (!erofs_workgroup_try_to_freeze(grp, 1)) + return false; + + /* + * note that all cached pages should be unlinked + * before delete it from the radix tree. + * Otherwise some cached pages of an orphan old workgroup + * could be still linked after the new one is available. + */ + if (erofs_try_to_free_all_cached_pages(sbi, grp)) { + erofs_workgroup_unfreeze(grp, 1); + return false; + } + + /* + * it is impossible to fail after the workgroup is freezed, + * however in order to avoid some race conditions, add a + * DBG_BUGON to observe this in advance. + */ + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* + * if managed cache is enable, the last refcount + * should indicate the related workstation. + */ + erofs_workgroup_unfreeze_final(grp); + return true; +} + +#else +/* for nocache case, no customized reclaim path at all */ +bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi, + struct erofs_workgroup *grp, + bool cleanup) +{ + int cnt = atomic_read(&grp->refcount); + + DBG_BUGON(cnt <= 0); + DBG_BUGON(cleanup && cnt != 1); + + if (cnt > 1) + return false; + + DBG_BUGON(xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, + grp->index)) != grp); + + /* (rarely) could be grabbed again when freeing */ + erofs_workgroup_put(grp); + return true; +} + +#endif + unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, unsigned long nr_shrink, bool cleanup) @@ -126,42 +210,14 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi, batch, first_index, PAGEVEC_SIZE);
for (i = 0; i < found; ++i) { - int cnt; struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
first_index = grp->index + 1;
- cnt = atomic_read(&grp->refcount); - BUG_ON(cnt <= 0); - - if (cleanup) - BUG_ON(cnt != 1); - -#ifndef EROFS_FS_HAS_MANAGED_CACHE - else if (cnt > 1) -#else - if (!erofs_workgroup_try_to_freeze(grp, 1)) -#endif + /* try to shrink each valid workgroup */ + if (!erofs_try_to_release_workgroup(sbi, grp, cleanup)) continue;
- if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree, - grp->index)) != grp) { -#ifdef EROFS_FS_HAS_MANAGED_CACHE -skip: - erofs_workgroup_unfreeze(grp, 1); -#endif - continue; - } - -#ifdef EROFS_FS_HAS_MANAGED_CACHE - if (erofs_try_to_free_all_cached_pages(sbi, grp)) - goto skip; - - erofs_workgroup_unfreeze(grp, 1); -#endif - /* (rarely) grabbed again when freeing */ - erofs_workgroup_put(grp); - ++freed; if (unlikely(!--nr_shrink)) break;
Hi Matthew,
On 2019/2/26 13:14, Gao Xiang wrote:
From: Matthew Wilcox willy@infradead.org
commit 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 upstream.
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens.
Signed-off-by: Matthew Wilcox willy@infradead.org Reviewed-by: Josef Bacik jbacik@fb.com [ take the minimum subset of tagged pointer support only. ] Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com
change log v2:
- fix the tagged pointer APIs to make them work properly, as pointed out by Matthew;
Is this version ok? Could you kindly give an "Acked-by" tag for this patch?
The following patch is important for erofs, and I'd like to backport it ASAP: staging: erofs: fix race when the managed cache is enabled
It will be of great help. Thanks in advance!
Thanks, Gao Xiang
 drivers/staging/erofs/utils.c | 18 +++++----------
 include/linux/xarray.h        | 52 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 12 deletions(-)
diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c
index 595cf90af9bb..bdee9bd09f11 100644
--- a/drivers/staging/erofs/utils.c
+++ b/drivers/staging/erofs/utils.c
@@ -35,7 +35,6 @@ static atomic_long_t erofs_global_shrink_cnt;
 
 #ifdef CONFIG_EROFS_FS_ZIP
 
-/* radix_tree and the future XArray both don't use tagptr_t yet */
 struct erofs_workgroup *erofs_find_workgroup(
 	struct super_block *sb, pgoff_t index, bool *tag)
 {
@@ -47,9 +46,8 @@ struct erofs_workgroup *erofs_find_workgroup(
 	rcu_read_lock();
 	grp = radix_tree_lookup(&sbi->workstn_tree, index);
 	if (grp != NULL) {
-		*tag = radix_tree_exceptional_entry(grp);
-		grp = (void *)((unsigned long)grp &
-			~RADIX_TREE_EXCEPTIONAL_ENTRY);
+		*tag = xa_pointer_tag(grp);
+		grp = xa_untag_pointer(grp);
 
 		if (erofs_workgroup_get(grp, &oldcount)) {
 			/* prefer to relax rcu read side */
@@ -83,9 +81,7 @@ int erofs_register_workgroup(struct super_block *sb,
 	sbi = EROFS_SB(sb);
 	erofs_workstn_lock(sbi);
 
-	if (tag)
-		grp = (void *)((unsigned long)grp |
-			1UL << RADIX_TREE_EXCEPTIONAL_SHIFT);
+	grp = xa_tag_pointer(grp, tag);
 
 	err = radix_tree_insert(&sbi->workstn_tree,
 		grp->index, grp);
@@ -131,9 +127,7 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
 
 	for (i = 0; i < found; ++i) {
 		int cnt;
-		struct erofs_workgroup *grp = (void *)
-			((unsigned long)batch[i] &
-			~RADIX_TREE_EXCEPTIONAL_ENTRY);
+		struct erofs_workgroup *grp = xa_untag_pointer(batch[i]);
 
 		first_index = grp->index + 1;
 
@@ -150,8 +144,8 @@ unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
 #endif
 			continue;
 
-		if (radix_tree_delete(&sbi->workstn_tree,
-			grp->index) != grp) {
+		if (xa_untag_pointer(radix_tree_delete(&sbi->workstn_tree,
+						       grp->index)) != grp) {
 #ifdef EROFS_FS_HAS_MANAGED_CACHE
 skip:
 			erofs_workgroup_unfreeze(grp, 1);
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 2dfc8006fe64..51ae9779d08b 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -9,6 +9,58 @@
 #include <linux/spinlock.h>
 
+/**
+ * xa_tag_pointer() - Create an XArray entry for a tagged pointer.
+ * @p: Plain pointer.
+ * @tag: Tag value (0 or 1).
+ *
+ * If the user of the XArray prefers, they can tag their pointers instead
+ * of storing value entries.  Two tags are available (0 and 1).
+ * These are distinct from the xa_mark_t as they are not replicated up
+ * through the array and cannot be searched for.
+ *
+ * Context: Any context.
+ * Return: An XArray entry.
+ */
+static inline void *xa_tag_pointer(void *p, unsigned long tag)
+{
+	if (__builtin_constant_p(tag))
+		BUILD_BUG_ON(tag > 1);
+	else
+		BUG_ON(tag > 1);
+	return (void *)((unsigned long)p | (tag << 1));
+}
+
+/**
+ * xa_untag_pointer() - Turn an XArray entry into a plain pointer.
+ * @entry: XArray entry.
+ *
+ * If you have stored a tagged pointer in the XArray, call this function
+ * to get the untagged version of the pointer.
+ *
+ * Context: Any context.
+ * Return: A pointer.
+ */
+static inline void *xa_untag_pointer(void *entry)
+{
+	return (void *)((unsigned long)entry & ~3UL);
+}
+
+/**
+ * xa_pointer_tag() - Get the tag stored in an XArray entry.
+ * @entry: XArray entry.
+ *
+ * If you have stored a tagged pointer in the XArray, call this function
+ * to get the tag of that pointer.
+ *
+ * Context: Any context.
+ * Return: A tag.
+ */
+static inline unsigned int xa_pointer_tag(void *entry)
+{
+	return ((unsigned long)entry & 3UL) >> 1;
+}
+
 #define xa_trylock(xa)		spin_trylock(&(xa)->xa_lock)
 #define xa_lock(xa)		spin_lock(&(xa)->xa_lock)
 #define xa_unlock(xa)		spin_unlock(&(xa)->xa_lock)
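[Editor's note: a minimal round-trip sketch of the three helpers above; the workgroup type and the meaning of the tag bit are only illustrative here, mirroring how erofs encodes a tag into its workstation radix tree entries.]

#include <linux/xarray.h>

/* Illustrative only: stash a one-bit flag in bit 1 of each stored
 * pointer; the aligned pointer itself occupies the remaining bits.
 */
static void *encode_entry(struct erofs_workgroup *grp, bool tag)
{
	return xa_tag_pointer(grp, tag);
}

static struct erofs_workgroup *decode_entry(void *entry, bool *tag)
{
	*tag = xa_pointer_tag(entry);	/* recover the flag bit */
	return xa_untag_pointer(entry);	/* recover the aligned pointer */
}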
ping? Can these two patches be merged into linux-4.19? :'(
Thanks, Gao Xiang
ping? ...
On 2019/3/4 13:13, Gao Xiang wrote:
ping? Can these two patches be merged into linux-4.19? :'(
Thanks, Gao Xiang
On Mon, Feb 25, 2019 at 04:04:49PM +0100, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream. Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
I gave this a quick look when it came past, and I don't particularly object to this piece going into 4.19.y. A full-on backport of XArray to 4.19 will be ... interesting, but essentially this is just some boilerplate.
And even if we did, you do not slip it in as part of a different patch, it should come in as its own patch, with the same git commit id that it landed in 4.20 with.
Putting in all of 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 would be a bad idea; it actually ended up breaking m68k in a rather unexpected way which required 66ee620f06f99d72475db6eb638559ba608c7dee, which in turn caused a memory consumption regression ...
But putting in the xa_tag_pointer() API doesn't concern me.
Please fix this up...
greg k-h
On Mon, Feb 25, 2019 at 07:25:48AM -0800, Matthew Wilcox wrote:
On Mon, Feb 25, 2019 at 04:04:49PM +0100, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:48PM +0800, Gao Xiang wrote:
commit 51232df5e4b268936beccde5248f312a316800be upstream. Updates: include/linux/xarray.h: add xa_untag_pointer,xa_tag_pointer,xa_pointer_tag from upstream 3159f943aafd in order to reduce conflicts.
No, sorry, I don't want to add xarray.h to 4.19.y, that's crazy.
I gave this a quick look when it came past, and I don't particularly object to this piece going into 4.19.y. A full-on backport of XArray to 4.19 will be ... interesting, but essentially this is just some boilerplate.
And even if we did, you do not slip it in as part of a different patch, it should come in as its own patch, with the same git commit id that it landed in 4.20 with.
Putting in all of 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 would be a bad idea; it actually ended up breaking m68k in a rather unexpected way which required 66ee620f06f99d72475db6eb638559ba608c7dee, which in turn caused a memory consumption regression ...
Where did the chain of regressions stop? That would be good to know, as we will have to deal with this over time :)
thanks,
greg k-h
On Mon, Feb 25, 2019 at 04:52:43PM +0100, Greg Kroah-Hartman wrote:
On Mon, Feb 25, 2019 at 07:25:48AM -0800, Matthew Wilcox wrote:
Putting in all of 3159f943aafdbacb2f94c38fdaadabf2bbde2a14 would be a bad idea; it actually ended up breaking m68k in a rather unexpected way which required 66ee620f06f99d72475db6eb638559ba608c7dee, which in turn caused a memory consumption regression ...
Where did the chain of regressions stop? That would be good to know, as we will have to deal with this over time :)
As far as I know, that terminated with f32f004cddf86d63a9c42994bbce9f4e2f07c9fa, but by the point you get there, we've brought in the whole of the XArray, which I, like you, would be reluctant to do at this point.
It may make sense to bring in the whole of the XArray at some point to make backporting other fixes easier, but I think we should give it a few more months to be sure it's solid.
commit df134b8d17b90c1e7720e318d36416b57424ff7a upstream.
It's better to use atomic_cond_read_relaxed, which on ARM64 is implemented with hardware instructions that monitor a variable for changes, instead of open-coded busy waiting.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/internal.h | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index 122ea5016f3b..0fff0d30fed7 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -211,23 +211,29 @@ static inline void erofs_workgroup_unfreeze( preempt_enable(); }
+#if defined(CONFIG_SMP) +static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp) +{ + return atomic_cond_read_relaxed(&grp->refcount, + VAL != EROFS_LOCKED_MAGIC); +} +#else +static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp) +{ + int v = atomic_read(&grp->refcount); + + /* workgroup is never freezed on uniprocessor systems */ + DBG_BUGON(v == EROFS_LOCKED_MAGIC); + return v; +} +#endif + static inline bool erofs_workgroup_get(struct erofs_workgroup *grp, int *ocnt) { - const int locked = (int)EROFS_LOCKED_MAGIC; int o;
repeat: - o = atomic_read(&grp->refcount); - - /* spin if it is temporarily locked at the reclaim path */ - if (unlikely(o == locked)) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) - do - cpu_relax(); - while (atomic_read(&grp->refcount) == locked); -#endif - goto repeat; - } + o = erofs_wait_on_workgroup_freezed(grp);
if (unlikely(o <= 0)) return -1;
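[Editor's note: to make the change concrete, a hedged before/after sketch of the waiting pattern this patch replaces; function names are illustrative, not the driver's own.]

#include <linux/atomic.h>
#include <asm/processor.h>

/* Before: open-coded busy waiting, re-reading on every iteration. */
static int wait_unfrozen_spin(atomic_t *refcount, int locked_magic)
{
	int v;

	while ((v = atomic_read(refcount)) == locked_magic)
		cpu_relax();
	return v;
}

/* After: atomic_cond_read_relaxed() returns the first value for which
 * the condition holds; on arm64 it can wait for a monitor event (WFE)
 * instead of spinning.
 */
static int wait_unfrozen_cond(atomic_t *refcount, int locked_magic)
{
	return atomic_cond_read_relaxed(refcount, VAL != locked_magic);
}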
commit 73f5c66df3e26ab750cefcb9a3e08c71c9f79cad upstream.
There are two minor issues in the current freeze interface:
1) The freeze interfaces are not related to CONFIG_DEBUG_SPINLOCK, therefore fix the incorrect conditions;
2) For SMP platforms, preemption should also be disabled before doing atomic_cmpxchg; otherwise a high-priority task could preempt between atomic_cmpxchg and preempt_disable and then spin on the locked refcount.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/internal.h | 41 ++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index 0fff0d30fed7..c4c9caf84639 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -184,40 +184,49 @@ struct erofs_workgroup {
#define EROFS_LOCKED_MAGIC (INT_MIN | 0xE0F510CCL)
-static inline bool erofs_workgroup_try_to_freeze( - struct erofs_workgroup *grp, int v) +#if defined(CONFIG_SMP) +static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp, + int val) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) - if (v != atomic_cmpxchg(&grp->refcount, - v, EROFS_LOCKED_MAGIC)) - return false; preempt_disable(); -#else - preempt_disable(); - if (atomic_read(&grp->refcount) != v) { + if (val != atomic_cmpxchg(&grp->refcount, val, EROFS_LOCKED_MAGIC)) { preempt_enable(); return false; } -#endif return true; }
-static inline void erofs_workgroup_unfreeze( - struct erofs_workgroup *grp, int v) +static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp, + int orig_val) { -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) - atomic_set(&grp->refcount, v); -#endif + atomic_set(&grp->refcount, orig_val); preempt_enable(); }
-#if defined(CONFIG_SMP) static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp) { return atomic_cond_read_relaxed(&grp->refcount, VAL != EROFS_LOCKED_MAGIC); } #else +static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp, + int val) +{ + preempt_disable(); + /* no need to spin on UP platforms, let's just disable preemption. */ + if (val != atomic_read(&grp->refcount)) { + preempt_enable(); + return false; + } + return true; +} + +static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp, + int orig_val) +{ + preempt_enable(); +} + static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp) { int v = atomic_read(&grp->refcount);
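[Editor's note: a hedged sketch of why the ordering in issue 2) matters on SMP; LOCKED_MAGIC_SKETCH and the function names are illustrative, not the driver's own.]

#include <linux/atomic.h>
#include <linux/preempt.h>

#define LOCKED_MAGIC_SKETCH	(INT_MIN | 0xE0F510CC)	/* illustrative */

/*
 * Buggy ordering (sketch): if a higher-priority task preempts right
 * after the winning cmpxchg but before preempt_disable(), it can spin
 * forever on the locked refcount while the winner never gets to run.
 */
static bool try_to_freeze_buggy(atomic_t *refcount, int val)
{
	if (val != atomic_cmpxchg(refcount, val, LOCKED_MAGIC_SKETCH))
		return false;
	/* <-- preemption window */
	preempt_disable();
	return true;
}

/* Fixed ordering (sketch): close the window by disabling preemption
 * before attempting to take the lock.
 */
static bool try_to_freeze_fixed(atomic_t *refcount, int val)
{
	preempt_disable();
	if (val != atomic_cmpxchg(refcount, val, LOCKED_MAGIC_SKETCH)) {
		preempt_enable();
		return false;
	}
	return true;
}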
commit 948bbdb1818b7ad6e539dad4fbd2dd4650793ea9 upstream.
Just like other generic locks, insert a full barrier in case of memory reordering.
Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/internal.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/staging/erofs/internal.h b/drivers/staging/erofs/internal.h index c4c9caf84639..2e9a9a4b4226 100644 --- a/drivers/staging/erofs/internal.h +++ b/drivers/staging/erofs/internal.h @@ -199,6 +199,11 @@ static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp, static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp, int orig_val) { + /* + * other observers should notice all modifications + * in the freezing period. + */ + smp_mb(); atomic_set(&grp->refcount, orig_val); preempt_enable(); }
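[Editor's note: a hedged sketch of the unlock path after this patch, with names kept illustrative. On trees that provide atomic_set_release(), a release store would be an alternative design choice: it also orders the freezing-period stores before the unlocking store, and can be cheaper than a full barrier on some architectures.]

#include <linux/atomic.h>
#include <linux/preempt.h>

/* Sketch of the unlock path after this patch. */
static void unfreeze_sketch(atomic_t *refcount, int orig_val)
{
	smp_mb();	/* publish all freezing-period stores first */
	atomic_set(refcount, orig_val);
	preempt_enable();
}

/* Alternative sketch, assuming atomic_set_release() is available in
 * this tree: release semantics instead of a full barrier.
 */
static void unfreeze_release_sketch(atomic_t *refcount, int orig_val)
{
	atomic_set_release(refcount, orig_val);
	preempt_enable();
}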
commit 8b987bca2d09649683cbe496419a011df8c08493 upstream.
Remove all redundant BUG_ONs, and turn the remaining useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: drivers/staging/erofs/super.c
Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/dir.c | 7 +++++-- drivers/staging/erofs/inode.c | 10 ++++++++-- drivers/staging/erofs/super.c | 13 ++++++------- 3 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/drivers/staging/erofs/dir.c b/drivers/staging/erofs/dir.c index be6ae3b1bdbe..04b84ff31d03 100644 --- a/drivers/staging/erofs/dir.c +++ b/drivers/staging/erofs/dir.c @@ -53,8 +53,11 @@ static int erofs_fill_dentries(struct dir_context *ctx, strnlen(de_name, maxsize - nameoff) : le16_to_cpu(de[1].nameoff) - nameoff;
- /* the corrupted directory found */ - BUG_ON(de_namelen < 0); + /* a corrupted entry is found */ + if (unlikely(de_namelen < 0)) { + DBG_BUGON(1); + return -EIO; + }
#ifdef CONFIG_EROFS_FS_DEBUG dbg_namelen = min(EROFS_NAME_LEN - 1, de_namelen); diff --git a/drivers/staging/erofs/inode.c b/drivers/staging/erofs/inode.c index fbf6ff25cd1b..9e7815f55a17 100644 --- a/drivers/staging/erofs/inode.c +++ b/drivers/staging/erofs/inode.c @@ -132,7 +132,13 @@ static int fill_inline_data(struct inode *inode, void *data, unsigned m_pofs) return -ENOMEM;
m_pofs += vi->inode_isize + vi->xattr_isize; - BUG_ON(m_pofs + inode->i_size > PAGE_SIZE); + + /* inline symlink data shouldn't across page boundary as well */ + if (unlikely(m_pofs + inode->i_size > PAGE_SIZE)) { + DBG_BUGON(1); + kfree(lnk); + return -EIO; + }
/* get in-page inline data */ memcpy(lnk, data + m_pofs, inode->i_size); @@ -170,7 +176,7 @@ static int fill_inode(struct inode *inode, int isdir) return PTR_ERR(page); }
- BUG_ON(!PageUptodate(page)); + DBG_BUGON(!PageUptodate(page)); data = page_address(page);
err = read_inode(inode, data + ofs); diff --git a/drivers/staging/erofs/super.c b/drivers/staging/erofs/super.c index 434616b3e76c..b0583cdb079a 100644 --- a/drivers/staging/erofs/super.c +++ b/drivers/staging/erofs/super.c @@ -40,7 +40,6 @@ static int erofs_init_inode_cache(void)
static void erofs_exit_inode_cache(void) { - BUG_ON(erofs_inode_cachep == NULL); kmem_cache_destroy(erofs_inode_cachep); }
@@ -265,8 +264,8 @@ static int managed_cache_releasepage(struct page *page, gfp_t gfp_mask) int ret = 1; /* 0 - busy */ struct address_space *const mapping = page->mapping;
- BUG_ON(!PageLocked(page)); - BUG_ON(mapping->a_ops != &managed_cache_aops); + DBG_BUGON(!PageLocked(page)); + DBG_BUGON(mapping->a_ops != &managed_cache_aops);
if (PagePrivate(page)) ret = erofs_try_to_free_cached_page(mapping, page); @@ -279,10 +278,10 @@ static void managed_cache_invalidatepage(struct page *page, { const unsigned int stop = length + offset;
- BUG_ON(!PageLocked(page)); + DBG_BUGON(!PageLocked(page));
- /* Check for overflow */ - BUG_ON(stop > PAGE_SIZE || stop < length); + /* Check for potential overflow in debug mode */ + DBG_BUGON(stop > PAGE_SIZE || stop < length);
if (offset == 0 && stop == PAGE_SIZE) while (!managed_cache_releasepage(page, GFP_NOFS)) @@ -619,7 +618,7 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
static int erofs_remount(struct super_block *sb, int *flags, char *data) { - BUG_ON(!sb_rdonly(sb)); + DBG_BUGON(!sb_rdonly(sb));
*flags |= SB_RDONLY; return 0;
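[Editor's note: a hedged sketch of the DBG_BUGON convention these conversions rely on, along the lines of the staging driver's internal.h; the exact definition may differ by tree.]

/* Sketch: fatal assertion in debug builds only, so corrupted images
 * are handled with error returns (-EIO above) on production kernels
 * instead of crashing them.
 */
#ifdef CONFIG_EROFS_FS_DEBUG
#define DBG_BUGON(condition)	BUG_ON(condition)
#else
#define DBG_BUGON(condition)	((void)(condition))
#endif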
commit 70b17991d89554cdd16f3e4fb0179bcc03c808d9 upstream.
Remove all redundant BUG_ONs, and turn the remaining useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang gaoxiang25@huawei.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Conflicts: drivers/staging/erofs/unzip_vle.c
Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/unzip_pagevec.h | 2 +- drivers/staging/erofs/unzip_vle.c | 35 ++++++++++++++--------------------- 2 files changed, 15 insertions(+), 22 deletions(-)
diff --git a/drivers/staging/erofs/unzip_pagevec.h b/drivers/staging/erofs/unzip_pagevec.h index 0956615b86f7..23856ba2742d 100644 --- a/drivers/staging/erofs/unzip_pagevec.h +++ b/drivers/staging/erofs/unzip_pagevec.h @@ -150,7 +150,7 @@ z_erofs_pagevec_ctor_dequeue(struct z_erofs_pagevec_ctor *ctor, erofs_vtptr_t t;
if (unlikely(ctor->index >= ctor->nr)) { - BUG_ON(ctor->next == NULL); + DBG_BUGON(!ctor->next); z_erofs_pagevec_ctor_pagedown(ctor, true); }
diff --git a/drivers/staging/erofs/unzip_vle.c b/drivers/staging/erofs/unzip_vle.c index 63accc4527ce..b82af55c7f84 100644 --- a/drivers/staging/erofs/unzip_vle.c +++ b/drivers/staging/erofs/unzip_vle.c @@ -18,9 +18,6 @@ static struct kmem_cache *z_erofs_workgroup_cachep __read_mostly;
void z_erofs_exit_zip_subsystem(void) { - BUG_ON(z_erofs_workqueue == NULL); - BUG_ON(z_erofs_workgroup_cachep == NULL); - destroy_workqueue(z_erofs_workqueue); kmem_cache_destroy(z_erofs_workgroup_cachep); } @@ -363,7 +360,10 @@ z_erofs_vle_work_register(struct super_block *sb, struct z_erofs_vle_work *work;
/* if multiref is disabled, grp should never be nullptr */ - BUG_ON(grp != NULL); + if (unlikely(grp)) { + DBG_BUGON(1); + return ERR_PTR(-EINVAL); + }
/* no available workgroup, let's allocate one */ grp = kmem_cache_zalloc(z_erofs_workgroup_cachep, GFP_NOFS); @@ -742,7 +742,7 @@ static inline void z_erofs_vle_read_endio(struct bio *bio) bool cachemngd = false;
DBG_BUGON(PageUptodate(page)); - BUG_ON(page->mapping == NULL); + DBG_BUGON(!page->mapping);
#ifdef EROFS_FS_HAS_MANAGED_CACHE if (unlikely(mngda == NULL && !z_erofs_is_stagingpage(page))) { @@ -800,7 +800,7 @@ static int z_erofs_vle_unzip(struct super_block *sb,
might_sleep(); work = z_erofs_vle_grab_primary_work(grp); - BUG_ON(!READ_ONCE(work->nr_pages)); + DBG_BUGON(!READ_ONCE(work->nr_pages));
mutex_lock(&work->lock); nr_pages = work->nr_pages; @@ -849,8 +849,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, else pagenr = z_erofs_onlinepage_index(page);
- BUG_ON(pagenr >= nr_pages); - BUG_ON(pages[pagenr] != NULL); + DBG_BUGON(pagenr >= nr_pages); + DBG_BUGON(pages[pagenr]);
pages[pagenr] = page; } @@ -873,9 +873,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, if (z_erofs_is_stagingpage(page)) continue; #ifdef EROFS_FS_HAS_MANAGED_CACHE - else if (page->mapping == mngda) { - BUG_ON(PageLocked(page)); - BUG_ON(!PageUptodate(page)); + if (page->mapping == mngda) { + DBG_BUGON(!PageUptodate(page)); continue; } #endif @@ -883,8 +882,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, /* only non-head page could be reused as a compressed page */ pagenr = z_erofs_onlinepage_index(page);
- BUG_ON(pagenr >= nr_pages); - BUG_ON(pages[pagenr] != NULL); + DBG_BUGON(pagenr >= nr_pages); + DBG_BUGON(pages[pagenr]); ++sparsemem_pages; pages[pagenr] = page;
@@ -894,9 +893,6 @@ static int z_erofs_vle_unzip(struct super_block *sb, llen = (nr_pages << PAGE_SHIFT) - work->pageofs;
if (z_erofs_vle_workgrp_fmt(grp) == Z_EROFS_VLE_WORKGRP_FMT_PLAIN) { - /* FIXME! this should be fixed in the future */ - BUG_ON(grp->llen != llen); - err = z_erofs_vle_plain_copy(compressed_pages, clusterpages, pages, nr_pages, work->pageofs); goto out; @@ -911,10 +907,8 @@ static int z_erofs_vle_unzip(struct super_block *sb, if (err != -ENOTSUPP) goto out_percpu;
- if (sparsemem_pages >= nr_pages) { - BUG_ON(sparsemem_pages > nr_pages); + if (sparsemem_pages >= nr_pages) goto skip_allocpage; - }
for (i = 0; i < nr_pages; ++i) { if (pages[i] != NULL) @@ -1007,7 +1001,7 @@ static void z_erofs_vle_unzip_wq(struct work_struct *work) struct z_erofs_vle_unzip_io_sb, io.u.work); LIST_HEAD(page_pool);
- BUG_ON(iosb->io.head == Z_EROFS_VLE_WORKGRP_TAIL_CLOSED); + DBG_BUGON(iosb->io.head == Z_EROFS_VLE_WORKGRP_TAIL_CLOSED); z_erofs_vle_unzip_all(iosb->sb, &iosb->io, &page_pool);
put_pages_list(&page_pool); @@ -1336,7 +1330,6 @@ static inline int __z_erofs_vle_normalaccess_readpages( continue; }
- BUG_ON(PagePrivate(page)); set_page_private(page, (unsigned long)head); head = page; }
commit b8e076a6ef253e763bfdb81e5c72bcc828b0fbeb upstream.
Remove all redundant BUG_ONs, and turn the remaining useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang gaoxiang25@huawei.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Gao Xiang gaoxiang25@huawei.com --- drivers/staging/erofs/unzip_vle_lz4.c | 2 +- drivers/staging/erofs/utils.c | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/erofs/unzip_vle_lz4.c b/drivers/staging/erofs/unzip_vle_lz4.c index f5b665f15be5..9cb35cd33365 100644 --- a/drivers/staging/erofs/unzip_vle_lz4.c +++ b/drivers/staging/erofs/unzip_vle_lz4.c @@ -57,7 +57,7 @@ int z_erofs_vle_plain_copy(struct page **compressed_pages, if (compressed_pages[j] != page) continue;
- BUG_ON(mirrored[j]); + DBG_BUGON(mirrored[j]); memcpy(percpu_data + j * PAGE_SIZE, dst, PAGE_SIZE); mirrored[j] = true; break; diff --git a/drivers/staging/erofs/utils.c b/drivers/staging/erofs/utils.c index 389f6182721e..ceee288fbe90 100644 --- a/drivers/staging/erofs/utils.c +++ b/drivers/staging/erofs/utils.c @@ -23,9 +23,6 @@ struct page *erofs_allocpage(struct list_head *pool, gfp_t gfp) list_del(&page->lru); } else { page = alloc_pages(gfp | __GFP_NOFAIL, 0); - - BUG_ON(page == NULL); - BUG_ON(page->mapping != NULL); } return page; } @@ -60,7 +57,7 @@ struct erofs_workgroup *erofs_find_workgroup( /* decrease refcount added by erofs_workgroup_put */ if (unlikely(oldcount == 1)) atomic_long_dec(&erofs_global_shrink_cnt); - BUG_ON(index != grp->index); + DBG_BUGON(index != grp->index); } rcu_read_unlock(); return grp; @@ -73,8 +70,11 @@ int erofs_register_workgroup(struct super_block *sb, struct erofs_sb_info *sbi; int err;
- /* grp->refcount should not < 1 */ - BUG_ON(!atomic_read(&grp->refcount)); + /* grp shouldn't be broken or used before */ + if (unlikely(atomic_read(&grp->refcount) != 1)) { + DBG_BUGON(1); + return -EINVAL; + }
err = radix_tree_preload(GFP_NOFS); if (err)
On Wed, Feb 20, 2019 at 05:18:42PM +0800, Gao Xiang wrote:
This series backports bugfixes already merged in upstream Linux; we found these issues in our commercial products, and they are serious and should be fixed immediately.
Note that it also includes some xarray modifications, since upcoming patches heavily depend on them; this reduces conflicts later.
All patches have been tested again as a whole.
Some of these patches are not in 4.20, so if a user were to move to that kernel, they would see a "regression" in the filesystem functionality, right? That's not ok, and because of that reason alone, I can't take this whole series for 4.19.y right now.
So can you prepare a set of patches for 4.20.y also with the missing patches you have included here?
Also, are all of these really "fixes"? They feel like they are basic "get things working properly" type of patches. The BUG_ONs are not really fixes as no one _should_ be hitting those in a normal system, right? Are they really necessary for systems running 4.19?
thanks,
greg k-h
Hi Greg,
On 2019/2/22 16:35, Greg Kroah-Hartman wrote:
On Wed, Feb 20, 2019 at 05:18:42PM +0800, Gao Xiang wrote:
This series backports bugfixes already merged in upstream Linux; we found these issues in our commercial products, and they are serious and should be fixed immediately.
Note that it also includes some xarray modifications, since upcoming patches heavily depend on them; this reduces conflicts later.
All patches have been tested again as a whole.
Some of these patches are not in 4.20, so if a user were to move to that kernel, they would see a "regression" in the filesystem functionality, right? That's not ok, and because of that reason alone, I can't take this whole series for 4.19.y right now.
Yes, you are right. I think 4.20 should be fixed as well; I will do it later for 4.20. The reason I did 4.19 first is just that it is an LTS kernel...
So can you prepare a set of patches for 4.20.y also with the missing patches you have included here?
I will do that.
Also, are all of these really "fixes"? They feel like they are basic "get things working properly" type of patches. The BUG_ONs are not really fixes as no one _should_ be hitting those in a normal system, right? Are they really necessary for systems running 4.19?
Most of these BUG_ONs will crash the kernel when corrupted images are mounted. Since I plan to run EROFS on the linux-4.19 LTS kernel in AOSP, these patches are important for that...
Thanks, Gao Xiang
thanks,
greg k-h
On Wed, Feb 20, 2019 at 05:18:42PM +0800, Gao Xiang wrote:
This series backports bugfixes already merged in upstream Linux; we found these issues in our commercial products, and they are serious and should be fixed immediately.
Note that it also includes some xarray modifications, since upcoming patches heavily depend on them; this reduces conflicts later.
All patches have been tested again as a whole.
All but one now queued up, thanks.
greg k-h