On Thu, 17 May 2018 16:21:17 +0200
Christophe LEROY <christophe.leroy(a)c-s.fr> wrote:
> Le 17/05/2018 à 15:46, Michael Ellerman a écrit :
> > Nicholas Piggin <npiggin(a)gmail.com> writes:
> >
> >> On Thu, 17 May 2018 12:04:13 +0200 (CEST)
> >> Christophe Leroy <christophe.leroy(a)c-s.fr> wrote:
> >>
> >>> commit 87a156fb18fe1 ("Align hot loops of some string functions")
> >>> degraded the performance of string functions by adding useless
> >>> nops
> >>>
> >>> A simple benchmark on an 8xx calling 100000x a memchr() that
> >>> matches the first byte runs in 41668 TB ticks before this patch
> >>> and in 35986 TB ticks after this patch. So this gives an
> >>> improvement of approx 10%
> >>>
> >>> Another benchmark doing the same with a memchr() matching the 128th
> >>> byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
> >>> after this patch, so regardless on the number of loops, removing
> >>> those useless nops improves the test by 5683 TB ticks.
> >>>
> >>> Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
> >>> Signed-off-by: Christophe Leroy <christophe.leroy(a)c-s.fr>
> >>> ---
> >>> Was sent already as part of a serie optimising string functions.
> >>> Resending on itself as it is independent of the other changes in the
> >>> serie
> >>>
> >>> arch/powerpc/lib/string.S | 6 ++++++
> >>> 1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
> >>> index a787776822d8..a026d8fa8a99 100644
> >>> --- a/arch/powerpc/lib/string.S
> >>> +++ b/arch/powerpc/lib/string.S
> >>> @@ -23,7 +23,9 @@ _GLOBAL(strncpy)
> >>> mtctr r5
> >>> addi r6,r3,-1
> >>> addi r4,r4,-1
> >>> +#ifdef CONFIG_PPC64
> >>> .balign 16
> >>> +#endif
> >>> 1: lbzu r0,1(r4)
> >>> cmpwi 0,r0,0
> >>> stbu r0,1(r6)
> >>
> >> The ifdefs are a bit ugly, but you can't argue with the numbers. These
> >> alignments should be IFETCH_ALIGN_BYTES, which is intended to optimise
> >> the ifetch performance when you have such a loop (although there is
> >> always a tradeoff for a single iteration).
> >>
> >> Would it make sense to define that for 32-bit as well, and you could use
> >> it here instead of the ifdefs? Small CPUs could just use 0.
> >
> > Can we do it with a macro in the header, eg. like:
> >
> > #ifdef CONFIG_PPC64
> > #define IFETCH_BALIGN .balign IFETCH_ALIGN_BYTES
> > #endif
> >
> > ...
> >
> > addi r4,r4,-1
> > IFETCH_BALIGN
> > 1: lbzu r0,1(r4)
> >
> >
>
> Why not just define IFETCH_ALIGN_SHIFT for PPC32 as well in asm/cache.h
> ?, then replace the .balign 16 by .balign IFETCH_ALIGN_BYTES (or .align
> IFETCH_ALIGN_SHIFT) ?
Yeah that's what I was thinking. I would do that.
Thanks,
Nick
Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() failedi.
This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for the kernels which don't have debugfs
enanbled.
Fixes: Commit 539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-by: Massimo B. <massimo.b(a)gmx.net>
Reported-by: Kai Krakow <kai(a)kaishome.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Cc: Kent Overstreet <kent.overstreet(a)gmail.com>
---
drivers/md/bcache/debug.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 4e63c6f6c04d..d030ce3025a6 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -250,7 +250,9 @@ void bch_debug_exit(void)
int __init bch_debug_init(struct kobject *kobj)
{
- bcache_debug = debugfs_create_dir("bcache", NULL);
+ if (!IS_ENABLED(CONFIG_DEBUG_FS))
+ return 0;
+ bcache_debug = debugfs_create_dir("bcache", NULL);
return IS_ERR_OR_NULL(bcache_debug);
}
--
2.16.3
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 998ac6d21cfd6efd58f5edf420bae8839dda9f2a Mon Sep 17 00:00:00 2001
From: ethanwu <ethanwu(a)synology.com>
Date: Sun, 29 Apr 2018 15:59:42 +0800
Subject: [PATCH] btrfs: Take trans lock before access running trans in
check_delayed_ref
In preivous patch:
Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist
We avoid starting btrfs transaction and get this information from
fs_info->running_transaction directly.
When accessing running_transaction in check_delayed_ref, there's a
chance that current transaction will be freed by commit transaction
after the NULL pointer check of running_transaction is passed.
After looking all the other places using fs_info->running_transaction,
they are either protected by trans_lock or holding the transactions.
Fix this by using trans_lock and increasing the use_count.
Fixes: e4c3b2dcd144 ("Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist")
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: ethanwu <ethanwu(a)synology.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f99102063366..3871658b6ab1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3142,7 +3142,11 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
struct rb_node *node;
int ret = 0;
+ spin_lock(&root->fs_info->trans_lock);
cur_trans = root->fs_info->running_transaction;
+ if (cur_trans)
+ refcount_inc(&cur_trans->use_count);
+ spin_unlock(&root->fs_info->trans_lock);
if (!cur_trans)
return 0;
@@ -3151,6 +3155,7 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
if (!head) {
spin_unlock(&delayed_refs->lock);
+ btrfs_put_transaction(cur_trans);
return 0;
}
@@ -3167,6 +3172,7 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
mutex_lock(&head->mutex);
mutex_unlock(&head->mutex);
btrfs_put_delayed_ref_head(head);
+ btrfs_put_transaction(cur_trans);
return -EAGAIN;
}
spin_unlock(&delayed_refs->lock);
@@ -3199,6 +3205,7 @@ static noinline int check_delayed_ref(struct btrfs_root *root,
}
spin_unlock(&head->lock);
mutex_unlock(&head->mutex);
+ btrfs_put_transaction(cur_trans);
return ret;
}