On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
<alexei.starovoitov(a)gmail.com> wrote:
> On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
>> The map-in-map frequently serves as a mechanism for atomic
>> snapshotting of state that a BPF program might record. The current
>> implementation is dangerous to use in this way, however, since
>> userspace has no way of knowing when all programs that might have
>> retrieved the "old" value of the map may have completed.
>>
>> This change ensures that map update operations on map-in-map map types
>> always wait for all references to the old map to drop before returning
>> to userspace.
>>
>> Signed-off-by: Daniel Colascione <dancol(a)google.com>
>> ---
>> kernel/bpf/syscall.c | 14 ++++++++++++++
>> 1 file changed, 14 insertions(+)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 8339d81cba1d..d7c16ae1e85a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
>>  	return err;
>>  }
>>
>> +static void maybe_wait_bpf_programs(struct bpf_map *map)
>> +{
>> +	/* Wait for any running BPF programs to complete so that
>> +	 * userspace, when we return to it, knows that all programs
>> +	 * that could be running use the new map value.
>> +	 */
>> +	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
>> +	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
>> +		synchronize_rcu();
>> +	}
>
> extra {} were not necessary. I removed them while applying to bpf-next.
> Please run checkpatch.pl next time.
> Thanks
Thanks Alexei for taking it. Lorenzo and I were discussing that not
having this causes incorrect behavior for apps that use map-in-map for
snapshotting. So I CC'd stable as well.
-Joel
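
To illustrate why the wait matters for snapshotting, here is a small
single-threaded C model of the pattern. It is only a sketch, not kernel
or libbpf code: struct snapshot_model, prog_begin(), prog_end(),
snapshot_swap() and snapshot_is_stable() are hypothetical names, and the
in_flight counter merely stands in for the guarantee that
synchronize_rcu() provides in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; all names are illustrative. */
struct snapshot_model {
	long counters[2];  /* two "inner maps", each holding one counter */
	int active;        /* slot currently published in the "outer map" */
	int in_flight[2];  /* programs that looked up a slot and have not finished */
};

/* A program starts: it looks up the published slot and keeps using it. */
static int prog_begin(struct snapshot_model *m)
{
	int slot = m->active;

	m->in_flight[slot]++;
	return slot;
}

/* The program finishes: it records its event in the slot it looked up. */
static void prog_end(struct snapshot_model *m, int slot)
{
	m->counters[slot]++;
	m->in_flight[slot]--;
}

/* Userspace publishes the other slot and gets the old one back. */
static int snapshot_swap(struct snapshot_model *m)
{
	int old = m->active;

	m->active = 1 - old;
	return old;
}

/* True once no program can still write to the old slot. */
static bool snapshot_is_stable(const struct snapshot_model *m, int old)
{
	return m->in_flight[old] == 0;
}

/* Walk through the race: a program is still running when the swap happens. */
void snapshot_demo(void)
{
	struct snapshot_model m = {0};
	int slot, old;

	slot = prog_begin(&m);    /* program looks up slot 0 */
	old = snapshot_swap(&m);  /* userspace swaps in slot 1 */

	/* Reading the old slot now would miss the pending update. */
	assert(!snapshot_is_stable(&m, old));
	assert(m.counters[old] == 0);

	prog_end(&m, slot);       /* program writes to the slot it held */

	/* Only now is the old slot a complete snapshot. */
	assert(snapshot_is_stable(&m, old));
	assert(m.counters[old] == 1);
}
```

In this model userspace has to poll snapshot_is_stable() itself. With the
patch applied, the update call does not return until that condition
holds, so no polling is needed.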
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5e758b8358f6c27e8a351ddf0b441a64cdabb94 Mon Sep 17 00:00:00 2001
From: Marek Szyprowski <m.szyprowski(a)samsung.com>
Date: Wed, 5 Sep 2018 12:02:15 +0200
Subject: [PATCH] ARM: dts: exynos: Disable pull control for MAX8997 interrupts
on Origen
PMIC_IRQB and PMIC_KEYINB lines on Exynos4210-based Origen board have
external pull-up resistors, so disable any pull control for those lines
in respective pin controller node. This fixes support for MAX8997
interrupts and enables operation of wakeup from MAX8997 RTC alarm.
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Fixes: 17419726aaa1 ("ARM: dts: add max8997 device node for exynos4210-origen board")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
diff --git a/arch/arm/boot/dts/exynos4210-origen.dts b/arch/arm/boot/dts/exynos4210-origen.dts
index 2ab99f9f3d0a..dd9ec05eb0f7 100644
--- a/arch/arm/boot/dts/exynos4210-origen.dts
+++ b/arch/arm/boot/dts/exynos4210-origen.dts
@@ -151,6 +151,8 @@
reg = <0x66>;
interrupt-parent = <&gpx0>;
interrupts = <4 IRQ_TYPE_NONE>, <3 IRQ_TYPE_NONE>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&max8997_irq>;
max8997,pmic-buck1-dvs-voltage = <1350000>;
max8997,pmic-buck2-dvs-voltage = <1100000>;
@@ -288,6 +290,13 @@
};
};
+&pinctrl_1 {
+ max8997_irq: max8997-irq {
+ samsung,pins = "gpx0-3", "gpx0-4";
+ samsung,pin-pud = <EXYNOS_PIN_PULL_NONE>;
+ };
+};
+
&sdhci_0 {
bus-width = <4>;
pinctrl-0 = <&sd0_clk &sd0_cmd &sd0_bus4 &sd0_cd>;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5e758b8358f6c27e8a351ddf0b441a64cdabb94 Mon Sep 17 00:00:00 2001
From: Marek Szyprowski <m.szyprowski(a)samsung.com>
Date: Wed, 5 Sep 2018 12:02:15 +0200
Subject: [PATCH] ARM: dts: exynos: Disable pull control for MAX8997 interrupts
on Origen
PMIC_IRQB and PMIC_KEYINB lines on Exynos4210-based Origen board have
external pull-up resistors, so disable any pull control for those lines
in respective pin controller node. This fixes support for MAX8997
interrupts and enables operation of wakeup from MAX8997 RTC alarm.
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Fixes: 17419726aaa1 ("ARM: dts: add max8997 device node for exynos4210-origen board")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
diff --git a/arch/arm/boot/dts/exynos4210-origen.dts b/arch/arm/boot/dts/exynos4210-origen.dts
index 2ab99f9f3d0a..dd9ec05eb0f7 100644
--- a/arch/arm/boot/dts/exynos4210-origen.dts
+++ b/arch/arm/boot/dts/exynos4210-origen.dts
@@ -151,6 +151,8 @@
reg = <0x66>;
interrupt-parent = <&gpx0>;
interrupts = <4 IRQ_TYPE_NONE>, <3 IRQ_TYPE_NONE>;
+ pinctrl-names = "default";
+ pinctrl-0 = <&max8997_irq>;
max8997,pmic-buck1-dvs-voltage = <1350000>;
max8997,pmic-buck2-dvs-voltage = <1100000>;
@@ -288,6 +290,13 @@
};
};
+&pinctrl_1 {
+ max8997_irq: max8997-irq {
+ samsung,pins = "gpx0-3", "gpx0-4";
+ samsung,pin-pud = <EXYNOS_PIN_PULL_NONE>;
+ };
+};
+
&sdhci_0 {
bus-width = <4>;
pinctrl-0 = <&sd0_clk &sd0_cmd &sd0_bus4 &sd0_cd>;
From: Daniel Wagner <daniel.wagner(a)siemens.com>
Sebastian writes:
"""
We reproducibly observe cache line starvation on a Core2Duo E6850 (2
cores), an i5-6400 SKL (4 cores) and on an NXP LS2044A ARM Cortex-A72 (4
cores).
The problem can be triggered with a v4.9-RT kernel by starting
cyclictest -S -p98 -m -i2000 -b 200
and as "load"
stress-ng --ptrace 4
The reported maximal latency is usually less than 60us. If the problem
triggers then values around 400us, 800us or even more are reported. The
upper limit is the -i parameter.
Reproduction with 4.9-RT is almost immediate on Core2Duo, ARM64 and SKL,
but it took 7.5 hours to trigger on v4.14-RT on the Core2Duo.
Instrumentation always shows the same picture:
CPU0                                        CPU1
=> do_syscall_64                            => do_syscall_64
=> SyS_ptrace                               => syscall_slow_exit_work
=> ptrace_check_attach                      => ptrace_do_notify / rt_read_unlock
=> wait_task_inactive                          rt_spin_lock_slowunlock()
   -> while task_running()                     __rt_mutex_unlock_common()
  /   check_task_state()                       mark_wakeup_next_waiter()
 |    raw_spin_lock_irq(&p->pi_lock);          raw_spin_lock(&current->pi_lock);
 |    .                                        .
 |    raw_spin_unlock_irq(&p->pi_lock);        .
  \   cpu_relax()                              .
   -                                           .
    *IRQ*                                      <lock acquired>
In the error case we observe that the while() loop is repeated more than
5000 times which indicates that the pi_lock can be acquired. CPU1 on the
other side does not make progress waiting for the same lock with interrupts
disabled.
This continues until an IRQ hits CPU0. Once CPU0 starts processing the IRQ
the other CPU is able to acquire pi_lock and the situation relaxes.
"""
This matches the observation for v4.4-rt on a Core2Duo E6850:
CPU 0:
- no progress for a very long time in rt_mutex_dequeue_pi:
stress-n-1931 0d..11 5060.891219: function: __try_to_take_rt_mutex
stress-n-1931 0d..11 5060.891219: function: rt_mutex_dequeue
stress-n-1931 0d..21 5060.891220: function: rt_mutex_enqueue_pi
stress-n-1931 0....2 5060.891220: signal_generate: sig=17 errno=0 code=262148 comm=stress-ng-ptrac pid=1928 grp=1 res=1
stress-n-1931 0d..21 5060.894114: function: rt_mutex_dequeue_pi
stress-n-1931 0d.h11 5060.894115: local_timer_entry: vector=239
CPU 1:
- IRQ at 5060.894114 on CPU 1 followed by the IRQ on CPU 0
stress-n-1928 1....0 5060.891215: sys_enter: NR 101 (18, 78b, 0, 0, 17, 788)
stress-n-1928 1d..11 5060.891216: function: __try_to_take_rt_mutex
stress-n-1928 1d..21 5060.891216: function: rt_mutex_enqueue_pi
stress-n-1928 1d..21 5060.891217: function: rt_mutex_dequeue_pi
stress-n-1928 1....1 5060.891217: function: rt_mutex_adjust_prio
stress-n-1928 1d..11 5060.891218: function: __rt_mutex_adjust_prio
stress-n-1928 1d.h10 5060.894114: local_timer_entry: vector=239
Thomas writes:
"""
This has nothing to do with RT. RT is merely exposing the
problem in an observable way. The same issue happens with upstream, it's
harder to trigger and it's harder to observe for obvious reasons.
If you read through the discussions [see the links below] then you
really see that there is an upstream issue with the x86 qrlock
implementation and Peter has posted fixes which resolve it, both at
the practical and the theoretical level.
"""
Backporting all qspinlock related patches is very likely to introduce
regressions on v4.4. Therefore, the recommended solution by Peter and
Thomas is to drop back to ticket spinlocks for v4.4.
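
As background on why ticket spinlocks avoid this kind of starvation, the
following is a minimal sketch of a ticket lock using C11 atomics. It is
not the kernel's arch_spinlock_t implementation: struct ticket_lock and
the ticket_*() helpers are hypothetical names chosen for illustration.
The point is that the lock is granted strictly in ticket order, so a CPU
that releases the lock and immediately retries queues behind a CPU that
is already waiting.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative ticket lock; not the kernel's implementation. */
struct ticket_lock {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* Join the queue: every caller gets a unique, increasing ticket. */
static unsigned int ticket_take(struct ticket_lock *l)
{
	return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

/* The lock belongs to whoever holds the ticket equal to owner. */
static int ticket_is_my_turn(struct ticket_lock *l, unsigned int ticket)
{
	return atomic_load_explicit(&l->owner, memory_order_acquire) == ticket;
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket = ticket_take(l);

	while (!ticket_is_my_turn(l, ticket))
		; /* spin; a real lock would pause the CPU here */
}

/* Hand the lock to the next ticket in line. */
static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/* Single-threaded walk-through of the two-CPU scenario. */
void ticket_lock_demo(void)
{
	struct ticket_lock l;
	unsigned int cpu1, cpu0_again;

	ticket_lock_init(&l);

	ticket_lock_acquire(&l);        /* "CPU0" holds ticket 0 */
	cpu1 = ticket_take(&l);         /* "CPU1" queues with ticket 1 */
	assert(!ticket_is_my_turn(&l, cpu1));

	ticket_lock_release(&l);        /* CPU0 unlocks ... */
	cpu0_again = ticket_take(&l);   /* ... and retries: ticket 2 */

	/* CPU1 is served first; CPU0 cannot overtake it. */
	assert(ticket_is_my_turn(&l, cpu1));
	assert(!ticket_is_my_turn(&l, cpu0_again));
}
```

The trade-off is that all waiters spin on the same owner field, which is
one of the costs that queued spinlocks were designed to reduce.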
Link: https://lkml.kernel.org/r/20180921120226.6xjgr4oiho22ex75@linutronix.de
Link: https://lkml.kernel.org/r/20180926110117.405325143@infradead.org
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Daniel Wagner <daniel.wagner(a)siemens.com>
---
Thomas suggests the following plan for fixing the issues on the various
stable trees:
4.4: Trivial by switching back to ticket locks.
4.9: Decide whether bringing back ticket locks or backporting all qrlock
fixes. Sebastian has done the latter already and it's probably the
right solution
4.14:
4.18: Backporting the qrlock fixes
4.19: Either the fix ends up in 4.19 final or it needs to be backported
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6df130a37d41..f00cab581e2d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -42,7 +42,6 @@ config X86
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if X86_64
select ARCH_USE_QUEUED_RWLOCKS
- select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANT_FRAME_POINTERS
--
2.14.4