From: "Naveen N. Rao" <naveen.n.rao(a)linux.vnet.ibm.com>
In register_ftrace_function_probe(), we are not checking the return
value of alloc_and_copy_ftrace_hash(). The subsequent call to
ftrace_match_records() may end up dereferencing the same. Add a check to
ensure this doesn't happen.
Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.156224952…
Cc: stable(a)vger.kernel.org
Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops")
Signed-off-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6200a6fe10e3..f9821a3374e9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4338,6 +4338,11 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
old_hash = *orig_hash;
hash = alloc_and_copy_ftrace_hash(FTRACE_HASH_DEFAULT_BITS, old_hash);
+ if (!hash) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
ret = ftrace_match_records(hash, glob, strlen(glob));
/* Nothing found? */
--
2.20.1
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
The race between adding a function probe and reading the probes that exist
is very subtle. It needs a comment. Also, the issue can also happen if the
probe has has the EMPTY_HASH as its func_hash.
Cc: stable(a)vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 80beed2cf0da..6200a6fe10e3 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3096,7 +3096,11 @@ t_probe_next(struct seq_file *m, loff_t *pos)
hash = iter->probe->ops.func_hash->filter_hash;
- if (!hash)
+ /*
+ * A probe being registered may temporarily have an empty hash
+ * and it's at the end of the func_probes list.
+ */
+ if (!hash || hash == EMPTY_HASH)
return NULL;
size = 1 << hash->size_bits;
@@ -4324,6 +4328,10 @@ register_ftrace_function_probe(char *glob, struct trace_array *tr,
mutex_unlock(&ftrace_lock);
+ /*
+ * Note, there's a small window here that the func_hash->filter_hash
+ * may be NULL or empty. Need to be carefule when reading the loop.
+ */
mutex_lock(&probe->ops.func_hash->regex_lock);
orig_hash = &probe->ops.func_hash->filter_hash;
--
2.20.1
From: "Naveen N. Rao" <naveen.n.rao(a)linux.vnet.ibm.com>
LTP testsuite on powerpc results in the below crash:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70
The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.
The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;
This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.
Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.
Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.156224952…
Cc: stable(a)vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index eca34503f178..80beed2cf0da 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3095,6 +3095,10 @@ t_probe_next(struct seq_file *m, loff_t *pos)
hnd = &iter->probe_entry->hlist;
hash = iter->probe->ops.func_hash->filter_hash;
+
+ if (!hash)
+ return NULL;
+
size = 1 << hash->size_bits;
retry:
--
2.20.1
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: c3915fe1bf12 - Linux 5.2.11
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/133683
One or more kernel tests failed:
aarch64:
❌ Boot test
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: c3915fe1bf12 - Linux 5.2.11
We grabbed the 3f740c3323de commit of the stable queue repository.
We then merged the patchset with `git am`:
dmaengine-ste_dma40-fix-unneeded-variable-warning.patch
nvme-multipath-revalidate-nvme_ns_head-gendisk-in-nv.patch
afs-fix-the-cb.probeuuid-service-handler-to-reply-co.patch
afs-fix-loop-index-mixup-in-afs_deliver_vl_get_entry.patch
fs-afs-fix-a-possible-null-pointer-dereference-in-af.patch
afs-fix-off-by-one-in-afs_rename-expected-data-versi.patch
afs-only-update-d_fsdata-if-different-in-afs_d_reval.patch
afs-fix-missing-dentry-data-version-updating.patch
nvmet-fix-use-after-free-bug-when-a-port-is-removed.patch
nvmet-loop-flush-nvme_delete_wq-when-removing-the-po.patch
nvmet-file-fix-nvmet_file_flush-always-returning-an-.patch
nvme-core-fix-extra-device_put-call-on-error-path.patch
nvme-fix-a-possible-deadlock-when-passthru-commands-.patch
nvme-rdma-fix-possible-use-after-free-in-connect-err.patch
nvme-fix-controller-removal-race-with-scan-work.patch
nvme-pci-fix-async-probe-remove-race.patch
soundwire-cadence_master-fix-register-definition-for.patch
soundwire-cadence_master-fix-definitions-for-intstat.patch
auxdisplay-panel-need-to-delete-scan_timer-when-misc.patch
btrfs-trim-check-the-range-passed-into-to-prevent-ov.patch
ib-mlx5-fix-implicit-mr-release-flow.patch
dmaengine-stm32-mdma-fix-a-possible-null-pointer-der.patch
omap-dma-omap_vout_vrfb-fix-off-by-one-fi-value.patch
iommu-dma-handle-sg-length-overflow-better.patch
dma-direct-don-t-truncate-dma_required_mask-to-bus-a.patch
usb-gadget-composite-clear-suspended-on-reset-discon.patch
usb-gadget-mass_storage-fix-races-between-fsg_disabl.patch
habanalabs-fix-dram-usage-accounting-on-context-tear.patch
habanalabs-fix-endianness-handling-for-packets-from-.patch
habanalabs-fix-completion-queue-handling-when-host-i.patch
habanalabs-fix-endianness-handling-for-internal-qman.patch
habanalabs-fix-device-irq-unmasking-for-be-host.patch
xen-blkback-fix-memory-leaks.patch
arm64-cpufeature-don-t-treat-granule-sizes-as-strict.patch
riscv-fix-flush_tlb_range-end-address-for-flush_tlb_.patch
i2c-rcar-avoid-race-when-unregistering-slave-client.patch
i2c-emev2-avoid-race-when-unregistering-slave-client.patch
drm-scheduler-use-job-count-instead-of-peek.patch
drm-ast-fixed-reboot-test-may-cause-system-hanged.patch
usb-host-fotg2-restart-hcd-after-port-reset.patch
tools-hv-fixed-python-pep8-flake8-warnings-for-lsvmb.patch
tools-hv-fix-kvp-and-vss-daemons-exit-code.patch
locking-rwsem-add-missing-acquire-to-read_slowpath-e.patch
lcoking-rwsem-add-missing-acquire-to-read_slowpath-s.patch
watchdog-bcm2835_wdt-fix-module-autoload.patch
selftests-bpf-install-files-test_xdp_vlan.sh.patch
drm-bridge-tfp410-fix-memleak-in-get_modes.patch
mt76-usb-fix-rx-a-msdu-support.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test [0]
✅ selinux-policy: serge-testsuite [1]
🚧 ✅ Storage blktests [2]
Host 2:
❌ Boot test [0]
⚡⚡⚡ Podman system integration test (as root) [3]
⚡⚡⚡ Podman system integration test (as user) [3]
⚡⚡⚡ Loopdev Sanity [4]
⚡⚡⚡ jvm test suite [5]
⚡⚡⚡ AMTU (Abstract Machine Test Utility) [6]
⚡⚡⚡ LTP: openposix test suite [7]
⚡⚡⚡ audit: audit testsuite test [8]
⚡⚡⚡ httpd: mod_ssl smoke sanity [9]
⚡⚡⚡ iotop: sanity [10]
⚡⚡⚡ tuned: tune-processes-through-perf [11]
⚡⚡⚡ Usex - version 1.9-29 [12]
⚡⚡⚡ stress: stress-ng [13]
🚧 ⚡⚡⚡ LTP lite [14]
ppc64le:
Host 1:
✅ Boot test [0]
✅ selinux-policy: serge-testsuite [1]
🚧 ✅ Storage blktests [2]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [3]
✅ Podman system integration test (as user) [3]
✅ Loopdev Sanity [4]
✅ jvm test suite [5]
✅ AMTU (Abstract Machine Test Utility) [6]
✅ LTP: openposix test suite [7]
✅ audit: audit testsuite test [8]
✅ httpd: mod_ssl smoke sanity [9]
✅ iotop: sanity [10]
✅ tuned: tune-processes-through-perf [11]
✅ Usex - version 1.9-29 [12]
🚧 ✅ LTP lite [14]
x86_64:
Host 1:
✅ Boot test [0]
✅ selinux-policy: serge-testsuite [1]
🚧 ✅ Storage blktests [2]
🚧 ✅ IOMMU boot test [15]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [3]
✅ Podman system integration test (as user) [3]
✅ Loopdev Sanity [4]
✅ jvm test suite [5]
✅ AMTU (Abstract Machine Test Utility) [6]
✅ LTP: openposix test suite [7]
✅ audit: audit testsuite test [8]
✅ httpd: mod_ssl smoke sanity [9]
✅ iotop: sanity [10]
✅ tuned: tune-processes-through-perf [11]
✅ pciutils: sanity smoke test [16]
✅ Usex - version 1.9-29 [12]
✅ stress: stress-ng [13]
🚧 ✅ LTP lite [14]
Test source:
💚 Pull requests are welcome for new tests or improvements to existing tests!
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/blk
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/p…
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
[14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/iommu/boot
[16]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/san…
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm: memcontrol: fix percpu vmstats and vmevents flush
Instead of using raw_cpu_read() use per_cpu() to read the actual data of
the corresponding cpu otherwise we will be reading the data of the current
cpu for the number of online CPUs.
Link: http://lkml.kernel.org/r/20190829203110.129263-1-shakeelb@google.com
Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg")
Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-percpu-vmstats-and-vmevents-flush
+++ a/mm/memcontrol.c
@@ -3278,7 +3278,7 @@ static void memcg_flush_percpu_vmstats(s
for_each_online_cpu(cpu)
for (i = min_idx; i < max_idx; i++)
- stat[i] += raw_cpu_read(memcg->vmstats_percpu->stat[i]);
+ stat[i] += per_cpu(memcg->vmstats_percpu->stat[i], cpu);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = min_idx; i < max_idx; i++)
@@ -3296,8 +3296,8 @@ static void memcg_flush_percpu_vmstats(s
for_each_online_cpu(cpu)
for (i = min_idx; i < max_idx; i++)
- stat[i] += raw_cpu_read(
- pn->lruvec_stat_cpu->count[i]);
+ stat[i] += per_cpu(
+ pn->lruvec_stat_cpu->count[i], cpu);
for (pi = pn; pi; pi = parent_nodeinfo(pi, node))
for (i = min_idx; i < max_idx; i++)
@@ -3316,8 +3316,8 @@ static void memcg_flush_percpu_vmevents(
for_each_online_cpu(cpu)
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
- events[i] += raw_cpu_read(
- memcg->vmstats_percpu->events[i]);
+ events[i] += per_cpu(memcg->vmstats_percpu->events[i],
+ cpu);
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
_
From: Roman Gushchin <guro(a)fb.com>
Subject: mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with
the hierarchical ones") effectively decreased the precision of per-memcg
vmstats_local and per-memcg-per-node lruvec percpu counters.
That's good for displaying in memory.stat, but brings a serious regression
into the reclaim process.
One issue I've discovered and debugged is the following: lruvec_lru_size()
can return 0 instead of the actual number of pages in the lru list,
preventing the kernel to reclaim last remaining pages. Result is yet
another dying memory cgroups flooding. The opposite is also happening:
scanning an empty lru list is the waste of cpu time.
Also, inactive_list_is_low() can return incorrect values, preventing the
active lru from being scanned and freed. It can fail both because the
size of active and inactive lists are inaccurate, and because the number
of workingset refaults isn't precise. In other words, the result is
pretty random.
I'm not sure, if using the approximate number of slab pages in
count_shadow_number() is acceptable, but issues described above are enough
to partially revert the patch.
Let's keep per-memcg vmstat_local batched (they are only used for
displaying stats to the userspace), but keep lruvec stats precise. This
change fixes the dead memcg flooding on my setup.
Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com
Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c~partially-revert-mm-memcontrolc-keep-local-vm-counters-in-sync-with-the-hierarchical-ones
+++ a/mm/memcontrol.c
@@ -752,15 +752,13 @@ void __mod_lruvec_state(struct lruvec *l
/* Update memcg */
__mod_memcg_state(memcg, idx, val);
+ /* Update lruvec */
+ __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
+
x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
struct mem_cgroup_per_node *pi;
- /*
- * Batch local counters to keep them in sync with
- * the hierarchical ones.
- */
- __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
atomic_long_add(x, &pi->lruvec_stat[idx]);
x = 0;
_
set_msi_sid_cb() is used to determine whether device aliases share the
same bus, but it can provide false indications that aliases use the same
bus when in fact they do not. The reason is that set_msi_sid_cb()
assumes that pdev is fixed, while actually pci_for_each_dma_alias() can
call fn() when pdev is set to a subordinate device.
As a result, running an VM on ESX with VT-d emulation enabled can
results in the log warning such as:
DMAR: [INTR-REMAP] Request device [00:11.0] fault index 3b [fault reason 38] Blocked an interrupt request due to source-id verification failure
This seems to cause additional ata errors such as:
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
These timeouts also cause boot to be much longer and other errors.
Fix it by checking comparing the alias with the previous one instead.
Fixes: 3f0c625c6ae71 ("iommu/vt-d: Allow interrupts from the entire bus for aliased devices")
Cc: stable(a)vger.kernel.org
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Jacob Pan <jacob.jun.pan(a)linux.intel.com>
Signed-off-by: Nadav Amit <namit(a)vmware.com>
---
drivers/iommu/intel_irq_remapping.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 4786ca061e31..81e43c1df7ec 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -376,13 +376,13 @@ static int set_msi_sid_cb(struct pci_dev *pdev, u16 alias, void *opaque)
{
struct set_msi_sid_data *data = opaque;
+ if (data->count == 0 || PCI_BUS_NUM(alias) == PCI_BUS_NUM(data->alias))
+ data->busmatch_count++;
+
data->pdev = pdev;
data->alias = alias;
data->count++;
- if (PCI_BUS_NUM(alias) == pdev->bus->number)
- data->busmatch_count++;
-
return 0;
}
--
2.17.1