Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 3517b32c0774 - Linux 5.5.8-rc1
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
One or more kernel tests failed:
s390x:
❌ LTP
❌ Networking bridge: sanity
ppc64le:
❌ LTP
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ⚡⚡⚡ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
❌ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ selinux-policy: serge-testsuite
✅ stress: stress-ng
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
❌ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
❌ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ audit: audit testsuite test
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ storage: dm/common
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 3:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 4:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
When looking for the memblock where the kernel lives, we should check
that the memory range associated to the memblock entirely comprises the
kernel image and not only intersects with it.
Signed-off-by: Alexandre Ghiti <alex(a)ghiti.fr>
---
arch/riscv/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 965a8cf4829c..fab855963c73 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -131,7 +131,7 @@ void __init setup_bootmem(void)
for_each_memblock(memory, reg) {
phys_addr_t end = reg->base + reg->size;
- if (reg->base <= vmlinux_end && vmlinux_end <= end) {
+ if (reg->base <= vmlinux_start && vmlinux_end <= end) {
mem_size = min(reg->size, (phys_addr_t)-PAGE_OFFSET);
/*
--
2.20.1
From: Stephan Mueller <smueller(a)chronox.de>
commit db07cd26ac6a418dc2823187958edcfdb415fa83 upstream
FIPS 140-2 section 4.9.2 requires a continuous self test of the noise
source. Up to kernel 4.8 drivers/char/random.c provided this continuous
self test. Afterwards it was moved to a location that is inconsistent
with the FIPS 140-2 requirements. The relevant patch was
e192be9d9a30555aae2ca1dc3aad37cba484cd4a .
Thus, the FIPS 140-2 CTRNG is added to the DRBG when it obtains the
seed. This patch resurrects the function drbg_fips_continous_test that
existed some time ago and applies it to the noise sources. The patch
that removed the drbg_fips_continous_test was
b3614763059b82c26bdd02ffcb1c016c1132aad0 .
The Jitter RNG implements its own FIPS 140-2 self test and thus does not
need to be subjected to the test in the DRBG.
The patch contains a tiny fix to ensure proper zeroization in case of an
error during the Jitter RNG data gathering.
Signed-off-by: Stephan Mueller <smueller(a)chronox.de>
Reviewed-by: Yann Droneaud <ydroneaud(a)opteya.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Vikash Bansal <bvikas(a)vmware.com>
---
crypto/drbg.c | 94 +++++++++++++++++++++++++++++++++++++++++--
include/crypto/drbg.h | 2 +
2 files changed, 93 insertions(+), 3 deletions(-)
diff --git a/crypto/drbg.c b/crypto/drbg.c
index 2a5b16bb000c..b6929eb5f565 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -219,6 +219,57 @@ static inline unsigned short drbg_sec_strength(drbg_flag_t flags)
}
}
+/*
+ * FIPS 140-2 continuous self test for the noise source
+ * The test is performed on the noise source input data. Thus, the function
+ * implicitly knows the size of the buffer to be equal to the security
+ * strength.
+ *
+ * Note, this function disregards the nonce trailing the entropy data during
+ * initial seeding.
+ *
+ * drbg->drbg_mutex must have been taken.
+ *
+ * @drbg DRBG handle
+ * @entropy buffer of seed data to be checked
+ *
+ * return:
+ * 0 on success
+ * -EAGAIN on when the CTRNG is not yet primed
+ * < 0 on error
+ */
+static int drbg_fips_continuous_test(struct drbg_state *drbg,
+ const unsigned char *entropy)
+{
+ unsigned short entropylen = drbg_sec_strength(drbg->core->flags);
+ int ret = 0;
+
+ if (!IS_ENABLED(CONFIG_CRYPTO_FIPS))
+ return 0;
+
+ /* skip test if we test the overall system */
+ if (list_empty(&drbg->test_data.list))
+ return 0;
+ /* only perform test in FIPS mode */
+ if (!fips_enabled)
+ return 0;
+
+ if (!drbg->fips_primed) {
+ /* Priming of FIPS test */
+ memcpy(drbg->prev, entropy, entropylen);
+ drbg->fips_primed = true;
+ /* priming: another round is needed */
+ return -EAGAIN;
+ }
+ ret = memcmp(drbg->prev, entropy, entropylen);
+ if (!ret)
+ panic("DRBG continuous self test failed\n");
+ memcpy(drbg->prev, entropy, entropylen);
+
+ /* the test shall pass when the two values are not equal */
+ return 0;
+}
+
/*
* Convert an integer into a byte representation of this integer.
* The byte representation is big-endian
@@ -998,6 +1049,22 @@ static inline int __drbg_seed(struct drbg_state *drbg, struct list_head *seed,
return ret;
}
+static inline int drbg_get_random_bytes(struct drbg_state *drbg,
+ unsigned char *entropy,
+ unsigned int entropylen)
+{
+ int ret;
+
+ do {
+ get_random_bytes(entropy, entropylen);
+ ret = drbg_fips_continuous_test(drbg, entropy);
+ if (ret && ret != -EAGAIN)
+ return ret;
+ } while (ret);
+
+ return 0;
+}
+
static void drbg_async_seed(struct work_struct *work)
{
struct drbg_string data;
@@ -1006,16 +1073,20 @@ static void drbg_async_seed(struct work_struct *work)
seed_work);
unsigned int entropylen = drbg_sec_strength(drbg->core->flags);
unsigned char entropy[32];
+ int ret;
BUG_ON(!entropylen);
BUG_ON(entropylen > sizeof(entropy));
- get_random_bytes(entropy, entropylen);
drbg_string_fill(&data, entropy, entropylen);
list_add_tail(&data.list, &seedlist);
mutex_lock(&drbg->drbg_mutex);
+ ret = drbg_get_random_bytes(drbg, entropy, entropylen);
+ if (ret)
+ goto unlock;
+
/* If nonblocking pool is initialized, deactivate Jitter RNG */
crypto_free_rng(drbg->jent);
drbg->jent = NULL;
@@ -1030,6 +1101,7 @@ static void drbg_async_seed(struct work_struct *work)
if (drbg->seeded)
drbg->reseed_threshold = drbg_max_requests(drbg);
+unlock:
mutex_unlock(&drbg->drbg_mutex);
memzero_explicit(entropy, entropylen);
@@ -1081,7 +1153,9 @@ static int drbg_seed(struct drbg_state *drbg, struct drbg_string *pers,
BUG_ON((entropylen * 2) > sizeof(entropy));
/* Get seed from in-kernel /dev/urandom */
- get_random_bytes(entropy, entropylen);
+ ret = drbg_get_random_bytes(drbg, entropy, entropylen);
+ if (ret)
+ goto out;
if (!drbg->jent) {
drbg_string_fill(&data1, entropy, entropylen);
@@ -1094,7 +1168,7 @@ static int drbg_seed(struct drbg_state *drbg, struct drbg_string *pers,
entropylen);
if (ret) {
pr_devel("DRBG: jent failed with %d\n", ret);
- return ret;
+ goto out;
}
drbg_string_fill(&data1, entropy, entropylen * 2);
@@ -1121,6 +1195,7 @@ static int drbg_seed(struct drbg_state *drbg, struct drbg_string *pers,
ret = __drbg_seed(drbg, &seedlist, reseed);
+out:
memzero_explicit(entropy, entropylen * 2);
return ret;
@@ -1142,6 +1217,11 @@ static inline void drbg_dealloc_state(struct drbg_state *drbg)
drbg->reseed_ctr = 0;
drbg->d_ops = NULL;
drbg->core = NULL;
+ if (IS_ENABLED(CONFIG_CRYPTO_FIPS)) {
+ kzfree(drbg->prev);
+ drbg->prev = NULL;
+ drbg->fips_primed = false;
+ }
}
/*
@@ -1211,6 +1291,14 @@ static inline int drbg_alloc_state(struct drbg_state *drbg)
drbg->scratchpad = PTR_ALIGN(drbg->scratchpadbuf, ret + 1);
}
+ if (IS_ENABLED(CONFIG_CRYPTO_FIPS)) {
+ drbg->prev = kzalloc(drbg_sec_strength(drbg->core->flags),
+ GFP_KERNEL);
+ if (!drbg->prev)
+ goto fini;
+ drbg->fips_primed = false;
+ }
+
return 0;
fini:
diff --git a/include/crypto/drbg.h b/include/crypto/drbg.h
index 3fb581bf3b87..8c9af21efce1 100644
--- a/include/crypto/drbg.h
+++ b/include/crypto/drbg.h
@@ -129,6 +129,8 @@ struct drbg_state {
bool seeded; /* DRBG fully seeded? */
bool pr; /* Prediction resistance enabled? */
+ bool fips_primed; /* Continuous test primed? */
+ unsigned char *prev; /* FIPS 140-2 continuous test value */
struct work_struct seed_work; /* asynchronous seeding support */
struct crypto_rng *jent;
const struct drbg_state_ops *d_ops;
--
2.19.0
The way cookie_init_hw_msi_region() allocates the iommu_dma_msi_page
structures doesn't match the way iommu_put_dma_cookie() frees them.
The former performs a single allocation of all the required structures,
while the latter tries to free them one at a time. It doesn't quite
work for the main use case (the GICv3 ITS where the range is 64kB)
when the base granule size is 4kB.
This leads to a nice slab corruption on teardown, which is easily
observable by simply creating a VF on a SRIOV-capable device, and
tearing it down immediately (no need to even make use of it).
Fortunately, this only affects systems where the ITS isn't translated
by the SMMU, which are both rare and non-standard.
Fix it by allocating iommu_dma_msi_page structures one at a time.
Fixes: 7c1b058c8b5a3 ("iommu/dma: Handle IOMMU API reserved regions")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Eric Auger <eric.auger(a)redhat.com>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Will Deacon <will(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
* From v1:
- Got rid of the superfluous error handling (Robin)
- Clarified that it only affects a very small set of systems
- Added Eric's RB (which I assume still stands)
drivers/iommu/dma-iommu.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..ba128d1cdaee 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -177,15 +177,15 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
start -= iova_offset(iovad, start);
num_pages = iova_align(iovad, end - start) >> iova_shift(iovad);
- msi_page = kcalloc(num_pages, sizeof(*msi_page), GFP_KERNEL);
- if (!msi_page)
- return -ENOMEM;
-
for (i = 0; i < num_pages; i++) {
- msi_page[i].phys = start;
- msi_page[i].iova = start;
- INIT_LIST_HEAD(&msi_page[i].list);
- list_add(&msi_page[i].list, &cookie->msi_page_list);
+ msi_page = kmalloc(sizeof(*msi_page), GFP_KERNEL);
+ if (!msi_page)
+ return -ENOMEM;
+
+ msi_page->phys = start;
+ msi_page->iova = start;
+ INIT_LIST_HEAD(&msi_page->list);
+ list_add(&msi_page->list, &cookie->msi_page_list);
start += iovad->granule;
}
--
2.20.1
After the switch to use v4l2_async_notifier_add_subdev() and
v4l2_async_notifier_cleanup(), unloading the ti_cal module would casue a
kernel oops.
This was root cause to the fact that v4l2_async_notifier_cleanup() tries
to kfree the asd pointer passed into v4l2_async_notifier_add_subdev().
In our case the asd reference was from a statically allocated struct.
So in effect v4l2_async_notifier_cleanup() was trying to free a pointer
that was not kalloc.
So here we switch to using a kzalloc struct instead of a static one.
Fixes: d079f94c9046 ("media: platform: Switch to v4l2_async_notifier_add_subdev")
Cc: stable(a)vger.kernel.org
Signed-off-by: Benoit Parrot <bparrot(a)ti.com>
---
drivers/media/platform/ti-vpe/cal.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/ti-vpe/cal.c b/drivers/media/platform/ti-vpe/cal.c
index 6d4cbb8782ed..18fe2cb9dd17 100644
--- a/drivers/media/platform/ti-vpe/cal.c
+++ b/drivers/media/platform/ti-vpe/cal.c
@@ -372,8 +372,6 @@ struct cal_ctx {
struct v4l2_subdev *sensor;
struct v4l2_fwnode_endpoint endpoint;
- struct v4l2_async_subdev asd;
-
struct v4l2_fh fh;
struct cal_dev *dev;
struct cc_data *cc;
@@ -2032,7 +2030,6 @@ static int of_cal_create_instance(struct cal_ctx *ctx, int inst)
parent = pdev->dev.of_node;
- asd = &ctx->asd;
endpoint = &ctx->endpoint;
ep_node = NULL;
@@ -2040,6 +2037,10 @@ static int of_cal_create_instance(struct cal_ctx *ctx, int inst)
sensor_node = NULL;
ret = -EINVAL;
+ asd = kzalloc(sizeof(*asd), GFP_KERNEL);
+ if (!asd)
+ goto cleanup_exit;
+
ctx_dbg(3, ctx, "Scanning Port node for csi2 port: %d\n", inst);
for (index = 0; index < CAL_NUM_CSI2_PORTS; index++) {
port = of_get_next_port(parent, port);
--
2.17.1
This is a note to let you know that I've just added the patch titled
driver core: Call sync_state() even if supplier has no consumers
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 21eb93f432b1a785df193df1a56a59e9eb3a985f Mon Sep 17 00:00:00 2001
From: Saravana Kannan <saravanak(a)google.com>
Date: Fri, 21 Feb 2020 00:05:08 -0800
Subject: driver core: Call sync_state() even if supplier has no consumers
The initial patch that added sync_state() support didn't handle the case
where a supplier has no consumers. This was because when a device is
successfully bound with a driver, only its suppliers were checked to see
if they are eligible to get a sync_state(). This is not sufficient for
devices that have no consumers but still need to do device state clean
up. So fix this.
Fixes: fc5a251d0fd7ca90 (driver core: Add sync_state driver/bus callback)
Signed-off-by: Saravana Kannan <saravanak(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200221080510.197337-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/core.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 42a672456432..3306d5ae92a6 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -745,25 +745,31 @@ static void __device_links_queue_sync_state(struct device *dev,
/**
* device_links_flush_sync_list - Call sync_state() on a list of devices
* @list: List of devices to call sync_state() on
+ * @dont_lock_dev: Device for which lock is already held by the caller
*
* Calls sync_state() on all the devices that have been queued for it. This
- * function is used in conjunction with __device_links_queue_sync_state().
+ * function is used in conjunction with __device_links_queue_sync_state(). The
+ * @dont_lock_dev parameter is useful when this function is called from a
+ * context where a device lock is already held.
*/
-static void device_links_flush_sync_list(struct list_head *list)
+static void device_links_flush_sync_list(struct list_head *list,
+ struct device *dont_lock_dev)
{
struct device *dev, *tmp;
list_for_each_entry_safe(dev, tmp, list, links.defer_sync) {
list_del_init(&dev->links.defer_sync);
- device_lock(dev);
+ if (dev != dont_lock_dev)
+ device_lock(dev);
if (dev->bus->sync_state)
dev->bus->sync_state(dev);
else if (dev->driver && dev->driver->sync_state)
dev->driver->sync_state(dev);
- device_unlock(dev);
+ if (dev != dont_lock_dev)
+ device_unlock(dev);
put_device(dev);
}
@@ -801,7 +807,7 @@ void device_links_supplier_sync_state_resume(void)
out:
device_links_write_unlock();
- device_links_flush_sync_list(&sync_list);
+ device_links_flush_sync_list(&sync_list, NULL);
}
static int sync_state_resume_initcall(void)
@@ -865,6 +871,11 @@ void device_links_driver_bound(struct device *dev)
driver_deferred_probe_add(link->consumer);
}
+ if (defer_sync_state_count)
+ __device_links_supplier_defer_sync(dev);
+ else
+ __device_links_queue_sync_state(dev, &sync_list);
+
list_for_each_entry(link, &dev->links.suppliers, c_node) {
if (!(link->flags & DL_FLAG_MANAGED))
continue;
@@ -883,7 +894,7 @@ void device_links_driver_bound(struct device *dev)
device_links_write_unlock();
- device_links_flush_sync_list(&sync_list);
+ device_links_flush_sync_list(&sync_list, dev);
}
static void device_link_drop_managed(struct device_link *link)
--
2.25.1
Commit 2c361684803e ("PM / Domains: Don't treat zero found compatible idle
states as an error"), moved of_genpd_parse_idle_states() towards allowing
none compatible idle state to be found for the device node, rather than
returning an error code.
However, it didn't consider that the "domain-idle-states" DT property may
be missing as it's optional, which makes of_count_phandle_with_args() to
return -ENOENT. Let's fix this to make the behaviour consistent.
Reported-by: Benjamin Gaignard <benjamin.gaignard(a)st.com>
Fixes: 2c361684803e ("PM / Domains: Don't treat zero found compatible idle states as an error")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
drivers/base/power/domain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 959d6d5eb000..0a01df608849 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2653,7 +2653,7 @@ static int genpd_iterate_idle_states(struct device_node *dn,
ret = of_count_phandle_with_args(dn, "domain-idle-states", NULL);
if (ret <= 0)
- return ret;
+ return ret == -ENOENT ? 0 : ret;
/* Loop over the phandles until all the requested entry is found */
of_for_each_phandle(&it, ret, dn, "domain-idle-states", NULL, 0) {
--
2.20.1
Since commit 7723f4c5ecdb ("driver core: platform: Add an error
message to platform_get_irq*()"), platform_get_irq() calls dev_err()
on an error. As we enumerate all interrupts until platform_get_irq()
fails, we now systematically get a message such as:
"vfio-platform fff51000.ethernet: IRQ index 3 not found" which is
a false positive.
Let's use platform_get_irq_optional() instead.
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Cc: stable(a)vger.kernel.org # v5.3+
---
drivers/vfio/platform/vfio_platform.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index ae1a5eb98620..1e2769010089 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -44,7 +44,7 @@ static int get_platform_irq(struct vfio_platform_device *vdev, int i)
{
struct platform_device *pdev = (struct platform_device *) vdev->opaque;
- return platform_get_irq(pdev, i);
+ return platform_get_irq_optional(pdev, i);
}
static int vfio_platform_probe(struct platform_device *pdev)
--
2.20.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Update chain bit correctly when using sg list
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From dad2aff3e827b112f27fa5e6f2bf87a110067c3f Mon Sep 17 00:00:00 2001
From: Pratham Pratap <prathampratap(a)codeaurora.org>
Date: Mon, 2 Mar 2020 21:44:43 +0000
Subject: usb: dwc3: gadget: Update chain bit correctly when using sg list
If scatter-gather operation is allowed, a large USB request is split
into multiple TRBs. For preparing TRBs for sg list, driver iterates
over the list and creates TRB for each sg and mark the chain bit to
false for the last sg. The current IOMMU driver is clubbing the list
of sgs which shares a page boundary into one and giving it to USB driver.
With this the number of sgs mapped it not equal to the the number of sgs
passed. Because of this USB driver is not marking the chain bit to false
since it couldn't iterate to the last sg. This patch addresses this issue
by marking the chain bit to false if it is the last mapped sg.
At a practical level, this patch resolves USB transfer stalls
seen with adb on dwc3 based db845c, pixel3 and other qcom
hardware after functionfs gadget added scatter-gather support
around v4.20.
Credit also to Anurag Kumar Vulisha <anurag.kumar.vulisha(a)xilinx.com>
who implemented a very similar fix to this issue.
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Yang Fei <fei.yang(a)intel.com>
Cc: Thinh Nguyen <thinhn(a)synopsys.com>
Cc: Tejas Joglekar <tejas.joglekar(a)synopsys.com>
Cc: Andrzej Pietrasiewicz <andrzej.p(a)collabora.com>
Cc: Jack Pham <jackp(a)codeaurora.org>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Linux USB List <linux-usb(a)vger.kernel.org>
Cc: stable <stable(a)vger.kernel.org> #4.20+
Signed-off-by: Pratham Pratap <prathampratap(a)codeaurora.org>
[jstultz: Slight tweak to remove sg_is_last() usage, reworked
commit message, minor comment tweak]
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Link: https://lore.kernel.org/r/20200302214443.55783-1-john.stultz@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1b7d2f9cb673..1e00bf2d65a2 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1071,7 +1071,14 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
unsigned int rem = length % maxp;
unsigned chain = true;
- if (sg_is_last(s))
+ /*
+ * IOMMU driver is coalescing the list of sgs which shares a
+ * page boundary into one and giving it to USB driver. With
+ * this the number of sgs mapped is not equal to the number of
+ * sgs passed. So mark the chain bit to false if it isthe last
+ * mapped sg.
+ */
+ if (i == remaining - 1)
chain = false;
if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: storage: Add quirk for Samsung Fit flash
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 86d92f5465958752481269348d474414dccb1552 Mon Sep 17 00:00:00 2001
From: Jim Lin <jilin(a)nvidia.com>
Date: Mon, 2 Mar 2020 22:21:35 +0800
Subject: usb: storage: Add quirk for Samsung Fit flash
Current driver has 240 (USB2.0) and 2048 (USB3.0) as max_sectors,
e.g., /sys/bus/scsi/devices/0:0:0:0/max_sectors
If data access times out, driver error handling will issue a port
reset.
Sometimes Samsung Fit (090C:1000) flash disk will not respond to
later Set Address or Get Descriptor command.
Adding this quirk to limit max_sectors to 64 sectors to avoid issue
occurring.
Signed-off-by: Jim Lin <jilin(a)nvidia.com>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1583158895-31342-1-git-send-email-jilin@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/storage/unusual_devs.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h
index 1cd9b6305b06..1880f3e13f57 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -1258,6 +1258,12 @@ UNUSUAL_DEV( 0x090a, 0x1200, 0x0000, 0x9999,
USB_SC_RBC, USB_PR_BULK, NULL,
0 ),
+UNUSUAL_DEV(0x090c, 0x1000, 0x1100, 0x1100,
+ "Samsung",
+ "Flash Drive FIT",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_MAX_SECTORS_64),
+
/* aeb */
UNUSUAL_DEV( 0x090c, 0x1132, 0x0000, 0xffff,
"Feiya",
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: usb251xb: fix regulator probe and error handling
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f06947f275f1838586792c17b6ab70da82ed7b43 Mon Sep 17 00:00:00 2001
From: Marco Felsch <m.felsch(a)pengutronix.de>
Date: Wed, 26 Feb 2020 08:26:44 +0100
Subject: usb: usb251xb: fix regulator probe and error handling
Commit 4d7201cda226 ("usb: usb251xb: add vdd supply support") didn't
covered the non-DT use-case and so the regualtor_enable() call during
probe will fail on those platforms. Also the commit didn't handled the
error case correctly.
Move devm_regulator_get() out of usb251xb_get_ofdata() to address the
1st issue. This can be done without worries because devm_regulator_get()
handles the non-DT use-case too. Add devm_add_action_or_reset() to
address the 2nd bug.
Fixes: 4d7201cda226 ("usb: usb251xb: add vdd supply support")
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Richard Leitner <richard.leitner(a)skidata.com>
Link: https://lore.kernel.org/r/20200226072644.18490-1-m.felsch@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/misc/usb251xb.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/misc/usb251xb.c b/drivers/usb/misc/usb251xb.c
index 10c9e7f6273e..29fe5771c21b 100644
--- a/drivers/usb/misc/usb251xb.c
+++ b/drivers/usb/misc/usb251xb.c
@@ -424,10 +424,6 @@ static int usb251xb_get_ofdata(struct usb251xb *hub,
return err;
}
- hub->vdd = devm_regulator_get(dev, "vdd");
- if (IS_ERR(hub->vdd))
- return PTR_ERR(hub->vdd);
-
if (of_property_read_u16_array(np, "vendor-id", &hub->vendor_id, 1))
hub->vendor_id = USB251XB_DEF_VENDOR_ID;
@@ -640,6 +636,13 @@ static int usb251xb_get_ofdata(struct usb251xb *hub,
}
#endif /* CONFIG_OF */
+static void usb251xb_regulator_disable_action(void *data)
+{
+ struct usb251xb *hub = data;
+
+ regulator_disable(hub->vdd);
+}
+
static int usb251xb_probe(struct usb251xb *hub)
{
struct device *dev = hub->dev;
@@ -676,10 +679,19 @@ static int usb251xb_probe(struct usb251xb *hub)
if (err)
return err;
+ hub->vdd = devm_regulator_get(dev, "vdd");
+ if (IS_ERR(hub->vdd))
+ return PTR_ERR(hub->vdd);
+
err = regulator_enable(hub->vdd);
if (err)
return err;
+ err = devm_add_action_or_reset(dev,
+ usb251xb_regulator_disable_action, hub);
+ if (err)
+ return err;
+
err = usb251xb_connect(hub);
if (err) {
dev_err(dev, "Failed to connect hub (%d)\n", err);
--
2.25.1
This is a note to let you know that I've just added the patch titled
phy: allwinner: Fix GENMASK misuse
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 96b4ea324ae92386db2b0c73ace597c80cde1ecb Mon Sep 17 00:00:00 2001
From: Rikard Falkeborn <rikard.falkeborn(a)gmail.com>
Date: Sun, 23 Feb 2020 00:41:25 +0100
Subject: phy: allwinner: Fix GENMASK misuse
Arguments are supposed to be ordered high then low.
Fixes: a228890f9458 ("phy: allwinner: add phy driver for USB3 PHY on Allwinner H6 SoC")
Signed-off-by: Rikard Falkeborn <rikard.falkeborn(a)gmail.com>
Tested-by: Ondrej Jirman <megous(a)megous.com>
Signed-off-by: Ondrej Jirman <megous(a)megous.com>
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Acked-by: Chen-Yu Tsai <wens(a)csie.org>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20191110124355.1569-1-rikard.falkeborn@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/phy/allwinner/phy-sun50i-usb3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/allwinner/phy-sun50i-usb3.c b/drivers/phy/allwinner/phy-sun50i-usb3.c
index 1169f3e83a6f..b1c04f71a31d 100644
--- a/drivers/phy/allwinner/phy-sun50i-usb3.c
+++ b/drivers/phy/allwinner/phy-sun50i-usb3.c
@@ -49,7 +49,7 @@
#define SUNXI_LOS_BIAS(n) ((n) << 3)
#define SUNXI_LOS_BIAS_MASK GENMASK(5, 3)
#define SUNXI_TXVBOOSTLVL(n) ((n) << 0)
-#define SUNXI_TXVBOOSTLVL_MASK GENMASK(0, 2)
+#define SUNXI_TXVBOOSTLVL_MASK GENMASK(2, 0)
struct sun50i_usb3_phy {
struct phy *phy;
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: quirks: add NO_LPM quirk for Logitech Screen Share
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From b96ed52d781a2026d0c0daa5787c6f3d45415862 Mon Sep 17 00:00:00 2001
From: Dan Lazewatsky <dlaz(a)chromium.org>
Date: Wed, 26 Feb 2020 14:34:38 +0000
Subject: usb: quirks: add NO_LPM quirk for Logitech Screen Share
LPM on the device appears to cause xHCI host controllers to claim
that there isn't enough bandwidth to support additional devices.
Signed-off-by: Dan Lazewatsky <dlaz(a)chromium.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan(a)collabora.com>
Link: https://lore.kernel.org/r/20200226143438.1445-1-gustavo.padovan@collabora.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 2b24336a72e5..2dac3e7cdd97 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -231,6 +231,9 @@ static const struct usb_device_id usb_quirk_list[] = {
/* Logitech PTZ Pro Camera */
{ USB_DEVICE(0x046d, 0x0853), .driver_info = USB_QUIRK_DELAY_INIT },
+ /* Logitech Screen Share */
+ { USB_DEVICE(0x046d, 0x086c), .driver_info = USB_QUIRK_NO_LPM },
+
/* Logitech Quickcam Fusion */
{ USB_DEVICE(0x046d, 0x08c1), .driver_info = USB_QUIRK_RESET_RESUME },
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: cdns3: gadget: link trb should point to next request
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 8a7c47fb7285b23ca259c888016513d5566fa9e8 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen(a)nxp.com>
Date: Wed, 19 Feb 2020 22:14:54 +0800
Subject: usb: cdns3: gadget: link trb should point to next request
It has marked the dequeue trb as link trb, but its next segment
pointer is still itself, it causes the transfer can't go on. Fix
it by set its pointer as the trb address for the next request.
Fixes: f616c3bda47e ("usb: cdns3: Fix dequeue implementation")
Signed-off-by: Peter Chen <peter.chen(a)nxp.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200219141455.23257-2-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/gadget.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/cdns3/gadget.c b/drivers/usb/cdns3/gadget.c
index 736b0c6e27fe..1d8a2af35bb0 100644
--- a/drivers/usb/cdns3/gadget.c
+++ b/drivers/usb/cdns3/gadget.c
@@ -2550,7 +2550,7 @@ int cdns3_gadget_ep_dequeue(struct usb_ep *ep,
/* Update ring only if removed request is on pending_req_list list */
if (req_on_hw_ring) {
link_trb->buffer = TRB_BUFFER(priv_ep->trb_pool_dma +
- (priv_req->start_trb * TRB_SIZE));
+ ((priv_req->end_trb + 1) * TRB_SIZE));
link_trb->control = (link_trb->control & TRB_CYCLE) |
TRB_TYPE(TRB_LINK) | TRB_CHAIN;
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: cdns3: gadget: toggle cycle bit before reset endpoint
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4bf2dd65135a2d7fe202f7c10d65b51bcf645ac6 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen(a)nxp.com>
Date: Wed, 19 Feb 2020 22:14:55 +0800
Subject: usb: cdns3: gadget: toggle cycle bit before reset endpoint
If there are TRBs pending during reset endpoint operation, the
DMA will advance after reset operation, but it isn't expected,
since the data is not yet available (For OUT, the data is not
yet available). After the data is ready, there won't be any
interrupt since the EP_TRADDR already points to next TRB entry
and doorbell is not set.
To fix it, it toggles cycle bit before reset operation, and restores
it after reset, it could avoid unexpected DMA advance due to
cycle bit is for software during the endpoint reset operation.
Fixes: 7733f6c32e36 ("usb: cdns3: Add Cadence USB3 DRD Driver")
Signed-off-by: Peter Chen <peter.chen(a)nxp.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200219141455.23257-3-peter.chen@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/cdns3/gadget.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/cdns3/gadget.c b/drivers/usb/cdns3/gadget.c
index 1d8a2af35bb0..3574dbb09366 100644
--- a/drivers/usb/cdns3/gadget.c
+++ b/drivers/usb/cdns3/gadget.c
@@ -2595,11 +2595,21 @@ int __cdns3_gadget_ep_clear_halt(struct cdns3_endpoint *priv_ep)
{
struct cdns3_device *priv_dev = priv_ep->cdns3_dev;
struct usb_request *request;
+ struct cdns3_request *priv_req;
+ struct cdns3_trb *trb = NULL;
int ret;
int val;
trace_cdns3_halt(priv_ep, 0, 0);
+ request = cdns3_next_request(&priv_ep->pending_req_list);
+ if (request) {
+ priv_req = to_cdns3_request(request);
+ trb = priv_req->trb;
+ if (trb)
+ trb->control = trb->control ^ TRB_CYCLE;
+ }
+
writel(EP_CMD_CSTALL | EP_CMD_EPRST, &priv_dev->regs->ep_cmd);
/* wait for EPRST cleared */
@@ -2610,10 +2620,11 @@ int __cdns3_gadget_ep_clear_halt(struct cdns3_endpoint *priv_ep)
priv_ep->flags &= ~(EP_STALLED | EP_STALL_PENDING);
- request = cdns3_next_request(&priv_ep->pending_req_list);
-
- if (request)
+ if (request) {
+ if (trb)
+ trb->control = trb->control ^ TRB_CYCLE;
cdns3_rearm_transfer(priv_ep, 1);
+ }
cdns3_start_all_request(priv_dev, priv_ep);
return ret;
--
2.25.1
This is a note to let you know that I've just added the patch titled
staging: kpc2000: prevent underflow in cpld_reconfigure()
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 72db61d7d17a475d3cc9de1a7c871d518fcd82f0 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Mon, 24 Feb 2020 13:33:25 +0300
Subject: staging: kpc2000: prevent underflow in cpld_reconfigure()
This function should not allow negative values of "wr_val". If
negatives are allowed then capping the upper bound at 7 is
meaningless. Let's make it unsigned.
Fixes: 7dc7967fc39a ("staging: kpc2000: add initial set of Daktronics drivers")
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200224103325.hrxdnaeqsthplu42@kili.mountain
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/kpc2000/kpc2000/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/kpc2000/kpc2000/core.c b/drivers/staging/kpc2000/kpc2000/core.c
index 93cf28febdf6..7b00d7069e21 100644
--- a/drivers/staging/kpc2000/kpc2000/core.c
+++ b/drivers/staging/kpc2000/kpc2000/core.c
@@ -110,10 +110,10 @@ static ssize_t cpld_reconfigure(struct device *dev,
const char *buf, size_t count)
{
struct kp2000_device *pcard = dev_get_drvdata(dev);
- long wr_val;
+ unsigned long wr_val;
int rv;
- rv = kstrtol(buf, 0, &wr_val);
+ rv = kstrtoul(buf, 0, &wr_val);
if (rv < 0)
return rv;
if (wr_val > 7)
--
2.25.1
The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption. Any concurrent page-table walk
could end up with a Use-after-Free. Even on UP this might give issues, since
mmu_gather is preemptible these days. An interrupt or preempted task accessing
user pages might stumble into the free page if the hardware caches page
directories.
The series is a backport of the fix sent by Peter [1].
The first three patches are dependencies for the last patch (avoid potential
double flush). If the performance impact due to double flush is considered
trivial then the first three patches and last patch may be dropped.
This is only for v4.19 stable.
Changelog:
* Send the patches with the correct format (commit sha1 upstream) for stable
--
Aneesh Kumar K.V (1):
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Peter Zijlstra (4):
asm-generic/tlb: Track freeing of page-table directories in struct
mmu_gather
asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
mm/mmu_gather: invalidate TLB correctly on batch allocation failure
and flush
asm-generic/tlb: avoid potential double flush
Will Deacon (1):
asm-generic/tlb: Track which levels of the page tables have been
cleared
arch/Kconfig | 3 -
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 --
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 -
arch/powerpc/include/asm/tlb.h | 11 ++
arch/powerpc/mm/pgtable-book3s64.c | 7 --
arch/sparc/include/asm/tlb_64.h | 9 ++
arch/x86/Kconfig | 1 -
include/asm-generic/tlb.h | 103 ++++++++++++++++---
mm/memory.c | 20 ++--
10 files changed, 122 insertions(+), 44 deletions(-)
--
2.24.1
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 71926bbb9dfd - Linux 5.4.24-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 3:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 3:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 4:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 6:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ IOMMU boot test
🚧 ✅ Storage blktests
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
On Tue, 2020-03-03 at 11:19 -0800, Evgenii Stepanov wrote:
> I'm a bit surprised that this is necessary, given that the earlier
> patch that added the (current->flags & PF_KTHREAD) condition was in
> response to this exact problem, and I know for sure that it helped.
> This was the stack trace for the call to __range_ok in that case:
> [ 12.886765] c1 271 _copy_to_iter+0xb8/0x5c0
>
> [ 12.891421] c1 271 ffs_user_copy_worker+0xec/0x24c
> [ 12.896699] c1 271 process_one_work+0x264/0x450
> [ 12.901713] c1 271 worker_thread+0x250/0x484
> [ 12.906454] c1 271 kthread+0x11c/0x12c
> [ 12.910664] c1 271 ret_from_fork+0x10/0x18
> It would be great to know what changed to require the updated
> condition.
> Adding a prctl call to adb is unlikely to help, because it would not
> stop tagged address generation in malloc.
Sorry for late reply, after carefully check the kerenl update status
in Mediatek's branch. I've found we got this patch (df325e05a682
("arm64: Validate tagged addresses in access_ok() called from kernel
threads")) updated into internal Mediatek's working tree around Feb 23
or 24. However, I'm not sure why that patch cannot work in my own
working tree at that time. I've indeed dumped user space address and
checked the return result in access_ok() and found it was not worked.
In these days I've clean up all my working space and re-test this patch,
I've found to check PF_KTHREAD and TIF_TAGGED_ADDR was enough to solve
this problem. Sorry for bothering I'm not sure what causes that fail in
previous environment.
Moreover, I've tested PF_WQ_WORKER case, if we replaced test flag
PF_KTHREAD by PF_WQ_WORKER, AIO will still be worked, too. Both code
A.
if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
(current->flags & PF_KTHREAD || test_thread_flag(TIF_TAGGED_ADDR)))
or
B.
if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
(current->flags & PF_WQ_WORKER || test_thread_flag(TIF_TAGGED_ADDR)))
are worked for this issue.
> On Mon, Mar 2, 2020 at 8:19 AM Catalin Marinas
> <catalin.marinas(a)arm.com> wrote:
>
> On Sun, Mar 01, 2020 at 11:20:43AM +0800, Macpaul Lin wrote:
> > On Fri, 2020-02-28 at 16:48 +0000, Catalin Marinas wrote:
> > > On Wed, Feb 26, 2020 at 08:01:52PM +0800, Macpaul Lin
> wrote:
> > > > diff --git a/drivers/usb/gadget/function/f_fs.c
> b/drivers/usb/gadget/function/f_fs.c
> > > > index ce1d023..192935f 100644
> > > > --- a/drivers/usb/gadget/function/f_fs.c
> > > > +++ b/drivers/usb/gadget/function/f_fs.c
> > > > @@ -715,7 +715,20 @@ static void
> ffs_epfile_io_complete(struct usb_ep *_ep, struct usb_request
> *req)
> > > >
> > > > static ssize_t ffs_copy_to_iter(void *data, int
> data_len, struct iov_iter *iter)
> > > > {
> > > > - ssize_t ret = copy_to_iter(data, data_len, iter);
> > > > + ssize_t ret;
> > > > +
> > > > +#if defined(CONFIG_ARM64)
> > > > + /*
> > > > + * Replace tagged address passed by user space
> application before
> > > > + * copying.
> > > > + */
> > > > + if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> > > > + (iter->type == ITER_IOVEC)) {
> > > > + *(unsigned long *)&iter->iov->iov_base =
> > > > + (unsigned
> long)untagged_addr(iter->iov->iov_base);
> > > > + }
> > > > +#endif
> > > > + ret = copy_to_iter(data, data_len, iter);
> > > > if (likely(ret == data_len))
> > > > return ret;
> > >
> > > I had forgotten that we discussed a similar case already a
> few months
> > > ago (thanks to Evgenii for pointing out). Do you have this
> commit
> > > applied to your tree: df325e05a682 ("arm64: Validate
> tagged addresses in
> > > access_ok() called from kernel threads")?
> > >
> >
> > Yes! We have that patch. I've also got Google's reply about
> referencing
> > this patch in android kernel tree.
> >
> https://android-review.googlesource.com/c/kernel/common/+/1186615
> >
> > However, during my debugging process, I've dumped specific
> length (e.g.,
> > 24 bytes for the first request) AIO request buffer address
> both in adbd
> > and in __range_ok(). Then I've found __range_ok() still
> always return
> > false on address begin with "0x3c". Since untagged_addr()
> already called
> > in __range_ok(), to set "TIF_TAGGED_ADDR" with adbd's user
> space buffer
> > should be the possible solution. Hence I've send the v3
> patch.
>
> ffs_copy_to_iter() is called from a workqueue
> (ffs_user_copy_worker()).
> That's still in a kernel thread context but it doesn't have
> PF_KTHREAD
> set, hence __range_ok() rejects the tagged address. Can you
> try the diff
> below:
>
> diff --git a/arch/arm64/include/asm/uaccess.h
> b/arch/arm64/include/asm/uaccess.h
> index 32fc8061aa76..2803143cad1f 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -68,7 +68,8 @@ static inline unsigned long __range_ok(const
> void __user *addr, unsigned long si
> * the user address before checking.
> */
> if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI) &&
> - (current->flags & PF_KTHREAD ||
> test_thread_flag(TIF_TAGGED_ADDR)))
> + (current->flags & (PF_KTHREAD | PF_WQ_WORKER) ||
> + test_thread_flag(TIF_TAGGED_ADDR)))
> addr = untagged_addr(addr);
>
> __chk_user_ptr(addr);
> -
Many thanks to Catalin and Evgenii.
Regards,
Macpaul Lin
The patch titled
Subject: mm: fork: fix kernel_stack memcg stats for various stack implementations
has been added to the -mm tree. Its filename is
mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-fork-fix-kernel_stack-memcg-sta…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fork-fix-kernel_stack-memcg-sta…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: fork: fix kernel_stack memcg stats for various stack implementations
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node(). In the first and the
second cases page->mem_cgroup pointer is set, but in the third it's not:
memcg membership of a slab page should be determined using the
memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg . In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.
It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).
In order to fix it, let's introduce a mod_memcg_obj_state() helper, which
takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state(). It allows to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .
Note: this patch has been first posted as a part of the new slab
controller patchset. This is a slightly updated version: the fixes tag
has been added and the commit log was extended by the advice of Johannes
Weiner. Because it's a fix that makes sense by itself, I'm re-posting it
as a standalone patch.
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Bharata B Rao <bharata(a)linux.ibm.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 5 +++++
kernel/fork.c | 4 ++--
mm/memcontrol.c | 11 +++++++++++
3 files changed, 18 insertions(+), 2 deletions(-)
--- a/include/linux/memcontrol.h~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/include/linux/memcontrol.h
@@ -695,6 +695,7 @@ static inline unsigned long lruvec_page_
void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
+void mod_memcg_obj_state(void *p, int idx, int val);
static inline void mod_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
@@ -1123,6 +1124,10 @@ static inline void __mod_lruvec_slab_sta
__mod_node_page_state(page_pgdat(page), idx, val);
}
+static inline void mod_memcg_obj_state(void *p, int idx, int val)
+{
+}
+
static inline
unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
--- a/kernel/fork.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/kernel/fork.c
@@ -397,8 +397,8 @@ static void account_kernel_stack(struct
mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
THREAD_SIZE / 1024 * account);
- mod_memcg_page_state(first_page, MEMCG_KERNEL_STACK_KB,
- account * (THREAD_SIZE / 1024));
+ mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB,
+ account * (THREAD_SIZE / 1024));
}
}
--- a/mm/memcontrol.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/mm/memcontrol.c
@@ -777,6 +777,17 @@ void __mod_lruvec_slab_state(void *p, en
rcu_read_unlock();
}
+void mod_memcg_obj_state(void *p, int idx, int val)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+ memcg = mem_cgroup_from_obj(p);
+ if (memcg)
+ mod_memcg_state(memcg, idx, val);
+ rcu_read_unlock();
+}
+
/**
* __count_memcg_events - account VM events in a cgroup
* @memcg: the memory cgroup
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations.patch
mm-memcg-slab-introduce-mem_cgroup_from_obj.patch
mm-memcg-slab-introduce-mem_cgroup_from_obj-v2.patch
mm-kmem-cleanup-__memcg_kmem_charge_memcg-arguments.patch
mm-kmem-cleanup-memcg_kmem_uncharge_memcg-arguments.patch
mm-kmem-rename-memcg_kmem_uncharge-into-memcg_kmem_uncharge_page.patch
mm-kmem-switch-to-nr_pages-in-__memcg_kmem_charge_memcg.patch
mm-memcg-slab-cache-page-number-in-memcg_uncharge_slab.patch
mm-kmem-rename-__memcg_kmem_uncharge_memcg-to-__memcg_kmem_uncharge.patch
On 3/3/20 11:34 AM, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> padata: always acquire cpu_hotplug_lock before pinst->lock
>
> to the 4.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> padata-always-acquire-cpu_hotplug_lock-before-pinst-lock.patch
> and it can be found in the queue-4.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
> From 38228e8848cd7dd86ccb90406af32de0cad24be3 Mon Sep 17 00:00:00 2001
> From: Daniel Jordan <daniel.m.jordan(a)oracle.com>
> Date: Tue, 3 Dec 2019 14:31:11 -0500
> Subject: padata: always acquire cpu_hotplug_lock before pinst->lock
>
> From: Daniel Jordan <daniel.m.jordan(a)oracle.com>
>
> commit 38228e8848cd7dd86ccb90406af32de0cad24be3 upstream.
Hi, can this please be removed from all stable trees? I guess this made its way back in after Sasha removed it before. Justification is at
https://lkml.kernel.org/r/20200222000045.cl45vclfhvkjursm@ca-dmjordan1.us.o…
thanks,
Daniel
The WMI method to set the charge threshold does not provide a
way to specific a battery, so we assume it is the first/primary
battery (by checking if the name is BAT0).
On some newer ASUS laptops (Zenbook UM431DA) though, the
primary/first battery isn't named BAT0 but BATT, so we need
to support that case.
Signed-off-by: Kristian Klausen <kristian(a)klausen.dk>
Cc: stable(a)vger.kernel.org
---
I'm not sure if this is candidate for -stable, it fix a real bug
(charge threshold doesn't work on newer ASUS laptops) which has been
reported by a user[1], but is that enough?
I had a quick look at[2], can this be considered a "something
critical"? It "bothers people"[1]. My point: I'm not sure..
I'm unsure if there is a bettery way to fix this. Maybe a counter
would be better (+1 for every new battery)? It would probably need
to be atomic to prevent race condition (I'm not sure how this code
is run), but this "fix" is way simpler.
Please do not accept this patch just yet, I'm waiting for the tester
to either confirm or deny credit[3].
[1] https://gist.github.com/klausenbusk/643f15320ae8997427155c38be13e445#gistco…
[2] https://www.kernel.org/doc/html/v5.5/process/stable-kernel-rules.html
[3] https://gist.github.com/klausenbusk/643f15320ae8997427155c38be13e445#gistco…
v2:
Add base commit
Remove uneeded parenthesis and adjust styling
drivers/platform/x86/asus-wmi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 612ef5526226..01becbe2a9a8 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -426,8 +426,11 @@ static int asus_wmi_battery_add(struct power_supply *battery)
{
/* The WMI method does not provide a way to specific a battery, so we
* just assume it is the first battery.
+ * Note: On some newer ASUS laptops (Zenbook UM431DA), the primary/first
+ * battery is named BATT.
*/
- if (strcmp(battery->desc->name, "BAT0") != 0)
+ if (strcmp(battery->desc->name, "BAT0") != 0 &&
+ strcmp(battery->desc->name, "BATT") != 0)
return -ENODEV;
if (device_create_file(&battery->dev,
base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
--
2.25.1
The WMI method to set the charge threshold does not provide a
way to specific a battery, so we assume it is the first/primary
battery (by checking if the name is BAT0).
On some newer ASUS laptops (Zenbook UM431DA) though, the
primary/first battery isn't named BAT0 but BATT, so we need
to support that case.
Fixes: 7973353e92ee ("platform/x86: asus-wmi: Refactor charge threshold to use the battery hooking API")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kristian Klausen <kristian(a)klausen.dk>
---
I'm not sure if this is candidate for -stable, it fix a real bug
(charge threshold doesn't work on newer ASUS laptops) which has been
reported by a user[1], but is that enough?
I had a quick look at[2], can this be considered a "something
critical"? It "bothers people"[1]. My point: I'm not sure..
I'm unsure if there is a bettery way to fix this. Maybe a counter
would be better (+1 for every new battery)? It would probably need
to be atomic to prevent race condition (I'm not sure how this code
is run), but this "fix" is way simpler.
Please do not accept this patch just yet, I'm waiting for the tester
to either confirm or deny credit[3].
[1] https://gist.github.com/klausenbusk/643f15320ae8997427155c38be13e445#gistco…
[2] https://www.kernel.org/doc/html/v5.5/process/stable-kernel-rules.html
[3] https://gist.github.com/klausenbusk/643f15320ae8997427155c38be13e445#gistco…
v3:
Add Fixes.
v2:
Add base commit
Remove uneeded parenthesis and adjust styling
drivers/platform/x86/asus-wmi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
index 612ef5526226..01becbe2a9a8 100644
--- a/drivers/platform/x86/asus-wmi.c
+++ b/drivers/platform/x86/asus-wmi.c
@@ -426,8 +426,11 @@ static int asus_wmi_battery_add(struct power_supply *battery)
{
/* The WMI method does not provide a way to specific a battery, so we
* just assume it is the first battery.
+ * Note: On some newer ASUS laptops (Zenbook UM431DA), the primary/first
+ * battery is named BATT.
*/
- if (strcmp(battery->desc->name, "BAT0") != 0)
+ if (strcmp(battery->desc->name, "BAT0") != 0 &&
+ strcmp(battery->desc->name, "BATT") != 0)
return -ENODEV;
if (device_create_file(&battery->dev,
base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
--
2.25.1
The way cookie_init_hw_msi_region() allocates the iommu_dma_msi_page
structures doesn't match the way iommu_put_dma_cookie() frees them.
The former performs a single allocation of all the required structures,
while the latter tries to free them one at a time. It doesn't quite
work for the main use case (the GICv3 ITS where the range is 64kB)
when the base ganule size is 4kB.
This leads to a nice slab corruption on teardown, which is easily
observable by simply creating a VF on a SRIOV-capable device, and
tearing it down immediately (no need to even make use of it).
Fix it by allocating iommu_dma_msi_page structures one at a time.
Fixes: 7c1b058c8b5a3 ("iommu/dma: Handle IOMMU API reserved regions")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Eric Auger <eric.auger(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/iommu/dma-iommu.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..01fa64856c12 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -171,25 +171,37 @@ static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
phys_addr_t start, phys_addr_t end)
{
struct iova_domain *iovad = &cookie->iovad;
- struct iommu_dma_msi_page *msi_page;
- int i, num_pages;
+ struct iommu_dma_msi_page *msi_page, *tmp;
+ int i, num_pages, ret = 0;
+ phys_addr_t base;
- start -= iova_offset(iovad, start);
+ base = start -= iova_offset(iovad, start);
num_pages = iova_align(iovad, end - start) >> iova_shift(iovad);
- msi_page = kcalloc(num_pages, sizeof(*msi_page), GFP_KERNEL);
- if (!msi_page)
- return -ENOMEM;
-
for (i = 0; i < num_pages; i++) {
- msi_page[i].phys = start;
- msi_page[i].iova = start;
- INIT_LIST_HEAD(&msi_page[i].list);
- list_add(&msi_page[i].list, &cookie->msi_page_list);
+ msi_page = kmalloc(sizeof(*msi_page), GFP_KERNEL);
+ if (!msi_page) {
+ ret = -ENOMEM;
+ break;
+ }
+ msi_page->phys = start;
+ msi_page->iova = start;
+ INIT_LIST_HEAD(&msi_page->list);
+ list_add(&msi_page->list, &cookie->msi_page_list);
start += iovad->granule;
}
- return 0;
+ if (ret) {
+ list_for_each_entry_safe(msi_page, tmp,
+ &cookie->msi_page_list, list) {
+ if (msi_page->phys >= base && msi_page->phys < start) {
+ list_del(&msi_page->list);
+ kfree(msi_page);
+ }
+ }
+ }
+
+ return ret;
}
static int iova_reserve_pci_windows(struct pci_dev *dev,
--
2.20.1
[Posting again after correcting CC’ed e-mail id]
These patches include few backported fixes for the 4.4 stable
tree.
I would appreciate if you could kindly consider including them in the
next release.
Ajay
---
[Changes from v3]:
- Dropped [Patch v3 8/8] [2] as patches 1-7 are independent from patch 8
and patch 8 may require more work.
[Changes from v2]:
Merged following changes from Vlastimil's series [1]:
- Added page_ref_count() in [Patch v3 5/8]
- Added missing refcount overflow checks on x86 and s390 [Patch v3 5/8]
- Added [Patch v3 8/8]
- Removed 7aef4172c795 i.e. [Patch v2 3/8]
[1] https://lore.kernel.org/stable/20191108093814.16032-1-vbabka@suse.cz/
[2] https://lore.kernel.org/stable/1576529149-14269-9-git-send-email-akaher@vmw…
---
[PATCH v4 1/7]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit
[PATCH v4 2/7]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function
[PATCH v4 3/7]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
[PATCH v4 4/7]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages
[PATCH v4 5/7]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount
[PATCH v4 6/7]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper
[PATCH v4 7/7]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get
Hi Greg, Sasha,
Can you please include this perf stat fix to 4.19?
These two commits needed:
commit eb08d006054e7e374592068919e32579988602d4
Author: Ravi Bangoria <ravi.bangoria(a)linux.ibm.com>
Date: Thu Nov 15 15:25:32 2018 +0530
perf stat: Use perf_evsel__is_clocki() for clock events
commit 57ddf09173c1e7d0511ead8924675c7198e56545
Author: Ravi Bangoria <ravi.bangoria(a)linux.ibm.com>
Date: Fri Nov 16 09:58:43 2018 +0530
perf stat: Fix shadow stats for clock events
-Tommi
From: Jason Wang <jasowang(a)redhat.com>
[ Upstream commit 2f3ab6221e4c87960347d65c7cab9bd917d1f637 ]
When link is down, writes to the device might fail with
-EIO. Userspace needs an indication when the status is resolved. As a
fix, tun_net_open() attempts to wake up writers - but that is only
effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is
not the case of vhost_net which only poll for EPOLLOUT after it meets
errors during sendmsg().
This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when
socket is not writable or device is down to guarantee EPOLLOUT will be
raised in either tun_chr_poll() or tun_sock_write_space() after device
is up.
Cc: Hannes Frederic Sowa <hannes(a)stressinduktion.org>
Cc: Eric Dumazet <edumazet(a)google.com>
Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Tommi Rantala <tommi.t.rantala(a)nokia.com>
---
drivers/net/tun.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3086211829a7..ba34f61d70de 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1134,6 +1134,13 @@ static void tun_net_init(struct net_device *dev)
dev->max_mtu = MAX_MTU - dev->hard_header_len;
}
+static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file *tfile)
+{
+ struct sock *sk = tfile->socket.sk;
+
+ return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
+}
+
/* Character device part */
/* Poll */
@@ -1156,10 +1163,14 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait)
if (!skb_array_empty(&tfile->tx_array))
mask |= POLLIN | POLLRDNORM;
- if (tun->dev->flags & IFF_UP &&
- (sock_writeable(sk) ||
- (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
- sock_writeable(sk))))
+ /* Make sure SOCKWQ_ASYNC_NOSPACE is set if not writable to
+ * guarantee EPOLLOUT to be raised by either here or
+ * tun_sock_write_space(). Then process could get notification
+ * after it writes to a down device and meets -EIO.
+ */
+ if (tun_sock_writeable(tun, tfile) ||
+ (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
+ tun_sock_writeable(tun, tfile)))
mask |= POLLOUT | POLLWRNORM;
if (tun->dev->reg_state != NETREG_REGISTERED)
--
2.21.1
After include 3b5a39979daf ("slip: Fix memory leak in slip_open error path")
and e58c19124189 ("slip: Fix use-after-free Read in slip_open") with 4.4.y/4.9.y.
We will trigger a bug since we can double free sl->dev in slip_open. Actually,
we should backport cf124db566e6 ("net: Fix inconsistent teardown and release
of private netdev state.") too since it has delete free_netdev from sl_free_netdev.
Fix it by delete free_netdev from slip_open.
Signed-off-by: yangerkun <yangerkun(a)huawei.com>
---
drivers/net/slip/slip.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index 0f8d5609ed51..d4a33baa33b6 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -868,7 +868,6 @@ err_free_chan:
tty->disc_data = NULL;
clear_bit(SLF_INUSE, &sl->flags);
sl_free_netdev(sl->dev);
- free_netdev(sl->dev);
err_exit:
rtnl_unlock();
--
2.23.0.rc2.8.gff66981f45
Hi,
I still see high (upto 30%) ksoftirqd cpu use with 4.19.101+ after these 2 back ports went in for 4.19.101
(had all 4 backports applied earlier to our tree):
commit f6783319737f28e4436a69611853a5a098cbe974 sched/fair: Fix insertion in rq->leaf_cfs_rq_list
commit 5d299eabea5a251fbf66e8277704b874bbba92dc sched/fair: Add tmp_alone_branch assertion
perf shows for any given ksoftirqd, with 20k-30k processes on the system with high scheduler load:
58.88% ksoftirqd/0 [kernel.kallsyms] [k] update_blocked_averages
Can we backport these 2 also, confirmed that it fixes this behavior of ksoftirqd.
commit 039ae8bcf7a5f4476f4487e6bf816885fb3fb617 upstream
commit 31bc6aeaab1d1de8959b67edbed5c7a4b3cdbe7c upstream
The second one doesn’t apply cleanly, there’s a small change for the last diff, where pelt needs renaming to task (cfg_rq_clock_pelt was cfs_rq_clock_task in 4.19.y) in that unchanged part of the diff:
@@ -7700,10 +7720,6 @@ static void update_blocked_averages(int cpu)
for_each_leaf_cfs_rq(rq, cfs_rq) {
struct sched_entity *se;
- /* throttled entities do not contribute to load */
- if (throttled_hierarchy(cfs_rq))
- continue;
-
if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq))
update_tg_load_avg(cfs_rq, 0);
I can post that patch with that rename if required.
Thanks
Vishnu
Address below Coverity complaint (Feb 25, 2020, 8:06 AM CET):
*** CID 1458999: Error handling issues (CHECKED_RETURN)
/drivers/usb/core/hub.c: 1869 in hub_probe()
1863
1864 if (id->driver_info & HUB_QUIRK_CHECK_PORT_AUTOSUSPEND)
1865 hub->quirk_check_port_auto_suspend = 1;
1866
1867 if (id->driver_info & HUB_QUIRK_DISABLE_AUTOSUSPEND) {
1868 hub->quirk_disable_autosuspend = 1;
>>> CID 1458999: Error handling issues (CHECKED_RETURN)
>>> Calling "usb_autopm_get_interface" without checking return value (as is done elsewhere 97 out of 111 times).
1869 usb_autopm_get_interface(intf);
1870 }
1871
1872 if (hub_configure(hub, &desc->endpoint[0].desc) >= 0)
1873 return 0;
1874
Rather than checking the return value of 'usb_autopm_get_interface()',
switch to the usb_autopm_get_interface_no_resume() API, as per:
On Tue, Feb 25, 2020 at 10:32:32AM -0500, Alan Stern wrote:
------ 8< ------
> This change (i.e. 'ret = usb_autopm_get_interface') is not necessary,
> because the resume operation cannot fail at this point (interfaces
> are always powered-up during probe). A better solution would be to
> call usb_autopm_get_interface_no_resume() instead.
------ 8< ------
Fixes: 1208f9e1d758c9 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub")
Cc: Hardik Gajjar <hgajjar(a)de.adit-jv.com>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org # v4.14+
Reported-by: scan-admin(a)coverity.com
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Eugeniu Rosca <erosca(a)de.adit-jv.com>
---
v3:
- Make the summary line more clear
- s/autpm/autopm/ in patch description
- Cc <stable> with v4.14+ since v4.14.x is the earliest stable kernel
which accepted commit 1208f9e1d758c9 ("USB: hub: Fix the broken
detection of USB3 device in SMSC hub")
v2:
- [Alan Stern] Use usb_autopm_get_interface_no_resume() instead of
usb_autopm_get_interface()
- Augment commit description to provide background
- Link: https://lore.kernel.org/lkml/20200225183057.31953-1-erosca@de.adit-jv.com
v1:
- Link: https://lore.kernel.org/lkml/20200225130846.20236-1-erosca@de.adit-jv.com
---
drivers/usb/core/hub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 1d212f82c69b..1105983b5c1c 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1866,7 +1866,7 @@ static int hub_probe(struct usb_interface *intf, const struct usb_device_id *id)
if (id->driver_info & HUB_QUIRK_DISABLE_AUTOSUSPEND) {
hub->quirk_disable_autosuspend = 1;
- usb_autopm_get_interface(intf);
+ usb_autopm_get_interface_no_resume(intf);
}
if (hub_configure(hub, &desc->endpoint[0].desc) >= 0)
--
2.25.1
The remaining count should not include successful
shrink attempts.
Fixes: e7e9a307be9d ("staging: erofs: introduce workstation for decompression")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
---
Changes since v1:
- Add "Fixes:" tags respectively suggested by Eric. I'd suggest
no rush to backport PATCH 2/3 and 3/3 since it's not quite
sure whether they behave well for normal images for now and
I will backport them manually later since they have no impact
on system stability with corrupted images;
- Fix PATCH 2/3 to exclude legacy (no decompression inplace
support, < v5.3) images.
fs/erofs/utils.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index fddc5059c930..df42ea552a44 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -286,7 +286,7 @@ static unsigned long erofs_shrink_scan(struct shrinker *shrink,
spin_unlock(&erofs_sb_list_lock);
sbi->shrinker_run_no = run_no;
- freed += erofs_shrink_workstation(sbi, nr);
+ freed += erofs_shrink_workstation(sbi, nr - freed);
spin_lock(&erofs_sb_list_lock);
/* Get the next list element before we move this one */
--
2.17.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7febbcbc48fc92e3f33863b32ed715ba4aff18c4 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Date: Tue, 11 Feb 2020 15:55:59 +0200
Subject: [PATCH] serial: 8250: Check UPF_IRQ_SHARED in advance
The commit 54e53b2e8081
("tty: serial: 8250: pass IRQ shared flag to UART ports")
nicely explained the problem:
---8<---8<---
On some systems IRQ lines between multiple UARTs might be shared. If so, the
irqflags have to be configured accordingly. The reason is: The 8250 port startup
code performs IRQ tests *before* the IRQ handler for that particular port is
registered. This is performed in serial8250_do_startup(). This function checks
whether IRQF_SHARED is configured and only then disables the IRQ line while
testing.
This test is performed upon each open() of the UART device. Imagine two UARTs
share the same IRQ line: On is already opened and the IRQ is active. When the
second UART is opened, the IRQ line has to be disabled while performing IRQ
tests. Otherwise an IRQ might handler might be invoked, but the IRQ itself
cannot be handled, because the corresponding handler isn't registered,
yet. That's because the 8250 code uses a chain-handler and invokes the
corresponding port's IRQ handling routines himself.
Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device
tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't
disabled while performing tests and the kernel correctly detects a spurious
IRQ. So, adding this flag to the DT probe solves the issue.
Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the
IRQF_SHARED flag can be set unconditionally as well.
Example stack trace by performing `echo 1 > /dev/ttyS2` on a non-patched system:
|irq 85: nobody cared (try booting with the "irqpoll" option)
| [...]
|handlers:
|[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt
|Disabling IRQ #85
---8<---8<---
But unfortunately didn't fix the root cause. Let's try again here by moving
IRQ flag assignment from serial_link_irq_chain() to serial8250_do_startup().
This should fix the similar issue reported for 8250_pnp case.
Since this change we don't need to have custom solutions in 8250_aspeed_vuart
and 8250_of drivers, thus, drop them.
Fixes: 1c2f04937b3e ("serial: 8250: add IRQ trigger support")
Reported-by: Li RongQing <lirongqing(a)baidu.com>
Cc: Kurt Kanzenbach <kurt(a)linutronix.de>
Cc: Vikram Pandita <vikram.pandita(a)ti.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Kurt Kanzenbach <kurt(a)linutronix.de>
Link: https://lore.kernel.org/r/20200211135559.85960-1-andriy.shevchenko@linux.in…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_aspeed_vuart.c b/drivers/tty/serial/8250/8250_aspeed_vuart.c
index d657aa14c3e4..c33e02cbde93 100644
--- a/drivers/tty/serial/8250/8250_aspeed_vuart.c
+++ b/drivers/tty/serial/8250/8250_aspeed_vuart.c
@@ -446,7 +446,6 @@ static int aspeed_vuart_probe(struct platform_device *pdev)
port.port.line = rc;
port.port.irq = irq_of_parse_and_map(np, 0);
- port.port.irqflags = IRQF_SHARED;
port.port.handle_irq = aspeed_vuart_handle_irq;
port.port.iotype = UPIO_MEM;
port.port.type = PORT_16550A;
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 0894a22fd702..f2a33c9082a6 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -174,7 +174,7 @@ static int serial_link_irq_chain(struct uart_8250_port *up)
struct hlist_head *h;
struct hlist_node *n;
struct irq_info *i;
- int ret, irq_flags = up->port.flags & UPF_SHARE_IRQ ? IRQF_SHARED : 0;
+ int ret;
mutex_lock(&hash_mutex);
@@ -209,9 +209,8 @@ static int serial_link_irq_chain(struct uart_8250_port *up)
INIT_LIST_HEAD(&up->list);
i->head = &up->list;
spin_unlock_irq(&i->lock);
- irq_flags |= up->port.irqflags;
ret = request_irq(up->port.irq, serial8250_interrupt,
- irq_flags, up->port.name, i);
+ up->port.irqflags, up->port.name, i);
if (ret < 0)
serial_do_unlink(i, up);
}
diff --git a/drivers/tty/serial/8250/8250_of.c b/drivers/tty/serial/8250/8250_of.c
index 531ad67395e0..f6687756ec5e 100644
--- a/drivers/tty/serial/8250/8250_of.c
+++ b/drivers/tty/serial/8250/8250_of.c
@@ -202,7 +202,6 @@ static int of_platform_serial_setup(struct platform_device *ofdev,
port->type = type;
port->uartclk = clk;
- port->irqflags |= IRQF_SHARED;
if (of_property_read_bool(np, "no-loopback-test"))
port->flags |= UPF_SKIP_TEST;
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 430e3467aff7..0325f2e53b74 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2177,6 +2177,10 @@ int serial8250_do_startup(struct uart_port *port)
}
}
+ /* Check if we need to have shared IRQs */
+ if (port->irq && (up->port.flags & UPF_SHARE_IRQ))
+ up->port.irqflags |= IRQF_SHARED;
+
if (port->irq && !(up->port.flags & UPF_NO_THRE_TEST)) {
unsigned char iir1;
/*
When doing the 4.9 merge into certain Android trees, I noticed a warning
from Android's deprecated GCC 4.9.4, which causes a build failure in
those trees due to basically -Werror:
fs/ecryptfs/keystore.c: In function 'ecryptfs_parse_packet_set':
fs/ecryptfs/keystore.c:1357:2: warning: 'auth_tok_list_item' may be used
uninitialized in this function [-Wmaybe-uninitialized]
memset(auth_tok_list_item, 0,
^
fs/ecryptfs/keystore.c:1260:38: note: 'auth_tok_list_item' was declared
here
struct ecryptfs_auth_tok_list_item *auth_tok_list_item;
^
GCC 9.2.0 was not able to pick up this warning when I tested it.
Turns out that Clang warns as well when -Wuninitialized is used, which
is not the case in older stable trees at the moment (but shows value in
potentially backporting the various warning fixes currently in upstream
to get more coverage).
fs/ecryptfs/keystore.c:1284:6: warning: variable 'auth_tok_list_item' is
used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (data[(*packet_size)++] != ECRYPTFS_TAG_1_PACKET_TYPE) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ecryptfs/keystore.c:1360:4: note: uninitialized use occurs here
auth_tok_list_item);
^~~~~~~~~~~~~~~~~~
fs/ecryptfs/keystore.c:1284:2: note: remove the 'if' if its condition is
always false
if (data[(*packet_size)++] != ECRYPTFS_TAG_1_PACKET_TYPE) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ecryptfs/keystore.c:1260:56: note: initialize the variable
'auth_tok_list_item' to silence this warning
struct ecryptfs_auth_tok_list_item *auth_tok_list_item;
^
= NULL
1 warning generated.
Somehow, commit fe2e082f5da5 ("ecryptfs: fix a memory leak bug in
parse_tag_1_packet()") upstream was not applied in the correct if block
in 4.4.215, 4.9.215, and 4.14.172, which will indeed lead to use of
uninitialized memory. Fix it up by undoing the bad backport in those
trees then reapplying the patch in the proper location.
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
---
fs/ecryptfs/keystore.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
index 3f3ec50bf773..b134315fb69d 100644
--- a/fs/ecryptfs/keystore.c
+++ b/fs/ecryptfs/keystore.c
@@ -1285,7 +1285,7 @@ parse_tag_1_packet(struct ecryptfs_crypt_stat *crypt_stat,
printk(KERN_ERR "Enter w/ first byte != 0x%.2x\n",
ECRYPTFS_TAG_1_PACKET_TYPE);
rc = -EINVAL;
- goto out_free;
+ goto out;
}
/* Released: wipe_auth_tok_list called in ecryptfs_parse_packet_set or
* at end of function upon failure */
@@ -1335,7 +1335,7 @@ parse_tag_1_packet(struct ecryptfs_crypt_stat *crypt_stat,
printk(KERN_WARNING "Tag 1 packet contains key larger "
"than ECRYPTFS_MAX_ENCRYPTED_KEY_BYTES");
rc = -EINVAL;
- goto out;
+ goto out_free;
}
memcpy((*new_auth_tok)->session_key.encrypted_key,
&data[(*packet_size)], (body_size - (ECRYPTFS_SIG_SIZE + 2)));
--
2.25.1
From: Daniel Golle <daniel(a)makrotopia.org>
[ Upstream commit 87c5cbf71ecbb9e289d60a2df22eb686c70bf196 ]
On AR934x this UART is usually not initialized by the bootloader
as it is only used as a secondary serial port while the primary
UART is a newly introduced NS16550-compatible.
In order to make use of the ar933x-uart on AR934x without RTS/CTS
hardware flow control, one needs to set the
UART_CS_{RX,TX}_READY_ORIDE bits as other than on AR933x where this
UART is used as primary/console, the bootloader on AR934x typically
doesn't set those bits.
Setting them explicitely on AR933x should not do any harm, so just
set them unconditionally.
Tested-by: Chuanhong Guo <gch981213(a)gmail.com>
Signed-off-by: Daniel Golle <daniel(a)makrotopia.org>
Link: https://lore.kernel.org/r/20200207095335.GA179836@makrotopia.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/ar933x_uart.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/ar933x_uart.c b/drivers/tty/serial/ar933x_uart.c
index 1519d2ca7705f..40194791cde0b 100644
--- a/drivers/tty/serial/ar933x_uart.c
+++ b/drivers/tty/serial/ar933x_uart.c
@@ -294,6 +294,10 @@ static void ar933x_uart_set_termios(struct uart_port *port,
ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
AR933X_UART_CS_HOST_INT_EN);
+ /* enable RX and TX ready overide */
+ ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
+ AR933X_UART_CS_TX_READY_ORIDE | AR933X_UART_CS_RX_READY_ORIDE);
+
/* reenable the UART */
ar933x_uart_rmw(up, AR933X_UART_CS_REG,
AR933X_UART_CS_IF_MODE_M << AR933X_UART_CS_IF_MODE_S,
@@ -426,6 +430,10 @@ static int ar933x_uart_startup(struct uart_port *port)
ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
AR933X_UART_CS_HOST_INT_EN);
+ /* enable RX and TX ready overide */
+ ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
+ AR933X_UART_CS_TX_READY_ORIDE | AR933X_UART_CS_RX_READY_ORIDE);
+
/* Enable RX interrupts */
up->ier = AR933X_UART_INT_RX_VALID;
ar933x_uart_write(up, AR933X_UART_INT_EN_REG, up->ier);
--
2.20.1
From: Daniel Golle <daniel(a)makrotopia.org>
[ Upstream commit 87c5cbf71ecbb9e289d60a2df22eb686c70bf196 ]
On AR934x this UART is usually not initialized by the bootloader
as it is only used as a secondary serial port while the primary
UART is a newly introduced NS16550-compatible.
In order to make use of the ar933x-uart on AR934x without RTS/CTS
hardware flow control, one needs to set the
UART_CS_{RX,TX}_READY_ORIDE bits as other than on AR933x where this
UART is used as primary/console, the bootloader on AR934x typically
doesn't set those bits.
Setting them explicitely on AR933x should not do any harm, so just
set them unconditionally.
Tested-by: Chuanhong Guo <gch981213(a)gmail.com>
Signed-off-by: Daniel Golle <daniel(a)makrotopia.org>
Link: https://lore.kernel.org/r/20200207095335.GA179836@makrotopia.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/ar933x_uart.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/ar933x_uart.c b/drivers/tty/serial/ar933x_uart.c
index d4462512605b5..246f4aab74075 100644
--- a/drivers/tty/serial/ar933x_uart.c
+++ b/drivers/tty/serial/ar933x_uart.c
@@ -289,6 +289,10 @@ static void ar933x_uart_set_termios(struct uart_port *port,
ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
AR933X_UART_CS_HOST_INT_EN);
+ /* enable RX and TX ready overide */
+ ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
+ AR933X_UART_CS_TX_READY_ORIDE | AR933X_UART_CS_RX_READY_ORIDE);
+
/* reenable the UART */
ar933x_uart_rmw(up, AR933X_UART_CS_REG,
AR933X_UART_CS_IF_MODE_M << AR933X_UART_CS_IF_MODE_S,
@@ -421,6 +425,10 @@ static int ar933x_uart_startup(struct uart_port *port)
ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
AR933X_UART_CS_HOST_INT_EN);
+ /* enable RX and TX ready overide */
+ ar933x_uart_rmw_set(up, AR933X_UART_CS_REG,
+ AR933X_UART_CS_TX_READY_ORIDE | AR933X_UART_CS_RX_READY_ORIDE);
+
/* Enable RX interrupts */
up->ier = AR933X_UART_INT_RX_VALID;
ar933x_uart_write(up, AR933X_UART_INT_EN_REG, up->ier);
--
2.20.1
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
[ Upstream commit 43bcb1c0507858cdc95e425017dcc33f8105df39 ]
snd_hdac_ext_bus_link_get() does not work correctly in case
there are multiple codecs on the bus. It unconditionally
resets the bus->codec_mask value. As per documentation in
hdaudio.h and existing use in client code, this field should
be used to store bit flag of detected codecs on the bus.
By overwriting value of the codec_mask, information on all
detected codecs is lost. No current user of hdac is impacted,
but use of bus->codec_mask is planned in future patches
for SOF.
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Takashi Iwai <tiwai(a)suse.de>
Link: https://lore.kernel.org/r/20200206200223.7715-1-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/hda/ext/hdac_ext_controller.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sound/hda/ext/hdac_ext_controller.c b/sound/hda/ext/hdac_ext_controller.c
index 60cb00fd0c693..84b44cdae28a1 100644
--- a/sound/hda/ext/hdac_ext_controller.c
+++ b/sound/hda/ext/hdac_ext_controller.c
@@ -262,6 +262,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_ext_bus_link_power_down_all);
int snd_hdac_ext_bus_link_get(struct hdac_bus *bus,
struct hdac_ext_link *link)
{
+ unsigned long codec_mask;
int ret = 0;
mutex_lock(&bus->lock);
@@ -283,9 +284,11 @@ int snd_hdac_ext_bus_link_get(struct hdac_bus *bus,
* HDA spec section 4.3 - Codec Discovery
*/
udelay(521);
- bus->codec_mask = snd_hdac_chip_readw(bus, STATESTS);
- dev_dbg(bus->dev, "codec_mask = 0x%lx\n", bus->codec_mask);
- snd_hdac_chip_writew(bus, STATESTS, bus->codec_mask);
+ codec_mask = snd_hdac_chip_readw(bus, STATESTS);
+ dev_dbg(bus->dev, "codec_mask = 0x%lx\n", codec_mask);
+ snd_hdac_chip_writew(bus, STATESTS, codec_mask);
+ if (!bus->codec_mask)
+ bus->codec_mask = codec_mask;
}
mutex_unlock(&bus->lock);
--
2.20.1
From: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
[ Upstream commit 43bcb1c0507858cdc95e425017dcc33f8105df39 ]
snd_hdac_ext_bus_link_get() does not work correctly in case
there are multiple codecs on the bus. It unconditionally
resets the bus->codec_mask value. As per documentation in
hdaudio.h and existing use in client code, this field should
be used to store bit flag of detected codecs on the bus.
By overwriting value of the codec_mask, information on all
detected codecs is lost. No current user of hdac is impacted,
but use of bus->codec_mask is planned in future patches
for SOF.
Signed-off-by: Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Takashi Iwai <tiwai(a)suse.de>
Link: https://lore.kernel.org/r/20200206200223.7715-1-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/hda/ext/hdac_ext_controller.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sound/hda/ext/hdac_ext_controller.c b/sound/hda/ext/hdac_ext_controller.c
index cfab60d88c921..09ff209df4a30 100644
--- a/sound/hda/ext/hdac_ext_controller.c
+++ b/sound/hda/ext/hdac_ext_controller.c
@@ -254,6 +254,7 @@ EXPORT_SYMBOL_GPL(snd_hdac_ext_bus_link_power_down_all);
int snd_hdac_ext_bus_link_get(struct hdac_bus *bus,
struct hdac_ext_link *link)
{
+ unsigned long codec_mask;
int ret = 0;
mutex_lock(&bus->lock);
@@ -280,9 +281,11 @@ int snd_hdac_ext_bus_link_get(struct hdac_bus *bus,
* HDA spec section 4.3 - Codec Discovery
*/
udelay(521);
- bus->codec_mask = snd_hdac_chip_readw(bus, STATESTS);
- dev_dbg(bus->dev, "codec_mask = 0x%lx\n", bus->codec_mask);
- snd_hdac_chip_writew(bus, STATESTS, bus->codec_mask);
+ codec_mask = snd_hdac_chip_readw(bus, STATESTS);
+ dev_dbg(bus->dev, "codec_mask = 0x%lx\n", codec_mask);
+ snd_hdac_chip_writew(bus, STATESTS, codec_mask);
+ if (!bus->codec_mask)
+ bus->codec_mask = codec_mask;
}
mutex_unlock(&bus->lock);
--
2.20.1
From: Pratham Pratap <prathampratap(a)codeaurora.org>
If scatter-gather operation is allowed, a large USB request is split
into multiple TRBs. For preparing TRBs for sg list, driver iterates
over the list and creates TRB for each sg and mark the chain bit to
false for the last sg. The current IOMMU driver is clubbing the list
of sgs which shares a page boundary into one and giving it to USB driver.
With this the number of sgs mapped it not equal to the the number of sgs
passed. Because of this USB driver is not marking the chain bit to false
since it couldn't iterate to the last sg. This patch addresses this issue
by marking the chain bit to false if it is the last mapped sg.
At a practical level, this patch resolves USB transfer stalls
seen with adb on dwc3 based db845c, pixel3 and other qcom
hardware after functionfs gadget added scatter-gather support
around v4.20.
Credit also to Anurag Kumar Vulisha <anurag.kumar.vulisha(a)xilinx.com>
who implemented a very similar fix to this issue.
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Yang Fei <fei.yang(a)intel.com>
Cc: Thinh Nguyen <thinhn(a)synopsys.com>
Cc: Tejas Joglekar <tejas.joglekar(a)synopsys.com>
Cc: Andrzej Pietrasiewicz <andrzej.p(a)collabora.com>
Cc: Jack Pham <jackp(a)codeaurora.org>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Greg KH <gregkh(a)linuxfoundation.org>
Cc: Linux USB List <linux-usb(a)vger.kernel.org>
Cc: stable <stable(a)vger.kernel.org> #4.20+
Signed-off-by: Pratham Pratap <prathampratap(a)codeaurora.org>
[jstultz: Slight tweak to remove sg_is_last() usage, reworked
commit message, minor comment tweak]
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
---
v2:
* Fix typeos and unnecssary parens as suggested by Jack
---
drivers/usb/dwc3/gadget.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1b7d2f9cb673..1e00bf2d65a2 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1071,7 +1071,14 @@ static void dwc3_prepare_one_trb_sg(struct dwc3_ep *dep,
unsigned int rem = length % maxp;
unsigned chain = true;
- if (sg_is_last(s))
+ /*
+ * IOMMU driver is coalescing the list of sgs which shares a
+ * page boundary into one and giving it to USB driver. With
+ * this the number of sgs mapped is not equal to the number of
+ * sgs passed. So mark the chain bit to false if it isthe last
+ * mapped sg.
+ */
+ if (i == remaining - 1)
chain = false;
if (rem && usb_endpoint_dir_out(dep->endpoint.desc) && !chain) {
--
2.17.1
On Mon, Mar 2, 2020 at 6:43 PM <christian(a)brauner.io> wrote:
> On March 2, 2020 6:37:27 PM GMT+01:00, Jann Horn <jannh(a)google.com> wrote:
> >On Mon, Mar 2, 2020 at 6:01 PM Bernd Edlinger
> ><bernd.edlinger(a)hotmail.de> wrote:
> >> On 3/2/20 5:43 PM, Jann Horn wrote:
> >> > On Mon, Mar 2, 2020 at 5:19 PM Eric W. Biederman
> ><ebiederm(a)xmission.com> wrote:
[...]
> >> >> I am 99% convinced that the fix is to move cred_guard_mutex down.
> >> >
> >> > "move cred_guard_mutex down" as in "take it once we've already set
> >up
> >> > the new process, past the point of no return"?
> >> >
> >> >> Then right after we take cred_guard_mutex do:
> >> >> if (ptraced) {
> >> >> use_original_creds();
> >> >> }
> >> >>
> >> >> And call it a day.
> >> >>
> >> >> The details suck but I am 99% certain that would solve everyones
> >> >> problems, and not be too bad to audit either.
> >> >
> >> > Ah, hmm, that sounds like it'll work fine at least when no LSMs are
> >involved.
> >> >
> >> > SELinux normally doesn't do the execution-degrading thing, it just
> >> > blocks the execution completely - see their
> >selinux_bprm_set_creds()
> >> > hook. So I think they'd still need to set some state on the task
> >that
> >> > says "we're currently in the middle of an execution where the
> >target
> >> > task will run in context X", and then check against that in the
> >> > ptrace_may_access hook. Or I suppose they could just kill the task
> >> > near the end of execve, although that'd be kinda ugly.
> >> >
> >>
> >> We have current->in_execve for that, right?
> >> I think when the cred_guard_mutex is taken only in the critical
> >section,
> >> then PTRACE_ATTACH could take the guard_mutex, and look at
> >current->in_execve,
> >> and just return -EAGAIN in that case, right, everybody happy :)
> >
> >It's probably going to mean that things like strace will just randomly
> >fail to attach to processes if they happen to be in the middle of
> >execve... but I guess that works?
>
> That sounds like an acceptable outcome.
> We can at least risk it and if we regress
> revert or come up with the more complex
> solution suggested in another mail here?
Yeah, sounds reasonable, I guess.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f6f13c125e05603f68f5bf31f045b95e6d493598 Mon Sep 17 00:00:00 2001
From: Haiyang Zhang <haiyangz(a)microsoft.com>
Date: Fri, 21 Feb 2020 08:32:18 -0800
Subject: [PATCH] hv_netvsc: Fix unwanted wakeup in netvsc_attach()
When netvsc_attach() is called by operations like changing MTU, etc.,
an extra wakeup may happen while netvsc_attach() calling
rndis_filter_device_add() which sends rndis messages when queue is
stopped in netvsc_detach(). The completion message will wake up queue 0.
We can reproduce the issue by changing MTU etc., then the wake_queue
counter from "ethtool -S" will increase beyond stop_queue counter:
stop_queue: 0
wake_queue: 1
The issue causes queue wake up, and counter increment, no other ill
effects in current code. So we didn't see any network problem for now.
To fix this, initialize tx_disable to true, and set it to false when
the NIC is ready to be attached or registered.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ae3f3084c2ed..1b320bcf150a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -99,7 +99,7 @@ static struct netvsc_device *alloc_net_device(void)
init_waitqueue_head(&net_device->wait_drain);
net_device->destroy = false;
- net_device->tx_disable = false;
+ net_device->tx_disable = true;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 65e12cb07f45..2c0a24c606fc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1068,6 +1068,7 @@ static int netvsc_attach(struct net_device *ndev,
}
/* In any case device is now ready */
+ nvdev->tx_disable = false;
netif_device_attach(ndev);
/* Note: enable and attach happen when sub-channels setup */
@@ -2476,6 +2477,8 @@ static int netvsc_probe(struct hv_device *dev,
else
net->max_mtu = ETH_DATA_LEN;
+ nvdev->tx_disable = false;
+
ret = register_netdevice(net);
if (ret != 0) {
pr_err("Unable to register netdev.\n");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d876836204897b6d7d911f942084f69a1e9d5c4d Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 27 Feb 2020 14:17:49 -0700
Subject: [PATCH] io_uring: fix 32-bit compatability with sendmsg/recvmsg
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise
the iovec import for these commands will not do the right thing and fail
the command with -EINVAL.
Found by running the test suite compiled as 32-bit.
Cc: stable(a)vger.kernel.org
Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 05eea06f5421..6a595c13e108 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3001,6 +3001,11 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
sr->len = READ_ONCE(sqe->len);
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+
if (!io || req->opcode == IORING_OP_SEND)
return 0;
/* iovec is already imported */
@@ -3153,6 +3158,11 @@ static int io_recvmsg_prep(struct io_kiocb *req,
sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
sr->len = READ_ONCE(sqe->len);
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+
if (!io || req->opcode == IORING_OP_RECV)
return 0;
/* iovec is already imported */
The patch below does not apply to the 5.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d876836204897b6d7d911f942084f69a1e9d5c4d Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 27 Feb 2020 14:17:49 -0700
Subject: [PATCH] io_uring: fix 32-bit compatability with sendmsg/recvmsg
We must set MSG_CMSG_COMPAT if we're in compatability mode, otherwise
the iovec import for these commands will not do the right thing and fail
the command with -EINVAL.
Found by running the test suite compiled as 32-bit.
Cc: stable(a)vger.kernel.org
Fixes: aa1fa28fc73e ("io_uring: add support for recvmsg()")
Fixes: 0fa03c624d8f ("io_uring: add support for sendmsg()")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 05eea06f5421..6a595c13e108 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3001,6 +3001,11 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
sr->len = READ_ONCE(sqe->len);
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+
if (!io || req->opcode == IORING_OP_SEND)
return 0;
/* iovec is already imported */
@@ -3153,6 +3158,11 @@ static int io_recvmsg_prep(struct io_kiocb *req,
sr->msg = u64_to_user_ptr(READ_ONCE(sqe->addr));
sr->len = READ_ONCE(sqe->len);
+#ifdef CONFIG_COMPAT
+ if (req->ctx->compat)
+ sr->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+
if (!io || req->opcode == IORING_OP_RECV)
return 0;
/* iovec is already imported */
boot_aggregate is the first entry of IMA measurement list. Its purpose is
to link pre-boot measurements to IMA measurements. As IMA was designed to
work with a TPM 1.2, the SHA1 PCR bank was always selected.
Currently, even if a TPM 2.0 is used, the SHA1 PCR bank is selected.
However, the assumption that the SHA1 PCR bank is always available is not
correct, as PCR banks can be selected with the PCR_Allocate() TPM command.
This patch tries to use ima_hash_algo as hash algorithm for boot_aggregate.
If no PCR bank uses that algorithm, the patch tries to find the SHA256 PCR
bank (which is mandatory in the TCG PC Client specification). If also this
bank is not found, the patch selects the first one. If the TPM algorithm
of that bank is not mapped to a crypto ID, boot_aggregate is set to zero.
Changelog
v1:
- add Mimi's comments
- if there is no PCR bank for the IMA default algorithm use SHA256
(suggested by James Bottomley)
Cc: stable(a)vger.kernel.org # 5.1.x
Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
Suggested-by: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima_crypto.c | 45 +++++++++++++++++++++++++----
security/integrity/ima/ima_init.c | 22 ++++++++++----
2 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index 73044fc6a952..f2f41a2bc3d4 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -655,18 +655,29 @@ static void __init ima_pcrread(u32 idx, struct tpm_digest *d)
}
/*
- * Calculate the boot aggregate hash
+ * The boot_aggregate is a cumulative hash over TPM registers 0 - 7. With
+ * TPM 1.2 the boot_aggregate was based on reading the SHA1 PCRs, but with
+ * TPM 2.0 hash agility, TPM chips could support multiple TPM PCR banks,
+ * allowing firmware to configure and enable different banks.
+ *
+ * Knowing which TPM bank is read to calculate the boot_aggregate digest
+ * needs to be conveyed to a verifier. For this reason, use the same
+ * hash algorithm for reading the TPM PCRs as for calculating the boot
+ * aggregate digest as stored in the measurement list.
*/
-static int __init ima_calc_boot_aggregate_tfm(char *digest,
+static int __init ima_calc_boot_aggregate_tfm(char *digest, u16 alg_id,
struct crypto_shash *tfm)
{
- struct tpm_digest d = { .alg_id = TPM_ALG_SHA1, .digest = {0} };
+ struct tpm_digest d = { .alg_id = alg_id, .digest = {0} };
int rc;
u32 i;
SHASH_DESC_ON_STACK(shash, tfm);
shash->tfm = tfm;
+ pr_devel("calculating the boot-aggregate based on TPM bank: %04x\n",
+ d.alg_id);
+
rc = crypto_shash_init(shash);
if (rc != 0)
return rc;
@@ -675,7 +686,8 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest,
for (i = TPM_PCR0; i < TPM_PCR8; i++) {
ima_pcrread(i, &d);
/* now accumulate with current aggregate */
- rc = crypto_shash_update(shash, d.digest, TPM_DIGEST_SIZE);
+ rc = crypto_shash_update(shash, d.digest,
+ crypto_shash_digestsize(tfm));
}
if (!rc)
crypto_shash_final(shash, digest);
@@ -685,14 +697,35 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest,
int __init ima_calc_boot_aggregate(struct ima_digest_data *hash)
{
struct crypto_shash *tfm;
- int rc;
+ u16 crypto_id, alg_id;
+ int rc, i, bank_idx = 0;
+
+ for (i = 0; i < ima_tpm_chip->nr_allocated_banks; i++) {
+ crypto_id = ima_tpm_chip->allocated_banks[i].crypto_id;
+ if (crypto_id == hash->algo) {
+ bank_idx = i;
+ break;
+ }
+
+ if (crypto_id == HASH_ALGO_SHA256)
+ bank_idx = i;
+ }
+
+ if (bank_idx == 0 &&
+ ima_tpm_chip->allocated_banks[0].crypto_id == HASH_ALGO__LAST) {
+ pr_err("No suitable TPM algorithm for boot aggregate\n");
+ return 0;
+ }
+
+ hash->algo = ima_tpm_chip->allocated_banks[bank_idx].crypto_id;
tfm = ima_alloc_tfm(hash->algo);
if (IS_ERR(tfm))
return PTR_ERR(tfm);
hash->length = crypto_shash_digestsize(tfm);
- rc = ima_calc_boot_aggregate_tfm(hash->digest, tfm);
+ alg_id = ima_tpm_chip->allocated_banks[bank_idx].alg_id;
+ rc = ima_calc_boot_aggregate_tfm(hash->digest, alg_id, tfm);
ima_free_tfm(tfm);
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 5d55ade5f3b9..fbd7a8e28a6b 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -27,7 +27,7 @@ struct tpm_chip *ima_tpm_chip;
/* Add the boot aggregate to the IMA measurement list and extend
* the PCR register.
*
- * Calculate the boot aggregate, a SHA1 over tpm registers 0-7,
+ * Calculate the boot aggregate, a hash over tpm registers 0-7,
* assuming a TPM chip exists, and zeroes if the TPM chip does not
* exist. Add the boot aggregate measurement to the measurement
* list and extend the PCR register.
@@ -51,15 +51,27 @@ static int __init ima_add_boot_aggregate(void)
int violation = 0;
struct {
struct ima_digest_data hdr;
- char digest[TPM_DIGEST_SIZE];
+ char digest[TPM_MAX_DIGEST_SIZE];
} hash;
memset(iint, 0, sizeof(*iint));
memset(&hash, 0, sizeof(hash));
iint->ima_hash = &hash.hdr;
- iint->ima_hash->algo = HASH_ALGO_SHA1;
- iint->ima_hash->length = SHA1_DIGEST_SIZE;
-
+ iint->ima_hash->algo = ima_hash_algo;
+ iint->ima_hash->length = hash_digest_size[ima_hash_algo];
+
+ /*
+ * With TPM 2.0 hash agility, TPM chips could support multiple TPM
+ * PCR banks, allowing firmware to configure and enable different
+ * banks. The SHA1 bank is not necessarily enabled.
+ *
+ * Use the same hash algorithm for reading the TPM PCRs as for
+ * calculating the boot aggregate digest. Preference is given to
+ * the configured IMA default hash algorithm. Otherwise, use the
+ * TPM required banks - SHA256 for TPM 2.0, SHA1 for TPM 1.2. If
+ * SHA256 is not available, use the first PCR bank and if that is
+ * not mapped to a crypto ID, set boot_aggregate to zero.
+ */
if (ima_tpm_chip) {
result = ima_calc_boot_aggregate(&hash.hdr);
if (result < 0) {
--
2.17.1
This issue was found when adbd trying to open functionfs with AIO mode.
Usually, we need to set "setprop sys.usb.ffs.aio_compat 0" to enable
adbd with AIO mode on Android.
When adbd is opening functionfs, it will try to read 24 bytes at the
first read I/O control. If this reading has been failed, adbd will
try to send FUNCTIONFS_CLEAR_HALT to functionfs. When adbd is in AIO
mode, functionfs will be acted with asyncronized I/O path. After the
successful read transfer has been completed by gadget hardware, the
following series of functions will be called.
ffs_epfile_async_io_complete() -> ffs_user_copy_worker() ->
copy_to_iter() -> _copy_to_iter() -> copyout() ->
iterate_and_advance() -> iterate_iovec()
Adding debug trace to these functions, it has been found that in
copyout(), access_ok() will check if the user space address is valid
to write. However if CONFIG_ARM64_TAGGED_ADDR_ABI is enabled, adbd
always passes user space address start with "0x3C" to gadget's AIO
blocks. This tagged address will cause access_ok() check always fail.
Which causes later calculation in iterate_iovec() turn zero.
Copyout() won't copy data to userspace since the length to be copied
"v.iov_len" will be zero. Finally leads ffs_copy_to_iter() always return
-EFAULT, causes adbd cannot open functionfs and send
FUNCTIONFS_CLEAR_HALT.
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
Changes for v3:
- Fix misspelling in commit message.
Changes for v2:
- Fix build error for 32-bit load. An #if defined(CONFIG_ARM64) still need
for avoiding undeclared defines.
drivers/usb/gadget/function/f_fs.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index ce1d023..728c260 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -35,6 +35,7 @@
#include <linux/mmu_context.h>
#include <linux/poll.h>
#include <linux/eventfd.h>
+#include <linux/thread_info.h>
#include "u_fs.h"
#include "u_f.h"
@@ -826,6 +827,10 @@ static void ffs_user_copy_worker(struct work_struct *work)
if (io_data->read && ret > 0) {
mm_segment_t oldfs = get_fs();
+#if defined(CONFIG_ARM64)
+ if (IS_ENABLED(CONFIG_ARM64_TAGGED_ADDR_ABI))
+ set_thread_flag(TIF_TAGGED_ADDR);
+#endif
set_fs(USER_DS);
use_mm(io_data->mm);
ret = ffs_copy_to_iter(io_data->buf, ret, &io_data->data);
--
1.7.9.5
intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
page onto its corresponding physical address. This commit fixes the bug by
accomodating the level of page entry for the IOVA and adds IOVA's lower
address to the physical address.
Cc: <stable(a)vger.kernel.org>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Reviewed-by: Moritz Fischer <mdf(a)kernel.org>
Signed-off-by: Yonghyun Hwang <yonghyun(a)google.com>
---
Changes from v2:
- a new condition is added to check whether the pte is present.
Changes from v1:
- level cannot be 0. So, the condition, "if (level > 1)", is removed, which results in a simple code.
- a macro, BIT_MASK, is used to have a bit mask
---
drivers/iommu/intel-iommu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 932267f49f9a..0837e0063973 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5553,8 +5553,10 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
u64 phys = 0;
pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
- if (pte)
- phys = dma_pte_addr(pte);
+ if (pte && dma_pte_present(pte))
+ phys = dma_pte_addr(pte) +
+ (iova & (BIT_MASK(level_to_offset_bits(level) +
+ VTD_PAGE_SHIFT) - 1));
return phys;
}
--
2.25.0.265.gbab2e86ba0-goog
commit 3837407c1aa1 upstream.
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1
if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR.
For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to
be set for the corresponding event counter but we also need
the PMCR.E bit to be set.
Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # 4.9 and 4.14 only
---
This is a backport of 3837407c1aa1 ("KVM: arm64: pmu: Don't
increment SW_INCR if PMCR.E is unset") which did not apply on
4.9-stable and 4.14-stable trees. Compared to the original patch
__vcpu_sys_reg() is replaced by vcpu_sys_reg().
v1 -> v2:
- this patch also is candidate for 4.9-stable
---
virt/kvm/arm/pmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 69ccce308458..9a47b0cfb01d 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -299,6 +299,9 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
if (val == 0)
return;
+ if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
+ return;
+
enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
if (!(val & BIT(i)))
--
2.20.1
Commit 7c779cf7c1f7 ("mtd: rawnand: ingenic: Allow to compile test the
new Ingenic driver") dropped the dependency on JZ4780_NEMC when
COMPILE_TEST was set, which is wrong, as the driver requires symbols
provided by the jz4780-nemc driver.
Change the dependency to (MIPS || COMPILE_TEST) && JZ4780_NEMC to
address the issue.
Fixes: 7c779cf7c1f7 ("mtd: rawnand: ingenic: Allow to compile test the new Ingenic driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
drivers/mtd/nand/raw/ingenic/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/ingenic/Kconfig b/drivers/mtd/nand/raw/ingenic/Kconfig
index 485abfa3f80b..96c5ae8b1bbc 100644
--- a/drivers/mtd/nand/raw/ingenic/Kconfig
+++ b/drivers/mtd/nand/raw/ingenic/Kconfig
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
config MTD_NAND_JZ4780
tristate "JZ4780 NAND controller"
- depends on JZ4780_NEMC || COMPILE_TEST
+ depends on MIPS || COMPILE_TEST
+ depends on JZ4780_NEMC
help
Enables support for NAND Flash connected to the NEMC on JZ4780 SoC
based boards, using the BCH controller for hardware error correction.
--
2.25.1
commit 3837407c1aa1 upstream.
The specification says PMSWINC increments PMEVCNTR<n>_EL1 by 1
if PMEVCNTR<n>_EL0 is enabled and configured to count SW_INCR.
For PMEVCNTR<n>_EL0 to be enabled, we need both PMCNTENSET to
be set for the corresponding event counter but we also need
the PMCR.E bit to be set.
Fixes: 7a0adc7064b8 ("arm64: KVM: Add access handler for PMSWINC register")
Signed-off-by: Eric Auger <eric.auger(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # 4.14 only
---
This is a backport of 3837407c1aa1 ("KVM: arm64: pmu: Don't
increment SW_INCR if PMCR.E is unset") which did not apply on
4.14-stable. Compared to the original patch
__vcpu_sys_reg() is replaced by vcpu_sys_reg().
---
virt/kvm/arm/pmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 8a9c42366db7..3998add436da 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -316,6 +316,9 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
if (val == 0)
return;
+ if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E))
+ return;
+
enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
if (!(val & BIT(i)))
--
2.20.1
Hi Greg, Sasha,
Can you cherry-pick these two sysrq patches to LTS kernels?
They're introduced in v5.1-rc1 and should go to all prior LTS versions.
commit 075e1a0c50f59ea210561d0d0fedbd945615df78
Author: Petr Mladek <pmladek(a)suse.com>
Date: Fri Jan 11 13:45:15 2019 +0100
sysrq: Restore original console_loglevel when sysrq disabled
The sysrq header line is printed with an increased loglevel
to provide users some positive feedback.
The original loglevel is not restored when the sysrq operation
is disabled. This bug was introduced in 2.6.12 (pre-git-history)
by the commit ("Allow admin to enable only some of the Magic-Sysrq
functions").
commit c3fee60908db4a8594f2e4a2131998384b8fa006
Author: Petr Mladek <pmladek(a)suse.com>
Date: Fri Jan 11 17:20:37 2019 +0100
sysrq: Remove duplicated sysrq message
The commit 97f5f0cd8cd0a0544 ("Input: implement SysRq as a separate
input
handler") added pr_fmt() definition. It caused a duplicated message
prefix in the sysrq header messages, for example:
[ 177.053931] sysrq: SysRq : Show backtrace of all active CPUs
[ 742.864776] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c)
Fixes: 97f5f0cd8cd0a05 ("Input: implement SysRq as a separate input
handler")
There is a DMA problem with the serial ports on i.MX6.
When the following sequence is performed:
1) Open a port
2) Write some data
3) Close the port
4) Open a *different* port
5) Write some data
6) Close the port
The second write sends nothing and the second close hangs.
If the first close() is omitted it works.
Adding logs to the the UART driver shows that the DMA is being setup but
the callback is never invoked for the second write.
This used to work in 4.19.
Git bisect leads to:
ad0d92d: "dmaengine: imx-sdma: refine to load context only once"
This commit adds a "context_loaded" flag used to avoid unnecessary context
setups.
However the flag is only reset in sdma_channel_terminate_work(),
which is only invoked in a worker triggered by sdma_terminate_all() IF
there is an active descriptor.
So, if no active descriptor remains when the channel is terminated, the
flag is not reset and, when the channel is later reused the old context
is used.
Fix the problem by always resetting the flag in sdma_free_chan_resources().
Fixes: ad0d92d: "dmaengine: imx-sdma: refine to load context only once"
Cc: stable(a)vger.kernel.org
Signed-off-by: Martin Fuzzey <martin.fuzzey(a)flowbird.group>
---
The following python script may be used to reproduce the problem:
import re, serial, sys
ports=(0, 4) # Can be any ports not used (no need to connect anything) NOT console...
def get_tx_counts():
pattern = re.compile("(\d+):.*tx:(\d+).*")
tx_counts = {}
with open("/proc/tty/driver/IMX-uart", "r") as f:
for line in f:
match = pattern.match(line)
if match:
tx_counts[int(match.group(1))] = int(match.group(2))
return tx_counts
before = get_tx_counts()
a = serial.Serial("/dev/ttymxc%d" % ports[0])
a.write("polop")
a.close()
b = serial.Serial("/dev/ttymxc%d" % ports[1])
b.write("test")
after = get_tx_counts()
if (after[ports[0]] - before[ports[0]] > 0) and (after[ports[1]] - before[ports[1]] > 0):
print "PASS"
sys.exit(0)
else:
print "FAIL"
print "Before: %s" % before
print "After: %s" % after
sys.exit(1)
---
drivers/dma/imx-sdma.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 066b21a..332ca50 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -1338,6 +1338,7 @@ static void sdma_free_chan_resources(struct dma_chan *chan)
sdmac->event_id0 = 0;
sdmac->event_id1 = 0;
+ sdmac->context_loaded = false;
sdma_set_channel_priority(sdmac, 0);
--
1.9.1
SPS30 uses triggered buffer, but the dependency is not specified in the
Kconfig file. Fix this by selecting IIO_BUFFER and IIO_TRIGGERED_BUFFER
config symbols.
Cc: stable(a)vger.kernel.org
Fixes: 232e0f6ddeae ("iio: chemical: add support for Sensirion SPS30 sensor")
Signed-off-by: Petr Štetiar <ynezz(a)true.cz>
---
drivers/iio/chemical/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iio/chemical/Kconfig b/drivers/iio/chemical/Kconfig
index 0b91de4df8f4..a7e65a59bf42 100644
--- a/drivers/iio/chemical/Kconfig
+++ b/drivers/iio/chemical/Kconfig
@@ -91,6 +91,8 @@ config SPS30
tristate "SPS30 particulate matter sensor"
depends on I2C
select CRC8
+ select IIO_BUFFER
+ select IIO_TRIGGERED_BUFFER
help
Say Y here to build support for the Sensirion SPS30 particulate
matter sensor.
The patch titled
Subject: dma-buf: free dmabuf->name in dma_buf_release()
has been removed from the -mm tree. Its filename was
dma-buf-free-dmabuf-name-in-dma_buf_release.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Cong Wang <xiyou.wangcong(a)gmail.com>
Subject: dma-buf: free dmabuf->name in dma_buf_release()
dma-buff name can be set via DMA_BUF_SET_NAME ioctl, but once set
it never gets freed.
Free it in dma_buf_release().
Link: http://lkml.kernel.org/r/20200225204446.11378-1-xiyou.wangcong@gmail.com
Fixes: bb2bb9030425 ("dma-buf: add DMA_BUF_SET_NAME ioctls")
Reported-by: syzbot+b2098bc44728a4efb3e9(a)syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Acked-by: Chenbo Feng <fengc(a)google.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Greg Hackmann <ghackmann(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dma-buf/dma-buf.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/dma-buf/dma-buf.c~dma-buf-free-dmabuf-name-in-dma_buf_release
+++ a/drivers/dma-buf/dma-buf.c
@@ -108,6 +108,7 @@ static int dma_buf_release(struct inode
dma_resv_fini(dmabuf->resv);
module_put(dmabuf->owner);
+ kfree(dmabuf->name);
kfree(dmabuf);
return 0;
}
_
Patches currently in -mm which might be from xiyou.wangcong(a)gmail.com are
Dear stable kernel team
I tested this on behalf of darrick.wong(a)oracle.com
I do not have push rights to your repo. I tried. Can you include it
for 5.4 and 5.5 ?
Comment from Darrick
This patch should make it so
that uswsusp can unlock the swap device prior to writing the memory
image to disk, and then lock it again when resuming.
Best regards and Thanks
Marian Klein
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: bfe3046ecafd - Linux 5.4.23
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
One or more kernel tests failed:
ppc64le:
❌ selinux-policy: serge-testsuite
aarch64:
❌ selinux-policy: serge-testsuite
x86_64:
❌ selinux-policy: serge-testsuite
❌ Networking route_func - local
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ audit: audit testsuite test
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
⚡⚡⚡ xfstests - xfs
❌ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ⚡⚡⚡ Storage blktests
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
⚡⚡⚡ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Host 4:
✅ Boot test
✅ Podman system integration test - as root
⏱ Podman system integration test - as user
⏱ LTP
⏱ Loopdev Sanity
⏱ Memory function: memfd_create
⏱ AMTU (Abstract Machine Test Utility)
⏱ Networking bridge: sanity
⏱ Ethernet drivers sanity
⏱ Networking MACsec: sanity
⏱ Networking socket: fuzz
⏱ Networking sctp-auth: sockopts test
⏱ Networking: igmp conformance test
⏱ Networking route: pmtu
⏱ Networking route_func - local
⏱ Networking route_func - forward
⏱ Networking TCP: keepalive test
⏱ Networking UDP: socket
⏱ Networking tunnel: geneve basic test
⏱ Networking tunnel: gre basic
⏱ L2TP basic test
⏱ Networking tunnel: vxlan basic
⏱ Networking ipsec: basic netns - transport
⏱ Networking ipsec: basic netns - tunnel
⏱ audit: audit testsuite test
⏱ httpd: mod_ssl smoke sanity
⏱ tuned: tune-processes-through-perf
⏱ ALSA PCM loopback test
⏱ ALSA Control (mixer) Userspace Element test
⏱ storage: SCSI VPD
⏱ trace: ftrace/tracer
⏱ CIFS Connectathon
⏱ POSIX pjd-fstest suites
⏱ jvm - DaCapo Benchmark Suite
⏱ jvm - jcstress tests
⏱ Memory function: kaslr
⏱ LTP: openposix test suite
⏱ Networking vnic: ipvlan/basic
⏱ iotop: sanity
⏱ Usex - version 1.9-29
⏱ storage: dm/common
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
⏱ POSIX pjd-fstest suites
⏱ jvm - DaCapo Benchmark Suite
⏱ jvm - jcstress tests
⏱ Memory function: kaslr
⏱ LTP: openposix test suite
⏱ Networking vnic: ipvlan/basic
⏱ iotop: sanity
⏱ Usex - version 1.9-29
⏱ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
❌ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
❌ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ✅ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ audit: audit testsuite test
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ storage: dm/common
Host 3:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 4:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ audit: audit testsuite test
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ storage: dm/common
Host 6:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
❌ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ audit: audit testsuite test
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ iotop: sanity
🚧 ✅ Usex - version 1.9-29
🚧 ✅ storage: dm/common
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Hi All,
I and various other Fedora users have noticed that Fedora's 5.5.6 build gets stuck
at boot on a Lenovo X1 7th gen, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1772498
This is caused by the addition of this commit to 5.5.6:
24c259557c45 ("ASoC: SOF: Intel: hda: Fix SKL dai count")
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
###
ASoC: SOF: Intel: hda: Fix SKL dai count
[ Upstream commit a6947c9d86bcfd61b758b5693eba58defe7fd2ae ]
With fourth pin added for iDisp for skl_dai, update SOF_SKL_DAI_NUM to
account for the change. Without this, dais from the bottom of the list
are skipped. In current state that's the case for 'Alt Analog CPU DAI'.
Fixes: ac42b142cd76 ("ASoC: SOF: Intel: hda: Add iDisp4 DAI")
Signed-off-by: Cezary Rojewski <cezary.rojewski(a)intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Link: https://lore.kernel.org/r/20200113114054.9716-1-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
###
Notice the "Fixes: ac42b142cd76 (...)", that commit-id actually
does not exist, the correct commit-id which this fixes is:
e68d6696575e ("ASoC: SOF: Intel: hda: Add iDisp4 DAI")
and that commit is not in 5.5.6, which is causing the problem,
the missing commit makes an array one larger and the fix for the
missing fix which did end up in 5.5.6 and bumps a define which is
used to walk over the array in some places by one so now the
walking is going over the array boundary.
For the Fedora kernels I've fixed this by adding the
"ASoC: SOF: Intel: hda: Add iDisp4 DAI" commit as a downstream
patch for our kernels. I believe that this is probably the best
fix for 5.5.z too.
Regards,
Hans
p.s.
I know that the stable series are partly based on automatically
picking patches now. I wonder if the scripts doing that could be
made smarter wrt rejecting patches with a Fixes tag where the
fixed patch is not present, so where in essence a pre-requisite
of the patch being added is missing ?
Hi Greg, Sasha,
Could you please help to backport the upstream commit
20faba848752901de23a4d45a1174d64d2069dde ("irqchip/gic-v3-its: Fix
misuse of GENMASK macro") to the stable linux-4.19 tree since it's
definitely a fix for 4.19. Note that it's not needed for other stable
trees.
I've tried and it can be applied to 4.19.106 cleanly.
Thanks,
Zenghui
Hi,
We don't have these two commits in 5.4-stable:
ff002b30181d30cdfbca316dadd099c3ca0d739c
9392a27d88b9707145d713654eb26f0c29789e50
because they don't apply with the rework that was done in how io_uring
handles offload. Since there's no io-wq in 5.4, it doesn't make sense to
do two patches. I'm attaching my port of the two for 5.4-stable, it's
been tested. Please queue it up for the next 5.4-stable, thanks!
--
Jens Axboe
This is a note to let you know that I've just added the patch titled
vt: selection, push sel_lock up
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e8c75a30a23c6ba63f4ef6895cbf41fd42f21aa2 Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby(a)suse.cz>
Date: Fri, 28 Feb 2020 12:54:06 +0100
Subject: vt: selection, push sel_lock up
sel_lock cannot nest in the console lock. Thanks to syzkaller, the
kernel states firmly:
> WARNING: possible circular locking dependency detected
> 5.6.0-rc3-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor.4/20336 is trying to acquire lock:
> ffff8880a2e952a0 (&tty->termios_rwsem){++++}, at: tty_unthrottle+0x22/0x100 drivers/tty/tty_ioctl.c:136
>
> but task is already holding lock:
> ffffffff89462e70 (sel_lock){+.+.}, at: paste_selection+0x118/0x470 drivers/tty/vt/selection.c:374
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (sel_lock){+.+.}:
> mutex_lock_nested+0x1b/0x30 kernel/locking/mutex.c:1118
> set_selection_kernel+0x3b8/0x18a0 drivers/tty/vt/selection.c:217
> set_selection_user+0x63/0x80 drivers/tty/vt/selection.c:181
> tioclinux+0x103/0x530 drivers/tty/vt/vt.c:3050
> vt_ioctl+0x3f1/0x3a30 drivers/tty/vt/vt_ioctl.c:364
This is ioctl(TIOCL_SETSEL).
Locks held on the path: console_lock -> sel_lock
> -> #1 (console_lock){+.+.}:
> console_lock+0x46/0x70 kernel/printk/printk.c:2289
> con_flush_chars+0x50/0x650 drivers/tty/vt/vt.c:3223
> n_tty_write+0xeae/0x1200 drivers/tty/n_tty.c:2350
> do_tty_write drivers/tty/tty_io.c:962 [inline]
> tty_write+0x5a1/0x950 drivers/tty/tty_io.c:1046
This is write().
Locks held on the path: termios_rwsem -> console_lock
> -> #0 (&tty->termios_rwsem){++++}:
> down_write+0x57/0x140 kernel/locking/rwsem.c:1534
> tty_unthrottle+0x22/0x100 drivers/tty/tty_ioctl.c:136
> mkiss_receive_buf+0x12aa/0x1340 drivers/net/hamradio/mkiss.c:902
> tty_ldisc_receive_buf+0x12f/0x170 drivers/tty/tty_buffer.c:465
> paste_selection+0x346/0x470 drivers/tty/vt/selection.c:389
> tioclinux+0x121/0x530 drivers/tty/vt/vt.c:3055
> vt_ioctl+0x3f1/0x3a30 drivers/tty/vt/vt_ioctl.c:364
This is ioctl(TIOCL_PASTESEL).
Locks held on the path: sel_lock -> termios_rwsem
> other info that might help us debug this:
>
> Chain exists of:
> &tty->termios_rwsem --> console_lock --> sel_lock
Clearly. From the above, we have:
console_lock -> sel_lock
sel_lock -> termios_rwsem
termios_rwsem -> console_lock
Fix this by reversing the console_lock -> sel_lock dependency in
ioctl(TIOCL_SETSEL). First, lock sel_lock, then console_lock.
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Reported-by: syzbot+26183d9746e62da329b8(a)syzkaller.appspotmail.com
Fixes: 07e6124a1a46 ("vt: selection, close sel_buffer race")
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200228115406.5735-2-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/selection.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c
index 9126a01290ea..d7d2e4b844bc 100644
--- a/drivers/tty/vt/selection.c
+++ b/drivers/tty/vt/selection.c
@@ -214,7 +214,6 @@ static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *
if (ps > pe) /* make sel_start <= sel_end */
swap(ps, pe);
- mutex_lock(&sel_lock);
if (sel_cons != vc_cons[fg_console].d) {
clear_selection();
sel_cons = vc_cons[fg_console].d;
@@ -260,10 +259,9 @@ static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *
break;
case TIOCL_SELPOINTER:
highlight_pointer(pe);
- goto unlock;
+ return 0;
default:
- ret = -EINVAL;
- goto unlock;
+ return -EINVAL;
}
/* remove the pointer */
@@ -285,7 +283,7 @@ static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *
else if (new_sel_start == sel_start)
{
if (new_sel_end == sel_end) /* no action required */
- goto unlock;
+ return 0;
else if (new_sel_end > sel_end) /* extend to right */
highlight(sel_end + 2, new_sel_end);
else /* contract from right */
@@ -313,8 +311,7 @@ static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *
if (!bp) {
printk(KERN_WARNING "selection: kmalloc() failed\n");
clear_selection();
- ret = -ENOMEM;
- goto unlock;
+ return -ENOMEM;
}
kfree(sel_buffer);
sel_buffer = bp;
@@ -339,8 +336,7 @@ static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *
}
}
sel_buffer_lth = bp - sel_buffer;
-unlock:
- mutex_unlock(&sel_lock);
+
return ret;
}
@@ -348,9 +344,11 @@ int set_selection_kernel(struct tiocl_selection *v, struct tty_struct *tty)
{
int ret;
+ mutex_lock(&sel_lock);
console_lock();
ret = __set_selection_kernel(v, tty);
console_unlock();
+ mutex_unlock(&sel_lock);
return ret;
}
--
2.25.1
This is a note to let you know that I've just added the patch titled
vt: selection, push console lock down
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4b70dd57a15d2f4685ac6e38056bad93e81e982f Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby(a)suse.cz>
Date: Fri, 28 Feb 2020 12:54:05 +0100
Subject: vt: selection, push console lock down
We need to nest the console lock in sel_lock, so we have to push it down
a bit. Fortunately, the callers of set_selection_* just lock the console
lock around the function call. So moving it down is easy.
In the next patch, we switch the order.
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Fixes: 07e6124a1a46 ("vt: selection, close sel_buffer race")
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200228115406.5735-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/speakup/selection.c | 2 --
drivers/tty/vt/selection.c | 13 ++++++++++++-
drivers/tty/vt/vt.c | 2 --
3 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/staging/speakup/selection.c b/drivers/staging/speakup/selection.c
index a8b4d0c5ab7e..032f3264fba1 100644
--- a/drivers/staging/speakup/selection.c
+++ b/drivers/staging/speakup/selection.c
@@ -51,9 +51,7 @@ static void __speakup_set_selection(struct work_struct *work)
goto unref;
}
- console_lock();
set_selection_kernel(&sel, tty);
- console_unlock();
unref:
tty_kref_put(tty);
diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c
index 0c50d7410b31..9126a01290ea 100644
--- a/drivers/tty/vt/selection.c
+++ b/drivers/tty/vt/selection.c
@@ -181,7 +181,7 @@ int set_selection_user(const struct tiocl_selection __user *sel,
return set_selection_kernel(&v, tty);
}
-int set_selection_kernel(struct tiocl_selection *v, struct tty_struct *tty)
+static int __set_selection_kernel(struct tiocl_selection *v, struct tty_struct *tty)
{
struct vc_data *vc = vc_cons[fg_console].d;
int new_sel_start, new_sel_end, spc;
@@ -343,6 +343,17 @@ int set_selection_kernel(struct tiocl_selection *v, struct tty_struct *tty)
mutex_unlock(&sel_lock);
return ret;
}
+
+int set_selection_kernel(struct tiocl_selection *v, struct tty_struct *tty)
+{
+ int ret;
+
+ console_lock();
+ ret = __set_selection_kernel(v, tty);
+ console_unlock();
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(set_selection_kernel);
/* Insert the contents of the selection buffer into the
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 0cfbb7182b5a..15d27698054a 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3046,10 +3046,8 @@ int tioclinux(struct tty_struct *tty, unsigned long arg)
switch (type)
{
case TIOCL_SETSEL:
- console_lock();
ret = set_selection_user((struct tiocl_selection
__user *)(p+1), tty);
- console_unlock();
break;
case TIOCL_PASTESEL:
ret = paste_selection(tty);
--
2.25.1
From: Andrei Botila <andrei.botila(a)nxp.com>
Since in the software implementation of XTS-AES there is
no notion of sector every input length is processed the same way.
CAAM implementation has the notion of sector which causes different
results between the software implementation and the one in CAAM
for input lengths bigger than 512 bytes.
Increase sector size to maximum value on 16 bits.
Fixes: c6415a6016bf ("crypto: caam - add support for acipher xts(aes)")
Cc: <stable(a)vger.kernel.org> # v4.12+
Signed-off-by: Andrei Botila <andrei.botila(a)nxp.com>
---
drivers/crypto/caam/caamalg_desc.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/crypto/caam/caamalg_desc.c b/drivers/crypto/caam/caamalg_desc.c
index aa9ccca67045..8ebbbd28b1f7 100644
--- a/drivers/crypto/caam/caamalg_desc.c
+++ b/drivers/crypto/caam/caamalg_desc.c
@@ -1518,7 +1518,13 @@ EXPORT_SYMBOL(cnstr_shdsc_skcipher_decap);
*/
void cnstr_shdsc_xts_skcipher_encap(u32 * const desc, struct alginfo *cdata)
{
- __be64 sector_size = cpu_to_be64(512);
+ /*
+ * Set sector size to a big value, practically disabling
+ * sector size segmentation in xts implementation. We cannot
+ * take full advantage of this HW feature with existing
+ * crypto API / dm-crypt SW architecture.
+ */
+ __be64 sector_size = cpu_to_be64(BIT(15));
u32 *key_jump_cmd;
init_sh_desc(desc, HDR_SHARE_SERIAL | HDR_SAVECTX);
@@ -1571,7 +1577,13 @@ EXPORT_SYMBOL(cnstr_shdsc_xts_skcipher_encap);
*/
void cnstr_shdsc_xts_skcipher_decap(u32 * const desc, struct alginfo *cdata)
{
- __be64 sector_size = cpu_to_be64(512);
+ /*
+ * Set sector size to a big value, practically disabling
+ * sector size segmentation in xts implementation. We cannot
+ * take full advantage of this HW feature with existing
+ * crypto API / dm-crypt SW architecture.
+ */
+ __be64 sector_size = cpu_to_be64(BIT(15));
u32 *key_jump_cmd;
init_sh_desc(desc, HDR_SHARE_SERIAL | HDR_SAVECTX);
--
2.17.1
4.19.99 seems to have picked up quite a few device link patches.
However, one of them [1] broke the ability to add stateful and
stateless links between the same two devices. Looks like the breakage
was fixed in the mainline kernel in subsequent commits [2] and [3].
Can we pull [2] and [3] into 4.19 please? And any other intermediate
device link patches up to [3].
[1] - 6fdc440366f1a99f344b629ac92f350aefd77911
[2] - 515db266a9da
[3] - fb583c8eeeb1 (this fixed a bug in [2])
Thanks,
Saravana
From: Jann Horn <jannh(a)google.com>
[ Upstream commit 6ab405114b0b229151ef06f4e31c7834dd09d0c0 ]
Check whether inputs from userspace are too long (explicit length field too
big or string not null-terminated) to avoid out-of-bounds reads.
As far as I can tell, this can at worst lead to very limited kernel heap
memory disclosure or oopses.
This bug can be triggered by an unprivileged user even if the xt_bpf module
is not loaded: iptables is available in network namespaces, and the xt_bpf
module can be autoloaded.
Triggering the bug with a classic BPF filter with fake length 0x1000 causes
the following KASAN report:
==================================================================
BUG: KASAN: slab-out-of-bounds in bpf_prog_create+0x84/0xf0
Read of size 32768 at addr ffff8801eff2c494 by task test/4627
CPU: 0 PID: 4627 Comm: test Not tainted 4.15.0-rc1+ #1
[...]
Call Trace:
dump_stack+0x5c/0x85
print_address_description+0x6a/0x260
kasan_report+0x254/0x370
? bpf_prog_create+0x84/0xf0
memcpy+0x1f/0x50
bpf_prog_create+0x84/0xf0
bpf_mt_check+0x90/0xd6 [xt_bpf]
[...]
Allocated by task 4627:
kasan_kmalloc+0xa0/0xd0
__kmalloc_node+0x47/0x60
xt_alloc_table_info+0x41/0x70 [x_tables]
[...]
The buggy address belongs to the object at ffff8801eff2c3c0
which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 212 bytes inside of
2048-byte region [ffff8801eff2c3c0, ffff8801eff2cbc0)
[...]
==================================================================
Fixes: e6f30c731718 ("netfilter: x_tables: add xt_bpf match")
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Zubin Mithra <zsm(a)chromium.org>
---
* Syzkaller triggered an OOB read with the following stacktrace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0xcb/0x130 lib/dump_stack.c:56
print_trailer+0x151/0x15a mm/slub.c:658
object_err+0x35/0x3d mm/slub.c:665
print_address_description mm/kasan/report.c:188 [inline]
kasan_report_error mm/kasan/report.c:285 [inline]
kasan_report.part.0.cold+0x21a/0x4c1 mm/kasan/report.c:310
kasan_report+0x25/0x30 mm/kasan/report.c:297
check_memory_region_inline mm/kasan/kasan.c:292 [inline]
check_memory_region+0x116/0x190 mm/kasan/kasan.c:299
memcpy+0x24/0x50 mm/kasan/kasan.c:334
bpf_prog_create+0xdd/0x230 net/core/filter.c:1075
bpf_mt_check+0xb1/0x100 net/netfilter/xt_bpf.c:31
xt_check_match+0x26b/0x560 net/netfilter/x_tables.c:444
check_match net/ipv4/netfilter/ip_tables.c:593 [inline]
find_check_match net/ipv4/netfilter/ip_tables.c:616 [inline]
find_check_entry.isra.0+0x2fe/0x950 net/ipv4/netfilter/ip_tables.c:673
translate_table+0xb3f/0x15b0 net/ipv4/netfilter/ip_tables.c:878
do_replace net/ipv4/netfilter/ip_tables.c:1300 [inline]
do_ipt_set_ctl+0x2cf/0x460 net/ipv4/netfilter/ip_tables.c:1864
nf_sockopt net/netfilter/nf_sockopt.c:105 [inline]
nf_setsockopt+0x72/0xd0 net/netfilter/nf_sockopt.c:114
ip_setsockopt net/ipv4/ip_sockglue.c:1227 [inline]
ip_setsockopt+0xa0/0xb0 net/ipv4/ip_sockglue.c:1212
udp_setsockopt+0x53/0x90 net/ipv4/udp.c:2162
sock_common_setsockopt+0x9f/0xe0 net/core/sock.c:2697
SYSC_setsockopt net/socket.c:1769 [inline]
SyS_setsockopt+0x146/0x210 net/socket.c:1748
entry_SYSCALL_64_fastpath+0x1e/0xa0
* This commit is present in linux-4.14.y.
* This patch resolves conflicts that arise due to the following commit
not being present in linux-4.9.y, linux-4.4.y:
- 2c16d6033264 ("netfilter: xt_bpf: support ebpf")
* Tests run: syzkaller reproducer
net/netfilter/xt_bpf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index dffee9d47ec4b..7b993f25aab92 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -25,6 +25,9 @@ static int bpf_mt_check(const struct xt_mtchk_param *par)
struct xt_bpf_info *info = par->matchinfo;
struct sock_fprog_kern program;
+ if (info->bpf_program_num_elem > XT_BPF_MAX_NUM_INSTR)
+ return -EINVAL;
+
program.len = info->bpf_program_num_elem;
program.filter = info->bpf_program;
--
2.25.0.265.gbab2e86ba0-goog
From: Martynas Pumputis <martynas(a)weave.works>
[ Upstream commit ed07d9a021df6da53456663a76999189badc432a ]
This patch enables the clash resolution for NAT (disabled in
"590b52e10d41") if clashing conntracks match (i.e. both tuples are equal)
and a protocol allows it.
The clash might happen for a connections-less protocol (e.g. UDP) when
two threads in parallel writes to the same socket and consequent calls
to "get_unique_tuple" return the same tuples (incl. reply tuples).
In this case it is safe to perform the resolution, as the losing CT
describes the same mangling as the winning CT, so no modifications to
the packet are needed, and the result of rules traversal for the loser's
packet stays valid.
Signed-off-by: Martynas Pumputis <martynas(a)weave.works>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Andy Strohman <astroh(a)amazon.com>
---
net/netfilter/nf_conntrack_core.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5123e91b1982..4ced7c7102b6 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -632,6 +632,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
net_eq(net, nf_ct_net(ct));
}
+static inline bool
+nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2)
+{
+ return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+ &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) &&
+ nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple,
+ &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) &&
+ nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) &&
+ nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) &&
+ net_eq(nf_ct_net(ct1), nf_ct_net(ct2));
+}
+
/* caller must hold rcu readlock and none of the nf_conntrack_locks */
static void nf_ct_gc_expired(struct nf_conn *ct)
{
@@ -825,19 +837,21 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
/* This is the conntrack entry already in hashes that won race. */
struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
const struct nf_conntrack_l4proto *l4proto;
+ enum ip_conntrack_info oldinfo;
+ struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
if (l4proto->allow_clash &&
- ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
!nf_ct_is_dying(ct) &&
atomic_inc_not_zero(&ct->ct_general.use)) {
- enum ip_conntrack_info oldinfo;
- struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
-
- nf_ct_acct_merge(ct, ctinfo, loser_ct);
- nf_conntrack_put(&loser_ct->ct_general);
- nf_ct_set(skb, ct, oldinfo);
- return NF_ACCEPT;
+ if (((ct->status & IPS_NAT_DONE_MASK) == 0) ||
+ nf_ct_match(ct, loser_ct)) {
+ nf_ct_acct_merge(ct, ctinfo, loser_ct);
+ nf_conntrack_put(&loser_ct->ct_general);
+ nf_ct_set(skb, ct, oldinfo);
+ return NF_ACCEPT;
+ }
+ nf_ct_put(ct);
}
NF_CT_STAT_INC(net, drop);
return NF_DROP;
--
2.16.6
Hello Greg,
Can you please consider including the following patch in the stable
linux-5.4.y and linux-5.5.y?
This is to downgrade "EDAC skx: Can't get tolm/tohm" to debug level
on missing PCI device.
854bb48018d5("EDAC: skx_common: downgrade message importance
on missing PCI device")
Thanks,
Qian Lu
The following commit has been merged into the ras/urgent branch of tip:
Commit-ID: 59b5809655bdafb0767d3fd00a3e41711aab07e6
Gitweb: https://git.kernel.org/tip/59b5809655bdafb0767d3fd00a3e41711aab07e6
Author: Tony Luck <tony.luck(a)intel.com>
AuthorDate: Tue, 25 Feb 2020 17:17:37 -08:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Thu, 27 Feb 2020 21:36:42 +01:00
x86/mce: Fix logic and comments around MSR_PPIN_CTL
There are two implemented bits in the PPIN_CTL MSR:
Bit 0: LockOut (R/WO)
Set 1 to prevent further writes to MSR_PPIN_CTL.
Bit 1: Enable_PPIN (R/W)
If 1, enables MSR_PPIN to be accessible using RDMSR.
If 0, an attempt to read MSR_PPIN will cause #GP.
So there are four defined values:
0: PPIN is disabled, PPIN_CTL may be updated
1: PPIN is disabled. PPIN_CTL is locked against updates
2: PPIN is enabled. PPIN_CTL may be updated
3: PPIN is enabled. PPIN_CTL is locked against updates
Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2".
When it should have done so for both case "2" and case "3".
Fix the final test to just check for the enable bit. Also fix some of
the other comments in this function.
Fixes: 3f5a7896a509 ("x86/mce: Include the PPIN in MCE records when available")
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20200226011737.9958-1-tony.luck@intel.com
---
arch/x86/kernel/cpu/mce/intel.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 5627b10..f996ffb 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -493,17 +493,18 @@ static void intel_ppin_init(struct cpuinfo_x86 *c)
return;
if ((val & 3UL) == 1UL) {
- /* PPIN available but disabled: */
+ /* PPIN locked in disabled mode */
return;
}
- /* If PPIN is disabled, but not locked, try to enable: */
- if (!(val & 3UL)) {
+ /* If PPIN is disabled, try to enable */
+ if (!(val & 2UL)) {
wrmsrl_safe(MSR_PPIN_CTL, val | 2UL);
rdmsrl_safe(MSR_PPIN_CTL, &val);
}
- if ((val & 3UL) == 2UL)
+ /* Is the enable bit set? */
+ if (val & 2UL)
set_cpu_cap(c, X86_FEATURE_INTEL_PPIN);
}
}
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 735a6dd02222d8d070c7bb748f25895239ca8c92
Gitweb: https://git.kernel.org/tip/735a6dd02222d8d070c7bb748f25895239ca8c92
Author: Sean Christopherson <sean.j.christopherson(a)intel.com>
AuthorDate: Wed, 26 Feb 2020 15:16:15 -08:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Thu, 27 Feb 2020 19:02:45 +01:00
x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
Explicitly set X86_FEATURE_OSPKE via set_cpu_cap() instead of calling
get_cpu_cap() to pull the feature bit from CPUID after enabling CR4.PKE.
Invoking get_cpu_cap() effectively wipes out any {set,clear}_cpu_cap()
changes that were made between this_cpu->c_init() and setup_pku(), as
all non-synthetic feature words are reinitialized from the CPU's CPUID
values.
Blasting away capability updates manifests most visibility when running
on a VMX capable CPU, but with VMX disabled by BIOS. To indicate that
VMX is disabled, init_ia32_feat_ctl() clears X86_FEATURE_VMX, using
clear_cpu_cap() instead of setup_clear_cpu_cap() so that KVM can report
which CPU is misconfigured (KVM needs to probe every CPU anyways).
Restoring X86_FEATURE_VMX from CPUID causes KVM to think VMX is enabled,
ultimately leading to an unexpected #GP when KVM attempts to do VMXON.
Arguably, init_ia32_feat_ctl() should use setup_clear_cpu_cap() and let
KVM figure out a different way to report the misconfigured CPU, but VMX
is not the only feature bit that is affected, i.e. there is precedent
that tweaking feature bits via {set,clear}_cpu_cap() after ->c_init()
is expected to work. Most notably, x86_init_rdrand()'s clearing of
X86_FEATURE_RDRAND when RDRAND malfunctions is also overwritten.
Fixes: 0697694564c8 ("x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU")
Reported-by: Jacob Keller <jacob.e.keller(a)intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Jacob Keller <jacob.e.keller(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20200226231615.13664-1-sean.j.christopherson@inte…
---
arch/x86/kernel/cpu/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 52c9bfb..4cdb123 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -445,7 +445,7 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
* cpuid bit to be set. We need to ensure that we
* update that bit in this CPU's "cpu_info".
*/
- get_cpu_cap(c);
+ set_cpu_cap(c, X86_FEATURE_OSPKE);
}
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
The patch below does not apply to the 5.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c6f8c440441029d5621ee5153676243234a4b76e Mon Sep 17 00:00:00 2001
From: Bhawanpreet Lakha <Bhawanpreet.Lakha(a)amd.com>
Date: Fri, 7 Feb 2020 10:41:20 -0500
Subject: [PATCH] drm/amd/display: fix dtm unloading
there was a type in the terminate command.
We should be calling psp_dtm_unload() instead of psp_hdcp_unload()
Fixes: 143f23053333 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha(a)amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 3a1570dafe34..146f96661b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1013,6 +1013,30 @@ static int psp_dtm_initialize(struct psp_context *psp)
return 0;
}
+static int psp_dtm_unload(struct psp_context *psp)
+{
+ int ret;
+ struct psp_gfx_cmd_resp *cmd;
+
+ /*
+ * TODO: bypass the unloading in sriov for now
+ */
+ if (amdgpu_sriov_vf(psp->adev))
+ return 0;
+
+ cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+ if (!cmd)
+ return -ENOMEM;
+
+ psp_prep_ta_unload_cmd_buf(cmd, psp->dtm_context.session_id);
+
+ ret = psp_cmd_submit_buf(psp, NULL, cmd, psp->fence_buf_mc_addr);
+
+ kfree(cmd);
+
+ return ret;
+}
+
int psp_dtm_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
{
/*
@@ -1037,7 +1061,7 @@ static int psp_dtm_terminate(struct psp_context *psp)
if (!psp->dtm_context.dtm_initialized)
return 0;
- ret = psp_hdcp_unload(psp);
+ ret = psp_dtm_unload(psp);
if (ret)
return ret;
Commit 2c361684803e ("PM / Domains: Don't treat zero found compatible idle
states as an error"), moved of_genpd_parse_idle_states() towards allowing
none compatible idle state to be found for the device node, rather than
returning an error code.
However, it didn't consider that the "domain-idle-states" DT property may
be missing as it's optional, which makes of_count_phandle_with_args() to
return -ENOENT. Let's fix this to make the behaviour consistent.
Reported-by: Benjamin Gaignard <benjamin.gaignard(a)st.com>
Fixes: 2c361684803e ("PM / Domains: Don't treat zero found compatible idle states as an error")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
drivers/base/power/domain.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 959d6d5eb000..0a01df608849 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2653,7 +2653,7 @@ static int genpd_iterate_idle_states(struct device_node *dn,
ret = of_count_phandle_with_args(dn, "domain-idle-states", NULL);
if (ret <= 0)
- return ret;
+ return ret == -ENOENT ? 0 : ret;
/* Loop over the phandles until all the requested entry is found */
of_for_each_phandle(&it, ret, dn, "domain-idle-states", NULL, 0) {
--
2.20.1
[ Upstream commit 7143b5ac5750f404ff3a594b34fdf3fc2f99f828 ]
This patch drops 'cur_mm' before calling cond_resched(), to prevent
the sq_thread from spinning even when the user process is finished.
Before this patch, if the user process ended without closing the
io_uring fd, the sq_thread continues to spin until the
'sq_thread_idle' timeout ends.
In the worst case where the 'sq_thread_idle' parameter is bigger than
INT_MAX, the sq_thread will spin forever.
Fixes: 6c271ce2f1d5 ("io_uring: add submission polling")
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/io_uring.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 709671faaed6..33a95e1ffb15 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2740,16 +2740,6 @@ static int io_sq_thread(void *data)
to_submit = io_sqring_entries(ctx);
if (!to_submit) {
- /*
- * We're polling. If we're within the defined idle
- * period, then let us spin without work before going
- * to sleep.
- */
- if (inflight || !time_after(jiffies, timeout)) {
- cond_resched();
- continue;
- }
-
/*
* Drop cur_mm before scheduling, we can't hold it for
* long periods (or over schedule()). Do this before
@@ -2762,6 +2752,16 @@ static int io_sq_thread(void *data)
cur_mm = NULL;
}
+ /*
+ * We're polling. If we're within the defined idle
+ * period, then let us spin without work before going
+ * to sleep.
+ */
+ if (inflight || !time_after(jiffies, timeout)) {
+ cond_resched();
+ continue;
+ }
+
prepare_to_wait(&ctx->sqo_wait, &wait,
TASK_INTERRUPTIBLE);
--
2.24.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7143b5ac5750f404ff3a594b34fdf3fc2f99f828 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare(a)redhat.com>
Date: Fri, 21 Feb 2020 16:42:16 +0100
Subject: [PATCH] io_uring: prevent sq_thread from spinning when it should stop
This patch drops 'cur_mm' before calling cond_resched(), to prevent
the sq_thread from spinning even when the user process is finished.
Before this patch, if the user process ended without closing the
io_uring fd, the sq_thread continues to spin until the
'sq_thread_idle' timeout ends.
In the worst case where the 'sq_thread_idle' parameter is bigger than
INT_MAX, the sq_thread will spin forever.
Fixes: 6c271ce2f1d5 ("io_uring: add submission polling")
Signed-off-by: Stefano Garzarella <sgarzare(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6e249aa97ba3..b43467b3a8dc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5142,6 +5142,18 @@ static int io_sq_thread(void *data)
* to enter the kernel to reap and flush events.
*/
if (!to_submit || ret == -EBUSY) {
+ /*
+ * Drop cur_mm before scheduling, we can't hold it for
+ * long periods (or over schedule()). Do this before
+ * adding ourselves to the waitqueue, as the unuse/drop
+ * may sleep.
+ */
+ if (cur_mm) {
+ unuse_mm(cur_mm);
+ mmput(cur_mm);
+ cur_mm = NULL;
+ }
+
/*
* We're polling. If we're within the defined idle
* period, then let us spin without work before going
@@ -5156,18 +5168,6 @@ static int io_sq_thread(void *data)
continue;
}
- /*
- * Drop cur_mm before scheduling, we can't hold it for
- * long periods (or over schedule()). Do this before
- * adding ourselves to the waitqueue, as the unuse/drop
- * may sleep.
- */
- if (cur_mm) {
- unuse_mm(cur_mm);
- mmput(cur_mm);
- cur_mm = NULL;
- }
-
prepare_to_wait(&ctx->sqo_wait, &wait,
TASK_INTERRUPTIBLE);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 305e519ce48e935702c32241f07d393c3c8fed3e Mon Sep 17 00:00:00 2001
From: Alexander Potapenko <glider(a)google.com>
Date: Thu, 20 Feb 2020 20:04:30 -0800
Subject: [PATCH] lib/stackdepot.c: fix global out-of-bounds in stack_slabs
Walter Wu has reported a potential case in which init_stack_slab() is
called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been
initialized. In that case init_stack_slab() will overwrite
stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory
corruption.
Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com
Fixes: cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: Walter Wu <walter-zh.wu(a)mediatek.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Kate Stewart <kstewart(a)linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index ed717dd08ff3..81c69c08d1d1 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -83,15 +83,19 @@ static bool init_stack_slab(void **prealloc)
return true;
if (stack_slabs[depot_index] == NULL) {
stack_slabs[depot_index] = *prealloc;
+ *prealloc = NULL;
} else {
- stack_slabs[depot_index + 1] = *prealloc;
+ /* If this is the last depot slab, do not touch the next one. */
+ if (depot_index + 1 < STACK_ALLOC_MAX_SLABS) {
+ stack_slabs[depot_index + 1] = *prealloc;
+ *prealloc = NULL;
+ }
/*
* This smp_store_release pairs with smp_load_acquire() from
* |next_slab_inited| above and in stack_depot_save().
*/
smp_store_release(&next_slab_inited, 1);
}
- *prealloc = NULL;
return true;
}
Upstream: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.
Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).
However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.
In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/673
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: stable(a)kernel.vger.org #5.3+
Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chri…
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 42 ++++++++++++++---------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4949b5ad860f..e0e4f3deb2da 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -471,12 +471,6 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
return desc;
}
-static void unwind_wa_tail(struct i915_request *rq)
-{
- rq->tail = intel_ring_wrap(rq->ring, rq->wa_tail - WA_TAIL_BYTES);
- assert_ring_tail_valid(rq->ring, rq->tail);
-}
-
static struct i915_request *
__unwind_incomplete_requests(struct intel_engine_cs *engine)
{
@@ -495,7 +489,6 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
continue; /* XXX */
__i915_request_unsubmit(rq);
- unwind_wa_tail(rq);
/*
* Push the request back into the queue for later resubmission.
@@ -650,13 +643,29 @@ execlists_schedule_out(struct i915_request *rq)
i915_request_put(rq);
}
-static u64 execlists_update_context(const struct i915_request *rq)
+static u64 execlists_update_context(struct i915_request *rq)
{
struct intel_context *ce = rq->hw_context;
- u64 desc;
+ u64 desc = ce->lrc_desc;
+ u32 tail;
- ce->lrc_reg_state[CTX_RING_TAIL + 1] =
- intel_ring_set_tail(rq->ring, rq->tail);
+ /*
+ * WaIdleLiteRestore:bdw,skl
+ *
+ * We should never submit the context with the same RING_TAIL twice
+ * just in case we submit an empty ring, which confuses the HW.
+ *
+ * We append a couple of NOOPs (gen8_emit_wa_tail) after the end of
+ * the normal request to be able to always advance the RING_TAIL on
+ * subsequent resubmissions (for lite restore). Should that fail us,
+ * and we try and submit the same tail again, force the context
+ * reload.
+ */
+ tail = intel_ring_set_tail(rq->ring, rq->tail);
+ if (unlikely(ce->lrc_reg_state[CTX_RING_TAIL + 1] == tail))
+ desc |= CTX_DESC_FORCE_RESTORE;
+ ce->lrc_reg_state[CTX_RING_TAIL + 1] = tail;
+ rq->tail = rq->wa_tail;
/*
* Make sure the context image is complete before we submit it to HW.
@@ -675,7 +684,6 @@ static u64 execlists_update_context(const struct i915_request *rq)
*/
mb();
- desc = ce->lrc_desc;
ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE;
return desc;
@@ -1150,16 +1158,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
if (!list_is_last(&last->sched.link,
&engine->active.requests))
return;
-
- /*
- * WaIdleLiteRestore:bdw,skl
- * Apply the wa NOOPs to prevent
- * ring:HEAD == rq:TAIL as we resubmit the
- * request. See gen8_emit_fini_breadcrumb() for
- * where we prepare the padding after the
- * end of the request.
- */
- last->tail = last->wa_tail;
}
}
--
2.25.1
Hi,
I was just looking throught the current 5.4-stable queue and saw that
28553fa992cb28 ('Btrfs: fix race between shrinking truncate and fiemap')
is queued. Upstream has a follow-up fix for this: 52e29e331070cd aka
'btrfs: don't set path->leave_spinning for truncate'.
Would be nice to get those in together. I only looked at 5.4, don't
know about other queues.
thanks,
Holger
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a5ae50dea9111db63d30d700766dd5509602f7ad Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 20 Feb 2020 13:29:49 +0000
Subject: [PATCH] Btrfs: fix deadlock during fast fsync when logging prealloc
extents beyond eof
While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:
* the fsync is ranged
* the file has prealloc extents beyond eof
* writeback for a range different from the fsync range starts
during the fsync
* the size of the file is not sector size aligned
Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.
The following diagram shows how the deadlock can happen.
CPU 1 CPU 2
btrfs_sync_file()
--> for range [0, 1MiB)
--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB
flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete
--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made
--> i_size is updated to 2MiB + 1KiB
--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned
btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()
btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X
btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned
--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1
btrfs_truncate_inode_items()
--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned
--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned
--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock
*** deadlock ***
This results in a stack trace like the following, triggered by test case
generic/561 from fstests:
[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90
So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.
Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f47ba652b31..1ccb3f8d528d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4103,8 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
- lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
+ if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
/*
* We want to drop from the next block forward in case this new size is
@@ -4368,11 +4369,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
if (!ret && last_size > new_size)
last_size = new_size;
btrfs_ordered_update_i_size(inode, last_size, NULL);
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start,
+ (u64)-1, &cached_state);
}
- unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
-
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a5ae50dea9111db63d30d700766dd5509602f7ad Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 20 Feb 2020 13:29:49 +0000
Subject: [PATCH] Btrfs: fix deadlock during fast fsync when logging prealloc
extents beyond eof
While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:
* the fsync is ranged
* the file has prealloc extents beyond eof
* writeback for a range different from the fsync range starts
during the fsync
* the size of the file is not sector size aligned
Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.
The following diagram shows how the deadlock can happen.
CPU 1 CPU 2
btrfs_sync_file()
--> for range [0, 1MiB)
--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB
flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete
--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made
--> i_size is updated to 2MiB + 1KiB
--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned
btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()
btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X
btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned
--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1
btrfs_truncate_inode_items()
--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned
--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned
--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock
*** deadlock ***
This results in a stack trace like the following, triggered by test case
generic/561 from fstests:
[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90
So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.
Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f47ba652b31..1ccb3f8d528d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4103,8 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
- lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
+ if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
/*
* We want to drop from the next block forward in case this new size is
@@ -4368,11 +4369,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
if (!ret && last_size > new_size)
last_size = new_size;
btrfs_ordered_update_i_size(inode, last_size, NULL);
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start,
+ (u64)-1, &cached_state);
}
- unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
-
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a5ae50dea9111db63d30d700766dd5509602f7ad Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 20 Feb 2020 13:29:49 +0000
Subject: [PATCH] Btrfs: fix deadlock during fast fsync when logging prealloc
extents beyond eof
While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:
* the fsync is ranged
* the file has prealloc extents beyond eof
* writeback for a range different from the fsync range starts
during the fsync
* the size of the file is not sector size aligned
Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.
The following diagram shows how the deadlock can happen.
CPU 1 CPU 2
btrfs_sync_file()
--> for range [0, 1MiB)
--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB
flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete
--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made
--> i_size is updated to 2MiB + 1KiB
--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned
btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()
btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X
btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned
--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1
btrfs_truncate_inode_items()
--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned
--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned
--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock
*** deadlock ***
This results in a stack trace like the following, triggered by test case
generic/561 from fstests:
[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90
So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.
Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f47ba652b31..1ccb3f8d528d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4103,8 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
- lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
+ if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
/*
* We want to drop from the next block forward in case this new size is
@@ -4368,11 +4369,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
if (!ret && last_size > new_size)
last_size = new_size;
btrfs_ordered_update_i_size(inode, last_size, NULL);
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start,
+ (u64)-1, &cached_state);
}
- unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
-
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a5ae50dea9111db63d30d700766dd5509602f7ad Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 20 Feb 2020 13:29:49 +0000
Subject: [PATCH] Btrfs: fix deadlock during fast fsync when logging prealloc
extents beyond eof
While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:
* the fsync is ranged
* the file has prealloc extents beyond eof
* writeback for a range different from the fsync range starts
during the fsync
* the size of the file is not sector size aligned
Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.
The following diagram shows how the deadlock can happen.
CPU 1 CPU 2
btrfs_sync_file()
--> for range [0, 1MiB)
--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB
flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete
--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made
--> i_size is updated to 2MiB + 1KiB
--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned
btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()
btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X
btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned
--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1
btrfs_truncate_inode_items()
--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned
--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned
--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock
*** deadlock ***
This results in a stack trace like the following, triggered by test case
generic/561 from fstests:
[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90
So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.
Fixes: 28553fa992cb28 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f47ba652b31..1ccb3f8d528d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4103,8 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
- lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
+ if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
/*
* We want to drop from the next block forward in case this new size is
@@ -4368,11 +4369,10 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
if (!ret && last_size > new_size)
last_size = new_size;
btrfs_ordered_update_i_size(inode, last_size, NULL);
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start,
+ (u64)-1, &cached_state);
}
- unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
- &cached_state);
-
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 52e29e331070cd7d52a64cbf1b0958212a340e28 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 17 Jan 2020 09:02:20 -0500
Subject: [PATCH] btrfs: don't set path->leave_spinning for truncate
The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence. We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation. With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.
Note: this patch also fixes a warning after 28553fa992cb ("Btrfs: fix
race between shrinking truncate and fiemap") got merged that inserts
extent locks around truncation so the path must not leave spinning locks
after btrfs_search_slot.
[70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
[70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
[70.794863] 5 locks held by rsync/1141:
[70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
[70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
[70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
[70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
[70.795362] Call Trace:
[70.795374] dump_stack+0x71/0xa0
[70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
[70.795459] kmem_cache_alloc+0x1d3/0x290
[70.795471] alloc_extent_state+0x22/0x1c0
[70.795544] __clear_extent_bit+0x3ba/0x580
[70.795557] ? _raw_spin_unlock_irq+0x24/0x30
[70.795569] btrfs_truncate_inode_items+0x339/0xe50
[70.795647] btrfs_evict_inode+0x269/0x540
[70.795659] ? dput.part.38+0x29/0x460
[70.795671] evict+0xcd/0x190
[70.795682] __dentry_kill+0xd6/0x180
[70.795754] dput.part.38+0x2ad/0x460
[70.795765] do_renameat2+0x3cb/0x540
[70.795777] __x64_sys_rename+0x1c/0x20
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Fixes: 28553fa992cb ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..36deef69f847 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4142,7 +4142,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
goto out;
}
- path->leave_spinning = 1;
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
if (ret < 0)
goto out;
@@ -4294,7 +4293,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
root == fs_info->tree_root)) {
struct btrfs_ref ref = { 0 };
- btrfs_set_path_blocking(path);
bytes_deleted += extent_num_bytes;
btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 52e29e331070cd7d52a64cbf1b0958212a340e28 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 17 Jan 2020 09:02:20 -0500
Subject: [PATCH] btrfs: don't set path->leave_spinning for truncate
The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence. We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation. With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.
Note: this patch also fixes a warning after 28553fa992cb ("Btrfs: fix
race between shrinking truncate and fiemap") got merged that inserts
extent locks around truncation so the path must not leave spinning locks
after btrfs_search_slot.
[70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
[70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
[70.794863] 5 locks held by rsync/1141:
[70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
[70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
[70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
[70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
[70.795362] Call Trace:
[70.795374] dump_stack+0x71/0xa0
[70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
[70.795459] kmem_cache_alloc+0x1d3/0x290
[70.795471] alloc_extent_state+0x22/0x1c0
[70.795544] __clear_extent_bit+0x3ba/0x580
[70.795557] ? _raw_spin_unlock_irq+0x24/0x30
[70.795569] btrfs_truncate_inode_items+0x339/0xe50
[70.795647] btrfs_evict_inode+0x269/0x540
[70.795659] ? dput.part.38+0x29/0x460
[70.795671] evict+0xcd/0x190
[70.795682] __dentry_kill+0xd6/0x180
[70.795754] dput.part.38+0x2ad/0x460
[70.795765] do_renameat2+0x3cb/0x540
[70.795777] __x64_sys_rename+0x1c/0x20
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Fixes: 28553fa992cb ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..36deef69f847 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4142,7 +4142,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
goto out;
}
- path->leave_spinning = 1;
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
if (ret < 0)
goto out;
@@ -4294,7 +4293,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
root == fs_info->tree_root)) {
struct btrfs_ref ref = { 0 };
- btrfs_set_path_blocking(path);
bytes_deleted += extent_num_bytes;
btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 52e29e331070cd7d52a64cbf1b0958212a340e28 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 17 Jan 2020 09:02:20 -0500
Subject: [PATCH] btrfs: don't set path->leave_spinning for truncate
The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence. We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation. With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.
Note: this patch also fixes a warning after 28553fa992cb ("Btrfs: fix
race between shrinking truncate and fiemap") got merged that inserts
extent locks around truncation so the path must not leave spinning locks
after btrfs_search_slot.
[70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
[70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
[70.794863] 5 locks held by rsync/1141:
[70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
[70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
[70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
[70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
[70.795362] Call Trace:
[70.795374] dump_stack+0x71/0xa0
[70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
[70.795459] kmem_cache_alloc+0x1d3/0x290
[70.795471] alloc_extent_state+0x22/0x1c0
[70.795544] __clear_extent_bit+0x3ba/0x580
[70.795557] ? _raw_spin_unlock_irq+0x24/0x30
[70.795569] btrfs_truncate_inode_items+0x339/0xe50
[70.795647] btrfs_evict_inode+0x269/0x540
[70.795659] ? dput.part.38+0x29/0x460
[70.795671] evict+0xcd/0x190
[70.795682] __dentry_kill+0xd6/0x180
[70.795754] dput.part.38+0x2ad/0x460
[70.795765] do_renameat2+0x3cb/0x540
[70.795777] __x64_sys_rename+0x1c/0x20
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Fixes: 28553fa992cb ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..36deef69f847 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4142,7 +4142,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
goto out;
}
- path->leave_spinning = 1;
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
if (ret < 0)
goto out;
@@ -4294,7 +4293,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
root == fs_info->tree_root)) {
struct btrfs_ref ref = { 0 };
- btrfs_set_path_blocking(path);
bytes_deleted += extent_num_bytes;
btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 52e29e331070cd7d52a64cbf1b0958212a340e28 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Fri, 17 Jan 2020 09:02:20 -0500
Subject: [PATCH] btrfs: don't set path->leave_spinning for truncate
The only time we actually leave the path spinning is if we're truncating
a small amount and don't actually free an extent, which is not a common
occurrence. We have to set the path blocking in order to add the
delayed ref anyway, so the first extent we find we set the path to
blocking and stay blocking for the duration of the operation. With the
upcoming file extent map stuff there will be another case that we have
to have the path blocking, so just swap to blocking always.
Note: this patch also fixes a warning after 28553fa992cb ("Btrfs: fix
race between shrinking truncate and fiemap") got merged that inserts
extent locks around truncation so the path must not leave spinning locks
after btrfs_search_slot.
[70.794783] BUG: sleeping function called from invalid context at mm/slab.h:565
[70.794834] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1141, name: rsync
[70.794863] 5 locks held by rsync/1141:
[70.794876] #0: ffff888417b9c408 (sb_writers#17){.+.+}, at: mnt_want_write+0x20/0x50
[70.795030] #1: ffff888428de28e8 (&type->i_mutex_dir_key#13/1){+.+.}, at: lock_rename+0xf1/0x100
[70.795051] #2: ffff888417b9c608 (sb_internal#2){.+.+}, at: start_transaction+0x394/0x560
[70.795124] #3: ffff888403081768 (btrfs-fs-01){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795203] #4: ffff888403086568 (btrfs-fs-00){++++}, at: btrfs_try_tree_write_lock+0x2f/0x160
[70.795222] CPU: 5 PID: 1141 Comm: rsync Not tainted 5.6.0-rc2-backup+ #2
[70.795362] Call Trace:
[70.795374] dump_stack+0x71/0xa0
[70.795445] ___might_sleep.part.96.cold.106+0xa6/0xb6
[70.795459] kmem_cache_alloc+0x1d3/0x290
[70.795471] alloc_extent_state+0x22/0x1c0
[70.795544] __clear_extent_bit+0x3ba/0x580
[70.795557] ? _raw_spin_unlock_irq+0x24/0x30
[70.795569] btrfs_truncate_inode_items+0x339/0xe50
[70.795647] btrfs_evict_inode+0x269/0x540
[70.795659] ? dput.part.38+0x29/0x460
[70.795671] evict+0xcd/0x190
[70.795682] __dentry_kill+0xd6/0x180
[70.795754] dput.part.38+0x2ad/0x460
[70.795765] do_renameat2+0x3cb/0x540
[70.795777] __x64_sys_rename+0x1c/0x20
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Fixes: 28553fa992cb ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d26b4bfb2c6..36deef69f847 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4142,7 +4142,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
goto out;
}
- path->leave_spinning = 1;
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
if (ret < 0)
goto out;
@@ -4294,7 +4293,6 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
root == fs_info->tree_root)) {
struct btrfs_ref ref = { 0 };
- btrfs_set_path_blocking(path);
bytes_deleted += extent_num_bytes;
btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28553fa992cb28be6a65566681aac6cafabb4f2d Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Feb 2020 12:23:09 +0000
Subject: [PATCH] Btrfs: fix race between shrinking truncate and fiemap
When there is a fiemap executing in parallel with a shrinking truncate
we can end up in a situation where we have extent maps for which we no
longer have corresponding file extent items. This is generally harmless
and at the moment the only consequences are missing file extent items
representing holes after we expand the file size again after the
truncate operation removed the prealloc extent items, and stale
information for future fiemap calls (reporting extents that no longer
exist or may have been reallocated to other files for example).
Consider the following example:
1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
and a 1MiB prealloc extent at file offset 128KiB;
2) Task A starts doing a shrinking truncate of our inode to reduce it to
a size of 64KiB. Before it searches the subvolume tree for file
extent items to delete, it drops all the extent maps in the range
from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();
3) Task B starts doing a fiemap against our inode. When looking up for
the inode's extent maps in the range from 128KiB to (u64)-1, it
doesn't find any in the inode's extent map tree, since they were
removed by task A. Because it didn't find any in the extent map
tree, it scans the inode's subvolume tree for file extent items, and
it finds the 1MiB prealloc extent at file offset 128KiB, then it
creates an extent map based on that file extent item and adds it to
inode's extent map tree (this ends up being done by
btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
get_extent_skip_holes());
4) Task A then drops the prealloc extent at file offset 128KiB and
shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
truncation operation finishes and we end up with an extent map
representing a 1MiB prealloc extent at file offset 128KiB, despite we
don't have any more that extent;
After this the two types of problems we have are:
1) Future calls to fiemap always report that a 1MiB prealloc extent
exists at file offset 128KiB. This is stale information, no longer
correct;
2) If the size of the file is increased, by a truncate operation that
increases the file size or by a write into a file offset > 64KiB for
example, we end up not inserting file extent items to represent holes
for any range between 128KiB and 128KiB + 1MiB, since the hole
expansion function, btrfs_cont_expand() will skip hole insertion for
any range for which an extent map exists that represents a prealloc
extent. This causes fsck to complain about missing file extent items
when not using the NO_HOLES feature.
The second issue could be often triggered by test case generic/561 from
fstests, which runs fsstress and duperemove in parallel, and duperemove
does frequent fiemap calls.
Essentially the problems happens because fiemap does not acquire the
inode's lock while truncate does, and fiemap locks the file range in the
inode's iotree while truncate does not. So fix the issue by making
btrfs_truncate_inode_items() lock the file range from the new file size
to (u64)-1, so that it serializes with fiemap.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5b3ec93ff911..7d26b4bfb2c6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4085,6 +4085,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
u64 bytes_deleted = 0;
bool be_nice = false;
bool should_throttle = false;
+ const u64 lock_start = ALIGN_DOWN(new_size, fs_info->sectorsize);
+ struct extent_state *cached_state = NULL;
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
@@ -4101,6 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
/*
* We want to drop from the next block forward in case this new size is
* not block aligned since we will be keeping the last block of the
@@ -4367,6 +4372,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
btrfs_ordered_update_i_size(inode, last_size, NULL);
}
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28553fa992cb28be6a65566681aac6cafabb4f2d Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Feb 2020 12:23:09 +0000
Subject: [PATCH] Btrfs: fix race between shrinking truncate and fiemap
When there is a fiemap executing in parallel with a shrinking truncate
we can end up in a situation where we have extent maps for which we no
longer have corresponding file extent items. This is generally harmless
and at the moment the only consequences are missing file extent items
representing holes after we expand the file size again after the
truncate operation removed the prealloc extent items, and stale
information for future fiemap calls (reporting extents that no longer
exist or may have been reallocated to other files for example).
Consider the following example:
1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
and a 1MiB prealloc extent at file offset 128KiB;
2) Task A starts doing a shrinking truncate of our inode to reduce it to
a size of 64KiB. Before it searches the subvolume tree for file
extent items to delete, it drops all the extent maps in the range
from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();
3) Task B starts doing a fiemap against our inode. When looking up for
the inode's extent maps in the range from 128KiB to (u64)-1, it
doesn't find any in the inode's extent map tree, since they were
removed by task A. Because it didn't find any in the extent map
tree, it scans the inode's subvolume tree for file extent items, and
it finds the 1MiB prealloc extent at file offset 128KiB, then it
creates an extent map based on that file extent item and adds it to
inode's extent map tree (this ends up being done by
btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
get_extent_skip_holes());
4) Task A then drops the prealloc extent at file offset 128KiB and
shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
truncation operation finishes and we end up with an extent map
representing a 1MiB prealloc extent at file offset 128KiB, despite we
don't have any more that extent;
After this the two types of problems we have are:
1) Future calls to fiemap always report that a 1MiB prealloc extent
exists at file offset 128KiB. This is stale information, no longer
correct;
2) If the size of the file is increased, by a truncate operation that
increases the file size or by a write into a file offset > 64KiB for
example, we end up not inserting file extent items to represent holes
for any range between 128KiB and 128KiB + 1MiB, since the hole
expansion function, btrfs_cont_expand() will skip hole insertion for
any range for which an extent map exists that represents a prealloc
extent. This causes fsck to complain about missing file extent items
when not using the NO_HOLES feature.
The second issue could be often triggered by test case generic/561 from
fstests, which runs fsstress and duperemove in parallel, and duperemove
does frequent fiemap calls.
Essentially the problems happens because fiemap does not acquire the
inode's lock while truncate does, and fiemap locks the file range in the
inode's iotree while truncate does not. So fix the issue by making
btrfs_truncate_inode_items() lock the file range from the new file size
to (u64)-1, so that it serializes with fiemap.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5b3ec93ff911..7d26b4bfb2c6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4085,6 +4085,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
u64 bytes_deleted = 0;
bool be_nice = false;
bool should_throttle = false;
+ const u64 lock_start = ALIGN_DOWN(new_size, fs_info->sectorsize);
+ struct extent_state *cached_state = NULL;
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
@@ -4101,6 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
/*
* We want to drop from the next block forward in case this new size is
* not block aligned since we will be keeping the last block of the
@@ -4367,6 +4372,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
btrfs_ordered_update_i_size(inode, last_size, NULL);
}
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28553fa992cb28be6a65566681aac6cafabb4f2d Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Feb 2020 12:23:09 +0000
Subject: [PATCH] Btrfs: fix race between shrinking truncate and fiemap
When there is a fiemap executing in parallel with a shrinking truncate
we can end up in a situation where we have extent maps for which we no
longer have corresponding file extent items. This is generally harmless
and at the moment the only consequences are missing file extent items
representing holes after we expand the file size again after the
truncate operation removed the prealloc extent items, and stale
information for future fiemap calls (reporting extents that no longer
exist or may have been reallocated to other files for example).
Consider the following example:
1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
and a 1MiB prealloc extent at file offset 128KiB;
2) Task A starts doing a shrinking truncate of our inode to reduce it to
a size of 64KiB. Before it searches the subvolume tree for file
extent items to delete, it drops all the extent maps in the range
from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();
3) Task B starts doing a fiemap against our inode. When looking up for
the inode's extent maps in the range from 128KiB to (u64)-1, it
doesn't find any in the inode's extent map tree, since they were
removed by task A. Because it didn't find any in the extent map
tree, it scans the inode's subvolume tree for file extent items, and
it finds the 1MiB prealloc extent at file offset 128KiB, then it
creates an extent map based on that file extent item and adds it to
inode's extent map tree (this ends up being done by
btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
get_extent_skip_holes());
4) Task A then drops the prealloc extent at file offset 128KiB and
shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
truncation operation finishes and we end up with an extent map
representing a 1MiB prealloc extent at file offset 128KiB, despite we
don't have any more that extent;
After this the two types of problems we have are:
1) Future calls to fiemap always report that a 1MiB prealloc extent
exists at file offset 128KiB. This is stale information, no longer
correct;
2) If the size of the file is increased, by a truncate operation that
increases the file size or by a write into a file offset > 64KiB for
example, we end up not inserting file extent items to represent holes
for any range between 128KiB and 128KiB + 1MiB, since the hole
expansion function, btrfs_cont_expand() will skip hole insertion for
any range for which an extent map exists that represents a prealloc
extent. This causes fsck to complain about missing file extent items
when not using the NO_HOLES feature.
The second issue could be often triggered by test case generic/561 from
fstests, which runs fsstress and duperemove in parallel, and duperemove
does frequent fiemap calls.
Essentially the problems happens because fiemap does not acquire the
inode's lock while truncate does, and fiemap locks the file range in the
inode's iotree while truncate does not. So fix the issue by making
btrfs_truncate_inode_items() lock the file range from the new file size
to (u64)-1, so that it serializes with fiemap.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5b3ec93ff911..7d26b4bfb2c6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4085,6 +4085,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
u64 bytes_deleted = 0;
bool be_nice = false;
bool should_throttle = false;
+ const u64 lock_start = ALIGN_DOWN(new_size, fs_info->sectorsize);
+ struct extent_state *cached_state = NULL;
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
@@ -4101,6 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
/*
* We want to drop from the next block forward in case this new size is
* not block aligned since we will be keeping the last block of the
@@ -4367,6 +4372,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
btrfs_ordered_update_i_size(inode, last_size, NULL);
}
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28553fa992cb28be6a65566681aac6cafabb4f2d Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 7 Feb 2020 12:23:09 +0000
Subject: [PATCH] Btrfs: fix race between shrinking truncate and fiemap
When there is a fiemap executing in parallel with a shrinking truncate
we can end up in a situation where we have extent maps for which we no
longer have corresponding file extent items. This is generally harmless
and at the moment the only consequences are missing file extent items
representing holes after we expand the file size again after the
truncate operation removed the prealloc extent items, and stale
information for future fiemap calls (reporting extents that no longer
exist or may have been reallocated to other files for example).
Consider the following example:
1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
and a 1MiB prealloc extent at file offset 128KiB;
2) Task A starts doing a shrinking truncate of our inode to reduce it to
a size of 64KiB. Before it searches the subvolume tree for file
extent items to delete, it drops all the extent maps in the range
from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();
3) Task B starts doing a fiemap against our inode. When looking up for
the inode's extent maps in the range from 128KiB to (u64)-1, it
doesn't find any in the inode's extent map tree, since they were
removed by task A. Because it didn't find any in the extent map
tree, it scans the inode's subvolume tree for file extent items, and
it finds the 1MiB prealloc extent at file offset 128KiB, then it
creates an extent map based on that file extent item and adds it to
inode's extent map tree (this ends up being done by
btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
get_extent_skip_holes());
4) Task A then drops the prealloc extent at file offset 128KiB and
shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
truncation operation finishes and we end up with an extent map
representing a 1MiB prealloc extent at file offset 128KiB, despite we
don't have any more that extent;
After this the two types of problems we have are:
1) Future calls to fiemap always report that a 1MiB prealloc extent
exists at file offset 128KiB. This is stale information, no longer
correct;
2) If the size of the file is increased, by a truncate operation that
increases the file size or by a write into a file offset > 64KiB for
example, we end up not inserting file extent items to represent holes
for any range between 128KiB and 128KiB + 1MiB, since the hole
expansion function, btrfs_cont_expand() will skip hole insertion for
any range for which an extent map exists that represents a prealloc
extent. This causes fsck to complain about missing file extent items
when not using the NO_HOLES feature.
The second issue could be often triggered by test case generic/561 from
fstests, which runs fsstress and duperemove in parallel, and duperemove
does frequent fiemap calls.
Essentially the problems happens because fiemap does not acquire the
inode's lock while truncate does, and fiemap locks the file range in the
inode's iotree while truncate does not. So fix the issue by making
btrfs_truncate_inode_items() lock the file range from the new file size
to (u64)-1, so that it serializes with fiemap.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5b3ec93ff911..7d26b4bfb2c6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4085,6 +4085,8 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
u64 bytes_deleted = 0;
bool be_nice = false;
bool should_throttle = false;
+ const u64 lock_start = ALIGN_DOWN(new_size, fs_info->sectorsize);
+ struct extent_state *cached_state = NULL;
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
@@ -4101,6 +4103,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
return -ENOMEM;
path->reada = READA_BACK;
+ lock_extent_bits(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
/*
* We want to drop from the next block forward in case this new size is
* not block aligned since we will be keeping the last block of the
@@ -4367,6 +4372,9 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
btrfs_ordered_update_i_size(inode, last_size, NULL);
}
+ unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, (u64)-1,
+ &cached_state);
+
btrfs_free_path(path);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1e90315149f3fe148e114a5de86f0196d1c21fa5 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:29 -0500
Subject: [PATCH] btrfs: do not check delayed items are empty for single
transaction cleanup
btrfs_assert_delayed_root_empty() will check if the delayed root is
completely empty, but this is a filesystem-wide check. On cleanup we
may have allowed other transactions to begin, for whatever reason, and
thus the delayed root is not empty.
So remove this check from cleanup_one_transation(). This however can
stay in btrfs_cleanup_transaction(), because it checks only after all of
the transactions have been properly cleaned up, and thus is valid.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 197352f23534..c6c9a6a8e6c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4503,7 +4503,6 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
wake_up(&fs_info->transaction_wait);
btrfs_destroy_delayed_inodes(fs_info);
- btrfs_assert_delayed_root_empty(fs_info);
btrfs_destroy_marked_extents(fs_info, &cur_trans->dirty_pages,
EXTENT_DIRTY);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1e90315149f3fe148e114a5de86f0196d1c21fa5 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:29 -0500
Subject: [PATCH] btrfs: do not check delayed items are empty for single
transaction cleanup
btrfs_assert_delayed_root_empty() will check if the delayed root is
completely empty, but this is a filesystem-wide check. On cleanup we
may have allowed other transactions to begin, for whatever reason, and
thus the delayed root is not empty.
So remove this check from cleanup_one_transation(). This however can
stay in btrfs_cleanup_transaction(), because it checks only after all of
the transactions have been properly cleaned up, and thus is valid.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 197352f23534..c6c9a6a8e6c8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4503,7 +4503,6 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
wake_up(&fs_info->transaction_wait);
btrfs_destroy_delayed_inodes(fs_info);
- btrfs_assert_delayed_root_empty(fs_info);
btrfs_destroy_marked_extents(fs_info, &cur_trans->dirty_pages,
EXTENT_DIRTY);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 315bf8ef914f31d51d084af950703aa1e09a728c Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:28 -0500
Subject: [PATCH] btrfs: reset fs_root to NULL on error in open_ctree
While running my error injection script I hit a panic when we tried to
clean up the fs_root when freeing the fs_root. This is because
fs_info->fs_root == PTR_ERR(-EIO), which isn't great. Fix this by
setting fs_info->fs_root = NULL; if we fail to read the root.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d7fec89974cb..197352f23534 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3200,6 +3200,7 @@ int __cold open_ctree(struct super_block *sb,
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
btrfs_warn(fs_info, "failed to read fs tree: %d", err);
+ fs_info->fs_root = NULL;
goto fail_qgroup;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 315bf8ef914f31d51d084af950703aa1e09a728c Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:28 -0500
Subject: [PATCH] btrfs: reset fs_root to NULL on error in open_ctree
While running my error injection script I hit a panic when we tried to
clean up the fs_root when freeing the fs_root. This is because
fs_info->fs_root == PTR_ERR(-EIO), which isn't great. Fix this by
setting fs_info->fs_root = NULL; if we fail to read the root.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d7fec89974cb..197352f23534 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3200,6 +3200,7 @@ int __cold open_ctree(struct super_block *sb,
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
btrfs_warn(fs_info, "failed to read fs tree: %d", err);
+ fs_info->fs_root = NULL;
goto fail_qgroup;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 315bf8ef914f31d51d084af950703aa1e09a728c Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:28 -0500
Subject: [PATCH] btrfs: reset fs_root to NULL on error in open_ctree
While running my error injection script I hit a panic when we tried to
clean up the fs_root when freeing the fs_root. This is because
fs_info->fs_root == PTR_ERR(-EIO), which isn't great. Fix this by
setting fs_info->fs_root = NULL; if we fail to read the root.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d7fec89974cb..197352f23534 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3200,6 +3200,7 @@ int __cold open_ctree(struct super_block *sb,
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
btrfs_warn(fs_info, "failed to read fs tree: %d", err);
+ fs_info->fs_root = NULL;
goto fail_qgroup;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b778cf962d71a0e737923d55d0432f3bd287258e Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:31 -0500
Subject: [PATCH] btrfs: fix bytes_may_use underflow in prealloc error
condtition
I hit the following warning while running my error injection stress
testing:
WARNING: CPU: 3 PID: 1453 at fs/btrfs/space-info.h:108 btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
RIP: 0010:btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
Call Trace:
btrfs_free_reserved_data_space+0x4f/0x70 [btrfs]
__btrfs_prealloc_file_range+0x378/0x470 [btrfs]
elfcorehdr_read+0x40/0x40
? elfcorehdr_read+0x40/0x40
? btrfs_commit_transaction+0xca/0xa50 [btrfs]
? dput+0xb4/0x2a0
? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
? btrfs_sync_file+0x30e/0x420 [btrfs]
? do_fsync+0x38/0x70
? __x64_sys_fdatasync+0x13/0x20
? do_syscall_64+0x5b/0x1b0
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens if we fail to insert our reserved file extent. At this
point we've already converted our reservation from ->bytes_may_use to
->bytes_reserved. However once we break we will attempt to free
everything from [cur_offset, end] from ->bytes_may_use, but our extent
reservation will overlap part of this.
Fix this problem by adding ins.offset (our extent allocation size) to
cur_offset so we remove the actual remaining part from ->bytes_may_use.
I validated this fix using my inject-error.py script
python inject-error.py -o should_fail_bio -t cache_save_setup -t \
__btrfs_prealloc_file_range \
-t insert_reserved_file_extent.constprop.0 \
-r "-5" ./run-fsstress.sh
where run-fsstress.sh simply mounts and runs fsstress on a disk.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 36deef69f847..4f47ba652b31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9824,6 +9824,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_key ins;
u64 cur_offset = start;
+ u64 clear_offset = start;
u64 i_size;
u64 cur_bytes;
u64 last_alloc = (u64)-1;
@@ -9858,6 +9859,15 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
btrfs_end_transaction(trans);
break;
}
+
+ /*
+ * We've reserved this space, and thus converted it from
+ * ->bytes_may_use to ->bytes_reserved. Any error that happens
+ * from here on out we will only need to clear our reservation
+ * for the remaining unreserved area, so advance our
+ * clear_offset by our extent size.
+ */
+ clear_offset += ins.offset;
btrfs_dec_block_group_reservations(fs_info, ins.objectid);
last_alloc = ins.offset;
@@ -9937,9 +9947,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
if (own_trans)
btrfs_end_transaction(trans);
}
- if (cur_offset < end)
- btrfs_free_reserved_data_space(inode, NULL, cur_offset,
- end - cur_offset + 1);
+ if (clear_offset < end)
+ btrfs_free_reserved_data_space(inode, NULL, clear_offset,
+ end - clear_offset + 1);
return ret;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b778cf962d71a0e737923d55d0432f3bd287258e Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 13 Feb 2020 10:47:31 -0500
Subject: [PATCH] btrfs: fix bytes_may_use underflow in prealloc error
condtition
I hit the following warning while running my error injection stress
testing:
WARNING: CPU: 3 PID: 1453 at fs/btrfs/space-info.h:108 btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
RIP: 0010:btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
Call Trace:
btrfs_free_reserved_data_space+0x4f/0x70 [btrfs]
__btrfs_prealloc_file_range+0x378/0x470 [btrfs]
elfcorehdr_read+0x40/0x40
? elfcorehdr_read+0x40/0x40
? btrfs_commit_transaction+0xca/0xa50 [btrfs]
? dput+0xb4/0x2a0
? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
? btrfs_sync_file+0x30e/0x420 [btrfs]
? do_fsync+0x38/0x70
? __x64_sys_fdatasync+0x13/0x20
? do_syscall_64+0x5b/0x1b0
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens if we fail to insert our reserved file extent. At this
point we've already converted our reservation from ->bytes_may_use to
->bytes_reserved. However once we break we will attempt to free
everything from [cur_offset, end] from ->bytes_may_use, but our extent
reservation will overlap part of this.
Fix this problem by adding ins.offset (our extent allocation size) to
cur_offset so we remove the actual remaining part from ->bytes_may_use.
I validated this fix using my inject-error.py script
python inject-error.py -o should_fail_bio -t cache_save_setup -t \
__btrfs_prealloc_file_range \
-t insert_reserved_file_extent.constprop.0 \
-r "-5" ./run-fsstress.sh
where run-fsstress.sh simply mounts and runs fsstress on a disk.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 36deef69f847..4f47ba652b31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9824,6 +9824,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_key ins;
u64 cur_offset = start;
+ u64 clear_offset = start;
u64 i_size;
u64 cur_bytes;
u64 last_alloc = (u64)-1;
@@ -9858,6 +9859,15 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
btrfs_end_transaction(trans);
break;
}
+
+ /*
+ * We've reserved this space, and thus converted it from
+ * ->bytes_may_use to ->bytes_reserved. Any error that happens
+ * from here on out we will only need to clear our reservation
+ * for the remaining unreserved area, so advance our
+ * clear_offset by our extent size.
+ */
+ clear_offset += ins.offset;
btrfs_dec_block_group_reservations(fs_info, ins.objectid);
last_alloc = ins.offset;
@@ -9937,9 +9947,9 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
if (own_trans)
btrfs_end_transaction(trans);
}
- if (cur_offset < end)
- btrfs_free_reserved_data_space(inode, NULL, cur_offset,
- end - cur_offset + 1);
+ if (clear_offset < end)
+ btrfs_free_reserved_data_space(inode, NULL, clear_offset,
+ end - clear_offset + 1);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81f7eb00ff5bb8326e82503a32809421d14abb8a Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Tue, 11 Feb 2020 15:25:37 +0800
Subject: [PATCH] btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.
This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.
Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89422aa8e9d1..d7fec89974cb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4276,6 +4276,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
cond_resched();
spin_lock(&delayed_refs->lock);
}
+ btrfs_qgroup_destroy_extent_records(trans);
spin_unlock(&delayed_refs->lock);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98d9a50352d6..ff1870ff3474 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4002,3 +4002,16 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
}
return ret;
}
+
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
+{
+ struct btrfs_qgroup_extent_record *entry;
+ struct btrfs_qgroup_extent_record *next;
+ struct rb_root *root;
+
+ root = &trans->delayed_refs.dirty_extent_root;
+ rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
+ ulist_free(entry->old_roots);
+ kfree(entry);
+ }
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 236f12224d52..1bc654459469 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -414,5 +414,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
u64 last_snapshot);
int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct extent_buffer *eb);
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
#endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 33dcc88b428a..beb6c69cd1e5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -121,6 +121,8 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
BUG_ON(!list_empty(&transaction->list));
WARN_ON(!RB_EMPTY_ROOT(
&transaction->delayed_refs.href_root.rb_root));
+ WARN_ON(!RB_EMPTY_ROOT(
+ &transaction->delayed_refs.dirty_extent_root));
if (transaction->delayed_refs.pending_csums)
btrfs_err(transaction->fs_info,
"pending csums is %llu",
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81f7eb00ff5bb8326e82503a32809421d14abb8a Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Tue, 11 Feb 2020 15:25:37 +0800
Subject: [PATCH] btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.
This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.
Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89422aa8e9d1..d7fec89974cb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4276,6 +4276,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
cond_resched();
spin_lock(&delayed_refs->lock);
}
+ btrfs_qgroup_destroy_extent_records(trans);
spin_unlock(&delayed_refs->lock);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98d9a50352d6..ff1870ff3474 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4002,3 +4002,16 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
}
return ret;
}
+
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
+{
+ struct btrfs_qgroup_extent_record *entry;
+ struct btrfs_qgroup_extent_record *next;
+ struct rb_root *root;
+
+ root = &trans->delayed_refs.dirty_extent_root;
+ rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
+ ulist_free(entry->old_roots);
+ kfree(entry);
+ }
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 236f12224d52..1bc654459469 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -414,5 +414,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
u64 last_snapshot);
int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct extent_buffer *eb);
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
#endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 33dcc88b428a..beb6c69cd1e5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -121,6 +121,8 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
BUG_ON(!list_empty(&transaction->list));
WARN_ON(!RB_EMPTY_ROOT(
&transaction->delayed_refs.href_root.rb_root));
+ WARN_ON(!RB_EMPTY_ROOT(
+ &transaction->delayed_refs.dirty_extent_root));
if (transaction->delayed_refs.pending_csums)
btrfs_err(transaction->fs_info,
"pending csums is %llu",
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81f7eb00ff5bb8326e82503a32809421d14abb8a Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Tue, 11 Feb 2020 15:25:37 +0800
Subject: [PATCH] btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.
This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.
Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89422aa8e9d1..d7fec89974cb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4276,6 +4276,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
cond_resched();
spin_lock(&delayed_refs->lock);
}
+ btrfs_qgroup_destroy_extent_records(trans);
spin_unlock(&delayed_refs->lock);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98d9a50352d6..ff1870ff3474 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4002,3 +4002,16 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
}
return ret;
}
+
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
+{
+ struct btrfs_qgroup_extent_record *entry;
+ struct btrfs_qgroup_extent_record *next;
+ struct rb_root *root;
+
+ root = &trans->delayed_refs.dirty_extent_root;
+ rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
+ ulist_free(entry->old_roots);
+ kfree(entry);
+ }
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 236f12224d52..1bc654459469 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -414,5 +414,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
u64 last_snapshot);
int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct extent_buffer *eb);
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
#endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 33dcc88b428a..beb6c69cd1e5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -121,6 +121,8 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
BUG_ON(!list_empty(&transaction->list));
WARN_ON(!RB_EMPTY_ROOT(
&transaction->delayed_refs.href_root.rb_root));
+ WARN_ON(!RB_EMPTY_ROOT(
+ &transaction->delayed_refs.dirty_extent_root));
if (transaction->delayed_refs.pending_csums)
btrfs_err(transaction->fs_info,
"pending csums is %llu",
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 81f7eb00ff5bb8326e82503a32809421d14abb8a Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Tue, 11 Feb 2020 15:25:37 +0800
Subject: [PATCH] btrfs: destroy qgroup extent records on transaction abort
We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.
This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.
Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
[ Rebased to latest upstream, remove to_qgroup() helper, use
rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 89422aa8e9d1..d7fec89974cb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4276,6 +4276,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
cond_resched();
spin_lock(&delayed_refs->lock);
}
+ btrfs_qgroup_destroy_extent_records(trans);
spin_unlock(&delayed_refs->lock);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98d9a50352d6..ff1870ff3474 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -4002,3 +4002,16 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
}
return ret;
}
+
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
+{
+ struct btrfs_qgroup_extent_record *entry;
+ struct btrfs_qgroup_extent_record *next;
+ struct rb_root *root;
+
+ root = &trans->delayed_refs.dirty_extent_root;
+ rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
+ ulist_free(entry->old_roots);
+ kfree(entry);
+ }
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 236f12224d52..1bc654459469 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -414,5 +414,6 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans,
u64 last_snapshot);
int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct extent_buffer *eb);
+void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans);
#endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 33dcc88b428a..beb6c69cd1e5 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -121,6 +121,8 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
BUG_ON(!list_empty(&transaction->list));
WARN_ON(!RB_EMPTY_ROOT(
&transaction->delayed_refs.href_root.rb_root));
+ WARN_ON(!RB_EMPTY_ROOT(
+ &transaction->delayed_refs.dirty_extent_root));
if (transaction->delayed_refs.pending_csums)
btrfs_err(transaction->fs_info,
"pending csums is %llu",
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 91a5f413af596ad01097e59bf487eb07cb3f1331 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:05 +0100
Subject: [PATCH] KVM: nVMX: handle nested posted interrupts when apicv is
disabled for L1
Even when APICv is disabled for L1 it can (and, actually, is) still
available for L2, this means we need to always call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.
Suggested-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 40a0c0fd95ca..a84e8c5acda8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1146,7 +1146,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
- void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+ int (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afcd30d44cbb..cc8ee8125712 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1046,11 +1046,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
apic->regs + APIC_TMR);
}
- if (vcpu->arch.apicv_active)
- kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
- else {
+ if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) {
kvm_lapic_set_irr(vector, apic);
-
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_vcpu_kick(vcpu);
}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a391b29138f0..8787a123b8e7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5258,8 +5258,11 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
return;
}
-static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+static int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
{
+ if (!vcpu->arch.apicv_active)
+ return -1;
+
kvm_lapic_set_irr(vec, vcpu->arch.apic);
smp_mb__after_atomic();
@@ -5271,6 +5274,8 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
put_cpu();
} else
kvm_vcpu_wake_up(vcpu);
+
+ return 0;
}
static bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dafe4df893c8..63ccc435a602 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3822,24 +3822,29 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
* 2. If target vcpu isn't running(root mode), kick it to pick up the
* interrupt from PIR in next vmentry.
*/
-static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
+static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
int r;
r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
if (!r)
- return;
+ return 0;
+
+ if (!vcpu->arch.apicv_active)
+ return -1;
if (pi_test_and_set_pir(vector, &vmx->pi_desc))
- return;
+ return 0;
/* If a previous notification has sent the IPI, nothing to do. */
if (pi_test_and_set_on(&vmx->pi_desc))
- return;
+ return 0;
if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
kvm_vcpu_kick(vcpu);
+
+ return 0;
}
/*
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 91a5f413af596ad01097e59bf487eb07cb3f1331 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:05 +0100
Subject: [PATCH] KVM: nVMX: handle nested posted interrupts when apicv is
disabled for L1
Even when APICv is disabled for L1 it can (and, actually, is) still
available for L2, this means we need to always call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.
Suggested-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 40a0c0fd95ca..a84e8c5acda8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1146,7 +1146,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
- void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+ int (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afcd30d44cbb..cc8ee8125712 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1046,11 +1046,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
apic->regs + APIC_TMR);
}
- if (vcpu->arch.apicv_active)
- kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
- else {
+ if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) {
kvm_lapic_set_irr(vector, apic);
-
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_vcpu_kick(vcpu);
}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a391b29138f0..8787a123b8e7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5258,8 +5258,11 @@ static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
return;
}
-static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
+static int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
{
+ if (!vcpu->arch.apicv_active)
+ return -1;
+
kvm_lapic_set_irr(vec, vcpu->arch.apic);
smp_mb__after_atomic();
@@ -5271,6 +5274,8 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
put_cpu();
} else
kvm_vcpu_wake_up(vcpu);
+
+ return 0;
}
static bool svm_dy_apicv_has_pending_interrupt(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dafe4df893c8..63ccc435a602 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3822,24 +3822,29 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
* 2. If target vcpu isn't running(root mode), kick it to pick up the
* interrupt from PIR in next vmentry.
*/
-static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
+static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
int r;
r = vmx_deliver_nested_posted_interrupt(vcpu, vector);
if (!r)
- return;
+ return 0;
+
+ if (!vcpu->arch.apicv_active)
+ return -1;
if (pi_test_and_set_pir(vector, &vmx->pi_desc))
- return;
+ return 0;
/* If a previous notification has sent the IPI, nothing to do. */
if (pi_test_and_set_on(&vmx->pi_desc))
- return;
+ return 0;
if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
kvm_vcpu_kick(vcpu);
+
+ return 0;
}
/*
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4443267800af240072280c44521caab61924e55 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:04 +0100
Subject: [PATCH] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested
pinbased_ctls only when apicv is globally disabled
When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:
$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016
This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.
L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 283bdb7071af..f486e2606247 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
extern bool __read_mostly enable_unrestricted_guest;
extern bool __read_mostly enable_ept_ad_bits;
extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_apicv;
extern int __read_mostly pt_mode;
#define PT_MODE_SYSTEM 0
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a5757b0b80f9..2b3ba7d27be4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5949,8 +5949,7 @@ void nested_vmx_set_vmcs_shadowing_bitmap(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv)
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
{
/*
* Note that as a general rule, the high half of the MSRs (bits in
@@ -5977,7 +5976,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
PIN_BASED_VIRTUAL_NMIS |
- (apicv ? PIN_BASED_POSTED_INTR : 0);
+ (enable_apicv ? PIN_BASED_POSTED_INTR : 0);
msrs->pinbased_ctls_high |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
PIN_BASED_VMX_PREEMPTION_TIMER;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..1c5fbff45d69 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,8 +17,7 @@ enum nvmx_vmentry_status {
};
void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv);
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63ccc435a602..404dafedd778 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -95,7 +95,7 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, S_IRUGO);
-static bool __read_mostly enable_apicv = 1;
+bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
/*
@@ -6769,8 +6769,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
if (nested)
nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
- vmx_capability.ept,
- kvm_vcpu_apicv_active(vcpu));
+ vmx_capability.ept);
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
@@ -6851,8 +6850,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept,
- enable_apicv);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -7714,7 +7712,7 @@ static __init int hardware_setup(void)
if (nested) {
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept, enable_apicv);
+ vmx_capability.ept);
r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4443267800af240072280c44521caab61924e55 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:04 +0100
Subject: [PATCH] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested
pinbased_ctls only when apicv is globally disabled
When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:
$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016
This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.
L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 283bdb7071af..f486e2606247 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
extern bool __read_mostly enable_unrestricted_guest;
extern bool __read_mostly enable_ept_ad_bits;
extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_apicv;
extern int __read_mostly pt_mode;
#define PT_MODE_SYSTEM 0
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a5757b0b80f9..2b3ba7d27be4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5949,8 +5949,7 @@ void nested_vmx_set_vmcs_shadowing_bitmap(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv)
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
{
/*
* Note that as a general rule, the high half of the MSRs (bits in
@@ -5977,7 +5976,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
PIN_BASED_VIRTUAL_NMIS |
- (apicv ? PIN_BASED_POSTED_INTR : 0);
+ (enable_apicv ? PIN_BASED_POSTED_INTR : 0);
msrs->pinbased_ctls_high |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
PIN_BASED_VMX_PREEMPTION_TIMER;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..1c5fbff45d69 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,8 +17,7 @@ enum nvmx_vmentry_status {
};
void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv);
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63ccc435a602..404dafedd778 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -95,7 +95,7 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, S_IRUGO);
-static bool __read_mostly enable_apicv = 1;
+bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
/*
@@ -6769,8 +6769,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
if (nested)
nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
- vmx_capability.ept,
- kvm_vcpu_apicv_active(vcpu));
+ vmx_capability.ept);
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
@@ -6851,8 +6850,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept,
- enable_apicv);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -7714,7 +7712,7 @@ static __init int hardware_setup(void)
if (nested) {
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept, enable_apicv);
+ vmx_capability.ept);
r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4443267800af240072280c44521caab61924e55 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:04 +0100
Subject: [PATCH] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested
pinbased_ctls only when apicv is globally disabled
When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:
$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016
This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.
L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 283bdb7071af..f486e2606247 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
extern bool __read_mostly enable_unrestricted_guest;
extern bool __read_mostly enable_ept_ad_bits;
extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_apicv;
extern int __read_mostly pt_mode;
#define PT_MODE_SYSTEM 0
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a5757b0b80f9..2b3ba7d27be4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5949,8 +5949,7 @@ void nested_vmx_set_vmcs_shadowing_bitmap(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv)
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
{
/*
* Note that as a general rule, the high half of the MSRs (bits in
@@ -5977,7 +5976,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
PIN_BASED_VIRTUAL_NMIS |
- (apicv ? PIN_BASED_POSTED_INTR : 0);
+ (enable_apicv ? PIN_BASED_POSTED_INTR : 0);
msrs->pinbased_ctls_high |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
PIN_BASED_VMX_PREEMPTION_TIMER;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..1c5fbff45d69 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,8 +17,7 @@ enum nvmx_vmentry_status {
};
void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv);
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63ccc435a602..404dafedd778 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -95,7 +95,7 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, S_IRUGO);
-static bool __read_mostly enable_apicv = 1;
+bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
/*
@@ -6769,8 +6769,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
if (nested)
nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
- vmx_capability.ept,
- kvm_vcpu_apicv_active(vcpu));
+ vmx_capability.ept);
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
@@ -6851,8 +6850,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept,
- enable_apicv);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -7714,7 +7712,7 @@ static __init int hardware_setup(void)
if (nested) {
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept, enable_apicv);
+ vmx_capability.ept);
r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a4443267800af240072280c44521caab61924e55 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Date: Thu, 20 Feb 2020 18:22:04 +0100
Subject: [PATCH] KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested
pinbased_ctls only when apicv is globally disabled
When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*),
nothing happens to VMX MSRs on the already existing vCPUs, however, all new
ones are created with PIN_BASED_POSTED_INTR filtered out. This is very
confusing and results in the following picture inside the guest:
$ rdmsr -ax 0x48d
ff00000016
7f00000016
7f00000016
7f00000016
This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does
KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three.
L1 hypervisor may only check CPU0's controls to find out what features
are available and it will be very confused later. Switch to setting
PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting.
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 283bdb7071af..f486e2606247 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept;
extern bool __read_mostly enable_unrestricted_guest;
extern bool __read_mostly enable_ept_ad_bits;
extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_apicv;
extern int __read_mostly pt_mode;
#define PT_MODE_SYSTEM 0
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a5757b0b80f9..2b3ba7d27be4 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5949,8 +5949,7 @@ void nested_vmx_set_vmcs_shadowing_bitmap(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv)
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
{
/*
* Note that as a general rule, the high half of the MSRs (bits in
@@ -5977,7 +5976,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
PIN_BASED_VIRTUAL_NMIS |
- (apicv ? PIN_BASED_POSTED_INTR : 0);
+ (enable_apicv ? PIN_BASED_POSTED_INTR : 0);
msrs->pinbased_ctls_high |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR |
PIN_BASED_VMX_PREEMPTION_TIMER;
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index fc874d4ead0f..1c5fbff45d69 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,8 +17,7 @@ enum nvmx_vmentry_status {
};
void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
- bool apicv);
+void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 63ccc435a602..404dafedd778 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -95,7 +95,7 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, S_IRUGO);
-static bool __read_mostly enable_apicv = 1;
+bool __read_mostly enable_apicv = 1;
module_param(enable_apicv, bool, S_IRUGO);
/*
@@ -6769,8 +6769,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
if (nested)
nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
- vmx_capability.ept,
- kvm_vcpu_apicv_active(vcpu));
+ vmx_capability.ept);
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));
@@ -6851,8 +6850,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept,
- enable_apicv);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -7714,7 +7712,7 @@ static __init int hardware_setup(void)
if (nested) {
nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept, enable_apicv);
+ vmx_capability.ept);
r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)
The patch below does not apply to the 5.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2e0a576511f656933adfe56ef03b9cf3e64b21b7 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Thu, 13 Feb 2020 16:04:11 +0200
Subject: [PATCH] drm/i915/dsc: force full modeset whenever DSC is enabled at
probe
We lack full state readout of DSC config, which may lead to DSC enable
using a config that's all zeros, failing spectacularly. Force full
modeset and thus compute config at probe to get a sane state, until we
implement DSC state readout. Any fastset that did appear to work with
DSC at probe, worked by coincidence. [1] is an example of a change that
triggered the issue on TGL DSI DSC.
[1] http://patchwork.freedesktop.org/patch/msgid/20200212150102.7600-1-ville.sy…
Cc: Manasi Navare <manasi.d.navare(a)intel.com>
Cc: Vandita Kulkarni <vandita.kulkarni(a)intel.com>
Cc: Ville Syrjala <ville.syrjala(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Fixes: fbacb15ea814 ("drm/i915/dsc: add basic hardware state readout support")
Acked-by: Matt Roper <matthew.d.roper(a)intel.com>
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200213140412.32697-3-stanis…
(cherry picked from commit a4277aa398d76db109d6b8420934f68daf69a6c3)
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 064dd99bbc49..e68ec25fc97c 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -17433,6 +17433,24 @@ static int intel_initial_commit(struct drm_device *dev)
* have readout for pipe gamma enable.
*/
crtc_state->uapi.color_mgmt_changed = true;
+
+ /*
+ * FIXME hack to force full modeset when DSC is being
+ * used.
+ *
+ * As long as we do not have full state readout and
+ * config comparison of crtc_state->dsc, we have no way
+ * to ensure reliable fastset. Remove once we have
+ * readout for DSC.
+ */
+ if (crtc_state->dsc.compression_enable) {
+ ret = drm_atomic_add_affected_connectors(state,
+ &crtc->base);
+ if (ret)
+ goto out;
+ crtc_state->uapi.mode_changed = true;
+ drm_dbg_kms(dev, "Force full modeset for DSC\n");
+ }
}
}