From: Wanpeng Li <wanpengli(a)tencent.com>
Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's
CPUID) avoids to access pv tlb shootdown host side logic when this pv feature
is not exposed to guest, however, kvm_steal_time.preempted not only leveraged
by pv tlb shootdown logic but also mitigate the lock holder preemption issue.
>From guest point of view, vCPU is always preempted since we lose the reset of
kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not
exposed. This patch fixes it by clearing kvm_steal_time.preempted before
vmentry.
Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0244a6..c38e990 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,7 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
st->preempted & KVM_VCPU_FLUSH_TLB);
if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
kvm_vcpu_flush_tlb_guest(vcpu);
- }
+ } else
+ st->preempted = 0;
vcpu->arch.st.preempted = 0;
--
2.7.4
This reverts commit a979a6aa009f3c99689432e0cdb5402a4463fb88.
The reverted commit may cause VM freeze on arm64 platform.
Because on arm64 platform, stop a consumer will suspend the VM,
the VM will freeze without a start consumer
Signed-off-by: Zhu Lingshan <lingshan.zhu(a)intel.com>
---
virt/lib/irqbypass.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index c9bb3957f58a..28fda42e471b 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,21 +40,17 @@ static int __connect(struct irq_bypass_producer *prod,
if (prod->add_consumer)
ret = prod->add_consumer(prod, cons);
- if (ret)
- goto err_add_consumer;
-
- ret = cons->add_producer(cons, prod);
- if (ret)
- goto err_add_producer;
+ if (!ret) {
+ ret = cons->add_producer(cons, prod);
+ if (ret && prod->del_consumer)
+ prod->del_consumer(prod, cons);
+ }
if (cons->start)
cons->start(cons);
if (prod->start)
prod->start(prod);
-err_add_producer:
- if (prod->del_consumer)
- prod->del_consumer(prod, cons);
-err_add_consumer:
+
return ret;
}
--
2.27.0
Hello there,
I hope this message finds you in good spirits especially during
this challenging time of coronavirus pandemic. I hope you and
your family are well and keeping safe. Anyway, I am Chris
Pavlides, a broker working with Flippiebecker Wealth. I got your
contact (along with few other contacts) through an online
business directory and I thought I should contact you to see if
you are interested in this opportunity. I am contacting you
because one of my high profile clients is interested in investing
abroad and has asked me to look for individuals and companies
with interesting business ideas and projects that he can invest
in. He wants to invest a substantial amount of asset abroad.
Please kindly respond back to this email if you are interested in
this opportunity. Once I receive your response, I will give you
more details and we can plan a strategy that will be beneficial
to all parties.
Best regards
C Pavlides
Flippiebecker Wealth
The patch titled
Subject: kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
has been added to the -mm tree. Its filename is
kasan-fix-unit-tests-with-config_ubsan_local_bounds-enabled.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/kasan-fix-unit-tests-with-config_…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/kasan-fix-unit-tests-with-config_…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Collingbourne <pcc(a)google.com>
Subject: kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
These tests deliberately access these arrays out of bounds, which will
cause the dynamic local bounds checks inserted by
CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel. To avoid this
problem, access the arrays via volatile pointers, which will prevent the
compiler from being able to determine the array bounds.
These accesses use volatile pointers to char (char *volatile) rather than
the more conventional pointers to volatile char (volatile char *) because
we want to prevent the compiler from making inferences about the pointer
itself (i.e. its array bounds), not the data that it refers to.
Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b9…
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Tested-by: Alexander Potapenko <glider(a)google.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: George Popescu <georgepope(a)android.com>
Cc: Elena Petrova <lenaptr(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/test_kasan.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
--- a/lib/test_kasan.c~kasan-fix-unit-tests-with-config_ubsan_local_bounds-enabled
+++ a/lib/test_kasan.c
@@ -654,8 +654,20 @@ static char global_array[10];
static void kasan_global_oob(struct kunit *test)
{
- volatile int i = 3;
- char *p = &global_array[ARRAY_SIZE(global_array) + i];
+ /*
+ * Deliberate out-of-bounds access. To prevent CONFIG_UBSAN_LOCAL_BOUNDS
+ * from failing here and panicing the kernel, access the array via a
+ * volatile pointer, which will prevent the compiler from being able to
+ * determine the array bounds.
+ *
+ * This access uses a volatile pointer to char (char *volatile) rather
+ * than the more conventional pointer to volatile char (volatile char *)
+ * because we want to prevent the compiler from making inferences about
+ * the pointer itself (i.e. its array bounds), not the data that it
+ * refers to.
+ */
+ char *volatile array = global_array;
+ char *p = &array[ARRAY_SIZE(global_array) + 3];
/* Only generic mode instruments globals. */
KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
@@ -703,8 +715,9 @@ static void ksize_uaf(struct kunit *test
static void kasan_stack_oob(struct kunit *test)
{
char stack_array[10];
- volatile int i = OOB_TAG_OFF;
- char *p = &stack_array[ARRAY_SIZE(stack_array) + i];
+ /* See comment in kasan_global_oob. */
+ char *volatile array = stack_array;
+ char *p = &array[ARRAY_SIZE(stack_array) + OOB_TAG_OFF];
KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_STACK);
@@ -715,7 +728,9 @@ static void kasan_alloca_oob_left(struct
{
volatile int i = 10;
char alloca_array[i];
- char *p = alloca_array - 1;
+ /* See comment in kasan_global_oob. */
+ char *volatile array = alloca_array;
+ char *p = array - 1;
/* Only generic mode instruments dynamic allocas. */
KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
@@ -728,7 +743,9 @@ static void kasan_alloca_oob_right(struc
{
volatile int i = 10;
char alloca_array[i];
- char *p = alloca_array + i;
+ /* See comment in kasan_global_oob. */
+ char *volatile array = alloca_array;
+ char *p = array + i;
/* Only generic mode instruments dynamic allocas. */
KASAN_TEST_NEEDS_CONFIG_ON(test, CONFIG_KASAN_GENERIC);
_
Patches currently in -mm which might be from pcc(a)google.com are
kasan-fix-unit-tests-with-config_ubsan_local_bounds-enabled.patch
mm-improve-mprotectrw-efficiency-on-pages-referenced-once.patch
The patch titled
Subject: mm: fix struct page layout on 32-bit systems
has been added to the -mm tree. Its filename is
mm-fix-struct-page-layout-on-32-bit-systems.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-struct-page-layout-on-32-b…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-struct-page-layout-on-32-b…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: mm: fix struct page layout on 32-bit systems
32-bit architectures which expect 8-byte alignment for 8-byte integers and
need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019. When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.
Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long. We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform. If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.
Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Tested-by: Matteo Croce <mcroce(a)linux.microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm_types.h | 4 ++--
include/net/page_pool.h | 12 +++++++++++-
net/core/page_pool.c | 12 +++++++-----
3 files changed, 20 insertions(+), 8 deletions(-)
--- a/include/linux/mm_types.h~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
};
struct { /* page_pool used by netstack */
/**
- * @dma_addr: might require a 64-bit value even on
+ * @dma_addr: might require a 64-bit value on
* 32-bit architectures.
*/
- dma_addr_t dma_addr;
+ unsigned long dma_addr[2];
};
struct { /* slab, slob and slub */
union {
--- a/include/net/page_pool.h~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_dir
static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
- return page->dma_addr;
+ dma_addr_t ret = page->dma_addr[0];
+ if (sizeof(dma_addr_t) > sizeof(unsigned long))
+ ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+ return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+ page->dma_addr[0] = addr;
+ if (sizeof(dma_addr_t) > sizeof(unsigned long))
+ page->dma_addr[1] = upper_32_bits(addr);
}
static inline bool is_page_pool_compiled_in(void)
--- a/net/core/page_pool.c~mm-fix-struct-page-layout-on-32-bit-systems
+++ a/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_devic
struct page *page,
unsigned int dma_sync_size)
{
+ dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
dma_sync_size = min(dma_sync_size, pool->p.max_len);
- dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+ dma_sync_single_range_for_device(pool->p.dev, dma_addr,
pool->p.offset, dma_sync_size,
pool->p.dma_dir);
}
@@ -195,7 +197,7 @@ static bool page_pool_dma_map(struct pag
if (dma_mapping_error(pool->p.dev, dma))
return false;
- page->dma_addr = dma;
+ page_pool_set_dma_addr(page, dma);
if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -331,13 +333,13 @@ void page_pool_release_page(struct page_
*/
goto skip_dma_unmap;
- dma = page->dma_addr;
+ dma = page_pool_get_dma_addr(page);
- /* When page is unmapped, it cannot be returned our pool */
+ /* When page is unmapped, it cannot be returned to our pool */
dma_unmap_page_attrs(pool->p.dev, dma,
PAGE_SIZE << pool->p.order, pool->p.dma_dir,
DMA_ATTR_SKIP_CPU_SYNC);
- page->dma_addr = 0;
+ page_pool_set_dma_addr(page, 0);
skip_dma_unmap:
/* This may be the last page returned, releasing the pool, so
* it is not safe to reference pool afterwards.
_
Patches currently in -mm which might be from willy(a)infradead.org are
mm-fix-struct-page-layout-on-32-bit-systems.patch
mm-make-__dump_page-static.patch
mm-debug-factor-pagepoisoned-out-of-__dump_page.patch
mm-page_owner-constify-dump_page_owner.patch
mm-make-compound_head-const-preserving.patch
mm-constify-get_pfnblock_flags_mask-and-get_pfnblock_migratetype.patch
mm-constify-page_count-and-page_ref_count.patch
mm-optimise-nth_page-for-contiguous-memmap.patch
A valid implementation choice for the ChooseRandomNonExcludedTag()
pseudocode function used by IRG is to behave in the same way as with
GCR_EL1.RRND=0. This would mean that RGSR_EL1.SEED is used as an LFSR
which must have a non-zero value in order for IRG to properly produce
pseudorandom numbers. However, RGSR_EL1 is reset to an UNKNOWN value
on soft reset and thus may reset to 0. Therefore we must initialize
RGSR_EL1.SEED to a non-zero value in order to ensure that IRG behaves
as expected.
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Fixes: 3b714d24ef17 ("arm64: mte: CPU feature detection and initial sysreg configuration")
Cc: <stable(a)vger.kernel.org> # 5.10
Link: https://linux-review.googlesource.com/id/I2b089b6c7d6f17ee37e2f0db7df5ad5bc…
---
arch/arm64/mm/proc.S | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 0a48191534ff..97d7bcd8d4f2 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -447,6 +447,18 @@ SYM_FUNC_START(__cpu_setup)
mov x10, #(SYS_GCR_EL1_RRND | SYS_GCR_EL1_EXCL_MASK)
msr_s SYS_GCR_EL1, x10
+ /*
+ * If GCR_EL1.RRND=1 is implemented the same way as RRND=0, then
+ * RGSR_EL1.SEED must be non-zero for IRG to produce
+ * pseudorandom numbers. As RGSR_EL1 is UNKNOWN out of reset, we
+ * must initialize it.
+ */
+ mrs x10, CNTVCT_EL0
+ ands x10, x10, #SYS_RGSR_EL1_SEED_MASK
+ csinc x10, x10, xzr, ne
+ lsl x10, x10, #SYS_RGSR_EL1_SEED_SHIFT
+ msr_s SYS_RGSR_EL1, x10
+
/* clear any pending tag check faults in TFSR*_EL1 */
msr_s SYS_TFSR_EL1, xzr
msr_s SYS_TFSRE0_EL1, xzr
--
2.31.1.607.g51e8a6a459-goog
The mdev remove callback for the vfio_ap device driver bails out with
-EBUSY if the mdev is in use by a KVM guest. The intended purpose was
to prevent the mdev from being removed while in use; however, returning a
non-zero rc does not prevent removal. This could result in a memory leak
of the resources allocated when the mdev was created. In addition, the
KVM guest will still have access to the AP devices assigned to the mdev
even though the mdev no longer exists.
To prevent this scenario, cleanup will be done - including unplugging the
AP adapters, domains and control domains - regardless of whether the mdev
is in use by a KVM guest or not.
Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak(a)stny.rr.com>
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 39 +++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b2c7e10dfdcd..757166da947e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -335,6 +335,32 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_clear_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ /*
+ * If the KVM pointer is in the process of being set, wait until the
+ * process has completed.
+ */
+ wait_event_cmd(matrix_mdev->wait_for_kvm,
+ !matrix_mdev->kvm_busy,
+ mutex_unlock(&matrix_dev->lock),
+ mutex_lock(&matrix_dev->lock));
+
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ matrix_mdev->kvm_busy = true;
+ mutex_unlock(&matrix_dev->lock);
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ mutex_lock(&matrix_dev->lock);
+ matrix_mdev->kvm_busy = false;
+ wake_up_all(&matrix_mdev->wait_for_kvm);
+ }
+}
+
static int vfio_ap_mdev_create(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -366,16 +392,9 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
mutex_lock(&matrix_dev->lock);
-
- /*
- * If the KVM pointer is in flux or the guest is running, disallow
- * un-assignment of control domain.
- */
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
- mutex_unlock(&matrix_dev->lock);
- return -EBUSY;
- }
-
+ WARN(vfio_ap_mdev_has_crycb(matrix_mdev),
+ "Removing mdev leaves KVM guest without any crypto devices");
+ vfio_ap_mdev_clear_apcb(matrix_mdev);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
kfree(matrix_mdev);
--
2.30.2
The mdev remove callback for the vfio_ap device driver bails out with
-EBUSY if the mdev is in use by a KVM guest. The intended purpose was
to prevent the mdev from being removed while in use; however, returning a
non-zero rc does not prevent removal. This could result in a memory leak
of the resources allocated when the mdev was created. In addition, the
KVM guest will still have access to the AP devices assigned to the mdev
even though the mdev no longer exists.
To prevent this scenario, cleanup will be done - including unplugging the
AP adapters, domains and control domains - regardless of whether the mdev
is in use by a KVM guest or not.
Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak(a)stny.rr.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 39 +++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b2c7e10dfdcd..757166da947e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -335,6 +335,32 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
matrix->adm_max = info->apxa ? info->Nd : 15;
}
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+ return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_clear_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+ /*
+ * If the KVM pointer is in the process of being set, wait until the
+ * process has completed.
+ */
+ wait_event_cmd(matrix_mdev->wait_for_kvm,
+ !matrix_mdev->kvm_busy,
+ mutex_unlock(&matrix_dev->lock),
+ mutex_lock(&matrix_dev->lock));
+
+ if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+ matrix_mdev->kvm_busy = true;
+ mutex_unlock(&matrix_dev->lock);
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ mutex_lock(&matrix_dev->lock);
+ matrix_mdev->kvm_busy = false;
+ wake_up_all(&matrix_mdev->wait_for_kvm);
+ }
+}
+
static int vfio_ap_mdev_create(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -366,16 +392,9 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
mutex_lock(&matrix_dev->lock);
-
- /*
- * If the KVM pointer is in flux or the guest is running, disallow
- * un-assignment of control domain.
- */
- if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
- mutex_unlock(&matrix_dev->lock);
- return -EBUSY;
- }
-
+ WARN(vfio_ap_mdev_has_crycb(matrix_mdev),
+ "Removing mdev leaves KVM guest without any crypto devices");
+ vfio_ap_mdev_clear_apcb(matrix_mdev);
vfio_ap_mdev_reset_queues(mdev);
list_del(&matrix_mdev->node);
kfree(matrix_mdev);
--
2.30.2
32-bit architectures which expect 8-byte alignment for 8-byte integers
and need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019. When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.
Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long. We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform. If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.
Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
---
include/linux/mm_types.h | 4 ++--
include/net/page_pool.h | 12 +++++++++++-
net/core/page_pool.c | 12 +++++++-----
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6613b26a8894..5aacc1c10a45 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -97,10 +97,10 @@ struct page {
};
struct { /* page_pool used by netstack */
/**
- * @dma_addr: might require a 64-bit value even on
+ * @dma_addr: might require a 64-bit value on
* 32-bit architectures.
*/
- dma_addr_t dma_addr;
+ unsigned long dma_addr[2];
};
struct { /* slab, slob and slub */
union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 6d517a37c18b..b4b6de909c93 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
- return page->dma_addr;
+ dma_addr_t ret = page->dma_addr[0];
+ if (sizeof(dma_addr_t) > sizeof(unsigned long))
+ ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+ return ret;
+}
+
+static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+{
+ page->dma_addr[0] = addr;
+ if (sizeof(dma_addr_t) > sizeof(unsigned long))
+ page->dma_addr[1] = upper_32_bits(addr);
}
static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9ec1aa9640ad..3c4c4c7a0402 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool,
struct page *page,
unsigned int dma_sync_size)
{
+ dma_addr_t dma_addr = page_pool_get_dma_addr(page);
+
dma_sync_size = min(dma_sync_size, pool->p.max_len);
- dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+ dma_sync_single_range_for_device(pool->p.dev, dma_addr,
pool->p.offset, dma_sync_size,
pool->p.dma_dir);
}
@@ -195,7 +197,7 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
if (dma_mapping_error(pool->p.dev, dma))
return false;
- page->dma_addr = dma;
+ page_pool_set_dma_addr(page, dma);
if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
@@ -331,13 +333,13 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
*/
goto skip_dma_unmap;
- dma = page->dma_addr;
+ dma = page_pool_get_dma_addr(page);
- /* When page is unmapped, it cannot be returned our pool */
+ /* When page is unmapped, it cannot be returned to our pool */
dma_unmap_page_attrs(pool->p.dev, dma,
PAGE_SIZE << pool->p.order, pool->p.dma_dir,
DMA_ATTR_SKIP_CPU_SYNC);
- page->dma_addr = 0;
+ page_pool_set_dma_addr(page, 0);
skip_dma_unmap:
/* This may be the last page returned, releasing the pool, so
* it is not safe to reference pool afterwards.
--
2.30.2
From: Bean Huo <beanhuo(a)micron.com>
Currently, we have two ways to issue multiple-block read/write the
command to the eMMC. One is by normal IO request path fs->block->mmc.
Another one is that we can issue multiple-block read/write through
MMC ioctl interface. For the first path, mrq->stop, and mrq->stop->opcode
will be initialized in mmc_blk_data_prep(). However, for the second IO
path, mrq->stop is not initialized since it is a pre-defined multiple
blocks read/write.
Meanwhile, if it is open-ended multiple block read/write command,
STOP_TRANSMISSION CMD12 will be issued later in mmc_blk_issue_drv_op(),
since it is MMC_IOC_MULTI_CMD.
So, delete these if-statement checkups, let these kinds of multiple-block
read/write request go.
Fixes 'ba3869ff32e4 ("mmc: cavium: Add core MMC driver for Cavium SOCs")'
Cc: stable(a)vger.kernel.org
Signed-off-by: Bean Huo <beanhuo(a)micron.com>
---
drivers/mmc/host/cavium.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/mmc/host/cavium.c b/drivers/mmc/host/cavium.c
index 95a41983c6c0..8fb7cbcf62ad 100644
--- a/drivers/mmc/host/cavium.c
+++ b/drivers/mmc/host/cavium.c
@@ -654,8 +654,7 @@ static void cvm_mmc_dma_request(struct mmc_host *mmc,
struct mmc_data *data;
u64 emm_dma, addr;
- if (!mrq->data || !mrq->data->sg || !mrq->data->sg_len ||
- !mrq->stop || mrq->stop->opcode != MMC_STOP_TRANSMISSION) {
+ if (!mrq->data || !mrq->data->sg || !mrq->data->sg_len) {
dev_err(&mmc->card->dev, "Error: %s no data\n", __func__);
goto error;
}
--
2.25.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: omap: improve extcon initialization
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e17b02d4970913233d543c79c9c66e72cac05bdd Mon Sep 17 00:00:00 2001
From: Marcel Hamer <marcel(a)solidxs.se>
Date: Tue, 27 Apr 2021 14:21:18 +0200
Subject: usb: dwc3: omap: improve extcon initialization
When extcon is used in combination with dwc3, it is assumed that the dwc3
registers are untouched and as such are only configured if VBUS is valid
or ID is tied to ground.
In case VBUS is not valid or ID is floating, the registers are not
configured as such during driver initialization, causing a wrong
default state during boot.
If the registers are not in a default state, because they are for
instance touched by a boot loader, this can cause for a kernel error.
Signed-off-by: Marcel Hamer <marcel(a)solidxs.se>
Link: https://lore.kernel.org/r/20210427122118.1948340-1-marcel@solidxs.se
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/dwc3-omap.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c
index 3db17806e92e..e196673f5c64 100644
--- a/drivers/usb/dwc3/dwc3-omap.c
+++ b/drivers/usb/dwc3/dwc3-omap.c
@@ -437,8 +437,13 @@ static int dwc3_omap_extcon_register(struct dwc3_omap *omap)
if (extcon_get_state(edev, EXTCON_USB) == true)
dwc3_omap_set_mailbox(omap, OMAP_DWC3_VBUS_VALID);
+ else
+ dwc3_omap_set_mailbox(omap, OMAP_DWC3_VBUS_OFF);
+
if (extcon_get_state(edev, EXTCON_USB_HOST) == true)
dwc3_omap_set_mailbox(omap, OMAP_DWC3_ID_GROUND);
+ else
+ dwc3_omap_set_mailbox(omap, OMAP_DWC3_ID_FLOAT);
omap->edev = edev;
}
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: typec: ucsi: Put fwnode in any case during ->probe()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From b9a0866a5bdf6a4643a52872ada6be6184c6f4f2 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Date: Wed, 5 May 2021 01:23:37 +0300
Subject: usb: typec: ucsi: Put fwnode in any case during ->probe()
device_for_each_child_node() bumps a reference counting of a returned variable.
We have to balance it whenever we return to the caller.
Fixes: c1b0bc2dabfa ("usb: typec: Add support for UCSI interface")
Cc: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Link: https://lore.kernel.org/r/20210504222337.3151726-1-andy.shevchenko@gmail.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/ucsi/ucsi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index 282c3c825c13..0e1cec346e0f 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -999,6 +999,7 @@ static const struct typec_operations ucsi_ops = {
.pr_set = ucsi_pr_swap
};
+/* Caller must call fwnode_handle_put() after use */
static struct fwnode_handle *ucsi_find_fwnode(struct ucsi_connector *con)
{
struct fwnode_handle *fwnode;
@@ -1033,7 +1034,7 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
command |= UCSI_CONNECTOR_NUMBER(con->num);
ret = ucsi_send_command(ucsi, command, &con->cap, sizeof(con->cap));
if (ret < 0)
- goto out;
+ goto out_unlock;
if (con->cap.op_mode & UCSI_CONCAP_OPMODE_DRP)
cap->data = TYPEC_PORT_DRD;
@@ -1151,6 +1152,8 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
trace_ucsi_register_port(con->num, &con->status);
out:
+ fwnode_handle_put(cap->fwnode);
+out_unlock:
mutex_unlock(&con->lock);
return ret;
}
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Fix wrong handling in GET_SINK_CAP
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 2e2b8d15adc2f6ab2d4aa0550e241b9742a436a0 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Tue, 4 May 2021 01:18:49 +0800
Subject: usb: typec: tcpm: Fix wrong handling in GET_SINK_CAP
After receiving Sink Capabilities Message in GET_SINK_CAP AMS, it is
incorrect to call tcpm_pd_handle_state because the Message is expected
and the current state is not Ready states. The result of this incorrect
operation ends in Soft Reset which is definitely wrong. Simply
forwarding to Ready States is enough to finish the AMS.
Fixes: 8dea75e11380 ("usb: typec: tcpm: Protocol Error handling")
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210503171849.2605302-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index c4fdc00a3bc8..68e04e397e92 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -2390,7 +2390,7 @@ static void tcpm_pd_data_request(struct tcpm_port *port,
port->nr_sink_caps = cnt;
port->sink_cap_done = true;
if (port->ams == GET_SINK_CAPABILITIES)
- tcpm_pd_handle_state(port, ready_state(port), NONE_AMS, 0);
+ tcpm_set_state(port, ready_state(port), 0);
/* Unexpected Sink Capabilities */
else
tcpm_pd_handle_msg(port,
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: imx8mp: fix error return code in dwc3_imx8mp_probe()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0b2b149e918f6dddb4ea53615551bf7bc131f875 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Sat, 8 May 2021 09:53:10 +0800
Subject: usb: dwc3: imx8mp: fix error return code in dwc3_imx8mp_probe()
Fix to return a negative error code from the error handling case instead
of 0, as done elsewhere in this function.
Fixes: 6dd2565989b4 ("usb: dwc3: add imx8mp dwc3 glue layer driver")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210508015310.1627-1-thunder.leizhen@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/dwc3-imx8mp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/dwc3/dwc3-imx8mp.c b/drivers/usb/dwc3/dwc3-imx8mp.c
index e9fced6f7a7c..756faa46d33a 100644
--- a/drivers/usb/dwc3/dwc3-imx8mp.c
+++ b/drivers/usb/dwc3/dwc3-imx8mp.c
@@ -167,6 +167,7 @@ static int dwc3_imx8mp_probe(struct platform_device *pdev)
dwc3_np = of_get_compatible_child(node, "snps,dwc3");
if (!dwc3_np) {
+ err = -ENODEV;
dev_err(dev, "failed to find dwc3 core child\n");
goto disable_rpm;
}
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: dwc2: Fix gadget DMA unmap direction
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 75a41ce46bae6cbe7d3bb2584eb844291d642874 Mon Sep 17 00:00:00 2001
From: Phil Elwell <phil(a)raspberrypi.com>
Date: Thu, 6 May 2021 12:22:00 +0100
Subject: usb: dwc2: Fix gadget DMA unmap direction
The dwc2 gadget support maps and unmaps DMA buffers as necessary. When
mapping and unmapping it uses the direction of the endpoint to select
the direction of the DMA transfer, but this fails for Control OUT
transfers because the unmap occurs after the endpoint direction has
been reversed for the status phase.
A possible solution would be to unmap the buffer before the direction
is changed, but a safer, less invasive fix is to remember the buffer
direction independently of the endpoint direction.
Fixes: fe0b94abcdf6 ("usb: dwc2: gadget: manage ep0 state in software")
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Phil Elwell <phil(a)raspberrypi.com>
Link: https://lore.kernel.org/r/20210506112200.2893922-1-phil@raspberrypi.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc2/core.h | 2 ++
drivers/usb/dwc2/gadget.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index da5ac4a4595b..ab6b815e0089 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -113,6 +113,7 @@ struct dwc2_hsotg_req;
* @debugfs: File entry for debugfs file for this endpoint.
* @dir_in: Set to true if this endpoint is of the IN direction, which
* means that it is sending data to the Host.
+ * @map_dir: Set to the value of dir_in when the DMA buffer is mapped.
* @index: The index for the endpoint registers.
* @mc: Multi Count - number of transactions per microframe
* @interval: Interval for periodic endpoints, in frames or microframes.
@@ -162,6 +163,7 @@ struct dwc2_hsotg_ep {
unsigned short fifo_index;
unsigned char dir_in;
+ unsigned char map_dir;
unsigned char index;
unsigned char mc;
u16 interval;
diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index e6bb1bdb2760..184964174dc0 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -422,7 +422,7 @@ static void dwc2_hsotg_unmap_dma(struct dwc2_hsotg *hsotg,
{
struct usb_request *req = &hs_req->req;
- usb_gadget_unmap_request(&hsotg->gadget, req, hs_ep->dir_in);
+ usb_gadget_unmap_request(&hsotg->gadget, req, hs_ep->map_dir);
}
/*
@@ -1242,6 +1242,7 @@ static int dwc2_hsotg_map_dma(struct dwc2_hsotg *hsotg,
{
int ret;
+ hs_ep->map_dir = hs_ep->dir_in;
ret = usb_gadget_map_request(&hsotg->gadget, req, hs_ep->dir_in);
if (ret)
goto dma_error;
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Enable suspend events
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d1d90dd27254c44d087ad3f8b5b3e4fff0571f45 Mon Sep 17 00:00:00 2001
From: Jack Pham <jackp(a)codeaurora.org>
Date: Wed, 28 Apr 2021 02:01:10 -0700
Subject: usb: dwc3: gadget: Enable suspend events
commit 72704f876f50 ("dwc3: gadget: Implement the suspend entry event
handler") introduced (nearly 5 years ago!) an interrupt handler for
U3/L1-L2 suspend events. The problem is that these events aren't
currently enabled in the DEVTEN register so the handler is never
even invoked. Fix this simply by enabling the corresponding bit
in dwc3_gadget_enable_irq() using the same revision check as found
in the handler.
Fixes: 72704f876f50 ("dwc3: gadget: Implement the suspend entry event handler")
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Jack Pham <jackp(a)codeaurora.org>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210428090111.3370-1-jackp@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dd80e5ca8c78..cab3a9184068 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2323,6 +2323,10 @@ static void dwc3_gadget_enable_irq(struct dwc3 *dwc)
if (DWC3_VER_IS_PRIOR(DWC3, 250A))
reg |= DWC3_DEVTEN_ULSTCNGEN;
+ /* On 2.30a and above this bit enables U3/L2-L1 Suspend Events */
+ if (!DWC3_VER_IS_PRIOR(DWC3, 230A))
+ reg |= DWC3_DEVTEN_EOPFEN;
+
dwc3_writel(dwc->regs, DWC3_DEVTEN, reg);
}
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 04357fafea9c7ed34525eb9680c760245c3bb958 Mon Sep 17 00:00:00 2001
From: Ferry Toth <ftoth(a)exalondelft.nl>
Date: Sun, 25 Apr 2021 17:09:47 +0200
Subject: usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
On Intel Merrifield LPM is causing host to reset port after a timeout.
By disabling LPM entirely this is prevented.
Fixes: 066c09593454 ("usb: dwc3: pci: Enable extcon driver for Intel Merrifield")
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Signed-off-by: Ferry Toth <ftoth(a)exalondelft.nl>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210425150947.5862-1-ftoth@exalondelft.nl
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/dwc3-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index e7b932dcbf82..1e51460938b8 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -123,6 +123,7 @@ static const struct property_entry dwc3_pci_mrfld_properties[] = {
PROPERTY_ENTRY_STRING("linux,extcon-name", "mrfld_bcove_pwrsrc"),
PROPERTY_ENTRY_BOOL("snps,dis_u3_susphy_quirk"),
PROPERTY_ENTRY_BOOL("snps,dis_u2_susphy_quirk"),
+ PROPERTY_ENTRY_BOOL("snps,usb2-gadget-lpm-disable"),
PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
{}
};
--
2.31.1
This is a note to let you know that I've just added the patch titled
cdc-wdm: untangle a circular dependency between callback and softint
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 18abf874367456540846319574864e6ff32752e2 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Mon, 26 Apr 2021 11:26:22 +0200
Subject: cdc-wdm: untangle a circular dependency between callback and softint
We have a cycle of callbacks scheduling works which submit
URBs with those callbacks. This needs to be blocked, stopped
and unblocked to untangle the circle.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Link: https://lore.kernel.org/r/20210426092622.20433-1-oneukum@suse.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-wdm.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 508b1c3f8b73..d1e4a7379beb 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -321,12 +321,23 @@ static void wdm_int_callback(struct urb *urb)
}
-static void kill_urbs(struct wdm_device *desc)
+static void poison_urbs(struct wdm_device *desc)
{
/* the order here is essential */
- usb_kill_urb(desc->command);
- usb_kill_urb(desc->validity);
- usb_kill_urb(desc->response);
+ usb_poison_urb(desc->command);
+ usb_poison_urb(desc->validity);
+ usb_poison_urb(desc->response);
+}
+
+static void unpoison_urbs(struct wdm_device *desc)
+{
+ /*
+ * the order here is not essential
+ * it is symmetrical just to be nice
+ */
+ usb_unpoison_urb(desc->response);
+ usb_unpoison_urb(desc->validity);
+ usb_unpoison_urb(desc->command);
}
static void free_urbs(struct wdm_device *desc)
@@ -741,11 +752,12 @@ static int wdm_release(struct inode *inode, struct file *file)
if (!desc->count) {
if (!test_bit(WDM_DISCONNECTING, &desc->flags)) {
dev_dbg(&desc->intf->dev, "wdm_release: cleanup\n");
- kill_urbs(desc);
+ poison_urbs(desc);
spin_lock_irq(&desc->iuspin);
desc->resp_count = 0;
spin_unlock_irq(&desc->iuspin);
desc->manage_power(desc->intf, 0);
+ unpoison_urbs(desc);
} else {
/* must avoid dev_printk here as desc->intf is invalid */
pr_debug(KBUILD_MODNAME " %s: device gone - cleaning up\n", __func__);
@@ -1037,9 +1049,9 @@ static void wdm_disconnect(struct usb_interface *intf)
wake_up_all(&desc->wait);
mutex_lock(&desc->rlock);
mutex_lock(&desc->wlock);
+ poison_urbs(desc);
cancel_work_sync(&desc->rxwork);
cancel_work_sync(&desc->service_outs_intr);
- kill_urbs(desc);
mutex_unlock(&desc->wlock);
mutex_unlock(&desc->rlock);
@@ -1080,9 +1092,10 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message)
set_bit(WDM_SUSPENDING, &desc->flags);
spin_unlock_irq(&desc->iuspin);
/* callback submits work - order is essential */
- kill_urbs(desc);
+ poison_urbs(desc);
cancel_work_sync(&desc->rxwork);
cancel_work_sync(&desc->service_outs_intr);
+ unpoison_urbs(desc);
}
if (!PMSG_IS_AUTO(message)) {
mutex_unlock(&desc->wlock);
@@ -1140,7 +1153,7 @@ static int wdm_pre_reset(struct usb_interface *intf)
wake_up_all(&desc->wait);
mutex_lock(&desc->rlock);
mutex_lock(&desc->wlock);
- kill_urbs(desc);
+ poison_urbs(desc);
cancel_work_sync(&desc->rxwork);
cancel_work_sync(&desc->service_outs_intr);
return 0;
@@ -1151,6 +1164,7 @@ static int wdm_post_reset(struct usb_interface *intf)
struct wdm_device *desc = wdm_find_device(intf);
int rv;
+ unpoison_urbs(desc);
clear_bit(WDM_OVERFLOW, &desc->flags);
clear_bit(WDM_RESETTING, &desc->flags);
rv = recover_from_urb_loss(desc);
--
2.31.1
Hi,
Can you please cherry-pick 4fff19ae4275 ("drm/qxl: use ttm bo
priorities") from 5.13-rc1 into stable branches? It is a clean
cherry-pick for 5.12 and 5.11. Looking into 5.10-lts right now,
will eventually follow up with a backported patch.
thanks,
Gerd
If an error is received when issuing a start or update transfer
command, the error handler will stop all active requests (including
the current USB request), and call dwc3_gadget_giveback() to notify
function drivers of the requests which have been stopped. Avoid
returning an error for kick transfer during EP queue, to remove
duplicate cleanup operations on the request being queued.
Fixes: 8d99087c2db8 ("usb: dwc3: gadget: Properly handle failed kick_transfer")
cc: stable(a)vger.kernel.org
Signed-off-by: Wesley Cheng <wcheng(a)codeaurora.org>
---
Changes in v2:
- Added Fixes tag and Cc'ed stable
Changes in v1:
- Renamed commit title due to new implementation
- Return success always for kick transfer during ep queue
Previous patchset:
https://lore.kernel.org/linux-usb/875yzxibur.fsf@kernel.org/T/#t
drivers/usb/dwc3/gadget.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dd80e5c..a5b7fd9 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1684,7 +1684,9 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req)
}
}
- return __dwc3_gadget_kick_transfer(dep);
+ __dwc3_gadget_kick_transfer(dep);
+
+ return 0;
}
static int dwc3_gadget_ep_queue(struct usb_ep *ep, struct usb_request *request,
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
do_mq_timedreceive calls wq_sleep with a stack local address. The
sender (do_mq_timedsend) uses this address to later call
pipelined_send.
This leads to a very hard to trigger race where a do_mq_timedreceive call
might return and leave do_mq_timedsend to rely on an invalid address,
causing the following crash:
[ 240.739977] RIP: 0010:wake_q_add_safe+0x13/0x60
[ 240.739991] Call Trace:
[ 240.739999] __x64_sys_mq_timedsend+0x2a9/0x490
[ 240.740003] ? auditd_test_task+0x38/0x40
[ 240.740007] ? auditd_test_task+0x38/0x40
[ 240.740011] do_syscall_64+0x80/0x680
[ 240.740017] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 240.740019] RIP: 0033:0x7f5928e40343
The race occurs as:
1. do_mq_timedreceive calls wq_sleep with the address of
`struct ext_wait_queue` on function stack (aliased as `ewq_addr` here)
- it holds a valid `struct ext_wait_queue *` as long as the stack has
not been overwritten.
2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and
do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call
__pipelined_op.
3. Sender calls __pipelined_op::smp_store_release(&this->state, STATE_READY).
Here is where the race window begins. (`this` is `ewq_addr`.)
4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it
will see `state == STATE_READY` and break.
5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed
to be a `struct ext_wait_queue *` since it was on do_mq_timedreceive's
stack. (Although the address may not get overwritten until another
function happens to touch it, which means it can persist around for an
indefinite time.)
6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a
`struct ext_wait_queue *`, and uses it to find a task_struct to pass
to the wake_q_add_safe call. In the lucky case where nothing has
overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct.
In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a
bogus address as the receiver's task_struct causing the crash.
do_mq_timedsend::__pipelined_op() should not dereference `this` after
setting STATE_READY, as the receiver counterpart is now free to return.
Change __pipelined_op to call wake_q_add before setting STATE_READY
which ensures that the receiver's task_struct can still be found via
`this`.
Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers")
Signed-off-by: Varad Gautam <varad.gautam(a)suse.com>
Reported-by: Matthias von Faber <matthias.vonfaber(a)aox-tech.de>
Acked-by: Davidlohr Bueso <dbueso(a)suse.de>
Cc: <stable(a)vger.kernel.org> # 5.6
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
---
v2: Call wake_q_add before smp_store_release, instead of using a
get_task_struct/wake_q_add_safe combination across
smp_store_release. (Davidlohr Bueso)
v3: Comment/commit message fixup.
ipc/mqueue.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 8031464ed4ae..bf5dce399854 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -78,11 +78,13 @@ struct posix_msg_tree_node {
* MQ_BARRIER:
* To achieve proper release/acquire memory barrier pairing, the state is set to
* STATE_READY with smp_store_release(), and it is read with READ_ONCE followed
- * by smp_acquire__after_ctrl_dep(). In addition, wake_q_add_safe() is used.
+ * by smp_acquire__after_ctrl_dep(). The state change to STATE_READY must be
+ * the last write operation, after which the blocked task can immediately
+ * return and exit.
*
* This prevents the following races:
*
- * 1) With the simple wake_q_add(), the task could be gone already before
+ * 1) With wake_q_add(), the task could be gone already before
* the increase of the reference happens
* Thread A
* Thread B
@@ -97,10 +99,25 @@ struct posix_msg_tree_node {
* sys_exit()
* get_task_struct() // UaF
*
- * Solution: Use wake_q_add_safe() and perform the get_task_struct() before
- * the smp_store_release() that does ->state = STATE_READY.
+ * 2) With wake_q_add(), the receiver task could have returned from the
+ * syscall and had its stack-allocated waiter overwritten before the
+ * waker could add it to the wake_q
+ * Thread A
+ * Thread B
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ * ->state = STATE_READY
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * sysret to user space
+ * overwrite receiver's stack
+ * wake_q_add(A)
+ * if (cmpxchg()) // corrupted waiter
*
- * 2) Without proper _release/_acquire barriers, the woken up task
+ * Solution: Queue the task for wakeup before the smp_store_release() that
+ * does ->state = STATE_READY.
+ *
+ * 3) Without proper _release/_acquire barriers, the woken up task
* could read stale data
*
* Thread A
@@ -116,7 +133,7 @@ struct posix_msg_tree_node {
*
* Solution: use _release and _acquire barriers.
*
- * 3) There is intentionally no barrier when setting current->state
+ * 4) There is intentionally no barrier when setting current->state
* to TASK_INTERRUPTIBLE: spin_unlock(&info->lock) provides the
* release memory barrier, and the wakeup is triggered when holding
* info->lock, i.e. spin_lock(&info->lock) provided a pairing
@@ -1005,11 +1022,9 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
struct ext_wait_queue *this)
{
list_del(&this->list);
- get_task_struct(this->task);
-
+ wake_q_add(wake_q, this->task);
/* see MQ_BARRIER for purpose/pairing */
smp_store_release(&this->state, STATE_READY);
- wake_q_add_safe(wake_q, this->task);
}
/* pipelined_send() - send a message directly to the task waiting in
--
2.30.2
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: i2c: ccs-core: fix pm_runtime_get_sync() usage count
Author: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
Date: Fri Apr 23 17:19:11 2021 +0200
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
There is a bug at ccs_pm_get_init(): when this function returns
an error, the stream is not started, and RPM usage_count
should not be incremented. However, if the calls to
v4l2_ctrl_handler_setup() return errors, it will be kept
incremented.
At ccs_suspend() the best is to replace it by the new
pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter automatically,
in the case of errors.
Fixes: 96e3a6b92f23 ("media: smiapp: Avoid maintaining power state information")
Cc: stable(a)vger.kernel.org
Acked-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/i2c/ccs/ccs-core.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
---
diff --git a/drivers/media/i2c/ccs/ccs-core.c b/drivers/media/i2c/ccs/ccs-core.c
index b05f409014b2..4a848ac2d2cd 100644
--- a/drivers/media/i2c/ccs/ccs-core.c
+++ b/drivers/media/i2c/ccs/ccs-core.c
@@ -1880,21 +1880,33 @@ static int ccs_pm_get_init(struct ccs_sensor *sensor)
struct i2c_client *client = v4l2_get_subdevdata(&sensor->src->sd);
int rval;
+ /*
+ * It can't use pm_runtime_resume_and_get() here, as the driver
+ * relies at the returned value to detect if the device was already
+ * active or not.
+ */
rval = pm_runtime_get_sync(&client->dev);
- if (rval < 0) {
- pm_runtime_put_noidle(&client->dev);
+ if (rval < 0)
+ goto error;
- return rval;
- } else if (!rval) {
- rval = v4l2_ctrl_handler_setup(&sensor->pixel_array->
- ctrl_handler);
- if (rval)
- return rval;
+ /* Device was already active, so don't set controls */
+ if (rval == 1)
+ return 0;
- return v4l2_ctrl_handler_setup(&sensor->src->ctrl_handler);
- }
+ /* Restore V4L2 controls to the previously suspended device */
+ rval = v4l2_ctrl_handler_setup(&sensor->pixel_array->ctrl_handler);
+ if (rval)
+ goto error;
+ rval = v4l2_ctrl_handler_setup(&sensor->src->ctrl_handler);
+ if (rval)
+ goto error;
+
+ /* Keep PM runtime usage_count incremented on success */
return 0;
+error:
+ pm_runtime_put(&client->dev);
+ return rval;
}
static int ccs_set_stream(struct v4l2_subdev *subdev, int enable)
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: i2c: ccs-core: fix pm_runtime_get_sync() usage count
Author: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
Date: Fri Apr 23 17:19:11 2021 +0200
The pm_runtime_get_sync() internally increments the
dev->power.usage_count without decrementing it, even on errors.
There is a bug at ccs_pm_get_init(): when this function returns
an error, the stream is not started, and RPM usage_count
should not be incremented. However, if the calls to
v4l2_ctrl_handler_setup() return errors, it will be kept
incremented.
At ccs_suspend() the best is to replace it by the new
pm_runtime_resume_and_get(), introduced by:
commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
in order to properly decrement the usage counter automatically,
in the case of errors.
Fixes: 96e3a6b92f23 ("media: smiapp: Avoid maintaining power state information")
Cc: stable(a)vger.kernel.org
Acked-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/i2c/ccs/ccs-core.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
---
diff --git a/drivers/media/i2c/ccs/ccs-core.c b/drivers/media/i2c/ccs/ccs-core.c
index b05f409014b2..4a848ac2d2cd 100644
--- a/drivers/media/i2c/ccs/ccs-core.c
+++ b/drivers/media/i2c/ccs/ccs-core.c
@@ -1880,21 +1880,33 @@ static int ccs_pm_get_init(struct ccs_sensor *sensor)
struct i2c_client *client = v4l2_get_subdevdata(&sensor->src->sd);
int rval;
+ /*
+ * It can't use pm_runtime_resume_and_get() here, as the driver
+ * relies at the returned value to detect if the device was already
+ * active or not.
+ */
rval = pm_runtime_get_sync(&client->dev);
- if (rval < 0) {
- pm_runtime_put_noidle(&client->dev);
+ if (rval < 0)
+ goto error;
- return rval;
- } else if (!rval) {
- rval = v4l2_ctrl_handler_setup(&sensor->pixel_array->
- ctrl_handler);
- if (rval)
- return rval;
+ /* Device was already active, so don't set controls */
+ if (rval == 1)
+ return 0;
- return v4l2_ctrl_handler_setup(&sensor->src->ctrl_handler);
- }
+ /* Restore V4L2 controls to the previously suspended device */
+ rval = v4l2_ctrl_handler_setup(&sensor->pixel_array->ctrl_handler);
+ if (rval)
+ goto error;
+ rval = v4l2_ctrl_handler_setup(&sensor->src->ctrl_handler);
+ if (rval)
+ goto error;
+
+ /* Keep PM runtime usage_count incremented on success */
return 0;
+error:
+ pm_runtime_put(&client->dev);
+ return rval;
}
static int ccs_set_stream(struct v4l2_subdev *subdev, int enable)
Commit d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert")
introduced an error in gntdev_mmap(): in case the call of
mmu_interval_notifier_insert_locked() fails the exit path should not
call mmu_interval_notifier_remove(), as this might result in NULL
dereferences.
One reason for failure is e.g. a signal pending for the running
process.
Fixes: d3eeb1d77c5d0af ("xen/gntdev: use mmu_interval_notifier_insert")
Cc: stable(a)vger.kernel.org
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross(a)suse.com>
---
drivers/xen/gntdev.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index f01d58c7a042..a3e7be96527d 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1017,8 +1017,10 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
err = mmu_interval_notifier_insert_locked(
&map->notifier, vma->vm_mm, vma->vm_start,
vma->vm_end - vma->vm_start, &gntdev_mmu_ops);
- if (err)
+ if (err) {
+ map->vma = NULL;
goto out_unlock_put;
+ }
}
mutex_unlock(&priv->lock);
--
2.26.2
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/cs553x_nand.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/cs553x_nand.c b/drivers/mtd/nand/raw/cs553x_nand.c
index 6edf78c16fc8..3e41d815d5b0 100644
--- a/drivers/mtd/nand/raw/cs553x_nand.c
+++ b/drivers/mtd/nand/raw/cs553x_nand.c
@@ -240,6 +240,15 @@ static int cs_calculate_ecc(struct nand_chip *this, const u_char *dat,
return 0;
}
+static int cs553x_ecc_correct(struct nand_chip *chip,
+ unsigned char *buf,
+ unsigned char *read_ecc,
+ unsigned char *calc_ecc)
+{
+ return ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
+}
+
static struct cs553x_nand_controller *controllers[4];
static int cs553x_attach_chip(struct nand_chip *chip)
@@ -251,7 +260,7 @@ static int cs553x_attach_chip(struct nand_chip *chip)
chip->ecc.bytes = 3;
chip->ecc.hwctl = cs_enable_hwecc;
chip->ecc.calculate = cs_calculate_ecc;
- chip->ecc.correct = rawnand_sw_hamming_correct;
+ chip->ecc.correct = cs553x_ecc_correct;
chip->ecc.strength = 1;
return 0;
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/fsmc_nand.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/fsmc_nand.c b/drivers/mtd/nand/raw/fsmc_nand.c
index a24e2f57fa68..99a4dde9825a 100644
--- a/drivers/mtd/nand/raw/fsmc_nand.c
+++ b/drivers/mtd/nand/raw/fsmc_nand.c
@@ -25,6 +25,7 @@
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#include <linux/mtd/rawnand.h>
#include <linux/platform_device.h>
#include <linux/of.h>
@@ -432,6 +433,15 @@ static int fsmc_read_hwecc_ecc1(struct nand_chip *chip, const u8 *data,
return 0;
}
+static int fsmc_correct_ecc1(struct nand_chip *chip,
+ unsigned char *buf,
+ unsigned char *read_ecc,
+ unsigned char *calc_ecc)
+{
+ return ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
+}
+
/* Count the number of 0's in buff upto a max of max_bits */
static int count_written_bits(u8 *buff, int size, int max_bits)
{
@@ -917,7 +927,7 @@ static int fsmc_nand_attach_chip(struct nand_chip *nand)
case NAND_ECC_ENGINE_TYPE_ON_HOST:
dev_info(host->dev, "Using 1-bit HW ECC scheme\n");
nand->ecc.calculate = fsmc_read_hwecc_ecc1;
- nand->ecc.correct = rawnand_sw_hamming_correct;
+ nand->ecc.correct = fsmc_correct_ecc1;
nand->ecc.hwctl = fsmc_enable_hwecc;
nand->ecc.bytes = 3;
nand->ecc.strength = 1;
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Cc: Vladimir Zapolskiy <vz(a)mleia.com>
Reported-by: Trevor Woerner <twoerner(a)gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Tested-by: Trevor Woerner <twoerner(a)gmail.com>
---
drivers/mtd/nand/raw/lpc32xx_slc.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 6b7269cfb7d8..d7dfc6fd85ca 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -27,6 +27,7 @@
#include <linux/of.h>
#include <linux/of_gpio.h>
#include <linux/mtd/lpc32xx_slc.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#define LPC32XX_MODNAME "lpc32xx-nand"
@@ -344,6 +345,18 @@ static int lpc32xx_nand_ecc_calculate(struct nand_chip *chip,
return 0;
}
+/*
+ * Corrects the data
+ */
+static int lpc32xx_nand_ecc_correct(struct nand_chip *chip,
+ unsigned char *buf,
+ unsigned char *read_ecc,
+ unsigned char *calc_ecc)
+{
+ return ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
+}
+
/*
* Read a single byte from NAND device
*/
@@ -802,7 +815,7 @@ static int lpc32xx_nand_attach_chip(struct nand_chip *chip)
chip->ecc.write_oob = lpc32xx_nand_write_oob_syndrome;
chip->ecc.read_oob = lpc32xx_nand_read_oob_syndrome;
chip->ecc.calculate = lpc32xx_nand_ecc_calculate;
- chip->ecc.correct = rawnand_sw_hamming_correct;
+ chip->ecc.correct = lpc32xx_nand_ecc_correct;
chip->ecc.hwctl = lpc32xx_nand_ecc_enable;
/*
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/ndfc.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/ndfc.c b/drivers/mtd/nand/raw/ndfc.c
index 338d6b1a189e..98d5a94c3a24 100644
--- a/drivers/mtd/nand/raw/ndfc.c
+++ b/drivers/mtd/nand/raw/ndfc.c
@@ -22,6 +22,7 @@
#include <linux/mtd/ndfc.h>
#include <linux/slab.h>
#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#include <linux/of_address.h>
#include <linux/of_platform.h>
#include <asm/io.h>
@@ -100,6 +101,15 @@ static int ndfc_calculate_ecc(struct nand_chip *chip,
return 0;
}
+static int ndfc_correct_ecc(struct nand_chip *chip,
+ unsigned char *buf,
+ unsigned char *read_ecc,
+ unsigned char *calc_ecc)
+{
+ return ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
+}
+
/*
* Speedups for buffer read/write/verify
*
@@ -145,7 +155,7 @@ static int ndfc_chip_init(struct ndfc_controller *ndfc,
chip->controller = &ndfc->ndfc_control;
chip->legacy.read_buf = ndfc_read_buf;
chip->legacy.write_buf = ndfc_write_buf;
- chip->ecc.correct = rawnand_sw_hamming_correct;
+ chip->ecc.correct = ndfc_correct_ecc;
chip->ecc.hwctl = ndfc_enable_hwecc;
chip->ecc.calculate = ndfc_calculate_ecc;
chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/sharpsl.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/sharpsl.c b/drivers/mtd/nand/raw/sharpsl.c
index 5612ee628425..2f1fe464e663 100644
--- a/drivers/mtd/nand/raw/sharpsl.c
+++ b/drivers/mtd/nand/raw/sharpsl.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#include <linux/mtd/rawnand.h>
#include <linux/mtd/partitions.h>
#include <linux/mtd/sharpsl.h>
@@ -96,6 +97,15 @@ static int sharpsl_nand_calculate_ecc(struct nand_chip *chip,
return readb(sharpsl->io + ECCCNTR) != 0;
}
+static int sharpsl_nand_correct_ecc(struct nand_chip *chip,
+ unsigned char *buf,
+ unsigned char *read_ecc,
+ unsigned char *calc_ecc)
+{
+ return ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
+}
+
static int sharpsl_attach_chip(struct nand_chip *chip)
{
if (chip->ecc.engine_type != NAND_ECC_ENGINE_TYPE_ON_HOST)
@@ -106,7 +116,7 @@ static int sharpsl_attach_chip(struct nand_chip *chip)
chip->ecc.strength = 1;
chip->ecc.hwctl = sharpsl_nand_enable_hwecc;
chip->ecc.calculate = sharpsl_nand_calculate_ecc;
- chip->ecc.correct = rawnand_sw_hamming_correct;
+ chip->ecc.correct = sharpsl_nand_correct_ecc;
return 0;
}
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/tmio_nand.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/tmio_nand.c b/drivers/mtd/nand/raw/tmio_nand.c
index de8e919d0ebe..6d93dd31969b 100644
--- a/drivers/mtd/nand/raw/tmio_nand.c
+++ b/drivers/mtd/nand/raw/tmio_nand.c
@@ -34,6 +34,7 @@
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#include <linux/mtd/rawnand.h>
#include <linux/mtd/partitions.h>
#include <linux/slab.h>
@@ -292,11 +293,12 @@ static int tmio_nand_correct_data(struct nand_chip *chip, unsigned char *buf,
int r0, r1;
/* assume ecc.size = 512 and ecc.bytes = 6 */
- r0 = rawnand_sw_hamming_correct(chip, buf, read_ecc, calc_ecc);
+ r0 = ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
if (r0 < 0)
return r0;
- r1 = rawnand_sw_hamming_correct(chip, buf + 256, read_ecc + 3,
- calc_ecc + 3);
+ r1 = ecc_sw_hamming_correct(buf + 256, read_ecc + 3, calc_ecc + 3,
+ chip->ecc.size, false);
if (r1 < 0)
return r1;
return r0 + r1;
--
2.27.0
Since the Hamming software ECC engine has been updated to become a
proper and independent ECC engine, it is now mandatory to either
initialize the engine before using any one of his functions or use one
of the bare helpers which only perform the calculations. As there is no
actual need for a proper ECC initialization, let's just use the bare
helper instead of the rawnand one.
Fixes: 90ccf0a0192f ("mtd: nand: ecc-hamming: Rename the exported functions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/txx9ndfmc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/txx9ndfmc.c b/drivers/mtd/nand/raw/txx9ndfmc.c
index 1a9449e53bf9..b8894ac27073 100644
--- a/drivers/mtd/nand/raw/txx9ndfmc.c
+++ b/drivers/mtd/nand/raw/txx9ndfmc.c
@@ -13,6 +13,7 @@
#include <linux/platform_device.h>
#include <linux/delay.h>
#include <linux/mtd/mtd.h>
+#include <linux/mtd/nand-ecc-sw-hamming.h>
#include <linux/mtd/rawnand.h>
#include <linux/mtd/partitions.h>
#include <linux/io.h>
@@ -193,8 +194,8 @@ static int txx9ndfmc_correct_data(struct nand_chip *chip, unsigned char *buf,
int stat;
for (eccsize = chip->ecc.size; eccsize > 0; eccsize -= 256) {
- stat = rawnand_sw_hamming_correct(chip, buf, read_ecc,
- calc_ecc);
+ stat = ecc_sw_hamming_correct(buf, read_ecc, calc_ecc,
+ chip->ecc.size, false);
if (stat < 0)
return stat;
corrected += stat;
--
2.27.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fef05776eb02238dcad8d5514e666a42572c3f32 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba(a)arm.com>
Date: Thu, 22 Apr 2021 16:36:22 +0100
Subject: [PATCH] thermal/core/fair share: Lock the thermal zone while looping
over instances
The tz->lock must be hold during the looping over the instances in that
thermal zone. This lock was missing in the governor code since the
beginning, so it's hard to point into a particular commit.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Lukasz Luba <lukasz.luba(a)arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c
index aaa07180ab48..645432ce6365 100644
--- a/drivers/thermal/gov_fair_share.c
+++ b/drivers/thermal/gov_fair_share.c
@@ -82,6 +82,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
int total_instance = 0;
int cur_trip_level = get_trip_level(tz);
+ mutex_lock(&tz->lock);
+
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
if (instance->trip != trip)
continue;
@@ -110,6 +112,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
mutex_unlock(&instance->cdev->lock);
thermal_cdev_update(cdev);
}
+
+ mutex_unlock(&tz->lock);
return 0;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fef05776eb02238dcad8d5514e666a42572c3f32 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba(a)arm.com>
Date: Thu, 22 Apr 2021 16:36:22 +0100
Subject: [PATCH] thermal/core/fair share: Lock the thermal zone while looping
over instances
The tz->lock must be hold during the looping over the instances in that
thermal zone. This lock was missing in the governor code since the
beginning, so it's hard to point into a particular commit.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Lukasz Luba <lukasz.luba(a)arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c
index aaa07180ab48..645432ce6365 100644
--- a/drivers/thermal/gov_fair_share.c
+++ b/drivers/thermal/gov_fair_share.c
@@ -82,6 +82,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
int total_instance = 0;
int cur_trip_level = get_trip_level(tz);
+ mutex_lock(&tz->lock);
+
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
if (instance->trip != trip)
continue;
@@ -110,6 +112,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
mutex_unlock(&instance->cdev->lock);
thermal_cdev_update(cdev);
}
+
+ mutex_unlock(&tz->lock);
return 0;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fef05776eb02238dcad8d5514e666a42572c3f32 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba(a)arm.com>
Date: Thu, 22 Apr 2021 16:36:22 +0100
Subject: [PATCH] thermal/core/fair share: Lock the thermal zone while looping
over instances
The tz->lock must be hold during the looping over the instances in that
thermal zone. This lock was missing in the governor code since the
beginning, so it's hard to point into a particular commit.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Lukasz Luba <lukasz.luba(a)arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c
index aaa07180ab48..645432ce6365 100644
--- a/drivers/thermal/gov_fair_share.c
+++ b/drivers/thermal/gov_fair_share.c
@@ -82,6 +82,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
int total_instance = 0;
int cur_trip_level = get_trip_level(tz);
+ mutex_lock(&tz->lock);
+
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
if (instance->trip != trip)
continue;
@@ -110,6 +112,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
mutex_unlock(&instance->cdev->lock);
thermal_cdev_update(cdev);
}
+
+ mutex_unlock(&tz->lock);
return 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fef05776eb02238dcad8d5514e666a42572c3f32 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba(a)arm.com>
Date: Thu, 22 Apr 2021 16:36:22 +0100
Subject: [PATCH] thermal/core/fair share: Lock the thermal zone while looping
over instances
The tz->lock must be hold during the looping over the instances in that
thermal zone. This lock was missing in the governor code since the
beginning, so it's hard to point into a particular commit.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Lukasz Luba <lukasz.luba(a)arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c
index aaa07180ab48..645432ce6365 100644
--- a/drivers/thermal/gov_fair_share.c
+++ b/drivers/thermal/gov_fair_share.c
@@ -82,6 +82,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
int total_instance = 0;
int cur_trip_level = get_trip_level(tz);
+ mutex_lock(&tz->lock);
+
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
if (instance->trip != trip)
continue;
@@ -110,6 +112,8 @@ static int fair_share_throttle(struct thermal_zone_device *tz, int trip)
mutex_unlock(&instance->cdev->lock);
thermal_cdev_update(cdev);
}
+
+ mutex_unlock(&tz->lock);
return 0;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34ab17cc6c2c1ac93d7e5d53bb972df9a968f085 Mon Sep 17 00:00:00 2001
From: brian-sy yang <brian-sy.yang(a)mediatek.com>
Date: Tue, 29 Dec 2020 13:08:31 +0800
Subject: [PATCH] thermal/drivers/cpufreq_cooling: Fix slab OOB issue
Slab OOB issue is scanned by KASAN in cpu_power_to_freq().
If power is limited below the power of OPP0 in EM table,
it will cause slab out-of-bound issue with negative array
index.
Return the lowest frequency if limited power cannot found
a suitable OPP in EM table to fix this issue.
Backtrace:
[<ffffffd02d2a37f0>] die+0x104/0x5ac
[<ffffffd02d2a5630>] bug_handler+0x64/0xd0
[<ffffffd02d288ce4>] brk_handler+0x160/0x258
[<ffffffd02d281e5c>] do_debug_exception+0x248/0x3f0
[<ffffffd02d284488>] el1_dbg+0x14/0xbc
[<ffffffd02d75d1d4>] __kasan_report+0x1dc/0x1e0
[<ffffffd02d75c2e0>] kasan_report+0x10/0x20
[<ffffffd02d75def8>] __asan_report_load8_noabort+0x18/0x28
[<ffffffd02e6fce5c>] cpufreq_power2state+0x180/0x43c
[<ffffffd02e6ead80>] power_actor_set_power+0x114/0x1d4
[<ffffffd02e6fac24>] allocate_power+0xaec/0xde0
[<ffffffd02e6f9f80>] power_allocator_throttle+0x3ec/0x5a4
[<ffffffd02e6ea888>] handle_thermal_trip+0x160/0x294
[<ffffffd02e6edd08>] thermal_zone_device_check+0xe4/0x154
[<ffffffd02d351cb4>] process_one_work+0x5e4/0xe28
[<ffffffd02d352f44>] worker_thread+0xa4c/0xfac
[<ffffffd02d360124>] kthread+0x33c/0x358
[<ffffffd02d289940>] ret_from_fork+0xc/0x18
Fixes: 371a3bc79c11b ("thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power")
Signed-off-by: brian-sy yang <brian-sy.yang(a)mediatek.com>
Signed-off-by: Michael Kao <michael.kao(a)mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba(a)arm.com>
Cc: stable(a)vger.kernel.org #v5.7
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20201229050831.19493-1-michael.kao@mediatek.com
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index f3d308427665..eeb4e4b76c0b 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -116,7 +116,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
{
int i;
- for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+ for (i = cpufreq_cdev->max_level; i > 0; i--) {
if (power >= cpufreq_cdev->em->table[i].power)
break;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34ab17cc6c2c1ac93d7e5d53bb972df9a968f085 Mon Sep 17 00:00:00 2001
From: brian-sy yang <brian-sy.yang(a)mediatek.com>
Date: Tue, 29 Dec 2020 13:08:31 +0800
Subject: [PATCH] thermal/drivers/cpufreq_cooling: Fix slab OOB issue
Slab OOB issue is scanned by KASAN in cpu_power_to_freq().
If power is limited below the power of OPP0 in EM table,
it will cause slab out-of-bound issue with negative array
index.
Return the lowest frequency if limited power cannot found
a suitable OPP in EM table to fix this issue.
Backtrace:
[<ffffffd02d2a37f0>] die+0x104/0x5ac
[<ffffffd02d2a5630>] bug_handler+0x64/0xd0
[<ffffffd02d288ce4>] brk_handler+0x160/0x258
[<ffffffd02d281e5c>] do_debug_exception+0x248/0x3f0
[<ffffffd02d284488>] el1_dbg+0x14/0xbc
[<ffffffd02d75d1d4>] __kasan_report+0x1dc/0x1e0
[<ffffffd02d75c2e0>] kasan_report+0x10/0x20
[<ffffffd02d75def8>] __asan_report_load8_noabort+0x18/0x28
[<ffffffd02e6fce5c>] cpufreq_power2state+0x180/0x43c
[<ffffffd02e6ead80>] power_actor_set_power+0x114/0x1d4
[<ffffffd02e6fac24>] allocate_power+0xaec/0xde0
[<ffffffd02e6f9f80>] power_allocator_throttle+0x3ec/0x5a4
[<ffffffd02e6ea888>] handle_thermal_trip+0x160/0x294
[<ffffffd02e6edd08>] thermal_zone_device_check+0xe4/0x154
[<ffffffd02d351cb4>] process_one_work+0x5e4/0xe28
[<ffffffd02d352f44>] worker_thread+0xa4c/0xfac
[<ffffffd02d360124>] kthread+0x33c/0x358
[<ffffffd02d289940>] ret_from_fork+0xc/0x18
Fixes: 371a3bc79c11b ("thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power")
Signed-off-by: brian-sy yang <brian-sy.yang(a)mediatek.com>
Signed-off-by: Michael Kao <michael.kao(a)mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba(a)arm.com>
Cc: stable(a)vger.kernel.org #v5.7
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20201229050831.19493-1-michael.kao@mediatek.com
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index f3d308427665..eeb4e4b76c0b 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -116,7 +116,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
{
int i;
- for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+ for (i = cpufreq_cdev->max_level; i > 0; i--) {
if (power >= cpufreq_cdev->em->table[i].power)
break;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34ab17cc6c2c1ac93d7e5d53bb972df9a968f085 Mon Sep 17 00:00:00 2001
From: brian-sy yang <brian-sy.yang(a)mediatek.com>
Date: Tue, 29 Dec 2020 13:08:31 +0800
Subject: [PATCH] thermal/drivers/cpufreq_cooling: Fix slab OOB issue
Slab OOB issue is scanned by KASAN in cpu_power_to_freq().
If power is limited below the power of OPP0 in EM table,
it will cause slab out-of-bound issue with negative array
index.
Return the lowest frequency if limited power cannot found
a suitable OPP in EM table to fix this issue.
Backtrace:
[<ffffffd02d2a37f0>] die+0x104/0x5ac
[<ffffffd02d2a5630>] bug_handler+0x64/0xd0
[<ffffffd02d288ce4>] brk_handler+0x160/0x258
[<ffffffd02d281e5c>] do_debug_exception+0x248/0x3f0
[<ffffffd02d284488>] el1_dbg+0x14/0xbc
[<ffffffd02d75d1d4>] __kasan_report+0x1dc/0x1e0
[<ffffffd02d75c2e0>] kasan_report+0x10/0x20
[<ffffffd02d75def8>] __asan_report_load8_noabort+0x18/0x28
[<ffffffd02e6fce5c>] cpufreq_power2state+0x180/0x43c
[<ffffffd02e6ead80>] power_actor_set_power+0x114/0x1d4
[<ffffffd02e6fac24>] allocate_power+0xaec/0xde0
[<ffffffd02e6f9f80>] power_allocator_throttle+0x3ec/0x5a4
[<ffffffd02e6ea888>] handle_thermal_trip+0x160/0x294
[<ffffffd02e6edd08>] thermal_zone_device_check+0xe4/0x154
[<ffffffd02d351cb4>] process_one_work+0x5e4/0xe28
[<ffffffd02d352f44>] worker_thread+0xa4c/0xfac
[<ffffffd02d360124>] kthread+0x33c/0x358
[<ffffffd02d289940>] ret_from_fork+0xc/0x18
Fixes: 371a3bc79c11b ("thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power")
Signed-off-by: brian-sy yang <brian-sy.yang(a)mediatek.com>
Signed-off-by: Michael Kao <michael.kao(a)mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba(a)arm.com>
Cc: stable(a)vger.kernel.org #v5.7
Signed-off-by: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Link: https://lore.kernel.org/r/20201229050831.19493-1-michael.kao@mediatek.com
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index f3d308427665..eeb4e4b76c0b 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -116,7 +116,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
{
int i;
- for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+ for (i = cpufreq_cdev->max_level; i > 0; i--) {
if (power >= cpufreq_cdev->em->table[i].power)
break;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 65afd97630a9d6dd9ea83ff182dfdb15bc58c5d1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie=29?=
<zhouyanjie(a)wanyeetech.com>
Date: Sun, 18 Apr 2021 22:44:22 +0800
Subject: [PATCH] pinctrl: Ingenic: Add missing pins to the JZ4770 MAC MII
group.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The MII group of JZ4770's MAC should have 7 pins, add missing
pins to the MII group.
Fixes: 5de1a73e78ed ("Pinctrl: Ingenic: Add missing parts for JZ4770 and JZ4780.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie(a)wanyeetech.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Paul Cercueil <paul(a)crapouillou.net>
Link: https://lore.kernel.org/r/1618757073-1724-2-git-send-email-zhouyanjie@wanye…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-ingenic.c b/drivers/pinctrl/pinctrl-ingenic.c
index c8ecd014cf19..f15ef814b75a 100644
--- a/drivers/pinctrl/pinctrl-ingenic.c
+++ b/drivers/pinctrl/pinctrl-ingenic.c
@@ -667,7 +667,9 @@ static int jz4770_pwm_pwm7_pins[] = { 0x6b, };
static int jz4770_mac_rmii_pins[] = {
0xa9, 0xab, 0xaa, 0xac, 0xa5, 0xa4, 0xad, 0xae, 0xa6, 0xa8,
};
-static int jz4770_mac_mii_pins[] = { 0xa7, 0xaf, };
+static int jz4770_mac_mii_pins[] = {
+ 0x7b, 0x7a, 0x7d, 0x7c, 0xa7, 0x24, 0xaf,
+};
static const struct group_desc jz4770_groups[] = {
INGENIC_PIN_GROUP("uart0-data", jz4770_uart0_data, 0),
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 65afd97630a9d6dd9ea83ff182dfdb15bc58c5d1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie=29?=
<zhouyanjie(a)wanyeetech.com>
Date: Sun, 18 Apr 2021 22:44:22 +0800
Subject: [PATCH] pinctrl: Ingenic: Add missing pins to the JZ4770 MAC MII
group.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The MII group of JZ4770's MAC should have 7 pins, add missing
pins to the MII group.
Fixes: 5de1a73e78ed ("Pinctrl: Ingenic: Add missing parts for JZ4770 and JZ4780.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie(a)wanyeetech.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Paul Cercueil <paul(a)crapouillou.net>
Link: https://lore.kernel.org/r/1618757073-1724-2-git-send-email-zhouyanjie@wanye…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-ingenic.c b/drivers/pinctrl/pinctrl-ingenic.c
index c8ecd014cf19..f15ef814b75a 100644
--- a/drivers/pinctrl/pinctrl-ingenic.c
+++ b/drivers/pinctrl/pinctrl-ingenic.c
@@ -667,7 +667,9 @@ static int jz4770_pwm_pwm7_pins[] = { 0x6b, };
static int jz4770_mac_rmii_pins[] = {
0xa9, 0xab, 0xaa, 0xac, 0xa5, 0xa4, 0xad, 0xae, 0xa6, 0xa8,
};
-static int jz4770_mac_mii_pins[] = { 0xa7, 0xaf, };
+static int jz4770_mac_mii_pins[] = {
+ 0x7b, 0x7a, 0x7d, 0x7c, 0xa7, 0x24, 0xaf,
+};
static const struct group_desc jz4770_groups[] = {
INGENIC_PIN_GROUP("uart0-data", jz4770_uart0_data, 0),
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 65afd97630a9d6dd9ea83ff182dfdb15bc58c5d1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=91=A8=E7=90=B0=E6=9D=B0=20=28Zhou=20Yanjie=29?=
<zhouyanjie(a)wanyeetech.com>
Date: Sun, 18 Apr 2021 22:44:22 +0800
Subject: [PATCH] pinctrl: Ingenic: Add missing pins to the JZ4770 MAC MII
group.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The MII group of JZ4770's MAC should have 7 pins, add missing
pins to the MII group.
Fixes: 5de1a73e78ed ("Pinctrl: Ingenic: Add missing parts for JZ4770 and JZ4780.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: 周琰杰 (Zhou Yanjie) <zhouyanjie(a)wanyeetech.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Paul Cercueil <paul(a)crapouillou.net>
Link: https://lore.kernel.org/r/1618757073-1724-2-git-send-email-zhouyanjie@wanye…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
diff --git a/drivers/pinctrl/pinctrl-ingenic.c b/drivers/pinctrl/pinctrl-ingenic.c
index c8ecd014cf19..f15ef814b75a 100644
--- a/drivers/pinctrl/pinctrl-ingenic.c
+++ b/drivers/pinctrl/pinctrl-ingenic.c
@@ -667,7 +667,9 @@ static int jz4770_pwm_pwm7_pins[] = { 0x6b, };
static int jz4770_mac_rmii_pins[] = {
0xa9, 0xab, 0xaa, 0xac, 0xa5, 0xa4, 0xad, 0xae, 0xa6, 0xa8,
};
-static int jz4770_mac_mii_pins[] = { 0xa7, 0xaf, };
+static int jz4770_mac_mii_pins[] = {
+ 0x7b, 0x7a, 0x7d, 0x7c, 0xa7, 0x24, 0xaf,
+};
static const struct group_desc jz4770_groups[] = {
INGENIC_PIN_GROUP("uart0-data", jz4770_uart0_data, 0),
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 17e9e134a8efabbbf689a0904eee92bb5a868172 Mon Sep 17 00:00:00 2001
From: Tian Tao <tiantao6(a)hisilicon.com>
Date: Wed, 14 Apr 2021 09:43:44 +0800
Subject: [PATCH] dm integrity: fix missing goto in bitmap_flush_interval error
handling
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tian Tao <tiantao6(a)hisilicon.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index fed8a7ccd7f9..6977422454a4 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -4049,6 +4049,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
r = -EINVAL;
ti->error = "Invalid bitmap_flush_interval argument";
+ goto bad;
}
ic->bitmap_flush_interval = msecs_to_jiffies(val);
} else if (!strncmp(opt_string, "internal_hash:", strlen("internal_hash:"))) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From aafe104aa9096827a429bc1358f8260ee565b7cc Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 30 Apr 2021 12:17:58 -0400
Subject: [PATCH] tracing: Restructure trace_clock_global() to never block
It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:
Call Trace:
trace_clock_global+0x91/0xa0
__rb_reserve_next+0x237/0x460
ring_buffer_lock_reserve+0x12a/0x3f0
trace_buffer_lock_reserve+0x10/0x50
__trace_graph_return+0x1f/0x80
trace_graph_return+0xb7/0xf0
? trace_clock_global+0x91/0xa0
ftrace_return_to_handler+0x8b/0xf0
? pv_hash+0xa0/0xa0
return_to_handler+0x15/0x30
? ftrace_graph_caller+0xa0/0xa0
? trace_clock_global+0x91/0xa0
? __rb_reserve_next+0x237/0x460
? ring_buffer_lock_reserve+0x12a/0x3f0
? trace_event_buffer_lock_reserve+0x3c/0x120
? trace_event_buffer_reserve+0x6b/0xc0
? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
? dpm_run_callback+0x3b/0xc0
? pm_ops_is_empty+0x50/0x50
? platform_get_irq_byname_optional+0x90/0x90
? trace_device_pm_callback_start+0x82/0xd0
? dpm_run_callback+0x49/0xc0
With the following RIP:
RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200
Since the fix to the recursion detection would allow a single recursion to
happen while tracing, this lead to the trace_clock_global() taking a spin
lock and then trying to take it again:
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* lock taken */
(something else gets traced by function graph tracer)
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* DEAD LOCK! */
Tracing should *never* block, as it can lead to strange lockups like the
above.
Restructure the trace_clock_global() code to instead of simply taking a
lock to update the recorded "prev_time" simply use it, as two events
happening on two different CPUs that calls this at the same time, really
doesn't matter which one goes first. Use a trylock to grab the lock for
updating the prev_time, and if it fails, simply try again the next time.
If it failed to be taken, that means something else is already updating
it.
Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home
Cc: stable(a)vger.kernel.org
Tested-by: Konstantin Kharlamov <hi-angel(a)yandex.ru>
Tested-by: Todd Brandt <todd.e.brandt(a)linux.intel.com>
Fixes: b02414c8f045 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
Fixes: 14131f2f98ac3 ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index aaf6793ededa..c1637f90c8a3 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -95,33 +95,49 @@ u64 notrace trace_clock_global(void)
{
unsigned long flags;
int this_cpu;
- u64 now;
+ u64 now, prev_time;
raw_local_irq_save(flags);
this_cpu = raw_smp_processor_id();
- now = sched_clock_cpu(this_cpu);
+
/*
- * If in an NMI context then dont risk lockups and return the
- * cpu_clock() time:
+ * The global clock "guarantees" that the events are ordered
+ * between CPUs. But if two events on two different CPUS call
+ * trace_clock_global at roughly the same time, it really does
+ * not matter which one gets the earlier time. Just make sure
+ * that the same CPU will always show a monotonic clock.
+ *
+ * Use a read memory barrier to get the latest written
+ * time that was recorded.
*/
- if (unlikely(in_nmi()))
- goto out;
+ smp_rmb();
+ prev_time = READ_ONCE(trace_clock_struct.prev_time);
+ now = sched_clock_cpu(this_cpu);
- arch_spin_lock(&trace_clock_struct.lock);
+ /* Make sure that now is always greater than prev_time */
+ if ((s64)(now - prev_time) < 0)
+ now = prev_time + 1;
/*
- * TODO: if this happens often then maybe we should reset
- * my_scd->clock to prev_time+1, to make sure
- * we start ticking with the local clock from now on?
+ * If in an NMI context then dont risk lockups and simply return
+ * the current time.
*/
- if ((s64)(now - trace_clock_struct.prev_time) < 0)
- now = trace_clock_struct.prev_time + 1;
+ if (unlikely(in_nmi()))
+ goto out;
- trace_clock_struct.prev_time = now;
+ /* Tracing can cause strange recursion, always use a try lock */
+ if (arch_spin_trylock(&trace_clock_struct.lock)) {
+ /* Reread prev_time in case it was already updated */
+ prev_time = READ_ONCE(trace_clock_struct.prev_time);
+ if ((s64)(now - prev_time) < 0)
+ now = prev_time + 1;
- arch_spin_unlock(&trace_clock_struct.lock);
+ trace_clock_struct.prev_time = now;
+ /* The unlock acts as the wmb for the above rmb */
+ arch_spin_unlock(&trace_clock_struct.lock);
+ }
out:
raw_local_irq_restore(flags);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From aafe104aa9096827a429bc1358f8260ee565b7cc Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Fri, 30 Apr 2021 12:17:58 -0400
Subject: [PATCH] tracing: Restructure trace_clock_global() to never block
It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:
Call Trace:
trace_clock_global+0x91/0xa0
__rb_reserve_next+0x237/0x460
ring_buffer_lock_reserve+0x12a/0x3f0
trace_buffer_lock_reserve+0x10/0x50
__trace_graph_return+0x1f/0x80
trace_graph_return+0xb7/0xf0
? trace_clock_global+0x91/0xa0
ftrace_return_to_handler+0x8b/0xf0
? pv_hash+0xa0/0xa0
return_to_handler+0x15/0x30
? ftrace_graph_caller+0xa0/0xa0
? trace_clock_global+0x91/0xa0
? __rb_reserve_next+0x237/0x460
? ring_buffer_lock_reserve+0x12a/0x3f0
? trace_event_buffer_lock_reserve+0x3c/0x120
? trace_event_buffer_reserve+0x6b/0xc0
? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
? dpm_run_callback+0x3b/0xc0
? pm_ops_is_empty+0x50/0x50
? platform_get_irq_byname_optional+0x90/0x90
? trace_device_pm_callback_start+0x82/0xd0
? dpm_run_callback+0x49/0xc0
With the following RIP:
RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200
Since the fix to the recursion detection would allow a single recursion to
happen while tracing, this lead to the trace_clock_global() taking a spin
lock and then trying to take it again:
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* lock taken */
(something else gets traced by function graph tracer)
ring_buffer_lock_reserve() {
trace_clock_global() {
arch_spin_lock() {
queued_spin_lock_slowpath() {
/* DEAD LOCK! */
Tracing should *never* block, as it can lead to strange lockups like the
above.
Restructure the trace_clock_global() code to instead of simply taking a
lock to update the recorded "prev_time" simply use it, as two events
happening on two different CPUs that calls this at the same time, really
doesn't matter which one goes first. Use a trylock to grab the lock for
updating the prev_time, and if it fails, simply try again the next time.
If it failed to be taken, that means something else is already updating
it.
Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home
Cc: stable(a)vger.kernel.org
Tested-by: Konstantin Kharlamov <hi-angel(a)yandex.ru>
Tested-by: Todd Brandt <todd.e.brandt(a)linux.intel.com>
Fixes: b02414c8f045 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
Fixes: 14131f2f98ac3 ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index aaf6793ededa..c1637f90c8a3 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -95,33 +95,49 @@ u64 notrace trace_clock_global(void)
{
unsigned long flags;
int this_cpu;
- u64 now;
+ u64 now, prev_time;
raw_local_irq_save(flags);
this_cpu = raw_smp_processor_id();
- now = sched_clock_cpu(this_cpu);
+
/*
- * If in an NMI context then dont risk lockups and return the
- * cpu_clock() time:
+ * The global clock "guarantees" that the events are ordered
+ * between CPUs. But if two events on two different CPUS call
+ * trace_clock_global at roughly the same time, it really does
+ * not matter which one gets the earlier time. Just make sure
+ * that the same CPU will always show a monotonic clock.
+ *
+ * Use a read memory barrier to get the latest written
+ * time that was recorded.
*/
- if (unlikely(in_nmi()))
- goto out;
+ smp_rmb();
+ prev_time = READ_ONCE(trace_clock_struct.prev_time);
+ now = sched_clock_cpu(this_cpu);
- arch_spin_lock(&trace_clock_struct.lock);
+ /* Make sure that now is always greater than prev_time */
+ if ((s64)(now - prev_time) < 0)
+ now = prev_time + 1;
/*
- * TODO: if this happens often then maybe we should reset
- * my_scd->clock to prev_time+1, to make sure
- * we start ticking with the local clock from now on?
+ * If in an NMI context then dont risk lockups and simply return
+ * the current time.
*/
- if ((s64)(now - trace_clock_struct.prev_time) < 0)
- now = trace_clock_struct.prev_time + 1;
+ if (unlikely(in_nmi()))
+ goto out;
- trace_clock_struct.prev_time = now;
+ /* Tracing can cause strange recursion, always use a try lock */
+ if (arch_spin_trylock(&trace_clock_struct.lock)) {
+ /* Reread prev_time in case it was already updated */
+ prev_time = READ_ONCE(trace_clock_struct.prev_time);
+ if ((s64)(now - prev_time) < 0)
+ now = prev_time + 1;
- arch_spin_unlock(&trace_clock_struct.lock);
+ trace_clock_struct.prev_time = now;
+ /* The unlock acts as the wmb for the above rmb */
+ arch_spin_unlock(&trace_clock_struct.lock);
+ }
out:
raw_local_irq_restore(flags);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 785e3c0a3a870e72dc530856136ab4c8dd207128 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Tue, 27 Apr 2021 11:32:07 -0400
Subject: [PATCH] tracing: Map all PIDs to command lines
The default max PID is set by PID_MAX_DEFAULT, and the tracing
infrastructure uses this number to map PIDs to the comm names of the
tasks, such output of the trace can show names from the recorded PIDs in
the ring buffer. This mapping is also exported to user space via the
"saved_cmdlines" file in the tracefs directory.
But currently the mapping expects the PIDs to be less than
PID_MAX_DEFAULT, which is the default maximum and not the real maximum.
Recently, systemd will increases the maximum value of a PID on the system,
and when tasks are traced that have a PID higher than PID_MAX_DEFAULT, its
comm is not recorded. This leads to the entire trace to have "<...>" as
the comm name, which is pretty useless.
Instead, keep the array mapping the size of PID_MAX_DEFAULT, but instead
of just mapping the index to the comm, map a mask of the PID
(PID_MAX_DEFAULT - 1) to the comm, and find the full PID from the
map_cmdline_to_pid array (that already exists).
This bug goes back to the beginning of ftrace, but hasn't been an issue
until user space started increasing the maximum value of PIDs.
Link: https://lkml.kernel.org/r/20210427113207.3c601884@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 66a4ad93b5e9..e28d08905124 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2390,14 +2390,13 @@ static void tracing_stop_tr(struct trace_array *tr)
static int trace_save_cmdline(struct task_struct *tsk)
{
- unsigned pid, idx;
+ unsigned tpid, idx;
/* treat recording of idle task as a success */
if (!tsk->pid)
return 1;
- if (unlikely(tsk->pid > PID_MAX_DEFAULT))
- return 0;
+ tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
/*
* It's not the end of the world if we don't get
@@ -2408,26 +2407,15 @@ static int trace_save_cmdline(struct task_struct *tsk)
if (!arch_spin_trylock(&trace_cmdline_lock))
return 0;
- idx = savedcmd->map_pid_to_cmdline[tsk->pid];
+ idx = savedcmd->map_pid_to_cmdline[tpid];
if (idx == NO_CMDLINE_MAP) {
idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
- /*
- * Check whether the cmdline buffer at idx has a pid
- * mapped. We are going to overwrite that entry so we
- * need to clear the map_pid_to_cmdline. Otherwise we
- * would read the new comm for the old pid.
- */
- pid = savedcmd->map_cmdline_to_pid[idx];
- if (pid != NO_CMDLINE_MAP)
- savedcmd->map_pid_to_cmdline[pid] = NO_CMDLINE_MAP;
-
- savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
- savedcmd->map_pid_to_cmdline[tsk->pid] = idx;
-
+ savedcmd->map_pid_to_cmdline[tpid] = idx;
savedcmd->cmdline_idx = idx;
}
+ savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
set_cmdline(idx, tsk->comm);
arch_spin_unlock(&trace_cmdline_lock);
@@ -2438,6 +2426,7 @@ static int trace_save_cmdline(struct task_struct *tsk)
static void __trace_find_cmdline(int pid, char comm[])
{
unsigned map;
+ int tpid;
if (!pid) {
strcpy(comm, "<idle>");
@@ -2449,16 +2438,16 @@ static void __trace_find_cmdline(int pid, char comm[])
return;
}
- if (pid > PID_MAX_DEFAULT) {
- strcpy(comm, "<...>");
- return;
+ tpid = pid & (PID_MAX_DEFAULT - 1);
+ map = savedcmd->map_pid_to_cmdline[tpid];
+ if (map != NO_CMDLINE_MAP) {
+ tpid = savedcmd->map_cmdline_to_pid[map];
+ if (tpid == pid) {
+ strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
+ return;
+ }
}
-
- map = savedcmd->map_pid_to_cmdline[pid];
- if (map != NO_CMDLINE_MAP)
- strlcpy(comm, get_saved_cmdlines(map), TASK_COMM_LEN);
- else
- strcpy(comm, "<...>");
+ strcpy(comm, "<...>");
}
void trace_find_cmdline(int pid, char comm[])
Hi
the following was commited in Linus tree a couple of days ago, but so
is not yet in a tagged version.
But still please consider to queue up 301b1d3a9104
("tools/power/turbostat: Fix turbostat for AMD Zen CPUs") to v5.10.y
and later for the next round of updates, as it turbostat no lonwer
worked on those Zen based systems since 5.10.
Thank you for considering, I can otherwise reping once 5.13-rc1 is
tagged and released.
Regards,
Salvatore
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 42b32b164acecd850edef010915a02418345a033 Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:45:49 +0400
Subject: [PATCH] usb: dwc2: Fix session request interrupt handler
According to programming guide in host mode, port
power must be turned on in session request
interrupt handlers.
Fixes: 21795c826a45 ("usb: dwc2: exit hibernation on session request")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094550.75484A0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/core_intr.c b/drivers/usb/dwc2/core_intr.c
index 0a7f9330907f..8c0152b514be 100644
--- a/drivers/usb/dwc2/core_intr.c
+++ b/drivers/usb/dwc2/core_intr.c
@@ -307,6 +307,7 @@ static void dwc2_handle_conn_id_status_change_intr(struct dwc2_hsotg *hsotg)
static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
{
int ret;
+ u32 hprt0;
/* Clear interrupt */
dwc2_writel(hsotg, GINTSTS_SESSREQINT, GINTSTS);
@@ -328,6 +329,13 @@ static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
* established
*/
dwc2_hsotg_disconnect(hsotg);
+ } else {
+ /* Turn on the port power bit. */
+ hprt0 = dwc2_read_hprt0(hsotg);
+ hprt0 |= HPRT0_PWR;
+ dwc2_writel(hsotg, hprt0, HPRT0);
+ /* Connect hcd after port power is set. */
+ dwc2_hcd_connect(hsotg);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 42b32b164acecd850edef010915a02418345a033 Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:45:49 +0400
Subject: [PATCH] usb: dwc2: Fix session request interrupt handler
According to programming guide in host mode, port
power must be turned on in session request
interrupt handlers.
Fixes: 21795c826a45 ("usb: dwc2: exit hibernation on session request")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094550.75484A0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/core_intr.c b/drivers/usb/dwc2/core_intr.c
index 0a7f9330907f..8c0152b514be 100644
--- a/drivers/usb/dwc2/core_intr.c
+++ b/drivers/usb/dwc2/core_intr.c
@@ -307,6 +307,7 @@ static void dwc2_handle_conn_id_status_change_intr(struct dwc2_hsotg *hsotg)
static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
{
int ret;
+ u32 hprt0;
/* Clear interrupt */
dwc2_writel(hsotg, GINTSTS_SESSREQINT, GINTSTS);
@@ -328,6 +329,13 @@ static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
* established
*/
dwc2_hsotg_disconnect(hsotg);
+ } else {
+ /* Turn on the port power bit. */
+ hprt0 = dwc2_read_hprt0(hsotg);
+ hprt0 |= HPRT0_PWR;
+ dwc2_writel(hsotg, hprt0, HPRT0);
+ /* Connect hcd after port power is set. */
+ dwc2_hcd_connect(hsotg);
}
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 42b32b164acecd850edef010915a02418345a033 Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:45:49 +0400
Subject: [PATCH] usb: dwc2: Fix session request interrupt handler
According to programming guide in host mode, port
power must be turned on in session request
interrupt handlers.
Fixes: 21795c826a45 ("usb: dwc2: exit hibernation on session request")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094550.75484A0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/core_intr.c b/drivers/usb/dwc2/core_intr.c
index 0a7f9330907f..8c0152b514be 100644
--- a/drivers/usb/dwc2/core_intr.c
+++ b/drivers/usb/dwc2/core_intr.c
@@ -307,6 +307,7 @@ static void dwc2_handle_conn_id_status_change_intr(struct dwc2_hsotg *hsotg)
static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
{
int ret;
+ u32 hprt0;
/* Clear interrupt */
dwc2_writel(hsotg, GINTSTS_SESSREQINT, GINTSTS);
@@ -328,6 +329,13 @@ static void dwc2_handle_session_req_intr(struct dwc2_hsotg *hsotg)
* established
*/
dwc2_hsotg_disconnect(hsotg);
+ } else {
+ /* Turn on the port power bit. */
+ hprt0 = dwc2_read_hprt0(hsotg);
+ hprt0 |= HPRT0_PWR;
+ dwc2_writel(hsotg, hprt0, HPRT0);
+ /* Connect hcd after port power is set. */
+ dwc2_hcd_connect(hsotg);
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e926c474ebee404441c838d18224cd6f246a71b7 Mon Sep 17 00:00:00 2001
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Date: Mon, 22 Feb 2021 11:06:43 +0100
Subject: [PATCH] drm/compat: Clear bounce structures
Some of them have gaps, or fields we don't clear. Native ioctl code
does full copies plus zero-extends on size mismatch, so nothing can
leak. But compat is more hand-rolled so need to be careful.
None of these matter for performance, so just memset.
Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
are security holes anyway.
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Reported-by: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com # vblank ioctl
Cc: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-danie…
diff --git a/drivers/gpu/drm/drm_ioc32.c b/drivers/gpu/drm/drm_ioc32.c
index f86448ab1fe0..dc734d4828a1 100644
--- a/drivers/gpu/drm/drm_ioc32.c
+++ b/drivers/gpu/drm/drm_ioc32.c
@@ -99,6 +99,8 @@ static int compat_drm_version(struct file *file, unsigned int cmd,
if (copy_from_user(&v32, (void __user *)arg, sizeof(v32)))
return -EFAULT;
+ memset(&v, 0, sizeof(v));
+
v = (struct drm_version) {
.name_len = v32.name_len,
.name = compat_ptr(v32.name),
@@ -137,6 +139,9 @@ static int compat_drm_getunique(struct file *file, unsigned int cmd,
if (copy_from_user(&uq32, (void __user *)arg, sizeof(uq32)))
return -EFAULT;
+
+ memset(&uq, 0, sizeof(uq));
+
uq = (struct drm_unique){
.unique_len = uq32.unique_len,
.unique = compat_ptr(uq32.unique),
@@ -265,6 +270,8 @@ static int compat_drm_getclient(struct file *file, unsigned int cmd,
if (copy_from_user(&c32, argp, sizeof(c32)))
return -EFAULT;
+ memset(&client, 0, sizeof(client));
+
client.idx = c32.idx;
err = drm_ioctl_kernel(file, drm_getclient, &client, 0);
@@ -852,6 +859,8 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd,
if (copy_from_user(&req32, argp, sizeof(req32)))
return -EFAULT;
+ memset(&req, 0, sizeof(req));
+
req.request.type = req32.request.type;
req.request.sequence = req32.request.sequence;
req.request.signal = req32.request.signal;
@@ -889,6 +898,8 @@ static int compat_drm_mode_addfb2(struct file *file, unsigned int cmd,
struct drm_mode_fb_cmd2 req64;
int err;
+ memset(&req64, 0, sizeof(req64));
+
if (copy_from_user(&req64, argp,
offsetof(drm_mode_fb_cmd232_t, modifier)))
return -EFAULT;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e926c474ebee404441c838d18224cd6f246a71b7 Mon Sep 17 00:00:00 2001
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Date: Mon, 22 Feb 2021 11:06:43 +0100
Subject: [PATCH] drm/compat: Clear bounce structures
Some of them have gaps, or fields we don't clear. Native ioctl code
does full copies plus zero-extends on size mismatch, so nothing can
leak. But compat is more hand-rolled so need to be careful.
None of these matter for performance, so just memset.
Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
are security holes anyway.
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Reported-by: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com # vblank ioctl
Cc: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-danie…
diff --git a/drivers/gpu/drm/drm_ioc32.c b/drivers/gpu/drm/drm_ioc32.c
index f86448ab1fe0..dc734d4828a1 100644
--- a/drivers/gpu/drm/drm_ioc32.c
+++ b/drivers/gpu/drm/drm_ioc32.c
@@ -99,6 +99,8 @@ static int compat_drm_version(struct file *file, unsigned int cmd,
if (copy_from_user(&v32, (void __user *)arg, sizeof(v32)))
return -EFAULT;
+ memset(&v, 0, sizeof(v));
+
v = (struct drm_version) {
.name_len = v32.name_len,
.name = compat_ptr(v32.name),
@@ -137,6 +139,9 @@ static int compat_drm_getunique(struct file *file, unsigned int cmd,
if (copy_from_user(&uq32, (void __user *)arg, sizeof(uq32)))
return -EFAULT;
+
+ memset(&uq, 0, sizeof(uq));
+
uq = (struct drm_unique){
.unique_len = uq32.unique_len,
.unique = compat_ptr(uq32.unique),
@@ -265,6 +270,8 @@ static int compat_drm_getclient(struct file *file, unsigned int cmd,
if (copy_from_user(&c32, argp, sizeof(c32)))
return -EFAULT;
+ memset(&client, 0, sizeof(client));
+
client.idx = c32.idx;
err = drm_ioctl_kernel(file, drm_getclient, &client, 0);
@@ -852,6 +859,8 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd,
if (copy_from_user(&req32, argp, sizeof(req32)))
return -EFAULT;
+ memset(&req, 0, sizeof(req));
+
req.request.type = req32.request.type;
req.request.sequence = req32.request.sequence;
req.request.signal = req32.request.signal;
@@ -889,6 +898,8 @@ static int compat_drm_mode_addfb2(struct file *file, unsigned int cmd,
struct drm_mode_fb_cmd2 req64;
int err;
+ memset(&req64, 0, sizeof(req64));
+
if (copy_from_user(&req64, argp,
offsetof(drm_mode_fb_cmd232_t, modifier)))
return -EFAULT;
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e926c474ebee404441c838d18224cd6f246a71b7 Mon Sep 17 00:00:00 2001
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Date: Mon, 22 Feb 2021 11:06:43 +0100
Subject: [PATCH] drm/compat: Clear bounce structures
Some of them have gaps, or fields we don't clear. Native ioctl code
does full copies plus zero-extends on size mismatch, so nothing can
leak. But compat is more hand-rolled so need to be careful.
None of these matter for performance, so just memset.
Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
are security holes anyway.
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Reported-by: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com # vblank ioctl
Cc: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-danie…
diff --git a/drivers/gpu/drm/drm_ioc32.c b/drivers/gpu/drm/drm_ioc32.c
index f86448ab1fe0..dc734d4828a1 100644
--- a/drivers/gpu/drm/drm_ioc32.c
+++ b/drivers/gpu/drm/drm_ioc32.c
@@ -99,6 +99,8 @@ static int compat_drm_version(struct file *file, unsigned int cmd,
if (copy_from_user(&v32, (void __user *)arg, sizeof(v32)))
return -EFAULT;
+ memset(&v, 0, sizeof(v));
+
v = (struct drm_version) {
.name_len = v32.name_len,
.name = compat_ptr(v32.name),
@@ -137,6 +139,9 @@ static int compat_drm_getunique(struct file *file, unsigned int cmd,
if (copy_from_user(&uq32, (void __user *)arg, sizeof(uq32)))
return -EFAULT;
+
+ memset(&uq, 0, sizeof(uq));
+
uq = (struct drm_unique){
.unique_len = uq32.unique_len,
.unique = compat_ptr(uq32.unique),
@@ -265,6 +270,8 @@ static int compat_drm_getclient(struct file *file, unsigned int cmd,
if (copy_from_user(&c32, argp, sizeof(c32)))
return -EFAULT;
+ memset(&client, 0, sizeof(client));
+
client.idx = c32.idx;
err = drm_ioctl_kernel(file, drm_getclient, &client, 0);
@@ -852,6 +859,8 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd,
if (copy_from_user(&req32, argp, sizeof(req32)))
return -EFAULT;
+ memset(&req, 0, sizeof(req));
+
req.request.type = req32.request.type;
req.request.sequence = req32.request.sequence;
req.request.signal = req32.request.signal;
@@ -889,6 +898,8 @@ static int compat_drm_mode_addfb2(struct file *file, unsigned int cmd,
struct drm_mode_fb_cmd2 req64;
int err;
+ memset(&req64, 0, sizeof(req64));
+
if (copy_from_user(&req64, argp,
offsetof(drm_mode_fb_cmd232_t, modifier)))
return -EFAULT;
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e926c474ebee404441c838d18224cd6f246a71b7 Mon Sep 17 00:00:00 2001
From: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Date: Mon, 22 Feb 2021 11:06:43 +0100
Subject: [PATCH] drm/compat: Clear bounce structures
Some of them have gaps, or fields we don't clear. Native ioctl code
does full copies plus zero-extends on size mismatch, so nothing can
leak. But compat is more hand-rolled so need to be careful.
None of these matter for performance, so just memset.
Also I didn't fix up the CONFIG_DRM_LEGACY or CONFIG_DRM_AGP ioctl, those
are security holes anyway.
Acked-by: Maxime Ripard <mripard(a)kernel.org>
Reported-by: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com # vblank ioctl
Cc: syzbot+620cf21140fc7e772a5d(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210222100643.400935-1-danie…
diff --git a/drivers/gpu/drm/drm_ioc32.c b/drivers/gpu/drm/drm_ioc32.c
index f86448ab1fe0..dc734d4828a1 100644
--- a/drivers/gpu/drm/drm_ioc32.c
+++ b/drivers/gpu/drm/drm_ioc32.c
@@ -99,6 +99,8 @@ static int compat_drm_version(struct file *file, unsigned int cmd,
if (copy_from_user(&v32, (void __user *)arg, sizeof(v32)))
return -EFAULT;
+ memset(&v, 0, sizeof(v));
+
v = (struct drm_version) {
.name_len = v32.name_len,
.name = compat_ptr(v32.name),
@@ -137,6 +139,9 @@ static int compat_drm_getunique(struct file *file, unsigned int cmd,
if (copy_from_user(&uq32, (void __user *)arg, sizeof(uq32)))
return -EFAULT;
+
+ memset(&uq, 0, sizeof(uq));
+
uq = (struct drm_unique){
.unique_len = uq32.unique_len,
.unique = compat_ptr(uq32.unique),
@@ -265,6 +270,8 @@ static int compat_drm_getclient(struct file *file, unsigned int cmd,
if (copy_from_user(&c32, argp, sizeof(c32)))
return -EFAULT;
+ memset(&client, 0, sizeof(client));
+
client.idx = c32.idx;
err = drm_ioctl_kernel(file, drm_getclient, &client, 0);
@@ -852,6 +859,8 @@ static int compat_drm_wait_vblank(struct file *file, unsigned int cmd,
if (copy_from_user(&req32, argp, sizeof(req32)))
return -EFAULT;
+ memset(&req, 0, sizeof(req));
+
req.request.type = req32.request.type;
req.request.sequence = req32.request.sequence;
req.request.signal = req32.request.signal;
@@ -889,6 +898,8 @@ static int compat_drm_mode_addfb2(struct file *file, unsigned int cmd,
struct drm_mode_fb_cmd2 req64;
int err;
+ memset(&req64, 0, sizeof(req64));
+
if (copy_from_user(&req64, argp,
offsetof(drm_mode_fb_cmd232_t, modifier)))
return -EFAULT;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c74c26f6e398387cc953b3fdb54858f09bfb696b Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:46:06 +0400
Subject: [PATCH] usb: dwc2: Fix partial power down exiting by system resume
Fixes the implementation of exiting from partial power down
power saving mode when PC is resumed.
Added port connection status checking which prevents exiting from
Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
Power Down mode.
Rearranged the implementation to get rid of many "if"
statements.
NOTE: Switch case statement is used for hibernation partial
power down and clock gating mode determination. In this patch
only Partial Power Down is implemented the Hibernation and
clock gating implementations are planned to be added.
Fixes: 6f6d70597c15 ("usb: dwc2: bus suspend/resume for hosts with DWC2_POWER_DOWN_PARAM_NONE")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094607.1A9BAA0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 34030bafdff4..f096006df96f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -4427,7 +4427,7 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
{
struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
unsigned long flags;
- u32 pcgctl;
+ u32 hprt0;
int ret = 0;
spin_lock_irqsave(&hsotg->lock, flags);
@@ -4438,11 +4438,40 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
if (hsotg->lx_state != DWC2_L2)
goto unlock;
- if (hsotg->params.power_down > DWC2_POWER_DOWN_PARAM_PARTIAL) {
+ hprt0 = dwc2_read_hprt0(hsotg);
+
+ /*
+ * Added port connection status checking which prevents exiting from
+ * Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
+ * Power Down mode.
+ */
+ if (hprt0 & HPRT0_CONNSTS) {
+ hsotg->lx_state = DWC2_L0;
+ goto unlock;
+ }
+
+ switch (hsotg->params.power_down) {
+ case DWC2_POWER_DOWN_PARAM_PARTIAL:
+ ret = dwc2_exit_partial_power_down(hsotg, 0, true);
+ if (ret)
+ dev_err(hsotg->dev,
+ "exit partial_power_down failed\n");
+ /*
+ * Set HW accessible bit before powering on the controller
+ * since an interrupt may rise.
+ */
+ set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
+ break;
+ case DWC2_POWER_DOWN_PARAM_HIBERNATION:
+ case DWC2_POWER_DOWN_PARAM_NONE:
+ default:
hsotg->lx_state = DWC2_L0;
goto unlock;
}
+ /* Change Root port status, as port status change occurred after resume.*/
+ hsotg->flags.b.port_suspend_change = 1;
+
/*
* Enable power if not already done.
* This must not be spinlocked since duration
@@ -4454,52 +4483,25 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
spin_lock_irqsave(&hsotg->lock, flags);
}
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- /*
- * Set HW accessible bit before powering on the controller
- * since an interrupt may rise.
- */
- set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
-
-
- /* Exit partial_power_down */
- ret = dwc2_exit_partial_power_down(hsotg, 0, true);
- if (ret && (ret != -ENOTSUPP))
- dev_err(hsotg->dev, "exit partial_power_down failed\n");
- } else {
- pcgctl = readl(hsotg->regs + PCGCTL);
- pcgctl &= ~PCGCTL_STOPPCLK;
- writel(pcgctl, hsotg->regs + PCGCTL);
- }
-
- hsotg->lx_state = DWC2_L0;
-
+ /* Enable external vbus supply after resuming the port. */
spin_unlock_irqrestore(&hsotg->lock, flags);
+ dwc2_vbus_supply_init(hsotg);
- if (hsotg->bus_suspended) {
- spin_lock_irqsave(&hsotg->lock, flags);
- hsotg->flags.b.port_suspend_change = 1;
- spin_unlock_irqrestore(&hsotg->lock, flags);
- dwc2_port_resume(hsotg);
- } else {
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- dwc2_vbus_supply_init(hsotg);
-
- /* Wait for controller to correctly update D+/D- level */
- usleep_range(3000, 5000);
- }
+ /* Wait for controller to correctly update D+/D- level */
+ usleep_range(3000, 5000);
+ spin_lock_irqsave(&hsotg->lock, flags);
- /*
- * Clear Port Enable and Port Status changes.
- * Enable Port Power.
- */
- dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
- HPRT0_ENACHG, HPRT0);
- /* Wait for controller to detect Port Connect */
- usleep_range(5000, 7000);
- }
+ /*
+ * Clear Port Enable and Port Status changes.
+ * Enable Port Power.
+ */
+ dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
+ HPRT0_ENACHG, HPRT0);
- return ret;
+ /* Wait for controller to detect Port Connect */
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+ usleep_range(5000, 7000);
+ spin_lock_irqsave(&hsotg->lock, flags);
unlock:
spin_unlock_irqrestore(&hsotg->lock, flags);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c74c26f6e398387cc953b3fdb54858f09bfb696b Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:46:06 +0400
Subject: [PATCH] usb: dwc2: Fix partial power down exiting by system resume
Fixes the implementation of exiting from partial power down
power saving mode when PC is resumed.
Added port connection status checking which prevents exiting from
Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
Power Down mode.
Rearranged the implementation to get rid of many "if"
statements.
NOTE: Switch case statement is used for hibernation partial
power down and clock gating mode determination. In this patch
only Partial Power Down is implemented the Hibernation and
clock gating implementations are planned to be added.
Fixes: 6f6d70597c15 ("usb: dwc2: bus suspend/resume for hosts with DWC2_POWER_DOWN_PARAM_NONE")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094607.1A9BAA0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 34030bafdff4..f096006df96f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -4427,7 +4427,7 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
{
struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
unsigned long flags;
- u32 pcgctl;
+ u32 hprt0;
int ret = 0;
spin_lock_irqsave(&hsotg->lock, flags);
@@ -4438,11 +4438,40 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
if (hsotg->lx_state != DWC2_L2)
goto unlock;
- if (hsotg->params.power_down > DWC2_POWER_DOWN_PARAM_PARTIAL) {
+ hprt0 = dwc2_read_hprt0(hsotg);
+
+ /*
+ * Added port connection status checking which prevents exiting from
+ * Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
+ * Power Down mode.
+ */
+ if (hprt0 & HPRT0_CONNSTS) {
+ hsotg->lx_state = DWC2_L0;
+ goto unlock;
+ }
+
+ switch (hsotg->params.power_down) {
+ case DWC2_POWER_DOWN_PARAM_PARTIAL:
+ ret = dwc2_exit_partial_power_down(hsotg, 0, true);
+ if (ret)
+ dev_err(hsotg->dev,
+ "exit partial_power_down failed\n");
+ /*
+ * Set HW accessible bit before powering on the controller
+ * since an interrupt may rise.
+ */
+ set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
+ break;
+ case DWC2_POWER_DOWN_PARAM_HIBERNATION:
+ case DWC2_POWER_DOWN_PARAM_NONE:
+ default:
hsotg->lx_state = DWC2_L0;
goto unlock;
}
+ /* Change Root port status, as port status change occurred after resume.*/
+ hsotg->flags.b.port_suspend_change = 1;
+
/*
* Enable power if not already done.
* This must not be spinlocked since duration
@@ -4454,52 +4483,25 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
spin_lock_irqsave(&hsotg->lock, flags);
}
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- /*
- * Set HW accessible bit before powering on the controller
- * since an interrupt may rise.
- */
- set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
-
-
- /* Exit partial_power_down */
- ret = dwc2_exit_partial_power_down(hsotg, 0, true);
- if (ret && (ret != -ENOTSUPP))
- dev_err(hsotg->dev, "exit partial_power_down failed\n");
- } else {
- pcgctl = readl(hsotg->regs + PCGCTL);
- pcgctl &= ~PCGCTL_STOPPCLK;
- writel(pcgctl, hsotg->regs + PCGCTL);
- }
-
- hsotg->lx_state = DWC2_L0;
-
+ /* Enable external vbus supply after resuming the port. */
spin_unlock_irqrestore(&hsotg->lock, flags);
+ dwc2_vbus_supply_init(hsotg);
- if (hsotg->bus_suspended) {
- spin_lock_irqsave(&hsotg->lock, flags);
- hsotg->flags.b.port_suspend_change = 1;
- spin_unlock_irqrestore(&hsotg->lock, flags);
- dwc2_port_resume(hsotg);
- } else {
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- dwc2_vbus_supply_init(hsotg);
-
- /* Wait for controller to correctly update D+/D- level */
- usleep_range(3000, 5000);
- }
+ /* Wait for controller to correctly update D+/D- level */
+ usleep_range(3000, 5000);
+ spin_lock_irqsave(&hsotg->lock, flags);
- /*
- * Clear Port Enable and Port Status changes.
- * Enable Port Power.
- */
- dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
- HPRT0_ENACHG, HPRT0);
- /* Wait for controller to detect Port Connect */
- usleep_range(5000, 7000);
- }
+ /*
+ * Clear Port Enable and Port Status changes.
+ * Enable Port Power.
+ */
+ dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
+ HPRT0_ENACHG, HPRT0);
- return ret;
+ /* Wait for controller to detect Port Connect */
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+ usleep_range(5000, 7000);
+ spin_lock_irqsave(&hsotg->lock, flags);
unlock:
spin_unlock_irqrestore(&hsotg->lock, flags);
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c74c26f6e398387cc953b3fdb54858f09bfb696b Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:46:06 +0400
Subject: [PATCH] usb: dwc2: Fix partial power down exiting by system resume
Fixes the implementation of exiting from partial power down
power saving mode when PC is resumed.
Added port connection status checking which prevents exiting from
Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
Power Down mode.
Rearranged the implementation to get rid of many "if"
statements.
NOTE: Switch case statement is used for hibernation partial
power down and clock gating mode determination. In this patch
only Partial Power Down is implemented the Hibernation and
clock gating implementations are planned to be added.
Fixes: 6f6d70597c15 ("usb: dwc2: bus suspend/resume for hosts with DWC2_POWER_DOWN_PARAM_NONE")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094607.1A9BAA0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 34030bafdff4..f096006df96f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -4427,7 +4427,7 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
{
struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
unsigned long flags;
- u32 pcgctl;
+ u32 hprt0;
int ret = 0;
spin_lock_irqsave(&hsotg->lock, flags);
@@ -4438,11 +4438,40 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
if (hsotg->lx_state != DWC2_L2)
goto unlock;
- if (hsotg->params.power_down > DWC2_POWER_DOWN_PARAM_PARTIAL) {
+ hprt0 = dwc2_read_hprt0(hsotg);
+
+ /*
+ * Added port connection status checking which prevents exiting from
+ * Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
+ * Power Down mode.
+ */
+ if (hprt0 & HPRT0_CONNSTS) {
+ hsotg->lx_state = DWC2_L0;
+ goto unlock;
+ }
+
+ switch (hsotg->params.power_down) {
+ case DWC2_POWER_DOWN_PARAM_PARTIAL:
+ ret = dwc2_exit_partial_power_down(hsotg, 0, true);
+ if (ret)
+ dev_err(hsotg->dev,
+ "exit partial_power_down failed\n");
+ /*
+ * Set HW accessible bit before powering on the controller
+ * since an interrupt may rise.
+ */
+ set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
+ break;
+ case DWC2_POWER_DOWN_PARAM_HIBERNATION:
+ case DWC2_POWER_DOWN_PARAM_NONE:
+ default:
hsotg->lx_state = DWC2_L0;
goto unlock;
}
+ /* Change Root port status, as port status change occurred after resume.*/
+ hsotg->flags.b.port_suspend_change = 1;
+
/*
* Enable power if not already done.
* This must not be spinlocked since duration
@@ -4454,52 +4483,25 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
spin_lock_irqsave(&hsotg->lock, flags);
}
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- /*
- * Set HW accessible bit before powering on the controller
- * since an interrupt may rise.
- */
- set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
-
-
- /* Exit partial_power_down */
- ret = dwc2_exit_partial_power_down(hsotg, 0, true);
- if (ret && (ret != -ENOTSUPP))
- dev_err(hsotg->dev, "exit partial_power_down failed\n");
- } else {
- pcgctl = readl(hsotg->regs + PCGCTL);
- pcgctl &= ~PCGCTL_STOPPCLK;
- writel(pcgctl, hsotg->regs + PCGCTL);
- }
-
- hsotg->lx_state = DWC2_L0;
-
+ /* Enable external vbus supply after resuming the port. */
spin_unlock_irqrestore(&hsotg->lock, flags);
+ dwc2_vbus_supply_init(hsotg);
- if (hsotg->bus_suspended) {
- spin_lock_irqsave(&hsotg->lock, flags);
- hsotg->flags.b.port_suspend_change = 1;
- spin_unlock_irqrestore(&hsotg->lock, flags);
- dwc2_port_resume(hsotg);
- } else {
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- dwc2_vbus_supply_init(hsotg);
-
- /* Wait for controller to correctly update D+/D- level */
- usleep_range(3000, 5000);
- }
+ /* Wait for controller to correctly update D+/D- level */
+ usleep_range(3000, 5000);
+ spin_lock_irqsave(&hsotg->lock, flags);
- /*
- * Clear Port Enable and Port Status changes.
- * Enable Port Power.
- */
- dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
- HPRT0_ENACHG, HPRT0);
- /* Wait for controller to detect Port Connect */
- usleep_range(5000, 7000);
- }
+ /*
+ * Clear Port Enable and Port Status changes.
+ * Enable Port Power.
+ */
+ dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
+ HPRT0_ENACHG, HPRT0);
- return ret;
+ /* Wait for controller to detect Port Connect */
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+ usleep_range(5000, 7000);
+ spin_lock_irqsave(&hsotg->lock, flags);
unlock:
spin_unlock_irqrestore(&hsotg->lock, flags);
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c74c26f6e398387cc953b3fdb54858f09bfb696b Mon Sep 17 00:00:00 2001
From: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Date: Thu, 8 Apr 2021 13:46:06 +0400
Subject: [PATCH] usb: dwc2: Fix partial power down exiting by system resume
Fixes the implementation of exiting from partial power down
power saving mode when PC is resumed.
Added port connection status checking which prevents exiting from
Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
Power Down mode.
Rearranged the implementation to get rid of many "if"
statements.
NOTE: Switch case statement is used for hibernation partial
power down and clock gating mode determination. In this patch
only Partial Power Down is implemented the Hibernation and
clock gating implementations are planned to be added.
Fixes: 6f6d70597c15 ("usb: dwc2: bus suspend/resume for hosts with DWC2_POWER_DOWN_PARAM_NONE")
Cc: <stable(a)vger.kernel.org>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
Signed-off-by: Artur Petrosyan <Arthur.Petrosyan(a)synopsys.com>
Link: https://lore.kernel.org/r/20210408094607.1A9BAA0094@mailhost.synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 34030bafdff4..f096006df96f 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -4427,7 +4427,7 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
{
struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
unsigned long flags;
- u32 pcgctl;
+ u32 hprt0;
int ret = 0;
spin_lock_irqsave(&hsotg->lock, flags);
@@ -4438,11 +4438,40 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
if (hsotg->lx_state != DWC2_L2)
goto unlock;
- if (hsotg->params.power_down > DWC2_POWER_DOWN_PARAM_PARTIAL) {
+ hprt0 = dwc2_read_hprt0(hsotg);
+
+ /*
+ * Added port connection status checking which prevents exiting from
+ * Partial Power Down mode from _dwc2_hcd_resume() if not in Partial
+ * Power Down mode.
+ */
+ if (hprt0 & HPRT0_CONNSTS) {
+ hsotg->lx_state = DWC2_L0;
+ goto unlock;
+ }
+
+ switch (hsotg->params.power_down) {
+ case DWC2_POWER_DOWN_PARAM_PARTIAL:
+ ret = dwc2_exit_partial_power_down(hsotg, 0, true);
+ if (ret)
+ dev_err(hsotg->dev,
+ "exit partial_power_down failed\n");
+ /*
+ * Set HW accessible bit before powering on the controller
+ * since an interrupt may rise.
+ */
+ set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
+ break;
+ case DWC2_POWER_DOWN_PARAM_HIBERNATION:
+ case DWC2_POWER_DOWN_PARAM_NONE:
+ default:
hsotg->lx_state = DWC2_L0;
goto unlock;
}
+ /* Change Root port status, as port status change occurred after resume.*/
+ hsotg->flags.b.port_suspend_change = 1;
+
/*
* Enable power if not already done.
* This must not be spinlocked since duration
@@ -4454,52 +4483,25 @@ static int _dwc2_hcd_resume(struct usb_hcd *hcd)
spin_lock_irqsave(&hsotg->lock, flags);
}
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- /*
- * Set HW accessible bit before powering on the controller
- * since an interrupt may rise.
- */
- set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags);
-
-
- /* Exit partial_power_down */
- ret = dwc2_exit_partial_power_down(hsotg, 0, true);
- if (ret && (ret != -ENOTSUPP))
- dev_err(hsotg->dev, "exit partial_power_down failed\n");
- } else {
- pcgctl = readl(hsotg->regs + PCGCTL);
- pcgctl &= ~PCGCTL_STOPPCLK;
- writel(pcgctl, hsotg->regs + PCGCTL);
- }
-
- hsotg->lx_state = DWC2_L0;
-
+ /* Enable external vbus supply after resuming the port. */
spin_unlock_irqrestore(&hsotg->lock, flags);
+ dwc2_vbus_supply_init(hsotg);
- if (hsotg->bus_suspended) {
- spin_lock_irqsave(&hsotg->lock, flags);
- hsotg->flags.b.port_suspend_change = 1;
- spin_unlock_irqrestore(&hsotg->lock, flags);
- dwc2_port_resume(hsotg);
- } else {
- if (hsotg->params.power_down == DWC2_POWER_DOWN_PARAM_PARTIAL) {
- dwc2_vbus_supply_init(hsotg);
-
- /* Wait for controller to correctly update D+/D- level */
- usleep_range(3000, 5000);
- }
+ /* Wait for controller to correctly update D+/D- level */
+ usleep_range(3000, 5000);
+ spin_lock_irqsave(&hsotg->lock, flags);
- /*
- * Clear Port Enable and Port Status changes.
- * Enable Port Power.
- */
- dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
- HPRT0_ENACHG, HPRT0);
- /* Wait for controller to detect Port Connect */
- usleep_range(5000, 7000);
- }
+ /*
+ * Clear Port Enable and Port Status changes.
+ * Enable Port Power.
+ */
+ dwc2_writel(hsotg, HPRT0_PWR | HPRT0_CONNDET |
+ HPRT0_ENACHG, HPRT0);
- return ret;
+ /* Wait for controller to detect Port Connect */
+ spin_unlock_irqrestore(&hsotg->lock, flags);
+ usleep_range(5000, 7000);
+ spin_lock_irqsave(&hsotg->lock, flags);
unlock:
spin_unlock_irqrestore(&hsotg->lock, flags);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f88359e1588b85cf0e8209ab7d6620085f3441d9 Mon Sep 17 00:00:00 2001
From: Yu Chen <chenyu56(a)huawei.com>
Date: Thu, 15 Apr 2021 15:20:30 -0700
Subject: [PATCH] usb: dwc3: core: Do core softreset when switch mode
From: John Stultz <john.stultz(a)linaro.org>
According to the programming guide, to switch mode for DRD controller,
the driver needs to do the following.
To switch from device to host:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(host mode)
3. Reset the host with USBCMD.HCRESET
4. Then follow up with the initializing host registers sequence
To switch from host to device:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(device mode)
3. Reset the device with DCTL.CSftRst
4. Then follow up with the initializing registers sequence
Currently we're missing step 1) to do GCTL.CoreSoftReset and step 3) of
switching from host to device. John Stult reported a lockup issue seen
with HiKey960 platform without these steps[1]. Similar issue is observed
with Ferry's testing platform[2].
So, apply the required steps along with some fixes to Yu Chen's and John
Stultz's version. The main fixes to their versions are the missing wait
for clocks synchronization before clearing GCTL.CoreSoftReset and only
apply DCTL.CSftRst when switching from host to device.
[1] https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro…
[2] https://lore.kernel.org/linux-usb/0ba7a6ba-e6a7-9cd4-0695-64fc927e01f1@gmai…
Fixes: 41ce1456e1db ("usb: dwc3: core: make dwc3_set_mode() work properly")
Cc: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: Ferry Toth <fntoth(a)gmail.com>
Cc: Wesley Cheng <wcheng(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Tested-by: John Stultz <john.stultz(a)linaro.org>
Tested-by: Wesley Cheng <wcheng(a)codeaurora.org>
Signed-off-by: Yu Chen <chenyu56(a)huawei.com>
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/374440f8dcd4f06c02c2caf4b1efde86774e02d9.16185216…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 5c25e6a72dbd..2f118ad43571 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -114,6 +114,8 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
dwc->current_dr_role = mode;
}
+static int dwc3_core_soft_reset(struct dwc3 *dwc);
+
static void __dwc3_set_mode(struct work_struct *work)
{
struct dwc3 *dwc = work_to_dwc(work);
@@ -121,6 +123,8 @@ static void __dwc3_set_mode(struct work_struct *work)
int ret;
u32 reg;
+ mutex_lock(&dwc->mutex);
+
pm_runtime_get_sync(dwc->dev);
if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
@@ -154,6 +158,25 @@ static void __dwc3_set_mode(struct work_struct *work)
break;
}
+ /* For DRD host or device mode only */
+ if (dwc->desired_dr_role != DWC3_GCTL_PRTCAP_OTG) {
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg |= DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+
+ /*
+ * Wait for internal clocks to synchronized. DWC_usb31 and
+ * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
+ * keep it consistent across different IPs, let's wait up to
+ * 100ms before clearing GCTL.CORESOFTRESET.
+ */
+ msleep(100);
+
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg &= ~DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+ }
+
spin_lock_irqsave(&dwc->lock, flags);
dwc3_set_prtcap(dwc, dwc->desired_dr_role);
@@ -178,6 +201,8 @@ static void __dwc3_set_mode(struct work_struct *work)
}
break;
case DWC3_GCTL_PRTCAP_DEVICE:
+ dwc3_core_soft_reset(dwc);
+
dwc3_event_buffers_setup(dwc);
if (dwc->usb2_phy)
@@ -200,6 +225,7 @@ static void __dwc3_set_mode(struct work_struct *work)
out:
pm_runtime_mark_last_busy(dwc->dev);
pm_runtime_put_autosuspend(dwc->dev);
+ mutex_unlock(&dwc->mutex);
}
void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
@@ -1553,6 +1579,7 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_cache_hwparams(dwc);
spin_lock_init(&dwc->lock);
+ mutex_init(&dwc->mutex);
pm_runtime_set_active(dev);
pm_runtime_use_autosuspend(dev);
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 695ff2d791e4..7e3afa5378e8 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -13,6 +13,7 @@
#include <linux/device.h>
#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/bitops.h>
@@ -947,6 +948,7 @@ struct dwc3_scratchpad_array {
* @scratch_addr: dma address of scratchbuf
* @ep0_in_setup: one control transfer is completed and enter setup phase
* @lock: for synchronizing
+ * @mutex: for mode switching
* @dev: pointer to our struct device
* @sysdev: pointer to the DMA-capable device
* @xhci: pointer to our xHCI child
@@ -1088,6 +1090,9 @@ struct dwc3 {
/* device lock */
spinlock_t lock;
+ /* mode switching lock */
+ struct mutex mutex;
+
struct device *dev;
struct device *sysdev;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f88359e1588b85cf0e8209ab7d6620085f3441d9 Mon Sep 17 00:00:00 2001
From: Yu Chen <chenyu56(a)huawei.com>
Date: Thu, 15 Apr 2021 15:20:30 -0700
Subject: [PATCH] usb: dwc3: core: Do core softreset when switch mode
From: John Stultz <john.stultz(a)linaro.org>
According to the programming guide, to switch mode for DRD controller,
the driver needs to do the following.
To switch from device to host:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(host mode)
3. Reset the host with USBCMD.HCRESET
4. Then follow up with the initializing host registers sequence
To switch from host to device:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(device mode)
3. Reset the device with DCTL.CSftRst
4. Then follow up with the initializing registers sequence
Currently we're missing step 1) to do GCTL.CoreSoftReset and step 3) of
switching from host to device. John Stult reported a lockup issue seen
with HiKey960 platform without these steps[1]. Similar issue is observed
with Ferry's testing platform[2].
So, apply the required steps along with some fixes to Yu Chen's and John
Stultz's version. The main fixes to their versions are the missing wait
for clocks synchronization before clearing GCTL.CoreSoftReset and only
apply DCTL.CSftRst when switching from host to device.
[1] https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro…
[2] https://lore.kernel.org/linux-usb/0ba7a6ba-e6a7-9cd4-0695-64fc927e01f1@gmai…
Fixes: 41ce1456e1db ("usb: dwc3: core: make dwc3_set_mode() work properly")
Cc: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: Ferry Toth <fntoth(a)gmail.com>
Cc: Wesley Cheng <wcheng(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Tested-by: John Stultz <john.stultz(a)linaro.org>
Tested-by: Wesley Cheng <wcheng(a)codeaurora.org>
Signed-off-by: Yu Chen <chenyu56(a)huawei.com>
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/374440f8dcd4f06c02c2caf4b1efde86774e02d9.16185216…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 5c25e6a72dbd..2f118ad43571 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -114,6 +114,8 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
dwc->current_dr_role = mode;
}
+static int dwc3_core_soft_reset(struct dwc3 *dwc);
+
static void __dwc3_set_mode(struct work_struct *work)
{
struct dwc3 *dwc = work_to_dwc(work);
@@ -121,6 +123,8 @@ static void __dwc3_set_mode(struct work_struct *work)
int ret;
u32 reg;
+ mutex_lock(&dwc->mutex);
+
pm_runtime_get_sync(dwc->dev);
if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
@@ -154,6 +158,25 @@ static void __dwc3_set_mode(struct work_struct *work)
break;
}
+ /* For DRD host or device mode only */
+ if (dwc->desired_dr_role != DWC3_GCTL_PRTCAP_OTG) {
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg |= DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+
+ /*
+ * Wait for internal clocks to synchronized. DWC_usb31 and
+ * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
+ * keep it consistent across different IPs, let's wait up to
+ * 100ms before clearing GCTL.CORESOFTRESET.
+ */
+ msleep(100);
+
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg &= ~DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+ }
+
spin_lock_irqsave(&dwc->lock, flags);
dwc3_set_prtcap(dwc, dwc->desired_dr_role);
@@ -178,6 +201,8 @@ static void __dwc3_set_mode(struct work_struct *work)
}
break;
case DWC3_GCTL_PRTCAP_DEVICE:
+ dwc3_core_soft_reset(dwc);
+
dwc3_event_buffers_setup(dwc);
if (dwc->usb2_phy)
@@ -200,6 +225,7 @@ static void __dwc3_set_mode(struct work_struct *work)
out:
pm_runtime_mark_last_busy(dwc->dev);
pm_runtime_put_autosuspend(dwc->dev);
+ mutex_unlock(&dwc->mutex);
}
void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
@@ -1553,6 +1579,7 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_cache_hwparams(dwc);
spin_lock_init(&dwc->lock);
+ mutex_init(&dwc->mutex);
pm_runtime_set_active(dev);
pm_runtime_use_autosuspend(dev);
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 695ff2d791e4..7e3afa5378e8 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -13,6 +13,7 @@
#include <linux/device.h>
#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/bitops.h>
@@ -947,6 +948,7 @@ struct dwc3_scratchpad_array {
* @scratch_addr: dma address of scratchbuf
* @ep0_in_setup: one control transfer is completed and enter setup phase
* @lock: for synchronizing
+ * @mutex: for mode switching
* @dev: pointer to our struct device
* @sysdev: pointer to the DMA-capable device
* @xhci: pointer to our xHCI child
@@ -1088,6 +1090,9 @@ struct dwc3 {
/* device lock */
spinlock_t lock;
+ /* mode switching lock */
+ struct mutex mutex;
+
struct device *dev;
struct device *sysdev;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f88359e1588b85cf0e8209ab7d6620085f3441d9 Mon Sep 17 00:00:00 2001
From: Yu Chen <chenyu56(a)huawei.com>
Date: Thu, 15 Apr 2021 15:20:30 -0700
Subject: [PATCH] usb: dwc3: core: Do core softreset when switch mode
From: John Stultz <john.stultz(a)linaro.org>
According to the programming guide, to switch mode for DRD controller,
the driver needs to do the following.
To switch from device to host:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(host mode)
3. Reset the host with USBCMD.HCRESET
4. Then follow up with the initializing host registers sequence
To switch from host to device:
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(device mode)
3. Reset the device with DCTL.CSftRst
4. Then follow up with the initializing registers sequence
Currently we're missing step 1) to do GCTL.CoreSoftReset and step 3) of
switching from host to device. John Stult reported a lockup issue seen
with HiKey960 platform without these steps[1]. Similar issue is observed
with Ferry's testing platform[2].
So, apply the required steps along with some fixes to Yu Chen's and John
Stultz's version. The main fixes to their versions are the missing wait
for clocks synchronization before clearing GCTL.CoreSoftReset and only
apply DCTL.CSftRst when switching from host to device.
[1] https://lore.kernel.org/linux-usb/20210108015115.27920-1-john.stultz@linaro…
[2] https://lore.kernel.org/linux-usb/0ba7a6ba-e6a7-9cd4-0695-64fc927e01f1@gmai…
Fixes: 41ce1456e1db ("usb: dwc3: core: make dwc3_set_mode() work properly")
Cc: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: Ferry Toth <fntoth(a)gmail.com>
Cc: Wesley Cheng <wcheng(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Tested-by: John Stultz <john.stultz(a)linaro.org>
Tested-by: Wesley Cheng <wcheng(a)codeaurora.org>
Signed-off-by: Yu Chen <chenyu56(a)huawei.com>
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/374440f8dcd4f06c02c2caf4b1efde86774e02d9.16185216…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 5c25e6a72dbd..2f118ad43571 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -114,6 +114,8 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
dwc->current_dr_role = mode;
}
+static int dwc3_core_soft_reset(struct dwc3 *dwc);
+
static void __dwc3_set_mode(struct work_struct *work)
{
struct dwc3 *dwc = work_to_dwc(work);
@@ -121,6 +123,8 @@ static void __dwc3_set_mode(struct work_struct *work)
int ret;
u32 reg;
+ mutex_lock(&dwc->mutex);
+
pm_runtime_get_sync(dwc->dev);
if (dwc->current_dr_role == DWC3_GCTL_PRTCAP_OTG)
@@ -154,6 +158,25 @@ static void __dwc3_set_mode(struct work_struct *work)
break;
}
+ /* For DRD host or device mode only */
+ if (dwc->desired_dr_role != DWC3_GCTL_PRTCAP_OTG) {
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg |= DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+
+ /*
+ * Wait for internal clocks to synchronized. DWC_usb31 and
+ * DWC_usb32 may need at least 50ms (less for DWC_usb3). To
+ * keep it consistent across different IPs, let's wait up to
+ * 100ms before clearing GCTL.CORESOFTRESET.
+ */
+ msleep(100);
+
+ reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+ reg &= ~DWC3_GCTL_CORESOFTRESET;
+ dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+ }
+
spin_lock_irqsave(&dwc->lock, flags);
dwc3_set_prtcap(dwc, dwc->desired_dr_role);
@@ -178,6 +201,8 @@ static void __dwc3_set_mode(struct work_struct *work)
}
break;
case DWC3_GCTL_PRTCAP_DEVICE:
+ dwc3_core_soft_reset(dwc);
+
dwc3_event_buffers_setup(dwc);
if (dwc->usb2_phy)
@@ -200,6 +225,7 @@ static void __dwc3_set_mode(struct work_struct *work)
out:
pm_runtime_mark_last_busy(dwc->dev);
pm_runtime_put_autosuspend(dwc->dev);
+ mutex_unlock(&dwc->mutex);
}
void dwc3_set_mode(struct dwc3 *dwc, u32 mode)
@@ -1553,6 +1579,7 @@ static int dwc3_probe(struct platform_device *pdev)
dwc3_cache_hwparams(dwc);
spin_lock_init(&dwc->lock);
+ mutex_init(&dwc->mutex);
pm_runtime_set_active(dev);
pm_runtime_use_autosuspend(dev);
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 695ff2d791e4..7e3afa5378e8 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -13,6 +13,7 @@
#include <linux/device.h>
#include <linux/spinlock.h>
+#include <linux/mutex.h>
#include <linux/ioport.h>
#include <linux/list.h>
#include <linux/bitops.h>
@@ -947,6 +948,7 @@ struct dwc3_scratchpad_array {
* @scratch_addr: dma address of scratchbuf
* @ep0_in_setup: one control transfer is completed and enter setup phase
* @lock: for synchronizing
+ * @mutex: for mode switching
* @dev: pointer to our struct device
* @sysdev: pointer to the DMA-capable device
* @xhci: pointer to our xHCI child
@@ -1088,6 +1090,9 @@ struct dwc3 {
/* device lock */
spinlock_t lock;
+ /* mode switching lock */
+ struct mutex mutex;
+
struct device *dev;
struct device *sysdev;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3232a3ce55edfc0d7f8904543b4088a5339c2b2b Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 15 Apr 2021 00:41:58 -0700
Subject: [PATCH] usb: dwc3: gadget: Remove FS bInterval_m1 limitation
The programming guide incorrectly stated that the DCFG.bInterval_m1 must
be set to 0 when operating in fullspeed. There's no such limitation for
all IPs. See DWC_usb3x programming guide section 3.2.2.1.
Fixes: a1679af85b2a ("usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/5d4139ae89d810eb0a2d8577fb096fc88e87bfab.16184724…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1a632a3faf7f..90f4f9e69b22 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -607,12 +607,14 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
u8 bInterval_m1;
/*
- * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
- * must be set to 0 when the controller operates in full-speed.
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13.
+ *
+ * NOTE: The programming guide incorrectly stated bInterval_m1
+ * must be set to 0 when operating in fullspeed. Internally the
+ * controller does not have this limitation. See DWC_usb3x
+ * programming guide section 3.2.2.1.
*/
bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
- if (dwc->gadget->speed == USB_SPEED_FULL)
- bInterval_m1 = 0;
if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
dwc->gadget->speed == USB_SPEED_FULL)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3232a3ce55edfc0d7f8904543b4088a5339c2b2b Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 15 Apr 2021 00:41:58 -0700
Subject: [PATCH] usb: dwc3: gadget: Remove FS bInterval_m1 limitation
The programming guide incorrectly stated that the DCFG.bInterval_m1 must
be set to 0 when operating in fullspeed. There's no such limitation for
all IPs. See DWC_usb3x programming guide section 3.2.2.1.
Fixes: a1679af85b2a ("usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/5d4139ae89d810eb0a2d8577fb096fc88e87bfab.16184724…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1a632a3faf7f..90f4f9e69b22 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -607,12 +607,14 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
u8 bInterval_m1;
/*
- * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
- * must be set to 0 when the controller operates in full-speed.
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13.
+ *
+ * NOTE: The programming guide incorrectly stated bInterval_m1
+ * must be set to 0 when operating in fullspeed. Internally the
+ * controller does not have this limitation. See DWC_usb3x
+ * programming guide section 3.2.2.1.
*/
bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
- if (dwc->gadget->speed == USB_SPEED_FULL)
- bInterval_m1 = 0;
if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
dwc->gadget->speed == USB_SPEED_FULL)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3232a3ce55edfc0d7f8904543b4088a5339c2b2b Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 15 Apr 2021 00:41:58 -0700
Subject: [PATCH] usb: dwc3: gadget: Remove FS bInterval_m1 limitation
The programming guide incorrectly stated that the DCFG.bInterval_m1 must
be set to 0 when operating in fullspeed. There's no such limitation for
all IPs. See DWC_usb3x programming guide section 3.2.2.1.
Fixes: a1679af85b2a ("usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/5d4139ae89d810eb0a2d8577fb096fc88e87bfab.16184724…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1a632a3faf7f..90f4f9e69b22 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -607,12 +607,14 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
u8 bInterval_m1;
/*
- * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
- * must be set to 0 when the controller operates in full-speed.
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13.
+ *
+ * NOTE: The programming guide incorrectly stated bInterval_m1
+ * must be set to 0 when operating in fullspeed. Internally the
+ * controller does not have this limitation. See DWC_usb3x
+ * programming guide section 3.2.2.1.
*/
bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
- if (dwc->gadget->speed == USB_SPEED_FULL)
- bInterval_m1 = 0;
if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
dwc->gadget->speed == USB_SPEED_FULL)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3232a3ce55edfc0d7f8904543b4088a5339c2b2b Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 15 Apr 2021 00:41:58 -0700
Subject: [PATCH] usb: dwc3: gadget: Remove FS bInterval_m1 limitation
The programming guide incorrectly stated that the DCFG.bInterval_m1 must
be set to 0 when operating in fullspeed. There's no such limitation for
all IPs. See DWC_usb3x programming guide section 3.2.2.1.
Fixes: a1679af85b2a ("usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/5d4139ae89d810eb0a2d8577fb096fc88e87bfab.16184724…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1a632a3faf7f..90f4f9e69b22 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -607,12 +607,14 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
u8 bInterval_m1;
/*
- * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
- * must be set to 0 when the controller operates in full-speed.
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13.
+ *
+ * NOTE: The programming guide incorrectly stated bInterval_m1
+ * must be set to 0 when operating in fullspeed. Internally the
+ * controller does not have this limitation. See DWC_usb3x
+ * programming guide section 3.2.2.1.
*/
bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
- if (dwc->gadget->speed == USB_SPEED_FULL)
- bInterval_m1 = 0;
if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
dwc->gadget->speed == USB_SPEED_FULL)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3232a3ce55edfc0d7f8904543b4088a5339c2b2b Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 15 Apr 2021 00:41:58 -0700
Subject: [PATCH] usb: dwc3: gadget: Remove FS bInterval_m1 limitation
The programming guide incorrectly stated that the DCFG.bInterval_m1 must
be set to 0 when operating in fullspeed. There's no such limitation for
all IPs. See DWC_usb3x programming guide section 3.2.2.1.
Fixes: a1679af85b2a ("usb: dwc3: gadget: Fix setting of DEPCFG.bInterval_m1")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/5d4139ae89d810eb0a2d8577fb096fc88e87bfab.16184724…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1a632a3faf7f..90f4f9e69b22 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -607,12 +607,14 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
u8 bInterval_m1;
/*
- * Valid range for DEPCFG.bInterval_m1 is from 0 to 13, and it
- * must be set to 0 when the controller operates in full-speed.
+ * Valid range for DEPCFG.bInterval_m1 is from 0 to 13.
+ *
+ * NOTE: The programming guide incorrectly stated bInterval_m1
+ * must be set to 0 when operating in fullspeed. Internally the
+ * controller does not have this limitation. See DWC_usb3x
+ * programming guide section 3.2.2.1.
*/
bInterval_m1 = min_t(u8, desc->bInterval - 1, 13);
- if (dwc->gadget->speed == USB_SPEED_FULL)
- bInterval_m1 = 0;
if (usb_endpoint_type(desc) == USB_ENDPOINT_XFER_INT &&
dwc->gadget->speed == USB_SPEED_FULL)
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6d042ffb598ed83e7d5623cc961d249def5b9829 Mon Sep 17 00:00:00 2001
From: Palash Oswal <hello(a)oswalpalash.com>
Date: Tue, 27 Apr 2021 18:21:49 +0530
Subject: [PATCH] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
syzkaller identified KASAN: null-ptr-deref Write in
io_uring_cancel_sqpoll.
io_uring_cancel_sqpoll is called by io_sq_thread before calling
io_uring_alloc_task_context. This leads to current->io_uring being NULL.
io_uring_cancel_sqpoll should not have to deal with threads where
current->io_uring is NULL.
In order to cast a wider safety net, perform input sanitisation directly
in io_uring_cancel_sqpoll and return for NULL value of current->io_uring.
This is safe since if current->io_uring isn't set, then there's no way
for the task to have submitted any requests.
Reported-by: syzbot+be51ca5a4d97f017cd50(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palash Oswal <hello(a)oswalpalash.com>
Link: https://lore.kernel.org/r/20210427125148.21816-1-hello@oswalpalash.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 863420e184cf..81096f3b01ea 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9044,6 +9044,8 @@ static void io_uring_cancel_sqpoll(struct io_sq_data *sqd)
s64 inflight;
DEFINE_WAIT(wait);
+ if (!current->io_uring)
+ return;
WARN_ON_ONCE(!sqd || sqd->thread != current);
atomic_inc(&tctx->in_idle);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6d042ffb598ed83e7d5623cc961d249def5b9829 Mon Sep 17 00:00:00 2001
From: Palash Oswal <hello(a)oswalpalash.com>
Date: Tue, 27 Apr 2021 18:21:49 +0530
Subject: [PATCH] io_uring: Check current->io_uring in io_uring_cancel_sqpoll
syzkaller identified KASAN: null-ptr-deref Write in
io_uring_cancel_sqpoll.
io_uring_cancel_sqpoll is called by io_sq_thread before calling
io_uring_alloc_task_context. This leads to current->io_uring being NULL.
io_uring_cancel_sqpoll should not have to deal with threads where
current->io_uring is NULL.
In order to cast a wider safety net, perform input sanitisation directly
in io_uring_cancel_sqpoll and return for NULL value of current->io_uring.
This is safe since if current->io_uring isn't set, then there's no way
for the task to have submitted any requests.
Reported-by: syzbot+be51ca5a4d97f017cd50(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palash Oswal <hello(a)oswalpalash.com>
Link: https://lore.kernel.org/r/20210427125148.21816-1-hello@oswalpalash.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 863420e184cf..81096f3b01ea 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9044,6 +9044,8 @@ static void io_uring_cancel_sqpoll(struct io_sq_data *sqd)
s64 inflight;
DEFINE_WAIT(wait);
+ if (!current->io_uring)
+ return;
WARN_ON_ONCE(!sqd || sqd->thread != current);
atomic_inc(&tctx->in_idle);
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28090c133869b461c5366195a856d73469ab87d9 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 25 Apr 2021 23:34:45 +0100
Subject: [PATCH] io_uring: fix work_exit sqpoll cancellations
After closing an SQPOLL ring, io_ring_exit_work() kicks in and starts
doing cancellations via io_uring_try_cancel_requests(). It will go
through io_uring_try_cancel_iowq(), which uses ctx->tctx_list, but as
SQPOLL task don't have a ctx note, its io-wq won't be reachable and so
is left not cancelled.
It will eventually cancelled when one of the tasks dies, but if a thread
group survives for long and changes rings, it will spawn lots of
unreclaimed resources and live locked works.
Cancel SQPOLL task's io-wq separately in io_ring_exit_work().
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/a71a7fe345135d684025bb529d5cb1d8d6b46e10.16193899…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 577520445fa0..f20622bd963b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8679,6 +8679,13 @@ static void io_tctx_exit_cb(struct callback_head *cb)
complete(&work->completion);
}
+static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ return req->ctx == data;
+}
+
static void io_ring_exit_work(struct work_struct *work)
{
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
@@ -8695,6 +8702,17 @@ static void io_ring_exit_work(struct work_struct *work)
*/
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
+ if (ctx->sq_data) {
+ struct io_sq_data *sqd = ctx->sq_data;
+ struct task_struct *tsk;
+
+ io_sq_thread_park(sqd);
+ tsk = sqd->thread;
+ if (tsk && tsk->io_uring && tsk->io_uring->io_wq)
+ io_wq_cancel_cb(tsk->io_uring->io_wq,
+ io_cancel_ctx_cb, ctx, true);
+ io_sq_thread_unpark(sqd);
+ }
WARN_ON_ONCE(time_after(jiffies, timeout));
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
@@ -8844,13 +8862,6 @@ static bool io_cancel_defer_files(struct io_ring_ctx *ctx,
return true;
}
-static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-
- return req->ctx == data;
-}
-
static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
{
struct io_tctx_node *node;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 28090c133869b461c5366195a856d73469ab87d9 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 25 Apr 2021 23:34:45 +0100
Subject: [PATCH] io_uring: fix work_exit sqpoll cancellations
After closing an SQPOLL ring, io_ring_exit_work() kicks in and starts
doing cancellations via io_uring_try_cancel_requests(). It will go
through io_uring_try_cancel_iowq(), which uses ctx->tctx_list, but as
SQPOLL task don't have a ctx note, its io-wq won't be reachable and so
is left not cancelled.
It will eventually cancelled when one of the tasks dies, but if a thread
group survives for long and changes rings, it will spawn lots of
unreclaimed resources and live locked works.
Cancel SQPOLL task's io-wq separately in io_ring_exit_work().
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/a71a7fe345135d684025bb529d5cb1d8d6b46e10.16193899…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 577520445fa0..f20622bd963b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8679,6 +8679,13 @@ static void io_tctx_exit_cb(struct callback_head *cb)
complete(&work->completion);
}
+static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ return req->ctx == data;
+}
+
static void io_ring_exit_work(struct work_struct *work)
{
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
@@ -8695,6 +8702,17 @@ static void io_ring_exit_work(struct work_struct *work)
*/
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
+ if (ctx->sq_data) {
+ struct io_sq_data *sqd = ctx->sq_data;
+ struct task_struct *tsk;
+
+ io_sq_thread_park(sqd);
+ tsk = sqd->thread;
+ if (tsk && tsk->io_uring && tsk->io_uring->io_wq)
+ io_wq_cancel_cb(tsk->io_uring->io_wq,
+ io_cancel_ctx_cb, ctx, true);
+ io_sq_thread_unpark(sqd);
+ }
WARN_ON_ONCE(time_after(jiffies, timeout));
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
@@ -8844,13 +8862,6 @@ static bool io_cancel_defer_files(struct io_ring_ctx *ctx,
return true;
}
-static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
-{
- struct io_kiocb *req = container_of(work, struct io_kiocb, work);
-
- return req->ctx == data;
-}
-
static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
{
struct io_tctx_node *node;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3b763ba1c77da5806e4fdc5684285814fe970c98 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 18 Apr 2021 14:52:08 +0100
Subject: [PATCH] io_uring: remove extra sqpoll submission halting
SQPOLL task won't submit requests for a context that is currently dying,
so no need to remove ctx from sqd_list prior the main loop of
io_ring_exit_work(). Kill it, will be removed by io_sq_thread_finish()
and only brings confusion and lockups.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/f220c2b786ba0f9499bebc9f3cd9714d29efb6a5.16187529…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e491b815df8c..4fc54cd40470 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6750,6 +6750,10 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
if (!list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, &nr_events, 0);
+ /*
+ * Don't submit if refs are dying, good for io_uring_register(),
+ * but also it is relied upon by io_ring_exit_work()
+ */
if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
!(ctx->flags & IORING_SETUP_R_DISABLED))
ret = io_submit_sqes(ctx, to_submit);
@@ -8540,14 +8544,6 @@ static void io_ring_exit_work(struct work_struct *work)
struct io_tctx_node *node;
int ret;
- /* prevent SQPOLL from submitting new requests */
- if (ctx->sq_data) {
- io_sq_thread_park(ctx->sq_data);
- list_del_init(&ctx->sqd_list);
- io_sqd_update_thread_idle(ctx->sq_data);
- io_sq_thread_unpark(ctx->sq_data);
- }
-
/*
* If we're doing polled IO and end up having requests being
* submitted async (out-of-line), then completions can come in while
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3b763ba1c77da5806e4fdc5684285814fe970c98 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Sun, 18 Apr 2021 14:52:08 +0100
Subject: [PATCH] io_uring: remove extra sqpoll submission halting
SQPOLL task won't submit requests for a context that is currently dying,
so no need to remove ctx from sqd_list prior the main loop of
io_ring_exit_work(). Kill it, will be removed by io_sq_thread_finish()
and only brings confusion and lockups.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/f220c2b786ba0f9499bebc9f3cd9714d29efb6a5.16187529…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e491b815df8c..4fc54cd40470 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6750,6 +6750,10 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
if (!list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, &nr_events, 0);
+ /*
+ * Don't submit if refs are dying, good for io_uring_register(),
+ * but also it is relied upon by io_ring_exit_work()
+ */
if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
!(ctx->flags & IORING_SETUP_R_DISABLED))
ret = io_submit_sqes(ctx, to_submit);
@@ -8540,14 +8544,6 @@ static void io_ring_exit_work(struct work_struct *work)
struct io_tctx_node *node;
int ret;
- /* prevent SQPOLL from submitting new requests */
- if (ctx->sq_data) {
- io_sq_thread_park(ctx->sq_data);
- list_del_init(&ctx->sqd_list);
- io_sqd_update_thread_idle(ctx->sq_data);
- io_sq_thread_unpark(ctx->sq_data);
- }
-
/*
* If we're doing polled IO and end up having requests being
* submitted async (out-of-line), then completions can come in while
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 084804002e512427bfe52b448cb7cac0d4209b64 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Tue, 13 Apr 2021 02:58:38 +0100
Subject: [PATCH] io_uring: fix leaking reg files on exit
If io_sqe_files_unregister() faults on io_rsrc_ref_quiesce(), it will
fail to do unregister leaving files referenced. And that may well happen
because of a strayed signal or just because it does allocations inside.
In io_ring_ctx_free() do an unsafe version of unregister, as it's
guaranteed to not have requests by that point and so quiesce is useless.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/e696e9eade571b51997d0dc1d01f144c6d685c05.16182789…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8564c7908126..1af8bb5f7d56 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7089,6 +7089,10 @@ static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
fput(file);
}
#endif
+ io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
+ kfree(ctx->file_data);
+ ctx->file_data = NULL;
+ ctx->nr_user_files = 0;
}
static inline void io_rsrc_ref_lock(struct io_ring_ctx *ctx)
@@ -7195,21 +7199,14 @@ static struct io_rsrc_data *io_rsrc_data_alloc(struct io_ring_ctx *ctx,
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
- struct io_rsrc_data *data = ctx->file_data;
int ret;
- if (!data)
+ if (!ctx->file_data)
return -ENXIO;
- ret = io_rsrc_ref_quiesce(data, ctx);
- if (ret)
- return ret;
-
- __io_sqe_files_unregister(ctx);
- io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
- kfree(data);
- ctx->file_data = NULL;
- ctx->nr_user_files = 0;
- return 0;
+ ret = io_rsrc_ref_quiesce(ctx->file_data, ctx);
+ if (!ret)
+ __io_sqe_files_unregister(ctx);
+ return ret;
}
static void io_sq_thread_unpark(struct io_sq_data *sqd)
@@ -7659,7 +7656,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
ret = io_sqe_files_scm(ctx);
if (ret) {
- io_sqe_files_unregister(ctx);
+ __io_sqe_files_unregister(ctx);
return ret;
}
@@ -8460,7 +8457,11 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
}
mutex_lock(&ctx->uring_lock);
- io_sqe_files_unregister(ctx);
+ if (ctx->file_data) {
+ if (!atomic_dec_and_test(&ctx->file_data->refs))
+ wait_for_completion(&ctx->file_data->done);
+ __io_sqe_files_unregister(ctx);
+ }
if (ctx->rings)
__io_cqring_overflow_flush(ctx, true);
mutex_unlock(&ctx->uring_lock);
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 084804002e512427bfe52b448cb7cac0d4209b64 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Tue, 13 Apr 2021 02:58:38 +0100
Subject: [PATCH] io_uring: fix leaking reg files on exit
If io_sqe_files_unregister() faults on io_rsrc_ref_quiesce(), it will
fail to do unregister leaving files referenced. And that may well happen
because of a strayed signal or just because it does allocations inside.
In io_ring_ctx_free() do an unsafe version of unregister, as it's
guaranteed to not have requests by that point and so quiesce is useless.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/e696e9eade571b51997d0dc1d01f144c6d685c05.16182789…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8564c7908126..1af8bb5f7d56 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7089,6 +7089,10 @@ static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
fput(file);
}
#endif
+ io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
+ kfree(ctx->file_data);
+ ctx->file_data = NULL;
+ ctx->nr_user_files = 0;
}
static inline void io_rsrc_ref_lock(struct io_ring_ctx *ctx)
@@ -7195,21 +7199,14 @@ static struct io_rsrc_data *io_rsrc_data_alloc(struct io_ring_ctx *ctx,
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
- struct io_rsrc_data *data = ctx->file_data;
int ret;
- if (!data)
+ if (!ctx->file_data)
return -ENXIO;
- ret = io_rsrc_ref_quiesce(data, ctx);
- if (ret)
- return ret;
-
- __io_sqe_files_unregister(ctx);
- io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
- kfree(data);
- ctx->file_data = NULL;
- ctx->nr_user_files = 0;
- return 0;
+ ret = io_rsrc_ref_quiesce(ctx->file_data, ctx);
+ if (!ret)
+ __io_sqe_files_unregister(ctx);
+ return ret;
}
static void io_sq_thread_unpark(struct io_sq_data *sqd)
@@ -7659,7 +7656,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
ret = io_sqe_files_scm(ctx);
if (ret) {
- io_sqe_files_unregister(ctx);
+ __io_sqe_files_unregister(ctx);
return ret;
}
@@ -8460,7 +8457,11 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
}
mutex_lock(&ctx->uring_lock);
- io_sqe_files_unregister(ctx);
+ if (ctx->file_data) {
+ if (!atomic_dec_and_test(&ctx->file_data->refs))
+ wait_for_completion(&ctx->file_data->done);
+ __io_sqe_files_unregister(ctx);
+ }
if (ctx->rings)
__io_cqring_overflow_flush(ctx, true);
mutex_unlock(&ctx->uring_lock);
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 084804002e512427bfe52b448cb7cac0d4209b64 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Tue, 13 Apr 2021 02:58:38 +0100
Subject: [PATCH] io_uring: fix leaking reg files on exit
If io_sqe_files_unregister() faults on io_rsrc_ref_quiesce(), it will
fail to do unregister leaving files referenced. And that may well happen
because of a strayed signal or just because it does allocations inside.
In io_ring_ctx_free() do an unsafe version of unregister, as it's
guaranteed to not have requests by that point and so quiesce is useless.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/e696e9eade571b51997d0dc1d01f144c6d685c05.16182789…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8564c7908126..1af8bb5f7d56 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7089,6 +7089,10 @@ static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
fput(file);
}
#endif
+ io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
+ kfree(ctx->file_data);
+ ctx->file_data = NULL;
+ ctx->nr_user_files = 0;
}
static inline void io_rsrc_ref_lock(struct io_ring_ctx *ctx)
@@ -7195,21 +7199,14 @@ static struct io_rsrc_data *io_rsrc_data_alloc(struct io_ring_ctx *ctx,
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
- struct io_rsrc_data *data = ctx->file_data;
int ret;
- if (!data)
+ if (!ctx->file_data)
return -ENXIO;
- ret = io_rsrc_ref_quiesce(data, ctx);
- if (ret)
- return ret;
-
- __io_sqe_files_unregister(ctx);
- io_free_file_tables(&ctx->file_table, ctx->nr_user_files);
- kfree(data);
- ctx->file_data = NULL;
- ctx->nr_user_files = 0;
- return 0;
+ ret = io_rsrc_ref_quiesce(ctx->file_data, ctx);
+ if (!ret)
+ __io_sqe_files_unregister(ctx);
+ return ret;
}
static void io_sq_thread_unpark(struct io_sq_data *sqd)
@@ -7659,7 +7656,7 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
ret = io_sqe_files_scm(ctx);
if (ret) {
- io_sqe_files_unregister(ctx);
+ __io_sqe_files_unregister(ctx);
return ret;
}
@@ -8460,7 +8457,11 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
}
mutex_lock(&ctx->uring_lock);
- io_sqe_files_unregister(ctx);
+ if (ctx->file_data) {
+ if (!atomic_dec_and_test(&ctx->file_data->refs))
+ wait_for_completion(&ctx->file_data->done);
+ __io_sqe_files_unregister(ctx);
+ }
if (ctx->rings)
__io_cqring_overflow_flush(ctx, true);
mutex_unlock(&ctx->uring_lock);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3215887167af7db9af9fa23d61321ebfbd6ed6d3 Mon Sep 17 00:00:00 2001
From: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Date: Thu, 25 Feb 2021 15:28:57 +0100
Subject: [PATCH] media: venus: pm_helpers: Set opp clock name for v1
The rate of the core clock is set through devm_pm_opp_set_rate and
to avoid errors from it we have to set the name of the clock via
dev_pm_opp_set_clkname.
Fixes: 9a538b83612c ("media: venus: core: Add support for opp tables/perf voting")
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c b/drivers/media/platform/qcom/venus/pm_helpers.c
index e349d01422c5..95b4d40ff6a5 100644
--- a/drivers/media/platform/qcom/venus/pm_helpers.c
+++ b/drivers/media/platform/qcom/venus/pm_helpers.c
@@ -279,7 +279,22 @@ static int load_scale_v1(struct venus_inst *inst)
static int core_get_v1(struct venus_core *core)
{
- return core_clks_get(core);
+ int ret;
+
+ ret = core_clks_get(core);
+ if (ret)
+ return ret;
+
+ core->opp_table = dev_pm_opp_set_clkname(core->dev, "core");
+ if (IS_ERR(core->opp_table))
+ return PTR_ERR(core->opp_table);
+
+ return 0;
+}
+
+static void core_put_v1(struct venus_core *core)
+{
+ dev_pm_opp_put_clkname(core->opp_table);
}
static int core_power_v1(struct venus_core *core, int on)
@@ -296,6 +311,7 @@ static int core_power_v1(struct venus_core *core, int on)
static const struct venus_pm_ops pm_ops_v1 = {
.core_get = core_get_v1,
+ .core_put = core_put_v1,
.core_power = core_power_v1,
.load_scale = load_scale_v1,
};
@@ -368,6 +384,7 @@ static int venc_power_v3(struct device *dev, int on)
static const struct venus_pm_ops pm_ops_v3 = {
.core_get = core_get_v1,
+ .core_put = core_put_v1,
.core_power = core_power_v1,
.vdec_get = vdec_get_v3,
.vdec_power = vdec_power_v3,
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3215887167af7db9af9fa23d61321ebfbd6ed6d3 Mon Sep 17 00:00:00 2001
From: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Date: Thu, 25 Feb 2021 15:28:57 +0100
Subject: [PATCH] media: venus: pm_helpers: Set opp clock name for v1
The rate of the core clock is set through devm_pm_opp_set_rate and
to avoid errors from it we have to set the name of the clock via
dev_pm_opp_set_clkname.
Fixes: 9a538b83612c ("media: venus: core: Add support for opp tables/perf voting")
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
diff --git a/drivers/media/platform/qcom/venus/pm_helpers.c b/drivers/media/platform/qcom/venus/pm_helpers.c
index e349d01422c5..95b4d40ff6a5 100644
--- a/drivers/media/platform/qcom/venus/pm_helpers.c
+++ b/drivers/media/platform/qcom/venus/pm_helpers.c
@@ -279,7 +279,22 @@ static int load_scale_v1(struct venus_inst *inst)
static int core_get_v1(struct venus_core *core)
{
- return core_clks_get(core);
+ int ret;
+
+ ret = core_clks_get(core);
+ if (ret)
+ return ret;
+
+ core->opp_table = dev_pm_opp_set_clkname(core->dev, "core");
+ if (IS_ERR(core->opp_table))
+ return PTR_ERR(core->opp_table);
+
+ return 0;
+}
+
+static void core_put_v1(struct venus_core *core)
+{
+ dev_pm_opp_put_clkname(core->opp_table);
}
static int core_power_v1(struct venus_core *core, int on)
@@ -296,6 +311,7 @@ static int core_power_v1(struct venus_core *core, int on)
static const struct venus_pm_ops pm_ops_v1 = {
.core_get = core_get_v1,
+ .core_put = core_put_v1,
.core_power = core_power_v1,
.load_scale = load_scale_v1,
};
@@ -368,6 +384,7 @@ static int venc_power_v3(struct device *dev, int on)
static const struct venus_pm_ops pm_ops_v3 = {
.core_get = core_get_v1,
+ .core_put = core_put_v1,
.core_power = core_power_v1,
.vdec_get = vdec_get_v3,
.vdec_power = vdec_power_v3,
LLVM's integrated assembler appears to assume an argument with default
value is passed whenever it sees a comma right after the macro name.
It will be fine if the number of following arguments is one less than
the number of parameters specified in the macro definition. Otherwise,
it fails. For example, the following code works:
$ cat foo.s
.macro foo arg1=2, arg2=4
ldr r0, [r1, #\arg1]
ldr r0, [r1, #\arg2]
.endm
foo, arg2=8
$ llvm-mc -triple=armv7a -filetype=obj foo.s -o ias.o
arm-linux-gnueabihf-objdump -dr ias.o
ias.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: e5910001 ldr r0, [r1, #2]
4: e5910003 ldr r0, [r1, #8]
While the the following code would fail:
$ cat foo.s
.macro foo arg1=2, arg2=4
ldr r0, [r1, #\arg1]
ldr r0, [r1, #\arg2]
.endm
foo, arg1=2, arg2=8
$ llvm-mc -triple=armv7a -filetype=obj foo.s -o ias.o
foo.s:6:14: error: too many positional arguments
foo, arg1=2, arg2=8
This causes build failures as follows:
arch/arm64/kernel/vdso/gettimeofday.S:230:24: error: too many positional
arguments
clock_gettime_return, shift=1
^
arch/arm64/kernel/vdso/gettimeofday.S:253:24: error: too many positional
arguments
clock_gettime_return, shift=1
^
arch/arm64/kernel/vdso/gettimeofday.S:274:24: error: too many positional
arguments
clock_gettime_return, shift=1
This error is not in mainline because commit 28b1a824a4f4 ("arm64: vdso:
Substitute gettimeofday() with C implementation") rewrote this assembler
file in C as part of a 25 patch series that is unsuitable for stable.
Just remove the comma in the clock_gettime_return invocations in 4.19 so
that GNU as and LLVM's integrated assembler work the same.
Link:
https://github.com/ClangBuiltLinux/linux/issues/1349
Suggested-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Jian Cai <jiancai(a)google.com>
---
Changes v1 -> v2:
Keep the comma in the macro definition to be consistent with other
definitions.
Changes v2 -> v3:
Edit tags.
Changes v3 -> v4:
Update the commit message based on Nathan's comments.
arch/arm64/kernel/vdso/gettimeofday.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index 856fee6d3512..b6faf8b5d1fe 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -227,7 +227,7 @@ realtime:
seqcnt_check fail=realtime
get_ts_realtime res_sec=x10, res_nsec=x11, \
clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
- clock_gettime_return, shift=1
+ clock_gettime_return shift=1
ALIGN
monotonic:
@@ -250,7 +250,7 @@ monotonic:
clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
add_ts sec=x10, nsec=x11, ts_sec=x3, ts_nsec=x4, nsec_to_sec=x9
- clock_gettime_return, shift=1
+ clock_gettime_return shift=1
ALIGN
monotonic_raw:
@@ -271,7 +271,7 @@ monotonic_raw:
clock_nsec=x15, nsec_to_sec=x9
add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9
- clock_gettime_return, shift=1
+ clock_gettime_return shift=1
ALIGN
realtime_coarse:
--
2.31.1.607.g51e8a6a459-goog
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8c9af478c06bb1ab1422f90d8ecbc53defd44bc3 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Wed, 5 May 2021 10:38:24 -0400
Subject: [PATCH] ftrace: Handle commands when closing set_ftrace_filter file
# echo switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter
will cause switch_mm to stop tracing by the traceoff command.
# echo -n switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter
does nothing.
The reason is that the parsing in the write function only processes
commands if it finished parsing (there is white space written after the
command). That's to handle:
write(fd, "switch_mm:", 10);
write(fd, "traceoff", 8);
cases, where the command is broken over multiple writes.
The problem is if the file descriptor is closed, then the write call is
not processed, and the command needs to be processed in the release code.
The release code can handle matching of functions, but does not handle
commands.
Cc: stable(a)vger.kernel.org
Fixes: eda1e32855656 ("tracing: handle broken names in ftrace filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 057e962ca5ce..c57508445faa 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5591,7 +5591,10 @@ int ftrace_regex_release(struct inode *inode, struct file *file)
parser = &iter->parser;
if (trace_parser_loaded(parser)) {
- ftrace_match_records(iter->hash, parser->buffer, parser->idx);
+ int enable = !(iter->flags & FTRACE_ITER_NOTRACE);
+
+ ftrace_process_regex(iter, parser->buffer,
+ parser->idx, enable);
}
trace_parser_put(parser);
On Fri, 2021-05-07 at 23:25 -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> s390/pci: expose UID uniqueness guarantee
>
> to the 5.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> s390-pci-expose-uid-uniqueness-guarantee.patch
> and it can be found in the queue-5.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 7b9825aa0891087c5c91be0fb75431919e2027e3
> Author: Niklas Schnelle <schnelle(a)linux.ibm.com>
> Date: Wed Feb 24 11:29:36 2021 +0100
>
> s390/pci: expose UID uniqueness guarantee
>
> [ Upstream commit 408f2c9c15682fc21b645fdec1f726492e235c4b ]
>
> On s390 each PCI device has a user-defined ID (UID) exposed under
> /sys/bus/pci/devices/<dev>/uid. This ID was designed to serve as the PCI
> device's primary index and to match the device within Linux to the
> device configured in the hypervisor. To serve as a primary identifier
> the UID must be unique within the Linux instance, this is guaranteed by
> the platform if and only if the UID Uniqueness Checking flag is set
> within the CLP List PCI Functions response.
>
> While the UID has been exposed to userspace since commit ac4995b9d570
> ("s390/pci: add some new arch specific pci attributes") whether or not
> the platform guarantees its uniqueness for the lifetime of the Linux
> instance while defined is not visible from userspace. Remedy this by
> exposing this as a per device attribute at
>
> /sys/bus/pci/devices/<dev>/uid_is_unique
>
> Keeping this a per device attribute allows for maximum flexibility if we
> ever end up with some devices not having a UID or not enjoying the
> guaranteed uniqueness.
>
> Signed-off-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
> Reviewed-by: Viktor Mihajlovski <mihajlov(a)linux.ibm.com>
> Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
>
Hi Sasha,
I think we should not backport this patch to the stable kernels either
5.12, 5.11 nor 5.10.
The reason is that I'd like this staty together with commit
81bbf03905aa ("s390/pci: expose a PCI device's UID as its index") since
this then allows to reason that if uid_is_unique is 1 that implies that
the index attribute must exist. On the other hand backporting the other
patch too would create new network interface names. I reckon that
there would be some value for the uid_is_unique attribute alone but I
think it's not worth having more possible scenarios.
Thanks,
Niklas Schnelle
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2d036dfa5f10df9782f5278fc591d79d283c1fad Mon Sep 17 00:00:00 2001
From: Chen Jun <chenjun102(a)huawei.com>
Date: Wed, 14 Apr 2021 03:04:49 +0000
Subject: [PATCH] posix-timers: Preserve return value in clock_adjtime32()
The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.
Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.
[ tglx: Massage changelog ]
Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Richard Cochran <richardcochran(a)gmail.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bf540f5a4115..dd5697d7347b 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
err = do_clock_adjtime(which_clock, &ktx);
- if (err >= 0)
- err = put_old_timex32(utp, &ktx);
+ if (err >= 0 && put_old_timex32(utp, &ktx))
+ return -EFAULT;
return err;
}
From: Mark Gray <mark.d.gray(a)redhat.com>
[ Upstream commit 34beb21594519ce64a55a498c2fe7d567bc1ca20 ]
This patch adds transport ports information for route lookup so that
IPsec can select Geneve tunnel traffic to do encryption. This is
needed for OVS/OVN IPsec with encrypted Geneve tunnels.
This can be tested by configuring a host-host VPN using an IKE
daemon and specifying port numbers. For example, for an
Openswan-type configuration, the following parameters should be
configured on both hosts and IPsec set up as-per normal:
$ cat /etc/ipsec.conf
conn in
...
left=$IP1
right=$IP2
...
leftprotoport=udp/6081
rightprotoport=udp
...
conn out
...
left=$IP1
right=$IP2
...
leftprotoport=udp
rightprotoport=udp/6081
...
The tunnel can then be setup using "ip" on both hosts (but
changing the relevant IP addresses):
$ ip link add tun type geneve id 1000 remote $IP2
$ ip addr add 192.168.0.1/24 dev tun
$ ip link set tun up
This can then be tested by pinging from $IP1:
$ ping 192.168.0.2
Without this patch the traffic is unencrypted on the wire.
Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx(a)gmail.com>
Signed-off-by: Mark Gray <mark.d.gray(a)redhat.com>
Reviewed-by: Greg Rose <gvrose8192(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[backport to 4.4 for CVE-2020-25645]
Signed-off-by: Pavel Machek (CIP) <pavel(a)denx.de>
---
drivers/net/geneve.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index ee38299f9c57..aa00d71705c6 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -842,7 +842,7 @@ static netdev_tx_t geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev,
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
rt = geneve_get_v4_rt(skb, dev, &fl4, info,
- geneve->dst_port, sport);
+ info->key.tp_dst, sport);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
goto tx_error;
@@ -925,7 +925,7 @@ static netdev_tx_t geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev,
sport = udp_flow_src_port(geneve->net, skb, 1, USHRT_MAX, true);
dst = geneve_get_v6_dst(skb, dev, &fl6, info,
- geneve->dst_port, sport);
+ info->key.tp_dst, sport);
if (IS_ERR(dst)) {
err = PTR_ERR(dst);
goto tx_error;
@@ -1026,7 +1026,7 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
1, USHRT_MAX, true);
rt = geneve_get_v4_rt(skb, dev, &fl4, info,
- geneve->dst_port, sport);
+ info->key.tp_dst, sport);
if (IS_ERR(rt))
return PTR_ERR(rt);
@@ -1038,7 +1038,7 @@ static int geneve_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
1, USHRT_MAX, true);
dst = geneve_get_v6_dst(skb, dev, &fl6, info,
- geneve->dst_port, sport);
+ info->key.tp_dst, sport);
if (IS_ERR(dst))
return PTR_ERR(dst);
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
----- End forwarded message -----
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6b4fbd90b155a0025223df2c137af8a701d53b3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 4 May 2021 15:56:31 -0700
Subject: [PATCH] x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is
supported
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6bdb69a9a7dc..490bed07fe35 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1851,7 +1851,7 @@ static inline void setup_getcpu(int cpu)
unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
struct desc_struct d = { };
- if (boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
write_rdtscp_aux(cpudata);
/* Store CPU and node number in limit. */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6b4fbd90b155a0025223df2c137af8a701d53b3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 4 May 2021 15:56:31 -0700
Subject: [PATCH] x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is
supported
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6bdb69a9a7dc..490bed07fe35 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1851,7 +1851,7 @@ static inline void setup_getcpu(int cpu)
unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
struct desc_struct d = { };
- if (boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
write_rdtscp_aux(cpudata);
/* Store CPU and node number in limit. */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b6b4fbd90b155a0025223df2c137af8a701d53b3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 4 May 2021 15:56:31 -0700
Subject: [PATCH] x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is
supported
Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported. This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not. While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.
Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.
Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6bdb69a9a7dc..490bed07fe35 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1851,7 +1851,7 @@ static inline void setup_getcpu(int cpu)
unsigned long cpudata = vdso_encode_cpunode(cpu, early_cpu_to_node(cpu));
struct desc_struct d = { };
- if (boot_cpu_has(X86_FEATURE_RDTSCP))
+ if (boot_cpu_has(X86_FEATURE_RDTSCP) || boot_cpu_has(X86_FEATURE_RDPID))
write_rdtscp_aux(cpudata);
/* Store CPU and node number in limit. */
Hello Greg,
Can we please have the following jffs2 patches added to the stable tree:
960b9a8a7676 "jffs2: Fix kasan slab-out-of-bounds problem"
90ada91f4610 "jffs2: check the validity of dstlen in jffs2_zlib_compress()"
Both patches fix issues that date back to pre-git history. I am
interested in seeing them applied to v5.4 and v5.10.
Cheers,
Joel
The patch titled
Subject: userfaultfd: release page in error path to avoid BUG_ON
has been added to the -mm tree. Its filename is
userfaultfd-release-page-in-error-path-to-avoid-bug_on.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-release-page-in-error…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-release-page-in-error…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Axel Rasmussen <axelrasmussen(a)google.com>
Subject: userfaultfd: release page in error path to avoid BUG_ON
Consider the following sequence of events:
1. Userspace issues a UFFD ioctl, which ends up calling into
shmem_mfill_atomic_pte(). We successfully account the blocks, we
shmem_alloc_page(), but then the copy_from_user() fails. We return
-ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
dropping the mmap_lock, and retries, calling back into
shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
immediately returns - without releasing the page.
This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.
To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.
Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com
Fixes: cb658a453b93 ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
--- a/mm/shmem.c~userfaultfd-release-page-in-error-path-to-avoid-bug_on
+++ a/mm/shmem.c
@@ -2361,8 +2361,18 @@ static int shmem_mfill_atomic_pte(struct
pgoff_t offset, max_off;
ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
+ if (!shmem_inode_acct_block(inode, 1)) {
+ /*
+ * We may have got a page, returned -ENOENT triggering a retry,
+ * and now we find ourselves with -ENOMEM. Release the page, to
+ * avoid a BUG_ON in our caller.
+ */
+ if (unlikely(*pagep)) {
+ put_page(*pagep);
+ *pagep = NULL;
+ }
goto out;
+ }
if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
_
Patches currently in -mm which might be from axelrasmussen(a)google.com are
userfaultfd-release-page-in-error-path-to-avoid-bug_on.patch
userfaultfd-hugetlbfs-avoid-including-userfaultfd_kh-in-hugetlbh.patch
userfaultfd-shmem-combine-shmem_mcopy_atomicmfill_zeropage_pte.patch
userfaultfd-shmem-support-minor-fault-registration-for-shmem.patch
userfaultfd-shmem-support-uffdio_continue-for-shmem.patch
userfaultfd-shmem-advertise-shmem-minor-fault-support.patch
userfaultfd-shmem-modify-shmem_mfill_atomic_pte-to-use-install_pte.patch
userfaultfd-selftests-use-memfd_create-for-shmem-test-type.patch
userfaultfd-selftests-create-alias-mappings-in-the-shmem-test.patch
userfaultfd-selftests-reinitialize-test-context-in-each-test.patch
userfaultfd-selftests-exercise-minor-fault-handling-shmem-support.patch
Those patch series can handle NVMe can't suspend to D3 during s2idle
entry on some AMD platform. In this case, can be settld by assigning and
passing a PCIe bus flag to the armed device which need NVMe shutdown opt
in s2idle suspend and then use PCIe power setting to put the NVMe device
to D3.
Prike Liang (2):
PCI: add AMD PCIe quirk for nvme shutdown opt
nvme-pci: add AMD PCIe quirk for simple suspend/resume
drivers/nvme/host/pci.c | 2 ++
drivers/pci/probe.c | 5 ++++-
drivers/pci/quirks.c | 7 +++++++
include/linux/pci.h | 2 ++
4 files changed, 15 insertions(+), 1 deletion(-)
--
2.7.4
Hello Greg,
Can you please pick up following patches to respective releases from [master]:
82e5d8cc768b ("security: commoncap: fix -Wstringop-overread warning")
# 4.14.y, 4.19.y, 5.4.y, 5.10.y
e7c6e405e171 ("Fix misc new gcc warnings") # 4.19.y, 5.4.y, 5.10.y
They do solve build warnings with GCC11.
Thanks a lot!
--
Regards,
Andrey.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b2fcf2102049f6e56981e0ab3d9b633b8e2741da Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic(a)kernel.org>
Date: Tue, 23 Feb 2021 01:09:59 +0100
Subject: [PATCH] rcu/nocb: Fix missed nocb_timer requeue
This sequence of events can lead to a failure to requeue a CPU's
->nocb_timer:
1. There are no callbacks queued for any CPU covered by CPU 0-2's
->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated
with CPU 0.
2. CPU 1 enqueues its first callback with interrupts disabled, and
thus must defer awakening its ->nocb_gp_kthread. It therefore
queues its rcu_data structure's ->nocb_timer. At this point,
CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
callback, but with interrupts enabled, allowing it to directly
awaken the ->nocb_gp_kthread.
4. The newly awakened ->nocb_gp_kthread associates both CPU 1's
and CPU 2's callbacks with a future grace period and arranges
for that grace period to be started.
5. This ->nocb_gp_kthread goes to sleep waiting for the end of this
future grace period.
6. This grace period elapses before the CPU 1's timer fires.
This is normally improbably given that the timer is set for only
one jiffy, but timers can be delayed. Besides, it is possible
that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
7. The grace period ends, so rcu_gp_kthread awakens the
->nocb_gp_kthread, which in turn awakens both CPU 1's and
CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps
waiting for more newly queued callbacks.
8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
waiting for more invocable callbacks.
9. Note that neither kthread updated any ->nocb_timer state,
so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
10. CPU 1 enqueues its second callback, this time with interrupts
enabled so it can wake directly ->nocb_gp_kthread.
It does so with calling wake_nocb_gp() which also cancels the
pending timer that got queued in step 2. But that doesn't reset
CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
desynchronized.
11. ->nocb_gp_kthread associates the callback queued in 10 with a new
grace period, arranges for that grace period to start and sleeps
waiting for it to complete.
12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
which in turn wakes up CPU 1's ->nocb_cb_kthread which then
invokes the callback queued in 10.
13. CPU 1 enqueues its third callback, this time with interrupts
disabled so it must queue a timer for a deferred wakeup. However
the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
incorrectly indicates that a timer is already queued. Instead,
CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails
to queue the ->nocb_timer.
14. CPU 1 has its pending callback and it may go unnoticed until
some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
calls an explicit deferred wakeup, for example, during idle entry.
This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
we delete the ->nocb_timer.
It is quite possible that there is a similar scenario involving
->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some
effort from several people, a failure scenario has not yet been located.
However, that by no means guarantees that no such scenario exists.
Finding a failure scenario is left as an exercise for the reader, and the
"Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer.
Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
Cc: <stable(a)vger.kernel.org>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a1a17adeae54..e392bd129316 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1708,7 +1708,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
rcu_nocb_unlock_irqrestore(rdp, flags);
return false;
}
- del_timer(&rdp->nocb_timer);
+
+ if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+ WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+ del_timer(&rdp->nocb_timer);
+ }
rcu_nocb_unlock_irqrestore(rdp, flags);
raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2335,7 +2339,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
return false;
}
ndw = READ_ONCE(rdp->nocb_defer_wakeup);
- WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b2fcf2102049f6e56981e0ab3d9b633b8e2741da Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic(a)kernel.org>
Date: Tue, 23 Feb 2021 01:09:59 +0100
Subject: [PATCH] rcu/nocb: Fix missed nocb_timer requeue
This sequence of events can lead to a failure to requeue a CPU's
->nocb_timer:
1. There are no callbacks queued for any CPU covered by CPU 0-2's
->nocb_gp_kthread. Note that ->nocb_gp_kthread is associated
with CPU 0.
2. CPU 1 enqueues its first callback with interrupts disabled, and
thus must defer awakening its ->nocb_gp_kthread. It therefore
queues its rcu_data structure's ->nocb_timer. At this point,
CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
3. CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
callback, but with interrupts enabled, allowing it to directly
awaken the ->nocb_gp_kthread.
4. The newly awakened ->nocb_gp_kthread associates both CPU 1's
and CPU 2's callbacks with a future grace period and arranges
for that grace period to be started.
5. This ->nocb_gp_kthread goes to sleep waiting for the end of this
future grace period.
6. This grace period elapses before the CPU 1's timer fires.
This is normally improbably given that the timer is set for only
one jiffy, but timers can be delayed. Besides, it is possible
that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
7. The grace period ends, so rcu_gp_kthread awakens the
->nocb_gp_kthread, which in turn awakens both CPU 1's and
CPU 2's ->nocb_cb_kthread. Then ->nocb_gb_kthread sleeps
waiting for more newly queued callbacks.
8. CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
waiting for more invocable callbacks.
9. Note that neither kthread updated any ->nocb_timer state,
so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
10. CPU 1 enqueues its second callback, this time with interrupts
enabled so it can wake directly ->nocb_gp_kthread.
It does so with calling wake_nocb_gp() which also cancels the
pending timer that got queued in step 2. But that doesn't reset
CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
desynchronized.
11. ->nocb_gp_kthread associates the callback queued in 10 with a new
grace period, arranges for that grace period to start and sleeps
waiting for it to complete.
12. The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
which in turn wakes up CPU 1's ->nocb_cb_kthread which then
invokes the callback queued in 10.
13. CPU 1 enqueues its third callback, this time with interrupts
disabled so it must queue a timer for a deferred wakeup. However
the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
incorrectly indicates that a timer is already queued. Instead,
CPU 1's ->nocb_timer was cancelled in 10. CPU 1 therefore fails
to queue the ->nocb_timer.
14. CPU 1 has its pending callback and it may go unnoticed until
some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
calls an explicit deferred wakeup, for example, during idle entry.
This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
we delete the ->nocb_timer.
It is quite possible that there is a similar scenario involving
->nocb_bypass_timer and ->nocb_defer_wakeup. However, despite some
effort from several people, a failure scenario has not yet been located.
However, that by no means guarantees that no such scenario exists.
Finding a failure scenario is left as an exercise for the reader, and the
"Fixes:" tag below relates to ->nocb_bypass_timer instead of ->nocb_timer.
Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
Cc: <stable(a)vger.kernel.org>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraju(a)codeaurora.org>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a1a17adeae54..e392bd129316 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1708,7 +1708,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
rcu_nocb_unlock_irqrestore(rdp, flags);
return false;
}
- del_timer(&rdp->nocb_timer);
+
+ if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+ WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+ del_timer(&rdp->nocb_timer);
+ }
rcu_nocb_unlock_irqrestore(rdp, flags);
raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2335,7 +2339,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
return false;
}
ndw = READ_ONCE(rdp->nocb_defer_wakeup);
- WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal(a)redhat.com>
Date: Wed, 21 Oct 2020 16:12:49 -0400
Subject: [PATCH] fuse: fix write deadlock
There are two modes for write(2) and friends in fuse:
a) write through (update page cache, send sync WRITE request to userspace)
b) buffered write (update page cache, async writeout later)
The write through method kept all the page cache pages locked that were
used for the request. Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.
The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.
For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.
Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.
If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data. After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data. While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.
In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.
Partial write of a not uptodate page still needs to be handled. One way
would be to read the complete page before doing the write. This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.
The other solution is to serialize the synchronous write with reads from
the partial pages. The easiest way to do this is to keep the partial pages
locked. The problem is that a write() may involve two such pages (one head
and one tail). This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.
Reported-by: Qian Cai <cai(a)lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b557308…
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable(a)vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..eff4abaa87da 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1099,6 +1099,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
unsigned int offset, i;
+ bool short_write;
int err;
for (i = 0; i < ap->num_pages; i++)
@@ -1113,32 +1114,38 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
if (!err && ia->write.out.size > count)
err = -EIO;
+ short_write = ia->write.out.size < count;
offset = ap->descs[0].offset;
count = ia->write.out.size;
for (i = 0; i < ap->num_pages; i++) {
struct page *page = ap->pages[i];
- if (!err && !offset && count >= PAGE_SIZE)
- SetPageUptodate(page);
-
- if (count > PAGE_SIZE - offset)
- count -= PAGE_SIZE - offset;
- else
- count = 0;
- offset = 0;
-
- unlock_page(page);
+ if (err) {
+ ClearPageUptodate(page);
+ } else {
+ if (count >= PAGE_SIZE - offset)
+ count -= PAGE_SIZE - offset;
+ else {
+ if (short_write)
+ ClearPageUptodate(page);
+ count = 0;
+ }
+ offset = 0;
+ }
+ if (ia->write.page_locked && (i == ap->num_pages - 1))
+ unlock_page(page);
put_page(page);
}
return err;
}
-static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
+static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct address_space *mapping,
struct iov_iter *ii, loff_t pos,
unsigned int max_pages)
{
+ struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
size_t count = 0;
@@ -1191,6 +1198,16 @@ static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
if (offset == PAGE_SIZE)
offset = 0;
+ /* If we copied full page, mark it uptodate */
+ if (tmp == PAGE_SIZE)
+ SetPageUptodate(page);
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ } else {
+ ia->write.page_locked = true;
+ break;
+ }
if (!fc->big_writes)
break;
} while (iov_iter_count(ii) && count < fc->max_write &&
@@ -1234,7 +1251,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
break;
}
- count = fuse_fill_write_pages(ap, mapping, ii, pos, nr_pages);
+ count = fuse_fill_write_pages(&ia, mapping, ii, pos, nr_pages);
if (count <= 0) {
err = count;
} else {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 63d97a15ffde..74d888c78fa4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -912,6 +912,7 @@ struct fuse_io_args {
struct {
struct fuse_write_in in;
struct fuse_write_out out;
+ bool page_locked;
} write;
};
struct fuse_args_pages ap;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal(a)redhat.com>
Date: Wed, 21 Oct 2020 16:12:49 -0400
Subject: [PATCH] fuse: fix write deadlock
There are two modes for write(2) and friends in fuse:
a) write through (update page cache, send sync WRITE request to userspace)
b) buffered write (update page cache, async writeout later)
The write through method kept all the page cache pages locked that were
used for the request. Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.
The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.
For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.
Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.
If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data. After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data. While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.
In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.
Partial write of a not uptodate page still needs to be handled. One way
would be to read the complete page before doing the write. This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.
The other solution is to serialize the synchronous write with reads from
the partial pages. The easiest way to do this is to keep the partial pages
locked. The problem is that a write() may involve two such pages (one head
and one tail). This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.
Reported-by: Qian Cai <cai(a)lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b557308…
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable(a)vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..eff4abaa87da 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1099,6 +1099,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
unsigned int offset, i;
+ bool short_write;
int err;
for (i = 0; i < ap->num_pages; i++)
@@ -1113,32 +1114,38 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
if (!err && ia->write.out.size > count)
err = -EIO;
+ short_write = ia->write.out.size < count;
offset = ap->descs[0].offset;
count = ia->write.out.size;
for (i = 0; i < ap->num_pages; i++) {
struct page *page = ap->pages[i];
- if (!err && !offset && count >= PAGE_SIZE)
- SetPageUptodate(page);
-
- if (count > PAGE_SIZE - offset)
- count -= PAGE_SIZE - offset;
- else
- count = 0;
- offset = 0;
-
- unlock_page(page);
+ if (err) {
+ ClearPageUptodate(page);
+ } else {
+ if (count >= PAGE_SIZE - offset)
+ count -= PAGE_SIZE - offset;
+ else {
+ if (short_write)
+ ClearPageUptodate(page);
+ count = 0;
+ }
+ offset = 0;
+ }
+ if (ia->write.page_locked && (i == ap->num_pages - 1))
+ unlock_page(page);
put_page(page);
}
return err;
}
-static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
+static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct address_space *mapping,
struct iov_iter *ii, loff_t pos,
unsigned int max_pages)
{
+ struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
size_t count = 0;
@@ -1191,6 +1198,16 @@ static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
if (offset == PAGE_SIZE)
offset = 0;
+ /* If we copied full page, mark it uptodate */
+ if (tmp == PAGE_SIZE)
+ SetPageUptodate(page);
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ } else {
+ ia->write.page_locked = true;
+ break;
+ }
if (!fc->big_writes)
break;
} while (iov_iter_count(ii) && count < fc->max_write &&
@@ -1234,7 +1251,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
break;
}
- count = fuse_fill_write_pages(ap, mapping, ii, pos, nr_pages);
+ count = fuse_fill_write_pages(&ia, mapping, ii, pos, nr_pages);
if (count <= 0) {
err = count;
} else {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 63d97a15ffde..74d888c78fa4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -912,6 +912,7 @@ struct fuse_io_args {
struct {
struct fuse_write_in in;
struct fuse_write_out out;
+ bool page_locked;
} write;
};
struct fuse_args_pages ap;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal(a)redhat.com>
Date: Wed, 21 Oct 2020 16:12:49 -0400
Subject: [PATCH] fuse: fix write deadlock
There are two modes for write(2) and friends in fuse:
a) write through (update page cache, send sync WRITE request to userspace)
b) buffered write (update page cache, async writeout later)
The write through method kept all the page cache pages locked that were
used for the request. Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.
The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.
For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.
Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.
If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data. After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data. While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.
In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.
Partial write of a not uptodate page still needs to be handled. One way
would be to read the complete page before doing the write. This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.
The other solution is to serialize the synchronous write with reads from
the partial pages. The easiest way to do this is to keep the partial pages
locked. The problem is that a write() may involve two such pages (one head
and one tail). This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.
Reported-by: Qian Cai <cai(a)lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b557308…
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable(a)vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..eff4abaa87da 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1099,6 +1099,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
unsigned int offset, i;
+ bool short_write;
int err;
for (i = 0; i < ap->num_pages; i++)
@@ -1113,32 +1114,38 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
if (!err && ia->write.out.size > count)
err = -EIO;
+ short_write = ia->write.out.size < count;
offset = ap->descs[0].offset;
count = ia->write.out.size;
for (i = 0; i < ap->num_pages; i++) {
struct page *page = ap->pages[i];
- if (!err && !offset && count >= PAGE_SIZE)
- SetPageUptodate(page);
-
- if (count > PAGE_SIZE - offset)
- count -= PAGE_SIZE - offset;
- else
- count = 0;
- offset = 0;
-
- unlock_page(page);
+ if (err) {
+ ClearPageUptodate(page);
+ } else {
+ if (count >= PAGE_SIZE - offset)
+ count -= PAGE_SIZE - offset;
+ else {
+ if (short_write)
+ ClearPageUptodate(page);
+ count = 0;
+ }
+ offset = 0;
+ }
+ if (ia->write.page_locked && (i == ap->num_pages - 1))
+ unlock_page(page);
put_page(page);
}
return err;
}
-static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
+static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct address_space *mapping,
struct iov_iter *ii, loff_t pos,
unsigned int max_pages)
{
+ struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
size_t count = 0;
@@ -1191,6 +1198,16 @@ static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
if (offset == PAGE_SIZE)
offset = 0;
+ /* If we copied full page, mark it uptodate */
+ if (tmp == PAGE_SIZE)
+ SetPageUptodate(page);
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ } else {
+ ia->write.page_locked = true;
+ break;
+ }
if (!fc->big_writes)
break;
} while (iov_iter_count(ii) && count < fc->max_write &&
@@ -1234,7 +1251,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
break;
}
- count = fuse_fill_write_pages(ap, mapping, ii, pos, nr_pages);
+ count = fuse_fill_write_pages(&ia, mapping, ii, pos, nr_pages);
if (count <= 0) {
err = count;
} else {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 63d97a15ffde..74d888c78fa4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -912,6 +912,7 @@ struct fuse_io_args {
struct {
struct fuse_write_in in;
struct fuse_write_out out;
+ bool page_locked;
} write;
};
struct fuse_args_pages ap;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 Mon Sep 17 00:00:00 2001
From: Vivek Goyal <vgoyal(a)redhat.com>
Date: Wed, 21 Oct 2020 16:12:49 -0400
Subject: [PATCH] fuse: fix write deadlock
There are two modes for write(2) and friends in fuse:
a) write through (update page cache, send sync WRITE request to userspace)
b) buffered write (update page cache, async writeout later)
The write through method kept all the page cache pages locked that were
used for the request. Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.
The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.
For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.
Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.
If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data. After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data. While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.
In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.
Partial write of a not uptodate page still needs to be handled. One way
would be to read the complete page before doing the write. This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.
The other solution is to serialize the synchronous write with reads from
the partial pages. The easiest way to do this is to keep the partial pages
locked. The problem is that a write() may involve two such pages (one head
and one tail). This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.
Reported-by: Qian Cai <cai(a)lca.pw>
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b557308…
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable(a)vger.kernel.org> # v2.6.26
Signed-off-by: Vivek Goyal <vgoyal(a)redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8cccecb55fb8..eff4abaa87da 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1099,6 +1099,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
struct fuse_file *ff = file->private_data;
struct fuse_mount *fm = ff->fm;
unsigned int offset, i;
+ bool short_write;
int err;
for (i = 0; i < ap->num_pages; i++)
@@ -1113,32 +1114,38 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
if (!err && ia->write.out.size > count)
err = -EIO;
+ short_write = ia->write.out.size < count;
offset = ap->descs[0].offset;
count = ia->write.out.size;
for (i = 0; i < ap->num_pages; i++) {
struct page *page = ap->pages[i];
- if (!err && !offset && count >= PAGE_SIZE)
- SetPageUptodate(page);
-
- if (count > PAGE_SIZE - offset)
- count -= PAGE_SIZE - offset;
- else
- count = 0;
- offset = 0;
-
- unlock_page(page);
+ if (err) {
+ ClearPageUptodate(page);
+ } else {
+ if (count >= PAGE_SIZE - offset)
+ count -= PAGE_SIZE - offset;
+ else {
+ if (short_write)
+ ClearPageUptodate(page);
+ count = 0;
+ }
+ offset = 0;
+ }
+ if (ia->write.page_locked && (i == ap->num_pages - 1))
+ unlock_page(page);
put_page(page);
}
return err;
}
-static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
+static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
struct address_space *mapping,
struct iov_iter *ii, loff_t pos,
unsigned int max_pages)
{
+ struct fuse_args_pages *ap = &ia->ap;
struct fuse_conn *fc = get_fuse_conn(mapping->host);
unsigned offset = pos & (PAGE_SIZE - 1);
size_t count = 0;
@@ -1191,6 +1198,16 @@ static ssize_t fuse_fill_write_pages(struct fuse_args_pages *ap,
if (offset == PAGE_SIZE)
offset = 0;
+ /* If we copied full page, mark it uptodate */
+ if (tmp == PAGE_SIZE)
+ SetPageUptodate(page);
+
+ if (PageUptodate(page)) {
+ unlock_page(page);
+ } else {
+ ia->write.page_locked = true;
+ break;
+ }
if (!fc->big_writes)
break;
} while (iov_iter_count(ii) && count < fc->max_write &&
@@ -1234,7 +1251,7 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
break;
}
- count = fuse_fill_write_pages(ap, mapping, ii, pos, nr_pages);
+ count = fuse_fill_write_pages(&ia, mapping, ii, pos, nr_pages);
if (count <= 0) {
err = count;
} else {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 63d97a15ffde..74d888c78fa4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -912,6 +912,7 @@ struct fuse_io_args {
struct {
struct fuse_write_in in;
struct fuse_write_out out;
+ bool page_locked;
} write;
};
struct fuse_args_pages ap;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9716ac65efc8f780549b03bddf41e60c445d4709 Mon Sep 17 00:00:00 2001
From: Stefan Berger <stefanb(a)linux.ibm.com>
Date: Wed, 10 Mar 2021 17:19:16 -0500
Subject: [PATCH] tpm: vtpm_proxy: Avoid reading host log when using a virtual
device
Avoid allocating memory and reading the host log when a virtual device
is used since this log is of no use to that driver. A virtual
device can be identified through the flag TPM_CHIP_FLAG_VIRTUAL, which
is only set for the tpm_vtpm_proxy driver.
Cc: stable(a)vger.kernel.org
Fixes: 6f99612e2500 ("tpm: Proxy driver for supporting multiple emulated TPMs")
Signed-off-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
diff --git a/drivers/char/tpm/eventlog/common.c b/drivers/char/tpm/eventlog/common.c
index 7460f230bae4..8512ec76d526 100644
--- a/drivers/char/tpm/eventlog/common.c
+++ b/drivers/char/tpm/eventlog/common.c
@@ -107,6 +107,9 @@ void tpm_bios_log_setup(struct tpm_chip *chip)
int log_version;
int rc = 0;
+ if (chip->flags & TPM_CHIP_FLAG_VIRTUAL)
+ return;
+
rc = tpm_read_log(chip);
if (rc < 0)
return;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9716ac65efc8f780549b03bddf41e60c445d4709 Mon Sep 17 00:00:00 2001
From: Stefan Berger <stefanb(a)linux.ibm.com>
Date: Wed, 10 Mar 2021 17:19:16 -0500
Subject: [PATCH] tpm: vtpm_proxy: Avoid reading host log when using a virtual
device
Avoid allocating memory and reading the host log when a virtual device
is used since this log is of no use to that driver. A virtual
device can be identified through the flag TPM_CHIP_FLAG_VIRTUAL, which
is only set for the tpm_vtpm_proxy driver.
Cc: stable(a)vger.kernel.org
Fixes: 6f99612e2500 ("tpm: Proxy driver for supporting multiple emulated TPMs")
Signed-off-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
diff --git a/drivers/char/tpm/eventlog/common.c b/drivers/char/tpm/eventlog/common.c
index 7460f230bae4..8512ec76d526 100644
--- a/drivers/char/tpm/eventlog/common.c
+++ b/drivers/char/tpm/eventlog/common.c
@@ -107,6 +107,9 @@ void tpm_bios_log_setup(struct tpm_chip *chip)
int log_version;
int rc = 0;
+ if (chip->flags & TPM_CHIP_FLAG_VIRTUAL)
+ return;
+
rc = tpm_read_log(chip);
if (rc < 0)
return;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c527aeb41926c71902dd42f788a8b093b21416 Mon Sep 17 00:00:00 2001
From: "Longpeng(Mike)" <longpeng2(a)huawei.com>
Date: Thu, 15 Apr 2021 08:46:28 +0800
Subject: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Gonglei (Arei) <arei.gonglei(a)huawei.com>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable(a)vger.kernel.org> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@hu…
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Longpeng(Mike) <longpeng2(a)huawei.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210415004628.1779-1-longpeng2@huawei.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c981e69bc107..ab2a026dd844 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2301,6 +2301,41 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level)
+{
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2342,22 +2377,11 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b862676e371715456c9dade7990c8004996d0d9e Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Mon, 22 Mar 2021 19:47:30 +0800
Subject: [PATCH] f2fs: fix to avoid out-of-bounds memory access
butt3rflyh4ck <butterflyhuangxx(a)gmail.com> reported a bug found by
syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
dump_stack+0xfa/0x151 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
get_next_nat_page fs/f2fs/node.c:123 [inline]
__flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
__sync_filesystem fs/sync.c:39 [inline]
sync_filesystem fs/sync.c:67 [inline]
sync_filesystem+0x1b5/0x260 fs/sync.c:48
generic_shutdown_super+0x70/0x370 fs/super.c:448
kill_block_super+0x97/0xf0 fs/super.c:1394
The root cause is, if nat entry in checkpoint journal area is corrupted,
e.g. nid of journalled nat entry exceeds max nid value, during checkpoint,
once it tries to flush nat journal to NAT area, get_next_nat_page() may
access out-of-bounds memory on nat_bitmap due to it uses wrong nid value
as bitmap offset.
[1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x…
Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx(a)gmail.com>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 4b0e2e3c2c88..45c8cf1afe66 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2785,6 +2785,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
struct f2fs_nat_entry raw_ne;
nid_t nid = le32_to_cpu(nid_in_journal(journal, i));
+ if (f2fs_check_nid_range(sbi, nid))
+ continue;
+
raw_ne = nat_in_journal(journal, i);
ne = __lookup_nat_cache(nm_i, nid);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b862676e371715456c9dade7990c8004996d0d9e Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Mon, 22 Mar 2021 19:47:30 +0800
Subject: [PATCH] f2fs: fix to avoid out-of-bounds memory access
butt3rflyh4ck <butterflyhuangxx(a)gmail.com> reported a bug found by
syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
dump_stack+0xfa/0x151 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
get_next_nat_page fs/f2fs/node.c:123 [inline]
__flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
__sync_filesystem fs/sync.c:39 [inline]
sync_filesystem fs/sync.c:67 [inline]
sync_filesystem+0x1b5/0x260 fs/sync.c:48
generic_shutdown_super+0x70/0x370 fs/super.c:448
kill_block_super+0x97/0xf0 fs/super.c:1394
The root cause is, if nat entry in checkpoint journal area is corrupted,
e.g. nid of journalled nat entry exceeds max nid value, during checkpoint,
once it tries to flush nat journal to NAT area, get_next_nat_page() may
access out-of-bounds memory on nat_bitmap due to it uses wrong nid value
as bitmap offset.
[1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x…
Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx(a)gmail.com>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 4b0e2e3c2c88..45c8cf1afe66 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2785,6 +2785,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
struct f2fs_nat_entry raw_ne;
nid_t nid = le32_to_cpu(nid_in_journal(journal, i));
+ if (f2fs_check_nid_range(sbi, nid))
+ continue;
+
raw_ne = nat_in_journal(journal, i);
ne = __lookup_nat_cache(nm_i, nid);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b862676e371715456c9dade7990c8004996d0d9e Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Mon, 22 Mar 2021 19:47:30 +0800
Subject: [PATCH] f2fs: fix to avoid out-of-bounds memory access
butt3rflyh4ck <butterflyhuangxx(a)gmail.com> reported a bug found by
syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
dump_stack+0xfa/0x151 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
get_next_nat_page fs/f2fs/node.c:123 [inline]
__flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
__sync_filesystem fs/sync.c:39 [inline]
sync_filesystem fs/sync.c:67 [inline]
sync_filesystem+0x1b5/0x260 fs/sync.c:48
generic_shutdown_super+0x70/0x370 fs/super.c:448
kill_block_super+0x97/0xf0 fs/super.c:1394
The root cause is, if nat entry in checkpoint journal area is corrupted,
e.g. nid of journalled nat entry exceeds max nid value, during checkpoint,
once it tries to flush nat journal to NAT area, get_next_nat_page() may
access out-of-bounds memory on nat_bitmap due to it uses wrong nid value
as bitmap offset.
[1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x…
Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx(a)gmail.com>
Signed-off-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 4b0e2e3c2c88..45c8cf1afe66 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2785,6 +2785,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
struct f2fs_nat_entry raw_ne;
nid_t nid = le32_to_cpu(nid_in_journal(journal, i));
+ if (f2fs_check_nid_range(sbi, nid))
+ continue;
+
raw_ne = nat_in_journal(journal, i);
ne = __lookup_nat_cache(nm_i, nid);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3e903315790baf4a966436e7f32e9c97864570ac Mon Sep 17 00:00:00 2001
From: Guochun Mao <guochun.mao(a)mediatek.com>
Date: Tue, 16 Mar 2021 16:52:14 +0800
Subject: [PATCH] ubifs: Only check replay with inode type to judge if inode
linked
Conside the following case, it just write a big file into flash,
when complete writing, delete the file, and then power off promptly.
Next time power on, we'll get a replay list like:
...
LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
...
In the replay list, data nodes' deletion are 0, and the inode node's
deletion is 1. In current logic, the file's dentry will be removed,
but inode and the flash space it occupied will be reserved.
User will see that much free space been disappeared.
We only need to check the deletion value of the following inode type
node of the replay entry.
Fixes: e58725d51fa8 ("ubifs: Handle re-linking of inodes correctly while recovery")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guochun Mao <guochun.mao(a)mediatek.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 0f8a6a16421b..1929ec63a0cb 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -223,7 +223,8 @@ static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
*/
list_for_each_entry_reverse(r, &c->replay_list, list) {
ubifs_assert(c, r->sqnum >= rino->sqnum);
- if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+ if (key_inum(c, &r->key) == key_inum(c, &rino->key) &&
+ key_type(c, &r->key) == UBIFS_INO_KEY)
return r->deletion == 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3e903315790baf4a966436e7f32e9c97864570ac Mon Sep 17 00:00:00 2001
From: Guochun Mao <guochun.mao(a)mediatek.com>
Date: Tue, 16 Mar 2021 16:52:14 +0800
Subject: [PATCH] ubifs: Only check replay with inode type to judge if inode
linked
Conside the following case, it just write a big file into flash,
when complete writing, delete the file, and then power off promptly.
Next time power on, we'll get a replay list like:
...
LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
...
In the replay list, data nodes' deletion are 0, and the inode node's
deletion is 1. In current logic, the file's dentry will be removed,
but inode and the flash space it occupied will be reserved.
User will see that much free space been disappeared.
We only need to check the deletion value of the following inode type
node of the replay entry.
Fixes: e58725d51fa8 ("ubifs: Handle re-linking of inodes correctly while recovery")
Cc: stable(a)vger.kernel.org
Signed-off-by: Guochun Mao <guochun.mao(a)mediatek.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 0f8a6a16421b..1929ec63a0cb 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -223,7 +223,8 @@ static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
*/
list_for_each_entry_reverse(r, &c->replay_list, list) {
ubifs_assert(c, r->sqnum >= rino->sqnum);
- if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+ if (key_inum(c, &r->key) == key_inum(c, &rino->key) &&
+ key_type(c, &r->key) == UBIFS_INO_KEY)
return r->deletion == 0;
}
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 772d7891e8b3b0baae7bb88a294d61fd07ba6d15 Mon Sep 17 00:00:00 2001
From: Jisheng Zhang <jszhang(a)kernel.org>
Date: Fri, 2 Apr 2021 21:29:08 +0800
Subject: [PATCH] riscv: vdso: fix and clean-up Makefile
Running "make" on an already compiled kernel tree will rebuild the
kernel even without any modifications:
CALL linux/scripts/checksyscalls.sh
CALL linux/scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
SO2S arch/riscv/kernel/vdso/vdso-syms.S
AS arch/riscv/kernel/vdso/vdso-syms.o
AR arch/riscv/kernel/vdso/built-in.a
AR arch/riscv/kernel/built-in.a
AR arch/riscv/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
The reason is "Any target that utilizes if_changed must be listed in
$(targets), otherwise the command line check will fail, and the target
will always be built" as explained by Documentation/kbuild/makefiles.rst
Fix this build bug by adding vdso-syms.S to $(targets)
At the same time, there are two trivial clean up modifications:
- the vdso-dummy.o is not needed any more after so remove it.
- vdso.lds is a generated file, so it should be prefixed with
$(obj)/ instead of $(src)/
Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jisheng Zhang <jszhang(a)kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index ca2b40dfd24b..24d936c147cd 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -23,7 +23,7 @@ ifneq ($(c-gettimeofday-y),)
endif
# Build rules
-targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-dummy.o
+targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-syms.S
obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
obj-y += vdso.o vdso-syms.o
@@ -41,7 +41,7 @@ KASAN_SANITIZE := n
$(obj)/vdso.o: $(obj)/vdso.so
# link rule for the .so file, .lds has to be first
-$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
+$(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE
$(call if_changed,vdsold)
LDFLAGS_vdso.so.dbg = -shared -s -soname=linux-vdso.so.1 \
--build-id=sha1 --hash-style=both --eh-frame-hdr
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 772d7891e8b3b0baae7bb88a294d61fd07ba6d15 Mon Sep 17 00:00:00 2001
From: Jisheng Zhang <jszhang(a)kernel.org>
Date: Fri, 2 Apr 2021 21:29:08 +0800
Subject: [PATCH] riscv: vdso: fix and clean-up Makefile
Running "make" on an already compiled kernel tree will rebuild the
kernel even without any modifications:
CALL linux/scripts/checksyscalls.sh
CALL linux/scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
SO2S arch/riscv/kernel/vdso/vdso-syms.S
AS arch/riscv/kernel/vdso/vdso-syms.o
AR arch/riscv/kernel/vdso/built-in.a
AR arch/riscv/kernel/built-in.a
AR arch/riscv/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
The reason is "Any target that utilizes if_changed must be listed in
$(targets), otherwise the command line check will fail, and the target
will always be built" as explained by Documentation/kbuild/makefiles.rst
Fix this build bug by adding vdso-syms.S to $(targets)
At the same time, there are two trivial clean up modifications:
- the vdso-dummy.o is not needed any more after so remove it.
- vdso.lds is a generated file, so it should be prefixed with
$(obj)/ instead of $(src)/
Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jisheng Zhang <jszhang(a)kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index ca2b40dfd24b..24d936c147cd 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -23,7 +23,7 @@ ifneq ($(c-gettimeofday-y),)
endif
# Build rules
-targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-dummy.o
+targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-syms.S
obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
obj-y += vdso.o vdso-syms.o
@@ -41,7 +41,7 @@ KASAN_SANITIZE := n
$(obj)/vdso.o: $(obj)/vdso.so
# link rule for the .so file, .lds has to be first
-$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
+$(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE
$(call if_changed,vdsold)
LDFLAGS_vdso.so.dbg = -shared -s -soname=linux-vdso.so.1 \
--build-id=sha1 --hash-style=both --eh-frame-hdr
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 772d7891e8b3b0baae7bb88a294d61fd07ba6d15 Mon Sep 17 00:00:00 2001
From: Jisheng Zhang <jszhang(a)kernel.org>
Date: Fri, 2 Apr 2021 21:29:08 +0800
Subject: [PATCH] riscv: vdso: fix and clean-up Makefile
Running "make" on an already compiled kernel tree will rebuild the
kernel even without any modifications:
CALL linux/scripts/checksyscalls.sh
CALL linux/scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
SO2S arch/riscv/kernel/vdso/vdso-syms.S
AS arch/riscv/kernel/vdso/vdso-syms.o
AR arch/riscv/kernel/vdso/built-in.a
AR arch/riscv/kernel/built-in.a
AR arch/riscv/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
The reason is "Any target that utilizes if_changed must be listed in
$(targets), otherwise the command line check will fail, and the target
will always be built" as explained by Documentation/kbuild/makefiles.rst
Fix this build bug by adding vdso-syms.S to $(targets)
At the same time, there are two trivial clean up modifications:
- the vdso-dummy.o is not needed any more after so remove it.
- vdso.lds is a generated file, so it should be prefixed with
$(obj)/ instead of $(src)/
Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jisheng Zhang <jszhang(a)kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index ca2b40dfd24b..24d936c147cd 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -23,7 +23,7 @@ ifneq ($(c-gettimeofday-y),)
endif
# Build rules
-targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-dummy.o
+targets := $(obj-vdso) vdso.so vdso.so.dbg vdso.lds vdso-syms.S
obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
obj-y += vdso.o vdso-syms.o
@@ -41,7 +41,7 @@ KASAN_SANITIZE := n
$(obj)/vdso.o: $(obj)/vdso.so
# link rule for the .so file, .lds has to be first
-$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
+$(obj)/vdso.so.dbg: $(obj)/vdso.lds $(obj-vdso) FORCE
$(call if_changed,vdsold)
LDFLAGS_vdso.so.dbg = -shared -s -soname=linux-vdso.so.1 \
--build-id=sha1 --hash-style=both --eh-frame-hdr