Hi All,
We (Fedora) have been receiving reports from multiple users about gfx issues / glitches
starting with 5.10.9. All reporters are users of Ivy Bridge / Haswell iGPUs and all
reporters report that adding i915.mitigations=off to the cmdline fixes things, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1925346
Which should be fully visible without a bugzilla account.
I noticed that 5.10.13 had one more related i915 patch, so I've asked the reporters
to retest with 5.10.13. That kernel is better, but things are not fixed there; it just
takes longer for the problems to show up.
Greg, I can prepare a Fedora test-kernel build for the reporters to test with
the following 3 commits reverted:
520d05a77b2866eb ("drm/i915/gt: Clear CACHE_MODE prior to clearing residuals")
ecca0c675bdecebd ("drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail")
48b8c6689efa7cd6 ("drm/i915/gt: Limit VFE threads based on GT")
(Note these are the 5.10.y hashes)
Reverting these 3 is not ideal, but it is probably the fastest way to get
this resolved for the 5.10.y series.
Greg, do you want me to have the reporters test a 5.10.y series kernel
with these 3 reverts?
Regards,
Hans
On Thu 2021-02-11 18:37:52, John Ogness wrote:
> If message sizes average larger than expected (more than 32
> characters), the data_ring will wrap before the desc_ring. Once the
> data_ring wraps, it will start invalidating descriptors. These
> invalid descriptors hang around until they are eventually recycled
> when the desc_ring wraps. Readers do not care about invalid
> descriptors, but they still need to iterate past them. If the
> average message size is much larger than 32 characters, then there
> will be many invalid descriptors preceding the valid descriptors.
>
> The function prb_first_valid_seq() always begins at the oldest
> descriptor and searches for the first valid descriptor. This can
> be rather expensive for the above scenario. And, in fact, because
> of its heavy usage in /dev/kmsg, there have been reports of long
> delays and even RCU stalls.
>
> For code that does not need to search from the oldest record,
> replace prb_first_valid_seq() usage with prb_read_valid_*()
> functions, which provide a start sequence number to search from.
>
> Fixes: 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer")
> Reported-by: kernel test robot <oliver.sang(a)intel.com>
> Reported-by: J. Avila <elavila(a)google.com>
> Signed-off-by: John Ogness <john.ogness(a)linutronix.de>
Could you please push this fix into the stable releases
based on 5.10 and 5.11?
The patch fixes a visible performance regression. It has
landed in mainline as commit
13791c80b0cdf54d92fc542 ("printk: avoid prb_first_valid_seq() where
possible").
It should apply cleanly.
Best Regards,
Petr
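To illustrate the cost John describes in the quoted printk patch, here is a minimal sketch in C. It is a toy model, not the kernel's printk_ringbuffer: the struct and the function toy_first_valid_from() are hypothetical names made up for this example. It only shows why a search that starts at the oldest descriptor has to step over every invalid descriptor, while a search that starts at a known sequence number does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's printk_ringbuffer layout or API. */
struct toy_ring {
	const bool *valid;	/* valid[i]: record (oldest + i) is readable */
	size_t count;		/* descriptors between oldest and newest */
};

/*
 * Return the index of the first valid record at or after 'start',
 * or ring->count if there is none. '*steps' reports how many
 * descriptors had to be visited to find it.
 */
static size_t toy_first_valid_from(const struct toy_ring *ring, size_t start,
				   size_t *steps)
{
	size_t i;

	assert(start <= ring->count);
	*steps = 0;
	for (i = start; i < ring->count; i++) {
		(*steps)++;
		if (ring->valid[i])
			return i;
	}
	return ring->count;
}
```

Calling it with start = 0 models a search from the oldest descriptor; calling it with the sequence number a reader already holds models the start-sequence-based lookup that the patch prefers.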
This is a backport of 3642eb21256a ("powerpc/32: Preserve cr1 in
exception prolog stack check to fix build error") for kernel 5.10.
It fixes the build failure on v5.10 reported by kernel test robot
and by David Michael.
This fix is not in the mainline Linux tree yet; it is in the next branch of the powerpc tree.
(cherry picked from commit 3642eb21256a317ac14e9ed560242c6d20cf06d9)
THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1
Maximum PAGE_SHIFT is 18 for 256k pages so
THREAD_ALIGN_SHIFT is 19 at the maximum.
No need to clobber cr1, it can be preserved when moving r1
into CR when we check stack overflow.
This reduces the number of instructions in Machine Check Exception
prolog and fixes a build failure reported by the kernel test robot
on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That
build failure is due to too many instructions in the prolog hence
not fitting between 0x200 and 0x300. Although the problem doesn't
show up in mainline, it is still worth the change.
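The arithmetic behind the 0x7f -> 0x3f mask change can be checked with a small C sketch. It relies on the usual PowerPC conventions: CR bits are numbered 0..31 from the most significant bit, each CR field is 4 bits wide, and the mtcrf mask bit 0x80 selects cr0, 0x40 selects cr1, and so on down to 0x01 for cr7. The helper names are hypothetical, and the PAGE_SHIFT range of 12 to 18 is an assumption (the message above only states the maximum).

```c
#include <assert.h>
#include <stdbool.h>

/* CR bits are numbered 0..31 from the MSB; each CR field holds 4 bits. */
static unsigned int cr_field_of_bit(unsigned int cr_bit)
{
	assert(cr_bit < 32);
	return cr_bit / 4;
}

/* mtcrf mask: 0x80 selects cr0, 0x40 selects cr1, ..., 0x01 selects cr7. */
static bool mtcrf_mask_writes_field(unsigned int mask, unsigned int field)
{
	assert(field < 8);
	return (mask & (0x80u >> field)) != 0;
}

/* CR bit tested by "bt 32 - THREAD_ALIGN_SHIFT" for a given PAGE_SHIFT. */
static unsigned int tested_cr_bit(unsigned int page_shift)
{
	unsigned int thread_align_shift = page_shift + 1;

	return 32 - thread_align_shift;
}

/* True if 'mask' loads the tested CR bit for every PAGE_SHIFT in range. */
static bool mask_is_sufficient(unsigned int mask, unsigned int min_page_shift,
			       unsigned int max_page_shift)
{
	unsigned int ps;

	for (ps = min_page_shift; ps <= max_page_shift; ps++) {
		unsigned int field = cr_field_of_bit(tested_cr_bit(ps));

		if (!mtcrf_mask_writes_field(mask, field))
			return false;
	}
	return true;
}
```

With PAGE_SHIFT = 18 the tested bit is CR bit 13, which lives in cr3; with PAGE_SHIFT = 12 it is CR bit 19, in cr4. Both fields are covered by 0x3f, so dropping the 0x40 bit leaves cr1 untouched without affecting the stack check.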
Fixes: 98bf2d3f4970 ("powerpc/32s: Fix RTAS machine check with VMAP stack")
Cc: stable(a)vger.kernel.org
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.16127686…
---
arch/powerpc/kernel/head_32.h | 2 +-
arch/powerpc/kernel/head_book3s_32.S | 6 ------
2 files changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index c88e66adecb5..fef0b34a77c9 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -56,7 +56,7 @@
1:
tophys_novmstack r11, r11
#ifdef CONFIG_VMAP_STACK
- mtcrf 0x7f, r1
+ mtcrf 0x3f, r1
bt 32 - THREAD_ALIGN_SHIFT, stack_overflow
#endif
.endm
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index d66da35f2e8d..2729d8fa6e77 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -280,12 +280,6 @@ MachineCheck:
7: EXCEPTION_PROLOG_2
addi r3,r1,STACK_FRAME_OVERHEAD
#ifdef CONFIG_PPC_CHRP
-#ifdef CONFIG_VMAP_STACK
- mfspr r4, SPRN_SPRG_THREAD
- tovirt(r4, r4)
- lwz r4, RTAS_SP(r4)
- cmpwi cr1, r4, 0
-#endif
beq cr1, machine_check_tramp
twi 31, 0, 0
#else
--
2.25.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7018c897c2f243d4b5f1b94bc6b4831a7eab80fb Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Mon, 1 Feb 2021 16:20:40 -0800
Subject: [PATCH] libnvdimm/dimm: Avoid race between probe and
available_slots_show()
Richard reports that the following test:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
...fails with a crash signature like:
divide error: 0000 [#1] SMP KASAN PTI
RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[..]
Call Trace:
available_slots_show+0x4e/0x120 [libnvdimm]
dev_attr_show+0x42/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x218/0x410
The root cause is that available_slots_show() consults driver-data, but
fails to synchronize against device-unbind, setting up a TOCTOU race to
access uninitialized memory.
Validate driver-data under the device-lock.
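As a rough userspace analogy of "validate driver-data under the device-lock", the pattern might look like the sketch below. It uses a pthread mutex in place of the kernel's device lock; struct toy_dev, toy_show() and toy_unbind() are hypothetical names for illustration and are not libnvdimm APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a device with driver data; NOT a kernel structure. */
struct toy_dev {
	pthread_mutex_t lock;	/* plays the role of the device lock */
	const int *drvdata;	/* NULL while no driver is bound */
};

/* Reader: check and use drvdata inside one critical section. */
static int toy_show(struct toy_dev *dev, int *out)
{
	int rc = -ENXIO;

	assert(dev && out);
	pthread_mutex_lock(&dev->lock);
	if (dev->drvdata) {
		*out = *dev->drvdata;
		rc = 0;
	}
	pthread_mutex_unlock(&dev->lock);
	return rc;
}

/* Unbind: clear drvdata under the same lock the reader takes. */
static void toy_unbind(struct toy_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->drvdata = NULL;
	pthread_mutex_unlock(&dev->lock);
}
```

Because the NULL check and the dereference happen while holding the same lock that unbind takes, there is no window in which the reader can observe driver data that is being torn down.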
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: <stable(a)vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Coly Li <colyli(a)suse.com>
Reported-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Acked-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index b59032e0859b..9d208570d059 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -335,16 +335,16 @@ static ssize_t state_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(state);
-static ssize_t available_slots_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t __available_slots_show(struct nvdimm_drvdata *ndd, char *buf)
{
- struct nvdimm_drvdata *ndd = dev_get_drvdata(dev);
+ struct device *dev;
ssize_t rc;
u32 nfree;
if (!ndd)
return -ENXIO;
+ dev = ndd->dev;
nvdimm_bus_lock(dev);
nfree = nd_label_nfree(ndd);
if (nfree - 1 > nfree) {
@@ -356,6 +356,18 @@ static ssize_t available_slots_show(struct device *dev,
nvdimm_bus_unlock(dev);
return rc;
}
+
+static ssize_t available_slots_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t rc;
+
+ nd_device_lock(dev);
+ rc = __available_slots_show(dev_get_drvdata(dev), buf);
+ nd_device_unlock(dev);
+
+ return rc;
+}
static DEVICE_ATTR_RO(available_slots);
__weak ssize_t security_show(struct device *dev,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9917f0e3cdba7b9f1a23f70e3f70b1a106be54a8 Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Mon, 1 Feb 2021 21:47:20 +0900
Subject: [PATCH] usb: renesas_usbhs: Clear pipe running flag in
usbhs_pkt_pop()
Should clear the pipe running flag in usbhs_pkt_pop(). Otherwise,
we cannot use this pipe after dequeue was called while the pipe was
running.
Fixes: 8355b2b3082d ("usb: renesas_usbhs: fix the behavior of some usbhs_pkt_handle")
Reported-by: Tho Vu <tho.vu.wh(a)renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/1612183640-8898-1-git-send-email-yoshihiro.shimod…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/renesas_usbhs/fifo.c b/drivers/usb/renesas_usbhs/fifo.c
index ac9a81ae8216..e6fa13701808 100644
--- a/drivers/usb/renesas_usbhs/fifo.c
+++ b/drivers/usb/renesas_usbhs/fifo.c
@@ -126,6 +126,7 @@ struct usbhs_pkt *usbhs_pkt_pop(struct usbhs_pipe *pipe, struct usbhs_pkt *pkt)
}
usbhs_pipe_clear_without_sequence(pipe, 0, 0);
+ usbhs_pipe_running(pipe, 0);
__usbhsf_pkt_del(pkt);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7ffddd499ba6122b1a07828f023d1d67629aa017 Mon Sep 17 00:00:00 2001
From: Muchun Song <songmuchun(a)bytedance.com>
Date: Thu, 4 Feb 2021 18:32:06 -0800
Subject: [PATCH] mm: hugetlb: fix a race between freeing and dissolving the
page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race window is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list when it is
dissolved.
As a result __free_huge_page would corrupt page(s) already in the buddy
allocator.
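The idea of the fix can be modelled with a small single-threaded C sketch: a page carries a "freed" marker that is set only once it has been placed on the free list, and the dissolve path backs off while the marker is missing. This is a toy state machine under stated assumptions, with no locking and hypothetical names (toy_page, toy_dissolve() and friends); it is not the hugetlb code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page state; NOT the kernel's struct page. */
struct toy_page {
	int refcount;
	bool on_free_list;	/* models the PageHugeFreed marker */
	bool freed_to_buddy;
};

enum toy_result { TOY_DISSOLVED, TOY_RETRY, TOY_BUSY };

/* First half of freeing: the last reference is dropped. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Second half of freeing: the page is enqueued on the free list. */
static void toy_enqueue(struct toy_page *p)
{
	/* Enqueueing a page already handed to the buddy would corrupt it. */
	assert(!p->freed_to_buddy);
	p->on_free_list = true;
}

/* Dissolve only pages that have completed both halves of freeing. */
static enum toy_result toy_dissolve(struct toy_page *p)
{
	if (p->refcount != 0)
		return TOY_BUSY;
	if (!p->on_free_list)
		return TOY_RETRY;	/* free still in flight, try again */
	p->on_free_list = false;
	p->freed_to_buddy = true;
	return TOY_DISSOLVED;
}
```

A dissolve attempt that lands between toy_put_page() and toy_enqueue() gets TOY_RETRY instead of releasing the page, which corresponds to the window shown in the CPU0/CPU1 diagram above.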
Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f0e242d38ca..c6ee3c28a04e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+retry:
/* Not to disrupt normal path by vainly holding hugetlb_lock */
if (!PageHuge(page))
return 0;
@@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page *page)
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head))) {
+ spin_unlock(&hugetlb_lock);
+ cond_resched();
+
+ /*
+ * Theoretically, we should return -EBUSY when we
+ * encounter this race. In fact, we have a chance
+ * to successfully dissolve the page if we do a
+ * retry. Because the race window is quite small.
+ * If we seize this opportunity, it is an optimization
+ * for increasing the success rate of dissolving page.
+ */
+ goto retry;
+ }
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
Hi,
There is a regression in Linux 5.10.9 that does not happen in 5.10.8. It is still there as
of 5.11.1.
This regression consists of graphics artifacts that will *only* start appearing after resuming
from suspend. They don't happen immediately after resuming from suspend either, but
after some minutes.
My system has integrated Intel graphics:
00:02.0 VGA compatible controller: Intel Corporation 4th Generation Core Processor Family Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
CPU: Intel(R) Core(TM) i3-4170T CPU @ 3.20GHz
For reference, this is the list of i915 commits that went into 5.10.9.
commit ecca0c675bdecebdeb2f2eb76fb33520c441dacf
Author: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon Jan 11 22:52:19 2021 +0000
drm/i915/gt: Restore clear-residual mitigations for Ivybridge, Baytrail
commit 09aa9e45863e9e25dfbf350bae89fc3c2964482c upstream.
commit de3f572607c29f7fdd1bfd754646d08e32db0249
Author: Imre Deak <imre.deak(a)intel.com>
Date: Wed Dec 9 17:39:52 2020 +0200
drm/i915/icl: Fix initing the DSI DSC power refcount during HW readout
commit 2af5268180410b874fc06be91a1b2fbb22b1be0c upstream.
commit 54c9246a47fa8559c3ec6da2048e976a4b8750f6
Author: Hans de Goede <hdegoede(a)redhat.com>
Date: Wed Nov 18 13:40:58 2020 +0100
drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence
commit 00cb645fd7e29bdd20967cd20fa8f77bcdf422f9 upstream.
commit 0a34addcdbd9e03e3f3d09bcd5a1719d90b2d637
Author: Jani Nikula <jani.nikula(a)intel.com>
Date: Fri Jan 8 17:28:41 2021 +0200
drm/i915/backlight: fix CPU mode backlight takeover on LPT
commit bb83d5fb550bb7db75b29e6342417fda2bbb691c upstream.
commit 48b8c6689efa7cd65a72f620940a4f234b944b73
Author: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon Jan 11 22:52:18 2021 +0000
drm/i915/gt: Limit VFE threads based on GT
commit ffaf97899c4a58b9fefb11534f730785443611a8 upstream.
commit 481e27f050732b8c680f26287dd44967fddf9a79
Author: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Mon Jan 11 22:52:20 2021 +0000
drm/i915: Allow the sysadmin to override security mitigations
commit 984cadea032b103c5824a5f29d0a36b3e9df6333 upstream.
Regards