From: Kemeng Shi <shikemeng(a)huaweicloud.com>
[ Upstream commit d92109891f21cf367caa2cc6dff11a4411d917f4 ]
For case there is no more inodes for IO in io list from last wb_writeback,
We may bail out early even there is inode in dirty list should be written
back. Only bail out when we queued once to avoid missing dirtied inode.
This is from code reading...
Signed-off-by: Kemeng Shi <shikemeng(a)huaweicloud.com>
Link: https://lore.kernel.org/r/20240228091958.288260-3-shikemeng@huaweicloud.com
Reviewed-by: Jan Kara <jack(a)suse.cz>
[brauner(a)kernel.org: fold in memory corruption fix from Jan in [1]]
Link: https://lore.kernel.org/r/20240405132346.bid7gibby3lxxhez@quack3 [1]
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/fs-writeback.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e4f17c53ddfcf..d31853032a931 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2069,6 +2069,7 @@ static long wb_writeback(struct bdi_writeback *wb,
struct inode *inode;
long progress;
struct blk_plug plug;
+ bool queued = false;
blk_start_plug(&plug);
for (;;) {
@@ -2111,8 +2112,10 @@ static long wb_writeback(struct bdi_writeback *wb,
dirtied_before = jiffies;
trace_writeback_start(wb, work);
- if (list_empty(&wb->b_io))
+ if (list_empty(&wb->b_io)) {
queue_io(wb, work, dirtied_before);
+ queued = true;
+ }
if (work->sb)
progress = writeback_sb_inodes(work->sb, wb, work);
else
@@ -2127,7 +2130,7 @@ static long wb_writeback(struct bdi_writeback *wb,
* mean the overall work is done. So we keep looping as long
* as made some progress on cleaning pages or inodes.
*/
- if (progress) {
+ if (progress || !queued) {
spin_unlock(&wb->list_lock);
continue;
}
--
2.43.0
These commits reference use.after.free between v6.9 and v6.10-rc1
These commits are not, yet, in stable/linux-rolling-stable.
Let me know if you would rather me compare to a different repo/branch.
The list has been manually pruned to only contain commits that look like
actual issues.
If they contain a Fixes line it has been verified that at least one of the
commits that the Fixes tag(s) reference is in stable/linux-rolling-stable
90e823498881fb8a91d8
5c9c5d7f26acc2c669c1
573601521277119f2e2b
f88da7fbf665ffdcbf5b
47a92dfbe01f41bcbf35
5bc9de065b8bb9b8dd87
5f204051d998ec3d7306
be84f32bb2c981ca6709
88ce0106a1f603bf360c
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com
Greetings fellas,
I have encountered a critical bug in the Linux vanilla kernel that
leads to a kernel panic during the shutdown or reboot process. The
issue arises after all services, including `journald`, have been
stopped. As a result, the machine fails to complete the shutdown or
reboot procedure, effectively causing the system to hang and not shut
down or reboot.
Here are the details of the issue:
- Affected Versions: Before kernel version 6.8.10, the bug caused a
quick display of a kernel trace dump before the shutdown/reboot
completed. Starting from version 6.8.10 and continuing into version
6.9.0 and 6.9.1, this issue has escalated to a kernel panic,
preventing the shutdown or reboot from completing and leaving the
machine stuck.
- Symptoms:
- In normal shutdown/reboot scenarios, the kernel trace dump briefly
appears as the last message on the screen.
- In rescue mode, the kernel panic message is displayed. Normally it
is not shown.
Since `journald` is stopped before this issue occurs, no textual logs
are available. However, I have captured two pictures illustrating
these related issues, which I am attaching to this email for your
reference. Also added my custom kernel config.
Thank you for your attention to this matter. Please let me know if any
additional information is required to assist in diagnosing and
resolving this bug.
Best regards,
Ilkka Naulapää
There appears to be a possible use after free with vdec_close().
The firmware will add buffer release work to the work queue through
HFI callbacks as a normal part of decoding. Randomly closing the
decoder device from userspace during normal decoding can incur
a read after free for inst.
Fix it by cancelling the work in vdec_close.
Cc: stable(a)vger.kernel.org
Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions")
Signed-off-by: Dikshita Agarwal <quic_dikshita(a)quicinc.com>
---
Changes since v3:
- Fixed style issue with fixes tag
Changes since v2:
- Fixed email id
Changes since v1:
- Added fixes and stable tags
drivers/media/platform/qcom/venus/vdec.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
index 29130a9..56f8a25 100644
--- a/drivers/media/platform/qcom/venus/vdec.c
+++ b/drivers/media/platform/qcom/venus/vdec.c
@@ -1747,6 +1747,7 @@ static int vdec_close(struct file *file)
vdec_pm_get(inst);
+ cancel_work_sync(&inst->delayed_process_work);
v4l2_m2m_ctx_release(inst->m2m_ctx);
v4l2_m2m_release(inst->m2m_dev);
vdec_ctrl_deinit(inst);
--
2.7.4
Am 23.05.24 um 15:13 schrieb Barry Kauler:
> On Wed, May 22, 2024 at 12:58 AM Armin Wolf <W_Armin(a)gmx.de> wrote:
>> Am 20.05.24 um 18:22 schrieb Alex Deucher:
>>
>>> On Sat, May 18, 2024 at 8:17 PM Armin Wolf <W_Armin(a)gmx.de> wrote:
>>>> Am 17.05.24 um 03:30 schrieb Barry Kauler:
>>>>
>>>>> Armin, Yifan, Prike,
>>>>> I will top-post, so you don't have to scroll down.
>>>>> After identifying the commit that causes black screen with my gpu, I
>>>>> posted the result to you guys, on May 9.
>>>>> It is now May 17 and no reply.
>>>>> OK, I have now created a patch that reverts Yifan's commit, compiled
>>>>> 5.15.158, and my gpu now works.
>>>>> Note, the radeon module is not loaded, so it is not a factor.
>>>>> I'm not a kernel developer. I have identified the culprit and it is up
>>>>> to you guys to fix it, Yifan especially, as you are the person who has
>>>>> created the regression.
>>>>> I will attach my patch.
>>>>> Regards,
>>>>> Barry Kauler
>>>> Hi,
>>>>
>>>> sorry for not responding to your findings. I normally do not work with GPU drivers,
>>>> so i hoped one of the amdgpu developers would handle this.
>>>>
>>>> I CCeddri-devel(a)lists.freedesktop.org and amd-gfx(a)lists.freedesktop.org so that other
>>>> amdgpu developers hear from this issue.
>>>>
>>>> Thanks you for you persistence in finding the offending commit.
>>> Likely this patch should not have been ported to 5.15 in the first
>>> place. The IOMMU requirements have been dropped from the driver for
>>> the last few kernel versions so it is no longer relevant on newer
>>> kernels.
>>>
>>> Alex
>> Barry, can you verify that the latest upstream kernel works on you device?
>> If yes, then the commit itself is ok and just the backporting itself was wrong.
>>
>> Thanks,
>> Armin Wolf
> Armin,
> The unmodified 6.8.1 kernel works ok.
> I presume that patch was applied long before 6.8.1 got released and
> only got backported to 5.15.x recently.
>
> Regards,
> Barry
>
Great to hear, that means we only have to revert commit 56b522f46681 ("drm/amdgpu: init iommu after amdkfd device init")
from the 5.15.y series.
I CCed the stable mailing list so that they can revert the offending commit.
Thanks,
Armin Wolf
>>>> Armin Wolf
>>>>
>>>>> On Thu, May 9, 2024 at 4:08 PM Barry Kauler <bkauler(a)gmail.com> wrote:
>>>>>> On Fri, May 3, 2024 at 9:03 PM Armin Wolf <W_Armin(a)gmx.de> wrote:
>>>>>>>> ...
>>>>>>>> # lspci | grep VGA
>>>>>>>> 05:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>>>>>>>> [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile
>>>>>>>> Series] (rev c2)
>>>>>>>> 05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc.
>>>>>>>> [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
>>>>>>>>
>>>>>>>> # lspci -n -k
>>>>>>>> ...
>>>>>>>> 05:00.0 0300: 1002:15d8 (rev c2)
>>>>>>>> Subsystem: 1025:1456
>>>>>>>> Kernel driver in use: amdgpu
>>>>>>>> Kernel modules: amdgpu
>>>>>>>> ...
>>>>>>> thanks for informing us of this regression. Since there are four commits affecting
>>>>>>> amdgpu in 5.15.150, i suggest that you use "git bisect" to find the faulty commits,
>>>>>>> see https://docs.kernel.org/admin-guide/bug-bisect.html for details.
>>>>>>>
>>>>>>> I think you can speed up the bisecting process by limiting yourself to the AMD DRM
>>>>>>> driver directory with "git bisect start -- drivers/gpu/drm/amd", take a look at the
>>>>>>> man page of "git bisect" for details.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Armin Wolf
>>>>>> Armin,
>>>>>> Thanks for the advice. I am unfamiliar with git on the commandline.
>>>>>> Previously only used SmartGit gui.
>>>>>> EasyOS requires aufs patch, and for a few days tried to figure out how
>>>>>> to use that with git bisect, then gave up. Changed to testing with my
>>>>>> "QV" distro, which is more conventional, doesn't need any kernel
>>>>>> patches. Managed to get it down to one commit. Here are the steps I
>>>>>> followed:
>>>>>>
>>>>>> # git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>>>>>> # cd linux-stable
>>>>>> # git tag -l | grep '5\.15\.150'
>>>>>> v5.15.150
>>>>>> # git checkout -b my5.15.150 v5.15.150
>>>>>> Updating files: 100% (65776/65776), done.
>>>>>> Switched to a new branch 'my5.15.150'
>>>>>>
>>>>>> Copied in my .config then...
>>>>>>
>>>>>> # make menuconfig
>>>>>> # git bisect start -- drivers/gpu/drm/amd
>>>>>> # git bisect bad
>>>>>> # git bisect good v5.15.149
>>>>>> Bisecting: 1 revision left to test after this (roughly 1 step)
>>>>>> [b9a61ee2bb2704e42516e3da962f99dfa98f3b20] drm/amdgpu: reset gpu for
>>>>>> s3 suspend abort case
>>>>>> # make
>>>>>> # rm -rf /boot2
>>>>>> # mkdir -p /boot2/lib/modules
>>>>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>>>>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>>>>>> # sync
>>>>>> ...QV on Acer laptop, with amdgpu, works!
>>>>>> # git bisect good
>>>>>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>>>>>> [56b522f4668167096a50c39446d6263c96219f5f] drm/amdgpu: init iommu
>>>>>> after amdkfd device init
>>>>>> # make
>>>>>> # mkdir -p /boot2/lib/modules
>>>>>> # make INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=/boot2 modules_install
>>>>>> # cp arch/x86/boot/bzImage /boot2/vmlinuz
>>>>>> # sync
>>>>>> ...QV on Acer laptop, black screen!
>>>>>>
>>>>>> # git bisect bad
>>>>>> 56b522f4668167096a50c39446d6263c96219f5f is the first bad commit
>>>>>> commit 56b522f4668167096a50c39446d6263c96219f5f
>>>>>> Author: Yifan Zhang <yifan1.zhang(a)amd.com>
>>>>>> Date: Tue Sep 28 15:42:35 2021 +0800
>>>>>>
>>>>>> drm/amdgpu: init iommu after amdkfd device init
>>>>>>
>>>>>> [ Upstream commit 286826d7d976e7646b09149d9bc2899d74ff962b ]
>>>>>>
>>>>>> This patch is to fix clinfo failure in Raven/Picasso:
>>>>>>
>>>>>> Number of platforms: 1
>>>>>> Platform Profile: FULL_PROFILE
>>>>>> Platform Version: OpenCL 2.2 AMD-APP (3364.0)
>>>>>> Platform Name: AMD Accelerated Parallel Processing
>>>>>> Platform Vendor: Advanced Micro Devices, Inc.
>>>>>> Platform Extensions: cl_khr_icd cl_amd_event_callback
>>>>>>
>>>>>> Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
>>>>>>
>>>>>> Signed-off-by: Yifan Zhang <yifan1.zhang(a)amd.com>
>>>>>> Reviewed-by: James Zhu <James.Zhu(a)amd.com>
>>>>>> Tested-by: James Zhu <James.Zhu(a)amd.com>
>>>>>> Acked-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
>>>>>> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
>>>>>> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>>>>>>
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
>>>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> Anything else I should do, to identify what in this commit is the
>>>>>> likely culprit?
>>>>>> Regards,
>>>>>> Barry Kauler
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a63c357b9fd56ad5fe64616f5b22835252c6a76a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012226-unmanned-marshy-5819@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a63c357b9fd5 ("iommu/dma: Trace bounce buffer usage when mapping buffers")
f316ba0a8814 ("dma-iommu: Check that swiotlb is active before trying to use it")
a17e3026bc4d ("iommu: Move flush queue data into iommu_dma_cookie")
f7f07484542f ("iommu/iova: Move flush queue code to iommu-dma")
ea4d71bb5e3f ("iommu/iova: Consolidate flush queue code")
87f60cc65d24 ("iommu/vt-d: Use put_pages_list")
649ad9835a37 ("iommu/iova: Squash flush_cb abstraction")
d5c383f2c98a ("iommu/iova: Squash entry_dtor abstraction")
d7061627d701 ("iommu/iova: Fix race between FQ timeout and teardown")
2e727bffbe93 ("iommu/dma: Check CONFIG_SWIOTLB more broadly")
9b49bbc2c4df ("iommu/dma: Fold _swiotlb helpers into callers")
ee9d4097cc14 ("iommu/dma: Skip extra sync during unmap w/swiotlb")
06e620345d54 ("iommu/dma: Fix arch_sync_dma for map")
08ae5d4a1ae9 ("iommu/dma: Fix sync_sg with swiotlb")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a63c357b9fd56ad5fe64616f5b22835252c6a76a Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacmanjarres(a)google.com>
Date: Fri, 8 Dec 2023 15:41:40 -0800
Subject: [PATCH] iommu/dma: Trace bounce buffer usage when mapping buffers
When commit 82612d66d51d ("iommu: Allow the dma-iommu api to
use bounce buffers") was introduced, it did not add the logic
for tracing the bounce buffer usage from iommu_dma_map_page().
All of the users of swiotlb_tbl_map_single() trace their bounce
buffer usage, except iommu_dma_map_page(). This makes it difficult
to track SWIOTLB usage from that function. Thus, trace bounce buffer
usage from iommu_dma_map_page().
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Cc: stable(a)vger.kernel.org # v5.15+
Cc: Tom Murphy <murphyt7(a)tcd.ie>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: Saravana Kannan <saravanak(a)google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 85163a83df2f..037fcf826407 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -29,6 +29,7 @@
#include <linux/spinlock.h>
#include <linux/swiotlb.h>
#include <linux/vmalloc.h>
+#include <trace/events/swiotlb.h>
#include "dma-iommu.h"
@@ -1156,6 +1157,8 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;
}
+ trace_swiotlb_bounced(dev, phys, size);
+
aligned_size = iova_align(iovad, size);
phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
iova_mask(iovad), dir, attrs);