The patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
I forgot to CC stable on this fix.
Chris
On Thu, Sep 5, 2024 at 1:08 AM Chris Li <chrisl(a)kernel.org> wrote:
>
> I found a regression on mm-unstable during my swap stress test,
> using tmpfs to compile linux. The test OOM very soon after
> the make spawns many cc processes.
>
> It bisects down to this change: 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9
> (mm/gup: clear the LRU flag of a page before adding to LRU batch)
>
> Yu Zhao propose the fix: "I think this is one of the potential side
> effects -- Huge mentioned earlier about isolate_lru_folios():"
>
> I test that with it the swap stress test no longer OOM.
>
> Link: https://lore.kernel.org/r/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoX…
> Fixes: 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch")
> Suggested-by: Yu Zhao <yuzhao(a)google.com>
> Suggested-by: Hugh Dickins <hughd(a)google.com>
> Tested-by: Chris Li <chrisl(a)kernel.org>
> Closes: https://lore.kernel.org/all/CAF8kJuNP5iTj2p07QgHSGOJsiUfYpJ2f4R1Q5-3BN9JiD9…
> Signed-off-by: Chris Li <chrisl(a)kernel.org>
> ---
> Changes in v2:
> - Add Closes tag suggested by Yu and Thorsten.
> - Link to v1: https://lore.kernel.org/r/20240904-lru-flag-v1-1-36638d6a524c@kernel.org
> ---
> mm/vmscan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a9b6a8196f95..96abf4a52382 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4323,7 +4323,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> }
>
> /* ineligible */
> - if (zone > sc->reclaim_idx) {
> + if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
> gen = folio_inc_gen(lruvec, folio, false);
> list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
> return true;
>
> ---
> base-commit: 756ca36d643324d028b325a170e73e392b9590cd
> change-id: 20240904-lru-flag-2af2f955740e
>
> Best regards,
> --
> Chris Li <chrisl(a)kernel.org>
>
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
I don't know that we have any bugs open about this. Or how it would
manifest itself. Memory leak on driver unload? I just spotted this while
reading the code for other reasons.
---
drivers/gpu/drm/i915/display/intel_hdcp.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 2afa92321b08..cad309602617 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1097,7 +1097,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2531,7 +2532,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2548,7 +2550,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
--
2.39.2
I was benchmarking some compressors, piping to and from a network share on a NAS, and some consistently wrote corrupted data.
First, apologies in advance:
* if I'm not in the right place. I tried to follow the directions from the Regressions guide - https://www.kernel.org/doc/html/latest/admin-guide/reporting-regressions.ht…
* I know there's a ton of context I don't know
* I’m trying a different mail app, because the first one looked concussed with plain text. This might be worse.
The detailed description:
I was benchmarking some compressors on Debian on a Raspberry Pi, piping to and from a network share on a NAS, and found that some consistently had issues writing to my NAS. Specifically:
* lzop
* pigz - parallel gzip
* pbzip2 - parallel bzip2
This is dependent on kernel version. I've done a survey, below.
While I tripped over the issue on a Debian port (Debian 12, bookworm, kernel v6.6), I compiled my own vanilla / mainline kernels for testing and reporting this.
Even more details:
The Pi and the Synology NAS are directly connected by Gigabit Ethernet. Both sides are using self-assigned IP addresses. I'll note that at boot, getting the Pi to see the NAS requires some nudging of avahi-autoipd; while I think it's stable before testing, I'm not positive, and reconnection issues might be in play.
The files in question are tars of sparse file systems, about 270 gig, compressing down to 10-30 gig.
Compression seems to work, without complaint; decompression crashes the process, usually within the first gig of the compressed file. The output of the stream doesn't match what ends up written to disk.
Trying decompression during compression gets further along than it does after compression finishes; this might point toward something with writes and caches.
A previous attempt involved rpi-update, which:
* good: let me install kernels without building myself
* bad: updated the bootloader and firmware, to bleeding edge, with possible regressions; it definitely muddied the results of my tests
I started over with a fresh install, and no results involving rpi-update are included in this email.
A survey of major branches:
* 5.15.167, LTS - good
* 6.1.109, LTS - good
* 6.2.16 - good
* 6.3.13 - bad
* 6.4.16 - bad
* 6.5.13 - bad
* 6.6.50, LTS - bad
* 6.7.12 - bad
* 6.8.12 - bad
* 6.9.12 - bad
* 6.10.9 - good
* 6.11.0 - good
I tried, but couldn't fully build 4.19.322 or 6.0.19, due to issues with modules.
Important commits:
It looked like both the breakage and the fix came in during rc1 releases.
Breakage, v6.3-rc1:
I manually bisected commits in fs/smb* and fs/cifs.
3d78fe73fa12 cifs: Build the RDMA SGE list directly from an iterator
> lzop and pigz worked. last working. test in progress: pbzip2
607aea3cc2a8 cifs: Remove unused code
> lzop didn't work. first broken
Fix, v6.10-rc1:
I manually bisected commits in fs/smb.
69c3c023af25 cifs: Implement netfslib hooks
> lzop didn't work. last broken one
3ee1a1fc3981 cifs: Cut over to using netfslib
> lzop, pigz, pbzip2, all worked. first fixed one
To test / reproduce:
It looks like this, on a mounted network share, with extra pv for progress meters:
cat 1tb-rust-ext4.img.tar.gz | \
gzip -d | \
lzop -1 > \
1tb-rust-ext4.img.tar.lzop
# wait 40 minutes
cat 1tb-rust-ext4.img.tar.lzop | \
lzop -d | \
sha1sum
# either it works, and shows the right checksum
# or it crashes early, due to a corrupt file, and shows an incorrect checksum
As I re-read this, I realize it might look like the compressor behaves differently. I added a "tee $output | sha1sum; sha1sum $output" and ran it on a broken version. The checksums from the pipe and for the file on disk are different.
Assorted info:
This is a Raspberry Pi 4, with 4 GiB RAM, running Debian 12, bookworm, or a port.
mount.cifs version: 7.0
# cat /proc/sys/kernel/tainted
1024
# cat /proc/version
Linux version 6.2.0-3d78fe73f-v8-pronoiac+ (pronoiac@bisect) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #21 SMP PREEMPT Thu Sep 19 16:51:22 PDT 2024
DebugData:
/proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.41
Features: DFS,FSCACHE,STATS2,DEBUG,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
CIFSMaxBufSize: 16384
Active VFS Requests: 1
Servers:
1) ConnectionId: 0x1 Hostname: drums.local
Number of credits: 8062 Dialect 0x300
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 2
In Send: 1 In MaxReq Wait: 0
Sessions:
1) Address: 169.254.132.219 Uses: 1 Capability: 0x300047 Session Status: 1
Security type: RawNTLMSSP SessionId: 0x4969841e
User: 1000 Cred User: 0
Shares:
0) IPC: \\drums.local\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
PathComponentMax: 0 Status: 1 type: 0 Serial Number: 0x0
Share Capabilities: None Share Flags: 0x0
tid: 0xeb093f0b Maximal Access: 0x1f00a9
1) \\drums.local\billions Mounts: 1 DevInfo: 0x20 Attributes: 0x5007f
PathComponentMax: 255 Status: 1 type: DISK Serial Number: 0x735a9af5
Share Capabilities: None Aligned, Partition Aligned, Share Flags: 0x0
tid: 0x5e6832e6 Optimal sector size: 0x200 Maximal Access: 0x1f01ff
MIDs:
State: 2 com: 9 pid: 3117 cbdata: 00000000e003293e mid 962892
State: 2 com: 9 pid: 3117 cbdata: 000000002610602a mid 962956
--
Let me know how I can help.
The process of iterating can take hours, and it's not automated, so my resources are limited.
#regzbot introduced: 607aea3cc2a8
#regzbot fix: 3ee1a1fc3981
-James
When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP,
so we need to make sure that we don't attempt to kfree() the
statically defined compressor name.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Signed-off-by: Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Cc: <stable(a)vger.kernel.org>
---
Changes in v2:
- removed comment from source code about freeing statically defined compression
- removed part of KASAN report from commit description
- added information about CONFIG_ZRAM_MULTI_COMP into commit description
Changes in v3:
- modified commit description based on Sergey's comment
- changed start for-loop to ZRAM_PRIMARY_COMP
drivers/block/zram/zram_drv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d245617083d..ad9c9bc3ccfc5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2115,8 +2115,10 @@ static void zram_destroy_comps(struct zram *zram)
zram->num_active_comps--;
}
- for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
+ /* Do not free statically defined compression algorithms */
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
--
2.45.2
From userspace, spawning a new process with, for example,
posix_spawn(), only allows the user to work with
the scheduling priority value defined by POSIX
in the sched_param struct.
However, sched_setparam() and similar syscalls lead to
__sched_setscheduler() which rejects any new value
for the priority other than 0 for non-RT schedule classes,
a behavior kept since Linux 2.6 or earlier.
Linux translates the usage of the sched_param struct
into it's own internal sched_attr struct during the syscall,
but the user has no way to manage the other values
within the sched_attr struct using only POSIX functions.
The only other way to adjust niceness while using posix_spawn()
would be to set the value after the process has started,
but this introduces the risk of the process being dead
before the next syscall can set the priority after the fact.
To resolve this, allow the use of the priority value
originally from the POSIX sched_param struct in order to
set the niceness value instead of rejecting the priority value.
Edit the sched_get_priority_*() POSIX syscalls
in order to reflect the range of values accepted.
Cc: stable(a)vger.kernel.org # Apply to kernel/sched/core.c
Signed-off-by: Michael Pratt <mcpratt(a)pm.me>
---
kernel/sched/syscalls.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index ae1b42775ef9..52c02b80f037 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -853,6 +853,19 @@ static int _sched_setscheduler(struct task_struct *p, int policy,
attr.sched_policy = policy;
}
+ if (attr.sched_priority > MAX_PRIO-1)
+ return -EINVAL;
+
+ /*
+ * If priority is set for SCHED_NORMAL or SCHED_BATCH,
+ * set the niceness instead, but only for user calls.
+ */
+ if (check && attr.sched_priority > MAX_RT_PRIO-1 &&
+ ((policy != SETPARAM_POLICY && fair_policy(policy)) || fair_policy(p->policy))) {
+ attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
+ attr.sched_priority = 0;
+ }
+
return __sched_setscheduler(p, &attr, check, true);
}
/**
@@ -1598,9 +1611,11 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
case SCHED_RR:
ret = MAX_RT_PRIO-1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_PRIO-1;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
ret = 0;
break;
@@ -1625,9 +1640,11 @@ SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
case SCHED_RR:
ret = 1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_RT_PRIO;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
ret = 0;
}
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
--
2.30.2