Hello,
The commit 338567d17627 ("drm/amd/display: Fix MST BW calculation
Regression") caused a regression with some MST displays due to a mistake.
For 6.11.y here is the series of commits that fixes it:
commit 4599aef0c97b ("drm/amd/display: Fix Synaptics Cascaded Panamera
DSC Determination")
commit ecc4038ec1de ("drm/amd/display: Add DSC Debug Log")
commit b2b4afb9cf07 ("drm/amdgpu/display: Fix a mistake in revert commit")
Can you please bring them back to 6.11.y?
Thanks!
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ea7e2d5e49c05e5db1922387b09ca74aa40f46e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100103-banish-batboy-00df@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
592b5fad1677 ("mm: Re-introduce vm_flags to do_mmap()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
675eaca1f441 ("mm/mmap: properly unaccount memory on mas_preallocate() failure")
6c28ca6485dd ("mmap: fix do_brk_flags() modifying obviously incorrect VMAs")
f5ad5083404b ("mm: do not BUG_ON missing brk mapping, because userspace can unmap it")
cc674ab3c018 ("mm/mmap: fix memory leak in mmap_region()")
120b116208a0 ("maple_tree: reorganize testing to restore module testing")
a57b70519d1f ("mm/mmap: fix MAP_FIXED address return on VMA merge")
5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails")
deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
28c5609fb236 ("mm/mmap: preallocate maple nodes for brk vma expansion")
763ecb035029 ("mm: remove the vma linked list")
8220543df148 ("nommu: remove uses of VMA linked list")
67e7c16764c3 ("mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()")
11f9a21ab655 ("mm/mmap: reorganize munmap to use maple states")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 Mon Sep 17 00:00:00 2001
From: Shu Han <ebpqwerty472123(a)gmail.com>
Date: Tue, 17 Sep 2024 17:41:04 +0800
Subject: [PATCH] mm: call the security_mmap_file() LSM hook in
remap_file_pages()
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..18fddcce03b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3198,8 +3198,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ea7e2d5e49c05e5db1922387b09ca74aa40f46e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100157-unpeeled-surprise-8138@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
592b5fad1677 ("mm: Re-introduce vm_flags to do_mmap()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
675eaca1f441 ("mm/mmap: properly unaccount memory on mas_preallocate() failure")
6c28ca6485dd ("mmap: fix do_brk_flags() modifying obviously incorrect VMAs")
f5ad5083404b ("mm: do not BUG_ON missing brk mapping, because userspace can unmap it")
cc674ab3c018 ("mm/mmap: fix memory leak in mmap_region()")
120b116208a0 ("maple_tree: reorganize testing to restore module testing")
a57b70519d1f ("mm/mmap: fix MAP_FIXED address return on VMA merge")
5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails")
deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
28c5609fb236 ("mm/mmap: preallocate maple nodes for brk vma expansion")
763ecb035029 ("mm: remove the vma linked list")
8220543df148 ("nommu: remove uses of VMA linked list")
67e7c16764c3 ("mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()")
11f9a21ab655 ("mm/mmap: reorganize munmap to use maple states")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 Mon Sep 17 00:00:00 2001
From: Shu Han <ebpqwerty472123(a)gmail.com>
Date: Tue, 17 Sep 2024 17:41:04 +0800
Subject: [PATCH] mm: call the security_mmap_file() LSM hook in
remap_file_pages()
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..18fddcce03b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3198,8 +3198,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ea7e2d5e49c05e5db1922387b09ca74aa40f46e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100151-evolution-apache-87b3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
592b5fad1677 ("mm: Re-introduce vm_flags to do_mmap()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
675eaca1f441 ("mm/mmap: properly unaccount memory on mas_preallocate() failure")
6c28ca6485dd ("mmap: fix do_brk_flags() modifying obviously incorrect VMAs")
f5ad5083404b ("mm: do not BUG_ON missing brk mapping, because userspace can unmap it")
cc674ab3c018 ("mm/mmap: fix memory leak in mmap_region()")
120b116208a0 ("maple_tree: reorganize testing to restore module testing")
a57b70519d1f ("mm/mmap: fix MAP_FIXED address return on VMA merge")
5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails")
deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
28c5609fb236 ("mm/mmap: preallocate maple nodes for brk vma expansion")
763ecb035029 ("mm: remove the vma linked list")
8220543df148 ("nommu: remove uses of VMA linked list")
67e7c16764c3 ("mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()")
11f9a21ab655 ("mm/mmap: reorganize munmap to use maple states")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 Mon Sep 17 00:00:00 2001
From: Shu Han <ebpqwerty472123(a)gmail.com>
Date: Tue, 17 Sep 2024 17:41:04 +0800
Subject: [PATCH] mm: call the security_mmap_file() LSM hook in
remap_file_pages()
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..18fddcce03b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3198,8 +3198,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ea7e2d5e49c05e5db1922387b09ca74aa40f46e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100145-gristle-unknotted-af80@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
592b5fad1677 ("mm: Re-introduce vm_flags to do_mmap()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
675eaca1f441 ("mm/mmap: properly unaccount memory on mas_preallocate() failure")
6c28ca6485dd ("mmap: fix do_brk_flags() modifying obviously incorrect VMAs")
f5ad5083404b ("mm: do not BUG_ON missing brk mapping, because userspace can unmap it")
cc674ab3c018 ("mm/mmap: fix memory leak in mmap_region()")
120b116208a0 ("maple_tree: reorganize testing to restore module testing")
a57b70519d1f ("mm/mmap: fix MAP_FIXED address return on VMA merge")
5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails")
deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
28c5609fb236 ("mm/mmap: preallocate maple nodes for brk vma expansion")
763ecb035029 ("mm: remove the vma linked list")
8220543df148 ("nommu: remove uses of VMA linked list")
67e7c16764c3 ("mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()")
11f9a21ab655 ("mm/mmap: reorganize munmap to use maple states")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 Mon Sep 17 00:00:00 2001
From: Shu Han <ebpqwerty472123(a)gmail.com>
Date: Tue, 17 Sep 2024 17:41:04 +0800
Subject: [PATCH] mm: call the security_mmap_file() LSM hook in
remap_file_pages()
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..18fddcce03b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3198,8 +3198,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ea7e2d5e49c05e5db1922387b09ca74aa40f46e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100139-mousiness-agreeable-1242@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ea7e2d5e49c0 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()")
592b5fad1677 ("mm: Re-introduce vm_flags to do_mmap()")
183654ce26a5 ("mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iterator")
0378c0a0e9e4 ("mm/mmap: remove preallocation from do_mas_align_munmap()")
92fed82047d7 ("mm/mmap: convert brk to use vma iterator")
baabcfc93d3b ("mm/mmap: fix typo in comment")
c5d5546ea065 ("maple_tree: remove the parameter entry of mas_preallocate")
675eaca1f441 ("mm/mmap: properly unaccount memory on mas_preallocate() failure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 Mon Sep 17 00:00:00 2001
From: Shu Han <ebpqwerty472123(a)gmail.com>
Date: Tue, 17 Sep 2024 17:41:04 +0800
Subject: [PATCH] mm: call the security_mmap_file() LSM hook in
remap_file_pages()
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot by security_mmap_file LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypass the same thing via
AIO and can be found in [1].
The PoC:
$ cat > test.c
int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/mm/mmap.c b/mm/mmap.c
index d0dfc85b209b..18fddcce03b8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3198,8 +3198,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, 0, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
Ensure we serialize with completion side to prevent UAF with fence going
out of scope on the stack, since we have no clue if it will fire after
the timeout before we can erase from the xa. Also we have some dependent
loads and stores for which we need the correct ordering, and we lack the
needed barriers. Fix this by grabbing the ct->lock after the wait, which
is also held by the completion side.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_guc_ct.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 4b95f75b1546..232eb69bd8e4 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -903,16 +903,26 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
}
ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
+
+ /*
+ * Ensure we serialize with completion side to prevent UAF with fence going out of scope on
+ * the stack, since we have no clue if it will fire after the timeout before we can erase
+ * from the xa. Also we have some dependent loads and stores below for which we need the
+ * correct ordering, and we lack the needed barriers.
+ */
+ mutex_lock(&ct->lock);
if (!ret) {
xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x",
g2h_fence.seqno, action[0]);
xa_erase_irq(&ct->fence_lookup, g2h_fence.seqno);
+ mutex_unlock(&ct->lock);
return -ETIME;
}
if (g2h_fence.retry) {
xe_gt_dbg(gt, "H2G action %#x retrying: reason %#x\n",
action[0], g2h_fence.reason);
+ mutex_unlock(&ct->lock);
goto retry;
}
if (g2h_fence.fail) {
@@ -921,7 +931,12 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
ret = -EIO;
}
- return ret > 0 ? response_buffer ? g2h_fence.response_len : g2h_fence.response_data : ret;
+ if (ret > 0)
+ ret = response_buffer ? g2h_fence.response_len : g2h_fence.response_data;
+
+ mutex_unlock(&ct->lock);
+
+ return ret;
}
/**
--
2.46.2
From: Gui-Dong Han <hanguidong02(a)outlook.com>
This patch addresses an issue with improper reference count handling in the
ice_sriov_set_msix_vec_count() function.
First, the function calls ice_get_vf_by_id(), which increments the
reference count of the vf pointer. If the subsequent call to
ice_get_vf_vsi() fails, the function currently returns an error without
decrementing the reference count of the vf pointer, leading to a reference
count leak. The correct behavior, as implemented in this patch, is to
decrement the reference count using ice_put_vf(vf) before returning an
error when vsi is NULL.
Second, the function calls ice_sriov_get_irqs(), which sets
vf->first_vector_idx. If this call returns a negative value, indicating an
error, the function returns an error without decrementing the reference
count of the vf pointer, resulting in another reference count leak. The
patch addresses this by adding a call to ice_put_vf(vf) before returning
an error when vf->first_vector_idx < 0.
This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and identifying potential mismanagement of reference counts. In this case,
the tool flagged the missing decrement operation as a potential issue,
leading to this patch.
Fixes: 4035c72dc1ba ("ice: reconfig host after changing MSI-X on VF")
Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)outlook.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
drivers/net/ethernet/intel/ice/ice_sriov.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index e34fe2516ccc..c2d6b2a144e9 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1096,8 +1096,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
return -ENOENT;
vsi = ice_get_vf_vsi(vf);
- if (!vsi)
+ if (!vsi) {
+ ice_put_vf(vf);
return -ENOENT;
+ }
prev_msix = vf->num_msix;
prev_queues = vf->num_vf_qs;
@@ -1142,8 +1144,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
vf->num_msix = prev_msix;
vf->num_vf_qs = prev_queues;
vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
- if (vf->first_vector_idx < 0)
+ if (vf->first_vector_idx < 0) {
+ ice_put_vf(vf);
return -EINVAL;
+ }
if (needs_rebuild) {
ice_vf_reconfig_vsi(vf);
--
2.42.0
From: Joshua Hay <joshua.a.hay(a)intel.com>
When a mailbox message is received, the driver is checking for a non 0
datalen in the controlq descriptor. If it is valid, the payload is
attached to the ctlq message to give to the upper layer. However, the
payload response size given to the upper layer was taken from the buffer
metadata which is _always_ the max buffer size. This meant the API was
returning 4K as the payload size for all messages. This went unnoticed
since the virtchnl exchange response logic was checking for a response
size less than 0 (error), not less than exact size, or not greater than
or equal to the max mailbox buffer size (4K). All of these checks will
pass in the success case since the size provided is always 4K. However,
this breaks anyone that wants to validate the exact response size.
Fetch the actual payload length from the value provided in the
descriptor data_len field (instead of the buffer metadata).
Unfortunately, this means we lose some extra error parsing for variable
sized virtchnl responses such as create vport and get ptypes. However,
the original checks weren't really helping anyways since the size was
_always_ 4K.
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable(a)vger.kernel.org # 6.9+
Signed-off-by: Joshua Hay <joshua.a.hay(a)intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 70986e12da28..3c0f97650d72 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -666,7 +666,7 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
if (ctlq_msg->data_len) {
payload = ctlq_msg->ctx.indirect.payload->va;
- payload_size = ctlq_msg->ctx.indirect.payload->size;
+ payload_size = ctlq_msg->data_len;
}
xn->reply_sz = payload_size;
@@ -1295,10 +1295,6 @@ int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
err = reply_sz;
goto free_vport_params;
}
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN) {
- err = -EIO;
- goto free_vport_params;
- }
return 0;
@@ -2602,9 +2598,6 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
if (reply_sz < 0)
return reply_sz;
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN)
- return -EIO;
-
ptypes_recvd += le16_to_cpu(ptype_info->num_ptypes);
if (ptypes_recvd > max_ptype)
return -EINVAL;
--
2.42.0
Commit d2aaf1996504 ("ACPI: resource: Add DMI quirks for ASUS Vivobook
E1504GA and E1504GAB") does exactly what the subject says, adding DMI
matches for both the E1504GA and E1504GAB. But DMI_MATCH() does a substring
match, so checking for E1504GA will also match E1504GAB.
Drop the unnecessary E1504GAB entry since that is covered already by
the E1504GA entry.
Fixes: d2aaf1996504 ("ACPI: resource: Add DMI quirks for ASUS Vivobook E1504GA and E1504GAB")
Cc: Ben Mayo <benny1091(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/resource.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 8a4726e2eb69..1ff251fd1901 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -511,19 +511,12 @@ static const struct dmi_system_id irq1_level_low_skip_override[] = {
},
},
{
- /* Asus Vivobook E1504GA */
+ /* Asus Vivobook E1504GA* */
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
DMI_MATCH(DMI_BOARD_NAME, "E1504GA"),
},
},
- {
- /* Asus Vivobook E1504GAB */
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
- DMI_MATCH(DMI_BOARD_NAME, "E1504GAB"),
- },
- },
{
/* Asus Vivobook Pro N6506MV */
.matches = {
--
2.46.0
When we access a no-current task's pid namespace we need check that the
task hasn't been reaped in the meantime and it's pid namespace isn't
accessible anymore.
The user namespace is fine because it is only released when the last
reference to struct task_struct is put and exit_creds() is called.
Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors")
CC: stable(a)vger.kernel.org # v6.11
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
fs/pidfs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/pidfs.c b/fs/pidfs.c
index 7ffdc88dfb52..80675b6bf884 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -120,6 +120,7 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
struct nsproxy *nsp __free(put_nsproxy) = NULL;
struct pid *pid = pidfd_pid(file);
struct ns_common *ns_common = NULL;
+ struct pid_namespace *pid_ns;
if (arg)
return -EINVAL;
@@ -202,7 +203,9 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PIDFD_GET_PID_NAMESPACE:
if (IS_ENABLED(CONFIG_PID_NS)) {
rcu_read_lock();
- ns_common = to_ns_common( get_pid_ns(task_active_pid_ns(task)));
+ pid_ns = task_active_pid_ns(task);
+ if (pid_ns)
+ ns_common = to_ns_common(get_pid_ns(pid_ns));
rcu_read_unlock();
}
break;
--
2.45.2
After commit 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work
queue"), it can happen that a work item is sent to an uninitialized work
queue. This may has the effect that the item being queued is never
actually queued, and any further actions depending on it will not proceed.
The following warning is observed while the fnic driver is loaded:
kernel: WARNING: CPU: 11 PID: 0 at ../kernel/workqueue.c:1524 __queue_work+0x373/0x410
kernel: <IRQ>
kernel: queue_work_on+0x3a/0x50
kernel: fnic_wq_copy_cmpl_handler+0x54a/0x730 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: fnic_isr_msix_wq_copy+0x2d/0x60 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: __handle_irq_event_percpu+0x36/0x1a0
kernel: handle_irq_event_percpu+0x30/0x70
kernel: handle_irq_event+0x34/0x60
kernel: handle_edge_irq+0x7e/0x1a0
kernel: __common_interrupt+0x3b/0xb0
kernel: common_interrupt+0x58/0xa0
kernel: </IRQ>
It has been observed that this may break the rediscovery of fibre channel
devices after a temporary fabric failure.
This patch fixes it by moving the work queue initialization out of
an if block in fnic_probe().
Signed-off-by: Martin Wilck <mwilck(a)suse.com>
Fixes: 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work queue")
Cc: stable(a)vger.kernel.org
---
drivers/scsi/fnic/fnic_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c
index 0044717d4486..adec0df24bc4 100644
--- a/drivers/scsi/fnic/fnic_main.c
+++ b/drivers/scsi/fnic/fnic_main.c
@@ -830,7 +830,6 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
spin_lock_init(&fnic->vlans_lock);
INIT_WORK(&fnic->fip_frame_work, fnic_handle_fip_frame);
INIT_WORK(&fnic->event_work, fnic_handle_event);
- INIT_WORK(&fnic->flush_work, fnic_flush_tx);
skb_queue_head_init(&fnic->fip_frame_queue);
INIT_LIST_HEAD(&fnic->evlist);
INIT_LIST_HEAD(&fnic->vlans);
@@ -948,6 +947,7 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
INIT_WORK(&fnic->link_work, fnic_handle_link);
INIT_WORK(&fnic->frame_work, fnic_handle_frame);
+ INIT_WORK(&fnic->flush_work, fnic_flush_tx);
skb_queue_head_init(&fnic->frame_queue);
skb_queue_head_init(&fnic->tx_queue);
--
2.46.1
The patch titled
Subject: fs/proc/kcore.c: allow translation of physical memory addresses
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-kcorec-allow-translation-of-physical-memory-addresses.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
Subject: fs/proc/kcore.c: allow translation of physical memory addresses
Date: Mon, 30 Sep 2024 14:21:19 +0200
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead. That leads to a wrong read.
Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That
way an architecture can deal with specific physical memory ranges.
Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.
For other architectures the default callback is basically NOP. It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Suggested-by: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/s390/include/asm/io.h | 2 +
fs/proc/kcore.c | 36 +++++++++++++++++++++++++++++++++--
2 files changed, 36 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/io.h~fs-proc-kcorec-allow-translation-of-physical-memory-addresses
+++ a/arch/s390/include/asm/io.h
@@ -16,8 +16,10 @@
#include <asm/pci_io.h>
#define xlate_dev_mem_ptr xlate_dev_mem_ptr
+#define kc_xlate_dev_mem_ptr xlate_dev_mem_ptr
void *xlate_dev_mem_ptr(phys_addr_t phys);
#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
+#define kc_unxlate_dev_mem_ptr unxlate_dev_mem_ptr
void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
#define IO_SPACE_LIMIT 0
--- a/fs/proc/kcore.c~fs-proc-kcorec-allow-translation-of-physical-memory-addresses
+++ a/fs/proc/kcore.c
@@ -50,6 +50,20 @@ static struct proc_dir_entry *proc_root_
#define kc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
#endif
+#ifndef kc_xlate_dev_mem_ptr
+#define kc_xlate_dev_mem_ptr kc_xlate_dev_mem_ptr
+static inline void *kc_xlate_dev_mem_ptr(phys_addr_t phys)
+{
+ return __va(phys);
+}
+#endif
+#ifndef kc_unxlate_dev_mem_ptr
+#define kc_unxlate_dev_mem_ptr kc_unxlate_dev_mem_ptr
+static inline void kc_unxlate_dev_mem_ptr(phys_addr_t phys, void *virt)
+{
+}
+#endif
+
static LIST_HEAD(kclist_head);
static DECLARE_RWSEM(kclist_lock);
static int kcore_need_update = 1;
@@ -471,6 +485,8 @@ static ssize_t read_kcore_iter(struct ki
while (buflen) {
struct page *page;
unsigned long pfn;
+ phys_addr_t phys;
+ void *__start;
/*
* If this is the first iteration or the address is not within
@@ -537,7 +553,8 @@ static ssize_t read_kcore_iter(struct ki
}
break;
case KCORE_RAM:
- pfn = __pa(start) >> PAGE_SHIFT;
+ phys = __pa(start);
+ pfn = phys >> PAGE_SHIFT;
page = pfn_to_online_page(pfn);
/*
@@ -557,13 +574,28 @@ static ssize_t read_kcore_iter(struct ki
fallthrough;
case KCORE_VMEMMAP:
case KCORE_TEXT:
+ if (m->type == KCORE_RAM) {
+ __start = kc_xlate_dev_mem_ptr(phys);
+ if (!__start) {
+ ret = -ENOMEM;
+ if (iov_iter_zero(tsz, iter) != tsz)
+ ret = -EFAULT;
+ goto out;
+ }
+ } else {
+ __start = (void *)start;
+ }
+
/*
* Sadly we must use a bounce buffer here to be able to
* make use of copy_from_kernel_nofault(), as these
* memory regions might not always be mapped on all
* architectures.
*/
- if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
+ ret = copy_from_kernel_nofault(buf, __start, tsz);
+ if (m->type == KCORE_RAM)
+ kc_unxlate_dev_mem_ptr(phys, __start);
+ if (ret) {
if (iov_iter_zero(tsz, iter) != tsz) {
ret = -EFAULT;
goto out;
_
Patches currently in -mm which might be from agordeev(a)linux.ibm.com are
fs-proc-kcorec-allow-translation-of-physical-memory-addresses.patch
From: Kan Liang <kan.liang(a)linux.intel.com>
The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of the sub-leaf 1 (enumerate the counter mask),
perf should check the bit 1 (0x2) of EAS, rather than bit 0 (0x1).
The error is not user-visible on bare metal. Because the sub-leaf 0 and
the sub-leaf 1 are always available. However, it may bring issues in a
virtualization environment when a VMM only enumerates the sub-leaf 0.
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 4 ++--
arch/x86/include/asm/perf_event.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 342f8b1a2f93..123ed1d60118 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4900,8 +4900,8 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
if (ebx & ARCH_PERFMON_EXT_EQ)
pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
- if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) {
- cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
+ if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
+ cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
&eax, &ebx, &ecx, &edx);
pmu->cntr_mask64 = eax;
pmu->fixed_cntr_mask64 = ebx;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index e3b5e8e96fb3..1d4ce655aece 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -191,7 +191,7 @@ union cpuid10_edx {
#define ARCH_PERFMON_EXT_UMASK2 0x1
#define ARCH_PERFMON_EXT_EQ 0x2
#define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1
-#define ARCH_PERFMON_NUM_COUNTER_LEAF 0x1
+#define ARCH_PERFMON_NUM_COUNTER_LEAF BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_BIT)
/*
* Intel Architectural LBR CPUID detection/enumeration details:
--
2.38.1
Hey,
Would you be interested in acquiring an visitors list of Batimat Expo 2024?
We have 95,000 verified contact list.
List Contains: Contact Name, Title, Phone Number, Fax Number, Physical address, Company Name, Company URL, Employee Size, Revenue Size, Industry, and more…
Interested? I will share you more details and pricing for the same.
Kind Regards,
Clara Skylar
Senior Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND is an int, defaulting to 250. When
the wakeref is non-zero, it's either -1 or a dynamically allocated
pointer, depending on CONFIG_DRM_I915_DEBUG_RUNTIME_PM. It's likely that
the code works by coincidence with the bitwise AND, but with
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y, there's the off chance that the
condition evaluates to false, and intel_wakeref_auto() doesn't get
called. Switch to the intended logical AND.
v2: Use != to avoid clang -Wconstant-logical-operand (Nathan)
Fixes: ad74457a6b5a ("drm/i915/dgfx: Release mmap on rpm suspend")
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Anshuman Gupta <anshuman.gupta(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v6.1+
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com> # v1
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com> # v1
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5c72462d1f57..b22e2019768f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1131,7 +1131,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
GEM_WARN_ON(!i915_ttm_cpu_maps_iomem(bo->resource));
}
- if (wakeref & CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
+ if (wakeref && CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND != 0)
intel_wakeref_auto(&to_i915(obj->base.dev)->runtime_pm.userfault_wakeref,
msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND));
--
2.39.2
I'm announcing the release of the 6.11.1 kernel.
All users of the 6.11 kernel series must upgrade.
The updated 6.11.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.11.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/accel/drm_accel.c | 110 ++--------------------------------
drivers/bluetooth/btintel_pcie.c | 2
drivers/cpufreq/amd-pstate.c | 5 +
drivers/gpu/drm/drm_drv.c | 97 ++++++++++++++---------------
drivers/gpu/drm/drm_file.c | 2
drivers/gpu/drm/drm_internal.h | 4 -
drivers/nvme/host/nvme.h | 5 +
drivers/nvme/host/pci.c | 18 ++---
drivers/powercap/intel_rapl_common.c | 35 +++++++++-
drivers/usb/class/usbtmc.c | 2
drivers/usb/serial/pl2303.c | 1
drivers/usb/serial/pl2303.h | 4 +
include/drm/drm_accel.h | 18 -----
include/drm/drm_file.h | 5 +
net/netfilter/nft_socket.c | 4 -
sound/soc/amd/acp/acp-legacy-common.c | 5 +
sound/soc/amd/acp/amd.h | 2
18 files changed, 128 insertions(+), 193 deletions(-)
Dan Carpenter (2):
netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
powercap: intel_rapl: Change an error pointer to NULL
Dhananjay Ugwekar (3):
powercap/intel_rapl: Add support for AMD family 1Ah
powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
cpufreq/amd-pstate: Add the missing cpufreq_cpu_put()
Edward Adam Davis (1):
USB: usbtmc: prevent kernel-usb-infoleak
Greg Kroah-Hartman (1):
Linux 6.11.1
Junhao Xie (1):
USB: serial: pl2303: add device id for Macrosilicon MS3020
Keith Busch (1):
nvme-pci: qdepth 1 quirk
Kiran K (1):
Bluetooth: btintel_pcie: Allocate memory for driver private data
Michał Winiarski (3):
drm: Use XArray instead of IDR for minors
accel: Use XArray instead of IDR for minors
drm: Expand max DRM device number to full MINORBITS
Vijendar Mukunda (1):
ASoC: amd: acp: add ZSC control register programming sequence
Reset LPCR_MER bit before running a vCPU to ensure that it is not set if
there are no pending interrupts. Running a vCPU with LPCR_MER bit set
and no pending interrupts results in L2 vCPU getting an infinite flood
of spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets
the LPCR_MER bit if there are pending interrupts.
The spurious flood problem can be observed in 2 cases:
1. Crashing the guest while interrupt heavy workload is running
a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm)
b. While the workload is running, crash the guest (make sure kdump
is configured)
c. Any one of the vCPUs of the guest will start getting an infinite
flood of spurious interrupts.
2. Running LTP stress tests in multiple guests at the same time
a. Start 4 L2 guests.
b. Start running LTP stress tests on all 4 guests at same time.
c. In some time, any one/more of the vCPUs of any of the guests will
start getting an infinite flood of spurious interrupts.
The root cause of both the above issues is the same:
1. A NMI is sent to a running vCPU that had LPCR_MER bit set.
2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE
is called for all the registers.
3. When H_GUEST_GET_STATE is called for lpcr, the vcpu->arch.vcore->lpcr
of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
new value is always used whenever that vCPU runs, regardless of whether
there was a pending interrupt.
4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
interrupt handler, and this cycle never ends.
Fix the spurious flood by making sure LPCR_MER is always reset before
running a vCPU.
Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
Cc: stable(a)vger.kernel.org # v6.8+
Signed-off-by: Gautam Menghani <gautam(a)linux.ibm.com>
---
arch/powerpc/kvm/book3s_hv.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8f7d7e37bc8c..3cc2f1691001 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -98,6 +98,13 @@
/* Used to indicate that a guest passthrough interrupt needs to be handled */
#define RESUME_PASSTHROUGH (RESUME_GUEST | RESUME_FLAG_ARCH2)
+/* Clear LPCR_MER bit - If we run a L2 vCPU with LPCR_MER bit set but no pending external
+ * interrupts, we end up getting a flood of spurious interrupts in L2 KVM guests. To avoid
+ * that, reset LPCR_MER and let the 'if check' for pending interrupts in kvmhv_run_single_vcpu()
+ * set LPCR_MER if there are pending interrupts.
+ */
+#define kvmppc_reset_lpcr_mer(vcpu) (vcpu->arch.vcore->lpcr &= ~LPCR_MER)
+
/* Used as a "null" value for timebase values */
#define TB_NIL (~(u64)0)
@@ -5091,7 +5098,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
accumulate_time(vcpu, &vcpu->arch.guest_entry);
if (cpu_has_feature(CPU_FTR_ARCH_300))
r = kvmhv_run_single_vcpu(vcpu, ~(u64)0,
- vcpu->arch.vcore->lpcr);
+ kvmppc_reset_lpcr_mer(vcpu));
else
r = kvmppc_run_vcpu(vcpu);
--
2.46.0
Atomicity violation occurs during consecutive reads of the members of
gb_tty. Consider a scenario where, because the consecutive reads of gb_tty
members are not protected by a lock, the value of gb_tty may still be
changing during the read process.
gb_tty->port.close_delay and gb_tty->port.closing_wait are updated
together, such as in the set_serial_info() function. If during the
read process, gb_tty->port.close_delay and gb_tty->port.closing_wait
are still being updated, it is possible that gb_tty->port.close_delay
is updated while gb_tty->port.closing_wait is not. In this case,
the code first reads gb_tty->port.close_delay and then
gb_tty->port.closing_wait. A new gb_tty->port.close_delay and an
old gb_tty->port.closing_wait could be read. Such values, whether
before or after the update, should not coexist as they represent an
intermediate state.
This could result in a mismatch of the values read for gb_tty->minor,
gb_tty->port.close_delay, and gb_tty->port.closing_wait, which in turn
could cause ss->close_delay and ss->closing_wait to be mismatched.
To address this issue, we have enclosed all sequential read operations of
the gb_tty variable within a lock. This ensures that the value of gb_tty
remains unchanged throughout the process, guaranteeing its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: b71e571adaa5 ("staging: greybus: uart: fix TIOCSSERIAL jiffies conversions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/staging/greybus/uart.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/greybus/uart.c b/drivers/staging/greybus/uart.c
index cdf4ebb93b10..8cc18590d97b 100644
--- a/drivers/staging/greybus/uart.c
+++ b/drivers/staging/greybus/uart.c
@@ -595,12 +595,14 @@ static int get_serial_info(struct tty_struct *tty,
{
struct gb_tty *gb_tty = tty->driver_data;
+ mutex_lock(&gb_tty->port.mutex);
ss->line = gb_tty->minor;
ss->close_delay = jiffies_to_msecs(gb_tty->port.close_delay) / 10;
ss->closing_wait =
gb_tty->port.closing_wait == ASYNC_CLOSING_WAIT_NONE ?
ASYNC_CLOSING_WAIT_NONE :
jiffies_to_msecs(gb_tty->port.closing_wait) / 10;
+ mutex_unlock(&gb_tty->port.mutex);
return 0;
}
--
2.34.1
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 393e32259a77..4850aed54604 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1876,14 +1876,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
This series fixes an error path where the reference of a child node is
not decremented upon early return. When at it, a trivial comma/semicolon
substitution I found by chance has been added to improve code clarity.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
pinctrl: intel: platform: fix error path in device_for_each_child_node()
pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment
drivers/pinctrl/intel/pinctrl-intel-platform.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
---
base-commit: 92fc9636d1471b7f68bfee70c776f7f77e747b97
change-id: 20240926-intel-pinctrl-platform-scoped-68d000ba3c1d
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The atomicity violation occurs when the variables cur_delay and new_delay
are defined. Imagine a scenario where, while defining cur_delay and
new_delay, the values stored in devfreq->profile->polling_ms and the delay
variable change. After acquiring the mutex_lock and entering the critical
section, due to possible concurrent modifications, cur_delay and new_delay
may no longer represent the correct values. Subsequent usage, such as if
(cur_delay > new_delay), could cause the program to run incorrectly,
resulting in inconsistencies.
To address this issue, it is recommended to acquire a lock in advance,
ensuring that devfreq->profile->polling_ms and delay are protected by the
lock when being read. This will help ensure the consistency of the program.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 7e6fdd4bad03 ("PM / devfreq: Core updates to support devices which can idle")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/devfreq/devfreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 98657d3b9435..9634739fc9cb 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -616,10 +616,10 @@ EXPORT_SYMBOL(devfreq_monitor_resume);
*/
void devfreq_update_interval(struct devfreq *devfreq, unsigned int *delay)
{
+ mutex_lock(&devfreq->lock);
unsigned int cur_delay = devfreq->profile->polling_ms;
unsigned int new_delay = *delay;
- mutex_lock(&devfreq->lock);
devfreq->profile->polling_ms = new_delay;
if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
--
2.34.1
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 1b0279393df4..d05387f9c11f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -526,14 +526,17 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int i;
for (i = 0; i < ev_cnt; i++) {
- int key = adp5588_read(kpad->client, KEY_EVENTA + i);
- int key_val = key & KEY_EV_MASK;
- int key_press = key & KEY_EV_PRESSED;
+ int key, key_val, key_press;
+ key = adp5588_read(kpad->client, KEY_EVENTA + i);
+ if (key < 0)
+ continue;
+ key_val = key & KEY_EV_MASK;
+ key_press = key & KEY_EV_PRESSED;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
/* gpio line used as IRQ source */
adp5588_gpio_irq_handle(kpad, key_val, key_press);
- } else {
+ } else if (key_val > 0) {
int row = (key_val - 1) / ADP5588_COLS_MAX;
int col = (key_val - 1) % ADP5588_COLS_MAX;
int code = MATRIX_SCAN_CODE(row, col, kpad->row_shift);
--
2.25.1
This change is specific to Hyper-V based VMs.
If the Virtual Machine Connection window is focused,
a Hyper-V VM user can unintentionally touch the keyboard/mouse
when the VM is hibernating or resuming, and consequently the
hibernation or resume operation can be aborted unexpectedly.
Fix the issue by no longer registering the keyboard/mouse as
wakeup devices (see the other two patches for the
changes to drivers/input/serio/hyperv-keyboard.c and
drivers/hid/hid-hyperv.c).
The keyboard/mouse were registered as wakeup devices because the
VM needs to be woken up from the Suspend-to-Idle state after
a user runs "echo freeze > /sys/power/state". It seems like
the Suspend-to-Idle feature has no real users in practice, so
let's no longer support that by returning -EOPNOTSUPP if a
user tries to use that.
$echo freeze > /sys/power/state
> bash: echo: write error: Operation not supported
Fixes: 1a06d017fb3f ("Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis(a)linux.microsoft.com>
---
Changes in v2:
* Add "#define vmbus_freeze NULL" when CONFIG_PM_SLEEP is not
enabled.
* Change commit message to clarify that this change is specifc to
Hyper-V based VMs.
Changes in v3:
* Add 'Cc: stable(a)vger.kernel.org' in sign-off area.
---
drivers/hv/vmbus_drv.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 965d2a4efb7e..8f445c849512 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -900,6 +900,19 @@ static void vmbus_shutdown(struct device *child_device)
}
#ifdef CONFIG_PM_SLEEP
+/*
+ * vmbus_freeze - Suspend-to-Idle
+ */
+static int vmbus_freeze(struct device *child_device)
+{
+/*
+ * Do not support Suspend-to-Idle ("echo freeze > /sys/power/state") as
+ * that would require registering the Hyper-V synthetic mouse/keyboard
+ * devices as wakeup devices, which can abort hibernation/resume unexpectedly.
+ */
+ return -EOPNOTSUPP;
+}
+
/*
* vmbus_suspend - Suspend a vmbus device
*/
@@ -938,6 +951,7 @@ static int vmbus_resume(struct device *child_device)
return drv->resume(dev);
}
#else
+#define vmbus_freeze NULL
#define vmbus_suspend NULL
#define vmbus_resume NULL
#endif /* CONFIG_PM_SLEEP */
@@ -969,7 +983,7 @@ static void vmbus_device_release(struct device *device)
*/
static const struct dev_pm_ops vmbus_pm = {
- .suspend_noirq = NULL,
+ .suspend_noirq = vmbus_freeze,
.resume_noirq = NULL,
.freeze_noirq = vmbus_suspend,
.thaw_noirq = vmbus_resume,
--
2.34.1
This change is specific to Hyper-V based VMs.
If the Virtual Machine Connection window is focused,
a Hyper-V VM user can unintentionally touch the keyboard/mouse
when the VM is hibernating or resuming, and consequently the
hibernation or resume operation can be aborted unexpectedly.
Fix the issue by no longer registering the keyboard/mouse as
wakeup devices (see the other two patches for the
changes to drivers/input/serio/hyperv-keyboard.c and
drivers/hid/hid-hyperv.c).
The keyboard/mouse were registered as wakeup devices because the
VM needs to be woken up from the Suspend-to-Idle state after
a user runs "echo freeze > /sys/power/state". It seems like
the Suspend-to-Idle feature has no real users in practice, so
let's no longer support that by returning -EOPNOTSUPP if a
user tries to use that.
$echo freeze > /sys/power/state
> bash: echo: write error: Operation not supported
Fixes: 1a06d017fb3f ("Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis(a)linux.microsoft.com>
---
Changes in v2:
* Add "#define vmbus_freeze NULL" when CONFIG_PM_SLEEP is not
enabled.
* Change commit message to clarify that this change is specifc to
Hyper-V based VMs.
---
drivers/hv/vmbus_drv.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 965d2a4efb7e..8f445c849512 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -900,6 +900,19 @@ static void vmbus_shutdown(struct device *child_device)
}
#ifdef CONFIG_PM_SLEEP
+/*
+ * vmbus_freeze - Suspend-to-Idle
+ */
+static int vmbus_freeze(struct device *child_device)
+{
+/*
+ * Do not support Suspend-to-Idle ("echo freeze > /sys/power/state") as
+ * that would require registering the Hyper-V synthetic mouse/keyboard
+ * devices as wakeup devices, which can abort hibernation/resume unexpectedly.
+ */
+ return -EOPNOTSUPP;
+}
+
/*
* vmbus_suspend - Suspend a vmbus device
*/
@@ -938,6 +951,7 @@ static int vmbus_resume(struct device *child_device)
return drv->resume(dev);
}
#else
+#define vmbus_freeze NULL
#define vmbus_suspend NULL
#define vmbus_resume NULL
#endif /* CONFIG_PM_SLEEP */
@@ -969,7 +983,7 @@ static void vmbus_device_release(struct device *device)
*/
static const struct dev_pm_ops vmbus_pm = {
- .suspend_noirq = NULL,
+ .suspend_noirq = vmbus_freeze,
.resume_noirq = NULL,
.freeze_noirq = vmbus_suspend,
.thaw_noirq = vmbus_resume,
--
2.34.1
This reverts commit 69197dfc64007b5292cc960581548f41ccd44828.
commit 937d46ecc5f9 ("net: wangxun: add ethtool_ops for
channel number") changed NIC misc irq from most significant
bit to least significant bit, the former condition is not
required to apply this patch, because we only need to set
irq affinity for NIC queue irq vectors.
this patch is required after commit 937d46ecc5f9 ("net: wangxun:
add ethtool_ops for channel number") was applied, so this is only
relevant to 6.6.y branch.
Signed-off-by: Duanqiang Wen <duanqiangwen(a)net-swift.com>
---
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index 2b3d6586f44a..fb1caa40da6b 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -1620,7 +1620,7 @@ static void wx_set_num_queues(struct wx *wx)
*/
static int wx_acquire_msix_vectors(struct wx *wx)
{
- struct irq_affinity affd = { .pre_vectors = 1 };
+ struct irq_affinity affd = {0, };
int nvecs, i;
/* We start by asking for one vector per queue pair */
--
2.27.0
This is the start of the stable review cycle for the 6.11.1 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 29 Sep 2024 12:17:00 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.11.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.11.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.11.1-rc1
Edward Adam Davis <eadavis(a)qq.com>
USB: usbtmc: prevent kernel-usb-infoleak
Junhao Xie <bigfoot(a)classfun.cn>
USB: serial: pl2303: add device id for Macrosilicon MS3020
Keith Busch <kbusch(a)kernel.org>
nvme-pci: qdepth 1 quirk
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
ASoC: amd: acp: add ZSC control register programming sequence
Kiran K <kiran.k(a)intel.com>
Bluetooth: btintel_pcie: Allocate memory for driver private data
Dan Carpenter <dan.carpenter(a)linaro.org>
netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
cpufreq/amd-pstate: Add the missing cpufreq_cpu_put()
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
powercap/intel_rapl: Add support for AMD family 1Ah
Michał Winiarski <michal.winiarski(a)intel.com>
drm: Expand max DRM device number to full MINORBITS
Michał Winiarski <michal.winiarski(a)intel.com>
accel: Use XArray instead of IDR for minors
Michał Winiarski <michal.winiarski(a)intel.com>
drm: Use XArray instead of IDR for minors
-------------
Diffstat:
Makefile | 4 +-
drivers/accel/drm_accel.c | 110 +++-------------------------------
drivers/bluetooth/btintel_pcie.c | 2 +-
drivers/cpufreq/amd-pstate.c | 5 +-
drivers/gpu/drm/drm_drv.c | 97 +++++++++++++++---------------
drivers/gpu/drm/drm_file.c | 2 +-
drivers/gpu/drm/drm_internal.h | 4 --
drivers/nvme/host/nvme.h | 5 ++
drivers/nvme/host/pci.c | 18 +++---
drivers/powercap/intel_rapl_common.c | 35 +++++++++--
drivers/usb/class/usbtmc.c | 2 +-
drivers/usb/serial/pl2303.c | 1 +
drivers/usb/serial/pl2303.h | 4 ++
include/drm/drm_accel.h | 18 +-----
include/drm/drm_file.h | 5 ++
net/netfilter/nft_socket.c | 4 +-
sound/soc/amd/acp/acp-legacy-common.c | 5 ++
sound/soc/amd/acp/amd.h | 2 +
18 files changed, 129 insertions(+), 194 deletions(-)
According to Intel SDM, for the local APIC timer,
1. "The initial-count register is a read-write register. A write of 0 to
the initial-count register effectively stops the local APIC timer, in
both one-shot and periodic mode."
2. "In TSC deadline mode, writes to the initial-count register are
ignored; and current-count register always reads 0. Instead, timer
behavior is controlled using the IA32_TSC_DEADLINE MSR."
"In TSC-deadline mode, writing 0 to the IA32_TSC_DEADLINE MSR disarms
the local-APIC timer."
Current code in lapic_timer_shutdown() writes 0 to the initial-count
register. This stops the local APIC timer for one-shot and periodic mode
only. In TSC deadline mode, the timer is not properly stopped.
Some CPUs are affected by this and they are woke up by the armed timer
in s2idle in TSC deadline mode.
Stop the TSC deadline timer in lapic_timer_shutdown() by writing 0 to
MSR_IA32_TSC_DEADLINE.
Cc: stable(a)vger.kernel.org
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
---
arch/x86/kernel/apic/apic.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53c9459..d1006531729a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -441,6 +441,10 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
apic_write(APIC_TMICT, 0);
+
+ if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+
return 0;
}
--
2.34.1
We found some bugs when testing the XDP function of enetc driver,
and these bugs are easy to reproduce. This is not only causes XDP
to not work, but also the network cannot be restored after exiting
the XDP program. So the patch set is mainly to fix these bugs. For
details, please see the commit message of each patch.
Wei Fang (3):
net: enetc: remove xdp_drops statistic from enetc_xdp_drop()
net: enetc: fix the issues of XDP_REDIRECT feature
net: enetc: reset xdp_tx_in_flight when updating bpf program
drivers/net/ethernet/freescale/enetc/enetc.c | 46 +++++++++++++++-----
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
2 files changed, 37 insertions(+), 10 deletions(-)
--
2.34.1
I was benchmarking some compressors, piping to and from a network share on a NAS, and some consistently wrote corrupted data.
First, apologies in advance:
* if I'm not in the right place. I tried to follow the directions from the Regressions guide - https://www.kernel.org/doc/html/latest/admin-guide/reporting-regressions.ht…
* I know there's a ton of context I don't know
* I’m trying a different mail app, because the first one looked concussed with plain text. This might be worse.
The detailed description:
I was benchmarking some compressors on Debian on a Raspberry Pi, piping to and from a network share on a NAS, and found that some consistently had issues writing to my NAS. Specifically:
* lzop
* pigz - parallel gzip
* pbzip2 - parallel bzip2
This is dependent on kernel version. I've done a survey, below.
While I tripped over the issue on a Debian port (Debian 12, bookworm, kernel v6.6), I compiled my own vanilla / mainline kernels for testing and reporting this.
Even more details:
The Pi and the Synology NAS are directly connected by Gigabit Ethernet. Both sides are using self-assigned IP addresses. I'll note that at boot, getting the Pi to see the NAS requires some nudging of avahi-autoipd; while I think it's stable before testing, I'm not positive, and reconnection issues might be in play.
The files in question are tars of sparse file systems, about 270 gig, compressing down to 10-30 gig.
Compression seems to work, without complaint; decompression crashes the process, usually within the first gig of the compressed file. The output of the stream doesn't match what ends up written to disk.
Trying decompression during compression gets further along than it does after compression finishes; this might point toward something with writes and caches.
A previous attempt involved rpi-update, which:
* good: let me install kernels without building myself
* bad: updated the bootloader and firmware, to bleeding edge, with possible regressions; it definitely muddied the results of my tests
I started over with a fresh install, and no results involving rpi-update are included in this email.
A survey of major branches:
* 5.15.167, LTS - good
* 6.1.109, LTS - good
* 6.2.16 - good
* 6.3.13 - bad
* 6.4.16 - bad
* 6.5.13 - bad
* 6.6.50, LTS - bad
* 6.7.12 - bad
* 6.8.12 - bad
* 6.9.12 - bad
* 6.10.9 - good
* 6.11.0 - good
I tried, but couldn't fully build 4.19.322 or 6.0.19, due to issues with modules.
Important commits:
It looked like both the breakage and the fix came in during rc1 releases.
Breakage, v6.3-rc1:
I manually bisected commits in fs/smb* and fs/cifs.
3d78fe73fa12 cifs: Build the RDMA SGE list directly from an iterator
> lzop and pigz worked. last working. test in progress: pbzip2
607aea3cc2a8 cifs: Remove unused code
> lzop didn't work. first broken
Fix, v6.10-rc1:
I manually bisected commits in fs/smb.
69c3c023af25 cifs: Implement netfslib hooks
> lzop didn't work. last broken one
3ee1a1fc3981 cifs: Cut over to using netfslib
> lzop, pigz, pbzip2, all worked. first fixed one
To test / reproduce:
It looks like this, on a mounted network share, with extra pv for progress meters:
cat 1tb-rust-ext4.img.tar.gz | \
gzip -d | \
lzop -1 > \
1tb-rust-ext4.img.tar.lzop
# wait 40 minutes
cat 1tb-rust-ext4.img.tar.lzop | \
lzop -d | \
sha1sum
# either it works, and shows the right checksum
# or it crashes early, due to a corrupt file, and shows an incorrect checksum
As I re-read this, I realize it might look like the compressor behaves differently. I added a "tee $output | sha1sum; sha1sum $output" and ran it on a broken version. The checksums from the pipe and for the file on disk are different.
Assorted info:
This is a Raspberry Pi 4, with 4 GiB RAM, running Debian 12, bookworm, or a port.
mount.cifs version: 7.0
# cat /proc/sys/kernel/tainted
1024
# cat /proc/version
Linux version 6.2.0-3d78fe73f-v8-pronoiac+ (pronoiac@bisect) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #21 SMP PREEMPT Thu Sep 19 16:51:22 PDT 2024
DebugData:
/proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.41
Features: DFS,FSCACHE,STATS2,DEBUG,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
CIFSMaxBufSize: 16384
Active VFS Requests: 1
Servers:
1) ConnectionId: 0x1 Hostname: drums.local
Number of credits: 8062 Dialect 0x300
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 2
In Send: 1 In MaxReq Wait: 0
Sessions:
1) Address: 169.254.132.219 Uses: 1 Capability: 0x300047 Session Status: 1
Security type: RawNTLMSSP SessionId: 0x4969841e
User: 1000 Cred User: 0
Shares:
0) IPC: \\drums.local\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
PathComponentMax: 0 Status: 1 type: 0 Serial Number: 0x0
Share Capabilities: None Share Flags: 0x0
tid: 0xeb093f0b Maximal Access: 0x1f00a9
1) \\drums.local\billions Mounts: 1 DevInfo: 0x20 Attributes: 0x5007f
PathComponentMax: 255 Status: 1 type: DISK Serial Number: 0x735a9af5
Share Capabilities: None Aligned, Partition Aligned, Share Flags: 0x0
tid: 0x5e6832e6 Optimal sector size: 0x200 Maximal Access: 0x1f01ff
MIDs:
State: 2 com: 9 pid: 3117 cbdata: 00000000e003293e mid 962892
State: 2 com: 9 pid: 3117 cbdata: 000000002610602a mid 962956
--
Let me know how I can help.
The process of iterating can take hours, and it's not automated, so my resources are limited.
#regzbot introduced: 607aea3cc2a8
#regzbot fix: 3ee1a1fc3981
-James
This series updates the driver for the veml6030 ALS and adds support for
the veml6035, which shares most of its functionality with the former.
The most relevant updates for the veml6030 are the resolution correction
to meet the datasheet update that took place with Rev 1.7, 28-Nov-2023,
a fix to avoid a segmentation fault when reading the
in_illuminance_period_available attribute, and the removal of the
processed value for the WHITE channel, as it uses the ALS resolution.
Vishay does not host the Product Information Notification where the
resolution correction was introduced, but it can still be found
online[1], and the corrected value is the one listed on the latest
version of the datasheet[2] (Rev. 1.7, 28-Nov-2023) and application
note[3] (Rev. 17-Jan-2024).
Link: https://www.farnell.com/datasheets/4379688.pdf [1]
Link: https://www.vishay.com/docs/84366/veml6030.pdf [2]
Link: https://www.vishay.com/docs/84367/designingveml6030.pdf [3]
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Changes in v2:
- Rebase to iio/testing, dropping applied patches [1/7], [4/7].
- Drop [3/7] (applied to iio/fixes-togreg).
- Add patch to use dev_err_probe() in probe error paths.
- Add patch to use read_avail() for available attributes.
- Add patches to use to support a regulator.
- Add patch to ensure that the device is powered off in error paths
after powering it on.
- Add patch to drop processed values from the WHITE channel.
- Use fsleep() instead of usleep_range() in veml6030_als_pwr_on()
- Link to v1: https://lore.kernel.org/r/20240913-veml6035-v1-0-0b09c0c90418@gmail.com
---
Javier Carrasco (10):
iio: light: veml6030: fix ALS sensor resolution
iio: light: veml6030: add set up delay after any power on sequence
iio: light: veml6030: use dev_err_probe()
dt-bindings: iio: light: veml6030: add vdd-supply property
iio: light: veml6030: add support for a regulator
iio: light: veml6030: use read_avail() for available attributes
iio: light: veml6030: drop processed info for white channel
iio: light: veml6030: power off device in probe error paths
dt-bindings: iio: light: veml6030: add veml6035
iio: light: veml6030: add support for veml6035
.../bindings/iio/light/vishay,veml6030.yaml | 43 +-
drivers/iio/light/Kconfig | 4 +-
drivers/iio/light/veml6030.c | 453 ++++++++++++++++-----
3 files changed, 391 insertions(+), 109 deletions(-)
---
base-commit: 8bea3878a1511bceadc2fbf284b00bcc5a2ef28d
change-id: 20240903-veml6035-7a91bc088c6f
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Hi,
Upstream commit 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") introduced a flags member to struct x86_cpu_id. Bit 0
of x86_cpu.id.flags must be set to 1 for x86_match_cpu() to work correctly.
This upstream commit has been backported to 5.15.y.
Callers that use the X86_MATCH_*() family of macros to compose the argument
of x86_match_cpu() function correctly. Callers that use their own custom
mechanisms may not work if they fail to set x86_cpu_id.flags correctly.
There are three remaining callers in 5.15.y that use their own mechanisms:
a) setup_pcid(), b) rapl_msr_probe(), and c) goodix_add_acpi_gpio_
mappings(). a) caused a regression that Thomas Lindroth reported in [1]. b)
works by luck but it still populates its x86_cpu_id[] array incorrectly. I
am not aware of any reports on c), but inspecting the code reveals that it
will fail to identify INTEL_FAM6_ATOM_SILVERMONT for the reason described
above.
I backported three patches that fix these bugs in mainline. Hopefully the
authors and/or maintainers can ack the backports?
I tested patches 2/3 and 3/3 on Alder Lake, and Meteor Lake systems as the
two involved callers behave differently on these two systems. I wrote the
testing details in each patch separately. I could not test patch 1/3
because I do not have access to Bay Trail hardware.
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Hans de Goede (1):
Input: goodix - use the new soc_intel_is_byt() helper
Sumeet Pawnikar (1):
powercap: RAPL: fix invalid initialization for pl4_supported field
Tony Luck (1):
x86/mm: Switch to new Intel CPU model defines
arch/x86/mm/init.c | 16 ++++++----------
drivers/input/touchscreen/goodix.c | 18 ++----------------
drivers/powercap/intel_rapl_msr.c | 6 +++---
3 files changed, 11 insertions(+), 29 deletions(-)
--
2.34.1
From: Zack Rusin <zack.rusin(a)broadcom.com>
commit aba07b9a0587f50e5d3346eaa19019cf3f86c0ea upstream.
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 12 +++++++++++-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 +++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index c46f380d9149..733b0013eda1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -348,6 +348,8 @@ void *vmw_bo_map_and_cache(struct vmw_buffer_object *vbo)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -370,10 +372,17 @@ void *vmw_bo_map_and_cache(struct vmw_buffer_object *vbo)
*/
void vmw_bo_unmap(struct vmw_buffer_object *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -510,6 +519,7 @@ int vmw_bo_init(struct vmw_private *dev_priv,
BUILD_BUG_ON(TTM_MAX_BO_PRIORITY <= 3);
vmw_bo->base.priority = 3;
vmw_bo->res_tree = RB_ROOT;
+ atomic_set(&vmw_bo->map_count, 0);
size = ALIGN(size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->base.base, size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index b0c23559511a..bca10214e0bf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -116,6 +116,8 @@ struct vmwgfx_hash_item {
* @base: The TTM buffer object
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @base_mapped_count: ttm BO mapping count; used by KMS atomic helpers.
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -129,6 +131,7 @@ struct vmw_buffer_object {
/* For KMS atomic helpers: ttm bo mapping count */
atomic_t base_mapped_count;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.39.4
From: Zack Rusin <zack.rusin(a)broadcom.com>
commit aba07b9a0587f50e5d3346eaa19019cf3f86c0ea upstream.
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v6.6.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index ae796e0c64aa..fdc34283eeb9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -331,6 +331,8 @@ void *vmw_bo_map_and_cache(struct vmw_bo *vbo)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -353,11 +355,17 @@ void *vmw_bo_map_and_cache(struct vmw_bo *vbo)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -390,6 +398,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
BUILD_BUG_ON(TTM_MAX_BO_PRIORITY <= 3);
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index f349642e6190..156ea612fc2a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -68,6 +68,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -86,6 +88,7 @@ struct vmw_bo {
struct rb_root res_tree;
u32 res_prios[TTM_MAX_BO_PRIORITY];
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.39.4
From: Scott Mayhew <smayhew(a)redhat.com>
commit 76a0e79bc84f466999fa501fce5bf7a07641b8a7 upstream.
Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.
The end of the kerneldoc comment for __vfs_setxattr_noperm() states:
* This function requires the caller to lock the inode's i_mutex before it
* is executed. It also assumes that the caller will make the appropriate
* permission checks.
nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.
Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.
Cc: stable(a)kernel.org
Reported-by: Marek Gresko <marek.gresko(a)protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew(a)redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Acked-by: Casey Schaufler <casey(a)schaufler-ca.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v5.15.y-v6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
security/selinux/hooks.c | 4 ++--
security/smack/smack_lsm.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 78f3da39b031..626d397c2f86 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6631,8 +6631,8 @@ static int selinux_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen
*/
static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SELINUX,
- ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(&init_user_ns, dentry, XATTR_NAME_SELINUX,
+ ctx, ctxlen, 0, NULL);
}
static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index c18366dbbfed..25995df15e82 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4714,8 +4714,8 @@ static int smack_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
static int smack_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SMACK,
- ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(&init_user_ns, dentry, XATTR_NAME_SMACK,
+ ctx, ctxlen, 0, NULL);
}
static int smack_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
--
2.39.4
From: Scott Mayhew <smayhew(a)redhat.com>
commit 76a0e79bc84f466999fa501fce5bf7a07641b8a7 upstream.
Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.
The end of the kerneldoc comment for __vfs_setxattr_noperm() states:
* This function requires the caller to lock the inode's i_mutex before it
* is executed. It also assumes that the caller will make the appropriate
* permission checks.
nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.
Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.
Cc: stable(a)kernel.org
Reported-by: Marek Gresko <marek.gresko(a)protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew(a)redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Acked-by: Casey Schaufler <casey(a)schaufler-ca.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
security/selinux/hooks.c | 3 ++-
security/smack/smack_lsm.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 46c00a68bb4b..90935ed3d8d8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6570,7 +6570,8 @@ static int selinux_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen
*/
static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0,
+ NULL);
}
static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 92bc6c9d793d..cb4801fcf9a8 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4651,7 +4651,8 @@ static int smack_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
static int smack_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0,
+ NULL);
}
static int smack_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
--
2.39.4
When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP,
so we need to make sure that we don't attempt to kfree() the
statically defined compressor name.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Signed-off-by: Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Cc: <stable(a)vger.kernel.org>
---
Changes in v2:
- removed comment from source code about freeing statically defined compression
- removed part of KASAN report from commit description
- added information about CONFIG_ZRAM_MULTI_COMP into commit description
Changes in v3:
- modified commit description based on Sergey's comment
- changed start for-loop to ZRAM_PRIMARY_COMP
drivers/block/zram/zram_drv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d245617083d..ad9c9bc3ccfc5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2115,8 +2115,10 @@ static void zram_destroy_comps(struct zram *zram)
zram->num_active_comps--;
}
- for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
+ /* Do not free statically defined compression algorithms */
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
--
2.45.2
Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a report about a linux-6.6.y regression in bugzilla.kernel.org
that appears to be caused by this commit from Dan applied by Greg:
15fffc6a5624b1 ("driver core: Fix uevent_show() vs driver detach race")
[v6.11-rc3, v6.10.5, v6.6.46, v6.1.105, v5.15.165, v5.10.224, v5.4.282,
v4.19.320]
The reporter did not check yet if mainline is affected; decided to
forward the report by mail nevertheless, as the maintainer for the
subsystem is also the maintainer for the stable tree. ;-)
To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219244 :
> The symptoms of this bug are as follows:
>
> - After booting (to the graphical login screen) the mouse pointer
> would frozen and only after unplugging and plugging-in again the usb
> plug of the mouse would the mouse be working as expected.
> - If one would log in without fixing the mouse issue, the mouse
> pointer would still be frozen after login.
> - The usb keyboard usually is not affected even though plugged into
> the same usb-hub - thus logging in is possible.
> - The mouse pointer is also frozen if the usb connector is plugged
> into a different usb-port (different from the usb-hub)
> - The pointer is moveable via the inbuilt synaptics trackpad
>
>
> The kernel log shows almost the same messages (not sure if the
> differences mean anything in regards to this bug) for the initial
> recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse
> (and subsequently moveable mouse pointer):
>
> [kernel] [ 8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd
> [kernel] [ 8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 8.963554] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18
> [kernel] [ 8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> [kernel] [ 31.258381] usb 1-2.2.1.2: USB disconnect, device number 10
> [kernel] [ 31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd
> [kernel] [ 31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 31.812933] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20
> [kernel] [ 31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> Differences:
>
> ../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20
>
> and
>
> hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004
>
>
> The connector / usb-port was not changed in this case!
>
>
> The symptoms of this bug have been present at one point in the
> recent
> past, but with kernel v6.6.45 (or maybe even some version before that)
> it was fine. But with 6.6.45 it seems to be definitely fine.
>
> But with v6.6.46 the symptoms returned. That's the reason I
> suspected
> the kernel to be the cause of this issue. So I did some bisecting -
> which wasn't easy because that bug would often times not appear if the
> system was simply rebooted into the test kernel.
> As the bug would definitely appear on the affected kernels (v6.6.46
> ff) after shutting down the system for the night and booting the next
> day, I resorted to simulating the over-night powering-off by shutting
> the system down, unplugging the power and pressing the power button to
> get rid of residual voltage. But even then a few times the bug would
> only appear if I repeated this procedure before booting the system again
> with the respective kernel.
>
> This is on a Thinkpad with Kaby Lake and integrated Intel graphics.
> Even though it is a laptop, it is used as a desktop device, and the
> internal battery is disconnected, the removable battery is removed as
> the system is plugged-in via the power cord at all times (when in use)!
> Also, the system has no power (except for the bios battery, of
> course)
> over night as the power outlet is switched off if the device is not in use.
>
> Not sure if this affects the issue - or how it does. But for
> successful bisecting I had to resort to the above procedure.
>
> Bisecting the issue (between the release commits of v6.6.45 and
> v6.6.46) resulted in this commit [1] being the probable culprit.
>
> I then tested kernel v6.6.49. It still produced the bug for me. So I
> reverted the changes of the assumed "bad commit" and re-compiled kernel
> v6.6.49. With this modified kernel the bug seems to be gone.
>
> Now, I assume the commit has a reason for being introduced, but
> maybe
> needs some adjusting in order to avoid this bug I'm experiencing on my
> system.
> Also, I can't say why the issue appeared in the past even without
> this
> commit being present, as I haven't bisected any kernel version before
> v6.6.45.
>
>
> [1]:
>
> 4d035c743c3e391728a6f81cbf0f7f9ca700cf62 is the first bad commit
> commit 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
> Author: Dan Williams <dan.j.williams(a)intel.com>
> Date: Fri Jul 12 12:42:09 2024 -0700
>
> driver core: Fix uevent_show() vs driver detach race
>
> commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c upstream.
>
> uevent_show() wants to de-reference dev->driver->name. There is no clean
See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] can not CCed them in mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
#regzbot from: brmails+k
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219244
#regzbot title: driver core: frozen usb mouse pointer at boot
#regzbot ignore-activity
The patch titled
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
Date: Fri, 27 Sep 2024 09:19:36 +1200
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/mm/memory.c~mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails
+++ a/mm/memory.c
@@ -4192,6 +4192,8 @@ static struct folio *alloc_swap_folio(st
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4204,6 +4206,7 @@ vm_fault_t do_swap_page(struct vm_fault
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4302,7 +4305,9 @@ vm_fault_t do_swap_page(struct vm_fault
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4609,8 +4614,10 @@ unlock:
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4625,8 +4632,10 @@ out_release:
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
The patch titled
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
Date: Fri, 27 Sep 2024 00:07:52 -0500
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: J��r��me Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hmm-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/hmm-tests.c~selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test
+++ a/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
The patch titled
Subject: device-dax: correct pgoff align in dax_set_mapping()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kun(llfl)" <llfl(a)linux.alibaba.com>
Subject: device-dax: correct pgoff align in dax_set_mapping()
Date: Fri, 27 Sep 2024 15:45:09 +0800
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,
vmf->address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.
Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.17274216…
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) <llfl(a)linux.alibaba.com>
Tested-by: JianXiong Zhao <zhaojianxiong.zjx(a)alibaba-inc.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dax/device.c~device-dax-correct-pgoff-align-in-dax_set_mapping
+++ a/drivers/dax/device.c
@@ -86,7 +86,7 @@ static void dax_set_mapping(struct vm_fa
nr_pages = 1;
pgoff = linear_page_index(vmf->vma,
- ALIGN(vmf->address, fault_size));
+ ALIGN_DOWN(vmf->address, fault_size));
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
_
Patches currently in -mm which might be from llfl(a)linux.alibaba.com are
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
arch-s390.h uses types from std.h, but does not include it.
Depending on the inclusion order the compilation can fail.
Include std.h explicitly to avoid these errors.
Fixes: 404fa87c0eaf ("tools/nolibc: s390: provide custom implementation for sys_fork")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/include/nolibc/arch-s390.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 2ec13d8b9a2db80efa8d6cbbbd01bfa3d0059de2..f9ab83a219b8a2d5e53b0b303d8bf0bf78280d5f 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -10,6 +10,7 @@
#include "compiler.h"
#include "crt.h"
+#include "std.h"
/* Syscalls for s390:
* - registers are 64-bit
---
base-commit: e477dba5442c0af7acb9e8bbbbde1108a37ed39c
change-id: 20240927-nolibc-s390-std-h-cbb13f70fa73
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hello,
I am unsure if this is the 'correct' behavior for ptrace.
If you run ptrace_traceme followed by ptrace_attach, then the process
attaches its own parent to itself and cannot be attached by another
thing. The attach call errors out, but GDB does report something
attached to it. I am unsure if Bash does this itself perhaps.
It's a bit hard for me to reason about because my debugging skills are
bad and trying 'strace' with bash -c ./thing, or just on the thing
itself gives -1 on both ptrace calls as strace attaches to it.
similarly with GDB. Unsure how to debug this.
https://gist.github.com/x64-elf-sh42/83393e319ad8280b8704fbe3f499e381
to compile simply:
gcc test.c -o thingy
This code works on my machine which is:
Linux 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3
(2024-08-26) x86_64 GNU/Linux
GDB -p on the pid reports that another pid is attached and the
operation is illegal. That other pid is the bash shell that i spawned
this binary from (code in gist).
it's useful for anti-debugging, but it seems odd it will attach it's
parent to the process since that's not actually doing the attach call.
If anything i'd expect the pid attached to itself, rather than the
parent getting attached.
The first call to ptrace (traceme) gets return value 0. The second
call (attach) gets return value -1. That does seem correct, but yet
there is something 'attached' when i try to use GDB.
If I only do the traceme call, it does not get attached by Bash, so it
looks totally like the 'attach' call has a side effect of attaching
the parent, rather than just only failing.
Kind regards,
~sh42
Hello everyone,
I wondered a few times in the last months how we could properly track
regressions for the stable series, as I'm always a bit unsure how proper
use of regzbot would look for these cases.
For issues that affect mainline usually just the commit itself is
specified with the relevant "#regzbot introduced: ..." invocation, but
how would one do this if the stable trees are also affected? Do we need
to specify "#regzbot introduced" for each backported version of the
upstream commit?!
IMO this is especially interesting because sometimes getting a patchset
to fix the older trees can take quite some time since possibly backport
dependencies have to be found or even custom patches need to be written,
as it i.e. was the case [here][0]. This led to fixes for the linux-6.1.y
branch landing a bit later which would be good to track with regzbot so
it does not fall through the cracks.
Maybe there already is a regzbot facility to do this that I have just
missed, but I thought if there is people on the lists will know :)
Cheers,
Chris
P.S.: I hope everybody had a great time at the conferences! I hope to
also make it next year ..
[0]: https://lore.kernel.org/all/66bcb6f68172f_adbf529471@willemb.c.googlers.com…
The atomicity violation issue is due to the invalidation of the function
port_has_data()'s check caused by concurrency. Imagine a scenario where a
port that contains data passes the validity check but is simultaneously
assigned a value with no data. This could result in an empty port passing
the validity check, potentially leading to a null pointer dereference
error later in the program, which is inconsistent.
To address this issue, we believe there is a problem with the original
logic. Since the validity check and the read operation were separated, it
could lead to inconsistencies between the data that passes the check and
the data being read. Therefore, we moved the main logic of the
port_has_data function into this function and placed the read operation
within a lock, ensuring that the validity check and read operation are
not separated, thus resolving the problem.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 203baab8ba31 ("virtio: console: Introduce function to hand off data from host to readers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/char/virtio_console.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index de7d720d99fa..5aaf07f71a4c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -656,10 +656,14 @@ static ssize_t fill_readbuf(struct port *port, u8 __user *out_buf,
struct port_buffer *buf;
unsigned long flags;
- if (!out_count || !port_has_data(port))
+ if (!out_count)
return 0;
- buf = port->inbuf;
+ spin_lock_irqsave(&port->inbuf_lock, flags);
+ buf = port->inbuf = get_inbuf(port);
+ spin_unlock_irqrestore(&port->inbuf_lock, flags);
+ if (!buf)
+ return 0;
out_count = min(out_count, buf->len - buf->offset);
if (to_user) {
--
2.34.1
Atomicity violation occurs when the fmc_send_cmd() function is executed
simultaneously with the modification of the fmdev->resp_skb value.
Consider a scenario where, after passing the validity check within the
function, a non-null fmdev->resp_skb variable is assigned a null value.
This results in an invalid fmdev->resp_skb variable passing the validity
check. As seen in the later part of the function, skb = fmdev->resp_skb;
when the invalid fmdev->resp_skb passes the check, a null pointer
dereference error may occur at line 478, evt_hdr = (void *)skb->data;
To address this issue, it is recommended to include the validity check of
fmdev->resp_skb within the locked section of the function. This
modification ensures that the value of fmdev->resp_skb does not change
during the validation process, thereby maintaining its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/media/radio/wl128x/fmdrv_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/radio/wl128x/fmdrv_common.c b/drivers/media/radio/wl128x/fmdrv_common.c
index 3d36f323a8f8..4d032436691c 100644
--- a/drivers/media/radio/wl128x/fmdrv_common.c
+++ b/drivers/media/radio/wl128x/fmdrv_common.c
@@ -466,11 +466,12 @@ int fmc_send_cmd(struct fmdev *fmdev, u8 fm_op, u16 type, void *payload,
jiffies_to_msecs(FM_DRV_TX_TIMEOUT) / 1000);
return -ETIMEDOUT;
}
+ spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
if (!fmdev->resp_skb) {
+ spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
fmerr("Response SKB is missing\n");
return -EFAULT;
}
- spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
skb = fmdev->resp_skb;
fmdev->resp_skb = NULL;
spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
--
2.34.1
From: Filipe Manana <fdmanana(a)suse.com>
commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream.
When updating the global block reserve, we account for the 6 items needed
by an unlink operation and the 6 delayed references for each one of those
items. However the calculation for the delayed references is not correct
in case we have the free space tree enabled, as in that case we need to
touch the free space tree as well and therefore need twice the number of
bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the
number of bytes need for the delayed references at
btrfs_update_global_block_rsv().
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Diogo: this patch has been cherry-picked from the original commit;
conflicts included lack of a define (picked from commit 5630e2bcfe223)
and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97)
- changed const struct -> struct for compatibility.]
Signed-off-by: Diogo Jahchan Koike <djahchankoike(a)gmail.com>
---
Fixes WARNING in btrfs_chunk_alloc (2) [0]
[0]: https://syzkaller.appspot.com/bug?extid=57de2b05959bc1e659af
---
fs/btrfs/block-rsv.c | 16 +++++++++-------
fs/btrfs/block-rsv.h | 12 ++++++++++++
fs/btrfs/delayed-ref.h | 21 +++++++++++++++++++++
3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index ec96285357e0..f7e3d6c2e302 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -378,17 +378,19 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
/*
* But we also want to reserve enough space so we can do the fallback
- * global reserve for an unlink, which is an additional 5 items (see the
- * comment in __unlink_start_trans for what we're modifying.)
+ * global reserve for an unlink, which is an additional
+ * BTRFS_UNLINK_METADATA_UNITS items.
*
* But we also need space for the delayed ref updates from the unlink,
- * so its 10, 5 for the actual operation, and 5 for the delayed ref
- * updates.
+ * so add BTRFS_UNLINK_METADATA_UNITS units for delayed refs, one for
+ * each unlink metadata item.
*/
- min_items += 10;
-
+ min_items += BTRFS_UNLINK_METADATA_UNITS;
+
num_bytes = max_t(u64, num_bytes,
- btrfs_calc_insert_metadata_size(fs_info, min_items));
+ btrfs_calc_insert_metadata_size(fs_info, min_items) +
+ btrfs_calc_delayed_ref_bytes(fs_info,
+ BTRFS_UNLINK_METADATA_UNITS));
spin_lock(&sinfo->lock);
spin_lock(&block_rsv->lock);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 578c3497a455..662c52b4bd44 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -50,6 +50,18 @@ struct btrfs_block_rsv {
u64 qgroup_rsv_reserved;
};
+/*
+ * Number of metadata items necessary for an unlink operation:
+ *
+ * 1 for the possible orphan item
+ * 1 for the dir item
+ * 1 for the dir index
+ * 1 for the inode ref
+ * 1 for the inode
+ * 1 for the parent inode
+ */
+#define BTRFS_UNLINK_METADATA_UNITS 6
+
void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, enum btrfs_rsv_type type);
void btrfs_init_root_block_rsv(struct btrfs_root *root);
struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index d6304b690ec4..5c6bb23d45a5 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -253,6 +253,27 @@ extern struct kmem_cache *btrfs_delayed_extent_op_cachep;
int __init btrfs_delayed_ref_init(void);
void __cold btrfs_delayed_ref_exit(void);
+static inline u64 btrfs_calc_delayed_ref_bytes(struct btrfs_fs_info *fs_info,
+ int num_delayed_refs)
+{
+ u64 num_bytes;
+
+ num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_delayed_refs);
+
+ /*
+ * We have to check the mount option here because we could be enabling
+ * the free space tree for the first time and don't have the compat_ro
+ * option set yet.
+ *
+ * We need extra reservations if we have the free space tree because
+ * we'll have to modify that tree as well.
+ */
+ if (btrfs_test_opt(fs_info, FREE_SPACE_TREE))
+ num_bytes *= 2;
+
+ return num_bytes;
+}
+
static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref,
int action, u64 bytenr, u64 len, u64 parent)
{
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b4cd80b0338945a94972ac3ed54f8338d2da2076
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091330-reps-craftsman-ab67@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
b4cd80b03389 ("mptcp: pm: Fix uaf in __timer_delete_sync")
d58300c3185b ("mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer")
f7dafee18538 ("mptcp: use mptcp_addr_info in mptcp_options_received")
dc87efdb1a5c ("mptcp: add mptcp reset option support")
557963c383e8 ("mptcp: move to next addr when subflow creation fail")
d88c476f4a7d ("mptcp: export lookup_anno_list_by_saddr")
efd13b71a3fa ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b4cd80b0338945a94972ac3ed54f8338d2da2076 Mon Sep 17 00:00:00 2001
From: Edward Adam Davis <eadavis(a)qq.com>
Date: Tue, 10 Sep 2024 17:58:56 +0800
Subject: [PATCH] mptcp: pm: Fix uaf in __timer_delete_sync
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f891bc714668..ad935d34c973 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -334,15 +334,21 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
+ if (!check_id && entry)
+ list_del(&entry->list);
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1462,7 +1468,6 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
- list_del(&entry->list);
kfree(entry);
return true;
}
Hello stable-team,
Francesco Dolcini reported a regression on v6.6.52 [1], it turns out
since v6.6.51~34 a.k.a. 5ea24ddc26a7 ("can: mcp251xfd: rx: add
workaround for erratum DS80000789E 6 of mcp2518fd") the mcp25xfd driver
is broken.
Cherry picking the following commits fixes the problem:
51b2a7216122 ("can: mcp251xfd: properly indent labels")
a7801540f325 ("can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()")
Please pick these patches for
- 6.10.x
- 6.6.x
- 6.1.x
[1] https://lore.kernel.org/all/20240923115310.GA138774@francesco-nb
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
Hi Greg, Sasha,
This batch contains a backport for fixes for 5.10-stable:
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump")
2) efefd4f00c96 ("netfilter: nf_tables: missing iterator type in lookup walk")
Please, apply,
Thanks
Pablo Neira Ayuso (2):
netfilter: nft_set_pipapo: walk over current view on netlink dump
netfilter: nf_tables: missing iterator type in lookup walk
include/net/netfilter/nf_tables.h | 13 +++++++++++++
net/netfilter/nf_tables_api.c | 5 +++++
net/netfilter/nft_lookup.c | 1 +
net/netfilter/nft_set_pipapo.c | 6 ++++--
4 files changed, 23 insertions(+), 2 deletions(-)
--
2.30.2
Hi,
Can you please bring this commit into linux-6.11.y?
commit c35fad6f7e0d ("ASoC: amd: acp: add ZSC control register
programming sequence")
This helps with power consumption on the affected platforms.
Thanks,
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Sligth
modifications are applied to the context of the patches for cleanly
applying.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (2):
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/internal.h | 14 ++++++
fs/namespace.c | 13 ------
fs/stat.c | 123 ++++++++++++++++++++++++++++++++++++-----------------
include/linux/fs.h | 17 ++++++++
4 files changed, 116 insertions(+), 51 deletions(-)
---
base-commit: 049be94099ea5ba8338526c5a4f4f404b9dcaf54
change-id: 20240918-statx-stable-linux-6-10-y-17c3ec7497c8
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Hello,
please consider picking up:
7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
and its followup fix,
7052622fccb1 ("netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()")
It should cherry-pick fine for 6.1 and later.
I'm not sure a 5.15 backport is worth it, as its not a crash fix and
noone has reported this problem so far with a 5.15 kernel.
Hello again,
Here is the next set of XFS backports, this set is for 6.1.y and I will
be following up with a set for 5.15.y later. There were some good
suggestions made at LSF to survey test coverage to cut back on
testing but I've been a bit swamped and a backport set was overdue.
So for this set, I have run the auto group 3 x 8 configs with no
regressions seen. Let me know if you spot any issues.
This set has already been ack'd on the XFS list.
Thanks,
Leah
https://lkml.iu.edu/hypermail/linux/kernel/2212.0/04860.html
52f31ed22821
[1/1] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
https://lore.kernel.org/linux-xfs/ef8a958d-741f-5bfd-7b2f-db65bf6dc3ac@huaw…
4da112513c01
[1/1] xfs: Fix deadlock on xfs_inodegc_worker
https://www.spinics.net/lists/linux-xfs/msg68547.html
601a27ea09a3
[1/1] xfs: fix extent busy updating
https://www.spinics.net/lists/linux-xfs/msg67254.html
c85007e2e394
[1/1] xfs: don't use BMBT btree split workers for IO completion
fixes from start of the series
https://lore.kernel.org/linux-xfs/20230209221825.3722244-1-david@fromorbit.…
[00/42] xfs: per-ag centric allocation alogrithms
1dd0510f6d4b
[01/42] xfs: fix low space alloc deadlock
f08f984c63e9
[02/42] xfs: prefer free inodes at ENOSPC over chunk allocation
d5753847b216
[03/42] xfs: block reservation too large for minleft allocation
https://lore.kernel.org/linux-xfs/Y+Z7TZ9o+KgXLcV8@magnolia/
60b730a40c43
[1/1] xfs: fix uninitialized variable access
https://lore.kernel.org/linux-xfs/20230228051250.1238353-1-david@fromorbit.…
0c7273e494dd
[1/1] xfs: quotacheck failure can race with background inode inactivation
https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
8ee81ed581ff
[1/1] xfs: fix BUG_ON in xfs_getbmap()
https://www.spinics.net/lists/linux-xfs/msg71062.html
[0/4] xfs: bug fixes for 6.4-rc1
[1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
skip, fix for a commit from 6.3
8e698ee72c4e
[2/4] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
[3/4] xfs: flush dirty data and drain directios before scrubbing cow fork
skip, scrub
[4/4] xfs: don't allocate into the data fork for an unshare request
skip, more of an optimization than a fix
1bba82fe1afa
[5/4] xfs: fix negative array access in xfs_getbmap
(fix of 8ee81ed)
https://lore.kernel.org/linux-xfs/20230517000449.3997582-1-david@fromorbit.…
[0/4] xfs: bug fixes for 6.4-rcX
89a4bf0dc385
[1/4] xfs: buffer pins need to hold a buffer reference
[2/4] xfs: restore allocation trylock iteration
skip, for issue introduced in 6.3
cb042117488d
[3/4] xfs: defered work could create precommits
(dependency for patch 4)
82842fee6e59
[4/4] xfs: fix AGF vs inode cluster buffer deadlock
https://lore.kernel.org/linux-xfs/20240612225148.3989713-1-david@fromorbit.…
348a1983cf4c
(fix for 82842fee6e5)
[1/1] xfs: fix unlink vs cluster buffer instantiation race
https://lore.kernel.org/linux-xfs/20230530001928.2967218-1-david@fromorbit.…
d4d12c02b
[1/1] xfs: collect errors from inodegc for unlinked inode recovery
https://lore.kernel.org/linux-xfs/20230524121041.GA4128075@ceph-admin/
c3b880acadc9
[1/1] xfs: fix ag count overflow during growfs
4b827b3f305d
[1/1] xfs: remove WARN when dquot cache insertion fails
requested on list to reduce bot noise
https://www.spinics.net/lists/linux-xfs/msg73214.html
5cf32f63b0f4
[1/2] xfs: fix the calculation for "end" and "length"
[2/2] introduces new feature, skipping
https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.fnst@fujit…
3c90c01e4934
[1/1] xfs: correct calculation for agend and blockcount
(fixes 5cf32)
https://lore.kernel.org/all/20230901160020.GT28186@frogsfrogsfrogs/
68b957f64fca
[1/1] xfs: load uncached unlinked inodes into memory on demand
https://www.spinics.net/lists/linux-xfs/msg74960.html
[0/3] xfs: reload entire iunlink lists
f12b96683d69
[1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
83771c50e42b
[2/3] xfs: reload entire unlinked bucket lists
(dependency of 49813a21ed)
49813a21ed57
[3/3] xfs: make inode unlinked bucket recovery work with quotacheck
(dependency for 537c013)
https://lore.kernel.org/all/169565629026.1982077.12646061547002741492.stgit…
537c013b140d
[1/1] xfs: fix reloading entire unlinked bucket lists
(fix of 68b957f64fca)
Darrick J. Wong (8):
xfs: fix uninitialized variable access
xfs: load uncached unlinked inodes into memory on demand
xfs: fix negative array access in xfs_getbmap
xfs: use i_prev_unlinked to distinguish inodes that are not on the
unlinked list
xfs: reload entire unlinked bucket lists
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: fix reloading entire unlinked bucket lists
xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
Dave Chinner (12):
xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
xfs: don't use BMBT btree split workers for IO completion
xfs: fix low space alloc deadlock
xfs: prefer free inodes at ENOSPC over chunk allocation
xfs: block reservation too large for minleft allocation
xfs: quotacheck failure can race with background inode inactivation
xfs: buffer pins need to hold a buffer reference
xfs: defered work could create precommits
xfs: fix AGF vs inode cluster buffer deadlock
xfs: collect errors from inodegc for unlinked inode recovery
xfs: remove WARN when dquot cache insertion fails
xfs: fix unlink vs cluster buffer instantiation race
Long Li (1):
xfs: fix ag count overflow during growfs
Shiyang Ruan (2):
xfs: fix the calculation for "end" and "length"
xfs: correct calculation for agend and blockcount
Wengang Wang (1):
xfs: fix extent busy updating
Wu Guanghao (1):
xfs: Fix deadlock on xfs_inodegc_worker
Ye Bin (1):
xfs: fix BUG_ON in xfs_getbmap()
fs/xfs/libxfs/xfs_ag.c | 19 ++-
fs/xfs/libxfs/xfs_alloc.c | 69 +++++++--
fs/xfs/libxfs/xfs_bmap.c | 16 +-
fs/xfs/libxfs/xfs_bmap.h | 2 +
fs/xfs/libxfs/xfs_bmap_btree.c | 19 ++-
fs/xfs/libxfs/xfs_btree.c | 18 ++-
fs/xfs/libxfs/xfs_fs.h | 2 +
fs/xfs/libxfs/xfs_ialloc.c | 17 +++
fs/xfs/libxfs/xfs_log_format.h | 9 +-
fs/xfs/libxfs/xfs_trans_inode.c | 113 +-------------
fs/xfs/xfs_attr_inactive.c | 1 -
fs/xfs/xfs_bmap_util.c | 18 +--
fs/xfs/xfs_buf_item.c | 88 ++++++++---
fs/xfs/xfs_dquot.c | 1 -
fs/xfs/xfs_export.c | 14 ++
fs/xfs/xfs_extent_busy.c | 1 +
fs/xfs/xfs_fsmap.c | 1 +
fs/xfs/xfs_fsops.c | 13 +-
fs/xfs/xfs_icache.c | 58 +++++--
fs/xfs/xfs_icache.h | 4 +-
fs/xfs/xfs_inode.c | 260 ++++++++++++++++++++++++++++----
fs/xfs/xfs_inode.h | 36 ++++-
fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++
fs/xfs/xfs_inode_item.h | 1 +
fs/xfs/xfs_itable.c | 11 ++
fs/xfs/xfs_log_recover.c | 19 ++-
fs/xfs/xfs_mount.h | 11 +-
fs/xfs/xfs_notify_failure.c | 15 +-
fs/xfs/xfs_qm.c | 72 ++++++---
fs/xfs/xfs_super.c | 1 +
fs/xfs/xfs_trace.h | 46 ++++++
fs/xfs/xfs_trans.c | 9 +-
32 files changed, 841 insertions(+), 272 deletions(-)
--
2.46.0.792.g87dc391469-goog
From: Tony Luck <tony.luck(a)intel.com>
[ Upstream commit 2eda374e883ad297bd9fe575a16c1dc850346075 ]
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
[ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL()
instead of X86_MATCH_VFM() as in the upstream commit. ]
Signed-off-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
---
I tested this backport on an Alder Lake system. Now pr_info("Incomplete
global flushes, disabling PCID") is back in dmesg. I also tested on a
Meteor Lake system, which is not affected by the INVLPG issue. The message
in question is not there before and after the backport, as expected.
This backport fixes the last remaining caller of x86_match_cpu()
that does not use the family of X86_MATCH_*() macros.
Thomas Lindroth intially reported the regression in
https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Maybe Tony and/or the x86 maintainers can ack this backport?
---
arch/x86/mm/init.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6215dfa23578 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
}
}
-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GRACEMONT, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, 0),
{}
};
--
2.34.1
Hi,
Upstream commit 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") introduced a flags member to struct x86_cpu_id. Bit 0
of x86_cpu.id.flags must be set to 1 for x86_match_cpu() to work correctly.
This upstream commit has been backported to 6.1.y.
Callers that use the X86_MATCH_*() family of macros to compose the argument
of x86_match_cpu() function correctly. Callers that use their own custom
mechanisms may not work if they fail to set x86_cpu_id.flags correctly.
There are two remaining callers in 6.1.y that use their own mechanisms:
setup_pcid() and rapl_msr_probe(). The former caused a regression that
Thomas Lindroth reported in [1]. The latter works by luck but it still
populates its x86_cpu_id[] array incorrectly.
I backported two patches that fix these bugs in mainline. Hopefully the
authors and/or maintainers can ack the backports?
I tested these patches in Alder Lake and Meteor Lake systems as the two
involved callers behave differently on these two systems. I wrote the
testing details in each patch separately.
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Sumeet Pawnikar (1):
powercap: RAPL: fix invalid initialization for pl4_supported field
Tony Luck (1):
x86/mm: Switch to new Intel CPU model defines
arch/x86/mm/init.c | 16 ++++++----------
drivers/powercap/intel_rapl_msr.c | 12 ++++++------
2 files changed, 12 insertions(+), 16 deletions(-)
--
2.34.1
The `LockedBy::access` method only requires a shared reference to the
owner, so if we have shared access to the `LockedBy` from several
threads at once, then two threads could call `access` in parallel and
both obtain a shared reference to the inner value. Thus, require that
`T: Sync` when calling the `access` method.
An alternative is to require `T: Sync` in the `impl Sync for LockedBy`.
This patch does not choose that approach as it gives up the ability to
use `LockedBy` with `!Sync` types, which is okay as long as you only use
`access_mut`.
Cc: stable(a)vger.kernel.org
Fixes: 7b1f55e3a984 ("rust: sync: introduce `LockedBy`")
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Changes in v2:
- Use a `where T: Sync` on `access` instead of changing `impl Sync for
LockedBy`.
- Link to v1: https://lore.kernel.org/r/20240912-locked-by-sync-fix-v1-1-26433cbccbd2@goo…
---
rust/kernel/sync/locked_by.rs | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/sync/locked_by.rs b/rust/kernel/sync/locked_by.rs
index babc731bd5f6..ce2ee8d87865 100644
--- a/rust/kernel/sync/locked_by.rs
+++ b/rust/kernel/sync/locked_by.rs
@@ -83,8 +83,12 @@ pub struct LockedBy<T: ?Sized, U: ?Sized> {
// SAFETY: `LockedBy` can be transferred across thread boundaries iff the data it protects can.
unsafe impl<T: ?Sized + Send, U: ?Sized> Send for LockedBy<T, U> {}
-// SAFETY: `LockedBy` serialises the interior mutability it provides, so it is `Sync` as long as the
-// data it protects is `Send`.
+// SAFETY: If `T` is not `Sync`, then parallel shared access to this `LockedBy` allows you to use
+// `access_mut` to hand out `&mut T` on one thread at the time. The requirement that `T: Send` is
+// sufficient to allow that.
+//
+// If `T` is `Sync`, then the `access` method also becomes available, which allows you to obtain
+// several `&T` from several threads at once. However, this is okay as `T` is `Sync`.
unsafe impl<T: ?Sized + Send, U: ?Sized> Sync for LockedBy<T, U> {}
impl<T, U> LockedBy<T, U> {
@@ -118,7 +122,10 @@ impl<T: ?Sized, U> LockedBy<T, U> {
///
/// Panics if `owner` is different from the data protected by the lock used in
/// [`new`](LockedBy::new).
- pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
+ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T
+ where
+ T: Sync,
+ {
build_assert!(
size_of::<U>() > 0,
"`U` cannot be a ZST because `owner` wouldn't be unique"
@@ -127,7 +134,10 @@ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
panic!("mismatched owners");
}
- // SAFETY: `owner` is evidence that the owner is locked.
+ // SAFETY: `owner` is evidence that there are only shared references to the owner for the
+ // duration of 'a, so it's not possible to use `Self::access_mut` to obtain a mutable
+ // reference to the inner value that aliases with this shared reference. The type is `Sync`
+ // so there are no other requirements.
unsafe { &*self.data.get() }
}
---
base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2
change-id: 20240912-locked-by-sync-fix-07193df52f98
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
The quilt patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been removed from the -mm tree. Its filename was
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
The quilt patch titled
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
has been removed from the -mm tree. Its filename was
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
Date: Mon, 23 Sep 2024 10:38:36 +0500
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included. It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323. Fix the header to asm/unistd.h.
Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pagemap_ioctl.c~kselftests-mm-fix-wrong-__nr_userfaultfd-value
+++ a/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
The quilt patch titled
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
has been removed from the -mm tree. Its filename was
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
Date: Tue, 24 Sep 2024 14:27:10 +0800
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/compiler.h~compilerh-specify-correct-attribute-for-rodatac_jump_table
+++ a/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".rodata..c_jump_table,\"a\",@progbits #")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
The quilt patch titled
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
has been removed from the -mm tree. Its filename was
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
Date: Tue, 24 Sep 2024 09:32:57 +0000
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.c…
Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/extent_map.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c~ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode
+++ a/fs/ocfs2/extent_map.c
@@ -973,7 +973,13 @@ int ocfs2_read_virt_blocks(struct inode
}
while (done < nr) {
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ rc = -EAGAIN;
+ mlog(ML_ERROR,
+ "Inode #%llu ip_alloc_sem is temporarily unavailable\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno);
+ break;
+ }
rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
&p_block, &p_count, NULL);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
_
Patches currently in -mm which might be from pvmohammedanees2003(a)gmail.com are
ocfs2-fix-typo-in-comment.patch
The quilt patch titled
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
has been removed from the -mm tree. Its filename was
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
Date: Wed, 18 Sep 2024 06:38:44 +0000
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@o…
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/ocfs2/refcounttree.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4148,8 +4149,9 @@ static int __ocfs2_reflink(struct dentry
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4175,6 +4177,26 @@ static int __ocfs2_reflink(struct dentry
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4182,7 +4204,7 @@ static int __ocfs2_reflink(struct dentry
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
--- a/fs/ocfs2/xattr.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(st
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
_
Patches currently in -mm which might be from gautham.ananthakrishna(a)oracle.com are
The quilt patch titled
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
has been removed from the -mm tree. Its filename was
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
Date: Tue, 24 Sep 2024 22:00:53 +0900
I found a report from syzbot [1]
This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.
Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.
[1]
==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
page_cache_delete mm/filemap.c:153 [inline]
__filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
truncate_inode_folio+0x42/0x50 mm/truncate.c:178
shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
shmem_truncate_range mm/shmem.c:1144 [inline]
shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
evict+0x2f0/0x580 fs/inode.c:731
iput_final fs/inode.c:1883 [inline]
iput+0x42a/0x5b0 fs/inode.c:1909
dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
__dentry_kill+0x18b/0x4c0 fs/dcache.c:615
dput+0x5c/0xd0 fs/dcache.c:857
__fput+0x3fb/0x6d0 fs/file_table.c:439
____fput+0x1c/0x30 fs/file_table.c:459
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
__folio_test_movable include/linux/page-flags.h:699 [inline]
migrate_folio_unmap mm/migrate.c:1199 [inline]
migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
migrate_pages_sync mm/migrate.c:1963 [inline]
migrate_pages+0xff1/0x1820 mm/migrate.c:2072
do_mbind mm/mempolicy.c:1390 [inline]
kernel_mbind mm/mempolicy.c:1533 [inline]
__do_sys_mbind mm/mempolicy.c:1607 [inline]
__se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
__x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0xffff888127601078 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-annotate-data-race-in-migrate_folio_unmap
+++ a/mm/migrate.c
@@ -1196,7 +1196,7 @@ static int migrate_folio_unmap(new_folio
int rc = -EAGAIN;
int old_page_state = 0;
struct anon_vma *anon_vma = NULL;
- bool is_lru = !__folio_test_movable(src);
+ bool is_lru = data_race(!__folio_test_movable(src));
bool locked = false;
bool dst_locked = false;
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
mm-percpu-fix-typo-to-pcpu_alloc_noprof-description.patch
mm-shmem-fix-data-race-in-shmem_getattr.patch
The quilt patch titled
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
has been removed from the -mm tree. Its filename was
mm-hugetlb-simplify-refs-in-memfd_alloc_folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
Date: Wed, 4 Sep 2024 12:41:08 -0700
The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
delete the matching folio_put in memfd_pin_folios. This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails. That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio.
Delete the latter.
This is a continuation of the fix
"mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
[steven.sistare(a)oracle.com: remove explicit call to free_huge_folio(), per Matthew]
Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@…
Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +---
mm/memfd.c | 3 +--
2 files changed, 2 insertions(+), 5 deletions(-)
--- a/mm/gup.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h = NULL;
+ struct hstate *h;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,8 +3659,6 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
- if (h)
- folio_put(folio);
folio = NULL;
}
--- a/mm/memfd.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/memfd.c
@@ -89,13 +89,12 @@ struct folio *memfd_alloc_folio(struct f
numa_node_id(),
NULL,
gfp_mask);
- if (folio && folio_try_get(folio)) {
+ if (folio) {
err = hugetlb_add_to_page_cache(folio,
memfd->f_mapping,
idx);
if (err) {
folio_put(folio);
- free_huge_folio(folio);
return ERR_PTR(err);
}
folio_unlock(folio);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios alloc race panic
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-alloc-race-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios alloc race panic
Date: Tue, 3 Sep 2024 07:25:21 -0700
If memfd_pin_folios tries to create a hugetlb page, but someone else
already did, then folio gets the value -EEXIST here:
folio = memfd_alloc_folio(memfd, start_idx);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
then on the next trip through the "while start_idx" loop we panic here:
if (folio) {
folio_put(folio);
To fix, set the folio to NULL on error.
Link: https://lkml.kernel.org/r/1725373521-451395-6-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/gup.c~mm-gup-fix-memfd_pin_folios-alloc-race-panic
+++ a/mm/gup.c
@@ -3702,6 +3702,7 @@ long memfd_pin_folios(struct file *memfd
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
+ folio = NULL;
}
}
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
Date: Tue, 3 Sep 2024 07:25:20 -0700
When memfd_pin_folios -> memfd_alloc_folio creates a hugetlb page, the
index is wrong. The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever. To fix,
adjust the index for the huge_page_order.
memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.
Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/memfd.c~mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation
+++ a/mm/memfd.c
@@ -79,10 +79,13 @@ struct folio *memfd_alloc_folio(struct f
* alloc from. Also, the folio will be pinned for an indefinite
* amount of time, so it is not expected to be migrated away.
*/
- gfp_mask = htlb_alloc_mask(hstate_file(memfd));
+ struct hstate *h = hstate_file(memfd);
+
+ gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
+ idx >>= huge_page_order(h);
- folio = alloc_hugetlb_folio_reserve(hstate_file(memfd),
+ folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
gfp_mask);
@@ -95,6 +98,7 @@ struct folio *memfd_alloc_folio(struct f
free_huge_folio(folio);
return ERR_PTR(err);
}
+ folio_unlock(folio);
return folio;
}
return ERR_PTR(-ENOMEM);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
Date: Tue, 3 Sep 2024 07:25:18 -0700
memfd_pin_folios followed by unpin_folios fails to restore free_huge_pages
if the pages were not already faulted in, because the folio refcount for
pages created by memfd_alloc_folio never goes to 0. memfd_pin_folios
needs another folio_put to undo the folio_try_get below:
memfd_alloc_folio()
alloc_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
folio_ref_unfreeze(folio, 1); ; adds 1 refcount
folio_try_get() ; adds 1 refcount
hugetlb_add_to_page_cache() ; adds 512 refcount (on x86)
With the fix, after memfd_pin_folios + unpin_folios, the refcount for the
(unfaulted) page is 512, which is correct, as the refcount for a faulted
unpinned page is 513.
Link: https://lkml.kernel.org/r/1725373521-451395-3-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h;
+ struct hstate *h = NULL;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,6 +3659,8 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
+ if (h)
+ folio_put(folio);
folio = NULL;
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
has been removed from the -mm tree. Its filename was
mm-filemap-fix-filemap_get_folios_contig-thp-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
Date: Tue, 3 Sep 2024 07:25:17 -0700
Patch series "memfd-pin huge page fixes".
Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
pages and THP. The hugetlb bugs only bite when the page is not yet
faulted in when memfd_pin_folios is called. The THP bug bites when the
starting offset passed to memfd_pin_folios is not huge page aligned. See
the commit messages for details.
This patch (of 5):
memfd_pin_folios on memory backed by THP panics if the requested start
offset is not huge page aligned:
BUG: kernel NULL pointer dereference, address: 0000000000000036
RIP: 0010:filemap_get_folios_contig+0xdf/0x290
RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002
The fault occurs here, because xas_load returns a folio with value 2:
filemap_get_folios_contig()
for (folio = xas_load(&xas); folio && xas.xa_index <= end;
folio = xas_next(&xas)) {
...
if (!folio_try_get(folio)) <-- BOOM
"2" is an xarray sibling entry. We get it because memfd_pin_folios does
not round the indices passed to filemap_get_folios_contig to huge page
boundaries for THP, so we load from the middle of a huge page range see a
sibling. (It does round for hugetlbfs, at the is_file_hugepages test).
To fix, if the folio is a sibling, then return the next index as the
starting point for the next call to filemap_get_folios_contig.
Link: https://lkml.kernel.org/r/1725373521-451395-1-git-send-email-steven.sistare…
Link: https://lkml.kernel.org/r/1725373521-451395-2-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/filemap.c~mm-filemap-fix-filemap_get_folios_contig-thp-panic
+++ a/mm/filemap.c
@@ -2196,6 +2196,10 @@ unsigned filemap_get_folios_contig(struc
if (xa_is_value(folio))
goto update_start;
+ /* If we landed in the middle of a THP, continue at its end. */
+ if (xa_is_sibling(folio))
+ goto update_start;
+
if (!folio_try_get(folio))
goto retry;
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
nfs41_init_clientid does not signal a failure condition from
nfs4_proc_exchange_id and nfs4_proc_create_session to a client which may
lead to mount syscall indefinitely blocked in the following stack trace:
nfs_wait_client_init_complete
nfs41_discover_server_trunking
nfs4_discover_server_trunking
nfs4_init_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
and the client stuck in uninitialized state.
In addition to this all subsequent mount calls would also get blocked in
nfs_match_client waiting for the uninitialized client to finish
initialization:
nfs_wait_client_init_complete
nfs_match_client
nfs_get_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
To avoid this situation propagate error condition to the mount thread
and let mount syscall fail properly.
Signed-off-by: Oleksandr Tymoshenko <ovt(a)google.com>
---
fs/nfs/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..54ad3440ad2b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -335,8 +335,8 @@ int nfs41_init_clientid(struct nfs_client *clp, const struct cred *cred)
if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R))
nfs4_state_start_reclaim_reboot(clp);
nfs41_finish_session_reset(clp);
- nfs_mark_client_ready(clp, NFS_CS_READY);
out:
+ nfs_mark_client_ready(clp, status == 0 ? NFS_CS_READY : status);
return status;
}
---
base-commit: ad618736883b8970f66af799e34007475fe33a68
change-id: 20240906-nfs-mount-deadlock-fix-55c14b38e088
Best regards,
--
Oleksandr Tymoshenko <ovt(a)google.com>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: d4fc4d01471528da8a9797a065982e05090e1d81
Gitweb: https://git.kernel.org/tip/d4fc4d01471528da8a9797a065982e05090e1d81
Author: Alexey Gladkov (Intel) <legion(a)kernel.org>
AuthorDate: Fri, 13 Sep 2024 19:05:56 +02:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 26 Sep 2024 09:45:04 -07:00
x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
---
arch/x86/coco/tdx/tdx.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66d..327c45c 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
This series brings fixes and updates to ov08x40 which allows for use of
this sensor on the Qualcomm x1e80100 CRD but also on any other dts based
system.
Firstly there's a fix for the pseudo burst mode code that was added in
8f667d202384 ("media: ov08x40: Reduce start streaming time"). Not every I2C
controller can handle an arbitrary sized write, this is the case on
Qualcomm CAMSS/CCI I2C sensor interfaces which limit the transaction size
and communicate this limit via I2C quirks. A simple fix to optionally break
up the large submitted burst into chunks not exceeding adapter->quirk size
fixes.
Secondly then is addition of a yaml description for the ov08x40 and
extension of the driver to support OF probe and powering on of the power
rails from the driver instead of from ACPI.
Once done the sensor works without further modification on the Qualcomm
x1e80100 CRD.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
media: ov08x40: Fix burst write sequence
media: dt-bindings: Add OmniVision OV08X40
media: ov08x40: Rename ext_clk to xvclk
media: ov08x40: Add OF probe support
.../bindings/media/i2c/ovti,ov08x40.yaml | 130 ++++++++++++++++
drivers/media/i2c/ov08x40.c | 172 ++++++++++++++++++---
2 files changed, 283 insertions(+), 19 deletions(-)
---
base-commit: 2b7275670032a98cba266bd1b8905f755b3e650f
change-id: 20240926-b4-master-24-11-25-ov08x40-c6f477aaa6a4
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 75 ++-----------------------------------------
2 files changed, 11 insertions(+), 76 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..cf1dfeaaf2d8 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,11 +3631,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
- }
+ if (rc->backref_cache.last_trans != trans->transid)
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
ret = find_next_extent(rc, path, &key);
if (ret < 0)
--
2.43.0
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 76 +++----------------------------------------
2 files changed, 13 insertions(+), 75 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..aaa9cac213f1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,10 +3631,11 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
+ if (rc->backref_cache.last_trans == 0) {
+ rc->backref_cache.last_trans = trans->transid;
+ } else if (rc->backref_cache.last_trans != trans->transid) {
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
}
ret = find_next_extent(rc, path, &key);
--
2.43.0
These are all fixes for the frozen notification patch [1], which as of
today hasn't landed in mainline yet. As such, this patchset is rebased
on top of the char-misc-next branch.
[1] https://lore.kernel.org/all/20240709070047.4055369-2-yutingtseng@google.com/
Cc: stable(a)vger.kernel.org
Cc: Yu-Ting Tseng <yutingtseng(a)google.com>
Cc: Alice Ryhl <aliceryhl(a)google.com>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Martijn Coenen <maco(a)google.com>
Cc: Arve Hjønnevåg <arve(a)android.com>
Cc: Viktor Martensson <vmartensson(a)google.com>
Carlos Llamas (4):
binder: fix node UAF in binder_add_freeze_work()
binder: fix OOB in binder_add_freeze_work()
binder: fix freeze UAF in binder_release_work()
binder: fix BINDER_WORK_FROZEN_BINDER debug logs
drivers/android/binder.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--
2.46.0.792.g87dc391469-goog
Please consider adding the following commit to v6.6:
83bdfcbdbe5d ("nvme-pci: qdepth 1 quirk")
This resolves filesystem corruption and random drive drop-outs on machines with O2micro NVME drives, including a number of HP Thinclients.
Alex
Move the "l3_cache" node under the "cpus" node in the dtsi file for Rockchip
RK356x SoCs. There's no need for this cache node to be at the higher level.
Fixes: 8612169a05c5 ("arm64: dts: rockchip: Add cache information to the SoC dtsi for RK356x")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
arch/arm64/boot/dts/rockchip/rk356x.dtsi | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
index 4690be841a1c..9f7136e5d553 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
@@ -113,19 +113,19 @@ cpu3: cpu@300 {
d-cache-sets = <128>;
next-level-cache = <&l3_cache>;
};
- };
- /*
- * There are no private per-core L2 caches, but only the
- * L3 cache that appears to the CPU cores as L2 caches
- */
- l3_cache: l3-cache {
- compatible = "cache";
- cache-level = <2>;
- cache-unified;
- cache-size = <0x80000>;
- cache-line-size = <64>;
- cache-sets = <512>;
+ /*
+ * There are no private per-core L2 caches, but only the
+ * L3 cache that appears to the CPU cores as L2 caches
+ */
+ l3_cache: l3-cache {
+ compatible = "cache";
+ cache-level = <2>;
+ cache-unified;
+ cache-size = <0x80000>;
+ cache-line-size = <64>;
+ cache-sets = <512>;
+ };
};
cpu0_opp_table: opp-table-0 {
From: Willem de Bruijn <willemb(a)google.com>
Detect gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.
In extreme cases they pull all data into skb linear. For UDP, this
causes a NULL ptr deref in __udpv4_gso_segment_list_csum at
udp_hdr(seg->next)->dest.
Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.
Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediate…
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Cc: stable(a)vger.kernel.org
---
Tested:
- tools/testing/selftests/net/udpgro_fwd.sh
- kunit gso_test_func converted to calling __udp_gso_segment
- below manual end-to-end test:
(which probably repeats a lot of udpgro_fwd.sh, in hindsight..)
(won't repeat this on any resubmits, given how long it is)
#!/bin/bash
ip netns add test1
ip netns add test2
ip netns add test3
ip link add dev veth0 netns test1 type veth peer name veth0 netns test2
ip link add dev veth1 netns test2 type veth peer name veth1 netns test3
ip netns exec test1 ip link set dev veth0 up
ip netns exec test2 ip link set dev veth0 up
ip netns exec test2 ip link set dev veth1 up
ip netns exec test3 ip link set dev veth1 up
ip netns exec test1 ip addr add 10.0.8.1/24 dev veth0
ip netns exec test2 ip addr add 10.0.8.2/24 dev veth0
ip netns exec test2 ip addr add 10.0.9.2/24 dev veth1
ip netns exec test3 ip addr add 10.0.9.3/24 dev veth1
ip -6 -netns test1 addr add fdaa::1 dev veth0
ip -6 -netns test2 addr add fdaa::2 dev veth0
ip -6 -netns test2 addr add fdbb::2 dev veth1
ip -6 -netns test3 addr add fdbb::3 dev veth1
ip netns exec test2 sysctl -w net.ipv4.ip_forward=1
ip netns exec test2 sysctl -w net.ipv6.conf.all.forwarding=1
ip -netns test1 route add default via 10.0.8.2
ip -netns test3 route add default via 10.0.9.2
ip -6 -netns test1 route add fdaa::2 dev veth0
ip -6 -netns test2 route add fdaa::1 dev veth0
ip -6 -netns test2 route add fdbb::3 dev veth1
ip -6 -netns test3 route add fdbb::2 dev veth1
ip -6 -netns test1 route add default via fdaa::2
ip -6 -netns test3 route add default via fdbb::2
ip netns exec test1 ethtool -K veth0 gso off tx-udp-segmentation off
ip netns exec test2 ethtool -K veth0 gro on rx-gro-list on rx-udp-gro-forwarding on
ip netns exec test2 ethtool -K veth1 gso off tx-udp-segmentation off
ip netns exec test2 tc qdisc add dev veth0 clsact
ip netns exec test2 tc filter add dev veth0 ingress bpf direct-action obj tc_pull.o sec tc
ip netns exec test3 /mnt/shared/udpgso_bench_rx & \
ip netns exec test1 /mnt/shared/udpgso_bench_tx -l 5 -4 -D 10.0.9.3 -s
60000 -S 0 -z
ip netns exec test3 /mnt/shared/udpgso_bench_rx & \
ip netns exec test1 /mnt/shared/udpgso_bench_tx -l 5 -6 -D fdbb::3 -s
60000 -S 0 -z
With trivial BPF program:
$ cat ~/work/tc_pull.c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>
__attribute__((section("tc")))
int tc_cls_prog(struct __sk_buff *skb) {
bpf_skb_pull_data(skb, skb->len);
return TC_ACT_OK;
}
---
net/ipv4/udp_offload.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index d842303587af..e457fa9143a6 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -296,8 +296,16 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return NULL;
}
- if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
- return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+ if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) {
+ /* Detect modified geometry and pass these to skb_segment. */
+ if (skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size)
+ return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+
+ /* Setup csum, as fraglist skips this in udp4_gro_receive. */
+ gso_skb->csum_start = skb_transport_header(gso_skb) - gso_skb->head;
+ gso_skb->csum_offset = offsetof(struct udphdr, check);
+ gso_skb->ip_summed = CHECKSUM_PARTIAL;
+ }
skb_pull(gso_skb, sizeof(*uh));
--
2.46.0.792.g87dc391469-goog
On Thu, Sep 26, 2024 at 10:56:24AM +0800, cuigaosheng wrote:
> On 2024/9/25 20:19, Greg KH wrote:
>
> > On Wed, Sep 25, 2024 at 08:07:46PM +0800, Gaosheng Cui wrote:
> > > If a kset with uninitialized kset->kobj.ktype be registered,
> > Does that happen today with any in-kernel code? If so, let's fix those
> > kset instances, right?
>
> I didn't find this kset instance in kernel code,itwas discovered through code review.
Great, then it is not a real issue :)
> > > kset_register() will return error, and the kset.kobj.name allocated
> > > by kobject_set_name() will be leaked.
> > >
> > > To mitigate this, we free the name in kset_register() when an error
> > > is encountered due to uninitialized kset->kobj.ktype.
> > How did you hit this?
>
> I am testing kernel functionality through fault injection, and I discovered
> it whenI was locating the issue fixed by another patch, I haven't found this
> wrong usage in the kernel, but we can construct it by fault injection,
"fault injection" also shows things that are impossible to ever have
happen as well, so our "need" to fix them is usually very low, right?
thanks,
greg k-h
From: Dave Chinner <dchinner(a)redhat.com>
[ Upstream commit f1e1765aad7de7a8b8102044fc6a44684bc36180 ]
If the journal geometry results in a sector or log stripe unit
validation problem, it indicates that we cannot set the log up to
safely write to the the journal. In these cases, we must abort the
mount because the corruption needs external intervention to resolve.
Similarly, a journal that is too large cannot be written to safely,
either, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations
overruning the available log space and the system hanging waiting
for space it can never provide. This is purely a runtime hang issue,
not a corruption issue as per the first cases listed above. We abort
mounts of the log is too small for V5 filesystems, but we must allow
v4 filesystems to mount because, historically, there was no log size
validity checking and so some systems may still be out there with
undersized logs.
The problem is that on V4 filesystems, when we discover a log
geometry problem, we skip all the remaining checks and then allow
the log to continue mounting. This mean that if one of the log size
checks fails, we skip the log stripe unit check. i.e. we allow the
mount because a "non-fatal" geometry is violated, and then fail to
check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a
new check for the two log sector size geometry variables having the
same values. This will prevent any attempt to mount a log that has
invalid or inconsistent geometries long before we attempt to mount
the log.
However, for the minimum log size checks, we can only do that once
we've setup up the log and calculated all the iclog sizes and
roundoffs. Hence this needs to remain in the log mount code after
the log has been initialised. It is also the only case where we
should allow a v4 filesystem to continue running, so leave that
handling in place, too.
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
---
Notes:
A new fix for the latest 6.1.y backport series just came in. Ran
some tests on it as well and all looks good. Please include with
the original set.
Thanks,
Leah
fs/xfs/libxfs/xfs_sb.c | 56 +++++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_log.c | 47 +++++++++++------------------------
2 files changed, 70 insertions(+), 33 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index bf2cca78304e..c24a38272cb7 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -413,7 +413,6 @@ xfs_validate_sb_common(
sbp->sb_inodelog < XFS_DINODE_MIN_LOG ||
sbp->sb_inodelog > XFS_DINODE_MAX_LOG ||
sbp->sb_inodesize != (1 << sbp->sb_inodelog) ||
- sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE ||
sbp->sb_inopblock != howmany(sbp->sb_blocksize,sbp->sb_inodesize) ||
XFS_FSB_TO_B(mp, sbp->sb_agblocks) < XFS_MIN_AG_BYTES ||
XFS_FSB_TO_B(mp, sbp->sb_agblocks) > XFS_MAX_AG_BYTES ||
@@ -431,6 +430,61 @@ xfs_validate_sb_common(
return -EFSCORRUPTED;
}
+ /*
+ * Logs that are too large are not supported at all. Reject them
+ * outright. Logs that are too small are tolerated on v4 filesystems,
+ * but we can only check that when mounting the log. Hence we skip
+ * those checks here.
+ */
+ if (sbp->sb_logblocks > XFS_MAX_LOG_BLOCKS) {
+ xfs_notice(mp,
+ "Log size 0x%x blocks too large, maximum size is 0x%llx blocks",
+ sbp->sb_logblocks, XFS_MAX_LOG_BLOCKS);
+ return -EFSCORRUPTED;
+ }
+
+ if (XFS_FSB_TO_B(mp, sbp->sb_logblocks) > XFS_MAX_LOG_BYTES) {
+ xfs_warn(mp,
+ "log size 0x%llx bytes too large, maximum size is 0x%llx bytes",
+ XFS_FSB_TO_B(mp, sbp->sb_logblocks),
+ XFS_MAX_LOG_BYTES);
+ return -EFSCORRUPTED;
+ }
+
+ /*
+ * Do not allow filesystems with corrupted log sector or stripe units to
+ * be mounted. We cannot safely size the iclogs or write to the log if
+ * the log stripe unit is not valid.
+ */
+ if (sbp->sb_versionnum & XFS_SB_VERSION_SECTORBIT) {
+ if (sbp->sb_logsectsize != (1U << sbp->sb_logsectlog)) {
+ xfs_notice(mp,
+ "log sector size in bytes/log2 (0x%x/0x%x) must match",
+ sbp->sb_logsectsize, 1U << sbp->sb_logsectlog);
+ return -EFSCORRUPTED;
+ }
+ } else if (sbp->sb_logsectsize || sbp->sb_logsectlog) {
+ xfs_notice(mp,
+ "log sector size in bytes/log2 (0x%x/0x%x) are not zero",
+ sbp->sb_logsectsize, sbp->sb_logsectlog);
+ return -EFSCORRUPTED;
+ }
+
+ if (sbp->sb_logsunit > 1) {
+ if (sbp->sb_logsunit % sbp->sb_blocksize) {
+ xfs_notice(mp,
+ "log stripe unit 0x%x bytes must be a multiple of block size",
+ sbp->sb_logsunit);
+ return -EFSCORRUPTED;
+ }
+ if (sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE) {
+ xfs_notice(mp,
+ "log stripe unit 0x%x bytes over maximum size (0x%x bytes)",
+ sbp->sb_logsunit, XLOG_MAX_RECORD_BSIZE);
+ return -EFSCORRUPTED;
+ }
+ }
+
/* Validate the realtime geometry; stolen from xfs_repair */
if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) {
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d9aa5eab02c3..59c982297503 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -639,7 +639,6 @@ xfs_log_mount(
int num_bblks)
{
struct xlog *log;
- bool fatal = xfs_has_crc(mp);
int error = 0;
int min_logfsbs;
@@ -661,53 +660,37 @@ xfs_log_mount(
mp->m_log = log;
/*
- * Validate the given log space and drop a critical message via syslog
- * if the log size is too small that would lead to some unexpected
- * situations in transaction log space reservation stage.
+ * Now that we have set up the log and it's internal geometry
+ * parameters, we can validate the given log space and drop a critical
+ * message via syslog if the log size is too small. A log that is too
+ * small can lead to unexpected situations in transaction log space
+ * reservation stage. The superblock verifier has already validated all
+ * the other log geometry constraints, so we don't have to check those
+ * here.
*
- * Note: we can't just reject the mount if the validation fails. This
- * would mean that people would have to downgrade their kernel just to
- * remedy the situation as there is no way to grow the log (short of
- * black magic surgery with xfs_db).
+ * Note: For v4 filesystems, we can't just reject the mount if the
+ * validation fails. This would mean that people would have to
+ * downgrade their kernel just to remedy the situation as there is no
+ * way to grow the log (short of black magic surgery with xfs_db).
*
- * We can, however, reject mounts for CRC format filesystems, as the
+ * We can, however, reject mounts for V5 format filesystems, as the
* mkfs binary being used to make the filesystem should never create a
* filesystem with a log that is too small.
*/
min_logfsbs = xfs_log_calc_minimum_size(mp);
-
if (mp->m_sb.sb_logblocks < min_logfsbs) {
xfs_warn(mp,
"Log size %d blocks too small, minimum size is %d blocks",
mp->m_sb.sb_logblocks, min_logfsbs);
- error = -EINVAL;
- } else if (mp->m_sb.sb_logblocks > XFS_MAX_LOG_BLOCKS) {
- xfs_warn(mp,
- "Log size %d blocks too large, maximum size is %lld blocks",
- mp->m_sb.sb_logblocks, XFS_MAX_LOG_BLOCKS);
- error = -EINVAL;
- } else if (XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) > XFS_MAX_LOG_BYTES) {
- xfs_warn(mp,
- "log size %lld bytes too large, maximum size is %lld bytes",
- XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks),
- XFS_MAX_LOG_BYTES);
- error = -EINVAL;
- } else if (mp->m_sb.sb_logsunit > 1 &&
- mp->m_sb.sb_logsunit % mp->m_sb.sb_blocksize) {
- xfs_warn(mp,
- "log stripe unit %u bytes must be a multiple of block size",
- mp->m_sb.sb_logsunit);
- error = -EINVAL;
- fatal = true;
- }
- if (error) {
+
/*
* Log check errors are always fatal on v5; or whenever bad
* metadata leads to a crash.
*/
- if (fatal) {
+ if (xfs_has_crc(mp)) {
xfs_crit(mp, "AAIEEE! Log failed size checks. Abort!");
ASSERT(0);
+ error = -EINVAL;
goto out_free_log;
}
xfs_crit(mp, "Log size out of supported range.");
--
2.46.0.792.g87dc391469-goog
On Wed, 2024-09-25 at 03:30 +0000, Ping-Ke Shih wrote:
> I think the cause is picking commit 268f84a82753
> ("wifi: cfg80211: check wiphy mutex is held for wdev mutex"), and
> cfg80211_is_all_idle() called by disconnect_work() uses wdev_lock()
> but not wiphy_lock().
Yeah seems like a stable only problem (CC them), this is with kernel
version 6.6.51-00141-ga1649b6f8ed6 according to the warning.
> I'm not sure if we should simply revert the picked commit 268f84a82753
> or should pick more commits.
I don't see why it was picked up in the first place, so I guess I'd say
remove it. We won't want to redo the locking on a stable kernel, I'd
think.
> By the way, I think the latest kernel will not throw these messages.
Agree, that seems unlikely.
johannes
I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and noticed that
the dmesg line "Incomplete global flushes, disabling PCID" had disappeared from
the log.
That message comes from commit c26b9e193172f48cd0ccc64285337106fb8aa804, which
disables PCID support on some broken hardware in arch/x86/mm/init.c:
#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
.family = 6, \
.model = _model, \
}
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
{}
...
if (x86_match_cpu(invlpg_miss_ids)) {
pr_info("Incomplete global flushes, disabling PCID");
setup_clear_cpu_cap(X86_FEATURE_PCID);
return;
}
arch/x86/mm/init.c, which has that code, hasn't changed in 6.1.94 -> 6.1.99.
However I found a commit changing how x86_match_cpu() behaves in 6.1.96:
commit 8ab1361b2eae44077fef4adea16228d44ffb860c
Author: Tony Luck <tony.luck(a)intel.com>
Date: Mon May 20 15:45:33 2024 -0700
x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
I suspect this broke the PCID disabling code in arch/x86/mm/init.c.
The commit message says:
"Add a new flags field to struct x86_cpu_id that has a bit set to indicate that
this entry in the array is valid. Update X86_MATCH*() macros to set that bit.
Change the end-marker check in x86_match_cpu() to just check the flags field
for this bit."
But the PCID disabling code in 6.1.99 does not make use of the
X86_MATCH*() macros; instead, it defines a new INTEL_MATCH() macro without the
X86_CPU_ID_FLAG_ENTRY_VALID flag.
I looked in upstream git and found an existing fix:
commit 2eda374e883ad297bd9fe575a16c1dc850346075
Author: Tony Luck <tony.luck(a)intel.com>
Date: Wed Apr 24 11:15:18 2024 -0700
x86/mm: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6b43b6480354 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
}
}
-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_VFM(INTEL_ALDERLAKE, 0),
+ X86_MATCH_VFM(INTEL_ALDERLAKE_L, 0),
+ X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE_P, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE_S, 0),
{}
};
The fix removed the custom INTEL_MATCH macro and uses the X86_MATCH*() macros
with X86_CPU_ID_FLAG_ENTRY_VALID. This fixed commit was never backported to 6.1,
so it looks like a stable series regression due to a missing backport.
If I apply the fix patch on 6.1.99, the PCID disabling code activates again.
I had to change all the INTEL_* definitions to the old definitions to make it
build:
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE, 0),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_L, 0),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_N, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_P, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_S, 0),
{}
};
I only looked at the code in arch/x86/mm/init.c, so there may be other uses of
x86_match_cpu() in the kernel that are also broken in 6.1.99.
This email is meant as a bug report, not a pull request. Someone else should
confirm the problem and submit the appropriate fix.
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 5b06eeae52c02dd0d9bc8488275a1207d410870b upstream.
Since commit 5821ba969511 ("selftests: Add test plan API to kselftest.h
and adjust callers") accidentally introduced 'a' typo in the front of
run_test() function, breakpoint_test_arm64.c became not able to be
compiled.
Remove the 'a' from arun_test().
Fixes: 5821ba969511 ("selftests: Add test plan API to kselftest.h and adjust callers")
Reported-by: Jun Takahashi <takahashi.jun_s(a)aa.socionext.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
[Samasth: bp to 5.4.y]
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com>
---
tools/testing/selftests/breakpoints/breakpoint_test_arm64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
index 58ed5eeab7094..ad41ea69001bc 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -109,7 +109,7 @@ static bool set_watchpoint(pid_t pid, int size, int wp)
return false;
}
-static bool arun_test(int wr_size, int wp_size, int wr, int wp)
+static bool run_test(int wr_size, int wp_size, int wr, int wp)
{
int status;
siginfo_t siginfo;
--
2.45.2
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
v2:
- Rebase
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 31fe31db3fdc..ce9dca4d4e87 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1765,10 +1765,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(vm))
return PTR_ERR(vm);
- err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
- if (err)
- goto err_close_and_put;
-
if (xe->info.has_asid) {
down_write(&xe->usm.lock);
err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
@@ -1776,12 +1772,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
&xe->usm.next_asid, GFP_KERNEL);
up_write(&xe->usm.lock);
if (err < 0)
- goto err_free_id;
+ goto err_close_and_put;
vm->usm.asid = asid;
}
- args->vm_id = id;
vm->xef = xe_file_get(xef);
/* Record BO memory for VM pagetable created against client */
@@ -1794,10 +1789,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
args->reserved[0] = xe_bo_main_addr(vm->pt_root[0]->bo, XE_PAGE_SIZE);
#endif
+ /* user id alloc must always be last in ioctl to prevent UAF */
+ err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
+ if (err)
+ goto err_close_and_put;
+
+ args->vm_id = id;
+
return 0;
-err_free_id:
- xa_erase(&xef->vm.xa, id);
err_close_and_put:
xe_vm_close_and_put(vm);
--
2.46.1
From: Sarannya S <quic_sarannya(a)quicinc.com>
On remote susbsystem restart rproc will stop glink subdev which will
trigger qcom_glink_native_remove, any ongoing intent wait should be
aborted from there otherwise this wait delays glink send which potentially
delays glink channel removal as well. This further introduces delay in ssr
notification to other remote subsystems from rproc.
Currently qcom_glink_native_remove is not setting channel->intent_received,
so any ongoing intent wait is not aborted on remote susbsystem restart.
abort_tx flag can be used as a condition to abort in such cases.
Adding abort_tx flag check in intent wait, to abort intent wait from
qcom_glink_native_remove.
Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sarannya S <quic_sarannya(a)quicinc.com>
Signed-off-by: Deepak Kumar Singh <quic_deesin(a)quicinc.com>
---
drivers/rpmsg/qcom_glink_native.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index 82d460ff4777..ff828531c36f 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -438,7 +438,6 @@ static void qcom_glink_handle_intent_req_ack(struct qcom_glink *glink,
static void qcom_glink_intent_req_abort(struct glink_channel *channel)
{
- WRITE_ONCE(channel->intent_req_result, 0);
wake_up_all(&channel->intent_req_wq);
}
@@ -1354,8 +1353,9 @@ static int qcom_glink_request_intent(struct qcom_glink *glink,
goto unlock;
ret = wait_event_timeout(channel->intent_req_wq,
- READ_ONCE(channel->intent_req_result) >= 0 &&
- READ_ONCE(channel->intent_received),
+ (READ_ONCE(channel->intent_req_result) >= 0 &&
+ READ_ONCE(channel->intent_received)) ||
+ glink->abort_tx,
10 * HZ);
if (!ret) {
dev_err(glink->dev, "intent request timed out\n");
--
2.34.1
Variable 'scale', whose possible value set allows a zero value in a check
at r8169_main.c:2014, is used as a denominator at r8169_main.c:2040 and
r8169_main.c:2042.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 2815b30535a0 ("r8169: merge scale for tx and rx irq coalescing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 45ac8befba29..b97e68cfcfad 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2011,7 +2011,7 @@ static int rtl_set_coalesce(struct net_device *dev,
coal_usec_max = max(ec->rx_coalesce_usecs, ec->tx_coalesce_usecs);
scale = rtl_coalesce_choose_scale(tp, coal_usec_max, &cp01);
- if (scale < 0)
+ if (scale <= 0)
return scale;
/* Accept max_frames=1 we returned in rtl_get_coalesce. Accept it
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
[ Upstream commit b440396387418fe2feaacd41ca16080e7a8bc9ad ]
linereq_set_config() behaves badly when direction is not set.
The configuration validation is borrowed from linereq_create(), where,
to verify the intent of the user, the direction must be set to in order to
effect a change to the electrical configuration of a line. But, when
applied to reconfiguration, that validation does not allow for the unset
direction case, making it possible to clear flags set previously without
specifying the line direction.
Adding to the inconsistency, those changes are not immediately applied by
linereq_set_config(), but will take effect when the line value is next get
or set.
For example, by requesting a configuration with no flags set, an output
line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN
set could have those flags cleared, inverting the sense of the line and
changing the line drive to push-pull on the next line value set.
Skip the reconfiguration of lines for which the direction is not set, and
only reconfigure the lines for which direction is set.
Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL")
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/gpio/gpiolib-cdev.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index c2f9d95d1086..fe0926ce0068 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1186,15 +1186,18 @@ static long linereq_set_config_unlocked(struct linereq *lr,
for (i = 0; i < lr->num_lines; i++) {
desc = lr->lines[i].desc;
flags = gpio_v2_line_config_flags(lc, i);
+ /*
+ * Lines not explicitly reconfigured as input or output
+ * are left unchanged.
+ */
+ if (!(flags & GPIO_V2_LINE_DIRECTION_FLAGS))
+ continue;
+
polarity_change =
(!!test_bit(FLAG_ACTIVE_LOW, &desc->flags) !=
((flags & GPIO_V2_LINE_FLAG_ACTIVE_LOW) != 0));
gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
- /*
- * Lines have to be requested explicitly for input
- * or output, else the line will be treated "as is".
- */
if (flags & GPIO_V2_LINE_FLAG_OUTPUT) {
int val = gpio_v2_line_config_output_value(lc, i);
@@ -1202,7 +1205,7 @@ static long linereq_set_config_unlocked(struct linereq *lr,
ret = gpiod_direction_output(desc, val);
if (ret)
return ret;
- } else if (flags & GPIO_V2_LINE_FLAG_INPUT) {
+ } else {
ret = gpiod_direction_input(desc);
if (ret)
return ret;
--
2.39.5
[ Upstream commit b440396387418fe2feaacd41ca16080e7a8bc9ad ]
linereq_set_config() behaves badly when direction is not set.
The configuration validation is borrowed from linereq_create(), where,
to verify the intent of the user, the direction must be set to in order to
effect a change to the electrical configuration of a line. But, when
applied to reconfiguration, that validation does not allow for the unset
direction case, making it possible to clear flags set previously without
specifying the line direction.
Adding to the inconsistency, those changes are not immediately applied by
linereq_set_config(), but will take effect when the line value is next get
or set.
For example, by requesting a configuration with no flags set, an output
line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN
set could have those flags cleared, inverting the sense of the line and
changing the line drive to push-pull on the next line value set.
Skip the reconfiguration of lines for which the direction is not set, and
only reconfigure the lines for which direction is set.
Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL")
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/gpio/gpiolib-cdev.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index d526a4c91e82..545998e9f6ad 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1565,12 +1565,14 @@ static long linereq_set_config_unlocked(struct linereq *lr,
line = &lr->lines[i];
desc = lr->lines[i].desc;
flags = gpio_v2_line_config_flags(lc, i);
- gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
- edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS;
/*
- * Lines have to be requested explicitly for input
- * or output, else the line will be treated "as is".
+ * Lines not explicitly reconfigured as input or output
+ * are left unchanged.
*/
+ if (!(flags & GPIO_V2_LINE_DIRECTION_FLAGS))
+ continue;
+ gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
+ edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS;
if (flags & GPIO_V2_LINE_FLAG_OUTPUT) {
int val = gpio_v2_line_config_output_value(lc, i);
@@ -1578,7 +1580,7 @@ static long linereq_set_config_unlocked(struct linereq *lr,
ret = gpiod_direction_output(desc, val);
if (ret)
return ret;
- } else if (flags & GPIO_V2_LINE_FLAG_INPUT) {
+ } else {
ret = gpiod_direction_input(desc);
if (ret)
return ret;
--
2.39.5
The patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
I forgot to CC stable on this fix.
Chris
On Thu, Sep 5, 2024 at 1:08 AM Chris Li <chrisl(a)kernel.org> wrote:
>
> I found a regression on mm-unstable during my swap stress test,
> using tmpfs to compile linux. The test OOM very soon after
> the make spawns many cc processes.
>
> It bisects down to this change: 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9
> (mm/gup: clear the LRU flag of a page before adding to LRU batch)
>
> Yu Zhao propose the fix: "I think this is one of the potential side
> effects -- Huge mentioned earlier about isolate_lru_folios():"
>
> I test that with it the swap stress test no longer OOM.
>
> Link: https://lore.kernel.org/r/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoX…
> Fixes: 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch")
> Suggested-by: Yu Zhao <yuzhao(a)google.com>
> Suggested-by: Hugh Dickins <hughd(a)google.com>
> Tested-by: Chris Li <chrisl(a)kernel.org>
> Closes: https://lore.kernel.org/all/CAF8kJuNP5iTj2p07QgHSGOJsiUfYpJ2f4R1Q5-3BN9JiD9…
> Signed-off-by: Chris Li <chrisl(a)kernel.org>
> ---
> Changes in v2:
> - Add Closes tag suggested by Yu and Thorsten.
> - Link to v1: https://lore.kernel.org/r/20240904-lru-flag-v1-1-36638d6a524c@kernel.org
> ---
> mm/vmscan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a9b6a8196f95..96abf4a52382 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4323,7 +4323,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> }
>
> /* ineligible */
> - if (zone > sc->reclaim_idx) {
> + if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
> gen = folio_inc_gen(lruvec, folio, false);
> list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
> return true;
>
> ---
> base-commit: 756ca36d643324d028b325a170e73e392b9590cd
> change-id: 20240904-lru-flag-2af2f955740e
>
> Best regards,
> --
> Chris Li <chrisl(a)kernel.org>
>
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
I don't know that we have any bugs open about this. Or how it would
manifest itself. Memory leak on driver unload? I just spotted this while
reading the code for other reasons.
---
drivers/gpu/drm/i915/display/intel_hdcp.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 2afa92321b08..cad309602617 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1097,7 +1097,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2531,7 +2532,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2548,7 +2550,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
--
2.39.2
When a mailbox message is received, the driver is checking for a non 0
datalen in the controlq descriptor. If it is valid, the payload is
attached to the ctlq message to give to the upper layer. However, the
payload response size given to the upper layer was taken from the buffer
metadata which is _always_ the max buffer size. This meant the API was
returning 4K as the payload size for all messages. This went unnoticed
since the virtchnl exchange response logic was checking for a response
size less than 0 (error), not less than exact size, or not greater than
or equal to the max mailbox buffer size (4K). All of these checks will
pass in the success case since the size provided is always 4K. However,
this breaks anyone that wants to validate the exact response size.
Fetch the actual payload length from the value provided in the
descriptor data_len field (instead of the buffer metadata).
Unfortunately, this means we lose some extra error parsing for variable
sized virtchnl responses such as create vport and get ptypes. However,
the original checks weren't really helping anyways since the size was
_always_ 4K.
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable(a)vger.kernel.org # 6.9+
Signed-off-by: Joshua Hay <joshua.a.hay(a)intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 70986e12da28..3c0f97650d72 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -666,7 +666,7 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
if (ctlq_msg->data_len) {
payload = ctlq_msg->ctx.indirect.payload->va;
- payload_size = ctlq_msg->ctx.indirect.payload->size;
+ payload_size = ctlq_msg->data_len;
}
xn->reply_sz = payload_size;
@@ -1295,10 +1295,6 @@ int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
err = reply_sz;
goto free_vport_params;
}
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN) {
- err = -EIO;
- goto free_vport_params;
- }
return 0;
@@ -2602,9 +2598,6 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
if (reply_sz < 0)
return reply_sz;
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN)
- return -EIO;
-
ptypes_recvd += le16_to_cpu(ptype_info->num_ptypes);
if (ptypes_recvd > max_ptype)
return -EINVAL;
--
2.39.2
The patch titled
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
Date: Mon, 23 Sep 2024 10:38:36 +0500
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included. It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323. Fix the header to asm/unistd.h.
Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pagemap_ioctl.c~kselftests-mm-fix-wrong-__nr_userfaultfd-value
+++ a/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
The patch titled
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
Date: Tue, 24 Sep 2024 14:27:10 +0800
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/compiler.h~compilerh-specify-correct-attribute-for-rodatac_jump_table
+++ a/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".rodata..c_jump_table,\"a\",@progbits #")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
The patch titled
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
Date: Tue, 24 Sep 2024 09:32:57 +0000
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.c…
Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/extent_map.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c~ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode
+++ a/fs/ocfs2/extent_map.c
@@ -973,7 +973,13 @@ int ocfs2_read_virt_blocks(struct inode
}
while (done < nr) {
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ rc = -EAGAIN;
+ mlog(ML_ERROR,
+ "Inode #%llu ip_alloc_sem is temporarily unavailable\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno);
+ break;
+ }
rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
&p_block, &p_count, NULL);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
_
Patches currently in -mm which might be from pvmohammedanees2003(a)gmail.com are
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
ocfs2-fix-typo-in-comment.patch
The patch titled
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
Date: Wed, 18 Sep 2024 06:38:44 +0000
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case upto 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@o…
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/ocfs2/refcounttree.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4148,8 +4149,9 @@ static int __ocfs2_reflink(struct dentry
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4175,6 +4177,26 @@ static int __ocfs2_reflink(struct dentry
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4182,7 +4204,7 @@ static int __ocfs2_reflink(struct dentry
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
--- a/fs/ocfs2/xattr.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(st
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
_
Patches currently in -mm which might be from gautham.ananthakrishna(a)oracle.com are
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
The patch titled
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
Date: Tue, 24 Sep 2024 22:00:53 +0900
I found a report from syzbot [1]
This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.
Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.
[1]
==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
page_cache_delete mm/filemap.c:153 [inline]
__filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
truncate_inode_folio+0x42/0x50 mm/truncate.c:178
shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
shmem_truncate_range mm/shmem.c:1144 [inline]
shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
evict+0x2f0/0x580 fs/inode.c:731
iput_final fs/inode.c:1883 [inline]
iput+0x42a/0x5b0 fs/inode.c:1909
dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
__dentry_kill+0x18b/0x4c0 fs/dcache.c:615
dput+0x5c/0xd0 fs/dcache.c:857
__fput+0x3fb/0x6d0 fs/file_table.c:439
____fput+0x1c/0x30 fs/file_table.c:459
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
__folio_test_movable include/linux/page-flags.h:699 [inline]
migrate_folio_unmap mm/migrate.c:1199 [inline]
migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
migrate_pages_sync mm/migrate.c:1963 [inline]
migrate_pages+0xff1/0x1820 mm/migrate.c:2072
do_mbind mm/mempolicy.c:1390 [inline]
kernel_mbind mm/mempolicy.c:1533 [inline]
__do_sys_mbind mm/mempolicy.c:1607 [inline]
__se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
__x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0xffff888127601078 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-annotate-data-race-in-migrate_folio_unmap
+++ a/mm/migrate.c
@@ -1196,7 +1196,7 @@ static int migrate_folio_unmap(new_folio
int rc = -EAGAIN;
int old_page_state = 0;
struct anon_vma *anon_vma = NULL;
- bool is_lru = !__folio_test_movable(src);
+ bool is_lru = data_race(!__folio_test_movable(src));
bool locked = false;
bool dst_locked = false;
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
mm-percpu-fix-typo-to-pcpu_alloc_noprof-description.patch
The quilt patch titled
Subject: padata: honor the caller's alignment in case of chunk_size 0
has been removed from the -mm tree. Its filename was
padata-honor-the-callers-alignment-in-case-of-chunk_size-0.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Kamlesh Gurudasani <kamlesh(a)ti.com>
Subject: padata: honor the caller's alignment in case of chunk_size 0
Date: Thu, 22 Aug 2024 02:32:52 +0530
In the case where we are forcing the ps.chunk_size to be at least 1, we
are ignoring the caller's alignment.
Move the forcing of ps.chunk_size to be at least 1 before rounding it up
to caller's alignment, so that caller's alignment is honored.
While at it, use max() to force the ps.chunk_size to be at least 1 to
improve readability.
Link: https://lkml.kernel.org/r/20240822-max-v1-1-cb4bc5b1c101@ti.com
Fixes: 6d45e1c948a8 ("padata: Fix possible divide-by-0 panic in padata_mt_helper()")
Signed-off-by: Kamlesh Gurudasani <kamlesh(a)ti.com>
Acked-by: Waiman Long <longman(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/padata.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
--- a/kernel/padata.c~padata-honor-the-callers-alignment-in-case-of-chunk_size-0
+++ a/kernel/padata.c
@@ -509,21 +509,17 @@ void __init padata_do_multithreaded(stru
/*
* Chunk size is the amount of work a helper does per call to the
- * thread function. Load balance large jobs between threads by
+ * thread function. Load balance large jobs between threads by
* increasing the number of chunks, guarantee at least the minimum
* chunk size from the caller, and honor the caller's alignment.
+ * Ensure chunk_size is at least 1 to prevent divide-by-0
+ * panic in padata_mt_helper().
*/
ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
+ ps.chunk_size = max(ps.chunk_size, 1ul);
ps.chunk_size = roundup(ps.chunk_size, job->align);
- /*
- * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
- * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
- */
- if (!ps.chunk_size)
- ps.chunk_size = 1U;
-
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
_
Patches currently in -mm which might be from kamlesh(a)ti.com are
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.
Additionally fix a similar race: user space converts a normal enclave
page to a TCS page (via SGX_IOC_ENCLAVE_MODIFY_TYPES) on CPU1, and at
the same time, user space performs a memory access on the same page on
CPU2. This fix is not strictly necessary (this particular race would
indicate a bug in a user space application), but it gives a consistent
rule: if an enclave page is under certain operation by the kernel with
the mapping removed, then other threads trying to access that page are
temporarily blocked and should retry.
Fixes: 9849bb27152c1 ("x86/sgx: Support complete page removal")
Fixes: 45d546b8c109d ("x86/sgx: Support modifying SGX page type")
Cc: stable(a)vger.kernel.org
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.h | 3 ++-
arch/x86/kernel/cpu/sgx/ioctl.c | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index b566b8ad5f33..96b11e8fb770 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,7 +22,8 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit indicating that the page is busy (being reclaimed). */
+/* 'desc' bit indicating that the page is busy (being reclaimed, removed or
+ * converted to a TCS page). */
#define SGX_ENCL_PAGE_BUSY BIT(2)
/*
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c6f936440438 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -969,12 +969,22 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE
+ * entry for the enclave page, CPU2 may attempt to load
+ * this page (because the page is still in enclave's
+ * xarray). To prevent CPU2 from loading the page, mark
+ * the page as busy before unlock and unmark after lock
+ * again.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
mutex_lock(&encl->lock);
+ entry->desc &= ~SGX_ENCL_PAGE_BUSY;
sgx_mark_page_reclaimable(entry->epc_page);
}
@@ -1141,7 +1151,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE entry
+ * for the enclave page, CPU2 may attempt to load this page
+ * (because the page is still in enclave's xarray). To prevent
+ * CPU2 from loading the page, mark the page as busy.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.43.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entities run queue,
lets make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.
Alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().
v2:
* Improve commit message. (Philipp)
* Cache the scheduler pointer directly. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Philipp Stanner <pstanner(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+ sched = rq->sched;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched);
+ drm_sched_wakeup(sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
The sysfb framebuffer handling only operates on graphics devices
that provide the system's firmware framebuffer. If that device is
not known, assume that any graphics device has been initialized by
firmware.
Fixes a problem on i915 where sysfb does not release the firmware
framebuffer after the native graphics driver loaded.
Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah(a)intel.com>
Closes: https://lore.kernel.org/dri-devel/SJ1PR11MB6129EFB8CE63D1EF6D932F94B96F2@SJ…
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12160
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: b49420d6a1ae ("video/aperture: optionally match the device in sysfb_disable()")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: Linux regression tracking (Thorsten Leemhuis) <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org> # v6.11+
---
drivers/firmware/sysfb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 02a07d3d0d40..a3df782fa687 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -67,9 +67,11 @@ static bool sysfb_unregister(void)
void sysfb_disable(struct device *dev)
{
struct screen_info *si = &screen_info;
+ struct device *parent;
mutex_lock(&disable_lock);
- if (!dev || dev == sysfb_parent_dev(si)) {
+ parent = sysfb_parent_dev(si);
+ if (!dev || !parent || dev == parent) {
sysfb_unregister();
disabled = true;
}
--
2.46.0
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..1f25b12e5f67 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,11 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
+ if (encoder) {
+ drm_WARN(state->base.dev, num_encoders != 1,
"%d encoders for pipe %c\n",
num_encoders, pipe_name(primary_crtc->pipe));
+ }
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
This patch addresses an open issue of buffer overflow in
f2fs function stat_show(). On the off chance that si->sbi->s_flag
had one of its bits (on the higher end) set to 1, for_each_set_bit()
will loop more than s_flag[] can afford, leading in turn to
erroneous array access.
The issue in question has been fixed in commit 5bb9c111cd98
("f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show()") and
cherry-picked for 6.1 stable branch.
Modified patch can now be cleanly applied to linux-6.1.y. All of
the changes made to the patch in order to adapt it are described
at the end of commit message in [PATCH 6.1 1/1] f2fs: convert to
MAX_SBI_FLAG instead of 32 in stat_show().
Atomicity violation occurs when the vc4_crtc_send_vblank function is
executed simultaneously with modifications to crtc->state or
crtc->state->event. Consider a scenario where both crtc->state and
crtc->state->event are non-null. They can pass the validity check, but at
the same time, crtc->state or crtc->state->event could be set to null. In
this case, the validity check in vc4_crtc_send_vblank might act on the old
crtc->state and crtc->state->event (before locking), allowing invalid
values to pass the validity check, leading to null pointer dereference.
To address this issue, it is recommended to include the validity check of
crtc->state and crtc->state->event within the locking section of the
function. This modification ensures that the values of crtc->state->event
and crtc->state do not change during the validation process, maintaining
their valid conditions.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 68e4a69aec4d ("drm/vc4: crtc: Create vblank reporting function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/gpu/drm/vc4/vc4_crtc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 8b5a7e5eb146..98885f519827 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -575,10 +575,12 @@ void vc4_crtc_send_vblank(struct drm_crtc *crtc)
struct drm_device *dev = crtc->dev;
unsigned long flags;
- if (!crtc->state || !crtc->state->event)
+ spin_lock_irqsave(&dev->event_lock, flags);
+ if (!crtc->state || !crtc->state->event) {
+ spin_unlock_irqrestore(&dev->event_lock, flags);
return;
+ }
- spin_lock_irqsave(&dev->event_lock, flags);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
crtc->state->event = NULL;
spin_unlock_irqrestore(&dev->event_lock, flags);
--
2.34.1
After having been assigned to a NULL value at rdma.c:1758, pointer 'queue'
is passed as 1st parameter in call to function
'nvmet_rdma_queue_established' at rdma.c:1773, as 1st parameter in call
to function 'nvmet_rdma_queue_disconnect' at rdma.c:1787 and as 2nd
parameter in call to function 'nvmet_rdma_queue_connect_fail' at
rdma.c:1800, where it is dereferenced.
I understand, that driver is confident that the
RDMA_CM_EVENT_CONNECT_REQUEST event will occur first and perform
initialization, but maliciously prepared hardware could send events in
violation of the protocol. Nothing guarantees that the sequence of events
will start with RDMA_CM_EVENT_CONNECT_REQUEST.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Fixes: e1a2ee249b19 ("nvmet-rdma: Fix use after free in nvmet_rdma_cm_handler()")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/nvme/target/rdma.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 1b6264fa5803..becebc95f349 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1770,8 +1770,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
ret = nvmet_rdma_queue_connect(cm_id, event);
break;
case RDMA_CM_EVENT_ESTABLISHED:
- nvmet_rdma_queue_established(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_established(queue);
+ break;
+ }
case RDMA_CM_EVENT_ADDR_CHANGE:
if (!queue) {
struct nvmet_rdma_port *port = cm_id->context;
@@ -1782,8 +1784,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_DISCONNECTED:
case RDMA_CM_EVENT_TIMEWAIT_EXIT:
- nvmet_rdma_queue_disconnect(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_disconnect(queue);
+ break;
+ }
case RDMA_CM_EVENT_DEVICE_REMOVAL:
ret = nvmet_rdma_device_removal(cm_id, queue);
break;
@@ -1793,8 +1797,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_UNREACHABLE:
case RDMA_CM_EVENT_CONNECT_ERROR:
- nvmet_rdma_queue_connect_fail(cm_id, queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_connect_fail(cm_id, queue);
+ break;
+ }
default:
pr_err("received unrecognized RDMA CM event %d\n",
event->event);
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..1f25b12e5f67 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,11 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
+ if (encoder) {
+ drm_WARN(state->base.dev, num_encoders != 1,
"%d encoders for pipe %c\n",
num_encoders, pipe_name(primary_crtc->pipe));
+ }
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..a5e24d64f909 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,10 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
- "%d encoders for pipe %c\n",
- num_encoders, pipe_name(primary_crtc->pipe));
+ if (encoder)
+ drm_WARN(state->base.dev, num_encoders != 1,
+ "%d encoders for pipe %c\n",
+ num_encoders, pipe_name(primary_crtc->pipe));
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
In the `mac802154_scan_worker` function, the `scan_req->type` field was
accessed after the RCU read-side critical section was unlocked. According
to RCU usage rules, this is illegal and can lead to unpredictable
behavior, such as accessing memory that has been updated or causing
use-after-free issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the `scan_req->type` value is now stored in a local
variable `scan_req_type` while still within the RCU read-side critical
section. The `scan_req_type` is then used after the RCU lock is released,
ensuring that the type value is safely accessed without violating RCU
rules.
Fixes: e2c3e6f53a7a ("mac802154: Handle active scanning")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiawei Ye <jiawei.ye(a)foxmail.com>
Acked-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changelog:
v1 -> v2:
- Repositioned the enum nl802154_scan_types scan_req_type declaration
between struct cfg802154_scan_request *scan_req and struct
ieee802154_sub_if_data *sdata to comply with the reverse Christmas tree
rule.
v2 -> v3:
- Add Cc stable tag and Acked-by.
---
net/mac802154/scan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c
index 1c0eeaa76560..a6dab3cc3ad8 100644
--- a/net/mac802154/scan.c
+++ b/net/mac802154/scan.c
@@ -176,6 +176,7 @@ void mac802154_scan_worker(struct work_struct *work)
struct ieee802154_local *local =
container_of(work, struct ieee802154_local, scan_work.work);
struct cfg802154_scan_request *scan_req;
+ enum nl802154_scan_types scan_req_type;
struct ieee802154_sub_if_data *sdata;
unsigned int scan_duration = 0;
struct wpan_phy *wpan_phy;
@@ -209,6 +210,7 @@ void mac802154_scan_worker(struct work_struct *work)
}
wpan_phy = scan_req->wpan_phy;
+ scan_req_type = scan_req->type;
scan_req_duration = scan_req->duration;
/* Look for the next valid chan */
@@ -246,7 +248,7 @@ void mac802154_scan_worker(struct work_struct *work)
goto end_scan;
}
- if (scan_req->type == NL802154_SCAN_ACTIVE) {
+ if (scan_req_type == NL802154_SCAN_ACTIVE) {
ret = mac802154_transmit_beacon_req(local, sdata);
if (ret)
dev_err(&sdata->dev->dev,
--
2.34.1
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the
SGX steers into a failure path. The loser EREMOVE's the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Check whether another thread already
allocated the page and if yes, return with VM_FAULT_NOPAGE.
The race can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 36 ++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c0a3c00284c8..2aa7ced0e4a0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -337,6 +337,16 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
return VM_FAULT_SIGBUS;
+ mutex_lock(&encl->lock);
+
+ /*
+ * Multiple threads may try to fault on the same page concurrently.
+ * Re-check if another thread has already done that.
+ */
+ encl_page = xa_load(&encl->page_array, PFN_DOWN(addr));
+ if (encl_page)
+ goto done;
+
/*
* Ignore internal permission checking for dynamically added pages.
* They matter only for data added during the pre-initialization
@@ -345,23 +355,23 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
*/
secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
- if (IS_ERR(encl_page))
- return VM_FAULT_OOM;
-
- mutex_lock(&encl->lock);
+ if (IS_ERR(encl_page)) {
+ vmret = VM_FAULT_OOM;
+ goto err_out_unlock;
+ }
epc_page = sgx_encl_load_secs(encl);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
epc_page = sgx_alloc_epc_page(encl_page, false);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
va_page = sgx_encl_grow(encl, false);
@@ -376,10 +386,6 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
encl_page, GFP_KERNEL);
- /*
- * If ret == -EBUSY then page was created in another flow while
- * running without encl->lock
- */
if (ret)
goto err_out_shrink;
@@ -389,7 +395,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
if (ret)
- goto err_out;
+ goto err_out_eaug;
encl_page->encl = encl;
encl_page->epc_page = epc_page;
@@ -408,20 +414,20 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
mutex_unlock(&encl->lock);
return VM_FAULT_SIGBUS;
}
+done:
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
-err_out:
+err_out_eaug:
xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
-
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
sgx_encl_free_epc_page(epc_page);
+err_out_encl:
+ kfree(encl_page);
err_out_unlock:
mutex_unlock(&encl->lock);
- kfree(encl_page);
-
return vmret;
}
--
2.43.0
The page reclaimer thread sets SGX_ENC_PAGE_BEING_RECLAIMED flag when
the enclave page is being reclaimed (moved to the backing store). This
flag however has two logical meanings:
1. Don't attempt to load the enclave page (the page is busy), see
__sgx_encl_load_page().
2. Don't attempt to remove the PCMD page corresponding to this enclave
page (the PCMD page is busy), see reclaimer_writing_to_pcmd().
To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed (by the
page reclaimer thread). A future commit will introduce new cases when
the enclave page is being operated on; these new cases will set only the
SGX_ENCL_PAGE_BUSY flag.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 16 +++++++---------
arch/x86/kernel/cpu/sgx/encl.h | 10 ++++++++--
arch/x86/kernel/cpu/sgx/main.c | 4 ++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..c0a3c00284c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -46,10 +46,10 @@ static int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_ind
* a check if an enclave page sharing the PCMD page is in the process of being
* reclaimed.
*
- * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
- * intends to reclaim that enclave page - it means that the PCMD page
- * associated with that enclave page is about to get some data and thus
- * even if the PCMD page is empty, it should not be truncated.
+ * The reclaimer sets the SGX_ENCL_PAGE_PCMD_BUSY flag when it intends to
+ * reclaim that enclave page - it means that the PCMD page associated with that
+ * enclave page is about to get some data and thus even if the PCMD page is
+ * empty, it should not be truncated.
*
* Context: Enclave mutex (&sgx_encl->lock) must be held.
* Return: 1 if the reclaimer is about to write to the PCMD page
@@ -77,8 +77,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* Stop when reaching the SECS page - it does not
* have a page_array entry and its reclaim is
* started and completed with enclave mutex held so
- * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
- * flag.
+ * it does not use the SGX_ENCL_PAGE_PCMD_BUSY flag.
*/
if (addr == encl->base + encl->size)
break;
@@ -91,8 +90,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* VA page slot ID uses same bit as the flag so it is important
* to ensure that the page is not already in backing store.
*/
- if (entry->epc_page &&
- (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+ if (entry->epc_page && (entry->desc & SGX_ENCL_PAGE_PCMD_BUSY)) {
reclaimed = 1;
break;
}
@@ -257,7 +255,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & SGX_ENCL_PAGE_BUSY)
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..b566b8ad5f33 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,8 +22,14 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit marking that the page is being reclaimed. */
-#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit indicating that the page is busy (being reclaimed). */
+#define SGX_ENCL_PAGE_BUSY BIT(2)
+
+/*
+ * 'desc' bit indicating that PCMD page associated with the enclave page is
+ * busy (because the enclave page is being reclaimed).
+ */
+#define SGX_ENCL_PAGE_PCMD_BUSY BIT(3)
struct sgx_encl_page {
unsigned long desc;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f3f1461273ee..da59f034b769 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -205,7 +205,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
void *va_slot;
int ret;
- encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc &= ~(SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY);
va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
list);
@@ -341,7 +341,7 @@ static void sgx_reclaim_pages(void)
goto skip;
}
- encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc |= SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY;
mutex_unlock(&encl_page->encl->lock);
continue;
--
2.43.0
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_hqvdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_hqvdp.c b/drivers/gpu/drm/sti/sti_hqvdp.c
index 0fb48ac044d8..abab92df78bd 100644
--- a/drivers/gpu/drm/sti/sti_hqvdp.c
+++ b/drivers/gpu/drm/sti/sti_hqvdp.c
@@ -1037,6 +1037,9 @@ static int sti_hqvdp_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the Fixes tag according to suggestions;
- added necessary blank line.
---
drivers/gpu/drm/sti/sti_cursor.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..c59fcb4dca32 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,9 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_gdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_gdp.c b/drivers/gpu/drm/sti/sti_gdp.c
index 43c72c2604a0..f046f5f7ad25 100644
--- a/drivers/gpu/drm/sti/sti_gdp.c
+++ b/drivers/gpu/drm/sti/sti_gdp.c
@@ -638,6 +638,9 @@ static int sti_gdp_atomic_check(struct drm_plane *drm_plane,
mixer = to_sti_mixer(crtc);
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
From: Rob Clark <robdclark(a)chromium.org>
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
The whole premise of lockless access to a single-producer-single-
consumer queue is that there is just a single producer and single
consumer. That means we can't call drm_sched_can_queue() (which is
about queueing more work to the hw, not to the spsc queue) from
anywhere other than the consumer (wq).
This call in the producer is just an optimization to avoid scheduling
the consuming worker if it cannot yet queue more work to the hw. It
is safe to drop this optimization to avoid the race condition.
Suggested-by: Asahi Lina <lina(a)asahilina.net>
Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
Closes: https://github.com/AsahiLinux/linux/issues/309
Cc: stable(a)vger.kernel.org
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/gpu/drm/scheduler/sched_main.c | 7 ++-----
include/drm/gpu_scheduler.h | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..567e5ace6d0c 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -380,7 +380,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
container_of(cb, struct drm_sched_entity, cb);
drm_sched_entity_clear_dep(f, cb);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(entity->rq->sched);
}
/**
@@ -612,7 +612,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(entity->rq->sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..6f27cab0b76d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1013,15 +1013,12 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
/**
* drm_sched_wakeup - Wake up the scheduler if it is ready to queue
* @sched: scheduler instance
- * @entity: the scheduler entity
*
* Wake up the scheduler if we can queue jobs.
*/
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
- struct drm_sched_entity *entity)
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
{
- if (drm_sched_can_queue(sched, entity))
- drm_sched_run_job_queue(sched);
+ drm_sched_run_job_queue(sched);
}
/**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fe8edb917360..9c437a057e5d 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -574,7 +574,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched, struct drm_sched_entity *entity);
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
--
2.46.0
The IO yurex_write() needs to wait for in order to have a device
ready for writing again can take a long time time.
Consequently the sleep is done in an interruptible state.
Therefore others waiting for yurex_write() itself to finish should
use mutex_lock_interruptible.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Fixes: 6bc235a2e24a5 ("USB: add driver for Meywa-Denki & Kayac YUREX")
---
drivers/usb/misc/yurex.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/misc/yurex.c b/drivers/usb/misc/yurex.c
index 4a9859e03f6b..0deffdc8205f 100644
--- a/drivers/usb/misc/yurex.c
+++ b/drivers/usb/misc/yurex.c
@@ -430,7 +430,7 @@ static ssize_t yurex_write(struct file *file, const char __user *user_buffer,
size_t count, loff_t *ppos)
{
struct usb_yurex *dev;
- int i, set = 0, retval = 0;
+ int i, set = 0, retval;
char buffer[16 + 1];
char *data = buffer;
unsigned long long c, c2 = 0;
@@ -444,7 +444,10 @@ static ssize_t yurex_write(struct file *file, const char __user *user_buffer,
if (count == 0)
goto error;
- mutex_lock(&dev->io_mutex);
+ retval = mutex_lock_interruptible(&dev->io_mutex);
+ if (retval < 0)
+ return -EINTR;
+
if (dev->disconnected) { /* already disconnected */
mutex_unlock(&dev->io_mutex);
retval = -ENODEV;
--
2.46.1
The change is similar to that is used in comp_algorithm_set.
default_compressor is used for ZRAM_PRIMARY_COMP only, but if
CONFIG_ZRAM_MULTI_COMP isn't set, then ZRAM_PRIMARY_COMP and
ZRAM_SECONDARY_COMP are the same.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Signed-off-by: Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Cc: <stable(a)vger.kernel.org>
---
Changes in v2:
- removed comment from source code about freeing statically defined compression
- removed part of KASAN report from commit description
- added information about CONFIG_ZRAM_MULTI_COMP into commit description
drivers/block/zram/zram_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d245617083d..4a9cdb7fe8878 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2116,7 +2116,8 @@ static void zram_destroy_comps(struct zram *zram)
}
for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
--
2.45.2
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a3d7cb7cfd22..f7182ef3d8e6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1765,12 +1765,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(vm))
return PTR_ERR(vm);
- mutex_lock(&xef->vm.lock);
- err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
- mutex_unlock(&xef->vm.lock);
- if (err)
- goto err_close_and_put;
-
if (xe->info.has_asid) {
down_write(&xe->usm.lock);
err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
@@ -1778,12 +1772,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
&xe->usm.next_asid, GFP_KERNEL);
up_write(&xe->usm.lock);
if (err < 0)
- goto err_free_id;
+ goto err_close_and_put;
vm->usm.asid = asid;
}
- args->vm_id = id;
vm->xef = xe_file_get(xef);
/* Record BO memory for VM pagetable created against client */
@@ -1796,12 +1789,17 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
args->reserved[0] = xe_bo_main_addr(vm->pt_root[0]->bo, XE_PAGE_SIZE);
#endif
- return 0;
-
-err_free_id:
+ /* user id alloc must always be last in ioctl to prevent UAF */
mutex_lock(&xef->vm.lock);
- xa_erase(&xef->vm.xa, id);
+ err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
mutex_unlock(&xef->vm.lock);
+ if (err)
+ goto err_close_and_put;
+
+ args->vm_id = id;
+
+ return 0;
+
err_close_and_put:
xe_vm_close_and_put(vm);
--
2.46.1
On Mon, Sep 23, 2024 at 03:41:12PM +0000, John Allen wrote:
> A problem was introduced with f69759b ("x86/CPU/AMD: Move Zenbleed check to
> the Zen2 init function") where a bit in the DE_CFG MSR is getting set after
> a microcode late load.
>
> The problem is that the microcode late load path calls into
> amd_check_microcode and subsequently zen2_zenbleed_check. Since the patch
> removes the cpu_has_amd_erratum check from zen2_zenbleed_check, this will
> cause all non-Zen2 cpus to go through the function and set the bit in the
> DE_CFG MSR.
>
> Call into the zenbleed fix path on Zen2 cpus only.
>
> Fixes: f69759be251d ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Should probably go into stable as well.
Cc: <stable(a)vger.kernel.org>
> Signed-off-by: John Allen <john.allen(a)amd.com>
> ---
> arch/x86/kernel/cpu/amd.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 015971adadfc..368344e1394b 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -1202,5 +1202,6 @@ void amd_check_microcode(void)
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> return;
>
> - on_each_cpu(zenbleed_check_cpu, NULL, 1);
> + if (boot_cpu_has(X86_FEATURE_ZEN2))
> + on_each_cpu(zenbleed_check_cpu, NULL, 1);
> }
> --
> 2.34.1
>
devm_kasprintf() can return a NULL pointer on failure but this returned
value is not checked. Fix this lack and check the returned value.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 32c170ff15b0 ("pinctrl: stm32: set default gpio line names using pin names")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- modified the code style according to suggestions.
Changes in v2:
- modified the patch according to suggestions, added braces;
- modified the typo.
---
drivers/pinctrl/stm32/pinctrl-stm32.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/stm32/pinctrl-stm32.c b/drivers/pinctrl/stm32/pinctrl-stm32.c
index a8673739871d..5b7fa77c1184 100644
--- a/drivers/pinctrl/stm32/pinctrl-stm32.c
+++ b/drivers/pinctrl/stm32/pinctrl-stm32.c
@@ -1374,10 +1374,15 @@ static int stm32_gpiolib_register_bank(struct stm32_pinctrl *pctl, struct fwnode
for (i = 0; i < npins; i++) {
stm32_pin = stm32_pctrl_get_desc_pin_from_gpio(pctl, bank, i);
- if (stm32_pin && stm32_pin->pin.name)
+ if (stm32_pin && stm32_pin->pin.name) {
names[i] = devm_kasprintf(dev, GFP_KERNEL, "%s", stm32_pin->pin.name);
- else
+ if (!names[i]) {
+ err = -ENOMEM;
+ goto err_clk;
+ }
+ } else {
names[i] = NULL;
+ }
}
bank->gpio_chip.names = (const char * const *)names;
--
2.25.1
devm_kasprintf() can return a NULL pointer on failure but this returned
value is not checked. Fix this lack and check the returned value.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: a0f160ffcb83 ("pinctrl: add pinctrl/GPIO driver for Apple SoCs")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-apple-gpio.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-apple-gpio.c b/drivers/pinctrl/pinctrl-apple-gpio.c
index 3751c7de37aa..f861e63f4115 100644
--- a/drivers/pinctrl/pinctrl-apple-gpio.c
+++ b/drivers/pinctrl/pinctrl-apple-gpio.c
@@ -474,6 +474,9 @@ static int apple_gpio_pinctrl_probe(struct platform_device *pdev)
for (i = 0; i < npins; i++) {
pins[i].number = i;
pins[i].name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "PIN%u", i);
+ if (!pins[i].name)
+ return -ENOMEM;
+
pins[i].drv_data = pctl;
pin_names[i] = pins[i].name;
pin_nums[i] = i;
--
2.25.1
Hi,
This commit from mainline fixes a crash found in kernel 6.11 if users
change modes with amd-pstate.
commit 49243adc715e ("cpufreq/amd-pstate: Add the missing
cpufreq_cpu_put()")
Please backport to 6.11.y. Earlier kernels should not be affected.
Thanks!
Hi,
With systems that have 8 GPUs nodes + an integrated BMC, the BMC can't
work properly because of limits of DRM nodes in the kernel.
This is fixed in Linus tree with these commits:
Can you please backport to 6.6.y and later?
commit 5fbca8b48b30 ("drm: Use XArray instead of IDR for minors")
commit 45c4d994b82b ("accel: Use XArray instead of IDR for minors")
commit 071d583e01c8 ("drm: Expand max DRM device number to full MINORBITS")
Thanks,
From: Peter Wang <peter.wang(a)mediatek.com>
ufshcd_abort_all froce abort all on-going command and the host
will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.
MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.
SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.
For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 34 +++++++++++++++++++---------------
include/ufs/ufshcd.h | 3 +++
2 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..615da47c1727 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
lrbp->req_abort_skip = false;
+ lrbp->abort_initiated_by_err = false;
ufshcd_comp_scsi_upiu(hba, lrbp);
@@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
+ if (lrbp->abort_initiated_by_err)
+ result |= DID_REQUEUE << 16;
+ else
+ result |= DID_ABORT << 16;
break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
@@ -7561,6 +7551,20 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}
+ /*
+ * When the host software receives a "FUNCTION COMPLETE", set flag
+ * to requeue command after receive response with OCS_ABORTED
+ * SDB mode: UTRLCLR Task Management response which means a Transfer
+ * Request was aborted.
+ * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+ * This flag is set because ufshcd_abort_all forcibly aborts all
+ * commands, and the host will automatically fill in the OCS field
+ * of the corresponding response with OCS_ABORTED.
+ * Therefore, upon receiving this response, it needs to be requeued.
+ */
+ if (!err)
+ lrbp->abort_initiated_by_err = true;
+
err = ufshcd_clear_cmd(hba, tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..15b357672ca5 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
* @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
* @data_unit_num: the data unit number for the first block for inline crypto
* @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_err: The flag is specifically used to handle aborts
+ * caused by errors due to host/device communication
*/
struct ufshcd_lrb {
struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@ struct ufshcd_lrb {
#endif
bool req_abort_skip;
+ bool abort_initiated_by_err;
};
/**
--
2.45.2
Commit 07a4a4fc83dd ("ideapad: add Lenovo IdeaPad Z570 support (part 2)")
added an i8042_command(..., I8042_CMD_AUX_[EN|DIS]ABLE) call to
the ideapad-laptop driver to suppress the touchpad events at the PS/2
AUX controller level.
Commit c69e7d843d2c ("platform/x86: ideapad-laptop: Only toggle ps2 aux
port on/off on select models") limited this to only do this by default
on the IdeaPad Z570 to replace a growing list of models on which
the i8042_command() call was disabled by quirks because it was causing
issues.
A recent report shows that this is causing issues even on the Z570 for
which it was originally added because it can happen on resume before
the i8042 controller's own resume() method has run:
[ 50.241235] ideapad_acpi VPC2004:00: PM: calling acpi_subsys_resume+0x0/0x5d @ 4492, parent: PNP0C09:00
[ 50.242055] snd_hda_intel 0000:00:0e.0: PM: pci_pm_resume+0x0/0xed returned 0 after 13511 usecs
[ 50.242120] snd_hda_codec_realtek hdaudioC0D0: PM: calling hda_codec_pm_resume+0x0/0x19 [snd_hda_codec] @ 4518, parent: 0000:00:0e.0
[ 50.247406] i8042: [49434] a8 -> i8042 (command)
[ 50.247468] ideapad_acpi VPC2004:00: PM: acpi_subsys_resume+0x0/0x5d returned 0 after 6220 usecs
...
[ 50.247883] i8042 kbd 00:01: PM: calling pnp_bus_resume+0x0/0x9d @ 4492, parent: pnp0
[ 50.247894] i8042 kbd 00:01: PM: pnp_bus_resume+0x0/0x9d returned 0 after 0 usecs
[ 50.247906] i8042 aux 00:02: PM: calling pnp_bus_resume+0x0/0x9d @ 4492, parent: pnp0
[ 50.247916] i8042 aux 00:02: PM: pnp_bus_resume+0x0/0x9d returned 0 after 0 usecs
...
[ 50.248301] i8042 i8042: PM: calling platform_pm_resume+0x0/0x41 @ 4492, parent: platform
[ 50.248377] i8042: [49434] 55 <- i8042 (flush, kbd)
[ 50.248407] i8042: [49435] aa -> i8042 (command)
[ 50.248601] i8042: [49435] 00 <- i8042 (return)
[ 50.248604] i8042: [49435] i8042 controller selftest: 0x0 != 0x55
Dmitry (input subsys maintainer) pointed out that just sending
KEY_TOUCHPAD_OFF/KEY_TOUCHPAD_ON which the ideapad-laptop driver
already does should be sufficient and that it then is up to userspace
to filter out touchpad events after having received a KEY_TOUCHPAD_OFF.
Given all the problems the i8042_command() call has been causing just
removing it indeed seems best, so this removes it completely. Note that
this only impacts the Ideapad Z570 since it was already disabled by
default on all other models.
Fixes: c69e7d843d2c ("platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models")
Reported-by: Jonathan Denose <jdenose(a)chromium.org>
Closes: https://lore.kernel.org/linux-input/20231102075243.1.Idb37ff8043a29f607beab…
Suggested-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Cc: Maxim Mikityanskiy <maxtram95(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/ideapad-laptop.c | 37 ---------------------------
1 file changed, 37 deletions(-)
diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
index 1ace711f7442..255fb56ec9ee 100644
--- a/drivers/platform/x86/ideapad-laptop.c
+++ b/drivers/platform/x86/ideapad-laptop.c
@@ -18,7 +18,6 @@
#include <linux/device.h>
#include <linux/dmi.h>
#include <linux/fb.h>
-#include <linux/i8042.h>
#include <linux/init.h>
#include <linux/input.h>
#include <linux/input/sparse-keymap.h>
@@ -144,7 +143,6 @@ struct ideapad_private {
bool hw_rfkill_switch : 1;
bool kbd_bl : 1;
bool touchpad_ctrl_via_ec : 1;
- bool ctrl_ps2_aux_port : 1;
bool usb_charging : 1;
} features;
struct {
@@ -182,12 +180,6 @@ MODULE_PARM_DESC(set_fn_lock_led,
"Enable driver based updates of the fn-lock LED on fn-lock changes. "
"If you need this please report this to: platform-driver-x86(a)vger.kernel.org");
-static bool ctrl_ps2_aux_port;
-module_param(ctrl_ps2_aux_port, bool, 0444);
-MODULE_PARM_DESC(ctrl_ps2_aux_port,
- "Enable driver based PS/2 aux port en-/dis-abling on touchpad on/off toggle. "
- "If you need this please report this to: platform-driver-x86(a)vger.kernel.org");
-
static bool touchpad_ctrl_via_ec;
module_param(touchpad_ctrl_via_ec, bool, 0444);
MODULE_PARM_DESC(touchpad_ctrl_via_ec,
@@ -1560,7 +1552,6 @@ static void ideapad_fn_lock_led_exit(struct ideapad_private *priv)
static void ideapad_sync_touchpad_state(struct ideapad_private *priv, bool send_events)
{
unsigned long value;
- unsigned char param;
int ret;
/* Without reading from EC touchpad LED doesn't switch state */
@@ -1568,15 +1559,6 @@ static void ideapad_sync_touchpad_state(struct ideapad_private *priv, bool send_
if (ret)
return;
- /*
- * Some IdeaPads don't really turn off touchpad - they only
- * switch the LED state. We (de)activate KBC AUX port to turn
- * touchpad off and on. We send KEY_TOUCHPAD_OFF and
- * KEY_TOUCHPAD_ON to not to get out of sync with LED
- */
- if (priv->features.ctrl_ps2_aux_port)
- i8042_command(¶m, value ? I8042_CMD_AUX_ENABLE : I8042_CMD_AUX_DISABLE);
-
/*
* On older models the EC controls the touchpad and toggles it on/off
* itself, in this case we report KEY_TOUCHPAD_ON/_OFF. Some models do
@@ -1699,23 +1681,6 @@ static const struct dmi_system_id hw_rfkill_list[] = {
{}
};
-/*
- * On some models the EC toggles the touchpad muted LED on touchpad toggle
- * hotkey presses, but the EC does not actually disable the touchpad itself.
- * On these models the driver needs to explicitly enable/disable the i8042
- * (PS/2) aux port.
- */
-static const struct dmi_system_id ctrl_ps2_aux_port_list[] = {
- {
- /* Lenovo Ideapad Z570 */
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_VERSION, "Ideapad Z570"),
- },
- },
- {}
-};
-
static void ideapad_check_features(struct ideapad_private *priv)
{
acpi_handle handle = priv->adev->handle;
@@ -1725,8 +1690,6 @@ static void ideapad_check_features(struct ideapad_private *priv)
set_fn_lock_led || dmi_check_system(set_fn_lock_led_list);
priv->features.hw_rfkill_switch =
hw_rfkill_switch || dmi_check_system(hw_rfkill_list);
- priv->features.ctrl_ps2_aux_port =
- ctrl_ps2_aux_port || dmi_check_system(ctrl_ps2_aux_port_list);
priv->features.touchpad_ctrl_via_ec = touchpad_ctrl_via_ec;
if (!read_ec_data(handle, VPCCMD_R_FAN, &val))
--
2.45.2
The following patch addresses the issue of unchecked calls to
dma_map_sg() in safexcel_send_req() as these macros may return 0 in
case of unsuccessful mapping. This outcome in turn requires
unmapping of previously mapped buffers.
The fix has already been backported to the following stable branches:
v6.6: https://lore.kernel.org/all/20240122235813.608624333@linuxfoundation.org/
v6.1: https://lore.kernel.org/all/20240122235752.938797245@linuxfoundation.org/
The issue in question can be fixed in 5.10 and 5.15 stable branches by
backporting the following 2 upstream commits. Both can be cleanly
applied to kernel versions mentioned above.
[PATCH 5.10/5.15 1/2] crypto: inside_secure - Avoid dma map if size is zero
[PATCH 5.10/5.15 2/2] crypto: safexcel - Add error handling for dma_map_sg() calls
First patch is a prerequisite to main fix and removes warnings in case
of a call to dma_map_sg() with size 0 and allows for clean application
of the main fix.
Second (and main) patch adds proper handling of dma_map_sg() erroneous
behaviour.
In the `mac802154_scan_worker` function, the `scan_req->type` field was
accessed after the RCU read-side critical section was unlocked. According
to RCU usage rules, this is illegal and can lead to unpredictable
behavior, such as accessing memory that has been updated or causing
use-after-free issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the `scan_req->type` value is now stored in a local
variable `scan_req_type` while still within the RCU read-side critical
section. The `scan_req_type` is then used after the RCU lock is released,
ensuring that the type value is safely accessed without violating RCU
rules.
Fixes: e2c3e6f53a7a ("mac802154: Handle active scanning")
Signed-off-by: Jiawei Ye <jiawei.ye(a)foxmail.com>
Cc: stable(a)vger.kernel.org
Acked-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changelog:
v1->v2: -Repositioned the enum nl802154_scan_types scan_req_type
declaration between struct cfg802154_scan_request *scan_req and struct
ieee802154_sub_if_data *sdata to comply with the reverse Christmas tree
rule.
---
net/mac802154/scan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c
index 1c0eeaa76560..a6dab3cc3ad8 100644
--- a/net/mac802154/scan.c
+++ b/net/mac802154/scan.c
@@ -176,6 +176,7 @@ void mac802154_scan_worker(struct work_struct *work)
struct ieee802154_local *local =
container_of(work, struct ieee802154_local, scan_work.work);
struct cfg802154_scan_request *scan_req;
+ enum nl802154_scan_types scan_req_type;
struct ieee802154_sub_if_data *sdata;
unsigned int scan_duration = 0;
struct wpan_phy *wpan_phy;
@@ -209,6 +210,7 @@ void mac802154_scan_worker(struct work_struct *work)
}
wpan_phy = scan_req->wpan_phy;
+ scan_req_type = scan_req->type;
scan_req_duration = scan_req->duration;
/* Look for the next valid chan */
@@ -246,7 +248,7 @@ void mac802154_scan_worker(struct work_struct *work)
goto end_scan;
}
- if (scan_req->type == NL802154_SCAN_ACTIVE) {
+ if (scan_req_type == NL802154_SCAN_ACTIVE) {
ret = mac802154_transmit_beacon_req(local, sdata);
if (ret)
dev_err(&sdata->dev->dev,
--
2.34.1
The initial kref from dma_fence_init() should match up with whatever
signals the fence, however here we are submitting the job first to the
hw and only then grabbing the extra ref and even then we touch some
fence state before this. This might be too late if the fence is
signalled before we can grab the extra ref. Rather always grab the
refcount early before we do the submission part.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2811
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_guc_submit.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index fbbe6a487bbb..b33f3d23a068 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -766,12 +766,15 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
struct xe_guc *guc = exec_queue_to_guc(q);
struct xe_device *xe = guc_to_xe(guc);
bool lr = xe_exec_queue_is_lr(q);
+ struct dma_fence *fence;
xe_assert(xe, !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
exec_queue_banned(q) || exec_queue_suspended(q));
trace_xe_sched_job_run(job);
+ dma_fence_get(job->fence);
+
if (!exec_queue_killed_or_banned_or_wedged(q) && !xe_sched_job_is_error(job)) {
if (!exec_queue_registered(q))
register_exec_queue(q);
@@ -782,12 +785,16 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
if (lr) {
xe_sched_job_set_error(job, -EOPNOTSUPP);
- return NULL;
+ fence = NULL;
} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
- return job->fence;
+ fence = job->fence;
} else {
- return dma_fence_get(job->fence);
+ fence = dma_fence_get(job->fence);
}
+
+ dma_fence_put(job->fence);
+
+ return fence;
}
static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
--
2.46.0
This patch aims to prevent reliably encountered buffer overflow in
rtl92d_dm_txpower_tracking_callback_thermalmeter() caused by
accessing 'ofdm_index[]' array of size 2 with index 'i' equal to 2,
which in turn is possible if value of 'is2t' is 'true'.
The issue in question has been fixed by the following upstream
patch that can be cleanly applied to 5.10 stable branch.
Keep it consistent with the handling of the same check within
generic_copy_file_checks().
Also, returning -EOVERFLOW in this case is more appropriate.
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/remap_range.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 6fdeb3c8cb70..a26521ded5c8 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -42,7 +42,7 @@ static int generic_remap_checks(struct file *file_in, loff_t pos_in,
/* The start of both ranges must be aligned to an fs block. */
if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs))
- return -EINVAL;
+ return -EOVERFLOW;
/* Ensure offsets don't wrap. */
if (check_add_overflow(pos_in, count, &tmp) ||
--
2.39.2
The recently submitted fix-commit revealed a problem in the iDMA32
platform code. Even though the controller supported only a single master
the dw_dma_acpi_filter() method hard-coded two master interfaces with IDs
0 and 1. As a result the sanity check implemented in the commit
b336268dde75 ("dmaengine: dw: Add peripheral bus width verification") got
incorrect interface data width and thus prevented the client drivers
from configuring the DMA-channel with the EINVAL error returned. E.g. the
next error was printed for the PXA2xx SPI controller driver trying to
configure the requested channels:
> [ 164.525604] pxa2xx_spi_pci 0000:00:07.1: DMA slave config failed
> [ 164.536105] pxa2xx_spi_pci 0000:00:07.1: failed to get DMA TX descriptor
> [ 164.543213] spidev spi-SPT0001:00: SPI transfer failed: -16
The problem would have been spotted much earlier if the iDMA32 controller
supported more than one master interfaces. But since it supports just a
single master and the iDMA32-specific code just ignores the master IDs in
the CTLLO preparation method, the issue has been gone unnoticed so far.
Fix the problem by specifying a single master ID for both memory and
peripheral devices on the ACPI-based platforms if there is only one master
available on the controller. Thus the issue noticed for the iDMA32
controllers will be eliminated and the ACPI-probed DW DMA controllers will
be configured with the correct master ID by default.
Cc: stable(a)vger.kernel.org
Fixes: b336268dde75 ("dmaengine: dw: Add peripheral bus width verification")
Fixes: 199244d69458 ("dmaengine: dw: add support of iDMA 32-bit hardware")
Reported-by: Ferry Toth <fntoth(a)gmail.com>
Closes: https://lore.kernel.org/dmaengine/ZuXbCKUs1iOqFu51@black.fi.intel.com/
Reported-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Closes: https://lore.kernel.org/dmaengine/ZuXgI-VcHpMgbZ91@black.fi.intel.com/
Signed-off-by: Serge Semin <fancer.lancer(a)gmail.com>
---
Note I haven't got any device with the Intel Merrifield iDMA32 + SPI
PXA2xx pair to test out the solution. So any tests are very welcome. But
based on Andy' (see the reported-by links) and my investigations the fix
seems correct.
---
drivers/dma/dw/acpi.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/dw/acpi.c b/drivers/dma/dw/acpi.c
index c510c109d2c3..efbe8baeccbc 100644
--- a/drivers/dma/dw/acpi.c
+++ b/drivers/dma/dw/acpi.c
@@ -8,15 +8,25 @@
static bool dw_dma_acpi_filter(struct dma_chan *chan, void *param)
{
+ struct dw_dma *dw = to_dw_dma(chan->device);
struct acpi_dma_spec *dma_spec = param;
struct dw_dma_slave slave = {
.dma_dev = dma_spec->dev,
.src_id = dma_spec->slave_id,
.dst_id = dma_spec->slave_id,
.m_master = 0,
- .p_master = 1,
};
+ /*
+ * Fallback to using a single interface for both memory and peripheral
+ * device if there is only one master I/F supported (e.g. iDMA32)
+ */
+ if (dw->pdata->nr_masters == 0)
+ slave.p_master = 0;
+ else
+ slave.p_master = 1;
+
+
return dw_dma_filter(chan, &slave);
}
--
2.43.0
From: Toke Høiland-Jørgensen <toke(a)redhat.com>
[ Upstream commit 281d464a34f540de166cee74b723e97ac2515ec3 ]
The devmap code allocates a number hash buckets equal to the next power
of two of the max_entries value provided when creating the map. When
rounding up to the next power of two, the 32-bit variable storing the
number of buckets can overflow, and the code checks for overflow by
checking if the truncated 32-bit value is equal to 0. However, on 32-bit
arches the rounding up itself can overflow mid-way through, because it
ends up doing a left-shift of 32 bits on an unsigned long value. If the
size of an unsigned long is four bytes, this is undefined behaviour, so
there is no guarantee that we'll end up with a nice and tidy 0-value at
the end.
Syzbot managed to turn this into a crash on arm32 by creating a
DEVMAP_HASH with max_entries > 0x80000000 and then trying to update it.
Fix this by moving the overflow check to before the rounding up
operation.
Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a(a)syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Message-ID: <20240307120340.99577-2-toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Pu Lehui <pulehui(a)huawei.com>
---
kernel/bpf/devmap.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 4b2819b0a05a..2370fc31169f 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -130,10 +130,13 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
cost = (u64) sizeof(struct list_head) * num_possible_cpus();
if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
- dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
-
- if (!dtab->n_buckets) /* Overflow check */
+ /* hash table size must be power of 2; roundup_pow_of_two() can
+ * overflow into UB on 32-bit arches, so check that first
+ */
+ if (dtab->map.max_entries > 1UL << 31)
return -EINVAL;
+
+ dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
cost += (u64) sizeof(struct hlist_head) * dtab->n_buckets;
} else {
cost += (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
--
2.34.1
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 1b0279393df4..d05387f9c11f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -526,14 +526,17 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int i;
for (i = 0; i < ev_cnt; i++) {
- int key = adp5588_read(kpad->client, KEY_EVENTA + i);
- int key_val = key & KEY_EV_MASK;
- int key_press = key & KEY_EV_PRESSED;
+ int key, key_val, key_press;
+ key = adp5588_read(kpad->client, KEY_EVENTA + i);
+ if (key < 0)
+ continue;
+ key_val = key & KEY_EV_MASK;
+ key_press = key & KEY_EV_PRESSED;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
/* gpio line used as IRQ source */
adp5588_gpio_irq_handle(kpad, key_val, key_press);
- } else {
+ } else if (key_val > 0) {
int row = (key_val - 1) / ADP5588_COLS_MAX;
int col = (key_val - 1) % ADP5588_COLS_MAX;
int code = MATRIX_SCAN_CODE(row, col, kpad->row_shift);
--
2.25.1
Syzkaller reports a sleeping function called from invalid context
in do_con_write() in the 5.10 and 5.15 stable releases.
The issue has been fixed by the following upstream patch that was
adapted to 5.10 and 5.15. All of the changes made to the patch in order
to adapt it are described at the end of the commit message.
This patch has already been backported to the following stable branches:
v6.9 - https://lore.kernel.org/all/20240625085551.368621044@linuxfoundation.org/
v6.6 - https://lore.kernel.org/all/20240625085539.398852153@linuxfoundation.org/
v6.1 - https://lore.kernel.org/all/20240625085527.625846117@linuxfoundation.org/
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
Linus Torvalds (1):
tty: add the option to have a tty reject a new ldisc
drivers/tty/tty_ldisc.c | 6 ++++++
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/tty_driver.h | 8 ++++++++
3 files changed, 24 insertions(+)
--
2.39.2
usb_power_delivery_register_capabilities() returns ERR_PTR in case of
failure. usb_power_delivery_unregister_capabilities() we only check
argument ("cap") for NULL. A more robust check would be checking for
ERR_PTR as well.
Cc: stable(a)vger.kernel.org
Fixes: 662a60102c12 ("usb: typec: Separate USB Power Delivery from USB Type-C")
Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
drivers/usb/typec/pd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/pd.c b/drivers/usb/typec/pd.c
index d78c04a421bc..761fe4dddf1b 100644
--- a/drivers/usb/typec/pd.c
+++ b/drivers/usb/typec/pd.c
@@ -519,7 +519,7 @@ EXPORT_SYMBOL_GPL(usb_power_delivery_register_capabilities);
*/
void usb_power_delivery_unregister_capabilities(struct usb_power_delivery_capabilities *cap)
{
- if (!cap)
+ if (IS_ERR_OR_NULL(cap))
return;
device_for_each_child(&cap->dev, NULL, remove_pdo);
base-commit: 68d4209158f43a558c5553ea95ab0c8975eab18c
--
2.46.0.792.g87dc391469-goog