Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a report about a linux-6.6.y regression in bugzilla.kernel.org
that appears to be caused by this commit from Dan applied by Greg:
15fffc6a5624b1 ("driver core: Fix uevent_show() vs driver detach race")
[v6.11-rc3, v6.10.5, v6.6.46, v6.1.105, v5.15.165, v5.10.224, v5.4.282,
v4.19.320]
The reporter did not check yet if mainline is affected; decided to
forward the report by mail nevertheless, as the maintainer for the
subsystem is also the maintainer for the stable tree. ;-)
To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219244 :
> The symptoms of this bug are as follows:
>
> - After booting (to the graphical login screen) the mouse pointer
> would frozen and only after unplugging and plugging-in again the usb
> plug of the mouse would the mouse be working as expected.
> - If one would log in without fixing the mouse issue, the mouse
> pointer would still be frozen after login.
> - The usb keyboard usually is not affected even though plugged into
> the same usb-hub - thus logging in is possible.
> - The mouse pointer is also frozen if the usb connector is plugged
> into a different usb-port (different from the usb-hub)
> - The pointer is moveable via the inbuilt synaptics trackpad
>
>
> The kernel log shows almost the same messages (not sure if the
> differences mean anything in regards to this bug) for the initial
> recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse
> (and subsequently moveable mouse pointer):
>
> [kernel] [ 8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd
> [kernel] [ 8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 8.963554] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18
> [kernel] [ 8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> [kernel] [ 31.258381] usb 1-2.2.1.2: USB disconnect, device number 10
> [kernel] [ 31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd
> [kernel] [ 31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 31.812933] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20
> [kernel] [ 31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> Differences:
>
> ../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20
>
> and
>
> hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004
>
>
> The connector / usb-port was not changed in this case!
>
>
> The symptoms of this bug have been present at one point in the
> recent
> past, but with kernel v6.6.45 (or maybe even some version before that)
> it was fine. But with 6.6.45 it seems to be definitely fine.
>
> But with v6.6.46 the symptoms returned. That's the reason I
> suspected
> the kernel to be the cause of this issue. So I did some bisecting -
> which wasn't easy because that bug would often times not appear if the
> system was simply rebooted into the test kernel.
> As the bug would definitely appear on the affected kernels (v6.6.46
> ff) after shutting down the system for the night and booting the next
> day, I resorted to simulating the over-night powering-off by shutting
> the system down, unplugging the power and pressing the power button to
> get rid of residual voltage. But even then a few times the bug would
> only appear if I repeated this procedure before booting the system again
> with the respective kernel.
>
> This is on a Thinkpad with Kaby Lake and integrated Intel graphics.
> Even though it is a laptop, it is used as a desktop device, and the
> internal battery is disconnected, the removable battery is removed as
> the system is plugged-in via the power cord at all times (when in use)!
> Also, the system has no power (except for the bios battery, of
> course)
> over night as the power outlet is switched off if the device is not in use.
>
> Not sure if this affects the issue - or how it does. But for
> successful bisecting I had to resort to the above procedure.
>
> Bisecting the issue (between the release commits of v6.6.45 and
> v6.6.46) resulted in this commit [1] being the probable culprit.
>
> I then tested kernel v6.6.49. It still produced the bug for me. So I
> reverted the changes of the assumed "bad commit" and re-compiled kernel
> v6.6.49. With this modified kernel the bug seems to be gone.
>
> Now, I assume the commit has a reason for being introduced, but
> maybe
> needs some adjusting in order to avoid this bug I'm experiencing on my
> system.
> Also, I can't say why the issue appeared in the past even without
> this
> commit being present, as I haven't bisected any kernel version before
> v6.6.45.
>
>
> [1]:
>
> 4d035c743c3e391728a6f81cbf0f7f9ca700cf62 is the first bad commit
> commit 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
> Author: Dan Williams <dan.j.williams(a)intel.com>
> Date: Fri Jul 12 12:42:09 2024 -0700
>
> driver core: Fix uevent_show() vs driver detach race
>
> commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c upstream.
>
> uevent_show() wants to de-reference dev->driver->name. There is no clean
See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] can not CCed them in mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
#regzbot from: brmails+k
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219244
#regzbot title: driver core: frozen usb mouse pointer at boot
#regzbot ignore-activity
The patch titled
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
Date: Fri, 27 Sep 2024 09:19:36 +1200
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/mm/memory.c~mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails
+++ a/mm/memory.c
@@ -4192,6 +4192,8 @@ static struct folio *alloc_swap_folio(st
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4204,6 +4206,7 @@ vm_fault_t do_swap_page(struct vm_fault
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4302,7 +4305,9 @@ vm_fault_t do_swap_page(struct vm_fault
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4609,8 +4614,10 @@ unlock:
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4625,8 +4632,10 @@ out_release:
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
The patch titled
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
Date: Fri, 27 Sep 2024 00:07:52 -0500
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: J��r��me Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hmm-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/hmm-tests.c~selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test
+++ a/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
The patch titled
Subject: device-dax: correct pgoff align in dax_set_mapping()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kun(llfl)" <llfl(a)linux.alibaba.com>
Subject: device-dax: correct pgoff align in dax_set_mapping()
Date: Fri, 27 Sep 2024 15:45:09 +0800
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,
vmf->address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.
Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.17274216…
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) <llfl(a)linux.alibaba.com>
Tested-by: JianXiong Zhao <zhaojianxiong.zjx(a)alibaba-inc.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dax/device.c~device-dax-correct-pgoff-align-in-dax_set_mapping
+++ a/drivers/dax/device.c
@@ -86,7 +86,7 @@ static void dax_set_mapping(struct vm_fa
nr_pages = 1;
pgoff = linear_page_index(vmf->vma,
- ALIGN(vmf->address, fault_size));
+ ALIGN_DOWN(vmf->address, fault_size));
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
_
Patches currently in -mm which might be from llfl(a)linux.alibaba.com are
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
arch-s390.h uses types from std.h, but does not include it.
Depending on the inclusion order the compilation can fail.
Include std.h explicitly to avoid these errors.
Fixes: 404fa87c0eaf ("tools/nolibc: s390: provide custom implementation for sys_fork")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/include/nolibc/arch-s390.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 2ec13d8b9a2db80efa8d6cbbbd01bfa3d0059de2..f9ab83a219b8a2d5e53b0b303d8bf0bf78280d5f 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -10,6 +10,7 @@
#include "compiler.h"
#include "crt.h"
+#include "std.h"
/* Syscalls for s390:
* - registers are 64-bit
---
base-commit: e477dba5442c0af7acb9e8bbbbde1108a37ed39c
change-id: 20240927-nolibc-s390-std-h-cbb13f70fa73
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hello,
I am unsure if this is the 'correct' behavior for ptrace.
If you run ptrace_traceme followed by ptrace_attach, then the process
attaches its own parent to itself and cannot be attached by another
thing. The attach call errors out, but GDB does report something
attached to it. I am unsure if Bash does this itself perhaps.
It's a bit hard for me to reason about because my debugging skills are
bad and trying 'strace' with bash -c ./thing, or just on the thing
itself gives -1 on both ptrace calls as strace attaches to it.
similarly with GDB. Unsure how to debug this.
https://gist.github.com/x64-elf-sh42/83393e319ad8280b8704fbe3f499e381
to compile simply:
gcc test.c -o thingy
This code works on my machine which is:
Linux 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3
(2024-08-26) x86_64 GNU/Linux
GDB -p on the pid reports that another pid is attached and the
operation is illegal. That other pid is the bash shell that i spawned
this binary from (code in gist).
it's useful for anti-debugging, but it seems odd it will attach it's
parent to the process since that's not actually doing the attach call.
If anything i'd expect the pid attached to itself, rather than the
parent getting attached.
The first call to ptrace (traceme) gets return value 0. The second
call (attach) gets return value -1. That does seem correct, but yet
there is something 'attached' when i try to use GDB.
If I only do the traceme call, it does not get attached by Bash, so it
looks totally like the 'attach' call has a side effect of attaching
the parent, rather than just only failing.
Kind regards,
~sh42
Hello everyone,
I wondered a few times in the last months how we could properly track
regressions for the stable series, as I'm always a bit unsure how proper
use of regzbot would look for these cases.
For issues that affect mainline usually just the commit itself is
specified with the relevant "#regzbot introduced: ..." invocation, but
how would one do this if the stable trees are also affected? Do we need
to specify "#regzbot introduced" for each backported version of the
upstream commit?!
IMO this is especially interesting because sometimes getting a patchset
to fix the older trees can take quite some time since possibly backport
dependencies have to be found or even custom patches need to be written,
as it i.e. was the case [here][0]. This led to fixes for the linux-6.1.y
branch landing a bit later which would be good to track with regzbot so
it does not fall through the cracks.
Maybe there already is a regzbot facility to do this that I have just
missed, but I thought if there is people on the lists will know :)
Cheers,
Chris
P.S.: I hope everybody had a great time at the conferences! I hope to
also make it next year ..
[0]: https://lore.kernel.org/all/66bcb6f68172f_adbf529471@willemb.c.googlers.com…
The atomicity violation issue is due to the invalidation of the function
port_has_data()'s check caused by concurrency. Imagine a scenario where a
port that contains data passes the validity check but is simultaneously
assigned a value with no data. This could result in an empty port passing
the validity check, potentially leading to a null pointer dereference
error later in the program, which is inconsistent.
To address this issue, we believe there is a problem with the original
logic. Since the validity check and the read operation were separated, it
could lead to inconsistencies between the data that passes the check and
the data being read. Therefore, we moved the main logic of the
port_has_data function into this function and placed the read operation
within a lock, ensuring that the validity check and read operation are
not separated, thus resolving the problem.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 203baab8ba31 ("virtio: console: Introduce function to hand off data from host to readers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/char/virtio_console.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index de7d720d99fa..5aaf07f71a4c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -656,10 +656,14 @@ static ssize_t fill_readbuf(struct port *port, u8 __user *out_buf,
struct port_buffer *buf;
unsigned long flags;
- if (!out_count || !port_has_data(port))
+ if (!out_count)
return 0;
- buf = port->inbuf;
+ spin_lock_irqsave(&port->inbuf_lock, flags);
+ buf = port->inbuf = get_inbuf(port);
+ spin_unlock_irqrestore(&port->inbuf_lock, flags);
+ if (!buf)
+ return 0;
out_count = min(out_count, buf->len - buf->offset);
if (to_user) {
--
2.34.1
Atomicity violation occurs when the fmc_send_cmd() function is executed
simultaneously with the modification of the fmdev->resp_skb value.
Consider a scenario where, after passing the validity check within the
function, a non-null fmdev->resp_skb variable is assigned a null value.
This results in an invalid fmdev->resp_skb variable passing the validity
check. As seen in the later part of the function, skb = fmdev->resp_skb;
when the invalid fmdev->resp_skb passes the check, a null pointer
dereference error may occur at line 478, evt_hdr = (void *)skb->data;
To address this issue, it is recommended to include the validity check of
fmdev->resp_skb within the locked section of the function. This
modification ensures that the value of fmdev->resp_skb does not change
during the validation process, thereby maintaining its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/media/radio/wl128x/fmdrv_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/radio/wl128x/fmdrv_common.c b/drivers/media/radio/wl128x/fmdrv_common.c
index 3d36f323a8f8..4d032436691c 100644
--- a/drivers/media/radio/wl128x/fmdrv_common.c
+++ b/drivers/media/radio/wl128x/fmdrv_common.c
@@ -466,11 +466,12 @@ int fmc_send_cmd(struct fmdev *fmdev, u8 fm_op, u16 type, void *payload,
jiffies_to_msecs(FM_DRV_TX_TIMEOUT) / 1000);
return -ETIMEDOUT;
}
+ spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
if (!fmdev->resp_skb) {
+ spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
fmerr("Response SKB is missing\n");
return -EFAULT;
}
- spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
skb = fmdev->resp_skb;
fmdev->resp_skb = NULL;
spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
--
2.34.1
From: Filipe Manana <fdmanana(a)suse.com>
commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream.
When updating the global block reserve, we account for the 6 items needed
by an unlink operation and the 6 delayed references for each one of those
items. However the calculation for the delayed references is not correct
in case we have the free space tree enabled, as in that case we need to
touch the free space tree as well and therefore need twice the number of
bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the
number of bytes need for the delayed references at
btrfs_update_global_block_rsv().
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Diogo: this patch has been cherry-picked from the original commit;
conflicts included lack of a define (picked from commit 5630e2bcfe223)
and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97)
- changed const struct -> struct for compatibility.]
Signed-off-by: Diogo Jahchan Koike <djahchankoike(a)gmail.com>
---
Fixes WARNING in btrfs_chunk_alloc (2) [0]
[0]: https://syzkaller.appspot.com/bug?extid=57de2b05959bc1e659af
---
fs/btrfs/block-rsv.c | 16 +++++++++-------
fs/btrfs/block-rsv.h | 12 ++++++++++++
fs/btrfs/delayed-ref.h | 21 +++++++++++++++++++++
3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index ec96285357e0..f7e3d6c2e302 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -378,17 +378,19 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
/*
* But we also want to reserve enough space so we can do the fallback
- * global reserve for an unlink, which is an additional 5 items (see the
- * comment in __unlink_start_trans for what we're modifying.)
+ * global reserve for an unlink, which is an additional
+ * BTRFS_UNLINK_METADATA_UNITS items.
*
* But we also need space for the delayed ref updates from the unlink,
- * so its 10, 5 for the actual operation, and 5 for the delayed ref
- * updates.
+ * so add BTRFS_UNLINK_METADATA_UNITS units for delayed refs, one for
+ * each unlink metadata item.
*/
- min_items += 10;
-
+ min_items += BTRFS_UNLINK_METADATA_UNITS;
+
num_bytes = max_t(u64, num_bytes,
- btrfs_calc_insert_metadata_size(fs_info, min_items));
+ btrfs_calc_insert_metadata_size(fs_info, min_items) +
+ btrfs_calc_delayed_ref_bytes(fs_info,
+ BTRFS_UNLINK_METADATA_UNITS));
spin_lock(&sinfo->lock);
spin_lock(&block_rsv->lock);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 578c3497a455..662c52b4bd44 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -50,6 +50,18 @@ struct btrfs_block_rsv {
u64 qgroup_rsv_reserved;
};
+/*
+ * Number of metadata items necessary for an unlink operation:
+ *
+ * 1 for the possible orphan item
+ * 1 for the dir item
+ * 1 for the dir index
+ * 1 for the inode ref
+ * 1 for the inode
+ * 1 for the parent inode
+ */
+#define BTRFS_UNLINK_METADATA_UNITS 6
+
void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, enum btrfs_rsv_type type);
void btrfs_init_root_block_rsv(struct btrfs_root *root);
struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index d6304b690ec4..5c6bb23d45a5 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -253,6 +253,27 @@ extern struct kmem_cache *btrfs_delayed_extent_op_cachep;
int __init btrfs_delayed_ref_init(void);
void __cold btrfs_delayed_ref_exit(void);
+static inline u64 btrfs_calc_delayed_ref_bytes(struct btrfs_fs_info *fs_info,
+ int num_delayed_refs)
+{
+ u64 num_bytes;
+
+ num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_delayed_refs);
+
+ /*
+ * We have to check the mount option here because we could be enabling
+ * the free space tree for the first time and don't have the compat_ro
+ * option set yet.
+ *
+ * We need extra reservations if we have the free space tree because
+ * we'll have to modify that tree as well.
+ */
+ if (btrfs_test_opt(fs_info, FREE_SPACE_TREE))
+ num_bytes *= 2;
+
+ return num_bytes;
+}
+
static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref,
int action, u64 bytenr, u64 len, u64 parent)
{
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b4cd80b0338945a94972ac3ed54f8338d2da2076
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091330-reps-craftsman-ab67@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
b4cd80b03389 ("mptcp: pm: Fix uaf in __timer_delete_sync")
d58300c3185b ("mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer")
f7dafee18538 ("mptcp: use mptcp_addr_info in mptcp_options_received")
dc87efdb1a5c ("mptcp: add mptcp reset option support")
557963c383e8 ("mptcp: move to next addr when subflow creation fail")
d88c476f4a7d ("mptcp: export lookup_anno_list_by_saddr")
efd13b71a3fa ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b4cd80b0338945a94972ac3ed54f8338d2da2076 Mon Sep 17 00:00:00 2001
From: Edward Adam Davis <eadavis(a)qq.com>
Date: Tue, 10 Sep 2024 17:58:56 +0800
Subject: [PATCH] mptcp: pm: Fix uaf in __timer_delete_sync
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f891bc714668..ad935d34c973 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -334,15 +334,21 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
+ if (!check_id && entry)
+ list_del(&entry->list);
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1462,7 +1468,6 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
- list_del(&entry->list);
kfree(entry);
return true;
}
Hello stable-team,
Francesco Dolcini reported a regression on v6.6.52 [1], it turns out
since v6.6.51~34 a.k.a. 5ea24ddc26a7 ("can: mcp251xfd: rx: add
workaround for erratum DS80000789E 6 of mcp2518fd") the mcp25xfd driver
is broken.
Cherry picking the following commits fixes the problem:
51b2a7216122 ("can: mcp251xfd: properly indent labels")
a7801540f325 ("can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()")
Please pick these patches for
- 6.10.x
- 6.6.x
- 6.1.x
[1] https://lore.kernel.org/all/20240923115310.GA138774@francesco-nb
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
Hi Greg, Sasha,
This batch contains a backport for fixes for 5.10-stable:
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump")
2) efefd4f00c96 ("netfilter: nf_tables: missing iterator type in lookup walk")
Please, apply,
Thanks
Pablo Neira Ayuso (2):
netfilter: nft_set_pipapo: walk over current view on netlink dump
netfilter: nf_tables: missing iterator type in lookup walk
include/net/netfilter/nf_tables.h | 13 +++++++++++++
net/netfilter/nf_tables_api.c | 5 +++++
net/netfilter/nft_lookup.c | 1 +
net/netfilter/nft_set_pipapo.c | 6 ++++--
4 files changed, 23 insertions(+), 2 deletions(-)
--
2.30.2
Hi,
Can you please bring this commit into linux-6.11.y?
commit c35fad6f7e0d ("ASoC: amd: acp: add ZSC control register
programming sequence")
This helps with power consumption on the affected platforms.
Thanks,
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Sligth
modifications are applied to the context of the patches for cleanly
applying.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (2):
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/internal.h | 14 ++++++
fs/namespace.c | 13 ------
fs/stat.c | 123 ++++++++++++++++++++++++++++++++++++-----------------
include/linux/fs.h | 17 ++++++++
4 files changed, 116 insertions(+), 51 deletions(-)
---
base-commit: 049be94099ea5ba8338526c5a4f4f404b9dcaf54
change-id: 20240918-statx-stable-linux-6-10-y-17c3ec7497c8
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Hello,
please consider picking up:
7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
and its followup fix,
7052622fccb1 ("netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()")
It should cherry-pick fine for 6.1 and later.
I'm not sure a 5.15 backport is worth it, as its not a crash fix and
noone has reported this problem so far with a 5.15 kernel.
Hello again,
Here is the next set of XFS backports, this set is for 6.1.y and I will
be following up with a set for 5.15.y later. There were some good
suggestions made at LSF to survey test coverage to cut back on
testing but I've been a bit swamped and a backport set was overdue.
So for this set, I have run the auto group 3 x 8 configs with no
regressions seen. Let me know if you spot any issues.
This set has already been ack'd on the XFS list.
Thanks,
Leah
https://lkml.iu.edu/hypermail/linux/kernel/2212.0/04860.html
52f31ed22821
[1/1] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
https://lore.kernel.org/linux-xfs/ef8a958d-741f-5bfd-7b2f-db65bf6dc3ac@huaw…
4da112513c01
[1/1] xfs: Fix deadlock on xfs_inodegc_worker
https://www.spinics.net/lists/linux-xfs/msg68547.html
601a27ea09a3
[1/1] xfs: fix extent busy updating
https://www.spinics.net/lists/linux-xfs/msg67254.html
c85007e2e394
[1/1] xfs: don't use BMBT btree split workers for IO completion
fixes from start of the series
https://lore.kernel.org/linux-xfs/20230209221825.3722244-1-david@fromorbit.…
[00/42] xfs: per-ag centric allocation alogrithms
1dd0510f6d4b
[01/42] xfs: fix low space alloc deadlock
f08f984c63e9
[02/42] xfs: prefer free inodes at ENOSPC over chunk allocation
d5753847b216
[03/42] xfs: block reservation too large for minleft allocation
https://lore.kernel.org/linux-xfs/Y+Z7TZ9o+KgXLcV8@magnolia/
60b730a40c43
[1/1] xfs: fix uninitialized variable access
https://lore.kernel.org/linux-xfs/20230228051250.1238353-1-david@fromorbit.…
0c7273e494dd
[1/1] xfs: quotacheck failure can race with background inode inactivation
https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
8ee81ed581ff
[1/1] xfs: fix BUG_ON in xfs_getbmap()
https://www.spinics.net/lists/linux-xfs/msg71062.html
[0/4] xfs: bug fixes for 6.4-rc1
[1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
skip, fix for a commit from 6.3
8e698ee72c4e
[2/4] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
[3/4] xfs: flush dirty data and drain directios before scrubbing cow fork
skip, scrub
[4/4] xfs: don't allocate into the data fork for an unshare request
skip, more of an optimization than a fix
1bba82fe1afa
[5/4] xfs: fix negative array access in xfs_getbmap
(fix of 8ee81ed)
https://lore.kernel.org/linux-xfs/20230517000449.3997582-1-david@fromorbit.…
[0/4] xfs: bug fixes for 6.4-rcX
89a4bf0dc385
[1/4] xfs: buffer pins need to hold a buffer reference
[2/4] xfs: restore allocation trylock iteration
skip, for issue introduced in 6.3
cb042117488d
[3/4] xfs: defered work could create precommits
(dependency for patch 4)
82842fee6e59
[4/4] xfs: fix AGF vs inode cluster buffer deadlock
https://lore.kernel.org/linux-xfs/20240612225148.3989713-1-david@fromorbit.…
348a1983cf4c
(fix for 82842fee6e5)
[1/1] xfs: fix unlink vs cluster buffer instantiation race
https://lore.kernel.org/linux-xfs/20230530001928.2967218-1-david@fromorbit.…
d4d12c02b
[1/1] xfs: collect errors from inodegc for unlinked inode recovery
https://lore.kernel.org/linux-xfs/20230524121041.GA4128075@ceph-admin/
c3b880acadc9
[1/1] xfs: fix ag count overflow during growfs
4b827b3f305d
[1/1] xfs: remove WARN when dquot cache insertion fails
requested on list to reduce bot noise
https://www.spinics.net/lists/linux-xfs/msg73214.html
5cf32f63b0f4
[1/2] xfs: fix the calculation for "end" and "length"
[2/2] introduces new feature, skipping
https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.fnst@fujit…
3c90c01e4934
[1/1] xfs: correct calculation for agend and blockcount
(fixes 5cf32)
https://lore.kernel.org/all/20230901160020.GT28186@frogsfrogsfrogs/
68b957f64fca
[1/1] xfs: load uncached unlinked inodes into memory on demand
https://www.spinics.net/lists/linux-xfs/msg74960.html
[0/3] xfs: reload entire iunlink lists
f12b96683d69
[1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
83771c50e42b
[2/3] xfs: reload entire unlinked bucket lists
(dependency of 49813a21ed)
49813a21ed57
[3/3] xfs: make inode unlinked bucket recovery work with quotacheck
(dependency for 537c013)
https://lore.kernel.org/all/169565629026.1982077.12646061547002741492.stgit…
537c013b140d
[1/1] xfs: fix reloading entire unlinked bucket lists
(fix of 68b957f64fca)
Darrick J. Wong (8):
xfs: fix uninitialized variable access
xfs: load uncached unlinked inodes into memory on demand
xfs: fix negative array access in xfs_getbmap
xfs: use i_prev_unlinked to distinguish inodes that are not on the
unlinked list
xfs: reload entire unlinked bucket lists
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: fix reloading entire unlinked bucket lists
xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
Dave Chinner (12):
xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
xfs: don't use BMBT btree split workers for IO completion
xfs: fix low space alloc deadlock
xfs: prefer free inodes at ENOSPC over chunk allocation
xfs: block reservation too large for minleft allocation
xfs: quotacheck failure can race with background inode inactivation
xfs: buffer pins need to hold a buffer reference
xfs: defered work could create precommits
xfs: fix AGF vs inode cluster buffer deadlock
xfs: collect errors from inodegc for unlinked inode recovery
xfs: remove WARN when dquot cache insertion fails
xfs: fix unlink vs cluster buffer instantiation race
Long Li (1):
xfs: fix ag count overflow during growfs
Shiyang Ruan (2):
xfs: fix the calculation for "end" and "length"
xfs: correct calculation for agend and blockcount
Wengang Wang (1):
xfs: fix extent busy updating
Wu Guanghao (1):
xfs: Fix deadlock on xfs_inodegc_worker
Ye Bin (1):
xfs: fix BUG_ON in xfs_getbmap()
fs/xfs/libxfs/xfs_ag.c | 19 ++-
fs/xfs/libxfs/xfs_alloc.c | 69 +++++++--
fs/xfs/libxfs/xfs_bmap.c | 16 +-
fs/xfs/libxfs/xfs_bmap.h | 2 +
fs/xfs/libxfs/xfs_bmap_btree.c | 19 ++-
fs/xfs/libxfs/xfs_btree.c | 18 ++-
fs/xfs/libxfs/xfs_fs.h | 2 +
fs/xfs/libxfs/xfs_ialloc.c | 17 +++
fs/xfs/libxfs/xfs_log_format.h | 9 +-
fs/xfs/libxfs/xfs_trans_inode.c | 113 +-------------
fs/xfs/xfs_attr_inactive.c | 1 -
fs/xfs/xfs_bmap_util.c | 18 +--
fs/xfs/xfs_buf_item.c | 88 ++++++++---
fs/xfs/xfs_dquot.c | 1 -
fs/xfs/xfs_export.c | 14 ++
fs/xfs/xfs_extent_busy.c | 1 +
fs/xfs/xfs_fsmap.c | 1 +
fs/xfs/xfs_fsops.c | 13 +-
fs/xfs/xfs_icache.c | 58 +++++--
fs/xfs/xfs_icache.h | 4 +-
fs/xfs/xfs_inode.c | 260 ++++++++++++++++++++++++++++----
fs/xfs/xfs_inode.h | 36 ++++-
fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++
fs/xfs/xfs_inode_item.h | 1 +
fs/xfs/xfs_itable.c | 11 ++
fs/xfs/xfs_log_recover.c | 19 ++-
fs/xfs/xfs_mount.h | 11 +-
fs/xfs/xfs_notify_failure.c | 15 +-
fs/xfs/xfs_qm.c | 72 ++++++---
fs/xfs/xfs_super.c | 1 +
fs/xfs/xfs_trace.h | 46 ++++++
fs/xfs/xfs_trans.c | 9 +-
32 files changed, 841 insertions(+), 272 deletions(-)
--
2.46.0.792.g87dc391469-goog
From: Tony Luck <tony.luck(a)intel.com>
[ Upstream commit 2eda374e883ad297bd9fe575a16c1dc850346075 ]
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
[ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL()
instead of X86_MATCH_VFM() as in the upstream commit. ]
Signed-off-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
---
I tested this backport on an Alder Lake system. Now pr_info("Incomplete
global flushes, disabling PCID") is back in dmesg. I also tested on a
Meteor Lake system, which is not affected by the INVLPG issue. The message
in question is not there before and after the backport, as expected.
This backport fixes the last remaining caller of x86_match_cpu()
that does not use the family of X86_MATCH_*() macros.
Thomas Lindroth intially reported the regression in
https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Maybe Tony and/or the x86 maintainers can ack this backport?
---
arch/x86/mm/init.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6215dfa23578 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
}
}
-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GRACEMONT, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, 0),
{}
};
--
2.34.1
Hi,
Upstream commit 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") introduced a flags member to struct x86_cpu_id. Bit 0
of x86_cpu.id.flags must be set to 1 for x86_match_cpu() to work correctly.
This upstream commit has been backported to 6.1.y.
Callers that use the X86_MATCH_*() family of macros to compose the argument
of x86_match_cpu() function correctly. Callers that use their own custom
mechanisms may not work if they fail to set x86_cpu_id.flags correctly.
There are two remaining callers in 6.1.y that use their own mechanisms:
setup_pcid() and rapl_msr_probe(). The former caused a regression that
Thomas Lindroth reported in [1]. The latter works by luck but it still
populates its x86_cpu_id[] array incorrectly.
I backported two patches that fix these bugs in mainline. Hopefully the
authors and/or maintainers can ack the backports?
I tested these patches in Alder Lake and Meteor Lake systems as the two
involved callers behave differently on these two systems. I wrote the
testing details in each patch separately.
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Sumeet Pawnikar (1):
powercap: RAPL: fix invalid initialization for pl4_supported field
Tony Luck (1):
x86/mm: Switch to new Intel CPU model defines
arch/x86/mm/init.c | 16 ++++++----------
drivers/powercap/intel_rapl_msr.c | 12 ++++++------
2 files changed, 12 insertions(+), 16 deletions(-)
--
2.34.1
The `LockedBy::access` method only requires a shared reference to the
owner, so if we have shared access to the `LockedBy` from several
threads at once, then two threads could call `access` in parallel and
both obtain a shared reference to the inner value. Thus, require that
`T: Sync` when calling the `access` method.
An alternative is to require `T: Sync` in the `impl Sync for LockedBy`.
This patch does not choose that approach as it gives up the ability to
use `LockedBy` with `!Sync` types, which is okay as long as you only use
`access_mut`.
Cc: stable(a)vger.kernel.org
Fixes: 7b1f55e3a984 ("rust: sync: introduce `LockedBy`")
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Changes in v2:
- Use a `where T: Sync` on `access` instead of changing `impl Sync for
LockedBy`.
- Link to v1: https://lore.kernel.org/r/20240912-locked-by-sync-fix-v1-1-26433cbccbd2@goo…
---
rust/kernel/sync/locked_by.rs | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/sync/locked_by.rs b/rust/kernel/sync/locked_by.rs
index babc731bd5f6..ce2ee8d87865 100644
--- a/rust/kernel/sync/locked_by.rs
+++ b/rust/kernel/sync/locked_by.rs
@@ -83,8 +83,12 @@ pub struct LockedBy<T: ?Sized, U: ?Sized> {
// SAFETY: `LockedBy` can be transferred across thread boundaries iff the data it protects can.
unsafe impl<T: ?Sized + Send, U: ?Sized> Send for LockedBy<T, U> {}
-// SAFETY: `LockedBy` serialises the interior mutability it provides, so it is `Sync` as long as the
-// data it protects is `Send`.
+// SAFETY: If `T` is not `Sync`, then parallel shared access to this `LockedBy` allows you to use
+// `access_mut` to hand out `&mut T` on one thread at the time. The requirement that `T: Send` is
+// sufficient to allow that.
+//
+// If `T` is `Sync`, then the `access` method also becomes available, which allows you to obtain
+// several `&T` from several threads at once. However, this is okay as `T` is `Sync`.
unsafe impl<T: ?Sized + Send, U: ?Sized> Sync for LockedBy<T, U> {}
impl<T, U> LockedBy<T, U> {
@@ -118,7 +122,10 @@ impl<T: ?Sized, U> LockedBy<T, U> {
///
/// Panics if `owner` is different from the data protected by the lock used in
/// [`new`](LockedBy::new).
- pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
+ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T
+ where
+ T: Sync,
+ {
build_assert!(
size_of::<U>() > 0,
"`U` cannot be a ZST because `owner` wouldn't be unique"
@@ -127,7 +134,10 @@ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
panic!("mismatched owners");
}
- // SAFETY: `owner` is evidence that the owner is locked.
+ // SAFETY: `owner` is evidence that there are only shared references to the owner for the
+ // duration of 'a, so it's not possible to use `Self::access_mut` to obtain a mutable
+ // reference to the inner value that aliases with this shared reference. The type is `Sync`
+ // so there are no other requirements.
unsafe { &*self.data.get() }
}
---
base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2
change-id: 20240912-locked-by-sync-fix-07193df52f98
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
The quilt patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been removed from the -mm tree. Its filename was
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
The quilt patch titled
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
has been removed from the -mm tree. Its filename was
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
Date: Mon, 23 Sep 2024 10:38:36 +0500
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included. It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323. Fix the header to asm/unistd.h.
Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pagemap_ioctl.c~kselftests-mm-fix-wrong-__nr_userfaultfd-value
+++ a/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
The quilt patch titled
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
has been removed from the -mm tree. Its filename was
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
Date: Tue, 24 Sep 2024 14:27:10 +0800
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/compiler.h~compilerh-specify-correct-attribute-for-rodatac_jump_table
+++ a/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".rodata..c_jump_table,\"a\",@progbits #")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
The quilt patch titled
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
has been removed from the -mm tree. Its filename was
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
Date: Tue, 24 Sep 2024 09:32:57 +0000
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.c…
Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/extent_map.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c~ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode
+++ a/fs/ocfs2/extent_map.c
@@ -973,7 +973,13 @@ int ocfs2_read_virt_blocks(struct inode
}
while (done < nr) {
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ rc = -EAGAIN;
+ mlog(ML_ERROR,
+ "Inode #%llu ip_alloc_sem is temporarily unavailable\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno);
+ break;
+ }
rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
&p_block, &p_count, NULL);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
_
Patches currently in -mm which might be from pvmohammedanees2003(a)gmail.com are
ocfs2-fix-typo-in-comment.patch
The quilt patch titled
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
has been removed from the -mm tree. Its filename was
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
Date: Wed, 18 Sep 2024 06:38:44 +0000
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@o…
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/ocfs2/refcounttree.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4148,8 +4149,9 @@ static int __ocfs2_reflink(struct dentry
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4175,6 +4177,26 @@ static int __ocfs2_reflink(struct dentry
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4182,7 +4204,7 @@ static int __ocfs2_reflink(struct dentry
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
--- a/fs/ocfs2/xattr.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(st
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
_
Patches currently in -mm which might be from gautham.ananthakrishna(a)oracle.com are
The quilt patch titled
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
has been removed from the -mm tree. Its filename was
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
Date: Tue, 24 Sep 2024 22:00:53 +0900
I found a report from syzbot [1]
This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.
Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.
[1]
==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
page_cache_delete mm/filemap.c:153 [inline]
__filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
truncate_inode_folio+0x42/0x50 mm/truncate.c:178
shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
shmem_truncate_range mm/shmem.c:1144 [inline]
shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
evict+0x2f0/0x580 fs/inode.c:731
iput_final fs/inode.c:1883 [inline]
iput+0x42a/0x5b0 fs/inode.c:1909
dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
__dentry_kill+0x18b/0x4c0 fs/dcache.c:615
dput+0x5c/0xd0 fs/dcache.c:857
__fput+0x3fb/0x6d0 fs/file_table.c:439
____fput+0x1c/0x30 fs/file_table.c:459
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
__folio_test_movable include/linux/page-flags.h:699 [inline]
migrate_folio_unmap mm/migrate.c:1199 [inline]
migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
migrate_pages_sync mm/migrate.c:1963 [inline]
migrate_pages+0xff1/0x1820 mm/migrate.c:2072
do_mbind mm/mempolicy.c:1390 [inline]
kernel_mbind mm/mempolicy.c:1533 [inline]
__do_sys_mbind mm/mempolicy.c:1607 [inline]
__se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
__x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0xffff888127601078 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-annotate-data-race-in-migrate_folio_unmap
+++ a/mm/migrate.c
@@ -1196,7 +1196,7 @@ static int migrate_folio_unmap(new_folio
int rc = -EAGAIN;
int old_page_state = 0;
struct anon_vma *anon_vma = NULL;
- bool is_lru = !__folio_test_movable(src);
+ bool is_lru = data_race(!__folio_test_movable(src));
bool locked = false;
bool dst_locked = false;
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
mm-percpu-fix-typo-to-pcpu_alloc_noprof-description.patch
mm-shmem-fix-data-race-in-shmem_getattr.patch
The quilt patch titled
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
has been removed from the -mm tree. Its filename was
mm-hugetlb-simplify-refs-in-memfd_alloc_folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
Date: Wed, 4 Sep 2024 12:41:08 -0700
The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
delete the matching folio_put in memfd_pin_folios. This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails. That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio.
Delete the latter.
This is a continuation of the fix
"mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
[steven.sistare(a)oracle.com: remove explicit call to free_huge_folio(), per Matthew]
Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@…
Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +---
mm/memfd.c | 3 +--
2 files changed, 2 insertions(+), 5 deletions(-)
--- a/mm/gup.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h = NULL;
+ struct hstate *h;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,8 +3659,6 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
- if (h)
- folio_put(folio);
folio = NULL;
}
--- a/mm/memfd.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/memfd.c
@@ -89,13 +89,12 @@ struct folio *memfd_alloc_folio(struct f
numa_node_id(),
NULL,
gfp_mask);
- if (folio && folio_try_get(folio)) {
+ if (folio) {
err = hugetlb_add_to_page_cache(folio,
memfd->f_mapping,
idx);
if (err) {
folio_put(folio);
- free_huge_folio(folio);
return ERR_PTR(err);
}
folio_unlock(folio);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios alloc race panic
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-alloc-race-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios alloc race panic
Date: Tue, 3 Sep 2024 07:25:21 -0700
If memfd_pin_folios tries to create a hugetlb page, but someone else
already did, then folio gets the value -EEXIST here:
folio = memfd_alloc_folio(memfd, start_idx);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
then on the next trip through the "while start_idx" loop we panic here:
if (folio) {
folio_put(folio);
To fix, set the folio to NULL on error.
Link: https://lkml.kernel.org/r/1725373521-451395-6-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/gup.c~mm-gup-fix-memfd_pin_folios-alloc-race-panic
+++ a/mm/gup.c
@@ -3702,6 +3702,7 @@ long memfd_pin_folios(struct file *memfd
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
+ folio = NULL;
}
}
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
Date: Tue, 3 Sep 2024 07:25:20 -0700
When memfd_pin_folios -> memfd_alloc_folio creates a hugetlb page, the
index is wrong. The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever. To fix,
adjust the index for the huge_page_order.
memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.
Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/memfd.c~mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation
+++ a/mm/memfd.c
@@ -79,10 +79,13 @@ struct folio *memfd_alloc_folio(struct f
* alloc from. Also, the folio will be pinned for an indefinite
* amount of time, so it is not expected to be migrated away.
*/
- gfp_mask = htlb_alloc_mask(hstate_file(memfd));
+ struct hstate *h = hstate_file(memfd);
+
+ gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
+ idx >>= huge_page_order(h);
- folio = alloc_hugetlb_folio_reserve(hstate_file(memfd),
+ folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
gfp_mask);
@@ -95,6 +98,7 @@ struct folio *memfd_alloc_folio(struct f
free_huge_folio(folio);
return ERR_PTR(err);
}
+ folio_unlock(folio);
return folio;
}
return ERR_PTR(-ENOMEM);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
Date: Tue, 3 Sep 2024 07:25:18 -0700
memfd_pin_folios followed by unpin_folios fails to restore free_huge_pages
if the pages were not already faulted in, because the folio refcount for
pages created by memfd_alloc_folio never goes to 0. memfd_pin_folios
needs another folio_put to undo the folio_try_get below:
memfd_alloc_folio()
alloc_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
folio_ref_unfreeze(folio, 1); ; adds 1 refcount
folio_try_get() ; adds 1 refcount
hugetlb_add_to_page_cache() ; adds 512 refcount (on x86)
With the fix, after memfd_pin_folios + unpin_folios, the refcount for the
(unfaulted) page is 512, which is correct, as the refcount for a faulted
unpinned page is 513.
Link: https://lkml.kernel.org/r/1725373521-451395-3-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h;
+ struct hstate *h = NULL;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,6 +3659,8 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
+ if (h)
+ folio_put(folio);
folio = NULL;
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
has been removed from the -mm tree. Its filename was
mm-filemap-fix-filemap_get_folios_contig-thp-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
Date: Tue, 3 Sep 2024 07:25:17 -0700
Patch series "memfd-pin huge page fixes".
Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
pages and THP. The hugetlb bugs only bite when the page is not yet
faulted in when memfd_pin_folios is called. The THP bug bites when the
starting offset passed to memfd_pin_folios is not huge page aligned. See
the commit messages for details.
This patch (of 5):
memfd_pin_folios on memory backed by THP panics if the requested start
offset is not huge page aligned:
BUG: kernel NULL pointer dereference, address: 0000000000000036
RIP: 0010:filemap_get_folios_contig+0xdf/0x290
RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002
The fault occurs here, because xas_load returns a folio with value 2:
filemap_get_folios_contig()
for (folio = xas_load(&xas); folio && xas.xa_index <= end;
folio = xas_next(&xas)) {
...
if (!folio_try_get(folio)) <-- BOOM
"2" is an xarray sibling entry. We get it because memfd_pin_folios does
not round the indices passed to filemap_get_folios_contig to huge page
boundaries for THP, so we load from the middle of a huge page range see a
sibling. (It does round for hugetlbfs, at the is_file_hugepages test).
To fix, if the folio is a sibling, then return the next index as the
starting point for the next call to filemap_get_folios_contig.
Link: https://lkml.kernel.org/r/1725373521-451395-1-git-send-email-steven.sistare…
Link: https://lkml.kernel.org/r/1725373521-451395-2-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/filemap.c~mm-filemap-fix-filemap_get_folios_contig-thp-panic
+++ a/mm/filemap.c
@@ -2196,6 +2196,10 @@ unsigned filemap_get_folios_contig(struc
if (xa_is_value(folio))
goto update_start;
+ /* If we landed in the middle of a THP, continue at its end. */
+ if (xa_is_sibling(folio))
+ goto update_start;
+
if (!folio_try_get(folio))
goto retry;
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
nfs41_init_clientid does not signal a failure condition from
nfs4_proc_exchange_id and nfs4_proc_create_session to a client which may
lead to mount syscall indefinitely blocked in the following stack trace:
nfs_wait_client_init_complete
nfs41_discover_server_trunking
nfs4_discover_server_trunking
nfs4_init_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
and the client stuck in uninitialized state.
In addition to this all subsequent mount calls would also get blocked in
nfs_match_client waiting for the uninitialized client to finish
initialization:
nfs_wait_client_init_complete
nfs_match_client
nfs_get_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
To avoid this situation propagate error condition to the mount thread
and let mount syscall fail properly.
Signed-off-by: Oleksandr Tymoshenko <ovt(a)google.com>
---
fs/nfs/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..54ad3440ad2b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -335,8 +335,8 @@ int nfs41_init_clientid(struct nfs_client *clp, const struct cred *cred)
if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R))
nfs4_state_start_reclaim_reboot(clp);
nfs41_finish_session_reset(clp);
- nfs_mark_client_ready(clp, NFS_CS_READY);
out:
+ nfs_mark_client_ready(clp, status == 0 ? NFS_CS_READY : status);
return status;
}
---
base-commit: ad618736883b8970f66af799e34007475fe33a68
change-id: 20240906-nfs-mount-deadlock-fix-55c14b38e088
Best regards,
--
Oleksandr Tymoshenko <ovt(a)google.com>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: d4fc4d01471528da8a9797a065982e05090e1d81
Gitweb: https://git.kernel.org/tip/d4fc4d01471528da8a9797a065982e05090e1d81
Author: Alexey Gladkov (Intel) <legion(a)kernel.org>
AuthorDate: Fri, 13 Sep 2024 19:05:56 +02:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 26 Sep 2024 09:45:04 -07:00
x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
---
arch/x86/coco/tdx/tdx.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66d..327c45c 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
This series brings fixes and updates to ov08x40 which allows for use of
this sensor on the Qualcomm x1e80100 CRD but also on any other dts based
system.
Firstly there's a fix for the pseudo burst mode code that was added in
8f667d202384 ("media: ov08x40: Reduce start streaming time"). Not every I2C
controller can handle an arbitrary sized write, this is the case on
Qualcomm CAMSS/CCI I2C sensor interfaces which limit the transaction size
and communicate this limit via I2C quirks. A simple fix to optionally break
up the large submitted burst into chunks not exceeding adapter->quirk size
fixes.
Secondly then is addition of a yaml description for the ov08x40 and
extension of the driver to support OF probe and powering on of the power
rails from the driver instead of from ACPI.
Once done the sensor works without further modification on the Qualcomm
x1e80100 CRD.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
media: ov08x40: Fix burst write sequence
media: dt-bindings: Add OmniVision OV08X40
media: ov08x40: Rename ext_clk to xvclk
media: ov08x40: Add OF probe support
.../bindings/media/i2c/ovti,ov08x40.yaml | 130 ++++++++++++++++
drivers/media/i2c/ov08x40.c | 172 ++++++++++++++++++---
2 files changed, 283 insertions(+), 19 deletions(-)
---
base-commit: 2b7275670032a98cba266bd1b8905f755b3e650f
change-id: 20240926-b4-master-24-11-25-ov08x40-c6f477aaa6a4
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 75 ++-----------------------------------------
2 files changed, 11 insertions(+), 76 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..cf1dfeaaf2d8 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,11 +3631,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
- }
+ if (rc->backref_cache.last_trans != trans->transid)
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
ret = find_next_extent(rc, path, &key);
if (ret < 0)
--
2.43.0
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 76 +++----------------------------------------
2 files changed, 13 insertions(+), 75 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..aaa9cac213f1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,10 +3631,11 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
+ if (rc->backref_cache.last_trans == 0) {
+ rc->backref_cache.last_trans = trans->transid;
+ } else if (rc->backref_cache.last_trans != trans->transid) {
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
}
ret = find_next_extent(rc, path, &key);
--
2.43.0
These are all fixes for the frozen notification patch [1], which as of
today hasn't landed in mainline yet. As such, this patchset is rebased
on top of the char-misc-next branch.
[1] https://lore.kernel.org/all/20240709070047.4055369-2-yutingtseng@google.com/
Cc: stable(a)vger.kernel.org
Cc: Yu-Ting Tseng <yutingtseng(a)google.com>
Cc: Alice Ryhl <aliceryhl(a)google.com>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Martijn Coenen <maco(a)google.com>
Cc: Arve Hjønnevåg <arve(a)android.com>
Cc: Viktor Martensson <vmartensson(a)google.com>
Carlos Llamas (4):
binder: fix node UAF in binder_add_freeze_work()
binder: fix OOB in binder_add_freeze_work()
binder: fix freeze UAF in binder_release_work()
binder: fix BINDER_WORK_FROZEN_BINDER debug logs
drivers/android/binder.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--
2.46.0.792.g87dc391469-goog
Please consider adding the following commit to v6.6:
83bdfcbdbe5d ("nvme-pci: qdepth 1 quirk")
This resolves filesystem corruption and random drive drop-outs on machines with O2micro NVME drives, including a number of HP Thinclients.
Alex
Move the "l3_cache" node under the "cpus" node in the dtsi file for Rockchip
RK356x SoCs. There's no need for this cache node to be at the higher level.
Fixes: 8612169a05c5 ("arm64: dts: rockchip: Add cache information to the SoC dtsi for RK356x")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
arch/arm64/boot/dts/rockchip/rk356x.dtsi | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk356x.dtsi b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
index 4690be841a1c..9f7136e5d553 100644
--- a/arch/arm64/boot/dts/rockchip/rk356x.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk356x.dtsi
@@ -113,19 +113,19 @@ cpu3: cpu@300 {
d-cache-sets = <128>;
next-level-cache = <&l3_cache>;
};
- };
- /*
- * There are no private per-core L2 caches, but only the
- * L3 cache that appears to the CPU cores as L2 caches
- */
- l3_cache: l3-cache {
- compatible = "cache";
- cache-level = <2>;
- cache-unified;
- cache-size = <0x80000>;
- cache-line-size = <64>;
- cache-sets = <512>;
+ /*
+ * There are no private per-core L2 caches, but only the
+ * L3 cache that appears to the CPU cores as L2 caches
+ */
+ l3_cache: l3-cache {
+ compatible = "cache";
+ cache-level = <2>;
+ cache-unified;
+ cache-size = <0x80000>;
+ cache-line-size = <64>;
+ cache-sets = <512>;
+ };
};
cpu0_opp_table: opp-table-0 {
From: Willem de Bruijn <willemb(a)google.com>
Detect gso fraglist skbs with corrupted geometry (see below) and
pass these to skb_segment instead of skb_segment_list, as the first
can segment them correctly.
Valid SKB_GSO_FRAGLIST skbs
- consist of two or more segments
- the head_skb holds the protocol headers plus first gso_size
- one or more frag_list skbs hold exactly one segment
- all but the last must be gso_size
Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
modify these skbs, breaking these invariants.
In extreme cases they pull all data into skb linear. For UDP, this
causes a NULL ptr deref in __udpv4_gso_segment_list_csum at
udp_hdr(seg->next)->dest.
Detect invalid geometry due to pull, by checking head_skb size.
Don't just drop, as this may blackhole a destination. Convert to be
able to pass to regular skb_segment.
Link: https://lore.kernel.org/netdev/20240428142913.18666-1-shiming.cheng@mediate…
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Cc: stable(a)vger.kernel.org
---
Tested:
- tools/testing/selftests/net/udpgro_fwd.sh
- kunit gso_test_func converted to calling __udp_gso_segment
- below manual end-to-end test:
(which probably repeats a lot of udpgro_fwd.sh, in hindsight..)
(won't repeat this on any resubmits, given how long it is)
#!/bin/bash
ip netns add test1
ip netns add test2
ip netns add test3
ip link add dev veth0 netns test1 type veth peer name veth0 netns test2
ip link add dev veth1 netns test2 type veth peer name veth1 netns test3
ip netns exec test1 ip link set dev veth0 up
ip netns exec test2 ip link set dev veth0 up
ip netns exec test2 ip link set dev veth1 up
ip netns exec test3 ip link set dev veth1 up
ip netns exec test1 ip addr add 10.0.8.1/24 dev veth0
ip netns exec test2 ip addr add 10.0.8.2/24 dev veth0
ip netns exec test2 ip addr add 10.0.9.2/24 dev veth1
ip netns exec test3 ip addr add 10.0.9.3/24 dev veth1
ip -6 -netns test1 addr add fdaa::1 dev veth0
ip -6 -netns test2 addr add fdaa::2 dev veth0
ip -6 -netns test2 addr add fdbb::2 dev veth1
ip -6 -netns test3 addr add fdbb::3 dev veth1
ip netns exec test2 sysctl -w net.ipv4.ip_forward=1
ip netns exec test2 sysctl -w net.ipv6.conf.all.forwarding=1
ip -netns test1 route add default via 10.0.8.2
ip -netns test3 route add default via 10.0.9.2
ip -6 -netns test1 route add fdaa::2 dev veth0
ip -6 -netns test2 route add fdaa::1 dev veth0
ip -6 -netns test2 route add fdbb::3 dev veth1
ip -6 -netns test3 route add fdbb::2 dev veth1
ip -6 -netns test1 route add default via fdaa::2
ip -6 -netns test3 route add default via fdbb::2
ip netns exec test1 ethtool -K veth0 gso off tx-udp-segmentation off
ip netns exec test2 ethtool -K veth0 gro on rx-gro-list on rx-udp-gro-forwarding on
ip netns exec test2 ethtool -K veth1 gso off tx-udp-segmentation off
ip netns exec test2 tc qdisc add dev veth0 clsact
ip netns exec test2 tc filter add dev veth0 ingress bpf direct-action obj tc_pull.o sec tc
ip netns exec test3 /mnt/shared/udpgso_bench_rx & \
ip netns exec test1 /mnt/shared/udpgso_bench_tx -l 5 -4 -D 10.0.9.3 -s
60000 -S 0 -z
ip netns exec test3 /mnt/shared/udpgso_bench_rx & \
ip netns exec test1 /mnt/shared/udpgso_bench_tx -l 5 -6 -D fdbb::3 -s
60000 -S 0 -z
With trivial BPF program:
$ cat ~/work/tc_pull.c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <linux/types.h>
#include <bpf/bpf_helpers.h>
__attribute__((section("tc")))
int tc_cls_prog(struct __sk_buff *skb) {
bpf_skb_pull_data(skb, skb->len);
return TC_ACT_OK;
}
---
net/ipv4/udp_offload.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index d842303587af..e457fa9143a6 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -296,8 +296,16 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
return NULL;
}
- if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
- return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+ if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) {
+ /* Detect modified geometry and pass these to skb_segment. */
+ if (skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size)
+ return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+
+ /* Setup csum, as fraglist skips this in udp4_gro_receive. */
+ gso_skb->csum_start = skb_transport_header(gso_skb) - gso_skb->head;
+ gso_skb->csum_offset = offsetof(struct udphdr, check);
+ gso_skb->ip_summed = CHECKSUM_PARTIAL;
+ }
skb_pull(gso_skb, sizeof(*uh));
--
2.46.0.792.g87dc391469-goog
On Thu, Sep 26, 2024 at 10:56:24AM +0800, cuigaosheng wrote:
> On 2024/9/25 20:19, Greg KH wrote:
>
> > On Wed, Sep 25, 2024 at 08:07:46PM +0800, Gaosheng Cui wrote:
> > > If a kset with uninitialized kset->kobj.ktype be registered,
> > Does that happen today with any in-kernel code? If so, let's fix those
> > kset instances, right?
>
> I didn't find this kset instance in kernel code,itwas discovered through code review.
Great, then it is not a real issue :)
> > > kset_register() will return error, and the kset.kobj.name allocated
> > > by kobject_set_name() will be leaked.
> > >
> > > To mitigate this, we free the name in kset_register() when an error
> > > is encountered due to uninitialized kset->kobj.ktype.
> > How did you hit this?
>
> I am testing kernel functionality through fault injection, and I discovered
> it whenI was locating the issue fixed by another patch, I haven't found this
> wrong usage in the kernel, but we can construct it by fault injection,
"fault injection" also shows things that are impossible to ever have
happen as well, so our "need" to fix them is usually very low, right?
thanks,
greg k-h
From: Dave Chinner <dchinner(a)redhat.com>
[ Upstream commit f1e1765aad7de7a8b8102044fc6a44684bc36180 ]
If the journal geometry results in a sector or log stripe unit
validation problem, it indicates that we cannot set the log up to
safely write to the the journal. In these cases, we must abort the
mount because the corruption needs external intervention to resolve.
Similarly, a journal that is too large cannot be written to safely,
either, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations
overruning the available log space and the system hanging waiting
for space it can never provide. This is purely a runtime hang issue,
not a corruption issue as per the first cases listed above. We abort
mounts of the log is too small for V5 filesystems, but we must allow
v4 filesystems to mount because, historically, there was no log size
validity checking and so some systems may still be out there with
undersized logs.
The problem is that on V4 filesystems, when we discover a log
geometry problem, we skip all the remaining checks and then allow
the log to continue mounting. This mean that if one of the log size
checks fails, we skip the log stripe unit check. i.e. we allow the
mount because a "non-fatal" geometry is violated, and then fail to
check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a
new check for the two log sector size geometry variables having the
same values. This will prevent any attempt to mount a log that has
invalid or inconsistent geometries long before we attempt to mount
the log.
However, for the minimum log size checks, we can only do that once
we've setup up the log and calculated all the iclog sizes and
roundoffs. Hence this needs to remain in the log mount code after
the log has been initialised. It is also the only case where we
should allow a v4 filesystem to continue running, so leave that
handling in place, too.
Signed-off-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
---
Notes:
A new fix for the latest 6.1.y backport series just came in. Ran
some tests on it as well and all looks good. Please include with
the original set.
Thanks,
Leah
fs/xfs/libxfs/xfs_sb.c | 56 +++++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_log.c | 47 +++++++++++------------------------
2 files changed, 70 insertions(+), 33 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index bf2cca78304e..c24a38272cb7 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -413,7 +413,6 @@ xfs_validate_sb_common(
sbp->sb_inodelog < XFS_DINODE_MIN_LOG ||
sbp->sb_inodelog > XFS_DINODE_MAX_LOG ||
sbp->sb_inodesize != (1 << sbp->sb_inodelog) ||
- sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE ||
sbp->sb_inopblock != howmany(sbp->sb_blocksize,sbp->sb_inodesize) ||
XFS_FSB_TO_B(mp, sbp->sb_agblocks) < XFS_MIN_AG_BYTES ||
XFS_FSB_TO_B(mp, sbp->sb_agblocks) > XFS_MAX_AG_BYTES ||
@@ -431,6 +430,61 @@ xfs_validate_sb_common(
return -EFSCORRUPTED;
}
+ /*
+ * Logs that are too large are not supported at all. Reject them
+ * outright. Logs that are too small are tolerated on v4 filesystems,
+ * but we can only check that when mounting the log. Hence we skip
+ * those checks here.
+ */
+ if (sbp->sb_logblocks > XFS_MAX_LOG_BLOCKS) {
+ xfs_notice(mp,
+ "Log size 0x%x blocks too large, maximum size is 0x%llx blocks",
+ sbp->sb_logblocks, XFS_MAX_LOG_BLOCKS);
+ return -EFSCORRUPTED;
+ }
+
+ if (XFS_FSB_TO_B(mp, sbp->sb_logblocks) > XFS_MAX_LOG_BYTES) {
+ xfs_warn(mp,
+ "log size 0x%llx bytes too large, maximum size is 0x%llx bytes",
+ XFS_FSB_TO_B(mp, sbp->sb_logblocks),
+ XFS_MAX_LOG_BYTES);
+ return -EFSCORRUPTED;
+ }
+
+ /*
+ * Do not allow filesystems with corrupted log sector or stripe units to
+ * be mounted. We cannot safely size the iclogs or write to the log if
+ * the log stripe unit is not valid.
+ */
+ if (sbp->sb_versionnum & XFS_SB_VERSION_SECTORBIT) {
+ if (sbp->sb_logsectsize != (1U << sbp->sb_logsectlog)) {
+ xfs_notice(mp,
+ "log sector size in bytes/log2 (0x%x/0x%x) must match",
+ sbp->sb_logsectsize, 1U << sbp->sb_logsectlog);
+ return -EFSCORRUPTED;
+ }
+ } else if (sbp->sb_logsectsize || sbp->sb_logsectlog) {
+ xfs_notice(mp,
+ "log sector size in bytes/log2 (0x%x/0x%x) are not zero",
+ sbp->sb_logsectsize, sbp->sb_logsectlog);
+ return -EFSCORRUPTED;
+ }
+
+ if (sbp->sb_logsunit > 1) {
+ if (sbp->sb_logsunit % sbp->sb_blocksize) {
+ xfs_notice(mp,
+ "log stripe unit 0x%x bytes must be a multiple of block size",
+ sbp->sb_logsunit);
+ return -EFSCORRUPTED;
+ }
+ if (sbp->sb_logsunit > XLOG_MAX_RECORD_BSIZE) {
+ xfs_notice(mp,
+ "log stripe unit 0x%x bytes over maximum size (0x%x bytes)",
+ sbp->sb_logsunit, XLOG_MAX_RECORD_BSIZE);
+ return -EFSCORRUPTED;
+ }
+ }
+
/* Validate the realtime geometry; stolen from xfs_repair */
if (sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE ||
sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) {
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index d9aa5eab02c3..59c982297503 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -639,7 +639,6 @@ xfs_log_mount(
int num_bblks)
{
struct xlog *log;
- bool fatal = xfs_has_crc(mp);
int error = 0;
int min_logfsbs;
@@ -661,53 +660,37 @@ xfs_log_mount(
mp->m_log = log;
/*
- * Validate the given log space and drop a critical message via syslog
- * if the log size is too small that would lead to some unexpected
- * situations in transaction log space reservation stage.
+ * Now that we have set up the log and it's internal geometry
+ * parameters, we can validate the given log space and drop a critical
+ * message via syslog if the log size is too small. A log that is too
+ * small can lead to unexpected situations in transaction log space
+ * reservation stage. The superblock verifier has already validated all
+ * the other log geometry constraints, so we don't have to check those
+ * here.
*
- * Note: we can't just reject the mount if the validation fails. This
- * would mean that people would have to downgrade their kernel just to
- * remedy the situation as there is no way to grow the log (short of
- * black magic surgery with xfs_db).
+ * Note: For v4 filesystems, we can't just reject the mount if the
+ * validation fails. This would mean that people would have to
+ * downgrade their kernel just to remedy the situation as there is no
+ * way to grow the log (short of black magic surgery with xfs_db).
*
- * We can, however, reject mounts for CRC format filesystems, as the
+ * We can, however, reject mounts for V5 format filesystems, as the
* mkfs binary being used to make the filesystem should never create a
* filesystem with a log that is too small.
*/
min_logfsbs = xfs_log_calc_minimum_size(mp);
-
if (mp->m_sb.sb_logblocks < min_logfsbs) {
xfs_warn(mp,
"Log size %d blocks too small, minimum size is %d blocks",
mp->m_sb.sb_logblocks, min_logfsbs);
- error = -EINVAL;
- } else if (mp->m_sb.sb_logblocks > XFS_MAX_LOG_BLOCKS) {
- xfs_warn(mp,
- "Log size %d blocks too large, maximum size is %lld blocks",
- mp->m_sb.sb_logblocks, XFS_MAX_LOG_BLOCKS);
- error = -EINVAL;
- } else if (XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks) > XFS_MAX_LOG_BYTES) {
- xfs_warn(mp,
- "log size %lld bytes too large, maximum size is %lld bytes",
- XFS_FSB_TO_B(mp, mp->m_sb.sb_logblocks),
- XFS_MAX_LOG_BYTES);
- error = -EINVAL;
- } else if (mp->m_sb.sb_logsunit > 1 &&
- mp->m_sb.sb_logsunit % mp->m_sb.sb_blocksize) {
- xfs_warn(mp,
- "log stripe unit %u bytes must be a multiple of block size",
- mp->m_sb.sb_logsunit);
- error = -EINVAL;
- fatal = true;
- }
- if (error) {
+
/*
* Log check errors are always fatal on v5; or whenever bad
* metadata leads to a crash.
*/
- if (fatal) {
+ if (xfs_has_crc(mp)) {
xfs_crit(mp, "AAIEEE! Log failed size checks. Abort!");
ASSERT(0);
+ error = -EINVAL;
goto out_free_log;
}
xfs_crit(mp, "Log size out of supported range.");
--
2.46.0.792.g87dc391469-goog
On Wed, 2024-09-25 at 03:30 +0000, Ping-Ke Shih wrote:
> I think the cause is picking commit 268f84a82753
> ("wifi: cfg80211: check wiphy mutex is held for wdev mutex"), and
> cfg80211_is_all_idle() called by disconnect_work() uses wdev_lock()
> but not wiphy_lock().
Yeah seems like a stable only problem (CC them), this is with kernel
version 6.6.51-00141-ga1649b6f8ed6 according to the warning.
> I'm not sure if we should simply revert the picked commit 268f84a82753
> or should pick more commits.
I don't see why it was picked up in the first place, so I guess I'd say
remove it. We won't want to redo the locking on a stable kernel, I'd
think.
> By the way, I think the latest kernel will not throw these messages.
Agree, that seems unlikely.
johannes
I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and noticed that
the dmesg line "Incomplete global flushes, disabling PCID" had disappeared from
the log.
That message comes from commit c26b9e193172f48cd0ccc64285337106fb8aa804, which
disables PCID support on some broken hardware in arch/x86/mm/init.c:
#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
.family = 6, \
.model = _model, \
}
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
{}
...
if (x86_match_cpu(invlpg_miss_ids)) {
pr_info("Incomplete global flushes, disabling PCID");
setup_clear_cpu_cap(X86_FEATURE_PCID);
return;
}
arch/x86/mm/init.c, which has that code, hasn't changed in 6.1.94 -> 6.1.99.
However I found a commit changing how x86_match_cpu() behaves in 6.1.96:
commit 8ab1361b2eae44077fef4adea16228d44ffb860c
Author: Tony Luck <tony.luck(a)intel.com>
Date: Mon May 20 15:45:33 2024 -0700
x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
I suspect this broke the PCID disabling code in arch/x86/mm/init.c.
The commit message says:
"Add a new flags field to struct x86_cpu_id that has a bit set to indicate that
this entry in the array is valid. Update X86_MATCH*() macros to set that bit.
Change the end-marker check in x86_match_cpu() to just check the flags field
for this bit."
But the PCID disabling code in 6.1.99 does not make use of the
X86_MATCH*() macros; instead, it defines a new INTEL_MATCH() macro without the
X86_CPU_ID_FLAG_ENTRY_VALID flag.
I looked in upstream git and found an existing fix:
commit 2eda374e883ad297bd9fe575a16c1dc850346075
Author: Tony Luck <tony.luck(a)intel.com>
Date: Wed Apr 24 11:15:18 2024 -0700
x86/mm: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6b43b6480354 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
}
}
-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_VFM(INTEL_ALDERLAKE, 0),
+ X86_MATCH_VFM(INTEL_ALDERLAKE_L, 0),
+ X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE_P, 0),
+ X86_MATCH_VFM(INTEL_RAPTORLAKE_S, 0),
{}
};
The fix removed the custom INTEL_MATCH macro and uses the X86_MATCH*() macros
with X86_CPU_ID_FLAG_ENTRY_VALID. This fixed commit was never backported to 6.1,
so it looks like a stable series regression due to a missing backport.
If I apply the fix patch on 6.1.99, the PCID disabling code activates again.
I had to change all the INTEL_* definitions to the old definitions to make it
build:
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE, 0),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_L, 0),
+ X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_N, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_P, 0),
+ X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_S, 0),
{}
};
I only looked at the code in arch/x86/mm/init.c, so there may be other uses of
x86_match_cpu() in the kernel that are also broken in 6.1.99.
This email is meant as a bug report, not a pull request. Someone else should
confirm the problem and submit the appropriate fix.
From: Masami Hiramatsu <mhiramat(a)kernel.org>
commit 5b06eeae52c02dd0d9bc8488275a1207d410870b upstream.
Since commit 5821ba969511 ("selftests: Add test plan API to kselftest.h
and adjust callers") accidentally introduced 'a' typo in the front of
run_test() function, breakpoint_test_arm64.c became not able to be
compiled.
Remove the 'a' from arun_test().
Fixes: 5821ba969511 ("selftests: Add test plan API to kselftest.h and adjust callers")
Reported-by: Jun Takahashi <takahashi.jun_s(a)aa.socionext.com>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
[Samasth: bp to 5.4.y]
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com>
---
tools/testing/selftests/breakpoints/breakpoint_test_arm64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
index 58ed5eeab7094..ad41ea69001bc 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -109,7 +109,7 @@ static bool set_watchpoint(pid_t pid, int size, int wp)
return false;
}
-static bool arun_test(int wr_size, int wp_size, int wr, int wp)
+static bool run_test(int wr_size, int wp_size, int wr, int wp)
{
int status;
siginfo_t siginfo;
--
2.45.2
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
v2:
- Rebase
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 31fe31db3fdc..ce9dca4d4e87 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1765,10 +1765,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(vm))
return PTR_ERR(vm);
- err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
- if (err)
- goto err_close_and_put;
-
if (xe->info.has_asid) {
down_write(&xe->usm.lock);
err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
@@ -1776,12 +1772,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
&xe->usm.next_asid, GFP_KERNEL);
up_write(&xe->usm.lock);
if (err < 0)
- goto err_free_id;
+ goto err_close_and_put;
vm->usm.asid = asid;
}
- args->vm_id = id;
vm->xef = xe_file_get(xef);
/* Record BO memory for VM pagetable created against client */
@@ -1794,10 +1789,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
args->reserved[0] = xe_bo_main_addr(vm->pt_root[0]->bo, XE_PAGE_SIZE);
#endif
+ /* user id alloc must always be last in ioctl to prevent UAF */
+ err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
+ if (err)
+ goto err_close_and_put;
+
+ args->vm_id = id;
+
return 0;
-err_free_id:
- xa_erase(&xef->vm.xa, id);
err_close_and_put:
xe_vm_close_and_put(vm);
--
2.46.1
From: Sarannya S <quic_sarannya(a)quicinc.com>
On remote susbsystem restart rproc will stop glink subdev which will
trigger qcom_glink_native_remove, any ongoing intent wait should be
aborted from there otherwise this wait delays glink send which potentially
delays glink channel removal as well. This further introduces delay in ssr
notification to other remote subsystems from rproc.
Currently qcom_glink_native_remove is not setting channel->intent_received,
so any ongoing intent wait is not aborted on remote susbsystem restart.
abort_tx flag can be used as a condition to abort in such cases.
Adding abort_tx flag check in intent wait, to abort intent wait from
qcom_glink_native_remove.
Fixes: c05dfce0b89e ("rpmsg: glink: Wait for intent, not just request ack")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sarannya S <quic_sarannya(a)quicinc.com>
Signed-off-by: Deepak Kumar Singh <quic_deesin(a)quicinc.com>
---
drivers/rpmsg/qcom_glink_native.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index 82d460ff4777..ff828531c36f 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -438,7 +438,6 @@ static void qcom_glink_handle_intent_req_ack(struct qcom_glink *glink,
static void qcom_glink_intent_req_abort(struct glink_channel *channel)
{
- WRITE_ONCE(channel->intent_req_result, 0);
wake_up_all(&channel->intent_req_wq);
}
@@ -1354,8 +1353,9 @@ static int qcom_glink_request_intent(struct qcom_glink *glink,
goto unlock;
ret = wait_event_timeout(channel->intent_req_wq,
- READ_ONCE(channel->intent_req_result) >= 0 &&
- READ_ONCE(channel->intent_received),
+ (READ_ONCE(channel->intent_req_result) >= 0 &&
+ READ_ONCE(channel->intent_received)) ||
+ glink->abort_tx,
10 * HZ);
if (!ret) {
dev_err(glink->dev, "intent request timed out\n");
--
2.34.1
Variable 'scale', whose possible value set allows a zero value in a check
at r8169_main.c:2014, is used as a denominator at r8169_main.c:2040 and
r8169_main.c:2042.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 2815b30535a0 ("r8169: merge scale for tx and rx irq coalescing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 45ac8befba29..b97e68cfcfad 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2011,7 +2011,7 @@ static int rtl_set_coalesce(struct net_device *dev,
coal_usec_max = max(ec->rx_coalesce_usecs, ec->tx_coalesce_usecs);
scale = rtl_coalesce_choose_scale(tp, coal_usec_max, &cp01);
- if (scale < 0)
+ if (scale <= 0)
return scale;
/* Accept max_frames=1 we returned in rtl_get_coalesce. Accept it
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
[ Upstream commit b440396387418fe2feaacd41ca16080e7a8bc9ad ]
linereq_set_config() behaves badly when direction is not set.
The configuration validation is borrowed from linereq_create(), where,
to verify the intent of the user, the direction must be set to in order to
effect a change to the electrical configuration of a line. But, when
applied to reconfiguration, that validation does not allow for the unset
direction case, making it possible to clear flags set previously without
specifying the line direction.
Adding to the inconsistency, those changes are not immediately applied by
linereq_set_config(), but will take effect when the line value is next get
or set.
For example, by requesting a configuration with no flags set, an output
line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN
set could have those flags cleared, inverting the sense of the line and
changing the line drive to push-pull on the next line value set.
Skip the reconfiguration of lines for which the direction is not set, and
only reconfigure the lines for which direction is set.
Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL")
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/gpio/gpiolib-cdev.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index c2f9d95d1086..fe0926ce0068 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1186,15 +1186,18 @@ static long linereq_set_config_unlocked(struct linereq *lr,
for (i = 0; i < lr->num_lines; i++) {
desc = lr->lines[i].desc;
flags = gpio_v2_line_config_flags(lc, i);
+ /*
+ * Lines not explicitly reconfigured as input or output
+ * are left unchanged.
+ */
+ if (!(flags & GPIO_V2_LINE_DIRECTION_FLAGS))
+ continue;
+
polarity_change =
(!!test_bit(FLAG_ACTIVE_LOW, &desc->flags) !=
((flags & GPIO_V2_LINE_FLAG_ACTIVE_LOW) != 0));
gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
- /*
- * Lines have to be requested explicitly for input
- * or output, else the line will be treated "as is".
- */
if (flags & GPIO_V2_LINE_FLAG_OUTPUT) {
int val = gpio_v2_line_config_output_value(lc, i);
@@ -1202,7 +1205,7 @@ static long linereq_set_config_unlocked(struct linereq *lr,
ret = gpiod_direction_output(desc, val);
if (ret)
return ret;
- } else if (flags & GPIO_V2_LINE_FLAG_INPUT) {
+ } else {
ret = gpiod_direction_input(desc);
if (ret)
return ret;
--
2.39.5
[ Upstream commit b440396387418fe2feaacd41ca16080e7a8bc9ad ]
linereq_set_config() behaves badly when direction is not set.
The configuration validation is borrowed from linereq_create(), where,
to verify the intent of the user, the direction must be set to in order to
effect a change to the electrical configuration of a line. But, when
applied to reconfiguration, that validation does not allow for the unset
direction case, making it possible to clear flags set previously without
specifying the line direction.
Adding to the inconsistency, those changes are not immediately applied by
linereq_set_config(), but will take effect when the line value is next get
or set.
For example, by requesting a configuration with no flags set, an output
line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN
set could have those flags cleared, inverting the sense of the line and
changing the line drive to push-pull on the next line value set.
Skip the reconfiguration of lines for which the direction is not set, and
only reconfigure the lines for which direction is set.
Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL")
Signed-off-by: Kent Gibson <warthog618(a)gmail.com>
Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/gpio/gpiolib-cdev.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index d526a4c91e82..545998e9f6ad 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -1565,12 +1565,14 @@ static long linereq_set_config_unlocked(struct linereq *lr,
line = &lr->lines[i];
desc = lr->lines[i].desc;
flags = gpio_v2_line_config_flags(lc, i);
- gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
- edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS;
/*
- * Lines have to be requested explicitly for input
- * or output, else the line will be treated "as is".
+ * Lines not explicitly reconfigured as input or output
+ * are left unchanged.
*/
+ if (!(flags & GPIO_V2_LINE_DIRECTION_FLAGS))
+ continue;
+ gpio_v2_line_config_flags_to_desc_flags(flags, &desc->flags);
+ edflags = flags & GPIO_V2_LINE_EDGE_DETECTOR_FLAGS;
if (flags & GPIO_V2_LINE_FLAG_OUTPUT) {
int val = gpio_v2_line_config_output_value(lc, i);
@@ -1578,7 +1580,7 @@ static long linereq_set_config_unlocked(struct linereq *lr,
ret = gpiod_direction_output(desc, val);
if (ret)
return ret;
- } else if (flags & GPIO_V2_LINE_FLAG_INPUT) {
+ } else {
ret = gpiod_direction_input(desc);
if (ret)
return ret;
--
2.39.5
The patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
I forgot to CC stable on this fix.
Chris
On Thu, Sep 5, 2024 at 1:08 AM Chris Li <chrisl(a)kernel.org> wrote:
>
> I found a regression on mm-unstable during my swap stress test,
> using tmpfs to compile linux. The test OOM very soon after
> the make spawns many cc processes.
>
> It bisects down to this change: 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9
> (mm/gup: clear the LRU flag of a page before adding to LRU batch)
>
> Yu Zhao propose the fix: "I think this is one of the potential side
> effects -- Huge mentioned earlier about isolate_lru_folios():"
>
> I test that with it the swap stress test no longer OOM.
>
> Link: https://lore.kernel.org/r/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoX…
> Fixes: 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch")
> Suggested-by: Yu Zhao <yuzhao(a)google.com>
> Suggested-by: Hugh Dickins <hughd(a)google.com>
> Tested-by: Chris Li <chrisl(a)kernel.org>
> Closes: https://lore.kernel.org/all/CAF8kJuNP5iTj2p07QgHSGOJsiUfYpJ2f4R1Q5-3BN9JiD9…
> Signed-off-by: Chris Li <chrisl(a)kernel.org>
> ---
> Changes in v2:
> - Add Closes tag suggested by Yu and Thorsten.
> - Link to v1: https://lore.kernel.org/r/20240904-lru-flag-v1-1-36638d6a524c@kernel.org
> ---
> mm/vmscan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a9b6a8196f95..96abf4a52382 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4323,7 +4323,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
> }
>
> /* ineligible */
> - if (zone > sc->reclaim_idx) {
> + if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
> gen = folio_inc_gen(lruvec, folio, false);
> list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
> return true;
>
> ---
> base-commit: 756ca36d643324d028b325a170e73e392b9590cd
> change-id: 20240904-lru-flag-2af2f955740e
>
> Best regards,
> --
> Chris Li <chrisl(a)kernel.org>
>
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
I don't know that we have any bugs open about this. Or how it would
manifest itself. Memory leak on driver unload? I just spotted this while
reading the code for other reasons.
---
drivers/gpu/drm/i915/display/intel_hdcp.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 2afa92321b08..cad309602617 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1097,7 +1097,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2531,7 +2532,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2548,7 +2550,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
--
2.39.2
When a mailbox message is received, the driver is checking for a non 0
datalen in the controlq descriptor. If it is valid, the payload is
attached to the ctlq message to give to the upper layer. However, the
payload response size given to the upper layer was taken from the buffer
metadata which is _always_ the max buffer size. This meant the API was
returning 4K as the payload size for all messages. This went unnoticed
since the virtchnl exchange response logic was checking for a response
size less than 0 (error), not less than exact size, or not greater than
or equal to the max mailbox buffer size (4K). All of these checks will
pass in the success case since the size provided is always 4K. However,
this breaks anyone that wants to validate the exact response size.
Fetch the actual payload length from the value provided in the
descriptor data_len field (instead of the buffer metadata).
Unfortunately, this means we lose some extra error parsing for variable
sized virtchnl responses such as create vport and get ptypes. However,
the original checks weren't really helping anyways since the size was
_always_ 4K.
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: stable(a)vger.kernel.org # 6.9+
Signed-off-by: Joshua Hay <joshua.a.hay(a)intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 70986e12da28..3c0f97650d72 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -666,7 +666,7 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
if (ctlq_msg->data_len) {
payload = ctlq_msg->ctx.indirect.payload->va;
- payload_size = ctlq_msg->ctx.indirect.payload->size;
+ payload_size = ctlq_msg->data_len;
}
xn->reply_sz = payload_size;
@@ -1295,10 +1295,6 @@ int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
err = reply_sz;
goto free_vport_params;
}
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN) {
- err = -EIO;
- goto free_vport_params;
- }
return 0;
@@ -2602,9 +2598,6 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
if (reply_sz < 0)
return reply_sz;
- if (reply_sz < IDPF_CTLQ_MAX_BUF_LEN)
- return -EIO;
-
ptypes_recvd += le16_to_cpu(ptype_info->num_ptypes);
if (ptypes_recvd > max_ptype)
return -EINVAL;
--
2.39.2
The patch titled
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
Date: Mon, 23 Sep 2024 10:38:36 +0500
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included. It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323. Fix the header to asm/unistd.h.
Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pagemap_ioctl.c~kselftests-mm-fix-wrong-__nr_userfaultfd-value
+++ a/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
The patch titled
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
Date: Tue, 24 Sep 2024 14:27:10 +0800
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/compiler.h~compilerh-specify-correct-attribute-for-rodatac_jump_table
+++ a/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".rodata..c_jump_table,\"a\",@progbits #")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
The patch titled
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
Date: Tue, 24 Sep 2024 09:32:57 +0000
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.c…
Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/extent_map.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c~ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode
+++ a/fs/ocfs2/extent_map.c
@@ -973,7 +973,13 @@ int ocfs2_read_virt_blocks(struct inode
}
while (done < nr) {
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ rc = -EAGAIN;
+ mlog(ML_ERROR,
+ "Inode #%llu ip_alloc_sem is temporarily unavailable\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno);
+ break;
+ }
rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
&p_block, &p_count, NULL);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
_
Patches currently in -mm which might be from pvmohammedanees2003(a)gmail.com are
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
ocfs2-fix-typo-in-comment.patch
The patch titled
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
Date: Wed, 18 Sep 2024 06:38:44 +0000
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case upto 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@o…
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/ocfs2/refcounttree.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4148,8 +4149,9 @@ static int __ocfs2_reflink(struct dentry
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4175,6 +4177,26 @@ static int __ocfs2_reflink(struct dentry
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4182,7 +4204,7 @@ static int __ocfs2_reflink(struct dentry
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
--- a/fs/ocfs2/xattr.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(st
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
_
Patches currently in -mm which might be from gautham.ananthakrishna(a)oracle.com are
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
The patch titled
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
Date: Tue, 24 Sep 2024 22:00:53 +0900
I found a report from syzbot [1]
This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.
Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.
[1]
==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
page_cache_delete mm/filemap.c:153 [inline]
__filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
truncate_inode_folio+0x42/0x50 mm/truncate.c:178
shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
shmem_truncate_range mm/shmem.c:1144 [inline]
shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
evict+0x2f0/0x580 fs/inode.c:731
iput_final fs/inode.c:1883 [inline]
iput+0x42a/0x5b0 fs/inode.c:1909
dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
__dentry_kill+0x18b/0x4c0 fs/dcache.c:615
dput+0x5c/0xd0 fs/dcache.c:857
__fput+0x3fb/0x6d0 fs/file_table.c:439
____fput+0x1c/0x30 fs/file_table.c:459
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
__folio_test_movable include/linux/page-flags.h:699 [inline]
migrate_folio_unmap mm/migrate.c:1199 [inline]
migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
migrate_pages_sync mm/migrate.c:1963 [inline]
migrate_pages+0xff1/0x1820 mm/migrate.c:2072
do_mbind mm/mempolicy.c:1390 [inline]
kernel_mbind mm/mempolicy.c:1533 [inline]
__do_sys_mbind mm/mempolicy.c:1607 [inline]
__se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
__x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0xffff888127601078 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-annotate-data-race-in-migrate_folio_unmap
+++ a/mm/migrate.c
@@ -1196,7 +1196,7 @@ static int migrate_folio_unmap(new_folio
int rc = -EAGAIN;
int old_page_state = 0;
struct anon_vma *anon_vma = NULL;
- bool is_lru = !__folio_test_movable(src);
+ bool is_lru = data_race(!__folio_test_movable(src));
bool locked = false;
bool dst_locked = false;
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
mm-percpu-fix-typo-to-pcpu_alloc_noprof-description.patch
The quilt patch titled
Subject: padata: honor the caller's alignment in case of chunk_size 0
has been removed from the -mm tree. Its filename was
padata-honor-the-callers-alignment-in-case-of-chunk_size-0.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Kamlesh Gurudasani <kamlesh(a)ti.com>
Subject: padata: honor the caller's alignment in case of chunk_size 0
Date: Thu, 22 Aug 2024 02:32:52 +0530
In the case where we are forcing the ps.chunk_size to be at least 1, we
are ignoring the caller's alignment.
Move the forcing of ps.chunk_size to be at least 1 before rounding it up
to caller's alignment, so that caller's alignment is honored.
While at it, use max() to force the ps.chunk_size to be at least 1 to
improve readability.
Link: https://lkml.kernel.org/r/20240822-max-v1-1-cb4bc5b1c101@ti.com
Fixes: 6d45e1c948a8 ("padata: Fix possible divide-by-0 panic in padata_mt_helper()")
Signed-off-by: Kamlesh Gurudasani <kamlesh(a)ti.com>
Acked-by: Waiman Long <longman(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/padata.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
--- a/kernel/padata.c~padata-honor-the-callers-alignment-in-case-of-chunk_size-0
+++ a/kernel/padata.c
@@ -509,21 +509,17 @@ void __init padata_do_multithreaded(stru
/*
* Chunk size is the amount of work a helper does per call to the
- * thread function. Load balance large jobs between threads by
+ * thread function. Load balance large jobs between threads by
* increasing the number of chunks, guarantee at least the minimum
* chunk size from the caller, and honor the caller's alignment.
+ * Ensure chunk_size is at least 1 to prevent divide-by-0
+ * panic in padata_mt_helper().
*/
ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
+ ps.chunk_size = max(ps.chunk_size, 1ul);
ps.chunk_size = roundup(ps.chunk_size, job->align);
- /*
- * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
- * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
- */
- if (!ps.chunk_size)
- ps.chunk_size = 1U;
-
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
_
Patches currently in -mm which might be from kamlesh(a)ti.com are
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.
Additionally fix a similar race: user space converts a normal enclave
page to a TCS page (via SGX_IOC_ENCLAVE_MODIFY_TYPES) on CPU1, and at
the same time, user space performs a memory access on the same page on
CPU2. This fix is not strictly necessary (this particular race would
indicate a bug in a user space application), but it gives a consistent
rule: if an enclave page is under certain operation by the kernel with
the mapping removed, then other threads trying to access that page are
temporarily blocked and should retry.
Fixes: 9849bb27152c1 ("x86/sgx: Support complete page removal")
Fixes: 45d546b8c109d ("x86/sgx: Support modifying SGX page type")
Cc: stable(a)vger.kernel.org
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.h | 3 ++-
arch/x86/kernel/cpu/sgx/ioctl.c | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index b566b8ad5f33..96b11e8fb770 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,7 +22,8 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit indicating that the page is busy (being reclaimed). */
+/* 'desc' bit indicating that the page is busy (being reclaimed, removed or
+ * converted to a TCS page). */
#define SGX_ENCL_PAGE_BUSY BIT(2)
/*
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c6f936440438 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -969,12 +969,22 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE
+ * entry for the enclave page, CPU2 may attempt to load
+ * this page (because the page is still in enclave's
+ * xarray). To prevent CPU2 from loading the page, mark
+ * the page as busy before unlock and unmark after lock
+ * again.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
mutex_lock(&encl->lock);
+ entry->desc &= ~SGX_ENCL_PAGE_BUSY;
sgx_mark_page_reclaimable(entry->epc_page);
}
@@ -1141,7 +1151,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE entry
+ * for the enclave page, CPU2 may attempt to load this page
+ * (because the page is still in enclave's xarray). To prevent
+ * CPU2 from loading the page, mark the page as busy.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.43.0
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since drm_sched_entity_modify_sched() can modify the entities run queue,
lets make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.
Alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().
v2:
* Improve commit message. (Philipp)
* Cache the scheduler pointer directly. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Luben Tuikov <ltuikov89(a)gmail.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Philipp Stanner <pstanner(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.7+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
/* first job wakes up scheduler */
if (first) {
+ struct drm_gpu_scheduler *sched;
+ struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
return;
}
- drm_sched_rq_add_entity(entity->rq, entity);
+ rq = entity->rq;
+ sched = rq->sched;
+
+ drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched);
+ drm_sched_wakeup(sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
--
2.46.0
The sysfb framebuffer handling only operates on graphics devices
that provide the system's firmware framebuffer. If that device is
not known, assume that any graphics device has been initialized by
firmware.
Fixes a problem on i915 where sysfb does not release the firmware
framebuffer after the native graphics driver loaded.
Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah(a)intel.com>
Closes: https://lore.kernel.org/dri-devel/SJ1PR11MB6129EFB8CE63D1EF6D932F94B96F2@SJ…
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12160
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: b49420d6a1ae ("video/aperture: optionally match the device in sysfb_disable()")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: Linux regression tracking (Thorsten Leemhuis) <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org> # v6.11+
---
drivers/firmware/sysfb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c
index 02a07d3d0d40..a3df782fa687 100644
--- a/drivers/firmware/sysfb.c
+++ b/drivers/firmware/sysfb.c
@@ -67,9 +67,11 @@ static bool sysfb_unregister(void)
void sysfb_disable(struct device *dev)
{
struct screen_info *si = &screen_info;
+ struct device *parent;
mutex_lock(&disable_lock);
- if (!dev || dev == sysfb_parent_dev(si)) {
+ parent = sysfb_parent_dev(si);
+ if (!dev || !parent || dev == parent) {
sysfb_unregister();
disabled = true;
}
--
2.46.0
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..1f25b12e5f67 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,11 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
+ if (encoder) {
+ drm_WARN(state->base.dev, num_encoders != 1,
"%d encoders for pipe %c\n",
num_encoders, pipe_name(primary_crtc->pipe));
+ }
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
This patch addresses an open issue of buffer overflow in
f2fs function stat_show(). On the off chance that si->sbi->s_flag
had one of its bits (on the higher end) set to 1, for_each_set_bit()
will loop more than s_flag[] can afford, leading in turn to
erroneous array access.
The issue in question has been fixed in commit 5bb9c111cd98
("f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show()") and
cherry-picked for 6.1 stable branch.
Modified patch can now be cleanly applied to linux-6.1.y. All of
the changes made to the patch in order to adapt it are described
at the end of commit message in [PATCH 6.1 1/1] f2fs: convert to
MAX_SBI_FLAG instead of 32 in stat_show().
Atomicity violation occurs when the vc4_crtc_send_vblank function is
executed simultaneously with modifications to crtc->state or
crtc->state->event. Consider a scenario where both crtc->state and
crtc->state->event are non-null. They can pass the validity check, but at
the same time, crtc->state or crtc->state->event could be set to null. In
this case, the validity check in vc4_crtc_send_vblank might act on the old
crtc->state and crtc->state->event (before locking), allowing invalid
values to pass the validity check, leading to null pointer dereference.
To address this issue, it is recommended to include the validity check of
crtc->state and crtc->state->event within the locking section of the
function. This modification ensures that the values of crtc->state->event
and crtc->state do not change during the validation process, maintaining
their valid conditions.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 68e4a69aec4d ("drm/vc4: crtc: Create vblank reporting function")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/gpu/drm/vc4/vc4_crtc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 8b5a7e5eb146..98885f519827 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -575,10 +575,12 @@ void vc4_crtc_send_vblank(struct drm_crtc *crtc)
struct drm_device *dev = crtc->dev;
unsigned long flags;
- if (!crtc->state || !crtc->state->event)
+ spin_lock_irqsave(&dev->event_lock, flags);
+ if (!crtc->state || !crtc->state->event) {
+ spin_unlock_irqrestore(&dev->event_lock, flags);
return;
+ }
- spin_lock_irqsave(&dev->event_lock, flags);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
crtc->state->event = NULL;
spin_unlock_irqrestore(&dev->event_lock, flags);
--
2.34.1
After having been assigned to a NULL value at rdma.c:1758, pointer 'queue'
is passed as 1st parameter in call to function
'nvmet_rdma_queue_established' at rdma.c:1773, as 1st parameter in call
to function 'nvmet_rdma_queue_disconnect' at rdma.c:1787 and as 2nd
parameter in call to function 'nvmet_rdma_queue_connect_fail' at
rdma.c:1800, where it is dereferenced.
I understand, that driver is confident that the
RDMA_CM_EVENT_CONNECT_REQUEST event will occur first and perform
initialization, but maliciously prepared hardware could send events in
violation of the protocol. Nothing guarantees that the sequence of events
will start with RDMA_CM_EVENT_CONNECT_REQUEST.
Found by Linux Verification Center (linuxtesting.org) with SVACE
Fixes: e1a2ee249b19 ("nvmet-rdma: Fix use after free in nvmet_rdma_cm_handler()")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/nvme/target/rdma.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 1b6264fa5803..becebc95f349 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1770,8 +1770,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
ret = nvmet_rdma_queue_connect(cm_id, event);
break;
case RDMA_CM_EVENT_ESTABLISHED:
- nvmet_rdma_queue_established(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_established(queue);
+ break;
+ }
case RDMA_CM_EVENT_ADDR_CHANGE:
if (!queue) {
struct nvmet_rdma_port *port = cm_id->context;
@@ -1782,8 +1784,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_DISCONNECTED:
case RDMA_CM_EVENT_TIMEWAIT_EXIT:
- nvmet_rdma_queue_disconnect(queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_disconnect(queue);
+ break;
+ }
case RDMA_CM_EVENT_DEVICE_REMOVAL:
ret = nvmet_rdma_device_removal(cm_id, queue);
break;
@@ -1793,8 +1797,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
fallthrough;
case RDMA_CM_EVENT_UNREACHABLE:
case RDMA_CM_EVENT_CONNECT_ERROR:
- nvmet_rdma_queue_connect_fail(cm_id, queue);
- break;
+ if (!queue) {
+ nvmet_rdma_queue_connect_fail(cm_id, queue);
+ break;
+ }
default:
pr_err("received unrecognized RDMA CM event %d\n",
event->event);
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..1f25b12e5f67 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,11 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
+ if (encoder) {
+ drm_WARN(state->base.dev, num_encoders != 1,
"%d encoders for pipe %c\n",
num_encoders, pipe_name(primary_crtc->pipe));
+ }
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
If the video card driver could not find the connector assigned to the
current video controller, or if the hardware status has changed so that
a pre-existing connector is no longer active, none of the state
connectors will meet the assignment criteria for the current crtc video
controller.
In the drm_WARN function, encoder->base.dev is called, so
'&encoder->base.dev' will be dereferenced since encoder will still be
initialized NULL.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e12d6218fda2 ("drm/i915: Reduce bigjoiner special casing")
Cc: stable(a)vger.kernel.org
Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru>
---
drivers/gpu/drm/i915/display/intel_display.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b4ef4d59da1a..a5e24d64f909 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -819,9 +819,10 @@ intel_get_crtc_new_encoder(const struct intel_atomic_state *state,
num_encoders++;
}
- drm_WARN(state->base.dev, num_encoders != 1,
- "%d encoders for pipe %c\n",
- num_encoders, pipe_name(primary_crtc->pipe));
+ if (encoder)
+ drm_WARN(state->base.dev, num_encoders != 1,
+ "%d encoders for pipe %c\n",
+ num_encoders, pipe_name(primary_crtc->pipe));
return encoder;
}
--
2.34.1
Заявление о конфиденциальности
Данное электронное письмо и любые приложения к нему являются конфиденциальными и предназначены исключительно для адресата. Если Вы не являетесь адресатом данного письма, пожалуйста, уведомите немедленно отправителя, не раскрывайте содержание другим лицам, не используйте его в каких-либо целях, не храните и не копируйте информацию любым способом.
In the `mac802154_scan_worker` function, the `scan_req->type` field was
accessed after the RCU read-side critical section was unlocked. According
to RCU usage rules, this is illegal and can lead to unpredictable
behavior, such as accessing memory that has been updated or causing
use-after-free issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the `scan_req->type` value is now stored in a local
variable `scan_req_type` while still within the RCU read-side critical
section. The `scan_req_type` is then used after the RCU lock is released,
ensuring that the type value is safely accessed without violating RCU
rules.
Fixes: e2c3e6f53a7a ("mac802154: Handle active scanning")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiawei Ye <jiawei.ye(a)foxmail.com>
Acked-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changelog:
v1 -> v2:
- Repositioned the enum nl802154_scan_types scan_req_type declaration
between struct cfg802154_scan_request *scan_req and struct
ieee802154_sub_if_data *sdata to comply with the reverse Christmas tree
rule.
v2 -> v3:
- Add Cc stable tag and Acked-by.
---
net/mac802154/scan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c
index 1c0eeaa76560..a6dab3cc3ad8 100644
--- a/net/mac802154/scan.c
+++ b/net/mac802154/scan.c
@@ -176,6 +176,7 @@ void mac802154_scan_worker(struct work_struct *work)
struct ieee802154_local *local =
container_of(work, struct ieee802154_local, scan_work.work);
struct cfg802154_scan_request *scan_req;
+ enum nl802154_scan_types scan_req_type;
struct ieee802154_sub_if_data *sdata;
unsigned int scan_duration = 0;
struct wpan_phy *wpan_phy;
@@ -209,6 +210,7 @@ void mac802154_scan_worker(struct work_struct *work)
}
wpan_phy = scan_req->wpan_phy;
+ scan_req_type = scan_req->type;
scan_req_duration = scan_req->duration;
/* Look for the next valid chan */
@@ -246,7 +248,7 @@ void mac802154_scan_worker(struct work_struct *work)
goto end_scan;
}
- if (scan_req->type == NL802154_SCAN_ACTIVE) {
+ if (scan_req_type == NL802154_SCAN_ACTIVE) {
ret = mac802154_transmit_beacon_req(local, sdata);
if (ret)
dev_err(&sdata->dev->dev,
--
2.34.1
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the
SGX steers into a failure path. The loser EREMOVE's the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Check whether another thread already
allocated the page and if yes, return with VM_FAULT_NOPAGE.
The race can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 36 ++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c0a3c00284c8..2aa7ced0e4a0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -337,6 +337,16 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
return VM_FAULT_SIGBUS;
+ mutex_lock(&encl->lock);
+
+ /*
+ * Multiple threads may try to fault on the same page concurrently.
+ * Re-check if another thread has already done that.
+ */
+ encl_page = xa_load(&encl->page_array, PFN_DOWN(addr));
+ if (encl_page)
+ goto done;
+
/*
* Ignore internal permission checking for dynamically added pages.
* They matter only for data added during the pre-initialization
@@ -345,23 +355,23 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
*/
secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
encl_page = sgx_encl_page_alloc(encl, addr - encl->base, secinfo_flags);
- if (IS_ERR(encl_page))
- return VM_FAULT_OOM;
-
- mutex_lock(&encl->lock);
+ if (IS_ERR(encl_page)) {
+ vmret = VM_FAULT_OOM;
+ goto err_out_unlock;
+ }
epc_page = sgx_encl_load_secs(encl);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
epc_page = sgx_alloc_epc_page(encl_page, false);
if (IS_ERR(epc_page)) {
if (PTR_ERR(epc_page) == -EBUSY)
vmret = VM_FAULT_NOPAGE;
- goto err_out_unlock;
+ goto err_out_encl;
}
va_page = sgx_encl_grow(encl, false);
@@ -376,10 +386,6 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
encl_page, GFP_KERNEL);
- /*
- * If ret == -EBUSY then page was created in another flow while
- * running without encl->lock
- */
if (ret)
goto err_out_shrink;
@@ -389,7 +395,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
if (ret)
- goto err_out;
+ goto err_out_eaug;
encl_page->encl = encl;
encl_page->epc_page = epc_page;
@@ -408,20 +414,20 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
mutex_unlock(&encl->lock);
return VM_FAULT_SIGBUS;
}
+done:
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
-err_out:
+err_out_eaug:
xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
-
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
sgx_encl_free_epc_page(epc_page);
+err_out_encl:
+ kfree(encl_page);
err_out_unlock:
mutex_unlock(&encl->lock);
- kfree(encl_page);
-
return vmret;
}
--
2.43.0
The page reclaimer thread sets SGX_ENC_PAGE_BEING_RECLAIMED flag when
the enclave page is being reclaimed (moved to the backing store). This
flag however has two logical meanings:
1. Don't attempt to load the enclave page (the page is busy), see
__sgx_encl_load_page().
2. Don't attempt to remove the PCMD page corresponding to this enclave
page (the PCMD page is busy), see reclaimer_writing_to_pcmd().
To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed (by the
page reclaimer thread). A future commit will introduce new cases when
the enclave page is being operated on; these new cases will set only the
SGX_ENCL_PAGE_BUSY flag.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Acked-by: Kai Huang <kai.huang(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 16 +++++++---------
arch/x86/kernel/cpu/sgx/encl.h | 10 ++++++++--
arch/x86/kernel/cpu/sgx/main.c | 4 ++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..c0a3c00284c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -46,10 +46,10 @@ static int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_ind
* a check if an enclave page sharing the PCMD page is in the process of being
* reclaimed.
*
- * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
- * intends to reclaim that enclave page - it means that the PCMD page
- * associated with that enclave page is about to get some data and thus
- * even if the PCMD page is empty, it should not be truncated.
+ * The reclaimer sets the SGX_ENCL_PAGE_PCMD_BUSY flag when it intends to
+ * reclaim that enclave page - it means that the PCMD page associated with that
+ * enclave page is about to get some data and thus even if the PCMD page is
+ * empty, it should not be truncated.
*
* Context: Enclave mutex (&sgx_encl->lock) must be held.
* Return: 1 if the reclaimer is about to write to the PCMD page
@@ -77,8 +77,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* Stop when reaching the SECS page - it does not
* have a page_array entry and its reclaim is
* started and completed with enclave mutex held so
- * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
- * flag.
+ * it does not use the SGX_ENCL_PAGE_PCMD_BUSY flag.
*/
if (addr == encl->base + encl->size)
break;
@@ -91,8 +90,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* VA page slot ID uses same bit as the flag so it is important
* to ensure that the page is not already in backing store.
*/
- if (entry->epc_page &&
- (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+ if (entry->epc_page && (entry->desc & SGX_ENCL_PAGE_PCMD_BUSY)) {
reclaimed = 1;
break;
}
@@ -257,7 +255,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & SGX_ENCL_PAGE_BUSY)
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..b566b8ad5f33 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,8 +22,14 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit marking that the page is being reclaimed. */
-#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit indicating that the page is busy (being reclaimed). */
+#define SGX_ENCL_PAGE_BUSY BIT(2)
+
+/*
+ * 'desc' bit indicating that PCMD page associated with the enclave page is
+ * busy (because the enclave page is being reclaimed).
+ */
+#define SGX_ENCL_PAGE_PCMD_BUSY BIT(3)
struct sgx_encl_page {
unsigned long desc;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f3f1461273ee..da59f034b769 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -205,7 +205,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
void *va_slot;
int ret;
- encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc &= ~(SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY);
va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
list);
@@ -341,7 +341,7 @@ static void sgx_reclaim_pages(void)
goto skip;
}
- encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc |= SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY;
mutex_unlock(&encl_page->encl->lock);
continue;
--
2.43.0
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_hqvdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_hqvdp.c b/drivers/gpu/drm/sti/sti_hqvdp.c
index 0fb48ac044d8..abab92df78bd 100644
--- a/drivers/gpu/drm/sti/sti_hqvdp.c
+++ b/drivers/gpu/drm/sti/sti_hqvdp.c
@@ -1037,6 +1037,9 @@ static int sti_hqvdp_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the Fixes tag according to suggestions;
- added necessary blank line.
---
drivers/gpu/drm/sti/sti_cursor.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..c59fcb4dca32 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,9 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dd86dc2f9ae1 ("drm/sti: implement atomic_check for the planes")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_gdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_gdp.c b/drivers/gpu/drm/sti/sti_gdp.c
index 43c72c2604a0..f046f5f7ad25 100644
--- a/drivers/gpu/drm/sti/sti_gdp.c
+++ b/drivers/gpu/drm/sti/sti_gdp.c
@@ -638,6 +638,9 @@ static int sti_gdp_atomic_check(struct drm_plane *drm_plane,
mixer = to_sti_mixer(crtc);
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
From: Rob Clark <robdclark(a)chromium.org>
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609
The whole premise of lockless access to a single-producer-single-
consumer queue is that there is just a single producer and single
consumer. That means we can't call drm_sched_can_queue() (which is
about queueing more work to the hw, not to the spsc queue) from
anywhere other than the consumer (wq).
This call in the producer is just an optimization to avoid scheduling
the consuming worker if it cannot yet queue more work to the hw. It
is safe to drop this optimization to avoid the race condition.
Suggested-by: Asahi Lina <lina(a)asahilina.net>
Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
Closes: https://github.com/AsahiLinux/linux/issues/309
Cc: stable(a)vger.kernel.org
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/gpu/drm/scheduler/sched_main.c | 7 ++-----
include/drm/gpu_scheduler.h | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..567e5ace6d0c 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -380,7 +380,7 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
container_of(cb, struct drm_sched_entity, cb);
drm_sched_entity_clear_dep(f, cb);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(entity->rq->sched);
}
/**
@@ -612,7 +612,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
- drm_sched_wakeup(entity->rq->sched, entity);
+ drm_sched_wakeup(entity->rq->sched);
}
}
EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..6f27cab0b76d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1013,15 +1013,12 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
/**
* drm_sched_wakeup - Wake up the scheduler if it is ready to queue
* @sched: scheduler instance
- * @entity: the scheduler entity
*
* Wake up the scheduler if we can queue jobs.
*/
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
- struct drm_sched_entity *entity)
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
{
- if (drm_sched_can_queue(sched, entity))
- drm_sched_run_job_queue(sched);
+ drm_sched_run_job_queue(sched);
}
/**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fe8edb917360..9c437a057e5d 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -574,7 +574,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched, struct drm_sched_entity *entity);
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
--
2.46.0
The IO yurex_write() needs to wait for in order to have a device
ready for writing again can take a long time time.
Consequently the sleep is done in an interruptible state.
Therefore others waiting for yurex_write() itself to finish should
use mutex_lock_interruptible.
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Fixes: 6bc235a2e24a5 ("USB: add driver for Meywa-Denki & Kayac YUREX")
---
drivers/usb/misc/yurex.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/misc/yurex.c b/drivers/usb/misc/yurex.c
index 4a9859e03f6b..0deffdc8205f 100644
--- a/drivers/usb/misc/yurex.c
+++ b/drivers/usb/misc/yurex.c
@@ -430,7 +430,7 @@ static ssize_t yurex_write(struct file *file, const char __user *user_buffer,
size_t count, loff_t *ppos)
{
struct usb_yurex *dev;
- int i, set = 0, retval = 0;
+ int i, set = 0, retval;
char buffer[16 + 1];
char *data = buffer;
unsigned long long c, c2 = 0;
@@ -444,7 +444,10 @@ static ssize_t yurex_write(struct file *file, const char __user *user_buffer,
if (count == 0)
goto error;
- mutex_lock(&dev->io_mutex);
+ retval = mutex_lock_interruptible(&dev->io_mutex);
+ if (retval < 0)
+ return -EINTR;
+
if (dev->disconnected) { /* already disconnected */
mutex_unlock(&dev->io_mutex);
retval = -ENODEV;
--
2.46.1
The change is similar to that is used in comp_algorithm_set.
default_compressor is used for ZRAM_PRIMARY_COMP only, but if
CONFIG_ZRAM_MULTI_COMP isn't set, then ZRAM_PRIMARY_COMP and
ZRAM_SECONDARY_COMP are the same.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Signed-off-by: Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Cc: <stable(a)vger.kernel.org>
---
Changes in v2:
- removed comment from source code about freeing statically defined compression
- removed part of KASAN report from commit description
- added information about CONFIG_ZRAM_MULTI_COMP into commit description
drivers/block/zram/zram_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d245617083d..4a9cdb7fe8878 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2116,7 +2116,8 @@ static void zram_destroy_comps(struct zram *zram)
}
for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
--
2.45.2
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a3d7cb7cfd22..f7182ef3d8e6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1765,12 +1765,6 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(vm))
return PTR_ERR(vm);
- mutex_lock(&xef->vm.lock);
- err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
- mutex_unlock(&xef->vm.lock);
- if (err)
- goto err_close_and_put;
-
if (xe->info.has_asid) {
down_write(&xe->usm.lock);
err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
@@ -1778,12 +1772,11 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
&xe->usm.next_asid, GFP_KERNEL);
up_write(&xe->usm.lock);
if (err < 0)
- goto err_free_id;
+ goto err_close_and_put;
vm->usm.asid = asid;
}
- args->vm_id = id;
vm->xef = xe_file_get(xef);
/* Record BO memory for VM pagetable created against client */
@@ -1796,12 +1789,17 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
args->reserved[0] = xe_bo_main_addr(vm->pt_root[0]->bo, XE_PAGE_SIZE);
#endif
- return 0;
-
-err_free_id:
+ /* user id alloc must always be last in ioctl to prevent UAF */
mutex_lock(&xef->vm.lock);
- xa_erase(&xef->vm.xa, id);
+ err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
mutex_unlock(&xef->vm.lock);
+ if (err)
+ goto err_close_and_put;
+
+ args->vm_id = id;
+
+ return 0;
+
err_close_and_put:
xe_vm_close_and_put(vm);
--
2.46.1
On Mon, Sep 23, 2024 at 03:41:12PM +0000, John Allen wrote:
> A problem was introduced with f69759b ("x86/CPU/AMD: Move Zenbleed check to
> the Zen2 init function") where a bit in the DE_CFG MSR is getting set after
> a microcode late load.
>
> The problem is that the microcode late load path calls into
> amd_check_microcode and subsequently zen2_zenbleed_check. Since the patch
> removes the cpu_has_amd_erratum check from zen2_zenbleed_check, this will
> cause all non-Zen2 cpus to go through the function and set the bit in the
> DE_CFG MSR.
>
> Call into the zenbleed fix path on Zen2 cpus only.
>
> Fixes: f69759be251d ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Should probably go into stable as well.
Cc: <stable(a)vger.kernel.org>
> Signed-off-by: John Allen <john.allen(a)amd.com>
> ---
> arch/x86/kernel/cpu/amd.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 015971adadfc..368344e1394b 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -1202,5 +1202,6 @@ void amd_check_microcode(void)
> if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> return;
>
> - on_each_cpu(zenbleed_check_cpu, NULL, 1);
> + if (boot_cpu_has(X86_FEATURE_ZEN2))
> + on_each_cpu(zenbleed_check_cpu, NULL, 1);
> }
> --
> 2.34.1
>
devm_kasprintf() can return a NULL pointer on failure but this returned
value is not checked. Fix this lack and check the returned value.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 32c170ff15b0 ("pinctrl: stm32: set default gpio line names using pin names")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- modified the code style according to suggestions.
Changes in v2:
- modified the patch according to suggestions, added braces;
- modified the typo.
---
drivers/pinctrl/stm32/pinctrl-stm32.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/stm32/pinctrl-stm32.c b/drivers/pinctrl/stm32/pinctrl-stm32.c
index a8673739871d..5b7fa77c1184 100644
--- a/drivers/pinctrl/stm32/pinctrl-stm32.c
+++ b/drivers/pinctrl/stm32/pinctrl-stm32.c
@@ -1374,10 +1374,15 @@ static int stm32_gpiolib_register_bank(struct stm32_pinctrl *pctl, struct fwnode
for (i = 0; i < npins; i++) {
stm32_pin = stm32_pctrl_get_desc_pin_from_gpio(pctl, bank, i);
- if (stm32_pin && stm32_pin->pin.name)
+ if (stm32_pin && stm32_pin->pin.name) {
names[i] = devm_kasprintf(dev, GFP_KERNEL, "%s", stm32_pin->pin.name);
- else
+ if (!names[i]) {
+ err = -ENOMEM;
+ goto err_clk;
+ }
+ } else {
names[i] = NULL;
+ }
}
bank->gpio_chip.names = (const char * const *)names;
--
2.25.1
devm_kasprintf() can return a NULL pointer on failure but this returned
value is not checked. Fix this lack and check the returned value.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: a0f160ffcb83 ("pinctrl: add pinctrl/GPIO driver for Apple SoCs")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-apple-gpio.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-apple-gpio.c b/drivers/pinctrl/pinctrl-apple-gpio.c
index 3751c7de37aa..f861e63f4115 100644
--- a/drivers/pinctrl/pinctrl-apple-gpio.c
+++ b/drivers/pinctrl/pinctrl-apple-gpio.c
@@ -474,6 +474,9 @@ static int apple_gpio_pinctrl_probe(struct platform_device *pdev)
for (i = 0; i < npins; i++) {
pins[i].number = i;
pins[i].name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "PIN%u", i);
+ if (!pins[i].name)
+ return -ENOMEM;
+
pins[i].drv_data = pctl;
pin_names[i] = pins[i].name;
pin_nums[i] = i;
--
2.25.1
Hi,
This commit from mainline fixes a crash found in kernel 6.11 if users
change modes with amd-pstate.
commit 49243adc715e ("cpufreq/amd-pstate: Add the missing
cpufreq_cpu_put()")
Please backport to 6.11.y. Earlier kernels should not be affected.
Thanks!
Hi,
With systems that have 8 GPUs nodes + an integrated BMC, the BMC can't
work properly because of limits of DRM nodes in the kernel.
This is fixed in Linus tree with these commits:
Can you please backport to 6.6.y and later?
commit 5fbca8b48b30 ("drm: Use XArray instead of IDR for minors")
commit 45c4d994b82b ("accel: Use XArray instead of IDR for minors")
commit 071d583e01c8 ("drm: Expand max DRM device number to full MINORBITS")
Thanks,
From: Peter Wang <peter.wang(a)mediatek.com>
ufshcd_abort_all froce abort all on-going command and the host
will automatically fill in the OCS field of the corresponding
response with OCS_ABORTED based on different working modes.
MCQ mode: aborts a command using SQ cleanup, The host controller
will post a Completion Queue entry with OCS = ABORTED.
SDB mode: aborts a command using UTRLCLR. Task Management response
which means a Transfer Request was aborted.
For these two cases, set a flag to notify SCSI to requeue the
command after receiving response with OCS_ABORTED.
Fixes: ab248643d3d6 ("scsi: ufs: core: Add error handling for MCQ mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 34 +++++++++++++++++++---------------
include/ufs/ufshcd.h | 3 +++
2 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index a6f818cdef0e..615da47c1727 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -3006,6 +3006,7 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
ufshcd_prepare_lrbp_crypto(scsi_cmd_to_rq(cmd), lrbp);
lrbp->req_abort_skip = false;
+ lrbp->abort_initiated_by_err = false;
ufshcd_comp_scsi_upiu(hba, lrbp);
@@ -5404,7 +5405,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp,
}
break;
case OCS_ABORTED:
- result |= DID_ABORT << 16;
+ if (lrbp->abort_initiated_by_err)
+ result |= DID_REQUEUE << 16;
+ else
+ result |= DID_ABORT << 16;
break;
case OCS_INVALID_COMMAND_STATUS:
result |= DID_REQUEUE << 16;
@@ -6471,26 +6475,12 @@ static bool ufshcd_abort_one(struct request *rq, void *priv)
struct scsi_device *sdev = cmd->device;
struct Scsi_Host *shost = sdev->host;
struct ufs_hba *hba = shost_priv(shost);
- struct ufshcd_lrb *lrbp = &hba->lrb[tag];
- struct ufs_hw_queue *hwq;
- unsigned long flags;
*ret = ufshcd_try_to_abort_task(hba, tag);
dev_err(hba->dev, "Aborting tag %d / CDB %#02x %s\n", tag,
hba->lrb[tag].cmd ? hba->lrb[tag].cmd->cmnd[0] : -1,
*ret ? "failed" : "succeeded");
- /* Release cmd in MCQ mode if abort succeeds */
- if (hba->mcq_enabled && (*ret == 0)) {
- hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
- if (!hwq)
- return 0;
- spin_lock_irqsave(&hwq->cq_lock, flags);
- if (ufshcd_cmd_inflight(lrbp->cmd))
- ufshcd_release_scsi_cmd(hba, lrbp);
- spin_unlock_irqrestore(&hwq->cq_lock, flags);
- }
-
return *ret == 0;
}
@@ -7561,6 +7551,20 @@ int ufshcd_try_to_abort_task(struct ufs_hba *hba, int tag)
goto out;
}
+ /*
+ * When the host software receives a "FUNCTION COMPLETE", set flag
+ * to requeue command after receive response with OCS_ABORTED
+ * SDB mode: UTRLCLR Task Management response which means a Transfer
+ * Request was aborted.
+ * MCQ mode: Host will post to CQ with OCS_ABORTED after SQ cleanup
+ * This flag is set because ufshcd_abort_all forcibly aborts all
+ * commands, and the host will automatically fill in the OCS field
+ * of the corresponding response with OCS_ABORTED.
+ * Therefore, upon receiving this response, it needs to be requeued.
+ */
+ if (!err)
+ lrbp->abort_initiated_by_err = true;
+
err = ufshcd_clear_cmd(hba, tag);
if (err)
dev_err(hba->dev, "%s: Failed clearing cmd at tag %d, err %d\n",
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index 0fd2aebac728..15b357672ca5 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -173,6 +173,8 @@ struct ufs_pm_lvl_states {
* @crypto_key_slot: the key slot to use for inline crypto (-1 if none)
* @data_unit_num: the data unit number for the first block for inline crypto
* @req_abort_skip: skip request abort task flag
+ * @abort_initiated_by_err: The flag is specifically used to handle aborts
+ * caused by errors due to host/device communication
*/
struct ufshcd_lrb {
struct utp_transfer_req_desc *utr_descriptor_ptr;
@@ -202,6 +204,7 @@ struct ufshcd_lrb {
#endif
bool req_abort_skip;
+ bool abort_initiated_by_err;
};
/**
--
2.45.2
Commit 07a4a4fc83dd ("ideapad: add Lenovo IdeaPad Z570 support (part 2)")
added an i8042_command(..., I8042_CMD_AUX_[EN|DIS]ABLE) call to
the ideapad-laptop driver to suppress the touchpad events at the PS/2
AUX controller level.
Commit c69e7d843d2c ("platform/x86: ideapad-laptop: Only toggle ps2 aux
port on/off on select models") limited this to only do this by default
on the IdeaPad Z570 to replace a growing list of models on which
the i8042_command() call was disabled by quirks because it was causing
issues.
A recent report shows that this is causing issues even on the Z570 for
which it was originally added because it can happen on resume before
the i8042 controller's own resume() method has run:
[ 50.241235] ideapad_acpi VPC2004:00: PM: calling acpi_subsys_resume+0x0/0x5d @ 4492, parent: PNP0C09:00
[ 50.242055] snd_hda_intel 0000:00:0e.0: PM: pci_pm_resume+0x0/0xed returned 0 after 13511 usecs
[ 50.242120] snd_hda_codec_realtek hdaudioC0D0: PM: calling hda_codec_pm_resume+0x0/0x19 [snd_hda_codec] @ 4518, parent: 0000:00:0e.0
[ 50.247406] i8042: [49434] a8 -> i8042 (command)
[ 50.247468] ideapad_acpi VPC2004:00: PM: acpi_subsys_resume+0x0/0x5d returned 0 after 6220 usecs
...
[ 50.247883] i8042 kbd 00:01: PM: calling pnp_bus_resume+0x0/0x9d @ 4492, parent: pnp0
[ 50.247894] i8042 kbd 00:01: PM: pnp_bus_resume+0x0/0x9d returned 0 after 0 usecs
[ 50.247906] i8042 aux 00:02: PM: calling pnp_bus_resume+0x0/0x9d @ 4492, parent: pnp0
[ 50.247916] i8042 aux 00:02: PM: pnp_bus_resume+0x0/0x9d returned 0 after 0 usecs
...
[ 50.248301] i8042 i8042: PM: calling platform_pm_resume+0x0/0x41 @ 4492, parent: platform
[ 50.248377] i8042: [49434] 55 <- i8042 (flush, kbd)
[ 50.248407] i8042: [49435] aa -> i8042 (command)
[ 50.248601] i8042: [49435] 00 <- i8042 (return)
[ 50.248604] i8042: [49435] i8042 controller selftest: 0x0 != 0x55
Dmitry (input subsys maintainer) pointed out that just sending
KEY_TOUCHPAD_OFF/KEY_TOUCHPAD_ON which the ideapad-laptop driver
already does should be sufficient and that it then is up to userspace
to filter out touchpad events after having received a KEY_TOUCHPAD_OFF.
Given all the problems the i8042_command() call has been causing just
removing it indeed seems best, so this removes it completely. Note that
this only impacts the Ideapad Z570 since it was already disabled by
default on all other models.
Fixes: c69e7d843d2c ("platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models")
Reported-by: Jonathan Denose <jdenose(a)chromium.org>
Closes: https://lore.kernel.org/linux-input/20231102075243.1.Idb37ff8043a29f607beab…
Suggested-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Cc: Maxim Mikityanskiy <maxtram95(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/ideapad-laptop.c | 37 ---------------------------
1 file changed, 37 deletions(-)
diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
index 1ace711f7442..255fb56ec9ee 100644
--- a/drivers/platform/x86/ideapad-laptop.c
+++ b/drivers/platform/x86/ideapad-laptop.c
@@ -18,7 +18,6 @@
#include <linux/device.h>
#include <linux/dmi.h>
#include <linux/fb.h>
-#include <linux/i8042.h>
#include <linux/init.h>
#include <linux/input.h>
#include <linux/input/sparse-keymap.h>
@@ -144,7 +143,6 @@ struct ideapad_private {
bool hw_rfkill_switch : 1;
bool kbd_bl : 1;
bool touchpad_ctrl_via_ec : 1;
- bool ctrl_ps2_aux_port : 1;
bool usb_charging : 1;
} features;
struct {
@@ -182,12 +180,6 @@ MODULE_PARM_DESC(set_fn_lock_led,
"Enable driver based updates of the fn-lock LED on fn-lock changes. "
"If you need this please report this to: platform-driver-x86(a)vger.kernel.org");
-static bool ctrl_ps2_aux_port;
-module_param(ctrl_ps2_aux_port, bool, 0444);
-MODULE_PARM_DESC(ctrl_ps2_aux_port,
- "Enable driver based PS/2 aux port en-/dis-abling on touchpad on/off toggle. "
- "If you need this please report this to: platform-driver-x86(a)vger.kernel.org");
-
static bool touchpad_ctrl_via_ec;
module_param(touchpad_ctrl_via_ec, bool, 0444);
MODULE_PARM_DESC(touchpad_ctrl_via_ec,
@@ -1560,7 +1552,6 @@ static void ideapad_fn_lock_led_exit(struct ideapad_private *priv)
static void ideapad_sync_touchpad_state(struct ideapad_private *priv, bool send_events)
{
unsigned long value;
- unsigned char param;
int ret;
/* Without reading from EC touchpad LED doesn't switch state */
@@ -1568,15 +1559,6 @@ static void ideapad_sync_touchpad_state(struct ideapad_private *priv, bool send_
if (ret)
return;
- /*
- * Some IdeaPads don't really turn off touchpad - they only
- * switch the LED state. We (de)activate KBC AUX port to turn
- * touchpad off and on. We send KEY_TOUCHPAD_OFF and
- * KEY_TOUCHPAD_ON to not to get out of sync with LED
- */
- if (priv->features.ctrl_ps2_aux_port)
- i8042_command(¶m, value ? I8042_CMD_AUX_ENABLE : I8042_CMD_AUX_DISABLE);
-
/*
* On older models the EC controls the touchpad and toggles it on/off
* itself, in this case we report KEY_TOUCHPAD_ON/_OFF. Some models do
@@ -1699,23 +1681,6 @@ static const struct dmi_system_id hw_rfkill_list[] = {
{}
};
-/*
- * On some models the EC toggles the touchpad muted LED on touchpad toggle
- * hotkey presses, but the EC does not actually disable the touchpad itself.
- * On these models the driver needs to explicitly enable/disable the i8042
- * (PS/2) aux port.
- */
-static const struct dmi_system_id ctrl_ps2_aux_port_list[] = {
- {
- /* Lenovo Ideapad Z570 */
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
- DMI_MATCH(DMI_PRODUCT_VERSION, "Ideapad Z570"),
- },
- },
- {}
-};
-
static void ideapad_check_features(struct ideapad_private *priv)
{
acpi_handle handle = priv->adev->handle;
@@ -1725,8 +1690,6 @@ static void ideapad_check_features(struct ideapad_private *priv)
set_fn_lock_led || dmi_check_system(set_fn_lock_led_list);
priv->features.hw_rfkill_switch =
hw_rfkill_switch || dmi_check_system(hw_rfkill_list);
- priv->features.ctrl_ps2_aux_port =
- ctrl_ps2_aux_port || dmi_check_system(ctrl_ps2_aux_port_list);
priv->features.touchpad_ctrl_via_ec = touchpad_ctrl_via_ec;
if (!read_ec_data(handle, VPCCMD_R_FAN, &val))
--
2.45.2
The following patch addresses the issue of unchecked calls to
dma_map_sg() in safexcel_send_req() as these macros may return 0 in
case of unsuccessful mapping. This outcome in turn requires
unmapping of previously mapped buffers.
The fix has already been backported to the following stable branches:
v6.6: https://lore.kernel.org/all/20240122235813.608624333@linuxfoundation.org/
v6.1: https://lore.kernel.org/all/20240122235752.938797245@linuxfoundation.org/
The issue in question can be fixed in 5.10 and 5.15 stable branches by
backporting the following 2 upstream commits. Both can be cleanly
applied to kernel versions mentioned above.
[PATCH 5.10/5.15 1/2] crypto: inside_secure - Avoid dma map if size is zero
[PATCH 5.10/5.15 2/2] crypto: safexcel - Add error handling for dma_map_sg() calls
First patch is a prerequisite to main fix and removes warnings in case
of a call to dma_map_sg() with size 0 and allows for clean application
of the main fix.
Second (and main) patch adds proper handling of dma_map_sg() erroneous
behaviour.
In the `mac802154_scan_worker` function, the `scan_req->type` field was
accessed after the RCU read-side critical section was unlocked. According
to RCU usage rules, this is illegal and can lead to unpredictable
behavior, such as accessing memory that has been updated or causing
use-after-free issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the `scan_req->type` value is now stored in a local
variable `scan_req_type` while still within the RCU read-side critical
section. The `scan_req_type` is then used after the RCU lock is released,
ensuring that the type value is safely accessed without violating RCU
rules.
Fixes: e2c3e6f53a7a ("mac802154: Handle active scanning")
Signed-off-by: Jiawei Ye <jiawei.ye(a)foxmail.com>
Cc: stable(a)vger.kernel.org
Acked-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changelog:
v1->v2: -Repositioned the enum nl802154_scan_types scan_req_type
declaration between struct cfg802154_scan_request *scan_req and struct
ieee802154_sub_if_data *sdata to comply with the reverse Christmas tree
rule.
---
net/mac802154/scan.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac802154/scan.c b/net/mac802154/scan.c
index 1c0eeaa76560..a6dab3cc3ad8 100644
--- a/net/mac802154/scan.c
+++ b/net/mac802154/scan.c
@@ -176,6 +176,7 @@ void mac802154_scan_worker(struct work_struct *work)
struct ieee802154_local *local =
container_of(work, struct ieee802154_local, scan_work.work);
struct cfg802154_scan_request *scan_req;
+ enum nl802154_scan_types scan_req_type;
struct ieee802154_sub_if_data *sdata;
unsigned int scan_duration = 0;
struct wpan_phy *wpan_phy;
@@ -209,6 +210,7 @@ void mac802154_scan_worker(struct work_struct *work)
}
wpan_phy = scan_req->wpan_phy;
+ scan_req_type = scan_req->type;
scan_req_duration = scan_req->duration;
/* Look for the next valid chan */
@@ -246,7 +248,7 @@ void mac802154_scan_worker(struct work_struct *work)
goto end_scan;
}
- if (scan_req->type == NL802154_SCAN_ACTIVE) {
+ if (scan_req_type == NL802154_SCAN_ACTIVE) {
ret = mac802154_transmit_beacon_req(local, sdata);
if (ret)
dev_err(&sdata->dev->dev,
--
2.34.1
The initial kref from dma_fence_init() should match up with whatever
signals the fence, however here we are submitting the job first to the
hw and only then grabbing the extra ref and even then we touch some
fence state before this. This might be too late if the fence is
signalled before we can grab the extra ref. Rather always grab the
refcount early before we do the submission part.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2811
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_guc_submit.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index fbbe6a487bbb..b33f3d23a068 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -766,12 +766,15 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
struct xe_guc *guc = exec_queue_to_guc(q);
struct xe_device *xe = guc_to_xe(guc);
bool lr = xe_exec_queue_is_lr(q);
+ struct dma_fence *fence;
xe_assert(xe, !(exec_queue_destroyed(q) || exec_queue_pending_disable(q)) ||
exec_queue_banned(q) || exec_queue_suspended(q));
trace_xe_sched_job_run(job);
+ dma_fence_get(job->fence);
+
if (!exec_queue_killed_or_banned_or_wedged(q) && !xe_sched_job_is_error(job)) {
if (!exec_queue_registered(q))
register_exec_queue(q);
@@ -782,12 +785,16 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
if (lr) {
xe_sched_job_set_error(job, -EOPNOTSUPP);
- return NULL;
+ fence = NULL;
} else if (test_and_set_bit(JOB_FLAG_SUBMIT, &job->fence->flags)) {
- return job->fence;
+ fence = job->fence;
} else {
- return dma_fence_get(job->fence);
+ fence = dma_fence_get(job->fence);
}
+
+ dma_fence_put(job->fence);
+
+ return fence;
}
static void guc_exec_queue_free_job(struct drm_sched_job *drm_job)
--
2.46.0
This patch aims to prevent reliably encountered buffer overflow in
rtl92d_dm_txpower_tracking_callback_thermalmeter() caused by
accessing 'ofdm_index[]' array of size 2 with index 'i' equal to 2,
which in turn is possible if value of 'is2t' is 'true'.
The issue in question has been fixed by the following upstream
patch that can be cleanly applied to 5.10 stable branch.
Keep it consistent with the handling of the same check within
generic_copy_file_checks().
Also, returning -EOVERFLOW in this case is more appropriate.
Signed-off-by: Julian Sun <sunjunchao2870(a)gmail.com>
---
fs/remap_range.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 6fdeb3c8cb70..a26521ded5c8 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -42,7 +42,7 @@ static int generic_remap_checks(struct file *file_in, loff_t pos_in,
/* The start of both ranges must be aligned to an fs block. */
if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs))
- return -EINVAL;
+ return -EOVERFLOW;
/* Ensure offsets don't wrap. */
if (check_add_overflow(pos_in, count, &tmp) ||
--
2.39.2
The recently submitted fix-commit revealed a problem in the iDMA32
platform code. Even though the controller supported only a single master
the dw_dma_acpi_filter() method hard-coded two master interfaces with IDs
0 and 1. As a result the sanity check implemented in the commit
b336268dde75 ("dmaengine: dw: Add peripheral bus width verification") got
incorrect interface data width and thus prevented the client drivers
from configuring the DMA-channel with the EINVAL error returned. E.g. the
next error was printed for the PXA2xx SPI controller driver trying to
configure the requested channels:
> [ 164.525604] pxa2xx_spi_pci 0000:00:07.1: DMA slave config failed
> [ 164.536105] pxa2xx_spi_pci 0000:00:07.1: failed to get DMA TX descriptor
> [ 164.543213] spidev spi-SPT0001:00: SPI transfer failed: -16
The problem would have been spotted much earlier if the iDMA32 controller
supported more than one master interfaces. But since it supports just a
single master and the iDMA32-specific code just ignores the master IDs in
the CTLLO preparation method, the issue has been gone unnoticed so far.
Fix the problem by specifying a single master ID for both memory and
peripheral devices on the ACPI-based platforms if there is only one master
available on the controller. Thus the issue noticed for the iDMA32
controllers will be eliminated and the ACPI-probed DW DMA controllers will
be configured with the correct master ID by default.
Cc: stable(a)vger.kernel.org
Fixes: b336268dde75 ("dmaengine: dw: Add peripheral bus width verification")
Fixes: 199244d69458 ("dmaengine: dw: add support of iDMA 32-bit hardware")
Reported-by: Ferry Toth <fntoth(a)gmail.com>
Closes: https://lore.kernel.org/dmaengine/ZuXbCKUs1iOqFu51@black.fi.intel.com/
Reported-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Closes: https://lore.kernel.org/dmaengine/ZuXgI-VcHpMgbZ91@black.fi.intel.com/
Signed-off-by: Serge Semin <fancer.lancer(a)gmail.com>
---
Note I haven't got any device with the Intel Merrifield iDMA32 + SPI
PXA2xx pair to test out the solution. So any tests are very welcome. But
based on Andy' (see the reported-by links) and my investigations the fix
seems correct.
---
drivers/dma/dw/acpi.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/dw/acpi.c b/drivers/dma/dw/acpi.c
index c510c109d2c3..efbe8baeccbc 100644
--- a/drivers/dma/dw/acpi.c
+++ b/drivers/dma/dw/acpi.c
@@ -8,15 +8,25 @@
static bool dw_dma_acpi_filter(struct dma_chan *chan, void *param)
{
+ struct dw_dma *dw = to_dw_dma(chan->device);
struct acpi_dma_spec *dma_spec = param;
struct dw_dma_slave slave = {
.dma_dev = dma_spec->dev,
.src_id = dma_spec->slave_id,
.dst_id = dma_spec->slave_id,
.m_master = 0,
- .p_master = 1,
};
+ /*
+ * Fallback to using a single interface for both memory and peripheral
+ * device if there is only one master I/F supported (e.g. iDMA32)
+ */
+ if (dw->pdata->nr_masters == 0)
+ slave.p_master = 0;
+ else
+ slave.p_master = 1;
+
+
return dw_dma_filter(chan, &slave);
}
--
2.43.0
From: Toke Høiland-Jørgensen <toke(a)redhat.com>
[ Upstream commit 281d464a34f540de166cee74b723e97ac2515ec3 ]
The devmap code allocates a number hash buckets equal to the next power
of two of the max_entries value provided when creating the map. When
rounding up to the next power of two, the 32-bit variable storing the
number of buckets can overflow, and the code checks for overflow by
checking if the truncated 32-bit value is equal to 0. However, on 32-bit
arches the rounding up itself can overflow mid-way through, because it
ends up doing a left-shift of 32 bits on an unsigned long value. If the
size of an unsigned long is four bytes, this is undefined behaviour, so
there is no guarantee that we'll end up with a nice and tidy 0-value at
the end.
Syzbot managed to turn this into a crash on arm32 by creating a
DEVMAP_HASH with max_entries > 0x80000000 and then trying to update it.
Fix this by moving the overflow check to before the rounding up
operation.
Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Link: https://lore.kernel.org/r/000000000000ed666a0611af6818@google.com
Reported-and-tested-by: syzbot+8cd36f6b65f3cafd400a(a)syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Message-ID: <20240307120340.99577-2-toke(a)redhat.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Pu Lehui <pulehui(a)huawei.com>
---
kernel/bpf/devmap.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 4b2819b0a05a..2370fc31169f 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -130,10 +130,13 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
cost = (u64) sizeof(struct list_head) * num_possible_cpus();
if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
- dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
-
- if (!dtab->n_buckets) /* Overflow check */
+ /* hash table size must be power of 2; roundup_pow_of_two() can
+ * overflow into UB on 32-bit arches, so check that first
+ */
+ if (dtab->map.max_entries > 1UL << 31)
return -EINVAL;
+
+ dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
cost += (u64) sizeof(struct hlist_head) * dtab->n_buckets;
} else {
cost += (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
--
2.34.1
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 1b0279393df4..d05387f9c11f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -526,14 +526,17 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int i;
for (i = 0; i < ev_cnt; i++) {
- int key = adp5588_read(kpad->client, KEY_EVENTA + i);
- int key_val = key & KEY_EV_MASK;
- int key_press = key & KEY_EV_PRESSED;
+ int key, key_val, key_press;
+ key = adp5588_read(kpad->client, KEY_EVENTA + i);
+ if (key < 0)
+ continue;
+ key_val = key & KEY_EV_MASK;
+ key_press = key & KEY_EV_PRESSED;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
/* gpio line used as IRQ source */
adp5588_gpio_irq_handle(kpad, key_val, key_press);
- } else {
+ } else if (key_val > 0) {
int row = (key_val - 1) / ADP5588_COLS_MAX;
int col = (key_val - 1) % ADP5588_COLS_MAX;
int code = MATRIX_SCAN_CODE(row, col, kpad->row_shift);
--
2.25.1
Syzkaller reports a sleeping function called from invalid context
in do_con_write() in the 5.10 and 5.15 stable releases.
The issue has been fixed by the following upstream patch that was
adapted to 5.10 and 5.15. All of the changes made to the patch in order
to adapt it are described at the end of the commit message.
This patch has already been backported to the following stable branches:
v6.9 - https://lore.kernel.org/all/20240625085551.368621044@linuxfoundation.org/
v6.6 - https://lore.kernel.org/all/20240625085539.398852153@linuxfoundation.org/
v6.1 - https://lore.kernel.org/all/20240625085527.625846117@linuxfoundation.org/
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with Syzkaller.
Linus Torvalds (1):
tty: add the option to have a tty reject a new ldisc
drivers/tty/tty_ldisc.c | 6 ++++++
drivers/tty/vt/vt.c | 10 ++++++++++
include/linux/tty_driver.h | 8 ++++++++
3 files changed, 24 insertions(+)
--
2.39.2
usb_power_delivery_register_capabilities() returns ERR_PTR in case of
failure. usb_power_delivery_unregister_capabilities() we only check
argument ("cap") for NULL. A more robust check would be checking for
ERR_PTR as well.
Cc: stable(a)vger.kernel.org
Fixes: 662a60102c12 ("usb: typec: Separate USB Power Delivery from USB Type-C")
Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
drivers/usb/typec/pd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/pd.c b/drivers/usb/typec/pd.c
index d78c04a421bc..761fe4dddf1b 100644
--- a/drivers/usb/typec/pd.c
+++ b/drivers/usb/typec/pd.c
@@ -519,7 +519,7 @@ EXPORT_SYMBOL_GPL(usb_power_delivery_register_capabilities);
*/
void usb_power_delivery_unregister_capabilities(struct usb_power_delivery_capabilities *cap)
{
- if (!cap)
+ if (IS_ERR_OR_NULL(cap))
return;
device_for_each_child(&cap->dev, NULL, remove_pdo);
base-commit: 68d4209158f43a558c5553ea95ab0c8975eab18c
--
2.46.0.792.g87dc391469-goog
From: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
Existing last step of dsc policy is to restore pbn value under minimum compression
when try to greedily disable dsc for a stream failed to fit in MST bw.
Optimized dsc params result from optimization step is not necessarily the minimum compression,
therefore it is not correct to restore the pbn under minimum compression rate.
Restore the pbn under minimum compression instead of the value from optimized pbn could result
in the dsc params not correct at the modeset where atomic_check failed due to not
enough bw. One or more monitors connected could not light up in such case.
Restore the optimized pbn value, instead of using the pbn value under minimum
compression.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
---
.../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 358c4bff1c40..88b9f4df8fd9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1026,6 +1026,7 @@ static int try_disable_dsc(struct drm_atomic_state *state,
int remaining_to_try = 0;
int ret;
uint16_t fec_overhead_multiplier_x1000 = get_fec_overhead_multiplier(dc_link);
+ int var_pbn;
for (i = 0; i < count; i++) {
if (vars[i + k].dsc_enabled
@@ -1056,13 +1057,18 @@ static int try_disable_dsc(struct drm_atomic_state *state,
break;
DRM_DEBUG_DRIVER("MST_DSC index #%d, try no compression\n", next_index);
+ var_pbn = vars[next_index].pbn;
vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.stream_kbps, fec_overhead_multiplier_x1000);
ret = drm_dp_atomic_find_time_slots(state,
params[next_index].port->mgr,
params[next_index].port,
vars[next_index].pbn);
- if (ret < 0)
+ if (ret < 0) {
+ DRM_DEBUG_DRIVER("%s:%d MST_DSC index #%d, failed to set pbn to the state, %d\n",
+ __func__, __LINE__, next_index, ret);
+ vars[next_index].pbn = var_pbn;
return ret;
+ }
ret = drm_dp_mst_atomic_check(state);
if (ret == 0) {
@@ -1070,14 +1076,17 @@ static int try_disable_dsc(struct drm_atomic_state *state,
vars[next_index].dsc_enabled = false;
vars[next_index].bpp_x16 = 0;
} else {
- DRM_DEBUG_DRIVER("MST_DSC index #%d, restore minimum compression\n", next_index);
- vars[next_index].pbn = kbps_to_peak_pbn(params[next_index].bw_range.max_kbps, fec_overhead_multiplier_x1000);
+ DRM_DEBUG_DRIVER("MST_DSC index #%d, restore optimized pbn value\n", next_index);
+ vars[next_index].pbn = var_pbn;
ret = drm_dp_atomic_find_time_slots(state,
params[next_index].port->mgr,
params[next_index].port,
vars[next_index].pbn);
- if (ret < 0)
+ if (ret < 0) {
+ DRM_DEBUG_DRIVER("%s:%d MST_DSC index #%d, failed to set pbn to the state, %d\n",
+ __func__, __LINE__, next_index, ret);
return ret;
+ }
}
tried[next_index] = true;
--
2.46.0
No upstream commit exists for this commit.
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode[key_val - 1] buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 69a4af606ed4 ("Input: adp5588-keys - support GPI events for ADP5588 devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 90a59b973d00..19be8054eb5f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -272,6 +272,8 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int key = adp5588_read(kpad->client, Key_EVENTA + i);
int key_val = key & KEY_EV_MASK;
+ if (key_val <= 0)
+ continue;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
for (j = 0; j < kpad->gpimapsize; j++) {
if (key_val == kpad->gpimap[j].pin) {
--
2.25.1
The following patch aims to fix a possible NULL-ptr-dereference that
occurs if a call to get_ep_from_tid() fails to assign nonzero
value.
Upstream commit 283861a4c52c1ea4df3dd1b6fc75a50796ce3524 has been
backported up to version 5.15. For some reason, older stable branches
have been ignored.
This backport can be cleanly applied to 4.19, 5.4 and 5.10 versions.
v2: Add stable maintainers as patch recepients.
This patch addresses issues of integer overflow and possibly erroneous
integer promotion in skl_ddi_calculate_wrpll() and
skl_ddi_hdmi_pll_dividers() in kernel versions 5.10 and 5.15.
The problem has been fixed in upstream and stable versions up to 6.1
with commit:
5b5115726601 ("drm/i915: Fix possible int overflow in skl_ddi_calculate_wrpll()")
Due to changes to skl_ddi_calculate_wrpll() return value the patch had
to be slightly modified during cherry-picking, leaving the original
fix intact, and can now be cleanly applied to 5.10 and 5.15 kernels.
No upstream commit exists for this commit.
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode[key_val - 1] buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 90a59b973d00..19be8054eb5f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -272,6 +272,8 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int key = adp5588_read(kpad->client, Key_EVENTA + i);
int key_val = key & KEY_EV_MASK;
+ if (key_val <= 0)
+ continue;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
for (j = 0; j < kpad->gpimapsize; j++) {
if (key_val == kpad->gpimap[j].pin) {
--
2.25.1
On 2024-09-19 at 14:11:04, Wei Fang (wei.fang(a)nxp.com) wrote:
> When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> on LS1028A, it was found that if the command was re-run multiple times,
> Rx could not receive the frames, and the result of xdo-bench showed
> that the rx rate was 0.
>
> root@ls1028ardb:~# ./xdp-bench tx eno0
> Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> Summary 2046 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
>
> By observing the Rx PIR and CIR registers, we found that CIR is always
> equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> is full and can no longer accommodate other Rx frames. Therefore, it
> is obvious that the RX BD ring has not been cleaned up.
>
> Further analysis of the code revealed that the Rx BD ring will only
> be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> Therefore, some debug logs were added to the driver and the current
> values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> BD ring was full. The logs are as follows.
>
> [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
>
> From the results, we can see that the maximum value of xdp_tx_in_flight
> has reached 2140. However, the size of the Rx BD ring is only 2048. This
> is incredible, so checked the code again and found that the driver did
> not reset xdp_tx_in_flight when installing or uninstalling bpf program,
> resulting in xdp_tx_in_flight still retaining the value after the last
> command was run.
>
> Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Wei Fang <wei.fang(a)nxp.com>
Reviewed-by: Subbaraya Sundeep <sbhatta(a)marvell.com>
Thanks,
Sundeep
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 18685451fc4e546fc0e718580d32df3c0e5c8272 ]
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric:
Calling ip_defrag() in output path is also implying skb_orphan(),
which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was :
8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ
packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used.
Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:
If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.
In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won't continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.
Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Reported-by: syzbot+e5167d7144a62715044c(a)syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 7d0567842b78390dd9b60f00f1d8f838d540e325)
CVE: CVE-2024-26921
Cc: stable(a)vger.kernel.org # 5.4
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
include/linux/skbuff.h | 5 +-
net/core/sock_destructor.h | 12 +++++
net/ipv4/inet_fragment.c | 70 ++++++++++++++++++++-----
net/ipv4/ip_fragment.c | 2 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
5 files changed, 72 insertions(+), 19 deletions(-)
create mode 100644 net/core/sock_destructor.h
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 29ccc33a1c627..3191d0ffc6e9a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -704,10 +704,7 @@ struct sk_buff {
struct list_head list;
};
- union {
- struct sock *sk;
- int ip_defrag_offset;
- };
+ struct sock *sk;
union {
ktime_t tstamp;
diff --git a/net/core/sock_destructor.h b/net/core/sock_destructor.h
new file mode 100644
index 0000000000000..2f396e6bfba5a
--- /dev/null
+++ b/net/core/sock_destructor.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _NET_CORE_SOCK_DESTRUCTOR_H
+#define _NET_CORE_SOCK_DESTRUCTOR_H
+#include <net/tcp.h>
+
+static inline bool is_skb_wmem(const struct sk_buff *skb)
+{
+ return skb->destructor == sock_wfree ||
+ skb->destructor == __sock_wfree ||
+ (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+#endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index e0e8a65d561ec..12ef3cb26676d 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -24,6 +24,8 @@
#include <net/ip.h>
#include <net/ipv6.h>
+#include "../core/sock_destructor.h"
+
/* Use skb->cb to track consecutive/adjacent fragments coming at
* the end of the queue. Nodes in the rb-tree queue will
* contain "runs" of one or more adjacent fragments.
@@ -39,6 +41,7 @@ struct ipfrag_skb_cb {
};
struct sk_buff *next_frag;
int frag_run_len;
+ int ip_defrag_offset;
};
#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
@@ -359,12 +362,12 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
*/
if (!last)
fragrun_create(q, skb); /* First fragment. */
- else if (last->ip_defrag_offset + last->len < end) {
+ else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) {
/* This is the common case: skb goes to the end. */
/* Detect and discard overlaps. */
- if (offset < last->ip_defrag_offset + last->len)
+ if (offset < FRAG_CB(last)->ip_defrag_offset + last->len)
return IPFRAG_OVERLAP;
- if (offset == last->ip_defrag_offset + last->len)
+ if (offset == FRAG_CB(last)->ip_defrag_offset + last->len)
fragrun_append_to_last(q, skb);
else
fragrun_create(q, skb);
@@ -381,13 +384,13 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
parent = *rbn;
curr = rb_to_skb(parent);
- curr_run_end = curr->ip_defrag_offset +
+ curr_run_end = FRAG_CB(curr)->ip_defrag_offset +
FRAG_CB(curr)->frag_run_len;
- if (end <= curr->ip_defrag_offset)
+ if (end <= FRAG_CB(curr)->ip_defrag_offset)
rbn = &parent->rb_left;
else if (offset >= curr_run_end)
rbn = &parent->rb_right;
- else if (offset >= curr->ip_defrag_offset &&
+ else if (offset >= FRAG_CB(curr)->ip_defrag_offset &&
end <= curr_run_end)
return IPFRAG_DUP;
else
@@ -401,7 +404,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
rb_insert_color(&skb->rbnode, &q->rb_fragments);
}
- skb->ip_defrag_offset = offset;
+ FRAG_CB(skb)->ip_defrag_offset = offset;
return IPFRAG_OK;
}
@@ -411,13 +414,28 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
struct sk_buff *parent)
{
struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments);
- struct sk_buff **nextp;
+ void (*destructor)(struct sk_buff *);
+ unsigned int orig_truesize = 0;
+ struct sk_buff **nextp = NULL;
+ struct sock *sk = skb->sk;
int delta;
+ if (sk && is_skb_wmem(skb)) {
+ /* TX: skb->sk might have been passed as argument to
+ * dst->output and must remain valid until tx completes.
+ *
+ * Move sk to reassembled skb and fix up wmem accounting.
+ */
+ orig_truesize = skb->truesize;
+ destructor = skb->destructor;
+ }
+
if (head != skb) {
fp = skb_clone(skb, GFP_ATOMIC);
- if (!fp)
- return NULL;
+ if (!fp) {
+ head = skb;
+ goto out_restore_sk;
+ }
FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag;
if (RB_EMPTY_NODE(&skb->rbnode))
FRAG_CB(parent)->next_frag = fp;
@@ -426,6 +444,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
&q->rb_fragments);
if (q->fragments_tail == skb)
q->fragments_tail = fp;
+
+ if (orig_truesize) {
+ /* prevent skb_morph from releasing sk */
+ skb->sk = NULL;
+ skb->destructor = NULL;
+ }
skb_morph(skb, head);
FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag;
rb_replace_node(&head->rbnode, &skb->rbnode,
@@ -433,13 +457,13 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
consume_skb(head);
head = skb;
}
- WARN_ON(head->ip_defrag_offset != 0);
+ WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0);
delta = -head->truesize;
/* Head of list must not be cloned. */
if (skb_unclone(head, GFP_ATOMIC))
- return NULL;
+ goto out_restore_sk;
delta += head->truesize;
if (delta)
@@ -455,7 +479,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
clone = alloc_skb(0, GFP_ATOMIC);
if (!clone)
- return NULL;
+ goto out_restore_sk;
skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
skb_frag_list_init(head);
for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
@@ -472,6 +496,21 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
nextp = &skb_shinfo(head)->frag_list;
}
+out_restore_sk:
+ if (orig_truesize) {
+ int ts_delta = head->truesize - orig_truesize;
+
+ /* if this reassembled skb is fragmented later,
+ * fraglist skbs will get skb->sk assigned from head->sk,
+ * and each frag skb will be released via sock_wfree.
+ *
+ * Update sk_wmem_alloc.
+ */
+ head->sk = sk;
+ head->destructor = destructor;
+ refcount_add(ts_delta, &sk->sk_wmem_alloc);
+ }
+
return nextp;
}
EXPORT_SYMBOL(inet_frag_reasm_prepare);
@@ -479,6 +518,8 @@ EXPORT_SYMBOL(inet_frag_reasm_prepare);
void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
void *reasm_data, bool try_coalesce)
{
+ struct sock *sk = is_skb_wmem(head) ? head->sk : NULL;
+ const unsigned int head_truesize = head->truesize;
struct sk_buff **nextp = (struct sk_buff **)reasm_data;
struct rb_node *rbn;
struct sk_buff *fp;
@@ -541,6 +582,9 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
+
+ if (sk)
+ refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
}
EXPORT_SYMBOL(inet_frag_reasm_finish);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fad803d2d711e..ec2264adf2a6a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -377,6 +377,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -479,7 +480,6 @@ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user)
struct ipq *qp;
__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
- skb_orphan(skb);
/* Lookup (or create) queue header */
qp = ip_find(net, ip_hdr(skb), user, vif);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index fed9666a2f7da..cab68c63ea65e 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -296,6 +296,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -461,7 +462,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
hdr = ipv6_hdr(skb);
fhdr = (struct frag_hdr *)skb_transport_header(skb);
- skb_orphan(skb);
fq = fq_find(net, fhdr->identification, user, hdr,
skb->dev ? skb->dev->ifindex : 0);
if (fq == NULL) {
--
2.45.2
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 18685451fc4e546fc0e718580d32df3c0e5c8272 ]
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric:
Calling ip_defrag() in output path is also implying skb_orphan(),
which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was :
8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ
packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used.
Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:
If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.
In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won't continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.
Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Reported-by: syzbot+e5167d7144a62715044c(a)syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 7d0567842b78390dd9b60f00f1d8f838d540e325)
CVE: CVE-2024-26921
Cc: stable(a)vger.kernel.org # 5.10
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
include/linux/skbuff.h | 5 +-
net/core/sock_destructor.h | 12 +++++
net/ipv4/inet_fragment.c | 70 ++++++++++++++++++++-----
net/ipv4/ip_fragment.c | 2 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
5 files changed, 72 insertions(+), 19 deletions(-)
create mode 100644 net/core/sock_destructor.h
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 31755d496b01d..31ae4b74d4352 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -733,10 +733,7 @@ struct sk_buff {
struct list_head list;
};
- union {
- struct sock *sk;
- int ip_defrag_offset;
- };
+ struct sock *sk;
union {
ktime_t tstamp;
diff --git a/net/core/sock_destructor.h b/net/core/sock_destructor.h
new file mode 100644
index 0000000000000..2f396e6bfba5a
--- /dev/null
+++ b/net/core/sock_destructor.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _NET_CORE_SOCK_DESTRUCTOR_H
+#define _NET_CORE_SOCK_DESTRUCTOR_H
+#include <net/tcp.h>
+
+static inline bool is_skb_wmem(const struct sk_buff *skb)
+{
+ return skb->destructor == sock_wfree ||
+ skb->destructor == __sock_wfree ||
+ (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+#endif
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index e0e8a65d561ec..12ef3cb26676d 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -24,6 +24,8 @@
#include <net/ip.h>
#include <net/ipv6.h>
+#include "../core/sock_destructor.h"
+
/* Use skb->cb to track consecutive/adjacent fragments coming at
* the end of the queue. Nodes in the rb-tree queue will
* contain "runs" of one or more adjacent fragments.
@@ -39,6 +41,7 @@ struct ipfrag_skb_cb {
};
struct sk_buff *next_frag;
int frag_run_len;
+ int ip_defrag_offset;
};
#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
@@ -359,12 +362,12 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
*/
if (!last)
fragrun_create(q, skb); /* First fragment. */
- else if (last->ip_defrag_offset + last->len < end) {
+ else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) {
/* This is the common case: skb goes to the end. */
/* Detect and discard overlaps. */
- if (offset < last->ip_defrag_offset + last->len)
+ if (offset < FRAG_CB(last)->ip_defrag_offset + last->len)
return IPFRAG_OVERLAP;
- if (offset == last->ip_defrag_offset + last->len)
+ if (offset == FRAG_CB(last)->ip_defrag_offset + last->len)
fragrun_append_to_last(q, skb);
else
fragrun_create(q, skb);
@@ -381,13 +384,13 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
parent = *rbn;
curr = rb_to_skb(parent);
- curr_run_end = curr->ip_defrag_offset +
+ curr_run_end = FRAG_CB(curr)->ip_defrag_offset +
FRAG_CB(curr)->frag_run_len;
- if (end <= curr->ip_defrag_offset)
+ if (end <= FRAG_CB(curr)->ip_defrag_offset)
rbn = &parent->rb_left;
else if (offset >= curr_run_end)
rbn = &parent->rb_right;
- else if (offset >= curr->ip_defrag_offset &&
+ else if (offset >= FRAG_CB(curr)->ip_defrag_offset &&
end <= curr_run_end)
return IPFRAG_DUP;
else
@@ -401,7 +404,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
rb_insert_color(&skb->rbnode, &q->rb_fragments);
}
- skb->ip_defrag_offset = offset;
+ FRAG_CB(skb)->ip_defrag_offset = offset;
return IPFRAG_OK;
}
@@ -411,13 +414,28 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
struct sk_buff *parent)
{
struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments);
- struct sk_buff **nextp;
+ void (*destructor)(struct sk_buff *);
+ unsigned int orig_truesize = 0;
+ struct sk_buff **nextp = NULL;
+ struct sock *sk = skb->sk;
int delta;
+ if (sk && is_skb_wmem(skb)) {
+ /* TX: skb->sk might have been passed as argument to
+ * dst->output and must remain valid until tx completes.
+ *
+ * Move sk to reassembled skb and fix up wmem accounting.
+ */
+ orig_truesize = skb->truesize;
+ destructor = skb->destructor;
+ }
+
if (head != skb) {
fp = skb_clone(skb, GFP_ATOMIC);
- if (!fp)
- return NULL;
+ if (!fp) {
+ head = skb;
+ goto out_restore_sk;
+ }
FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag;
if (RB_EMPTY_NODE(&skb->rbnode))
FRAG_CB(parent)->next_frag = fp;
@@ -426,6 +444,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
&q->rb_fragments);
if (q->fragments_tail == skb)
q->fragments_tail = fp;
+
+ if (orig_truesize) {
+ /* prevent skb_morph from releasing sk */
+ skb->sk = NULL;
+ skb->destructor = NULL;
+ }
skb_morph(skb, head);
FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag;
rb_replace_node(&head->rbnode, &skb->rbnode,
@@ -433,13 +457,13 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
consume_skb(head);
head = skb;
}
- WARN_ON(head->ip_defrag_offset != 0);
+ WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0);
delta = -head->truesize;
/* Head of list must not be cloned. */
if (skb_unclone(head, GFP_ATOMIC))
- return NULL;
+ goto out_restore_sk;
delta += head->truesize;
if (delta)
@@ -455,7 +479,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
clone = alloc_skb(0, GFP_ATOMIC);
if (!clone)
- return NULL;
+ goto out_restore_sk;
skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
skb_frag_list_init(head);
for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
@@ -472,6 +496,21 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
nextp = &skb_shinfo(head)->frag_list;
}
+out_restore_sk:
+ if (orig_truesize) {
+ int ts_delta = head->truesize - orig_truesize;
+
+ /* if this reassembled skb is fragmented later,
+ * fraglist skbs will get skb->sk assigned from head->sk,
+ * and each frag skb will be released via sock_wfree.
+ *
+ * Update sk_wmem_alloc.
+ */
+ head->sk = sk;
+ head->destructor = destructor;
+ refcount_add(ts_delta, &sk->sk_wmem_alloc);
+ }
+
return nextp;
}
EXPORT_SYMBOL(inet_frag_reasm_prepare);
@@ -479,6 +518,8 @@ EXPORT_SYMBOL(inet_frag_reasm_prepare);
void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
void *reasm_data, bool try_coalesce)
{
+ struct sock *sk = is_skb_wmem(head) ? head->sk : NULL;
+ const unsigned int head_truesize = head->truesize;
struct sk_buff **nextp = (struct sk_buff **)reasm_data;
struct rb_node *rbn;
struct sk_buff *fp;
@@ -541,6 +582,9 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
+
+ if (sk)
+ refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
}
EXPORT_SYMBOL(inet_frag_reasm_finish);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fad803d2d711e..ec2264adf2a6a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -377,6 +377,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -479,7 +480,6 @@ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user)
struct ipq *qp;
__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
- skb_orphan(skb);
/* Lookup (or create) queue header */
qp = ip_find(net, ip_hdr(skb), user, vif);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index c129ad334eb39..8c2163f95711c 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -296,6 +296,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -471,7 +472,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
hdr = ipv6_hdr(skb);
fhdr = (struct frag_hdr *)skb_transport_header(skb);
- skb_orphan(skb);
fq = fq_find(net, fhdr->identification, user, hdr,
skb->dev ? skb->dev->ifindex : 0);
if (fq == NULL) {
--
2.45.2
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 18685451fc4e546fc0e718580d32df3c0e5c8272 ]
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric:
Calling ip_defrag() in output path is also implying skb_orphan(),
which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was :
8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ
packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used.
Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:
If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.
In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won't continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.
Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Reported-by: syzbot+e5167d7144a62715044c(a)syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 7d0567842b78390dd9b60f00f1d8f838d540e325)
CVE: CVE-2024-26921
Cc: stable(a)vger.kernel.org # 5.15
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
include/linux/skbuff.h | 7 +--
net/ipv4/inet_fragment.c | 70 ++++++++++++++++++++-----
net/ipv4/ip_fragment.c | 2 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
4 files changed, 60 insertions(+), 21 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b230c422dc3b9..7f52562fac19c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -660,8 +660,6 @@ typedef unsigned char *sk_buff_data_t;
* @rbnode: RB tree node, alternative to next/prev for netem/tcp
* @list: queue head
* @sk: Socket we are owned by
- * @ip_defrag_offset: (aka @sk) alternate use of @sk, used in
- * fragmentation management
* @dev: Device we arrived on/are leaving by
* @dev_scratch: (aka @dev) alternate use of @dev when @dev would be %NULL
* @cb: Control buffer. Free for use by every layer. Put private vars here
@@ -778,10 +776,7 @@ struct sk_buff {
struct list_head list;
};
- union {
- struct sock *sk;
- int ip_defrag_offset;
- };
+ struct sock *sk;
union {
ktime_t tstamp;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 341096807100c..7e38170111999 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -24,6 +24,8 @@
#include <net/ip.h>
#include <net/ipv6.h>
+#include "../core/sock_destructor.h"
+
/* Use skb->cb to track consecutive/adjacent fragments coming at
* the end of the queue. Nodes in the rb-tree queue will
* contain "runs" of one or more adjacent fragments.
@@ -39,6 +41,7 @@ struct ipfrag_skb_cb {
};
struct sk_buff *next_frag;
int frag_run_len;
+ int ip_defrag_offset;
};
#define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb))
@@ -390,12 +393,12 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
*/
if (!last)
fragrun_create(q, skb); /* First fragment. */
- else if (last->ip_defrag_offset + last->len < end) {
+ else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) {
/* This is the common case: skb goes to the end. */
/* Detect and discard overlaps. */
- if (offset < last->ip_defrag_offset + last->len)
+ if (offset < FRAG_CB(last)->ip_defrag_offset + last->len)
return IPFRAG_OVERLAP;
- if (offset == last->ip_defrag_offset + last->len)
+ if (offset == FRAG_CB(last)->ip_defrag_offset + last->len)
fragrun_append_to_last(q, skb);
else
fragrun_create(q, skb);
@@ -412,13 +415,13 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
parent = *rbn;
curr = rb_to_skb(parent);
- curr_run_end = curr->ip_defrag_offset +
+ curr_run_end = FRAG_CB(curr)->ip_defrag_offset +
FRAG_CB(curr)->frag_run_len;
- if (end <= curr->ip_defrag_offset)
+ if (end <= FRAG_CB(curr)->ip_defrag_offset)
rbn = &parent->rb_left;
else if (offset >= curr_run_end)
rbn = &parent->rb_right;
- else if (offset >= curr->ip_defrag_offset &&
+ else if (offset >= FRAG_CB(curr)->ip_defrag_offset &&
end <= curr_run_end)
return IPFRAG_DUP;
else
@@ -432,7 +435,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
rb_insert_color(&skb->rbnode, &q->rb_fragments);
}
- skb->ip_defrag_offset = offset;
+ FRAG_CB(skb)->ip_defrag_offset = offset;
return IPFRAG_OK;
}
@@ -442,13 +445,28 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
struct sk_buff *parent)
{
struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments);
- struct sk_buff **nextp;
+ void (*destructor)(struct sk_buff *);
+ unsigned int orig_truesize = 0;
+ struct sk_buff **nextp = NULL;
+ struct sock *sk = skb->sk;
int delta;
+ if (sk && is_skb_wmem(skb)) {
+ /* TX: skb->sk might have been passed as argument to
+ * dst->output and must remain valid until tx completes.
+ *
+ * Move sk to reassembled skb and fix up wmem accounting.
+ */
+ orig_truesize = skb->truesize;
+ destructor = skb->destructor;
+ }
+
if (head != skb) {
fp = skb_clone(skb, GFP_ATOMIC);
- if (!fp)
- return NULL;
+ if (!fp) {
+ head = skb;
+ goto out_restore_sk;
+ }
FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag;
if (RB_EMPTY_NODE(&skb->rbnode))
FRAG_CB(parent)->next_frag = fp;
@@ -457,6 +475,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
&q->rb_fragments);
if (q->fragments_tail == skb)
q->fragments_tail = fp;
+
+ if (orig_truesize) {
+ /* prevent skb_morph from releasing sk */
+ skb->sk = NULL;
+ skb->destructor = NULL;
+ }
skb_morph(skb, head);
FRAG_CB(skb)->next_frag = FRAG_CB(head)->next_frag;
rb_replace_node(&head->rbnode, &skb->rbnode,
@@ -464,13 +488,13 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
consume_skb(head);
head = skb;
}
- WARN_ON(head->ip_defrag_offset != 0);
+ WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0);
delta = -head->truesize;
/* Head of list must not be cloned. */
if (skb_unclone(head, GFP_ATOMIC))
- return NULL;
+ goto out_restore_sk;
delta += head->truesize;
if (delta)
@@ -486,7 +510,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
clone = alloc_skb(0, GFP_ATOMIC);
if (!clone)
- return NULL;
+ goto out_restore_sk;
skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list;
skb_frag_list_init(head);
for (i = 0; i < skb_shinfo(head)->nr_frags; i++)
@@ -503,6 +527,21 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
nextp = &skb_shinfo(head)->frag_list;
}
+out_restore_sk:
+ if (orig_truesize) {
+ int ts_delta = head->truesize - orig_truesize;
+
+ /* if this reassembled skb is fragmented later,
+ * fraglist skbs will get skb->sk assigned from head->sk,
+ * and each frag skb will be released via sock_wfree.
+ *
+ * Update sk_wmem_alloc.
+ */
+ head->sk = sk;
+ head->destructor = destructor;
+ refcount_add(ts_delta, &sk->sk_wmem_alloc);
+ }
+
return nextp;
}
EXPORT_SYMBOL(inet_frag_reasm_prepare);
@@ -510,6 +549,8 @@ EXPORT_SYMBOL(inet_frag_reasm_prepare);
void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
void *reasm_data, bool try_coalesce)
{
+ struct sock *sk = is_skb_wmem(head) ? head->sk : NULL;
+ const unsigned int head_truesize = head->truesize;
struct sk_buff **nextp = (struct sk_buff **)reasm_data;
struct rb_node *rbn;
struct sk_buff *fp;
@@ -572,6 +613,9 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
+
+ if (sk)
+ refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
}
EXPORT_SYMBOL(inet_frag_reasm_finish);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fad803d2d711e..ec2264adf2a6a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -377,6 +377,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -479,7 +480,6 @@ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user)
struct ipq *qp;
__IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS);
- skb_orphan(skb);
/* Lookup (or create) queue header */
qp = ip_find(net, ip_hdr(skb), user, vif);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 2e5b090d7c89f..0ec5ec5a5b45a 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -297,6 +297,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
}
skb_dst_drop(skb);
+ skb_orphan(skb);
return -EINPROGRESS;
insert_error:
@@ -472,7 +473,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user)
hdr = ipv6_hdr(skb);
fhdr = (struct frag_hdr *)skb_transport_header(skb);
- skb_orphan(skb);
fq = fq_find(net, fhdr->identification, user, hdr,
skb->dev ? skb->dev->ifindex : 0);
if (fq == NULL) {
--
2.45.2
tpm2_load_null() ignores the return value of tpm2_create_primary().
Further, it does not heal from the situation when memcmp() returns zero.
Address this by returning on failure and saving the null key if there
was no detected interference in the bus.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: eb24c9788cd9 ("tpm: disable the TPM if NULL name changes")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v3:
- Update log messages. Previously the log message incorrectly stated
on load failure that integrity check had been failed, even tho the
check is done *after* the load operation.
v2:
- Refined the commit message.
- Reverted tpm2_create_primary() changes. They are not required if
tmp_null_key is used as the parameter.
---
drivers/char/tpm/tpm2-sessions.c | 38 +++++++++++++++++---------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index 795f4c7c6adb..a62f64e21511 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -915,32 +915,34 @@ static int tpm2_parse_start_auth_session(struct tpm2_auth *auth,
static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key)
{
- int rc;
unsigned int offset = 0; /* dummy offset for null seed context */
u8 name[SHA256_DIGEST_SIZE + 2];
+ u32 tmp_null_key;
+ int rc;
rc = tpm2_load_context(chip, chip->null_key_context, &offset,
- null_key);
- if (rc != -EINVAL)
+ &tmp_null_key);
+ if (rc != -EINVAL) {
+ if (!rc)
+ *null_key = tmp_null_key;
return rc;
+ }
+ dev_info(&chip->dev, "the null key has been reset\n");
- /* an integrity failure may mean the TPM has been reset */
- dev_err(&chip->dev, "NULL key integrity failure!\n");
- /* check the null name against what we know */
- tpm2_create_primary(chip, TPM2_RH_NULL, NULL, name);
- if (memcmp(name, chip->null_key_name, sizeof(name)) == 0)
- /* name unchanged, assume transient integrity failure */
+ rc = tpm2_create_primary(chip, TPM2_RH_NULL, &tmp_null_key, name);
+ if (rc)
return rc;
- /*
- * Fatal TPM failure: the NULL seed has actually changed, so
- * the TPM must have been illegally reset. All in-kernel TPM
- * operations will fail because the NULL primary can't be
- * loaded to salt the sessions, but disable the TPM anyway so
- * userspace programmes can't be compromised by it.
- */
- dev_err(&chip->dev, "NULL name has changed, disabling TPM due to interference\n");
- chip->flags |= TPM_CHIP_FLAG_DISABLE;
+ /* Return the null key if the name has not been changed: */
+ if (memcmp(name, chip->null_key_name, sizeof(name)) == 0) {
+ *null_key = tmp_null_key;
+ return 0;
+ }
+
+ /* Deduce from the name change TPM interference: */
+ dev_err(&chip->dev, "the null key integrity check failedh\n");
+ tpm2_flush_context(chip, tmp_null_key);
+ chip->flags |= TPM_CHIP_FLAG_DISABLE;
return rc;
}
--
2.46.0
Hi Greg, hi Sasha,
Please applied f3c89983cb4f ("block: Fix where bio IO priority gets
set") to stable 6.1+, it applies cleanly to v6.1.110/v6.6.51.
Thx!
Jinpu Wang @ IONOS
Hi,
Upstream commit 1474bc87fe57 ("wifi: cfg80211: check wiphy mutex is held for wdev mutex")
has been backported recently to 5.15/6.1/6.6 stable branches. After that
we started seeing numerous lockdep assertion splats in these kernels
originating from different parts of wireless stack where wdev_lock() is
called. There is also a huge pile of them already found in Syzbot [1,2,3].
Digging more into the issue it appears that the blamed commit is a part of
a much larger series [4] with locking cleanups and improvements for the
whole wireless subsystem. The series was merged at 6.7.
The cover letter for the series says:
There's a kind of pointless commit in there that adds some wiphy
locking assertions to the wdev as an intermediate step, I can
remove that if you think that's better. We ran with it at that
intermediate stage for a while to test things.
So backporting this commit to stable branches without taking the series as
a whole is pointless and just leads to bogus lockdep assertion splats
there. The series itself is an improvement and cleanup work and therefore
is not considered as material for old stable kernels.
The solution which comes to mind is to revert this backported patch from
the affected stable branches.
Namely:
- 5.15 https://lore.kernel.org/stable/20240901160825.013135421@linuxfoundation.org/
- 6.1 https://lore.kernel.org/stable/20240827143842.546537850@linuxfoundation.org/
- 6.6 https://lore.kernel.org/stable/20240827143846.794100356@linuxfoundation.org/
The intention why it was suddenly backported to these branches a year
after merge-to-upstream is not clear actually: there are no stable or
Fixes tags in commit message, and I don't find any public request for
explicit backport on mailing lists.
Please let me know if you can revert the commits yourself or I have to
prepare and send them to you.
[1]: https://syzkaller.appspot.com/bug?extid=310a1a9715fc1c9ead61
[2]: https://syzkaller.appspot.com/bug?extid=b730e8b6bc76d07fe10b
[3]: https://syzkaller.appspot.com/bug?extid=09501cf606ec2823fafa
[4]: https://lore.kernel.org/linux-wireless/20230828115927.116700-41-johannes@si…
--
Thanks,
Fedor
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND is an int, defaulting to 250. When
the wakeref is non-zero, it's either -1 or a dynamically allocated
pointer, depending on CONFIG_DRM_I915_DEBUG_RUNTIME_PM. It's likely that
the code works by coincidence with the bitwise AND, but with
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y, there's the off chance that the
condition evaluates to false, and intel_wakeref_auto() doesn't get
called. Switch to the intended logical AND.
Fixes: ad74457a6b5a ("drm/i915/dgfx: Release mmap on rpm suspend")
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Anshuman Gupta <anshuman.gupta(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.1+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5c72462d1f57..c157ade48c39 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1131,7 +1131,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
GEM_WARN_ON(!i915_ttm_cpu_maps_iomem(bo->resource));
}
- if (wakeref & CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
+ if (wakeref && CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
intel_wakeref_auto(&to_i915(obj->base.dev)->runtime_pm.userfault_wakeref,
msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND));
--
2.39.2
Hi,
There were some fixes for RAPL reading issues recently on some AMD systems.
Can you please bring this commit to 6.6.y, 6.10.y and 6.11.y?
commit 166df51097a2 ("powercap/intel_rapl: Add support for AMD family 1Ah")
Can you also please bring this commit to 6.10.y and 6.11.y?
commit 26096aed255f ("powercap/intel_rapl: Fix the energy-pkg event for
AMD CPUs")
Thanks!
Dell All In One (AIO) models released after 2017 may use a backlight
controller board connected to an UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501")
The Dell OptiPlex 5480 AIO has an ACPI device for one if its UARTs with
the above _HID + _CID. Loading the dell-uart-backlight driver fails with
the following errors:
[ 18.261353] dell_uart_backlight serial0-0: Timed out waiting for response.
[ 18.261356] dell_uart_backlight serial0-0: error -ETIMEDOUT: getting firmware version
[ 18.261359] dell_uart_backlight serial0-0: probe with driver dell_uart_backlight failed with error -110
Indicating that there is no backlight controller board attached to
the UART, while the GPU's native backlight control method does work.
Add a quirk to use the GPU's native backlight control method on this model.
Fixes: cd8e468efb4f ("ACPI: video: Add Dell UART backlight controller detection")
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/acpi/video_detect.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c
index b70e84e8049a..015bd8e66c1c 100644
--- a/drivers/acpi/video_detect.c
+++ b/drivers/acpi/video_detect.c
@@ -844,6 +844,15 @@ static const struct dmi_system_id video_detect_dmi_table[] = {
* controller board in their ACPI tables (and may even have one), but
* which need native backlight control nevertheless.
*/
+ {
+ /* https://github.com/zabbly/linux/issues/26 */
+ .callback = video_detect_force_native,
+ /* Dell OptiPlex 5480 AIO */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 5480 AIO"),
+ },
+ },
{
/* https://bugzilla.redhat.com/show_bug.cgi?id=2303936 */
.callback = video_detect_force_native,
--
2.46.0
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Comparing to the
original patches, the helper vfs_empty_path() is moved to stat.c from
linux/fs.h, because get_user() is only available in fs.h since v5.7,
where commit 80fbaf1c3f29 ('rcuwait: Add @State argument to
rcuwait_wait_event()') added linux/sched/signal.h to rcuwait.h, and
uaccess.h finally got its way to fs.h along the path uaccess.h ->
sched/task.h -> sched/signal.h -> rcuwait.h -> percpu-rwsem.h -> fs.h.
uaccess.h cannot be directly included in fs.h before v5.7, where commit
df23e2be3d24 ('acpi: Remove header dependency') removed proc_fs.h from
acpi/acpi_bus.h, preventing arch/x86/boot/compressed/cmdline.c from
indirectly including fs.h. Otherwise, the function set_fs() defined in
asm/uaccess.h will get into cmdline.c, which contains another set_fs(),
resulting conflicting function definations. There is no users of
vfs_empty_path() except stat.c, and as a result, putting it in stat.c is
acceptable.
The existing vfs_statx_fd(), which is removed since v5.10, is utilized
to implement short-circuit handling of NULL and "" paths, instead of
introducing vfs_statx_path(), simplifying the implementation.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (2):
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Christoph Hellwig (2):
fs: implement vfs_stat and vfs_lstat in terms of vfs_fstatat
fs: move vfs_fstatat out of line
Linus Torvalds (1):
vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/stat.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++-----
include/linux/fs.h | 26 ++++++++++--------------
2 files changed, 63 insertions(+), 21 deletions(-)
---
base-commit: 661f109c057497c8baf507a2562ceb9f9fb3cbc2
change-id: 20240918-statx-stable-linux-5-4-y-a79d4268600d
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Modifications
are applied to vfs_statx_path(). In the original patch, vfs_statx_path()
was created to warp around the call to vfs_getattr() after
filename_lookup() in vfs_statx(). Since the coresponding code is
different in 5.15 and 5.10, the content of vfs_statx_path() is modified
to match this. The original patch also moved path_mounted() from
namespace.c to internal.h, which is not applicable for 5.15 and 5.10
since it has not been introduced before 6.5. The original patch also
used CLASS(fd_raw, ) to convert a file descriptor number provided from
the user space in to a struct and automatically release it afterwards.
Since CLASS mechanism is only available since 6.1.79, obtaining and
releasing fd struct is done manually. do_statx() was directly handling
filename string instead of a struct filename * before 5.18, as a result
short-circuiting is implemented in do_statx() instead of sys_statx,
without the need of introducing do_statx_fd().
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (2):
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Linus Torvalds (1):
vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/stat.c | 73 +++++++++++++++++++++++++++++++++++++++++++++---------
include/linux/fs.h | 17 +++++++++++++
2 files changed, 78 insertions(+), 12 deletions(-)
---
base-commit: 3a5928702e7120f83f703fd566082bfb59f1a57e
change-id: 20240918-statx-stable-linux-5-15-y-9a30358a7d47
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Modifications
are applied to vfs_statx_path(). In the original patch, vfs_statx_path()
was created to warp around the call to vfs_getattr() after
filename_lookup() in vfs_statx(). Since the coresponding code is
different in 6.1, the content of vfs_statx_path() is modified to match
this. The original patch also moved path_mounted() from namespace.c to
internal.h, which is not applicable for 6.1 since it has not been
introduced before 6.5.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (3):
file: add fd_raw cleanup class
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Linus Torvalds (1):
vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/internal.h | 2 +
fs/stat.c | 101 ++++++++++++++++++++++++++++++++++++++++-----------
include/linux/file.h | 1 +
include/linux/fs.h | 17 +++++++++
4 files changed, 99 insertions(+), 22 deletions(-)
---
base-commit: 5f55cad62cc9d8d29dd3556e0243b14355725ffb
change-id: 20240918-statx-stable-linux-6-1-y-37e6ca691c9b
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Modifications
are applied to vfs_statx_path(). In the original patch, vfs_statx_path()
was created to warp around the call to vfs_getattr() after
filename_lookup() in vfs_statx(). Since the coresponding code is
different in 6.6, the content of vfs_statx_path() is modified to match
this.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (3):
file: add fd_raw cleanup class
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/internal.h | 14 +++++++
fs/namespace.c | 13 ------
fs/stat.c | 113 ++++++++++++++++++++++++++++++++++++---------------
include/linux/file.h | 1 +
include/linux/fs.h | 17 ++++++++
5 files changed, 112 insertions(+), 46 deletions(-)
---
base-commit: 6d1dc55b5bab93ef868d223b740d527ee7501063
change-id: 20240918-statx-stable-linux-6-6-y-02566b94440d
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 393e32259a77..4850aed54604 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1876,14 +1876,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
The violation of atomicity occurs when the drbd_uuid_set_bm function is
executed simultaneously with modifying the value of
device->ldev->md.uuid[UI_BITMAP]. Consider a scenario where, while
device->ldev->md.uuid[UI_BITMAP] passes the validity check when its value
is not zero, the value of device->ldev->md.uuid[UI_BITMAP] is written to
zero. In this case, the check in drbd_uuid_set_bm might refer to the old
value of device->ldev->md.uuid[UI_BITMAP] (before locking), which allows
an invalid value to pass the validity check, resulting in inconsistency.
To address this issue, it is recommended to include the data validity check
within the locked section of the function. This modification ensures that
the value of device->ldev->md.uuid[UI_BITMAP] does not change during the
validation process, thereby maintaining its integrity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency bugs
including data races and atomicity violations.
Fixes: 9f2247bb9b75 ("drbd: Protect accesses to the uuid set with a spinlock")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/block/drbd/drbd_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a9e49b212341..abafc4edf9ed 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3399,10 +3399,12 @@ void drbd_uuid_new_current(struct drbd_device *device) __must_hold(local)
void drbd_uuid_set_bm(struct drbd_device *device, u64 val) __must_hold(local)
{
unsigned long flags;
- if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0)
+ spin_lock_irqsave(&device->ldev->md.uuid_lock, flags);
+ if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0) {
+ spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags);
return;
+ }
- spin_lock_irqsave(&device->ldev->md.uuid_lock, flags);
if (val == 0) {
drbd_uuid_move_history(device);
device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[UI_BITMAP];
--
2.34.1
From: Philip Yang <Philip.Yang(a)amd.com>
commit 8c45b31909b730f9c7b146588e038f9c6553394d upstream.
If the SVM range has no GPU access nor access-in-place attribute,
validate and map to GPU should skip the range.
Add NULL pointer check if find_first_bit(ctx->bitmap, MAX_GPU_INSTANCE)
returns MAX_GPU_INSTANCE as gpuidx if ctx->bitmap is empty.
Signed-off-by: Philip Yang <Philip.Yang(a)amd.com>
Reviewed-by: Alex Sierra <alex.sierra(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[m.masimov(a)maxima.ru: In order to adapt this patch to branch 6.1
ctx was treated as a variable and not as a pointer.]
Signed-off-by: Murad Masimov <m.masimov(a)maxima.ru>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7fa5e70f1aac..a44781b66af9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1491,6 +1491,8 @@ static void *kfd_svm_page_owner(struct kfd_process *p, int32_t gpuidx)
struct kfd_process_device *pdd;
pdd = kfd_process_device_from_gpuidx(p, gpuidx);
+ if (!pdd)
+ return NULL;
return SVM_ADEV_PGMAP_OWNER(pdd->dev->adev);
}
@@ -1561,10 +1563,10 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
}
if (bitmap_empty(ctx.bitmap, MAX_GPU_INSTANCE)) {
- if (!prange->mapped_to_gpu)
- return 0;
-
bitmap_copy(ctx.bitmap, prange->bitmap_access, MAX_GPU_INSTANCE);
+ if (!prange->mapped_to_gpu ||
+ bitmap_empty(ctx.bitmap, MAX_GPU_INSTANCE))
+ return 0;
}
if (prange->actual_loc && !prange->ttm_res) {
--
2.39.2
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline xattr.
The problematic function is ocfs2_reflink_xattr_inline(). By the time this
function is called the reflink tree is already recreated at the destination
inode from the source inode. At this point, this function reserves space
for inline xattrs at the destination inode without even checking if there
is space at the root metadata block. It simply reduces the l_count from 243
to 227 thereby making space of 256 bytes for inline xattr whereas the inode
already has extents beyond this index (in this case upto 230), thereby causing
corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 25c8ec3c8c3a5..80f441878dc1f 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4155,8 +4156,9 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4182,6 +4184,26 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4189,7 +4211,7 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index 6510ad783c912..2c572b336ba48 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(struct ocfs2_xattr_reflink *args)
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
--
2.43.5
From: Michael Kelley <mhklinux(a)outlook.com>
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
doesn't unnecessarily do refined TSC calibration when setting up the TSC
clocksource.
With this change, a message such as this is no longer output during boot
when the TSC is used as the clocksource:
[ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
Furthermore, the guest and host will have exactly the same view of the
TSC frequency, which is important for features such as the TSC deadline
timer that are emulated by the Hyper-V host.
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240606025559.1631-1-mhklinux(a)outlook.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/cpu/mshyperv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f8b0fa2dbe37..b43f25b3c99d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -243,6 +243,7 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_platform.calibrate_tsc = hv_get_tsc_khz;
x86_platform.calibrate_cpu = hv_get_tsc_khz;
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
--
2.43.0
From: Michael Kelley <mhklinux(a)outlook.com>
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
doesn't unnecessarily do refined TSC calibration when setting up the TSC
clocksource.
With this change, a message such as this is no longer output during boot
when the TSC is used as the clocksource:
[ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
Furthermore, the guest and host will have exactly the same view of the
TSC frequency, which is important for features such as the TSC deadline
timer that are emulated by the Hyper-V host.
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240606025559.1631-1-mhklinux(a)outlook.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/cpu/mshyperv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 51d95c4b692c..cebbcc6c36ae 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -256,6 +256,7 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_platform.calibrate_tsc = hv_get_tsc_khz;
x86_platform.calibrate_cpu = hv_get_tsc_khz;
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
--
2.43.0
From: Michael Kelley <mhklinux(a)outlook.com>
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
doesn't unnecessarily do refined TSC calibration when setting up the TSC
clocksource.
With this change, a message such as this is no longer output during boot
when the TSC is used as the clocksource:
[ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
Furthermore, the guest and host will have exactly the same view of the
TSC frequency, which is important for features such as the TSC deadline
timer that are emulated by the Hyper-V host.
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240606025559.1631-1-mhklinux(a)outlook.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/cpu/mshyperv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 021cd067733e..a91aad434d03 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -275,6 +275,7 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_platform.calibrate_tsc = hv_get_tsc_khz;
x86_platform.calibrate_cpu = hv_get_tsc_khz;
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
if (ms_hyperv.hints & HV_X64_ENLIGHTENED_VMCS_RECOMMENDED) {
--
2.43.0
From: Michael Kelley <mhklinux(a)outlook.com>
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]
A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
doesn't unnecessarily do refined TSC calibration when setting up the TSC
clocksource.
With this change, a message such as this is no longer output during boot
when the TSC is used as the clocksource:
[ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz
Furthermore, the guest and host will have exactly the same view of the
TSC frequency, which is important for features such as the TSC deadline
timer that are emulated by the Hyper-V host.
Signed-off-by: Michael Kelley <mhklinux(a)outlook.com>
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Message-ID: <20240606025559.1631-1-mhklinux(a)outlook.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kernel/cpu/mshyperv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 8d3c649a1769..3794b223fd69 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -322,6 +322,7 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_platform.calibrate_tsc = hv_get_tsc_khz;
x86_platform.calibrate_cpu = hv_get_tsc_khz;
+ setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
if (ms_hyperv.priv_high & HV_ISOLATION) {
--
2.43.0