OP-TEE supplicant is a user-space daemon and it's possible for it
being hung or crashed or killed in the middle of processing an OP-TEE
RPC call. It becomes more complicated when there is incorrect shutdown
ordering of the supplicant process vs the OP-TEE client application which
can eventually lead to system hang-up waiting for the closure of the
client application.
Allow the client process waiting in kernel for supplicant response to
be killed rather than indefinitetly waiting in an unkillable state. This
fixes issues observed during system reboot/shutdown when supplicant got
hung for some reason or gets crashed/killed which lead to client getting
hung in an unkillable state. It in turn lead to system being in hung up
state requiring hard power off/on to recover.
Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
Suggested-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sumit Garg <sumit.garg(a)linaro.org>
---
Changes in v2:
- Switch to killable wait instead as suggested by Arnd instead
of supplicant timeout. It atleast allow the client to wait for
supplicant in killable state which in turn allows system to reboot
or shutdown gracefully.
drivers/tee/optee/supp.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)
diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index 322a543b8c27..3fbfa9751931 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
struct optee *optee = tee_get_drvdata(ctx->teedev);
struct optee_supp *supp = &optee->supp;
struct optee_supp_req *req;
- bool interruptable;
u32 ret;
/*
@@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
/*
* Wait for supplicant to process and return result, once we've
* returned from wait_for_completion(&req->c) successfully we have
- * exclusive access again.
+ * exclusive access again. Allow the wait to be killable such that
+ * the wait doesn't turn into an indefinite state if the supplicant
+ * gets hung for some reason.
*/
- while (wait_for_completion_interruptible(&req->c)) {
- mutex_lock(&supp->mutex);
- interruptable = !supp->ctx;
- if (interruptable) {
- /*
- * There's no supplicant available and since the
- * supp->mutex currently is held none can
- * become available until the mutex released
- * again.
- *
- * Interrupting an RPC to supplicant is only
- * allowed as a way of slightly improving the user
- * experience in case the supplicant hasn't been
- * started yet. During normal operation the supplicant
- * will serve all requests in a timely manner and
- * interrupting then wouldn't make sense.
- */
+ if (wait_for_completion_killable(&req->c)) {
+ if (!mutex_lock_killable(&supp->mutex)) {
if (req->in_queue) {
list_del(&req->link);
req->in_queue = false;
}
+ mutex_unlock(&supp->mutex);
}
- mutex_unlock(&supp->mutex);
-
- if (interruptable) {
- req->ret = TEEC_ERROR_COMMUNICATION;
- break;
- }
+ req->ret = TEEC_ERROR_COMMUNICATION;
}
ret = req->ret;
--
2.43.0
This is the start of the stable review cycle for the 6.13.1 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 13:34:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.13.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.13.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.13.1-rc1
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Jann Horn <jannh(a)google.com>
io_uring/rsrc: require cloned buffers to share accounting contexts
Jason Gerecke <jason.gerecke(a)wacom.com>
HID: wacom: Initialize brightness of LED trigger
Hans de Goede <hdegoede(a)redhat.com>
wifi: rtl8xxxu: add more missing rtl8192cu USB IDs
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jikos(a)kernel.org>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: fix infinite directory reads for offset dir"
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
-------------
Diffstat:
Makefile | 4 +-
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hid/wacom_sys.c | 24 ++--
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/net/wireless/realtek/rtl8xxxu/core.c | 20 ++++
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 ++
fs/gfs2/file.c | 1 +
fs/libfs.c | 162 +++++++++++++--------------
fs/smb/client/smb2inode.c | 104 ++++++++++++-----
include/linux/fs.h | 1 -
io_uring/rsrc.c | 7 ++
mm/filemap.c | 17 +++
mm/shmem.c | 4 +-
net/sched/sch_ets.c | 2 +
sound/usb/quirks.c | 2 +
20 files changed, 251 insertions(+), 145 deletions(-)
Hi
On Sat, 2025-02-01 at 23:33 -0500, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
>
> to the 6.12-stable tree which can be found at:
>
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> drm-xe-use-ttm_bo_access-in-xe_vm_snapshot_capture_d.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable
> tree,
> please let <stable(a)vger.kernel.org> know about it.
Please avoid including this patch for now. It turned out it needs more
dependencies and we will attempt a manual backport later.
Thanks,
Thomas
>
>
>
> commit 7aad4e92ca3782686571792d117b3ffcfe05c65c
> Author: Matthew Brost <matthew.brost(a)intel.com>
> Date: Tue Nov 26 09:46:13 2024 -0800
>
> drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
>
> [ Upstream commit 5f7bec831f1f17c354e4307a12cf79b018296975 ]
>
> Non-contiguous mapping of BO in VRAM doesn't work, use
> ttm_bo_access
> instead.
>
> v2:
> - Fix error handling
>
> Fixes: 0eb2a18a8fad ("drm/xe: Implement VM snapshot support for
> BO's and userptr")
> Suggested-by: Matthew Auld <matthew.auld(a)intel.com>
> Signed-off-by: Matthew Brost <matthew.brost(a)intel.com>
> Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
> Link:
> https://patchwork.freedesktop.org/patch/msgid/20241126174615.2665852-7-matt…
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index c99380271de62..c8782da3a5c38 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -3303,7 +3303,6 @@ void xe_vm_snapshot_capture_delayed(struct
> xe_vm_snapshot *snap)
>
> for (int i = 0; i < snap->num_snaps; i++) {
> struct xe_bo *bo = snap->snap[i].bo;
> - struct iosys_map src;
> int err;
>
> if (IS_ERR(snap->snap[i].data))
> @@ -3316,16 +3315,12 @@ void xe_vm_snapshot_capture_delayed(struct
> xe_vm_snapshot *snap)
> }
>
> if (bo) {
> - xe_bo_lock(bo, false);
> - err = ttm_bo_vmap(&bo->ttm, &src);
> - if (!err) {
> - xe_map_memcpy_from(xe_bo_device(bo),
> - snap-
> >snap[i].data,
> - &src, snap-
> >snap[i].bo_ofs,
> - snap-
> >snap[i].len);
> - ttm_bo_vunmap(&bo->ttm, &src);
> - }
> - xe_bo_unlock(bo);
> + err = ttm_bo_access(&bo->ttm, snap-
> >snap[i].bo_ofs,
> + snap->snap[i].data,
> snap->snap[i].len, 0);
> + if (!(err < 0) && err != snap->snap[i].len)
> + err = -EIO;
> + else if (!(err < 0))
> + err = 0;
> } else {
> void __user *userptr = (void __user
> *)(size_t)snap->snap[i].bo_ofs;
>
If an AX25 device is bound to a socket by setting the SO_BINDTODEVICE
socket option, a refcount leak will occur in ax25_release().
Commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
added decrement of device refcounts in ax25_release(). In order for that
to work correctly the refcounts must already be incremented when the
device is bound to the socket. An AX25 device can be bound to a socket
by either calling ax25_bind() or setting SO_BINDTODEVICE socket option.
In both cases the refcounts should be incremented, but in fact it is done
only in ax25_bind().
This bug leads to the following issue reported by Syzkaller:
================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 1 PID: 5932 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Modules linked in:
CPU: 1 UID: 0 PID: 5932 Comm: syz-executor424 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:336 [inline]
refcount_dec include/linux/refcount.h:351 [inline]
ref_tracker_free+0x710/0x820 lib/ref_tracker.c:236
netdev_tracker_free include/linux/netdevice.h:4156 [inline]
netdev_put include/linux/netdevice.h:4173 [inline]
netdev_put include/linux/netdevice.h:4169 [inline]
ax25_release+0x33f/0xa10 net/ax25/af_ax25.c:1069
__sock_release+0xb0/0x270 net/socket.c:640
sock_close+0x1c/0x30 net/socket.c:1408
...
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
================================================================
Fix the implementation of ax25_setsockopt() by adding increment of
refcounts for the new device bound, and decrement of refcounts for
the old unbound device.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+33841dc6aa3e1d86b78a(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33841dc6aa3e1d86b78a
Signed-off-by: Murad Masimov <m.masimov(a)mt-integration.ru>
---
net/ax25/af_ax25.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index aa6c714892ec..9f3b8b682adb 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -685,6 +685,15 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
break;
}
+ if (ax25->ax25_dev) {
+ if (dev == ax25->ax25_dev->dev) {
+ rcu_read_unlock();
+ break;
+ }
+ netdev_put(ax25->ax25_dev->dev, &ax25->dev_tracker);
+ ax25_dev_put(ax25->ax25_dev);
+ }
+
ax25->ax25_dev = ax25_dev_ax25dev(dev);
if (!ax25->ax25_dev) {
rcu_read_unlock();
@@ -692,6 +701,8 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
break;
}
ax25_fillin_cb(ax25, ax25->ax25_dev);
+ netdev_hold(dev, &ax25->dev_tracker, GFP_ATOMIC);
+ ax25_dev_hold(ax25->ax25_dev);
rcu_read_unlock();
break;
--
2.39.2
There are two variables that indicate the interrupt type to be used
in the next test execution, "irq_type" as global and test->irq_type.
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in this function isn't reflected in the global "irq_type",
so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type will be displayed in "pcitest" as follows:
# pcitest -i 0
SET IRQ TYPE TO LEGACY: OKAY
# pcitest -I
GET IRQ TYPE: MSI
Fix this issue by propagating the current type to the global "irq_type".
Cc: stable(a)vger.kernel.org
Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
---
drivers/misc/pci_endpoint_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index a342587fc78a..33058630cd50 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -742,6 +742,7 @@ static bool pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
if (!pci_endpoint_test_request_irq(test))
goto err;
+ irq_type = test->irq_type;
return true;
err:
--
2.25.1
Add a sanity check to madvise_dontneed_free() to address a corner case
in madvise where a race condition causes the current vma being processed
to be backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with
a userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event. During this time, the vma covering the
current address range may change due to an explicit mmap done
concurrently by another thread.
If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down
to a hugepage boundary to avoid data loss (see "Fixes" below). This
rounding may cause the end address to be truncated to the same address
as the start.
Make this corner case follow the same semantics as in other similar
cases where the requested region has zero length (ie. return 0).
This will make madvise_walk_vmas() continue to the next vma in the
range (this time holding the process mm lock) which, due to the prev
pointer becoming stale because of the vma change, will be the same
hugepage-backed vma that was just checked before. The next time
madvise_dontneed_free() runs for this vma, if the start address isn't
aligned to a hugepage boundary, it'll return -EINVAL, which is also in
line with the madvise api.
From userspace perspective, madvise() will return EINVAL because the
start address isn't aligned according to the new vma alignment
requirements (hugepage), even though it was correctly page-aligned when
the call was issued.
Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
---
mm/madvise.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 49f3a75046f6..c8e28d51978a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -933,7 +933,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
*/
end = vma->vm_end;
}
- VM_WARN_ON(start >= end);
+ if (start == end)
+ return 0;
+ VM_WARN_ON(start > end);
}
if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
--
2.48.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Commit 11c60f23ed13 ("integrity: Remove unused macro
IMA_ACTION_RULE_FLAGS") removed the IMA_ACTION_RULE_FLAGS mask, due to it
not being used after commit 0d73a55208e9 ("ima: re-introduce own integrity
cache lock").
However, it seems that the latter commit mistakenly used the wrong mask
when moving the code from ima_inode_post_setattr() to
process_measurement(). There is no mention in the commit message about this
change and it looks quite important, since changing from IMA_ACTIONS_FLAGS
(later renamed to IMA_NONACTION_FLAGS) to IMA_ACTION_RULE_FLAGS was done by
commit 42a4c603198f0 ("ima: fix ima_inode_post_setattr").
Restore the original change of resetting only the policy-specific flags and
not the new file status, but with new mask 0xfb000000 since the
policy-specific flags changed meanwhile. Also rename IMA_ACTION_RULE_FLAGS
to IMA_NONACTION_RULE_FLAGS, to be consistent with IMA_NONACTION_FLAGS.
Cc: stable(a)vger.kernel.org # v4.16.x
Fixes: 11c60f23ed13 ("integrity: Remove unused macro IMA_ACTION_RULE_FLAGS")
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima.h | 1 +
security/integrity/ima/ima_main.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e1a3d1239bee..615900d4150d 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -141,6 +141,7 @@ struct ima_kexec_hdr {
/* IMA iint policy rule cache flags */
#define IMA_NONACTION_FLAGS 0xff000000
+#define IMA_NONACTION_RULE_FLAGS 0xfb000000
#define IMA_DIGSIG_REQUIRED 0x01000000
#define IMA_PERMIT_DIRECTIO 0x02000000
#define IMA_NEW_FILE 0x04000000
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 46adfd524dd8..7173dca20c23 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -275,7 +275,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
/* reset appraisal flags if ima_inode_post_setattr was called */
iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
IMA_APPRAISE_SUBMASK | IMA_APPRAISED_SUBMASK |
- IMA_NONACTION_FLAGS);
+ IMA_NONACTION_RULE_FLAGS);
/*
* Re-evaulate the file if either the xattr has changed or the
--
2.34.1
Hello Christian,
It will cause problems for real applications if the specific event happens, but that happens with a very low probability. It's more an issue that can be resolved at the author's convenience, but I would be comfortable if the issue I discovered is acknowledged at least, if not addressed.
Thanks
Shehab
________________________________________
From: Ahmed, Shehab Sarar <shehaba2(a)illinois.edu>
Sent: Sunday, February 2, 2025 6:10 PM
To: Christian Heusel
Cc: stable(a)vger.kernel.org; regressions(a)lists.linux.dev
Subject: Re: TCP Fast Retransmission Issue
Hello Christian,
It will cause problems for real applications if the specific event happens, but that happens with a very low probability. It's more an issue that can be resolved at the author's convenience, but I would be comfortable if the issue I discovered is acknowledged at least, if not addressed.
Thanks
Shehab
________________________________
From: Christian Heusel
Sent: Sunday, February 2, 2025 3:26 AM
To: Ahmed, Shehab Sarar
Cc: stable(a)vger.kernel.org; regressions(a)lists.linux.dev
Subject: Re: TCP Fast Retransmission Issue
On 25/02/02 10:26AM, Christian Heusel wrote:
> On 25/02/01 07:09PM, Ahmed, Shehab Sarar wrote:
> > Hello,
>
> Hello,
>
> > While experimenting with bbr protocol, I manipulated the network conditions by maintaining a high RTT for about one second before abruptly reducing it. Some packets sent during the high RTT phase experienced long delays in reaching the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival triggered the receiver to generate duplicate acknowledgments (dup ACKs). Due to the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender initiated a fast retransmission for an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, despite the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver received this packet, it sent an ACK for the next packet in sequence. However, upon later receiving the fast-retransmitted packet, an issue arose in its logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver continued sending dup ACKs, ultimately forcing bbr into the retransmission timeout (RTO) phase.
> >
> > I generated this issue in linux kernel version 5.15.0-117-generic with Ubuntu 20.04. I attempted to confirm whether the issue persists with the latest Linux kernel. However, I discovered that the behavior of bbr has changed in the most recent kernel version, where it now sends chunks of packets instead of sending them one by one over time. As a result, I was unable to reproduce the specific sequence of events that triggered the bug we identified. Consequently, I could not confirm whether the bug still exists in the latest kernel.
> >
> > I believe that the issue (if still exists) will have to be resolved in the location net/ipv4/tcp_input.c or something like that. There are so many authors here that I do not know who to CC here. So, sending this email to you. Sorry if this is not the best way to report this issue.
>
> does this cause problems for real applications? To me it sounds a bit
> like a constructed issue, but I'm also not really proficient about the
> stack mentioned about :p
>
> I'm asking because this is important for how we treat this report, see
> the "Reporting Regression"[0] document for more details on what we
> consider to be an regression.
>
> > Thanks
> > Shehab
>
> Cheers,
> Christian
(forgot the link)
[0]: https://docs.kernel.org/admin-guide/reporting-regressions.html#what-is-a-re…
While by default max_autoclose equals to INT_MAX / HZ, one may set
net.sctp.max_autoclose to UINT_MAX. There is code in
sctp_association_init() that can consequently trigger overflow.
Cc: stable(a)vger.kernel.org
Fixes: 9f70f46bd4c7 ("sctp: properly latch and use autoclose value from sock to association")
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
net/sctp/associola.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index c45c192b7878..0b0794f164cf 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -137,7 +137,8 @@ static struct sctp_association *sctp_association_init(
= 5 * asoc->rto_max;
asoc->timeouts[SCTP_EVENT_TIMEOUT_SACK] = asoc->sackdelay;
- asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE] = sp->autoclose * HZ;
+ asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE] =
+ (unsigned long)sp->autoclose * HZ;
/* Initializes the timers */
for (i = SCTP_EVENT_TIMEOUT_NONE; i < SCTP_NUM_TIMEOUT_TYPES; ++i)
--
2.34.1
On 02/02/2025 05:23, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> ARM: dts: aspeed: yosemite4: correct the compatible string for max31790
>
> to the 6.13-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm-dts-aspeed-yosemite4-correct-the-compatible-stri.patch
> and it can be found in the queue-6.13 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 64b29da76fb21bbb955e262461996d37865d4ae9
> Author: Ricky CX Wu <ricky.cx.wu.wiwynn(a)gmail.com>
> Date: Thu Oct 3 15:42:46 2024 +0800
>
> ARM: dts: aspeed: yosemite4: correct the compatible string for max31790
>
> [ Upstream commit b1a1ecb669bfa763ee5e86a038d7c9363eee7548 ]
>
> Fix the compatible string for max31790 to match the binding document.
>
> Fixes: 2b8d94f4b4a4 ("ARM: dts: aspeed: yosemite4: add Facebook Yosemite 4 BMC")
> Signed-off-by: Ricky CX Wu <ricky.cx.wu.wiwynn(a)gmail.com>
> Signed-off-by: Delphine CC Chiu <Delphine_CC_Chiu(a)wiwynn.com>
> Link: https://patch.msgid.link/20241003074251.3818101-6-Delphine_CC_Chiu@wiwynn.c…
> Signed-off-by: Andrew Jeffery <andrew(a)codeconstruct.com.au>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Sasha, something got broken in your scripts.
I received today huge flood (100? 200?) of such stable confirmations,
but I am not listed above at all. No tags came here from me, so why I am
Cc-ed on all this?
Best regards,
Krzysztof
When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.
The reason is that now that uretprobe is a system call the default seccomp
filters in docker block it as they only allow a specific set of known
syscalls. This is true for other userspace applications which use seccomp
to control their syscall surface.
Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.
Pass this systemcall through seccomp without depending on configuration.
Note: uretprobe isn't supported in i386 and __NR_ia32_rt_tgsigqueueinfo
uses the same number as __NR_uretprobe so the syscall isn't forced in the
compat bitmap.
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Rafael Buchbinder <rafi(a)rbk.io>
Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2A…
Link: https://lore.kernel.org/lkml/20250121182939.33d05470@gandalf.local.home/T/#…
Cc: stable(a)vger.kernel.org
Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com>
---
v2: use action_cache bitmap and mode1 array to check the syscall
The following reproduction script synthetically demonstrates the problem
for seccomp filters:
cat > /tmp/x.c << EOF
char *syscalls[] = {
"write",
"exit_group",
"fstat",
};
__attribute__((noinline)) int probed(void)
{
printf("Probed\n");
return 1;
}
void apply_seccomp_filter(char **syscalls, int num_syscalls)
{
scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);
for (int i = 0; i < num_syscalls; i++) {
seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
seccomp_syscall_resolve_name(syscalls[i]), 0);
}
seccomp_load(ctx);
seccomp_release(ctx);
}
int main(int argc, char *argv[])
{
int num_syscalls = sizeof(syscalls) / sizeof(syscalls[0]);
apply_seccomp_filter(syscalls, num_syscalls);
probed();
return 0;
}
EOF
cat > /tmp/trace.bt << EOF
uretprobe:/tmp/x:probed
{
printf("ret=%d\n", retval);
}
EOF
gcc -o /tmp/x /tmp/x.c -lseccomp
/usr/bin/bpftrace /tmp/trace.bt &
sleep 5 # wait for uretprobe attach
/tmp/x
pkill bpftrace
rm /tmp/x /tmp/x.c /tmp/trace.bt
---
kernel/seccomp.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 385d48293a5f..23b594a68bc0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -734,13 +734,13 @@ seccomp_prepare_user_filter(const char __user *user_filter)
#ifdef SECCOMP_ARCH_NATIVE
/**
- * seccomp_is_const_allow - check if filter is constant allow with given data
+ * seccomp_is_filter_const_allow - check if filter is constant allow with given data
* @fprog: The BPF programs
* @sd: The seccomp data to check against, only syscall number and arch
* number are considered constant.
*/
-static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
- struct seccomp_data *sd)
+static bool seccomp_is_filter_const_allow(struct sock_fprog_kern *fprog,
+ struct seccomp_data *sd)
{
unsigned int reg_value = 0;
unsigned int pc;
@@ -812,6 +812,21 @@ static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
return false;
}
+static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
+ struct seccomp_data *sd)
+{
+#ifdef __NR_uretprobe
+ if (sd->nr == __NR_uretprobe
+#ifdef SECCOMP_ARCH_COMPAT
+ && sd->arch != SECCOMP_ARCH_COMPAT
+#endif
+ )
+ return true;
+#endif
+
+ return seccomp_is_filter_const_allow(fprog, sd);
+}
+
static void seccomp_cache_prepare_bitmap(struct seccomp_filter *sfilter,
void *bitmap, const void *bitmap_prev,
size_t bitmap_size, int arch)
@@ -1023,6 +1038,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
*/
static const int mode1_syscalls[] = {
__NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
+#ifdef __NR_uretprobe
+ __NR_uretprobe,
+#endif
-1, /* negative terminated */
};
--
2.43.0
This is the start of the stable review cycle for the 6.6.75 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 13:34:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.75-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.75-rc1
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Ido Schimmel <idosch(a)nvidia.com>
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
ext4: fix access to uninitialised lock in fc replay path
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jkosina(a)suse.com>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Alexey Dobriyan <adobriyan(a)gmail.com>
block: fix integer overflow in BLKSECDISCARD
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Chuck Lever <chuck.lever(a)oracle.com>
shmem: Fix shmem_rename2()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Add simple_offset_rename() API
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Fix simple_offset_rename_exchange()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Add simple_offset_empty()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Define a minimum directory offset
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Re-arrange locking in offset_iterate_dir()
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Selvin Xavier <selvin.xavier(a)broadcom.com>
RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
Omid Ehtemam-Haghighi <omid.ehtemamhaghighi(a)menlosecurity.com>
ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Anastasia Belova <abelova(a)astralinux.ru>
cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
Igor Pylypiv <ipylypiv(a)google.com>
ata: libata-core: Set ATA_QCFLAG_RTF_FILLED in fill_result_tf()
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Russell Harmon <russ(a)har.mn>
hwmon: (drivetemp) Set scsi command timeout to 10s
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Rob Herring (Arm) <robh(a)kernel.org>
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
block/ioctl.c | 9 +-
drivers/ata/libahci.c | 12 +-
drivers/ata/libata-core.c | 8 +
drivers/cpufreq/amd-pstate.c | 7 +-
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hwmon/drivetemp.c | 2 +-
drivers/infiniband/hw/bnxt_re/main.c | 10 +
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/of/unittest-data/tests-platform.dtsi | 13 +
drivers/of/unittest.c | 14 ++
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 +
fs/ext4/super.c | 3 +-
fs/gfs2/file.c | 1 +
fs/libfs.c | 177 ++++++++++----
fs/smb/client/smb2inode.c | 104 +++++---
include/linux/fs.h | 2 +
include/linux/seccomp.h | 2 +-
mm/filemap.c | 19 ++
mm/shmem.c | 3 +-
net/ipv4/ip_tunnel.c | 2 +-
net/ipv6/ip6_fib.c | 8 +-
net/ipv6/route.c | 45 ++--
net/sched/sch_ets.c | 2 +
sound/soc/codecs/Kconfig | 1 +
sound/soc/samsung/Kconfig | 6 +-
sound/usb/quirks.c | 2 +
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/ipv6_route_update_soft_lockup.sh | 262 +++++++++++++++++++++
37 files changed, 640 insertions(+), 137 deletions(-)
This is the start of the stable review cycle for the 6.12.12 release.
There are 41 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 14:41:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.12-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.12-rc2
Jann Horn <jannh(a)google.com>
io_uring/rsrc: require cloned buffers to share accounting contexts
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Jason Gerecke <jason.gerecke(a)wacom.com>
HID: wacom: Initialize brightness of LED trigger
Hans de Goede <hdegoede(a)redhat.com>
wifi: rtl8xxxu: add more missing rtl8192cu USB IDs
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jikos(a)kernel.org>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: fix infinite directory reads for offset dir"
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: move allocations during CPU init outside the lock
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: properly synchronize freeing resources during CPU hotunplug
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Russell Harmon <russ(a)har.mn>
hwmon: (drivetemp) Set scsi command timeout to 10s
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed
Yage Geng <icoderdev(a)gmail.com>
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
Rob Herring (Arm) <robh(a)kernel.org>
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Alex Hung <alex.hung(a)amd.com>
drm/amd/display: Initialize denominator defaults to 1
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Maciej Strozek <mstrozek(a)opensource.cirrus.com>
ASoC: cs42l43: Add codec force suspend/resume ops
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Marian Postevca <posteuca(a)mutex.one>
ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
.../dml21/src/dml2_core/dml2_core_dcn4_calcs.c | 4 +-
drivers/gpu/drm/drm_connector.c | 3 +
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hid/wacom_sys.c | 24 +--
drivers/hwmon/drivetemp.c | 2 +-
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/net/wireless/realtek/rtl8xxxu/core.c | 20 +++
drivers/of/unittest-data/tests-platform.dtsi | 13 ++
drivers/of/unittest.c | 14 ++
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 ++
fs/gfs2/file.c | 1 +
fs/libfs.c | 162 ++++++++++-----------
fs/smb/client/smb2inode.c | 104 +++++++++----
include/linux/fs.h | 1 -
include/linux/seccomp.h | 2 +-
io_uring/rsrc.c | 7 +
mm/filemap.c | 19 +++
mm/shmem.c | 4 +-
mm/zswap.c | 90 ++++++++----
net/sched/sch_ets.c | 2 +
sound/pci/hda/patch_realtek.c | 4 +-
sound/soc/codecs/Kconfig | 1 +
sound/soc/codecs/cs42l43.c | 1 +
sound/soc/codecs/es8316.c | 10 +-
sound/soc/samsung/Kconfig | 6 +-
sound/usb/quirks.c | 2 +
35 files changed, 374 insertions(+), 184 deletions(-)
Hello,
While experimenting with bbr protocol, I manipulated the network conditions by maintaining a high RTT for about one second before abruptly reducing it. Some packets sent during the high RTT phase experienced long delays in reaching the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival triggered the receiver to generate duplicate acknowledgments (dup ACKs). Due to the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender initiated a fast retransmission for an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, despite the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver received this packet, it sent an ACK for the next packet in sequence. However, upon later receiving the fast-retransmitted packet, an issue arose in its logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver continued sending dup ACKs, ultimately forcing bbr into the retransmission timeout (RTO) phase.
I generated this issue in linux kernel version 5.15.0-117-generic with Ubuntu 20.04. I attempted to confirm whether the issue persists with the latest Linux kernel. However, I discovered that the behavior of bbr has changed in the most recent kernel version, where it now sends chunks of packets instead of sending them one by one over time. As a result, I was unable to reproduce the specific sequence of events that triggered the bug we identified. Consequently, I could not confirm whether the bug still exists in the latest kernel.
I believe that the issue (if still exists) will have to be resolved in the location net/ipv4/tcp_input.c or something like that. There are so many authors here that I do not know who to CC here. So, sending this email to you. Sorry if this is not the best way to report this issue.
Thanks
Shehab
________________________________________
From: Ahmed, Shehab Sarar
Sent: Saturday, February 1, 2025 1:01 PM
To: stable(a)vger.kernel.org
Cc: regressions(a)lists.linux.dev
Subject: TCP Fast Retransmission Issue
Hello,
While experimenting with bbr protocol, I manipulated the network conditions by maintaining a high RTT for about one second before abruptly reducing it. Some packets sent during the high RTT phase experienced long delays in reaching the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival triggered the receiver to generate duplicate acknowledgments (dup ACKs). Due to the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender initiated a fast retransmission for an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, despite the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver received this packet, it sent an ACK for the next packet in sequence. However, upon later receiving the fast-retransmitted packet, an issue arose in its logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver continued sending dup ACKs, ultimately forcing bbr into the retransmission timeout (RTO) phase.
I generated this issue in linux kernel version 5.15.0-117-generic with Ubuntu 20.04. I attempted to confirm whether the issue persists with the latest Linux kernel. However, I discovered that the behavior of bbr has changed in the most recent kernel version, where it now sends chunks of packets instead of sending them one by one over time. As a result, I was unable to reproduce the specific sequence of events that triggered the bug we identified. Consequently, I could not confirm whether the bug still exists in the latest kernel.
I believe that the issue (if still exists) will have to be resolved in the location net/ipv4/tcp_input.c or something like that. There are so many authors here that I do not know who to CC here. So, sending this email to you. Sorry if this is not the best way to report this issue.
Thanks
Shehab
Returning to focus on 6.1, here is the 6.1 set from the corresponding
6.6 set:
https://lore.kernel.org/all/20240208232054.15778-1-catherine.hoang@oracle.c…
Two patches are missing from the original set:
[01/21] MAINTAINERS: add Catherine as xfs maintainer for 6.6.y
6.6.y-only change
[16/21] xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
XFS_ONLINE_SCRUB_STATS didn't show up till 6.6
The auto group was run on 10 configs and no regressions were seen.
This has been ack'd on the xfs-stable mailing list.
Thanks,
Leah
Catherine Hoang (1):
xfs: allow read IO and FICLONE to run concurrently
Cheng Lin (1):
xfs: introduce protection for drop nlink
Christoph Hellwig (4):
xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
xfs: respect the stable writes flag on the RT device
Darrick J. Wong (8):
xfs: bump max fsgeom struct version
xfs: hoist freeing of rt data fork extent mappings
xfs: prevent rt growfs when quota is enabled
xfs: rt stubs should return negative errnos when rt disabled
xfs: fix units conversion error in xfs_bmap_del_extent_delay
xfs: make sure maxlen is still congruent with prod when rounding down
xfs: clean up dqblk extraction
xfs: dquot recovery does not validate the recovered dquot
Dave Chinner (1):
xfs: inode recovery does not validate the recovered inode
Leah Rumancik (1):
xfs: up(ic_sema) if flushing data device fails
Long Li (2):
xfs: factor out xfs_defer_pending_abort
xfs: abort intent items when recovery intents fail
Omar Sandoval (1):
xfs: fix internal error from AGFL exhaustion
fs/xfs/libxfs/xfs_alloc.c | 27 ++++++++++++--
fs/xfs/libxfs/xfs_bmap.c | 21 +++--------
fs/xfs/libxfs/xfs_defer.c | 28 +++++++++------
fs/xfs/libxfs/xfs_defer.h | 2 +-
fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
fs/xfs/libxfs/xfs_rtbitmap.c | 33 +++++++++++++++++
fs/xfs/libxfs/xfs_sb.h | 2 +-
fs/xfs/xfs_bmap_util.c | 24 +++++++------
fs/xfs/xfs_dquot.c | 5 +--
fs/xfs/xfs_dquot_item_recover.c | 21 +++++++++--
fs/xfs/xfs_file.c | 63 ++++++++++++++++++++++++++-------
fs/xfs/xfs_inode.c | 24 +++++++++++++
fs/xfs/xfs_inode.h | 17 +++++++++
fs/xfs/xfs_inode_item_recover.c | 14 +++++++-
fs/xfs/xfs_ioctl.c | 30 ++++++++++------
fs/xfs/xfs_iops.c | 7 ++++
fs/xfs/xfs_log.c | 23 ++++++------
fs/xfs/xfs_log_recover.c | 2 +-
fs/xfs/xfs_reflink.c | 5 +++
fs/xfs/xfs_rtalloc.c | 33 +++++++++++++----
fs/xfs/xfs_rtalloc.h | 27 ++++++++------
21 files changed, 310 insertions(+), 101 deletions(-)
--
2.48.1.362.g079036d154-goog
From: Tejun Heo <tj(a)kernel.org>
[ Upstream commit 86e6ca55b83c575ab0f2e105cf08f98e58d3d7af ]
blkcg_unpin_online() walks up the blkcg hierarchy putting the online pin. To
walk up, it uses blkcg_parent(blkcg) but it was calling that after
blkcg_destroy_blkgs(blkcg) which could free the blkcg, leading to the
following UAF:
==================================================================
BUG: KASAN: slab-use-after-free in blkcg_unpin_online+0x15a/0x270
Read of size 8 at addr ffff8881057678c0 by task kworker/9:1/117
CPU: 9 UID: 0 PID: 117 Comm: kworker/9:1 Not tainted 6.13.0-rc1-work-00182-gb8f52214c61a-dirty #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 02/02/2022
Workqueue: cgwb_release cgwb_release_workfn
Call Trace:
<TASK>
dump_stack_lvl+0x27/0x80
print_report+0x151/0x710
kasan_report+0xc0/0x100
blkcg_unpin_online+0x15a/0x270
cgwb_release_workfn+0x194/0x480
process_scheduled_works+0x71b/0xe20
worker_thread+0x82a/0xbd0
kthread+0x242/0x2c0
ret_from_fork+0x33/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
...
Freed by task 1944:
kasan_save_track+0x2b/0x70
kasan_save_free_info+0x3c/0x50
__kasan_slab_free+0x33/0x50
kfree+0x10c/0x330
css_free_rwork_fn+0xe6/0xb30
process_scheduled_works+0x71b/0xe20
worker_thread+0x82a/0xbd0
kthread+0x242/0x2c0
ret_from_fork+0x33/0x70
ret_from_fork_asm+0x1a/0x30
Note that the UAF is not easy to trigger as the free path is indirected
behind a couple RCU grace periods and a work item execution. I could only
trigger it with artifical msleep() injected in blkcg_unpin_online().
Fix it by reading the parent pointer before destroying the blkcg's blkg's.
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Abagail ren <renzezhongucas(a)gmail.com>
Suggested-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Fixes: 4308a434e5e0 ("blkcg: don't offline parent blkcg first")
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Andrea Ciprietti <ciprietti(a)google.com>
---
include/linux/blk-cgroup.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0e6e84db06f6..b89099360a86 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -428,10 +428,14 @@ static inline void blkcg_pin_online(struct blkcg *blkcg)
static inline void blkcg_unpin_online(struct blkcg *blkcg)
{
do {
+ struct blkcg *parent;
+
if (!refcount_dec_and_test(&blkcg->online_pin))
break;
+
+ parent = blkcg_parent(blkcg);
blkcg_destroy_blkgs(blkcg);
- blkcg = blkcg_parent(blkcg);
+ blkcg = parent;
} while (blkcg);
}
--
2.48.1.262.g85cc9f2d1e-goog
From: Ard Biesheuvel <ardb(a)kernel.org>
UEFI 2.11 introduced EFI_MEMORY_HOT_PLUGGABLE to annotate system memory
regions that are 'cold plugged' at boot, i.e., hot pluggable memory that
is available from early boot, and described as system RAM by the
firmware.
Existing loaders and EFI applications running in the boot context will
happily use this memory for allocating data structures that cannot be
freed or moved at runtime, and this prevents the memory from being
unplugged. Going forward, the new EFI_MEMORY_HOT_PLUGGABLE attribute
should be tested, and memory annotated as such should be avoided for
such allocations.
In the EFI stub, there are a couple of occurrences where, instead of the
high-level AllocatePages() UEFI boot service, a low-level code sequence
is used that traverses the EFI memory map and carves out the requested
number of pages from a free region. This is needed, e.g., for allocating
as low as possible, or for allocating pages at random.
While AllocatePages() should presumably avoid special purpose memory and
cold plugged regions, this manual approach needs to incorporate this
logic itself, in order to prevent the kernel itself from ending up in a
hot unpluggable region, preventing it from being unplugged.
So add the EFI_MEMORY_HOTPLUGGABLE macro definition, and check for it
where appropriate.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/efi.c | 6 ++++--
drivers/firmware/efi/libstub/randomalloc.c | 3 +++
drivers/firmware/efi/libstub/relocate.c | 3 +++
include/linux/efi.h | 1 +
4 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 8296bf985d1d..7309394b8fc9 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -934,13 +934,15 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
EFI_MEMORY_NV | EFI_MEMORY_SP | EFI_MEMORY_CPU_CRYPTO |
- EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
+ EFI_MEMORY_MORE_RELIABLE | EFI_MEMORY_HOT_PLUGGABLE |
+ EFI_MEMORY_RUNTIME))
snprintf(pos, size, "|attr=0x%016llx]",
(unsigned long long)attr);
else
snprintf(pos, size,
- "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+ "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
+ attr & EFI_MEMORY_HOT_PLUGGABLE ? "HP" : "",
attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
attr & EFI_MEMORY_CPU_CRYPTO ? "CC" : "",
attr & EFI_MEMORY_SP ? "SP" : "",
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index e5872e38d9a4..5a732018be36 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -25,6 +25,9 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
if (md->type != EFI_CONVENTIONAL_MEMORY)
return 0;
+ if (md->attribute & EFI_MEMORY_HOT_PLUGGABLE)
+ return 0;
+
if (efi_soft_reserve_enabled() &&
(md->attribute & EFI_MEMORY_SP))
return 0;
diff --git a/drivers/firmware/efi/libstub/relocate.c b/drivers/firmware/efi/libstub/relocate.c
index 99b45d1cd624..d4264bfb6dc1 100644
--- a/drivers/firmware/efi/libstub/relocate.c
+++ b/drivers/firmware/efi/libstub/relocate.c
@@ -53,6 +53,9 @@ efi_status_t efi_low_alloc_above(unsigned long size, unsigned long align,
if (desc->type != EFI_CONVENTIONAL_MEMORY)
continue;
+ if (desc->attribute & EFI_MEMORY_HOT_PLUGGABLE)
+ continue;
+
if (efi_soft_reserve_enabled() &&
(desc->attribute & EFI_MEMORY_SP))
continue;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 053c57e61869..db293d7de686 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -128,6 +128,7 @@ typedef struct {
#define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */
#define EFI_MEMORY_SP ((u64)0x0000000000040000ULL) /* soft reserved */
#define EFI_MEMORY_CPU_CRYPTO ((u64)0x0000000000080000ULL) /* supports encryption */
+#define EFI_MEMORY_HOT_PLUGGABLE BIT_ULL(20) /* supports unplugging at runtime */
#define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */
#define EFI_MEMORY_DESCRIPTOR_VERSION 1
--
2.48.1.362.g079036d154-goog
I'm announcing the release of the 6.13.1 kernel.
All users of the 6.13 kernel series must upgrade.
The updated 6.13.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.13.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/gpu/drm/v3d/v3d_irq.c | 16 ++
drivers/hid/hid-ids.h | 1
drivers/hid/hid-multitouch.c | 8 -
drivers/hid/wacom_sys.c | 24 ++--
drivers/input/joystick/xpad.c | 9 +
drivers/input/keyboard/atkbd.c | 2
drivers/net/wireless/realtek/rtl8xxxu/core.c | 20 +++
drivers/scsi/storvsc_drv.c | 8 +
drivers/usb/gadget/function/u_serial.c | 8 -
drivers/usb/serial/quatech2.c | 2
drivers/vfio/platform/vfio_platform_common.c | 10 +
fs/gfs2/file.c | 1
fs/libfs.c | 162 ++++++++++++---------------
fs/smb/client/smb2inode.c | 92 +++++++++++----
include/linux/fs.h | 1
io_uring/rsrc.c | 7 +
mm/filemap.c | 17 ++
mm/shmem.c | 4
net/sched/sch_ets.c | 2
sound/usb/quirks.c | 2
tools/power/cpupower/Makefile | 8 +
22 files changed, 264 insertions(+), 142 deletions(-)
Alex Williamson (1):
vfio/platform: check the bounds of read/write syscalls
Andreas Gruenbacher (1):
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Chuck Lever (5):
libfs: Return ENOSPC when the directory offset range is exhausted
Revert "libfs: Add simple_offset_empty()"
Revert "libfs: fix infinite directory reads for offset dir"
libfs: Replace simple_offset end-of-directory detection
libfs: Use d_children list to iterate simple_offset directories
Easwar Hariharan (1):
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Greg Kroah-Hartman (2):
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Linux 6.13.1
Hans de Goede (1):
wifi: rtl8xxxu: add more missing rtl8192cu USB IDs
Jack Greiner (1):
Input: xpad - add support for wooting two he (arm)
Jamal Hadi Salim (1):
net: sched: fix ets qdisc OOB Indexing
Jann Horn (1):
io_uring/rsrc: require cloned buffers to share accounting contexts
Jason Gerecke (1):
HID: wacom: Initialize brightness of LED trigger
Jiri Kosina (1):
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Leonardo Brondani Schenkel (1):
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Lianqin Hu (1):
ALSA: usb-audio: Add delay quirk for USB Audio Device
Linus Torvalds (1):
cachestat: fix page cache statistics permission checking
Mark Pearson (1):
Input: atkbd - map F23 key to support default copilot shortcut
Matheos Mattsson (1):
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Maíra Canal (1):
drm/v3d: Assign job pointer to NULL before signaling the fence
Nicolas Nobelis (1):
Input: xpad - add support for Nacon Pro Compact
Nilton Perim Neto (1):
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Paulo Alcantara (1):
smb: client: handle lack of EA support in smb2_query_path_info()
Peng Fan (1):
pm: cpupower: Makefile: Fix cross compilation
Pierre-Loup A. Griffais (1):
Input: xpad - add QH Electronics VID/PID
Qasim Ijaz (1):
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
This is the start of the stable review cycle for the 6.1.128 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 14:01:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.128-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.128-rc1
Marek Szyprowski <m.szyprowski(a)samsung.com>
ASoC: samsung: midas_wm1811: Fix 'Headphone Switch' control creation
Paulo Alcantara <pc(a)manguebit.com>
smb: client: fix NULL ptr deref in crypto_aead_setkey()
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Enzo Matsumiya <ematsumiya(a)suse.de>
smb: client: fix UAF in async decryption
Anjaneyulu <pagadala.yesu.anjaneyulu(a)intel.com>
wifi: iwlwifi: add a few rate index validity checks
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Ido Schimmel <idosch(a)nvidia.com>
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
ext4: fix access to uninitialised lock in fc replay path
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Jiri Kosina <jkosina(a)suse.com>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Alexey Dobriyan <adobriyan(a)gmail.com>
block: fix integer overflow in BLKSECDISCARD
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring: fix waiters missing wake ups
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Christoph Hellwig <hch(a)lst.de>
xfs: respect the stable writes flag on the RT device
Christoph Hellwig <hch(a)lst.de>
xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
Darrick J. Wong <djwong(a)kernel.org>
xfs: dquot recovery does not validate the recovered dquot
Darrick J. Wong <djwong(a)kernel.org>
xfs: clean up dqblk extraction
Dave Chinner <dchinner(a)redhat.com>
xfs: inode recovery does not validate the recovered inode
Omar Sandoval <osandov(a)fb.com>
xfs: fix internal error from AGFL exhaustion
Leah Rumancik <leah.rumancik(a)gmail.com>
xfs: up(ic_sema) if flushing data device fails
Christoph Hellwig <hch(a)lst.de>
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
Long Li <leo.lilong(a)huawei.com>
xfs: abort intent items when recovery intents fail
Long Li <leo.lilong(a)huawei.com>
xfs: factor out xfs_defer_pending_abort
Catherine Hoang <catherine.hoang(a)oracle.com>
xfs: allow read IO and FICLONE to run concurrently
Christoph Hellwig <hch(a)lst.de>
xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
Cheng Lin <cheng.lin130(a)zte.com.cn>
xfs: introduce protection for drop nlink
Darrick J. Wong <djwong(a)kernel.org>
xfs: make sure maxlen is still congruent with prod when rounding down
Darrick J. Wong <djwong(a)kernel.org>
xfs: fix units conversion error in xfs_bmap_del_extent_delay
Darrick J. Wong <djwong(a)kernel.org>
xfs: rt stubs should return negative errnos when rt disabled
Darrick J. Wong <djwong(a)kernel.org>
xfs: prevent rt growfs when quota is enabled
Darrick J. Wong <djwong(a)kernel.org>
xfs: hoist freeing of rt data fork extent mappings
Darrick J. Wong <djwong(a)kernel.org>
xfs: bump max fsgeom struct version
K Prateek Nayak <kprateek.nayak(a)amd.com>
softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
Omid Ehtemam-Haghighi <omid.ehtemamhaghighi(a)menlosecurity.com>
ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Cosmin Tanislav <demonsingur(a)gmail.com>
regmap: detach regmap from dev on regmap_exit
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Alper Nebi Yasak <alpernebiyasak(a)gmail.com>
ASoC: samsung: midas_wm1811: Map missing jack kcontrols
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
block/ioctl.c | 9 +-
drivers/base/regmap/regmap.c | 12 +
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/input/joystick/xpad.c | 2 +
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/net/wireless/intel/iwlwifi/dvm/rs.c | 9 +-
drivers/net/wireless/intel/iwlwifi/mvm/rs.c | 9 +-
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 +
fs/ext4/super.c | 3 +-
fs/gfs2/file.c | 1 +
fs/smb/client/smb2ops.c | 47 ++--
fs/smb/client/smb2pdu.c | 10 +-
fs/xfs/libxfs/xfs_alloc.c | 27 ++-
fs/xfs/libxfs/xfs_bmap.c | 21 +-
fs/xfs/libxfs/xfs_defer.c | 38 +--
fs/xfs/libxfs/xfs_defer.h | 2 +-
fs/xfs/libxfs/xfs_inode_buf.c | 3 +
fs/xfs/libxfs/xfs_rtbitmap.c | 33 +++
fs/xfs/libxfs/xfs_sb.h | 2 +-
fs/xfs/xfs_bmap_util.c | 24 +-
fs/xfs/xfs_dquot.c | 5 +-
fs/xfs/xfs_dquot_item_recover.c | 21 +-
fs/xfs/xfs_file.c | 63 ++++-
fs/xfs/xfs_inode.c | 24 ++
fs/xfs/xfs_inode.h | 17 ++
fs/xfs/xfs_inode_item_recover.c | 14 +-
fs/xfs/xfs_ioctl.c | 34 ++-
fs/xfs/xfs_iops.c | 7 +
fs/xfs/xfs_log.c | 23 +-
fs/xfs/xfs_log_recover.c | 2 +-
fs/xfs/xfs_reflink.c | 5 +
fs/xfs/xfs_rtalloc.c | 33 ++-
fs/xfs/xfs_rtalloc.h | 27 ++-
include/linux/seccomp.h | 2 +-
io_uring/io_uring.c | 4 +-
kernel/softirq.c | 15 +-
net/ipv4/ip_tunnel.c | 2 +-
net/ipv6/ip6_fib.c | 8 +-
net/ipv6/route.c | 45 ++--
net/sched/sch_ets.c | 2 +
sound/soc/codecs/Kconfig | 1 +
sound/soc/samsung/Kconfig | 6 +-
sound/soc/samsung/midas_wm1811.c | 24 +-
sound/usb/quirks.c | 2 +
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/ipv6_route_update_soft_lockup.sh | 262 +++++++++++++++++++++
54 files changed, 763 insertions(+), 191 deletions(-)
The quilt patch titled
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
Date: Sat, 11 Jan 2025 16:36:55 +0530
gather_bootmem_prealloc() assumes the start nid as 0 and size as
num_node_state(N_MEMORY). That means in case if memory attached numa
nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail
to scan few of these nodes.
Since memory attached numa nodes can be interleaved in any fashion, hence
ensure that the current code checks for all numa node ids
(.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that
it can distributes all nr_node_ids among the these many no. threads.
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.17365925…
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reported-by: Pavithra Prakash <pavrampu(a)linux.ibm.com>
Suggested-by: Muchun Song <muchun.song(a)linux.dev>
Tested-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap(a)redhat.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: Gang Li <gang.li(a)linux.dev>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes
+++ a/mm/hugetlb.c
@@ -3309,7 +3309,7 @@ static void __init gather_bootmem_preall
.thread_fn = gather_bootmem_prealloc_parallel,
.fn_arg = NULL,
.start = 0,
- .size = num_node_state(N_MEMORY),
+ .size = nr_node_ids,
.align = 1,
.min_chunk = 1,
.max_threads = num_node_state(N_MEMORY),
_
Patches currently in -mm which might be from ritesh.list(a)gmail.com are
The quilt patch titled
Subject: mm: gup: fix infinite loop within __get_longterm_locked
has been removed from the -mm tree. Its filename was
mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Subject: mm: gup: fix infinite loop within __get_longterm_locked
Date: Tue, 21 Jan 2025 10:01:59 +0800
We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU. This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.
Fix it by simply taking a look at the list in the single caller, to see if
anything was added.
[zhaoyang.huang(a)unisoc.com: move definition of local]
Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aijun Sun <aijun.sun(a)unisoc.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
--- a/mm/gup.c~mm-gup-fix-infinite-loop-within-__get_longterm_locked
+++ a/mm/gup.c
@@ -2320,13 +2320,13 @@ static void pofs_unpin(struct pages_or_f
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
-static unsigned long collect_longterm_unpinnable_folios(
+static void collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
- unsigned long i, collected = 0;
struct folio *prev_folio = NULL;
bool drain_allow = true;
+ unsigned long i;
for (i = 0; i < pofs->nr_entries; i++) {
struct folio *folio = pofs_get_folio(pofs, i);
@@ -2338,8 +2338,6 @@ static unsigned long collect_longterm_un
if (folio_is_longterm_pinnable(folio))
continue;
- collected++;
-
if (folio_is_device_coherent(folio))
continue;
@@ -2361,8 +2359,6 @@ static unsigned long collect_longterm_un
NR_ISOLATED_ANON + folio_is_file_lru(folio),
folio_nr_pages(folio));
}
-
- return collected;
}
/*
@@ -2439,11 +2435,9 @@ static long
check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
{
LIST_HEAD(movable_folio_list);
- unsigned long collected;
- collected = collect_longterm_unpinnable_folios(&movable_folio_list,
- pofs);
- if (!collected)
+ collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+ if (list_empty(&movable_folio_list))
return 0;
return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
_
Patches currently in -mm which might be from zhaoyang.huang(a)unisoc.com are
The quilt patch titled
Subject: kfence: skip __GFP_THISNODE allocations on NUMA systems
has been removed from the -mm tree. Its filename was
kfence-skip-__gfp_thisnode-allocations-on-numa-systems.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Marco Elver <elver(a)google.com>
Subject: kfence: skip __GFP_THISNODE allocations on NUMA systems
Date: Fri, 24 Jan 2025 13:01:38 +0100
On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.
Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.
Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver(a)google.com>
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Chistoph Lameter <cl(a)linux.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kfence/core.c~kfence-skip-__gfp_thisnode-allocations-on-numa-systems
+++ a/mm/kfence/core.c
@@ -21,6 +21,7 @@
#include <linux/log2.h>
#include <linux/memblock.h>
#include <linux/moduleparam.h>
+#include <linux/nodemask.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/random.h>
@@ -1084,6 +1085,7 @@ void *__kfence_alloc(struct kmem_cache *
* properties (e.g. reside in DMAable memory).
*/
if ((flags & GFP_ZONEMASK) ||
+ ((flags & __GFP_THISNODE) && num_online_nodes() > 1) ||
(s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) {
atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]);
return NULL;
_
Patches currently in -mm which might be from elver(a)google.com are
The quilt patch titled
Subject: nilfs2: fix possible int overflows in nilfs_fiemap()
has been removed from the -mm tree. Its filename was
nilfs2-fix-possible-int-overflows-in-nilfs_fiemap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Subject: nilfs2: fix possible int overflows in nilfs_fiemap()
Date: Sat, 25 Jan 2025 07:20:53 +0900
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/inode.c~nilfs2-fix-possible-int-overflows-in-nilfs_fiemap
+++ a/fs/nilfs2/inode.c
@@ -1186,7 +1186,7 @@ int nilfs_fiemap(struct inode *inode, st
if (size) {
if (phys && blkphy << blkbits == phys + size) {
/* The current extent goes on */
- size += n << blkbits;
+ size += (u64)n << blkbits;
} else {
/* Terminate the current extent */
ret = fiemap_fill_next_extent(
@@ -1199,14 +1199,14 @@ int nilfs_fiemap(struct inode *inode, st
flags = FIEMAP_EXTENT_MERGED;
logical = blkoff << blkbits;
phys = blkphy << blkbits;
- size = n << blkbits;
+ size = (u64)n << blkbits;
}
} else {
/* Start a new extent */
flags = FIEMAP_EXTENT_MERGED;
logical = blkoff << blkbits;
phys = blkphy << blkbits;
- size = n << blkbits;
+ size = (u64)n << blkbits;
}
blkoff += n;
}
_
Patches currently in -mm which might be from n.zhandarovich(a)fintech.ru are
The quilt patch titled
Subject: mm: kmemleak: fix upper boundary check for physical address objects
has been removed from the -mm tree. Its filename was
mm-kmemleak-fix-upper-boundary-check-for-physical-address-objects.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Catalin Marinas <catalin.marinas(a)arm.com>
Subject: mm: kmemleak: fix upper boundary check for physical address objects
Date: Mon, 27 Jan 2025 18:42:33 +0000
Memblock allocations are registered by kmemleak separately, based on their
physical address. During the scanning stage, it checks whether an object
is within the min_low_pfn and max_low_pfn boundaries and ignores it
otherwise.
With the recent addition of __percpu pointer leak detection (commit
6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak
started reporting leaks in setup_zone_pageset() and
setup_per_cpu_pageset(). These were caused by the node_data[0] object
(initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn)
boundary. The non-strict upper boundary check introduced by commit
84c326299191 ("mm: kmemleak: check physical address when scan") causes the
pg_data_t object to be ignored (not scanned) and the __percpu pointers it
contains to be reported as leaks.
Make the max_low_pfn upper boundary check strict when deciding whether to
ignore a physical address object and not scan it.
Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com
Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan")
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Cc: Patrick Wang <patrick.wang.shcn(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [6.0.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-fix-upper-boundary-check-for-physical-address-objects
+++ a/mm/kmemleak.c
@@ -1689,7 +1689,7 @@ static void kmemleak_scan(void)
unsigned long phys = object->pointer;
if (PHYS_PFN(phys) < min_low_pfn ||
- PHYS_PFN(phys + object->size) >= max_low_pfn)
+ PHYS_PFN(phys + object->size) > max_low_pfn)
__paint_it(object, KMEMLEAK_BLACK);
}
_
Patches currently in -mm which might be from catalin.marinas(a)arm.com are
The quilt patch titled
Subject: scripts/gdb: fix aarch64 userspace detection in get_current_task
has been removed from the -mm tree. Its filename was
scripts-gdb-fix-aarch64-userspace-detection-in-get_current_task.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kiszka <jan.kiszka(a)siemens.com>
Subject: scripts/gdb: fix aarch64 userspace detection in get_current_task
Date: Fri, 10 Jan 2025 11:36:33 +0100
At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long
which lets the right-shift always return 0.
Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/cpus.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/gdb/linux/cpus.py~scripts-gdb-fix-aarch64-userspace-detection-in-get_current_task
+++ a/scripts/gdb/linux/cpus.py
@@ -167,7 +167,7 @@ def get_current_task(cpu):
var_ptr = gdb.parse_and_eval("&pcpu_hot.current_task")
return per_cpu(var_ptr, cpu).dereference()
elif utils.is_target_arch("aarch64"):
- current_task_addr = gdb.parse_and_eval("$SP_EL0")
+ current_task_addr = gdb.parse_and_eval("(unsigned long)$SP_EL0")
if (current_task_addr >> 63) != 0:
current_task = current_task_addr.cast(task_ptr_type)
return current_task.dereference()
_
Patches currently in -mm which might be from jan.kiszka(a)siemens.com are
The quilt patch titled
Subject: mm/vmscan: accumulate nr_demoted for accurate demotion statistics
has been removed from the -mm tree. Its filename was
mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Li Zhijian <lizhijian(a)fujitsu.com>
Subject: mm/vmscan: accumulate nr_demoted for accurate demotion statistics
Date: Fri, 10 Jan 2025 20:21:32 +0800
In shrink_folio_list(), demote_folio_list() can be called 2 times.
Currently stat->nr_demoted will only store the last nr_demoted( the later
nr_demoted is always zero, the former nr_demoted will get lost), as a
result number of demoted pages is not accurate.
Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.
[lizhijian(a)fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting]
Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Acked-by: Kaiyang Zhao <kaiyang2(a)cs.cmu.edu>
Tested-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Donet Tom <donettom(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-accumulate-nr_demoted-for-accurate-demotion-statistics
+++ a/mm/vmscan.c
@@ -1086,7 +1086,7 @@ static unsigned int shrink_folio_list(st
struct folio_batch free_folios;
LIST_HEAD(ret_folios);
LIST_HEAD(demote_folios);
- unsigned int nr_reclaimed = 0;
+ unsigned int nr_reclaimed = 0, nr_demoted = 0;
unsigned int pgactivate = 0;
bool do_demote_pass;
struct swap_iocb *plug = NULL;
@@ -1550,8 +1550,9 @@ keep:
/* 'folio_list' is always empty here */
/* Migrate folios selected for demotion */
- stat->nr_demoted = demote_folio_list(&demote_folios, pgdat);
- nr_reclaimed += stat->nr_demoted;
+ nr_demoted = demote_folio_list(&demote_folios, pgdat);
+ nr_reclaimed += nr_demoted;
+ stat->nr_demoted += nr_demoted;
/* Folios that could not be demoted are still in @demote_folios */
if (!list_empty(&demote_folios)) {
/* Folios which weren't demoted go back on @folio_list */
_
Patches currently in -mm which might be from lizhijian(a)fujitsu.com are
The quilt patch titled
Subject: ocfs2: fix incorrect CPU endianness conversion causing mount failure
has been removed from the -mm tree. Its filename was
ocfs2-fix-incorrect-cpu-endianness-conversion-causing-mount-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Heming Zhao <heming.zhao(a)suse.com>
Subject: ocfs2: fix incorrect CPU endianness conversion causing mount failure
Date: Tue, 21 Jan 2025 19:22:03 +0800
Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression bug. The blksz_bits value is already converted to
CPU endian in the previous code; therefore, the code shouldn't use
le32_to_cpu() anymore.
Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com
Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
Signed-off-by: Heming Zhao <heming.zhao(a)suse.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ocfs2/super.c~ocfs2-fix-incorrect-cpu-endianness-conversion-causing-mount-failure
+++ a/fs/ocfs2/super.c
@@ -2285,7 +2285,7 @@ static int ocfs2_verify_volume(struct oc
mlog(ML_ERROR, "found superblock with incorrect block "
"size bits: found %u, should be 9, 10, 11, or 12\n",
blksz_bits);
- } else if ((1 << le32_to_cpu(blksz_bits)) != blksz) {
+ } else if ((1 << blksz_bits) != blksz) {
mlog(ML_ERROR, "found superblock with incorrect block "
"size: found %u, should be %u\n", 1 << blksz_bits, blksz);
} else if (le16_to_cpu(di->id2.i_super.s_major_rev_level) !=
_
Patches currently in -mm which might be from heming.zhao(a)suse.com are
This is the start of the stable review cycle for the 5.15.178 release.
There are 24 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 14:01:15 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.178-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.178-rc1
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Anjaneyulu <pagadala.yesu.anjaneyulu(a)intel.com>
wifi: iwlwifi: add a few rate index validity checks
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Ido Schimmel <idosch(a)nvidia.com>
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
Akihiko Odaki <akihiko.odaki(a)gmail.com>
platform/chrome: cros_ec_typec: Check for EC driver
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Additional check in ntfs_file_release
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: RFCOMM: Fix not validating setsockopt user input
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: SCO: Fix not validating setsockopt user input
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Paolo Abeni <pabeni(a)redhat.com>
mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
Cosmin Tanislav <demonsingur(a)gmail.com>
regmap: detach regmap from dev on regmap_exit
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 ++--
drivers/base/regmap/regmap.c | 12 ++++++++++++
drivers/input/joystick/xpad.c | 2 ++
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 ++-
drivers/net/wireless/intel/iwlwifi/dvm/rs.c | 7 +++++--
drivers/net/wireless/intel/iwlwifi/mvm/rs.c | 9 ++++++---
drivers/platform/chrome/cros_ec_typec.c | 3 +++
drivers/scsi/scsi_transport_iscsi.c | 4 +++-
drivers/scsi/storvsc_drv.c | 8 +++++++-
drivers/usb/gadget/function/u_serial.c | 8 ++++----
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 ++++++++++
fs/gfs2/file.c | 1 +
fs/ntfs3/file.c | 12 ++++++++++--
include/linux/seccomp.h | 2 +-
include/net/bluetooth/bluetooth.h | 9 +++++++++
net/bluetooth/rfcomm/sock.c | 14 +++++---------
net/bluetooth/sco.c | 19 ++++++++-----------
net/ipv4/ip_tunnel.c | 2 +-
net/mptcp/protocol.c | 18 +++++++++---------
net/sched/sch_ets.c | 2 ++
sound/soc/codecs/Kconfig | 1 +
sound/soc/samsung/Kconfig | 6 ++++--
sound/usb/quirks.c | 2 ++
25 files changed, 111 insertions(+), 51 deletions(-)
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
skips charging any zswap entries when it failed to zswap the entire
folio.
However, when some base pages are zswapped but it failed to zswap
the entire folio, the zswap operation is rolled back.
When freeing zswap entries for those pages, zswap_entry_free() uncharges
the zswap entries that were not previously charged, causing zswap charging
to become inconsistent.
This inconsistency triggers two warnings with following steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
The two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
zswap_stored_pages also becomes inconsistent in the same way.
As suggested by Kanchana, increment zswap_stored_pages and charge zswap
entries within zswap_store_page() when it succeeds. This way,
zswap_entry_free() will decrement the counter and uncharge the entries
when it failed to zswap the entire folio.
While this could potentially be optimized by batching objcg charging
and incrementing the counter, let's focus on fixing the bug this time
and leave the optimization for later after some evaluation.
After resolving the inconsistency, the warnings disappear.
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Cc: stable(a)vger.kernel.org
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
---
v2 -> v3:
- Adjusted Kanchana's feedback:
- Fixed inconsistency in zswap_stored_pages
- Now objcg charging and incrementing zswap_store_pages is done
within zswap_stored_pages, one by one
mm/zswap.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 6504174fbc6a..f0bd962bffd5 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *page,
entry->pool = pool;
entry->swpentry = page_swpentry;
entry->objcg = objcg;
+ if (objcg)
+ obj_cgroup_charge_zswap(objcg, entry->length);
entry->referenced = true;
if (entry->length) {
INIT_LIST_HEAD(&entry->lru);
zswap_lru_add(&zswap_list_lru, entry);
}
+ atomic_long_inc(&zswap_stored_pages);
return entry->length;
@@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio)
struct obj_cgroup *objcg = NULL;
struct mem_cgroup *memcg = NULL;
struct zswap_pool *pool;
- size_t compressed_bytes = 0;
bool ret = false;
long index;
@@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio)
bytes = zswap_store_page(page, objcg, pool);
if (bytes < 0)
goto put_pool;
- compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
- atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
--
2.47.1
The patch titled
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Pavel Begunkov <asml.silence(a)gmail.com>
Subject: lib/iov_iter: fix import_iovec_ubuf iovec management
Date: Fri, 31 Jan 2025 14:13:15 +0000
import_iovec() says that it should always be fine to kfree the iovec
returned in @iovp regardless of the error code. __import_iovec_ubuf()
never reallocates it and thus should clear the pointer even in cases when
copy_iovec_*() fail.
Link: https://lkml.kernel.org/r/378ae26923ffc20fd5e41b4360d673bf47b1775b.17383324…
Fixes: 3b2deb0e46da9 ("iov_iter: import single vector iovecs as ITER_UBUF")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Reviewed-by: Jens Axboe <axboe(a)kernel.dk>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/iov_iter.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/iov_iter.c~lib-iov_iter-fix-import_iovec_ubuf-iovec-management
+++ a/lib/iov_iter.c
@@ -1428,6 +1428,8 @@ static ssize_t __import_iovec_ubuf(int t
struct iovec *iov = *iovp;
ssize_t ret;
+ *iovp = NULL;
+
if (compat)
ret = copy_compat_iovec_from_user(iov, uvec, 1);
else
@@ -1438,7 +1440,6 @@ static ssize_t __import_iovec_ubuf(int t
ret = import_ubuf(type, iov->iov_base, iov->iov_len, i);
if (unlikely(ret))
return ret;
- *iovp = NULL;
return i->count;
}
_
Patches currently in -mm which might be from asml.silence(a)gmail.com are
lib-iov_iter-fix-import_iovec_ubuf-iovec-management.patch
pm_runtime_resume_and_get() sets dev->power.runtime_error that causes
all subsequent pm_runtime_get_sync() calls to fail.
Clear the runtime_error using pm_runtime_set_suspended(), so the driver
doesn't have to be reloaded to recover when the NPU fails to boot during
runtime resume.
Fixes: 7d4b4c74432d ("accel/ivpu: Remove suspend_reschedule_counter")
Cc: <stable(a)vger.kernel.org> # v6.11+
Reviewed-by: Maciej Falkowski <maciej.falkowski(a)linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
drivers/accel/ivpu/ivpu_pm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index 949f4233946c6..c3774d2221326 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -309,7 +309,10 @@ int ivpu_rpm_get(struct ivpu_device *vdev)
int ret;
ret = pm_runtime_resume_and_get(vdev->drm.dev);
- drm_WARN_ON(&vdev->drm, ret < 0);
+ if (ret < 0) {
+ ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+ pm_runtime_set_suspended(vdev->drm.dev);
+ }
return ret;
}
--
2.45.1
While converting users of msecs_to_jiffies(), lkp reported that some
range checks would always be true because of the mismatch between the
implied int value of secs_to_jiffies() vs the unsigned long
return value of the msecs_to_jiffies() calls it was replacing. Fix this
by casting secs_to_jiffies() values as unsigned long.
Fixes: b35108a51cf7ba ("jiffies: Define secs_to_jiffies()")
CC: stable(a)vger.kernel.org # 6.13+
CC: Andrew Morton <akpm(a)linux-foundation.org>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501301334.NB6NszQR-lkp@intel.com/
Signed-off-by: Easwar Hariharan <eahariha(a)linux.microsoft.com>
---
include/linux/jiffies.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index ed945f42e064..0ea8c9887429 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -537,7 +537,7 @@ static __always_inline unsigned long msecs_to_jiffies(const unsigned int m)
*
* Return: jiffies value
*/
-#define secs_to_jiffies(_secs) ((_secs) * HZ)
+#define secs_to_jiffies(_secs) (unsigned long)((_secs) * HZ)
extern unsigned long __usecs_to_jiffies(const unsigned int u);
#if !(USEC_PER_SEC % HZ)
--
2.43.0
There are a few places where the idreg overrides are read w/ the MMU
off, for example the VHE and hVHE checks in __finalise_el2. And while
the infrastructure gets this _mostly_ right (i.e. does the appropriate
cache maintenance), the placement of the data itself is problematic and
could share a cache line with something else.
Depending on how unforgiving an implementation's handling of mismatched
attributes is, this could lead to data corruption. In one observed case,
the system_cpucaps shared a line with arm64_sw_feature_override and the
cpucaps got nuked after entering the hyp stub...
Even though only a few overrides are read without the MMU on, just throw
the whole lot into the mmuoff section and be done with it.
Cc: stable(a)vger.kernel.org # v5.15+
Tested-by: Moritz Fischer <moritzf(a)google.com>
Tested-by: Pedro Martelletto <martelletto(a)google.com>
Reported-by: Jon Masters <jonmasters(a)google.com>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
arch/arm64/kernel/cpufeature.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d41128e37701..92506d9f90db 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -755,17 +755,20 @@ static const struct arm64_ftr_bits ftr_raz[] = {
#define ARM64_FTR_REG(id, table) \
__ARM64_FTR_REG_OVERRIDE(#id, id, table, &no_override)
-struct arm64_ftr_override id_aa64mmfr0_override;
-struct arm64_ftr_override id_aa64mmfr1_override;
-struct arm64_ftr_override id_aa64mmfr2_override;
-struct arm64_ftr_override id_aa64pfr0_override;
-struct arm64_ftr_override id_aa64pfr1_override;
-struct arm64_ftr_override id_aa64zfr0_override;
-struct arm64_ftr_override id_aa64smfr0_override;
-struct arm64_ftr_override id_aa64isar1_override;
-struct arm64_ftr_override id_aa64isar2_override;
-
-struct arm64_ftr_override arm64_sw_feature_override;
+#define DEFINE_FTR_OVERRIDE(name) \
+ struct arm64_ftr_override __section(".mmuoff.data.read") name
+
+DEFINE_FTR_OVERRIDE(id_aa64mmfr0_override);
+DEFINE_FTR_OVERRIDE(id_aa64mmfr1_override);
+DEFINE_FTR_OVERRIDE(id_aa64mmfr2_override);
+DEFINE_FTR_OVERRIDE(id_aa64pfr0_override);
+DEFINE_FTR_OVERRIDE(id_aa64pfr1_override);
+DEFINE_FTR_OVERRIDE(id_aa64zfr0_override);
+DEFINE_FTR_OVERRIDE(id_aa64smfr0_override);
+DEFINE_FTR_OVERRIDE(id_aa64isar1_override);
+DEFINE_FTR_OVERRIDE(id_aa64isar2_override);
+
+DEFINE_FTR_OVERRIDE(arm64_sw_feature_override);
static const struct __ftr_reg_entry {
u32 sys_id;
base-commit: 1dd3393696efba1598aa7692939bba99d0cffae3
--
2.39.5
From: Willem de Bruijn <willemb(a)google.com>
commit 6513eb3d3191574b58859ef2d6dc26c0277c6f81 upstream.
The referenced commit drops bad input, but has false positives.
Tighten the check to avoid these.
The check detects illegal checksum offload requests, which produce
csum_start/csum_off beyond end of packet after segmentation.
But it is based on two incorrect assumptions:
1. virtio_net_hdr_to_skb with VIRTIO_NET_HDR_GSO_TCP[46] implies GSO.
True in callers that inject into the tx path, such as tap.
But false in callers that inject into rx, like virtio-net.
Here, the flags indicate GRO, and CHECKSUM_UNNECESSARY or
CHECKSUM_NONE without VIRTIO_NET_HDR_F_NEEDS_CSUM is normal.
2. TSO requires checksum offload, i.e., ip_summed == CHECKSUM_PARTIAL.
False, as tcp[46]_gso_segment will fix up csum_start and offset for
all other ip_summed by calling __tcp_v4_send_check.
Because of 2, we can limit the scope of the fix to virtio_net_hdr
that do try to set these fields, with a bogus value.
Link: https://lore.kernel.org/netdev/20240909094527.GA3048202@port70.net/
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20240910213553.839926-1-willemdebruijn.kernel@gmai…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[Denis: minor fix to resolve merge conflict.]
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
Backport fix for CVE-2024-43817
Link: https://nvd.nist.gov/vuln/detail/cve-2024-43817
---
include/linux/virtio_net.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 823e28042f41..62613d4d84b7 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -161,7 +161,8 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
break;
case SKB_GSO_TCPV4:
case SKB_GSO_TCPV6:
- if (skb->csum_offset != offsetof(struct tcphdr, check))
+ if (skb->ip_summed == CHECKSUM_PARTIAL &&
+ skb->csum_offset != offsetof(struct tcphdr, check))
return -EINVAL;
break;
}
--
2.43.0
From: Willem de Bruijn <willemb(a)google.com>
commit 89add40066f9ed9abe5f7f886fe5789ff7e0c50e upstream.
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable(a)vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gma…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[Denis: minor fix to resolve merge conflict.]
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
Backport fix for CVE-2024-43817
Link: https://nvd.nist.gov/vuln/detail/cve-2024-43817
---
include/linux/virtio_net.h | 5 +++++
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 14 ++++++++++++++
3 files changed, 22 insertions(+)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 9bdba0c00c07..823e28042f41 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -159,6 +159,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
if (gso_type != SKB_GSO_UDP_L4)
return -EINVAL;
break;
+ case SKB_GSO_TCPV4:
+ case SKB_GSO_TCPV6:
+ if (skb->csum_offset != offsetof(struct tcphdr, check))
+ return -EINVAL;
+ break;
}
/* Kernel has a special handling for GSO_BY_FRAGS. */
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index fc61cd3fea65..357d3be04f84 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -71,6 +71,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
if (thlen < sizeof(*th))
goto out;
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ goto out;
+
if (!pskb_may_pull(skb, thlen))
goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6e36eb1ba276..9b757394ad4f 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -276,6 +276,20 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (gso_skb->len <= sizeof(*uh) + mss)
return ERR_PTR(-EINVAL);
+ if (unlikely(skb_checksum_start(gso_skb) !=
+ skb_transport_header(gso_skb)))
+ return ERR_PTR(-EINVAL);
+
+ if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
+ /* Packet is from an untrusted source, reset gso_segs. */
+ skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
+ mss);
+ return NULL;
+ }
+
+ if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
+ return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+
skb_pull(gso_skb, sizeof(*uh));
/* clear destructor to avoid skb_segment assigning it to tail */
--
2.43.0
Link: https://nvd.nist.gov/vuln/detail/cve-2024-43817
[PATCH 5.10 1/5] net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
[PATCH 5.10 2/5] net: drop bad gso csum_start and offset in virtio_net_hdr
[PATCH 5.10 3/5] net: tighten bad gso csum offset check in virtio_net_hdr
[PATCH 5.10 4/5] net: add more sanity check in virtio_net_hdr_to_skb()
[PATCH 5.10 5/5] net: test for not too small csum_start in
The bug has been fixed "silently" in upstream with the following series
of 5 commits.
49d14b54a527 net: test for not too small csum_start in virtio_net_hdr_to_skb()
6513eb3d3191 net: tighten bad gso csum offset check in virtio_net_hdr
89add40066f9 net: drop bad gso csum_start and offset in virtio_net_hdr
9181d6f8a2bb net: add more sanity check in virtio_net_hdr_to_skb()
fc8b2a619469 net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
While the MIDI jacks are configured correctly, and the MIDIStreaming
endpoint descriptors are filled with the correct information,
bNumEmbMIDIJack and bLength are set incorrectly in these descriptors.
This does not matter when the numbers of in and out ports are equal, but
when they differ the host will receive broken descriptors with
uninitialized stack memory leaking into the descriptor for whichever
value is smaller.
The precise meaning of "in" and "out" in the port counts is not clearly
defined and can be confusing. But elsewhere the driver consistently
uses this to match the USB meaning of IN and OUT viewed from the host,
so that "in" ports send data to the host and "out" ports receive data
from it.
Cc: stable(a)vger.kernel.org
Fixes: c8933c3f79568 ("USB: gadget: f_midi: allow a dynamic number of input and output ports")
Signed-off-by: John Keeping <jkeeping(a)inmusicbrands.com>
---
v2:
- Rewrite commit message to hopefully be clearer about what is going on
with the meaning of in/out
drivers/usb/gadget/function/f_midi.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 837fcdfa3840f..6cc3d86cb4774 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1000,11 +1000,11 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
}
/* configure the endpoint descriptors ... */
- ms_out_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->in_ports);
- ms_out_desc.bNumEmbMIDIJack = midi->in_ports;
+ ms_out_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->out_ports);
+ ms_out_desc.bNumEmbMIDIJack = midi->out_ports;
- ms_in_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->out_ports);
- ms_in_desc.bNumEmbMIDIJack = midi->out_ports;
+ ms_in_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->in_ports);
+ ms_in_desc.bNumEmbMIDIJack = midi->in_ports;
/* ... and add them to the list */
endpoint_descriptor_index = i;
--
2.48.1
The patch titled
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Subject: mm/zswap: fix inconsistency when zswap_store_page() fails
Date: Wed, 29 Jan 2025 19:08:44 +0900
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
skips charging any zswap entries when it failed to zswap the entire folio.
However, when some base pages are zswapped but it failed to zswap the
entire folio, the zswap operation is rolled back. When freeing zswap
entries for those pages, zswap_entry_free() uncharges the zswap entries
that were not previously charged, causing zswap charging to become
inconsistent.
This inconsistency triggers two warnings with following steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
The two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
zswap_stored_pages also becomes inconsistent in the same way.
As suggested by Kanchana, increment zswap_stored_pages and charge zswap
entries within zswap_store_page() when it succeeds. This way,
zswap_entry_free() will decrement the counter and uncharge the entries
when it failed to zswap the entire folio.
While this could potentially be optimized by batching objcg charging and
incrementing the counter, let's focus on fixing the bug this time and
leave the optimization for later after some evaluation.
After resolving the inconsistency, the warnings disappear.
Link: https://lkml.kernel.org/r/20250129100844.2935-1-42.hyeyoo@gmail.com
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Acked-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-inconsistency-when-zswap_store_page-fails
+++ a/mm/zswap.c
@@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct p
entry->pool = pool;
entry->swpentry = page_swpentry;
entry->objcg = objcg;
+ if (objcg)
+ obj_cgroup_charge_zswap(objcg, entry->length);
entry->referenced = true;
if (entry->length) {
INIT_LIST_HEAD(&entry->lru);
zswap_lru_add(&zswap_list_lru, entry);
}
+ atomic_long_inc(&zswap_stored_pages);
return entry->length;
@@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio)
struct obj_cgroup *objcg = NULL;
struct mem_cgroup *memcg = NULL;
struct zswap_pool *pool;
- size_t compressed_bytes = 0;
bool ret = false;
long index;
@@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio)
bytes = zswap_store_page(page, objcg, pool);
if (bytes < 0)
goto put_pool;
- compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
- atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
_
Patches currently in -mm which might be from 42.hyeyoo(a)gmail.com are
mm-zsmalloc-add-__maybe_unused-attribute-for-is_first_zpdesc.patch
mm-zswap-fix-inconsistency-when-zswap_store_page-fails.patch
The patch titled
Subject: jiffies: cast to unsigned long for secs_to_jiffies() conversion
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
jiffies-cast-to-unsigned-long-for-secs_to_jiffies-conversion.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Easwar Hariharan <eahariha(a)linux.microsoft.com>
Subject: jiffies: cast to unsigned long for secs_to_jiffies() conversion
Date: Thu, 30 Jan 2025 19:26:58 +0000
While converting users of msecs_to_jiffies(), lkp reported that some range
checks would always be true because of the mismatch between the implied
int value of secs_to_jiffies() vs the unsigned long return value of the
msecs_to_jiffies() calls it was replacing. Fix this by casting
secs_to_jiffies() values as unsigned long.
Link: https://lkml.kernel.org/r/20250130192701.99626-1-eahariha@linux.microsoft.c…
Fixes: b35108a51cf7ba ("jiffies: Define secs_to_jiffies()")
Signed-off-by: Easwar Hariharan <eahariha(a)linux.microsoft.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501301334.NB6NszQR-lkp@intel.com/
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Anna-Maria Behnsen <anna-maria(a)linutronix.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Cc: Miguel Ojeda <ojeda(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.13+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/jiffies.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/jiffies.h~jiffies-cast-to-unsigned-long-for-secs_to_jiffies-conversion
+++ a/include/linux/jiffies.h
@@ -537,7 +537,7 @@ static __always_inline unsigned long mse
*
* Return: jiffies value
*/
-#define secs_to_jiffies(_secs) ((_secs) * HZ)
+#define secs_to_jiffies(_secs) (unsigned long)((_secs) * HZ)
extern unsigned long __usecs_to_jiffies(const unsigned int u);
#if !(USEC_PER_SEC % HZ)
_
Patches currently in -mm which might be from eahariha(a)linux.microsoft.com are
jiffies-cast-to-unsigned-long-for-secs_to_jiffies-conversion.patch
scsi-lpfc-convert-timeouts-to-secs_to_jiffies.patch
accel-habanalabs-convert-timeouts-to-secs_to_jiffies.patch
alsa-ac97-convert-timeouts-to-secs_to_jiffies.patch
btrfs-convert-timeouts-to-secs_to_jiffies.patch
rbd-convert-timeouts-to-secs_to_jiffies.patch
libceph-convert-timeouts-to-secs_to_jiffies.patch
libata-zpodd-convert-timeouts-to-secs_to_jiffies.patch
xfs-convert-timeouts-to-secs_to_jiffies.patch
power-supply-da9030-convert-timeouts-to-secs_to_jiffies.patch
nvme-convert-timeouts-to-secs_to_jiffies.patch
spi-spi-fsl-lpspi-convert-timeouts-to-secs_to_jiffies.patch
spi-spi-imx-convert-timeouts-to-secs_to_jiffies.patch
platform-x86-amd-pmf-convert-timeouts-to-secs_to_jiffies.patch
platform-x86-thinkpad_acpi-convert-timeouts-to-secs_to_jiffies.patch
rdma-bnxt_re-convert-timeouts-to-secs_to_jiffies.patch
The patch titled "scsi: core: Fix scsi_mode_sense() buffer length handling"
addresses CVE-2021-47182, fixing the following issues in `scsi_mode_sense()`
buffer length handling:
1. Incorrect handling of the allocation length field in the MODE SENSE(10)
command, causing truncation of buffer lengths larger than 255 bytes.
2. Memory corruption when handling small buffer lengths due to lack of proper
validation.
CVE announcement in linux-cve-announce:
https://lore.kernel.org/linux-cve-announce/2024041032-CVE-2021-47182-377e@g…
Fixed versions:
- Fixed in 5.15.5 with commit e15de347faf4
- Fixed in 5.16 with commit 17b49bcbf835
Official CVE entry:
https://cve.org/CVERecord/?id=CVE-2021-47182
---
v2: To ensure consistency and completeness of the fixes, this backport
includes all 3 patches from the series [1].
In addition to the first patch that addresses the CVE, the second and
third patches are included, which prevent further regressions and align
with the fixes already backported and proposed for backporting [2] to
the stable 5.15 kernel.
[1] https://lore.kernel.org/all/20210820070255.682775-1-damien.lemoal@wdc.com/
[2] https://lore.kernel.org/all/20241209165340.112862-1-kovalev@altlinux.org/
[PATCH 5.10.y 1/3] scsi: core: Fix scsi_mode_sense() buffer length handling
[PATCH 5.10.y 2/3] scsi: core: Fix scsi_mode_select() buffer length handling
[PATCH 5.10.y 3/3] scsi: sd: Fix sd_do_mode_sense() buffer length handling
From: Ard Biesheuvel <ardb(a)kernel.org>
commit 97d6786e0669daa5c2f2d07a057f574e849dfd3e upstream
As a hardening measure, we currently randomize the placement of
physical memory inside the linear region when KASLR is in effect.
Since the random offset at which to place the available physical
memory inside the linear region is chosen early at boot, it is
based on the memblock description of memory, which does not cover
hotplug memory. The consequence of this is that the randomization
offset may be chosen such that any hotplugged memory located above
memblock_end_of_DRAM() that appears later is pushed off the end of
the linear region, where it cannot be accessed.
So let's limit this randomization of the linear region to ensure
that this can no longer happen, by using the CPU's addressable PA
range instead. As it is guaranteed that no hotpluggable memory will
appear that falls outside of that range, we can safely put this PA
range sized window anywhere in the linear region.
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Steven Price <steven.price(a)arm.com>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Link: https://lore.kernel.org/r/20201014081857.3288-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
---
arch/arm64/mm/init.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index cbcac03c0e0d..a6034645d6f7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -392,15 +392,18 @@ void __init arm64_memblock_init(void)
if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
extern u16 memstart_offset_seed;
- u64 range = linear_region_size -
- (memblock_end_of_DRAM() - memblock_start_of_DRAM());
+ u64 mmfr0 = read_cpuid(ID_AA64MMFR0_EL1);
+ int parange = cpuid_feature_extract_unsigned_field(
+ mmfr0, ID_AA64MMFR0_PARANGE_SHIFT);
+ s64 range = linear_region_size -
+ BIT(id_aa64mmfr0_parange_to_phys_shift(parange));
/*
* If the size of the linear region exceeds, by a sufficient
- * margin, the size of the region that the available physical
- * memory spans, randomize the linear region as well.
+ * margin, the size of the region that the physical memory can
+ * span, randomize the linear region as well.
*/
- if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
+ if (memstart_offset_seed > 0 && range >= (s64)ARM64_MEMSTART_ALIGN) {
range /= ARM64_MEMSTART_ALIGN;
memstart_addr -= ARM64_MEMSTART_ALIGN *
((range * memstart_offset_seed) >> 16);
--
2.43.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: ee2ab467bddfb2d7f68d996dbab94d7b88f8eaf7
Gitweb: https://git.kernel.org/tip/ee2ab467bddfb2d7f68d996dbab94d7b88f8eaf7
Author: Nathan Chancellor <nathan(a)kernel.org>
AuthorDate: Tue, 21 Jan 2025 18:11:33 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 30 Jan 2025 09:59:24 -08:00
x86/boot: Use '-std=gnu11' to fix build with GCC 15
GCC 15 changed the default C standard version to C23, which should not
have impacted the kernel because it requests the gnu11 standard via
'-std=' in the main Makefile. However, the x86 compressed boot Makefile
uses its own set of KBUILD_CFLAGS without a '-std=' value (i.e., using
the default), resulting in errors from the kernel's definitions of bool,
true, and false in stddef.h, which are reserved keywords under C23.
./include/linux/stddef.h:11:9: error: expected identifier before ‘false’
11 | false = 0,
./include/linux/types.h:35:33: error: two or more data types in declaration specifiers
35 | typedef _Bool bool;
Set '-std=gnu11' in the x86 compressed boot Makefile to resolve the
error and consistently use the same C standard version for the entire
kernel.
Closes: https://lore.kernel.org/4OAhbllK7x4QJGpZjkYjtBYNLd_2whHx9oFiuZcGwtVR4hIzvdu…
Closes: https://lore.kernel.org/Z4467umXR2PZ0M1H@tucnak/
Reported-by: Kostadin Shishmanov <kostadinshishmanov(a)protonmail.com>
Reported-by: Jakub Jelinek <jakub(a)redhat.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250121-x86-use-std-consistently-gcc-15-v1-1-8…
---
arch/x86/boot/compressed/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f205164..606c74f 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -25,6 +25,7 @@ targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
# avoid errors with '-march=i386', and future flags may depend on the target to
# be valid.
KBUILD_CFLAGS := -m$(BITS) -O2 $(CLANG_FLAGS)
+KBUILD_CFLAGS += -std=gnu11
KBUILD_CFLAGS += -fno-strict-aliasing -fPIE
KBUILD_CFLAGS += -Wundef
KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
In the two loops before setting the MIDIStreaming descriptors,
ms_in_desc.baAssocJackID[] has entries written for "in_ports" values and
ms_out_desc.baAssocJackID[] has entries written for "out_ports" values.
But the counts and lengths are set the other way round in the
descriptors.
Fix the descriptors so that the bNumEmbMIDIJack values and the
descriptor lengths match the number of entries populated in the trailing
arrays.
Cc: stable(a)vger.kernel.org
Fixes: c8933c3f79568 ("USB: gadget: f_midi: allow a dynamic number of input and output ports")
Signed-off-by: John Keeping <jkeeping(a)inmusicbrands.com>
---
drivers/usb/gadget/function/f_midi.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 837fcdfa3840f..6cc3d86cb4774 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1000,11 +1000,11 @@ static int f_midi_bind(struct usb_configuration *c, struct usb_function *f)
}
/* configure the endpoint descriptors ... */
- ms_out_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->in_ports);
- ms_out_desc.bNumEmbMIDIJack = midi->in_ports;
+ ms_out_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->out_ports);
+ ms_out_desc.bNumEmbMIDIJack = midi->out_ports;
- ms_in_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->out_ports);
- ms_in_desc.bNumEmbMIDIJack = midi->out_ports;
+ ms_in_desc.bLength = USB_DT_MS_ENDPOINT_SIZE(midi->in_ports);
+ ms_in_desc.bNumEmbMIDIJack = midi->in_ports;
/* ... and add them to the list */
endpoint_descriptor_index = i;
--
2.48.1
Just like it's normal for unset values to be zero, unset strings should be
empty instead of containing random values.
It seems to be a typical mistake that the mask returned by statmount is not
checked, which can result in various bugs.
With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).
Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf…
Fixes: 68385d77c05b ("statmount: simplify string option retrieval")
Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: <stable(a)vger.kernel.org> # v6.8
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
---
fs/namespace.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index a3ed3f2980cb..9c4d307a82cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5191,39 +5191,45 @@ static int statmount_string(struct kstatmount *s, u64 flag)
size_t kbufsize;
struct seq_file *seq = &s->seq;
struct statmount *sm = &s->sm;
- u32 start = seq->count;
+ u32 start, *offp;
+
+ /* Reserve an empty string at the beginning for any unset offsets */
+ if (!seq->count)
+ seq_putc(seq, 0);
+
+ start = seq->count;
switch (flag) {
case STATMOUNT_FS_TYPE:
- sm->fs_type = start;
+ offp = &sm->fs_type;
ret = statmount_fs_type(s, seq);
break;
case STATMOUNT_MNT_ROOT:
- sm->mnt_root = start;
+ offp = &sm->mnt_root;
ret = statmount_mnt_root(s, seq);
break;
case STATMOUNT_MNT_POINT:
- sm->mnt_point = start;
+ offp = &sm->mnt_point;
ret = statmount_mnt_point(s, seq);
break;
case STATMOUNT_MNT_OPTS:
- sm->mnt_opts = start;
+ offp = &sm->mnt_opts;
ret = statmount_mnt_opts(s, seq);
break;
case STATMOUNT_OPT_ARRAY:
- sm->opt_array = start;
+ offp = &sm->opt_array;
ret = statmount_opt_array(s, seq);
break;
case STATMOUNT_OPT_SEC_ARRAY:
- sm->opt_sec_array = start;
+ offp = &sm->opt_sec_array;
ret = statmount_opt_sec_array(s, seq);
break;
case STATMOUNT_FS_SUBTYPE:
- sm->fs_subtype = start;
+ offp = &sm->fs_subtype;
statmount_fs_subtype(s, seq);
break;
case STATMOUNT_SB_SOURCE:
- sm->sb_source = start;
+ offp = &sm->sb_source;
ret = statmount_sb_source(s, seq);
break;
default:
@@ -5251,6 +5257,7 @@ static int statmount_string(struct kstatmount *s, u64 flag)
seq->buf[seq->count++] = '\0';
sm->mask |= flag;
+ *offp = start;
return 0;
}
--
2.48.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 19d340a2988d4f3e673cded9dde405d727d7e248
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025013011-scenic-crazed-e3c8@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 19d340a2988d4f3e673cded9dde405d727d7e248 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 14 Jan 2025 18:49:00 +0100
Subject: [PATCH] io_uring/rsrc: require cloned buffers to share accounting
contexts
When IORING_REGISTER_CLONE_BUFFERS is used to clone buffers from uring
instance A to uring instance B, where A and B use different MMs for
accounting, the accounting can go wrong:
If uring instance A is closed before uring instance B, the pinned memory
counters for uring instance B will be decremented, even though the pinned
memory was originally accounted through uring instance A; so the MM of
uring instance B can end up with negative locked memory.
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/r/CAG48ez1zez4bdhmeGLEFxtbFADY4Czn3CV0u9d_TMcbvRA01…
Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method")
Signed-off-by: Jann Horn <jannh(a)google.com>
Link: https://lore.kernel.org/r/20250114-uring-check-accounting-v1-1-42e4145aa743…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 964a47c8d85e..688e277f0335 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -928,6 +928,13 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
int i, ret, off, nr;
unsigned int nbufs;
+ /*
+ * Accounting state is shared between the two rings; that only works if
+ * both rings are accounted towards the same counters.
+ */
+ if (ctx->user != src_ctx->user || ctx->mm_account != src_ctx->mm_account)
+ return -EINVAL;
+
/* if offsets are given, must have nr specified too */
if (!arg->nr && (arg->dst_off || arg->src_off))
return -EINVAL;
From: Chuck Lever <chuck.lever(a)oracle.com>
This series backports several upstream fixes to origin/linux-6.6.y
in order to address CVE-2024-46701:
https://nvd.nist.gov/vuln/detail/CVE-2024-46701
As applied to origin/linux-6.6.y, this series passes fstests and the
git regression suite.
Before officially requesting that stable@ merge this series, I'd
like to provide an opportunity for community review of the backport
patches.
You can also find them them in the "nfsd-6.6.y" branch in
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
Chuck Lever (10):
libfs: Re-arrange locking in offset_iterate_dir()
libfs: Define a minimum directory offset
libfs: Add simple_offset_empty()
libfs: Fix simple_offset_rename_exchange()
libfs: Add simple_offset_rename() API
shmem: Fix shmem_rename2()
libfs: Return ENOSPC when the directory offset range is exhausted
Revert "libfs: Add simple_offset_empty()"
libfs: Replace simple_offset end-of-directory detection
libfs: Use d_children list to iterate simple_offset directories
fs/libfs.c | 177 +++++++++++++++++++++++++++++++++------------
include/linux/fs.h | 2 +
mm/shmem.c | 3 +-
3 files changed, 134 insertions(+), 48 deletions(-)
--
2.47.0
This is the start of the stable review cycle for the 6.12.12 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 13:34:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.12-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.12-rc1
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Jason Gerecke <jason.gerecke(a)wacom.com>
HID: wacom: Initialize brightness of LED trigger
Hans de Goede <hdegoede(a)redhat.com>
wifi: rtl8xxxu: add more missing rtl8192cu USB IDs
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jikos(a)kernel.org>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: fix infinite directory reads for offset dir"
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: move allocations during CPU init outside the lock
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: properly synchronize freeing resources during CPU hotunplug
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Russell Harmon <russ(a)har.mn>
hwmon: (drivetemp) Set scsi command timeout to 10s
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed
Yage Geng <icoderdev(a)gmail.com>
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
Rob Herring (Arm) <robh(a)kernel.org>
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Alex Hung <alex.hung(a)amd.com>
drm/amd/display: Initialize denominator defaults to 1
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Maciej Strozek <mstrozek(a)opensource.cirrus.com>
ASoC: cs42l43: Add codec force suspend/resume ops
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Marian Postevca <posteuca(a)mutex.one>
ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
.../dml21/src/dml2_core/dml2_core_dcn4_calcs.c | 4 +-
drivers/gpu/drm/drm_connector.c | 3 +
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hid/wacom_sys.c | 24 +--
drivers/hwmon/drivetemp.c | 2 +-
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/net/wireless/realtek/rtl8xxxu/core.c | 20 +++
drivers/of/unittest-data/tests-platform.dtsi | 13 ++
drivers/of/unittest.c | 14 ++
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 ++
fs/gfs2/file.c | 1 +
fs/libfs.c | 162 ++++++++++-----------
fs/smb/client/smb2inode.c | 104 +++++++++----
include/linux/fs.h | 1 -
include/linux/seccomp.h | 2 +-
mm/filemap.c | 19 +++
mm/shmem.c | 4 +-
mm/zswap.c | 90 ++++++++----
net/sched/sch_ets.c | 2 +
sound/pci/hda/patch_realtek.c | 4 +-
sound/soc/codecs/Kconfig | 1 +
sound/soc/codecs/cs42l43.c | 1 +
sound/soc/codecs/es8316.c | 10 +-
sound/soc/samsung/Kconfig | 6 +-
sound/usb/quirks.c | 2 +
34 files changed, 367 insertions(+), 184 deletions(-)
Here are two small fixes for issues introduced in v6.12.
- Patch 1: reset the mpc_drop mark for other SYN retransmits, to only
consider an MPTCP blackhole when the first SYN retransmitted without
the MPTCP options is accepted, as initially intended.
- Patch 2: also mention in the doc that the blackhole_timeout sysctl
knob is per-netns, like all the others.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- The Cc stable tag has only been added to the first patch, I don't
think it is usually added on fixes related to the doc, right?
- A Fixes tag is present in both patches: I hope that's also OK for the
one modifying the doc. It can be removed if preferred.
---
Matthieu Baerts (NGI0) (2):
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
doc: mptcp: sysctl: blackhole_timeout is per-netns
Documentation/networking/mptcp-sysctl.rst | 2 +-
net/mptcp/ctrl.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
---
base-commit: 9e6c4e6b605c1fa3e24f74ee0b641e95f090188a
change-id: 20250128-net-mptcp-blackhole-fix-363f098fe726
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
For historical reasons, the GLINK smem interrupt is registered with
IRRQF_NO_SUSPEND flag set, which is the underlying problem here, since the
incoming messages can be delivered during late suspend and early
resume.
In this specific case, the pmic_glink_altmode_worker() currently gets
scheduled on the system_wq which can be scheduled to run while devices
are still suspended. This proves to be a problem when a Type-C retimer,
switch or mux that is controlled over a bus like I2C, because the I2C
controller is suspended.
This has been proven to be the case on the X Elite boards where such
retimers (ParadeTech PS8830) are used in order to handle Type-C
orientation and altmode configuration. The following warning is thrown:
[ 35.134876] i2c i2c-4: Transfer while suspended
[ 35.143865] WARNING: CPU: 0 PID: 99 at drivers/i2c/i2c-core.h:56 __i2c_transfer+0xb4/0x57c [i2c_core]
[ 35.352879] Workqueue: events pmic_glink_altmode_worker [pmic_glink_altmode]
[ 35.360179] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 35.455242] Call trace:
[ 35.457826] __i2c_transfer+0xb4/0x57c [i2c_core] (P)
[ 35.463086] i2c_transfer+0x98/0xf0 [i2c_core]
[ 35.467713] i2c_transfer_buffer_flags+0x54/0x88 [i2c_core]
[ 35.473502] regmap_i2c_write+0x20/0x48 [regmap_i2c]
[ 35.478659] _regmap_raw_write_impl+0x780/0x944
[ 35.483401] _regmap_bus_raw_write+0x60/0x7c
[ 35.487848] _regmap_write+0x134/0x184
[ 35.491773] regmap_write+0x54/0x78
[ 35.495418] ps883x_set+0x58/0xec [ps883x]
[ 35.499688] ps883x_sw_set+0x60/0x84 [ps883x]
[ 35.504223] typec_switch_set+0x48/0x74 [typec]
[ 35.508952] pmic_glink_altmode_worker+0x44/0x1fc [pmic_glink_altmode]
[ 35.515712] process_scheduled_works+0x1a0/0x2d0
[ 35.520525] worker_thread+0x2a8/0x3c8
[ 35.524449] kthread+0xfc/0x184
[ 35.527749] ret_from_fork+0x10/0x20
The proper solution here should be to not deliver these kind of messages
during system suspend at all, or at least make it configurable per glink
client. But simply dropping the IRQF_NO_SUSPEND flag entirely will break
other clients. The final shape of the rework of the pmic glink driver in
order to fulfill both the filtering of the messages that need to be able
to wake-up the system and the queueing of these messages until the system
has properly resumed is still being discussed and it is planned as a
future effort.
Meanwhile, the stop-gap fix here is to schedule the pmic glink altmode
worker on the system_freezable_wq instead of the system_wq. This will
result in the altmode worker not being scheduled to run until the
devices are resumed first, which will give the controllers like I2C a
chance to resume before the transfer is requested.
Reported-by: Johan Hovold <johan+linaro(a)kernel.org>
Closes: https://lore.kernel.org/lkml/Z1CCVjEZMQ6hJ-wK@hovoldconsulting.com/
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support")
Cc: stable(a)vger.kernel.org # 6.3
Reviewed-by: Caleb Connolly <caleb.connolly(a)linaro.org>
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v2:
- Re-worded the commit to explain the underlying problem and how
this fix is just a stop-gap for the pmic glink client for now.
- Added Johan's Reported-by tag and link
- Added Caleb's Reviewed-by tag
- Link to v1: https://lore.kernel.org/r/20250110-soc-qcom-pmic-glink-fix-device-access-on…
---
drivers/soc/qcom/pmic_glink_altmode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c
index bd06ce16180411059e9efb14d9aeccda27744280..bde129aa7d90a39becaa720376c0539bcaa492fb 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -295,7 +295,7 @@ static void pmic_glink_altmode_sc8180xp_notify(struct pmic_glink_altmode *altmod
alt_port->mode = mode;
alt_port->hpd_state = hpd_state;
alt_port->hpd_irq = hpd_irq;
- schedule_work(&alt_port->work);
+ queue_work(system_freezable_wq, &alt_port->work);
}
#define SC8280XP_DPAM_MASK 0x3f
@@ -338,7 +338,7 @@ static void pmic_glink_altmode_sc8280xp_notify(struct pmic_glink_altmode *altmod
alt_port->mode = mode;
alt_port->hpd_state = hpd_state;
alt_port->hpd_irq = hpd_irq;
- schedule_work(&alt_port->work);
+ queue_work(system_freezable_wq, &alt_port->work);
}
static void pmic_glink_altmode_callback(const void *data, size_t len, void *priv)
---
base-commit: da7e6047a6264af16d2cb82bed9b6caa33eaf56a
change-id: 20250110-soc-qcom-pmic-glink-fix-device-access-on-worker-while-suspended-af54c5e43ed6
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream.
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Bruno VERNAY <bruno.vernay(a)se.com>
Signed-off-by: Victor Giraud <vgiraud.opensource(a)witekio.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b2591c07166..53f1deb049ec 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5264,6 +5264,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5514,7 +5516,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
--
2.34.1
[ upstream commit 3181e22fb79910c7071e84a43af93ac89e8a7106 ]
There are reports of mariadb hangs, which is caused by a missing
barrier in the waking code resulting in waiters losing events.
The problem was introduced in a backport
3ab9326f93ec4 ("io_uring: wake up optimisations"),
and the change restores the barrier present in the original commit
3ab9326f93ec4 ("io_uring: wake up optimisations")
Reported by: Xan Charbonnet <xan(a)charbonnet.com>
Fixes: 3ab9326f93ec4 ("io_uring: wake up optimisations")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243#99
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
io_uring/io_uring.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9b58ba4616d40..e5a8ee944ef59 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
io_commit_cqring(ctx);
spin_unlock(&ctx->completion_lock);
io_commit_cqring_flush(ctx);
- if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
+ smp_mb();
__io_cqring_wake(ctx);
+ }
}
void io_cq_unlock_post(struct io_ring_ctx *ctx)
--
2.47.1
Hello,
I would like to propose backporting this commit to stable releases:
https://git.kernel.org/torvalds/c/3681c74d342db75b0d641ba60de27bf73e16e66b
smb: client: handle lack of EA support in smb2_query_path_info()
It is fixing support for querying special files (fifo/socket/block/char)
over SMB2+ servers which do not support extended attributes and reparse
point at the same time on one inode, which applied for older Windows
servers (pre-Win10).
I think that commit should have line:
Fixes: ea41367b2a60 ("smb: client: introduce SMB2_OP_QUERY_WSL_EA")
Note that the mention commit depends on:
ca4b2c460743 ("fs/smb/client: avoid querying SMB2_OP_QUERY_WSL_EA for SMB3 POSIX")
Pali
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage would ever be reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..17fbfa61f7ef 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(struct folio *folio,
* Restrict to anonymous folios for now to avoid potential writeback
* issues.
*/
- if (!folio_test_anon(folio))
+ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
return false;
rmap_walk(folio, &rwc);
--
2.48.1
Link: https://www.cve.org/CVERecord/?id=CVE-2023-0597
Link: https://www.cve.org/CVERecord/?id=CVE-2023-3640
v1: https://lore.kernel.org/all/20241112224201.289285-1-kovalev@altlinux.org/
v2: fix the regression causing kernel boot failures when both
CONFIG_RANDOMIZE_BASE=y and CONFIG_KASAN=y are enabled, instead of backporting
commit d4150779e60f ("random32: use real rng for non-deterministic randomness"),
which would bring in additional fixing commits:
4051a81774d6 ("locking/lockdep: Use sched_clock() for random numbers")
327b18b7aaed ("mm/kfence: select random number before taking raw lock")
f05ccf6a6ac6 ("crypto: testmgr - fix RNG performance in fuzz tests")
replaced the random number generator function (prandom -> random) with in
commit dcd5ba760e89 ("x86/mm: Randomize per-cpu entry area"):
- cea = prandom_u32_max(max_cea);
+ cea = (u32)(((u64) get_random_u32() * max_cea) >> 32);
This change will replicate the behavior as if the fixing
commit d4150779e60f ("random32: use real rng for non-deterministic randomness")
had been applied.
[PATCH v2 5.10/5.15/6.1 1/5] x86/kasan: Map shadow for percpu pages on demand
[PATCH v2 5.10/5.15/6.1 2/5] x86/mm: Recompute physical address for every page of
[PATCH v2 5.10/5.15/6.1 3/5] x86/mm: Populate KASAN shadow for entire per-CPU range of
[PATCH v2 5.10/5.15/6.1 4/5] x86/mm: Randomize per-cpu entry area
[PATCH v2 5.10/5.15/6.1 5/5] x86/mm: Do not shuffle CPU entry areas without KASLR
This is the start of the stable review cycle for the 6.6.69 release.
There are 86 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 01 Jan 2025 15:41:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.69-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.69-rc1
Ming Lei <ming.lei(a)redhat.com>
block: avoid to reuse `hctx` not removed from cpuhp callback list
Colin Ian King <colin.i.king(a)gmail.com>
ALSA: hda/realtek: Fix spelling mistake "Firelfy" -> "Firefly"
Andrew Cooper <andrew.cooper3(a)citrix.com>
x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines
Takashi Iwai <tiwai(a)suse.de>
ALSA: sh: Fix wrong argument order for copy_from_iter()
Qu Wenruo <wqu(a)suse.com>
btrfs: sysfs: fix direct super block member reads
Filipe Manana <fdmanana(a)suse.com>
btrfs: avoid monopolizing a core when activating a swap file
Dimitri Fedrau <dimitri.fedrau(a)liebherr.com>
power: supply: gpio-charger: Fix set charge current limits
Conor Dooley <conor.dooley(a)microchip.com>
i2c: microchip-core: fix "ghost" detections
Carlos Song <carlos.song(a)nxp.com>
i2c: imx: add imx7d compatible string for applying erratum ERR007805
Thomas Gleixner <tglx(a)linutronix.de>
PCI/MSI: Handle lack of irqdomain gracefully
Conor Dooley <conor.dooley(a)microchip.com>
i2c: microchip-core: actually use repeated sends
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/sqpoll: fix sqpoll error handling races
Lizhi Xu <lizhi.xu(a)windriver.com>
tracing: Prevent bad count for tracing_cpumask_write
Christian Göttsche <cgzones(a)googlemail.com>
tracing: Constify string literal data member in struct trace_event_call
Chen Ridong <chenridong(a)huawei.com>
freezer, sched: Report frozen tasks as 'D' instead of 'R'
Jesse.zhang(a)amd.com <Jesse.zhang(a)amd.com>
drm/amdkfd: pause autosuspend when creating pdd
Lijo Lazar <lijo.lazar(a)amd.com>
drm/amdkfd: Use device based logging for errors
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdkfd: drop struct kfd_cu_info
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdkfd: reduce stack size in kfd_topology_add_device()
Len Brown <len.brown(a)intel.com>
x86/cpu: Add Lunar Lake to list of CPUs with a broken MONITOR implementation
Tony Luck <tony.luck(a)intel.com>
x86/cpu/intel: Switch to new Intel CPU model defines
Tony Luck <tony.luck(a)intel.com>
x86/cpu/vfm: Update arch/x86/include/asm/intel-family.h
Tony Luck <tony.luck(a)intel.com>
x86/cpu/vfm: Add/initialize x86_vfm field to struct cpuinfo_x86
Tony Luck <tony.luck(a)intel.com>
x86/cpu: Add model number for another Intel Arrow Lake mobile processor
Tony Luck <tony.luck(a)intel.com>
x86/cpu: Add model number for Intel Clearwater Forest processor
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/hdp6.0: do a posting read when flushing HDP
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/hdp5.0: do a posting read when flushing HDP
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/hdp4.0: do a posting read when flushing HDP
Victor Zhao <Victor.Zhao(a)amd.com>
drm/amd/amdgpu: allow use kiq to do hdp flush under sriov
Ulf Hansson <ulf.hansson(a)linaro.org>
pmdomain: core: Add missing put_device()
Chris Chiu <chris.chiu(a)canonical.com>
ALSA: hda/realtek: fix micmute LEDs don't work on HP Laptops
Dirk Su <dirk.su(a)canonical.com>
ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook X G1i
Qun-Wei Lin <qun-wei.lin(a)mediatek.com>
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: mipsregs: Set proper ISA level for virt extensions
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
MIPS: Probe toolchain support of -msym32
Ming Lei <ming.lei(a)redhat.com>
blk-mq: move cpuhp callback registering out of q->sysfs_lock
Ming Lei <ming.lei(a)redhat.com>
blk-mq: register cpuhp callback after hctx is added to xarray table
Ming Lei <ming.lei(a)redhat.com>
virtio-blk: don't keep queue frozen during system suspend
Imre Deak <imre.deak(a)intel.com>
drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
Purushothama Siddaiah <psiddaiah(a)mvista.com>
spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
Aapo Vienamo <aapo.vienamo(a)iki.fi>
spi: intel: Add Panther Lake SPI controller support
Armin Wolf <W_Armin(a)gmx.de>
platform/x86: asus-nb-wmi: Ignore unknown event 0xCF
Tiezhu Yang <yangtiezhu(a)loongson.cn>
LoongArch: BPF: Adjust the parameter of emit_jirl()
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Fix reserving screen info memory for above-4G firmware
Mark Brown <broonie(a)kernel.org>
regmap: Use correct format specifier for logging range errors
Brahmajit Das <brahmajit.xyz(a)gmail.com>
smb: server: Fix building with GCC 15
Takashi Iwai <tiwai(a)suse.de>
ALSA: sh: Use standard helper for buffer accesses
bo liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: fix Z60MR100 startup pop issue
Jan Kara <jack(a)suse.cz>
udf: Skip parent dir link count update if corrupted
Tomas Henzl <thenzl(a)redhat.com>
scsi: megaraid_sas: Fix for a potential deadlock
Magnus Lindholm <linmag7(a)gmail.com>
scsi: qla1280: Fix hw revision numbering for ISP1020/1040
Yassine Oudjana <y.oudjana(a)protonmail.com>
watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
James Hilliard <james.hilliard1(a)gmail.com>
watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
tracing/kprobe: Make trace_kprobe's module callback called after jump_label update
Alexander Lobakin <aleksander.lobakin(a)intel.com>
stddef: make __struct_group() UAPI C++-friendly
Haren Myneni <haren(a)linux.ibm.com>
powerpc/pseries/vas: Add close() callback in vas_vm_ops struct
Dan Carpenter <dan.carpenter(a)linaro.org>
mtd: rawnand: fix double free in atmel_pmecc_create_user()
Chen Ridong <chenridong(a)huawei.com>
dmaengine: at_xdmac: avoid null_prt_deref in at_xdmac_prep_dma_memset
Sasha Finkelstein <fnkl.kernel(a)gmail.com>
dmaengine: apple-admac: Avoid accessing registers in probe
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
dmaengine: fsl-edma: implement the cleanup path of fsl_edma3_attach_pd()
Akhil R <akhilrajeev(a)nvidia.com>
dmaengine: tegra: Return correct DMA status when paused
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
dmaengine: dw: Select only supported masters for ACPI devices
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
dmaengine: mv_xor: fix child node refcount handling in early exit
Chukun Pan <amadeus(a)jmu.edu.cn>
phy: rockchip: naneng-combphy: fix phy reset
Justin Chen <justin.chen(a)broadcom.com>
phy: usb: Toggle the PHY power during init
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_phy_destroy() fails to destroy the phy
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_phy_put() fails to release the phy
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup()
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix an OF node refcount leakage in _of_phy_get()
Krishna Kurapati <quic_kriskura(a)quicinc.com>
phy: qcom-qmp: Fix register name in RX Lane config of SC8280XP
Maciej Andrzejewski <maciej.andrzejewski(a)m-works.net>
mtd: rawnand: arasan: Fix missing de-registration of NAND
Maciej Andrzejewski <maciej.andrzejewski(a)m-works.net>
mtd: rawnand: arasan: Fix double assertion of chip-select
Zichen Xie <zichenxie0106(a)gmail.com>
mtd: diskonchip: Cast an operand to prevent potential overflow
NeilBrown <neilb(a)suse.de>
nfsd: restore callback functionality for NFSv4.0
Yang Erkun <yangerkun(a)huawei.com>
nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
Cong Wang <cong.wang(a)bytedance.com>
bpf: Check negative offsets in __bpf_skb_min_len()
Zijian Zhang <zijianzhang(a)bytedance.com>
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
Cong Wang <cong.wang(a)bytedance.com>
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
Bart Van Assche <bvanassche(a)acm.org>
mm/vmstat: fix a W=1 clang compiler warning
Ilya Dryomov <idryomov(a)gmail.com>
ceph: allocate sparse_ext map only for sparse reads
Ilya Dryomov <idryomov(a)gmail.com>
ceph: fix memory leak in ceph_direct_read_write()
Xiubo Li <xiubli(a)redhat.com>
ceph: try to allocate a smaller extent map for sparse read
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/asm/inst.h | 12 +-
arch/loongarch/kernel/efi.c | 2 +-
arch/loongarch/kernel/inst.c | 2 +-
arch/loongarch/net/bpf_jit.c | 6 +-
arch/mips/Makefile | 2 +-
arch/mips/include/asm/mipsregs.h | 13 ++-
arch/powerpc/platforms/book3s/vas-api.c | 36 ++++++
arch/x86/include/asm/intel-family.h | 87 ++++++++++++++
arch/x86/include/asm/processor.h | 20 +++-
arch/x86/kernel/cpu/intel.c | 118 ++++++++++---------
arch/x86/kernel/cpu/match.c | 3 +-
block/blk-mq.c | 119 ++++++++++++++++---
drivers/base/power/domain.c | 1 +
drivers/base/regmap/regmap.c | 4 +-
drivers/block/virtio_blk.c | 7 +-
drivers/dma/apple-admac.c | 7 +-
drivers/dma/at_xdmac.c | 2 +
drivers/dma/dw/acpi.c | 6 +-
drivers/dma/dw/internal.h | 8 ++
drivers/dma/dw/pci.c | 4 +-
drivers/dma/fsl-edma-common.h | 1 +
drivers/dma/fsl-edma-main.c | 41 ++++++-
drivers/dma/mv_xor.c | 2 +
drivers/dma/tegra186-gpc-dma.c | 10 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 22 ----
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 -
drivers/gpu/drm/amd/amdgpu/hdp_v4_0.c | 14 ++-
drivers/gpu/drm/amd/amdgpu/hdp_v5_0.c | 9 +-
drivers/gpu/drm/amd/amdgpu/hdp_v6_0.c | 8 +-
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 28 ++---
.../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 15 +++
drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 3 +-
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 21 ++--
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 32 +++---
drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 63 +++++++----
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 43 +++----
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 44 ++++---
drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 14 ---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 24 +++-
drivers/i2c/busses/i2c-imx.c | 1 +
drivers/i2c/busses/i2c-microchip-corei2c.c | 126 ++++++++++++++++-----
drivers/media/dvb-frontends/dib3000mb.c | 2 +-
drivers/mtd/nand/raw/arasan-nand-controller.c | 11 +-
drivers/mtd/nand/raw/atmel/pmecc.c | 4 +-
drivers/mtd/nand/raw/diskonchip.c | 2 +-
drivers/pci/msi/irqdomain.c | 7 +-
drivers/pci/msi/msi.c | 4 +
drivers/phy/broadcom/phy-brcm-usb-init-synopsys.c | 6 +
drivers/phy/phy-core.c | 21 ++--
drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 2 +-
drivers/phy/rockchip/phy-rockchip-naneng-combphy.c | 2 +-
drivers/platform/x86/asus-nb-wmi.c | 1 +
drivers/power/supply/gpio-charger.c | 8 ++
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +-
drivers/scsi/mpt3sas/mpt3sas_base.c | 7 +-
drivers/scsi/qla1280.h | 12 +-
drivers/scsi/storvsc_drv.c | 7 +-
drivers/spi/spi-intel-pci.c | 2 +
drivers/spi/spi-omap2-mcspi.c | 6 +-
drivers/watchdog/it87_wdt.c | 39 +++++++
drivers/watchdog/mtk_wdt.c | 6 +
fs/btrfs/inode.c | 2 +
fs/btrfs/sysfs.c | 6 +-
fs/ceph/addr.c | 4 +-
fs/ceph/file.c | 43 +++----
fs/ceph/super.h | 14 +++
fs/nfsd/export.c | 31 +----
fs/nfsd/export.h | 4 +-
fs/nfsd/nfs4callback.c | 4 +-
fs/smb/server/smb_common.c | 4 +-
fs/udf/namei.c | 6 +-
include/linux/ceph/osd_client.h | 7 +-
include/linux/sched.h | 3 +-
include/linux/sched/task_stack.h | 2 +
include/linux/skmsg.h | 11 +-
include/linux/trace_events.h | 2 +-
include/linux/vmstat.h | 2 +-
include/net/sock.h | 10 +-
include/uapi/linux/stddef.h | 13 ++-
io_uring/sqpoll.c | 6 +
kernel/trace/trace.c | 3 +
kernel/trace/trace_kprobe.c | 2 +-
net/ceph/osd_client.c | 2 +
net/core/filter.c | 21 +++-
net/core/skmsg.c | 6 +-
net/ipv4/tcp_bpf.c | 6 +-
sound/pci/hda/patch_conexant.c | 28 +++++
sound/pci/hda/patch_realtek.c | 7 ++
sound/sh/sh_dac_audio.c | 5 +-
tools/include/uapi/linux/stddef.h | 15 ++-
91 files changed, 980 insertions(+), 429 deletions(-)
From: Kuniyuki Iwashima <kuniyu(a)amazon.com>
commit e8c526f2bdf1845bedaf6a478816a3d06fa78b8f upstream.
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
The commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.
Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.
If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.
The scenario would be
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
reqsk timer still has another one, and inet_csk_accept() does not
clear req->sk for non-TFO sockets
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
Note that reqsk timer is pinned, so the issue does not happen in most
use cases. [1]
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.d…
Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.d… [1]
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Reviewed-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[d.privalov: adapt calling __inet_csk_reqsk_queue_drop()]
Signed-off-by: Dmitriy Privalov <d.privalov(a)omp.ru>
---
Backport fix for CVE-2024-50154
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-50154
---
v3: Return using timer_delete_sync()
v2: Fix patch applying
net/ipv4/inet_connection_sock.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 1dfa561e8f981..d912894ad4adc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -700,21 +700,31 @@ static bool reqsk_queue_unlink(struct request_sock *req)
found = __sk_nulls_del_node_init_rcu(req_to_sk(req));
spin_unlock(lock);
}
- if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
- reqsk_put(req);
+
return found;
}
-bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+static bool __inet_csk_reqsk_queue_drop(struct sock *sk,
+ struct request_sock *req,
+ bool from_timer)
{
bool unlinked = reqsk_queue_unlink(req);
+ if (!from_timer && timer_delete_sync(&req->rsk_timer))
+ reqsk_put(req);
+
if (unlinked) {
reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
reqsk_put(req);
}
+
return unlinked;
}
+
+bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+{
+ return __inet_csk_reqsk_queue_drop(sk, req, false);
+}
EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
@@ -781,7 +791,8 @@ static void reqsk_timer_handler(struct timer_list *t)
return;
}
drop:
- inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
+ __inet_csk_reqsk_queue_drop(sk_listener, req, true);
+ reqsk_put(req);
}
static void reqsk_queue_hash_req(struct request_sock *req,
--
2.34.1
One of the possible ways to enable the input MTU auto-selection for L2CAP
connections is supposed to be through passing a special "0" value for it
as a socket option. Commit [1] added one of those into avdtp. However, it
simply wouldn't work because the kernel still treats the specified value
as invalid and denies the setting attempt. Recorded BlueZ logs include the
following:
bluetoothd[496]: profiles/audio/avdtp.c:l2cap_connect() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
[1]: https://github.com/bluez/bluez/commit/ae5be371a9f53fed33d2b34748a95a5498fd4…
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4b6e228e297b ("Bluetooth: Auto tune if input MTU is set to 0")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
net/bluetooth/l2cap_sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c
index 49f97d4138ea..46ea0bee2259 100644
--- a/net/bluetooth/l2cap_sock.c
+++ b/net/bluetooth/l2cap_sock.c
@@ -710,12 +710,12 @@ static bool l2cap_valid_mtu(struct l2cap_chan *chan, u16 mtu)
{
switch (chan->scid) {
case L2CAP_CID_ATT:
- if (mtu < L2CAP_LE_MIN_MTU)
+ if (mtu && mtu < L2CAP_LE_MIN_MTU)
return false;
break;
default:
- if (mtu < L2CAP_DEFAULT_MIN_MTU)
+ if (mtu && mtu < L2CAP_DEFAULT_MIN_MTU)
return false;
}
--
2.39.5
The patch titled
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Ritesh Harjani (IBM)" <ritesh.list(a)gmail.com>
Subject: mm/hugetlb: fix hugepage allocation for interleaved memory nodes
Date: Sat, 11 Jan 2025 16:36:55 +0530
gather_bootmem_prealloc() assumes the start nid as 0 and size as
num_node_state(N_MEMORY). That means in case if memory attached numa
nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail
to scan few of these nodes.
Since memory attached numa nodes can be interleaved in any fashion, hence
ensure that the current code checks for all numa node ids
(.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that
it can distributes all nr_node_ids among the these many no. threads.
e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 0 kB
with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 2
HugePages_Free: 2
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
Hugetlb: 2097152 kB
Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.17365925…
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Reported-by: Pavithra Prakash <pavrampu(a)linux.ibm.com>
Suggested-by: Muchun Song <muchun.song(a)linux.dev>
Tested-by: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap(a)redhat.com>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: Gang Li <gang.li(a)linux.dev>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes
+++ a/mm/hugetlb.c
@@ -3309,7 +3309,7 @@ static void __init gather_bootmem_preall
.thread_fn = gather_bootmem_prealloc_parallel,
.fn_arg = NULL,
.start = 0,
- .size = num_node_state(N_MEMORY),
+ .size = nr_node_ids,
.align = 1,
.min_chunk = 1,
.max_threads = num_node_state(N_MEMORY),
_
Patches currently in -mm which might be from ritesh.list(a)gmail.com are
mm-hugetlb-fix-hugepage-allocation-for-interleaved-memory-nodes.patch
From: K Prateek Nayak <kprateek.nayak(a)amd.com>
commit 6675ce20046d149e1e1ffe7e9577947dee17aad5 upstream.
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a
WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function.
Since do_softirq_post_smp_call_flush() is called with preempt disabled,
raising a SOFTIRQ during flush_smp_call_function_queue() can lead to
longer preempt disabled sections.
Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()") IPIs to an idle CPU in
TIF_POLLING_NRFLAG mode can be optimized out by instead setting
TIF_NEED_RESCHED bit in idle task's thread_info and relying on the
flush_smp_call_function_queue() in the idle-exit path to run the
SMP-call-function.
To trigger an idle load balancing, the scheduler queues
nohz_csd_function() responsible for triggering an idle load balancing on
a target nohz idle CPU and sends an IPI. Only now, this IPI is optimized
out and the SMP-call-function is executed from
flush_smp_call_function_queue() in do_idle() which can raise a
SCHED_SOFTIRQ to trigger the balancing.
So far, this went undetected since, the need_resched() check in
nohz_csd_function() would make it bail out of idle load balancing early
as the idle thread does not clear TIF_POLLING_NRFLAG before calling
flush_smp_call_function_queue(). The need_resched() check was added with
the intent to catch a new task wakeup, however, it has recently
discovered to be unnecessary and will be removed in the subsequent
commit after which nohz_csd_function() can raise a SCHED_SOFTIRQ from
flush_smp_call_function_queue() to trigger an idle load balance on an
idle target in TIF_POLLING_NRFLAG mode.
nohz_csd_function() bails out early if "idle_cpu()" check for the
target CPU, and does not lock the target CPU's rq until the very end,
once it has found tasks to run on the CPU and will not inhibit the
wakeup of, or running of a newly woken up higher priority task. Account
for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ is raised from
flush_smp_call_function_queue().
Signed-off-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20241119054432.6405-2-kprateek.nayak@amd.com
Tested-by: Felix Moessbauer <felix.moessbauer(a)siemens.com>
Signed-off-by: Florian Bezdeka <florian.bezdeka(a)siemens.com>
---
Newer stable branches (6.12, 6.6) got this already, 5.10 and lower are
not affected.
The warning triggered for SCHED_SOFTIRQ under high network load while
testing.
kernel/softirq.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index a47396161843..6665f5cd60cb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -294,17 +294,24 @@ static inline void invoke_softirq(void)
wakeup_softirqd();
}
+#define SCHED_SOFTIRQ_MASK BIT(SCHED_SOFTIRQ)
+
/*
* flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
* investigated.
*/
void do_softirq_post_smp_call_flush(unsigned int was_pending)
{
- if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+ unsigned int is_pending = local_softirq_pending();
+
+ if (unlikely(was_pending != is_pending)) {
+ WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
invoke_softirq();
+ }
}
#else /* CONFIG_PREEMPT_RT */
--
2.39.5
When comparing to the ARM list [1], it appears that several ARM cores
were missing from the lists in spectre_bhb_loop_affected(). Add them.
NOTE: for some of these cores it may not matter since other ways of
clearing the BHB may be used (like the CLRBHB instruction or ECBHB),
but it still seems good to have all the info from ARM's whitepaper
included.
[1] https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels")
Cc: stable(a)vger.kernel.org
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
(no changes since v3)
Changes in v3:
- New
arch/arm64/kernel/proton-pack.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index 89405be53d8f..0f51fd10b4b0 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -876,6 +876,14 @@ static u8 spectre_bhb_loop_affected(void)
{
u8 k = 0;
+ static const struct midr_range spectre_bhb_k132_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X3),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V2),
+ };
+ static const struct midr_range spectre_bhb_k38_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A715),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A720),
+ };
static const struct midr_range spectre_bhb_k32_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
@@ -889,6 +897,7 @@ static u8 spectre_bhb_loop_affected(void)
};
static const struct midr_range spectre_bhb_k24_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A76AE),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
@@ -904,7 +913,11 @@ static u8 spectre_bhb_loop_affected(void)
{},
};
- if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
+ if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k132_list))
+ k = 132;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k38_list))
+ k = 38;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
k = 32;
else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
k = 24;
--
2.47.1.613.gc27f4b7a9f-goog
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: b9a49520679e98700d3d89689cc91c08a1c88c1d
Gitweb: https://git.kernel.org/tip/b9a49520679e98700d3d89689cc91c08a1c88c1d
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Sun, 19 Jan 2025 00:55:32 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 29 Jan 2025 15:21:31 +01:00
rcuref: Plug slowpath race in rcuref_put()
Kernel test robot reported an "imbalanced put" in the rcuref_put() slow
path, which turned out to be a false positive. Consider the following race:
ref = 0 (via rcuref_init(ref, 1))
T1 T2
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
-> rcuref_put_slowpath(ref)
rcuref_get(ref)
-> atomic_add_negative_relaxed(1, &ref->refcnt)
-> return true; # ref -> 0
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
-> rcuref_put_slowpath()
-> cnt = atomic_read(&ref->refcnt); # cnt -> 0xffffffff / RCUREF_NOREF
-> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD)) # ref -> 0xe0000000 / RCUREF_DEAD
-> return true
-> cnt = atomic_read(&ref->refcnt); # cnt -> 0xe0000000 / RCUREF_DEAD
-> if (cnt > RCUREF_RELEASED) # 0xe0000000 > 0xc0000000
-> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")
The problem is the additional read in the slow path (after it
decremented to RCUREF_NOREF) which can happen after the counter has been
marked RCUREF_DEAD.
Prevent this by reusing the return value of the decrement. Now every "final"
put uses RCUREF_NOREF in the slow path and attempts the final cmpxchg() to
RCUREF_DEAD.
[ bigeasy: Add changelog ]
Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Debugged-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
---
include/linux/rcuref.h | 9 ++++++---
lib/rcuref.c | 5 ++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
index 2c8bfd0..6322d8c 100644
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_get(rcuref_t *ref)
return rcuref_get_slowpath(ref);
}
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
/*
* Internal helper. Do not invoke directly.
*/
static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
{
+ int cnt;
+
RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
"suspicious rcuref_put_rcusafe() usage");
/*
* Unconditionally decrease the reference count. The saturation and
* dead zones provide enough tolerance for this.
*/
- if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+ cnt = atomic_sub_return_release(1, &ref->refcnt);
+ if (likely(cnt >= 0))
return false;
/*
* Handle the last reference drop and cases inside the saturation
* and dead zones.
*/
- return rcuref_put_slowpath(ref);
+ return rcuref_put_slowpath(ref, cnt);
}
/**
diff --git a/lib/rcuref.c b/lib/rcuref.c
index 97f300e..5bd726b 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
/**
* rcuref_put_slowpath - Slowpath of __rcuref_put()
* @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
*
* Invoked when the reference count is outside of the valid zone.
*
@@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
* with a concurrent get()/put() pair. Caller is not allowed to
* deconstruct the protected object.
*/
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
{
- unsigned int cnt = atomic_read(&ref->refcnt);
-
/* Did this drop the last reference? */
if (likely(cnt == RCUREF_NOREF)) {
/*
Hi all, thanks for the replies! Understood the NACK, so let's drop this
backport.
Just not to let Greg's questoins go unanswered: this was meant for
linux-6.6.y (sorry forgot to mention it) and the additional cast is
required because the type of `ctx->next_offset` is a u32 in the 6.6
version, later changed to unsigned long which made it possible to cast it
to (void *) directly.
Best,
Andrea
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 83f81d24df78e..94e3fb9f42243 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_private.h"
--
2.39.5
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index a0b5fc8e46f4d..fdcc9dca14ca9 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_private.h"
--
2.39.5
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 82ac1569deb05..e45c15e210ffd 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/vfio_pci_core.h>
--
2.39.5
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe7..8191c8fcfb256 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_priv.h"
--
2.39.5
From: Ramesh Thomas <ramesh.thomas(a)intel.com>
[ Upstream commit 2b938e3db335e3670475e31a722c2bee34748c5a ]
Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
generic implementation in lib/iomap.c. The generic implementation does
64 bit rw if readq/writeq is defined for the architecture, otherwise it
would do 32 bit back to back rw.
Note that there are two versions of the generic implementation that
differs in the order the 32 bit words are written if 64 bit support is
not present. This is not the little/big endian ordering, which is
handled separately. This patch uses the lo followed by hi word ordering
which is consistent with current back to back implementation in the
vfio/pci code.
Signed-off-by: Ramesh Thomas <ramesh.thomas(a)intel.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/vfio/pci/vfio_pci_rdwr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe7..8191c8fcfb256 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
#include <linux/io.h>
#include <linux/vfio.h>
#include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
#include "vfio_pci_priv.h"
--
2.39.5
In 'iwl_pnvm_load()', assume that the power table size is always
equal to IWL_HARDCODED_REDUCE_POWER_SIZE. Compile tested only.
Signed-off-by: Dmitry Antipov <dmantipov(a)yandex.ru>
---
I would gently ask Johannes to review this before taking any other
actions. This is intended for stable linux-6.1.y only in attempt to
avoid possible use of an uninitialized 'len' without backporting
https://lore.kernel.org/linux-wireless/20230606074310.889520-1-gregory.gree….
---
drivers/net/wireless/intel/iwlwifi/fw/pnvm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/pnvm.c b/drivers/net/wireless/intel/iwlwifi/fw/pnvm.c
index b6d3ac6ed440..ddf7acd67e94 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/pnvm.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/pnvm.c
@@ -332,6 +332,9 @@ int iwl_pnvm_load(struct iwl_trans *trans,
goto skip_reduce_power;
}
+ } else {
+ /* see iwl_uefi_get_reduced_power() to check why */
+ len = IWL_HARDCODED_REDUCE_POWER_SIZE;
}
ret = iwl_trans_set_reduce_power(trans, data, len);
--
2.48.1
ast_dp_is_connected() used to also check for link training success
to report the DP connector as connected. Without this check, the
physical_status is always connected. So if no monitor is present, it
will fail to read the EDID and set the default resolution to 640x480
instead of 1024x768.
Signed-off-by: Jocelyn Falempe <jfalempe(a)redhat.com>
Fixes: 2281475168d2 ("drm/ast: astdp: Perform link training during atomic_enable")
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.12+
---
drivers/gpu/drm/ast/ast_dp.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/ast/ast_dp.c b/drivers/gpu/drm/ast/ast_dp.c
index 0e282b7b167c..30aad5c0112a 100644
--- a/drivers/gpu/drm/ast/ast_dp.c
+++ b/drivers/gpu/drm/ast/ast_dp.c
@@ -17,6 +17,12 @@ static bool ast_astdp_is_connected(struct ast_device *ast)
{
if (!ast_get_index_reg_mask(ast, AST_IO_VGACRI, 0xDF, AST_IO_VGACRDF_HPD))
return false;
+ /*
+ * HPD might be set even if no monitor is connected, so also check that
+ * the link training was successful.
+ */
+ if (!ast_get_index_reg_mask(ast, AST_IO_VGACRI, 0xDC, AST_IO_VGACRDC_LINK_SUCCESS))
+ return false;
return true;
}
base-commit: 798047e63ac970f105c49c22e6d44df901c486b2
--
2.47.1
Hi all,
I have been sent here from the linux-ide mailing list. Over there, I
have reported an issue with the 6.6 stable series of the kernel,
starting with 6.6.51. The root cause is as follows:
On 12.09.2024, commit 872f86e1757bbb0a334ee739b824e47c448f5ebc ("ata:
libata-scsi: Check ATA_QCFLAG_RTF_FILLED before using result_tf") was
applied to 6.6, adding checks of ATA_QCFLAG_RTF_FILLED to libata_scsi.
The patch seen in baseline commit
18676c6aab0863618eb35443e7b8615eea3535a9 ("ata: libata-core: Set
ATA_QCFLAG_RTF_FILLED in fill_result_tf()") should have gone together
with this.
Without it, I receive errors retrieving SMART data from SATA disks via a
C602 SAS controller, apparently because in this situation
ATA_QCFLAG_RTF_FILLED is not set.
I applied 18676c6aab0863618eb35443e7b8615eea3535a9 from baseline to
6.6.74 and the problem went away.
If you need any further information, do not hesitate to contact me
(linux user since 0.99.x, but only debugging it once every few years or
so...).
Regards
Christian
From: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
Add suffix ULL to constant 500 in order to give the compiler complete
information about the proper arithmetic to use. Notice that this
constant is used in a context that expects an expression of type
u64 (64 bits, unsigned).
The expression PCC_NUM_RETRIES * pcc_chan->latency, which at
preprocessing time translates to pcc_chan->latency; is currently
being evaluated using 32-bit arithmetic.
This is similar to commit b52f45110502
("ACPI / CPPC: Use 64-bit arithmetic instead of 32-bit")
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
---
drivers/hwmon/xgene-hwmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/xgene-hwmon.c b/drivers/hwmon/xgene-hwmon.c
index 1e3bd129a922..43a08ddb964b 100644
--- a/drivers/hwmon/xgene-hwmon.c
+++ b/drivers/hwmon/xgene-hwmon.c
@@ -61,7 +61,7 @@
* Arbitrary retries in case the remote processor is slow to respond
* to PCC commands
*/
-#define PCC_NUM_RETRIES 500
+#define PCC_NUM_RETRIES 500ULL
#define ASYNC_MSG_FIFO_SIZE 16
#define MBOX_OP_TIMEOUTMS 1000
--
2.43.0
From: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
Add suffix ULL to constant 500 in order to give the compiler complete
information about the proper arithmetic to use. Notice that this
constant is used in a context that expects an expression of type
u64 (64 bits, unsigned).
The expression PCC_NUM_RETRIES * cppc_ss->latency, which at
preprocessing time translates to cppc_ss->latency; is currently
being evaluated using 32-bit arithmetic.
This is similar to commit b52f45110502 ("ACPI / CPPC: Use 64-bit
arithmetic instead of 32-bit")
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
---
drivers/hwmon/xgene-hwmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/xgene-hwmon.c b/drivers/hwmon/xgene-hwmon.c
index 1e3bd129a922..43a08ddb964b 100644
--- a/drivers/hwmon/xgene-hwmon.c
+++ b/drivers/hwmon/xgene-hwmon.c
@@ -61,7 +61,7 @@
* Arbitrary retries in case the remote processor is slow to respond
* to PCC commands
*/
-#define PCC_NUM_RETRIES 500
+#define PCC_NUM_RETRIES 500ULL
#define ASYNC_MSG_FIFO_SIZE 16
#define MBOX_OP_TIMEOUTMS 1000
--
2.43.0
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
skips charging any zswapped base pages when it failed to zswap the entire
folio.
However, when some base pages are zswapped but it failed to zswap
the entire folio, the zswap operation is rolled back.
When freeing zswap entries for those pages, zswap_entry_free() uncharges
the pages that were not previously charged, causing zswap charging to
become inconsistent.
This inconsistency triggers two warnings with following steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
Two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
While objcg events should only be accounted for when the entire folio is
zswapped, objcg charging should be performed regardlessly.
Fix accordingly.
After resolving the inconsistency, these warnings disappear.
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
---
v1->v2:
Fixed objcg events being accounted for on zswap failure.
Fixed the incorrect description. I misunderstood that the base pages are
going to be stored in zswap, but their zswap entries are freed immediately.
Added a comment on why it charges pages that are going to be removed
from zswap.
mm/zswap.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 6504174fbc6a..10b30ac46deb 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
bytes = zswap_store_page(page, objcg, pool);
if (bytes < 0)
- goto put_pool;
+ goto charge_zswap;
compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ if (objcg)
count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
+charge_zswap:
+ /*
+ * Charge zswapped pages even when it failed to zswap the entire folio,
+ * because zswap_entry_free() will uncharge them anyway.
+ * Otherwise zswap charging will become inconsistent.
+ */
+ if (objcg)
+ obj_cgroup_charge_zswap(objcg, compressed_bytes);
put_pool:
zswap_pool_put(pool);
put_objcg:
--
2.47.1
From: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
Size of variable sd_gain equals four bytes - DA9150_QIF_SD_GAIN_SIZE.
Size of variable shunt_val equals two bytes - DA9150_QIF_SHUNT_VAL_SIZE.
The expression sd_gain * shunt_val is currently being evaluated using
32-bit arithmetic. So during the multiplication an overflow may occur.
As the value of type 'u64' is used as storage for the eventual result, put
ULL variable at the first position of each expression in order to give the
compiler complete information about the proper arithmetic to use. According
to C99 the guaranteed width for a variable of type 'unsigned long long' >=
64 bits.
Remove the explicit cast to u64 as it is meaningless.
Just for the sake of consistency, perform the similar trick with another
expression concerning 'iavg'.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a419b4fd9138 ("power: Add support for DA9150 Fuel-Gauge")
Signed-off-by: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
---
drivers/power/supply/da9150-fg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/power/supply/da9150-fg.c b/drivers/power/supply/da9150-fg.c
index 652c1f213af1..63bec706167c 100644
--- a/drivers/power/supply/da9150-fg.c
+++ b/drivers/power/supply/da9150-fg.c
@@ -247,7 +247,7 @@ static int da9150_fg_current_avg(struct da9150_fg *fg,
DA9150_QIF_SD_GAIN_SIZE);
da9150_fg_read_sync_end(fg);
- div = (u64) (sd_gain * shunt_val * 65536ULL);
+ div = 65536ULL * sd_gain * shunt_val;
do_div(div, 1000000);
- res = (u64) (iavg * 1000000ULL);
+ res = 1000000ULL * iavg;
do_div(res, div);
--
2.43.0
From: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
The function ib_umem_find_best_pgsz may return a value of zero.
If this occurs, the function ib_umem_num_dma_blocks could attempt
to divide by zero, resulting in a division by zero error.
To avoid this, please add a check to ensure that the variable
is not zero before performing the division.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 6d1782919dc9 ("drm/cma: Introduce drm_gem_cma_dumb_create_internal()")
Signed-off-by: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
---
drivers/infiniband/hw/erdma/erdma_verbs.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/infiniband/hw/erdma/erdma_verbs.c b/drivers/infiniband/hw/erdma/erdma_verbs.c
index 51d619edb6c5..7ad38fb84661 100644
--- a/drivers/infiniband/hw/erdma/erdma_verbs.c
+++ b/drivers/infiniband/hw/erdma/erdma_verbs.c
@@ -781,6 +781,10 @@ static int get_mtt_entries(struct erdma_dev *dev, struct erdma_mem *mem,
mem->page_size = ib_umem_find_best_pgsz(mem->umem, req_page_size, virt);
+ if (!mem->page_size) {
+ ret = -EINVAL;
+ goto error_ret;
+ }
mem->page_offset = start & (mem->page_size - 1);
mem->mtt_nents = ib_umem_num_dma_blocks(mem->umem, mem->page_size);
mem->page_cnt = mem->mtt_nents;
mem->mtt = erdma_create_mtt(dev, MTT_SIZE(mem->page_cnt),
force_continuous);
--
2.43.0
GCC changed the default C standard dialect from gnu17 to gnu23,
which should not have impacted the kernel because it explicitly requests
the gnu11 standard in the main Makefile. However, there are certain
places in the s390 code that use their own CFLAGS without a '-std='
value, which break with this dialect change because of the kernel's own
definitions of bool, false, and true conflicting with the C23 reserved
keywords.
include/linux/stddef.h:11:9: error: cannot use keyword 'false' as enumeration constant
11 | false = 0,
| ^~~~~
include/linux/stddef.h:11:9: note: 'false' is a keyword with '-std=c23' onwards
include/linux/types.h:35:33: error: 'bool' cannot be defined via 'typedef'
35 | typedef _Bool bool;
| ^~~~
include/linux/types.h:35:33: note: 'bool' is a keyword with '-std=c23' onwards
Add '-std=gnu11' to the decompressor and purgatory CFLAGS to eliminate
these errors and make the C standard version of these areas match the
rest of the kernel.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I only see one other error in various files with a recent GCC 15.0.1
snapshot, which I can eliminate by dropping the version part of the
condition for CONFIG_GCC_ASM_FLAG_OUTPUT_BROKEN. Is this a regression of
the fix for the problem of GCC 14.2.0 or is something else doing on
here?
arch/s390/include/asm/bitops.h: Assembler messages:
arch/s390/include/asm/bitops.h:60: Error: operand 1: syntax error; missing ')' after base register
arch/s390/include/asm/bitops.h:60: Error: operand 2: syntax error; ')' not allowed here
arch/s390/include/asm/bitops.h:60: Error: junk at end of line: `,4'
arch/s390/include/asm/bitops.h:60: Error: operand 1: syntax error; missing ')' after base register
arch/s390/include/asm/bitops.h:60: Error: operand 2: syntax error; ')' not allowed here
arch/s390/include/asm/bitops.h:60: Error: junk at end of line: `,64'
arch/s390/include/asm/bitops.h:60: Error: operand 1: syntax error; missing ')' after base register
arch/s390/include/asm/bitops.h:60: Error: operand 2: syntax error; ')' not allowed here
arch/s390/include/asm/bitops.h:60: Error: junk at end of line: `,4'
make[6]: *** [scripts/Makefile.build:194: fs/gfs2/glock.o] Error 1
---
arch/s390/Makefile | 2 +-
arch/s390/purgatory/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/Makefile b/arch/s390/Makefile
index 3f25498dac65..5fae311203c2 100644
--- a/arch/s390/Makefile
+++ b/arch/s390/Makefile
@@ -22,7 +22,7 @@ KBUILD_AFLAGS_DECOMPRESSOR := $(CLANG_FLAGS) -m64 -D__ASSEMBLY__
ifndef CONFIG_AS_IS_LLVM
KBUILD_AFLAGS_DECOMPRESSOR += $(if $(CONFIG_DEBUG_INFO),$(aflags_dwarf))
endif
-KBUILD_CFLAGS_DECOMPRESSOR := $(CLANG_FLAGS) -m64 -O2 -mpacked-stack
+KBUILD_CFLAGS_DECOMPRESSOR := $(CLANG_FLAGS) -m64 -O2 -mpacked-stack -std=gnu11
KBUILD_CFLAGS_DECOMPRESSOR += -DDISABLE_BRANCH_PROFILING -D__NO_FORTIFY
KBUILD_CFLAGS_DECOMPRESSOR += -D__DECOMPRESSOR
KBUILD_CFLAGS_DECOMPRESSOR += -fno-delete-null-pointer-checks -msoft-float -mbackchain
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index 24eccaa29337..bdcf2a3b6c41 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -13,7 +13,7 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
$(obj)/mem.o: $(srctree)/arch/s390/lib/mem.S FORCE
$(call if_changed_rule,as_o_S)
-KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes
+KBUILD_CFLAGS := -std=gnu11 -fno-strict-aliasing -Wall -Wstrict-prototypes
KBUILD_CFLAGS += -Wno-pointer-sign -Wno-sign-compare
KBUILD_CFLAGS += -fno-zero-initialized-in-bss -fno-builtin -ffreestanding
KBUILD_CFLAGS += -Os -m64 -msoft-float -fno-common
---
base-commit: b2832409e00b6330781458d7db0080508a35a9a8
change-id: 20250122-s390-fix-std-for-gcc-15-0abfa4caf757
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
On 28/01/2025 09:57, Jens Schmidt wrote:
> This is on Debian testing with the following kernel, built from
> the tarball on kernel.org:
>
> Linux sappc1 6.12.10 #4 SMP PREEMPT_DYNAMIC Fri Jan 17 22:17:45 CET 2025 x86_64 GNU/Linux
>
> More by chance than anything else I noticed this sysfs entry:
>
> /devices/pci0000:00/0000:00:02.0/rc/rc0/input33
> ^^
> immediately after a reboot. After letting the system run for
> a while the file name counter may well be in the thousands.
>
> I instrumented function drm_dp_cec_attach to dump the values
> of the expressions involved in the following test:
>
> if (aux->cec.adap->capabilities == cec_caps &&
> aux->cec.adap->available_log_addrs == num_las) {
>
> The result was that on every call I have
>
> aux->cec.adap->capabilities == 0b1101111110
> cec_caps == 0b0101111110
> aux->cec.adap->available_log_addrs == 4
> num_las == 4
>
> which triggers recreation of the sysfs entry.
>
> So the capabilities differ in CEC_CAP_REPLY_VENDOR_ID. If I
> clear that bit in above test, I am back to normal, getting only
>
> /devices/pci0000:00/0000:00:02.0/rc/rc0/input1
>
> and keeping that throughout the runtime of the system.
>
> Could this be related to commit
> 613f21505b25a4f43f33de00f11afc059bedde2b?
Well spotted! Yes, it is related to that commit.
I'll take a closer look at this tomorrow since this test against
cec_caps needs work.
Regards,
Hans
From: Aditya Garg <gargaditya08(a)live.com>
Before 6.13, random seed to the firmware was given based on the logic
whether the device had valid OTP or not, and such devices were found
mainly on the T2 and Apple Silicon Macs. In 6.13, the logic was changed,
and the device table was used for this purpose, so as to cover the special
case of BCM43752 chip.
During the transition, the device table for BCM4364 and BCM4355 Wi-Fi chips
which had valid OTP was not modified, thus breaking Wi-Fi on these devices.
This patch adds does the necessary changes, similar to the ones done for
other chips.
Fixes: ea11a89c3ac6 ("wifi: brcmfmac: add flag for random seed during firmware download")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
index e4395b1f8..d2caa80e9 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c
@@ -2712,7 +2712,7 @@ static const struct pci_device_id brcmf_pcie_devid_table[] = {
BRCMF_PCIE_DEVICE(BRCM_PCIE_4350_DEVICE_ID, WCC),
BRCMF_PCIE_DEVICE_SUB(0x4355, BRCM_PCIE_VENDOR_ID_BROADCOM, 0x4355, WCC),
BRCMF_PCIE_DEVICE(BRCM_PCIE_4354_RAW_DEVICE_ID, WCC),
- BRCMF_PCIE_DEVICE(BRCM_PCIE_4355_DEVICE_ID, WCC),
+ BRCMF_PCIE_DEVICE(BRCM_PCIE_4355_DEVICE_ID, WCC_SEED),
BRCMF_PCIE_DEVICE(BRCM_PCIE_4356_DEVICE_ID, WCC),
BRCMF_PCIE_DEVICE(BRCM_PCIE_43567_DEVICE_ID, WCC),
BRCMF_PCIE_DEVICE(BRCM_PCIE_43570_DEVICE_ID, WCC),
@@ -2723,7 +2723,7 @@ static const struct pci_device_id brcmf_pcie_devid_table[] = {
BRCMF_PCIE_DEVICE(BRCM_PCIE_43602_2G_DEVICE_ID, WCC),
BRCMF_PCIE_DEVICE(BRCM_PCIE_43602_5G_DEVICE_ID, WCC),
BRCMF_PCIE_DEVICE(BRCM_PCIE_43602_RAW_DEVICE_ID, WCC),
- BRCMF_PCIE_DEVICE(BRCM_PCIE_4364_DEVICE_ID, WCC),
+ BRCMF_PCIE_DEVICE(BRCM_PCIE_4364_DEVICE_ID, WCC_SEED),
BRCMF_PCIE_DEVICE(BRCM_PCIE_4365_DEVICE_ID, BCA),
BRCMF_PCIE_DEVICE(BRCM_PCIE_4365_2G_DEVICE_ID, BCA),
BRCMF_PCIE_DEVICE(BRCM_PCIE_4365_5G_DEVICE_ID, BCA),
--
2.39.5 (Apple Git-154)
From: yangerkun <yangerkun(a)huawei.com>
[ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
After we switch tmpfs dir operations from simple_dir_operations to
simple_offset_dir_operations, every rename happened will fill new dentry
to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
key starting with octx->newx_offset, and then set newx_offset equals to
free key + 1. This will lead to infinite readdir combine with rename
happened at the same time, which fail generic/736 in xfstests(detail show
as below).
1. create 5000 files(1 2 3...) under one dir
2. call readdir(man 3 readdir) once, and get one entry
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
4. loop 2~3, until readdir return nothing or we loop too many
times(tmpfs break test with the second condition)
We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
directory reads") to fix it, record the last_index when we open dir, and
do not emit the entry which index >= last_index. The file->private_data
now used in offset dir can use directly to do this, and we also update
the last_index when we llseek the dir file.
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: yangerkun <yangerkun(a)huawei.com>
Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
Reviewed-by: Chuck Lever <chuck.lever(a)oracle.com>
[brauner: only update last_index after seek when offset is zero like Jan suggested]
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Andrea Ciprietti <ciprietti(a)google.com>
---
fs/libfs.c | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index dc0f7519045f..916c39e758b1 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -371,6 +371,15 @@ void simple_offset_destroy(struct offset_ctx *octx)
xa_destroy(&octx->xa);
}
+static int offset_dir_open(struct inode *inode, struct file *file)
+{
+ struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+ unsigned long next_offset = (unsigned long)ctx->next_offset;
+
+ file->private_data = (void *)next_offset;
+ return 0;
+}
+
/**
* offset_dir_llseek - Advance the read position of a directory descriptor
* @file: an open directory whose position is to be updated
@@ -384,6 +393,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
*/
static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
{
+ struct inode *inode = file->f_inode;
+ struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+
switch (whence) {
case SEEK_CUR:
offset += file->f_pos;
@@ -397,7 +409,11 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
}
/* In this case, ->private_data is protected by f_pos_lock */
- file->private_data = NULL;
+ if (!offset) {
+ unsigned long next_offset = (unsigned long)ctx->next_offset;
+
+ file->private_data = (void *)next_offset;
+ }
return vfs_setpos(file, offset, U32_MAX);
}
@@ -427,7 +443,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
inode->i_ino, fs_umode_to_dtype(inode->i_mode));
}
-static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
+static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
{
struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
XA_STATE(xas, &so_ctx->xa, ctx->pos);
@@ -436,17 +452,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
while (true) {
dentry = offset_find_next(&xas);
if (!dentry)
- return ERR_PTR(-ENOENT);
+ return;
+
+ if (dentry2offset(dentry) >= last_index) {
+ dput(dentry);
+ return;
+ }
if (!offset_dir_emit(ctx, dentry)) {
dput(dentry);
- break;
+ return;
}
dput(dentry);
ctx->pos = xas.xa_index + 1;
}
- return NULL;
}
/**
@@ -473,22 +493,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
static int offset_readdir(struct file *file, struct dir_context *ctx)
{
struct dentry *dir = file->f_path.dentry;
+ long last_index = (long)file->private_data;
lockdep_assert_held(&d_inode(dir)->i_rwsem);
if (!dir_emit_dots(file, ctx))
return 0;
- /* In this case, ->private_data is protected by f_pos_lock */
- if (ctx->pos == 2)
- file->private_data = NULL;
- else if (file->private_data == ERR_PTR(-ENOENT))
- return 0;
- file->private_data = offset_iterate_dir(d_inode(dir), ctx);
+ offset_iterate_dir(d_inode(dir), ctx, last_index);
return 0;
}
const struct file_operations simple_offset_dir_operations = {
+ .open = offset_dir_open,
.llseek = offset_dir_llseek,
.iterate_shared = offset_readdir,
.read = generic_read_dir,
--
2.48.1.262.g85cc9f2d1e-goog
The auto_parser assumed sort() was stable, but the kernel's sort() uses
heapsort, which has never been stable. After commit 0e02ca29a563
("lib/sort: optimize heapsort with double-pop variation"), the order of
equal elements changed, causing the headset to fail to work.
Fix the issue by recording the original order of elements before
sorting and using it as a tiebreaker for equal elements in the
comparison function.
Fixes: b9030a005d58 ("ALSA: hda - Use standard sort function in hda_auto_parser.c")
Reported-by: Austrum <austrum.lab(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219158
Tested-by: Austrum <austrum.lab(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
---
sound/pci/hda/hda_auto_parser.c | 8 +++++++-
sound/pci/hda/hda_auto_parser.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/sound/pci/hda/hda_auto_parser.c b/sound/pci/hda/hda_auto_parser.c
index 84393f4f429d..8923813ce424 100644
--- a/sound/pci/hda/hda_auto_parser.c
+++ b/sound/pci/hda/hda_auto_parser.c
@@ -80,7 +80,11 @@ static int compare_input_type(const void *ap, const void *bp)
/* In case one has boost and the other one has not,
pick the one with boost first. */
- return (int)(b->has_boost_on_pin - a->has_boost_on_pin);
+ if (a->has_boost_on_pin != b->has_boost_on_pin)
+ return (int)(b->has_boost_on_pin - a->has_boost_on_pin);
+
+ /* Keep the original order */
+ return a->order - b->order;
}
/* Reorder the surround channels
@@ -400,6 +404,8 @@ int snd_hda_parse_pin_defcfg(struct hda_codec *codec,
reorder_outputs(cfg->speaker_outs, cfg->speaker_pins);
/* sort inputs in the order of AUTO_PIN_* type */
+ for (i = 0; i < cfg->num_inputs; i++)
+ cfg->inputs[i].order = i;
sort(cfg->inputs, cfg->num_inputs, sizeof(cfg->inputs[0]),
compare_input_type, NULL);
diff --git a/sound/pci/hda/hda_auto_parser.h b/sound/pci/hda/hda_auto_parser.h
index 579b11beac71..87af3d8c02f7 100644
--- a/sound/pci/hda/hda_auto_parser.h
+++ b/sound/pci/hda/hda_auto_parser.h
@@ -37,6 +37,7 @@ struct auto_pin_cfg_item {
unsigned int is_headset_mic:1;
unsigned int is_headphone_mic:1; /* Mic-only in headphone jack */
unsigned int has_boost_on_pin:1;
+ int order;
};
struct auto_pin_cfg;
--
2.34.1
When converting to folios the cleanup path of shmem_get_pages() was
missed. When a DMA remap fails and the max segment size is greater than
PAGE_SIZE it will attempt to retry the remap with a PAGE_SIZEd segment
size. The cleanup code isn't properly using the folio apis and as a
result isn't handling compound pages correctly.
v2 -> v3:
(Ville) Just use shmem_sg_free_table() as-is in the failure path of
shmem_get_pages(). shmem_sg_free_table() will clear mapping unevictable
but it will be reset when it retries in shmem_sg_alloc_table().
v1 -> v2:
(Ville) Fixed locations where we were not clearing mapping unevictable.
Cc: stable(a)vger.kernel.org
Cc: Ville Syrjala <ville.syrjala(a)linux.intel.com>
Cc: Vidya Srinivas <vidya.srinivas(a)intel.com>
Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13487
Link: https://lore.kernel.org/lkml/20250116135636.410164-1-bgeffon@google.com/
Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
Suggested-by: Tomasz Figa <tfiga(a)google.com>
---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..ae3343c81a64 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -209,8 +209,6 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
struct address_space *mapping = obj->base.filp->f_mapping;
unsigned int max_segment = i915_sg_segment_size(i915->drm.dev);
struct sg_table *st;
- struct sgt_iter sgt_iter;
- struct page *page;
int ret;
/*
@@ -239,9 +237,7 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
* for PAGE_SIZE chunks instead may be helpful.
*/
if (max_segment > PAGE_SIZE) {
- for_each_sgt_page(page, sgt_iter, st)
- put_page(page);
- sg_free_table(st);
+ shmem_sg_free_table(st, mapping, false, false);
kfree(st);
max_segment = PAGE_SIZE;
--
2.48.1.262.g85cc9f2d1e-goog
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
mistakenly skipped charging any zswapped pages when a single call to
zswap_store_page() failed, even if some pages in the folio are
successfully stored in zswap.
Making things worse, these not-charged pages are uncharged in
zswap_entry_free(), making zswap charging inconsistent.
This inconsistency triggers two warnings when following these steps:
# On a machine with 64GiB of RAM and 36GiB of zswap
$ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng
$ sudo reboot
Two warnings are:
in mm/memcontrol.c:163, function obj_cgroup_release():
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
in mm/page_counter.c:60, function page_counter_cancel():
if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
new, nr_pages))
Charge zswapped pages even if some pages of the folio are not zswapped.
After resolving the inconsistency, these warnings disappear.
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
---
mm/zswap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 6504174fbc6a..92752cd05c75 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1568,20 +1568,20 @@ bool zswap_store(struct folio *folio)
bytes = zswap_store_page(page, objcg, pool);
if (bytes < 0)
- goto put_pool;
+ goto charge_zswap;
compressed_bytes += bytes;
}
- if (objcg) {
- obj_cgroup_charge_zswap(objcg, compressed_bytes);
- count_objcg_events(objcg, ZSWPOUT, nr_pages);
- }
-
atomic_long_add(nr_pages, &zswap_stored_pages);
count_vm_events(ZSWPOUT, nr_pages);
ret = true;
+charge_zswap:
+ if (objcg) {
+ obj_cgroup_charge_zswap(objcg, compressed_bytes);
+ count_objcg_events(objcg, ZSWPOUT, nr_pages);
+ }
put_pool:
zswap_pool_put(pool);
put_objcg:
--
2.47.1
Add everything needed to support the DSI output on Renesas r8a779h0
(V4M) SoC, and the DP output (via sn65dsi86 DSI to DP bridge) on the
Renesas grey-hawk board.
Overall the DSI and the board design is almost identical to Renesas
r8a779g0 and white-hawk board.
Note: the v4 no longer has the dts and the clk patches, as those have
been merged to renesas-devel.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen+renesas(a)ideasonboard.com>
---
Changes in v5:
- Add minItems/maxItems to the top level cmms & vsps properties
- Drop "minItems: 1" when not needed
- Link to v4: https://lore.kernel.org/r/20241213-rcar-gh-dsi-v4-0-f8e41425207b@ideasonboa…
Changes in v4:
- Dropped patches merged to renesas-devel
- Added new patch "dt-bindings: display: renesas,du: Add missing
maxItems" to fix the bindings
- Add the missing maxItems to "dt-bindings: display: renesas,du: Add
r8a779h0"
- Link to v3: https://lore.kernel.org/r/20241206-rcar-gh-dsi-v3-0-d74c2166fa15@ideasonboa…
Changes in v3:
- Update "Write DPTSR only if there are more than one crtc" patch to
"Write DPTSR only if the second source exists"
- Add Laurent's Rb
- Link to v2: https://lore.kernel.org/r/20241205-rcar-gh-dsi-v2-0-42471851df86@ideasonboa…
Changes in v2:
- Add the DT binding with a new conditional block, so that we can set
only the port@0 as required
- Drop port@1 from r8a779h0.dtsi (there's no port@1)
- Add a new patch to write DPTSR only if num_crtcs > 1
- Drop RCAR_DU_FEATURE_NO_DPTSR (not needed anymore)
- Add Cc: stable to the fix, and move it as first patch
- Added the tags from reviews
- Link to v1: https://lore.kernel.org/r/20241203-rcar-gh-dsi-v1-0-738ae1a95d2a@ideasonboa…
---
Tomi Valkeinen (7):
drm/rcar-du: dsi: Fix PHY lock bit check
drm/rcar-du: Write DPTSR only if the second source exists
dt-bindings: display: renesas,du: Add missing constraints
dt-bindings: display: renesas,du: Add r8a779h0
dt-bindings: display: bridge: renesas,dsi-csi2-tx: Add r8a779h0
drm/rcar-du: dsi: Add r8a779h0 support
drm/rcar-du: Add support for r8a779h0
.../display/bridge/renesas,dsi-csi2-tx.yaml | 1 +
.../devicetree/bindings/display/renesas,du.yaml | 67 ++++++++++++++++++++--
drivers/gpu/drm/renesas/rcar-du/rcar_du_drv.c | 18 ++++++
drivers/gpu/drm/renesas/rcar-du/rcar_du_group.c | 24 ++++++--
drivers/gpu/drm/renesas/rcar-du/rcar_mipi_dsi.c | 4 +-
.../gpu/drm/renesas/rcar-du/rcar_mipi_dsi_regs.h | 1 -
6 files changed, 102 insertions(+), 13 deletions(-)
---
base-commit: adc218676eef25575469234709c2d87185ca223a
change-id: 20241008-rcar-gh-dsi-9c01f5deeac8
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
There are two variables that indicate the interrupt type to be used
in the next test execution, global "irq_type" and test->irq_type.
The former is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
In pci_endpoint_test_request_irq(), since this global variable is
referenced when an error occurs, the unintended error message is
displayed.
For example, the following message shows "MSI 3" even if the current
irq type becomes "MSI-X".
# pcitest -i 2
pci-endpoint-test 0000:01:00.0: Failed to request IRQ 30 for MSI 3
SET IRQ TYPE TO MSI-X: NOT OKAY
Fix this issue by using test->irq_type instead of global "irq_type".
Cc: stable(a)vger.kernel.org
Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
---
drivers/misc/pci_endpoint_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 302955c20979..a342587fc78a 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -235,7 +235,7 @@ static bool pci_endpoint_test_request_irq(struct pci_endpoint_test *test)
return true;
fail:
- switch (irq_type) {
+ switch (test->irq_type) {
case IRQ_TYPE_INTX:
dev_err(dev, "Failed to request IRQ %d for Legacy\n",
pci_irq_vector(pdev, i));
--
2.25.1
Attention: Sir/Madam,
I am Cesandra Pace, a Research Assistant working with Pharma CURE Laboratory Ltd. We are Bio-pharmaceutical Company here in the U.K.I'm looking for a reliable businessman/individual in your region to represent this company in sourcing some of our basic raw materials used for the manufacturing of high quality Anti-Viral Vaccines, Cancer treatment medications and other lifesaving Pharmaceutical Products.
When I receive a response from you, I shall divulge to you my intent for your consideration.
Best regards,
Cesandra Pace
Research & Dev Dept-
Pharma CURE Laboratory Ltd UK
Attention: Sir/Madam,
I am Cesandra Pace, a Research Assistant working with Pharma CURE Laboratory Ltd. We are Bio-pharmaceutical Company here in the U.K.I'm looking for a reliable businessman/individual in your region to represent this company in sourcing some of our basic raw materials used for the manufacturing of high quality Anti-Viral Vaccines, Cancer treatment medications and other lifesaving Pharmaceutical Products.
When I receive a response from you, I shall divulge to you my intent for your consideration.
Best regards,
Cesandra Pace
Research & Dev Dept-
Pharma CURE Laboratory Ltd UK
In xfs_inactive(), xfs_reflink_cancel_cow_range() is called
without error handling, risking unnoticed failures and
inconsistent behavior compared to other parts of the code.
Fix this issue by adding an error handling for the
xfs_reflink_cancel_cow_range(), improving code robustness.
Fixes: 6231848c3aa5 ("xfs: check for cow blocks before trying to clear them")
Cc: <stable(a)vger.kernel.org> # v4.17
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
fs/xfs/xfs_inode.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c8ad2606f928..1ff514b6c035 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1404,8 +1404,11 @@ xfs_inactive(
goto out;
/* Try to clean out the cow blocks if there are any. */
- if (xfs_inode_has_cow_data(ip))
- xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (xfs_inode_has_cow_data(ip)) {
+ error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (error)
+ goto out;
+ }
if (VFS_I(ip)->i_nlink != 0) {
/*
--
2.42.0.windows.2
reveliofuzzing reported that a SCSI_IOCTL_SEND_COMMAND ioctl with out_len
set to 0xd42, SCSI command set to ATA_16 PASS-THROUGH, ATA command set to
ATA_NOP, and protocol set to ATA_PROT_PIO, can cause ata_pio_sector() to
write outside the allocated buffer, overwriting random memory.
While a ATA device is supposed to abort a ATA_NOP command, there does seem
to be a bug either in libata-sff or QEMU, where either this status is not
set, or the status is cleared before read by ata_sff_hsm_move().
Anyway, that is most likely a separate bug.
Looking at __atapi_pio_bytes(), it already has a safety check to ensure
that __atapi_pio_bytes() cannot write outside the allocated buffer.
Add a similar check to ata_pio_sector(), such that also ata_pio_sector()
cannot write outside the allocated buffer.
Cc: stable(a)vger.kernel.org
Reported-by: reveliofuzzing <reveliofuzzing(a)gmail.com>
Closes: https://lore.kernel.org/linux-ide/CA+-ZZ_jTgxh3bS7m+KX07_EWckSnW3N2adX3KV63…
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
Changes since v1:
-Add stable to Cc.
drivers/ata/libata-sff.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 67f277e1c3bf..5a46c066abc3 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -601,7 +601,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
{
struct ata_port *ap = qc->ap;
struct page *page;
- unsigned int offset;
+ unsigned int offset, count;
if (!qc->cursg) {
qc->curbytes = qc->nbytes;
@@ -617,25 +617,27 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
page = nth_page(page, (offset >> PAGE_SHIFT));
offset %= PAGE_SIZE;
- trace_ata_sff_pio_transfer_data(qc, offset, qc->sect_size);
+ /* don't overrun current sg */
+ count = min(qc->cursg->length - qc->cursg_ofs, qc->sect_size);
+
+ trace_ata_sff_pio_transfer_data(qc, offset, count);
/*
* Split the transfer when it splits a page boundary. Note that the
* split still has to be dword aligned like all ATA data transfers.
*/
WARN_ON_ONCE(offset % 4);
- if (offset + qc->sect_size > PAGE_SIZE) {
+ if (offset + count > PAGE_SIZE) {
unsigned int split_len = PAGE_SIZE - offset;
ata_pio_xfer(qc, page, offset, split_len);
- ata_pio_xfer(qc, nth_page(page, 1), 0,
- qc->sect_size - split_len);
+ ata_pio_xfer(qc, nth_page(page, 1), 0, count - split_len);
} else {
- ata_pio_xfer(qc, page, offset, qc->sect_size);
+ ata_pio_xfer(qc, page, offset, count);
}
- qc->curbytes += qc->sect_size;
- qc->cursg_ofs += qc->sect_size;
+ qc->curbytes += count;
+ qc->cursg_ofs += count;
if (qc->cursg_ofs == qc->cursg->length) {
qc->cursg = sg_next(qc->cursg);
--
2.48.1
This series primarily adds check at relevant places in venus driver
where there are possible OOB accesses due to unexpected payload from
venus firmware. The patches describes the specific OOB possibility.
Please review and share your feedback.
Validated on sc7180(v4), rb5(v6) and db410c(v1).
Changes in v3:
- update the packet parsing logic in hfi_parser. The utility parsing api
now returns the size of data parsed, accordingly the parser adjust the
remaining bytes, taking care of OOB scenario as well (Bryan)
- Link to v2:
https://lore.kernel.org/r/20241128-venus_oob_2-v2-0-483ae0a464b8@quicinc.com
Changes in v2:
- init_codec to always update with latest payload from firmware
(Dmitry/Bryan)
- Rewrite the logic of packet parsing to consider payload size for
different packet type (Bryan)
- Consider reading sfr data till available space (Dmitry)
- Add reviewed-by tags
- Link to v1:
https://lore.kernel.org/all/20241105-venus_oob-v1-0-8d4feedfe2bb@quicinc.co…
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
Vikash Garodia (4):
media: venus: hfi_parser: add check to avoid out of bound access
media: venus: hfi_parser: refactor hfi packet parsing logic
media: venus: hfi: add check to handle incorrect queue size
media: venus: hfi: add a check to handle OOB in sfr region
drivers/media/platform/qcom/venus/hfi_parser.c | 94 +++++++++++++++++++-------
drivers/media/platform/qcom/venus/hfi_venus.c | 15 +++-
2 files changed, 82 insertions(+), 27 deletions(-)
---
base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a
change-id: 20241115-venus_oob_2-21708239176a
Best regards,
--
Vikash Garodia <quic_vgarodia(a)quicinc.com>
Current unmapped_area() may fail to get the available area.
For example, we have a vma range like below:
0123456789abcdef
m m A m m
Let's assume start_gap is 0x2000 and stack_guard_gap is 0x1000. And now
we are looking for free area with size 0x1000 within [0x2000, 0xd000].
The unmapped_area_topdown() could find address at 0x8000, while
unmapped_area() fails.
In original code before commit 3499a13168da ("mm/mmap: use maple tree
for unmapped_area{_topdown}"), the logic is:
* find a gap with total length, including alignment
* adjust start address with alignment
What we do now is:
* find a gap with total length, including alignment
* adjust start address with alignment
* then compare the left range with total length
This is not necessary to compare with total length again after start
address adjusted.
Also, it is not correct to minus 1 here. This may not trigger an issue
in real world, since address are usually aligned with page size.
Fixes: 58c5d0d6d522 ("mm/mmap: regression fix for unmapped_area{_topdown}")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
CC: Vlastimil Babka <vbabka(a)suse.cz>
CC: Jann Horn <jannh(a)google.com>
CC: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Cc: <stable(a)vger.kernel.org>
---
mm/vma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vma.c b/mm/vma.c
index 3f45a245e31b..d82fdbc710b0 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2668,7 +2668,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
gap += (info->align_offset - gap) & info->align_mask;
tmp = vma_next(&vmi);
if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
- if (vm_start_gap(tmp) < gap + length - 1) {
+ if (vm_start_gap(tmp) < gap + info->length) {
low_limit = tmp->vm_end;
vma_iter_reset(&vmi);
goto retry;
--
2.34.1
We should select PCI_HYPERV here, otherwise it's possible for devices to
not show up as expected, at least not in an orderly manner.
Cc: stable(a)vger.kernel.org
Cc: Wei Liu <wei.liu(a)kernel.org>
Signed-off-by: Hamza Mahfooz <hamzamahfooz(a)linux.microsoft.com>
---
drivers/hv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 862c47b191af..6ee75b3f0fa6 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -9,6 +9,7 @@ config HYPERV
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
select OF_EARLY_FLATTREE if OF
+ select PCI_HYPERV if PCI
help
Select this option to run Linux as a Hyper-V client operating
system.
--
2.47.1
When an Icelake or Sapphire Rapids CPU isn't providing the maximum and
critical thresholds for particular DIMM the driver should return an
error to the userspace instead of giving it stale (best case) or wrong
(the structure contains all zeros after kzalloc() call) data.
The issue can be reproduced by binding the peci driver while the host is
fully booted and idle, this makes PECI interaction unreliable enough.
Fixes: 73bc1b885dae ("hwmon: peci: Add dimmtemp driver")
Fixes: 621995b6d795 ("hwmon: (peci/dimmtemp) Add Sapphire Rapids support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paul Fertser <fercerpav(a)gmail.com>
---
drivers/hwmon/peci/dimmtemp.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
index d6762259dd69..fbe82d9852e0 100644
--- a/drivers/hwmon/peci/dimmtemp.c
+++ b/drivers/hwmon/peci/dimmtemp.c
@@ -127,8 +127,6 @@ static int update_thresholds(struct peci_dimmtemp *priv, int dimm_no)
return 0;
ret = priv->gen_info->read_thresholds(priv, dimm_order, chan_rank, &data);
- if (ret == -ENODATA) /* Use default or previous value */
- return 0;
if (ret)
return ret;
@@ -509,11 +507,11 @@ read_thresholds_icx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u
ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd4, ®_val);
if (ret || !(reg_val & BIT(31)))
- return -ENODATA; /* Use default or previous value */
+ return -ENODATA;
ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd0, ®_val);
if (ret)
- return -ENODATA; /* Use default or previous value */
+ return -ENODATA;
/*
* Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
@@ -546,11 +544,11 @@ read_thresholds_spr(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u
ret = peci_ep_pci_local_read(priv->peci_dev, 0, 30, 0, 2, 0xd4, ®_val);
if (ret || !(reg_val & BIT(31)))
- return -ENODATA; /* Use default or previous value */
+ return -ENODATA;
ret = peci_ep_pci_local_read(priv->peci_dev, 0, 30, 0, 2, 0xd0, ®_val);
if (ret)
- return -ENODATA; /* Use default or previous value */
+ return -ENODATA;
/*
* Device 26, Offset 219a8: IMC 0 channel 0 -> rank 0
--
2.34.1
From: Kan Liang <kan.liang(a)linux.intel.com>
The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of the sub-leaf 1 (enumerate the counter mask),
perf should check the bit 1 (0x2) of EAS, rather than bit 0 (0x1).
The error is not user-visible on bare metal. Because the sub-leaf 0 and
the sub-leaf 1 are always available. However, it may bring issues in a
virtualization environment when a VMM only enumerates the sub-leaf 0.
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 4 ++--
arch/x86/include/asm/perf_event.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5e8521a54474..12eb96219740 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4966,8 +4966,8 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
if (ebx & ARCH_PERFMON_EXT_EQ)
pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
- if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) {
- cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
+ if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
+ cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
&eax, &ebx, &ecx, &edx);
pmu->cntr_mask64 = eax;
pmu->fixed_cntr_mask64 = ebx;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index adaeb8ca3a8a..71e2ae021374 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -197,7 +197,7 @@ union cpuid10_edx {
#define ARCH_PERFMON_EXT_UMASK2 0x1
#define ARCH_PERFMON_EXT_EQ 0x2
#define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1
-#define ARCH_PERFMON_NUM_COUNTER_LEAF 0x1
+#define ARCH_PERFMON_NUM_COUNTER_LEAF BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_BIT)
/*
* Intel Architectural LBR CPUID detection/enumeration details:
--
2.40.1
Recently, a few issues linked to MPTCP have been reported by syzbot. All
the remaining ones are addressed in this series.
- Patch 1: Address "KMSAN: uninit-value in mptcp_incoming_options (2)".
A fix for v5.11.
- Patch 2: Address "WARNING in mptcp_pm_nl_set_flags (2)". A fix for
v5.18.
- Patch 3: Address "WARNING in __mptcp_clean_una (2)". A fix for v6.4,
backported up to v6.1.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (1):
mptcp: pm: only set fullmesh for subflow endp
Paolo Abeni (2):
mptcp: consolidate suboption status
mptcp: handle fastopen disconnect correctly
net/mptcp/options.c | 13 +++++--------
net/mptcp/pm_netlink.c | 3 ++-
net/mptcp/protocol.c | 4 +++-
net/mptcp/protocol.h | 30 ++++++++++++++++--------------
4 files changed, 26 insertions(+), 24 deletions(-)
---
base-commit: 15a901361ec3fb1c393f91880e1cbf24ec0a88bd
change-id: 20250123-net-mptcp-syzbot-issues-ffdbbb9b85d7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The patch titled
Subject: kfence: skip __GFP_THISNODE allocations on NUMA systems
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kfence-skip-__gfp_thisnode-allocations-on-numa-systems.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Marco Elver <elver(a)google.com>
Subject: kfence: skip __GFP_THISNODE allocations on NUMA systems
Date: Fri, 24 Jan 2025 13:01:38 +0100
On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.
Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.
Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver(a)google.com>
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Chistoph Lameter <cl(a)linux.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kfence/core.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/kfence/core.c~kfence-skip-__gfp_thisnode-allocations-on-numa-systems
+++ a/mm/kfence/core.c
@@ -21,6 +21,7 @@
#include <linux/log2.h>
#include <linux/memblock.h>
#include <linux/moduleparam.h>
+#include <linux/nodemask.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/random.h>
@@ -1084,6 +1085,7 @@ void *__kfence_alloc(struct kmem_cache *
* properties (e.g. reside in DMAable memory).
*/
if ((flags & GFP_ZONEMASK) ||
+ ((flags & __GFP_THISNODE) && num_online_nodes() > 1) ||
(s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) {
atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]);
return NULL;
_
Patches currently in -mm which might be from elver(a)google.com are
kfence-skip-__gfp_thisnode-allocations-on-numa-systems.patch
The patch titled
Subject: nilfs2: fix possible int overflows in nilfs_fiemap()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-possible-int-overflows-in-nilfs_fiemap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Subject: nilfs2: fix possible int overflows in nilfs_fiemap()
Date: Sat, 25 Jan 2025 07:20:53 +0900
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.
While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.
Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/inode.c~nilfs2-fix-possible-int-overflows-in-nilfs_fiemap
+++ a/fs/nilfs2/inode.c
@@ -1186,7 +1186,7 @@ int nilfs_fiemap(struct inode *inode, st
if (size) {
if (phys && blkphy << blkbits == phys + size) {
/* The current extent goes on */
- size += n << blkbits;
+ size += (u64)n << blkbits;
} else {
/* Terminate the current extent */
ret = fiemap_fill_next_extent(
@@ -1199,14 +1199,14 @@ int nilfs_fiemap(struct inode *inode, st
flags = FIEMAP_EXTENT_MERGED;
logical = blkoff << blkbits;
phys = blkphy << blkbits;
- size = n << blkbits;
+ size = (u64)n << blkbits;
}
} else {
/* Start a new extent */
flags = FIEMAP_EXTENT_MERGED;
logical = blkoff << blkbits;
phys = blkphy << blkbits;
- size = n << blkbits;
+ size = (u64)n << blkbits;
}
blkoff += n;
}
_
Patches currently in -mm which might be from n.zhandarovich(a)fintech.ru are
nilfs2-fix-possible-int-overflows-in-nilfs_fiemap.patch
Having the exec queue snapshot inside a "GuC CT" section was always
wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
headings and add tile info") tried to fix that bug, but with that also
broke the mesa tool that parses the devcoredump, hence it was reverted
in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool").
With the mesa tool also fixed, this can propagate as a fix on both
kernel and userspace side to avoid unnecessary headache for a debug
feature.
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Julia Filipchuk <julia.filipchuk(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 81dc7795c0651..a7946a76777e7 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -119,11 +119,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
drm_puts(&p, "\n**** GuC CT ****\n");
xe_guc_ct_snapshot_print(ss->guc.ct, &p);
- /*
- * Don't add a new section header here because the mesa debug decoder
- * tool expects the context information to be in the 'GuC CT' section.
- */
- /* drm_puts(&p, "\n**** Contexts ****\n"); */
+ drm_puts(&p, "\n**** Contexts ****\n");
xe_guc_exec_queue_snapshot_print(ss->ge, &p);
drm_puts(&p, "\n**** Job ****\n");
--
2.48.0
From: Lucas De Marchi <lucas.demarchi(a)intel.com>
Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.
The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.
v2:
- added suffix description comment
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Julia Filipchuk <julia.filipchuk(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
drivers/gpu/drm/xe/xe_devcoredump.c | 33 +++++++++++------------------
drivers/gpu/drm/xe/xe_devcoredump.h | 2 +-
drivers/gpu/drm/xe/xe_guc_ct.c | 3 ++-
drivers/gpu/drm/xe/xe_guc_log.c | 4 +++-
4 files changed, 18 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index 81dc7795c0651..6f73b1ba0f2aa 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -395,42 +395,33 @@ int xe_devcoredump_init(struct xe_device *xe)
/**
* xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85
*
- * The output is split to multiple lines because some print targets, e.g. dmesg
- * cannot handle arbitrarily long lines. Note also that printing to dmesg in
- * piece-meal fashion is not possible, each separate call to drm_puts() has a
- * line-feed automatically added! Therefore, the entire output line must be
- * constructed in a local buffer first, then printed in one atomic output call.
+ * The output is split to multiple print calls because some print targets, e.g.
+ * dmesg cannot handle arbitrarily long lines. These targets may add newline
+ * between calls.
*
* There is also a scheduler yield call to prevent the 'task has been stuck for
* 120s' kernel hang check feature from firing when printing to a slow target
* such as dmesg over a serial port.
*
- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.
- *
* @p: the printer object to output to
* @prefix: optional prefix to add to output string
+ * @suffix: optional suffix to add at the end. 0 disables it and is
+ * not added to the output, which is useful when using multiple calls
+ * to dump data to @p
* @blob: the Binary Large OBject to dump out
* @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)
* @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)
*/
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size)
{
const u32 *blob32 = (const u32 *)blob;
char buff[ASCII85_BUFSZ], *line_buff;
size_t line_pos = 0;
- /*
- * Splitting blobs across multiple lines is not compatible with the mesa
- * debug decoder tool. Note that even dropping the explicit '\n' below
- * doesn't help because the GuC log is so big some underlying implementation
- * still splits the lines at 512K characters. So just bail completely for
- * the moment.
- */
- return;
-
#define DMESG_MAX_LINE_LEN 800
-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */
+ /* Always leave space for the suffix char and the \0 */
+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */
if (size & 3)
drm_printf(p, "Size not word aligned: %zu", size);
@@ -462,7 +453,6 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
line_pos += strlen(line_buff + line_pos);
if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
drm_puts(p, line_buff);
@@ -474,10 +464,11 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
}
}
+ if (suffix)
+ line_buff[line_pos++] = suffix;
+
if (line_pos) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
-
drm_puts(p, line_buff);
}
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index 6a17e6d601022..5391a80a4d1ba 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -29,7 +29,7 @@ static inline int xe_devcoredump_init(struct xe_device *xe)
}
#endif
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size);
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 8b65c5e959cc2..50c8076b51585 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1724,7 +1724,8 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot,
snapshot->g2h_outstanding);
if (snapshot->ctb)
- xe_print_blob_ascii85(p, "CTB data", snapshot->ctb, 0, snapshot->ctb_size);
+ xe_print_blob_ascii85(p, "CTB data", '\n',
+ snapshot->ctb, 0, snapshot->ctb_size);
} else {
drm_puts(p, "CT disabled\n");
}
diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
index 80151ff6a71f8..44482ea919924 100644
--- a/drivers/gpu/drm/xe/xe_guc_log.c
+++ b/drivers/gpu/drm/xe/xe_guc_log.c
@@ -207,8 +207,10 @@ void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_
remain = snapshot->size;
for (i = 0; i < snapshot->num_chunks; i++) {
size_t size = min(GUC_LOG_CHUNK_SIZE, remain);
+ const char *prefix = i ? NULL : "Log data";
+ char suffix = i == snapshot->num_chunks - 1 ? '\n' : 0;
- xe_print_blob_ascii85(p, i ? NULL : "Log data", snapshot->copy[i], 0, size);
+ xe_print_blob_ascii85(p, prefix, suffix, snapshot->copy[i], 0, size);
remain -= size;
}
}
--
2.48.1
The patch titled
Subject: mm: kmemleak: fix upper boundary check for physical address objects
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-kmemleak-fix-upper-boundary-check-for-physical-address-objects.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Catalin Marinas <catalin.marinas(a)arm.com>
Subject: mm: kmemleak: fix upper boundary check for physical address objects
Date: Mon, 27 Jan 2025 18:42:33 +0000
Memblock allocations are registered by kmemleak separately, based on their
physical address. During the scanning stage, it checks whether an object
is within the min_low_pfn and max_low_pfn boundaries and ignores it
otherwise.
With the recent addition of __percpu pointer leak detection (commit
6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak
started reporting leaks in setup_zone_pageset() and
setup_per_cpu_pageset(). These were caused by the node_data[0] object
(initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn)
boundary. The non-strict upper boundary check introduced by commit
84c326299191 ("mm: kmemleak: check physical address when scan") causes the
pg_data_t object to be ignored (not scanned) and the __percpu pointers it
contains to be reported as leaks.
Make the max_low_pfn upper boundary check strict when deciding whether to
ignore a physical address object and not scan it.
Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com
Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan")
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Cc: Patrick Wang <patrick.wang.shcn(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [6.0.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kmemleak.c~mm-kmemleak-fix-upper-boundary-check-for-physical-address-objects
+++ a/mm/kmemleak.c
@@ -1689,7 +1689,7 @@ static void kmemleak_scan(void)
unsigned long phys = object->pointer;
if (PHYS_PFN(phys) < min_low_pfn ||
- PHYS_PFN(phys + object->size) >= max_low_pfn)
+ PHYS_PFN(phys + object->size) > max_low_pfn)
__paint_it(object, KMEMLEAK_BLACK);
}
_
Patches currently in -mm which might be from catalin.marinas(a)arm.com are
mm-kmemleak-fix-upper-boundary-check-for-physical-address-objects.patch
When attaching uretprobes to processes running inside docker, the attached
process is segfaulted when encountering the retprobe.
The reason is that now that uretprobe is a system call the default seccomp
filters in docker block it as they only allow a specific set of known
syscalls. This is true for other userspace applications which use seccomp
to control their syscall surface.
Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.
Pass this systemcall through seccomp without depending on configuration.
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Rafael Buchbinder <rafi(a)rbk.io>
Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2A…
Cc: stable(a)vger.kernel.org
Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com>
---
The following reproduction script synthetically demonstrates the problem:
cat > /tmp/x.c << EOF
char *syscalls[] = {
"write",
"exit_group",
"fstat",
};
__attribute__((noinline)) int probed(void)
{
printf("Probed\n");
return 1;
}
void apply_seccomp_filter(char **syscalls, int num_syscalls)
{
scmp_filter_ctx ctx;
ctx = seccomp_init(SCMP_ACT_KILL);
for (int i = 0; i < num_syscalls; i++) {
seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
seccomp_syscall_resolve_name(syscalls[i]), 0);
}
seccomp_load(ctx);
seccomp_release(ctx);
}
int main(int argc, char *argv[])
{
int num_syscalls = sizeof(syscalls) / sizeof(syscalls[0]);
apply_seccomp_filter(syscalls, num_syscalls);
probed();
return 0;
}
EOF
cat > /tmp/trace.bt << EOF
uretprobe:/tmp/x:probed
{
printf("ret=%d\n", retval);
}
EOF
gcc -o /tmp/x /tmp/x.c -lseccomp
/usr/bin/bpftrace /tmp/trace.bt &
sleep 5 # wait for uretprobe attach
/tmp/x
pkill bpftrace
rm /tmp/x /tmp/x.c /tmp/trace.bt
---
kernel/seccomp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 385d48293a5f..10a55c9b5c18 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1359,6 +1359,11 @@ int __secure_computing(const struct seccomp_data *sd)
this_syscall = sd ? sd->nr :
syscall_get_nr(current, current_pt_regs());
+#ifdef CONFIG_X86_64
+ if (unlikely(this_syscall == __NR_uretprobe) && !in_ia32_syscall())
+ return 0;
+#endif
+
switch (mode) {
case SECCOMP_MODE_STRICT:
__secure_computing_strict(this_syscall); /* may call do_exit */
--
2.43.0
This small series adds support for non-coherent video capture buffers
on Rockchip ISP V1. Patch 1 fixes cache management for dmabuf's
allocated by dma-contig allocator. Patch 2 allows non-coherent
allocations on the rkisp1 capture queue. Some timing measurements are
provided in the commit message of patch 2.
Signed-off-by: Mikhail Rudenko <mike.rudenko(a)gmail.com>
---
Changes in v2:
- Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers.
- Add cache management timing information to patch 2 commit message.
- Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@g…
---
Mikhail Rudenko (2):
media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
media: rkisp1: Allow non-coherent video capture buffers
drivers/media/common/videobuf2/videobuf2-dma-contig.c | 14 ++++++++++++++
drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 1 +
2 files changed, 15 insertions(+)
---
base-commit: 94794b5ce4d90ab134b0b101a02fddf6e74c437d
change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba
Best regards,
--
Mikhail Rudenko <mike.rudenko(a)gmail.com>
Syzkaller has reported a warning in to_nfit_bus_uuid(): "only secondary
bus families can be translated". This warning is emited if the argument
is equal to NVDIMM_BUS_FAMILY_NFIT == 0. Function acpi_nfit_ctl() first
verifies that a user-provided value call_pkg->nd_family of type u64 is
not equal to 0. Then the value is converted to int, and only after that
is compared to NVDIMM_BUS_FAMILY_MAX. This can lead to passing an invalid
argument to acpi_nfit_ctl(), if call_pkg->nd_family is non-zero, while
the lower 32 bits are zero.
All checks of the input value should be applied to the original variable
call_pkg->nd_family.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 6450ddbd5d8e ("ACPI: NFIT: Define runtime firmware activation commands")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+c80d8dc0d9fa81a3cd8c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c80d8dc0d9fa81a3cd8c
Signed-off-by: Murad Masimov <m.masimov(a)mt-integration.ru>
---
drivers/acpi/nfit/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index a5d47819b3a4..ae035b93da08 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -485,7 +485,7 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm,
cmd_mask = nd_desc->cmd_mask;
if (cmd == ND_CMD_CALL && call_pkg->nd_family) {
family = call_pkg->nd_family;
- if (family > NVDIMM_BUS_FAMILY_MAX ||
+ if (call_pkg->nd_family > NVDIMM_BUS_FAMILY_MAX ||
!test_bit(family, &nd_desc->bus_family_mask))
return -EINVAL;
family = array_index_nospec(family,
--
2.39.2
The CPU PMU in Apple SoCs can be configured to fire its interrupt in one
of several ways, and since Apple A11 one of the method is FIQ. Only handle
the PMC interrupt as a FIQ when the CPU PMU has been configured to fire
FIQs.
Cc: stable(a)vger.kernel.org
Fixes: c7708816c944 ("irqchip/apple-aic: Wire PMU interrupts")
Signed-off-by: Nick Chan <towinchenmi(a)gmail.com>
---
Changes in v2:
Fix the conditional to have the intented behavior of evaluating to true
only when both PMCR0_IMODE is PMCR0_IMODE_FIQ and PMCR0_IACT is set by
reverting the conditional to how it is before c7708816c944.
Link to v1: https://lore.kernel.org/asahi/20250117170227.45243-1-towinchenmi@gmail.com/T
- Nick Chan
drivers/irqchip/irq-apple-aic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c
index da5250f0155c..2b1684c60e3c 100644
--- a/drivers/irqchip/irq-apple-aic.c
+++ b/drivers/irqchip/irq-apple-aic.c
@@ -577,7 +577,8 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs)
AIC_FIQ_HWIRQ(AIC_TMR_EL02_VIRT));
}
- if (read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & PMCR0_IACT) {
+ if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) ==
+ (FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_FIQ) | PMCR0_IACT)) {
int irq;
if (cpumask_test_cpu(smp_processor_id(),
&aic_irqc->fiq_aff[AIC_CPU_PMU_P]->aff))
base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
--
2.48.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Any active plane needs to have its crtc included in the atomic
state. For planes enabled via uapi that is all handler in the core.
But when we use a plane for joiner the uapi code things the plane
is disabled and therefore doesn't have a crtc. So we need to pull
those in by hand. We do it first thing in
intel_joiner_add_affected_crtcs() so that any newly added crtc will
subsequently pull in all of its joined crtcs as well.
The symptoms from failing to do this are:
- duct tape in the form of commit 1d5b09f8daf8 ("drm/i915: Fix NULL
ptr deref by checking new_crtc_state")
- the plane's hw state will get overwritten by the disabled
uapi state if it can't find the uapi counterpart plane in
the atomic state from where it should copy the correct state
Cc: stable(a)vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 7d68d652c1bc..2b31c8f4b7cd 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -6682,12 +6682,30 @@ static int intel_async_flip_check_hw(struct intel_atomic_state *state, struct in
static int intel_joiner_add_affected_crtcs(struct intel_atomic_state *state)
{
struct drm_i915_private *i915 = to_i915(state->base.dev);
+ const struct intel_plane_state *plane_state;
struct intel_crtc_state *crtc_state;
+ struct intel_plane *plane;
struct intel_crtc *crtc;
u8 affected_pipes = 0;
u8 modeset_pipes = 0;
int i;
+ /*
+ * Any plane which is in use by the joiner needs its crtc.
+ * Pull those in first as this will not have happened yet
+ * if the plane remains disabled according to uapi.
+ */
+ for_each_new_intel_plane_in_state(state, plane, plane_state, i) {
+ crtc = to_intel_crtc(plane_state->hw.crtc);
+ if (!crtc)
+ continue;
+
+ crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+ }
+
+ /* Now pull in all joined crtcs */
for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
affected_pipes |= crtc_state->joiner_pipes;
if (intel_crtc_needs_modeset(crtc_state))
--
2.45.3
From: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
According to the Rev.1.20 hardware manual for the RZ/Five SoC, the clock
source for HP is derived from PLL6 divided by 2. This patch corrects the
implementation by configuring HP as a fixed clock source instead of a MUX.
The `CPG_PL6_ETH_SSEL` register, which is available on the RZ/G2UL SoC, is
not present on the RZ/Five SoC, necessitating this change.
Fixes: 95d48d270305ad2c ("clk: renesas: r9a07g043: Add support for RZ/Five SoC")
Cc: stable(a)vger.kernel.org
Reported-by: Hien Huynh <hien.huynh.px(a)renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
---
drivers/clk/renesas/r9a07g043-cpg.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/clk/renesas/r9a07g043-cpg.c b/drivers/clk/renesas/r9a07g043-cpg.c
index b97e9a7b9708..da5aa015790c 100644
--- a/drivers/clk/renesas/r9a07g043-cpg.c
+++ b/drivers/clk/renesas/r9a07g043-cpg.c
@@ -138,7 +138,11 @@ static const struct cpg_core_clk r9a07g043_core_clks[] __initconst = {
DEF_DIV("P2", R9A07G043_CLK_P2, CLK_PLL3_DIV2_4_2, DIVPL3A, dtable_1_32),
DEF_FIXED("M0", R9A07G043_CLK_M0, CLK_PLL3_DIV2_4, 1, 1),
DEF_FIXED("ZT", R9A07G043_CLK_ZT, CLK_PLL3_DIV2_4_2, 1, 1),
+#ifdef CONFIG_ARM64
DEF_MUX("HP", R9A07G043_CLK_HP, SEL_PLL6_2, sel_pll6_2),
+#else
+ DEF_FIXED("HP", R9A07G043_CLK_HP, CLK_PLL6_250, 1, 1),
+#endif
DEF_FIXED("SPI0", R9A07G043_CLK_SPI0, CLK_DIV_PLL3_C, 1, 2),
DEF_FIXED("SPI1", R9A07G043_CLK_SPI1, CLK_DIV_PLL3_C, 1, 4),
DEF_SD_MUX("SD0", R9A07G043_CLK_SD0, SEL_SDHI0, SEL_SDHI0_STS, sel_sdhi,
--
2.43.0
The ASTDP transmitter sometimes takes up to second for enabling the
video signal, while the timeout is only 200 msec. This results in a
kernel error message. Increase the timeout to 1 second. An example
of the error message is shown below.
[ 697.084433] ------------[ cut here ]------------
[ 697.091115] ast 0000:02:00.0: [drm] drm_WARN_ON(!__ast_dp_wait_enable(ast, enabled))
[ 697.091233] WARNING: CPU: 1 PID: 160 at drivers/gpu/drm/ast/ast_dp.c:232 ast_dp_set_enable+0x123/0x140 [ast]
[...]
[ 697.272469] RIP: 0010:ast_dp_set_enable+0x123/0x140 [ast]
[...]
[ 697.415283] Call Trace:
[ 697.420727] <TASK>
[ 697.425908] ? show_trace_log_lvl+0x196/0x2c0
[ 697.433304] ? show_trace_log_lvl+0x196/0x2c0
[ 697.440693] ? drm_atomic_helper_commit_modeset_enables+0x30a/0x470
[ 697.450115] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.458059] ? __warn.cold+0xaf/0xca
[ 697.464713] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.472633] ? report_bug+0x134/0x1d0
[ 697.479544] ? handle_bug+0x58/0x90
[ 697.486127] ? exc_invalid_op+0x13/0x40
[ 697.492975] ? asm_exc_invalid_op+0x16/0x20
[ 697.500224] ? preempt_count_sub+0x14/0xc0
[ 697.507473] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.515377] ? ast_dp_set_enable+0x123/0x140 [ast]
[ 697.523227] drm_atomic_helper_commit_modeset_enables+0x30a/0x470
[ 697.532388] drm_atomic_helper_commit_tail+0x58/0x90
[ 697.540400] ast_mode_config_helper_atomic_commit_tail+0x30/0x40 [ast]
[ 697.550009] commit_tail+0xfe/0x1d0
[ 697.556547] drm_atomic_helper_commit+0x198/0x1c0
This is a cosmetical problem. Enabling the video signal still works
even with the error message. The problem has always been present, but
only recent versions of the ast driver warn about missing the timeout.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: 4e29cc7c5c67 ("drm/ast: astdp: Replace ast_dp_set_on_off()")
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Jocelyn Falempe <jfalempe(a)redhat.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.13+
---
drivers/gpu/drm/ast/ast_dp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ast/ast_dp.c b/drivers/gpu/drm/ast/ast_dp.c
index 30aad5c0112a1..2d7482a65f62a 100644
--- a/drivers/gpu/drm/ast/ast_dp.c
+++ b/drivers/gpu/drm/ast/ast_dp.c
@@ -201,7 +201,7 @@ static bool __ast_dp_wait_enable(struct ast_device *ast, bool enabled)
if (enabled)
vgacrdf_test |= AST_IO_VGACRDF_DP_VIDEO_ENABLE;
- for (i = 0; i < 200; ++i) {
+ for (i = 0; i < 1000; ++i) {
if (i)
mdelay(1);
vgacrdf = ast_get_index_reg_mask(ast, AST_IO_VGACRI, 0xdf,
--
2.48.1
From: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
Size of variable args->pitch equals four bytes.
Size of variable args->height equals four bytes.
The expression args->pitch * args->height is currently being evaluated
using 32-bit arithmetic. During multiplication, an overflow may occur.
Above the expression args->pitch * args->height there is a check for its
minimum value. However, if args->pitch has a value greater than this
minimum, that check is insufficient.
Since a value of type 'u64' is used to store the eventual result,
cast the first variable of each expression to 'u64' to provide the
compiler with complete information about the appropriate arithmetic to use.
This is similar to commit 0f8f8a643000 ("drm/i915/gem: Detect overflow
in calculating dumb buffer size").
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 6d1782919dc9 ("drm/cma: Introduce drm_gem_cma_dumb_create_internal()")
Signed-off-by: Andrey Vatoropin <a.vatoropin(a)crpt.ru>
---
drivers/gpu/drm/drm_gem_dma_helper.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c b/drivers/gpu/drm/drm_gem_dma_helper.c
index 870b90b78bc4..a8862f6f702a 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -272,8 +272,8 @@ int drm_gem_dma_dumb_create_internal(struct drm_file *file_priv,
if (args->pitch < min_pitch)
args->pitch = min_pitch;
- if (args->size < args->pitch * args->height)
- args->size = args->pitch * args->height;
+ if (args->size < mul_u32_u32(args->pitch, args->height))
+ args->size = mul_u32_u32(args->pitch, args->height);
dma_obj = drm_gem_dma_create_with_handle(file_priv, drm, args->size,
&args->handle);
--
2.43.0
Previously, we incorrectly cast irq_domain->host_data directly to
mvebu_icu_msi_data. However, host_data actually stores a structure of
type msi_domain_info.
This incorrect assumption caused issues such as the thermal sensors of
the CP110 platform malfunctioning. Specifically, the translation of the
SEI interrupt to IRQ_TYPE_EDGE_RISING failed, preventing proper
interrupt handling. The following error was observed:
genirq: Setting trigger mode 4 for irq 85 failed (irq_chip_set_type_parent+0x0/0x34)
armada_thermal f2400000.system-controller:thermal-sensor@70: Cannot request threaded IRQ 85
This commit resolves the issue by first casting host_data to
msi_domain_info and then accessing the mvebu_icu_msi_data through
msi_domain_info->chip_data.
Cc: stable(a)vger.kernel.org
Fixes: d929e4db22b6 ("irqchip/irq-mvebu-icu: Prepare for real per device MSI")
Signed-off-by: Stefan Eichenberger <eichest(a)gmail.com>
---
Changes in v2:
* This patch is a v2 because it addresses the same issue as this patch:
https://lore.kernel.org/all/20241217111623.92625-1-eichest@gmail.com/
* This time it addresses the underlying issue instead of adding a
workaround (thanks to Thomas)
drivers/irqchip/irq-mvebu-icu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-mvebu-icu.c b/drivers/irqchip/irq-mvebu-icu.c
index b337f6c05f18..4eebed39880a 100644
--- a/drivers/irqchip/irq-mvebu-icu.c
+++ b/drivers/irqchip/irq-mvebu-icu.c
@@ -68,7 +68,8 @@ static int mvebu_icu_translate(struct irq_domain *d, struct irq_fwspec *fwspec,
unsigned long *hwirq, unsigned int *type)
{
unsigned int param_count = static_branch_unlikely(&legacy_bindings) ? 3 : 2;
- struct mvebu_icu_msi_data *msi_data = d->host_data;
+ struct msi_domain_info *info = d->host_data;
+ struct mvebu_icu_msi_data *msi_data = info->chip_data;
struct mvebu_icu *icu = msi_data->icu;
/* Check the count of the parameters in dt */
--
2.45.2
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 01cfc84024e9a6b619696a35d2e5662255001cd0
Gitweb: https://git.kernel.org/tip/01cfc84024e9a6b619696a35d2e5662255001cd0
Author: Waiman Long <longman(a)redhat.com>
AuthorDate: Fri, 24 Jan 2025 20:54:42 -05:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 27 Jan 2025 10:30:59 +01:00
clocksource: Use get_random_bytes() in clocksource_verify_choose_cpus()
The following bug report happened in a PREEMPT_RT kernel.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kwatchdog/2012:
#0: ffffffff8af2da60 (clocksource_mutex){+.+.}-{3:3}, at: clocksource_watchdog_kthread+0x13/0x50
#1: ffffffff8aa8d4d0 (cpu_hotplug_lock){++++}-{0:0}, at: clocksource_verify_percpu.part.0+0x5c/0x330
#2: ffff9fe02f5f33e0 ((batched_entropy_u32.lock)){+.+.}-{2:2}, at: get_random_u32+0x4f/0x110
Preemption disabled at:
[<ffffffff88c1fe56>] clocksource_verify_percpu.part.0+0x66/0x330
CPU: 33 PID: 2012 Comm: kwatchdog Not tainted 5.14.0-503.23.1.el9_5.x86_64+rt-debug #1
Call Trace:
<TASK>
__might_resched.cold+0xf4/0x12f
rt_spin_lock+0x4c/0x100
get_random_u32+0x4f/0x110
clocksource_verify_choose_cpus+0xab/0x1a0
clocksource_verify_percpu.part.0+0x6b/0x330
__clocksource_watchdog_kthread+0x193/0x1a0
clocksource_watchdog_kthread+0x18/0x50
kthread+0x114/0x140
ret_from_fork+0x2c/0x50
</TASK>
This happens due to the fact that get_random_u32() is called in
clocksource_verify_choose_cpus() with preemption disabled. If crng_ready()
is true by the time get_random_u32() is called, The batched_entropy_32
local lock will be acquired. In a PREEMPT_RT enabled kernel, it is a
rtmutex, which can't be acquireq with preemption disabled.
Fix this problem by using the less random get_random_bytes() function which
will not take any lock. In fact, it has the same random-ness as
get_random_u32_below() when crng_ready() is false.
Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Suggested-by: Paul E. McKenney <paulmck(a)kernel.org>
Reviewed-by: Paul E. McKenney <paulmck(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250125015442.3740588-2-longman@redhat.com
---
kernel/time/clocksource.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 77d9566..659c4b7 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -340,9 +340,13 @@ static void clocksource_verify_choose_cpus(void)
* and no replacement CPU is selected. This gracefully handles
* situations where verify_n_cpus is greater than the number of
* CPUs that are currently online.
+ *
+ * The get_random_bytes() is used here to avoid taking lock with
+ * preemption disabled.
*/
for (i = 1; i < n; i++) {
- cpu = get_random_u32_below(nr_cpu_ids);
+ get_random_bytes(&cpu, sizeof(cpu));
+ cpu %= nr_cpu_ids;
cpu = cpumask_next(cpu - 1, cpu_online_mask);
if (cpu >= nr_cpu_ids)
cpu = cpumask_first(cpu_online_mask);
From: Aradhya Bhatia <a-bhatia1(a)ti.com>
Fix the OF node pointer passed to the of_drm_find_bridge() call to find
the next bridge in the display chain.
The code to find the next panel (and create its panel-bridge) works
fine, but to find the next (non-panel) bridge does not.
To find the next bridge in the pipeline, we need to pass "np" - the OF
node pointer of the next entity in the devicetree chain. Passing
"of_node" to of_drm_find_bridge (which is what the code does currently)
will fetch the bridge for the cdns-dsi which is not what's required.
Fix that.
Fixes: e19233955d9e ("drm/bridge: Add Cadence DSI driver")
Cc: Stable List <stable(a)vger.kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Signed-off-by: Aradhya Bhatia <a-bhatia1(a)ti.com>
---
drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
index c7a0247e06ad..2f897ea5e80a 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
@@ -952,7 +952,7 @@ static int cdns_dsi_attach(struct mipi_dsi_host *host,
bridge = drm_panel_bridge_add_typed(panel,
DRM_MODE_CONNECTOR_DSI);
} else {
- bridge = of_drm_find_bridge(dev->dev.of_node);
+ bridge = of_drm_find_bridge(np);
if (!bridge)
bridge = ERR_PTR(-EINVAL);
}
--
2.34.1
From: Aradhya Bhatia <a-bhatia1(a)ti.com>
Once the DSI Link and DSI Phy are initialized, the code needs to wait
for Clk and Data Lanes to be ready, before continuing configuration.
This is in accordance with the DSI Start-up procedure, found in the
Technical Reference Manual of Texas Instrument's J721E SoC[0] which
houses this DSI TX controller.
If the previous bridge (or crtc/encoder) are configured pre-maturely,
the input signal FIFO gets corrupt. This introduces a color-shift on the
display.
Allow the driver to wait for the clk and data lanes to get ready during
DSI enable.
[0]: See section 12.6.5.7.3 "Start-up Procedure" in J721E SoC TRM
TRM Link: http://www.ti.com/lit/pdf/spruil1
Fixes: e19233955d9e ("drm/bridge: Add Cadence DSI driver")
Cc: Stable List <stable(a)vger.kernel.org>
Tested-by: Dominik Haller <d.haller(a)phytec.de>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Signed-off-by: Aradhya Bhatia <a-bhatia1(a)ti.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia(a)linux.dev>
---
drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
index 87921a748cdb..6a77ca36cb9d 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
@@ -769,7 +769,7 @@ static void cdns_dsi_bridge_enable(struct drm_bridge *bridge)
struct phy_configure_opts_mipi_dphy *phy_cfg = &output->phy_opts.mipi_dphy;
unsigned long tx_byte_period;
struct cdns_dsi_cfg dsi_cfg;
- u32 tmp, reg_wakeup, div;
+ u32 tmp, reg_wakeup, div, status;
int nlanes;
if (WARN_ON(pm_runtime_get_sync(dsi->base.dev) < 0))
@@ -786,6 +786,19 @@ static void cdns_dsi_bridge_enable(struct drm_bridge *bridge)
cdns_dsi_hs_init(dsi);
cdns_dsi_init_link(dsi);
+ /*
+ * Now that the DSI Link and DSI Phy are initialized,
+ * wait for the CLK and Data Lanes to be ready.
+ */
+ tmp = CLK_LANE_RDY;
+ for (int i = 0; i < nlanes; i++)
+ tmp |= DATA_LANE_RDY(i);
+
+ if (readl_poll_timeout(dsi->regs + MCTL_MAIN_STS, status,
+ (tmp == (status & tmp)), 100, 500000))
+ dev_err(dsi->base.dev,
+ "Timed Out: DSI-DPhy Clock and Data Lanes not ready.\n");
+
writel(HBP_LEN(dsi_cfg.hbp) | HSA_LEN(dsi_cfg.hsa),
dsi->regs + VID_HSIZE1);
writel(HFP_LEN(dsi_cfg.hfp) | HACT_LEN(dsi_cfg.hact),
--
2.34.1
From: Aradhya Bhatia <a-bhatia1(a)ti.com>
The crtc_* mode parameters do not get generated (duplicated in this
case) from the regular parameters before the mode validation phase
begins.
The rest of the code conditionally uses the crtc_* parameters only
during the bridge enable phase, but sticks to the regular parameters
for mode validation. In this singular instance, however, the driver
tries to use the crtc_clock parameter even during the mode validation,
causing the validation to fail.
Allow the D-Phy config checks to use mode->clock instead of
mode->crtc_clock during mode_valid checks, like everywhere else in the
driver.
Fixes: fced5a364dee ("drm/bridge: cdns: Convert to phy framework")
Cc: Stable List <stable(a)vger.kernel.org>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Aradhya Bhatia <a-bhatia1(a)ti.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia(a)linux.dev>
---
drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
index b0a1a6774ea6..19cc8734a4c8 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-dsi-core.c
@@ -568,13 +568,14 @@ static int cdns_dsi_check_conf(struct cdns_dsi *dsi,
struct phy_configure_opts_mipi_dphy *phy_cfg = &output->phy_opts.mipi_dphy;
unsigned long dsi_hss_hsa_hse_hbp;
unsigned int nlanes = output->dev->lanes;
+ int mode_clock = (mode_valid_check ? mode->clock : mode->crtc_clock);
int ret;
ret = cdns_dsi_mode2cfg(dsi, mode, dsi_cfg, mode_valid_check);
if (ret)
return ret;
- phy_mipi_dphy_get_default_config(mode->crtc_clock * 1000,
+ phy_mipi_dphy_get_default_config(mode_clock * 1000,
mipi_dsi_pixel_format_to_bpp(output->dev->format),
nlanes, phy_cfg);
--
2.34.1
From: Thomas Gleixner <tglx(a)linutronix.de>
commit fe513c2ef0a172a58f158e2e70465c4317f0a9a2 upstream.
static_call_module_notify() triggers a WARN_ON(), when memory allocation
fails in __static_call_add_module().
That's not really justified, because the failure case must be correctly
handled by the well known call chain and the error code is passed
through to the initiating userspace application.
A memory allocation fail is not a fatal problem, but the WARN_ON() takes
the machine out when panic_on_warn is set.
Replace it with a pr_warn().
Fixes: 9183c3f9ed71 ("static_call: Add inline static call infrastructure")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lkml.kernel.org/r/8734mf7pmb.ffs@tglx
Signed-off-by: Alexey Nepomnyashih <sdl(a)nppct.ru>
---
Backport to fix CVE-2024-49954
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-49954
---
kernel/static_call.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/static_call.c b/kernel/static_call.c
index e9408409eb46..a008250e2533 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -431,7 +431,7 @@ static int static_call_module_notify(struct notifier_block *nb,
case MODULE_STATE_COMING:
ret = static_call_add_module(mod);
if (ret) {
- WARN(1, "Failed to allocate memory for static calls");
+ pr_warn("Failed to allocate memory for static calls\n");
static_call_del_module(mod);
}
break;
--
2.43.0
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[ To fix CVE-2024-27041, I fix the merge conflict by still using macro
CONFIG_DRM_AMD_DC_HDCP in the first adev->dm.dc check. ]
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8dc0f70df24f..7b4d44dcb343 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1882,14 +1882,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b2591c07166..53f1deb049ec 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5264,6 +5264,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5514,7 +5516,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
sb->s_bdev->bd_super = sb;
--
2.25.1
From: "Luis Henriques (SUSE)" <luis.henriques(a)linux.dev>
[ commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream ]
The following kernel trace can be triggered with fstest generic/629 when
executed against a filesystem with fast-commit feature enabled:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x66/0x90
register_lock_class+0x759/0x7d0
__lock_acquire+0x85/0x2630
? __find_get_block+0xb4/0x380
lock_acquire+0xd1/0x2d0
? __ext4_journal_get_write_access+0xd5/0x160
_raw_spin_lock+0x33/0x40
? __ext4_journal_get_write_access+0xd5/0x160
__ext4_journal_get_write_access+0xd5/0x160
ext4_reserve_inode_write+0x61/0xb0
__ext4_mark_inode_dirty+0x79/0x270
? ext4_ext_replay_set_iblocks+0x2f8/0x450
ext4_ext_replay_set_iblocks+0x330/0x450
ext4_fc_replay+0x14c8/0x1540
? jread+0x88/0x2e0
? rcu_is_watching+0x11/0x40
do_one_pass+0x447/0xd00
jbd2_journal_recover+0x139/0x1b0
jbd2_journal_load+0x96/0x390
ext4_load_and_init_journal+0x253/0xd40
ext4_fill_super+0x2cc6/0x3180
...
In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in
function ext4_check_bdev_write_error(). Unfortunately, at this point this
spinlock has not been initialized yet. Moving it's initialization to an
earlier point in __ext4_fill_super() fixes this splat.
Signed-off-by: Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
Signed-off-by: Imkanmod Khan <imkanmodkhan(a)gmail.com>
---
fs/ext4/super.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 71ced0ada9a2..f019ce64eba4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5366,6 +5366,8 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
+ spin_lock_init(&sbi->s_bdev_wb_lock);
+
ext4_fast_commit_init(sb);
sb->s_root = NULL;
@@ -5586,7 +5588,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
* Save the original bdev mapping's wb_err value which could be
* used to detect the metadata async write error.
*/
- spin_lock_init(&sbi->s_bdev_wb_lock);
errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
&sbi->s_bdev_wb_err);
EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
--
2.25.1
From: David Woodhouse <dwmw(a)amazon.co.uk>
[ Upstream commit 4b5bc2ec9a239bce261ffeafdd63571134102323 ]
Now that the following fix:
d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.
( It's not clear that that was the reason for x86_64 kexec doing it that
way in the first place either; there were no comments to that effect and
it seems to have been the case even before PTI came along. It looks like
it was just a happy accident which prevented memory corruption on kexec. )
Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/include/asm/kexec.h | 18 +++++++++---
arch/x86/kernel/machine_kexec_64.c | 45 ++++++++++++++++--------------
2 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 256eee99afc8f..e2e1ec99c9998 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -16,6 +16,7 @@
# define PAGES_NR 4
#endif
+# define KEXEC_CONTROL_PAGE_SIZE 4096
# define KEXEC_CONTROL_CODE_MAX_SIZE 2048
#ifndef __ASSEMBLY__
@@ -44,7 +45,6 @@ struct kimage;
/* Maximum address we can use for the control code buffer */
# define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
-# define KEXEC_CONTROL_PAGE_SIZE 4096
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_386
@@ -59,9 +59,6 @@ struct kimage;
/* Maximum address we can use for the control pages */
# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
-/* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
-
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_X86_64
#endif
@@ -146,6 +143,19 @@ struct kimage_arch {
};
#else
struct kimage_arch {
+ /*
+ * This is a kimage control page, as it must not overlap with either
+ * source or destination address ranges.
+ */
+ pgd_t *pgd;
+ /*
+ * The virtual mapping of the control code page itself is used only
+ * during the transition, while the current kernel's pages are all
+ * in place. Thus the intermediate page table pages used to map it
+ * are not control pages, but instead just normal pages obtained
+ * with get_zeroed_page(). And have to be tracked (below) so that
+ * they can be freed.
+ */
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 24b6eaacc81eb..5d61a342871b5 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -149,7 +149,8 @@ static void free_transition_pgtable(struct kimage *image)
image->arch.pte = NULL;
}
-static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd,
+ unsigned long control_page)
{
pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
unsigned long vaddr, paddr;
@@ -160,7 +161,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+ paddr = control_page;
pgd += pgd_index(vaddr);
if (!pgd_present(*pgd)) {
p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
@@ -219,7 +220,7 @@ static void *alloc_pgt_page(void *data)
return p;
}
-static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
+static int init_pgtable(struct kimage *image, unsigned long control_page)
{
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
@@ -228,12 +229,12 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
- pgd_t *level4p;
int result;
int i;
- level4p = (pgd_t *)__va(start_pgtable);
- clear_page(level4p);
+ image->arch.pgd = alloc_pgt_page(image);
+ if (!image->arch.pgd)
+ return -ENOMEM;
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag |= _PAGE_ENC;
@@ -247,8 +248,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
}
@@ -263,8 +264,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = image->segment[i].mem;
mend = mstart + image->segment[i].memsz;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
@@ -274,15 +275,19 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
* Prepare EFI systab and ACPI tables for kexec kernel since they are
* not covered by pfn_mapped.
*/
- result = map_efi_systab(&info, level4p);
+ result = map_efi_systab(&info, image->arch.pgd);
if (result)
return result;
- result = map_acpi_tables(&info, level4p);
+ result = map_acpi_tables(&info, image->arch.pgd);
if (result)
return result;
- return init_transition_pgtable(image, level4p);
+ /*
+ * This must be last because the intermediate page table pages it
+ * allocates will not be control pages and may overlap the image.
+ */
+ return init_transition_pgtable(image, image->arch.pgd, control_page);
}
static void load_segments(void)
@@ -299,14 +304,14 @@ static void load_segments(void)
int machine_kexec_prepare(struct kimage *image)
{
- unsigned long start_pgtable;
+ unsigned long control_page;
int result;
/* Calculate the offsets */
- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
+ control_page = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
/* Setup the identity mapped 64bit page table */
- result = init_pgtable(image, start_pgtable);
+ result = init_pgtable(image, control_page);
if (result)
return result;
@@ -353,13 +358,12 @@ void machine_kexec(struct kimage *image)
#endif
}
- control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ control_page = page_address(image->control_code_page);
__memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
- page_list[PA_TABLE_PAGE] =
- (unsigned long)__pa(page_address(image->control_code_page));
+ page_list[PA_TABLE_PAGE] = (unsigned long)__pa(image->arch.pgd);
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
@@ -578,8 +582,7 @@ static void kexec_mark_crashkres(bool protect)
/* Don't touch the control code page used in crash_kexec().*/
control = PFN_PHYS(page_to_pfn(kexec_crash_image->control_code_page));
- /* Control code page is located in the 2nd page. */
- kexec_mark_range(crashk_res.start, control + PAGE_SIZE - 1, protect);
+ kexec_mark_range(crashk_res.start, control - 1, protect);
control += KEXEC_CONTROL_PAGE_SIZE;
kexec_mark_range(control, crashk_res.end, protect);
}
--
2.39.5
From: David Woodhouse <dwmw(a)amazon.co.uk>
[ Upstream commit 4b5bc2ec9a239bce261ffeafdd63571134102323 ]
Now that the following fix:
d0ceea662d45 ("x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables")
stops kernel_ident_mapping_init() from scribbling over the end of a
4KiB PGD by assuming the following 4KiB will be a userspace PGD,
there's no good reason for the kexec PGD to be part of a single
8KiB allocation with the control_code_page.
( It's not clear that that was the reason for x86_64 kexec doing it that
way in the first place either; there were no comments to that effect and
it seems to have been the case even before PTI came along. It looks like
it was just a happy accident which prevented memory corruption on kexec. )
Either way, it definitely isn't needed now. Just allocate the PGD
separately on x86_64, like i386 already does.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Link: https://lore.kernel.org/r/20241205153343.3275139-6-dwmw2@infradead.org
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/include/asm/kexec.h | 18 +++++++++---
arch/x86/kernel/machine_kexec_64.c | 45 ++++++++++++++++--------------
2 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index c9f6a6c5de3cf..d54dd7084a3e1 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -16,6 +16,7 @@
# define PAGES_NR 4
#endif
+# define KEXEC_CONTROL_PAGE_SIZE 4096
# define KEXEC_CONTROL_CODE_MAX_SIZE 2048
#ifndef __ASSEMBLY__
@@ -44,7 +45,6 @@ struct kimage;
/* Maximum address we can use for the control code buffer */
# define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
-# define KEXEC_CONTROL_PAGE_SIZE 4096
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_386
@@ -59,9 +59,6 @@ struct kimage;
/* Maximum address we can use for the control pages */
# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
-/* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
-
/* The native architecture */
# define KEXEC_ARCH KEXEC_ARCH_X86_64
#endif
@@ -146,6 +143,19 @@ struct kimage_arch {
};
#else
struct kimage_arch {
+ /*
+ * This is a kimage control page, as it must not overlap with either
+ * source or destination address ranges.
+ */
+ pgd_t *pgd;
+ /*
+ * The virtual mapping of the control code page itself is used only
+ * during the transition, while the current kernel's pages are all
+ * in place. Thus the intermediate page table pages used to map it
+ * are not control pages, but instead just normal pages obtained
+ * with get_zeroed_page(). And have to be tracked (below) so that
+ * they can be freed.
+ */
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 2fa12d1dc6760..8509d809a9f1b 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -149,7 +149,8 @@ static void free_transition_pgtable(struct kimage *image)
image->arch.pte = NULL;
}
-static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd,
+ unsigned long control_page)
{
pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
unsigned long vaddr, paddr;
@@ -160,7 +161,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
vaddr = (unsigned long)relocate_kernel;
- paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+ paddr = control_page;
pgd += pgd_index(vaddr);
if (!pgd_present(*pgd)) {
p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
@@ -219,7 +220,7 @@ static void *alloc_pgt_page(void *data)
return p;
}
-static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
+static int init_pgtable(struct kimage *image, unsigned long control_page)
{
struct x86_mapping_info info = {
.alloc_pgt_page = alloc_pgt_page,
@@ -228,12 +229,12 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
.kernpg_flag = _KERNPG_TABLE_NOENC,
};
unsigned long mstart, mend;
- pgd_t *level4p;
int result;
int i;
- level4p = (pgd_t *)__va(start_pgtable);
- clear_page(level4p);
+ image->arch.pgd = alloc_pgt_page(image);
+ if (!image->arch.pgd)
+ return -ENOMEM;
if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag |= _PAGE_ENC;
@@ -247,8 +248,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = pfn_mapped[i].start << PAGE_SHIFT;
mend = pfn_mapped[i].end << PAGE_SHIFT;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
}
@@ -263,8 +264,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
mstart = image->segment[i].mem;
mend = mstart + image->segment[i].memsz;
- result = kernel_ident_mapping_init(&info,
- level4p, mstart, mend);
+ result = kernel_ident_mapping_init(&info, image->arch.pgd,
+ mstart, mend);
if (result)
return result;
@@ -274,15 +275,19 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
* Prepare EFI systab and ACPI tables for kexec kernel since they are
* not covered by pfn_mapped.
*/
- result = map_efi_systab(&info, level4p);
+ result = map_efi_systab(&info, image->arch.pgd);
if (result)
return result;
- result = map_acpi_tables(&info, level4p);
+ result = map_acpi_tables(&info, image->arch.pgd);
if (result)
return result;
- return init_transition_pgtable(image, level4p);
+ /*
+ * This must be last because the intermediate page table pages it
+ * allocates will not be control pages and may overlap the image.
+ */
+ return init_transition_pgtable(image, image->arch.pgd, control_page);
}
static void load_segments(void)
@@ -299,14 +304,14 @@ static void load_segments(void)
int machine_kexec_prepare(struct kimage *image)
{
- unsigned long start_pgtable;
+ unsigned long control_page;
int result;
/* Calculate the offsets */
- start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
+ control_page = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
/* Setup the identity mapped 64bit page table */
- result = init_pgtable(image, start_pgtable);
+ result = init_pgtable(image, control_page);
if (result)
return result;
@@ -360,13 +365,12 @@ void machine_kexec(struct kimage *image)
#endif
}
- control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ control_page = page_address(image->control_code_page);
__memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
page_list[PA_CONTROL_PAGE] = virt_to_phys(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
- page_list[PA_TABLE_PAGE] =
- (unsigned long)__pa(page_address(image->control_code_page));
+ page_list[PA_TABLE_PAGE] = (unsigned long)__pa(image->arch.pgd);
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
@@ -574,8 +578,7 @@ static void kexec_mark_crashkres(bool protect)
/* Don't touch the control code page used in crash_kexec().*/
control = PFN_PHYS(page_to_pfn(kexec_crash_image->control_code_page));
- /* Control code page is located in the 2nd page. */
- kexec_mark_range(crashk_res.start, control + PAGE_SIZE - 1, protect);
+ kexec_mark_range(crashk_res.start, control - 1, protect);
control += KEXEC_CONTROL_PAGE_SIZE;
kexec_mark_range(control, crashk_res.end, protect);
}
--
2.39.5
From: Bard Liao <yung-chuan.liao(a)linux.intel.com>
[ Upstream commit 569922b82ca660f8b24e705f6cf674e6b1f99cc7 ]
Each cpu DAI should associate with a widget. However, the topology might
not create the right number of DAI widgets for aggregated amps. And it
will cause NULL pointer deference.
Check that the DAI widget associated with the CPU DAI is valid to prevent
NULL pointer deference due to missing DAI widgets in topologies with
aggregated amps.
Signed-off-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood(a)intel.com>
Link: https://patch.msgid.link/20241203104853.56956-1-yung-chuan.liao@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 12 ++++++++++++
sound/soc/sof/intel/hda.c | 5 +++++
2 files changed, 17 insertions(+)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index 82f46ecd94301..2e58a264da556 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -503,6 +503,12 @@ int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
int ret;
int i;
+ if (!w) {
+ dev_err(cpu_dai->dev, "%s widget not found, check amp link num in the topology\n",
+ cpu_dai->name);
+ return -EINVAL;
+ }
+
ops = hda_dai_get_ops(substream, cpu_dai);
if (!ops) {
dev_err(cpu_dai->dev, "DAI widget ops not set\n");
@@ -582,6 +588,12 @@ int sdw_hda_dai_hw_params(struct snd_pcm_substream *substream,
*/
for_each_rtd_cpu_dais(rtd, i, dai) {
w = snd_soc_dai_get_widget(dai, substream->stream);
+ if (!w) {
+ dev_err(cpu_dai->dev,
+ "%s widget not found, check amp link num in the topology\n",
+ dai->name);
+ return -EINVAL;
+ }
ipc4_copier = widget_to_copier(w);
memcpy(&ipc4_copier->dma_config_tlv[cpu_dai_id], dma_config_tlv,
sizeof(*dma_config_tlv));
diff --git a/sound/soc/sof/intel/hda.c b/sound/soc/sof/intel/hda.c
index 70fc08c8fc99e..f10ed4d102501 100644
--- a/sound/soc/sof/intel/hda.c
+++ b/sound/soc/sof/intel/hda.c
@@ -63,6 +63,11 @@ static int sdw_params_stream(struct device *dev,
struct snd_soc_dapm_widget *w = snd_soc_dai_get_widget(d, params_data->substream->stream);
struct snd_sof_dai_config_data data = { 0 };
+ if (!w) {
+ dev_err(dev, "%s widget not found, check amp link num in the topology\n",
+ d->name);
+ return -EINVAL;
+ }
data.dai_index = (params_data->link_id << 8) | d->id;
data.dai_data = params_data->alh_stream_id;
data.dai_node_id = data.dai_data;
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.
Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0adce9bf7a1e5..87cc7d778c3cf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -636,14 +636,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2838,7 +2842,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.
Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c34c6f0d23efe..52ea9f81d388b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -586,14 +586,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2772,7 +2776,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.
Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 959ca6b9cd138..a85f743aa1573 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -575,14 +575,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2718,7 +2722,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5
From: Stas Sergeev <stsp2(a)yandex.ru>
[ Upstream commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3 ]
Currently tun checks the group permission even if the user have matched.
Besides going against the usual permission semantic, this has a
very interesting implication: if the tun group is not among the
supplementary groups of the tun user, then effectively no one can
access the tun device. CAP_SYS_ADMIN still can, but its the same as
not setting the tun ownership.
This patch relaxes the group checking so that either the user match
or the group match is enough. This avoids the situation when no one
can access the device even though the ownership is properly set.
Also I simplified the logic by removing the redundant inversions:
tun_not_capable() --> !tun_capable()
Signed-off-by: Stas Sergeev <stsp2(a)yandex.ru>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Link: https://patch.msgid.link/20241205073614.294773-1-stsp2@yandex.ru
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/tun.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ea98d93138c12..a6c9f9062dbd4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -574,14 +574,18 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
return ret;
}
-static inline bool tun_not_capable(struct tun_struct *tun)
+static inline bool tun_capable(struct tun_struct *tun)
{
const struct cred *cred = current_cred();
struct net *net = dev_net(tun->dev);
- return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
- (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
- !ns_capable(net->user_ns, CAP_NET_ADMIN);
+ if (ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return 1;
+ if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
+ return 1;
+ if (gid_valid(tun->group) && in_egroup_p(tun->group))
+ return 1;
+ return 0;
}
static void tun_set_real_num_queues(struct tun_struct *tun)
@@ -2767,7 +2771,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
!!(tun->flags & IFF_MULTI_QUEUE))
return -EINVAL;
- if (tun_not_capable(tun))
+ if (!tun_capable(tun))
return -EPERM;
err = security_tun_dev_open(tun->security);
if (err < 0)
--
2.39.5