If an AX25 device is bound to a socket by setting the SO_BINDTODEVICE
socket option, a refcount leak will occur in ax25_release().
Commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
added decrement of device refcounts in ax25_release(). In order for that
to work correctly the refcounts must already be incremented when the
device is bound to the socket. An AX25 device can be bound to a socket
by either calling ax25_bind() or setting SO_BINDTODEVICE socket option.
In both cases the refcounts should be incremented, but in fact it is done
only in ax25_bind().
This bug leads to the following issue reported by Syzkaller:
================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 1 PID: 5932 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Modules linked in:
CPU: 1 UID: 0 PID: 5932 Comm: syz-executor424 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:336 [inline]
refcount_dec include/linux/refcount.h:351 [inline]
ref_tracker_free+0x710/0x820 lib/ref_tracker.c:236
netdev_tracker_free include/linux/netdevice.h:4156 [inline]
netdev_put include/linux/netdevice.h:4173 [inline]
netdev_put include/linux/netdevice.h:4169 [inline]
ax25_release+0x33f/0xa10 net/ax25/af_ax25.c:1069
__sock_release+0xb0/0x270 net/socket.c:640
sock_close+0x1c/0x30 net/socket.c:1408
...
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
================================================================
Fix ax25_setsockopt() by incrementing the refcounts of the newly bound
device and decrementing the refcounts of the previously bound device,
if any.
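The hold/put pairing described above can be sketched with a toy user-space
model. This is only an illustration: the toy_* names are hypothetical, and
the single counter stands in for the two references (netdev and ax25_dev)
that the real code takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted device and a socket bound to it. */
struct toy_dev { int refs; };
struct toy_sock { struct toy_dev *bound; };

void toy_hold(struct toy_dev *d) { d->refs++; }
void toy_put(struct toy_dev *d) { d->refs--; }

/* Bind path: rebinding to the same device is a no-op; otherwise the old
 * device's reference is dropped and one is taken on the new device. */
void toy_bind(struct toy_sock *s, struct toy_dev *d)
{
	if (s->bound == d)
		return;
	if (s->bound)
		toy_put(s->bound);
	toy_hold(d);
	s->bound = d;
}

/* Release path: drops exactly the reference that toy_bind() took. */
void toy_release(struct toy_sock *s)
{
	if (s->bound) {
		toy_put(s->bound);
		s->bound = NULL;
	}
}
```

If the bind path skipped toy_hold(), toy_release() would drop a reference
that was never taken, which is the imbalance the warning above reports.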
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+33841dc6aa3e1d86b78a(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=33841dc6aa3e1d86b78a
Signed-off-by: Murad Masimov <m.masimov(a)mt-integration.ru>
---
net/ax25/af_ax25.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index aa6c714892ec..9f3b8b682adb 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -685,6 +685,15 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
break;
}
+ if (ax25->ax25_dev) {
+ if (dev == ax25->ax25_dev->dev) {
+ rcu_read_unlock();
+ break;
+ }
+ netdev_put(ax25->ax25_dev->dev, &ax25->dev_tracker);
+ ax25_dev_put(ax25->ax25_dev);
+ }
+
ax25->ax25_dev = ax25_dev_ax25dev(dev);
if (!ax25->ax25_dev) {
rcu_read_unlock();
@@ -692,6 +701,8 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
break;
}
ax25_fillin_cb(ax25, ax25->ax25_dev);
+ netdev_hold(dev, &ax25->dev_tracker, GFP_ATOMIC);
+ ax25_dev_hold(ax25->ax25_dev);
rcu_read_unlock();
break;
--
2.39.2
There are two variables that indicate the interrupt type to be used
in the next test execution, "irq_type" as global and test->irq_type.
The global is referenced from pci_endpoint_test_get_irq() to preserve
the current type for ioctl(PCITEST_GET_IRQTYPE).
The type set in pci_endpoint_test_set_irq() isn't reflected in the global
"irq_type", so ioctl(PCITEST_GET_IRQTYPE) returns the previous type.
As a result, the wrong type will be displayed in "pcitest" as follows:
# pcitest -i 0
SET IRQ TYPE TO LEGACY: OKAY
# pcitest -I
GET IRQ TYPE: MSI
Fix this issue by propagating the current type to the global "irq_type".
Cc: stable(a)vger.kernel.org
Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko(a)socionext.com>
---
drivers/misc/pci_endpoint_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index a342587fc78a..33058630cd50 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -742,6 +742,7 @@ static bool pci_endpoint_test_set_irq(struct pci_endpoint_test *test,
if (!pci_endpoint_test_request_irq(test))
goto err;
+ irq_type = test->irq_type;
return true;
err:
--
2.25.1
Add a sanity check to madvise_dontneed_free() to address a corner case
in madvise where a race condition causes the current vma being processed
to be backed by a different page size.
During a madvise(MADV_DONTNEED) call on a memory region registered with
a userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event. During this time, the vma covering the
current address range may change due to an explicit mmap done
concurrently by another thread.
If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down
to a hugepage boundary to avoid data loss (see "Fixes" below). This
rounding may cause the end address to be truncated to the same address
as the start.
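A minimal sketch of the rounding arithmetic, assuming a 2 MiB hugepage size
(the constant and helper names below are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB hugepages; the real size depends on the vma's hstate. */
#define HUGE_PAGE_SIZE_ASSUMED ((uint64_t)2 << 20)

/* Round addr down to a multiple of size (size must be a power of two). */
uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/* Models the end-address adjustment applied to a hugepage-backed vma. */
uint64_t clamp_end_to_hugepage(uint64_t end)
{
	return align_down(end, HUGE_PAGE_SIZE_ASSUMED);
}
```

With a hugepage-aligned start of 0x40000000 and a request for one 4KB page,
end is 0x40001000, which rounds down to 0x40000000, so start == end.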
Make this corner case follow the same semantics as in other similar
cases where the requested region has zero length (i.e. return 0).
This will make madvise_walk_vmas() continue to the next vma in the
range (this time holding the process mm lock) which, due to the prev
pointer becoming stale because of the vma change, will be the same
hugepage-backed vma that was just checked before. The next time
madvise_dontneed_free() runs for this vma, if the start address isn't
aligned to a hugepage boundary, it'll return -EINVAL, which is also in
line with the madvise API.
From the userspace perspective, madvise() will return EINVAL because the
start address isn't aligned according to the new vma alignment
requirements (hugepage), even though it was correctly page-aligned when
the call was issued.
Fixes: 8ebe0a5eaaeb ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ricardo Cañuelo Navarro <rcn(a)igalia.com>
---
mm/madvise.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 49f3a75046f6..c8e28d51978a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -933,7 +933,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
*/
end = vma->vm_end;
}
- VM_WARN_ON(start >= end);
+ if (start == end)
+ return 0;
+ VM_WARN_ON(start > end);
}
if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
--
2.48.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Commit 11c60f23ed13 ("integrity: Remove unused macro
IMA_ACTION_RULE_FLAGS") removed the IMA_ACTION_RULE_FLAGS mask, due to it
not being used after commit 0d73a55208e9 ("ima: re-introduce own integrity
cache lock").
However, it seems that the latter commit mistakenly used the wrong mask
when moving the code from ima_inode_post_setattr() to
process_measurement(). There is no mention in the commit message about this
change and it looks quite important, since changing from IMA_ACTIONS_FLAGS
(later renamed to IMA_NONACTION_FLAGS) to IMA_ACTION_RULE_FLAGS was done by
commit 42a4c603198f0 ("ima: fix ima_inode_post_setattr").
Restore the original change of resetting only the policy-specific flags and
not the new file status, but with new mask 0xfb000000 since the
policy-specific flags changed meanwhile. Also rename IMA_ACTION_RULE_FLAGS
to IMA_NONACTION_RULE_FLAGS, to be consistent with IMA_NONACTION_FLAGS.
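The relationship between the two masks can be checked with a small
user-space sketch. The flag values are copied from the ima.h hunk of this
patch; the helper names are hypothetical, and the sketch models only the
non-action part of the reset, not the appraisal flags cleared alongside it.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in security/integrity/ima/ima.h in this patch. */
#define IMA_NONACTION_FLAGS      0xff000000u
#define IMA_NONACTION_RULE_FLAGS 0xfb000000u
#define IMA_DIGSIG_REQUIRED      0x01000000u
#define IMA_NEW_FILE             0x04000000u

/* Previous behaviour: clears every non-action flag, IMA_NEW_FILE included. */
uint32_t reset_all_nonaction(uint32_t flags)
{
	return flags & ~IMA_NONACTION_FLAGS;
}

/* Behaviour with the new mask: policy-specific flags are cleared, but the
 * new-file status bit survives. */
uint32_t reset_rule_flags_only(uint32_t flags)
{
	return flags & ~IMA_NONACTION_RULE_FLAGS;
}
```

In other words, 0xfb000000 is 0xff000000 with the IMA_NEW_FILE bit
(0x04000000) left out.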
Cc: stable(a)vger.kernel.org # v4.16.x
Fixes: 11c60f23ed13 ("integrity: Remove unused macro IMA_ACTION_RULE_FLAGS")
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima.h | 1 +
security/integrity/ima/ima_main.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e1a3d1239bee..615900d4150d 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -141,6 +141,7 @@ struct ima_kexec_hdr {
/* IMA iint policy rule cache flags */
#define IMA_NONACTION_FLAGS 0xff000000
+#define IMA_NONACTION_RULE_FLAGS 0xfb000000
#define IMA_DIGSIG_REQUIRED 0x01000000
#define IMA_PERMIT_DIRECTIO 0x02000000
#define IMA_NEW_FILE 0x04000000
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 46adfd524dd8..7173dca20c23 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -275,7 +275,7 @@ static int process_measurement(struct file *file, const struct cred *cred,
/* reset appraisal flags if ima_inode_post_setattr was called */
iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
IMA_APPRAISE_SUBMASK | IMA_APPRAISED_SUBMASK |
- IMA_NONACTION_FLAGS);
+ IMA_NONACTION_RULE_FLAGS);
/*
* Re-evaulate the file if either the xattr has changed or the
--
2.34.1
Hello Christian,
It will cause problems for real applications if the specific event happens, but that happens with a very low probability. It's more an issue that can be resolved at the author's convenience, but I would be comfortable if the issue I discovered is acknowledged at least, if not addressed.
Thanks
Shehab
________________________________________
From: Ahmed, Shehab Sarar <shehaba2(a)illinois.edu>
Sent: Sunday, February 2, 2025 6:10 PM
To: Christian Heusel
Cc: stable(a)vger.kernel.org; regressions(a)lists.linux.dev
Subject: Re: TCP Fast Retransmission Issue
________________________________
From: Christian Heusel
Sent: Sunday, February 2, 2025 3:26 AM
To: Ahmed, Shehab Sarar
Cc: stable(a)vger.kernel.org; regressions(a)lists.linux.dev
Subject: Re: TCP Fast Retransmission Issue
On 25/02/02 10:26AM, Christian Heusel wrote:
> On 25/02/01 07:09PM, Ahmed, Shehab Sarar wrote:
> > Hello,
>
> Hello,
>
> > While experimenting with bbr protocol, I manipulated the network conditions by maintaining a high RTT for about one second before abruptly reducing it. Some packets sent during the high RTT phase experienced long delays in reaching the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival triggered the receiver to generate duplicate acknowledgments (dup ACKs). Due to the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender initiated a fast retransmission for an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, despite the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver received this packet, it sent an ACK for the next packet in sequence. However, upon later receiving the fast-retransmitted packet, an issue arose in its logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver continued sending dup ACKs, ultimately forcing bbr into the retransmission timeout (RTO) phase.
> >
> > I generated this issue in linux kernel version 5.15.0-117-generic with Ubuntu 20.04. I attempted to confirm whether the issue persists with the latest Linux kernel. However, I discovered that the behavior of bbr has changed in the most recent kernel version, where it now sends chunks of packets instead of sending them one by one over time. As a result, I was unable to reproduce the specific sequence of events that triggered the bug we identified. Consequently, I could not confirm whether the bug still exists in the latest kernel.
> >
> > I believe that the issue (if still exists) will have to be resolved in the location net/ipv4/tcp_input.c or something like that. There are so many authors here that I do not know who to CC here. So, sending this email to you. Sorry if this is not the best way to report this issue.
>
> does this cause problems for real applications? To me it sounds a bit
> like a constructed issue, but I'm also not really proficient with the
> stack mentioned above :p
>
> I'm asking because this is important for how we treat this report, see
> the "Reporting Regression"[0] document for more details on what we
> consider to be a regression.
>
> > Thanks
> > Shehab
>
> Cheers,
> Christian
(forgot the link)
[0]: https://docs.kernel.org/admin-guide/reporting-regressions.html#what-is-a-re…
While by default max_autoclose equals INT_MAX / HZ, one may set
net.sctp.max_autoclose to UINT_MAX. The sp->autoclose * HZ multiplication
in sctp_association_init() can then overflow.
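To make the overflow concrete, here is a minimal user-space sketch. It
assumes the autoclose value is held in a 32-bit field, that HZ is 1000, and
that unsigned long is 64 bits wide (modelled here with uint64_t). The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HZ_ASSUMED 1000u /* assumption: CONFIG_HZ=1000 */

/* Mirrors the old expression: the product is computed in 32 bits and wraps
 * before it is stored in the wider timeout slot. */
uint64_t autoclose_jiffies_narrow(uint32_t autoclose_secs)
{
	return autoclose_secs * HZ_ASSUMED;
}

/* Mirrors the fixed expression: widen the operand first, then multiply. */
uint64_t autoclose_jiffies_wide(uint32_t autoclose_secs)
{
	return (uint64_t)autoclose_secs * HZ_ASSUMED;
}
```

For small values both helpers agree. For UINT32_MAX the narrow version
wraps modulo 2^32 while the wide version keeps the full product.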
Cc: stable(a)vger.kernel.org
Fixes: 9f70f46bd4c7 ("sctp: properly latch and use autoclose value from sock to association")
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
net/sctp/associola.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index c45c192b7878..0b0794f164cf 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -137,7 +137,8 @@ static struct sctp_association *sctp_association_init(
= 5 * asoc->rto_max;
asoc->timeouts[SCTP_EVENT_TIMEOUT_SACK] = asoc->sackdelay;
- asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE] = sp->autoclose * HZ;
+ asoc->timeouts[SCTP_EVENT_TIMEOUT_AUTOCLOSE] =
+ (unsigned long)sp->autoclose * HZ;
/* Initializes the timers */
for (i = SCTP_EVENT_TIMEOUT_NONE; i < SCTP_NUM_TIMEOUT_TYPES; ++i)
--
2.34.1
On 02/02/2025 05:23, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> ARM: dts: aspeed: yosemite4: correct the compatible string for max31790
>
> to the 6.13-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm-dts-aspeed-yosemite4-correct-the-compatible-stri.patch
> and it can be found in the queue-6.13 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 64b29da76fb21bbb955e262461996d37865d4ae9
> Author: Ricky CX Wu <ricky.cx.wu.wiwynn(a)gmail.com>
> Date: Thu Oct 3 15:42:46 2024 +0800
>
> ARM: dts: aspeed: yosemite4: correct the compatible string for max31790
>
> [ Upstream commit b1a1ecb669bfa763ee5e86a038d7c9363eee7548 ]
>
> Fix the compatible string for max31790 to match the binding document.
>
> Fixes: 2b8d94f4b4a4 ("ARM: dts: aspeed: yosemite4: add Facebook Yosemite 4 BMC")
> Signed-off-by: Ricky CX Wu <ricky.cx.wu.wiwynn(a)gmail.com>
> Signed-off-by: Delphine CC Chiu <Delphine_CC_Chiu(a)wiwynn.com>
> Link: https://patch.msgid.link/20241003074251.3818101-6-Delphine_CC_Chiu@wiwynn.c…
> Signed-off-by: Andrew Jeffery <andrew(a)codeconstruct.com.au>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Sasha, something got broken in your scripts.
I received a huge flood (100? 200?) of such stable confirmations today,
but I am not listed above at all. No tags came here from me, so why am I
Cc-ed on all this?
Best regards,
Krzysztof
When attaching uretprobes to processes running inside docker, the attached
process segfaults when it encounters the retprobe.
The reason is that now that uretprobe is a system call, the default seccomp
filters in docker block it, as they only allow a specific set of known
syscalls. The same is true for other userspace applications which use
seccomp to control their syscall surface.
Since uretprobe is a "kernel implementation detail" system call which is
not used by userspace application code directly, it is impractical and
there's very little point in forcing all userspace applications to
explicitly allow it in order to avoid crashing tracked processes.
Pass this system call through seccomp without depending on configuration.
Note: uretprobe isn't supported in i386 and __NR_ia32_rt_tgsigqueueinfo
uses the same number as __NR_uretprobe so the syscall isn't forced in the
compat bitmap.
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Rafael Buchbinder <rafi(a)rbk.io>
Link: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2A…
Link: https://lore.kernel.org/lkml/20250121182939.33d05470@gandalf.local.home/T/#…
Cc: stable(a)vger.kernel.org
Signed-off-by: Eyal Birger <eyal.birger(a)gmail.com>
---
v2: use action_cache bitmap and mode1 array to check the syscall
The following reproduction script synthetically demonstrates the problem
for seccomp filters:
cat > /tmp/x.c << EOF
#include <stdio.h>
#include <seccomp.h>

/* Syscalls the filter allows; anything else kills the process. */
char *syscalls[] = {
	"write",
	"exit_group",
	"fstat",
};

/* Target of the uretprobe; noinline so the function has a real return. */
__attribute__((noinline)) int probed(void)
{
	printf("Probed\n");
	return 1;
}

void apply_seccomp_filter(char **syscalls, int num_syscalls)
{
	scmp_filter_ctx ctx;

	ctx = seccomp_init(SCMP_ACT_KILL);
	for (int i = 0; i < num_syscalls; i++) {
		seccomp_rule_add(ctx, SCMP_ACT_ALLOW,
				 seccomp_syscall_resolve_name(syscalls[i]), 0);
	}
	seccomp_load(ctx);
	seccomp_release(ctx);
}

int main(int argc, char *argv[])
{
	int num_syscalls = sizeof(syscalls) / sizeof(syscalls[0]);

	apply_seccomp_filter(syscalls, num_syscalls);
	probed();
	return 0;
}
EOF
cat > /tmp/trace.bt << EOF
uretprobe:/tmp/x:probed
{
	printf("ret=%d\n", retval);
}
EOF
gcc -o /tmp/x /tmp/x.c -lseccomp
/usr/bin/bpftrace /tmp/trace.bt &
sleep 5 # wait for uretprobe attach
/tmp/x
pkill bpftrace
rm /tmp/x /tmp/x.c /tmp/trace.bt
---
kernel/seccomp.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 385d48293a5f..23b594a68bc0 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -734,13 +734,13 @@ seccomp_prepare_user_filter(const char __user *user_filter)
#ifdef SECCOMP_ARCH_NATIVE
/**
- * seccomp_is_const_allow - check if filter is constant allow with given data
+ * seccomp_is_filter_const_allow - check if filter is constant allow with given data
* @fprog: The BPF programs
* @sd: The seccomp data to check against, only syscall number and arch
* number are considered constant.
*/
-static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
- struct seccomp_data *sd)
+static bool seccomp_is_filter_const_allow(struct sock_fprog_kern *fprog,
+ struct seccomp_data *sd)
{
unsigned int reg_value = 0;
unsigned int pc;
@@ -812,6 +812,21 @@ static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
return false;
}
+static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
+ struct seccomp_data *sd)
+{
+#ifdef __NR_uretprobe
+ if (sd->nr == __NR_uretprobe
+#ifdef SECCOMP_ARCH_COMPAT
+ && sd->arch != SECCOMP_ARCH_COMPAT
+#endif
+ )
+ return true;
+#endif
+
+ return seccomp_is_filter_const_allow(fprog, sd);
+}
+
static void seccomp_cache_prepare_bitmap(struct seccomp_filter *sfilter,
void *bitmap, const void *bitmap_prev,
size_t bitmap_size, int arch)
@@ -1023,6 +1038,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
*/
static const int mode1_syscalls[] = {
__NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
+#ifdef __NR_uretprobe
+ __NR_uretprobe,
+#endif
-1, /* negative terminated */
};
--
2.43.0
This is the start of the stable review cycle for the 6.6.75 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 13:34:42 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.75-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.75-rc1
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Ido Schimmel <idosch(a)nvidia.com>
ipv4: ip_tunnel: Fix suspicious RCU usage warning in ip_tunnel_find()
Luis Henriques (SUSE) <luis.henriques(a)linux.dev>
ext4: fix access to uninitialised lock in fc replay path
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jkosina(a)suse.com>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Alexey Dobriyan <adobriyan(a)gmail.com>
block: fix integer overflow in BLKSECDISCARD
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Chuck Lever <chuck.lever(a)oracle.com>
shmem: Fix shmem_rename2()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Add simple_offset_rename() API
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Fix simple_offset_rename_exchange()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Add simple_offset_empty()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Define a minimum directory offset
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Re-arrange locking in offset_iterate_dir()
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Selvin Xavier <selvin.xavier(a)broadcom.com>
RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
Omid Ehtemam-Haghighi <omid.ehtemamhaghighi(a)menlosecurity.com>
ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Anastasia Belova <abelova(a)astralinux.ru>
cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
Igor Pylypiv <ipylypiv(a)google.com>
ata: libata-core: Set ATA_QCFLAG_RTF_FILLED in fill_result_tf()
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Russell Harmon <russ(a)har.mn>
hwmon: (drivetemp) Set scsi command timeout to 10s
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Rob Herring (Arm) <robh(a)kernel.org>
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
block/ioctl.c | 9 +-
drivers/ata/libahci.c | 12 +-
drivers/ata/libata-core.c | 8 +
drivers/cpufreq/amd-pstate.c | 7 +-
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hwmon/drivetemp.c | 2 +-
drivers/infiniband/hw/bnxt_re/main.c | 10 +
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/of/unittest-data/tests-platform.dtsi | 13 +
drivers/of/unittest.c | 14 ++
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 +
fs/ext4/super.c | 3 +-
fs/gfs2/file.c | 1 +
fs/libfs.c | 177 ++++++++++----
fs/smb/client/smb2inode.c | 104 +++++---
include/linux/fs.h | 2 +
include/linux/seccomp.h | 2 +-
mm/filemap.c | 19 ++
mm/shmem.c | 3 +-
net/ipv4/ip_tunnel.c | 2 +-
net/ipv6/ip6_fib.c | 8 +-
net/ipv6/route.c | 45 ++--
net/sched/sch_ets.c | 2 +
sound/soc/codecs/Kconfig | 1 +
sound/soc/samsung/Kconfig | 6 +-
sound/usb/quirks.c | 2 +
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/ipv6_route_update_soft_lockup.sh | 262 +++++++++++++++++++++
37 files changed, 640 insertions(+), 137 deletions(-)
This is the start of the stable review cycle for the 6.12.12 release.
There are 41 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 01 Feb 2025 14:41:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.12-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.12-rc2
Jann Horn <jannh(a)google.com>
io_uring/rsrc: require cloned buffers to share accounting contexts
Jack Greiner <jack(a)emoss.org>
Input: xpad - add support for wooting two he (arm)
Matheos Mattsson <matheos.mattsson(a)gmail.com>
Input: xpad - add support for Nacon Evol-X Xbox One Controller
Leonardo Brondani Schenkel <leonardo(a)schenkel.net>
Input: xpad - improve name of 8BitDo controller 2dc8:3106
Pierre-Loup A. Griffais <pgriffais(a)valvesoftware.com>
Input: xpad - add QH Electronics VID/PID
Nilton Perim Neto <niltonperimneto(a)gmail.com>
Input: xpad - add unofficial Xbox 360 wireless receiver clone
Mark Pearson <mpearson-lenovo(a)squebb.ca>
Input: atkbd - map F23 key to support default copilot shortcut
Nicolas Nobelis <nicolas(a)nobelis.eu>
Input: xpad - add support for Nacon Pro Compact
Jason Gerecke <jason.gerecke(a)wacom.com>
HID: wacom: Initialize brightness of LED trigger
Hans de Goede <hdegoede(a)redhat.com>
wifi: rtl8xxxu: add more missing rtl8192cu USB IDs
Lianqin Hu <hulianqin(a)vivo.com>
ALSA: usb-audio: Add delay quirk for USB Audio Device
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null"
Qasim Ijaz <qasdev00(a)gmail.com>
USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb()
Easwar Hariharan <eahariha(a)linux.microsoft.com>
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: check the bounds of read/write syscalls
Linus Torvalds <torvalds(a)linux-foundation.org>
cachestat: fix page cache statistics permission checking
Jiri Kosina <jikos(a)kernel.org>
Revert "HID: multitouch: Add support for lenovo Y9000P Touchpad"
Jamal Hadi Salim <jhs(a)mojatatu.com>
net: sched: fix ets qdisc OOB Indexing
Paulo Alcantara <pc(a)manguebit.com>
smb: client: handle lack of EA support in smb2_query_path_info()
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Use d_children list to iterate simple_offset directories
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Replace simple_offset end-of-directory detection
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: fix infinite directory reads for offset dir"
Chuck Lever <chuck.lever(a)oracle.com>
Revert "libfs: Add simple_offset_empty()"
Chuck Lever <chuck.lever(a)oracle.com>
libfs: Return ENOSPC when the directory offset range is exhausted
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Truncate address space when flipping GFS2_DIF_JDATA flag
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: move allocations during CPU init outside the lock
Yosry Ahmed <yosryahmed(a)google.com>
mm: zswap: properly synchronize freeing resources during CPU hotunplug
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing depends on I2C
Russell Harmon <russ(a)har.mn>
hwmon: (drivetemp) Set scsi command timeout to 10s
Philippe Simons <simons.philippe(a)gmail.com>
irqchip/sunxi-nmi: Add missing SKIP_WAKE flag
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
drm/connector: hdmi: Validate supported_formats matches ycbcr_420_allowed
Yage Geng <icoderdev(a)gmail.com>
ALSA: hda/realtek: Fix volume adjustment issue on Lenovo ThinkBook 16P Gen5
Rob Herring (Arm) <robh(a)kernel.org>
of/unittest: Add test that of_address_to_resource() fails on non-translatable address
Alex Hung <alex.hung(a)amd.com>
drm/amd/display: Initialize denominator defaults to 1
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Use HW lock mgr for PSR1
Xiang Zhang <hawkxiang.cpp(a)gmail.com>
scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
Maciej Strozek <mstrozek(a)opensource.cirrus.com>
ASoC: cs42l43: Add codec force suspend/resume ops
Linus Walleij <linus.walleij(a)linaro.org>
seccomp: Stub for !CONFIG_SECCOMP
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: samsung: Add missing selects for MFD_WM8994
Marian Postevca <posteuca(a)mutex.one>
ASoC: codecs: es8316: Fix HW rate calculation for 48Mhz MCLK
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: wm8994: Add depends on MFD core
-------------
Diffstat:
Makefile | 4 +-
.../gpu/drm/amd/display/dc/dce/dmub_hw_lock_mgr.c | 3 +-
.../dml21/src/dml2_core/dml2_core_dcn4_calcs.c | 4 +-
drivers/gpu/drm/drm_connector.c | 3 +
drivers/hid/hid-ids.h | 1 -
drivers/hid/hid-multitouch.c | 8 +-
drivers/hid/wacom_sys.c | 24 +--
drivers/hwmon/drivetemp.c | 2 +-
drivers/input/joystick/xpad.c | 9 +-
drivers/input/keyboard/atkbd.c | 2 +-
drivers/irqchip/irq-sunxi-nmi.c | 3 +-
drivers/net/wireless/realtek/rtl8xxxu/core.c | 20 +++
drivers/of/unittest-data/tests-platform.dtsi | 13 ++
drivers/of/unittest.c | 14 ++
drivers/scsi/scsi_transport_iscsi.c | 4 +-
drivers/scsi/storvsc_drv.c | 8 +-
drivers/usb/gadget/function/u_serial.c | 8 +-
drivers/usb/serial/quatech2.c | 2 +-
drivers/vfio/platform/vfio_platform_common.c | 10 ++
fs/gfs2/file.c | 1 +
fs/libfs.c | 162 ++++++++++-----------
fs/smb/client/smb2inode.c | 104 +++++++++----
include/linux/fs.h | 1 -
include/linux/seccomp.h | 2 +-
io_uring/rsrc.c | 7 +
mm/filemap.c | 19 +++
mm/shmem.c | 4 +-
mm/zswap.c | 90 ++++++++----
net/sched/sch_ets.c | 2 +
sound/pci/hda/patch_realtek.c | 4 +-
sound/soc/codecs/Kconfig | 1 +
sound/soc/codecs/cs42l43.c | 1 +
sound/soc/codecs/es8316.c | 10 +-
sound/soc/samsung/Kconfig | 6 +-
sound/usb/quirks.c | 2 +
35 files changed, 374 insertions(+), 184 deletions(-)
Hello,
While experimenting with the BBR protocol, I manipulated the network conditions by maintaining a high RTT for about one second before abruptly reducing it. Some packets sent during the high RTT phase experienced long delays in reaching the destination, while later packets, benefiting from the lower RTT, arrived earlier. This out-of-order arrival triggered the receiver to generate duplicate acknowledgments (dup ACKs). Due to the low RTT, these dup ACKs quickly reached the sender. Upon receiving three dup ACKs, the sender initiated a fast retransmission for an earlier packet that was not lost but was simply taking longer to arrive. Interestingly, although the fast-retransmitted packet experienced a lower RTT, the original delayed packet still arrived first. When the receiver received this packet, it sent an ACK for the next packet in sequence. However, when the fast-retransmitted packet arrived later, an issue arose in the receiver's logic for updating the acknowledgment number. As a result, even after the next expected packet was received, the acknowledgment number was not updated correctly. The receiver continued sending dup ACKs, ultimately forcing BBR into the retransmission timeout (RTO) phase.
I triggered this issue on Linux kernel version 5.15.0-117-generic with Ubuntu 20.04. I attempted to confirm whether the issue persists with the latest Linux kernel. However, I discovered that the behavior of BBR has changed in the most recent kernel version, where it now sends chunks of packets instead of sending them one by one over time. As a result, I was unable to reproduce the specific sequence of events that triggered the bug we identified. Consequently, I could not confirm whether the bug still exists in the latest kernel.
I believe that the issue (if it still exists) will have to be resolved in net/ipv4/tcp_input.c or somewhere nearby. There are so many authors of that code that I do not know whom to CC, so I am sending this email to you. Sorry if this is not the best way to report this issue.
Thanks
Shehab
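For reference, the cumulative-ACK behavior the report expects from the receiver can be modeled in a few lines of C. This is only a toy sketch, not the kernel's logic in net/ipv4/tcp_input.c: packets are numbered by index rather than by byte sequence, and `toy_receiver`, `toy_receiver_init()`, `toy_receiver_deliver()` and `WINDOW` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WINDOW 16 /* hypothetical: number of packets the toy receiver tracks */

/* Toy receiver: packets are numbered 0..WINDOW-1 instead of by byte sequence. */
struct toy_receiver {
	bool received[WINDOW];  /* which packets have arrived at least once */
	uint32_t next_expected; /* cumulative ACK: lowest packet not yet received */
};

static void toy_receiver_init(struct toy_receiver *rx)
{
	memset(rx, 0, sizeof(*rx));
}

/*
 * Deliver packet `seq` and return the cumulative ACK the receiver sends.
 * A duplicate arrival (for example a spurious fast retransmit) leaves the
 * ACK unchanged, and the ACK advances again once the next packet arrives.
 */
static uint32_t toy_receiver_deliver(struct toy_receiver *rx, uint32_t seq)
{
	assert(seq < WINDOW);
	rx->received[seq] = true;

	/* Advance past every packet that has been received in order. */
	while (rx->next_expected < WINDOW && rx->received[rx->next_expected])
		rx->next_expected++;

	return rx->next_expected;
}
```

Delivering packets 1, 2, 3 while packet 0 is delayed yields three ACKs for 0. The delayed original then moves the ACK to 4, the late retransmitted copy of packet 0 leaves it at 4, and packet 4 moves it to 5. The report describes the receiver failing to make that last step.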
________________________________________
From: Ahmed, Shehab Sarar
Sent: Saturday, February 1, 2025 1:01 PM
To: stable(a)vger.kernel.org
Cc: regressions(a)lists.linux.dev
Subject: TCP Fast Retransmission Issue
Returning to focus on 6.1, here is the 6.1 backport of the corresponding
6.6 set:
https://lore.kernel.org/all/20240208232054.15778-1-catherine.hoang@oracle.c…
Two patches are missing from the original set:
[01/21] MAINTAINERS: add Catherine as xfs maintainer for 6.6.y
6.6.y-only change
[16/21] xfs: fix again select in kconfig XFS_ONLINE_SCRUB_STATS
XFS_ONLINE_SCRUB_STATS didn't show up till 6.6
The auto group was run on 10 configs and no regressions were seen.
This has been ack'd on the xfs-stable mailing list.
Thanks,
Leah
Catherine Hoang (1):
xfs: allow read IO and FICLONE to run concurrently
Cheng Lin (1):
xfs: introduce protection for drop nlink
Christoph Hellwig (4):
xfs: handle nimaps=0 from xfs_bmapi_write in xfs_alloc_file_space
xfs: only remap the written blocks in xfs_reflink_end_cow_extent
xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
xfs: respect the stable writes flag on the RT device
Darrick J. Wong (8):
xfs: bump max fsgeom struct version
xfs: hoist freeing of rt data fork extent mappings
xfs: prevent rt growfs when quota is enabled
xfs: rt stubs should return negative errnos when rt disabled
xfs: fix units conversion error in xfs_bmap_del_extent_delay
xfs: make sure maxlen is still congruent with prod when rounding down
xfs: clean up dqblk extraction
xfs: dquot recovery does not validate the recovered dquot
Dave Chinner (1):
xfs: inode recovery does not validate the recovered inode
Leah Rumancik (1):
xfs: up(ic_sema) if flushing data device fails
Long Li (2):
xfs: factor out xfs_defer_pending_abort
xfs: abort intent items when recovery intents fail
Omar Sandoval (1):
xfs: fix internal error from AGFL exhaustion
fs/xfs/libxfs/xfs_alloc.c | 27 ++++++++++++--
fs/xfs/libxfs/xfs_bmap.c | 21 +++--------
fs/xfs/libxfs/xfs_defer.c | 28 +++++++++------
fs/xfs/libxfs/xfs_defer.h | 2 +-
fs/xfs/libxfs/xfs_inode_buf.c | 3 ++
fs/xfs/libxfs/xfs_rtbitmap.c | 33 +++++++++++++++++
fs/xfs/libxfs/xfs_sb.h | 2 +-
fs/xfs/xfs_bmap_util.c | 24 +++++++------
fs/xfs/xfs_dquot.c | 5 +--
fs/xfs/xfs_dquot_item_recover.c | 21 +++++++++--
fs/xfs/xfs_file.c | 63 ++++++++++++++++++++++++++-------
fs/xfs/xfs_inode.c | 24 +++++++++++++
fs/xfs/xfs_inode.h | 17 +++++++++
fs/xfs/xfs_inode_item_recover.c | 14 +++++++-
fs/xfs/xfs_ioctl.c | 30 ++++++++++------
fs/xfs/xfs_iops.c | 7 ++++
fs/xfs/xfs_log.c | 23 ++++++------
fs/xfs/xfs_log_recover.c | 2 +-
fs/xfs/xfs_reflink.c | 5 +++
fs/xfs/xfs_rtalloc.c | 33 +++++++++++++----
fs/xfs/xfs_rtalloc.h | 27 ++++++++------
21 files changed, 310 insertions(+), 101 deletions(-)
--
2.48.1.362.g079036d154-goog
From: Tejun Heo <tj(a)kernel.org>
[ Upstream commit 86e6ca55b83c575ab0f2e105cf08f98e58d3d7af ]
blkcg_unpin_online() walks up the blkcg hierarchy putting the online pin. To
walk up, it uses blkcg_parent(blkcg) but it was calling that after
blkcg_destroy_blkgs(blkcg) which could free the blkcg, leading to the
following UAF:
==================================================================
BUG: KASAN: slab-use-after-free in blkcg_unpin_online+0x15a/0x270
Read of size 8 at addr ffff8881057678c0 by task kworker/9:1/117
CPU: 9 UID: 0 PID: 117 Comm: kworker/9:1 Not tainted 6.13.0-rc1-work-00182-gb8f52214c61a-dirty #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 02/02/2022
Workqueue: cgwb_release cgwb_release_workfn
Call Trace:
<TASK>
dump_stack_lvl+0x27/0x80
print_report+0x151/0x710
kasan_report+0xc0/0x100
blkcg_unpin_online+0x15a/0x270
cgwb_release_workfn+0x194/0x480
process_scheduled_works+0x71b/0xe20
worker_thread+0x82a/0xbd0
kthread+0x242/0x2c0
ret_from_fork+0x33/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
...
Freed by task 1944:
kasan_save_track+0x2b/0x70
kasan_save_free_info+0x3c/0x50
__kasan_slab_free+0x33/0x50
kfree+0x10c/0x330
css_free_rwork_fn+0xe6/0xb30
process_scheduled_works+0x71b/0xe20
worker_thread+0x82a/0xbd0
kthread+0x242/0x2c0
ret_from_fork+0x33/0x70
ret_from_fork_asm+0x1a/0x30
Note that the UAF is not easy to trigger as the free path is indirected
behind a couple of RCU grace periods and a work item execution. I could only
trigger it with an artificial msleep() injected in blkcg_unpin_online().
Fix it by reading the parent pointer before destroying the blkcg's blkgs.
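The ordering issue can be illustrated outside the kernel with a minimal user-space sketch. It is not the kernel code: `struct node`, `node_new()`, `node_destroy()`, `node_unpin()` and `nodes_freed` are hypothetical stand-ins, a plain int replaces refcount_t, and node_destroy() frees the node immediately, whereas in the kernel the free is deferred behind RCU grace periods and a work item.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pinned object in a hierarchy. */
struct node {
	struct node *parent;
	int online_pin; /* plain int here; the kernel uses refcount_t */
};

static int nodes_freed; /* counts frees, for demonstration only */

static struct node *node_new(struct node *parent, int pins)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->parent = parent;
	n->online_pin = pins;
	return n;
}

/* Stand-in for an operation that may free the node. */
static void node_destroy(struct node *n)
{
	free(n);
	nodes_freed++;
}

/*
 * Drop one pin on `n`. Each time a pin count reaches zero, destroy that
 * node and continue with its parent.
 */
static void node_unpin(struct node *n)
{
	while (n) {
		struct node *parent;

		if (--n->online_pin != 0)
			break;

		parent = n->parent; /* read before n can be freed */
		node_destroy(n);
		n = parent;         /* never touches n after the destroy */
	}
}
```

Reading `n->parent` after node_destroy(n) would dereference freed memory, which is the pattern the patch removes. In this sketch a root created with two pins (its own plus one held by a child) survives the child's node_unpin() and is freed by its own.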
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Abagail ren <renzezhongucas(a)gmail.com>
Suggested-by: Linus Torvalds <torvalds(a)linuxfoundation.org>
Fixes: 4308a434e5e0 ("blkcg: don't offline parent blkcg first")
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Andrea Ciprietti <ciprietti(a)google.com>
---
include/linux/blk-cgroup.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0e6e84db06f6..b89099360a86 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -428,10 +428,14 @@ static inline void blkcg_pin_online(struct blkcg *blkcg)
static inline void blkcg_unpin_online(struct blkcg *blkcg)
{
do {
+ struct blkcg *parent;
+
if (!refcount_dec_and_test(&blkcg->online_pin))
break;
+
+ parent = blkcg_parent(blkcg);
blkcg_destroy_blkgs(blkcg);
- blkcg = blkcg_parent(blkcg);
+ blkcg = parent;
} while (blkcg);
}
--
2.48.1.262.g85cc9f2d1e-goog
From: Ard Biesheuvel <ardb(a)kernel.org>
UEFI 2.11 introduced EFI_MEMORY_HOT_PLUGGABLE to annotate system memory
regions that are 'cold plugged' at boot, i.e., hot pluggable memory that
is available from early boot, and described as system RAM by the
firmware.
Existing loaders and EFI applications running in the boot context will
happily use this memory for allocating data structures that cannot be
freed or moved at runtime, and this prevents the memory from being
unplugged. Going forward, the new EFI_MEMORY_HOT_PLUGGABLE attribute
should be tested, and memory annotated as such should be avoided for
such allocations.
In the EFI stub, there are a couple of occurrences where, instead of the
high-level AllocatePages() UEFI boot service, a low-level code sequence
is used that traverses the EFI memory map and carves out the requested
number of pages from a free region. This is needed, e.g., for allocating
as low as possible, or for allocating pages at random.
While AllocatePages() should presumably avoid special purpose memory and
cold plugged regions, this manual approach needs to incorporate this
logic itself, in order to prevent the kernel from ending up in a
hot unpluggable region, which would prevent that region from being unplugged.
So add the EFI_MEMORY_HOT_PLUGGABLE macro definition, and check for it
where appropriate.
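As a rough illustration of such a manual memory map walk, the following stand-alone sketch picks the lowest suitable region while skipping hot pluggable ones. It is not the EFI stub code: `struct mem_desc` and `find_low_region()` are hypothetical, the descriptor is simplified, and soft reserved memory is always skipped here, whereas the stub only does so when efi_soft_reserve_enabled() returns true.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_CONVENTIONAL_MEMORY  7u                  /* memory type from the UEFI spec */
#define EFI_MEMORY_SP            (UINT64_C(1) << 18) /* soft reserved */
#define EFI_MEMORY_HOT_PLUGGABLE (UINT64_C(1) << 20) /* supports unplugging at runtime */

/* Simplified descriptor; the real efi_memory_desc_t has more fields. */
struct mem_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/*
 * Return the lowest-addressed usable region holding at least `pages`
 * pages, or NULL if no region qualifies.
 */
static const struct mem_desc *find_low_region(const struct mem_desc *map,
					      size_t count, uint64_t pages)
{
	const struct mem_desc *best = NULL;

	assert(map != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		const struct mem_desc *md = &map[i];

		if (md->type != EFI_CONVENTIONAL_MEMORY)
			continue;

		/* Allocations that can never move must stay out of unpluggable memory. */
		if (md->attribute & EFI_MEMORY_HOT_PLUGGABLE)
			continue;

		/* Assumption: soft reservation is always honored in this sketch. */
		if (md->attribute & EFI_MEMORY_SP)
			continue;

		if (md->num_pages < pages)
			continue;

		if (!best || md->phys_addr < best->phys_addr)
			best = md;
	}

	return best;
}
```

With a map whose lowest conventional region carries EFI_MEMORY_HOT_PLUGGABLE, the function returns the next suitable region instead, which mirrors the intent of the checks added to randomalloc.c and relocate.c.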
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/efi.c | 6 ++++--
drivers/firmware/efi/libstub/randomalloc.c | 3 +++
drivers/firmware/efi/libstub/relocate.c | 3 +++
include/linux/efi.h | 1 +
4 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 8296bf985d1d..7309394b8fc9 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -934,13 +934,15 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
EFI_MEMORY_NV | EFI_MEMORY_SP | EFI_MEMORY_CPU_CRYPTO |
- EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
+ EFI_MEMORY_MORE_RELIABLE | EFI_MEMORY_HOT_PLUGGABLE |
+ EFI_MEMORY_RUNTIME))
snprintf(pos, size, "|attr=0x%016llx]",
(unsigned long long)attr);
else
snprintf(pos, size,
- "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+ "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
+ attr & EFI_MEMORY_HOT_PLUGGABLE ? "HP" : "",
attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
attr & EFI_MEMORY_CPU_CRYPTO ? "CC" : "",
attr & EFI_MEMORY_SP ? "SP" : "",
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index e5872e38d9a4..5a732018be36 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -25,6 +25,9 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
if (md->type != EFI_CONVENTIONAL_MEMORY)
return 0;
+ if (md->attribute & EFI_MEMORY_HOT_PLUGGABLE)
+ return 0;
+
if (efi_soft_reserve_enabled() &&
(md->attribute & EFI_MEMORY_SP))
return 0;
diff --git a/drivers/firmware/efi/libstub/relocate.c b/drivers/firmware/efi/libstub/relocate.c
index 99b45d1cd624..d4264bfb6dc1 100644
--- a/drivers/firmware/efi/libstub/relocate.c
+++ b/drivers/firmware/efi/libstub/relocate.c
@@ -53,6 +53,9 @@ efi_status_t efi_low_alloc_above(unsigned long size, unsigned long align,
if (desc->type != EFI_CONVENTIONAL_MEMORY)
continue;
+ if (desc->attribute & EFI_MEMORY_HOT_PLUGGABLE)
+ continue;
+
if (efi_soft_reserve_enabled() &&
(desc->attribute & EFI_MEMORY_SP))
continue;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 053c57e61869..db293d7de686 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -128,6 +128,7 @@ typedef struct {
#define EFI_MEMORY_RO ((u64)0x0000000000020000ULL) /* read-only */
#define EFI_MEMORY_SP ((u64)0x0000000000040000ULL) /* soft reserved */
#define EFI_MEMORY_CPU_CRYPTO ((u64)0x0000000000080000ULL) /* supports encryption */
+#define EFI_MEMORY_HOT_PLUGGABLE BIT_ULL(20) /* supports unplugging at runtime */
#define EFI_MEMORY_RUNTIME ((u64)0x8000000000000000ULL) /* range requires runtime mapping */
#define EFI_MEMORY_DESCRIPTOR_VERSION 1
--
2.48.1.362.g079036d154-goog