From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that ring buffers should never be mapped
via the BAR on systems with LLC. There are too many caching pitfalls
due to the way BAR accesses are routed. So it is safest to just not
use it.
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: 9d80841ea4c9 ("drm/i915: Allow ringbuffers to be bound anywhere")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index fb1d2595392ed..fb99143be98e7 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -53,7 +53,7 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (unlikely(ret))
goto err_unpin;
- if (i915_vma_is_map_and_fenceable(vma)) {
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
addr = (void __force *)i915_vma_pin_iomap(vma);
} else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
@@ -98,7 +98,7 @@ void intel_ring_unpin(struct intel_ring *ring)
return;
i915_vma_unset_ggtt_write(vma);
- if (i915_vma_is_map_and_fenceable(vma))
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915))
i915_vma_unpin_iomap(vma);
else
i915_gem_object_unpin_map(vma->obj);
--
2.39.1
BugLink: https://bugs.launchpad.net/bugs/2007581
GPIO chip irq members are exposed before they could be completely
initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which
was accessed through the I2C interface in gpiochip_to_irq() before
it could be initialized by gpiochip_add_irqchip(). This resulted in
a kernel NULL pointer dereference.
The logs follow for reference:
kernel: Call Trace:
kernel: gpiod_to_irq+0x53/0x70
kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel: i2c_acpi_get_irq+0xc0/0xd0
kernel: i2c_device_probe+0x28a/0x2a0
kernel: really_probe+0xf2/0x460
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before
they are completely initialized.
Signed-off-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl(a)bgdev.pl>
(backported from commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 19 +++++++++++++++++++
include/linux/gpio/driver.h | 9 +++++++++
2 files changed, 28 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index abdf448b11a3..e4d47e00c392 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2146,6 +2146,16 @@ static int gpiochip_to_irq(struct gpio_chip *chip, unsigned offset)
{
struct irq_domain *domain = chip->irq.domain;
+#ifdef CONFIG_GPIOLIB_IRQCHIP
+ /*
+ * Avoid race condition with other code, which tries to lookup
+ * an IRQ before the irqchip has been properly registered,
+ * i.e. while gpiochip is still being brought up.
+ */
+ if (!chip->irq.initialized)
+ return -EPROBE_DEFER;
+#endif
+
if (!gpiochip_irqchip_irq_valid(chip, offset))
return -ENXIO;
@@ -2321,6 +2331,15 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
acpi_gpiochip_request_interrupts(gpiochip);
+ /*
+ * Using barrier() here to prevent compiler from reordering
+ * gc->irq.initialized before initialization of above
+ * GPIO chip irq members.
+ */
+ barrier();
+
+ gpiochip->irq.initialized = true;
+
return 0;
}
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 5dd9c982e2cb..15418caf76fc 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -201,6 +201,15 @@ struct gpio_irq_chip {
*/
bool threaded;
+ /**
+ * @initialized:
+ *
+ * Flag to track GPIO chip irq member's initialization.
+ * This flag will make sure GPIO chip irq members are not used
+ * before they are initialized.
+ */
+ bool initialized;
+
/**
* @init_hw: optional routine to initialize hardware before
* an IRQ chip will be added. This is quite useful when
--
2.30.1
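The gating idea in the gpiolib patch above can be tried out in user space. What follows is a minimal sketch under stated assumptions, not the kernel implementation: `mock_chip`, `mock_irq_domain`, `mock_to_irq()` and `mock_add_irqchip()` are hypothetical stand-ins, 517 is the numeric value of EPROBE_DEFER, and a C11 compiler-only fence stands in for the kernel's barrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; 517 is the numeric value of the kernel's EPROBE_DEFER. */
#define MOCK_EPROBE_DEFER 517

struct mock_irq_domain {
	int base;			/* first IRQ number of this domain */
};

struct mock_chip {
	struct mock_irq_domain *domain;	/* NULL until setup has run */
	bool initialized;		/* written last, after all irq members */
};

/* Lookup path: refuse to touch the irq members until setup is complete. */
static int mock_to_irq(const struct mock_chip *chip, unsigned int offset)
{
	if (!chip->initialized)
		return -MOCK_EPROBE_DEFER;
	return chip->domain->base + (int)offset;
}

/* Setup path: fill in the members first, then publish the flag. */
static void mock_add_irqchip(struct mock_chip *chip,
			     struct mock_irq_domain *domain)
{
	assert(chip != NULL && domain != NULL);
	chip->domain = domain;
	/* Compiler-only fence, standing in for the kernel's barrier(). */
	atomic_signal_fence(memory_order_seq_cst);
	chip->initialized = true;
}
```

A lookup made before mock_add_irqchip() returns the deferral code instead of dereferencing a NULL domain; the same lookup made afterwards resolves normally.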
From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that stolen memory should never be used for
ring buffer allocations on platforms with LLC. There are too many
caching pitfalls due to the way stolen memory accesses are routed. So
it is safest to just not use it.
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..fb1d2595392ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,7 +116,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
I915_BO_ALLOC_PM_VOLATILE);
- if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
+ if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt) && !HAS_LLC(i915))
obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(obj))
obj = i915_gem_object_create_internal(i915, size);
--
2.39.1
Hi Greg,
I noticed that gcc 4.6 is also the minimum requirement for kernel 5.4:
https://www.kernel.org/doc/html/v5.4/process/changes.html
So anyone who compiles 5.4 with gcc 4.6 should also run into the problem.
Best regards,
Michael
-----Original Message-----
From: Michael Nies
Sent: Thursday, 9 February 2023 08:30
To: 'Greg KH' <gregkh(a)linuxfoundation.org>
Cc: 'stable(a)vger.kernel.org' <stable(a)vger.kernel.org>
Subject: RE: Kernel Bug 217013
Hi Greg,
thanks for your answer.
We do not use kernel 4.19 at the moment.
On the systems themselves we have only gcc 4.4.5, so the requirements for kernel 4.19 are not met and we stay on 4.14.
But anyone else who tries to compile 4.19 with gcc 4.6 should run into the same problem because of "_Alignof".
On my side, we plan to compile kernel 4.19 with gcc 4.7 or higher, so we should not have problems with "_Alignof" in the future.
Best regards,
Michael
-----Original Message-----
From: Greg KH <gregkh@linuxfoundation.org>
Sent: Thursday, 9 February 2023 08:10
To: Michael Nies <michael.nies@netclusive.com>
Cc: 'stable(a)vger.kernel.org' <stable@vger.kernel.org>
Subject: Re: Kernel Bug 217013
On Thu, Feb 09, 2023 at 06:54:51AM +0000, Michael Nies wrote:
> Hello,
>
> could you please have a look at Kernel Bug 217013 that was reported by me yesterday?
> https://bugzilla.kernel.org/show_bug.cgi?id=217013
>
> Greg wrote that I should write a Mail to this address.
Thanks for the email, and the bug report, I'll work on this later today.
Do you also see the same problem with the 4.19.y kernel tree? Or are you not using that one too, or with a newer compiler?
thanks,
greg k-h
Greg,
These two patches have been (correctly) auto selected to 5.15.y
along with the two dependency patches tagged with:
Stable-dep-of: b306e90ffabd ("ovl: remove privs in ovl_copyfile()")
9636e70ee2d3 ("ovl: use ovl_copy_{real,upper}attr() wrappers")
a54843833caf ("ovl: store lower path in ovl_inode")
It wasn't wrong to apply those patches with the two dependencies
to 5.15.y, but it is not as easy to do for 5.10.y, so here is a
very simple backport of the two fixes to 5.10.y, i.e.:
replaced ovl_copyattr(X) with ovl_copyattr(ovl_inode_real(X), X).
Note that the language "This fixes some failure in fstests..."
in the commit messages means that those fixes are not enough for the
tests to pass. Additional backports from v6.2 are needed for the
tests to pass and I am coordinating those backports with Leah,
so they will hit 5.15.y first before I post them for 5.10.y.
Nevertheless, these overlayfs fixes are important security
fixes, so they should be applied to LTS kernels even before
all the cases in the fstests are fixed.
Thanks,
Amir.
Amir Goldstein (2):
ovl: remove privs in ovl_copyfile()
ovl: remove privs in ovl_fallocate()
fs/overlayfs/file.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
--
2.34.1
BugLink: https://bugs.launchpad.net/bugs/2007581
Commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that led to a
NULL pointer dereference, but in the process caused a regression for
_AEI/_EVT declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irq.initialized is set.
Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@B…
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Tested-By: Samuel Čavoj <samuel(a)cavoj.net>
Tested-By: lukeluk498(a)gmail.com
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Reviewed-and-tested-by: Takashi Iwai <tiwai(a)suse.de>
Cc: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(backported from commit 06fb4ecfeac7e00d6704fa5ed19299f2fefb3cc9)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index e4d47e00c392..049cdfc975b3 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2329,8 +2329,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip_set_irq_hooks(gpiochip);
- acpi_gpiochip_request_interrupts(gpiochip);
-
/*
* Using barrier() here to prevent compiler from reordering
* gc->irq.initialized before initialization of above
@@ -2340,6 +2338,8 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip->irq.initialized = true;
+ acpi_gpiochip_request_interrupts(gpiochip);
+
return 0;
}
--
2.30.1
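The ordering problem fixed by the patch above can be reproduced with a small user-space model. This is a sketch under stated assumptions rather than the gpiolib code: `mock_chip`, `mock_request_interrupts()` and `mock_add_irqchip()` are hypothetical, the request helper stands in for an _AEI walk that does not retry on deferral, and 517 is the value seen in the "err -517" log lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; 517 matches the "err -517" in the amd_gpio log lines. */
#define MOCK_EPROBE_DEFER 517

struct mock_chip {
	bool initialized;	/* set once the irq members are ready */
	int translated;		/* number of pins translated to IRQs */
};

/* Stand-in for the _AEI walk: one translation attempt, no retry on deferral. */
static int mock_request_interrupts(struct mock_chip *chip)
{
	if (!chip->initialized)
		return -MOCK_EPROBE_DEFER;
	chip->translated++;
	return 0;
}

/* flag_first == true models the fixed ordering, false the regressed one. */
static int mock_add_irqchip(struct mock_chip *chip, bool flag_first)
{
	int ret;

	assert(chip != NULL);
	if (flag_first) {
		chip->initialized = true;
		ret = mock_request_interrupts(chip);
	} else {
		ret = mock_request_interrupts(chip);
		chip->initialized = true;
	}
	return ret;
}
```

With the regressed ordering the request sees the flag still clear and is deferred with nothing translated; with the flag set first the same request succeeds.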
A number of Cezanne systems report IRQ1 as a wakeup source when it's not actually
a wakeup. This can cause problems for certain ACPI events. The following upstream
fix addressed it:
commit 8e60615e8932 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
It was reported that this fix actually helped here with older kernels too:
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1921#note_1770257
So backport this fix to 5.15.y as well.
This backport is dependent upon being able to read the SMU version which
happened in a different commit. So backport that commit and follow up fixes
as well.
v1->v2:
* Split into multiple commits
* Catch some fixes for reading SMU version too
Hans de Goede (1):
platform/x86: amd-pmc: Fix compilation when CONFIG_DEBUGFS is disabled
Mario Limonciello (2):
platform/x86: amd-pmc: Correct usage of SMU version
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
Sanket Goswami (1):
platform/x86: amd-pmc: Export Idlemask values based on the APU
drivers/platform/x86/amd-pmc.c | 116 +++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
--
2.34.1
No upstream commit exists: the problem addressed here is that
'commit 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")'
was backported to 5.10. This commit is broken, but nobody noticed
upstream, since shortly after s390 converted to generic entry with
'commit 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")', which
implicitly fixed the problem outlined below.
The TIF_NOTIFY_SIGNAL thread flag is set for io_uring work. The
return-to-user or syscall exit path calls do_signal when either
TIF_SIGPENDING or TIF_NOTIFY_SIGNAL is set. However, do_signal considers
only TIF_SIGPENDING and ignores TIF_NOTIFY_SIGNAL. This means get_signal
is never invoked for TIF_NOTIFY_SIGNAL and hence the flag is not cleared,
which results in an endless do_signal loop.
Reference: 'commit 788d0824269b ("io_uring: import 5.15-stable io_uring")'
Fixes: 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")
Cc: stable(a)vger.kernel.org # 5.10.162
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Acked-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
---
arch/s390/kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index b27b6c1f058d..9e900a8977bd 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -472,7 +472,7 @@ void do_signal(struct pt_regs *regs)
current->thread.system_call =
test_pt_regs_flag(regs, PIF_SYSCALL) ? regs->int_code : 0;
- if (test_thread_flag(TIF_SIGPENDING) && get_signal(&ksig)) {
+ if (get_signal(&ksig)) {
/* Whee! Actually deliver the signal. */
if (current->thread.system_call) {
regs->int_code = current->thread.system_call;
--
2.37.2
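The endless loop described in the s390 patch above can be modelled in a few lines. This is a simplified sketch, assuming a hypothetical `mock_task` with two flag bits and a bounded exit loop so that the broken case terminates; the `mock_` functions only mirror the shape of do_signal() and get_signal(), not their real behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the two thread flags. */
enum { MOCK_SIGPENDING = 1 << 0, MOCK_NOTIFY_SIGNAL = 1 << 1 };

struct mock_task {
	unsigned int flags;
};

/* Stand-in for get_signal(): consumes both flags, reports a pending signal. */
static bool mock_get_signal(struct mock_task *t)
{
	bool deliver = (t->flags & MOCK_SIGPENDING) != 0;

	t->flags &= ~(unsigned int)(MOCK_SIGPENDING | MOCK_NOTIFY_SIGNAL);
	return deliver;
}

/* fixed == false keeps the extra SIGPENDING test that the patch removes. */
static void mock_do_signal(struct mock_task *t, bool fixed)
{
	if (fixed || (t->flags & MOCK_SIGPENDING))
		(void)mock_get_signal(t);
}

/* Exit-work loop, bounded by max_iters; returns the iterations used. */
static int mock_exit_loop(struct mock_task *t, bool fixed, int max_iters)
{
	int n = 0;

	assert(t != NULL && max_iters > 0);
	while ((t->flags & (MOCK_SIGPENDING | MOCK_NOTIFY_SIGNAL)) &&
	       n < max_iters) {
		mock_do_signal(t, fixed);
		n++;
	}
	return n;
}
```

With only the notify flag set, the unfixed variant uses up the whole iteration budget with the flag still set, while the fixed variant clears it on the first pass.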
commit 5f58d783fd7823b2c2d5954d1126e702f94bfc4c upstream
We have this check to make sure that older devices, which may have
disappeared and re-appeared with an older generation, are not accidentally
added to an fs_devices (such as a replace source device). This makes
sense, we don't want stale disks in our file system. However, for
single disks this doesn't really make sense.
I've seen this in testing, but I was provided a reproducer from a
project that builds btrfs images on loopback devices. The loopback
device gets cached with the new generation, and then if it is re-used to
generate a new file system we'll fail to mount it because the new fs is
"older" than what we have in cache.
Fix this by freeing the cache when closing the device for a single device
filesystem. This will ensure that the device path passed to the mount
command is scanned successfully during the next mount.
CC: stable(a)vger.kernel.org # 5.10+
Reported-by: Daan De Meyer <daandemeyer(a)fb.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
This patch has already been submitted for the LTS stable 5.10 and above.
fs/btrfs/volumes.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 548de841cee5..dacaea61c2f7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -354,6 +354,7 @@ void btrfs_free_device(struct btrfs_device *device)
static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
{
struct btrfs_device *device;
+
WARN_ON(fs_devices->opened);
while (!list_empty(&fs_devices->devices)) {
device = list_entry(fs_devices->devices.next,
@@ -1401,6 +1402,17 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
if (!fs_devices->opened) {
seed_devices = fs_devices->seed;
fs_devices->seed = NULL;
+
+ /*
+ * If the struct btrfs_fs_devices is not assembled with any
+ * other device, it can be re-initialized during the next mount
+ * without the needing device-scan step. Therefore, it can be
+ * fully freed.
+ */
+ if (fs_devices->num_devices == 1) {
+ list_del(&fs_devices->fs_list);
+ free_fs_devices(fs_devices);
+ }
}
mutex_unlock(&uuid_mutex);
--
2.31.1
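The loopback scenario from the btrfs patch above can be illustrated with a toy cache. This is a sketch under stated assumptions only: `mock_fs_devices`, `mock_scan()` and `mock_close()` are hypothetical, the cache holds a single entry, and the error value -1 is a placeholder rather than the code btrfs returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_fs_devices {
	bool cached;			/* an entry from an earlier scan exists */
	unsigned long generation;	/* generation recorded by that scan */
	unsigned int num_devices;
};

/* Scan a device: reject it when the cache already holds a newer generation. */
static int mock_scan(struct mock_fs_devices *fs, unsigned long generation)
{
	assert(fs != NULL);
	if (fs->cached && generation < fs->generation)
		return -1;		/* placeholder error: device looks stale */
	fs->cached = true;
	fs->generation = generation;
	fs->num_devices = 1;
	return 0;
}

/* Close: with the fix, a single-device entry is dropped from the cache. */
static void mock_close(struct mock_fs_devices *fs, bool fixed)
{
	assert(fs != NULL);
	if (fixed && fs->num_devices == 1)
		fs->cached = false;
}
```

Re-creating a filesystem on the same device yields a lower generation; without the fix the stale cache entry rejects it, and with the fix the next scan starts from an empty cache.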
This patch series removes reader optimistic spinning in kernel 5.10 to
improve MongoDB performance. Performance was measured as overall
throughput in ops/sec, averaged over 10 runs, using MongoDB 5.0.11 and
the YCSB [1] microbenchmark with workloadA [2]. The tests ran on AWS EC2
m5.4xlarge/m6g.4xlarge (16-vCPU 64GiB-memory) instances with a 512GB EBS
IO1 drive with 5000 IOPS, with MongoDB and the YCSB load generator on
2 separate instances, and with recordcount=25000000 and
operationcount=10000000:
Before - v5.10.165 kernel in OS Amazon Linux 2
After - v5.10.165 kernel with reader spinning disabled in OS Amazon Linux 2
| Arch | Instance Type | Before | After |
|---------+---------------+---------+---------|
| x86_64 | m5.4xlarge | 37365.4 | 42373.9 |
|---------+---------------+---------+---------|
| aarch64 | m6g.4xlarge | 33823.1 | 43113.7 |
|---------+---------------+---------+---------|
The MongoDB throughput improves by around 13% on x86_64 and 27% on aarch64
after disabling reader optimistic spinning. These patches apply to 5.10
with no conflicts, so we wonder whether it is possible to backport them to
stable 5.10?
[1] https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17…
[2] https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
Thanks,
Shaoying
Peter Zijlstra (3):
locking/rwsem: Better collate rwsem_read_trylock()
locking/rwsem: Introduce rwsem_write_trylock()
locking/rwsem: Fold __down_{read,write}*()
Waiman Long (4):
locking/rwsem: Pass the current atomic count to
rwsem_down_read_slowpath()
locking/rwsem: Prevent potential lock starvation
locking/rwsem: Enable reader optimistic lock stealing
locking/rwsem: Remove reader optimistic spinning
kernel/locking/lock_events_list.h | 6 +-
kernel/locking/rwsem.c | 359 +++++++++---------------------
2 files changed, 106 insertions(+), 259 deletions(-)
--
2.38.1
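The percentages quoted in the cover letter above follow directly from the throughput table. A minimal sketch of the arithmetic, where `percent_gain()` is a hypothetical helper and the inputs are the Before/After values from the table:

```c
#include <assert.h>

/* Relative throughput change in percent: (after - before) / before * 100. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For x86_64, percent_gain(37365.4, 42373.9) is roughly 13.4; for aarch64, percent_gain(33823.1, 43113.7) is roughly 27.5.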
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ec76d0c2da5c ("vmxnet3: move rss code block under eop descriptor")
bdeed8b0958c ("vmxnet3: Record queue number to incoming packets")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ec76d0c2da5c6dfb6a33f1545cc15997013923da Mon Sep 17 00:00:00 2001
From: Ronak Doshi <doshir(a)vmware.com>
Date: Wed, 8 Feb 2023 14:38:59 -0800
Subject: [PATCH] vmxnet3: move rss code block under eop descriptor
Commit b3973bb40041 ("vmxnet3: set correct hash type based on
rss information") added hashType information into skb. However,
the rssType field is populated for the eop descriptor. This can lead
to incorrect reporting of hashType for packets which use
multiple rx descriptors. Multiple rx descriptors are used
for Jumbo frames or LRO packets, which can hit this issue.
This patch moves the RSS code block under the eop descriptor.
Cc: stable(a)vger.kernel.org
Fixes: b3973bb40041 ("vmxnet3: set correct hash type based on rss information")
Signed-off-by: Ronak Doshi <doshir(a)vmware.com>
Acked-by: Peng Li <lpeng(a)vmware.com>
Acked-by: Guolin Yang <gyang(a)vmware.com>
Link: https://lore.kernel.org/r/20230208223900.5794-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 56267c327f0b..682987040ea8 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1546,31 +1546,6 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
rxd->len = rbi->len;
}
-#ifdef VMXNET3_RSS
- if (rcd->rssType != VMXNET3_RCD_RSS_TYPE_NONE &&
- (adapter->netdev->features & NETIF_F_RXHASH)) {
- enum pkt_hash_types hash_type;
-
- switch (rcd->rssType) {
- case VMXNET3_RCD_RSS_TYPE_IPV4:
- case VMXNET3_RCD_RSS_TYPE_IPV6:
- hash_type = PKT_HASH_TYPE_L3;
- break;
- case VMXNET3_RCD_RSS_TYPE_TCPIPV4:
- case VMXNET3_RCD_RSS_TYPE_TCPIPV6:
- case VMXNET3_RCD_RSS_TYPE_UDPIPV4:
- case VMXNET3_RCD_RSS_TYPE_UDPIPV6:
- hash_type = PKT_HASH_TYPE_L4;
- break;
- default:
- hash_type = PKT_HASH_TYPE_L3;
- break;
- }
- skb_set_hash(ctx->skb,
- le32_to_cpu(rcd->rssHash),
- hash_type);
- }
-#endif
skb_record_rx_queue(ctx->skb, rq->qid);
skb_put(ctx->skb, rcd->len);
@@ -1653,6 +1628,31 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
u32 mtu = adapter->netdev->mtu;
skb->len += skb->data_len;
+#ifdef VMXNET3_RSS
+ if (rcd->rssType != VMXNET3_RCD_RSS_TYPE_NONE &&
+ (adapter->netdev->features & NETIF_F_RXHASH)) {
+ enum pkt_hash_types hash_type;
+
+ switch (rcd->rssType) {
+ case VMXNET3_RCD_RSS_TYPE_IPV4:
+ case VMXNET3_RCD_RSS_TYPE_IPV6:
+ hash_type = PKT_HASH_TYPE_L3;
+ break;
+ case VMXNET3_RCD_RSS_TYPE_TCPIPV4:
+ case VMXNET3_RCD_RSS_TYPE_TCPIPV6:
+ case VMXNET3_RCD_RSS_TYPE_UDPIPV4:
+ case VMXNET3_RCD_RSS_TYPE_UDPIPV6:
+ hash_type = PKT_HASH_TYPE_L4;
+ break;
+ default:
+ hash_type = PKT_HASH_TYPE_L3;
+ break;
+ }
+ skb_set_hash(skb,
+ le32_to_cpu(rcd->rssHash),
+ hash_type);
+ }
+#endif
vmxnet3_rx_csum(adapter, skb,
(union Vmxnet3_GenericDesc *)rcd);
skb->protocol = eth_type_trans(skb, adapter->netdev);
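The effect of moving the RSS block can be seen in a small model of a packet that spans several rx descriptors. This is a sketch under stated assumptions, not the driver code: the `mock_` enums and `mock_rx_desc` are hypothetical, only a subset of RSS types is modelled, and the NONE case is folded into the classifier instead of being a separate check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the RSS and hash type enumerations. */
enum mock_rss_type { MOCK_RSS_NONE = 0, MOCK_RSS_IPV4, MOCK_RSS_TCPIPV4 };
enum mock_hash_type { MOCK_HASH_NONE = 0, MOCK_HASH_L3, MOCK_HASH_L4 };

struct mock_rx_desc {
	bool eop;			/* last descriptor of the packet */
	enum mock_rss_type rss_type;	/* meaningful on the eop descriptor */
};

/* Same shape as the switch in the patch: TCP maps to L4, the rest to L3. */
static enum mock_hash_type mock_classify(enum mock_rss_type t)
{
	switch (t) {
	case MOCK_RSS_NONE:
		return MOCK_HASH_NONE;
	case MOCK_RSS_TCPIPV4:
		return MOCK_HASH_L4;
	default:
		return MOCK_HASH_L3;
	}
}

/* Walk one packet's descriptors and take the hash type from the eop one. */
static enum mock_hash_type mock_packet_hash(const struct mock_rx_desc *d,
					     size_t n)
{
	assert(d != NULL && n > 0);
	for (size_t i = 0; i < n; i++)
		if (d[i].eop)
			return mock_classify(d[i].rss_type);
	return MOCK_HASH_NONE;
}
```

For a two-descriptor packet whose RSS type is set only on the last descriptor, classifying the first descriptor yields no hash type, while classifying at eop yields L4.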
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ce4d9a1ea35a ("of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem")
3ecc68349bba ("memblock: rename memblock_free to memblock_phys_free")
fa27717110ae ("memblock: drop memblock_free_early_nid() and memblock_free_early()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacmanjarres(a)google.com>
Date: Wed, 8 Feb 2023 15:20:00 -0800
Subject: [PATCH] of: reserved_mem: Have kmemleak ignore dynamically allocated
reserved mem
Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a crash is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be dynamically allocated within a range of
addresses. However, kmemleak should not need to scan CMA regions or any
reserved memory region, as those regions can be used for DMA transfers
between drivers and peripherals, and thus wouldn't contain anything
useful for kmemleak.
Additionally, since CMA regions are unmapped from the kernel's address
space when they are freed to the buddy allocator at boot when
CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
those memory regions, as that will trigger a crash. Thus, kmemleak
should ignore all dynamically allocated reserved memory regions.
This patch (of 1):
Currently, kmemleak ignores dynamically allocated reserved memory regions
that don't have a kernel mapping. However, regions that do retain a
kernel mapping (e.g. CMA regions) do get scanned by kmemleak.
This is not ideal for two reasons:
1. kmemleak works by scanning memory regions for pointers to allocated
objects to determine if those objects have been leaked or not.
However, reserved memory regions can be used between drivers and
peripherals for DMA transfers, and thus, would not contain pointers to
allocated objects, making it unnecessary for kmemleak to scan these
reserved memory regions.
2. When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
CMA reserved memory regions are unmapped from the kernel's address
space when they are freed to buddy at boot. These CMA reserved regions
are still tracked by kmemleak, however, and when kmemleak attempts to
scan them, a crash will happen, as accessing the CMA region will result
in a page-fault, since the regions are unmapped.
Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
memory regions, instead of those that do not have a kernel mapping
associated with them.
Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Frank Rowand <frowand.list(a)gmail.com>
Cc: Kirill A. Shutemov <kirill.shtuemov(a)linux.intel.com>
Cc: Nick Kossifidis <mick(a)ics.forth.gr>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: Saravana Kannan <saravanak(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 65f3b02a0e4e..f90975e00446 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -48,9 +48,10 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
err = memblock_mark_nomap(base, size);
if (err)
memblock_phys_free(base, size);
- kmemleak_ignore_phys(base);
}
+ kmemleak_ignore_phys(base);
+
return err;
}
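The difference between the old and new placement of kmemleak_ignore_phys() can be expressed as a toy model. This is a sketch under stated assumptions: `mock_region`, `mock_reserve()` and `mock_scan_faults()` are hypothetical, and a "fault" here simply means the scanner would touch a region that has no kernel mapping at scan time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_region {
	bool nomap;		/* never had a kernel mapping */
	bool unmapped_later;	/* e.g. CMA pages unmapped under DEBUG_PAGEALLOC */
	bool ignored;		/* the scanner skips this region */
};

/* Before the fix only nomap regions are ignored; after it, every region is. */
static void mock_reserve(struct mock_region *r, bool fixed)
{
	assert(r != NULL);
	r->ignored = fixed || r->nomap;
}

/* Count the regions whose scan would touch unmapped memory. */
static size_t mock_scan_faults(const struct mock_region *r, size_t n)
{
	size_t faults = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!r[i].ignored && (r[i].nomap || r[i].unmapped_later))
			faults++;
	return faults;
}
```

A mapped CMA-like region that is unmapped later is the case the old placement misses: it is not ignored, so the scan reaches it after the mapping is gone.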
From: Paolo Abeni <pabeni(a)redhat.com>
[ Upstream commit d4e85922e3e7ef2071f91f65e61629b60f3a9cf4 ]
If the peer closes all the existing subflows for a given
mptcp socket and later the application closes it, the current
implementation lets it survive until the timewait timeout expires.
While the above is allowed by the protocol specification, it
consumes resources for almost no reason and additionally
causes sporadic self-test failures.
Let's move the mptcp socket to the TCP_CLOSE state when there are
no alive subflows at close time, so that the allocated resources
will be freed immediately.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Hi Greg, Sasha,
Here is one MPTCP patch backport that recently failed to apply to the
5.15 stable tree: it clears resources earlier if there are no more
reasons to keep MPTCP sockets alive.
I had a simple conflict because in v5.15, the context is a bit different
when iterating over the different subflows in __mptcp_close() but the
idea is still the same: in this loop, a counter needs to be incremented.
---
net/mptcp/protocol.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 47f359dac247..5d05d85242bc 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2726,6 +2726,7 @@ static void mptcp_close(struct sock *sk, long timeout)
{
struct mptcp_subflow_context *subflow;
bool do_cancel_work = false;
+ int subflows_alive = 0;
lock_sock(sk);
sk->sk_shutdown = SHUTDOWN_MASK;
@@ -2747,11 +2748,19 @@ static void mptcp_close(struct sock *sk, long timeout)
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
bool slow = lock_sock_fast_nested(ssk);
+ subflows_alive += ssk->sk_state != TCP_CLOSE;
+
sock_orphan(ssk);
unlock_sock_fast(ssk, slow);
}
sock_orphan(sk);
+ /* all the subflows are closed, only timeout can change the msk
+ * state, let's not keep resources busy for no reasons
+ */
+ if (subflows_alive == 0)
+ inet_sk_state_store(sk, TCP_CLOSE);
+
sock_hold(sk);
pr_debug("msk=%p state=%d", sk, sk->sk_state);
if (sk->sk_state == TCP_CLOSE) {
---
base-commit: e2c1a934fd8e4288e7a32f4088ceaccf469eb74c
change-id: 20230214-upstream-stable-20230214-linux-5-15-94-rc1-mptcp-fixes-517feb25bd47
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
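The counting logic added to mptcp_close() above is small enough to model directly. This is a sketch under stated assumptions: `mock_state` and `mock_msk_close()` are hypothetical, the state set is reduced to three values, and locking and orphaning are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of socket states. */
enum mock_state { MOCK_ESTABLISHED, MOCK_FIN_WAIT1, MOCK_CLOSE };

/*
 * Return the state the MPTCP socket should have after close(): it moves
 * to MOCK_CLOSE right away when no subflow is still alive.
 */
static enum mock_state mock_msk_close(enum mock_state msk,
				      const enum mock_state *subflows,
				      size_t n)
{
	int subflows_alive = 0;

	assert(subflows != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		subflows_alive += subflows[i] != MOCK_CLOSE;

	return subflows_alive == 0 ? MOCK_CLOSE : msk;
}
```

As in the patch, the comparison result (0 or 1) is added to the counter, so the socket state only changes when every subflow is already closed.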
If REQ_NOWAIT is set, then do a non-blocking allocation if the operation
is a write and we need to insert a new page. Currently REQ_NOWAIT cannot
be set as the queue isn't marked as supporting nowait; this change is in
preparation for allowing that.
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/brd.c | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 15a148d5aad9..1ddada0cdaca 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -80,26 +80,20 @@ static struct page *brd_lookup_page(struct brd_device *brd, sector_t sector)
/*
* Insert a new page for a given sector, if one does not already exist.
*/
-static int brd_insert_page(struct brd_device *brd, sector_t sector)
+static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
{
pgoff_t idx;
struct page *page;
- gfp_t gfp_flags;
page = brd_lookup_page(brd, sector);
if (page)
return 0;
- /*
- * Must use NOIO because we don't want to recurse back into the
- * block or filesystem layers from page reclaim.
- */
- gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
- page = alloc_page(gfp_flags);
+ page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
if (!page)
return -ENOMEM;
- if (radix_tree_preload(GFP_NOIO)) {
+ if (radix_tree_preload(gfp)) {
__free_page(page);
return -ENOMEM;
}
@@ -167,19 +161,20 @@ static void brd_free_pages(struct brd_device *brd)
/*
* copy_to_brd_setup must be called before copy_to_brd. It may sleep.
*/
-static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
+static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n,
+ gfp_t gfp)
{
unsigned int offset = (sector & (PAGE_SECTORS-1)) << SECTOR_SHIFT;
size_t copy;
int ret;
copy = min_t(size_t, n, PAGE_SIZE - offset);
- ret = brd_insert_page(brd, sector);
+ ret = brd_insert_page(brd, sector, gfp);
if (ret)
return ret;
if (copy < n) {
sector += copy >> SECTOR_SHIFT;
- ret = brd_insert_page(brd, sector);
+ ret = brd_insert_page(brd, sector, gfp);
}
return ret;
}
@@ -254,20 +249,26 @@ static void copy_from_brd(void *dst, struct brd_device *brd,
* Process a single bvec of a bio.
*/
static int brd_do_bvec(struct brd_device *brd, struct page *page,
- unsigned int len, unsigned int off, enum req_op op,
+ unsigned int len, unsigned int off, blk_opf_t opf,
sector_t sector)
{
void *mem;
int err = 0;
- if (op_is_write(op)) {
- err = copy_to_brd_setup(brd, sector, len);
+ if (op_is_write(opf)) {
+ /*
+ * Must use NOIO because we don't want to recurse back into the
+ * block or filesystem layers from page reclaim.
+ */
+ gfp_t gfp = opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO;
+
+ err = copy_to_brd_setup(brd, sector, len, gfp);
if (err)
goto out;
}
mem = kmap_atomic(page);
- if (!op_is_write(op)) {
+ if (!op_is_write(opf)) {
copy_from_brd(mem + off, brd, sector, len);
flush_dcache_page(page);
} else {
@@ -296,8 +297,12 @@ static void brd_submit_bio(struct bio *bio)
(len & (SECTOR_SIZE - 1)));
err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
- bio_op(bio), sector);
+ bio->bi_opf, sector);
if (err) {
+ if (err == -ENOMEM && bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return;
+ }
bio_io_error(bio);
return;
}
--
2.39.1
During page migration, the copy_highpage function is used to copy the
page data to the target page. If the source page is a userspace page
with MTE tags, the KASAN tag of the target page must have the match-all
tag in order to avoid tag check faults during subsequent accesses to the
page by the kernel. However, the target page may have been allocated in
a number of ways, some of which will use the KASAN allocator and will
therefore end up setting the KASAN tag to a non-match-all tag. Therefore,
update the target page's KASAN tag to match the source page.
We ended up unintentionally fixing this issue as a result of a bad
merge conflict resolution between commit e059853d14ca ("arm64: mte:
Fix/clarify the PG_mte_tagged semantics") and commit 20794545c146 ("arm64:
kasan: Revert "arm64: mte: reset the page tag in page->flags""), which
preserved a tag reset for PG_mte_tagged pages which was considered to be
unnecessary at the time. Because SW tags KASAN uses separate tag storage,
update the code to only reset the tags when HW tags KASAN is enabled.
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd8570650583…
Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee(a)mediatek.com>
Cc: <stable(a)vger.kernel.org> # 6.1
---
For the stable branch, e059853d14ca needs to be cherry-picked and the following
merge conflict resolution is needed:
- page_kasan_tag_reset(to);
+ if (kasan_hw_tags_enabled())
+ page_kasan_tag_reset(to);
- /* It's a new page, shouldn't have been tagged yet */
- WARN_ON_ONCE(!try_page_mte_tagging(to));
arch/arm64/mm/copypage.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 8dd5a8fe64b4..4aadcfb01754 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -22,7 +22,8 @@ void copy_highpage(struct page *to, struct page *from)
copy_page(kto, kfrom);
if (system_supports_mte() && page_mte_tagged(from)) {
- page_kasan_tag_reset(to);
+ if (kasan_hw_tags_enabled())
+ page_kasan_tag_reset(to);
/* It's a new page, shouldn't have been tagged yet */
WARN_ON_ONCE(!try_page_mte_tagging(to));
mte_copy_page_tags(kto, kfrom);
--
2.39.1.581.gbfd45094c4-goog
Use the power state to decide accurately whether we can enter or leave IPS,
and thereby avoid powering on or off twice.
Commit 6bf3a083407b ("wifi: rtw88: add flag check before enter or leave IPS")
tried to prevent this as well, but it does not handle all cases. The
exception is when WiFi is connected and the system suspends and resumes:
the device is powered on twice, and power-on then fails after resuming,
like:
rtw_8723de 0000:03:00.0: failed to poll offset=0x6 mask=0x2 value=0x2
rtw_8723de 0000:03:00.0: mac power on failed
rtw_8723de 0000:03:00.0: failed to power on mac
rtw_8723de 0000:03:00.0: leave idle state failed
rtw_8723de 0000:03:00.0: failed to leave ips state
rtw_8723de 0000:03:00.0: failed to leave idle state
rtw_8723de 0000:03:00.0: failed to send h2c command
To fix this, introduce a new flag RTW_FLAG_POWERON to reflect the power state,
and call rtw_mac_pre_system_cfg() to configure registers properly between
power-off and power-on.
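The guard logic described above can be sketched outside the driver. This is only an illustration: struct dev_state, power_switch(), enter_ips() and leave_ips() are hypothetical stand-ins, not the rtw88 functions, and the counters exist only to make the single power transition visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the rtw88 structures. */
struct dev_state {
	bool powered_on;      /* plays the role of RTW_FLAG_POWERON */
	int power_on_calls;   /* counts real power-on sequences */
	int power_off_calls;  /* counts real power-off sequences */
};

/* The power switch records the resulting state after each transition. */
void power_switch(struct dev_state *d, bool on)
{
	if (on)
		d->power_on_calls++;
	else
		d->power_off_calls++;
	d->powered_on = on;
}

/* Entering IPS powers the device off, but only if it is currently on. */
void enter_ips(struct dev_state *d)
{
	assert(d);
	if (!d->powered_on)
		return;
	power_switch(d, false);
}

/* Leaving IPS powers the device on, but only if it is currently off. */
void leave_ips(struct dev_state *d)
{
	assert(d);
	if (d->powered_on)
		return;
	power_switch(d, true);
}
```

Because both paths test the actual power state rather than a separate "in IPS" flag, a repeated enter or leave request becomes a no-op.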
Reported-by: Paul Gover <pmw.gover(a)yahoo.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217016
Fixes: 6bf3a083407b ("wifi: rtw88: add flag check before enter or leave IPS")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Ping-Ke Shih <pkshih(a)realtek.com>
---
Hi Kalle,
This patch fixes the 8723DE failing to power on after system resume. Please
queue this for 6.3.
Thank you
Ping-Ke
---
drivers/net/wireless/realtek/rtw88/coex.c | 2 +-
drivers/net/wireless/realtek/rtw88/mac.c | 10 ++++++++++
drivers/net/wireless/realtek/rtw88/main.h | 2 +-
drivers/net/wireless/realtek/rtw88/ps.c | 4 ++--
drivers/net/wireless/realtek/rtw88/wow.c | 2 +-
5 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/coex.c b/drivers/net/wireless/realtek/rtw88/coex.c
index 38697237ee5f0..86467d2f8888c 100644
--- a/drivers/net/wireless/realtek/rtw88/coex.c
+++ b/drivers/net/wireless/realtek/rtw88/coex.c
@@ -4056,7 +4056,7 @@ void rtw_coex_display_coex_info(struct rtw_dev *rtwdev, struct seq_file *m)
rtwdev->stats.tx_throughput, rtwdev->stats.rx_throughput);
seq_printf(m, "%-40s = %u/ %u/ %u\n",
"IPS/ Low Power/ PS mode",
- test_bit(RTW_FLAG_INACTIVE_PS, rtwdev->flags),
+ !test_bit(RTW_FLAG_POWERON, rtwdev->flags),
test_bit(RTW_FLAG_LEISURE_PS_DEEP, rtwdev->flags),
rtwdev->lps_conf.mode);
diff --git a/drivers/net/wireless/realtek/rtw88/mac.c b/drivers/net/wireless/realtek/rtw88/mac.c
index 4e5c194aac299..dae64901bac5a 100644
--- a/drivers/net/wireless/realtek/rtw88/mac.c
+++ b/drivers/net/wireless/realtek/rtw88/mac.c
@@ -273,6 +273,11 @@ static int rtw_mac_power_switch(struct rtw_dev *rtwdev, bool pwr_on)
if (rtw_pwr_seq_parser(rtwdev, pwr_seq))
return -EINVAL;
+ if (pwr_on)
+ set_bit(RTW_FLAG_POWERON, rtwdev->flags);
+ else
+ clear_bit(RTW_FLAG_POWERON, rtwdev->flags);
+
return 0;
}
@@ -335,6 +340,11 @@ int rtw_mac_power_on(struct rtw_dev *rtwdev)
ret = rtw_mac_power_switch(rtwdev, true);
if (ret == -EALREADY) {
rtw_mac_power_switch(rtwdev, false);
+
+ ret = rtw_mac_pre_system_cfg(rtwdev);
+ if (ret)
+ goto err;
+
ret = rtw_mac_power_switch(rtwdev, true);
if (ret)
goto err;
diff --git a/drivers/net/wireless/realtek/rtw88/main.h b/drivers/net/wireless/realtek/rtw88/main.h
index 165f299e8e1f9..d4a53d5567451 100644
--- a/drivers/net/wireless/realtek/rtw88/main.h
+++ b/drivers/net/wireless/realtek/rtw88/main.h
@@ -356,7 +356,7 @@ enum rtw_flags {
RTW_FLAG_RUNNING,
RTW_FLAG_FW_RUNNING,
RTW_FLAG_SCANNING,
- RTW_FLAG_INACTIVE_PS,
+ RTW_FLAG_POWERON,
RTW_FLAG_LEISURE_PS,
RTW_FLAG_LEISURE_PS_DEEP,
RTW_FLAG_DIG_DISABLE,
diff --git a/drivers/net/wireless/realtek/rtw88/ps.c b/drivers/net/wireless/realtek/rtw88/ps.c
index 11594940d6b00..996365575f44f 100644
--- a/drivers/net/wireless/realtek/rtw88/ps.c
+++ b/drivers/net/wireless/realtek/rtw88/ps.c
@@ -25,7 +25,7 @@ static int rtw_ips_pwr_up(struct rtw_dev *rtwdev)
int rtw_enter_ips(struct rtw_dev *rtwdev)
{
- if (test_and_set_bit(RTW_FLAG_INACTIVE_PS, rtwdev->flags))
+ if (!test_bit(RTW_FLAG_POWERON, rtwdev->flags))
return 0;
rtw_coex_ips_notify(rtwdev, COEX_IPS_ENTER);
@@ -50,7 +50,7 @@ int rtw_leave_ips(struct rtw_dev *rtwdev)
{
int ret;
- if (!test_and_clear_bit(RTW_FLAG_INACTIVE_PS, rtwdev->flags))
+ if (test_bit(RTW_FLAG_POWERON, rtwdev->flags))
return 0;
rtw_hci_link_ps(rtwdev, false);
diff --git a/drivers/net/wireless/realtek/rtw88/wow.c b/drivers/net/wireless/realtek/rtw88/wow.c
index 89dc595094d5c..16ddee577efec 100644
--- a/drivers/net/wireless/realtek/rtw88/wow.c
+++ b/drivers/net/wireless/realtek/rtw88/wow.c
@@ -592,7 +592,7 @@ static int rtw_wow_leave_no_link_ps(struct rtw_dev *rtwdev)
if (rtw_get_lps_deep_mode(rtwdev) != LPS_DEEP_MODE_NONE)
rtw_leave_lps_deep(rtwdev);
} else {
- if (test_bit(RTW_FLAG_INACTIVE_PS, rtwdev->flags)) {
+ if (!test_bit(RTW_FLAG_POWERON, rtwdev->flags)) {
rtw_wow->ips_enabled = true;
ret = rtw_leave_ips(rtwdev);
if (ret)
--
2.25.1
The quilt patch titled
Subject: mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
has been removed from the -mm tree. Its filename was
mm-madv_collapse-set-eagain-on-unexpected-page-refcount.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
Date: Tue, 24 Jan 2023 17:57:37 -0800
During collapse, in a few places we check to see if a given small page has
any unaccounted references. If the refcount on the page doesn't match our
expectations, it must be there is an unknown user concurrently interested
in the page, and so it's not safe to move the contents elsewhere.
However, the unaccounted pins are likely an ephemeral state.
In this situation, MADV_COLLAPSE returns -EINVAL when it should return
-EAGAIN. This could cause userspace to conclude that the syscall
failed, when it in fact could succeed by retrying.
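From the userspace side, the distinction matters because a caller can reasonably retry on EAGAIN but not on EINVAL. A minimal sketch of such a retry loop follows; retry_on_eagain(), collapse_fn and flaky_op() are hypothetical names for illustration, and a real caller would wrap madvise() and translate its -1/errno result into a negative errno value first.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*collapse_fn)(void *ctx);

/*
 * Call op() until it succeeds, fails with something other than -EAGAIN,
 * or max_tries attempts have been made. Returns the last result.
 */
int retry_on_eagain(collapse_fn op, void *ctx, int max_tries)
{
	int ret = -EINVAL;

	assert(op);
	for (int i = 0; i < max_tries; i++) {
		ret = op(ctx);
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}

/* Test double: fails with -EAGAIN until the counter reaches zero. */
int flaky_op(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

The retry bound is a design choice of this sketch; the patch itself only changes which errno is reported.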
Link: https://lkml.kernel.org/r/20230125015738.912924-1-zokeefe@google.com
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/khugepaged.c~mm-madv_collapse-set-eagain-on-unexpected-page-refcount
+++ a/mm/khugepaged.c
@@ -2611,6 +2611,7 @@ static int madvise_collapse_errno(enum s
case SCAN_CGROUP_CHARGE_FAIL:
return -EBUSY;
/* Resource temporary unavailable - trying again might succeed */
+ case SCAN_PAGE_COUNT:
case SCAN_PAGE_LOCK:
case SCAN_PAGE_LRU:
case SCAN_DEL_PAGE_LRU:
_
Patches currently in -mm which might be from zokeefe(a)google.com are
The quilt patch titled
Subject: mm/filemap: fix page end in filemap_get_read_batch
has been removed from the -mm tree. Its filename was
mm-filemap-fix-page-end-in-filemap_get_read_batch.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qian Yingjin <qian(a)ddn.com>
Subject: mm/filemap: fix page end in filemap_get_read_batch
Date: Wed, 8 Feb 2023 10:24:00 +0800
I was running traces of the read code against a RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips. I found that the page end offset calculation in
filemap_get_read_batch() was off by one.
When a read is submitted with end offset 1048575, it calculates an end
page of 256 for the read when it should be 255. "last_index" is the index
of the page beyond the end of the read, and it should be skipped when getting
a batch of pages for the read in @filemap_get_read_batch().
The below simple patch fixes the problem. This code was introduced in
kernel 5.12.
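The arithmetic can be checked with a small standalone sketch. It assumes 4 KiB pages, and read_end_index() and read_last_page() are hypothetical helper names used only here; the first mirrors the DIV_ROUND_UP() expression in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL  /* assumed 4 KiB pages */

/* Index of the first page past the end of the read (exclusive bound). */
uint64_t read_end_index(uint64_t pos, uint64_t count)
{
	assert(count > 0);
	return (pos + count + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Index of the last page that actually holds requested bytes (inclusive). */
uint64_t read_last_page(uint64_t pos, uint64_t count)
{
	return read_end_index(pos, count) - 1;
}
```

For a 1 MiB read starting at offset 0 (last byte at offset 1048575), read_end_index() gives 256 and read_last_page() gives 255, which is the value an inclusive-end lookup needs.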
Link: https://lkml.kernel.org/r/20230208022400.28962-1-coolqyj@163.com
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <qian(a)ddn.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/filemap.c~mm-filemap-fix-page-end-in-filemap_get_read_batch
+++ a/mm/filemap.c
@@ -2588,18 +2588,19 @@ static int filemap_get_pages(struct kioc
struct folio *folio;
int err = 0;
+ /* "last_index" is the index of the page beyond the end of the read */
last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
retry:
if (fatal_signal_pending(current))
return -EINTR;
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
_
Patches currently in -mm which might be from qian(a)ddn.com are
From: Dave Ertman <david.m.ertman(a)intel.com>
RDMA is not supported in ice on a PF that has been added to a bonded
interface. To enforce this, when an interface enters a bond, we unplug
the auxiliary device that supports RDMA functionality. This unplug
currently happens in the context of handling the netdev bonding event.
This event is sent to the ice driver under RTNL context. This is causing
a deadlock where the RDMA driver is waiting for the RTNL lock to complete
the removal.
Defer the unplugging/re-plugging of the auxiliary device to the service
task so that it is not performed under the RTNL lock context.
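The deferral pattern can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the ice code: struct pf_sketch, request_plug(), request_unplug() and service_task() are hypothetical names, and setting a bool stands in for the real plug and unplug work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { REQ_PLUG = 1u << 0, REQ_UNPLUG = 1u << 1 };

struct pf_sketch {
	atomic_uint pending;  /* request bits set by event handlers */
	bool plugged;         /* touched only by the service task */
};

/* Event context (lock held): only record the request, do no heavy work. */
void request_unplug(struct pf_sketch *pf)
{
	/* Drop any pending plug so it cannot undo this unplug later. */
	atomic_fetch_and(&pf->pending, ~(unsigned)REQ_PLUG);
	atomic_fetch_or(&pf->pending, REQ_UNPLUG);
}

void request_plug(struct pf_sketch *pf)
{
	atomic_fetch_or(&pf->pending, REQ_PLUG);
}

/* Service task (no lock held): handle unplug first, then plug. */
void service_task(struct pf_sketch *pf)
{
	assert(pf);
	/* atomic_fetch_and returns the old value, like test_and_clear_bit. */
	if (atomic_fetch_and(&pf->pending, ~(unsigned)REQ_UNPLUG) & REQ_UNPLUG)
		pf->plugged = false;
	if (atomic_fetch_and(&pf->pending, ~(unsigned)REQ_PLUG) & REQ_PLUG)
		pf->plugged = true;
}
```

The event handler never blocks on work that needs the lock it is called under; it only flips bits, and the worker performs the actual transition later.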
Cc: stable(a)vger.kernel.org # 6.1.x
Reported-by: Jaroslav Pulchart <jaroslav.pulchart(a)gooddata.com>
Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi…
Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
Signed-off-by: Dave Ertman <david.m.ertman(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
v2:
(Removed from original pull request)
- Reversed order of bit processing in ice_service_task for PLUG/UNPLUG
v1: https://lore.kernel.org/netdev/20230131213703.1347761-2-anthony.l.nguyen@in…
drivers/net/ethernet/intel/ice/ice.h | 14 +++++---------
drivers/net/ethernet/intel/ice/ice_main.c | 19 ++++++++-----------
2 files changed, 13 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 713069f809ec..3cad5e6b2ad1 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -506,6 +506,7 @@ enum ice_pf_flags {
ICE_FLAG_VF_VLAN_PRUNING,
ICE_FLAG_LINK_LENIENT_MODE_ENA,
ICE_FLAG_PLUG_AUX_DEV,
+ ICE_FLAG_UNPLUG_AUX_DEV,
ICE_FLAG_MTU_CHANGED,
ICE_FLAG_GNSS, /* GNSS successfully initialized */
ICE_PF_FLAGS_NBITS /* must be last */
@@ -950,16 +951,11 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
*/
static inline void ice_clear_rdma_cap(struct ice_pf *pf)
{
- /* We can directly unplug aux device here only if the flag bit
- * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
- * could race with ice_plug_aux_dev() called from
- * ice_service_task(). In this case we only clear that bit now and
- * aux device will be unplugged later once ice_plug_aux_device()
- * called from ice_service_task() finishes (see ice_service_task()).
+ /* defer unplug to service task to avoid RTNL lock and
+ * clear PLUG bit so that pending plugs don't interfere
*/
- if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
- ice_unplug_aux_dev(pf);
-
+ clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags);
+ set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
}
#endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8ec24f6cf6be..10d1c5b10d2a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2316,18 +2316,15 @@ static void ice_service_task(struct work_struct *work)
}
}
- if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
- /* Plug aux device per request */
- ice_plug_aux_dev(pf);
+ /* unplug aux dev per request, if an unplug request came in
+ * while processing a plug request, this will handle it
+ */
+ if (test_and_clear_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags))
+ ice_unplug_aux_dev(pf);
- /* Mark plugging as done but check whether unplug was
- * requested during ice_plug_aux_dev() call
- * (e.g. from ice_clear_rdma_cap()) and if so then
- * plug aux device.
- */
- if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
- ice_unplug_aux_dev(pf);
- }
+ /* Plug aux device per request */
+ if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+ ice_plug_aux_dev(pf);
if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
struct iidc_event *event;
--
2.38.1
On a heavily loaded machine there can be lock contention on the
global buffers lock. Add a percpu list on which to cache buffers
when lock contention is encountered.
When allocating buffers attempt to use cached buffers first,
before taking the global buffers lock. When freeing buffers
try to put them back to the global list but if contention is
encountered, put the buffer on the percpu list.
The length of time a buffer is held on the percpu list is dynamically
adjusted based on lock contention. The hold time is increased
rapidly and ramped down slowly.
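The ramp behaviour can be reproduced in isolation. The sketch below copies the arithmetic of update_contention() and the contention decrement from the diff; struct cache_sketch, on_contention() and on_uncontended() are hypothetical names, and the separate decrement of hold that happens when a cached buffer is handed out is left out.

```c
#include <assert.h>

struct cache_sketch {
	unsigned int contention;  /* shift amount, capped at 9 */
	unsigned int hold;        /* buffers to keep on the local list */
};

/* Contended path: the shift grows by 3 per event, so hold ramps up fast. */
void on_contention(struct cache_sketch *c)
{
	assert(c);
	c->contention += 3;
	if (c->contention > 9)
		c->contention = 9;
	c->hold += 1u << c->contention;  /* adds 8, then 64, then 512 */
}

/* Uncontended path: the shift decays by only 1 per successful trylock. */
void on_uncontended(struct cache_sketch *c)
{
	assert(c);
	if (c->contention)
		c->contention--;
}
```

Starting from zero, three contended events in a row leave hold at 8 + 64 + 512 = 584, while a single uncontended event only lowers contention from 9 to 8.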
Fixes: df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
Link: https://lore.kernel.org/lkml/cfd5cc6f-5943-2e06-1dbe-f4b4ad5c1fa1@canonical…
Signed-off-by: John Johansen <john.johansen(a)canonical.com>
Reported-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Anil Altinay <aaltinay(a)google.com>
Cc: stable(a)vger.kernel.org
---
security/apparmor/lsm.c | 73 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 68 insertions(+), 5 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index c6728a629437..56b22e2def4c 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -49,12 +49,19 @@ union aa_buffer {
char buffer[1];
};
+struct aa_local_cache {
+ unsigned int contention;
+ unsigned int hold;
+ struct list_head head;
+};
+
#define RESERVE_COUNT 2
static int reserve_count = RESERVE_COUNT;
static int buffer_count;
static LIST_HEAD(aa_global_buffers);
static DEFINE_SPINLOCK(aa_buffers_lock);
+static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers);
/*
* LSM hook functions
@@ -1634,14 +1641,43 @@ static int param_set_mode(const char *val, const struct kernel_param *kp)
return 0;
}
+static void update_contention(struct aa_local_cache *cache)
+{
+ cache->contention += 3;
+ if (cache->contention > 9)
+ cache->contention = 9;
+ cache->hold += 1 << cache->contention; /* 8, 64, 512 */
+}
+
char *aa_get_buffer(bool in_atomic)
{
union aa_buffer *aa_buf;
+ struct aa_local_cache *cache;
bool try_again = true;
gfp_t flags = (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+ /* use per cpu cached buffers first */
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (!list_empty(&cache->head)) {
+ aa_buf = list_first_entry(&cache->head, union aa_buffer, list);
+ list_del(&aa_buf->list);
+ cache->hold--;
+ put_cpu_ptr(&aa_local_buffers);
+ return &aa_buf->buffer[0];
+ }
+ put_cpu_ptr(&aa_local_buffers);
+ if (!spin_trylock(&aa_buffers_lock)) {
+ cache = get_cpu_ptr(&aa_local_buffers);
+ update_contention(cache);
+ put_cpu_ptr(&aa_local_buffers);
+ spin_lock(&aa_buffers_lock);
+ } else {
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (cache->contention)
+ cache->contention--;
+ put_cpu_ptr(&aa_local_buffers);
+ }
retry:
- spin_lock(&aa_buffers_lock);
if (buffer_count > reserve_count ||
(in_atomic && !list_empty(&aa_global_buffers))) {
aa_buf = list_first_entry(&aa_global_buffers, union aa_buffer,
@@ -1667,6 +1703,7 @@ char *aa_get_buffer(bool in_atomic)
if (!aa_buf) {
if (try_again) {
try_again = false;
+ spin_lock(&aa_buffers_lock);
goto retry;
}
pr_warn_once("AppArmor: Failed to allocate a memory buffer.\n");
@@ -1678,15 +1715,32 @@ char *aa_get_buffer(bool in_atomic)
void aa_put_buffer(char *buf)
{
union aa_buffer *aa_buf;
+ struct aa_local_cache *cache;
if (!buf)
return;
aa_buf = container_of(buf, union aa_buffer, buffer[0]);
- spin_lock(&aa_buffers_lock);
- list_add(&aa_buf->list, &aa_global_buffers);
- buffer_count++;
- spin_unlock(&aa_buffers_lock);
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (!cache->hold) {
+ put_cpu_ptr(&aa_local_buffers);
+ if (spin_trylock(&aa_buffers_lock)) {
+ list_add(&aa_buf->list, &aa_global_buffers);
+ buffer_count++;
+ spin_unlock(&aa_buffers_lock);
+ cache = get_cpu_ptr(&aa_local_buffers);
+ if (cache->contention)
+ cache->contention--;
+ put_cpu_ptr(&aa_local_buffers);
+ return;
+ }
+ cache = get_cpu_ptr(&aa_local_buffers);
+ update_contention(cache);
+ }
+
+ /* cache in percpu list */
+ list_add(&aa_buf->list, &cache->head);
+ put_cpu_ptr(&aa_local_buffers);
}
/*
@@ -1728,6 +1782,15 @@ static int __init alloc_buffers(void)
union aa_buffer *aa_buf;
int i, num;
+ /*
+ * per cpu set of cached allocated buffers used to help reduce
+ * lock contention
+ */
+ for_each_possible_cpu(i) {
+ per_cpu(aa_local_buffers, i).contention = 0;
+ per_cpu(aa_local_buffers, i).hold = 0;
+ INIT_LIST_HEAD(&per_cpu(aa_local_buffers, i).head);
+ }
/*
* A function may require two buffers at once. Usually the buffers are
* used for a short period of time and are shared. On UP kernel buffers
--
2.39.2.637.g21b0678d19-goog
The patch titled
Subject: nilfs2: fix underflow in second superblock position calculations
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-underflow-in-second-superblock-position-calculations.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix underflow in second superblock position calculations
Date: Wed, 15 Feb 2023 07:40:43 +0900
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
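The wraparound can be reproduced with unsigned 64-bit arithmetic. The sketch assumes the macro has the form ((devsize >> 12) - 1) << 12, which is consistent with the sector number in the log above; sb2_offset_unchecked() and sb2_offset_checked() are hypothetical names, and the checked variant only illustrates the kind of guard the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed form of NILFS_SB2_OFFSET_BYTES: start of the last full 4 KiB block. */
uint64_t sb2_offset_unchecked(uint64_t devsize)
{
	return ((devsize >> 12) - 1) << 12;  /* wraps when devsize < 4096 */
}

/* Checked variant: returns 0 on success, -1 if the device is too small. */
int sb2_offset_checked(uint64_t devsize, uint64_t *off)
{
	assert(off);
	if (devsize < 4096)
		return -1;
	*off = sb2_offset_unchecked(devsize);
	return 0;
}
```

For a 1024-byte device the unchecked result is 0xFFFFFFFFFFFFF000, and dividing that by the 512-byte sector size gives 36028797018963960, the sector reported in the I/O error.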
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/nilfs2/ioctl.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(s
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
--- a/fs/nilfs2/super.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/super.c
@@ -409,6 +409,15 @@ int nilfs_resize_fs(struct super_block *
goto out;
/*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
+ /*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
* and so forth.
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-underflow-in-second-superblock-position-calculations.patch
The patch titled
Subject: hugetlb: check for undefined shift on 32 bit architectures
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-check-for-undefined-shift-on-32-bit-architectures.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: check for undefined shift on 32 bit architectures
Date: Wed, 15 Feb 2023 17:35:42 -0800
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift cannot be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match a hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
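A standalone sketch of the guarded conversion follows. The field position (bit 26) and 6-bit mask are assumptions chosen to mirror the flag encoding described above, and page_size_log_from_flags() and size_from_log() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_HUGE_SHIFT 26     /* assumed position of the size field */
#define SKETCH_HUGE_MASK  0x3fu  /* 6 bits, so values 0..63 */

/* Extract the encoded log2(page size) from a flags word. */
unsigned int page_size_log_from_flags(unsigned int flags)
{
	return (flags >> SKETCH_HUGE_SHIFT) & SKETCH_HUGE_MASK;
}

/*
 * Convert log2 to a size only when the shift is defined for unsigned long.
 * Returns false instead of shifting by >= the width of the type.
 */
bool size_from_log(unsigned int log, unsigned long *size)
{
	assert(size);
	if (log >= sizeof(unsigned long) * CHAR_BIT)
		return false;
	*size = 1UL << log;
	return true;
}
```

A log2 value of 21 yields 2097152 (2 MiB) on any platform, while a value at or above the width of unsigned long is rejected before the shift is evaluated.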
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4M…
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76(a)gmail.com>
Acked-by: Muchun Song <songmuchun(a)bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Cc: Anders Roxell <anders.roxell(a)linaro.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/include/linux/hugetlb.h~hugetlb-check-for-undefined-shift-on-32-bit-architectures
+++ a/include/linux/hugetlb.h
@@ -743,7 +743,10 @@ static inline struct hstate *hstate_size
if (!page_size_log)
return &default_hstate;
- return size_to_hstate(1UL << page_size_log);
+ if (page_size_log < BITS_PER_LONG)
+ return size_to_hstate(1UL << page_size_log);
+
+ return NULL;
}
static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-check-for-undefined-shift-on-32-bit-architectures.patch
The patch titled
Subject: mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
Date: Thu, 16 Feb 2023 10:30:59 -0500
Nick Bowler reported another sparc64 breakage after the young/dirty
persistent work for page migration (per "Link:" below). That's after a
similar report [1].
It turns out page migration was overlooked, and it wasn't failing before
because page migration was not enabled in the initial report test
environment.
David proposed another way [2] to fix this from the sparc64 side, but that
patch didn't land somehow. Nor did I check whether any other arch has
similar issues.
Let's fix it for now simply by moving the write bit handling to be
after dirty, like what we did before.
Note: this is based on mm-unstable, because the breakage has existed since
6.1 and we're at a very late stage of 6.2 (-rc8), so I assume for this
specific case we should target this at 6.3.
[1] https://lore.kernel.org/all/20221021160603.GA23307@u164.east.ru/
[2] https://lore.kernel.org/all/20221212130213.136267-1-david@redhat.com/
Link: https://lkml.kernel.org/r/20230216153059.256739-1-peterx@redhat.com
Fixes: 2e3468778dbe ("mm: remember young/dirty bit for page migrations")
Link: https://lore.kernel.org/all/CADyTPExpEqaJiMGoV+Z6xVgL50ZoMJg49B10LcZ=8eg19u…
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Nick Bowler <nbowler(a)draconx.ca>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Nick Bowler <nbowler(a)draconx.ca>
Cc: <regressions(a)lists.linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/huge_memory.c~mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64
+++ a/mm/huge_memory.c
@@ -3272,8 +3272,6 @@ void remove_migration_pmd(struct page_vm
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
- if (is_writable_migration_entry(entry))
- pmde = maybe_pmd_mkwrite(pmde, vma);
if (pmd_swp_uffd_wp(*pvmw->pmd))
pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
if (!is_migration_entry_young(entry))
@@ -3281,6 +3279,10 @@ void remove_migration_pmd(struct page_vm
/* NOTE: this may contain setting soft-dirty on some archs */
if (PageDirty(new) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);
+ if (is_writable_migration_entry(entry))
+ pmde = maybe_pmd_mkwrite(pmde, vma);
+ else
+ pmde = pmd_wrprotect(pmde);
if (PageAnon(new)) {
rmap_t rmap_flags = RMAP_COMPOUND;
--- a/mm/migrate.c~mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64
+++ a/mm/migrate.c
@@ -224,6 +224,8 @@ static bool remove_migration_pte(struct
pte = maybe_mkwrite(pte, vma);
else if (pte_swp_uffd_wp(*pvmw.pte))
pte = pte_mkuffd_wp(pte);
+ else
+ pte = pte_wrprotect(pte);
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64.patch
mm-uffd-fix-comment-in-handling-pte-markers.patch
This patch introduces a new per-lkb field for internal flags which are
never sent over the wire. The current lkb internal flags, stored as
lkb->lkb_flags, are split into upper and lower bits; the lower bits are
used to share internal flags over the wire with the cluster-wide lkb
copies on other nodes.
In commit 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
we introduced a new internal flag for pending callbacks for the dlm
callback queue. This flag is protected by the lkb->lkb_cb_lock lock.
That commit overlooked that the dlm receive path, because of the upper
and lower bits mentioned above, reads the flags, masks them and writes
them back; see for example receive_flags() in fs/dlm/lock.c. This flag
manipulation is not done atomically and is not protected by
lkb->lkb_cb_lock. This has unknown side effects on the current callback
handling.
In the future we should move to set/clear/test bit functionality and
avoid reading, masking and writing back flag values. In later patches we
will move the upper bits to the newly introduced internal flag field,
which is not shared with other cluster nodes, to avoid similar issues.
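As a rough userspace illustration of the set/clear/test bit style
described above, the sketch below uses C11 atomics. The struct and helper
names (lkb_model, model_test_and_set_bit, model_clear_bit) are
hypothetical stand-ins, not the kernel bitops themselves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lkb: one word of node-local flag bits. */
struct lkb_model {
	atomic_ulong insflags;
};

#define CB_PENDING_BIT 0	/* bit number, not a mask */

/* Atomically set the bit and report whether it was already set. */
static bool model_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Atomically clear the bit; other bits in the word are left untouched. */
static void model_clear_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}

/* Returns true when the caller is the one that must schedule the work. */
static bool enqueue_needs_sched(struct lkb_model *lkb)
{
	return !model_test_and_set_bit(CB_PENDING_BIT, &lkb->insflags);
}
```

Because each operation is a single atomic read-modify-write on one bit, a
concurrent read, mask and write-back of a different word cannot lose the
pending bit.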
Cc: stable(a)vger.kernel.org
Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks")
Reported-by: Bob Peterson <rpeterso(a)redhat.com>
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/ast.c | 9 ++++-----
fs/dlm/dlm_internal.h | 7 ++++++-
fs/dlm/user.c | 2 +-
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 26fef9945cc9..7daffdd99f99 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -45,7 +45,7 @@ void dlm_purge_lkb_callbacks(struct dlm_lkb *lkb)
kref_put(&cb->ref, dlm_release_callback);
}
- lkb->lkb_flags &= ~DLM_IFL_CB_PENDING;
+ clear_bit(DLM_IFLNS_CB_PENDING, &lkb->lkb_insflags);
/* invalidate */
dlm_callback_set_last_ptr(&lkb->lkb_last_cast, NULL);
@@ -103,10 +103,9 @@ int dlm_enqueue_lkb_callback(struct dlm_lkb *lkb, uint32_t flags, int mode,
cb->sb_status = status;
cb->sb_flags = (sbflags & 0x000000FF);
kref_init(&cb->ref);
- if (!(lkb->lkb_flags & DLM_IFL_CB_PENDING)) {
- lkb->lkb_flags |= DLM_IFL_CB_PENDING;
+ if (!test_and_set_bit(DLM_IFLNS_CB_PENDING, &lkb->lkb_insflags))
rv = DLM_ENQUEUE_CALLBACK_NEED_SCHED;
- }
+
list_add_tail(&cb->list, &lkb->lkb_callbacks);
if (flags & DLM_CB_CAST)
@@ -209,7 +208,7 @@ void dlm_callback_work(struct work_struct *work)
spin_lock(&lkb->lkb_cb_lock);
rv = dlm_dequeue_lkb_callback(lkb, &cb);
if (rv == DLM_DEQUEUE_CALLBACK_EMPTY) {
- lkb->lkb_flags &= ~DLM_IFL_CB_PENDING;
+ clear_bit(DLM_IFLNS_CB_PENDING, &lkb->lkb_insflags);
spin_unlock(&lkb->lkb_cb_lock);
break;
}
diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h
index ab1a55337a6e..b967b4d7d55d 100644
--- a/fs/dlm/dlm_internal.h
+++ b/fs/dlm/dlm_internal.h
@@ -211,7 +211,11 @@ struct dlm_args {
#endif
#define DLM_IFL_DEADLOCK_CANCEL 0x01000000
#define DLM_IFL_STUB_MS 0x02000000 /* magic number for m_flags */
-#define DLM_IFL_CB_PENDING 0x04000000
+
+/* lkb_insflags */
+
+#define DLM_IFLNS_CB_PENDING 0
+
/* least significant 2 bytes are message changed, they are full transmitted
* but at receive side only the 2 bytes LSB will be set.
*
@@ -246,6 +250,7 @@ struct dlm_lkb {
uint32_t lkb_exflags; /* external flags from caller */
uint32_t lkb_sbflags; /* lksb flags */
uint32_t lkb_flags; /* internal flags */
+ unsigned long lkb_insflags; /* internal non shared flags */
uint32_t lkb_lvbseq; /* lvb sequence number */
int8_t lkb_status; /* granted, waiting, convert */
diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index 35129505ddda..98488a1b702d 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -884,7 +884,7 @@ static ssize_t device_read(struct file *file, char __user *buf, size_t count,
goto try_another;
case DLM_DEQUEUE_CALLBACK_LAST:
list_del_init(&lkb->lkb_cb_list);
- lkb->lkb_flags &= ~DLM_IFL_CB_PENDING;
+ clear_bit(DLM_IFLNS_CB_PENDING, &lkb->lkb_insflags);
break;
case DLM_DEQUEUE_CALLBACK_SUCCESS:
break;
--
2.31.1
radix_tree_preload() warns on attempting to call it with an allocation
mask that doesn't allow blocking. While that warning could arguably
be removed, we need to handle radix insertion failures anyway as they
are more likely if we cannot block to get memory.
Remove legacy BUG_ON()'s and turn them into proper errors instead, one
for the allocation failure and one for finding a page that doesn't
match the correct index.
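The error mapping can be sketched outside the kernel as follows. This is
a simplified model only: page_model and insert_failure_to_errno are
hypothetical names, and the function just classifies what a lookup finds
after an insertion has failed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a page that records its own index. */
struct page_model {
	unsigned long index;
};

/*
 * Classify the state found after an insertion failed and the slot was
 * looked up again. `found` is whatever the lookup returned (may be NULL).
 */
static int insert_failure_to_errno(const struct page_model *found,
				   unsigned long idx)
{
	if (!found)
		return -ENOMEM;	/* nothing there: the insert ran out of memory */
	if (found->index != idx)
		return -EIO;	/* wrong page in the slot: report, do not crash */

	return 0;		/* someone else inserted the right page first */
}
```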
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/brd.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 1ddada0cdaca..6019ef23344f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -84,6 +84,7 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
{
pgoff_t idx;
struct page *page;
+ int ret = 0;
page = brd_lookup_page(brd, sector);
if (page)
@@ -93,7 +94,7 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
if (!page)
return -ENOMEM;
- if (radix_tree_preload(gfp)) {
+ if (gfpflags_allow_blocking(gfp) && radix_tree_preload(gfp)) {
__free_page(page);
return -ENOMEM;
}
@@ -104,8 +105,10 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
if (radix_tree_insert(&brd->brd_pages, idx, page)) {
__free_page(page);
page = radix_tree_lookup(&brd->brd_pages, idx);
- BUG_ON(!page);
- BUG_ON(page->index != idx);
+ if (!page)
+ ret = -ENOMEM;
+ else if (page->index != idx)
+ ret = -EIO;
} else {
brd->brd_nr_pages++;
}
--
2.39.1
From: Mario Limonciello <mario.limonciello(a)amd.com>
BugLink: https://bugs.launchpad.net/bugs/2007581
Commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that led to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irq.initialized is set.
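The effect of the ordering can be shown with a small userspace model. It
is a sketch under assumptions: chip_model and the model_* helpers are
hypothetical, and the interrupt walk simply does not retry when a lookup
defers, which is the behavior described above.

```c
#include <stdbool.h>

#define MODEL_EPROBE_DEFER 517	/* matches the "err -517" in the log above */

/* Hypothetical, heavily simplified GPIO chip state. */
struct chip_model {
	bool initialized;
	int requested;	/* number of event interrupts successfully set up */
};

/* Lookups defer until the chip is marked initialized. */
static int model_to_irq(const struct chip_model *chip, int pin)
{
	return chip->initialized ? 100 + pin : -MODEL_EPROBE_DEFER;
}

/* Stand-in for the _AEI walk: a failed lookup is not retried. */
static void model_request_interrupts(struct chip_model *chip, int npins)
{
	for (int pin = 0; pin < npins; pin++)
		if (model_to_irq(chip, pin) >= 0)
			chip->requested++;
}

/* Old order: interrupts are requested before the flag is set. */
static int add_irqchip_old_order(struct chip_model *chip, int npins)
{
	model_request_interrupts(chip, npins);	/* too early: all lookups defer */
	chip->initialized = true;
	return chip->requested;
}

/* New order: the flag is set first, so the lookups succeed. */
static int add_irqchip_new_order(struct chip_model *chip, int npins)
{
	chip->initialized = true;
	model_request_interrupts(chip, npins);
	return chip->requested;
}
```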
Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@B…
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Tested-By: Samuel Čavoj <samuel(a)cavoj.net>
Tested-By: lukeluk498(a)gmail.com
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Reviewed-and-tested-by: Takashi Iwai <tiwai(a)suse.de>
Cc: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(backported from commit 06fb4ecfeac7e00d6704fa5ed19299f2fefb3cc9)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index e4d47e00c392..049cdfc975b3 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2329,8 +2329,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip_set_irq_hooks(gpiochip);
- acpi_gpiochip_request_interrupts(gpiochip);
-
/*
* Using barrier() here to prevent compiler from reordering
* gc->irq.initialized before initialization of above
@@ -2340,6 +2338,8 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip->irq.initialized = true;
+ acpi_gpiochip_request_interrupts(gpiochip);
+
return 0;
}
--
2.30.1
From: Shreeya Patel <shreeya.patel(a)collabora.com>
BugLink: https://bugs.launchpad.net/bugs/2007581
GPIO chip irq members are exposed before they could be completely
initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which
was accessed through the I2C interface in gpiochip_to_irq() before
it could be initialized by gpiochip_add_irqchip(). This resulted in
Kernel NULL pointer dereference.
Following are the logs for reference :-
kernel: Call Trace:
kernel: gpiod_to_irq+0x53/0x70
kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel: i2c_acpi_get_irq+0xc0/0xd0
kernel: i2c_device_probe+0x28a/0x2a0
kernel: really_probe+0xf2/0x460
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before
they are completely initialized.
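A minimal userspace sketch of the guard, assuming a simplified model: the
struct layout and the names irq_model and model_to_irq are hypothetical,
and only the error values follow the kernel's numbering.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EPROBE_DEFER 517	/* same numeric value the kernel uses */
#define MODEL_ENXIO 6

/* Hypothetical, simplified stand-ins for the gpio_chip irq members. */
struct irq_model {
	const int *domain;	/* offset -> IRQ table; NULL until set up */
	size_t ngpio;
	bool initialized;	/* set last, after every other member */
};

/* Look up the IRQ for a line, deferring while setup is incomplete. */
static int model_to_irq(const struct irq_model *irq, size_t offset)
{
	if (!irq->initialized)
		return -MODEL_EPROBE_DEFER;	/* caller should retry later */
	if (offset >= irq->ngpio)
		return -MODEL_ENXIO;

	return irq->domain[offset];
}
```

The point of the check is that the NULL domain pointer is never
dereferenced: a caller that arrives too early gets a deferral error
instead.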
Signed-off-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl(a)bgdev.pl>
(backported from commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 19 +++++++++++++++++++
include/linux/gpio/driver.h | 9 +++++++++
2 files changed, 28 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index abdf448b11a3..e4d47e00c392 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2146,6 +2146,16 @@ static int gpiochip_to_irq(struct gpio_chip *chip, unsigned offset)
{
struct irq_domain *domain = chip->irq.domain;
+#ifdef CONFIG_GPIOLIB_IRQCHIP
+ /*
+ * Avoid race condition with other code, which tries to lookup
+ * an IRQ before the irqchip has been properly registered,
+ * i.e. while gpiochip is still being brought up.
+ */
+ if (!chip->irq.initialized)
+ return -EPROBE_DEFER;
+#endif
+
if (!gpiochip_irqchip_irq_valid(chip, offset))
return -ENXIO;
@@ -2321,6 +2331,15 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
acpi_gpiochip_request_interrupts(gpiochip);
+ /*
+ * Using barrier() here to prevent compiler from reordering
+ * gc->irq.initialized before initialization of above
+ * GPIO chip irq members.
+ */
+ barrier();
+
+ gpiochip->irq.initialized = true;
+
return 0;
}
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 5dd9c982e2cb..15418caf76fc 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -201,6 +201,15 @@ struct gpio_irq_chip {
*/
bool threaded;
+ /**
+ * @initialized:
+ *
+ * Flag to track GPIO chip irq member's initialization.
+ * This flag will make sure GPIO chip irq members are not used
+ * before they are initialized.
+ */
+ bool initialized;
+
/**
* @init_hw: optional routine to initialize hardware before
* an IRQ chip will be added. This is quite useful when
--
2.30.1
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the
flags argument to encode the base-2 logarithm of the desired page size.
The routine hstate_sizelog() uses the log2 value to find the
corresponding hugetlb hstate structure. Converting the log2 value
(page_size_log) to potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not
be greater than 63. This is fine on 64 bit architectures where a long
is 64 bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the
undefined shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
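The same check can be sketched in portable userspace C. The helper name
size_from_log2 and the macro BITS_PER_ULONG are hypothetical; returning 0
for an out-of-range value is an assumption of this sketch, standing in
for "no matching page size".

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in an unsigned long on the build target (32 or 64). */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: convert a log2 page size to a byte count.
 * Returns 0 when the shift would be undefined for this word size,
 * so callers can treat 0 as "no such page size".
 */
static unsigned long size_from_log2(unsigned int page_size_log)
{
	if (page_size_log >= BITS_PER_ULONG)
		return 0;	/* shifting by >= width is undefined in C */

	return 1UL << page_size_log;
}
```

For example, a log2 value of 21 yields 2 MiB on both 32 bit and 64 bit
targets, while a value equal to the word size yields 0 rather than an
undefined result.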
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4M…
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
include/linux/hugetlb.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index df6dd624ccfe..8b45720f9475 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -781,7 +781,10 @@ static inline struct hstate *hstate_sizelog(int page_size_log)
if (!page_size_log)
return &default_hstate;
- return size_to_hstate(1UL << page_size_log);
+ if (page_size_log < BITS_PER_LONG)
+ return size_to_hstate(1UL << page_size_log);
+
+ return NULL;
}
static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
--
2.39.1
By default, non-mq drivers do not support nowait. This causes io_uring
to use a slower path as the driver cannot be trusted not to block. brd
can safely set the nowait flag, as worst case all it does is a NOIO
allocation.
For io_uring, this makes a substantial difference. Before:
submitter=0, tid=453, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31
IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32
IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31
IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32
IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31
and after:
submitter=0, tid=354, file=/dev/ram0, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31
IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32
IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31
IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31
or about an 8x difference. Now that brd is prepared to deal with
REQ_NOWAIT reads/writes, mark it as supporting that.
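As a quick check of the ratio, the sketch below averages the IOPS samples
from the two runs above (converted to thousands) and divides the means.
The function names are illustrative only.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* IOPS samples copied from the two runs above, in thousands. */
static const double before_kiops[] = { 440.03, 428.96, 442.59, 419.65, 426.82 };
static const double after_kiops[]  = { 3370.0, 3450.0, 3430.0, 3430.0, 3430.0 };

/* Ratio of the mean "after" rate to the mean "before" rate. */
static double speedup(void)
{
	return mean(after_kiops, 5) / mean(before_kiops, 5);
}
```

The means are about 431.6K and 3.42M IOPS, so the ratio comes out at
roughly 7.9.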
Cc: stable(a)vger.kernel.org # 5.10+
Link: https://lore.kernel.org/linux-block/20230203103005.31290-1-p.raghav@samsung…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/brd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 6019ef23344f..522530a6ebca 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -418,6 +418,7 @@ static int brd_alloc(int i)
/* Tell the block layer that this is not a rotational device */
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, disk->queue);
+ blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
err = add_disk(disk);
if (err)
goto out_cleanup_disk;
--
2.39.1
by Linux kernel regression tracking (Thorsten Leemhuis)
[adding Chih-Kang Chang (author), Kalle (committer) and LKML to the list
of recipients]
[anyone who replies to this: feel free to remove stable(a)vger.kernel.org
from the recipients, this is a mainline regression]
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]
On 09.02.23 20:59, Paul Gover wrote:
> Suspend/Resume was working OK on kernel 6.0.13, broken since 6.1.1
> (I've not tried kernels between those, except in the bisect below.)
> All subsequent 6.1 kernels exhibit the same behaviour.
>
> Suspend works OK, but on Resume, there's a flicker, and then it reboots.
> Sometimes the screen gets restored to its contents at the time of suspend, but
> less than a second later, it starts rebooting.
> To reproduce, simply boot, suspend, and resume.
>
> Git bisect blames RTW88
> commit 6bf3a083407b5d404d70efc3a5ac75b472e5efa9
TWIMC, that's "wifi: rtw88: add flag check before enter or leave IPS"
> I'll attach bisect log, dmesg and configs to the bug I've opened
> https://bugzilla.kernel.org/show_bug.cgi?id=217016
>
> dmesg from the following boot show a hardware error.
> It's not there when the system resumes or reboots with 6.0.13,
> and if I don't suspend & resume, there are no reported errors.
>
> The problem occurs under both Wayland and X11, and from the command line via
> echo mem > /sys/power/state
>
>
> Vanilla kernels, untainted, compiled with GCC; my system is Gentoo FWIW, but I
> do my own kernels direct from a git clone of stable.
>
> Couldn't find anything similar with Google or the mailing lists.
>
> **Hardware:**
>
> HP Laptop 15-bw0xx
> AMD A9-9420 RADEON R5, 5 COMPUTE CORES
> Stoney [Radeon R2/R3/R4/R5 Graphics]
> 4 GB memory
> RTL8723DE PCIe adapter
>
> **Kernel**
>
> Kernel command line:
> psmouse.synaptics_intertouch=1 pcie_aspm=force rdrand=force rootfstype=f2fs
> root=LABEL=gentoo
>
> CONFIG_RTW88=m
> CONFIG_RTW88_CORE=m
> CONFIG_RTW88_PCI=m
> CONFIG_RTW88_8723D=m
> # CONFIG_RTW88_8822BE is not set
> # CONFIG_RTW88_8822CE is not set
> CONFIG_RTW88_8723DE=m
> # CONFIG_RTW88_8821CE is not set
> # CONFIG_RTW88_DEBUG is not set
> # CONFIG_RTW88_DEBUGFS is not set
> # CONFIG_RTW89 is not set
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced 6bf3a083407b
#regzbot title wifi: rtw88: resume broken (reboot)
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Fixed optkprobe issues, mainly related to the x86 architecture.
Yang Jihong (2):
x86/kprobes: Fix __recover_optprobed_insn check optimizing logic
x86/kprobes: Fix arch_check_optimized_kprobe check within
optimized_kprobe range
arch/x86/kernel/kprobes/opt.c | 6 +++---
include/linux/kprobes.h | 2 ++
kernel/kprobes.c | 4 ++--
3 files changed, 7 insertions(+), 5 deletions(-)
--
Changes since v1:
- Remove patch1 since there is already a fix patch.
- Add "cc stable" and modify comment for patch2.
- Use "kprobe_disarmed" instead of "kprobe_disabled" for patch3.
- Add fix commit and "cc stable" for patch3.
2.30.GIT
From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that ring buffers should never be mapped
via the BAR on systems with LLC. There are too many caching pitfalls
due to the way BAR accesses are routed. So it is safest to just not
use it.
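The resulting decision can be written as a single predicate. This is only
a sketch: the enum and the function name choose_ring_map are hypothetical,
and the two booleans stand in for the driver's own checks.

```c
#include <stdbool.h>

/* Which kind of mapping the ring buffer gets. */
enum ring_map { RING_MAP_IOMAP, RING_MAP_CPU };

/* Pick the mapping; pin and unpin must agree on this answer. */
static enum ring_map choose_ring_map(bool map_and_fenceable, bool has_llc)
{
	if (map_and_fenceable && !has_llc)
		return RING_MAP_IOMAP;	/* mapped through the BAR */
	return RING_MAP_CPU;		/* direct CPU mapping of the object */
}
```

Using the same predicate on both the pin and unpin paths matters, because
a mapping taken one way has to be released the same way.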
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: 9d80841ea4c9 ("drm/i915: Allow ringbuffers to be bound anywhere")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index fb1d2595392ed..8675ec8ead353 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -53,7 +53,7 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (unlikely(ret))
goto err_unpin;
- if (i915_vma_is_map_and_fenceable(vma)) {
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
addr = (void __force *)i915_vma_pin_iomap(vma);
} else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
@@ -98,7 +98,7 @@ void intel_ring_unpin(struct intel_ring *ring)
return;
i915_vma_unset_ggtt_write(vma);
- if (i915_vma_is_map_and_fenceable(vma))
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915))
i915_vma_unpin_iomap(vma);
else
i915_gem_object_unpin_map(vma->obj);
--
2.39.1
When f2fs skipped a gc round during victim migration, there was a bug which
would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem
was not initialized again for each round. Fix the bug by initializing
skipped_gc_rwsem inside the gc loop.
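The stale-flag effect can be reproduced with a toy loop. This is a sketch
under assumptions: count_skipped_rounds and its parameters are
hypothetical, and the local skipped_flag only stands in for
sbi->skipped_gc_rwsem.

```c
#include <stdbool.h>

/*
 * Count how many rounds are treated as "skipped". contended[i] says whether
 * round i really hit contention. With reset_each_round false, the flag is
 * cleared only once before the loop, as in the pre-fix code.
 */
static int count_skipped_rounds(const bool *contended, int rounds,
				bool reset_each_round)
{
	int skipped_flag = 0;	/* stand-in for sbi->skipped_gc_rwsem */
	int skipped_rounds = 0;

	for (int i = 0; i < rounds; i++) {
		if (reset_each_round)
			skipped_flag = 0;
		if (contended[i])
			skipped_flag++;
		if (skipped_flag)
			skipped_rounds++;
	}
	return skipped_rounds;
}
```

With contention only in the first of four rounds, the model counts all
four rounds as skipped without the per-round reset and only one with it.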
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Signed-off-by: Yonggil Song <yonggil.song(a)samsung.com>
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index b22f49a6f128..81d326abaac1 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1786,8 +1786,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
prefree_segments(sbi));
cpc.reason = __get_cp_reason(sbi);
- sbi->skipped_gc_rwsem = 0;
gc_more:
+ sbi->skipped_gc_rwsem = 0;
if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
ret = -EINVAL;
goto stop;
--
2.34.1
From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that stolen memory should never be used for
ring buffer allocations on platforms with LLC. There are too many
caching pitfalls due to the way stolen memory accesses are routed. So
it is safest to just not use it.
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..fb1d2595392ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,7 +116,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
I915_BO_ALLOC_PM_VOLATILE);
- if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
+ if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt) && !HAS_LLC(i915))
obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(obj))
obj = i915_gem_object_create_internal(i915, size);
--
2.39.1
Hi Greg, Sasha,
Here are two MPTCP patches backports (patches 2-3/4), and one
prerequisite (patch 1/4), that recently failed to apply to the 6.1
stable tree. They prevent some locking issues with MPTCP.
After having cherry-picked patch 1/4 -- a simple refactoring to make a
function more generic -- patch 2/4 applied without any issue.
For patch 3/4, I had to resolve two simple conflicts because two
if-statements around the modified code have curly braces in v6.1, not
later, see commit 976d302fb616 ("mptcp: deduplicate error paths on
endpoint creation").
On top of that, patch 4/4 fixes MPTCP userspace PM selftest that has
been recently broken due to a backport done in v6.1.8.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (2):
mptcp: sockopt: make 'tcp_fastopen_connect' generic
selftests: mptcp: userspace: fix v4-v6 test in v6.1
Paolo Abeni (2):
mptcp: fix locking for setsockopt corner-case
mptcp: fix locking for in-kernel listener creation
net/mptcp/pm_netlink.c | 10 ++++++----
net/mptcp/sockopt.c | 20 ++++++++++++++------
net/mptcp/subflow.c | 2 +-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 11 +++++++++++
4 files changed, 32 insertions(+), 11 deletions(-)
---
base-commit: 9012d1ebd3236e1d741ab4264f1d14e276c2e29f
change-id: 20230214-upstream-stable-20230214-linux-6-1-12-rc1-mptcp-fixes-df24a5f41151
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Hello,
Here is the next batch of backports for 5.15.y. These patches have
already been ACK'd on the xfs mailing list. Testing included
25 runs of auto group on 12 xfs configs. No regressions were seen.
I checked xfs/538 was run without issue as this test was mentioned
in 56486f307100. Also, from 86d40f1e49e9, I ran xfs/117 with
XFS compiled as a module and TEST_FS_MODULE_RELOAD set, but I was
unable to reproduce the issue.
Below I've outlined which series the backports came from:
series "xfs: intent whiteouts" (1):
[01/10] cb512c921639613ce03f87e62c5e93ed9fe8c84d
xfs: zero inode fork buffer at allocation
[02/10] c230a4a85bcdbfc1a7415deec6caf04e8fca1301
xfs: fix potential log item leak
series "xfs: fix random format verification issues" (2):
[1/4] dc04db2aa7c9307e740d6d0e173085301c173b1a
xfs: detect self referencing btree sibling pointers
[2/4] 1eb70f54c445fcbb25817841e774adb3d912f3e8 -> already in 5.15.y
xfs: validate inode fork size against fork format
[3/4] dd0d2f9755191690541b09e6385d0f8cd8bc9d8f
xfs: set XFS_FEAT_NLINK correctly
[4/4] f0f5f658065a5af09126ec892e4c383540a1c77f
xfs: validate v5 feature fields
series "xfs: small fixes for 5.19 cycle" (3):
[1/3] 5672225e8f2a872a22b0cecedba7a6644af1fb84
xfs: avoid unnecessary runtime sibling pointer endian conversions
[2/3] 5b55cbc2d72632e874e50d2e36bce608e55aaaea
xfs: don't assert fail on perag references on teardown
[3/3] 56486f307100e8fc66efa2ebd8a71941fa10bf6f
xfs: assert in xfs_btree_del_cursor should take into account error
series "xfs: random fixes for 5.19" (4):
[1/2] 86d40f1e49e9a909d25c35ba01bea80dbcd758cb
xfs: purge dquots after inode walk fails during quotacheck
[2/2] a54f78def73d847cb060b18c4e4a3d1d26c9ca6d
xfs: don't leak btree cursor when insrec fails after a split
(1) https://lore.kernel.org/all/20220503221728.185449-1-david@fromorbit.com/
(2) https://lore.kernel.org/all/20220502082018.1076561-1-david@fromorbit.com/
(3) https://lore.kernel.org/all/20220524022158.1849458-1-david@fromorbit.com/
(4) https://lore.kernel.org/all/165337056527.993079.1232300816023906959.stgit@m…
- Leah
Darrick J. Wong (2):
xfs: purge dquots after inode walk fails during quotacheck
xfs: don't leak btree cursor when insrec fails after a split
Dave Chinner (8):
xfs: zero inode fork buffer at allocation
xfs: fix potential log item leak
xfs: detect self referencing btree sibling pointers
xfs: set XFS_FEAT_NLINK correctly
xfs: validate v5 feature fields
xfs: avoid unnecessary runtime sibling pointer endian conversions
xfs: don't assert fail on perag references on teardown
xfs: assert in xfs_btree_del_cursor should take into account error
fs/xfs/libxfs/xfs_ag.c | 3 +-
fs/xfs/libxfs/xfs_btree.c | 175 +++++++++++++++++++++++++--------
fs/xfs/libxfs/xfs_inode_fork.c | 12 ++-
fs/xfs/libxfs/xfs_sb.c | 70 +++++++++++--
fs/xfs/xfs_bmap_item.c | 2 +
fs/xfs/xfs_icreate_item.c | 1 +
fs/xfs/xfs_qm.c | 9 +-
fs/xfs/xfs_refcount_item.c | 2 +
fs/xfs/xfs_rmap_item.c | 2 +
9 files changed, 221 insertions(+), 55 deletions(-)
--
2.39.1.581.gbfd45094c4-goog
No upstream commit exists: the problem addressed here is that
'commit 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")'
was backported to 5.10. This commit is broken, but nobody noticed
upstream, since shortly after s390 converted to generic entry with
'commit 56e62a737028 ("s390: convert to generic entry")', which
implicitly fixed the problem outlined below.
The thread flag TIF_NOTIFY_SIGNAL is set for io_uring work. The io work
user or syscall path calls do_signal() when either TIF_SIGPENDING or
TIF_NOTIFY_SIGNAL is set. However, do_signal() only considers
TIF_SIGPENDING and ignores TIF_NOTIFY_SIGNAL. This means get_signal() is
never invoked for TIF_NOTIFY_SIGNAL, so the flag is never cleared, which
results in an endless do_signal() loop.
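The loop described above can be modelled in user space. The sketch below
is not kernel code: the flag values, the model_* names and the iteration
cap are assumptions made for illustration. It only shows why gating
get_signal() on TIF_SIGPENDING leaves TIF_NOTIFY_SIGNAL set forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are arbitrary, not the s390 ones. */
#define MODEL_TIF_SIGPENDING    (1u << 0)
#define MODEL_TIF_NOTIFY_SIGNAL (1u << 1)
#define MODEL_TIF_WORK (MODEL_TIF_SIGPENDING | MODEL_TIF_NOTIFY_SIGNAL)

/* Stand-in for get_signal(): consumes both kinds of pending work. */
static bool model_get_signal(unsigned int *flags)
{
	bool had_signal = (*flags & MODEL_TIF_SIGPENDING) != 0;

	*flags &= ~MODEL_TIF_WORK;
	return had_signal;
}

/* do_signal() as backported: get_signal() is gated on TIF_SIGPENDING. */
static void model_do_signal_broken(unsigned int *flags)
{
	if ((*flags & MODEL_TIF_SIGPENDING) && model_get_signal(flags)) {
		/* a real kernel would deliver the signal here */
	}
}

/* do_signal() with the fix: get_signal() is always called. */
static void model_do_signal_fixed(unsigned int *flags)
{
	if (model_get_signal(flags)) {
		/* a real kernel would deliver the signal here */
	}
}

/*
 * Model of the return-to-user path: do_signal() is called again as long
 * as either flag is set. Returns the number of calls needed, or -1 if
 * the flags are still set after max_calls calls (the "endless" case).
 */
static int model_exit_to_user(unsigned int flags, bool fixed, int max_calls)
{
	int calls = 0;

	assert(max_calls > 0);
	while (flags & MODEL_TIF_WORK) {
		if (calls == max_calls)
			return -1;
		if (fixed)
			model_do_signal_fixed(&flags);
		else
			model_do_signal_broken(&flags);
		calls++;
	}
	return calls;
}
```

In this model, model_exit_to_user(MODEL_TIF_NOTIFY_SIGNAL, false, 1000)
hits the cap, while the same call with fixed set to true finishes after
a single do_signal() call.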
Reference: 'commit 788d0824269b ("io_uring: import 5.15-stable io_uring")'
Fixes: 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")
Cc: stable(a)vger.kernel.org # 5.10.162
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Acked-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
---
v2->v3:
Correct changelog.
v1->v2:
Add the changelog.
arch/s390/kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index b27b6c1f058d..9e900a8977bd 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -472,7 +472,7 @@ void do_signal(struct pt_regs *regs)
current->thread.system_call =
test_pt_regs_flag(regs, PIF_SYSCALL) ? regs->int_code : 0;
- if (test_thread_flag(TIF_SIGPENDING) && get_signal(&ksig)) {
+ if (get_signal(&ksig)) {
/* Whee! Actually deliver the signal. */
if (current->thread.system_call) {
regs->int_code = current->thread.system_call;
--
2.37.2
Hi,
This patch fixes an issue in the s390 stable kernel starting with 5.10.162.
The following commits trigger it:
1. stable commit id - 788d0824269b ("io_uring: import 5.15-stable
io_uring")
2. upstream commit id - 75309018a24d ("s390: add support for
TIF_NOTIFY_SIGNAL")
Problem:
qemu and user processes could stall when TIF_NOTIFY_SIGNAL is set from
io_uring work.
Affected users:
The issue was first raised by the Debian team, whose s390
bullseye build systems are affected.
Upstream commit Id:
* The attached patch has no upstream commit. However, the stable kernel
5.10.162+ uses upstream commit id - 75309018a24d ("s390: add support for
TIF_NOTIFY_SIGNAL"), which would need this fix
* Starting from v5.12, there are s390 generic entry commits
56e62a737028 ("s390: convert to generic entry") and its relevant fixes,
which are recommended and should address these problems.
Kernel version to be applied:
stable kernel 5.10.162+
Thanks.
Sumanth
Sumanth Korikkar (1):
s390/signal: fix endless loop in do_signal
arch/s390/kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.37.2
If a non-root cgroup gets removed while a thread that registered a
trigger is polling on a pressure file within the cgroup, the polling
waitqueue gets freed in the following path.
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
However, the polling thread still has a reference to the pressure file and
will access the freed waitqueue when the file is closed or upon exit.
fput
ep_eventpoll_release
ep_free
ep_remove_wait_queue
remove_wait_queue
This results in use-after-free as pasted below.
The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real lifetime.
Using wake_up_pollfree() here might be less than ideal, but it is in line
with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()")
since the waitqueue's lifetime is not tied to the file's and can be
considered another special case. While this would be fixable by somehow
tying cgroup_file_release() to the fput(), it would require
sizable refactoring in cgroups or a higher layer, which might be more
justifiable if we identify more cases like this.
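The lifetime problem can be illustrated with a small user-space model.
This is a sketch under stated assumptions, not the kernel implementation:
all model_* names are hypothetical, the "freed" flag stands in for the
memory actually being released, and the pollfree model only mirrors the
idea that waiters are detached from the queue before it goes away.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 4

struct model_wait_head;

/* A poller's entry; whead points back at the queue it is linked on. */
struct model_waiter {
	struct model_wait_head *whead;
};

struct model_wait_head {
	struct model_waiter *waiters[MODEL_MAX_WAITERS];
	size_t nr;
	int freed;	/* set once the owner has released the queue */
};

static void model_add_wait_queue(struct model_wait_head *h,
				 struct model_waiter *w)
{
	assert(h->nr < MODEL_MAX_WAITERS);
	h->waiters[h->nr++] = w;
	w->whead = h;
}

/* Plain wakeup model: waiters are woken but stay linked to the queue. */
static void model_wake_up(struct model_wait_head *h)
{
	(void)h;
}

/* Pollfree model: every waiter is detached and forgets the queue. */
static void model_wake_up_pollfree(struct model_wait_head *h)
{
	for (size_t i = 0; i < h->nr; i++)
		h->waiters[i]->whead = NULL;
	h->nr = 0;
}

static void model_free_head(struct model_wait_head *h)
{
	h->freed = 1;
}

/*
 * Called when the poller finally closes its file. Returns 1 if the
 * removal would have touched a freed queue (the use-after-free),
 * 0 otherwise.
 */
static int model_remove_wait_queue(struct model_waiter *w)
{
	struct model_wait_head *h = w->whead;

	if (!h)
		return 0;	/* already detached, nothing to touch */
	if (h->freed)
		return 1;	/* would dereference freed memory */

	for (size_t i = 0; i < h->nr; i++) {
		if (h->waiters[i] == w) {
			h->waiters[i] = h->waiters[--h->nr];
			break;
		}
	}
	w->whead = NULL;
	return 0;
}
```

With model_wake_up() followed by model_free_head(), a later
model_remove_wait_queue() reports the stale access; with
model_wake_up_pollfree() the waiter has already been detached and the
removal is a no-op.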
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
Write of size 4 at addr ffff88810e625328 by task a.out/4404
CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
Call Trace:
<TASK>
dump_stack_lvl+0x73/0xa0
print_report+0x16c/0x4e0
? _printk+0x59/0x80
? __virt_addr_valid+0xb8/0x130
? _raw_spin_lock_irqsave+0x60/0xc0
kasan_report+0xc3/0xf0
? _raw_spin_lock_irqsave+0x60/0xc0
kasan_check_range+0x2d2/0x310
_raw_spin_lock_irqsave+0x60/0xc0
remove_wait_queue+0x1a/0xa0
ep_free+0x12c/0x170
ep_eventpoll_release+0x26/0x30
__fput+0x202/0x400
task_work_run+0x11d/0x170
do_exit+0x495/0x1130
? update_cfs_rq_load_avg+0x2c2/0x2e0
do_group_exit+0x100/0x100
get_signal+0xd67/0xde0
? finish_task_switch+0x15f/0x3a0
arch_do_signal_or_restart+0x2a/0x2b0
exit_to_user_mode_prepare+0x94/0x100
syscall_exit_to_user_mode+0x20/0x40
do_syscall_64+0x52/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f8e392bfb91
Code: Unable to access opcode bytes at 0x7f8e392bfb67.
RSP: 002b:00007fff261e08d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f8e392bfb91
RDX: 0000000000000001 RSI: 00007fff261e08e8 RDI: 0000000000000004
RBP: 00007fff261e0920 R08: 0000000000400780 R09: 00007f8e3960f240
R10: 00000000000003df R11: 0000000000000246 R12: 00000000004005a0
R13: 00007fff261e0a00 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Allocated by task 4404:
kasan_set_track+0x3d/0x60
__kasan_kmalloc+0x85/0x90
psi_trigger_create+0x113/0x3e0
pressure_write+0x146/0x2e0
cgroup_file_write+0x11c/0x250
kernfs_fop_write_iter+0x186/0x220
vfs_write+0x3d8/0x5c0
ksys_write+0x90/0x110
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 4407:
kasan_set_track+0x3d/0x60
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x11d/0x170
slab_free_freelist_hook+0x87/0x150
__kmem_cache_free+0xcb/0x180
psi_trigger_destroy+0x2e8/0x310
cgroup_file_release+0x4f/0xb0
kernfs_drain_open_files+0x165/0x1f0
kernfs_drain+0x162/0x1a0
__kernfs_remove+0x1fb/0x310
kernfs_remove_by_name_ns+0x95/0xe0
cgroup_addrm_files+0x67f/0x700
cgroup_destroy_locked+0x283/0x3c0
cgroup_rmdir+0x29/0x100
kernfs_iop_rmdir+0xd1/0x140
vfs_rmdir+0xfe/0x240
do_rmdir+0x13d/0x280
__x64_sys_rmdir+0x2c/0x30
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
v4: updated commit message
v3: updated commit message and the comment in the code
v2: updated commit message
Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Cc: stable(a)vger.kernel.org
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
Signed-off-by: Mengchi Cheng <mengcc(a)amazon.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
---
kernel/sched/psi.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 8ac8b81bfee6..02e011cabe91 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1343,10 +1343,11 @@ void psi_trigger_destroy(struct psi_trigger *t)
group = t->group;
/*
- * Wakeup waiters to stop polling. Can happen if cgroup is deleted
- * from under a polling process.
+ * Wakeup waiters to stop polling and clear the queue to prevent it from
+ * being accessed later. Can happen if cgroup is deleted from under a
+ * polling process.
*/
- wake_up_interruptible(&t->event_wait);
+ wake_up_pollfree(&t->event_wait);
mutex_lock(&group->trigger_lock);
--
2.38.1
This bug is marked as fixed by commit:
ext4: block range must be validated before use in ext4_mb_clear_bb()
But I can't find it in the tested trees[1] for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and new crashes with
the same signature are ignored.
Kernel: Android 5.10
Dashboard link: https://syzkaller.appspot.com/bug?extid=15cd994e273307bf5cfa
---
[1] I expect the commit to be present in:
1. android12-5.10-lts branch of
https://android.googlesource.com/kernel/common
There is an error
tree: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git master
head: c911f03f8d444e623724fddd82b07a7e1af42338
commit: d5924531dd8ad012ad13eb4d6a5e120c3dadfc05 arm64/kexec: Test page size support with new TGRAN range values
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
When I compile the ko file with the [-Werror=type-limits] compilation
option added, an error is reported during compilation.
The log is as follows:
./arch/arm64/include/asm/cpufeature.h: In function
‘system_supports_4kb_granule’:
./arch/arm64/include/asm/cpufeature.h:653:14: error:
comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
return (val >= ID_AA64MMFR0_TGRAN4_SUPPORTED_MIN) &&
^~
./arch/arm64/include/asm/cpufeature.h: In function
‘system_supports_64kb_granule’:
./arch/arm64/include/asm/cpufeature.h:666:14: error:
comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
return (val >= ID_AA64MMFR0_TGRAN64_SUPPORTED_MIN) &&
^~
"val" variable type is "u32"
"#define ID_AA64MMFR0_TGRAN4_SUPPORTED_MIN 0x0"
"#define ID_AA64MMFR0_TGRAN64_SUPPORTED_MIN 0x0"
so the comparison val >= 0 is always true.
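One possible way to keep a range check on an unsigned value without a
literal ">= 0" comparison is the usual subtract-and-compare idiom. The
sketch below is only an illustration and not a proposed kernel fix: the
helper names are hypothetical, and the maximum value 0x7 is an assumption
chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum as quoted in the report; the maximum is assumed for the demo. */
#define MODEL_TGRAN4_SUPPORTED_MIN 0x0u
#define MODEL_TGRAN4_SUPPORTED_MAX 0x7u

/*
 * True when min <= val <= max. For unsigned arithmetic, val - min wraps
 * to a huge value whenever val < min, so a single comparison covers both
 * bounds and the compiler never sees "unsigned >= 0".
 */
static bool model_in_range_u32(uint32_t val, uint32_t min, uint32_t max)
{
	assert(min <= max);
	return (uint32_t)(val - min) <= (uint32_t)(max - min);
}

static bool model_supports_4kb_granule(uint32_t val)
{
	return model_in_range_u32(val, MODEL_TGRAN4_SUPPORTED_MIN,
				  MODEL_TGRAN4_SUPPORTED_MAX);
}
```

Because the bounds reach the comparison as function parameters rather
than as a literal zero, this form should not trigger -Wtype-limits, and
it stays correct if the minimum is later changed to a non-zero value.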
If you fix the issue, kindly add the following tag where applicable
Reported-by: heyuqiang <heyuqiang1(a)huawei.com>
Thanks
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
When using a userspace configuration that does not call
cfg80211_connect() (can be checked with breakpoints in the kernel),
this patch should allow `networkctl status device_name` to output the
SSID instead of null.
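The idea of the change, looking up the SSID element in the BSS
information elements and copying it only when no SSID is recorded yet,
can be sketched in user space as follows. This is a simplified model, not
the cfg80211 code: the model_* names are hypothetical, the element walk
stands in for ieee80211_bss_get_elem(), and the 32-byte upper bound is a
precaution added by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_EID_SSID     0	/* element ID of the SSID element */
#define MODEL_MAX_SSID_LEN 32	/* maximum SSID length in bytes */

/*
 * Walk a buffer of (id, len, data...) elements and return a pointer to
 * the data of the first element with the given id, or NULL if there is
 * none or the buffer is truncated. *datalen receives the data length.
 */
static const uint8_t *model_find_elem(uint8_t id, const uint8_t *ies,
				      size_t len, uint8_t *datalen)
{
	size_t pos = 0;

	while (len - pos >= 2) {
		uint8_t eid = ies[pos];
		uint8_t elen = ies[pos + 1];

		if (elen > len - pos - 2)
			break;	/* element claims more data than is left */
		if (eid == id) {
			*datalen = elen;
			return &ies[pos + 2];
		}
		pos += 2 + (size_t)elen;
	}
	return NULL;
}

/*
 * Copy the SSID from the elements only when none is set yet.
 * Returns the resulting SSID length (0 if it is still unset).
 */
static size_t model_set_ssid_if_unset(uint8_t *ssid, size_t ssid_len,
				      const uint8_t *ies, size_t ies_len)
{
	uint8_t datalen = 0;
	const uint8_t *data;

	assert(ssid != NULL);
	if (ssid_len != 0)
		return ssid_len;	/* keep the SSID that is already set */

	data = model_find_elem(MODEL_EID_SSID, ies, ies_len, &datalen);
	if (!data || datalen == 0 || datalen > MODEL_MAX_SSID_LEN)
		return 0;

	memcpy(ssid, data, datalen);
	return datalen;
}
```

A caller would pass the currently stored SSID length as ssid_len, so an
SSID that was already set through the normal connect path is left alone.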
Reported-by: Yohan Prod'homme <kernel(a)zoddo.fr>
Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs")
Cc: Kalle Valo <kvalo(a)kernel.org>
Cc: Denis Kirjanov <dkirjanov(a)suse.de>
Cc: linux-wireless(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand(a)systemb.ch>
---
changes since v4:
- style: use xmas tree
- better fixes tag
- fix typo in commit message
- explain how to test the patch
- fix fixes tag
- move change log
- changing the title to something better
changes since v3:
- add missing NULL check
- add missing break
changes since v2:
- The code was totally rewritten based on the discussion of the
v2 patch.
- the ssid is set in __cfg80211_connect_result() and only if the ssid is
not already set.
- Do not add another ssid reset path since it is already done in
__cfg80211_disconnected()
---
net/wireless/sme.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 4b5b6ee0fe01..032464a38787 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -724,6 +724,7 @@ void __cfg80211_connect_result(struct net_device *dev,
{
struct wireless_dev *wdev = dev->ieee80211_ptr;
const struct element *country_elem = NULL;
+ const struct element *ssid;
const u8 *country_data;
u8 country_datalen;
#ifdef CONFIG_CFG80211_WEXT
@@ -883,6 +884,22 @@ void __cfg80211_connect_result(struct net_device *dev,
country_data, country_datalen);
kfree(country_data);
+ if (wdev->u.client.ssid_len == 0) {
+ rcu_read_lock();
+ for_each_valid_link(cr, link) {
+ ssid = ieee80211_bss_get_elem(cr->links[link].bss,
+ WLAN_EID_SSID);
+
+ if (!ssid || ssid->datalen == 0)
+ continue;
+
+ memcpy(wdev->u.client.ssid, ssid->data, ssid->datalen);
+ wdev->u.client.ssid_len = ssid->datalen;
+ break;
+ }
+ rcu_read_unlock();
+ }
+
return;
out:
for_each_valid_link(cr, link)
--
2.39.2
changes since v3:
- add missing NULL check
- add missing break
changes since v2:
- The code was totally rewritten based on the discussion of the
v2 patch.
- the ssid is set in __cfg80211_connect_result() and only if the ssid is
not already set.
- Do not add another ssid reset path since it is already done in
__cfg80211_disconnected()
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
Reported-by: Yohan Prod'homme <kernel(a)zoddo.fr>
Fixes: 7b0a0e3c3a88260b6fcb017e49f198463aa62ed1
Cc: linux-wireless(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand(a)systemb.ch>
---
net/wireless/sme.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 4b5b6ee0fe01..b552d6c20a26 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -723,6 +723,7 @@ void __cfg80211_connect_result(struct net_device *dev,
bool wextev)
{
struct wireless_dev *wdev = dev->ieee80211_ptr;
+ const struct element *ssid;
const struct element *country_elem = NULL;
const u8 *country_data;
u8 country_datalen;
@@ -883,6 +884,22 @@ void __cfg80211_connect_result(struct net_device *dev,
country_data, country_datalen);
kfree(country_data);
+ if (wdev->u.client.ssid_len == 0) {
+ rcu_read_lock();
+ for_each_valid_link(cr, link) {
+ ssid = ieee80211_bss_get_elem(cr->links[link].bss,
+ WLAN_EID_SSID);
+
+ if (!ssid || ssid->datalen == 0)
+ continue;
+
+ memcpy(wdev->u.client.ssid, ssid->data, ssid->datalen);
+ wdev->u.client.ssid_len = ssid->datalen;
+ break;
+ }
+ rcu_read_unlock();
+ }
+
return;
out:
for_each_valid_link(cr, link)
--
2.39.1
This is the start of the stable review cycle for the 5.10.168 release.
There are 139 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 15 Feb 2023 14:46:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.168-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.168-rc1
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix return value
David Chen <david.chen(a)nutanix.com>
Fix page corruption caused by racy check in __free_pages
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
Guo Ren <guoren(a)linux.alibaba.com>
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
Xiubo Li <xiubli(a)redhat.com>
ceph: flush cap releases when the session is flushed
Prashant Malani <pmalani(a)chromium.org>
usb: typec: altmodes/displayport: Fix probe pin assign check
Mark Pearson <mpearson-lenovo(a)squebb.ca>
usb: core: add quirk for Alcor Link AK9563 smartcard reader
Anand Jain <anand.jain(a)oracle.com>
btrfs: free device in btrfs_close_devices for a single device filesystem
Alan Stern <stern(a)rowland.harvard.edu>
net: USB: Fix wrong-direction WARNING in plusb.c
ZhaoLong Wang <wangzhaolong1(a)huawei.com>
cifs: Fix use-after-free in rdata->read_into_pages()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
spi: dw: Fix wrong FIFO level setting for long xfers
Maxim Korotkov <korotkov.maxim.s(a)gmail.com>
pinctrl: single: fix potential NULL dereference
Joel Stanley <joel(a)jms.id.au>
pinctrl: aspeed: Fix confusing types in return value
Dan Carpenter <error27(a)gmail.com>
ALSA: pci: lx6464es: fix a debug loop
Hangbin Liu <liuhangbin(a)gmail.com>
selftests: forwarding: lib: quote the sysctl values
Pietro Borrello <borrello(a)diag.uniroma1.it>
rds: rds_rm_zerocopy_callback() use list_first_entry()
Shay Drory <shayd(a)nvidia.com>
net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
Shay Drory <shayd(a)nvidia.com>
net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
Dragos Tatulea <dtatulea(a)nvidia.com>
net/mlx5e: IPoIB, Show unknown speed instead of error
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q"
Anirudh Venkataramanan <anirudh.venkataramanan(a)intel.com>
ice: Do not use WQ_MEM_RECLAIM flag for workqueue
Herton R. Krzesinski <herton(a)redhat.com>
uapi: add missing ip/ipv6 header dependencies for linux/stddef.h
Neel Patel <neel.patel(a)amd.com>
ionic: clean interrupt before enabling queue to avoid credit race
Heiner Kallweit <hkallweit1(a)gmail.com>
net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY
Qi Zheng <zhengqi.arch(a)bytedance.com>
bonding: fix error checking in bond_debug_reregister()
Christian Hopps <chopps(a)chopps.org>
xfrm: fix bug with DSCP copy to v6 from v4 tunnel
Yang Yingliang <yangyingliang(a)huawei.com>
RDMA/usnic: use iommu_map_atomic() under spin_lock()
Dragos Tatulea <dtatulea(a)nvidia.com>
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
Eric Dumazet <edumazet(a)google.com>
xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
Dean Luick <dean.luick(a)cornelisnetworks.com>
IB/hfi1: Restore allocated resources on failed copyout
Anastasia Belova <abelova(a)astralinux.ru>
xfrm: compat: change expression for switch in xfrm_xlate64
Devid Antonio Filoni <devid.filoni(a)egluetechnologies.com>
can: j1939: do not wait 250 ms if the same addr was already claimed
Mark Brown <broonie(a)kernel.org>
of/address: Return an error when no valid dma-ranges are found
Shiju Jose <shiju.jose(a)huawei.com>
tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
Guillaume Pinot <texitoi(a)texitoi.eu>
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
Artemii Karasev <karasev(a)ispras.ru>
ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
ALSA: hda/realtek: Add Positivo N14KP6-TG
Alexander Potapenko <glider(a)google.com>
btrfs: zlib: zero-initialize zlib workspace
Josef Bacik <josef(a)toxicpanda.com>
btrfs: limit device extents to the device size
Mike Kravetz <mike.kravetz(a)oracle.com>
migrate: hugetlb: check for hugetlb shared PMD in node migration
Miaohe Lin <linmiaohe(a)huawei.com>
mm/migration: return errno when isolate_huge_page failed
Andreas Kemnade <andreas(a)kemnade.info>
iio:adc:twl6030: Enable measurement of VAC
Martin KaFai Lau <kafai(a)fb.com>
bpf: Do not reject when the stack read size is different from the tracked scalar size
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix registration vs use race
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix cleanup after dev_set_name()
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: remove nvmem_config wp_gpio
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nvmem: core: add error handling for dev_set_name
Christophe Kerello <christophe.kerello(a)foss.st.com>
nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
Minsuk Kang <linuxlovemin(a)yonsei.ac.kr>
wifi: brcmfmac: Check the count value of channel spec to prevent out-of-bounds reads
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on i_extra_isize in is_alive()
Dongliang Mu <dzm91(a)hust.edu.cn>
fbdev: smscufx: fix error handling code in ufx_usb_probe
Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
serial: 8250_dma: Fix DMA Rx rearm race
Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
serial: 8250_dma: Fix DMA Rx completion race
Michael Walle <michael(a)walle.cc>
nvmem: core: fix cell removal on error
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: initialise nvmem->id early
Rob Clark <robdclark(a)chromium.org>
drm/i915: Fix potential bit_17 double-free
Phillip Lougher <phillip(a)squashfs.org.uk>
Squashfs: fix handling and sanity checking of xattr_ids count
Longlong Xia <xialonglong1(a)huawei.com>
mm/swapfile: add cond_resched() in get_swap_pages()
Zheng Yongjun <zhengyongjun3(a)huawei.com>
fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
Joerg Roedel <jroedel(a)suse.de>
x86/debug: Fix stack recursion caused by wrongly ordered DR7 accesses
Mike Kravetz <mike.kravetz(a)oracle.com>
mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
Andreas Schwab <schwab(a)suse.de>
riscv: disable generation of unwind tables
Helge Deller <deller(a)gmx.de>
parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
Helge Deller <deller(a)gmx.de>
parisc: Fix return code of pdc_iodc_print()
Johan Hovold <johan+linaro(a)kernel.org>
nvmem: qcom-spmi-sdam: fix module autoloading
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix MAGN sensor scale and unit
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix failed initialization ODR mode assignment
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix incorrect ODR mode readback
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix swapped ACCEL and MAGN channels readback
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix map label of channel type to MAGN sensor
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix IMU data bits returned to user space
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix incomplete ACCEL and MAGN channels readback
Carlos Song <carlos.song(a)nxp.com>
iio: imu: fxos8700: fix ACCEL measurement range selection
Andreas Kemnade <andreas(a)kemnade.info>
iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
iio: adc: berlin2-adc: Add missing of_node_put() in error path
Dmitry Perchanov <dmitry.perchanov(a)intel.com>
iio: hid: fix the retval in accel_3d_capture_sample
Ard Biesheuvel <ardb(a)kernel.org>
efi: Accept version 2 of memory attributes table
Victor Shyba <victor1984(a)riseup.net>
ALSA: hda/realtek: Add Acer Predator PH315-54
Alexander Egorenkov <egorenar(a)linux.ibm.com>
watchdog: diag288_wdt: fix __diag288() inline assembly
Alexander Egorenkov <egorenar(a)linux.ibm.com>
watchdog: diag288_wdt: do not use stack buffers for hardware data
Natalia Petrova <n.petrova(a)fintech.ru>
net: qrtr: free memory on error path in radix_tree_insert()
Samuel Thibault <samuel.thibault(a)ens-lyon.org>
fbcon: Check font dimension limits
Werner Sembach <wse(a)tuxedocomputers.com>
Input: i8042 - add Clevo PCX0DX to i8042 quirk table
Werner Sembach <wse(a)tuxedocomputers.com>
Input: i8042 - add TUXEDO devices to i8042 quirk tables
Werner Sembach <wse(a)tuxedocomputers.com>
Input: i8042 - merge quirk tables
Werner Sembach <wse(a)tuxedocomputers.com>
Input: i8042 - move __initconst to fix code styling warning
George Kennedy <george.kennedy(a)oracle.com>
vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
Udipto Goswami <quic_ugoswami(a)quicinc.com>
usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
Neil Armstrong <neil.armstrong(a)linaro.org>
usb: dwc3: qcom: enable vbus override when in OTG dr-mode
Wesley Cheng <wcheng(a)codeaurora.org>
usb: dwc3: dwc3-qcom: Fix typo in the dwc3 vbus override API
Olivier Moysan <olivier.moysan(a)foss.st.com>
iio: adc: stm32-dfsdm: fill module aliases
Hyunwoo Kim <v4bel(a)theori.io>
net/x25: Fix to not accept on connected socket
Koba Ko <koba.ko(a)canonical.com>
platform/x86: dell-wmi: Add a keymap for KEY_MUTE in type 0x0010 table
Randy Dunlap <rdunlap(a)infradead.org>
i2c: rk3x: fix a bunch of kernel-doc warnings
Mike Christie <michael.christie(a)oracle.com>
scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
Maurizio Lombardi <mlombard(a)redhat.com>
scsi: target: core: Fix warning on RT kernels
Stefan Wahren <stefan.wahren(a)i2se.com>
i2c: mxs: suppress probe-deferral error message
Magnus Karlsson <magnus.karlsson(a)intel.com>
qede: execute xdp_do_flush() before napi_complete_done()
Bhaskar Upadhaya <bupadhaya(a)marvell.com>
qede: add netpoll support for qede driver
Anton Gusev <aagusev(a)ispras.ru>
efi: fix potential NULL deref in efi_mem_reserve_persistent
Fedor Pchelkin <pchelkin(a)ispras.ru>
net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
Parav Pandit <parav(a)nvidia.com>
virtio-net: Keep stop() to follow mirror sequence of open()
Andrei Gherzan <andrei.gherzan(a)canonical.com>
selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
Andrei Gherzan <andrei.gherzan(a)canonical.com>
selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
Andrei Gherzan <andrei.gherzan(a)canonical.com>
selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
Andrei Gherzan <andrei.gherzan(a)canonical.com>
selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
ata: libata: Fix sata_down_spd_limit() when no link speed is reported
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
Tom Rix <trix(a)redhat.com>
igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp()
Chris Healy <healych(a)amazon.com>
net: phy: meson-gxl: Add generic dummy stubs for MMD register access
Fedor Pchelkin <pchelkin(a)ispras.ru>
squashfs: harden sanity check in squashfs_read_xattr_id_table
Florian Westphal <fw(a)strlen.de>
netfilter: br_netfilter: disable sabotage_in hook after first suppression
Hyunwoo Kim <v4bel(a)theori.io>
netrom: Fix use-after-free caused by accept on already connected socket
Andre Kalb <andre.kalb(a)sma.de>
net: phy: dp83822: Fix null pointer access on DP83825/DP83826 devices
Íñigo Huguet <ihuguet(a)redhat.com>
sfc: correctly advertise tunneled IPv6 segmentation
Magnus Karlsson <magnus.karlsson(a)intel.com>
virtio-net: execute xdp_do_flush() before napi_complete_done()
Al Viro <viro(a)zeniv.linux.org.uk>
fix "direction" argument of iov_iter_kvec()
Al Viro <viro(a)zeniv.linux.org.uk>
fix iov_iter_bvec() "direction" argument
Al Viro <viro(a)zeniv.linux.org.uk>
READ is "data destination", not source...
Al Viro <viro(a)zeniv.linux.org.uk>
WRITE is "data source", not destination...
Eric Auger <eric.auger(a)redhat.com>
vhost/net: Clear the pending messages when the backend is removed
Martin K. Petersen <martin.petersen(a)oracle.com>
scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drm/vc4: hdmi: make CEC adapter name unique
Pierluigi Passaro <pierluigi.p(a)variscite.com>
arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
Jakub Sitnicki <jakub(a)cloudflare.com>
bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
Eduard Zingerman <eddyz87(a)gmail.com>
bpf: Fix to preserve reg parent/live fields when copying range info
Martin KaFai Lau <kafai(a)fb.com>
bpf: Support <8-byte scalar spill and refill
Christophe Leroy <christophe.leroy(a)csgroup.eu>
powerpc/bpf: Move common helpers into bpf_jit.h
Christophe Leroy <christophe.leroy(a)csgroup.eu>
powerpc/bpf: Change register numbering for bpf_set/is_seen_register()
Artemii Karasev <karasev(a)ispras.ru>
ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
Yonghong Song <yhs(a)fb.com>
bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/imc-pmu: Revert nest_init_lock to being a mutex
Paul Chaignon <paul(a)isovalent.com>
bpf: Fix incorrect state pruning for <8B spill/fill
Yuan Can <yuancan(a)huawei.com>
bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
firewire: fix memory leak for payload of request subaction to IEC 61883-1 FCP region
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 4 +-
arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 6 +-
arch/arm64/boot/dts/amlogic/meson-gx.dtsi | 6 +-
arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h | 2 +-
arch/parisc/kernel/firmware.c | 5 +-
arch/parisc/kernel/ptrace.c | 15 +-
arch/powerpc/net/bpf_jit.h | 35 +
arch/powerpc/net/bpf_jit64.h | 19 -
arch/powerpc/net/bpf_jit_comp64.c | 28 +-
arch/powerpc/perf/imc-pmu.c | 14 +-
arch/riscv/Makefile | 3 +
arch/riscv/mm/cacheflush.c | 4 +-
arch/x86/include/asm/debugreg.h | 26 +-
drivers/ata/libata-core.c | 2 +-
drivers/bus/sunxi-rsb.c | 8 +-
drivers/firewire/core-cdev.c | 4 +-
drivers/firmware/efi/efi.c | 2 +
drivers/firmware/efi/memattr.c | 2 +-
drivers/fpga/stratix10-soc.c | 4 +-
drivers/fsi/fsi-sbefifo.c | 6 +-
drivers/gpu/drm/i915/gem/i915_gem_tiling.c | 9 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 3 +-
drivers/i2c/busses/i2c-mxs.c | 4 +-
drivers/i2c/busses/i2c-rk3x.c | 44 +-
drivers/iio/accel/hid-sensor-accel-3d.c | 1 +
drivers/iio/adc/berlin2-adc.c | 4 +-
drivers/iio/adc/stm32-dfsdm-adc.c | 1 +
drivers/iio/adc/twl6030-gpadc.c | 32 +
drivers/iio/imu/fxos8700_core.c | 111 +-
drivers/infiniband/hw/hfi1/file_ops.c | 7 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 8 +-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 +
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 2 +-
drivers/input/serio/i8042-x86ia64io.h | 1188 ++++++++++++--------
drivers/net/bonding/bond_debugfs.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_ptp.c | 14 +-
.../ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 3 +-
.../ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 13 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 24 +-
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 15 +-
drivers/net/ethernet/qlogic/qede/qede_fp.c | 10 +-
drivers/net/ethernet/sfc/efx.c | 5 +-
drivers/net/phy/dp83822.c | 6 +-
drivers/net/phy/meson-gxl.c | 4 +
drivers/net/usb/plusb.c | 4 +-
drivers/net/virtio_net.c | 8 +-
.../broadcom/brcm80211/brcmfmac/cfg80211.c | 17 +
drivers/nvmem/core.c | 45 +-
drivers/nvmem/qcom-spmi-sdam.c | 1 +
drivers/of/address.c | 21 +-
drivers/pinctrl/aspeed/pinctrl-aspeed.c | 2 +-
drivers/pinctrl/intel/pinctrl-intel.c | 16 +-
drivers/pinctrl/pinctrl-single.c | 2 +
drivers/platform/x86/dell-wmi.c | 3 +
drivers/scsi/iscsi_tcp.c | 9 +-
drivers/scsi/scsi_scan.c | 7 +-
drivers/spi/spi-dw-core.c | 2 +-
drivers/target/target_core_file.c | 4 +-
drivers/target/target_core_tmr.c | 4 +-
drivers/tty/serial/8250/8250_dma.c | 26 +-
drivers/tty/vt/vc_screen.c | 9 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/dwc3/dwc3-qcom.c | 10 +-
drivers/usb/gadget/function/f_fs.c | 4 +-
drivers/usb/typec/altmodes/displayport.c | 8 +-
drivers/vhost/net.c | 3 +
drivers/vhost/vhost.c | 3 +-
drivers/vhost/vhost.h | 1 +
drivers/video/fbdev/core/fbcon.c | 7 +-
drivers/video/fbdev/smscufx.c | 46 +-
drivers/watchdog/diag288_wdt.c | 15 +-
drivers/xen/pvcalls-back.c | 8 +-
fs/btrfs/volumes.c | 22 +-
fs/btrfs/zlib.c | 2 +-
fs/ceph/mds_client.c | 6 +
fs/cifs/file.c | 4 +-
fs/f2fs/gc.c | 18 +-
fs/proc/task_mmu.c | 4 +-
fs/squashfs/squashfs_fs.h | 2 +-
fs/squashfs/squashfs_fs_sb.h | 2 +-
fs/squashfs/xattr.h | 4 +-
fs/squashfs/xattr_id.c | 4 +-
include/linux/hugetlb.h | 19 +-
include/linux/nvmem-provider.h | 4 +-
include/linux/util_macros.h | 12 +
include/uapi/linux/ip.h | 1 +
include/uapi/linux/ipv6.h | 1 +
kernel/bpf/verifier.c | 102 +-
kernel/trace/bpf_trace.c | 3 +-
kernel/trace/trace.c | 3 -
mm/gup.c | 2 +-
mm/hugetlb.c | 6 +-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 5 +-
mm/migrate.c | 7 +-
mm/page_alloc.c | 5 +-
mm/swapfile.c | 1 +
net/bridge/br_netfilter_hooks.c | 1 +
net/can/j1939/address-claim.c | 40 +
net/can/j1939/transport.c | 4 -
net/ipv4/tcp_bpf.c | 4 +-
net/netrom/af_netrom.c | 5 +
net/openvswitch/datapath.c | 12 +-
net/qrtr/ns.c | 5 +-
net/rds/message.c | 6 +-
net/x25/af_x25.c | 6 +
net/xfrm/xfrm_compat.c | 4 +-
net/xfrm/xfrm_input.c | 3 +-
sound/pci/hda/patch_realtek.c | 3 +
sound/pci/hda/patch_via.c | 3 +
sound/pci/lx6464es/lx_core.c | 11 +-
sound/synth/emux/emux_nrpn.c | 3 +
tools/testing/selftests/net/forwarding/lib.sh | 4 +-
tools/testing/selftests/net/udpgso_bench.sh | 24 +-
tools/testing/selftests/net/udpgso_bench_rx.c | 4 +-
tools/testing/selftests/net/udpgso_bench_tx.c | 36 +-
119 files changed, 1573 insertions(+), 855 deletions(-)
Some TBT3 devices have a hard time reliably responding to bit banging
requests correctly when connected to AMD USB4 hosts running Linux.
These problems are not reported in any other CM, and comparing the
implementations the Linux CM is the only one that utilizes bit banging
to access the DROM. Other CM implementations access the DROM directly
from the NVM instead of bit banging.
Adjust the flow to try this on TBT3 devices before resorting to bit
banging.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/eeprom.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/thunderbolt/eeprom.c b/drivers/thunderbolt/eeprom.c
index c90d22f56d4e1..d9d9567bb938b 100644
--- a/drivers/thunderbolt/eeprom.c
+++ b/drivers/thunderbolt/eeprom.c
@@ -640,6 +640,10 @@ int tb_drom_read(struct tb_switch *sw)
return 0;
}
+ /* TBT3 devices have the DROM as part of NVM */
+ if (tb_drom_copy_nvm(sw, &size) == 0)
+ goto parse;
+
res = tb_drom_read_n(sw, 14, (u8 *) &size, 2);
if (res)
return res;
--
2.25.1
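The ordering described in the Thunderbolt patch above (try the NVM copy of the DROM first, resort to bit banging only if that fails) can be illustrated with a small userspace sketch. This is not the driver code: struct fake_switch, fake_copy_from_nvm(), fake_bit_bang() and fake_drom_read() are hypothetical stand-ins invented for illustration, and the fill bytes are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device model: whether a DROM copy is available in NVM. */
struct fake_switch {
	bool nvm_has_drom;
};

/* Stand-in for the NVM path; returns 0 on success, -1 if unavailable. */
static int fake_copy_from_nvm(const struct fake_switch *sw,
			      unsigned char *buf, size_t len)
{
	if (!sw->nvm_has_drom)
		return -1;
	memset(buf, 0xA5, len);	/* pretend payload read from NVM */
	return 0;
}

/* Stand-in for the slower bit-banging path; always succeeds in this model. */
static int fake_bit_bang(const struct fake_switch *sw,
			 unsigned char *buf, size_t len)
{
	(void)sw;
	memset(buf, 0x5A, len);	/* pretend payload read from the EEPROM */
	return 0;
}

enum drom_source { DROM_FROM_NVM, DROM_FROM_BIT_BANG, DROM_FAILED };

/* Prefer the NVM copy and fall back to bit banging only when it fails. */
static enum drom_source fake_drom_read(const struct fake_switch *sw,
				       unsigned char *buf, size_t len)
{
	if (fake_copy_from_nvm(sw, buf, len) == 0)
		return DROM_FROM_NVM;
	if (fake_bit_bang(sw, buf, len) == 0)
		return DROM_FROM_BIT_BANG;
	return DROM_FAILED;
}
```

In this model a device with nvm_has_drom set never reaches the bit-banging path, which is the point of the reordering; devices without an NVM copy behave as before.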
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: fix create duplicate source/sink-capabilities file
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From f430e60b78c6359ba4cb4e521d7df7f9a0484e03 Mon Sep 17 00:00:00 2001
From: Xu Yang <xu.yang_2(a)nxp.com>
Date: Tue, 14 Feb 2023 14:56:35 +0800
Subject: usb: typec: tcpm: fix create duplicate source/sink-capabilities file
The kernel dumps the following warning in the cases below:
sysfs: cannot create duplicate filename
'/devices/virtual/usb_power_delivery/pd1/source-capabilities'
1. After a soft reset has completed, an Explicit Contract negotiation occurs.
The sink device receives the source capabilities again, which causes
a duplicate source-capabilities file to be created.
2. A power role swap is performed twice on a device that is initially in the sink role.
Unregister the existing capabilities when either of these cases occurs.
Fixes: 8203d26905ee ("usb: typec: tcpm: Register USB Power Delivery Capabilities")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/r/20230214065635.972698-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index a0d943d78580..7d8c53d96c3b 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -4570,6 +4570,8 @@ static void run_state_machine(struct tcpm_port *port)
case SOFT_RESET:
port->message_id = 0;
port->rx_msgid = -1;
+ /* remove existing capabilities */
+ usb_power_delivery_unregister_capabilities(port->partner_source_caps);
tcpm_pd_send_control(port, PD_CTRL_ACCEPT);
tcpm_ams_finish(port);
if (port->pwr_role == TYPEC_SOURCE) {
@@ -4589,6 +4591,8 @@ static void run_state_machine(struct tcpm_port *port)
case SOFT_RESET_SEND:
port->message_id = 0;
port->rx_msgid = -1;
+ /* remove existing capabilities */
+ usb_power_delivery_unregister_capabilities(port->partner_source_caps);
if (tcpm_pd_send_control(port, PD_CTRL_SOFT_RESET))
tcpm_set_state_cond(port, hard_reset_state(port), 0);
else
@@ -4718,6 +4722,8 @@ static void run_state_machine(struct tcpm_port *port)
tcpm_set_state(port, SNK_STARTUP, 0);
break;
case PR_SWAP_SNK_SRC_SINK_OFF:
+ /* will be source, remove existing capabilities */
+ usb_power_delivery_unregister_capabilities(port->partner_source_caps);
/*
* Prevent vbus discharge circuit from turning on during PR_SWAP
* as this is not a disconnect.
--
2.39.1
Assert PCI Configuration Enable bit after probe. When this bit is left at
0 in endpoint mode, the RK3399 PCIe endpoint core will generate
configuration request retry status (CRS) messages back to the root complex.
Assert this bit after probe to allow the RK3399 PCIe endpoint core to reply
to configuration requests from the root complex.
This is documented in section 17.5.8.1.2 of the RK3399 TRM.
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
---
drivers/pci/controller/pcie-rockchip-ep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index 9b835377b..4c84e403e 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -623,6 +623,8 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
ep->irq_pci_addr = ROCKCHIP_PCIE_EP_DUMMY_IRQ_ADDR;
+ rockchip_pcie_write(rockchip, PCIE_CLIENT_CONF_ENABLE, PCIE_CLIENT_CONFIG);
+
return 0;
err_epc_mem_exit:
pci_epc_mem_exit(epc);
--
2.25.1
Certain AMD processors are vulnerable to a cross-thread return address
predictions bug. When running in SMT mode and one of the sibling threads
transitions out of C0 state, the other thread gets access to twice as many
entries in the RSB, but unfortunately the predictions of the now-halted
logical processor are not purged. Therefore, the executing processor
could speculatively execute from locations that the now-halted processor
had trained the RSB on.
The Spectre v2 mitigations cover the Linux kernel, as it fills the RSB
when context switching to the idle thread. However, KVM allows a VMM to
prevent exiting guest mode when transitioning out of C0 by using the
KVM_CAP_X86_DISABLE_EXITS capability. To mitigate the cross-thread
return address predictions bug, a VMM must not be allowed to override
the default behavior of intercepting C0 transitions.
These patches introduce a KVM module parameter that, if set, will prevent
the user from disabling the HLT, MWAIT and CSTATE exits.
The patches apply to the 5.15 stable tree, and Greg has already received
them through a git bundle. The difference is only in context, but it is
too much for "git cherry-pick" so here they are.
Thanks,
Paolo
Tom Lendacky (3):
x86/speculation: Identify processors vulnerable to SMT RSB predictions
KVM: x86: Mitigate the cross-thread return address predictions bug
Documentation/hw-vuln: Add documentation for Cross-Thread Return
Predictions
.../admin-guide/hw-vuln/cross-thread-rsb.rst | 92 +++++++++++++++++++
Documentation/admin-guide/hw-vuln/index.rst | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/common.c | 9 +-
arch/x86/kvm/x86.c | 43 ++++++---
5 files changed, 133 insertions(+), 13 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst
--
2.39.1
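As a rough illustration of the policy described in the cover letter above, the effect of such a module parameter can be modelled as masking the requested exit-disable bits. This is a sketch under stated assumptions, not the code in arch/x86/kvm/x86.c: the SK_* flag names and values, and sk_effective_disabled_exits(), are hypothetical, and the sketch assumes that only the HLT, MWAIT and CSTATE requests are affected while a PAUSE request passes through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, defined locally for illustration only. */
#define SK_DISABLE_EXITS_MWAIT	(1u << 0)
#define SK_DISABLE_EXITS_HLT	(1u << 1)
#define SK_DISABLE_EXITS_PAUSE	(1u << 2)
#define SK_DISABLE_EXITS_CSTATE	(1u << 3)

/* Requests that would let a guest leave C0 without an exit to the host. */
#define SK_C0_EXIT_MASK \
	(SK_DISABLE_EXITS_MWAIT | SK_DISABLE_EXITS_HLT | SK_DISABLE_EXITS_CSTATE)

/*
 * Return the set of exits that actually get disabled. With the
 * mitigation enabled, requests covered by SK_C0_EXIT_MASK are dropped.
 */
static uint32_t sk_effective_disabled_exits(uint32_t requested,
					    bool mitigate_smt_rsb)
{
	if (mitigate_smt_rsb)
		requested &= ~SK_C0_EXIT_MASK;
	return requested;
}
```

With the mitigation on, a request for HLT plus PAUSE yields only PAUSE in this model; with it off, the request is honoured unchanged.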
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list. Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first. However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first. As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.
Fix this problem by bumping the number of entries that the VM will
attempt to free at each iteration to 10000. This is an ugly hack that
may well make a denial of service easier, but for Qubes OS that is less
bad than the problem Qubes OS users are facing today. There really
needs to be a way for a frontend to be notified when the backend has
unmapped the grants. Additionally, a module parameter is provided to
allow tuning the reclaim speed.
The code previously used printk(KERN_DEBUG) whenever it had to defer
reclaiming a page because the grant was still mapped. This resulted in
a large volume of log messages that bothered users. Use pr_debug
instead, which suppresses the messages by default. Developers can
enable them using the dynamic debug mechanism.
Fixes: QubesOS/qubes-issues#7410 (memory leak)
Fixes: QubesOS/qubes-issues#7359 (excessive logging)
Fixes: 569ca5b3f94c ("xen/gnttab: add deferred freeing logic")
Cc: stable(a)vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
---
Anyone have suggestions for improving the grant mechanism? Argo isn't
a good option, as in the GUI protocol there are substantial performance
wins to be had by using true shared memory. Resending as I forgot the
Signed-off-by on the first submission. Sorry about that.
drivers/xen/grant-table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 5c83d41..2c2faa7 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -355,14 +355,20 @@
static void gnttab_handle_deferred(struct timer_list *);
static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred);
+static atomic64_t deferred_count;
+static atomic64_t leaked_count;
+static unsigned int free_per_iteration = 10000;
+
static void gnttab_handle_deferred(struct timer_list *unused)
{
- unsigned int nr = 10;
+ unsigned int nr = READ_ONCE(free_per_iteration);
+ const bool ignore_limit = nr == 0;
struct deferred_entry *first = NULL;
unsigned long flags;
+ size_t freed = 0;
spin_lock_irqsave(&gnttab_list_lock, flags);
- while (nr--) {
+ while ((ignore_limit || nr--) && !list_empty(&deferred_list)) {
struct deferred_entry *entry
= list_first_entry(&deferred_list,
struct deferred_entry, list);
@@ -372,10 +378,13 @@
list_del(&entry->list);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
if (_gnttab_end_foreign_access_ref(entry->ref)) {
+ uint64_t ret = atomic64_sub_return(1, &deferred_count);
put_free_entry(entry->ref);
- pr_debug("freeing g.e. %#x (pfn %#lx)\n",
- entry->ref, page_to_pfn(entry->page));
+ pr_debug("freeing g.e. %#x (pfn %#lx), %llu remaining\n",
+ entry->ref, page_to_pfn(entry->page),
+ (unsigned long long)ret);
put_page(entry->page);
+ freed++;
kfree(entry);
entry = NULL;
} else {
@@ -387,14 +396,15 @@
spin_lock_irqsave(&gnttab_list_lock, flags);
if (entry)
list_add_tail(&entry->list, &deferred_list);
- else if (list_empty(&deferred_list))
- break;
}
- if (!list_empty(&deferred_list) && !timer_pending(&deferred_timer)) {
+ if (list_empty(&deferred_list))
+ WARN_ON(atomic64_read(&deferred_count));
+ else if (!timer_pending(&deferred_timer)) {
deferred_timer.expires = jiffies + HZ;
add_timer(&deferred_timer);
}
spin_unlock_irqrestore(&gnttab_list_lock, flags);
+ pr_debug("Freed %zu references", freed);
}
static void gnttab_add_deferred(grant_ref_t ref, struct page *page)
@@ -402,7 +412,7 @@
{
struct deferred_entry *entry;
gfp_t gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;
- const char *what = KERN_WARNING "leaking";
+ uint64_t leaked, deferred;
entry = kmalloc(sizeof(*entry), gfp);
if (!page) {
@@ -426,12 +436,20 @@
add_timer(&deferred_timer);
}
spin_unlock_irqrestore(&gnttab_list_lock, flags);
- what = KERN_DEBUG "deferring";
+ deferred = atomic64_add_return(1, &deferred_count);
+ leaked = atomic64_read(&leaked_count);
+ pr_debug("deferring g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+ ref, page ? page_to_pfn(page) : -1, deferred, leaked);
+ } else {
+ deferred = atomic64_read(&deferred_count);
+ leaked = atomic64_add_return(1, &leaked_count);
+ pr_warn("leaking g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+ ref, page ? page_to_pfn(page) : -1, deferred, leaked);
}
- printk("%s g.e. %#x (pfn %#lx)\n",
- what, ref, page ? page_to_pfn(page) : -1);
}
+module_param(free_per_iteration, uint, 0600);
+
int gnttab_try_end_foreign_access(grant_ref_t ref)
{
int ret = _gnttab_end_foreign_access_ref(ref);
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
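The per-iteration budget used by the grant-table patch above, where a value of 0 means "no limit", can be sketched in userspace as follows. This is a simplified model and not the kernel code: struct sk_entry and sk_drain_pass() are hypothetical names, the deferred list is modelled as a plain array, and locking, the timer and the per-entry counters are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one deferred grant entry. */
struct sk_entry {
	bool still_mapped;	/* remote side has not unmapped it yet */
	bool freed;		/* already reclaimed by an earlier pass */
};

/*
 * Examine at most `limit` outstanding entries in one pass and free the
 * ones that are no longer mapped. A limit of 0 disables the budget.
 * Returns the number of entries freed by this pass.
 */
static size_t sk_drain_pass(struct sk_entry *q, size_t n, unsigned int limit)
{
	const bool ignore_limit = (limit == 0);
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].freed)
			continue;	/* nothing left to do for this one */
		if (!ignore_limit && limit-- == 0)
			break;		/* budget for this pass is used up */
		if (q[i].still_mapped)
			continue;	/* still in use: retry on a later pass */
		q[i].freed = true;
		freed++;
	}
	return freed;
}
```

A larger budget shortens the time the queue takes to drain but lengthens each pass, which is the trade-off the commit message describes as a potential denial-of-service concern.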
This is the start of the stable review cycle for the 5.15.94 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 15 Feb 2023 14:46:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.94-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.94-rc1
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix return value
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915: Fix VBT DSI DVO port handling
Aravind Iddamsetty <aravind.iddamsetty(a)intel.com>
drm/i915: Initialize the obj flags for shmem objects
Guilherme G. Piccoli <gpiccoli(a)igalia.com>
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
David Chen <david.chen(a)nutanix.com>
Fix page corruption caused by racy check in __free_pages
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
Heiner Kallweit <hkallweit1(a)gmail.com>
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
Wander Lairson Costa <wander(a)redhat.com>
rtmutex: Ensure that the top waiter is always woken up
Nicholas Piggin <npiggin(a)gmail.com>
powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch
Guo Ren <guoren(a)linux.alibaba.com>
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
Xiubo Li <xiubli(a)redhat.com>
ceph: flush cap releases when the session is flushed
Paul Cercueil <paul(a)crapouillou.net>
clk: ingenic: jz4760: Update M/N/OD calculation algorithm
Prashant Malani <pmalani(a)chromium.org>
usb: typec: altmodes/displayport: Fix probe pin assign check
Mark Pearson <mpearson-lenovo(a)squebb.ca>
usb: core: add quirk for Alcor Link AK9563 smartcard reader
Anand Jain <anand.jain(a)oracle.com>
btrfs: free device in btrfs_close_devices for a single device filesystem
Paolo Abeni <pabeni(a)redhat.com>
mptcp: be careful on subflow status propagation on errors
Alan Stern <stern(a)rowland.harvard.edu>
net: USB: Fix wrong-direction WARNING in plusb.c
ZhaoLong Wang <wangzhaolong1(a)huawei.com>
cifs: Fix use-after-free in rdata->read_into_pages()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
spi: dw: Fix wrong FIFO level setting for long xfers
Maxim Korotkov <korotkov.maxim.s(a)gmail.com>
pinctrl: single: fix potential NULL dereference
Joel Stanley <joel(a)jms.id.au>
pinctrl: aspeed: Fix confusing types in return value
Guodong Liu <Guodong.Liu(a)mediatek.com>
pinctrl: mediatek: Fix the drive register definition of some Pins
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Return -ENOMEM on memory allocation failure
Liu Shixin <liushixin2(a)huawei.com>
riscv: stacktrace: Fix missing the first frame
Dan Carpenter <error27(a)gmail.com>
ALSA: pci: lx6464es: fix a debug loop
Hangbin Liu <liuhangbin(a)gmail.com>
selftests: forwarding: lib: quote the sysctl values
Pietro Borrello <borrello(a)diag.uniroma1.it>
rds: rds_rm_zerocopy_callback() use list_first_entry()
Sasha Neftin <sasha.neftin(a)intel.com>
igc: Add ndo_tx_timeout support
Shay Drory <shayd(a)nvidia.com>
net/mlx5: Serialize module cleanup with reload and remove
Shay Drory <shayd(a)nvidia.com>
net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
Shay Drory <shayd(a)nvidia.com>
net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
Dragos Tatulea <dtatulea(a)nvidia.com>
net/mlx5e: IPoIB, Show unknown speed instead of error
Vlad Buslov <vladbu(a)nvidia.com>
net/mlx5: Bridge, fix ageing of peer FDB entries
Adham Faris <afaris(a)nvidia.com>
net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
Maxim Mikityanskiy <maximmi(a)nvidia.com>
net/mlx5e: Introduce the mlx5e_flush_rq function
Maxim Mikityanskiy <maximmi(a)nvidia.com>
net/mlx5e: Move repeating clear_bit in mlx5e_rx_reporter_err_rq_cqe_recover
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q"
Vladimir Oltean <vladimir.oltean(a)nxp.com>
net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware
Anirudh Venkataramanan <anirudh.venkataramanan(a)intel.com>
ice: Do not use WQ_MEM_RECLAIM flag for workqueue
Herton R. Krzesinski <herton(a)redhat.com>
uapi: add missing ip/ipv6 header dependencies for linux/stddef.h
Neel Patel <neel.patel(a)amd.com>
ionic: clean interrupt before enabling queue to avoid credit race
Heiner Kallweit <hkallweit1(a)gmail.com>
net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY
Qi Zheng <zhengqi.arch(a)bytedance.com>
bonding: fix error checking in bond_debug_reregister()
Clément Léger <clement.leger(a)bootlin.com>
net: phylink: move phy_device_free() to correctly release phy device
Christian Hopps <chopps(a)chopps.org>
xfrm: fix bug with DSCP copy to v6 from v4 tunnel
Yang Yingliang <yangyingliang(a)huawei.com>
RDMA/usnic: use iommu_map_atomic() under spin_lock()
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
RDMA/irdma: Fix potential NULL-ptr-dereference
Dragos Tatulea <dtatulea(a)nvidia.com>
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
Eric Dumazet <edumazet(a)google.com>
xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
Dean Luick <dean.luick(a)cornelisnetworks.com>
IB/hfi1: Restore allocated resources on failed copyout
Anastasia Belova <abelova(a)astralinux.ru>
xfrm: compat: change expression for switch in xfrm_xlate64
Devid Antonio Filoni <devid.filoni(a)egluetechnologies.com>
can: j1939: do not wait 250 ms if the same addr was already claimed
Mark Brown <broonie(a)kernel.org>
of/address: Return an error when no valid dma-ranges are found
Shiju Jose <shiju.jose(a)huawei.com>
tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
Elvis Angelaccio <elvis.angelaccio(a)kde.org>
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Elitebook, 645 G9
Guillaume Pinot <texitoi(a)texitoi.eu>
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
Artemii Karasev <karasev(a)ispras.ru>
ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
ALSA: hda/realtek: Add Positivo N14KP6-TG
Alexander Potapenko <glider(a)google.com>
btrfs: zlib: zero-initialize zlib workspace
Josef Bacik <josef(a)toxicpanda.com>
btrfs: limit device extents to the device size
Mike Kravetz <mike.kravetz(a)oracle.com>
migrate: hugetlb: check for hugetlb shared PMD in node migration
Miaohe Lin <linmiaohe(a)huawei.com>
mm/migration: return errno when isolate_huge_page failed
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix registration vs use race
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix cleanup after dev_set_name()
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nvmem: core: add error handling for dev_set_name
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 4 +-
arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi | 6 +-
arch/arm64/boot/dts/amlogic/meson-gx.dtsi | 6 +-
arch/powerpc/kernel/interrupt.c | 6 +-
arch/riscv/kernel/stacktrace.c | 3 +-
arch/riscv/mm/cacheflush.c | 4 +-
drivers/clk/ingenic/jz4760-cgu.c | 18 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +-
drivers/gpu/drm/i915/display/intel_bios.c | 33 +++++---
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 +-
drivers/infiniband/hw/hfi1/file_ops.c | 7 +-
drivers/infiniband/hw/irdma/cm.c | 3 +
drivers/infiniband/hw/usnic/usnic_uiom.c | 8 +-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 ++
drivers/net/bonding/bond_debugfs.c | 2 +-
drivers/net/dsa/mt7530.c | 26 ++++--
drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_main.c | 25 +++++-
.../ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 3 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +-
.../ethernet/mellanox/mlx5/core/en/rep/bridge.c | 4 -
.../ethernet/mellanox/mlx5/core/en/reporter_rx.c | 30 +------
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 98 ++++++++--------------
.../net/ethernet/mellanox/mlx5/core/esw/bridge.c | 2 +-
.../ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 13 ++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 14 ++--
drivers/net/ethernet/mscc/ocelot_flower.c | 24 +++---
drivers/net/ethernet/pensando/ionic/ionic_lif.c | 15 +++-
drivers/net/phy/meson-gxl.c | 2 +
drivers/net/phy/phylink.c | 5 +-
drivers/net/usb/plusb.c | 4 +-
drivers/nvmem/core.c | 41 ++++-----
drivers/of/address.c | 21 +++--
drivers/pinctrl/aspeed/pinctrl-aspeed.c | 2 +-
drivers/pinctrl/intel/pinctrl-intel.c | 16 +++-
drivers/pinctrl/mediatek/pinctrl-mt8195.c | 4 +-
drivers/pinctrl/pinctrl-single.c | 2 +
drivers/spi/spi-dw-core.c | 2 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/typec/altmodes/displayport.c | 8 +-
fs/btrfs/volumes.c | 22 ++++-
fs/btrfs/zlib.c | 2 +-
fs/ceph/mds_client.c | 6 ++
fs/cifs/file.c | 4 +-
include/linux/hugetlb.h | 6 +-
include/uapi/linux/ip.h | 1 +
include/uapi/linux/ipv6.h | 1 +
kernel/locking/rtmutex.c | 5 +-
kernel/trace/trace.c | 3 -
mm/gup.c | 2 +-
mm/hugetlb.c | 11 ++-
mm/memory-failure.c | 2 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 5 +-
mm/migrate.c | 7 +-
mm/page_alloc.c | 5 +-
net/can/j1939/address-claim.c | 40 +++++++++
net/mptcp/subflow.c | 10 ++-
net/rds/message.c | 6 +-
net/xfrm/xfrm_compat.c | 4 +-
net/xfrm/xfrm_input.c | 3 +-
sound/pci/hda/patch_realtek.c | 3 +
sound/pci/lx6464es/lx_core.c | 11 ++-
sound/soc/soc-topology.c | 8 +-
sound/synth/emux/emux_nrpn.c | 3 +
tools/testing/selftests/net/forwarding/lib.sh | 4 +-
67 files changed, 398 insertions(+), 268 deletions(-)
changes since v2:
- The code was totally rewritten based on the discussion of the
v2 patch.
- the ssid is set in __cfg80211_connect_result() and only if the ssid is
not already set.
- Do not add another ssid reset path since it is already done in
__cfg80211_disconnected()
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
Reported-by: Yohan Prod'homme <kernel(a)zoddo.fr>
Fixes: 7b0a0e3c3a88260b6fcb017e49f198463aa62ed1
Cc: linux-wireless(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand(a)systemb.ch>
---
net/wireless/sme.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 4b5b6ee0fe01..629d7b5f65c1 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -723,6 +723,7 @@ void __cfg80211_connect_result(struct net_device *dev,
bool wextev)
{
struct wireless_dev *wdev = dev->ieee80211_ptr;
+ const struct element *ssid;
const struct element *country_elem = NULL;
const u8 *country_data;
u8 country_datalen;
@@ -883,6 +884,21 @@ void __cfg80211_connect_result(struct net_device *dev,
country_data, country_datalen);
kfree(country_data);
+ if (wdev->u.client.ssid_len == 0) {
+ rcu_read_lock();
+ for_each_valid_link(cr, link) {
+ ssid = ieee80211_bss_get_elem(cr->links[link].bss,
+ WLAN_EID_SSID);
+
+ if (ssid->datalen == 0)
+ continue;
+
+ memcpy(wdev->u.client.ssid, ssid->data, ssid->datalen);
+ wdev->u.client.ssid_len = ssid->datalen;
+ }
+ rcu_read_unlock();
+ }
+
return;
out:
for_each_valid_link(cr, link)
--
2.39.1
Add the MST topology for a CRTC to the atomic state if the driver
needs to force a modeset on the CRTC after the encoder compute config
functions are called.
Later the MST encoder's disable hook also adds the state, but that isn't
guaranteed to work (since in that hook getting the state may fail, which
can't be handled there). This should fix that, while a later patch fixes
the use of the MST state in the disable hook.
v2: Add missing forward struct declarations, caught by hdrtest.
v3: Factor out intel_dp_mst_add_topology_state_for_connector() used
later in the patchset.
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # 6.1
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com> # v2
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 4 ++
drivers/gpu/drm/i915/display/intel_dp_mst.c | 61 ++++++++++++++++++++
drivers/gpu/drm/i915/display/intel_dp_mst.h | 4 ++
3 files changed, 69 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 166662ade593c..38106cf63b3b9 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -5936,6 +5936,10 @@ int intel_modeset_all_pipes(struct intel_atomic_state *state,
if (ret)
return ret;
+ ret = intel_dp_mst_add_topology_state_for_crtc(state, crtc);
+ if (ret)
+ return ret;
+
ret = intel_atomic_add_affected_planes(state, crtc);
if (ret)
return ret;
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 8b0e4defa3f10..f3cb12dcfe0a7 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1223,3 +1223,64 @@ bool intel_dp_mst_is_slave_trans(const struct intel_crtc_state *crtc_state)
return crtc_state->mst_master_transcoder != INVALID_TRANSCODER &&
crtc_state->mst_master_transcoder != crtc_state->cpu_transcoder;
}
+
+/**
+ * intel_dp_mst_add_topology_state_for_connector - add MST topology state for a connector
+ * @state: atomic state
+ * @connector: connector to add the state for
+ * @crtc: the CRTC @connector is attached to
+ *
+ * Add the MST topology state for @connector to @state.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int
+intel_dp_mst_add_topology_state_for_connector(struct intel_atomic_state *state,
+ struct intel_connector *connector,
+ struct intel_crtc *crtc)
+{
+ struct drm_dp_mst_topology_state *mst_state;
+
+ if (!connector->mst_port)
+ return 0;
+
+ mst_state = drm_atomic_get_mst_topology_state(&state->base,
+ &connector->mst_port->mst_mgr);
+ if (IS_ERR(mst_state))
+ return PTR_ERR(mst_state);
+
+ mst_state->pending_crtc_mask |= drm_crtc_mask(&crtc->base);
+
+ return 0;
+}
+
+/**
+ * intel_dp_mst_add_topology_state_for_crtc - add MST topology state for a CRTC
+ * @state: atomic state
+ * @crtc: CRTC to add the state for
+ *
+ * Add the MST topology state for @crtc to @state.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int intel_dp_mst_add_topology_state_for_crtc(struct intel_atomic_state *state,
+ struct intel_crtc *crtc)
+{
+ struct drm_connector *_connector;
+ struct drm_connector_state *conn_state;
+ int i;
+
+ for_each_new_connector_in_state(&state->base, _connector, conn_state, i) {
+ struct intel_connector *connector = to_intel_connector(_connector);
+ int ret;
+
+ if (conn_state->crtc != &crtc->base)
+ continue;
+
+ ret = intel_dp_mst_add_topology_state_for_connector(state, connector, crtc);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.h b/drivers/gpu/drm/i915/display/intel_dp_mst.h
index f7301de6cdfb3..f1815bb722672 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.h
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.h
@@ -8,6 +8,8 @@
#include <linux/types.h>
+struct intel_atomic_state;
+struct intel_crtc;
struct intel_crtc_state;
struct intel_digital_port;
struct intel_dp;
@@ -18,5 +20,7 @@ int intel_dp_mst_encoder_active_links(struct intel_digital_port *dig_port);
bool intel_dp_mst_is_master_trans(const struct intel_crtc_state *crtc_state);
bool intel_dp_mst_is_slave_trans(const struct intel_crtc_state *crtc_state);
bool intel_dp_mst_source_support(struct intel_dp *intel_dp);
+int intel_dp_mst_add_topology_state_for_crtc(struct intel_atomic_state *state,
+ struct intel_crtc *crtc);
#endif /* __INTEL_DP_MST_H__ */
--
2.37.1
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: d125d1349abeb46945dc5e98f7824bf688266f13
Gitweb: https://git.kernel.org/tip/d125d1349abeb46945dc5e98f7824bf688266f13
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Thu, 09 Feb 2023 23:25:49 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 14 Feb 2023 11:18:35 +01:00
alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported an RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitly, but it sets up the timer's
signal with SIGCONT. That has the same effect as explicitly setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artificially delaying it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettime(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
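The workaround can be modelled in user space roughly as follows. This is
a minimal sketch and not the kernel code: the model_ names, the plain
int64_t nanosecond time type and the explicit hz parameter are assumptions
made for illustration. It mirrors the overrun arithmetic of alarm_forward()
and shows how shifting "now" out by one jiffie pushes the next expiry at
least a jiffie into the future while the overrun count stays consistent
with the new expiry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for struct alarm: only the expiry time, in ns. */
struct model_alarm {
	int64_t expires;
};

/*
 * Advance @a past @now in steps of @interval and return the number of
 * intervals that were skipped over (the overrun count).
 */
uint64_t model_alarm_forward(struct model_alarm *a, int64_t now,
			     int64_t interval)
{
	uint64_t overrun = 1;
	int64_t delta = now - a->expires;

	assert(interval > 0);

	if (delta < 0)
		return 0;	/* not expired yet, nothing to forward */

	if (delta >= interval) {
		overrun = (uint64_t)(delta / interval);
		a->expires += interval * (int64_t)overrun;
		if (a->expires > now)
			return overrun;
		overrun++;	/* account for the final step below */
	}

	a->expires += interval;
	return overrun;
}

/*
 * Forward relative to the current time. With @throttle set and an
 * interval shorter than one jiffie, pretend that "now" is one jiffie
 * later so that the timer cannot be rearmed back to back.
 */
uint64_t model_alarm_forward_now(struct model_alarm *a, int64_t now,
				 int64_t interval, bool throttle, int64_t hz)
{
	assert(hz > 0);

	if (throttle) {
		int64_t kj = MODEL_NSEC_PER_SEC / hz;	/* one jiffie in ns */

		if (interval < kj)
			now += kj;
	}

	return model_alarm_forward(a, now, interval);
}
```

With expires = 0, now = 100, interval = 9 and hz = 1000, the unthrottled
call returns 12 and leaves the expiry at 108 ns, while the throttled call
moves the expiry to 1000107 ns, i.e. more than a jiffie away.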
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
---
kernel/time/alarmtimer.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828..7e5dff6 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
73bdf65ea748 ("migrate: hugetlb: check for hugetlb shared PMD in node migration")
7ce82f4c3f3e ("mm/migration: return errno when isolate_huge_page failed")
1b7f7e58decc ("mm/gup: Convert check_and_migrate_movable_pages() to use a folio")
f9f38f78c5d5 ("mm: refactor check_and_migrate_movable_pages")
5ac95884a784 ("mm/migrate: enable returning precise migrate_pages() success count")
c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
5db4f15c4fd7 ("mm: memory: add orig_pmd to struct vm_fault")
8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()")
25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
f68749ec342b ("mm/gup: longterm pin migration cleanup")
d1e153fea2a8 ("mm/gup: migrate pinned pages out of movable zone")
1a08ae36cf8b ("mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN")
6e7f34ebb8d2 ("mm/gup: check for isolation errors")
f0f4463837da ("mm/gup: return an error on migration failure")
83c02c23d074 ("mm/gup: check every subpage of a compound page during isolation")
c991ffef7bce ("mm/gup: don't pin migrated cma pages in movable zone")
7ee820ee7238 ("Revert "mm: migrate: skip shared exec THP for NUMA balancing"")
ae37c7ff79f1 ("mm: make alloc_contig_range handle in-use hugetlb pages")
369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
c2ad7a1ffeaf ("mm,compaction: let isolate_migratepages_{range,block} return error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73bdf65ea74857d7fb2ec3067a3cec0e261b1462 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Thu, 26 Jan 2023 14:27:21 -0800
Subject: [PATCH] migrate: hugetlb: check for hugetlb shared PMD in node
migration
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node. page_mapcount
> 1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD. As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
consider the page shared.
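The resulting decision can be sketched as a small stand-alone predicate.
This is only an illustration under stated assumptions: the MODEL_MF_* flag
values and the function name are hypothetical, and the shared-PMD state is
passed in as a plain boolean instead of being derived from the page table.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch; not the kernel's MPOL_MF_*. */
#define MODEL_MF_MOVE		(1u << 0)
#define MODEL_MF_MOVE_ALL	(1u << 1)

/*
 * Return true if a hugetlb page may be queued for migration.
 * MOVE_ALL (which requires CAP_SYS_NICE) moves everything; plain MOVE
 * only moves pages that are not shared, where "shared" now also covers
 * a mapcount of 1 reached through a shared PMD.
 */
bool model_may_queue_hugepage(unsigned int flags, int mapcount,
			      bool pmd_shared)
{
	if (flags & MODEL_MF_MOVE_ALL)
		return true;

	return (flags & MODEL_MF_MOVE) && mapcount == 1 && !pmd_shared;
}
```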
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com
Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 02c8a712282f..f940395667c8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -600,7 +600,8 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
- (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+ (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 &&
+ !hugetlb_pmd_shared(pte))) {
if (isolate_hugetlb(page, qp->pagelist) &&
(flags & MPOL_MF_STRICT))
/*
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
It's unusual that we have enumeration by class in the middle of the table.
It might potentially be problematic in the future if we add another entry
after it.
So, move the class matching entry to be the last one in the ID table.
[ Upstream commit 0b85f59d30b91bd2b93ea7ef0816a4b7e7039e8c ]
Without this change, quirks set in driver_data added after the catch-all
are ignored.
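Why the position matters can be shown with a simplified first-match table
lookup. This is a sketch under assumptions, not the PCI core code: the
model_ types, MODEL_ANY_ID and the count-based table are made up for the
example. It only assumes that the table is scanned in order and the first
matching entry wins.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ANY_ID 0xffffffffu	/* hypothetical wildcard value */

struct model_id {
	uint32_t vendor, device;	/* MODEL_ANY_ID matches anything */
	uint32_t class, class_mask;	/* class_mask 0 matches any class */
	unsigned long driver_data;	/* quirk bits */
};

struct model_dev {
	uint32_t vendor, device, class;
};

static int model_match_one(const struct model_id *id,
			   const struct model_dev *dev)
{
	return (id->vendor == MODEL_ANY_ID || id->vendor == dev->vendor) &&
	       (id->device == MODEL_ANY_ID || id->device == dev->device) &&
	       !((id->class ^ dev->class) & id->class_mask);
}

/* Return the first entry of @tbl that matches @dev, or NULL. */
const struct model_id *model_match(const struct model_id *tbl, size_t n,
				   const struct model_dev *dev)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (model_match_one(&tbl[i], dev))
			return &tbl[i];

	return NULL;
}
```

If a class-wide entry sits ahead of a device-specific one, the device
matches the class entry first and the quirks of the later entry are never
seen.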
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Gwendal Grignou <gwendal(a)chromium.org>
---
drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5d62d1042c0e6..a58711c488509 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3199,7 +3199,6 @@ static const struct pci_device_id nvme_id_table[] = {
NVME_QUIRK_IGNORE_DEV_SUBNQN, },
{ PCI_DEVICE(0x1c5c, 0x1504), /* SK Hynix PC400 */
.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
- { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
{ PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */
.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
@@ -3209,6 +3208,8 @@ static const struct pci_device_id nvme_id_table[] = {
.driver_data = NVME_QUIRK_SINGLE_VECTOR |
NVME_QUIRK_128_BYTES_SQES |
NVME_QUIRK_SHARED_TAGS },
+
+ { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
{ 0, }
};
MODULE_DEVICE_TABLE(pci, nvme_id_table);
--
2.39.1.519.gcb327c4b5f-goog
Mark the Tiger Lake UP{3,4} AHCI controller as "low_power". This enables
S0ix to work out of the box. Otherwise S0ix does not work unless the
user manually sets /sys/class/scsi_host/*/link_power_management_policy.
Intel lists a total of 4 SATA controller IDs in [1] for those mobile
PCHs. This commit adds only the "AHCI" variant since that is the only one
I tested.
[1]: https://cdrdv2.intel.com/v1/dl/getContent/631119
Signed-off-by: Simon Gaiser <simon(a)invisiblethingslab.com>
CC: stable(a)vger.kernel.org
---
As noted above this doesn't include the other PCI IDs listed by Intel
for those PCHs (RAID modes). Also the same is probably needed for newer
generations. But for both I don't have hardware to test handy right now,
so only included what I have actually tested.
Added stable to CC, since on systems using S0ix this prevents S0ix
residency and therefore leads to such high power consumption that
suspend is effectively broken.
drivers/ata/ahci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 14a1c0d14916..3bb9bb483fe3 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -421,6 +421,7 @@ static const struct pci_device_id ahci_pci_tbl[] = {
{ PCI_VDEVICE(INTEL, 0x34d3), board_ahci_low_power }, /* Ice Lake LP AHCI */
{ PCI_VDEVICE(INTEL, 0x02d3), board_ahci_low_power }, /* Comet Lake PCH-U AHCI */
{ PCI_VDEVICE(INTEL, 0x02d7), board_ahci_low_power }, /* Comet Lake PCH RAID */
+ { PCI_VDEVICE(INTEL, 0xa0d3), board_ahci_low_power }, /* Tiger Lake UP{3,4} AHCI */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
--
2.39.1
A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD. This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to
their page table.
page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps. Pages referenced via a shared PMD were
incorrectly being counted as private.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is
found count the hugetlb page as shared. A new helper to check for a
shared PMD is added.
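The accounting rule can be sketched in user space as below. This is an
illustration only: the model_ names are hypothetical and the reference
count of the page holding the PMD is passed in as a plain integer, whereas
the real helper derives it from the pte pointer.

```c
#include <stdbool.h>

/* Hypothetical subset of the smaps counters touched by this change. */
struct model_smaps {
	unsigned long shared_hugetlb;
	unsigned long private_hugetlb;
};

/* A PMD page referenced more than once is shared between processes. */
bool model_pmd_shared(int pmd_page_refcount)
{
	return pmd_page_refcount > 1;
}

/*
 * Account one hugetlb page of @size bytes. A mapcount of 1 is only
 * treated as private if the PMD page itself is not shared.
 */
void model_account_hugetlb(struct model_smaps *mss, int mapcount,
			   int pmd_page_refcount, unsigned long size)
{
	if (mapcount >= 2 || model_pmd_shared(pmd_page_refcount))
		mss->shared_hugetlb += size;
	else
		mss->private_hugetlb += size;
}
```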
Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
fs/proc/task_mmu.c | 10 ++++++++--
include/linux/hugetlb.h | 12 ++++++++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e35a0398db63..cb9539879402 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -749,8 +749,14 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
if (mapcount >= 2)
mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
- else
- mss->private_hugetlb += huge_page_size(hstate_vma(vma));
+ else {
+ if (hugetlb_pmd_shared(pte))
+ mss->shared_hugetlb +=
+ huge_page_size(hstate_vma(vma));
+ else
+ mss->private_hugetlb +=
+ huge_page_size(hstate_vma(vma));
+ }
}
return 0;
}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e3aa336df900..8e65920e4363 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1225,6 +1225,18 @@ static inline __init void hugetlb_cma_reserve(int order)
}
#endif
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+static inline bool hugetlb_pmd_shared(pte_t *pte)
+{
+ return page_count(virt_to_page(pte)) > 1;
+}
+#else
+static inline bool hugetlb_pmd_shared(pte_t *pte)
+{
+ return false;
+}
+#endif
+
bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
--
2.39.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
73bdf65ea748 ("migrate: hugetlb: check for hugetlb shared PMD in node migration")
7ce82f4c3f3e ("mm/migration: return errno when isolate_huge_page failed")
1b7f7e58decc ("mm/gup: Convert check_and_migrate_movable_pages() to use a folio")
f9f38f78c5d5 ("mm: refactor check_and_migrate_movable_pages")
5ac95884a784 ("mm/migrate: enable returning precise migrate_pages() success count")
c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
5db4f15c4fd7 ("mm: memory: add orig_pmd to struct vm_fault")
8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()")
25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
f68749ec342b ("mm/gup: longterm pin migration cleanup")
d1e153fea2a8 ("mm/gup: migrate pinned pages out of movable zone")
1a08ae36cf8b ("mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN")
6e7f34ebb8d2 ("mm/gup: check for isolation errors")
f0f4463837da ("mm/gup: return an error on migration failure")
83c02c23d074 ("mm/gup: check every subpage of a compound page during isolation")
c991ffef7bce ("mm/gup: don't pin migrated cma pages in movable zone")
7ee820ee7238 ("Revert "mm: migrate: skip shared exec THP for NUMA balancing"")
ae37c7ff79f1 ("mm: make alloc_contig_range handle in-use hugetlb pages")
369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
c2ad7a1ffeaf ("mm,compaction: let isolate_migratepages_{range,block} return error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73bdf65ea74857d7fb2ec3067a3cec0e261b1462 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Thu, 26 Jan 2023 14:27:21 -0800
Subject: [PATCH] migrate: hugetlb: check for hugetlb shared PMD in node
migration
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node. page_mapcount
> 1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD. As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com
Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 02c8a712282f..f940395667c8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -600,7 +600,8 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
- (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+ (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 &&
+ !hugetlb_pmd_shared(pte))) {
if (isolate_hugetlb(page, qp->pagelist) &&
(flags & MPOL_MF_STRICT))
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
73bdf65ea748 ("migrate: hugetlb: check for hugetlb shared PMD in node migration")
7ce82f4c3f3e ("mm/migration: return errno when isolate_huge_page failed")
1b7f7e58decc ("mm/gup: Convert check_and_migrate_movable_pages() to use a folio")
f9f38f78c5d5 ("mm: refactor check_and_migrate_movable_pages")
5ac95884a784 ("mm/migrate: enable returning precise migrate_pages() success count")
c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
5db4f15c4fd7 ("mm: memory: add orig_pmd to struct vm_fault")
8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()")
25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
f68749ec342b ("mm/gup: longterm pin migration cleanup")
d1e153fea2a8 ("mm/gup: migrate pinned pages out of movable zone")
1a08ae36cf8b ("mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN")
6e7f34ebb8d2 ("mm/gup: check for isolation errors")
f0f4463837da ("mm/gup: return an error on migration failure")
83c02c23d074 ("mm/gup: check every subpage of a compound page during isolation")
c991ffef7bce ("mm/gup: don't pin migrated cma pages in movable zone")
7ee820ee7238 ("Revert "mm: migrate: skip shared exec THP for NUMA balancing"")
ae37c7ff79f1 ("mm: make alloc_contig_range handle in-use hugetlb pages")
369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
c2ad7a1ffeaf ("mm,compaction: let isolate_migratepages_{range,block} return error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73bdf65ea74857d7fb2ec3067a3cec0e261b1462 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Thu, 26 Jan 2023 14:27:21 -0800
Subject: [PATCH] migrate: hugetlb: check for hugetlb shared PMD in node
migration
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to
move pages shared with another process to a different node. page_mapcount
> 1 is being used to determine if a hugetlb page is shared. However, a
hugetlb page will have a mapcount of 1 if mapped by multiple processes via
a shared PMD. As a result, hugetlb pages shared by multiple processes and
mapped with a shared PMD can be moved by a process without CAP_SYS_NICE.
To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found
consider the page shared.
Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com
Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 02c8a712282f..f940395667c8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -600,7 +600,8 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
- (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+ (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 &&
+ !hugetlb_pmd_shared(pte))) {
if (isolate_hugetlb(page, qp->pagelist) &&
(flags & MPOL_MF_STRICT))
/*
The global irq_domain_mutex is held when mapping interrupts from
non-hierarchical domains but currently not when disposing them.
This specifically means that updates of the domain mapcount are racy
(currently only used for statistics in debugfs).
Make sure to hold the global irq_domain_mutex also when disposing
mappings from non-hierarchical domains.
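The locking rule can be illustrated with a user-space model. This is a
sketch under assumptions, not the irqdomain code: it uses a pthread mutex
in place of irq_domain_mutex and a hypothetical model_domain struct that
carries only the map counter. The point is that both the increment and the
decrement of the counter happen under the same lock.

```c
#include <pthread.h>

/* Stand-in for the global irq_domain_mutex. */
static pthread_mutex_t model_domain_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical domain: only the statistics counter is modelled. */
struct model_domain {
	int mapcount;
};

/* Mapping path: already serialised by the global mutex. */
void model_associate(struct model_domain *d)
{
	pthread_mutex_lock(&model_domain_mutex);
	d->mapcount++;
	pthread_mutex_unlock(&model_domain_mutex);
}

/* Disposal path: now takes the same mutex around the update. */
void model_disassociate(struct model_domain *d)
{
	pthread_mutex_lock(&model_domain_mutex);
	d->mapcount--;
	pthread_mutex_unlock(&model_domain_mutex);
}
```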
Fixes: 9dc6be3d4193 ("genirq/irqdomain: Add map counter")
Cc: stable(a)vger.kernel.org # 4.13
Tested-by: Hsin-Yi Wang <hsinyi(a)chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
kernel/irq/irqdomain.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 561689a3f050..981cd636275e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -538,6 +538,9 @@ static void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
return;
hwirq = irq_data->hwirq;
+
+ mutex_lock(&irq_domain_mutex);
+
irq_set_status_flags(irq, IRQ_NOREQUEST);
/* remove chip and handler */
@@ -557,6 +560,8 @@ static void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
/* Clear reverse map for this hwirq */
irq_domain_clear_mapping(domain, hwirq);
+
+ mutex_unlock(&irq_domain_mutex);
}
static int irq_domain_associate_locked(struct irq_domain *domain, unsigned int virq,
--
2.39.1
Refactor __irq_domain_alloc_irqs() so that it can be called internally
while holding the irq_domain_mutex.
This will be used to fix a shared-interrupt mapping race, hence the
Fixes tag.
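The shape of the refactoring, a "_locked" helper plus a thin public
wrapper, can be sketched in user space as follows. Everything here is
hypothetical (pthread mutex, fixed-size map, model_ names); in particular
model_find_or_alloc() is only an assumed example of an internal caller
that needs lookup and allocation to happen under a single lock hold.

```c
#include <pthread.h>

#define MODEL_MAX_HWIRQ 16

static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static int model_map[MODEL_MAX_HWIRQ];	/* hwirq -> virq, 0 = unmapped */
static int model_next_virq = 1;

/* Allocate a virq for @hwirq. The caller must hold model_mutex. */
static int model_alloc_locked(int hwirq)
{
	model_map[hwirq] = model_next_virq++;
	return model_map[hwirq];
}

/* Public entry point: validates arguments, then locks around the helper. */
int model_alloc(int hwirq)
{
	int virq;

	if (hwirq < 0 || hwirq >= MODEL_MAX_HWIRQ)
		return -1;

	pthread_mutex_lock(&model_mutex);
	virq = model_alloc_locked(hwirq);
	pthread_mutex_unlock(&model_mutex);

	return virq;
}

/* Internal caller: lookup and allocation cannot be interleaved. */
int model_find_or_alloc(int hwirq)
{
	int virq;

	if (hwirq < 0 || hwirq >= MODEL_MAX_HWIRQ)
		return -1;

	pthread_mutex_lock(&model_mutex);
	virq = model_map[hwirq];
	if (!virq)
		virq = model_alloc_locked(hwirq);
	pthread_mutex_unlock(&model_mutex);

	return virq;
}
```

Keeping the argument checks in the wrapper and the real work in the locked
helper lets both entry points share one implementation without recursive
locking.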
Fixes: b62b2cf5759b ("irqdomain: Fix handling of type settings for existing mappings")
Cc: stable(a)vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi(a)chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
kernel/irq/irqdomain.c | 88 +++++++++++++++++++++++-------------------
1 file changed, 48 insertions(+), 40 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 3d6a14efae62..7b57949bc79c 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1441,40 +1441,12 @@ int irq_domain_alloc_irqs_hierarchy(struct irq_domain *domain,
return domain->ops->alloc(domain, irq_base, nr_irqs, arg);
}
-/**
- * __irq_domain_alloc_irqs - Allocate IRQs from domain
- * @domain: domain to allocate from
- * @irq_base: allocate specified IRQ number if irq_base >= 0
- * @nr_irqs: number of IRQs to allocate
- * @node: NUMA node id for memory allocation
- * @arg: domain specific argument
- * @realloc: IRQ descriptors have already been allocated if true
- * @affinity: Optional irq affinity mask for multiqueue devices
- *
- * Allocate IRQ numbers and initialized all data structures to support
- * hierarchy IRQ domains.
- * Parameter @realloc is mainly to support legacy IRQs.
- * Returns error code or allocated IRQ number
- *
- * The whole process to setup an IRQ has been split into two steps.
- * The first step, __irq_domain_alloc_irqs(), is to allocate IRQ
- * descriptor and required hardware resources. The second step,
- * irq_domain_activate_irq(), is to program the hardware with preallocated
- * resources. In this way, it's easier to rollback when failing to
- * allocate resources.
- */
-int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
- unsigned int nr_irqs, int node, void *arg,
- bool realloc, const struct irq_affinity_desc *affinity)
+static int irq_domain_alloc_irqs_locked(struct irq_domain *domain, int irq_base,
+ unsigned int nr_irqs, int node, void *arg,
+ bool realloc, const struct irq_affinity_desc *affinity)
{
int i, ret, virq;
- if (domain == NULL) {
- domain = irq_default_domain;
- if (WARN(!domain, "domain is NULL; cannot allocate IRQ\n"))
- return -EINVAL;
- }
-
if (realloc && irq_base >= 0) {
virq = irq_base;
} else {
@@ -1493,24 +1465,18 @@ int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
goto out_free_desc;
}
- mutex_lock(&irq_domain_mutex);
ret = irq_domain_alloc_irqs_hierarchy(domain, virq, nr_irqs, arg);
- if (ret < 0) {
- mutex_unlock(&irq_domain_mutex);
+ if (ret < 0)
goto out_free_irq_data;
- }
for (i = 0; i < nr_irqs; i++) {
ret = irq_domain_trim_hierarchy(virq + i);
- if (ret) {
- mutex_unlock(&irq_domain_mutex);
+ if (ret)
goto out_free_irq_data;
- }
}
-
+
for (i = 0; i < nr_irqs; i++)
irq_domain_insert_irq(virq + i);
- mutex_unlock(&irq_domain_mutex);
return virq;
@@ -1520,6 +1486,48 @@ int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
irq_free_descs(virq, nr_irqs);
return ret;
}
+
+/**
+ * __irq_domain_alloc_irqs - Allocate IRQs from domain
+ * @domain: domain to allocate from
+ * @irq_base: allocate specified IRQ number if irq_base >= 0
+ * @nr_irqs: number of IRQs to allocate
+ * @node: NUMA node id for memory allocation
+ * @arg: domain specific argument
+ * @realloc: IRQ descriptors have already been allocated if true
+ * @affinity: Optional irq affinity mask for multiqueue devices
+ *
+ * Allocate IRQ numbers and initialized all data structures to support
+ * hierarchy IRQ domains.
+ * Parameter @realloc is mainly to support legacy IRQs.
+ * Returns error code or allocated IRQ number
+ *
+ * The whole process to setup an IRQ has been split into two steps.
+ * The first step, __irq_domain_alloc_irqs(), is to allocate IRQ
+ * descriptor and required hardware resources. The second step,
+ * irq_domain_activate_irq(), is to program the hardware with preallocated
+ * resources. In this way, it's easier to rollback when failing to
+ * allocate resources.
+ */
+int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
+ unsigned int nr_irqs, int node, void *arg,
+ bool realloc, const struct irq_affinity_desc *affinity)
+{
+ int ret;
+
+ if (domain == NULL) {
+ domain = irq_default_domain;
+ if (WARN(!domain, "domain is NULL; cannot allocate IRQ\n"))
+ return -EINVAL;
+ }
+
+ mutex_lock(&irq_domain_mutex);
+ ret = irq_domain_alloc_irqs_locked(domain, irq_base, nr_irqs, node, arg,
+ realloc, affinity);
+ mutex_unlock(&irq_domain_mutex);
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(__irq_domain_alloc_irqs);
/* The irq_data was moved, fix the revmap to refer to the new location */
--
2.39.1
In case a newly allocated IRQ ever ends up not having any associated
struct irq_data it would not even be possible to dispose the mapping.
Replace the bogus disposal with a WARN_ON().
This will also be used to fix a shared-interrupt mapping race, hence the
CC-stable tag.
Fixes: 1e2a7d78499e ("irqdomain: Don't set type when mapping an IRQ")
Cc: stable(a)vger.kernel.org # 4.8
Tested-by: Hsin-Yi Wang <hsinyi(a)chromium.org>
Tested-by: Mark-PK Tsai <mark-pk.tsai(a)mediatek.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
kernel/irq/irqdomain.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 981cd636275e..b4326c364ae7 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -847,13 +847,8 @@ unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec)
}
irq_data = irq_get_irq_data(virq);
- if (!irq_data) {
- if (irq_domain_is_hierarchy(domain))
- irq_domain_free_irqs(virq, 1);
- else
- irq_dispose_mapping(virq);
+ if (WARN_ON(!irq_data))
return 0;
- }
/* Store trigger type */
irqd_set_trigger_type(irq_data, type);
--
2.39.1
From: Marc Zyngier <maz(a)kernel.org>
Hierarchical domains created using irq_domain_create_hierarchy() are
currently added to the domain list before having been fully initialised.
This specifically means that a racing allocation request might fail to
allocate irq data for the inner domains of a hierarchy in case the
parent domain pointer has not yet been set up.
Note that this is not really any issue for irqchip drivers that are
registered early (e.g. via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE())
but could potentially cause trouble with drivers that are registered
later (e.g. modular drivers using IRQCHIP_PLATFORM_DRIVER_BEGIN(),
gpiochip drivers, etc.).
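The create-then-publish split can be modelled in user space like this. It
is a minimal sketch, not the kernel implementation: the list, the pthread
mutex and all model_ names are assumptions. What it shows is that the
parent pointer is set before the domain becomes reachable through the
global list, so a concurrent lookup never sees a half-initialised domain.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical domain: just the parent pointer and the list linkage. */
struct model_domain {
	struct model_domain *parent;
	struct model_domain *next;
};

static pthread_mutex_t model_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct model_domain *model_domain_list;

/* Allocate and initialise a domain without making it visible. */
struct model_domain *model_domain_create(void)
{
	return calloc(1, sizeof(struct model_domain));
}

/* Make a fully initialised domain visible to lookups. */
void model_domain_publish(struct model_domain *d)
{
	pthread_mutex_lock(&model_list_mutex);
	d->next = model_domain_list;
	model_domain_list = d;
	pthread_mutex_unlock(&model_list_mutex);
}

/* Create a child domain: finish the setup first, publish last. */
struct model_domain *model_create_hierarchy(struct model_domain *parent)
{
	struct model_domain *d = model_domain_create();

	if (d) {
		d->parent = parent;
		model_domain_publish(d);
	}

	return d;
}

/* Return true if @d can be found on the global list. */
bool model_domain_is_published(const struct model_domain *d)
{
	const struct model_domain *it;
	bool found = false;

	pthread_mutex_lock(&model_list_mutex);
	for (it = model_domain_list; it; it = it->next) {
		if (it == d) {
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&model_list_mutex);

	return found;
}
```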
Fixes: afb7da83b9f4 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()")
Cc: stable(a)vger.kernel.org # 3.19
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
[ johan: add commit message ]
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
kernel/irq/irqdomain.c | 62 +++++++++++++++++++++++++++++-------------
1 file changed, 43 insertions(+), 19 deletions(-)
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index bfda4adc05c0..8e14805c5508 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -126,23 +126,12 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
}
EXPORT_SYMBOL_GPL(irq_domain_free_fwnode);
-/**
- * __irq_domain_add() - Allocate a new irq_domain data structure
- * @fwnode: firmware node for the interrupt controller
- * @size: Size of linear map; 0 for radix mapping only
- * @hwirq_max: Maximum number of interrupts supported by controller
- * @direct_max: Maximum value of direct maps; Use ~0 for no limit; 0 for no
- * direct mapping
- * @ops: domain callbacks
- * @host_data: Controller private data pointer
- *
- * Allocates and initializes an irq_domain structure.
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, unsigned int size,
- irq_hw_number_t hwirq_max, int direct_max,
- const struct irq_domain_ops *ops,
- void *host_data)
+static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
+ unsigned int size,
+ irq_hw_number_t hwirq_max,
+ int direct_max,
+ const struct irq_domain_ops *ops,
+ void *host_data)
{
struct irqchip_fwid *fwid;
struct irq_domain *domain;
@@ -230,12 +219,44 @@ struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, unsigned int s
irq_domain_check_hierarchy(domain);
+ return domain;
+}
+
+static void __irq_domain_publish(struct irq_domain *domain)
+{
mutex_lock(&irq_domain_mutex);
debugfs_add_domain_dir(domain);
list_add(&domain->link, &irq_domain_list);
mutex_unlock(&irq_domain_mutex);
pr_debug("Added domain %s\n", domain->name);
+}
+
+/**
+ * __irq_domain_add() - Allocate a new irq_domain data structure
+ * @fwnode: firmware node for the interrupt controller
+ * @size: Size of linear map; 0 for radix mapping only
+ * @hwirq_max: Maximum number of interrupts supported by controller
+ * @direct_max: Maximum value of direct maps; Use ~0 for no limit; 0 for no
+ * direct mapping
+ * @ops: domain callbacks
+ * @host_data: Controller private data pointer
+ *
+ * Allocates and initializes an irq_domain structure.
+ * Returns pointer to IRQ domain, or NULL on failure.
+ */
+struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, unsigned int size,
+ irq_hw_number_t hwirq_max, int direct_max,
+ const struct irq_domain_ops *ops,
+ void *host_data)
+{
+ struct irq_domain *domain;
+
+ domain = __irq_domain_create(fwnode, size, hwirq_max, direct_max,
+ ops, host_data);
+ if (domain)
+ __irq_domain_publish(domain);
+
return domain;
}
EXPORT_SYMBOL_GPL(__irq_domain_add);
@@ -1138,12 +1159,15 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
struct irq_domain *domain;
if (size)
- domain = irq_domain_create_linear(fwnode, size, ops, host_data);
+ domain = __irq_domain_create(fwnode, size, size, 0, ops, host_data);
else
- domain = irq_domain_create_tree(fwnode, ops, host_data);
+ domain = __irq_domain_create(fwnode, 0, ~0, 0, ops, host_data);
+
if (domain) {
domain->parent = parent;
domain->flags |= flags;
+
+ __irq_domain_publish(domain);
}
return domain;
--
2.39.1
changes since v1:
- add some information
- test it on wireless-2023-01-18 tag
- no real code change
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it when an NL80211_CMD_AUTHENTICATE is issued.
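The intended bookkeeping can be sketched in user space as follows. This is
an illustration under assumptions: the model_ names are hypothetical, the
"leaving" reason is reduced to a boolean, and the explicit length check is
added for the sketch because it has no attribute validation in front of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_SSID_LEN 32	/* an SSID is at most 32 bytes long */

/* Hypothetical stand-in for the client part of the wireless_dev. */
struct model_client {
	unsigned char ssid[MODEL_MAX_SSID_LEN];
	size_t ssid_len;
};

/* Remember the SSID as soon as authentication is requested. */
int model_on_authenticate(struct model_client *c,
			  const unsigned char *ssid, size_t ssid_len)
{
	if (ssid_len > MODEL_MAX_SSID_LEN)
		return -1;

	memcpy(c->ssid, ssid, ssid_len);
	c->ssid_len = ssid_len;
	return 0;
}

/* Forget the SSID only when the station is actually leaving. */
void model_on_deauthenticate(struct model_client *c, bool leaving)
{
	if (leaving)
		c->ssid_len = 0;
}
```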
It may be needed to test this on some additional hardware (tested with
iwlwifi and a AX201, and iwd on the userspace side), I could not test
things like roaming and p2p.
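A minimal userspace sketch of the idea described above, not the kernel code
itself: record the SSID when authentication is requested, and forget it when
the station deauthenticates because it is leaving. The names client_state,
on_authenticate and on_deauthenticate are hypothetical stand-ins for the
wireless_dev client fields and the nl80211 handlers touched by the patch. The
length check is an addition for the sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SSID_LEN 32          /* IEEE 802.11 SSID length limit */
#define REASON_DEAUTH_LEAVING 3  /* mirrors WLAN_REASON_DEAUTH_LEAVING */

/* Hypothetical stand-in for the client part of struct wireless_dev. */
struct client_state {
	unsigned char ssid[MAX_SSID_LEN];
	size_t ssid_len;
};

/*
 * Cache the SSID at authentication time so that later queries can
 * report it even if the connect path was never used.
 * Returns 0 on success, -1 if the SSID does not fit.
 */
static int on_authenticate(struct client_state *c,
			   const unsigned char *ssid, size_t ssid_len)
{
	assert(c != NULL);

	if (ssid_len > MAX_SSID_LEN)
		return -1;

	memcpy(c->ssid, ssid, ssid_len);
	c->ssid_len = ssid_len;
	return 0;
}

/* Drop the cached SSID only when the station is deliberately leaving. */
static void on_deauthenticate(struct client_state *c, int reason_code)
{
	assert(c != NULL);

	if (reason_code == REASON_DEAUTH_LEAVING)
		c->ssid_len = 0;
}
```

In this model a deauthentication with any other reason code leaves the cached
SSID in place, which matches the condition used in the diff below.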
Alternatives:
1. Do the same but during association and not authentication.
2. Use ieee80211_bss_get_elem in nl80211_send_iface. This would report
the right ssid to userspace, but it would not fix the root cause.
This was also the behavior prior to 7b0a0e3c3a882, when the bug was
introduced.
This applies to v6.2-rc8 or wireless-2023-01-18.
The last Linux version known to be unaffected is 5.19, and the bug was
backported to the 5.19.y releases.
Reported-by: Yohan Prod'homme <kernel(a)zoddo.fr>
Fixes: 7b0a0e3c3a88260b6fcb017e49f198463aa62ed1
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand(a)systemb.ch>
---
net/wireless/nl80211.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 33a82ecab9d5..f1627ea542b9 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -10552,6 +10552,10 @@ static int nl80211_authenticate(struct sk_buff *skb, struct genl_info *info)
return -ENOENT;
wdev_lock(dev->ieee80211_ptr);
+
+ memcpy(dev->ieee80211_ptr->u.client.ssid, ssid, ssid_len);
+ dev->ieee80211_ptr->u.client.ssid_len = ssid_len;
+
err = cfg80211_mlme_auth(rdev, dev, &req);
wdev_unlock(dev->ieee80211_ptr);
@@ -11025,6 +11029,11 @@ static int nl80211_deauthenticate(struct sk_buff *skb, struct genl_info *info)
local_state_change = !!info->attrs[NL80211_ATTR_LOCAL_STATE_CHANGE];
wdev_lock(dev->ieee80211_ptr);
+
+ if (reason_code == WLAN_REASON_DEAUTH_LEAVING) {
+ dev->ieee80211_ptr->u.client.ssid_len = 0;
+ }
+
err = cfg80211_mlme_deauth(rdev, dev, bssid, ie, ie_len, reason_code,
local_state_change);
wdev_unlock(dev->ieee80211_ptr);
--
2.39.1
A number of Cezanne systems report IRQ1 as a wakeup source when it's not
actually a wakeup. This can cause problems for certain ACPI events. The
following upstream commit fixed it:
commit 8e60615e8932 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for
RN/CZN")
It was reported that this fix actually helped here with older kernels too:
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1921#note_1770257
So backport this fix to 5.15.y as well. The backport is done by hand
because the driver has changed significantly. The backport also
requires the SMU version reading function, which was introduced in:
commit f6045de1f532 ("platform/x86: amd-pmc: Export Idlemask values based
on the APU")
So backport that part of that commit as well.
Mario Limonciello (1):
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
drivers/platform/x86/amd-pmc.c | 59 ++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
--
2.34.1