The quilt patch titled
Subject: kunit/overflow: fix UB in overflow_allocation_test
has been removed from the -mm tree. Its filename was
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Subject: kunit/overflow: fix UB in overflow_allocation_test
Date: Thu, 15 Aug 2024 01:04:31 +0100
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used
as a driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN
enabled due to undefined behaviour.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Link: https://lkml.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Tested-by: Erhard Furtner <erhard_f(a)mailbox.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/overflow_kunit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/overflow_kunit.c~kunit-overflow-fix-ub-in-overflow_allocation_test
+++ a/lib/overflow_kunit.c
@@ -668,7 +668,6 @@ DEFINE_TEST_ALLOC(devm_kzalloc, devm_kf
static void overflow_allocation_test(struct kunit *test)
{
- const char device_name[] = "overflow-test";
struct device *dev;
int count = 0;
@@ -678,7 +677,7 @@ static void overflow_allocation_test(str
} while (0)
/* Create dummy device for devm_kmalloc()-family tests. */
- dev = kunit_device_register(test, device_name);
+ dev = kunit_device_register(test, "overflow-test");
KUNIT_ASSERT_FALSE_MSG(test, IS_ERR(dev),
"Cannot register test device\n");
_
Patches currently in -mm which might be from ivan.orlov0322(a)gmail.com are
The check_unaligned_access_emulated() function should have been called
during CPU hotplug to ensure that if all CPUs had emulated unaligned
accesses, the new CPU also does.
This patch adds the call to check_unaligned_access_emulated() in
the hotplug path.
Fixes: 55e0bf49a0d0 ("RISC-V: Probe misaligned access speed in parallel")
Signed-off-by: Jesse Taube <jesse(a)rivosinc.com>
Reviewed-by: Evan Green <evan(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
---
V5 -> V6:
- New patch
V6 -> V7:
- No changes
V7 -> V8:
- Rebase onto fixes
---
arch/riscv/kernel/unaligned_access_speed.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 160628a2116d..f3508cc54f91 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -191,6 +191,7 @@ static int riscv_online_cpu(unsigned int cpu)
if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
goto exit;
+ check_unaligned_access_emulated(NULL);
buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
if (!buf) {
pr_warn("Allocation failure, not measuring misaligned performance\n");
--
2.45.2
Hello Igor, hello Niklas,
on my NAS I am encountering the following issue since v6.6.44 (LTS),
when executing the hdparm command for my WD-WCC7K4NLX884 drives to get
the active or standby state:
$ hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
drive state is: unknown
While the expected output is the following:
$ hdparm -C /dev/sda
/dev/sda:
drive state is: active/idle
I did a bisection within the stable series and found the following
commit to be the first bad one:
28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
According to kernel.dance the same commit was also backported to the
v6.10.3 and v6.1.103 stable kernels and I could not find any commit or
pending patch with a "Fixes:" tag for the offending commit.
So far I have not been able to test with the mainline kernel as this is
a remote device which I couldn't rescue in case of a boot failure. Also
just for transparency it does have the out of tree ZFS module loaded,
but AFAIU this shouldn't be an issue here, as the commit seems clearly
related to the error. If needed I can test with an untainted mainline
kernel on Friday when I'm near the device.
I have attached the output of hdparm -I below and would be happy to
provide further debug information or test patches.
Cheers,
Christian
---
#regzbot introduced: 28ab9769117c
#regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives
---
$ pacman -Q hdparm
hdparm 9.65-2
$ hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: WDC WD40EFRX-68N32N0
Serial Number: WD-WCC7K4NLX884
Firmware Revision: 82.00A82
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x006d)
Supported: 10 9 8 7 6 5
Likely used: 10
Configuration:
Logical max current
cylinders 16383 0
heads 16 0
sectors/track 63 0
--
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 7814037168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 3815447 MBytes
device size with M = 1000*1000: 4000787 MBytes (4000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Idle-Unload when NCQ is active
* NCQ priority information
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
* Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
* DOWNLOAD MICROCODE DMA command
* WRITE BUFFER DMA command
* READ BUFFER DMA command
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
504min for SECURITY ERASE UNIT. 504min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2647735a1
NAA : 5
IEEE OUI : 0014ee
Unique ID : 2647735a1
Checksum: correct
Coherent surfaces make only sense if the host renders to them using
accelerated apis. Without 3d the entire content of dumb buffers stays
in the guest making all of the extra work they're doing to synchronize
between guest and host useless.
Configurations without 3d also tend to run with very low graphics
memory limits. The pinned console fb, mob cursors and graphical login
manager tend to run out of 16MB graphics memory that those guests use.
Fix it by making sure the coherent dumb buffers are only used on
configs with 3d enabled.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Reported-by: Christian Heusel <christian(a)heusel.eu>
Closes: https://lore.kernel.org/all/0d0330f3-2ac0-4cd5-8075-7f1cbaf72a8e@heusel.eu
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.9+
Cc: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Cc: Martin Krastev <martin.krastev(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
index 8ae6a761c900..1625b30d9970 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
@@ -2283,9 +2283,11 @@ int vmw_dumb_create(struct drm_file *file_priv,
/*
* Without mob support we're just going to use raw memory buffer
* because we wouldn't be able to support full surface coherency
- * without mobs
+ * without mobs. There also no reason to support surface coherency
+ * without 3d (i.e. gpu usage on the host) because then all the
+ * contents is going to be rendered guest side.
*/
- if (!dev_priv->has_mob) {
+ if (!dev_priv->has_mob || !vmw_supports_3d(dev_priv)) {
int cpp = DIV_ROUND_UP(args->bpp, 8);
switch (cpp) {
--
2.43.0
In aperture_remove_conflicting_pci_devices(), match the pci
device determine whether or not to call sysfb_disable(). This
fixes cases where the pimary device is not VGA compatible which
leads to the following problem:
1. A PCI device with a non-VGA class is the the boot display
2. That device is probed first and it is not a vga device so
sysfb_disable() is not called, but the device resources
are freed by aperture_detach_platform_device()
3. Non-primary GPU is vga device and it ends up calling sysfb_disable()
4. NULL pointer dereference via sysfb_disable() since the resources
have already been freed by aperture_detach_platform_device() when
it was called by the other device.
Fix this by calling sysfb_disable() on the device associated with it.
Fixes: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/video/aperture.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index 561be8feca96..56a5a0bc2b1a 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -6,6 +6,7 @@
#include <linux/mutex.h>
#include <linux/pci.h>
#include <linux/platform_device.h>
+#include <linux/screen_info.h>
#include <linux/slab.h>
#include <linux/sysfb.h>
#include <linux/types.h>
@@ -346,6 +347,7 @@ EXPORT_SYMBOL(__aperture_remove_legacy_vga_devices);
*/
int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name)
{
+ struct screen_info *si = &screen_info;
bool primary = false;
resource_size_t base, size;
int bar, ret = 0;
@@ -353,7 +355,7 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
if (pdev == vga_default_device())
primary = true;
- if (primary)
+ if (pdev == screen_info_pci_dev(si))
sysfb_disable();
for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) {
--
2.45.2
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
v2: Update doc strings
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.43.0
Some FSA4480-compatible chips like the OCP96011 used on Fairphone 5
return 0x00 from the CHIP_ID register. Handle that gracefully and only
fail probe when the I2C read has failed.
With this the dev_dbg will print 0 but otherwise continue working.
[ 0.251581] fsa4480 1-0042: Found FSA4480 v0.0 (Vendor ID = 0)
Cc: stable(a)vger.kernel.org
Fixes: e885f5f1f2b4 ("usb: typec: fsa4480: Check if the chip is really there")
Signed-off-by: Luca Weiss <luca.weiss(a)fairphone.com>
---
drivers/usb/typec/mux/fsa4480.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/typec/mux/fsa4480.c b/drivers/usb/typec/mux/fsa4480.c
index cd235339834b..f71dba8bf07c 100644
--- a/drivers/usb/typec/mux/fsa4480.c
+++ b/drivers/usb/typec/mux/fsa4480.c
@@ -274,7 +274,7 @@ static int fsa4480_probe(struct i2c_client *client)
return dev_err_probe(dev, PTR_ERR(fsa->regmap), "failed to initialize regmap\n");
ret = regmap_read(fsa->regmap, FSA4480_DEVICE_ID, &val);
- if (ret || !val)
+ if (ret)
return dev_err_probe(dev, -ENODEV, "FSA4480 not found\n");
dev_dbg(dev, "Found FSA4480 v%lu.%lu (Vendor ID = %lu)\n",
---
base-commit: ccdbf91fdf5a71881ef32b41797382c4edd6f670
change-id: 20240818-fsa4480-chipid-fix-2c7cf5810135
Best regards,
--
Luca Weiss <luca.weiss(a)fairphone.com>
Failing to read out rawclk makes it impossible to read out backlight,
which results in backlight not working when the backlight is off during
boot, or when reloading the module.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/display/xe_display.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index ca4468c820788..65d7a5040270e 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -157,6 +157,9 @@ int xe_display_init_noirq(struct xe_device *xe)
intel_display_device_info_runtime_init(xe);
+ RUNTIME_INFO(xe)->rawclk_freq = intel_read_rawclk(xe);
+ drm_dbg(&xe->drm, "rawclk rate: %d kHz\n", RUNTIME_INFO(xe)->rawclk_freq);
+
err = intel_display_driver_probe_noirq(xe);
if (err) {
intel_opregion_cleanup(xe);
--
2.45.2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081926-dumpling-ecard-9941@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081916-hastily-apostle-7025@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-sample-happening-317a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081910-grid-bobsled-6cb3@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081909-comment-hypocrisy-a802@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
58390c8ce1bd ("Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 3b5bbe798b2451820e74243b738268f51901e7d0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081908-dork-steadfast-6a68@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
3b5bbe798b24 ("pidfd: prevent creation of pidfds for kthreads")
83b290c9e3b5 ("pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together")
64bef697d33b ("pidfd: implement PIDFD_THREAD flag for pidfd_open()")
21e25205d7f9 ("pidfd: don't do_notify_pidfd() if !thread_group_empty()")
cdefbf2324ce ("pidfd: cleanup the usage of __pidfd_prepare's flags")
932562a6045e ("rseq: Split out rseq.h from sched.h")
cba6167f0adb ("restart_block: Trim includes")
f038cc1379c0 ("locking/seqlock: Split out seqlock_types.h")
53d31ba842d9 ("posix-cpu-timers: Split out posix-timers_types.h")
f995443f01b4 ("locking/seqlock: Simplify SEQCOUNT_LOCKNAME()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3b5bbe798b2451820e74243b738268f51901e7d0 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Wed, 31 Jul 2024 12:01:12 +0200
Subject: [PATCH] pidfd: prevent creation of pidfds for kthreads
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- bool thread = flags & PIDFD_THREAD;
-
- if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+ if (!pid)
return -EINVAL;
+ scoped_guard(rcu) {
+ struct task_struct *tsk;
+
+ if (flags & PIDFD_THREAD)
+ tsk = pid_task(pid, PIDTYPE_PID);
+ else
+ tsk = pid_task(pid, PIDTYPE_TGID);
+ if (!tsk)
+ return -EINVAL;
+
+ /* Don't create pidfds for kernel threads for now. */
+ if (tsk->flags & PF_KTHREAD)
+ return -EINVAL;
+ }
+
return __pidfd_prepare(pid, flags, ret);
}
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
if (clone_flags & CLONE_PIDFD) {
int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */
+ if (args->kthread) {
+ retval = -EINVAL;
+ goto bad_fork_free_pid;
+ }
+
/* Note that no task has been attached to @pid yet. */
retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 737222cebecbdbcdde2b69475c52bcb9ecfeb830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081937-granular-passably-9226@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
737222cebecb ("drm/amd/display: fix cursor offset on rotation 180")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737222cebecbdbcdde2b69475c52bcb9ecfeb830 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Tue, 31 Jan 2023 15:05:46 -0100
Subject: [PATCH] drm/amd/display: fix cursor offset on rotation 180
[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.
Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247
The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.c…
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.
Reported-by: Xaver Hugl <xaver.hugl(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index ff03b1d98aa7..1b9ac8812f5b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3589,7 +3589,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = temp_x + viewport_width;
+ pos_cpy.x = 2 * viewport_width - temp_x;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 737222cebecbdbcdde2b69475c52bcb9ecfeb830
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081932-diabetes-etching-e527@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
737222cebecb ("drm/amd/display: fix cursor offset on rotation 180")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 737222cebecbdbcdde2b69475c52bcb9ecfeb830 Mon Sep 17 00:00:00 2001
From: Melissa Wen <mwen(a)igalia.com>
Date: Tue, 31 Jan 2023 15:05:46 -0100
Subject: [PATCH] drm/amd/display: fix cursor offset on rotation 180
[why & how]
Cursor gets clipped off in the middle of the screen with hw
rotation 180. Fix a miscalculation of cursor offset when it's
placed near the edges in the pipe split case.
Cursor bugs with hw rotation were reported on AMD issue
tracker:
https://gitlab.freedesktop.org/drm/amd/-/issues/2247
The issues on rotation 270 was fixed by:
https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.c…
that partially addressed the rotation 180 too. So, this patch is the
final bits for rotation 180.
Reported-by: Xaver Hugl <xaver.hugl(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Signed-off-by: Melissa Wen <mwen(a)igalia.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index ff03b1d98aa7..1b9ac8812f5b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3589,7 +3589,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = temp_x + viewport_width;
+ pos_cpy.x = 2 * viewport_width - temp_x;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0dbb81d44108a2a1004e5b485ef3fca5bc078424
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081907-uptight-blah-bb36@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
0dbb81d44108 ("drm/amd/display: Enable otg synchronization logic for DCN321")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0dbb81d44108a2a1004e5b485ef3fca5bc078424 Mon Sep 17 00:00:00 2001
From: Loan Chen <lo-an.chen(a)amd.com>
Date: Fri, 2 Aug 2024 13:57:40 +0800
Subject: [PATCH] drm/amd/display: Enable otg synchronization logic for DCN321
[Why]
Tiled display cannot synchronize properly after S3.
The fix for commit 5f0c74915815 ("drm/amd/display: Fix for otg
synchronization logic") is not enable in DCN321, which causes
the otg is excluded from synchronization.
[How]
Enable otg synchronization logic in dcn321.
Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Signed-off-by: Loan Chen <lo-an.chen(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit d6ed53712f583423db61fbb802606759e023bf7b)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
index 9a3cc0514a36..8e0588b1cf30 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c
@@ -1778,6 +1778,9 @@ static bool dcn321_resource_construct(
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
+ /* Use pipe context based otg sync logic */
+ dc->config.use_pipe_ctx_sync_logic = true;
+
dc->config.dc_mode_clk_limit_support = true;
dc->config.enable_windowed_mpo_odm = true;
/* read VBIOS LTTPR caps */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 56fb276d0244d430496f249335a44ae114dd5f54
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081912-custodian-handgrip-81cc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
56fb276d0244 ("drm/amd/display: Adjust cursor position")
e53524cdcc02 ("drm/amd/display: Refactor HWSS into component folder")
6e2c4941ce0c ("drm/amd/display: Move dml code under CONFIG_DRM_AMD_DC_FP guard")
1cb87e048975 ("drm/amd/display: Add DCN35 blocks to Makefile")
0fa45b6aeae4 ("drm/amd/display: Add DCN35 Resource")
ec129fa356be ("drm/amd/display: Add DCN35 init")
6f8b7565cca4 ("drm/amd/display: Add DCN35 HWSEQ")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56fb276d0244d430496f249335a44ae114dd5f54 Mon Sep 17 00:00:00 2001
From: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Date: Thu, 1 Aug 2024 16:16:35 -0600
Subject: [PATCH] drm/amd/display: Adjust cursor position
[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 1b9ac8812f5b..14a902ff3b8a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3682,7 +3682,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = 2 * viewport_width - temp_x;
+ pos_cpy.x = temp_x + viewport_width;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 56fb276d0244d430496f249335a44ae114dd5f54
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-spending-purchase-3a01@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
56fb276d0244 ("drm/amd/display: Adjust cursor position")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56fb276d0244d430496f249335a44ae114dd5f54 Mon Sep 17 00:00:00 2001
From: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Date: Thu, 1 Aug 2024 16:16:35 -0600
Subject: [PATCH] drm/amd/display: Adjust cursor position
[why & how]
When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f)
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 1b9ac8812f5b..14a902ff3b8a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3682,7 +3682,7 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
(int)hubp->curs_attr.width || pos_cpy.x
<= (int)hubp->curs_attr.width +
pipe_ctx->plane_state->src_rect.x) {
- pos_cpy.x = 2 * viewport_width - temp_x;
+ pos_cpy.x = temp_x + viewport_width;
}
}
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 807174a93d24c456503692dc3f5af322ee0b640a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-outbid-skincare-1e41@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
807174a93d24 ("mm: fix endless reclaim on machines with unaccepted memory")
57c0419c5f0e ("mm, pcp: decrease PCP high if free pages < high watermark")
51a755c56dc0 ("mm: tune PCP high automatically")
90b41691b988 ("mm: add framework for PCP high auto-tuning")
c0a242394cb9 ("mm, page_alloc: scale the number of pages that are batch allocated")
52166607ecc9 ("mm: restrict the pcp batch scale factor to avoid too long latency")
362d37a106dd ("mm, pcp: reduce lock contention for draining high-order pages")
94a3bfe4073c ("cacheinfo: calculate size of per-CPU data cache slice")
ca71fe1ad922 ("mm, pcp: avoid to drain PCP when process exit")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 807174a93d24c456503692dc3f5af322ee0b640a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Fri, 9 Aug 2024 14:48:47 +0300
Subject: [PATCH] mm: fix endless reclaim on machines with unaccepted memory
Unaccepted memory is considered unusable free memory, which is not counted
as free on the zone watermark check. This causes get_page_from_freelist()
to accept more memory to hit the high watermark, but it creates problems
in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to
reclaim memory. This is usually successful, but if there is little or no
reclaimable memory, it can result in endless reclaim with little to no
progress. This can occur early in the boot process, just after start of
the init process when the only reclaimable memory is the page cache of the
init executable and its libraries.
Make unaccepted memory free from watermark check point of view. This way
unaccepted memory will never be the trigger of memory reclaim. Accept
more memory in the get_page_from_freelist() if needed.
Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Jianxiong Gao <jxgao(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Jianxiong Gao <jxgao(a)google.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 875d76e8684a..8747087acee3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order);
static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
static inline bool has_unaccepted_memory(void);
static bool __free_unaccepted(struct page *page);
@@ -3072,9 +3072,6 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
if (!(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
- unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
return unusable_free;
}
@@ -3368,6 +3365,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
}
}
+ cond_accept_memory(zone, order);
+
/*
* Detect whether the number of free pages is below high
* watermark. If so, we will decrease pcp->high and free
@@ -3393,10 +3392,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
gfp_mask)) {
int ret;
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
@@ -3450,10 +3447,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
return page;
} else {
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
@@ -6950,9 +6945,6 @@ static bool try_to_accept_memory_one(struct zone *zone)
struct page *page;
bool last;
- if (list_empty(&zone->unaccepted_pages))
- return false;
-
spin_lock_irqsave(&zone->lock, flags);
page = list_first_entry_or_null(&zone->unaccepted_pages,
struct page, lru);
@@ -6978,23 +6970,29 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
long to_accept;
- int ret = false;
+ bool ret = false;
+
+ if (!has_unaccepted_memory())
+ return false;
+
+ if (list_empty(&zone->unaccepted_pages))
+ return false;
/* How much to accept to get to high watermark? */
to_accept = high_wmark_pages(zone) -
(zone_page_state(zone, NR_FREE_PAGES) -
- __zone_watermark_unusable_free(zone, order, 0));
+ __zone_watermark_unusable_free(zone, order, 0) -
+ zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */
- do {
+ while (to_accept > 0) {
if (!try_to_accept_memory_one(zone))
break;
ret = true;
to_accept -= MAX_ORDER_NR_PAGES;
- } while (to_accept > 0);
+ }
return ret;
}
@@ -7037,7 +7035,7 @@ static void accept_page(struct page *page, unsigned int order)
{
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
return false;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081935-borax-concerned-4bcc@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-audience-hedging-fac5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081934-daffodil-dingy-d012@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-sulphuric-enamel-7da6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081932-idealism-parabola-3435@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6dd1e4c045af ("selinux: add the processing of the failure of avc_add_xperms_decision()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen(a)huawei.com>
Date: Wed, 7 Aug 2024 17:00:56 +0800
Subject: [PATCH] selinux: add the processing of the failure of
avc_add_xperms_decision()
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable(a)vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index 7087cd2b802d..b49c44869dc4 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u32 perms, u8 driver, u8 xperm, u32 ssid,
node->ae.avd.auditdeny &= ~perms;
break;
case AVC_CALLBACK_ADD_XPERMS:
- avc_add_xperms_decision(node, xpd);
+ rc = avc_add_xperms_decision(node, xpd);
+ if (rc) {
+ avc_node_kill(node);
+ goto out_unlock;
+ }
break;
}
avc_node_replace(node, orig);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e30729d4bd4001881be4d1ad4332a5d4985398f8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-enrich-hesitate-0db0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e30729d4bd40 ("btrfs: zoned: properly take lock to read/update block group's zoned variables")
8cd44dd1d17a ("btrfs: zoned: fix zone_unusable accounting on making block group read-write again")
b9fd2affe4aa ("btrfs: zoned: fix initial free space detection")
6a8ebc773ef6 ("btrfs: zoned: no longer count fresh BG region as zone unusable")
fa2068d7e922 ("btrfs: zoned: count fresh BG region as zone unusable")
3349b57fd47b ("btrfs: convert block group bit field to use bit helpers")
9d4b0a129a0d ("btrfs: simplify arguments of btrfs_update_space_info and rename")
6ca64ac27631 ("btrfs: zoned: fix mounting with conventional zones")
ced8ecf026fd ("btrfs: fix space cache corruption and potential double allocations")
b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes")
393f646e34c1 ("btrfs: zoned: finish least available block group on data bg allocation")
bb9950d3df71 ("btrfs: let can_allocate_chunk return error")
f6fca3917b4d ("btrfs: store chunk size in space-info struct")
b8bea09a456f ("btrfs: add trace event for submitted RAID56 bio")
c67c68eb57f1 ("btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap")
143823cf4d5a ("btrfs: fix typos in comments")
b3a3b0255797 ("btrfs: zoned: drop optimization of zone finish")
343d8a30851c ("btrfs: zoned: prevent allocation from previous data relocation BG")
d70cbdda75da ("btrfs: zoned: consolidate zone finish functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e30729d4bd4001881be4d1ad4332a5d4985398f8 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Thu, 1 Aug 2024 16:47:52 +0900
Subject: [PATCH] btrfs: zoned: properly take lock to read/update block group's
zoned variables
__btrfs_add_free_space_zoned() references and modifies bg's alloc_offset,
ro, and zone_unusable, but without taking the lock. It is mostly safe
because they monotonically increase (at least for now) and this function is
mostly called by a transaction commit, which is serialized by itself.
Still, taking the lock is a safer and correct option and I'm going to add a
change to reset zone_unusable while a block group is still alive. So, add
locking around the operations.
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index f5996a43db24..eaa1dbd31352 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2697,15 +2697,16 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
u64 offset = bytenr - block_group->start;
u64 to_free, to_unusable;
int bg_reclaim_threshold = 0;
- bool initial = ((size == block_group->length) && (block_group->alloc_offset == 0));
+ bool initial;
u64 reclaimable_unusable;
- WARN_ON(!initial && offset + size > block_group->zone_capacity);
+ spin_lock(&block_group->lock);
+ initial = ((size == block_group->length) && (block_group->alloc_offset == 0));
+ WARN_ON(!initial && offset + size > block_group->zone_capacity);
if (!initial)
bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
- spin_lock(&ctl->tree_lock);
if (!used)
to_free = size;
else if (initial)
@@ -2718,7 +2719,9 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
to_free = offset + size - block_group->alloc_offset;
to_unusable = size - to_free;
+ spin_lock(&ctl->tree_lock);
ctl->free_space += to_free;
+ spin_unlock(&ctl->tree_lock);
/*
* If the block group is read-only, we should account freed space into
* bytes_readonly.
@@ -2727,11 +2730,8 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
block_group->zone_unusable += to_unusable;
WARN_ON(block_group->zone_unusable > block_group->length);
}
- spin_unlock(&ctl->tree_lock);
if (!used) {
- spin_lock(&block_group->lock);
block_group->alloc_offset -= size;
- spin_unlock(&block_group->lock);
}
reclaimable_unusable = block_group->zone_unusable -
@@ -2745,6 +2745,8 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
btrfs_mark_bg_to_reclaim(block_group);
}
+ spin_unlock(&block_group->lock);
+
return 0;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081933-lyricism-manly-3272@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081931-slot-improving-a44a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081929-drastic-shale-7639@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-overstay-bullseye-eee1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 42fac187b5c746227c92d024f1caf33bc1d337e4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-knoll-dropkick-f11a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
42fac187b5c7 ("btrfs: check delayed refs when we're checking if a ref exists")
e094f48040cd ("btrfs: change root->root_key.objectid to btrfs_root_id()")
44cc2e38e67b ("btrfs: stop referencing btrfs_delayed_data_ref directly")
cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
12390e42b69d ("btrfs: rename ->len to ->num_bytes in btrfs_ref")
1bff6d4f8737 ("btrfs: simplify delayed ref tracepoints")
0ea4703cc27e ("btrfs: move ref specific initialization into init_delayed_ref_common")
0509cc56619d ("btrfs: initialize btrfs_delayed_ref_head with btrfs_ref")
da3c54854197 ("btrfs: pass btrfs_ref to init_delayed_ref_common")
f2e69a77aa51 ("btrfs: move ref_root into btrfs_ref")
4d09b4e942bc ("btrfs: do not use a function to initialize btrfs_ref")
d3fbb00f5e21 ("btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node")
0eea355fc0f4 ("btrfs: add a helper to get the delayed ref node from the data/tree ref")
6de3595473b0 ("btrfs: compression: add error handling for missed page cache")
01b69bf9906b ("btrfs: convert put_file_data() to folios")
073bda7a5417 ("btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling")
141fb8cd206a ("btrfs: qgroup: correctly model root qgroup rsv in convert")
ef5a05c55704 ("btrfs: remove SLAB_MEM_SPREAD flag use")
06c9564980f1 ("btrfs: use KMEM_CACHE() to create btrfs_free_space cache")
b2c7d55e4c4c ("btrfs: use KMEM_CACHE() to create delayed ref caches")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fac187b5c746227c92d024f1caf33bc1d337e4 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Thu, 11 Apr 2024 16:41:20 -0400
Subject: [PATCH] btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs
btrfs subvol create /btrfs/s1
simoop -E -f 4k -n 200000 -z /btrfs/s1
while(true) ; do
btrfs subvol snap /btrfs/s1 /btrfs/s2
simoop -f 4k -n 200000 -r 10 -z /btrfs/s2
btrfs subvol snap /btrfs/s2 /btrfs/s3
btrfs balance start -dusage=80 /btrfs
btrfs subvol del /btrfs/s2 /btrfs/s3
umount /btrfs
btrfsck /dev/nvme4n1 || exit 1
mount /dev/nvme4n1 /btrfs
done
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2ac9296edccb..06a9e0542d70 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -1134,6 +1134,73 @@ btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 byt
return find_ref_head(delayed_refs, bytenr, false);
}
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent)
+{
+ int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY;
+
+ if (type < entry->type)
+ return -1;
+ if (type > entry->type)
+ return 1;
+
+ if (type == BTRFS_TREE_BLOCK_REF_KEY) {
+ if (root < entry->ref_root)
+ return -1;
+ if (root > entry->ref_root)
+ return 1;
+ } else {
+ if (parent < entry->parent)
+ return -1;
+ if (parent > entry->parent)
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Check to see if a given root/parent reference is attached to the head. This
+ * only checks for BTRFS_ADD_DELAYED_REF references that match, as that
+ * indicates the reference exists for the given root or parent. This is for
+ * tree blocks only.
+ *
+ * @head: the head of the bytenr we're searching.
+ * @root: the root objectid of the reference if it is a normal reference.
+ * @parent: the parent if this is a shared backref.
+ */
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent)
+{
+ struct rb_node *node;
+ bool found = false;
+
+ lockdep_assert_held(&head->mutex);
+
+ spin_lock(&head->lock);
+ node = head->ref_tree.rb_root.rb_node;
+ while (node) {
+ struct btrfs_delayed_ref_node *entry;
+ int ret;
+
+ entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node);
+ ret = find_comp(entry, root, parent);
+ if (ret < 0) {
+ node = node->rb_left;
+ } else if (ret > 0) {
+ node = node->rb_right;
+ } else {
+ /*
+ * We only want to count ADD actions, as drops mean the
+ * ref doesn't exist.
+ */
+ if (entry->action == BTRFS_ADD_DELAYED_REF)
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&head->lock);
+ return found;
+}
+
void __cold btrfs_delayed_ref_exit(void)
{
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ef15e998be03..05f634eb472d 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -389,6 +389,8 @@ void btrfs_dec_delayed_refs_rsv_bg_updates(struct btrfs_fs_info *fs_info);
int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
enum btrfs_reserve_flush_enum flush);
bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info);
+bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head,
+ u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node)
{
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ff9f0d41987e..feec49e6f9c8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5472,23 +5472,62 @@ static int check_ref_exists(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 bytenr, u64 parent,
int level)
{
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_delayed_ref_head *head;
struct btrfs_path *path;
struct btrfs_extent_inline_ref *iref;
int ret;
+ bool exists = false;
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-
+again:
ret = lookup_extent_backref(trans, path, &iref, bytenr,
root->fs_info->nodesize, parent,
btrfs_root_id(root), level, 0);
+ if (ret != -ENOENT) {
+ /*
+ * If we get 0 then we found our reference, return 1, else
+ * return the error if it's not -ENOENT;
+ */
+ btrfs_free_path(path);
+ return (ret < 0 ) ? ret : 1;
+ }
+
+ /*
+ * We could have a delayed ref with this reference, so look it up while
+ * we're holding the path open to make sure we don't race with the
+ * delayed ref running.
+ */
+ delayed_refs = &trans->transaction->delayed_refs;
+ spin_lock(&delayed_refs->lock);
+ head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
+ if (!head)
+ goto out;
+ if (!mutex_trylock(&head->mutex)) {
+ /*
+ * We're contended, means that the delayed ref is running, get a
+ * reference and wait for the ref head to be complete and then
+ * try again.
+ */
+ refcount_inc(&head->refs);
+ spin_unlock(&delayed_refs->lock);
+
+ btrfs_release_path(path);
+
+ mutex_lock(&head->mutex);
+ mutex_unlock(&head->mutex);
+ btrfs_put_delayed_ref_head(head);
+ goto again;
+ }
+
+ exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent);
+ mutex_unlock(&head->mutex);
+out:
+ spin_unlock(&delayed_refs->lock);
btrfs_free_path(path);
- if (ret == -ENOENT)
- return 0;
- if (ret < 0)
- return ret;
- return 1;
+ return exists ? 1 : 0;
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-scrubber-excretion-0f83@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-cupped-curing-f315@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
3ea4dc5bf00c ("btrfs: send: send compressed extents with encoded writes")
6fe81a3a3ac8 ("btrfs: balance btree dirty pages and delayed items after clone and dedupe")
152555b39ceb ("btrfs: send: avoid trashing the page cache")
521b6803f22e ("btrfs: send: keep the current inode open while processing it")
1c6cbbbeeeca ("btrfs: remove inode_dio_wait() calls when starting reflink operations")
6b1f86f8e9c7 ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-shrivel-overstay-1394@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081909-blunt-glass-4ea5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
4e00422ee626 ("btrfs: replace sb::s_blocksize by fs_info::sectorsize")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 12 Aug 2024 14:18:06 +0100
Subject: [PATCH] btrfs: send: allow cloning non-aligned extent if it ends at
i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap
rm -f /tmp/send-test
btrfs send -f /tmp/send-test $MNT/snap
umount $MNT
mkfs.btrfs -f $DEV
mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
write bar - offset=0 length=49152
write bar - offset=49152 length=49152
write bar - offset=98304 length=49152
write bar - offset=147456 length=49152
write bar - offset=196608 length=49152
write bar - offset=245760 length=49152
write bar - offset=294912 length=49152
write bar - offset=344064 length=49152
write bar - offset=393216 length=49152
write bar - offset=442368 length=49152
write bar - offset=491520 length=49152
write bar - offset=540672 length=49152
write bar - offset=589824 length=49152
write bar - offset=638976 length=49152
write bar - offset=688128 length=49152
write bar - offset=737280 length=49152
write bar - offset=786432 length=49152
write bar - offset=835584 length=49152
write bar - offset=884736 length=49152
write bar - offset=933888 length=49152
write bar - offset=983040 length=49152
write bar - offset=1032192 length=16389
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file
bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the
source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
clone bar - source=foo source offset=0 offset=0 length=1048581
chown bar - uid=0, gid=0
chmod bar - mode=0644
utimes bar
utimes
BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7
/mnt/sdi/snap/bar:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416
CC: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4ca711a773ef..7fc692fc76e1 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6157,25 +6157,51 @@ static int send_write_or_clone(struct send_ctx *sctx,
u64 offset = key->offset;
u64 end;
u64 bs = sctx->send_root->fs_info->sectorsize;
+ struct btrfs_file_extent_item *ei;
+ u64 disk_byte;
+ u64 data_offset;
+ u64 num_bytes;
+ struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size);
if (offset >= end)
return 0;
- if (clone_root && IS_ALIGNED(end, bs)) {
- struct btrfs_file_extent_item *ei;
- u64 disk_byte;
- u64 data_offset;
+ num_bytes = end - offset;
- ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
- struct btrfs_file_extent_item);
- disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
- data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
- ret = clone_range(sctx, path, clone_root, disk_byte,
- data_offset, offset, end - offset);
- } else {
- ret = send_extent_data(sctx, path, offset, end - offset);
- }
+ if (!clone_root)
+ goto write_data;
+
+ if (IS_ALIGNED(end, bs))
+ goto clone_data;
+
+ /*
+ * If the extent end is not aligned, we can clone if the extent ends at
+ * the i_size of the inode and the clone range ends at the i_size of the
+ * source inode, otherwise the clone operation fails with -EINVAL.
+ */
+ if (end != sctx->cur_inode_size)
+ goto write_data;
+
+ ret = get_inode_info(clone_root->root, clone_root->ino, &info);
+ if (ret < 0)
+ return ret;
+
+ if (clone_root->offset + num_bytes == info.size)
+ goto clone_data;
+
+write_data:
+ ret = send_extent_data(sctx, path, offset, num_bytes);
+ sctx->cur_inode_next_write_offset = end;
+ return ret;
+
+clone_data:
+ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
+ struct btrfs_file_extent_item);
+ disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei);
+ data_offset = btrfs_file_extent_offset(path->nodes[0], ei);
+ ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset,
+ num_bytes);
sctx->cur_inode_next_write_offset = end;
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081954-purplish-scarecrow-7568@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081953-calm-granola-8c1b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081952-clatter-tribute-f81c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 31723c9542dba1681cc3720571fdf12ffe0eddd9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081951-anyhow-fool-5756@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
31723c9542db ("btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type")
94a48aef49f2 ("btrfs: extend btrfs_dir_item type to store encryption status")
e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 31723c9542dba1681cc3720571fdf12ffe0eddd9 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 12 Aug 2024 08:52:44 +0930
Subject: [PATCH] btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Reported-by: Kota <nospam(a)kota.moe>
Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc…
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a825fa598e3c..6f1e2f2215d9 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -569,9 +569,10 @@ static int check_dir_item(struct extent_buffer *leaf,
/* dir type check */
dir_type = btrfs_dir_ftype(leaf, di);
- if (unlikely(dir_type >= BTRFS_FT_MAX)) {
+ if (unlikely(dir_type <= BTRFS_FT_UNKNOWN ||
+ dir_type >= BTRFS_FT_MAX)) {
dir_item_err(leaf, slot,
- "invalid dir item type, have %u expect [0, %u)",
+ "invalid dir item type, have %u expect (0, %u)",
dir_type, BTRFS_FT_MAX);
return -EUCLEAN;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081942-rippling-relieving-c8d1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081941-agility-fable-8749@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
2bc481362245 ("selftests/mm: add -a to run_vmtests.sh")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7c5e8d212d7d81991a580e7de3904ea213d9a852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081940-stopped-knickers-3a22@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7c5e8d212d7d ("selftests: memfd_secret: don't build memfd_secret test on unsupported arches")
a3c5cc5129ef ("selftests/mm: log run_vmtests.sh results in TAP format")
2ffc27b15b11 ("tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7c5e8d212d7d81991a580e7de3904ea213d9a852 Mon Sep 17 00:00:00 2001
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Date: Fri, 9 Aug 2024 12:56:42 +0500
Subject: [PATCH] selftests: memfd_secret: don't build memfd_secret test on
unsupported arches
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 7b8a5def54a1..cfad627e8d94 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce..36045edb10de 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d75abd0d0bc29e6ebfebbf76d11b4067b35844af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-stardust-create-e577@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d75abd0d0bc2 ("mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu")
96f96763de26 ("mm: memory-failure: convert to pr_fmt()")
98931dd95fd4 ("Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d75abd0d0bc29e6ebfebbf76d11b4067b35844af Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Tue, 6 Aug 2024 12:41:07 -0400
Subject: [PATCH] mm/memory-failure: use raw_spinlock_t in struct
memory_failure_cpu
The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.
[longman(a)redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 581d3e5c9117..7066fc84f351 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2417,7 +2417,7 @@ struct memory_failure_entry {
struct memory_failure_cpu {
DECLARE_KFIFO(fifo, struct memory_failure_entry,
MEMORY_FAILURE_FIFO_SIZE);
- spinlock_t lock;
+ raw_spinlock_t lock;
struct work_struct work;
};
@@ -2443,20 +2443,22 @@ void memory_failure_queue(unsigned long pfn, int flags)
{
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
+ bool buffer_overflow;
struct memory_failure_entry entry = {
.pfn = pfn,
.flags = flags,
};
mf_cpu = &get_cpu_var(memory_failure_cpu);
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
+ if (!buffer_overflow)
schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ put_cpu_var(memory_failure_cpu);
+ if (buffer_overflow)
pr_err("buffer overflow when queuing memory failure at %#lx\n",
pfn);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
- put_cpu_var(memory_failure_cpu);
}
EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2469,9 +2471,9 @@ static void memory_failure_work_func(struct work_struct *work)
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
gotten = kfifo_get(&mf_cpu->fifo, &entry);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
@@ -2501,7 +2503,7 @@ static int __init memory_failure_init(void)
for_each_possible_cpu(cpu) {
mf_cpu = &per_cpu(memory_failure_cpu, cpu);
- spin_lock_init(&mf_cpu->lock);
+ raw_spin_lock_init(&mf_cpu->lock);
INIT_KFIFO(mf_cpu->fifo);
INIT_WORK(&mf_cpu->work, memory_failure_work_func);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081950-tweed-tattle-7f2c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
18010ff776fa ("net: mana: Fix race on per-CQ variable napi work_done")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 58a63729c957621f1990c3494c702711188ca347
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081949-strategy-gatherer-758e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
58a63729c957 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58a63729c957621f1990c3494c702711188ca347 Mon Sep 17 00:00:00 2001
From: Long Li <longli(a)microsoft.com>
Date: Fri, 9 Aug 2024 08:58:58 -0700
Subject: [PATCH] net: mana: Fix doorbell out of order violation and avoid
unnecessary doorbell rings
After napi_complete_done() is called when NAPI is polling in the current
process context, another NAPI may be scheduled and start running in
softirq on another CPU and may ring the doorbell before the current CPU
does. When combined with unnecessary rings when there is no need to arm
the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell
rings. It limits the number of unnecessary rings when there is
no need to arm. MANA hardware specifies that there must be one doorbell
ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as
soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical
workloads, the 4 CQ wraparounds proves to be big enough that it rarely
exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell,
and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable(a)vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Long Li <longli(a)microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhy…
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index ae717d06e66f..39f56973746d 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1792,7 +1792,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1803,16 +1802,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
+ cq->work_done_since_doorbell += w;
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even if there is no need to arm the CQ.
+ * This driver rings the doorbell as soon as we have exceeded
+ * 4 wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 38055789d15155109b41602ad719d770af507030
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081920-gating-yummy-71e0@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
38055789d151 ("wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850")
26dd8ccdba4d ("wifi: ath12k: dynamic VLAN support")
97b7cbb7a3cb ("wifi: ath12k: support SMPS configuration for 6 GHz")
f0e61dc7ecf9 ("wifi: ath12k: refactor SMPS configuration")
112dbc6af807 ("wifi: ath12k: add 6 GHz params in peer assoc command")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 38055789d15155109b41602ad719d770af507030 Mon Sep 17 00:00:00 2001
From: Baochen Qiang <quic_bqiang(a)quicinc.com>
Date: Thu, 1 Aug 2024 18:04:07 +0300
Subject: [PATCH] wifi: ath12k: use 128 bytes aligned iova in transmit path for
WCN7850
In transmit path, it is likely that the iova is not aligned to PCIe TLP
max payload size, which is 128 for WCN7850. Normally in such cases hardware
is expected to split the packet into several parts in a manner such that
they, other than the first one, have aligned iova. However due to hardware
limitations, WCN7850 does not behave like that properly with some specific
unaligned iova in transmit path. This easily results in target hang in a
KPI transmit test: packet send/receive failure, WMI command send timeout
etc. Also fatal error seen in PCIe level:
...
Capabilities: ...
...
DevSta: ... FatalErr+ ...
...
...
Work around this by manually moving/reallocating payload buffer such that
we can map it to a 128 bytes aligned iova. The moving requires sufficient
head room or tail room in skb: for the former we can do ourselves a favor
by asking some extra bytes when registering with mac80211, while for the
latter we can do nothing.
Moving/reallocating buffer consumes additional CPU cycles, but the good news
is that an aligned iova increases PCIe efficiency. In my tests on some X86
platforms the KPI results are almost consistent.
Since this is seen only with WCN7850, add a new hardware parameter to
differentiate from others.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com>
Cc: <stable(a)vger.kernel.org>
Tested-by: Mark Pearson <mpearson-lenovo(a)squebb.ca>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://patch.msgid.link/20240715023814.20242-1-quic_bqiang@quicinc.com
diff --git a/drivers/net/wireless/ath/ath12k/dp_tx.c b/drivers/net/wireless/ath/ath12k/dp_tx.c
index d08c04343e90..44406e0b4a34 100644
--- a/drivers/net/wireless/ath/ath12k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath12k/dp_tx.c
@@ -162,6 +162,60 @@ static int ath12k_dp_prepare_htt_metadata(struct sk_buff *skb)
return 0;
}
+static void ath12k_dp_tx_move_payload(struct sk_buff *skb,
+ unsigned long delta,
+ bool head)
+{
+ unsigned long len = skb->len;
+
+ if (head) {
+ skb_push(skb, delta);
+ memmove(skb->data, skb->data + delta, len);
+ skb_trim(skb, len);
+ } else {
+ skb_put(skb, delta);
+ memmove(skb->data + delta, skb->data, len);
+ skb_pull(skb, delta);
+ }
+}
+
+static int ath12k_dp_tx_align_payload(struct ath12k_base *ab,
+ struct sk_buff **pskb)
+{
+ u32 iova_mask = ab->hw_params->iova_mask;
+ unsigned long offset, delta1, delta2;
+ struct sk_buff *skb2, *skb = *pskb;
+ unsigned int headroom = skb_headroom(skb);
+ int tailroom = skb_tailroom(skb);
+ int ret = 0;
+
+ offset = (unsigned long)skb->data & iova_mask;
+ delta1 = offset;
+ delta2 = iova_mask - offset + 1;
+
+ if (headroom >= delta1) {
+ ath12k_dp_tx_move_payload(skb, delta1, true);
+ } else if (tailroom >= delta2) {
+ ath12k_dp_tx_move_payload(skb, delta2, false);
+ } else {
+ skb2 = skb_realloc_headroom(skb, iova_mask);
+ if (!skb2) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ dev_kfree_skb_any(skb);
+
+ offset = (unsigned long)skb2->data & iova_mask;
+ if (offset)
+ ath12k_dp_tx_move_payload(skb2, offset, true);
+ *pskb = skb2;
+ }
+
+out:
+ return ret;
+}
+
int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
struct sk_buff *skb)
{
@@ -184,6 +238,7 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
bool tcl_ring_retry;
bool msdu_ext_desc = false;
bool add_htt_metadata = false;
+ u32 iova_mask = ab->hw_params->iova_mask;
if (test_bit(ATH12K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags))
return -ESHUTDOWN;
@@ -279,6 +334,23 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif,
goto fail_remove_tx_buf;
}
+ if (iova_mask &&
+ (unsigned long)skb->data & iova_mask) {
+ ret = ath12k_dp_tx_align_payload(ab, &skb);
+ if (ret) {
+ ath12k_warn(ab, "failed to align TX buffer %d\n", ret);
+ /* don't bail out, give original buffer
+ * a chance even unaligned.
+ */
+ goto map;
+ }
+
+ /* hdr is pointing to a wrong place after alignment,
+ * so refresh it for later use.
+ */
+ hdr = (void *)skb->data;
+ }
+map:
ti.paddr = dma_map_single(ab->dev, skb->data, skb->len, DMA_TO_DEVICE);
if (dma_mapping_error(ab->dev, ti.paddr)) {
atomic_inc(&ab->soc_stats.tx_err.misc_fail);
diff --git a/drivers/net/wireless/ath/ath12k/hw.c b/drivers/net/wireless/ath/ath12k/hw.c
index 2e11ea763574..7b0b6a7f4701 100644
--- a/drivers/net/wireless/ath/ath12k/hw.c
+++ b/drivers/net/wireless/ath/ath12k/hw.c
@@ -924,6 +924,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
{
.name = "wcn7850 hw2.0",
@@ -1000,6 +1002,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = &wcn7850_uuid,
.supports_dynamic_smps_6ghz = false,
+
+ .iova_mask = ATH12K_PCIE_MAX_PAYLOAD_SIZE - 1,
},
{
.name = "qcn9274 hw2.0",
@@ -1072,6 +1076,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = {
.acpi_guid = NULL,
.supports_dynamic_smps_6ghz = true,
+
+ .iova_mask = 0,
},
};
diff --git a/drivers/net/wireless/ath/ath12k/hw.h b/drivers/net/wireless/ath/ath12k/hw.h
index e792eb6b249b..b1d302c48326 100644
--- a/drivers/net/wireless/ath/ath12k/hw.h
+++ b/drivers/net/wireless/ath/ath12k/hw.h
@@ -96,6 +96,8 @@
#define ATH12K_M3_FILE "m3.bin"
#define ATH12K_REGDB_FILE_NAME "regdb.bin"
+#define ATH12K_PCIE_MAX_PAYLOAD_SIZE 128
+
enum ath12k_hw_rate_cck {
ATH12K_HW_RATE_CCK_LP_11M = 0,
ATH12K_HW_RATE_CCK_LP_5_5M,
@@ -215,6 +217,8 @@ struct ath12k_hw_params {
const guid_t *acpi_guid;
bool supports_dynamic_smps_6ghz;
+
+ u32 iova_mask;
};
struct ath12k_hw_ops {
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 8106297f0bc1..ce41c8153080 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -9193,6 +9193,7 @@ static int ath12k_mac_hw_register(struct ath12k_hw *ah)
hw->vif_data_size = sizeof(struct ath12k_vif);
hw->sta_data_size = sizeof(struct ath12k_sta);
+ hw->extra_tx_headroom = ab->hw_params->iova_mask;
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST);
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_STA_TX_PWR);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081926-thrower-salaried-4605@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
357ee8841d0b ("i2c: qcom-geni: Store DMA mapping data in geni_i2c_dev struct")
9cb4c67d7717 ("Revert "i2c: i2c-qcom-geni: Fix DMA transfer race"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-editor-flinch-ca97@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4e91fa1ef3ce ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume")
14d02fbadb5d ("i2c: qcom-geni: add desc struct to prepare support for I2C Master Hub variant")
d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)kernel.org>
Date: Mon, 12 Aug 2024 21:40:28 +0200
Subject: [PATCH] i2c: qcom-geni: Add missing geni_icc_disable in
geni_i2c_runtime_resume
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1(a)huawei.com>
Cc: stable(a)vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 365e37bba0f3..06e836e3e877 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
return ret;
ret = clk_prepare_enable(gi2c->core_clk);
- if (ret)
+ if (ret) {
+ geni_icc_disable(&gi2c->se);
return ret;
+ }
ret = geni_se_resources_on(&gi2c->se);
if (ret) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-lapping-sedation-f06f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
85067747cf98 ("dm: do not use waitqueue for request-based DM")
087615bf3acd ("dm mpath: pass IO start time to path selector")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-voting-handwrite-258d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081925-comic-charcoal-8b91@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081923-move-improving-38ad@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081924-fidelity-leverage-7546@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1e1fd567d32f ("dm suspend: return -ERESTARTSYS instead of -EINTR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Tue, 13 Aug 2024 12:38:51 +0200
Subject: [PATCH] dm suspend: return -ERESTARTSYS instead of -EINTR
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 97fab2087df8..87bb90303435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2737,7 +2737,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
@@ -2762,7 +2762,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st
break;
if (signal_pending_state(task_state, current)) {
- r = -EINTR;
+ r = -ERESTARTSYS;
break;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 74c2ab6d653b4c2354df65a7f7f2df1925a40a51
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-sustained-humbly-8aaf@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
74c2ab6d653b ("smb/client: avoid possible NULL dereference in cifs_free_subrequest()")
519be989717c ("cifs: Add a tracepoint to track credits involved in R/W requests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74c2ab6d653b4c2354df65a7f7f2df1925a40a51 Mon Sep 17 00:00:00 2001
From: Su Hui <suhui(a)nfschina.com>
Date: Thu, 8 Aug 2024 20:23:32 +0800
Subject: [PATCH] smb/client: avoid possible NULL dereference in
cifs_free_subrequest()
Clang static checker (scan-build) warning:
cifsglob.h:line 890, column 3
Access to field 'ops' results in a dereference of a null pointer.
Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in
R/W requests") adds a check for 'rdata->server', and let clang throw this
warning about NULL dereference.
When 'rdata->credits.value != 0 && rdata->server == NULL' happens,
add_credits_and_wake_if() will call rdata->server->ops->add_credits().
This will cause NULL dereference problem. Add a check for 'rdata->server'
to avoid NULL dereference.
Cc: stable(a)vger.kernel.org
Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks")
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Su Hui <suhui(a)nfschina.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b2405dd4d4d4..45459af5044d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -315,7 +315,7 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
#endif
}
- if (rdata->credits.value != 0)
+ if (rdata->credits.value != 0) {
trace_smb3_rw_credits(rdata->rreq->debug_id,
rdata->subreq.debug_index,
rdata->credits.value,
@@ -323,8 +323,12 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq)
rdata->server ? rdata->server->in_flight : 0,
-rdata->credits.value,
cifs_trace_rw_credits_free_subreq);
+ if (rdata->server)
+ add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
+ else
+ rdata->credits.value = 0;
+ }
- add_credits_and_wake_if(rdata->server, &rdata->credits, 0);
if (rdata->have_xid)
free_xid(rdata->xid);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081915-antitoxic-kennel-6f2e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081914-divided-overhaul-0e25@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081913-dominoes-octagon-edcc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081912-hesitate-manor-84c0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-flame-used-1927@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 836bb3268db405cf9021496ac4dbc26d3e4758fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081910-dawn-handoff-e37b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
836bb3268db4 ("smb3: fix lock breakage for cached writes")
3ee1a1fc3981 ("cifs: Cut over to using netfslib")
69c3c023af25 ("cifs: Implement netfslib hooks")
edea94a69730 ("cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs")
1a5b4edd97ce ("cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c")
ab58fbdeebc7 ("cifs: Use more fields from netfs_io_subrequest")
a975a2f22cdc ("cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest")
753b67eb630d ("cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest")
0f7c0f3f5150 ("cifs: Use alternative invalidation to using launder_folio")
2e9d7e4b984a ("mm: Remove the PG_fscache alias for PG_private_2")
2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
f3dc1bdb6b0b ("cifs: Fix writeback data corruption")
d1bba17e20d5 ("Merge tag '6.8-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 836bb3268db405cf9021496ac4dbc26d3e4758fe Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 15 Aug 2024 14:03:43 -0500
Subject: [PATCH] smb3: fix lock breakage for cached writes
Mandatory locking is enforced for cached writes, which violates
default posix semantics, and also it is enforced inconsistently.
This apparently breaks recent versions of libreoffice, but can
also be demonstrated by opening a file twice from the same
client, locking it from handle one and writing to it from
handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock"
(which defaults to off), with this change only when the user
intentionally specifies "forcemandatorylock" on mount will we
break posix semantics on write to a locked range (ie we will
only fail the write in this case, if the user mounts with
"forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks")
Cc: stable(a)vger.kernel.org
Cc: Pavel Shilovsky <piastryyy(a)gmail.com>
Reported-by: abartlet(a)samba.org
Reported-by: Kevin Ottens <kevin.ottens(a)enioka.com>
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 45459af5044d..06a0667f8ff2 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2753,6 +2753,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file->f_mapping->host;
struct cifsInodeInfo *cinode = CIFS_I(inode);
struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
+ struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
ssize_t rc;
rc = netfs_start_io_write(inode);
@@ -2769,12 +2770,16 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
if (rc <= 0)
goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
+ if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) &&
+ (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from),
server->vals->exclusive_lock_type, 0,
- NULL, CIFS_WRITE_OP))
- rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
- else
+ NULL, CIFS_WRITE_OP))) {
rc = -EACCES;
+ goto out;
+ }
+
+ rc = netfs_buffered_write_iter_locked(iocb, from, NULL);
+
out:
up_read(&cinode->lock_sem);
netfs_end_io_write(inode);
The return value of drm_atomic_get_crtc_state() needs to be
checked. To avoid use of error pointer 'crtc_state' in case
of the failure.
Cc: stable(a)vger.kernel.org
Fixes: dec92020671c ("drm: Use the state pointer directly in planes atomic_check")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/gpu/drm/sti/sti_cursor.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/sti/sti_cursor.c b/drivers/gpu/drm/sti/sti_cursor.c
index db0a1eb53532..e460f5ba2d87 100644
--- a/drivers/gpu/drm/sti/sti_cursor.c
+++ b/drivers/gpu/drm/sti/sti_cursor.c
@@ -200,6 +200,8 @@ static int sti_cursor_atomic_check(struct drm_plane *drm_plane,
return 0;
crtc_state = drm_atomic_get_crtc_state(state, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
mode = &crtc_state->mode;
dst_x = new_plane_state->crtc_x;
dst_y = new_plane_state->crtc_y;
--
2.25.1
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6e6f58a170ea98e44075b761f2da42a5aec47dfb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081916-anguished-snooze-7b66@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
6e6f58a170ea ("thermal: gov_bang_bang: Use governor_data to reduce overhead")
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6e6f58a170ea98e44075b761f2da42a5aec47dfb Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:29:11 +0200
Subject: [PATCH] thermal: gov_bang_bang: Use governor_data to reduce overhead
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index bc55e0698bfa..daed67d19efb 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -86,6 +86,10 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
const struct thermal_trip_desc *td;
struct thermal_instance *instance;
+ /* If the code below has run already, nothing needs to be done. */
+ if (tz->governor_data)
+ return;
+
for_each_trip_desc(tz, td) {
const struct thermal_trip *trip = &td->trip;
@@ -107,11 +111,25 @@ static void bang_bang_manage(struct thermal_zone_device *tz)
bang_bang_set_instance_target(instance, 0);
}
}
+
+ tz->governor_data = (void *)true;
+}
+
+static void bang_bang_update_tz(struct thermal_zone_device *tz,
+ enum thermal_notify_event reason)
+{
+ /*
+ * Let bang_bang_manage() know that it needs to walk trips after binding
+ * a new cdev and after system resume.
+ */
+ if (reason == THERMAL_TZ_BIND_CDEV || reason == THERMAL_TZ_RESUME)
+ tz->governor_data = NULL;
}
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
.manage = bang_bang_manage,
+ .update_tz = bang_bang_update_tz,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 95c399f94744..e6669aeda1ff 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1728,7 +1728,8 @@ static void thermal_zone_device_resume(struct work_struct *work)
thermal_debug_tz_resume(tz);
thermal_zone_device_init(tz);
- __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+ thermal_governor_update_tz(tz, THERMAL_TZ_RESUME);
+ __thermal_zone_device_update(tz, THERMAL_TZ_RESUME);
complete(&tz->resume);
tz->resuming = false;
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 25fbf960b474..b86ddca46b9e 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -55,6 +55,7 @@ enum thermal_notify_event {
THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */
THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */
THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */
+ THERMAL_TZ_RESUME, /* Thermal zone is resuming after system sleep */
};
/**
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081911-parlor-reformer-542a@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
5f64b4a1ab1b ("thermal: gov_bang_bang: Add .manage() callback")
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27 Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:27:33 +0200
Subject: [PATCH] thermal: gov_bang_bang: Add .manage() callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.
Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback. [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]
To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.
Fixes: 530c932bdf75 ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()")
Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.…
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 87cff3ea77a9..bc55e0698bfa 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -26,6 +26,7 @@ static void bang_bang_set_instance_target(struct thermal_instance *instance,
* when the trip is crossed on the way down.
*/
instance->target = target;
+ instance->initialized = true;
dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
@@ -80,8 +81,37 @@ static void bang_bang_control(struct thermal_zone_device *tz,
}
}
+static void bang_bang_manage(struct thermal_zone_device *tz)
+{
+ const struct thermal_trip_desc *td;
+ struct thermal_instance *instance;
+
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
+ if (tz->temperature >= td->threshold ||
+ trip->temperature == THERMAL_TEMP_INVALID ||
+ trip->type == THERMAL_TRIP_CRITICAL ||
+ trip->type == THERMAL_TRIP_HOT)
+ continue;
+
+ /*
+ * If the initial cooling device state is "on", but the zone
+ * temperature is not above the trip point, the core will not
+ * call bang_bang_control() until the zone temperature reaches
+ * the trip point temperature which may be never. In those
+ * cases, set the initial state of the cooling device to 0.
+ */
+ list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
+ if (!instance->initialized && instance->trip == trip)
+ bang_bang_set_instance_target(instance, 0);
+ }
+ }
+}
+
static struct thermal_governor thermal_gov_bang_bang = {
.name = "bang_bang",
.trip_crossed = bang_bang_control,
+ .manage = bang_bang_manage,
};
THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 84248e35d9b60e03df7276627e4e91fbaf80f73d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081904-sprite-urologist-d8e9@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()")
b9b6ee6fe258 ("thermal: gov_bang_bang: Call __thermal_cdev_update() directly")
2c637af8a74d ("thermal: gov_bang_bang: Drop unnecessary cooling device target state checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 84248e35d9b60e03df7276627e4e91fbaf80f73d Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Tue, 13 Aug 2024 16:26:42 +0200
Subject: [PATCH] thermal: gov_bang_bang: Split bang_bang_control()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Peter Kästle <peter(a)piie.net>
Reviewed-by: Zhang Rui <rui.zhang(a)intel.com>
Cc: 6.10+ <stable(a)vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index b9474c6af72b..87cff3ea77a9 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -13,6 +13,27 @@
#include "thermal_core.h"
+static void bang_bang_set_instance_target(struct thermal_instance *instance,
+ unsigned int target)
+{
+ if (instance->target != 0 && instance->target != 1 &&
+ instance->target != THERMAL_NO_TARGET)
+ pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
+ instance->target, instance->name);
+
+ /*
+ * Enable the fan when the trip is crossed on the way up and disable it
+ * when the trip is crossed on the way down.
+ */
+ instance->target = target;
+
+ dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
+
+ mutex_lock(&instance->cdev->lock);
+ __thermal_cdev_update(instance->cdev);
+ mutex_unlock(&instance->cdev->lock);
+}
+
/**
* bang_bang_control - controls devices associated with the given zone
* @tz: thermal_zone_device
@@ -54,25 +75,8 @@ static void bang_bang_control(struct thermal_zone_device *tz,
tz->temperature, trip->hysteresis);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) {
- if (instance->trip != trip)
- continue;
-
- if (instance->target != 0 && instance->target != 1 &&
- instance->target != THERMAL_NO_TARGET)
- pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n",
- instance->target, instance->name);
-
- /*
- * Enable the fan when the trip is crossed on the way up and
- * disable it when the trip is crossed on the way down.
- */
- instance->target = crossed_up;
-
- dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
-
- mutex_lock(&instance->cdev->lock);
- __thermal_cdev_update(instance->cdev);
- mutex_unlock(&instance->cdev->lock);
+ if (instance->trip == trip)
+ bang_bang_set_instance_target(instance, crossed_up);
}
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081930-untoasted-corporal-214e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081931-improvise-purgatory-bbe3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081930-commuting-makeover-de1c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081929-kangaroo-abstract-f5d8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081927-ungodly-gumminess-adf9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081928-deduct-humongous-41ae@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ccbfcac05866 ("ALSA: timer: Relax start tick time check for slave timer elements")
4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Sat, 10 Aug 2024 10:48:32 +0200
Subject: [PATCH] ALSA: timer: Relax start tick time check for slave timer
elements
The recent addition of a sanity check for a too low start tick time
seems breaking some applications that uses aloop with a certain slave
timer setup. They may have the initial resolution 0, hence it's
treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing
the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/raspberrypi/linux/issues/6294
Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/core/timer.c b/sound/core/timer.c
index d104adc75a8b..71a07c1662f5 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
/* check the actual time for the start tick;
* bail out as error if it's way too low (< 100us)
*/
- if (start) {
+ if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) {
if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000)
return -EINVAL;
}
Several cs track offsets (such as 'track->db_s_read_offset')
either are initialized with or plainly take big enough values that,
once shifted 8 bits left, may be hit with integer overflow if the
resulting values end up going over u32 limit.
Some debug prints take this into account (see according dev_warn() in
evergreen_cs_track_validate_stencil()), even if the actual
calculated value assigned to local 'offset' variable is missing
similar proper expansion.
Mitigate the problem by casting the type of right operands to the
wider type of corresponding left ones in all such cases.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
P.S. While I am not certain that track->cb_color_bo_offset[id]
actually ends up taking values high enough to cause an overflow,
nonetheless I thought it prudent to cast it to ulong as well.
drivers/gpu/drm/radeon/evergreen_cs.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 1fe6e0d883c7..d734d221e2da 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -433,7 +433,7 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
return r;
}
- offset = track->cb_color_bo_offset[id] << 8;
+ offset = (unsigned long)track->cb_color_bo_offset[id] << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d cb[%d] bo base %ld not aligned with %ld\n",
__func__, __LINE__, id, offset, surf.base_align);
@@ -455,7 +455,7 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
min = surf.nby - 8;
}
bsize = radeon_bo_size(track->cb_color_bo[id]);
- tmp = track->cb_color_bo_offset[id] << 8;
+ tmp = (unsigned long)track->cb_color_bo_offset[id] << 8;
for (nby = surf.nby; nby > min; nby--) {
size = nby * surf.nbx * surf.bpe * surf.nsamples;
if ((tmp + size * mslice) <= bsize) {
@@ -476,10 +476,10 @@ static int evergreen_cs_track_validate_cb(struct radeon_cs_parser *p, unsigned i
}
}
dev_warn(p->dev, "%s:%d cb[%d] bo too small (layer size %d, "
- "offset %d, max layer %d, bo size %ld, slice %d)\n",
+ "offset %ld, max layer %d, bo size %ld, slice %d)\n",
__func__, __LINE__, id, surf.layer_size,
- track->cb_color_bo_offset[id] << 8, mslice,
- radeon_bo_size(track->cb_color_bo[id]), slice);
+ (unsigned long)track->cb_color_bo_offset[id] << 8,
+ mslice, radeon_bo_size(track->cb_color_bo[id]), slice);
dev_warn(p->dev, "%s:%d problematic surf: (%d %d) (%d %d %d %d %d %d %d)\n",
__func__, __LINE__, surf.nbx, surf.nby,
surf.mode, surf.bpe, surf.nsamples,
@@ -608,7 +608,7 @@ static int evergreen_cs_track_validate_stencil(struct radeon_cs_parser *p)
return r;
}
- offset = track->db_s_read_offset << 8;
+ offset = (unsigned long)track->db_s_read_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil read bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -627,7 +627,7 @@ static int evergreen_cs_track_validate_stencil(struct radeon_cs_parser *p)
return -EINVAL;
}
- offset = track->db_s_write_offset << 8;
+ offset = (unsigned long)track->db_s_write_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil write bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -706,7 +706,7 @@ static int evergreen_cs_track_validate_depth(struct radeon_cs_parser *p)
return r;
}
- offset = track->db_z_read_offset << 8;
+ offset = (unsigned long)track->db_z_read_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil read bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
@@ -722,7 +722,7 @@ static int evergreen_cs_track_validate_depth(struct radeon_cs_parser *p)
return -EINVAL;
}
- offset = track->db_z_write_offset << 8;
+ offset = (unsigned long)track->db_z_write_offset << 8;
if (offset & (surf.base_align - 1)) {
dev_warn(p->dev, "%s:%d stencil write bo base %ld not aligned with %ld\n",
__func__, __LINE__, offset, surf.base_align);
If formatting a suspended disk (such as formatting with different DIF
type), the disk will be resuming first, and then the format command will
submit to the disk through SG_IO ioctl.
When the disk is processing the format command, the system does not submit
other commands to the disk. Therefore, the system attempts to suspend the
disk again and sends the SYNC CACHE command. However, the SYNC CACHE
command will fail because the disk is in the formatting process, which
will cause the runtime_status of the disk to error and it is difficult
for user to recover it. Error info like:
[ 669.925325] sd 6:0:6:0: [sdg] Synchronizing SCSI cache
[ 670.202371] sd 6:0:6:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
[ 670.216300] sd 6:0:6:0: [sdg] Sense Key : 0x2 [current]
[ 670.221860] sd 6:0:6:0: [sdg] ASC=0x4 ASCQ=0x4
To solve the issue, retry the command until format command is finished.
Cc: stable(a)vger.kernel.org
Signed-off-by: Yihang Li <liyihang9(a)huawei.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
---
Changes since v3:
- Add Cc tag for kernel stable.
Changes since v2:
- Add Reviewed-by for Bart.
Changes since v1:
- Updated and added error information to the patch description.
---
drivers/scsi/sd.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..5cd88a8eea73 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1823,6 +1823,11 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
(sshdr.asc == 0x74 && sshdr.ascq == 0x71)) /* drive is password locked */
/* this is no error here */
return 0;
+
+ /* retry if format in progress */
+ if (sshdr.asc == 0x4 && sshdr.ascq == 0x4)
+ return -EBUSY;
+
/*
* This drive doesn't support sync and there's not much
* we can do because this is called during shutdown
--
2.33.0
When operating in High-Speed, it is observed that DSTS[USBLNKST] doesn't
update link state immediately after receiving the wakeup interrupt. Since
wakeup event handler calls the resume callbacks, there is a chance that
function drivers can perform an ep queue, which in turn tries to perform
remote wakeup from send_gadget_ep_cmd(STARTXFER). This happens because
DSTS[[21:18] wasn't updated to U0 yet, it's observed that the latency of
DSTS can be in order of milli-seconds. Hence avoid calling gadget_wakeup
during startxfer to prevent unnecessarily issuing remote wakeup to host.
Fixes: c36d8e947a56 ("usb: dwc3: gadget: put link to U0 before Start Transfer")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
v2: Refactored the patch as suggested in v1 discussion.
drivers/usb/dwc3/gadget.c | 24 ------------------------
1 file changed, 24 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..3f634209c5b8 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -327,30 +327,6 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
}
- if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int link_state;
-
- /*
- * Initiate remote wakeup if the link state is in U3 when
- * operating in SS/SSP or L1/L2 when operating in HS/FS. If the
- * link state is in U1/U2, no remote wakeup is needed. The Start
- * Transfer command will initiate the link recovery.
- */
- link_state = dwc3_gadget_get_link_state(dwc);
- switch (link_state) {
- case DWC3_LINK_STATE_U2:
- if (dwc->gadget->speed >= USB_SPEED_SUPER)
- break;
-
- fallthrough;
- case DWC3_LINK_STATE_U3:
- ret = __dwc3_gadget_wakeup(dwc, false);
- dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
- ret);
- break;
- }
- }
-
/*
* For some commands such as Update Transfer command, DEPCMDPARn
* registers are reserved. Since the driver often sends Update Transfer
--
2.25.1
I'm announcing the release of the 6.10.6 kernel.
All users of the 6.10 kernel series must upgrade.
The updated 6.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/loongarch/include/uapi/asm/unistd.h | 1
drivers/ata/libata-scsi.c | 15
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 232 ++++--------
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 90 +++-
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8
drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 -
drivers/nvme/host/pci.c | 7
drivers/platform/x86/Kconfig | 1
drivers/platform/x86/amd/pmf/spc.c | 32 -
drivers/platform/x86/ideapad-laptop.c | 148 ++++++-
drivers/platform/x86/ideapad-laptop.h | 9
drivers/platform/x86/lenovo-ymc.c | 60 ---
fs/binfmt_flat.c | 4
fs/exec.c | 8
fs/f2fs/extent_cache.c | 48 --
fs/f2fs/f2fs.h | 2
fs/f2fs/gc.c | 10
fs/f2fs/inode.c | 10
fs/jfs/jfs_dmap.c | 2
fs/jfs/jfs_dtree.c | 2
fs/ntfs3/frecord.c | 75 +++
net/core/filter.c | 8
net/ipv4/fou_core.c | 2
sound/soc/codecs/cs35l56-shared.c | 1
sound/usb/mixer.c | 7
29 files changed, 486 insertions(+), 355 deletions(-)
Chao Yu (2):
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
f2fs: fix to cover read extent cache access with lock
Edward Adam Davis (1):
jfs: fix null ptr deref in dtInsertEntry
Fangzhi Zuo (1):
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Gergo Koteles (3):
platform/x86: ideapad-laptop: introduce a generic notification chain
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Greg Kroah-Hartman (2):
Revert "drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()"
Linux 6.10.6
Harry Wentland (1):
drm/amd/display: Separate setting and programming of cursor
Huacai Chen (1):
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Kees Cook (2):
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Konstantin Komarov (1):
fs/ntfs3: Do copy_to_user out of run_lock
Niklas Cassel (1):
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Pei Li (1):
jfs: Fix shift-out-of-bounds in dbDiscardAG
Sean Young (1):
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Shyam Sundar S K (1):
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Simon Trimmer (1):
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
Srinivasan Shanmugam (1):
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Takashi Iwai (1):
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
WangYuli (1):
nvme/pci: Add APST quirk for Lenovo N60z laptop
Wayne Lin (2):
drm/amd/display: Defer handling mst up request in resume
drm/amd/display: Solve mst monitors blank out problem after resume
Willem de Bruijn (1):
fou: remove warn in gue_gro_receive on unsupported protocol
yunshui (1):
bpf, net: Use DEV_STAT_INC()
I'm announcing the release of the 6.1.106 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/kvm/hyp/pgtable.c | 10 -
arch/loongarch/include/uapi/asm/unistd.h | 1
drivers/ata/libata-scsi.c | 15 +
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 192 ++++++++++++++++------
drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 ----
drivers/nvme/host/pci.c | 7
fs/binfmt_flat.c | 4
fs/exec.c | 8
fs/lockd/svc.c | 3
fs/nfs/callback.c | 3
fs/nfsd/export.c | 32 ++-
fs/nfsd/export.h | 4
fs/nfsd/netns.h | 25 ++
fs/nfsd/nfs4proc.c | 6
fs/nfsd/nfscache.c | 201 ++++++++++++++----------
fs/nfsd/nfsctl.c | 24 +-
fs/nfsd/nfsd.h | 1
fs/nfsd/nfsfh.c | 3
fs/nfsd/nfssvc.c | 24 +-
fs/nfsd/stats.c | 52 ++----
fs/nfsd/stats.h | 85 +++-------
fs/nfsd/trace.h | 22 ++
fs/nfsd/vfs.c | 6
include/linux/cgroup-defs.h | 7
include/linux/sunrpc/svc.h | 5
kernel/cgroup/cgroup-internal.h | 3
kernel/cgroup/cgroup.c | 23 +-
net/mptcp/options.c | 3
net/mptcp/pm_netlink.c | 49 +++--
net/mptcp/pm_userspace.c | 2
net/mptcp/protocol.h | 2
net/sunrpc/stats.c | 2
net/sunrpc/svc.c | 36 ++--
net/wireless/nl80211.c | 6
sound/soc/soc-topology.c | 32 ---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 +
38 files changed, 572 insertions(+), 379 deletions(-)
Amadeusz Sławiński (2):
ASoC: topology: Clean up route loading
ASoC: topology: Fix route memory corruption
Andi Shyti (2):
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
drm/i915/gem: Adjust vma offset for framebuffer mmap offset
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Dan Carpenter (1):
drm/i915: Fix a NULL vs IS_ERR() bug
Eric Dumazet (1):
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Geliang Tang (1):
mptcp: pass addr to mptcp_pm_alloc_anno_list
Greg Kroah-Hartman (1):
Linux 6.1.106
Huacai Chen (1):
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
Kees Cook (2):
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Matthieu Baerts (NGI0) (5):
mptcp: pm: reduce indentation blocks
mptcp: pm: don't try to create sf if alloc failed
mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
selftests: mptcp: join: test both signal & subflow
mptcp: fully established after ADD_ADDR echo on MPJ
Niklas Cassel (1):
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Nirmoy Das (1):
drm/i915: Add a function to mmap framebuffer obj
Sean Young (1):
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Waiman Long (1):
cgroup: Move rcu_head up near the top of cgroup_root
WangYuli (1):
nvme/pci: Add APST quirk for Lenovo N60z laptop
Will Deacon (1):
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Yafang Shao (1):
cgroup: Make operations on the cgroup root_list RCU safe
On Sat, 2024-08-17 at 10:41 +0100, Martin Whitaker wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
>
> When performing the port_hwtstamp_set operation,
> ptp_schedule_worker()
> will be called if hardware timestamoing is enabled on any of the
> ports.
> When using multiple ports for PTP, port_hwtstamp_set is executed for
> each port. When called for the first time ptp_schedule_worker()
> returns
> 0. On subsequent calls it returns 1, indicating the worker is already
> scheduled. Currently the ksz driver treats 1 as an error and fails to
> complete the port_hwtstamp_set operation, thus leaving the
> timestamping
> configuration for those ports unchanged.
>
> This patch fixes this by ignoring the ptp_schedule_worker() return
> value.
>
> Link:
> https://lore.kernel.org/netdev/7aae307a-35ca-4209-a850-7b2749d40f90@martin-…
> Fixes: bb01ad30570b0 ("net: dsa: microchip: ptp: manipulating
> absolute time using ptp hw clock")
> Signed-off-by: Martin Whitaker <foss(a)martin-whitaker.me.uk>
Acked-by: Arun Ramadoss <arun.ramadoss(a)microchip.com>
From: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
Due to the lack of checks on the clc array, if the firmware supports
more clc configuration, it will cause illegal memory access.
Cc: stable(a)vger.kernel.org
Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
---
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
index 9dc22fbe25d3..c6c380571fd8 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
@@ -638,6 +638,9 @@ static int mt7925_load_clc(struct mt792x_dev *dev, const char *fw_name)
for (offset = 0; offset < len; offset += le32_to_cpu(clc->len)) {
clc = (const struct mt7925_clc *)(clc_base + offset);
+ if (clc->idx > ARRAY_SIZE(phy->clc))
+ break;
+
/* do not init buf again if chip reset triggered */
if (phy->clc[clc->idx])
continue;
--
2.18.0
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
[ Upstream commit 37ae5a0f5287a52cf51242e76ccf198d02ffe495]
Since lo_simple_ioctl(LOOP_SET_BLOCK_SIZE) and ioctl(NBD_SET_BLKSIZE) pass
user-controlled "unsigned long arg" to blk_validate_block_size(),
"unsigned long" should be used for validation.
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/9ecbf057-4375-c2db-ab53-e4cc0dff953d@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
(cherry picked from commit 37ae5a0f5287a52cf51242e76ccf198d02ffe495)
Signed-off-by: David Hunter <david.hunter.linux(a)gmail.com>
---
V1 --> V2
- put upstream commit after subject
- put the original Author
- Added a few people I needed to CC
---
include/linux/blkdev.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 905844172cfd..c6d57814988d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -235,7 +235,7 @@ struct request {
void *end_io_data;
};
-static inline int blk_validate_block_size(unsigned int bsize)
+static inline int blk_validate_block_size(unsigned long bsize)
{
if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
return -EINVAL;
--
2.43.0
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags
Author: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Wed Aug 7 09:22:10 2024 +0200
The cec_msg_set_reply_to() helper function never zeroed the
struct cec_msg flags field, this can cause unexpected behavior
if flags was uninitialized to begin with.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
include/uapi/linux/cec.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
---
diff --git a/include/uapi/linux/cec.h b/include/uapi/linux/cec.h
index 894fffc66f2c..b2af1dddd4d7 100644
--- a/include/uapi/linux/cec.h
+++ b/include/uapi/linux/cec.h
@@ -132,6 +132,8 @@ static inline void cec_msg_init(struct cec_msg *msg,
* Set the msg destination to the orig initiator and the msg initiator to the
* orig destination. Note that msg and orig may be the same pointer, in which
* case the change is done in place.
+ *
+ * It also zeroes the reply, timeout and flags fields.
*/
static inline void cec_msg_set_reply_to(struct cec_msg *msg,
struct cec_msg *orig)
@@ -139,7 +141,9 @@ static inline void cec_msg_set_reply_to(struct cec_msg *msg,
/* The destination becomes the initiator and vice versa */
msg->msg[0] = (cec_msg_destination(orig) << 4) |
cec_msg_initiator(orig);
- msg->reply = msg->timeout = 0;
+ msg->reply = 0;
+ msg->timeout = 0;
+ msg->flags = 0;
}
/**
Zero and negative number is not a valid IRQ for in-kernel code and the
irq_of_parse_and_map() function returns zero on error. So this check for
valid IRQs should only accept values > 0.
Cc: stable(a)vger.kernel.org
Fixes: 2d9e31b9412c ("dmaengine: moxart: remove NO_IRQ")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- added missed changelog v2.
Changes in v2:
- added Cc stable line;
- added Fixes line.
---
drivers/dma/moxart-dma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/moxart-dma.c b/drivers/dma/moxart-dma.c
index 66dc6d31b603..16dd3c5aba4d 100644
--- a/drivers/dma/moxart-dma.c
+++ b/drivers/dma/moxart-dma.c
@@ -568,7 +568,7 @@ static int moxart_probe(struct platform_device *pdev)
return -ENOMEM;
irq = irq_of_parse_and_map(node, 0);
- if (!irq) {
+ if (irq <= 0) {
dev_err(dev, "no IRQ resource\n");
return -EINVAL;
}
--
2.25.1
This is the start of the stable review cycle for the 6.10.6 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 18 Aug 2024 08:52:13 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.6-rc2…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.10.6-rc2
Aurabindo Pillai <aurabindo.pillai(a)amd.com>
drm/amd/display: Add misc DC changes for DCN401
Niklas Cassel <cassel(a)kernel.org>
Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
Sean Young <sean(a)mess.org>
media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Wayne Lin <Wayne.Lin(a)amd.com>
drm/amd/display: Solve mst monitors blank out problem after resume
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: introduce a generic notification chain
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Simon Trimmer <simont(a)opensource.cirrus.com>
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Fangzhi Zuo <jerry.zuo(a)amd.com>
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Harry Wentland <harry.wentland(a)amd.com>
drm/amd/display: Separate setting and programming of cursor
Wayne Lin <wayne.lin(a)amd.com>
drm/amd/display: Defer handling mst up request in resume
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/ata/libata-scsi.c | 15 ++-
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +-
.../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 6 +
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 94 ++++++++-----
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8 ++
.../drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2 +-
drivers/media/usb/dvb-usb/dvb-usb-init.c | 35 +----
drivers/nvme/host/pci.c | 7 +
drivers/platform/x86/Kconfig | 1 +
drivers/platform/x86/amd/pmf/spc.c | 32 ++---
drivers/platform/x86/ideapad-laptop.c | 148 ++++++++++++++++++---
drivers/platform/x86/ideapad-laptop.h | 9 ++
drivers/platform/x86/lenovo-ymc.c | 60 +--------
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/f2fs/extent_cache.c | 50 +++----
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 ++
fs/f2fs/inode.c | 10 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/ntfs3/frecord.c | 75 ++++++++++-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
sound/soc/codecs/cs35l56-shared.c | 1 +
sound/usb/mixer.c | 7 +
29 files changed, 411 insertions(+), 212 deletions(-)
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Interrupt status read seems to be broken on some old MPU-6050 like
chips. Fix by reverting to previous driver behavior bypassing interrupt
status read. This is working because these chips are not supporting
WoM and data ready is the only interrupt source.
Fixes: 5537f653d9be ("iio: imu: inv_mpu6050: add new interrupt handler for WoM events")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index 84273660ca2e..3bfeabab0ec4 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -248,12 +248,20 @@ static irqreturn_t inv_mpu6050_interrupt_handle(int irq, void *p)
int result;
switch (st->chip_type) {
+ case INV_MPU6000:
case INV_MPU6050:
+ case INV_MPU9150:
+ /*
+ * WoM is not supported and interrupt status read seems to be broken for
+ * some chips. Since data ready is the only interrupt, bypass interrupt
+ * status read and always assert data ready bit.
+ */
+ wom_bits = 0;
+ int_status = INV_MPU6050_BIT_RAW_DATA_RDY_INT;
+ goto data_ready_interrupt;
case INV_MPU6500:
case INV_MPU6515:
case INV_MPU6880:
- case INV_MPU6000:
- case INV_MPU9150:
case INV_MPU9250:
case INV_MPU9255:
wom_bits = INV_MPU6500_BIT_WOM_INT;
@@ -279,6 +287,7 @@ static irqreturn_t inv_mpu6050_interrupt_handle(int irq, void *p)
}
}
+data_ready_interrupt:
/* handle raw data interrupt */
if (int_status & INV_MPU6050_BIT_RAW_DATA_RDY_INT) {
indio_dev->pollfunc->timestamp = st->it_timestamp;
--
2.34.1
On Fri, Aug 16, 2024 at 08:12:36PM +0000, Jon Hunter wrote:
>
> ________________________________
> From: Jon Hunter <jonathanh(a)nvidia.com>
> Sent: Friday, August 16, 2024 2:43 PM
> To: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; patches(a)lists.linux.dev <patches(a)lists.linux.dev>; linux-kernel(a)vger.kernel.org <linux-kernel(a)vger.kernel.org>; torvalds(a)linux-foundation.org <torvalds(a)linux-foundation.org>; akpm(a)linux-foundation.org <akpm(a)linux-foundation.org>; linux(a)roeck-us.net <linux(a)roeck-us.net>; shuah(a)kernel.org <shuah(a)kernel.org>; patches(a)kernelci.org <patches(a)kernelci.org>; lkft-triage(a)lists.linaro.org <lkft-triage(a)lists.linaro.org>; pavel(a)denx.de <pavel(a)denx.de>; Jon Hunter <jonathanh(a)nvidia.com>; f.fainelli(a)gmail.com <f.fainelli(a)gmail.com>; sudipm.mukherjee(a)gmail.com <sudipm.mukherjee(a)gmail.com>; srw(a)sladewatkins.net <srw(a)sladewatkins.net>; rwarsow(a)gmx.de <rwarsow(a)gmx.de>; conor(a)kernel.org <conor(a)kernel.org>; allen.lkml(a)gmail.com <allen.lkml(a)gmail.com>; broonie(a)kernel.org <broonie(a)kernel.org>; linux-tegra(a)vger.kernel.org <linux-tegra(a)vger.kernel.org>; stable(a)vger.kernel.org <stable(a)vger.kernel.org>
> Subject: Re: [PATCH 5.10 000/350] 5.10.224-rc2 review
>
> On Fri, 16 Aug 2024 12:22:05 +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 5.10.224 release.
> > There are 350 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun, 18 Aug 2024 10:14:04 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.224-r…
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
> Failures detected for Tegra ...
>
> Test results for stable-v5.10:
> 10 builds: 10 pass, 0 fail
> 31 boots: 26 pass, 5 fail
> 45 tests: 44 pass, 1 fail
>
> Linux version: 5.10.224-rc2-g470450f8c61c
> Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
> tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
> tegra20-ventana, tegra210-p2371-2180,
> tegra210-p3450-0000, tegra30-cardhu-a04
>
> Boot failures: tegra186-p2771-0000, tegra210-p2371-2180,
> tegra210-p3450-0000
>
> Test failures: tegra194-p2972-0000: boot.py
>
> ---
>
> Apologies for the mail formatting. I am travelling and only have outlook for mobile :-(
>
> Bisect points to the following commit ...
>
> # first bad commit: [4bade5a6b1cfe81c9777aa3c8823009ff28a6e7f] memory: fsl_ifc: Make FSL_IFC config visible and selectable
>
> Reverting this does fix the issue. Seems odd but this appears to disable CONFIG_MEMORY for v5.10 with ARM64 defconfig. So something we need to fix.
Ah, that's a mess. I'll go drop this one for now, glad it's not showing
up on 5.15.y where this commit also is. It's not really important for
5.10.y so there's no harm in removing it.
thanks,
greg k-h
This series begins with some work on the mac_scsi driver to improve
compatibility with SCSI2SD v5 devices. Better error handling is needed
there because the PDMA hardware does not tolerate the write latency spikes
which SD cards can produce.
A bug is fixed in the 5380 core driver so that scatter/gather can be
enabled in mac_scsi.
Several patches at the end of this series improve robustness and correctness
in the core driver.
This series has been tested on a variety of mac_scsi hosts. A variety of
SCSI targets was also tested, including Quantum HDD, Fujitsu HDD, Iomega FDD,
Ricoh CD-RW, Matsushita CD-ROM, SCSI2SD and BlueSCSI.
Finn Thain (11):
scsi: mac_scsi: Revise printk(KERN_DEBUG ...) messages
scsi: mac_scsi: Refactor polling loop
scsi: mac_scsi: Disallow bus errors during PDMA send
scsi: NCR5380: Check for phase match during PDMA fixup
scsi: mac_scsi: Enable scatter/gather by default
scsi: NCR5380: Initialize buffer for MSG IN and STATUS transfers
scsi: NCR5380: Handle BSY signal loss during information transfer
phases
scsi: NCR5380: Drop redundant member from struct NCR5380_cmd
scsi: NCR5380: Remove redundant result calculation from
NCR5380_transfer_pio()
scsi: NCR5380: Remove obsolete comment
scsi: NCR5380: Clean up indentation
drivers/scsi/NCR5380.c | 233 +++++++++++++++++++--------------------
drivers/scsi/NCR5380.h | 20 ++--
drivers/scsi/mac_scsi.c | 170 ++++++++++++++--------------
drivers/scsi/sun3_scsi.c | 2 +-
4 files changed, 215 insertions(+), 210 deletions(-)
--
2.39.5
The patch titled
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Petr Tesarik <ptesarik(a)suse.com>
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Date: Mon, 5 Aug 2024 17:07:50 +0200
Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.
The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array. Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded. Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.
Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <ptesarik(a)suse.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Hari Bathini <hbathini(a)linux.ibm.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Eric DeVolder <eric_devolder(a)yahoo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/kexec_file.c~kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y
+++ a/kernel/kexec_file.c
@@ -752,7 +752,7 @@ static int kexec_calculate_store_digests
#ifdef CONFIG_CRASH_HOTPLUG
/* Exclude elfcorehdr segment to allow future changes via hotplug */
- if (j == image->elfcorehdr_index)
+ if (i == image->elfcorehdr_index)
continue;
#endif
_
Patches currently in -mm which might be from ptesarik(a)suse.com are
kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y.patch
This is the start of the stable review cycle for the 6.1.106 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.106-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.106-rc1
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Eric Dumazet <edumazet(a)google.com>
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Waiman Long <longman(a)redhat.com>
cgroup: Move rcu_head up near the top of cgroup_root
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Andi Shyti <andi.shyti(a)linux.intel.com>
drm/i915/gem: Adjust vma offset for framebuffer mmap offset
Dan Carpenter <dan.carpenter(a)linaro.org>
drm/i915: Fix a NULL vs IS_ERR() bug
Nirmoy Das <nirmoy.das(a)intel.com>
drm/i915: Add a function to mmap framebuffer obj
Yafang Shao <laoar.shao(a)gmail.com>
cgroup: Make operations on the cgroup root_list RCU safe
Andi Shyti <andi.shyti(a)linux.intel.com>
drm/i915/gem: Fix Virtual Memory mapping boundaries calculation
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: fully established after ADD_ADDR echo on MPJ
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make svc_stat per-network namespace instead of global
Josef Bacik <josef(a)toxicpanda.com>
nfsd: remove nfsd_stats, make th_cnt a global counter
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make all of the nfsd stats per-network namespace
Josef Bacik <josef(a)toxicpanda.com>
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
Josef Bacik <josef(a)toxicpanda.com>
nfsd: rename NFSD_NET_* to NFSD_STATS_*
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: use the struct net as the svc proc private
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: remove ->pg_stats from svc_program
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: pass in the sv_stats struct through svc_create_pooled
Josef Bacik <josef(a)toxicpanda.com>
nfsd: stop setting ->pg_stats for unused stats
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: don't change ->sv_stats if it doesn't exist
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix frame size warning in svc_export_parse()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor the duplicate reply cache shrinker
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace nfsd_prune_bucket()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename nfsd_reply_cache_alloc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_reply_cache_free_locked()
Jeff Layton <jlayton(a)kernel.org>
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Jeff Layton <jlayton(a)kernel.org>
nfsd: move reply cache initialization into nfsd startup
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Fix route memory corruption
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Clean up route loading
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
selftests: mptcp: join: test both signal & subflow
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: don't try to create sf if alloc failed
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: reduce indentation blocks
Geliang Tang <geliang.tang(a)suse.com>
mptcp: pass addr to mptcp_pm_alloc_anno_list
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 10 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 192 ++++++++++++++++------
drivers/gpu/drm/i915/gem/i915_gem_mman.h | 2 +-
drivers/nvme/host/pci.c | 7 +
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 201 ++++++++++++++----------
fs/nfsd/nfsctl.c | 24 ++-
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 24 +--
fs/nfsd/stats.c | 52 +++---
fs/nfsd/stats.h | 83 ++++------
fs/nfsd/trace.h | 22 +++
fs/nfsd/vfs.c | 6 +-
include/linux/cgroup-defs.h | 7 +-
include/linux/sunrpc/svc.h | 5 +-
kernel/cgroup/cgroup-internal.h | 3 +-
kernel/cgroup/cgroup.c | 23 ++-
net/mptcp/options.c | 3 +-
net/mptcp/pm_netlink.c | 49 +++---
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.h | 2 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 +++--
net/wireless/nl80211.c | 6 +-
sound/soc/soc-topology.c | 32 +---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 14 ++
36 files changed, 555 insertions(+), 346 deletions(-)
This is the start of the stable review cycle for the 6.6.47 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.47-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.47-rc1
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
Will Deacon <will(a)kernel.org>
KVM: arm64: Don't defer TLB invalidation when zapping table entries
Waiman Long <longman(a)redhat.com>
cgroup: Move rcu_head up near the top of cgroup_root
Peter Xu <peterx(a)redhat.com>
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Revert "Input: bcm5974 - check endpoint type before starting traffic"
Dave Kleikamp <dave.kleikamp(a)oracle.com>
Revert "jfs: fix shift-out-of-bounds in dbJoin"
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Wojciech Gładysz <wojciech.gladysz(a)infogain.com>
ext4: sanity check for NULL pointer after ext4_force_shutdown
Matthew Wilcox (Oracle) <willy(a)infradead.org>
ext4: convert ext4_da_do_write_end() to take a folio
Eric Dumazet <edumazet(a)google.com>
wifi: cfg80211: restrict NL80211_ATTR_TXQ_QUANTUM values
Peter Xu <peterx(a)redhat.com>
mm/page_table_check: support userfault wr-protect entries
Jan Kara <jack(a)suse.cz>
ext4: do not create EA inode under buffer lock
Jan Kara <jack(a)suse.cz>
ext4: fold quota accounting into ext4_xattr_inode_lookup_create()
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: RFCOMM: Fix not validating setsockopt user input
Eric Dumazet <edumazet(a)google.com>
nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
Eric Dumazet <edumazet(a)google.com>
net: add copy_safe_from_sockptr() helper
Eric Dumazet <edumazet(a)google.com>
mISDN: fix MISDN_TIME_STAMP handling
Gustavo A. R. Silva <gustavoars(a)kernel.org>
fs: Annotate struct file_handle with __counted_by() and use struct_size()
Alexei Starovoitov <ast(a)kernel.org>
bpf: Avoid kfree_rcu() under lock in bpf_lpm_trie.
Kees Cook <keescook(a)chromium.org>
bpf: Replace bpf_lpm_trie_key 0-length array with flexible array
Gavrilov Ilia <Ilia.Gavrilov(a)infotecs.ru>
pppoe: Fix memory leak in pppoe_sendmsg()
Dmitry Antipov <dmantipov(a)yandex.ru>
net: sctp: fix skb leak in sctp_inq_free()
Allison Henderson <allison.henderson(a)oracle.com>
net:rds: Fix possible deadlock in rds_message_put
Jan Kara <jack(a)suse.cz>
quota: Detect loops in quota tree
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Input: bcm5974 - check endpoint type before starting traffic
John Fastabend <john.fastabend(a)gmail.com>
net: tls, add test to capture error on large splice
Gao Xiang <xiang(a)kernel.org>
erofs: avoid debugging output for (de)compressed data
Edward Adam Davis <eadavis(a)qq.com>
reiserfs: fix uninit-value in comp_keys
Phillip Lougher <phillip(a)squashfs.org.uk>
Squashfs: fix variable overflow triggered by sysbot
Lizhi Xu <lizhi.xu(a)windriver.com>
squashfs: squashfs_read_data need to check if the length is 0
Manas Ghandat <ghandatmanas(a)gmail.com>
jfs: fix shift-out-of-bounds in dbJoin
Jakub Kicinski <kuba(a)kernel.org>
net: don't dump stack on queue timeout
Lizhi Xu <lizhi.xu(a)windriver.com>
jfs: fix log->bdev_handle null ptr deref in lbmStartIO
Jan Kara <jack(a)suse.cz>
jfs: Convert to bdev_open_by_dev()
Jan Kara <jack(a)suse.cz>
fs: Convert to bdev_open_by_dev()
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: fix change_address deadlock during unregister
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: take wiphy lock for MAC addr change
Eric Dumazet <edumazet(a)google.com>
tcp_metrics: optimize tcp_metrics_flush_all()
Yafang Shao <laoar.shao(a)gmail.com>
cgroup: Make operations on the cgroup root_list RCU safe
Dongli Zhang <dongli.zhang(a)oracle.com>
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
David Stevens <stevensd(a)chromium.org>
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Yang Shi <yang(a)os.amperecomputing.com>
mm: gup: stop abusing try_grab_folio
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make svc_stat per-network namespace instead of global
Josef Bacik <josef(a)toxicpanda.com>
nfsd: remove nfsd_stats, make th_cnt a global counter
Josef Bacik <josef(a)toxicpanda.com>
nfsd: make all of the nfsd stats per-network namespace
Josef Bacik <josef(a)toxicpanda.com>
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
Josef Bacik <josef(a)toxicpanda.com>
nfsd: rename NFSD_NET_* to NFSD_STATS_*
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: use the struct net as the svc proc private
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: remove ->pg_stats from svc_program
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: pass in the sv_stats struct through svc_create_pooled
Josef Bacik <josef(a)toxicpanda.com>
nfsd: stop setting ->pg_stats for unused stats
Josef Bacik <josef(a)toxicpanda.com>
sunrpc: don't change ->sv_stats if it doesn't exist
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix frame size warning in svc_export_parse()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Fix route memory corruption
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Clean up route loading
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Documentation/bpf/map_lpm_trie.rst | 2 +-
Documentation/mm/page_table_check.rst | 9 +-
Makefile | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 12 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
arch/x86/include/asm/pgtable.h | 18 +-
drivers/isdn/mISDN/socket.c | 10 +-
drivers/net/ppp/pppoe.c | 23 +--
drivers/nvme/host/pci.c | 7 +
fs/binfmt_flat.c | 4 +-
fs/buffer.c | 2 +
fs/cramfs/inode.c | 2 +-
fs/erofs/decompressor.c | 8 +-
fs/exec.c | 8 +-
fs/ext4/inode.c | 24 ++-
fs/ext4/xattr.c | 155 +++++++-------
fs/f2fs/extent_cache.c | 50 ++---
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 +
fs/f2fs/inode.c | 10 +-
fs/fhandle.c | 6 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/jfs/jfs_logmgr.c | 33 +--
fs/jfs/jfs_logmgr.h | 2 +-
fs/jfs/jfs_mount.c | 3 +-
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/cache.h | 2 -
fs/nfsd/export.c | 32 ++-
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfs4state.c | 3 +-
fs/nfsd/nfscache.c | 40 +---
fs/nfsd/nfsctl.c | 16 +-
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 14 +-
fs/nfsd/stats.c | 54 ++---
fs/nfsd/stats.h | 88 +++-----
fs/nfsd/vfs.c | 6 +-
fs/ntfs3/frecord.c | 75 ++++++-
fs/quota/quota_tree.c | 128 +++++++++---
fs/quota/quota_v2.c | 15 +-
fs/reiserfs/stree.c | 2 +-
fs/romfs/super.c | 2 +-
fs/squashfs/block.c | 2 +-
fs/squashfs/file.c | 3 +-
fs/squashfs/file_direct.c | 6 +-
fs/super.c | 15 +-
include/linux/cgroup-defs.h | 7 +-
include/linux/fs.h | 3 +-
include/linux/sockptr.h | 25 +++
include/linux/sunrpc/svc.h | 5 +-
include/uapi/linux/bpf.h | 19 +-
kernel/bpf/lpm_trie.c | 33 +--
kernel/cgroup/cgroup-internal.h | 3 +-
kernel/cgroup/cgroup.c | 23 ++-
kernel/irq/cpuhotplug.c | 27 ++-
kernel/irq/manage.c | 12 +-
mm/debug_vm_pgtable.c | 31 +--
mm/gup.c | 251 ++++++++++++-----------
mm/huge_memory.c | 6 +-
mm/hugetlb.c | 2 +-
mm/internal.h | 4 +-
mm/page_table_check.c | 30 +++
net/bluetooth/rfcomm/sock.c | 14 +-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
net/ipv4/tcp_metrics.c | 7 +-
net/mac80211/iface.c | 27 ++-
net/nfc/llcp_sock.c | 12 +-
net/rds/recv.c | 13 +-
net/sched/sch_generic.c | 5 +-
net/sctp/inqueue.c | 14 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 39 ++--
net/wireless/nl80211.c | 6 +-
samples/bpf/map_perf_test_user.c | 2 +-
samples/bpf/xdp_router_ipv4_user.c | 2 +-
sound/soc/soc-topology.c | 32 +--
sound/usb/mixer.c | 7 +
tools/include/uapi/linux/bpf.h | 19 +-
tools/testing/selftests/bpf/progs/map_ptr_kern.c | 2 +-
tools/testing/selftests/bpf/test_lpm_map.c | 18 +-
tools/testing/selftests/net/tls.c | 14 ++
87 files changed, 987 insertions(+), 696 deletions(-)
From: Arnd Bergmann <arnd(a)arndb.de>
Both of these architectures require u64 function arguments to be
passed in even/odd pairs of registers or stack slots, which in case of
sync_file_range would result in a seven-argument system call that is
not currently possible. The system call is therefore incompatible with
all existing binaries.
While it would be possible to implement support for seven arguments
like on mips, it seems better to use a six-argument version, either
with the normal argument order but misaligned as on most architectures
or with the reordered sync_file_range2() calling conventions as on
arm and powerpc.
Cc: stable(a)vger.kernel.org
Acked-by: Guo Ren <guoren(a)kernel.org>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
arch/csky/include/uapi/asm/unistd.h | 1 +
arch/hexagon/include/uapi/asm/unistd.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/csky/include/uapi/asm/unistd.h b/arch/csky/include/uapi/asm/unistd.h
index 7ff6a2466af1..e0594b6370a6 100644
--- a/arch/csky/include/uapi/asm/unistd.h
+++ b/arch/csky/include/uapi/asm/unistd.h
@@ -6,6 +6,7 @@
#define __ARCH_WANT_SYS_CLONE3
#define __ARCH_WANT_SET_GET_RLIMIT
#define __ARCH_WANT_TIME32_SYSCALLS
+#define __ARCH_WANT_SYNC_FILE_RANGE2
#include <asm-generic/unistd.h>
#define __NR_set_thread_area (__NR_arch_specific_syscall + 0)
diff --git a/arch/hexagon/include/uapi/asm/unistd.h b/arch/hexagon/include/uapi/asm/unistd.h
index 432c4db1b623..21ae22306b5d 100644
--- a/arch/hexagon/include/uapi/asm/unistd.h
+++ b/arch/hexagon/include/uapi/asm/unistd.h
@@ -36,5 +36,6 @@
#define __ARCH_WANT_SYS_VFORK
#define __ARCH_WANT_SYS_FORK
#define __ARCH_WANT_TIME32_SYSCALLS
+#define __ARCH_WANT_SYNC_FILE_RANGE2
#include <asm-generic/unistd.h>
--
2.39.2
No upstream commit exists for this commit.
Fuzzing of 5.10 stable branch reports a slab-out-of-bounds error in
ata_scsi_pass_thru.
The error is fixed in 5.18 by commit ce70fd9a551a ("scsi: core: Remove the
cmd field from struct scsi_request") upstream.
Backporting this commit would require significant changes to the code so
it is better to use a simple fix for that particular error.
The problem is that the length of the received SCSI command is not
validated if scsi_op == VARIABLE_LENGTH_CMD. It can lead to out-of-bounds
reading if the user sends a request with SCSI command of length less than
32.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Acked-by: Damien Le Moal <dlemoal(a)kernel.org>
Co-developed-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Signed-off-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Co-developed-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
---
Link: https://lore.kernel.org/lkml/20240711151546.341491-1-ancowi69@gmail.com/T/#u
unfortunately, stable(a)vger.kernel.org wasn't initially mentioned.
drivers/ata/libata-scsi.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 36f32fa052df..4397986db053 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3949,6 +3949,9 @@ static unsigned int ata_scsi_var_len_cdb_xlat(struct ata_queued_cmd *qc)
const u8 *cdb = scmd->cmnd;
const u16 sa = get_unaligned_be16(&cdb[8]);
+ if (scmd->cmd_len != 32)
+ return 1;
+
/*
* if service action represents a ata pass-thru(32) command,
* then pass it to ata_scsi_pass_thru handler.
--
2.34.1
Although there are several patches improving the extent map shrinker,
there are still reports of too frequent shrinker behavior, taking too
much CPU for the kswapd process.
So let's only enable extent shrinker for now, until we got more
comprehensive understanding and a better solution.
Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ff…
Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gm…
Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps")
CC: stable(a)vger.kernel.org # 6.10+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
I also checked how XFS (the only other fs implemented the
free_cached_objects callback) implemented the callback.
They did two things:
- Make sure there is only one queued reclaim
Currently we only do the reclaim for kswapd, but for multi-node
systems, we can still have multiple kswapd processes.
But I do not think that's the root cause.
- With an extra delay of 60% of xfs_syncd_centiseccs
The default value for xfs_syncd_centiseccs is 3000 centiseconds (30s),
with a minimal 100 centiseconds (1s).
This results the reclaim work only to be executed at most every 18
seconds by default (or 0.6s for the minimal interval).
I believe this is the root cause, we have no extra delay and that
makes btrfs to shrink extent maps too frequently.
---
fs/btrfs/super.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 11044e9e2cb1..98fa0f382480 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2402,7 +2402,13 @@ static long btrfs_nr_cached_objects(struct super_block *sb, struct shrink_contro
trace_btrfs_extent_map_shrinker_count(fs_info, nr);
- return nr;
+ /*
+ * Only report the real number for DEBUG builds, as there are reports of
+ * serious performance degradation caused by too frequent shrinks.
+ */
+ if (IS_ENABLED(CONFIG_BTRFS_DEBUG))
+ return nr;
+ return 0;
}
static long btrfs_free_cached_objects(struct super_block *sb, struct shrink_control *sc)
--
2.46.0
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index f42ebc4a7c22..a0e433fbcba6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -360,6 +360,8 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -383,11 +385,17 @@ void *vmw_bo_map_and_cache_size(struct vmw_bo *vbo, size_t size)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -421,6 +429,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
xa_init(&vmw_bo->detached_resources);
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 62b4342d5f7c..43b5439ec9f7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -71,6 +71,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -90,6 +92,7 @@ struct vmw_bo {
u32 res_prios[TTM_MAX_BO_PRIORITY];
struct xarray detached_resources;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.43.0
It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match()
There is no reason it should not be done here
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
Cc: stable(a)vger.kernel.org
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters")
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 786ceae34488..e417ff0ea06c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1244,7 +1244,7 @@ static u64 hash_filter_ntuple(struct ch_filter_specification *fs,
* in the Compressed Filter Tuple.
*/
if (tp->vlan_shift >= 0 && fs->mask.ivlan)
- ntuple |= (FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift;
+ ntuple |= (u64)(FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift;
if (tp->port_shift >= 0 && fs->mask.iport)
ntuple |= (u64)fs->val.iport << tp->port_shift;
--
2.34.1
Greetings,
Did you receive my last email message I sent to this Email
address: ( stable(a)vger.kernel.org ) concerning relocating my
investment to your country due to the on going war in my country
Russia.
Best Regards,
Mr.Boris Soroka.
From: Steven Rostedt <rostedt(a)goodmis.org>
When running the following:
# cd /sys/kernel/tracing/
# echo 1 > events/sched/sched_waking/enable
# echo 1 > events/sched/sched_switch/enable
# echo 0 > tracing_on
# dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
The dd task would get stuck in an infinite loop in the kernel. What would
happen is the following:
When ring_buffer_read_page() returns -1 (no data) then a check is made to
see if the buffer is empty (as happens when the page is not full), it will
call wait_on_pipe() to wait until the ring buffer has data. When it is it
will try again to read data (unless O_NONBLOCK is set).
The issue happens when there's a reader and the file descriptor is closed.
The wait_on_pipe() will return when that is the case. But this loop will
continue to try again and wait_on_pipe() will again return immediately and
the loop will continue and never stop.
Simply check if the file was closed before looping and exit out if it is.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home
Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 10cd38bce2f1..ebe7ce2f5f4a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7956,7 +7956,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !iter->closed) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
--
2.43.0
This is the start of the stable review cycle for the 6.10.6 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Aug 2024 13:18:17 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.6-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.10.6-rc1
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu/display: Fix null pointer dereference in dc_stream_program_cursor_position
Wayne Lin <Wayne.Lin(a)amd.com>
drm/amd/display: Solve mst monitors blank out problem after resume
Kees Cook <kees(a)kernel.org>
binfmt_flat: Fix corruption when not offsetting data start
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
Gergo Koteles <soyer(a)irl.hu>
platform/x86: ideapad-laptop: introduce a generic notification chain
Shyam Sundar S K <Shyam-sundar.S-k(a)amd.com>
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb: Fix UBSAN warning in parse_audio_unit()
Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
fs/ntfs3: Do copy_to_user out of run_lock
Pei Li <peili.dev(a)gmail.com>
jfs: Fix shift-out-of-bounds in dbDiscardAG
Edward Adam Davis <eadavis(a)qq.com>
jfs: fix null ptr deref in dtInsertEntry
Willem de Bruijn <willemb(a)google.com>
fou: remove warn in gue_gro_receive on unsupported protocol
Chao Yu <chao(a)kernel.org>
f2fs: fix to cover read extent cache access with lock
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
yunshui <jiangyunshui(a)kylinos.cn>
bpf, net: Use DEV_STAT_INC()
Simon Trimmer <simont(a)opensource.cirrus.com>
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
WangYuli <wangyuli(a)uniontech.com>
nvme/pci: Add APST quirk for Lenovo N60z laptop
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Fangzhi Zuo <jerry.zuo(a)amd.com>
drm/amd/display: Prevent IPX From Link Detect and Set Mode
Harry Wentland <harry.wentland(a)amd.com>
drm/amd/display: Separate setting and programming of cursor
Wayne Lin <wayne.lin(a)amd.com>
drm/amd/display: Defer handling mst up request in resume
Kees Cook <kees(a)kernel.org>
exec: Fix ToCToU between perm check and set-uid/gid usage
-------------
Diffstat:
Makefile | 4 +-
arch/loongarch/include/uapi/asm/unistd.h | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 94 ++++++++-----
drivers/gpu/drm/amd/display/dc/dc_stream.h | 8 ++
.../drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 2 +-
drivers/nvme/host/pci.c | 7 +
drivers/platform/x86/Kconfig | 1 +
drivers/platform/x86/amd/pmf/spc.c | 32 ++---
drivers/platform/x86/ideapad-laptop.c | 148 ++++++++++++++++++---
drivers/platform/x86/ideapad-laptop.h | 9 ++
drivers/platform/x86/lenovo-ymc.c | 60 +--------
fs/binfmt_flat.c | 4 +-
fs/exec.c | 8 +-
fs/f2fs/extent_cache.c | 50 +++----
fs/f2fs/f2fs.h | 2 +-
fs/f2fs/gc.c | 10 ++
fs/f2fs/inode.c | 10 +-
fs/jfs/jfs_dmap.c | 2 +
fs/jfs/jfs_dtree.c | 2 +
fs/ntfs3/frecord.c | 75 ++++++++++-
net/core/filter.c | 8 +-
net/ipv4/fou_core.c | 2 +-
sound/soc/codecs/cs35l56-shared.c | 1 +
sound/usb/mixer.c | 7 +
26 files changed, 388 insertions(+), 179 deletions(-)
mwifiex_band_2ghz and mwifiex_band_5ghz are statically allocated, but
used and modified in driver instances. Duplicate them before using
them in driver instances so that different driver instances do not
influence each other.
This was observed on a board which has one PCIe and one SDIO mwifiex
adapter. It blew up in mwifiex_setup_ht_caps(). This was called with
the statically allocated struct which is modified in this function.
Cc: stable(a)vger.kernel.org
Fixes: d6bffe8bb520 ("mwifiex: support for creation of AP interface")
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 32 ++++++++++++++++++++-----
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index b909a7665e9cc..d2e4153192032 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -4361,11 +4361,27 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
if (ISSUPP_ADHOC_ENABLED(adapter->fw_cap_info))
wiphy->interface_modes |= BIT(NL80211_IFTYPE_ADHOC);
- wiphy->bands[NL80211_BAND_2GHZ] = &mwifiex_band_2ghz;
- if (adapter->config_bands & BAND_A)
- wiphy->bands[NL80211_BAND_5GHZ] = &mwifiex_band_5ghz;
- else
+ wiphy->bands[NL80211_BAND_2GHZ] = devm_kmemdup(adapter->dev,
+ &mwifiex_band_2ghz,
+ sizeof(mwifiex_band_2ghz),
+ GFP_KERNEL);
+ if (!wiphy->bands[NL80211_BAND_2GHZ]) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ if (adapter->config_bands & BAND_A) {
+ wiphy->bands[NL80211_BAND_5GHZ] = devm_kmemdup(adapter->dev,
+ &mwifiex_band_5ghz,
+ sizeof(mwifiex_band_5ghz),
+ GFP_KERNEL);
+ if (!wiphy->bands[NL80211_BAND_5GHZ]) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ } else {
wiphy->bands[NL80211_BAND_5GHZ] = NULL;
+ }
if (adapter->drcs_enabled && ISSUPP_DRCS_ENABLED(adapter->fw_cap_info))
wiphy->iface_combinations = &mwifiex_iface_comb_ap_sta_drcs;
@@ -4459,8 +4475,7 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
if (ret < 0) {
mwifiex_dbg(adapter, ERROR,
"%s: wiphy_register failed: %d\n", __func__, ret);
- wiphy_free(wiphy);
- return ret;
+ goto err;
}
if (!adapter->regd) {
@@ -4502,4 +4517,9 @@ int mwifiex_register_cfg80211(struct mwifiex_adapter *adapter)
adapter->wiphy = wiphy;
return ret;
+
+err:
+ wiphy_free(wiphy);
+
+ return ret;
}
---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20240809-mwifiex-duplicate-static-structs-f6355e8da797
Best regards,
--
Sascha Hauer <s.hauer(a)pengutronix.de>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: dfb3911c3692e45b027f13c7dca3230921533953
Gitweb: https://git.kernel.org/tip/dfb3911c3692e45b027f13c7dca3230921533953
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Wed, 14 Aug 2024 00:29:36 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 16 Aug 2024 11:33:33 +02:00
x86/kaslr: Expose and use the end of the physical memory address space
iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.
MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.
Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.
To prevent future hickups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.
Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions")
Reported-by: Max Ramanouski <max8rr8(a)gmail.com>
Reported-by: Alistair Popple <apopple(a)nvidia.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-By: Max Ramanouski <max8rr8(a)gmail.com>
Tested-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
---
arch/x86/include/asm/page_64.h | 1 +-
arch/x86/include/asm/pgtable_64_types.h | 4 ++++-
arch/x86/mm/init_64.c | 4 ++++-
arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++---
include/linux/mm.h | 4 ++++-
kernel/resource.c | 6 ++----
mm/memory_hotplug.c | 2 +-
mm/sparse.c | 2 +-
8 files changed, 40 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index af4302d..f3d257c 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -17,6 +17,7 @@ extern unsigned long phys_base;
extern unsigned long page_offset_base;
extern unsigned long vmalloc_base;
extern unsigned long vmemmap_base;
+extern unsigned long physmem_end;
static __always_inline unsigned long __phys_addr_nodebug(unsigned long x)
{
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 9053dfe..a98e534 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -140,6 +140,10 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
+#ifdef CONFIG_RANDOMIZE_MEMORY
+# define PHYSMEM_END physmem_end
+#endif
+
/*
* End of the region for which vmalloc page tables are pre-allocated.
* For non-KMSAN builds, this is the same as VMALLOC_END.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d8dbeac..ff25364 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -958,8 +958,12 @@ static void update_end_of_memory_vars(u64 start, u64 size)
int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
struct mhp_params *params)
{
+ unsigned long end = ((start_pfn + nr_pages) << PAGE_SHIFT) - 1;
int ret;
+ if (WARN_ON_ONCE(end > PHYSMEM_END))
+ return -ERANGE;
+
ret = __add_pages(nid, start_pfn, nr_pages, params);
WARN_ON_ONCE(ret);
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 37db264..0f2a3a4 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -47,13 +47,24 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE;
*/
static __initdata struct kaslr_memory_region {
unsigned long *base;
+ unsigned long *end;
unsigned long size_tb;
} kaslr_regions[] = {
- { &page_offset_base, 0 },
- { &vmalloc_base, 0 },
- { &vmemmap_base, 0 },
+ {
+ .base = &page_offset_base,
+ .end = &physmem_end,
+ },
+ {
+ .base = &vmalloc_base,
+ },
+ {
+ .base = &vmemmap_base,
+ },
};
+/* The end of the possible address space for physical memory */
+unsigned long physmem_end __ro_after_init;
+
/* Get size in bytes used by the memory region */
static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
@@ -82,6 +93,8 @@ void __init kernel_randomize_memory(void)
BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE);
BUILD_BUG_ON(vaddr_end > __START_KERNEL_map);
+ /* Preset the end of the possible address space for physical memory */
+ physmem_end = ((1ULL << MAX_PHYSMEM_BITS) - 1);
if (!kaslr_memory_enabled())
return;
@@ -134,6 +147,13 @@ void __init kernel_randomize_memory(void)
*/
vaddr += get_padding(&kaslr_regions[i]);
vaddr = round_up(vaddr + 1, PUD_SIZE);
+
+ /*
+ * KASLR trims the maximum possible size of the
+ * direct-map. Update the physmem_end boundary.
+ */
+ if (kaslr_regions[i].end)
+ *kaslr_regions[i].end = __pa(vaddr) - 1;
remain_entropy -= entropy;
}
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4b238a..b386415 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -97,6 +97,10 @@ extern const int mmap_rnd_compat_bits_max;
extern int mmap_rnd_compat_bits __read_mostly;
#endif
+#ifndef PHYSMEM_END
+# define PHYSMEM_END ((1ULL << MAX_PHYSMEM_BITS) - 1)
+#endif
+
#include <asm/page.h>
#include <asm/processor.h>
diff --git a/kernel/resource.c b/kernel/resource.c
index 14777af..a83040f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1826,8 +1826,7 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
if (flags & GFR_DESCENDING) {
resource_size_t end;
- end = min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ end = min_t(resource_size_t, base->end, PHYSMEM_END);
return end - size + 1;
}
@@ -1844,8 +1843,7 @@ static bool gfr_continue(struct resource *base, resource_size_t addr,
* @size did not wrap 0.
*/
return addr > addr - size &&
- addr <= min_t(resource_size_t, base->end,
- (1ULL << MAX_PHYSMEM_BITS) - 1);
+ addr <= min_t(resource_size_t, base->end, PHYSMEM_END);
}
static resource_size_t gfr_next(resource_size_t addr, resource_size_t size,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 66267c2..951878a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1681,7 +1681,7 @@ struct range __weak arch_get_mappable_range(void)
struct range mhp_get_pluggable_range(bool need_mapping)
{
- const u64 max_phys = (1ULL << MAX_PHYSMEM_BITS) - 1;
+ const u64 max_phys = PHYSMEM_END;
struct range mhp_range;
if (need_mapping) {
diff --git a/mm/sparse.c b/mm/sparse.c
index e4b8300..0c3bff8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -129,7 +129,7 @@ static inline int sparse_early_nid(struct mem_section *section)
static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
unsigned long *end_pfn)
{
- unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+ unsigned long max_sparsemem_pfn = (PHYSMEM_END + 1) >> PAGE_SHIFT;
/*
* Sanity checks - do not allow an architecture to pass
The quilt patch titled
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
has been removed from the -mm tree. Its filename was
alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
Date: Tue, 13 Aug 2024 08:07:57 -0700
During CMA activation, pages in CMA area are prepared and then freed
without being allocated. This triggers warnings when memory allocation
debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled. Fix this by
marking these pages not tagged before freeing them.
Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mm_init.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/mm_init.c~alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged
+++ a/mm/mm_init.c
@@ -2244,6 +2244,8 @@ void __init init_cma_reserved_pageblock(
set_pageblock_migratetype(page, MIGRATE_CMA);
set_page_refcounted(page);
+ /* pages were reserved and not allocated */
+ clear_page_tag_ref(page);
__free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: alloc_tag: introduce clear_page_tag_ref() helper function
has been removed from the -mm tree. Its filename was
alloc_tag-introduce-clear_page_tag_ref-helper-function.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: introduce clear_page_tag_ref() helper function
Date: Tue, 13 Aug 2024 08:07:56 -0700
In several cases we are freeing pages which were not allocated using
common page allocators. For such cases, in order to keep allocation
accounting correct, we should clear the page tag to indicate that the page
being freed is expected to not have a valid allocation tag. Introduce
clear_page_tag_ref() helper function to be used for this.
Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgalloc_tag.h | 13 +++++++++++++
mm/mm_init.c | 10 +---------
mm/page_alloc.c | 9 +--------
3 files changed, 15 insertions(+), 17 deletions(-)
--- a/include/linux/pgalloc_tag.h~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/include/linux/pgalloc_tag.h
@@ -43,6 +43,18 @@ static inline void put_page_tag_ref(unio
page_ext_put(page_ext_from_codetag_ref(ref));
}
+static inline void clear_page_tag_ref(struct page *page)
+{
+ if (mem_alloc_profiling_enabled()) {
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
+ }
+}
+
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
@@ -126,6 +138,7 @@ static inline void pgalloc_tag_sub_pages
static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
static inline void put_page_tag_ref(union codetag_ref *ref) {}
+static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
--- a/mm/mm_init.c~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/mm/mm_init.c
@@ -2459,15 +2459,7 @@ void __init memblock_free_pages(struct p
}
/* pages were reserved and not allocated */
- if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
- }
- }
-
+ clear_page_tag_ref(page);
__free_pages_core(page, order, MEMINIT_EARLY);
}
--- a/mm/page_alloc.c~alloc_tag-introduce-clear_page_tag_ref-helper-function
+++ a/mm/page_alloc.c
@@ -5815,14 +5815,7 @@ unsigned long free_reserved_area(void *s
void free_reserved_page(struct page *page)
{
- if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
- }
- }
+ clear_page_tag_ref(page);
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
_
Patches currently in -mm which might be from surenb(a)google.com are
The quilt patch titled
Subject: selftests: memfd_secret: don't build memfd_secret test on unsupported arches
has been removed from the -mm tree. Its filename was
selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: selftests: memfd_secret: don't build memfd_secret test on unsupported arches
Date: Fri, 9 Aug 2024 12:56:42 +0500
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and
x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise on
KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before
executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com
Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1]
Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/Makefile | 2 ++
tools/testing/selftests/mm/run_vmtests.sh | 3 +++
2 files changed, 5 insertions(+)
--- a/tools/testing/selftests/mm/Makefile~selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches
+++ a/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
--- a/tools/testing/selftests/mm/run_vmtests.sh~selftests-memfd_secret-dont-build-memfd_secret-test-on-unsupported-arches
+++ a/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
selftests-mm-fix-build-errors-on-armhf.patch
The quilt patch titled
Subject: mm: fix endless reclaim on machines with unaccepted memory
has been removed from the -mm tree. Its filename was
mm-fix-endless-reclaim-on-machines-with-unaccepted-memory.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: fix endless reclaim on machines with unaccepted memory
Date: Fri, 9 Aug 2024 14:48:47 +0300
Unaccepted memory is considered unusable free memory, which is not counted
as free on the zone watermark check. This causes get_page_from_freelist()
to accept more memory to hit the high watermark, but it creates problems
in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to
reclaim memory. This is usually successful, but if there is little or no
reclaimable memory, it can result in endless reclaim with little to no
progress. This can occur early in the boot process, just after start of
the init process when the only reclaimable memory is the page cache of the
init executable and its libraries.
Make unaccepted memory free from watermark check point of view. This way
unaccepted memory will never be the trigger of memory reclaim. Accept
more memory in the get_page_from_freelist() if needed.
Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.in…
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Jianxiong Gao <jxgao(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Jianxiong Gao <jxgao(a)google.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 42 ++++++++++++++++++++----------------------
1 file changed, 20 insertions(+), 22 deletions(-)
--- a/mm/page_alloc.c~mm-fix-endless-reclaim-on-machines-with-unaccepted-memory
+++ a/mm/page_alloc.c
@@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order);
static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
static inline bool has_unaccepted_memory(void);
static bool __free_unaccepted(struct page *page);
@@ -3072,9 +3072,6 @@ static inline long __zone_watermark_unus
if (!(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
- unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
return unusable_free;
}
@@ -3368,6 +3365,8 @@ retry:
}
}
+ cond_accept_memory(zone, order);
+
/*
* Detect whether the number of free pages is below high
* watermark. If so, we will decrease pcp->high and free
@@ -3393,10 +3392,8 @@ check_alloc_wmark:
gfp_mask)) {
int ret;
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
@@ -3450,10 +3447,8 @@ try_this_zone:
return page;
} else {
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
@@ -6950,9 +6945,6 @@ static bool try_to_accept_memory_one(str
struct page *page;
bool last;
- if (list_empty(&zone->unaccepted_pages))
- return false;
-
spin_lock_irqsave(&zone->lock, flags);
page = list_first_entry_or_null(&zone->unaccepted_pages,
struct page, lru);
@@ -6978,23 +6970,29 @@ static bool try_to_accept_memory_one(str
return true;
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
long to_accept;
- int ret = false;
+ bool ret = false;
+
+ if (!has_unaccepted_memory())
+ return false;
+
+ if (list_empty(&zone->unaccepted_pages))
+ return false;
/* How much to accept to get to high watermark? */
to_accept = high_wmark_pages(zone) -
(zone_page_state(zone, NR_FREE_PAGES) -
- __zone_watermark_unusable_free(zone, order, 0));
+ __zone_watermark_unusable_free(zone, order, 0) -
+ zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */
- do {
+ while (to_accept > 0) {
if (!try_to_accept_memory_one(zone))
break;
ret = true;
to_accept -= MAX_ORDER_NR_PAGES;
- } while (to_accept > 0);
+ }
return ret;
}
@@ -7037,7 +7035,7 @@ static void accept_page(struct page *pag
{
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
return false;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-reduce-deferred-struct-page-init-ifdeffery.patch
mm-accept-memory-in-__alloc_pages_bulk.patch
mm-introduce-pageunaccepted-page-type.patch
mm-rework-accept-memory-helpers.patch
mm-add-a-helper-to-accept-page.patch
mm-page_isolation-handle-unaccepted-memory-isolation.patch
mm-accept-to-promo-watermark.patch
The quilt patch titled
Subject: mm/numa: no task_numa_fault() call if PMD is changed
has been removed from the -mm tree. Its filename was
mm-numa-no-task_numa_fault-call-if-pmd-is-changed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/numa: no task_numa_fault() call if PMD is changed
Date: Fri, 9 Aug 2024 10:59:05 -0400
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA
fault handling") restructured do_huge_pmd_numa_page() and did not avoid
task_numa_fault() call in the second page table check after a numa
migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com
Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling")
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)
--- a/mm/huge_memory.c~mm-numa-no-task_numa_fault-call-if-pmd-is-changed
+++ a/mm/huge_memory.c
@@ -1685,7 +1685,7 @@ vm_fault_t do_huge_pmd_numa_page(struct
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
spin_unlock(vmf->ptl);
- goto out;
+ return 0;
}
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1728,22 +1728,16 @@ vm_fault_t do_huge_pmd_numa_page(struct
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
flags |= TNF_MIGRATED;
nid = target_nid;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
- if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
- spin_unlock(vmf->ptl);
- goto out;
- }
- goto out_map;
- }
-
-out:
- if (nid != NUMA_NO_NODE)
task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
+ }
- return 0;
-
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+ if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
+ spin_unlock(vmf->ptl);
+ return 0;
+ }
out_map:
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
@@ -1753,7 +1747,10 @@ out_map:
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
spin_unlock(vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags);
+ return 0;
}
/*
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch
memory-tiering-introduce-folio_use_access_time-check.patch
memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch
mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch
The quilt patch titled
Subject: mm/numa: no task_numa_fault() call if PTE is changed
has been removed from the -mm tree. Its filename was
mm-numa-no-task_numa_fault-call-if-pte-is-changed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/numa: no task_numa_fault() call if PTE is changed
Date: Fri, 9 Aug 2024 10:59:04 -0400
When handling a numa page fault, task_numa_fault() should be called by a
process that restores the page table of the faulted folio to avoid
duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce
TLB flush via delaying mapping on hint page fault") restructured
do_numa_page() and did not avoid task_numa_fault() call in the second page
table check after a numa migration failure. Fix it by making all
!pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary
and lead to unexpected numa balancing results (It is hard to tell whether
the issue will cause positive or negative performance impact due to
duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com
Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: "Huang, Ying" <ying.huang(a)intel.com>
Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.inte…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
--- a/mm/memory.c~mm-numa-no-task_numa_fault-call-if-pte-is-changed
+++ a/mm/memory.c
@@ -5295,7 +5295,7 @@ static vm_fault_t do_numa_page(struct vm
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+ return 0;
}
pte = pte_modify(old_pte, vma->vm_page_prot);
@@ -5358,23 +5358,19 @@ static vm_fault_t do_numa_page(struct vm
if (!migrate_misplaced_folio(folio, vma, target_nid)) {
nid = target_nid;
flags |= TNF_MIGRATED;
- } else {
- flags |= TNF_MIGRATE_FAIL;
- vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
- if (unlikely(!vmf->pte))
- goto out;
- if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
- }
- goto out_map;
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
-out:
- if (nid != NUMA_NO_NODE)
- task_numa_fault(last_cpupid, nid, nr_pages, flags);
- return 0;
+ flags |= TNF_MIGRATE_FAIL;
+ vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+ vmf->address, &vmf->ptl);
+ if (unlikely(!vmf->pte))
+ return 0;
+ if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
+ pte_unmap_unlock(vmf->pte, vmf->ptl);
+ return 0;
+ }
out_map:
/*
* Make it present again, depending on how arch implements
@@ -5387,7 +5383,10 @@ out_map:
numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
writable);
pte_unmap_unlock(vmf->pte, vmf->ptl);
- goto out;
+
+ if (nid != NUMA_NO_NODE)
+ task_numa_fault(last_cpupid, nid, nr_pages, flags);
+ return 0;
}
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
memory-tiering-read-last_cpupid-correctly-in-do_huge_pmd_numa_page.patch
memory-tiering-introduce-folio_use_access_time-check.patch
memory-tiering-count-pgpromote_success-when-mem-tiering-is-enabled.patch
mm-migrate-move-common-code-to-numa_migrate_check-was-numa_migrate_prep.patch
The quilt patch titled
Subject: mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
has been removed from the -mm tree. Its filename was
mm-vmalloc-fix-page-mapping-if-vm_area_alloc_pages-with-high-order-fallback-to-order-0.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hailong Liu <hailong.liu(a)oppo.com>
Subject: mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Date: Thu, 8 Aug 2024 20:19:56 +0800
The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift. However, since commit e9c3cda4d86e ("mm,
vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
__GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
failed for high order, the pages** may contain two different page shifts
(high order and order-0). This could lead __vmap_pages_range_noflush() to
perform incorrect mappings, potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for
PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X)
__vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0
vmap_pages_range()
vmap_pages_range_noflush()
__vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order allocation fails,
__vmalloc_node_range_noprof() will retry with order-0. Therefore, it is
unnecessary to fallback to order-0 here. Therefore, fix this by removing
the fallback code.
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu <hailong.liu(a)oppo.com>
Reported-by: Tangquan Zheng <zhengtangquan(a)oppo.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Acked-by: Barry Song <baohua(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
--- a/mm/vmalloc.c~mm-vmalloc-fix-page-mapping-if-vm_area_alloc_pages-with-high-order-fallback-to-order-0
+++ a/mm/vmalloc.c
@@ -3584,15 +3584,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
page = alloc_pages_noprof(alloc_gfp, order);
else
page = alloc_pages_node_noprof(nid, alloc_gfp, order);
- if (unlikely(!page)) {
- if (!nofail)
- break;
-
- /* fall back to the zero order allocations */
- alloc_gfp |= __GFP_NOFAIL;
- order = 0;
- continue;
- }
+ if (unlikely(!page))
+ break;
/*
* Higher order allocations must be able to be treated as
_
Patches currently in -mm which might be from hailong.liu(a)oppo.com are
The quilt patch titled
Subject: mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
has been removed from the -mm tree. Its filename was
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
Date: Tue, 6 Aug 2024 12:41:07 -0400
The memory_failure_cpu structure is a per-cpu structure. Access to its
content requires the use of get_cpu_var() to lock in the current CPU and
disable preemption. The use of a regular spinlock_t for locking purpose
is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a
spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping
lock in a preemption disabled context is illegal resulting in the
following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0
[12135.732252] preempt_count: 1, expected: 0
[12135.732255] RCU nest depth: 2, expected: 2
:
[12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021
[12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred
[12135.732433] Call Trace:
[12135.732436] <TASK>
[12135.732450] dump_stack_lvl+0x57/0x81
[12135.732461] __might_resched.cold+0xf4/0x12f
[12135.732479] rt_spin_lock+0x4c/0x100
[12135.732491] memory_failure_queue+0x40/0xe0
[12135.732503] ghes_do_memory_failure+0x53/0x390
[12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0
[12135.732575] ghes_proc+0xf9/0x1a0
[12135.732591] ghes_notify_hed+0x6a/0x150
[12135.732602] notifier_call_chain+0x43/0xb0
[12135.732626] blocking_notifier_call_chain+0x43/0x60
[12135.732637] acpi_ev_notify_dispatch+0x47/0x70
[12135.732648] acpi_os_execute_deferred+0x13/0x20
[12135.732654] process_one_work+0x41f/0x500
[12135.732695] worker_thread+0x192/0x360
[12135.732715] kthread+0x111/0x140
[12135.732733] ret_from_fork+0x29/0x50
[12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after
put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep
with this call.
[longman(a)redhat.com: don't hold percpu ref across pr_err(), per Miaohe]
Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com
Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Len Brown <len.brown(a)intel.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu
+++ a/mm/memory-failure.c
@@ -2417,7 +2417,7 @@ struct memory_failure_entry {
struct memory_failure_cpu {
DECLARE_KFIFO(fifo, struct memory_failure_entry,
MEMORY_FAILURE_FIFO_SIZE);
- spinlock_t lock;
+ raw_spinlock_t lock;
struct work_struct work;
};
@@ -2443,20 +2443,22 @@ void memory_failure_queue(unsigned long
{
struct memory_failure_cpu *mf_cpu;
unsigned long proc_flags;
+ bool buffer_overflow;
struct memory_failure_entry entry = {
.pfn = pfn,
.flags = flags,
};
mf_cpu = &get_cpu_var(memory_failure_cpu);
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
- if (kfifo_put(&mf_cpu->fifo, entry))
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry);
+ if (!buffer_overflow)
schedule_work_on(smp_processor_id(), &mf_cpu->work);
- else
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ put_cpu_var(memory_failure_cpu);
+ if (buffer_overflow)
pr_err("buffer overflow when queuing memory failure at %#lx\n",
pfn);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
- put_cpu_var(memory_failure_cpu);
}
EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2469,9 +2471,9 @@ static void memory_failure_work_func(str
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
- spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+ raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags);
gotten = kfifo_get(&mf_cpu->fifo, &entry);
- spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+ raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
@@ -2501,7 +2503,7 @@ static int __init memory_failure_init(vo
for_each_possible_cpu(cpu) {
mf_cpu = &per_cpu(memory_failure_cpu, cpu);
- spin_lock_init(&mf_cpu->lock);
+ raw_spin_lock_init(&mf_cpu->lock);
INIT_KFIFO(mf_cpu->fifo);
INIT_WORK(&mf_cpu->work, memory_failure_work_func);
}
_
Patches currently in -mm which might be from longman(a)redhat.com are
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch
The quilt patch titled
Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/hugetlb: fix hugetlb vs. core-mm PT locking
Date: Thu, 1 Aug 2024 22:47:48 +0200
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables
we can have with CONFIG_HIGHPTE.
Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such
that when e.g., CONFIG_PGTABLE_LEVELS==2 with
PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document
why that works.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 33 ++++++++++++++++++++++++++++++---
include/linux/mm.h | 11 +++++++++++
2 files changed, 41 insertions(+), 3 deletions(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking
+++ a/include/linux/hugetlb.h
@@ -944,10 +944,37 @@ static inline bool htlb_allow_alloc_fall
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ const unsigned long size = huge_page_size(h);
+
+ VM_WARN_ON(size == PAGE_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ *
+ * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
+ * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use pud_lockptr()
+ * and core-mm would use pmd_lockptr(). However, in such configurations
+ * split PMD locks are disabled -- they don't make sense on a single
+ * PGDIR page table -- and the end result is the same.
+ */
+ if (size >= PUD_SIZE)
+ return pud_lockptr(mm, (pud_t *) pte);
+ else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
- return &mm->page_table_lock;
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
}
#ifndef hugepages_supported
--- a/include/linux/mm.h~mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking
+++ a/include/linux/mm.h
@@ -2920,6 +2920,13 @@ static inline spinlock_t *pte_lockptr(st
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
+ return ptlock_ptr(virt_to_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2944,6 +2951,10 @@ static inline spinlock_t *pte_lockptr(st
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-turn-use_split_pte_ptlocks-use_split_pte_ptlocks-into-kconfig-options.patch
mm-hugetlb-enforce-that-pmd-pt-sharing-has-split-pmd-pt-locks.patch
powerpc-8xx-document-and-enforce-that-split-pt-locks-are-not-used.patch
mm-simplify-arch_make_folio_accessible.patch
mm-gup-convert-to-arch_make_folio_accessible.patch
s390-uv-drop-arch_make_page_accessible.patch
mm-hugetlb-remove-hugetlb_follow_page_mask-leftover.patch
mm-rmap-cleanup-partially-mapped-handling-in-__folio_remove_rmap.patch
mm-clarify-folio_likely_mapped_shared-documentation-for-ksm-folios.patch
mm-provide-vm_normal_pagefolio_pmd-with-config_pgtable_has_huge_leaves.patch
mm-pagewalk-introduce-folio_walk_start-folio_walk_end.patch
mm-migrate-convert-do_pages_stat_array-from-follow_page-to-folio_walk.patch
mm-migrate-convert-add_page_for_migration-from-follow_page-to-folio_walk.patch
mm-ksm-convert-get_mergeable_page-from-follow_page-to-folio_walk.patch
mm-ksm-convert-scan_get_next_rmap_item-from-follow_page-to-folio_walk.patch
mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk.patch
mm-huge_memory-convert-split_huge_pages_pid-from-follow_page-to-folio_walk-fix.patch
s390-uv-convert-gmap_destroy_page-from-follow_page-to-folio_walk.patch
s390-mm-fault-convert-do_secure_storage_access-from-follow_page-to-folio_walk.patch
mm-remove-follow_page.patch
mm-ksm-convert-break_ksm-from-walk_page_range_vma-to-folio_walk.patch
mm-rmap-minimize-folio-_nr_pages_mapped-updates-when-batching-pte-unmapping.patch
The quilt patch titled
Subject: mseal: fix is_madv_discard()
has been removed from the -mm tree. Its filename was
mseal-fix-is_madv_discard.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pedro Falcato <pedro.falcato(a)gmail.com>
Subject: mseal: fix is_madv_discard()
Date: Wed, 7 Aug 2024 18:33:35 +0100
is_madv_discard did its check wrong. MADV_ flags are not bitwise,
they're normal sequential numbers. So, for instance:
behavior & (/* ... */ | MADV_REMOVE)
tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard
operations.
As a result the kernel could erroneously block certain madvises (e.g
MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits
with blocked MADV operations (e.g REMOVE or WIPEONFORK).
This is obviously incorrect, so use a switch statement instead.
Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com
Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com
Fixes: 8be7258aad44 ("mseal: add mseal syscall")
Signed-off-by: Pedro Falcato <pedro.falcato(a)gmail.com>
Tested-by: Jeff Xu <jeffxu(a)chromium.org>
Reviewed-by: Jeff Xu <jeffxu(a)chromium.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mseal.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/mm/mseal.c~mseal-fix-is_madv_discard
+++ a/mm/mseal.c
@@ -40,9 +40,17 @@ static bool can_modify_vma(struct vm_are
static bool is_madv_discard(int behavior)
{
- return behavior &
- (MADV_FREE | MADV_DONTNEED | MADV_DONTNEED_LOCKED |
- MADV_REMOVE | MADV_DONTFORK | MADV_WIPEONFORK);
+ switch (behavior) {
+ case MADV_FREE:
+ case MADV_DONTNEED:
+ case MADV_DONTNEED_LOCKED:
+ case MADV_REMOVE:
+ case MADV_DONTFORK:
+ case MADV_WIPEONFORK:
+ return true;
+ }
+
+ return false;
}
static bool is_ro_anon(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from pedro.falcato(a)gmail.com are
selftests-mm-add-mseal-test-for-no-discard-madvise.patch
selftests-mm-add-mseal-test-for-no-discard-madvise-fix.patch
The locks_remove_posix() function in fcntl_setlk/fcntl_setlk64 is designed
to reliably remove locks when an fcntl/close race is detected. However, it
was passing in the wrong filelock owner, it looks like a mistake and
resulting in a failure to remove locks. More critically, if the lock
removal fails, it could lead to a uaf issue while traversing the locks.
This problem occurs only in the 4.19/5.4 stable version.
Fixes: a561145f3ae9 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: d30ff3304083 ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 234ebfa8c070..b1201b01867a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2313,7 +2313,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
@@ -2443,7 +2443,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
--
2.39.2
The quilt patch titled
Subject: lib/stackdepot: double DEPOT_POOLS_CAP if KASAN is enabled
has been removed from the -mm tree. Its filename was
lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: lib/stackdepot: double DEPOT_POOLS_CAP if KASAN is enabled
Date: Wed, 7 Aug 2024 12:52:28 -0400
When a wide variety of workloads are run on a debug kernel with KASAN
enabled, the following warning may sometimes be printed.
[ 6818.650674] Stack depot reached limit capacity
[ 6818.650730] WARNING: CPU: 1 PID: 272741 at lib/stackdepot.c:252 depot_alloc_stack+0x39e/0x3d0
:
[ 6818.650907] Call Trace:
[ 6818.650909] [<00047dd453d84b92>] depot_alloc_stack+0x3a2/0x3d0
[ 6818.650916] [<00047dd453d85254>] stack_depot_save_flags+0x4f4/0x5c0
[ 6818.650920] [<00047dd4535872c6>] kasan_save_stack+0x56/0x70
[ 6818.650924] [<00047dd453587328>] kasan_save_track+0x28/0x40
[ 6818.650927] [<00047dd45358a27a>] kasan_save_free_info+0x4a/0x70
[ 6818.650930] [<00047dd45358766a>] __kasan_slab_free+0x12a/0x1d0
[ 6818.650933] [<00047dd45350deb4>] kmem_cache_free+0x1b4/0x580
[ 6818.650938] [<00047dd452c520da>] __put_task_struct+0x24a/0x320
[ 6818.650945] [<00047dd452c6aee4>] delayed_put_task_struct+0x294/0x350
[ 6818.650949] [<00047dd452e9066a>] rcu_do_batch+0x6ea/0x2090
[ 6818.650953] [<00047dd452ea60f4>] rcu_core+0x474/0xa90
[ 6818.650956] [<00047dd452c780c0>] handle_softirqs+0x3c0/0xf90
[ 6818.650960] [<00047dd452c76fbe>] __irq_exit_rcu+0x35e/0x460
[ 6818.650963] [<00047dd452c79992>] irq_exit_rcu+0x22/0xb0
[ 6818.650966] [<00047dd454bd8128>] do_ext_irq+0xd8/0x120
[ 6818.650972] [<00047dd454c0ddd0>] ext_int_handler+0xb8/0xe8
[ 6818.650979] [<00047dd453589cf6>] kasan_check_range+0x236/0x2f0
[ 6818.650982] [<00047dd453378cf0>] filemap_get_pages+0x190/0xaa0
[ 6818.650986] [<00047dd453379940>] filemap_read+0x340/0xa70
[ 6818.650989] [<00047dd3d325d226>] xfs_file_buffered_read+0x2c6/0x400 [xfs]
[ 6818.651431] [<00047dd3d325dfe2>] xfs_file_read_iter+0x2c2/0x550 [xfs]
[ 6818.651663] [<00047dd45364710c>] vfs_read+0x64c/0x8c0
[ 6818.651669] [<00047dd453648ed8>] ksys_read+0x118/0x200
[ 6818.651672] [<00047dd452b6cf5a>] do_syscall+0x27a/0x380
[ 6818.651676] [<00047dd454bd7e74>] __do_syscall+0xf4/0x1a0
[ 6818.651680] [<00047dd454c0db58>] system_call+0x70/0x98
As KASAN is a big user of stackdepot, the current DEPOT_POOLS_CAP of
8192 may not be enough. Double DEPOT_POOLS_CAP if KASAN is enabled to
avoid hitting this problem.
Also use the MIN() macro for defining DEPOT_MAX_POOLS to clarify the
intention.
Link: https://lkml.kernel.org/r/20240807165228.1116831-1-longman@redhat.com
Fixes: 02754e0a484a ("lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/stackdepot.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/lib/stackdepot.c~lib-stackdepot-double-depot_pools_cap-if-kasan-is-enabled
+++ a/lib/stackdepot.c
@@ -36,11 +36,12 @@
#include <linux/memblock.h>
#include <linux/kasan-enabled.h>
-#define DEPOT_POOLS_CAP 8192
+/* KASAN is a big user of stackdepot, double the cap if KASAN is enabled */
+#define DEPOT_POOLS_CAP (8192 * (IS_ENABLED(CONFIG_KASAN) ? 2 : 1))
+
/* The pool_index is offset by 1 so the first record does not have a 0 handle. */
#define DEPOT_MAX_POOLS \
- (((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \
- (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP)
+ MIN((1LL << (DEPOT_POOL_INDEX_BITS)) - 1, DEPOT_POOLS_CAP)
static bool stack_depot_disabled;
static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu.patch
mm-memory-failure-use-raw_spinlock_t-in-struct-memory_failure_cpu-v3.patch
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch
The locks_remove_posix() function in fcntl_setlk/fcntl_setlk64 is designed
to reliably remove locks when an fcntl/close race is detected. However, it
was passing in the wrong filelock owner, it looks like a mistake and
resulting in a failure to remove locks. More critically, if the lock
removal fails, it could lead to a uaf issue while traversing the locks.
This problem occurs only in the 4.19 stable version.
Fixes: a561145f3ae9 ("filelock: Fix fcntl/close race recovery compat path")
Fixes: d30ff3304083 ("filelock: Remove locks reliably when fcntl/close race is detected")
Cc: stable(a)vger.kernel.org
Signed-off-by: Long Li <leo.lilong(a)huawei.com>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 234ebfa8c070..b1201b01867a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2313,7 +2313,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
@@ -2443,7 +2443,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = fcheck(fd);
spin_unlock(¤t->files->file_lock);
if (f != filp) {
- locks_remove_posix(filp, ¤t->files);
+ locks_remove_posix(filp, current->files);
error = -EBADF;
}
}
--
2.39.2
The patch titled
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hao Ge <gehao(a)kylinos.cn>
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
Date: Fri, 16 Aug 2024 09:33:36 +0800
When enable CONFIG_MEMCG & CONFIG_KFENCE & CONFIG_KMEMLEAK, the following
warning always occurs,This is because the following call stack occurred:
mem_pool_alloc
kmem_cache_alloc_noprof
slab_alloc_node
kfence_alloc
Once the kfence allocation is successful,slab->obj_exts will not be empty,
because it has already been assigned a value in kfence_init_pool.
Since in the prepare_slab_obj_exts_hook function,we perform a check for
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE),the alloc_tag_add function
will not be called as a result.Therefore,ref->ct remains NULL.
However,when we call mem_pool_free,since obj_ext is not empty, it
eventually leads to the alloc_tag_sub scenario being invoked. This is
where the warning occurs.
So we should add corresponding checks in the alloc_tagging_slab_free_hook.
For __GFP_NO_OBJ_EXT case,I didn't see the specific case where it's using
kfence,so I won't add the corresponding check in
alloc_tagging_slab_free_hook for now.
[ 3.734349] ------------[ cut here ]------------
[ 3.734807] alloc_tag was not set
[ 3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[ 3.735866] Modules linked in: autofs4
[ 3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G W 6.11.0-rc3-dirty #1
[ 3.736969] Tainted: [W]=WARN
[ 3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.738501] pc : kmem_cache_free+0x444/0x574
[ 3.738951] lr : kmem_cache_free+0x444/0x574
[ 3.739361] sp : ffff80008357bb60
[ 3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[ 3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[ 3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[ 3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[ 3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[ 3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[ 3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[ 3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[ 3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[ 3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[ 3.746255] Call trace:
[ 3.746530] kmem_cache_free+0x444/0x574
[ 3.746931] mem_pool_free+0x44/0xf4
[ 3.747306] free_object_rcu+0xc8/0xdc
[ 3.747693] rcu_do_batch+0x234/0x8a4
[ 3.748075] rcu_core+0x230/0x3e4
[ 3.748424] rcu_core_si+0x14/0x1c
[ 3.748780] handle_softirqs+0x134/0x378
[ 3.749189] run_ksoftirqd+0x70/0x9c
[ 3.749560] smpboot_thread_fn+0x148/0x22c
[ 3.749978] kthread+0x10c/0x118
[ 3.750323] ret_from_fork+0x10/0x20
[ 3.750696] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/slub.c~mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook
+++ a/mm/slub.c
@@ -2116,6 +2116,10 @@ alloc_tagging_slab_free_hook(struct kmem
if (!mem_alloc_profiling_enabled())
return;
+ /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
+ if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+ return;
+
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
return;
_
Patches currently in -mm which might be from gehao(a)kylinos.cn are
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
mm-cma-change-the-addition-of-totalcma_pages-in-the-cma_init_reserved_mem.patch
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as perf top segfaulting.
==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
READ of size 1 at 0x60b00c48844b thread T193
#0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
#1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
#2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
#3 0x5650d804568f in __hists__add_entry util/hist.c:754
#4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
#5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
#6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
#7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
#8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
#9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
#10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
#11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
#12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
#13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().
While this bug was probably introduced with 5c24b67aae72 ("perf tools:
Replace map->referenced & maps->removed_maps with map->refcnt"), the
symbol objects were leaked until c087e9480cf3 ("perf machine: Fix
refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the
bug was masked.
Fixes: c087e9480cf3 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Signed-off-by: Matt Fleming (Cloudflare) <matt(a)readmodwrite.com>
Reported-by: Yunzhao Li <yunzhao(a)cloudflare.com>
Cc: stable(a)vger.kernel.org # v5.13+
---
tools/perf/util/hist.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0f554febf9a1..0f9ce2ee2c31 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -639,6 +639,11 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
* the history counter to increment.
*/
if (he->ms.map != entry->ms.map) {
+ if (he->ms.sym) {
+ u64 addr = he->ms.sym->start;
+ he->ms.sym = map__find_symbol(entry->ms.map, addr);
+ }
+
map__put(he->ms.map);
he->ms.map = map__get(entry->ms.map);
}
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081113-creamed-unleaded-c696@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
82dbb57ac8d0 ("scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES")
0c25422d34b4 ("scsi: mpt3sas: Remove scsi_dma_map() error messages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Fri, 19 Jul 2024 16:39:12 +0900
Subject: [PATCH] scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for DMA_FROM_DEVICE DMA driection (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid such issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using this helper
in _base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
directly scsi_dma_map().
Fixes: 471ef9d4e498 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1092497563b2..c8fb965a6bf0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2671,6 +2671,22 @@ _base_build_zero_len_sge_ieee(struct MPT3SAS_ADAPTER *ioc, void *paddr)
_base_add_sg_single_ieee(paddr, sgl_flags, 0, 0, -1);
}
+static inline int _base_scsi_dma_map(struct scsi_cmnd *cmd)
+{
+ /*
+ * Some firmware versions byte-swap the REPORT ZONES command reply from
+ * ATA-ZAC devices by directly accessing in the host buffer. This does
+ * not respect the default command DMA direction and causes IOMMU page
+ * faults on some architectures with an IOMMU enforcing write mappings
+ * (e.g. AMD hosts). Avoid such issue by making the report zones buffer
+ * mapping bi-directional.
+ */
+ if (cmd->cmnd[0] == ZBC_IN && cmd->cmnd[1] == ZI_REPORT_ZONES)
+ cmd->sc_data_direction = DMA_BIDIRECTIONAL;
+
+ return scsi_dma_map(cmd);
+}
+
/**
* _base_build_sg_scmd - main sg creation routine
* pcie_device is unused here!
@@ -2717,7 +2733,7 @@ _base_build_sg_scmd(struct MPT3SAS_ADAPTER *ioc,
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
@@ -2861,7 +2877,7 @@ _base_build_sg_scmd_ieee(struct MPT3SAS_ADAPTER *ioc,
}
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081112-idly-qualify-80f3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
82dbb57ac8d0 ("scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES")
0c25422d34b4 ("scsi: mpt3sas: Remove scsi_dma_map() error messages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 82dbb57ac8d06dfe8227ba9ab11a49de2b475ae5 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Fri, 19 Jul 2024 16:39:12 +0900
Subject: [PATCH] scsi: mpt3sas: Avoid IOMMU page faults on REPORT ZONES
Some firmware versions of the 9600 series SAS HBA byte-swap the REPORT
ZONES command reply buffer from ATA-ZAC devices by directly accessing the
buffer in the host memory. This does not respect the default command DMA
direction and causes IOMMU page faults on architectures with an IOMMU
enforcing write-only mappings for DMA_FROM_DEVICE DMA driection (e.g. AMD
hosts).
scsi 18:0:0:0: Direct-Access-ZBC ATA WDC WSH722020AL W870 PQ: 0 ANSI: 6
scsi 18:0:0:0: SATA: handle(0x0027), sas_addr(0x300062b2083e7c40), phy(0), device_name(0x5000cca29dc35e11)
scsi 18:0:0:0: enclosure logical id (0x300062b208097c40), slot(0)
scsi 18:0:0:0: enclosure level(0x0000), connector name( C0.0)
scsi 18:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
scsi 18:0:0:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
sd 18:0:0:0: Attached scsi generic sg2 type 20
sd 18:0:0:0: [sdc] Host-managed zoned block device
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b200 flags=0x0050]
mpt3sas 0000:41:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0xfff9b300 flags=0x0050]
mpt3sas_cm0: mpt3sas_ctl_pre_reset_handler: Releasing the trace buffer due to adapter reset.
mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
mpt3sas_cm0: fault_state(0x2666)!
mpt3sas_cm0: sending diag reset !!
mpt3sas_cm0: diag reset: SUCCESS
sd 18:0:0:0: [sdc] REPORT ZONES start lba 0 failed
sd 18:0:0:0: [sdc] REPORT ZONES: Result: hostbyte=DID_RESET driverbyte=DRIVER_OK
sd 18:0:0:0: [sdc] 0 4096-byte logical blocks: (0 B/0 B)
Avoid such issue by always mapping the buffer of REPORT ZONES commands
using DMA_BIDIRECTIONAL (read+write IOMMU mapping). This is done by
introducing the helper function _base_scsi_dma_map() and using this helper
in _base_build_sg_scmd() and _base_build_sg_scmd_ieee() instead of calling
directly scsi_dma_map().
Fixes: 471ef9d4e498 ("mpt3sas: Build MPI SGL LIST on GEN2 HBAs and IEEE SGL LIST on GEN3 HBAs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240719073913.179559-3-dlemoal@kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1092497563b2..c8fb965a6bf0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2671,6 +2671,22 @@ _base_build_zero_len_sge_ieee(struct MPT3SAS_ADAPTER *ioc, void *paddr)
_base_add_sg_single_ieee(paddr, sgl_flags, 0, 0, -1);
}
+static inline int _base_scsi_dma_map(struct scsi_cmnd *cmd)
+{
+ /*
+ * Some firmware versions byte-swap the REPORT ZONES command reply from
+ * ATA-ZAC devices by directly accessing in the host buffer. This does
+ * not respect the default command DMA direction and causes IOMMU page
+ * faults on some architectures with an IOMMU enforcing write mappings
+ * (e.g. AMD hosts). Avoid such issue by making the report zones buffer
+ * mapping bi-directional.
+ */
+ if (cmd->cmnd[0] == ZBC_IN && cmd->cmnd[1] == ZI_REPORT_ZONES)
+ cmd->sc_data_direction = DMA_BIDIRECTIONAL;
+
+ return scsi_dma_map(cmd);
+}
+
/**
* _base_build_sg_scmd - main sg creation routine
* pcie_device is unused here!
@@ -2717,7 +2733,7 @@ _base_build_sg_scmd(struct MPT3SAS_ADAPTER *ioc,
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
@@ -2861,7 +2877,7 @@ _base_build_sg_scmd_ieee(struct MPT3SAS_ADAPTER *ioc,
}
sg_scmd = scsi_sglist(scmd);
- sges_left = scsi_dma_map(scmd);
+ sges_left = _base_scsi_dma_map(scmd);
if (sges_left < 0)
return -ENOMEM;
In _emif_get_id(), of_get_address() may return NULL which is later
dereferenced. Fix this bug by adding NULL check. of_translate_address() is
the same.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v4:
- added the check of of_translate_address() as suggestions.
Changes in v3:
- added the patch operations omitted in PATCH v2 RESEND compared to PATCH
v2. Sorry for my oversight.
Changes in v2:
- added Cc stable line.
---
drivers/edac/ti_edac.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/edac/ti_edac.c b/drivers/edac/ti_edac.c
index 29723c9592f7..f466f12630d3 100644
--- a/drivers/edac/ti_edac.c
+++ b/drivers/edac/ti_edac.c
@@ -207,14 +207,24 @@ static int _emif_get_id(struct device_node *node)
int my_id = 0;
addrp = of_get_address(node, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
my_addr = (u32)of_translate_address(node, addrp);
+ if (my_addr == OF_BAD_ADDR)
+ return -EINVAL;
for_each_matching_node(np, ti_edac_of_match) {
if (np == node)
continue;
addrp = of_get_address(np, 0, NULL, NULL);
+ if (!addrp)
+ return -EINVAL;
+
addr = (u32)of_translate_address(np, addrp);
+ if (addr == OF_BAD_ADDR)
+ return -EINVAL;
edac_printk(KERN_INFO, EDAC_MOD_NAME,
"addr=%x, my_addr=%x\n",
--
2.25.1
The patch titled
Subject: kunit/overflow: fix UB in overflow_allocation_test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Subject: kunit/overflow: fix UB in overflow_allocation_test
Date: Thu, 15 Aug 2024 01:04:31 +0100
The 'device_name' array doesn't exist out of the
'overflow_allocation_test' function scope. However, it is being used as a
driver name when calling 'kunit_driver_create' from
'kunit_device_register'. It produces the kernel panic with KASAN enabled.
Since this variable is used in one place only, remove it and pass the
device name into kunit_device_register directly as an ascii string.
Link: https://lkml.kernel.org/r/20240815000431.401869-1-ivan.orlov0322@gmail.com
Fixes: ca90800a91ba ("test_overflow: Add memory allocation overflow tests")
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Tested-by: Erhard Furtner <erhard_f(a)mailbox.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/overflow_kunit.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/lib/overflow_kunit.c~kunit-overflow-fix-ub-in-overflow_allocation_test
+++ a/lib/overflow_kunit.c
@@ -668,7 +668,6 @@ DEFINE_TEST_ALLOC(devm_kzalloc, devm_kf
static void overflow_allocation_test(struct kunit *test)
{
- const char device_name[] = "overflow-test";
struct device *dev;
int count = 0;
@@ -678,7 +677,7 @@ static void overflow_allocation_test(str
} while (0)
/* Create dummy device for devm_kmalloc()-family tests. */
- dev = kunit_device_register(test, device_name);
+ dev = kunit_device_register(test, "overflow-test");
KUNIT_ASSERT_FALSE_MSG(test, IS_ERR(dev),
"Cannot register test device\n");
_
Patches currently in -mm which might be from ivan.orlov0322(a)gmail.com are
kunit-overflow-fix-ub-in-overflow_allocation_test.patch
From: Murali Nalajala <quic_mnalajal(a)quicinc.com>
Currently get_wq_ctx() is wrongly configured as a standard call. When two
SMC calls are in sleep and one SMC wakes up, it calls get_wq_ctx() to
resume the corresponding sleeping thread. But if get_wq_ctx() is
interrupted, goes to sleep and another SMC call is waiting to be allocated
a waitq context, it leads to a deadlock.
To avoid this get_wq_ctx() must be an atomic call and can't be a standard
SMC call. Hence mark get_wq_ctx() as a fast call.
Fixes: 6bf325992236 ("firmware: qcom: scm: Add wait-queue handling logic")
Cc: stable(a)vger.kernel.org
Signed-off-by: Murali Nalajala <quic_mnalajal(a)quicinc.com>
Signed-off-by: Unnathi Chalicheemala <quic_uchalich(a)quicinc.com>
Reviewed-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Changes in v2:
- Made commit message more clear.
- R-b tag from Elliot.
- Link to v1: https://lore.kernel.org/all/20240611-get_wq_ctx_atomic-v1-1-9189a0a7d1ba@qu…
drivers/firmware/qcom/qcom_scm-smc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/qcom/qcom_scm-smc.c b/drivers/firmware/qcom/qcom_scm-smc.c
index dca5f3f1883b..2b4c2826f572 100644
--- a/drivers/firmware/qcom/qcom_scm-smc.c
+++ b/drivers/firmware/qcom/qcom_scm-smc.c
@@ -73,7 +73,7 @@ int scm_get_wq_ctx(u32 *wq_ctx, u32 *flags, u32 *more_pending)
struct arm_smccc_res get_wq_res;
struct arm_smccc_args get_wq_ctx = {0};
- get_wq_ctx.args[0] = ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL,
+ get_wq_ctx.args[0] = ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,
ARM_SMCCC_SMC_64, ARM_SMCCC_OWNER_SIP,
SCM_SMC_FNID(QCOM_SCM_SVC_WAITQ, QCOM_SCM_WAITQ_GET_WQ_CTX));
--
2.34.1
This series adds the DFS support for GCC QUPv3 RCGS and also adds the
missing GPLL9 support and fixes the sdcc clocks frequency tables.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
---
Changes in V2:
- Add stable kernel tags and update the commit text for [1/4] patch.
- Added one more fix in V2, to remove the unused cpuss_ahb_clk and its RCG.
---
Satya Priya Kakitapalli (5):
clk: qcom: gcc-sc8180x: Register QUPv3 RCGs for DFS on sc8180x
dt-bindings: clock: qcom: Add GPLL9 support on gcc-sc8180x
clk: qcom: gcc-sc8180x: Add GPLL9 support
clk: qcom: gcc-sc8180x: Fix the sdcc2 and sdcc4 clocks freq table
clk: qcom: gcc-sm8150: De-register gcc_cpuss_ahb_clk_src
drivers/clk/qcom/gcc-sc8180x.c | 438 ++++++++++++++-------------
include/dt-bindings/clock/qcom,gcc-sc8180x.h | 1 +
2 files changed, 232 insertions(+), 207 deletions(-)
---
base-commit: 864b1099d16fc7e332c3ad7823058c65f890486c
change-id: 20240725-gcc-sc8180x-fixes-cf58908142b5
Best regards,
--
Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
From: Mike Tipton <quic_mdtipton(a)quicinc.com>
Valid frequencies may result in BCM votes that exceed the max HW value.
Set vote ceiling to BCM_TCS_CMD_VOTE_MASK to ensure the votes aren't
truncated, which can result in lower frequencies than desired.
Fixes: 04053f4d23a4 ("clk: qcom: clk-rpmh: Add IPA clock support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Tipton <quic_mdtipton(a)quicinc.com>
Reviewed-by: Taniya Das <quic_tdas(a)quicinc.com>
Signed-off-by: Imran Shaik <quic_imrashai(a)quicinc.com>
---
Changes in v2:
- Updated the overflow check as per the comment from Stephen.
- Link to v1: https://lore.kernel.org/r/20240808-clk-rpmh-bcm-vote-fix-v1-1-109bd1d76189@…
---
drivers/clk/qcom/clk-rpmh.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c
index bb82abeed88f..4acde937114a 100644
--- a/drivers/clk/qcom/clk-rpmh.c
+++ b/drivers/clk/qcom/clk-rpmh.c
@@ -263,6 +263,8 @@ static int clk_rpmh_bcm_send_cmd(struct clk_rpmh *c, bool enable)
cmd_state = 0;
}
+ cmd_state = min(cmd_state, BCM_TCS_CMD_VOTE_MASK);
+
if (c->last_sent_aggr_state != cmd_state) {
cmd.addr = c->res_addr;
cmd.data = BCM_TCS_CMD(1, enable, 0, cmd_state);
---
base-commit: 222a3380f92b8791d4eeedf7cd750513ff428adf
change-id: 20240808-clk-rpmh-bcm-vote-fix-c344e213c9bb
Best regards,
--
Imran Shaik <quic_imrashai(a)quicinc.com>
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 24e82654e98e96cece5d8b919c522054456eeec6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081204-stylist-bobsled-3424@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
24e82654e98e ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e82654e98e96cece5d8b919c522054456eeec6 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6b713fb0b818..fdf171ad4a3c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1144,7 +1144,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2312,7 +2312,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3354,6 +3354,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 24e82654e98e96cece5d8b919c522054456eeec6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081203-dreadlock-trodden-9b5f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
24e82654e98e ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24e82654e98e96cece5d8b919c522054456eeec6 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6b713fb0b818..fdf171ad4a3c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1144,7 +1144,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2312,7 +2312,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3354,6 +3354,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
setting SGX_ENCL_PAGE_BUSY flag right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is busy, and if yes then CPU2 backs off and waits until the page is
completely removed. After that, any memory access to this page results
in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..02441883401d 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1141,7 +1141,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
/*
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
+ *
+ * Releasing encl->lock leads to a data race: while CPU1
+ * performs sgx_zap_enclave_ptes() and removes the PTE entry
+ * for the enclave page, CPU2 may attempt to load this page
+ * (because the page is still in enclave's xarray). To prevent
+ * CPU2 from loading the page, mark the page as busy.
*/
+ entry->desc |= SGX_ENCL_PAGE_BUSY;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.43.0
On Thu, Aug 15, 2024 at 08:21:29AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> Input: bcm5974 - check endpoint type before starting traffic
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> input-bcm5974-check-endpoint-type-before-starting-tr.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I am confused why to pick this up only to have to pick up the revert? It
was not tagged for stable explicitly.
I'd expects stable scripts to check for reverts that happen almost
immediately...
>
>
>
> commit 16ae8b10b473f96b7f63add2e928d8d437c83b07
> Author: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
> Date: Sat Oct 14 12:20:15 2023 +0200
>
> Input: bcm5974 - check endpoint type before starting traffic
>
> [ Upstream commit 2b9c3eb32a699acdd4784d6b93743271b4970899 ]
>
> syzbot has found a type mismatch between a USB pipe and the transfer
> endpoint, which is triggered by the bcm5974 driver[1].
>
> This driver expects the device to provide input interrupt endpoints and
> if that is not the case, the driver registration should terminate.
>
> Repros are available to reproduce this issue with a certain setup for
> the dummy_hcd, leading to an interrupt/bulk mismatch which is caught in
> the USB core after calling usb_submit_urb() with the following message:
> "BOGUS urb xfer, pipe 1 != type 3"
>
> Some other device drivers (like the appletouch driver bcm5974 is mainly
> based on) provide some checking mechanism to make sure that an IN
> interrupt endpoint is available. In this particular case the endpoint
> addresses are provided by a config table, so the checking can be
> targeted to the provided endpoints.
>
> Add some basic checking to guarantee that the endpoints available match
> the expected type for both the trackpad and button endpoints.
>
> This issue was only found for the trackpad endpoint, but the checking
> has been added to the button endpoint as well for the same reasons.
>
> Given that there was never a check for the endpoint type, this bug has
> been there since the first implementation of the driver (f89bd95c5c94).
>
> [1] https://syzkaller.appspot.com/bug?extid=348331f63b034f89b622
>
> Fixes: f89bd95c5c94 ("Input: bcm5974 - add driver for Macbook Air and Pro Penryn touchpads")
> Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
> Reported-and-tested-by: syzbot+348331f63b034f89b622(a)syzkaller.appspotmail.com
> Link: https://lore.kernel.org/r/20231007-topic-bcm5974_bulk-v3-1-d0f38b9d2935@gma…
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/input/mouse/bcm5974.c b/drivers/input/mouse/bcm5974.c
> index ca150618d32f1..953992b458e9f 100644
> --- a/drivers/input/mouse/bcm5974.c
> +++ b/drivers/input/mouse/bcm5974.c
> @@ -19,6 +19,7 @@
> * Copyright (C) 2006 Nicolas Boichat (nicolas(a)boichat.ch)
> */
>
> +#include "linux/usb.h"
> #include <linux/kernel.h>
> #include <linux/errno.h>
> #include <linux/slab.h>
> @@ -193,6 +194,8 @@ enum tp_type {
>
> /* list of device capability bits */
> #define HAS_INTEGRATED_BUTTON 1
> +/* maximum number of supported endpoints (currently trackpad and button) */
> +#define MAX_ENDPOINTS 2
>
> /* trackpad finger data block size */
> #define FSIZE_TYPE1 (14 * sizeof(__le16))
> @@ -891,6 +894,18 @@ static int bcm5974_resume(struct usb_interface *iface)
> return error;
> }
>
> +static bool bcm5974_check_endpoints(struct usb_interface *iface,
> + const struct bcm5974_config *cfg)
> +{
> + u8 ep_addr[MAX_ENDPOINTS + 1] = {0};
> +
> + ep_addr[0] = cfg->tp_ep;
> + if (cfg->tp_type == TYPE1)
> + ep_addr[1] = cfg->bt_ep;
> +
> + return usb_check_int_endpoints(iface, ep_addr);
> +}
> +
> static int bcm5974_probe(struct usb_interface *iface,
> const struct usb_device_id *id)
> {
> @@ -903,6 +918,11 @@ static int bcm5974_probe(struct usb_interface *iface,
> /* find the product index */
> cfg = bcm5974_get_config(udev);
>
> + if (!bcm5974_check_endpoints(iface, cfg)) {
> + dev_err(&iface->dev, "Unexpected non-int endpoint\n");
> + return -ENODEV;
> + }
> +
> /* allocate memory for our device state and initialize it */
> dev = kzalloc(sizeof(struct bcm5974), GFP_KERNEL);
> input_dev = input_allocate_device();
Thanks.
--
Dmitry
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the
SGX steers into a failure path. The loser EREMOVE's the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Change the return code to VM_FAULT_NOPAGE
to avoid SIGBUS and call sgx_free_epc_page() which avoids EREMOVE'ing
the winner's page and only frees the page that the loser allocated.
The race can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index c0a3c00284c8..9f7f9e57cdeb 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -380,8 +380,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -417,7 +420,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.43.0
SGX_ENCL_PAGE_BEING_RECLAIMED flag is set when the enclave page is being
reclaimed (moved to the backing store). This flag however has two
logical meanings:
1. Don't attempt to load the enclave page (the page is busy).
2. Don't attempt to remove the PCMD page corresponding to this enclave
page (the PCMD page is busy).
To reflect these two meanings, split SGX_ENCL_PAGE_BEING_RECLAIMED into
two flags: SGX_ENCL_PAGE_BUSY and SGX_ENCL_PAGE_PCMD_BUSY. Currently,
both flags are set only when the enclave page is being reclaimed. A
future commit will introduce a new case when the enclave page is being
removed; this new case will set only the SGX_ENCL_PAGE_BUSY flag.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 16 +++++++---------
arch/x86/kernel/cpu/sgx/encl.h | 10 ++++++++--
arch/x86/kernel/cpu/sgx/main.c | 4 ++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..c0a3c00284c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -46,10 +46,10 @@ static int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_ind
* a check if an enclave page sharing the PCMD page is in the process of being
* reclaimed.
*
- * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
- * intends to reclaim that enclave page - it means that the PCMD page
- * associated with that enclave page is about to get some data and thus
- * even if the PCMD page is empty, it should not be truncated.
+ * The reclaimer sets the SGX_ENCL_PAGE_PCMD_BUSY flag when it intends to
+ * reclaim that enclave page - it means that the PCMD page associated with that
+ * enclave page is about to get some data and thus even if the PCMD page is
+ * empty, it should not be truncated.
*
* Context: Enclave mutex (&sgx_encl->lock) must be held.
* Return: 1 if the reclaimer is about to write to the PCMD page
@@ -77,8 +77,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* Stop when reaching the SECS page - it does not
* have a page_array entry and its reclaim is
* started and completed with enclave mutex held so
- * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
- * flag.
+ * it does not use the SGX_ENCL_PAGE_PCMD_BUSY flag.
*/
if (addr == encl->base + encl->size)
break;
@@ -91,8 +90,7 @@ static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
* VA page slot ID uses same bit as the flag so it is important
* to ensure that the page is not already in backing store.
*/
- if (entry->epc_page &&
- (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
+ if (entry->epc_page && (entry->desc & SGX_ENCL_PAGE_PCMD_BUSY)) {
reclaimed = 1;
break;
}
@@ -257,7 +255,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & SGX_ENCL_PAGE_BUSY)
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..11b09899cd92 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -22,8 +22,14 @@
/* 'desc' bits holding the offset in the VA (version array) page. */
#define SGX_ENCL_PAGE_VA_OFFSET_MASK GENMASK_ULL(11, 3)
-/* 'desc' bit marking that the page is being reclaimed. */
-#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit indicating that the page is busy (e.g. being reclaimed). */
+#define SGX_ENCL_PAGE_BUSY BIT(2)
+
+/*
+ * 'desc' bit indicating that PCMD page associated with the enclave page is
+ * busy (e.g. because the enclave page is being reclaimed).
+ */
+#define SGX_ENCL_PAGE_PCMD_BUSY BIT(3)
struct sgx_encl_page {
unsigned long desc;
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..e94b09c43673 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -204,7 +204,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
void *va_slot;
int ret;
- encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc &= ~(SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY);
va_page = list_first_entry(&encl->va_pages, struct sgx_va_page,
list);
@@ -340,7 +340,7 @@ static void sgx_reclaim_pages(void)
goto skip;
}
- encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
+ encl_page->desc |= SGX_ENCL_PAGE_BUSY | SGX_ENCL_PAGE_PCMD_BUSY;
mutex_unlock(&encl_page->encl->lock);
continue;
--
2.43.0
Hi,
This series is a v6.6-only backport (based on v6.6.44) of the upstream
workaround for SSBS errata on Arm Ltd CPUs, as affected parts are likely
to be used with stable kernels. This does not apply to earlier stable
trees, which will receive a separate backport.
The errata mean that an MSR to the SSBS special-purpose register does not
affect subsequent speculative instructions, permitting speculative store
bypassing for a window of time.
The upstream support was original posted as:
* https://lore.kernel.org/linux-arm-kernel/20240508081400.235362-1-mark.rutla…
"arm64: errata: Add workaround for Arm errata 3194386 and 3312417"
Present in v6.10
* https://lore.kernel.org/linux-arm-kernel/20240603111812.1514101-1-mark.rutl…
"arm64: errata: Expand speculative SSBS workaround"
Present in v6.11-rc1
* https://lore.kernel.org/linux-arm-kernel/20240801101803.1982459-1-mark.rutl…
"arm64: errata: Expand speculative SSBS workaround (again)"
Present in v6.11-rc2
This backport applies the patches which are not present in v6.6.y, and
as prerequisites backports the addition of the Neoverse-V2 MIDR values
and the restoration of the spec_bar() macro.
I have tested the backport (when applied to v6.6.44), ensuring that the
detection logic works and that the HWCAP and string in /proc/cpuinfo are
both hidden when the relevant errata are detected.
Mark.Besar Wicaksono (1):
arm64: Add Neoverse-V2 part
Mark Rutland (12):
arm64: barrier: Restore spec_bar() macro
arm64: cputype: Add Cortex-X4 definitions
arm64: cputype: Add Neoverse-V3 definitions
arm64: errata: Add workaround for Arm errata 3194386 and 3312417
arm64: cputype: Add Cortex-X3 definitions
arm64: cputype: Add Cortex-A720 definitions
arm64: cputype: Add Cortex-X925 definitions
arm64: errata: Unify speculative SSBS errata logic
arm64: errata: Expand speculative SSBS workaround
arm64: cputype: Add Cortex-X1C definitions
arm64: cputype: Add Cortex-A725 definitions
arm64: errata: Expand speculative SSBS workaround (again)
Documentation/arch/arm64/silicon-errata.rst | 36 +++++++++++++++++++
arch/arm64/Kconfig | 38 +++++++++++++++++++++
arch/arm64/include/asm/barrier.h | 4 +++
arch/arm64/include/asm/cputype.h | 16 +++++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++
arch/arm64/kernel/cpufeature.c | 12 +++++++
arch/arm64/kernel/proton-pack.c | 12 +++++++
arch/arm64/tools/cpucaps | 1 +
8 files changed, 150 insertions(+)
--
2.30.2
From: Friedrich Vock <friedrich.vock(a)gmx.de>
The special case for VM passthrough doesn't check adev->nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.
Signed-off-by: Friedrich Vock <friedrich.vock(a)gmx.de>
Fixes: 1bece222eabe ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Acked-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3558
Cc: stable(a)vger.kernel.org # 5.15.x
(cherry picked from commit 0cdb3f9740844b9d95ca413e3fcff11f81223ecf)
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5f6c32ec674d..300d3b236bb3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5531,7 +5531,7 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
adev->nbio.funcs->enable_doorbell_interrupt)
adev->nbio.funcs->enable_doorbell_interrupt(adev, true);
- if (amdgpu_passthrough(adev) &&
+ if (amdgpu_passthrough(adev) && adev->nbio.funcs &&
adev->nbio.funcs->clear_doorbell_interrupt)
adev->nbio.funcs->clear_doorbell_interrupt(adev);
--
2.46.0
re-enumerating full-speed devices after a failed address device command
can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size
value during enumeration. Usb core calls usb_ep0_reinit() in this case,
which ends up calling xhci_configure_endpoint().
On Panther point xHC the xhci_configure_endpoint() function will
additionally check and reserve bandwidth in software. Other hosts do
this in hardware
If xHC address device command fails then a new xhci_virt_device structure
is allocated as part of re-enabling the slot, but the bandwidth table
pointers are not set up properly here.
This triggers the NULL pointer dereference the next time usb_ep0_reinit()
is called and xhci_configure_endpoint() tries to check and reserve
bandwidth
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd
[46710.713699] usb 3-1: Device not responding to setup address.
[46710.917684] usb 3-1: Device not responding to setup address.
[46711.125536] usb 3-1: device not accepting address 5, error -71
[46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008
[46711.125600] #PF: supervisor read access in kernel mode
[46711.125603] #PF: error_code(0x0000) - not-present page
[46711.125606] PGD 0 P4D 0
[46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1
[46711.125620] Hardware name: Gigabyte Technology Co., Ltd.
[46711.125623] Workqueue: usb_hub_wq hub_event [usbcore]
[46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly
after a failed address device command, and additionally by avoiding
checking for bandwidth in cases like this where no actual endpoints are
added or removed, i.e. only context for default control endpoint 0 is
evaluated.
Reported-by: Karel Balej <balejk(a)matfyz.cz>
Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/
Cc: stable(a)vger.kernel.org
Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command")
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0a8cf6c17f82..efdf4c228b8c 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2837,7 +2837,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
xhci->num_active_eps);
return -ENOMEM;
}
- if ((xhci->quirks & XHCI_SW_BW_CHECKING) &&
+ if ((xhci->quirks & XHCI_SW_BW_CHECKING) && !ctx_change &&
xhci_reserve_bandwidth(xhci, virt_dev, command->in_ctx)) {
if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK))
xhci_free_host_resources(xhci, ctrl_ctx);
@@ -4200,8 +4200,10 @@ static int xhci_setup_device(struct usb_hcd *hcd, struct usb_device *udev,
mutex_unlock(&xhci->mutex);
ret = xhci_disable_slot(xhci, udev->slot_id);
xhci_free_virt_device(xhci, udev->slot_id);
- if (!ret)
- xhci_alloc_dev(hcd, udev);
+ if (!ret) {
+ if (xhci_alloc_dev(hcd, udev) == 1)
+ xhci_setup_addressable_virt_dev(xhci, udev);
+ }
kfree(command->completion);
kfree(command);
return -EPROTO;
--
2.25.1
This reverts commit 28ab9769117ca944cb6eb537af5599aa436287a4.
Sense data can be in either fixed format or descriptor format.
SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."
The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)
Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).
Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.
However, after commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.
These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).
Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.
Cc: stable(a)vger.kernel.org # 4.19+
Reported-by: Stephan Eisvogel <eisvogel(a)seitics.de>
Reported-by: Christian Heusel <christian(a)heusel.eu>
Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heus…
Fixes: 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/ata/libata-scsi.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index d6f5e25e1ed8..473e00a58a8b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -951,8 +951,19 @@ static void ata_gen_passthru_sense(struct ata_queued_cmd *qc)
&sense_key, &asc, &ascq);
ata_scsi_set_sense(qc->dev, cmd, sense_key, asc, ascq);
} else {
- /* ATA PASS-THROUGH INFORMATION AVAILABLE */
- ata_scsi_set_sense(qc->dev, cmd, RECOVERED_ERROR, 0, 0x1D);
+ /*
+ * ATA PASS-THROUGH INFORMATION AVAILABLE
+ *
+ * Note: we are supposed to call ata_scsi_set_sense(), which
+ * respects the D_SENSE bit, instead of unconditionally
+ * generating the sense data in descriptor format. However,
+ * because hdparm, hddtemp, and udisks incorrectly assume sense
+ * data in descriptor format, without even looking at the
+ * RESPONSE CODE field in the returned sense data (to see which
+ * format the returned sense data is in), we are stuck with
+ * being bug compatible with older kernels.
+ */
+ scsi_build_sense(cmd, 1, RECOVERED_ERROR, 0, 0x1D);
}
}
--
2.46.0
On Thu, Aug 15, 2024 at 08:22:16AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> ext4: convert ext4_da_do_write_end() to take a folio
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> ext4-convert-ext4_da_do_write_end-to-take-a-folio.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I'd think you'd want to backport 83f4414b8f84 to before folios existing,
so you may as well use the same patch for 6.6 as you'd want to use for
those older kernels. ie:
if (unlikely(!page_buffers(page))) {
unlock_page(page);
put_page(page);
return -EIO;
}
but maybe this has already been discussed and I wasn't cc'd on that
discussion.
The main fix is a possible memory leak on an early exit in the
for_each_child_of_node() loop. That fix has been divided into a patch
that can be backported (a simple of_node_put()), and another one that
uses the scoped variant of the macro, removing the need for any
of_node_put(). That prevents mistakes if new break/return instructions
are added, but the macro might not be available in older kernels.
When at it, an unused header has been dropped.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (3):
drm/mediatek: ovl_adaptor: drop unused mtk_crtc.h header
drm/mediatek: ovl_adaptor: add missing of_node_put()
drm/mediatek: ovl_adaptor: use scoped variant of for_each_child_of_node()
drivers/gpu/drm/mediatek/mtk_disp_ovl_adaptor.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
---
base-commit: f76698bd9a8ca01d3581236082d786e9a6b72bb7
change-id: 20240624-mtk_disp_ovl_adaptor_scoped-0702a6b23443
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: d33d26036a0274b472299d7dcdaa5fb34329f91b
Gitweb: https://git.kernel.org/tip/d33d26036a0274b472299d7dcdaa5fb34329f91b
Author: Roland Xu <mu001999(a)outlook.com>
AuthorDate: Thu, 15 Aug 2024 10:58:13 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 15 Aug 2024 15:38:53 +02:00
rtmutex: Drop rt_mutex::wait_lock before scheduling
rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the
good case it returns with the lock held and in the deadlock case it emits a
warning and goes into an endless scheduling loop with the lock held, which
triggers the 'scheduling in atomic' warning.
Unlock rt_mutex::wait_lock in the dead lock case before issuing the warning
and dropping into the schedule for ever loop.
[ tglx: Moved unlock before the WARN(), removed the pointless comment,
massaged changelog, added Fixes tag ]
Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter")
Signed-off-by: Roland Xu <mu001999(a)outlook.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300M…
---
kernel/locking/rtmutex.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 88d08ee..fba1229 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1644,6 +1644,7 @@ static int __sched rt_mutex_slowlock_block(struct rt_mutex_base *lock,
}
static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
+ struct rt_mutex_base *lock,
struct rt_mutex_waiter *w)
{
/*
@@ -1656,10 +1657,10 @@ static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
if (build_ww_mutex() && w->ww_ctx)
return;
- /*
- * Yell loudly and stop the task right here.
- */
+ raw_spin_unlock_irq(&lock->wait_lock);
+
WARN(1, "rtmutex deadlock detected\n");
+
while (1) {
set_current_state(TASK_INTERRUPTIBLE);
rt_mutex_schedule();
@@ -1713,7 +1714,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
} else {
__set_current_state(TASK_RUNNING);
remove_waiter(lock, waiter);
- rt_mutex_handle_deadlock(ret, chwalk, waiter);
+ rt_mutex_handle_deadlock(ret, chwalk, lock, waiter);
}
/*
Hi Greg,
Please consider these two patches for the 6.6 kernel. These patches are
unmodified versions of the corresponding upstream commits.
Thank you,
Bart.
David Stevens (1):
genirq/cpuhotplug: Skip suspended interrupts when restoring affinity
Dongli Zhang (1):
genirq/cpuhotplug: Retry with cpu_online_mask when migration fails
kernel/irq/cpuhotplug.c | 27 ++++++++++++++++++++++++---
kernel/irq/manage.c | 12 ++++++++----
2 files changed, 32 insertions(+), 7 deletions(-)
Hi stable folks,
I noticed that these two KVM/arm64 pgtable fixes are missing from 6.6.y
so I've done the backports. The second one is also needed in 6.1.y but
it needs some tweaks so I'll post a separate backport for that.
Cheers,
Will
Cc: Marc Zyngier <maz(a)kernel.org>
Cc: Oliver Upton <oliver.upton(a)linux.dev>
cc: kvmarm(a)lists.linux.dev
--->8
Will Deacon (2):
KVM: arm64: Don't defer TLB invalidation when zapping table entries
KVM: arm64: Don't pass a TLBI level hint when zapping table entries
arch/arm64/kvm/hyp/pgtable.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
--
2.46.0.184.g6999bdac58-goog
An USB hub is not a HCD, but an USB device. Fix the referenced schema
accordingly.
Fixes: bfbf2e4b77e2 ("dt-bindings: usb: Document the Microchip USB2514 hub")
Cc: stable(a)vger.kernel.org
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Signed-off-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
---
As this USB hub also can contain an USB (ethernet) sub device, I copied
the subdevice part from usb-hcd.yaml.
I had to add 'additionalProperties: true' as well, because I got that warning
upon dt_binding_check otherwise:
> Documentation/devicetree/bindings/usb/microchip,usb2514.yaml:
> ^.*@[0-9a-f]{1,2}$: Missing additionalProperties/unevaluatedProperties constraint
I added a Fixes tag to keep this schema aligned in v6.10 stable tree.
Changes in v2:
* Do not update the example
* Adjust comit message accordingly
* Add Cc for stable
* Collected Krzysztof's R-b
* Shorten the SHA1 of the Fixes tag
.../devicetree/bindings/usb/microchip,usb2514.yaml | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml b/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
index 245e8c3ce6699..b14e6f37b2987 100644
--- a/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
+++ b/Documentation/devicetree/bindings/usb/microchip,usb2514.yaml
@@ -10,7 +10,7 @@ maintainers:
- Fabio Estevam <festevam(a)gmail.com>
allOf:
- - $ref: usb-hcd.yaml#
+ - $ref: usb-device.yaml#
properties:
compatible:
@@ -36,6 +36,13 @@ required:
- compatible
- reg
+patternProperties:
+ "^.*@[0-9a-f]{1,2}$":
+ description: The hard wired USB devices
+ type: object
+ $ref: /schemas/usb/usb-device.yaml
+ additionalProperties: true
+
unevaluatedProperties: false
examples:
--
2.34.1
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Changes in v2:
- v1 was sent in mistake, b4 messed up with QEMU again
- Link to v1: https://lore.kernel.org/r/20240621-loongson3-ipi-follow-v1-0-c6e73f2b2844@f…
---
Jiaxun Yang (3):
hw/mips/loongson3_virt: Store core_iocsr into LoongsonMachineState
hw/mips/loongson3_virt: Fix condition of IPI IOCSR connection
linux-user/mips64: Use MIPS64R2-generic as default CPU type
hw/mips/loongson3_virt.c | 5 ++++-
linux-user/mips64/target_elf.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 02d9c38236cf8c9826e5c5be61780c4444cb4ae0
change-id: 20240621-loongson3-ipi-follow-1f4919621882
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8bdd9ef7e9b1b2a73e394712b72b22055e0e26c3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081218-quotation-thud-f8b0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8bdd9ef7e9b1 ("drm/i915/gem: Fix Virtual Memory mapping boundaries calculation")
8e4ee5e87ce6 ("drm/i915: Wrap all access to i915_vma.node.start|size")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8bdd9ef7e9b1b2a73e394712b72b22055e0e26c3 Mon Sep 17 00:00:00 2001
From: Andi Shyti <andi.shyti(a)linux.intel.com>
Date: Fri, 2 Aug 2024 10:38:50 +0200
Subject: [PATCH] drm/i915/gem: Fix Virtual Memory mapping boundaries
calculation
Calculating the size of the mapped area as the lesser value
between the requested size and the actual size does not consider
the partial mapping offset. This can cause page fault access.
Fix the calculation of the starting and ending addresses, the
total size is now deduced from the difference between the end and
start addresses.
Additionally, the calculations have been rewritten in a clearer
and more understandable form.
Fixes: c58305af1835 ("drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass")
Reported-by: Jann Horn <jannh(a)google.com>
Co-developed-by: Chris Wilson <chris.p.wilson(a)linux.intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v4.9+
Reviewed-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Jonathan Cavitt <Jonathan.cavitt(a)intel.com>
[Joonas: Add Requires: tag]
Requires: 60a2066c5005 ("drm/i915/gem: Adjust vma offset for framebuffer mmap offset")
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240802083850.103694-3-andi.…
(cherry picked from commit 97b6784753da06d9d40232328efc5c5367e53417)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index ce10dd259812..cac6d4184506 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -290,6 +290,41 @@ static vm_fault_t vm_fault_cpu(struct vm_fault *vmf)
return i915_error_to_vmf_fault(err);
}
+static void set_address_limits(struct vm_area_struct *area,
+ struct i915_vma *vma,
+ unsigned long obj_offset,
+ unsigned long *start_vaddr,
+ unsigned long *end_vaddr)
+{
+ unsigned long vm_start, vm_end, vma_size; /* user's memory parameters */
+ long start, end; /* memory boundaries */
+
+ /*
+ * Let's move into the ">> PAGE_SHIFT"
+ * domain to be sure not to lose bits
+ */
+ vm_start = area->vm_start >> PAGE_SHIFT;
+ vm_end = area->vm_end >> PAGE_SHIFT;
+ vma_size = vma->size >> PAGE_SHIFT;
+
+ /*
+ * Calculate the memory boundaries by considering the offset
+ * provided by the user during memory mapping and the offset
+ * provided for the partial mapping.
+ */
+ start = vm_start;
+ start -= obj_offset;
+ start += vma->gtt_view.partial.offset;
+ end = start + vma_size;
+
+ start = max_t(long, start, vm_start);
+ end = min_t(long, end, vm_end);
+
+ /* Let's move back into the "<< PAGE_SHIFT" domain */
+ *start_vaddr = (unsigned long)start << PAGE_SHIFT;
+ *end_vaddr = (unsigned long)end << PAGE_SHIFT;
+}
+
static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
{
#define MIN_CHUNK_PAGES (SZ_1M >> PAGE_SHIFT)
@@ -302,14 +337,18 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
struct i915_ggtt *ggtt = to_gt(i915)->ggtt;
bool write = area->vm_flags & VM_WRITE;
struct i915_gem_ww_ctx ww;
+ unsigned long obj_offset;
+ unsigned long start, end; /* memory boundaries */
intel_wakeref_t wakeref;
struct i915_vma *vma;
pgoff_t page_offset;
+ unsigned long pfn;
int srcu;
int ret;
- /* We don't use vmf->pgoff since that has the fake offset */
+ obj_offset = area->vm_pgoff - drm_vma_node_start(&mmo->vma_node);
page_offset = (vmf->address - area->vm_start) >> PAGE_SHIFT;
+ page_offset += obj_offset;
trace_i915_gem_object_fault(obj, page_offset, true, write);
@@ -402,12 +441,14 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
if (ret)
goto err_unpin;
+ set_address_limits(area, vma, obj_offset, &start, &end);
+
+ pfn = (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT;
+ pfn += (start - area->vm_start) >> PAGE_SHIFT;
+ pfn += obj_offset - vma->gtt_view.partial.offset;
+
/* Finally, remap it using the new GTT offset */
- ret = remap_io_mapping(area,
- area->vm_start + (vma->gtt_view.partial.offset << PAGE_SHIFT),
- (ggtt->gmadr.start + i915_ggtt_offset(vma)) >> PAGE_SHIFT,
- min_t(u64, vma->size, area->vm_end - area->vm_start),
- &ggtt->iomap);
+ ret = remap_io_mapping(area, start, pfn, end - start, &ggtt->iomap);
if (ret)
goto err_fence;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d67c5649c1541dc93f202eeffc6f49220a4ed71d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024081208-motion-jubilant-6af5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d67c5649c154 ("mptcp: fully established after ADD_ADDR echo on MPJ")
b3ea6b272d79 ("mptcp: consolidate initial ack seq generation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d67c5649c1541dc93f202eeffc6f49220a4ed71d Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Wed, 31 Jul 2024 13:05:53 +0200
Subject: [PATCH] mptcp: fully established after ADD_ADDR echo on MPJ
Before this patch, receiving an ADD_ADDR echo on the just connected
MP_JOIN subflow -- initiator side, after the MP_JOIN 3WHS -- was
resulting in an MP_RESET. That's because only ACKs with a DSS or
ADD_ADDRs without the echo bit were allowed.
Not allowing the ADD_ADDR echo after an MP_CAPABLE 3WHS makes sense, as
we are not supposed to send an ADD_ADDR before because it requires to be
in full established mode first. For the MP_JOIN 3WHS, that's different:
the ADD_ADDR can be sent on a previous subflow, and the ADD_ADDR echo
can be received on the recently created one. The other peer will already
be in fully established, so it is allowed to send that.
We can then relax the conditions here to accept the ADD_ADDR echo for
MPJ subflows.
Fixes: 67b12f792d5e ("mptcp: full fully established support after ADD_ADDR")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-endp-subflow-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a68382a4fe9..ac2f1a54cc43 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -958,7 +958,8 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
if (subflow->remote_key_valid &&
(((mp_opt->suboptions & OPTION_MPTCP_DSS) && mp_opt->use_ack) ||
- ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) && !mp_opt->echo))) {
+ ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) &&
+ (!mp_opt->echo || subflow->mp_join)))) {
/* subflows are fully established as soon as we get any
* additional ack, including ADD_ADDR.
*/