Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
when called from xe_vm_userptr_pin(), the idea being that we want to
avoid killing the entire VM and returning an error, under the
assumption that the user just did an unmap or something and has no
intention of actually touching that memory from the GPU. At this point
we have already zapped the PTEs, so any access should generate a page
fault, and if the pin also fails there it then becomes fatal.
However, it looks like it's possible for the userptr vma to still be on
the rebind list in preempt_rebind_work_func(), if we had to retry the
pin again due to something happening in the caller before we did the
rebind step, but in the meantime needed to re-validate the userptr and
this time hit the EFAULT.
This might explain an internal user report of hitting:
[ 191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
[ 191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738690] Call Trace:
[ 191.738692] <TASK>
[ 191.738694] ? show_regs+0x69/0x80
[ 191.738698] ? __warn+0x93/0x1a0
[ 191.738703] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738759] ? report_bug+0x18f/0x1a0
[ 191.738764] ? handle_bug+0x63/0xa0
[ 191.738767] ? exc_invalid_op+0x19/0x70
[ 191.738770] ? asm_exc_invalid_op+0x1b/0x20
[ 191.738777] ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
[ 191.738834] ? ret_from_fork_asm+0x1a/0x30
[ 191.738849] bind_op_prepare+0x105/0x7b0 [xe]
[ 191.738906] ? dma_resv_reserve_fences+0x301/0x380
[ 191.738912] xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
[ 191.738966] ? kmemleak_alloc+0x4b/0x80
[ 191.738973] ops_execute+0x188/0x9d0 [xe]
[ 191.739036] xe_vm_rebind+0x4ce/0x5a0 [xe]
[ 191.739098] ? trace_hardirqs_on+0x4d/0x60
[ 191.739112] preempt_rebind_work_func+0x76f/0xd00 [xe]
This is followed by a NULL pointer dereference when running some
workload, since the sg was never actually populated but the vma is
still marked for rebind, when it should instead be skipped for this
special EFAULT case. From the logs it does indeed look like we hit this
special EFAULT case before the explosion.
Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
---
drivers/gpu/drm/xe/xe_vm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d664f2e418b2..1caee9cbeafb 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -692,6 +692,17 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
xe_vm_unlock(vm);
if (err)
return err;
+
+ /*
+ * We might have already done the pin once, but then had to retry
+ * before the re-bind happened, due to some other condition in the caller,
+ * and in the meantime the userptr got dinged by the notifier such that we
+ * need to revalidate here, but this time we hit the EFAULT. In such a case
+ * make sure we remove ourselves from the rebind list to avoid going down
+ * in flames.
+ */
+ if (!list_empty(&uvma->vma.combined_links.rebind))
+ list_del_init(&uvma->vma.combined_links.rebind);
} else {
if (err < 0)
return err;
--
2.48.1
From: Jos Wang <joswang(a)lenovo.com>
As per the PD2.0 spec ("6.5.6.2 PSSourceOffTimer"), the PSSourceOffTimer
is used by the Policy Engine in a Dual-Role Power device that is
currently acting as a Sink to time out waiting for a PS_RDY Message
during a Power Role Swap sequence. This condition leads to a Hard Reset
for USB Type-A and Type-B Plugs, and to Error Recovery for Type-C plugs
and a return to USB Default Operation.
Therefore, after the PSSourceOffTimer times out, the tcpm state machine
should switch from PR_SWAP_SNK_SRC_SINK_OFF to ERROR_RECOVERY. This also
addresses the following item in the USB power delivery compliance test:
TEST.PD.PROT.SNK.12 PR_Swap – PSSourceOffTimer Timeout
[1] https://usb.org/document-library/usb-power-delivery-compliance-test-specifi…
Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jos Wang <joswang(a)lenovo.com>
---
v2: Modify the commit message, remove unnecessary blank lines.
drivers/usb/typec/tcpm/tcpm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 47be450d2be3..6bf1a22c785a 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5591,8 +5591,7 @@ static void run_state_machine(struct tcpm_port *port)
tcpm_set_auto_vbus_discharge_threshold(port, TYPEC_PWR_MODE_USB,
port->pps_data.active, 0);
tcpm_set_charge(port, false);
- tcpm_set_state(port, hard_reset_state(port),
- port->timings.ps_src_off_time);
+ tcpm_set_state(port, ERROR_RECOVERY, port->timings.ps_src_off_time);
break;
case PR_SWAP_SNK_SRC_SOURCE_ON:
tcpm_enable_auto_vbus_discharge(port, true);
--
2.17.1
The xHC resources allocated for USB devices are not released in the
correct order after resuming when the device was reconnected while
suspended.
This issue has been detected with the following scenario:
- connect an HS hub to the root port
- connect an LS/FS device to a hub port
- wait for enumeration to finish
- force the DUT to suspend
- reconnect the hub attached to the root port
- wake the DUT
In this scenario, during enumeration of the USB LS/FS device the Cadence
xHC reports a completion error code for xHCI commands, because the device
was not properly disconnected and as a result the xHC resources have not
been correctly freed.
The xHCI specification doesn't state that devices can be reset in an
arbitrary order, so we should not treat this issue as a Cadence xHC
controller bug.
As with a regular disconnect, in this case the devices should be cleared
starting from the last USB device in the tree and working toward the
root hub.
To fix this issue, the usbcore driver should disconnect all USB devices
connected to a hub which was reconnected while suspended.
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
drivers/usb/core/hub.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 0cd44f1fd56d..2473cbf317a8 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3627,10 +3627,12 @@ static int finish_port_resume(struct usb_device *udev)
* the device will be rediscovered.
*/
retry_reset_resume:
- if (udev->quirks & USB_QUIRK_RESET)
+ if (udev->quirks & USB_QUIRK_RESET) {
status = -ENODEV;
- else
+ } else {
+ hub_disconnect_children(udev);
status = usb_reset_and_verify_device(udev);
+ }
}
/* 10.5.4.5 says be sure devices in the tree are still there.
--
2.43.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4f6a6bed0bfef4b966f076f33eb4f5547226056a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021128-repeater-percolate-6131@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4f6a6bed0bfef4b966f076f33eb4f5547226056a Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang(a)gmail.com>
Date: Wed, 13 Nov 2024 03:16:14 +0000
Subject: [PATCH] maple_tree: simplify split calculation
Patch series "simplify split calculation", v3.
This patch (of 3):
The current calculation for splitting nodes tries to enforce a minimum
span on the leaf nodes. This code is complex and never worked correctly
to begin with, due to the min value being passed as 0 for all leaves.
The calculation should just split the data as equally as possible
between the new nodes. Note that b_end will be one more than the data,
so the left side is still favoured in the calculation.
The current code may also lead to a deficient node by not leaving enough
data for the right side of the split. This issue is also addressed with
the split calculation change.
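(A rough worked example of the new behaviour, illustrative only: with 9
entries in the big node, b_end is 10 and split = b_end / 2 = 5, so
roughly half of the slots land in the left node and the remainder in
the right, slightly favouring the left; the old loop could instead keep
pushing split to the right in pursuit of a minimum span and leave the
right node deficient.)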
[Liam.Howlett(a)Oracle.com: rephrase the change log]
Link: https://lkml.kernel.org/r/20241113031616.10530-1-richard.weiyang@gmail.com
Link: https://lkml.kernel.org/r/20241113031616.10530-2-richard.weiyang@gmail.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index fe7f9e1f5bbb..ca8ae1e1cc0a 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1863,11 +1863,11 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
* Return: The first split location. The middle split is set in @mid_split.
*/
static inline int mab_calc_split(struct ma_state *mas,
- struct maple_big_node *bn, unsigned char *mid_split, unsigned long min)
+ struct maple_big_node *bn, unsigned char *mid_split)
{
unsigned char b_end = bn->b_end;
int split = b_end / 2; /* Assume equal split. */
- unsigned char slot_min, slot_count = mt_slots[bn->type];
+ unsigned char slot_count = mt_slots[bn->type];
/*
* To support gap tracking, all NULL entries are kept together and a node cannot
@@ -1900,18 +1900,7 @@ static inline int mab_calc_split(struct ma_state *mas,
split = b_end / 3;
*mid_split = split * 2;
} else {
- slot_min = mt_min_slots[bn->type];
-
*mid_split = 0;
- /*
- * Avoid having a range less than the slot count unless it
- * causes one node to be deficient.
- * NOTE: mt_min_slots is 1 based, b_end and split are zero.
- */
- while ((split < slot_count - 1) &&
- ((bn->pivot[split] - min) < slot_count - 1) &&
- (b_end - split > slot_min))
- split++;
}
/* Avoid ending a node on a NULL entry */
@@ -2377,7 +2366,7 @@ static inline struct maple_enode
static inline unsigned char mas_mab_to_node(struct ma_state *mas,
struct maple_big_node *b_node, struct maple_enode **left,
struct maple_enode **right, struct maple_enode **middle,
- unsigned char *mid_split, unsigned long min)
+ unsigned char *mid_split)
{
unsigned char split = 0;
unsigned char slot_count = mt_slots[b_node->type];
@@ -2390,7 +2379,7 @@ static inline unsigned char mas_mab_to_node(struct ma_state *mas,
if (b_node->b_end < slot_count) {
split = b_node->b_end;
} else {
- split = mab_calc_split(mas, b_node, mid_split, min);
+ split = mab_calc_split(mas, b_node, mid_split);
*right = mas_new_ma_node(mas, b_node);
}
@@ -2877,7 +2866,7 @@ static void mas_spanning_rebalance(struct ma_state *mas,
mast->bn->b_end--;
mast->bn->type = mte_node_type(mast->orig_l->node);
split = mas_mab_to_node(mas, mast->bn, &left, &right, &middle,
- &mid_split, mast->orig_l->min);
+ &mid_split);
mast_set_split_parents(mast, left, middle, right, split,
mid_split);
mast_cp_to_nodes(mast, left, middle, right, split, mid_split);
@@ -3365,7 +3354,7 @@ static void mas_split(struct ma_state *mas, struct maple_big_node *b_node)
if (mas_push_data(mas, height, &mast, false))
break;
- split = mab_calc_split(mas, b_node, &mid_split, prev_l_mas.min);
+ split = mab_calc_split(mas, b_node, &mid_split);
mast_split_data(&mast, mas, split);
/*
* Usually correct, mab_mas_cp in the above call overwrites
From: Steve Wahl <steve.wahl(a)hpe.com>
[ Upstream commit cc31744a294584a36bf764a0ffa3255a8e69f036 ]
When ident_pud_init() uses only GB pages to create identity maps, large
ranges of addresses not actually requested can be included in the resulting
table; a 4K request will map a full GB. This can include a lot of extra
address space past that requested, including areas marked reserved by the
BIOS. That allows processor speculation into reserved regions, that on UV
systems can cause system halts.
Only use GB pages when map creation requests include the full GB page of
space. Fall back to using smaller 2M pages when only portions of a GB page
are included in the request.
No attempt is made to coalesce mapping requests. If a request requires a
map entry at the 2M (pmd) level, subsequent mapping requests within the
same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full GB page.
Existing usage starts with larger regions and then adds smaller regions, so
this should not have any great consequence.
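(To illustrate the new checks, with a purely hypothetical example
address: a request covering exactly [1 GiB, 2 GiB) has both
(addr & ~PUD_MASK) == 0 and (next & ~PUD_MASK) == 0, so a GB page is
still used; a 4 KiB request such as [1 GiB, 1 GiB + 4 KiB) fails the
second check and falls back to 2 MiB pmd mappings, no longer dragging
in the rest of the gigabyte.)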
Signed-off-by: Steve Wahl <steve.wahl(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Pavin Joseph <me(a)pavinjoseph.com>
Tested-by: Sarah Brofeldt <srhb(a)dbc.dk>
Tested-by: Eric Hagberg <ehagberg(a)gmail.com>
Link: https://lore.kernel.org/all/20240717213121.3064030-3-steve.wahl@hpe.com
Signed-off-by: Bruno VERNAY <bruno.vernay(a)se.com>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource(a)witekio.com>
---
arch/x86/mm/ident_map.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..a204a332c71f 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -26,18 +26,31 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
for (; addr < end; addr = next) {
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
+ bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE;
if (next > end)
next = end;
- if (info->direct_gbpages) {
- pud_t pudval;
+ /* if this is already a gbpage, this portion is already mapped */
+ if (pud_leaf(*pud))
+ continue;
+
+ /* Is using a gbpage allowed? */
+ use_gbpage = info->direct_gbpages;
- if (pud_present(*pud))
- continue;
+ /* Don't use gbpage if it maps more than the requested region. */
+ /* at the beginning: */
+ use_gbpage &= ((addr & ~PUD_MASK) == 0);
+ /* ... or at the end: */
+ use_gbpage &= ((next & ~PUD_MASK) == 0);
+
+ /* Never overwrite existing mappings */
+ use_gbpage &= !pud_present(*pud);
+
+ if (use_gbpage) {
+ pud_t pudval;
- addr &= PUD_MASK;
pudval = __pud((addr - info->offset) | info->page_flag);
set_pud(pud, pudval);
continue;
--
2.43.0
The role switch registration and set_role() can happen in parallel as
they are invoked independently of each other. There is a possibility
that a driver might spend a significant amount of time in the
usb_role_switch_register() API due to the presence of time-intensive
operations like component_add(), which operate under a common mutex.
This leads to a time window, after allocating the switch and before
setting the registered flag, where set role notifications are dropped.
The timeline below summarizes this behavior:
    Thread1                          Thread2
    usb_role_switch_register()
      |
      |--> allocate switch
      |
      |--> component_add()           usb_role_switch_set_role()
      |                                |
      |                                |--> Drop role notifications
      |                                     since sw->registered
      |                                     flag is not set.
      |
      |--> Set registered flag.
To avoid this, set the registered flag early on in the switch register
API.
Fixes: b787a3e78175 ("usb: roles: don't get/set_role() when usb_role_switch is unregistered")
cc: stable(a)vger.kernel.org
Signed-off-by: Elson Roy Serrao <quic_eserrao(a)quicinc.com>
---
Changes in v2:
- Set the switch registered flag from the get-go as suggested by
Heikki.
- Modified subject line and commit text as per the new logic.
- Link to v1: https://lore.kernel.org/all/20250127230715.6142-1-quic_eserrao@quicinc.com/
drivers/usb/roles/class.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index c58a12c147f4..30482d4cf826 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -387,8 +387,11 @@ usb_role_switch_register(struct device *parent,
dev_set_name(&sw->dev, "%s-role-switch",
desc->name ? desc->name : dev_name(parent));
+ sw->registered = true;
+
ret = device_register(&sw->dev);
if (ret) {
+ sw->registered = false;
put_device(&sw->dev);
return ERR_PTR(ret);
}
@@ -399,8 +402,6 @@ usb_role_switch_register(struct device *parent,
dev_warn(&sw->dev, "failed to add component\n");
}
- sw->registered = true;
-
/* TODO: Symlinks for the host port and the device controller. */
return sw;
--
2.17.1
In 3eab9d7bc2f4 ("fuse: convert readahead to use folios"), the logic
was converted to using the new folio readahead code, which drops the
reference on the folio once it is locked, using an inferred reference
on the folio. Previously we held a reference on the folio for the
entire duration of the readpages call.
This is fine in general, however for the splice pipe response case,
where we will remove the old folio and splice in the new folio (see
fuse_try_move_page()), we assume that a reference is held on the folio
in ap->folios, which is no longer the case.
To fix this, revert back to __readahead_folio() which allows us to hold
the reference on the folio for the duration of readpages until either we
drop the reference ourselves in fuse_readpages_end() or the reference is
dropped after it's replaced in the page cache in the splice case.
This will fix the UAF bug that was reported.
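For reference, the relevant difference between the two helpers is,
roughly (paraphrasing include/linux/pagemap.h, check the header for the
exact definition), that readahead_folio() drops the reference before
returning while __readahead_folio() hands it to the caller:

  static inline struct folio *readahead_folio(struct readahead_control *ractl)
  {
          struct folio *folio = __readahead_folio(ractl);

          /* the "inferred reference" mentioned above: drop it right away */
          if (folio)
                  folio_put(folio);
          return folio;
  }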
Link: https://lore.kernel.org/linux-fsdevel/2f681f48-00f5-4e09-8431-2b3dbfaa881e@…
Fixes: 3eab9d7bc2f4 ("fuse: convert readahead to use folios")
Reported-by: Christian Heusel <christian(a)heusel.eu>
Closes: https://lore.kernel.org/all/2f681f48-00f5-4e09-8431-2b3dbfaa881e@heusel.eu/
Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/110
Reported-by: Mantas Mikulėnas <grawity(a)gmail.com>
Closes: https://lore.kernel.org/all/34feb867-09e2-46e4-aa31-d9660a806d1a@gmail.com/
Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1236660
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Joanne Koong <joannelkoong(a)gmail.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
---
fs/fuse/dev.c | 6 ++++++
fs/fuse/file.c | 13 +++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 3c03aac480a4..de9e25e7dd2d 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -933,6 +933,12 @@ static int fuse_check_folio(struct folio *folio)
return 0;
}
+/*
+ * Attempt to steal a page from the splice() pipe and move it into the
+ * pagecache. If successful, the pointer in @pagep will be updated. The
+ * folio that was originally in @pagep will lose a reference and the new
+ * folio returned in @pagep will carry a reference.
+ */
static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
{
int err;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7d92a5479998..d63e56fd3dd2 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -955,8 +955,10 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
fuse_invalidate_atime(inode);
}
- for (i = 0; i < ap->num_folios; i++)
+ for (i = 0; i < ap->num_folios; i++) {
folio_end_read(ap->folios[i], !err);
+ folio_put(ap->folios[i]);
+ }
if (ia->ff)
fuse_file_put(ia->ff, false);
@@ -1048,7 +1050,14 @@ static void fuse_readahead(struct readahead_control *rac)
ap = &ia->ap;
while (ap->num_folios < cur_pages) {
- folio = readahead_folio(rac);
+ /*
+ * This returns a folio with a ref held on it.
+ * The ref needs to be held until the request is
+ * completed, since the splice case (see
+ * fuse_try_move_page()) drops the ref after it's
+ * replaced in the page cache.
+ */
+ folio = __readahead_folio(rac);
ap->folios[ap->num_folios] = folio;
ap->descs[ap->num_folios].length = folio_size(folio);
ap->num_folios++;
--
2.43.5
On the arm64 platform with the 4K base page config, SECTION_SIZE_BITS
is set to 27, making one section 128M. The corresponding struct page
area that vmemmap points to is then 2M per section.
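(As a rough sanity check of those numbers, assuming the usual 64-byte
struct page: 128M / 4K = 32768 pages per section, and 32768 * 64 bytes
= 2M of vmemmap, i.e. exactly one PMD-sized block with 4K base pages.)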
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes vmemmap to
populate at the PMD section level, which was suitable initially since
the hotplug granularity was always one section (128M). However, commit
ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") introduced a
2M (SUBSECTION_SIZE) hotplug granularity, which disrupted the existing
arm64 assumptions.
Consider the vmemmap_free -> unmap_hotplug_pmd_range path: when
pmd_sect() is true, the entire PMD section is cleared, even if other
subsections are still in use. For example, suppose page_struct_map1 and
page_struct_map2 are part of a single PMD entry and are hot-added
sequentially. If page_struct_map1 is then removed, vmemmap_free() will
clear the entire PMD entry, freeing the struct page map for the whole
section, even though page_struct_map2 is still active. A similar
problem exists with the linear mapping as well: for 16K base pages
(PMD size = 32M) or 64K base pages (PMD size = 512M), the block
mappings exceed SUBSECTION_SIZE, and tearing down the entire PMD
mapping likewise leaves other subsections unmapped in the linear
mapping.
To address the issue, we need to prevent PMD/PUD/CONT mappings for both
the linear map and vmemmap for non-boot sections whenever the
corresponding mapping size on the given base page size exceeds
SUBSECTION_SIZE (currently 2MB).
Cc: stable(a)vger.kernel.org # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
---
Hi Catalin and Anshuman,
I have addressed comments so far, please help review.
One outstanding point which is not yet finalized is in vmemmap_populate(): how to
detect a hotplugged section. Currently I am using system_state, discussion:
https://lore.kernel.org/linux-mm/1515dae4-cb53-4645-8c72-d33b27ede7eb@quici…
arch/arm64/mm/mmu.c | 46 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e2739b69e11b..8718d6e454c5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -42,9 +42,13 @@
#include <asm/pgalloc.h>
#include <asm/kfence.h>
-#define NO_BLOCK_MAPPINGS BIT(0)
-#define NO_CONT_MAPPINGS BIT(1)
-#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
+#define NO_PMD_BLOCK_MAPPINGS BIT(0)
+#define NO_PUD_BLOCK_MAPPINGS BIT(1) /* Hotplug case: do not want block mapping for PUD */
+#define NO_BLOCK_MAPPINGS (NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
+#define NO_PTE_CONT_MAPPINGS BIT(2)
+#define NO_PMD_CONT_MAPPINGS BIT(3) /* Hotplug case: do not want cont mapping for PMD */
+#define NO_CONT_MAPPINGS (NO_PTE_CONT_MAPPINGS | NO_PMD_CONT_MAPPINGS)
+#define NO_EXEC_MAPPINGS BIT(4) /* assumes FEAT_HPDS is not used */
u64 kimage_voffset __ro_after_init;
EXPORT_SYMBOL(kimage_voffset);
@@ -224,7 +228,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_PTE_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
init_pte(ptep, addr, next, phys, __prot);
@@ -254,7 +258,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
/* try section mapping first */
if (((addr | next | phys) & ~PMD_MASK) == 0 &&
- (flags & NO_BLOCK_MAPPINGS) == 0) {
+ (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmdp, phys, prot);
/*
@@ -311,7 +315,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_PMD_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
@@ -358,8 +362,8 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
* For 4K granule only, attempt to put down a 1GB block
*/
if (pud_sect_supported() &&
- ((addr | next | phys) & ~PUD_MASK) == 0 &&
- (flags & NO_BLOCK_MAPPINGS) == 0) {
+ ((addr | next | phys) & ~PUD_MASK) == 0 &&
+ (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pudp, phys, prot);
/*
@@ -1177,7 +1181,13 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ /*
+ * Hotplugged section does not support hugepages as
+ * PMD_SIZE (hence PUD_SIZE) section mapping covers
+ * struct page range that exceeds a SUBSECTION_SIZE
+ * i.e 2MB - for all available base page sizes.
+ */
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || system_state != SYSTEM_BOOTING)
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1339,9 +1349,27 @@ int arch_add_memory(int nid, u64 start, u64 size,
struct mhp_params *params)
{
int ret, flags = NO_EXEC_MAPPINGS;
+ unsigned long start_pfn = PFN_DOWN(start);
+ struct mem_section *ms = __pfn_to_section(start_pfn);
VM_BUG_ON(!mhp_range_allowed(start, size, true));
+ /* should not be invoked by early section */
+ WARN_ON(early_section(ms));
+
+ /*
+ * Disallow BLOCK/CONT mappings if the corresponding size exceeds
+ * SUBSECTION_SIZE which now is 2MB.
+ *
+ * PUD_BLOCK or PMD_CONT should consistently exceed SUBSECTION_SIZE
+ * across all variable page size configurations, so add them directly
+ */
+ flags |= NO_PUD_BLOCK_MAPPINGS | NO_PMD_CONT_MAPPINGS;
+ if (SUBSECTION_SHIFT < PMD_SHIFT)
+ flags |= NO_PMD_BLOCK_MAPPINGS;
+ if (SUBSECTION_SHIFT < CONT_PTE_SHIFT)
+ flags |= NO_PTE_CONT_MAPPINGS;
+
if (can_set_direct_map())
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
--
2.25.1
commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for
FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to
TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in
flush_hugetlb_tlb_range() was not updated. So as it stands, when trying
to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will
spuriously try to invalidate at level 0 on LPA2-enabled systems.
Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are
at it, explicitly use the correct stride and level for CONT_PMD_SIZE and
CONT_PTE_SIZE, which should provide a minor optimization.
Cc: <stable(a)vger.kernel.org>
Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
arch/arm64/include/asm/hugetlb.h | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 03db9cb21ace..8ab9542d2d22 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -76,12 +76,20 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
{
unsigned long stride = huge_page_size(hstate_vma(vma));
- if (stride == PMD_SIZE)
- __flush_tlb_range(vma, start, end, stride, false, 2);
- else if (stride == PUD_SIZE)
- __flush_tlb_range(vma, start, end, stride, false, 1);
- else
- __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0);
+ switch (stride) {
+ case PUD_SIZE:
+ __flush_tlb_range(vma, start, end, PUD_SIZE, false, 1);
+ break;
+ case CONT_PMD_SIZE:
+ case PMD_SIZE:
+ __flush_tlb_range(vma, start, end, PMD_SIZE, false, 2);
+ break;
+ case CONT_PTE_SIZE:
+ __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3);
+ break;
+ default:
+ __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
+ }
}
#endif /* __ASM_HUGETLB_H */
--
2.43.0
Currently, for dwmac-loongson, {tx,rx}_fifo_size are uninitialised,
which means zero. This means dwmac-loongson doesn't support changing
the MTU, because stmmac_change_mtu() requires the fifo size to be no
less than the MTU. Thus, set the correct tx_fifo_size and rx_fifo_size
for it (16KB multiplied by the queue count).
Here {tx,rx}_fifo_size is initialised from the initial (also the
maximum) value of {tx,rx}_queues_to_use. So the assumed per-queue fifo
size stays at 16KB if we don't change the queue count, and is
effectively larger than 16KB if we later decrease the queue count.
Either way stmmac_change_mtu() still works well with the current logic
(the MTU cannot be larger than 16KB for stmmac).
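(A rough example with made-up numbers: if the platform exposes 8
queues, plat->tx_fifo_size becomes 8 * SZ_16K = 128K. Shrinking the
queue count at runtime only makes the configured value an over-estimate
of the real per-queue fifo, which is harmless here since the MTU is
capped at 16K anyway.)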
Note: the Fixes tag picked here is the oldest commit and key commit of
the dwmac-loongson series "stmmac: Add Loongson platform support".
Cc: stable(a)vger.kernel.org
Fixes: ad72f783de06 ("net: stmmac: Add multi-channel support")
Acked-by: Yanteng Si <si.yanteng(a)linux.dev>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: Chong Qiao <qiaochong(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Update commit message and CC list.
drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
index bfe6e2d631bd..79acdf38c525 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-loongson.c
@@ -574,6 +574,9 @@ static int loongson_dwmac_probe(struct pci_dev *pdev, const struct pci_device_id
if (ret)
goto err_disable_device;
+ plat->tx_fifo_size = SZ_16K * plat->tx_queues_to_use;
+ plat->rx_fifo_size = SZ_16K * plat->rx_queues_to_use;
+
if (dev_of_node(&pdev->dev))
ret = loongson_dwmac_dt_config(pdev, plat, &res);
else
--
2.47.1
The function get_cipher_desc() may return NULL if the cipher type is
invalid or unsupported. In fill_sg_out(), the return value is used
without any checks, which could lead to a NULL pointer dereference.
This patch adds a DEBUG_NET_WARN_ON_ONCE check to ensure that
cipher_desc is valid and offloadable before proceeding. This prevents
potential crashes and provides a clear warning in debug builds.
Fixes: 8db44ab26beb ("tls: rename tls_cipher_size_desc to tls_cipher_desc")
Cc: stable(a)vger.kernel.org # 6.6+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
net/tls/tls_device_fallback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index f9e3d3d90dcf..0f93a0833ec2 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -306,6 +306,7 @@ static void fill_sg_out(struct scatterlist sg_out[3], void *buf,
{
const struct tls_cipher_desc *cipher_desc =
get_cipher_desc(tls_ctx->crypto_send.info.cipher_type);
+ DEBUG_NET_WARN_ON_ONCE(!cipher_desc || !cipher_desc->offloadable);
sg_set_buf(&sg_out[0], dummy_buf, sync_size);
sg_set_buf(&sg_out[1], nskb->data + tcp_payload_offset, payload_len);
--
2.42.0.windows.2
The function blkg_to_lat() may return NULL if the blkg is not associated
with an iolatency group. In iolatency_set_min_lat_nsec() and
iolatency_pd_init(), the return values are not checked, leading to
potential NULL pointer dereferences.
This patch adds checks for the return value of blkg_to_lat() and
returns early if it is NULL, preventing the NULL pointer dereference.
Fixes: d70675121546 ("block: introduce blk-iolatency io controller")
Cc: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
block/blk-iolatency.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index ebb522788d97..398f0a1747c4 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -787,6 +787,8 @@ static int blk_iolatency_init(struct gendisk *disk)
static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
{
struct iolatency_grp *iolat = blkg_to_lat(blkg);
+ if (!iolat)
+ return;
struct blk_iolatency *blkiolat = iolat->blkiolat;
u64 oldval = iolat->min_lat_nsec;
@@ -1013,6 +1015,8 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
*/
if (blkg->parent && blkg_to_pd(blkg->parent, &blkcg_policy_iolatency)) {
struct iolatency_grp *parent = blkg_to_lat(blkg->parent);
+ if (!parent)
+ return;
atomic_set(&iolat->scale_cookie,
atomic_read(&parent->child_lat.scale_cookie));
} else {
--
2.42.0.windows.2
From: Shyam Prasad N <sprasad(a)microsoft.com>
Our current approach to select a channel for sending requests is this:
1. iterate all channels to find the min and max queue depth
2. if min and max are not the same, pick the channel with min depth
3. if min and max are same, round robin, as all channels are equally loaded
The problem with this approach is that there's a lag between selecting
a channel and sending the request (that increases the queue depth on the channel).
While these numbers will eventually catch up, there could be a skew in the
channel usage, depending on the application's I/O parallelism and the server's
speed of handling requests.
With sufficient parallelism, this lag can artificially increase the queue depth,
thereby impacting the performance negatively.
This change modifies step 1 above to start the iteration from the last
selected channel, in order to reduce the skew in channel usage even in
the presence of this lag.
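(A small illustration of the rotated scan, with made-up numbers: with
4 channels and chan_seq incrementing to 6, the loop visits channels
(6+0)%4=2, then 3, 0, 1, and the all-equal fallback also picks 6%4=2,
so both the scan and the round-robin tie-break start from a different
channel on each call instead of always beginning at index 0.)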
Fixes: ea90708d3cf3 ("cifs: use the least loaded channel for sending requests")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Shyam Prasad N <sprasad(a)microsoft.com>
---
fs/smb/client/transport.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c
index 0dc80959ce48..e2fbf8b18eb2 100644
--- a/fs/smb/client/transport.c
+++ b/fs/smb/client/transport.c
@@ -1015,14 +1015,16 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
uint index = 0;
unsigned int min_in_flight = UINT_MAX, max_in_flight = 0;
struct TCP_Server_Info *server = NULL;
- int i;
+ int i, start, cur;
if (!ses)
return NULL;
spin_lock(&ses->chan_lock);
+ start = atomic_inc_return(&ses->chan_seq);
for (i = 0; i < ses->chan_count; i++) {
- server = ses->chans[i].server;
+ cur = (start + i) % ses->chan_count;
+ server = ses->chans[cur].server;
if (!server || server->terminate)
continue;
@@ -1039,17 +1041,15 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
*/
if (server->in_flight < min_in_flight) {
min_in_flight = server->in_flight;
- index = i;
+ index = cur;
}
if (server->in_flight > max_in_flight)
max_in_flight = server->in_flight;
}
/* if all channels are equally loaded, fall back to round-robin */
- if (min_in_flight == max_in_flight) {
- index = (uint)atomic_inc_return(&ses->chan_seq);
- index %= ses->chan_count;
- }
+ if (min_in_flight == max_in_flight)
+ index = (uint)start % ses->chan_count;
server = ses->chans[index].server;
spin_unlock(&ses->chan_lock);
--
2.43.0
Starting with Rust 1.86.0 (to be released 2025-04-03), Clippy will have
a new lint, `doc_overindented_list_items` [1], which catches cases of
overindented list items.
The lint has been added by Yutaro Ohno, based on feedback from the kernel
[2] on a patch that fixed a similar case -- commit 0c5928deada1 ("rust:
block: fix formatting in GenDisk doc").
Clippy reports a single case in the kernel, apart from the one already
fixed in the commit above:
error: doc list item overindented
--> rust/kernel/rbtree.rs:1152:5
|
1152 | /// null, it is a pointer to the root of the [`RBTree`].
| ^^^^ help: try using ` ` (2 spaces)
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#doc_overindented_…
= note: `-D clippy::doc-overindented-list-items` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::doc_overindented_list_items)]`
Thus clean it up.
Cc: Yutaro Ohno <yutaro.ono.418(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs).
Fixes: a335e9591404 ("rust: rbtree: add `RBTree::entry`")
Link: https://github.com/rust-lang/rust-clippy/pull/13711 [1]
Link: https://github.com/rust-lang/rust-clippy/issues/13601 [2]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
rust/kernel/rbtree.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/rbtree.rs b/rust/kernel/rbtree.rs
index ee2731dad72d..0d1e75810664 100644
--- a/rust/kernel/rbtree.rs
+++ b/rust/kernel/rbtree.rs
@@ -1149,7 +1149,7 @@ pub struct VacantEntry<'a, K, V> {
/// # Invariants
/// - `parent` may be null if the new node becomes the root.
/// - `child_field_of_parent` is a valid pointer to the left-child or right-child of `parent`. If `parent` is
-/// null, it is a pointer to the root of the [`RBTree`].
+/// null, it is a pointer to the root of the [`RBTree`].
struct RawVacantEntry<'a, K, V> {
rbtree: *mut RBTree<K, V>,
/// The node that will become the parent of the new node if we insert one.
--
2.48.1
Starting with Rust 1.85.0 (currently in beta, to be released 2025-02-20),
under some kernel configurations with `CONFIG_RUST_DEBUG_ASSERTIONS=y`,
one may trigger a new `objtool` warning:
rust/kernel.o: warning: objtool: _R...securityNtB2_11SecurityCtx8as_bytes()
falls through to next function _R...core3ops4drop4Drop4drop()
due to a call to the `noreturn` symbol:
core::panicking::assert_failed::<usize, usize>
Thus add it to the list so that `objtool` knows it is actually `noreturn`.
Do so matching with `strstr` since it is a generic.
See commit 56d680dd23c3 ("objtool/rust: list `noreturn` Rust functions")
for more details.
Cc: <stable(a)vger.kernel.org> # Needed in 6.12.y only (Rust is pinned in older LTSs).
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
tools/objtool/check.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 76060da755b5..e7ec29dfdff2 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -218,6 +218,7 @@ static bool is_rust_noreturn(const struct symbol *func)
str_ends_with(func->name, "_4core9panicking18panic_bounds_check") ||
str_ends_with(func->name, "_4core9panicking19assert_failed_inner") ||
str_ends_with(func->name, "_4core9panicking36panic_misaligned_pointer_dereference") ||
+ strstr(func->name, "_4core9panicking13assert_failed") ||
strstr(func->name, "_4core9panicking11panic_const24panic_const_") ||
(strstr(func->name, "_4core5slice5index24slice_") &&
str_ends_with(func->name, "_fail"));
base-commit: 9d89551994a430b50c4fffcb1e617a057fa76e20
--
2.48.0
The code for detecting CPUs that are vulnerable to Spectre BHB was
based on a hardcoded list of CPU IDs that were known to be affected.
Unfortunately, the list mostly only contained the IDs of standard ARM
cores. The IDs for many cores that are minor variants of the standard
ARM cores (like many Qualcomm Kyro CPUs) weren't listed. This led the
code to assume that those variants were not affected.
Flip the code on its head and instead assume that a core is vulnerable
if it doesn't have CSV2_3 but is unrecognized as being safe. This
involves creating a "Spectre BHB safe" list.
As of right now, the only CPU IDs added to the "Spectre BHB safe" list
are ARM Cortex A35, A53, A55, A510, and A520. This list was created by
looking for cores that weren't listed in ARM's list [1] as per review
feedback on v2 of this patch [2]. Additionally Brahma A53 is added as
per mailing list feedback [3].
NOTE: this patch will not actually _mitigate_ anyone, it will simply
cause them to report themselves as vulnerable. If any cores in the
system are reported as vulnerable but not mitigated then the whole
system will be reported as vulnerable though the system will attempt
to mitigate with the information it has about the known cores.
[1] https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
[2] https://lore.kernel.org/r/20241219175128.GA25477@willie-the-truck
[3] https://lore.kernel.org/r/18dbd7d1-a46c-4112-a425-320c99f67a8d@broadcom.com
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels")
Cc: stable(a)vger.kernel.org
Reviewed-by: Julius Werner <jwerner(a)chromium.org>
Signed-off-by: Douglas Anderson <dianders(a)chromium.org>
---
Changes in v4:
- Add MIDR_BRAHMA_B53 as safe.
- Get rid of `spectre_bhb_firmware_mitigated_list`.
Changes in v3:
- Don't guess the mitigation; just report unknown cores as vulnerable.
- Restructure the code since is_spectre_bhb_affected() defaults to true
Changes in v2:
- New
arch/arm64/include/asm/spectre.h | 1 -
arch/arm64/kernel/proton-pack.c | 203 ++++++++++++++++---------------
2 files changed, 102 insertions(+), 102 deletions(-)
diff --git a/arch/arm64/include/asm/spectre.h b/arch/arm64/include/asm/spectre.h
index 0c4d9045c31f..f1524cdeacf1 100644
--- a/arch/arm64/include/asm/spectre.h
+++ b/arch/arm64/include/asm/spectre.h
@@ -97,7 +97,6 @@ enum mitigation_state arm64_get_meltdown_state(void);
enum mitigation_state arm64_get_spectre_bhb_state(void);
bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry, int scope);
-u8 spectre_bhb_loop_affected(int scope);
void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *__unused);
bool try_emulate_el1_ssbs(struct pt_regs *regs, u32 instr);
diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
index e149efadff20..17aa836fe46d 100644
--- a/arch/arm64/kernel/proton-pack.c
+++ b/arch/arm64/kernel/proton-pack.c
@@ -845,53 +845,70 @@ static unsigned long system_bhb_mitigations;
* This must be called with SCOPE_LOCAL_CPU for each type of CPU, before any
* SCOPE_SYSTEM call will give the right answer.
*/
-u8 spectre_bhb_loop_affected(int scope)
+static bool is_spectre_bhb_safe(int scope)
+{
+ static const struct midr_range spectre_bhb_safe_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A35),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A53),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A55),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A510),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A520),
+ MIDR_ALL_VERSIONS(MIDR_BRAHMA_B53),
+ {},
+ };
+ static bool all_safe = true;
+
+ if (scope != SCOPE_LOCAL_CPU)
+ return all_safe;
+
+ if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_safe_list))
+ return true;
+
+ all_safe = false;
+
+ return false;
+}
+
+static u8 spectre_bhb_loop_affected(void)
{
u8 k = 0;
- static u8 max_bhb_k;
-
- if (scope == SCOPE_LOCAL_CPU) {
- static const struct midr_range spectre_bhb_k32_list[] = {
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
- MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
- MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
- {},
- };
- static const struct midr_range spectre_bhb_k24_list[] = {
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
- MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
- MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
- {},
- };
- static const struct midr_range spectre_bhb_k11_list[] = {
- MIDR_ALL_VERSIONS(MIDR_AMPERE1),
- {},
- };
- static const struct midr_range spectre_bhb_k8_list[] = {
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
- {},
- };
-
- if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
- k = 32;
- else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
- k = 24;
- else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k11_list))
- k = 11;
- else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k8_list))
- k = 8;
-
- max_bhb_k = max(max_bhb_k, k);
- } else {
- k = max_bhb_k;
- }
+
+ static const struct midr_range spectre_bhb_k32_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
+ {},
+ };
+ static const struct midr_range spectre_bhb_k24_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
+ MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
+ MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
+ {},
+ };
+ static const struct midr_range spectre_bhb_k11_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_AMPERE1),
+ {},
+ };
+ static const struct midr_range spectre_bhb_k8_list[] = {
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
+ {},
+ };
+
+ if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
+ k = 32;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
+ k = 24;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k11_list))
+ k = 11;
+ else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k8_list))
+ k = 8;
return k;
}
@@ -917,29 +934,13 @@ static enum mitigation_state spectre_bhb_get_cpu_fw_mitigation_state(void)
}
}
-static bool is_spectre_bhb_fw_affected(int scope)
+static bool has_spectre_bhb_fw_mitigation(void)
{
- static bool system_affected;
enum mitigation_state fw_state;
bool has_smccc = arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_NONE;
- static const struct midr_range spectre_bhb_firmware_mitigated_list[] = {
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A73),
- MIDR_ALL_VERSIONS(MIDR_CORTEX_A75),
- {},
- };
- bool cpu_in_list = is_midr_in_range_list(read_cpuid_id(),
- spectre_bhb_firmware_mitigated_list);
-
- if (scope != SCOPE_LOCAL_CPU)
- return system_affected;
fw_state = spectre_bhb_get_cpu_fw_mitigation_state();
- if (cpu_in_list || (has_smccc && fw_state == SPECTRE_MITIGATED)) {
- system_affected = true;
- return true;
- }
-
- return false;
+ return has_smccc && fw_state == SPECTRE_MITIGATED;
}
static bool supports_ecbhb(int scope)
@@ -955,6 +956,8 @@ static bool supports_ecbhb(int scope)
ID_AA64MMFR1_EL1_ECBHB_SHIFT);
}
+static u8 max_bhb_k;
+
bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry,
int scope)
{
@@ -963,16 +966,18 @@ bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry,
if (supports_csv2p3(scope))
return false;
- if (supports_clearbhb(scope))
- return true;
-
- if (spectre_bhb_loop_affected(scope))
- return true;
+ if (is_spectre_bhb_safe(scope))
+ return false;
- if (is_spectre_bhb_fw_affected(scope))
- return true;
+ /*
+ * At this point the core isn't known to be "safe" so we're going to
+ * assume it's vulnerable. We still need to update `max_bhb_k` though,
+ * but only if we aren't mitigating with clearbhb.
+ */
+ if (scope == SCOPE_LOCAL_CPU && !supports_clearbhb(SCOPE_LOCAL_CPU))
+ max_bhb_k = max(max_bhb_k, spectre_bhb_loop_affected());
- return false;
+ return true;
}
static void this_cpu_set_vectors(enum arm64_bp_harden_el1_vectors slot)
@@ -1003,7 +1008,7 @@ early_param("nospectre_bhb", parse_spectre_bhb_param);
void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *entry)
{
bp_hardening_cb_t cpu_cb;
- enum mitigation_state fw_state, state = SPECTRE_VULNERABLE;
+ enum mitigation_state state = SPECTRE_VULNERABLE;
struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
if (!is_spectre_bhb_affected(entry, SCOPE_LOCAL_CPU))
@@ -1029,7 +1034,7 @@ void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *entry)
this_cpu_set_vectors(EL1_VECTOR_BHB_CLEAR_INSN);
state = SPECTRE_MITIGATED;
set_bit(BHB_INSN, &system_bhb_mitigations);
- } else if (spectre_bhb_loop_affected(SCOPE_LOCAL_CPU)) {
+ } else if (spectre_bhb_loop_affected()) {
/*
* Ensure KVM uses the indirect vector which will have the
* branchy-loop added. A57/A72-r0 will already have selected
@@ -1042,32 +1047,29 @@ void spectre_bhb_enable_mitigation(const struct arm64_cpu_capabilities *entry)
this_cpu_set_vectors(EL1_VECTOR_BHB_LOOP);
state = SPECTRE_MITIGATED;
set_bit(BHB_LOOP, &system_bhb_mitigations);
- } else if (is_spectre_bhb_fw_affected(SCOPE_LOCAL_CPU)) {
- fw_state = spectre_bhb_get_cpu_fw_mitigation_state();
- if (fw_state == SPECTRE_MITIGATED) {
- /*
- * Ensure KVM uses one of the spectre bp_hardening
- * vectors. The indirect vector doesn't include the EL3
- * call, so needs upgrading to
- * HYP_VECTOR_SPECTRE_INDIRECT.
- */
- if (!data->slot || data->slot == HYP_VECTOR_INDIRECT)
- data->slot += 1;
-
- this_cpu_set_vectors(EL1_VECTOR_BHB_FW);
-
- /*
- * The WA3 call in the vectors supersedes the WA1 call
- * made during context-switch. Uninstall any firmware
- * bp_hardening callback.
- */
- cpu_cb = spectre_v2_get_sw_mitigation_cb();
- if (__this_cpu_read(bp_hardening_data.fn) != cpu_cb)
- __this_cpu_write(bp_hardening_data.fn, NULL);
-
- state = SPECTRE_MITIGATED;
- set_bit(BHB_FW, &system_bhb_mitigations);
- }
+ } else if (has_spectre_bhb_fw_mitigation()) {
+ /*
+ * Ensure KVM uses one of the spectre bp_hardening
+ * vectors. The indirect vector doesn't include the EL3
+ * call, so needs upgrading to
+ * HYP_VECTOR_SPECTRE_INDIRECT.
+ */
+ if (!data->slot || data->slot == HYP_VECTOR_INDIRECT)
+ data->slot += 1;
+
+ this_cpu_set_vectors(EL1_VECTOR_BHB_FW);
+
+ /*
+ * The WA3 call in the vectors supersedes the WA1 call
+ * made during context-switch. Uninstall any firmware
+ * bp_hardening callback.
+ */
+ cpu_cb = spectre_v2_get_sw_mitigation_cb();
+ if (__this_cpu_read(bp_hardening_data.fn) != cpu_cb)
+ __this_cpu_write(bp_hardening_data.fn, NULL);
+
+ state = SPECTRE_MITIGATED;
+ set_bit(BHB_FW, &system_bhb_mitigations);
}
update_mitigation_state(&spectre_bhb_state, state);
@@ -1101,7 +1103,6 @@ void noinstr spectre_bhb_patch_loop_iter(struct alt_instr *alt,
{
u8 rd;
u32 insn;
- u16 loop_count = spectre_bhb_loop_affected(SCOPE_SYSTEM);
BUG_ON(nr_inst != 1); /* MOV -> MOV */
@@ -1110,7 +1111,7 @@ void noinstr spectre_bhb_patch_loop_iter(struct alt_instr *alt,
insn = le32_to_cpu(*origptr);
rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, insn);
- insn = aarch64_insn_gen_movewide(rd, loop_count, 0,
+ insn = aarch64_insn_gen_movewide(rd, max_bhb_k, 0,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_ZERO);
*updptr++ = cpu_to_le32(insn);
--
2.47.1.613.gc27f4b7a9f-goog
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Any active plane needs to have its crtc included in the atomic
state. For planes enabled via uapi that is all handled in the core.
But when we use a plane for the joiner, the uapi code thinks the plane
is disabled and therefore doesn't have a crtc. So we need to pull
those in by hand. We do it first thing in
intel_joiner_add_affected_crtcs() so that any newly added crtc will
subsequently pull in all of its joined crtcs as well.
The symptoms from failing to do this are:
- duct tape in the form of commit 1d5b09f8daf8 ("drm/i915: Fix NULL
ptr deref by checking new_crtc_state")
- the plane's hw state will get overwritten by the disabled
uapi state if it can't find the uapi counterpart plane in
the atomic state from where it should copy the correct state
Cc: stable(a)vger.kernel.org
Reviewed-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 6c1e7441313e..22bf46be2ca9 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -6695,12 +6695,30 @@ static int intel_async_flip_check_hw(struct intel_atomic_state *state, struct in
static int intel_joiner_add_affected_crtcs(struct intel_atomic_state *state)
{
struct drm_i915_private *i915 = to_i915(state->base.dev);
+ const struct intel_plane_state *plane_state;
struct intel_crtc_state *crtc_state;
+ struct intel_plane *plane;
struct intel_crtc *crtc;
u8 affected_pipes = 0;
u8 modeset_pipes = 0;
int i;
+ /*
+ * Any plane which is in use by the joiner needs its crtc.
+ * Pull those in first as this will not have happened yet
+ * if the plane remains disabled according to uapi.
+ */
+ for_each_new_intel_plane_in_state(state, plane, plane_state, i) {
+ crtc = to_intel_crtc(plane_state->hw.crtc);
+ if (!crtc)
+ continue;
+
+ crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
+ if (IS_ERR(crtc_state))
+ return PTR_ERR(crtc_state);
+ }
+
+ /* Now pull in all joined crtcs */
for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
affected_pipes |= crtc_state->joiner_pipes;
if (intel_crtc_needs_modeset(crtc_state))
--
2.45.3
Suppose a client process A calls pdr_add_lookup() to add a lookup for
a service and schedules the locator work. Later, a process B receives a
new server packet indicating the locator is up and calls
pdr_locator_new_server(), which eventually sets
pdr->locator_init_complete to true. Process A sees this, takes the list
lock and queries the domain list, but the query times out: the response
is queued to the same qmi->wq, which is an ordered workqueue, and
process B cannot complete the new server request work because it is
blocked on the same list lock, resulting in a deadlock.
    Process A                               Process B

                                            process_scheduled_works()
    pdr_add_lookup()                          qmi_data_ready_work()
      process_scheduled_works()                 pdr_locator_new_server()
                                                  pdr->locator_init_complete=true;
        pdr_locator_work()
          mutex_lock(&pdr->list_lock);
            pdr_locate_service()                  mutex_lock(&pdr->list_lock);
              pdr_get_domain_list()
                pr_err("PDR: %s get domain list
                        txn wait failed: %d\n",
                        req->service_name, ret);
Fix it by removing the unnecessary list iteration: the list is already
iterated inside the locator work, so avoid taking the list lock here
and just call schedule_work().
Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
CC: stable(a)vger.kernel.org
Signed-off-by: Saranya R <quic_sarar(a)quicinc.com>
Signed-off-by: Mukesh Ojha <mukesh.ojha(a)oss.qualcomm.com>
---
Changes in v2:
- Added Fixes tag,
drivers/soc/qcom/pdr_interface.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
index 328b6153b2be..71be378d2e43 100644
--- a/drivers/soc/qcom/pdr_interface.c
+++ b/drivers/soc/qcom/pdr_interface.c
@@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
{
struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
locator_hdl);
- struct pdr_service *pds;
mutex_lock(&pdr->lock);
/* Create a local client port for QMI communication */
@@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
mutex_unlock(&pdr->lock);
/* Service pending lookup requests */
- mutex_lock(&pdr->list_lock);
- list_for_each_entry(pds, &pdr->lookups, node) {
- if (pds->need_locator_lookup)
- schedule_work(&pdr->locator_work);
- }
- mutex_unlock(&pdr->list_lock);
+ schedule_work(&pdr->locator_work);
return 0;
}
--
2.34.1
The card works fine on 6.6.
Here's the output of alsa-info.sh for 6.6 (where it works):
http://alsa-project.org/db/?f=ac3a0849a332189eb61a640fe94b46de369a655c
Here's the output for 6.12:
http://alsa-project.org/db/?f=22995c74392ed86393fc2c180ab7ae010e6cbfbb
Here's the output of $ lspci -k | grep -A 3 Audio
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
DeviceName: Onboard - Sound
Subsystem: Dell Device 0871
Kernel modules: snd_hda_intel, snd_soc_avs, snd_sof_pci_intel_cnl
I had initially also included the modinfo output for snd_hda_intel,
snd_soc_avs, and snd_sof_pci_intel_cnl, but my email was flagged as
spam, so I won't do so now.
Installed sound related packages:
sof-firmware 2024.09.2-1
alsa-ucm-conf 1.2.13-2
If there's a need for more information, please let me know.
arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
CONT_PMD_SIZE). So the function has to figure out the size from the
huge_pte pointer. This was previously done by walking the pgtable to
determine the level, then using the PTE_CONT bit to determine the number
of ptes.
But the PTE_CONT bit is only valid when the pte is present. For
non-present pte values (e.g. markers, migration entries), the previous
implementation was therefore erroneously determining the size. There is
at least one known caller in core-mm, move_huge_pte(), which may call
huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
this case. Additionally, the "regular" ptep_get_and_clear() is robust to
being called for non-present ptes, so it makes sense to follow that
behaviour.
Fix this by using the new sz parameter which is now provided to the
function. Additionally when clearing each pte in a contig range, don't
gather the access and dirty bits if the pte is not present.
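To make the size-to-layout mapping concrete, below is a purely
illustrative user-space sketch (not the kernel's num_contig_ptes())
assuming a 4K translation granule; the constants and the demo function
name are assumptions for this example only.

#include <stdio.h>
#include <stddef.h>

#define SZ_4K   (4UL << 10)
#define SZ_64K  (64UL << 10)
#define SZ_2M   (2UL << 20)
#define SZ_32M  (32UL << 20)
#define SZ_1G   (1UL << 30)

/* Map a huge page size to (number of entries, size covered per entry). */
static int num_contig_ptes_demo(unsigned long sz, size_t *pgsize)
{
        switch (sz) {
        case SZ_1G:  *pgsize = SZ_1G;  return 1;   /* single PUD entry   */
        case SZ_32M: *pgsize = SZ_2M;  return 16;  /* 16 contiguous PMDs */
        case SZ_2M:  *pgsize = SZ_2M;  return 1;   /* single PMD entry   */
        case SZ_64K: *pgsize = SZ_4K;  return 16;  /* 16 contiguous PTEs */
        default:     *pgsize = SZ_4K;  return 1;
        }
}

int main(void)
{
        size_t pgsize;
        int n = num_contig_ptes_demo(SZ_64K, &pgsize);

        printf("64K huge page: %d entries of %zu bytes each\n", n, pgsize);
        return 0;
}

The point is that, given sz, the layout is fully determined without
looking at the pte value at all, which is what makes the function safe
for non-present ptes.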
An alternative approach that would not require API changes would be to
store the PTE_CONT bit in a spare bit in the swap entry pte. But it felt
cleaner to follow other APIs' lead and just pass in the size.
While we are at it, add some debug warnings in functions that require
the pte to be present.
As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
entry offset field (layout of non-present pte). Since hugetlb is never
swapped to disk, this field will only be populated for markers, which
always set this bit to 0 and hwpoison swap entries, which set the offset
field to a PFN; So it would only ever be 1 for a 52-bit PVA system where
memory in that high half was poisoned (I think!). So in practice, this
bit would almost always be zero for non-present ptes and we would only
clear the first entry if it was actually a contiguous block. That's
probably a less severe symptom than if it was always interpreted as 1
and cleared out potentially-present neighboring PTEs.
Cc: <stable(a)vger.kernel.org>
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
arch/arm64/mm/hugetlbpage.c | 54 ++++++++++++++++++++-----------------
1 file changed, 29 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 06db4649af91..328eec4bfe55 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm,
unsigned long pgsize,
unsigned long ncontig)
{
- pte_t orig_pte = __ptep_get(ptep);
- unsigned long i;
-
- for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
- pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
-
- /*
- * If HW_AFDBM is enabled, then the HW could turn on
- * the dirty or accessed bit for any page in the set,
- * so check them all.
- */
- if (pte_dirty(pte))
- orig_pte = pte_mkdirty(orig_pte);
-
- if (pte_young(pte))
- orig_pte = pte_mkyoung(orig_pte);
+ pte_t pte, tmp_pte;
+ bool present;
+
+ pte = __ptep_get_and_clear(mm, addr, ptep);
+ present = pte_present(pte);
+ while (--ncontig) {
+ ptep++;
+ addr += pgsize;
+ tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
+ if (present) {
+ if (pte_dirty(tmp_pte))
+ pte = pte_mkdirty(pte);
+ if (pte_young(tmp_pte))
+ pte = pte_mkyoung(pte);
+ }
}
- return orig_pte;
+ return pte;
}
static pte_t get_clear_contig_flush(struct mm_struct *mm,
@@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
{
int ncontig;
size_t pgsize;
- pte_t orig_pte = __ptep_get(ptep);
-
- if (!pte_cont(orig_pte))
- return __ptep_get_and_clear(mm, addr, ptep);
-
- ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+ ncontig = num_contig_ptes(sz, &pgsize);
return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
}
@@ -451,6 +445,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
pgprot_t hugeprot;
pte_t orig_pte;
+ VM_WARN_ON(!pte_present(pte));
+
if (!pte_cont(pte))
return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);
@@ -461,6 +457,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
return 0;
orig_pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
+ VM_WARN_ON(!pte_present(orig_pte));
/* Make sure we don't lose the dirty or young state */
if (pte_dirty(orig_pte))
@@ -485,7 +482,10 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
size_t pgsize;
pte_t pte;
- if (!pte_cont(__ptep_get(ptep))) {
+ pte = __ptep_get(ptep);
+ VM_WARN_ON(!pte_present(pte));
+
+ if (!pte_cont(pte)) {
__ptep_set_wrprotect(mm, addr, ptep);
return;
}
@@ -509,8 +509,12 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
size_t pgsize;
int ncontig;
+ pte_t pte;
- if (!pte_cont(__ptep_get(ptep)))
+ pte = __ptep_get(ptep);
+ VM_WARN_ON(!pte_present(pte));
+
+ if (!pte_cont(pte))
return ptep_clear_flush(vma, addr, ptep);
ncontig = find_num_contig(mm, addr, ptep, &pgsize);
--
2.43.0
The stmpe_reg_read function can fail, but its return value is not checked
in stmpe_gpio_irq_sync_unlock. This can lead to silent failures and
incorrect behavior if the hardware access fails.
This patch adds checks for the return value of stmpe_reg_read. If the
function fails, an error message is logged and the function returns
early to avoid further issues.
Fixes: b888fb6f2a27 ("gpio: stmpe: i2c transfer are forbiden in atomic context")
Cc: stable(a)vger.kernel.org # 4.16+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/gpio/gpio-stmpe.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpio/gpio-stmpe.c b/drivers/gpio/gpio-stmpe.c
index 75a3633ceddb..222279a9d82b 100644
--- a/drivers/gpio/gpio-stmpe.c
+++ b/drivers/gpio/gpio-stmpe.c
@@ -191,7 +191,7 @@ static void stmpe_gpio_irq_sync_unlock(struct irq_data *d)
[REG_IE][CSB] = STMPE_IDX_IEGPIOR_CSB,
[REG_IE][MSB] = STMPE_IDX_IEGPIOR_MSB,
};
- int i, j;
+ int ret, i, j;
/*
* STMPE1600: to be able to get IRQ from pins,
@@ -199,8 +199,16 @@ static void stmpe_gpio_irq_sync_unlock(struct irq_data *d)
* GPSR or GPCR registers
*/
if (stmpe->partnum == STMPE1600) {
- stmpe_reg_read(stmpe, stmpe->regs[STMPE_IDX_GPMR_LSB]);
- stmpe_reg_read(stmpe, stmpe->regs[STMPE_IDX_GPMR_CSB]);
+ ret = stmpe_reg_read(stmpe, stmpe->regs[STMPE_IDX_GPMR_LSB]);
+ if (ret < 0) {
+ dev_err(stmpe->dev, "Failed to read GPMR_LSB: %d\n", ret);
+ goto err;
+ }
+ ret = stmpe_reg_read(stmpe, stmpe->regs[STMPE_IDX_GPMR_CSB]);
+ if (ret < 0) {
+ dev_err(stmpe->dev, "Failed to read GPMR_CSB: %d\n", ret);
+ goto err;
+ }
}
for (i = 0; i < CACHE_NR_REGS; i++) {
@@ -222,6 +230,7 @@ static void stmpe_gpio_irq_sync_unlock(struct irq_data *d)
}
}
+err:
mutex_unlock(&stmpe_gpio->irq_lock);
}
--
2.42.0.windows.2
Jann reported [1] a possible issue where trampoline_check_ip() returns
an address near the bottom of the address space, which is then allowed
to call into the syscall if uretprobes are not set up.
Though the mmap minimum address restrictions will typically prevent
creating mappings there, let's make sure the uretprobe syscall checks
for that.
[1] https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341…
Cc: Kees Cook <kees(a)kernel.org>
Cc: Eyal Birger <eyal.birger(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
---
v2 changes:
- adding UPROBE_NO_TRAMPOLINE_VADDR macro (Andrii)
- rebased on top of perf/core
arch/x86/kernel/uprobes.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 5a952c5ea66b..e8d3c59aa9f7 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -357,19 +357,25 @@ void *arch_uprobe_trampoline(unsigned long *psize)
return &insn;
}
-static unsigned long trampoline_check_ip(void)
+static unsigned long trampoline_check_ip(unsigned long tramp)
{
- unsigned long tramp = uprobe_get_trampoline_vaddr();
-
return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
}
+#define UPROBE_NO_TRAMPOLINE_VADDR ((unsigned long)-1)
+
SYSCALL_DEFINE0(uretprobe)
{
struct pt_regs *regs = task_pt_regs(current);
- unsigned long err, ip, sp, r11_cx_ax[3];
+ unsigned long err, ip, sp, r11_cx_ax[3], tramp;
+
+ /* If there's no trampoline, we are called from wrong place. */
+ tramp = uprobe_get_trampoline_vaddr();
+ if (tramp == UPROBE_NO_TRAMPOLINE_VADDR)
+ goto sigill;
- if (regs->ip != trampoline_check_ip())
+ /* Make sure the ip matches the only allowed sys_uretprobe caller. */
+ if (regs->ip != trampoline_check_ip(tramp))
goto sigill;
err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));
--
2.48.1
In vcap_debugfs_show_rule_keyset(), the function vcap_keyfields()
returns a NULL pointer upon allocation failure. This can lead to
a NULL pointer dereference in vcap_debugfs_show_rule_keyfield().
To prevent this, add a check for a NULL return value from
vcap_keyfields() and continue the loop if it is NULL.
Fixes: 610c32b2ce66 ("net: microchip: vcap: Add vcap_get_rule")
Cc: stable(a)vger.kernel.org # 6.2+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
index 59bfbda29bb3..e9e2f7af9be3 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
@@ -202,6 +202,8 @@ static int vcap_debugfs_show_rule_keyset(struct vcap_rule_internal *ri,
list_for_each_entry(ckf, &ri->data.keyfields, ctrl.list) {
keyfield = vcap_keyfields(vctrl, admin->vtype, ri->data.keyset);
+ if (!keyfield)
+ continue;
vcap_debugfs_show_rule_keyfield(vctrl, out, ckf->ctrl.key,
keyfield, &ckf->data);
}
--
2.42.0.windows.2
Fixes for provided buffers, making sure kbufs are not allowed to cross a
single execution section. Upstream had most of this already fixed by
chance, which is why all 3 patches refer to a single upstream commit.
Pavel Begunkov (3):
io_uring: fix multishots with selected buffers
io_uring: fix io_req_prep_async with provided buffers
io_uring/rw: commit provided buffer state on async
io_uring/io_uring.c | 5 ++++-
io_uring/poll.c | 2 ++
io_uring/rw.c | 10 ++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
--
2.47.1
From: Mario Limonciello <mario.limonciello(a)amd.com>
Spurious immediate wake up events are reported on Acer Nitro ANV14. GPIO 11 is
specified as an edge triggered input and also a wake source but this pin is
supposed to be an output pin for an LED, so it's effectively floating.
Block the interrupt from getting set up for this GPIO on this device.
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: Delgan <delgan.py(a)gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3954
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpio/gpiolib-acpi.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c
index 1f9fe50bba005..f7746c57ba76a 100644
--- a/drivers/gpio/gpiolib-acpi.c
+++ b/drivers/gpio/gpiolib-acpi.c
@@ -1689,6 +1689,20 @@ static const struct dmi_system_id gpiolib_acpi_quirks[] __initconst = {
.ignore_wake = "PNP0C50:00@8",
},
},
+ {
+ /*
+ * Spurious wakeups from GPIO 11
+ * Found in BIOS 1.04
+ * https://gitlab.freedesktop.org/drm/amd/-/issues/3954
+ */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+ DMI_MATCH(DMI_PRODUCT_FAMILY, "Acer Nitro V 14"),
+ },
+ .driver_data = &(struct acpi_gpiolib_dmi_quirk) {
+ .ignore_interrupt = "AMDI0030:00@11",
+ },
+ },
{} /* Terminating entry */
};
--
2.43.0
This is a note to let you know that I've just added the patch titled
iio: filter: admv8818: Force initialization of SDO
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From cc2c3540d9477a9931fb0fd851fcaeba524a5b35 Mon Sep 17 00:00:00 2001
From: Sam Winchenbach <swinchenbach(a)arka.org>
Date: Mon, 3 Feb 2025 13:34:34 +0000
Subject: iio: filter: admv8818: Force initialization of SDO
When a weak pull-up is present on the SDO line, regmap_update_bits fails
to write both the SOFTRESET and SDOACTIVE bits because it incorrectly
reads them as already set.
Since the soft reset disables the SDO line, performing a
read-modify-write operation on ADI_SPI_CONFIG_A to enable the SDO line
doesn't make sense. This change directly writes to the register instead
of using regmap_update_bits.
Fixes: f34fe888ad05 ("iio:filter:admv8818: add support for ADMV8818")
Signed-off-by: Sam Winchenbach <swinchenbach(a)arka.org>
Link: https://patch.msgid.link/SA1P110MB106904C961B0F3FAFFED74C0BCF5A@SA1P110MB10…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/filter/admv8818.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/iio/filter/admv8818.c b/drivers/iio/filter/admv8818.c
index 848baa6e3bbf..d85b7d3de866 100644
--- a/drivers/iio/filter/admv8818.c
+++ b/drivers/iio/filter/admv8818.c
@@ -574,21 +574,15 @@ static int admv8818_init(struct admv8818_state *st)
struct spi_device *spi = st->spi;
unsigned int chip_id;
- ret = regmap_update_bits(st->regmap, ADMV8818_REG_SPI_CONFIG_A,
- ADMV8818_SOFTRESET_N_MSK |
- ADMV8818_SOFTRESET_MSK,
- FIELD_PREP(ADMV8818_SOFTRESET_N_MSK, 1) |
- FIELD_PREP(ADMV8818_SOFTRESET_MSK, 1));
+ ret = regmap_write(st->regmap, ADMV8818_REG_SPI_CONFIG_A,
+ ADMV8818_SOFTRESET_N_MSK | ADMV8818_SOFTRESET_MSK);
if (ret) {
dev_err(&spi->dev, "ADMV8818 Soft Reset failed.\n");
return ret;
}
- ret = regmap_update_bits(st->regmap, ADMV8818_REG_SPI_CONFIG_A,
- ADMV8818_SDOACTIVE_N_MSK |
- ADMV8818_SDOACTIVE_MSK,
- FIELD_PREP(ADMV8818_SDOACTIVE_N_MSK, 1) |
- FIELD_PREP(ADMV8818_SDOACTIVE_MSK, 1));
+ ret = regmap_write(st->regmap, ADMV8818_REG_SPI_CONFIG_A,
+ ADMV8818_SDOACTIVE_N_MSK | ADMV8818_SDOACTIVE_MSK);
if (ret) {
dev_err(&spi->dev, "ADMV8818 SDO Enable failed.\n");
return ret;
--
2.48.1
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
Clear reset status flag, to keep error status register clean after reset
(ad3552r manual, rev B table 38).
Reset error flag was left to 1, so debugging registers, the "Error
Status Register" was dirty (0x01). It is important to clear this bit, so
if there is any reset event over normal working mode, it is possible to
detect it.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/iio/dac/ad3552r.c b/drivers/iio/dac/ad3552r.c
index e7206af53af6..7944f5c1d264 100644
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -410,6 +410,12 @@ static int ad3552r_reset(struct ad3552r_desc *dac)
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
AD3552R_REG_ADDR_INTERFACE_CONFIG_A,
AD3552R_MASK_ADDR_ASCENSION,
--
2.48.1
This is a note to let you know that I've just added the patch titled
iio: adc: ad7192: fix channel select
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 21d7241faf406e8aee3ce348451cc362d5db6a02 Mon Sep 17 00:00:00 2001
From: Markus Burri <markus.burri(a)mt.com>
Date: Fri, 24 Jan 2025 16:07:03 +0100
Subject: iio: adc: ad7192: fix channel select
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Channel configuration doesn't work as expected: FIELD_PREP() needs the
bit mask, not the bit number.
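As a quick illustration of the difference, here is a hedged user-space
sketch with simplified stand-ins for the kernel macros; the
CONF_CHAN_MASK field layout below is hypothetical and only serves to
show bit number vs. bit mask.

#include <stdio.h>

/* Simplified stand-ins for the kernel helpers, illustration only. */
#define GENMASK(h, l)     (((~0u) >> (31 - (h))) & ~((1u << (l)) - 1))
#define FIELD_PREP(m, v)  (((v) << __builtin_ctz(m)) & (m))
#define BIT(n)            (1u << (n))

#define CONF_CHAN_MASK    GENMASK(11, 4)   /* hypothetical field layout */

int main(void)
{
        int i = 3;  /* channel bit number, as from for_each_set_bit() */

        /* Bug: feeds the bit number, encoding the value 3 in the field. */
        printf("FIELD_PREP(mask, i)      = 0x%x\n",
               FIELD_PREP(CONF_CHAN_MASK, i));
        /* Fix: feeds the bit mask, setting the channel's own bit. */
        printf("FIELD_PREP(mask, BIT(i)) = 0x%x\n",
               FIELD_PREP(CONF_CHAN_MASK, BIT(i)));
        return 0;
}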
Fixes: 874bbd1219c7 ("iio: adc: ad7192: Use bitfield access macros")
Signed-off-by: Markus Burri <markus.burri(a)mt.com>
Reviewed-by: Nuno Sá <nuno.sa(a)analog.com>
Link: https://patch.msgid.link/20250124150703.97848-1-markus.burri@mt.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad7192.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad7192.c b/drivers/iio/adc/ad7192.c
index e96a5ae92375..cfaf8f7e0a07 100644
--- a/drivers/iio/adc/ad7192.c
+++ b/drivers/iio/adc/ad7192.c
@@ -1084,7 +1084,7 @@ static int ad7192_update_scan_mode(struct iio_dev *indio_dev, const unsigned lon
conf &= ~AD7192_CONF_CHAN_MASK;
for_each_set_bit(i, scan_mask, 8)
- conf |= FIELD_PREP(AD7192_CONF_CHAN_MASK, i);
+ conf |= FIELD_PREP(AD7192_CONF_CHAN_MASK, BIT(i));
ret = ad_sd_write_reg(&st->sd, AD7192_REG_CONF, 3, conf);
if (ret < 0)
--
2.48.1
Hi all,
I ran into a small memory leak while working on the BPF selftests suite
on the bpf-next tree. It leads to an oom-kill after thousands of iterations
in the qemu environment provided by tools/testing/selftests/bpf/vmtest.sh.
To reproduce the issue from the net-next tree:
$ git remote add bpf-next https://github.com/kernel-patches/bpf
$ git fetch bpf-next
$ git cherry-pick 723f1b9ce332^..edb996fae276
$ tools/testing/selftests/bpf/vmtest.sh -i -s "bash -c 'for i in {1..8192}; do ./test_progs -t xdp_veth_redirect; done'"
[... coffee break ...]
[ XXXX.YYYYYY] sh invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
[...]
[ XXXX.YYYYYY] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=bash,pid=116,uid=0
[ XXXX.YYYYYY] Out of memory: Killed process 116 (bash) total-vm:6816kB, anon-rss:2816kB, file-rss:240kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Bastien Curutchet (eBPF Foundation) (2):
rtnetlink: Fix rtnl_net_cmp_locks() when DEBUG is off
rtnetlink: Release nets when leaving rtnl_setlink()
net/core/rtnetlink.c | 4 ++++
1 file changed, 4 insertions(+)
---
base-commit: 5d332c1ad3226c0a31653dbf2391bd332e157625
change-id: 20250211-rtnetlink_leak-8b12ca5c83f3
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():
Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
remove_migration_pte from rmap_walk_file+0xcc/0x130
rmap_walk_file from remove_migration_ptes+0x90/0xa4
remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
migrate_pages_batch from migrate_pages+0x188/0x488
migrate_pages from compact_zone+0x56c/0x954
compact_zone from compact_node+0x90/0xf0
compact_node from kcompactd+0x1d4/0x204
kcompactd from kthread+0x120/0x12c
kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide whether
to take the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS
is enabled. In addition, if two vmas map to the same PTE page, there is
no need to hold the pte lock again, otherwise a deadlock will occur. Just
add the need_lock parameter to let adjust_pte() know this information.
Reported-by: Ezra Buehler <ezra(a)easyb.ch>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0c…
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
---
arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..3627bf0957c75 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
}
static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
- unsigned long pfn, struct vm_fault *vmf)
+ unsigned long pfn, bool need_lock)
{
spinlock_t *ptl;
pgd_t *pgd;
@@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
if (!pte)
return 0;
- /*
- * If we are using split PTE locks, then we need to take the page
- * lock here. Otherwise we are using shared mm->page_table_lock
- * which is already locked, thus cannot take it.
- */
- if (ptl != vmf->ptl) {
+ if (need_lock) {
+ /*
+ * Use nested version here to indicate that we are already
+ * holding one similar spinlock.
+ */
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
ret = do_adjust_pte(vma, address, pfn, pte);
- if (ptl != vmf->ptl)
+ if (need_lock)
spin_unlock(ptl);
pte_unmap(pte);
@@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
static void
make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep, unsigned long pfn,
- struct vm_fault *vmf)
+ unsigned long addr, pte_t *ptep, unsigned long pfn)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *mpnt;
unsigned long offset;
+ unsigned long start;
pgoff_t pgoff;
int aliases = 0;
pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ start = ALIGN_DOWN(addr, PMD_SIZE);
/*
* If we have any shared mappings that are in the same mm
@@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
*/
flush_dcache_mmap_lock(mapping);
vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+ unsigned long mpnt_addr;
+ /*
+ * If we are using split PTE locks, then we need to take the pte
+ * lock. Otherwise we are using shared mm->page_table_lock which
+ * is already locked, thus cannot take it.
+ */
+ bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+
/*
* If this VMA is not in our MM, we can ignore it.
* Note that we intentionally mask out the VMA
@@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
if (!(mpnt->vm_flags & VM_MAYSHARE))
continue;
offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
- aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+ mpnt_addr = mpnt->vm_start + offset;
+ /*
+ * If mpnt_addr and addr are mapped to the same PTE page, there
+ * is no need to hold the pte lock again, otherwise a deadlock
+ * will occur.
+ */
+ if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+ need_lock = false;
+ aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
}
flush_dcache_mmap_unlock(mapping);
if (aliases)
@@ -194,7 +210,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
__flush_dcache_folio(mapping, folio);
if (mapping) {
if (cache_is_vivt())
- make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+ make_coherent(mapping, vma, addr, ptep, pfn);
else if (vma->vm_flags & VM_EXEC)
__flush_icache_all();
}
--
2.20.1
Since the new_metric and last_hop_metric variables can reach
the MAX_METRIC(0xffffffff) value, an integer overflow may occur
when multiplying them by 10/9. It can lead to incorrect behavior.
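For illustration, this small user-space sketch (with a simplified copy
of the mult_frac() arithmetic, not the kernel header) shows the u32
overflow at MAX_METRIC and the overflow-free comparison used by the fix.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Simplified stand-in for the kernel's mult_frac() macro. */
#define mult_frac(x, n, d) ((x) / (d) * (n) + (x) % (d) * (n) / (d))

/* Same check as the fix: is x at least 10% better (smaller) than y? */
static int is_metric_better(uint32_t x, uint32_t y)
{
        return (x < y) && (x < (y - x / 10));
}

int main(void)
{
        uint32_t metric = 0xffffffff;                /* MAX_METRIC */
        uint32_t scaled = mult_frac(metric, 10, 9);  /* wraps around */

        printf("mult_frac(%" PRIu32 ", 10, 9) = %" PRIu32 " (overflowed)\n",
               metric, scaled);
        printf("is_metric_better(90, 100) = %d\n", is_metric_better(90, 100));
        printf("is_metric_better(95, 100) = %d\n", is_metric_better(95, 100));
        return 0;
}

With MAX_METRIC the scaled value wraps to a much smaller number, which is
why the fix compares the metrics directly instead of scaling one of them.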
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: a8d418d9ac25 ("mac80211: mesh: only switch path when new metric is at least 10% better")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov(a)infotecs.ru>
---
v2:
- Remove 64-bit arithmetic according to https://lore.kernel.org/all/a6bd38c58f2f7685eac53844f2336432503c328e.camel@…
- Replace multiplication by 10/9 with a function that compares metrics by adding 10% without integer overflow
v3:
- Fix a typo (persent->percent)
v4:
- Simplify the is_metric_better() function
- Remove 'inline', add a comment
net/mac80211/mesh_hwmp.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 4e9546e998b6..c94a9c7ca960 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -367,6 +367,12 @@ u32 airtime_link_metric_get(struct ieee80211_local *local,
return (u32)result;
}
+/* Check that the first metric is at least 10% better than the second one */
+static bool is_metric_better(u32 x, u32 y)
+{
+ return (x < y) && (x < (y - x / 10));
+}
+
/**
* hwmp_route_info_get - Update routing info to originator and transmitter
*
@@ -458,8 +464,8 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata,
(mpath->sn == orig_sn &&
(rcu_access_pointer(mpath->next_hop) !=
sta ?
- mult_frac(new_metric, 10, 9) :
- new_metric) >= mpath->metric)) {
+ !is_metric_better(new_metric, mpath->metric) :
+ new_metric >= mpath->metric))) {
process = false;
fresh_info = false;
}
@@ -533,8 +539,8 @@ static u32 hwmp_route_info_get(struct ieee80211_sub_if_data *sdata,
if ((mpath->flags & MESH_PATH_FIXED) ||
((mpath->flags & MESH_PATH_ACTIVE) &&
((rcu_access_pointer(mpath->next_hop) != sta ?
- mult_frac(last_hop_metric, 10, 9) :
- last_hop_metric) > mpath->metric)))
+ !is_metric_better(last_hop_metric, mpath->metric) :
+ last_hop_metric > mpath->metric))))
fresh_info = false;
} else {
mpath = mesh_path_add(sdata, ta);
--
2.39.5
The function drm_syncobj_fence_get() may return NULL if the syncobj
has no fence. In eb_fences_add(), this return value is not checked,
leading to a potential NULL pointer dereference in
i915_request_await_dma_fence().
This patch adds a check for the return value of drm_syncobj_fence_get
and returns an error if it is NULL, preventing the NULL pointer
dereference.
Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf")
Cc: stable(a)vger.kernel.org # 5.16+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f151640c1d13..7da65535feb9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3252,6 +3252,12 @@ eb_fences_add(struct i915_execbuffer *eb, struct i915_request *rq,
struct dma_fence *fence;
fence = drm_syncobj_fence_get(eb->gem_context->syncobj);
+ if (!fence) {
+ drm_dbg(&eb->i915->drm,
+ "Syncobj handle has no fence\n");
+ return ERR_PTR(-EINVAL);
+ }
+
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
if (err)
--
2.42.0.windows.2
The patch titled
Subject: arm: pgtable: fix NULL pointer dereference issue
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arm-pgtable-fix-null-pointer-dereference-issue.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: arm: pgtable: fix NULL pointer dereference issue
Date: Wed, 12 Feb 2025 14:40:02 +0800
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():
Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
remove_migration_pte from rmap_walk_file+0xcc/0x130
rmap_walk_file from remove_migration_ptes+0x90/0xa4
remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
migrate_pages_batch from migrate_pages+0x188/0x488
migrate_pages from compact_zone+0x56c/0x954
compact_zone from compact_node+0x90/0xf0
compact_node from kcompactd+0x1d4/0x204
kcompactd from kthread+0x120/0x12c
kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)
To fix it, do not rely on comparing 'ptl' with vmf->ptl to decide whether
to take the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS
is enabled. In addition, if two vmas map to the same PTE page, there is
no need to hold the pte lock again, otherwise a deadlock will occur. Just
add the need_lock parameter to let adjust_pte() know this information.
Link: https://lkml.kernel.org/r/20250212064002.55598-1-zhengqi.arch@bytedance.com
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Ezra Buehler <ezra(a)easyb.ch>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0c…
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm/mm/fault-armv.c | 40 +++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 12 deletions(-)
--- a/arch/arm/mm/fault-armv.c~arm-pgtable-fix-null-pointer-dereference-issue
+++ a/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_
}
static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
- unsigned long pfn, struct vm_fault *vmf)
+ unsigned long pfn, bool need_lock)
{
spinlock_t *ptl;
pgd_t *pgd;
@@ -99,12 +99,11 @@ again:
if (!pte)
return 0;
- /*
- * If we are using split PTE locks, then we need to take the page
- * lock here. Otherwise we are using shared mm->page_table_lock
- * which is already locked, thus cannot take it.
- */
- if (ptl != vmf->ptl) {
+ if (need_lock) {
+ /*
+ * Use nested version here to indicate that we are already
+ * holding one similar spinlock.
+ */
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ again:
ret = do_adjust_pte(vma, address, pfn, pte);
- if (ptl != vmf->ptl)
+ if (need_lock)
spin_unlock(ptl);
pte_unmap(pte);
@@ -123,16 +122,17 @@ again:
static void
make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep, unsigned long pfn,
- struct vm_fault *vmf)
+ unsigned long addr, pte_t *ptep, unsigned long pfn)
{
struct mm_struct *mm = vma->vm_mm;
struct vm_area_struct *mpnt;
unsigned long offset;
+ unsigned long start;
pgoff_t pgoff;
int aliases = 0;
pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+ start = ALIGN_DOWN(addr, PMD_SIZE);
/*
* If we have any shared mappings that are in the same mm
@@ -141,6 +141,14 @@ make_coherent(struct address_space *mapp
*/
flush_dcache_mmap_lock(mapping);
vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+ unsigned long mpnt_addr;
+ /*
+ * If we are using split PTE locks, then we need to take the pte
+ * lock. Otherwise we are using shared mm->page_table_lock which
+ * is already locked, thus cannot take it.
+ */
+ bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+
/*
* If this VMA is not in our MM, we can ignore it.
* Note that we intentionally mask out the VMA
@@ -151,7 +159,15 @@ make_coherent(struct address_space *mapp
if (!(mpnt->vm_flags & VM_MAYSHARE))
continue;
offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
- aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+ mpnt_addr = mpnt->vm_start + offset;
+ /*
+ * If mpnt_addr and addr are mapped to the same PTE page, there
+ * is no need to hold the pte lock again, otherwise a deadlock
+ * will occur.
+ */
+ if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+ need_lock = false;
+ aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
}
flush_dcache_mmap_unlock(mapping);
if (aliases)
@@ -194,7 +210,7 @@ void update_mmu_cache_range(struct vm_fa
__flush_dcache_folio(mapping, folio);
if (mapping) {
if (cache_is_vivt())
- make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+ make_coherent(mapping, vma, addr, ptep, pfn);
else if (vma->vm_flags & VM_EXEC)
__flush_icache_all();
}
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch
arm-pgtable-fix-null-pointer-dereference-issue.patch
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x a9ab6591b45258b79af1cb66112fd9f83c8855da
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021018-attire-arrange-51d4@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9ab6591b45258b79af1cb66112fd9f83c8855da Mon Sep 17 00:00:00 2001
From: Lucas De Marchi <lucas.demarchi(a)intel.com>
Date: Thu, 23 Jan 2025 12:22:03 -0800
Subject: [PATCH] drm/xe: Fix and re-enable xe_print_blob_ascii85()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to workaround breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.
The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.
v2: Add suffix description comment
v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts()
in a loop
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Julia Filipchuk <julia.filipchuk(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-2-jose.s…
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
(cherry picked from commit 2c95bbf5002776117a69caed3b31c10bf7341bec)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index a7946a76777e..39fe485d2085 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -391,42 +391,34 @@ int xe_devcoredump_init(struct xe_device *xe)
/**
* xe_print_blob_ascii85 - print a BLOB to some useful location in ASCII85
*
- * The output is split to multiple lines because some print targets, e.g. dmesg
- * cannot handle arbitrarily long lines. Note also that printing to dmesg in
- * piece-meal fashion is not possible, each separate call to drm_puts() has a
- * line-feed automatically added! Therefore, the entire output line must be
- * constructed in a local buffer first, then printed in one atomic output call.
+ * The output is split into multiple calls to drm_puts() because some print
+ * targets, e.g. dmesg, cannot handle arbitrarily long lines. These targets may
+ * add newlines, as is the case with dmesg: each drm_puts() call creates a
+ * separate line.
*
* There is also a scheduler yield call to prevent the 'task has been stuck for
* 120s' kernel hang check feature from firing when printing to a slow target
* such as dmesg over a serial port.
*
- * TODO: Add compression prior to the ASCII85 encoding to shrink huge buffers down.
- *
* @p: the printer object to output to
* @prefix: optional prefix to add to output string
+ * @suffix: optional suffix to add at the end. 0 disables it and is
+ * not added to the output, which is useful when using multiple calls
+ * to dump data to @p
* @blob: the Binary Large OBject to dump out
* @offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)
* @size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)
*/
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size)
{
const u32 *blob32 = (const u32 *)blob;
char buff[ASCII85_BUFSZ], *line_buff;
size_t line_pos = 0;
- /*
- * Splitting blobs across multiple lines is not compatible with the mesa
- * debug decoder tool. Note that even dropping the explicit '\n' below
- * doesn't help because the GuC log is so big some underlying implementation
- * still splits the lines at 512K characters. So just bail completely for
- * the moment.
- */
- return;
-
#define DMESG_MAX_LINE_LEN 800
-#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "\n\0" */
+ /* Always leave space for the suffix char and the \0 */
+#define MIN_SPACE (ASCII85_BUFSZ + 2) /* 85 + "<suffix>\0" */
if (size & 3)
drm_printf(p, "Size not word aligned: %zu", size);
@@ -458,7 +450,6 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
line_pos += strlen(line_buff + line_pos);
if ((line_pos + MIN_SPACE) >= DMESG_MAX_LINE_LEN) {
- line_buff[line_pos++] = '\n';
line_buff[line_pos++] = 0;
drm_puts(p, line_buff);
@@ -470,10 +461,11 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
}
}
- if (line_pos) {
- line_buff[line_pos++] = '\n';
- line_buff[line_pos++] = 0;
+ if (suffix)
+ line_buff[line_pos++] = suffix;
+ if (line_pos) {
+ line_buff[line_pos++] = 0;
drm_puts(p, line_buff);
}
diff --git a/drivers/gpu/drm/xe/xe_devcoredump.h b/drivers/gpu/drm/xe/xe_devcoredump.h
index 6a17e6d60102..5391a80a4d1b 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.h
+++ b/drivers/gpu/drm/xe/xe_devcoredump.h
@@ -29,7 +29,7 @@ static inline int xe_devcoredump_init(struct xe_device *xe)
}
#endif
-void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix,
+void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix,
const void *blob, size_t offset, size_t size);
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 8b65c5e959cc..50c8076b5158 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1724,7 +1724,8 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot,
snapshot->g2h_outstanding);
if (snapshot->ctb)
- xe_print_blob_ascii85(p, "CTB data", snapshot->ctb, 0, snapshot->ctb_size);
+ xe_print_blob_ascii85(p, "CTB data", '\n',
+ snapshot->ctb, 0, snapshot->ctb_size);
} else {
drm_puts(p, "CT disabled\n");
}
diff --git a/drivers/gpu/drm/xe/xe_guc_log.c b/drivers/gpu/drm/xe/xe_guc_log.c
index df4cfb698cdb..2baa4d95571f 100644
--- a/drivers/gpu/drm/xe/xe_guc_log.c
+++ b/drivers/gpu/drm/xe/xe_guc_log.c
@@ -211,8 +211,10 @@ void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_
remain = snapshot->size;
for (i = 0; i < snapshot->num_chunks; i++) {
size_t size = min(GUC_LOG_CHUNK_SIZE, remain);
+ const char *prefix = i ? NULL : "Log data";
+ char suffix = i == snapshot->num_chunks - 1 ? '\n' : 0;
- xe_print_blob_ascii85(p, i ? NULL : "Log data", snapshot->copy[i], 0, size);
+ xe_print_blob_ascii85(p, prefix, suffix, snapshot->copy[i], 0, size);
remain -= size;
}
}
We started hitting the same netdevice refcount leak that had been
reported and fixed in mainline, and that Olivier had backported into
v5.15.142. So this is the same series further backported to the
latest v5.10 which fixes the issues we were seeing there.
This version also applies without change on v5.4 which should also
be affected, but it was only build-tested there as we don't have an
easy way to run our tests on that version.
Reference: https://lore.kernel.org/stable/20231201133004.3853933-1-olivier.matz@6wind.…
Xin Long (2):
vlan: introduce vlan_dev_free_egress_priority
vlan: move dev_put into vlan_dev_uninit
net/8021q/vlan.h | 2 +-
net/8021q/vlan_dev.c | 15 +++++++++++----
net/8021q/vlan_netlink.c | 7 ++++---
3 files changed, 16 insertions(+), 8 deletions(-)
--
2.34.1
device_del() can lead to new work being scheduled in gadget->work
workqueue. This is observed, for example, with the dwc3 driver with the
following call stack:
device_del()
gadget_unbind_driver()
usb_gadget_disconnect_locked()
dwc3_gadget_pullup()
dwc3_gadget_soft_disconnect()
usb_gadget_set_state()
schedule_work(&gadget->work)
Move flush_work() after device_del() to ensure the workqueue is cleaned
up.
Fixes: 5702f75375aa9 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Roy Luo <royluo(a)google.com>
---
drivers/usb/gadget/udc/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index a6f46364be65..4b3d5075621a 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1543,8 +1543,8 @@ void usb_del_gadget(struct usb_gadget *gadget)
kobject_uevent(&udc->dev.kobj, KOBJ_REMOVE);
sysfs_remove_link(&udc->dev.kobj, "gadget");
- flush_work(&gadget->work);
device_del(&gadget->dev);
+ flush_work(&gadget->work);
ida_free(&gadget_id_numbers, gadget->id_number);
cancel_work_sync(&udc->vbus_work);
device_unregister(&udc->dev);
base-commit: f286757b644c226b6b31779da95a4fa7ab245ef5
--
2.48.1.362.g079036d154-goog
On destroy, we should set each node dead. But the current code misses
this when the maple tree has only the root node.
The reason is that mt_destroy_walk() relies on mte_destroy_descend() to
set nodes dead, but that step is skipped since the lone root node is a
leaf.
Fix this by setting the root dead before mt_destroy_walk().
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
CC: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
---
lib/maple_tree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 198c14dd3377..d31f0a2858f7 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5347,6 +5347,8 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
{
struct maple_node *node = mte_to_node(enode);
+ mte_set_node_dead(enode);
+
if (mt_in_rcu(mt)) {
mt_destroy_walk(enode, mt, false);
call_rcu(&node->rcu, mt_free_walk);
--
2.34.1
Direct HLT instruction execution causes #VEs for TDX VMs, which are
routed to the hypervisor via TDCALL. safe_halt() routines execute HLT in
the STI shadow, so IRQs need to remain disabled until the TDCALL to
ensure that pending IRQs are correctly treated as wake events. The
"sti;hlt" sequence therefore needs to be replaced with
"TDCALL; raw_local_irq_enable()" for TDX VMs.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from using "sti;hlt". But it missed the
paravirt routine which can be reached like this as an example:
acpi_safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
irq.safe_halt() =>
pv_native_safe_halt()
Modify tdx_safe_halt() to implement the sequence "TDCALL;
raw_local_irq_enable()" and invoke tdx_halt() from idle routine which just
executes TDCALL without changing state of interrupts.
If CONFIG_PARAVIRT_XXL is disabled, "sti;hlt" sequences can still get
executed from TDX VMs via paths like:
acpi_safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
native_safe_halt()
There is a long term plan to fix these paths by carving out
irq.safe_halt() outside paravirt framework.
Cc: stable(a)vger.kernel.org
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Vishal Annapurve <vannapurve(a)google.com>
---
Changes since V2:
1) Addressed comments from Dave H and Kirill S.
V2: https://lore.kernel.org/lkml/20250129232525.3519586-1-vannapurve@google.com/
arch/x86/coco/tdx/tdx.c | 18 +++++++++++++++++-
arch/x86/include/asm/tdx.h | 2 +-
arch/x86/kernel/process.c | 2 +-
3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 32809a06dab4..5e68758666a4 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,6 +14,7 @@
#include <asm/ia32.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/paravirt_types.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
@@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve)
return ve_instr_len(ve);
}
-void __cpuidle tdx_safe_halt(void)
+void __cpuidle tdx_halt(void)
{
const bool irq_disabled = false;
@@ -409,6 +410,12 @@ void __cpuidle tdx_safe_halt(void)
WARN_ONCE(1, "HLT instruction emulation failed\n");
}
+static void __cpuidle tdx_safe_halt(void)
+{
+ tdx_halt();
+ raw_local_irq_enable();
+}
+
static int read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
@@ -1109,6 +1116,15 @@ void __init tdx_early_init(void)
x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
+#ifdef CONFIG_PARAVIRT_XXL
+ /*
+ * Avoid the literal hlt instruction in TDX guests. hlt will
+ * induce a #VE in the STI-shadow which will enable interrupts
+ * in a place where they are not wanted.
+ */
+ pv_ops.irq.safe_halt = tdx_safe_halt;
+#endif
+
/*
* TDX intercepts the RDMSR to read the X2APIC ID in the parallel
* bringup low level code. That raises #VE which cannot be handled
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b4b16dafd55e..393ee2dfaab1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
-void tdx_safe_halt(void);
+void tdx_halt(void);
bool tdx_early_handle_ve(struct pt_regs *regs);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6da6769d7254..d11956a178df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -934,7 +934,7 @@ void __init select_idle_routine(void)
static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
- static_call_update(x86_idle, tdx_safe_halt);
+ static_call_update(x86_idle, tdx_halt);
} else {
static_call_update(x86_idle, default_idle);
}
--
2.48.1.502.g6dc24dfdaf-goog
The patch titled
Subject: kasan: don't call find_vm_area() in RT kernel
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-dont-call-find_vm_area-in-rt-kernel.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: kasan: don't call find_vm_area() in RT kernel
Date: Tue, 11 Feb 2025 11:07:50 -0500
The following bug report appeared with a test run in a RT debug kernel.
[ 3359.353842] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 3359.353848] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 140605, name: kunit_try_catch
[ 3359.353853] preempt_count: 1, expected: 0
:
[ 3359.353933] Call trace:
:
[ 3359.353955] rt_spin_lock+0x70/0x140
[ 3359.353959] find_vmap_area+0x84/0x168
[ 3359.353963] find_vm_area+0x1c/0x50
[ 3359.353966] print_address_description.constprop.0+0x2a0/0x320
[ 3359.353972] print_report+0x108/0x1f8
[ 3359.353976] kasan_report+0x90/0xc8
[ 3359.353980] __asan_load1+0x60/0x70
print_address_description() is run with a raw_spinlock_t acquired and
interrupt disabled. find_vm_area() needs to acquire a spinlock_t which
becomes a sleeping lock in the RT kernel. IOW, we can't call
find_vm_area() in a RT kernel. Fix this bug report by skipping the
find_vm_area() call in this case and just print out the address as is.
For !RT kernel, follow the example set in commit 0cce06ba859a
("debugobjects,locking: Annotate debug_object_fill_pool() wait type
violation") and use DEFINE_WAIT_OVERRIDE_MAP() to avoid a spinlock_t
inside raw_spinlock_t warning.
Link: https://lkml.kernel.org/r/20250211160750.1301353-1-longman@redhat.com
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/report.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
--- a/mm/kasan/report.c~kasan-dont-call-find_vm_area-in-rt-kernel
+++ a/mm/kasan/report.c
@@ -398,9 +398,20 @@ static void print_address_description(vo
pr_err("\n");
}
- if (is_vmalloc_addr(addr)) {
- struct vm_struct *va = find_vm_area(addr);
+ if (!is_vmalloc_addr(addr))
+ goto print_page;
+ /*
+ * RT kernel cannot call find_vm_area() in atomic context.
+ * For !RT kernel, prevent spinlock_t inside raw_spinlock_t warning
+ * by raising wait-type to WAIT_SLEEP.
+ */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ static DEFINE_WAIT_OVERRIDE_MAP(vmalloc_map, LD_WAIT_SLEEP);
+ struct vm_struct *va;
+
+ lock_map_acquire_try(&vmalloc_map);
+ va = find_vm_area(addr);
if (va) {
pr_err("The buggy address belongs to the virtual mapping at\n"
" [%px, %px) created by:\n"
@@ -410,8 +421,13 @@ static void print_address_description(vo
page = vmalloc_to_page(addr);
}
+ lock_map_release(&vmalloc_map);
+ } else {
+ pr_err("The buggy address %px belongs to a vmalloc virtual mapping\n",
+ addr);
}
+print_page:
if (page) {
pr_err("The buggy address belongs to the physical page:\n");
dump_page(page, "kasan: bad access detected");
_
Patches currently in -mm which might be from longman(a)redhat.com are
kasan-dont-call-find_vm_area-in-rt-kernel.patch
This is the start of the stable review cycle for the 6.12.8 release.
There are 114 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 01 Jan 2025 15:41:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.8-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.8-rc1
Takashi Iwai <tiwai(a)suse.de>
ALSA: sh: Fix wrong argument order for copy_from_iter()
Takashi Iwai <tiwai(a)suse.de>
ALSA: ump: Shut up truncated string warning
Chris Lu <chris.lu(a)mediatek.com>
Bluetooth: btusb: mediatek: change the conditions for ISO interface
Chris Lu <chris.lu(a)mediatek.com>
Bluetooth: btusb: mediatek: add intf release flow when usb disconnect
Chris Lu <chris.lu(a)mediatek.com>
Bluetooth: btusb: mediatek: add callback function in btusb_disconnect
Chris Lu <chris.lu(a)mediatek.com>
Bluetooth: btusb: mediatek: move Bluetooth power off command position
Boris Burkov <boris(a)bur.io>
btrfs: check folio mapping after unlock in relocate_one_folio()
Boris Burkov <boris(a)bur.io>
btrfs: check folio mapping after unlock in put_file_data()
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix use-after-free when COWing tree bock and tracing is enabled
Qu Wenruo <wqu(a)suse.com>
btrfs: sysfs: fix direct super block member reads
Julian Sun <sunjunchao2870(a)gmail.com>
btrfs: fix transaction atomicity bug when enabling simple quotas
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix swap file activation failure due to extents that used to be shared
Filipe Manana <fdmanana(a)suse.com>
btrfs: avoid monopolizing a core when activating a swap file
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix race with memory mapped writes when activating swap file
Dimitri Fedrau <dimitri.fedrau(a)liebherr.com>
power: supply: gpio-charger: Fix set charge current limits
Thomas Weißschuh <linux(a)weissschuh.net>
power: supply: cros_charge-control: hide start threshold on v2 cmd
Thomas Weißschuh <linux(a)weissschuh.net>
power: supply: cros_charge-control: allow start_threshold == end_threshold
Thomas Weißschuh <linux(a)weissschuh.net>
power: supply: cros_charge-control: add mutex for driver data
Kan Liang <kan.liang(a)linux.intel.com>
perf/x86/intel/ds: Add PEBS format 6
Conor Dooley <conor.dooley(a)microchip.com>
i2c: microchip-core: fix "ghost" detections
Carlos Song <carlos.song(a)nxp.com>
i2c: imx: add imx7d compatible string for applying erratum ERR007805
Kan Liang <kan.liang(a)linux.intel.com>
perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC
Thomas Gleixner <tglx(a)linutronix.de>
PCI/MSI: Handle lack of irqdomain gracefully
Li RongQing <lirongqing(a)baidu.com>
virt: tdx-guest: Just leak decrypted memory on unrecoverable errors
Xin Li (Intel) <xin(a)zytor.com>
x86/fred: Clear WFE in missing-ENDBRANCH #CPs
Conor Dooley <conor.dooley(a)microchip.com>
i2c: microchip-core: actually use repeated sends
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring/sqpoll: fix sqpoll error handling races
Tomas Glozar <tglozar(a)redhat.com>
rtla/timerlat: Fix histogram ALL for zero samples
Lizhi Xu <lizhi.xu(a)windriver.com>
tracing: Prevent bad count for tracing_cpumask_write
Christian Göttsche <cgzones(a)googlemail.com>
tracing: Constify string literal data member in struct trace_event_call
Kan Liang <kan.liang(a)linux.intel.com>
perf/x86/intel/uncore: Add Clearwater Forest support
Binbin Zhou <zhoubinbin(a)loongson.cn>
dmaengine: loongson2-apb: Change GENMASK to GENMASK_ULL
Chen Ridong <chenridong(a)huawei.com>
freezer, sched: Report frozen tasks as 'D' instead of 'R'
chenchangcheng <ccc194101(a)163.com>
objtool: Add bch2_trans_unlocked_error() to bcachefs noreturns
John Harrison <John.C.Harrison(a)Intel.com>
drm/xe: Move the coredump registration to the worker thread
Matthew Brost <matthew.brost(a)intel.com>
drm/xe: Take PM ref in delayed snapshot capture worker
Ming Lei <ming.lei(a)redhat.com>
ublk: detach gendisk from ublk device if add_disk() fails
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
wifi: iwlwifi: be less noisy if the NIC is dead in S3
Ming Lei <ming.lei(a)redhat.com>
blk-mq: register cpuhp callback after hctx is added to xarray table
Ming Lei <ming.lei(a)redhat.com>
virtio-blk: don't keep queue frozen during system suspend
Imre Deak <imre.deak(a)intel.com>
drm/dp_mst: Ensure mst_primary pointer is valid in drm_dp_mst_handle_up_req()
Purushothama Siddaiah <psiddaiah(a)mvista.com>
spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()
Qinxin Xia <xiaqinxin(a)huawei.com>
ACPI/IORT: Add PMCG platform information for HiSilicon HIP09A
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpi3mr: Handling of fault code for insufficient power
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpi3mr: Start controller indexing from 0
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfs
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpi3mr: Synchronize access to ioctl data buffer
Ranjan Kumar <ranjan.kumar(a)broadcom.com>
scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
Aapo Vienamo <aapo.vienamo(a)iki.fi>
spi: intel: Add Panther Lake SPI controller support
Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
bpf: Zero index arg error string for dynptr and iter
Armin Wolf <W_Armin(a)gmx.de>
platform/x86: asus-nb-wmi: Ignore unknown event 0xCF
Tiezhu Yang <yangtiezhu(a)loongson.cn>
LoongArch: BPF: Adjust the parameter of emit_jirl()
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Fix reserving screen info memory for above-4G firmware
Mark Brown <broonie(a)kernel.org>
regmap: Use correct format specifier for logging range errors
Brahmajit Das <brahmajit.xyz(a)gmail.com>
smb: server: Fix building with GCC 15
Takashi Iwai <tiwai(a)suse.de>
ALSA: sh: Use standard helper for buffer accesses
bo liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: fix Z60MR100 startup pop issue
Takashi Iwai <tiwai(a)suse.de>
ALSA: ump: Update legacy substream names upon FB info update
Takashi Iwai <tiwai(a)suse.de>
ALSA: ump: Indicate the inactive group in legacy substream names
Takashi Iwai <tiwai(a)suse.de>
ALSA: ump: Don't open legacy substream for an inactive group
Jan Kara <jack(a)suse.cz>
udf: Verify inode link counts before performing rename
Jan Kara <jack(a)suse.cz>
udf: Skip parent dir link count update if corrupted
Tomas Henzl <thenzl(a)redhat.com>
scsi: megaraid_sas: Fix for a potential deadlock
Magnus Lindholm <linmag7(a)gmail.com>
scsi: qla1280: Fix hw revision numbering for ISP1020/1040
Yassine Oudjana <y.oudjana(a)protonmail.com>
watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
Peter Griffin <peter.griffin(a)linaro.org>
Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs"
Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler
James Hilliard <james.hilliard1(a)gmail.com>
watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
tracing/kprobe: Make trace_kprobe's module callback called after jump_label update
Alexander Lobakin <aleksander.lobakin(a)intel.com>
stddef: make __struct_group() UAPI C++-friendly
Hans de Goede <hdegoede(a)redhat.com>
power: supply: bq24190: Fix BQ24296 Vbus regulator support
Haren Myneni <haren(a)linux.ibm.com>
powerpc/pseries/vas: Add close() callback in vas_vm_ops struct
Richard Fitzgerald <rf(a)opensource.cirrus.com>
ASoC: Intel: sof_sdw: Fix DMI match for Lenovo 21Q6 and 21Q7
Chen-Yu Tsai <wenst(a)chromium.org>
ASoC: dt-bindings: realtek,rt5645: Fix CPVDD voltage comment
Richard Fitzgerald <rf(a)opensource.cirrus.com>
ASoC: Intel: sof_sdw: Fix DMI match for Lenovo 21QA and 21QB
Venkata Prasad Potturu <venkataprasad.potturu(a)amd.com>
ASoC: amd: ps: Fix for enabling DMIC on acp63 platform via _DSD entry
Dan Carpenter <dan.carpenter(a)linaro.org>
mtd: rawnand: fix double free in atmel_pmecc_create_user()
Dustin L. Howett <dustin(a)howett.net>
platform/chrome: cros_ec_lpc: fix product identity for early Framework Laptops
Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
ASoC: SOF: Intel: hda-dai: Do not release the link DMA on STOP
Chen Ridong <chenridong(a)huawei.com>
dmaengine: at_xdmac: avoid null_prt_deref in at_xdmac_prep_dma_memset
Sasha Finkelstein <fnkl.kernel(a)gmail.com>
dmaengine: apple-admac: Avoid accessing registers in probe
Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp>
dmaengine: fsl-edma: implement the cleanup path of fsl_edma3_attach_pd()
Lizhi Hou <lizhi.hou(a)amd.com>
dmaengine: amd: qdma: Remove using the private get and set dma_ops APIs
Akhil R <akhilrajeev(a)nvidia.com>
dmaengine: tegra: Return correct DMA status when paused
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
dmaengine: dw: Select only supported masters for ACPI devices
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
dmaengine: mv_xor: fix child node refcount handling in early exit
Fedor Pchelkin <pchelkin(a)ispras.ru>
ALSA: memalloc: prefer dma_mapping_error() over explicit address checking
Chukun Pan <amadeus(a)jmu.edu.cn>
phy: rockchip: naneng-combphy: fix phy reset
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
phy: rockchip: samsung-hdptx: Set drvdata before enabling runtime PM
Justin Chen <justin.chen(a)broadcom.com>
phy: usb: Toggle the PHY power during init
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_phy_destroy() fails to destroy the phy
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix that API devm_phy_put() fails to release the phy
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup()
Zijun Hu <quic_zijuhu(a)quicinc.com>
phy: core: Fix an OF node refcount leakage in _of_phy_get()
Krishna Kurapati <quic_kriskura(a)quicinc.com>
phy: qcom-qmp: Fix register name in RX Lane config of SC8280XP
Maciej Andrzejewski <maciej.andrzejewski(a)m-works.net>
mtd: rawnand: arasan: Fix missing de-registration of NAND
Maciej Andrzejewski <maciej.andrzejewski(a)m-works.net>
mtd: rawnand: arasan: Fix double assertion of chip-select
Zichen Xie <zichenxie0106(a)gmail.com>
mtd: diskonchip: Cast an operand to prevent potential overflow
NeilBrown <neilb(a)suse.de>
nfsd: restore callback functionality for NFSv4.0
Yang Erkun <yangerkun(a)huawei.com>
nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
Cong Wang <cong.wang(a)bytedance.com>
bpf: Check negative offsets in __bpf_skb_min_len()
Zijian Zhang <zijianzhang(a)bytedance.com>
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
Cong Wang <cong.wang(a)bytedance.com>
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
Bharath SM <bharathsm.hsk(a)gmail.com>
smb: fix bytes written value in /proc/fs/cifs/Stats
Dragan Simic <dsimic(a)manjaro.org>
smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig
Jerome Marchand <jmarchan(a)redhat.com>
selftests/bpf: Fix compilation error in get_uprobe_offset()
Bart Van Assche <bvanassche(a)acm.org>
mm/vmstat: fix a W=1 clang compiler warning
Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
fork: avoid inappropriate uprobe access to invalid mm
Andrea Righi <arighi(a)nvidia.com>
bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
Willow Cunningham <willow.e.cunningham(a)gmail.com>
arm64: dts: broadcom: Fix L2 linesize for Raspberry Pi 5
Ilya Dryomov <idryomov(a)gmail.com>
ceph: allocate sparse_ext map only for sparse reads
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
media: dvb-frontends: dib3000mb: fix uninit-value in dib3000_write_reg
-------------
Diffstat:
Documentation/arch/arm64/silicon-errata.rst | 5 +-
.../devicetree/bindings/sound/realtek,rt5645.yaml | 2 +-
Makefile | 4 +-
arch/arm64/boot/dts/broadcom/bcm2712.dtsi | 8 +-
arch/loongarch/include/asm/inst.h | 12 +-
arch/loongarch/kernel/efi.c | 2 +-
arch/loongarch/kernel/inst.c | 2 +-
arch/loongarch/net/bpf_jit.c | 6 +-
arch/powerpc/platforms/book3s/vas-api.c | 36 +++++
arch/x86/events/intel/core.c | 12 +-
arch/x86/events/intel/ds.c | 1 +
arch/x86/events/intel/uncore.c | 1 +
arch/x86/kernel/cet.c | 30 ++++
block/blk-mq.c | 15 +-
drivers/acpi/arm64/iort.c | 2 +
drivers/base/regmap/regmap.c | 4 +-
drivers/block/ublk_drv.c | 26 +--
drivers/block/virtio_blk.c | 7 +-
drivers/bluetooth/btusb.c | 41 +++--
drivers/dma/amd/qdma/qdma.c | 28 ++--
drivers/dma/apple-admac.c | 7 +-
drivers/dma/at_xdmac.c | 2 +
drivers/dma/dw/acpi.c | 6 +-
drivers/dma/dw/internal.h | 8 +
drivers/dma/dw/pci.c | 4 +-
drivers/dma/fsl-edma-common.h | 1 +
drivers/dma/fsl-edma-main.c | 41 ++++-
drivers/dma/ls2x-apb-dma.c | 2 +-
drivers/dma/mv_xor.c | 2 +
drivers/dma/tegra186-gpc-dma.c | 10 ++
drivers/gpu/drm/display/drm_dp_mst_topology.c | 24 ++-
drivers/gpu/drm/xe/xe_devcoredump.c | 69 ++++----
drivers/i2c/busses/i2c-imx.c | 1 +
drivers/i2c/busses/i2c-microchip-corei2c.c | 126 +++++++++++----
drivers/media/dvb-frontends/dib3000mb.c | 2 +-
drivers/mtd/nand/raw/arasan-nand-controller.c | 11 +-
drivers/mtd/nand/raw/atmel/pmecc.c | 4 +-
drivers/mtd/nand/raw/diskonchip.c | 2 +-
drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 13 +-
drivers/net/wireless/intel/iwlwifi/mvm/d3.c | 28 +++-
drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 2 +
drivers/pci/msi/irqdomain.c | 7 +-
drivers/pci/msi/msi.c | 4 +
drivers/phy/broadcom/phy-brcm-usb-init-synopsys.c | 6 +
drivers/phy/phy-core.c | 21 ++-
drivers/phy/qualcomm/phy-qcom-qmp-usb.c | 2 +-
drivers/phy/rockchip/phy-rockchip-naneng-combphy.c | 2 +-
drivers/phy/rockchip/phy-rockchip-samsung-hdptx.c | 3 +-
drivers/platform/chrome/cros_ec_lpc.c | 4 +-
drivers/platform/x86/asus-nb-wmi.c | 1 +
drivers/power/supply/bq24190_charger.c | 12 +-
drivers/power/supply/cros_charge-control.c | 36 +++--
drivers/power/supply/gpio-charger.c | 8 +
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +-
drivers/scsi/mpi3mr/mpi3mr.h | 9 --
drivers/scsi/mpi3mr/mpi3mr_app.c | 36 +++--
drivers/scsi/mpi3mr/mpi3mr_fw.c | 121 ++++++--------
drivers/scsi/mpi3mr/mpi3mr_os.c | 2 +-
drivers/scsi/mpt3sas/mpt3sas_base.c | 7 +-
drivers/scsi/qla1280.h | 12 +-
drivers/scsi/storvsc_drv.c | 7 +-
drivers/spi/spi-intel-pci.c | 2 +
drivers/spi/spi-omap2-mcspi.c | 6 +-
drivers/virt/coco/tdx-guest/tdx-guest.c | 4 +-
drivers/watchdog/Kconfig | 1 +
drivers/watchdog/it87_wdt.c | 39 +++++
drivers/watchdog/mtk_wdt.c | 6 +
drivers/watchdog/rzg2l_wdt.c | 20 ++-
drivers/watchdog/s3c2410_wdt.c | 8 +-
fs/btrfs/ctree.c | 11 +-
fs/btrfs/inode.c | 129 +++++++++++----
fs/btrfs/qgroup.c | 3 +-
fs/btrfs/relocation.c | 6 +
fs/btrfs/send.c | 6 +
fs/btrfs/sysfs.c | 6 +-
fs/ceph/file.c | 2 +-
fs/nfsd/export.c | 31 +---
fs/nfsd/export.h | 4 +-
fs/nfsd/nfs4callback.c | 4 +-
fs/smb/client/Kconfig | 1 -
fs/smb/client/smb2pdu.c | 3 +
fs/smb/server/smb_common.c | 4 +-
fs/udf/namei.c | 16 +-
include/linux/platform_data/amd_qdma.h | 2 +
include/linux/sched.h | 3 +-
include/linux/skmsg.h | 11 +-
include/linux/trace_events.h | 2 +-
include/linux/vmstat.h | 2 +-
include/net/sock.h | 10 +-
include/uapi/linux/stddef.h | 13 +-
io_uring/sqpoll.c | 6 +
kernel/bpf/verifier.c | 18 ++-
kernel/fork.c | 13 +-
kernel/trace/trace.c | 3 +
kernel/trace/trace_kprobe.c | 2 +-
net/ceph/osd_client.c | 2 +
net/core/filter.c | 21 ++-
net/core/skmsg.c | 6 +-
net/ipv4/tcp_bpf.c | 6 +-
sound/core/memalloc.c | 2 +-
sound/core/ump.c | 26 ++-
sound/pci/hda/patch_conexant.c | 28 ++++
sound/sh/sh_dac_audio.c | 5 +-
sound/soc/amd/ps/pci-ps.c | 17 +-
sound/soc/intel/boards/sof_sdw.c | 23 ++-
sound/soc/sof/intel/hda-dai.c | 25 ++-
sound/soc/sof/intel/hda.h | 2 -
tools/include/uapi/linux/stddef.h | 15 +-
tools/objtool/noreturns.h | 1 +
tools/testing/selftests/bpf/progs/dynptr_fail.c | 22 +--
.../selftests/bpf/progs/iters_state_safety.c | 14 +-
.../selftests/bpf/progs/iters_testmod_seq.c | 4 +-
.../selftests/bpf/progs/test_kfunc_dynptr_param.c | 2 +-
.../selftests/bpf/progs/verifier_bits_iter.c | 4 +-
tools/testing/selftests/bpf/trace_helpers.c | 4 +
tools/tracing/rtla/src/timerlat_hist.c | 177 +++++++++++----------
116 files changed, 1169 insertions(+), 548 deletions(-)
From: Song Yoong Siang <yoong.siang.song(a)intel.com>
Set the buffer type to IGC_TX_BUFFER_TYPE_SKB for empty frame in the
igc_init_empty_frame function. This ensures that the buffer type is
correctly identified and handled during Tx ring cleanup.
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Cc: stable(a)vger.kernel.org # 6.2+
Signed-off-by: Song Yoong Siang <yoong.siang.song(a)intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 21f318f12a8d..84307bb7313e 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
return -ENOMEM;
}
+ buffer->type = IGC_TX_BUFFER_TYPE_SKB;
buffer->skb = skb;
buffer->protocol = 0;
buffer->bytecount = skb->len;
--
2.47.1
This series adds T6020 support to the Apple PCIe controller. Mostly
Apple shuffled registers around (presumably to accommodate the larger
configurations on those machines). So there's a bit of churn here but
not too much in the way of functional changes.
Signed-off-by: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
---
Alyssa Rosenzweig (1):
dt-bindings: pci: apple,pcie: Add t6020 support
Hector Martin (5):
PCI: apple: Fix missing OF node reference in apple_pcie_setup_port
PCI: apple: Move port PHY registers to their own reg items
PCI: apple: Drop poll for CORE_RC_PHYIF_STAT_REFCLK
PCI: apple: Use gpiod_set_value_cansleep in probe flow
PCI: apple: Add T602x PCIe support
Janne Grunau (1):
PCI: apple: Set only available ports up
.../devicetree/bindings/pci/apple,pcie.yaml | 1 +
drivers/pci/controller/pcie-apple.c | 189 ++++++++++++++++-----
2 files changed, 146 insertions(+), 44 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250211-pcie-t6-3f4898ce9b1d
Best regards,
--
Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
These patches fix some issues with the way KVM manages FPSIMD/SVE/SME
state. The series supersedes my earlier attempt at fixing the host SVE
state corruption issue:
https://lore.kernel.org/linux-arm-kernel/20250121100026.3974971-1-mark.rutl…
Patch 1 addresses the host SVE state corruption issue by always saving
and unbinding the host state when loading a vCPU, as discussed on the
earlier patch:
https://lore.kernel.org/linux-arm-kernel/Z4--YuG5SWrP_pW7@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86plkful48.wl-maz@kernel.org/
Patches 2 to 4 remove code made redundant by patch 1. These probably
warrant backporting along with patch 1 as there is some historical
brokenness in the code they remove.
Patches 5 to 7 are preparatory refactoring for patch 8, and are not
intended to have any functional impact.
Patch 8 addresses some mismanagement of ZCR_EL{1,2} which can result in
the host VMM unexpectedly receiving a SIGKILL. To fix this, we eagerly
switch ZCR_EL{1,2} at guest<->host transitions, as discussed on another
series:
https://lore.kernel.org/linux-arm-kernel/Z4pAMaEYvdLpmbg2@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86o6zzukwr.wl-maz@kernel.org/
https://lore.kernel.org/linux-arm-kernel/Z5Dc-WMu2azhTuMn@J2N7QTR9R3/
The end result is that KVM loses ~100 lines of code, and becomes a bit
simpler to reason about.
I've pushed these patches to the arm64-kvm-fpsimd-fixes-20250206 tag on
my kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
The (unstable) arm64/kvm/fpsimd-fixes branch in that repo contains the
fixes plus additional debug patches I've used for testing. I've given
this some basic testing on a virtual platform, booting a host and a
guest with and without constraining the guest's max SVE VL, with:
* kvm_arm.mode=vhe
* kvm_arm.mode=nvhe
* kvm_arm.mode=protected (IIUC this will default to hVHE)
Since v1 [1]:
* Address some additional compiler warnings in patch 7
* Use ZCR_EL1 alias in VHE code
* Fold in Tested-by and Reviewed-by tags
* Fix typos
[1] https://lore.kernel.org/linux-arm-kernel/20250204152100.705610-1-mark.rutla…
Mark.
Mark Rutland (8):
KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
KVM: arm64: Remove host FPSIMD saving for non-protected KVM
KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
KVM: arm64: Refactor CPTR trap deactivation
KVM: arm64: Refactor exit handlers
KVM: arm64: Mark some header functions as inline
KVM: arm64: Eagerly switch ZCR_EL{1,2}
arch/arm64/include/asm/kvm_emulate.h | 42 --------
arch/arm64/include/asm/kvm_host.h | 22 +----
arch/arm64/kernel/fpsimd.c | 25 -----
arch/arm64/kvm/arm.c | 8 --
arch/arm64/kvm/fpsimd.c | 100 ++-----------------
arch/arm64/kvm/hyp/include/hyp/switch.h | 125 +++++++++++++++++-------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++-
arch/arm64/kvm/hyp/nvhe/switch.c | 91 ++++++++---------
arch/arm64/kvm/hyp/vhe/switch.c | 33 ++++---
9 files changed, 174 insertions(+), 287 deletions(-)
--
2.30.2
From: Jill Donahue <jilliandonahue58(a)gmail.com>
When using USB MIDI, a re-entrant call to f_midi_transmit() attempts to
acquire the same lock twice, causing a deadlock.
Fix it by using queue_work() to schedule the inner f_midi_transmit() via
a high priority work queue from the completion handler.
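As a rough, self-contained sketch of the deferral pattern described above
(the struct and function names below are hypothetical, not the actual
f_midi ones), the completion path hands the transmit work off to the
high-priority system workqueue instead of calling back into the locked
transmit path:

#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* Hypothetical per-port state; the real driver keeps an equivalent
 * work item and lock in its f_midi structure. */
struct midi_port {
	spinlock_t lock;
	struct work_struct work;
};

static void midi_transmit_work(struct work_struct *w)
{
	struct midi_port *p = container_of(w, struct midi_port, work);

	spin_lock_irq(&p->lock);
	/* ... drain the FIFO and queue the next USB request ... */
	spin_unlock_irq(&p->lock);
}

/* Completion handler: it may already run with p->lock held further up
 * the call chain, so calling midi_transmit_work() directly could
 * deadlock. Deferring to the workqueue breaks the re-entrancy. */
static void midi_complete(struct midi_port *p)
{
	queue_work(system_highpri_wq, &p->work);
}

static void midi_port_init(struct midi_port *p)
{
	spin_lock_init(&p->lock);
	INIT_WORK(&p->work, midi_transmit_work);
}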
Link: https://lore.kernel.org/all/CAArt=LjxU0fUZOj06X+5tkeGT+6RbXzpWg1h4t4Fwa_KGV…
Fixes: d5daf49b58661 ("USB: gadget: midi: add midi function driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jill Donahue <jilliandonahue58(a)gmail.com>
---
V3 -> V4: Adjusted changes based on latest kernel tree
drivers/usb/gadget/function/f_midi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 837fcdfa3840..a188648d7528 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -283,7 +283,7 @@ f_midi_complete(struct usb_ep *ep, struct usb_request *req)
/* Our transmit completed. See if there's more to go.
* f_midi_transmit eats req, don't queue it again. */
req->length = 0;
- f_midi_transmit(midi);
+ queue_work(system_highpri_wq, &midi->work);
return;
}
break;
--
2.25.1
This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
Thanks to Magnus Lindholm for identifying that remarkably longstanding
bug.
The problem is that GCC expects 16-byte alignment of the incoming stack
since early 2004, as Maciej found out [2]:
Having actually dug speculatively I can see that the psABI was changed in
GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
improperly aligned)") back in Mar 2004, when the stack pointer alignment
was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
various suspicious stack pointer adjustments, starting with SP_OFF which
is not a whole multiple of 16.
Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
D.3.1 Stack Alignment
This standard requires that stacks be octaword aligned at the time a
new procedure is invoked.
However:
- the "normal" kernel stack is always misaligned by 8 bytes, thanks to
the odd number of 64-bit words in 'struct pt_regs', which is the very
first thing pushed onto the kernel thread stack;
- syscall, fault, interrupt etc. handlers may, or may not, receive aligned
stack depending on numerous factors.
Somehow we got away with it until recently, when we ended up with
a stack corruption in kernel/smp.c:smp_call_function_single() due to
its use of 32-byte aligned local data and the compiler doing clever
things allocating it on the stack.
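A minimal stand-alone illustration of the 8-byte misalignment described
above (the word count is made up; it is not the real alpha 'struct
pt_regs' layout):

#include <stdio.h>

int main(void)
{
	unsigned long sp = 0x10000;	/* 16-byte aligned stack on entry */
	unsigned long nwords = 41;	/* odd number of 64-bit words, like pt_regs */

	sp -= nwords * 8;		/* push the register frame */
	printf("sp %% 16 = %lu\n", sp % 16);	/* prints 8: violates the psABI */
	return 0;
}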
Patch 1 is preparatory; 2 - the main fix; 3 - fixes remaining
special cases.
Ivan.
[1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukR…
[2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam…
[3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427…
---
Changes in v2:
- patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
reported by John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de>;
update comment and commit message accordingly;
- cc'ed <stable(a)vger.kernel.org> as older kernels ought to be fixed as well.
Changes in v3:
- patch #1 dropped for the time being;
- updated commit messages as Maciej suggested.
---
Ivan Kokshaysky (3):
alpha: replace hardcoded stack offsets with autogenerated ones
alpha: make stack 16-byte aligned (most cases)
alpha: align stack for page fault and user unaligned trap handlers
arch/alpha/include/uapi/asm/ptrace.h | 2 ++
arch/alpha/kernel/asm-offsets.c | 4 ++++
arch/alpha/kernel/entry.S | 24 ++++++++++--------------
arch/alpha/kernel/traps.c | 2 +-
arch/alpha/mm/fault.c | 4 ++--
5 files changed, 19 insertions(+), 17 deletions(-)
--
2.47.2
From: Shyam Prasad N <sprasad(a)microsoft.com>
The netfs library could break down a read request into
multiple subrequests. When multichannel is used, there is
potential to improve performance when each of these
subrequests picks a different channel.
Today we call cifs_pick_channel when the main read request
is initialized in cifs_init_request. This change moves this to
cifs_prepare_read, which is the right place to pick channel since
it gets called for each subrequest.
Interestingly cifs_prepare_write already does channel selection
for individual subrequests, but it looks like this was missed for reads.
This is especially important when multichannel is used with
increased rasize.
In my test setup, with rasize set to 8MB, a sequential read
of a large file was taking 11.5s without this change. With the
change, it completed in 9s. The difference is even more significant
with bigger rasize.
Cc: <stable(a)vger.kernel.org>
Cc: David Howells <dhowells(a)redhat.com>
Signed-off-by: Shyam Prasad N <sprasad(a)microsoft.com>
---
fs/smb/client/cifsglob.h | 1 -
fs/smb/client/file.c | 7 ++++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index a68434ad744a..243e4881528c 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1508,7 +1508,6 @@ struct cifs_io_parms {
struct cifs_io_request {
struct netfs_io_request rreq;
struct cifsFileInfo *cfile;
- struct TCP_Server_Info *server;
pid_t pid;
};
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 79de2f2f9c41..8582cf61242c 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -147,7 +147,7 @@ static int cifs_prepare_read(struct netfs_io_subrequest *subreq)
struct netfs_io_request *rreq = subreq->rreq;
struct cifs_io_subrequest *rdata = container_of(subreq, struct cifs_io_subrequest, subreq);
struct cifs_io_request *req = container_of(subreq->rreq, struct cifs_io_request, rreq);
- struct TCP_Server_Info *server = req->server;
+ struct TCP_Server_Info *server;
struct cifs_sb_info *cifs_sb = CIFS_SB(rreq->inode->i_sb);
size_t size;
int rc = 0;
@@ -156,6 +156,8 @@ static int cifs_prepare_read(struct netfs_io_subrequest *subreq)
rdata->xid = get_xid();
rdata->have_xid = true;
}
+
+ server = cifs_pick_channel(tlink_tcon(req->cfile->tlink)->ses);
rdata->server = server;
if (cifs_sb->ctx->rsize == 0)
@@ -198,7 +200,7 @@ static void cifs_issue_read(struct netfs_io_subrequest *subreq)
struct netfs_io_request *rreq = subreq->rreq;
struct cifs_io_subrequest *rdata = container_of(subreq, struct cifs_io_subrequest, subreq);
struct cifs_io_request *req = container_of(subreq->rreq, struct cifs_io_request, rreq);
- struct TCP_Server_Info *server = req->server;
+ struct TCP_Server_Info *server = rdata->server;
int rc = 0;
cifs_dbg(FYI, "%s: op=%08x[%x] mapping=%p len=%zu/%zu\n",
@@ -266,7 +268,6 @@ static int cifs_init_request(struct netfs_io_request *rreq, struct file *file)
open_file = file->private_data;
rreq->netfs_priv = file->private_data;
req->cfile = cifsFileInfo_get(open_file);
- req->server = cifs_pick_channel(tlink_tcon(req->cfile->tlink)->ses);
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
req->pid = req->cfile->pid;
} else if (rreq->origin != NETFS_WRITEBACK) {
--
2.43.0
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: c631a2de7ae48d50434bdc205d901423f8577c65
Gitweb: https://git.kernel.org/tip/c631a2de7ae48d50434bdc205d901423f8577c65
Author: Sean Christopherson <seanjc(a)google.com>
AuthorDate: Thu, 30 Jan 2025 17:07:21 -08:00
Committer: Peter Zijlstra <peterz(a)infradead.org>
CommitterDate: Sat, 08 Feb 2025 15:47:26 +01:00
perf/x86/intel: Ensure LBRs are disabled when a CPU is starting
Explicitly clear DEBUGCTL.LBR when a CPU is starting, prior to purging the
LBR MSRs themselves, as at least one system has been found to transfer
control to the kernel with LBRs enabled (it's unclear whether it's a BIOS
flaw or a CPU goof). Because the kernel preserves the original DEBUGCTL,
even when toggling LBRs, leaving DEBUGCTL.LBR as is results in running
with LBRs enabled at all times.
Closes: https://lore.kernel.org/all/c9d8269bff69f6359731d758e3b1135dedd7cc61.camel@…
Reported-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20250131010721.470503-1-seanjc@google.com
---
arch/x86/events/intel/core.c | 5 ++++-
arch/x86/include/asm/msr-index.h | 3 ++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f3d5b71..e86333e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5042,8 +5042,11 @@ static void intel_pmu_cpu_starting(int cpu)
init_debug_store_on_cpu(cpu);
/*
- * Deal with CPUs that don't clear their LBRs on power-up.
+ * Deal with CPUs that don't clear their LBRs on power-up, and that may
+ * even boot with LBRs enabled.
*/
+ if (!static_cpu_has(X86_FEATURE_ARCH_LBR) && x86_pmu.lbr_nr)
+ msr_clear_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR_BIT);
intel_pmu_lbr_reset();
cpuc->lbr_sel = NULL;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9a71880..72765b2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -395,7 +395,8 @@
#define MSR_IA32_PASID_VALID BIT_ULL(31)
/* DEBUGCTLMSR bits (others vary by model): */
-#define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */
+#define DEBUGCTLMSR_LBR_BIT 0 /* last branch recording */
+#define DEBUGCTLMSR_LBR (1UL << DEBUGCTLMSR_LBR_BIT)
#define DEBUGCTLMSR_BTF_SHIFT 1
#define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */
#define DEBUGCTLMSR_BUS_LOCK_DETECT (1UL << 2)
Hello,
New build issue found on stable-rc/linux-6.1.y:
---
‘struct drm_connector’ has no member named ‘eld_mutex’ in
drivers/gpu/drm/sti/sti_hdmi.o (drivers/gpu/drm/sti/sti_hdmi.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://dashboard.kernelci.org/issue/maestro:7fe27892aa3e055cf7cbc7660b6921…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 5beb9a3ea62e7725d8cd88410dbd269ebf683ce0
Log excerpt:
=====================================================
drivers/gpu/drm/sti/sti_hdmi.c:1223:30: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1223 | mutex_lock(&connector->eld_mutex);
| ^~
drivers/gpu/drm/sti/sti_hdmi.c:1225:32: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1225 | mutex_unlock(&connector->eld_mutex);
| ^~
=====================================================
# Builds where the incident occurred:
## multi_v7_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://dashboard.kernelci.org/build/maestro:67ab310db27a1f56cc37e118
#kernelci issue maestro:7fe27892aa3e055cf7cbc7660b69219ecce56688
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
‘struct drm_connector’ has no member named ‘eld_mutex’ in
drivers/gpu/drm/sti/sti_hdmi.o (drivers/gpu/drm/sti/sti_hdmi.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://dashboard.kernelci.org/issue/maestro:5e925d96a3540cf43ab1b679d0e9b4…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 16f808b001a697126972a39c8a2600c33a616ebf
Log excerpt:
=====================================================
drivers/gpu/drm/sti/sti_hdmi.c:1217:30: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1217 | mutex_lock(&connector->eld_mutex);
| ^~
drivers/gpu/drm/sti/sti_hdmi.c:1219:32: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1219 | mutex_unlock(&connector->eld_mutex);
| ^~
=====================================================
# Builds where the incident occurred:
## multi_v7_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://dashboard.kernelci.org/build/maestro:67ab2fc1b27a1f56cc37dfff
#kernelci issue maestro:5e925d96a3540cf43ab1b679d0e9b4356b623237
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.10.y:
---
‘struct drm_connector’ has no member named ‘eld_mutex’ in
drivers/gpu/drm/sti/sti_hdmi.o (drivers/gpu/drm/sti/sti_hdmi.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://dashboard.kernelci.org/issue/maestro:fa3672570798747730c19859539289…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: a12eb63b1d685a20c1abe34c84c383f0b7b829b5
Log excerpt:
=====================================================
drivers/gpu/drm/sti/sti_hdmi.c:1216:30: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1216 | mutex_lock(&connector->eld_mutex);
| ^~
drivers/gpu/drm/sti/sti_hdmi.c:1218:32: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1218 | mutex_unlock(&connector->eld_mutex);
| ^~
CC [M] drivers/gpu/drm/msm/dp/dp_display.o
=====================================================
# Builds where the incident occurred:
## multi_v7_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://dashboard.kernelci.org/build/maestro:67ab302ab27a1f56cc37e05b
#kernelci issue maestro:fa3672570798747730c198595392890f2f99404c
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
implicit declaration of function
‘drm_connector_helper_hpd_irq_event’; did you mean
‘drm_helper_hpd_irq_event’? [-Werror=implicit-function-declaration] in
drivers/gpu/drm/rockchip/cdn-dp-core.o
(drivers/gpu/drm/rockchip/cdn-dp-core.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://dashboard.kernelci.org/issue/maestro:15d7ea9fd5f2bedae85bc45a4267a6…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 16f808b001a697126972a39c8a2600c33a616ebf
Log excerpt:
=====================================================
drivers/gpu/drm/rockchip/cdn-dp-core.c:981:9: error: implicit
declaration of function ‘drm_connector_helper_hpd_irq_event’; did you
mean ‘drm_helper_hpd_irq_event’?
[-Werror=implicit-function-declaration]
981 | drm_connector_helper_hpd_irq_event(&dp->connector);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| drm_helper_hpd_irq_event
CC [M] drivers/gpu/drm/nouveau/nvkm/engine/dma/gv100.o
CC [M] drivers/gpu/drm/sun4i/sun4i_tv.o
CC [M] drivers/gpu/drm/tegra/gem.o
CC [M] drivers/gpu/drm/nouveau/nvkm/engine/dma/user.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## defconfig on (arm64):
- compiler: gcc-12
- dashboard: https://dashboard.kernelci.org/build/maestro:67ab2fcbb27a1f56cc37e008
#kernelci issue maestro:15d7ea9fd5f2bedae85bc45a4267a66024d1b429
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.15.y:
---
‘struct drm_connector’ has no member named ‘eld_mutex’ in
drivers/gpu/drm/sti/sti_hdmi.o (drivers/gpu/drm/sti/sti_hdmi.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://dashboard.kernelci.org/issue/maestro:c489b0f0574bd38579af0eba0ab35e…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: fbe69012f6c01a2b7fea011bfad0655f1d28eea5
Log excerpt:
=====================================================
drivers/gpu/drm/sti/sti_hdmi.c:1222:30: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1222 | mutex_lock(&connector->eld_mutex);
| ^~
drivers/gpu/drm/sti/sti_hdmi.c:1224:32: error: ‘struct drm_connector’
has no member named ‘eld_mutex’
1224 | mutex_unlock(&connector->eld_mutex);
| ^~
=====================================================
# Builds where the incident occurred:
## multi_v7_defconfig on (arm):
- compiler: gcc-12
- dashboard: https://dashboard.kernelci.org/build/maestro:67ab309ab27a1f56cc37e0b7
#kernelci issue maestro:c489b0f0574bd38579af0eba0ab35e7b2e49d000
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Certain registers in the AFE IO space require the apll1 clock to be
enabled in order to be read, otherwise the machine hangs (registers like
0x280, 0x410 (AFE_GAIN1_CON0) and 0x830 (AFE_CONN0_5)). During AFE
driver probe, when initializing the regmap for the AFE IO space those
registers are read, resulting in a hang during boot.
This has been observed on the Genio 700 EVK, Genio 510 EVK and
MT8188-Geralt-Ciri Chromebook, all of which are based on the MT8188 SoC.
Assign CLK_TOP_APLL1_D4 as the parent for CLK_TOP_A1SYS_HP, which is
enabled during register read and write, to make sure the apll1 is
enabled during register operations and prevent the MT8188 machines from
hanging during boot.
Cc: stable(a)vger.kernel.org
Fixes: 4dbec3a59a71 ("arm64: dts: mediatek: mt8188: Add audio support")
Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Changes in v2:
- Changed patch from explicitly enabling apll1 clock in the driver to
assigning apll1_d4 as the parent for the a1sys_hp clock in the
mt8188.dtsi
- Link to v1: https://lore.kernel.org/r/20241203-mt8188-afe-fix-hang-disabled-apll1-clk-v…
---
arch/arm64/boot/dts/mediatek/mt8188.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8188.dtsi b/arch/arm64/boot/dts/mediatek/mt8188.dtsi
index 5d78f51c6183c15018986df2c76e6fdc1f9f43b4..6352c9bd436550dce66435f23653ebcb43ccf0cd 100644
--- a/arch/arm64/boot/dts/mediatek/mt8188.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8188.dtsi
@@ -1392,7 +1392,7 @@ afe: audio-controller@10b10000 {
compatible = "mediatek,mt8188-afe";
reg = <0 0x10b10000 0 0x10000>;
assigned-clocks = <&topckgen CLK_TOP_A1SYS_HP>;
- assigned-clock-parents = <&clk26m>;
+ assigned-clock-parents = <&topckgen CLK_TOP_APLL1_D4>;
clocks = <&clk26m>,
<&apmixedsys CLK_APMIXED_APLL1>,
<&apmixedsys CLK_APMIXED_APLL2>,
---
base-commit: ed58d103e6da15a442ff87567898768dc3a66987
change-id: 20241203-mt8188-afe-fix-hang-disabled-apll1-clk-b3c11782cbaf
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Set the buffer type to IGC_TX_BUFFER_TYPE_SKB for empty frame in the
igc_init_empty_frame function. This ensures that the buffer type is
correctly identified and handled during Tx ring cleanup.
Fixes: db0b124f02ba ("igc: Enhance Qbv scheduling by using first flag bit")
Cc: stable(a)vger.kernel.org # 6.2+
Signed-off-by: Song Yoong Siang <yoong.siang.song(a)intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 56a35d58e7a6..7daa7f8f81ca 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1096,6 +1096,7 @@ static int igc_init_empty_frame(struct igc_ring *ring,
return -ENOMEM;
}
+ buffer->type = IGC_TX_BUFFER_TYPE_SKB;
buffer->skb = skb;
buffer->protocol = 0;
buffer->bytecount = skb->len;
--
2.34.1
'vfio_device' keeps the ->kvm pointer with an elevated reference count from
the first open of the device up until the last close(). So the kvm struct
and its dependencies (kvm kthreads, cgroups ...) are kept alive even for
VFIO devices that don't need ->kvm.
Copy the ->kvm pointer from the vfio_device struct and store it in the
'intel_vgpu'. Note that kvm_page_track_[un]register_notifier() already
does get/put calls, keeping the kvm struct alive.
This will allow releasing ->kvm from the vfio_device right after the
first open call, so that devices not using kvm do not keep it alive.
Devices that are using kvm (like intel_vgpu) will be expected to manage
the lifetime of the kvm struct themselves.
Fixes: 2b48f52f2bff ("vfio: fix deadlock between group lock and kvm lock")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrey Ryabinin <arbn(a)yandex-team.com>
---
drivers/gpu/drm/i915/gvt/gvt.h | 1 +
drivers/gpu/drm/i915/gvt/kvmgt.c | 14 +++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2c95aeef4e41..6c62467df22c 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -232,6 +232,7 @@ struct intel_vgpu {
unsigned long nr_cache_entries;
struct mutex cache_lock;
+ struct kvm *kvm;
struct kvm_page_track_notifier_node track_node;
#define NR_BKT (1 << 18)
struct hlist_head ptable[NR_BKT];
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b27ff77bfb50..cf418e2c560d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -36,6 +36,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/kthread.h>
+#include <linux/kvm_host.h>
#include <linux/sched/mm.h>
#include <linux/types.h>
#include <linux/list.h>
@@ -649,7 +650,7 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, itr->status))
continue;
- if (vgpu->vfio_device.kvm == itr->vfio_device.kvm) {
+ if (vgpu->kvm == itr->kvm) {
ret = true;
goto out;
}
@@ -664,13 +665,13 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
int ret;
+ vgpu->kvm = vgpu->vfio_device.kvm;
if (__kvmgt_vgpu_exist(vgpu))
return -EEXIST;
vgpu->track_node.track_write = kvmgt_page_track_write;
vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
- ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
- &vgpu->track_node);
+ ret = kvm_page_track_register_notifier(vgpu->kvm, &vgpu->track_node);
if (ret) {
gvt_vgpu_err("KVM is required to use Intel vGPU\n");
return ret;
@@ -707,8 +708,7 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
debugfs_lookup_and_remove(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
- kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
- &vgpu->track_node);
+ kvm_page_track_unregister_notifier(vgpu->kvm, &vgpu->track_node);
kvmgt_protect_table_destroy(vgpu);
gvt_cache_destroy(vgpu);
@@ -1560,7 +1560,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
if (kvmgt_gfn_is_write_protected(info, gfn))
return 0;
- r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+ r = kvm_write_track_add_gfn(info->kvm, gfn);
if (r)
return r;
@@ -1578,7 +1578,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
if (!kvmgt_gfn_is_write_protected(info, gfn))
return 0;
- r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+ r = kvm_write_track_remove_gfn(info->kvm, gfn);
if (r)
return r;
--
2.45.3
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e52e97f09fb66fd868260d05bd6b74a9a3db39ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021110-demeaning-mushroom-9922@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e52e97f09fb66fd868260d05bd6b74a9a3db39ee Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Thu, 30 Jan 2025 13:15:00 +0100
Subject: [PATCH] statmount: let unset strings be empty
Just like it's normal for unset values to be zero, unset strings should be
empty instead of containing random values.
It seems to be a typical mistake that the mask returned by statmount is not
checked, which can result in various bugs.
With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).
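For context, a minimal sketch of the userspace pattern this protects,
assuming the uapi 'struct statmount' layout with a 'mask' field,
per-string offsets such as 'fs_type', and a trailing 'str[]' buffer
(treat the field names as my reading of the API, not an authoritative
reference):

#include <linux/mount.h>

/* Returns the filesystem type string from an already-filled statmount buffer. */
static const char *sm_fs_type(const struct statmount *sm)
{
	/* Robust pattern: only trust the offset if the kernel set the mask bit. */
	if (sm->mask & STATMOUNT_FS_TYPE)
		return sm->str + sm->fs_type;

	/*
	 * Before this fix, a caller that skipped the mask check could read
	 * from a random offset here; with the fix, unset offsets all point
	 * at a reserved empty string, so the failure mode is benign.
	 */
	return "";
}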
Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf…
Fixes: 68385d77c05b ("statmount: simplify string option retrieval")
Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: stable(a)vger.kernel.org # v6.8
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/namespace.c b/fs/namespace.c
index a3ed3f2980cb..9c4d307a82cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5191,39 +5191,45 @@ static int statmount_string(struct kstatmount *s, u64 flag)
size_t kbufsize;
struct seq_file *seq = &s->seq;
struct statmount *sm = &s->sm;
- u32 start = seq->count;
+ u32 start, *offp;
+
+ /* Reserve an empty string at the beginning for any unset offsets */
+ if (!seq->count)
+ seq_putc(seq, 0);
+
+ start = seq->count;
switch (flag) {
case STATMOUNT_FS_TYPE:
- sm->fs_type = start;
+ offp = &sm->fs_type;
ret = statmount_fs_type(s, seq);
break;
case STATMOUNT_MNT_ROOT:
- sm->mnt_root = start;
+ offp = &sm->mnt_root;
ret = statmount_mnt_root(s, seq);
break;
case STATMOUNT_MNT_POINT:
- sm->mnt_point = start;
+ offp = &sm->mnt_point;
ret = statmount_mnt_point(s, seq);
break;
case STATMOUNT_MNT_OPTS:
- sm->mnt_opts = start;
+ offp = &sm->mnt_opts;
ret = statmount_mnt_opts(s, seq);
break;
case STATMOUNT_OPT_ARRAY:
- sm->opt_array = start;
+ offp = &sm->opt_array;
ret = statmount_opt_array(s, seq);
break;
case STATMOUNT_OPT_SEC_ARRAY:
- sm->opt_sec_array = start;
+ offp = &sm->opt_sec_array;
ret = statmount_opt_sec_array(s, seq);
break;
case STATMOUNT_FS_SUBTYPE:
- sm->fs_subtype = start;
+ offp = &sm->fs_subtype;
statmount_fs_subtype(s, seq);
break;
case STATMOUNT_SB_SOURCE:
- sm->sb_source = start;
+ offp = &sm->sb_source;
ret = statmount_sb_source(s, seq);
break;
default:
@@ -5251,6 +5257,7 @@ static int statmount_string(struct kstatmount *s, u64 flag)
seq->buf[seq->count++] = '\0';
sm->mask |= flag;
+ *offp = start;
return 0;
}
From: Saurabh Sengar <ssengar(a)linux.microsoft.com>
On an x86 system under test with 1780 CPUs, topology_span_sane() takes
around 8 seconds cumulatively across all iterations. It is an expensive
operation which sanity-checks the non-NUMA topology masks.
CPU topology is not something that changes very frequently, hence make
this check optional for systems where the topology is trusted and a
faster bootup is needed.
Restrict this to the sched_verbose kernel cmdline option so that systems
that want to avoid this penalty can do so.
Cc: stable(a)vger.kernel.org
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Co-developed-by: Naman Jain <namjain(a)linux.microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
---
Changes since v2:
https://lore.kernel.org/all/1731922777-7121-1-git-send-email-ssengar@linux.…
- Use sched_debug() instead of using sched_debug_verbose
variable directly (addressing Prateek's comment)
Changes since v1:
https://lore.kernel.org/all/1729619853-2597-1-git-send-email-ssengar@linux.…
- Use kernel cmdline param instead of compile time flag.
Adding a link to the other patch which is under review.
https://lore.kernel.org/lkml/20241031200431.182443-1-steve.wahl@hpe.com/
Above patch tries to optimize the topology sanity check, whereas this
patch makes it optional. We believe both patches can coexist, as even
with optimization, there will still be some performance overhead for
this check.
---
kernel/sched/topology.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..b030c1a2121f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2359,6 +2359,13 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
{
int i = cpu + 1;
+ /* Skip the topology sanity check for non-debug, as it is a time-consuming operation */
+ if (!sched_debug()) {
+ pr_info_once("%s: Skipping topology span sanity check. Use `sched_verbose` boot parameter to enable it.\n",
+ __func__);
+ return true;
+ }
+
/* NUMA levels are allowed to overlap */
if (tl->flags & SDTL_OVERLAP)
return true;
base-commit: 00f3246adeeacbda0bd0b303604e46eb59c32e6e
--
2.43.0
These patches fix some issues with the way KVM manages FPSIMD/SVE/SME
state. The series supersedes my earlier attempt at fixing the host SVE
state corruption issue:
https://lore.kernel.org/linux-arm-kernel/20250121100026.3974971-1-mark.rutl…
Patch 1 addresses the host SVE state corruption issue by always saving
and unbinding the host state when loading a vCPU, as discussed on the
earlier patch:
https://lore.kernel.org/linux-arm-kernel/Z4--YuG5SWrP_pW7@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86plkful48.wl-maz@kernel.org/
Patches 2 to 4 remove code made redundant by patch 1. These probably
warrant backporting along with patch 1 as there is some historical
brokenness in the code they remove.
Patches 5 to 7 are preparatory refactoring for patch 8, and are not
intended to have any functional impact.
Patch 8 addresses some mismanagement of ZCR_EL{1,2} which can result in
the host VMM unexpectedly receiving a SIGKILL. To fix this, we eagerly
switch ZCR_EL{1,2} at guest<->host transitions, as discussed on another
series:
https://lore.kernel.org/linux-arm-kernel/Z4pAMaEYvdLpmbg2@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86o6zzukwr.wl-maz@kernel.org/
https://lore.kernel.org/linux-arm-kernel/Z5Dc-WMu2azhTuMn@J2N7QTR9R3/
The end result is that KVM loses ~100 lines of code, and becomes a bit
simpler to reason about.
I've pushed these patches to the arm64-kvm-fpsimd-fixes-20250210 tag on my
kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
The (unstable) arm64/kvm/fpsimd-fixes branch in that repo contains the
fixes plus additional debug patches I've used for testing. I've given
this some basic testing on a virtual platform, booting a host and a
guest with and without constraining the guest's max SVE VL, with:
* kvm_arm.mode=vhe
* kvm_arm.mode=nvhe
* kvm_arm.mode=protected (IIUC this will default to hVHE)
Since v1 [1]:
* Address some additional compiler warnings in patch 7
* Use ZCR_EL1 alias in VHE code
* Fold in Tested-by and Reviewed-by tags
* Fix typos
Since v2 [2]:
* Ensure context synchronization in patch 8
* Fold in Tested-by and Reviewed-by tags
* Fix typos
[1] https://lore.kernel.org/linux-arm-kernel/20250204152100.705610-1-mark.rutla…
[2] https://lore.kernel.org/linux-arm-kernel/20250206141102.954688-1-mark.rutla…
Mark.
Mark Rutland (8):
KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
KVM: arm64: Remove host FPSIMD saving for non-protected KVM
KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
KVM: arm64: Refactor CPTR trap deactivation
KVM: arm64: Refactor exit handlers
KVM: arm64: Mark some header functions as inline
KVM: arm64: Eagerly switch ZCR_EL{1,2}
arch/arm64/include/asm/kvm_emulate.h | 42 --------
arch/arm64/include/asm/kvm_host.h | 22 +---
arch/arm64/kernel/fpsimd.c | 25 -----
arch/arm64/kvm/arm.c | 8 --
arch/arm64/kvm/fpsimd.c | 100 ++----------------
arch/arm64/kvm/hyp/entry.S | 5 +
arch/arm64/kvm/hyp/include/hyp/switch.h | 133 +++++++++++++++++-------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++-
arch/arm64/kvm/hyp/nvhe/switch.c | 91 ++++++++--------
arch/arm64/kvm/hyp/vhe/switch.c | 33 +++---
10 files changed, 187 insertions(+), 287 deletions(-)
--
2.30.2
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x be92ab2de0ee1a13291c3b47b2d7eb24d80c0a2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021058-ruse-paradox-92e6@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be92ab2de0ee1a13291c3b47b2d7eb24d80c0a2c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bence=20Cs=C3=B3k=C3=A1s?= <csokas.bence(a)prolan.hu>
Date: Thu, 19 Dec 2024 10:12:58 +0100
Subject: [PATCH] spi: atmel-qspi: Memory barriers after memory-mapped I/O
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The QSPI peripheral control and status registers are
accessible via the SoC's APB bus, whereas MMIO transactions'
data travels on the AHB bus.
Microchip documentation and even sample code from Atmel
emphasises the need for a memory barrier before the first
MMIO transaction to the AHB-connected QSPI, and before the
last write to its registers via APB. This is achieved by
the following lines in `atmel_qspi_transfer()`:
/* Dummy read of QSPI_IFR to synchronize APB and AHB accesses */
(void)atmel_qspi_read(aq, QSPI_IFR);
However, the current documentation makes no mention to
synchronization requirements in the other direction, i.e.
after the last data written via AHB, and before the first
register access on APB.
In our case, we were facing an issue where the QSPI peripheral
would cease to send any new CSR (nCS Rise) interrupts,
leading to a timeout in `atmel_qspi_wait_for_completion()`
and ultimately this panic in higher levels:
ubi0 error: ubi_io_write: error -110 while writing 63108 bytes
to PEB 491:128, written 63104 bytes
After months of extensive research of the codebase, fiddling
around the debugger with kgdb, and back-and-forth with
Microchip, we came to the conclusion that the issue is
probably that the peripheral is still busy receiving on AHB
when the LASTXFER bit is written to its Control Register
on APB, therefore this write gets lost, and the peripheral
still thinks there is more data to come in the MMIO transfer.
This was first formulated when we noticed that doubling the
write() of QSPI_CR_LASTXFER seemed to solve the problem.
Ultimately, the solution is to introduce memory barriers
after the AHB-mapped MMIO transfers, to ensure ordering.
Fixes: d5433def3153 ("mtd: spi-nor: atmel-quadspi: Add spi-mem support to atmel-quadspi")
Cc: Hari.PrasathGE(a)microchip.com
Cc: Mahesh.Abotula(a)microchip.com
Cc: Marco.Cardellini(a)microchip.com
Cc: stable(a)vger.kernel.org # c0a0203cf579: ("spi: atmel-quadspi: Create `atmel_qspi_ops`"...)
Cc: stable(a)vger.kernel.org # 6.x.y
Signed-off-by: Bence Csókás <csokas.bence(a)prolan.hu>
Link: https://patch.msgid.link/20241219091258.395187-1-csokas.bence@prolan.hu
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/atmel-quadspi.c b/drivers/spi/atmel-quadspi.c
index f46da363574f..8fdc9d27a95e 100644
--- a/drivers/spi/atmel-quadspi.c
+++ b/drivers/spi/atmel-quadspi.c
@@ -661,13 +661,20 @@ static int atmel_qspi_transfer(struct spi_mem *mem,
(void)atmel_qspi_read(aq, QSPI_IFR);
/* Send/Receive data */
- if (op->data.dir == SPI_MEM_DATA_IN)
+ if (op->data.dir == SPI_MEM_DATA_IN) {
memcpy_fromio(op->data.buf.in, aq->mem + offset,
op->data.nbytes);
- else
+
+ /* Synchronize AHB and APB accesses again */
+ rmb();
+ } else {
memcpy_toio(aq->mem + offset, op->data.buf.out,
op->data.nbytes);
+ /* Synchronize AHB and APB accesses again */
+ wmb();
+ }
+
/* Release the chip-select */
atmel_qspi_write(QSPI_CR_LASTXFER, aq, QSPI_CR);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x e966eae72762ecfdbdb82627e2cda48845b9dd66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021134-kissing-enjoyer-5d7e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e966eae72762ecfdbdb82627e2cda48845b9dd66 Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 10 Jan 2025 13:42:39 +0000
Subject: [PATCH] misc: fastrpc: Fix copy buffer page size
For a non-registered buffer, the fastrpc driver copies the buffer and
passes it to the remote subsystem. The current page size calculation
does not take the offset into account, which can lead to an improper,
out-of-bounds page size being passed and could result in memory
issues. Calculate the page start and page end using the offset-adjusted
address instead of the absolute address.
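As an illustration of the page-span arithmetic the fix relies on, here
is a small standalone sketch with hypothetical values (PAGE_SIZE of
4096 assumed; not driver code):

        #include <stdio.h>

        #define PAGE_SHIFT 12
        #define PAGE_SIZE  (1UL << PAGE_SHIFT)
        #define PAGE_MASK  (~(PAGE_SIZE - 1))

        int main(void)
        {
                /* buffer starts 0x800 bytes into a page, 0x1000 bytes long */
                unsigned long addr = 0x1800, len = 0x1000;
                unsigned long pg_start = (addr & PAGE_MASK) >> PAGE_SHIFT;
                unsigned long pg_end = ((addr + len - 1) & PAGE_MASK) >> PAGE_SHIFT;

                /* spans two pages -> prints 8192; computing from an address
                 * that lacks the in-page offset can misstate this span
                 */
                printf("%lu\n", (pg_end - pg_start + 1) * PAGE_SIZE);
                return 0;
        }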
Fixes: 02b45b47fbe8 ("misc: fastrpc: fix remote page size calculation")
Cc: stable(a)kernel.org
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20250110134239.123603-4-srinivas.kandagatla@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 56dc3b3a8940..7b7a22c91fe4 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -1019,8 +1019,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
(pkt_size - rlen);
pages[i].addr = pages[i].addr & PAGE_MASK;
- pg_start = (args & PAGE_MASK) >> PAGE_SHIFT;
- pg_end = ((args + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
+ pg_start = (rpra[i].buf.pv & PAGE_MASK) >> PAGE_SHIFT;
+ pg_end = ((rpra[i].buf.pv + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
pages[i].size = (pg_end - pg_start + 1) * PAGE_SIZE;
args = args + mlen;
rlen -= mlen;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e966eae72762ecfdbdb82627e2cda48845b9dd66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021134-attendant-greedless-c5c8@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e966eae72762ecfdbdb82627e2cda48845b9dd66 Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 10 Jan 2025 13:42:39 +0000
Subject: [PATCH] misc: fastrpc: Fix copy buffer page size
For a non-registered buffer, the fastrpc driver copies the buffer and
passes it to the remote subsystem. The current page size calculation
does not take the offset into account, which can lead to an improper,
out-of-bounds page size being passed and could result in memory
issues. Calculate the page start and page end using the offset-adjusted
address instead of the absolute address.
Fixes: 02b45b47fbe8 ("misc: fastrpc: fix remote page size calculation")
Cc: stable(a)kernel.org
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20250110134239.123603-4-srinivas.kandagatla@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 56dc3b3a8940..7b7a22c91fe4 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -1019,8 +1019,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
(pkt_size - rlen);
pages[i].addr = pages[i].addr & PAGE_MASK;
- pg_start = (args & PAGE_MASK) >> PAGE_SHIFT;
- pg_end = ((args + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
+ pg_start = (rpra[i].buf.pv & PAGE_MASK) >> PAGE_SHIFT;
+ pg_end = ((rpra[i].buf.pv + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
pages[i].size = (pg_end - pg_start + 1) * PAGE_SIZE;
args = args + mlen;
rlen -= mlen;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e966eae72762ecfdbdb82627e2cda48845b9dd66
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021139-bounce-growl-6d4e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e966eae72762ecfdbdb82627e2cda48845b9dd66 Mon Sep 17 00:00:00 2001
From: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Date: Fri, 10 Jan 2025 13:42:39 +0000
Subject: [PATCH] misc: fastrpc: Fix copy buffer page size
For a non-registered buffer, the fastrpc driver copies the buffer and
passes it to the remote subsystem. The current page size calculation
does not take the offset into account, which can lead to an improper,
out-of-bounds page size being passed and could result in memory
issues. Calculate the page start and page end using the offset-adjusted
address instead of the absolute address.
Fixes: 02b45b47fbe8 ("misc: fastrpc: fix remote page size calculation")
Cc: stable(a)kernel.org
Signed-off-by: Ekansh Gupta <quic_ekangupt(a)quicinc.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20250110134239.123603-4-srinivas.kandagatla@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 56dc3b3a8940..7b7a22c91fe4 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -1019,8 +1019,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
(pkt_size - rlen);
pages[i].addr = pages[i].addr & PAGE_MASK;
- pg_start = (args & PAGE_MASK) >> PAGE_SHIFT;
- pg_end = ((args + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
+ pg_start = (rpra[i].buf.pv & PAGE_MASK) >> PAGE_SHIFT;
+ pg_end = ((rpra[i].buf.pv + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
pages[i].size = (pg_end - pg_start + 1) * PAGE_SIZE;
args = args + mlen;
rlen -= mlen;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 391b06ecb63e6eacd054582cb4eb738dfbf5eb77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021141-negotiate-many-f58a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 391b06ecb63e6eacd054582cb4eb738dfbf5eb77 Mon Sep 17 00:00:00 2001
From: Sascha Hauer <s.hauer(a)pengutronix.de>
Date: Mon, 30 Dec 2024 14:18:58 +0000
Subject: [PATCH] nvmem: imx-ocotp-ele: fix MAC address byte order
According to the i.MX93 Fusemap the two MAC addresses are stored in
words 315 to 317 like this:
315 MAC1_ADDR_31_0[31:0]
316 MAC1_ADDR_47_32[47:32]
MAC2_ADDR_15_0[15:0]
317 MAC2_ADDR_47_16[31:0]
This means the MAC addresses are stored in reverse byte order. We have
to swap the bytes before passing them to the upper layers. The storage
format is consistent with the one used on the i.MX6 with the imx-ocotp
driver, which does the same byte swapping as introduced here.
With this patch the MAC address on my i.MX93 TQ board correctly reads as
00:d0:93:6b:27:b8 instead of b8:27:6b:93:d0:00.
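For reference, a minimal standalone sketch of the byte reversal that
the read_post_process hook in the diff below performs, applied to the
example MAC above (not driver code):

        #include <stdio.h>

        int main(void)
        {
                /* value as read from the fuses, i.e. reversed byte order */
                unsigned char buf[6] = { 0xb8, 0x27, 0x6b, 0x93, 0xd0, 0x00 };
                size_t bytes = sizeof(buf), i;

                for (i = 0; i < bytes / 2; i++) {
                        unsigned char tmp = buf[i];

                        buf[i] = buf[bytes - i - 1];
                        buf[bytes - i - 1] = tmp;
                }

                /* prints 00:d0:93:6b:27:b8 */
                printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
                       buf[0], buf[1], buf[2], buf[3], buf[4], buf[5]);
                return 0;
        }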
Fixes: 22e9e6fcfb50 ("nvmem: imx: support i.MX93 OCOTP")
Signed-off-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: stable <stable(a)kernel.org>
Reviewed-by: Peng Fan <peng.fan(a)nxp.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20241230141901.263976-4-srinivas.kandagatla@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/nvmem/imx-ocotp-ele.c b/drivers/nvmem/imx-ocotp-ele.c
index b2d21a5f77bc..422a6d53b10e 100644
--- a/drivers/nvmem/imx-ocotp-ele.c
+++ b/drivers/nvmem/imx-ocotp-ele.c
@@ -111,6 +111,26 @@ static int imx_ocotp_reg_read(void *context, unsigned int offset, void *val, siz
return 0;
};
+static int imx_ocotp_cell_pp(void *context, const char *id, int index,
+ unsigned int offset, void *data, size_t bytes)
+{
+ u8 *buf = data;
+ int i;
+
+ /* Deal with some post processing of nvmem cell data */
+ if (id && !strcmp(id, "mac-address"))
+ for (i = 0; i < bytes / 2; i++)
+ swap(buf[i], buf[bytes - i - 1]);
+
+ return 0;
+}
+
+static void imx_ocotp_fixup_dt_cell_info(struct nvmem_device *nvmem,
+ struct nvmem_cell_info *cell)
+{
+ cell->read_post_process = imx_ocotp_cell_pp;
+}
+
static int imx_ele_ocotp_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -137,6 +157,8 @@ static int imx_ele_ocotp_probe(struct platform_device *pdev)
priv->config.stride = 1;
priv->config.priv = priv;
priv->config.read_only = true;
+ priv->config.add_legacy_fixed_of_cells = true;
+ priv->config.fixup_dt_cell_info = imx_ocotp_fixup_dt_cell_info;
mutex_init(&priv->lock);
nvmem = devm_nvmem_register(dev, &priv->config);
In theory overlayfs could support an upper layer directly referring to a
data layer, but there's no current use case for this.
Originally, when data-only layers were introduced, this wasn't allowed;
it was only made possible by the "datadir+" feature, but without
actually handling this case, resulting in an Oops.
Fix by disallowing datadir without lowerdir.
Reported-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Fixes: 24e16e385f22 ("ovl: add support for appending lowerdirs one by one")
Cc: <stable(a)vger.kernel.org> # v6.7
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
---
fs/overlayfs/super.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 86ae6f6da36b..b11094acdd8f 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1137,6 +1137,11 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
return ERR_PTR(-EINVAL);
}
+ if (ctx->nr == ctx->nr_data) {
+ pr_err("at least one non-data lowerdir is required\n");
+ return ERR_PTR(-EINVAL);
+ }
+
err = -EINVAL;
for (i = 0; i < ctx->nr; i++) {
l = &ctx->lower[i];
--
2.48.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x abb604a1a9c87255c7a6f3b784410a9707baf467
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021152-overdrive-premiere-cca5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From abb604a1a9c87255c7a6f3b784410a9707baf467 Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih(a)nvidia.com>
Date: Sun, 19 Jan 2025 14:38:25 +0200
Subject: [PATCH] RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with
error
This patch addresses a race condition for an ODP MR that can result in a
CQE with an error on the UMR QP.
During the __mlx5_ib_dereg_mr() flow, the following sequence of calls
occurs:
mlx5_revoke_mr()
mlx5r_umr_revoke_mr()
mlx5r_umr_post_send_wait()
At this point, the lkey is freed from the hardware's perspective.
However, concurrently, mlx5_ib_invalidate_range() might be triggered by
another task attempting to invalidate a range for the same freed lkey.
This task will:
- Acquire the umem_odp->umem_mutex lock.
- Call mlx5r_umr_update_xlt() on the UMR QP.
- Since the lkey has already been freed, this can lead to a CQE error,
causing the UMR QP to enter an error state [1].
To resolve this race condition, the umem_odp->umem_mutex lock is now also
acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke,
we set umem_odp->private which points to that MR to NULL, preventing any
further invalidation attempts on its lkey.
[1] From dmesg:
infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2
WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
[..]
Call Trace:
<TASK>
mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
__mmu_notifier_invalidate_range_start+0x1e1/0x240
zap_page_range_single+0xf1/0x1a0
madvise_vma_behavior+0x677/0x6e0
do_madvise+0x1a2/0x4b0
__x64_sys_madvise+0x25/0x30
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290…
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Reviewed-by: Artemy Kovalyov <artemyko(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 45d9dc9c6c8f..bb02b6adbf2c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -2021,6 +2021,11 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
+ bool is_odp = is_odp_mr(mr);
+ int ret = 0;
+
+ if (is_odp)
+ mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr)) {
ent = mr->mmkey.cache_ent;
@@ -2032,7 +2037,7 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
ent->tmp_cleanup_scheduled = true;
}
spin_unlock_irq(&ent->mkeys_queue.lock);
- return 0;
+ goto out;
}
if (ent) {
@@ -2041,7 +2046,15 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
mr->mmkey.cache_ent = NULL;
spin_unlock_irq(&ent->mkeys_queue.lock);
}
- return destroy_mkey(dev, mr);
+ ret = destroy_mkey(dev, mr);
+out:
+ if (is_odp) {
+ if (!ret)
+ to_ib_umem_odp(mr->umem)->private = NULL;
+ mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex);
+ }
+
+ return ret;
}
static int __mlx5_ib_dereg_mr(struct ib_mr *ibmr)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index f2eb940bddc8..f655859eec00 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -268,6 +268,8 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni,
if (!umem_odp->npages)
goto out;
mr = umem_odp->private;
+ if (!mr)
+ goto out;
start = max_t(u64, ib_umem_start(umem_odp), range->start);
end = min_t(u64, ib_umem_end(umem_odp), range->end);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x abb604a1a9c87255c7a6f3b784410a9707baf467
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021150-recent-yen-89d6@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From abb604a1a9c87255c7a6f3b784410a9707baf467 Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih(a)nvidia.com>
Date: Sun, 19 Jan 2025 14:38:25 +0200
Subject: [PATCH] RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with
error
This patch addresses a race condition for an ODP MR that can result in a
CQE with an error on the UMR QP.
During the __mlx5_ib_dereg_mr() flow, the following sequence of calls
occurs:
mlx5_revoke_mr()
mlx5r_umr_revoke_mr()
mlx5r_umr_post_send_wait()
At this point, the lkey is freed from the hardware's perspective.
However, concurrently, mlx5_ib_invalidate_range() might be triggered by
another task attempting to invalidate a range for the same freed lkey.
This task will:
- Acquire the umem_odp->umem_mutex lock.
- Call mlx5r_umr_update_xlt() on the UMR QP.
- Since the lkey has already been freed, this can lead to a CQE error,
causing the UMR QP to enter an error state [1].
To resolve this race condition, the umem_odp->umem_mutex lock is now also
acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke,
we set umem_odp->private which points to that MR to NULL, preventing any
further invalidation attempts on its lkey.
[1] From dmesg:
infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2
WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
[..]
Call Trace:
<TASK>
mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
__mmu_notifier_invalidate_range_start+0x1e1/0x240
zap_page_range_single+0xf1/0x1a0
madvise_vma_behavior+0x677/0x6e0
do_madvise+0x1a2/0x4b0
__x64_sys_madvise+0x25/0x30
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290…
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Reviewed-by: Artemy Kovalyov <artemyko(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 45d9dc9c6c8f..bb02b6adbf2c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -2021,6 +2021,11 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
+ bool is_odp = is_odp_mr(mr);
+ int ret = 0;
+
+ if (is_odp)
+ mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr)) {
ent = mr->mmkey.cache_ent;
@@ -2032,7 +2037,7 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
ent->tmp_cleanup_scheduled = true;
}
spin_unlock_irq(&ent->mkeys_queue.lock);
- return 0;
+ goto out;
}
if (ent) {
@@ -2041,7 +2046,15 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
mr->mmkey.cache_ent = NULL;
spin_unlock_irq(&ent->mkeys_queue.lock);
}
- return destroy_mkey(dev, mr);
+ ret = destroy_mkey(dev, mr);
+out:
+ if (is_odp) {
+ if (!ret)
+ to_ib_umem_odp(mr->umem)->private = NULL;
+ mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex);
+ }
+
+ return ret;
}
static int __mlx5_ib_dereg_mr(struct ib_mr *ibmr)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index f2eb940bddc8..f655859eec00 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -268,6 +268,8 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni,
if (!umem_odp->npages)
goto out;
mr = umem_odp->private;
+ if (!mr)
+ goto out;
start = max_t(u64, ib_umem_start(umem_odp), range->start);
end = min_t(u64, ib_umem_end(umem_odp), range->end);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x abb604a1a9c87255c7a6f3b784410a9707baf467
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021149-mandarin-rethink-1770@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From abb604a1a9c87255c7a6f3b784410a9707baf467 Mon Sep 17 00:00:00 2001
From: Yishai Hadas <yishaih(a)nvidia.com>
Date: Sun, 19 Jan 2025 14:38:25 +0200
Subject: [PATCH] RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with
error
This patch addresses a race condition for an ODP MR that can result in a
CQE with an error on the UMR QP.
During the __mlx5_ib_dereg_mr() flow, the following sequence of calls
occurs:
mlx5_revoke_mr()
mlx5r_umr_revoke_mr()
mlx5r_umr_post_send_wait()
At this point, the lkey is freed from the hardware's perspective.
However, concurrently, mlx5_ib_invalidate_range() might be triggered by
another task attempting to invalidate a range for the same freed lkey.
This task will:
- Acquire the umem_odp->umem_mutex lock.
- Call mlx5r_umr_update_xlt() on the UMR QP.
- Since the lkey has already been freed, this can lead to a CQE error,
causing the UMR QP to enter an error state [1].
To resolve this race condition, the umem_odp->umem_mutex lock is now also
acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke,
we set umem_odp->private which points to that MR to NULL, preventing any
further invalidation attempts on its lkey.
[1] From dmesg:
infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2
WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
[..]
Call Trace:
<TASK>
mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
__mmu_notifier_invalidate_range_start+0x1e1/0x240
zap_page_range_single+0xf1/0x1a0
madvise_vma_behavior+0x677/0x6e0
do_madvise+0x1a2/0x4b0
__x64_sys_madvise+0x25/0x30
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/r/68a1e007c25b2b8fe5d625f238cc3b63e5341f77.1737290…
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Reviewed-by: Artemy Kovalyov <artemyko(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 45d9dc9c6c8f..bb02b6adbf2c 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -2021,6 +2021,11 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
{
struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
struct mlx5_cache_ent *ent = mr->mmkey.cache_ent;
+ bool is_odp = is_odp_mr(mr);
+ int ret = 0;
+
+ if (is_odp)
+ mutex_lock(&to_ib_umem_odp(mr->umem)->umem_mutex);
if (mr->mmkey.cacheable && !mlx5r_umr_revoke_mr(mr) && !cache_ent_find_and_store(dev, mr)) {
ent = mr->mmkey.cache_ent;
@@ -2032,7 +2037,7 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
ent->tmp_cleanup_scheduled = true;
}
spin_unlock_irq(&ent->mkeys_queue.lock);
- return 0;
+ goto out;
}
if (ent) {
@@ -2041,7 +2046,15 @@ static int mlx5_revoke_mr(struct mlx5_ib_mr *mr)
mr->mmkey.cache_ent = NULL;
spin_unlock_irq(&ent->mkeys_queue.lock);
}
- return destroy_mkey(dev, mr);
+ ret = destroy_mkey(dev, mr);
+out:
+ if (is_odp) {
+ if (!ret)
+ to_ib_umem_odp(mr->umem)->private = NULL;
+ mutex_unlock(&to_ib_umem_odp(mr->umem)->umem_mutex);
+ }
+
+ return ret;
}
static int __mlx5_ib_dereg_mr(struct ib_mr *ibmr)
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index f2eb940bddc8..f655859eec00 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -268,6 +268,8 @@ static bool mlx5_ib_invalidate_range(struct mmu_interval_notifier *mni,
if (!umem_odp->npages)
goto out;
mr = umem_odp->private;
+ if (!mr)
+ goto out;
start = max_t(u64, ib_umem_start(umem_odp), range->start);
end = min_t(u64, ib_umem_end(umem_odp), range->end);
From: Andy Strohman <andrew(a)andrewstrohman.com>
Reference counting is used to ensure that batadv_hardif_neigh_node and
batadv_hard_iface are not freed before/during the
batadv_v_elp_throughput_metric_update work is finished.
But there isn't a guarantee that the hard interface will remain
associated with a soft interface up until the work is finished.
This fixes a crash triggered by reboot that looks like this:
Call trace:
batadv_v_mesh_free+0xd0/0x4dc [batman_adv]
batadv_v_elp_throughput_metric_update+0x1c/0xa4
process_one_work+0x178/0x398
worker_thread+0x2e8/0x4d0
kthread+0xd8/0xdc
ret_from_fork+0x10/0x20
(the batadv_v_mesh_free call is misleading and does not actually
happen)
I was able to make the issue happen more reliably by changing
hardif_neigh->bat_v.metric_work to be delayed work. This allowed me to
track down and confirm the fix.
Cc: stable(a)vger.kernel.org
Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput")
Signed-off-by: Andy Strohman <andrew(a)andrewstrohman.com>
[sven(a)narfation.org: prevent entering batadv_v_elp_get_throughput without
soft_iface]
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/bat_v_elp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 1d704574e6bf..fbf499bcc671 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -66,12 +66,19 @@ static void batadv_v_elp_start_timer(struct batadv_hard_iface *hard_iface)
static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
{
struct batadv_hard_iface *hard_iface = neigh->if_incoming;
+ struct net_device *soft_iface = hard_iface->soft_iface;
struct ethtool_link_ksettings link_settings;
struct net_device *real_netdev;
struct station_info sinfo;
u32 throughput;
int ret;
+ /* don't query throughput when no longer associated with any
+ * batman-adv interface
+ */
+ if (!soft_iface)
+ return BATADV_THROUGHPUT_DEFAULT_VALUE;
+
/* if the user specified a customised value for this interface, then
* return it directly
*/
@@ -141,7 +148,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
default_throughput:
if (!(hard_iface->bat_v.flags & BATADV_WARNING_DEFAULT)) {
- batadv_info(hard_iface->soft_iface,
+ batadv_info(soft_iface,
"WiFi driver or ethtool info does not provide information about link speeds on interface %s, therefore defaulting to hardcoded throughput values of %u.%1u Mbps. Consider overriding the throughput manually or checking your driver.\n",
hard_iface->net_dev->name,
BATADV_THROUGHPUT_DEFAULT_VALUE / 10,
--
2.39.5
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d63b0e8a628e62ca85a0f7915230186bb92f8bb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021131-mushy-affecting-a576@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d63b0e8a628e62ca85a0f7915230186bb92f8bb4 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Tue, 28 Jan 2025 00:55:24 +0000
Subject: [PATCH] io_uring: fix multishots with selected buffers
We do io_kbuf_recycle() when arming a poll but every iteration of a
multishot can grab more buffers, which is why we need to flush the kbuf
ring state before continuing with waiting.
Cc: stable(a)vger.kernel.org
Fixes: b3fdea6ecb55c ("io_uring: multishot recv")
Reported-by: Muhammad Ramdhan <ramdhan(a)starlabs.sg>
Reported-by: Bing-Jhong Billy Jheng <billy(a)starlabs.sg>
Reported-by: Jacob Soo <jacob.soo(a)starlabs.sg>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/1bfc9990fe435f1fc6152ca9efeba5eb3e68339c.17380255…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 356474c66f32..31b118133bb0 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -315,8 +315,10 @@ void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
ret = io_poll_check_events(req, ts);
if (ret == IOU_POLL_NO_ACTION) {
+ io_kbuf_recycle(req, 0);
return;
} else if (ret == IOU_POLL_REQUEUE) {
+ io_kbuf_recycle(req, 0);
__io_poll_execute(req, 0);
return;
}
Some error handling issues I noticed while looking at the code.
Only compile-tested.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Fix typo in commit message: vmclock_ptp_register()
- Retarget against net tree
- Include patch from David. It's at the start of the series so that all
bugfixes are at the beginning and the logical ordering of my patches
is not disrupted.
- Pick up tags from LKML
- Link to v1: https://lore.kernel.org/r/20250206-vmclock-probe-v1-0-17a3ea07be34@linutron…
---
David Woodhouse (1):
ptp: vmclock: Add .owner to vmclock_miscdev_fops
Thomas Weißschuh (4):
ptp: vmclock: Set driver data before its usage
ptp: vmclock: Don't unregister misc device if it was not registered
ptp: vmclock: Clean up miscdev and ptp clock through devres
ptp: vmclock: Remove goto-based cleanup logic
drivers/ptp/ptp_vmclock.c | 47 +++++++++++++++++++++--------------------------
1 file changed, 21 insertions(+), 26 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250206-vmclock-probe-57cbcb770925
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This driver supports page faults on PCI RID since commit <9f831c16c69e>
("iommu/vt-d: Remove the pasid present check in prq_event_thread"), by
allowing page faults with the pasid_present field cleared to be reported
to the upper layer for further handling. The fundamental assumption here
is that the detach or replace operations act as a fence for page faults.
This implies that all pending page faults associated with a specific RID
or PASID are flushed when a domain is detached or replaced from a device
RID or PASID.
However, the intel_iommu_drain_pasid_prq() helper does not correctly
handle faults for RID. This leads to faults potentially remaining pending
in the iommu hardware queue even after the domain is detached, thereby
violating the aforementioned assumption.
Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults
for RID.
Fixes: 9f831c16c69e ("iommu/vt-d: Remove the pasid present check in prq_event_thread")
Cc: stable(a)vger.kernel.org
Suggested-by: Kevin Tian <kevin.tian(a)intel.com>
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
---
drivers/iommu/intel/prq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Change log:
v2:
- Add check on page faults targeting RID.
v1: https://lore.kernel.org/linux-iommu/20250120080144.810455-1-baolu.lu@linux.…
diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
index c2d792db52c3..064194399b38 100644
--- a/drivers/iommu/intel/prq.c
+++ b/drivers/iommu/intel/prq.c
@@ -87,7 +87,9 @@ void intel_iommu_drain_pasid_prq(struct device *dev, u32 pasid)
struct page_req_dsc *req;
req = &iommu->prq[head / sizeof(*req)];
- if (!req->pasid_present || req->pasid != pasid) {
+ if (req->rid != sid ||
+ (req->pasid_present && pasid != req->pasid) ||
+ (!req->pasid_present && pasid != IOMMU_NO_PASID)) {
head = (head + sizeof(*req)) & PRQ_RING_MASK;
continue;
}
--
2.43.0
Fixes a use-after-free and a struct without an initialiser
Attempt 2, since apparently I messed up the command the first time
Stuart Hayhurst (2):
HID: corsair-void: Add missing delayed work cancel for headset status
HID: corsair-void: Initialise memory for psy_cfg
drivers/hid/hid-corsair-void.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--
2.47.1
Patches 1 and 2 of this series fix the issue reported by Hsin-Te Yuan
[1] where MT8192-based Chromebooks are not able to suspend/resume 10
times in a row. Either one of those patches on its own is enough to fix
the issue, but I believe both are desirable, so I've included them both
here.
Patches 3-5 fix unrelated issues that I've noticed while debugging.
Patch 3 fixes IRQ storms when the temperature sensors drop to 20
Celsius. Patches 4 and 5 are cleanups to prevent future issues.
To test this series, I've run 'rtcwake -m mem -d 60' 10 times in a row
on a MT8192-Asurada-Spherion-rev3 Chromebook and checked that the wakeup
happened 60 seconds later (+-5 seconds). I've repeated that test on 10
separate runs. Not once did the chromebook wake up early with the series
applied.
I've also checked that during those runs, the LVTS interrupt didn't
trigger even once, while before the series it would trigger a few times
per run, generally during boot or resume.
Finally, as a sanity check I've verified that the interrupts still work
by lowering the thermal trip point to 45 Celsius and running 'stress -c
8'. Indeed they still do, and the temperature showed by the
thermal_temperature ftrace event matched the expected value.
[1] https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Changes in v2:
- Renamed bitmasks for interrupt enable (added "INTEN" to the name)
- Made read-only arrays static const
- Changed sensor_filt_bitmap array from u32 to u8 to save memory
- Rebased on next-20241209
- Link to v1: https://lore.kernel.org/r/20241125-mt8192-lvts-filtered-suspend-fix-v1-0-42…
---
Nícolas F. R. A. Prado (5):
thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold
thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors
drivers/thermal/mediatek/lvts_thermal.c | 103 ++++++++++++++++++++++----------
1 file changed, 72 insertions(+), 31 deletions(-)
---
base-commit: d1486dca38afd08ca279ae94eb3a397f10737824
change-id: 20241121-mt8192-lvts-filtered-suspend-fix-a5032ca8eceb
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
I'm announcing the release of the 6.6.77 kernel.
Only users of the um platform need to upgrade, this fixes a build error
in 6.6.76 with that platform. All other users can ignore this release.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
fs/hostfs/hostfs_kern.c | 157 +++++++++++++-----------------------------------
2 files changed, 44 insertions(+), 115 deletions(-)
Greg Kroah-Hartman (5):
Revert "hostfs: fix the host directory parse when mounting."
Revert "hostfs: Add const qualifier to host_root in hostfs_fill_super()"
Revert "hostfs: fix string handling in __dentry_name()"
Revert "hostfs: convert hostfs to use the new mount API"
Linux 6.6.77
On 2/11/2025 11:28 AM, Greg KH wrote:
> On Mon, Feb 10, 2025 at 06:41:23PM +0530, Selvarasu Ganesan wrote:
>> Fix the following smatch errors:
>>
>> drivers/usb/host/xhci-mem.c:2060 xhci_add_in_port() error: unassigned variable 'tmp_minor_revision'
>> drivers/usb/host/xhci-hub.c:71 xhci_create_usb3x_bos_desc() error: unassigned variable 'bcdUSB'
>>
>> Fixes: d9b0328d0b8b ("xhci: Show ZHAOXIN xHCI root hub speed correctly")
>> Fixes: eb02aaf21f29 ("usb: xhci: Rewrite xhci_create_usb3_bos_desc()")
> This should be two different changes, right?
>
> Please break it up and send as a patch series.
>
> thanks,
>
> greg k-h
Hi Greg,
Thanks for your comments. Sure, I will send it as a patch series.
Thanks,
Selva
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 73b42dc69be8564d4951a14d00f827929fe5ef79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024100123-unreached-enrage-2cb1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
73b42dc69be8 ("KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)")
d33234342f8b ("KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()")
71bf395a276f ("KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits")
4b7c3f6d04bd ("KVM: x86: Make x2APIC ID 100% readonly")
c7d4c5f01961 ("KVM: x86: Drop unused check_apicv_inhibit_reasons() callback definition")
5f18c642ff7e ("KVM: VMX: Move out vmx_x86_ops to 'main.c' to dispatch VMX and TDX")
0ec3d6d1f169 ("KVM: x86: Fully defer to vendor code to decide how to force immediate exit")
bf1a49436ea3 ("KVM: x86: Move handling of is_guest_mode() into fastpath exit handlers")
11776aa0cfa7 ("KVM: VMX: Handle forced exit due to preemption timer in fastpath")
e6b5d16bbd2d ("KVM: VMX: Re-enter guest in fastpath for "spurious" preemption timer exits")
9c9025ea003a ("KVM: x86: Plumb "force_immediate_exit" into kvm_entry() tracepoint")
8ecb10bcbfa3 ("Merge tag 'kvm-x86-lam-6.8' of https://github.com/kvm-x86/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 73b42dc69be8564d4951a14d00f827929fe5ef79 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 19 Jul 2024 16:51:00 -0700
Subject: [PATCH] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
Re-introduce the "split" x2APIC ICR storage that KVM used prior to Intel's
IPI virtualization support, but only for AMD. While not stated anywhere
in the APM, despite stating the ICR is a single 64-bit register, AMD CPUs
store the 64-bit ICR as two separate 32-bit values in ICR and ICR2. When
IPI virtualization (IPIv on Intel, all AVIC flavors on AMD) is enabled,
KVM needs to match CPU behavior as some ICR ICR writes will be handled by
the CPU, not by KVM.
Add a kvm_x86_ops knob to control the underlying format used by the CPU to
store the x2APIC ICR, and tune it to AMD vs. Intel regardless of whether
or not x2AVIC is enabled. If KVM is handling all ICR writes, the storage
format for x2APIC mode doesn't matter, and having the behavior follow AMD
versus Intel will provide better test coverage and ease debugging.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable(a)vger.kernel.org
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Link: https://lore.kernel.org/r/20240719235107.3023592-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 95396e4cb3da..f9dfb2d62053 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1727,6 +1727,8 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+
+ const bool x2apic_icr_is_split;
const unsigned long required_apicv_inhibits;
bool allow_apicv_in_x2apic_without_x2apic_virtualization;
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 63be07d7c782..c7180cb5f464 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2471,11 +2471,25 @@ int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
data &= ~APIC_ICR_BUSY;
kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
- kvm_lapic_set_reg64(apic, APIC_ICR, data);
+ if (kvm_x86_ops.x2apic_icr_is_split) {
+ kvm_lapic_set_reg(apic, APIC_ICR, data);
+ kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
+ } else {
+ kvm_lapic_set_reg64(apic, APIC_ICR, data);
+ }
trace_kvm_apic_write(APIC_ICR, data);
return 0;
}
+static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
+{
+ if (kvm_x86_ops.x2apic_icr_is_split)
+ return (u64)kvm_lapic_get_reg(apic, APIC_ICR) |
+ (u64)kvm_lapic_get_reg(apic, APIC_ICR2) << 32;
+
+ return kvm_lapic_get_reg64(apic, APIC_ICR);
+}
+
/* emulate APIC access in a trap manner */
void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
{
@@ -2493,7 +2507,7 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
* maybe-unecessary write, and both are in the noise anyways.
*/
if (apic_x2apic_mode(apic) && offset == APIC_ICR)
- WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR)));
+ WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_x2apic_icr_read(apic)));
else
kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
}
@@ -3013,18 +3027,22 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
/*
* In x2APIC mode, the LDR is fixed and based on the id. And
- * ICR is internally a single 64-bit register, but needs to be
- * split to ICR+ICR2 in userspace for backwards compatibility.
+ * if the ICR is _not_ split, ICR is internally a single 64-bit
+ * register, but needs to be split to ICR+ICR2 in userspace for
+ * backwards compatibility.
*/
- if (set) {
+ if (set)
*ldr = kvm_apic_calc_x2apic_ldr(x2apic_id);
- icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
- (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
- __kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
- } else {
- icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
- __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+ if (!kvm_x86_ops.x2apic_icr_is_split) {
+ if (set) {
+ icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
+ (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
+ __kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
+ } else {
+ icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
+ __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+ }
}
}
@@ -3222,7 +3240,7 @@ static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
u32 low;
if (reg == APIC_ICR) {
- *data = kvm_lapic_get_reg64(apic, APIC_ICR);
+ *data = kvm_x2apic_icr_read(apic);
return 0;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d8cfe8f23327..eb3de01602b9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5053,6 +5053,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.enable_nmi_window = svm_enable_nmi_window,
.enable_irq_window = svm_enable_irq_window,
.update_cr8_intercept = svm_update_cr8_intercept,
+
+ .x2apic_icr_is_split = true,
.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 4f6023a0deb3..0a094ebad4b1 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -89,6 +89,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.enable_nmi_window = vmx_enable_nmi_window,
.enable_irq_window = vmx_enable_irq_window,
.update_cr8_intercept = vmx_update_cr8_intercept,
+
+ .x2apic_icr_is_split = false,
.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
Jann reported [1] a possible issue when trampoline_check_ip returns an
address near the bottom of the address space that is allowed to call
into the syscall if uretprobes are not set up.
Though the mmap minimum address restrictions will typically prevent
creating mappings there, let's make sure the uretprobe syscall checks
for that.
[1] https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341…
Cc: Kees Cook <kees(a)kernel.org>
Cc: Eyal Birger <eyal.birger(a)gmail.com>
Cc: stable(a)vger.kernel.org
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
---
arch/x86/kernel/uprobes.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 5a952c5ea66b..109d6641a1b3 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -357,19 +357,23 @@ void *arch_uprobe_trampoline(unsigned long *psize)
return &insn;
}
-static unsigned long trampoline_check_ip(void)
+static unsigned long trampoline_check_ip(unsigned long tramp)
{
- unsigned long tramp = uprobe_get_trampoline_vaddr();
-
return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
}
SYSCALL_DEFINE0(uretprobe)
{
struct pt_regs *regs = task_pt_regs(current);
- unsigned long err, ip, sp, r11_cx_ax[3];
+ unsigned long err, ip, sp, r11_cx_ax[3], tramp;
+
+ /* If there's no trampoline, we are called from wrong place. */
+ tramp = uprobe_get_trampoline_vaddr();
+ if (tramp == -1)
+ goto sigill;
- if (regs->ip != trampoline_check_ip())
+ /* Make sure the ip matches the only allowed sys_uretprobe caller. */
+ if (regs->ip != trampoline_check_ip(tramp))
goto sigill;
err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));
--
2.48.1
The patch titled
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: mm: pgtable: fix incorrect reclaim of non-empty PTE pages
Date: Tue, 11 Feb 2025 15:26:25 +0800
In zap_pte_range(), if the pte lock was released midway, the pte entries
may be refilled with physical pages by another thread, which may cause a
non-empty PTE page to be reclaimed and eventually cause the system to
crash.
To fix it, fall back to the slow path in this case to recheck if all pte
entries are still none.
Link: https://lkml.kernel.org/r/20250211072625.89188-1-zhengqi.arch@bytedance.com
Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Christian Brauner <brauner(a)kernel.org>
Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@braune…
Reported-by: Qu Wenruo <quwenruo.btrfs(a)gmx.com>
Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/
Tested-by: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Darrick J. Wong" <djwong(a)kernel.org>
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/mm/memory.c~mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages
+++ a/mm/memory.c
@@ -1719,7 +1719,7 @@ static unsigned long zap_pte_range(struc
pmd_t pmdval;
unsigned long start = addr;
bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
- bool direct_reclaim = false;
+ bool direct_reclaim = true;
int nr;
retry:
@@ -1734,8 +1734,10 @@ retry:
do {
bool any_skipped = false;
- if (need_resched())
+ if (need_resched()) {
+ direct_reclaim = false;
break;
+ }
nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
&force_flush, &force_break, &any_skipped);
@@ -1743,11 +1745,20 @@ retry:
can_reclaim_pt = false;
if (unlikely(force_break)) {
addr += nr * PAGE_SIZE;
+ direct_reclaim = false;
break;
}
} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
- if (can_reclaim_pt && addr == end)
+ /*
+ * Fast path: try to hold the pmd lock and unmap the PTE page.
+ *
+ * If the pte lock was released midway (retry case), or if the attempt
+ * to hold the pmd lock failed, then we need to recheck all pte entries
+ * to ensure they are still none, thereby preventing the pte entries
+ * from being repopulated by another thread.
+ */
+ if (can_reclaim_pt && direct_reclaim && addr == end)
direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
add_mm_rss_vec(mm, rss);
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
mm-pgtable-fix-incorrect-reclaim-of-non-empty-pte-pages.patch
From: Lu Baolu <baolu.lu(a)linux.intel.com>
commit 89e8a2366e3bce584b6c01549d5019c5cda1205e upstream.
iommu_sva_bind_device() should return either a sva bond handle or an
ERR_PTR value in error cases. Existing drivers (idxd and uacce) only
check the return value with IS_ERR(). This could potentially lead to
a kernel NULL pointer dereference issue if the function returns NULL
instead of an error pointer.
In reality, this doesn't cause any problems because iommu_sva_bind_device()
only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA.
In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will
return an error, and the device drivers won't call iommu_sva_bind_device()
at all.
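To illustrate why returning NULL would trip up such callers, a
hypothetical caller sketch (not taken from idxd or uacce):

        handle = iommu_sva_bind_device(dev, mm, drvdata);
        if (IS_ERR(handle))     /* IS_ERR(NULL) is false, so NULL slips through */
                return PTR_ERR(handle);
        /* any later use of handle then dereferences a NULL pointer */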
Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe(a)linaro.org>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde(a)amd.com>
Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Signed-off-by: Bin Lan <lanbincn(a)qq.com>
---
include/linux/iommu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9d87090953bc..2bfa9611be67 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -999,7 +999,7 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
static inline struct iommu_sva *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
{
- return NULL;
+ return ERR_PTR(-ENODEV);
}
static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
--
2.43.0
The mapping table for the rk3328 is missing the entry for -25C which is
found in the TRM section 9.5.2 "Temperature-to-code mapping".
NOTE: the kernel uses the tsadc_q_sel=1'b1 mode which is defined as:
4096-<code in table>. Whereas the table in the TRM gives the code
"3774" for -25C, the kernel uses 4096-3774=322.
Link: https://opensource.rock-chips.com/images/9/97/Rockchip_RK3328TRM_V1.1-Part1…
Cc: stable(a)vger.kernel.org
Fixes: eda519d5f73e ("thermal: rockchip: Support the RK3328 SOC in thermal driver")
Signed-off-by: Trevor Woerner <twoerner(a)gmail.com>
---
changes in v2:
- remove non-ascii characters in commit message
- remove dangling [1] reference in commit message
- include "Fixes:"
- add request for stable backport
---
drivers/thermal/rockchip_thermal.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/thermal/rockchip_thermal.c b/drivers/thermal/rockchip_thermal.c
index f551df48eef9..a8ad85feb68f 100644
--- a/drivers/thermal/rockchip_thermal.c
+++ b/drivers/thermal/rockchip_thermal.c
@@ -386,6 +386,7 @@ static const struct tsadc_table rk3328_code_table[] = {
{296, -40000},
{304, -35000},
{313, -30000},
+ {322, -25000},
{331, -20000},
{340, -15000},
{349, -10000},
--
2.44.0.501.g19981daefd7c
In zap_pte_range(), if the pte lock was released midway, the pte entries
may be refilled with physical pages by another thread, which may cause a
non-empty PTE page to be reclaimed and eventually cause the system to
crash.
To fix it, fall back to the slow path in this case to recheck if all pte
entries are still none.
Fixes: 6375e95f381e ("mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)")
Reported-by: Christian Brauner <brauner(a)kernel.org>
Closes: https://lore.kernel.org/all/20250207-anbot-bankfilialen-acce9d79a2c7@braune…
Reported-by: Qu Wenruo <quwenruo.btrfs(a)gmx.com>
Closes: https://lore.kernel.org/all/152296f3-5c81-4a94-97f3-004108fba7be@gmx.com/
Tested-by: Zi Yan <ziy(a)nvidia.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
---
mm/memory.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index a8196ae72e9ae..7c7193cb21248 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1721,7 +1721,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
pmd_t pmdval;
unsigned long start = addr;
bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
- bool direct_reclaim = false;
+ bool direct_reclaim = true;
int nr;
retry:
@@ -1736,8 +1736,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
do {
bool any_skipped = false;
- if (need_resched())
+ if (need_resched()) {
+ direct_reclaim = false;
break;
+ }
nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
&force_flush, &force_break, &any_skipped);
@@ -1745,11 +1747,20 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
can_reclaim_pt = false;
if (unlikely(force_break)) {
addr += nr * PAGE_SIZE;
+ direct_reclaim = false;
break;
}
} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
- if (can_reclaim_pt && addr == end)
+ /*
+ * Fast path: try to hold the pmd lock and unmap the PTE page.
+ *
+ * If the pte lock was released midway (retry case), or if the attempt
+ * to hold the pmd lock failed, then we need to recheck all pte entries
+ * to ensure they are still none, thereby preventing the pte entries
+ * from being repopulated by another thread.
+ */
+ if (can_reclaim_pt && direct_reclaim && addr == end)
direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
add_mm_rss_vec(mm, rss);
--
2.20.1
From: Xu Kuohai <xukuohai(a)huawei.com>
[ Upstream commit 28ead3eaabc16ecc907cfb71876da028080f6356 ]
bpf progs can be attached to kernel functions, and the attached functions
can take different parameters or return different return values. If a
prog attached to one kernel function tail calls a prog attached to another
kernel function, the ctx access or return value verification could be
bypassed.
For example, suppose prog1 is attached to func1, which takes only one
parameter, and prog2 is attached to func2, which takes two parameters.
Since the verifier assumes the bpf ctx passed to prog2 is constructed based
on func2's prototype, the verifier allows prog2 to access the second
parameter from the bpf ctx passed to it. The problem is that the verifier
does not prevent prog1 from passing its bpf ctx to prog2 via tail call. In
this case, the bpf ctx passed to prog2 is constructed from func1 instead of
func2, that is, the assumption for ctx access verification is bypassed.
As another example, suppose BPF LSM prog1 is attached to hook
file_alloc_security, and BPF LSM prog2 is attached to hook
bpf_lsm_audit_rule_known. The verifier knows the return value rules for
these two hooks, e.g. it is legal for bpf_lsm_audit_rule_known to return
the positive number 1, and it is illegal for file_alloc_security to return
a positive number. So the verifier allows prog2 to return the positive
number 1, but does not allow prog1 to return a positive number. The
problem is that the verifier does not prevent prog1 from calling prog2 via
tail call. In this case, prog2's return value 1 will be used as the return
value for prog1's hook file_alloc_security. That is, the return value rule
is bypassed.
This patch adds restrictions on tail calls to prevent such bypasses.
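For illustration only (not part of this patch), a minimal libbpf-style
sketch of the first scenario; func1, func2 and the map name are made up.
With this restriction in place, updating the prog array owned by one of
these programs with the other is rejected because their attach targets
(attach_func_proto) differ:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	struct {
		__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, __u32);
	} jmp_table SEC(".maps");

	/* Attached to a (hypothetical) kernel function taking two parameters. */
	SEC("fentry/func2")
	int BPF_PROG(prog2, unsigned long a, unsigned long b)
	{
		return 0;	/* verified against func2's prototype */
	}

	/* Attached to a (hypothetical) kernel function taking one parameter. */
	SEC("fentry/func1")
	int BPF_PROG(prog1, unsigned long a)
	{
		/* Would hand prog2 a ctx built for func1. */
		bpf_tail_call(ctx, &jmp_table, 0);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";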
Signed-off-by: Xu Kuohai <xukuohai(a)huawei.com>
Link: https://lore.kernel.org/r/20240719110059.797546-4-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: BRUNO VERNAY <bruno.vernay(a)se.com>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource(a)witekio.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/core.c | 19 +++++++++++++++++--
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7f4ce183dcb0..39291ec48374 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -250,6 +250,7 @@ struct bpf_map {
* same prog type, JITed flag and xdp_has_frags flag.
*/
struct {
+ const struct btf_type *attach_func_proto;
spinlock_t lock;
enum bpf_prog_type type;
bool jited;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 83b416af4da1..c281f5b8705e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2121,6 +2121,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
{
enum bpf_prog_type prog_type = resolve_prog_type(fp);
bool ret;
+ struct bpf_prog_aux *aux = fp->aux;
if (fp->kprobe_override)
return false;
@@ -2132,12 +2133,26 @@ bool bpf_prog_map_compatible(struct bpf_map *map,
*/
map->owner.type = prog_type;
map->owner.jited = fp->jited;
- map->owner.xdp_has_frags = fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags = aux->xdp_has_frags;
+ map->owner.attach_func_proto = aux->attach_func_proto;
ret = true;
} else {
ret = map->owner.type == prog_type &&
map->owner.jited == fp->jited &&
- map->owner.xdp_has_frags == fp->aux->xdp_has_frags;
+ map->owner.xdp_has_frags == aux->xdp_has_frags;
+ if (ret &&
+ map->owner.attach_func_proto != aux->attach_func_proto) {
+ switch (prog_type) {
+ case BPF_PROG_TYPE_TRACING:
+ case BPF_PROG_TYPE_LSM:
+ case BPF_PROG_TYPE_EXT:
+ case BPF_PROG_TYPE_STRUCT_OPS:
+ ret = false;
+ break;
+ default:
+ break;
+ }
+ }
}
spin_unlock(&map->owner.lock);
--
2.43.0
When using USB MIDI, the same lock can end up being acquired twice through a
re-entrant call to f_midi_transmit(), causing a deadlock.
Fix it by using tasklet_hi_schedule() to schedule the inner
f_midi_transmit() via a tasklet from the completion handler.
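For illustration only (hypothetical names, not taken from f_midi.c), the
deferral pattern used here: the completion handler only schedules the
tasklet, and the transmit path runs later, outside the completion context,
so its lock is taken only once:

	#include <linux/interrupt.h>
	#include <linux/usb/gadget.h>

	struct example_midi {
		struct tasklet_struct tasklet;
		/* queues, locks, ... */
	};

	static void example_transmit(struct example_midi *midi);

	/* Completion handler: only schedule the tasklet, never transmit here. */
	static void example_complete(struct usb_ep *ep, struct usb_request *req)
	{
		struct example_midi *midi = ep->driver_data;

		tasklet_hi_schedule(&midi->tasklet);
	}

	/* Tasklet body: runs outside the completion path. */
	static void example_tasklet_func(unsigned long data)
	{
		struct example_midi *midi = (struct example_midi *)data;

		example_transmit(midi);
	}

	/* At bind/init time (older tasklet API, as used by this code): */
	/* tasklet_init(&midi->tasklet, example_tasklet_func, (unsigned long)midi); */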
Link: https://lore.kernel.org/all/CAArt=LjxU0fUZOj06X+5tkeGT+6RbXzpWg1h4t4Fwa_KGV…
Fixes: d5daf49b58661 ("USB: gadget: midi: add midi function driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jill Donahue <jilliandonahue58(a)gmail.com>
---
drivers/usb/gadget/function/f_midi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 837fcdfa3840..37d438e5d451 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -283,7 +283,7 @@ f_midi_complete(struct usb_ep *ep, struct usb_request *req)
/* Our transmit completed. See if there's more to go.
* f_midi_transmit eats req, don't queue it again. */
req->length = 0;
- f_midi_transmit(midi);
+ tasklet_hi_schedule(&midi->tasklet);
return;
}
break;
--
2.25.1
Journal emptiness is not determined by sb->s_sequence == 0 but rather by
sb->s_start == 0 (which is set a few lines above). Furthermore 0 is a
valid transaction ID so the check can spuriously trigger. Remove the
invalid WARN_ON.
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/jbd2/journal.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 7e49d912b091..354c9f691df3 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1879,7 +1879,6 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
/* Log is no longer empty */
write_lock(&journal->j_state_lock);
- WARN_ON(!sb->s_sequence);
journal->j_flags &= ~JBD2_FLUSHED;
write_unlock(&journal->j_state_lock);
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a2fad248947d702ed3dcb52b8377c1a3ae201e44
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021050-canal-limeade-cac4@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2fad248947d702ed3dcb52b8377c1a3ae201e44 Mon Sep 17 00:00:00 2001
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
Date: Mon, 13 Jan 2025 22:43:23 +0800
Subject: [PATCH] Bluetooth: qca: Fix poor RF performance for WCN6855
For WCN6855, the board-ID-specific NVM needs to be downloaded once the board ID
is available, but currently the default NVM is always downloaded.
The wrong NVM causes poor RF performance and affects the user experience
on several types of laptops with WCN6855 on the market.
Fix this by downloading the board-ID-specific NVM if the board ID is available.
Fixes: 095327fede00 ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6855")
Cc: stable(a)vger.kernel.org # 6.4
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Tested-by: Steev Klimaszewski <steev(a)kali.org> #Thinkpad X13s
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index a6b53d1f23db..cdf09d9a9ad2 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -909,8 +909,9 @@ int qca_uart_setup(struct hci_dev *hdev, uint8_t baudrate,
"qca/msnv%02x.bin", rom_ver);
break;
case QCA_WCN6855:
- snprintf(config.fwname, sizeof(config.fwname),
- "qca/hpnv%02x.bin", rom_ver);
+ qca_read_fw_board_id(hdev, &boardid);
+ qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
+ "hpnv", soc_type, ver, rom_ver, boardid);
break;
case QCA_WCN7850:
qca_get_nvm_name_by_board(config.fwname, sizeof(config.fwname),
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0d2c22cf12a08..f3c0bff714eba 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3164,8 +3164,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 8e05239073ef2..f49e98c2e31db 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3536,8 +3536,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 8dfd317509aa6..ebe8c2f147a33 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3547,8 +3547,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3dbf926fd99fd..2b0f15de77111 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3525,8 +3525,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index cdb1e706f855e..bb85e79b62fad 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3550,8 +3550,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
From: Daniel Wagner <wagi(a)kernel.org>
[ Upstream commit d3d380eded7ee5fc2fc53b3b0e72365ded025c4a ]
The initial controller initialization mimics the reconnect loop
behavior by switching from NEW to RESETTING and then to CONNECTING.
The transition from NEW to CONNECTING is a valid transition, so there is
no point entering the RESETTING state. TCP and RDMA also transition
directly to CONNECTING state.
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/fc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index b81af7919e94c..d45ab530ff9b7 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3579,8 +3579,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
spin_unlock_irqrestore(&rport->lock, flags);
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
- !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+ if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
dev_err(ctrl->ctrl.device,
"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
goto fail_ctrl;
--
2.39.5
The patch titled
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
has been added to the -mm mm-unstable branch. Its filename is
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/rmap: reject hugetlb folios in folio_make_device_exclusive()
Date: Mon, 10 Feb 2025 20:37:44 +0100
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage is ever
reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Link: https://lkml.kernel.org/r/20250210193801.781278-3-david@redhat.com
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/rmap.c~mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive
+++ a/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(
* Restrict to anonymous folios for now to avoid potential writeback
* issues.
*/
- if (!folio_test_anon(folio))
+ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
return false;
rmap_walk(folio, &rwc);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch
mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch
mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch
mm-use-single-swp_device_exclusive-entry-type.patch
mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch
kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch
mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch
mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch
mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch
mm-rmap-avoid-ebusy-from-make_device_exclusive.patch
The patch titled
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
has been added to the -mm mm-unstable branch. Its filename is
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Date: Mon, 10 Feb 2025 20:37:43 +0100
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with
Willy, I recently discovered [1] that device-exclusive handling does not
properly work with THP, making the hmm-tests selftests fail if THPs are
enabled on the system.
Looking into more detail, I found that hugetlb is not properly fenced,
and I realized that something that had been bugging me for a while -- how
device-exclusive entries interact with mapcounts -- completely breaks
migration/swapout/split/hwpoison handling of these folios while they have
device-exclusive PTEs.
The program below can be used to allocate 1 GiB worth of pages and make
them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out
(proc$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much
one forces memory reclaim), and when having a memory block onlined to
ZONE_MOVABLE, trying to offline it will loop forever and complain about
failed migration of a page that should be movable.
# echo offline > /sys/devices/system/memory/memory136/state
# echo online_movable > /sys/devices/system/memory/memory136/state
# ./hmm-swap &
... wait until everything is device-exclusive
# echo offline > /sys/devices/system/memory/memory136/state
[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000
index:0x7f20671f7 pfn:0x442b6a
[ 285.196618][T14882] memcg:ffff888179298000
[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|
dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
[ 285.201734][T14882] raw: ...
[ 285.204464][T14882] raw: ...
[ 285.207196][T14882] page dumped because: migration failure
[ 285.209072][T14882] page_owner tracks the page as allocated
[ 285.210915][T14882] page last allocated via order 0, migratetype
Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),
id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
...
This series fixes all issues I found so far. There is no easy way to fix
without a bigger rework/cleanup. I have a bunch of cleanups on top (some
previous sent, some the result of the discussion in v1) that I will send
out separately once this landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of
these (non-present, non-none) fake-swap entries; but that just results in
the same problem we keep having (lack of spare PTE bits), and staring at
other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn't actually belong into
mm/rmap.c anymore, but I'll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW,
so I'd appreciate some testing, especially if the interaction between two
GPUs wanting a device-exclusive entry works as expected.
<program>
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)
struct hmm_dmirror_cmd {
__u64 addr;
__u64 ptr;
__u64 npages;
__u64 cpages;
__u64 faults;
};
const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;
int main(void)
{
struct hmm_dmirror_cmd cmd;
size_t cur_size;
int fd, ret;
char *addr, *mirror;
fd = open("/dev/hmm_dmirror1", O_RDWR, 0);
if (fd < 0) {
perror("open failed\n");
exit(1);
}
addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap failed\n");
exit(1);
}
madvise(addr, size, MADV_NOHUGEPAGE);
memset(addr, 1, size);
mirror = malloc(chunk_size);
for (cur_size = 0; cur_size < size; cur_size += chunk_size) {
cmd.addr = (uintptr_t)addr + cur_size;
cmd.ptr = (uintptr_t)mirror;
cmd.npages = chunk_size / getpagesize();
ret = ioctl(fd, HMM_DMIRROR_EXCLUSIVE, &cmd);
if (ret) {
perror("ioctl failed\n");
exit(1);
}
}
pause();
return 0;
}
</program>
[1] https://lkml.kernel.org/r/25e02685-4f1d-47fa-be5b-01ff85bb0ce2@redhat.com
This patch (of 17):
We only have two FOLL_SPLIT_PMD users. While uprobe refuses hugetlb
early, make_device_exclusive_range() can end up getting called on hugetlb
VMAs.
Right now, this means that with a PMD-sized hugetlb page, we can end up
calling split_huge_pmd(), because pmd_trans_huge() also succeeds with
hugetlb PMDs.
For example, using a modified hmm-test selftest one can trigger:
[ 207.017134][T14945] ------------[ cut here ]------------
[ 207.018614][T14945] kernel BUG at mm/page_table_check.c:87!
[ 207.019716][T14945] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 207.021072][T14945] CPU: 3 UID: 0 PID: ...
[ 207.023036][T14945] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 207.024834][T14945] RIP: 0010:page_table_check_clear.part.0+0x488/0x510
[ 207.026128][T14945] Code: ...
[ 207.029965][T14945] RSP: 0018:ffffc9000cb8f348 EFLAGS: 00010293
[ 207.031139][T14945] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: ffffffff8249a0cd
[ 207.032649][T14945] RDX: ffff88811e883c80 RSI: ffffffff8249a357 RDI: ffff88811e883c80
[ 207.034183][T14945] RBP: ffff888105c0a050 R08: 0000000000000005 R09: 0000000000000000
[ 207.035688][T14945] R10: 00000000ffffffff R11: 0000000000000003 R12: 0000000000000001
[ 207.037203][T14945] R13: 0000000000000200 R14: 0000000000000001 R15: dffffc0000000000
[ 207.038711][T14945] FS: 00007f2783275740(0000) GS:ffff8881f4980000(0000) knlGS:0000000000000000
[ 207.040407][T14945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 207.041660][T14945] CR2: 00007f2782c00000 CR3: 0000000132356000 CR4: 0000000000750ef0
[ 207.043196][T14945] PKRU: 55555554
[ 207.043880][T14945] Call Trace:
[ 207.044506][T14945] <TASK>
[ 207.045086][T14945] ? __die+0x51/0x92
[ 207.045864][T14945] ? die+0x29/0x50
[ 207.046596][T14945] ? do_trap+0x250/0x320
[ 207.047430][T14945] ? do_error_trap+0xe7/0x220
[ 207.048346][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.049535][T14945] ? handle_invalid_op+0x34/0x40
[ 207.050494][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.051681][T14945] ? exc_invalid_op+0x2e/0x50
[ 207.052589][T14945] ? asm_exc_invalid_op+0x1a/0x20
[ 207.053596][T14945] ? page_table_check_clear.part.0+0x1fd/0x510
[ 207.054790][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.055993][T14945] ? page_table_check_clear.part.0+0x488/0x510
[ 207.057195][T14945] ? page_table_check_clear.part.0+0x487/0x510
[ 207.058384][T14945] __page_table_check_pmd_clear+0x34b/0x5a0
[ 207.059524][T14945] ? __pfx___page_table_check_pmd_clear+0x10/0x10
[ 207.060775][T14945] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[ 207.061940][T14945] ? __pfx___lock_acquire+0x10/0x10
[ 207.062967][T14945] pmdp_huge_clear_flush+0x279/0x360
[ 207.064024][T14945] split_huge_pmd_locked+0x82b/0x3750
...
Before commit 9cb28da54643 ("mm/gup: handle hugetlb in the generic
follow_page_mask code"), we would have ignored the flag; instead, let's
simply refuse the combination completely in check_vma_flags(): the caller
is likely not prepared to handle any hugetlb folios.
We'll teach make_device_exclusive_range() separately to ignore any hugetlb
folios as a future-proof safety net.
Link: https://lkml.kernel.org/r/20250210193801.781278-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250210193801.781278-2-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Alex Shi <alexs(a)kernel.org>
Cc: Danilo Krummrich <dakr(a)kernel.org>
Cc: Dave Airlie <airlied(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Lyude <lyude(a)redhat.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yanteng Si <si.yanteng(a)linux.dev>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/gup.c~mm-gup-reject-foll_split_pmd-with-hugetlb-vmas
+++ a/mm/gup.c
@@ -1283,6 +1283,9 @@ static int check_vma_flags(struct vm_are
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
return -EOPNOTSUPP;
+ if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))
+ return -EOPNOTSUPP;
+
if (vma_is_secretmem(vma))
return -EFAULT;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch
mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch
mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch
mm-use-single-swp_device_exclusive-entry-type.patch
mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch
kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch
mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch
mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch
mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch
mm-rmap-avoid-ebusy-from-make_device_exclusive.patch
Commit 91b587ba79e1 ("f2fs: Introduce linear search for dentries")
This is a follow-up to a patch that was previously merged into stable to restore
access to files created with emojis under a different hashing scheme.
I believe it's relevant for all currently supported stable branches.
-Daniel
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 604120fd9e9df50ee0e803d3c6e77a1f45d2c58e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021013-rearrange-cavalry-69c3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 604120fd9e9df50ee0e803d3c6e77a1f45d2c58e Mon Sep 17 00:00:00 2001
From: Sumit Gupta <sumitg(a)nvidia.com>
Date: Wed, 18 Dec 2024 00:07:36 +0000
Subject: [PATCH] arm64: tegra: Fix typo in Tegra234 dce-fabric compatible
The compatible string for the Tegra DCE fabric is currently defined as
'nvidia,tegra234-sce-fabric', but this is incorrect because that string
belongs to the SCE fabric. Update the DCE fabric node to use the correct
compatible string.
This compatible needs to be correct in order for the interconnect
to catch things such as improper data accesses.
Cc: stable(a)vger.kernel.org
Fixes: 302e154000ec ("arm64: tegra: Add node for CBB 2.0 on Tegra234")
Signed-off-by: Sumit Gupta <sumitg(a)nvidia.com>
Signed-off-by: Ivy Huang <yijuh(a)nvidia.com>
Reviewed-by: Brad Griffis <bgriffis(a)nvidia.com>
Reviewed-by: Jon Hunter <jonathanh(a)nvidia.com>
Link: https://lore.kernel.org/r/20241218000737.1789569-2-yijuh@nvidia.com
Signed-off-by: Thierry Reding <treding(a)nvidia.com>
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index 570331baa09e..62b9f1784030 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -3995,7 +3995,7 @@ bpmp-fabric@d600000 {
};
dce-fabric@de00000 {
- compatible = "nvidia,tegra234-sce-fabric";
+ compatible = "nvidia,tegra234-dce-fabric";
reg = <0x0 0xde00000 0x0 0x40000>;
interrupts = <GIC_SPI 381 IRQ_TYPE_LEVEL_HIGH>;
status = "okay";
Even though FOLL_SPLIT_PMD on hugetlb now always fails with -EOPNOTSUPP,
let's add a safety net in case FOLL_SPLIT_PMD usage is ever reworked.
In particular, before commit 9cb28da54643 ("mm/gup: handle hugetlb in the
generic follow_page_mask code"), GUP(FOLL_SPLIT_PMD) would just have
returned a page. In particular, hugetlb folios that are not PMD-sized
would never have been prone to FOLL_SPLIT_PMD.
hugetlb folios can be anonymous, and page_make_device_exclusive_one() is
not really prepared for handling them at all. So let's spell that out.
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/rmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7e..17fbfa61f7efb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2499,7 +2499,7 @@ static bool folio_make_device_exclusive(struct folio *folio,
* Restrict to anonymous folios for now to avoid potential writeback
* issues.
*/
- if (!folio_test_anon(folio))
+ if (!folio_test_anon(folio) || folio_test_hugetlb(folio))
return false;
rmap_walk(folio, &rwc);
--
2.48.1
From: Shu Han <ebpqwerty472123(a)gmail.com>
commit ea7e2d5e49c05e5db1922387b09ca74aa40f46e2 upstream.
The remap_file_pages syscall handler calls do_mmap() directly, which
doesn't contain the LSM security check. And if the process has called
personality(READ_IMPLIES_EXEC) before and remap_file_pages() is called for
RW pages, this will actually result in remapping the pages to RWX,
bypassing a W^X policy enforced by SELinux.
So we should check prot via the security_mmap_file() LSM hook in the
remap_file_pages syscall handler before do_mmap() is called. Otherwise, it
potentially permits an attacker to bypass a W^X policy enforced by
SELinux.
The bypass is similar to CVE-2016-10044, which bypassed the same protection via
AIO; see [1].
The PoC:
$ cat > test.c
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/personality.h>
#include <sys/syscall.h>

int main(void) {
size_t pagesz = sysconf(_SC_PAGE_SIZE);
int mfd = syscall(SYS_memfd_create, "test", 0);
const char *buf = mmap(NULL, 4 * pagesz, PROT_READ | PROT_WRITE,
MAP_SHARED, mfd, 0);
unsigned int old = syscall(SYS_personality, 0xffffffff);
syscall(SYS_personality, READ_IMPLIES_EXEC | old);
syscall(SYS_remap_file_pages, buf, pagesz, 0, 2, 0);
syscall(SYS_personality, old);
// show the RWX page exists even if W^X policy is enforced
int fd = open("/proc/self/maps", O_RDONLY);
unsigned char buf2[1024];
while (1) {
int ret = read(fd, buf2, 1024);
if (ret <= 0) break;
write(1, buf2, ret);
}
close(fd);
}
$ gcc test.c -o test
$ ./test | grep rwx
7f1836c34000-7f1836c35000 rwxs 00002000 00:01 2050 /memfd:test (deleted)
Link: https://project-zero.issues.chromium.org/issues/42452389 [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Shu Han <ebpqwerty472123(a)gmail.com>
Acked-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Pratyush Yadav <ptyadav(a)amazon.de>
---
mm/mmap.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index f8a2f15fc5a20..f774795c53bca 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3035,8 +3035,12 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
flags |= MAP_LOCKED;
file = get_file(vma->vm_file);
+ ret = security_mmap_file(vma->vm_file, prot, flags);
+ if (ret)
+ goto out_fput;
ret = do_mmap(vma->vm_file, start, size,
prot, flags, pgoff, &populate, NULL);
+out_fput:
fput(file);
out:
mmap_write_unlock(mm);
--
2.47.1
Fixes for the provided buffers for not allowing kbufs to cross a single
execution section. Upstream had most of it already fixed by chance,
which is why all 3 patches refer to a single upstream commit.
Pavel Begunkov (3):
io_uring: fix multishots with selected buffers
io_uring: fix io_req_prep_async with provided buffers
io_uring/rw: commit provided buffer state on async
io_uring/io_uring.c | 5 ++++-
io_uring/poll.c | 2 ++
io_uring/rw.c | 10 ++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
--
2.48.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b06f388994500297bb91be60ffaf6825ecfd2afe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025020949-press-evolve-b900@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b06f388994500297bb91be60ffaf6825ecfd2afe Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson(a)linux.dev>
Date: Fri, 10 Jan 2025 16:38:22 -0500
Subject: [PATCH] tty: xilinx_uartps: split sysrq handling
lockdep detects the following circular locking dependency:
CPU 0                               CPU 1
==========================          ============================
cdns_uart_isr()                     printk()
  uart_port_lock(port)                console_lock()
                                      cdns_uart_console_write()
                                        if (!port->sysrq)
                                          uart_port_lock(port)
  uart_handle_break()
    port->sysrq = ...
  uart_handle_sysrq_char()
    printk()
      console_lock()
The fixed commit attempts to avoid this situation by only taking the
port lock in cdns_uart_console_write if port->sysrq unset. However, if
(as shown above) cdns_uart_console_write runs before port->sysrq is set,
then it will try to take the port lock anyway. This may result in a
deadlock.
Fix this by splitting sysrq handling into two parts. We use the prepare
helper under the port lock and defer handling until we release the lock.
Fixes: 74ea66d4ca06 ("tty: xuartps: Improve sysrq handling")
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
Cc: stable(a)vger.kernel.org # c980248179d: serial: xilinx_uartps: Use port lock wrappers
Acked-by: John Ogness <john.ogness(a)linutronix.de>
Link: https://lore.kernel.org/r/20250110213822.2107462-1-sean.anderson@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index beb151be4d32..92ec51870d1d 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -287,7 +287,7 @@ static void cdns_uart_handle_rx(void *dev_id, unsigned int isrstatus)
continue;
}
- if (uart_handle_sysrq_char(port, data))
+ if (uart_prepare_sysrq_char(port, data))
continue;
if (is_rxbs_support) {
@@ -495,7 +495,7 @@ static irqreturn_t cdns_uart_isr(int irq, void *dev_id)
!(readl(port->membase + CDNS_UART_CR) & CDNS_UART_CR_RX_DIS))
cdns_uart_handle_rx(dev_id, isrstatus);
- uart_port_unlock(port);
+ uart_unlock_and_check_sysrq(port);
return IRQ_HANDLED;
}
@@ -1380,9 +1380,7 @@ static void cdns_uart_console_write(struct console *co, const char *s,
unsigned int imr, ctrl;
int locked = 1;
- if (port->sysrq)
- locked = 0;
- else if (oops_in_progress)
+ if (oops_in_progress)
locked = uart_port_trylock_irqsave(port, &flags);
else
uart_port_lock_irqsave(port, &flags);
From: Niravkumar L Rabara <niravkumar.l.rabara(a)intel.com>
This patchset introduces improvements and fixes for cadence nand driver.
The changes include:
1. Replace dma_request_channel() with dma_request_chan_by_mask() and use
helper functions to return proper error code instead of fixed -EBUSY.
2. Remap the slave DMA I/O resources to enhance driver portability.
3. Fixed dma_unmap_single to use correct physical/bus device.
v3 changes:-
* Update commit message based on v2 review feedback for better clarity.
* Use dma_request_chan_by_mask() and helper functions to return proper
error code instead of fixed -EBUSY error code.
link to v2:
- https://lore.kernel.org/all/20250116032154.3976447-1-niravkumar.l.rabara@in…
v2 changes:-
* Added the missing Fixes and Cc: stable tags to the patches.
link to v1:
- https://lore.kernel.org/all/20250108135234.3107502-1-niravkumar.l.rabara@in…
Niravkumar L Rabara (3):
mtd: rawnand: cadence: fix error code in cadence_nand_init()
mtd: rawnand: cadence: use dma_map_resource for sdma address
mtd: rawnand: cadence: fix incorrect device in dma_unmap_single
.../mtd/nand/raw/cadence-nand-controller.c | 42 ++++++++++++++-----
1 file changed, 31 insertions(+), 11 deletions(-)
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1aaf8c122918aa8897605a9aa1e8ed6600d6f930
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021008-nemeses-footwork-0f1d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1aaf8c122918aa8897605a9aa1e8ed6600d6f930 Mon Sep 17 00:00:00 2001
From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Date: Tue, 21 Jan 2025 10:01:59 +0800
Subject: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU. This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.
Fix it by simply taking a look at the list in the single caller, to see if
anything was added.
[zhaoyang.huang(a)unisoc.com: move definition of local]
Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Aijun Sun <aijun.sun(a)unisoc.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 9aaf338cc1f4..3883b307780e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2320,13 +2320,13 @@ static void pofs_unpin(struct pages_or_folios *pofs)
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
-static unsigned long collect_longterm_unpinnable_folios(
+static void collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
- unsigned long i, collected = 0;
struct folio *prev_folio = NULL;
bool drain_allow = true;
+ unsigned long i;
for (i = 0; i < pofs->nr_entries; i++) {
struct folio *folio = pofs_get_folio(pofs, i);
@@ -2338,8 +2338,6 @@ static unsigned long collect_longterm_unpinnable_folios(
if (folio_is_longterm_pinnable(folio))
continue;
- collected++;
-
if (folio_is_device_coherent(folio))
continue;
@@ -2361,8 +2359,6 @@ static unsigned long collect_longterm_unpinnable_folios(
NR_ISOLATED_ANON + folio_is_file_lru(folio),
folio_nr_pages(folio));
}
-
- return collected;
}
/*
@@ -2439,11 +2435,9 @@ static long
check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
{
LIST_HEAD(movable_folio_list);
- unsigned long collected;
- collected = collect_longterm_unpinnable_folios(&movable_folio_list,
- pofs);
- if (!collected)
+ collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+ if (list_empty(&movable_folio_list))
return 0;
return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c9c0036c1990da8d2dd33563e327e05a775fcf10
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021013-trinity-yearling-c7bf@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9c0036c1990da8d2dd33563e327e05a775fcf10 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Sat, 4 Jan 2025 15:20:12 +0100
Subject: [PATCH] soc: mediatek: mtk-devapc: Fix leaking IO map on driver
remove
Driver removal should fully clean up - unmap the memory.
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20250104142012.115974-2-krzysztof.kozlowski@linar…
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 500847b41b16..f54c966138b5 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -305,6 +305,7 @@ static void mtk_devapc_remove(struct platform_device *pdev)
struct mtk_devapc_context *ctx = platform_get_drvdata(pdev);
stop_devapc(ctx);
+ iounmap(ctx->infra_base);
}
static struct platform_driver mtk_devapc_driver = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c9c0036c1990da8d2dd33563e327e05a775fcf10
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021013-unveiled-grazing-b8ac@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9c0036c1990da8d2dd33563e327e05a775fcf10 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Sat, 4 Jan 2025 15:20:12 +0100
Subject: [PATCH] soc: mediatek: mtk-devapc: Fix leaking IO map on driver
remove
Driver removal should fully clean up - unmap the memory.
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20250104142012.115974-2-krzysztof.kozlowski@linar…
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 500847b41b16..f54c966138b5 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -305,6 +305,7 @@ static void mtk_devapc_remove(struct platform_device *pdev)
struct mtk_devapc_context *ctx = platform_get_drvdata(pdev);
stop_devapc(ctx);
+ iounmap(ctx->infra_base);
}
static struct platform_driver mtk_devapc_driver = {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c9c0036c1990da8d2dd33563e327e05a775fcf10
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021012-busybody-undertook-fce1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9c0036c1990da8d2dd33563e327e05a775fcf10 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Sat, 4 Jan 2025 15:20:12 +0100
Subject: [PATCH] soc: mediatek: mtk-devapc: Fix leaking IO map on driver
remove
Driver removal should fully clean up - unmap the memory.
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20250104142012.115974-2-krzysztof.kozlowski@linar…
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 500847b41b16..f54c966138b5 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -305,6 +305,7 @@ static void mtk_devapc_remove(struct platform_device *pdev)
struct mtk_devapc_context *ctx = platform_get_drvdata(pdev);
stop_devapc(ctx);
+ iounmap(ctx->infra_base);
}
static struct platform_driver mtk_devapc_driver = {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c0eb059a4575ed57f265d9883a5203799c19982c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021059-repulsive-savor-1f9d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0eb059a4575ed57f265d9883a5203799c19982c Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Sat, 4 Jan 2025 15:20:11 +0100
Subject: [PATCH] soc: mediatek: mtk-devapc: Fix leaking IO map on error paths
Error paths of mtk_devapc_probe() should unmap the memory. Reported by
Smatch:
drivers/soc/mediatek/mtk-devapc.c:292 mtk_devapc_probe() warn: 'ctx->infra_base' from of_iomap() not released on lines: 277,281,286.
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20250104142012.115974-1-krzysztof.kozlowski@linar…
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 2a1adcb87d4e..500847b41b16 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -273,23 +273,31 @@ static int mtk_devapc_probe(struct platform_device *pdev)
return -EINVAL;
devapc_irq = irq_of_parse_and_map(node, 0);
- if (!devapc_irq)
- return -EINVAL;
+ if (!devapc_irq) {
+ ret = -EINVAL;
+ goto err;
+ }
ctx->infra_clk = devm_clk_get_enabled(&pdev->dev, "devapc-infra-clock");
- if (IS_ERR(ctx->infra_clk))
- return -EINVAL;
+ if (IS_ERR(ctx->infra_clk)) {
+ ret = -EINVAL;
+ goto err;
+ }
ret = devm_request_irq(&pdev->dev, devapc_irq, devapc_violation_irq,
IRQF_TRIGGER_NONE, "devapc", ctx);
if (ret)
- return ret;
+ goto err;
platform_set_drvdata(pdev, ctx);
start_devapc(ctx);
return 0;
+
+err:
+ iounmap(ctx->infra_base);
+ return ret;
}
static void mtk_devapc_remove(struct platform_device *pdev)
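A note on why only infra_base needs the new err label: the clock and the IRQ in this probe are device-managed and are released automatically on probe failure or unbind, while of_iomap() is not managed. A rough sketch of that split, using the calls from the hunk above (the of_iomap() arguments are an assumption):

	/* Not device-managed: once this has succeeded, every later failure in
	 * probe must unwind it, which is what the new err: label + iounmap() do. */
	ctx->infra_base = of_iomap(node, 0);
	if (!ctx->infra_base)
		return -EINVAL;

	/* Device-managed: released automatically when probe fails or the device
	 * is unbound, so these need no matching cleanup of their own. */
	ctx->infra_clk = devm_clk_get_enabled(&pdev->dev, "devapc-infra-clock");
	ret = devm_request_irq(&pdev->dev, devapc_irq, devapc_violation_irq,
			       IRQF_TRIGGER_NONE, "devapc", ctx);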
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c0eb059a4575ed57f265d9883a5203799c19982c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021049-womanlike-crouch-8a65@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0eb059a4575ed57f265d9883a5203799c19982c Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Sat, 4 Jan 2025 15:20:11 +0100
Subject: [PATCH] soc: mediatek: mtk-devapc: Fix leaking IO map on error paths
Error paths of mtk_devapc_probe() should unmap the memory. Reported by
Smatch:
drivers/soc/mediatek/mtk-devapc.c:292 mtk_devapc_probe() warn: 'ctx->infra_base' from of_iomap() not released on lines: 277,281,286.
Fixes: 0890beb22618 ("soc: mediatek: add mt6779 devapc driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20250104142012.115974-1-krzysztof.kozlowski@linar…
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/drivers/soc/mediatek/mtk-devapc.c b/drivers/soc/mediatek/mtk-devapc.c
index 2a1adcb87d4e..500847b41b16 100644
--- a/drivers/soc/mediatek/mtk-devapc.c
+++ b/drivers/soc/mediatek/mtk-devapc.c
@@ -273,23 +273,31 @@ static int mtk_devapc_probe(struct platform_device *pdev)
return -EINVAL;
devapc_irq = irq_of_parse_and_map(node, 0);
- if (!devapc_irq)
- return -EINVAL;
+ if (!devapc_irq) {
+ ret = -EINVAL;
+ goto err;
+ }
ctx->infra_clk = devm_clk_get_enabled(&pdev->dev, "devapc-infra-clock");
- if (IS_ERR(ctx->infra_clk))
- return -EINVAL;
+ if (IS_ERR(ctx->infra_clk)) {
+ ret = -EINVAL;
+ goto err;
+ }
ret = devm_request_irq(&pdev->dev, devapc_irq, devapc_violation_irq,
IRQF_TRIGGER_NONE, "devapc", ctx);
if (ret)
- return ret;
+ goto err;
platform_set_drvdata(pdev, ctx);
start_devapc(ctx);
return 0;
+
+err:
+ iounmap(ctx->infra_base);
+ return ret;
}
static void mtk_devapc_remove(struct platform_device *pdev)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1e758b613212b6964518a67939535910b5aee831
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021011-ending-obliged-c11e@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e758b613212b6964518a67939535910b5aee831 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Wed, 8 Jan 2025 18:29:15 +0100
Subject: [PATCH] iio: dac: ad3552r-common: fix ad3541/2r ranges
Fix ad3541/2r voltage ranges to be as per ad3542r datasheet,
rev. C, table 38 (page 57).
The wrong ad354xr ranges were generating erroneous Vpp output.
In more detail:
- fix the wrong number of ranges: there are 5 ranges, not 6,
- remove the non-existent 0-3V range,
- adjust the order, since ad3552r_find_range() otherwise gets a wrong index,
producing a wrong Vpp as output.
Retested all the ranges on real hardware, EVALAD3542RFMCZ:
adi,output-range-microvolt (fdt):
<(000000) (2500000)>; ok (Rfbx1, switch 10)
<(000000) (5000000)>; ok (Rfbx1, switch 10)
<(000000) (10000000)>; ok (Rfbx1, switch 10)
<(-5000000) (5000000)>; ok (Rfbx2, switch +/- 5)
<(-2500000) (7500000)>; ok (Rfbx2, switch -2.5/7.5)
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Link: https://patch.msgid.link/20250108-wip-bl-ad3552r-axi-v0-iio-testing-carlos-…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/dac/ad3552r-common.c b/drivers/iio/dac/ad3552r-common.c
index 0f495df2e5ce..03e0864f5084 100644
--- a/drivers/iio/dac/ad3552r-common.c
+++ b/drivers/iio/dac/ad3552r-common.c
@@ -22,11 +22,10 @@ EXPORT_SYMBOL_NS_GPL(ad3552r_ch_ranges, "IIO_AD3552R");
const s32 ad3542r_ch_ranges[AD3542R_MAX_RANGES][2] = {
[AD3542R_CH_OUTPUT_RANGE_0__2P5V] = { 0, 2500 },
- [AD3542R_CH_OUTPUT_RANGE_0__3V] = { 0, 3000 },
[AD3542R_CH_OUTPUT_RANGE_0__5V] = { 0, 5000 },
[AD3542R_CH_OUTPUT_RANGE_0__10V] = { 0, 10000 },
- [AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V] = { -2500, 7500 },
- [AD3542R_CH_OUTPUT_RANGE_NEG_5__5V] = { -5000, 5000 }
+ [AD3542R_CH_OUTPUT_RANGE_NEG_5__5V] = { -5000, 5000 },
+ [AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V] = { -2500, 7500 }
};
EXPORT_SYMBOL_NS_GPL(ad3542r_ch_ranges, "IIO_AD3552R");
diff --git a/drivers/iio/dac/ad3552r.h b/drivers/iio/dac/ad3552r.h
index fd5a3dfd1d1c..4b5581039ae9 100644
--- a/drivers/iio/dac/ad3552r.h
+++ b/drivers/iio/dac/ad3552r.h
@@ -131,7 +131,7 @@
#define AD3552R_CH1_ACTIVE BIT(1)
#define AD3552R_MAX_RANGES 5
-#define AD3542R_MAX_RANGES 6
+#define AD3542R_MAX_RANGES 5
#define AD3552R_QUAD_SPI 2
extern const s32 ad3552r_ch_ranges[AD3552R_MAX_RANGES][2];
@@ -189,16 +189,14 @@ enum ad3552r_ch_vref_select {
enum ad3542r_ch_output_range {
/* Range from 0 V to 2.5 V. Requires Rfb1x connection */
AD3542R_CH_OUTPUT_RANGE_0__2P5V,
- /* Range from 0 V to 3 V. Requires Rfb1x connection */
- AD3542R_CH_OUTPUT_RANGE_0__3V,
/* Range from 0 V to 5 V. Requires Rfb1x connection */
AD3542R_CH_OUTPUT_RANGE_0__5V,
/* Range from 0 V to 10 V. Requires Rfb2x connection */
AD3542R_CH_OUTPUT_RANGE_0__10V,
- /* Range from -2.5 V to 7.5 V. Requires Rfb2x connection */
- AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V,
/* Range from -5 V to 5 V. Requires Rfb2x connection */
AD3542R_CH_OUTPUT_RANGE_NEG_5__5V,
+ /* Range from -2.5 V to 7.5 V. Requires Rfb2x connection */
+ AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V,
};
enum ad3552r_ch_output_range {
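To see why the ordering change matters: presumably the driver matches the devicetree min/max against this table and then programs the resulting index into the range-select field, so the table, the enum and the datasheet encoding must all line up. A hypothetical lookup in that spirit (not the driver's actual ad3552r_find_range()):

	/* ranges[i] holds { min_mV, max_mV }; devicetree values are microvolts. */
	static int find_range_index(const s32 (*ranges)[2], int n,
				    s32 min_uv, s32 max_uv)
	{
		int i;

		for (i = 0; i < n; i++)
			if (ranges[i][0] * 1000 == min_uv &&
			    ranges[i][1] * 1000 == max_uv)
				return i;	/* index doubles as the register encoding */

		return -EINVAL;
	}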
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1e758b613212b6964518a67939535910b5aee831
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021011-gully-stillness-9d82@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e758b613212b6964518a67939535910b5aee831 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Wed, 8 Jan 2025 18:29:15 +0100
Subject: [PATCH] iio: dac: ad3552r-common: fix ad3541/2r ranges
Fix ad3541/2r voltage ranges to be as per ad3542r datasheet,
rev. C, table 38 (page 57).
The wrong ad354xr ranges were generating erroneous Vpp output.
In more detail:
- fix the wrong number of ranges: there are 5 ranges, not 6,
- remove the non-existent 0-3V range,
- adjust the order, since ad3552r_find_range() otherwise gets a wrong index,
producing a wrong Vpp as output.
Retested all the ranges on real hardware, EVALAD3542RFMCZ:
adi,output-range-microvolt (fdt):
<(000000) (2500000)>; ok (Rfbx1, switch 10)
<(000000) (5000000)>; ok (Rfbx1, switch 10)
<(000000) (10000000)>; ok (Rfbx1, switch 10)
<(-5000000) (5000000)>; ok (Rfbx2, switch +/- 5)
<(-2500000) (7500000)>; ok (Rfbx2, switch -2.5/7.5)
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Reviewed-by: David Lechner <dlechner(a)baylibre.com>
Link: https://patch.msgid.link/20250108-wip-bl-ad3552r-axi-v0-iio-testing-carlos-…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/dac/ad3552r-common.c b/drivers/iio/dac/ad3552r-common.c
index 0f495df2e5ce..03e0864f5084 100644
--- a/drivers/iio/dac/ad3552r-common.c
+++ b/drivers/iio/dac/ad3552r-common.c
@@ -22,11 +22,10 @@ EXPORT_SYMBOL_NS_GPL(ad3552r_ch_ranges, "IIO_AD3552R");
const s32 ad3542r_ch_ranges[AD3542R_MAX_RANGES][2] = {
[AD3542R_CH_OUTPUT_RANGE_0__2P5V] = { 0, 2500 },
- [AD3542R_CH_OUTPUT_RANGE_0__3V] = { 0, 3000 },
[AD3542R_CH_OUTPUT_RANGE_0__5V] = { 0, 5000 },
[AD3542R_CH_OUTPUT_RANGE_0__10V] = { 0, 10000 },
- [AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V] = { -2500, 7500 },
- [AD3542R_CH_OUTPUT_RANGE_NEG_5__5V] = { -5000, 5000 }
+ [AD3542R_CH_OUTPUT_RANGE_NEG_5__5V] = { -5000, 5000 },
+ [AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V] = { -2500, 7500 }
};
EXPORT_SYMBOL_NS_GPL(ad3542r_ch_ranges, "IIO_AD3552R");
diff --git a/drivers/iio/dac/ad3552r.h b/drivers/iio/dac/ad3552r.h
index fd5a3dfd1d1c..4b5581039ae9 100644
--- a/drivers/iio/dac/ad3552r.h
+++ b/drivers/iio/dac/ad3552r.h
@@ -131,7 +131,7 @@
#define AD3552R_CH1_ACTIVE BIT(1)
#define AD3552R_MAX_RANGES 5
-#define AD3542R_MAX_RANGES 6
+#define AD3542R_MAX_RANGES 5
#define AD3552R_QUAD_SPI 2
extern const s32 ad3552r_ch_ranges[AD3552R_MAX_RANGES][2];
@@ -189,16 +189,14 @@ enum ad3552r_ch_vref_select {
enum ad3542r_ch_output_range {
/* Range from 0 V to 2.5 V. Requires Rfb1x connection */
AD3542R_CH_OUTPUT_RANGE_0__2P5V,
- /* Range from 0 V to 3 V. Requires Rfb1x connection */
- AD3542R_CH_OUTPUT_RANGE_0__3V,
/* Range from 0 V to 5 V. Requires Rfb1x connection */
AD3542R_CH_OUTPUT_RANGE_0__5V,
/* Range from 0 V to 10 V. Requires Rfb2x connection */
AD3542R_CH_OUTPUT_RANGE_0__10V,
- /* Range from -2.5 V to 7.5 V. Requires Rfb2x connection */
- AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V,
/* Range from -5 V to 5 V. Requires Rfb2x connection */
AD3542R_CH_OUTPUT_RANGE_NEG_5__5V,
+ /* Range from -2.5 V to 7.5 V. Requires Rfb2x connection */
+ AD3542R_CH_OUTPUT_RANGE_NEG_2P5__7P5V,
};
enum ad3552r_ch_output_range {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 58db7c5fbe7daa42098d4965133a864f98ba90ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021036-joyride-vitality-e81d@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58db7c5fbe7daa42098d4965133a864f98ba90ba Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 7 Jan 2025 15:39:56 -0500
Subject: [PATCH] mm/hugetlb: fix avoid_reserve to allow taking folio from
subpool
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pseudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below to trigger this special path; every run of the
program causes the global reservation count to increment by one, until it
hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 312ed27b9721..a10d376cb1a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_pages(struct hstate *h)
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
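If the reproducer above is built and run as described, the leak is also easy to observe from userspace, since HugePages_Rsvd in /proc/meminfo reflects the global resv_huge_pages counter:

	# grep HugePages_Rsvd /proc/meminfo
	# ./reproducer ./hugetlb-pool/test
	# grep HugePages_Rsvd /proc/meminfo   # one higher than before on unfixed kernels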
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 58db7c5fbe7daa42098d4965133a864f98ba90ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021033-cupcake-spud-c04d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58db7c5fbe7daa42098d4965133a864f98ba90ba Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 7 Jan 2025 15:39:56 -0500
Subject: [PATCH] mm/hugetlb: fix avoid_reserve to allow taking folio from
subpool
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pseudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below to trigger this special path; every run of the
program causes the global reservation count to increment by one, until it
hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 312ed27b9721..a10d376cb1a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_pages(struct hstate *h)
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 58db7c5fbe7daa42098d4965133a864f98ba90ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021025-deflector-judge-df07@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58db7c5fbe7daa42098d4965133a864f98ba90ba Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 7 Jan 2025 15:39:56 -0500
Subject: [PATCH] mm/hugetlb: fix avoid_reserve to allow taking folio from
subpool
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pseudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below to trigger this special path; every run of the
program causes the global reservation count to increment by one, until it
hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 312ed27b9721..a10d376cb1a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_pages(struct hstate *h)
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 58db7c5fbe7daa42098d4965133a864f98ba90ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021023-plank-tribesman-b3f2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58db7c5fbe7daa42098d4965133a864f98ba90ba Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 7 Jan 2025 15:39:56 -0500
Subject: [PATCH] mm/hugetlb: fix avoid_reserve to allow taking folio from
subpool
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pseudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below to trigger this special path; every run of the
program causes the global reservation count to increment by one, until it
hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 312ed27b9721..a10d376cb1a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_pages(struct hstate *h)
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 58db7c5fbe7daa42098d4965133a864f98ba90ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021020-copper-visibly-e6a9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58db7c5fbe7daa42098d4965133a864f98ba90ba Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Tue, 7 Jan 2025 15:39:56 -0500
Subject: [PATCH] mm/hugetlb: fix avoid_reserve to allow taking folio from
subpool
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pseudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below to trigger this special path; every run of the
program causes the global reservation count to increment by one, until it
hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 312ed27b9721..a10d376cb1a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_pages(struct hstate *h)
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e64f81946adf68cd75e2207dd9a51668348a4af8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021058-paramount-dance-41da@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e64f81946adf68cd75e2207dd9a51668348a4af8 Mon Sep 17 00:00:00 2001
From: Marco Elver <elver(a)google.com>
Date: Fri, 24 Jan 2025 13:01:38 +0100
Subject: [PATCH] kfence: skip __GFP_THISNODE allocations on NUMA systems
On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.
Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.
Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver(a)google.com>
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Chistoph Lameter <cl(a)linux.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 67fc321db79b..102048821c22 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -21,6 +21,7 @@
#include <linux/log2.h>
#include <linux/memblock.h>
#include <linux/moduleparam.h>
+#include <linux/nodemask.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>
#include <linux/random.h>
@@ -1084,6 +1085,7 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
* properties (e.g. reside in DMAable memory).
*/
if ((flags & GFP_ZONEMASK) ||
+ ((flags & __GFP_THISNODE) && num_online_nodes() > 1) ||
(s->flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))) {
atomic_long_inc(&counters[KFENCE_COUNTER_SKIP_INCOMPAT]);
return NULL;
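A condensed sketch of the check the hunk above extends, for readers skimming the digest. This is an illustration only, not the upstream function verbatim; the helper name is made up, while GFP_ZONEMASK, __GFP_THISNODE, num_online_nodes(), SLAB_CACHE_DMA and SLAB_CACHE_DMA32 are the real kernel symbols it builds on:

#include <linux/gfp.h>
#include <linux/nodemask.h>
#include <linux/slab.h>

/*
 * Illustrative helper: a KFENCE allocation is skipped (NULL is returned so
 * the regular slab path services the request) whenever the caller asks for
 * placement KFENCE cannot guarantee -- a specific zone, a specific NUMA
 * node on a multi-node system, or DMA-able memory.
 */
static bool kfence_alloc_is_incompatible(gfp_t flags, slab_flags_t cache_flags)
{
	if (flags & GFP_ZONEMASK)
		return true;	/* caller needs a specific memory zone */
	if ((flags & __GFP_THISNODE) && num_online_nodes() > 1)
		return true;	/* node placement cannot be guaranteed */
	if (cache_flags & (SLAB_CACHE_DMA | SLAB_CACHE_DMA32))
		return true;	/* cache requires DMA-able memory */
	return false;
}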
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 26b63bee2f6e711c5a169997fd126fddcfb90848
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021019-constrain-diligent-85b6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26b63bee2f6e711c5a169997fd126fddcfb90848 Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Fri, 24 Jan 2025 11:45:09 +0800
Subject: [PATCH] xfs: Add error handling for xfs_reflink_cancel_cow_range
In xfs_inactive(), xfs_reflink_cancel_cow_range() is called
without error handling, risking unnoticed failures and
inconsistent behavior compared to other parts of the code.
Fix this issue by adding error handling for
xfs_reflink_cancel_cow_range(), improving code robustness.
Fixes: 6231848c3aa5 ("xfs: check for cow blocks before trying to clear them")
Cc: stable(a)vger.kernel.org # v4.17
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Signed-off-by: Carlos Maiolino <cem(a)kernel.org>
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c95fe1b1de4e..b1f9f156ec88 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1404,8 +1404,11 @@ xfs_inactive(
goto out;
/* Try to clean out the cow blocks if there are any. */
- if (xfs_inode_has_cow_data(ip))
- xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (xfs_inode_has_cow_data(ip)) {
+ error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (error)
+ goto out;
+ }
if (VFS_I(ip)->i_nlink != 0) {
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 26b63bee2f6e711c5a169997fd126fddcfb90848
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021014-dedicate-recycled-1738@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26b63bee2f6e711c5a169997fd126fddcfb90848 Mon Sep 17 00:00:00 2001
From: Wentao Liang <vulab(a)iscas.ac.cn>
Date: Fri, 24 Jan 2025 11:45:09 +0800
Subject: [PATCH] xfs: Add error handling for xfs_reflink_cancel_cow_range
In xfs_inactive(), xfs_reflink_cancel_cow_range() is called
without error handling, risking unnoticed failures and
inconsistent behavior compared to other parts of the code.
Fix this issue by adding error handling for
xfs_reflink_cancel_cow_range(), improving code robustness.
Fixes: 6231848c3aa5 ("xfs: check for cow blocks before trying to clear them")
Cc: stable(a)vger.kernel.org # v4.17
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
Signed-off-by: Carlos Maiolino <cem(a)kernel.org>
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c95fe1b1de4e..b1f9f156ec88 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1404,8 +1404,11 @@ xfs_inactive(
goto out;
/* Try to clean out the cow blocks if there are any. */
- if (xfs_inode_has_cow_data(ip))
- xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (xfs_inode_has_cow_data(ip)) {
+ error = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, true);
+ if (error)
+ goto out;
+ }
if (VFS_I(ip)->i_nlink != 0) {
/*
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 53dac345395c0d2493cbc2f4c85fe38aef5b63f5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021056-sincere-reformist-1cc5@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 53dac345395c0d2493cbc2f4c85fe38aef5b63f5 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic(a)kernel.org>
Date: Sat, 18 Jan 2025 00:24:33 +0100
Subject: [PATCH] hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress.
However wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being
armed. Depending on several considerations (crystal ball power management
based election, earliest timer already enqueued, timer migration enabled or
not), the target may eventually be the current CPU even if offline. If that
happens, the timer is eventually ignored.
The most notable example is RCU, which had to deal with each and every one of
those wake-ups by deferring them to an online CPU, along with related
workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU though as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake up that
eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Call Trace:
<TASK>
start_dl_timer
enqueue_dl_entity
dl_server_start
enqueue_task_fair
enqueue_task
ttwu_do_activate
try_to_wake_up
complete
cpu_stopper_thread
Instead of providing yet another band-aid to work around the situation, fix
it in the hrtimers infrastructure: always migrate a timer away to
an online target whenever it is enqueued from an offline CPU.
This will also allow reverting all of the above disgraceful RCU hacks.
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Vlad Poenaru <vlad.wing(a)gmail.com>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Tested-by: Paul E. McKenney <paulmck(a)kernel.org>
Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
Closes: 20241213203739.1519801-1-usamaarif642(a)gmail.com
diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
index c3b4b7ed7c16..84a5045f80f3 100644
--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
ktime_t softirq_expires_next;
struct hrtimer *softirq_next_timer;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ call_single_data_t csd;
} ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4fb81f8c6f1c..deb1aa32814e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -58,6 +58,8 @@
#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+static void retrigger_next_event(void *arg);
+
/*
* The timer bases:
*
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
- }
+ },
+ .csd = CSD_INIT(retrigger_next_event, NULL)
};
static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
[CLOCK_TAI] = HRTIMER_BASE_TAI,
};
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+ if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ return true;
+ else
+ return likely(base->online);
+}
+
/*
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
@@ -178,27 +189,54 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
}
/*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
*
* Called with cpu_base->lock of target cpu held.
*/
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+ struct hrtimer_cpu_base *new_cpu_base,
+ struct hrtimer_cpu_base *this_cpu_base)
{
ktime_t expires;
+ /*
+ * The local CPU clockevent can be reprogrammed. Also get_target_base()
+ * guarantees it is online.
+ */
+ if (new_cpu_base == this_cpu_base)
+ return true;
+
+ /*
+ * The offline local CPU can't be the default target if the
+ * next remote target event is after this timer. Keep the
+ * elected new base. An IPI will we issued to reprogram
+ * it as a last resort.
+ */
+ if (!hrtimer_base_is_online(this_cpu_base))
+ return true;
+
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
- return expires < new_base->cpu_base->expires_next;
+
+ return expires >= new_base->cpu_base->expires_next;
}
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
- int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
{
+ if (!hrtimer_base_is_online(base)) {
+ int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+ return &per_cpu(hrtimer_bases, cpu);
+ }
+
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
if (static_branch_likely(&timers_migration_enabled) && !pinned)
return &per_cpu(hrtimer_bases, get_nohz_timer_target());
@@ -249,8 +287,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
raw_spin_unlock(&base->cpu_base->lock);
raw_spin_lock(&new_base->cpu_base->lock);
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+ this_cpu_base)) {
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
new_cpu_base = this_cpu_base;
@@ -259,8 +297,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
}
WRITE_ONCE(timer->base, new_base);
} else {
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
new_cpu_base = this_cpu_base;
goto again;
}
@@ -706,8 +743,6 @@ static inline int hrtimer_is_hres_enabled(void)
return hrtimer_hres_enabled;
}
-static void retrigger_next_event(void *arg);
-
/*
* Switch to high resolution mode
*/
@@ -1195,6 +1230,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
u64 delta_ns, const enum hrtimer_mode mode,
struct hrtimer_clock_base *base)
{
+ struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *new_base;
bool force_local, first;
@@ -1206,9 +1242,15 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* and enforce reprogramming after it is queued no matter whether
* it is the new first expiring timer again or not.
*/
- force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+ force_local = base->cpu_base == this_cpu_base;
force_local &= base->cpu_base->next_timer == timer;
+ /*
+ * Don't force local queuing if this enqueue happens on a unplugged
+ * CPU after hrtimer_cpu_dying() has been invoked.
+ */
+ force_local &= this_cpu_base->online;
+
/*
* Remove an active timer from the queue. In case it is not queued
* on the current CPU, make sure that remove_hrtimer() updates the
@@ -1238,8 +1280,27 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
}
first = enqueue_hrtimer(timer, new_base, mode);
- if (!force_local)
- return first;
+ if (!force_local) {
+ /*
+ * If the current CPU base is online, then the timer is
+ * never queued on a remote CPU if it would be the first
+ * expiring timer there.
+ */
+ if (hrtimer_base_is_online(this_cpu_base))
+ return first;
+
+ /*
+ * Timer was enqueued remote because the current base is
+ * already offline. If the timer is the first to expire,
+ * kick the remote CPU to reprogram the clock event.
+ */
+ if (first) {
+ struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+ smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+ }
+ return 0;
+ }
/*
* Timer was forced to stay on the current CPU to avoid
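The hunk above relies on the generic cross-CPU call infrastructure to make an enqueue from an offline CPU reprogram the remote clock event device. A minimal sketch of that pattern, with made-up callback and variable names; call_single_data_t, CSD_INIT() and smp_call_function_single_async() are the real interfaces from <linux/smp.h>:

#include <linux/smp.h>

/* Runs on the target CPU in IPI context; in the patch above this role is
 * played by retrigger_next_event(). */
static void reprogram_remote(void *info)
{
	/* Re-evaluate the earliest expiring timer on this CPU here. */
}

/* One csd per target, initialised once (the patch embeds it in
 * struct hrtimer_cpu_base). */
static call_single_data_t reprogram_csd = CSD_INIT(reprogram_remote, NULL);

static void kick_cpu(int cpu)
{
	/*
	 * Queue the callback on @cpu and raise an IPI if needed.  The call
	 * is asynchronous, so the same csd must not be reused before the
	 * previous invocation has completed.
	 */
	smp_call_function_single_async(cpu, &reprogram_csd);
}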
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 53dac345395c0d2493cbc2f4c85fe38aef5b63f5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021055-chimp-kinswoman-da90@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 53dac345395c0d2493cbc2f4c85fe38aef5b63f5 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic(a)kernel.org>
Date: Sat, 18 Jan 2025 00:24:33 +0100
Subject: [PATCH] hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress.
However wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being
armed. Depending on several considerations (crystal ball power management
based election, earliest timer already enqueued, timer migration enabled or
not), the target may eventually be the current CPU even if offline. If that
happens, the timer is eventually ignored.
The most notable example is RCU, which had to deal with each and every one of
those wake-ups by deferring them to an online CPU, along with related
workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU though as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake up that
eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Call Trace:
<TASK>
start_dl_timer
enqueue_dl_entity
dl_server_start
enqueue_task_fair
enqueue_task
ttwu_do_activate
try_to_wake_up
complete
cpu_stopper_thread
Instead of providing yet another band-aid to work around the situation, fix
it in the hrtimers infrastructure: always migrate a timer away to
an online target whenever it is enqueued from an offline CPU.
This will also allow reverting all of the above disgraceful RCU hacks.
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Vlad Poenaru <vlad.wing(a)gmail.com>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Tested-by: Paul E. McKenney <paulmck(a)kernel.org>
Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
Closes: 20241213203739.1519801-1-usamaarif642(a)gmail.com
diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
index c3b4b7ed7c16..84a5045f80f3 100644
--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
ktime_t softirq_expires_next;
struct hrtimer *softirq_next_timer;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ call_single_data_t csd;
} ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4fb81f8c6f1c..deb1aa32814e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -58,6 +58,8 @@
#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+static void retrigger_next_event(void *arg);
+
/*
* The timer bases:
*
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
- }
+ },
+ .csd = CSD_INIT(retrigger_next_event, NULL)
};
static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
[CLOCK_TAI] = HRTIMER_BASE_TAI,
};
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+ if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ return true;
+ else
+ return likely(base->online);
+}
+
/*
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
@@ -178,27 +189,54 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
}
/*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
*
* Called with cpu_base->lock of target cpu held.
*/
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+ struct hrtimer_cpu_base *new_cpu_base,
+ struct hrtimer_cpu_base *this_cpu_base)
{
ktime_t expires;
+ /*
+ * The local CPU clockevent can be reprogrammed. Also get_target_base()
+ * guarantees it is online.
+ */
+ if (new_cpu_base == this_cpu_base)
+ return true;
+
+ /*
+ * The offline local CPU can't be the default target if the
+ * next remote target event is after this timer. Keep the
+ * elected new base. An IPI will we issued to reprogram
+ * it as a last resort.
+ */
+ if (!hrtimer_base_is_online(this_cpu_base))
+ return true;
+
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
- return expires < new_base->cpu_base->expires_next;
+
+ return expires >= new_base->cpu_base->expires_next;
}
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
- int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
{
+ if (!hrtimer_base_is_online(base)) {
+ int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+ return &per_cpu(hrtimer_bases, cpu);
+ }
+
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
if (static_branch_likely(&timers_migration_enabled) && !pinned)
return &per_cpu(hrtimer_bases, get_nohz_timer_target());
@@ -249,8 +287,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
raw_spin_unlock(&base->cpu_base->lock);
raw_spin_lock(&new_base->cpu_base->lock);
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+ this_cpu_base)) {
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
new_cpu_base = this_cpu_base;
@@ -259,8 +297,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
}
WRITE_ONCE(timer->base, new_base);
} else {
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
new_cpu_base = this_cpu_base;
goto again;
}
@@ -706,8 +743,6 @@ static inline int hrtimer_is_hres_enabled(void)
return hrtimer_hres_enabled;
}
-static void retrigger_next_event(void *arg);
-
/*
* Switch to high resolution mode
*/
@@ -1195,6 +1230,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
u64 delta_ns, const enum hrtimer_mode mode,
struct hrtimer_clock_base *base)
{
+ struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *new_base;
bool force_local, first;
@@ -1206,9 +1242,15 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* and enforce reprogramming after it is queued no matter whether
* it is the new first expiring timer again or not.
*/
- force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+ force_local = base->cpu_base == this_cpu_base;
force_local &= base->cpu_base->next_timer == timer;
+ /*
+ * Don't force local queuing if this enqueue happens on a unplugged
+ * CPU after hrtimer_cpu_dying() has been invoked.
+ */
+ force_local &= this_cpu_base->online;
+
/*
* Remove an active timer from the queue. In case it is not queued
* on the current CPU, make sure that remove_hrtimer() updates the
@@ -1238,8 +1280,27 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
}
first = enqueue_hrtimer(timer, new_base, mode);
- if (!force_local)
- return first;
+ if (!force_local) {
+ /*
+ * If the current CPU base is online, then the timer is
+ * never queued on a remote CPU if it would be the first
+ * expiring timer there.
+ */
+ if (hrtimer_base_is_online(this_cpu_base))
+ return first;
+
+ /*
+ * Timer was enqueued remote because the current base is
+ * already offline. If the timer is the first to expire,
+ * kick the remote CPU to reprogram the clock event.
+ */
+ if (first) {
+ struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+ smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+ }
+ return 0;
+ }
/*
* Timer was forced to stay on the current CPU to avoid
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 53dac345395c0d2493cbc2f4c85fe38aef5b63f5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021054-crushing-bluff-d330@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 53dac345395c0d2493cbc2f4c85fe38aef5b63f5 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic(a)kernel.org>
Date: Sat, 18 Jan 2025 00:24:33 +0100
Subject: [PATCH] hrtimers: Force migrate away hrtimers queued after
CPUHP_AP_HRTIMERS_DYING
hrtimers are migrated away from the dying CPU to any online target at
the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers
handling tasks involved in the CPU hotplug forward progress.
However wakeups can still be performed by the outgoing CPU after
CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being
armed. Depending on several considerations (crystal ball power management
based election, earliest timer already enqueued, timer migration enabled or
not), the target may eventually be the current CPU even if offline. If that
happens, the timer is eventually ignored.
The most notable example is RCU, which had to deal with each and every one of
those wake-ups by deferring them to an online CPU, along with related
workarounds:
_ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
_ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
_ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
The problem isn't confined to RCU though as the stop machine kthread
(which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
of its work through cpu_stop_signal_done() and performs a wake up that
eventually arms the deadline server timer:
WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
Call Trace:
<TASK>
start_dl_timer
enqueue_dl_entity
dl_server_start
enqueue_task_fair
enqueue_task
ttwu_do_activate
try_to_wake_up
complete
cpu_stopper_thread
Instead of providing yet another band-aid to work around the situation, fix
it in the hrtimers infrastructure: always migrate a timer away to
an online target whenever it is enqueued from an offline CPU.
This will also allow reverting all of the above disgraceful RCU hacks.
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Reported-by: Vlad Poenaru <vlad.wing(a)gmail.com>
Reported-by: Usama Arif <usamaarif642(a)gmail.com>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Tested-by: Paul E. McKenney <paulmck(a)kernel.org>
Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org
Closes: 20241213203739.1519801-1-usamaarif642(a)gmail.com
diff --git a/include/linux/hrtimer_defs.h b/include/linux/hrtimer_defs.h
index c3b4b7ed7c16..84a5045f80f3 100644
--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -125,6 +125,7 @@ struct hrtimer_cpu_base {
ktime_t softirq_expires_next;
struct hrtimer *softirq_next_timer;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ call_single_data_t csd;
} ____cacheline_aligned;
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 4fb81f8c6f1c..deb1aa32814e 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -58,6 +58,8 @@
#define HRTIMER_ACTIVE_SOFT (HRTIMER_ACTIVE_HARD << MASK_SHIFT)
#define HRTIMER_ACTIVE_ALL (HRTIMER_ACTIVE_SOFT | HRTIMER_ACTIVE_HARD)
+static void retrigger_next_event(void *arg);
+
/*
* The timer bases:
*
@@ -111,7 +113,8 @@ DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
.clockid = CLOCK_TAI,
.get_time = &ktime_get_clocktai,
},
- }
+ },
+ .csd = CSD_INIT(retrigger_next_event, NULL)
};
static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
@@ -124,6 +127,14 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
[CLOCK_TAI] = HRTIMER_BASE_TAI,
};
+static inline bool hrtimer_base_is_online(struct hrtimer_cpu_base *base)
+{
+ if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ return true;
+ else
+ return likely(base->online);
+}
+
/*
* Functions and macros which are different for UP/SMP systems are kept in a
* single place
@@ -178,27 +189,54 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
}
/*
- * We do not migrate the timer when it is expiring before the next
- * event on the target cpu. When high resolution is enabled, we cannot
- * reprogram the target cpu hardware and we would cause it to fire
- * late. To keep it simple, we handle the high resolution enabled and
- * disabled case similar.
+ * Check if the elected target is suitable considering its next
+ * event and the hotplug state of the current CPU.
+ *
+ * If the elected target is remote and its next event is after the timer
+ * to queue, then a remote reprogram is necessary. However there is no
+ * guarantee the IPI handling the operation would arrive in time to meet
+ * the high resolution deadline. In this case the local CPU becomes a
+ * preferred target, unless it is offline.
+ *
+ * High and low resolution modes are handled the same way for simplicity.
*
* Called with cpu_base->lock of target cpu held.
*/
-static int
-hrtimer_check_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base)
+static bool hrtimer_suitable_target(struct hrtimer *timer, struct hrtimer_clock_base *new_base,
+ struct hrtimer_cpu_base *new_cpu_base,
+ struct hrtimer_cpu_base *this_cpu_base)
{
ktime_t expires;
+ /*
+ * The local CPU clockevent can be reprogrammed. Also get_target_base()
+ * guarantees it is online.
+ */
+ if (new_cpu_base == this_cpu_base)
+ return true;
+
+ /*
+ * The offline local CPU can't be the default target if the
+ * next remote target event is after this timer. Keep the
+ * elected new base. An IPI will we issued to reprogram
+ * it as a last resort.
+ */
+ if (!hrtimer_base_is_online(this_cpu_base))
+ return true;
+
expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
- return expires < new_base->cpu_base->expires_next;
+
+ return expires >= new_base->cpu_base->expires_next;
}
-static inline
-struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
- int pinned)
+static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base, int pinned)
{
+ if (!hrtimer_base_is_online(base)) {
+ int cpu = cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYPE_TIMER));
+
+ return &per_cpu(hrtimer_bases, cpu);
+ }
+
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
if (static_branch_likely(&timers_migration_enabled) && !pinned)
return &per_cpu(hrtimer_bases, get_nohz_timer_target());
@@ -249,8 +287,8 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
raw_spin_unlock(&base->cpu_base->lock);
raw_spin_lock(&new_base->cpu_base->lock);
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base,
+ this_cpu_base)) {
raw_spin_unlock(&new_base->cpu_base->lock);
raw_spin_lock(&base->cpu_base->lock);
new_cpu_base = this_cpu_base;
@@ -259,8 +297,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
}
WRITE_ONCE(timer->base, new_base);
} else {
- if (new_cpu_base != this_cpu_base &&
- hrtimer_check_target(timer, new_base)) {
+ if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_base)) {
new_cpu_base = this_cpu_base;
goto again;
}
@@ -706,8 +743,6 @@ static inline int hrtimer_is_hres_enabled(void)
return hrtimer_hres_enabled;
}
-static void retrigger_next_event(void *arg);
-
/*
* Switch to high resolution mode
*/
@@ -1195,6 +1230,7 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
u64 delta_ns, const enum hrtimer_mode mode,
struct hrtimer_clock_base *base)
{
+ struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
struct hrtimer_clock_base *new_base;
bool force_local, first;
@@ -1206,9 +1242,15 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
* and enforce reprogramming after it is queued no matter whether
* it is the new first expiring timer again or not.
*/
- force_local = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
+ force_local = base->cpu_base == this_cpu_base;
force_local &= base->cpu_base->next_timer == timer;
+ /*
+ * Don't force local queuing if this enqueue happens on a unplugged
+ * CPU after hrtimer_cpu_dying() has been invoked.
+ */
+ force_local &= this_cpu_base->online;
+
/*
* Remove an active timer from the queue. In case it is not queued
* on the current CPU, make sure that remove_hrtimer() updates the
@@ -1238,8 +1280,27 @@ static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
}
first = enqueue_hrtimer(timer, new_base, mode);
- if (!force_local)
- return first;
+ if (!force_local) {
+ /*
+ * If the current CPU base is online, then the timer is
+ * never queued on a remote CPU if it would be the first
+ * expiring timer there.
+ */
+ if (hrtimer_base_is_online(this_cpu_base))
+ return first;
+
+ /*
+ * Timer was enqueued remote because the current base is
+ * already offline. If the timer is the first to expire,
+ * kick the remote CPU to reprogram the clock event.
+ */
+ if (first) {
+ struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
+
+ smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
+ }
+ return 0;
+ }
/*
* Timer was forced to stay on the current CPU to avoid
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 49b9258b05b97c6464e1964b6a2fddb3ddb65d17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021000-backboned-mumble-2daa@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49b9258b05b97c6464e1964b6a2fddb3ddb65d17 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 3 Dec 2024 10:05:53 -0800
Subject: [PATCH] crypto: qce - fix priority to be less than ARMv8 CE
As QCE is an order of magnitude slower than the ARMv8 Crypto Extensions
on the CPU, and is also less well tested, give it a lower priority.
Previously the QCE SHA algorithms had higher priority than the ARMv8 CE
equivalents, and ciphers such as AES-XTS had the same priority, which
meant the QCE versions were chosen if they happened to be loaded later.
Fixes: ec8f5d8f6f76 ("crypto: qce - Qualcomm crypto engine driver")
Cc: stable(a)vger.kernel.org
Cc: Bartosz Golaszewski <brgl(a)bgdev.pl>
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Thara Gopinath <thara.gopinath(a)gmail.com>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c
index 7d811728f047..97b56e92ea33 100644
--- a/drivers/crypto/qce/aead.c
+++ b/drivers/crypto/qce/aead.c
@@ -786,7 +786,7 @@ static int qce_aead_register_one(const struct qce_aead_def *def, struct qce_devi
alg->init = qce_aead_init;
alg->exit = qce_aead_exit;
- alg->base.cra_priority = 300;
+ alg->base.cra_priority = 275;
alg->base.cra_flags = CRYPTO_ALG_ASYNC |
CRYPTO_ALG_ALLOCATES_MEMORY |
CRYPTO_ALG_KERN_DRIVER_ONLY |
diff --git a/drivers/crypto/qce/sha.c b/drivers/crypto/qce/sha.c
index 916908c04b63..c4ddc3b265ee 100644
--- a/drivers/crypto/qce/sha.c
+++ b/drivers/crypto/qce/sha.c
@@ -482,7 +482,7 @@ static int qce_ahash_register_one(const struct qce_ahash_def *def,
base = &alg->halg.base;
base->cra_blocksize = def->blocksize;
- base->cra_priority = 300;
+ base->cra_priority = 175;
base->cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_KERN_DRIVER_ONLY;
base->cra_ctxsize = sizeof(struct qce_sha_ctx);
base->cra_alignmask = 0;
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 5b493fdc1e74..ffb334eb5b34 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -461,7 +461,7 @@ static int qce_skcipher_register_one(const struct qce_skcipher_def *def,
alg->encrypt = qce_skcipher_encrypt;
alg->decrypt = qce_skcipher_decrypt;
- alg->base.cra_priority = 300;
+ alg->base.cra_priority = 275;
alg->base.cra_flags = CRYPTO_ALG_ASYNC |
CRYPTO_ALG_ALLOCATES_MEMORY |
CRYPTO_ALG_KERN_DRIVER_ONLY;
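The numbers above matter because the crypto core resolves an algorithm name to the registered implementation with the highest cra_priority. A minimal sketch of how an in-kernel user ends up on the ARMv8 CE code after this change; the function name is made up, while crypto_alloc_skcipher() is the real API:

#include <crypto/skcipher.h>
#include <linux/err.h>

/*
 * Asking for "xts(aes)" by name lets the crypto core choose among every
 * registered provider (generic C, ARMv8 CE, QCE, ...).  The highest
 * cra_priority wins, so dropping QCE from 300 to 275 means the ARMv8 CE
 * implementation, which keeps priority 300, is now preferred.  The caller
 * owns the handle and must release it with crypto_free_skcipher().
 */
static struct crypto_skcipher *pick_aes_xts(void)
{
	struct crypto_skcipher *tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);

	return IS_ERR(tfm) ? NULL : tfm;
}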
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 49b9258b05b97c6464e1964b6a2fddb3ddb65d17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021059-drinking-september-98ee@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49b9258b05b97c6464e1964b6a2fddb3ddb65d17 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Tue, 3 Dec 2024 10:05:53 -0800
Subject: [PATCH] crypto: qce - fix priority to be less than ARMv8 CE
As QCE is an order of magnitude slower than the ARMv8 Crypto Extensions
on the CPU, and is also less well tested, give it a lower priority.
Previously the QCE SHA algorithms had higher priority than the ARMv8 CE
equivalents, and ciphers such as AES-XTS had the same priority, which
meant the QCE versions were chosen if they happened to be loaded later.
Fixes: ec8f5d8f6f76 ("crypto: qce - Qualcomm crypto engine driver")
Cc: stable(a)vger.kernel.org
Cc: Bartosz Golaszewski <brgl(a)bgdev.pl>
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Thara Gopinath <thara.gopinath(a)gmail.com>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c
index 7d811728f047..97b56e92ea33 100644
--- a/drivers/crypto/qce/aead.c
+++ b/drivers/crypto/qce/aead.c
@@ -786,7 +786,7 @@ static int qce_aead_register_one(const struct qce_aead_def *def, struct qce_devi
alg->init = qce_aead_init;
alg->exit = qce_aead_exit;
- alg->base.cra_priority = 300;
+ alg->base.cra_priority = 275;
alg->base.cra_flags = CRYPTO_ALG_ASYNC |
CRYPTO_ALG_ALLOCATES_MEMORY |
CRYPTO_ALG_KERN_DRIVER_ONLY |
diff --git a/drivers/crypto/qce/sha.c b/drivers/crypto/qce/sha.c
index 916908c04b63..c4ddc3b265ee 100644
--- a/drivers/crypto/qce/sha.c
+++ b/drivers/crypto/qce/sha.c
@@ -482,7 +482,7 @@ static int qce_ahash_register_one(const struct qce_ahash_def *def,
base = &alg->halg.base;
base->cra_blocksize = def->blocksize;
- base->cra_priority = 300;
+ base->cra_priority = 175;
base->cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_KERN_DRIVER_ONLY;
base->cra_ctxsize = sizeof(struct qce_sha_ctx);
base->cra_alignmask = 0;
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 5b493fdc1e74..ffb334eb5b34 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -461,7 +461,7 @@ static int qce_skcipher_register_one(const struct qce_skcipher_def *def,
alg->encrypt = qce_skcipher_encrypt;
alg->decrypt = qce_skcipher_decrypt;
- alg->base.cra_priority = 300;
+ alg->base.cra_priority = 275;
alg->base.cra_flags = CRYPTO_ALG_ASYNC |
CRYPTO_ALG_ALLOCATES_MEMORY |
CRYPTO_ALG_KERN_DRIVER_ONLY;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3751fe2cba2a9fba2204ef62102bc4bb027cec7b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021001-outflank-broken-2dd7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3751fe2cba2a9fba2204ef62102bc4bb027cec7b Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Date: Fri, 13 Dec 2024 15:53:54 +0100
Subject: [PATCH] arm64: dts: qcom: sm8450: Fix CDSP memory length
The address space in CDSP PAS (Peripheral Authentication Service)
remoteproc node should point to the QDSP PUB address space
(QDSP6...SS_PUB), which has a length of 0x10000. The value of 0x1400000 was
copied from an older DTS, but it does not look accurate at all.
This should have no functional impact on Linux users, because the PAS loader
does not use this address space at all.
Fixes: 1172729576fb ("arm64: dts: qcom: sm8450: Add remoteproc enablers and instances")
Cc: stable(a)vger.kernel.org
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20241213-dts-qcom-cdsp-mpss-base-address-v3-5-2e0…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/arch/arm64/boot/dts/qcom/sm8450.dtsi b/arch/arm64/boot/dts/qcom/sm8450.dtsi
index 962023331ac4..b57edfbaf784 100644
--- a/arch/arm64/boot/dts/qcom/sm8450.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8450.dtsi
@@ -2800,7 +2800,7 @@ vamacro: codec@33f0000 {
remoteproc_cdsp: remoteproc@32300000 {
compatible = "qcom,sm8450-cdsp-pas";
- reg = <0 0x32300000 0 0x1400000>;
+ reg = <0 0x32300000 0 0x10000>;
interrupts-extended = <&intc GIC_SPI 578 IRQ_TYPE_EDGE_RISING>,
<&smp2p_cdsp_in 0 IRQ_TYPE_EDGE_RISING>,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 26f6e91fa29a58fdc76b47f94f8f6027944a490c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021032-pregnant-oat-1b18@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26f6e91fa29a58fdc76b47f94f8f6027944a490c Mon Sep 17 00:00:00 2001
From: Chen-Yu Tsai <wenst(a)chromium.org>
Date: Fri, 25 Oct 2024 15:56:28 +0800
Subject: [PATCH] arm64: dts: mediatek: mt8183: Disable DSI display output by
default
Most SoC dtsi files have the display output interfaces disabled by
default, and only enabled on boards that utilize them. The MT8183
has it backwards: the display outputs are left enabled by default,
and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the
normal scheme. For ease of backporting the DPI output is handled
in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node")
Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Fei Shao <fshao(a)chromium.org>
Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
index 61a6f66914b8..dbdee604edab 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
@@ -522,10 +522,6 @@ &scp {
status = "okay";
};
-&dsi0 {
- status = "disabled";
-};
-
&dpi0 {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&dpi_func_pins>;
diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
index 8f31fc9050ec..c7008bb8a81d 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
@@ -1834,6 +1834,7 @@ dsi0: dsi@14014000 {
resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>;
phys = <&mipi_tx0>;
phy-names = "dphy";
+ status = "disabled";
};
dpi0: dpi@14015000 {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 26f6e91fa29a58fdc76b47f94f8f6027944a490c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021032-nanny-finishing-f853@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26f6e91fa29a58fdc76b47f94f8f6027944a490c Mon Sep 17 00:00:00 2001
From: Chen-Yu Tsai <wenst(a)chromium.org>
Date: Fri, 25 Oct 2024 15:56:28 +0800
Subject: [PATCH] arm64: dts: mediatek: mt8183: Disable DSI display output by
default
Most SoC dtsi files have the display output interfaces disabled by
default, and only enabled on boards that utilize them. The MT8183
has it backwards: the display outputs are left enabled by default,
and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the
normal scheme. For ease of backporting the DPI output is handled
in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node")
Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Fei Shao <fshao(a)chromium.org>
Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
index 61a6f66914b8..dbdee604edab 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
@@ -522,10 +522,6 @@ &scp {
status = "okay";
};
-&dsi0 {
- status = "disabled";
-};
-
&dpi0 {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&dpi_func_pins>;
diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
index 8f31fc9050ec..c7008bb8a81d 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
@@ -1834,6 +1834,7 @@ dsi0: dsi@14014000 {
resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>;
phys = <&mipi_tx0>;
phy-names = "dphy";
+ status = "disabled";
};
dpi0: dpi@14015000 {
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 26f6e91fa29a58fdc76b47f94f8f6027944a490c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021031-unhappily-scribble-5cef@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26f6e91fa29a58fdc76b47f94f8f6027944a490c Mon Sep 17 00:00:00 2001
From: Chen-Yu Tsai <wenst(a)chromium.org>
Date: Fri, 25 Oct 2024 15:56:28 +0800
Subject: [PATCH] arm64: dts: mediatek: mt8183: Disable DSI display output by
default
Most SoC dtsi files have the display output interfaces disabled by
default, and only enabled on boards that utilize them. The MT8183
has it backwards: the display outputs are left enabled by default,
and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the
normal scheme. For ease of backporting the DPI output is handled
in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node")
Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Fei Shao <fshao(a)chromium.org>
Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
index 61a6f66914b8..dbdee604edab 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
@@ -522,10 +522,6 @@ &scp {
status = "okay";
};
-&dsi0 {
- status = "disabled";
-};
-
&dpi0 {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&dpi_func_pins>;
diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
index 8f31fc9050ec..c7008bb8a81d 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
@@ -1834,6 +1834,7 @@ dsi0: dsi@14014000 {
resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>;
phys = <&mipi_tx0>;
phy-names = "dphy";
+ status = "disabled";
};
dpi0: dpi@14015000 {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 26f6e91fa29a58fdc76b47f94f8f6027944a490c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021031-uniquely-everglade-75eb@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26f6e91fa29a58fdc76b47f94f8f6027944a490c Mon Sep 17 00:00:00 2001
From: Chen-Yu Tsai <wenst(a)chromium.org>
Date: Fri, 25 Oct 2024 15:56:28 +0800
Subject: [PATCH] arm64: dts: mediatek: mt8183: Disable DSI display output by
default
Most SoC dtsi files have the display output interfaces disabled by
default, and only enabled on boards that utilize them. The MT8183
has it backwards: the display outputs are left enabled by default,
and only disabled at the board level.
Reverse the situation for the DSI output so that it follows the
normal scheme. For ease of backporting the DPI output is handled
in a separate patch.
Fixes: 88ec840270e6 ("arm64: dts: mt8183: Add dsi node")
Fixes: 19b6403f1e2a ("arm64: dts: mt8183: add mt8183 pumpkin board")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
Reviewed-by: Fei Shao <fshao(a)chromium.org>
Link: https://lore.kernel.org/r/20241025075630.3917458-2-wenst@chromium.org
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
index 61a6f66914b8..dbdee604edab 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
@@ -522,10 +522,6 @@ &scp {
status = "okay";
};
-&dsi0 {
- status = "disabled";
-};
-
&dpi0 {
pinctrl-names = "default", "sleep";
pinctrl-0 = <&dpi_func_pins>;
diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
index 8f31fc9050ec..c7008bb8a81d 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
@@ -1834,6 +1834,7 @@ dsi0: dsi@14014000 {
resets = <&mmsys MT8183_MMSYS_SW0_RST_B_DISP_DSI0>;
phys = <&mipi_tx0>;
phy-names = "dphy";
+ status = "disabled";
};
dpi0: dpi@14015000 {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021051-patriarch-epidermis-967b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd Mon Sep 17 00:00:00 2001
From: Andreas Kemnade <andreas(a)kemnade.info>
Date: Wed, 4 Dec 2024 18:41:52 +0100
Subject: [PATCH] ARM: dts: ti/omap: gta04: fix pm issues caused by spi module
Despite CM_IDLEST1_CORE and CM_FCLKEN1_CORE behaving normally,
disabling SPI leads to messages like the following when suspending:
Powerdomain (core_pwrdm) didn't enter target state 0
and, according to /sys/kernel/debug/pm_debug/count, the off state is not
entered. That was not connected to SPI during the discussion
of disabling SPI. See:
https://lore.kernel.org/linux-omap/20230122100852.32ae082c@aktux/
The reason is that SPI is in slave mode per default. The Linux driver
will turn it to master per default. In slave mode, the powerdomain seems to
be kept active if an active chip select input is sensed.
Fix that by explicitly disabling the SPI3 pins, which used to be muxed by
the bootloader, since they are available on an optionally fitted header
which would require dtb overlays anyway.
Fixes: a622310f7f01 ("ARM: dts: gta04: fix excess dma channel usage")
CC: stable(a)vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas(a)kemnade.info>
Reviewed-by: Roger Quadros <rogerq(a)kernel.org>
Link: https://lore.kernel.org/r/20241204174152.2360431-1-andreas@kemnade.info
Signed-off-by: Kevin Hilman <khilman(a)baylibre.com>
diff --git a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
index 2ee3ddd64020..536070e80b2c 100644
--- a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
@@ -446,6 +446,7 @@ &omap3_pmx_core2 {
pinctrl-names = "default";
pinctrl-0 = <
&hsusb2_2_pins
+ &mcspi3hog_pins
>;
hsusb2_2_pins: hsusb2-2-pins {
@@ -459,6 +460,15 @@ OMAP3630_CORE2_IOPAD(0x25fa, PIN_INPUT_PULLDOWN | MUX_MODE3) /* etk_d15.hsusb2_d
>;
};
+ mcspi3hog_pins: mcspi3hog-pins {
+ pinctrl-single,pins = <
+ OMAP3630_CORE2_IOPAD(0x25dc, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d0 */
+ OMAP3630_CORE2_IOPAD(0x25de, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d1 */
+ OMAP3630_CORE2_IOPAD(0x25e0, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d2 */
+ OMAP3630_CORE2_IOPAD(0x25e2, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d3 */
+ >;
+ };
+
spi_gpio_pins: spi-gpio-pinmux-pins {
pinctrl-single,pins = <
OMAP3630_CORE2_IOPAD(0x25d8, PIN_OUTPUT | MUX_MODE4) /* clk */
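For boards that do populate the optional header and want McSPI3 back,
the commit message points at devicetree overlays. A minimal sketch,
assuming the &mcspi3 label from omap3.dtsi; the mcspi3_pins group named
in the comment is hypothetical, its mux values would have to come from
the OMAP3630 TRM, and the hogged etk_d0..d3 pins above would also need
to be re-muxed back to the SPI function:

/dts-v1/;
/plugin/;

/* Hypothetical overlay sketch: re-enable McSPI3 for a fitted header. */
&mcspi3 {
	status = "okay";
	/*
	 * pinctrl-names = "default";
	 * pinctrl-0 = <&mcspi3_pins>;  -- hypothetical pin group muxing
	 *                                 etk_d0..d3 back to mcspi3
	 */
};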
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021051-ricotta-everybody-4e43@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd Mon Sep 17 00:00:00 2001
From: Andreas Kemnade <andreas(a)kemnade.info>
Date: Wed, 4 Dec 2024 18:41:52 +0100
Subject: [PATCH] ARM: dts: ti/omap: gta04: fix pm issues caused by spi module
Despite CM_IDLEST1_CORE and CM_FCLKEN1_CORE behaving normally, disabling
SPI leads to messages like the following when suspending:
Powerdomain (core_pwrdm) didn't enter target state 0
and according to /sys/kernel/debug/pm_debug/count the off state is not
entered. This was not connected to SPI during the earlier discussion
of disabling SPI. See:
https://lore.kernel.org/linux-omap/20230122100852.32ae082c@aktux/
The reason is that the SPI controller is in slave mode by default, and
the Linux driver switches it to master mode by default. In slave mode,
the power domain seems to be kept active if an active chip-select input
is sensed.
Fix that by explicitly disabling the SPI3 pins, which used to be muxed
by the bootloader; they are only available on an optionally fitted
header, which would require dtb overlays anyway.
Fixes: a622310f7f01 ("ARM: dts: gta04: fix excess dma channel usage")
CC: stable(a)vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas(a)kemnade.info>
Reviewed-by: Roger Quadros <rogerq(a)kernel.org>
Link: https://lore.kernel.org/r/20241204174152.2360431-1-andreas@kemnade.info
Signed-off-by: Kevin Hilman <khilman(a)baylibre.com>
diff --git a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
index 2ee3ddd64020..536070e80b2c 100644
--- a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
@@ -446,6 +446,7 @@ &omap3_pmx_core2 {
pinctrl-names = "default";
pinctrl-0 = <
&hsusb2_2_pins
+ &mcspi3hog_pins
>;
hsusb2_2_pins: hsusb2-2-pins {
@@ -459,6 +460,15 @@ OMAP3630_CORE2_IOPAD(0x25fa, PIN_INPUT_PULLDOWN | MUX_MODE3) /* etk_d15.hsusb2_d
>;
};
+ mcspi3hog_pins: mcspi3hog-pins {
+ pinctrl-single,pins = <
+ OMAP3630_CORE2_IOPAD(0x25dc, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d0 */
+ OMAP3630_CORE2_IOPAD(0x25de, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d1 */
+ OMAP3630_CORE2_IOPAD(0x25e0, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d2 */
+ OMAP3630_CORE2_IOPAD(0x25e2, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d3 */
+ >;
+ };
+
spi_gpio_pins: spi-gpio-pinmux-pins {
pinctrl-single,pins = <
OMAP3630_CORE2_IOPAD(0x25d8, PIN_OUTPUT | MUX_MODE4) /* clk */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025021050-talon-glowing-ad64@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0cfbd7805fe13406500e6a6f2aa08f198d5db4bd Mon Sep 17 00:00:00 2001
From: Andreas Kemnade <andreas(a)kemnade.info>
Date: Wed, 4 Dec 2024 18:41:52 +0100
Subject: [PATCH] ARM: dts: ti/omap: gta04: fix pm issues caused by spi module
Despite CM_IDLEST1_CORE and CM_FCLKEN1_CORE behaving normally, disabling
SPI leads to messages like the following when suspending:
Powerdomain (core_pwrdm) didn't enter target state 0
and according to /sys/kernel/debug/pm_debug/count the off state is not
entered. This was not connected to SPI during the earlier discussion
of disabling SPI. See:
https://lore.kernel.org/linux-omap/20230122100852.32ae082c@aktux/
The reason is that the SPI controller is in slave mode by default, and
the Linux driver switches it to master mode by default. In slave mode,
the power domain seems to be kept active if an active chip-select input
is sensed.
Fix that by explicitly disabling the SPI3 pins, which used to be muxed
by the bootloader; they are only available on an optionally fitted
header, which would require dtb overlays anyway.
Fixes: a622310f7f01 ("ARM: dts: gta04: fix excess dma channel usage")
CC: stable(a)vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas(a)kemnade.info>
Reviewed-by: Roger Quadros <rogerq(a)kernel.org>
Link: https://lore.kernel.org/r/20241204174152.2360431-1-andreas@kemnade.info
Signed-off-by: Kevin Hilman <khilman(a)baylibre.com>
diff --git a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
index 2ee3ddd64020..536070e80b2c 100644
--- a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
@@ -446,6 +446,7 @@ &omap3_pmx_core2 {
pinctrl-names = "default";
pinctrl-0 = <
&hsusb2_2_pins
+ &mcspi3hog_pins
>;
hsusb2_2_pins: hsusb2-2-pins {
@@ -459,6 +460,15 @@ OMAP3630_CORE2_IOPAD(0x25fa, PIN_INPUT_PULLDOWN | MUX_MODE3) /* etk_d15.hsusb2_d
>;
};
+ mcspi3hog_pins: mcspi3hog-pins {
+ pinctrl-single,pins = <
+ OMAP3630_CORE2_IOPAD(0x25dc, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d0 */
+ OMAP3630_CORE2_IOPAD(0x25de, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d1 */
+ OMAP3630_CORE2_IOPAD(0x25e0, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d2 */
+ OMAP3630_CORE2_IOPAD(0x25e2, PIN_OUTPUT_PULLDOWN | MUX_MODE4) /* etk_d3 */
+ >;
+ };
+
spi_gpio_pins: spi-gpio-pinmux-pins {
pinctrl-single,pins = <
OMAP3630_CORE2_IOPAD(0x25d8, PIN_OUTPUT | MUX_MODE4) /* clk */