The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 340f0c7067a95281ad13734f8225f49c6cf52067
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061305-kilogram-handheld-7bd9@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
340f0c7067a9 ("eventfs: Update all the eventfs_inodes from the events descriptor")
27c046484382 ("tracefs: Update inode permissions on remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 340f0c7067a95281ad13734f8225f49c6cf52067 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 23 May 2024 01:14:28 -0400
Subject: [PATCH] eventfs: Update all the eventfs_inodes from the events
descriptor
The change to update the permissions of the eventfs_inode had the
misconception that using the tracefs_inode would find all the
eventfs_inodes that have been updated and reset them on remount.
The problem with this approach is that the eventfs_inodes are freed when
they are no longer used (basically the reason the eventfs system exists).
When they are freed, the updated eventfs_inodes are not reset on a remount
because their tracefs_inodes have been freed.
Instead, since the events directory eventfs_inode always has a
tracefs_inode pointing to it (it is not freed when finished), and the
events directory has a link to all its children, have the
eventfs_remount() function only operate on the events eventfs_inode and
have it descend into its children updating their uid and gids.
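
As a rough userspace illustration of the approach described above (start from the one node that always exists and walk down, bounded by a depth limit), here is a minimal sketch. The struct node type, its fields, and set_attrs() are hypothetical stand-ins and do not reflect the kernel's struct eventfs_inode layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an eventfs_inode; not the kernel layout. */
struct node {
	unsigned int uid;
	unsigned int gid;
	struct node *first_child;   /* head of this node's child list */
	struct node *next_sibling;  /* next entry in the parent's child list */
};

#define MAX_DEPTH 3  /* events/<system>/<event> */

/* Update this node and all of its descendants; returns the nodes visited. */
static int set_attrs(struct node *n, bool update_uid, unsigned int uid,
		     bool update_gid, unsigned int gid, int level)
{
	int visited = 1;

	/* Refuse to recurse deeper than the known directory layout. */
	if (level > MAX_DEPTH)
		return 0;

	if (update_uid)
		n->uid = uid;
	if (update_gid)
		n->gid = gid;

	for (struct node *c = n->first_child; c; c = c->next_sibling)
		visited += set_attrs(c, update_uid, uid, update_gid, gid, level + 1);

	return visited;
}
```

Calling set_attrs() on the root with level 0 reaches every child regardless of whether any other object still references it, which is the property the remount path relies on.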
Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVw…
Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Reported-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 5dfb1ccd56ea..129d0f54ba62 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -305,6 +305,45 @@ static const struct file_operations eventfs_file_operations = {
.llseek = generic_file_llseek,
};
+static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t uid,
+ bool update_gid, kgid_t gid, int level)
+{
+ struct eventfs_inode *ei_child;
+
+ /* Update events/<system>/<event> */
+ if (WARN_ON_ONCE(level > 3))
+ return;
+
+ if (update_uid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = uid;
+ }
+
+ if (update_gid) {
+ ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = gid;
+ }
+
+ list_for_each_entry(ei_child, &ei->children, list) {
+ eventfs_set_attrs(ei_child, update_uid, uid, update_gid, gid, level + 1);
+ }
+
+ if (!ei->entry_attrs)
+ return;
+
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (update_uid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
+ ei->entry_attrs[i].uid = uid;
+ }
+ if (update_gid) {
+ ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = gid;
+ }
+ }
+
+}
+
/*
* On a remount of tracefs, if UID or GID options are set, then
* the mount point inode permissions should be used.
@@ -314,33 +353,12 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
{
struct eventfs_inode *ei = ti->private;
- if (!ei)
+ /* Only the events directory does the updates */
+ if (!ei || !ei->is_events || ei->is_freed)
return;
- if (update_uid) {
- ei->attr.mode &= ~EVENTFS_SAVE_UID;
- ei->attr.uid = ti->vfs_inode.i_uid;
- }
-
-
- if (update_gid) {
- ei->attr.mode &= ~EVENTFS_SAVE_GID;
- ei->attr.gid = ti->vfs_inode.i_gid;
- }
-
- if (!ei->entry_attrs)
- return;
-
- for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
- }
- if (update_gid) {
- ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
- ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
- }
- }
+ eventfs_set_attrs(ei, update_uid, ti->vfs_inode.i_uid,
+ update_gid, ti->vfs_inode.i_gid, 0);
}
/* Return the evenfs_inode of the "events" directory */
Errors like the following are sometimes seen when doing a DR swap:
[ 24.672481] ucsi-stm32g0-i2c 0-0035: UCSI_GET_PDOS failed (-5)
[ 24.720188] ucsi-stm32g0-i2c 0-0035: ucsi_handle_connector_change:
GET_CONNECTOR_STATUS failed (-5)
There may be a race that leads to CCI being read before the command complete
flag is set, hence returning -EIO. A similar fix has also been made in
ucsi_acpi [1].
In case of a spurious or otherwise delayed notification it is
possible that CCI still reports the previous completion. The
UCSI spec is aware of this and provides two completion bits in
CCI, one for normal commands and one for acks. As acks and commands
alternate the notification handler can determine if the completion
bit is from the current command.
To fix this, add the ACK_PENDING bit for ucsi_stm32g0 and only complete
commands if the completion bit matches.
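
The matching rule can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: the CCI bit positions are illustrative values, notification_completes() is a hypothetical helper, and the plain bit operations here are not atomic, unlike the test_and_clear_bit() used in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative CCI bit positions; treat these values as assumptions. */
#define CCI_ACK_COMPLETE     (1u << 29)
#define CCI_COMMAND_COMPLETE (1u << 31)

/* Which kind of request is currently in flight. */
enum { COMMAND_PENDING = 1 << 0, ACK_PENDING = 1 << 1 };

/*
 * Returns true only when the completion bit in cci matches the kind of
 * request that is pending, and clears that pending flag. A stale
 * completion of the other kind is ignored.
 */
static bool notification_completes(uint32_t cci, unsigned int *pending)
{
	bool done = false;

	if ((cci & CCI_ACK_COMPLETE) && (*pending & ACK_PENDING)) {
		*pending &= ~(unsigned int)ACK_PENDING;
		done = true;
	}
	if ((cci & CCI_COMMAND_COMPLETE) && (*pending & COMMAND_PENDING)) {
		*pending &= ~(unsigned int)COMMAND_PENDING;
		done = true;
	}
	return done;
}
```

With a command pending, a notification that still carries the previous ack completion returns false, so the waiter is not woken early.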
[1] https://lore.kernel.org/lkml/20240121204123.275441-3-lk@c--e.de/
Fixes: 72849d4fcee7 ("usb: typec: ucsi: stm32g0: add support for stm32g0 controller")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)foss.st.com>
---
Changes in v2: rebase and define ACK_PENDING as commented by Dmitry.
---
drivers/usb/typec/ucsi/ucsi_stm32g0.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/ucsi/ucsi_stm32g0.c b/drivers/usb/typec/ucsi/ucsi_stm32g0.c
index ac48b7763114..ac69288e8bb0 100644
--- a/drivers/usb/typec/ucsi/ucsi_stm32g0.c
+++ b/drivers/usb/typec/ucsi/ucsi_stm32g0.c
@@ -65,6 +65,7 @@ struct ucsi_stm32g0 {
struct device *dev;
unsigned long flags;
#define COMMAND_PENDING 1
+#define ACK_PENDING 2
const char *fw_name;
struct ucsi *ucsi;
bool suspended;
@@ -396,9 +397,13 @@ static int ucsi_stm32g0_sync_write(struct ucsi *ucsi, unsigned int offset, const
size_t len)
{
struct ucsi_stm32g0 *g0 = ucsi_get_drvdata(ucsi);
+ bool ack = UCSI_COMMAND(*(u64 *)val) == UCSI_ACK_CC_CI;
int ret;
- set_bit(COMMAND_PENDING, &g0->flags);
+ if (ack)
+ set_bit(ACK_PENDING, &g0->flags);
+ else
+ set_bit(COMMAND_PENDING, &g0->flags);
ret = ucsi_stm32g0_async_write(ucsi, offset, val, len);
if (ret)
@@ -406,9 +411,14 @@ static int ucsi_stm32g0_sync_write(struct ucsi *ucsi, unsigned int offset, const
if (!wait_for_completion_timeout(&g0->complete, msecs_to_jiffies(5000)))
ret = -ETIMEDOUT;
+ else
+ return 0;
out_clear_bit:
- clear_bit(COMMAND_PENDING, &g0->flags);
+ if (ack)
+ clear_bit(ACK_PENDING, &g0->flags);
+ else
+ clear_bit(COMMAND_PENDING, &g0->flags);
return ret;
}
@@ -429,8 +439,9 @@ static irqreturn_t ucsi_stm32g0_irq_handler(int irq, void *data)
if (UCSI_CCI_CONNECTOR(cci))
ucsi_connector_change(g0->ucsi, UCSI_CCI_CONNECTOR(cci));
- if (test_bit(COMMAND_PENDING, &g0->flags) &&
- cci & (UCSI_CCI_ACK_COMPLETE | UCSI_CCI_COMMAND_COMPLETE))
+ if (cci & UCSI_CCI_ACK_COMPLETE && test_and_clear_bit(ACK_PENDING, &g0->flags))
+ complete(&g0->complete);
+ if (cci & UCSI_CCI_COMMAND_COMPLETE && test_and_clear_bit(COMMAND_PENDING, &g0->flags))
complete(&g0->complete);
return IRQ_HANDLED;
--
2.25.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 13df4d44a3aaabe61cd01d277b6ee23ead2a5206
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061335-payee-pamphlet-09d5@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
57341fe3179c ("ext4: refactor out ext4_generic_attr_show()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13df4d44a3aaabe61cd01d277b6ee23ead2a5206 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Tue, 19 Mar 2024 19:33:20 +0800
Subject: [PATCH] ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G
mount /dev/$disk /tmp/test
echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
dump_stack_lvl+0x2c/0x50
kasan_report+0xb6/0xf0
ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
ext4_mb_new_blocks+0x88a/0x1370 [ext4]
ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
ext4_map_blocks+0x569/0xea0 [ext4]
ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================
The issue is triggered through the following call flow:
// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
ext4_mb_normalize_request
ext4_mb_normalize_group_request
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
ext4_mb_regular_allocator
ext4_mb_choose_next_group
ext4_mb_choose_next_group_best_avail
mb_avg_fragment_size_order
order = fls(len) - 2 = 29
ext4_mb_find_good_group_avg_frag_lists
frag_list = &sbi->s_mb_avg_fragment_size[order]
if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14,
but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds
to be triggered by an attempt to access an element at index 29.
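
The arithmetic can be checked with a small userspace sketch. fls_sketch() is a portable stand-in for the kernel's fls(), NUM_ORDERS is taken from the list length of 14 quoted above, and frag_order() is a simplified, hypothetical version of the order computation that omits the kernel's clamping.

```c
#include <stdbool.h>

/* Stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int fls_sketch(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

#define NUM_ORDERS 14  /* list length at 4k block size, per the text above */

/* Simplified order computation, without any clamping. */
static int frag_order(unsigned int len)
{
	return fls_sketch(len) - 2;
}

/* True when order is a valid index into a list of NUM_ORDERS entries. */
static bool order_in_bounds(int order)
{
	return order >= 0 && order < NUM_ORDERS;
}
```

For len = 2147483647 the highest set bit is bit 31, so the order is 29, far past the last valid index of 13.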
Add a new attr_id attr_clusters_in_group with values in the range
[0, sbi->s_clusters_per_group] and declare mb_group_prealloc as
that type to fix the issue. In addition avoid returning an order
from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb)
and reduce some useless loops.
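
A minimal userspace sketch of the range check follows. store_bounded_uint() is a hypothetical helper that uses strtoul() in place of the kernel's kstrtouint(), so its parsing rules differ slightly (for example, trailing characters are ignored here).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse buf as an unsigned integer and store it only if it lies in
 * [0, max]. Returns 0 on success and -EINVAL otherwise; *out is left
 * untouched on failure.
 */
static int store_bounded_uint(const char *buf, unsigned int max,
			      unsigned int *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(buf, &end, 0);
	if (errno || end == buf || v > UINT_MAX)
		return -EINVAL;
	if (v > max)
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}
```

With max set to an example clusters-per-group value such as 32768, writing 2147483647 is rejected before it can reach the allocator.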
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20240319113325.3110393-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..dbf04f91516c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -831,6 +831,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
return 0;
if (order == MB_NUM_ORDERS(sb))
order--;
+ if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+ order = MB_NUM_ORDERS(sb) - 1;
return order;
}
@@ -1008,6 +1010,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* goal length.
*/
order = fls(ac->ac_g_ex.fe_len) - 1;
+ if (WARN_ON_ONCE(order - 1 > MB_NUM_ORDERS(ac->ac_sb)))
+ order = MB_NUM_ORDERS(ac->ac_sb);
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 7f455b5f22c0..ddd71673176c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
attr_trigger_test_error,
attr_first_error_time,
attr_last_error_time,
+ attr_clusters_in_group,
attr_feature,
attr_pointer_ui,
attr_pointer_ul,
@@ -207,13 +208,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
+ ext4_sb_info, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -376,6 +378,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
switch (a->attr_id) {
case attr_inode_readahead:
+ case attr_clusters_in_group:
case attr_pointer_ui:
if (a->attr_ptr == ptr_ext4_super_block_offset)
return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -455,6 +458,14 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
else
*((unsigned int *) ptr) = t;
return len;
+ case attr_clusters_in_group:
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
+ if (ret)
+ return ret;
+ if (t > sbi->s_clusters_per_group)
+ return -EINVAL;
+ *((unsigned int *) ptr) = t;
+ return len;
case attr_pointer_ul:
ret = kstrtoul(skip_spaces(buf), 0, &lt);
if (ret)
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 13df4d44a3aaabe61cd01d277b6ee23ead2a5206
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061334-impish-backdrop-1c34@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
57341fe3179c ("ext4: refactor out ext4_generic_attr_show()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13df4d44a3aaabe61cd01d277b6ee23ead2a5206 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Tue, 19 Mar 2024 19:33:20 +0800
Subject: [PATCH] ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G
mount /dev/$disk /tmp/test
echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
dump_stack_lvl+0x2c/0x50
kasan_report+0xb6/0xf0
ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
ext4_mb_new_blocks+0x88a/0x1370 [ext4]
ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
ext4_map_blocks+0x569/0xea0 [ext4]
ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================
The issue is triggered through the following call flow:
// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
ext4_mb_normalize_request
ext4_mb_normalize_group_request
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
ext4_mb_regular_allocator
ext4_mb_choose_next_group
ext4_mb_choose_next_group_best_avail
mb_avg_fragment_size_order
order = fls(len) - 2 = 29
ext4_mb_find_good_group_avg_frag_lists
frag_list = &sbi->s_mb_avg_fragment_size[order]
if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14,
but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds
to be triggered by an attempt to access an element at index 29.
Add a new attr_id attr_clusters_in_group with values in the range
[0, sbi->s_clusters_per_group] and declare mb_group_prealloc as
that type to fix the issue. In addition avoid returning an order
from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb)
and reduce some useless loops.
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: stable(a)vger.kernel.org
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20240319113325.3110393-5-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..dbf04f91516c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -831,6 +831,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
return 0;
if (order == MB_NUM_ORDERS(sb))
order--;
+ if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+ order = MB_NUM_ORDERS(sb) - 1;
return order;
}
@@ -1008,6 +1010,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* goal length.
*/
order = fls(ac->ac_g_ex.fe_len) - 1;
+ if (WARN_ON_ONCE(order - 1 > MB_NUM_ORDERS(ac->ac_sb)))
+ order = MB_NUM_ORDERS(ac->ac_sb);
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 7f455b5f22c0..ddd71673176c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
attr_trigger_test_error,
attr_first_error_time,
attr_last_error_time,
+ attr_clusters_in_group,
attr_feature,
attr_pointer_ui,
attr_pointer_ul,
@@ -207,13 +208,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
+ ext4_sb_info, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -376,6 +378,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
switch (a->attr_id) {
case attr_inode_readahead:
+ case attr_clusters_in_group:
case attr_pointer_ui:
if (a->attr_ptr == ptr_ext4_super_block_offset)
return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -455,6 +458,14 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
else
*((unsigned int *) ptr) = t;
return len;
+ case attr_clusters_in_group:
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
+ if (ret)
+ return ret;
+ if (t > sbi->s_clusters_per_group)
+ return -EINVAL;
+ *((unsigned int *) ptr) = t;
+ return len;
case attr_pointer_ul:
ret = kstrtoul(skip_spaces(buf), 0, &lt);
if (ret)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c898afdc15645efb555acb6d85b484eb40a45409
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061333-wincing-tackle-2315@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c898afdc1564 ("9p: add missing locking around taking dentry fid list")
b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers")
47b1e3432b06 ("9p: Remove unnecessary variable for old fids while walking from d_parent")
cba83f47fc0e ("9p: Track the root fid with its own variable during lookups")
b0017602fdf6 ("9p: fix EBADF errors in cached mode")
2a3dcbccd64b ("9p: Fix refcounting during full path walks for fid lookups")
beca774fc51a ("9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl")
6e195b0f7c8e ("9p: fix a bunch of checkpatch warnings")
eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c898afdc15645efb555acb6d85b484eb40a45409 Mon Sep 17 00:00:00 2001
From: Dominique Martinet <asmadeus(a)codewreck.org>
Date: Tue, 21 May 2024 21:13:36 +0900
Subject: [PATCH] 9p: add missing locking around taking dentry fid list
Fix a use-after-free on dentry's d_fsdata fid list when a thread
looks up a fid through dentry while another thread unlinks it:
UAF thread:
refcount_t: addition on 0; use-after-free.
p9_fid_get linux/./include/net/9p/client.h:262
v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129
v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181
v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314
v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400
vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248
Freed by:
p9_fid_destroy (inlined)
p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456
p9_fid_put linux/./include/net/9p/client.h:278
v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55
v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518
vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335
The problem is that d_fsdata was not accessed under d_lock. d_release()
is normally only called once the dentry is otherwise no longer
accessible, but since we also call it explicitly in v9fs_remove(), the
lock is required.
Move the hlist out of the dentry under the lock, then unref its fids
once they are no longer accessible.
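
The "detach under lock, release outside the lock" pattern can be sketched in userspace C with a pthread mutex. The struct item, struct owner, detach_all() and put_all() names are hypothetical; the kernel code uses d_lock, an hlist and p9_fid_put() instead.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical refcounted list element. */
struct item {
	struct item *next;
	int refs;
};

/* Hypothetical owner whose list can be looked up by other threads. */
struct owner {
	pthread_mutex_t lock;
	struct item *head;
};

/* Take the whole list away from the owner while holding the lock. */
static struct item *detach_all(struct owner *o)
{
	struct item *head;

	pthread_mutex_lock(&o->lock);
	head = o->head;
	o->head = NULL;
	pthread_mutex_unlock(&o->lock);
	return head;
}

/*
 * Drop one reference on every detached item; returns how many were
 * processed. The list is private to the caller at this point, so no
 * lock is needed.
 */
static int put_all(struct item *head)
{
	int n = 0;

	while (head) {
		struct item *next = head->next;  /* read before the item may go away */

		head->refs--;
		head = next;
		n++;
	}
	return n;
}
```

After detach_all() returns, a concurrent lookup that takes the lock sees an empty list and can no longer pick up an item that is about to be released.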
Fixes: 154372e67d40 ("fs/9p: fix create-unlink-getattr idiom")
Cc: stable(a)vger.kernel.org
Reported-by: Meysam Firouzi
Reported-by: Amirmohammad Eftekhar
Reviewed-by: Christian Schoenebeck <linux_oss(a)crudebyte.com>
Message-ID: <20240521122947.1080227-1-asmadeus(a)codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index f16f73581634..01338d4c2d9e 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -48,12 +48,17 @@ static int v9fs_cached_dentry_delete(const struct dentry *dentry)
static void v9fs_dentry_release(struct dentry *dentry)
{
struct hlist_node *p, *n;
+ struct hlist_head head;
p9_debug(P9_DEBUG_VFS, " dentry: %pd (%p)\n",
dentry, dentry);
- hlist_for_each_safe(p, n, (struct hlist_head *)&dentry->d_fsdata)
+
+ spin_lock(&dentry->d_lock);
+ hlist_move_list((struct hlist_head *)&dentry->d_fsdata, &head);
+ spin_unlock(&dentry->d_lock);
+
+ hlist_for_each_safe(p, n, &head)
p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
- dentry->d_fsdata = NULL;
}
static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b2747f108b8034271fd5289bd8f3a7003e0775a3
Gitweb: https://git.kernel.org/tip/b2747f108b8034271fd5289bd8f3a7003e0775a3
Author: Benjamin Segall <bsegall(a)google.com>
AuthorDate: Wed, 12 Jun 2024 12:44:44 -07:00
Committer: Borislav Petkov (AMD) <bp(a)alien8.de>
CommitterDate: Thu, 13 Jun 2024 10:32:36 +02:00
x86/boot: Don't add the EFI stub to targets, again
This is a re-commit of
da05b143a308 ("x86/boot: Don't add the EFI stub to targets")
after the tagged patch incorrectly reverted it.
vmlinux-objs-y is added to targets, with an assumption that they are all
relative to $(obj); adding a $(objtree)/drivers/... path causes the
build to incorrectly create a useless
arch/x86/boot/compressed/drivers/... directory tree.
Fix this just by using a different make variable for the EFI stub.
Fixes: cb8bda8ad443 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S")
Signed-off-by: Ben Segall <bsegall(a)google.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Cc: stable(a)vger.kernel.org # v6.1+
Link: https://lore.kernel.org/r/xm267ceukksz.fsf@bsegall.svl.corp.google.com
---
arch/x86/boot/compressed/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 243ee86..f205164 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -105,9 +105,9 @@ vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
-vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
+vmlinux-libs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
-$(obj)/vmlinux: $(vmlinux-objs-y) FORCE
+$(obj)/vmlinux: $(vmlinux-objs-y) $(vmlinux-libs-y) FORCE
$(call if_changed,ld)
OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
Two additional changes not present in the original patch:
1. Check optlen in the XDP_UMEM_REG case as well. It was added in commit
c05cd36458147 ("xsk: add support to allow unaligned chunk placement")
but seems like too big of a change for stable
2. copy_from_sockptr() in the context was replaced with copy_from_user()
because commit a7b75c5a8c414 ("net: pass a sockptr_t into
->setsockopt") is not present
[ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
From: Eric Dumazet <edumazet(a)google.com>
syzbot reported an illegal copy in xsk_setsockopt() [1]
Make sure to validate setsockopt() @optlen parameter.
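
As a minimal userspace sketch of that validation, the hypothetical helper below rejects a caller-supplied buffer that is shorter than the value about to be copied out of it. memcpy() stands in for copy_from_user(), and read_int_option() is not a kernel function.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an int-sized option value out of optval, but only after checking
 * that the caller actually supplied at least that many bytes.
 */
static int read_int_option(const void *optval, size_t optlen, int *out)
{
	if (optlen < sizeof(*out))
		return -EINVAL;

	memcpy(out, optval, sizeof(*out));
	return 0;
}
```

A 2-byte buffer, as in the report below, is rejected with -EINVAL instead of being read as 4 bytes.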
[1]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fb40587de69
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
</TASK>
Allocated by task 7549:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:3966 [inline]
__kmalloc+0x233/0x4a0 mm/slub.c:3979
kmalloc include/linux/slab.h:632 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
The buggy address belongs to the object at ffff888028c6cde0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 1 bytes to the right of
allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
The buggy address belongs to the physical page:
page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x256/0x680 mm/page_alloc.c:4569
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2175
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2391
___slab_alloc+0xc73/0x1260 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
__do_kmalloc_node mm/slub.c:3965 [inline]
__kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
kmalloc_node include/linux/slab.h:648 [inline]
__vmalloc_area_node mm/vmalloc.c:3197 [inline]
__vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
__vmalloc_node mm/vmalloc.c:3457 [inline]
vzalloc+0x79/0x90 mm/vmalloc.c:3530
bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
__sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
__do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
page last free pid 6650 tgid 6647 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
release_pages+0x2117/0x2400 mm/swap.c:1042
tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
__mmput+0x115/0x3c0 kernel/fork.c:1345
exit_mm+0x220/0x310 kernel/exit.c:569
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
get_signal+0x176e/0x1850 kernel/signal.c:2907
arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Memory state around the buggy address:
ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
>ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
^
ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: "Björn Töpel" <bjorn(a)kernel.org>
Cc: Magnus Karlsson <magnus.karlsson(a)intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski(a)intel.com>
Cc: Jonathan Lemon <jonathan.lemon(a)gmail.com>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
---
net/xdp/xsk.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6bb0649c028c..d5a9c43930de 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -515,6 +515,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
struct xdp_umem_reg mr;
struct xdp_umem *umem;
+ if (optlen < sizeof(mr))
+ return -EINVAL;
if (copy_from_user(&mr, optval, sizeof(mr)))
return -EFAULT;
@@ -542,6 +544,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
struct xsk_queue **q;
int entries;
+ if (optlen < sizeof(entries))
+ return -EINVAL;
if (copy_from_user(&entries, optval, sizeof(entries)))
return -EFAULT;
--
2.45.1
[ Upstream commit 237f3cf13b20db183d3706d997eedc3c49eacd44 ]
From: Eric Dumazet <edumazet(a)google.com>
syzbot reported an illegal copy in xsk_setsockopt() [1]
Make sure to validate setsockopt() @optlen parameter.
[1]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
copy_from_sockptr include/linux/sockptr.h:55 [inline]
xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fb40587de69
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
</TASK>
Allocated by task 7549:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:3966 [inline]
__kmalloc+0x233/0x4a0 mm/slub.c:3979
kmalloc include/linux/slab.h:632 [inline]
__cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
__sys_setsockopt+0x1ae/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
The buggy address belongs to the object at ffff888028c6cde0
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 1 bytes to the right of
allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
The buggy address belongs to the physical page:
page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x256/0x680 mm/page_alloc.c:4569
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2175
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2391
___slab_alloc+0xc73/0x1260 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
__do_kmalloc_node mm/slub.c:3965 [inline]
__kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
kmalloc_node include/linux/slab.h:648 [inline]
__vmalloc_area_node mm/vmalloc.c:3197 [inline]
__vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
__vmalloc_node mm/vmalloc.c:3457 [inline]
vzalloc+0x79/0x90 mm/vmalloc.c:3530
bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
__sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
__do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
page last free pid 6650 tgid 6647 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
release_pages+0x2117/0x2400 mm/swap.c:1042
tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
__mmput+0x115/0x3c0 kernel/fork.c:1345
exit_mm+0x220/0x310 kernel/exit.c:569
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
get_signal+0x176e/0x1850 kernel/signal.c:2907
arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Memory state around the buggy address:
ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
>ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
^
ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: "Björn Töpel" <bjorn(a)kernel.org>
Cc: Magnus Karlsson <magnus.karlsson(a)intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski(a)intel.com>
Cc: Jonathan Lemon <jonathan.lemon(a)gmail.com>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[shung-hsi.yu: copy_from_sockptr() in the context was replaced with
copy_from_user() because commit a7b75c5a8c414
("net: pass a sockptr_t into ->setsockopt") was not present]
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
---
Moved the description of the deviation from the original patch down to
the very bottom, as that seems to be what others are doing [1], and it
seems to fit the bottom-posting style better.
1: https://lore.kernel.org/stable/20240602152233.78240-1-gpiccoli@igalia.com/
---
net/xdp/xsk.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d04a2345bc3f..2ffcda7b1678 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -809,6 +809,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
struct xsk_queue **q;
int entries;
+ if (optlen < sizeof(entries))
+ return -EINVAL;
if (copy_from_user(&entries, optval, sizeof(entries)))
return -EFAULT;
--
2.45.2
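For readers following the fix above, the length check can be modeled in user space. This is only a minimal sketch, not the kernel code: read_int_option() is a hypothetical helper, and memcpy() stands in for copy_from_user(). It shows why a 2-byte option buffer (as in the KASAN report) has to be rejected before 4 bytes are read from it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for a setsockopt() handler that
 * expects an int-sized option. memcpy() plays the role of
 * copy_from_user(); the names here are illustrative only.
 */
int read_int_option(const void *optval, size_t optlen, int *out)
{
	/* Reject short buffers before reading sizeof(int) bytes from them. */
	if (optlen < sizeof(*out))
		return -EINVAL;

	memcpy(out, optval, sizeof(*out));
	return 0;
}
```

Without the optlen test, the copy would read past the end of a caller buffer that is shorter than an int, which is the out-of-bounds read syzbot reported.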
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 21ae74e1bf18331ae5e279bd96304b3630828009
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061314-platypus-impaired-f82d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
21ae74e1bf18 ("wifi: ath10k: fix QCOM_RPROC_COMMON dependency")
d03407183d97 ("wifi: ath10k: fix QCOM_SMEM dependency")
4d79f6f34bbb ("wifi: ath10k: Store WLAN firmware version in SMEM image table")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21ae74e1bf18331ae5e279bd96304b3630828009 Mon Sep 17 00:00:00 2001
From: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Date: Fri, 17 May 2024 10:00:28 +0300
Subject: [PATCH] wifi: ath10k: fix QCOM_RPROC_COMMON dependency
If ath10k_snoc is built-in, while Qualcomm remoteprocs are built as
modules, compilation fails with:
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_init':
drivers/net/wireless/ath/ath10k/snoc.c:1534: undefined reference to `qcom_register_ssr_notifier'
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_deinit':
drivers/net/wireless/ath/ath10k/snoc.c:1551: undefined reference to `qcom_unregister_ssr_notifier'
Add corresponding dependency to ATH10K_SNOC Kconfig entry so that it's
built as module if QCOM_RPROC_COMMON is built as module too.
Fixes: 747ff7d3d742 ("ath10k: Don't always treat modem stop events as crashes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://msgid.link/20240511-ath10k-snoc-dep-v1-1-9666e3af5c27@linaro.org
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index e6ea884cafc1..4f385f4a8cef 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -45,6 +45,7 @@ config ATH10K_SNOC
depends on ATH10K
depends on ARCH_QCOM || COMPILE_TEST
depends on QCOM_SMEM
+ depends on QCOM_RPROC_COMMON || QCOM_RPROC_COMMON=n
select QCOM_SCM
select QCOM_QMI_HELPERS
help
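The "depends on QCOM_RPROC_COMMON || QCOM_RPROC_COMMON=n" line in the patch above can be read as tristate arithmetic. The following is a rough model in C, assuming the usual Kconfig semantics (n < m < y, "||" takes the maximum, and "A=B" evaluates to y when the values match and n otherwise). The enum and function names are made up for illustration and are not part of any kernel tool.

```c
/* Tristate values ordered as Kconfig orders them: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "A || B": the larger of the two values. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Kconfig "A=B": y if the values are equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * Upper limit that "depends on X || X=n" places on the dependent
 * symbol, given the value of X (here: QCOM_RPROC_COMMON).
 */
enum tristate snoc_limit(enum tristate rproc_common)
{
	return tri_or(rproc_common, tri_eq(rproc_common, TRI_N));
}
```

Under this model, a modular QCOM_RPROC_COMMON caps ATH10K_SNOC at m, while a disabled or built-in QCOM_RPROC_COMMON leaves it free to be y, which matches the intent stated in the commit message.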
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 21ae74e1bf18331ae5e279bd96304b3630828009
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061313-wistful-dipping-5d5b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
21ae74e1bf18 ("wifi: ath10k: fix QCOM_RPROC_COMMON dependency")
d03407183d97 ("wifi: ath10k: fix QCOM_SMEM dependency")
4d79f6f34bbb ("wifi: ath10k: Store WLAN firmware version in SMEM image table")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21ae74e1bf18331ae5e279bd96304b3630828009 Mon Sep 17 00:00:00 2001
From: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Date: Fri, 17 May 2024 10:00:28 +0300
Subject: [PATCH] wifi: ath10k: fix QCOM_RPROC_COMMON dependency
If ath10k_snoc is built-in, while Qualcomm remoteprocs are built as
modules, compilation fails with:
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_init':
drivers/net/wireless/ath/ath10k/snoc.c:1534: undefined reference to `qcom_register_ssr_notifier'
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_deinit':
drivers/net/wireless/ath/ath10k/snoc.c:1551: undefined reference to `qcom_unregister_ssr_notifier'
Add corresponding dependency to ATH10K_SNOC Kconfig entry so that it's
built as module if QCOM_RPROC_COMMON is built as module too.
Fixes: 747ff7d3d742 ("ath10k: Don't always treat modem stop events as crashes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://msgid.link/20240511-ath10k-snoc-dep-v1-1-9666e3af5c27@linaro.org
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index e6ea884cafc1..4f385f4a8cef 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -45,6 +45,7 @@ config ATH10K_SNOC
depends on ATH10K
depends on ARCH_QCOM || COMPILE_TEST
depends on QCOM_SMEM
+ depends on QCOM_RPROC_COMMON || QCOM_RPROC_COMMON=n
select QCOM_SCM
select QCOM_QMI_HELPERS
help
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0110c4b110477bb1f19b0d02361846be7ab08300
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061358-defile-outplayed-f986@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
0110c4b11047 ("irqchip/riscv-intc: Prevent memory leak when riscv_intc_init_common() fails")
f4cc33e78ba8 ("irqchip/riscv-intc: Introduce Andes hart-level interrupt controller")
96303bcb401c ("irqchip/riscv-intc: Allow large non-standard interrupt number")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0110c4b110477bb1f19b0d02361846be7ab08300 Mon Sep 17 00:00:00 2001
From: Sunil V L <sunilvl(a)ventanamicro.com>
Date: Mon, 27 May 2024 13:41:13 +0530
Subject: [PATCH] irqchip/riscv-intc: Prevent memory leak when
riscv_intc_init_common() fails
When riscv_intc_init_common() fails, the firmware node allocated is not
freed. Add the missing free().
Fixes: 7023b9d83f03 ("irqchip/riscv-intc: Add ACPI support")
Signed-off-by: Sunil V L <sunilvl(a)ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Anup Patel <anup(a)brainfault.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240527081113.616189-1-sunilvl@ventanamicro.com
diff --git a/drivers/irqchip/irq-riscv-intc.c b/drivers/irqchip/irq-riscv-intc.c
index 9e71c4428814..4f3a12383a1e 100644
--- a/drivers/irqchip/irq-riscv-intc.c
+++ b/drivers/irqchip/irq-riscv-intc.c
@@ -253,8 +253,9 @@ IRQCHIP_DECLARE(andes, "andestech,cpu-intc", riscv_intc_init);
static int __init riscv_intc_acpi_init(union acpi_subtable_headers *header,
const unsigned long end)
{
- struct fwnode_handle *fn;
struct acpi_madt_rintc *rintc;
+ struct fwnode_handle *fn;
+ int rc;
rintc = (struct acpi_madt_rintc *)header;
@@ -273,7 +274,11 @@ static int __init riscv_intc_acpi_init(union acpi_subtable_headers *header,
return -ENOMEM;
}
- return riscv_intc_init_common(fn, &riscv_intc_chip);
+ rc = riscv_intc_init_common(fn, &riscv_intc_chip);
+ if (rc)
+ irq_domain_free_fwnode(fn);
+
+ return rc;
}
IRQCHIP_ACPI_DECLARE(riscv_intc, ACPI_MADT_TYPE_RINTC, NULL,
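The error-path cleanup in the patch above follows a common pattern: capture the return code of the init call, release what was allocated if it failed, then return the code. A minimal user-space sketch of that pattern is shown here. All names (node_alloc, node_free, init_common, setup, live_nodes) are hypothetical stand-ins and not the irqchip API.

```c
#include <errno.h>
#include <stdlib.h>

struct node {
	int id;
};

/* Counts outstanding allocations so a leak is observable. */
int live_nodes;

struct node *node_alloc(int id)
{
	struct node *n = malloc(sizeof(*n));

	if (n) {
		n->id = id;
		live_nodes++;
	}
	return n;
}

void node_free(struct node *n)
{
	free(n);
	live_nodes--;
}

/* Stand-in for an init step that can fail: negative ids are rejected. */
static int init_common(const struct node *n)
{
	return n->id < 0 ? -EINVAL : 0;
}

int setup(int id, struct node **out)
{
	struct node *n = node_alloc(id);
	int rc;

	if (!n)
		return -ENOMEM;

	rc = init_common(n);
	if (rc) {
		/* Without this free, the node would leak on failure. */
		node_free(n);
		return rc;
	}

	*out = n;
	return 0;
}
```

Returning the init call's result directly, as the pre-patch code did, leaves no place to release the allocation when that call fails.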
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061321-cleaver-straddle-c86d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
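The guarded division introduced by the patch above can be tried outside the selftest. This is a minimal sketch assuming the same inputs as check_compaction(): compute_compaction_index() is a hypothetical helper, and the checks for a zero hugepage_size and for multiplication overflow are additions for the sketch that are not in the patch.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the hugepage count with strtoul() and refuse to divide when
 * the divisor would be zero or would overflow. Returns 0 and stores
 * the index on success, -1 otherwise.
 */
int compute_compaction_index(unsigned long mem_free,
			     const char *nr_hugepages_str,
			     unsigned long hugepage_size,
			     unsigned long *index)
{
	unsigned long nr_hugepages_ul = strtoul(nr_hugepages_str, NULL, 10);

	/* No huge pages were allocated: report failure instead of dividing. */
	if (!nr_hugepages_ul || !hugepage_size)
		return -1;

	/* Assumption for this sketch: treat an overflowing product as failure. */
	if (nr_hugepages_ul > ULONG_MAX / hugepage_size)
		return -1;

	*index = mem_free / (nr_hugepages_ul * hugepage_size);
	return 0;
}
```

With a count of zero the function fails explicitly, so the outcome no longer depends on how the architecture handles integer division by zero.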
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061319-groovy-unequal-0931@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061318-finalize-junior-4cb2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
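The guard added by the patch above can be sketched in isolation. This is a
minimal illustration, not code from compaction_test.c: the helper name
compaction_index_from_text() and its parameters are hypothetical, and the
errno and multiplication-overflow checks go beyond what the patch itself
does. Integer division by zero is undefined behaviour in C; on Aarch64 the
hardware returns 0 instead of trapping, which is why the unguarded test
could pass silently.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of compaction_test.c): compute
 * mem_free / (nr_hugepages * hugepage_size) from the text read out of
 * /proc/sys/vm/nr_hugepages.
 *
 * Returns 0 and stores the result in *index on success, or -1 when the
 * text cannot be parsed, no huge pages were allocated, or the product
 * would overflow an unsigned long.
 */
static int compaction_index_from_text(const char *nr_hugepages_text,
				      unsigned long mem_free,
				      unsigned long hugepage_size,
				      unsigned long *index)
{
	char *end;
	unsigned long nr;

	errno = 0;
	nr = strtoul(nr_hugepages_text, &end, 10);
	if (errno || end == nr_hugepages_text)
		return -1;	/* out of range, or no digits at all */

	if (nr == 0 || hugepage_size == 0)
		return -1;	/* the division below would be by zero */

	if (nr > ULONG_MAX / hugepage_size)
		return -1;	/* nr * hugepage_size would wrap around */

	*index = mem_free / (nr * hugepage_size);
	return 0;
}
```

A caller would treat a return value of -1 as a test failure rather than
comparing an undefined index against the threshold of 3.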
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061317-attractor-approval-2f8a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. In general, avoid the division by zero by exiting
the program before the division is reached. While at it, fix a typo, and
handle the case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061315-breeding-carry-caf4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. In general, avoid the division by zero by exiting
the program before the division is reached. While at it, fix a typo, and
handle the case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d4202e66a4b1fe6968f17f9f09bbc30d08f028a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061314-unlit-filled-c396@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:56 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success on
Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. In general, avoid the division by zero by exiting
the program before the division is reached. While at it, fix a typo, and
handle the case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061355-send-backwash-9965@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3.
Hence, c_index can always be forced to be less than 3, so the test
always succeeds. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
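The two proofs in the commit message above can be checked with concrete
numbers. The sketch below is illustrative only: the function names are
hypothetical, the quantities x, y and z follow the definitions in the
commit message, and the units are arbitrary (say, MB). The values used in
the note that follows are made up for the example.

```c
/*
 * Memory in use once the test has taken 80% of what is still free,
 * given that hugepages already hold y out of x (first proof):
 * y + 0.8 * (x - y).
 */
static unsigned long consumed_with_preset(unsigned long x, unsigned long y)
{
	return y + (x - y) * 8 / 10;
}

/*
 * compaction_index as defined in the commit message (second proof):
 * (x - y) / z, where z is the memory held by hugepages after the test
 * tries to allocate them again. The caller must ensure z != 0.
 */
static unsigned long c_index(unsigned long x, unsigned long y,
			     unsigned long z)
{
	return (x - y) / z;
}
```

With x = 1000 and a preset y = 300, consumed_with_preset() gives 860,
above the 800 that the 80% rule is meant to allow. With z = y = 300,
c_index() gives 2, so the "compaction_index > 3" check does not fire even
though no additional hugepages were obtained. Starting from y = 0 and
ending with z = 200 gives 5, which the check correctly reports as a
failure.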
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061353-scanner-unusable-1d09@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3.
Hence, c_index can always be forced to be less than 3, so the test
always succeeds. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061351-oxidize-alright-8eee@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
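Note for backporters (not part of the original commit): the two bounds
argued in the commit message can be checked numerically with a small
sketch. The helper names memory_consumed() and c_index_upper_bound() and
any sample values are hypothetical and exist only for illustration; the
formulas are the ones stated in the commit message, where x is mem_free
with no hugepages reserved, y is the memory already held by hugepages,
and z >= y is assumed.

```c
/*
 * Illustration only: these helpers are hypothetical and do not appear
 * in compaction_test.c. They restate the arithmetic of the commit
 * message so the two claims can be checked with concrete numbers.
 */

/*
 * Memory in use while the test runs, given x = mem_free measured with
 * nr_hugepages = 0 and y = memory already held by hugepages.
 * The test works on 80% of what is free, i.e. 0.8 * (x - y), on top of
 * the y that is already filled: y + 0.8 * (x - y) = 0.8 * x + 0.2 * y.
 */
double memory_consumed(double x, double y)
{
	return y + 0.8 * (x - y);
}

/*
 * Upper bound on c_index = (x - y) / z under the commit message's
 * assumption z >= y: (x - y) / z <= (x - y) / y = x / y - 1.
 * The caller must pass y > 0.
 */
double c_index_upper_bound(double x, double y)
{
	return (x - y) / y;
}
```

For example, with x = 1000 and y = 300 (so y > x/4),
memory_consumed(x, y) exceeds memory_consumed(x, 0), and
c_index_upper_bound(x, y) stays below 3, which matches both claims.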
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061349-imply-endnote-102d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061347-carat-unguided-96cc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
000303329752 ("selftests/mm: make migration test robust to failure")
f6dd4e223d87 ("selftests/mm: skip soft-dirty tests on arm64")
ba91e7e5d15a ("selftests/mm: add tests for HWPOISON hugetlbfs read")
63773d2b593d ("Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061345-fanciness-reheat-95c1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
f3b7568c4942 ("selftests/mm: log a consistent test name for check_compaction")
9c1490d911f8 ("selftests/mm: log skipped compaction test as a skip")
8c9eea721a98 ("selftests/mm: skip test if application doesn't has root privileges")
9a21701edc41 ("selftests/mm: conform test to TAP format output")
cb6e7cae1886 ("selftests/mm: gup_test: conform test to TAP format output")
019b277b680f ("selftests: mm: skip whole test instead of failure")
46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061344-citric-service-140d@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64")
69e545edbe8b ("selftests/mm: ksft_exit functions do not return")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:58 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix bogus test success and
reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3.
Hence, c_index can always be forced to be less than 3, so the test
always succeeds. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
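
The two proofs in the commit message above can be checked with small numbers. The following is only an illustrative sketch, not code from compaction_test.c: the helper names memory_consumed() and compaction_index() are hypothetical, all quantities are in arbitrary units (say MB), and the integer arithmetic is an assumption made to keep the example exact.

```c
/*
 * Hypothetical helpers mirroring the symbols of the commit message:
 *   x = mem_free at the start of the test when nr_hugepages == 0
 *   y = memory already consumed by pre-existing hugepages
 *   z = memory consumed by hugepages after the test raises them again
 */

/* Memory under pressure when the test locks 80% of what is still free. */
static unsigned long memory_consumed(unsigned long x, unsigned long y)
{
	return y + (x - y) * 8 / 10;
}

/* c_index = (x - y) / z, as defined in the commit message. */
static unsigned long compaction_index(unsigned long x, unsigned long y,
				      unsigned long z)
{
	return (x - y) / z;
}
```

With x = 1000 and y = 300 (more than 25% of x), memory_consumed(1000, 300) exceeds memory_consumed(1000, 0), which is the OOM argument, and compaction_index(1000, 300, 300) stays below 3 even though no compaction took place, which is the bogus-success argument.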
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 9ad665ef55eaad1ead1406a58a34f615a7c18b5e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061334-splinter-unshipped-1c3e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9ad665ef55eaad1ead1406a58a34f615a7c18b5e Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:57 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix incorrect write of zero to
nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9ad665ef55eaad1ead1406a58a34f615a7c18b5e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061333-unicorn-unstaffed-01d6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9ad665ef55eaad1ead1406a58a34f615a7c18b5e Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:57 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix incorrect write of zero to
nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 9ad665ef55eaad1ead1406a58a34f615a7c18b5e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061332-catering-triage-c259@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9ad665ef55eaad1ead1406a58a34f615a7c18b5e Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:57 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix incorrect write of zero to
nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9ad665ef55eaad1ead1406a58a34f615a7c18b5e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061331-trilogy-bulk-6fc0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9ad665ef55eaad1ead1406a58a34f615a7c18b5e Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:57 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix incorrect write of zero to
nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 9ad665ef55eaad1ead1406a58a34f615a7c18b5e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061331-these-daunting-6dba@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
9ad665ef55ea ("selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9ad665ef55eaad1ead1406a58a34f615a7c18b5e Mon Sep 17 00:00:00 2001
From: Dev Jain <dev.jain(a)arm.com>
Date: Tue, 21 May 2024 13:13:57 +0530
Subject: [PATCH] selftests/mm: compaction_test: fix incorrect write of zero to
nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6d065f507d82307d6161ac75c025111fb8b08a46
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061335-lunchbox-playroom-cf81@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6d065f507d82 ("mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again")
250cb40f0afe ("task_mmu: convert to vma iterator")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6d065f507d82307d6161ac75c025111fb8b08a46 Mon Sep 17 00:00:00 2001
From: Yuanyuan Zhong <yzhong(a)purestorage.com>
Date: Thu, 23 May 2024 12:35:31 -0600
Subject: [PATCH] mm: /proc/pid/smaps_rollup: avoid skipping vma after getting
mmap_lock again
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop. So the current
VMA needs to be addressed before the continue statement.
Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.
Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Yuanyuan Zhong <yzhong(a)purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e5a5f015ff03..f8d35f993fe5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -970,12 +970,17 @@ static int show_smaps_rollup(struct seq_file *m, void *v)
break;
/* Case 1 and 2 above */
- if (vma->vm_start >= last_vma_end)
+ if (vma->vm_start >= last_vma_end) {
+ smap_gather_stats(vma, &mss, 0);
+ last_vma_end = vma->vm_end;
continue;
+ }
/* Case 4 above */
- if (vma->vm_end > last_vma_end)
+ if (vma->vm_end > last_vma_end) {
smap_gather_stats(vma, &mss, last_vma_end);
+ last_vma_end = vma->vm_end;
+ }
}
} for_each_vma(vmi, vma);
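
The control-flow issue described in the commit above is a general property of C do-while loops: continue jumps to the loop condition, and if the condition is what advances the iterator, anything not handled before continue is skipped. The sketch below is a simplified userspace model, not kernel code; struct range and gather_rest() are hypothetical stand-ins for the VMA and the stats gathering.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open address range. */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Sum the bytes of every range that lies beyond last_end. */
static unsigned long gather_rest(const struct range *r, size_t n,
				 unsigned long last_end)
{
	unsigned long total = 0;
	size_t i = 0;

	if (n == 0)
		return 0;

	do {
		if (r[i].start >= last_end) {
			/* Whole range is new: account for it before continue. */
			total += r[i].end - r[i].start;
			last_end = r[i].end;
			continue; /* jumps to the condition, which advances i */
		}
		if (r[i].end > last_end) {
			/* Only the tail past last_end is new. */
			total += r[i].end - last_end;
			last_end = r[i].end;
		}
	} while (++i < n);

	return total;
}
```

If the accounting inside the first branch were dropped, every range taking that branch would be silently lost from the total, which corresponds to the under-reporting described in the commit message.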
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 55d134a7b499c77e7cfd0ee41046f3c376e791e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061318-magical-unclamped-49ef@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
55d134a7b499 ("mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid")
a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA")
79dfc695525f ("hugetlb: add demote hugetlb page sysfs interfaces")
6eb4e88a6d27 ("hugetlb: create remove_hugetlb_page() to separate functionality")
262443c0421e ("hugetlb: no need to drop hugetlb_lock to call cma_release")
9157c31186c3 ("hugetlb: convert PageHugeTemporary() to HPageTemporary flag")
8f251a3d5ce3 ("hugetlb: convert page_huge_active() HPageMigratable flag")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
dbfee5aee7e5 ("hugetlb: fix update_and_free_page contig page struct assumption")
ecbf4724e606 ("mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active")
0eb2df2b5629 ("mm: hugetlb: fix a race between isolating and freeing page")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55d134a7b499c77e7cfd0ee41046f3c376e791e5 Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:15 +0000
Subject: [PATCH] mm/hugetlb: pass correct order_per_bit to
cma_declare_contiguous_nid
The hugetlb_cma code passes 0 in the order_per_bit argument to
cma_declare_contiguous_nid (the alignment, computed using the page order,
is correctly passed in).
This causes a bit in the cma allocation bitmap to always represent a 4k
page, making the bitmaps potentially very large, and slower.
It would create bitmaps that would be pretty big. E.g. for a 4k page
size on x86, hugetlb_cma=64G would mean a bitmap size of (64G / 4k) / 8
== 2M. With HUGETLB_PAGE_ORDER as order_per_bit, as intended, this
would be (64G / 2M) / 8 == 4k. So, that's quite a difference.
Also, this restricted the hugetlb_cma area to ((PAGE_SIZE <<
MAX_PAGE_ORDER) * 8) * PAGE_SIZE (e.g. 128G on x86), since
bitmap_alloc uses normal page allocation, and is thus restricted by
MAX_PAGE_ORDER. Specifying anything above that would fail the CMA
initialization.
So, correctly pass in the order instead.
Link: https://lkml.kernel.org/r/20240404162515.527802-2-fvdl@google.com
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 228c886c46c1..5dc3f5ea3a2e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7794,9 +7794,9 @@ void __init hugetlb_cma_reserve(int order)
* huge page demotion.
*/
res = cma_declare_contiguous_nid(0, size, 0,
- PAGE_SIZE << HUGETLB_PAGE_ORDER,
- 0, false, name,
- &hugetlb_cma[nid], nid);
+ PAGE_SIZE << HUGETLB_PAGE_ORDER,
+ HUGETLB_PAGE_ORDER, false, name,
+ &hugetlb_cma[nid], nid);
if (res) {
pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
res, nid);
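
The bitmap sizes quoted in the commit message above follow from one division per bit and one per byte. The sketch below reproduces that arithmetic only; bitmap_bytes() is a hypothetical helper, it ignores any rounding the kernel applies when allocating the bitmap, and the order of 9 for a 2M huge page with 4k base pages is taken from the x86 example in the message.

```c
/*
 * Hypothetical helper: bytes needed for a bitmap in which one bit covers
 * (page_size << order_per_bit) bytes of an area of area_bytes bytes.
 */
static unsigned long long bitmap_bytes(unsigned long long area_bytes,
				       unsigned long long page_size,
				       unsigned int order_per_bit)
{
	unsigned long long bytes_per_bit = page_size << order_per_bit;

	return (area_bytes / bytes_per_bit) / 8;
}
```

For hugetlb_cma=64G with 4k pages, bitmap_bytes(64ULL << 30, 4096, 0) gives 2M, while bitmap_bytes(64ULL << 30, 4096, 9) gives 4k, matching the figures in the commit message.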
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 55d134a7b499c77e7cfd0ee41046f3c376e791e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061317-rework-obituary-d23b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
55d134a7b499 ("mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid")
a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA")
79dfc695525f ("hugetlb: add demote hugetlb page sysfs interfaces")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55d134a7b499c77e7cfd0ee41046f3c376e791e5 Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:15 +0000
Subject: [PATCH] mm/hugetlb: pass correct order_per_bit to
cma_declare_contiguous_nid
The hugetlb_cma code passes 0 in the order_per_bit argument to
cma_declare_contiguous_nid (the alignment, computed using the page order,
is correctly passed in).
This causes a bit in the cma allocation bitmap to always represent a 4k
page, making the bitmaps potentially very large, and slower.
It would create bitmaps that would be pretty big. E.g. for a 4k page
size on x86, hugetlb_cma=64G would mean a bitmap size of (64G / 4k) / 8
== 2M. With HUGETLB_PAGE_ORDER as order_per_bit, as intended, this
would be (64G / 2M) / 8 == 4k. So, that's quite a difference.
Also, this restricted the hugetlb_cma area to ((PAGE_SIZE <<
MAX_PAGE_ORDER) * 8) * PAGE_SIZE (e.g. 128G on x86), since
bitmap_alloc uses normal page allocation, and is thus restricted by
MAX_PAGE_ORDER. Specifying anything above that would fail the CMA
initialization.
So, correctly pass in the order instead.
Link: https://lkml.kernel.org/r/20240404162515.527802-2-fvdl@google.com
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 228c886c46c1..5dc3f5ea3a2e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7794,9 +7794,9 @@ void __init hugetlb_cma_reserve(int order)
* huge page demotion.
*/
res = cma_declare_contiguous_nid(0, size, 0,
- PAGE_SIZE << HUGETLB_PAGE_ORDER,
- 0, false, name,
- &hugetlb_cma[nid], nid);
+ PAGE_SIZE << HUGETLB_PAGE_ORDER,
+ HUGETLB_PAGE_ORDER, false, name,
+ &hugetlb_cma[nid], nid);
if (res) {
pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
res, nid);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b174f139bdc8aaaf72f5b67ad1bd512c4868a87e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061304-wolverine-clamshell-84a2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
b174f139bdc8 ("mm/cma: drop incorrect alignment check in cma_init_reserved_mem")
e16faf26780f ("cma: factor out minimum alignment requirement")
658aafc8139c ("memblock: exclude MEMBLOCK_NOMAP regions from kmemleak")
a7259df76702 ("memblock: make memblock_find_in_range method private")
a70bb580bfea ("Merge tag 'devicetree-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b174f139bdc8aaaf72f5b67ad1bd512c4868a87e Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:14 +0000
Subject: [PATCH] mm/cma: drop incorrect alignment check in
cma_init_reserved_mem
cma_init_reserved_mem uses IS_ALIGNED to check if the size represented by
one bit in the cma allocation bitmask is aligned with
CMA_MIN_ALIGNMENT_BYTES (pageblock size).
However, this is too strict, as this will fail if order_per_bit >
pageblock_order, which is a valid configuration.
We could check IS_ALIGNED both ways, but since both numbers are powers of
two, no check is needed at all.
Link: https://lkml.kernel.org/r/20240404162515.527802-1-fvdl@google.com
Fixes: de9e14eebf33 ("drivers: dma-contiguous: add initialization from device tree")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/cma.c b/mm/cma.c
index 01f5a8f71ddf..3e9724716bad 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -182,10 +182,6 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
if (!size || !memblock_is_region_reserved(base, size))
return -EINVAL;
- /* alignment should be aligned with order_per_bit */
- if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
- return -EINVAL;
-
/* ensure minimal alignment required by mm core */
if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
return -EINVAL;
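
Why the dropped check was too strict can be seen with plain integers. The sketch below uses a simplified local copy of an IS_ALIGNED-style macro rather than the kernel's definition, and the helper names are hypothetical. The value 512 for the minimum alignment in pages (pageblock order 9) and order 18 for a gigantic page are assumptions chosen for illustration.

```c
#include <stdbool.h>

/* Simplified local macro; valid only when a is a power of two. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* The removed check: is the minimum alignment a multiple of 1 << order? */
static bool old_check_passes(unsigned long min_align_pages,
			     unsigned int order_per_bit)
{
	return IS_ALIGNED(min_align_pages, 1UL << order_per_bit);
}

/* For two powers of two, one is always a multiple of the other. */
static bool aligned_either_way(unsigned long a, unsigned long b)
{
	return IS_ALIGNED(a, b) || IS_ALIGNED(b, a);
}
```

Under these assumptions old_check_passes(512, 9) holds but old_check_passes(512, 18) does not, even though aligned_either_way(512, 1UL << 18) is true, which is the reasoning the commit message gives for needing no check at all.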
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x b174f139bdc8aaaf72f5b67ad1bd512c4868a87e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061303-scouting-precise-e914@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
b174f139bdc8 ("mm/cma: drop incorrect alignment check in cma_init_reserved_mem")
e16faf26780f ("cma: factor out minimum alignment requirement")
658aafc8139c ("memblock: exclude MEMBLOCK_NOMAP regions from kmemleak")
a7259df76702 ("memblock: make memblock_find_in_range method private")
a70bb580bfea ("Merge tag 'devicetree-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b174f139bdc8aaaf72f5b67ad1bd512c4868a87e Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:14 +0000
Subject: [PATCH] mm/cma: drop incorrect alignment check in
cma_init_reserved_mem
cma_init_reserved_mem uses IS_ALIGNED to check if the size represented by
one bit in the cma allocation bitmask is aligned with
CMA_MIN_ALIGNMENT_BYTES (pageblock size).
However, this is too strict, as this will fail if order_per_bit >
pageblock_order, which is a valid configuration.
We could check IS_ALIGNED both ways, but since both numbers are powers of
two, no check is needed at all.
Link: https://lkml.kernel.org/r/20240404162515.527802-1-fvdl@google.com
Fixes: de9e14eebf33 ("drivers: dma-contiguous: add initialization from device tree")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/cma.c b/mm/cma.c
index 01f5a8f71ddf..3e9724716bad 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -182,10 +182,6 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	if (!size || !memblock_is_region_reserved(base, size))
 		return -EINVAL;
 
-	/* alignment should be aligned with order_per_bit */
-	if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
-		return -EINVAL;
-
 	/* ensure minimal alignment required by mm core */
 	if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
 		return -EINVAL;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b174f139bdc8aaaf72f5b67ad1bd512c4868a87e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061302-polyester-ahead-33a7@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
b174f139bdc8 ("mm/cma: drop incorrect alignment check in cma_init_reserved_mem")
e16faf26780f ("cma: factor out minimum alignment requirement")
658aafc8139c ("memblock: exclude MEMBLOCK_NOMAP regions from kmemleak")
a7259df76702 ("memblock: make memblock_find_in_range method private")
a70bb580bfea ("Merge tag 'devicetree-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b174f139bdc8aaaf72f5b67ad1bd512c4868a87e Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:14 +0000
Subject: [PATCH] mm/cma: drop incorrect alignment check in cma_init_reserved_mem
cma_init_reserved_mem uses IS_ALIGNED to check if the size represented by
one bit in the cma allocation bitmask is aligned with
CMA_MIN_ALIGNMENT_BYTES (pageblock size).
However, this is too strict, as this will fail if order_per_bit >
pageblock_order, which is a valid configuration.
We could check IS_ALIGNED both ways, but since both numbers are powers of
two, no check is needed at all.
Link: https://lkml.kernel.org/r/20240404162515.527802-1-fvdl@google.com
Fixes: de9e14eebf33 ("drivers: dma-contiguous: add initialization from device tree")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/cma.c b/mm/cma.c
index 01f5a8f71ddf..3e9724716bad 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -182,10 +182,6 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	if (!size || !memblock_is_region_reserved(base, size))
 		return -EINVAL;
 
-	/* alignment should be aligned with order_per_bit */
-	if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
-		return -EINVAL;
-
 	/* ensure minimal alignment required by mm core */
 	if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
 		return -EINVAL;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b174f139bdc8aaaf72f5b67ad1bd512c4868a87e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061301-runt-mannish-7604@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b174f139bdc8 ("mm/cma: drop incorrect alignment check in cma_init_reserved_mem")
e16faf26780f ("cma: factor out minimum alignment requirement")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b174f139bdc8aaaf72f5b67ad1bd512c4868a87e Mon Sep 17 00:00:00 2001
From: Frank van der Linden <fvdl(a)google.com>
Date: Thu, 4 Apr 2024 16:25:14 +0000
Subject: [PATCH] mm/cma: drop incorrect alignment check in cma_init_reserved_mem
cma_init_reserved_mem uses IS_ALIGNED to check if the size represented by
one bit in the cma allocation bitmask is aligned with
CMA_MIN_ALIGNMENT_BYTES (pageblock size).
However, this is too strict, as this will fail if order_per_bit >
pageblock_order, which is a valid configuration.
We could check IS_ALIGNED both ways, but since both numbers are powers of
two, no check is needed at all.
Link: https://lkml.kernel.org/r/20240404162515.527802-1-fvdl@google.com
Fixes: de9e14eebf33 ("drivers: dma-contiguous: add initialization from device tree")
Signed-off-by: Frank van der Linden <fvdl(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/cma.c b/mm/cma.c
index 01f5a8f71ddf..3e9724716bad 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -182,10 +182,6 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	if (!size || !memblock_is_region_reserved(base, size))
 		return -EINVAL;
 
-	/* alignment should be aligned with order_per_bit */
-	if (!IS_ALIGNED(CMA_MIN_ALIGNMENT_PAGES, 1 << order_per_bit))
-		return -EINVAL;
-
 	/* ensure minimal alignment required by mm core */
 	if (!IS_ALIGNED(base | size, CMA_MIN_ALIGNMENT_BYTES))
 		return -EINVAL;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 3f858bbf04dbac934ac279aaee05d49eb9910051
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061326-catalyst-ridden-9b12@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f858bbf04dbac934ac279aaee05d49eb9910051 Mon Sep 17 00:00:00 2001
From: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Date: Wed, 13 Mar 2024 11:16:32 +1300
Subject: [PATCH] i2c: acpi: Unbind mux adapters before delete
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.
Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().
When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.
However, the following warning and stack trace is output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
<TASK>
? kernfs_remove_by_name_ns+0xb9/0xc0
? __warn+0x7c/0x130
? kernfs_remove_by_name_ns+0xb9/0xc0
? report_bug+0x171/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? kernfs_remove_by_name_ns+0xb9/0xc0
? kernfs_remove_by_name_ns+0xb9/0xc0
acpi_unbind_one+0x108/0x180
device_del+0x18b/0x490
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_del_adapter.part.0+0x1bf/0x250
i2c_mux_del_adapters+0xa1/0xe0
i2c_device_remove+0x1e/0x80
device_release_driver_internal+0x19a/0x200
bus_remove_device+0xbf/0x100
device_del+0x157/0x490
? __pfx_device_match_fwnode+0x10/0x10
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_acpi_notify+0x10f/0x140
notifier_call_chain+0x58/0xd0
blocking_notifier_call_chain+0x3a/0x60
acpi_device_del_work_fn+0x85/0x1d0
process_one_work+0x134/0x2f0
worker_thread+0x2f0/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe3/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...
The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.
Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.
The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.
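The ordering problem described above can be pictured with a toy model. This is a sketch only: the toy_* types and functions are hypothetical and are not the kernel's structures or API; they just mimic "companion deleted first, adapter torn down later".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT kernel structures. */
struct toy_acpi_dev {
	bool present;			/* false once the overlay removed it */
};

struct toy_adapter {
	struct toy_acpi_dev *companion;	/* binding set up at load time */
};

/* Models the step the fix adds to the remove notification: drop the
 * binding while the ACPI peer still exists, but leave the adapter
 * itself alone (its lifetime follows the mux's i2c_client). */
static void toy_notify_remove(struct toy_adapter *ad)
{
	assert(ad != NULL);
	ad->companion = NULL;
}

/* Models the later adapter teardown. Returns false in the case that
 * corresponds to the kernfs warning: a binding is still recorded but
 * the peer it points at is already gone. */
static bool toy_del_adapter(struct toy_adapter *ad)
{
	if (ad->companion && !ad->companion->present)
		return false;
	ad->companion = NULL;
	return true;
}
```

Without the notification step, teardown finds a stale binding and reports the failure case; with it, teardown has nothing left to unbind and proceeds as before.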
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable(a)vger.kernel.org> # v4.8+
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			   void *arg)
 {
@@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			break;
 
 		client = i2c_acpi_find_client_by_adev(adev);
-		if (!client)
-			break;
+		if (client) {
+			i2c_unregister_device(client);
+			put_device(&client->dev);
+		}
+
+		adapter = i2c_acpi_find_adapter_by_adev(adev);
+		if (adapter) {
+			acpi_unbind_one(&adapter->dev);
+			put_device(&adapter->dev);
+		}
 
-		i2c_unregister_device(client);
-		put_device(&client->dev);
 		break;
 	}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3f858bbf04dbac934ac279aaee05d49eb9910051
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061323-augmented-much-5265@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f858bbf04dbac934ac279aaee05d49eb9910051 Mon Sep 17 00:00:00 2001
From: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Date: Wed, 13 Mar 2024 11:16:32 +1300
Subject: [PATCH] i2c: acpi: Unbind mux adapters before delete
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.
Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().
When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.
However, the following warning and stack trace is output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
<TASK>
? kernfs_remove_by_name_ns+0xb9/0xc0
? __warn+0x7c/0x130
? kernfs_remove_by_name_ns+0xb9/0xc0
? report_bug+0x171/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? kernfs_remove_by_name_ns+0xb9/0xc0
? kernfs_remove_by_name_ns+0xb9/0xc0
acpi_unbind_one+0x108/0x180
device_del+0x18b/0x490
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_del_adapter.part.0+0x1bf/0x250
i2c_mux_del_adapters+0xa1/0xe0
i2c_device_remove+0x1e/0x80
device_release_driver_internal+0x19a/0x200
bus_remove_device+0xbf/0x100
device_del+0x157/0x490
? __pfx_device_match_fwnode+0x10/0x10
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_acpi_notify+0x10f/0x140
notifier_call_chain+0x58/0xd0
blocking_notifier_call_chain+0x3a/0x60
acpi_device_del_work_fn+0x85/0x1d0
process_one_work+0x134/0x2f0
worker_thread+0x2f0/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe3/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...
The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.
Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.
The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable(a)vger.kernel.org> # v4.8+
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			   void *arg)
 {
@@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			break;
 
 		client = i2c_acpi_find_client_by_adev(adev);
-		if (!client)
-			break;
+		if (client) {
+			i2c_unregister_device(client);
+			put_device(&client->dev);
+		}
+
+		adapter = i2c_acpi_find_adapter_by_adev(adev);
+		if (adapter) {
+			acpi_unbind_one(&adapter->dev);
+			put_device(&adapter->dev);
+		}
 
-		i2c_unregister_device(client);
-		put_device(&client->dev);
 		break;
 	}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3f858bbf04dbac934ac279aaee05d49eb9910051
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061321-unfeeling-stinging-0b73@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f858bbf04dbac934ac279aaee05d49eb9910051 Mon Sep 17 00:00:00 2001
From: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Date: Wed, 13 Mar 2024 11:16:32 +1300
Subject: [PATCH] i2c: acpi: Unbind mux adapters before delete
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.
Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().
When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.
However, the following warning and stack trace is output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
<TASK>
? kernfs_remove_by_name_ns+0xb9/0xc0
? __warn+0x7c/0x130
? kernfs_remove_by_name_ns+0xb9/0xc0
? report_bug+0x171/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? kernfs_remove_by_name_ns+0xb9/0xc0
? kernfs_remove_by_name_ns+0xb9/0xc0
acpi_unbind_one+0x108/0x180
device_del+0x18b/0x490
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_del_adapter.part.0+0x1bf/0x250
i2c_mux_del_adapters+0xa1/0xe0
i2c_device_remove+0x1e/0x80
device_release_driver_internal+0x19a/0x200
bus_remove_device+0xbf/0x100
device_del+0x157/0x490
? __pfx_device_match_fwnode+0x10/0x10
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_acpi_notify+0x10f/0x140
notifier_call_chain+0x58/0xd0
blocking_notifier_call_chain+0x3a/0x60
acpi_device_del_work_fn+0x85/0x1d0
process_one_work+0x134/0x2f0
worker_thread+0x2f0/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe3/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...
The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.
Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.
The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable(a)vger.kernel.org> # v4.8+
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			   void *arg)
 {
@@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			break;
 
 		client = i2c_acpi_find_client_by_adev(adev);
-		if (!client)
-			break;
+		if (client) {
+			i2c_unregister_device(client);
+			put_device(&client->dev);
+		}
+
+		adapter = i2c_acpi_find_adapter_by_adev(adev);
+		if (adapter) {
+			acpi_unbind_one(&adapter->dev);
+			put_device(&adapter->dev);
+		}
 
-		i2c_unregister_device(client);
-		put_device(&client->dev);
 		break;
 	}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3f858bbf04dbac934ac279aaee05d49eb9910051
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061319-epic-chunk-b98b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f858bbf04dbac934ac279aaee05d49eb9910051 Mon Sep 17 00:00:00 2001
From: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Date: Wed, 13 Mar 2024 11:16:32 +1300
Subject: [PATCH] i2c: acpi: Unbind mux adapters before delete
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.
Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().
When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.
However, the following warning and stack trace is output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
<TASK>
? kernfs_remove_by_name_ns+0xb9/0xc0
? __warn+0x7c/0x130
? kernfs_remove_by_name_ns+0xb9/0xc0
? report_bug+0x171/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? kernfs_remove_by_name_ns+0xb9/0xc0
? kernfs_remove_by_name_ns+0xb9/0xc0
acpi_unbind_one+0x108/0x180
device_del+0x18b/0x490
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_del_adapter.part.0+0x1bf/0x250
i2c_mux_del_adapters+0xa1/0xe0
i2c_device_remove+0x1e/0x80
device_release_driver_internal+0x19a/0x200
bus_remove_device+0xbf/0x100
device_del+0x157/0x490
? __pfx_device_match_fwnode+0x10/0x10
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_acpi_notify+0x10f/0x140
notifier_call_chain+0x58/0xd0
blocking_notifier_call_chain+0x3a/0x60
acpi_device_del_work_fn+0x85/0x1d0
process_one_work+0x134/0x2f0
worker_thread+0x2f0/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe3/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...
The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.
Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.
The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable(a)vger.kernel.org> # v4.8+
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+	return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			   void *arg)
 {
@@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
 			break;
 
 		client = i2c_acpi_find_client_by_adev(adev);
-		if (!client)
-			break;
+		if (client) {
+			i2c_unregister_device(client);
+			put_device(&client->dev);
+		}
+
+		adapter = i2c_acpi_find_adapter_by_adev(adev);
+		if (adapter) {
+			acpi_unbind_one(&adapter->dev);
+			put_device(&adapter->dev);
+		}
 
-		i2c_unregister_device(client);
-		put_device(&client->dev);
 		break;
 	}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3f858bbf04dbac934ac279aaee05d49eb9910051
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061317-avid-favoring-8698@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3f858bbf04db ("i2c: acpi: Unbind mux adapters before delete")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3f858bbf04dbac934ac279aaee05d49eb9910051 Mon Sep 17 00:00:00 2001
From: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Date: Wed, 13 Mar 2024 11:16:32 +1300
Subject: [PATCH] i2c: acpi: Unbind mux adapters before delete
There is an issue with ACPI overlay table removal specifically related
to I2C multiplexers.
Consider an ACPI SSDT Overlay that defines a PCA9548 I2C mux on an
existing I2C bus. When this table is loaded we see the creation of a
device for the overall PCA9548 chip and 8 further devices - one
i2c_adapter each for the mux channels. These are all bound to their
ACPI equivalents via an eventual invocation of acpi_bind_one().
When we unload the SSDT overlay we run into the problem. The ACPI
devices are deleted as normal via acpi_device_del_work_fn() and the
acpi_device_del_list.
However, the following warning and stack trace are output as the
deletion does not go smoothly:
------------[ cut here ]------------
kernfs: can not remove 'physical_node', no directory
WARNING: CPU: 1 PID: 11 at fs/kernfs/dir.c:1674 kernfs_remove_by_name_ns+0xb9/0xc0
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u128:0 Not tainted 6.8.0-rc6+ #1
Hardware name: congatec AG conga-B7E3/conga-B7E3, BIOS 5.13 05/16/2023
Workqueue: kacpi_hotplug acpi_device_del_work_fn
RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0
Code: e4 00 48 89 ef e8 07 71 db ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 a7 55 e4 00 0f 0b eb a6 48 c7 c7 f0 38 0d 9d e8 97 0a d5 ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffff9f864008fb28 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8ef90a8d4940 RCX: 0000000000000000
RDX: ffff8f000e267d10 RSI: ffff8f000e25c780 RDI: ffff8f000e25c780
RBP: ffff8ef9186f9870 R08: 0000000000013ffb R09: 00000000ffffbfff
R10: 00000000ffffbfff R11: ffff8f000e0a0000 R12: ffff9f864008fb50
R13: ffff8ef90c93dd60 R14: ffff8ef9010d0958 R15: ffff8ef9186f98c8
FS: 0000000000000000(0000) GS:ffff8f000e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f48f5253a08 CR3: 00000003cb82e000 CR4: 00000000003506f0
Call Trace:
<TASK>
? kernfs_remove_by_name_ns+0xb9/0xc0
? __warn+0x7c/0x130
? kernfs_remove_by_name_ns+0xb9/0xc0
? report_bug+0x171/0x1a0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? kernfs_remove_by_name_ns+0xb9/0xc0
? kernfs_remove_by_name_ns+0xb9/0xc0
acpi_unbind_one+0x108/0x180
device_del+0x18b/0x490
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_del_adapter.part.0+0x1bf/0x250
i2c_mux_del_adapters+0xa1/0xe0
i2c_device_remove+0x1e/0x80
device_release_driver_internal+0x19a/0x200
bus_remove_device+0xbf/0x100
device_del+0x157/0x490
? __pfx_device_match_fwnode+0x10/0x10
? srso_return_thunk+0x5/0x5f
device_unregister+0xd/0x30
i2c_acpi_notify+0x10f/0x140
notifier_call_chain+0x58/0xd0
blocking_notifier_call_chain+0x3a/0x60
acpi_device_del_work_fn+0x85/0x1d0
process_one_work+0x134/0x2f0
worker_thread+0x2f0/0x410
? __pfx_worker_thread+0x10/0x10
kthread+0xe3/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
---[ end trace 0000000000000000 ]---
...
repeated 7 more times, 1 for each channel of the mux
...
The issue is that the binding of the ACPI devices to their peer I2C
adapters is not correctly cleaned up. Digging deeper into the issue we
see that the deletion order is such that the ACPI devices matching the
mux channel i2c adapters are deleted first during the SSDT overlay
removal. For each of the channels we see a call to i2c_acpi_notify()
with ACPI_RECONFIG_DEVICE_REMOVE but, because these devices are not
actually i2c_clients, nothing is done for them.
Later on, after each of the mux channels has been dealt with, we come
to delete the i2c_client representing the PCA9548 device. This is the
call stack we see above, whereby the kernel cleans up the i2c_client
including destruction of the mux and its channel adapters. At this
point we do attempt to unbind from the ACPI peers but those peers no
longer exist and so we hit the kernfs errors.
The fix is to augment i2c_acpi_notify() to handle i2c_adapters. But,
given that the life cycle of the adapters is linked to the i2c_client,
instead of deleting the i2c_adapters during the i2c_acpi_notify(), we
just trigger unbinding of the ACPI device from the adapter device, and
allow the clean up of the adapter to continue in the way it always has.
Signed-off-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications")
Cc: <stable(a)vger.kernel.org> # v4.8+
Signed-off-by: Wolfram Sang <wsa+renesas(a)sang-engineering.com>
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index d6037a328669..14ae0cfc325e 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -445,6 +445,11 @@ static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
}
+static struct i2c_adapter *i2c_acpi_find_adapter_by_adev(struct acpi_device *adev)
+{
+ return i2c_find_adapter_by_fwnode(acpi_fwnode_handle(adev));
+}
+
static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
void *arg)
{
@@ -471,11 +476,17 @@ static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
break;
client = i2c_acpi_find_client_by_adev(adev);
- if (!client)
- break;
+ if (client) {
+ i2c_unregister_device(client);
+ put_device(&client->dev);
+ }
+
+ adapter = i2c_acpi_find_adapter_by_adev(adev);
+ if (adapter) {
+ acpi_unbind_one(&adapter->dev);
+ put_device(&adapter->dev);
+ }
- i2c_unregister_device(client);
- put_device(&client->dev);
break;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0eafc58f2194dbd01d4be40f99a697681171995b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061323-deuce-expose-15f0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0eafc58f2194 ("HID: i2c-hid: elan: fix reset suspend current leakage")
f2f43bf15d7a ("HID: i2c-hid: elan: Add ili9882t timing")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0eafc58f2194dbd01d4be40f99a697681171995b Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Tue, 7 May 2024 16:48:18 +0200
Subject: [PATCH] HID: i2c-hid: elan: fix reset suspend current leakage
The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
shares the VCC33 supply with other peripherals that may remain powered
during suspend (e.g. when enabled as wakeup sources).
The reset line is also wired so that it can be left deasserted when the
supply is off.
This is important as it avoids holding the controller in reset for
extended periods of time when it remains powered, which can lead to
increased power consumption, and also avoids leaking current through the
X13s reset circuitry during suspend (and after driver unbind).
Use the new 'no-reset-on-power-off' devicetree property to determine
when reset needs to be asserted on power down.
Notably this also avoids wasting power on machine variants without a
touchscreen for which the driver would otherwise exit probe with reset
asserted.
Fixes: bd3cba00dcc6 ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
Cc: <stable(a)vger.kernel.org> # 6.0
Cc: Douglas Anderson <dianders(a)chromium.org>
Tested-by: Steev Klimaszewski <steev(a)kali.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/r/20240507144821.12275-5-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/i2c-hid/i2c-hid-of-elan.c b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
index 5b91fb106cfc..091e37933225 100644
--- a/drivers/hid/i2c-hid/i2c-hid-of-elan.c
+++ b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
@@ -31,6 +31,7 @@ struct i2c_hid_of_elan {
struct regulator *vcc33;
struct regulator *vccio;
struct gpio_desc *reset_gpio;
+ bool no_reset_on_power_off;
const struct elan_i2c_hid_chip_data *chip_data;
};
@@ -40,17 +41,17 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
container_of(ops, struct i2c_hid_of_elan, ops);
int ret;
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->vcc33) {
ret = regulator_enable(ihid_elan->vcc33);
if (ret)
- return ret;
+ goto err_deassert_reset;
}
ret = regulator_enable(ihid_elan->vccio);
- if (ret) {
- regulator_disable(ihid_elan->vcc33);
- return ret;
- }
+ if (ret)
+ goto err_disable_vcc33;
if (ihid_elan->chip_data->post_power_delay_ms)
msleep(ihid_elan->chip_data->post_power_delay_ms);
@@ -60,6 +61,15 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
msleep(ihid_elan->chip_data->post_gpio_reset_on_delay_ms);
return 0;
+
+err_disable_vcc33:
+ if (ihid_elan->vcc33)
+ regulator_disable(ihid_elan->vcc33);
+err_deassert_reset:
+ if (ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 0);
+
+ return ret;
}
static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
@@ -67,7 +77,14 @@ static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
struct i2c_hid_of_elan *ihid_elan =
container_of(ops, struct i2c_hid_of_elan, ops);
- gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+ /*
+ * Do not assert reset when the hardware allows for it to remain
+ * deasserted regardless of the state of the (shared) power supply to
+ * avoid wasting power when the supply is left on.
+ */
+ if (!ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->chip_data->post_gpio_reset_off_delay_ms)
msleep(ihid_elan->chip_data->post_gpio_reset_off_delay_ms);
@@ -79,6 +96,7 @@ static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
static int i2c_hid_of_elan_probe(struct i2c_client *client)
{
struct i2c_hid_of_elan *ihid_elan;
+ int ret;
ihid_elan = devm_kzalloc(&client->dev, sizeof(*ihid_elan), GFP_KERNEL);
if (!ihid_elan)
@@ -93,21 +111,38 @@ static int i2c_hid_of_elan_probe(struct i2c_client *client)
if (IS_ERR(ihid_elan->reset_gpio))
return PTR_ERR(ihid_elan->reset_gpio);
+ ihid_elan->no_reset_on_power_off = of_property_read_bool(client->dev.of_node,
+ "no-reset-on-power-off");
+
ihid_elan->vccio = devm_regulator_get(&client->dev, "vccio");
- if (IS_ERR(ihid_elan->vccio))
- return PTR_ERR(ihid_elan->vccio);
+ if (IS_ERR(ihid_elan->vccio)) {
+ ret = PTR_ERR(ihid_elan->vccio);
+ goto err_deassert_reset;
+ }
ihid_elan->chip_data = device_get_match_data(&client->dev);
if (ihid_elan->chip_data->main_supply_name) {
ihid_elan->vcc33 = devm_regulator_get(&client->dev,
ihid_elan->chip_data->main_supply_name);
- if (IS_ERR(ihid_elan->vcc33))
- return PTR_ERR(ihid_elan->vcc33);
+ if (IS_ERR(ihid_elan->vcc33)) {
+ ret = PTR_ERR(ihid_elan->vcc33);
+ goto err_deassert_reset;
+ }
}
- return i2c_hid_core_probe(client, &ihid_elan->ops,
- ihid_elan->chip_data->hid_descriptor_address, 0);
+ ret = i2c_hid_core_probe(client, &ihid_elan->ops,
+ ihid_elan->chip_data->hid_descriptor_address, 0);
+ if (ret)
+ goto err_deassert_reset;
+
+ return 0;
+
+err_deassert_reset:
+ if (ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 0);
+
+ return ret;
}
static const struct elan_i2c_hid_chip_data elan_ekth6915_chip_data = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c898afdc15645efb555acb6d85b484eb40a45409
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061357-product-rigid-0f3e@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c898afdc1564 ("9p: add missing locking around taking dentry fid list")
b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers")
47b1e3432b06 ("9p: Remove unnecessary variable for old fids while walking from d_parent")
cba83f47fc0e ("9p: Track the root fid with its own variable during lookups")
b0017602fdf6 ("9p: fix EBADF errors in cached mode")
2a3dcbccd64b ("9p: Fix refcounting during full path walks for fid lookups")
beca774fc51a ("9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl")
6e195b0f7c8e ("9p: fix a bunch of checkpatch warnings")
eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c898afdc15645efb555acb6d85b484eb40a45409 Mon Sep 17 00:00:00 2001
From: Dominique Martinet <asmadeus(a)codewreck.org>
Date: Tue, 21 May 2024 21:13:36 +0900
Subject: [PATCH] 9p: add missing locking around taking dentry fid list
Fix a use-after-free on dentry's d_fsdata fid list when a thread
looks up a fid through dentry while another thread unlinks it:
UAF thread:
refcount_t: addition on 0; use-after-free.
p9_fid_get linux/./include/net/9p/client.h:262
v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129
v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181
v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314
v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400
vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248
Freed by:
p9_fid_destroy (inlined)
p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456
p9_fid_put linux/./include/net/9p/client.h:278
v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55
v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518
vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335
The problem is that d_fsdata was not accessed under d_lock. d_release()
is normally only called once the dentry is otherwise no longer
accessible, but since we also call it explicitly in v9fs_remove(), the
lock is required there.
Fix this by moving the hlist out of the dentry under the lock, then
unref its fids once they are no longer accessible.
Fixes: 154372e67d40 ("fs/9p: fix create-unlink-getattr idiom")
Cc: stable(a)vger.kernel.org
Reported-by: Meysam Firouzi
Reported-by: Amirmohammad Eftekhar
Reviewed-by: Christian Schoenebeck <linux_oss(a)crudebyte.com>
Message-ID: <20240521122947.1080227-1-asmadeus(a)codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index f16f73581634..01338d4c2d9e 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -48,12 +48,17 @@ static int v9fs_cached_dentry_delete(const struct dentry *dentry)
static void v9fs_dentry_release(struct dentry *dentry)
{
struct hlist_node *p, *n;
+ struct hlist_head head;
p9_debug(P9_DEBUG_VFS, " dentry: %pd (%p)\n",
dentry, dentry);
- hlist_for_each_safe(p, n, (struct hlist_head *)&dentry->d_fsdata)
+
+ spin_lock(&dentry->d_lock);
+ hlist_move_list((struct hlist_head *)&dentry->d_fsdata, &head);
+ spin_unlock(&dentry->d_lock);
+
+ hlist_for_each_safe(p, n, &head)
p9_fid_put(hlist_entry(p, struct p9_fid, dlist));
- dentry->d_fsdata = NULL;
}
static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
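The "detach under the lock, release outside it" pattern used by the 9p
fix above can be sketched in user space as follows. This is only an
illustration under stated assumptions, not the kernel code: struct fid,
struct entry, entry_release() and the atomic_flag spinlock are
hypothetical stand-ins for p9_fid, the dentry, v9fs_dentry_release() and
d_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct p9_fid on a dentry's fid list. */
struct fid {
	struct fid *next;
	int refcount;
};

/* Hypothetical stand-in for a dentry: a lock plus the list it guards. */
struct entry {
	atomic_flag lock;	/* plays the role of dentry->d_lock */
	struct fid *fids;	/* plays the role of the d_fsdata hlist */
};

static void entry_init(struct entry *e)
{
	atomic_flag_clear(&e->lock);
	e->fids = NULL;
}

static void entry_lock(struct entry *e)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;
}

static void entry_unlock(struct entry *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Drop one reference; real code would free the object at zero. */
static void fid_put(struct fid *f)
{
	f->refcount--;
}

/*
 * Detach the whole list while holding the lock so that concurrent
 * lookups can no longer find these fids, then drop the references
 * without holding the lock. Returns the number of fids released.
 */
static int entry_release(struct entry *e)
{
	struct fid *head, *next;
	int n = 0;

	entry_lock(e);
	head = e->fids;
	e->fids = NULL;
	entry_unlock(e);

	for (; head; head = next) {
		next = head->next;	/* read before the put may free it */
		fid_put(head);
		n++;
	}
	return n;
}
```

A lookup path would take the same lock before walking e->fids, so it
sees either the full list or an empty one, never a list whose entries
are being released underneath it.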
Hi Greg, Sasha,
This round includes pending -stable backport fixes for 4.19.
This large batch includes dependency patches and fixes which are
already present in -stable kernels >= 5.4.x but not in 4.19.x.
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 0c2a85edd143 ("netfilter: nf_tables: pass context to nft_set_destroy()")
2) f8bb7889af58 ("netfilter: nftables: rename set element data activation/deactivation functions")
3) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
4) 3b18d5eba491 ("netfilter: nft_set_rbtree: allow loose matching of closing element in interval")
5) 340eaff65116 ("netfilter: nft_set_rbtree: Add missing expired checks")
6) c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
7) 61ae320a29b0 ("netfilter: nft_set_rbtree: fix null deref on element insertion")
8) f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk")
9) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
10) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
11) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
12) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
13) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
14) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
15) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
16) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
17) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
18) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
19) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
20) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
21) 6069da443bf6 ("netfilter: nf_tables: unregister flowtable hooks on netns exit")
22) f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path")
23) 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase")
24) 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
25) c9bd26513b3a ("netfilter: nf_tables: disable toggling dormant table state more than once")
26) ("netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush (for 4.19)")
NB: This patch does not exist in any upstream tree, but there is a similar patch already in 5.4
27) 917d80d376ff ("netfilter: nft_dynset: fix timeouts later than 23 days")
28) fd94d9dadee5 ("netfilter: nftables: exthdr: fix 4-byte stack OOB write")
29) 95cd4bca7b1f ("netfilter: nft_dynset: report EOPNOTSUPP on missing set feature")
30) 7b1394892de8 ("netfilter: nft_dynset: relax superfluous check on set updates")
31) 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on transaction abort")
32) 6b1ca88e4bb6 ("netfilter: nf_tables: skip dead set elements in netlink dump")
33) d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family")
34) 60c0c230c6f0 ("netfilter: nft_set_rbtree: skip end interval element from gc")
35) bccebf647017 ("netfilter: nf_tables: set dormant flag on hook register failure")
36) 7e0f122c6591 ("netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()")
37) 4a0e7f2decbf ("netfilter: nf_tables: do not compare internal table flags on updates")
38) 552705a3650b ("netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout")
39) 994209ddf4f4 ("netfilter: nf_tables: reject new basechain after table flag update")
40) 1bc83a019bbe ("netfilter: nf_tables: discard table flag update with pending basechain deletion")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nftables: exthdr: fix 4-byte stack OOB write
netfilter: nf_tables: mark newset as dead on transaction abort
netfilter: nf_tables: set dormant flag on hook register failure
Ignat Korchagin (1):
netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()
Pablo Neira Ayuso (34):
netfilter: nf_tables: pass context to nft_set_destroy()
netfilter: nftables: rename set element data activation/deactivation functions
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nft_set_rbtree: allow loose matching of closing element in interval
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nft_set_rbtree: fix overlap expiration walk
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: unregister flowtable hooks on netns exit
netfilter: nf_tables: double hook unregistration in netns path
netfilter: nftables: update table flags from the commit phase
netfilter: nf_tables: fix table flag updates
netfilter: nf_tables: disable toggling dormant table state more than once
netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush (for 4.19)
netfilter: nft_dynset: fix timeouts later than 23 days
netfilter: nft_dynset: report EOPNOTSUPP on missing set feature
netfilter: nft_dynset: relax superfluous check on set updates
netfilter: nf_tables: skip dead set elements in netlink dump
netfilter: nf_tables: validate NFPROTO_* family
netfilter: nft_set_rbtree: skip end interval element from gc
netfilter: nf_tables: do not compare internal table flags on updates
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: discard table flag update with pending basechain deletion
Phil Sutter (1):
netfilter: nft_set_rbtree: Add missing expired checks
include/net/netfilter/nf_tables.h | 132 +++---
include/uapi/linux/netfilter/nf_tables.h | 1 +
net/netfilter/nf_tables_api.c | 529 +++++++++++++++++++----
net/netfilter/nft_chain_filter.c | 3 +
net/netfilter/nft_compat.c | 32 ++
net/netfilter/nft_dynset.c | 24 +-
net/netfilter/nft_exthdr.c | 14 +-
net/netfilter/nft_flow_offload.c | 5 +
net/netfilter/nft_nat.c | 5 +
net/netfilter/nft_rt.c | 5 +
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 111 +++--
net/netfilter/nft_set_rbtree.c | 387 ++++++++++++++---
net/netfilter/nft_socket.c | 5 +
net/netfilter/nft_tproxy.c | 5 +
15 files changed, 977 insertions(+), 286 deletions(-)
--
2.30.2
Read callbacks registered with nvmem core expect 0 to be returned on
success and a negative value to be returned on failure.
On read, abx80x_nvmem_xfer() calls i2c_smbus_read_i2c_block_data(),
which returns the number of bytes read on success as per its API
description. This return value is treated as an error and returned to
nvmem even on success.
Fix this by handling all possible values that
i2c_smbus_read_i2c_block_data() can return.
Fixes: e90ff8ede777 ("rtc: abx80x: Add nvmem support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joy Chakraborty <joychakr(a)google.com>
---
drivers/rtc/rtc-abx80x.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/rtc/rtc-abx80x.c b/drivers/rtc/rtc-abx80x.c
index fde2b8054c2e..0f5847d1ca2a 100644
--- a/drivers/rtc/rtc-abx80x.c
+++ b/drivers/rtc/rtc-abx80x.c
@@ -711,9 +711,16 @@ static int abx80x_nvmem_xfer(struct abx80x_priv *priv, unsigned int offset,
else
ret = i2c_smbus_read_i2c_block_data(priv->client, reg,
len, val);
- if (ret)
+ if (ret < 0)
return ret;
+ if (!write) {
+ if (ret)
+ len = ret;
+ else
+ return -EIO;
+ }
+
offset += len;
val += len;
bytes -= len;
--
2.45.2.505.gda0bf45e8d-goog
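The return-value handling described in the abx80x patch above can be
sketched in user space as follows. This is a minimal illustration, not
the driver code: block_read_fn, struct fake_dev, fake_read(),
nvmem_style_read() and MAX_BLOCK are hypothetical names. The only
behaviour taken from the patch is the contract: a negative value is an
error, zero bytes read is reported as -EIO, and a positive value is the
number of bytes to advance by.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for i2c_smbus_read_i2c_block_data(): returns the
 * number of bytes copied (possibly fewer than requested) or a negative
 * errno value.
 */
typedef int (*block_read_fn)(void *ctx, size_t offset, size_t len,
			     unsigned char *val);

#define MAX_BLOCK 32	/* assumed per-transfer limit, for illustration */

/* Fake device that hands out at most 'chunk' bytes per call. */
struct fake_dev {
	const unsigned char *data;
	size_t size;
	size_t chunk;
};

static int fake_read(void *ctx, size_t offset, size_t len, unsigned char *val)
{
	struct fake_dev *d = ctx;

	if (offset >= d->size)
		return -EINVAL;
	if (len > d->chunk)
		len = d->chunk;
	if (len > d->size - offset)
		len = d->size - offset;
	memcpy(val, d->data + offset, len);
	return (int)len;
}

/* Returns 0 on success and a negative errno on failure. */
static int nvmem_style_read(block_read_fn rd, void *ctx, size_t offset,
			    unsigned char *val, size_t bytes)
{
	while (bytes) {
		size_t len = bytes < MAX_BLOCK ? bytes : MAX_BLOCK;
		int ret = rd(ctx, offset, len, val);

		if (ret < 0)
			return ret;	/* genuine error from the backend */
		if (ret == 0)
			return -EIO;	/* no progress: do not loop forever */

		assert((size_t)ret <= len);
		len = (size_t)ret;	/* advance by what was actually read */

		offset += len;
		val += len;
		bytes -= len;
	}
	return 0;
}
```

The point of the sketch is that a positive byte count never reaches the
caller as a return value: it only drives the loop, and the caller sees
either 0 or a negative errno.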
When the best selected CPU is offline, work_on_cpu() will get stuck
forever. This can happen if a node is online while all its CPUs are
offline (we can use "maxcpus=1" without "nr_cpus=1" to reproduce it).
Therefore, in this case, we should call local_pci_probe() instead of
work_on_cpu().
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
---
v1 -> v2 Added the method to reproduce this issue
---
drivers/pci/pci-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index af2996d0d17f..32a99828e6a3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
free_cpumask_var(wq_domain_mask);
}
- if (cpu < nr_cpu_ids)
+ if ((cpu < nr_cpu_ids) && cpu_online(cpu))
error = work_on_cpu(cpu, local_pci_probe, &ddi);
else
error = local_pci_probe(&ddi);
--
2.33.0
From: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
commit 0774d19038c496f0c3602fb505c43e1b2d8eed85 upstream.
If an input device declares too many capability bits then modalias
string for such device may become too long and not fit into uevent
buffer, resulting in failure of sending said uevent. This, in turn,
may prevent userspace from recognizing existence of such devices.
This is typically not a concern for real hardware devices as they have
a limited number of keys, but it does happen with synthetic devices such
as the ones created by the xen-kbdfront driver, which creates devices as
being capable of delivering all possible keys, since it doesn't know
what keys the backend may produce.
To deal with such devices, the input core will attempt to trim key data
in the hope that the rest of the modalias string will fit in the given
buffer. When trimming key data it indicates that the data is incomplete
by appending a "+," marker, resulting in conversions like this:
old: k71,72,73,74,78,7A,7B,7C,7D,8E,9E,A4,AD,E0,E1,E4,F8,174,
new: k71,72,73,74,78,7A,7B,7C,+,
This should allow existing udev rules to continue to work with existing
devices, and will also allow writing more complex rules that would
recognize a trimmed modalias and check input device characteristics by
other means (for example by parsing KEY= data in uevent or parsing
input device sysfs attributes).
Note that the driver core may try adding more uevent environment
variables once the input core is done adding its own, so when forming
the modalias we cannot use the entire available buffer. We therefore
reduce it by a somewhat arbitrary amount (96 bytes).
Reported-by: Jason Andryuk <jandryuk(a)gmail.com>
Reviewed-by: Peter Hutterer <peter.hutterer(a)who-t.net>
Tested-by: Jason Andryuk <jandryuk(a)gmail.com>
Link: https://lore.kernel.org/r/ZjAWMQCJdrxZkvkB@google.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
[ Apply to linux-6.1.y ]
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
For 6.1 only.
Patch did not automatically apply to 6.1.y because
input_print_modalias_parts() does not have const on *id.
v2:
Remove const from input_print_modalias() and
input_print_modalias_parts()
Tested on 6.1. 5.15 and earlier need an additional fixup.
drivers/input/input.c | 104 ++++++++++++++++++++++++++++++++++++------
1 file changed, 89 insertions(+), 15 deletions(-)
diff --git a/drivers/input/input.c b/drivers/input/input.c
index 8b6a922f8470..78be582b5766 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1374,19 +1374,19 @@ static int input_print_modalias_bits(char *buf, int size,
char name, unsigned long *bm,
unsigned int min_bit, unsigned int max_bit)
{
- int len = 0, i;
+ int bit = min_bit;
+ int len = 0;
len += snprintf(buf, max(size, 0), "%c", name);
- for (i = min_bit; i < max_bit; i++)
- if (bm[BIT_WORD(i)] & BIT_MASK(i))
- len += snprintf(buf + len, max(size - len, 0), "%X,", i);
+ for_each_set_bit_from(bit, bm, max_bit)
+ len += snprintf(buf + len, max(size - len, 0), "%X,", bit);
return len;
}
-static int input_print_modalias(char *buf, int size, struct input_dev *id,
- int add_cr)
+static int input_print_modalias_parts(char *buf, int size, int full_len,
+ struct input_dev *id)
{
- int len;
+ int len, klen, remainder, space;
len = snprintf(buf, max(size, 0),
"input:b%04Xv%04Xp%04Xe%04X-",
@@ -1395,8 +1395,48 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'e', id->evbit, 0, EV_MAX);
- len += input_print_modalias_bits(buf + len, size - len,
+
+ /*
+ * Calculate the remaining space in the buffer making sure we
+ * have place for the terminating 0.
+ */
+ space = max(size - (len + 1), 0);
+
+ klen = input_print_modalias_bits(buf + len, size - len,
'k', id->keybit, KEY_MIN_INTERESTING, KEY_MAX);
+ len += klen;
+
+ /*
+ * If we have more data than we can fit in the buffer, check
+ * if we can trim key data to fit in the rest. We will indicate
+ * that key data is incomplete by adding "+" sign at the end, like
+ * this: * "k1,2,3,45,+,".
+ *
+ * Note that we shortest key info (if present) is "k+," so we
+ * can only try to trim if key data is longer than that.
+ */
+ if (full_len && size < full_len + 1 && klen > 3) {
+ remainder = full_len - len;
+ /*
+ * We can only trim if we have space for the remainder
+ * and also for at least "k+," which is 3 more characters.
+ */
+ if (remainder <= space - 3) {
+ /*
+ * We are guaranteed to have 'k' in the buffer, so
+ * we need at least 3 additional bytes for storing
+ * "+," in addition to the remainder.
+ */
+ for (int i = size - 1 - remainder - 3; i >= 0; i--) {
+ if (buf[i] == 'k' || buf[i] == ',') {
+ strcpy(buf + i + 1, "+,");
+ len = i + 3; /* Not counting '\0' */
+ break;
+ }
+ }
+ }
+ }
+
len += input_print_modalias_bits(buf + len, size - len,
'r', id->relbit, 0, REL_MAX);
len += input_print_modalias_bits(buf + len, size - len,
@@ -1412,12 +1452,25 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'w', id->swbit, 0, SW_MAX);
- if (add_cr)
- len += snprintf(buf + len, max(size - len, 0), "\n");
-
return len;
}
+static int input_print_modalias(char *buf, int size, struct input_dev *id)
+{
+ int full_len;
+
+ /*
+ * Printing is done in 2 passes: first one figures out total length
+ * needed for the modalias string, second one will try to trim key
+ * data in case when buffer is too small for the entire modalias.
+ * If the buffer is too small regardless, it will fill as much as it
+ * can (without trimming key data) into the buffer and leave it to
+ * the caller to figure out what to do with the result.
+ */
+ full_len = input_print_modalias_parts(NULL, 0, 0, id);
+ return input_print_modalias_parts(buf, size, full_len, id);
+}
+
static ssize_t input_dev_show_modalias(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1425,7 +1478,9 @@ static ssize_t input_dev_show_modalias(struct device *dev,
struct input_dev *id = to_input_dev(dev);
ssize_t len;
- len = input_print_modalias(buf, PAGE_SIZE, id, 1);
+ len = input_print_modalias(buf, PAGE_SIZE, id);
+ if (len < PAGE_SIZE - 2)
+ len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return min_t(int, len, PAGE_SIZE);
}
@@ -1637,6 +1692,23 @@ static int input_add_uevent_bm_var(struct kobj_uevent_env *env,
return 0;
}
+/*
+ * This is a pretty gross hack. When building uevent data the driver core
+ * may try adding more environment variables to kobj_uevent_env without
+ * telling us, so we have no idea how much of the buffer we can use to
+ * avoid overflows/-ENOMEM elsewhere. To work around this let's artificially
+ * reduce amount of memory we will use for the modalias environment variable.
+ *
+ * The potential additions are:
+ *
+ * SEQNUM=18446744073709551615 - (%llu - 28 bytes)
+ * HOME=/ (6 bytes)
+ * PATH=/sbin:/bin:/usr/sbin:/usr/bin (34 bytes)
+ *
+ * 68 bytes total. Allow extra buffer - 96 bytes
+ */
+#define UEVENT_ENV_EXTRA_LEN 96
+
static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
struct input_dev *dev)
{
@@ -1646,9 +1718,11 @@ static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
return -ENOMEM;
len = input_print_modalias(&env->buf[env->buflen - 1],
- sizeof(env->buf) - env->buflen,
- dev, 0);
- if (len >= (sizeof(env->buf) - env->buflen))
+ (int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN,
+ dev);
+ if (len >= ((int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN))
return -ENOMEM;
env->buflen += len;
--
2.40.1
From: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
commit 0774d19038c496f0c3602fb505c43e1b2d8eed85 upstream.
If an input device declares too many capability bits, the modalias
string for that device may become too long to fit into the uevent
buffer, so sending the uevent fails. This, in turn, may prevent
userspace from recognizing that such a device exists.
This is typically not a concern for real hardware devices, as they have
a limited number of keys, but it does happen with synthetic devices such
as the ones created by the xen-kbdfront driver, which declares its devices
as capable of delivering all possible keys, since it doesn't know what
keys the backend may produce.
To deal with such devices the input core will attempt to trim key data,
in the hope that the rest of the modalias string will fit in the given
buffer. When trimming key data it will indicate that the data is not
complete by appending "+,", resulting in conversions like this:
old: k71,72,73,74,78,7A,7B,7C,7D,8E,9E,A4,AD,E0,E1,E4,F8,174,
new: k71,72,73,74,78,7A,7B,7C,+,
This should allow existing udev rules to continue to work with existing
devices, and will also allow writing more complex rules that
recognize a trimmed modalias and check input device characteristics by
other means (for example by parsing KEY= data in the uevent or parsing
input device sysfs attributes).
Note that the driver core may try adding more uevent environment
variables once the input core is done adding its own, so when forming
the modalias we cannot use the entire available buffer; we reduce
it by a somewhat arbitrary amount (96 bytes).
Reported-by: Jason Andryuk <jandryuk(a)gmail.com>
Reviewed-by: Peter Hutterer <peter.hutterer(a)who-t.net>
Tested-by: Jason Andryuk <jandryuk(a)gmail.com>
Link: https://lore.kernel.org/r/ZjAWMQCJdrxZkvkB@google.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
[ Apply to linux-6.1.y ]
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
Patch did not automatically apply to 6.1.y because
input_print_modalias_parts() does not have const on *id.
Tested on 6.1. Seems to also apply and build on 5.4 and 4.19.
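For readers who want to see the trimming rule in isolation, here is a
minimal userspace sketch in C. It is not the kernel implementation:
format_keys(), its fixed scratch buffer, and the tail parameter (standing
in for the sections that follow the key list in a modalias) are assumptions
made for illustration. It renders the full key list first, and only if the
result does not fit does it cut the list back to a comma boundary and
append "+," so that the tail still fits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel function: write "k<hex>,<hex>,...,"
 * followed by tail into buf. If the whole string does not fit, trim the
 * key list at a comma and append "+," so that tail still fits.
 * Returns the resulting string length, or -1 if even "k+," plus the
 * tail cannot fit.
 */
int format_keys(char *buf, size_t size, const unsigned int *keys,
		size_t nkeys, const char *tail)
{
	char kbuf[256];		/* scratch space, big enough for this sketch */
	size_t tlen = strlen(tail);
	size_t klen, room, i;

	/* Pass 1: render the untrimmed key list. */
	klen = (size_t)snprintf(kbuf, sizeof(kbuf), "k");
	for (i = 0; i < nkeys; i++) {
		assert(klen + 10 < sizeof(kbuf));
		klen += (size_t)snprintf(kbuf + klen, sizeof(kbuf) - klen,
					 "%X,", keys[i]);
	}

	/* Everything fits, including the terminating NUL: no trimming. */
	if (klen + tlen + 1 <= size) {
		memcpy(buf, kbuf, klen);
		memcpy(buf + klen, tail, tlen + 1);
		return (int)(klen + tlen);
	}

	/* The shortest trimmed form is "k+," plus the tail plus NUL. */
	if (size < 3 + tlen + 1)
		return -1;

	/*
	 * Pass 2: keep at most 'room' bytes of key data so that "+,",
	 * the tail and the NUL still fit, then walk back to the nearest
	 * 'k' or ',' so no key number is cut in half.
	 */
	room = size - 1 - tlen - 2;
	for (i = room; i > 1; i--)
		if (kbuf[i - 1] == ',')
			break;
	/* i == 1 keeps just the leading 'k'. */

	memcpy(buf, kbuf, i);
	memcpy(buf + i, "+,", 2);
	memcpy(buf + i + 2, tail, tlen + 1);
	return (int)(i + 2 + tlen);
}
```

With the key list from the old/new example in the commit message, a
33-byte buffer, and a made-up tail of "r0,1,", this sketch produces
k71,72,73,74,78,7A,7B,7C,+,r0,1, while a buffer that is large enough
leaves the key list untouched.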
drivers/input/input.c | 104 ++++++++++++++++++++++++++++++++++++------
1 file changed, 89 insertions(+), 15 deletions(-)
diff --git a/drivers/input/input.c b/drivers/input/input.c
index 8b6a922f8470..eb2bb8cbec3c 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -1374,19 +1374,19 @@ static int input_print_modalias_bits(char *buf, int size,
char name, unsigned long *bm,
unsigned int min_bit, unsigned int max_bit)
{
- int len = 0, i;
+ int bit = min_bit;
+ int len = 0;
len += snprintf(buf, max(size, 0), "%c", name);
- for (i = min_bit; i < max_bit; i++)
- if (bm[BIT_WORD(i)] & BIT_MASK(i))
- len += snprintf(buf + len, max(size - len, 0), "%X,", i);
+ for_each_set_bit_from(bit, bm, max_bit)
+ len += snprintf(buf + len, max(size - len, 0), "%X,", bit);
return len;
}
-static int input_print_modalias(char *buf, int size, struct input_dev *id,
- int add_cr)
+static int input_print_modalias_parts(char *buf, int size, int full_len,
+ const struct input_dev *id)
{
- int len;
+ int len, klen, remainder, space;
len = snprintf(buf, max(size, 0),
"input:b%04Xv%04Xp%04Xe%04X-",
@@ -1395,8 +1395,48 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'e', id->evbit, 0, EV_MAX);
- len += input_print_modalias_bits(buf + len, size - len,
+
+ /*
+ * Calculate the remaining space in the buffer making sure we
+ * have place for the terminating 0.
+ */
+ space = max(size - (len + 1), 0);
+
+ klen = input_print_modalias_bits(buf + len, size - len,
'k', id->keybit, KEY_MIN_INTERESTING, KEY_MAX);
+ len += klen;
+
+ /*
+ * If we have more data than we can fit in the buffer, check
+ * if we can trim key data to fit in the rest. We will indicate
+ * that key data is incomplete by adding "+" sign at the end, like
+ * this: * "k1,2,3,45,+,".
+ *
+ * Note that we shortest key info (if present) is "k+," so we
+ * can only try to trim if key data is longer than that.
+ */
+ if (full_len && size < full_len + 1 && klen > 3) {
+ remainder = full_len - len;
+ /*
+ * We can only trim if we have space for the remainder
+ * and also for at least "k+," which is 3 more characters.
+ */
+ if (remainder <= space - 3) {
+ /*
+ * We are guaranteed to have 'k' in the buffer, so
+ * we need at least 3 additional bytes for storing
+ * "+," in addition to the remainder.
+ */
+ for (int i = size - 1 - remainder - 3; i >= 0; i--) {
+ if (buf[i] == 'k' || buf[i] == ',') {
+ strcpy(buf + i + 1, "+,");
+ len = i + 3; /* Not counting '\0' */
+ break;
+ }
+ }
+ }
+ }
+
len += input_print_modalias_bits(buf + len, size - len,
'r', id->relbit, 0, REL_MAX);
len += input_print_modalias_bits(buf + len, size - len,
@@ -1412,12 +1452,25 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id,
len += input_print_modalias_bits(buf + len, size - len,
'w', id->swbit, 0, SW_MAX);
- if (add_cr)
- len += snprintf(buf + len, max(size - len, 0), "\n");
-
return len;
}
+static int input_print_modalias(char *buf, int size, const struct input_dev *id)
+{
+ int full_len;
+
+ /*
+ * Printing is done in 2 passes: first one figures out total length
+ * needed for the modalias string, second one will try to trim key
+ * data in case when buffer is too small for the entire modalias.
+ * If the buffer is too small regardless, it will fill as much as it
+ * can (without trimming key data) into the buffer and leave it to
+ * the caller to figure out what to do with the result.
+ */
+ full_len = input_print_modalias_parts(NULL, 0, 0, id);
+ return input_print_modalias_parts(buf, size, full_len, id);
+}
+
static ssize_t input_dev_show_modalias(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -1425,7 +1478,9 @@ static ssize_t input_dev_show_modalias(struct device *dev,
struct input_dev *id = to_input_dev(dev);
ssize_t len;
- len = input_print_modalias(buf, PAGE_SIZE, id, 1);
+ len = input_print_modalias(buf, PAGE_SIZE, id);
+ if (len < PAGE_SIZE - 2)
+ len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return min_t(int, len, PAGE_SIZE);
}
@@ -1637,6 +1692,23 @@ static int input_add_uevent_bm_var(struct kobj_uevent_env *env,
return 0;
}
+/*
+ * This is a pretty gross hack. When building uevent data the driver core
+ * may try adding more environment variables to kobj_uevent_env without
+ * telling us, so we have no idea how much of the buffer we can use to
+ * avoid overflows/-ENOMEM elsewhere. To work around this let's artificially
+ * reduce amount of memory we will use for the modalias environment variable.
+ *
+ * The potential additions are:
+ *
+ * SEQNUM=18446744073709551615 - (%llu - 28 bytes)
+ * HOME=/ (6 bytes)
+ * PATH=/sbin:/bin:/usr/sbin:/usr/bin (34 bytes)
+ *
+ * 68 bytes total. Allow extra buffer - 96 bytes
+ */
+#define UEVENT_ENV_EXTRA_LEN 96
+
static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
struct input_dev *dev)
{
@@ -1646,9 +1718,11 @@ static int input_add_uevent_modalias_var(struct kobj_uevent_env *env,
return -ENOMEM;
len = input_print_modalias(&env->buf[env->buflen - 1],
- sizeof(env->buf) - env->buflen,
- dev, 0);
- if (len >= (sizeof(env->buf) - env->buflen))
+ (int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN,
+ dev);
+ if (len >= ((int)sizeof(env->buf) - env->buflen -
+ UEVENT_ENV_EXTRA_LEN))
return -ENOMEM;
env->buflen += len;
--
2.40.1
From: He Zhai <zhai.he(a)nxp.com>
With the current code, if allocation from the device-specific CMA area
fails, no attempt is made to allocate from the default CMA area.
This patch falls back to the default CMA area when the device's own
CMA area cannot satisfy the request.
In addition, the log level of the allocation-failure message is lowered
to debug. The message is printed whenever allocation from the
device-specific CMA area fails, but the allocation is then retried from
the default CMA area, so an error-level message can easily mislead
developers.
Signed-off-by: He Zhai <zhai.he(a)nxp.com>
---
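The fallback described above can be sketched in userspace C as follows.
This is only an illustration under stated assumptions: struct pool,
pool_alloc(), pool_release() and the two *_with_fallback() helpers are
hypothetical stand-ins, not the CMA API, and the pool is a trivial bump
counter. The point it shows is that once allocation may come from either
area, the free path has to ask both areas whether they own the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CMA area: a bump counter over page numbers. */
struct pool {
	size_t base;	/* first page number owned by this pool (non-zero) */
	size_t npages;	/* total pages in the pool */
	size_t used;	/* pages currently handed out */
};

/* Returns the first page number of the allocation, or 0 on failure. */
size_t pool_alloc(struct pool *p, size_t count)
{
	size_t page;

	if (!p || count == 0 || count > p->npages - p->used)
		return 0;
	page = p->base + p->used;
	p->used += count;
	return page;
}

/* Returns true only if the page belongs to this pool. */
bool pool_release(struct pool *p, size_t page, size_t count)
{
	if (!p || page < p->base || page >= p->base + p->npages)
		return false;
	assert(p->used >= count);
	p->used -= count;
	return true;
}

/* Try the device-specific pool first, then fall back to the default one. */
size_t alloc_with_fallback(struct pool *dev_pool, struct pool *def_pool,
			   size_t count)
{
	size_t page = pool_alloc(dev_pool, count);

	if (page)
		return page;
	return pool_alloc(def_pool, count);
}

/* The page may have come from either pool, so try both on release. */
bool free_with_fallback(struct pool *dev_pool, struct pool *def_pool,
			size_t page, size_t count)
{
	if (pool_release(dev_pool, page, count))
		return true;
	return pool_release(def_pool, page, count);
}
```

In this sketch a request that no longer fits in the device pool is served
from the default pool, and releasing it leaves the device pool's
accounting untouched.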
kernel/dma/contiguous.c | 11 +++++++++--
mm/cma.c | 4 ++--
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 055da410ac71..e45cfb24500f 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -357,8 +357,13 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
/* CMA can be used only in the context which permits sleeping */
if (!gfpflags_allow_blocking(gfp))
return NULL;
- if (dev->cma_area)
- return cma_alloc_aligned(dev->cma_area, size, gfp);
+ if (dev->cma_area) {
+ struct page *page = NULL;
+
+ page = cma_alloc_aligned(dev->cma_area, size, gfp);
+ if (page)
+ return page;
+ }
if (size <= PAGE_SIZE)
return NULL;
@@ -406,6 +411,8 @@ void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
if (dev->cma_area) {
if (cma_release(dev->cma_area, page, count))
return;
+ if (cma_release(dma_contiguous_default_area, page, count))
+ return;
} else {
/*
* otherwise, page is from either per-numa cma or default cma
diff --git a/mm/cma.c b/mm/cma.c
index 3e9724716bad..6e12faf1bea7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -495,8 +495,8 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
}
if (ret && !no_warn) {
- pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
- __func__, cma->name, count, ret);
+ pr_debug("%s: alloc failed, req-size: %lu pages, ret: %d, try to use default cma\n",
+ cma->name, count, ret);
cma_debug_show_areas(cma);
}
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0c50b7fcf2773b4853e83fc15aba1a196ba95966
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061237-ethically-ethically-19bc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0c50b7fcf277 ("firmware: qcom_scm: disable clocks if qcom_scm_bw_enable() fails")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c50b7fcf2773b4853e83fc15aba1a196ba95966 Mon Sep 17 00:00:00 2001
From: Gabor Juhos <j4g8y7(a)gmail.com>
Date: Mon, 4 Mar 2024 14:14:53 +0100
Subject: [PATCH] firmware: qcom_scm: disable clocks if qcom_scm_bw_enable()
fails
Several functions call qcom_scm_bw_enable() and return immediately
if the call fails, leaving the clocks enabled.
Change the code of these functions to disable clocks when the
qcom_scm_bw_enable() call fails. This also fixes a possible dma
buffer leak in the qcom_scm_pas_init_image() function.
Compile tested only due to lack of hardware with interconnect
support.
Cc: stable(a)vger.kernel.org
Fixes: 65b7ebda5028 ("firmware: qcom_scm: Add bw voting support to the SCM interface")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Link: https://lore.kernel.org/r/20240304-qcom-scm-disable-clk-v1-1-b36e51577ca1@g…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
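The error-path pattern this commit applies can be shown with a small
userspace sketch in C. The counters and the clk_enable()/bw_enable()
helpers below are hypothetical stand-ins rather than the qcom_scm
functions; the sketch only illustrates that jumping to a label placed
before the clock-disable call keeps enable and disable balanced when the
bandwidth step fails.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical reference counters standing in for real hardware state. */
int clk_refs;
int bw_refs;
int bw_should_fail;	/* set to non-zero to simulate a failing bw vote */

int clk_enable(void)
{
	clk_refs++;
	return 0;
}

void clk_disable(void)
{
	assert(clk_refs > 0);
	clk_refs--;
}

int bw_enable(void)
{
	if (bw_should_fail)
		return -EIO;
	bw_refs++;
	return 0;
}

void bw_disable(void)
{
	assert(bw_refs > 0);
	bw_refs--;
}

/* Enable clocks, vote for bandwidth, do the work, then unwind in reverse. */
int do_call(void)
{
	int ret;

	ret = clk_enable();
	if (ret)
		return ret;

	ret = bw_enable();
	if (ret)
		goto disable_clk;	/* a plain "return ret" would leak the clock */

	ret = 0;			/* the actual firmware call would go here */

	bw_disable();

disable_clk:
	clk_disable();
	return ret;
}
```

Whether bw_enable() succeeds or fails, both counters are back at zero
when do_call() returns.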
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 520de9b5633a..e8460626fb0c 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -569,13 +569,14 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
desc.args[1] = mdata_phys;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
out:
@@ -637,10 +638,12 @@ int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr, phys_addr_t size)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -672,10 +675,12 @@ int qcom_scm_pas_auth_and_reset(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -706,11 +711,12 @@ int qcom_scm_pas_shutdown(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0c50b7fcf2773b4853e83fc15aba1a196ba95966
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061236-vanity-bankbook-d9dc@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
0c50b7fcf277 ("firmware: qcom_scm: disable clocks if qcom_scm_bw_enable() fails")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c50b7fcf2773b4853e83fc15aba1a196ba95966 Mon Sep 17 00:00:00 2001
From: Gabor Juhos <j4g8y7(a)gmail.com>
Date: Mon, 4 Mar 2024 14:14:53 +0100
Subject: [PATCH] firmware: qcom_scm: disable clocks if qcom_scm_bw_enable()
fails
Several functions call qcom_scm_bw_enable() and return immediately
if the call fails, leaving the clocks enabled.
Change the code of these functions to disable clocks when the
qcom_scm_bw_enable() call fails. This also fixes a possible dma
buffer leak in the qcom_scm_pas_init_image() function.
Compile tested only due to lack of hardware with interconnect
support.
Cc: stable(a)vger.kernel.org
Fixes: 65b7ebda5028 ("firmware: qcom_scm: Add bw voting support to the SCM interface")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
Link: https://lore.kernel.org/r/20240304-qcom-scm-disable-clk-v1-1-b36e51577ca1@g…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 520de9b5633a..e8460626fb0c 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -569,13 +569,14 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
desc.args[1] = mdata_phys;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
out:
@@ -637,10 +638,12 @@ int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr, phys_addr_t size)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -672,10 +675,12 @@ int qcom_scm_pas_auth_and_reset(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -706,11 +711,12 @@ int qcom_scm_pas_shutdown(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 55c421b364482b61c4c45313a535e61ed5ae4ea3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061231-player-lumpish-af5c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
55c421b36448 ("mmc: davinci: Don't strip remove function when driver is builtin")
bc1711e8332d ("mmc: davinci_mmc: Convert to platform remove callback returning void")
fa6c12e036c9 ("drm/xe/guc: Add Relay Communication ABI definitions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55c421b364482b61c4c45313a535e61ed5ae4ea3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Sun, 24 Mar 2024 12:40:17 +0100
Subject: [PATCH] mmc: davinci: Don't strip remove function when driver is
builtin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Using __exit for the remove function results in the remove callback being
discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g.
using sysfs or hotplug), the driver is just removed without the cleanup
being performed. This results in resource leaks. Fix it by compiling in the
remove callback unconditionally.
This also fixes a W=1 modpost warning:
WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in
reference: davinci_mmcsd_driver+0x10 (section: .data) ->
davinci_mmcsd_remove (section: .exit.text)
Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutron…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
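The effect described in the commit message can be modelled in userspace C.
Everything in this sketch is an assumption made for illustration:
EXIT_P() and SKETCH_MODULE are hypothetical names that only mimic the
behaviour the message attributes to a built-in driver (the remove callback
is not available), and struct device and struct driver are toy types, not
the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro modelled on the behaviour described above: the
 * callback survives when built as a module and is dropped when built in.
 * SKETCH_MODULE is left undefined here to model the built-in case.
 */
#ifdef SKETCH_MODULE
#define EXIT_P(fn) (fn)
#else
#define EXIT_P(fn) NULL
#endif

struct device {
	int resources;		/* things the driver must release on unbind */
};

struct driver {
	void (*remove)(struct device *dev);
};

void sketch_remove(struct device *dev)
{
	assert(dev != NULL);
	dev->resources = 0;	/* release everything the probe acquired */
}

/* Before the fix: the callback is wrapped, so it is NULL when built in. */
struct driver old_driver = { .remove = EXIT_P(sketch_remove) };

/* After the fix: the callback is always present. */
struct driver new_driver = { .remove = sketch_remove };

/* Unbind the device; returns the number of resources left behind. */
int unbind(const struct driver *drv, struct device *dev)
{
	if (drv->remove)
		drv->remove(dev);
	return dev->resources;
}
```

Unbinding through old_driver leaves the resources in place, which
corresponds to the leak the commit message describes, while new_driver
cleans them up.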
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 8bd938919687..d7427894e0bc 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -1337,7 +1337,7 @@ static int davinci_mmcsd_probe(struct platform_device *pdev)
return ret;
}
-static void __exit davinci_mmcsd_remove(struct platform_device *pdev)
+static void davinci_mmcsd_remove(struct platform_device *pdev)
{
struct mmc_davinci_host *host = platform_get_drvdata(pdev);
@@ -1392,7 +1392,7 @@ static struct platform_driver davinci_mmcsd_driver = {
.of_match_table = davinci_mmc_dt_ids,
},
.probe = davinci_mmcsd_probe,
- .remove_new = __exit_p(davinci_mmcsd_remove),
+ .remove_new = davinci_mmcsd_remove,
.id_table = davinci_mmc_devtype,
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 55c421b364482b61c4c45313a535e61ed5ae4ea3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061230-triangle-crepe-3f0f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
55c421b36448 ("mmc: davinci: Don't strip remove function when driver is builtin")
bc1711e8332d ("mmc: davinci_mmc: Convert to platform remove callback returning void")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55c421b364482b61c4c45313a535e61ed5ae4ea3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Sun, 24 Mar 2024 12:40:17 +0100
Subject: [PATCH] mmc: davinci: Don't strip remove function when driver is
builtin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Using __exit for the remove function results in the remove callback being
discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g.
using sysfs or hotplug), the driver is just removed without the cleanup
being performed. This results in resource leaks. Fix it by compiling in the
remove callback unconditionally.
This also fixes a W=1 modpost warning:
WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in
reference: davinci_mmcsd_driver+0x10 (section: .data) ->
davinci_mmcsd_remove (section: .exit.text)
Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutron…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 8bd938919687..d7427894e0bc 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -1337,7 +1337,7 @@ static int davinci_mmcsd_probe(struct platform_device *pdev)
return ret;
}
-static void __exit davinci_mmcsd_remove(struct platform_device *pdev)
+static void davinci_mmcsd_remove(struct platform_device *pdev)
{
struct mmc_davinci_host *host = platform_get_drvdata(pdev);
@@ -1392,7 +1392,7 @@ static struct platform_driver davinci_mmcsd_driver = {
.of_match_table = davinci_mmc_dt_ids,
},
.probe = davinci_mmcsd_probe,
- .remove_new = __exit_p(davinci_mmcsd_remove),
+ .remove_new = davinci_mmcsd_remove,
.id_table = davinci_mmc_devtype,
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 55c421b364482b61c4c45313a535e61ed5ae4ea3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061229-crying-exemplary-9dce@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
55c421b36448 ("mmc: davinci: Don't strip remove function when driver is builtin")
bc1711e8332d ("mmc: davinci_mmc: Convert to platform remove callback returning void")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55c421b364482b61c4c45313a535e61ed5ae4ea3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Sun, 24 Mar 2024 12:40:17 +0100
Subject: [PATCH] mmc: davinci: Don't strip remove function when driver is
builtin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Using __exit for the remove function results in the remove callback being
discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g.
using sysfs or hotplug), the driver is just removed without the cleanup
being performed. This results in resource leaks. Fix it by compiling in the
remove callback unconditionally.
This also fixes a W=1 modpost warning:
WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in
reference: davinci_mmcsd_driver+0x10 (section: .data) ->
davinci_mmcsd_remove (section: .exit.text)
Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutron…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 8bd938919687..d7427894e0bc 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -1337,7 +1337,7 @@ static int davinci_mmcsd_probe(struct platform_device *pdev)
return ret;
}
-static void __exit davinci_mmcsd_remove(struct platform_device *pdev)
+static void davinci_mmcsd_remove(struct platform_device *pdev)
{
struct mmc_davinci_host *host = platform_get_drvdata(pdev);
@@ -1392,7 +1392,7 @@ static struct platform_driver davinci_mmcsd_driver = {
.of_match_table = davinci_mmc_dt_ids,
},
.probe = davinci_mmcsd_probe,
- .remove_new = __exit_p(davinci_mmcsd_remove),
+ .remove_new = davinci_mmcsd_remove,
.id_table = davinci_mmc_devtype,
};
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 55c421b364482b61c4c45313a535e61ed5ae4ea3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061228-olive-jawless-313c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
55c421b36448 ("mmc: davinci: Don't strip remove function when driver is builtin")
bc1711e8332d ("mmc: davinci_mmc: Convert to platform remove callback returning void")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 55c421b364482b61c4c45313a535e61ed5ae4ea3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Sun, 24 Mar 2024 12:40:17 +0100
Subject: [PATCH] mmc: davinci: Don't strip remove function when driver is
builtin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Using __exit for the remove function results in the remove callback being
discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g.
using sysfs or hotplug), the driver is just removed without the cleanup
being performed. This results in resource leaks. Fix it by compiling in the
remove callback unconditionally.
This also fixes a W=1 modpost warning:
WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in
reference: davinci_mmcsd_driver+0x10 (section: .data) ->
davinci_mmcsd_remove (section: .exit.text)
Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutron…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index 8bd938919687..d7427894e0bc 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -1337,7 +1337,7 @@ static int davinci_mmcsd_probe(struct platform_device *pdev)
return ret;
}
-static void __exit davinci_mmcsd_remove(struct platform_device *pdev)
+static void davinci_mmcsd_remove(struct platform_device *pdev)
{
struct mmc_davinci_host *host = platform_get_drvdata(pdev);
@@ -1392,7 +1392,7 @@ static struct platform_driver davinci_mmcsd_driver = {
.of_match_table = davinci_mmc_dt_ids,
},
.probe = davinci_mmcsd_probe,
- .remove_new = __exit_p(davinci_mmcsd_remove),
+ .remove_new = davinci_mmcsd_remove,
.id_table = davinci_mmc_devtype,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4bc60736154bc9e0e39d3b88918f5d3762ebe5e0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061207-grooving-scholar-3378@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4bc60736154b ("media: mc: mark the media devnode as registered from the, start")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4bc60736154bc9e0e39d3b88918f5d3762ebe5e0 Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Date: Fri, 23 Feb 2024 09:46:19 +0100
Subject: [PATCH] media: mc: mark the media devnode as registered from the
start
First the media device node was created, and if successful it was
marked as 'registered'. This leaves a small race condition where
an application can open the device node and get an error back
because the 'registered' flag was not yet set.
Change the order: first set the 'registered' flag, then actually
register the media device node. If that fails, then clear the flag.
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Acked-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Fixes: cf4b9211b568 ("[media] media: Media device node support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
diff --git a/drivers/media/mc/mc-devnode.c b/drivers/media/mc/mc-devnode.c
index 7f67825c8757..318e267e798e 100644
--- a/drivers/media/mc/mc-devnode.c
+++ b/drivers/media/mc/mc-devnode.c
@@ -245,15 +245,14 @@ int __must_check media_devnode_register(struct media_device *mdev,
kobject_set_name(&devnode->cdev.kobj, "media%d", devnode->minor);
/* Part 3: Add the media and char device */
+ set_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
ret = cdev_device_add(&devnode->cdev, &devnode->dev);
if (ret < 0) {
+ clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
pr_err("%s: cdev_device_add failed\n", __func__);
goto cdev_add_error;
}
- /* Part 4: Activate this minor. The char device can now be used. */
- set_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
-
return 0;
cdev_add_error:
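The reordering described in the commit message above (set the flag first, publish the node, clear the flag if publishing fails) can be modelled in userspace. This is only a sketch under stated assumptions: `demo_devnode`, `demo_register()`, `demo_add_ok()` and `demo_add_fail()` are hypothetical names, a C11 `atomic_bool` stands in for `MEDIA_FLAG_REGISTERED`, and the callback stands in for `cdev_device_add()`. It is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device node; "registered" stands in for
 * the MEDIA_FLAG_REGISTERED bit in devnode->flags. */
struct demo_devnode {
	atomic_bool registered;
};

/* Hypothetical "publish" step standing in for cdev_device_add().
 * Returns 0 on success or a negative value on failure. */
typedef int (*demo_add_fn)(struct demo_devnode *node);

/* Set the flag before the node becomes visible; undo it on failure. */
static int demo_register(struct demo_devnode *node, demo_add_fn add)
{
	int ret;

	atomic_store(&node->registered, true);
	ret = add(node);
	if (ret < 0) {
		atomic_store(&node->registered, false);
		return ret;
	}
	return 0;
}

/* Succeeds only if the flag is already set when the node is published,
 * which is what an open() racing with registration would observe. */
static int demo_add_ok(struct demo_devnode *node)
{
	return atomic_load(&node->registered) ? 0 : -1;
}

/* Simulates a failing publish step. */
static int demo_add_fail(struct demo_devnode *node)
{
	(void)node;
	return -5;
}
```

With the old ordering (flag set only after a successful add), `demo_add_ok()` would see the flag still clear, which corresponds to the window in which an early open() got an error back.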
On Wed, Jun 12, 2024 at 05:29:10PM +0200, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> bcache: fix variable length array abuse in btree_iter
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> bcache-fix-variable-length-array-abuse-in-btree_iter.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi, I forgot to add a version tag on this -- it should only be in
kernels >= v6.1, so please drop it from v5.10 & v5.15.
Thanks,
Matthew
A DIO read from an inline_data inode returns all-zero data, because
f2fs_iomap_begin() incorrectly assigns IOMAP_HOLE to iomap->type in
this case.
We could let the iomap framework handle inline data by assigning
iomap->type and iomap->inline_data correctly; however, handling the race
between direct IO and buffered IO would then be a little complicated.
So, let's force buffered IO to fix this issue.
Cc: stable(a)vger.kernel.org
Reported-by: Barry Song <v-songbaohua(a)oppo.com>
Signed-off-by: Chao Yu <chao(a)kernel.org>
---
fs/f2fs/file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index db6236f27852..e038910ad1e5 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -851,6 +851,8 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
return true;
if (f2fs_compressed_file(inode))
return true;
+ if (f2fs_has_inline_data(inode))
+ return true;
/* disallow direct IO if any of devices has unaligned blksize */
if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
--
2.40.1
On Tue, 28 May 2024, Eric Dumazet wrote:
> __dst_negative_advice() does not enforce proper RCU rules when
> sk->sk_dst_cache must be cleared, leading to possible UAF.
>
> RCU rules are that we must first clear sk->sk_dst_cache,
> then call dst_release(old_dst).
>
> Note that sk_dst_reset(sk) is implementing this protocol correctly,
> while __dst_negative_advice() uses the wrong order.
>
> Given that ip6_negative_advice() has special logic
> against RTF_CACHE, this means each of the three ->negative_advice()
> existing methods must perform the sk_dst_reset() themselves.
>
> Note the check against NULL dst is centralized in
> __dst_negative_advice(), there is no need to duplicate
> it in various callbacks.
>
> Many thanks to Clement Lecigne for tracking this issue.
>
> This old bug became visible after the blamed commit, using UDP sockets.
>
> Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets")
> Reported-by: Clement Lecigne <clecigne(a)google.com>
> Diagnosed-by: Clement Lecigne <clecigne(a)google.com>
> Signed-off-by: Eric Dumazet <edumazet(a)google.com>
> Cc: Tom Herbert <tom(a)herbertland.com>
> ---
> include/net/dst_ops.h | 2 +-
> include/net/sock.h | 13 +++----------
> net/ipv4/route.c | 22 ++++++++--------------
> net/ipv6/route.c | 29 +++++++++++++++--------------
> net/xfrm/xfrm_policy.c | 11 +++--------
> 5 files changed, 30 insertions(+), 47 deletions(-)
Could we have this patch in all Stable branches please?
Upstream commit:
Fixes: 92f1655aa2b2 ("net: fix __dst_negative_advice() race")
--
Lee Jones [李琼斯]
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f592cc5794747b81e53b53dd6e80219ee25f0611
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061242-cosmetics-bronco-9d06@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f592cc579474 ("soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request")
778279f4f5e4 ("soc: qcom: cmd-db: allow loading as a module")
d6815c5c43d4 ("soc: qcom: cmd-db: Add debugfs dumping file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f592cc5794747b81e53b53dd6e80219ee25f0611 Mon Sep 17 00:00:00 2001
From: Maulik Shah <quic_mkshah(a)quicinc.com>
Date: Thu, 15 Feb 2024 10:55:44 +0530
Subject: [PATCH] soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request
Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte aligned
addresses associated with it. These control voltage, enable state, mode,
and in legacy targets, voltage headroom. The current in-flight request
checking logic looks for exact address matches. Requests for different
addresses of the same RPMh resource are thus not detected as in-flight.
Add new cmd-db API cmd_db_match_resource_addr() to enhance the in-flight
request check for VRM requests by ignoring the address offset.
This ensures that only one request is allowed to be in-flight for a given
VRM resource. This is needed to avoid scenarios where request commands are
carried out by RPMh hardware out-of-order leading to LDO regulator
over-current protection triggering.
Fixes: 658628e7ef78 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs")
Cc: stable(a)vger.kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
Tested-by: Elliot Berman <quic_eberman(a)quicinc.com> # sm8650-qrd
Signed-off-by: Maulik Shah <quic_mkshah(a)quicinc.com>
Link: https://lore.kernel.org/r/20240215-rpmh-rsc-fixes-v4-1-9cbddfcba05b@quicinc…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
index c344107bc36c..b4e613c34a5c 100644
--- a/drivers/soc/qcom/cmd-db.c
+++ b/drivers/soc/qcom/cmd-db.c
@@ -1,6 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#include <linux/bitfield.h>
#include <linux/debugfs.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -17,6 +21,8 @@
#define MAX_SLV_ID 8
#define SLAVE_ID_MASK 0x7
#define SLAVE_ID_SHIFT 16
+#define SLAVE_ID(addr) FIELD_GET(GENMASK(19, 16), addr)
+#define VRM_ADDR(addr) FIELD_GET(GENMASK(19, 4), addr)
/**
* struct entry_header: header for each entry in cmddb
@@ -220,6 +226,30 @@ const void *cmd_db_read_aux_data(const char *id, size_t *len)
}
EXPORT_SYMBOL_GPL(cmd_db_read_aux_data);
+/**
+ * cmd_db_match_resource_addr() - Compare if both Resource addresses are same
+ *
+ * @addr1: Resource address to compare
+ * @addr2: Resource address to compare
+ *
+ * Return: true if two addresses refer to the same resource, false otherwise
+ */
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{
+ /*
+ * Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte
+ * aligned addresses associated with it. Ignore the offset to check
+ * for VRM requests.
+ */
+ if (addr1 == addr2)
+ return true;
+ else if (SLAVE_ID(addr1) == CMD_DB_HW_VRM && VRM_ADDR(addr1) == VRM_ADDR(addr2))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(cmd_db_match_resource_addr);
+
/**
* cmd_db_read_slave_id - Get the slave ID for a given resource address
*
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index c4c7aad957e6..561d8037b50a 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023-2024, Qualcomm Innovation Center, Inc. All rights reserved.
*/
#define pr_fmt(fmt) "%s " fmt, KBUILD_MODNAME
@@ -557,7 +558,7 @@ static int check_for_req_inflight(struct rsc_drv *drv, struct tcs_group *tcs,
for_each_set_bit(j, &curr_enabled, MAX_CMDS_PER_TCS) {
addr = read_tcs_cmd(drv, drv->regs[RSC_DRV_CMD_ADDR], i, j);
for (k = 0; k < msg->num_cmds; k++) {
- if (addr == msg->cmds[k].addr)
+ if (cmd_db_match_resource_addr(msg->cmds[k].addr, addr))
return -EBUSY;
}
}
diff --git a/include/soc/qcom/cmd-db.h b/include/soc/qcom/cmd-db.h
index c8bb56e6852a..47a6cab75e63 100644
--- a/include/soc/qcom/cmd-db.h
+++ b/include/soc/qcom/cmd-db.h
@@ -1,5 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
#ifndef __QCOM_COMMAND_DB_H__
#define __QCOM_COMMAND_DB_H__
@@ -21,6 +24,8 @@ u32 cmd_db_read_addr(const char *resource_id);
const void *cmd_db_read_aux_data(const char *resource_id, size_t *len);
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2);
+
enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id);
int cmd_db_ready(void);
@@ -31,6 +36,9 @@ static inline u32 cmd_db_read_addr(const char *resource_id)
static inline const void *cmd_db_read_aux_data(const char *resource_id, size_t *len)
{ return ERR_PTR(-ENODEV); }
+static inline bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{ return false; }
+
static inline enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id)
{ return -ENODEV; }
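The matching rule added by cmd_db_match_resource_addr() in the patch above can be tried out in userspace. The following is a minimal sketch, not the kernel implementation: the `demo_*` names are hypothetical, the shifts and masks are hand-written equivalents of `FIELD_GET(GENMASK(19, 16), addr)` and `FIELD_GET(GENMASK(19, 4), addr)`, and the value 4 for `CMD_DB_HW_VRM` is an assumption about the kernel's enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the kernel's CMD_DB_HW_VRM enumerator. */
#define DEMO_HW_VRM 4u

/* Bits 19..16: slave (hardware type) ID. */
static inline uint32_t demo_slave_id(uint32_t addr)
{
	return (addr >> 16) & 0xFu;
}

/* Bits 19..4: resource address with the low 4 offset bits dropped. */
static inline uint32_t demo_vrm_addr(uint32_t addr)
{
	return (addr >> 4) & 0xFFFFu;
}

/* True if both addresses are identical, or if addr1 is a VRM address
 * and both addresses differ only in the low 4 offset bits. */
static bool demo_match_resource_addr(uint32_t addr1, uint32_t addr2)
{
	if (addr1 == addr2)
		return true;
	if (demo_slave_id(addr1) == DEMO_HW_VRM &&
	    demo_vrm_addr(addr1) == demo_vrm_addr(addr2))
		return true;
	return false;
}
```

For example, 0x40000 and 0x40004 share bits 19..4 and carry slave ID 4, so they are treated as the same VRM resource, while 0x40000 and 0x40010 are not. Two different addresses whose slave ID is not the VRM one still require an exact match.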
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f592cc5794747b81e53b53dd6e80219ee25f0611
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061241-obscurity-phonics-bd3b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f592cc579474 ("soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request")
778279f4f5e4 ("soc: qcom: cmd-db: allow loading as a module")
d6815c5c43d4 ("soc: qcom: cmd-db: Add debugfs dumping file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f592cc5794747b81e53b53dd6e80219ee25f0611 Mon Sep 17 00:00:00 2001
From: Maulik Shah <quic_mkshah(a)quicinc.com>
Date: Thu, 15 Feb 2024 10:55:44 +0530
Subject: [PATCH] soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request
Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte aligned
addresses associated with it. These control voltage, enable state, mode,
and in legacy targets, voltage headroom. The current in-flight request
checking logic looks for exact address matches. Requests for different
addresses of the same RPMh resource are thus not detected as in-flight.
Add new cmd-db API cmd_db_match_resource_addr() to enhance the in-flight
request check for VRM requests by ignoring the address offset.
This ensures that only one request is allowed to be in-flight for a given
VRM resource. This is needed to avoid scenarios where request commands are
carried out by RPMh hardware out-of-order leading to LDO regulator
over-current protection triggering.
Fixes: 658628e7ef78 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs")
Cc: stable(a)vger.kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
Tested-by: Elliot Berman <quic_eberman(a)quicinc.com> # sm8650-qrd
Signed-off-by: Maulik Shah <quic_mkshah(a)quicinc.com>
Link: https://lore.kernel.org/r/20240215-rpmh-rsc-fixes-v4-1-9cbddfcba05b@quicinc…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
index c344107bc36c..b4e613c34a5c 100644
--- a/drivers/soc/qcom/cmd-db.c
+++ b/drivers/soc/qcom/cmd-db.c
@@ -1,6 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#include <linux/bitfield.h>
#include <linux/debugfs.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -17,6 +21,8 @@
#define MAX_SLV_ID 8
#define SLAVE_ID_MASK 0x7
#define SLAVE_ID_SHIFT 16
+#define SLAVE_ID(addr) FIELD_GET(GENMASK(19, 16), addr)
+#define VRM_ADDR(addr) FIELD_GET(GENMASK(19, 4), addr)
/**
* struct entry_header: header for each entry in cmddb
@@ -220,6 +226,30 @@ const void *cmd_db_read_aux_data(const char *id, size_t *len)
}
EXPORT_SYMBOL_GPL(cmd_db_read_aux_data);
+/**
+ * cmd_db_match_resource_addr() - Compare if both Resource addresses are same
+ *
+ * @addr1: Resource address to compare
+ * @addr2: Resource address to compare
+ *
+ * Return: true if two addresses refer to the same resource, false otherwise
+ */
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{
+ /*
+ * Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte
+ * aligned addresses associated with it. Ignore the offset to check
+ * for VRM requests.
+ */
+ if (addr1 == addr2)
+ return true;
+ else if (SLAVE_ID(addr1) == CMD_DB_HW_VRM && VRM_ADDR(addr1) == VRM_ADDR(addr2))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(cmd_db_match_resource_addr);
+
/**
* cmd_db_read_slave_id - Get the slave ID for a given resource address
*
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index c4c7aad957e6..561d8037b50a 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023-2024, Qualcomm Innovation Center, Inc. All rights reserved.
*/
#define pr_fmt(fmt) "%s " fmt, KBUILD_MODNAME
@@ -557,7 +558,7 @@ static int check_for_req_inflight(struct rsc_drv *drv, struct tcs_group *tcs,
for_each_set_bit(j, &curr_enabled, MAX_CMDS_PER_TCS) {
addr = read_tcs_cmd(drv, drv->regs[RSC_DRV_CMD_ADDR], i, j);
for (k = 0; k < msg->num_cmds; k++) {
- if (addr == msg->cmds[k].addr)
+ if (cmd_db_match_resource_addr(msg->cmds[k].addr, addr))
return -EBUSY;
}
}
diff --git a/include/soc/qcom/cmd-db.h b/include/soc/qcom/cmd-db.h
index c8bb56e6852a..47a6cab75e63 100644
--- a/include/soc/qcom/cmd-db.h
+++ b/include/soc/qcom/cmd-db.h
@@ -1,5 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
#ifndef __QCOM_COMMAND_DB_H__
#define __QCOM_COMMAND_DB_H__
@@ -21,6 +24,8 @@ u32 cmd_db_read_addr(const char *resource_id);
const void *cmd_db_read_aux_data(const char *resource_id, size_t *len);
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2);
+
enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id);
int cmd_db_ready(void);
@@ -31,6 +36,9 @@ static inline u32 cmd_db_read_addr(const char *resource_id)
static inline const void *cmd_db_read_aux_data(const char *resource_id, size_t *len)
{ return ERR_PTR(-ENODEV); }
+static inline bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{ return false; }
+
static inline enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id)
{ return -ENODEV; }
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f592cc5794747b81e53b53dd6e80219ee25f0611
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024061240-pointing-endanger-621b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f592cc579474 ("soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request")
778279f4f5e4 ("soc: qcom: cmd-db: allow loading as a module")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f592cc5794747b81e53b53dd6e80219ee25f0611 Mon Sep 17 00:00:00 2001
From: Maulik Shah <quic_mkshah(a)quicinc.com>
Date: Thu, 15 Feb 2024 10:55:44 +0530
Subject: [PATCH] soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request
Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte aligned
addresses associated with it. These control voltage, enable state, mode,
and in legacy targets, voltage headroom. The current in-flight request
checking logic looks for exact address matches. Requests for different
addresses of the same RPMh resource are thus not detected as in-flight.
Add new cmd-db API cmd_db_match_resource_addr() to enhance the in-flight
request check for VRM requests by ignoring the address offset.
This ensures that only one request is allowed to be in-flight for a given
VRM resource. This is needed to avoid scenarios where request commands are
carried out by RPMh hardware out-of-order leading to LDO regulator
over-current protection triggering.
Fixes: 658628e7ef78 ("drivers: qcom: rpmh-rsc: add RPMH controller for QCOM SoCs")
Cc: stable(a)vger.kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
Tested-by: Elliot Berman <quic_eberman(a)quicinc.com> # sm8650-qrd
Signed-off-by: Maulik Shah <quic_mkshah(a)quicinc.com>
Link: https://lore.kernel.org/r/20240215-rpmh-rsc-fixes-v4-1-9cbddfcba05b@quicinc…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/drivers/soc/qcom/cmd-db.c b/drivers/soc/qcom/cmd-db.c
index c344107bc36c..b4e613c34a5c 100644
--- a/drivers/soc/qcom/cmd-db.c
+++ b/drivers/soc/qcom/cmd-db.c
@@ -1,6 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, 2020, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#include <linux/bitfield.h>
#include <linux/debugfs.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -17,6 +21,8 @@
#define MAX_SLV_ID 8
#define SLAVE_ID_MASK 0x7
#define SLAVE_ID_SHIFT 16
+#define SLAVE_ID(addr) FIELD_GET(GENMASK(19, 16), addr)
+#define VRM_ADDR(addr) FIELD_GET(GENMASK(19, 4), addr)
/**
* struct entry_header: header for each entry in cmddb
@@ -220,6 +226,30 @@ const void *cmd_db_read_aux_data(const char *id, size_t *len)
}
EXPORT_SYMBOL_GPL(cmd_db_read_aux_data);
+/**
+ * cmd_db_match_resource_addr() - Compare if both Resource addresses are same
+ *
+ * @addr1: Resource address to compare
+ * @addr2: Resource address to compare
+ *
+ * Return: true if two addresses refer to the same resource, false otherwise
+ */
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{
+ /*
+ * Each RPMh VRM accelerator resource has 3 or 4 contiguous 4-byte
+ * aligned addresses associated with it. Ignore the offset to check
+ * for VRM requests.
+ */
+ if (addr1 == addr2)
+ return true;
+ else if (SLAVE_ID(addr1) == CMD_DB_HW_VRM && VRM_ADDR(addr1) == VRM_ADDR(addr2))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(cmd_db_match_resource_addr);
+
/**
* cmd_db_read_slave_id - Get the slave ID for a given resource address
*
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index c4c7aad957e6..561d8037b50a 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023-2024, Qualcomm Innovation Center, Inc. All rights reserved.
*/
#define pr_fmt(fmt) "%s " fmt, KBUILD_MODNAME
@@ -557,7 +558,7 @@ static int check_for_req_inflight(struct rsc_drv *drv, struct tcs_group *tcs,
for_each_set_bit(j, &curr_enabled, MAX_CMDS_PER_TCS) {
addr = read_tcs_cmd(drv, drv->regs[RSC_DRV_CMD_ADDR], i, j);
for (k = 0; k < msg->num_cmds; k++) {
- if (addr == msg->cmds[k].addr)
+ if (cmd_db_match_resource_addr(msg->cmds[k].addr, addr))
return -EBUSY;
}
}
diff --git a/include/soc/qcom/cmd-db.h b/include/soc/qcom/cmd-db.h
index c8bb56e6852a..47a6cab75e63 100644
--- a/include/soc/qcom/cmd-db.h
+++ b/include/soc/qcom/cmd-db.h
@@ -1,5 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. */
+/*
+ * Copyright (c) 2016-2018, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2024, Qualcomm Innovation Center, Inc. All rights reserved.
+ */
#ifndef __QCOM_COMMAND_DB_H__
#define __QCOM_COMMAND_DB_H__
@@ -21,6 +24,8 @@ u32 cmd_db_read_addr(const char *resource_id);
const void *cmd_db_read_aux_data(const char *resource_id, size_t *len);
+bool cmd_db_match_resource_addr(u32 addr1, u32 addr2);
+
enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id);
int cmd_db_ready(void);
@@ -31,6 +36,9 @@ static inline u32 cmd_db_read_addr(const char *resource_id)
static inline const void *cmd_db_read_aux_data(const char *resource_id, size_t *len)
{ return ERR_PTR(-ENODEV); }
+static inline bool cmd_db_match_resource_addr(u32 addr1, u32 addr2)
+{ return false; }
+
static inline enum cmd_db_hw_type cmd_db_read_slave_id(const char *resource_id)
{ return -ENODEV; }
These commits reference out.of.bound between v6.9 and v6.10-rc1
These commits are not yet in stable/linux-rolling-stable.
Let me know if you would rather I compare against a different repo/branch.
The list has been manually pruned to only contain commits that look like
actual issues.
If they contain a Fixes line it has been verified that at least one of the
commits that the Fixes tag(s) reference is in stable/linux-rolling-stable
2ba24864d2f61b52210b Syz Fuzzers, Out of bounds
3ebc46ca8675de6378e3 Syz Fuzzers, Out of bounds
9841991a446c87f90f66 Kernel panic, NULL pointer, Out of bounds
51fafb3cd7fcf4f46826 Out of bounds
45cf976008ddef4a9c9a Out of bounds
8b2faf1a4f3b6c748c0d Out of bounds
faa4364bef2ec0060de3 Buffer overflow, Out of bounds
8ee1b439b1540ae54314 Out of bounds
7b4c74cf22d7584d1eb4 Out of bounds
1008368e1c7e36bdec01 Out of bounds
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com
These commits reference KASAN between v6.9 and v6.10-rc1
These commits are not yet in stable/linux-rolling-stable.
Let me know if you would rather I compare against a different repo/branch.
The list has been manually pruned to only contain commits that look like
actual issues.
If they contain a Fixes line it has been verified that at least one of the
commits that the Fixes tag(s) reference is in stable/linux-rolling-stable
195aba96b854dd664768 KASAN, Out of bounds
2e577732e8d28b9183df Kernel panic, KASAN
20faaf30e55522bba2b5 KASAN, Syz Fuzzers, Out of bounds
c1115ddbda9c930fba0f KASAN, NULL pointer
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com
From: Qingfang Deng <qingfang.deng(a)siflower.com.cn>
[ Upstream commit ed779fe4c9b5a20b4ab4fd6f3e19807445bb78c7 ]
After the blamed commit, the member key is no longer 4-byte aligned. On
platforms that do not support unaligned access, e.g., MIPS32R2 with
unaligned_action set to 1, this will trigger a crash when accessing
an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer.
Change the type of the key to u32 to make it aligned.
Fixes: 62dd93181aaa ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
Signed-off-by: Qingfang Deng <qingfang.deng(a)siflower.com.cn>
---
include/net/neighbour.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index e58ef9e338de..4c53e51f0799 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -172,7 +172,7 @@ struct pneigh_entry {
possible_net_t net;
struct net_device *dev;
u8 flags;
- u8 key[0];
+ u32 key[0];
};
/*
--
2.34.1
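The alignment effect of the one-line change above can be checked with offsetof() in userspace. This is a sketch under assumptions: the two `demo_entry_*` structs are simplified, hypothetical stand-ins for struct pneigh_entry (two pointers followed by a u8 flags member), and a C99 flexible array member is used in place of the kernel's `key[0]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the layout before the fix: a byte-typed key
 * placed directly after a single u8 flags member. */
struct demo_entry_u8 {
	void *next;
	void *dev;
	uint8_t flags;
	uint8_t key[];
};

/* Same layout with the key typed as u32, as in the fix: the compiler
 * inserts padding after flags so that key is aligned for uint32_t. */
struct demo_entry_u32 {
	void *next;
	void *dev;
	uint8_t flags;
	uint32_t key[];
};

/* Byte offset of key in each variant. */
static size_t demo_key_offset_u8(void)
{
	return offsetof(struct demo_entry_u8, key);
}

static size_t demo_key_offset_u32(void)
{
	return offsetof(struct demo_entry_u32, key);
}
```

In the u8 variant the key starts one byte after the pointers, so casting it to a pointer to a 4-byte-aligned type such as in6_addr is unsafe on platforms that fault on unaligned access. In the u32 variant the offset is a multiple of the alignment of uint32_t.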
It has been brought to my attention that what had been fixed 1 year ago
here for kernels 5.18 and later:
https://lore.kernel.org/netdev/20230626155112.3155993-1-vladimir.oltean@nxp…
is still broken on linux-5.15.y. Short summary: PTP boundary clock is
broken for ports under a VLAN-aware bridge.
The reason is that the Fixes: tags in those patches were wrong. The
issue originated from earlier, but the changes from 5.18 (blamed there),
aka DSA FDB isolation, masked that.
A straightforward cherry-pick was not possible, due to the conflict with
the aforementioned DSA FDB isolation work from 5.18. So I redid patch
2/2 and marked what I had to adapt.
Tested on the NXP LS1021A-TSN board.
Vladimir Oltean (2):
net: dsa: sja1105: always enable the INCL_SRCPT option
net: dsa: tag_sja1105: always prefer source port information from
INCL_SRCPT
drivers/net/dsa/sja1105/sja1105_main.c | 9 ++-----
net/dsa/tag_sja1105.c | 34 ++++++++++++++++++++------
2 files changed, 28 insertions(+), 15 deletions(-)
---
I'm sorry for the people who will want to backport DSA FDB isolation to
linux-5.15.y :(
--
2.34.1
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream.
The syzbot-reported stack trace from hell in this discussion thread
actually has three nested page faults:
https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Reported-and-tested-by: syzbot+83e7f982ca045ab4405c(a)syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Tested-by: Jiri Olsa <jolsa(a)kernel.org>
Acked-by: Andy Lutomirski <luto(a)kernel.org>
Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[gpiccoli: Backport the patch due to differences in the trees. The main change
between 5.10.y and 5.15.y is due to renaming the fixup function, by
commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()").
Following 2 commits cause divergence in the diffs too (in the removed lines):
cd072dab453a ("x86/fault: Add a helper function to sanitize error code")
d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey")
Finally, there is context adjustment in the processor.h file.]
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
Hi folks, this was backported by AUTOSEL up to 5.15.y; I'm manually submitting
the backport to 5.4.y and 5.10.y. I've briefly detailed the changes needed
because of other unrelated missing patches, but these are really simple and
non-intrusive. Nevertheless, I've explicitly CCed the x86 ML to be sure the
maintainers are aware of the backport, and if anybody thinks we shouldn't
do it for these (very) old releases, please respond here.
Cheers,
Guilherme
arch/x86/entry/vsyscall/vsyscall_64.c | 28 ++-------------------------
arch/x86/include/asm/processor.h | 1 -
arch/x86/mm/fault.c | 27 +-------------------------
3 files changed, 3 insertions(+), 53 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 44c33103a955..f0b817eb6e8b 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
- /*
- * XXX: if access_ok, get_user, and put_user handled
- * sig_on_uaccess_err, this could go away.
- */
-
if (!access_ok((void __user *)ptr, size)) {
struct thread_struct *thread = &current->thread;
@@ -120,10 +115,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
bool emulate_vsyscall(unsigned long error_code,
struct pt_regs *regs, unsigned long address)
{
- struct task_struct *tsk;
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
- int prev_sig_on_uaccess_err;
long ret;
unsigned long orig_dx;
@@ -172,8 +165,6 @@ bool emulate_vsyscall(unsigned long error_code,
goto sigsegv;
}
- tsk = current;
-
/*
* Check for access_ok violations and find the syscall nr.
*
@@ -233,12 +224,8 @@ bool emulate_vsyscall(unsigned long error_code,
goto do_ret; /* skip requested */
/*
- * With a real vsyscall, page faults cause SIGSEGV. We want to
- * preserve that behavior to make writing exploits harder.
+ * With a real vsyscall, page faults cause SIGSEGV.
*/
- prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err;
- current->thread.sig_on_uaccess_err = 1;
-
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
@@ -261,23 +248,12 @@ bool emulate_vsyscall(unsigned long error_code,
break;
}
- current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err;
-
check_fault:
if (ret == -EFAULT) {
/* Bad news -- userspace fed a bad pointer to a vsyscall. */
warn_bad_vsyscall(KERN_INFO, regs,
"vsyscall fault (exploit attempt?)");
-
- /*
- * If we failed to generate a signal for any reason,
- * generate one here. (This should be impossible.)
- */
- if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
- !sigismember(&tsk->pending.signal, SIGSEGV)))
- goto sigsegv;
-
- return true; /* Don't emulate the ret. */
+ goto sigsegv;
}
regs->ax = ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6dc3c5f0be07..c682a14299e0 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -528,7 +528,6 @@ struct thread_struct {
unsigned long iopl_emul;
unsigned int iopl_warn:1;
- unsigned int sig_on_uaccess_err:1;
/* Floating point and extended processor state */
struct fpu fpu;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index cdb337cf92ba..98a5924d98b7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -649,33 +649,8 @@ no_context(struct pt_regs *regs, unsigned long error_code,
}
/* Are we prepared to handle this kernel fault? */
- if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
- /*
- * Any interrupt that takes a fault gets the fixup. This makes
- * the below recursive fault logic only apply to a faults from
- * task context.
- */
- if (in_interrupt())
- return;
-
- /*
- * Per the above we're !in_interrupt(), aka. task context.
- *
- * In this case we need to make sure we're not recursively
- * faulting through the emulate_vsyscall() logic.
- */
- if (current->thread.sig_on_uaccess_err && signal) {
- set_signal_archinfo(address, error_code);
-
- /* XXX: hwpoison faults will set the wrong code. */
- force_sig_fault(signal, si_code, (void __user *)address);
- }
-
- /*
- * Barring that, we can do the fixup and be happy.
- */
+ if (fixup_exception(regs, X86_TRAP_PF, error_code, address))
return;
- }
#ifdef CONFIG_VMAP_STACK
/*
--
2.45.1
Since upstream commit 4acffb66 "selftests: net: explicitly wait for
listener ready", the net_helper.sh from commit 3bdd9fd2 "selftests/net:
synchronize udpgro tests' tx and rx connection" will be needed.
Otherwise selftests/net/udpgro_fwd.sh will complain about:
$ sudo ./udpgro_fwd.sh
./udpgro_fwd.sh: line 4: net_helper.sh: No such file or directory
IPv4
No GRO ./udpgro_fwd.sh: line 134: wait_local_port_listen: command not found
Patch "selftests/net: synchronize udpgro tests' tx and rx connection" adds
the missing net_helper.sh. Context adjustment is needed for applying this
patch, as the BPF_FILE is different in 6.6.y
Patch "selftests: net: Remove executable bits from library scripts" fixes
the script permission.
Patch "selftests: net: included needed helper in the install targets" and
"selftests: net: List helper scripts in TEST_FILES Makefile variable" will
add this helper to the Makefile and fix the installation, lib.sh needs to
be ignored for them.
Benjamin Poirier (2):
selftests: net: Remove executable bits from library scripts
selftests: net: List helper scripts in TEST_FILES Makefile variable
Lucas Karpinski (1):
selftests/net: synchronize udpgro tests' tx and rx connection
Paolo Abeni (1):
selftests: net: included needed helper in the install targets
tools/testing/selftests/net/Makefile | 4 ++--
tools/testing/selftests/net/net_helper.sh | 22 ++++++++++++++++++++++
tools/testing/selftests/net/setup_loopback.sh | 0
tools/testing/selftests/net/udpgro.sh | 13 ++++++-------
tools/testing/selftests/net/udpgro_bench.sh | 5 +++--
tools/testing/selftests/net/udpgro_frglist.sh | 5 +++--
6 files changed, 36 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/net/net_helper.sh
mode change 100755 => 100644 tools/testing/selftests/net/setup_loopback.sh
--
2.7.4
[ Upstream commit 1cd4bc987abb2823836cbb8f887026011ccddc8a ]
Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
has recently been added to vxlan mainly in the context of source
address snooping/learning so that when it is enabled, an entry in the
FDB is not being created for an invalid address for the corresponding
tunnel endpoint.
Before commit f58f45c1e5b9, vxlan behaved similarly to geneve in
that it passed through whichever MACs were set in the L2 header. It
turns out that this change in behavior breaks setups. For example,
Cilium with netkit in L3 mode for Pods, combined with tunnel mode, was
passing before the change in f58f45c1e5b9 for both vxlan and geneve.
After that change it only passes for geneve: with vxlan, packets are
dropped because vxlan_set_mac() returns false, as the source and
destination MACs are zero, which for E/W traffic via tunnel
is totally fine.
Fix it by only opting into the is_valid_ether_addr() check in
vxlan_set_mac() when in fact source address snooping/learning is
actually enabled in vxlan. This is done by moving the check into
vxlan_snoop(). With this change, the Cilium connectivity test suite
passes again for both tunnel flavors.
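For illustration only, the validity test discussed above can be modelled in user space. This is a sketch, not the kernel source: the model_* names and MODEL_ETH_ALEN are hypothetical stand-ins, and the rule encoded here (a valid address is neither multicast nor all-zero) is my reading of what is_valid_ether_addr() checks. It shows why an all-zero source MAC made vxlan_set_mac() bail out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_ALEN 6	/* bytes in an Ethernet MAC address */

/* Hypothetical model: true if every byte of the MAC is zero. */
static bool model_is_zero_ether_addr(const uint8_t *addr)
{
	assert(addr != NULL);
	for (size_t i = 0; i < MODEL_ETH_ALEN; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

/* Hypothetical model: the group bit is the LSB of the first byte. */
static bool model_is_multicast_ether_addr(const uint8_t *addr)
{
	assert(addr != NULL);
	return (addr[0] & 0x01) != 0;
}

/* A "valid" source MAC is neither multicast nor all-zero. */
static bool model_is_valid_ether_addr(const uint8_t *addr)
{
	return !model_is_multicast_ether_addr(addr) &&
	       !model_is_zero_ether_addr(addr);
}
```

Calling model_is_valid_ether_addr() on a zeroed six-byte buffer returns false, which corresponds to the drop described above.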
Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: David Bauer <mail(a)david-bauer.net>
Cc: Ido Schimmel <idosch(a)nvidia.com>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Martin KaFai Lau <martin.lau(a)kernel.org>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Reviewed-by: David Bauer <mail(a)david-bauer.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[ Backport note: vxlan snooping/learning not supported in 6.8 or older,
so commit is simply a revert. ]
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
---
drivers/net/vxlan/vxlan_core.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 0a0b4a9717ce..1d0688610189 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -1615,10 +1615,6 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
return false;
- /* Ignore packets from invalid src-address */
- if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
- return false;
-
/* Get address from the outer IP header */
if (vxlan_get_sk_family(vs) == AF_INET) {
saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
--
2.34.1
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 34bf6bae3286a58762711cfbce2cf74ecd42e1b5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024060624-platinum-ladies-9214@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
34bf6bae3286 ("x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater")
21f546a43a91 ("Merge branch 'x86/urgent' into x86/cpu, to resolve conflict")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34bf6bae3286a58762711cfbce2cf74ecd42e1b5 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Tue, 28 May 2024 22:21:31 +0200
Subject: [PATCH] x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only
on family 0x17 and greater
The new AMD/HYGON topology parser evaluates the SMT information in CPUID leaf
0x8000001e unconditionally while the original code restricted it to CPUs with
family 0x17 and greater.
This breaks family 0x15 CPUs which advertise that leaf and have a non-zero
value in the SMT section. The machine boots, but the scheduler complains loudly
about the mismatch of the core IDs:
WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6482 sched_cpu_starting+0x183/0x250
WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2408 build_sched_domains+0x76b/0x12b0
Add the condition back to cure it.
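As a rough user-space sketch of the restored condition (struct model_cpu and model_eval_smt_from_8000_001e() are hypothetical names, not kernel symbols; has_topoext is used in the same sense as the flag in the diff below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the check reads; not a kernel struct. */
struct model_cpu {
	unsigned int family;	/* CPU family, e.g. 0x15 or 0x17 */
	bool has_topoext;	/* same sense as has_topoext in the diff */
};

/* True if the SMT field of leaf 0x8000001e should be evaluated. */
static bool model_eval_smt_from_8000_001e(const struct model_cpu *c)
{
	assert(c != NULL);
	return !c->has_topoext && c->family >= 0x17;
}
```

With family 0x15 the function returns false regardless of what the leaf advertises, so the SMT section is ignored on those CPUs.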
[ bp: Make it actually build because grandpa is not concerned with
trivial stuff. :-P ]
Fixes: f7fb3b2dd92c ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Closes: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/56
Reported-by: Tim Teichmann <teichmanntim(a)outlook.de>
Reported-by: Christian Heusel <christian(a)heusel.eu>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Tim Teichmann <teichmanntim(a)outlook.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/7skhx6mwe4hxiul64v6azhlxnokheorksqsdbp7qw6g2jduf6…
diff --git a/arch/x86/kernel/cpu/topology_amd.c b/arch/x86/kernel/cpu/topology_amd.c
index d419deed6a48..7d476fa697ca 100644
--- a/arch/x86/kernel/cpu/topology_amd.c
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -84,9 +84,9 @@ static bool parse_8000_001e(struct topo_scan *tscan, bool has_topoext)
/*
* If leaf 0xb is available, then the domain shifts are set
- * already and nothing to do here.
+ * already and nothing to do here. Only valid for family >= 0x17.
*/
- if (!has_topoext) {
+ if (!has_topoext && tscan->c->x86 >= 0x17) {
/*
* Leaf 0x80000008 set the CORE domain shift already.
* Update the SMT domain, but do not propagate it.
From: Shakeel Butt <shakeelb(a)google.com>
commit d4a5b369ad6d8aae552752ff438dddde653a72ec upstream.
One of our workloads (Postgres 14 + sysbench OLTP) regressed on newer
upstream kernel and on further investigation, it seems like the cause is
the always synchronous rstat flush in the count_shadow_nodes() added by
the commit f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical
stats"). On further inspection it seems like we don't really need
accurate stats in this function as it was already approximating the amount
of appropriate shadow entries to keep for maintaining the refault
information. Since there is already 2 sec periodic rstat flush, we don't
need exact stats here. Let's ratelimit the rstat flush in this code path.
Link: https://lkml.kernel.org/r/20231228073055.4046430-1-shakeelb@google.com
Fixes: f82e6bf9bb9b ("mm: memcg: use rstat for non-hierarchical stats")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Jesper Dangaard Brouer <hawk(a)kernel.org>
---
On production with kernel v6.6 we are observing issues with excessive
cgroup rstat flushing due to the extra call to mem_cgroup_flush_stats()
in count_shadow_nodes() introduced in commit f82e6bf9bb9b ("mm: memcg:
use rstat for non-hierarchical stats"), which is part of v6.6.
We request a backport of commit d4a5b369ad6d ("mm: ratelimit stat flush
from workingset shrinker") as it has a Fixes tag for this commit.
IMHO it is worth explaining the call path that makes count_shadow_nodes()
cause excessive cgroup rstat flushing. Function shrink_node()
first calls mem_cgroup_flush_stats() on its own, and then invokes
shrink_node_memcgs(). Function shrink_node_memcgs() iterates over
cgroups via mem_cgroup_iter(), calling shrink_slab() for each. Each
shrink_slab() call invokes do_shrink_slab(), which via
shrinker->count_objects() invokes count_shadow_nodes(), and
count_shadow_nodes() does a mem_cgroup_flush_stats() call of its own,
which seems unnecessary.
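To make the effect of ratelimiting concrete, here is a minimal user-space sketch of the general technique. It is not the kernel implementation: struct model_flusher and the model_* functions are hypothetical, and time is an abstract tick counter. A ratelimited flush only does real work once the deadline set by the previous flush has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stats flusher with a "next allowed flush" deadline. */
struct model_flusher {
	uint64_t next_flush_time;	/* ratelimited flushes are skipped until after this tick */
	uint64_t period;		/* ticks added to "now" after each real flush */
	unsigned long flush_count;	/* number of real (expensive) flushes done */
};

/* Unconditional flush: always does the work and pushes the deadline out. */
static void model_flush(struct model_flusher *f, uint64_t now)
{
	assert(f != NULL);
	f->flush_count++;
	f->next_flush_time = now + f->period;
}

/* Ratelimited flush: returns true only if a real flush was performed. */
static bool model_flush_ratelimited(struct model_flusher *f, uint64_t now)
{
	assert(f != NULL);
	if (now <= f->next_flush_time)
		return false;
	model_flush(f, now);
	return true;
}
```

In this model, calling model_flush_ratelimited() once per cgroup in a tight loop performs at most one real flush per period, whereas calling model_flush() each time would flush on every iteration.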
The backport differs slightly because v6.6.32 doesn't contain commit
7d7ef0a4686a ("mm: memcg: restore subtree stats flushing") from v6.8.
---
mm/workingset.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/workingset.c b/mm/workingset.c
index 2559a1f2fc1c..9110957bec5b 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -664,7 +664,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
struct lruvec *lruvec;
int i;
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_ratelimited();
lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid));
for (pages = 0, i = 0; i < NR_LRU_LISTS; i++)
pages += lruvec_page_state_local(lruvec,
Hi Sasha,
Thank you for the recent backports linked to MPTCP.
Recently, I noticed that two patches [1] [2] with the same "Fixes" tag
-- but without "Cc: stable", sorry for that -- have been backported to
different stable versions. That's good, thank you!
The first patch made its way to v5.15, while the second one went up to
v6.8, but not to older stable versions. I understand it didn't go
further because of other conflicts, and I'm ready to help to fix them.
I just wanted to know whether it is normal that I didn't get any 'FAILED'
notifications like the ones Greg sends [3]: I rely on them to know which
patches have been handled by the Stable team, but had conflicts. Will I
get these notifications later (no hurry), or should I not rely on them
to track fixes that could not be backported?
Cheers,
Matt
[1]
https://lore.kernel.org/mptcp/20240514011335.176158-2-martineau@kernel.org/
[2]
https://lore.kernel.org/mptcp/20240514011335.176158-3-martineau@kernel.org/
[3] 'FAILED: patch "[PATCH] (...)" failed to apply to x.y-stable tree'
--
Sponsored by the NGI0 Core fund.
It looks like the patch "mptcp: fix full TCP keep-alive support" has
been backported up to v6.8 recently (thanks!), but not before due to
conflicts.
I had to adapt the code a bit to avoid backporting new features, but the
modifications were simple, and isolated from the rest. MPTCP sockopt
tests have been executed, and no issues have been reported.
Matthieu Baerts (NGI0) (1):
mptcp: fix full TCP keep-alive support
Paolo Abeni (2):
mptcp: avoid some duplicate code in socket option handling
mptcp: cleanup SOL_TCP handling
net/mptcp/protocol.h | 3 ++
net/mptcp/sockopt.c | 123 +++++++++++++++++++++++++++++--------------
2 files changed, 87 insertions(+), 39 deletions(-)
--
2.43.0
This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.
A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 works on his device, so the
upstream commit itself seems to be ok.
An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.
Reported-by: Barry Kauler <bkauler(a)gmail.com>
Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 222a1d9ecf16..5f6c32ec674d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
if (r)
goto init_failed;
+ r = amdgpu_amdkfd_resume_iommu(adev);
+ if (r)
+ goto init_failed;
+
r = amdgpu_device_ip_hw_init_phase1(adev);
if (r)
goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
- r = amdgpu_amdkfd_resume_iommu(adev);
- if (r)
- goto init_failed;
-
amdgpu_fru_get_product_info(adev);
init_failed:
--
2.39.2
Am 26.05.24 um 21:43 schrieb Sasha Levin:
> This is a note to let you know that I've just added the patch titled
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> platform-x86-xiaomi-wmi-fix-race-condition-when-repo.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi,
the underlying race condition can only be triggered since
commit e2ffcda16290 ("ACPI: OSL: Allow Notify () handlers to run on all CPUs"), which
afaik was introduced with kernel 6.8.
Because of this, I do not think that we have to backport this commit to kernels before 6.8.
Thanks,
Armin Wolf
>
> commit 831f943a69833152081ec7393af598f0c8b415fa
> Author: Armin Wolf <W_Armin(a)gmx.de>
> Date: Tue Apr 2 16:30:57 2024 +0200
>
> platform/x86: xiaomi-wmi: Fix race condition when reporting key events
>
> [ Upstream commit 290680c2da8061e410bcaec4b21584ed951479af ]
>
> Multiple WMI events can be received concurrently, so multiple instances
> of xiaomi_wmi_notify() can be active at the same time. Since the input
> device is shared between those handlers, the key input sequence can be
> disturbed.
>
> Fix this by protecting the key input sequence with a mutex.
>
> Compile-tested only.
>
> Fixes: edb73f4f0247 ("platform/x86: wmi: add Xiaomi WMI key driver")
> Signed-off-by: Armin Wolf <W_Armin(a)gmx.de>
> Link: https://lore.kernel.org/r/20240402143059.8456-2-W_Armin@gmx.de
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
> Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/platform/x86/xiaomi-wmi.c b/drivers/platform/x86/xiaomi-wmi.c
> index 54a2546bb93bf..be80f0bda9484 100644
> --- a/drivers/platform/x86/xiaomi-wmi.c
> +++ b/drivers/platform/x86/xiaomi-wmi.c
> @@ -2,8 +2,10 @@
> /* WMI driver for Xiaomi Laptops */
>
> #include <linux/acpi.h>
> +#include <linux/device.h>
> #include <linux/input.h>
> #include <linux/module.h>
> +#include <linux/mutex.h>
> #include <linux/wmi.h>
>
> #include <uapi/linux/input-event-codes.h>
> @@ -20,12 +22,21 @@
>
> struct xiaomi_wmi {
> struct input_dev *input_dev;
> + struct mutex key_lock; /* Protects the key event sequence */
> unsigned int key_code;
> };
>
> +static void xiaomi_mutex_destroy(void *data)
> +{
> + struct mutex *lock = data;
> +
> + mutex_destroy(lock);
> +}
> +
> static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> {
> struct xiaomi_wmi *data;
> + int ret;
>
> if (wdev == NULL || context == NULL)
> return -EINVAL;
> @@ -35,6 +46,11 @@ static int xiaomi_wmi_probe(struct wmi_device *wdev, const void *context)
> return -ENOMEM;
> dev_set_drvdata(&wdev->dev, data);
>
> + mutex_init(&data->key_lock);
> + ret = devm_add_action_or_reset(&wdev->dev, xiaomi_mutex_destroy, &data->key_lock);
> + if (ret < 0)
> + return ret;
> +
> data->input_dev = devm_input_allocate_device(&wdev->dev);
> if (data->input_dev == NULL)
> return -ENOMEM;
> @@ -59,10 +75,12 @@ static void xiaomi_wmi_notify(struct wmi_device *wdev, union acpi_object *dummy)
> if (data == NULL)
> return;
>
> + mutex_lock(&data->key_lock);
> input_report_key(data->input_dev, data->key_code, 1);
> input_sync(data->input_dev);
> input_report_key(data->input_dev, data->key_code, 0);
> input_sync(data->input_dev);
> + mutex_unlock(&data->key_lock);
> }
>
> static const struct wmi_device_id xiaomi_wmi_id_table[] = {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f5d4e04634c9cf68bdf23de08ada0bb92e8befe7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052603-deceiving-stood-2b59@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5d4e04634c9cf68bdf23de08ada0bb92e8befe7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 20 May 2024 22:26:19 +0900
Subject: [PATCH] nilfs2: fix use-after-free of timer for log writer thread
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
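The lifetime rule can be sketched as a small user-space state model. This is an illustration under stated assumptions, not nilfs2 code: struct model_sci and the model_* functions are hypothetical, with the booleans standing in for sc_task, a pending sc_timer, and the NILFS_SEGCTOR_COMMIT state bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the relevant nilfs_sc_info state. */
struct model_sci {
	bool thread_alive;	/* stands in for sci->sc_task != NULL */
	bool timer_armed;	/* stands in for a pending sc_timer */
	bool commit_pending;	/* stands in for NILFS_SEGCTOR_COMMIT */
};

static void model_thread_start(struct model_sci *sci)
{
	assert(sci != NULL);
	sci->thread_alive = true;
}

static void model_thread_exit(struct model_sci *sci)
{
	assert(sci != NULL);
	sci->thread_alive = false;
	sci->timer_armed = false;	/* timer is shut down together with the thread */
}

/* Arm the timer only while the log writer thread is alive. */
static void model_start_timer(struct model_sci *sci)
{
	assert(sci != NULL);
	if (!sci->commit_pending) {
		if (sci->thread_alive)
			sci->timer_armed = true;
		sci->commit_pending = true;
	}
}
```

After model_thread_exit(), a later model_start_timer() still records the pending commit but never re-arms the timer, which is the property the fix relies on.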
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183(a)psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 6be7dd423fbd..7cb34e1c9206 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2118,8 +2118,10 @@ static void nilfs_segctor_start_timer(struct nilfs_sc_info *sci)
{
spin_lock(&sci->sc_state_lock);
if (!(sci->sc_state & NILFS_SEGCTOR_COMMIT)) {
- sci->sc_timer.expires = jiffies + sci->sc_interval;
- add_timer(&sci->sc_timer);
+ if (sci->sc_task) {
+ sci->sc_timer.expires = jiffies + sci->sc_interval;
+ add_timer(&sci->sc_timer);
+ }
sci->sc_state |= NILFS_SEGCTOR_COMMIT;
}
spin_unlock(&sci->sc_state_lock);
@@ -2320,10 +2322,21 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
*/
static void nilfs_segctor_accept(struct nilfs_sc_info *sci)
{
+ bool thread_is_alive;
+
spin_lock(&sci->sc_state_lock);
sci->sc_seq_accepted = sci->sc_seq_request;
+ thread_is_alive = (bool)sci->sc_task;
spin_unlock(&sci->sc_state_lock);
- del_timer_sync(&sci->sc_timer);
+
+ /*
+ * This function does not race with the log writer thread's
+ * termination. Therefore, deleting sc_timer, which should not be
+ * done after the log writer thread exits, can be done safely outside
+ * the area protected by sc_state_lock.
+ */
+ if (thread_is_alive)
+ del_timer_sync(&sci->sc_timer);
}
/**
@@ -2349,7 +2362,7 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci, int mode, int err)
sci->sc_flush_request &= ~FLUSH_DAT_BIT;
/* re-enable timer if checkpoint creation was not done */
- if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
+ if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) && sci->sc_task &&
time_before(jiffies, sci->sc_timer.expires))
add_timer(&sci->sc_timer);
}
@@ -2539,6 +2552,7 @@ static int nilfs_segctor_thread(void *arg)
int timeout = 0;
sci->sc_timer_task = current;
+ timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);
/* start sync. */
sci->sc_task = current;
@@ -2606,6 +2620,7 @@ static int nilfs_segctor_thread(void *arg)
end_thread:
/* end sync. */
sci->sc_task = NULL;
+ timer_shutdown_sync(&sci->sc_timer);
wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
spin_unlock(&sci->sc_state_lock);
return 0;
@@ -2669,7 +2684,6 @@ static struct nilfs_sc_info *nilfs_segctor_new(struct super_block *sb,
INIT_LIST_HEAD(&sci->sc_gc_inodes);
INIT_LIST_HEAD(&sci->sc_iput_queue);
INIT_WORK(&sci->sc_iput_work, nilfs_iput_work_func);
- timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);
sci->sc_interval = HZ * NILFS_SC_DEFAULT_TIMEOUT;
sci->sc_mjcp_freq = HZ * NILFS_SC_DEFAULT_SR_FREQ;
@@ -2748,7 +2762,6 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci)
down_write(&nilfs->ns_segctor_sem);
- timer_shutdown_sync(&sci->sc_timer);
kfree(sci);
}
Hey Greg,
Could you please backport commit ce4f78f1b53d3327fbd32764aa333bf05fb68818
"riscv: signal: handle syscall restart before get_signal" to at least
6.6? Apparently it fixes CRIU and ptrace, but was unfortunately not
given a fixes tag so I do not know how far back it is actually required.
It cherry-picks to 6.1 and builds there, but I have not tested it.
Thanks,
Conor.
Hello,
Commit a940904443e432623579245babe63e2486ff327b ("powerpc/iommu: Add
iommu_ops to report capabilities and allow blocking domains") fixes a
regression that prevents attaching PCI devices to the vfio-pci driver on
PPC64. Its inclusion in 6.1 would open the door for restoring VFIO and
KVM PCI passthrough support on distros that rely on this LTS kernel.
Thanks,
Shawn
Hi stable team,
> > On Mon, 15 Jan 2024 12:43:38 +0000
> > "Russell King (Oracle)" <rmk+kernel(a)armlinux.org.uk> wrote:
> >
> > > The referenced commit moved the setting of the Autoneg and pause bits
> > > early in sfp_parse_support(). However, we check whether the modes are
> > > empty before using the bitrate to set some modes. Setting these bits
> > > so early causes that test to always be false, preventing this working,
> > > and thus some modules that used to work no longer do.
> > >
> > > Move them just before the call to the quirk.
> > >
> > > Fixes: 8110633db49d ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits")
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
please also apply this patch to the Linux stable trees down to v6.4, which are
affected by problems introduced by commit 8110633db49d ("net: sfp-bus:
allow SFP quirks to override Autoneg and pause bits").
The fix has been applied to the net tree as commit 97eb5d51b4a5 ("net:
sfp-bus: fix SFP mode detect from bitrate") but was never picked for the
older kernel versions that are affected as well.
Thank you!
Daniel
Please backport the following patch, which was merged upstream.
It should apply to linux-5.4.y and later.
commit 29be9100aca2915fab54b5693309bc42956542e5
Author: Marc Dionne <marc.dionne(a)auristor.com>
Date: Fri May 24 17:17:55 2024 +0100
afs: Don't cross .backup mountpoint from backup volume
Don't cross a mountpoint that explicitly specifies a backup volume
(target is <vol>.backup) when starting from a backup volume.
It is not uncommon to mount a volume's backup directly in the volume
itself. This can cause tools that are not paying attention to get
into a loop mounting the volume onto itself as they attempt to
traverse the tree, leading to a variety of problems.
This doesn't prevent the general case of loops in a sequence of
mountpoints, but addresses a common special case in the same way
as other afs clients.
Reported-by: Jan Henrik Sylvester <jan.henrik.sylvester(a)uni-hamburg.de>
Link:
http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html
Reported-by: Markus Suvanto <markus.suvanto(a)gmail.com>
Link:
http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.html
Signed-off-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Link:
https://lore.kernel.org/r/768760.1716567475@warthog.procyon.org.uk
Reviewed-by: Jeffrey Altman <jaltman(a)auristor.com>
cc: linux-afs(a)lists.infradead.org
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Thank you.
Jeffrey Altman
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3d8f874bd620ce03f75a5512847586828ab86544
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052549-gyration-replica-129f@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d8f874bd620ce03f75a5512847586828ab86544 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Fri, 10 May 2024 11:50:27 +0800
Subject: [PATCH] io_uring: fail NOP if non-zero op flags is passed in
The NOP op flags should have been checked from beginning like any other
opcode, otherwise NOP may not be extended with the op flags.
Given both liburing and Rust io-uring crate always zeros SQE op flags, just
ignore users which play raw NOP uring interface without zeroing SQE, because
NOP is just for test purpose. Then we can save one NOP2 opcode.
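A user-space sketch of the check, for illustration: struct model_sqe and model_nop_prep() are hypothetical names, and op_flags stands in for the per-opcode flags word of the SQE. A prep step that rejects non-zero flags keeps those bits free for future extensions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for a submission queue entry. */
struct model_sqe {
	uint8_t opcode;
	uint32_t op_flags;	/* stands in for the per-opcode flags word */
};

/* Reject any NOP request whose op flags are not zero. */
static int model_nop_prep(const struct model_sqe *sqe)
{
	assert(sqe != NULL);
	if (sqe->op_flags)
		return -EINVAL;
	return 0;
}
```

A zero-initialized model_sqe passes the check, while one with any flag bit set is refused with -EINVAL.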
Suggested-by: Jens Axboe <axboe(a)kernel.dk>
Fixes: 2b188cc1bb85 ("Add io_uring IO interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20240510035031.78874-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/nop.c b/io_uring/nop.c
index d956599a3c1b..1a4e312dfe51 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -12,6 +12,8 @@
int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
+ if (READ_ONCE(sqe->rw_flags))
+ return -EINVAL;
return 0;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 42316941335644a98335f209daafa4c122f28983
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052313-runner-spree-04c1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42316941335644a98335f209daafa4c122f28983 Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Sun, 21 Apr 2024 17:37:49 +0000
Subject: [PATCH] binder: fix max_thread type inconsistency
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The type defined for the BINDER_SET_MAX_THREADS ioctl was changed from
size_t to __u32 in order to avoid incompatibility issues between 32 and
64-bit kernels. However, the internal types used to copy from user and
store the value were never updated. Use u32 to fix the inconsistency.
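A small user-space sketch of why the signedness matters. It assumes a 32-bit two's-complement `int` and uses `memcpy()` as a stand-in for `copy_from_user()`; the helper names are hypothetical and not taken from binder.

```c
#include <stdint.h>
#include <string.h>

/* Old behaviour: the 4 bytes written by userspace as a __u32 are read back as int. */
static int read_as_int(const void *ubuf)
{
	int max_threads;

	memcpy(&max_threads, ubuf, sizeof(max_threads));
	return max_threads;
}

/* Fixed behaviour: the same bytes are read back with the type the ioctl declares. */
static uint32_t read_as_u32(const void *ubuf)
{
	uint32_t max_threads;

	memcpy(&max_threads, ubuf, sizeof(max_threads));
	return max_threads;
}
```

Under the stated assumption, a user value of 0xFFFFFFFF comes back as -1 from `read_as_int()` but as 4294967295 from `read_as_u32()`.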
Fixes: a9350fc859ae ("staging: android: binder: fix BINDER_SET_MAX_THREADS declaration")
Reported-by: Arve Hjønnevåg <arve(a)android.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Link: https://lore.kernel.org/r/20240421173750.3117808-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index dd6923d37931..b21a7b246a0d 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -5367,7 +5367,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
goto err;
break;
case BINDER_SET_MAX_THREADS: {
- int max_threads;
+ u32 max_threads;
if (copy_from_user(&max_threads, ubuf,
sizeof(max_threads))) {
diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h
index 7270d4d22207..5b7c80b99ae8 100644
--- a/drivers/android/binder_internal.h
+++ b/drivers/android/binder_internal.h
@@ -421,7 +421,7 @@ struct binder_proc {
struct list_head todo;
struct binder_stats stats;
struct list_head delivered_death;
- int max_threads;
+ u32 max_threads;
int requested_threads;
int requested_threads_started;
int tmp_ref;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 955a923d2809803980ff574270f81510112be9cf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051347-uncross-jockstrap-5ce0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
955a923d2809 ("maple_tree: fix mas_empty_area_rev() null pointer dereference")
29ad6bb31348 ("maple_tree: fix allocation in mas_sparse_area()")
fad8e4291da5 ("maple_tree: make maple state reusable after mas_empty_area_rev()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 955a923d2809803980ff574270f81510112be9cf Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Mon, 22 Apr 2024 16:33:49 -0400
Subject: [PATCH] maple_tree: fix mas_empty_area_rev() null pointer dereference
Currently the code calls mas_start() followed by mas_data_end() if the
maple state is MA_START, but mas_start() may return with the maple state
node == NULL. This will lead to a null pointer dereference when checking
information in the NULL node, which is done in mas_data_end().
Avoid setting the offset if there is no node by waiting until after the
maple state is checked for an empty or single entry state.
A user could trigger the events to cause a kernel oops by unmapping all
vmas to produce an empty maple tree, then mapping a vma that would cause
the scenario described above.
Link: https://lkml.kernel.org/r/20240422203349.2418465-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Marius Fleischer <fleischermarius(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW…
Link: https://lore.kernel.org/lkml/CAJg=8jyuSxDL6XvqEXY_66M20psRK2J53oBTP+fjV5xpW…
Tested-by: Marius Fleischer <fleischermarius(a)gmail.com>
Tested-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 55e1b35bf877..2d7d27e6ae3c 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5109,18 +5109,18 @@ int mas_empty_area_rev(struct ma_state *mas, unsigned long min,
if (size == 0 || max - min < size - 1)
return -EINVAL;
- if (mas_is_start(mas)) {
+ if (mas_is_start(mas))
mas_start(mas);
- mas->offset = mas_data_end(mas);
- } else if (mas->offset >= 2) {
- mas->offset -= 2;
- } else if (!mas_rewind_node(mas)) {
+ else if ((mas->offset < 2) && (!mas_rewind_node(mas)))
return -EBUSY;
- }
- /* Empty set. */
- if (mas_is_none(mas) || mas_is_ptr(mas))
+ if (unlikely(mas_is_none(mas) || mas_is_ptr(mas)))
return mas_sparse_area(mas, min, max, size, false);
+ else if (mas->offset >= 2)
+ mas->offset -= 2;
+ else
+ mas->offset = mas_data_end(mas);
+
/* The start of the window can only be within these values. */
mas->index = min;
Hi all,
Can you please pick up the following two drm patches for linux-5.15.y
and newer?
These bugs affect those kernel versions too.
List of patches to be backported:
Patch 1:
5abffb66d12bcac84bf7b66389c571b8bb6e82bd
drm: Check output polling initialized before disabling
Patch 2:
048a36d8a6085bbd8ab9e5794b713b92ac986450
drm: Check polling initialized before enabling in drm_helper_probe_single_connector_modes
These, however, do not apply cleanly on the 5.15.y branch, so I am also
attaching rebased versions of these patches to this mail.
Thanks and Regards,
Shradha.
Read/write callbacks registered with nvmem core expect 0 to be returned
on success and a negative value to be returned on failure.
Currently pci1xxxx_otp_read()/pci1xxxx_otp_write() and
pci1xxxx_eeprom_read()/pci1xxxx_eeprom_write() return the number of
bytes read/written on success.
Fix to return 0 on success.
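The convention can be illustrated with a user-space sketch. Everything here is hypothetical (`mock_reg_read_t`, `mock_core_read()` and the two callbacks); the caller only models a core that treats any non-zero return from the callback as a failure, as described above.

```c
#include <stddef.h>
#include <string.h>

typedef int (*mock_reg_read_t)(void *priv, unsigned int off, void *val, size_t bytes);

/* Buggy style: reports the number of bytes read on success. */
static int read_returns_count(void *priv, unsigned int off, void *val, size_t bytes)
{
	memcpy(val, (const char *)priv + off, bytes);
	return (int)bytes;
}

/* Fixed style: 0 on success (a negative errno would signal failure). */
static int read_returns_zero(void *priv, unsigned int off, void *val, size_t bytes)
{
	memcpy(val, (const char *)priv + off, bytes);
	return 0;
}

/*
 * Hypothetical core-side caller: any non-zero callback result is passed up
 * as an error and the follow-up processing step is skipped.
 */
static int mock_core_read(mock_reg_read_t cb, void *priv, unsigned int off,
			  void *val, size_t bytes, int *post_processed)
{
	int rc = cb(priv, off, val, bytes);

	*post_processed = 0;
	if (rc)
		return rc;	/* treated as failure */
	*post_processed = 1;
	return 0;
}
```

With `read_returns_count()` the mock core bails out with a positive value even though the data was read; with `read_returns_zero()` it completes normally.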
Fixes: 9ab5465349c0 ("misc: microchip: pci1xxxx: Add support to read and write into PCI1XXXX EEPROM via NVMEM sysfs")
Fixes: 0969001569e4 ("misc: microchip: pci1xxxx: Add support to read and write into PCI1XXXX OTP via NVMEM sysfs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joy Chakraborty <joychakr(a)google.com>
---
drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c b/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c
index 16695cb5e69c..7c3d8bedf90b 100644
--- a/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c
+++ b/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c
@@ -153,7 +153,6 @@ static int pci1xxxx_eeprom_read(void *priv_t, unsigned int off,
buf[byte] = readl(rb + MMAP_EEPROM_OFFSET(EEPROM_DATA_REG));
}
- ret = byte;
error:
release_sys_lock(priv);
return ret;
@@ -197,7 +196,6 @@ static int pci1xxxx_eeprom_write(void *priv_t, unsigned int off,
goto error;
}
}
- ret = byte;
error:
release_sys_lock(priv);
return ret;
@@ -258,7 +256,6 @@ static int pci1xxxx_otp_read(void *priv_t, unsigned int off,
buf[byte] = readl(rb + MMAP_OTP_OFFSET(OTP_RD_DATA_OFFSET));
}
- ret = byte;
error:
release_sys_lock(priv);
return ret;
@@ -315,7 +312,6 @@ static int pci1xxxx_otp_write(void *priv_t, unsigned int off,
goto error;
}
}
- ret = byte;
error:
release_sys_lock(priv);
return ret;
--
2.45.2.505.gda0bf45e8d-goog
Hello,
I'm using Ubuntu 22.04.4 LTS.
I upgraded my kernel last week, and there is clearly something wrong with
it.
When I go back to 5.15.0-107.117, everything is fine.
Attached you will find kern.log from when the kernel seems to crash.
I have also added the output of lshw -short and dpkg.
Let me know if you are the right contact.
Thanks & Regards
Xavier.
The original backport didn't move the code that links the vma into the MT,
nor the code that increments map_count, causing ~15 xfstests
(including ext4/303 generic/051 generic/054 generic/069) to hard fail
on some platforms. This patch resolves the test failures.
Fixes: 0c42f7e039ab ("fork: defer linking file vma until vma is fully initialized")
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
---
kernel/fork.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 7e9a5919299b..3b44960b1385 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -668,6 +668,15 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
if (is_vm_hugetlb_page(tmp))
hugetlb_dup_vma_private(tmp);
+ /* Link the vma into the MT */
+ mas.index = tmp->vm_start;
+ mas.last = tmp->vm_end - 1;
+ mas_store(&mas, tmp);
+ if (mas_is_err(&mas))
+ goto fail_nomem_mas_store;
+
+ mm->map_count++;
+
if (tmp->vm_ops && tmp->vm_ops->open)
tmp->vm_ops->open(tmp);
@@ -687,14 +696,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /* Link the vma into the MT */
- mas.index = tmp->vm_start;
- mas.last = tmp->vm_end - 1;
- mas_store(&mas, tmp);
- if (mas_is_err(&mas))
- goto fail_nomem_mas_store;
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
--
2.45.1.288.g0e0cd299f1-goog
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
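A minimal sketch of what this combination means in practice, assuming a Linux kernel that accepts `MFD_NOEXEC_SEAL`. The helper name `noexec_seal_memfd_seals()` is hypothetical. The raw syscall is used so the sketch does not depend on a libc wrapper, and the fallback `#define`s carry the Linux UAPI values for toolchains whose headers lack them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_GET_SEALS
#define F_GET_SEALS 1034	/* F_LINUX_SPECIFIC_BASE + 10 */
#endif
#ifndef F_SEAL_SEAL
#define F_SEAL_SEAL 0x0001
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Create a memfd with only MFD_NOEXEC_SEAL (no MFD_ALLOW_SEALING) and
 * return its seal mask, or -1 if the kernel rejects the flag or the
 * seals cannot be queried.
 */
static int noexec_seal_memfd_seals(void)
{
	int fd = (int)syscall(SYS_memfd_create, "example", MFD_NOEXEC_SEAL);
	int seals;

	if (fd < 0)
		return -1;

	seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	return seals;
}
```

On a kernel that supports the flag, the returned mask is expected to contain `F_SEAL_EXEC` and not `F_SEAL_SEAL`, that is, the memfd is non-executable yet still sealable without `MFD_ALLOW_SEALING`.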
Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2].
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, and therefore combines multiple functionalities into one
bit.
Furthermore, MFD_NOEXEC_SEAL was added more than a year ago, and
multiple applications and distributions have backported and
utilized it. Altering the ABI now presents a degree of risk and may lead
to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd. An
old application might break; that is by design, and on such a system
vm.noexec = 0 shall be used. So there is no backward compatibility
problem here either.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersedes the previous patch, which tried a different
direction [3]; please remove [2] from the mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
v3:
Additional comments from Randy Dunlap.
v2:
Update according to Randy Dunlap's comments.
https://lore.kernel.org/linux-mm/20240611034903.3456796-1-jeffxu@chromium.o…
v1:
https://lore.kernel.org/linux-mm/20240607203543.2151433-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
From: Dillon Varone <dillon.varone(a)amd.com>
When a phantom stream is in the process of being deconstructed, there
could be pipes with no associated planes. In that case, ignore the
phantom stream entirely when searching for associated pipes.
Cc: stable(a)vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Dillon Varone <dillon.varone(a)amd.com>
---
.../gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c
index 4e12810308a4..4166332b5b89 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c
@@ -126,10 +126,15 @@ int dml21_find_dc_pipes_for_plane(const struct dc *in_dc,
if (dc_phantom_stream && num_pipes > 0) {
dc_phantom_stream_status = dml_ctx->config.callbacks.get_stream_status(context, dc_phantom_stream);
- /* phantom plane will have same index as main */
- dc_phantom_plane = dc_phantom_stream_status->plane_states[dc_plane_index];
+ if (dc_phantom_stream_status) {
+ /* phantom plane will have same index as main */
+ dc_phantom_plane = dc_phantom_stream_status->plane_states[dc_plane_index];
- dml_ctx->config.callbacks.get_dpp_pipes_for_plane(dc_phantom_plane, &context->res_ctx, dc_phantom_pipes);
+ if (dc_phantom_plane) {
+ /* only care about phantom pipes if they contain the phantom plane */
+ dml_ctx->config.callbacks.get_dpp_pipes_for_plane(dc_phantom_plane, &context->res_ctx, dc_phantom_pipes);
+ }
+ }
}
return num_pipes;
--
2.45.1
From: Michael Strauss <michael.strauss(a)amd.com>
[WHY]
Empty SST TUs are illegal to transmit over a USB4 DP tunnel.
Current policy is to configure stream encoder to pack 2 pixels per pclk
even when ODM combine is not in use, allowing seamless dynamic ODM
reconfiguration. However, in extreme edge cases where average pixel
count per TU is less than 2, this can lead to unexpected empty TU
generation during compliance testing. For example, VIC 1 with a 1xHBR3
link configuration will average 1.98 pix/TU.
[HOW]
Calculate average pixel count per TU, and block 2 pixels per clock if
endpoint is a DPIA tunnel and pixel clock is low enough that we will
never require 2:1 ODM combine.
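The arithmetic behind the 1.98 pix/TU example can be sketched as a standalone function. It follows the integer math of `should_avoid_empty_tu()` in this patch, but `avg_pix_per_tu_x1000()` itself and the 251750 value (25.175 MHz in units of 100 Hz) assumed for the VIC 1 pixel clock are illustrative assumptions, not taken from the driver.

```c
/*
 * Average pixels per transfer unit, scaled by 1000.
 * pix_clk_100hz: pixel clock in units of 100 Hz
 * symclk_mhz:    link symbol clock in MHz (810 for HBR3)
 * lane_count:    number of DP lanes in use
 * A TU size of 64 is assumed, as in the patch.
 */
static unsigned int avg_pix_per_tu_x1000(unsigned int pix_clk_100hz,
					 unsigned int symclk_mhz,
					 unsigned int lane_count)
{
	const unsigned int tu_size_bytes = 64;
	unsigned int pix_clk_mhz = pix_clk_100hz / 10000;	/* truncates to whole MHz */

	return (1000 * pix_clk_mhz * tu_size_bytes) / (symclk_mhz * lane_count);
}
```

For the assumed VIC 1 clock on one HBR3 lane, `avg_pix_per_tu_x1000(251750, 810, 1)` evaluates 1000 * 25 * 64 / 810 = 1975, i.e. about 1.98 pixels per TU, which is below the 2020 threshold the patch uses.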
Cc: stable(a)vger.kernel.org # 6.6+
Reviewed-by: Wenjing Liu <wenjing.liu(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Michael Strauss <michael.strauss(a)amd.com>
---
.../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 72 +++++++++++++++++++
.../amd/display/dc/hwss/dcn35/dcn35_hwseq.h | 2 +
.../amd/display/dc/hwss/dcn35/dcn35_init.c | 2 +-
3 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
index 4f87316e1318..0602921399cd 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
@@ -1529,3 +1529,75 @@ void dcn35_set_long_vblank(struct pipe_ctx **pipe_ctx,
}
}
}
+
+static bool should_avoid_empty_tu(struct pipe_ctx *pipe_ctx)
+{
+ /* Calculate average pixel count per TU, return false if under ~2.00 to
+ * avoid empty TUs. This is only required for DPIA tunneling as empty TUs
+ * are legal to generate for native DP links. Assume TU size 64 as there
+ * is currently no scenario where it's reprogrammed from HW default.
+ * MTPs have no such limitation, so this does not affect MST use cases.
+ */
+ unsigned int pix_clk_mhz;
+ unsigned int symclk_mhz;
+ unsigned int avg_pix_per_tu_x1000;
+ unsigned int tu_size_bytes = 64;
+ struct dc_crtc_timing *timing = &pipe_ctx->stream->timing;
+ struct dc_link_settings *link_settings = &pipe_ctx->link_config.dp_link_settings;
+ const struct dc *dc = pipe_ctx->stream->link->dc;
+
+ if (pipe_ctx->stream->link->ep_type != DISPLAY_ENDPOINT_USB4_DPIA)
+ return false;
+
+ // Not necessary for MST configurations
+ if (pipe_ctx->stream->signal == SIGNAL_TYPE_DISPLAY_PORT_MST)
+ return false;
+
+ pix_clk_mhz = timing->pix_clk_100hz / 10000;
+
+ // If this is true, can't block due to dynamic ODM
+ if (pix_clk_mhz > dc->clk_mgr->bw_params->clk_table.entries[0].dispclk_mhz)
+ return false;
+
+ switch (link_settings->link_rate) {
+ case LINK_RATE_LOW:
+ symclk_mhz = 162;
+ break;
+ case LINK_RATE_HIGH:
+ symclk_mhz = 270;
+ break;
+ case LINK_RATE_HIGH2:
+ symclk_mhz = 540;
+ break;
+ case LINK_RATE_HIGH3:
+ symclk_mhz = 810;
+ break;
+ default:
+ // We shouldn't be tunneling any other rates, something is wrong
+ ASSERT(0);
+ return false;
+ }
+
+ avg_pix_per_tu_x1000 = (1000 * pix_clk_mhz * tu_size_bytes)
+ / (symclk_mhz * link_settings->lane_count);
+
+ // Add small empirically-decided margin to account for potential jitter
+ return (avg_pix_per_tu_x1000 < 2020);
+}
+
+bool dcn35_is_dp_dig_pixel_rate_div_policy(struct pipe_ctx *pipe_ctx)
+{
+ struct dc *dc = pipe_ctx->stream->ctx->dc;
+
+ if (!is_h_timing_divisible_by_2(pipe_ctx->stream))
+ return false;
+
+ if (should_avoid_empty_tu(pipe_ctx))
+ return false;
+
+ if (dc_is_dp_signal(pipe_ctx->stream->signal) && !dc->link_srv->dp_is_128b_132b_signal(pipe_ctx) &&
+ dc->debug.enable_dp_dig_pixel_rate_div_policy)
+ return true;
+
+ return false;
+}
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.h b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.h
index bc05beba5f2c..e27b3609020f 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.h
@@ -97,4 +97,6 @@ void dcn35_set_static_screen_control(struct pipe_ctx **pipe_ctx,
void dcn35_set_long_vblank(struct pipe_ctx **pipe_ctx,
int num_pipes, uint32_t v_total_min, uint32_t v_total_max);
+bool dcn35_is_dp_dig_pixel_rate_div_policy(struct pipe_ctx *pipe_ctx);
+
#endif /* __DC_HWSS_DCN35_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
index 30e6a6398839..428912f37129 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
@@ -161,7 +161,7 @@ static const struct hwseq_private_funcs dcn35_private_funcs = {
.setup_hpo_hw_control = dcn35_setup_hpo_hw_control,
.calculate_dccg_k1_k2_values = dcn32_calculate_dccg_k1_k2_values,
.resync_fifo_dccg_dio = dcn314_resync_fifo_dccg_dio,
- .is_dp_dig_pixel_rate_div_policy = dcn32_is_dp_dig_pixel_rate_div_policy,
+ .is_dp_dig_pixel_rate_div_policy = dcn35_is_dp_dig_pixel_rate_div_policy,
.dsc_pg_control = dcn35_dsc_pg_control,
.dsc_pg_status = dcn32_dsc_pg_status,
.enable_plane = dcn35_enable_plane,
--
2.45.1
Currently, on JH7110 boards with an EMMC slot, the vqmmc voltage for EMMC
is fixed at 1.8V, while the spec requires 3.3V in low speed mode and
switching to 1.8V in higher speed modes. Since no other peripheral shares
the voltage source of EMMC's vqmmc (ALDO4) on any board currently
supported by the mainline kernel, regulator-max-microvolt of ALDO4 should
be set to 3.3V.
Cc: stable(a)vger.kernel.org
Signed-off-by: Shengyu Qu <wiagn233(a)outlook.com>
Fixes: ac9a37e2d6b6 ("riscv: dts: starfive: introduce a common board dtsi for jh7110 based boards")
---
arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
index 37b4c294ffcc..c7a549ec7452 100644
--- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
@@ -244,7 +244,7 @@ emmc_vdd: aldo4 {
regulator-boot-on;
regulator-always-on;
regulator-min-microvolt = <1800000>;
- regulator-max-microvolt = <1800000>;
+ regulator-max-microvolt = <3300000>;
regulator-name = "emmc_vdd";
};
};
--
2.34.1
handle_nested_irq() is supposed to be running inside the parent thread
handler context. It per se has no dedicated kernel thread, thus shouldn't
touch desc->threads_active. The parent kernel thread has already taken
care of this.
Fixes: e2c12739ccf7 ("genirq: Prevent nested thread vs synchronize_hardirq() deadlock")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peng Liu <iwtbavbm(a)gmail.com>
---
Although I believe the change is correct, I'm afraid testing on my only PC
can't cover the affected code path, so the patch may be totally -UNTESTED-.
kernel/irq/chip.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index dc94e0bf2c94..85d4f29134e9 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -478,7 +478,6 @@ void handle_nested_irq(unsigned int irq)
}
kstat_incr_irqs_this_cpu(desc);
- atomic_inc(&desc->threads_active);
raw_spin_unlock_irq(&desc->lock);
action_ret = IRQ_NONE;
@@ -487,8 +486,6 @@ void handle_nested_irq(unsigned int irq)
if (!irq_settings_no_debug(desc))
note_interrupt(desc, action_ret);
-
- wake_threads_waitq(desc);
}
EXPORT_SYMBOL_GPL(handle_nested_irq);
--
2.39.2
From: Hector Martin <marcan(a)marcan.st>
When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.
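The deferral scheme described above can be sketched as a small user-space simulation. All `mock_*` names are hypothetical; the sketch models only the status transitions (one Set TR Dequeue Pointer command in flight at a time, TDs on other streams deferred), not the real ring or command handling.

```c
#include <stddef.h>

enum mock_status {
	MOCK_DIRTY,
	MOCK_CLEARING_CACHE,
	MOCK_CLEARING_CACHE_DEFERRED,
	MOCK_CLEARED,
};

struct mock_td {
	unsigned int stream_id;
	enum mock_status status;
};

/*
 * One invalidation pass: pick the first TD that still needs its cache
 * cleared and defer TDs on other streams. Returns 1 if a (simulated)
 * Set TR Dequeue Pointer command was queued, 0 otherwise.
 */
static int mock_invalidate(struct mock_td *tds, size_t n)
{
	struct mock_td *cached = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct mock_td *td = &tds[i];

		if (td->status != MOCK_DIRTY &&
		    td->status != MOCK_CLEARING_CACHE_DEFERRED)
			continue;
		if (cached && cached->stream_id != td->stream_id) {
			td->status = MOCK_CLEARING_CACHE_DEFERRED;
			continue;
		}
		if (cached)
			cached->status = MOCK_CLEARED;	/* same stream: should never happen */
		td->status = MOCK_CLEARING_CACHE;
		cached = td;
	}
	return cached != NULL;
}

/* Command completion: finish the TD in flight, report whether deferred TDs remain. */
static int mock_cmd_complete(struct mock_td *tds, size_t n)
{
	int deferred = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tds[i].status == MOCK_CLEARING_CACHE)
			tds[i].status = MOCK_CLEARED;
		else if (tds[i].status == MOCK_CLEARING_CACHE_DEFERRED)
			deferred = 1;
	}
	return deferred;
}

/* Drive the loop and count how many commands were needed. */
static int mock_clear_all(struct mock_td *tds, size_t n)
{
	int commands = 0;

	while (mock_invalidate(tds, n)) {
		commands++;
		if (!mock_cmd_complete(tds, n))
			break;
	}
	return commands;
}
```

With three cancelled TDs on three different streams, `mock_clear_all()` issues three simulated commands, one per stream, and leaves every TD in the cleared state.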
This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added.
It was then identified but not fixed nor made into a warning in commit
674f8438c121 ("xhci: split handling halted endpoints into two steps"),
which added a FIXME comment for the problem case (without materially
changing the behavior as far as I can tell, though the new logic made
the problem more obvious).
Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again.
[Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached
cancelled URBs.") was a targeted regression fix to the previously mentioned
patch. Users reported issues with usb stuck after unmounting/disconnecting
UAS devices. This rolled back the TD clearing of multiple streams to its
original state.]
Apparently the commit author was aware of the problem (yet still chose
to submit it): It was still mentioned as a FIXME, an xhci_dbg() was
added to log the problem condition, and the remaining issue was mentioned
in the commit description. The choice of making the log type xhci_dbg()
for what is, at this point, a completely unhandled and known broken
condition is puzzling and unfortunate, as it guarantees that no actual
users would see the log in production, thereby making it nigh
undebuggable (indeed, even if you turn on DEBUG, the message doesn't
really hint at there being a problem at all).
It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition been actually reported
with a warning, as it should have been as a bug intentionally left in
unfixed (never mind that it shouldn't have been left in at all).
> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.
3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.
Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).
On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a security bug (even on systems with IOMMUs
this could silently corrupt memory belonging to other USB devices on the
same controller, so it's still a security bug). Given that the kernel
autoprobes partition tables, I'm pretty sure a malicious USB device
pretending to be a UAS device and reporting an error with the right
timing could deliberately trigger a UAF and write to freed memory, with
no user action.
[Mathias: Commit message and code comment edit, original at:]
https://lore.kernel.org/linux-usb/20240524-xhci-streams-v1-1-6b1f13819bea@m…
Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.")
Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.")
Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps")
Cc: stable(a)vger.kernel.org
Cc: security(a)kernel.org
Reviewed-by: Neal Gompa <neal(a)gompa.dev>
Signed-off-by: Hector Martin <marcan(a)marcan.st>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 54 ++++++++++++++++++++++++++++--------
drivers/usb/host/xhci.h | 1 +
2 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1db61bb2b9b5..fd0cde3d1569 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1031,13 +1031,27 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
break;
case TD_DIRTY: /* TD is cached, clear it */
case TD_HALTED:
+ case TD_CLEARING_CACHE_DEFERRED:
+ if (cached_td) {
+ if (cached_td->urb->stream_id != td->urb->stream_id) {
+ /* Multiple streams case, defer move dq */
+ xhci_dbg(xhci,
+ "Move dq deferred: stream %u URB %p\n",
+ td->urb->stream_id, td->urb);
+ td->cancel_status = TD_CLEARING_CACHE_DEFERRED;
+ break;
+ }
+
+ /* Should never happen, but clear the TD if it does */
+ xhci_warn(xhci,
+ "Found multiple active URBs %p and %p in stream %u?\n",
+ td->urb, cached_td->urb,
+ td->urb->stream_id);
+ td_to_noop(xhci, ring, cached_td, false);
+ cached_td->cancel_status = TD_CLEARED;
+ }
+
td->cancel_status = TD_CLEARING_CACHE;
- if (cached_td)
- /* FIXME stream case, several stopped rings */
- xhci_dbg(xhci,
- "Move dq past stream %u URB %p instead of stream %u URB %p\n",
- td->urb->stream_id, td->urb,
- cached_td->urb->stream_id, cached_td->urb);
cached_td = td;
break;
}
@@ -1057,10 +1071,16 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
if (err) {
/* Failed to move past cached td, just set cached TDs to no-op */
list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
- if (td->cancel_status != TD_CLEARING_CACHE)
+ /*
+ * Deferred TDs need to have the deq pointer set after the above command
+ * completes, so if that failed we just give up on all of them (and
+ * complain loudly since this could cause issues due to caching).
+ */
+ if (td->cancel_status != TD_CLEARING_CACHE &&
+ td->cancel_status != TD_CLEARING_CACHE_DEFERRED)
continue;
- xhci_dbg(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
- td->urb);
+ xhci_warn(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
+ td->urb);
td_to_noop(xhci, ring, td, false);
td->cancel_status = TD_CLEARED;
}
@@ -1346,6 +1366,7 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
struct xhci_ep_ctx *ep_ctx;
struct xhci_slot_ctx *slot_ctx;
struct xhci_td *td, *tmp_td;
+ bool deferred = false;
ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3]));
stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2]));
@@ -1432,6 +1453,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n",
__func__, td->urb);
xhci_td_cleanup(ep->xhci, td, ep_ring, td->status);
+ } else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) {
+ deferred = true;
} else {
xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n",
__func__, td->urb, td->cancel_status);
@@ -1441,8 +1464,17 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
ep->ep_state &= ~SET_DEQ_PENDING;
ep->queued_deq_seg = NULL;
ep->queued_deq_ptr = NULL;
- /* Restart any rings with pending URBs */
- ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+
+ if (deferred) {
+ /* We have more streams to clear */
+ xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n",
+ __func__);
+ xhci_invalidate_cancelled_tds(ep);
+ } else {
+ /* Restart any rings with pending URBs */
+ xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
+ ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+ }
}
static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id,
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 30415158ed3c..78d014c4d884 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1276,6 +1276,7 @@ enum xhci_cancelled_td_status {
TD_DIRTY = 0,
TD_HALTED,
TD_CLEARING_CACHE,
+ TD_CLEARING_CACHE_DEFERRED,
TD_CLEARED,
};
--
2.25.1
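The control flow added to xhci_handle_cmd_set_deq() in the hunks above can be modelled in a few lines of user-space C. This is only a sketch: the status enum is copied from the xhci.h hunk, but `enum next_step` and `after_set_deq()` are hypothetical names, and the flat array stands in for the driver's cancelled_td_list.

```c
#include <assert.h>
#include <stddef.h>

/* Status values copied from the enum in the xhci.h hunk. */
enum xhci_cancelled_td_status {
	TD_DIRTY = 0,
	TD_HALTED,
	TD_CLEARING_CACHE,
	TD_CLEARING_CACHE_DEFERRED,
	TD_CLEARED,
};

/* Hypothetical result type; the driver calls the functions directly. */
enum next_step {
	RING_DOORBELL,
	CONTINUE_INVALIDATION,
};

/*
 * Decide what to do once a Set TR Dequeue command has completed: if any
 * cancelled TD is still in the deferred state, another invalidation pass
 * is needed before the rings may be restarted.
 */
static enum next_step after_set_deq(const enum xhci_cancelled_td_status *st,
				    size_t n)
{
	size_t i;

	assert(st != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (st[i] == TD_CLEARING_CACHE_DEFERRED)
			return CONTINUE_INVALIDATION;

	return RING_DOORBELL;
}
```

The point of the model is that the doorbell is rung only after the last deferred TD has been handled, which matches the if (deferred) / else split in the patch.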
The transferred length is set incorrectly for cancelled bulk
transfer TDs in case the bulk transfer ring stops on the last transfer
block with a 'Stop - Length Invalid' completion code.
length essentially ends up being set to the requested length:
urb->actual_length = urb->transfer_buffer_length
Length for 'Stop - Length Invalid' cases should be the sum of all
TRB transfer block lengths up to the one the ring stopped on,
_excluding_ the one stopped on.
Fix this by always summing up TRB lengths for 'Stop - Length Invalid'
bulk cases.
This issue was discovered by Alan Stern while debugging
https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not
solve that bug. Issue is older than 4.10 kernel but fix won't apply
to those due to major reworks in that area.
Tested-by: Pierre Tomon <pierretom+12(a)ik.me>
Cc: stable(a)vger.kernel.org # v4.10+
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9e90d2952760..1db61bb2b9b5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2524,9 +2524,8 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
goto finish_td;
case COMP_STOPPED_LENGTH_INVALID:
/* stopped on ep trb with invalid length, exclude it */
- ep_trb_len = 0;
- remaining = 0;
- break;
+ td->urb->actual_length = sum_trb_lengths(xhci, ep_ring, ep_trb);
+ goto finish_td;
case COMP_USB_TRANSACTION_ERROR:
if (xhci->quirks & XHCI_NO_SOFT_RETRY ||
(ep->err_count++ > MAX_SOFT_RETRY) ||
--
2.25.1
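The length rule from the commit message above (sum every TRB before the one the ring stopped on, and exclude that one) can be sketched as follows. This is a user-space illustration, not the driver's sum_trb_lengths(); the array-of-lengths model and the name `bytes_before_stop()` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: a TD is an array of per-TRB lengths, and stop_idx is
 * the index of the TRB the ring stopped on.
 */
static size_t bytes_before_stop(const unsigned int *trb_len, size_t n_trbs,
				size_t stop_idx)
{
	size_t i, sum = 0;

	assert(stop_idx < n_trbs);

	/* Add up every TRB before the stop point, excluding the one stopped on. */
	for (i = 0; i < stop_idx; i++)
		sum += trb_len[i];

	return sum;
}
```

For a TD of three TRBs of 512, 512 and 256 bytes that stops on the last one, the model reports 1024 bytes transferred rather than the requested 1280.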
reg_read() callback registered with nvmem core expects 0 on success and
a negative value on error but rmem_read() returns the number of bytes
read which is treated as an error at the nvmem core.
This does not break when rmem is accessed using sysfs via
bin_attr_nvmem_read()/write() but causes an error when accessed from
places like nvmem_access_with_keepouts(), etc.
Change to return 0 on success and error in case
memory_read_from_buffer() returns an error or -EIO if bytes read do not
match what was requested.
Fixes: 5a3fa75a4d9c ("nvmem: Add driver to expose reserved memory as nvmem")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joy Chakraborty <joychakr(a)google.com>
---
drivers/nvmem/rmem.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvmem/rmem.c b/drivers/nvmem/rmem.c
index 752d0bf4445e..7f907c5a445e 100644
--- a/drivers/nvmem/rmem.c
+++ b/drivers/nvmem/rmem.c
@@ -46,7 +46,10 @@ static int rmem_read(void *context, unsigned int offset,
memunmap(addr);
- return count;
+ if (count < 0)
+ return count;
+
+ return count == bytes ? 0 : -EIO;
}
static int rmem_probe(struct platform_device *pdev)
--
2.45.2.505.gda0bf45e8d-goog
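The return convention that the rmem patch above adopts can be isolated in a small user-space helper. This is a sketch under the assumption that the underlying read yields either a negative errno or a byte count; `read_result_to_status()` is a hypothetical name and is not part of the nvmem API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: map a byte count (or negative errno) onto 0 / -errno. */
static int read_result_to_status(ssize_t count, size_t bytes)
{
	if (count < 0)
		return (int)count;	/* pass the error through unchanged */

	/* A short read is reported as an I/O error, a full read as success. */
	return (size_t)count == bytes ? 0 : -EIO;
}
```

A caller that treats any non-zero return as failure then sees success only when the full request was satisfied.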
Hello netfilter developers,
Do we have any tests that we could run before sending a stable backport
in netfilter/ subsystem to stable@vger ?
Let us say we have a CVE fix which is only backported as far as 5.10.y but
is also needed in 5.4.y and 4.19.y. The backport might be easy to make,
just fixing some conflicts due to contextual changes or missing commits.
One question that comes to mind is whether I tested that particular code;
testing it is often tough unless the reproducer is public. So I thought
it would be good to learn about any netfilter test suite (set of tests)
to run before sending a backport to a stable kernel, to help ensure we
don't introduce regressions.
Thanks,
Harshit
From: Jeff Xu <jeffxu(a)chromium.org>
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it
didn't have proper documentation. This led to a lot of confusion,
especially about whether or not memfd created with the MFD_NOEXEC_SEAL
flag is sealable. Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.
As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags. The idea
is to make it easier to use memfd in the most common way, which is
NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING. This works with sysctl
vm.noexec to help existing applications move to a more secure way of
using memfd.
Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1] [2].
Those are based on the viewpoint that each flag is an atomic unit,
which is a reasonable assumption. However, MFD_NOEXEC_SEAL was
designed with the intent of promoting the most secure method of using
memfd, and therefore combines multiple functionalities into one
bit.
Furthermore, MFD_NOEXEC_SEAL was added more than one year ago, and
multiple applications and distributions have backported and
utilized it. Altering the ABI now presents a degree of risk and may lead
to disruption.
MFD_NOEXEC_SEAL is a new flag, and applications must change their code
to use it. There is no backward compatibility problem.
When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get an MFD_NOEXEC_SEAL memfd. An
old application might break; that is by design, and on such a system
vm.noexec = 0 should be used. So there is no backward compatibility
problem here either.
I propose to include this documentation patch to assist in clarifying
the semantics of MFD_NOEXEC_SEAL, thereby preventing any potential
future confusion.
This patch supersedes the previous patch, which tried a different
direction [3]; please remove [2] from the mm-unstable branch when
applying this patch.
Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.
[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
[2]
https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
[3]
https://lore.kernel.org/lkml/20240524033933.135049-1-jeffxu@google.com/
Jeff Xu (1):
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.2.505.gda0bf45e8d-goog
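The sealability question discussed in the cover letter above can be checked from user space. The following is a minimal sketch, assuming a Linux system; the `#ifndef` fallbacks repeat the uapi header values for older userspace headers, the raw syscall is used only so the example does not depend on a recent libc wrapper, and `noexec_seal_memfd_seals()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers; values as in the uapi headers. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_GET_SEALS
#define F_GET_SEALS 1034
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Returns the seal mask of a fresh MFD_NOEXEC_SEAL memfd, or -1 if the
 * running kernel rejects the flag or the query fails.
 */
static int noexec_seal_memfd_seals(void)
{
	int fd, seals;

	fd = (int)syscall(SYS_memfd_create, "demo", MFD_NOEXEC_SEAL);
	if (fd < 0)
		return -1;

	seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	return seals;
}
```

On a kernel that supports the flag, the returned mask is expected to contain F_SEAL_EXEC, which corresponds to the NOEXEC + F_SEAL_EXEC + MFD_ALLOW_SEALING combination described above; on older kernels the helper simply returns -1.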
The different patches here are some unrelated fixes for MPTCP:
- Patch 1 ensures 'snd_una' is initialised on connect in case of MPTCP
fallback to TCP followed by retransmissions before the processing of
any other incoming packets. A fix for v5.9+.
- Patch 2 makes sure the RmAddr MIB counter is incremented, and only
once per ID, upon the reception of a RM_ADDR. A fix for v5.10+.
- Patch 3 doesn't update 'add addr' related counters if the connect()
was not possible. A fix for v5.7+.
- Patch 4 updates the mailmap file to add Geliang's new email address.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
mailmap: map Geliang's new email address
Paolo Abeni (1):
mptcp: ensure snd_una is properly initialized on connect
YonglongLi (2):
mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
mptcp: pm: update add_addr counters after connect
.mailmap | 1 +
net/mptcp/pm_netlink.c | 21 ++++++++++++++-------
net/mptcp/protocol.c | 1 +
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++--
4 files changed, 19 insertions(+), 9 deletions(-)
---
base-commit: c44711b78608c98a3e6b49ce91678cd0917d5349
change-id: 20240607-upstream-net-20240607-misc-fixes-024007171d60
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The compiled dtb files aren't executable, so install them with 0644 as their
permission mode, instead of defaulting to 0755 for the permission mode and
installing them with the executable bits set.
Some Linux distributions, including Debian, [1][2][3] already include fixes
in their kernel package build recipes to change the dtb file permissions to
0644 in their kernel packages. These changes, when additionally propagated
into the long-term kernel versions, will allow such distributions to remove
their downstream fixes.
[1] https://salsa.debian.org/kernel-team/linux/-/merge_requests/642
[2] https://salsa.debian.org/kernel-team/linux/-/merge_requests/749
[3] https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/rules.real?…
Cc: Diederik de Haas <didi.debian(a)cknow.org>
Cc: stable(a)vger.kernel.org
Fixes: aefd80307a05 ("kbuild: refactor Makefile.dtbinst more")
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
Notes:
Changes in v2:
- Improved the patch description, to include additional details and
to address the patch submission issues pointed out by Greg K-H [4]
- No changes were made to the patch itself
Link to v1: https://lore.kernel.org/linux-kbuild/ae087ef1715142f606ba6477ace3e4111972cf…
[4] https://lore.kernel.org/linux-kbuild/2024061006-ladylike-paving-a36b@gregkh/
scripts/Makefile.dtbinst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/Makefile.dtbinst b/scripts/Makefile.dtbinst
index 67956f6496a5..9d920419a62c 100644
--- a/scripts/Makefile.dtbinst
+++ b/scripts/Makefile.dtbinst
@@ -17,7 +17,7 @@ include $(srctree)/scripts/Kbuild.include
dst := $(INSTALL_DTBS_PATH)
quiet_cmd_dtb_install = INSTALL $@
- cmd_dtb_install = install -D $< $@
+ cmd_dtb_install = install -D -m 0644 $< $@
$(dst)/%: $(obj)/%
$(call cmd,dtb_install)
Add subsystem lvds and mipi. Add pwm and i2c in lvds and mipi.
imx8qm-mek:
- add remote-proc
- fixed gpio number error for vmmc
- add usb3 and typec
- add pwm and i2c in lvds and mipi
DTB_CHECK warnings are fixed by separate patches.
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: usb@5b110000: usb@5b120000: 'port', 'usb-role-switch' do not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/usb/fsl,imx8qm-cdns3.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: usb@5b120000: 'port', 'usb-role-switch' do not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/usb/cdns,usb3.yaml#
** binding fix patch: https://lore.kernel.org/imx/20240606161509.3201080-1-Frank.Li@nxp.com/T/#u
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: interrupt-controller@56240000: 'power-domains' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/interrupt-controller/fsl,irqsteer.yaml#
** binding fix patch: https://lore.kernel.org/imx/20240528071141.92003-1-alexander.stein@ew.tq-gr…
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: pwm@56244000: 'oneOf' conditional failed, one must be fixed:
'interrupts' is a required property
'interrupts-extended' is a required property
from schema $id: http://devicetree.org/schemas/pwm/imx-pwm.yaml#
** binding fix patch: https://lore.kernel.org/imx/dc9accba-78af-45ec-a516-b89f2d4f4b03@kernel.org…
from schema $id: http://devicetree.org/schemas/interrupt-controller/fsl,irqsteer.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: imx8qm-cm4-0: power-domains: [[15, 278], [15, 297]] is too short
from schema $id: http://devicetree.org/schemas/remoteproc/fsl,imx-rproc.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: imx8qm-cm4-1: power-domains: [[15, 298], [15, 317]] is too short
** binding fix patch: https://lore.kernel.org/imx/20240606150030.3067015-1-Frank.Li@nxp.com/T/#u
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Changes in v2:
- split common lvds and mipi part to separate dtsi file.
- num-interpolated-steps = <100>
- irq-steer add "fsl,imx8qm-irqsteer"
- using mux-controller
- move address-cells common dtsi
- Link to v1: https://lore.kernel.org/r/20240606-imx8qm-dts-usb-v1-0-565721b64f25@nxp.com
---
Frank Li (9):
arm64: dts: imx8: add basic lvds and lvds2 subsystem
arm64: dts: imx8qm: add lvds subsystem
arm64: dts: imx8: add basic mipi subsystem
arm64: dts: imx8qm: add mipi subsystem
arm64: dts: imx8qm-mek: add cm4 remote-proc and related memory region
arm64: dts: imx8qm-mek: add pwm and i2c in lvds subsystem
arm64: dts: imx8qm-mek: add i2c in mipi[0,1] subsystem
arm64: dts: imx8qm-mek: fix gpio number for reg_usdhc2_vmmc
arm64: dts: imx8qm-mek: add usb 3.0 and related type C nodes
arch/arm64/boot/dts/freescale/imx8-ss-lvds0.dtsi | 63 +++++
arch/arm64/boot/dts/freescale/imx8-ss-lvds1.dtsi | 114 +++++++++
arch/arm64/boot/dts/freescale/imx8-ss-mipi0.dtsi | 138 +++++++++++
arch/arm64/boot/dts/freescale/imx8-ss-mipi1.dtsi | 138 +++++++++++
arch/arm64/boot/dts/freescale/imx8qm-mek.dts | 280 +++++++++++++++++++++-
arch/arm64/boot/dts/freescale/imx8qm-ss-lvds.dtsi | 77 ++++++
arch/arm64/boot/dts/freescale/imx8qm.dtsi | 27 +++
7 files changed, 836 insertions(+), 1 deletion(-)
---
base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f
change-id: 20240606-imx8qm-dts-usb-9c55d2bfe526
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 07c54cc5988f19c9642fd463c2dbdac7fc52f777
Gitweb: https://git.kernel.org/tip/07c54cc5988f19c9642fd463c2dbdac7fc52f777
Author: Oleg Nesterov <oleg(a)redhat.com>
AuthorDate: Tue, 28 May 2024 14:20:19 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 10 Jun 2024 20:18:13 +02:00
tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com
Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com
---
kernel/time/tick-common.c | 42 ++++++++++++--------------------------
1 file changed, 14 insertions(+), 28 deletions(-)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b130..a47bcf7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
}
}
-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
- int cpu = *(unsigned int *)info;
-
- WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
- tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
- int cpu = smp_processor_id();
- int from = tick_do_timer_boot_cpu;
-
- if (from >= 0 && from != cpu)
- smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
/*
* Setup the tick device
*/
@@ -221,19 +201,25 @@ static void tick_setup_device(struct tick_device *td,
tick_next_period = ktime_get();
#ifdef CONFIG_NO_HZ_FULL
/*
- * The boot CPU may be nohz_full, in which case set
- * tick_do_timer_boot_cpu so the first housekeeping
- * secondary that comes up will take do_timer from
- * us.
+ * The boot CPU may be nohz_full, in which case the
+ * first housekeeping secondary will take do_timer()
+ * from it.
*/
if (tick_nohz_full_cpu(cpu))
tick_do_timer_boot_cpu = cpu;
- } else if (tick_do_timer_boot_cpu != -1 &&
- !tick_nohz_full_cpu(cpu)) {
- tick_take_do_timer_from_boot();
+ } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
tick_do_timer_boot_cpu = -1;
- WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+ /*
+ * The boot CPU will stay in periodic (NOHZ disabled)
+ * mode until clocksource_done_booting() called after
+ * smp_init() selects a high resolution clocksource and
+ * timekeeping_notify() kicks the NOHZ stuff alive.
+ *
+ * So this WRITE_ONCE can only race with the READ_ONCE
+ * check in tick_periodic() but this race is harmless.
+ */
+ WRITE_ONCE(tick_do_timer_cpu, cpu);
#endif
}
The patch titled
Subject: gcov: add support for GCC 14
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
gcov-add-support-for-gcc-14.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Oberparleiter <oberpar(a)linux.ibm.com>
Subject: gcov: add support for GCC 14
Date: Mon, 10 Jun 2024 11:27:43 +0200
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data. To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.
Tested with GCC versions 14.1.0 and 13.2.0.
Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com
Signed-off-by: Peter Oberparleiter <oberpar(a)linux.ibm.com>
Reported-by: Allison Henderson <allison.henderson(a)oracle.com>
Reported-by: Chuck Lever III <chuck.lever(a)oracle.com>
Tested-by: Chuck Lever <chuck.lever(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/gcov/gcc_4_7.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/kernel/gcov/gcc_4_7.c~gcov-add-support-for-gcc-14
+++ a/kernel/gcov/gcc_4_7.c
@@ -18,7 +18,9 @@
#include <linux/mm.h>
#include "gcov.h"
-#if (__GNUC__ >= 10)
+#if (__GNUC__ >= 14)
+#define GCOV_COUNTERS 9
+#elif (__GNUC__ >= 10)
#define GCOV_COUNTERS 8
#elif (__GNUC__ >= 7)
#define GCOV_COUNTERS 9
_
Patches currently in -mm which might be from oberpar(a)linux.ibm.com are
gcov-add-support-for-gcc-14.patch
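The version selection in the gcov hunk above can be restated as an ordinary function, which makes the ordering of the checks easier to see. This is a sketch only: `gcov_counters_for_gcc()` is a hypothetical helper, and compilers older than GCC 7 are reported as -1 because those branches are not shown in the hunk.

```c
#include <assert.h>

/* Hypothetical helper mirroring the #if chain shown in the hunk. */
static int gcov_counters_for_gcc(int major)
{
	assert(major > 0);

	if (major >= 14)
		return 9;	/* new branch added by the patch */
	if (major >= 10)
		return 8;
	if (major >= 7)
		return 9;

	return -1;		/* older compilers: not covered by the hunk */
}
```

Because the tests run from newest to oldest, the new GCC 14 branch has to come first; otherwise the existing >= 10 branch would still match and select 8.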
Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
serdev") causes the following regression:
BT can't be enabled after the steps below:
cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure
if the property enable-gpios is not configured within DT|ACPI for QCA6390.
That commit fixes a use-after-free issue within qca_serdev_shutdown()
by adding a condition that avoids flushing or writing the serdev after it
is closed, but it also introduces this regression for the steps above,
since the VSC is not sent to reset the controller during warm reboot.
Fix it by sending the VSC to reset the controller within
qca_serdev_shutdown() once BT has ever been enabled. The use-after-free
issue is also fixed by this change, since the serdev is still open when
it is flushed or written.
Verified on the reported machine, a Dell XPS 13 9310 laptop, on top of the
two kernel commits below:
commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump
implementation for QCA") of bluetooth-next tree.
commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump
implementation for QCA") of linus mainline tree.
Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev")
Cc: stable(a)vger.kernel.org
Reported-by: Wren Turkal <wt(a)penguintechs.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Tested-by: Wren Turkal <wt(a)penguintechs.org>
---
V1 -> V2: Add comments and more commit messages
V1 discussion link:
https://lore.kernel.org/linux-bluetooth/d553edef-c1a4-4d52-a892-715549d31eb…
drivers/bluetooth/hci_qca.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 0c9c9ee56592..9a0bc86f9aac 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -2450,15 +2450,27 @@ static void qca_serdev_shutdown(struct device *dev)
struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev);
struct hci_uart *hu = &qcadev->serdev_hu;
struct hci_dev *hdev = hu->hdev;
- struct qca_data *qca = hu->priv;
const u8 ibs_wake_cmd[] = { 0xFD };
const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 };
if (qcadev->btsoc_type == QCA_QCA6390) {
- if (test_bit(QCA_BT_OFF, &qca->flags) ||
- !test_bit(HCI_RUNNING, &hdev->flags))
+ /* The purpose of sending the VSC is to reset SOC into a initial
+ * state and the state will ensure next hdev->setup() success.
+ * if HCI_QUIRK_NON_PERSISTENT_SETUP is set, it means that
+ * hdev->setup() can do its job regardless of SoC state, so
+ * don't need to send the VSC.
+ * if HCI_SETUP is set, it means that hdev->setup() was never
+ * invoked and the SOC is already in the initial state, so
+ * don't also need to send the VSC.
+ */
+ if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks) ||
+ hci_dev_test_flag(hdev, HCI_SETUP))
return;
+ /* The serdev must be in open state when conrol logic arrives
+ * here, so also fix the use-after-free issue caused by that
+ * the serdev is flushed or wrote after it is closed.
+ */
serdev_device_write_flush(serdev);
ret = serdev_device_write_buf(serdev, ibs_wake_cmd,
sizeof(ibs_wake_cmd));
--
2.7.4
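The shutdown decision described in the new comment of the hci_qca hunk above reduces to a small predicate. The sketch below is a user-space model only: the two booleans stand in for the HCI_QUIRK_NON_PERSISTENT_SETUP quirk and the HCI_SETUP flag, and `qca6390_needs_reset_vsc()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early return in qca_serdev_shutdown():
 * the reset VSC is skipped when hdev->setup() copes with any SoC state
 * (non-persistent setup) or when setup never ran (HCI_SETUP still set).
 */
static bool qca6390_needs_reset_vsc(bool non_persistent_setup,
				    bool setup_pending)
{
	if (non_persistent_setup || setup_pending)
		return false;

	return true;
}
```

Only the case where setup has already run and is persistent requires the VSC, which is the "BT was ever enabled" condition from the commit message.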
From: Vasant Karasulli <vkarasulli(a)suse.de>
Hi,
here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.
Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.
This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.
The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.
The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.
The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.
Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).
The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.
The previous version of this patch-set can be found here:
https://lore.kernel.org/kvm/20240408074049.7049-1-vsntk18@gmail.com/
Please review.
Thanks,
Vasant
Changes v5->v6:
- Rebased to v6.10-rc3 kernel
Changes v4->v5:
- Rebased to v6.9-rc2 kernel
- Applied review comments by Tom Lendacky
- Exclude the AP jump table related code for SEV-SNP guests
Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
into a single function which makes caching jump table address
unnecessary
- annotated struct sev_ap_jump_table_header with __packed attribute
- added code to set up real mode data segment at boot time instead of
hardcoding the value.
Joerg Roedel (9):
x86/kexec/64: Disable kexec when SEV-ES is active
x86/sev: Save and print negotiated GHCB protocol version
x86/sev: Set GHCB data structure version
x86/sev: Setup code to park APs in the AP Jump Table
x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
x86/sev: Use AP Jump Table blob to stop CPU
x86/sev: Add MMIO handling support to boot/compressed/ code
x86/sev: Handle CLFLUSH MMIO events
x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
Vasant Karasulli (1):
x86/sev: Exclude AP jump table related code for SEV-SNP guests
arch/x86/boot/compressed/sev.c | 45 +-
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/include/asm/realmode.h | 5 +
arch/x86/include/asm/sev-ap-jumptable.h | 30 +
arch/x86/include/asm/sev.h | 7 +
arch/x86/kernel/machine_kexec_64.c | 12 +
arch/x86/kernel/process.c | 8 +
arch/x86/kernel/sev-shared.c | 234 +++++-
arch/x86/kernel/sev.c | 376 +++++-----
arch/x86/lib/insn-eval-shared.c | 921 ++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 911 +----------------------
arch/x86/realmode/Makefile | 9 +-
arch/x86/realmode/init.c | 5 +-
arch/x86/realmode/rm/Makefile | 11 +-
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/sev.S | 85 +++
arch/x86/realmode/rmpiggy.S | 6 +
arch/x86/realmode/sev/Makefile | 33 +
arch/x86/realmode/sev/ap_jump_table.S | 131 ++++
arch/x86/realmode/sev/ap_jump_table.lds | 24 +
20 files changed, 1711 insertions(+), 1146 deletions(-)
create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
create mode 100644 arch/x86/lib/insn-eval-shared.c
create mode 100644 arch/x86/realmode/rm/sev.S
create mode 100644 arch/x86/realmode/sev/Makefile
create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
base-commit: 83a7eefedc9b56fe7bfeff13b6c7356688ffa670
--
2.34.1
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: mgb4: Fix double debugfs remove
Author: Martin Tůma <martin.tuma(a)digiteqautomotive.com>
Date: Tue May 21 18:22:54 2024 +0200
Fixes an error where debugfs_remove_recursive() is called first on a parent
directory and then again on a child which causes a kernel panic.
Signed-off-by: Martin Tůma <martin.tuma(a)digiteqautomotive.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Fixes: 0ab13674a9bd ("media: pci: mgb4: Added Digiteq Automotive MGB4 driver")
Cc: <stable(a)vger.kernel.org>
[hverkuil: added Fixes/Cc tags]
drivers/media/pci/mgb4/mgb4_core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
---
diff --git a/drivers/media/pci/mgb4/mgb4_core.c b/drivers/media/pci/mgb4/mgb4_core.c
index 60498a5abebf..ab4f07e2e560 100644
--- a/drivers/media/pci/mgb4/mgb4_core.c
+++ b/drivers/media/pci/mgb4/mgb4_core.c
@@ -642,9 +642,6 @@ static void mgb4_remove(struct pci_dev *pdev)
struct mgb4_dev *mgbdev = pci_get_drvdata(pdev);
int i;
-#ifdef CONFIG_DEBUG_FS
- debugfs_remove_recursive(mgbdev->debugfs);
-#endif
#if IS_REACHABLE(CONFIG_HWMON)
hwmon_device_unregister(mgbdev->hwmon_dev);
#endif
@@ -659,6 +656,10 @@ static void mgb4_remove(struct pci_dev *pdev)
if (mgbdev->vin[i])
mgb4_vin_free(mgbdev->vin[i]);
+#ifdef CONFIG_DEBUG_FS
+ debugfs_remove_recursive(mgbdev->debugfs);
+#endif
+
device_remove_groups(&mgbdev->pdev->dev, mgb4_pci_groups);
free_spi(mgbdev);
free_i2c(mgbdev);
Hi Pauli,
well done! This patch fixes the issue. Thank you.
Best regards,
Timo
On Sun, 9 June 2024 at 17:06, Pauli Virtanen <pav(a)iki.fi> wrote:
>
> The amp_id argument of l2cap_connect() was removed in
> commit 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support")
>
> It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie.
> non-AMP controller). In the above commit, the code path for amp_id != 0
> was preserved, although it should have used the amp_id == 0 one.
>
> Restore the previous behavior of the non-AMP code path, to fix problems
> with L2CAP connections.
>
> Fixes: 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support")
> Signed-off-by: Pauli Virtanen <pav(a)iki.fi>
> ---
>
> Notes:
> v2: do the change in the actually right if branch
>
> Tried proofreading the commit, and this part seemed suspicious.
> Can you try if this fixes the problem?
>
> net/bluetooth/l2cap_core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index c49e0d4b3c0d..aed025734d04 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -4011,8 +4011,8 @@ static void l2cap_connect(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd,
> status = L2CAP_CS_AUTHOR_PEND;
> chan->ops->defer(chan);
> } else {
> - l2cap_state_change(chan, BT_CONNECT2);
> - result = L2CAP_CR_PEND;
> + l2cap_state_change(chan, BT_CONFIG);
> + result = L2CAP_CR_SUCCESS;
> status = L2CAP_CS_NO_INFO;
> }
> } else {
> --
> 2.45.2
>
Hello,
on my two notebooks, one with Ubuntu (Mainline Kernel 6.9.3, bluez
5.7.2) and the other one with Manjaro (6.9.3, bluez 5.7.6) I'm having
problems with my Sony WH-1000XM3 and Shure BT1. Either A2DP or HFP/HSP
is not available after the connection has been established after a
reboot or a reconnection. It's reproducible that with the WH-1000XM3
the A2DP profiles are missing and with the Shure BT1 HFP/HSP profiles
are missing. It also takes longer than usual to connect and I have a
log message in the journal:
Jun 06 16:28:10 liebig bluetoothd[854]:
profiles/audio/avdtp.c:cancel_request() Discover: Connection timed out
(110)
When I disable and re-enable bluetooth (while the Headsets are still
on) and trigger a reconnect from the notebooks, A2DP and HFP/HSP
Profiles are available again.
I also tested it with 6.8.12 and it's the same problem. 6.8.11 and
6.9.2 don't have the problem.
So I did a bisection. After reverting commit
af1d425b6dc67cd67809f835dd7afb6be4d43e03 "Bluetooth: HCI: Remove
HCI_AMP support" for 6.9.3 it's working again without problems.
Let me know if you need anything from me.
Best regards,
Timo
The patch titled
Subject: mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-fix-misused-mapping_large_folio_support-for-anon-folios.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Subject: mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
Date: Fri, 7 Jun 2024 17:40:48 +0800 (CST)
When I did a large folio split test, a WARNING "[ 5059.122759][ T166]
Cannot split file folio to non-0 order" was triggered. But the test cases
are only for anonymous folios, while mapping_large_folio_support() is only
reasonable for page cache folios.
In split_huge_page_to_list_to_order(), the folio passed to
mapping_large_folio_support() may be an anonymous folio. The
folio_test_anon() check is missing, so the split of the anonymous THP
fails. The same applies to shmem_mapping(), so add a check for both. The
shmem_mapping() call in __split_huge_page() is not affected: for
anonymous folios the end parameter is set to -1, so (head[i].index >= end)
is always false and shmem_mapping() is not called.
Also add a VM_WARN_ONCE() in mapping_large_folio_support() for anon
mappings, so that misuse can be detected more easily.
THP folios may exist in the page cache even if the file system doesn't
support large folios. This is because when CONFIG_TRANSPARENT_HUGEPAGE is
enabled, khugepaged will try to collapse read-only file-backed pages into
THPs. But the mapping does not actually support multi-order large folios
properly.
Verified using /sys/kernel/debug/split_huge_pages: with this patch, large
anon THPs are split successfully and the warning no longer appears.
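For illustration, the order of the checks after this patch can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
struct model_folio, its fields and model_split_precheck() are hypothetical
stand-ins for the kernel's folio predicates, the earlier
new_order >= folio_order() check is left out, and the read_only_thp_for_fs
argument stands in for IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the folio properties the checks look at. */
struct model_folio {
	bool anon;                   /* folio_test_anon() */
	bool shmem;                  /* shmem_mapping(folio->mapping) */
	bool swapcache;              /* folio_test_swapcache() */
	bool mapping_supports_large; /* mapping_large_folio_support() */
};

/* Returns 0 if the split may proceed, -EINVAL otherwise. */
static int model_split_precheck(const struct model_folio *f,
				unsigned int new_order,
				bool read_only_thp_for_fs)
{
	if (f->anon) {
		/* Anonymous folios never consult the mapping flags. */
		if (new_order == 1)
			return -EINVAL;
	} else if (new_order) {
		if (f->shmem)
			return -EINVAL;
		if (read_only_thp_for_fs && !f->mapping_supports_large)
			return -EINVAL;
	}

	/* The swapcache restriction applies to every folio type. */
	if (f->swapcache && new_order)
		return -EINVAL;

	return 0;
}
```

In this model an anonymous folio split to order 2 is accepted regardless of
mapping_supports_large, which is the case that failed before the patch.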
Link: https://lkml.kernel.org/r/202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Reviewed-by: Barry Song <baohua(a)kernel.org>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pagemap.h | 4 ++++
mm/huge_memory.c | 28 +++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
--- a/include/linux/pagemap.h~mm-huge_memory-fix-misused-mapping_large_folio_support-for-anon-folios
+++ a/include/linux/pagemap.h
@@ -368,6 +368,10 @@ static inline void mapping_set_large_fol
*/
static inline bool mapping_large_folio_support(struct address_space *mapping)
{
+ /* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
+ VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
+ "Anonymous mapping always supports large folio");
+
return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
}
--- a/mm/huge_memory.c~mm-huge_memory-fix-misused-mapping_large_folio_support-for-anon-folios
+++ a/mm/huge_memory.c
@@ -3009,30 +3009,36 @@ int split_huge_page_to_list_to_order(str
if (new_order >= folio_order(folio))
return -EINVAL;
- /* Cannot split anonymous THP to order-1 */
- if (new_order == 1 && folio_test_anon(folio)) {
- VM_WARN_ONCE(1, "Cannot split to order-1 folio");
- return -EINVAL;
- }
-
- if (new_order) {
- /* Only swapping a whole PMD-mapped folio is supported */
- if (folio_test_swapcache(folio))
+ if (folio_test_anon(folio)) {
+ /* order-1 is not supported for anonymous THP. */
+ if (new_order == 1) {
+ VM_WARN_ONCE(1, "Cannot split to order-1 folio");
return -EINVAL;
+ }
+ } else if (new_order) {
/* Split shmem folio to non-zero order not supported */
if (shmem_mapping(folio->mapping)) {
VM_WARN_ONCE(1,
"Cannot split shmem folio to non-0 order");
return -EINVAL;
}
- /* No split if the file system does not support large folio */
- if (!mapping_large_folio_support(folio->mapping)) {
+ /*
+ * No split if the file system does not support large folio.
+ * Note that we might still have THPs in such mappings due to
+ * CONFIG_READ_ONLY_THP_FOR_FS. But in that case, the mapping
+ * does not actually support large folios properly.
+ */
+ if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+ !mapping_large_folio_support(folio->mapping)) {
VM_WARN_ONCE(1,
"Cannot split file folio to non-0 order");
return -EINVAL;
}
}
+ /* Only swapping a whole PMD-mapped folio is supported */
+ if (folio_test_swapcache(folio) && new_order)
+ return -EINVAL;
is_hzp = is_huge_zero_folio(folio);
if (is_hzp) {
_
Patches currently in -mm which might be from ran.xiaokai(a)zte.com.cn are
mm-huge_memory-fix-misused-mapping_large_folio_support-for-anon-folios.patch
mm-huge_memory-mark-racy-access-onhuge_anon_orders_always.patch
The SPMI GPIO driver assumes that the parent device is an SPMI device
and accesses random data when backcasting the parent struct device
pointer for non-SPMI devices.
Fortunately this does not seem to cause any issues currently when the
parent device is an I2C client like the PM8008, but this could change if
the structures are reorganised (e.g. using structure randomisation).
Notably the interrupt implementation is also broken for non-SPMI devices.
Also note that the two GPIO pins on PM8008 are used for interrupts and
reset so their practical use should be limited.
Drop the broken GPIO support for PM8008 for now.
Fixes: ea119e5a482a ("pinctrl: qcom-pmic-gpio: Add support for pm8008")
Cc: stable(a)vger.kernel.org # 5.13
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/pinctrl/qcom/pinctrl-spmi-gpio.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
index f4e2c88a7c82..e61be7d05494 100644
--- a/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
+++ b/drivers/pinctrl/qcom/pinctrl-spmi-gpio.c
@@ -1206,7 +1206,6 @@ static const struct of_device_id pmic_gpio_of_match[] = {
{ .compatible = "qcom,pm7325-gpio", .data = (void *) 10 },
{ .compatible = "qcom,pm7550ba-gpio", .data = (void *) 8},
{ .compatible = "qcom,pm8005-gpio", .data = (void *) 4 },
- { .compatible = "qcom,pm8008-gpio", .data = (void *) 2 },
{ .compatible = "qcom,pm8019-gpio", .data = (void *) 6 },
/* pm8150 has 10 GPIOs with holes on 2, 5, 7 and 8 */
{ .compatible = "qcom,pm8150-gpio", .data = (void *) 10 },
--
2.44.1
This is a note to let you know that I've just added the patch titled
iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From a23c14b062d8800a2192077d83273bbfe6c7552d Mon Sep 17 00:00:00 2001
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Date: Mon, 13 May 2024 13:34:27 -0700
Subject: iio: temperature: mlx90635: Fix ERR_PTR dereference in
mlx90635_probe()
When devm_regmap_init_i2c() fails, regmap_ee could be an error pointer.
Instead of checking IS_ERR(regmap_ee), regmap is checked, which looks
like a copy-paste error.
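As a rough userspace sketch of why the wrong check matters: an error
pointer is just a negative errno stored in the pointer value, so it must be
tested on the variable that was just assigned. The model_* names below are
hypothetical stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR()
helpers, and model_probe() only mimics the order of the two checks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MAX_ERRNO. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno from an error pointer. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error values occupy the last MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/*
 * Each pointer is validated right after it is obtained.
 * Returns 0 on success or a negative errno.
 */
static long model_probe(void *regmap, void *regmap_ee)
{
	if (model_is_err(regmap))
		return model_ptr_err(regmap);
	/* The buggy code tested regmap again at this point. */
	if (model_is_err(regmap_ee))
		return model_ptr_err(regmap_ee);
	return 0;
}
```

With the second check aimed at regmap instead, a failed regmap_ee would
pass through and be dereferenced later.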
Fixes: a1d1ba5e1c28 ("iio: temperature: mlx90635 MLX90635 IR Temperature sensor")
Reviewed-by: Crt Mori <cmo(a)melexis.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Link: https://lore.kernel.org/r/20240513203427.3208696-1-harshit.m.mogalapalli@or…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/temperature/mlx90635.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/iio/temperature/mlx90635.c b/drivers/iio/temperature/mlx90635.c
index 1f5c962c1818..f7f88498ba0e 100644
--- a/drivers/iio/temperature/mlx90635.c
+++ b/drivers/iio/temperature/mlx90635.c
@@ -947,9 +947,9 @@ static int mlx90635_probe(struct i2c_client *client)
"failed to allocate regmap\n");
regmap_ee = devm_regmap_init_i2c(client, &mlx90635_regmap_ee);
- if (IS_ERR(regmap))
- return dev_err_probe(&client->dev, PTR_ERR(regmap),
- "failed to allocate regmap\n");
+ if (IS_ERR(regmap_ee))
+ return dev_err_probe(&client->dev, PTR_ERR(regmap_ee),
+ "failed to allocate EEPROM regmap\n");
mlx90635 = iio_priv(indio_dev);
i2c_set_clientdata(client, indio_dev);
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: imu: bmi323: Fix trigger notification in case of error
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From bedb2ccb566de5ca0c336ca3fd3588cea6d50414 Mon Sep 17 00:00:00 2001
From: Vasileios Amoiridis <vassilisamir(a)gmail.com>
Date: Wed, 8 May 2024 17:54:07 +0200
Subject: iio: imu: bmi323: Fix trigger notification in case of error
In case of error in the bmi323_trigger_handler() function, the
function exits without calling the iio_trigger_notify_done()
which is responsible for informing the attached trigger that
the process is done and in case there is a .reenable(), to
call it.
Fixes: 8a636db3aa57 ("iio: imu: Add driver for BMI323 IMU")
Signed-off-by: Vasileios Amoiridis <vassilisamir(a)gmail.com>
Link: https://lore.kernel.org/r/20240508155407.139805-1-vassilisamir@gmail.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/imu/bmi323/bmi323_core.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/imu/bmi323/bmi323_core.c b/drivers/iio/imu/bmi323/bmi323_core.c
index 5d42ab9b176a..67d74a1a1b26 100644
--- a/drivers/iio/imu/bmi323/bmi323_core.c
+++ b/drivers/iio/imu/bmi323/bmi323_core.c
@@ -1391,7 +1391,7 @@ static irqreturn_t bmi323_trigger_handler(int irq, void *p)
&data->buffer.channels,
ARRAY_SIZE(data->buffer.channels));
if (ret)
- return IRQ_NONE;
+ goto out;
} else {
for_each_set_bit(bit, indio_dev->active_scan_mask,
BMI323_CHAN_MAX) {
@@ -1400,13 +1400,14 @@ static irqreturn_t bmi323_trigger_handler(int irq, void *p)
&data->buffer.channels[index++],
BMI323_BYTES_PER_SAMPLE);
if (ret)
- return IRQ_NONE;
+ goto out;
}
}
iio_push_to_buffers_with_timestamp(indio_dev, &data->buffer,
iio_get_time_ns(indio_dev));
+out:
iio_trigger_notify_done(indio_dev->trig);
return IRQ_HANDLED;
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: dac: ad5592r: fix temperature channel scaling value
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 279428df888319bf68f2686934897301a250bb84 Mon Sep 17 00:00:00 2001
From: Marc Ferland <marc.ferland(a)sonatest.com>
Date: Wed, 1 May 2024 11:05:54 -0400
Subject: iio: dac: ad5592r: fix temperature channel scaling value
The scale value for the temperature channel is (assuming Vref=2.5 and
the datasheet):
376.7897513
When calculating both val and val2 for the temperature scale we
use (3767897513/25) and multiply it by Vref (here I assume 2500mV) to
obtain:
2500 * (3767897513/25) ==> 376789751300
Finally we divide with remainder by 10^9 to get:
val = 376
val2 = 789751300
However, we return IIO_VAL_INT_PLUS_MICRO (should have been NANO) as
the scale type. So when converting the raw temperature value to the
'processed' temperature value we will get (assuming raw=810,
offset=-753):
processed = (raw + offset) * scale_val
= (810 + -753) * 376
= 21432
processed += div((raw + offset) * scale_val2, 10^6)
+= div((810 + -753) * 789751300, 10^6)
+= 45015
==> 66447
==> 66.4 Celsius
instead of the expected 21.5 Celsius.
Fix this issue by changing IIO_VAL_INT_PLUS_MICRO to
IIO_VAL_INT_PLUS_NANO.
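The arithmetic above can be reproduced with a small userspace sketch.
apply_scale() is a hypothetical helper, not part of the IIO core; it only
mirrors the integer-plus-fraction conversion described in this message,
with the fraction divisor chosen by the scale type (10^6 for MICRO, 10^9
for NANO).

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a raw reading to a processed value
 * using an integer part (val) and a fractional part (val2).
 * frac_div is 1000000 for a MICRO scale and 1000000000 for NANO.
 */
static int64_t apply_scale(int64_t raw, int64_t offset,
			   int64_t val, int64_t val2, int64_t frac_div)
{
	int64_t x = raw + offset;

	return x * val + (x * val2) / frac_div;
}
```

With raw=810, offset=-753, val=376 and val2=789751300, a divisor of 10^6
gives 66447 (the wrong 66.4 Celsius), while a divisor of 10^9 gives 21477,
which rounds to the expected 21.5 Celsius.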
Fixes: 56ca9db862bf ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs")
Signed-off-by: Marc Ferland <marc.ferland(a)sonatest.com>
Link: https://lore.kernel.org/r/20240501150554.1871390-1-marc.ferland@sonatest.com
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/dac/ad5592r-base.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/dac/ad5592r-base.c b/drivers/iio/dac/ad5592r-base.c
index 076bc9ecfb49..4763402dbcd6 100644
--- a/drivers/iio/dac/ad5592r-base.c
+++ b/drivers/iio/dac/ad5592r-base.c
@@ -415,7 +415,7 @@ static int ad5592r_read_raw(struct iio_dev *iio_dev,
s64 tmp = *val * (3767897513LL / 25LL);
*val = div_s64_rem(tmp, 1000000000LL, val2);
- return IIO_VAL_INT_PLUS_MICRO;
+ return IIO_VAL_INT_PLUS_NANO;
}
mutex_lock(&st->lock);
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: pressure: bmp280: Fix BMP580 temperature reading
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 0f0f6306617cb4b6231fc9d4ec68ab9a56dba7c0 Mon Sep 17 00:00:00 2001
From: Adam Rizkalla <ajarizzo(a)gmail.com>
Date: Thu, 25 Apr 2024 01:22:49 -0500
Subject: iio: pressure: bmp280: Fix BMP580 temperature reading
Fix overflow issue when storing BMP580 temperature reading and
properly preserve sign of 24-bit data.
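A minimal userspace sketch of the two fixes, assuming the reading is a
24-bit two's complement value in units of 1/2^16 degrees Celsius as the
diff below suggests. sign_extend_from_bit() is a hypothetical stand-in for
the kernel's sign_extend32(), and bmp580_raw_to_millicelsius() is an
illustrative name, not a driver function.

```c
#include <stdint.h>

/*
 * Sign-extend a value whose sign bit sits at bit position `index`.
 * Relies on arithmetic right shift of negative values, which is what
 * mainstream compilers provide.
 */
static int32_t sign_extend_from_bit(uint32_t value, int index)
{
	int shift = 31 - index;

	return (int32_t)(value << shift) >> shift;
}

/* Convert a 24-bit raw reading to millidegrees Celsius. */
static int32_t bmp580_raw_to_millicelsius(uint32_t raw24)
{
	int32_t raw = sign_extend_from_bit(raw24, 23);

	/* Widen before multiplying: raw * 1000 can exceed 32 bits. */
	return (int32_t)(((int64_t)raw * 1000) / (1 << 16));
}
```

For example, 0x190000 maps to 25000 and 0xF60000 maps to -10000. The
largest positive reading, 0x7FFFFF, would overflow a 32-bit product but
yields 127999 with the 64-bit intermediate.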
Signed-off-by: Adam Rizkalla <ajarizzo(a)gmail.com>
Tested-By: Vasileios Amoiridis <vassilisamir(a)gmail.com>
Acked-by: Angel Iglesias <ang.iglesiasg(a)gmail.com>
Link: https://lore.kernel.org/r/Zin2udkXRD0+GrML@adam-asahi.lan
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/pressure/bmp280-core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/iio/pressure/bmp280-core.c b/drivers/iio/pressure/bmp280-core.c
index 09f53d987c7d..221fa2c552ae 100644
--- a/drivers/iio/pressure/bmp280-core.c
+++ b/drivers/iio/pressure/bmp280-core.c
@@ -1394,12 +1394,12 @@ static int bmp580_read_temp(struct bmp280_data *data, int *val, int *val2)
/*
* Temperature is returned in Celsius degrees in fractional
- * form down 2^16. We rescale by x1000 to return milli Celsius
- * to respect IIO ABI.
+ * form down 2^16. We rescale by x1000 to return millidegrees
+ * Celsius to respect IIO ABI.
*/
- *val = raw_temp * 1000;
- *val2 = 16;
- return IIO_VAL_FRACTIONAL_LOG2;
+ raw_temp = sign_extend32(raw_temp, 23);
+ *val = ((s64)raw_temp * 1000) / (1 << 16);
+ return IIO_VAL_INT;
}
static int bmp580_read_press(struct bmp280_data *data, int *val, int *val2)
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: adc: ad9467: fix scan type sign
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 8a01ef749b0a632f0e1f4ead0f08b3310d99fcb1 Mon Sep 17 00:00:00 2001
From: David Lechner <dlechner(a)baylibre.com>
Date: Fri, 3 May 2024 14:45:05 -0500
Subject: iio: adc: ad9467: fix scan type sign
According to the IIO documentation, the sign in the scan type should be
lower case. The ad9467 driver was incorrectly using upper case.
Fix by changing to lower case.
Fixes: 4606d0f4b05f ("iio: adc: ad9467: add support for AD9434 high-speed ADC")
Fixes: ad6797120238 ("iio: adc: ad9467: add support AD9467 ADC")
Signed-off-by: David Lechner <dlechner(a)baylibre.com>
Link: https://lore.kernel.org/r/20240503-ad9467-fix-scan-type-sign-v1-1-c7a1a066e…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/ad9467.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/ad9467.c b/drivers/iio/adc/ad9467.c
index e85b763b9ffc..8f5b9c3f6e3d 100644
--- a/drivers/iio/adc/ad9467.c
+++ b/drivers/iio/adc/ad9467.c
@@ -243,11 +243,11 @@ static void __ad9467_get_scale(struct ad9467_state *st, int index,
}
static const struct iio_chan_spec ad9434_channels[] = {
- AD9467_CHAN(0, 0, 12, 'S'),
+ AD9467_CHAN(0, 0, 12, 's'),
};
static const struct iio_chan_spec ad9467_channels[] = {
- AD9467_CHAN(0, 0, 16, 'S'),
+ AD9467_CHAN(0, 0, 16, 's'),
};
static const struct ad9467_chip_info ad9467_chip_tbl = {
--
2.45.2
A Rembrandt-based HP thin client is reported to have problems where
the NVME disk isn't present after resume from s2idle.
This is because the NVME disk wasn't put into D3 at suspend, and
that happened because the StorageD3Enable _DSD was missing in the BIOS.
As AMD's architecture requires that the NVME is in D3 for s2idle, adjust
the criteria for force_storage_d3 to match *all* Zen SoCs when the FADT
advertises low power idle support.
This will ensure that any future products with this BIOS deficiency don't
need to be added to the allow list of overrides.
Cc: stable(a)vger.kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/acpi/x86/utils.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c
index 90c3d2eab9e9..7507a7706898 100644
--- a/drivers/acpi/x86/utils.c
+++ b/drivers/acpi/x86/utils.c
@@ -197,16 +197,16 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s
}
/*
- * AMD systems from Renoir and Lucienne *require* that the NVME controller
+ * AMD systems from Renoir onwards *require* that the NVME controller
* is put into D3 over a Modern Standby / suspend-to-idle cycle.
*
* This is "typically" accomplished using the `StorageD3Enable`
* property in the _DSD that is checked via the `acpi_storage_d3` function
- * but this property was introduced after many of these systems launched
- * and most OEM systems don't have it in their BIOS.
+ * but some OEM systems still don't have it in their BIOS.
*
* The Microsoft documentation for StorageD3Enable mentioned that Windows has
- * a hardcoded allowlist for D3 support, which was used for these platforms.
+ * a hardcoded allowlist for D3 support as well as a registry key to override
+ * the BIOS, which has been used for these cases.
*
* This allows quirking on Linux in a similar fashion.
*
@@ -219,19 +219,15 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s
* https://bugzilla.kernel.org/show_bug.cgi?id=216773
* https://bugzilla.kernel.org/show_bug.cgi?id=217003
* 2) On at least one HP system StorageD3Enable is missing on the second NVME
- disk in the system.
+ * disk in the system.
+ * 3) On at least one HP Rembrandt system StorageD3Enable is missing on the only
+ * NVME device.
*/
-static const struct x86_cpu_id storage_d3_cpu_ids[] = {
- X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 24, NULL), /* Picasso */
- X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */
- X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */
- X86_MATCH_VENDOR_FAM_MODEL(AMD, 25, 80, NULL), /* Cezanne */
- {}
-};
-
bool force_storage_d3(void)
{
- return x86_match_cpu(storage_d3_cpu_ids);
+ if (!cpu_feature_enabled(X86_FEATURE_ZEN))
+ return false;
+ return acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0;
}
/*
--
2.43.0
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). The scenario proceeds as
follows:
/*
 * CPU1: User space performs
 * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
 * on enclave page X
 */
sgx_encl_remove_pages() {
  mutex_lock(&encl->lock);
  entry = sgx_encl_load_page(encl);
  /*
   * verify that page is
   * trimmed and accepted
   */
  mutex_unlock(&encl->lock);
  /*
   * remove PTE entry; cannot
   * be performed under lock
   */
  sgx_zap_enclave_ptes(encl);
                                 /*
                                  * Fault on CPU2 on same page X
                                  */
                                 sgx_vma_fault() {
                                   /*
                                    * PTE entry was removed, but the
                                    * page is still in enclave's xarray
                                    */
                                   xa_load(&encl->page_array) != NULL ->
                                   /*
                                    * SGX driver thinks that this page
                                    * was swapped out and loads it
                                    */
                                   mutex_lock(&encl->lock);
                                   /*
                                    * this is effectively a no-op
                                    */
                                   entry = sgx_encl_load_page_in_vma();
                                   /*
                                    * add PTE entry
                                    *
                                    * *BUG*: a PTE is installed for a
                                    * page in process of being removed
                                    */
                                   vmf_insert_pfn(...);
                                   mutex_unlock(&encl->lock);
                                   return VM_FAULT_NOPAGE;
                                 }
  /*
   * continue with page removal
   */
  mutex_lock(&encl->lock);
  sgx_encl_free_epc_page(epc_page) {
    /*
     * remove page via EREMOVE
     */
    /*
     * free EPC page
     */
    sgx_free_epc_page(epc_page);
  }
  xa_erase(&encl->page_array);
  mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However, CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
introducing a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by
default and set only right before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is being removed, and if yes then CPU2 backs off and waits until
the page is completely removed. After that, any memory access to this
page results in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Acked-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..de59219ae794 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1
From: Jeff Xu <jeffxu(a)google.com>
By default, memfd_create() creates a non-sealable MFD, unless the
MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag was initially introduced, an MFD created
with that flag was sealable, even though MFD_ALLOW_SEALING was not set.
This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
unless MFD_ALLOW_SEALING is explicitly set.
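The change in behavior can be sketched as a small userspace model of the
seal computation. Everything here is illustrative: the MODEL_* constants
are hypothetical and do not carry the real UAPI values, and the starting
state (MODEL_SEAL_SEAL set, meaning no further seals may be added) is an
assumption taken from the way the diff below clears F_SEAL_SEAL to make a
file sealable.

```c
/* Hypothetical creation flags (not the real MFD_* values). */
enum {
	MODEL_ALLOW_SEALING = 1 << 0,
	MODEL_NOEXEC_SEAL   = 1 << 1,
};

/* Hypothetical seal bits (not the real F_SEAL_* values). */
enum {
	MODEL_SEAL_SEAL = 1 << 0, /* set: no more seals can be added */
	MODEL_SEAL_EXEC = 1 << 1,
};

/* Old logic: MODEL_NOEXEC_SEAL implicitly made the file sealable. */
static unsigned int seals_before_patch(unsigned int flags)
{
	unsigned int seals = MODEL_SEAL_SEAL;

	if (flags & MODEL_NOEXEC_SEAL) {
		seals &= ~MODEL_SEAL_SEAL;
		seals |= MODEL_SEAL_EXEC;
	} else if (flags & MODEL_ALLOW_SEALING) {
		seals &= ~MODEL_SEAL_SEAL;
	}
	return seals;
}

/* New logic: only MODEL_ALLOW_SEALING makes the file sealable. */
static unsigned int seals_after_patch(unsigned int flags)
{
	unsigned int seals = MODEL_SEAL_SEAL;

	if (flags & MODEL_NOEXEC_SEAL)
		seals |= MODEL_SEAL_EXEC;
	if (flags & MODEL_ALLOW_SEALING)
		seals &= ~MODEL_SEAL_SEAL;
	return seals;
}
```

In this model, passing only MODEL_NOEXEC_SEAL leaves MODEL_SEAL_SEAL set
after the patch, so the file is exec-sealed but cannot take further seals
unless MODEL_ALLOW_SEALING is also passed.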
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
is new, we expect not many applications will rely on the nature of
MFD_NOEXEC_SEAL being sealable. In most cases, the application already
sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the usability of the pid namespace sysctl
vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
add MFD_NOEXEC_SEAL if memfd_create() does not specify MFD_EXEC or
MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
to be sealable. This means that any application that does not desire this
behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
migrate/enforce non-executable MFD. This adjustment ensures that
applications can anticipate that the sealable characteristic will
remain unmodified by vm.memfd_noexec.
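For illustration, the seal selection described above can be modelled in
userspace roughly as follows. This is a minimal sketch, not the kernel code:
the MODEL_* constants copy the uapi flag values as I understand them, the
function name model_initial_seals() is hypothetical, and the model ignores the
vm.memfd_noexec sysctl entirely.

```c
/*
 * Userspace model of the initial seals chosen by memfd_create() after this
 * patch. Hypothetical helper for illustration only; the MODEL_* values are
 * assumed to match MFD_* and F_SEAL_* from the uapi headers.
 */
#define MODEL_MFD_CLOEXEC       0x0001U
#define MODEL_MFD_ALLOW_SEALING 0x0002U
#define MODEL_MFD_NOEXEC_SEAL   0x0008U
#define MODEL_MFD_EXEC          0x0010U

#define MODEL_F_SEAL_SEAL       0x0001U
#define MODEL_F_SEAL_EXEC       0x0020U

static unsigned int model_initial_seals(unsigned int flags)
{
	/* A new memfd starts with F_SEAL_SEAL set, i.e. it is not sealable. */
	unsigned int seals = MODEL_F_SEAL_SEAL;

	/* MFD_NOEXEC_SEAL only adds the exec seal; it no longer unlocks sealing. */
	if (flags & MODEL_MFD_NOEXEC_SEAL)
		seals |= MODEL_F_SEAL_EXEC;

	/* Only an explicit MFD_ALLOW_SEALING clears F_SEAL_SEAL. */
	if (flags & MODEL_MFD_ALLOW_SEALING)
		seals &= ~MODEL_F_SEAL_SEAL;

	return seals;
}
```

With this model, MFD_CLOEXEC | MFD_NOEXEC_SEAL yields F_SEAL_SEAL | F_SEAL_EXEC,
which is the combination the updated selftest expects.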
This patch was initially developed by Barnabás Pőcze, who
used Debian Code Search and GitHub to look for potential breakages
and could find only one. Dbus-broker's memfd_create() wrapper
is aware of this implicit `MFD_ALLOW_SEALING` behavior and tries to
work around it [1]. This workaround will break. Luckily, this
affects only the test suite, not
the normal operation of dbus-broker. There is a PR with a fix [2]. In
addition, David Rheinsberg raised a similar fix in [3].
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a…
[2]: https://github.com/bus1/dbus-broker/pull/366
[3]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
Cc: stable(a)vger.kernel.org
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn(a)protonmail.com>
Signed-off-by: Jeff Xu <jeffxu(a)google.com>
Reviewed-by: David Rheinsberg <david(a)readahead.eu>
---
mm/memfd.c | 9 ++++----
tools/testing/selftests/memfd/memfd_test.c | 26 +++++++++++++++++++++-
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 95af2d78fd31..8579a93d006b 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
@@ -1169,6 +1169,14 @@ static void test_sysctl_sysctl0(void)
mfd_assert_has_seals(fd, 0);
mfd_assert_chmod(fd, 0644);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_0_dfl",
+ mfd_def_size,
+ MFD_CLOEXEC);
+ mfd_assert_mode(fd, 0777);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL);
+ mfd_assert_chmod(fd, 0644);
+ close(fd);
}
static void test_sysctl_set_sysctl0(void)
@@ -1206,6 +1214,14 @@ static void test_sysctl_sysctl1(void)
mfd_assert_has_seals(fd, F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_1_noexec_nosealable",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_NOEXEC_SEAL);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
}
static void test_sysctl_set_sysctl1(void)
@@ -1238,6 +1254,14 @@ static void test_sysctl_sysctl2(void)
mfd_assert_has_seals(fd, F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_notsealable",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_NOEXEC_SEAL);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
}
static void test_sysctl_set_sysctl2(void)
--
2.45.1.288.g0e0cd299f1-goog
amd_rng_mod_init() uses pci_read_config_dword(), which returns PCIBIOS_*
codes. The return code is then returned as is, but amd_rng_mod_init() is
a module_init() function that should return normal errnos.
Convert the PCIBIOS_* return code into a normal errno using
pcibios_err_to_errno() before returning it.
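As a rough illustration of what the conversion does, here is a userspace model
of the mapping. It is a sketch under assumptions: the MODEL_PCIBIOS_* values
and the errno chosen for each are written from my reading of the helper, and
the authoritative definition is the one in include/linux/pci.h. The name
model_pcibios_err_to_errno() is hypothetical.

```c
#include <errno.h>

/* Assumed PCIBIOS_* status values; verify against include/linux/pci.h. */
#define MODEL_PCIBIOS_SUCCESSFUL          0x00
#define MODEL_PCIBIOS_FUNC_NOT_SUPPORTED  0x81
#define MODEL_PCIBIOS_BAD_VENDOR_ID       0x83
#define MODEL_PCIBIOS_DEVICE_NOT_FOUND    0x86
#define MODEL_PCIBIOS_BAD_REGISTER_NUMBER 0x87
#define MODEL_PCIBIOS_SET_FAILED          0x88
#define MODEL_PCIBIOS_BUFFER_TOO_SMALL    0x89

/* Hypothetical userspace stand-in for pcibios_err_to_errno(). */
static int model_pcibios_err_to_errno(int err)
{
	/* Zero (success) and values that are already negative errnos pass through. */
	if (err <= MODEL_PCIBIOS_SUCCESSFUL)
		return err;

	switch (err) {
	case MODEL_PCIBIOS_FUNC_NOT_SUPPORTED:
		return -ENOENT;
	case MODEL_PCIBIOS_BAD_VENDOR_ID:
		return -ENOTTY;
	case MODEL_PCIBIOS_DEVICE_NOT_FOUND:
		return -ENODEV;
	case MODEL_PCIBIOS_BAD_REGISTER_NUMBER:
		return -EFAULT;
	case MODEL_PCIBIOS_SET_FAILED:
		return -EIO;
	case MODEL_PCIBIOS_BUFFER_TOO_SMALL:
		return -ENOSPC;
	}

	/* Any other positive value is not a known PCIBIOS code. */
	return -ERANGE;
}
```

The point of the patch is visible in the model: a positive PCIBIOS_* value never
escapes to the caller, so a module_init() function only ever sees 0 or a
negative errno.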
Fixes: 96d63c0297cc ("[PATCH] Add AMD HW RNG driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
drivers/char/hw_random/amd-rng.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/char/hw_random/amd-rng.c b/drivers/char/hw_random/amd-rng.c
index 86162a13681e..9a24d19236dc 100644
--- a/drivers/char/hw_random/amd-rng.c
+++ b/drivers/char/hw_random/amd-rng.c
@@ -143,8 +143,10 @@ static int __init amd_rng_mod_init(void)
found:
err = pci_read_config_dword(pdev, 0x58, &pmbase);
- if (err)
+ if (err) {
+ err = pcibios_err_to_errno(err);
goto put_dev;
+ }
pmbase &= 0x0000FF00;
if (pmbase == 0) {
--
2.39.2
Please consider commit
15aa8fb852f995dd
x86/efistub: Omit physical KASLR when memory reservations exist
for backporting to v6.1 and later.
Thanks,
Ard.
Add subsystem lvds and mipi. Add pwm and i2c in lvds and mipi.
imx8qm-mek:
- add remote-proc
- fix gpio number error for vmmc
- add usb3 and typec
- add pwm and i2c in lvds and mipi
DTB_CHECK warnings are fixed by separate patches.
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: usb@5b110000: usb@5b120000: 'port', 'usb-role-switch' do not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/usb/fsl,imx8qm-cdns3.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: usb@5b120000: 'port', 'usb-role-switch' do not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/usb/cdns,usb3.yaml#
** binding fix patch: https://lore.kernel.org/imx/20240606161509.3201080-1-Frank.Li@nxp.com/T/#u
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: interrupt-controller@56240000: 'power-domains' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/interrupt-controller/fsl,irqsteer.yaml#
** binding fix patch: https://lore.kernel.org/imx/20240528071141.92003-1-alexander.stein@ew.tq-gr…
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: pwm@56244000: 'oneOf' conditional failed, one must be fixed:
'interrupts' is a required property
'interrupts-extended' is a required property
from schema $id: http://devicetree.org/schemas/pwm/imx-pwm.yaml#
** binding fix patch: https://lore.kernel.org/imx/dc9accba-78af-45ec-a516-b89f2d4f4b03@kernel.org…
from schema $id: http://devicetree.org/schemas/interrupt-controller/fsl,irqsteer.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: imx8qm-cm4-0: power-domains: [[15, 278], [15, 297]] is too short
from schema $id: http://devicetree.org/schemas/remoteproc/fsl,imx-rproc.yaml#
arch/arm64/boot/dts/freescale/imx8qm-mek.dtb: imx8qm-cm4-1: power-domains: [[15, 298], [15, 317]] is too short
** binding fix patch: https://lore.kernel.org/imx/20240606150030.3067015-1-Frank.Li@nxp.com/T/#u
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (7):
arm64: dts: imx8qm: add lvds subsystem
arm64: dts: imx8qm: add mipi subsystem
arm64: dts: imx8qm-mek: add cm4 remote-proc and related memory region
arm64: dts: imx8qm-mek: add pwm and i2c in lvds subsystem
arm64: dts: imx8qm-mek: add i2c in mipi[0,1] subsystem
arm64: dts: imx8qm-mek: fix gpio number for reg_usdhc2_vmmc
arm64: dts: imx8qm-mek: add usb 3.0 and related type C nodes
arch/arm64/boot/dts/freescale/imx8qm-mek.dts | 308 +++++++++++++++++++++-
arch/arm64/boot/dts/freescale/imx8qm-ss-lvds.dtsi | 231 ++++++++++++++++
arch/arm64/boot/dts/freescale/imx8qm-ss-mipi.dtsi | 286 ++++++++++++++++++++
arch/arm64/boot/dts/freescale/imx8qm.dtsi | 2 +
4 files changed, 826 insertions(+), 1 deletion(-)
---
base-commit: ee78a17615ad0cfdbbc27182b1047cd36c9d4d5f
change-id: 20240606-imx8qm-dts-usb-9c55d2bfe526
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
On Thu, Jun 06, 2024 at 12:14:22AM +0300, миша ухин wrote:
> Thank you for the comment.
> It seems there might be a misunderstanding.
> The commit 00d873c17e29 ("ext4: avoid deadlock in fs reclaim with page
> writeback") you mentioned introduces the use of
> memalloc_nofs_save()/memalloc_nofs_restore() when acquiring the
> EXT4_SB(sb)->s_writepages_rwsem lock.
> On the other hand the patch we proposed corrects the order of
> locking/unlocking resources with calls to the functions
> ext4_journal_start()/ext4_journal_stop() and
> down_write(&EXT4_I(inode)->i_data_sem)/up_write(&EXT4_I(inode)->i_data_sem).
> These patches do not appear to resolve the same issue, and the code
> changes are different.
>
> - Mikhail Ukhin
PLEASE do not send HTML messages to the linux-kernel mailing list. It
looks like garbage when read on a text mail reader.
In any case, you're correct. I had misremembered the issue with this
patch. The complaint that I had made with the V1 of the patch has not
been corrected, which is that the assertion made in the commit
description "the order of unlocking must be the reverse of the order
of locking" is errant nonsense. It is simply technically
incorrect; the order in which locks are released doesn't matter. (And
a jbd2 handle is not a lock.)
The syzkaller report which apparently triggered this failure was
supplied by Artem here[1], and the explanation should include that it
was triggered by an EXT4_IOC_MIGRATE ioctl which was set to require
synchronous update because the file descriptor was opened with O_SYNC,
and this could result in the jbd2_journal_stop() function calling
jbd2_might_wait_for_commit() which could potentially trigger a
deadlock if the EXT4_IOC_MIGRATE call is racing with a write(2) system
call.
[1] https://lore.kernel.org/r/1845977.e0hk0VWMCB@cherry
In any case, this is a low priority issue since the only program which
uses EXT4_IOC_MIGRATE is e4defrag, and it doesn't open files with
O_SYNC, so this isn't going to happen in real life. And so why don't
you use this as an opportunity to practice writing a technically valid
and correct commit description, and how to properly submit patches
and send valid (non-HTML) messages to the Linux kernel mailing list?
Cheers,
- Ted
This reverts commit f49449fbc21e7e9550a5203902d69c8ae7dfd918.
This commit breaks u_ether on some setups (at least Merrifield). The fix
"usb: gadget: u_ether: Re-attach netif device to mirror detachment" partly
restores u_ether. However, the netif usb: remains up even when usb is switched
from device to host mode. This creates problems for user space, as the
interface remains in the routing table while not really present and network
managers (connman) do not detect a network change.
Various attempts to find the root cause were unsuccessful up to now. Therefore
revert until a solution is found.
Link: https://lore.kernel.org/linux-usb/20231006141231.7220-1-hgajjar@de.adit-jv.…
Reported-by: Andy Shevchenko <andriy.shevchenko(a)intel.com>
Reported-by: Ferry Toth <fntoth(a)gmail.com>
Fixes: f49449fbc21e ("usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach")
Cc: stable(a)vger.kernel.org
---
drivers/usb/gadget/function/u_ether.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c
index aa0511c3a62c..95191083b455 100644
--- a/drivers/usb/gadget/function/u_ether.c
+++ b/drivers/usb/gadget/function/u_ether.c
@@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link)
DBG(dev, "%s\n", __func__);
- netif_device_detach(dev->net);
+ netif_stop_queue(dev->net);
netif_carrier_off(dev->net);
/* disable endpoints, forcing (synchronous) completion
--
2.43.0
From: Jonas Gorski <jonas.gorski(a)gmail.com>
Analogous to uart_port_tx_flags() introduced in commit 3ee07964d407
("serial: core: introduce uart_port_tx_flags()"), add a _flags variant
for uart_port_tx_limited().
Fixes: d11cc8c3c4b6 ("tty: serial: use uart_port_tx_limited()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jonas Gorski <jonas.gorski(a)gmail.com>
Signed-off-by: Doug Brown <doug(a)schmorgal.com>
---
include/linux/serial_core.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 3fb9a29e025f..aea25eef9a1a 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -850,6 +850,24 @@ enum UART_TX_FLAGS {
__count--); \
})
+/**
+ * uart_port_tx_limited_flags -- transmit helper for uart_port with count limiting with flags
+ * @port: uart port
+ * @ch: variable to store a character to be written to the HW
+ * @flags: %UART_TX_NOSTOP or similar
+ * @count: a limit of characters to send
+ * @tx_ready: can HW accept more data function
+ * @put_char: function to write a character
+ * @tx_done: function to call after the loop is done
+ *
+ * See uart_port_tx_limited() for more details.
+ */
+#define uart_port_tx_limited_flags(port, ch, flags, count, tx_ready, put_char, tx_done) ({ \
+ unsigned int __count = (count); \
+ __uart_port_tx(port, ch, flags, tx_ready, put_char, tx_done, __count, \
+ __count--); \
+})
+
/**
* uart_port_tx -- transmit helper for uart_port
* @port: uart port
--
2.34.1
This reverts commit 7bfb915a597a301abb892f620fe5c283a9fdbd77.
This commit broke pxa and omap-serial, because it inhibited them from
calling stop_tx() if their TX FIFOs weren't completely empty. This
resulted in these two drivers hanging during transmits because the TX
interrupt would stay enabled, and a new TX interrupt would never fire.
Cc: stable(a)vger.kernel.org
Signed-off-by: Doug Brown <doug(a)schmorgal.com>
---
include/linux/serial_core.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 8cb65f50e830..3fb9a29e025f 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -811,8 +811,7 @@ enum UART_TX_FLAGS {
if (pending < WAKEUP_CHARS) { \
uart_write_wakeup(__port); \
\
- if (!((flags) & UART_TX_NOSTOP) && pending == 0 && \
- __port->ops->tx_empty(__port)) \
+ if (!((flags) & UART_TX_NOSTOP) && pending == 0) \
__port->ops->stop_tx(__port); \
} \
\
--
2.34.1
While looking at using 'lib.sh' for the MPTCP selftests [1], we found
some small issues with 'lib.sh'. Here they are:
- Patch 1: fix 'errexit' (set -e) support with busywait. 'errexit' is
supported in some functions, not all. A fix for v6.8+.
- Patch 2: avoid confusing error messages linked to the cleaning part
when the netns setup fails. A fix for v6.8+.
- Patch 3: set a variable as local to avoid accidentally changing the
value of another one with the same name on the caller side. A fix
for v6.10-rc1+.
Link: https://lore.kernel.org/mptcp/5f4615c3-0621-43c5-ad25-55747a4350ce@kernel.o… [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
selftests: net: lib: support errexit with busywait
selftests: net: lib: avoid error removing empty netns name
selftests: net: lib: set 'i' as local
tools/testing/selftests/net/lib.sh | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
---
base-commit: a535d59432370343058755100ee75ab03c0e3f91
change-id: 20240605-upstream-net-20240605-selftests-net-lib-fixes-7a90a1a8d9d2
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This is the start of the stable review cycle for the 5.15.160 release.
There are 23 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 25 May 2024 13:03:15 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.160-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.160-rc1
Akira Yokosawa <akiyks(a)gmail.com>
docs: kernel_include.py: Cope with docutils 0.21
Thomas Weißschuh <linux(a)weissschuh.net>
admin-guide/hw-vuln/core-scheduling: fix return type of PR_SCHED_CORE_GET
Jarkko Sakkinen <jarkko(a)kernel.org>
KEYS: trusted: Do not use WARN when encode fails
AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
remoteproc: mediatek: Make sure IPI buffer fits in L2TCM
Daniel Thompson <daniel.thompson(a)linaro.org>
serial: kgdboc: Fix NMI-safety problems from keyboard reset code
Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
usb: typec: ucsi: displayport: Fix potential deadlock
Carlos Llamas <cmllamas(a)google.com>
binder: fix max_thread type inconsistency
Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper()
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
Eric Dumazet <edumazet(a)google.com>
netlink: annotate data-races around sk->sk_err
Eric Dumazet <edumazet(a)google.com>
netlink: annotate lockless accesses to nlk->max_recvmsg_len
Jakub Kicinski <kuba(a)kernel.org>
net: tls: handle backlogging of crypto requests
Jakub Kicinski <kuba(a)kernel.org>
tls: fix race between async notify and socket close
Jakub Kicinski <kuba(a)kernel.org>
net: tls: factor out tls_*crypt_async_wait()
Sabrina Dubroca <sd(a)queasysnail.net>
tls: extract context alloc/initialization out of tls_set_sw_offload
Jakub Kicinski <kuba(a)kernel.org>
tls: rx: simplify async wait
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: synchronize UMAC_CMD access
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access
Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
Jarkko Sakkinen <jarkko(a)kernel.org>
KEYS: trusted: Fix memory leak in tpm2_key_encode()
NeilBrown <neilb(a)suse.de>
nfsd: don't allow nfsd threads to be signalled.
Sergey Shtylyov <s.shtylyov(a)omp.ru>
pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin()
Jose Fernandez <josef(a)netflix.com>
drm/amd/display: Fix division by zero in setup_dsc_config
-------------
Diffstat:
.../admin-guide/hw-vuln/core-scheduling.rst | 4 +-
Documentation/sphinx/kernel_include.py | 1 -
Makefile | 4 +-
arch/x86/kvm/x86.c | 11 +-
drivers/android/binder.c | 2 +-
drivers/android/binder_internal.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +
drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 7 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 12 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 +
drivers/net/ethernet/broadcom/genet/bcmgenet_wol.c | 6 +
drivers/net/ethernet/broadcom/genet/bcmmii.c | 4 +
drivers/pinctrl/core.c | 14 +-
drivers/remoteproc/mtk_scp.c | 10 +-
drivers/tty/serial/kgdboc.c | 30 +++-
drivers/usb/typec/ucsi/displayport.c | 4 -
fs/nfs/callback.c | 9 +-
fs/nfsd/nfs4proc.c | 5 +-
fs/nfsd/nfssvc.c | 12 --
include/net/tls.h | 6 -
net/netlink/af_netlink.c | 23 +--
net/sunrpc/svc_xprt.c | 16 +-
net/tls/tls_sw.c | 199 +++++++++++----------
security/keys/trusted-keys/trusted_tpm2.c | 25 ++-
tools/testing/selftests/vm/map_hugetlb.c | 7 -
25 files changed, 243 insertions(+), 175 deletions(-)
On Thu, 6 Jun 2024 at 01:11, Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> arm64: fpsimd: Bring cond_yield asm macro in line with new rules
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm64-fpsimd-bring-cond_yield-asm-macro-in-line-with.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
NAK
None of these changes belong in v6.6 - please drop all of them.
From: Kan Liang <kan.liang(a)linux.intel.com>
The hard-coded metrics are wrongly calculated on hybrid machines.
$ perf stat -e cycles,instructions -a sleep 1
Performance counter stats for 'system wide':
18,205,487 cpu_atom/cycles/
9,733,603 cpu_core/cycles/
9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle
4,268,965 cpu_core/instructions/ # 0.23 insn per cycle
The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.
When finding the metric events, find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ count is wrongly used to calculate
the IPC of cpu_core.
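The effect of the missing PMU check can be reproduced with a small standalone
model. This is only a sketch: struct model_evsel, the MODEL_STAT_* values and
both functions are hypothetical stand-ins, not the perf data structures, and
the PMU is reduced to an integer id.

```c
#include <stddef.h>

/* Hypothetical stat types for the model. */
enum { MODEL_STAT_CYCLES = 1, MODEL_STAT_INSNS = 2 };

/* Hypothetical, flattened stand-in for a perf evsel. */
struct model_evsel {
	int pmu;      /* 0 = cpu_atom, 1 = cpu_core in this model */
	int type;     /* MODEL_STAT_* */
	double count; /* aggregated counter value */
};

/* Find the first event of the wanted type that lives on the same PMU. */
static double model_find_stat(const struct model_evsel *evsel,
			      const struct model_evsel *list, size_t n,
			      int type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct model_evsel *cur = &list[i];

		if (cur->type != type)
			continue;
		/* Ignore if not the PMU we're looking for. */
		if (cur->pmu != evsel->pmu)
			continue;
		return cur->count;
	}
	return 0.0;
}

/* Instructions per cycle for an instructions event, 0.0 if no cycles found. */
static double model_ipc(const struct model_evsel *insn,
			const struct model_evsel *list, size_t n)
{
	double cycles = model_find_stat(insn, list, n, MODEL_STAT_CYCLES);

	return cycles > 0.0 ? insn->count / cycles : 0.0;
}
```

Feeding in the four counts from the output above, the cpu_core instructions
event is matched with cpu_core/cycles/ rather than with the first cycles event
in the list, which gives the 0.44 quoted in the description.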
Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value")
Reported-by: "Khalil, Amiri" <amiri.khalil(a)intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
tools/perf/util/stat-shadow.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 3466aa952442..4d0edc061f1a 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -176,6 +176,10 @@ static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type
if (type != evsel__stat_type(cur))
continue;
+ /* Ignore if not the PMU we're looking for. */
+ if (evsel->pmu != cur->pmu)
+ continue;
+
aggr = &cur->stats->aggr[aggr_idx];
if (type == STAT_NSECS)
return aggr->counts.val;
--
2.35.1
On Thu, 30 May 2024 at 21:11, Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> arm64: fpsimd: Drop unneeded 'busy' flag
>
> to the 6.6-stable tree
Why?
> which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm64-fpsimd-drop-unneeded-busy-flag.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 37f2773a1ef05374538d5e4ed26cbacebe363241
> Author: Ard Biesheuvel <ardb(a)kernel.org>
> Date: Fri Dec 8 12:32:20 2023 +0100
>
> arm64: fpsimd: Drop unneeded 'busy' flag
>
> [ Upstream commit 9b19700e623f96222c69ecb2adecb1a3e3664cc0 ]
>
> Kernel mode NEON will preserve the user mode FPSIMD state by saving it
> into the task struct before clobbering the registers. In order to avoid
> the need for preserving kernel mode state too, we disallow nested use of
> kernel mode NEON, i..e, use in softirq context while the interrupted
> task context was using kernel mode NEON too.
>
> Originally, this policy was implemented using a per-CPU flag which was
> exposed via may_use_simd(), requiring the users of the kernel mode NEON
> to deal with the possibility that it might return false, and having NEON
> and non-NEON code paths. This policy was changed by commit
> 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs
> disabled"), and now, softirq processing is disabled entirely instead,
> and so may_use_simd() can never fail when called from task or softirq
> context.
>
> This means we can drop the fpsimd_context_busy flag entirely, and
> instead, ensure that we disable softirq processing in places where we
> formerly relied on the flag for preventing races in the FPSIMD preserve
> routines.
>
> Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
> Reviewed-by: Mark Brown <broonie(a)kernel.org>
> Tested-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
> Link: https://lore.kernel.org/r/20231208113218.3001940-7-ardb@google.com
> [will: Folded in fix from CAMj1kXFhzbJRyWHELCivQW1yJaF=p07LLtbuyXYX3G1WtsdyQg(a)mail.gmail.com]
> Signed-off-by: Will Deacon <will(a)kernel.org>
> Stable-dep-of: b8995a184170 ("Revert "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
> index 6a75d7ecdcaa2..8e86c9e70e483 100644
> --- a/arch/arm64/include/asm/simd.h
> +++ b/arch/arm64/include/asm/simd.h
> @@ -12,8 +12,6 @@
> #include <linux/preempt.h>
> #include <linux/types.h>
>
> -DECLARE_PER_CPU(bool, fpsimd_context_busy);
> -
> #ifdef CONFIG_KERNEL_MODE_NEON
>
> /*
> @@ -28,17 +26,10 @@ static __must_check inline bool may_use_simd(void)
> /*
> * We must make sure that the SVE has been initialized properly
> * before using the SIMD in kernel.
> - * fpsimd_context_busy is only set while preemption is disabled,
> - * and is clear whenever preemption is enabled. Since
> - * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy
> - * cannot change under our feet -- if it's set we cannot be
> - * migrated, and if it's clear we cannot be migrated to a CPU
> - * where it is set.
> */
> return !WARN_ON(!system_capabilities_finalized()) &&
> system_supports_fpsimd() &&
> - !in_hardirq() && !irqs_disabled() && !in_nmi() &&
> - !this_cpu_read(fpsimd_context_busy);
> + !in_hardirq() && !irqs_disabled() && !in_nmi();
> }
>
> #else /* ! CONFIG_KERNEL_MODE_NEON */
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 5cdfcc9e3e54b..b805bdab284c4 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -85,13 +85,13 @@
> * softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and
> * flag the register state as invalid.
> *
> - * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
> - * save the task's FPSIMD context back to task_struct from softirq context.
> - * To prevent this from racing with the manipulation of the task's FPSIMD state
> - * from task context and thereby corrupting the state, it is necessary to
> - * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
> - * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to
> - * run but prevent them to use FPSIMD.
> + * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be
> + * called from softirq context, which will save the task's FPSIMD context back
> + * to task_struct. To prevent this from racing with the manipulation of the
> + * task's FPSIMD state from task context and thereby corrupting the state, it
> + * is necessary to protect any manipulation of a task's fpsimd_state or
> + * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend
> + * softirq servicing entirely until put_cpu_fpsimd_context() is called.
> *
> * For a certain task, the sequence may look something like this:
> * - the task gets scheduled in; if both the task's fpsimd_cpu field
> @@ -209,27 +209,14 @@ static inline void sme_free(struct task_struct *t) { }
>
> #endif
>
> -DEFINE_PER_CPU(bool, fpsimd_context_busy);
> -EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy);
> -
> static void fpsimd_bind_task_to_cpu(void);
>
> -static void __get_cpu_fpsimd_context(void)
> -{
> - bool busy = __this_cpu_xchg(fpsimd_context_busy, true);
> -
> - WARN_ON(busy);
> -}
> -
> /*
> * Claim ownership of the CPU FPSIMD context for use by the calling context.
> *
> * The caller may freely manipulate the FPSIMD context metadata until
> * put_cpu_fpsimd_context() is called.
> *
> - * The double-underscore version must only be called if you know the task
> - * can't be preempted.
> - *
> * On RT kernels local_bh_disable() is not sufficient because it only
> * serializes soft interrupt related sections via a local lock, but stays
> * preemptible. Disabling preemption is the right choice here as bottom
> @@ -242,14 +229,6 @@ static void get_cpu_fpsimd_context(void)
> local_bh_disable();
> else
> preempt_disable();
> - __get_cpu_fpsimd_context();
> -}
> -
> -static void __put_cpu_fpsimd_context(void)
> -{
> - bool busy = __this_cpu_xchg(fpsimd_context_busy, false);
> -
> - WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */
> }
>
> /*
> @@ -261,18 +240,12 @@ static void __put_cpu_fpsimd_context(void)
> */
> static void put_cpu_fpsimd_context(void)
> {
> - __put_cpu_fpsimd_context();
> if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> local_bh_enable();
> else
> preempt_enable();
> }
>
> -static bool have_cpu_fpsimd_context(void)
> -{
> - return !preemptible() && __this_cpu_read(fpsimd_context_busy);
> -}
> -
> unsigned int task_get_vl(const struct task_struct *task, enum vec_type type)
> {
> return task->thread.vl[type];
> @@ -383,7 +356,7 @@ static void task_fpsimd_load(void)
> bool restore_ffr;
>
> WARN_ON(!system_supports_fpsimd());
> - WARN_ON(!have_cpu_fpsimd_context());
> + WARN_ON(preemptible());
>
> if (system_supports_sve() || system_supports_sme()) {
> switch (current->thread.fp_type) {
> @@ -467,7 +440,7 @@ static void fpsimd_save(void)
> unsigned int vl;
>
> WARN_ON(!system_supports_fpsimd());
> - WARN_ON(!have_cpu_fpsimd_context());
> + WARN_ON(preemptible());
>
> if (test_thread_flag(TIF_FOREIGN_FPSTATE))
> return;
> @@ -1583,7 +1556,7 @@ void fpsimd_thread_switch(struct task_struct *next)
> if (!system_supports_fpsimd())
> return;
>
> - __get_cpu_fpsimd_context();
> + WARN_ON_ONCE(!irqs_disabled());
>
> /* Save unsaved fpsimd state, if any: */
> fpsimd_save();
> @@ -1599,8 +1572,6 @@ void fpsimd_thread_switch(struct task_struct *next)
>
> update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> wrong_task || wrong_cpu);
> -
> - __put_cpu_fpsimd_context();
> }
>
> static void fpsimd_flush_thread_vl(enum vec_type type)
> @@ -1892,13 +1863,15 @@ static void fpsimd_flush_cpu_state(void)
> */
> void fpsimd_save_and_flush_cpu_state(void)
> {
> + unsigned long flags;
> +
> if (!system_supports_fpsimd())
> return;
> WARN_ON(preemptible());
> - __get_cpu_fpsimd_context();
> + local_irq_save(flags);
> fpsimd_save();
> fpsimd_flush_cpu_state();
> - __put_cpu_fpsimd_context();
> + local_irq_restore(flags);
> }
>
> #ifdef CONFIG_KERNEL_MODE_NEON
I'm not seeing a test mail for v6.6.33-rc1 but it's in the stable-rc git
and I'm seeing build failures in the KVM selftests for arm64 with it:
/usr/bin/ld: /build/stage/build-work/kselftest/kvm/aarch64/vgic_init.o: in function `test_v2_uaccess_cpuif_no_vcpus':
/build/stage/linux/tools/testing/selftests/kvm/aarch64/vgic_init.c:388:(.text+0x1234): undefined reference to `FIELD_PREP'
/usr/bin/ld: /build/stage/linux/tools/testing/selftests/kvm/aarch64/vgic_init.c:388:(.text+0x1244): undefined reference to `FIELD_PREP'
/usr/bin/ld: /build/stage/linux/tools/testing/selftests/kvm/aarch64/vgic_init.c:393:(.text+0x12a4): undefined reference to `FIELD_PREP'
/usr/bin/ld: /build/stage/linux/tools/testing/selftests/kvm/aarch64/vgic_init.c:393:(.text+0x12b4): undefined reference to `FIELD_PREP'
/usr/bin/ld: /build/stage/linux/tools/testing/selftests/kvm/aarch64/vgic_init.c:398:(.text+0x1308): undefined reference to `FIELD_PREP'
due to 12237178b318fb3 ("KVM: selftests: Add test for uaccesses to
non-existent vgic-v2 CPUIF") which was backported from
160933e330f4c5a13931d725a4d952a4b9aefa71.
commit 4a63bd179fa8d3fcc44a0d9d71d941ddd62f0c4e upstream.
Currently the ALSA timer doesn't have a lower limit on the start tick
time, and it allows a very small size, e.g. 1 tick with 1ns resolution
for hrtimer. Such a situation may lead to an unexpected RCU stall,
where the callback repeatedly queues the expire update, as reported
by the fuzzer.
This patch introduces a sanity check of the timer start tick time, so
that the system returns an error when a too small start size is set.
As of this patch, the lower limit is hard-coded to 100us, which is
small enough but can still work somehow.
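The check itself is a single comparison of resolution times ticks against the
limit. A minimal userspace sketch follows; the function name and the
MODEL_MIN_START_NS macro are hypothetical, and the resolution is assumed to be
in nanoseconds as in the 1ns hrtimer example above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the hard-coded lower limit: 100us in nanoseconds. */
#define MODEL_MIN_START_NS 100000ULL

/*
 * Return true when the requested start time (resolution * ticks) is below
 * the limit. The multiplication is done in 64 bits so that large tick counts
 * do not wrap on platforms where unsigned long is 32 bits wide.
 */
static bool model_start_time_too_short(unsigned long resolution_ns,
				       unsigned long ticks)
{
	return (uint64_t)resolution_ns * ticks < MODEL_MIN_START_NS;
}
```

In this model, 1 tick at 1ns resolution is rejected, while 100 ticks at 1000ns
resolution (exactly 100us) is accepted.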
[ backport note: the error handling is changed, as the original commit
is based on the recent cleanup with guard() in commit beb45974dd49
-- tiwai ]
Reported-by: syzbot+43120c2af6ca2938cc38(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/000000000000fa00a1061740ab6d@google.com
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20240514182745.4015-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
Greg, this is an alternative fix to the original cherry-pick; apply
to 6.8.y and older stable kernels. Thanks!
sound/core/timer.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/sound/core/timer.c b/sound/core/timer.c
index e6e551d4a29e..a0b515981ee9 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -553,6 +553,16 @@ static int snd_timer_start1(struct snd_timer_instance *timeri,
goto unlock;
}
+ /* check the actual time for the start tick;
+ * bail out as error if it's way too low (< 100us)
+ */
+ if (start) {
+ if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) {
+ result = -EINVAL;
+ goto unlock;
+ }
+ }
+
if (start)
timeri->ticks = timeri->cticks = ticks;
else if (!timeri->cticks)
--
2.43.0
No upstream commit exists for this commit.
The issue was introduced with commit e2f744a82d72 ("clk: mediatek:
Add MT2712 clock support")
If memory allocation fails in clk_mt2712_top_init_early(),
'top_clk_data' is set to NULL and later dereferenced without a check.
Fix this bug by adding a NULL check on the return value.
Upstream branch code has been significantly refactored and can't be
backported directly.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
---
drivers/clk/mediatek/clk-mt2712.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/clk/mediatek/clk-mt2712.c b/drivers/clk/mediatek/clk-mt2712.c
index a0f0c9ed48d1..1830bae661dc 100644
--- a/drivers/clk/mediatek/clk-mt2712.c
+++ b/drivers/clk/mediatek/clk-mt2712.c
@@ -1277,6 +1277,11 @@ static void clk_mt2712_top_init_early(struct device_node *node)
if (!top_clk_data) {
top_clk_data = mtk_alloc_clk_data(CLK_TOP_NR_CLK);
+ if (!top_clk_data) {
+ pr_err("%s(): could not register clock provider: %d\n",
+ __func__, -ENOMEM);
+ return;
+ }
for (i = 0; i < CLK_TOP_NR_CLK; i++)
top_clk_data->hws[i] = ERR_PTR(-EPROBE_DEFER);
--
2.30.2
No upstream commit exists for this commit.
The issue was introduced with commit c93d059a8045 ("clk: mediatek: mt8183:
Register 13MHz clock earlier for clocksource")
If memory allocation fails in clk_mt8183_top_init_early(),
'top_clk_data' is set to NULL and later dereferenced without a check.
Fix this bug by adding a NULL check on the return value.
Upstream branch code has been significantly refactored and can't be
backported directly.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
---
drivers/clk/mediatek/clk-mt8183.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/clk/mediatek/clk-mt8183.c b/drivers/clk/mediatek/clk-mt8183.c
index 78620244144e..8377a877d9e3 100644
--- a/drivers/clk/mediatek/clk-mt8183.c
+++ b/drivers/clk/mediatek/clk-mt8183.c
@@ -1185,6 +1185,11 @@ static void clk_mt8183_top_init_early(struct device_node *node)
int i;
top_clk_data = mtk_alloc_clk_data(CLK_TOP_NR_CLK);
+ if (!top_clk_data) {
+ pr_err("%s(): could not register clock provider: %d\n",
+ __func__, -ENOMEM);
+ return;
+ }
for (i = 0; i < CLK_TOP_NR_CLK; i++)
top_clk_data->hws[i] = ERR_PTR(-EPROBE_DEFER);
--
2.30.2
[ Upstream commit 1cd4bc987abb2823836cbb8f887026011ccddc8a ]
Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
has recently been added to vxlan mainly in the context of source
address snooping/learning so that when it is enabled, an entry in the
FDB is not being created for an invalid address for the corresponding
tunnel endpoint.
Before commit f58f45c1e5b9, vxlan behaved like geneve in that it
passed through whichever MACs were set in the L2 header. It turns out
that this change in behavior breaks setups. For example, Cilium with
netkit in L3 mode for Pods, combined with tunnel mode, was passing
before the change in f58f45c1e5b9 for both vxlan and geneve. After
that change it passes only for geneve: with vxlan, packets are dropped
because vxlan_set_mac() returns false when the source and destination
MACs are zero, which is totally fine for E/W traffic via a tunnel.
Fix it by only opting into the is_valid_ether_addr() check in
vxlan_set_mac() when in fact source address snooping/learning is
actually enabled in vxlan. This is done by moving the check into
vxlan_snoop(). With this change, the Cilium connectivity test suite
passes again for both tunnel flavors.
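The resulting behavior can be modeled with a small user-space sketch. The helper names mac_is_valid() and accept_packet() are hypothetical; mac_is_valid() mirrors what is_valid_ether_addr() checks (neither all-zero nor multicast), and accept_packet() only models the decision described above, not the real vxlan receive path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of is_valid_ether_addr(): reject all-zero and multicast MACs. */
static bool mac_is_valid(const uint8_t mac[6])
{
    bool multicast = (mac[0] & 0x01) != 0;
    bool zero = (mac[0] | mac[1] | mac[2] | mac[3] | mac[4] | mac[5]) == 0;

    return !multicast && !zero;
}

/*
 * Hypothetical decision helper: after the fix, the source MAC is only
 * validated when source address learning is enabled.
 */
static bool accept_packet(const uint8_t src_mac[6], bool learning)
{
    if (learning && !mac_is_valid(src_mac))
        return false;   /* bogus source while snooping: drop */
    return true;        /* no learning: pass the MAC through unchanged */
}
```

A zero source MAC is therefore accepted when learning is off and dropped only when learning is on.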
Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address")
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: David Bauer <mail(a)david-bauer.net>
Cc: Ido Schimmel <idosch(a)nvidia.com>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Martin KaFai Lau <martin.lau(a)kernel.org>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Reviewed-by: David Bauer <mail(a)david-bauer.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
---
drivers/net/vxlan/vxlan_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 3a9148fb1422..eccf09c81df2 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -1446,6 +1446,10 @@ static bool vxlan_snoop(struct net_device *dev,
struct vxlan_fdb *f;
u32 ifindex = 0;
+ /* Ignore packets from invalid src-address */
+ if (!is_valid_ether_addr(src_mac))
+ return true;
+
#if IS_ENABLED(CONFIG_IPV6)
if (src_ip->sa.sa_family == AF_INET6 &&
(ipv6_addr_type(&src_ip->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL))
@@ -1615,10 +1619,6 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan,
if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
return false;
- /* Ignore packets from invalid src-address */
- if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
- return false;
-
/* Get address from the outer IP header */
if (vxlan_get_sk_family(vs) == AF_INET) {
saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
--
2.34.1
There is a potential out-of-bounds access when using test_bit() on a
single word. The test_bit() and set_bit() functions operate on long
values, and when testing or setting a single word, they can exceed the
word boundary. KASAN detects this issue and produces a dump:
BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas
Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965
For full log, please look at [1].
Make the allocation at least the size of sizeof(unsigned long) so that
set_bit() and test_bit() have sufficient room for read/write operations
without overwriting unallocated memory.
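The size computation can be illustrated in isolation. This is a sketch under assumptions: handle_bitmap_bytes() and align_up() are hypothetical helpers, with align_up() playing the role of the kernel's ALIGN() for a power-of-two alignment.

```c
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: bytes needed for a bitmap with one bit per device
 * handle, rounded up so that long-sized bit operations stay in bounds.
 */
static size_t handle_bitmap_bytes(unsigned int max_dev_handle)
{
    size_t sz = max_dev_handle / 8;

    if (max_dev_handle % 8)
        sz++;
    return align_up(sz, sizeof(unsigned long));
}
```

Without the rounding, a single handle would need only one byte, yet set_bit() on that allocation would still write a full unsigned long.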
[1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/
Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path")
Cc: stable(a)vger.kernel.org
Suggested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changelog:
v2:
* Do the same protection in krealloc() in
_base_check_ioc_facts_changes, as suggested by Keith.
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 258647fc6bdd..cc17204721c2 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -8512,6 +8512,12 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
ioc->pd_handles_sz = (ioc->facts.MaxDevHandle / 8);
if (ioc->facts.MaxDevHandle % 8)
ioc->pd_handles_sz++;
+ /* pd_handles_sz should have, at least, the minimal room
+ * for set_bit()/test_bit(), otherwise out-of-memory touch
+ * may occur
+ */
+ ioc->pd_handles_sz = ALIGN(ioc->pd_handles_sz, sizeof(unsigned long));
+
ioc->pd_handles = kzalloc(ioc->pd_handles_sz,
GFP_KERNEL);
if (!ioc->pd_handles) {
@@ -8529,6 +8535,12 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
ioc->pend_os_device_add_sz = (ioc->facts.MaxDevHandle / 8);
if (ioc->facts.MaxDevHandle % 8)
ioc->pend_os_device_add_sz++;
+
+ /* pend_os_device_add_sz should have, at least, the minimal room
+ * for set_bit()/test_bit(), otherwise out-of-memory may occur
+ */
+ ioc->pend_os_device_add_sz = ALIGN(ioc->pend_os_device_add_sz,
+ sizeof(unsigned long));
ioc->pend_os_device_add = kzalloc(ioc->pend_os_device_add_sz,
GFP_KERNEL);
if (!ioc->pend_os_device_add) {
@@ -8820,6 +8832,11 @@ _base_check_ioc_facts_changes(struct MPT3SAS_ADAPTER *ioc)
if (ioc->facts.MaxDevHandle % 8)
pd_handles_sz++;
+ /* pd_handles should have, at least, the minimal room
+ * for set_bit()/test_bit(), otherwise out-of-memory touch
+ * may occur
+ */
+ pd_handles_sz = ALIGN(pd_handles_sz, sizeof(unsigned long));
pd_handles = krealloc(ioc->pd_handles, pd_handles_sz,
GFP_KERNEL);
if (!pd_handles) {
--
2.43.0
No upstream commit exists for this patch.
Fuzzing of 5.10 stable branch reports a slab-out-of-bounds error in
ata_scsi_pass_thru.
The error is fixed in 5.18 by commit ce70fd9a551a ("scsi: core: Remove the
cmd field from struct scsi_request") upstream.
Backporting this commit would require significant changes to the code, so
it is better to use a simple fix for that particular error.
The problem is that the length of the received SCSI command is not
validated if scsi_op == VARIABLE_LENGTH_CMD. It can lead to out-of-bounds
reading if the user sends a request with SCSI command of length less than
32.
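The missing validation can be sketched as a stand-alone predicate. cdb_len_ok() and VAR_LEN_CDB_MIN are hypothetical names used only for illustration; the opcode value 0x7f is the SCSI variable-length CDB opcode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VARIABLE_LENGTH_CMD 0x7f  /* SCSI variable-length CDB opcode */
#define VAR_LEN_CDB_MIN     32    /* minimum length named in the text above */

/*
 * Hypothetical helper: returns false when a variable-length command is
 * too short to be parsed as a 32-byte CDB without reading past its end.
 */
static bool cdb_len_ok(const uint8_t *cdb, size_t cmd_len)
{
    if (cmd_len == 0)
        return false;
    if (cdb[0] == VARIABLE_LENGTH_CMD && cmd_len < VAR_LEN_CDB_MIN)
        return false;
    return true;
}
```

Commands with other opcodes are not affected by this particular check.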
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
Signed-off-by: Mikhail Ivanov <iwanov-23(a)bk.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
---
v2: The new addresses were added and the text was updated.
v3: Checking has been moved to the function ata_scsi_var_len_cdb_xlat at
the request of Damien Le Moal
drivers/ata/libata-scsi.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dfa090ccd21c..38488bd813d1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3948,7 +3948,11 @@ static unsigned int ata_scsi_var_len_cdb_xlat(struct ata_queued_cmd *qc)
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
const u16 sa = get_unaligned_be16(&cdb[8]);
+ u8 scsi_op = scmd->cmnd[0];
+ if (scsi_op == VARIABLE_LENGTH_CMD && scmd->cmd_len < 32)
+ return 1;
+
/*
* if service action represents a ata pass-thru(32) command,
* then pass it to ata_scsi_pass_thru handler.
--
2.25.1
On the Qualcomm RB1 and RB2 platforms, the I2C bus connected to the
LT9611UXC bridge can, under some circumstances, enter a state in which
all transfers time out. This causes issues both with fetching the EDID
and with updating the bridge's firmware.
While we are debugging the issue, switch the corresponding I2C bus to
the i2c-gpio driver. With i2c-gpio, no communication issues are
observed.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
Changes in v2:
- Fixed i2c node names to fix DT validation issues (Rob)
- Link to v1: https://lore.kernel.org/r/20240604-rb12-i2c2g-pio-v1-0-f323907179d9@linaro.…
---
Dmitry Baryshkov (2):
arm64: dts: qcom: qrb2210-rb1: switch I2C2 to i2c-gpio
arm64: dts: qcom: qrb4210-rb2: switch I2C2 to i2c-gpio
arch/arm64/boot/dts/qcom/qrb2210-rb1.dts | 13 ++++++++++++-
arch/arm64/boot/dts/qcom/qrb4210-rb2.dts | 13 ++++++++++++-
2 files changed, 24 insertions(+), 2 deletions(-)
---
base-commit: 0e1980c40b6edfa68b6acf926bab22448a6e40c9
change-id: 20240604-rb12-i2c2g-pio-f6035fa8e022
Best regards,
--
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
The quilt patch titled
Subject: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
has been removed from the -mm tree. Its filename was
nilfs2-fix-nilfs_empty_dir-misjudgment-and-long-loop-on-i-o-errors.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
Date: Tue, 4 Jun 2024 22:42:55 +0900
The error handling in nilfs_empty_dir() when a directory folio/page read
fails is incorrect, as in the old ext2 implementation, and if the
folio/page cannot be read or nilfs_check_folio() fails, it will falsely
determine the directory as empty and corrupt the file system.
In addition, since nilfs_empty_dir() does not immediately return on a
failed folio/page read, but continues to loop, this can cause a long loop
with I/O if i_size of the directory's inode is also corrupted, causing the
log writer thread to wait and hang, as reported by syzbot.
Fix these issues by making nilfs_empty_dir() immediately return a false
value (0) if it fails to get a directory folio/page.
Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+c8166c541d3971bf6c87(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/dir.c~nilfs2-fix-nilfs_empty_dir-misjudgment-and-long-loop-on-i-o-errors
+++ a/fs/nilfs2/dir.c
@@ -607,7 +607,7 @@ int nilfs_empty_dir(struct inode *inode)
kaddr = nilfs_get_folio(inode, i, &folio);
if (IS_ERR(kaddr))
- continue;
+ return 0;
de = (struct nilfs_dir_entry *)kaddr;
kaddr += nilfs_last_byte(inode, i) - NILFS_DIR_REC_LEN(1);
_
The quilt patch titled
Subject: mm/hugetlb: do not call vma_add_reservation upon ENOMEM
has been removed from the -mm tree. Its filename was
mm-hugetlb-do-not-call-vma_add_reservation-upon-enomem.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm/hugetlb: do not call vma_add_reservation upon ENOMEM
Date: Tue, 28 May 2024 22:53:23 +0200
syzbot reported a splat [1] on __unmap_hugepage_range(). This is because
vma_needs_reservation() can return -ENOMEM if
allocate_file_region_entries() fails to allocate the file_region struct
for the reservation.
Check for that and do not call vma_add_reservation() if that is the case,
otherwise region_abort() and region_del() will see that we do not have any
file_regions.
If we detect that vma_needs_reservation() returned -ENOMEM, we clear the
hugetlb_restore_reserve flag as if this reservation was still consumed, so
free_huge_folio() will not increment the resv count.
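The three-way handling of the return value can be sketched as below. The enum and resv_action_for() are hypothetical and only model the decision; they are not part of the hugetlb code. The point is that a plain truth test on the return value treats -ENOMEM the same as "reservation needed".

```c
#include <errno.h>

/* Hypothetical labels for the three outcomes described above. */
enum resv_action {
    RESV_NONE,           /* rc == 0: nothing to restore */
    RESV_ADD,            /* rc > 0: call vma_add_reservation() */
    RESV_CLEAR_RESTORE,  /* rc < 0: clear the restore-reserve flag */
};

/* Map the vma_needs_reservation() style return code to an action. */
static enum resv_action resv_action_for(int rc)
{
    if (rc < 0)
        return RESV_CLEAR_RESTORE;
    if (rc)
        return RESV_ADD;
    return RESV_NONE;
}
```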
[1] https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/T/…
Link: https://lkml.kernel.org/r/20240528205323.20439-1-osalvador@suse.de
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-and-tested-by: syzbot+d3fe2dc5ffe9380b714b(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000004096100617c58d54@google.com/
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-do-not-call-vma_add_reservation-upon-enomem
+++ a/mm/hugetlb.c
@@ -5768,8 +5768,20 @@ void __unmap_hugepage_range(struct mmu_g
* do_exit() will not see it, and will keep the reservation
* forever.
*/
- if (adjust_reservation && vma_needs_reservation(h, vma, address))
- vma_add_reservation(h, vma, address);
+ if (adjust_reservation) {
+ int rc = vma_needs_reservation(h, vma, address);
+
+ if (rc < 0)
+ /* Pressumably allocate_file_region_entries failed
+ * to allocate a file_region struct. Clear
+ * hugetlb_restore_reserve so that global reserve
+ * count will not be incremented by free_huge_folio.
+ * Act as if we consumed the reservation.
+ */
+ folio_clear_hugetlb_restore_reserve(page_folio(page));
+ else if (rc)
+ vma_add_reservation(h, vma, address);
+ }
tlb_remove_page_size(tlb, page, huge_page_size(h));
/*
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mm-hugetlb-drop-node_alloc_noretry-from-alloc_fresh_hugetlb_folio.patch
arch-x86-do-not-explicitly-clear-reserved-flag-in-free_pagetable.patch
The quilt patch titled
Subject: mm/ksm: fix ksm_zero_pages accounting
has been removed from the -mm tree. Its filename was
mm-ksm-fix-ksm_zero_pages-accounting.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Chengming Zhou <chengming.zhou(a)linux.dev>
Subject: mm/ksm: fix ksm_zero_pages accounting
Date: Tue, 28 May 2024 13:15:22 +0800
We normally do ksm_zero_pages++ in ksmd when a page is merged with the
zero page, but ksm_zero_pages-- is done from the page tables side, where
access to ksm_zero_pages is not protected.
So in rare cases we can read an anomalous value of ksm_zero_pages,
such as -1, which is very confusing to users.
Fix it by changing it to atomic_long_t, and do the same for
mm->ksm_zero_pages.
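As a user-space analogy, the same pattern can be written with C11 atomics. This is a sketch, not kernel code: atomic_long from <stdatomic.h> stands in for the kernel's atomic_long_t, and the function names are hypothetical.

```c
#include <stdatomic.h>

/* Stand-in for the global ksm_zero_pages counter. */
static atomic_long zero_pages;

/* Called where a page is replaced by the zero page. */
static void map_zero_page(void)
{
    atomic_fetch_add(&zero_pages, 1);
}

/* Called from an unrelated path when such a mapping goes away. */
static void unmap_zero_page(void)
{
    atomic_fetch_sub(&zero_pages, 1);
}

/* Readers always observe a value produced by whole increments/decrements. */
static long read_zero_pages(void)
{
    return atomic_load(&zero_pages);
}
```

Because each update is a single atomic read-modify-write, concurrent increments and decrements cannot lose updates the way unprotected ++ and -- on a plain long can.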
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-2-34bb358fdc13@linux.…
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Fixes: 6080d19f0704 ("ksm: add ksm zero pages for each process")
Signed-off-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/base.c | 2 +-
include/linux/ksm.h | 17 ++++++++++++++---
include/linux/mm_types.h | 2 +-
mm/ksm.c | 11 +++++------
4 files changed, 21 insertions(+), 11 deletions(-)
--- a/fs/proc/base.c~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/fs/proc/base.c
@@ -3214,7 +3214,7 @@ static int proc_pid_ksm_stat(struct seq_
mm = get_task_mm(task);
if (mm) {
seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
- seq_printf(m, "ksm_zero_pages %lu\n", mm->ksm_zero_pages);
+ seq_printf(m, "ksm_zero_pages %ld\n", mm_ksm_zero_pages(mm));
seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages);
seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm));
mmput(mm);
--- a/include/linux/ksm.h~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/include/linux/ksm.h
@@ -33,16 +33,27 @@ void __ksm_exit(struct mm_struct *mm);
*/
#define is_ksm_zero_pte(pte) (is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
-extern unsigned long ksm_zero_pages;
+extern atomic_long_t ksm_zero_pages;
+
+static inline void ksm_map_zero_page(struct mm_struct *mm)
+{
+ atomic_long_inc(&ksm_zero_pages);
+ atomic_long_inc(&mm->ksm_zero_pages);
+}
static inline void ksm_might_unmap_zero_page(struct mm_struct *mm, pte_t pte)
{
if (is_ksm_zero_pte(pte)) {
- ksm_zero_pages--;
- mm->ksm_zero_pages--;
+ atomic_long_dec(&ksm_zero_pages);
+ atomic_long_dec(&mm->ksm_zero_pages);
}
}
+static inline long mm_ksm_zero_pages(struct mm_struct *mm)
+{
+ return atomic_long_read(&mm->ksm_zero_pages);
+}
+
static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
--- a/include/linux/mm_types.h~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/include/linux/mm_types.h
@@ -985,7 +985,7 @@ struct mm_struct {
* Represent how many empty pages are merged with kernel zero
* pages when enabling KSM use_zero_pages.
*/
- unsigned long ksm_zero_pages;
+ atomic_long_t ksm_zero_pages;
#endif /* CONFIG_KSM */
#ifdef CONFIG_LRU_GEN_WALKS_MMU
struct {
--- a/mm/ksm.c~mm-ksm-fix-ksm_zero_pages-accounting
+++ a/mm/ksm.c
@@ -296,7 +296,7 @@ static bool ksm_use_zero_pages __read_mo
static bool ksm_smart_scan = true;
/* The number of zero pages which is placed by KSM */
-unsigned long ksm_zero_pages;
+atomic_long_t ksm_zero_pages = ATOMIC_LONG_INIT(0);
/* The number of pages that have been skipped due to "smart scanning" */
static unsigned long ksm_pages_skipped;
@@ -1429,8 +1429,7 @@ static int replace_page(struct vm_area_s
* the dirty bit in zero page's PTE is set.
*/
newpte = pte_mkdirty(pte_mkspecial(pfn_pte(page_to_pfn(kpage), vma->vm_page_prot)));
- ksm_zero_pages++;
- mm->ksm_zero_pages++;
+ ksm_map_zero_page(mm);
/*
* We're replacing an anonymous page with a zero page, which is
* not anonymous. We need to do proper accounting otherwise we
@@ -3374,7 +3373,7 @@ static void wait_while_offlining(void)
#ifdef CONFIG_PROC_FS
long ksm_process_profit(struct mm_struct *mm)
{
- return (long)(mm->ksm_merging_pages + mm->ksm_zero_pages) * PAGE_SIZE -
+ return (long)(mm->ksm_merging_pages + mm_ksm_zero_pages(mm)) * PAGE_SIZE -
mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
}
#endif /* CONFIG_PROC_FS */
@@ -3663,7 +3662,7 @@ KSM_ATTR_RO(pages_skipped);
static ssize_t ksm_zero_pages_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
- return sysfs_emit(buf, "%ld\n", ksm_zero_pages);
+ return sysfs_emit(buf, "%ld\n", atomic_long_read(&ksm_zero_pages));
}
KSM_ATTR_RO(ksm_zero_pages);
@@ -3672,7 +3671,7 @@ static ssize_t general_profit_show(struc
{
long general_profit;
- general_profit = (ksm_pages_sharing + ksm_zero_pages) * PAGE_SIZE -
+ general_profit = (ksm_pages_sharing + atomic_long_read(&ksm_zero_pages)) * PAGE_SIZE -
ksm_rmap_items * sizeof(struct ksm_rmap_item);
return sysfs_emit(buf, "%ld\n", general_profit);
_
The quilt patch titled
Subject: mm/ksm: fix ksm_pages_scanned accounting
has been removed from the -mm tree. Its filename was
mm-ksm-fix-ksm_pages_scanned-accounting.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Chengming Zhou <chengming.zhou(a)linux.dev>
Subject: mm/ksm: fix ksm_pages_scanned accounting
Date: Tue, 28 May 2024 13:15:21 +0800
Patch series "mm/ksm: fix some accounting problems", v3.
We encountered some abnormal ksm_pages_scanned and ksm_zero_pages values
during some random tests.
1. ksm_pages_scanned stays unchanged even though ksmd scanning makes progress.
2. ksm_zero_pages may be -1 in some rare cases.
This patch (of 2):
During testing, I found that ksm_pages_scanned is unchanged although
scan_get_next_rmap_item() did return a valid, non-NULL rmap_item.
The reason is that scan_get_next_rmap_item() returns NULL after a full
scan, so ksm_do_scan() just returns without accounting
ksm_pages_scanned.
Fix it by moving the ksm_pages_scanned accounting into that loop, so that
it is also accounted in a more timely manner if the loop lasts a long time.
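The difference between the two loop shapes can be reproduced with a small simulation. Everything here is hypothetical scaffolding: struct source and next_item() model a scan that yields a fixed number of items and then NULL, and pages_scanned stands in for ksm_pages_scanned.

```c
#include <stddef.h>

static unsigned long pages_scanned;  /* stand-in for ksm_pages_scanned */

/* Hypothetical item source: yields `available` items, then NULL. */
struct source {
    unsigned int available;
};

static int dummy_item;

static void *next_item(struct source *s)
{
    if (s->available == 0)
        return NULL;  /* end of a full scan */
    s->available--;
    return &dummy_item;
}

/* Old shape: the accounting after the loop is skipped by the early return. */
static void scan_old(struct source *s, unsigned int scan_npages)
{
    unsigned int npages = scan_npages;

    while (npages--) {
        if (!next_item(s))
            return;
    }
    pages_scanned += scan_npages - npages;
}

/* New shape: each page is counted as soon as it has been processed. */
static void scan_new(struct source *s, unsigned int scan_npages)
{
    while (scan_npages--) {
        if (!next_item(s))
            return;
        pages_scanned++;
    }
}
```

With three items available and a budget of ten, the old shape accounts nothing because it returns early, while the new shape accounts all three.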
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-0-34bb358fdc13@linux.…
Link: https://lkml.kernel.org/r/20240528-b4-ksm-counters-v3-1-34bb358fdc13@linux.…
Fixes: b348b5fe2b5f ("mm/ksm: add pages scanned metric")
Signed-off-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: xu xin <xu.xin16(a)zte.com.cn>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/ksm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/mm/ksm.c~mm-ksm-fix-ksm_pages_scanned-accounting
+++ a/mm/ksm.c
@@ -2754,18 +2754,16 @@ static void ksm_do_scan(unsigned int sca
{
struct ksm_rmap_item *rmap_item;
struct page *page;
- unsigned int npages = scan_npages;
- while (npages-- && likely(!freezing(current))) {
+ while (scan_npages-- && likely(!freezing(current))) {
cond_resched();
rmap_item = scan_get_next_rmap_item(&page);
if (!rmap_item)
return;
cmp_and_merge_page(page, rmap_item);
put_page(page);
+ ksm_pages_scanned++;
}
-
- ksm_pages_scanned += scan_npages - npages;
}
static int ksmd_should_run(void)
_
The quilt patch titled
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
has been removed from the -mm tree. Its filename was
kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
Date: Tue, 28 May 2024 12:48:06 +0200
As noticed by Brian, KMSAN should not be zeroing the origin when
unpoisoning parts of a four-byte uninitialized value, e.g.:
char a[4];
kmsan_unpoison_memory(a, 1);
This led to false negatives, as certain poisoned values could receive zero
origins, preventing those values from being reported.
To fix the problem, check that kmsan_internal_set_shadow_origin() writes
zero origins only to slots which have zero shadow.
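The rule can be sketched on plain arrays. set_origins() is a hypothetical helper; each element of shadow_slots stands for the four shadow bytes that share one origin slot, and a non-zero shadow value means at least one of those bytes is still poisoned.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the origin update: a non-zero origin is always
 * written, while a zero origin is written only where the whole shadow
 * slot is zero (nothing poisoned remains there).
 */
static void set_origins(uint32_t *origin_slots, const uint32_t *shadow_slots,
                        size_t nslots, uint32_t origin)
{
    for (size_t i = 0; i < nslots; i++) {
        if (origin || !shadow_slots[i])
            origin_slots[i] = origin;
    }
}
```

In the char a[4] example above, unpoisoning one byte leaves the other three shadow bytes set, so the slot's origin survives and the remaining poisoned bytes can still be reported.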
Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.…
Reviewed-by: Marco Elver <elver(a)google.com>
Tested-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/core.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/kmsan/core.c~kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning
+++ a/mm/kmsan/core.c
@@ -196,8 +196,7 @@ void kmsan_internal_set_shadow_origin(vo
u32 origin, bool checked)
{
u64 address = (u64)addr;
- void *shadow_start;
- u32 *origin_start;
+ u32 *shadow_start, *origin_start;
size_t pad = 0;
KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
@@ -225,8 +224,16 @@ void kmsan_internal_set_shadow_origin(vo
origin_start =
(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
- for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
- origin_start[i] = origin;
+ /*
+ * If the new origin is non-zero, assume that the shadow byte is also non-zero,
+ * and unconditionally overwrite the old origin slot.
+ * If the new origin is zero, overwrite the old origin slot iff the
+ * corresponding shadow slot is zero.
+ */
+ for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
+ if (origin || !shadow_start[i])
+ origin_start[i] = origin;
+ }
}
struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
_
The quilt patch titled
Subject: nilfs2: fix potential kernel bug due to lack of writeback flag waiting
has been removed from the -mm tree. Its filename was
nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential kernel bug due to lack of writeback flag waiting
Date: Thu, 30 May 2024 23:15:56 +0900
Destructive writes to a block device on which nilfs2 is mounted can cause
a kernel bug in the folio/page writeback start routine or writeback end
routine (__folio_start_writeback in the log below):
kernel BUG at mm/page-writeback.c:3070!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
...
RIP: 0010:__folio_start_writeback+0xbaa/0x10e0
Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff
e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f>
0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00
...
Call Trace:
<TASK>
nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2]
nilfs_segctor_construct+0x181/0x6b0 [nilfs2]
nilfs_segctor_thread+0x548/0x11c0 [nilfs2]
kthread+0x2f0/0x390
ret_from_fork+0x4b/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
This is because when the log writer starts a writeback for segment summary
blocks or a super root block that use the backing device's page cache, it
does not wait for the ongoing folio/page writeback, resulting in an
inconsistent writeback state.
Fix this issue by waiting for ongoing writebacks when putting
folios/pages on the backing device into writeback state.
Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 3 +++
1 file changed, 3 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-fix-potential-kernel-bug-due-to-lack-of-writeback-flag-waiting
+++ a/fs/nilfs2/segment.c
@@ -1652,6 +1652,7 @@ static void nilfs_segctor_prepare_write(
if (bh->b_folio != bd_folio) {
if (bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
@@ -1665,6 +1666,7 @@ static void nilfs_segctor_prepare_write(
if (bh == segbuf->sb_super_root) {
if (bh->b_folio != bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
@@ -1681,6 +1683,7 @@ static void nilfs_segctor_prepare_write(
}
if (bd_folio) {
folio_lock(bd_folio);
+ folio_wait_writeback(bd_folio);
folio_clear_dirty_for_io(bd_folio);
folio_start_writeback(bd_folio);
folio_unlock(bd_folio);
_